Introduction
Urinary incontinence (UI) is a relatively common condition that is gaining wider attention because of its potential for significant negative physical and psychological effects on women.1,2 According to a recent authoritative literature review, worldwide epidemiologic surveys suggest a prevalence of UI ranging from 5% to 72%, with the wide variation attributable to differences in the populations interviewed, survey methodologies, and diagnostic criteria.3 A recent large-scale epidemiological survey from China showed that 24.8% of females aged 20–70 suffer from UI.4 The true prevalence of UI, however, has been difficult to assess because the condition is often underestimated and undiagnosed, and a significant proportion of women hold the incorrect view that urinary leakage is a natural process that accompanies aging rather than a disease, suggesting that the number of women with unreported UI may be considerable.5 Helping these women is an urgent and challenging task for urogynecologists, and supervised pelvic floor muscle training (PFMT) has been reported to potentially reduce the prevalence of UI in specific populations.6,7 A variety of risk factors, such as obesity, multiple vaginal deliveries, advanced age, and instrumented delivery, are closely related to the occurrence of UI.8 However, it is difficult to accurately assess the risk of UI from rough information on such risk factors alone, and blindly implementing early prevention measures without accurately identifying high-risk individuals may waste healthcare resources, incur unnecessary treatment costs, and ultimately render early prevention and intervention strategies ineffective.9,10
Given the above, technological tools capable of recognizing the risk of urine leakage in women would be constructive and practical. In the past decade, artificial intelligence (AI) has been increasingly applied to various scientific fields, and healthcare is one of its important branches. Such emerging technologies are centered on machine learning (ML) algorithms, which play an increasingly important role in the prediction, prevention, and diagnosis of diseases and in clinical treatment decision-making. More importantly, the increasing digitization and automation of healthcare systems provide broader potential application scenarios for ML. There is a large volume of literature and reviews on the application of ML in healthcare, covering a wide range of medical specialties including urology, cardiovascular medicine, and geriatrics, suggesting a broad and growing interest in the application of ML in the medical community.11–15
Notably, the application of ML in obstetrics and gynecology is quite extensive, ranging from health care during pregnancy to decision-making on surgical protocols and prognosis, indicating its potential for significant predictive power.16–19 Despite the high frequency of AI and ML in the literature, research on their application to female UI is still uncommon. Theoretically, ML has considerable application prospects in UI prevention and treatment: by extracting women's individual characteristics, clinical data, and follow-up results, it can analyze and screen high-risk factors, increase the accuracy of predictions of disease occurrence and prognosis, and assist doctors and patients in making clinical decisions together, with a view to obtaining better preventive and therapeutic effects. However, clinicians still face non-negligible challenges in applying ML to UI problems, which result from their lack of systematic understanding of ML technologies and from the imperfections of existing ML technologies.
The purpose of this scoping review is to comprehensively map the literature from the last decade (2013–2023) concerning the application of ML techniques in addressing female UI. By cataloging and assessing the breadth of existing research, this review aims to identify key strengths and limitations within the field, highlight existing knowledge gaps, and offer informed insights to guide future research endeavors and clinical applications.
Methods
Given the significant heterogeneity of published research in this field, this study adopted a scoping review design, which is well suited to describing the current state of research in a field. The study was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR).20
Search strategy and eligibility
We assembled a search string from the following keywords: [Urinary incontinence] and [(Machine learning) or (Predict) or (Prediction model)]. The Medline, Google Scholar, PubMed, and Web of Science databases were then searched sequentially for all relevant papers published in the decade from 2013 to 2023. Inclusion criteria were: (1) the study used ML algorithms in the intervention process of female UI, such as prevention, diagnosis, or prognosis; and (2) the study evaluated the models developed and reported the corresponding metrics. Exclusion criteria were: (1) reviews, commentaries, abstracts, book chapters, and animal experimental studies; (2) studies whose subjects were not adult females or whose topic was not UI; (3) studies that did not report indicators of model assessment; and (4) literature that was not in English or for which the full text was not available.
Study selection and data extraction
Two authors (Q.W. and X.X.W.) conducted the literature search and review according to the inclusion and exclusion criteria described above. First, the titles, abstracts, keywords, and conclusions of the articles were read to screen for potentially eligible studies. Each potentially eligible study was then assessed by full-text reading to determine whether it met the inclusion criteria. Disagreements about eligibility were referred to additional reviewers (X.X.J and C.Q.L) and resolved by consensus. References of the included studies were also manually screened to avoid omitting potentially eligible studies. The search for this scoping review was completed in March 2024.
We extracted the following data from the included studies: year of publication, type of study, purpose of study, sample size, and type of ML algorithm. Depending on the purpose of the study, we further recorded the following information: type of incontinence, input variables, model establishment and validation methods, evaluation metrics, and the method of model visualization. When a study built multiple models, the performance metrics of the best-performing model were included in the final table.
Results
The review team found a total of 798 relevant records from Medline, Google Scholar, PubMed, and Web of Science, with 102 papers remaining after removal of duplicates and initial screening. After full-text review, 23 studies were finally included. The process of searching, identifying, and screening the literature is shown in Figure 1. Over the time period of this scoping review (2013–2023), the volume of published literature showed a marked upward trend, with 87.0% (20/23) of the studies published in 2018–2023, indicating rapidly rising interest in this topic among the scientific community. Table 1 presents information specific to the included studies, including year of publication, type of study, purpose, sample size, type of ML algorithm, study objectives, and assessment of the performance of the model created.

Figure 1. Flowchart of study identification and inclusion.
Table 1. Overview of studies on applications of machine learning in female urinary incontinence.
Abbreviations: NR: not reported; AUC: areas under the curve; SE: sensitivity; SP: specificity; RKKS: reproducing kernel Krein space; TVT: tension free vaginal tape; MRI: magnetic resonance imaging; XGBoost: extreme gradient boosting.
Based on their purpose, these studies can be divided into four categories: predicting postpartum and pregnancy UI (9, 39.1%), predicting postoperative de novo UI (8, 34.8%), predicting the outcome of UI treatment (3, 13.0%), and assisted diagnosis of UI (3, 13.0%). Retrospective and prospective studies were included in roughly equal proportions (47.8% vs. 52.2%), and the sample sizes ranged from 77 to 3051, with a mean sample size of 620. The ML algorithms adopted included Fisher linear discriminant analysis, random forests, logistic regression, convolutional neural networks, and XGBoost (eXtreme gradient boosting), but logistic regression clearly dominated, being applied in 91.3% (21/23) of the studies. It is particularly noteworthy that the two studies that did not use logistic regression employed ultrasound parameters and ML algorithms to aid in the diagnosis of UI.
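The proportions quoted above follow directly from the study counts. As a minimal sketch (the variable and function names are illustrative and not part of the original analysis), the arithmetic can be checked as follows:

```python
# Counts as reported in this review; 23 is the number of included studies.
TOTAL_STUDIES = 23

category_counts = {
    "postpartum and pregnancy UI": 9,
    "postoperative de novo UI": 8,
    "outcome of UI treatment": 3,
    "assisted diagnosis of UI": 3,
}


def percent(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


# The four categories together should account for every included study.
assert sum(category_counts.values()) == TOTAL_STUDIES

shares = {name: percent(n, TOTAL_STUDIES) for name, n in category_counts.items()}
logistic_share = percent(21, TOTAL_STUDIES)  # studies that applied logistic regression

for name, share in shares.items():
    print(f"{name}: {share}")
print(f"logistic regression: {logistic_share}")
```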
The input variables varied greatly among these models, depending on the target and purpose of the prediction, and could be broadly categorized into patient characteristics, clinical data, and ultrasound and urodynamic parameters, among others. The studies used a variety of methods to evaluate predictive performance, including independent validation sets, K-fold cross-validation, and bootstrapping. All of the models reported the area under the receiver operating characteristic curve (AUC), with values ranging from 0.59 to 0.95. Unfortunately, more than half of the studies (12/23, 52.2%) did not disclose further details of model performance such as sensitivity and specificity. The remaining models reported sensitivity ranging from 20% to 96.2% and specificity ranging from 59.8% to 94.5%; notably, the unsatisfactory values all came from external validation studies that applied additional cohorts to the original models. The following is a detailed analysis of each of the four categories of models according to research intent:
Predicting postpartum and pregnancy UI
A total of nine papers21,26,30,36–38,40–42 applied ML to develop models to predict the occurrence of UI during pregnancy and postpartum. As shown in Table 2, all of these studies employed logistic regression, and the predicted outcome was mostly stress UI (SUI). The input predictors varied widely and can be grouped into basic characteristics (age, body mass index [BMI], race, level of education, and income), previous maternal history (parity, mode of delivery, infant weight, forceps deliveries, and the presence of urinary leakage during pregnancy), and auxiliary findings (ultrasound and magnetic resonance imaging [MRI]). Chen et al.26 established a prediction model by combining clinical data and ultrasound parameters; the final variables included bladder neck (BN) funneling and β angle at rest, in addition to BMI gain, constipation, and previous delivery mode, and the model achieved an AUC of 0.79 with a sensitivity of 78.7% and a specificity of 69.3%.
Table 2. Predicting postpartum and pregnancy UI.
Abbreviations: UI: urinary incontinence; SUI: stress urinary incontinence; ML: machine learning; NR: not reported; AUC: areas under the curve; SE: sensitivity; SP: specificity; LR: logistic regression; BMI: body mass index; BN: bladder neck; BND: bladder neck descent; MRI: magnetic resonance imaging.
As the only study that adopted MRI parameters, You et al.40 developed a model using MRI measurements of retrovesicourethral angle during straining, functional urethral length during straining, and bladder funnel. After cross-validation, the model reached an AUC of 0.95, a sensitivity of 96.2%, and a specificity of 86.4%, which were the best among the models in this series, suggesting that MRI may have potential advantages for the prediction of postpartum UI. Unfortunately, this study did not provide a visual and interpretable presentation of the model, whereas all other models in the series used a nomogram to make the model interpretable. It is also difficult to compare the predictive performance of the models because of the great variety of variables included in each.
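To make the link between a logistic regression model and its nomogram concrete, the sketch below computes a predicted risk and the corresponding nomogram-style points. All coefficients, predictor names, and ranges are hypothetical and chosen only for illustration; they are not taken from any included study, and the points scaling assumes positive coefficients.

```python
import math

# HYPOTHETICAL intercept and coefficients, for illustration only.
INTERCEPT = -4.0
COEFFICIENTS = {
    "age_years": 0.03,
    "bmi": 0.08,
    "vaginal_delivery": 0.9,      # 1 = yes, 0 = no
    "leakage_in_pregnancy": 1.2,  # 1 = yes, 0 = no
}
# HYPOTHETICAL plausible range of each predictor, used only to scale the points.
RANGES = {
    "age_years": (20, 45),
    "bmi": (18, 40),
    "vaginal_delivery": (0, 1),
    "leakage_in_pregnancy": (0, 1),
}


def linear_predictor(patient):
    """Intercept plus the weighted sum of the predictors (the log-odds)."""
    return INTERCEPT + sum(COEFFICIENTS[k] * patient[k] for k in COEFFICIENTS)


def predicted_risk(patient):
    """Logistic transform of the linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(patient)))


def nomogram_points(name, value):
    """Points for one predictor, scaled so the widest-ranging predictor spans 0-100."""
    max_effect = max(
        COEFFICIENTS[k] * (RANGES[k][1] - RANGES[k][0]) for k in COEFFICIENTS
    )
    return 100 * COEFFICIENTS[name] * (value - RANGES[name][0]) / max_effect


patient = {"age_years": 30, "bmi": 26, "vaginal_delivery": 1, "leakage_in_pregnancy": 1}
risk = predicted_risk(patient)
total_points = sum(nomogram_points(k, v) for k, v in patient.items())
print(f"predicted risk: {risk:.2f}, total points: {total_points:.0f}")
```

A nomogram is a graphical form of the same calculation: each axis converts a predictor value into points, and the total points map back to a probability through the logistic function.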
Predicting postoperative de novo UI
Eight studies addressed the prediction of de novo UI after pelvic floor repair surgery,22–25,28,31,35,43 most of which focused on SUI. The variables included in the models fall into two categories: basic characteristics (including age, BMI, and number of vaginal deliveries) and perioperative clinical data (e.g., preoperative stress test and leakage, type of prolapse surgery, and concomitant anti-incontinence surgery); detailed data are summarized in Table 3. Notably, in 2014, Jelovsek et al.22 first successfully constructed a model to predict de novo SUI 12 months after pelvic floor repair using the aforementioned variables and logistic regression, and its AUC reached 0.73 and 0.62 after internal and external validation, respectively. The researchers visualized this model as a nomogram. Since then, scholars have externally validated this model in multiple cohorts in different locations,25,28,31,35 with the AUC ranging from 0.56 to 0.69, making it the most frequently externally validated model in the field. In addition, Oh et al.35 validated the original model using a prospective cohort and developed a novel model based on this cohort, which showed a significant increase in AUC compared to the original model (0.74 vs. 0.63). In other independent studies, age, BMI, and parity were the most frequent predictors, while logistic regression remained the most common algorithm, with AUCs for these models ranging from 0.70 to 0.79.
Table 3. Predicting postoperative de novo UI.
Abbreviations: UI: urinary incontinence; ML: machine learning; NR: not reported; AUC: areas under the curve; SE: sensitivity; SP: specificity; LR: logistic regression; RF: random forest; XGBoost: extreme gradient boosting; POP-Q: pelvic organ prolapse quantification; MUS: midurethral sling.
Predicting the outcome of UI treatment
As shown in Table 4, only three studies32,33,39 addressed the prediction of UI treatment outcomes, and each focused on a different type of treatment. Zhong et al.32 incorporated urodynamic examination parameters such as maximal urethral closure pressure and Valsalva leak point pressure in establishing a model to predict the efficacy of anti-incontinence surgery, and the resulting model had an AUC of 0.69. Another study predicted the therapeutic effect of PFMT; the final model included four predictors, namely the International Consultation on Incontinence Questionnaire Urinary Incontinence Short Form (ICIQ-UI-SF), pelvic floor muscle tone, BN height during quiet standing, and BN height during standing cough, and had an AUC of 0.80, a sensitivity of 70%, and a specificity of 75%.33 The last study predicted the efficacy of pharmacological, conservative, and invasive treatments for urgency UI (UUI), respectively, with a model AUC of 0.70, which may be helpful in the choice of treatment for UUI; unfortunately, this study did not provide a visualization tool.
Table 4. Predicting the outcome of UI treatment.
Abbreviations: UI: urinary incontinence; SUI: stress urinary incontinence; UUI: urgency urinary incontinence; ML: machine learning; NR: not reported; AUC: areas under the curve; SE: sensitivity; SP: specificity; LR: logistic regression; PFM: pelvic floor muscle; ICIQ-UI-SF: the International Consultation on Incontinence Questionnaire Urinary Incontinence Short Form.
Assisted diagnostics of UI
Only three studies27,29,34 explored the possibility of applying ML to assist in the diagnosis of UI. These studies have some features in common: all of them aimed to assist in the diagnosis of SUI, and ultrasound parameters were central to each; details are shown in Table 5. Xiao et al.27 developed several prediction models using different combinations of four ultrasound parameters, namely BN position on maximal Valsalva maneuver, levator hiatus area on maximal Valsalva maneuver, BN descent, and urethral rotation angle. The model with all four variables had the best predictive performance, with an AUC of 0.82, a sensitivity of 60.5%, and a specificity of 94.5%. Keshavarz et al.29 also used ultrasound parameters for prediction, but with a simpler model: they found that a β angle higher than 127° during the Valsalva maneuver was a strong predictor, with an AUC of 0.89, a sensitivity of 89%, and a specificity of 79%. The last study34 applied a convolutional neural network to build an AI image recognition system that predicts the occurrence of SUI from ultrasound images; this simple and pioneering method showed good predictive performance, with an AUC of 0.92, a sensitivity of 75.0%, and a specificity of 92.3%.
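The sensitivity, specificity, and AUC figures reported for a single-parameter threshold rule can be illustrated with a short sketch. The β angle values and labels below are hypothetical and serve only to show how the three metrics are computed; only the 127° cut-off is taken from the text, and the resulting numbers are not those of any included study.

```python
def sensitivity_specificity(values, labels, threshold):
    """Call value > threshold positive; return (sensitivity, specificity)."""
    tp = sum(1 for v, y in zip(values, labels) if y == 1 and v > threshold)
    fn = sum(1 for v, y in zip(values, labels) if y == 1 and v <= threshold)
    tn = sum(1 for v, y in zip(values, labels) if y == 0 and v <= threshold)
    fp = sum(1 for v, y in zip(values, labels) if y == 0 and v > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(values, labels):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# HYPOTHETICAL beta angles in degrees; label 1 = SUI, 0 = continent.
beta_angles = [150, 142, 135, 131, 120, 138, 125, 118, 110, 105]
has_sui = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

se, sp = sensitivity_specificity(beta_angles, has_sui, threshold=127)
area = auc(beta_angles, has_sui)
print(f"sensitivity={se:.2f}, specificity={sp:.2f}, AUC={area:.2f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes discrimination across all possible cut-offs, which is one reason reporting the AUC alone gives an incomplete picture.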
Table 5. Assisted diagnostics of UI.
Abbreviations: UI: urinary incontinence; SUI: stress urinary incontinence; ML: machine learning; NR: not reported; AUC: areas under the curve; SE: sensitivity; SP: specificity; LR: logistic regression; FLDA: Fisher linear discriminant analysis; CNN: convolutional neural network; BNP: bladder neck position on maximal Valsalva maneuver; LHA: levator hiatus area on maximal Valsalva maneuver; BND: bladder neck descent; URA: urethral rotation angle.
Discussion
Humans have long sought to predict the occurrence and prognosis of diseases. In ancient times, observed phenomena were summarized as experience; in the last century, the development of statistics provided more effective methods; and in the last decade, the rapid development of AI and big data systems has greatly assisted health professionals in exploring the intrinsic developmental patterns of specific diseases. Since then, an increasing number of predictive models based on complex databases and diverse information have emerged, with the aim of providing more accurate predictions for disease prevention, treatment, prognosis, and follow-up.44–48
Because the latest review on predicting female UI was published nearly a decade ago,49,50 an update is necessary, and this study provides an up-to-date overview of the application of ML algorithms and techniques to predicting female UI. Female UI spans multiple domains, with marked differences in the purpose, applicable populations, and methodologies of the corresponding models, as well as significant heterogeneity in the input variables, so it is difficult to conduct a systematic review and meta-analysis of this literature. To provide a more in-depth and intuitive understanding of these studies, we recorded detailed information about the models: year of publication, type of study, purpose, sample size, type of ML algorithm, objectives of the study, and an assessment of the performance of the constructed models. Based on the purpose of the predictive models, this review classified the included literature into four categories, which were further analyzed in detail for the subtype of UI predicted, input variables, validation methods, and visualization methods.
Through this scoping review, we found that sociodemographic factors such as age, BMI, education, and income level were important in predicting de novo UI in the postpartum period and after pelvic floor repair surgery. Equally important predictors of postpartum UI were obstetric factors such as mode of delivery, infant weight, instrumented or noninstrumented delivery, and the presence of urinary leakage during pregnancy. Ultrasound parameters could also assist in predicting the development of postpartum UI. Although there was less literature on assisting in the diagnosis of UI, the crucial role of ultrasound and MRI examinations in this area was evident, and such studies can analyze ultrasound and MRI images automatically to obtain an accurate prediction. In addition, some studies have combined clinical data and ultrasound parameters to create models that predict the outcome of UI patients after surgery, drug treatment, and PFMT, respectively.
This is a timely scoping review that provides clinicians and analysts with a broader and deeper understanding of the application of ML to female UI, and the fact that the majority of studies (87.0%) were published in 2018–2023 is indicative of the growing interest in this topic in the medical community. Currently, ML-based UI prediction studies have focused on the assessment of UI risk in specific populations (e.g., postpartum and after pelvic floor repair surgeries). These ML-based prediction models are of great interest to both pregnant women (patients) and physicians, as they offer the opportunity to improve patient prognosis by identifying high-risk populations and delivering early interventions, improving quality of life and reducing the associated healthcare costs. For health service providers, simple and more accurate primary screening tools can enable precise management of high-risk populations, helping to save medical resources while improving the effectiveness of interventions. At the same time, the development of telemedicine and the proliferation of wearable devices may facilitate the collection and processing of medical data, and such solutions may be of great importance to women living in developing countries and in rural areas, where access to health care may be limited mainly by socioeconomic factors.
Although the application of ML modeling for UI diagnosis and efficacy assessment is still relatively limited, it would not be surprising if this topic became the most promising development in the field in the coming years. Human clinical decision-making can be accompanied by errors and biases or constrained by personal experience,51 and assisted decision-making systems based on predictive modeling can help clinicians reduce the risk of misdiagnosis and incorrect treatment. From the patient's perspective, when consulting a physician about treatment options, predictive modeling can provide individualized risk and prognosis information based on the patient's situation, rather than general information about possible risk factors and prognosis, which is more conducive to the patient's understanding of his or her own situation and to shared decision-making with the physician.
The application of ML to female UI is still in its infancy, although some of its current clinical applications show its future potential to revolutionize the prevention, diagnosis, treatment, and prognosis of UI in women. In order for the corresponding predictive model to be suitable for wider clinical implementation, it must be accurate and generalizable. There are several important shortcomings that should not be overlooked when using ML techniques to achieve these goals.
The first is that the process of model development, validation, and evaluation should follow appropriate standards. As more and more predictive models have emerged, the need for standards that increase the accuracy and credibility of research results has become increasingly urgent. Some ML algorithms are often considered a black box because it is difficult to explain how a prediction is derived,52 so a lack of transparency in model development and validation will weaken the credibility of the model's predictions and reduce clinicians’ willingness to apply the model. An important milestone was the publication in 2015 of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines,53 which sought to ensure that researchers provide sufficient information when reporting prediction modeling studies to enable others to understand, evaluate, and reproduce the results. By following the TRIPOD guidelines, researchers can improve the reliability and transparency of predictive models and promote their effective application in healthcare and other fields. Unfortunately, many studies still lack sufficiently transparent descriptions of model development and the necessary metrics at the model evaluation stage. More than half of the studies in this review (12/23, 52.2%) reported only the AUC of the model and lacked further details about model performance, such as sensitivity, specificity, accuracy, positive predictive value, and negative predictive value, which weakened the credibility of the model.
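One reason the additional metrics matter is that positive and negative predictive values depend on how common UI is in the target population, which the AUC does not convey. The following sketch applies Bayes' rule to a sensitivity of 78.7% and a specificity of 69.3% (the values reported by Chen et al.26); the two prevalence values are assumptions chosen for illustration, not figures from that study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.787  # as reported by Chen et al. in this review
SPECIFICITY = 0.693  # as reported by Chen et al. in this review

# ASSUMED prevalence values, for illustration only.
ppv_low, npv_low = predictive_values(SENSITIVITY, SPECIFICITY, prevalence=0.10)
ppv_high, npv_high = predictive_values(SENSITIVITY, SPECIFICITY, prevalence=0.30)

print(f"prevalence 10%: PPV={ppv_low:.2f}, NPV={npv_low:.2f}")
print(f"prevalence 30%: PPV={ppv_high:.2f}, NPV={npv_high:.2f}")
```

With the same sensitivity and specificity, the positive predictive value rises and the negative predictive value falls as prevalence increases, so a model's clinical usefulness cannot be judged from the AUC alone.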
Secondly, although a few studies have combined multiple ML algorithms, the ML algorithms currently applied to female UI are overall very homogeneous, and the proportion of studies that applied logistic regression in this review was as high as 91.3% (21/23). Although this approach has been widely used, it has certain deficiencies in dealing with multidimensional data and feature interactions, which cannot be ignored in the era of big data. In fact, ML algorithms that have emerged in recent years, such as XGBoost, support vector machines, and convolutional neural networks, have demonstrated excellent performance in prediction tasks in other medical scenarios.34,54–56 Applying multiple algorithms separately and then selecting the best performer seems to be an ideal solution.
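The idea of fitting several algorithms and keeping the best performer can be sketched as a cross-validated comparison. The example below is only a sketch under stated assumptions: the data are simulated, and the two candidate learners (a best-single-feature rule and a nearest-centroid score) are simple stand-ins chosen to keep the code self-contained; they are not the algorithms used in the included studies.

```python
import random


def auc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_best_single_feature(X, y):
    """Candidate 1: score by the single feature with the highest training AUC."""
    best = max(range(len(X[0])), key=lambda j: auc([row[j] for row in X], y))
    return lambda row: row[best]


def fit_nearest_centroid(X, y):
    """Candidate 2: score by how much closer a row lies to the positive-class centroid."""
    def centroid(label):
        rows = [row for row, lab in zip(X, y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(0)

    def score(row):
        d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
        return d_neg - d_pos

    return score


def cross_validated_auc(fit, X, y, k=5):
    """Mean test-fold AUC over k folds, with labels spread evenly across folds."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    fold_of = {idx: rank % k for rank, idx in enumerate(order)}
    fold_aucs = []
    for fold in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        test = [i for i in range(len(y)) if fold_of[i] == fold]
        scorer = fit([X[i] for i in train], [y[i] for i in train])
        fold_aucs.append(auc([scorer(X[i]) for i in test], [y[i] for i in test]))
    return sum(fold_aucs) / k


# SIMULATED data: 40 women, 2 features, label 1 = UI (hypothetical).
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(1.0 * label, 1.0), rng.gauss(0.5 * label, 1.0)] for label in y]

candidates = {
    "best single feature": fit_best_single_feature,
    "nearest centroid": fit_nearest_centroid,
}
results = {name: cross_validated_auc(fit, X, y) for name, fit in candidates.items()}
best_name = max(results, key=results.get)
print(results, "->", best_name)
```

Because the best performer is chosen on the same folds used to score it, its cross-validated AUC is optimistically biased, which is one more reason the selected model should still be checked on an external cohort.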
By screening the existing literature, we found that many models (43.5%, 10/23) for female UI prediction lacked effective visualization tools. This limits the dissemination and application of predictive models in clinical practice. Visualization tools such as nomograms or web calculators can make complex statistical models intuitive and understandable, thus facilitating their use by clinicians and researchers.
In addition, visualization of the model helps other researchers to externally validate and improve the original model. External validation, which means validating the predictive model using at least one dataset separate from the development dataset, is an indispensable step in determining the reliability and applicability of a model. Although the methods of internal validation are well established, they do not replace external validation. While the population for which a predictive model is applicable should be explicitly characterized, predictive models should ideally be applicable to patients from the wide range of races, ethnicities, and backgrounds that are common in clinical practice. The predictive model for postoperative de novo UI developed by Jelovsek et al.22 in 2014 was validated in several external cohorts worldwide;25,28,31,35 a key reason is that the availability of a nomogram in that study significantly enhanced the model's usability. Therefore, we suggest that future studies emphasize and incorporate effective visualization methods when developing predictive models in order to improve the usability and interpretability of the models and facilitate their widespread use and validation in clinical work.
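When a published model is applied to an external cohort, reporting the uncertainty around the AUC helps readers judge whether performance has genuinely dropped. One common option is a percentile bootstrap interval, sketched below. The cohort is simulated, and the function name, number of resamples, and seed are illustrative assumptions rather than a procedure taken from the included studies.

```python
import random


def auc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for the AUC to be defined
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    cut = int(n_boot * alpha / 2)
    return stats[cut], stats[n_boot - 1 - cut]


# SIMULATED external cohort: predicted risks from an existing model, label 1 = de novo UI.
rng = random.Random(1)
labels = [1] * 30 + [0] * 30
scores = [rng.gauss(0.60, 0.20) for _ in range(30)] + [rng.gauss(0.45, 0.20) for _ in range(30)]

point_estimate = auc(scores, labels)
lower, upper = bootstrap_auc_interval(scores, labels)
print(f"external AUC = {point_estimate:.2f} (95% bootstrap interval {lower:.2f} to {upper:.2f})")
```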
Conclusion
This review provides a timely summary of the current status of the application of ML to female UI. The increasing interest of the scientific community in applying ML techniques to the prevention, diagnosis, treatment, and prognosis of female UI suggests that ML may have a meaningful impact on the field. Future research should employ more diverse ML algorithms while providing clearer and more transparent descriptions of the model building process and validation results. These models also need effective visualization tools to facilitate large-scale external validation and ensure their applicability. To quote a popular saying, the future mass adoption of AI and predictive modeling will not replace the role of the specialist, but those who understand and can use AI and predictive modeling techniques may replace those who cannot.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076241281450 - Supplemental material for Machine learning in female urinary incontinence: A scoping review
Supplemental material, sj-docx-1-dhj-10.1177_20552076241281450 for Machine learning in female urinary incontinence: A scoping review by Qi Wang, Xiaoxiao Wang, Xiaoxiang Jiang and Chaoqin Lin in DIGITAL HEALTH
