Abstract
Keywords
Introduction
The core objective of medical risk prediction is to identify and quantify the risks that individuals face regarding disease occurrence, disease progression, recurrence, and treatment-related complications, in order to facilitate earlier interventions and more precise clinical decision-making. 1 Traditional risk prediction methods primarily rely on statistical models that analyze specific risk factors such as age, family history, and lifestyle choices. 2 However, these conventional approaches exhibit significant limitations when addressing high-dimensional, nonlinear, and complex health data, making it challenging to comprehensively reveal potential risk factors and their interactions. 3
Artificial Intelligence (AI), as an interdisciplinary field, encompasses multiple domains, including expert systems, machine learning, robotics, decision support systems, and pattern recognition, aiming to enhance decision support capabilities through the simulation and extension of human intelligence.4,5 Artificial intelligence–based medical risk prediction tools demonstrate immense potential in improving the accuracy of risk assessments, assisting in early disease diagnosis, and formulating personalized treatment plans. 6 Advanced technologies, such as machine learning and deep learning, excel in modeling complex nonlinear relationships, enabling the identification of medical risk factors that human experts may overlook within vast datasets. 7 The application of these AI models has expanded across various fields, including cardiovascular diseases, cancers, neurodegenerative diseases, and infectious diseases, allowing for predictions of initial disease occurrence risks as well as dynamic assessments of disease progression, recurrence, and treatment-related complications.7–9 By integrating demographic data, clinical biomarkers, imaging characteristics, and behavioral patterns into a unified predictive framework, these models achieve highly personalized risk assessments, demonstrating significant advantages in the early identification of critical and high-recurrence diseases.10,11
Although AI shows great potential in medical risk prediction, the topics and trends of existing research remain unclear. Some scholars have analyzed AI-based risk prediction for specific diseases. For example, Teguede et al. 12 and Drazen et al. 13 analyzed AI risk prediction applications in pulmonary arterial hypertension and infectious diseases, respectively. These studies focus on specific fields and lack a systematic review of the overall landscape, international trends, and interdisciplinary collaboration of AI in medical risk prediction. 14 A comprehensive analysis of the relevant research from a global perspective is therefore needed, one that not only systematically summarizes the latest progress and key research threads but also offers forward-looking guidance on future directions and potential research gaps. Bibliometrics is a research method that employs mathematical and statistical techniques to quantitatively analyze scientific publications, revealing development trends, structural distributions, and intrinsic relationships within specific fields by assessing characteristics such as publication quantity, authorship, and journal distribution. 15 This study employs bibliometric methods to systematically analyze research related to AI in medical risk prediction, summarize the field's development and current status, identify key research hotspots, explore future trends and challenges, and provide a scientific foundation and guidance for researchers and decision-makers.
Methods
Data sources and search strategy
This bibliometric analysis followed the guidelines for reporting bibliometric reviews of biomedical literature (BIBLIO). 13 The publications used in this study were sourced from the Science Citation Index (SCI) and the Social Sciences Citation Index (SSCI) in the Web of Science (WOS) Core Collection. WOS is a renowned database, widely recognized for its high-quality resources and extensive multidisciplinary coverage. 14 Using a single data source ensures data consistency and uniformity, avoiding potential problems caused by differences in data format and quality between data sources. 12 The literature search was conducted in the WOS Core Collection and covered the period from database inception to November 18, 2024. The search string combined keywords with Boolean operators:
(TS = (artificial intelligence OR machine intelligence OR robot technology OR computational intelligence OR computer reasoning OR deep learning OR computer vision system OR neural network* OR data learning OR natural language processing OR support vector machine* OR decision tree* OR bayesian network* OR intelligent learning OR feature* learning OR feature* extraction OR time series analysis OR reinforcement learning OR logistic regression OR recurrent neural network OR long short-term memory OR transformer OR self-attention mechanism OR generative adversarial network OR word embeddings OR sentiment analysis OR deep q network OR k-means clustering OR graph attention network OR bayesian networks OR probabilistic graph models)) AND TS = ((rtificial intellediction” OR “health risk assessment” OR “disease risk prediction” OR “clinical risk prediction” OR “patient risk assessment” OR “healthcare risk prediction” OR “predictive modeling in healthcare” OR “disease forecasting” OR “(rtificial intellediction” OR stic models” OR “mortality risk prediction” OR “readmission risk prediction” OR “comorbidity risk prediction” OR “adverse event prediction” OR “chronic disease prediction” OR “cardiovascular risk prediction” OR “cancer risk prediction” OR “diabetes risk prediction” OR “infection risk prediction” OR “health risk stratification” OR “medical risk models” OR “early disease detection” OR “predictive analytics in medicine” OR “machine learning in risk assessment” OR “AI in risk prediction” OR “health data analytics” OR “clinical outcome prediction” OR “patient outcome prediction” OR “health screening prediction” OR “preventive health prediction” OR “population health risk prediction” OR “risk scoring systems” OR “personalized risk prediction” OR “health deterioration prediction” OR “clinical deterioration prediction” OR “complication prediction” OR “disease progression prediction” OR “risk modeling in healthcare” OR “risk prediction algorithm” OR “risk factor analysis in medicine” OR “medical risk stratification” OR “risk scoring in healthcare” OR “epidemiological risk prediction”))
Selection process
Publications were initially screened by the research team, and titles and abstracts were reviewed against the following inclusion and exclusion criteria. Inclusion criteria for this study were as follows: (1) studies involving AI; (2) studies focusing on medical risk prediction; and (3) publications in the form of peer-reviewed journal papers. Exclusion criteria included publications unrelated to the study topic, as well as specific types of literature such as corrections, letters, retracted articles, book chapters, book reviews, and conference abstracts. Full-text articles were then selected for further evaluation to ensure that they met all required criteria. Any disagreements that arose during the screening process were resolved through team discussions to maintain consistency and rigor.
A total of 2929 publications were initially retrieved. After excluding 334 records that did not meet the publication-type criteria, the remaining 2595 records were imported into the web-based Rayyan platform. 15 After title and abstract screening, 515 articles were excluded, leaving 2080 articles for inclusion in this study. Figure 1 shows the flowchart of literature screening and the research framework.

Flowchart of the literature-screening process and research framework.
Data analysis
Bibliometric analysis
Bibliometric analysis, as initially defined by Pritchard, 16 employs mathematical and statistical methods to systematically evaluate scholarly publications and other forms of academic communication. This approach investigates research trends and the structural framework of knowledge within specific domains, providing quantifiable and objective insights. 17 Widely used as a quantitative tool, bibliometrics facilitates the identification of emerging research topics and the assessment of the contributions of individual researchers, academic journals, and nations, helping researchers comprehend the current research landscape, distribution patterns, and central themes within their fields of interest. 18 By mapping the evolution of scientific disciplines, recognizing influential works and authors, and identifying gaps in the literature, bibliometric analysis plays a crucial role in guiding future research directions and informing policy-making.
Analysis tools
We employed scientific mapping tools to conduct our bibliometric analysis. Currently, widely used tools in this domain include VOSviewer, 19 CiteSpace, 20 BibExcel, 20 and HistCite. 21 For our analysis, we selected VOSviewer (version 1.6.20) and CiteSpace (version 6.4.R1) due to their robust functionalities. VOSviewer offers powerful network analysis algorithms, including centrality analysis, cluster analysis, and module detection, alongside data cleaning capabilities that eliminate errors and duplicate information, thereby ensuring data quality. 19 We utilized VOSviewer for conducting country analysis, journal analysis, and author analysis. CiteSpace, on the other hand, constructs knowledge graphs of network structures by analyzing citation relationships between documents and employs clustering algorithms to categorize nodes. 22 By continuously optimizing the network structure and adjusting parameters, CiteSpace generates more accurate knowledge graphs, facilitating deep exploration of associations between documents. We primarily used CiteSpace for keyword analysis and identifying emerging research trends.
Additionally, we used R (version 4.4.1) in RStudio, 23 Scimago Graphica (version 1.0.44), 24 and Pajek (version 5.19) 25 for supplementary visualization tasks. Together, these tools enhanced our ability to visualize complex bibliometric data and provided comprehensive insights into the research landscape.
Ethical considerations
The data used in this study were sourced from the WOS Core Collection, and no patients or members of the public were involved in this research.
Results
Development history and future trend of publications
Figure 2 plots the annual number of publications on AI in medical risk prediction (solid blue dots) together with a linear fit (dashed line) and an exponential fit (solid line) used to project future trends. From 1986 to 2004, the early stage of the field, fewer than 10 papers were published per year. From 2005 to 2020, the middle stage, the number of publications began to grow markedly and at an increasing rate. Although the linear model captures part of this growth, the exponential model fits better, reflecting steadily rising interest in the field. This period of rapid development was driven by the maturation of deep learning, increases in computing power, and the development of digital medicine. From 2021 to the present, the late stage, publication growth has accelerated further and the exponential model clearly outperforms the linear model, indicating that the field has entered a mature stage of rapid expansion.

Artificial intelligence in medical risk prediction: publication growth and trend forecasting.
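The text does not say how the linear and exponential curves in Figure 2 were fitted. A minimal sketch, assuming ordinary least squares for the linear trend and a least-squares fit on log-transformed counts for the exponential trend (one common choice), shows how the two models can be compared by their coefficient of determination (R²). The yearly counts below are hypothetical placeholders, not the study's data, and the helper names are illustrative.

```python
import math

def least_squares(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(ys, preds):
    """Coefficient of determination on the original (raw count) scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def compare_fits(years, counts):
    """Fit y = a + b*t and y = A*exp(k*t), with t = years since the first year.
    The exponential model is fitted as a straight line on log(counts),
    so every count must be positive."""
    t = [year - years[0] for year in years]
    a, b = least_squares(t, counts)
    linear = [a + b * ti for ti in t]
    ln_a, k = least_squares(t, [math.log(c) for c in counts])
    exponential = [math.exp(ln_a + k * ti) for ti in t]
    return {
        "linear_r2": r_squared(counts, linear),
        "exponential_r2": r_squared(counts, exponential),
        "growth_rate": k,  # continuous annual growth rate of the exponential fit
    }

# Hypothetical annual publication counts (illustrative only).
years = list(range(2010, 2021))
counts = [5, 7, 9, 13, 18, 25, 34, 47, 65, 90, 125]
result = compare_fits(years, counts)
print({key: round(value, 3) for key, value in result.items()})
```

Fitting on log counts weights relative rather than absolute errors; a nonlinear least-squares fit on the raw counts is a reasonable alternative when large recent years should dominate the fit.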
Analysis of authors and coauthorship networks
A total of 13,872 authors contributed to the 2080 studies. Analyzing authorship reveals the representative scholars and the distribution of research capacity in the field. Price observed that half of the papers in a subject area are written by a small group of highly productive authors, whose number is roughly the square root of the total number of authors. 26 Following Price's law, the minimum number of publications m required of a core author is:
$$m = 0.749\sqrt{N_{\max}} \qquad (1)$$
In formula (1), N_max is the number of publications by the most productive author. Here, m ≈ 2.485; therefore, authors with at least three publications were identified as core authors in this field. A total of 211 core authors participated in 782 papers, accounting for 37.60% of all papers, which falls short of the half expected under Price's law. We therefore infer that author productivity in this field is highly uneven: a few core authors contribute a large share of the literature, whereas most authors contribute little, and a mature core-author group has not yet formed. This may be related to the rapid progress of AI and medical technology. Table 1 shows the top 10 authors; Steyerberg is the most prolific. His research covers the development of prediction models for breast cancer, prostate cancer, esophageal cancer, and brain injury, as well as the quality control, internal validation, performance improvement, and research strategies of medical prediction models.27–32 We also found that seven authors, including D'Andrea and Laukhtina, had identical numbers of publications and citations. We hypothesized that these seven authors had formed a relatively mature collaborative relationship, which was confirmed by the author collaboration network analysis.
The most productive authors in the application of artificial intelligence to medical risk prediction.
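Price's law is commonly applied with the core-author threshold m = 0.749√N_max, where N_max is the publication count of the most prolific author. The short sketch below checks the threshold and the core authors' share of papers. N_max is not stated in the text, so the value 11 is an assumption inferred from m ≈ 2.485 (0.749 × √11 ≈ 2.484); the 782 core-author papers and the 2080 total are taken from the text. The function names are illustrative.

```python
import math

def price_threshold(n_max):
    """Price's law core-author threshold m = 0.749 * sqrt(n_max),
    returned together with the integer publication cutoff."""
    m = 0.749 * math.sqrt(n_max)
    return m, math.ceil(m)

def core_share(core_papers, total_papers):
    """Percentage of all papers contributed by core authors
    (Price's law expects roughly 50%)."""
    return 100.0 * core_papers / total_papers

m, cutoff = price_threshold(11)   # N_max = 11 is an assumption inferred from m ≈ 2.485
share = core_share(782, 2080)     # counts taken from the text
print(f"m = {m:.3f}, cutoff = {cutoff} papers, core share = {share:.2f}%")
```

With these inputs the script prints m = 2.484, a cutoff of 3 papers, and a core share of 37.60%, which is below the roughly 50% that Price's law expects.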
Finally, we conducted a collaboration network analysis of authors who participated in at least three studies. Figure 3 illustrates the collaborative relationships among authors in the field, with different colors representing different author groups. Many authors did not belong to any collaborative network, which partly explains the large number of gray dots in the figure. Further analysis of the largest collaborative cluster (the red cluster) showed that the seven authors mentioned above formed a collaborative network, and their research mainly focused on the use of AI in urological disease risk prediction.

Author collaboration network in the field of the application of artificial intelligence in medical risk prediction.
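The thresholding and grouping step behind the author network can be sketched as follows, assuming each paper is given as a list of author names. The function name and the sample authors are hypothetical. The sketch keeps authors with at least three papers, links core authors who appear on the same paper, and returns connected groups, with unconnected authors as single-member groups (comparable to the isolated gray dots in Figure 3). VOSviewer assigns colors with its own clustering algorithm, which this sketch does not reproduce.

```python
from collections import Counter, defaultdict
from itertools import combinations

def core_collaboration_clusters(papers, min_papers=3):
    """Return connected groups of core authors, largest first.

    papers: list of author-name lists, one per paper.
    Core authors have at least `min_papers` papers; two core authors are
    linked if they co-authored at least one paper.
    """
    output = Counter(a for authors in papers for a in set(authors))
    core = {a for a, n in output.items() if n >= min_papers}

    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(core.intersection(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(core):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search over co-authorship links
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)

# Hypothetical records: A, B, C co-author three papers; D publishes alone or
# with non-core authors, so D forms a single-member group.
papers = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"],
          ["D", "X"], ["D"], ["D"], ["E", "F"]]
clusters = core_collaboration_clusters(papers)
print(clusters)
```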
Analysis of journal publications and citations
The research results included in this study were published in 956 journals, and Table 2 shows the top 10 journals in terms of publication volume. Scientific Reports published the most studies (
The distribution of the bibliographic records by top 10 (by quantity) journals.
Analysis of countries’ publication outputs and cooperation
We analyzed productivity in different countries to reveal patterns of publication in the field. A total of 101 countries contributed related research. Table 3 lists the top 10 countries in the number of publications. The United States has the largest number of publications (
Top 10 productive countries and citations per country.
Figure 4 shows the national collaboration network in this field. The United States and China have the most extensive collaborations: their circles are the largest and have the most connecting lines, indicating that these two countries play a leading role in the field, followed by the United Kingdom. The figure also contains many peripheral countries with relatively few publications and collaborations, indicating that the field has attracted less attention in these countries.

National collaborative network for the application of artificial intelligence in medical risk prediction.
Research trend analysis
Keyword frequency is an important indicator of research hotspots and can help researchers understand the hotspots and trends of AI in medical risk prediction. To observe research trends more accurately and clearly, we grouped the keywords into three categories (disease, AI technology, and function) and ranked them by frequency of occurrence, as shown in Table 4. The top five diseases of concern in this field are cancer, COVID-19, traumatic brain injury, stroke, and sepsis. The top five AI technologies are machine learning, deep learning, random forest, support vector machine, and neural networks. The top five functions are prediction, classification, diagnosis, management, and prevention.
Keyword ranking of artificial intelligence in the field of medical risk prediction.
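The category-wise ranking behind Table 4 can be sketched with `collections.Counter`. This is a minimal illustration under stated assumptions: the keyword-to-category mapping and the sample records are hypothetical, keywords are normalized by trimming and lowercasing, and each keyword is counted at most once per record (document frequency). The study's own normalization and counting rules are not described in the text.

```python
from collections import Counter

# Hypothetical keyword-to-category mapping (illustrative, not the study's own list).
CATEGORY = {
    "cancer": "disease", "covid-19": "disease", "stroke": "disease",
    "machine learning": "technology", "deep learning": "technology",
    "random forest": "technology",
    "prediction": "function", "classification": "function", "diagnosis": "function",
}

def rank_by_category(records, top_n=5):
    """Rank normalized keywords by the number of records they appear in,
    separately for each category; unmapped keywords are ignored."""
    counts = Counter()
    for keywords in records:
        # A set counts each keyword at most once per record.
        counts.update({k.strip().lower() for k in keywords})
    ranked = {}
    for keyword, n in counts.most_common():
        category = CATEGORY.get(keyword)
        if category is None:
            continue
        bucket = ranked.setdefault(category, [])
        if len(bucket) < top_n:
            bucket.append((keyword, n))
    return ranked

records = [
    ["Machine learning", "Cancer", "Prediction"],
    ["machine learning", "COVID-19", "prediction", "Deep learning"],
    ["Random forest", "Cancer", "Classification"],
    ["Deep learning", "cancer", "Prediction"],
]
ranked = rank_by_category(records)
for category, items in ranked.items():
    print(category, items)
```

Keywords with equal counts keep no guaranteed relative order here, so a real ranking would need an explicit tie-break rule.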
We also applied the spectral clustering algorithm in CiteSpace to cluster the keywords in this field. Figure 5 shows the five major keyword clusters in this field. The largest cluster, intravenous thrombolysis (#0), relates to brain diseases, as does traumatic brain injury (#1). Breast cancer risk prediction (#2) is the most frequently addressed cancer topic in the field, followed by lung adenocarcinoma (#3). Hepatic injury (#5) and predictive modeling of diabetes (#6) are currently among the most active research topics, and artificial neural networks (#7) are likely to become the most prominent technology in the field.

Keyword cluster analysis of artificial intelligence in medical risk prediction.
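The text does not describe how CiteSpace implements spectral clustering internally. As a generic illustration of the idea only, the sketch below assumes an unnormalized graph Laplacian built from a keyword co-occurrence matrix and a two-way split by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). The keywords and co-occurrence counts are hypothetical, and CiteSpace's multi-cluster partitioning and cluster labeling are not reproduced.

```python
import numpy as np

def spectral_bisect(adjacency):
    """Two-way spectral partition of a weighted, undirected keyword graph.

    Builds the unnormalized Laplacian L = D - A and splits nodes by the sign
    of the Fiedler vector. Assumes the graph is connected.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = vectors[:, 1]
    return (fiedler >= 0).astype(int)

keywords = ["breast cancer", "mammography", "risk score",
            "stroke", "thrombolysis", "brain injury"]
# Hypothetical co-occurrence counts: two dense groups joined by one weak link.
A = [[0, 5, 5, 0, 0, 0],
     [5, 0, 5, 1, 0, 0],
     [5, 5, 0, 0, 0, 0],
     [0, 1, 0, 0, 5, 5],
     [0, 0, 0, 5, 0, 5],
     [0, 0, 0, 5, 5, 0]]
labels = spectral_bisect(A)
for keyword, label in zip(keywords, labels):
    print(label, keyword)
```

The sign of an eigenvector is arbitrary, so which group receives label 0 can vary; only the grouping itself is meaningful.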
Keywords that emerge suddenly and are cited extensively within a short period are referred to as burst keywords. 21 They were identified using the default Kleinberg algorithm in CiteSpace (version 6.3.R1). Burst keywords are regarded as key indicators of research frontiers and signal emerging trends in the field. Figure 6 shows the top 24 keywords with the strongest citation bursts between 1986 and 2024. Prognostic models had the highest burst strength (20.27), followed by logistic regression (4.41). The thick red line marks the burst period of each keyword. […] was the keyword with the longest burst (20 years). Recent burst keywords included […] and “big data.”

Top 24 keywords with the strongest citation bursts.
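CiteSpace's burst-detection parameter settings are not reported in the text, so the sketch below is a simplified two-state, batched form of Kleinberg's burst-detection automaton with assumed values s = 2 (burst rate as a multiple of the baseline rate) and γ = 1 (penalty for entering the burst state). It labels each period as baseline or burst by a Viterbi search over per-period costs. It does not compute CiteSpace's burst-strength scores, and the yearly counts are hypothetical.

```python
import math

def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Simplified two-state batched Kleinberg burst detection.

    r[t]  -- number of records in period t that contain the keyword
    d[t]  -- total number of records in period t
    s     -- rate multiplier for the burst state (assumed value)
    gamma -- weight of the cost for entering the burst state (assumed value)
    Returns one state per period: 0 = baseline, 1 = burst.
    Requires the keyword to appear at least once (sum(r) > 0).
    """
    n = len(r)
    p0 = sum(r) / sum(d)          # baseline rate over the whole series
    p1 = min(s * p0, 0.9999)      # elevated rate in the burst state
    rates = (p0, p1)

    def cost(state, t):
        # Negative log-likelihood of r[t] hits out of d[t] records at this rate.
        p = rates[state]
        return -(math.log(math.comb(d[t], r[t]))
                 + r[t] * math.log(p)
                 + (d[t] - r[t]) * math.log(1 - p))

    enter_burst = gamma * math.log(n)  # moving up costs; moving down is free

    # Viterbi: best[j] is the minimal total cost of ending period t in state j.
    best = [cost(0, 0), cost(1, 0) + enter_burst]
    back = []
    for t in range(1, n):
        new_best, pointers = [], []
        for j in (0, 1):
            options = [best[i] + (enter_burst if j > i else 0.0) for i in (0, 1)]
            i_min = 0 if options[0] <= options[1] else 1
            pointers.append(i_min)
            new_best.append(options[i_min] + cost(j, t))
        best = new_best
        back.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    return states[::-1]

# Hypothetical yearly counts: 100 records per year, keyword spikes in years 4-6.
records = [100] * 10
hits = [2, 2, 2, 2, 20, 25, 22, 2, 2, 2]
print(kleinberg_bursts(hits, records))  # → [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Raising γ makes the automaton more reluctant to enter the burst state, so only longer or sharper surges are flagged.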
We regard 1990 to 2000 as the initial exploration period of this field, during which research hotspots mainly focused on basic prediction models and algorithms. The emergence of artificial neural networks marks the introduction of AI technology into medical risk prediction, while prognostic models and logistic regression were commonly used to construct risk prediction models. The period from 2000 to 2010 was one of technological development and application expansion, in which research began to focus on the performance evaluation and validation of models. Mortality and morbidity are important health indicators, and the development of prediction models for them became a research focus. The emergence of “risk assessment sheet” indicates that research had begun to extend into the broader field of disease risk prediction. Since 2010, the field has entered a period of deep integration and innovation. Research has begun to focus on the health problems of specific populations (such as women) and to explore the impact of medical interventions. The emergence of big data and biomarkers reflects the driving role of technological progress in medical risk prediction, and keywords such as outcome prediction and pollution indicate that research has begun to address more complex environmental health problems.
Discussion
Principal findings
Through a bibliometric analysis of 2080 publications, we systematically examined the application of AI in medical risk prediction, focusing on publication output, collaboration networks, research hotspots, and trends. The analysis covered publications, countries, authors, journals, and keywords, and led to the following main conclusions:
First, reviewing the timeline of publication volume and the emergence of keywords provides a comprehensive overview of the development of AI applications in medical risk prediction. It reveals a technology-driven revolution that has evolved from simple, algorithm-guided predictions to sophisticated decision support systems integrated into complex clinical scenarios. During the early period (1986–2004), research predominantly focused on basic prognostic models and logistic regression frameworks. 33 From 2005 to 2020, with the rise of artificial neural networks, deep learning, and machine learning, AI began to handle more complex medical data, including imaging data and genetic information.2,6,30 Today, AI's ability to integrate multimodal data enables not only the prediction of risk for individual diseases but also comprehensive support for complex clinical decision-making.7,13,34,35 Future development is likely to remain focused on precision medicine and personalized healthcare. Beyond these development milestones, we also sought to identify meaningful current and potential research hotspots in the field. The hotspot analysis showed that cancer is the disease of greatest concern in the field.
Second, the hotspot analysis showed that the diseases of interest in this field include cancer (especially breast cancer), COVID-19, and cerebrovascular diseases. Chronic diseases are the main focus, and their outcomes are often fatal.36,37 By 2023, chronic diseases were projected to cause 80% of deaths worldwide, imposing a severe global burden of disease. 38 For chronic diseases such as cancer, early prevention and early detection are key. 39 AI's capacity to integrate and learn from multimodal data is particularly beneficial for the prevention and management of chronic diseases with multiple risk factors, and all of the top five functions identified in our analysis apply to the prevention and management of chronic diseases. AI can support modern healthcare by providing intelligent analysis of medical data and by developing accurate and efficient treatment predictions. 40 In future studies, these functions will hopefully be applied to more disease interventions. From more accurate diagnosis and management of cancer to coordinated public health management, AI plays a key role, which suggests that the future of AI in medical risk prediction lies in precision medicine. One hallmark of precision medicine is the development of personalized diagnosis and treatment plans based on broader and more complete patient information. 41 From a public health perspective, more rational resource allocation and accurate epidemiological prediction are another aspect of precision medicine. This is supported by our analysis of burst keywords, and precision medicine is likely to be the next milestone in the development of AI in medical risk prediction.
Third, AI-based medical risk prediction is no longer limited to clinical settings and is increasingly integrated with broader aspects of human health, notably personalization and environmental health. The rise of personalized medicine means that the data generated by each patient, setting, and environment are unique. With AI's powerful data integration and learning capabilities, these data may ultimately help prevent human health problems. 42 The emergence of environmental health as a topic suggests that AI will help modern medicine examine human health problems from a broader perspective. In the future, accurate and personalized medical AI may support health across the entire life course, from birth to death. 7
This study also revealed several challenges for the future development of this field, especially in ethics, privacy, algorithm transparency, and standardization. Although deep learning models achieve excellent predictive accuracy, their “black box” nature makes it difficult for clinicians and patients to understand how AI reaches its decisions, so these models do not fully meet the evidence-based requirements of medicine. 43 Ethics and privacy protection have been pressing concerns since the advent of AI. 44 Analyzing and sharing data while protecting patient privacy is one of the key technical challenges. 45 Although AI is already widely applied, standardization in this field is lacking. Formulating unified standards, laws, and regulations that govern and recognize the role of AI in medical care and ensure its safety and reliability is crucial for the development of this field.
Limitations
We acknowledge several limitations of this study. First, because of the applicability of the three bibliometric tools and the difficulty of integrating data from different databases, only the SCI and SSCI in the WOS Core Collection were selected as data sources. Although WOS is among the most influential multidisciplinary abstract and citation databases in the world, limiting our search to a single database means we may have missed some important findings. 21 In addition, CiteSpace cannot compute every node and presents only representative and prominent nodes. Finally, the field is evolving rapidly and many studies have been published since our search date, so dynamic and timely re-evaluation is needed. In future work, we will expand the data sources and standardize keywords to improve the overall quality of the analysis and the accuracy of our predictions.
Conclusion
The evolution of AI in medical risk prediction reflects a transformation from technical exploration to clinical application, and from single-disease prediction to multimodal and complex environmental health prediction. With the continued development of machine learning, artificial neural networks, and personalized medicine, AI is no longer merely a tool but is gradually becoming an integral part of medical decision-making and management. However, achieving widespread use of AI in health care requires addressing multifaceted challenges, including interpretability, privacy protection, ethical issues, reliability, and standardization. In the future, AI is expected to play an increasingly important role in improving prediction accuracy, enhancing health management, and promoting personalized medicine.
