Introduction
Chronic liver disease (CLD) and cirrhosis impose a high burden on global health. CLD is the 11th leading cause of death globally, accounting for 1.1 million deaths annually. 1
In previous decades, the major causes of cirrhosis were chronic hepatitis
The gold standard for the diagnosis of NAFLD, NASH, and cirrhosis is liver biopsy, which provides an assessment of hepatic steatosis, inflammation, and fibrosis. However, liver biopsy is relatively invasive and carries a risk of complications such as hemoperitoneum and hemothorax. 7 Because of its invasive nature, liver biopsy is also impractical as a follow-up tool. Alternative diagnostic methods for NAFLD, such as clinical/laboratory scores and imaging modalities, have been proposed, but their performance is limited. For example, the NAFLD Liver Fat Score has a sensitivity of 86% and a specificity of 71%, 8 whereas ultrasonography performs reasonably well for the diagnosis of moderate steatosis (>33% of hepatocytes contain steatosis) but is less reliable for mild steatosis (⩽33% steatosis). 9 Magnetic resonance imaging proton density fat fraction (MRI-PDFF) has greater accuracy but comes with a high cost and limited availability. 10 These limitations also extend to the detection of NASH and significant fibrosis among NAFLD patients. For example, the previously reported areas under the receiver operating characteristic curve (AUROCs) for the diagnosis of NASH among NAFLD patients were up to 0.82 for ultrasonography scores (e.g. ultrasonography fatty liver indicator and ultrasonography fatty score) and 0.82 for transient elastography (TE). 11 By comparison, the AUROCs for detecting significant fibrosis among NAFLD patients were 0.83 for TE, 0.88 for magnetic resonance elastography (MRE), and 0.64–0.75 for clinical scoring systems, for example, BARD score (0.64) and FIB-4 (0.75). 12 Artificial intelligence (AI) has begun to be incorporated into these clinical scoring systems and imaging modalities in order to improve diagnostic performance.
Over the past decade, AI has been used to identify and predict patterns or connections within large data sets in various fields of medicine, demonstrating particular usefulness in the diagnostic process. Previous systematic reviews of AI in hepatology reported on the use of machine learning for assessing liver fibrosis, predicting liver decompensation, screening eligible liver transplant recipients, and predicting post-transplant survival and complications.13,14 Another recent systematic review summarized the integration of AI in imaging modalities, digital pathology, and electronic health records for the diagnosis and staging of NAFLD. 15 That review emphasized the high accuracy of AI-based systems for NAFLD diagnosis and staging. However, very few meta-analyses have been conducted to summarize the overall diagnostic performance of AI-assisted diagnosis of liver diseases. 16 In this systematic review and meta-analysis, we aimed to determine the performance of AI-assisted systems for the diagnosis of NAFLD, NASH, and liver fibrosis.
Methods
The study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist. 17 The protocol was registered with PROSPERO (CRD42021230391).
Search strategy
The objective of the search was to identify studies utilizing AI in the diagnosis and classification of NAFLD, NASH, and liver fibrosis among NAFLD patients. A literature search was conducted on the MEDLINE, Scopus, Web of Science, and Google Scholar databases, covering January 2000 through September 2021. We excluded studies published prior to the year 2000 to avoid obsolete computer-based algorithms that are not consistent with the modern AI classification. The keywords for the search included: ‘artificial intelligence’, ‘computer-assisted’, ‘computer-aided’, ‘neural network’, ‘machine learning’, ‘deep learning’, ‘liver’, ‘hepatic’, ‘steatosis’, ‘fatty’, ‘NAFLD’, ‘NASH’, ‘steatohepatitis’, ‘fibrosis’, and ‘cirrhosis’. Due to the previously mentioned updated nomenclature, the search term ‘metabolic associated fatty liver disease’ or ‘MAFLD’ was also included. However, at the time of the literature search, no studies with MAFLD and AI were identified. The search strategies for all databases are presented in the Supplemental method.
Inclusion and exclusion criteria
We included articles using AI to assist in the diagnosis and grading of NAFLD. The inclusion criteria consisted of studies with sufficient data to generate a 2 × 2 table of true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The articles also had to specify the reference standard (diagnostic method) and class(es) of AI. The exclusion criteria were studies which did not report the desired outcomes or did not have sufficient data to complete the 2 × 2 table. We also excluded studies that did not clearly describe validation methods or characteristics of training and validation cohorts. Studies in languages other than English as well as reviews, editorials, conference proceedings, and abstracts with incomplete information on the study population or characteristics of source image data sets were also excluded.
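To make the 2 × 2 requirement concrete, the following minimal Python sketch shows how the accuracy measures reported in this review (sensitivity, specificity, PPV, NPV, and DOR) follow from the four cell counts. The function name and the example counts are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Derive accuracy measures from the cells of a 2 x 2 table.

    Hypothetical helper for illustration only; it assumes every cell
    count is greater than zero so that no denominator is zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased cases correctly identified
        "specificity": tn / (tn + fp),  # non-diseased cases correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "dor": (tp * tn) / (fp * fn),   # diagnostic odds ratio
    }


# Illustrative counts only (not study data).
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

A study that reports only some of these measures, without the counts needed to recover all four cells, could not contribute to the pooled estimates.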
Data extraction
Two authors (PD and TT) independently screened the abstracts and titles to select the studies for full-text review. After screening, data extraction and quality assessment were also independently performed and cross-checked by the two authors (PD and TT). Any disagreements were discussed and decided by the third author (RC). Extracted data included author’s last name, publication year, study location, study design (prospective or retrospective cohort), validation methods (
Quality assessment
The quality of the studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool, which comprises 12 questions assessing risk of bias and applicability in four domains (patient selection, appropriate index test, reference standard, and flow and timing). 18 As mentioned in our previous work, some questions were slightly modified to better assess the quality of AI-related studies. 16 For instance, in clinical diagnostic studies, the index test should be interpreted with an optimal pre-specified threshold in order to avoid overfitting. In AI-related research, separate validation or testing cohorts should be used to prevent overfitting. Therefore, we assessed whether the included studies provided clear validation methods. Other questions for the assessment of human-oriented bias were also modified, including whether knowing the reference standard results influenced the index test results. This was interpreted as a risk of bias caused by human manipulation in the AI protocol which could affect the AI output.
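As an illustration of what a separate testing cohort means in practice, the sketch below holds out a fraction of patients before any model fitting. It is a minimal example under stated assumptions: the function name, the 20% hold-out fraction, and splitting at the patient level are illustrative choices, not the validation scheme of any included study.

```python
import random


def split_holdout(patient_ids, test_fraction=0.2, seed=0):
    """Split patient identifiers into disjoint training and testing sets.

    Hypothetical helper: the split is made per patient (an assumption)
    so that data from one patient never appear in both sets.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]  # (training ids, testing ids)


train_ids, test_ids = split_holdout(range(10))
```

Performance reported on the held-out set is less likely to be inflated by overfitting than performance measured on the data used for training.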
Statistical analysis
We used Covidence (Veritas Health Innovation, Melbourne, Australia) for the screening, data extraction, and quality assessment process. After data extraction, TP, FP, TN, and FN values were exported from Covidence. If not available, the values were calculated from sensitivity, specificity, and prevalence using Review Manager version 5.3.5. 19
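The back-calculation of a 2 × 2 table from reported accuracy measures can be sketched as follows. This is not Review Manager's implementation; it is a minimal illustration that additionally assumes the total sample size is known, and the function name and example values are hypothetical.

```python
def counts_from_accuracy(sensitivity, specificity, prevalence, n_total):
    """Reconstruct approximate 2 x 2 cell counts from summary measures.

    Hypothetical helper. Counts are rounded to whole patients, so the
    result is an approximation (Python's round() rounds exact halves
    to the nearest even integer).
    """
    diseased = round(n_total * prevalence)
    healthy = n_total - diseased
    tp = round(sensitivity * diseased)
    tn = round(specificity * healthy)
    return {"tp": tp, "fp": healthy - tn, "tn": tn, "fn": diseased - tp}


# Illustrative values only (not study data).
table = counts_from_accuracy(0.90, 0.80, 0.50, 200)
```

Because the four cells are derived from the row totals, they always sum to the stated sample size.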
All statistical analysis was conducted using
Results
Literature search
The search process and results are shown in Figure 1. The literature search identified a total of 430 articles. After removing 173 duplicates, 257 abstracts were screened and 183 articles were excluded for the following reasons: conducted on animals (

Flow diagram of search methodology and literature selection process.
The studies in the systematic review were divided into 5 categories: (1) AI-assisted ultrasonography to diagnose NAFLD (
Characteristics of included studies in systematic review (13 studies included in meta-analysis are in bold).
ANN, artificial neural network; AODE, aggregating one-dependence estimators; BN, Bayesian network; CNN, convolutional neural network; ELM, extreme learning machine; F0-4, METAVIR fibrosis staging; FLD, fatty liver disease; HNB, hidden naïve Bayes; kNN, k-nearest neighbors; LFTs, liver function tests; LR, logistic regression; LSTM, long short-term memory; MLP, multilayer perceptron; MRI, magnetic resonance imaging; NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis; NB, naïve Bayes; RF, random forest; RT, regression tree; SGD, stochastic gradient descent; SVM, support vector machine; XGBoost, extreme gradient boosting.
Selected AI in the analysis.
Studies conducted on the same population cohorts.
Quality assessment by QUADAS-2 showed that most studies had a low risk of bias and no applicability concerns, except for four studies with an uncertain risk of bias, one study with a high risk of bias, and one study with a high risk of applicability concerns. Studies with an uncertain risk of bias were those referring to both alcoholic and non-alcoholic fatty liver disease (
Performance of AI-assisted ultrasonography for the diagnosis of NAFLD
The systematic review included six studies incorporating AI into ultrasonography for NAFLD diagnosis.22–27 Three studies relied on multiple AI classifiers22,24,27 and three studies utilized a single AI classifier (2 CNN23,26 and 1 RT 25 ). Liver biopsy was employed as the diagnostic method for NAFLD in four studies,22–24,27 whereas the other two studies chose MRI-PDFF.25,26 Two studies included 50% of patients with less than 30% steatosis.23,27 One study consisted of 92% of patients with less than 20% steatosis 25 and the other study had a mean steatosis of 11%. 26
Two pairs of studies (Kuppili
The pooled sensitivity, specificity, PPV, NPV, and DOR for the four studies were 0.97 (95% CI: 0.91–0.99), 0.98 (95% CI: 0.89–1.00), 0.98 (95% CI: 0.93–1.00), 0.95 (95% CI: 0.88–0.98), and 599.53 (95% CI: 96.73–3716.06), respectively (Figure 2(a)–(e)). Heterogeneity was relatively low with

Sensitivity (a), specificity (b), positive predictive value (c), negative predictive value (d), and diagnostic odds ratio (e) of AI-assisted ultrasonography for the diagnosis of NAFLD.

SROC curves demonstrating performance of AI-assisted diagnosis of NAFLD (AI-assisted ultrasonography and AI-assisted clinical data sets) and AI-assisted diagnosis of NASH.
We further performed meta-regression with the diagnostic method (liver biopsy
Subgroup analysis by AI classifiers revealed that neural network AI had slightly higher sensitivity, NPV, and DOR than non-neural network AI, with sensitivity of 0.98 (95% CI: 0.94–0.99)
Performance of AI-assisted clinical data sets for the diagnosis of NAFLD
We performed a meta-analysis of six studies incorporating AI into clinical data sets for NAFLD diagnosis.28–33 Examples of clinical data sets primarily included demographic data (age, sex, weight, and height) and laboratory values (liver and renal function tests, lipid profile, and plasma glucose). Multiple AI classifiers were used in four studies,28–30,33 while the other two studies used a single AI classifier (1 ANN 32 and 1 random forest 31 ). Five articles selected ultrasonography as the diagnostic method,28–30,32,33 while one study relied on MRI. 31
The pooled sensitivity, specificity, PPV, NPV, and DOR were 0.75 (95% CI: 0.66–0.82), 0.82 (95% CI: 0.74–0.88), 0.75 (95% CI: 0.60–0.86), 0.82 (95% CI: 0.74–0.87), and 13.29 (95% CI: 8.32–21.21), respectively (Figure 4(a)–(e)). Figure 3 shows the SROC with an AUC of 0.85. We observed a high degree of heterogeneity with

Sensitivity (a), specificity (b), positive predictive value (c), negative predictive value (d), and diagnostic odds ratio (e) of AI-assisted clinical data sets for the diagnosis of NAFLD.
Meta-regression performed with diagnostic method and AI classifier as covariates resulted in
Performance of AI-assisted diagnosis of NASH in patients at risk for NASH
We identified five studies focusing on the diagnosis of NASH among patients with NAFLD or at risk for NAFLD (i.e. obese and hypertensive).21,34–37 In this category, two studies integrated AI with imaging modalities21,34 and three studies incorporated AI with clinical data sets.35–37 Almost all studies selected liver biopsy as the diagnostic method, except for one study which used ultrasonography findings in combination with elevated liver enzymes. 36
The pooled sensitivity, specificity, PPV, NPV, and DOR for the diagnosis of NASH were 0.80 (95% CI: 0.75–0.85), 0.69 (95% CI: 0.53–0.82), 0.71 (95% CI: 0.36–0.91), 0.75 (95% CI: 0.35–0.94), and 8.27 (95% CI: 5.53–12.37), respectively. The heterogeneity was relatively high with
Performance of AI-assisted diagnosis of liver fibrosis in NAFLD
The systematic review included a total of five studies integrating AI for the diagnosis of liver fibrosis among NAFLD patients.21,38–41 However, a meta-analysis was not feasible due to differences in the diagnostic modalities and outcomes of the included studies. Three studies integrated AI with clinical data38,39,41 and one study incorporated AI with imaging biomarkers 21 to evaluate liver fibrosis in NAFLD patients. The other study investigated AI-assisted clinical data sets for evaluating both the diagnosis of NASH and fibrosis. 40 Two studies conducted by the same investigator group contained overlapping study populations.40,41 Regarding diagnostic methods, three studies relied on liver biopsy,21,38,41 one study used elastography, 39 and one study selected liver biopsy and ultrasonography as the diagnostic methods for the NAFLD group and control group, respectively. 40 Overall, the reported sensitivity and specificity varied by stage of fibrosis. For example, one study found that the performance for identifying METAVIR F1-F4 ranged from a sensitivity of 0.993 for F1 to 1.00 for F4 and a specificity of 0.757 for F1 to 1.00 for F4. 39
Performance of AI-assisted steatosis quantification in pathological specimen
Our systematic review identified four studies integrating AI with pathological imaging analysis for steatosis quantification and diagnosis of NAFLD.42–45 The outcomes differed across studies and included grading steatosis, differentiating macrosteatosis from other structures, identifying significant steatosis or macrosteatosis, and diagnosing NASH among NAFLD samples. Therefore, meta-analysis was not performed. All studies relied on pathologists' assessment as the reference standard. The diagnostic performance varied by study outcome. For example, the AI-assisted identification of macrosteatosis showed a sensitivity and specificity of 0.98 and 0.94, respectively, 42 while the sensitivity and specificity for diagnosing ⩾30% steatosis were 0.714 and 0.973, respectively. 44 The performance of an AI-assisted system for steatosis grading according to the NASH Clinical Research Network histological scoring system ranged from a sensitivity of 0.99 for grade 1 to 0.67 for grade 3 and a specificity of 1.00 for grade 1 to 0.98 for grade 3 steatosis. 43 Furthermore, the AI-assisted pathological identification of NASH among NAFLD had a sensitivity and specificity of 0.879–0.909 and 0.909–1.00, respectively. 45
Publication bias
In the Deeks funnel plot, the slope coefficients were relatively symmetrical with a
Discussion
This systematic review and meta-analysis has identified many types of AI-assisted methods to diagnose NAFLD, NASH, and fibrosis among NAFLD patients and to quantify liver steatosis in pathological specimens. Meta-analysis results showed excellent performance of AI-assisted ultrasonography for the diagnosis of NAFLD, with an AUC of 0.98 and relatively low heterogeneity. Combining AI with clinical data sets also demonstrated an acceptable performance level for the diagnosis of NAFLD, with an AUC of 0.85, but with a higher degree of heterogeneity, which was likely due to variations in clinical input data.
Integrating AI into ultrasonography can improve the performance of NAFLD diagnosis. Ultrasonography is widely available in most hospitals and healthcare facilities. The equipment is also relatively inexpensive and the procedure is non-invasive. However, since the image analysis is user-dependent, it is subject to inter- and intra-observer variations. The performance of conventional ultrasonography is often less reliable for the diagnosis of early-stage NAFLD. Therefore, incorporating AI into ultrasonography image analysis can both minimize human-related errors and improve overall performance. Our meta-analysis found that three of the four studies had enrolled patients with mild steatosis (50–92% of patient cohorts had less than 30% steatosis), emphasizing the ability of AI-integrated methods to identify early-stage steatosis. The meta-analysis results show promising performance of AI-assisted ultrasonography, with sensitivity, specificity, PPV, and NPV all at 0.95 or above and high accuracy with an AUC of 0.98. Heterogeneity assessment for AI-assisted ultrasonography was also relatively low with
Comparisons between the performance of AI-assisted systems in this meta-analysis and the performance of conventional methods reported in previous studies for the diagnosis of NAFLD.
DGE-MRI, dual-gradient echo magnetic resonance imaging.
AI has also been employed to analyze large clinical data sets with various inputs, such as demographic data, physical findings, and laboratory results. The performance of AI in this category is promising but less satisfactory with findings showing only moderate accuracy compared to AI-assisted ultrasonography (AUC: 0.85
Other applications of AI in NAFLD are the identification of NASH and fibrosis, which could offer considerable clinical benefits because the degree of hepatic inflammation or fibrosis is associated with liver-related mortality. 4 Regarding the AI-assisted diagnosis of NASH, our meta-analysis showed an acceptable sensitivity of 80% and an AUC of 0.8 but with relatively high heterogeneity. We hypothesized that the different diagnostic methods and different populations might in part contribute to the high heterogeneity. Due to the limited number of studies included in the meta-analysis (
The last application of AI in NAFLD is to quantify liver steatosis in pathological specimens. Previous studies have shown that conventional assessment of pathological specimens is susceptible to inter- and intra-observer variations and is also time-consuming.49,50 AI-supported analysis has been shown to provide reliable results with acceptable performance levels, including a sensitivity and specificity of 0.71 and 0.97 for the diagnosis of more than 30% steatosis 44 as well as 0.67–0.99 and 0.85–1.00 for steatosis grading. 43
This manuscript represents one of the very first meta-analyses focusing on the application of AI in the diagnosis of NAFLD. We conducted a comprehensive literature search, including articles from medical, computer science, and engineering journals. Our selection criteria also included only articles with clear validation methods, which is crucial for evaluating the performance of AI technology. We recognize that some limitations remain in this study. No AI algorithms were completely identical among the included articles. Since AI inputs were slightly different among the studies despite being classified as similar, the pooled diagnostic performance must be interpreted with caution. More studies in each subgroup are required for comprehensive subgroup analysis. Another limitation is the difference in the diagnostic method among the included studies. The gold standard for the diagnosis of NAFLD and steatosis quantification is liver biopsy, with MRI-PDFF as the best alternative. However, some studies integrating AI with clinical data sets instead relied on ultrasonography, which may affect performance results. In order to accurately evaluate the performance of an AI-assisted diagnostic system, liver biopsy or MRI-PDFF should be employed as the diagnostic method for NAFLD. Finally, prospective or randomized controlled studies comparing AI-supported analysis with conventional methods would be beneficial in assessing the potential utility of AI in clinical practice.
Conclusion
AI-assisted ultrasonography and clinical data sets delivered satisfactory performance as diagnostic tools for NAFLD. AI-assisted systems used in the identification of fibrosis and NASH, as well as the quantification of steatosis in pathological specimens, also yielded promising results despite the limited number of studies available for review. Randomized controlled studies or prospective studies are warranted to validate the benefit of AI use in clinical settings.
Supplemental Material
sj-docx-1-tag-10.1177_17562848211062807 – Supplemental material for Application of artificial intelligence in non-alcoholic fatty liver disease and liver fibrosis: a systematic review and meta-analysis
Supplemental material, sj-docx-1-tag-10.1177_17562848211062807 for Application of artificial intelligence in non-alcoholic fatty liver disease and liver fibrosis: a systematic review and meta-analysis by Pakanat Decharatanachart, Roongruedee Chaiteerakij, Thodsawit Tiyarattanachai and Sombat Treeprasertsuk in Therapeutic Advances in Gastroenterology