Introduction
Chest radiographs (CXRs) are one of the most common imaging examinations requested in acute inpatient and outpatient settings. Pneumothorax is a potentially life-threatening condition requiring urgent clinical attention.1 A CXR is a common first-line imaging study used when a pneumothorax is suspected. As imaging examination volumes continue to rise, artificial intelligence (AI) can serve an important role in identifying high-acuity conditions, such as pneumothorax, and alerting the radiologist and referring clinician.2,3 Quicker identification may result in faster clinical intervention for the patient. Past studies have suggested that AI-assisted interpretation can improve performance and efficiency for identifying pneumothorax on CXR.4
Computer-aided image segmentation typically utilizes convolutional neural networks, a class of deep learning models within AI. A convolutional neural network is composed of multiple layers of connected, weighted nodes, inspired by neurons in the brain.5 During training, the network's weights are initially assigned at random, and its outputs are compared to the real “ground truth” label. Based on the difference, the weights are adjusted until satisfactory performance has been achieved.5 In computer-aided pneumothorax detection, a training set of both normal and abnormal chest radiographs is labeled by an expert (typically a radiologist or a physician in a different specialty). The labeled images are then used to train the convolutional neural network. Through this deep learning (DL) process, the network learns the relevant image features.6 After training and adjustment, the network is tested against a new set of chest radiographs to assess its diagnostic accuracy. The results of the test phase are often compared to the true interpretation of the image, as decided by an experienced radiologist or another physician.
Multiple studies have examined computer-aided detection of pneumothorax on chest radiographs; however, their results vary.2,7 Understanding the diagnostic value of this technology is an important step toward implementing it in clinical practice. We therefore performed a diagnostic test accuracy systematic review and meta-analysis to assess the performance of computer-aided pneumothorax detection, as trained by DL, on CXR.
Methods
A study protocol was created and registered a priori (PROSPERO CRD42023391375). Research Ethics Board approval was not sought as all data are available in the public domain. Contemporary guidelines and reporting criteria were followed based on the Cochrane handbook and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Diagnostic Test Accuracy (PRISMA DTA) statement.8,9 Literature resources were utilized for strategies.10
Search Strategy
A comprehensive literature search was performed to identify and select studies that assess the diagnostic performance of computer-aided detection of pneumothorax on adult CXRs. MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials (CENTRAL), and Scopus databases were searched for potentially relevant studies. The search was conducted by an experienced hospital librarian (RC) and included studies published up until January 10, 2023. The search strategy is provided in the Supplemental Materials.
The following inclusion criteria were used: (1) the patients in the study were adults (>18 years of age); (2) the study discussed computer-aided detection of pneumothorax on CXR; (3) the dataset was evaluated by a qualified physician (ie, radiologist, emergency department physician); (4) sufficient data were provided to construct a 2 × 2 contingency table. Studies that did not meet these criteria were excluded.
Study Selection
Literature results were exported to an Excel spreadsheet to facilitate the selection. Title and abstract review was conducted by a radiology fellow with 5 years of experience performing systematic reviews (NZ) to ensure papers met the inclusion criteria. Full-text screening was conducted by NZ and 2 medical students with 2 years of experience conducting systematic reviews (BK, NI). An initial pilot phase with 3 studies was performed to ensure consistency. Results were discussed and discrepancies were adjudicated by a radiologist with 25 years of experience (MP).
Data Extraction
Data extraction was done by 3 authors (BK, NI, NZ). An initial pilot phase with 5 studies was conducted and discussed to ensure consistency, with discrepancies adjudicated by MP. The following metrics were extracted: first author surname, journal, year of publication, study funding, study design, patient age, patient sex, total number of studies, source of studies (eg, publicly available database or independent dataset), number of patients with pneumothorax, how the reference standard was determined, cross-validation, and performance scores (ie, sensitivity, specificity, area under curve [AUC]). A true positive was a CXR on which both AI and the expert identified a pneumothorax. A false negative was a CXR with a pneumothorax, as identified by the expert, that was missed by AI. A true negative was a CXR on which both AI and the expert found no pneumothorax. A false positive was a CXR that AI deemed positive for pneumothorax but the expert deemed negative.
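The four definitions above map directly onto a 2 × 2 contingency table, from which sensitivity and specificity follow. The following is a minimal Python sketch of that mapping, not the extraction procedure used in this review; the names `ContingencyTable` and `tally` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ContingencyTable:
    """2 x 2 table of AI calls against the expert reference standard."""
    tp: int  # AI positive, expert positive
    fp: int  # AI positive, expert negative
    fn: int  # AI negative, expert positive
    tn: int  # AI negative, expert negative

    def sensitivity(self) -> float:
        # Proportion of expert-confirmed pneumothoraces that AI detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of expert-negative CXRs that AI also called negative.
        return self.tn / (self.tn + self.fp)


def tally(ai_positive: Iterable[bool], expert_positive: Iterable[bool]) -> ContingencyTable:
    """Count TP, FP, FN, and TN from paired per-CXR calls (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for ai, expert in zip(ai_positive, expert_positive):
        if ai and expert:
            tp += 1
        elif ai:
            fp += 1
        elif expert:
            fn += 1
        else:
            tn += 1
    return ContingencyTable(tp, fp, fn, tn)
```

A study reporting only sensitivity, specificity, and the number of positive and negative cases can be converted back into these four counts, which is why inclusion required sufficient data to construct the table.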
Quality Assessment and Risk of Bias
Risk of bias in individual studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool modified for the specific study question.11 The following domains were assessed for each included study: patient selection, index test, reference standard, and flow and timing. This was done by 3 authors (BK, NI, NZ), with discrepancies resolved via discussion with a fourth author (MP). Overall study risk of bias was considered “high” if the risk of bias in any domain was considered “high” or if the risk of bias was “unclear” in 2 or more domains. For patient selection, a study was considered low risk of bias if a random selection of patients was used for inclusion. For the index test, utilizing a separate image set for training/validation of the algorithm versus testing was classified as a low risk of bias. The reference standard was deemed low risk if the training/validation set was labeled by a qualified physician (eg, radiologist). A study was given an “unclear” or “high” rating if the data set source was not stated, or if the labels were not confirmed by a qualified physician. For flow and timing, the study was deemed low risk if all CXRs were included in the test set analysis.
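The overall rating rule described above can be written as a short decision function. This is an illustrative Python sketch only: the domain keys and the function name `overall_risk_of_bias` are hypothetical, and returning "low" whenever the "high" rule is not triggered is an assumption, since the text defines only the conditions for an overall "high" rating.

```python
from typing import Mapping

# The four QUADAS-2 domains named in the text (key spellings are hypothetical).
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")


def overall_risk_of_bias(ratings: Mapping[str, str]) -> str:
    """Combine per-domain ratings ("low", "unclear", "high") into an overall rating."""
    values = [ratings[domain] for domain in DOMAINS]
    # Overall "high" if any domain is "high" or 2 or more domains are "unclear".
    if "high" in values or values.count("unclear") >= 2:
        return "high"
    # Assumption: every other combination is treated as overall "low".
    return "low"
```

Under this rule a single "unclear" domain does not by itself raise the overall rating.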
Statistical Analysis
A bivariate random-effects model meta-analysis was performed to determine estimates of the mean for the sensitivity, specificity, positive and negative likelihood ratios of computer-aided detection of pneumothorax on chest radiographs with 95% confidence intervals.12 Coupled forest plots and hierarchical summary receiver operating characteristic (hsROC) curves were created using the estimated model parameters.
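For readers less familiar with these quantities, the sketch below shows how positive and negative likelihood ratios relate to sensitivity and specificity, along with the logit transform on which bivariate models are conventionally fitted. It is a Python illustration under stated assumptions and does not reproduce the model fitting performed in STATA and R; the function names are hypothetical, and likelihood ratios estimated directly from a fitted bivariate model may differ from ratios of the pooled point estimates.

```python
import math
from typing import Tuple


def logit(p: float) -> float:
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inverse_logit(x: float) -> float:
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)  # true positive rate / false positive rate
    lr_negative = (1.0 - sensitivity) / specificity  # false negative rate / true negative rate
    return lr_positive, lr_negative
```

A positive likelihood ratio well above 1 means a positive AI call substantially raises the odds of pneumothorax, while a negative likelihood ratio close to 0 means a negative call substantially lowers them.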
Sources for variability in accuracy were explored through meta-regression. A multivariate meta-regression model was utilized, and the following variables were assessed for their impact on sensitivity and specificity: study design factors (single vs multicentre), use of an institutional training/validation data set versus commercially available software, and risk of bias. As per contemporary guidance for diagnostic accuracy systematic reviews, publication bias was not assessed.13,14 Analysis was performed using the “midas,” “metandi,” and “metaprop” packages in STATA version 11.2 (Texas, United States), as well as the “mada” package in R version 3.5.1 (Auckland, New Zealand).12,15
Results
Search Results and Selection Characteristics
The literature search yielded 835 unique studies, and ultimately 23 met the inclusion criteria, with a total of 34 011 patients and 34 075 CXRs.16-38 A study flow diagram is shown in Figure 1. The studies were published between 2016 and 2023 from various institutions around the world. All studies were retrospective in design; 18 were multi-centre studies, while 5 were single-centre studies. The number of CXRs used to evaluate the algorithms ranged from 425 to 4574. Table 1 summarizes the study characteristics.

Figure 1. Diagram of search and selection process.
Table 1. Study Characteristics.
Quality Assessment
The risk of bias was assessed for each study using the QUADAS-2 criteria. A summary of the assessment is provided in Table 2. Twenty of the 23 studies were deemed to be at low risk of bias, while 3 studies were assessed to have a high risk of bias. The most frequent source of bias was “Reference Standard,” where the source of the data was unclear or not directly stated. In addition, if the patient selection method from a large database was not specified, the study received an “unclear” bias rating for “Patient Selection.”
Table 2. Risk of Bias Based on QUADAS-2 Tool for Each Included Study.
Meta Analysis and Regression
Forest plots of the pooled sensitivity and specificity estimates and the hsROC curve are displayed in Figures 2 and 3, respectively. The pooled sensitivity was 87% (95% confidence interval [CI], 81%, 92%) and the pooled specificity was 95% (95% CI, 92%, 97%). The area under the curve (AUC) of the hsROC was 97% (95% CI, 95%, 98%).

Figure 2. Forest plot of selected studies.

Figure 3. Hierarchical summary receiver operating characteristic curve for AI pneumothorax detection.
A comparative meta-regression to examine the impact of multiple covariates on the sensitivity and specificity was performed and is summarized in Table 3. There was no significant effect of study design, use of an institutional training/validation set versus commercially available software, or risk of bias on the sensitivity or specificity (
Table 3. Meta Regression Model Evaluating Impact of Several Covariates on Sensitivity and Specificity.
Discussion
This systematic review and meta-analysis assessed the performance of pneumothorax detection on CXRs by DL algorithms, with a total of 23 studies, 34 011 patients, and 34 075 CXRs included in the analysis. The pooled sensitivity and specificity were 87% (95% CI, 81%, 92%) and 95% (95% CI, 92%, 97%), respectively. The meta-regression models found that study design, use of an institutional training/validation set versus commercially available software, and risk of bias had no significant effect on the sensitivity and specificity of pneumothorax detection.
The ability to detect pneumothorax rapidly and reliably can have critical impacts on patient care. As imaging volumes increase, it is necessary to diagnose critical findings efficiently.39 Our findings demonstrate the strong diagnostic performance of DL algorithms for pneumothorax detection on CXR. One similar review found a pooled sensitivity of 84% and specificity of 96% for the performance of DL for pneumothorax detection across all imaging modalities, in close agreement with our results.40 Our study focused specifically on pneumothorax detection in CXRs (which are the first-line imaging technique for pneumothorax) and used meta-regression to further evaluate several factors that may have an impact on performance. DL algorithms have also been developed to examine a variety of other common CXR pathologies, such as pneumonia. In a recent systematic review and meta-analysis, DL detected pneumonia with 98% sensitivity and 94% specificity.41
The performance of DL algorithms is often compared to the capabilities of radiology trainees (ie, residents, fellows) and radiologists. In one study, junior residents generally performed better than DL algorithms in detecting pneumothorax, though the algorithm could interpret images 1000 times faster.38 In a similar study, DL generally performed at a level comparable to mid-level radiology residents in producing preliminary reports of CXR reads.42 Limitations of DL technology generally include a higher false positive rate and sensitivity of performance to factors such as pneumothorax size and the presence of tubes/lines.43,44
The consensus amongst a majority of the literature is that DL, and AI in general, serves to greatly augment the clinical duties of a radiologist. Several studies suggest that DL provides an additive benefit to interpreting radiologists, resulting in improved performance and faster interpretation times for those who utilized DL.45-48 Beyond the role of DL in conjunction with radiologists, there is a benefit to smaller community centres. For instance, AI may be of substantial clinical value in emergency departments that do not have 24/7 radiology coverage.31 Furthermore, it has been deployed to assist with image triaging, allowing a system to prioritize images that contain more urgent findings and reduce reporting delay.49,50 As pneumothorax is a common presentation in the emergency department, deployment of algorithms to prioritize and detect pneumothorax will likely expedite care for patients. Our study supports the performance of DL in pneumothorax detection and demonstrates the potential of these algorithms.
Our study has several limitations. The literature search was limited to several major databases, and gray literature was not included. Studies published in languages other than English were also excluded. The quality of the CXRs included in each study could not be evaluated. Meta-regression was used to evaluate the influence of several variables, though other potentially important covariates, such as the experience level of the interpreting radiologist, were not considered. Subgroup analyses by pneumothorax size or cause (eg, trauma) were not performed. The influence of a larger training set was not assessed. Finally, studies including pediatric imaging were excluded.
In conclusion, this systematic review and meta-analysis/regression evaluated the performance of pneumothorax detection on CXR by DL algorithms. As AI remains a field of ongoing development, the next generation of algorithms will hopefully continue to improve in performance. The promising results of this study provide an exciting foundation toward integrating DL technology into the emergency radiology workflow.
Supplemental Material
sj-docx-1-caj-10.1177_08465371231220885 – Supplemental material for Deep Learning for Pneumothorax Detection on Chest Radiograph: A Diagnostic Test Accuracy Systematic Review and Meta Analysis
Supplemental material, sj-docx-1-caj-10.1177_08465371231220885 for Deep Learning for Pneumothorax Detection on Chest Radiograph: A Diagnostic Test Accuracy Systematic Review and Meta Analysis by Benjamin D. Katzman, Mostafa Alabousi, Nabil Islam, Nanxi Zha and Michael N. Patlas in Canadian Association of Radiologists Journal

