Abstract
Introduction
Digital mammography (DM) is currently the most used technique for breast cancer detection, and population-based mammography screening programs have been proven to reduce mortality among women while being cost-effective (1,2). However, mammography projects a three-dimensional (3D) object, the breast, onto a two-dimensional (2D) image. As a consequence, there is an inherent loss of sensitivity and specificity due to anatomical noise arising from tissue superposition. Digital breast tomosynthesis (DBT) can overcome the limitations of DM by providing a pseudo-3D image of the breast (3), and many prospective trials and retrospective studies have demonstrated the clinical benefit of introducing DBT for breast cancer detection (4–9). Therefore, DBT might be considered a potential candidate to replace DM for population-based screening (10,11).
DBT consists of the acquisition of several low-dose planar X-ray projections of the compressed breast over a limited angular range, which are then reconstructed into a pseudo-3D volume. This acquisition strategy has inherent challenges that deteriorate image quality (3). The limited-angle acquisition gives rise to out-of-plane artifacts and low vertical resolution (12–15), the low dose per projection increases the impact of noise, and X-ray scatter decreases contrast (16). The reconstruction algorithm is one of the main aspects of image creation that could ameliorate these technical drawbacks, and it can therefore greatly affect the final quality of DBT images.
Many different reconstruction approaches have been studied over time (17). Traditionally, the most widespread algorithm across DBT systems is filtered back projection (FBP), an analytical reconstruction method widely used in computed tomography (CT) and adapted for DBT (18,19). Fully iterative reconstruction algorithms are also in use (20–22). In order to make the most out of both approaches, FBP is recently being complemented with
One manufacturer has followed this approach in their DBT system (Mammomat Inspiration, Siemens Healthineers, Forchheim, Germany), recently updating the clinical standard reconstruction algorithm on their system from FBP to FBP with
In this work, we compare this new DBT reconstruction algorithm to the previous one using clinical patient images with two methodologies. First, to assess the benefits of the new algorithm in terms of image quality and lesion depiction, we perform a visual grading analysis (VGA) study (26) with human readers. Second, we assess whether the new DBT reconstruction algorithm provides images that also benefit automated computer detection systems. In particular, we trained and tested two equivalent deep-learning based 3D convolutional neural networks (3D-CNNs) for the task of detecting calcifications in DBT, one using FBP images and the other using EMPIRE images. Deep learning is an artificial intelligence technique (27) that has achieved performance similar or superior to that of humans in many complex medical imaging tasks (28). In mammograms, a small calcification may indicate the presence of cancer, either in situ or invasive, so its detection is important (29). However, the small size of calcifications (range = 0.050–3 mm) increases detection time, and deep-learning based computer systems could aid humans in this task (30).
Material and Methods
Reconstruction algorithms
The two reconstruction algorithms compared in this work are both clinical standard algorithms used by the Siemens Mammomat Inspiration DBT system: the FBP algorithm; and the new Enhanced Multiple Parameter Iterative Reconstruction (EMPIRE), introduced in 2016.
The FBP algorithm for DBT is described in detail in the work by Mertelmeier et al. (19). It basically back projects the DBT projections after application of different filters to account for the limited sampling of DBT in the vertical direction throughout the breast. The EMPIRE algorithm is based on FBP, but it includes additional processes aiming to achieve better artifact suppression, higher resolution, and less noise (23–25).
Patient data
DBT patient studies used in each experiment.
No cases with soft tissue lesions were included in the automated computer detection study.
All patients underwent an imaging protocol consisting of at least unilateral one-view DBT and digital mammography with a Siemens Mammomat Inspiration DBT system. All images were acquired in automatic exposure control mode. For a full DBT scan, the X-ray tube moves in an arc of 50° and acquires 25 projection images over an angular range of approximately 46°, during a total scan time of 20 s. The DBT system subsequently reconstructed the projection images into a pseudo-3D volume with focal planes parallel to the detector and 1 mm apart, using the standard FBP algorithm. For this study only, the same raw projection images were also reconstructed with the EMPIRE algorithm on an off-line workstation, well after the acquisition of each case.
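As a rough illustration of the acquisition geometry described above, the sketch below lists nominal projection angles for 25 projections spread over approximately 46°. Uniform angular spacing, symmetry around 0°, and the function name `projection_angles` are assumptions made for this example; the text does not state how the system actually spaces its projections.

```python
def projection_angles(n_projections=25, angular_range_deg=46.0):
    """Nominal projection angles in degrees.

    Assumes (for illustration only) that the projections are uniformly
    spaced and centered on 0 degrees.
    """
    step = angular_range_deg / (n_projections - 1)  # spacing between projections
    start = -angular_range_deg / 2.0
    return [start + i * step for i in range(n_projections)]


angles = projection_angles()
step_deg = angles[1] - angles[0]  # a little under 2 degrees under these assumptions
```

Under these assumptions, 25 projections over 46° correspond to 24 equal intervals, i.e. a spacing of 46/24 degrees.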
Visual grading analysis study population
For the VGA study, 100 unilateral mediolateral oblique (MLO) view DBT patient studies were consecutively selected out of the 374 described above to achieve the desired proportion of patient cases (Table 1): 40 biopsy-proven malignant cases; 30 biopsy-proven benign cases; and 30 normal cases. The latter were scored as BI-RADS 1 or 2 and had at least one year of negative follow-up. The ground truth location of the lesions was annotated under the supervision of an experienced radiologist (13 years of experience with mammography, three with DBT) with access to pathology and radiology reports.
Automated computer detection study population
For the computer detection study, all cases from our set of 374 that were abnormal due to calcifications and scored as BI-RADS 3, 4, or 5 were selected. Cases with calcifications were used since visibility of this type of lesion has been proposed to be the main advantage of EMPIRE over FBP (24). No cases with soft tissue lesions were included in this study. This yielded 60 DBT patient studies (Table 1). From these, 114 DBT volumes (either MLO, cranio-caudal [CC], or both views) were available. The locations of the calcifications were annotated individually for each reconstructed volume (independently in EMPIRE and FBP), under the supervision of the same experienced radiologist with access to pathology and radiology reports. A sample of 245 normal patient studies (bilateral, BI-RADS 1 or 2) was also selected for training of the computer detection algorithms.
Visual grading analysis study
An absolute VGA observer study (26) was performed to assess several aspects of image quality in both reconstruction algorithms. It was carried out by four readers specializing in breast imaging (one radiologist, one clinical PhD student, and two physicists specializing in mammography), who had a median of 12 years of experience in breast imaging (range = 3–21 years).
Two reading sessions separated by at least two weeks were performed to avoid possible bias in the results due to a direct comparison between reconstruction algorithms of the same patient. The two reconstructions (FBP and EMPIRE) of each patient were alternately and randomly split between the two reading sessions. In total, 50 FBP volumes and 50 EMPIRE volumes were scored during each session. Scoring was performed on a 5-point scale (1 = poor quality to 5 = excellent quality) on six aspects of normal anatomy (presence of noise and artifacts, visualization of skin line and Cooper’s ligaments, contrast, and overall image quality) and, when present, visibility and sharpness of both types of lesions (calcifications and soft tissue). The location of the lesions was outlined for the readers. The reading was performed on an in-house developed workstation (CIRRUS Observer, Diagnostic Image Analysis Group, Nijmegen, the Netherlands) (Fig. 1), using high-resolution mammographic monitors of at least 5 MP.
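The split of reconstructions between reading sessions can be sketched as follows. This is only one plausible way to obtain 50 FBP and 50 EMPIRE volumes per session with each patient seen once per session; the function name `assign_sessions`, the seed, and the shuffling of the reading order are assumptions of this example, not the procedure used in the study.

```python
import random


def assign_sessions(patient_ids, seed=0):
    """Split the two reconstructions of each patient across two sessions.

    A random half of the patients is read as FBP in session 1 and EMPIRE in
    session 2; the other half is read in the reverse order.
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    sessions = {1: [], 2: []}
    for i, pid in enumerate(ids):
        first, second = ("FBP", "EMPIRE") if i < half else ("EMPIRE", "FBP")
        sessions[1].append((pid, first))
        sessions[2].append((pid, second))
    # Randomize the reading order within each session.
    for cases in sessions.values():
        rng.shuffle(cases)
    return sessions


sessions = assign_sessions(range(100), seed=42)
```

With 100 patients, each session then contains 100 volumes, half of them FBP and half EMPIRE, and no reader sees both reconstructions of the same patient in one session.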
In-house developed workstation for the scoring of the visual grading analysis reader study. The readers answered ten questions on a 5-point scale (1 = poor quality to 5 = excellent quality) and the lesions were outlined. The workstation automatically registered the results and provided a summary report per reader after each session.
To account for repeated measures and multiple independent reader variability, the average results were analyzed with generalized estimating equations (GEE) models, using as outcome the scores of each of the questions. The two-way GEE models were built using the reconstruction algorithm and reader as main effects as well as their interaction term. An exchangeable working correlation matrix structure was chosen. Wald 95% confidence intervals (CI) were computed. Differences in the scores between reconstruction algorithms for each reader were tested with the Mann–Whitney U (Wilcoxon) non-parametric test. A two-tailed
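For readers unfamiliar with the per-reader comparison, the following is a minimal pure-Python sketch of a two-sided Mann–Whitney U test on ordinal scores. It uses midranks for ties and a normal approximation with tie correction (no continuity correction); these choices, the function name `mann_whitney_u`, and the example scores are assumptions for illustration, and the statistical software used in the study may differ in such details.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p value) via normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p value
    return u, p


# Hypothetical 5-point scores from one reader (not study data).
fbp_scores = [3, 3, 4, 2, 3, 4, 3, 2]
empire_scores = [4, 4, 5, 3, 4, 4, 5, 3]
u_stat, p_value = mann_whitney_u(empire_scores, fbp_scores)
```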
Computer automated detection study
Number of DBT patient studies, DBT image volumes, and extracted patches used for the training, validation, and testing of the 3D-CNNs.
Differences on a patch level between EMPIRE and FBP reconstruction algorithms are due to different individual calcification annotations between reconstructed volumes.
The 3D-CNN used in this study is an extension of the 2D deep-learning approach to detect calcifications in mammography developed by Mordang et al. (31). It was trained to discriminate between 3D DBT patches (size = 29 × 29 × 9 voxels) with and without suspicious calcifications. More details regarding the architecture and training strategy of the 3D-CNN can be found in Appendix 1.
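To make the patch dimensions concrete, the sketch below extracts a 29 × 29 × 9 patch around a candidate voxel. It assumes that 29 × 29 is the in-plane size and 9 is the number of slices, that the volume is indexed as `volume[z][y][x]`, and that voxels outside the volume are zero-filled; none of these details is given in the text, and `extract_patch` is a hypothetical helper. Nested lists are used only to keep the example dependency-free; an array library would normally be used.

```python
def extract_patch(volume, center, size=(9, 29, 29), fill=0.0):
    """Return a (slices, rows, cols) patch centered on `center` = (z, y, x).

    Voxels that fall outside the volume are replaced by `fill`
    (zero padding is an assumption of this sketch).
    """
    cz, cy, cx = center
    hz, hy, hx = (s // 2 for s in size)  # half-widths; sizes are odd
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    patch = []
    for z in range(cz - hz, cz + hz + 1):
        plane = []
        for y in range(cy - hy, cy + hy + 1):
            row = []
            for x in range(cx - hx, cx + hx + 1):
                inside = 0 <= z < nz and 0 <= y < ny and 0 <= x < nx
                row.append(volume[z][y][x] if inside else fill)
            plane.append(row)
        patch.append(plane)
    return patch


# Small synthetic volume (12 slices of 40 x 40 voxels), not patient data.
volume = [[[float(z * 10000 + y * 100 + x) for x in range(40)]
           for y in range(40)] for z in range(12)]
patch = extract_patch(volume, center=(6, 20, 20))
```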
The 3D-CNN with the parameters that achieved the best accuracy on the validation data during training was then used to compute the receiver operating characteristic (ROC) curve on the test dataset. The partial area under the ROC curve (pAUC) for a false-positive rate of 0–0.05 was computed. This range was defined empirically as the range in which the largest difference in pAUC between EMPIRE and FBP was found. The pAUC was compared between the 3D-CNN trained with FBP data (3D-CNN-FBP) and the 3D-CNN trained with EMPIRE data (3D-CNN-EMPIRE) after bootstrapping (n = 5000), via the Mann–Whitney U (Wilcoxon) non-parametric test. A two-tailed
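The pAUC and bootstrap steps can be sketched as below. This example assumes that the pAUC is normalized by the width of the FPR range (so that a perfect classifier scores 1), that the ROC curve is integrated with the trapezoidal rule, and that bootstrap resampling is stratified by class; the text does not specify these details, and the function names and example scores are hypothetical.

```python
import random


def roc_points(labels, scores):
    """ROC points (FPR, TPR) obtained by lowering the decision threshold."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:  # group tied scores
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(labels, scores, max_fpr=0.05):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the last segment by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


def bootstrap_pauc(labels, scores, n_boot=5000, max_fpr=0.05, seed=0):
    """pAUC values from class-stratified bootstrap resamples (an assumption)."""
    rng = random.Random(seed)
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    values = []
    for _ in range(n_boot):
        p = rng.choices(pos, k=len(pos))
        n = rng.choices(neg, k=len(neg))
        values.append(partial_auc([1] * len(p) + [0] * len(n), p + n, max_fpr))
    return values


# Hypothetical patch labels (1 = calcification) and network outputs.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.60, 0.40, 0.70, 0.30, 0.20, 0.15, 0.10, 0.05]
samples = sorted(bootstrap_pauc(labels, scores, n_boot=200))
ci_low, ci_high = samples[int(0.025 * 199)], samples[int(0.975 * 199)]
```

Two such bootstrap distributions, one per reconstruction algorithm, could then be compared with a non-parametric test as described above.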
Results
Visual grading analysis study
Average scores (1 = poor quality to 5 = excellent quality) of each of the parameters of the visual grading analysis (VGA) for each reconstruction algorithm, obtained with a generalized estimating equations (GEE) model, which accounts for the variability of repeated measures by multiple independent readers.
Within parentheses, 95% Wald CIs are shown.
A two-tailed
There was significant inter-reader variability in all the scores.
Cumulative percentages of the scores (1 = poor quality, 5 = excellent quality) across readers for the four most relevant aspects that were found on average better for EMPIRE compared to FBP. (a) Absence of artifacts, (b) Image contrast, (c) Visibility of calcifications, (d) Overall image quality.
Average scores per reader (1 = poor quality, 5 = excellent quality) for the four most relevant aspects that were found on average better for EMPIRE in comparison with FBP reconstruction. Differences between reconstruction algorithms for each reader were tested with the Mann–Whitney U (Wilcoxon) non-parametric test. (a) Absence of artifacts, (b) Image contrast, (c) Visibility of calcifications, (d) Overall image quality.
Example ROIs of two DBT cases containing malignant calcifications (outlined) reconstructed with EMPIRE (left) and standard FBP (right). Three observers scored calcification visibility higher for EMPIRE in case (a), while all four of them scored EMPIRE higher in case (b). These images are displayed with the default window width and level set by the DBT system.
Example ROIs of a DBT case containing a malignant soft tissue lesion (outlined) reconstructed with EMPIRE (left) and standard FBP (right). Three observers scored soft tissue visibility similar between EMPIRE and FBP (one reader scored EMPIRE higher than FBP). Also note how an artifact near the nipple (white circle), due to a calcification in another DBT plane, is visible in FBP but not in EMPIRE. These images are displayed with the default window width and level set by the DBT system.
Example of a patient DBT slice reconstructed with EMPIRE (left) and standard FBP (right). All four observers scored the artifacts on the FBP volume worse than on EMPIRE. It can be seen that for tissue near the skin line, EMPIRE provides a better visualization compared with FBP. Also, the large vein on the lateral side of the breast (under the star mark) shows more overshooting artifact (shadow-like artifact, 21) in FBP than in EMPIRE. These images are displayed with the default window width and level set by the DBT system.
Computer automated detection study
The ROC curves of the 3D-CNN for FBP and EMPIRE are shown in Fig. 7a. The 3D-CNN-EMPIRE showed similarly high performance to the one trained and tested with FBP (AUC-EMPIRE = 0.990 vs. AUC-FBP = 0.986). This is mainly influenced by the operating points at high false-positive rate (FPR, or 1 – specificity), which have a sensitivity almost equal to 1. However, at low FPRs, 3D-CNN-EMPIRE performed better than 3D-CNN-FBP. For instance, at FPR = 0.01, 3D-CNN-EMPIRE achieved a sensitivity of 0.958, while 3D-CNN-FBP achieved a sensitivity of 0.845. The partial ROC curve delimited in the range with FPR of 0–0.05 is shown in Fig. 7b. After bootstrapping, the partial AUC (pAUC) of EMPIRE was 0.880 (95% CI = 0.846–0.897), significantly better than that of FBP.
Complete (a) and partial (b) ROC curves of the same 3D-CNN trained and validated with EMPIRE images and trained and validated with FBP images, for the task of detecting suspicious calcifications in DBT slices.
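An operating point such as "sensitivity at FPR = 0.01" can be read off by choosing the decision threshold from the scores of the negative patches. The sketch below is one simple way to do this, assuming a patch is called positive when its score is strictly above the threshold; the function name `sensitivity_at_fpr` and the example scores are hypothetical and are not taken from the study.

```python
import math


def sensitivity_at_fpr(labels, scores, target_fpr=0.01):
    """Sensitivity at the highest operating point whose FPR does not exceed target_fpr."""
    neg = sorted((s for l, s in zip(labels, scores) if l == 0), reverse=True)
    pos = [s for l, s in zip(labels, scores) if l == 1]
    # Number of false positives allowed at the target FPR
    # (small epsilon guards against floating-point round-off).
    k = math.floor(target_fpr * len(neg) + 1e-9)
    # Scores strictly above the (k+1)-th highest negative score are called positive.
    threshold = neg[k] if k < len(neg) else float("-inf")
    detected = sum(1 for s in pos if s > threshold)
    return detected / len(pos)


# Hypothetical scores: 100 negative patches and 4 positive patches.
neg_scores = [i / 100 for i in range(100)]
pos_scores = [0.985, 0.995, 0.5, 1.0]
labels = [0] * 100 + [1] * 4
sens = sensitivity_at_fpr(labels, neg_scores + pos_scores, target_fpr=0.01)
```

In this toy example one false positive is allowed out of 100 negatives, so the threshold sits at the second-highest negative score.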
Discussion
The comparison of breast tomosynthesis reconstruction algorithms shows that the new EMPIRE reconstruction improves on the image quality of the standard FBP reconstruction on the Siemens Mammomat Inspiration DBT system. The VGA yielded on average better results for EMPIRE in some of the analyzed aspects of image quality. Also, the 3D-CNN using EMPIRE images achieved higher performance with a better ROC curve, especially in the range of high specificity, which is relevant for screening.
In general, performing additional iterative processes on the FBP reconstructed volumes appears useful in order to enhance the visualization of DBT images, heavily degraded due to the acquisition limitations of DBT. In particular, we have observed that image contrast can be enhanced and the presence of artifacts reduced. In addition, Cooper’s ligaments are slightly better visualized with EMPIRE. Cooper’s ligaments are fibrous connective tissue between the inner side of the skin and the pectoral muscles. Usually, changes in their structure yield a high predictive value for malignant mass lesions (32).
Furthermore, skin line visualization was similar between the two algorithms. Excellent skin line visualization and sharpness is one of the main reported benefits of FBP in comparison to fully iterative algorithms (17). This remains unchanged with EMPIRE. Assessment of possible breast skin thickening anomalies is important since it may be associated with malignancy (33).
As pointed out in preliminary studies (24), it has been confirmed in our study that the new EMPIRE algorithm significantly improves the visibility of calcifications in the DBT volumes for humans. In addition, we also showed a similar benefit for a deep-learning based computer detection system when it comes to classification of calcifications. The higher contrast of calcifications achieved by EMPIRE, combined with a similar visualization of soft tissue lesions, suggests that EMPIRE might improve the clinical performance of DBT for lesion detection in a clinical setting.
A topic of future work is to study the impact of the EMPIRE algorithm on tests designed for quality control of the reconstructed slices of breast tomosynthesis (13). Moreover, further expansion of the 3D-CNN for EMPIRE is still required, since here we used only a basic network, while similar techniques can also be applied to detect or classify groups of calcifications, as well as other types of lesions.
A limitation of this study is that an actual detection reader study was not performed to account for lesion visibility. In addition, some of the observers were not breast radiologists, but given the non-clinical task of evaluating image quality, we believe this is a minor limitation. Also, the medical physicist observers provided the fewest significantly different assessments between the two reconstruction algorithms in the VGA study. Therefore, any potential bias would be in favor of the FBP algorithm.
It should also be noted that, although images from both algorithms were objectively and independently annotated, the calcifications included for evaluation of the 3D-CNN were not the same for EMPIRE and FBP. We observed that more calcifications were annotated in EMPIRE, which might support the finding that calcification visibility for human observers is higher in EMPIRE. As a consequence, this might lead to a bias in favor of FBP, since many calcifications that counted as true positives for EMPIRE were likely labeled as true negatives in FBP, whereas they could have been considered false negatives.
In conclusion, the new EMPIRE reconstruction algorithm, in comparison with FBP, provides breast tomosynthesis volumes with better contrast and overall image quality, fewer artifacts, and improved visibility of calcifications according to the human observers, as well as improved detection capability in deep-learning systems. As a consequence, this new algorithm might enhance DBT clinical performance of radiologists and improve the accuracy of deep-learning based computer detection systems.
Footnotes
Acknowledgments
Declaration of Conflicting Interests
Funding
References
Supplementary Material
