Abstract
Keywords
Injuries to the anterior cruciate ligament (ACL) are common in athletes, with more than 2 million occurring worldwide annually. 38 Surgical management of this ligament through ACL reconstruction (ACLR) aims to restore knee function, stability, and preinjury levels of activity. 33 The techniques for femoral and tibial tunnel placement and graft type selection vary among surgeons, 39 but a recent systematic review 16 found that single-bundle reconstruction with independent tunnel drilling seemed to be the current preferred technique in the United States. These evidence-based decisions are driven by the findings from randomized controlled trials (RCTs), which support the highest-level recommendations produced by the American Academy of Orthopaedic Surgeons (AAOS). 9 However, the statistical stability of these studies may be more fragile than previously thought.
The importance of data from comparative studies and RCTs is commonly conveyed via various test statistics and statistical thresholds. One common test statistic is the
To our knowledge, there have been no studies applying fragility analysis to comparative studies and RCTs regarding the different graft bundle options for ACLR. The purpose of this study was to determine the statistical stability of studies comparing single-bundle and double-bundle autografts in primary ACLR with independent tunnel drilling. The primary objective was to calculate the mean FI and mean FQ for dichotomous outcomes reported by these studies. The secondary aim was to perform subgroup analysis and calculate the proportion of outcome events for which the FI is less than the number of patients lost to follow-up. We hypothesized that the findings of these studies are vulnerable to a small number of outcome event reversals and that the number of outcome event reversals is often less than the number of patients lost to follow-up.
Methods
A systematic review was performed according to

Study identification flowchart. BTB, bone–patellar tendon–bone; FI, fragility index; FQ, fragility quotient; HT, hamstring tendon; RCT, randomized controlled trial.
Search Strategy
This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Relevant literature searches were performed via PubMed for select journals. The articles must have included comparative studies and RCTs pertaining to the utilization of single-bundle and double-bundle autografts published in select journals from 2005 to 2020. The select journals were chosen for their prominence within the field of orthopaedic surgery and sports medicine. The 10 orthopaedic journals included were the
Inclusion and Exclusion Criteria
Three independent authors (C.B.E., A.J.C., E.S.C.) screened each search result to determine if it met inclusion and exclusion criteria. Studies were included if (1) autografts with independent tunnel drilling techniques were implemented; (2) the patients underwent primary ACLR for chronic, subacute, or acute injuries; and (3) the study reported a 12-month minimum follow-up period. Studies were excluded if (1) the surgical technique was not explicitly stated, described, or referenced; (2) allografts or transtibial tunnel drilling techniques were implemented; (3) the patients underwent concomitant ligamentous repair or reconstruction at the time of ACLR, although partial meniscectomies and meniscus repairs were permitted; (4) the studies were on cadaveric, in vitro, or animal models; or (5) the studies utilized population databases, national registries, or cross-sectional data.
Risk-of-Bias Assessment and Methodology Scoring
Two authors (C.B.E., K.P.) independently evaluated each study. Risk of bias was assessed via the Cochrane Collaboration tool. Seven items were utilized to assess bias risk: random sequence generation (selection bias), allocation concealment (selection bias), blinding of participants and personnel (performance bias), blinding of outcome assessment (detection bias), complete outcome data (attrition bias), selective reporting (reporting bias), and other bias. Scoring was determined using the signaling questions and algorithm provided by Cochrane, with each category scored as having a low, high, or unclear risk of bias. Methodology scoring was conducted according to the COSMIN (Consensus-based Standards for the Selection of Health Measurement Instruments) checklist.
Data Analysis
For each dichotomous outcome reported in a study, the following information was recorded: the type of outcome being measured, the number of patients in each outcome group, the population size, and the number of patients lost to follow-up. It was also recorded whether the outcome was listed as primary or secondary and if it was reported as significant or insignificant by recording the
Through a trial-and-error method, outcome events were manipulated in a 2 × 2 contingency table until significance was reversed, as demonstrated in Figure 2. For example, if a particular outcome were initially reported as significant, the number of outcome event reversals required to raise the

Demonstration of fragility index = 1; a single-outcome event reversal resulting in altered statistical significance. BTB, bone–patellar tendon–bone; HT, hamstring tendon.
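The trial-and-error search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual procedure: it assumes a two-sided Fisher exact test, a significance threshold of 0.05, and event reversals applied within one treatment arm at a time, none of which is confirmed by the surviving text. The function names (`fisher_exact_two_sided`, `fragility_index`, `fragility_quotient`) are hypothetical, and the FQ is taken as the FI divided by the total sample size.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Fewest one-arm event reversals that flip significance in either direction.

    Assumption: "2-directional" means a significant result is pushed to
    nonsignificant and a nonsignificant result is pushed to significant.
    Returns None if no one-arm manipulation flips the result.
    """

    def significant(ea, eb):
        return fisher_exact_two_sided(ea, n_a - ea, eb, n_b - eb) < alpha

    baseline = significant(events_a, events_b)
    best = None
    # Four manipulations: add or remove events in arm A, then in arm B.
    for arm, step in (("a", 1), ("a", -1), ("b", 1), ("b", -1)):
        ea, eb, flips = events_a, events_b, 0
        while True:
            if arm == "a":
                ea += step
            else:
                eb += step
            if not (0 <= ea <= n_a and 0 <= eb <= n_b):
                break  # ran out of patients to reverse in this arm
            flips += 1
            if significant(ea, eb) != baseline:
                best = flips if best is None else min(best, flips)
                break
    return best


def fragility_quotient(fi, total_patients):
    """Fragility index divided by the total sample size (assumed definition)."""
    return fi / total_patients
```

With hypothetical counts of 0 of 10 events in one arm and 5 of 10 in the other, the result is significant (p ≈ 0.033), and `fragility_index(0, 10, 5, 10)` returns 1 because a single added event in the first arm raises the p-value above 0.05. The corresponding `fragility_quotient(1, 20)` is 0.05.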
Three subgroups were analyzed for significant differences via independent
The 48 dichotomous outcome measures ultimately included pivot-shift tests (n = 11), flexion/extension restrictions (n = 8), International Knee Documentation Committee (IKDC) ratings (n = 6), Lachman tests (n = 6), return to sport (n = 4), postoperative evidence of osteoarthritis (n = 4), rate of retear (n = 4), requirement for surgical reintervention (n = 2), the presence of anterior knee pain with kneeling (n = 2), and quality of bundle status at follow-up according to magnetic resonance imaging findings (n = 1). Nondichotomous data points were not included, as these cannot be analyzed with current fragility methodology.
Results
Of the 1794 studies screened, 709 met initial search criteria, with 15 comparative studies ultimately included for analysis, 13 of which were RCTs. The included studies are detailed in Table 1. Nearly all studies utilized hamstring autografts, except for one study 19 that used single-bundle quadriceps tendon autografts and another study 47 that used single-bundle bone–patellar tendon–bone autografts.
Studies Meeting Study Inclusion Criteria (N = 15)
A summary of the risk-of-bias assessment is shown in Figure 3, and methodology scoring is illustrated in Figure 4.

(A) Risk-of-bias assessment and (B) summary of risk-of-bias assessment according to the Cochrane Collaboration tool.

Summary of methodology scoring for the studies included by this systematic review. Following the layout of the COSMIN (Consensus-based Standards for the Selection of Health Measurement Instruments) checklist, the x-axis contains the categories within the checklist, and the y-axis depicts the number of included studies that fall in each category.
Overall and subgroup analyses of fragility are displayed in Table 2. Incorporating 48 total outcome events from all 15 studies, the overall mean FI was 3.14 (IQR, 2-4). The overall FQ was 0.050 (IQR, 0.032-0.062). Of these 48 outcome events, 35 (72.9%) events had an associated FI that was less than the number lost to follow-up.
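The summary statistics reported here can be illustrated with a short Python sketch. The per-outcome records below are hypothetical placeholders, not data from the included studies, and the helper name `summarize` is an assumption. The sketch also assumes that the overall FQ is the mean of the per-outcome FQs (FI divided by sample size), which the text does not state explicitly.

```python
from statistics import mean, median

# Hypothetical records: (fragility index, sample size, patients lost to follow-up).
outcomes = [(2, 60, 5), (3, 80, 2), (4, 70, 9), (1, 40, 0)]


def summarize(records):
    """Mean and median FI, mean FQ, and proportion of outcomes with FI < LTF."""
    fis = [fi for fi, _, _ in records]
    fqs = [fi / n for fi, n, _ in records]  # assumed: FQ = FI / sample size
    below_ltf = sum(1 for fi, _, ltf in records if fi < ltf)
    return {
        "mean_fi": mean(fis),
        "median_fi": median(fis),
        "mean_fq": mean(fqs),
        "prop_fi_below_ltf": below_ltf / len(records),
    }


summary = summarize(outcomes)
```

For the placeholder records, `summary["mean_fi"]` is 2.5 and `summary["prop_fi_below_ltf"]` is 0.5. Applied to the counts reported above, 35 of 48 outcomes gives `round(35 / 48 * 100, 1)`, or 72.9%.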
Overall Fragility Data and Analysis of Subgroups
All 48 outcome events were recorded as either primary or secondary and as significant (
Significant (n = 7) and insignificant (n = 41) outcomes were analyzed and found to have median FIs of 3 and 3, respectively. The mean FIs were 3.29 (IQR, 2-4) and 3.12 (IQR, 2-4), and the mean FQs were 0.047 (IQR, 0.029-0.069) and 0.051 (IQR, 0.033-0.060), respectively. There was no significant difference between mean FIs (
Primary (n = 37) and secondary (n = 11) outcomes were analyzed and found to have median FIs of 3 and 2, respectively. The mean FIs were 3.32 (IQR, 2-4) and 2.54 (IQR, 2-3), and the mean FQs were 0.053 (IQR, 0.038-0.065) and 0.043 (IQR, 0.028-0.052), respectively. There was no significant difference between mean FIs (
For the outcomes where FI ≤ LTF (n = 35), the median FI was found to be 3 and the mean FI to be 3.40 (IQR, 2-4). For the outcomes where FI > LTF (n = 13), the median FI was found to be 2 and the mean FI to be 2.46 (IQR, 2-3). The associated mean FQs were 0.054 (IQR, 0.041-0.067) and 0.041 (IQR, 0.029-0.060), respectively. There was a significant difference between mean FIs (
Discussion
For this systematic review, the overall FI was found to be 3.14 and the overall FQ to be 0.050, findings that are consistent with prior orthopaedic literature reporting an average median FI of 2.5 11,17,22,23,30,44 and a mean FQ of 0.031. 35,44 Our findings demonstrate that statistical significance may be altered by the reversal of fewer than 4 outcome events or the reversal of approximately 5% of outcome events. As hypothesized, the FI was less than the number of patients lost to follow-up in nearly three-quarters of outcomes (72.9%). We believe this is an important finding, as the composite results from comparative studies and RCTs are typically viewed as the best evidence available for influencing clinical practice and medical decision-making. Our results emphasize the need to revisit classical statistical reporting.
Although this study directly examines fragility in the setting of ACLR, 1 prior study 42 did so indirectly through analysis of the Scandinavian knee ligament registries. These authors examined the fragility of 13 studies with a median sample size of 5540, including large analyses of national databases. 14,36,37 The authors found the mean FI to be 178.5, with extensive variability (median, 116; range, 1-1089). 42 One possible explanation for the deviation of these values from prior fragility studies and from the results of our study is that Svantesson et al 42 reported an FI of zero in nearly one-third (30.4%) of the outcomes. An FI of zero indicates that no outcome reversals were necessary to make a result insignificant because it was already reported as statistically insignificant; this is considered to be a “1-directional” fragility analysis. The authors exclusively examined the number of events necessary to make a significant result insignificant. In our analysis, we performed “2-directional” fragility analysis, as reported by Parisien et al. 35 This allows us to examine not only the number of event reversals required to make significant outcomes insignificant but also the number of event reversals needed to convert insignificant findings to significant. More outcomes can be examined through this technique, and it may allow for greater generalizability of findings.
Double-bundle ACLR allows for restoration of the 2 functional bundles of the ACL, the anteromedial and posterolateral bundles. The anteromedial bundle controls anteroposterior stability, while the posterolateral bundle primarily controls rotational stability. The principle behind this technique is to re-create the native ACL anatomy and restore the proper tension pattern of each bundle. Although biomechanical studies 29,34 have shown the technique to be superior, many studies 7,10,25,27,29,43,46,49 have shown no significant difference with respect to subjective clinical outcomes. Despite the potential benefits of double-bundle ACLR, single-bundle anatomic ACLR remains the preferred technique for surgeons in the United States and globally. 16 This may be due to the technical demands of surgeries utilizing double-bundle grafts, 10 and the potential for increased difficulty of revision ACLR in these patients. 29 The inclusion of fragility analysis in future investigations may provide better clarity for graft bundle choice in ACLR.
This systematic review has several strengths. The first is the use of 2-directional fragility analysis, as discussed earlier. The study also examined both primary and secondary outcomes to make our findings more generalizable. In addition to common primary outcomes such as retear rates and physical examination tests, many studies report radiologic findings or physical examination tests as secondary outcomes; our methodology captures all of these outcomes, allowing fragility to be applied more broadly. A final strength is the methodology of the literature search according to PRISMA guidelines. This search included orthopaedic journals with a mean impact factor of 5.04, which is higher than those of recent similar systematic reviews conducted on sports medicine (3.2) 22 and spine literature (2.4). 11
However, this study is not without limitations. One potential limitation is that the number of studies that met inclusion criteria is small in comparison with prior fragility analyses of medical and orthopaedic literature. 17,23,30,44 However, the scope of this systematic review was narrower, and it was therefore more difficult to have a large number of articles meet our inclusion criteria. Another limitation is that FI and FQ can only evaluate categorical inputs with dichotomous outputs and therefore cannot be applied to continuous or ordinal variables. For example, rates of retear and physical examination maneuvers such as the Lachman or pivot-shift test results are easily encompassed, but pain measurements or functional outcome scores cannot be captured. Other outcomes that were captured included return to play within a year and the presence or absence of radiographic findings. Additionally, some dichotomous outcome measures are inherently flawed. For example, return to sport may not be the strongest outcome measure, as it is influenced by psychosocial factors as well as repair integrity. Third, this study did not track methods of graft fixation or other methods of bone tunneling. However, we believe the strict inclusion and exclusion criteria of our study are consistent with the current trends in surgical practices in the United States. Lastly, it is unknown if different levels of fragility exist between the high-impact journals included in our analysis and the lower-impact journals and open access literature that were not included.
Conclusion
Studies comparing single-bundle versus double-bundle ACLR may not be as statistically stable as previously thought and may warrant additional investigation. Comparative studies and RCTs are at substantial risk for statistical fragility with few event reversals required to alter significance. The reversal of fewer than 4 outcome events in a treatment group can alter the statistical significance of a given result; this is commonly less than the number of patients lost to follow-up. Future comparative study analyses might consider including FI and FQ with
