Abstract
Objectives
Network meta-analysis is a popular tool for simultaneously comparing multiple treatments and improving treatment effect estimates. However, no widely accepted guidelines are available for classifying the treatment nodes in a network meta-analysis, and the node-making process is often insufficiently reported. We aim to empirically examine the impact of different treatment classifications on network meta-analysis results.
Methods
We collected nine published network meta-analyses with various disease outcomes; each contained some similar treatments that may be lumped. The Bayesian random-effects model was applied to these network meta-analyses before and after lumping the similar treatments. We estimated the odds ratios and their 95% credible intervals in the original and lumped network meta-analyses. We used the adjusted deviance information criterion to assess the model performance in the lumped network meta-analyses, and used the ratios of credible interval lengths and ratios of odds ratios to quantitatively evaluate the estimates’ changes due to lumping. In addition, the unrelated mean effect model was applied to examine the extent of evidence inconsistency.
Results
The estimated odds ratios of many treatment comparisons changed noticeably after lumping, and the precision of many estimates improved substantially. The deviance information criterion values decreased after lumping similar treatments in seven (78%) network meta-analyses, indicating better model performance. Substantial evidence inconsistency was detected in only one network meta-analysis.
Conclusions
Different ways of classifying treatment nodes may substantially affect network meta-analysis results. Including many insufficiently compared treatments and analysing them as separate nodes may not yield more precise estimates. Researchers should report the node-making process in detail and investigate the results’ robustness to different ways of classifying treatments.
Introduction
In many scientific fields, multiple related studies are often available to provide evidence on a common topic. As an effort to synthesize the evidence from different sources and identify potential differences between them, systematic reviews and meta-analyses have become increasingly popular for producing reliable and precise evidence. They are especially useful for assessing treatment effects in comparative effectiveness research for decision makers. Methods for meta-analysis have developed rapidly over the past forty years. 1 In this era of big data, researchers continue to be enthusiastic about gathering evidence from all possible sources and combining it in a more informative form. This motivates the idea of network meta-analysis (NMA), also known as mixed treatment comparison, which compares multiple treatments simultaneously. 2–6 Compared with a traditional meta-analysis of one pair of treatments at a time, NMA has the attractive advantage of allowing comparisons between all available treatments for a certain disease outcome, even if no head-to-head studies compare some of the treatments. By combining both direct evidence (from head-to-head studies) and indirect evidence (from studies with common comparators, say placebo), an NMA likely yields treatment effect estimates with higher precision than traditional pairwise meta-analysis. Its use in health-related research has increased remarkably since 2009. 7
Despite these benefits, researchers take several risks, including heterogeneity between studies, when performing NMAs. 8 Even in traditional pairwise meta-analysis, the extent of heterogeneity is a critical factor that determines whether the collected studies may be properly combined. 9 Because an NMA pools not only multiple studies but also multiple treatments, inconsistent definitions of treatments between studies may introduce an additional source of heterogeneity. Because meta-analysis depends on a comprehensive search for all available studies on the targeted research topic, both to avoid various types of bias 10 , 11 and to avoid wasting information, 12 it is common to collect studies with similar but not identical treatments. Such treatments may be analysed jointly as a single treatment node in some NMAs, while they may serve as separate nodes in others. For example, in an NMA of tocolytic therapy for preterm delivery, Haas et al. 13 grouped usual or standard care without a tocolytic drug together with placebo. They also classified ritodrine, terbutaline, nylidrin, salbutamol, fenoterol, hexoprenaline and isoxsuprine into a group of beta mimetics, and many other active treatments into other groups. In total, the collected studies originally reported 25 distinct treatments, but the classifications led to eight groups; each group was considered a single treatment node in the NMA. In another NMA that compared antihypertensive drugs’ effects on cancer risk, by Bangalore et al., 14 however, placebo and non-placebo controls were treated as two distinct treatment nodes. Lack of consensus on defining and classifying treatments may lead to overlapping NMAs and cause serious confusion about final conclusions. 15
In the current literature, such node-making processes are generally insufficiently reported and lack widely recognized guidelines, not only for pharmacological treatments but also for non-pharmacological ones. 16 , 17 This problem is often referred to as the dilemma between lumping and splitting.
The extreme case of lumping is that all active treatments are classified as one group and all remaining non-active treatments as another group; the NMA is then reduced to a pairwise meta-analysis. This model contains the minimal number of parameters, so its complexity is reduced to the minimum; however, it likely fits the data poorly, and the results may be seriously biased if the treatments lumped in the same group actually differ substantially. On the other hand, the extreme case of splitting is that treatments with any difference in definition are classified as separate nodes in the NMA. Although this may fit the data well, it complicates the NMA model and likely leads to large variances of the effect estimates. With a large number of parameters in the model, the estimation procedures, such as restricted maximum likelihood for frequentist methods and the Markov chain Monte Carlo (MCMC) algorithm for Bayesian methods, may even fail to converge. 18 From the statistical perspective, the dilemma between lumping and splitting treatments is essentially the tradeoff between goodness-of-fit and model complexity in model selection. Various statistical criteria are available to deal with this problem. 19 , 20
This article reanalyses nine NMA datasets, each containing some similar treatments that may be lumped. We examine the effects of different classifications of treatments on their effect estimates. Also, we propose a criterion for assessing the appropriateness of lumping treatments.
Methods
Data sources
We extracted nine NMAs with binary outcomes that contained similar treatments from a total of 58 NMAs investigated by Trinquart et al., 21 which originated from the datasets collected by Veroniki et al. 22 and Bafeta et al. 23 , 24 These NMAs compared treatments for various important disease conditions, including atrial fibrillation, plantar fasciitis, severe erosive oesophagitis and stroke. We denoted each NMA by its first author’s surname and the publication year. All nine NMAs contained similar treatments that may be classified into common groups. For example, these similar treatments included pharmacological ones with different dose levels, intake frequencies or intake methods, and non-pharmacological interventions with different intensity levels.
No ethical approval or patient consent was required for our study, because this article focused on statistical methods for NMAs; all analyses were performed based on published data in the literature.
Lumping treatments
We considered lumping all similar treatments in each treatment class as one node in each NMA. The lumped treatments included: (1) drugs with different dose levels, such as milnacipran 100 and 200 mg/day in the NMA by Roskell et al.; 25 (2) drugs with different intake frequencies, such as calcipotriol b.i.d. (twice a day) and o.d. (once daily) in the NMA by van de Kerkhof et al.; 26 (3) drugs with different intake methods, such as intravenous and oral amiodarone in the NMA by Bash et al.; 27 and (4) non-pharmacological interventions with different intensity levels, such as low-, medium- and high-intensity focused shock wave therapies (with energy flux density ≤0.08 mJ/mm2, 0.08–0.28 mJ/mm2 and ≥0.28 mJ/mm2, respectively) in the NMA by Chang et al. 28
Most studies in the NMAs were two-armed, while the remaining studies were multi-armed (comparing more than two treatments). Some studies’ designs (i.e. treatments compared within studies) changed after lumping similar treatments. If a two-arm study contained two similar treatments to be lumped (e.g. one study in the NMA by Edwards et al. 29 comparing omeprazole 20 and 40 mg), it became single-armed after lumping and thus was removed from the lumped NMA, because such a single-arm study cannot be used in the conventional contrast-based NMA model. 30
Each multi-arm study included at most one group of similar treatments. Each lumped group in all NMAs except that by Chang et al. 28 contained two similar treatments. The lumping in the NMA by Chang et al. 28 involved three similar treatments, while each study in this NMA contained at most two lumped treatments. If a multi-arm study (say, one comparing similar treatments A1 and A2 with another treatment B) contained treatments to be lumped, its similar arms were merged, so it reduced from multi-armed to two-armed.
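As a concrete illustration of this bookkeeping, the following sketch (illustrative Python with hypothetical treatment labels, not part of the original analyses) relabels each study's arms according to a lumping map, merges arms that become identical, and drops studies left with a single arm:

```python
def lump_studies(studies, lump_map):
    """Relabel each study's arms via lump_map, merge arms that become
    identical after lumping, and drop studies left with fewer than two
    arms (single-arm studies cannot enter a contrast-based NMA)."""
    lumped = []
    for arms in studies:
        new_arms = []
        for a in arms:
            g = lump_map.get(a, a)  # unmapped treatments keep their own label
            if g not in new_arms:
                new_arms.append(g)
        if len(new_arms) >= 2:      # remove studies that became single-armed
            lumped.append(new_arms)
    return lumped
```

For example, with `lump_map = {"A1": "A", "A2": "A"}`, a two-arm study comparing A1 and A2 is removed entirely, while a three-arm study comparing A1, A2 and B reduces to a two-arm study of A vs B.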
Statistical analyses
Implementation
We used the Bayesian random-effects model to estimate (log) odds ratios with 95% credible intervals (CrIs) for all treatment comparisons and to account for the heterogeneity between studies in all nine NMAs before and after lumping similar treatments. 3 , 33 The correlation coefficients between treatment comparisons within multi-arm studies were assumed to be 0.5. 3 All treatment comparisons within each NMA were assumed to have a common heterogeneity standard deviation, with a uniform prior bounded between 0 and 10. The Bayesian NMAs were implemented with the R package ‘rjags’ 34 via the MCMC algorithm. We used three Markov chains, each with 200,000 iterations after a 50,000-run burn-in period and a thinning rate of 2 to reduce sample autocorrelations. We checked the chains’ trace plots to assess the MCMC algorithm’s convergence.
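As a minimal sketch of the posterior summaries reported here (illustrative Python rather than the authors' rjags code), the pooled MCMC draws of a log odds ratio can be converted into a posterior median odds ratio and an equal-tailed 95% CrI, whose length on the log scale is the quantity used in later sections:

```python
import numpy as np

def summarize_log_or(draws, level=0.95):
    """Summarize MCMC draws of a log odds ratio: posterior median OR,
    an equal-tailed credible interval on the OR scale, and the CrI
    length on the log scale."""
    draws = np.asarray(draws, dtype=float)
    alpha = (1.0 - level) / 2.0
    lo, med, hi = np.quantile(draws, [alpha, 0.5, 1.0 - alpha])
    return {
        "or": np.exp(med),                 # posterior median odds ratio
        "cri": (np.exp(lo), np.exp(hi)),   # equal-tailed CrI on OR scale
        "cri_length_log": hi - lo,         # CrI length on the log-OR scale
    }
```

In practice the draws would come from the pooled, thinned Markov chains; the summary itself is independent of how the sampler was configured.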
The deviance information criterion and the adjusted deviance information criterion for lumping
We used the deviance information criterion (DIC) to assess the performance of the nine NMAs before and after lumping similar treatments. 20
The DIC is defined as DIC = D̄ + pD, where D̄ is the posterior mean of the deviance, which measures a model’s goodness-of-fit, and pD = D̄ − D(θ̄) is the penalty term, i.e. the effective number of parameters, which measures the model’s complexity; here, D(θ̄) denotes the deviance evaluated at the posterior means of the parameters. A smaller DIC indicates better model performance.
Of note, the number of treatment groups within studies may change due to treatment lumping; some studies were removed as they became single-armed, and some reduced from multi-armed to two-armed. Therefore, the deviance term and thus the DIC may need to be adjusted for fairly comparing NMAs before and after lumping treatments. Recall that the NMA by Chang et al. 28 had three similar treatments to be lumped, while the others had only two similar treatments in each lumped treatment group. In the NMA by Chang et al., 28 only one two-arm study contained two similar treatments, and it became single-armed after lumping; no studies contained all three similar treatments. Therefore, we focused on scenarios of deviance adjustments for lumping two similar treatments within studies.
Specifically, considering a
No studies in the nine NMAs belonged to any other scenarios not specified above. Each NMA had diverse scenarios of deviance adjustments; suppose it contained
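The DIC bookkeeping described above can be sketched as follows (illustrative Python, not the authors' code). The adjustment of adding one deviance unit per treatment group removed by lumping is an assumption inferred from the worked example given later for the NMA by Bash et al. (six removed groups, adjustment of +6):

```python
import numpy as np

def dic(deviance_draws, deviance_at_post_mean):
    """DIC = posterior mean deviance + penalty term p_D, where
    p_D = (mean deviance) - (deviance at the posterior means)."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_post_mean
    return d_bar + p_d

def adjusted_dic(dic_lumped, removed_groups):
    """Adjust the lumped NMA's DIC for a fair comparison with the
    original DIC, assuming one deviance unit per treatment group
    removed by lumping (as in the Bash et al. example: +6)."""
    return dic_lumped + removed_groups
```

A model with a smaller (adjusted) DIC is preferred, so comparing `adjusted_dic(...)` of the lumped NMA with the original DIC indicates whether lumping improved model performance.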
Comparing the changes of odds ratios due to lumping
We investigated density plots of the estimated odds ratios of similar treatments (say A1 and A2) vs other non-lumped treatments (say B) in the original NMAs before lumping. If the estimates of A1 vs B and A2 vs B dramatically differed for many non-lumped treatments B, then lumping A1 and A2 might not be appropriate. Also, we compared the point estimates of the odds ratios and their 95% CrIs in the original and lumped NMAs. To quantitatively evaluate the effects of lumping similar treatments, we calculated the 95% CrI’s length of the log odds ratio for each treatment comparison. The CrI length’s change due to lumping similar treatments indicated the impact of lumping on the estimate’s precision. Specifically, we used the ratio of the CrI lengths, RCL = (CrI length in the lumped NMA)/(CrI length in the original NMA), so that RCL < 1 indicated improved precision after lumping. Analogously, we used the ratio of the odds ratios, ROR, to quantify the change of each comparison’s point estimate; both ratios can be interpreted as fold changes.
The cutoffs for interpreting the extent of a fold change are often defined case by case. 36–38 A fold change of one indicated no change after lumping. In this article, a fold change larger than 1 was considered unimportant, moderate, substantial or considerable if it was within 1–1.1, 1.1–1.2 or 1.2–1.5, or was >1.5, respectively. Reciprocally, a fold change less than 1 was considered unimportant, moderate, substantial or considerable if it was within 0.91–1, 0.83–0.91 or 0.67–0.83, or was <0.67, respectively.
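These cutoffs can be implemented directly; the sketch below (illustrative Python) classifies a fold change such as an RCL or ROR, treating values above and below one symmetrically via the reciprocal:

```python
def classify_fold_change(r):
    """Classify a fold change (e.g. an RCL or ROR) using the cutoffs
    in the text: unimportant (1-1.1), moderate (1.1-1.2), substantial
    (1.2-1.5) or considerable (>1.5), with reciprocal cutoffs for
    fold changes below one."""
    if r <= 0:
        raise ValueError("fold change must be positive")
    f = r if r >= 1 else 1.0 / r  # fold changes are symmetric on the ratio scale
    if f <= 1.1:
        return "unimportant"
    if f <= 1.2:
        return "moderate"
    if f <= 1.5:
        return "substantial"
    return "considerable"
```

Taking reciprocals reproduces the stated lower cutoffs (1/1.1 ≈ 0.91, 1/1.2 ≈ 0.83, 1/1.5 ≈ 0.67), so a single set of thresholds covers both directions.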
Assessing evidence inconsistency
All analyses above assumed that the direct and indirect evidence was consistent in both original and lumped NMAs. However, evidence inconsistency may appear in NMAs and may impact their results. 39 , 40 Also, lumping treatments may change the structures of treatment comparisons. 41 Consider three nodes A1, A2 and B in an NMA; A1 and A2 are similar treatments to be lumped as a single node A, and B is a non-lumped treatment. There are three cases of changes of treatment comparison structures due to lumping: (i) no direct comparison exists between A1 and B and between A2 and B, so the lumped A and B still have no direct comparison, (ii) A1 and B have direct comparisons while A2 and B do not, so A and B have direct evidence after lumping and (iii) both A1 and A2 have direct comparisons with B, so A and B are still directly compared. The changes of treatment comparison structures may impact the risk of evidence inconsistency.
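The three cases can be identified mechanically; the helper below (illustrative Python with hypothetical treatment names) reports which case applies for a lumped node and a non-lumped treatment, given the set of directly compared pairs in the original network:

```python
def direct_evidence_case(direct_pairs, lumped, b):
    """For a lumped node (a set of similar treatments) and a non-lumped
    treatment b, report which case from the text applies:
    'i'   - no similar treatment is directly compared with b,
    'ii'  - some but not all are, so direct evidence appears only after lumping,
    'iii' - all similar treatments already have direct comparisons with b."""
    compared = [a for a in lumped
                if (a, b) in direct_pairs or (b, a) in direct_pairs]
    if not compared:
        return "i"
    if len(compared) < len(lumped):
        return "ii"
    return "iii"
```

Case (ii) is the one that changes the evidence structure most, since a comparison that was purely indirect before lumping gains direct evidence afterwards.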
Therefore, in addition to using the NMA model with the assumption of evidence consistency, we also applied the unrelated mean effect (UME) model suggested by Dias et al. 42 to allow for evidence inconsistency. In the UME model, each treatment comparison is considered as a separate, unrelated parameter. Like the consistency model, the UME model assumes a common between-study variance for all comparisons, but it makes use of direct evidence only. We compared the consistency model with the inconsistency UME model in terms of DIC. We also compared the two models’ posterior mean deviances of individual data points; values far from their expected value of 1 indicate poor fit and potential evidence inconsistency.
Results
Basic characteristics
Figure 1 shows the nine NMAs’ geometry with the index for each treatment. All treatments were ordered alphabetically in each network, except placebo or control, which (if available) was indexed as the first node. Table 1 presents the NMAs’ basic characteristics, including the outcomes and the numbers of studies, treatments and patients. The NMA by Chang et al. 28 investigated non-pharmacological interventions (shock wave therapies); the remaining eight NMAs compared pharmacological treatments. All nine NMAs originally treated similar treatments as separate nodes; seven NMAs contained studies that directly compared similar treatments. Table 2 lists all treatments with abbreviations, including the similar treatments to be lumped in each NMA.

Network plots of the nine network meta-analyses. Each node represents a treatment, and each edge represents a direct comparison between the corresponding two treatments. The edge’s width is proportional to the number of studies that provide the direct comparison. The dashed circle contains similar treatments that may be lumped.
Characteristics of the nine network meta-analyses.
PASI, psoriasis area and severity index.
Treatment names, abbreviations and classifications in the nine network meta-analyses.
t.i.d., three times a day; b.i.d., twice a day; o.d., once daily.
DICs
Table 3 shows the DICs of the NMAs before and after lumping similar treatments. The penalty terms (representing model complexity) decreased after lumping in eight NMAs; the exception was the NMA by Owen. 32 These decreases were generally expected because lumped NMAs contained fewer treatments and thus the models had fewer effective parameters. The increased penalty term in the NMA by Owen 32 was possibly due to Monte Carlo error. Seven (78%) NMAs performed better after lumping, because their adjusted DICs were smaller than the original DICs before lumping; the exceptions were the NMAs by Owen 32 and van de Kerkhof et al. 26
Original and adjusted deviance information criterion values for the nine network meta-analyses before and after lumping similar treatments. The values of deviance information criterion, deviance and penalty term outside parentheses are produced by the consistency model, and those inside parentheses are produced by the inconsistency unrelated mean effect model.
aReduced groups due to lumping treatments.
bThe value of deviance information criterion for Bayesian model selection.
cThe value of deviance indicating goodness-of-fit.
dThe value of penalty term indicating model complexity.
eAdjusting for the reduced groups after lumping treatments.
fSmaller adjusted DICs produced by the inconsistency model compared with the consistency model.
The NMAs had different scenarios of the DIC adjustment (i.e. the reduced treatment groups due to lumping). For example, six treatment groups were removed due to lumping in the NMA by Bash et al.: 27 two studies belonged to scenario 1, two belonged to scenario 2 and the remaining belonged to scenario 0, so the DIC of the lumped NMA was adjusted by adding 6 for a fair comparison with the original DIC before lumping. In another example of the NMA by Owen, 32 no adjustment was applied to the DIC because all studies belonged to scenario 0.
Density plots
Figures S1–S9 in the Supplementary Material present the density plots of the estimated log odds ratios between lumped and non-lumped treatments in the nine NMAs. The density plots showed different relationships among similar treatments. For example, in the NMA by Bash et al. 27 in Figure S1, the density plots of each pair of similar treatments (2, 3) and (4, 5) were centred on common means, while their shapes were noticeably different, indicating different precisions. The density plots of the similar treatments (8, 9) had similar shapes, but their locations differed. Here, the treatment indexes are detailed in Figure 1. In the NMA by Chang et al. 28 in Figure S2, the three similar treatments (2–4) had similar density plots; however, in the NMA by Edwards et al. 29 in Figure S3, the density plots of the similar treatments (3, 4) differed in both locations and shapes.
Changes due to lumping
Figure 2 presents the changes of the estimated (log) odds ratios with 95% CrIs in the original and lumped networks by Bash et al., 27 and Figures S10–S17 in the Supplementary Material present those in the remaining eight NMAs. The comparisons within the same group of similar treatments (if any) were unavailable in the lumped NMAs, so their results are not shown in these figures; such comparisons are noted in italics on the vertical axes. In the NMA by Bash et al. 27 in Figure 2, many CrIs’ lengths noticeably shrank after lumping similar treatments, and the changes of the estimated odds ratios differed across treatment comparisons.

Estimated log odds ratios with 95% credible intervals in the network meta-analysis by Bash et al. 27 before and after lumping similar treatments.
Figure 3 presents the RCLs and RORs of treatment comparisons in all nine NMAs and the frequencies of their fold changes to different extents. Figure 3(a) focuses on RCLs < 1, which implied improved precision from lumping similar treatments; RCLs > 1, implying lowered precision, are denoted by plus signs (+) at the top of the plot. RCLs < 0.67, implying considerable changes in precision, are likewise denoted by plus signs at the bottom of the plot. Similarly, considerable changes of RORs (<0.67 or >1.5) are denoted by plus signs in Figure 3(b). Some treatment comparisons in four NMAs (i.e. Owen, 32 Reich et al., 44 Roskell et al. 31 and van de Kerkhof et al. 26 ) were less precise after lumping similar treatments, while the other five NMAs yielded improved precision for all treatment comparisons. For example, 17 treatment comparisons in the NMA by Bash et al. 27 had considerable changes in RCLs, and the RORs of all comparisons indicated at least moderate changes. These results were consistent with the change of DIC in this NMA, as shown in Table 3; the DIC decreased from 80.68 to 71.72 (after adjustment), implying better model performance in the lumped NMA. Furthermore, the histograms in Figures 3(c) and 3(d) indicate noticeable changes of RCLs and RORs for many treatment comparisons in all NMAs.

The ratios of 95% credible interval lengths (RCLs, panel a) and the ratios of odds ratios (RORs, panel b) for all treatment comparisons in the nine network meta-analyses and the frequencies of fold changes of RCLs (panel c) and RORs (panel d) to different extents. In panels a and b, the plus signs (+) denote values that are below or above the vertical axis ranges, and the associated numbers are the numbers of such values not shown in the plots. The solid lines indicate no changes, and the dashed, dotted and dash-dotted lines differentiate unimportant, moderate, substantial and considerable fold changes accordingly.
Assessing evidence inconsistency
Table 3 gives the deviance terms and the DICs of both the consistency model and the inconsistency UME model. Most results were fairly close with differences <1, suggesting no substantial evidence inconsistency in most NMAs. However, in the NMA by Phung et al., 43 the adjusted DIC produced by the inconsistency model was much smaller than that by the consistency model, indicating potential evidence inconsistency in this NMA.
Figures S18–S26 in the Supplementary Material show the posterior mean deviances produced by the consistency model and the inconsistency UME model in the original and lumped NMAs. The posterior mean deviances produced by the two models were similar for most NMAs; they were distributed around the expected value 1. However, for the NMA by Phung et al., 43 Figure S22 shows that some posterior mean deviances produced by the consistency model were away from 1; again, they indicated evidence inconsistency.
Discussion
Strengths and limitations
This article included nine NMAs with various disease outcomes, so the results may be representative of a broad class of NMAs. We proposed an adjustment for the DIC to numerically evaluate the benefit of lumping treatments, and we used the RCLs and RORs to quantify the impact of lumping. Potential evidence inconsistency was also considered in our study.
Our study had some limitations. First, for the purpose of illustration, this article considered the case that all similar treatments were lumped. However, in practice, some similar treatments may have truly different effects. As shown by the density plots in Figures S1–S9 in the Supplementary Material, some similar treatments’ density plots were noticeably different. For example, in the NMA by Reich et al., 44 each of the lumped treatments, etanercept and ustekinumab, originally had two different dose levels, and their density plots in Figure S6 show that the treatment effects depended on the dose levels. In this case, directly lumping these treatments across different dose levels may mask possible dose effects, and a dose–response meta-analysis may be used instead to incorporate such effects. 45 In another NMA, by van de Kerkhof et al., 26 Figure S9 indicates that the pairs of betamethasone dipropionate b.i.d. and o.d. and of two-compound formulation b.i.d. and o.d. each had similar density plots, while the pair of calcipotriol b.i.d. and o.d. had noticeably different density plots. Therefore, the former two pairs of similar treatments may be properly lumped, but the latter pair may not.
Second, this article investigated the effects of lumping similar treatments mainly from the statistical perspective, but many clinical considerations about the treatment definitions and effects should be employed in practical NMAs. For example, in the NMA by Bash et al., 27 although the effects of oral and intravenous flecainide were similar in our statistical analyses, the two intake methods may be dramatically different from the clinical perspective, and they should be analysed separately for certain clinical purposes.
Third, this article focused on NMAs with binary outcomes and we investigated the effects of different treatment classifications only on the estimated odds ratios. These may limit the generalizability of our conclusions in NMAs with other types of effect sizes (e.g. risk ratios).
Recommendations and future studies
Meta-analysts should provide detailed information about treatment definitions and justification for classifying treatments when performing NMAs, especially when multiple similar treatments are available. It may not be optimal to analyse all treatments separately, and NMAs with too many insufficiently compared treatments may yield underpowered effect estimates. Some similar treatments may be lumped to effectively increase the estimates’ precision, if the lumping is reasonable from both statistical and clinical perspectives.
Future work includes developing methods to evaluate the effects of lumping similar treatments while maintaining the treatments’ interpretability. It will also be important to account for various types of dose effects in NMAs, instead of simply lumping treatments across doses.
Conclusions
The node-making process has been recognized as an important problem in NMAs; it is poorly reported in many NMAs and often lacks detailed explanation. 15 , 16 This empirical study has shown that different ways of making treatment nodes can substantially affect the results of NMAs. These findings are based on nine published NMAs with similar treatments that could be lumped. The DICs decreased in many NMAs after lumping, indicating better model performance. Also, RCLs and RORs indicated noticeable changes of the estimated odds ratios for many treatment comparisons due to lumping. The UME model did not suggest substantial evidence inconsistency in any NMA except the one by Phung et al. 43