Introduction
A desire to reduce the number of patients required in a clinical trial has seen a large number of trialists utilise a group sequential study design. In this approach, interim analyses after certain landmark amounts of data have been collected afford the possibility for a trial to terminate early. Approaches to design for fixed sample studies are not appropriate with repeated data analysis, due to inflation of error rates. Therefore, a large amount of methodology has been developed to facilitate attaining desired type I and type II error rates in a group sequential trial. For an overview of this methodology, see Jennison and Turnbull 1 or Whitehead. 2
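The inflation of error rates under repeated analysis can be illustrated numerically. The following Python sketch is not part of the original article; it assumes two analyses whose standardised test statistics are standard normal under the null hypothesis with correlation equal to the square root of the information fraction, and it naively applies the fixed-sample two-sided 5% critical value at both analyses. The function name `overall_type_one_error` and the Simpson-rule grid size are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def overall_type_one_error(critical_value, info_fraction, n_grid=2000):
    """P(|Z1| > c or |Z2| > c) under the null hypothesis for two analyses.

    Assumes (Z1, Z2) are standard bivariate normal with correlation
    sqrt(info_fraction), where info_fraction = I1 / I2 lies in (0, 1).
    n_grid must be even (composite Simpson rule).
    """
    rho = sqrt(info_fraction)
    cond_sd = sqrt(1.0 - rho ** 2)
    c = critical_value
    h = 2.0 * c / n_grid  # step over z1 in [-c, c]
    total = 0.0
    for i in range(n_grid + 1):
        z1 = -c + i * h
        # Probability that |Z2| <= c given Z1 = z1.
        inner = (STD_NORMAL.cdf((c - rho * z1) / cond_sd)
                 - STD_NORMAL.cdf((-c - rho * z1) / cond_sd))
        weight = 1 if i in (0, n_grid) else (4 if i % 2 else 2)
        total += weight * STD_NORMAL.pdf(z1) * inner
    prob_never_reject = total * h / 3.0
    return 1.0 - prob_never_reject


nominal_c = STD_NORMAL.inv_cdf(0.975)  # fixed-sample two-sided 5% critical value
inflated = overall_type_one_error(nominal_c, info_fraction=0.5)
print(f"Overall type I error with two equally spaced looks: {inflated:.4f}")
```

Under these assumptions the overall type I error exceeds the nominal 0.05, which is why group sequential designs use adjusted stopping boundaries.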
In addition, following a group sequential trial, special inferential techniques are required. In the case of confidence intervals, this is to ensure the desired coverage is attained. For p-values, it is to guarantee that they are consistent with the decision about the null hypothesis. Here, our focus is on point estimation, where specially developed methods are required because the standard maximum likelihood estimator (MLE) is no longer either unbiased or minimum variance.
Numerous authors have now proposed point estimation procedures for use after a group sequential trial. Often, these have sought to reduce the bias in the estimate, an important aim given that the magnitude of a treatment effect is always of principal interest in a clinical trial. However, there are several other important factors to consider when choosing a preferred estimator, including the root mean squared error (RMSE) and the division of the bias and RMSE into their marginal and conditional (on stage of termination) values.
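To make the distinction between marginal and conditional performance concrete, the Python sketch below simulates the MLE in a hypothetical two-stage design. It is an illustration under stated assumptions rather than the authors' code: the stage-wise statistics are assumed to follow the usual canonical joint distribution (stage-1 statistic normal with mean equal to the effect times the square root of the stage-1 information, and an independent score increment for stage 2), the trial stops at stage 1 if the statistic exceeds an efficacy bound or falls at or below a futility bound, and all numerical values (effect 0.5, information levels 10 and 20, efficacy bound 2.5, no futility bound) are invented for illustration. The closed-form marginal bias is a cross-check derived from these same assumptions.

```python
import random
from math import inf, sqrt
from statistics import NormalDist, fmean


def simulate_mle(theta, info1, info2, efficacy, futility=-inf,
                 n_sims=100_000, seed=1):
    """Return MLE values split by stage of termination (hypothetical design)."""
    rng = random.Random(seed)
    stage1, stage2 = [], []
    for _ in range(n_sims):
        z1 = rng.gauss(theta * sqrt(info1), 1.0)
        if z1 > efficacy or z1 <= futility:
            stage1.append(z1 / sqrt(info1))  # MLE on stopping at stage 1
        else:
            # Independent score increment between the two analyses.
            increment = rng.gauss(theta * (info2 - info1), sqrt(info2 - info1))
            z2 = (z1 * sqrt(info1) + increment) / sqrt(info2)
            stage2.append(z2 / sqrt(info2))  # MLE after stage 2
    return stage1, stage2


def summarise(estimates, theta):
    """Bias and RMSE of a collection of estimates around the true theta."""
    bias = fmean(estimates) - theta
    rmse = sqrt(fmean((est - theta) ** 2 for est in estimates))
    return bias, rmse


def exact_marginal_bias(theta, info1, info2, efficacy, futility=-inf):
    """Closed-form marginal bias of the MLE under the assumptions above."""
    pdf = NormalDist().pdf
    drift = theta * sqrt(info1)
    scale = (info2 - info1) / (info2 * sqrt(info1))
    return scale * (pdf(efficacy - drift) - pdf(futility - drift))


theta, info1, info2, efficacy = 0.5, 10.0, 20.0, 2.5  # illustrative values
stage1, stage2 = simulate_mle(theta, info1, info2, efficacy)
bias1, rmse1 = summarise(stage1, theta)
bias2, rmse2 = summarise(stage2, theta)
bias_marginal, rmse_marginal = summarise(stage1 + stage2, theta)
print(f"stage 1: bias={bias1:.3f}, RMSE={rmse1:.3f}")
print(f"stage 2: bias={bias2:.3f}, RMSE={rmse2:.3f}")
print(f"marginal: bias={bias_marginal:.3f}, RMSE={rmse_marginal:.3f}")
```

In this illustrative setting the conditional bias is positive on early stopping for efficacy and negative on continuation, so a modest marginal bias can mask much larger conditional biases.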
Several authors have also sought to compare estimators.3–6 However, each of these works has compared a limited number of estimators, and has been set within the context of a specific study setting (e.g., Shimura et al. 6 consider survival data). Recently, Robertson et al. 7 provided an extended discussion of adjusted point estimation following adaptive designs. Along with describing available methodologies, they provided guidance for researchers on best practice, and reviewed current use in published trials. They found, like Zhang et al., 8 that to date few studies have computed an adjusted estimate. Shimura et al. 6 previously argued this is because of a lack of available comparisons of estimators and a lack of software for their computation. Robertson et al. 7 also noted the potential importance of conducting an extensive comparison of estimators to choose the preferred approach for a given trial.
The purpose of this paper is therefore to describe a large number of estimators, and historical results pertaining to them, within a common notation. Through this, we make clear the principal evaluations that are required to compare estimators. We then examine the performance of these estimators in terms of their biases and RMSE through several informative examples. By focusing on a certain type of two-stage trial, we also derive a number of small new results that simplify the determination of several estimators in practice. Furthermore, we provide code for implementation such that modification for alternative settings can be readily achieved. We proceed by first describing the exact design setting considered.
Methods
Design setting
We consider point estimation following a two-stage group sequential trial that bases decisions on the standardised test statistics
At least approximately, this joint distribution holds for an extremely wide variety of trial designs and study endpoint types. We assume
We let
Note that
We compare the performance of nine point estimators for
Maximum likelihood estimator
The MLE of
Whitehead 19 suggested adjusting
A simpler alternative to
Median unbiased estimator
Next, we consider an MUE for
Here, we focus on the stage-wise ordering, first proposed by Armitage, 23 which has since been used by many authors (see, e.g., Siegmund, 24 Fairbanks and Madsen, 25 Tsiatis et al. 26). As noted earlier, this ordering requires the continuation region to be an interval, which was our reason for the restriction to designs such that stage 2 occurs when
Using this, along with the distribution of the test statistics, we have
Uniform minimum variance unbiased estimator
A uniform minimum variance unbiased estimator (UMVUE) is an estimator of interest in most estimation problems, and the considered two-stage group sequential design framework is no exception. Emerson and Fleming 3 proposed an unbiased estimator of
Conditional maximum likelihood estimator
Highlighting the potential importance of reducing conditional bias, Liu et al. 13 and Fan et al. 15 considered an estimator, which we refer to as
Conditional weighted MAE
Shimura et al. 30 proposed a shrinkage estimator for use when a group sequential trial terminates early. Their estimator requires a prior guess at
Shimura et al. 30 proposed this estimator as a method of reducing conditional bias on early termination. They did not discuss a functional form for an estimate when a trial continues to stage 2. Here, to be able to compute the marginal performance of
Conditional median unbiased estimator
A conditional MUE (CMUE) was proposed by Zhong and Prentice 31 and Koopmeiners et al. 4 In general, no explicit solution exists for its value, but it can be determined numerically as the solution to
Conditional UMVUE
The final estimator we consider is the conditional uniform minimum variance unbiased estimator (CUMVUE, sometimes called the uniform minimum variance conditionally unbiased estimator). That is, an estimator
Code
Having described each of the nine estimators that will be compared, we next proceed to present evaluations of their performance in five indicative study examples. Code to reproduce these results exactly is available at https://github.com/mjg211/article_code. In addition, estimators can be compared within the setting of a two-stage group sequential trial with a parallel two-arm design assuming normally distributed outcome data via the GUI to the OptGS 35 package, available at https://mjgrayling.shinyapps.io/optgs/.
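As a rough indication of the numerical work involved in one of the estimators described above, the following Python sketch computes a median unbiased estimate under the stage-wise ordering. It is not taken from the linked repository. It assumes the canonical joint distribution of the two test statistics, continuation to stage 2 when the stage-1 statistic lies strictly above a futility bound and at or below an efficacy bound, and a one-sided ordering in which larger outcomes are more extreme. The function names, the bracketing interval for the root search, and the integration grid are all illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_at_least_as_extreme(theta, stage, z, info1, info2,
                             futility, efficacy, n_grid=400):
    """P_theta(outcome at or above the observed one), stage-wise ordering."""
    drift1 = theta * sqrt(info1)
    if stage == 1:
        # Any stage-1 stop: everything ranked higher has Z1 above z.
        return 1.0 - STD_NORMAL.cdf(z - drift1)
    # Stage 2: either an efficacy stop at stage 1, or continue and Z2 >= z.
    prob = 1.0 - STD_NORMAL.cdf(efficacy - drift1)
    lower = max(futility, drift1 - 8.0)  # clip to where Z1 has mass
    upper = min(efficacy, drift1 + 8.0)
    if lower >= upper:
        return prob
    h = (upper - lower) / n_grid  # composite Simpson rule, n_grid even
    sd_increment = sqrt(info2 - info1)
    total = 0.0
    for i in range(n_grid + 1):
        z1 = lower + i * h
        # Z2 >= z is equivalent to the score increment reaching this value.
        needed = z * sqrt(info2) - z1 * sqrt(info1)
        tail = 1.0 - STD_NORMAL.cdf(
            (needed - theta * (info2 - info1)) / sd_increment)
        weight = 1 if i in (0, n_grid) else (4 if i % 2 else 2)
        total += weight * STD_NORMAL.pdf(z1 - drift1) * tail
    return prob + total * h / 3.0


def median_unbiased_estimate(stage, z, info1, info2, futility, efficacy,
                             lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection for the theta at which the probability above equals 0.5."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        p = prob_at_least_as_extreme(mid, stage, z, info1, info2,
                                     futility, efficacy)
        if p < 0.5:  # the probability increases with theta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only: stage-2 termination with Z2 = 2.0.
mue = median_unbiased_estimate(2, 2.0, 10.0, 20.0, futility=0.0, efficacy=2.5)
print(f"MUE = {mue:.4f}, MLE = {2.0 / sqrt(20.0):.4f}")
```

Under this ordering the estimate coincides with the MLE when the trial stops at stage 1, so the numerical search only matters after stage 2.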
Note that in the majority of our examples, if the study is powered for
We highlight also that whilst each of our examples is given some clinical context to make them more tangible and practically useful, ultimately the performance of the estimators is only dependent on the clinical context through the specified values of
Results
Log-rank test for survival data in a two-arm parallel-group trial
To begin, we consider an important context: A two-arm parallel-group individually randomised trial for time-to-event data under the proportional hazards assumption. This may correspond, for example, to an oncology trial in which the objective is to ascertain improved overall survival for some new treatment (indexed
Firstly, it is assumed that the hazard rate at time
To design such a trial, given the exact information levels
Figure 1 then shows the performance of the estimators for the design with efficacy and futility stopping, while Figure 2 gives the corresponding results for the design with futility stopping only. In Figure 1, the CMLE and CMUE arguably have the best performance conditional on termination after stage 1. They also perform well conditional on stage 2 termination, where the CUMVUE also provides effective performance. Though the marginal RMSE of the CMLE and CMUE is sometimes larger than the other estimators (e.g., for

Figure 1. The conditional and marginal biases and root mean squared error (RMSE) of the nine considered estimators are given for Example 1: two-arm survival data, with efficacy and futility stopping.

Figure 2. The conditional and marginal biases and root mean squared error (RMSE) of the nine considered estimators are given for Example 1: two-arm survival data, with futility stopping only. The vertical axis limits have been constrained such that CMLE performance is not visible for all
However, in Figure 2, it can be seen that the marginal performance of the CMLE and CMUE, and their performance conditional on termination after stage 1, is extremely poor. We comment further on the CMLE’s performance in this setting in the subsequent example and in the Discussion. In the case of futility stopping only, the CUMVUE seems to be a substantially better option if one is willing to accept a slightly larger absolute marginal bias. Alternatively, the MAEs, MAE1 and MAE2, may be preferred if the marginal bias of the CUMVUE is viewed to be too great. Contrasting the findings of Figures 1 and 2, it is thus clear that the performance of the estimators can be substantially impacted by the inclusion (or not) of efficacy stopping.
Wason et al. 38 discussed the use of continuous tumour shrinkage endpoints in two-stage phase II single-arm oncology trials, for the purposes of reducing requisite sample sizes. It was assumed that outcome
Figure 3 displays the conditional and marginal biases and RMSE of the nine considered estimators in this design when

Figure 3. The conditional and marginal biases and root mean squared error (RMSE) of the nine considered estimators are given for Example 2: single-arm normally distributed data. The vertical axis limits have been constrained such that CMLE performance is not visible for all
Jones and Kenward 39 considered sample size calculation for a
Figure 4 displays the conditional and marginal biases and RMSE of the nine considered estimators in this design when

Figure 4. The conditional and marginal biases and root mean squared error (RMSE) of the nine considered estimators are given for Example 3: crossover data.
Above, we have covered examples with time-to-event and normally distributed data. Here, we focus on a case with Bernoulli data. Schoffski et al. 42 presented the results of a phase II single-arm oncology trial conducted to assess the activity of crizotinib in patients with advanced clear-cell sarcoma with MET alterations. Tumour response was used as the primary outcome and thus it was assumed that outcome
This design can be mapped to our setting as follows (see Section 3.6 of Jennison and Turnbull 1 for further details). First, set
The design from Schoffski et al. 42 corresponds to
Figure 5 shows the nine estimators’ performance in this setting, for

Figure 5. The conditional and marginal biases and root mean squared error (RMSE) of the nine considered estimators are given for Example 4: single-arm Bernoulli distributed data. The vertical axis limits have been constrained such that CMLE performance is not visible for all
Note that we comment in the Discussion on the use of the canonical joint distribution framework for Bernoulli data.
We conclude with an example with a two-sided alternative hypothesis. Jennison and Turnbull 1 discussed (see Sections 3.1.3 and 3.2.2 of their book) the sequential design of a matched pairs trial in which subjects are paired so that those in the same pair have similar values of important prognostic factors. Matched pairs designs are also commonly employed in paired-eye and paired-teeth trials, as well as twin studies. One subject in each pair is randomly allocated treatment A and the other receives treatment B; letting
Figure 6 displays the conditional and marginal biases and RMSE of the nine considered estimators in this design when

Figure 6. The conditional and marginal biases and root mean squared error (RMSE) of the nine considered estimators are given for Example 5: matched pairs data.
Discussion
In this paper, we have compared the performance of nine estimators for the principal parameter of interest after a two-stage group sequential trial, within the context of five example trials, evaluating their conditional and marginal biases and RMSE. Unfortunately, as is clear from Figures 1 to 6, there is no single estimator that performs best for the conditional and marginal biases and RMSE. However, a number of recommendations remain possible. Firstly, if one cares solely about the marginal bias, the UMVUE is naturally the optimal estimator. Secondly, if only bias on termination after the interim analysis is of concern, as it may be in the case where interim stopping is only allowed for futility, then the CUMVUE should likely be preferred. Both of these estimators are observed to perform poorly by some measures, though. In particular, as has been discussed previously, the UMVUE often has large conditional bias and RMSE.13–15 In general, we would also caution against focusing solely on a single measure of estimator performance. For example, even on termination for futility after stage 1, effective estimation may still be important for decision-making about subsequent studies. Thirdly, the use of the CMLE (and to a lesser extent the CMUE) is not advisable in certain settings due to its large bias. This result about the CMLE was recently formally proved by Berckmoes et al., 44 who demonstrated for the early stopping boundaries used in the second example (
Where the conditional and marginal values of the bias and RMSE are all of importance, arguably the MAEs may be considered the best choice as they are typically amongst the better estimators on all six sub-panels of Figures 1 to 6. However, an additional potentially important consideration when choosing an estimator, beyond their bias and RMSE, is the predictability of the information levels. The MAE estimates of
Overall, given the performance of the estimators is highly dependent on the underlying design, we would thus always recommend evaluations be performed to help choose the estimators for any proposed group sequential design. When this is not possible, our simple recommendation would be to utilise one of the MAE estimators in the case that information levels are predictable. When the information levels are not predictable, the CWMAE may be a fallback choice, owing to its effective performance across the sub-panels in the figures given here. Such recommendations, however, are based on surveying performance across a wide range of values of
We acknowledge some limitations to our work. Firstly, though we have considered a large number of possible estimators, others have been proposed in the literature. In particular, Wang and Leung 45 proposed the use of parametric-bootstrapping procedures to attain a bias-adjusted estimator. We omit consideration of it here due to the computational complexity involved in evaluating their approach across numerous values of
A reviewer raised an interesting comment in regard to the trial scenarios considered, about how point estimation may be affected by the choice of stopping boundaries (e.g., O’Brien-Fleming, Pocock). In general, it is reasonable to anticipate that more ‘aggressive’ stopping boundaries (i.e., boundaries that increase the probability of termination after stage 1) will result in reduced values of
We also end with a caution against the use of the canonical joint distribution and the estimators given here in some settings. In particular, though we included an example with Bernoulli data to illustrate how the methodology can be applied in that setting, for such studies, designs leveraging exact binomial densities and associated estimators should likely be preferred for all but the largest trial sample sizes (see, e.g., Porcher and Desseaux 5 for the single-arm case, while Bibbona and Rubba 47 provide relevant results for other designs). In addition, for time-to-event data in small-sample settings, direct simulation of study data and evaluation of estimator performance in this way (rather than using the canonical approximation) would be advisable.
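The sensitivity of the normal approximation to sample size for Bernoulli data can be illustrated with a short calculation. The Python sketch below is an illustration under stated assumptions, not an analysis from this paper: it considers a hypothetical single-arm first stage that stops for futility when the number of responses is at most a threshold, and compares the exact binomial probability of that event with a plain normal approximation (no continuity correction). The response probability of 0.2 and the three (sample size, threshold) pairs are invented for illustration.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binomial_cdf(r, n, p):
    """Exact P(X <= r) for X ~ Binomial(n, p), computed in log space."""
    total = 0.0
    for k in range(r + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1.0 - p))
        total += exp(log_pmf)
    return total


def normal_approx_cdf(r, n, p):
    """Normal approximation to P(X <= r), without continuity correction."""
    return NormalDist().cdf((r - n * p) / sqrt(n * p * (1.0 - p)))


response_prob = 0.2  # hypothetical true response probability
designs = [(20, 3), (200, 35), (2000, 380)]  # hypothetical (n1, futility threshold)
gaps = []
for n1, threshold in designs:
    exact = binomial_cdf(threshold, n1, response_prob)
    approx = normal_approx_cdf(threshold, n1, response_prob)
    gaps.append(abs(exact - approx))
    print(f"n1={n1}: exact={exact:.4f}, normal approx={approx:.4f}")
```

In these illustrative cases the discrepancy between the exact and approximate stopping probabilities shrinks as the stage-1 sample size grows, in line with the preference for exact methods in small trials.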
In conclusion, the best estimator for a given trial is dependent on the estimator’s relative performance for, and the relative desire to minimise each of, the conditional and marginal biases and RMSE. Evaluating the performance of each of the estimators in a given trial design scenario can be efficiently completed. Undertaking this task in practice will enable investigators to make a more effective choice on how to estimate their parameter of interest. 7 To date, few group sequential trials have computed adjusted estimates,7,8 which may be particularly problematic for their subsequent inclusion in meta-analyses. It may also negatively impact decision-making around whether a treatment should be further developed. We encourage their increased use in future studies.
Supplemental Material
Supplemental material, sj-pdf-1-smm-10.1177_09622802221137745 for Point estimation following a two-stage group sequential trial by Michael J Grayling and James MS Wason in Statistical Methods in Medical Research