Abstract
Introduction
In conventional dose finding for oncology treatments, a common assumption is that both the efficacy and the toxicity of a drug increase monotonically with the dose. Traditional phase I clinical trials focus mainly on toxicity, with the goal of determining the maximum tolerated dose (MTD) based on a target dose-limiting toxicity (DLT) rate.1 However, with the revolution of targeted therapy and immunotherapy in cancer treatment,2 many new agents in clinical oncology violate the monotonic dose–efficacy relationship. For some immunotherapy agents, a higher dose may yield lower efficacy, which leads to an umbrella-shape dose–efficacy curve.3 A plateau-shape efficacy curve can be observed for PTK/ZK, an orally active inhibitor of vascular endothelial growth factor receptor tyrosine kinases: its efficacy initially increases with the dose but remains unchanged after reaching a threshold.4 It has therefore become commonplace to incorporate efficacy evaluation in dose recommendation for oncology clinical trials. Dose-finding trials that incorporate both efficacy and toxicity data are typically referred to as seamless phase I/II trials, which aim to identify the optimal biological dose (OBD), defined as the dose with the highest efficacy probability while controlling the DLT rate.5
Due to violation of the monotonic dose–efficacy relationship, the traditional phase I trial designs, such as the “3
However, all the aforementioned methods either rely upon a parametric model assumption or require tedious specification of design parameters. Misspecification of the model or inappropriate tuning of the design parameters would lead to compromised or even poor trial performance. Our goal is to develop a model-free and calibration-free approach to dose finding, which does not require calibration of any essential design parameters. In general, early stopping rules are not an intrinsic part of a design; they serve as external monitoring schemes for safety and futility. The model-free and calibration-free features make our design robust and simple for practical use.
Our research is motivated by a collaboration with clinicians on a phase I dose-escalation study of the CD19 chimeric antigen receptor (CAR) induced-T-to-natural-killer (ITNK) cell therapy. The objective of the study was to assess the safety as well as the efficacy of CD19 CAR-ITNK cell therapy in adult patients with relapsed or refractory diffuse large-B-cell lymphoma. Three prespecified doses were considered in the trial,
Another motivating example is a phase I/II trial of lenalidomide in combination with high-dose melphalan for patients with relapsed or progressive multiple myeloma.23 There were four doses of lenalidomide in the dose escalation phase: 25, 50, 75 and 100 mg, while the dose of melphalan was fixed. The goal of the trial was to identify the OBD of lenalidomide in terms of the trade-off between toxicity and efficacy. The DLTs were defined as regimen-related death, graft failure, grade 3 or 4 atrial fibrillation, and grade 4 deep venous thrombosis or pulmonary embolism before day 30 after autologous hematopoietic stem cell transplantation (auto-HCT). The efficacy outcome was defined as being alive in complete response on day 90 after auto-HCT.
The major difficulty in these phase I/II studies is to determine the shapes of the dose–efficacy and dose–toxicity curves without adequate prior information. Model-based dose-finding designs risk violating their parametric assumptions, which can lead to unreliable dose assignment and incorrect OBD identification. Further, most existing phase I/II designs require calibration of certain design parameters prior to implementation. However, owing to the lack of preclinical information in a first-in-human study, it is challenging to specify design parameters suitable for such phase I/II designs. To avoid the potential risk of model misspecification and alleviate the burden of parameter calibration, we propose a calibration-free odds (CFO) design to identify the OBD. The CFO design bypasses all parametric model assumptions and is thus model-free, or curve-free. Before the dose assignment for each new cohort of patients, CFO casts the current dose level in competition with its two neighboring (left and right) dose levels, based on evidence in the form of odds, to determine an admissible set. An incoming cohort is then assigned to the dose level that has the largest posterior probability of achieving the highest efficacy rate among the dose levels in the admissible set. The CFO design is calibration-free in the sense that its implementation does not require prespecification of any essential design parameter except for the target DLT rate
The rest of the paper is organized as follows. In the next section, we introduce the CFO design for both MTD and OBD identification. We then present the simulation studies to evaluate the operating characteristics of the new method and compare CFO with several phase I and phase I/II designs in
Methodology of the CFO Design
Identification of the MTD
Suppose that a clinical trial is initiated to investigate
After enrolling
We first illustrate the CFO design for a phase I trial, which aims to determine the MTD of the drug and its dose level satisfies
Intuitively, the odds

Illustration of the posterior distributions of the DLT probabilities for the left, current and right doses,
The key issue for dose finding is to determine how large the value of
Specifically, a large value of
In addition, when making
The essential step is to choose a suitable threshold
Given
Dose escalation and de-escalation rules of the CFO design in searching for the MTD.
Given the target

The threshold values of
Under the dose movement rule in Table 1, the CFO design for MTD identification is described as follows.
(i) Start the trial by treating the first cohort of patients at the lowest dose or a prespecified initial dose.
(ii) After enrolling
(iii) Select the dose for the next cohort following the rules in Table 1.
(iv) Repeat steps (ii) and (iii) until the maximal sample size is reached or the early stopping criteria are met.
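The evidence that drives the dose movement in the steps above is expressed as odds. As a minimal sketch, and assuming (neither is stated in the text above) a uniform Beta(1, 1) prior on the DLT rate and odds of the form Pr(p > φ | data) / Pr(p ≤ φ | data), the posterior tail probability has a closed form for integer counts. The function names and the example counts are hypothetical, and the thresholds of Table 1 are not reproduced here.

```python
from math import comb


def prob_dlt_rate_exceeds(n_dlt, n_patients, target):
    """Pr(p > target | data) under an ASSUMED Beta(1, 1) prior.

    The posterior is Beta(n_dlt + 1, n_patients - n_dlt + 1). For integer
    parameters its upper tail equals Pr(X <= n_dlt) with
    X ~ Binomial(n_patients + 1, target).
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    m = n_patients + 1
    return sum(
        comb(m, j) * target**j * (1.0 - target) ** (m - j)
        for j in range(n_dlt + 1)
    )


def posterior_odds(n_dlt, n_patients, target):
    """Odds that the DLT rate exceeds the target (assumed definition)."""
    tail = prob_dlt_rate_exceeds(n_dlt, n_patients, target)
    return tail / (1.0 - tail)


if __name__ == "__main__":
    target = 0.33  # target toxicity probability used in the fixed scenarios
    # Hypothetical (DLTs, patients) at the left, current and right doses.
    for label, (y, n) in {"left": (0, 3), "current": (1, 3), "right": (2, 3)}.items():
        print(label, round(posterior_odds(y, n, target), 3))
```

Larger odds at a dose indicate stronger evidence that the dose is too toxic; how these odds are compared against thresholds is governed by Table 1.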
Table 1 includes the case with
Identification of the OBD
As an essential part of the outcomes collected in phase I/II trials, efficacy data need to be incorporated in dose finding under the CFO design. Upon the arrival of the
If the decision is to de-escalate the dose, then
If the decision is to stay at the current dose, then
If the decision is to escalate the dose, then
The admissible set is constructed using toxicity data alone and no dose skipping is allowed during dose escalation, while dose skipping is permitted for dose de-escalation due to jointly modelling both toxicity and efficacy data.
Given the current data
Following the above dose movement decisions when accounting for both toxicity and efficacy, the proposed phase I/II dose-finding procedure for the OBD proceeds as follows.
(i) Start the trial by treating the first cohort of patients at the lowest dose or a prespecified initial dose.
(ii) After enrolling
(iii) The dose level for the next cohort is determined by (1).
(iv) Repeat steps (ii) and (iii) until the maximal sample size is reached or the early stopping criteria are met.
At the beginning of the trial, there is no information on the neighboring dose levels, but the CFO design still works as usual because we assign non-informative priors to the DLT and efficacy rates of each dose. An example in Appendix C.3 demonstrates how CFO works at the beginning of a trial.
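The assignment step of the procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the admissible set is taken to be all doses up to one level below, at, or one level above the current dose according to the toxicity decision (a reading consistent with the no-skipping remark, but not spelled out in the text here), the efficacy rates receive independent Beta(1, 1) priors, and the posterior probability of having the highest efficacy rate is estimated by Monte Carlo. All function names and example counts are hypothetical.

```python
import random


def admissible_set(current, decision, n_doses):
    """1-based dose levels allowed for the next cohort (ASSUMED rule).

    The upper end moves by at most one level, so escalation never skips a
    dose, while every lower dose remains available.
    """
    shift = {"de-escalate": -1, "stay": 0, "escalate": 1}[decision]
    upper = min(max(current + shift, 1), n_doses)
    return list(range(1, upper + 1))


def prob_highest_efficacy(n_resp, n_treated, doses, n_draws=20000, seed=0):
    """Monte Carlo estimate of Pr(dose has the highest efficacy rate | data)
    for each admissible dose, under ASSUMED independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = {d: 0 for d in doses}
    for _ in range(n_draws):
        draws = {
            d: rng.betavariate(n_resp[d - 1] + 1, n_treated[d - 1] - n_resp[d - 1] + 1)
            for d in doses
        }
        wins[max(draws, key=draws.get)] += 1
    return {d: wins[d] / n_draws for d in doses}


def next_dose(n_resp, n_treated, current, decision):
    """Dose in the admissible set most likely to have the highest efficacy."""
    doses = admissible_set(current, decision, len(n_treated))
    probs = prob_highest_efficacy(n_resp, n_treated, doses)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Hypothetical efficacy responses and patients treated at four doses.
    print(next_dose([0, 5, 1, 0], [6, 6, 6, 0], current=2, decision="escalate"))
    # With no data yet, every admissible dose is about equally likely to be best.
    print(prob_highest_efficacy([0, 0, 0], [0, 0, 0], [1, 2, 3]))
```

With no patients treated, each posterior reduces to its prior, which is why the procedure remains well defined for the first cohort.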
Early Stopping and Final Selection
During the implementation of the CFO design, it is preferable to impose some early stopping criteria to ensure the safety and benefit for the patients. For toxicity monitoring, we eliminate the dose level when there is strong evidence to corroborate its over-toxicity. In particular, we eliminate dose level
For the phase I/II trial design, we further consider the efficacy data to terminate the trial early if none of the admissible dose levels shows adequate efficacious effect. Given the lowest acceptable efficacy rate
In our simulation studies and real data application, the two cutoff values for toxicity and efficacy early stopping are set as
After the trial is completed, to guarantee the monotonically increasing trend of the dose–toxicity curve, an isotonic regression24 is performed on the observed DLT rates to obtain the final estimates
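The isotonic regression step can be illustrated with the pool-adjacent-violators algorithm. The sketch below assumes that each dose is weighted by its number of treated patients and that doses with no patients are left without an estimate; both choices, the function name and the example counts are assumptions for illustration, and the subsequent selection rule is not reproduced here.

```python
def isotonic_dlt_rates(n_dlt, n_patients):
    """Weighted pool-adjacent-violators fit of non-decreasing DLT rates.

    Doses with no patients are skipped (ASSUMPTION) and returned as None.
    """
    tried = [k for k, n in enumerate(n_patients) if n > 0]
    blocks = []  # each block: [pooled DLTs, pooled patients, doses pooled]
    for k in tried:
        blocks.append([n_dlt[k], n_patients[k], 1])
        # Pool backwards while the previous block's rate exceeds the last
        # block's rate (cross-multiplied to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            y, n, c = blocks.pop()
            blocks[-1][0] += y
            blocks[-1][1] += n
            blocks[-1][2] += c
    fitted = [None] * len(n_patients)
    position = iter(tried)
    for y, n, c in blocks:
        for _ in range(c):
            fitted[next(position)] = y / n
    return fitted


if __name__ == "__main__":
    # Hypothetical counts: the observed rates 1/6, 0/6, 2/6, 3/6 are not monotone.
    print(isotonic_dlt_rates([1, 0, 2, 3], [6, 6, 6, 6]))
```

In this example the first two doses violate monotonicity and are pooled into a common rate of 1/12, while the last two observed rates are left unchanged.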
Simulation Studies
Toxicity Evaluation Under Random/Fixed Scenarios
As determination of the MTD is an essential part of the CFO design, we first conduct extensive simulation studies in the context of identifying the MTD. We compare CFO with BOIN25 and CRM.7 The target DLT rate is
Six performance statistics are used to assess the operating characteristics of the three designs. The two main measurements, reflecting the accuracy and efficiency of a design, are the percentage of MTD selection and the percentage of patients treated at the MTD, for which the larger the better. The remaining four measurements quantify the safety aspects of a trial, which include the percentage of trials of selecting overdoses as the MTD, the percentage of patients allocated to overdoses, the risk of high toxicity (defined as the percentage of trials leading to the DLT rates greater than
The results on the MTD identification are shown in Figure 3. When the average probability difference around the target increases, all three methods perform better on the six measurements because the MTD is more easily distinguishable from its neighboring doses. In terms of the two main measurements on accuracy and efficiency, the CRM design performs best, while the CFO method ranks second. The gap diminishes when the average probability difference around the target increases. When the average probability difference is

Simulation results for the MTD identification based on
To better evaluate the characteristics of CFO, BOIN and CRM, we further investigate the operating characteristics of the three designs under six fixed representative dose–toxicity scenarios. The metrics of evaluation are the percentage of MTD selection, the number of patients allocated to each dose level and the percentage of patients experiencing DLT. For consistent comparisons, we adopt the same settings as the random scenarios. We also include the non-parametric optimal design as the benchmark,29,30 for which the non-selection rule is incorporated for a fair comparison. For each scenario, we replicate 5000 simulations and summarize the results in Table 2. Overall, the two algorithm-based methods, CFO and BOIN, yield more robust performances across the six scenarios. In particular, CFO performs slightly better than BOIN in terms of both the MTD selection and patient allocation in the first five scenarios. The model-based CRM appears to be sensitive to the parametric modeling structure, i.e., the matching between the model skeleton and the truth. For example, in scenario 3 where the truth is close to the CRM model skeleton, the CRM performs better than the other two methods with an increment of around
The percentage of MTD selection (the number of patients treated at each dose) under the CFO design in comparison with the BOIN and CRM under six fixed scenarios with the target toxicity probability 0.33 in boldface. None represents the percentage of trials of non-selection. Benchmark indicates the results under the non-parametric optimal design with complete information.
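The benchmark row relies on complete information: conceptually, every simulated patient carries a latent toxicity tolerance, so the DLT outcome at every dose is known. A rough sketch of that idea follows, assuming the usual construction with a uniform latent tolerance per patient and selection of the dose whose complete-information DLT rate is closest to the target. The non-selection rule mentioned above is omitted, and the scenario, sample size and function name are hypothetical.

```python
import random


def benchmark_mtd_selection(true_dlt_rates, target, n_patients, n_trials=2000, seed=1):
    """Percentage of simulated trials selecting each dose as the MTD when
    every patient's outcome at every dose is observed (complete information)."""
    rng = random.Random(seed)
    counts = [0] * len(true_dlt_rates)
    for _ in range(n_trials):
        # One latent tolerance per patient; a DLT occurs at a dose whenever
        # the tolerance falls below that dose's true DLT probability.
        tolerances = [rng.random() for _ in range(n_patients)]
        estimates = [
            sum(u < p for u in tolerances) / n_patients for p in true_dlt_rates
        ]
        # Closest estimate to the target; ties go to the lower dose.
        best = min(range(len(estimates)), key=lambda k: abs(estimates[k] - target))
        counts[best] += 1
    return [100.0 * c / n_trials for c in counts]


if __name__ == "__main__":
    # Hypothetical scenario in which the third dose has DLT probability 0.33.
    rates = [0.05, 0.15, 0.33, 0.50, 0.65]
    print(benchmark_mtd_selection(rates, target=0.33, n_patients=30))
```

Because no real design observes outcomes at untried doses, percentages obtained this way serve as an upper reference rather than an attainable performance level.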
In Appendix C.1, we also investigate the factors that influence the percentage of MTD selection in a dose-finding trial, using the analysis of variance (ANOVA) method of Cangul et al. (2009).31 The results also indicate that the CFO design strikes a good balance between efficiency and safety in our settings.
Toxicity and Efficacy Evaluation Under Random/Fixed Scenarios
We further compare the CFO design for identification of the OBD with the WT design,15 STEIN18 and the model adaptation (MADA) design17 in phase I/II clinical trials. We consider
To assess the four designs comprehensively, we evaluate them under randomly generated phase I/II scenarios. We first consider the umbrella-shape and plateau-shape dose–efficacy curves separately, and then we mix the two types of curves together to show the overall performance of the four designs. For the dose–toxicity curve, we still follow the generation method of Paoletti et al. (2004)28 and control the average probability difference around
The comparison mainly focuses on two important metrics: the percentages of OBD selection and OBD allocation. The results under the random scenarios are presented in Figure 4. The top row of Figure 4 shows the percentages of the OBD selection and allocation for the umbrella-shape scenarios. Among the four methods, MADA has the overall best performance in the OBD selection percentage, while CFO also shows satisfactory results. The WT design performs the best when the probability difference is

Simulation results for the OBD identification based on
We further assess the four designs under six fixed scenarios as shown in Figure 5, which include the plateau-shape (scenarios 1 and 2), umbrella-shape (scenarios 3 and 4) and monotone increasing (scenario 5) dose–efficacy relationships as well as the over-toxic (scenario 6) case. We adopt the same settings as the random scenarios and report the percentage of OBD selection and the number of patients allocated to each dose level as well as the percentage of patients experiencing DLT, the percentage of patients showing efficacy outcomes and the non-selection rate (i.e., the percentage of trials that do not select any dose as the OBD). To facilitate the comparison, we also add the non-parametric optimal design29,32,33 as the benchmark. Under each scenario, we carry out

Six simulation scenarios for assessing the CFO design in identification of the optimal biological dose (OBD). The dashed line is the dose–efficacy curve while the solid line is the dose–toxicity curve. The OBD is highlighted by asterisk in the
The percentage of OBD selection (the number of patients treated at each dose) under the CFO design in comparison with existing phase I/II dose-finding methods under six fixed scenarios in Figure 5. None represents the percentage of trials of non-selection. Benchmark indicates the results under the non-parametric optimal design with complete information.
In scenarios 1 and 2, where the dose–efficacy curves are plateau-shape, the WT design yields the highest percentage of OBD selection while CFO ranks second. The CFO design has a relatively small percentage of DLT in scenario 1 and the WT design appears to be the safest in scenario 2. The MADA design also leads to satisfactory results for the two plateau-shape scenarios. The STEIN design performs well in scenario 2 but poorly in scenario 1.
Under the umbrella-shape dose–efficacy curves of scenarios 3 and 4, the WT design similarly performs best in terms of OBD selection, while CFO yields the second highest percentage of OBD selection. With regard to safety, the WT design yields the best result in scenario 3 and CFO has the smallest percentage of DLT in scenario 4. The MADA and STEIN designs also demonstrate satisfactory results, but they are consistently worse than the CFO and WT designs. In scenario 5, where the MTD and OBD are identical, MADA has a significantly higher percentage of OBD selection than the other three designs, while CFO still delivers a decent performance in comparison with the WT and STEIN methods. The WT design performs rather poorly under this scenario, which may be due to model misspecification because it is a model-based method. When all the dose levels are overly toxic, as in scenario 6, CFO leads to the highest non-selection rate, while the performances of STEIN and WT are comparable. The MADA design has an extremely low non-selection rate and selects the first dose level most of the time, because it has no early stopping rule for futility. In the first five scenarios, there are large gaps between the four designs and the non-parametric optimal benchmark. Under the over-toxic scenario, the CFO, STEIN and WT designs have results comparable with the benchmark.
Aggregating the results under both the random and fixed scenarios, we conclude that the WT and CFO designs perform best overall in phase I/II trials. However, the performance of the WT design depends on the scenario, and it may perform rather poorly in some specific cases because of the potential risk of assuming a model-based structure. Because of its model-free and calibration-free nature, the CFO design leads to a more robust performance in the OBD-identification task than the other three methods. Although the STEIN design is also a model-free approach, it still requires specification of some design parameters, and thus it remains sensitive to certain dose–response scenarios. The performance of the MADA design varies dramatically as the scenarios change, and it yields fairly low percentages of OBD allocation because it is a two-stage design.
Real Trial Application
As an illustration, we apply the proposed CFO to redesign the aforementioned phase I/II trial of lenalidomide in combination with the high-dose melphalan. The trial enrolled a total of
We rerun this trial on the basis of the estimated DLT and efficacy rates using the CFO design, for which we set the target DLT rate as

Dose allocations and the corresponding toxicity and efficacy outcomes for the redesigned trial.
Discussion
We have proposed a new calibration-free odds design for phase I/II clinical trials to find the OBD for the targeted therapy and immunotherapy treatments. Identification of the MTD is a by-product of the CFO design, if we monitor the toxicity alone. Unlike other methods which monitor the toxicity data by considering either the current dose level only (e.g., the
Although minimization of
The early stopping rules used in CFO are not internal components of the design, and other rules may be adopted for safety and futility stopping.15 Our stopping rules follow the work of Yin et al. (2013) and Yin and Yang (2020),18,25 which deliver robust and good performances with the toxicity and futility cutoff values of
In the development of the CFO method, we only consider the case where the efficacy and DLT outcomes can be ascertained quickly after the treatment. However, it is straightforward to extend the CFO design to late-onset endpoints; for example, the CFO design can be combined with the so-called fractional imputation method,34,35 which warrants further development.
The R code for reproducing the simulation results is available at https://github.com/JINhuaqing/CFO-simu, and the one-trial implementation of the CFO design is accessible at https://github.com/JINhuaqing/CFO.
Supplemental Material
sj-pdf-1-smm-10.1177_09622802221079353 - Supplemental material for CFO: Calibration-free odds design for phase I/II clinical trials
Supplemental material, sj-pdf-1-smm-10.1177_09622802221079353 for CFO: Calibration-free odds design for phase I/II clinical trials by Huaqing Jin and Guosheng Yin in Statistical Methods in Medical Research
