Abstract
Introduction
Knowledge modeling in radiation therapy treatment planning has been heavily researched 1-19 and clinically implemented in recent years via commercial products such as RapidPlan, a clinical treatment planning guidance solution provided by Varian Medical Systems (Palo Alto, California). Knowledge models predict dosimetric parameters, such as 1-dimensional dose–volume points or 3-dimensional (3D) dose distributions, based upon patient anatomical features. Depending on how dose end points are generated, there are 2 major categories of knowledge modeling. The first category comprises anatomy-based methods, which use nearest neighbors (NN) to find the case most similar to the query case according to a similarity metric. Atlas-based treatment planning modeling 20,21 falls into this category. Plan-related parameters such as the fluence map used by Good et al 20 or the 3D dose distribution used by Sheng et al 21 are transferred to the query case’s anatomy with appropriate fine-tuning. High similarity between the atlas case and the query case is what makes decent dose prediction accuracy possible. The second category comprises statistics-based methods, which use statistical regression and other machine learning models to learn a closed-form solution to the anatomy-dosimetry relation. Yuan et al 1 used the distance-to-target histogram (DTH) to describe the geometric features between the organ-at-risk (OAR) and the planning target volume (PTV). The DTH-based piece-wise regression model is the fundamental solution employed by RapidPlan. 22 Wu et al 3 developed the overlap volume histogram (OVH) to describe the geometric relation between the OAR and the PTV. Appenzoller et al 2 investigated a multiple normal regression model to predict the dose level at a certain distance. All of the aforementioned methods rely on a substantial number of training cases to fully capture the relation between anatomy and dosimetry features.
Translating knowledge modeling into clinical practice requires substantial effort to guarantee the overall robustness of the model. Thorough analysis and validation of the knowledge model require efforts in both treatment planning and statistical modeling. Knowledge models are typically approximations of highly nonlinear maps. They require a sufficient number of cases for training and usually perform accurately when new cases are inliers in the feature space. Since there is intrinsic plan quality variation even within a single institution, high-quality plans must be sifted out for inclusion in model training. Delaney et al, 23 Tol et al, 24 and Sheng et al 25 analyzed the effect of outliers in knowledge models. These studies found that outliers could adversely impact model quality. Delaney et al and Tol et al focused on dosimetric outliers, while Sheng et al focused on anatomical outliers. Changes in PTV delineation, such as moving from treating the prostate only to treating the pelvic lymph nodes (LN) as well, could introduce anatomical outliers to the knowledge model and subsequently degrade the model prediction accuracy, as demonstrated by Sheng et al. However, no comprehensive strategy has yet been provided to handle these anatomical outliers in practice. In addition, the robustness of the statistical model itself needs to be improved to account for variation in the training data set. One solution is to identify outliers and exclude them from modeling. A limitation of this approach is that it does not increase the model’s capability to handle such outlier cases. If a similar outlier case occurs again, it may again be excluded from modeling or prediction.
In order to improve the model’s overall robustness, especially when dealing with novel anatomy, we propose a case-based reasoning framework for knowledge modeling. In this article, cases with novel anatomy, for example, the prostate-plus-LN cases in the present study, are also referred to as outliers, or geometric outliers in the terminology of Sheng et al. 25 Geometric outliers are often called anatomical outliers. They contrast with dosimetric outliers, which are similar in anatomy but vary in plan quality. The novel anatomy, or anatomical outlier, contrasts with the inlier prostate cases primarily in the PTV definition. A prostate intensity-modulated radiation therapy (IMRT) plan generally treats a PTV that includes the prostate and seminal vesicles. 26 A prostate-plus-LN plan, however, requires treatment of the pelvic LN in addition to the prostate and seminal vesicles. 25 An example of a major difference in PTV delineation is depicted in Figure 1. Therefore, while both are considered prostate cases, prostate and prostate-plus-LN cases differ in target delineation, which could affect the dose distribution as well as the OAR dose.

Example of anatomy comparison between prostate (A and B) PTV and prostate-plus-LN PTV (C and D). A, An example of axial slice of a prostate case which goes through the middle of the prostate. B, The axial slice of the same case as (A), which goes through the seminal vesicles. C, An example of axial slice of a prostate-plus-LN case which goes through the middle of the prostate. D, The axial slice of the same case as (C), which goes through the pelvic LN. LN indicates lymph node; PTV, planning target volume.
Case-based reasoning originates in artificial intelligence (AI) research as an effective framework for providing solutions to novel tasks. It consists of 4 closed-loop steps (the 4-R steps), namely “Retrieve”, “Reuse”, “Revise”, and “Retain”. 27 Retrieve aims to recall the experience most relevant to solving the current task by identifying a matching case that is, in some sense, the most similar to the current query case. Reuse refers to applying the solution from previous experience recorded in the matching case to the current task. Revise means adopting certain modifications to the previous solution to better solve the current task. Retain refers to storing the current experience and possibly updating the available knowledge for future practice. Implemented in knowledge modeling, this closed-loop solution could accumulate valuable knowledge over time and improve the capability of the knowledge model to handle more anatomical variation. The proposed case-based reasoning framework could provide a better understanding and utilization of machine learning and AI models in radiation therapy. As general AI applications evolve from shallow learning to deep learning, these AI tools become black boxes from the user’s perspective. It is therefore increasingly important that the user understands when and how to use the tool in each individual scenario. This step needs to be well studied and understood before machine learning and AI tools can be safely deployed in clinical applications. This study is the first attempt to introduce a case-based reasoning framework in radiation therapy knowledge modeling.
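The 4-R loop described above can be sketched generically as follows. This is only an illustrative skeleton and not the implementation used in this study: the names `Case` and `CaseBase`, the squared Euclidean similarity, the scalar "solution", and the additive correction in the Revise step are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Case:
    features: Tuple[float, ...]  # descriptors of the case (hypothetical)
    solution: float              # stored solution; a single number for illustration


@dataclass
class CaseBase:
    cases: List[Case] = field(default_factory=list)

    def retrieve(self, query: Sequence[float]) -> Case:
        """Retrieve: return the stored case most similar to the query.
        Similarity is assumed to be squared Euclidean distance."""
        return min(
            self.cases,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(c.features, query)),
        )

    def reuse(self, match: Case) -> float:
        """Reuse: take the matched case's solution as the starting point."""
        return match.solution

    def revise(self, proposed: float, correction: float) -> float:
        """Revise: adapt the reused solution to the query (placeholder: additive)."""
        return proposed + correction

    def retain(self, query: Sequence[float], final_solution: float) -> None:
        """Retain: store the solved query so future retrievals can use it."""
        self.cases.append(Case(tuple(query), final_solution))


# One pass through the loop with made-up numbers.
base = CaseBase([Case((0.0, 0.0), 10.0), Case((5.0, 5.0), 20.0)])
query = (4.0, 6.0)
match = base.retrieve(query)
final = base.revise(base.reuse(match), correction=-1.5)
base.retain(query, final)
```

After the Retain call the case base holds three cases, so a later query near (4, 6) would retrieve the newly stored case rather than the original neighbor.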
In this study, we applied the case-based reasoning framework to radiation therapy knowledge modeling. The framework is capable of handling different scenarios, evolving from when cases with the novel anatomy are scarce to when an ample number of such cases has accumulated. We used pelvic cases in this study to demonstrate the concept.
Materials and Methods
Materials
A total of 105 pelvic IMRT plans were retrospectively selected for this study. Eighty plans are clinical prostate IMRT plans. The other 25 plans are clinical prostate-plus-LN IMRT plans. Prostate-plus-LN IMRT plans were included to mimic novel anatomy with respect to prostate-only plans.
Knowledge Model Design
The current framework involves 2 types of predictive models, namely the multiple stepwise regression model and the atlas-based model. The multiple stepwise regression model 1 has been commonly used as the methodology for knowledge modeling. For most inlier cases, that is, the prostate cases in this study, the framework will retrieve a regression-based model for prediction. For initial cases with novel anatomy, that is, the prostate-plus-LN cases, the framework adopts a dose atlas constructed from the limited available training cases for prediction. When the number of cases with novel anatomy grows to a sufficient size, the Retain step of the case-based reasoning framework generates a regression-based model for future retrieval. Similar cases in the future will no longer be considered novel anatomy by the framework. In order to demonstrate the versatile workflow for various scenarios, we simulated 4 scenarios, namely the scarce, semiscarce, semiample, and ample scenarios.
Regression model
In this study, we implemented the stepwise multiple-regression knowledge model proposed by Yuan et al 1 to predict the OAR’s dose–volume histograms (DVHs). The regression model predicts the first 3 principal components (PCs) of the DVH based on a set of carefully designed anatomical features, which include the first 3 PCs of the DTH, the OAR–PTV overlapping ratio, the OAR outside-treatment-field ratio, the OAR volume, and the PTV volume. The stepwise multiple-regression model selects features in a forward selection manner. A feature is included in the model if its inclusion provides a statistically significant improvement of fit. Once the features are selected, a multiple linear regression using all selected features is performed to establish the model.
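The forward selection procedure described above can be illustrated with a small, self-contained sketch. It is not the model of Yuan et al: a fixed F-to-enter threshold (`f_to_enter=4.0`) stands in for the significance test, because the entry criterion is not stated here, and the function names and the plain normal-equation solver are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is non-singular (no collinear features)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_sse(rows: Sequence[Sequence[float]], y: Sequence[float],
            cols: List[int]) -> Tuple[List[float], float]:
    """Least-squares fit of y on an intercept plus the chosen feature columns.
    Returns (coefficients, sum of squared errors)."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    sse = sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, t in zip(design, y))
    return beta, sse


def forward_stepwise(rows: Sequence[Sequence[float]], y: Sequence[float],
                     f_to_enter: float = 4.0) -> Tuple[List[int], List[float]]:
    """Add, one at a time, the feature that most reduces the SSE, as long as
    its partial F statistic reaches f_to_enter (an assumed threshold)."""
    n, p = len(rows), len(rows[0])
    selected: List[int] = []
    _, current_sse = fit_sse(rows, y, selected)
    while len(selected) < p and current_sse > 1e-12:
        trials = [(fit_sse(rows, y, selected + [c])[1], c)
                  for c in range(p) if c not in selected]
        best_sse, best_col = min(trials)
        dof = n - len(selected) - 2  # residual degrees of freedom after adding
        if dof <= 0:
            break
        if best_sse <= 1e-12:
            f_stat = float("inf")
        else:
            f_stat = (current_sse - best_sse) / (best_sse / dof)
        if f_stat < f_to_enter:
            break
        selected.append(best_col)
        current_sse = best_sse
    beta, _ = fit_sse(rows, y, selected)  # final fit on all selected features
    return selected, beta
```

In this sketch each row of `rows` would hold the anatomical features of one training case and `y` one DVH principal-component score; a separate model would be fitted for each predicted PC.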
Case-based reasoning using atlas
In addition to conventional regression knowledge modeling, we proposed a case-based reasoning framework that incorporates dose atlas guidance to supplement the regression model. We hypothesize that introducing case-based reasoning with an atlas-based method could improve the overall model prediction accuracy. Atlas-based dose guidance has been implemented by Sheng et al 21 for prostate cases. In this study, we construct a case atlas for prostate-plus-LN cases, which call for different anatomy features/descriptors. The case-based dose atlas was constructed using
where
To build the atlas, the anatomy pattern was extracted via 3 anatomical features: topological connectivity, nodal separation, and nodal length, as illustrated in Figure 2. Topological connectivity

Illustration of parameterization of anatomy pattern for prostate-plus-LN anatomy. First row denotes disconnected LN in the superior portion of PTV; second row denotes connected LN in the superior portion of PTV. The other 2 features are nodal separation which is the center-of-mass distance between left and right branch of nodal PTV, and nodal length which is defined as the superior–inferior length of the nodal PTV. LN indicates lymph node; PTV, planning target volume.
Each of N training prostate-plus-LN cases served as a dose atlas case. For each of query prostate-plus-LN case
where

Flowchart for dose atlas construction (top box) and case-based reasoning dose guidance (bottom box).
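The atlas retrieval step can be sketched as a nearest-neighbor search over the 3 anatomical features named above. The matching rule below is an assumption made for illustration, since the matching metric is not recoverable from the text here: candidates are restricted to atlas cases with the same topological connectivity, and the closest one is chosen by Euclidean distance in nodal separation and nodal length. The names `AnatomyPattern` and `match_atlas` and all numbers are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Sequence


class AnatomyPattern(NamedTuple):
    case_id: str
    connected: bool          # topological connectivity of the nodal PTV
    nodal_separation: float  # left-right center-of-mass distance (any consistent unit)
    nodal_length: float      # superior-inferior length of the nodal PTV (same unit)


def match_atlas(query: AnatomyPattern,
                atlas: Sequence[AnatomyPattern]) -> Optional[AnatomyPattern]:
    """Return the atlas case closest to the query, or None if no atlas case
    shares the query's connectivity (assumed matching rule)."""
    candidates = [a for a in atlas if a.connected == query.connected]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda a: math.hypot(a.nodal_separation - query.nodal_separation,
                                 a.nodal_length - query.nodal_length),
    )


# Made-up feature values for a 3-case atlas and one query case.
atlas = [
    AnatomyPattern("atlas-1", True, 10.0, 8.0),
    AnatomyPattern("atlas-2", False, 12.0, 9.0),
    AnatomyPattern("atlas-3", True, 14.0, 12.0),
]
query = AnatomyPattern("query", True, 13.0, 11.0)
best = match_atlas(query, atlas)
```

In the framework described here, the matched atlas case's dose would then be warped to the query anatomy; that registration step is outside the scope of this sketch.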
Scarce Scenario Simulation
The

Flowchart of case-based reasoning framework for knowledge modeling using atlas-based guidance for prostate-plus-LN anatomy. LN indicates lymph node.
Semiscarce, Semiample, and Ample Scenario Simulation
Retaining experience into memory, by adding current query case into the knowledge model training pool, should ideally boost model performance in the future practice. In order to validate this hypothesis, we simulated the

Workflow for retaining cases in regression model and atlas-based model. The number in the parenthesis indicates the number of cases available in that model.
We also evaluated whether the regression model can take over from the atlas-based model under different scenarios. The regression model offers several advantages over the atlas-based model. For example, once trained and validated, the regression model is easily transferable, since its key information is the set of fitting parameters. The atlas-based model, in contrast, needs to carry the image, structure, and dose information at all times, making model transfer difficult without protocols. We hypothesize that the regression model can take over from the atlas-based model if (1) a statistically significant improvement over the 5-case regression model is established and (2) no statistical significance is observed between the regression model and the atlas-based model of the corresponding size. A one-tailed Wilcoxon rank-sum test was performed for each comparison.
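The two-part takeover criterion can be sketched as below. This is an illustration under stated assumptions rather than the analysis code of this study: the rank-sum p-value uses the large-sample normal approximation with average ranks for ties and no tie correction, the significance level `alpha=0.05` is assumed, and condition (2) is read as "the atlas-based model is not significantly better than the regression model". The function names are hypothetical.

```python
import math
from typing import List, Sequence


def rank_sum_p_less(x: Sequence[float], y: Sequence[float]) -> float:
    """One-tailed Wilcoxon rank-sum p-value for the alternative that values
    in x tend to be smaller than values in y (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks: List[float] = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z <= z)


def regression_can_take_over(ssr_regression: Sequence[float],
                             ssr_regression_5case: Sequence[float],
                             ssr_atlas: Sequence[float],
                             alpha: float = 0.05) -> bool:
    """Criterion (1): the regression SSRs are significantly lower than those
    of the 5-case regression model. Criterion (2): the atlas SSRs are not
    significantly lower than the regression SSRs (assumed direction)."""
    improved = rank_sum_p_less(ssr_regression, ssr_regression_5case) < alpha
    atlas_better = rank_sum_p_less(ssr_atlas, ssr_regression) < alpha
    return improved and not atlas_better
```

Each argument would be the list of per-case DVH SSR values for one model on the same validation cases; with small samples an exact rank-sum test would be preferable to the normal approximation used here.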
Performance Evaluation
The regression model and atlas-guided prediction were evaluated for DVH accuracy using the sum of squared residual (SSR) between the predicted DVH and the clinical plan’s DVH.
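As a rough illustration of the SSR metric, the sketch below sums the squared differences between a predicted and a clinical DVH. It assumes both curves are cumulative DVHs sampled at the same dose bins and expressed as fractional volume between 0 and 1; the binning and any normalization used in this study are not stated here, so those choices, and the function name `dvh_ssr`, are assumptions.

```python
from typing import Sequence


def dvh_ssr(predicted: Sequence[float], clinical: Sequence[float]) -> float:
    """Sum of squared residuals between two DVH curves sampled at the same
    dose bins (fractional volume per bin is assumed)."""
    if len(predicted) != len(clinical):
        raise ValueError("DVH curves must be sampled at the same dose bins")
    return sum((p - c) ** 2 for p, c in zip(predicted, clinical))


# Made-up 5-bin example: the curves differ by 0.1 at two bins.
predicted_dvh = [1.0, 0.8, 0.5, 0.2, 0.0]
clinical_dvh = [1.0, 0.7, 0.4, 0.2, 0.0]
ssr = dvh_ssr(predicted_dvh, clinical_dvh)  # approximately 0.02
```

A lower SSR means the predicted DVH follows the clinical DVH more closely over the whole dose range.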
Results
Scarce Scenario
Of the 20 validation cases, 13 prostate-plus-LN cases were identified as outliers and were subsequently guided by the dose atlas (8 cases needed case-based reasoning guidance for bladder, 7 cases needed case-based reasoning guidance for rectum, and 2 cases needed both). The 2D feature map (nodal length and nodal separation) for all training and validation prostate-plus-LN cases is shown in Figure 6. Connectivity is color-coded with connected group colored by red and nonconnected group colored by blue. Atlas-to-query match is shown by the black line connecting the atlas case (square) and the query case (diamond). Query cases without line connection mean that they were identified as inliers and were subsequently predicted using the regression model. Figure 7 shows the DVH SSR comparison between the regression model and dose atlas guidance for all 13 outlier validation cases. For the bladder, the DVH SSRs from 5-case atlas guidance (0.174 ± 0.166) were significantly lower than those (0.459 ± 0.508) from the regression model trained with 5 prostate-plus-LN cases added (

Two-dimensional feature space (nodal length and nodal separation) showing atlas and query cases. Atlas case is square mark and query case is diamond mark. Node connectivity is color coded (blue is disconnected and red is connected). Line connecting atlas and query cases denotes that atlas-based model was invoked and the atlas and query cases were matched.

Boxplot of DVH SSR comparison between regression knowledge model prediction and case-based dose atlas prediction. Left boxplot shows the comparison for the bladder and right boxplot shows the prediction for the rectum. The blue box denotes interquartile range. Red bar in the box denotes the median value. DVH indicates dose–volume histogram; SSR, sum of squared residual.
Figure 8 shows the DVHs of one example case guided by case-based reasoning. Green lines are DVHs for the bladder and brown lines are DVHs for the rectum. Solid lines are the clinical plan’s DVHs. Long-dashed lines are DVH predictions given by atlas-based guidance as a part of the case-based reasoning framework. Short-dashed lines are regression model-based predictions. For the low-dose and high-dose regions, all 3 DVH groups (clinical plan, case-based prediction, and regression model prediction) agreed with each other for both OARs. In the intermediate dose region, however, the regression model overpredicted (∼10%) for the bladder and rectum when compared to the clinical DVH. Case-based prediction, on the other hand, agreed well with the clinical DVH.

DVH comparison among clinical plan (solid line), regression model prediction (dashed line), and atlas-guided prediction (dotted line). Green DVH is bladder and brown DVH is rectum. Atlas-guided prediction agrees better with clinical DVH than regression model prediction, especially for median dose level. DVH indicates dose–volume histogram.
Semiscarce, Semiample, and Ample Scenario
For the

Boxplots of DVH error, sum of squared residual, for regression model
Workflow for Using Case-Based Reasoning
We propose here a complete workflow for using case-based reasoning-assisted knowledge modeling for pelvic cases. When novel anatomy initially arises, for example, for the initial 5 prostate-plus-LN cases, the regression model does not predict well for these outlier cases, as demonstrated in the studies by Delaney et al, 23 Tol et al, 24 and Sheng et al. 25 Since no prior knowledge exists, we recommend relying on human intelligence to provide clinical solutions in these instances. The planner’s interaction with novel cases can help feed new knowledge back to the knowledge model in the future model training/refining process. When novel cases accumulate to 5, an atlas size recommended by Sheng et al, 21 the case-based reasoning framework will adopt a dose atlas to provide prediction guidance as proposed in the Knowledge Model Design section. As more novel cases arise, the case-based framework will retain them in the training pool for the regression model while still maintaining the atlas-based method. When the number of novel cases reaches 20, the regression model can be learned and can function independently with satisfactory accuracy. The entire workflow is illustrated in Figure 10.

Case-based reasoning workflow for handling pelvic novel anatomy cases for different scenarios when using knowledge models. When the available prostate-plus-LN cases are less than 5, manual planning is encouraged. Atlas-based case-based reasoning is effective when the number of available novel cases is between 5 and 20. Case-based reasoning framework retires after 20 novel cases are accumulated and the regression model is solely responsible for prediction.
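The workflow above reduces to a simple selection rule on the number of available novel cases. The sketch below encodes the thresholds of 5 and 20 given in the text; how the boundaries are handled (exactly 5 cases goes to the atlas, exactly 20 goes to the regression model) is an assumption, as are the function name and the returned labels.

```python
def select_predictor(n_novel_cases: int,
                     min_atlas_cases: int = 5,
                     min_regression_cases: int = 20) -> str:
    """Pick the prediction route from the number of accumulated novel cases.
    Boundary handling at 5 and 20 is an assumed reading of the workflow."""
    if n_novel_cases < 0:
        raise ValueError("number of cases cannot be negative")
    if n_novel_cases < min_atlas_cases:
        return "manual planning"
    if n_novel_cases < min_regression_cases:
        return "atlas-based guidance"
    return "regression model"


# Route chosen at several stages of case accumulation.
routes = {n: select_predictor(n) for n in (0, 4, 5, 19, 20, 50)}
```

Keeping the two thresholds as parameters reflects the point made in the text that the number of cases needed for the regression model to saturate could vary between treatment sites.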
Discussion
In this study, we proposed a case-based reasoning framework for a radiation therapy knowledge model. Results showed that case-based prediction achieved better accuracy than the regression model when dealing with novel anatomy cases. Results also showed that retaining these novel cases in the regression model boosted the prediction accuracy of the regression model for future query cases. This study demonstrated that case-based reasoning that judiciously combines atlas-based prediction and regression-based prediction can help improve the overall robustness of knowledge-based modeling, especially when the existing data in the system are sparse or the new observation is novel to the existing system. In addition, the closed-loop Retain step helps the knowledge-based model learn the novel anatomy pattern so that it can generalize to more cases. This study demonstrated that the 4-R steps of case-based reasoning can be implemented under the knowledge-based modeling framework to make it more robust and less prone to erroneous generalization for novel unseen cases. We also provided a systematic workflow to guide generating and/or predicting dose for novel anatomy under various scenarios. When the number of novel cases is small (eg, fewer than 5), manual planning is encouraged to leverage human knowledge for the interpretation of novel anatomy. As novel cases accumulate to a sufficient size (eg, 20 or more), a regression model provides good prediction accuracy. An atlas-based model is primarily useful between 5 and 20 novel cases, a range where the novel knowledge is rapidly growing from the regression model’s perspective. The proposed case-based reasoning framework addresses a major drawback of the conventional case-based and atlas-based knowledge models, which require a large database of prior cases and are usually specific to a single treatment site or scenario.
The case-based reasoning framework could potentially integrate multiple regression models and multiatlas- (or case) based models into 1 overall knowledge modeling framework that can provide treatment planning guidance for various cancer sites. With the case-based reasoning framework, each case can be assigned to a specific local model, which is part of the general model. We are actively working along this direction.
The rationale of case-based reasoning originates from mimicking a human planner’s behavior when dealing with novel cases. A human planner’s behavior is based on memory of training with similar cases. A good planner is capable of creating effective strategies based on past experience. Machine modeling currently repeats the first step by analytically parsing the anatomy and dosimetry relation, and as long as the anatomy pattern is within the range of the training data, the prediction is mostly reliable. However, many patients have to be analyzed case by case, and such cases are often referred to as new knowledge. This is where case-based reasoning helps to improve the system’s overall robustness. The “scarce scenario” is also commonly seen in clinical settings and needs to be dealt with. Therefore, we believe the case-based reasoning framework provides a systematic approach that takes advantage of both the regression model and the atlas-based method to build an overall enhanced and dynamically adaptive modeling scheme. The regression-based model requires a sufficient number of training cases to reach optimal prediction accuracy, while the atlas-based approach can provide case-by-case guidance even if the novel knowledge is scarce. On the other hand, as the number of novel cases increases, both approaches show similar prediction accuracy, with the regression model showing practical advantages. Once the regression model is trained, the training cases can be released from the model and the model can be easily transferred as a set of model parameters. Prediction is also faster with the regression model, because the atlas-based approach requires deformable registration and dose transfer. We believe the dual-model system is versatile and can adapt as the pool of available cases evolves.
This study is the first attempt to introduce case-based reasoning in radiation therapy knowledge modeling. The proposed case-based reasoning framework also fills a gap in translating knowledge models into effective clinical applications. While the specific design of the 4-R steps could vary for different knowledge models and for different clinical scenarios, the general principles of an intelligent system that learns from novel cases and accumulates new knowledge should remain the same and are well captured in the 4-R steps. We hope the introduction of the case-based reasoning framework will provide a valuable foundation and inspire the future practice of handling knowledge models in complex clinical settings that will inevitably encounter novel scenarios. We anticipate that in the near future AI-based tools will be widely implemented and accepted in the clinic, and this study completes the final step of translating such tools into the clinic.
The 4-R steps of the case-based reasoning framework add a layer on top of the original knowledge models, which are known for inferior performance when predicting novelties. Specifically, the first 3 R steps address predicting and generating guidance for novel cases, and the last R step, Retain, is responsible for feeding new knowledge back to the knowledge model behind the scenes. The 4-R steps within the case-based reasoning framework work collaboratively with each other and should not be separated.
We noticed in the results that case-based reasoning provided less improvement for the rectum than for the bladder. The geometry change from treating the prostate only to treating prostate-plus-LN affects the bladder more than the rectum. As shown in Figure 1, the pelvic LN wraps around the bladder and changes the dose gradient inside the bladder entirely when compared to prostate cases. The rectum, on the other hand, is less affected, since the PTV shape around the rectum remains similar even with the inclusion of the pelvic LN in the PTV, although treating a more superior component on top of the prostate results in a scaling effect on the DVH for the rectum. This is probably why the prostate model can still acceptably predict for the rectum in the prostate-plus-LN cases.
Case-based prediction showed superior accuracy to the regression model for outlier/novel geometry, as shown by the pelvic cases in this study. One limitation of the statistical regression knowledge-based model is that it needs a certain number of training cases to saturate for accurate prediction. 24 This number could vary for different treatment sites. This makes the regression model difficult to adapt to new patient cases when deployed clinically. The model has to be thoroughly evaluated and validated for all possible anatomy geometries before being released for use, and even then, the generalizability of the model will always have limits. Sometimes such validation is not feasible due to the lack of cases from particular treatment sites. Alternatively, we can implement the case-based reasoning framework that incorporates an atlas-based model to boost the overall performance. In this study, we used 3 shape descriptors to cluster the high-dimensional shape feature space. The entire space was clustered into 5 subspaces, with each atlas case responsible for predicting the cases for which it is the NN. Combined with deformable image registration, the warped dose from the atlas case can serve as a reasonable and clinically relevant prediction for the query novel anatomy. The regression model combined with the case-based reasoning framework is robust overall, whether the data are sparse or not.
We constructed the current atlas based solely on the PTV’s geometry. We did not include the shape features of the OAR in constructing the atlas. The reason is 2-fold. First, the PTV shape is highly variable for prostate-plus-LN cases. Since the intermediate-to-high dose level should be conformal to the PTV, shape descriptors for the PTV could best categorize all cases to better guide the subsequent dose warping, giving the warped dose a reasonable fall-off around the target that is achievable by the optimization. Second, the OAR spatial location relative to the target is relatively similar for pelvic cases. This assumption may not hold true for other treatment sites, such as gastrointestinal cases where the bowel can form any shape around the target. To expand the framework to other treatment sites, special consideration, such as site-specific handcrafted features, is needed when constructing the atlas to best reflect the relation between the dose and the target/OAR shape features. A substantial amount of effort is needed in this regard, and it could be a limitation for deployment in many clinics as it stands. Developing a transferable case-based reasoning framework that respects patient privacy and data transfer protocols is an option. Future research along this line is warranted.
This study demonstrated the feasibility of implementing a case-based reasoning framework using pelvic cases. The case-based reasoning framework could potentially be more important and meaningful for other treatment sites. Anatomy at other sites commonly varies more than in pelvic cases, so more cases are needed for the regression model to saturate. However, the available cases are often extremely sparse, such as for liver or pancreatic stereotactic body radiation therapy. Case-based reasoning would be helpful in this context for making decisions about dose constraints for the OAR or even dose-sparing tradeoffs among OARs. These are current challenges for clinical implementation of knowledge-based modeling, and case-based reasoning offers a solution. Further research along this line is under way.
Retaining novel cases did improve the performance of the regression model. This observation echoes the fact that the regression model needs a certain number of training cases to saturate and predict accurately. Interestingly, we found that when retaining up to 10 or 15 cases, the regression model was not statistically different from the atlas-based model. However, the regression model continued to improve (red median bar in Figure 9) as more cases were retained, and with 20 cases, statistical significance was observed in the difference. These results suggest that the regression model could replace the atlas-based model when the number of cases reaches 20. Adding novel cases to the regression model expands the feature space covered by the regression model and subsequently reduces the chance of seeing novel anatomy in future practice, which makes the regression model more robust against outliers. As more and more cases are retained, the chance of seeing an outlier case becomes so small that the regression model reaches saturation for the specific treatment site. However, as new treatment techniques and treatment modalities arise, novel dose-anatomy patterns could appear again in the current model’s context. The 4-R steps of case-based reasoning allow the framework to repeat the cycle of learning and accumulating new knowledge.
Conclusion
In this study, a case-based reasoning framework was proposed and constructed that properly combines the use of a regression model for inlier cases (eg, prostate cases) and a dose atlas for novel cases (eg, prostate-plus-LN cases). The dose atlas served as a better prediction model when the regression-based knowledge model was not suitable for prediction. Results showed that dose atlas guidance had superior prediction accuracy over the regression model when the number of novel cases available is limited. A versatile workflow was provided to handle novel anatomy at different case number levels for pelvic plans. Establishing the case-based reasoning framework has the potential to improve the overall robustness of the clinical application of knowledge models.
