Abstract
Introduction
β-Thalassemia major (β-TM) is an autosomal recessive genetic disorder characterized by reduced or absent synthesis of β-globin chains. Currently, hematopoietic stem cell transplantation (HSCT) remains the only definitive curative therapy for β-TM. 1 However, HSCT significantly increases the risk of viral infections, and cytomegalovirus (CMV) infection is the most prevalent viral complication, affecting up to 70% of transplant recipients.2,3 CMV infection can cause severe tissue and organ disease, impair immune function, increase the risk of Epstein-Barr virus (EBV) reactivation and subsequent lymphoproliferative disorders, and adversely affect graft function and transplantation outcomes. 4
Multiple risk factors contribute to the occurrence of CMV infection after HSCT, including older recipient age, acute graft-versus-host disease (aGVHD), T-cell depletion protocols, and donor-recipient mismatch or unrelated donors. 5 Despite these recognized factors, to our knowledge, no predictive model has yet been established specifically to identify β-TM patients at high risk for CMV infection after HSCT. Such a predictive model is critically needed to enable clinicians to stratify risk early, initiate timely prophylactic or preemptive interventions, and thereby potentially improve patient outcomes.
In this study, we aimed to develop and validate a practical, reliable scoring system to predict the risk of CMV infection after HSCT specifically in patients with β-TM. Using clinical data from our center and an external validation cohort, we identified independent risk factors, namely serum albumin level, donor type, and severe (grade III–IV) aGVHD, and assigned each a corresponding risk score. To our knowledge, this scoring system is the first predictive tool tailored to this patient population, giving clinicians a straightforward method for timely risk assessment and targeted management aimed at reducing CMV infection rates and improving transplantation outcomes.
Patients and methods
Study patients
Between July 2020 and November 2023, 291 patients with β-thalassemia major underwent allo-HSCT at the First Affiliated Hospital of Guangxi Medical University. All patients received a standardized conditioning regimen of fludarabine, anti-thymocyte globulin (ATG), busulfan, and cyclophosphamide, and GVHD prophylaxis consisted of cyclosporine or tacrolimus, mycophenolate mofetil, and methotrexate. 6 Patients were followed up regularly until May 31, 2024. 5 These 291 patients formed the derivation (training) cohort used to develop the prediction model; 84 of them developed CMV infection after HSCT. The model’s applicability was assessed by both internal and external validation.
The sample size was estimated as 5–10 times the number of variables in the model, and the total sample size was then determined from the incidence of the endpoint event. The external validation set was sized at approximately 30% of the training set. 7 On this basis, 84 β-TM patients who met the inclusion criteria and underwent allo-HSCT at Liuzhou Workers’ Hospital between September 2019 and August 2022 formed the external validation cohort; 17 of them developed CMV infection after transplantation. Their conditioning regimen comprised fludarabine, busulfan, and cyclophosphamide, and their treatment protocol also included ATG, cyclosporine, mycophenolate mofetil, and methotrexate. All patients received hydroxyurea before transplantation.
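The sample-size reasoning above can be made concrete with a little arithmetic. This is a minimal sketch under stated assumptions: the "5–10 times" rule is read as an events-per-variable (EPV) rule applied to the three predictors retained in the final model, the observed training incidence (84/291) stands in for the anticipated incidence, and all function names are hypothetical. The text does not state exactly how the calculation was carried out.

```python
import math


def events_needed(n_predictors: int, events_per_variable: int) -> int:
    """Outcome events required under an events-per-variable (EPV) rule."""
    return n_predictors * events_per_variable


def patients_needed(n_events: int, incidence: float) -> int:
    """Total patients needed so that the expected number of events reaches n_events."""
    return math.ceil(n_events / incidence)


def external_target(training_n: int, fraction: float = 0.30) -> int:
    """External validation size as a fraction of the training set (30% here)."""
    return round(training_n * fraction)


# Assumption: the observed training-set incidence is used as the anticipated incidence.
incidence = 84 / 291
for epv in (5, 10):
    # Assumption: 3 = number of predictors in the final nomogram.
    events = events_needed(3, epv)
    print(f"EPV {epv}: {events} events, {patients_needed(events, incidence)} patients")
print("external validation target:", external_target(291))
```

Under these assumptions the EPV rule calls for 52–104 patients, and the 30% rule gives about 87 patients, close to the 84 actually recruited. If all 31 candidate variables were counted instead, the same rule would require 155–310 events.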
Inclusion criteria: (1) severe β-thalassemia confirmed by genetic testing; (2) a negative CMV-DNA test before transplantation; (3) negative serology for human immunodeficiency virus (HIV), hepatitis C virus (HCV), and hepatitis B virus (HBV). Exclusion criteria: (1) a positive CMV-DNA test before transplantation; (2) a second transplantation; (3) a diagnosis of α-thalassemia; (4) incomplete clinical data. The study design is depicted in Figure 1.
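The eligibility rules above amount to a simple record filter. The sketch below encodes them over a hypothetical patient record; the field names are illustrative assumptions, not the study's data schema. Inclusion criterion (2) and exclusion criterion (1) express the same condition, so the CMV-DNA check appears on both sides without changing the result.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields; the study's actual data model is not described.
    severe_beta_tm_confirmed: bool   # severe beta-thalassemia confirmed by genetic testing
    cmv_dna_positive_pre_hsct: bool  # CMV-DNA result before transplantation
    hiv_hcv_hbv_negative: bool       # all three serologies negative
    second_transplant: bool
    alpha_thalassemia: bool
    clinical_data_complete: bool


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion and exclusion criteria listed in the text."""
    included = (c.severe_beta_tm_confirmed
                and not c.cmv_dna_positive_pre_hsct
                and c.hiv_hcv_hbv_negative)
    excluded = (c.cmv_dna_positive_pre_hsct
                or c.second_transplant
                or c.alpha_thalassemia
                or not c.clinical_data_complete)
    return included and not excluded


ok = Candidate(True, False, True, False, False, True)
retransplant = Candidate(True, False, True, True, False, True)
print(is_eligible(ok), is_eligible(retransplant))  # True False
```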

Figure 1. Flow diagram illustrating the development of the CMV infection model.
Ethics approval and consent to participate
The study was conducted in accordance with the principles of the Declaration of Helsinki and received ethical approval from the Institutional Review Board of The First Affiliated Hospital of Guangxi Medical University (approval number: 2025-E0022). Written informed consent was obtained from all participants before their involvement in the study. Adult participants (aged 18 years or older) provided consent themselves, whereas for minors (younger than 18 years), consent was provided by their legally authorized representatives.
Data collection and definitions
This study utilized a retrospective cohort design to investigate the outcomes of interest. A total of 31 variables were systematically collected, encompassing general patient information, transplantation-related drug regimens, graft characteristics and engraftment status, post-transplantation complications, and pre-transplantation blood biochemistry test results. All data were retrieved from the hospital’s electronic medical record system. A summary of these variables is presented in Table 1. Blood biochemical test results were extracted from reports generated within 24 h following patient admission.
Table 1. Comparison of clinical data between the non-CMV infection group and the CMV infection group.
CSA: cyclosporin A; TAC: tacrolimus; MMF: mycophenolate mofetil; MTX: methotrexate; EBV: Epstein-Barr virus; aGVHD: acute graft-versus-host disease; WBC: white blood cell; HB: hemoglobin; PLT: platelet; NEU: absolute neutrophil count; ALB: serum albumin; MNC: total mononuclear cells; CMV: cytomegalovirus; IQR: interquartile range; HLA: human leukocyte antigen.
Peripheral blood samples were collected regularly from all patients after transplantation, and plasma CMV-DNA concentrations were measured by quantitative polymerase chain reaction (Q-PCR). CMV infection was defined as a CMV-DNA load >1000 copies/mL in any body fluid or tissue specimen on at least two consecutive occasions, and the time to CMV infection was defined as the interval from the transplantation date to the first positive Q-PCR result for CMV (>1000 copies/mL). 5 EBV infection was defined as an EBV-DNA load >500 copies/mL in peripheral blood on at least one occasion. 8
Neutrophil engraftment was defined as the first of three consecutive days on which the absolute neutrophil count was ≥0.5 × 10⁹/L without granulocyte colony-stimulating factor support. Platelet engraftment was defined as the first of seven consecutive days on which the platelet count was ≥20 × 10⁹/L without platelet transfusion support. 4 Neutropenia duration was defined as the period during which the peripheral blood neutrophil count remained below 0.5 × 10⁹/L.
aGVHD is characterized by allogeneic inflammatory responses that predominantly affect three key target organs: the skin, liver, and gastrointestinal tract. At the time of aGVHD diagnosis, there was no evidence of chronic graft-versus-host disease (cGVHD), and histopathological findings were negative. aGVHD severity was graded according to established criteria. 9 Septicemia was defined by signs of infection, positive blood cultures, and organ dysfunction. 10 Human leukocyte antigen (HLA) matching was defined as complete concordance at all antigenic loci.
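The event definitions above are rule-based and can be applied mechanically to longitudinal laboratory data. The sketch below is a minimal illustration, not the study's extraction code: the function names and input layout (day after HSCT, value, support flag) are assumptions. The text does not say whether an isolated positive result before a confirming run counts, so this sketch times CMV onset from the first result of the first qualifying run of consecutive positives. The same engraftment function covers platelets with `threshold=20` and `run_days=7`.

```python
from typing import Optional, Sequence, Tuple


def cmv_infection_day(tests: Sequence[Tuple[int, float]],
                      threshold: float = 1000.0,
                      consecutive: int = 2) -> Optional[int]:
    """Post-HSCT day of the first test in the first run of `consecutive`
    consecutive CMV-DNA results above `threshold` copies/mL, else None."""
    run_start, run_len = None, 0
    for day, copies in sorted(tests):
        if copies > threshold:
            if run_len == 0:
                run_start = day
            run_len += 1
            if run_len >= consecutive:
                return run_start
        else:
            run_len = 0
    return None


def engraftment_day(daily: Sequence[Tuple[int, float, bool]],
                    threshold: float,
                    run_days: int) -> Optional[int]:
    """First of `run_days` consecutive calendar days with count >= threshold and
    no support (G-CSF for neutrophils, transfusion for platelets), else None."""
    run_start, run_len, prev_day = None, 0, None
    for day, count, supported in sorted(daily):
        qualifies = count >= threshold and not supported
        if qualifies and run_len > 0 and day == prev_day + 1:
            run_len += 1                      # run continues on the next day
        elif qualifies:
            run_start, run_len = day, 1       # start a new run
        else:
            run_len = 0                       # threshold missed or support given
        prev_day = day
        if run_len >= run_days:
            return run_start
    return None


# Illustrative values only, not study data.
cmv = [(14, 300), (21, 1500), (28, 800), (35, 2100), (42, 5200)]
print(cmv_infection_day(cmv))  # 35: the isolated positive on day 21 does not qualify

anc = [(10, 0.3, False), (11, 0.6, False), (12, 0.7, False), (13, 0.4, False),
       (14, 0.6, False), (15, 0.8, False), (16, 0.9, False)]  # units: x 10^9/L
print(engraftment_day(anc, threshold=0.5, run_days=3))  # 14
```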
Construction and validation of predictive models
A total of 31 variables were included in the univariate analysis. Variables with a
The nomogram was constructed by assigning points to each predictor and summing them to obtain a total score, which corresponds to the predicted probability of CMV infection. Discrimination and prediction accuracy were assessed using the concordance index (C-index) and the area under the receiver operating characteristic curve (AUC). Calibration curves were used to evaluate agreement between the observed CMV infection risk and the risk predicted by the nomogram, and decision curve analysis was used to evaluate clinical net benefit. Finally, the nomogram was internally validated by bootstrap resampling (1000 resamples) to calculate the C-index, and it was also externally validated.
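The points-and-sum logic of a nomogram, and the C-index used to assess it, can be sketched as follows. The coefficients are invented placeholders (the fitted coefficients are not reported in this section), the albumin axis range is an assumption, and the scaling convention (the predictor with the widest log-odds range spans 0–100 points) follows the common nomogram layout rather than anything stated in the text. For a binary outcome, the C-index computed this way equals the ROC AUC, which is consistent with both metrics being reported as 0.745 in the results.

```python
import math

# Hypothetical log-odds coefficients for illustration; NOT the fitted model.
INTERCEPT = 2.0
BETA_ALB = -0.08                                     # per g/L serum albumin
BETA_DONOR = {"MSD": 0.0, "MUD": 1.2, "HAPLO": 1.4}  # matched sibling donor = reference
BETA_AGVHD34 = 1.0                                   # grade III-IV aGVHD present
ALB_LO, ALB_HI = 25.0, 50.0                          # assumed albumin axis range (g/L)

# Log-odds span of each predictor; the widest span is mapped to 100 points.
alb_terms = (BETA_ALB * ALB_LO, BETA_ALB * ALB_HI)
spans = [max(alb_terms) - min(alb_terms),
         max(BETA_DONOR.values()) - min(BETA_DONOR.values()),
         abs(BETA_AGVHD34)]
POINTS_PER_LOGIT = 100.0 / max(spans)
# Sum of the lowest-risk contributions, i.e. the log-odds offset at 0 total points.
BASELINE = min(alb_terms) + min(BETA_DONOR.values()) + min(0.0, BETA_AGVHD34)


def total_points(alb: float, donor: str, agvhd34: bool) -> float:
    """Nomogram-style points: each predictor's log-odds above its minimum, rescaled."""
    logit = BETA_ALB * alb + BETA_DONOR[donor] + (BETA_AGVHD34 if agvhd34 else 0.0)
    return (logit - BASELINE) * POINTS_PER_LOGIT


def risk_from_points(points: float) -> float:
    """Map total points back to a predicted probability via the logistic function."""
    logit = INTERCEPT + BASELINE + points / POINTS_PER_LOGIT
    return 1.0 / (1.0 + math.exp(-logit))


def c_index(risks, outcomes) -> float:
    """Concordance for a binary outcome: share of (event, non-event) pairs in which
    the event has the higher predicted risk; ties count one half."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0
                for e in events for n in non_events)
    return score / (len(events) * len(non_events))


pts = total_points(alb=25.0, donor="HAPLO", agvhd34=True)
print(round(pts), round(risk_from_points(pts), 3))  # 220 0.917
print(c_index([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

Because points are just a linear rescaling of the log-odds, reading a probability off the total-points axis gives the same answer as evaluating the logistic model directly.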
Statistical analysis
All statistical analyses were conducted using R (v4.3.3; R Core Team, Vienna, Austria) on Windows. Continuous variables are presented as median (interquartile range, IQR) where appropriate, and categorical variables as frequency (%). The Kruskal-Wallis test was used to compare continuous variables between groups, and Pearson’s chi-squared test or Fisher’s exact test was applied to categorical data. All statistical tests were two-sided, and the significance level was set at
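The group comparisons named in the statistical analysis were run in R; the stdlib-only Python sketch below shows what these tests compute, using illustrative tables rather than study data, with hypothetical function names. The two-sided Fisher p-value sums the probabilities of all tables with the same margins that are no more probable than the observed one, the same convention as R's `fisher.test`. The chi-squared version omits Yates' continuity correction, which R's `chisq.test` applies to 2 × 2 tables by default. With two groups, as in Table 1, the Kruskal-Wallis H is referred to a chi-squared distribution with 1 degree of freedom, so the same `erfc` expression gives its large-sample p-value.

```python
from math import comb, erfc, sqrt


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared (no continuity correction) and its 1-df p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))  # survival function of chi-squared with 1 df


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with mid-ranks for ties and the standard tie correction."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    ranks, tie_sum, i = [0.0] * n, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        tie_sum += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(s ** 2 / len(grp) for s, grp in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    return h / (1 - tie_sum / (n ** 3 - n))


print(round(fisher_exact_2x2(6, 2, 1, 4), 4))            # 0.1026
print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6]), 4))  # 3.8571
```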
Results
Patient characteristics
A total of 291 patients were enrolled in the training set. Their median age was 8 years (range, 2–19 years); 179 were male and 112 female. Of these patients, 84 developed CMV infection and 207 remained free of CMV infection, corresponding to an incidence of 28.9% (84/291). The median time to onset of CMV infection was 31 days (range, 11–151 days) after HSCT. The demographic and clinical characteristics of the two groups are summarized in Table 1. The validation set consisted of 84 patients with a median age of 8 years (IQR: 6–11 years), including 43 males and 41 females. Of these, 17 patients (20.2%) developed CMV infection at 29.24 ± 12.65 days after transplantation.
Establishment of CMV infection prediction model
This study identified and analyzed 31 variables potentially influencing CMV infection using univariate analysis. Key results in Table 1 show five significant factors (

Figure 2. Forest plot depicting the factors influencing the development of CMV infection after HSCT in thalassemia patients.

Figure 3. Nomogram for predicting CMV infection following HSCT in patients with β-thalassemia major.
Evaluation and validation of CMV infection prediction model
The nomogram’s discrimination was evaluated using the C-index, which was 0.745 (95% CI: 0.684–0.807). Receiver operating characteristic (ROC) analysis yielded an AUC of 0.745 in the training set (Figure 4(a)) and 0.649 in the external validation set (Figure 4(b)), indicating acceptable discrimination in the training set and more modest discrimination on external validation. Calibration was assessed with calibration plots and the Hosmer-Lemeshow goodness-of-fit test. The calibration curves for CMV infection risk lay close to the ideal line in both the training and validation sets (Figure 5(a) and (b)), indicating good agreement between predicted and observed risks; the closer a calibration curve lies to the ideal line, the better calibrated the model. Decision curve analysis was performed to evaluate the clinical applicability of the model, and the results suggest a net clinical benefit from using it (Figure 6). Finally, bootstrap internal validation (1000 resamples) yielded a C-index of 0.746, closely matching the training set value of 0.745 and supporting the stability of the model, whereas external validation yielded a C-index of 0.649.
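The calibration and decision-curve quantities discussed above can be sketched without a statistics library. The Hosmer-Lemeshow statistic groups patients by predicted risk (deciles by default) and compares observed with expected events on groups − 2 degrees of freedom. Net benefit at a threshold probability pt is TP/n − (FP/n)·pt/(1 − pt), the quantity a decision curve plots against the "treat all" and "treat none" (net benefit 0) strategies. This is a generic illustration with hypothetical function names, not the procedure used in the study, which was carried out in R; the exact p-value helper only handles an even number of degrees of freedom, which covers the usual 10 groups (8 df).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Exact chi-squared survival function for an even number of degrees of freedom."""
    assert df > 0 and df % 2 == 0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        total += term
    return math.exp(-x / 2) * total


def hosmer_lemeshow(risks, outcomes, groups: int = 10):
    """Hosmer-Lemeshow statistic over `groups` risk-ordered groups of near-equal size."""
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(r for r, _ in chunk)
        observed = sum(y for _, y in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


def net_benefit(risks, outcomes, threshold: float) -> float:
    """Decision-curve net benefit of treating patients whose risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold: float) -> float:
    """Net benefit of treating every patient at the same threshold."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Illustrative values only, not study data.
print(net_benefit([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0], 0.5))  # 0.5
print(net_benefit_treat_all([1, 1, 0, 0], 0.5))              # 0.0
```

In a decision curve, the model is considered useful over the range of thresholds where its net benefit exceeds both reference strategies.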

Figure 4. ROC curves for the nomogram model (a) and external validation (b).

Figure 5. Calibration curves for the nomogram model (a) and external validation (b).

Figure 6. Decision curve analysis for the nomogram model.
Discussion
In this study, we developed a predictive model for CMV infection in patients with β-TM after HSCT, under standardized conditions that ensured consistency in the underlying disease and treatment protocols. Univariate and multivariate analyses identified three independent predictors: serum albumin (ALB), donor type, and grade III–IV aGVHD. These were used to construct a nomogram. To our knowledge, this is the first predictive scoring model developed specifically to assess CMV infection risk in β-TM patients after HSCT, and it showed acceptable performance in distinguishing high-risk individuals.
Among these predictors, lower pretransplant albumin levels were associated with a higher risk of CMV infection. While prior studies have not directly evaluated this relationship in HSCT recipients, evidence from other settings supports this association. For instance, hypoalbuminemia has been linked to increased CMV risk in patients with acute ulcerative colitis and kidney transplant recipients.11,12 Albumin is a key biomarker of inflammation and nutritional status, with immunomodulatory and antioxidant properties. Its depletion may reflect systemic immune impairment, contributing to viral susceptibility.13,14
Serum albumin has also been explored as a predictor of severe aGVHD in HSCT settings.15–17 In our analysis, only grade III–IV aGVHD, not grade I–II, was significantly associated with CMV infection. This aligns with previous findings indicating that severe aGVHD impairs T-cell recovery,18–20 compromises bone marrow niche function, and predisposes patients to opportunistic infections.21,22 Furthermore, the intensified immunosuppressive therapy required for severe aGVHD may further hinder immune reconstitution, increasing infection risk.
Donor type was another significant predictor. CMV infection rates were notably lower in matched sibling donor recipients (15%) than in recipients of grafts from matched unrelated (40%) or haploidentical donors (44%). These findings are consistent with previous reports indicating that HLA-identical sibling donor-recipient pairs support better immune recovery and lower infection rates.23,24 In contrast, mismatched or unrelated donors often necessitate more intensive conditioning and immunosuppression (e.g., ATG, tacrolimus), leading to prolonged immune dysfunction and heightened CMV risk. Although CMV seropositivity in donors and recipients is often considered a risk factor, its predictive value remains controversial: some studies report a significant association, 25 while others do not.26,27 Given the high seroprevalence of CMV in the Chinese population (>97%), 28 we excluded serological status from our model, as it may offer limited discriminatory power in this context.
The model showed acceptable discrimination, with a C-index of 0.745 in the training cohort and 0.746 after bootstrap validation, although discrimination was lower in the external validation cohort (C-index 0.649). 29 The model was developed using univariate and multivariate analyses, and incorporating the predictors into a nomogram enabled intuitive scoring and individualized risk estimation. 30 Similar nomograms have been applied to a range of other clinical outcomes, such as acute kidney injury, 31 mortality in non‑ischemic dilated cardiomyopathy, 32 and transfusion risk in spinal tuberculosis surgery. 33 Evaluation with ROC, calibration, and decision curves supported the model’s predictive value and potential clinical utility.
Despite its strengths, this study has limitations. First, the model is disease-specific and may not generalize to other post-HSCT populations. Second, the retrospective design introduces potential selection bias, though efforts were made to standardize inclusion criteria. Prospective validation is warranted. Third, some potentially relevant factors—such as immune reconstitution markers, corticosteroid use, and the distinction between CMV reactivation and actual infection—were not included due to data limitations. Future studies with broader datasets may enhance model performance and generalizability.
Conclusion
We developed and validated a nomogram-based model to predict the risk of CMV infection in patients with β-thalassemia major undergoing HSCT. The model incorporates readily available clinical parameters (serum albumin, donor type, and grade III–IV aGVHD) and showed acceptable discrimination and potential clinical applicability. This simple, practical tool may aid clinicians in early risk stratification, guide targeted preventive strategies, and help improve transplant outcomes.
