Abstract
Keywords
Highlights
The proposed approach introduces a comprehensive pipeline that integrates segmentation and classification models to determine the MGMT promoter status, as it resides within the tumor region. To address the challenge of vanishing gradients, a modified U-Net architecture called 3D Residual U-Net (3D ResU-Net) is employed in the segmentation phase, effectively preserving important low-level features. The pipeline incorporates 3D tumor-subregion voxel-based MGMT classification using a proposed 3D ResNet10 architecture, in which an adaptive average pooling layer is introduced to enhance spatial feature learning and improve classification performance. The segmentation phase, using the 3D ResU-Net architecture, achieves promising results with average dice scores of 0.81, 0.84, and 0.80 for the TC, WT, and ET tumor subregions on the model validation set, respectively. The classification phase, using the 3D ResNet10 classifier, achieves a ROC–AUC score of 0.66 on the model validation set, indicating its potential in predicting MGMT promoter status. The pipeline enables precise identification of tumor subregions and accurate prediction of the MGMT promoter status, facilitating improved treatment planning and decision-making.
Introduction
Glioblastoma multiforme (GBM) is a grade IV central nervous system (CNS) tumor; over 90% of cases are primary gliomas arising from glial cells, while the remaining 10% are secondary gliomas that develop more slowly from lower-grade tumors. Despite the genetic differences between primary and secondary GBMs, according to the World Health Organization (WHO) classification, 1 their morphology is generally the same. GBM is an aggressive malignant brain tumor that develops and spreads quickly within the human brain. 2 Both children and adults may be affected, and chemotherapy, radiation therapy, and surgery are frequently used in treatment. 3 The WHO considers glioblastoma the most aggressive brain tumor, with a median survival time of approximately 12–15 months despite treatment advancements. 4 The presence of a particular genetic sequence in the brain tumor cells of GBM patients, known as “O6-methylguanine-DNA methyltransferase (MGMT)” promoter methylation, plays a significant role in prognostic prediction rather than determining eligibility for chemotherapy. While MGMT promoter methylation is associated with increased sensitivity to alkylating chemotherapy agents, such as temozolomide (TMZ), it does not preclude patients from receiving chemotherapy. 5 In fact, the standard of care for glioblastoma, based on the Stupp protocol, includes a combination of radiotherapy and chemotherapy regardless of MGMT status. 6 However, MGMT methylation status remains a crucial biomarker for predicting treatment response and overall prognosis in GBM patients.
The MGMT gene plays a role in repairing cellular DNA damage, and methylation of its promoter region can reduce its activity. According to a study, 5 the MGMT gene is less active when its promoter region is methylated in brain tumor cells, compromising the tumor cell's ability to repair DNA damage caused by chemotherapy drugs. This increased vulnerability to chemotherapy can lead to improved therapeutic outcomes. Therefore, while MGMT promoter methylation does not determine whether a patient receives chemotherapy, it provides valuable prognostic insights that can help tailor treatment expectations and strategies. 7
Clinically, the evaluation of MGMT promoter methylation status has become an essential tool in guiding treatment decisions for GBM patients. Although the standard of care remains chemotherapy and radiotherapy for all eligible patients, assessing MGMT methylation status helps refine prognostic predictions and optimize therapeutic approaches. Specialized laboratory testing is required to determine MGMT status, which typically involves obtaining a tissue sample from the tumor through surgical excision. This process can introduce delays in diagnosis and treatment planning. 8
BraTS2021 and the MGMT promoter status dataset are benchmark datasets for brain tumor segmentation (BTS) and MGMT promoter status classification. They consist of MRI scans in four modalities: fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1w), T1-weighted with contrast enhancement (T1wCE), and T2-weighted (T2w). The BraTS challenge initially focused on tumor segmentation, and the inclusion of MGMT methylation assessment is a recent development, highlighting the growing interest in radiogenomic analysis. 9 Each modality has its own significance and plays a vital role in training the segmentation and classification models. 10 Segmentation of the MRI is the core element in the detection and diagnosis of a tumor: the MRI volume is divided into segments wherever tumor is present, producing an accurate mask of the tumorous region. 2 For further comprehensive analysis, automated BTS using MRI images can increase the radiologist's capacity to diagnose and plan surgery. Segmentation results have been used in many studies as input to classifiers for MGMT promoter status, since MGMT lies within the tumorous region. 11
Surgical removal, radiation therapy, chemotherapy, targeted therapy, immunotherapy, hormone therapy, and palliative care are all viable options for tumor treatment. Treatment decisions are based on the type and stage of the tumor and the health of the patient. The objective is to remove or reduce tumors, alleviate symptoms, and extend patients' life expectancy. Collaboration with healthcare professionals helps determine the optimal treatment strategy for each individual. In addition to these options, deep learning (DL)-based non-invasive solutions are being actively researched as a potential tool for tumor therapy. While these approaches have shown promise in aiding treatment decision-making and improving patient outcomes, they remain at an experimental stage and are not yet integrated into routine clinical practice. 12 DL can assist in tumor detection, segmentation, and classification, contributing to the development of personalized treatment plans. However, further validation and clinical trials are necessary before these methods can be widely adopted in medical settings. The incorporation of DL into tumor management has the potential to improve the precision and efficacy of treatment approaches, leading to enhanced patient care.
Several challenges are associated with the BTS problem. First, the computational requirements, particularly for 3D DL models, pose a significant obstacle due to the intensive computing power they demand. Moreover, there is a notable imbalance in the distribution of tumor regions among different classes, with a majority of tumor voxels concentrated within a single class. This highlights the need for a robust methodology to accurately segment tumor areas into their respective groups. U-Net, a widely used DL architecture for medical image segmentation, has demonstrated effectiveness in various applications, but it has limitations, particularly regarding class imbalance in the training data. Suboptimal performance may occur when the target region, such as a tumor, is much smaller than the surrounding healthy tissue, leading to challenges in accurate segmentation and an increased risk of false negatives or under-segmentation. Addressing class imbalance is therefore crucial to improving U-Net's performance in medical image segmentation. To overcome these limitations, this study proposes the 3D Residual U-Net (3D ResU-Net) architecture, which incorporates residual convolutional blocks into the U-Net model to mitigate vanishing gradients and leverage low-level features for accurate tumor voxel prediction. Additionally, the soft dice loss (SDL) function is used during model training to address class imbalance. Furthermore, predicting MGMT promoter methylation presents additional challenges. Invasive tissue sampling is currently required due to the lack of non-invasive indicators, limiting safe treatment options for all patients. The fluctuating spatial nature of MGMT status within the tumor further complicates its prediction. 10
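The soft dice loss used to counter class imbalance reduces to a simple overlap ratio computed per class and averaged. The following is a minimal NumPy sketch of the idea (illustrative only, not the exact implementation used in this study), where predictions are per-class probability volumes and targets are one-hot masks:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft dice loss over per-class probability volumes.

    pred, target: arrays of shape (classes, D, H, W); pred holds
    voxel probabilities, target holds one-hot ground-truth masks.
    Because each class contributes equally regardless of its voxel
    count, small tumor classes are not swamped by background.
    """
    axes = tuple(range(1, pred.ndim))              # sum over spatial dims
    intersection = np.sum(pred * target, axis=axes)
    denom = np.sum(pred, axis=axes) + np.sum(target, axis=axes)
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice_per_class.mean()             # average over classes
```

A perfect prediction yields a loss near 0; a uniform 0.5 prediction yields a loss of about 0.5, independent of class frequencies.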
Additionally, privacy concerns restrict the availability of MGMT datasets to the public, leaving researchers with a small number of cases to work with. The objective of this research is to enhance MGMT classification results through a segmentation and classification pipeline that incorporates multimodal segmentation outcomes. By considering the spatial variability of MGMT status within the tumor, this study aims to address the challenges associated with MGMT status prediction and contribute to the advancement of brain tumor analysis. The proposed pipeline addresses the classification of MGMT promoter status, which significantly impacts the treatment and outcomes of GBM patients. Accurate classification of MGMT status aims to improve patient outcomes by informing treatment planning, predicting treatment response, guiding therapy selection, and providing prognostic insights. The pipeline consists of two phases: segmentation and classification. The U-Net-based 3D ResU-Net architecture is used in the segmentation phase to segment tumors into subregions (WT, TC, and ET) using multimodal whole-brain MRI scans from the BraTS2021 dataset, producing a 3D voxel representation highlighting the tumor regions. The 3D ResNet10 model, designed for 3D data, is employed in the classification phase. The associated MGMT values from the BraTS2021 Task 2 dataset are assigned to the tumor voxels (segmentation results on the BraTS2021 Task 1 dataset) for training the model, enabling prediction of the MGMT promoter status (methylated or unmethylated). The proposed pipeline aims to contribute to the personalized and effective management of GBM patients by accurately predicting MGMT promoter status. The classification results provide insights into treatment response, guide therapy selection, and offer prognostic information.
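The hand-off between the two phases, masking the multimodal volume with the predicted segmentation and attaching the case-level MGMT label to the resulting tumor voxels, can be sketched as follows (an illustrative interpretation of the pipeline, not the authors' code):

```python
import numpy as np

def tumor_roi(mri, seg_mask):
    """Crop a multimodal volume to the predicted tumor region.

    mri:      (C, D, H, W) stacked modalities (e.g. FLAIR, T1w, T1wCE, T2w)
    seg_mask: (D, H, W) binary tumor mask from the segmentation phase
    Returns the masked volume cropped to the tumor bounding box; the
    case-level MGMT label (methylated/unmethylated) is then attached
    to this voxel block for training the classifier.
    """
    masked = mri * seg_mask[None, ...]              # zero out non-tumor voxels
    coords = np.argwhere(seg_mask > 0)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    return masked[:, lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
```

Cropping to the tumor bounding box keeps the classifier's input focused on the region where the MGMT signal resides, rather than on uninformative healthy tissue.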
By leveraging advanced segmentation and classification techniques on multimodal MRI data, this research strives to enhance the understanding and treatment of GBM, ultimately improving patient outcomes. The main contribution of this research is as follows:
This study utilized the BraTS2021 BTS dataset and the MGMT promoter status classification dataset and proposed a comprehensive pipeline that integrates segmentation and classification models to determine the MGMT promoter status, as it resides within the tumor region. To address the challenge of vanishing gradients, a modified U-Net architecture called 3D Residual U-Net is used in the segmentation phase to effectively preserve important low-level features. The pipeline incorporates 3D tumor-subregion voxel-based MGMT classification using a proposed 3D ResNet10 architecture, in which an adaptive average pooling layer is introduced to enhance spatial feature learning and improve classification performance. By combining segmentation and classification models, the pipeline enables precise identification of tumor subregions and accurate prediction of the MGMT promoter status, facilitating improved treatment planning and decision-making.
This paper is organized as follows: The second section covers a literature review of the state-of-the-art segmentation and classification studies, and the proposed pipeline is discussed in the third section. The results of the proposed pipeline and experiments, along with comparisons to state-of-the-art techniques, are discussed in the fourth section. The limitations of this study are discussed in the fifth section, and finally, the paper is concluded in the sixth section.
Related work
Medical research places considerable emphasis on the classification of brain tumors, especially in relation to MGMT status, since knowing the MGMT status can provide useful information for personalized treatment plans. The field of brain tumor classification, particularly concerning MGMT, remains dynamic and evolving, with ongoing research continuously refining diagnostic and prognostic approaches. Advances in molecular profiling, imaging techniques, and machine learning have contributed to a deeper understanding of MGMT methylation and its implications for glioblastoma treatment. Ongoing scholarly research and conference contributions have expanded the existing knowledge base by introducing diverse methods and techniques for advancement in this area. This section reviews the research on BTS and MGMT classification. Diagnosing and treating brain tumors at an early stage is necessary to increase the likelihood of a favorable outcome for the patient. Before the development of DL, classic machine learning approaches relied on hand-crafted feature engineering to extract information from brain scans. In the past several years, however, DL has transformed the field of medical imaging by enabling both local and global information to be learned automatically. This development has greatly increased the capacity of DL models to analyze brain images, improving the detection and characterization of brain tumors. The RSNA and MICCAI collaborated to launch the Brain Tumor Segmentation (BraTS) challenge. Many scholars participate in this challenge, which has two tasks: the first is tumor segmentation, and the second is classifying the MGMT promoter status as methylated or unmethylated.
State-of-the-art literature review: U-Net-based brain tumor segmentation and variants
Automatic BTS from MRI images has been approached in many ways. The U-Net model 13 is one of the most popular architectures in biomedical image segmentation and has achieved strong segmentation performance even with small amounts of training data. The U-Net architecture has a U-shaped design: the left (encoder) path focuses on extracting the most important features, while the right (decoder) path focuses on producing an accurate segmentation. U-Net uses concatenation operations that pass feature maps from the encoder path to the decoder path, ensuring that contextual information flows through the network and enabling accurate predictions. Most state-of-the-art BTS techniques use 2D or 3D convolutions (convs) to train deep CNN models. However, 2D convs may not fully exploit the spatial information in medical images, and 3D convs require more computational resources and memory. Chen et al. 14 proposed the separable 3D U-Net model to work around the memory requirements of 3D convs. The model exploited the whole 3D volume of the brain using three separable 3D convs. To improve performance, the authors added a separable temporal conv to the residual inception model and used a multiview fusion method to combine the results of the convs; the model performed well on the BraTS 2018 test dataset. Both local and global features are very important for segmentation prediction, but gradients of low-level features tend to weaken as the network deepens. When segmenting a medical image, especially a 3D volumetric image, it is important to include both local and global features because they hold valuable contextual information needed for accurate segmentation.
Ahmad et al. 15 proposed a modified 3D U-Net model for brain tumor segmentation. To capture multi-contextual hierarchical characteristics, they incorporated dense blocks into the contracting and expanding paths of the network, promoting feature reusability. Using residual inception blocks, the researchers combined features with a variety of filter sizes to extract both local and global information from volumetric brain images. The proposed method was evaluated on the BraTS 2020 dataset. By utilizing contextual information at each level of the model, the study successfully overcame the difficulty posed by limited data. However, it is important to note that the proposed model demands substantial GPU memory and that augmentation techniques were used to address class imbalance. Li et al. 16 presented a novel architecture named MSFR-Net, built specifically for the BraTS 2021 segmentation challenge. Their strategy used cross-entropy and dice loss during feature extraction, achieving segmentation dice coefficients of 89.15%, 83.02%, and 82.08%. These encouraging results highlight the usefulness and potential of the MSFR-Net architecture in enhancing segmentation accuracy on the BraTS dataset. For glioma segmentation in MRI scans, a multi-plane ensemble of U-Net++ models was suggested in a previous study. 9 By combining ensemble majority voting with training on a boundary loss, the authors accurately segmented GBMs and their subregions, obtaining dice scores of 0.792, 0.835, and 0.906 for ET, TC, and WT, respectively. The U-Net architecture's vanishing gradient problem weakens the gradient signal as it backpropagates through deep layers.
The model may therefore struggle to learn and capture fine-grained information, especially in the deeper sections of the network. Inspired by ResNet, ResU-Net incorporates residual connections to overcome this restriction. Residual connections in the ResU-Net architecture allow information to flow directly from earlier to later layers, bypassing intermediate layers. This mechanism enhances gradient flow and preserves valuable network information, effectively addressing vanishing gradients and enabling the training of deeper networks. It lets the model capture both low-level and high-level characteristics, improving semantic image segmentation. In medical imaging, where small details are essential for correct diagnosis and segmentation, ResU-Net has proven its efficacy. Zeineldin et al. 17 suggested a hybrid ResNet-U-Net BTS model to overcome vanishing and exploding gradients. The ResNet component addresses gradient propagation with skip connections while keeping the same number of parameters, and the U-Net decoder employs the spatial information extracted by ResNet to produce reliable segmentation masks. This work applied ResNet-U-Net to binary segmentation of 2D BraTS 2019 data. Shehab et al. 18 presented a deep residual network to address exploding gradients; identity shortcut connections simplified the model and reduced overfitting during training. The proposed model trained three times faster than previous deep neural network architectures on the BraTS 2015 dataset. Yang et al. 19 suggested ResU-Net, a U-Net with residual units, to improve model resilience and efficiency. The study built a deeper U-Net with a squeeze operator to regulate the network parameters and improve local and global feature extraction; residual units were merged into the deeper ResU-Net, preserving the same number of parameters while improving performance. Luu et al. 20 presented a modification of the U-Net algorithm for BraTS2021 segmentation named nn-U-Net.
The success of the model can be attributed to substantial adjustments, such as the use of a wide network, the transition from batch normalization to group normalization, and the incorporation of axial attention in the decoder. It won the BraTS 2021 competition with dice scores of 88.35% for ET, 88.78% for TC, and 93.19% for WT. In a different piece of research by Jia et al., 21 long-range features were extracted with BiTr-U-Net, a model combining CNNs and Transformers. On the BraTS2021 testing dataset, after specific changes, the model produced satisfactory dice scores of 0.9257, 0.9350, and 0.8874 for TC, WT, and ET, respectively. For tumor segmentation in brain MRIs, Pei et al. 22 utilized a 3D ResU-Net, attaining dice scores of 0.8196, 0.9195, and 0.8503 for ET, WT, and TC, respectively, on the BraTS2021 validation dataset. In another study by Sindh et al., 23 multimodal BTS was accomplished using a 3D U-Net-based architecture. On the validation data, they achieved dice values of 0.87, 0.76, and 0.73 for WT, TC, and ET, respectively; on the test data, they obtained dice values of 0.73, 0.67, and 0.63. Wang et al. 24 presented a modified version of the U-Net referred to as the attention block U-Net (AttU-Net), aiming to exploit the anatomical structure of the brain. By including attention blocks, they enhanced their dice scores to 0.793 for ET and 0.879 for WT. In the realm of optimized information fusion, Ali et al. 25 proposed a wrapper-based technique aimed at improving classification performance for chest infection detection, including COVID-19, using X-ray images.
Their approach extracts deep features through pretrained DL models and employs ten optimization techniques to select optimal features for a support vector machine classifier. The results suggest that the proposed wrapper-based automatic DL network selection and feature optimization framework achieves a high classification rate of 97.7%. This method underscores the potential of combining multiple optimization algorithms to enhance model performance in medical image analysis. While their study focuses on chest X-rays, the principles of network selection and optimized information fusion they discuss are pertinent to our work. Applying similar techniques to multimodal MRI data could potentially refine the integration of diverse imaging features, thereby improving the accuracy of MGMT promoter status classification in glioblastoma patients.
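Several of the works above rely on residual (identity-skip) connections to keep gradients flowing through deep networks. The toy sketch below is illustrative only, using 1×1×1 pointwise convolutions rather than the full 3×3×3 kernels of a real ResU-Net; it shows the mechanism: with the shortcut, the input reaches the output unchanged even when the convolutional branch contributes nothing.

```python
import numpy as np

def conv3d_1x1(x, w):
    """Pointwise (1x1x1) 3D convolution: mixes channels only.
    x: (C_in, D, H, W) volume; w: (C_out, C_in) weight matrix."""
    return np.tensordot(w, x, axes=([1], [0]))

def residual_block(x, w1, w2):
    """y = ReLU(W2 · ReLU(W1 · x)) + x.

    The identity shortcut (+ x) passes the input straight through,
    so during backpropagation the gradient has a path that bypasses
    both conv layers — the property that mitigates vanishing gradients.
    """
    h = np.maximum(conv3d_1x1(x, w1), 0)   # first conv + ReLU
    h = np.maximum(conv3d_1x1(h, w2), 0)   # second conv + ReLU
    return h + x                           # identity shortcut
```

With all weights zeroed the block reduces to the identity map, which is exactly why deep stacks of such blocks remain trainable.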
Jia et al. 26 emphasized the significance of precise BTS in MRI diagnosis and therapy monitoring. To tackle challenges posed by variations in lesion severity, structural changes in brain tumors, and the low quality of MR images, the authors proposed an end-to-end BTS method. This approach integrates an enhanced 3D U-Net with super-resolution image reconstruction within a unified framework. Additionally, a coordinate attention module was incorporated before the upsampling process in the backbone network to enhance the extraction of both local texture and global spatial features. The system was trained and tested on the BraTS dataset and evaluated against other DL models using Dice similarity scores. The BraTS2021 dataset evaluation yielded Dice similarity scores of 89.61% for enhancing tumors, 88.30% for tumor cores (TCs), and 91.05% for whole tumors, with 95% Hausdorff distances of 1.414 mm for enhancing tumors, 7.810 mm for TCs, and 4.583 mm for whole tumors. These results indicate that the proposed technique outperformed the baseline 3D U-Net and demonstrated reliability in segmenting brain tumor MR images with high structural heterogeneity. Segmentation of brain tumors is critical for clinical diagnosis and treatment. Clinicians use multimodal magnetic resonance imaging (MRI) to delineate the boundary regions of tumors, which are often interleaved and challenging to identify accurately, leading to potential diagnostic errors. To address this, Çetiner et al. 27 proposed DenseUNet+, a novel deep learning-based approach developed for high-accuracy segmentation using multimodal images. The DenseUNet+ model exploits information from the four modalities within dense block structures, applying linear operations and concatenation before passing the processed information to the decoder layer.
The method was compared against state-of-the-art (SOTA) techniques using dice and Jaccard metrics on the BraTS2021 and FeTS2021 datasets. DenseUNet+ achieved dice and Jaccard scores of 95% and 88% on the BraTS2021 dataset, and 86% and 87% on the FeTS2021 dataset, respectively. These results show that DenseUNet+ outperforms many existing SOTA methods in brain tumor segmentation.
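The dice and Jaccard metrics quoted throughout these comparisons reduce to simple overlap ratios between a predicted mask and a reference mask; a minimal sketch:

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Overlap metrics between two binary masks of equal shape.

    dice    = 2|P ∩ G| / (|P| + |G|)
    jaccard = |P ∩ G| / |P ∪ G|
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    dice = 2.0 * intersection / (pred.sum() + gt.sum())
    jaccard = intersection / np.logical_or(pred, gt).sum()
    return float(dice), float(jaccard)
```

The two metrics are monotonically related (J = D / (2 − D)), which is why studies often report either one or both to characterize the same segmentation quality.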
DL has shown promise in GBM MRI analysis, aiding tumor segmentation and molecular assessment, as reviewed by Bonada et al. 28 However, challenges such as MRI variability, limited training data, and inaccuracies in postoperative imaging hinder clinical adoption, and ethical concerns regarding data protection also need standardization. While DL enhances diagnostic accuracy and treatment planning, integration into clinical practice requires improved data collection, technical advancements, and regulatory frameworks. Their review explores the current state of DL in GBM segmentation and molecular subtyping, addressing key limitations and future directions for broader implementation. Overall, the literature highlights the efficacy of several DL architectures for brain tumor segmentation, demonstrating both the ongoing progress in this area and the possibility of enhancing the accuracy of automatic BTS. Each method presents novel improvements and obtains results comparable to those of other methods on the BraTS datasets.
State-of-the-art literature review: CNN-based MGMT status classification
Several studies focusing on the classification of MGMT promoter status in the BraTS2021 dataset were examined, each employing different architectures and techniques. Faghani et al. 10 compared three DL-based approaches for predicting MGMT promoter methylation status from imaging data. The researchers utilized T2 images from Task 1 and MGMT labels from Task 2 of the Brain Tumor Segmentation (BraTS) 2021 dataset. Three models were developed: voxel-wise, slice-wise, and whole-brain. Voxel-wise classification involved training a 3D-Vnet model for tumor segmentation and using majority voting for the final prediction. Slice-wise classification used an object detection model for tumor detection and prediction, followed by majority voting. For the whole-brain approach, a 3D Densenet121 model was trained for prediction. On validation, the ROC–AUC scores were 65.42% for the whole-brain, 61.37% for the slice-wise, and 56.84% for the voxel-wise approach. In another study, 5 the MGMT promoter status was determined with a 7-layer CNN architecture. The authors built a simple yet effective classification model by converting FLAIR images to PNG format and feeding them into the neural network. The model was trained with the one-cycle approach and obtained an AUC of 0.61680 on the public validation set and a ROC of 0.53363 on the private test set. Roth et al. 9 predicted the MGMT promoter status using a classifier ensemble based on the 3D EfficientNet architecture, with a separate 3D EfficientNet model for each modality; classification was performed by averaging the predictions of the individual models. On testing, the model obtained an AUROC score of 0.577.
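Two aggregation schemes recur in these studies: majority voting over per-voxel or per-slice predictions, and averaging the probabilities of per-modality models. Both can be sketched in a few lines (illustrative helpers, not any study's exact code):

```python
import numpy as np
from collections import Counter

def majority_vote(labels):
    """Case-level MGMT call from many per-slice/per-voxel binary votes."""
    return Counter(labels).most_common(1)[0][0]

def ensemble_probability(per_model_probs, threshold=0.5):
    """Average per-modality model probabilities, then threshold.

    Returns (mean probability, binary label)."""
    p = float(np.mean(per_model_probs))
    return p, int(p >= threshold)
```

Majority voting discards per-prediction confidence but is robust to a few badly calibrated slices; probability averaging preserves confidence and is the scheme behind the ensemble AUROC figures reported above.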
Mun et al. 29 applied preprocessing and augmentation methods, including aligning and resampling the different modalities, constructing 3D voxels from the MRI, intensity standardization and normalization, histogram equalization, and data augmentation. EfficientNet was used to predict the MGMT promoter status, with a separate EfficientNet architecture for each modality. The AUC for the single-modality ensemble was 0.634, while the late-fusion multimodality approach achieved an AUC of 0.698 on validation. Lang et al. 30 used transfer learning with the C3D video-clip classification network; after augmentation and feature extraction, the weights of the pretrained C3D model were used. The model's AUC was 0.689 on validation and 0.577 on testing. Saeed et al. 31 used an ensemble network to combine the data from the different modalities: separate ResNet18 network branches routed each input, and the stacked feature maps were placed before the fully connected layer. CNN-based filtering was used to remove slices without tumors, and vision transformers with an attention mechanism were used to classify the MGMT promoter status, achieving an AUC of 0.58 on validation. S. Das et al. 7 used an improved Intermediate State Generator (IS-Gen) to prepare the data. The authors used a Random Forest machine learning model along with baseline and enhanced radiomic feature models, as well as a ResNet10 and an Enhanced ResNet10 classifier, with a best validation AUC of 0.66. In the study of Calabrese et al., 6 a CNN classifier was used both to determine the MGMT promoter status and to segment the tumor. SMOTE feature synthesis was applied to balance the radiomics data. A random forest was used, and the MGMT promoter status was obtained by averaging the CNN and random forest outputs, yielding a ROC–AUC of 0.77 on validation using the BraTS2021 and a private dataset.
Representation learning in medical imaging derives relevant information from multimodal images to assist diagnosis, disease detection, and therapy planning. Efficiently integrating information from diverse modalities is difficult, since present approaches frequently overfit when employing high-dimensional feature vectors; deep fusion approaches that merge features at an intermediate level perform better. Ibtehaz et al. 32 presented a unique deep fusion approach based on depthwise 1D convolution that is both computationally efficient and effective. The method computes radiomic features from MRI and deep features from pre-trained models, selects useful features for fusion using depthwise 1D convolution, and applies fully connected layers for classification. When tested on the BraTS-21 and Lumiere datasets, the approach outperforms previous strategies with an AUC score of 0.748 and a binary cross-entropy (BCE) loss of 0.62.
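A loose sketch of the depthwise 1D-convolution fusion idea described above (an illustrative reading, not the authors' implementation): each feature source (e.g. a radiomic vector and a deep-feature vector) gets its own small 1D kernel, so sources are filtered independently before the fully connected head mixes them.

```python
import numpy as np

def depthwise_1d_fuse(feature_rows, kernels):
    """feature_rows: (n_sources, feat_dim) — e.g. radiomic and deep
    feature vectors stacked row-wise. kernels: (n_sources, k) — one
    learnable 1D kernel per source; depthwise means no kernel mixes
    across sources. Returns a flattened vector for the FC classifier.
    """
    fused = np.stack([np.convolve(row, k, mode="same")
                      for row, k in zip(feature_rows, kernels)])
    return fused.reshape(-1)
```

Because each kernel only sees one source, the parameter count grows with the kernel size rather than with the (high-dimensional) feature length, which is the efficiency argument behind depthwise fusion.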
Overall, these studies address the classification of MGMT promoter status in the BraTS2021 dataset using different DL architectures, ensemble techniques, preprocessing methods (selective modalities), and augmentation strategies. The results show that these methods can improve the accuracy of MGMT promoter status prediction and move the field of brain tumor classification forward. The studies summarized in the literature have classified the MGMT promoter status with a variety of methods, including CNNs, ensemble models, transfer learning, and radiomics analysis. However, two significant research gaps can be identified. First, the majority of studies have not fully incorporated all MRI modalities into their analyses. Incorporating all modalities, such as T1-weighted, T2-weighted, FLAIR, and T1ce, provides a more comprehensive depiction of the tumor's characteristics and may help enhance MGMT promoter status classification. Second, although some studies did use automated segmentation approaches, the segmentation models they employed were not specifically trained on MGMT-related characteristics. Given the close association between the MGMT promoter status and the location of the tumor, it is essential to construct a segmentation model trained on MGMT-specific characteristics. Incorporating a dedicated segmentation step that focuses on capturing tumor classes related to MGMT promoter status lets the subsequent classification model benefit from more accurate and informative tumor representations, potentially improving MGMT promoter status classification performance.
By addressing these research gaps and building a pipeline that combines comprehensive multi-modal MRI data with MGMT-specific tumor segmentation and subsequent classification, the accuracy and reliability of MGMT promoter status classification can be considerably increased. This comprehensive approach has the potential to provide valuable insights into the underlying molecular properties of brain tumors and to contribute to the development of tailored treatment options in neuro-oncology.
Research methodology
In this section, the dataset, pre-processing steps, proposed pipeline-based approach with 3D ResU-Net architecture for segmentation, and 3D ResNet10 for classification are described in detail along with the loss function used in both models.
Dataset
The BraTS 2021 MGMT dataset consists of preoperative multi-parametric MRI scans from patients diagnosed with glioblastoma (GBM, WHO Grade IV). The cases were histopathologically confirmed and annotated with O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status, which plays a critical role in predicting the response to TMZ chemotherapy. Glioblastomas in this dataset were classified according to the World Health Organization (WHO) classification of CNS tumors, with cases comprising both methylated and unmethylated MGMT promoters, allowing for the development of predictive models for treatment response. The MRI scans include T1-weighted, T1 contrast-enhanced (T1ce), T2-weighted, and FLAIR sequences, ensuring comprehensive imaging analysis. The proposed segmentation-based classification method is trained and evaluated using the publicly available BraTS 2021 benchmark dataset, a joint initiative of the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer-Assisted Interventions (MICCAI) Society. The dataset focuses on two primary objectives: segmenting distinct brain tumor subregions and classifying the MGMT promoter methylation status of the tumor. BraTS 2021 includes pre-operative baseline 3D multiparametric MRI images from 2040 patients. 33 For Segmentation Task 1, there are 1251 publicly accessible scans with segmentation labels. Task 2, which contains the MGMT labels, covers 585 patients. The data of 536 patients contain both segmentation masks and MGMT classification labels, as determined by intersecting the datasets for the segmentation and classification tasks; details are shown in Table 1.
BraTS2021 dataset distribution.
In this study, we make use of Task 1 MRI scans and their associated MGMT labels, concentrating on the 536 scans for which both segmentation masks and MGMT classification labels are available. The MRI scans consist of four modalities, each of which has its own significance. The importance of each modality is shown in Table 2.
Importance of each modality. 34
Each modality has a dimension of 240 by 240 pixels with 155 different slices. Figure 1 illustrates a sample from the benchmark BraTS2021 dataset, showcasing all four modalities along with the MGMT label of slice number 83.

Multimodal MRI scans in the axial view for patient IDs 789 and 703, with segmentation and MGMT labels.
One to four neuroradiologists manually annotated the ground truth segmentation file. This dataset has four classes:
Background (0), tumor with necrosis and no enhancement (1), edema (2), and enhancing tumor (ET) (4).
In the implementation, label 4 (ET) has been remapped to label 3. For the segmentation phase, the dataset is divided into training (80%) and validation (20%) sets; in the classification phase, 90% of the data is used for training and 10% for validation, since only 536 instances were available for phase 2.
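As a minimal illustration of this step (a sketch only: the ordered split and the integer patient IDs are stand-ins, since the paper does not state how cases were shuffled), the label remapping and the two splits could look like:

```python
import numpy as np

def remap_labels(mask):
    """Remap BraTS label 4 (ET) to 3 so the classes are contiguous (0, 1, 2, 3)."""
    mask = mask.copy()
    mask[mask == 4] = 3
    return mask

def split(ids, train_fraction):
    """Simple ordered split; a real experiment would shuffle with a fixed seed."""
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

# Segmentation phase: 80/20 split of the 1251 scans with segmentation labels.
seg_train, seg_val = split(list(range(1251)), 0.80)
# Classification phase: 90/10 split of the 536 cases with both label types.
cls_train, cls_val = split(list(range(536)), 0.90)
```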
Integration of segmentation and classification: The proposed pipeline
A pipeline of segmentation and classification is proposed for identifying the MGMT promoter status within the MRIs of GBM patients. The proposed pipeline is constructed in two phases: a segmentation phase and a classification phase. The segmentation phase segments the tumor subregions from whole-brain MRIs, whereas the classification phase detects the MGMT promoter status within the tumorous region. In phase 1 of the pipeline, tumorous regions (voxels) were extracted from whole-brain MRIs using all four modalities (T1, T1ce, T2, and FLAIR). In phase 2, the segmented voxel generated in phase 1 is used in conjunction with the MGMT label provided by the BraTS2021 dataset to classify the MGMT promoter status. The general workflow of the proposed pipeline is shown in Figure 2.

The proposed pipeline of segmentation and classification models.
Segmentation phase
This section discusses the pre-processing steps applied to the BraTS2021 dataset, followed by a detailed description of the U-Net architecture and the 3D ResU-Net. The section concludes with the loss function applied to the 3D ResU-Net.
Preprocessing
The challenge organizers put the dataset (BraTS2021) through several pre-processing steps before making it available to the public. 33 Each image has already been skull-stripped, co-registered, aligned in a single space, and resampled to the same resolution of 240 × 240 × 155 for each modality. These numbers represent the image volume's dimensions along three axes: width (240 pixels), height (240 pixels), and the number of axial slices (155).
Further pre-processing is required before passing data into the model. The MRI scans are resized from 240 × 240 × 155 to 128 × 128 × 128, which lowers the computational complexity of the models without significantly changing their performance. The target size of 128 × 128 × 128 was chosen carefully to ensure that no important information or infected regions were discarded during resizing, and suitable interpolation methods were used so that the spatial information and important features within the images were retained. The resized images therefore still provide useful information for tasks such as tumor detection or classification without compromising accuracy, while making the models less computationally expensive. To maximize the value of the combined data, the four modalities were stacked together so that the information contained in the four sequences (T1, T1ce, T2, and FLAIR) could be used jointly. During training, the model was fed examples with dimensions of 128 × 128 × 128 × 4, where the fourth dimension represents the stack of four modalities. This allowed the model to access and process the integrated information from all four sequences simultaneously, making the most of the available data.
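A sketch of this preprocessing step (nearest-neighbour indexing stands in for whatever interpolation scheme the authors used, which the text does not specify):

```python
import numpy as np

def resize_nearest(volume, out_shape):
    """Nearest-neighbour resize of a 3D volume; a stand-in for any
    suitable interpolation method."""
    idx = [np.linspace(0, s - 1, t).round().astype(int)
           for s, t in zip(volume.shape, out_shape)]
    return volume[np.ix_(idx[0], idx[1], idx[2])]

def preprocess(t1, t1ce, t2, flair, out_shape=(128, 128, 128)):
    """Resize each 240x240x155 modality and stack to 128x128x128x4."""
    mods = [resize_nearest(m, out_shape) for m in (t1, t1ce, t2, flair)]
    return np.stack(mods, axis=-1)   # modality stack on the last axis
```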
Network architecture for segmentation
The U-Net architecture is widely used for tasks such as image segmentation, where the goal is to assign each pixel of the image to a category. The U-Net consists of an encoder path and a decoder path. The encoder path captures high-level features while reducing the spatial detail of the input image. It typically consists of several convolutional layers followed by max-pooling operations: the convolutional layers extract increasingly abstract features from the input image, while the max-pooling operations reduce the spatial dimensions, allowing the network to take in a wider context. The decoder path of the U-Net reverses the encoder path. 35 It takes the features from the encoder path and gradually upsamples them until they reach the size of the original image. This up-sampling process produces a dense, pixel-wise segmentation map. U-Net has been demonstrated to be useful for various segmentation tasks; nevertheless, it has a few drawbacks. During the up-sampling in the decoder path, some of the spatial information may be lost. 36 The network may then struggle to reconstruct the fine details and borders of the segmented regions as the spatial resolution is restored, so the segmentation results may be blurry or imprecise. In deep neural networks, the vanishing gradient problem occurs when the gradients propagated backward during training grow smaller as they pass through many layers. This issue is most noticeable within the encoder part of the U-Net and comparable architectures. As the gradients shrink, the network's capacity to learn meaningful and discriminative features from the input data decreases, because the network receives less information to learn from. As a result, segmentation performance suffers, particularly when capturing the fine details and borders of objects in volumetric data.
To resolve both the segmentation performance limitations and the vanishing gradient problem, the 3D ResU-Net architecture was developed. 37
To effectively segment volumetric data, the 3D ResU-Net architecture combines 3D convolutions, residual blocks, skip connections, and an encoder–decoder structure, customizing the architecture for the segmentation task. The encoder path extracts high-level features from the input volumetric data: it employs 3D convolutional layers to extract pertinent information while reducing the spatial dimensions to capture global context. However, the downsampling in the encoder path may introduce the issue of vanishing gradients, which must be taken into consideration. To circumvent this issue, the 3D ResU-Net architecture includes residual blocks, whose skip connections bypass certain convolutional layers and allow the network to learn residual mappings. With gradients propagated over these skip connections, the network can capture and preserve significant features across multiple layers more efficiently, making it easier to learn representative features. 22 The 3D ResU-Net also addresses the loss of spatial information during up-sampling by incorporating skip connections that directly connect the corresponding encoder and decoder levels. This lets the decoder path access and use the low-level, high-resolution features of earlier layers, and the fusion of information at multiple resolutions enables the network to reconstruct the fine details and borders of the segmentation maps. The decoder path of the 3D ResU-Net gradually upsamples the feature maps using either interpolation techniques or 3D transpose convolutions.
Upsampled feature maps are concatenated with the associated skip connections to integrate high-level and low-level features. The decoder path refines these features to recover the original volumetric resolution and produce accurate segmentation maps. A Softmax activation function is used in the final layer of the 3D ResU-Net. It produces a dense prediction volume in which each voxel is individually assigned to one of the segmentation categories. The output volume represents the segmentation map, indicating the presence of distinct structures or regions inside the input volumetric data and the boundaries between them. By integrating 3D convolutions, residual blocks, skip connections, and the encoder–decoder structure, the 3D ResU-Net model tackles the vanishing gradient problem and enables accurate, detailed segmentation of volumetric data for medical imaging and other applications. The 3D ResU-Net architecture is shown in Figure 3.

3D residual U-net architecture. 37
The 3D ResU-Net architecture is a well-established variant of the U-Net framework specifically designed to process three-dimensional (3D) data in medical image segmentation tasks. 38 It employs an encoder–decoder structure consisting of a contracting path (encoder) and an expanding path (decoder) interconnected through skip connections. The contracting path comprises five levels, denoting the depth of the network, and operates on a voxel grid with dimensions of 128 × 128 × 128 × 4, incorporating the four stacked modalities, namely T1, T2, T1ce, and FLAIR. At each level of the encoder, residual blocks (green blocks) are employed to extract relevant features, followed by 3D max pooling with a pool size of (2, 2, 2) and a stride of (2, 2, 2) to reduce the spatial dimensions. To prevent overfitting, dropout regularization is implemented at each step. The residual blocks utilize an identity mapping approach: the first two components each consist of a convolutional layer and batch normalization, while the third component does not employ ReLU activation, as shown in Figure 4. By combining the input with the shortcut link through the ReLU function, the output of the third component is obtained, allowing the network to capture residual mappings and extract hierarchical and informative features effectively.
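The identity-shortcut computation described above can be sketched as follows (a conceptual NumPy sketch only: the parameter-free normalization and the stand-in linear maps are illustrative assumptions, not the trained 3D convolutional layers):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Stand-in for a batch-normalization layer (no learned scale/shift).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, conv1, conv2):
    """Identity-shortcut block as described in the text: the second branch
    output is batch-normalized but NOT activated before the addition; ReLU
    is applied only after combining with the shortcut. conv1/conv2 stand in
    for the 3D convolutional layers."""
    y = relu(batch_norm(conv1(x)))
    y = batch_norm(conv2(y))      # no ReLU on the third component
    return relu(x + y)            # add the identity shortcut, then ReLU
```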

Shortcut identity connection of residual block. 18
Lower levels, on the other hand, carry fewer abstract features but better spatial precision. 37 The encoder path performs downsampling, which shrinks the size of the image, and ends in a bottleneck layer. In the bottleneck and decoder, segmentation masks are predicted using plain convolutional blocks (purple blocks). Using up-sampling or the Conv3D Transpose operation, the decoder path restores the image to its original shape. The up-sampling path aims for accurate segmentation by combining data from the matching down-sampling path, and skip connections keep information from the contracting path in context. Conv3D transpose layers join the up-sampling path with the matching down-sampling path to form the expanding path, as in the original U-Net model. Convolutional blocks with a dropout regularizer of 0.1 follow. In the expanding path, each convolution block comprises two Conv3D layers, batch normalization, and the ReLU activation function. The final segmentation is produced by a (1 × 1 × 1) convolution layer and a Softmax activation function, which gives each target class a probability. The model's output from the decoder path is a 128 × 128 × 128 × 4 tensor holding the predicted segmentation masks for the four classes: background, WT, TC, and ET.
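One decoder step as described can be sketched in NumPy (nearest-neighbour repetition stands in for the Conv3DTranspose layer, and the channel-first layout is an assumption for illustration):

```python
import numpy as np

def upsample3d(x, factor=2):
    """Nearest-neighbour 3D upsampling of a (C, D, H, W) feature map,
    standing in for a learned Conv3DTranspose layer."""
    for axis in (1, 2, 3):
        x = np.repeat(x, factor, axis=axis)
    return x

def decoder_step(features, skip):
    """One expanding-path step: upsample, then concatenate the matching
    encoder skip connection along the channel axis."""
    up = upsample3d(features)
    return np.concatenate([up, skip], axis=0)
```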
Loss function
Selecting a loss function is crucial to a DL model's efficacy. It specifies the objective the model attempts to optimize during training: the loss function quantifies the model's performance and drives it to modify its parameters to minimize error. By minimizing the loss, the model learns to make more precise predictions. Soft Dice loss (SDL) is commonly used in medical image segmentation problems to overcome the issue of class imbalance, and this research uses SDL in the segmentation phase. 39 The advantages of SDL include robustness to class imbalance, spatial awareness, smooth optimization, and differentiability. SDL is calculated as in equation (1).
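Equation (1), the soft Dice loss, can be sketched for a single class as follows (the value of the smooth term is a common convention, not taken from the paper):

```python
import numpy as np

def soft_dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for one class: 1 - 2|X intersect Y| / (|X| + |Y|).
    `pred` holds soft probabilities, `target` a binary ground-truth mask;
    the smooth term keeps the ratio defined for empty masks."""
    intersection = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)
```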
Classification phase
This section discusses the preprocessing steps applied to the result of the segmentation phase, followed by a detailed description of the 3D ResNet10 architecture. The section concludes with the loss function applied to the 3D ResNet10.
Preprocessing
In the classification phase, the 3D multimodality segmented voxel produced by the segmentation phase of the proposed pipeline serves as the input to the proposed ResNet10 classifier. Only 536 patients' data in the BraTS2021 dataset contain both segmentation labels and MGMT labels. 10 The segmented voxels are divided into two classes, methylated and unmethylated, as provided by BraTS2021. When passed to the classifier, the segmented voxels have dimensions of 128 × 128 × 128 × 4, the same shape used in the segmentation phase.
Network architecture for classification
The 3D ResNet10 model was selected for the classification phase due to its efficiency and effectiveness in handling volumetric data while mitigating overfitting with our smaller dataset. Unlike deeper architectures such as ResNet34, ResNet50, VGG16, and VGG19, ResNet10 is computationally more efficient while still capturing complex features from 3D medical images. Although ResNet50 improves upon ResNet34 by introducing bottleneck blocks for enhanced accuracy, 40 we opted for ResNet10 to prevent overfitting given our dataset's size. The model processes 3D segmented tumor voxels obtained from the segmentation phase, with input dimensions 16 × 4 × 128 × 128 × 128 (batch size 16, four imaging channels, and spatial resolution of 128³). 41 The network starts with a 3D convolutional layer, which applies spatial filtering to extract low-level features, followed by batch normalization and ReLU activation to introduce non-linearity and enhance feature learning. The Basic Block, a key component of ResNet10, consists of two 3D convolutional layers, batch normalization, and ReLU activation, with residual connections allowing the model to learn residual functions, improving gradient flow and making deeper networks easier to train. 42 The network consists of 10 residual blocks, stacked in a hierarchical structure. After the residual layers, an adaptive pooling layer (1 × 1 × 1) reduces spatial dimensions while preserving essential features. Finally, the extracted features pass through a fully connected layer (400 × 1), which helps learn complex patterns and interdependencies, followed by a sigmoid activation function, mapping the features to classification probabilities—predicting MGMT promoter status as methylated or unmethylated. The 3D ResNet10 architecture is illustrated in Figure 5.
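The tail of the classifier, adaptive average pooling to 1 × 1 × 1 followed by the 400 × 1 fully connected layer and a sigmoid, can be sketched in NumPy as follows (the channel-first layout and the 400-channel feature map are assumptions based on the text, and the weights here are placeholders, not trained parameters):

```python
import numpy as np

def adaptive_avg_pool3d(features):
    """Collapse the spatial dims (D, H, W) of a (C, D, H, W) tensor to
    1x1x1, i.e. one average per channel, regardless of input resolution."""
    return features.mean(axis=(1, 2, 3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(features, w, b):
    """Head of the classifier: pooled features -> FC (400 -> 1) -> sigmoid,
    yielding the probability that the MGMT promoter is methylated."""
    pooled = adaptive_avg_pool3d(features)   # shape (400,)
    return sigmoid(pooled @ w + b)           # scalar probability
```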

3D ResNet10 architecture.
Loss function
The classification phase model uses the binary cross-entropy (BCE) with logits loss function. 43 The equation for BCE with logits is given in equation (2).
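As a minimal sketch of this loss, in the numerically stable form commonly used by frameworks such as PyTorch's BCEWithLogitsLoss (the function name here is ours):

```python
import numpy as np

def bce_with_logits(logit, target):
    """Binary cross-entropy computed directly from the raw logit x and
    target y: max(x, 0) - x*y + log(1 + exp(-|x|)) is the stable form of
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))]."""
    return np.maximum(logit, 0) - logit * target + np.log1p(np.exp(-abs(logit)))
```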
Results and discussion
This section discusses the evaluation measures, implementation details, analysis of the results, and comparison with state-of-the-art studies for both the segmentation and classification models.
Evaluation measures
The proposed technique is assessed for segmentation using the Dice coefficient score, specificity, and sensitivity. The Dice score is a frequently used evaluation metric for the BTS problem: it measures the overlap between the predicted segmentation mask and the ground truth while penalizing false negative and false positive outcomes. The Dice score ranges from 0 to 1, with higher values indicating better segmentation results. To assess the model's effectiveness for clinical applications, the tumor structure is divided into three subregions: WT, which includes all three malignant regions; TC, which includes all tumor tissue except edema; and ET, which is mostly seen in HGG patients and is evident in the T1ce modality. To calculate the Dice score, we compare the model's segmentation result (represented by "X") with the human expert's (ground truth) labeling of the tumor region (represented by "Y"), where each voxel (a small, three-dimensional element) in the image is labeled as either a tumor voxel (1) or a non-tumor voxel (0). 44 The Dice score formula is given in equation (3).
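For binary voxel masks X and Y, equation (3) can be sketched as:

```python
import numpy as np

def dice_score(x, y):
    """Dice = 2|X intersect Y| / (|X| + |Y|) over binary masks (1 = tumor)."""
    x, y = x.astype(bool), y.astype(bool)
    denom = x.sum() + y.sum()
    if denom == 0:
        return 1.0   # convention: two empty masks overlap perfectly
    return 2.0 * np.logical_and(x, y).sum() / denom
```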
Equation (7) represents the AUC as the integral of the TPR with respect to the FPR; it computes the area under the ROC curve by accumulating the TPR values over the range of FPR levels. Equation (8), in turn, expresses the same area as an integral of the TPR against the FPR taken with respect to the classification threshold.
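A threshold-sweep sketch of this AUC computation (assuming distinct prediction scores; tied scores would require rank averaging):

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve: sweep the threshold over the sorted
    scores and integrate TPR against FPR with the trapezoidal rule."""
    order = np.argsort(-scores)            # descending by score
    labels = labels[order]
    tps = np.cumsum(labels)                # true positives at each cut
    fps = np.cumsum(1 - labels)            # false positives at each cut
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    # Trapezoidal integration of TPR d(FPR).
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```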
Implementation details
The proposed pipeline was developed in Python 3 using the Keras library with Tensorflow as the backend for the segmentation phase, and PyTorch for the classification phase. To improve model stability and normalization, we used the ADAM optimizer with a learning rate of 0.0005 and applied the ReLU activation function and batch normalization at each layer. Due to restricted computational resources, we trained the model for 40 epochs with a batch size of 4. We used the BraTS2021 benchmark dataset for segmentation, with 80% of the data used for training and 20% for validation. For the classification phase, we employed the 536 patients with both segmentation and MGMT labels, with 90% of the data used for training and 10% for validation. We used the Nvidia A100 40 GB GPU and 80 GB RAM offered by Google Colab Pro Plus to train the model. While developing the proposed pipeline, we ran various trials to determine the best hyperparameters. To gather more descriptive information, we gradually increased the filter size in each layer. We tested larger learning rates before lowering them to a smaller value that performed well in the final experiment. Tables 3 and 4 provide further details on the hyperparameters utilized in the segmentation and classification phases.
Hyperparameters of the proposed pipeline segmentation phase of the 3D ResU-Net model.
Hyperparameters of the proposed pipeline classification phase of 3D ResNet10.
Analysis of the results
This research proposed a segmentation and classification pipeline to detect the MGMT promoter status in GBM patients using 3D MRI images. The proposed pipeline consists of two phases: a segmentation phase, for tumor class segmentation from the whole-brain MRIs, and a classification phase, for detecting MGMT promoter status in the tumorous region. Phase 1 extracts tumorous regions (voxels) from whole-brain MRIs using all four modalities (sequences). Phase 2 uses the segmented voxel generated in phase 1 and its respective MGMT label provided by the BraTS2021 dataset for the classification of MGMT promoter status.
Results of phase 1 (segmentation)
In phase 1, a 3D ResU-Net was trained and validated using the BraTS2021 training dataset, because the ground truth for the official validation data is not readily accessible. The segmentation model is therefore trained using 80% of the data for training and 20% for validation. The segmentation model creates a 3D volume of the segmentation mask, composed of WT, ET, and TC tumorous regions, and is used to segment those MRIs that carry the MGMT label provided by the BraTS2021 challenge. The segmentation results for MRI images with MGMT values, shown in the axial plane at randomly chosen slice numbers, are presented in Figure 6. The visual results show that the predictions are quite close to the ground truth. Table 5 gives the Dice coefficient scores for WT, TC, and ET on training and validation achieved by the phase 1 segmentation model. A modified U-Net model incorporating the residual network has been developed to handle the BTS problem.

Visualized segmentation results having MGMT label.
Quantitative results of training and validation of the segmentation phase.
During the trials of the proposed design, an initial attempt was made to substitute residual convolutional blocks for the convolutional layers at two levels of the U-Net model. However, this strategy did not improve the results much. Residual blocks were then used at all stages of the encoder path, which significantly improved the outcomes, but the model showed a tendency to overfit, and after a few epochs its loss began to increase. The purpose of the bottleneck layer is to ensure that its compressed view contains only the information necessary for reconstructing the input or segmentation map. Therefore, in the final experiments only the first four levels of the proposed model used residual blocks, leaving the other levels unchanged from the U-Net model with batch normalization and the ReLU activation function. Within the encoder portion of the U-Net, the residual convolutional blocks used adaptive skip connections, which helped retain low-level feature information (such as the boundary or edges of the tumorous zone) together with high-level features. In this instance, skip connections in the encoder improved the outcomes.
Figure 6 presents segmentation results on MRI scans from the BraTS 2021 dataset, displayed in the 2D axial plane. The selected cases include only MRIs associated with MGMT methylation status. Each row illustrates the model's segmentation output, highlighting tumor subregions across different MGMT classifications.
The training and validation Dice coefficient score per epoch is shown in Figure 7, the training and validation loss is shown in Figure 8, and the epoch-wise training Dice score of each tumor class is shown in Figure 9. Additionally, Figure 10 shows box plots of segmentation accuracy for the three subregions WT, TC, and ET according to the Dice score metric. Compared to the TC and ET Dice scores, the model performs well in segmenting the WT class; with a higher standard deviation, ET can be significantly more challenging to segment. The subregions of brain tumors are indicated on the x-axis, and the Dice score value, which ranges from 0 to 1, is indicated on the y-axis.

Dice score of training and validation.

Training and validation loss.

Epoch-wise dice score of tumor classes.

Boxplot for dice score of WT, TC, and ET.
Discussion of segmentation results
The segmentation phase of the proposed pipeline was evaluated using extensive quantitative analysis, with a particular focus on the impact of employing the 3D ResU-Net architecture. Table 5 displays the performance attained during both the training and validation phases. These quantitative metrics, which include specificity, sensitivity, and the Dice score, provide valuable insight into the efficacy of the proposed segmentation model. Utilizing the 3D ResU-Net architecture enables the segmentation phase to overcome obstacles such as the vanishing gradient problem and to train the network at a deeper level. The residual connections in the architecture enhance the accuracy of segmentation by facilitating the flow of gradients and preserving information from earlier in the network. The 3D ResU-Net's capacity to leverage low-level features enhances the model's ability to reliably predict tumor regions. The obtained results demonstrate the efficiency of the 3D ResU-Net in achieving high segmentation performance, validating its use in the segmentation phase. Integral to the overall pipeline, the segmentation model provides precise tumor region localization, thereby enhancing the accuracy and reliability of the subsequent classification task. These results highlight the significance of the 3D ResU-Net architecture in improving segmentation results and furthering the field of brain tumor analysis.
Table 5 presents the segmentation results, including specificity, sensitivity, and Dice score. The first column under "Dice Score" represents the average Dice score calculated across the three tumor classes: whole tumor (WT), TC, and enhancing tumor (ET). These metrics provide insight into how accurately the segmentation model identifies tumor regions and differentiates them from non-tumor regions. Specificity measures how well the segmentation model identifies areas that are not tumors; a higher specificity indicates fewer false positives. With an average specificity of 0.98 across all tumor classes on the training set, the model identifies non-tumor regions with high accuracy, and the high average specificity of 0.97 on the validation set confirms this ability. Sensitivity measures the model's capacity to identify tumor regions; a higher sensitivity indicates fewer false negatives. On the training set, the sensitivity is consistently high for all tumor classes, with an average value of 0.99, suggesting that the model accurately identifies tumor regions in the training dataset. The validation set's average sensitivity likewise remains at 0.99, indicating that the model accurately identifies tumor regions and generalizes to unseen data. The Dice score is a widely used measure of the similarity between the predicted segmentation and the ground truth that accounts for both true and false positives. On the training dataset, Dice scores of 0.90 for WT, 0.90 for TC, and 0.85 for ET indicate that the predicted segmentations overlap the actual tumor regions substantially. The Dice scores are slightly lower on the validation set, with values of 0.84 for WT, 0.81 for TC, and 0.80 for ET.
The segmentation model achieves high specificity, sensitivity, and Dice scores for all tumor classes in both the training and validation datasets, indicating that it performs well on unseen data, and the validation scores still demonstrate substantial similarity between the predicted and ground truth segmentations. These findings demonstrate the model's ability to accurately distinguish tumor regions from non-tumor regions. Overall, the segmentation results show high specificity and sensitivity on both datasets, and the Dice scores show that the predicted and actual segmentations agree closely, especially for the WT and TC tumor classes. Segmentation of the ET subregion is less successful, as reflected in its slightly lower Dice scores.
The Dice coefficient scores for the training and validation stages of the segmentation model are shown in the graph in Figure 7.
The training and validation loss values decrease over time, reflecting improved model performance. As discussed earlier, lower loss values indicate better alignment between predictions and actual outcomes. The consistent reduction in both training and validation loss suggests that the model effectively learns from data while maintaining generalization.
The Dice coefficient scores for tumor subregions (WT, TC, and ET) follow a progressive upward trend, demonstrating improved segmentation accuracy over epochs. This pattern has been previously described, reinforcing the model’s ability to refine predictions over time. The ET subregion remains more challenging to segment compared to WT and TC, as noted earlier.
The box plot gives a condensed visualization of the Dice score distribution for each tumor subregion, making it easy to compare the model's performance across the different tumor segments and identify any variations.
Comparison with the related studies
BTS on the BraTS2021 dataset was performed in phase 1 of this research utilizing the 3D ResU-Net architecture, and the findings were compared to other state-of-the-art methodologies used in comparable studies. The comparison is based on the Dice scores reported by each model on the BraTS 2020 and 2021 datasets, as indicated in Table 6.
Segmentation results of the related studies.
As can be seen in Table 6, phase 1 achieved a Dice score of 0.80 for ET, 0.84 for WT, and 0.81 for TC. These scores are comparable to those achieved by several state-of-the-art architectures, such as the Deep Residual U-Net and the Attention U-Net, used in related experiments. The vanishing gradient problem is frequently seen in deep neural networks, and the 3D ResU-Net architecture addresses it by combining the advantages of the U-Net architecture with residual connections. The U-Net architecture has shown promising performance in medical image segmentation tasks. BraTS presents challenges due to variations in tumor size, shape, and appearance, as well as the presence of healthy brain tissue that can resemble tumors. Based on the phase 1 findings, the 3D ResU-Net architecture can accurately segment brain tumors, attaining performance comparable to that of state-of-the-art methods employed in similar studies. Moving forward, future work might enhance model performance by experimenting with different preprocessing methods and hyperparameters, or by investigating potential synergies from integrating the 3D ResU-Net with other architectures or methodological approaches. These initiatives are intended to further improve the accuracy and resilience of the brain tumor segmentation pipeline.
Model complexity
The Phase 1 model has 47.30 M parameters, and the 3D ResU-Net model's complexity suggests that it is capable of learning complicated correlations between input data and output labels; still, its performance on both training and validation data must be monitored to ensure it is not overfitting. The proposed model's execution time was 70.75 min per epoch, and it was trained for 40 epochs in roughly 47 h. Model testing takes only 12 s, which is a good time for pipeline-based work in which the second-phase model receives the outcome of the first-phase model as input. Model performance could be improved further by using a more powerful GPU with more processing units, allowing more epochs at a reasonable batch size.
Results of phase 2 (classification of MGMT)
Phase 2 of the study built on the results of Phase 1, introducing MGMT labels into the segmentation result to predict MGMT status. Only 536 patients' records in the BraTS2021 dataset contain both segmentation labels and MGMT labels. 10 In Phase 2, a 3D ResNet10 model was trained and internally validated using the BraTS2021 training dataset, as the external validation dataset does not contain MGMT labels. As a result, the classification model was trained using 90% of the data for training and 10% for validation. The MGMT gene is a DNA repair gene commonly methylated in GBM and associated with better treatment response. The 3D segmentation information (voxels) from the first phase and the MGMT labels were used to train a predictive model classifying each tumor as MGMT methylated or unmethylated. The study used the 3D segmentation results from Phase 1 to train a DL network, 3D ResNet10, to predict MGMT status. The MGMT labels were obtained from the same BraTS dataset used in Phase 1, which included each patient's ground-truth MGMT status. The DL model was trained to classify the tumor as MGMT methylated or unmethylated based on the 3D segmentation voxels and the matching MGMT label for each patient.
The model contains batch normalization and dropout layers to boost generalization and prevent overfitting. The 3D ResNet10 model was optimized during training using the binary cross-entropy with logits loss function and the Adam optimizer. The dataset was divided into training and validation sets, and the model was trained for numerous epochs with early stopping based on the validation loss. After training, the model was tested on a separate test set to assess how well it predicted the MGMT status of new patients.
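The early-stopping rule described above (stop when validation loss stops improving) can be sketched as follows; the `patience` value and the toy loss curve are illustrative, not the study's actual settings:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""
    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
losses = [0.9, 0.8, 0.7, 0.72, 0.71, 0.74]  # validation loss plateaus after epoch 2
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        print(f"stopped at epoch {epoch}")  # → stopped at epoch 5
        break
```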
Ablation study
The ablation study starts with MGMT classification using a brain MRI slice-wise approach. Figure 11 shows the training and validation ROC–AUC graph.

Training and validation results of MRI slice-wise approach.
The validation AUC graph demonstrates that the slice-wise approach for MGMT classification using a 2D ResNet model yields suboptimal results. The AUC values range between 0.45 and 0.56, indicating relatively low classification accuracy. These findings underscore the limitations of relying solely on a slice-by-slice approach and emphasize the necessity of incorporating all four modalities into MGMT classification. Each modality, including T1, T1ce, T2, and FLAIR, contributes unique and valuable information about the underlying tissue characteristics and abnormalities in brain MRI data. Using all four modalities collectively, instead of considering individual slices, lets the model capture the comprehensive context and complex relationships present across multiple slices and modalities. This integration permits a fuller understanding of the brain's structure and pathology, enabling the model to make more accurate and reliable predictions. The poor performance indicated by the validation AUC values suggests that the slice-wise approach fails to exploit the rich contextual information available across modalities, which may cause the model to miss patterns and characteristics crucial for accurate MGMT classification.
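Combining the four modalities as described typically means stacking them into a multi-channel volume before feeding the 3D network. A minimal NumPy sketch, with small toy shapes rather than the actual 240×240×155 BraTS volumes:

```python
import numpy as np

# Hypothetical per-modality volumes (depth, height, width); shapes are toy values.
shape = (8, 16, 16)
t1, t1ce, t2, flair = (np.random.rand(*shape).astype(np.float32) for _ in range(4))

# Stack along a new channel axis → (channels, depth, height, width),
# the layout expected by a channels-first 3D CNN.
volume = np.stack([t1, t1ce, t2, flair], axis=0)
print(volume.shape)  # → (4, 8, 16, 16)
```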
In the ablation study, a 3D ResNet10 model incorporating all four modalities from the BraTS2021 dataset was implemented, yielding a 6% ROC–AUC improvement over the slice-by-slice approach. The previous method, which evaluated each MRI slice separately, did not fully exploit the three-dimensional nature of the data or the spatial relationships between adjacent slices. Adopting a 3D model that incorporates all modalities significantly improved the classification results: the 3D ResNet10 model enabled more thorough analysis of the complete tumor volume by capturing spatial context and inter-slice dependencies, so it could better distinguish tumor from non-tumor regions. Nonetheless, there is still potential for improvement. Because the MGMT gene status resides within the tumorous region, a pipeline of segmentation and classification is necessary for better outcomes: the segmentation step accurately delineates the tumor area so that the classification step can focus on the regions related to the MGMT gene. Implementing this segmentation-and-classification pipeline yielded an additional 4% improvement in ROC–AUC. The pipeline strategy relies on accurate segmentation of the tumor region, allowing the classification stage to precisely target and analyze the relevant regions associated with the MGMT gene. With this pipeline integrated, the classification is expected to become even more robust, leading to more precise predictions of the MGMT gene status in brain tumor analysis.
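The hand-off from segmentation to classification described above can be sketched as masking the stacked modalities with the predicted tumor mask, so that only tumor voxels reach the classifier. This is an illustrative sketch under that assumption, not the exact implementation:

```python
import numpy as np

def extract_tumor_voxels(volume: np.ndarray, seg_mask: np.ndarray) -> np.ndarray:
    """Zero out non-tumor voxels so the classifier sees only the segmented region.

    volume:   (channels, D, H, W) stacked MRI modalities
    seg_mask: (D, H, W) binary mask from the segmentation phase
    """
    return volume * seg_mask[np.newaxis, ...].astype(volume.dtype)

vol = np.ones((4, 4, 4, 4), dtype=np.float32)   # 4 modalities, toy 4x4x4 volume
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[1:3, 1:3, 1:3] = 1                          # 8 "tumor" voxels
masked = extract_tumor_voxels(vol, mask)
print(int(masked.sum()))  # → 32  (8 voxels × 4 channels)
```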
Overall, the proposed pipeline had a higher ROC–AUC than the 3D classification model, which took in input from all four modalities of whole brain MRI.
Discussion of classification results
The classification phase of the proposed pipeline was analyzed quantitatively, taking into account the impact of employing a segmentation-and-classification pipeline. Table 7 displays the comprehensive results obtained during the training and validation phases, including metrics such as precision, ROC–AUC, and recall. These metrics provide invaluable insight into the performance of the classification model. By incorporating segmentation results into the classification process, the proposed pipeline takes advantage of the precise localization of tumor regions, which improves classification precision. The segmentation phase provides crucial information about the tumor sub-regions, allowing the classification model to make more accurate predictions of the MGMT promoter status. This integration of segmentation and classification improves the overall accuracy of the pipeline's MGMT status prediction and provides a comprehensive approach to brain tumor analysis, leading to improved MGMT promoter status prediction and contributing to advancements in the field.
Validation results of the ablation study of two classification approaches whole brain and proposed pipeline.
Using the segmentation results from Phase 1 proves to be a better strategy for MGMT promoter status classification than using the entire brain. Feeding the segmentation result to the classifier produces higher precision (0.74), accuracy (0.65), and ROC–AUC (0.66) values, showing a superior capacity to detect positive instances and categorize cases correctly. Although recall and F1-score are only slightly higher than in the whole-brain technique, the improved precision, accuracy, and ROC–AUC make the segmentation-based implementation a more effective strategy for this classification job. This highlights the value of leveraging information from the initial segmentation step to improve MGMT promoter status classification accuracy. In, 10 the reported results are on validation split data (training and validation); this study follows that pattern because the public test set has no ground-truth labels to validate results. Figure 12 shows the model's epoch-wise ROC–AUC. Early stopping was applied at the 40th epoch to avoid overfitting and to obtain comparable results across all evaluation measures. The confusion matrix of the Phase 2 validation set is shown in Figure 13, which shows that the model performed well on class 1 (MGMT status methylated) prediction and faced some difficulty on class 0 (MGMT status unmethylated) prediction.

Training and validation ROC–AUC for phase 2 MGMT methylation prediction.
Figure 12 shows the ROC–AUC scores for the training and validation sets across epochs. ROC–AUC is a performance metric commonly used in binary classification tasks: it plots the true positive rate (TPR) against the false positive rate to measure how well the model separates the positive and negative classes, and values closer to 1 indicate better discrimination. At the 40th epoch, the ROC–AUC reaches 0.92 on the training set and 0.66 on the validation set. Because the amount of data fed to the network was small, there was a risk of the model overfitting the training data, so early stopping was applied at the 40th epoch; this helps select a model that generalizes beyond its training data and prevents the training data from being given too much weight. The training-set score of 0.92 shows that the model separates the positive and negative classes well on the training data, indicating that it has found meaningful patterns and relationships there. The validation-set score of 0.66 is lower, but the validation set consists of data the model has not been trained on directly, and the model still shows some ability to distinguish the classes on it.
The results indicate that the model has learned generalizable patterns and can classify data it has not seen before, but further analysis and additional performance metrics are needed to obtain a full picture of how well the model works in general and where it might have problems.
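For clarity, ROC–AUC can equivalently be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counting half). A small self-contained sketch with toy labels and scores:

```python
import numpy as np

def roc_auc(y_true, y_score) -> float:
    """ROC-AUC via the rank interpretation: P(score(pos) > score(neg)),
    with ties counted as 0.5; equivalent to the area under the ROC curve."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # → 0.75
```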
Figure 13 shows the confusion matrix of the results achieved by the 3D ResNet10, which indicates that the model does a good job of identifying class 1 (MGMT methylated). It correctly assigns 26 cases to class 1, making them true positives (TP) and showing that the model is good at recognizing the presence of MGMT methylation. However, the model struggles with the unmethylated predictions: it predicts that 18 cases belong to class 0 when they really belong to class 1, which are false negatives (FN), and it predicts that 1 instance belongs to class 1 when it belongs to class 0, which is a false positive (FP). The misclassification seen in the validation confusion matrix could be caused by class imbalance, a well-documented issue in MGMT classification studies. 9 Previous works on MGMT methylation classification have reported similar challenges, where the underrepresentation of the unmethylated class (class 0) negatively impacts model performance and leads to a higher rate of misclassifications. "Class imbalance" refers to a situation in which the number of cases in different classes is disproportionately distributed, with one class having significantly fewer examples than the other. Here, class 0 (MGMT unmethylated) has fewer instances in the dataset, which suggests the model may not have had enough representative samples of this class to learn from and generalize well. The model may therefore have trouble correctly classifying class 0 examples, leading to lower average performance for this class. Overall, the model does a good job of finding cases with MGMT methylation but has trouble with cases without it, which demonstrates that the model may require further development to distinguish these two groups.
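The metrics discussed in this section follow directly from the confusion-matrix counts. A small sketch using the counts as labeled above (the TN count is a hypothetical filler, since it is not stated in the text, and these raw per-class counts need not reproduce Table 7's reported values, which may be aggregated differently):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# tp=26, fn=18, fp=1 as labeled in the text; tn=9 is a hypothetical filler
m = classification_metrics(tp=26, fp=1, fn=18, tn=9)
print({k: round(v, 2) for k, v in m.items()})
```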
Further analysis and evaluation of additional performance metrics can provide a more comprehensive picture of how well the model works in general and where it could be improved. During training, it may be advantageous to use a larger sample of MGMT-labeled data to enhance the outcomes and reduce the misclassification observed for class 0 (MGMT unmethylated) in the validation confusion matrix. With a bigger and more diverse dataset containing a significant number of samples from both class 0 and class 1, the model would have a better chance of learning the distinctive patterns and characteristics associated with each class. This increased exposure to class 0 data would improve the model's understanding of, and classification precision for, instances belonging to that class. Moreover, a larger amount of MGMT label data would give a more balanced representation of the classes, thereby mitigating the impact of class imbalance and ensuring that the model receives adequate training on both class 0 and class 1. It is anticipated that with a diverse and comprehensive MGMT label dataset, the model's ability to classify class 0 (MGMT unmethylated) will significantly improve, leading to predictions that are more reliable and accurate for both classes.

Validation set confusion matrix.
Comparison with state-of-the-art studies
Phase 2 of the pipeline performs the classification of MGMT promoter status using a 3D ResNet10, which takes the result of Phase 1 as input. The comparison of MGMT promoter status classification based on the input data is shown in Table 8. The proposed study adopts a comprehensive strategy by incorporating multiple modalities, including T1, T1ce, T2, and FLAIR MRI images. By integrating data from different imaging modalities, the model gains a more diverse and informative representation of the brain, leading to improved classification performance. Previous studies have typically focused on either segmentation or classification, with some researchers using only specific modalities. However, relying on a limited set of modalities may not provide a sufficiently comprehensive view of MGMT status. 50 Since each modality captures unique information about the tumor, 29 omitting any of them may result in critical features being overlooked. Therefore, utilizing all four modalities ensures a more robust and informative prediction of MGMT status. In this research, we extracted tumor regions (voxels) from whole-brain MRIs and used all four modalities provided in the BraTS2021 dataset. 3D MRI scans, incorporating multiple imaging sequences, offer a detailed view of the tumor and its surrounding tissue, contributing to a deeper understanding of the pathology. This multi-modal approach helps extract richer features for MGMT promoter status classification. 6 Most previous studies have used whole-brain MRI for MGMT classification, which may introduce irrelevant information into the learning process. In contrast, our approach specifically focuses on tumor regions, ensuring that classification is based on features directly associated with MGMT promoter status. Given that MGMT methylation is present in tumor tissue,6,10 segmenting the tumor before classification enhances the model's ability to extract meaningful features.
By integrating segmentation and classification into a unified framework, the proposed method allows the segmentation output to serve as a direct input for classification, ensuring a more targeted and effective analysis. Additionally, previous MGMT classification studies have reported challenges related to class imbalance, particularly in datasets where the unmethylated class (class 0) is underrepresented.9,29 Addressing this imbalance is crucial, as models trained on imbalanced datasets may exhibit bias toward the dominant class. We acknowledge this limitation and suggest that future work explore techniques such as data augmentation, oversampling, or reweighting strategies to mitigate class imbalance and enhance classification accuracy.
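One of the reweighting strategies mentioned above, inverse-frequency class weights, can be sketched in a few lines; the label list is a toy example, not the study's actual class distribution:

```python
from collections import Counter

def inverse_frequency_weights(labels) -> dict:
    """Per-class weights inversely proportional to class frequency,
    a common reweighting strategy for imbalanced datasets."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Imbalanced toy labels: 6 methylated (1) vs 2 unmethylated (0)
labels = [1, 1, 1, 1, 1, 1, 0, 0]
weights = inverse_frequency_weights(labels)
print({c: round(w, 2) for c, w in weights.items()})  # → {1: 0.67, 0: 2.0}
```

The minority class receives a proportionally larger weight, so a weighted loss would penalize its misclassification more heavily.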
Results of related studies of BraTS2021 MGMT classification.
The proposed pipeline achieved a ROC–AUC value of 0.66 on the validation set, indicating its potential for classifying MGMT promoter status.
The performance achieved by the proposed study is comparable with other existing studies in the field, as demonstrated in Table 8. Note that this study used data from 536 patients, a slightly smaller dataset than studies involving 585 cases; this sample size was dictated by the subset containing both segmentation data and MGMT labels. Despite the smaller dataset, our pipeline, incorporating all available modalities, achieved superior performance compared to certain other studies. In the future, a larger labeled dataset could further improve these results. The suggested approach ensures that an appropriate and relevant subset of the data is selected for this study's specific objectives.
Model complexity
The proposed 3D ResNet10 model exhibits relatively low model complexity compared to other architectures. With fewer parameters than larger models like ResNet50 or VGG16, the 3D ResNet10 model strikes a balance between being lightweight and capable of learning the correlations between input data and output labels. It has 14.3 million parameters and requires 8.1 giga floating-point operations (GFLOPs) per forward pass. This reduced complexity was necessary due to the limited availability of training instances (536) with MGMT promoter status in the BraTS 2021 dataset. Furthermore, the model trains efficiently, with an average execution time of 5 minutes per epoch and a total training time of approximately 8 hours. Additionally, the model's inference time per instance is only 5 seconds, making it suitable for real-time deployment in clinical systems. Table 9 compares the model complexity and computational cost of the 3D ResNet50, 3D VGG16, and 3D ResNet10 3D CNN-based models.
Comparison of parameters and FLOPs of CNN-based models.
The 3D ResNet50 model has about 40 million parameters and requires about 150 GFLOPs per forward pass. The 3D VGG16 model has about 140 million parameters and requires about 250 GFLOPs. With about 14.3 million parameters and about 8.1 GFLOPs, the proposed 3D ResNet10 model is the simplest and lightest of the three. This illustrates the varying levels of complexity and computational demand among the models: the 3D ResNet10 is a markedly more efficient option, consuming far less compute than the larger and more resource-intensive 3D ResNet50 and 3D VGG16 models.
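The parameter-count gap between these architectures follows from simple per-layer arithmetic: a 3D convolution contributes out_channels × (in_channels × k³ + 1) parameters including bias. A sketch of that formula (the 4→64-channel first layer with a 3×3×3 kernel is a hypothetical configuration, not the exact model):

```python
def conv3d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Parameter count of a 3D convolution: out_ch * (in_ch * k^3 + bias)."""
    return out_ch * (in_ch * k ** 3 + (1 if bias else 0))

# Hypothetical first layer of a 3D ResNet taking 4 stacked modalities:
# 4 input channels → 64 output channels, 3x3x3 kernel
print(conv3d_params(4, 64, 3))  # → 6976
```

Because these counts scale with the product of channel widths at every layer, deeper and wider networks such as 3D ResNet50 and 3D VGG16 accumulate tens of millions more parameters than the shallow 3D ResNet10.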
Limitations
While our study presents a robust and well-validated approach to BTS and MGMT methylation classification, certain challenges warrant further investigation. The BraTS 2021 dataset, while the largest publicly available dataset with both segmentation and MGMT labels, consists of 536 patients, which may appear limited but remains the only standardized resource for this task. Privacy regulations prevented us from accessing additional annotated data from local hospitals despite outreach efforts. MGMT methylation prediction is a complex radiogenomic task, and while our ROC–AUC score of 0.66 may seem modest, it aligns with prior studies and offers clinical value when combined with other biomarkers. To enhance classification performance, future work should explore multi-institutional datasets, alternative feature extraction techniques, and self-supervised learning approaches. Additionally, class imbalance remains a challenge, and while techniques like oversampling and data augmentation did not yield significant improvements, future efforts could investigate weighted loss functions or synthetic data generation. Our model's computational requirements, including 47.30 M parameters and a 70.75-minute epoch time, are comparable to state-of-the-art medical imaging models, with a 12-second inference time that is clinically feasible, though further optimization through model pruning and quantization could improve efficiency. While our segmentation-first pipeline follows established best practices, we acknowledge the potential for error propagation and have implemented data augmentation, post-processing, and ensemble methods to mitigate this. Future research could explore end-to-end architectures that jointly optimize segmentation and classification to reduce cascading errors. 
Finally, while our study provides a strong research foundation, real-world clinical adoption requires further validation, including external testing across diverse populations, comparisons with expert radiologists, and integration into clinical workflows. Prospective studies incorporating multi-modal AI approaches—such as combining imaging, genomic, and clinical data—may further improve predictive performance and translation to clinical practice. Addressing these challenges will be key to bridging the gap between research and real-world application.
Conclusion and future work
This research addresses the problem of BTS and classification of MGMT promoter status using the BraTS2021 dataset and the MGMT promoter dataset. The proposed pipeline provides a comprehensive and integrated way to identify MGMT promoter status, combining segmentation and classification models and using all MRI modalities (T1w, T1CE, T2w, and FLAIR) by stacking them to attain comparable results. The pipeline consists of two phases: Phase 1 uses 3D MRI modalities to segment the brain tumor subregions with a 3D ResU-Net architecture, and Phase 2 uses the result of the segmentation model (segmented tumor voxels) to predict the MGMT promoter status with a 3D ResNet10 classifier. The segmentation phase achieved promising results, with average Dice scores of 0.81, 0.84, and 0.80 for TC, WT, and ET on the validation set. The classification phase achieved a ROC–AUC score of 0.66 on the validation set, indicating its potential in predicting MGMT promoter status. The proposed pipeline has therapeutic utility, since it can assist neuro-oncologists in diagnosing brain tumors more accurately and efficiently. By decreasing subjectivity and variability in human interpretation, the pipelined BTS-and-classification system can potentially improve the consistency and objectivity of diagnosis. The proposed pipeline can help neuro-oncologists predict tumorous regions and classify tumor sub-regions such as WT, TC, and ET, and it can assist neuro-oncologists or surgeons in predicting the MGMT promoter status without the use of surgical equipment. In the future, leveraging larger datasets to train novel DL models will enhance the overall performance of the proposed pipeline and help radiologists predict tumorous regions and MGMT promoter status more efficiently. Advanced techniques will elevate diagnostic quality, streamline decision-making, and improve patient care.
