Abstract
Introduction
Kidney tumors are one of the top 10 most common malignant tumors, accounting for 3.7% of newly diagnosed tumors. Renal cell carcinoma (RCC) is the most common type, accounting for 85% of cases. 1 The median age at diagnosis is 64 years, and it is more commonly seen in men than in women. The 5-year survival rate of RCC has been constantly improving, but the overall prognosis remains poor, especially in patients with a later-stage disease. 2 In the past 12 years, RCC has been targeted by cytokines, targeted therapy and immunotherapy, however, therapeutic benefits are limit. The most common pathological type of RCC is renal clear cell carcinoma (KIRC), which has poor prognosis and a high degree of malignancy. 3
mRNA-processing events which include alternative splicing, m6A methylation and alternative polyadenylation (APA), are crucial in the regulation of most human genes in various diseases, such as brain cancer, lung cancer, liver cancer and COVID-19.4-9 However, there is limited research on the relationship between alternative polyadenylation (APA) and renal clear cell carcinoma. APA is regulated by core polyadenylation trans-factors, including cleavage and polyadenylation specificity factor (CPSF), cleavage stimulatory factor or cleavage stimulation factor (CSTF), cleavage factor (CFim and CFIIm), poly(A) binding protein nuclear 1 (PABPNl), cytoplasmic poly(A) binding proteins (PABPC1 and PABPC4), Factor Interacting With PAPOLA And CPSF1(FIP1L1), Symplekin(SYMPK), and Cleavage and polyadenylation factor subunit(PCF11). 10 Presently, numerous relevant studies have found that APA causes abnormal changes in a variety of tumors.6,11-15
Cancer development is highly associated with the physiological state of the tumor microenvironment. 16 Tumor gene mutation burden (TMB) pertains to the number of mutations in the cancer cell genome and TMB score was associated with multiple immune components and signatures in tumor microenvironment. 17 APA have been reported to be closely associated with the tumor microenvironment in breast cancer. 18 However, few studies have focused on the relationship between APA and tumor microenvironment in kidney cancer.
In this study, we constructed a (least absolute shrinkage and selection operator) lasso regression model using transcriptomic and clinical data of APA regulatory factors in The Cancer Genome Atlas (TCGA) database and found that 5 APA regulatory factors (CPSF1, CPSF2, CSTF2, PABPC1, and PABPC4) play more important roles in renal cell carcinoma. Lasso is a regression analysis method that combines feature selection and regularization to enhance the predictive accuracy and interpretability of statistical models. We found above 5 regulatory factors mainly regulate the mRNA expression of immune-related genes. To further analyze the relationship between APA regulatory factors and clinical features in KIRC and confirm the specific pathway regulating APA regulatory factors, we analyzed the dynamic changes between the expression levels of these APA regulatory factors and clinical features by applying multi-omics data from TCGA. At last, we validated our result in independent datasets of GEO.
Methods
Patient dataset collection
Gene expression data of Kidney Renal Clear Cell Carcinoma were downloaded from the TCGA database (https://cancergenome.nih.gov/). 19 RNA-Seq and clinical data were obtained for 538 samples. The downloaded clinical data included information on age, pathological stage, sex, chemotherapy status, follow-up date, and survival status. Those with missing survival data, missing follow-up date, and survival less than 1 month duration were excluded, and the remaining data were further matched with gene expression data. Finally, 175 patients with both gene expression and clinical data were obtained. To validate our results analyzed from TCGA, independent RNA-Seq data for Renal Cell Carcinoma from the cancer and normal tissue groups were downloaded from the GEO database GSE76207 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76207) 20 and GSE29609 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE29609). 21
Immune cell phenotypes analysis
All analyses involved in this study were performed using the R software (version 3.5.1). The method flow of the KIRC classification model and the prognosis model construction is shown in Figure 1. First, the immune cell infiltration levels of KIRC tumor tissue and normal tissue were evaluated using TIMER (https://cistrome.shinyapps.io/timer/). 22 Then, the immune cell score composed of 8 immune cells was constructed using lasso regression combined with clinical characteristics to construct the KIRC model by Cox regression. TIMER can recognize 22 immune cell phenotypes, including B cells, T-cells, natural killer cells, macrophages, dendritic cells, and bone marrow subpopulations, with high sensitivity and specificity.

The pipeline of our method.
Gene expression and clinical data analysis
Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn/), developed by Peking University, is an interactive web server that integrates and helps analyze cancer expression profile data, including RNA sequencing expression data of tumor samples and normal samples from 33 kinds of tumors in the TCGA and GTEx.23,24 In this study, the GEPIAs database was used to plot survival curves. Cbioportal (https://www.cbioportal.org/study/clinicalData?id=kirc_tcga) is a TCGA online data analysis platform that includes data on gene mutation, transcriptome, and proteomics. Oncoprints were performed using Cbioportal software. Differential gene expression analyses were performed by DESeq2 according to the read counts, read counts of each gene determined by HTSeq. Genes with
Risk assessment model construction
Lasso regression helps obtain a more concise model by constructing a penalty function to compress some regression coefficients and set some regression coefficients to zero. 26 In this study, the R-package “glmnet” was used to select the penalty parameter with the minimum error through 10-fold-change cross-validation λ, to select the most effective prognostic marker and its corresponding regression coefficient in the construction of the risk score. According to the risk model, samples in TCGA were assigned a score and then divided into high-risk and low-risk groups with the median risk score as the threshold. The survival curves of the patients in the high- and low-risk groups were drawn with the R-package “Survival,” and the survival times of the 2 groups were compared using log-rank test. We developed a computational framework through integrating gene expressions and clinical data in TCGA and GEO (Figure 1).
Results
Construction of APA regulator signature-based risk assessment model in KIRC
Analysis of the gene expression of APA regulatory factors in TCGA renal cell carcinoma samples revealed that CPSF1, CPSF2, CPSF3, CPSF4, CPSF6, CPSF7, FIP1L1, CSTF1, CSTF2, CSTF2T, PCF11, SYMPK, PAPOLG, PABPC1, PABPC4, and PABPN1 were significantly different between tumor tissues and normal tissues (Figure 2A). CPSF4L, CSTF3, PAPOG were not significantly different between tumor tissues and normal tissues. Lasso Cox regression analysis was performed on the above significantly differently expressed APA regulators. The penalty parameter lambda was selected by the cross-validation method to obtain relatively independent feature genes for subsequent model analysis (Figure 2B and C). The risk score for each patient was calculated using the following formula: Risk score = 0.01927*EXP(CPSF1) + (−0.00165)*EXP(CSTF2) + (−0.00927)*EXP(CPSF4) + (−0.01672)*EXP (PABPC1) + (−0.00255)*EXP (PABPC4) (Figure 2B and C). All kidney cancer patients from TCGA were classified as high or low risk according to the optimum cut-off risk score in the KIRC cohort. Interestingly, GO enrichment analysis of differently expressed genes between high-risk group and low-risk group indicated that APA regulators were associated with immune microenvironment-related genes in KIRC (Figure 2D). We speculated that APA regulatory factors might regulate immune microenvironment-related APA to affect the immunotherapy of KIRC.

Feature selection using the LASSO Cox regression model. (A) Heatmap showing the expression level of APA regulator in cancer and normal tissue. (B) The partial likelihood deviance was plotted versus log (lambda). The y-axis indicates the partial likelihood deviance, while the lower x-axis indicates the log (lambda) and the upper x-axis represents the average number of predictors. (C) LASSO coefficient profiles. The coefficients (y-axis) were plotted against log (lambda) and 5 features with nonzero coefficients were selected to build the radiomics signature. (D) GO-Term analysis of differently expressed genes between high-risk group and low-risk group.
Hallmarks and survival analysis of APA regulators
To further investigate the relationship of immune microenvironment and APA regulators, all patients classified as high or low risk were used to performed TMB score. TMB score partly explains the clinical response to immunotherapy, and we found that the high-risk group had low TMB (Figure 3A), indicating lower levels of neoantigens that can be recognized by the immune system. OS was also compared between the 2 groups using Kaplan-Meier analysis. The results (Figure 3B) showed a different survival curve based on the variables selected by the Cox model in the KIRC, suggesting the significance of the APA regulators in the separation of the 2 risk groups. Furthermore, we downloaded the most frequently mutated genes (VHL, PBRM1, SETD2, BAP1, MTOR, PTEN, KDM5C, ARID1A, TP53, and SPEN) in KIRC. The patients were ranked by the risk score (formula given earlier). The mutation status of these genes is shown in Figure 2C. Interestingly, the mutation rates of VHL, PBRM1, MTOR, and SETD2 were low in the high-risk score group. However, BAP1 mutations were mutually exclusive with PBRM1, MTOR, and SETD2. In addition, we did not find an anomalous trend in the mutation status of PTEN, KDM5C, ARID1A, TP53, and SPEN. This could be due to the low mutation rates of these genes.

LASSO Cox regression model predicted TMB and gene alternative status of KIRC. (A) Mutation burden in high-risk score group versus low-risk score groups. (B) Survival curves obtained for the genes exclusively selected by the COX method, when analyzed individually. (C) Oncoprint plot showing key genes mutated in KIRC. Each column denotes an individual tumor and each row represents a gene. Colors indicate type of gene alternative as indicated in the legend below the oncoprint.
APA regulators risk assessment model for predicting the immune status of KIRC
Just as we expected, higher expression levels of CPSF1, CPSF2, CSTF2, PABPC1, and PABPC4 were significantly associated with increased tumor infiltration of CD4+ T-cells and neutrophils (Figure 4). CPSF1 was associated with tumor infiltration of CD4+ T-cells and neutrophils in the KIRC. CPSF2, PABPC1, and PABPC4 were associated with purity, B cells, CD8+ T-cells, CD4+ T-cells, macrophages, neutrophils, and dendritic cells. However, we found that CSTF2 had no relationship with the purity of KIRC.

Expression level of 5 APA regulators CPSF1, CPSF2, CSTF2, PABPC1, and PABPC4 is correlated with the level of immune infiltration in KIRC.
Survival analysis of five genes in KIRC
Furthermore, 3 (CPSF1, CPSF2, and CSTF2) of the 5 genes were closely correlated with the overall survival (OS) in KIRC. The high expression levels of CSTF2 and CPSF2 were associated with superior OS in KIRC, indicating that CSTF2 and CPSF2 are risk factors for KIRC. The low expression level of CPSF1 is associated with poor survival time in KIRC, indicating that CPSF1 is not a risk factor for KIRC. Thus, most key APA regulatory factors were significantly correlated with prognosis (Figure 5).

Kaplan-Meier survival plots representing the correlations between the expression level of 5 APA regulators expression levels in KIRC. P value is indicated.
Dynamic APA to APA regulatory factor network and risk model for predicting survival in KIRC
In fact, Hu et al have constructed a multi-omic data based 7-gene model in KIRC previously. To compare with this model, we plotted the receiver operating characteristic (ROC) curve of the true-positive rate (sensitivity) as a function of the false-positive rate for 7-gene model and our APA model. We found the AUC of their 7-gene model was 0.64 in our data. Our APA model shows better performance comparing with previously published 7-gene model (Figure 6A). Therefore, the results show that our risk model has a very good predictive efficiency in KIRC. To further validate our results concluded in TCGA datasets, we analyzed RNA-Seq data of other independent RNA-Seq data for Renal Cell Carcinoma from the cancer and normal tissue groups and observed the same results (Figure 6B), 20 suggesting 5 APA regulatory factors (CPSF1, CPSF2, CSTF2, PABPC1, and PABPC4) were significantly associated with development of KIRC.

ROC curve and data validation. (A) TCGA and GSE29609 receiver operating characteristic (ROC) curve for APA risk model (red line) and 7-gene model (blue line) for the prognosis of KIRC. (B) Boxplot of gene expression level of 5 APA regulators CPSF1, CPSF2, CSTF2, PABPC1, and PABPC4 in validation datasets.
Discussion
APA has been reported to drive oncogenic gene expression in many cancers, particularly kidney cancer. 27 Therefore, APA regulators which regulated APA may be an excellent predictor of survival in renal cell carcinoma. The research of APA regulators including CPSF, CFim and CFIIm, PABPC1 and PABPC4, CSTF, PABPNl, FIP1L1, PCF11, and SYMPK28-32 is limit. To date, there has been no research focusing on the role of APA regulators in KIRC.
Previous research has found that APA plays an important role in renal cancer using bioinformatics analysis.
27
There are 2 potential mechanisms of APA regulation during tumorigenesis. APA is regulated in
The results of this study will enhance our understanding of the underlying roles of APA in KIRC. The conclusions might be meaningful to improve the understand of mechanism of KIRC and provide directions for future treatment trends.
