Abstract
Introduction
Finding differentially expressed proteins (DEPs) between two conditions is a common problem in biological and clinical research. There are several quantification methods for measuring DEPs based on liquid chromatography-mass spectrometry/tandem mass spectrometry (LC-MS/MS). These are either label-free or stable isotope labeling methods. 1 Label-free methods offer wider dynamic range and broader proteome coverage, while stable isotope labeling approaches offer higher quantification precision and accuracy. 2
Among many labeling methods, chemical isobaric tagging (including isobaric tag for relative and absolute quantitation [iTRAQ] and tandem mass tag [TMT]) provides up to 8-plex analysis by quantifying at the tandem MS level. However, it suffers from severe dynamic range compression and reduced quantitative accuracy due to precursor interference when samples are complex.3,4 Because many research studies deal with complex samples, we have to consider quantification at the LC-MS level.
Among labeling methods at the LC-MS level, stable isotope labeling of amino acids in cell culture (SILAC) 5 has high quantification accuracy. However, SILAC requires several cell cycles to incorporate the labels, and in research like microRNA (miR) target prediction, the miR-mediated regulation of proteins with long half-lives may not be detected by measuring steady-state protein levels using SILAC. 6 Pulsed SILAC, which compares only the differential expression of newly synthesized proteins at different time points, is a powerful analytical tool. However, it is relatively expensive, time-consuming, and not practical for analyzing biological samples that cannot be grown in culture, such as tissues or body fluids. 6 In addition, most proteomic centers that run LC-MS/MS experiments cannot perform the required cell culturing because of licensing issues in handling virus-transfected cells. Laboratories with cells that need LC-MS/MS analysis may not have the resources and time to implement a complex laboratory protocol such as SILAC/pulsed SILAC. These practical aspects limit the application of SILAC/pulsed SILAC.
An alternative LC-MS quantification method based on chemical labeling is dimethylation of peptides. However, deuterated peptides show a small but significant retention time difference in reversed phase chromatography compared to their nondeuterated counterparts. 7 This complicates data analysis because the relative quantities of two peptides cannot be determined accurately from one spectrum, and it requires integration across the whole chromatographic timescale. Because many co-eluting peptides contaminate the elution peaks, integrating across the whole chromatographic timescale becomes impractical.
Isotope-coded affinity tagging is a chemical labeling method that was first described by the Aebersold lab. This method only quantifies cysteine-containing peptides carrying
Reverse phase protein array 8 is another protein quantification method; however, it is limited by the availability of high-quality protein antibodies.
18O/16O labeling has relatively low cost and complexity. It requires neither specific amino acids in peptides 9 nor label incorporation through several cell cycles, and it does not cause significant elution time shifts between heavy- and light-labeled peptides. Its dynamic range of quantification is larger than that of tandem MS-based quantification methods. These properties give 18O/16O labeling the maximum flexibility in application.
In this work, we propose to develop an LC-MS-based quantitative proteomic approach for identifying DEPs based on 18O/16O labeling and apply the approach to the problem of Kaposi sarcoma-associated herpesvirus (KSHV) miR-K1 target prediction. MiRs are short RNAs that regulate target gene expression levels.10,11 Dysregulation of miRs may lead to disease progression and cancer pathogenesis, 12 but the underlying mechanisms are still not very clear. Understanding miR function is not possible without knowledge of the target messenger RNAs (mRNAs) of miRs. KSHV is the causative agent of Kaposi sarcoma (KSar), which is associated with primary effusion lymphoma (PEL), and a subset of multicentric Castleman disease. 13 KSHV encodes dozens of miRs derived from 12 pre-miRNAs, among which miR-K1 is a very important one. It directly regulates the IκBα protein by targeting the 3′ untranslated region (UTR) of its transcript. The expression of miR-K1 is sufficient to rescue IκBα protein activity and inhibit viral lytic replication, whereas the inhibition of miR-K1 in KSHV-infected PEL cells has the opposite effect. 14 We aim to identify miR-K1 targets by identifying DEPs between human embryonic kidney 293T cells transfected with KSHV miR-K1 and the control group transfected with an empty vector for 48 hours. 14 We use tandem MS for peptide identification (not quantification) and LC-MS for quantification. The miR-K1-transfected sample is digested with trypsin in 18O-water, and the control sample is digested in normal water. After digestion, the samples are mixed together. Twenty strong cation exchange (SCX) 15 fractions of peptides are collected, and each fraction is further divided into three technical replicates.
In 18O/16O data processing, one needs to properly address the following issues.
We need to perform peptide feature alignment between different technical replicates, so that a peptide can be quantified multiple times to reduce quantification variations.
Due to interactions between peptide ion clouds inside mass spectrometers, smaller ion clouds in Orbitrap instruments can get torn apart by larger ion clouds 16 and the abundance measurements of smaller ion clouds are suppressed randomly, as shown in our previous research. 17 This suppression effect leads to unknown bias and variation in fold change measurements.
Multiple peptides of a protein could have been quantified in different data fractions with different bias and variation, which cannot be estimated by assuming a simple Gaussian noise model. 17 Consequently, it is hard to estimate protein expression levels based on peptide measurements.
We do not know what statistical test is most appropriate for picking DEPs, given the complex structure of LC-MS/MS data, where each peptide has its unique measurement variance.
Although there are several algorithms and software programs published for 18O/16O labeling,18–22 the abovementioned issues have not been addressed properly. To overcome these issues, we first apply an alignment algorithm called statistical corresponding feature identification algorithm (SCFIA) 23 to boost the total number of quantifications per peptide per protein. In this way, the majority of tandem MS-identified peptides within an SCX fraction can be quantified three times in three technical replicates. Otherwise, a lot of peptides are only quantified once, as in the widely distributed packages such as Trans-Proteomic Pipeline (TPP) 24 and MaxQuant. 19 With more measurements per peptide, we can reduce the measurement variance. After alignment, we employ a peptide quantification method that we developed in a previous study 25 to partially remove interference and random suppression effects.
After peptide quantification, we need to find downregulated DEPs because miRs regulate proteins through suppression. Thus, we need to estimate the direction of protein expression. We develop a variance estimation method, so that the estimated variance can be used for weighing peptide measurements. We assume that unique and nonunique peptides have uniform variance within their fractions. The uniform variance assumption allows us to weigh unique and nonunique measurements properly.
After we determine the direction of protein expression, various statistical tests are employed for picking DEPs. In genomics, tests are either homoscedastic or heteroscedastic. The former model assumes uniform variance for all protein/gene measurements, which clearly does not fit our proteomic data. The heteroscedastic model in genomic data processing assumes gene-by-gene variance 26; however, multiple measurements of a gene are still assumed to have the same distribution, which does not fit our data either. A recent study 27 compares several statistical tests often used in proteomics. However, none of the tests, including the widely used
In this paper, we select statistical tests by examining the enrichment of photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP) predicted targets with seed match to miR-K1 among top DEPs returned by different statistical tests. PAR-CLIP uses 4-thiouridine to label mRNAs
As seed match is one of the known mechanisms of miR-mRNA binding, a correlation should exist between the PAR-CLIP-predicted targets with seed match to miR-K1 and the DEPs predicted by a good statistical test. On the other hand, the two lists are not expected to overlap completely because: (1) There may exist other binding mechanisms, 30 and consequently, PAR-CLIP plus seed match produces a target list that contains false positives. (2) Downregulated DEPs could be attributed to secondary effects of miR transfection.
We developed a statistical test called the Kullback-Leibler 31 distance test (KL test), which uses the KL distance as a goodness-of-fit measure for comparing peptide fold changes of a protein to that of a background protein with the same number of peptide measurements. Through the KL test, we obtained a significant enrichment of PAR-CLIP-predicted targets with seed match to miR-K1 (PAR-CLIPSMK1). We also examined the
After the statistical test, we had a list of DEPs whose mRNAs are potential miR-K1 targets. However, due to protein-protein interactions, the mRNAs of these DEPs may not be direct miR-K1 targets, and we have to jointly consider miR-K1 targets returned by other prediction methods. Computational methods have been widely used to predict miR targets, 33 and most of them rely on sequence complementarity between the 5′ end of mature miRs and the 3′UTR of target genes (seed match).34,35 While seed match is an important mechanism for miR target binding, there are other possibilities. 36 For example, the SVMicro 37 miR target prediction algorithm investigates over 30 features and statistically combines these features for target prediction. However, computational target prediction algorithms suffer from both high false-positive and false-negative rates 38 due to the lack of a comprehensive understanding of the binding mechanisms.
Alternatively, we can measure mRNA abundance changes using high-throughput microarray approaches 39 for target prediction because miRs cause downregulation at the gene level. However, this method could be problematic if most miRs regulate gene expression by translational inhibition rather than mRNA degradation, which has been shown to be the case in animals.40,41 Even if mRNA degradation is the main gene regulation mechanism, 42 there would still be a lot of downregulated genes due to the secondary effects of miR binding that are not direct miR targets.
To improve the true positive rate of identified DEPs as miR-K1 targets, we propose to combine various target prediction methods by further filtering the list of DEPs picked by the KL test using the following criteria: (1) the corresponding mRNAs of DEPs must be downregulated in microarray experiments; (2) the DEPs must have been predicted to be possible targets by SVMicro 37 ; and (3) the DEPs must have been reported as PAR-CLIPSMK1s.
After applying these criteria, the list of DEPs is reduced to three in our experiment, among which RAB23 and HNRNPU are novel. These novel targets have been confirmed by both Western blotting and Luciferase reporter assays, which shows that the developed quantitative approach based on 18O/16O labeling can be combined with the genomic, PAR-CLIP, and target prediction algorithms to identify KSHV miR targets with high confidence. The developed approach can also be applied in other biological and clinical research.
Methods
Data collection
The samples were pre-separated into 20 fractions, and each fraction was analyzed in three technical replicates. Human embryonic kidney 293T cells (ATCC) were cultured in Dulbecco's modified Eagle's medium with 10% fetal bovine serum. One group was transfected with a vector expressing miR-K1 of KSHV, while the control group was transfected with an empty vector, for 48 hours. 14 These two groups of samples were lysed in 8 M urea and 50 mM ammonium bicarbonate (pH 8.3). The lysates were subjected to centrifugation at 13,000 rpm for 20 minutes, and the supernatants were collected. Then, the two samples were denatured in 8 M urea, reduced using 10 mM dithiothreitol, alkylated with 30 mM iodoacetamide, and digested with trypsin (using an enzyme-to-protein ratio of 1:50) at 37 °C overnight. The samples were desalted with Sep-Pak cartridges, separated into two tubes, and dried in a SpeedVac. The first sample was resuspended in 100 μL 18O-water (purity > 98%) containing 50 mM ammonium bicarbonate, 10 mM calcium chloride, and trypsin (1:50 w/w trypsin:peptide) at pH 7.8. The second sample was treated in the same manner except that the 18O-water was replaced with purified 16O-water. After incubation with shaking at 450 rpm for five hours at 37 °C, the labeling reaction was terminated by first boiling the sample for 10 minutes and then adding 5 μL of formic acid to further inhibit any residual trypsin activity. A bicinchoninic acid assay was performed to determine peptide concentration. Two hundred micrograms of equally combined sample was fractionated into 20 fractions using SCX. Then, the four samples were subjected to reverse phase-reverse phase LC followed by ETD-LTQ-Orbitrap Velos MS.
PAR-CLIP data were downloaded from http://bugs.mimnet.northwestern.edu/labs/gottweinlab/Data.html.
For more information, please refer to Ref. 29. Details of gene expression data based on miR-K1 transfection were published in Ref. 37.
LC-MS/MS data processing
Figure 1 shows the overall processing flow chart that consists of four steps: (1) preprocessing, (2) peptide quantification, (3) protein quantification, and (4) identification of DEPs.

Figure 1. The overall processing flow chart.
Preprocessing
The goal of preprocessing is to obtain a list of tandem MS-identified peptides, which we then quantify at the LC-MS level. We use TPP for this purpose. Raw LC-MS data collected from the Orbitrap are first converted to the mzXML format and submitted for tandem MS peptide identification using X! Tandem, which is called by TPP. As mentioned in the previous section, there are 20 SCX fractions and 3 technical replicates within each fraction, which results in 60 LC-MS files. All 60 files are processed in TPP in a single run. The protein database used is the International Protein Index (IPI) human database version 3.68 (http://www.mmnt.net/db/0/5/ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/). For X! Tandem, the parent mass and fragment ions are searched with maximal mass errors of 7 ppm and 0.5 dalton, respectively. Methionine oxidation and N-terminal acetylation are considered as variable modifications, and cysteine carbamidomethylation is selected as the fixed modification. The modification mass of C is set to 57.021464, and the potential modification mass of M is set to 15.994915. The input of the cleavage C-terminal mass change is set to 21.01. In the database search, the minimum peptide length is set to 6 and the maximum number of missed cleavage sites is set to 2. The PeptideProphet score threshold is set to 0.9 to ensure high-confidence identification. Peptides that are identified multiple times are combined, resulting in a list of 31,268 distinct peptides and 4,740 distinct proteins. For more information about the protein expression, please refer to the Supplementary File (http://compgenomics.utsa.edu/zgroup/miRTargetprediction/miRTargets.htm).
MaxQuant processing
MaxQuant is a widely available software package that can process 18O/16O data. In order to compare our approach with that of MaxQuant, we downloaded MaxQuant_1.3.0.5 from the webpage www.maxquant.org. IPI human database version 3.83 is selected as the source of protein sequences. We set the MS1 tolerance to 20 ppm for the first search and 6 ppm for the main search. We set the MS/MS tolerance to 20 ppm, peptide false discovery rate (FDR) to 0.01, protein FDR to 0.01, site FDR to 0.01, and heavy labels to 18O. We selected oxidation (M) and acetyl (protein N-term) as variable modifications. In the database search, the minimum peptide length is set to 7 and the maximum number of missed cleavage sites is set to 2, as these are the default values of the software.
Peptide quantification
Although both TPP and MaxQuant perform peptide quantification, they are not specifically optimized for 18O/16O data. The ASAPRatio algorithm used in TPP is designed for low resolution data, and MaxQuant is shown to have large bias. 25
The quantification process is briefly described as the following. We first obtain a union list of tandem MS-identified peptides across all the three technical replicates in each fraction. Then, candidate LC peaks of peptides are identified at the LC-MS level. We employ a special LC peak detection algorithm that is effective in removing interference from co-eluting peptides. 25
If a peptide is identified by tandem MS with elution time information, we pick the LC peak that matches in elution time. If a peptide is identified in other technical replicates but not in the current one, we employ SCFIA, developed by Cui et al., 23 to find its LC peak in the current technical replicate.
Once the peptide peaks are identified in all the technical replicates, a linear regression method is used to quantify the heavy-to-light ratios (HLRs) between the labeled and unlabeled peptides. 43
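As a rough illustration of ratio estimation by linear regression, the sketch below fits a least-squares line through the origin to paired light and heavy intensities sampled across one LC peak and reports the slope as the HLR. This is an assumption made for illustration only: the regression model in Ref. 43 may differ (for example, in how isotope envelopes or incomplete labeling are handled), and the function name `heavy_light_ratio` is hypothetical.

```python
import numpy as np


def heavy_light_ratio(light, heavy):
    """Slope of heavy vs. light intensities for a line forced through the origin.

    light, heavy: intensities of the unlabeled and labeled peptide sampled at
    the same scans across one LC peak (illustrative input layout).
    """
    light = np.asarray(light, dtype=float)
    heavy = np.asarray(heavy, dtype=float)
    if light.shape != heavy.shape:
        raise ValueError("light and heavy must have the same length")
    denom = np.dot(light, light)
    if denom == 0.0:
        raise ValueError("light intensities are all zero")
    # Least-squares solution of heavy ~ slope * light.
    return float(np.dot(light, heavy) / denom)
```

Using every scan across the peak, rather than a single spectrum, lets scans with stronger signal contribute more to the fitted ratio.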
Protein quantification
Once peptide level quantification is finished, we need to integrate peptide HLRs to that of proteins. However, peptide HLRs have different measurement variances. For example, nonunique peptides that are shared among several proteins have larger variance than unique peptides. Peptides measured on different fractions have different variances. These variances cannot be estimated for every peptide, and we cannot hope to get accurate protein quantification without knowing the variances. To best approximate the true protein expression level, we have to make some simplifying assumptions so that we can at least estimate the direction of protein expression. For this purpose, we assume that unique or nonunique peptides share the same measurement variance, within an LC-MS dataset, which can be estimated as described in the following sections:
Estimating peptide expression variances. To estimate the variance of measurements within a dataset, we consider two peptides from the same protein. Specifically, suppose two peptides from a protein in an LC-MS/MS dataset are measured to have HLRs as
As the peptides of the same protein in the same dataset share the same bias in sample preparation and instrument suppression/distortion, the means of the two log ratios can be assumed identical, and we have:
As we assume that the variance of LRD of unique peptides (varLRDU) is uniform within an LC-MS run, we can take many samples within an LC-MS file to estimate the varLRDU, which reflects variations from the labeling process and the instrument. The LRD of nonunique peptides (varLRDNU) will reflect the interference from other proteins in addition to that from the labeling process and the instrument. varLRDU and varLRDNU are estimated for each dataset (60 in total). We denote varLRDNU as
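The variance estimate described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that LRD denotes the difference between the natural-log HLRs of two peptides from the same protein, uses the zero-mean assumption stated above, and pools all peptide pairs within one LC-MS run. The function name `estimate_var_lrd` and the input layout are hypothetical.

```python
import math
from itertools import combinations


def estimate_var_lrd(hlrs_by_protein):
    """Estimate the variance of log-ratio differences (LRD) in one LC-MS run.

    hlrs_by_protein: dict mapping a protein ID to the list of HLRs of its
    peptides (all unique, or all nonunique) measured in this run.
    """
    squared_diffs = []
    for hlrs in hlrs_by_protein.values():
        logs = [math.log(h) for h in hlrs]
        # Every pair of peptides from the same protein yields one LRD sample.
        for a, b in combinations(logs, 2):
            squared_diffs.append((a - b) ** 2)
    if not squared_diffs:
        raise ValueError("need at least one protein with two or more peptides")
    # With the mean of LRD assumed to be zero, the variance is the mean square.
    return sum(squared_diffs) / len(squared_diffs)
```

Running the function once on the unique peptides and once on the nonunique peptides of a dataset would give the two per-dataset quantities discussed above. Pairs that share a peptide are not independent, so this pooled estimate should be read as approximate.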
Determining protein expression direction using weighted average of peptide HLRs. Before protein quantification, we first need to combine peptide HLRs in three technical replicates within each fraction. We first consider taking the weighted average of three measurements when assuming independent instrument noise. However, this approach results in higher varLRDU and varLRDNU within each fraction. This can be attributed to the fact that distortions caused by the instrument are not completely independent in technical replicates. Therefore, we opt for selecting the replicate with the smallest varLRDU.
After peptide measurements within each fraction have been determined, we further consider the problem of combining different peptide measurements from different fractions. In such cases, the variation can be attributed to operational and experimental sources in different LC-MS runs, which are much larger than instrumental variations and can be assumed as independent.
Suppose a protein is measured for
Of note, the sign of
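A weighted average of peptide measurements can be sketched as below. The weighting scheme is an assumption: the sketch uses inverse-variance weights on log HLRs, which is the standard choice for independent measurements with known variances, and it takes the heavy channel to be the miR-K1-transfected sample, so that a negative weighted log ratio indicates downregulation. The function names are hypothetical.

```python
import math


def weighted_log_ratio(measurements):
    """Inverse-variance weighted mean of log HLRs.

    measurements: iterable of (hlr, variance) pairs, where variance is the
    estimated variance for the peptide's fraction and its unique/nonunique class.
    """
    numerator = 0.0
    denominator = 0.0
    for hlr, variance in measurements:
        if hlr <= 0 or variance <= 0:
            raise ValueError("HLR and variance must be positive")
        weight = 1.0 / variance
        numerator += weight * math.log(hlr)
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no measurements given")
    return numerator / denominator


def expression_direction(measurements):
    """Return 'down', 'up', or 'unchanged' from the sign of the weighted log ratio."""
    value = weighted_log_ratio(measurements)
    if value < 0:
        return "down"
    if value > 0:
        return "up"
    return "unchanged"
```

Measurements with larger estimated variance, such as nonunique peptides, receive smaller weights and therefore influence the estimated direction less.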
Differential expression prediction
The variance of peptides is calculated based on the assumption that all unique/nonunique peptides share the same variance within the same LC-MS run. While this is a reasonable assumption for estimating the direction of protein expressions, it ignores that each peptide has different labeling efficiency and goes through the instrument with its own random distortion. 17 We cannot rely on the estimated protein expression level for picking DEPs.
Popular statistical tests, such as the
The KL divergence 31 (information divergence) is a non-symmetric measure of the difference between two probability distributions, and the KL divergence between two normalized histograms P and Q is
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}.$$
We want to determine if a protein with
To avoid such problems, we draw 5,000 random samples of normalized histograms

Figure 2. The flowchart of calculating the KL score and significance of each protein.
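The following is a minimal Monte Carlo sketch of a KL-distance goodness-of-fit test of this kind. It is not the authors' code. It assumes that the background is a pool of peptide log ratios, that each random draw resamples as many background measurements as the protein has, and that a small pseudocount is added to every histogram bin so that the divergence stays finite. The bin count, the pseudocount, and all function names are illustrative choices.

```python
import numpy as np


def normalized_histogram(values, bin_edges, pseudocount=1e-6):
    """Histogram normalized to sum to 1, with a pseudocount to avoid empty bins."""
    counts, _ = np.histogram(values, bins=bin_edges)
    hist = counts.astype(float) + pseudocount
    return hist / hist.sum()


def kl_divergence(p, q):
    """KL divergence between two normalized histograms with strictly positive bins."""
    return float(np.sum(p * np.log(p / q)))


def kl_test(protein_log_ratios, background_log_ratios,
            n_bins=20, n_draws=5000, seed=0):
    """Return (KL score, empirical p-value) for one protein."""
    rng = np.random.default_rng(seed)
    protein = np.asarray(protein_log_ratios, dtype=float)
    background = np.asarray(background_log_ratios, dtype=float)

    # Shared bins so that all histograms are comparable.
    low = min(protein.min(), background.min())
    high = max(protein.max(), background.max())
    bin_edges = np.linspace(low, high, n_bins + 1)

    q = normalized_histogram(background, bin_edges)
    observed = kl_divergence(normalized_histogram(protein, bin_edges), q)

    # Null distribution: random "background proteins" with the same number
    # of peptide measurements as the protein under test.
    null_scores = np.empty(n_draws)
    for i in range(n_draws):
        sample = rng.choice(background, size=protein.size, replace=True)
        null_scores[i] = kl_divergence(normalized_histogram(sample, bin_edges), q)

    p_value = (1 + np.sum(null_scores >= observed)) / (n_draws + 1)
    return observed, float(p_value)
```

Matching the sample size of the random draws to that of the protein is what keeps the comparison fair: a histogram built from a handful of peptides is far from the background histogram by chance alone, and the null scores capture exactly that effect.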
We have also applied other statistical tests and quantification methods (TPP and MaxQuant) to pick the DEPs. As there is also the problem of significant mismatch in sample size in the
In these tests, we construct the background peptide distribution
In
The KS test is applied in a similar way as the
TPP returns a list of protein measurements and
The modified
Modified KS test is constructed by replacing the
MaxQuant is a popular software package that can process 18O/16O data. MaxQuant returns a list of protein measurements. As MaxQuant does not offer
Results
We compared and evaluated different test statistics based on the enrichment of PAR-CLIPSMK1 proteins within the top ranked DEPs picked by different methods. The rationale is that if a test statistic reflects real differential expression, then PAR-CLIPSMK1 proteins should be enriched among the top ranked proteins.
The enrichment rate is calculated as the percentage of PAR-CLIPSMK1 targets among all downregulated DEPs up to a given rank, with the DEPs sorted by the selected test statistic. For comparison, we also calculated the enrichment rate of PAR-CLIP targets without seed match to miR-K1.
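The enrichment rate at each rank can be computed with a running count, as in the sketch below. The function name `enrichment_curve` and the protein identifiers in the test data are placeholders and do not come from the study.

```python
def enrichment_curve(ranked_proteins, target_set):
    """Percentage of targets among the top-k proteins, for k = 1..len(ranked_proteins).

    ranked_proteins: downregulated DEPs sorted from most to least significant.
    target_set: set of protein IDs in the reference list (e.g. PAR-CLIPSMK1).
    """
    hits = 0
    curve = []
    for rank, protein in enumerate(ranked_proteins, start=1):
        if protein in target_set:
            hits += 1
        curve.append(100.0 * hits / rank)
    return curve
```

Plotting the curve for each test statistic against the same reference list gives a direct visual comparison: a statistic that ranks true targets early produces a curve that starts high and decays toward the overall target fraction.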
The results are summarized in Figure 3. From these figures, we can see that DEPs ranked through the KS test/modified KS test,

Figure 3. Enrichment of PAR-CLIPSMK1 targets in top ranked DEPs using (A)
Further filtering of DEPs
Based on the KL test, 331 significant DEPs are identified. Among them, 13 proteins overlap with PAR-CLIPSMK1 proteins. The proteins and their corresponding information are listed in Table 1. The enrichment curve is plotted in Figure 3D.
Table 1. Protein expression levels.
We further filtered the 13 proteins by inspecting whether their corresponding gene expressions are downregulated and whether their mRNAs are predicted as miR-K1 targets by the bioinformatics tool SVMicro. 37 This cuts down the list to RAB23, PPP2CA, and HNRNPU. Of note, CAMK2D had three protein isoforms, and the SVMicro scores returned for the three isoforms were all different, so we did not include it after filtering.
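The filtering step amounts to intersecting the KL-test DEP list with the three supporting lists. A minimal sketch is shown below; the function name `filter_deps` is hypothetical, and the identifiers used to exercise it are placeholders rather than data from the study.

```python
def filter_deps(kl_deps, downregulated_mrnas, svmicro_targets, parclip_seed_targets):
    """Keep DEPs supported by microarray, SVMicro, and PAR-CLIP seed-match evidence.

    kl_deps: DEPs picked by the KL test, in ranked order.
    The other three arguments are collections of gene/protein IDs that use the
    same identifier scheme as kl_deps.
    """
    supported = (set(downregulated_mrnas)
                 & set(svmicro_targets)
                 & set(parclip_seed_targets))
    # Preserve the KL-test ranking of the surviving DEPs.
    return [protein for protein in kl_deps if protein in supported]
```

Because each criterion only removes candidates, the order in which the filters are applied does not change the final list.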
MiR-K1 target verification
To verify that the mRNAs of the three DEPs identified by the proposed approach are miR-K1 targets, we performed Western blotting and Luciferase reporter assays.
Western blotting
For target verification, 293T cells were transfected with synthetic miR-K1 mimic (50 nM). Forty-eight hours after transfection, endogenous protein levels were assessed by Western blotting. Tubulin was used as the internal control. A negative control (NC) oligonucleotide was used as the control for the miR-K1 mimic, which was synthesized by Sigma-Aldrich. Locked Nucleic Acid (LNA) suppressors were chemically synthesized by Exiqon. The sequences are listed in Supplementary Table 1 (http://compgenomics.utsa.edu/zgroup/miRTargetprediction/suppl/Supplementary.pdf). Reverse transfection of RNA oligoribonucleotide(s) was done using Lipofectamine RNAiMAX (Invitrogen) according to the manufacturer's protocol.
In Western blotting, equal amounts of protein samples were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred to nitrocellulose membranes. The blots were blocked with 5% nonfat milk and incubated with primary antibody followed by a horseradish peroxidase-conjugated secondary antibody (Sigma-Aldrich). Specific bands were revealed with chemiluminescence substrates (Roche) and recorded with the BioSpectrum Imaging System (UVP Inc.). Antibodies to PPP2CA were obtained from Cell Signaling Technology (CST). Antibodies to RAB23 and HNRNPU were obtained from Abcam.
RNA oligoribonucleotides and cell transfection
The miR-K1 mimic was obtained from Sigma-Aldrich. The sequences are listed in Table 2. A scrambled oligonucleotide containing a random sequence was used as a control. RNA oligos were transfected using Lipofectamine RNAiMAX (Invitrogen). The transfection of plasmid DNA with RNA oligos was performed using Lipofectamine 2000 (Invitrogen).
Table 2. Sequences of miRNA mimics and PCR primers.
Construction of wild-type and mutant 3′UTR reporters
Wild-type (WT) and mutant 3′UTR reporters were generated as reported in a previous study. 45 A WT 3′UTR fragment of the human RAB23 or hnRNPU mRNA containing the putative binding sites for miR-K1 and its 5′ and 3′ flanking regions (271 and 258 bp for RAB23 site 1, 186 and 355 bp for RAB23 site 2, and 344 and 255 bp for hnRNPU, respectively) was polymerase chain reaction (PCR)-amplified and inserted into the Kpn I and Xho I sites, downstream of the stop codon of the firefly luciferase in the pGL3 vector. The mutant 3′UTRs, which carried the mutated sequences in the complementary seed region of miR-K1, were generated using fusion PCR based on the construct using the WT 3′UTR reporters as templates.
Luciferase reporter assays
Reporter assays for the 3′UTR reporters were carried out in 48-well plates as described in a previous study. 45 For each well, cells were cotransfected with 10 ng of the luciferase reporter plasmid, 2 ng of pRL-TK (Promega Corporation), and 10 nM of miR mimic. Cells were collected at 48 hours posttransfection and analyzed using the Dual-Luciferase reporter assay system (Promega Corporation). The pRL-TK vector, which provides constitutive expression of Renilla luciferase, was used as an internal control. Transfection was performed in duplicate, and all experiments were independently repeated at least three times.
Western blotting and reporter assay results
Western blotting results are shown in Figure 4. The Western blotting results confirm all three targets picked through the proteomic approach. Among the three confirmed targets, RAB23 and HNRNPU are novel. PPP2CA has been reported as an miR-K1 target in a previous publication. 29

Figure 4. 293T cells transfected with synthetic miR-K1 mimic (50 nM). Forty-eight hours after transfection, endogenous protein levels were assessed by Western blotting. Tubulin was used as the internal control. NC was used as the negative control for the miR-K1 mimic.

Figure 5. RAB23 is a direct target of miR-K1. (A) Sequence alignment of miR-K1 with the RAB23 3′UTR. (B–C) miR-K1 suppressed the activity of luciferase through its binding site 1 (B) and site 2 (C) in the RAB23 3′UTR. 293T cells were cotransfected with the miR-K1 mimic or the scrambled control, a firefly luciferase reporter containing the WT or mutant 3′UTR reporter, and a Renilla luciferase expressing construct. The firefly luciferase activity of each sample was normalized to the Renilla luciferase activity. The mean of normalized luciferase activity of scrambled control in each experiment was set as 1.

Figure 6. hnRNPU is a direct target of miR-K1. (A) Sequence alignment of miR-K1 with the hnRNPU 3′UTR. (B) miR-K1 suppressed the activity of luciferase through its binding site in the hnRNPU 3′UTR. 293T cells were cotransfected with the miR-K1 mimic or the scrambled control, a firefly luciferase reporter containing the WT or mutant 3′UTR reporter, and a Renilla luciferase expressing construct. The firefly luciferase activity of each sample was normalized to the Renilla luciferase activity. The mean of normalized luciferase activity of scrambled control in each experiment was set as 1.
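The normalization described in the reporter assay captions can be written out as a short calculation. The sketch below divides each firefly reading by the matching Renilla reading and then rescales so that the scrambled-control mean within one experiment equals 1. The function name `relative_luciferase` and the argument layout are illustrative assumptions.

```python
def relative_luciferase(firefly, renilla, is_control):
    """Normalize firefly to Renilla activity and scale to the control mean.

    firefly, renilla: paired readings for each well in one experiment.
    is_control: booleans marking wells transfected with the scrambled control.
    """
    ratios = [f / r for f, r in zip(firefly, renilla)]
    control = [x for x, flag in zip(ratios, is_control) if flag]
    if not control:
        raise ValueError("at least one scrambled-control well is required")
    baseline = sum(control) / len(control)
    # After scaling, the control wells average to exactly 1.
    return [x / baseline for x in ratios]
```

A value below 1 for a wild-type reporter cotransfected with the mimic, together with a value near 1 for the mutant reporter, is the pattern that indicates repression through the binding site.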
These results show that our developed method is highly effective in identifying novel biomarkers based on 18O/16O labeling, which can be applied in a wide range of applications.
Function of identified targets
PPP2CA encodes the phosphatase 2A catalytic subunit. Protein phosphatase 2A is one of the four major Ser/Thr phosphatases, and it is implicated in the negative control of cell growth and division, and is involved in breast cancer. 47
Among the two novel targets, RAB23 encodes a small GTPase of the Ras superfamily and Rab proteins are involved in the regulation of diverse cellular functions associated with intracellular membrane trafficking, including autophagy and immune response to bacterial infection. 48 The encoded protein may play a role in central nervous system development by antagonizing sonic hedgehog signaling. 49 Disruption of this gene has been implicated in Carpenter syndrome as well as cancer. 50
HNRNPU belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA-binding proteins, and they form complexes with heterogeneous nuclear RNA. These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. 51 It has been shown that hnRNPU directly interacts with WT1 and modulates WT1 transcriptional activation, 52 which is the Wilms’ tumor suppressor gene. Other diseases associated with HNRNPU include diffuse gastric cancer. 53
Future Work
In this work, the aim is to return miRNA targets with a low false-positive rate, and we have applied the three filtering criteria outlined in the Introduction section to the DEPs detected by the proposed algorithm. In this case, all three targets that passed the filters were confirmed experimentally, that is, no false positives were observed, which greatly reduced the cost of biological validation of the computed targets. However, the necessity of each filter has not been investigated, and real miRNA targets may have been missed; both questions should be investigated in the future.
Conclusion
In this paper, we developed and applied a proteomic approach for identifying the targets of KSHV miR. The developed method is shown to be effective in finding miRNA targets. The developed method is based on 18O/16O labeling that can be used in many applications. Two novel and one previously identified miR-K1 targets are picked by the proposed method. They are further confirmed based on Western blotting and Luciferase reporter assay.
The core of the proteomic approach is based on the statistical test called the KL distance test, which uses the KL distance as a goodness-of-fit measure for comparing peptide fold changes of a protein to that of a background protein with the same number of peptide measurements. Through the KL test, we obtained a significant enrichment of PAR-CLIP-predicted targets with seed match to miR-K1 (PAR-CLIPSMK1). In comparison, none of the other statistical tests, such as
Although the proposed method has a limitation as DEPs can arise through some unknown mechanism other than miRNA involvement even in miRNA-transfected cells, we have shown that the application of additional filters based on PAR-CLIP and SVMicro can greatly reduce false-positive rate.
Author Contributions
Conceived and designed the experiments: SJG, TT, YZ, JZ. Analyzed the data: XM, JZ. Wrote the first draft of the manuscript: XM, JZ. Contributed to the writing of the manuscript: YZ, SJG. Advised XM regarding the usage of SVMicro software and the design of the comparison experiments: YH. Agree with manuscript results and conclusions: JZ, SJG, YZ, XM. Jointly developed the structure and arguments for the paper: XM, JZ, SJG. Made critical revisions and approved final version: JZ. All authors reviewed and approved of the final manuscript.
