Abstract
INTRODUCTION
There are currently no effective methods for prevention or treatment of Alzheimer’s disease (AD) and other dementias [1–3], and the development of biomarker panels, which could be used in non-invasive detection methods, is in its early stages [4]. Insufficient understanding of the underlying molecular bases and mechanisms of disease development, including the involvement of inflammatory pathways [5–10] as well as dysregulated microRNAs (miRs) [11–16], contributes to this lack of effectiveness in managing these debilitating conditions. The challenge in delineating the molecular mechanisms, which are key to healthy brain homeostasis and are disrupted during development of AD, is that the majority of cases are likely to be caused by multiple genetic and environmental factors [17–19]. Nonetheless, several genes implicated in monogenic AD have been identified [20, 21]. To delineate the genetic risk factors contributing to polygenic AD, high-throughput experimental approaches, such as transcriptomic, proteomic and genome-wide association studies, resulting in sets of ‘big data’, are being used [18, 22–28]. In order to be able to navigate this knowledge and use it for data analyses in an efficient way, researchers rely on bioinformatic resources, such as Gene Ontology (GO) [29, 30].
The GO resource is a biomedical ontology that uses a controlled vocabulary of GO terms to describe the normal physiological roles of biological entities, such as proteins and non-coding RNAs (ncRNAs), across all species and biological fields, in a consistent and computer-accessible manner. GO terms are associated with biological entities manually by scientific biocurators, based on published experimental information, and automatically by electronic pipelines, using carefully designed similarity criteria. The resulting links between GO terms and biological entities are known as ‘annotations’. The GO resource comprises three categories of terms, describing ‘
Annotation of proteins and ncRNAs serves to bridge the gap between data collection and data analyses by providing knowledge about their cellular roles in a format accessible to both systems biology and genomic investigators [22, 31]. In addition to GO [29, 30], other resources that provide annotations of biological entities’ roles include Reactome [32], the Kyoto Encyclopedia of Genes and Genomes (KEGG) [33], and molecular interaction databases [34]. One use of annotations provided by these resources is to identify gene groups that are represented at a higher (or lower) than expected frequency within a given gene list. Annotations are imported into independent enrichment or gene-set analysis tools, such as g:Profiler [35], the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt) [36], the VisuaL Annotation Display (VLAD) tool [37], the Biological Network Gene Ontology (BiNGO) tool [38], the Protein Analysis Through Evolutionary Relationships (PANTHER) tool [39], or the Multi-marker Analysis of GenoMic Annotation (MAGMA) tool [40]. These analysis tools group genes with shared characteristics (such as an involvement in the same pathway or a location in the same part of the cell) and apply appropriate statistical parameters to identify enriched or underrepresented gene groups, defined by their associated GO terms, pathways, or interactions. Thus, functional gene annotation data are used to interpret datasets from genome-wide association, proteomic, and transcriptomic studies [27, 41–43].
In order to improve the GO resource for enrichment analyses relevant to neurobiological conditions, we previously annotated the biological roles of proteins implicated in AD [44], Parkinson’s disease [45], and autism [46], in addition to contributing to the synapse annotation project [47]. The AD-focused GO annotation initiative has already captured the roles of proteins and complexes interacting with either amyloid-
As previously, annotations resulting from this University College London (UCL)-based project, funded by the Alzheimer’s Research UK (ARUK) foundation, are labelled in GO browsers and/or secondary resources as contributed by ARUK-UCL [44]. GO data (ontology and annotations) are freely available and can be downloaded from the AmiGO [56] and QuickGO [57] browsers.
MATERIALS AND METHODS
Community engagement
Collaborations were established between members of the GO Consortium [29, 30] and neuroscience and neuroinflammation community experts to ensure that our biocuration efforts align with the needs of the AD research community. Project progress and direction were discussed and, if required, revised and updated during biannual scientific advisory panel meetings and through regular correspondence.
Curation priorities
A list of 40 human AD-relevant microglial proteins (Supplementary Table 1) implicated in neuroinflammation was compiled based on recent review articles [54, 55]. The microRNA-Target interactions dataBase (miRTarBase) resource [58] and scientific literature, indexed in PubMed [59], were subsequently searched for human miRs involved in silencing of genes encoding these 40 AD-relevant microglial proteins; this resulted in a list of 66 human miRs (Supplementary Table 2). Collectively, the 40 proteins and the 66 miRs comprised the 106 biological entities prioritized for annotation as a part of this project.
Identification of publications describing priority proteins
The PubMed database [59] was used to identify research articles that contained experimental data suitable for annotation. For each of the 40 priority proteins, PubMed searches were performed using the HUGO Gene Nomenclature Committee approved gene symbol (HGNC symbol) [60], protein name or synonym. If the search retrieved more than 100 papers, then the volume of papers was reduced by the inclusion of additional keywords (one at a time): ‘microglia’, ‘microglial’, ‘glia’, ‘glial’, ‘dementia’, ‘Alzheimer’s’, ‘Alzheimer’, ‘AD’, ‘neuroinflammation’, ‘neurology’, ‘neurological’, ‘neurobiology’, ‘neurodegeneration’, ‘nerve’, ‘nervous’, ‘brain’, ‘synapse’, ‘synaptic’, ‘memory’, ‘cognition’, ‘age-related’ or ‘aging’. Research articles describing the human proteins were then selected for annotation based on the relevance of their title or abstract. If no, or insufficient, information on a human entity was found, then papers describing mammalian orthologues, identified using the HGNC orthologue prediction tool ‘HCOP’ [61], were curated.
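The search strategy described above can be sketched as a small query builder. This is only an illustration under stated assumptions: the query syntax, the `count_hits` callable (standing in for whatever returns the number of papers retrieved for a query), and the function names are hypothetical and are not taken from the curation pipeline itself. Only the keyword list and the 100-paper threshold come from the text.

```python
# Keywords listed in the text; each is added to the base query one at a time.
KEYWORDS = [
    "microglia", "microglial", "glia", "glial", "dementia", "Alzheimer's",
    "Alzheimer", "AD", "neuroinflammation", "neurology", "neurological",
    "neurobiology", "neurodegeneration", "nerve", "nervous", "brain",
    "synapse", "synaptic", "memory", "cognition", "age-related", "aging",
]


def build_queries(names, keywords=KEYWORDS):
    """Return the broad query followed by one narrowed query per keyword.

    `names` holds the HGNC symbol, protein name and synonyms of one protein.
    """
    base = "(" + " OR ".join(f'"{name}"' for name in names) + ")"
    return [base] + [f'{base} AND "{keyword}"' for keyword in keywords]


def choose_queries(names, count_hits, limit=100):
    """Keep the broad query unless it retrieves more than `limit` papers.

    `count_hits` is a hypothetical callable mapping a query string to the
    number of papers it retrieves.
    """
    queries = build_queries(names)
    if count_hits(queries[0]) <= limit:
        return queries[:1]
    return queries[1:]
```

For example, `choose_queries(["TREM2"], count_hits)` would return only the broad query when it retrieves 100 papers or fewer, and the 22 keyword-narrowed queries otherwise.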
Identification of publications describing priority miRs
Regulatory miRs (Supplementary Table 2) were identified in two ways: firstly, using miRTarBase [58] by searching for a priority protein (Supplementary Table 1) and selecting research articles based on reporter assay evidence, or western blot and qRT-PCR evidence; and secondly, by searching the PubMed database [59] using the priority protein (Supplementary Table 1) HGNC approved gene symbol [60], protein name, or synonym plus ‘miR’, ‘miRNA’, or ‘microRNA’. In contrast to the annotation of only carefully selected articles describing the priority proteins, all identified articles that described an experimentally verified molecular interaction between a priority miR and a messenger RNA (mRNA) transcript of a priority protein were annotated, irrespective of whether they described neuroinflammation or other biological processes. This approach helped to reduce the chances of creating a set of miR annotations biased toward neuroinflammation.
Curation procedure
Research articles were read by skilled GO biocurators, and biological roles and cellular locations of proteins and miRs were captured using GO terms, following established standard GO annotation procedures [53, 62–64]. Molecular interactions between miRs and mRNA transcripts of their experimentally validated target genes were captured using the guidelines for GO curation of miRs [53]. Additional contextual information was provided in the GO annotation extension using terms from GO or other ontologies [64]. The Universal Protein (UniProt) KnowledgeBase [65], RNAcentral [66], Complex Portal [67], and Ensembl [68] identifiers were used for annotation of, respectively: proteins, ncRNAs including miRs, macromolecular protein complexes and targets of gene silencing by miRs. Specific Evidence and Conclusion Ontology (ECO) [69] codes were included in each biocurator-generated annotation, based on the type of experimental data reported in the research article (e.g., ‘IPI’: physical interaction evidence used in manual assertion (ECO:0000353), or ‘IMP’: mutant phenotype evidence used in manual assertion (ECO:0000315)), or to infer evidence from statements made in reviews (e.g., ‘TAS’: author statement supported by traceable reference used in manual assertion (ECO:0000304)). In order to maximize the value of the annotations to the research community, selected research papers were annotated using the
Availability of GO annotations
The annotations contributed by this project to the GO resource are attributed to ARUK-UCL and included in the GO Consortium annotation files [29, 30]. Consequently, our annotations are made available through various http and ftp sites (e.g., http://geneontology.org/page/download-ontology and ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/) and the GO browsers QuickGO [57] and AmiGO [56] and propagated to other major biological databases, including NCBI Gene [70], Ensembl [68], UniProt [65], miRBase [71], and RNAcentral [66]. The miR-target and protein-protein interactions captured by our GO annotations are included in the European Bioinformatics Institute (EBI) GO Annotation (GOA) datasets, respectively, in the EBI-GOA-miRNA [52] and EBI-GOA-nonIntAct datasets available from the PSICQUIC [72] web service (http://www.ebi.ac.uk/Tools/webservices/psicquic/view/home.xhtml), the QuickGO web service (http://www.ebi.ac.uk/QuickGO/psicquic-rna/webservices/current/search/interactor/*), or from directly within Cytoscape [73], as described previously [52, 53].
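As an illustration of how annotations attributed to one contributor might be pulled out of a downloaded annotation file, the sketch below filters rows of a GO Annotation File (GAF) by the `Assigned_By` column. It assumes the standard 17-column, tab-separated GAF 2.x layout. The example rows, identifiers and the helper `_example_row` are placeholders invented for the demonstration and are not real annotations.

```python
# 0-based indices of the GAF 2.x columns used here (17 tab-separated columns).
COL_OBJECT_ID = 1      # database object identifier
COL_GO_ID = 4          # GO term identifier
COL_EVIDENCE = 6       # evidence code, e.g. IPI, IMP, TAS
COL_ASPECT = 8         # P (process), F (function) or C (component)
COL_ASSIGNED_BY = 14   # contributing group


def annotations_by_contributor(gaf_text, contributor="ARUK-UCL"):
    """Return (object id, GO id, evidence, aspect) for one contributor's rows."""
    selected = []
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):   # skip blank and header lines
            continue
        fields = line.split("\t")
        if len(fields) <= COL_ASSIGNED_BY:      # skip malformed rows
            continue
        if fields[COL_ASSIGNED_BY] == contributor:
            selected.append((fields[COL_OBJECT_ID], fields[COL_GO_ID],
                             fields[COL_EVIDENCE], fields[COL_ASPECT]))
    return selected


def _example_row(object_id, go_id, evidence, aspect, assigned_by):
    """Build a placeholder 17-column row (hypothetical helper, demo only)."""
    fields = [""] * 17
    fields[0] = "ExampleDB"
    fields[COL_OBJECT_ID] = object_id
    fields[COL_GO_ID] = go_id
    fields[COL_EVIDENCE] = evidence
    fields[COL_ASPECT] = aspect
    fields[COL_ASSIGNED_BY] = assigned_by
    return "\t".join(fields)


# Placeholder content; none of these identifiers are real.
EXAMPLE_GAF = "\n".join([
    "!gaf-version: 2.2",
    _example_row("ID_0001", "GO:EXAMPLE1", "IPI", "F", "ARUK-UCL"),
    _example_row("ID_0002", "GO:EXAMPLE2", "IMP", "P", "OtherGroup"),
    _example_row("ID_0003", "GO:EXAMPLE3", "TAS", "P", "ARUK-UCL"),
])

aruk_rows = annotations_by_contributor(EXAMPLE_GAF)
```

The same filter can also be applied interactively in the QuickGO and AmiGO browsers, so a script of this kind is only needed for bulk processing of downloaded files.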
PANTHER functional GO term enrichment analysis of genes differentially expressed in AD
The PANTHER version 14.1 Online Tool [39, 74] was used to perform functional enrichment re-analyses on two transcriptomic datasets previously analyzed by Avramopoulos et al., 2011 [43]. In the Avramopoulos et al. study, RNA was extracted from the superior temporal lobe of late onset AD and control brains. Two gene groups were identified within the AD versus control dataset: one overexpressed in AD (‘Higher in AD’, 505 genes), and one underexpressed in AD (‘Lower in AD’, 527 genes). UniProt identifiers were used in the PANTHER re-analysis of these datasets, applying Fisher’s exact test with the Bonferroni correction for multiple testing and a significance level of 0.05. The human reference proteome (released April 2018) was used as the reference set, with ontology and annotation GO files (released 3 July 2019).
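The statistics named above can be illustrated with a minimal sketch. It is not PANTHER's implementation: it computes the one-sided (overrepresentation) form of Fisher's exact test as a hypergeometric upper tail and then applies a Bonferroni correction. The function and argument names are assumptions chosen for this example.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided Fisher's exact p-value for overrepresentation.

    k: genes in the query list annotated to the GO term
    n: size of the query list
    K: genes in the reference set annotated to the GO term
    N: size of the reference set
    Returns P(X >= k) for X drawn from the hypergeometric distribution.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant_terms(p_by_term, alpha=0.05):
    """Return the GO terms whose Bonferroni-corrected p-value is below alpha."""
    terms = list(p_by_term)
    corrected = bonferroni([p_by_term[t] for t in terms])
    return [t for t, p in zip(terms, corrected) if p < alpha]
```

Because the Bonferroni correction scales every p-value by the number of GO terms tested, it is conservative. A term therefore has to be strongly overrepresented to remain significant at the 0.05 level.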
MiR-target molecular interaction networks construction
Two miR-target molecular interaction networks were constructed in Cytoscape 3.7.1 [73]. In the first network the HGNC approved gene symbols [60], corresponding to the priority proteins (Supplementary Table 1), were used as seeds and molecular interaction data were imported from the EBI-GOA-miRNA file [52] (accessed 1 July 2019); this network is referred to as the ‘target-centered’ network. The second network was created by using the RNAcentral identifiers [66] of the prioritized miRs (Supplementary Table 2) as seeds and importing molecular interaction data from the EBI-GOA-miRNA file [52] (accessed 19 August 2019); this network is referred to as the ‘miR-centered’ network. The ‘yFiles Organic Layout’ was applied and adjusted manually.
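The two seeding strategies can be sketched outside Cytoscape as a simple filter over miR-target pairs. This is an illustration only. It assumes that seeding keeps every interaction with at least one seed participant, and it uses miR names in place of RNAcentral identifiers. The pairs listed are those mentioned elsewhere in this article, apart from `GENE_X`, which is a made-up non-priority target.

```python
def seeded_network(interactions, seeds):
    """Keep interactions in which either participant is a seed.

    `interactions` is an iterable of (miR, target) pairs.
    Returns the sets of nodes and edges of the resulting network.
    """
    seeds = set(seeds)
    edges = {(mir, target) for mir, target in interactions
             if mir in seeds or target in seeds}
    nodes = {node for edge in edges for node in edge}
    return nodes, edges


INTERACTIONS = [
    ("hsa-miR-17-5p", "SIRPA"),
    ("hsa-miR-20a-5p", "SIRPA"),
    ("hsa-miR-17-5p", "TNF"),
    ("hsa-miR-19a-3p", "TNF"),
    ("hsa-miR-29b-3p", "CX3CL1"),
    ("hsa-miR-29b-3p", "GRN"),
    ("hsa-miR-29b-3p", "IFNG"),
    ("hsa-miR-17-5p", "GENE_X"),   # hypothetical non-priority target
]

# 'Target-centered': seed with gene symbols; only their regulators are pulled in.
target_nodes, target_edges = seeded_network(INTERACTIONS, {"SIRPA", "TNF"})

# 'miR-centered': seed with miRs; all of their targets are pulled in,
# including targets outside the priority list.
mir_nodes, mir_edges = seeded_network(INTERACTIONS, {"hsa-miR-17-5p"})
```

The miR-centered network includes `GENE_X` while the target-centered one does not. This is one way to see why a miR-centered network can grow much larger than the target-centered network built from the same interaction file.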
Functional GO term enrichment analysis of miR-target interaction networks
GO term enrichment analyses were performed on the miR-target networks using the Cytoscape plugins GOlorize [75] and BiNGO [38]. The selected BiNGO settings included ‘Cluster from Network’, ‘Overrepresentation’, and ‘No Visualization’. The Hypergeometric statistical test with the Benjamini & Hochberg FDR correction for multiple testing and a significance level of 0.05 were applied. The GOlorize plugin [75] was used for visualization of selected overrepresented categories after correction. The human entities annotated to the GO term ‘
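The Benjamini & Hochberg correction named in the BiNGO settings can be illustrated with a short, self-contained sketch. This is the standard step-up procedure written from its textbook definition, not code taken from BiNGO, and the function name is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by (number of tests / rank), and the results are
    made monotone by taking a running minimum from the largest p-value down.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def significant(p_values, alpha=0.05):
    """Flag tests whose adjusted p-value is below the significance level."""
    return [p < alpha for p in benjamini_hochberg(p_values)]
```

Unlike the Bonferroni correction, this procedure controls the expected proportion of false positives among the reported terms, so it generally retains more GO terms at the same significance level.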
RESULTS
Gene Ontology annotation outcomes
Annotation priorities
Forty human microglial proteins relevant to AD were identified and prioritized for annotation based on the review by Simon et al. [55] and the perspective by Deczkowska et al. [54] (Supplementary Table 1). This was followed by database and literature searches, which identified 66 human microRNAs (miRs) that have been shown experimentally to regulate the expression of these 40 proteins, and they were also prioritized for annotation (Supplementary Table 2). Therefore, in total 106 human entities, including proteins and miRs, were identified and prioritized for GO annotation.
Increasing curation breadth through the full article annotation approach
GO was used for annotation of 379 research articles (Table 1), capturing the roles of these 40 microglial priority proteins and the 66 miRs regulating their expression (or their orthologues) as described in the methods section. Importantly, the curation process was based on the full article annotation approach; consequently, the total number of annotated entities is not limited to just the prioritized proteins and miRs, but also includes their isoforms and orthologues as well as other entities described in the annotated articles. As a result, the total number of 341 annotated human entities was over triple the number of the 106 prioritized entities, whereas the total number of human and non-human entities annotated was 494, almost five times the number of the prioritized entities (Table 2, section A). Hence, the full article annotation approach allowed us to ensure that we capture the breadth of the biology associated with the prioritized entities by annotating other proteins, miRs, and macromolecular complexes implicated in the same processes and pathways.
Number of published research articles annotated using GO as a part of this project
Data from QuickGO [57] (accessed 18 August 2019). *The term ‘entity’ is used to describe proteins, ncRNAs (including miRs) and macromolecular protein-containing complexes.
Summary of the GO annotations resulting from this project. In the table, the rows are organized into sets: A) all GO annotations for all entity types; B) all GO annotations separated into entity type (protein, ncRNA, complexes); C) molecular function (MF) GO annotations separated into entity type; D) biological process (BP) GO annotations separated into entity type; E) cellular component (CC) GO annotations separated into entity type. The columns are also grouped according to (I) totals for all entities; (II) totals for all human entities; (III) totals for human prioritized entities
Systematic in-depth annotation of prioritized entities
Based on the number of GO annotations and annotated entities (Table 2, column I, section A), it is apparent that far more biological process (BP) annotations were created (2,281, with an average of > 4 BP terms per entity: 2,281/494) than annotations using the two other GO aspects (452 using molecular function (MF) terms and 349 using cellular component (CC) terms). For the prioritized entities, we contributed an average of 12 annotations per entity (1,352/106), which is considerably higher than the number of annotations we contributed for all entities during this project, with an average of six annotations per entity (3,084/494) (Table 2, section A; column III and column I, respectively). The majority of annotations for prioritized entities were made for proteins, with an average of 20 annotations per protein (839/42, Table 2, column III, section B). Almost 600 of these were BP annotations (an average of 14 BP terms per protein: 597/42, Table 2, column III, section D). Additionally, 101 MF and 141 CC GO terms were associated with the priority proteins (Table 2, column III, sections C and E, respectively).
The greatest number of annotations per prioritized miR was also in the BP category with just under six GO terms per miR (384/65, Table 2, column III, sections B and D), again more than the average of nearly 5 BP GO terms per total annotated ncRNAs (738/154, Table 2, column I, sections B and D). The average number of MF annotations per miR is just under two (Table 2, column III, sections B and C; 125/65), i.e., proportionately higher than for all annotated ncRNAs (Table 2, column I, sections B and C; 209/154).
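The per-entity averages quoted in the two paragraphs above follow directly from the counts in Table 2. They can be rechecked with a few lines of arithmetic. All counts below are taken from the text, and only the dictionary labels are ours.

```python
# (number of annotations, number of entities), as quoted from Table 2.
COUNTS = {
    "all annotations per annotated entity": (3084, 494),
    "BP annotations per annotated entity": (2281, 494),
    "annotations per prioritized entity": (1352, 106),
    "annotations per prioritized protein": (839, 42),
    "BP annotations per prioritized protein": (597, 42),
    "BP annotations per prioritized miR": (384, 65),
    "BP annotations per annotated ncRNA": (738, 154),
    "MF annotations per prioritized miR": (125, 65),
    "MF annotations per annotated ncRNA": (209, 154),
}

# Average number of annotations per entity, rounded to two decimals.
averages = {label: round(annotations / entities, 2)
            for label, (annotations, entities) in COUNTS.items()}

for label, value in averages.items():
    print(f"{label}: {value}")
```

The rounded values agree with the approximate figures given in the text, for instance 12.75 annotations per prioritized entity against 6.24 per annotated entity overall.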
The amount of cellular location information captured for all annotated ncRNAs, as well as the prioritized miRs, is low overall (Table 2, section E). This was expected, since most experimental data describing miRs involve reporter assays, or experiments demonstrating changes in expression levels, and not localization studies [76–79].
In summary, our focused annotation approach has considerably enriched GO information content about the roles of the prioritized entities, including proteins as well as miRs. This result has been achieved despite the fact that miRs have not been studied as broadly as proteins, with proportionately less experimental data published to date.
Association of prioritized entities with neuroinflammatory processes
As a result of this project, 60 of the 106 prioritized entities have been associated with a biological process GO term relevant to neuroinflammation (Fig. 1A, B), such as ‘

Association of prioritized entities with neuroinflammation-relevant GO terms. a) Number of prioritized entities associated with the listed GO terms and their descendants, bars indicate total number of entities (1st bars), number of entities annotated by ARUK-UCL (2nd bars) or other contributors (3rd bars). b) Number of annotations for the prioritized entities, contributed by ARUK-UCL or other groups, categorized by entity type. c) A fragment of GO, representing the relationships among some of the terms selected for analyses shown in (a) and (b). (Data from QuickGO: accessed 18 September 2019, filtered by prioritized entities, GO terms listed in Supplementary Table 3 and their descendants, evidence used in manual assertion and contributor).
Impact of improved annotation on clinical data analysis
Re-analyses of AD-associated genes
In order to demonstrate how our annotation approach, focused on neuroinflammation in AD, can contribute to a more informative analysis and interpretation of disease expression data, a functional GO analysis of previously published AD-relevant datasets was undertaken. Avramopoulos et al. [43] had identified two groups of genes in a transcriptomic analysis of the superior temporal lobe, a brain region usually greatly affected by the AD pathology, of late onset AD and control brain samples: one overexpressed in AD (‘Higher in AD’, Supplementary Table 4A), and one underexpressed in AD (‘Lower in AD’, Supplementary Table 4B), relative to age-matched healthy controls [43]. A functional enrichment of these datasets, by Avramopoulos et al., had identified some highly AD-relevant processes, for example, ‘
The previous analysis of the ‘Higher in AD’ and ‘Lower in AD’ datasets, by Avramopoulos et al. [43], was undertaken using the functional enrichment online analysis tool PANTHER version 6 [39, 74] with Fisher’s test and Bonferroni correction for multiple testing. Therefore, our re-analysis used the same tool and these same parameters, but with more recent GO ontology and annotation data. Genes ‘Higher in AD’ were enriched (overrepresented) for GO terms such as ‘
Selection of GO terms enriched in a re-analysis of an AD transcriptomic dataset. Selected GO terms identified by PANTHER enrichment analysis of genes differentially expressed in AD. Two groups of genes, identified as ‘Higher in AD’ (A) and ‘Lower in AD’ (B) in Avramopoulos et al., 2011 [43], were analyzed using the PANTHER overrepresentation Test [39, 74]. The full set of enriched GO terms is available in Supplementary Table 5
The
This re-analysis of genes ‘Lower in AD’, i.e., having higher levels in healthy brains when compared to AD brains, revealed they were enriched (overrepresented) for GO terms relevant to maintenance of normal neurological functions and processes impaired in AD, such as ‘
Network analysis and bioinformatics-based prediction of neuroinflammatory genes
Two GO term enrichment analyses were performed in Cytoscape 3.7.1 [73] on networks of the prioritized human proteins and miRs, and their interacting partners, in order to delineate more specific roles of these entities in neuroinflammation.
Analysis of the target-centered miR-target network
Firstly, we constructed a network of miR-target associations centered around the prioritized microglial protein-coding genes. We seeded the network with the 40 priority genes (Supplementary Table 1) and imported the associated miR-target molecular interaction data into Cytoscape [73], from the EBI-GOA-miRNA file [52] containing experimentally-validated miR-target interaction data contributed by the British Heart Foundation (BHF)-UCL [80] and ARUK-UCL GO annotation initiatives [52, 81]. The resulting network included a total of 77 nodes, of which only 17 represented mRNA transcripts of the protein-coding genes and 60 represented miRs targeting these mRNAs (Fig. 2, Supplementary Table 6). The low number of mRNAs included in the network reflects the fact that, at the time of GO annotation, we did not find any published experimental evidence demonstrating miR-mediated gene silencing of the remaining 23 protein-coding genes.

Target-centered miR-target molecular interaction network. This network describes interactions between miRs and the mRNAs encoding AD-relevant microglial proteins. The network was constructed in Cytoscape [73] by seeding with 40 AD-relevant microglial gene symbols (Supplementary Table 1) and importing molecular interaction data from the EBI-GOA-miRNA file [52] (accessed 1 July 2019). The protein-protein interaction (PPI) edges were added to the network manually, based on data from another network seeded with the 17 AD-relevant microglial proteins shown in this figure (Supplementary Figure 1, Supplementary Table 7). The colors of node fragments correspond to GO terms (see key). Data associated with the enriched GO terms displayed in this figure are summarized in Table 4.
The target-centered miR-target network contained 75 edges representing associations between miRs and the targets of their regulation. As this network only included miR-target interactions, four additional edges, corresponding to protein-protein interactions (PPIs), were manually added, which increased the connectivity between the five otherwise isolated sub-networks. In order to identify these PPIs, we constructed a network of PPIs only (Supplementary Fig. 1, Supplementary Table 7) by using the 17 protein-coding genes from Fig. 2 as seed nodes and importing PPI data meeting the International Molecular Exchange standard [82].
GO term enrichment analysis, performed in Cytoscape [73], using the BiNGO [38] and GOlorize [75] plugins (Supplementary Table 8, Table 4, and Fig. 2), demonstrates that among the 60 miRs, regulating the expression of the prioritized protein-coding genes, 17 are overrepresented in ‘regulation of inflammatory response’ (Fig. 2). Of those 17, four are enriched specifically in ‘regulation of neuroinflammatory response’ and one, miR-155-5p, was also in a group of entities overrepresented in ‘
Selection of GO terms enriched in the Cytoscape BiNGO analysis of the target-centered molecular interaction network. The network, shown in Fig. 2, was constructed in Cytoscape [73] by seeding with the 40 prioritized AD-relevant microglial genes (Supplementary Table 1) and importing molecular interaction data from the EBI-GOA-miRNA file [52] (accessed 1 July 2019). All results of the BiNGO enrichment analysis are provided in Supplementary Table 8
Key: Gene Ontology (GO) term name; n, number of entities associated with a given GO ID in the whole reference set of 52563 human entities annotated with GO (N); x, number of entities associated with a given GO ID in the analyzed network of a total number of 410 entities (X); Expected, the number of genes in the query dataset expected to be associated with a given GO term by chance; Fold Enrichment, the ratio of the obtained versus expected number of genes associated with a given GO term in the analyzed group of genes;
In the context of protein (mRNA target) nodes, the GO term enrichment analysis revealed that seven of the 17 prioritized proteins (CX3CL1, GRN, IFNG, IL6, TNF, TREM2, TYROBP) were associated with the GO term ‘
Other GO terms, identified in the enrichment analysis and highlighted in Fig. 2, represent ‘
Several highly evolutionarily conserved miRs are encoded as polycistronic clusters containing paralogous genes [83]. We hypothesized that some miRs in the miR-target network, involved in silencing of the same gene, may indeed be encoded by the same polycistron. Interestingly, the gene encoding the SIRPA protein is silenced by two miRs from the hsa-miR-17~92 cluster, hsa-miR-17-5p, and hsa-miR-20a-5p, which are also paralogues of each other [83]. Moreover, hsa-miR-17-5p regulates the expression of TNF, which is also regulated by one more member of the hsa-miR-17~92 cluster, hsa-miR-19a-3p [83]. In agreement with previous suggestions [83], results presented in Fig. 2 support the hypothesis that miRs encoded on one cluster cooperate with each other to regulate the same downstream processes.
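The observation above can be expressed as a small grouping step: for each target, collect the regulating miRs by cluster and report clusters that contribute two or more miRs. The interactions and cluster memberships below are those stated in the text. hsa-miR-29b-3p is deliberately left without a cluster assignment because the text does not give one, and the function name is an assumption.

```python
from collections import defaultdict

# Cluster membership as stated in the text.
CLUSTER_OF = {
    "hsa-miR-17-5p": "hsa-miR-17~92",
    "hsa-miR-20a-5p": "hsa-miR-17~92",
    "hsa-miR-19a-3p": "hsa-miR-17~92",
}

# miR-target pairs mentioned in the text.
INTERACTIONS = [
    ("hsa-miR-17-5p", "SIRPA"),
    ("hsa-miR-20a-5p", "SIRPA"),
    ("hsa-miR-17-5p", "TNF"),
    ("hsa-miR-19a-3p", "TNF"),
    ("hsa-miR-29b-3p", "GRN"),
]


def co_targeting_clusters(interactions, cluster_of):
    """Map each target to the clusters supplying at least two of its miRs."""
    by_target = defaultdict(lambda: defaultdict(set))
    for mir, target in interactions:
        cluster = cluster_of.get(mir)
        if cluster is not None:        # miRs without a known cluster are skipped
            by_target[target][cluster].add(mir)
    result = {}
    for target, clusters in by_target.items():
        shared = {cluster: sorted(mirs)
                  for cluster, mirs in clusters.items() if len(mirs) >= 2}
        if shared:
            result[target] = shared
    return result


shared_regulation = co_targeting_clusters(INTERACTIONS, CLUSTER_OF)
```

On this input, both SIRPA and TNF are reported as co-targeted by two members of the hsa-miR-17~92 cluster, which mirrors the pattern described above.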
While several of the miRs in this network have not yet been associated with the highlighted AD-relevant inflammatory processes, this is likely due to either a lack of experimental data supporting their role, or a need for further curation of these miRs. A good example of this is the hub miR node hsa-miR-29b-3p, which was not enriched in any of the highlighted processes (grey triangle node in Fig. 2). This miR has been shown to regulate cell migration and differentiation [84] and regulates the expression of three proteins (CX3CL1, GRN, IFNG) overrepresented in ‘
Analysis of the miR-centered miR-target network
A network of miR-target associations centered around the 66 human miRs prioritized for annotation as a part of this project was also constructed (Supplementary Table 9). The network was seeded with the prioritized miRs, and miR-target interaction data from the EBI-GOA-miRNA file [52] were imported into Cytoscape [73]. This resulted in a network of 415 nodes and 524 edges (Supplementary Figure 2).
A GO term enrichment analysis, performed in Cytoscape [73], using the BiNGO [38] and GOlorize [75] plugins, reveals that 14 entities were overrepresented for the ‘
Selection of GO terms enriched in the Cytoscape BiNGO analysis of the miR-centered molecular interaction network. The network, shown in Supplementary Figure 2, was constructed in Cytoscape [73] by seeding with the 66 miRs prioritized for annotation (Supplementary Table 2) and importing molecular interaction data from the EBI-GOA-miRNA file [52] (accessed 19 August 2019). All results of the BiNGO enrichment analysis are provided in Supplementary Table 10
Key: Gene Ontology (GO) term name; n, number of entities associated with a given GO ID in the whole reference set of 52563 human entities annotated with GO (N); x, number of entities associated with a given GO ID in the analyzed network of a total number of 410 entities (X); Expected, the number of genes in the query dataset expected to be associated with a given GO term by chance; Fold Enrichment, the ratio of the obtained versus expected number of genes associated with a given GO term in the analyzed group of genes;

MiR-centered miR-target molecular interaction sub-network constructed by selecting four miRs enriched for ‘
This evaluation implies that a GO term enrichment network analysis can be a helpful tool for identifying candidate entities for future functional studies and/or knowledge curation initiatives in a highly targeted manner. Additionally, our re-analyses demonstrate how the GO resource has been improved in the area of neuroinflammation and now allows for a better interpretation and understanding of neurobiological studies.
DISCUSSION
The main finding of the present study is that recent improvements to the GO resource, focusing on aspects relevant to neuroinflammation, have allowed for a more informative re-analysis and interpretation of a decade-old dataset of genes differentially expressed in AD [43]. Additionally, we demonstrate that a GO term enrichment analysis performed on a network of miR-target interactions is a useful tool for identification and prioritization of biological entities with probable novel roles in neuroinflammation, thus, informing future studies.
The first goal of our GO annotation project was to identify microglial proteins with roles in AD [54, 55] (Supplementary Table 1), which subsequently paved the way for identification of miRs involved in their silencing (Supplementary Table 2). Protein annotation using GO began over 20 years ago [29, 30], resulting in a breadth of information having already been captured for the microglial proteins that we prioritized for this project. Consequently, our annotation objective was to associate more descriptive GO terms with these proteins and to capture experimental support for their fundamental molecular activities, e.g., whether a protein was a kinase, or a transmembrane transporter, etc., as well as their roles specifically in neuroinflammatory processes, for instance, ‘
On the other hand, since scientific interest in ncRNAs is more recent, and so fewer articles describing their roles in cellular events have been published to date, our second objective was to annotate every article we could find that described an interaction between a miR and the mRNA transcript of a protein-coding gene from the priority list (Supplementary Table 1). A total of 379 articles describing the prioritized proteins and miRs were thoroughly curated using the full article annotation approach, thus increasing the breadth of the captured biological knowledge and reducing annotation bias (Table 1). This led to an increase in the number of GO terms associated with the prioritized entities and other proteins, ncRNAs and molecular complexes, with more than 3,000 GO annotations made to almost 500 entities. We additionally achieved a greater depth of annotation for the prioritized entities, as indicated by double the number of annotations per prioritized entity in comparison to any other entity annotated as a part of this project (Table 2: > 12 annotations per prioritized entity versus > 6 annotations per any entity). Moreover, through our process-focused annotation approach we substantially increased the number of entities associated with neuroinflammatory GO terms (Supplementary Table 3); over a third of all GO annotations relevant to neuroinflammation now result from this project. Consequently, 60% of neuroinflammation-relevant annotations associated with the 106 prioritized entities were created by this project (Fig. 1).
Our next aim was to demonstrate how our contribution to the neuroinflammatory process branch of GO has led to a more meaningful interpretation of AD gene expression data. A previously published transcriptomic study, which had identified two groups of genes, ‘Higher in AD’ and ‘Lower in AD’ (Supplementary Table 4), and had analyzed the data using PANTHER [39, 43], was selected for re-analysis. The GO term enrichment analysis of these datasets was repeated using the same tool and statistical parameters in order to faithfully reproduce the original research method, but with the current GO data version. This led to a substantially more informative analysis in comparison to findings published nearly a decade ago [43]. For instance, we found that the genes ‘Higher in AD’ were associated with ‘
Another objective of this study was to show the applicability of performing functional GO term enrichment analyses of miR-target interaction networks for identification of entities with putative novel roles. Previous studies revealed a number of miRs implicated in dementia (reviewed in [15, 86]). Our analyses revealed that at least 66 miRs (Fig. 2) regulate mRNAs encoding the 40 AD-relevant priority proteins involved in inflammatory and neuroinflammatory processes. However, only 17 of these 66 miRs have to date been associated with GO terms describing the regulation of neuroinflammatory processes. Given our thorough and systematic approach to miR annotation, this allows us to infer that the remaining miRs have not yet been studied in the context of neuroinflammation, or even systemic inflammation, yet they are likely implicated in these processes. This hypothesis refers especially to hsa-miR-29b-3p, which regulates the expression of three proteins involved in ‘
The current project, focusing on biocuration of microglial proteins involved in AD and miRs regulating their expression, is a continuation of our previous work on GO annotation of proteins interacting with amyloid-
In conclusion, through our focused and systematic full article annotation approach, we have contributed a breadth of new knowledge about neuroinflammation and related biological aspects to the GO resource by capturing 3,084 new annotations for 494 entities, i.e., on average six new annotations per entity. This included a total of 1,352 annotations for 40 prioritized microglial proteins implicated in AD and 66 miRs regulating their expression, yielding an average of twelve annotations per prioritized entity. All of the GO data is freely available and can be downloaded from the GO browsers, QuickGO [57, 97], or AmiGO [56].
We subsequently demonstrated how our contributions to the GO resource have rendered it a more helpful tool for meaningful interpretation of AD datasets by re-analyzing gene expression data published a decade ago, using the publicly available PANTHER tool [39, 74], which had been used in the original study [43].
Finally, our GO term enrichment analysis of a network of miR-target interactions validated the applicability of the GO resource for identification of potential novel roles of these biological entities, using the freely available Cytoscape tool [73] with the BiNGO [38] and GOlorize [75] plugins.
Collectively, our previous, current and future biocuration activities, concentrated on GO annotation of normal biological processes and entities perturbed in AD, will help to improve the understanding of molecular bases of this disease, thus providing a more solid foundation for development of diagnostic tools and treatments.
