Abstract
Keywords
Introduction
New genes have frequently been associated with the evolution of taxon-specific traits and adaptations.1-3 For some time after their emergence, new genes are restricted to a single lineage and are often referred to as “lineage-specific genes” (LSGs). Such young genes are typically expressed at lower levels than older genes, with a high proportion expressed in specific tissues and developmental stages.2,4 In animals, new genes are often testis-biased,5
-9 and it has been suggested that LSGs play an important role in the developmental divergence between species.10,11 The analysis of expression profiles throughout development in
Tardigrade genomes have been found to possess different assemblages of stress-related gene pathways, suggesting the acquisition of species-specific molecular adaptations.
13
Tardigrade are known for displaying an impressive ability for cryptobiosis by entering an anhydrobiotic (“tun”) stage that allows them to tolerate extreme temperatures,
14
radiation,
15
and desiccation.
16
Species differ in their propensities and properties of anhydrobiosis, for example
In this study we characterize gene expression biases throughout tardigrade development and the tun stage in 2 tardigrade species with annotated genomes,
Methods
Gene family identification
Protein sequences from 7 species were used to identify and classify orthologs, paralogs, and their corresponding gene families among tardigrade genomes. The species used were the 2 focal tardigrade species,
Gene family classification
Based on the orthogroup information from Orthofinder, genes were categorized into age groups to identify lineage-specific genes and duplications from the point of view of the 2 tardigrades He and of Rv. First, genes were identified as being singletons or duplicated based on the presence of paralogs. Next, genes were sorted based on whether non-tardigrade outgroup species had orthologs (shared genes), whether only other tardigrades had orthologs (tardigrade-specific genes; TSG), or whether the gene had no orthologs detected (lineage-specific genes; LSG). Gene family expansions were assigned when a gene had more paralogs than it did orthologs in all 3 of the non-tardigrade outgroups combined.
Gene expression
Transcriptome data across developmental stages and physiological states were acquired for the 2 tardigrade species He and Rv from previous work17,26 describing transcription in embryos (eggs 1-5 days after laying), juveniles (1-7 days after hatching), active adults, and tun (anhydrobiotic) adults (Supplemental Table S1). In the original study by Yoshida et al, 17 Rv individuals were induced into anhydrobiosis by both fast and slow desiccation, but here we only used the slow desiccation data. The raw reads were trimmed using default settings in Trim Galore, 27 followed by mapping with STAR 28 against the respective tardigrade genomes. The number of reads mapping to genes was determined using featureCounts. 29 Data were imported in R v4.0.3, 30 where read counts were converted to transcript per million (TPM) by dividing the length-normalized transcript count for each gene (read counts divided by gene length) by the sum of length-normalized transcript counts across all genes multiplied by a million. The mean expression from 3 replicates was calculated for each developmental stage and adult condition (active vs tun) per species. Genes that had TPM < 1 in all samples were excluded from analyses. Gene expression variation was assessed across developmental stages and physiological states. Plots in R were generated using ggplot2. 31
Gene expression specificity, bias, and differential expression
Gene expression specificity was calculated using the tissue-specificity measure
Gene ontology enrichment analysis
Gene ontology (GO) terms for tardigrade genes were acquired with Blast2GO v6.0.3
34
using the full list of protein sequences, resulting in 11 744 He genes and 10 465 Rv genes with associated GO terms. GO enrichment analysis of gene groups (biased genes, lineage-specific genes, gene family expansions, and differentially expressed genes) was assessed using topGO v2.42.0,
35
with significantly enriched GO terms identified from the weight01 algorithm with a FDR-corrected
Results and Discussion
Gene orthogroup analysis across 4 tardigrade species and 3 non-tardigrade outgroups agrees with the results from Yoshida et al.
17
that
Orthogroup membership of genes in
Almost one-third of all genes (31%) belong to tardigrade-specific orthogroups, whereas species-specific (herein called lineage-specific) genes make up 34% of genes in He compared to 21% in Rv. We categorized genes into 6 categories, first whether they were singletons versus duplicated genes (paralogs), and then according to whether the gene had orthologs in at least 1 of the 3 non-tardigrade outgroups (shared), only orthologs in other tardigrades (tardigrade-specific genes; TSGs), or no orthologs (lineage-specific genes; LSGs). After excluding genes with low expression in all developmental stages and adult physiological states (TPM < 1), which filtered numerous LSGs and tardigrade-specific paralogs that did not meet these expression thresholds, He had about 3000 more expressed genes than Rv that were included in our analyses (Table 2).
Number of expressed genes according to their orthologs and paralogs in
Singletons and paralogs were identified from Orthofinder with an inflation value of 1.5 and divided into genes with orthologs in non-tardigrade species (shared), orthologs in only tardigrade species (TSG), or no orthologs (LSG).
Lineage-specific genes differ in their developmental stage-specificity between species
Low levels of gene expression and high expression specificity are often observed in new genes, rapidly evolving genes, and diverged duplicated genes that have evolved new functions.5,6,9,36 Meanwhile, older genes tend to be more highly and broadly expressed, which is a feature of genes with conserved expression across species.
37
In tardigrades, we found that LSGs were lower expressed than tardigrade-specific and shared genes (Mann-Whitney U-test

Expression profiles across gene categories in

Expression specificity of genes measured across all developmental stages. Genes were assigned to the developmental stage or physiological state in which they are most highly expressed. The data are from embryos (eggs) 1 to 5 days after laying, juveniles 1 to 7 days after hatching, and adults (active and tun stages). The boxplot notches indicate the median, and Mann-Whitney U-test FDR-corrected
Across all gene categories, we observed that genes expressed highest in early embryogenesis and in adults have relatively high expression specificity in both species, for both singleton and duplicate genes (Supplemental Figure S1). In other words, genes most highly expressed in juveniles tended to have broader expression throughout development, although the middle of the juvenile stages of development (fourth day and fifth day after hatching) also displayed high specificity in both species followed by a drop in specificity levels. The similarity in these trends between tardigrade species was somewhat surprising given that they are known to differ in both hatching timing and transcriptional initiation of conserved molting pathways during development, as determined using this same transcriptomic dataset. 26 But the species did differ in how gene age groups contributed to the developmental transcriptome. In He, shared singletons were overrepresented with expression specificity at the extremes of developmental stages, and lineage-specific singletons were overrepresented in the middle period of development (Supplemental Figure S2). In comparison, proportional representation of gene age categories among Rv developmental stages was less pronounced, with generally more shared singletons and fewer LSGs that were stage-specific at most time points. The variation in the extent to which gene age is associated with expression across developmental stages among species has also been found in other animals and fungi.4,38,39 Our findings suggest that LSGs in He play different roles during development than LSGs in Rv.
Developmental gene expression bias differs by gene age and by species
Expression specificity was used to identify “biased genes” that were predominantly expressed in a single developmental stage or physiological state. Among the most extreme biased genes, as defined as genes in the top fifth percentile of expression specificity (
Number of biased genes (
As the level of gene expression specificity increased, both species had relatively consistent proportions of stage- and state-specific genes within gene categories, with the exception of lineage-specific paralogs that displayed variable distributions of expression specificity across developmental stages (Supplemental Figure S3). However, when a MCL inflation value of 3.0 was used instead of the default of 1.5 to generate orthogroups in Orthofinder, the expression specificity of LSG paralogs across development was similar to LSG singletons. With the focus on biased genes, there were marked differences between He and Rv; biased genes in the 3 earliest embryo stages and tun adults in He made up over 80% of all biased genes, whereas Rv had comparatively more biased genes in active adults (17%) and the latest embryonic stage (19%). However, since He eggs hatch on average 2 days earlier than Rv eggs,
26
the observed difference in timing when biased genes are most highly expressed might in fact reflect matching ontogenetic expression between species. Similarly, Rv active adults likely already express genes important for anhydrobiosis,15,17 unlike He, therefore enrichment of biased genes in He tun adults and Rv active adults might not be biologically different. However, none of the biased genes in He tun adults belonged to the same orthogroups as biased genes in Rv active adults, and only 13% (He) and 18% (Rv) of biased genes overall share orthogroups across species indicating that most biased genes are different among the species. In He, the high expression bias in the first 3 embryo stages was found in all gene categories except for lineage-specific paralogs, where biased LSG paralogs are enriched in tun (Figure 3A) despite having on average lower expression specificity than LSG singletons (Figure 1B). In contrast, expression bias in Rv was more scattered across developmental stages, in which singletons had high proportions of biased genes in active adults, and LSGs were particularly biased in the first embryonic stage (Figure 3B). Lineage-specific singletons in He but not Rv were found to be the main contributors to expression bias in juvenile developmental stages (Supplemental Figure S4A). In both species, shared singletons are largely underrepresented among biased genes (Chi-Square test

Biased gene expression by gene category and developmental stage in (A) He and (B) Rv.
Biased genes largely displayed different enriched gene ontology (GO) terms between tardigrade species, except for juvenile-biased genes that were enriched with chitin binding genes (Figure 4). Chitin involved in tardigrade cuticle formation, remodeling and the molting cycle.
40
Interestingly, genes associated with chitin-binding functions were also enriched among tardigrade-specific gene (TSG) expansions, as defined as genes with more paralogs than the combined number of orthologs in all 3 non-tardigrade outgroup species. The expansion of gene families coding for proteins with chitin binding properties in tardigrades might have evolved to help regulate physiological responses to environmental stresses.41,42 Numerous other GO terms were found enriched among TSGs and TSG expansions including functions related to DNA binding, peptidase activity, and G protein-coupled receptor activity (Figure 4), which have been previously reported in Yoshida et al
17
based on analyses of only He and Rv (before data from

Gene ontology enrichment among gene lists including biased genes in embryos (egg), juveniles (juv), active adults (adu), and tun, as well as tardigrade-specific genes (TSG), lineage-specific genes (LSG), tardigrade-specific gene expansions (TSGexp), and lineage-specific gene expansions (LSGexp). The size of the bubble represents the fold difference between the observed versus expected occurrence of the gene ontology term, and the color represents the FDR-corrected
Young and duplicated genes contribute to differential expression during anhydrobiosis
Previous transcriptomic analysis between the active and tun states in

Gene categories of differentially expressed genes between tun and active adults in: (A) He and (B) Rv. Genes identified as differentially expressed were separated into those that were higher expressed (UP) and lower expressed (DOWN) in tun versus active adults. The number of genes is listed at the top of each bar, and the proportions represented by different gene categories are reported by color.
Conclusions
We described the contributions of lineage-specific genes to developmental expression bias in 2 tardigrade species. Results suggest species-specific transcriptional regulation throughout development and anhydrobiosis, wherein LSGs display expression bias during different developmental stages and physiological states. Accordingly, the sets of singleton and duplicate LSGs from different tardigrades code for proteins that are enriched in different functions. While this suggests that LSGs are involved in largely species-specific processes, we find them to be enriched among genes that transcriptionally respond to anhydrobiosis in both species. As the identification of lineage-specific genes is influenced by the accuracy of gene annotations 43 and homology detection methods, 44 our LSG estimates might artificially be inflated. It is therefore possible that the signal of expression bias among LSGs are caused not by recently emerged genes within a lineage, but by extremely diverged genes among species that have rapidly evolved and diversified both in their molecular sequences and expression. In either case, our findings show that tardigrade genes that have little homology in other species tend to have species-specific transcriptional roles during development and anhydrobiosis.
Research Data
sj-xlsx-2-evb-10.1177_11769343221140277 – Supplemental material for Diversity in Expression Biases of Lineage-Specific Genes During Development and Anhydrobiosis Among Tardigrade Species
Supplemental material, sj-xlsx-2-evb-10.1177_11769343221140277 for Diversity in Expression Biases of Lineage-Specific Genes During Development and Anhydrobiosis Among Tardigrade Species by Jean-Christophe Metivier and Frédéric J J Chain in Evolutionary Bioinformatics
Supplemental Material
sj-zipx-1-evb-10.1177_11769343221140277 – Supplemental material for Diversity in Expression Biases of Lineage-Specific Genes During Development and Anhydrobiosis Among Tardigrade Species
Supplemental material, sj-zipx-1-evb-10.1177_11769343221140277 for Diversity in Expression Biases of Lineage-Specific Genes During Development and Anhydrobiosis Among Tardigrade Species by Jean-Christophe Metivier and Frédéric J J Chain in Evolutionary Bioinformatics
Footnotes
Funding:
Declaration of Conflicting Interests:
Author Contributions
Supplemental Material
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
