Abstract
Introduction
In recent years, phylogenetic reconstruction has shifted from using a single or a few genes toward phylogenomic analyses exploiting hundreds of genes in a single study. To deal with the sequence data of so many genes, automated processing pipelines are required. There are two divergent views on the consequences of such vast data sets. First, it has been proposed that congruence between phylogenetic analyses is increasing as random noise in the data sets is strongly reduced. 1 Second, it has been concluded that because of increased artificial signal, which is not canceled out like random noise, true and strong cases of incongruence will now be detected more often. 2 In the latter case, two different kinds of artificial signal can be distinguished.
First, a crucial step in phylogenomic studies is the determination of orthologous genes across the different species present in the analysis. Usually, automated orthology prediction methods are used at this step.3–7 However, these prediction methods can erroneously group paralogous sequences as sets of orthologous sequences. As a consequence, this can result in the reconstruction of gene trees rather than species trees.8–14 For example, Philippe et al. 10 reanalyzed the data sets of Schierwater et al. 15 and Dunn et al. 16 with respect to artificial signal, including the use of manual means to detect paralogous genes in supposed sets of orthologous genes. They were able to detect several cases of paralogy in the first data set. Pruning these sequences from the first data set substantially reduced the very strong support (ie, bootstrap value of 100) for the monophyletic group of Porifera, Ctenophora, Cnidaria, and Placozoa present in the original data set. Owing to the pruning, Porifera was instead placed as sister to all other metazoans, and Cnidaria as sister to Bilateria. 10 An improvement of the Dunn et al. 16 data set also revealed a sister group relationship of Porifera to all other metazoan taxa, instead of Ctenophora being sister to all other metazoan taxa. 10 Similarly, for an annelid data set eight sets of orthologous genes could be found containing paralogous sequences. 11 In two of these eight sets, the paralogous sequences included had a strong impact on the reconstruction of the concatenated data. Specifically, the taxa affected by the presence of a paralogous sequence–-ie
Second, numerous studies in the past decades using both real and simulated data sets have shown that systematic biases like increased substitution rates or saturation can positively mislead phylogenetic reconstructions resulting, for instance, in long-branch attraction artifacts.17–33 Recently, evidence is accumulating that this is also the case for phylogenomic studies. For example, the reconstruction of the eukaryotic tree of life is affected by the presence of both rapidly evolving species 8 and saturation at fast-evolving sites across all taxa. 34 It could also be shown that the placement of Ctenophora as sister to all other metazoan taxa is most likely a long-branch artifact because of increased substitution rates in some species.9,10 Moreover, this position of Ctenophora is likewise affected by saturation at fast-evolving sites across all taxa.10,33 Similar analyses revealed that the support for the monophyly of Tardigrada and Nematoda also stemmed from both long-branch attraction and fast-evolving sites across all taxa. 35 Finally, Salichos and Rokas 36 explored the effects of different parameters such as rapidly evolving species, slowly evolving genes, or phylogenetic signal on the reconstruction of the yeast phylogeny using phylogenomic data. They found that selecting genes based on strong phylogenetic signal would decrease incongruence within the final concatenated data set.
Finally, different methods have been proposed to detect conflict in the phylogenetic reconstruction between different partitions of a data set without any
Although all these methods have been shown to aid the identification of artificial signal in phylogenetic and phylogenomic studies, they were usually conducted at best in a semi-automated way, which still required several manual analytical steps. Manual exploration of hundreds of genes or trees in the course of phylogenomic studies is time-consuming, hardly feasible, and prone to overlooking individual instances. Tree-manipulation programs such as Phyutility 64 or tools calculating systematic biases from alignment data such as BaCoCa 65 exist and can be incorporated into automatic analysis pipelines. However, no program has previously implemented the different methods used in the above studies. These methods comprise a screening procedure for paralogous sequences based on single-gene trees,8–11 detection of conflict using partition-by-partition and node-by-node approaches utilizing nodal support values, 56 and measurements of saturation and long-branch attraction based on patristic distances (PDs) in the tree.30,32,33,36 Because all these methods rely on tree-based information such as nodal support or PDs, the program TreSpEx (Tree Space Explorer) has been written in Perl and is presented herein. As it is command-line driven, it can be easily incorporated into automatic pipelines of, for example, phylogenomic studies.
Implementation of Different Methods in TreSpEx
Detection and Pruning of Paralogous Sequences
The procedure to detect paralogous sequences implemented in TreSpEx8–11 is invoked after an initial determination of sets of supposedly orthologous genes using, for example, HaMStR, OrthoMCL, ReMark, MultiMSOAR 2.0, and PhyloTreePruner.4–7,66 As discussed above, these automated orthology predictions have some chance of grouping paralogous genes together. Such misclassifications might subsequently mislead the analysis of the combined data to inferring a gene tree rather than the desired species tree. Thus, before further analyses, additional screening of the sets of supposedly orthologous genes for paralogous sequences should follow the first orthology prediction.8–11
To detect such paralogous sequences, TreSpEx implements a screening procedure based on the phylogenetic reconstruction of single partitions (eg, genes) of a phylogenomic data set8–11 (Fig. 1). For the best tree of each single-partition analysis, this screening identifies all clades possessing a bootstrap value equal to or larger than a certain threshold (eg, 95). These detected clades are regarded as potential indications of paralogy separating paralogs from each other within gene trees. However, strong bootstrap for a clade within a single-partition tree might also be because of true phylogenetic signal for a group of taxa (eg species from the same genus). To separate cases of most likely true signal from cases of paralogy, two different strategies have been proposed. First, clades congruent with clades present in the best tree obtained from the concatenated data set were regarded as exhibiting true phylogenetic signal and “masked” (ie eliminated from further analyses).8–10 Second, only clades congruent with clades for which independent a priori evidence of monophyly from other sources of data can be shown were masked to avoid circularity 11 (Fig. 1). If required, both masking strategies can be invoked with TreSpEx. However, it should be noted that this masking strategy is not a prerequisite for the screening procedure, especially given the automatic BLAST search option in TreSpEx (see below). One reason for using the masking in the previous studies was to scale down the number of cases requiring further manual inspections such as BLAST searches.
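The first screening step described above can be sketched in a few lines of Python. This is only an illustration and not the TreSpEx implementation (which is written in Perl): the nested-tuple tree representation, the function names, and the taxon names are hypothetical, and treating a clade as masked when all of its taxa belong to one a priori group is an assumption about how congruence is decided.

```python
def collect_clades(node, out):
    """Return the taxon set below `node`; record (support, taxa) for each internal node.

    A node is either a taxon name (str) or a (support, children) pair.
    """
    if isinstance(node, str):
        return frozenset([node])
    support, children = node
    taxa = frozenset().union(*(collect_clades(child, out) for child in children))
    out.append((support, taxa))
    return taxa


def suspect_clades(tree, threshold=95, masked_groups=()):
    """List clades with support >= threshold that are not masked.

    Assumption: a clade is masked if all of its taxa belong to a single group
    with independent a priori evidence of monophyly.
    """
    clades = []
    all_taxa = collect_clades(tree, clades)
    groups = [frozenset(group) for group in masked_groups]
    suspects = []
    for support, taxa in clades:
        if taxa == all_taxa:          # the root clade carries no information
            continue
        if support < threshold:       # weakly supported clades are ignored
            continue
        if any(taxa <= group for group in groups):
            continue                  # masked: explained by known monophyly
        suspects.append(taxa)
    return suspects


# Hypothetical single-gene tree with bootstrap values at internal nodes.
gene_tree = (0, [
    (100, ["Lumbricus", "Helobdella"]),
    (97, ["Capitella", "Sipunculus"]),
    (60, ["Owenia", "Magelona"]),
])
# Hypothetical a priori group (membership chosen for illustration only).
a_priori = [{"Lumbricus", "Helobdella", "Tubifex"}]

print(suspect_clades(gene_tree, 95, a_priori))
```

In this toy example only the clade of Capitella and Sipunculus remains as a suspect clade; the clade with a support of 100 is masked, and the clade with a support of 60 falls below the threshold.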

Flow-chart of the implementation of TreSpEx in an analytical procedure conducting a screening for paralogy. Programs other than TreSpEx are only examples, and any other program for orthology prediction, phylogenetic reconstruction, and data concatenation can be used.
The next step is to decide if the clades so far identified are truly the results of paralogy (Fig. 1). The first criterion is that, in addition to the strong nodal support that first suggested paralogy, a long branch leads to the suspect clade8–10 (Fig. 2A). The second is that taxa from a clade with independent a priori evidence of monophyly are found along with taxa outside this clade both within and outside the suspect clade8–10 (Fig. 2B). Finally, TreSpEx marks very short branches leading to one of the terminal taxa in a suspect clade as indicative of potential cross-contamination 11 (Fig. 2C).

Theoretical examples of the sorting criteria in the paralogy screening of TreSpEx. Sorting based on (A) an additional long branch leading to the suspect clade (indicated by an arrow), (B) taxa from a clade with independent a priori evidence of monophyly are found along with taxa outside this clade both within and outside the suspect clade (indicated by arrows), and (C) very short branches leading to one of the terminal taxa in a suspect clade (indicated by an arrow).
To gain further evidence for paralogy, BLAST searches can be applied using TreSpEx.8–11 For each partition containing a suspect clade, TreSpEx conducts BLAST searches of all sequences of the partition against two reference databases (Fig. 1). Although different pre-compiled reference databases (eg of
Instead of considering, in principle, all possible clades for the paralogy screening, TreSpEx also provides the option to test if the support for a clade or clades in a given tree (eg, the best tree of the concatenated data set) stems from paralogous sequences rather than true phylogenetic signal. 11 This is called a posteriori screening, as it is conducted after a phylogenetic reconstruction of some kind. In contrast, the other option described above considering all possible clades is named a priori screening, as it can be conducted before any phylogenetic reconstruction. Finally, TreSpEx allows for the pruning of affected sequences from the partitions of the data set (Fig. 1).
Case Study I
To exemplify the potential of TreSpEx to detect paralogous sequences, I used the analysis and single-partition trees of Struck, 11 which are publicly available. Struck 11 found that 24 out of 229 partitions contained clades with bootstrap support of 95 or higher, which could not be attributed to clades with a priori evidence of monophyly (Table 1). The clades with a priori evidence of monophyly comprised only members of Clitellata, Sipuncula, Myzostomidae, Terebelliformia, Capitellidae/Echiura, or Serpulidae/Spionidae. Struck 11 used the sequences of each suspect clade as well as those of
Comparison of the paralogy screening based on the BLAST search of TreSpEx using the data set of Struck 11 to the original results of the study of Struck. 11 The numbers in the brackets provide the number of partitions found by TreSpEx regarded as cases of paralogy (first position) or non-paralogy (second position) by Struck. 11
Using TreSpEx for the paralogy screening, only clades with bootstrap values of 95 or higher (Fig. 1) were detected and masked for the same clades with a priori evidence of monophyly as in Struck. 11 This first screening returned all 24 partitions found by the more or less manual screening of Struck 11 and one additional partition (Table 1). The next step was to blast all sequences of the suspicious 25 partitions against the reference databases of
However, six partitions could not be assigned with certainty. Three of these partitions were indicated as paralogous by Struck. 11 Struck 11 differentiated two classes of paralogy. In one class, taxa of the core set of the orthology prediction (ie,
However, this separation was not perfect, as three partitions indicated as non-paralogous by Struck 11 were also marked as uncertain by TreSpEx. In these three cases, in one of the two searches
Detection of Conflict
TreSpEx in combination with a program for phylogenetic reconstruction such as RAxML 67 or PhyloBayes 68 can also be used to detect conflicts in data sets based on the PABA principle.55,56 Using this principle, conflict is detected on a node-by-node and partition-by-partition basis utilizing nodal support values. The PABA principle was first proposed using bootstrap values, 55 and can also be employed with any other nodal support values such as Bremer support (PABSA, partition addition Bremer support alteration) or posterior probabilities (PAPPA, partition addition posterior probability alteration). 56 For reasons of simplicity, it will be referred to as PABA herein. For each node and partition, the alteration of nodal support is determined as partitions are added to the data set. During this process, as each partition is added the order of addition is also taken into account, that is if a partition is added as first, second, or third partition and so on. To condense the results, the mean values of alteration are calculated for each partition and position of addition. These results then allow the alteration of support values to be examined for indications of conflicts. For example, if a partition always decreases the support for a node regardless of its position of addition, this would indicate a conflict between this partition and the other partitions in this data set concerning this particular node (for more details, refer to Struck 56 ).
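The averaging of support alterations by partition and position of addition can be illustrated with a small Python sketch for a single node. It is a minimal sketch under stated assumptions, not the TreSpEx code: the function name and the support values are hypothetical, and the support of the empty data set (before the first partition is added), as well as the support for a node absent from a tree, is assumed to be 0.

```python
from itertools import combinations
from statistics import mean


def paba_alteration(support, partitions):
    """Mean alteration of nodal support per (partition, position of addition).

    `support` maps a frozenset of partition names to the support value of one
    node in the tree reconstructed from that combination of partitions.
    Assumption: missing entries, including the empty data set, count as 0.
    """
    result = {}
    for part in partitions:
        others = [p for p in partitions if p != part]
        for position in range(1, len(partitions) + 1):
            deltas = []
            # Every data set of size position - 1 that lacks `part`.
            for base in combinations(others, position - 1):
                before = support.get(frozenset(base), 0.0)
                after = support.get(frozenset(base) | {part}, 0.0)
                deltas.append(after - before)
            result[(part, position)] = mean(deltas)
    return result


# Hypothetical bootstrap values of one node for all combinations of A, B, C.
node_support = {
    frozenset("A"): 60, frozenset("B"): 70, frozenset("C"): 10,
    frozenset("AB"): 90, frozenset("AC"): 40, frozenset("BC"): 50,
    frozenset("ABC"): 80,
}
alteration = paba_alteration(node_support, ["A", "B", "C"])
for key in sorted(alteration):
    print(key, alteration[key])
```

In this invented example, partition C lowers the support whenever it is added as second (mean of -20) or third (-10) partition, which is the kind of pattern that would point to a conflict between C and the other partitions at this node.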
Except for the phylogenetic reconstructions, TreSpEx provides the first implementation of the other three steps of the PABA approach. First, TreSpEx generates all possible combinations of partitions of a data set as Phylip files for phylogenetic reconstructions in the second step (Fig. 3). For example, if a data set comprises six partitions, TreSpEx will generate all possible data sets comprising only one, two, three, four, five, or six of the six partitions. An option at this step is to generate only a range of possible combinations. For example, instead of generating all possible data sets containing one to six of the six partitions, only all possible data sets with four or five of the six partitions can be generated. Second, after the phylogenetic reconstructions of the data sets with the different combinations of partitions, TreSpEx summarizes bootstrap support values or posterior probabilities across all data sets for each of the nodes that can be found in at least one of the trees. Third, for each node, partition, and position of addition, TreSpEx calculates the alteration in nodal support and averages the results in accordance with the position of addition (eg, added as fourth or fifth partition).
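The enumeration in the first step can be sketched as follows. The function name and the `min_size`/`max_size` parameters are hypothetical and do not correspond to TreSpEx options; the sketch only lists the combinations and does not write Phylip files.

```python
from itertools import combinations


def partition_combinations(partitions, min_size=1, max_size=None):
    """List all combinations of partitions with min_size..max_size members."""
    if max_size is None:
        max_size = len(partitions)
    return [
        combo
        for size in range(min_size, max_size + 1)
        for combo in combinations(partitions, size)
    ]


six = ["p1", "p2", "p3", "p4", "p5", "p6"]
print(len(partition_combinations(six)))        # all data sets with one to six partitions
print(len(partition_combinations(six, 4, 5)))  # only data sets with four or five partitions
```

For six partitions this gives 63 data sets in total and 21 when restricted to four or five partitions; for four partitions it gives the 15 combinations used in Case Study II.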

Flow-chart of the implementation of TreSpEx in an analytical procedure conducting a detection of conflict using the PABA, PAPPA, or PABSA approach. Programs other than TreSpEx are only examples, and any other program for data concatenation and phylogenetic reconstruction can be used.
Two different statistical tests proposed by Struck 56 can also be conducted by TreSpEx (Fig. 3). To assess whether the positive contribution of a partition outweighs, if present, its negative impact on a given set of nodes, a Wilcoxon-Signed-Rank test69–73 is conducted. A given set of nodes can, for example, be all nodes of the best or an alternative tree. The results of this test can be used to guide the decision if an entire partition should be excluded from the analysis instead of just a few sequences. To test the significance of the results of each partition at each node and position of addition, a permutation test similar to ILD or LILD (localized ILD) tests41,49 is implemented in TreSpEx. For this permutation test, TreSpEx randomly assigns positions to partitions of the same sizes as the predefined partitions used for the calculation of the original values. Then the same analyses are conducted as for the original partitions. Thus, the test can reveal if the value found for a partition at a node and position of addition can be obtained just by chance because of randomly partitioning the data set. Such tests were lacking in the first proposal of this approach. 55
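The random re-partitioning behind the permutation test can be sketched as below. This is an illustration only: the function names and partition sizes are hypothetical, and the one-sided p-value with a +1 correction is a common convention assumed here, not necessarily the calculation used by TreSpEx.

```python
import random


def permute_partitions(partition_sizes, seed=None):
    """Randomly assign alignment positions to partitions of the original sizes.

    `partition_sizes` maps a partition name to its number of alignment
    positions; the result maps each name to a sorted list of column indices.
    """
    rng = random.Random(seed)
    columns = list(range(sum(partition_sizes.values())))
    rng.shuffle(columns)
    permuted, start = {}, 0
    for name, size in partition_sizes.items():
        permuted[name] = sorted(columns[start:start + size])
        start += size
    return permuted


def permutation_p_value(observed, permuted_values):
    """Share of permuted values at least as large as the observed value.

    Assumption: one-sided test with the usual +1 correction.
    """
    hits = sum(1 for value in permuted_values if value >= observed)
    return (hits + 1) / (len(permuted_values) + 1)


# Hypothetical partition sizes (number of alignment positions).
sizes = {"16S": 5, "18S": 8, "28S": 10, "COI": 6}
random_partitions = permute_partitions(sizes, seed=1)
print({name: len(cols) for name, cols in random_partitions.items()})
print(permutation_p_value(5, [1, 2, 3, 6]))
```

Each permuted data set keeps the partition sizes of the original but draws the positions at random, so a PABA value recomputed from it reflects what a partition of that size contributes by chance.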
Case Study II
Herein, I exemplified the potential of TreSpEx to detect conflict based on the PABA principle using the data set of Struck et al. 55. By manual inspection of trends in alteration of nodal support, Struck et al. 55 highlighted three cases in the COI (cytochrome oxidase I) partition and three in the 28S partition as interesting (Table 10 in Struck et al. 55, Figure 4). The COI partition introduced a strong conflict at node 4 and a slight conflict at node 13. Hidden support was revealed at node 12 55 (Fig. 4). The 28S partition introduced the strongest conflict at node 9 and a slight conflict at node 13. Again, hidden support was revealed at node 8 55 (Fig. 4). For the present demonstration, I generated all 15 possible combinations of the four partitions 16S, 18S, 28S, and COI using TreSpEx. In addition, the 15 possible combinations for each of the 100 permutated data sets were also created with TreSpEx, resulting in an additional 1,500 data sets. In the second step, phylogenetic analyses were conducted for each of the 1,515 data sets using RAxML 7.3.1 67 with the GTR + Γ + I substitution model and 100 bootstrap replicates. 74 For the original as well as the 100 permutated data sets, the bootstrap values were individually summarized in the third step. In the fourth and final step, the PABA results were calculated, and both a permutation test and a Wilcoxon-Signed-Rank test were conducted to test the significance of the individual PABA results and the overall contribution of a partition to given sets of nodes, respectively. For the latter, two sets of nodes were tested. The first set comprised all nodes of the ML (maximum likelihood) tree of the concatenated data set of all four partitions and the second all nodes of the ML tree of the data set with only the 28S data.

Cladogram of the best maximum likelihood tree based on the analysis of all four partitions herein (same topology as in Struck
Although a different ML algorithm was used herein than by Struck et al., 55 five of the six instances discussed by Struck et al. 55 were indicated here as showing significant conflicts. In particular, nodes 4, 9, and 13 were affected (Fig. 4). Only node 8, which was regarded as revealing hidden support, 55 was not indicated. Interestingly, as in Struck et al., 55 the 28S partition was not able to overwhelm the support for node 9 present in the other three partitions when added as fourth (see black box in Figure 4). Furthermore, the permutation test of TreSpEx revealed partitions that contributed significantly more to a node than would have been expected given their size. The value obtained by the original data was significantly higher than the values obtained by just randomly assigning positions to a partition of the same size; for example, 18S and 28S contributed strongly to node 3 when added as second and, similarly, 16S, 18S, and 28S to node 10. Thus, the permutation test of TreSpEx also allowed the detection of strong support for a particular node by a partition. For example, given its size, COI contributed significantly to the bootstrap support of node 8 independent of the position of addition. The 18S partition also contributed positively to this node, as bootstrap support increased by 33–47%; but given the size of the 18S partition, this contribution was not significant. For the other two partitions, the contribution was also generally positive, but close to zero. Therefore, support for this node did stem from COI and 18S, but considering its size COI contributed more to the support.
The Wilcoxon-Signed-Rank test showed that over all nodes all partitions contributed positively to the ML tree of the concatenated data set. For each partition, its contribution significantly outweighed its negative impact at least at one position of addition (Table 2). Interestingly, although the 28S partition introduced a strong conflict at node 9 55 (Fig. 4) over all nodes, its contribution significantly outweighed its negative impact when added, for example, as second. Its contribution was even stronger than that of the 18S partition when added as second, despite the fact that the 18S partition did not introduce any conflict. Only when COI was added as fourth partition, its negative impact at two nodes outweighed its positive contribution, but this was still not significant. The stronger negative impact of COI when added as fourth was because of two reasons. First, because of the other three partitions most nodes were already maximally or nearly maximally supported and the COI partition could not add any more measurable bootstrap support to these nodes when added as fourth. On the other hand, the conflicts at nodes 4 and 13 persisted. However, this was also the case for the 28S partition and node 9. This led to the second reason. While other partitions (namely 16S and 18S) significantly contributed to node 9 and to a certain degree 13, this was less prominent at node 4. More specifically, maximal bootstrap support was already achieved at node 9 by the concatenation of 16S, 18S, and COI, and the conflict introduced by the 28S partition when added as fourth was not strong enough to decrease the bootstrap value below maximal support (see black box in Figure 4 at node 9). Hence, the Wilcoxon-Signed-Rank test can help to reveal very strong conflicts in a partition when nodal support values with a maximal support value like bootstrap values or posterior probabilities are used. 56 This is different when, for example, Bremer support values are used, which do not have a maximum value. 56
Results of the Wilcoxon-Signed-Rank test for the analyses of the data of Struck
In contrast to the nodes of the ML tree of the concatenated data set, the 28S partition was, not surprisingly, the only partition that contributed positively over all nodes to the ML tree of the 28S data set (Table 2). When added as third partition, this contribution was significant. The 16S partition also contributed to this set of nodes to a certain degree, but the 18S and COI partitions clearly had an overall negative impact, which was significant when they were added as fourth.
Detection of Long-Branched Taxa and Partitions
To assess long-branch attraction based on tree-specific properties, two means have been mainly used. Average evolutionary rates of complete data sets or their partitions have been calculated as a proxy for long-branch attraction, and faster evolving partitions were excluded in favor of slower evolving ones. 75 This is also called the slow-fast method. However, the problem of long-branch attraction stems from heterogeneous branch length and, hence, evolutionary rates between taxa within a data set or partition.30,32 Therefore, distances from the root of the tree to each taxon (ie, tip-to-root distances) are used as a taxon-specific measurement for long-branch attraction. 8 However, the recognition of long-branched taxa by tip-to-root distances heavily depends on the root of the tree by definition. For automatic process pipelines, this can pose severe problems in the recognition of long-branched taxa or data sets severely affected by long-branch attraction. When the root of the tree cannot be objectively placed as different outgroup taxa root the tree differently, it cannot be assessed which root-based distance would be trustworthy. Therefore, TreSpEx also calculates a new long-branch score, the LB (long branch) score 76 (Fig. 5). The score utilizes PDs, ie, the distance between two taxa based on the connecting branches, and is based on the mean pairwise PD of a taxon

Flow-chart of the implementation of TreSpEx in an analytical procedure analyzing long-branch attraction (A), saturation (B), or phylogenetic signal (C). Programs other than TreSpEx are only examples, and any other program for statistical analyses and phylogenetic reconstruction can be used.
Case Study III
The annelid taxon Myzostomidae is well known for its long-branch problem.77–80 In many molecular phylogenetic studies, it is attracted to the longest outgroup taxon. In the analyses of Struck, 11 Myzostomidae was also placed with the longest outgroup taxon, the ectoproct
In addition to taxa, genes, data sets, or partitions can be analyzed with respect to long-branch attraction as well. For example, the 229 genes of the data set of Struck 11 were analyzed using TreSpEx, and density plots were generated with the aid of

Density plots generated with

Density plots generated with

Heat map in combination with hierarchical clustering generated with
Detection of Saturation and Phylogenetic Signal
Saturation is known to influence phylogenetic reconstructions even using phylogenomic data sets.10,33 Assessment of the degree of saturation can be determined either by the visual inspection of saturation plots or based on specific values measuring the degree of saturation.10,23,31,33 These values are either the slopes or the
Finally, it has been proposed to assess the resolution power of partitions or genes within a larger phylogenomic data set using the average bootstrap support of each partition. 36 Moreover, average bootstrap support has also been used to determine whether alterations to the data sets like exclusion of data or taxa were beneficial or detrimental to the phylogenetic reconstruction. 80 Hence, TreSpEx also calculates average bootstrap values across all nodes of a given tree (Fig. 5).
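The average bootstrap support of a tree can be computed, for instance, as sketched below. This is not the TreSpEx implementation; the function name and the example tree are hypothetical, and the sketch assumes a Newick string in which support values are stored as internal node labels directly after a closing parenthesis.

```python
import re
from statistics import mean


def average_bootstrap(newick):
    """Mean of all internal node labels in a Newick string, or None if absent.

    Assumption: support values follow a closing parenthesis directly, as in
    '(A:0.1,B:0.2)100:0.05'; branch lengths after ':' are not matched.
    """
    values = [float(v) for v in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]
    return mean(values) if values else None


# Hypothetical single-gene tree with two supported internal nodes.
tree = "((A:0.1,B:0.2)100:0.05,(C:0.1,D:0.3)80:0.02,E:0.4);"
print(average_bootstrap(tree))
```

Applied to the trees of all partitions, such averages could then be compared between genes or before and after the exclusion of data or taxa.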
Case Study IV
To exemplify these two features of TreSpEx, the 229 genes of the data set of Struck 11 have been used again in combination with density plots generated with

Density plots generated with R of different gene-specific saturation indices (A and B) as well as phylogenetic signal (C) for the 229 genes present in the data set of Struck. 11
Run-Time Statistics of TreSpEx
For each data point, the calculation of the run-time statistics was repeated 10 times to assess the variability of the run time. For all steps of the paralogy screening, the run time shows a linear increase with an increasing number of trees or data sets (Fig. 10A). The pruning and a priori screening options are the fastest, requiring less than 0.5 seconds even for 200 data sets or trees, respectively. The a posteriori screening takes about three times longer than the a priori screening, but still needs less than 1.5 seconds. Interestingly, the masking option has no substantial influence on the run time of the screening (Fig. 10A). By far and not surprisingly, the longest time is taken by the BLAST searches of the sequences of the partitions with suspect clades against the two reference databases. When the screening procedure started with 200 trees, this step took about 140 seconds. Thus, a complete paralogy screening procedure starting with 200 trees and, thus, 200 data sets requires a total time of less than three minutes to finish, including the BLAST searches and cleaning of the data sets.

Run-time statistics of different options of TreSpEx as part of (A) the paralogy screening, (B) PABA analyses, and (C) determination of long-branch indices, saturation index, or phylogenetic signal as assessed by average bootstrap support.
Similarly, the run times of the determination of the average bootstrap support, long-branch, or saturation indices also increase more or less linearly with the number of trees and data sets (Fig. 10C). Calculation of the long-branch indices of 200 trees requires less than a quarter second, and the average bootstrap supports only about 0.3 seconds. The calculation of the saturation indices takes substantially longer, but is still achieved in about 1.5 minutes for 200 trees and data sets. This is because of the fact that for this index, the pairwise PDs have to be calculated from the trees as well as in parallel the uncorrected pairwise distances
Finally, the run times of the three PABA options follow an exponential growth as the number of partitions increases (Fig. 10B). This is because of the exponential growth of the number of possible combinations of data sets with an increasing number of partitions 56 (Fig. 10B), so that the correlation of the run time and the number of data sets is linear for the generation of all possible combinations as well as the compilation of bootstrap summaries (data not shown). However, for the calculation of the PABA results itself, the correlation between run time and number of data sets is still exponential. This difference can also be observed in the plot against the number of partitions. While the generation of all possible combinations as well as the compilation of bootstrap summaries show similar curves, the curve for the calculation of the PABA results is much steeper (Fig. 10B). The difference is because of the fact that with an increasing number of data sets, the number of possible additions of a partition to data sets without the partition also increases exponentially. However, even given this exponential growth the calculation of the PABA results for eight partitions takes less than 12 seconds, only about 50% longer than the generation of all possible combinations of eight partitions in the first step (Fig. 10B). For eight partitions, the longest time in performing the steps of the PABA analysis in TreSpEx is used for the summary and compilation of the bootstrap results of all data sets. This step requires a little less than two minutes. However, given the double exponential correlation of the calculations of the PABA results to the number of partitions at a certain number of partitions, this calculation will take the longest. For example, with only up to seven partitions the generation of the possible data sets takes longer than the calculation of the PABA results, but after that it is vice versa (Fig. 10B). 
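The growth described above can be made concrete by counting data sets and possible additions for a given number of partitions. The sketch below is purely combinatorial and says nothing about actual run times; the function names are hypothetical, and counting the addition of a partition to the empty data set (the "added as first" case) is an assumption.

```python
from math import comb


def n_datasets(n_partitions):
    """Number of non-empty combinations of partitions."""
    return sum(comb(n_partitions, k) for k in range(1, n_partitions + 1))


def n_additions(n_partitions):
    """Number of ways to add one partition to a data set lacking it.

    Assumption: each partition can be added to every subset of the remaining
    partitions, including the empty one.
    """
    return n_partitions * 2 ** (n_partitions - 1)


for n in range(2, 7):
    print(n, n_datasets(n), n_additions(n))
```

With four partitions there are 15 data sets and 32 possible additions, with six partitions 63 and 192, so the number of additions grows faster than the number of data sets.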
Regarding run-time requirements, though, the bottleneck in the PABA analysis will be none of the three steps, but the actual phylogenetic reconstruction including, for example, a bootstrap analysis. 56 For instance, even performing a parallel RAxML analysis on 15 threads with 100 bootstrap replicates, the shortest phylogenetic reconstruction took about 0.5 minutes, so that with 8 partitions and 254 data sets this would be at best about 2 hours. The three steps of the PABA analysis performed by TreSpEx together took less than two minutes for eight partitions and, thus, only about one-sixtieth of the time of the phylogenetic reconstructions. In case of a permutation test, the time for the generation of all possible combinations, the phylogenetic reconstructions, as well as the compilation of bootstrap summaries multiplies by the number of the permutated data sets plus one as the three steps have to be conducted for each permutated data set and the original data set. However, the calculation of the PABA results increases only slightly.
Conclusion
TreSpEx allows the detection of artificial signal because of paralogy, long-branch attraction, or saturation, as well as conflict between different data sets, by utilizing tree-based information like nodal support or PDs. TreSpEx enables the parallel analysis of hundreds of trees and/or predefined gene partitions in very short to reasonable amounts of time. For example, the analysis of the sister group relationship of Ctenophora to all other Metazoa 81 using TreSpEx indicated that the support for this relationship might stem from long-branch attraction of Ctenophora toward the outgroup taxa in the analysis. Hence, more thorough analyses of the extent to which this affects the position of Ctenophora are still necessary, and to this end, taxon sampling of Ctenophora should also be substantially increased in future phylogenomic studies. Moreover, after increasing the number of taxa, the analyses should be complemented by thorough investigations of the individual genes of the data set with respect to biases such as saturation and heterogeneous substitution rates. TreSpEx could be a useful tool in such analyses.
Generally, the results of TreSpEx provide the foundation and raw data for further analyses of different properties of the data set and the influence of these properties on the phylogenetic reconstructions. The partitions of a data set or different data sets can be ranked or grouped together. Additionally, taxa can be excluded based on the results of TreSpEx. The influence of individual properties like long-branch or saturation indices on the phylogenetic reconstruction can be assessed in combination with additional phylogenetic analyses. Hence, regardless of whether the studies are based on single, a few, or hundreds of genes the reliability of phylogenetic reconstructions can be increased using TreSpEx. This will improve the robustness of phylogenies and therefore also the conclusions drawn in many areas of comparative biological studies that rely on robust phylogenies.
TreSpEx will be kept up to date in the next years if changes in the Perl environment occur, and new tree-based methods will be incorporated. Moreover, on request different input and output formats can be added. The program is open source and released under the terms of GNU General Public License (GPL) 3.0.
Author Contributions
Conceived and designed the experiments: THS. Analyzed the data: THS. Wrote the first draft of the manuscript: THS. Made critical revisions: THS. The author reviewed and approved of the final manuscript.
Disclosures and Ethics
As a requirement of publication the author has provided signed confirmation of compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.
