Abstract
Introduction
The discovery of the importance of lateral transfers, losses and duplications events in the evolution of genetic sequences has motivated the development of new approaches to graphically represent phylogenies. Methods like NeighborNet (Bryant and Moulton, 2004), T-Rex (Makarenkov et al. 2006), SplitTrees (Bandelt and Dress, 1992; Dress and Huson, 2004; Huson, 1998), Qnet (Grünewald et al. 2006), Pyramids (Bertrand and Diday, 1985), Tree of Life (Kunin et al. 2005a) allow visualizing deviations from a tree topology. All these methods have in common that they summarize the information in the form of a planar network. Deviations from an X-tree are often represented by supplementary edges (Makarenkov et al. 2006; Nakhleh et al. 2004) that create cycles in the graph.
Phylogenetic information can be represented by a distance matrix
With the availability of complete genomes, many methods have been proposed to determine the evolution of whole genomes (For reviews see Galperin et al. 2006; Delsuc et al. 2005; Henz et al. 2005). The construction of trees from whole genomes has proved over recent years to be a quite difficult task. This is mainly because of the very limited number of genes shared by Archaea, Eukaryota and Bacteria. Furthermore, gene evolution can sometimes be very different from species evolution. The main difficulty consists in finding a good operator to estimate the distance between genomes. Distances have been estimated with measures based on gene order or arrangement (Wolf et al. 2002; Wang et al. 2006), gene content (Fitz-Gibbon and House, 1999; Snel et al. 1999; Korbel et al. 2002), protein domain organization (Fukami-Kobayashi et al. 2007; Yang et al. 2005), folds (Lin and Gerstein, 2007), combining the information from many genes in a supertree or a superdistance (Dutihl et al. 2007 for a comparative study) or using a local alignment search tool such as Blast (Kunin et al. 2005b; Clarke et al. 2002). Among genome distances obtained with Blast, the genome conservation (Kunin et al. 2005b) has furnished some of the best trees up to date, if the quality of a whole genome phylogeny is measured by its concordance to broadly accepted classifications. The genome conservation estimates the distance between two taxa using the sum of BlastP reciprocal best hits between two genomes. The method is capable of quite correctly recovering all main phyla. At the phylum level, the evolution of the different genes is sufficiently similar to form a distinct cluster. The main uncertainties in whole genome phylogenies are on the relationships between phyla. Different evolution rates of the genes, gene losses or duplications, lateral gene transfer may result into large deviations of the distance matrix from a tree topology. In this context, minimum contradiction matrices can furnish information not contained in a single tree or a split network.
The paper is organized as follows. After introducing minimum contradiction matrices in section 2 and their connection to Robinson matrices and Kalmanson inequalities, section 3 explains why the identification of deviations from perfect order is a useful complement to phylogenetic studies. Section 4 presents an algorithm to search for the order minimizing a measure of the deviation from perfect order over all taxa. This order can be interpreted as an average best order over all reference taxa
Circular Order and the Minimum Contradiction Approach
Definitions
Let us start by recalling a number of definitions that are necessary to introduce the notion of circular order. A graph G is denned by a set of vertices
A valued X-tree
A central problem in phylogeny is to determine if there is an
Consider a planar representation of a tree
In an X-tree, a circular order has the property that for any integer

The distance matrix
The above inequalities characterize also a Robinson matrix (Christopher et al. 1996; Thuillard, 2007). Using the definition of
These inequalities have a similar form to the 4-point condition (2) and are known as the Kalmanson inequalities.
In real applications, the distance matrix
The best order of a distance matrix is, per definition, the order minimizing the contradiction. The ordered matrix
For a perfectly ordered X-tree, the contradiction
Kalmanson inequalities are at the center of a number of important results relating convexity (Kalmanson, 1975), the Traveling Salesman Problem (TSP) (Deineko et al. 1995; Korostensky and Gonnet, 2000), phylogenetic trees and networks (Christopher et al. 1996; Dress and Huson, 2004). Let us explain why perfect order is an important property.
If the error on the distance in an X-tree is not greater than
If a distance matrix d fulfills Kalmanson inequalities, then the distance matrix can be exactly represented by a split network (Bandelt and Dress, 1992).
If Kalmanson inequalities are fulfilled, then the tour (1,2,…,
The last result can be demonstrated starting from the sum
The solution to the TSP has the Master Tour property (Deineko et al. 1995). A Master Tour is a solution of the TSP with the property that the optimal tour restricted to a subset of points is also a solution of the reduced TSP. This result follows directly from the inequalities for perfect order
Fast algorithm to search for the best order
The choice of the reference taxon
The contradiction over all n reference taxa is given by
The best order is the order (1, …,
Let us start by considering an X-tree and the 3 vertices

The inequalities
If the contradiction
The quantities
The value
For whole genome phylogenies, the search for appropriate measures to estimate the evolutionary distance between taxa is still the subject of significant research efforts (Korbel et al. 2002; Kunin et al. 2005b; Yang et al. 2005; Fukami-Kobayashi, 2007). Distance matrices obtained from BlastP scores have been quite successful to generate good trees. The similarity score obtained with BlastP programs can be given a probabilistic interpretation. The statistics of high scoring segments in the absence of gaps tends to an extreme value distribution (Karlin and Altschul, 1990). The probability
The log term has the form of a mutual information and furnishes a measure of the similarity of the genomes i and j in reference to the genome
Different approaches have been proposed to normalize the distance matrix using the marginal entropy (Kraskov et al. 2005), the self-score (Kunin et al. 2005b), Korbel normalization (Korbel et al. 2002) or the average score. The normalization by the self-score in the genome conservation gives some of the best results. It is based on a nonlinear weighted sum of the BlastP scores. The gene conservation method computes the distance between two taxa by normalizing the sum of reciprocal best hits between genome
Search for the best average order
The algorithms described in section 4 have been used to search for the best order. The distance matrix was computed using the data furnished by the genome phylogeny server (Kunin et al. 2005b) obtained with an e-value cut-off set to 10−10. The contradiction is significantly lower with the score (1 –

Table 1 gives the order of the different taxa corresponding to the best order. Archaea and Eukaryota are grouped into two adjacent clusters of taxa. One observes, for Bacteria, that all the members of a class or a phylum are neighbors. All proteobacteria (together with Aquifex?) are grouped together. The best order obtained with the minimum contraction approach differs from the NJ tree on the following aspect: all spirochetes and δ-proteobacteria form a cluster. This is not the case of the NJ tree.
This article focus on the mathematical aspects of Minimum Contradiction Matrices. We will limit the discussion to 3 examples showing how to interpret Minimum Contradiction Matrices. The matrix

Distance matrix
The best order in Figure 3 is obtained by minimizing the contradiction using all taxa as reference vertex at least once. The best order is therefore a kind of “average” best order. The matrix
Many contradictions in Figure 5 can be associated to well accepted endosymbiotic events (Chloroplasts in plants or mitochondria in Eukaryota). Figure 5a shows

Distance matrix
Figure 5b shows the distance matrix using all Cyanobacteria as reference taxa. The elements associated to
Conclusions
For an X-tree or a split network the minimum contradiction matrix
An average best order can be obtained by searching for the best circular order over
