Abstract
Introduction
After the Human Genome Project (HGP), we have entered a “post-gene era”, in which researchers have discovered that the human genome is not a combination of isolated genes and a large amount of useless “DNA segments,” but a complex network system. Network motifs are overly represented as the smallest unit in a network.1–3 Motif analysis is increasingly recognized as a powerful approach to research the function structure of a network, the structure of the organization principle, and species evolution.
Network motifs were first defined systematically in
Our method counts all sub-graphs that meet the requirements in a given graph. Backtracking method is utilized to enumerate all sub-graphs in the network. Then, the associated matrixes of sub-graphs are normalized and the isomorphism sub-graphs are marked uniquely. We evaluate our method on the metabolic pathway of the bacteria
Background Information
Definition of Network
A network consists of vertices and edges. A directed graph (or network) is usually denoted by
Definition of Induced Sub-Graph
Two graphs:
Definition of Degree Order
To order degrees of graph
Definition of Sub-Graph Isomorphic
Two sub-graphs

Examples of isomorphic graphs. (A) The isomorphism of an undirected graph. (B) The isomorphism of a directed graph.
The Key of Motif Mining
There are five subtasks in our method:

Flow chart of mining motifs.
The steps of mining motifs are as follows:
Use the ReadGraph function to store real graphs and random graphs.
Enumerate all of the sub-graphs with the backtracking algorithm.
Standardize all types of sub-graphs to easily identify the sub-graphs of the same type. We calculate the Code for a subgraph based on the degree of sub-graphs and symmetric ternary; one Code will represent one type of sub-graph. We obtain the number of one type of sub-graphs based on the value of Code. These steps are described in detail below.
Associated Matrix
Storage of graphs is the first step in the process to solve the problem of motif mining. An associated matrix is easy to compress and makes it easy to know the size of the sub-graph; hence, we use it to store the real graphs and random graphs.
Definition of Associatedmatrix
An associated matrix consists of a graph

An example of an associated matrix. It shows the corresponding relationship between a graph and its form for storage.
As we know a row represents a node, a column represents an edge, and the algebraic sum of all elements is zero in
The Standardization of Associated Matrix
Because the sort of a graph's nodes is not sure, the associated matrix used to store a graph is not unique. In order to mark the graph uniquely, we need to standardize the associated matrix. This is useful for sub-graph isomorphism. Specifically, it can reduce the complexity of sub-graph isomorphism and improve the efficiency of searching. These will be discussed in detail in the next section.
The detailed processes of the standardization of an associated matrix are as follows:
Out-degree order: perform a sort to all vertices with the dictionary sequence, which is in descending order. In-degree order: make a dictionary sequence of the vertices having the same out-degree order. If and only if there are some nodes with the same out-degree order and in-degree order, the sequence of these nodes should be in accordance with the minimum number of edges connecting them and the numbers shall ascend in the sequence. Calculate the columns by symmetric ternary number, and then with the dictionary sequence. (All columns of ternary numbers are not the same.) After sequencing these nodes, we shall transform the rows into a one-dimensional array and get a decimal number through transformation of the ternary number. We may repeat the aforementioned steps until the matrices get the same
Finally, we get a Code value to mark the class of isomorphism. 5668 is a Code signing an isomorphism sub-graph.
Random Network Generation
For the identification of a motif, the proper determination of a sub-graph's significance needs comparison with an ensemble of random graphs. Generation of this ensemble for a given graph model is a necessary step in our method. We generate random graphs with the same feature of the input network (enumeration and classification are also performed on random graphs).
Our approach is based on switching operations,
15
applied on the edges of the input network repeatedly, until the network is well randomized. This switching operation is applied to randomly chosen vertices of the network, as shown in Figure 4. Any two edges (

Schematic view of edge switching operator, edge replacement for generating random networks. As shown in this figure, the replacement process does not change the node degrees.
Backtracking
Backtracking
16
is a general method for finding all (or some) solutions to a computational problem, by incrementally building candidate solutions and then abandoning each partial candidate
The backtracking algorithm enumerates a set of partial candidates that traverses this searching tree recursively, from the root down, in depth-first order. At each node
We use it to enumerate all sub-graphs of a given size that occur in the input graph, because the backtracking algorithm keeps only the path of the initial and the end nodes, saving storage space and reducing space complexity. Then, we combine the way of enumerating all sub-graphs (backtracking) and the way of storing the input graph with an associated matrix that reflects the connection relationship of nodes and edges. The combination of the two algorithms can reduce the complexity of enumerating sub-graphs and searching time. Described in Figure 5, the detailed processes to enumerate sub-graphs are as below:

An example of searching sub-graphs by backtracking.
Carry out steps a–g sequentially to all nodes in the graph
Make two stack tables for all nodes
Put the node
When the number of nodes in stack Table 1 meets the requirements, output the nodes and edges in stack Table 1 as a sub-graph.
Classify the sub-graph outputted by a different edge number.
If stack Table 2 is not empty, perform steps f–g.
When the number of the nodes in stack Table 1 is smaller than the designated size, remove the first candidate node and edge pair from stack Table 2, and put all nodes and edges connected with the removed node into stack Table 2 as the candidate nodes and edges (excluding the nodes in stack Table 1). If the node removed from stack Table 2 is the same as a node in stack Table 1, put the edge removed from stack Table 2 into stack Table 1 only; otherwise, put the node and edge removed from stack Table 2 into stack Table 1 at the same time, and then turn to step c to continue.
When the number of nodes in stack Table 1 is equal to the designated size, search the remainder nodes and edges in stack Table 2. If there is a node in stack Table 2 equal to a node in stack Table 1, put the edge corresponding to the node in stack Table 2 into stack Table 1, and then output the nodes and the edges in stack Table 1 as a sub-graph.
A summary of
The summary of
Remove the node
Using the above steps, we can enumerate all sub-graphs through the backtracking algorithm and classify them based on the difference between the number of nodes and edges. This can reduce the space of isomorphism by judging the size of sub-graph beforehand. The combination of backtracking algorithm and associated matrix makes it easier and faster to search all sub-graphs than that with other algorithms. We provide the pseudo-code to show the process to enumerate sub-graph synoptically.
Time complexity for backtracking is
Sub-Graph Isomorphism
As it is well known, among the different types of graph-matching algorithms, the subgraph isomorphism is an NP-complete question. 14 So, time requirements of brute force matching algorithms (either in cases of isomorphism or sub-graph isomorphism) increase exponentially with the size of the input sub-graphs, restricting the applicability of graph-based techniques to problems implying sub-graphs with a small number of nodes and edges. We deal with the problem of sub-graph isomorphism by standardizing the associated matrix to mark the isomorphism class uniquely.
As previously stated, we standardize all sub-graphs to uniqueness. One way to identify isomorphic sub-graphs is to give an identical associated matrix to every isomorphic kind. For a sub-graph with
The sufficiency and necessity are proved.
Evaluation Measures
By using the results of the last step, the significance of each sub-graph found in the input and random networks is calculated. Here, a network motif is defined as a frequently or uniquely represented sub-graph in an input network; each motif must meet the measurement principles.
Each motif must meet the above principles, Motif = (Frequency,
Result and Discussion
In this section, we present the results of applying our algorithm to some real network; please see http://202.199.159.247/DownloadShow.asp?ID=15 for detail. Network instances applied are both biological and non-biological. The metabolic pathway of the bacteria
The Validity of Algorithm
The numbers of motifs with different sizes observed in each network are presented in Table 1. In this table, we enumerate different kinds of networks with different numbers of edges, and calculate the number of some sub-graphs that occurred in the real network and the only
From the first table, we know the different numbers of edges and nodes embody the difference between networks. Most motifs are similar in different networks, but small differences exist. The more similar the topology structures of motifs, the more similar are their
In this table, the fifth column has the unique
Algorithm Correctness and Extensive Applicability
As is known to all, the
In this table, the second column has the topological structures of motifs, the third column has the names of motifs, the next column has the
Meanwhile, in our method, the topological structures of motifs are the same as in the ESA method. 1 The importance of different motifs in the same network and that of the same motif in different networks is similar to that in Milo et al. 1 This proves the correctness of our method. In addition, our method achieves mining of motifs in different kinds of networks, which proves our method is valid and offers extensive applicability.
The Comparison of Accuracy
We have proved the validity and correctness of our method – Edges-Nodes Search Algorithm (ENSA), and will below prove its accuracy and comparison with that of Feature Compression for Graph Isomorphism (FCGI).
17
In this table, Type is the number of non-isomorphic graphs with size
The comparison of accuracy.
We use the backtracking algorithm to enumerate all sub-graphs of a given size that occur in the input graph. Our method is mining a non-probabilistic and exact motif in the network. So, it is an inherent property of the network that the number of types with size
The Comparison of Searching Time
In order to judge the performance of our method, we carried out a series of experiments on a set of testing data. The performance evaluation measured the amount of time consumed. We enumerate all sizes of sub-graphs for testing, and then compare with method FCGI. 17 The operating environment was consistent, as all evaluation was performed on the same computer hardware and software: Intel® Core™ 2 Quad CPU, 2.4 GHZ work station, 3 G RAM, Windows 7 operating system, and Visual C++ 6.0. This ensures that the comparison is meaningful (Fig. 6).

The performance evaluation is reflected in consuming time, using the
Searching the sub-graphs in the network is the key part in the process of recognizing motifs. The efficiency of enumerating all the sub-graphs is an important standard to measure the quality of the algorithm, because it costs much time to search the sub-graphs in the whole process. According to the previous research, ESA's search time is the longest, ESU takes the second longest, and ESU-Tree is the shortest. The work of Hu et al enumerates all the sub-graphs by the ESU-Tree algorithm. 17 In comparing our method with this use of ESU-Tree, we found only a small difference in the time required to enumerate the sub-graphs. In cases with three and four nodes, our backtracking algorithm took just 10−2 and 10−1 seconds longer. This gap can be reasonably ignored, and they can be regarded as equal. In cases with five nodes, the time spent on enumerating the sub-graphs by the backtracking algorithm is shorter than by ESU-Tree, by a significant 27.5 seconds. This is a relatively big improvement. The shorter time spent on enumerating reflects the superiority of the algorithm presented here.
The time complexity of our algorithm is
The time complexity of the FCGI algorithm is
Because of the limitations of the algorithm FCGI, we only tested and compared the yeast network. But our method has more extensive applicability; hence, we list the searching times of enumerating some sub-graphs in the other networks, as a reference. The results are given in Table 4.
Search time in different networks. The first column has the names of different networks, and columns 2–7 has the sizes of sub-graphs and time.
We list the searching time in the different networks with different sizes in this table. In any network, the searching time is different when the sizes of sub-graphs are different. For example, in the sea urchin network, 0.003 seconds where there are three nodes in the sub-graph, 0.012 seconds for four nodes, and 0.237 seconds for five nodes. With the size increasing, the search time is amplified 10-fold. It shows that the relationship between the searching time and the size of sub-graph is directly proportional. Among the networks we have tested, the searching time was the shortest in the
Conclusion
In this paper, we presented a method for mining network motifs. Based on the associated matrix, all sub-graphs could be enumerated by adding edges and nodes progressively with our backtracking algorithm, which saves storage space and reduces space complexity because it keeps only the path of the initial and the end nodes. This way of storing an input graph by an associated matrix reflects the connecting relationship of nodes and edges. Taking advantage of the combination of the associated matrix and the backtracking algorithm, our method reduces the complexity of enumerating sub-graphs, providing a more efficient solution for enumerating sub-graphs during motif mining. Then, the associated matrix is standardized and the isomorphism sub-graphs are marked uniquely by using the symmetric ternary to translate the elements (−1,0,1) in the associated matrix to a decimal number Code, which represents an isomorphism class. We judge whether the two sub-graphs are isomorphic by simply checking whether the Codes are the same or not. Because the associated matrix is usually not a square matrix and “−1” reflects direction, it is difficult to deal with them. Our method gives a new application of the associated matrix in the digraph domain by combining it with the symmetric ternary. It is easier and faster than other algorithms. In addition, from the results obtained, we have proved the correctness and effectiveness of our method. The comparison of the results of our method with the ones obtained shows that our approach has lower searching time and more extensive applicability.
Author Contributions
Conceived and designed the experiments: YX, QZ. Performed the experiments: YX, CZ. Analyzed the data: QZ. Contributed reagents materials/analysis tools: YX, CZ. Wrote the paper: YX. All authors reviewed and approved of the final manuscript.
