Abstract
Introduction
In the data mining field, the traditional binary and multi-class classification problems have been explored substantially. However, the multi-label classification (MLC) problem remains open and has recently attracted increasing research attention due to its wide range of applications, such as text classification,1,2 gene function classification,3 social network analysis,4 and image/video annotation.5
Furthermore, with the rapid development and application of wireless sensor networks (WSNs), massive data collected from a large number of monitoring objects6–12 are analyzed, clustered, and classified with methods like classic
The traditional single-label classification (SLC) problem assumes that one instance belongs to only one category, whereas in the MLC problem one instance can be allocated to multiple categories simultaneously. Since SLC is merely a special case of MLC, MLC deals with a more difficult and general problem in the data mining domain, posing the following two challenges. (1) The number of possible label sets for a test instance may be very large, growing exponentially with the total number of labels.14 For instance, 10 labels already lead to 2^10 = 1024 possible label sets for each test instance. (2) Due to multiple labels and possible links among them, label correlations become very complex.15 For instance, on one hand, a piece of news tagged with "entertainment" is more likely to carry the additional tag "sport" than "war." On the other hand, in a classification of natural scenes with the picture label set {"beach," "field," "autumn leaf," "sunset," "mountain," "city"}, it is less likely that a scenery picture is labeled with both "autumn leaf" and "beach."
Thus, new approaches have been introduced into MLC algorithms in recent years; for example, graph representation has been applied to MLC methods to cope with the above-mentioned problems. However, little progress has been made with these methods and algorithms,15–29 especially in the following aspects. (1)
To overcome these problems, we propose a novel graph-based MLC algorithm, which adopts the KNN and random walk algorithms, named multi-label classification based on the random walk graph and the K-nearest neighbor algorithm (MLRWKNN). Our main contributions are as follows.
A new construction method for the graph model is proposed that greatly reduces the time and space complexity of the random walk algorithm, and hence handles large-scale data more easily than traditional methods.
The similarity measurement method is improved to express the relationships between instances more accurately by differentiating and integrating discrete and continuous features.
A new prediction method for the label set is devised to reduce the subjectivity of the probability threshold method used in most graph-based MLC algorithms.
We propose recommended values for the algorithm parameters through experimental analysis instead of subjective empirical constants, which is of great significance when applying this method to similar problems in different domains.
The rest of this article is organized as follows. The background and reviews of related work on the random walk strategy, the KNN algorithm, and graph-based MLC algorithms are given in section "Related work." Based on this previous research, we propose our approach MLRWKNN in section "The principle of the MLRWKNN algorithm," which consists of three components: design of the feature similarity computation, construction of the random walk graph, and label set prediction. In section "Experimental results and analysis," we describe the experimental process, the test data used, and the evaluation criteria, and present comprehensive experimental results and their analysis. We conclude the article in section "Conclusion," summarizing our contributions to this research area and our future work in this direction.
Related work
In this section, we first introduce the random walk strategy and KNN algorithm, which will build up a theoretical foundation for our proposed approach, and then we review the graph-based MLC algorithms.
Random walk and the KNN algorithms
Random walk is an algorithm based on graph representation that iteratively explores the global structure of a network to estimate the proximity between two nodes. As mentioned in the previous section, one challenge of the MLC problem is that there are complex relationships among multiple labels. One solution is to apply the random walk algorithm, using the connectivity between vertices on a random walk graph to accurately describe the correlations among labels. The four input parameters of a random walk algorithm are an adjacency matrix

Figure 1. The random walk diagram.
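As a minimal sketch of this strategy, the following power iteration estimates proximities with a random walk with restart; the example graph, the restart probability `alpha`, and the stopping tolerance are illustrative assumptions, not the article's exact parameterization.

```python
import numpy as np

def random_walk_with_restart(A, start, alpha=0.15, tol=1e-8, max_iter=1000):
    """Estimate proximity from `start` to every vertex of the graph.

    A     : (n, n) adjacency matrix with non-negative edge weights
    start : index of the vertex the walk restarts from
    alpha : restart probability at each step
    Returns the converged probability distribution over vertices.
    """
    n = A.shape[0]
    # Column-normalize so each column is a transition distribution.
    col_sums = A.sum(axis=0)
    P = A / np.where(col_sums == 0, 1, col_sums)
    r = np.zeros(n)
    r[start] = 1.0          # restart vector
    p = r.copy()            # current probability distribution
    for _ in range(max_iter):
        p_next = (1 - alpha) * P @ p + alpha * r
        if np.abs(p_next - p).sum() < tol:   # L1 convergence check
            return p_next
        p = p_next
    return p
```

Vertices with a higher converged probability are considered closer to the start vertex, which is how the walk captures global structure rather than only direct edges.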
The KNN algorithm is a lazy learning method that classifies samples according to the idea that "birds of a feather flock together." For a given test instance, it retrieves the K nearest training instances according to a certain similarity measure and votes on their labels to determine the predicted label of the test instance. As shown in Figure 2, the test instance "•" would be classified as "Δ" when

Figure 2. The KNN diagram.
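The voting idea above can be sketched as follows; Euclidean distance and simple majority voting are illustrative choices here, since the article's own similarity measure is introduced later.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict a single label for `x` by majority vote among its
    k nearest training instances under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every instance
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    votes = Counter(y_train[i] for i in nearest)  # count neighbor labels
    return votes.most_common(1)[0][0]
```
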
PTMs and AAMs
In general, the MLC algorithms can be classified into two categories: problem transformation methods (PTMs) and algorithm adaptation methods (AAMs). The classical PTMs aim to transform an MLC problem into one or several SLC problems to which existing solutions are available. Here, we briefly describe a few typical ones. Binary relevance (BR)4 transforms the MLC problem into binary classification and generates a separate dataset for each label. Calibrated label ranking (CLR)31 treats MLC as a label ranking problem and learns a mapping from instances to rankings over a predefined set of labels. Hierarchy of multi-label classifiers (HOMER)32 transforms the MLC problem into a tree hierarchy of simpler MLC tasks, in which the labels at each node are split with a balanced clustering algorithm, similar labels are grouped into a meta-label, and each meta-label is predicted by a classifier. Label powerset (LP)33 builds a single-label system for a dataset from the possible combinations of labels, treating each combination as an independent label. RAndom k-labELsets (RAkEL) extends LP by training an ensemble of LP classifiers, each on a small random subset of the labels.
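The two simplest transformations above, BR and LP, can be sketched in a few lines; this is a minimal illustration with an assumed binary label-matrix representation, not the cited implementations.

```python
import numpy as np

def binary_relevance_datasets(X, Y):
    """BR transformation: split a multi-label dataset into one binary
    dataset per label.

    X : (n, d) feature matrix
    Y : (n, q) binary label matrix, Y[i, j] = 1 iff instance i has label j
    Returns a list of (X, y_j) pairs, each trainable by any binary classifier.
    """
    return [(X, Y[:, j]) for j in range(Y.shape[1])]

def label_powerset_classes(Y):
    """LP transformation: map each distinct label combination that occurs
    in Y to a single atomic class index."""
    combos = {}
    classes = []
    for row in Y:
        key = tuple(row)
        if key not in combos:
            combos[key] = len(combos)   # new atomic class
        classes.append(combos[key])
    return np.array(classes), combos
```

Note the trade-off the survey text implies: BR ignores label correlations entirely, while LP preserves them but may create one class per label combination, few of which have many training examples.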
The traditional AAMs intend to enhance SLC algorithms so that they can handle the MLC problem directly. For example, the well-known multi-label KNN (MLKNN) algorithm adapts the KNN method to multi-label data.
Recently, researchers have been focusing on introducing graph representation into MLC algorithms, which can produce more accurate results thanks to probability-based principles and an elegant capability of representing label correlations. The graph-based MLC methods fall into two categories: one improves existing MLC algorithms by building corresponding graph models for multi-label datasets, and the other solves the MLC problem by combining SLC algorithms with a graph model. The first category includes the improved BR algorithms,15 the improved classifier chain (CC) algorithms,27,28 and the improved MLKNN algorithm.29 In the following, we describe each of them succinctly. In Cetiner and Akgul,15 the label independence issue of the BR algorithm is addressed by first treating the outputs of each binary classifier as observed nodes of a graphical model and then determining the final label assignments using standard Bayesian inference over the unobserved nodes. The Neighbor Pair Correlation Chain Classifier (NPC) algorithm27 constructs a graph of labels based on the latent Dirichlet allocation (LDA) model and acquires the label correlations using the random walk with restart strategy; the CC algorithm then solves the label chain selection problem by using these correlations to establish the best label chain. In Lee et al.,28 a directed acyclic graph (DAG) is constructed to maximize the sum of conditional entropies between all parent and child nodes, the highly correlated labels are sequentially ordered in chains obtained from the DAG, and the predictive power is maximized by utilizing a CC approach with these chains.
The instance-ranking method (IR)29 maps all training and test instances into a graph, assigns a weight to each training instance via random walk, and uses these weights to calculate the label prior probabilities in the MLKNN algorithm.
The following algorithms belong to the second category: the random walk-based MLC algorithms,16,17,19,20,22,23 the MLC algorithm based on the Hilbert–Schmidt independence criterion,24 and the dictionary learning-based DL-MLC algorithm.25 Specifically, the graph DL-MLC algorithm25 maps a training set to a graph model and improves the label-consistent K-SVD algorithm (LC-KSVD)37 by adopting graph Laplacian regularization. The GSMM method24 contains three steps: first, it converts the training and test sets to a weighted undirected graph and describes the graph smoothness through a series of transformations involving the adjacency matrix, the real label sets of the training instances, and the predicted label sets of the test instances; second, it describes the consistency of the label space with the Hilbert–Schmidt independence criterion; third, it builds the MLC classifier by optimizing the smoothness and consistency. Among the MLC algorithms based on random walk, the three algorithms MLRW,17 ML-RWR,23 and RW.KNN19 map all training instances to graph vertex sets, acquire the probability distribution from test instances to the training set with random walk, and compute the predicted probability of each label from this probability distribution and the label sets of the training instances. Although all three construct their edge sets over the whole training set, MLRW connects edges among instances that have the same labels, ML-RWR connects edges among instances that have mutual KNN relations, and RW.KNN considers only unilateral KNN relations. The transductive multi-label learning (TML)16 method builds a complex partially directed graph by mapping all training and test instances to graph vertices; undirected edges link the test instances that have KNN relations, and directed edges link the training instances.
Based on this graph, TML improves the method proposed in Azran38 to tackle the MLC problem. Wang et al.20 constructed a bi-relational graph by combining a data graph and a label graph. The data graph is constructed from all the training and test instances, and the random walk with restart strategy is used to compute the transition probability from each label vertex to an instance vertex, that is, the predicted label probability.
In summary, regarding graph construction, all the aforementioned algorithms use the whole training set (or even all training and test instances) to construct graph vertices, which embeds many vertices unrelated to the test instances. Such huge vertex sets require considerable computation in time and space and sometimes even deteriorate the classification effect.17 For similarity measurement, the above algorithms adopt the reciprocal of the distance between feature vectors,17–19,29 the distance-based Gaussian function method,16,20,23,24 or the sparse rule operator method.22 None of these methods considers the difference between discrete and continuous features. For parameter selection, almost all the algorithms adopt empirical constants and do not carry out in-depth experimental analysis of how to choose the algorithm parameters. To overcome these shortcomings, we propose a novel MLC algorithm based on the random walk strategy and the KNN algorithm. Section "The principle of the MLRWKNN algorithm" provides a detailed description of the proposed method.
The principle of the MLRWKNN algorithm
In order to outline the MLRWKNN algorithm proposed in this article, some basic concepts of MLC are discussed here. Suppose that X denotes the instance space and Y = {l1, l2, …, lq} denotes the finite set of q possible labels. Suppose that each instance x ∈ X is associated with a label set L ⊆ Y. Therefore, the multi-label classifier is a mapping h: X → 2^Y that assigns a set of labels to each instance. Suppose there are n training instances, each pairing an instance with its true label set.
Overall description of the MLRWKNN algorithm
The MLRWKNN algorithm aims to solve the MLC problem through constructing the random walk graphs, where the vertex and edge sets are generated by a certain test instance
For each test instance in
Through random walk operation, the MLRWKNN algorithm computes the probability distribution on the graph vertices. As shown in Figure 3, starting from
Through using the above obtained probability distribution vector, the MLRWKNN algorithm predicts the probability of each label belonging to

Figure 3. The construction of random walk graphs.
In order to elaborate the MLRWKNN algorithm in greater detail, we discuss the design of the similarity measurement, the construction of the random walk graph, and the prediction of the label set in the following sections. Similarity measurement, as the basis of the KNN algorithm, is discussed in section "Design of proposed similarity," and sections "A new construction method of the random walk graph" and "Label sets prediction" give detailed descriptions of the remaining two steps.
Design of proposed similarity
Considering the difference between discrete and continuous features, our MLRWKNN calculates the similarities for discrete features and continuous features separately and then combines them through a linear weighted process. Specifically, given an instance
In order to determine the value range of similarity based on a continuous feature, MLRWKNN adopts the Gaussian kernel function to handle the similarity of continuous features
where
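A minimal sketch of the linear weighted combination described above follows; the exact-match rule for discrete features, the weight `w`, and the Gaussian bandwidth `sigma` are assumptions standing in for the article's abbreviated formulas.

```python
import numpy as np

def mixed_similarity(x1, x2, discrete_idx, continuous_idx, w=0.5, sigma=1.0):
    """Similarity between two instances with mixed feature types.

    Discrete part  : fraction of discrete features with equal values.
    Continuous part: Gaussian kernel of the Euclidean distance, which
                     bounds the continuous similarity to (0, 1].
    The two parts are combined by a linear weight w in [0, 1].
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    if discrete_idx:
        s_disc = np.mean([x1[i] == x2[i] for i in discrete_idx])
    else:
        s_disc = 0.0
    if continuous_idx:
        d = np.linalg.norm(x1[continuous_idx] - x2[continuous_idx])
        s_cont = np.exp(-d ** 2 / (2 * sigma ** 2))
    else:
        s_cont = 0.0
    return w * s_disc + (1 - w) * s_cont
```

Because both parts lie in [0, 1], the combined value stays in [0, 1] for any w in [0, 1], which keeps similarities from features of different types comparable.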
A new construction method of the random walk graph
In general, only a few training instances play a decisive role in predicting the label sets of test instances, whereas the other training instances not only complicate the random walk graph but also interfere with the classification results. In this article, we propose a novel construction method for the random walk graph that adopts the KNN instances of
Let
where
The vertex set of instances
The edge set
Since there is no need to construct the KNN for every training instance in
The terms in this formula are explained as follows: (1)
In equation 13,
where
When the random walk procedure ends, the MLRWKNN algorithm generates
where
Note that the symbol (*) is an ordinary multiplication operator. In summary, for a given random walk graph, the MLRWKNN algorithm obtains the probability distribution from
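The construction idea of this section, building the walk graph only from a test instance and its KNN training instances rather than the whole training set, can be sketched as follows; the Gaussian edge weighting and the fully connected edge rule are simplifications of the article's edge-set definition.

```python
import numpy as np

def build_knn_walk_graph(X_train, x_test, k=5):
    """Build a small adjacency matrix over x_test and its k nearest
    training instances only, instead of over the whole training set.

    Vertex 0 is the test instance; vertices 1..k are its neighbors.
    Edge weights are Gaussian similarities of Euclidean distances.
    Returns (A, nn): the (k+1, k+1) adjacency matrix and the indices
    of the chosen training instances.
    """
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nn = np.argsort(dists)[:k]                  # indices of the KNN
    V = np.vstack([x_test, X_train[nn]])        # (k+1, d) vertex features
    n = V.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = np.exp(-np.linalg.norm(V[i] - V[j]) ** 2)
            A[i, j] = A[j, i] = w               # undirected, weighted edge
    return A, nn
```

The point of this design is scale: the walk then runs on a (k+1)-vertex graph per test instance instead of one with as many vertices as the whole training set.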
Label sets prediction
The MLRWKNN algorithm predicts the cardinal number
where
where
where
In summary, first, the MLRWKNN algorithm computes
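The threshold-free prediction step can be sketched as follows, under the assumption that the label-set cardinality is estimated as the similarity-weighted average of the neighbors' label counts (rounded); the article's exact estimator is abbreviated above.

```python
import numpy as np

def predict_label_set(label_probs, neighbor_label_counts, weights):
    """Predict a label set without a probability threshold.

    label_probs           : (q,) predicted probability per label
    neighbor_label_counts : (k,) number of labels of each KNN instance
    weights               : (k,) similarity weight of each KNN instance
    The cardinality c is the similarity-weighted average neighbor label
    count (rounded); the top-c labels by probability are returned.
    """
    w = np.asarray(weights, dtype=float)
    c = int(round(np.average(neighbor_label_counts, weights=w)))
    c = max(c, 1)                        # predict at least one label
    return np.argsort(label_probs)[::-1][:c]
```

Estimating the set size from the data replaces the fixed probability cutoff whose choice is subjective in most graph-based MLC algorithms.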
Convergence proof of MLRWKNN algorithm
We note that a simple random walk on a graph is a discrete-time Markov chain over the nodes (i.e. vertices).40
Proof:
Because vector
A walk from any vertex can in principle return to any given vertex, including the vertex itself, in consecutive steps due to the existence of strictly positive vector
Obviously, since the walk is a Markov chain over a finite vertex set, once a vertex has been traversed it will be traversed again within a finite expected number of steps; that is, the chain is positive recurrent.
From the above three points, it can be observed that the random walk in the MLRWKNN algorithm is ergodic, and hence convergent.17 That is, there exists a stationary probability vector to which the walk distribution converges.
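In generic notation (column-stochastic transition matrix $P$, restart probability $\alpha > 0$, restart vector $r$), ergodicity gives a unique fixed point of the walk update, which the iteration approaches from any starting distribution; these symbols are stand-ins for the article's abbreviated ones:

```latex
\pi = (1-\alpha)\,P\,\pi + \alpha\, r
\quad\Longrightarrow\quad
\pi = \alpha\,\bigl(I - (1-\alpha)P\bigr)^{-1} r
```

The inverse exists because the spectral radius of $(1-\alpha)P$ is $1-\alpha < 1$; for the plain walk ($\alpha = 0$) the fixed point is the usual stationary distribution $\pi = P\pi$.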
Experimental results and analysis
In this section, we first introduce the experimental environment and datasets; then present the experimental analysis of the construction of the random walk graph, the similarity measurement method, and the parameter selection of the MLRWKNN algorithm; and finally make a comprehensive experimental comparison with seven state-of-the-art algorithms.
Experiment environment and datasets
Six datasets, which are extensively used by researchers to conduct experiments and evaluate MLC algorithms, were selected: Flags, Genbase, Medical, Scene, Yeast, and Mediamill (for detailed information about these public datasets, refer to the URL: http://mulan.sourceforge.net/datasets-mlc.html). They cover different application domains such as text, images, biological data, and video data, and the number of labels varies from 6 to 101, as described in Table 1. Seven state-of-the-art MLC algorithms were chosen for performance comparison: BRKNN, CLR, HOMER, LP, RAkEL, MLKNN, and Rank-SVM. Among these, binary relevance KNN (BRKNN), CLR, HOMER, LP, and RAkEL belong to the category of PTMs, and MLKNN and Rank-SVM to that of AAMs. For objectivity of comparison, 10-fold cross validation was adopted, and the average values over 10 experimental repetitions were taken as the final values for every evaluation criterion.
Description of the multi-label datasets.
Mediamill contains 43,907 items of video data. We used the first 5000 video items in our experiments for comparison purposes.
Evaluation criteria
So far, a series of criteria has been introduced to evaluate MLC algorithms from different perspectives, and these criteria can be divided into two categories: bipartition-based and ranking-based.14,41 The former concentrate on whether a label is correctly predicted, while the latter on whether a relevant label is ranked before an irrelevant one. In general, no algorithm can produce the best performance on all criteria; appropriate criteria should be chosen in accordance with the optimization objective of a proposed method. As our MLRWKNN algorithm focuses on the ranking problem, four ranking-based criteria were selected to validate our method: Ranking Loss (RL), One Error (OE), Coverage (Cove), and Average Precision (AP). Let
OE calculates the average number of times that the top-ranked label is irrelevant to the test instance
Coverage measures how far, on average, one needs to go down the ranked label list to cover all the relevant labels of a test instance
AP evaluates the degree to which labels ranked above a relevant label are themselves relevant
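For concreteness, the four criteria can be sketched for a single test instance as follows; the 1-based rank convention and the per-instance form (averaging over the test set is omitted) are simplifications of the article's abbreviated formulas.

```python
import numpy as np

def ranking_metrics(scores, relevant):
    """Four ranking-based criteria for one test instance.

    scores   : (q,) predicted score per label (higher = more relevant)
    relevant : set of indices of the true labels (non-empty, proper subset)
    Returns (ranking_loss, one_error, coverage, average_precision).
    """
    q = len(scores)
    order = np.argsort(scores)[::-1]                      # labels by descending score
    rank = {lab: i + 1 for i, lab in enumerate(order)}    # 1-based ranks
    irrelevant = [j for j in range(q) if j not in relevant]

    # Ranking Loss: fraction of (relevant, irrelevant) pairs ordered wrongly.
    pairs = [(r, i) for r in relevant for i in irrelevant]
    rl = sum(scores[r] <= scores[i] for r, i in pairs) / len(pairs)

    # One Error: 1 if the top-ranked label is irrelevant.
    oe = 0.0 if order[0] in relevant else 1.0

    # Coverage: steps down the ranking needed to cover all relevant labels.
    cove = max(rank[r] for r in relevant) - 1

    # Average Precision: precision at the rank of each relevant label.
    ap = np.mean([
        sum(rank[r2] <= rank[r] for r2 in relevant) / rank[r]
        for r in relevant
    ])
    return rl, oe, cove, ap
```

A perfect ranking yields RL = 0, OE = 0, Coverage = |relevant| - 1, and AP = 1, which is why lower is better for the first three criteria and higher is better for AP.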
Results and analysis
Analysis on the proposed similarity measurement
Most graph-based MLC algorithms use similarity measurements designed for continuous features (e.g. Euclidean distance) to compute instance distances, which may be unsuitable for discrete features. Based on the MLRWKNN algorithm, the proposed similarity is compared with the Gaussian kernel of the Euclidean distance16,20,23,24 and the reciprocal of the Euclidean distance.17–19,29 The Flags and Genbase datasets were chosen for the comparison study. As described in Table 1, the Genbase dataset has only discrete features, whereas the Flags dataset has both discrete and continuous features. Table 2 shows the performance of the MLRWKNN algorithm under the different similarity measurements. The experimental results show that the proposed similarity measurement has apparent advantages over the other two measurements in terms of all four criteria.
Performance comparison under different similarity measurements.
Compared with the reciprocal of the Euclidean distance, RL, OE, Coverage, and AP on the Genbase dataset were improved by 98.24%, 97.83%, 66.81%, and 14.99%, respectively; the corresponding improvements on Flags were 14.56%, 3.39%, 5.73%, and 2.86%. Compared with the Gaussian kernel of the Euclidean distance, the corresponding improvements on Genbase were 89.29%, 83.99%, 15.93%, and 1.27%, and those on Flags were 16.68%, 5.5%, 6%, and 3.3%. The results in Table 3 show that the proposed similarity reflects the distances between instances more accurately and improves the classification effect greatly.
Performance improvement based on the proposed similarity.
Analysis of the proposed random walk graph
In order to illustrate the advantages of the MLRWKNN algorithm, we choose the ML-RWR 23 algorithm which improves the MLRW 17 algorithm and adopts the KNN idea. We use the same datasets as ML-RWR, that is, the Medical and Yeast datasets, for the performance comparison, and the experimental results are described in Table 4.
Comparison of different graph construction modes for MLRWKNN and ML-RWR.
MLRWKNN: multi-label classification based on the random walk graph and the K-nearest neighbor algorithm.
As shown in Table 4, MLRWKNN produced almost the same classification performance as the ML-RWR algorithm. However, in contrast to the proposed MLRWKNN algorithm, ML-RWR constructs random walk graphs on the whole training set, which leads to large space and time requirements. Assume the training set of the ML-RWR algorithm is
Analysis of parameters selection
There are three parameters,
Experimental analysis of the selection of the parameter K
Different from adopting fixed constant values, we dynamically determine an optimal
From Figure 4, for the Scene dataset, it can be observed that a gradual increase of the

Figure 4. The relationship between different
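The selection procedure described above can be sketched as a plain grid search; the candidate range and the evaluation callable (e.g. cross-validated Average Precision at each K) are placeholders, not the article's settings.

```python
def select_best_k(candidates, evaluate):
    """Pick the K whose evaluation score is largest.

    candidates : iterable of candidate K values
    evaluate   : callable K -> score, where higher is better
                 (e.g. cross-validated Average Precision)
    Returns the best K and the full score table for inspection.
    """
    scores = {k: evaluate(k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

Returning the whole score table, not just the winner, makes it easy to plot score against K and check whether the optimum is a stable plateau or a narrow spike.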
Experimental analysis of the selection of the value
For convenience, we determine an optimal

Figure 5. The relationship between different
As shown in Figure 5, when
Experimental analysis of the selection of the value
We assign

Figure 6. The relationship between different
From Figure 6, it can be observed that different
Briefly, from the above analysis, we can observe that an optimal
Comparison of experimental results of seven algorithms
We select the seven most-cited MLC algorithms, that is, BRKNN, CLR, HOMER, LP, RAkEL, MLKNN, and Rank-SVM, for a comparison study against the proposed MLRWKNN algorithm. As the MLRW algorithm adopts the continuous feature-based similarity measurement and uses
The experimental datasets are described in Table 1, and the optimal values of the parameter
The optimal
Performance comparisons of eight algorithms on Yeast.
BRKNN: binary relevance KNN; CLR: calibrated label ranking; HOMER: hierarchy of multi-label classifiers; LP: label powerset; MLKNN: multi-label KNN; Rank-SVM: rank support vector machine; RAkEL: RAndom k-labELsets.
Performance comparisons of eight algorithms on Flags.
Performance comparisons of eight algorithms on Genbase.
Performance comparisons of eight algorithms on Scene.
Performance comparisons of eight algorithms on the Mediamill dataset.
Performance comparisons of eight algorithms on the Medical dataset.
In summary, as observed from Tables 6–11, the LP algorithm demonstrates poor performance on all six datasets among the eight algorithms. The other algorithms also show low performance on some datasets. More specifically, the HOMER algorithm does not perform well on the Yeast, Flags, Scene, and Mediamill datasets, the Rank-SVM algorithm obtains the worst results on the Flags dataset, and the MLKNN algorithm fares poorly on the Genbase dataset. In terms of the four assessment criteria, the proposed MLRWKNN algorithm outperforms the other seven algorithms with the best comprehensive performance on the Yeast, Flags, and Genbase datasets, and achieves above-average performance on the Scene, Mediamill, and Medical datasets. This comparison demonstrates that the proposed algorithm achieves a stable and significant classification effect on the six commonly used datasets in contrast to the seven most-cited algorithms.
Conclusion
The MLC problem is an important and widely influential research problem in the field of data mining, impacting a broad range of applications in real-world domains. Graph-based MLC algorithms have received increasing research attention in recent years, and introducing random walk-based methods into solutions to the MLC problem has also become a hot research topic. In this article, we propose a novel approach, the MLRWKNN algorithm, which tackles this problem by integrating random walk methods into a graph-based MLC algorithm. Our major contributions include the following three aspects. First, a new paradigm is proposed for the graph model that constructs the vertex set from the KNN training instances of a given test instance and determines the label correlations of the vertex samples to build the edge set. This new paradigm reduces the time and space overhead compared with other graph-based models. Second, a novel label set prediction method is developed by introducing the improved similarity measurement. This method overcomes the subjectivity of determining thresholds in traditional methods and performs the similarity computation more accurately by differentiating and integrating discrete and continuous features. Third, we consider the influence of parameter selection on classification performance and suggest guidelines and recommended values for the algorithm parameters. A good number of experiments have been conducted and extensive comparisons have been made on the six datasets between the seven baseline algorithms and ours, in order to evaluate the proposed similarity measurement, the new graph construction method, and the MLRWKNN algorithm.
The experimental results demonstrate that the proposed MLRWKNN algorithm produces much better results than the other seven state-of-the-art MLC algorithms.
Future work: further efforts are required to improve and extend the proposed method and algorithm. First, in this work, we considered only a linear integration of the similarity metrics for continuous and discrete features. To explore similarity computation further, it will be interesting and useful to investigate a nonlinear combination for the proposed similarity measurement that takes the continuous and discrete features of a dataset into account. Second, it remains a challenge to design an adaptive adjustment mechanism for determining the algorithm parameters, in order to enhance the accuracy and automation of MLC methods and tools. Third, deep learning approaches43 have recently been widely explored and are considered to have a strong impact on various application domains. Integrating deep learning or rule learning methods into MLC methods44,45 is attracting researchers, and some interesting results have been observed. We believe this is another significant direction for our future work. Fourth, a newly published paper proposed a semi-supervised learning (SSL) method based on random walk,40 which leverages the notion of "landing probabilities" of class-specific random walks and aims at a great improvement in computational complexity and scalability for large graphs. This highlights another way of investigating MLC problems. As part of our next step with MLRWKNN, we plan to contrast this method with our combination of KNN and random walk and to make a comparative study of algorithm performance through experiments on the datasets used in Berberidis et al.40
