Sage Journals: Discover world-class research

Abstract

Similarity estimation between interconnected objects arises in many real-world applications, and many domain-specific measures have been proposed. This work proposes a new perspective on specifying the similarity between resources in linked data and, more generally, between the vertices of a directed, attributed graph. More precisely, it combines the structural properties of the graph with the attribute-value pairs of its vertices. We compute the similarity between every pair of nodes using an extension of the Jaccard measure, which has the desirable property of increasing as the number of matching attribute-value pairs of the two resources increases. In the next step, highly similar vertices are treated as a single node, yielding a graph called a CGraph. Each node of a CGraph represents a group of resources found to be highly similar in the first step, and links between resources are generalized to links between these clusters. We propose an extension of a structural algorithm (CRank) to merge highly similar nodes in this next step. The proposed model is evaluated in a clustering procedure on our standard dataset, in which the class label of each resource is estimated and compared with its ground-truth class label. Experimental results show that our model outperforms other clustering algorithms in terms of precision and recall.
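The abstract does not spell out the extended Jaccard measure or CRank, so the following is only a minimal sketch of the general idea under stated assumptions. Each resource is represented as a set of (attribute, value) pairs, and plain Jaccard similarity |A ∩ B| / |A ∪ B| stands in for the paper's extension. Pairs scoring at or above a hypothetical `threshold` of 0.5 are collapsed into one node, and links between resources are lifted to links between the resulting clusters. The function names `jaccard` and `build_cgraph`, the `ex:` example resources, and the threshold are illustrative and do not come from the paper.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Plain Jaccard index |A & B| / |A | B| over (attribute, value) pairs.

    Assumption: stands in for the paper's extended Jaccard measure.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_cgraph(attrs: dict, edges: list, threshold: float = 0.5):
    """Merge resources whose attribute similarity >= threshold (hypothetical
    parameter) and lift resource-level links to cluster-level links."""
    parent = {v: v for v in attrs}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Compare every pair of resources and union the highly similar ones.
    for u, v in combinations(sorted(attrs), 2):
        if jaccard(attrs[u], attrs[v]) >= threshold:
            parent[find(u)] = find(v)

    # Group resources by root, then label each cluster by its smallest member.
    clusters = {}
    for v in attrs:
        clusters.setdefault(find(v), set()).add(v)
    label = {v: min(members) for members in clusters.values() for v in members}

    # Generalize (subject, predicate, object) links to links between clusters;
    # links inside a single cluster are dropped, and duplicates collapse in the set.
    cluster_edges = {
        (label[s], p, label[o]) for s, p, o in edges if label[s] != label[o]
    }
    return label, cluster_edges


# Hypothetical toy data: three resources and three links.
attrs = {
    "ex:alice": {("rdf:type", "foaf:Person"), ("foaf:based_near", "Paris")},
    "ex:bob": {("rdf:type", "foaf:Person"), ("foaf:based_near", "Paris"),
               ("foaf:age", "30")},
    "ex:acme": {("rdf:type", "foaf:Organization")},
}
edges = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
]

label, cluster_edges = build_cgraph(attrs, edges)
print(label)
print(cluster_edges)
```

Here `ex:alice` and `ex:bob` share two of three distinct pairs (similarity 2/3), so they merge into one node, and both `ex:worksFor` links collapse into a single cluster-level link to `ex:acme`. Because the merge uses union-find, similarity is applied transitively: if A matches B and B matches C, all three end up in one cluster even when A and C are dissimilar. The abstract does not say whether the authors use this or a stricter grouping rule.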

Keywords


Structure/attribute computation of similarities between nodes of a RDF graph with application to linked data clustering
