
Abstract

Name disambiguation addresses the ambiguity that arises when documents by several different people carry the same author name. The task is to partition all publications that share a name so that each partition contains the publications of exactly one person. Many existing approaches to name disambiguation share a common feature: a clustering method is applied in the final step. This paper presents a complementary study from a different point of view. Based on the idea that documents with strong association relationships are likely to belong to the same author, it proposes discovering meta clusters by graph partitioning with a heuristic rule in order to improve these clustering-based approaches. Specifically, unlike those approaches, this work uses a clustering ensemble method instead of a single clustering method in the final step. Experimental results on a real-life dataset show that the improved method performs satisfactorily compared with the clustering-based baseline method.

Keywords

Name disambiguation, meta clusters, clustering ensemble, graph partition
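
The two ideas named in the abstract, meta clusters obtained by graph partitioning and a clustering ensemble in the final step, can be illustrated with a minimal Python sketch. The abstract does not state the heuristic rule, the association measure, or the ensemble scheme, so everything specific here is an assumption made for illustration: documents are linked when they share at least `min_shared` coauthors (a hypothetical rule), meta clusters are the connected components of that graph, and the ensemble is a simple co-association majority vote over several base partitions. The function names `components`, `meta_clusters`, and `consensus` are likewise hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def components(nodes, linked):
    """Group nodes into connected components, where linked(a, b) decides
    whether two nodes are joined by an edge (union-find over all pairs)."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(nodes, 2):
        if linked(a, b):
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def meta_clusters(docs, min_shared=2):
    """Assumed heuristic: two documents are strongly associated when they
    share at least `min_shared` coauthors. docs maps doc id -> coauthor set."""
    return components(docs, lambda a, b: len(docs[a] & docs[b]) >= min_shared)


def consensus(partitions, nodes, min_agree=0.5):
    """Assumed ensemble: join two documents when more than `min_agree` of the
    base partitions (each a dict doc id -> label) put them in one cluster."""
    def agree(a, b):
        votes = sum(p[a] == p[b] for p in partitions)
        return votes / len(partitions) > min_agree

    return components(nodes, agree)


if __name__ == "__main__":
    docs = {
        "d1": {"A. Li", "B. Chen"},
        "d2": {"A. Li", "B. Chen", "C. Wu"},
        "d3": {"D. Zhao"},
        "d4": {"D. Zhao", "E. Sun"},
    }
    print(meta_clusters(docs))

    base = [
        {"d1": 0, "d2": 0, "d3": 1, "d4": 1},
        {"d1": 0, "d2": 0, "d3": 1, "d4": 2},
        {"d1": 0, "d2": 1, "d3": 2, "d4": 2},
    ]
    print(consensus(base, docs))
```

In this toy data, a strict threshold keeps only the pair of documents with two shared coauthors together as a meta cluster, while the majority vote over three base partitions also merges the pair that two of the three partitions agree on. How the paper actually combines meta clusters with the ensemble step is not described in the abstract, so the two functions are shown independently.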

