
Abstract

Name ambiguity in bibliographic citation records is a hard problem that affects the quality of services and content in Digital Libraries (DLs) and similar systems. The challenges of dealing with author name ambiguity have led to a myriad of disambiguation methods. This article presents three author name disambiguation methods, namely HHC, SAND, and INDi, and reports experimental results that show their effectiveness when compared to other methods proposed in the literature. HHC is a heuristic-based hierarchical clustering method that successively fuses clusters of citation records with similar author names based on a real-world heuristic applied to their citation attributes (e.g., coauthor names, work title, and publication venue title). SAND is a three-step self-training method for author name disambiguation that requires no manual labelling and no parameterization. It is particularly suitable for situations in which only the most basic citation information is available. INDi is an unsupervised incremental method that aims to identify the correct authors of new citation records as they are inserted into a DL, thus avoiding the need to disambiguate the whole DL again every time it is updated. The three methods share a common characteristic: they do not require any training and use only the most basic information available in a citation record (i.e., author names, work title, and publication venue title), which makes them applicable in a wider range of name disambiguation settings. Other methods generally require either a training phase to learn a disambiguation model or additional information to help the disambiguation process.
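To make the idea of heuristic-based cluster fusion more concrete, the following Python sketch shows one way such a procedure could look. It is not the HHC algorithm as published: the name-compatibility test, the fusion rule (a shared coauthor, or at least two shared title or venue words), the record layout, and all function names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A group of citation records assumed to belong to one author."""
    names: set = field(default_factory=set)      # author name variants
    coauthors: set = field(default_factory=set)  # coauthor names seen so far
    words: set = field(default_factory=set)      # title and venue words
    records: list = field(default_factory=list)  # record identifiers


def names_compatible(a, b):
    """Hypothetical name test: same surname and same first initial."""
    pa, pb = a.lower().split(), b.lower().split()
    return pa[-1] == pb[-1] and pa[0][0] == pb[0][0]


def should_fuse(c1, c2, min_shared_words=2):
    """Illustrative heuristic (an assumption, not the published rule)."""
    if not any(names_compatible(a, b) for a in c1.names for b in c2.names):
        return False
    if c1.coauthors & c2.coauthors:  # at least one coauthor in common
        return True
    return len(c1.words & c2.words) >= min_shared_words


def fuse(c1, c2):
    """Merge two clusters, pooling their evidence."""
    return Cluster(c1.names | c2.names, c1.coauthors | c2.coauthors,
                   c1.words | c2.words, c1.records + c2.records)


def cluster_records(records):
    """Start with one cluster per record and fuse until nothing changes."""
    clusters = [
        Cluster({r["author"]}, set(r["coauthors"]),
                set((r["title"] + " " + r["venue"]).lower().split()),
                [r["id"]])
        for r in records
    ]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_fuse(clusters[i], clusters[j]):
                    clusters[i] = fuse(clusters[i], clusters[j])
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan with the updated cluster list
    return clusters


# Made-up records, for illustration only.
records = [
    {"id": 1, "author": "A. Gupta", "coauthors": ["M. Silva"],
     "title": "Indexing sparse graphs", "venue": "VLDB"},
    {"id": 2, "author": "Anil Gupta", "coauthors": ["M. Silva", "J. Lee"],
     "title": "Query plans for streams", "venue": "SIGMOD"},
    {"id": 3, "author": "A. Gupta", "coauthors": ["J. Lee"],
     "title": "Stream joins", "venue": "ICDE"},
    {"id": 4, "author": "A. Gupta", "coauthors": ["P. Rossi"],
     "title": "Protein folding kinetics", "venue": "Biophysics Letters"},
]
result = [c.records for c in cluster_records(records)]
print(result)
```

In this toy input, records 1 and 3 share no coauthor directly, yet they end up together because record 2 links them: once two clusters are fused, their pooled coauthors become evidence for later fusions. Record 4 has a compatible name but no shared evidence, so it stays in its own cluster.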

Keywords

Author name ambiguity, Bibliographic repository, Bibliographic citation, Automatic name disambiguation
