Abstract
Understanding the semantic themes of short texts is challenging because word co-occurrence information is limited. Utilising pre-trained word embeddings or incorporating contextual information from external sources is likely to increase noise and mislead the thematic representation of the short texts, which degrades classification performance. To classify short texts more accurately, we propose a knowledge graph-enhanced topic model called the Graph Convolutional Embedded Topic Model (GCETM), which jointly learns the graph network and the topic model. GCETM employs a Graph Convolutional Network (GCN) to infuse prior human knowledge about the current short texts into the topic embedding space. For model fitting, we propose a data-driven regularisation for amortised variational inference. In addition to GCETM topic inference, we utilise corpus statistics to build a semantically enriched vector representation of each short text for classification. In experiments with a linear Support Vector Machine (SVM) classifier, the proposed approach outperforms several state-of-the-art baselines, achieving 97.15% accuracy on the AgNews data set, 97.75% accuracy on the SearchSnippets data set, 98.73% accuracy on the Movie Review (MR) data set, 98.7% accuracy on the TMNews data set, 96.5% accuracy on the Twitter data set and 98.44% accuracy on the R8 data set.
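As a rough illustration of how a GCN can propagate knowledge-graph structure into an embedding space, the following minimal sketch implements one graph-convolution step using the widely used symmetric normalisation ReLU(D^-1/2 (A + I) D^-1/2 H W). The abstract does not state which GCN variant GCETM uses, so this propagation rule, the function name `gcn_layer`, the toy graph and the random features and weights are all assumptions made purely for illustration.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumption: the standard symmetric-normalised propagation rule;
    the abstract does not specify the variant used in GCETM.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # degrees, >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale rows and columns by D^-1/2 via broadcasting
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)  # ReLU

# Hypothetical toy knowledge graph: 3 nodes in a chain (0 - 1 - 2)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))   # hypothetical 4-dimensional node features
weights = rng.normal(size=(4, 2))    # hypothetical layer weights (learned in practice)

embeddings = gcn_layer(adj, features, weights)  # one 2-dimensional vector per node
```

Each output row mixes a node's own features with those of its neighbours, which is the sense in which graph structure is infused into the embeddings. In a full model the weights would be learned jointly with the topic model rather than sampled at random.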
