Publications by Cirad staff

United we stand: using multiple strategies for topic labeling

Gourru A., Velcin J., Roche M., Gravier C., Poncelet P. 2018. In: Silberztein Max (ed.), Atigui Faten (ed.), Kornyshova Elena (ed.), Métais Elisabeth (ed.), Meziane Farid (ed.). Natural language processing and information systems: 23rd International Conference on Applications of Natural Language to Information Systems, NLDB 2018, Paris, France, June 13-15, 2018, Proceedings. Cham: Springer, p. 352-363. (Lecture Notes in Computer Science, 10859). International Conference on Natural Language to Information Systems. 23, 2018-06-13/2018-06-15, Paris (France).

DOI: 10.1007/978-3-319-91947-8_37

Topic labeling aims at providing a sound, possibly multi-word, label that depicts a topic drawn from a topic model. This is of the utmost practical interest for quickly grasping a topic's informational content: the usual ranked list of words that maximizes a topic presents limitations for this task. In this paper, we introduce three new unsupervised n-gram topic labelers that achieve results comparable to those of existing unsupervised topic labelers while following different assumptions. We demonstrate that combining topic labelers, even only two, makes it possible to target a 64% improvement with respect to single topic labeler approaches, and therefore opens research in that direction. Finally, we introduce a fourth topic labeler that extracts representative sentences, using Dirichlet smoothing to add contextual information. This sentence-based labeler provides strong surrogate candidates when n-gram topic labelers fall short of providing relevant labels, leading to up to 94% topic coverage.
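
The abstract names Dirichlet smoothing without giving the formulation used in the paper. As a rough illustration only, the sketch below shows the standard Dirichlet-prior smoothing of a word distribution, where a sentence's word counts are interpolated with background collection frequencies. The function name, the toy corpus, and the value of `mu` are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def dirichlet_smoothed_prob(word, sentence_tokens, background_counts,
                            background_total, mu=10.0):
    """Estimate P(word | sentence) with Dirichlet-prior smoothing.

    Hypothetical helper: the observed count in the sentence is blended with
    the word's background frequency, weighted by the pseudo-count mu.
    """
    count_in_sentence = sentence_tokens.count(word)
    # Counter returns 0 for words absent from the background collection.
    p_background = background_counts[word] / background_total
    return (count_in_sentence + mu * p_background) / (len(sentence_tokens) + mu)


# Toy background collection (invented data, for illustration only).
corpus = [
    ["topic", "models", "find", "latent", "topics"],
    ["labels", "summarize", "a", "topic"],
    ["sentences", "add", "context", "to", "labels"],
]
background_counts = Counter(token for sent in corpus for token in sent)
background_total = sum(background_counts.values())

sentence = corpus[1]
# A word present in the sentence keeps a higher probability...
p_seen = dirichlet_smoothed_prob("topic", sentence,
                                 background_counts, background_total)
# ...while a word absent from it still receives non-zero mass.
p_unseen = dirichlet_smoothed_prob("context", sentence,
                                   background_counts, background_total)
```

In this sketch, a larger `mu` pulls every sentence closer to the background distribution, while a smaller `mu` trusts the sentence's own counts more. How the paper sets this trade-off, and how the smoothed estimates feed into sentence ranking, is not stated in the abstract.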

Related documents

Conference paper

Cirad staff among the authors of this publication: