Publications by Cirad staff


Gemedoc: A text similarity annotation platform

Fize J., Roche M., Teisseire M. 2018. In: Silberztein Max (ed.), Atigui Faten (ed.), Kornyshova Elena (ed.), Métais Elisabeth (ed.), Meziane Farid (ed.). Natural language processing and information systems: 23rd International Conference on Applications of Natural Language to Information Systems, NLDB 2018, Paris, France, June 13-15, 2018, Proceedings. Cham: Springer, p. 333-336. (Lecture Notes in Computer Science, 10859). International Conference on Natural Language to Information Systems. 23, 2018-06-13/2018-06-15, Paris (France).

DOI: 10.1007/978-3-319-91947-8_35

We present Gemedoc, a platform for text similarity annotation based on the spatial and thematic dimensions. To this end, a two-step annotation protocol was designed to assess the similarity between two documents: (1) identification of salient features according to the two analysis dimensions; (2) similarity assessment on a 4-degree scale. Ultimately, the labeled data retrieved from different corpora could be used as a benchmark for text-mining applications.
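
The two-step protocol described in the abstract could be modelled roughly as follows. This is only an illustrative sketch and not Gemedoc's actual data model: the class name `PairAnnotation`, its methods, the integer encoding of the 4-degree scale (0 to 3), and the choice of recording one rating per dimension are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed encoding of the 4-degree scale: 0 (lowest) to 3 (highest).
SCALE = range(4)
# The two analysis dimensions named in the abstract.
DIMENSIONS = ("spatial", "thematic")


@dataclass
class PairAnnotation:
    """Hypothetical record for one annotated document pair."""
    doc_a: str
    doc_b: str
    # Step 1: salient features identified for each dimension.
    features: dict = field(
        default_factory=lambda: {d: [] for d in DIMENSIONS}
    )
    # Step 2: similarity degree per dimension (an assumption; the
    # abstract does not say whether ratings are per dimension).
    scores: dict = field(default_factory=dict)

    def add_feature(self, dimension: str, feature: str) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        self.features[dimension].append(feature)

    def rate(self, dimension: str, degree: int) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if degree not in SCALE:
            raise ValueError("degree must be one of 0, 1, 2, 3")
        self.scores[dimension] = degree

    def is_complete(self) -> bool:
        # Complete once every dimension has received a rating.
        return all(d in self.scores for d in DIMENSIONS)


# Usage: annotate one pair of documents (identifiers are made up).
ann = PairAnnotation(doc_a="doc-001", doc_b="doc-002")
ann.add_feature("spatial", "Paris")
ann.add_feature("thematic", "flooding")
ann.rate("spatial", 3)
ann.rate("thematic", 1)
```

Keeping the features and the ratings in the same record preserves the link between what an annotator marked as salient and the similarity degree they assigned, which may be useful if the labeled pairs are later used as a benchmark.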

