Publications by Cirad staff


GeoDict: an integrated gazetteer

Fize J., Shrivastava G. 2017. In: Francesca Frontini (ed.), Larisa Grcic Simeunovic (ed.), Špela Vintar (ed.), Fahad Khan (ed.), Artemis Parvisi (ed.). Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017). Montpellier: Association for Computational Linguistics, p. 31-41. Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017), Montpellier (France).

DOI: 10.18167/DVN1/MWQQOQ

Spatial analysis of text is now widely considered important by both researchers and users. In certain fields, such as epidemiology, extracting spatial information from text is crucial, and both resources and methods are needed. Most spatial analysis processes rely on a gazetteer: a data source in which toponyms (place names) are associated with concepts and their geographic footprints. Unfortunately, most publicly available gazetteers are incomplete because of the purpose for which they were originally built. Hence, we propose GeoDict, an integrated gazetteer that contains basic yet precise information (multilingual labels, administrative boundary polygons, etc.) and can be customized. We show its utility for geoparsing (the extraction of spatial entities from text). An early evaluation on toponym resolution shows promising results.
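
To make the definition of a gazetteer concrete, the following minimal Python sketch shows one way an entry with multilingual labels, a concept, and a geographic footprint could be represented and looked up by toponym. It is illustrative only: the `GazetteerEntry` class, its field names, the `build_index` and `lookup` helpers, and the sample records are all assumptions made for this example and do not reflect GeoDict's actual schema or API. The footprint is simplified to a single approximate point, whereas the abstract mentions boundary polygons.

```python
from dataclasses import dataclass


@dataclass
class GazetteerEntry:
    """One place record. Hypothetical schema, not GeoDict's actual format."""
    entry_id: str
    labels: dict[str, str]          # language code -> toponym
    concept: str                    # e.g. "city" or "country"
    footprint: tuple[float, float]  # approximate (latitude, longitude)


def build_index(entries):
    """Map each lower-cased label to the entries that carry it."""
    index = {}
    for entry in entries:
        # A set avoids indexing an entry twice when two languages share a label.
        for label in {name.lower() for name in entry.labels.values()}:
            index.setdefault(label, []).append(entry)
    return index


def lookup(index, toponym):
    """Return all candidate entries for a toponym (case-insensitive)."""
    return index.get(toponym.lower(), [])


# Sample records with approximate coordinates, for illustration only.
entries = [
    GazetteerEntry("e1", {"en": "Montpellier", "fr": "Montpellier"},
                   "city", (43.61, 3.88)),
    GazetteerEntry("e2", {"en": "France", "fr": "France", "de": "Frankreich"},
                   "country", (46.0, 2.0)),
]
index = build_index(entries)
```

In this sketch, `lookup(index, "Frankreich")` returns the entry for France through its German label. A lookup may return several candidates when different places share a name; choosing among them is the toponym resolution step mentioned in the abstract, which this sketch does not attempt.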

