Publications by Cirad staff


A French text-message corpus: 88milSMS. Synthesis and usage

Panckhurst R., Lopez C., Roche M. 2020. Corpus, 20: 23 p.

In this article, we first briefly summarise the sud4science project and its data collection (http://sud4science.org), the ensuing processing and analysis stages, and the resulting corpus, 88milSMS (http://88milsms.huma-num.fr), through a synthesis of quotes from and references to previous articles (§ 1). Second, we provide a state of the art of research initiatives that use 88milSMS in various domains and frameworks, which will enable future cross-disciplinary insight (§ 2). We then present other usages of the 88milSMS corpus that we identified through surveys (§ 3). Finally, we suggest future paths for textual data collection and analysis.

Keywords: data mining; data analysis; data processing; data collection; computer applications; text mining

Associated documents

Article (b-peer-reviewed journal)

Cirad staff who authored this publication: