Publications by Cirad staff

A French text-message corpus: 88milSMS. Synthesis and usage

In this article, we first briefly summarise the sud4science project and its data collection (http://sud4science.org), the ensuing processing and analysis stages, and the resulting corpus, 88milSMS (http://88milsms.huma-num.fr), through a synthesis of quotes from and references to previous articles (§ 1). Second, we review the state of the art of research initiatives that use 88milSMS in various domains and frameworks, which should enable future cross-disciplinary insights (§ 2). We then present other uses of the 88milSMS corpus that we identified through surveys (§ 3). Finally, we suggest future paths for textual data collection and analysis.

Keywords: data collection; data processing; data analysis; data mining; text mining

Subject areas: Documentation and information; Mathematical and statistical methods

Related documents

Journal article

Cirad staff who authored this publication: