Publications by Cirad staff


A French text-message corpus: 88milSMS. Synthesis and usage

In this article, we first briefly summarise the sud4science project and its data collection (http://sud4science.org), the ensuing processing and analysis stages, and the resulting corpus, 88milSMS (http://88milsms.huma-num.fr), through a synthesis of quotes and references to previous articles (§ 1). Second, we review the state of the art of research initiatives that use 88milSMS in various domains and frameworks, which will enable future cross-disciplinary insight (§ 2). Then, we present other usages of the 88milSMS corpus that we identified through surveys (§ 3). Finally, we suggest future paths for textual data collection and analysis.

Topics: Documentation and information; Mathematical and statistical methods

Related documents

Journal article

Cirad staff who authored this publication: