Publications by Cirad staff


Integration of lexical and semantic knowledge for sentiment analysis in SMS

Khiari W., Bouhafs Hafsia A., Roche M. 2016. In: Calzolari Nicoletta (ed.), Choukri Khalid (ed.), Declerck Thierry (ed.), Goggi Sara (ed.), Grobelnik Marko (ed.), Maegaard Bente (ed.), Mariani Joseph (ed.), Mazo Hélène (ed.), Moreno Asuncion (ed.), Odijk Jan (ed.), Piperidis Stelios (ed.). LREC 2016 Proceedings. Portoroz: ELRA, p. 1185-1189. International Conference on Language Resources and Evaluation (LREC 2016). 10, 2016-05-23/2016-05-28, Portoroz (Slovenia).

With the explosive growth of online social media (forums, blogs, and social networks), exploitation of these new information sources has become essential. Our work is based on the sud4science project. The goal of this project is to perform multidisciplinary work on a corpus of authentic SMS, in French, collected in 2011 and anonymised (88milSMS corpus: http://88milsms.huma-num.fr). This paper highlights a new method to integrate opinion detection knowledge from an SMS corpus by combining lexical and semantic information. More precisely, our approach gives more weight to words with a sentiment (i.e. presence of words in a dedicated dictionary) for a classification task based on three classes: positive, negative, and neutral. The experiments were conducted on two corpora: an elongated SMS corpus (i.e. repetitions of characters in messages) and a non-elongated SMS corpus. We noted that non-elongated SMS were much better classified than elongated SMS. Overall, this study highlighted that the integration of semantic knowledge always improves classification.
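The two ideas named in the abstract, detecting elongated messages and giving more weight to words found in a sentiment dictionary, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon entries, the three-repetition threshold, the `boost` factor, and all function names are assumptions made for illustration, since this record does not give those details.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; the dictionary used in the paper is not given in this record.
SENTIMENT_LEXICON = {"super", "génial", "triste", "nul"}

# Assumption: "elongated" means a character repeated three or more times in a row.
ELONGATION = re.compile(r"(.)\1{2,}")


def is_elongated(message: str) -> bool:
    """Return True if the message contains a run of 3+ identical characters."""
    return ELONGATION.search(message) is not None


def normalize(token: str) -> str:
    """Collapse each run of 3+ identical characters to a single character."""
    return ELONGATION.sub(r"\1", token)


def weighted_counts(message: str, boost: float = 2.0) -> dict[str, float]:
    """Count normalized tokens, multiplying counts of lexicon words by `boost`."""
    tokens = [normalize(t) for t in re.findall(r"\w+", message.lower())]
    counts = Counter(tokens)
    return {
        tok: n * (boost if tok in SENTIMENT_LEXICON else 1.0)
        for tok, n in counts.items()
    }
```

The resulting weighted counts could serve as features for any three-class (positive, negative, neutral) classifier. Collapsing runs to a single character is a simplification and can damage words that legitimately contain long repeated letters.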

Related documents

Conference paper

Cirad staff who authored this publication: