Fusion of BERT embeddings and elongation-driven features
Rafae A., Erritali M., Roche M. 2024. Multimedia Tools and Applications, 83: p. 80773-80797.
Elongated words such as “Wiiiiiin” or “allloooo” are common in oral communication and are often used to emphasize or exaggerate the underlying meaning of the root word. While elongated words are rarely found in formal written language and dictionaries, they are prevalent on social media networks. Taking elongation into account in sentiment analysis can therefore provide valuable insight into user sentiment. In this article, we analyze the impact of elongation on sentiment classification and present an in-depth study of the lexical forms of elongation. We propose a method that improves sentiment classification accuracy by incorporating elongation-based features into BERT (bidirectional encoder representations from transformers) approaches. Experiments conducted on Twitter data show that our model achieves an average accuracy of 87% under 10-fold cross-validation.
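To make the idea of elongation-based features concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an elongation is a run of three or more identical letters, and it uses hypothetical helper names (`is_elongated`, `normalize`, `elongation_features`, `fuse`) and a hypothetical three-value feature set that the abstract does not specify. The BERT sentence embedding is represented by a plain list of floats so that the example stays self-contained.

```python
import re

# Assumption: a run of three or more identical letters marks an elongation.
ELONGATION = re.compile(r"([A-Za-z])\1{2,}")


def is_elongated(token):
    """Return True if the token contains at least one elongated run."""
    return ELONGATION.search(token) is not None


def normalize(token):
    """Collapse every elongated run to a single letter.

    This is a crude heuristic: roots with a legitimate double letter
    are not restored.
    """
    return ELONGATION.sub(r"\1", token)


def elongation_features(text):
    """Hypothetical features: elongated-token count, ratio, longest run length."""
    tokens = text.split()
    elongated = [t for t in tokens if is_elongated(t)]
    longest = max(
        (len(m.group(0)) for t in tokens for m in ELONGATION.finditer(t)),
        default=0,
    )
    ratio = len(elongated) / len(tokens) if tokens else 0.0
    return [float(len(elongated)), ratio, float(longest)]


def fuse(embedding, text):
    """Concatenate a sentence embedding (list of floats) with the features."""
    return list(embedding) + elongation_features(text)
```

In a full pipeline the `embedding` argument would come from a BERT encoder, and the fused vector would be passed to a classifier. Simple concatenation is only one possible fusion strategy and is used here for illustration.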
Keywords: statistical method; mathematical model; communication; social network analysis; social networks; natural language processing
Associated documents
Article (journal with impact factor)
Cirad staff, authors of this publication:
- Roche Mathieu — Es / UMR TETIS