Publications by Cirad staff


Feature selection for sentiment classification of COVID-19 tweets: H-TFIDF featuring BERT

Syed M.A., Arsevska E., Roche M., Teisseire M. 2022. In: Bier Nathalie (ed.), Fred Ana L. N. (ed.), Gamboa Hugo (ed.). Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2022, Volume 5: HEALTHINF. Setúbal: Scitepress, p. 648-656. International Joint Conference on Biomedical Engineering Systems and Technologies. 15, 2022-02-09/2022-02-11, s.l.

In the first quarter of 2020, the World Health Organization (WHO) declared COVID-19 a public health emergency around the globe. Different users from all over the world shared their opinions about COVID-19 on social media platforms such as Twitter and Facebook. At the beginning of the pandemic, it became relevant to assess public opinions regarding COVID-19 using data available on social media. We used a recently proposed hierarchy-based measure for tweet analysis (H-TFIDF) for feature extraction over sentiment classification of tweets. We assessed how H-TFIDF and concatenation of H-TFIDF with bidirectional encoder representations from transformers (BH-TFIDF) perform over state-of-the-art bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF) features for sentiment classification of COVID-19 tweets. A uniform experimental setup of the training-test (90% and 10%) split scheme was used to train the classifier. Moreover, evaluation was performed with the gold standard expert labelled dataset to measure precision for each binary classified class.
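The comparison outlined in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: H-TFIDF is not defined in this record, so classic TF-IDF stands in for it, and the dense vectors passed to `concatenate` are hypothetical placeholders for BERT embeddings. All function names are illustrative assumptions.

```python
import math
from collections import Counter


def bow_vectors(docs, vocab):
    """Bag-of-words: raw term counts per document, ordered by vocab."""
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts[term] for term in vocab])
    return vectors


def tfidf_vectors(docs, vocab):
    """Classic TF-IDF (tf * log(N / df)); a stand-in, NOT H-TFIDF."""
    n_docs = len(docs)
    tokenized = [doc.split() for doc in docs]
    doc_freq = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero on empty documents
        vectors.append([
            (counts[t] / total) * math.log(n_docs / doc_freq[t])
            if doc_freq[t] else 0.0
            for t in vocab
        ])
    return vectors


def concatenate(sparse_vecs, dense_vecs):
    """Append a dense vector (e.g. a sentence embedding) to each sparse one."""
    return [list(s) + list(d) for s, d in zip(sparse_vecs, dense_vecs)]


def split_90_10(items):
    """Deterministic 90% / 10% split; shuffle beforehand if order matters."""
    cut = int(len(items) * 0.9)
    return items[:cut], items[cut:]


def precision_per_class(y_true, y_pred, labels=(0, 1)):
    """Precision for each class: correct predictions / all predictions of it."""
    result = {}
    for label in labels:
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        hits = sum(1 for t in truths if t == label)
        result[label] = hits / len(truths) if truths else 0.0
    return result
```

In this sketch, each feature set (BOW, TF-IDF, or TF-IDF concatenated with embeddings) would be fed to the same classifier on the 90% portion, and `precision_per_class` would then be computed separately for the positive and negative classes on held-out labelled data.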

Related documents

Conference paper
