Publications by Cirad staff

Data quality assessment approaches for event-based surveillance systems

Syed M.A., Arsevska E., Roche M., Teisseire M. 2025. Data Intelligence: 28 p.

DOI: 10.3724/2096-7004.di.2025.0063

Online news sources are popular resources for learning about current health situations and for developing event-based surveillance (EBS) systems. However, diverse information originating from multiple sources can misinform stakeholders, eventually leading to false health risks. The existing literature contains several techniques for evaluating data quality to minimize the effects of misleading information. We proposed three approaches to assess the quality of news sources, focusing on data quality assessment at two levels: 1) the news article level and 2) the news source level. At the news article level, we explored two main approaches: 1) a data-driven score-based approach and 2) a metadata-based machine learning (ML) approach. The data-driven score-based approach classifies news articles as relevant or irrelevant and adds an explainability aspect in the context of EBS. The metadata-based approach also performs classification, using news article metadata as features in ML models and highlighting which metadata features are important. For source-level quality assessment, we identified exogenous metadata attributes associated with news sources, such as source categorization and geographical coverage, and extracted this information automatically. Using the extracted source metadata, we then classified the news sources. The results are useful for prioritizing news sources in the context of EBS. Nevertheless, further investigation is required to improve the methodology of this approach.
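
To make the idea of a score-based relevance classifier with an explainability aspect more concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not specify how the score is computed, so the term weights, the threshold, and the names `TERM_WEIGHTS`, `THRESHOLD`, `ScoredArticle`, and `score_article` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical term weights: the abstract does not list the actual
# scoring features, so these values are illustrative assumptions.
TERM_WEIGHTS = {"outbreak": 2.0, "cases": 1.0, "virus": 1.5, "infected": 1.5}

# Assumed cut-off above which an article is treated as relevant.
THRESHOLD = 2.5


@dataclass
class ScoredArticle:
    score: float
    relevant: bool
    evidence: dict  # term -> total contribution to the score


def score_article(text, weights=TERM_WEIGHTS, threshold=THRESHOLD):
    """Score an article by summing the weights of the known terms it contains.

    The per-term contributions are returned alongside the score, so the
    relevant/irrelevant decision can be traced back to the terms behind it.
    """
    # Naive tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    evidence = {}
    for tok in tokens:
        if tok in weights:
            evidence[tok] = evidence.get(tok, 0.0) + weights[tok]
    score = sum(evidence.values(), 0.0)
    return ScoredArticle(score=score, relevant=score >= threshold, evidence=evidence)
```

Under these assumptions, calling `score_article("Avian influenza outbreak: 12 infected farms reported.")` marks the article as relevant, and its `evidence` field shows that "outbreak" and "infected" drove the decision. A real system would replace the hand-picked weights with scores derived from data.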

Keywords: text mining; metadata; machine learning; data analysis; e-learning

Related documents

Article (a-journal with impact factor)

Cirad staff who authored this publication: