Publications by Cirad staff


PADI-web: An event-based surveillance system for detecting, classifying and processing online news

Valentin S., Arsevska E., Mercier A., Falala S., Rabatel J., Lancelot R., Roche M. 2020. In: Vetulani Zygmunt (ed.), Paroubek Patrick (ed.), Kubis Marek (ed.). Human language technology. Challenges for computer science and linguistics. Cham: Springer, p. 87-101. (Lecture Notes in Computer Science, 12598). Language and Technology Conference. 8, 2017-11-17/2017-11-19, Poznan (Poland).

DOI: 10.1007/978-3-030-66527-2_7

The Platform for Automated Extraction of Animal Disease Information from the Web (PADI-web) is a multilingual text mining tool for the automatic detection, classification, and extraction of disease outbreak information from online news articles. PADI-web currently monitors the Web for nine animal infectious diseases and eight syndromes in five animal hosts. The classification module uses a supervised machine learning approach to filter relevant news, with an overall accuracy of 0.94. Classifying the relevant news into five topic categories (confirmed, suspected, or unknown outbreak; preparedness; and impact) achieved an overall accuracy of 0.75. In the first six months of its implementation (January–June 2016), PADI-web detected 73% of the outbreaks of African swine fever, 20% of foot-and-mouth disease, 13% of bluetongue, and 62% of highly pathogenic avian influenza. The information extraction module of PADI-web obtained F-scores of 0.80 for locations, 0.85 for dates, 0.95 for diseases, 0.95 for hosts, and 0.85 for case numbers. PADI-web provides complementary disease surveillance in the domain of animal health.
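The abstract reports accuracy for the classification module and F-scores for the extraction module. As a reading aid, the minimal sketch below shows the standard definitions of these two metrics. It assumes the balanced F1 variant of the F-score, which the abstract does not specify. The function names, labels, and counts are hypothetical illustrations, not PADI-web code or PADI-web data.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of items whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def f1_score(true_positives, false_positives, false_negatives):
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Written in its count form, 2*TP / (2*TP + FP + FN), which is
    algebraically equal to 2*P*R / (P + R).
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0  # nothing to find and nothing predicted
    return 2 * true_positives / denominator


# Hypothetical toy data: four news articles labelled relevant or not.
toy_true = ["relevant", "irrelevant", "relevant", "relevant"]
toy_pred = ["relevant", "irrelevant", "irrelevant", "relevant"]
print(accuracy(toy_true, toy_pred))  # 0.75

# Hypothetical counts: 8 entities found correctly, 2 spurious, 2 missed.
print(f1_score(8, 2, 2))  # 0.8
```

Accuracy suits the classification task, where every article receives exactly one label. The F-score suits extraction, where the system can both miss entities and report spurious ones.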

Associated documents

Conference paper

Cirad staff who authored this publication: