New approach to discover meaningful terms to specify cause of death from narratives verbal autopsy using TF-IDF and the LDA topic model
Diouf M., Thiam M., Roche M. 2023. In: IEEE EUROCON 2023 - 20th International Conference on Smart Technologies. New York: IEEE. International Conference on Smart Technologies (IEEE EUROCON 2023). 20, 2023-07-06/2023-07-08, Torino (Italy).
Due to a lack of coroners in some remote areas of the world, epidemiological researchers have created a database for collecting causes of death, known as verbal autopsy. The unstructured verbal autopsy (VA) narratives collected in this database are full of hidden knowledge about mortality. However, they are under-exploited, either because processing mechanisms are inadequate or because some of the computational techniques used are ill-suited to the data format. In this paper, we propose an unsupervised approach based essentially on a new algorithm for preprocessing such data. The aim is not only to address the challenge of topic extraction with the Latent Dirichlet Allocation (LDA) topic model in a context of data scarcity, but also to improve the exploitation of the resulting topics (causes of death). Experiments on the Population Health Metrics Research Consortium (PHMRC) data demonstrate the validity of the approach and have led to the identification of reliable causes of death as well as the discovery of new ones.
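As a rough illustration of the TF-IDF side of such a pipeline, the sketch below ranks the terms of each narrative by a plain TF-IDF score. It is not the preprocessing algorithm proposed in the paper, which the abstract does not detail. The function name `tfidf_top_terms`, the scoring variant (relative term frequency times the natural log of inverse document frequency), and the toy narratives are all assumptions made for illustration. In a full pipeline, the retained terms would presumably then be passed to an LDA topic model.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each tokenized document.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    top = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times log inverse document frequency.
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top.append(ranked[:k])
    return top


# Invented toy narratives, NOT taken from the PHMRC data.
narratives = [
    "the patient had fever and cough".split(),
    "the patient had chest pain and the pain was severe".split(),
    "the baby had fever and diarrhea".split(),
]

if __name__ == "__main__":
    for terms in tfidf_top_terms(narratives, k=2):
        print(terms)
```

Terms that occur in every narrative (such as "the" or "had" here) receive a score of zero and fall to the bottom of the ranking, which is why TF-IDF is a common way to discard uninformative vocabulary before topic modelling on small corpora.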
Related documents
Conference paper
Cirad staff who authored this publication:
- Roche Mathieu — Es / UMR TETIS