Publications by Cirad staff


Food packaging permeability and composition dataset dedicated to text-mining

Lentschat M., Buche P., Dibie-Barthélemy J., Menut L., Roche M. 2021. Data in Brief, 36: 6 p.

DOI: 10.18167/DVN1/U7HK8J

DOI: 10.1016/j.dib.2021.107135

This dataset comprises symbolic and quantitative entities describing food packaging composition and gas permeability. It was built from 50 English-language scientific articles, saved in HTML format, from several international journals on the ScienceDirect website. Three experts annotated the files independently on a WebAnno server. The aim of the annotation task was to recognize all entities related to packaging permeability measurements and packaging composition. The task was driven by an Ontological and Terminological Resource (OTR), and an annotation guideline was designed through a collective, iterative process involving the annotators. The dataset can be used to train or evaluate natural language processing (NLP) approaches in experimental fields, such as specialized entity recognition (e.g. terms and their variants, units of measure, complex numerical values) or sentence-level binary relation extraction (e.g. value to unit, term to acronym).
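To make the "value to unit" relation concrete, here is a minimal sketch of a rule-based baseline that pairs numerical values with the unit that follows them in a sentence. It is not the authors' method: the unit list, the function name `extract_value_unit_pairs`, and the example sentence are all invented for illustration and are not taken from the dataset or its OTR.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny unit list (assumption for this sketch);
# the real vocabulary would come from the dataset's OTR.
UNITS = ["cm3 m-2 day-1", "mm", "%"]

# A plain decimal number with an optional sign.
NUMBER = r"[-+]?\d+(?:\.\d+)?"

# Try longer units first so a multi-token unit is not cut short.
UNIT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A value immediately followed (modulo whitespace) by a known unit.
PATTERN = re.compile(rf"(?P<value>{NUMBER})\s*(?P<unit>{UNIT})")


def extract_value_unit_pairs(sentence: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs found in one sentence."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(sentence)
    ]


# Made-up sentence, not drawn from the annotated articles.
sentence = (
    "The 0.05 mm film showed an oxygen permeability of "
    "4.2 cm3 m-2 day-1 at 50 % relative humidity."
)
pairs = extract_value_unit_pairs(sentence)
```

A baseline like this could be scored against the expert annotations. Handling the "complex numerical values" the abstract mentions (ranges, exponents, uncertainties) would need a richer number pattern or a learned model.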

Keywords: text mining; data mining; ontology; data analysis; food packaging; permeability; natural language processing

Related documents

Article (b-peer-reviewed journal)

Cirad staff, authors of this publication: