Publications by Cirad staff


Partial n-Ary relation instances on food packaging composition and permeability extracted from scientific publication tables

Lentschat M., Buche P., Menut L., Guari R., Roche M. 2022. Data in Brief, 41: 9 p.

DOI: 10.18167/DVN1/GCZBC9

DOI: 10.1016/j.dib.2022.108000

This dataset is dedicated to text mining and is composed of partial n-Ary relation instances concerning food packaging composition and gas permeability. It was created from 31 tables derived from 10 English-language scientific articles in HTML format from several international journals hosted on the ScienceDirect website. This dataset includes two sets of data: manual table annotation results and automatic data extraction results. The tables were first annotated by one annotator and cross-curated by three different annotators. The annotation task aimed to identify all table data dealing with packaging permeability measurements and compositions. An Ontological and Terminological Resource (OTR) was used for the annotation process. The annotation guidelines were drawn up through a collective iterative approach involving the annotators, and they may be accessed alongside the data. This dataset of n-Ary relations can be used in natural language processing (NLP) approaches implemented in experimental fields, especially for n-Ary relation extraction research. It can also be useful for training or evaluation of methods for the extraction of experimental data from tables and text in scientific documents, especially in experimental domains such as food packaging.

Associated documents

Article (b-peer-reviewed journal)

Cirad staff, authors of this publication: