Publications by Cirad staff

Accelerating the automated detection, counting and measurements of reproductive organs in herbarium collections in the era of deep learning

Mora-Fallas A., Goeau H., Mazer S.J., Love N., Mata-Montero E., Bonnet P., Joly A. 2019. Biodiversity Information Science and Standards, 3: 3 p. Biodiversity Next: Building a global infrastructure for biodiversity data, 2019-10-22/2019-10-25, Leiden (Netherlands).

DOI: 10.3897/biss.3.37341

Millions of herbarium records provide an invaluable legacy and knowledge of the spatial and temporal distributions of plants over centuries across all continents (Soltis et al. 2018). Thanks to recent efforts to digitize most major natural history collections and make them publicly accessible, investigations of ecological and evolutionary patterns at unprecedented geographic scales are now possible (Carranza-Rojas et al. 2017, Lorieul et al. 2019). Nevertheless, biologists now face the problem of extracting basic information from a huge number of herbarium sheets, such as textual descriptions, the numbers of organs, and measurements of various morphological traits. Deep learning technologies can dramatically accelerate the extraction of such basic information by automating the routines of organ identification, counting, and measurement, thereby allowing biologists to spend more time on investigations such as phenological or geographic distribution studies. Recent progress on instance segmentation demonstrated by the Mask-RCNN method is very promising in the context of herbarium sheets, in particular for detecting with high precision the different organs of interest on each specimen, including leaves, flowers, and fruits. However, like any deep learning approach, this method requires a significant number of labeled examples with fairly detailed outlines of individual organs. Creating such a training dataset can be very time-consuming and may discourage researchers. In this work, we propose to integrate the Mask-RCNN approach within a global system enabling an active learning mechanism (Sener and Savarese 2018) in order to minimize the number of organ outlines that researchers must annotate manually. The principle is to alternate cycles of manual annotation, training updates of the deep learning model, and predictions on the entire collection to be processed. Then, the challenge of the active learning mechanism is to estimate automatically at each cycle which are the most
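
The alternation described in the abstract (annotate a batch, retrain, predict over the whole collection, then choose the next batch) can be sketched roughly as follows. This is a minimal Python sketch, not the authors' system: the function names `select_for_annotation` and `active_learning_loop`, the callables `annotate`, `train`, and `informativeness`, and the per-cycle `budget` are all hypothetical. The abstract's description of the selection criterion is cut off in this record, so ranking sheets by a generic informativeness score is an assumption; in practice `train` would presumably wrap a Mask-RCNN training run and `annotate` a manual outlining step.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple


def select_for_annotation(
    scores: Dict[str, float], labeled: Iterable[str], budget: int
) -> List[str]:
    """Return up to `budget` unlabeled sheet ids, highest score first.

    Ties are broken by sheet id so the selection is deterministic.
    """
    already = set(labeled)
    candidates = [sheet for sheet in scores if sheet not in already]
    candidates.sort(key=lambda sheet: (-scores[sheet], sheet))
    return candidates[:budget]


def active_learning_loop(
    sheets: Sequence[str],
    annotate: Callable[[str], Any],                 # hypothetical: manual outlining of one sheet
    train: Callable[[Dict[str, Any]], Any],         # hypothetical: (re)train on all annotations so far
    informativeness: Callable[[Any, str], float],   # hypothetical: score one sheet with the current model
    budget: int,
    cycles: int,
) -> Tuple[Any, Dict[str, Any]]:
    """Alternate annotation, training, and collection-wide scoring."""
    labeled: Dict[str, Any] = {}
    # Before any model exists, every sheet is treated as equally informative (assumption).
    scores = {sheet: 0.0 for sheet in sheets}
    model = None
    for _ in range(cycles):
        batch = select_for_annotation(scores, labeled, budget)
        if not batch:
            break  # nothing left to annotate
        for sheet in batch:
            labeled[sheet] = annotate(sheet)
        model = train(labeled)
        # Predict on the entire collection to rank candidates for the next cycle.
        scores = {sheet: informativeness(model, sheet) for sheet in sheets}
    return model, labeled
```

The scoring function is passed in as a parameter so that different selection strategies can be swapped in without changing the loop itself.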

Keywords: herbarium; plant morphology; plant anatomy; botanical collection; identification; machine learning; data processing; plant reproductive organ; deep learning; Streptanthus tortuosus

Related documents

Article (b-peer-reviewed journal)

Cirad staff who authored this publication: