Pl@ntBERT: leveraging large language models to enhance vegetation classification through species composition analysis
Leblanc C., Bonnet P., Servajean M., Joly A. 2024. In: ECCB 2024 - 7th European Congress of Conservation Biology: Biodiversity positive by 2030. Book of abstracts. Bologna: Università di Bologna, p. 121. European Congress of Conservation Biology (ECCB 2024). 7, 2024-06-17/2024-06-21, Bologna (Italy).
Biodiversity is under pressure, as many disturbance events threaten natural areas. Habitat distribution mapping is therefore increasingly relevant for monitoring the status of these areas. It aims to quantify the mathematical relationships between predictors and occurrences of categorized locations, so advanced numerical technologies are needed more than ever to help summarize our knowledge of species assemblages. Here we present Pl@ntBERT, a framework that encodes vegetation patterns and improves their classification. The tool leverages transformer-based techniques from computer science and language processing. In particular, the pipeline implements two artificial intelligence tasks: fill-mask and text classification. First, masked language modeling builds a statistical understanding of vascular plant compositions. Subsequent training then assigns a label to sentences describing phytosociological relevés. Fine-tuning a pretrained foundation model on in-domain words yields a significant improvement and clearly outperforms previous state-of-the-art methods: the software raises the accuracy score on a database containing millions of European surveys to 92.48%. Finally, our results show that flora is a strong marker of ecosystems and does not need to be coupled with environmental data to train neural networks. The proposed application has a vocabulary covering over ten thousand organisms. This approach offers a methodology for advancing our understanding of community ecology and conservation biology.
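The two tasks named in the abstract, fill-mask and text classification, both start from a relevé written as a sentence of species names. The sketch below is only an illustration of that idea in plain Python, not the authors' implementation. The abstract does not say how sentences are built, so ordering species by decreasing cover, the comma separator, the `[MASK]` token, the cover values, and the habitat label are all assumptions made for the example.

```python
import random

MASK_TOKEN = "[MASK]"  # assumption: a BERT-style mask token


def releve_to_sentence(species_cover):
    """Turn a relevé (species -> cover %) into a comma-separated 'sentence'.

    Assumption: species are ordered by decreasing cover, ties broken by name.
    """
    ordered = sorted(species_cover.items(), key=lambda kv: (-kv[1], kv[0]))
    return ", ".join(name for name, _ in ordered)


def mask_one_species(sentence, rng):
    """Hide one randomly chosen species; return (masked_sentence, hidden_species)."""
    species = sentence.split(", ")
    idx = rng.randrange(len(species))
    target = species[idx]
    species[idx] = MASK_TOKEN
    return ", ".join(species), target


# Illustrative relevé; the cover values are invented for the example.
releve = {"Fagus sylvatica": 70.0, "Hedera helix": 15.0, "Anemone nemorosa": 5.0}
sentence = releve_to_sentence(releve)

# Fill-mask example: the model would be trained to recover `target` from `masked`.
masked, target = mask_one_species(sentence, random.Random(0))

# Text-classification example: the full sentence paired with a habitat label.
# The label string is a placeholder, not a real habitat code.
classification_example = {"text": sentence, "label": "hypothetical_habitat_code"}

print(sentence)
print(masked, "->", target)
```

In a pipeline of this kind, pairs such as `(masked, target)` would feed the masked language modeling stage and records such as `classification_example` would feed the classification stage. No environmental predictors appear in either input, in line with the abstract's claim that species composition alone is enough.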
Related documents
Conference paper
Cirad staff who authored this publication:
- Bonnet Pierre — Bios / UMR AMAP
- Leblanc César — Bios / UMR AMAP