Could key word masking strategy improve language model?
Borovikova M., Ferré A., Bossy R., Roche M., Nédellec C. 2023. In: Métais Elisabeth (ed.), Meziane Farid (ed.), Sugumaran Vijayan (ed.), Manning Warren (ed.), Reiff-Marganiec Stephan (ed.). Natural language processing and information systems: 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Derby, UK, June 21–23, 2023, Proceedings. Cham: Springer, p. 271-284. (Lecture Notes in Computer Science, 13913). International Conference on Applications of Natural Language to Information Systems (NLDB 2023). 28, 2023-06-21/2023-06-23, Derby (United Kingdom).
DOI: 10.57745/HVPITE
This paper presents an enhanced approach for adapting a Language Model (LM) to a specific domain, with a focus on Named Entity Recognition (NER) and Named Entity Linking (NEL) tasks. Traditional NER/NEL methods require large amounts of labeled data, which is time- and resource-intensive to produce. Unsupervised and semi-supervised approaches overcome this limitation but suffer from lower quality. Our approach, called KeyWord Masking (KWM), fine-tunes a Language Model (LM) for the Masked Language Modeling (MLM) task in a targeted way: instead of masking tokens at random, it masks domain-specific keywords. Our experiments demonstrate that KWM outperforms traditional methods in restoring domain-specific entities. This work is a preliminary step towards developing a more sophisticated NER/NEL system for domain-specific data.
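The contrast between standard MLM masking and keyword-driven masking can be illustrated with a minimal sketch. The keyword lexicon and tokenization below are hypothetical, not taken from the paper; the point is only the selection rule: standard MLM masks tokens uniformly at random, while a KWM-style strategy masks only tokens found in a domain-specific keyword list.

```python
import random

MASK = "[MASK]"

def random_masking(tokens, prob=0.15, rng=None):
    # Standard MLM: each token is masked independently with probability `prob`.
    rng = rng or random.Random(0)
    return [MASK if rng.random() < prob else t for t in tokens]

def keyword_masking(tokens, keywords):
    # KWM-style masking (sketch): mask only tokens that appear in a
    # domain-specific keyword list, leaving the rest of the context intact.
    return [MASK if t.lower() in keywords else t for t in tokens]

sentence = "Fusarium graminearum infects wheat spikes".split()
domain_keywords = {"fusarium", "graminearum", "wheat"}  # hypothetical lexicon
print(keyword_masking(sentence, domain_keywords))
# → ['[MASK]', '[MASK]', 'infects', '[MASK]', 'spikes']
```

Fine-tuning the LM to recover these masked keywords concentrates the training signal on the domain entities the downstream NER/NEL system must handle, rather than spreading it over arbitrary tokens.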
Associated documents
Conference paper
Cirad staff who authored this publication:
- Roche Mathieu — Es / UMR TETIS