Averaging and stacking partial least squares regression models to predict the chemical compositions and the nutritive values of forages from spectral near infrared data
Lesnoff M., Andueza D., Barotin C., Barré P., Bonnal L., Fernandez Pierna J.A., Picard F., Vermeulen P., Roger J.M. 2022. Applied Sciences, 12 (15): 15 p.
DOI: 10.3390/app12157850
Partial least squares regression (PLSR) is a reference statistical model in chemometrics. In agronomy, it is used to predict components of the chemical composition of plant materials (response variables y) from near infrared (NIR) spectral data X collected with spectrometers. PLSR reduces the dimension of the spectral data X by computing score vectors that are then used as latent variables (LVs) in a multiple linear model. One difficulty is determining the relevant dimensionality (number of LVs) for the given data. This step can be very time-consuming when many datasets have to be processed and/or the datasets are frequently updated. The paper focuses on an alternative that bypasses the determination of the PLSR dimensionality and allows the predictions to be automated. The strategy uses ensemble learning methods, such as averaging or stacking the predictions of a set of PLSR models with different dimensionalities. The paper presents various methods of PLSR averaging and stacking and compares their performance to that of the usual PLSR on six real datasets covering different types of forages. The main finding of the study was the overall superiority of the averaging methods over the usual PLSR. We therefore believe that such methods can be recommended for analyzing NIR data on forages.
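The averaging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single response y, a basic NIPALS PLS1 fit, and uniform weights over the models with 0 to A latent variables, which is only one possible weighting scheme. The function names (`fit_pls1`, `predict_by_dimension`, `predict_average`) and the toy data are hypothetical and chosen for illustration.

```python
import math

def fit_pls1(X, y, n_lv):
    """Fit a single-response PLSR model with NIPALS (illustrative sketch)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_lv):
        # Weight vector: direction of maximal covariance between X and y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:      # nothing left to explain in y
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def predict_by_dimension(model, x):
    """Predictions of the nested models with 0, 1, ..., A latent variables."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    preds = [y_mean]          # the 0-LV model predicts the mean of y
    for w, load, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        preds.append(preds[-1] + q * t)
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return preds

def predict_average(model, x):
    """Uniform average over all dimensionalities (assumed weighting)."""
    preds = predict_by_dimension(model, x)
    return sum(preds) / len(preds)

# Synthetic toy data (not from the paper): y is an exact linear function of X.
X = [[1, 2, 0], [2, 1, 1], [3, 4, 1], [4, 3, 3], [5, 6, 2], [6, 5, 5]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]
model = fit_pls1(X, y, n_lv=3)
x_new = [2.0, 3.0, 1.0]
preds = predict_by_dimension(model, x_new)
avg = predict_average(model, x_new)
```

Because the models are nested, one fit with the maximal number of LVs yields the predictions for every smaller dimensionality, so the average costs little more than a single PLSR. Stacking would replace the uniform weights with weights learned from validation predictions.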
Keywords: chemical composition; infrared spectroscopy; mathematical model; physicochemical property; forecasting technique; disease vector; nutritive value
Associated documents
Article (a-journal with impact factor)
Cirad staff, authors of this publication:
- Bonnal Laurent — Es / UMR SELMET
- Lesnoff Matthieu — Es / UMR SELMET