Publications by Cirad staff

RoBoost-PLSR: Robust PLS regression method inspired from boosting principles

Metz M., Abdelghafour F., Roger J.M., Lesnoff M. 2021. In: e-Chimiométrie 2021, 2-3 February 2021. Programme, book of abstracts. s.l.: GFC, p. 11-12. e-Chimiometrie 2021, 2021-02-02/2021-02-03, (France).

Introduction - The calibration of Partial Least Squares regression (PLSR) models can be disturbed by outlying samples in the data. In such cases, the models can become unstable and their predictive performance can degrade. To address this issue, a new method and algorithm is proposed to better handle the downweighting of outliers when processing high-dimensional data. This novel robust PLSR algorithm is inspired by the principles of boosting and is called RoBoost-PLSR.

Theory - RoBoost-PLSR consists of fitting a series of K unidimensional (1 LV) iteratively reweighted PLSR[1] models. The weighted PLSR algorithm used is the weighted NIPALS[2]. Model (k+1) is calibrated on the X and Y residuals of the previous k models. Within each model, sample weights are computed from a combination of X-residuals, Y-residuals and leverages: the more a sample deviates from the model, the lower its weight. The model is then refitted with the weights from the previous iteration, and this is repeated until convergence to a stable solution.

Material and methods - RoBoost-PLSR was compared with the PLSR algorithm calibrated with outliers, with the PLSR algorithm calibrated without outliers (the reference), and with Partial Robust M-regression (PRM), a reference robust method. The evaluation was conducted on a simulated dataset and a real dataset. The simulated dataset was generated with the framework proposed in [4]. The aim of the simulation is to reproduce a contamination of the samples that leads to inconsistent spectral measurements. The real dataset comes from an animal nutrition application: the prediction of the protein content of feed materials in the presence of incorrectly categorised samples. In this database, the samples derived from animal bonemeal (noted ANF) are the outliers polluting the regular soyabean cakes (noted TTS).

Conclusion - RoBoost-PLSR proves to be resilient to the tested outliers, and can match the performance of the reference PLSR calibrated without any of these outliers.
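
The scheme described in the Theory paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions that the abstract does not specify: a single response variable, a Tukey bisquare weight function scaled by the median absolute value, the absolute score used as a stand-in for leverage, and the three weights combined by a simple product. The function names (bisquare, weighted_pls1_component, roboost_like_pls) are hypothetical.

```python
import numpy as np

def bisquare(r, c=4.0):
    """Tukey bisquare weights (assumed weight function, not given in the abstract)."""
    r = np.abs(r)
    scale = np.median(r) / 0.6745 + 1e-12   # robust scale estimate
    u = r / (c * scale)
    return np.where(u < 1.0, (1.0 - u**2) ** 2, 0.0)

def weighted_pls1_component(X, y, d):
    """Fit one weighted PLS latent variable for a single response y.
    With one response and one LV, NIPALS needs no inner iteration."""
    d = d / d.sum()
    xm, ym = d @ X, d @ y                   # weighted means
    Xc, yc = X - xm, y - ym
    w = Xc.T @ (d * yc)                     # weighted covariance direction
    w /= np.linalg.norm(w)
    t = Xc @ w                              # scores
    tt = t @ (d * t)
    p = Xc.T @ (d * t) / tt                 # X loadings
    q = yc @ (d * t) / tt                   # y loading
    return xm, ym, w, t, p, q

def roboost_like_pls(X, y, n_lv=2, n_iter=30, tol=1e-6):
    """Series of 1-LV iteratively reweighted PLS models, each fitted on the
    X and y residuals left by the previous ones."""
    Xr = np.asarray(X, dtype=float).copy()
    yr = np.asarray(y, dtype=float).copy()
    n = Xr.shape[0]
    components = []
    for _ in range(n_lv):
        d = np.ones(n)                      # start each model with equal weights
        for _ in range(n_iter):
            xm, ym, w, t, p, q = weighted_pls1_component(Xr, yr, d)
            Ex = Xr - xm - np.outer(t, p)   # X residuals of this 1-LV model
            ey = yr - ym - q * t            # y residuals of this 1-LV model
            # Assumption: product of three bisquare weights, with |t| as leverage proxy
            d_new = (bisquare(np.linalg.norm(Ex, axis=1))
                     * bisquare(ey)
                     * bisquare(t))
            converged = np.max(np.abs(d_new - d)) < tol
            d = d_new
            if converged:
                break
        # Refit with the final weights, then pass the residuals to the next model
        xm, ym, w, t, p, q = weighted_pls1_component(Xr, yr, d)
        Xr = Xr - xm - np.outer(t, p)
        yr = yr - ym - q * t
        components.append({"weights": d, "w": w, "p": p, "q": q})
    return components
```

Calling roboost_like_pls(X, y, n_lv=K) on a spectra matrix X (samples in rows) and a response vector y returns one entry per latent variable; the "weights" array of each entry shows which samples were downweighted in that model. The sketch has no guard for the degenerate case where every sample receives a zero weight.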

Related documents

Conference paper