Publications by Cirad staff


Jchemo: Chemometrics and machine learning on high-dimensional data with Julia

Lesnoff M. 2024. In: Bastianelli Denis (ed.), Chaix Gilles (ed.). Résumés des communications présentées aux 25èmes rencontres HélioSPIR, Montpellier (France), 11-12 juin 2024. Montpellier: Association HélioSPIR, p. 36. Rencontres HélioSPIR. 25, 2024-06-11/2024-06-12, Montpellier (France).

Julia (https://julialang.org) is a programming language designed for high performance. It is an open-source project released under the MIT license. The language aims to tackle the “two-language problem”: many scientific codes are prototyped in a slow but flexible language (to test an idea quickly) and then have to be ported to a faster but less flexible language (e.g. C++) for practical applications. Julia allows fast computation with simple, easily readable code. Work on Julia began in 2009. Julia's syntax has been considered stable since version 1.0 in 2018 (current version as of June 2024: 1.10.4), and the language has many registered packages and a very active user forum (https://discourse.julialang.org).

The proposed poster will present Jchemo [1] (https://github.com/mlesnoff/Jchemo.jl), a Julia package (toolbox) dedicated to chemometrics and machine learning in general.

• Why did I decide in 2021 to switch from R to Julia for my chemometrics work? Running a PLSR (25 LVs) with n = 1e6 samples and p = 500 variables using my R function systematically crashed my working session (on an Intel i9 processor). On the same computer, the same function written in Julia completed the computation in 8 seconds.
• Why did I choose Julia over MATLAB? Because Julia is free.

Jchemo was initially dedicated to partial least squares regression (PLSR) and discrimination (PLSDA) models and their extensions, in particular locally weighted PLS models (kNN-LWPLS-R & -DA). The package has since been expanded to cover various dimension reduction and regression/discrimination models. Besides the usual chemometrics methods (signal preprocessing, PCA, PLS, etc.), multi-block methods are available for dimension reduction (e.g. MBPCA, ComDim, rCCA) and regression/discrimination (MBPLS, ROSAPLS, SOPLS, etc.). Various ridge and sparse models are provided, as well as many nonlinear models useful for modeling heterogeneous data (kernel latent variables/ridge,


Conference communication
