How to improve interpretability in complex statistical models without losing precision
Events like the Scandinavian Symposium on Chemometrics (SSC2025) are where advanced methods meet real-world challenges. One such method is Sparse PCA (SPCA) — a technique used to simplify high-dimensional datasets by selecting only the most relevant variables.
José Camacho, researcher and co-founder of Datharsis, presented a corrected version of SPCA that improves result interpretation in contexts such as biomarker discovery.
Here we explain why this matters, and how it can help if you work with data in biotechnology, health, or experimental research.
What is Sparse PCA?
PCA (Principal Component Analysis) is widely used for dimensionality reduction. SPCA goes one step further: it forces many variable weights to be zero, so you can focus on the ones that matter most. In short: SPCA helps you identify the most meaningful variables in your dataset.
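To make the difference concrete, here is a minimal sketch that compares an ordinary PCA loading with a sparse one on simulated data. It uses a truncated power iteration (keep only the k largest weights at each step) as a simple stand-in for SPCA; it is not the algorithm implemented in the MEDA Toolbox, and the function names, the simulated dataset and the choice k=3 are assumptions made for illustration.

```python
import numpy as np

def first_pca_loading(X):
    """First principal loading of a column-centred matrix X, via SVD."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def truncated_power_loading(X, k, n_iter=100):
    """Sparse loading with at most k non-zero weights (truncated power iteration)."""
    cov = X.T @ X / X.shape[0]
    v = first_pca_loading(X)                 # warm start from ordinary PCA
    for _ in range(n_iter):
        w = cov @ v                          # power-iteration step
        keep = np.argsort(np.abs(w))[-k:]    # indices of the k largest |weights|
        v = np.zeros_like(w)
        v[keep] = w[keep]                    # every other weight is forced to zero
        v /= np.linalg.norm(v)               # rescale to unit length
    return v

# Simulated data (an assumption): 3 variables driven by one factor, 5 pure-noise variables
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
signal = latent[:, None] + 0.3 * rng.normal(size=(n, 3))
noise = rng.normal(size=(n, 5))
X = np.hstack([signal, noise])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # autoscale each column

pca_loading = first_pca_loading(X)
sparse_loading = truncated_power_loading(X, k=3)

print("PCA loading:   ", np.round(pca_loading, 2))
print("Sparse loading:", np.round(sparse_loading, 2))
```

In the PCA loading every variable gets some weight, however small. In the sparse loading only k weights survive, which is what makes the component easier to read.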
What can go wrong?
José Camacho highlights an important point: if you don’t understand what the algorithm is doing, you might interpret the results incorrectly. For example:
- The selected variables may not be the real drivers — they could just be highly correlated with them.
- If components are extracted by deflation and the loadings are not orthogonal, the scores and explained variance can be miscalculated, producing artifacts that distort your conclusions.
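The second point can be illustrated with a small numerical sketch. Two hand-made sparse loadings that are not orthogonal are applied to simulated data, and the explained variance is computed in two ways: naively, by adding up the variance of each score vector, and by least-squares projection onto the space spanned by the loadings. The data, the loadings and the variable names are assumptions chosen for illustration, and the least-squares formula is one standard way to handle non-orthogonal loadings rather than a reproduction of the exact computation in the papers listed below.

```python
import numpy as np

# Simulated data (an assumption): x0 and x1 share a common factor, x2 is unrelated
rng = np.random.default_rng(1)
n = 500
t = rng.normal(size=n)
X = np.column_stack([
    t + 0.3 * rng.normal(size=n),   # x0, driven by t
    t + 0.3 * rng.normal(size=n),   # x1, driven by t, so correlated with x0
    rng.normal(size=n),             # x2, independent of the others
])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # autoscale each column

# Two hypothetical sparse loadings (one per column): unit length, not orthogonal
P = np.array([
    [1.0, 1 / np.sqrt(2)],
    [0.0, 1 / np.sqrt(2)],
    [0.0, 0.0],
])

total_ss = np.sum(X ** 2)

# Naive: treat each score vector X @ p_k as if it captured a separate share of variance
naive = sum(np.sum((X @ P[:, k]) ** 2) for k in range(P.shape[1])) / total_ss

# Least squares: scores T minimise ||X - T P^T||, so overlapping loadings are not double counted
T = X @ P @ np.linalg.inv(P.T @ P)
residuals = X - T @ P.T
projected = 1.0 - np.sum(residuals ** 2) / total_ss

print(f"Naive explained variance:         {naive:.3f}")
print(f"Least-squares explained variance: {projected:.3f}")
```

Both loadings lie in the plane of x0 and x1, so together they can describe at most two of the three autoscaled variables. The naive sum counts the shared information twice and reports a noticeably higher figure.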
A corrected (and more useful) SPCA version
The new version of SPCA that he presented allows you to:
- Control how many relevant variables are selected per component
- Correctly calculate scores and explained variance
- Distinguish between two types of variables:
  - Representatives: those selected by the model
  - Associates: other variables that are highly correlated and could have been selected as well
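The distinction between representatives and associates can be sketched in a few lines. In this illustration, representatives are the variables with a non-zero weight in a sparse loading, and associates are the remaining variables whose absolute correlation with any representative reaches a threshold. The function name `find_associates`, the 0.8 threshold and the simulated data are assumptions for illustration; the corrected SPCA code may define associates differently.

```python
import numpy as np

def find_associates(X, loading, threshold=0.8):
    """Split variables into representatives (non-zero weight) and associates.

    Hypothetical helper: an associate is any non-selected variable whose absolute
    correlation with at least one representative is >= threshold.
    Assumes the loading has at least one non-zero weight.
    """
    corr = np.corrcoef(X, rowvar=False)
    representatives = np.flatnonzero(loading)
    others = np.setdiff1d(np.arange(X.shape[1]), representatives)
    # Strongest absolute correlation of each non-selected variable with any representative
    strongest = np.abs(corr[np.ix_(others, representatives)]).max(axis=1)
    return representatives, others[strongest >= threshold]

# Simulated data (an assumption): x0 and x1 follow the same factor, x2 and x3 are unrelated
rng = np.random.default_rng(2)
n = 300
t = rng.normal(size=n)
X = np.column_stack([
    t + 0.1 * rng.normal(size=n),   # x0, selected by the sparse loading below
    t + 0.1 * rng.normal(size=n),   # x1, nearly interchangeable with x0
    rng.normal(size=n),             # x2, unrelated
    rng.normal(size=n),             # x3, unrelated
])

loading = np.array([1.0, 0.0, 0.0, 0.0])   # hypothetical sparse loading: only x0 selected

representatives, associates = find_associates(X, loading)
print("Representatives:", representatives.tolist())
print("Associates:     ", associates.tolist())
```

Here x1 carries almost the same information as x0, so it shows up as an associate even though the model did not select it. Reporting both lists guards against reading the selected variable as the only possible driver.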
To learn more
Scientific articles
- Camacho, J., Smilde, A.K., Saccenti, E., Westerhuis, J. All Sparse PCA Models Are Wrong, But Some Are Useful. Part I: Computation of Scores, Residuals and Explained Variance. Chemometrics and Intelligent Laboratory Systems, 2020, 196: 103907. https://doi.org/10.1016/j.chemolab.2019.103907
- Camacho, J., Smilde, A.K., Saccenti, E., Westerhuis, J., Bro, R. All Sparse PCA Models Are Wrong, But Some Are Useful. Part II: Limitations and Problems of Deflation. Chemometrics and Intelligent Laboratory Systems, 2021, 208: 104212. https://doi.org/10.1016/j.chemolab.2020.104212
- Camacho, J., Smilde, A.K., Saccenti, E., Westerhuis, J., Bro, R. All Sparse PCA Models Are Wrong, But Some Are Useful. Part III: Model Interpretation. Submitted to Chemometrics and Intelligent Laboratory Systems, 2025.
Code repositories / tools
- MEDA Toolbox v1.8 (with corrected SPCA routines): https://github.com/josecamachop/MEDA-Toolbox/releases/tag/v1.8
- Corrected SPCA code base: https://github.com/josecamachop/SparsePCAIII
Related projects
- MuSTARD project (Multi-scale Spatio-Temporal Analysis of Research Data): https://codas.ugr.es/mustard/en/
Want to explore how SPCA or other exploratory techniques could benefit your project?

