Principal Component Analysis-Driven Feature Reduction for Predicting Coffee Quality Using a Machine Learning Approach
Published 2026-06-11
Keywords
- coffee quality,
- feature reduction,
- Pearson correlation,
- PCA,
- principal component
Abstract
Coffee quality assessment with machine learning faces major challenges, including high data dimensionality and redundancy between features. This study therefore proposes Principal Component Analysis (PCA) as a feature reduction technique to improve the efficiency and accuracy of coffee quality prediction models. The research proceeded through data acquisition, data cleaning, feature engineering, exploratory data analysis, testing the normalization of the coffee parameter profiles, applying PCA to the inputs of Random Forest and XGBoost models, and evaluating model performance. Evaluation with mean absolute error (MAE) and mean absolute percentage error (MAPE) showed that Random Forest predicted more precisely than XGBoost, particularly when PCA was applied. With PCA, the Random Forest error score fell from 0.11903 to 0.08542 and the XGBoost score from 0.12511 to 0.11570. These are reductions of about 28% and 7.5% relative to the scores without PCA (39% and 8% when the difference is divided by the post-PCA score). Prediction visualization confirmed that the Random Forest model was consistent and precise whether or not PCA was used. These findings highlight the importance of data cleaning and feature engineering, and the role of PCA in improving the precision of coffee quality predictions. Random Forest combined with PCA is recommended as an efficient method for modeling the quality of Arabica coffee from sensory and environmental factors.
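The workflow in the abstract (reduce correlated features with PCA, fit a Random Forest, score with MAE and MAPE) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn, uses hypothetical synthetic data in place of the coffee dataset, and assumes standardization before PCA and a 95% explained-variance threshold; none of these settings come from the study. XGBoost is omitted to keep dependencies small, though `xgboost.XGBRegressor` could be dropped into the same slot. The last two lines recompute the relative changes for the error scores quoted above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def relative_reduction(baseline: float, improved: float) -> float:
    """Error reduction expressed as a fraction of the baseline (no-PCA) error."""
    return (baseline - improved) / baseline


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model on the training split and return (MAE, MAPE) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return mean_absolute_error(y_test, pred), mean_absolute_percentage_error(y_test, pred)


# Hypothetical synthetic data: 12 redundant features driven by 3 latent factors,
# standing in for correlated sensory/environmental attributes (NOT the study's data).
rng = np.random.default_rng(42)
n_samples, n_latent, n_features = 500, 3, 12
latent = rng.normal(size=(n_samples, n_latent))
X = latent @ rng.normal(size=(n_latent, n_features)) + 0.1 * rng.normal(size=(n_samples, n_features))
y = 82.0 + latent @ np.array([1.5, -0.8, 0.5]) + 0.2 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Baseline: Random Forest on all raw features.
baseline = RandomForestRegressor(n_estimators=100, random_state=0)
mae_base, mape_base = evaluate(baseline, X_train, X_test, y_train, y_test)

# PCA variant: standardize, keep the components explaining 95% of the variance
# (assumed threshold), then fit the same Random Forest on the component scores.
with_pca = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
mae_pca, mape_pca = evaluate(with_pca, X_train, X_test, y_train, y_test)
n_components = with_pca.named_steps["pca"].n_components_

print(f"components kept: {n_components}")
print(f"baseline MAE={mae_base:.4f} MAPE={mape_base:.4%}")
print(f"with PCA MAE={mae_pca:.4f} MAPE={mape_pca:.4%}")

# Relative change for the error scores reported in the abstract.
print(f"{relative_reduction(0.11903, 0.08542):.1%}")  # 28.2%
print(f"{relative_reduction(0.12511, 0.11570):.1%}")  # 7.5%
```

Because the scaler and PCA sit inside the pipeline, the component loadings are learned from the training split only, so no test-set information leaks into the reduced features. On synthetic data the PCA variant is not guaranteed to beat the baseline; the sketch shows the mechanics, not the study's result. Dividing the same differences by the post-PCA score instead of the baseline gives the 39% and 8% figures.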