Vol. 8 No. 1 (2026): Volume 08, Issue 1, June 2026

Analysis of Key Features in PCOS Diagnosis Using Random Forest and XGBoost with SMOTE and SHAP

Aulia Firdatunnisa
Universitas Siliwangi
Eka Wahyu Hidayat
Universitas Siliwangi
Siti Yuliyanti
Universitas Siliwangi

Published 2026-06-11

Keywords

  • Machine Learning, PCOS, SHAP, SMOTE, XGBoost

How to Cite

Analysis of Key Features in PCOS Diagnosis Using Random Forest and XGBoost with SMOTE and SHAP. (2026). International Journal of Applied Sciences and Smart Technologies, 8(1), 261-278. https://ejournal.usd.ac.id/index.php/ijasst/article/view/754

Abstract

Polycystic Ovary Syndrome (PCOS) is a hormonal disorder in women of reproductive age characterized by irregular menstrual cycles, hyperandrogenism, and polycystic ovarian morphology. Diagnosis is challenging because its symptoms overlap with those of other endocrine disorders. This study proposes an interpretable machine learning approach to PCOS diagnosis using Random Forest and XGBoost. The Synthetic Minority Oversampling Technique (SMOTE) was applied to handle class imbalance, and Shapley Additive Explanations (SHAP) were used to make the models interpretable. The dataset comprised 541 samples with 45 clinical and hormonal features; the data were preprocessed, and model hyperparameters were tuned with GridSearchCV. XGBoost with SMOTE and GridSearchCV achieved the best performance, with 93% accuracy, 92% precision, 89% recall, and a 90% F1-score. Random Forest achieved comparable results: 93% accuracy, 94% precision, 87% recall, and a 90% F1-score. SHAP analysis highlighted key features such as follicle count, Anti-Müllerian Hormone (AMH), skin darkening, weight gain, and irregular cycles. Global SHAP interpretation identified the most influential predictors, while local SHAP explanations gave patient-specific insight that improved transparency. The consistency of the SHAP results with the Rotterdam criteria supports the model's clinical validity and strengthens trust in AI-assisted tools. Overall, SMOTE and GridSearchCV improved predictive performance, and SHAP made the outcomes transparent, indicating potential use for early PCOS screening.
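A minimal, standard-library Python sketch of the two techniques the abstract relies on follows. It is an illustration under stated assumptions, not the authors' implementation. `smote` creates synthetic minority samples by interpolating between a randomly chosen minority sample and one of its k nearest minority neighbours. `shapley_values` computes exact Shapley values for a small model by enumerating feature coalitions, showing the additivity property behind local SHAP explanations: the per-feature contributions sum to the prediction minus a baseline prediction. The function names, the baseline-replacement value function, the toy feature values and the toy risk score are all hypothetical. SHAP's tree explainers compute these values efficiently for ensembles such as Random Forest and XGBoost instead of by brute force.

```python
import itertools
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling (illustrative sketch).

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by distance to `base`; keep the k nearest.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a baseline input.

    Assumption: a coalition S is valued by taking the features in S from x and
    all other features from `baseline`. This is one simple value function;
    SHAP's explainers define the value of a coalition as an expectation instead.
    Cost grows as 2**len(x), so this is only feasible for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(rest, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical minority-class rows: [follicle_count, AMH].
    minority = [[12.0, 6.1], [15.0, 7.4], [18.0, 8.0], [14.0, 5.9]]
    print(smote(minority, n_synthetic=3, k=2))

    # Hypothetical risk score over [follicle_count, AMH, irregular_cycle].
    def toy_model(z):
        return 0.04 * z[0] + 0.1 * z[1] + 0.02 * z[0] * z[2]

    patient, reference = [20.0, 9.0, 1.0], [8.0, 3.0, 0.0]
    phi = shapley_values(toy_model, patient, reference)
    # Additivity: the contributions sum to the change in the model output.
    print(phi, sum(phi), toy_model(patient) - toy_model(reference))
```

Two design points are worth noting. Oversampling is normally applied only to the training folds (for example, inside an imbalanced-learn `Pipeline` passed to `GridSearchCV`), so synthetic points never reach the validation data. Plain interpolation also turns binary features such as skin darkening into fractional values; variants such as SMOTE-NC (`SMOTENC` in imbalanced-learn) treat categorical columns separately.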