Vol. 8 No. 1 (2026): Volume 08, Issue 1, June 2026
Articles

Handling Highly Imbalanced Flood Data Using K-Means Clustering in Skyline Query Dominance Testing

Vega Purwayoga
Universitas Siliwangi
Zakwan Gusnadi
Universitas Siliwangi
Winda Ayu Anggraini
Universitas Siliwangi

Published 2026-06-11

Keywords

  • Flood, Imbalanced Data, K-Means Clustering, Skyline Query

How to Cite

Handling Highly Imbalanced Flood Data Using K-Means Clustering in Skyline Query Dominance Testing. (2026). International Journal of Applied Sciences and Smart Technologies, 8(1), 43-56. https://ejournal.usd.ac.id/index.php/ijasst/article/view/881

Abstract

A skyline query is a recommendation technique that selects objects based on multi-attribute preferences. A key challenge is that its results can be highly imbalanced: only a small number of objects meet the preferred criteria. This imbalance reduces the reliability of spatial decision-making, including flood vulnerability assessment. This study addresses the issue by applying a modified Sort-Filter Skyline method that accounts for both maximum and minimum attribute preferences during sorting. The skyline output shows a strong class imbalance, with only 18 areas identified as flood-prone compared with 1,574 non-flood-prone areas. To mitigate this, K-Means clustering is used as a refinement step. The Elbow and Gap Statistic methods recommend three clusters as optimal, while the Silhouette method suggests eight. Cluster distribution analysis shows that three clusters produce a more balanced representation, with Scheme 1 and Scheme 3 showing better balance ratios and lower variation than Scheme 2. Thus, clustering into three groups helps achieve a more representative mapping of flood-prone areas.
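To make the dominance test concrete, the following is a minimal Python sketch of a Sort-Filter Skyline pass with mixed maximum and minimum preferences. It is an illustration under stated assumptions, not the authors' modified method: the function names (normalize, dominates, sfs_skyline), the two attributes (rainfall and elevation), the sample values, and the choice of an attribute sum as the sort key are all hypothetical and do not come from the study.

```python
from typing import List, Sequence, Tuple


def normalize(point: Sequence[float], prefs: Sequence[str]) -> Tuple[float, ...]:
    """Negate 'max' attributes so that smaller is better on every axis."""
    return tuple(-v if p == "max" else v for v, p in zip(point, prefs))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse on every attribute, strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sfs_skyline(points: Sequence[Sequence[float]], prefs: Sequence[str]) -> List[int]:
    """Return the indices of the skyline points.

    Assumption: the sort key is the sum of the normalized attributes.
    This key is monotone, so a point can only be dominated by a point
    that precedes it in the sorted order.
    """
    norm = [normalize(p, prefs) for p in points]
    order = sorted(range(len(points)), key=lambda i: sum(norm[i]))
    skyline: List[int] = []
    for i in order:
        # Filter step: compare the candidate only against current skyline members.
        if not any(dominates(norm[j], norm[i]) for j in skyline):
            skyline.append(i)
    return sorted(skyline)


# Hypothetical areas as (rainfall_mm, elevation_m). For flood-proneness,
# higher rainfall and lower elevation are treated as preferred.
areas = [(300, 10), (250, 5), (200, 50), (300, 12), (100, 80)]
preferences = ("max", "min")
flood_prone = sfs_skyline(areas, preferences)
print(flood_prone)
```

In this toy input, only areas 0 and 1 survive the dominance test, which mirrors on a small scale how a skyline can leave a very small preferred class relative to the rest of the data.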