Handling Highly Imbalanced Flood Data Using K-Means Clustering in Skyline Query Dominance Testing

Vega Purwayoga; Zakwan Gusnadi; Winda Ayu Anggraini

doi:10.24071/b4693z32

Vol. 8 No. 1 (2026): Volume 08, Issue 1, June 2026

Vega Purwayoga
Universitas Siliwangi

Zakwan Gusnadi
Universitas Siliwangi

Winda Ayu Anggraini
Universitas Siliwangi

Published 2026-06-11

Keywords

Flood, Imbalanced Data, K-Means Clustering, Skyline Query

How to Cite

Purwayoga, V., Gusnadi, Z., & Anggraini, W. A. (2026). Handling Highly Imbalanced Flood Data Using K-Means Clustering in Skyline Query Dominance Testing. International Journal of Applied Sciences and Smart Technologies, 8(1), 43-56. https://doi.org/10.24071/b4693z32

Abstract

The skyline query is a preference-based recommendation technique that selects objects according to multiple attribute preferences. A key challenge is that its results can be highly imbalanced: only a small number of objects satisfy the preferred criteria. This imbalance reduces the reliability of spatial decision-making, including flood vulnerability assessment. This study addresses the issue by applying a modified Sort-Filter Skyline method that accounts for both maximum and minimum attribute preferences during sorting. The skyline output shows a strong class imbalance, with only 18 areas identified as flood-prone compared with 1,574 non-flood-prone areas. To mitigate this, K-Means clustering is applied as a refinement step. The Elbow and Gap Statistic methods recommend three clusters as optimal, while the Silhouette method suggests eight. Cluster distribution analysis shows that three clusters produce a more balanced representation, with Scheme 1 and Scheme 3 showing better balance ratios and lower variation than Scheme 2. Clustering into three groups therefore helps achieve a more representative mapping of flood-prone areas.
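The two stages described in the abstract, dominance testing inside a Sort-Filter Skyline (SFS) pass with mixed maximum/minimum preferences followed by K-Means refinement checked with an Elbow curve, can be illustrated with a short sketch. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation: the `prefs` vector of +1/-1 flags, the sum-based presort score, the toy `areas` records and their attributes (`rainfall_mm`, `elevation_m`), and the `balance_ratio` definition (smallest cluster size over largest) are all hypothetical. The Gap Statistic and Silhouette checks are not shown, and features are left unnormalized for brevity.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dominates(a: Vector, b: Vector, prefs: Sequence[int]) -> bool:
    """True if a dominates b: no worse on every attribute, strictly better on at least one.

    Hypothetical convention: prefs[k] is +1 when larger values are preferred on
    attribute k (a "max" preference) and -1 when smaller values are preferred ("min").
    """
    strictly_better = False
    for x, y, p in zip(a, b, prefs):
        if p * x < p * y:
            return False
        if p * x > p * y:
            strictly_better = True
    return strictly_better


def sort_filter_skyline(points: List[Vector], prefs: Sequence[int]) -> List[int]:
    """Return the indices of the skyline points using a Sort-Filter Skyline pass."""

    def score(i: int) -> float:
        # Assumed presort score, monotone in the preference order: if a dominates b,
        # score(a) > score(b), so dominating points are visited before the points they dominate.
        return sum(p * v for p, v in zip(prefs, points[i]))

    window: List[int] = []
    for i in sorted(range(len(points)), key=score, reverse=True):
        # Points already in the window can never be dominated by later ones,
        # so each candidate is tested only against the window.
        if not any(dominates(points[j], points[i], prefs) for j in window):
            window.append(i)
    return sorted(window)


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Vector], centroids: List[List[float]]) -> List[int]:
    # Index of the nearest centroid (squared Euclidean distance) for every point.
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> Tuple[List[int], float]:
    """Plain Lloyd's K-Means; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def balance_ratio(labels: List[int], k: int) -> float:
    """Hypothetical balance measure: smallest cluster size / largest (1.0 = perfectly balanced)."""
    sizes = [labels.count(c) for c in range(k)]
    return min(sizes) / max(sizes)


if __name__ == "__main__":
    # Invented toy records: (rainfall_mm, elevation_m). For flood-proneness we
    # assume higher rainfall (+1) and lower elevation (-1) are "preferred".
    areas = [(300, 5), (280, 3), (250, 20), (120, 40), (90, 60), (310, 50), (200, 10)]
    prefs = [+1, -1]
    print("skyline indices:", sort_filter_skyline(areas, prefs))  # prints [0, 1, 5]
    # Elbow check: WCSS for each k, plus the hypothetical balance ratio of each clustering.
    for k in range(1, 5):
        labels, wcss = kmeans(areas, k)
        print(k, round(wcss, 1), round(balance_ratio(labels, k), 2))
```

Presorting by a score that strictly increases under dominance is what lets SFS compare each candidate only with the current window rather than with every other object. On real data with attributes in different units, features would normally be standardized before K-Means, since Euclidean distance is otherwise dominated by the attribute with the largest range.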
