TY - JOUR
T1 - Graph reduction techniques for instance selection
T2 - comparative and empirical study
AU - Rustamov, Zahiriddin
AU - Zaki, Nazar
AU - Rustamov, Jaloliddin
AU - Zaitouny, Ayham
AU - Damseh, Rafat
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2025/2
Y1 - 2025/2
N2 - The surge in data generation has prompted a shift to big data, challenging the notion that “more data equals better performance” because of processing and time constraints. In this evolving artificial intelligence and machine learning landscape, instance selection (IS) has become crucial for reducing data without compromising model quality. Traditional IS methods, though efficient, often struggle with large, complex datasets in data mining. This study evaluates graph reduction techniques, grounded in graph theory, as a novel approach to instance selection. The objective is to leverage the inherent structure of data represented as graphs to make instance selection more effective. We evaluated 35 graph reduction techniques across 29 classification datasets, assessing them on accuracy, F1 score, reduction rate, and computation time. Graph reduction methods showed significant potential in maintaining data integrity while achieving substantial reductions. Top techniques achieved up to 99% reduction while maintaining or improving accuracy. For instance, Multilevel sampling achieved an accuracy effectiveness score of 0.8555 with 99.16% reduction on large datasets, while Leiden sampling showed high effectiveness on smaller datasets (0.8034 accuracy, 97.87% reduction). Computational efficiency varied widely, with reduction times ranging from milliseconds to minutes. This research advances the theory of graph-based instance selection and offers practical application guidelines. Our findings indicate that graph reduction methods effectively preserve data quality and boost processing efficiency in large, complex datasets, with some techniques achieving up to 160-fold speedups in model training at high reduction rates.
KW - Big data
KW - Data mining
KW - Data reduction
KW - Graph reduction
KW - Instance selection
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85212765283&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212765283&partnerID=8YFLogxK
U2 - 10.1007/s10462-024-10971-4
DO - 10.1007/s10462-024-10971-4
M3 - Article
AN - SCOPUS:85212765283
SN - 0269-2821
VL - 58
JO - Artificial Intelligence Review
JF - Artificial Intelligence Review
IS - 2
M1 - 62
ER -