Scalable graph attention-based instance selection via mini-batch sampling and hierarchical hashing

Research output: Contribution to journal › Article › peer-review

Abstract

Instance selection (IS) addresses the challenge of reducing dataset size while preserving informative characteristics, a task that becomes increasingly important as datasets grow to millions of instances. Current IS methods often struggle to capture complex relationships in high-dimensional spaces and to scale to large datasets. This paper introduces a graph attention-based instance selection (GAIS) method that uses attention mechanisms to identify informative instances through their structural relationships in graph representations. We present two approaches for scalable graph construction: a distance-based mini-batch sampling technique that achieves dataset-size-independent complexity through strategic batch processing, and a hierarchical hashing approach that enables efficient similarity computation through random projections. The mini-batch approach preserves class distributions through stratified sampling, while the hierarchical hashing method captures relationships at multiple granularities through single-level, multi-level, and multi-view variants. Experiments across 39 datasets show that GAIS achieves reduction rates above 96% while maintaining or improving model performance relative to state-of-the-art IS methods. The findings show that the distance-based mini-batch approach is the most efficient option for large-scale datasets, while the multi-view variants excel on complex, high-dimensional data. This demonstrates that attention-based importance scoring can effectively identify the instances that matter for maintaining decision boundaries while avoiding computationally prohibitive pairwise comparisons. The code is publicly available at https://github.com/zahiriddin-rustamov/gais.
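To illustrate how random projections can stand in for pairwise comparisons during graph construction, the following is a minimal sketch of sign-based random-projection hashing. It is not taken from the GAIS repository: the function names `random_projection_buckets` and `bucket_edges`, the `n_bits` and `seed` parameters, and the rule of linking every pair of instances that share a bucket are all assumptions made for illustration.

```python
import numpy as np


def random_projection_buckets(X, n_bits=8, seed=0):
    """Group row indices of X by the sign pattern of random projections.

    Hypothetical helper: each instance gets an n_bits-bit code, one bit per
    random hyperplane, and instances with equal codes share a bucket.
    """
    rng = np.random.default_rng(seed)
    # One random hyperplane (through the origin) per output bit.
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes) >= 0
    # Pack each row of bits into a single integer code.
    weights = 2 ** np.arange(n_bits, dtype=np.int64)
    codes = bits.astype(np.int64) @ weights
    buckets = {}
    for idx, code in enumerate(codes):
        buckets.setdefault(int(code), []).append(idx)
    return buckets


def bucket_edges(buckets):
    """Return undirected candidate edges (i, j), i < j, within each bucket."""
    edges = set()
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                edges.add((members[a], members[b]))
    return edges


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((200, 16))
    buckets = random_projection_buckets(X, n_bits=6)
    edges = bucket_edges(buckets)
    print(len(buckets), "buckets,", len(edges), "candidate edges")
```

Only instances that fall into the same bucket are compared, so the number of candidate edges depends on bucket sizes rather than on all pairs in the dataset. Calling the helper with several `n_bits` values or several seeds gives coarser or finer groupings and independent views, which is one loose reading of the multi-level and multi-view variants; the paper's actual construction may differ.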

Original language: English
Pages (from-to): 167-182
Number of pages: 16
Journal: AI Open
Volume: 6
DOIs
Publication status: Published - 2025

Keywords

  • Graph attention networks
  • Graph data selection
  • Instance selection
  • Locality-sensitive hashing
  • Mini-batch sampling

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Artificial Intelligence
