Approximate search in very large files using the pigeonhole principle

Maryam S. Yammahi, Chen Shen, Simon Berkovich

Research output: Contribution to journalArticlepeer-review

2 Citations (Scopus)


This paper presents a new technique for efficient searching with fuzzy criteria in very large information systems. The suggested technique uses the Pigeonhole Principle approach. This approach can be utilized with different embodiments, but the most effective realization would be to amplify some already given intrinsic approximate matching capabilities, like those in the FuzzyFind method [1][2]. Considering the following problem, a data to be searched is presented as a bit-attribute vector. The searching operation consists of finding a subset of this bit-attribute vector that is within particular Hamming distance. Normally, this search with approximate matching criteria requires sequential lookup for the whole collection of the attribute vector. This process can be easily parallelized, but in very large information systems this still would be slow and energy consuming. The suggested method, in this paper, of approximate search in very large files using the Pigeonhole Principle, circumvents the sequential search operations and reduces the calculations tremendously.

Original languageEnglish
Pages (from-to)2773-2783
Number of pages11
JournalInternational Review on Computers and Software
Issue number12
Publication statusPublished - Dec 2013
Externally publishedYes


  • Algorithms and data structures
  • Approximate matching
  • Big data
  • Information retrieval
  • Pigeonhole principle

ASJC Scopus subject areas

  • General Computer Science


Dive into the research topics of 'Approximate search in very large files using the pigeonhole principle'. Together they form a unique fingerprint.

Cite this