Abstract
This paper presents a new technique for efficient searching with fuzzy criteria in very large information systems. The suggested technique uses the Pigeonhole Principle approach. This approach can be utilized with different embodiments, but the most effective realization would be to amplify some already given intrinsic approximate matching capabilities, like those in the FuzzyFind method [1][2]. Considering the following problem, a data to be searched is presented as a bit-attribute vector. The searching operation consists of finding a subset of this bit-attribute vector that is within particular Hamming distance. Normally, this search with approximate matching criteria requires sequential lookup for the whole collection of the attribute vector. This process can be easily parallelized, but in very large information systems this still would be slow and energy consuming. The suggested method, in this paper, of approximate search in very large files using the Pigeonhole Principle, circumvents the sequential search operations and reduces the calculations tremendously.
Original language | English |
---|---|
Pages (from-to) | 2773-2783 |
Number of pages | 11 |
Journal | International Review on Computers and Software |
Volume | 8 |
Issue number | 12 |
Publication status | Published - Dec 2013 |
Externally published | Yes |
Keywords
- Algorithms and data structures
- Approximate matching
- Big data
- Information retrieval
- Pigeonhole principle
ASJC Scopus subject areas
- Computer Science(all)