TY - GEN
T1 - An efficient technique for searching very large files with fuzzy criteria using the pigeonhole principle
AU - Yammahi, Maryam
AU - Kowsari, Kamran
AU - Shen, Chen
AU - Berkovich, Simon
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/9/24
Y1 - 2014/9/24
AB - Big Data is the term for the exponential growth of data on the Internet. Its importance lies not in how large it is but in what information can be obtained by analyzing it. Such analysis helps businesses make smarter decisions and reduce time and cost. Performing this analysis requires searching very large files, and at Big Data scale sequential search is prohibitively inefficient in terms of time and energy. New techniques that allow efficient search in very large files are therefore in high demand. This paper presents an approach for efficient searching with fuzzy criteria in very large information systems (Big Data). Organizing efficient access to a large amount of information by an 'approximate' or 'fuzzy' indication is a rather complicated computer science problem. Usually, the solution relies on a brute-force approach, which results in a sequential look-up of the file and, in many cases, substantially undermines system performance. The technique suggested in this paper uses a different approach based on the Pigeonhole Principle. It searches for binary strings that approximately match a given request. It substantially reduces the number of sequential search operations and improves efficiency by several orders of magnitude in speed, cost, and energy. The paper presents a complex scheme developed for the suggested approach that uses a new data structure called the FuzzyFind Dictionary. The developed scheme provides more accuracy than the basic use of the suggested method and works much faster than sequential search.
KW - Algorithms and Data Structure
KW - Approximate search
KW - Big Data
KW - Information Retrieval
KW - Pigeonhole Principle
UR - http://www.scopus.com/inward/record.url?scp=84908565802&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908565802&partnerID=8YFLogxK
U2 - 10.1109/COM.Geo.2014.8
DO - 10.1109/COM.Geo.2014.8
M3 - Conference contribution
AN - SCOPUS:84908565802
T3 - Proceedings - 5th International Conference on Computing for Geospatial Research and Application, COM.Geo 2014
SP - 82
EP - 86
BT - Proceedings - 5th International Conference on Computing for Geospatial Research and Application, COM.Geo 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on Computing for Geospatial Research and Application, COM.Geo 2014
Y2 - 4 August 2014 through 6 August 2014
ER -