TY - GEN
T1 - Candidate document retrieval for Arabic-based text reuse detection on the web
AU - Lulu, Leena
AU - Belkhouche, Boumediene
AU - Harous, Saad
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/3/16
Y1 - 2017/3/16
AB - Given an input document d, the problem of local text reuse detection is to detect, from a given document collection, all possible reused passages between d and the other documents. Comparing the passages of d with the passages of every other document in the collection is infeasible, especially for large collections such as the Web. Therefore, selecting a subset of documents that potentially share reused text with d becomes a major step in the detection problem. This paper describes a new, efficient query formulation approach for retrieving Arabic-based candidate source documents from the Web. We evaluated the approach using a collection of documents constructed specifically for this work. The experiments show that, on average, 79.97% of the Web documents used in the reuse cases were successfully retrieved.
KW - Fingerprinting
KW - Query Generation
KW - Text Reuse Detection
KW - Web Document Retrieval
UR - http://www.scopus.com/inward/record.url?scp=85017607765&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85017607765&partnerID=8YFLogxK
U2 - 10.1109/INNOVATIONS.2016.7880048
DO - 10.1109/INNOVATIONS.2016.7880048
M3 - Conference contribution
AN - SCOPUS:85017607765
T3 - Proceedings of the 2016 12th International Conference on Innovations in Information Technology, IIT 2016
BT - Proceedings of the 2016 12th International Conference on Innovations in Information Technology, IIT 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on Innovations in Information Technology, IIT 2016
Y2 - 28 November 2016 through 29 November 2016
ER -