Abstract
Given an input document d, local text reuse detection is the problem of finding, in a given document collection, all passages reused between d and the other documents. Comparing the passages of d with the passages of every other document in the collection is infeasible, especially for large collections such as the Web. Selecting a subset of documents that potentially share reused text with d is therefore a major step in the detection problem. This paper describes a new, efficient query formulation approach for retrieving Arabic candidate source documents from the Web. We evaluated the approach on a document collection constructed specifically for this work. The experiments show that, on average, 79.97% of the Web documents used in the reused cases were successfully retrieved.
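The reported figure reads as a recall-style measure: for each reuse case, the share of its true Web source documents that appear among the retrieved candidates, averaged over all cases. The sketch below illustrates that reading only. The function names, the per-case averaging, and the toy URLs are assumptions made for illustration and are not taken from the paper.

```python
def source_recall(true_sources, retrieved):
    """Fraction of one case's true source documents that were retrieved."""
    true_sources = set(true_sources)
    if not true_sources:
        raise ValueError("a reuse case needs at least one source document")
    return len(true_sources & set(retrieved)) / len(true_sources)


def average_recall(cases):
    """Mean of per-case recall over (true_sources, retrieved) pairs.

    Assumption: the average is taken per case (macro-average); the
    abstract does not say how the average was computed.
    """
    recalls = [source_recall(true, found) for true, found in cases]
    return sum(recalls) / len(recalls)


# Toy data invented for illustration (hypothetical URLs, not the paper's corpus).
cases = [
    ({"a.example/1", "b.example/2"}, ["a.example/1", "c.example/9"]),
    ({"d.example/3"}, ["d.example/3"]),
]
print(average_recall(cases))  # 0.75
```

Under this reading, a value of 79.97% would mean that roughly four out of five source documents per case reach the later passage-comparison stage.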
Original language | English |
---|---|
Title of host publication | Proceedings of the 2016 12th International Conference on Innovations in Information Technology, IIT 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781509053438 |
Publication status | Published - Mar 16 2017 |
Event | 12th International Conference on Innovations in Information Technology, IIT 2016, Al Ain, United Arab Emirates. Duration: Nov 28 2016 → Nov 29 2016 |
Other
Other | 12th International Conference on Innovations in Information Technology, IIT 2016 |
---|---|
Country/Territory | United Arab Emirates |
City | Al Ain |
Period | 11/28/16 → 11/29/16 |
Keywords
- Fingerprinting
- Query Generation
- Text Reuse Detection
- Web Document Retrieval
ASJC Scopus subject areas
- Computer Science Applications
- Hardware and Architecture
- Information Systems
- Computer Networks and Communications
- Instrumentation