A Robust Method for Text, Line, and Word Segmentation for Historical Arabic Manuscripts

  • Omar Elharrouss
  • Somaya Al-Maadeed
  • Jihad Mohamad Alja’am
  • Abdelaali Hassaine

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Citation (Scopus)

Abstract

The segmentation of old documents is a crucial step toward reading and understanding their content automatically. Extracting words and phrases from a document requires segmenting each line and each word. However, two problems are common in such documents, especially in Arabic manuscripts: the direction of text lines varies within the same document, and characters overlap between two or more text lines. This chapter therefore proposes an approach for text, line, and word segmentation of historical Arabic manuscripts. First, text segmentation is performed with an encoder-decoder deep model that separates the main text from the side text in the image. The model was trained on two Arabic manuscript datasets, Bukhari and RASM2018. Next, lines are segmented using a smoothing approach followed by thresholding, with the threshold determined automatically according to the size of the handwriting. Finally, words are segmented using a smoothed Chamfer distance that takes the handwriting characteristics into account. The approach is evaluated on the QUWI Arabic database, and very promising results are achieved.

Original language: English
Title of host publication: Data Analytics for Cultural Heritage
Subtitle of host publication: Current Trends and Concepts
Publisher: Springer International Publishing
Pages: 147-172
Number of pages: 26
ISBN (Electronic): 9783030667771
ISBN (Print): 9783030667764
Publication status: Published - Jan 1 2021
Externally published: Yes

ASJC Scopus subject areas

  • General Computer Science
  • General Engineering
  • General Social Sciences
  • General Arts and Humanities
