Abstract
This paper presents a document image classification method for six categories of Malayalam palm leaf manuscripts. The proposed approach consists of three phases: dataset analysis, which builds a bag-of-words repository; text recognition; and classification using a voting approach. In the dataset analysis phase, the palm leaf manuscripts are first subjected to pre-processing and subjective analysis techniques to create the bag-of-words repository. Next, the textual components of the manuscripts are extracted and recognized using Tesseract 4 OCR, with default and self-adapted training sets, and a deep learning algorithm. In the third phase, the bag-of-words approach categorizes the palm leaf manuscripts through a voting process based on the textual components recognized by OCR. Experiments evaluated the proposed approach with and without voting, with varying bag-of-words sizes, and with default and self-adapted training datasets, using Tesseract OCR and a deep learning model. The results show that, with Tesseract OCR, the bag-of-words technique performs comparably with and without voting: document classification reaches an overall accuracy of 83% without voting and 84.5% with voting, with an F-score of 0.90 in both cases. Overall, trial-wise experiments with the bag of words indicate that the proposed approach is highly generalizable, offering a reliable way to classify deteriorated handwritten Malayalam palm leaf manuscripts.
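The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_by_voting`, the category labels, and the placeholder tokens are all hypothetical, and the sketch assumes one vote per recognized word that appears in a category's vocabulary.

```python
from collections import Counter


def classify_by_voting(recognized_words, bag_of_words):
    """Pick the category whose vocabulary matches the most recognized words.

    recognized_words: iterable of tokens produced by an OCR step (assumed input).
    bag_of_words: dict mapping a category label to a set of characteristic words
                  (hypothetical repository layout).
    Returns (category, votes); category is None when no word matches.
    """
    votes = Counter()
    for word in recognized_words:
        for category, vocabulary in bag_of_words.items():
            if word in vocabulary:
                votes[category] += 1  # one vote per matching word occurrence
    if not votes:
        return None, votes
    # Ties are resolved in favor of the category that received a vote first.
    category, _ = votes.most_common(1)[0]
    return category, votes


# Placeholder repository and OCR output, for illustration only.
repository = {
    "category_a": {"w1", "w2", "w3"},
    "category_b": {"w3", "w4"},
}
label, tally = classify_by_voting(["w1", "w2", "w4", "zz"], repository)
print(label, dict(tally))
```

Classifying without voting could, under the same assumptions, amount to assigning the category of the first matching word; the abstract does not specify that baseline, so it is not shown here.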
| Original language | English |
|---|---|
| Pages (from-to) | 4031-4049 |
| Number of pages | 19 |
| Journal | Journal of Intelligent and Fuzzy Systems |
| Volume | 45 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - Aug 24 2023 |
| Externally published | Yes |
Keywords
- ancient document images
- deep learning
- document image classification
- handwritten document analysis
- palm leaf manuscripts
- Tesseract OCR
ASJC Scopus subject areas
- Statistics and Probability
- General Engineering
- Artificial Intelligence