Optical Character Recognition for Nastaleeq Printed Urdu Text using Histogram of Oriented Gradient Features

Awais Ahmad; Fatima Yousaf; Tanzeela Kousar

doi:10.66108/mna.v3i1.67

Authors

Awais Ahmad, Department of Computer Science, Bahauddin Zakariya University, Multan, 60000, Pakistan
Fatima Yousaf, Department of Computer Science, Bahauddin Zakariya University, Multan, 60000, Pakistan
Tanzeela Kousar, Institute of Computer Science and Information Technology, The Women University Multan, 60000, Pakistan

DOI:

https://doi.org/10.66108/mna.v3i1.67

Keywords:

Urdu language, Optical Character Recognition, HOG features, Connected Components, Support Vector Machine

Abstract

Research on optical character recognition (OCR) focuses on digitizing text in images. Urdu OCR is challenging because of the script's complexity: a character can take multiple shapes depending on its position in the word, which makes recognition more difficult than for English and similar languages. The proposed research aims to recognize offline printed Urdu text using a segmentation-free, holistic approach, in which each ligature is treated as a whole unit. Horizontal histogram projection is used to extract text lines from an image, and connected components labelling is used to segment ligatures within each extracted text line. To train the proposed model, a set of 14 statistical features along with histogram of oriented gradients (HOG) features is extracted for each sub-word/ligature. The open-source UPTI dataset is used to train and test the proposed algorithm, and a support vector machine (SVM) with an RBF kernel is used to classify the ligatures. The proposed algorithm achieves a 97.3% character recognition rate on this dataset.
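
The two segmentation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_lines` and `label_components`, the use of 8-connectivity, and the tiny binary test image are all assumptions made for illustration, and the sketch uses only the Python standard library. The image is assumed to be already binarized, with 1 for ink and 0 for background.

```python
from collections import deque


def split_lines(img):
    """Horizontal projection: return (top, bottom) row ranges of text lines.

    `bottom` is exclusive. A line is a maximal run of rows containing ink.
    """
    profile = [sum(row) for row in img]  # ink pixels per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                     # first inked row of a new line
        elif count == 0 and start is not None:
            lines.append((start, y))      # blank row closes the line
            start = None
    if start is not None:                 # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


def label_components(img):
    """Connected components labelling (8-connectivity, an assumption here).

    Returns one bounding box (top, left, bottom, right), inclusive,
    per component, in raster-scan order of discovery.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:                  # breadth-first flood fill
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 6x7 binary page: two text lines, the first with two blobs.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = split_lines(page)
components = [label_components(page[top:bottom]) for top, bottom in lines]
```

In a fuller pipeline, each bounding box would be cropped, size-normalized, and described by the statistical and HOG features before being passed to the SVM classifier. Real Nastaleeq text would likely also need a step that associates dots and other diacritic components with their main ligature body, which this sketch omits.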
