India | Volume 4 Issue 9, September 2015

Euclidean Distance Based Text Line Extraction and Skew Correction

Neha [452] | Apoorva Arora [2]

Abstract: Many cultural, educational, governmental, and commercial organizations manage a wide range of handwritten text documents. Text line extraction from such documents remains a challenging problem in document image analysis because handwritten text is often multi-skewed. Skew detection and correction of the extracted text lines is therefore a crucial step. In this paper, we propose a technique for text line extraction followed by skew correction, based on Euclidean distance and suited to handwritten documents. We aim to handle both single-skew and multi-skew handwritten text produced by single or multiple writers. The problem is formulated as an energy minimization, which affects the accuracy of text line extraction. It is also necessary to correct the skew of the lower baseline and the fluctuations of the text lines. Text lines are then extracted one by one according to the proximity of adjoining words, and the skew of each text line is aligned to the horizontal. The technique was applied to 90 documents covering various scripts, font sizes, and font styles, written by single or multiple writers and containing multi-skewed text. Extensive research has been done on handwriting recognition systems that recognize and classify characters with the highest possible accuracy in the shortest possible time. However, according to our survey, existing systems extract each text line feature individually, using a different technique for each feature, which leads to a large amount of processing. We therefore aim to classify text line documents with the highest possible accuracy and in the shortest possible time by extracting the text lines rather than segmenting the text lines and words.

Keywords: Text Line Extraction, Skew Detection and Correction, Euclidean Distance, Gradient of a Line
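
As a rough illustration of how the two geometric keywords relate, the sketch below computes the Euclidean distance between two points, derives a skew angle from the gradient of the line joining them, and rotates the points so that the line becomes horizontal. It is only a minimal sketch under stated assumptions, not the procedure used in this paper: the function names (`euclidean_distance`, `skew_angle`, `deskew`) and the idea of representing a text line by the coordinates of two baseline points are hypothetical and chosen purely for illustration.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def skew_angle(p, q):
    """Angle (radians) of the line p -> q relative to the horizontal.

    atan2 is used instead of dy/dx so that vertical lines do not
    cause a division by zero.
    """
    return math.atan2(q[1] - p[1], q[0] - p[0])

def deskew(points, angle, origin):
    """Rotate points by -angle about origin so the skewed line is horizontal."""
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    ox, oy = origin
    rotated = []
    for x, y in points:
        dx, dy = x - ox, y - oy
        rotated.append((ox + dx * cos_a - dy * sin_a,
                        oy + dx * sin_a + dy * cos_a))
    return rotated

# Hypothetical baseline end points of one skewed text line (assumed data).
baseline = [(0.0, 0.0), (3.0, 4.0)]
distance = euclidean_distance(baseline[0], baseline[1])
angle = skew_angle(baseline[0], baseline[1])
corrected = deskew(baseline, angle, origin=baseline[0])
```

In this toy case the second point ends up on the same horizontal level as the first, at a distance equal to the original Euclidean distance between them. In a real document, the points would have to come from an actual baseline or word-position estimate, which this sketch does not attempt.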

