Extraction of Handwritten Text using Word Beam Search Algorithm and Language Modeling

Authors

  • Kavitha Ananth, Research Scholar, Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India
  • Kirubanand V B, Faculty, Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India

DOI:

https://doi.org/10.15379/ijmst.v10i2.2970

Keywords:

CTC (Connectionist Temporal Classification), RNN (Recurrent Neural Network), HTR (Handwritten Text Recognition), AI (Artificial Intelligence), HMM (Hidden Markov Model), MDRNN (Multi-Dimensional Recurrent Neural Network), CNN (Convolutional Neural Network), NMT (Neural Machine Translation), LSTM (Long Short-Term Memory), LM (Language Model), FSM (Finite State Machine)

Abstract

This article addresses the challenge of recognizing handwriting in mortgage records. Offline handwritten text recognition from images is a significant challenge for businesses trying to digitize large numbers of hand-marked scanned documents or reports. To translate an image into the sequence of characters that matches the text it contains, this research proposes an innovative language model combined with a deep convolutional network and a recurrent encoder-decoder network. Using the principles of deep learning and word beam search, the complete model is trained end to end as a replacement for conventional handwriting recognition techniques. A recurrent neural network (RNN) trained with the Connectionist Temporal Classification (CTC) loss function outputs a matrix that contains the character probabilities for each discrete time step. A CTC decoding algorithm then maps these character probabilities to the final text. The token passing mechanism is used to build the recognized text from a list of dictionary words. We also offer a novel and highly efficient method for developing restrictive classification models that can associate entity names according to the entity-type information contained in the article. A benchmark dataset based on the mortgage domain is included, and the presented model was tested on this set of published benchmark mortgage data. The experimental outcomes were also compared on IAM and RIMES, two openly accessible datasets. On the evaluation sets of both datasets, the model improves on state-of-the-art word-level precision by 2.5% and 1.3%, respectively.
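
To illustrate how a CTC decoder turns a matrix of per-time-step character probabilities into text, the following is a minimal sketch of best-path (greedy) CTC decoding. It is not the word beam search or token passing decoder described in the abstract, which additionally constrain the output to dictionary words. The function name, the toy character set, and the convention that the blank label occupies the last column are assumptions made for this illustration only.

```python
import numpy as np

def ctc_best_path_decode(probs, charset, blank_index=None):
    """Greedy CTC decoding: pick the most likely label at each time step,
    collapse consecutive repeats, then drop blanks.

    probs: array-like of shape (T, len(charset) + 1), one row per time step.
    charset: string whose i-th character corresponds to column i.
    blank_index: column of the CTC blank (assumed to be the last column).
    """
    probs = np.asarray(probs)
    if blank_index is None:
        blank_index = len(charset)  # assumption: blank is the last column
    best_labels = probs.argmax(axis=1)

    decoded = []
    previous = None
    for label in best_labels:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != blank_index:
            decoded.append(charset[label])
        previous = label
    return "".join(decoded)

# Toy example with characters "a", "b" and a blank in column 2.
charset = "ab"
probs = [
    [0.8, 0.1, 0.1],  # "a"
    [0.7, 0.2, 0.1],  # "a" again (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two "a" characters)
    [0.6, 0.3, 0.1],  # "a"
    [0.1, 0.8, 0.1],  # "b"
]
print(ctc_best_path_decode(probs, charset))  # aab
```

Best-path decoding considers only the single most likely label per time step and uses no dictionary or language model, which is the limitation that dictionary-constrained decoders such as token passing and word beam search are meant to address.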

Published

2023-07-30

How to Cite

[1]
K. Ananth and K. V. B, “Extraction of Handwritten Text using Word Beam Search Algorithm and Language Modeling”, ijmst, vol. 10, no. 2, pp. 2786-2795, Jul. 2023.