An approach To develop an optical character recognizer for printed sinhala text using C#

Karunasiri, M. G. P.

dc.contributor.author	Karunasiri, M. G. P.
dc.date.accessioned	2024-04-26T09:20:22Z
dc.date.available	2024-04-26T09:20:22Z
dc.date.issued	2014
dc.description.abstract	This paper describes an effort to develop a character recognizer for printed Sinhala text using C#. Many earlier approaches to recognizing printed Sinhala text used MATLAB, which provides the image processing and classification methods itself and so hides many of the implementation issues that can arise during development. A practical OCR implementation in a commercially developed application has to address these issues, yet most previous work fails to highlight them. C#.NET is a commercial framework developed by Microsoft Corporation that provides many built-in methods, along with many freely and commercially available third-party classes, and it is one of the most widely used frameworks in the present IT industry. AForge.NET is a third-party framework for C# that provides many image processing and AI-related functions. Digitizing an image involves several steps, mainly preprocessing, classification and recognition. Many preprocessing methods are tested using AForge.NET built-in methods and some C# built-in methods, and most of the implementation issues that can occur when implementing a digitization algorithm are highlighted. Classification is based on treating each line of pixels, row-wise and column-wise, as a binary number and taking its decimal equivalent. In the preliminary classification, all characters of the alphabet are divided into four groups, which yields a recognition accuracy of over 90%. In the secondary classification, each group is analyzed separately to recognize individual characters, which yields 70% accuracy for the fonts FS Sumudu, FM Derana and MI Ridhma. Fonts such as DL Araliya, DL Kusumi, DL Anupama and DL Paras yield an accuracy below 70%.
A hidden Markov model based alphabet training approach is used to give further insight into the secondary classification.
dc.identifier.uri	https://ir.lib.pdn.ac.lk/handle/20.500.14444/399
dc.language.iso	en_US
dc.publisher	University of Peradeniya
dc.subject	C#
dc.subject	Optical character recognizer
dc.subject	OCR
dc.title	An approach To develop an optical character recognizer for printed sinhala text using C#
dc.type	Thesis
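
The row-wise and column-wise encoding mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity rather than the C# and AForge.NET used in the thesis, and it is not the author's implementation. The names `line_to_decimal` and `row_column_signature` and the 4x4 sample glyph are hypothetical, and the bit order (leftmost or topmost pixel as the most significant bit) is an assumption, since the abstract does not specify it.

```python
from typing import List, Tuple

# A binarized glyph: 1 = ink pixel, 0 = background (assumed convention).
Glyph = List[List[int]]


def line_to_decimal(bits: List[int]) -> int:
    """Read one line of 0/1 pixels as a binary number, first pixel most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def row_column_signature(glyph: Glyph) -> Tuple[List[int], List[int]]:
    """Return the decimal equivalents of every pixel row and every pixel column."""
    rows = [line_to_decimal(row) for row in glyph]
    # zip(*glyph) transposes the grid so each column can be read top to bottom.
    cols = [line_to_decimal(list(col)) for col in zip(*glyph)]
    return rows, cols


# Hypothetical 4x4 glyph used only for illustration.
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

rows, cols = row_column_signature(glyph)
print(rows)  # [6, 9, 15, 9]
print(cols)  # [7, 10, 10, 7]
```

A classifier along the lines the abstract describes would presumably compare such row and column values between an unknown glyph and reference characters; how the thesis groups or matches them is not stated in this record.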

Files

Original bundle

Name: Karunasiri 2014.pdf
Size: 444.29 KB
Format: Adobe Portable Document Format

Collections

Master