An Improved Algorithm for Optical Character Recognition using Graphical User Interface Design

SHAHID MANZOOR1, NIMRA WAHAB1, M. K. A. AHAMED KHAN2

1Department of Electrical and Electronic Engineering,

Faculty of Engineering, Technology & Built Environment,

UCSI University, Kuala Lumpur, 56000,

MALAYSIA

2Department of Mechanical Engineering,

Faculty of Engineering, Technology & Built Environment,

UCSI University, Kuala Lumpur, 56000,

MALAYSIA

Abstract: - Since the COVID-19 pandemic, many tasks have become necessary, including storing and sharing printed material across computers. One simple way to save data from printed papers to a computer system is to scan them and then save them as images. However, it is quite challenging to extract or query text or other information from these image files in order to reuse it. As a result, a method for automatically retrieving and storing information, particularly text, from image files is required. Optical character recognition (OCR) is an ongoing research topic that aims to create a computer system capable of extracting and processing text from images. To achieve successful automation, certain significant problems must be identified and addressed. The font properties of characters in paper documents and the image quality are only a few of these problems, and characters may not be recognized correctly by the computer system because of many complexities. In this study, the authors therefore investigate OCR in four different contexts and apply it to obtain the results. Every OCR application is then followed by two steps. First, a comprehensive explanation of the challenges that may arise during the OCR phases is provided. Then, the key phases of an OCR system are executed, including pre-processing, segmentation, normalization, feature extraction, classification, and post-processing. The resulting system can be used with deep learning software to provide OCR data, which is very useful for robotic and AI applications.

Key-Words: - Optical Character Recognition, Image Processing, Feature Extraction, Segmentation, AI, RGB, Deep Learning, Automatic Number Plate Recognition, Graphical User Interface.

Received: July 20, 2022. Revised: October 24, 2023. Accepted: November 27, 2023. Published: December 31, 2023.

1 Introduction

Paper files can meet the basic, minimum requirements for data storage. Critical files can be kept within reach and, if they are well catalogued, what you are looking for can be found with ease. However, just like everything else around us, there is a major catch to this, [1], [2], [3]. There are several reasons why data should not be stored on paper, whether the concern is privacy, the environment, or efficiency. This is where Optical Character Recognition (OCR) comes into play, [4], [5]. OCR is a promising technique for converting handwritten letters or phrases into a digital version, [6], [7], as shown in Figure 1. It is a common technique for digitizing printed texts so that they can be edited, searched, stored more compactly, and displayed on the internet, [8].

Optical character recognition is one of the most intriguing and demanding topics in pattern recognition in the electronic era. Recognition systems have evolved to handle not just printed and handwritten characters, but also offline characters, [9], [10], [11]. The steps involved in an OCR recognition system are crucial. In addition, to achieve good predictions, a recognition system relies heavily on a well-defined feature extraction method and a robust classifier, [12], [13], [14].

Presently, there is a great deal of interest in transferring the data contained in paper records to a computer storage drive and then reusing this data through search, [15], [16]. The source document may be handwritten and captured as a photograph, [17].

WSEAS TRANSACTIONS on SIGNAL PROCESSING

DOI: 10.37394/232014.2023.19.20

Shahid Manzoor, Nimra Wahab, M. K. A. Ahamed Khan

E-ISSN: 2224-3488

192

Volume 19, 2023

OCR translates images into machine-encoded, editable text, [9], [10]. It recognizes only those characters for which the system has been trained using a specific classification algorithm. Numerous algorithms have been used and tested for handwritten English text, [11], [12], [14].

Fig. 1: Representation of OCR

In this paper, we investigate four OCR applications with different feature extraction, segmentation, and pre- and post-processing methods. First, we investigate the difficulties that may arise during the OCR phases. Furthermore, we create a graphical user interface (GUI) for the OCR system so that it can be applied to various processing tasks and applications. These tasks include pre-processing, segmentation, normalization, feature extraction, classification, and post-processing. The findings demonstrate that the system used in this paper was able to successfully retrieve data from both images and text. The literature review is given in the next section. The System Design is then discussed, followed by the Methodology and the Results and Discussion sections. Finally, the conclusions are presented.

2 Literature Review

2.1 History of OCR

It appears that the first Optical Character Recognition device was developed in the late 1920s by the Austrian engineer Gustav Tauschek (1899-1945), who received a patent for a Reading Machine in Germany in 1929. He was followed by Paul Handel, who received a patent for a Statistical Machine in the United States. An image containing text was passed in front of the window of the reading machine. The comparison device was a rotating disc or wheel with letter-shaped holes, housed within the objective lens. When the shape of the image matched one of the letter-shaped holes, the clockwork rotated the printing drum to the corresponding letter and printed it, [8]. The machine is shown in Figure 2.

Fig. 2: The first machine invented for OCR

2.2 Isolated Versus Cursive

In scripts with isolated characters, the characters do not join. In cursive script, however, the situation is just the opposite: adjacent characters in words not only join together but also change their shape based on their position, [9]. This adds a great deal of complexity to the recognition process, necessitating an additional level of segmentation to separate the characters in each word before proceeding. Because this can be difficult to accomplish for various languages, segmentation-free approaches have been proposed on multiple occasions. The segmentation-free approach aims to recognize the word as a whole without separating it into individual letters.

2.3 Offline Vs Online

Offline text recognition attempts to recognize text that has previously been acquired in the form of images. A digital device such as a camera or scanner is typically used to bring the text data into the computer. Online recognition refers to real-time recognition while the user moves a pen to write. Text input through online recognition requires specialized hardware, such as a tablet and a stylus. Online recognition is regarded as less complicated because temporal information from the pen traces, such as the pen speed, the pen lifts, and the writing order, is readily available, [8]. The first type of system is one with a restricted vocabulary, such as postal address recognition and geographic name recognition, as represented in Figure 3.


Fig. 3: Flow chart for online vs offline recognition

2.4 Street View House Numbers (SVHN)

Street View House Numbers (SVHN) is a real-world image dataset that can be used to develop machine learning and object recognition algorithms with minimal requirements for data pre-processing and formatting. As the name implies, it is a dataset of house numbers obtained from Google Street View, as presented in Figure 4. The task is of moderate difficulty. The digits come in a variety of shapes, fonts, and writing styles, but each house number is positioned in the center of the view, so localization is not required. One disadvantage of SVHN is that the images are of low resolution and the digits may be arranged in an unusual manner, [9], [10].

Fig. 4: OCR applied to house numbers

2.5 CAPTCHA

Since the internet is full of automated bots, visual tasks, primarily text-decoding CAPTCHAs, are a very popular way to tell humans and robots apart. Many of these CAPTCHAs contain jumbled and distorted text that is difficult for a machine to decipher, as observed in Figure 5. It is doubtful whether the creators of CAPTCHAs anticipated the advancements in computer vision, but most text CAPTCHAs are no longer difficult for machines to solve, [11].

Fig. 5: CAPTCHA used to distinguish between humans and robots

2.6 License Plates

Another OCR task is license plate recognition. It is not very difficult, but according to surveys it is useful in practice. As with most OCR tasks, it involves detecting the license plate and identifying the characters on it, as presented in Figure 6. Since the outline of the plate is relatively stable, some approaches rely on a basic reshaping step before recognizing the characters.

Fig. 6: License plate recognition

3 System Design

3.1 Picture Acquisition

Image acquisition is the process of converting a physical image into a digital format that can be processed by computers. There are a variety of sources for collecting images, such as scanning with a device or photographing with an attached camera. The source image can be of different types, such as handwritten, printed, or typewritten documents.

3.2 Refining

After image acquisition, pre-processing (refining) aims to improve the quality of the image. One refining technique is thresholding, which binarizes the image based on a threshold value. Various types of filters can be applied, such as the min filter, the max filter, and averaging. In addition, morphological operations such as erosion, dilation, opening, and closing can be performed.
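The filters and morphological operations listed for the refining step can be sketched in a few lines. The snippet below is a minimal pure-Python illustration, not the authors' MATLAB® code; it assumes a grayscale image stored as a list of rows of integers in 0-255, a 3×3 neighborhood, and borders handled by using only the in-bounds neighbors. On grayscale images the min and max filters act as erosion and dilation, so opening and closing follow by composing them. All function names are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image: rows of 0-255 integers


def filter3x3(img: Image, reduce: Callable[[List[int]], int]) -> Image:
    """Apply `reduce` to each pixel's 3x3 neighborhood (border pixels use in-bounds neighbors only)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = reduce(window)
    return out


def min_filter(img: Image) -> Image:
    return filter3x3(img, min)  # grayscale erosion


def max_filter(img: Image) -> Image:
    return filter3x3(img, max)  # grayscale dilation


def mean_filter(img: Image) -> Image:
    return filter3x3(img, lambda v: round(sum(v) / len(v)))  # averaging filter


def opening(img: Image) -> Image:
    return max_filter(min_filter(img))  # erosion followed by dilation


def closing(img: Image) -> Image:
    return min_filter(max_filter(img))  # dilation followed by erosion
```

For example, opening a 5×5 image that contains a single bright pixel returns an all-zero image, whereas a 3×3 bright block survives opening unchanged. Strokes thinner than the window are removed as well, so the window size has to be chosen relative to the stroke width.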

3.3 Thresholding

Typically, the image obtained from scanning or another acquisition procedure is in RGB or indexed format. Such images contain color information for each pixel; for instance, RGB images contain RED, GREEN, and BLUE values in the range 0-255 for each pixel, from which the pixel's intensity can be derived. This color information has minimal influence on the identification process but adds unnecessary complexity, so the images are first converted to grayscale, [3]. The grayscale image is then turned into a bi-level image to determine whether each pixel belongs to the foreground or the background. Each pixel in a bi-level image has just one of two possible values, either 1 or 0.
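As a sketch of this conversion chain, the Python below converts RGB pixels to grayscale with the ITU-R BT.601 luma weights (0.299, 0.587, 0.114, which are close to the weights used by MATLAB®'s `rgb2gray`) and then binarizes the result. The text does not say how the threshold is chosen, so Otsu's method, which picks the threshold that maximizes the between-class variance of the gray-level histogram, is swapped in here as an assumption. The function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[int]]


def rgb_to_gray(img: List[List[RGB]]) -> Gray:
    """Weighted sum of R, G, B (BT.601 luma weights, an assumption), rounded to 0-255."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def otsu_threshold(gray: Gray) -> int:
    """Return the t that maximizes between-class variance of {v <= t} vs {v > t}.
    For a uniform image no split exists and 0 is returned."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0              # pixel count and intensity sum of the class v <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray: Gray, t: int, text_is_dark: bool = True) -> List[List[int]]:
    """Bi-level image: 1 = foreground (text), 0 = background."""
    if text_is_dark:
        return [[1 if v <= t else 0 for v in row] for row in gray]
    return [[1 if v > t else 0 for v in row] for row in gray]
```

The `text_is_dark` flag covers both dark print on light paper and light lettering on a dark background; for a 20/230 two-level image, any threshold between the levels separates them, and the function returns the first such value.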

3.4 De-skewing

When text lines are tilted as a result of poor photocopying or scanning, skew is introduced into the document image. Pre-processing may therefore also include skew correction, which adjusts for this error; this is referred to as de-skewing. Since skewed text may not be adequately processed during the segmentation phase, the de-skewing step is critical. A related task is orientation detection, which aids in de-skewing the data.
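The text does not specify how the skew angle is measured. One common approach, sketched below purely as an illustration, is a projection-profile search: the foreground pixels of the binary image are rotated through a range of candidate angles, and the angle at which the row histogram is most sharply peaked is kept, because level text lines pile their pixels into few rows. The search range (±10°), the step (0.5°), the sum-of-squares sharpness score, and the name `estimate_skew` are all assumptions.

```python
import math
from typing import List


def estimate_skew(binary: List[List[int]], max_deg: float = 10.0, step: float = 0.5) -> float:
    """Return the angle (degrees) whose row projection profile is sharpest.

    A positive result means the text line's row index grows with the column
    index (the line descends to the right in image coordinates).
    """
    points = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    best_angle, best_score = 0.0, -1.0
    n_steps = int(round(2 * max_deg / step))
    for i in range(n_steps + 1):
        angle = -max_deg + i * step
        theta = math.radians(angle)
        profile = {}
        for r, c in points:
            # Row coordinate after rotating the point to undo a skew of `angle`.
            y = round(r * math.cos(theta) - c * math.sin(theta))
            profile[y] = profile.get(y, 0) + 1
        score = sum(v * v for v in profile.values())  # large when pixels share few rows
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Rotating the image by the estimated angle in the opposite direction then levels the text before segmentation. Because every candidate angle touches every foreground pixel, the cost grows with both the angle range and the image size, so a coarse-to-fine search is a natural refinement.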

3.5 Feature Extraction

After the characters have been isolated, the next step is to obtain as much detail as possible from each one to distinguish it from the others. This step is called feature extraction and is represented in Figure 7. In this process, the attributes of each character are extracted. Four main types of features are extracted during the OCR process: structural features, statistical features, global transformations, and template matching and correlation. Structural features consist of curves, endpoints, intersection points, zigzags, dots, loops, and strokes; other information, such as the direction and slope of the loops and their length, can also be extracted. Statistical features include pixel densities, zoning, crossings, moments, and Fourier descriptors, which are numerical indices of image regions. In global transformations, the pattern's pixel representation is transformed into a more abstract and compact form.

We begin by normalizing all binary character images to an N×N matrix while preserving the original aspect ratio; N is set to 60 in our example. Two sets of features are used. The first set is based on zoning: the input image is divided into vertical and horizontal zones, and the density of character pixels is calculated for each zone.
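A minimal Python sketch of this normalization and zoning step follows. It assumes nearest-neighbor scaling with the character centered on a zero-padded N×N canvas (N = 60, as in the text) and four horizontal plus four vertical strips. The text states neither the number of zones nor the resampling method, so both are assumptions, and `normalize` and `zone_densities` are hypothetical names.

```python
from typing import List

Binary = List[List[int]]  # binary character image: 1 = character pixel, 0 = background


def normalize(char_img: Binary, n: int = 60) -> Binary:
    """Scale into an n x n canvas with the aspect ratio preserved
    (nearest-neighbor sampling), centered with zero padding."""
    h, w = len(char_img), len(char_img[0])
    scale = n / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    out = [[0] * n for _ in range(n)]
    top, left = (n - new_h) // 2, (n - new_w) // 2
    for r in range(new_h):
        src_r = min(h - 1, int(r / scale))
        for c in range(new_w):
            src_c = min(w - 1, int(c / scale))
            out[top + r][left + c] = char_img[src_r][src_c]
    return out


def zone_densities(img: Binary, zones: int = 4) -> List[float]:
    """Character-pixel density of `zones` horizontal strips, then `zones` vertical strips.
    `zones` must not exceed the image height or width."""
    n_rows, n_cols = len(img), len(img[0])
    feats = []
    for k in range(zones):  # horizontal strips (groups of rows)
        r0, r1 = k * n_rows // zones, (k + 1) * n_rows // zones
        on = sum(sum(img[r]) for r in range(r0, r1))
        feats.append(on / ((r1 - r0) * n_cols))
    for k in range(zones):  # vertical strips (groups of columns)
        c0, c1 = k * n_cols // zones, (k + 1) * n_cols // zones
        on = sum(img[r][c] for r in range(n_rows) for c in range(c0, c1))
        feats.append(on / ((c1 - c0) * n_rows))
    return feats
```

For a solid 10×5 block, normalization yields a 60×30 bar centered in the canvas, so each horizontal strip has density 0.5 while the four vertical strips read 0, 1, 1, 0.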

Fig. 7: Feature Extraction

3.6 Post Processing

OCR accuracy may be enhanced through contextual analysis: the geometric and document context of the image can be used to detect and eliminate errors. To further enhance accuracy, lexical processing using Markov models and dictionaries can be applied.
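Dictionary-based lexical correction can be illustrated with a small Python sketch. Each recognized token is compared with a word list using the Levenshtein edit distance and is replaced by the closest entry when that entry is within a small distance. This is a simple stand-in, not the Markov-model processing mentioned above; the word list, the one-edit tolerance, and the function names are assumptions for illustration.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct_word(word: str, dictionary: Iterable[str], max_dist: int = 1) -> str:
    """Return the closest dictionary word within max_dist edits, else the word unchanged."""
    best, best_d = word, max_dist + 1
    for cand in dictionary:
        d = edit_distance(word.upper(), cand.upper())
        if d < best_d:
            best, best_d = cand, d
    return best if best_d <= max_dist else word
```

With the word list `["HOW", "ARE", "YOU"]`, a misread token `H0W` (a zero instead of the letter O) is corrected to `HOW`, while a token with no close match is returned unchanged.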

4 Methodology

The OCR software is operated through a graphical user interface, so we start by designing the user interface. Typing 'guide' in the MATLAB® Command Window opens a new dialog box in which the graphical user interface (GUI) can be tailored to our requirements. This helps in creating a template that is simple and meets our needs. In this window, we design the buttons and screens for input and output. Three push buttons were used in our GUI design: Input, Text Detection, and Output, and two screens show the display after the image has been chosen.

First, images of all of the letters and numbers are stored separately in a folder on our desktop for character recognition. All uppercase and lowercase letters, as well as the digits 0-9, are used for training, for example M, m, N, n, etc.
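The stored character images can serve as templates. The paper later describes comparing each segmented character with the whole character database; one simple way to do this, sketched below in Python as an assumption rather than the authors' code, is to score each template with the 2-D correlation coefficient (the same formula computed by MATLAB®'s `corr2`) and pick the best match. The dictionary-of-templates layout and the names `corr2` and `recognize` are hypothetical, and every image is assumed to be binary and resized to the same shape beforehand.

```python
import math
from typing import Dict, List

Binary = List[List[int]]


def corr2(a: Binary, b: Binary) -> float:
    """2-D correlation coefficient of two equally sized images (0.0 if either is constant)."""
    xa = [v for row in a for v in row]
    xb = [v for row in b for v in row]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    num = sum((p - ma) * (q - mb) for p, q in zip(xa, xb))
    den = math.sqrt(sum((p - ma) ** 2 for p in xa) * sum((q - mb) ** 2 for q in xb))
    return num / den if den else 0.0


def recognize(char_img: Binary, templates: Dict[str, Binary]) -> str:
    """Label of the template with the highest correlation to char_img."""
    return max(templates, key=lambda label: corr2(char_img, templates[label]))
```

Because the correlation coefficient subtracts each image's mean, one extra noise pixel lowers the score only modestly, so a slightly corrupted "I" still matches the "I" template best.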


Fig. 8: Flow chart for GUI Processing Design

4.1 GUI Process

Graphical user interface (GUI) testing is the practice of testing an application's user interface. Menus, checkboxes, buttons, colors, fonts, sizes, icons, text, and pictures are all part of a graphical user interface. GUI testing is performed, as indicated in Figure 8, to ensure the functionality and usability of the design components from the perspective of a user of the application under test. Most online exchanges are not intended to be definitive, but rather to enable and actively support growth. GUI testing is critical to the software's successful release since it verifies the user experience, and it contributes to the delivery of high-quality, user-friendly software. Ultimately, this leads to increased customer acquisition and satisfaction.

In accordance with the design requirements and objectives, a test script was first created to follow and read all of the stages. The design includes two windows. The GUI then reads the picture and detects each letter or number.

5 Results and Discussion

To begin, we select an image from the folder as an example. The picture is then passed to the text detection command, which recognizes each letter separately. The first example image contains the simple sentence "HOW ARE YOU?", as shown in Figure 9. As shown in Figure 10, the output reproduces the supplied sentence without mistakes: the code runs successfully, and the GUI displays the correctly detected text.

Fig. 9: Text test input

Fig. 10: Coded GUI for processing image data

In the next case, we examine license plates, which allows us to recognize both numbers and letters at the same time. Recognizing license plates is also known as automatic number plate recognition (ANPR). Such a system can be installed at the entrance of a highly restricted area for security management, for example military zones or areas around major government buildings such as Parliament and the Supreme Court. The developed system detects the vehicle and then captures a picture of it.

Fig. 11: Image for identifying number plate


This concept was developed in 1976 at the Police Scientific Development Branch in the United Kingdom. However, with the advancement of digital cameras and the rise in computing capability over the previous decade, it has attracted a lot of attention. ANPR is the ability to automatically extract and recognize the characters of a car license plate from a picture. In essence, it comprises a camera or frame grabber that captures a picture, a step that identifies the location of the number plate in the image, and a step that extracts the characters so that a character recognition tool can convert the pixels into machine-readable characters. ANPR can be used for a variety of purposes, ranging from speed enforcement and toll collection to parking lot management. It can also be used to identify and prevent a wide range of criminal activities, as well as to regulate security in highly restricted locations such as military zones or areas around important government offices.

Fig. 12: GUI processes the input image of Figure 11

To begin, the camera is connected to the PC through the USB port and accessed from Matlab®.

Various photos of automobiles of different colors and body types are captured and saved on a computer. During processing, the effects of different daylight conditions are also evaluated. As seen in Figure 11, the pictures are in RGB format and have a resolution of 800 × 600 pixels. To detect the car number, an optical character recognition (OCR) method is used. The cropped picture obtained in the second phase is inverted, which means that all white pixels are turned to black and all black pixels to white, so that the lettering becomes white while the background of the plate is black. Before OCR is applied, the individual lines of text are separated using the line separation technique.

Number plate recognition is performed on two cars: a stationary car, shown in Fig. 11, and a moving car, shown in Fig. 13. The results of GUI processing in Fig. 12 and Fig. 14 show that the number plates are extracted successfully.

Fig. 13: Moving car input number plate picture

Fig. 14: GUI processes the image input of Fig. 13

Line separation sums the pixel values of each row. If the sum of a row is zero, the row contains no text pixels; if it is greater than zero, the row contains text. A line begins at the first row whose sum is greater than zero and ends at the next row whose sum is equal to zero. The start and end rows are used to crop the first line of text, and the same method is used to separate the second line. After the lines of the extracted license plate have been separated, the same process is applied column by column to separate the individual characters, which are then stored in separate variables. Then, OCR compares each character with the entire character database.
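The inversion and the row and column projection segmentation described above can be sketched in Python as follows. The sketch assumes a binary plate image stored as a list of rows of 0/1 values; after `invert`, the lettering is 1 on a 0 background, and a span is closed at the first zero sum exactly as the text describes. The helper names are hypothetical, and this is not the authors' MATLAB® code.

```python
from typing import List, Tuple

Binary = List[List[int]]


def invert(img: Binary) -> Binary:
    """Swap 0 and 1 so that the lettering becomes foreground (1) on a 0 background."""
    return [[1 - v for v in row] for row in img]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """[start, end) index pairs of consecutive non-zero entries in a projection profile."""
    spans, start = [], None
    for i, s in enumerate(profile):
        if s > 0 and start is None:
            start = i                    # first non-zero sum opens a span
        elif s == 0 and start is not None:
            spans.append((start, i))     # first zero sum closes it
            start = None
    if start is not None:                # text touching the last row/column
        spans.append((start, len(profile)))
    return spans


def split_lines(img: Binary) -> List[Binary]:
    """Crop each text line using the row sums."""
    row_sums = [sum(row) for row in img]
    return [img[r0:r1] for r0, r1 in runs(row_sums)]


def split_characters(line: Binary) -> List[Binary]:
    """Crop each character of one line using the column sums."""
    col_sums = [sum(col) for col in zip(*line)]
    return [[row[c0:c1] for row in line] for c0, c1 in runs(col_sums)]
```

Each character crop returned by `split_characters` can then be resized to a common size and passed to a character matcher. Characters that touch each other share non-zero columns and come out as one crop, which is a known limitation of plain projection segmentation.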

6 Conclusions and Further Work

Although several algorithms, methods, and strategies for optical character recognition in scene images have been developed, there are not enough literature reviews on this subject. In this work, we presented an overview of various approaches, algorithms, and strategies. It is hoped that this assessment will give insight into the ideas involved and may stimulate future progress in the field. We first discussed the major challenges of OCR, followed by a detailed discussion of the most important phases, the architecture, the proposed algorithms, and the techniques of OCR. We emphasize that when designing any OCR application, one must pay close attention to each phase to achieve a high character recognition rate. However, we are unable to provide detailed methods for each phase, since they depend on the datasets, the application particulars, and the parameter specifications. Finally, important OCR applications and a brief history of OCR were presented.

Although state-of-the-art OCR allows for high-accuracy text recognition, we believe that OCR has many more useful applications. In the future, we want to employ OCR in practical applications for everyday personal use, and we intend to combine mobile devices with OCR in a single solution. Some of our upcoming OCR-based applications include an automatic book reader and a receipt tracker, [15], [17]. OCR is no longer just matching or seeing: with deep learning it has entered a new phase in which it can recognize scanned text and then interpret its meaning within complete applications, [15]. Deep learning software provides more robust information extraction and higher-quality insight, and it can also be used in robotics and integrated into artificial intelligence applications such as ChatGPT, [16], [17].

References:

[1] D. A. Satti, Offline Urdu Nastaliq OCR for Printed Text using Analytical Approach, MS thesis, Quaid-i-Azam University, Islamabad, Pakistan, 2013. [Online]. https://www.cle.org.pk/research/theses.htm (Accessed Date: January 24, 2023).

[2] S. A. Mahmoud and B. Al-Badr, Survey and bibliography of Arabic optical text recognition, Signal Processing, 41(1), 1995, pp. 49-77.

[3] M. Bhansali and P. Kumar, An Alternative Method for Facilitating Cheque Clearance Using Smart Phones Application, International Journal of Application or Innovation in Engineering & Management (IJAIEM), 2(1), 2018, pp. 211-217.

[4] M. T. Qadri and M. Asif, Automatic Number Plate Recognition System for Vehicle Identification Using Optical Character Recognition, International IEEE Conference on Education Technology and Computer, Singapore, 2019.

[5] A. S. Abutaleb, Automatic thresholding of gray-level pictures using two-dimensional entropy, Computer Vision, Graphics, and Image Processing, 47(1), 1989, pp. 22-32.

[6] A. M. AL-Shatnawi, K. Omar, and A. M. Zeki, Challenges in thinning of Arabic text, ICGST International Conference on Artificial Intelligence and Machine Learning (AIML-11), Dubai, United Arab Emirates, 2016, pp. 127-133.

[7] J. Lazaro, J. L. Martín, J. Arias, A. Astarloa, and C. Cuadrado, Neuro semantic thresholding using OCR software for high precision OCR applications, Image and Vision Computing, 28(4), 2010, pp. 571-578.

[8] W. B. Lund, D. J. Kennard, and E. K. Ringger, Combining Multiple Thresholding Binarization Values to Improve OCR Output, Document Recognition and Retrieval XX Conference, SPIE, California, USA, 2015.

[9] N. A. Shaikh and Z. A. Shaikh, A generalized thinning algorithm for cursive and non-cursive language scripts, 9th International Multitopic Conference (IEEE INMIC), Pakistan, 2015.

[10] N. A. Shaikh, Z. A. Shaikh, and G. Ali, Segmentation of Arabic text into characters for recognition, International Multi Topic Conference (IMTIC), Jamshoro, Pakistan, 2018.

[11] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1977, pp. 1-38.

[12] L. Mochurad, A New Approach for Text Recognition on a Video Card, Computer Systems and Information Technologies, 2022, https://doi.org/10.31891/csit-2022-3-3.

[13] K. Karthick, K. B. Ravindrakumar, R. Francis, and S. Ilankannan, Steps Involved in Text Recognition and Recent Research in OCR; A Study, 2019, Corpus ID: 212448703.

[14] S. Faizullah, M. S. Ayub, S. Hussain, and M. A. Khan, A Survey of OCR in Arabic Language: Applications, Techniques, and Challenges, Applied Sciences, 13(7), 2023, 4584, https://doi.org/10.3390/app13074584.

[15] S. K. Adyanthaya, Text Recognition from Images: A Study, International Journal of Engineering Research and Technology (IJERT), NCCDS 2020, 8(13), pp. 118-120.

[16] R. Mittal and A. Garg, Text extraction using OCR: A Systematic Review, 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), 2020, pp. 357-362.

[17] M. Pal and D. Santra, Handwritten Character Recognition, SSRN Electronic Journal, 2023, https://dx.doi.org/10.2139/ssrn.4539983.

Contribution of Individual Author to the

Creation of a Scientific Article

- Shahid Manzoor provided the main idea, designed the methodology and formulation of the project, and finalized the simulation results using MATLAB®.

- Nimra Wahab executed the design, constructed the main MATLAB® code, and completed the GUI design.

- M. K. A. Ahamed Khan was involved in constructive discussions, the formulation of ideas, and analysis throughout the project.

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflict of Interest

The authors have no conflicts of interest to declare.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
