Albanian Handwritten Text Recognition using Synthetic Datasets and
Pre-Trained Models
HAKIK PACI*, DORIAN MINAROLLI, EVIS TRANDAFILI, STELA PATURRI
Computer Engineering Department,
Polytechnic University of Tirana,
Bul. “Dëshmorët e Kombit”, “Nënë Tereza", sheshi nr. 4, Tirana
ALBANIA
*Corresponding Author
Abstract: - Handwritten Text Recognition (HTR) has continuously attracted the attention of researchers
seeking to integrate technology into our daily lives. HTR, a technology of considerable importance, takes a
leading role in the analysis and digitization of various documents and facilitates the efficient use of
handwritten documents, especially within academic, historical, and cultural contexts. The use of artificial
intelligence in handwriting recognition offers a very good opportunity to achieve satisfactory results in this
field, but good results require a large dataset.
Creating a large dataset to train different AI models is a challenge for languages with limited resources such as
the Albanian language. This paper aims to present a novel approach to the development of an HTR system for
the Albanian language using an attention-based encoder-decoder architecture. The dataset used in the
experiments is a synthetic dataset generated using deep learning techniques based on the English language
dataset, as both are variants of the Latin alphabet. We enhanced the dataset with the two letters specific to
Albanian (“ë” and “ç”). The usage of pre-trained English models for handwriting recognition improved our
model’s performance. The results of the experiments are very promising and prove that our approach is
efficient in recognizing handwriting in the Albanian language. This shows that the attention-based encoder-
decoder architecture can be adopted for different languages with limited resources.
Key-Words: - HTR (Handwritten Text Recognition), Albanian language, Synthetic dataset, HTR Models,
Machine learning, Deep learning.
Received: June 29, 2023. Revised: February 16, 2024. Accepted: April 9, 2024. Published: May 15, 2024.
1 Introduction
According to the “Ethnologue” guide, there are
more than 7,000 languages in the world, including
dialects. Some of them are used by only a few
people, while just 23 languages are spoken by half
of the world's population. This means that, in the
long term, some of these languages are going to be
forgotten.
The Albanian language is one of the oldest
languages in the world and the only surviving
representative of the Albanoid branch, which
belongs to the Paleo-Balkan group, [1]. From a
grammatical perspective, it has major differences
compared to the other European languages. The
current Albanian alphabet has 36 letters and is based
on the Latin alphabet with the addition of letters ë,
ç, and nine digraphs: “dh”, “gj”, “ll”, “nj”,
“rr”, “th”, “sh”, “xh”, and “zh”.
Handwritten documents are a valuable resource
of information especially when trying to make
handwritten content like manuscripts, personal
correspondences, legal documents, and scientific
studies stored in archives, accessible and usable by
NLP systems.
Handwritten text recognition (HTR) is a
technique and ability of a computer to read data
from paper documents written by hand, [2].
The process of extraction of digital data from
paper documents can be achieved using Optical
Character Recognition (OCR) and Intelligent
character recognition (ICR) techniques, [3], [4]. The
OCR technique usually is used when the text is
printed and is a well-established technology. The
ICR is used to convert/extract data from images of
handwritten texts, and it is more complex than OCR
technology because it can also detect and recognize
different handwriting styles. Both technologies
focus on the recognition of individual characters, [4]
and they do not check if the generated characters are
real words in a linguistic and semantic context.
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2024.21.25
Hakik Paci, Dorian Minarolli,
Evis Trandafili, Stela Paturri
E-ISSN: 2224-3402
264
Volume 21, 2024
The technology for interpreting handwritten text
(not only the set of characters or words) into
machine-readable data has become increasingly
important in the current Artificial Intelligence era.
The recognition of handwritten text has always
attracted researchers to work on real-life
applications in healthcare, banking, insurance,
online libraries, etc. However, for the Albanian
language, there is a significant gap in this research
field. As far as we know, there is no publicly
available system, nor a dataset for supporting the
recognition of handwritten text in Albanian.
Nevertheless, we must note that several efforts
have been made to address the handwriting
recognition problem from an NLP (Natural
Language Processing) perspective. The first
attempts to deal with HTR were based on Hidden
Markov Models, [5], dating back 10 to 15 years ago.
Recent advancements in neural networks have
shifted the focus towards the architectures based on
neural networks which excel with sequential data.
Due to the virtually infinite variety of human
writing styles, the implementation of a Handwritten
Text Recognition system is laborious and requires a
very large amount of training data; without it,
performance is often poor. In this work, we aim to
modestly address this gap by proposing an HTR
system specifically designed for the Albanian
language and assessing and comparing its accuracy
and efficiency with state-of-the-art data. We aim to
contribute to the development of an HTR system for
the processing of Albanian texts by introducing a
novel approach, employing an encoder-decoder
architecture. For the creation of our Albanian HTR
model, we relied on [2], which proposed
attention-based sequence-to-sequence model for
English with an encoder composed of ResNet for
feature extraction and bidirectional LSTM for
sequence modeling.
There are no prior contributions to HTR in
Albanian; therefore, it was not possible to employ an existing
dataset of images of handwritten text prepared to
train our HTR model. To address this gap, we
generated a synthetic dataset through a deep
learning model able to replicate human handwriting
in different styles. Furthermore, the training process
was sped up by employing a transfer learning
technique with several modifications to adapt the
model to the peculiarities of the Albanian language.
The utilization of pre-trained English models proved
to be an effective solution in scenarios where there
is a limited availability of training data, providing
an enhanced performance for HTR models designed
for low-resource languages.
The paper is organized as follows: Section 2
covers the proposed approaches; Section 3
investigates the synthetic generation of the Albanian
dataset used to train the HTR model; Section 4
analyses the performance and evaluates the
effectiveness of the synthetic dataset in the
Attention HTR model; and finally, we conclude in
Section 5.
2 Proposed Approaches
In this section, we will explain the architecture of
the Attention HTR model. There are three main
components of the model: ResNet, bidirectional-
LSTM, and Transformer.
- Residual Network (ResNet) is a breakthrough
in the field of deep learning, especially in
Convolutional Neural Networks. The seminal
paper entitled "Deep Residual Learning for
Image Recognition", [6], [7], addressed a key
challenge of deep neural network training:
vanishing gradients, which become an issue
as the network depth
increases. The residual blocks in ResNet allow
deeper networks to train without the problem of
the vanishing gradient. This can be achieved by
using the output of the previous layer through a
skip connection as an input to a new layer. This
technique enables the network to learn residual
functions with respect to the layer inputs and,
therefore, facilitates better training of deep
networks. In the HTR problem, we use ResNet
in the feature extraction phase. The feature
extraction phase is a very important and critical
step in text recognition and transcription
because handwritten text is quite complex and
varies from one person's writing style to
another. Therefore, a network's ability to learn
the needed features and encode them from the
input data improves the performance of the HTR system.
- LSTM is a type of RNN with memory cells
that can store or retrieve information over long
sequences. The memory cells enable the model
to capture and remember long-range dependencies,
a characteristic feature of sequential data. As such, LSTMs are
a great option for this type of data processing,
be it NLP or HTR, [8]. BiLSTM is an
extension of the LSTM that introduces
bidirectional processing, [9]. The BiLSTMs can
capture both past and future contextual
information in a sequence because they process
the input sequence in two directions,
forward and backward. Thus, with BiLSTM in
HTR, a model may not only recognize
characters/words but also grasp the relationships
between them within a sentence or paragraph.
This is important because many people’s
writings are context-dependent.
- The Transformer architecture was initially
developed for machine translation tasks; its
ability to address multiple challenges has made
it the model, [10], [11], used by most NLP
applications. The Transformer uses self-attention
to determine the relevance of different sections
of the input sequence to the prediction. Unlike
other sequence models, including RNNs and
LSTMs, [12], [13], [14], it does not process
data serially, which drastically increases its
speed in both training and inference. The Transformer
architecture is utilized in HTR with both word
and character-level options. It processes
sequence data from images of handwritten text,
effectively modeling context and dependencies
between characters or words, [15], [16]. This
contextual understanding is especially important
in cases involving complex handwriting styles
or context-dependent character variations.
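The skip-connection idea behind ResNet, described in the first bullet above, can be illustrated with a minimal pure-Python sketch; the `toy_layer` here is a hypothetical stand-in for a convolutional layer, not part of the actual model:

```python
def residual_block(x, block):
    """Apply a transformation and add the input back (skip connection).

    Instead of learning the full mapping H(x), the stacked layers only
    need to learn the residual F(x) = H(x) - x, which keeps gradients
    flowing even in very deep networks."""
    return [fx + xi for fx, xi in zip(block(x), x)]

# A toy "layer": scale every feature (stands in for conv + activation).
toy_layer = lambda v: [0.1 * vi for vi in v]

out = residual_block([1.0, 2.0, 3.0], toy_layer)
# out is approximately [1.1, 2.2, 3.3]: the input survives the skip path
```

Even if `toy_layer` outputs values close to zero (a poorly trained layer), the block still passes the input through, which is why deeper stacks remain trainable.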
The Attention HTR model, [2], that we
employed for our Albanian HTR uses the
attention-based encoder-decoder architecture for
recognizing human handwriting. For feature
extraction, decomposing and summarizing the main
elements of words, the model relies on ResNet;
for sequence modeling it uses a bidirectional
LSTM, [9], [17], [18]; and for making accurate
predictions it uses a content-based attention
mechanism. To address the challenge presented by
the limited amount of data available in the Albanian
language dataset, we applied transfer learning
techniques within the model, [2]. Consequently, the
pre-trained Attention HTR model enables us to
further train the system on an Albanian language
dataset.
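The content-based attention step mentioned above can be sketched in pure Python under simplifying assumptions (a single decoder query vector and dot-product scoring; the actual model's scoring function may differ):

```python
import math

def attention(query, encoder_states):
    """Content-based attention: score each encoder state against the
    decoder query, normalize with softmax, and return the weighted
    average (the context vector) together with the weights."""
    scores = [sum(q * h for q, h in zip(query, state))
              for state in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

query = [1.0, 0.0]
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context, weights = attention(query, states)
assert abs(sum(weights) - 1.0) < 1e-9  # weights form a distribution
```

At each decoding step, the state most similar to the query receives the largest weight, so the decoder attends to the most relevant part of the encoded word image.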
A pre-trained HTR model is adjusted and
adapted to the specifics of the Albanian language
and can be used to compare the performance with
state-of-the-art models in the field of HTR.
Achieving good results marks a big step forward in
the domain of HTR for the Albanian language.
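Conceptually, adapting the pre-trained English model to the Albanian alphabet amounts to reusing the learned parameters for the shared characters and initializing new ones for 'ë' and 'ç'. A minimal sketch, with hypothetical dictionary-based weights rather than the model's real tensors:

```python
def extend_alphabet(weights, target_chars):
    """Reuse pre-trained output rows for characters shared with the
    English model and add freshly initialized rows for characters
    that are new in Albanian ('ë' and 'ç')."""
    dim = len(next(iter(weights.values())))
    extended = {}
    for ch in target_chars:
        # Copy the learned row if the character exists in the
        # pre-trained model, otherwise start from zeros.
        extended[ch] = list(weights[ch]) if ch in weights else [0.0] * dim
    return extended

# Hypothetical 3-dimensional output rows for a tiny English alphabet.
pretrained = {"a": [0.2, 0.1, 0.3], "e": [0.5, 0.4, 0.1], "c": [0.1, 0.9, 0.2]}
albanian = extend_alphabet(pretrained, ["a", "e", "c", "ë", "ç"])
assert albanian["e"] == pretrained["e"]  # shared rows are reused as-is
```

Only the rows for the two new characters need to be learned from scratch during fine-tuning, which is why the pre-trained English model speeds up training.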
3 Dataset
To train the HTR model it is necessary to provide a
dataset that contains different images of handwritten
texts. Finding a dataset with images of handwritten
texts in the Albanian language wasn’t possible
because the number of studies in HTR is limited. To
train an HTR model for this purpose, we decided to
use a synthetic dataset, [19], [20]. The dataset was
generated by a deep learning technique developed
for the English language. This technique can be
modified, [21], for the Albanian language because
almost all the letters in the Albanian language are
the same as the letters of the English alphabet
except for two letters, ‘ë’ and ‘ç’, [22]. To make the
dataset more diverse, we generated it using six
different handwriting styles.
The dataset's primary goal is to support HTR for
the Albanian language by accommodating the unique
characters ‘ë’ and ‘ç’, which differentiate
Albanian from English. Precisely positioning these
characters in word images required image processing
techniques, [20]. To attain this
objective, we placed diacritical marks over ‘e’ as
shown in Figure 1, and a line under ‘c’ as shown in
Figure 2. By accurately positioning the mentioned
two characters in the images processed by our
synthetic writing model, we created a dataset of
more than 12,000 Albanian words, prepared for
training, testing, and validation purposes, [23]. The
distribution graph of those word counts per number
of letters is shown in Figure 3.
Fig. 1: Sample Word Generated by Synthetic Model
with ‘ë’
Fig. 2: Sample Word Generated by Synthetic Model
with ‘ç’
Fig. 3: Word count per number of letters
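The assembly of the synthetic dataset, pairing words with the six handwriting styles and splitting for training, testing, and validation, can be sketched as follows; the word list and the 80/10/10 split ratios are illustrative placeholders, not the exact ones used:

```python
import random

def build_dataset(words, styles, seed=0):
    """Pair every word with every handwriting style, shuffle, and
    split into train/validation/test partitions (80/10/10 here)."""
    records = [(w, s) for w in words for s in styles]
    random.Random(seed).shuffle(records)    # fixed seed for repeatability
    n = len(records)
    train = records[: int(0.8 * n)]
    val = records[int(0.8 * n): int(0.9 * n)]
    test_split = records[int(0.9 * n):]
    return train, val, test_split

words = ["një", "dhe", "për", "më"]        # placeholder Albanian words
styles = [f"style_{i}" for i in range(6)]  # six handwriting styles
train, val, test_split = build_dataset(words, styles)
assert len(train) + len(val) + len(test_split) == len(words) * len(styles)
```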
As shown in the graph in Figure 3, words with
lengths from two to six are more widespread in the
generated dataset. Words with two characters are
usually connectors in the Albanian language and
contain the letter 'ë' in most cases. For this reason,
these words are also a good way of introducing the
letter 'ë' into the training of the system.
During the generation of the images in this
Albanian dataset, we developed an algorithm to
place the diacritical dots on the letter 'ë' and
the distinctive mark below the letter 'ç'. The
algorithm calculates the coordinates of the
location of each 'ë' and 'ç', [24], and then
modifies the generated letters 'e' and 'c', since
these are the letters that differ from the English
alphabet used by the algorithm to generate the handwriting images.
Figure 4 illustrates the calculation of the
coordinates where it is necessary to modify the
image to convert the letter 'e' to look like 'ë'.
Fig. 4: Location and offset calculation for the ‘ë’
letter
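The coordinate-based modification illustrated in Figure 4 can be sketched in pure Python; the binary pixel grid and the two-pixel vertical offset are simplifying assumptions, not the exact parameters of our algorithm:

```python
def add_diaeresis(img, col_left, col_right):
    """Place two diacritical dots above a glyph to turn an 'e' into 'ë'.

    `img` is a grayscale image as a list of rows (0 = background,
    1 = ink); `col_left`/`col_right` delimit the glyph's columns.
    The function finds the glyph's topmost inked row and draws two
    dots a couple of pixels above it."""
    top = min(r for r, row in enumerate(img)
              if any(row[c] for c in range(col_left, col_right)))
    width = col_right - col_left
    dot_row = max(0, top - 2)                     # small vertical offset
    for c in (col_left + width // 3, col_left + 2 * width // 3):
        img[dot_row][c] = 1
    return img

# A tiny 8x8 canvas with a fake glyph occupying rows 4-6, columns 2-6.
canvas = [[0] * 8 for _ in range(8)]
for r in range(4, 7):
    for c in range(2, 7):
        canvas[r][c] = 1
add_diaeresis(canvas, 2, 7)
assert sum(canvas[2]) == 2  # two dots placed two rows above the glyph
```

The same bounding-box logic, mirrored to the bottom of the glyph, places the mark under 'c' to produce 'ç'.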
4 Experimental Evaluation
The focus of this section is the evaluation of the
effectiveness of the synthetic dataset within the
modified Attention HTR, [15], [25] model designed
for the symbol system of the Albanian language. We
performed training and validation using three
different datasets employing a case-sensitive model.
Using those different datasets, we evaluate the
accuracy rate, and compare the performances of
each experiment.
The first dataset employed is the dataset
generated using the deep-learning model, [26],
adapted explicitly for the generation of word images
in the Albanian language. In this dataset, we used
approximately 2,000 words, [19], and using six
different handwriting styles we generated a dataset
with more than 10,000 records.
The second dataset is generated with
approximately 16,000 unique words. The images of
words are written in six different handwriting styles
and this dataset offers the possibility to test the
performance for lexical variety.
The third dataset is generated from the second
dataset and the existing English IAM dataset, [15].
The selection of words from the dataset is random
and contains 25% of its data. In this way, we
generate a dataset with two linguistic areas, the
Albanian and English IAM.
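The random selection step used to build this bilingual dataset can be sketched as follows; the record lists below are placeholders for the real word-image records:

```python
import random

def mix_datasets(albanian_records, iam_records, fraction=0.25, seed=0):
    """Build a bilingual dataset: all Albanian records plus a random
    fraction (25% here) of the English IAM records."""
    k = int(len(iam_records) * fraction)
    sampled = random.Random(seed).sample(iam_records, k)
    return albanian_records + sampled

albanian = [f"sq_{i}" for i in range(8)]   # placeholder Albanian records
iam = [f"en_{i}" for i in range(20)]       # placeholder IAM records
hybrid = mix_datasets(albanian, iam)
assert len(hybrid) == 8 + 5  # 8 Albanian + 25% of 20 English records
```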
The evaluation of the results provides an accuracy
analysis based on the following formula:

Accuracy = (Number of correctly recognized words / Total number of words) × 100%   (1)
4.1 Attention HTR Model Trained with
Different Size Datasets
The two synthetic datasets generated by deep
learning with six different handwriting styles, [27],
[28], are used in this experiment. This experiment
aims to investigate the effects of dataset size and
diversity of the words in the dataset on the system’s
performance. Both datasets encompass six distinct
handwriting styles, each utilized across different
paragraphs within the datasets. The 10,000-word
dataset includes Albanian literary excerpts, while
the second dataset comprises alternative words
sourced, [29], from additional Albanian literature
works. We trained the HTR model with each of
these datasets and the results are shown in Table 1.
The outcomes from training the attention HTR
model with these datasets underscore the same
pattern: the larger the training dataset, the more
effectively the system performs during training.
Table 1. Performance comparison between datasets
of different sizes
Dataset size    Train accuracy
10,000          83.2%
16,000          92.4%
4.2 Training the Attention HTR Model with
Multilingual Datasets
In the second experiment, we used the entire dataset,
consisting of 16,000 Albanian word instances,
alongside the IAM dataset, [15], which includes
English language instances. Given the dataset's
volume, we randomly selected a subset of around
25,000 word instances for further analysis, [30].
The testing was performed on 30% of the data in
both datasets. To evaluate the performance of the
Albanian model with non-synthetic images, we
tested it with both Albanian and English language
test datasets.
The results shown in Table 2, present higher
accuracy in English language text validation due to
the size of the dataset and customized network
weights. Nevertheless, the Albanian language model
performed well, considering dataset limitations and
training.
Regarding language testing, the English test
dataset, comprising about 7,500 examples, produced
satisfactory results. In contrast, the Albanian test
dataset with 1,300 examples demonstrated a higher
score than English.
Additionally, cross-testing with the opposite-language
datasets revealed challenges. The English
model could not be trained with the 16,000-word
Albanian dataset due to the presence of the
distinct characters ‘ë’ and ‘ç’. Conversely,
evaluating the Albanian model's performance on
English data was feasible, given that the English
characters are a subset of the Albanian alphabet.
However, these assessments yielded suboptimal
outcomes. The model, primarily trained on synthetic
images, struggled to handle the human handwriting
images of the English test cases and remained
biased towards synthetic handwriting. This
highlights a key challenge addressed in the
following experiment.
Table 2. Comparison of Albanian and English
Language Datasets
Dataset             Test accuracy          Test accuracy
                    (Albanian test cases)  (English test cases)
Albanian Language   94.7%                  2.7%
English Language    -                      83.1%
4.3 Training the Attention HTR Model with
the Hybrid and Albanian Language
Datasets
In this experiment, we addressed the issue identified
in the previous experiment, where the system trained
with the Albanian language dataset had difficulty
handling human handwriting images from the English
test set. To resolve this, [21], [24],
[31], we explored creating a combined dataset that
includes both Albanian and English. This approach
allows the model to learn both languages, not only
from synthetic images but also from real
handwritten data. The results from training the
model with a hybrid dataset, alongside the results
from the model trained solely with the Albanian
dataset, are presented in Table 3.
The results unambiguously reveal the enhanced
performance of the hybrid model in predicting
handwritten English words, [32], with a notable
66% improvement when compared to the
performance of the Albanian language dataset in
tests involving human handwriting images.
Furthermore, the hybrid model achieves
commendable accuracy in synthetic Albanian
language tests.
Analysis of the test data shows that the
performance decrease was minimal, even though the
hybrid dataset contained 22,000 training instances
and about 3,100 validation instances.
Table 3. Comparative Analysis: Hybrid and
Albanian Language Datasets
Dataset             Train accuracy   Test accuracy          Test accuracy
                                     (Albanian test cases)  (English test cases)
Albanian Language   92.4%            94.7%                  2.7%
Hybrid Dataset      82.1%            93.5%                  68.4%
5 Conclusions
In this paper we explored a methodology to generate
a synthetic dataset, mimicking human handwriting
for texts in the Albanian language using deep
learning techniques based on the English language.
The Albanian and English alphabets differ by just
two letters, ‘ë’ and ‘ç’, and those letters can be
generated, [21], [33], from the letters ‘e’ and ‘c’
by calculating their positions and modifying them.
The dataset was used to train a Handwritten Text
Recognition (HTR) system. Furthermore, the trained model has been
enhanced using a pre-trained English model, [21],
showing promising results.
One of our main aims was to test the model's
performance in various scenarios. Most importantly,
we generated a hybrid dataset, including text in
Albanian and English, aiming to develop a system
that can recognize handwritten and synthetic texts.
The results achieved from the experiments
confirmed that our idea of building an HTR system
for the Albanian language on top of a model for
the English language was the right one.
Finally, we note that our research presents
initial efforts in developing hybrid HTR systems
for languages with limited resources, motivating
further work in the field.
The developed models require optimization to be
more accurate and effective, especially with
different handwriting styles. Future work will be
focused on the use of larger non-synthetic datasets
from different sources. Albanians speak two
dialects with the same sentence structure, Gheg and
Tosk. Gheg is spoken mostly in the north of
Albania, in Kosovo, and by the Albanian community
in North Macedonia, while Tosk is spoken in the
south of Albania. Research on how to detect and
analyze texts
based on dialectal elements will be part of our future
work in this field.
Further research in this field will improve
existing HTR systems, simplifying the
implementation of those systems, and increasing
their performance and accuracy rate.
References:
[1] Stefano Coretta, Josiane Riverin-Coutlée,
Enkeleida Kapia, and Stephen Nichols.
“Northern Tosk Albanian.” Journal of the
International Phonetic Association, vol.53,
Issue no. 3, pp 1122–44, 2023, DOI:
10.1017/S0025100322000044.
[2] Dmitrijs Kass and Ekta Vats, “AttentionHTR,
Handwritten Text Recognition Based on
Attention Encoder-Decoder Networks”,
Document Analysis Systems: 15th IAPR
International Workshop, DAS 2022, La
Rochelle, France, pp 507–522, DOI:
10.1007/978-3-031-06555-2_34.
[3] Ray Smith, Daria Antonova, and Dar-Shyang Lee,
“Adapting the Tesseract Open-Source OCR
Engine for Multilingual OCR”, The
International Workshop on Multilingual OCR
(2009), Barcelona, Spain, 2009, Article No.:
1, Pages 1–8, DOI:
10.1145/1577802.1577804.
[4] Minghao Li, Tengchao Lv, Jingye Chen, Lei
Cui, Yijuan Lu, Dinei Florencio, Cha Zhang,
Zhoujun Li, Furu Wei, “TrOCR: Transformer-
based Optical Character Recognition with
Pre-trained Models”, The Thirty-Seventh
AAAI Conference on Artificial Intelligence,
Washington DC, USA, 2023, pp. 13094-
13112, DOI: 10.48550/arXiv.2109.10282
[5] Bianne-Bernard, Anne-Laure and Menasri,
Fares and Al-Hajj Mohamad, Rami and
Mokbel, Chafic and Kermorvant, Christopher
and Likforman-Sulem, Laurence, “Dynamic
and contextual information in hmm modeling
for handwritten word recognition”, IEEE
transactions on pattern analysis and machine
intelligence, vol. 33, no. 10, pp. 2066–2080,
2011. DOI: 10.1109/TPAMI.2011.22
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren,
Jian Sun. "Deep Residual Learning for Image
Recognition.", 2016 IEEE Conference on
Computer Vision and Pattern Recognition
(CVPR), Las Vegas, NV, USA, pp. 770-778,
2016, DOI: 10.1109/CVPR.2016.90
[7] Kartik Dutta, Praveen Krishnan, Minesh
Mathew, and C. V. Jawahar, "Improving
CNN-RNN Hybrid Networks for Handwriting
Recognition," 16th International Conference
on Frontiers in Handwriting Recognition
(ICFHR), Niagara Falls, NY, USA, 2018, pp.
80-85, DOI: 10.1109/ICFHR-
2018.2018.00023.
[8] Sepp Hochreiter and Jürgen Schmidhuber,
"Long Short-Term Memory." Neural
Computation, vol. 9, pp. 1735-1780, 1997,
DOI: 10.1162/neco.1997.9.8.1735.
[9] Mike Schuster and Kuldip Paliwal,
"Bidirectional Recurrent Neural Networks."
Signal Processing, IEEE Transactions, vol.
45, pp. 2673–2681, 1997, DOI:
10.1109/78.650093.
[10] Ashish Vaswani, Noam Shazeer, Niki Parmar,
Jakob Uszkoreit, Llion Jones, Aidan N.
Gomez, Łukasz Kaiser and Illia Polosukhin,
"Attention Is All You Need.", 31st
Conference on Neural Information Processing
Systems (NIPS 2017), Long Beach, CA, USA,
2017, DOI: 10.48550/arXiv.1706.03762.
[11] Alex Graves, "Generating Sequences with
Recurrent Neural Networks", ArXiv, vol.
abs/1308.0850, 2014.
[12] Karen Simonyan, Andrew Zisserman “Very
Deep Convolutional Networks for Large-
Scale Image Recognition”, 3rd International
Conference on Learning Representations,
{ICLR} 2015, San Diego, CA, USA, 2015,
abs/1409.1556.
[13] Tao Wang, David J. Wu, Adam Coates,
Andrew Y. Ng. “End-to-End Text
Recognition with Convolutional Neural
Networks”, 21st International Conference on
Pattern Recognition (ICPR2012), Tsukuba,
Japan, 2012, pp. 3304-3308.
[14] Rakesh Kumar Mandal, N. R. Manna,
"Handwritten English Character Recognition
Using Column-wise Segmentation of Image
Matrix (CSIM)", WSEAS Transactions on
Computers, vol. 11, pp.148-158, 2012.
[15] Urs-Victor Marti and H. Bunke, “The IAM-database:
an English sentence database for
offline handwriting recognition”.
International Journal on Document Analysis
and Recognition vol. 5, no. 1, pp. 39–46,
2002, DOI:10.1007/s100320200071.
[16] Aiquan Yuan, Gang Bai, Lijing Jiao, and
Yajie Liu, “Offline handwritten English
character recognition based on convolutional
neural network”, 10th IAPR International
Workshop on Document Analysis Systems,
DAS 2012, Washington, DC United States
2012. DOI: 10.1109/DAS.2012.61.
[17] Ioannis Giachos, Eleni Batzaki, Evangelos C.
Papakitsos, Michail Papoutsidakis, Nikolaos
Laskaris, "Developing a Natural Language
Understanding System for Dealing with the
Sequencing Problem in Simulating Brain
Damage", WSEAS Transactions on Biology
and Biomedicine, vol. 21, pp. 138-147, 2024,
https://doi.org/10.37394/23208.2024.21.14.
[18] Feng Li, Chenxi Cui, Yashi Hu, Lingling
Wang, "Sentiment Analysis of User Comment
Text based on LSTM," WSEAS Transactions
on Signal Processing, 2023, vol. 19, pp. 19-
31,
https://doi.org/10.37394/232014.2023.19.3.
[19] Max Jaderberg, Karen Simonyan, Andrea
Vedaldi, and Andrew Zisserman. “Synthetic
data and artificial neural networks for natural
scene text recognition.”, The Workshop on
Deep Learning, NIPS, Montréal 2014, DOI:
10.48550/arXiv.1406.2227.
[20] Ankush Gupta, Andrea Vedaldi, Andrew
Zisserman, “Synthetic data for text
localization in natural images”, IEEE
Conference on Computer Vision and Pattern
Recognition, Las Vegas, NV, USA 2016, pp.
2315–2324, DOI:10.1109/CVPR.2016.254.
[21] Hoo-Chang Shin, Holger R. Roth, Mingchen
Gao, Le Lu, Ziyue Xu, Isabella Nogues,
Jianhua Yao, Daniel Mollura, and Ronald M.
Summers, “Deep Convolutional Neural
Networks for Computer-Aided Detection:
CNN Architectures, Dataset Characteristics
and Transfer Learning”, IEEE Transactions
on Medical Imaging, vol. 35, pp. 1285-1298,
2016, DOI:10.1109/TMI.2016.2528162.
[22] In-Jung Kim, and Xiaohui Xie, “Handwritten
Hangul recognition using deep convolutional
neural networks”, International Journal on
Document Analysis and Recognition (IJDAR),
vol.18, pp. 1-3, 2015, DOI:10.1007/s10032-
014-0229-4.
[23] Ali Asghar, Leghari Mehwish, Hakro Dil,
Awan Shafique, Jalbani Dr, Pakistan
Nawabshah, “A Novel Approach for Online
Sindhi Handwritten Word Recognition using
Neural Network”. Sindh University Research
Journal SURJ (Science Series), Vol. 48(1),
pp. 213-216, 2016.
[24] Yudong Liang, Jinjun Wang, Sanping Zhou,
Yihong Gong, and Namming Zheng,
“Incorporating image priors with deep
convolutional neural networks for image
super resolution”, Neurocomputing, vol. 194,
pp. 340-347, 2016, DOI:
10.1016/j.neucom.2016.02.046.
[25] I. Khandokar, Mokhtar M. Hasan, Ferda
Ernawan, Saiful Islam, and Muhammad
Nomani Kabir, “Handwritten Text
Recognition Using Convolutional Neural
Network”, Journal of Physics: Conference
Series, 2021, volume 1918, no. 4, DOI:
10.1088/1742-6596/1918/4/042152.
[26] Chowdhury, Arindam and Lovekesh Vig. “An
Efficient End-to-End Neural Model for
Handwritten Text Recognition.” British
Machine Vision Conference, Newcastle,
England, 2018.
[27] Ahmed El-Sawy, Mohamed Loey, Hazem EL-
Bakry, "Arabic Handwritten Characters
Recognition Using Convolutional Neural
Network," WSEAS Transactions on Computer
Research, vol. 5, pp. 11-19, 2017.
[28] Amin Al Ka’Bi, "A Proposed Artificial
Intelligence Algorithm for Development of
Higher Education", WSEAS Transactions on
Computers, vol. 22, pp. 7-12, 2023,
https://doi.org/10.37394/23205.2023.22.2.
[29] Ritesh Sarkhel, Nibaran Das, Amin K. Saha,
and Mita Nasipuri, “A multi-objective
approach towards cost effective isolated
handwritten Bangla character and digit
recognition”, Pattern Recognition, vol. 58, pp.
172-189, 2016, DOI:
10.1016/j.patcog.2016.04.010.
[30] Manmatha, R. and Srimal, N., n.d. “Scale
Space Technique for Word Segmentation in
Handwritten Documents”. Lecture Notes in
Computer Science, vol 1682, pp. 22–33,
Greece 1999, DOI: 10.1007/3-540-48236-9_3.
[31] Jeonghun Baek, Geewook Kim, Junyeop Lee,
Sungrae Park, Dongyoon Han, Sangdoo Yun,
Seong Joon Oh, Hwalsuk Lee, “What is
wrong with scene text recognition model
comparisons? dataset and model analysis”,
IEEE International Conference on Computer
Vision, Seoul, Korea, 2019, pp. 4715–4723,
DOI: 10.1109/ICCV.2019.00481.
[32] Jemimah K, “Recognition of Handwritten
Characters based on Deep Learning with
TensorFlow”, International Research Journal
of Engineering and Technology (IRJET), vol.
6, Issue: 09, pp 1164-1165, 2019.
[33] Chunpeng Wu, Wei Fan, Yuan He, Jun Sun,
and Satoshi Naoi, “Handwritten Character
Recognition by Alternately Trained
Relaxation Convolutional Neural Network”,
14th International Conference on Frontiers in
Handwriting Recognition, ICFHR, Allen, TX,
USA, 2014, DOI: 10.1109/ICFHR.2014.56.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
- Hakik Paci developed the algorithm for image
processing, prepared the structure of the article,
and contributed to writing the main part of the
article.
- Evis Trandafili reviewed the related work and
focused on the linguistic aspects of implementing
HTR for the Albanian language.
- Dorian Minarolli prepared the infrastructure and
the dataset used in the simulations.
- Stela Paturri is a student who organized and
executed the experiments while preparing her
diploma thesis.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US