Design and Implementation for BIC Code Recognition System of
Containers using OCR and CRAFT in Smart Logistics
HANGSEO CHOI, JONGPIL JEONG, CHAEGYU LEE, SEOKWOO YUN, KYUNGA BANG,
JAEBEOM BYUN
Department of Smart Factory Convergence,
Sungkyunkwan University,
Cheoncheon-dong, Jangan-gu Suwon-si, Gyeonggi-do,
REPUBLIC OF KOREA
Abstract: - The BIC (Bureau International des Containers et du Transport Intermodal) Code is the identification
code for ocean shipping containers and is crucial for logistics, transportation, and security. Accurate
recognition of container BIC Codes is essential for efficient import and export processes, for the authorities' ability to intercept illegal goods, and for safe transportation. Nevertheless, the current practice of employees reading and manually entering container BIC Codes is inefficient and prone to error. Although automated recognition efforts have been made, challenges remain due to the aging of containers, manufacturing differences between companies, and the mixing of letters and numbers in the 11-character combination. In this paper, we propose the
design and implementation of a BIC Code recognition system using an open source-based OCR engine, deep
learning object detection algorithm, and a text detector model. In the logistics industry, various attempts are being made to seamlessly link the data required at each stage of transportation between systems. If the stability and consistency of BIC Code recognition achieved through our research can be carried into the field, it will help overcome the instability caused by false positives.
Key-Words: - Smart Logistics, Container, BIC-code, Object Detection, Text Detection, OCR.
Received: June 6, 2022. Revised: April 17, 2023. Accepted: May 12, 2023. Published: June 16, 2023.
1 Introduction
The Container BIC Code can be written in different
languages and fonts, so it is important to use robust
algorithms and development logic when developing
a container BIC Code recognition system, [1]. To
achieve this, it is essential to conduct intensive
studies on Object Detection models and build a
model with good performance, [2]. Object detection
is a technique that utilizes deep learning to
recognize and locate specific objects in images or
videos, [3]. There are various types of object
detection models, which can be categorized into 1-
Stage and 2-Stage models. In a 2-Stage model, the
location of the object is first predicted and then the
object is classified based on that location. To
accomplish this, the system first utilizes an
RPN(Region Proposal Network) to extract regions
in the image that are likely to contain objects. These
regions are then processed through a backbone
network of CNNs to generate a feature map. The
feature map is subsequently transformed into a
fixed-size feature map using RoI(Region of Interest)
Pooling. Finally, the RoI features are fed into the
Classification and Bounding Box Regression Layer,
consisting of a Fully Connected Layer and a
Softmax Layer, to predict the object class and
location. A well-known model in this category is
Faster R-CNN, [4]. On the other hand, 1-Stage
models, such as YOLO, are known for their faster
processing speed. YOLO divides the image into grid
cells to identify objects and predicts the probability
and location of objects within each grid cell.
Another algorithm called SSD extracts feature maps
of different sizes to predict the location and class of
objects, [5].
Since container BIC Codes are typically found
in specific areas of the container, a Bounding Box
can be created to identify and extract those regions,
[6]. Object Detection employs a CNN
(Convolutional Neural Network) based algorithm to
generate bounding boxes and extract features for
extracting the BIC Code from images, [7]. To read
the text within the extracted region, OCR (Optical
Character Recognition) is necessary, wherein the
character recognition algorithm considers the
object's size and position when processing the
image, [8].
In the past, it was common practice to extract
container BIC codes by separately conducting object
detection and OCR. However, this study evaluates
object detection and OCR in an integrated manner,
which can serve as a reference in real-world
industrial applications. It is also worth noting that
we carefully selected a model with outstanding performance by comparing various open-source
OCR engines, as well as the actual performance of
object detection models such as Faster R-CNN,
YOLOv5, and the CRAFT text detection model.
This enables us to assess the strengths and
weaknesses of each model and contribute to the
methodology of selecting the optimal model for the
field. The main idea of this study is to use the BIC Code coordinates extracted through object detection as custom input for OCR, generating cropped images for analysis. By comparing different object detection techniques and models to identify a model that can be customized for industrial applications, valuable insights and practical solutions can be obtained. Ultimately, the extraction of container
BIC codes plays a crucial role in the logistics and
shipping industry, and this research can contribute
to the development of more accurate and efficient
logistics and shipping management systems. This
paper is highly relevant in the field of computer
vision and artificial intelligence, and it is expected
to generate significant interest in logistics-related
domains such as transportation and ports.
The paper is organized as follows: Section 2
provides an overview of the technology and the
concept of OCR using deep learning, Section 3
presents the overall architecture of the system,
Section 4 describes the implementation process, and
Section 5 concludes with future research
considerations.
2 Related Work
2.1 Object Detection
Object detection is a technique for detecting and
recognizing specific objects in images or videos.
The Object Detection model determines where a
particular object exists in the input image and marks
the region of the object as a bounding box, [9].
These bounding boxes represent the location and size of the object and are used both to extract the object's location in detail and to classify the object. Some of the algorithms that can be used to perform object detection are YOLO, Faster R-CNN, and SSD, [10].
These algorithms divide the image into multiple
grid cells and train a deep-learning model that
predicts the presence of objects and bounding box
information for each cell. These deep learning
models are typically based on CNNs. Each
algorithm works slightly differently, but they all
preprocess the input image and then use a CNN to
generate a feature map. This feature map is used to
predict the presence of an object in each cell and its
bounding box information. Bounding boxes are then
generated based on the predicted information in
each cell, and redundant bounding boxes are
removed using the NMS(Non-Maximum
Suppression) algorithm. The Object Detection
model is trained using a large dataset, and the
trained model can identify and recognize objects in
new images. You can also use Transfer Learning
techniques to fine-tune the pre-trained model to
improve object detection accuracy. Thus, Object
Detection can be used to identify the location of an
object in an image containing a container BIC Code.
The image at that location can then be passed to
OCR to extract text, classify it, and recognize the
container BIC Code.
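As a small illustration of the redundant-box removal step described above, the following sketch applies torchvision's Non-Maximum Suppression to a handful of hypothetical predicted boxes; the coordinates and scores are illustrative, not results from this study.

    import torch
    from torchvision.ops import nms

    # Hypothetical predictions in (x1, y1, x2, y2) format with confidence scores.
    boxes = torch.tensor([[100., 40., 360., 90.],     # candidate BIC Code region
                          [105., 42., 358., 88.],     # near-duplicate of the first box
                          [400., 300., 460., 340.]])  # unrelated text region
    scores = torch.tensor([0.92, 0.85, 0.60])

    # Keep the highest-scoring box among overlapping ones (IoU above 0.5 counts as overlap).
    keep = nms(boxes, scores, iou_threshold=0.5)
    print(boxes[keep])  # the near-duplicate box is suppressed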
2.1.1 Faster R-CNN
Faster R-CNN is a deep learning model that combines an RPN and Fast R-CNN to perform object detection. The architecture of Faster R-CNN consists of an RPN, an RoI Pooling Layer, and a Classifier.
Fig. 1 explains the architecture and behavior of the
described Faster R-CNN.
RPN: It is responsible for finding regions in the
input image where objects are likely to be
present. To do this, RPN uses a CNN that
performs a sliding window on the image and
predicts the likelihood of an object being
present in each window.
RoI Pooling Layer: The candidate object
regions found by the RPN are passed to the RoI
Pooling Layer to create a fixed-size feature
map. This allows it to effectively handle
objects of different sizes.
Classifier: Takes the feature map generated by
the RoI Pooling Layer as input and is used to
predict the class and bounding box of an
object.
Fig. 1: The Faster R-CNN architecture for object detection in images, [11].
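As a concrete reference point, a pre-trained Faster R-CNN can be run with only a few lines using torchvision; this is a minimal inference sketch under the assumption that torchvision is available, not the exact training setup used later in this paper.

    import torch
    import torchvision

    # Load a Faster R-CNN detector pre-trained on COCO and switch to inference mode.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # A random 3-channel tensor stands in for a container photograph.
    image = torch.rand(3, 480, 640)
    with torch.no_grad():
        prediction = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

    # Keep only confident detections, e.g. candidate BIC Code regions.
    confident = prediction["scores"] > 0.5
    print(prediction["boxes"][confident])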
2.1.2 YOLOv5
YOLOv5 is based on the concept of YOLO (You
Only Look Once) and provides faster processing
speed and higher accuracy, [12]. YOLOv5 is mainly
composed of Backbone Network, Neck, and Head.
The Backbone Network is responsible for
extracting the feature map from the input image.
YOLOv5 can use backbone networks such as
EfficientNet and CSPDarknet53, and the Neck
processes the feature map to predict the location and
size of objects. YOLOv5 can use Necks such as
SPP, PANet, etc. Finally, the Head takes the feature
map generated by the Neck as input and predicts the
class and bounding box of the object. Like
YOLOv3, YOLOv5 uses feature maps of various
scales to predict the location and size of objects and
uses concepts such as anchors and grid cells to
generate bounding boxes.
When you input an image containing a container
BIC Code using a model trained with Object
Detection, the model determines the location of the
object in the input image and creates a bounding
box around it. From this, the location of the
container BIC Code can be determined, and the text
can be extracted, classified, and recognized using
the OCR model. The OCR network and architecture
can be divided into three parts. First, there are the
Convolutional Layers, which preprocess the input
images and convert them into a format that can be
used as input to the CNN. Then there are the
Recurrent Layers, which use the output of the
Convolutional Layers as input to the RNN. The
Recurrent Layers process the features of the image
as a continuous sequence, from which the textual
information of the image is extracted. Finally, there
are Fully Connected Layers to identify the extracted
textual information. Through this process, the
trained model can recognize patterns of container
BIC Code in the input image, separate each pattern
and combine the recognized text to recognize the
entire container BIC Code. However, in the case of
the container BIC Code, there is a problem in that
the check digit needs to be recognized separately. In this case, the check digit can be computed by first recognizing the 4 alphabetic characters and 6 numeric characters, combining them into a 10-character string, and then applying the standard check-digit algorithm, as sketched below.
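The check digit of a BIC Code is defined by ISO 6346: each of the first ten characters is mapped to a numeric value (letter values skip multiples of 11), weighted by a power of two according to its position, summed, and reduced modulo 11, with a remainder of 10 mapping to 0. A minimal Python sketch of this calculation, using the well-known ISO 6346 example rather than data from this study:

    # Letter values defined by ISO 6346; multiples of 11 are skipped.
    LETTER_VALUES = dict(zip(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26,
         27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38]))

    def check_digit(code10: str) -> int:
        """Compute the ISO 6346 check digit for the first 10 characters of a BIC Code."""
        total = sum((int(c) if c.isdigit() else LETTER_VALUES[c]) * (2 ** i)
                    for i, c in enumerate(code10.upper()))
        return (total % 11) % 10  # a remainder of 10 maps to check digit 0

    assert check_digit("CSQU305438") == 3  # standard ISO 6346 example: CSQU 305438 3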
2.2 OCR
OCR is a technology that recognizes text in images
and converts it into machine-readable text. OCR is
used in a variety of fields and is widely used in
areas such as document recognition, license plate
recognition, and handwriting recognition. The
general way OCR works starts with preprocessing.
Before text can be recognized from an image,
preprocessing is necessary and can include image
resizing, denoising, contrast enhancement, and
binarization, [13]. Next comes feature extraction. To
extract text from a preprocessed image, various
feature extraction algorithms can be used to extract
features of the text, [14]. For example, algorithms
such as HOG (Histogram of Oriented Gradients),
SIFT (Scale-Invariant Feature Transform), SURF
(Speeded Up Robust Feature), and CNN are used.
Character recognition involves extracting features
and then recognizing them in an image. Character
recognition can be performed with a variety of
algorithms, including machine learning algorithms
(SVM, KNN, Neural Networks, etc.), statistical
models (Hidden Markov Model, Conditional
Random Field, etc.), and rule-based approaches.
Finally, the text generated by the character
recognition process is subjected to post-processing
to improve its accuracy. Post-processing can include
error correction, character segmentation and
merging, word segmentation, and application of
language models, [15]. The performance of OCR is
affected by a variety of factors. For example, image
quality, character size, font, background, etc. affect
the accuracy of OCR. OCR also performs
differently in different languages, and models must
be built for each language.
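The preprocessing steps listed above can be sketched with OpenCV as follows; the file name and parameter values are illustrative rather than the exact settings used in this work.

    import cv2

    # Load a container image in grayscale (the path is illustrative).
    img = cv2.imread("container.jpg", cv2.IMREAD_GRAYSCALE)

    img = cv2.resize(img, (1280, 720))                      # resizing
    img = cv2.fastNlMeansDenoising(img, None, 10, 7, 21)    # denoising
    img = cv2.equalizeHist(img)                             # contrast enhancement
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    cv2.imwrite("container_binary.jpg", binary)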
2.3 Text Classification
Text Classification is a technique for assigning a
given piece of text to a predefined class or category.
It is used for spam filtering, sentiment analysis,
category classification, etc., and its training consists of five main steps. The first is data
collection and preprocessing, where the collected
data must be stored in text format and have a
balanced distribution of all possible classes or
categories. The preprocessing step normalizes the
textual data, removes special characters, stop
words, etc., tokenizes words, and creates a vector
representation for each word, [16]. Each text sample
must then be converted to a vector, [17]. Several
techniques can be used to generate these vector
representations. Techniques such as BoW(Bag of
Words), TF-IDF(Term Frequency-Inverse
Document Frequency), Word2Vec, GloVe, etc. can
be used in this step to convert the text into vectors
that the model can understand. Finally, the model is
selected and trained, and evaluated to tune the
model and make predictions. There are many
different text classification models. These models
take the vectors generated from feature extraction as
input and learn to assign every input vector to a
corresponding class, [18]. Typical text classification
algorithms include Naïve Bayes, SVMs, Random
Forests, and Neural Networks. Recently,
transformer-based models, such as BERT and GPT,
have become popular. Model evaluation metrics
include accuracy, precision, recall, and F1 score,
and hyper-parameter tuning is performed to improve
model performance, and the final model is
generated.
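A minimal sketch of such a pipeline, assuming scikit-learn, with TF-IDF features over character n-grams and a Naive Bayes classifier; the tiny sample texts and labels are purely illustrative.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Illustrative task: decide whether an OCR output string is a BIC Code or other text.
    texts = ["CSQU3054383", "MSKU6856625", "MAX GROSS 30480 KG", "TARE 2250 KG"]
    labels = ["bic", "bic", "other", "other"]

    # Character n-grams suit short code-like strings better than word tokens.
    model = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
                          MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["TEMU9876543"]))  # classify a new string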
2.4 Deep Learning Frameworks
Deep Learning Frameworks enable the
implementation and training of large-scale complex
models and provide high performance by utilizing
hardware accelerators such as GPUs or TPUs, [19].
Major deep learning frameworks include
TensorFlow, PyTorch, Keras, MXNet, and Caffe. All
these frameworks, [20], are open-source and
continuously evolved by the community. Of these,
TensorFlow and PyTorch are among the most
widely used frameworks. TensorFlow is an open-
source library developed by Google that is very
efficient for implementing large-scale neural
network models. PyTorch is an open-source library
developed by Facebook that supports native Python
code for easy debugging and development and
supports dynamic graph computation for fast model
development and experimentation.
PyTorch provides various functions such as model design, data processing, model training, and deployment. Model design can be easily implemented by subclassing the nn.Module class, and data processing can be easily implemented using PyTorch's Dataset and DataLoader. For model training, it provides various techniques such as learning rate scheduling, weight initialization, and normalization, and supports GPU acceleration for high performance. PyTorch also allows models to be deployed on a variety of platforms, including mobile and web.
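A minimal sketch of this workflow, with a toy nn.Module model, a Dataset/DataLoader pair, and one SGD training pass; the random tensors stand in for real container images.

    import torch
    from torch import nn
    from torch.utils.data import Dataset, DataLoader

    class TinyClassifier(nn.Module):               # model design by subclassing nn.Module
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
        def forward(self, x):
            return self.net(x)

    class RandomImages(Dataset):                   # data processing via Dataset
        def __len__(self):
            return 64
        def __getitem__(self, idx):
            return torch.rand(3, 32, 32), torch.randint(0, 2, (1,)).item()

    loader = DataLoader(RandomImages(), batch_size=16, shuffle=True)
    model, loss_fn = TinyClassifier(), nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for images, targets in loader:                 # one training pass
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        optimizer.step()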
PyTorch is based on the Torch library, a tensor computation library, and provides various modules, projects, and libraries supporting a wide range of operations, making it easy to implement deep learning models such as CNNs, RNNs, and Transformers. Due to these features, PyTorch has an active community and is used by many researchers and developers, making it easier and faster to implement deep learning models.
2.5 CRAFT
The CRAFT (Character Region Awareness For Text detection) model is based on the VGG-16 architecture, and its Basenet defines the VGG-16 structure. Basenet is divided into five sections, Slice1 to Slice5, and each section consists of convolutional layers, Batch Normalization, ReLU activation functions, and a Max Pooling layer. The four blocks from Upconv1 to Upconv4 follow a U-Net structure, each using a double convolution layer to expand the feature map. Finally, Conv_cls processes the feature map with convolutional layers and ReLU activations and ends in a convolutional layer with two output channels, producing the two-dimensional score maps that represent the character-region detection result. Fig. 2 illustrates the CRAFT network architecture.
Fig. 2: Schematic illustration of the CRAFT network architecture, [21]
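To make the repeated building blocks concrete, the following PyTorch sketch shows a double convolution block and a two-channel output head matching the structure described above; it is an illustrative fragment, not the official CRAFT implementation.

    import torch
    from torch import nn

    class DoubleConv(nn.Module):
        """Conv-BN-ReLU applied twice, as in the Upconv blocks described above."""
        def __init__(self, in_ch, mid_ch, out_ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, kernel_size=1),
                nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        def forward(self, x):
            return self.block(x)

    # Output head with two channels (region and affinity score maps).
    conv_cls = nn.Sequential(
        nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 2, kernel_size=1))

    features = torch.rand(1, 64, 96, 96)            # placeholder upsampled feature map
    out = conv_cls(DoubleConv(64, 64, 32)(features))
    print(out.shape)                                # torch.Size([1, 2, 96, 96])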
3 Proposed Method
3.1 Container BIC Code Recognition System
The container BIC code consists of 11 characters.
The 11 characters consist of the Owner Code,
Product Group Code, Serial Number, and Check
Digit, as shown in Fig. 3.
Fig. 3: Container BIC Code Structure
Since the container code requires all 11
characters to be detected to be meaningful as a code,
if any of these characters are missing, it is difficult
to recognize the BIC Code. In addition to the
difficulty of recognizing the BIC Code, noise, blur,
deformation, etc. may occur due to varying image
quality, making object recognition difficult. As
shown in Fig. 4, the check digit, a number used for accurate data delivery and verification of the container, is located at the end of the container BIC Code and cannot be easily detected by the OCR algorithm because it is a digit enclosed in a box.
Fig. 4: Check Digit location and Shape
The idea of this paper is to use the Object
Detection model to detect the image region
containing the container BIC Code, identify the
location of the object, and pass the location of the
object to OCR to extract the text at that location,
classify it, and recognize the container BIC Code.
An additional task in this process is to check the
Check Digit. It is used to self-verify the BIC Code
without relying on preprocessing, as shown in the
last step in Fig. 5.
Fig. 5: System Concept
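The flow in Fig. 5 can be sketched as follows, assuming EasyOCR is available, a detector has already returned a bounding box, and check_digit is the ISO 6346 helper sketched in Section 2.1.2; the coordinates are hypothetical.

    import easyocr

    reader = easyocr.Reader(["en"])  # EasyOCR with the English character set

    def recognize_bic(image, box):
        """Crop the detected BIC Code region, read it, and self-verify the check digit."""
        x1, y1, x2, y2 = box                         # coordinates from the detector
        crop = image[y1:y2, x1:x2]
        text = "".join(r[1] for r in reader.readtext(crop)).replace(" ", "").upper()
        if len(text) == 11 and text[10].isdigit() and check_digit(text[:10]) == int(text[10]):
            return text                              # BIC Code passed self-verification
        return None                                  # reject; fall back to manual handling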
3.2 Deep Learning Object Detection
Algorithms
To recognize the container BIC Code, an OCR
model must be used, which is responsible for recognizing text in a given image and converting it into machine-readable text, [22]. Since the Container BIC Code consists of 4 alphabetic characters, 6 numeric characters, and 1 check digit, the OCR model must
be able to recognize and classify this pattern to
extract the corresponding Container BIC Code, [23].
For this purpose, we selected Faster R-CNN and YOLOv5 as the object detection models to use. Object detection models such as Faster R-CNN and
YOLOv5 can be used to localize the container BIC
Code in the input image and generate the bounding
box of the object, [24]. However, the most
important thing is the detection of Check Digit. This
is because the container image may contain multiple
texts, and if the Check Digit is not detected, the BIC
Code cannot be matched even if the other texts are
extracted normally. Fig. 6 shows the training
structure using a table detection model to find
the Check Digit.
Fig. 6: Table detection model training structure for
check digit recognition
3.3 Text Object
CRAFT is a deep learning model responsible for detecting text regions. It can extract text regions because it works by finely segmenting the image to detect character regions and then predicting the boundaries between characters. As a result, CRAFT is highly accurate in text detection tasks, and because of this characteristic its output can be analyzed by object detection models and OCR engines. However, CRAFT does not recognize the content of characters in text recognition tasks, only the boundaries of characters.
Like other object detection algorithms, CRAFT must locate the BIC Code and be integrated with an OCR library based on the detected coordinates to extract, classify, and recognize the BIC Code. At this point, the OCR model will
segment the image to find the exact location and
boundaries of the recognizable characters in the
image. A segment is a part of an image that contains
a character. The OCR model finds and extracts
segments containing characters from the input
image and converts them into strings. To do this, the
OCR model performs segmentation during image
preprocessing. The segmentation process consists of
two steps: first, finding the character regions, and
second, separating the character regions. There are
several ways to find character regions, including
global thresholding and local thresholding. After
finding the text area, there are methods to separate
the actual character from the area, such as
Connected Component Analysis, Split and Merge,
Labeling and Segmentation, etc.
3.4 Find Text Area
To read the text on the container, it is necessary to compensate for the poor text clarity, boundaries, and size found on aging equipment so that text detection can proceed efficiently. The preprocessing process can
be divided into image preprocessing and OCR
preprocessing. As shown in Fig. 7, the container has
a lot of text information as well as the BIC Code. To
determine the BIC Code among them, the quality of
the input data is important and needs to be
preprocessed.
Fig. 7: Container Text Area
Image preprocessing is the process of
preprocessing the images that are input to the Object
Detection model. This can include image resizing,
rotating and flipping, denoising, and color
equalization. Image resizing and rotation first resize
the image to fit the input size of the model, [25].
This is essential because the input size of the model
must be constant. Image rotation and inversion
increase the diversity of the image and allow the
model to detect objects from different angles. Noise
is then removed so that the model can identify the
exact location of the object, and color equalization
of the image is performed to make the object more
visible.
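A minimal OpenCV sketch of the resizing, rotation and flipping, denoising, and color equalization steps just described; the sizes and parameters are illustrative.

    import cv2

    img = cv2.imread("container.jpg")                      # path is illustrative

    resized = cv2.resize(img, (640, 640))                  # fit the model input size
    rotated = cv2.rotate(resized, cv2.ROTATE_90_CLOCKWISE) # rotation for augmentation
    flipped = cv2.flip(resized, 1)                         # horizontal flip
    denoised = cv2.fastNlMeansDenoisingColored(resized, None, 10, 10, 7, 21)

    # Color equalization: equalize the luminance channel in YCrCb space.
    ycrcb = cv2.cvtColor(denoised, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    equalized = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)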
3.5 Open-Source OCR
For open-source OCR, we chose EasyOCR and
TesseractOCR. EasyOCR is based on deep learning
technology, supports multilingual OCR recognition,
and has a simple API and high recognition accuracy.
It is also a robust OCR engine that works well with
slightly noisy images.
TesseractOCR is an open-source OCR engine
developed by Google that uses statistical-based
techniques along with deep learning techniques like
LSTM. Tesseract offers high accuracy and
reliability and is known to have high recognition
rates for several languages, including English and
other European languages. EasyOCR and
TesseractOCR were selected as representative
models for the performance comparison because
they have good performance in terms of recognition
accuracy, multilingual recognition support, and
options.
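Both engines can be called on the same cropped region with only a few lines, which is how the comparison in Section 4.4 can be reproduced in spirit; the whitelist restricting Tesseract to uppercase letters and digits is an illustrative configuration, not necessarily the exact setting used in the experiments.

    import cv2
    import easyocr
    import pytesseract

    crop = cv2.imread("bic_region.jpg")   # illustrative crop of the BIC Code area

    # EasyOCR: returns (bounding box, text, confidence) tuples.
    easy_results = easyocr.Reader(["en"]).readtext(crop)
    print([text for _, text, conf in easy_results])

    # TesseractOCR: treat the crop as a single text line, restricted to A-Z and 0-9.
    config = "--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    print(pytesseract.image_to_string(crop, config=config))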
4 Experiment and Results
4.1 Experimental Environments
System and workload parameters are important for
evaluating and optimizing the performance of an
algorithm. These parameters include system and
hardware configuration, image resolution, image
format, processing speed, etc. The hardware and
software environments were tested on Google Colab, with an NVIDIA T4 as the graphics card and Python 3.8, CUDA 11.8, and torch 2.0.0 as the development
language. For the learning model, the open source-
based OCR engine used the library in its pre-trained
form without modification, the object detection
algorithm used a pre-trained model with COCO
data, and CRAFT used craft_mlt_25k.pth. Container
images and label data were prepared separately for
training, verification, and testing, and the image
resolution, format, and label data structure were
modified according to each pre-trained model.
4.2 Data Set
To recognize container BIC Code using Faster
R_CNN or YOLOv5, we need a dataset to train on.
Table 1 shows the training and testing targets, organized after data ingestion.
Table 1. Training Target
Train    Test    Image    Source
100      20      JPG      AI-Hub
3,000    600     JPG      AI-Hub
6,000    1,200   JPG      AI-Hub
We used A.I. Hub's 2021 "Logistics
Infrastructure Data for Connected Ports" dataset.
This dataset was built by securing source data on the BIC Code, which identifies actual containers, and on yard trailers, a means of transportation, as basic data for constructing realistic, business-applicable port logistics data, which forms the basis for smart port construction. The data were collected from
quay cranes (2 Terminals) and deck gates (3
Terminals, 12 Lanes). Images and labels are in
pairs. Fig. 8 shows the JSON structure of the label
data, which provides the text of the BIC Code and
the coordinates of the area to help learn.
Fig. 8: Label Data Structure
4.3 Results
4.3.1 Faster R-CNN
We evaluated the performance (mAP, Precision,
Recall) by applying Faster R-CNN, the most famous
2-Stage Detector model, and experimented with
various hyperparameters to find the optimal results.
The Learning_Rate, which is used to update the
weights during training, was set to 0.005. The
Optimizer was set to SGD, Momentum to 0.9, and
Weight_Decay to 0.0005. Batch_Size is the number
of data used for training at a time, and since we
were running with Colab-pro, we set 16 and 32 to
account for GPU memory, but there was no
significant difference in training speed between the
two settings. Finally, Num_Epochs, which
determines the number of training times, was set to
40 depending on the amount of data and the
possibility of overfitting. The results of the
experiment showed that the values of mAP,
Precision, and Recall increased as the amount of
training data increased, indicating that it became
more accurate as it learned more data. However,
the Recall for the Check Digit was lower than the mAP, so we tried training with more data and changing the hyperparameters. Table 2
shows that the performance of the Faster R-CNN
model improves slightly as the number of training
data increases.
Table 2. Check Digit Detection Experiment Results by Increasing Faster R-CNN Training Data
Train    Test    mAP     Precision    Recall
100      20      0.34    0.56         0.12
3,000    600     0.41    0.64         0.17
6,000    1,200   0.46    0.68         0.24
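The training configuration described above (SGD, learning rate 0.005, momentum 0.9, weight decay 0.0005, 40 epochs) can be written as a short sketch with torchvision's Faster R-CNN; the single random image and hypothetical check-digit box stand in for the real DataLoader, which is omitted here.

    import torch
    import torchvision

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                                momentum=0.9, weight_decay=0.0005)

    # Placeholder batch: one random image with one hypothetical check-digit box.
    images = [torch.rand(3, 480, 640)]
    targets = [{"boxes": torch.tensor([[400.0, 200.0, 460.0, 240.0]]),
                "labels": torch.tensor([1])}]

    model.train()
    for epoch in range(40):                            # Num_Epochs = 40
        optimizer.zero_grad()
        loss = sum(model(images, targets).values())    # detection losses in training mode
        loss.backward()
        optimizer.step()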
4.3.2 YOLOv5
To optimize the 1-Stage Detector model, YOLO, we
used hyper-parameter optimization techniques, Grid
Search and Bayesian Optimization, to derive the
optimal hyper-parameters. Grid search involves
trying all possible combinations of hyper-parameter
values to find the optimal combination, while
Bayesian optimization is an automatic optimization
technique. We compared the performance of the
model by applying two methods, Pre-defined
Anchor Box and Anchor Box free. The Pre-defined Anchor Box method detects objects using predefined anchor boxes, while the Anchor Box free method detects objects without using anchor boxes.
To proceed with YOLOv5, image files and
YOLO Text files containing label information must
be configured as a Dataset. The set of image files
and label files are divided into Train, Validation,
and Test, and the label file consists of the class
index and the coordinate information of the
bounding box. The coordinate information is
expressed as the center (x, y) coordinate and the
width and height of the object, and the value is
expressed as a value relative to the width and height
of the image. Typically, coordinate values are
normalized to a range between 0 and 1. The closer the object's width and height values are to 0, the smaller the object relative to the image, and the closer they are to 1, the larger the object, so the same label format can be maintained regardless of the image size. A minimal conversion from pixel coordinates to this format is sketched below.
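A minimal sketch of this conversion from a pixel-coordinate bounding box to a YOLO label line; the numbers are illustrative.

    def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
        """Convert a pixel bounding box into a normalized YOLO label line."""
        x_center = (x1 + x2) / 2 / img_w
        y_center = (y1 + y2) / 2 / img_h
        width = (x2 - x1) / img_w
        height = (y2 - y1) / img_h
        return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

    # Example: a 60 x 40 pixel check-digit box inside a 1920 x 1080 image.
    print(to_yolo_label(0, 900, 400, 960, 440, 1920, 1080))
    # -> "0 0.484375 0.388889 0.031250 0.037037"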
The YAML file is the configuration file used by YOLOv5; it sets the training data path, validation data path, batch size, and number of epochs according to the predefined format. Finally, the performance evaluation metric for YOLOv5 uses the same mAP for comparison with Faster R-CNN. Table 3 shows that the performance of the YOLOv5 model improves slightly as the number of training data increases.
Table 3. Check Digit Detection Experiment Results by Increasing YOLOv5 Training Data
Train    Test    mAP     Precision    Recall
100      20      0.30    0.50         0.10
3,000    600     0.37    0.58         0.16
6,000    1,200   0.43    0.65         0.20
4.3.3 CRAFT
Each model recognizes text in a given image, finds
the bounding box of the text, and outputs the
coordinates in a result file, based on the data
characteristics for training the model. Ideally, the output should be a rectangle defined by four vertices, but the results can be irregular shapes such as trapezoids or polygons.
Fig. 9 shows the output after running CRAFT,
which returns regions with a polygonal structure, as
opposed to the label data for training. Because the
dimensions of the training and output data were
different, post-processing was performed to match
the same dimensions.
Fig. 9: CRAFT Results Data Structure
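The post-processing mentioned above can be sketched as a simple collapse of the polygon returned by CRAFT into the axis-aligned box format of the label data; the polygon values are illustrative.

    import numpy as np

    def polygon_to_box(polygon):
        """Collapse an N x 2 polygon into an axis-aligned (x1, y1, x2, y2) box."""
        pts = np.asarray(polygon, dtype=float)
        x1, y1 = pts.min(axis=0)
        x2, y2 = pts.max(axis=0)
        return x1, y1, x2, y2

    # Illustrative trapezoid returned by CRAFT around a check-digit region.
    poly = [[512, 300], [570, 304], [568, 342], [510, 338]]
    print(polygon_to_box(poly))  # -> (510.0, 300.0, 570.0, 342.0)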
Many text detection and recognition algorithms are evaluated with the following criteria: WA (Word Accuracy), the percentage of matches between the extracted text and the actual text; CA (Character Accuracy), the percentage of matches between extracted and real characters; and Precision, Recall, and F1-score computed between the extracted and real text.
However, in the case of CRAFT, the accuracy
measurement method for text recognition is
different from the commonly used method. In
general, a prediction is counted as correct when the IoU between the ground truth and the predicted box is 0.5 or more, and Precision, Recall, F1-score, etc. are calculated on that basis,
but unlike other algorithms, CRAFT aims to
generate multiple candidate regions and miss as few
real text regions as possible, rather than producing
highly accurate predictions. In training and testing,
CRAFT outperformed comparable OCR engines
and object detection models and achieved a success
rate of 74.5% in detecting check digits in the test
data. Table 4 shows that the CRAFT model
performs well from the first training, and like other
algorithms, the performance improves as the
number of training data increases.
Table 4. Check Digit Detection Experiment Results by Increasing CRAFT Training Data
Train    Test    mAP     Precision    Recall
100      20      0.66    0.70         0.62
3,000    600     0.69    0.74         0.63
6,000    1,200   0.75    0.78         0.71
4.4 Text Recognition
Since EasyOCR and TesseractOCR already include a text detection step, we tested them without customization, handling detection and recognition as a 1-Stage method. As a result, the OCR engines had a good text detection rate for the total number of characters in the image, which was sufficient to recognize text after detecting the BIC Code. However, recognizing the BIC Code as a single string by combining the detected characters, and detecting the check digit, failed in more than 95% of cases, so evaluation through the usual metrics was not meaningful.
5 Conclusions
In this paper, we compared and evaluated OCR
engines, object detection algorithms, and text
detection models to identify the location of
container BIC Code objects, and then passed the
location of the objects to OCR to extract and
classify the text at the location to recognize
container BIC Code. In general, container BIC
codes are recorded manually in logistics, but this
process is cumbersome and needs to be improved
due to the possibility of human error. Therefore, we
expect that if this result is developed and optimized
by applying it to the field, it will lay the foundation
for the automatic recognition and processing of BIC
Code. As a result, high accuracy and fast processing
speed can be expected compared to manual
recognition, and it can be used not only in logistics
but also in various industries to increase user
satisfaction.
It is also an achievement that we have
confirmed that there is a need for improvement in
accuracy and consistency while conducting this
experiment. In terms of accuracy, the detection rate
for the entire text is satisfactory because the open source-based OCR engines are general-purpose rather than restricted to recognizing container numbers.
However, in the case of the object detection algorithms, stronger preprocessing and postprocessing could have resulted in higher performance, and the limited training data and hyperparameter exploration remain a shortcoming.
The results from all the tested models were
inconsistent even on the same data. This can
probably be overcome with technical improvements.
Acknowledgment:
This research was supported by the SungKyunKwan
University and the BK21 FOUR (Graduate School
Innovation) funded by the Ministry of Education
(MOE, Korea) and the National Research
Foundation of Korea(NRF). And, this work was
supported by the National Research Foundation of
Korea (NRF) grant funded by the Korea government
(MSIT) (No. 2021R1F1A1060054). Corresponding
authors: Professor Jongpil Jeong and Professor
Chegyu Lee.
References:
[1] M. Goccia, M. Bruzzo, C. Scagliola, and S.
Dellepiane, Recognition of container code
characters through gray-level feature
extraction and gradient-based classifier
optimization, 7th Int. Conf. Document Anal.
Recognit, Edinburgh, U.K., Aug. 2003, pp.
973–977.
[2] W. Al-Khawand, S. Kadry, R. Bozzo, and S.
Khaled, 8-neighborhood variant for a better
container code extraction and recognition,
Int.J. Comput. Sci. Inf. Secur, Vol.14, No.4,
Apr. 2016, pp. 182–186.
[3] D. G Lee, CNN-based Image Rotation
Correction Algorithm to Improve Image
Recognition Rate, The Journal of The Institute
of Internet, Broadcasting and Communication
(IIBC) Vol.20, No.1, 2022, pp.225-229.
[4] S. Ren, K. He, R.B. Girshick, and J. Sun,
Faster R-CNN: towards real-time object
detection with region proposal networks.
CoRR, 2015, abs.1506.01497.
[5] J. Redmon, A. Farhadi, YOLOv3: An
Incremental Improvement, Computer Vision
and Pattern Recognition Workshops
(CVPRW), 2018, pp.1-8.
[6] J. Song, N. Jung, H. Kang, Container BIC-
code region extraction and recognition
method using multiple thresholding, Journal
of the Korea Institute of Information and
Communication Engineering, Vol.19, No.6,
2015.
[7] J. Huang, V. Rathod, C. Sun, M. Zhu, A.
Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y.
Song, S. Guadarrama, K. Murphy,
Speed/Accuracy Trade-Offs for Modern
Convolutional Object Detectors, Proceedings
of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 2017, pp.
7310-731.
[8] T. Szabo, G. Horvath, Finite word length
computational effects of the principal
component analysis networks, IEEE Trans.
Instrum.Meas, Vol.47, No.5, Oct. 1998, pp.
1218–1222.
[9] R. Girshick, J. Donahue, T. Darrell, J. Malik,
Rich feature hierarchies for accurate object
detection and semantic segmentation, In
Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition
(CVPR), 2014, 580–587.
[10] Zhengxia Zou, Keyan Chen, Zhenwei Shi,
Member, IEEE, Yuhong Guo, and Jieping Ye,
Fellow, Object Detection in 20 Years: A
Survey, IEEE Proceedings of the IEEE,
March 2023 Volume: 111, Issue: 3.
[11] Y. Chao, S. Vijayanarasimhan, B. Seybold,
David A. Ross, J. Deng, R. Sukthankar,
Rethinking the Faster R-CNN Architecture for
Temporal Action Localization, arXiv:
1804.07667v1 [cs.CV] 20, Apr 2018.
[12] A. Bochkovskiy, C. Wang, A. Mykhailych,
YOLOv5: Improved Real-Time Object
Detection, Computer Vision and Pattern
Recognition Workshops (CVPRW), 2021, pp.
2290-2298.
[13] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou,
W. He, J. Liang, EAST: An efficient and
accurate scene text detector, Proceedings of
the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2017, pp. 5551-
5560.
[14] N. Subramani, A. Matton, M. Greaves, A.
Lam, A Survey of Deep Learning Approaches
for OCR and Document Understanding,
arXiv:2011.13534, 2021.
[15] K. Simonyan, A. Zisserman, Very deep
convolutional networks for large-scale image
recognition, arXiv:1409.1556, Apr. 2015.
[16] V. E. Bugayong, J. Flores Villaverde, N. B.
Linsangan, Google Tesseract: Optical
Character Recognition (OCR) on HDD / SSD
Labels Using Machine Vision, 14th
International Conference on Computer and
Automation Engineering (ICCAE), 2022, pp.
56-60.
[17] J. Singh, B. Bhushan, Real Time Indian
License Plate Detection using Deep Neural
Networks and Optical Character Recognition
using LSTM Tesseract, 2019 International
Conference on Computing, Communication,
and Intelligent Systems (ICCCIS), 2019,
pp.347-352.
[18] B. Shi, X. Bai, C. Yao, An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition, IEEE Trans. Pattern Anal. Mach. Intell, 2017, pp. 2298-2304.
[19] L. Mei, J. Guo, Q. Liu, and P. Lu, A new framework for container code-to-character recognition based on deep learning and template matching, Conf. Ind. Informat.-Comput. Technol, Ind. Inf. (ICIICII), Wuhan, Hubei, China, Dec. 2016, pp. 78-82.
[20] Y. Lee, P. Moon, A Comparison and Analysis
of Deep Learning Framework, Journal of the
KIECS, Vol.12, 2017, pp.115-122.
[21] Y. Baek, B. Lee, D. Han, S. Yun, H. Lee, Character Region Awareness for Text Detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[22] C. Roeksukrungrueang, T. Kusonthammrat,
N. Kunapronsujarit, T. N. Aruwong, and S.
Chivapreecha, Automatic implementation of a
container number recognition system,
Workshop Adv. Image Technol. (IWAIT),
Chiang Mai, Thailand, Jan. 2018, pp.1-4.
[23] Y. Liu, T. Li, L. Jiang, X. Liang, Container-code recognition system based on computer vision and deep neural network, Int. Conf. Adv. Mater, Mach, Electron, Xi'an, China, Jan. 2018, 20-21.
[24] U. Mittal, P. Chawla, R. Tiwari,
EnsembleNet: a hybrid approach for vehicle
detection and estimation of traffic density
based on faster R-CNN and YOLO models,
Neural Computing and Applications (2023)
35:4755-4774
[25] K. Zhang, H. Wang, X. Li, An ensemble of
YOLOv5 and Faster R-CNN for container
BIC Code detection and recognition, Robotics
and Automation Sciences (ICRAS), pp.1-6.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
Hangseo Choi designed the research topic and goals, implemented and experimented with CRAFT, and wrote the paper.
Prof. Jongpil Jeong conceptualized the idea,
presented the methodology, and reviewed it.
Prof. Chaegyu Lee provided academic advice.
Seokwoo Yoon implemented and experimented with
Faster R-CNN.
KyungAh Bang implemented and experimented
with YOLOv5.
Jaebeom Yoon implemented and experimented with
OCR.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
This research was supported by the SungKyunKwan
University and the BK21 FOUR(Graduate School
Innovation) funded by the Ministry of
Education(MOE, Korea) and the National Research
Foundation of Korea(NRF). And this work was
supported by the National Research Foundation of
Korea (NRF) grant funded by the Korean
government (MSIT) (No. 2021R1F1A1060054).
Corresponding authors: Professor Jongpil Jeong.
Conflict of Interest
The authors have no conflict of interest to declare.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US