the text in the image, it extracts text information
from the entire image area and then processes it.
However, restricting OCR to the table area rather
than the entire document, and excluding text regions
that extend beyond the table area, improves accuracy
for the detection target and reduces boundary
interference, [4]. This proposal locates the table in
the image, extracts the position, size, and text
information of each cell, and excludes areas where
it is technically difficult to separate the text. Table
recognition typically relies on object detection to
locate tables. Object detection is a technique for
finding the location and size of objects in an image,
[5]. To restrict the OCR target, the coordinates of the
table and the text are compared, the distance between
them is measured, and a threshold is applied so that a
text area is excluded from the OCR target when it
extends beyond the area of the table object, [6].
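The coordinate comparison described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixels, and the function names, the overshoot measure, and the 10-pixel threshold are hypothetical choices made for the example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def overshoot(table: Box, text: Box) -> float:
    """Largest distance by which the text box extends past any table edge."""
    tx0, ty0, tx1, ty1 = table
    bx0, by0, bx1, by1 = text
    return max(tx0 - bx0, ty0 - by0, bx1 - tx1, by1 - ty1, 0.0)


def filter_text_boxes(table: Box, texts: List[Box], threshold: float) -> List[Box]:
    """Keep only text boxes whose overshoot stays within the threshold."""
    return [t for t in texts if overshoot(table, t) <= threshold]


table = (100.0, 100.0, 500.0, 400.0)
texts = [
    (120.0, 120.0, 200.0, 140.0),  # fully inside the table
    (95.0, 150.0, 180.0, 170.0),   # extends 5 px past the left edge
    (50.0, 420.0, 300.0, 440.0),   # body text below the table
]
# Hypothetical threshold of 10 px: the third box is dropped from the OCR target.
kept = filter_text_boxes(table, texts, threshold=10.0)
```

A small positive threshold tolerates text that slightly crosses a table border because of detection noise, while text clearly outside the table is excluded.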
The algorithm for detecting and separating table
and text regions does not differ from the technique
for extracting text information from images. The
text recognition rate can therefore be improved by
clearly specifying the extraction target. In OCR
preprocessing, table positions and text areas can be
detected, and table and text boundaries can be
separated to improve recognition rates in the text
recognition stage. Alignment and correction of text
areas can also be performed. Accurately aligning the
rows and columns of the table and regularizing the
placement of the document improves text
recognition accuracy.
The paper is organized as follows: Section 2
provides an overview of the technology and the
concept of OCR using deep learning, Section 3
presents the overall architecture of the system,
Section 4 describes the implementation process, and
Section 5 concludes with future research
considerations.
2 Related Work
2.1 Faster R-CNN
Faster R-CNN (Faster Regions with Convolutional
Neural Network features) is an algorithm proposed
in 2015, [7], that can perform fast and accurate
object detection by compensating for the
shortcomings of R-CNN, [8], and Fast R-CNN. It
first processes the input image with a
CNN (Convolutional Neural Network) to generate a
feature map. It then uses an RPN (Region Proposal
Network) to generate candidate regions and
performs RoI (Region of Interest) pooling on these
candidate regions to extract the features of each
object. These features are then used to perform
object classification and bounding box estimation.
Recently, much research in the field of table
recognition has used Faster R-CNN to detect table
regions, since it can accurately detect tables of
various sizes and shapes.
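The RoI pooling step described in this section can be illustrated with a simplified sketch. It assumes a single-channel feature map stored as a nested list and a region already expressed in feature-map cells with end-exclusive coordinates. The function name and the bin-splitting rule are choices made for this example; practical implementations operate on multi-channel tensors and map regions from image coordinates first.

```python
from typing import List, Tuple


def roi_max_pool(feature_map: List[List[float]],
                 roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool a region (x0, y0, x1, y1; end-exclusive) to a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Rows covered by bin i (floor for the start, ceiling for the end).
        r0 = y0 + (i * h) // out_h
        r1 = max(y0 + ((i + 1) * h + out_h - 1) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            # Columns covered by bin j, computed the same way.
            c0 = x0 + (j * w) // out_w
            c1 = max(x0 + ((j + 1) * w + out_w - 1) // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map with values 0..15.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
pooled = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)  # [[5, 7], [13, 15]]
```

Whatever the size of the candidate region, the output grid has a fixed size, which is what allows regions of different shapes to feed the same classification and bounding box layers.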
2.2 YOLO
YOLO (You Only Look Once) is an object detection
algorithm proposed in 2016 that combines fast
inference with high accuracy. It divides the input
image into a grid and predicts, for each grid cell,
bounding boxes and the probability of the
corresponding object. It uses these predictions to
perform object classification and bounding box
estimation, and it has recently been widely used in
the field of table recognition. YOLOv3, [9], likewise
offers both high accuracy and fast speed, and it can
detect tables of various sizes and shapes. Various
object detection algorithms, including Faster R-CNN
and YOLO, continue to be developed and used in the
field of table recognition to provide high recognition
rates and fast speeds.
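The grid-based prediction described in this section can be sketched as follows. The example assumes the parameterization of the original YOLO formulation, in which each cell predicts a box center as an offset within the cell and a box size as a fraction of the image; later versions such as YOLOv3 use anchor boxes and a different encoding. The function names, the one-box-per-cell simplification, and the confidence threshold are hypothetical.

```python
from typing import List, Tuple

# Assumed per-cell prediction: (x, y, w, h, confidence), where x and y are
# offsets within the cell in [0, 1] and w and h are fractions of the image size.
Pred = Tuple[float, float, float, float, float]


def decode_cell(row: int, col: int, pred: Pred, rows: int, cols: int,
                img_w: float, img_h: float):
    """Convert one cell prediction to (x_min, y_min, x_max, y_max, confidence) in pixels."""
    x, y, w, h, conf = pred
    cx = (col + x) / cols * img_w   # box center in pixels
    cy = (row + y) / rows * img_h
    bw, bh = w * img_w, h * img_h   # box size in pixels
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, conf)


def decode_grid(preds: List[List[Pred]], img_w: float, img_h: float,
                conf_threshold: float):
    """Decode every grid cell and keep boxes at or above the confidence threshold."""
    rows, cols = len(preds), len(preds[0])
    boxes = []
    for r in range(rows):
        for c in range(cols):
            box = decode_cell(r, c, preds[r][c], rows, cols, img_w, img_h)
            if box[4] >= conf_threshold:
                boxes.append(box)
    return boxes


# Toy 2x2 grid over a 200x100 image; only the top-right cell is confident.
preds = [
    [(0.5, 0.5, 0.2, 0.2, 0.1), (0.5, 0.5, 0.5, 0.5, 0.9)],
    [(0.5, 0.5, 0.2, 0.2, 0.05), (0.5, 0.5, 0.2, 0.2, 0.0)],
]
boxes = decode_grid(preds, img_w=200.0, img_h=100.0, conf_threshold=0.5)
```

Because all cells are decoded from a single forward pass, detection requires no separate region proposal stage, which is the main source of the speed advantage.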
2.3 Mask R-CNN
Mask R-CNN, [10], is a state-of-the-art object
detection and instance segmentation algorithm that
has proven effective for a variety of tasks, including
table segmentation. It extends Faster R-CNN so that
object detection and object segmentation are
performed simultaneously, [11]. A method is
therefore proposed that uses Mask R-CNN to detect
table regions within document images and performs
table recognition based on them. This method
consists of the following steps.
Step 1: Perform object detection and object
segmentation in the image.
Step 2: Extract table regions from the object
segmentation results.
Step 3: Perform table recognition based on the
extracted table regions, [11].
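Step 2 above can be sketched as follows. The example assumes that Step 1 returns a list of (label, mask) pairs, where each mask is a binary grid with the same size as the image; this output format, the "table" label, and all function names are hypothetical and do not correspond to a specific library.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def mask_to_bbox(mask: Grid) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x_min, y_min, x_max, y_max; end-exclusive) of the nonzero pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: nothing was segmented
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(image: Grid, bbox: Tuple[int, int, int, int]) -> Grid:
    """Cut the bounding box out of the image."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in image[y0:y1]]


def extract_table_regions(image: Grid, instances, table_label: str = "table"):
    """Return (bbox, cropped image) for every instance labeled as a table."""
    regions = []
    for label, mask in instances:
        if label != table_label:
            continue
        bbox = mask_to_bbox(mask)
        if bbox is not None:
            regions.append((bbox, crop(image, bbox)))
    return regions


# Toy 4x5 "image" and one table mask covering rows 1-2, columns 1-3.
image = [[10 * r + c for c in range(5)] for r in range(4)]
table_mask = [[1 if 1 <= r <= 2 and 1 <= c <= 3 else 0 for c in range(5)]
              for r in range(4)]
figure_mask = [[1 if r == 0 else 0 for c in range(5)] for r in range(4)]
instances = [("figure", figure_mask), ("table", table_mask)]
regions = extract_table_regions(image, instances)
```

The cropped regions are what Step 3 would receive as input for table recognition.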
In recent years, many advances in Mask R-CNN
have improved its performance on table
segmentation tasks. One of the most important
advances is the use of attention mechanisms, which
allow the model to focus on specific regions of the
image when making predictions. Attention
mechanisms are particularly effective for segmenting
tables, as tables are often difficult to distinguish from
other objects in an image. Another important
development is the use of data augmentation
techniques. Data
WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS
DOI: 10.37394/23209.2023.20.23
Hangseo Choi, Jongpil Jeong