
COVID-19 Medical Data Integration Approach
VIOLETA TODOROVA1, VESKA GANCHEVA2, VALERI MLADENOV1
1Department of Fundamentals of Electrical Engineering
2Department of Programming and Computer Technologies
Technical University of Sofia
Kliment Ohridski boul. 8, Sofia
BULGARIA
Abstract: - The accumulation of large amounts of data creates a need for automated methods of extracting
knowledge from them. This paper presents a conceptual model for integrating and processing medical data in
three layers, comprising a total of six phases, together with a model for integrating, filtering, sorting and
aggregating Covid-19 data. A medical data integration workflow was designed, including the steps of data
integration, filtering and sorting. The workflow was applied to Covid-19 medical data from the clinical
records of 20400 potential patients.
Key-Words: - Clinical Records, COVID-19, Data Analytics, Data Integration
Received: May 15, 2022. Revised: May 28, 2022. Accepted: June 19, 2022. Published: July 18, 2022.
1 Introduction
With the development of health care, science and
high technology, the volume of generated
information is growing at a tremendous speed [1].
As a result, multiple heterogeneous data sets
emerge that differ in type, storage format and
source. The process of managing data from
different sources is called data integration. It is
a typical process in fields such as medicine,
biology and bioinformatics.
Big data in medicine includes biological,
biometric and electronic health records [2].
Medical databases differ considerably in
terminology, record features and data
presentation [3], which causes problems when
querying multiple databases. Therefore, there is a
need to automate database integration so that it
does much more than simple data extraction and
modification [4, 5]. Records in different medical
databases have different formats. Integration
requires the use of common formats across
databases, but high dimensionality and
redundancies make such integration difficult.
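The format problem described above can be illustrated with a minimal
Python sketch. It is not the method used in this paper. The source
names, field names, date formats and gender codes below are all
hypothetical and serve only to show how records from two databases
might be mapped to one common format.

```python
from datetime import datetime

# Hypothetical mappings: source-specific field name -> common field name.
FIELD_MAPS = {
    "hospital_a": {"PatientID": "patient_id", "Sex": "gender", "DOB": "birth_date"},
    "hospital_b": {"id": "patient_id", "gender": "gender", "date_of_birth": "birth_date"},
}

# Hypothetical date format used by each source.
DATE_FORMATS = {"hospital_a": "%d.%m.%Y", "hospital_b": "%Y-%m-%d"}

# Hypothetical gender codes mapped to one vocabulary.
GENDER_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def to_common_format(record, source):
    """Rename the fields of one source record and normalize its values."""
    mapping = FIELD_MAPS[source]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    common["patient_id"] = str(common["patient_id"])
    gender = str(common["gender"]).strip().lower()
    common["gender"] = GENDER_CODES.get(gender, "unknown")
    parsed = datetime.strptime(common["birth_date"], DATE_FORMATS[source])
    common["birth_date"] = parsed.date().isoformat()  # ISO 8601 date string
    return common


record_a = {"PatientID": 17, "Sex": "F", "DOB": "03.11.1975"}
record_b = {"id": "42", "gender": "male", "date_of_birth": "1980-01-31"}

harmonized = [
    to_common_format(record_a, "hospital_a"),
    to_common_format(record_b, "hospital_b"),
]
```

After this step both records share the same field names, identifier
type, gender vocabulary and date format, so they can be queried
together.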
During the data integration process, filtering
operations are performed to remove duplicate data,
convert data, or otherwise manage it. The data
integration approach can also vary: extract,
transform and load (ETL); extract, load and
transform (ELT); data transformation; data
replication; data virtualization; and streaming
data integration [6].
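As an illustration of the ETL variant, the following Python sketch
extracts rows from CSV text, removes exact duplicates and converts
the age field during the transform step, and loads the result into
an in-memory SQLite table. The column names, sample rows and table
name are assumptions made for this example only and are not taken
from the data set used in this paper.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; one row is duplicated and one age is missing.
RAW_CSV = """patient_id,age,test_result
p1,34,positive
p2,51,negative
p1,34,positive
p3,,positive
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop exact duplicates and convert age to an integer."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["patient_id"], row["age"], row["test_result"])
        if key in seen:
            continue  # duplicate record, skip it
        seen.add(key)
        age = int(row["age"]) if row["age"] else None  # empty age -> NULL
        cleaned.append((row["patient_id"], age, row["test_result"]))
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(patient_id TEXT, age INTEGER, test_result TEXT)"
    )
    connection.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
stored = connection.execute(
    "SELECT patient_id, age, test_result FROM records ORDER BY patient_id"
).fetchall()
```

In an ELT variant the raw rows would be loaded first and the
deduplication and conversion would run inside the target system.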
This paper presents a conceptual model for
integrating and processing medical data in three
layers, comprising a total of six phases, together
with a model for integrating, filtering, sorting
and aggregating Covid-19 data, implemented in
Talend Open Studio [7].
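The four operations named above (integrating, filtering, sorting
and aggregating) can be sketched in a few lines of plain Python.
This is only an illustration of the operations, not the Talend Open
Studio job described in this paper. The record fields, the sample
values and the grouping by decade of age are hypothetical.

```python
from collections import Counter
from itertools import chain

# Hypothetical records from two sources that already share one format.
source_a = [
    {"patient_id": "a1", "age": 67, "result": "positive"},
    {"patient_id": "a2", "age": 45, "result": "negative"},
]
source_b = [
    {"patient_id": "b1", "age": 29, "result": "positive"},
    {"patient_id": "b2", "age": 73, "result": "positive"},
]

# Integrate: combine the records of both sources into one collection.
integrated = list(chain(source_a, source_b))

# Filter: keep only the records with a positive test result.
positives = [record for record in integrated if record["result"] == "positive"]

# Sort: order the remaining records by age, oldest first.
positives_by_age = sorted(positives, key=lambda record: record["age"], reverse=True)


def age_group(age):
    """Return the decade label for an age, for example 67 -> '60-69'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


# Aggregate: count the positive records per age group.
counts = Counter(age_group(record["age"]) for record in positives_by_age)
```

Each step takes the output of the previous one, which mirrors how
such operations are usually chained as components of a workflow.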
2 Material and Methods
The structure of the proposed medical data
integration and processing model is illustrated in Fig.
1. The model is organized into three layers, each of
which brings together the tasks to be performed.
Data management consists of three main phases:
data preparation for analysis, interpretation and
visualization. The preparation phase includes
medical data collection, medical data storage and
medical data integration. The "Medical Data
Collection" phase depends on the data sources, the
technical devices providing visual data, and the
specifics of the generated data, including data
types, data formats, images and features. One of
the main sources of medical data is patient data
obtained from patient examinations, symptoms and
personal data, including age, gender and medical
history. Sensor data, omics data, electronic
health data and health records are also collected.
The second phase of the proposed model, "Medical
Data Storage", concerns how the data is stored.
Typically, clinical data is collected and stored in
various file formats such as “.xls”, “.xlsx”, “.csv”,
“.xlsm”, DICOM, etc. However, there are two main
MOLECULAR SCIENCES AND APPLICATIONS
DOI: 10.37394/232023.2022.2.11
Violeta Todorova, Veska Gancheva, Valeri Mladenov