Models and Algorithms for Multimodal Data Processing
NATALIYA BOYKO
Artificial Intelligence Department,
Lviv Polytechnic National University,
UKRAINE
Abstract: - Information technologies and computer equipment are used in almost all areas of activity; new areas of their use keep emerging, the level of ICT implementation is deepening, and more and more functions that were once the prerogative of humans are being assigned to computers. As science and technology develop, new technologies and technical means enable a human-centered approach to software development, better adaptation of human-machine interfaces to user needs, and improved ergonomics of software products. These developments create fundamentally new opportunities for representing and processing, in computer systems, information about the real-world objects with which an individual interacts in production, educational, and everyday activities. The article aims to identify
current models and algorithms for processing multimodal data in computer systems based on a survey of
company employees, and to analyze these models and algorithms to determine the benefits of their use. Research methods: comparative analysis; systematization;
generalization; survey. Results. It has been established that the recommended multimodal data representation
models (the mixed model, the spatiotemporal linked model, and the multilevel ontological model) allow for
representing the digital twin of the object under study at differentiated levels of abstraction, and these
multimodal data processing models can be combined to obtain the most informative way to describe the
physical twin. As a result of the study, it was found that, for the "general judgment of the experience of using models and algorithms for multimodal data processing", respondents rated the item "Personally, I would say that models and algorithms for multimodal data processing are practical" with a mean of 8.16 (SD = 1.70) and the item "Personally, I would say that models and algorithms for multimodal data processing are understandable (not confusing)" with a mean of 7.52 (SD = 2.54). It has been determined that respondents positively evaluate (with scores above 5.0) models and algorithms for processing multimodal data in work environments as practical, understandable, manageable, and original.
Key-Words: - Models, algorithms, multimodal data processing, modalities.
Received: May 11, 2022. Revised: January 19, 2023. Accepted: February 14, 2023. Published: March 14, 2023.
1 Introduction
At the beginning of the 21st century, major global
changes took place, characterized by the progressive
development of digital and innovative technologies,
the revolution in the information space, and the
acceleration of globalization and digitalization in all
areas of professional activity, [1].
To solve various engineering problems, it is
important to have an accurate, complete, and
consistent representation of multimodal data about
the object of observation. However, for some tasks,
it is crucial to apply an integrated approach when
the data of differentiated modalities are
interconnected and time-specific. It is especially
important to use multimodal data in cases where it is
necessary to analyze the dynamics of changes in
differentiated parameters of an object
simultaneously.
Multimodal data processing is one of the most
complex tasks in the field of natural language
processing and machine learning. This is because
multimodal data processing requires processing
different types of data, such as images, video, text,
and sound, and combining them into a single model
or algorithm.
Today, there are no relatively universal models and algorithms for representing multimodal digital data. Therefore, forming models and algorithms for representing multimodal data that could be used to identify digital objects of observation is an urgent task. Multimodal data includes both a behavioral model and a visual model; thus, one of the modalities in these applications is information about the appearance of a three-dimensional object.
The idea of storing "volumetric" graphical information, represented in pixel and voxel graphics, should be extended to the storage of information of other modalities, such as audio information and information about the physical properties and parameters of an object (temperature, humidity, density). Thus, multimodal information about the object of observation can be stored based on a volumetric element of the most generalized type, which stores information on all possible modalities represented in time, [2], [3].
The relevance of this area of research is
explained by the need for further informatization of
society and the application of a human-centered
approach based on the combination of multimodal
data technology to solve a wide range of problems
in situations involving risk or limiting the possibility
of direct human participation. The need for this new
approach has been demonstrated by many man-
made and environmental disasters that have
occurred in recent decades, [4].
The research aims to identify current models and
algorithms for processing multimodal data in
computer systems based on a survey of company
employees and analyze these models and
algorithms. Moreover, it is crucial to determine the
benefits of using models and algorithms for
processing multimodal data.
Research objectives of the article:
1. To analyze the proposed multimodal data
representation models: the mixed model, the
spatiotemporal linked model, and the multilevel
ontological model.
2. To analyze the ways of presenting aggregated
data and specifying a multi-image.
3. To make a comparative characterization of
five commonly used models and algorithms for
processing multimodal data in computer systems.
4. To analyze the paradigm of multi-image
programming.
5. To survey company employees to establish a general
judgment about the experience of using models and
algorithms for processing multimodal data in
computer systems.
2 Problem Formulation
Previously, researchers focused on studying one
modality in the form of text, speech, or images, [5].
However, with the advancement of computer processing power and the development of sophisticated sensors, multimodal approaches can now be applied, which yields more accurate and detailed results. Affective data analysis can be
performed on all types of multimedia, such as text,
images, speech, physiological signals, and video,
[6].
Multimodality refers to forms of communication
and meaning-making that go beyond spoken or
written language, [7]. This includes speaking,
writing, and "visual, auditory, embodied, and spatial
aspects of interaction", [8].
When analyzing multimodal data, it is
additionally necessary to consider differentiated
feature spaces, as well as common and unique
variations between modalities and between
packages, [9]. These problems are not unique to
multimodal data, as domain adaptation, [10], and
multiobjective learning, [11], are well-known
machine learning problems that are directly
applicable to multimodal data integration, [12].
Several methodologies exist that can integrate
information from multimodal data to improve the
effectiveness of monitoring, diagnosis, and
prognostics. Multimodal data fusion is a dynamic area of research applied in various industries and their interdisciplinary fields, such as automation, manufacturing, and robotics, [13]. The main goal is to process information from multiple heterogeneous sources to form a near-accurate view of the structural, functional, and behavioral states of a machine. Based on the
observation of these states, specialists and
professionals in the relevant field derive elements of
analysis that can offer several reasoning tools with
digital interactive services for human-robot
recognition, [14], [15], and interaction, [16], [17].
Indeed, in some complex environments, it is necessary to use multiple multimodal data sources that provide additional information about the same
situation and offer excellent problem-solving
capabilities.
Multimodal processing mechanisms have proven useful for defect detection and diagnosis, [18], fault prediction, [19], and estimation of the remaining useful life (RUL), [20], of systems in industrial plants. An approach based on multimodal RUL data may be appropriate for evaluating cutting tools during milling, [21].
In the context of fault diagnosis, researchers recommend the use of the DCAE (deep coupling autoencoder) model, which captures multimodal sensor signals related to the measured space, such as sound and vibration data, and incorporates multimodal feature extraction directly into the diagnosis of fault modes, [18].
In scientific works, the basic architecture of a software system for processing multimodal data was
presented, [22], [23], [24]. The proposed architecture provides the following main components, [22]:
1. A software component that implements an
information model that provides an abstract
specification of the technical characteristics of a
physical object.
2. A software component that implements the
mechanism of communication between digital
and physical twins.
3. A software component that implements
procedures for processing, analyzing, and
searching multimodal data to obtain up-to-date
information about a physical twin.
4. A key-value pair storage, [25], designed to store digital twin data (a minimal storage sketch follows this list). The proposed architecture does not include a component for visualizing the digital twin; the result of such a software system is analytical data only.
The input data for the multi-image creation
method are temporal data sets and the data type
(modality) of each set. This method consists of
seven stages. The result of the multi-image creation
method is a multi-image of the object under study,
which is presented as an ordered set of temporal
multimodal data. When solving some tasks of
analyzing temporal multimodal data, it may be
necessary to synchronize several multi-images. Let
us analyze a method that allows for such
synchronization, including in the case when the time
values are not clearly defined.
The multi-image synchronization method is based on the use of interval relations and the implementation of synchronization rules, which are divided into three types:
1. The universal synchronization rule allows for accurate synchronization of temporal multimodal data and can be applied to arbitrary tuples of time values, but it requires operations that cannot be reduced.
2. A synchronization template, created by combining basic synchronization rules, increases the efficiency of multimodal data processing.
3. Fuzzy synchronization rules allow synchronization with a certain permissible error in determining the elements of the tuples of time values in the synchronized multi-images; they can also be used to create a synchronization template to increase the efficiency of multi-image data processing (a fuzzy-matching sketch follows this list).
Let's analyze the mixed model, which allows us
to consider the digital twin of the object under study
as a solid object defined by a set of temporal
multimodal data, each of which is defined by a
specific multi-image. The idea of a mixed model is
based on the fact that one of the important
modalities for the visual display of a digital twin is
information about the appearance of its physical
twin, a real three-dimensional object. This model can then be defined through the concept of a muxel, a single element of the object under study, which is described by an ordered set of temporal multimodal data: a multi-image.
The spatiotemporal linked model represents the
object under study through a set of its discrete states, each determined at a unique moment in time by a set of object characteristics represented as multimodal data. At each moment, an object has one and only
one state, which is determined by its multi-image.
At the same time, the temporal tuple of each multi-
image contains only one value that determines the
moment in time when the object had a certain
specific state. This approach allows for the use of
the time value as a key that uniquely identifies this
particular state in an ordered sequence of states of
the object under study.
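A minimal sketch of this idea, with the time value acting as the unique key of a state; the class and method names are assumptions for illustration:

```python
from collections import OrderedDict

class SpatiotemporalTwin:
    """Sketch of the spatiotemporal linked model: the object is an ordered
    sequence of discrete states, each uniquely keyed by its time value."""

    def __init__(self):
        # time value -> multi-image (here simplified to a dict of modalities);
        # states are assumed to be added in chronological order.
        self._states = OrderedDict()

    def add_state(self, t, multi_image):
        if t in self._states:
            raise ValueError(f"object already has a state at t={t}")
        self._states[t] = multi_image

    def state_at(self, t):
        return self._states[t]  # the time value acts as a unique key

twin = SpatiotemporalTwin()
twin.add_state(1.0, {"position": (0, 0, 0), "temperature": 21.5})
twin.add_state(2.0, {"position": (0, 0, 1), "temperature": 21.7})
print(twin.state_at(2.0)["position"])
```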
In a multilevel ontological model, the object
under study is represented as a composition of its
constituents (components), which are characterized
by certain multimedia parameters and behavioral
characteristics determined by the corresponding
multimodal data. A multilevel ontological model is
defined as an oriented graph in which each node is
described by a set of multimodal data. The links
between nodes show the semantic subordination of
components to each other.
The multimodal data about the object under
study, which is provided utilizing a digital twin,
may be confidential. If such data is stored in a service such as Azure Data Lake Storage, confidentiality is guaranteed by the data protection mechanisms of the storage service itself. However, if the amount of multimodal data is relatively small, the data can be stored locally, but it may still need to be protected from unauthorized access. Ensuring confidential
storage for different multimodal data can be
achieved in differentiated ways, including the use of
lightweight cryptography algorithms, which have
recently been used to protect data in IoT
applications, [26], [27], [28], [29], and
steganography algorithms, which are used to protect
multimedia data, [30].
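A minimal sketch of such confidential local storage follows, using AES-GCM from the third-party cryptography package as a stand-in for a dedicated lightweight cipher of the kind surveyed in [26], [28]; the nonce-plus-ciphertext blob layout is an assumption of this example:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_blob(key, plaintext, associated_data):
    nonce = os.urandom(12)  # 96-bit nonce, never reused with the same key
    ct = AESGCM(key).encrypt(nonce, plaintext, associated_data)
    return nonce + ct       # store the nonce alongside the ciphertext

def decrypt_blob(key, blob, associated_data):
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, associated_data)

key = AESGCM.generate_key(bit_length=128)
# The associated data binds the ciphertext to its modality label.
blob = encrypt_blob(key, b"raw audio samples", b"modality=audio")
print(decrypt_blob(key, blob, b"modality=audio"))
```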
Thus, current models and algorithms for multimodal data processing are only sparsely reflected in scientific publications, whether as theoretical research or practical studies. However,
the issue of promoting the development,
implementation, and application of models and
algorithms for multimodal data processing remains
relevant and open for further research.
3 Materials and Methods
The realization of the goal of this study involves the use of the following research methods:
1. Systematization of the main features of the implementation of the three proposed models for processing multimodal data: the mixed model, the spatiotemporal linked model, and the multilevel ontological model.
2. System and logical analysis and the method of information synthesis, applied to the ways of presenting aggregated data and the specification of a multi-image.
3. Generalization of the latest scientific publications related to the multi-image programming paradigm.
4. A comparison method for distinguishing the characteristics of five commonly used models and algorithms for processing multimodal data in computer systems.
To determine the experience of applying models and algorithms for processing multimodal data in computer systems, a survey was conducted to establish a general judgment about that experience; the data, collected via MS Forms Pro, were analyzed using descriptive statistics. An online survey was conducted
from October 10, 2022, to January 30, 2023,
collecting information from 550 respondents. These
participants answered questions about their
experience with the use of models and algorithms
for processing multimodal data in computer
systems, motivation, expectations, and overall
satisfaction with the use of these models and
algorithms.
4 Results
A sequence of multicomponent values can be viewed as a function of many variables:
A = F(t, a_1, a_2, ..., a_n),
where t is a time value and a_1, a_2, ..., a_n are the values that define properties 1 through n of the object. Such a view of aggregated data allows the calculus of many variables to be applied in problems in which it is appropriate to represent and process multimodal data using the operations and relations defined in the ASA (algebraic system of aggregates); that is, the use of the ASA does not preclude the use of other mathematical concepts. The ways of presenting aggregated data are illustrated in Figure 1.
Fig. 1: Ways of presenting aggregated data
Notes: t denotes a time value; a_1, a_2, ..., a_n are the multi-component values that define properties 1, 2, ..., n of the object.
Source: Compiled by the authors based on official data of Sulema, [2].
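To make the notion of an aggregate concrete, a minimal sketch follows; the Aggregate class and its at() operation are illustrative assumptions, not the formal ASA operations:

```python
from dataclasses import dataclass

@dataclass
class Aggregate:
    """An aggregate as an ordered set of tuples: a tuple of time values plus
    one tuple of values per property, all of the same length."""
    times: tuple       # tuple of time values t
    properties: dict   # property name -> tuple of values a_i, aligned with times

    def at(self, t):
        """Return the multicomponent value A = F(t, a_1, ..., a_n) for a stored t."""
        i = self.times.index(t)
        return {name: values[i] for name, values in self.properties.items()}

agg = Aggregate(times=(0, 1, 2),
                properties={"temperature": (20.1, 20.4, 20.9),
                            "humidity": (0.41, 0.40, 0.39)})
print(agg.at(1))  # {'temperature': 20.4, 'humidity': 0.4}
```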
If the data about the object of observation is obtained taking into account the time of observation or measurement, then the aggregate must contain a tuple of time values corresponding to the moments when the values of the tuples of multimodal data were obtained; such an aggregate is what we call the multi-image of the digital twin of the object under study. A multi-image is an aggregate whose first tuple is a nonempty tuple of time values.
These values can be natural numbers or any other
values that provide clarity and unambiguity of
information about the moments in time when the
elements of other tuples of the multi-image were
obtained. The schematic definition of a multi-image
is shown in Figure 2, each block of which contains
the name of the data sequence (tuple), the modality
of the data (the set to which it belongs), and the
length of the data tuple.
Fig. 2: Schematic specification of multi-image
Source: Compiled by the authors based on official data of
Sulema, [2].
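A minimal sketch of a multi-image structure matching this definition; the class name and the (name, modality, values) layout mirror Figure 2 but are otherwise assumptions:

```python
class MultiImage:
    """A multi-image: a nonempty tuple of time values followed by named data
    tuples, each with a declared modality (the set it belongs to)."""

    def __init__(self, time_values, data_tuples):
        if not time_values:
            raise ValueError("a multi-image requires a nonempty tuple of time values")
        self.time_values = tuple(time_values)
        # data_tuples: iterable of (name, modality, values) entries, as in Figure 2.
        self.data_tuples = [(name, modality, tuple(values))
                            for name, modality, values in data_tuples]

    def describe(self):
        for name, modality, values in self.data_tuples:
            print(f"{name}: modality={modality}, length={len(values)}")

mi = MultiImage((0.0, 0.5, 1.0),
                [("sound", "audio", (0.1, -0.2, 0.05)),
                 ("temp", "temperature", (21.0, 21.2, 21.1))])
mi.describe()
```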
To represent data for each level of abstraction (a
solid object; an object defined by a set of states; an
object that is a composition of its constituents), it is
necessary to develop appropriate models that will
allow for high-quality representation and efficient
processing of multimodal data of the digital twin of
the object under study.
The mixed model of the digital twin of the object
under study is based on the decomposition of the
object space, as shown in Figure 3.
Fig. 3: The mixed model of the object under study
Source: Compiled by the authors based on official data of Sulema, [2].
An example of a schematic multi-image specification for describing a muxel is shown in Figure 4.
Fig. 4: Schematic multi-image specification for describing a muxel
Source: Compiled by the authors based on official data of Sulema, [2].
The following mathematical specification corresponds to this schematic specification:
M_muxel = <(t_1, ..., t_K)_T, (x, y, z)_R3, (g_1, ..., g_r)_RQ, (s_1, ..., s_N)_Z3, (tau_1, ..., tau_K)_Mt, (d_1, ..., d_K)_Md, (h_1, ..., h_K)_Mh>,
where T is a set of time values; R3 is the set of Cartesian coordinates of a point in the object's space; RQ is a set of graphical data components; Z3 is a set of values of acoustic components; Mt is a set of temperature data; Md is a set of material density values; Mh is a set of material moisture values; and r, N, and K are the numbers of elements in the corresponding data tuples. It is advisable to use the mixed model when detailed information is needed for a comprehensive description of a small object.
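A minimal sketch of a muxel record following this specification; the field names are assumptions:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Muxel:
    """One volumetric element of the mixed model, holding tuples drawn from
    the sets of the specification above."""
    times: Tuple[float, ...]              # elements of T, length K
    point: Tuple[float, float, float]     # a point of the object's space, from R3
    graphics: Tuple[float, ...]           # r graphical components from RQ
    acoustics: Tuple[int, ...]            # N acoustic components from Z3
    temperature: Tuple[float, ...]        # elements of Mt
    density: Tuple[float, ...]            # elements of Md
    moisture: Tuple[float, ...]           # elements of Mh

m = Muxel(times=(0.0, 1.0), point=(0.1, 0.2, 0.3),
          graphics=(0.9, 0.5, 0.3), acoustics=(12, -4),
          temperature=(21.4, 21.6), density=(2.7, 2.7), moisture=(0.12, 0.13))
print(m.point, m.temperature)
```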
The spatiotemporal linked model of the object
under study is based on the definition of a specific
state in an ordered sequence of states of the object
under study, as shown in Figure 5.
The multilevel ontological model of the object
under study is based on the description of the
multimodal data set of the object under study as
shown in Figure 6.
Fig. 5: The spatiotemporal linked model of the object under study
Source: Compiled by the authors based on official data of Sulema, [2].
Fig. 6: The multilevel ontological model of the object under study
A general judgment on the experience of
applying models and algorithms for processing
multimodal data is shown in Table 1.
Table 1. Questionnaire on the experience of using models and algorithms for multimodal data processing (assessment on a 1-10 scale; mean (SD))
1. "I would say that models and algorithms for processing multimodal data are practical": 8.16 (1.70)
2. "I would say that the models and algorithms for processing multimodal data are understandable": 7.52 (2.54)
3. "I would say that models and algorithms for multimodal data processing are positively contributing to multimodal data processing": 8.24 (1.96)
4. "I have found that models and algorithms for multimodal data processing contribute to original processing of multimodal data": 7.20 (2.55)
5. "I have established that models and algorithms for processing multimodal data are modifiable to make the work easier": 6.36 (2.94)
6. "I have discovered that models and algorithms for processing multimodal data were easy (1) / difficult (10) to use in the course of my activities": 4.00 (3.06)
7. "I have found that the models and algorithms for processing multimodal data are unpleasant (1) / pleasant (10) to use": 7.00 (2.16)
Source: Compiled by the authors based on official data.
To determine the most commonly used
models and algorithms for processing multimodal
data, a survey was conducted, the results of which
are shown in Table 2.
Table 2. Comparison of five commonly used models and algorithms for multimodal data processing (ability to apply the most informative ways to process multimodal data; convenience of the model/algorithm)
The multilevel ontological model: strong; convenient
The spatiotemporal linked model: very strong; convenient
The mixed model: strong; very convenient
The lightweight cryptography algorithm: strong; convenient
The steganographic algorithm: very strong; convenient
Source: Compiled by the authors based on official data.
The multi-image programming paradigm is
focused on solving the problems of processing
temporal multimodal data. In this case, the
developer must take into account the
interconnection of data in terms of the time interval
of their existence (obtaining, determining,
generating, measuring, etc.) and data modality, that
is, operate with such an entity as a multi-image of
an object. In addition, the concepts of the source and the receiver of multimodal data are abstracted through specialized libraries that convert data from one form of representation to another, without requiring the programmer to develop their own code for preparing multimodal data (see Figure 7); a converter-registry sketch follows.
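A minimal sketch of this source/receiver abstraction; the converter registry and function names are hypothetical, not an actual library API:

```python
# Hypothetical converter functions registered per modality turn raw inputs
# into tuples ready for a multi-image, so application code does not prepare
# multimodal data by hand.
converters = {}

def converter(modality):
    def register(func):
        converters[modality] = func
        return func
    return register

@converter("audio")
def wav_to_tuple(raw_bytes):
    # Placeholder decoding: a real library call would parse the format here.
    return tuple(raw_bytes)

@converter("temperature")
def csv_to_tuple(text):
    return tuple(float(x) for x in text.split(","))

def ingest(modality, payload):
    return converters[modality](payload)

print(ingest("temperature", "21.0,21.2,21.1"))
```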
Fig. 7: The multi-image programming paradigm
Source: Compiled by the authors based on official data of Sulema, [4].
5 Discussion
The results of the study of models and algorithms
for multimodal data processing led to the following
conclusions. Three models for multimodal data
processing were proposed: the mixed model, the
spatiotemporal linked model, and the multilevel
ontological model. These models make it possible to
represent the digital twin of the object under study
at differentiated levels of abstraction: as a
continuous object (the mixed model), as an object
defined by a set of states dynamically changing over
time (the spatiotemporal linked model); as an object
that is a composition of its components, each of
which, in turn, can be considered as a separate
object, forming a hierarchy of objects (the
multilevel ontological model). The proposed models
of multimodal data processing can be combined to
obtain the most informative way to describe a
physical twin, [2], [3], [4].
The evaluation subscale measures the "overall
judgment of experience in applying models and
algorithms for multimodal data processing". In this
subscale, the mean value of 8.16 (SD = 1.70) for the item "I would say that models and algorithms for multimodal data processing are practical" and the mean value of 7.52 (SD = 2.54) for the item "I would say that models and algorithms for multimodal data processing are understandable (not confusing)" stand out. It is notable that respondents
positively evaluate (with scores above 5.0) models
and algorithms for processing multimodal data in
work environments as practical, understandable,
manageable, and original. In this regard, it would be worthwhile to investigate whether the application of models and algorithms for multimodal data processing can further increase the efficiency of temporal multimodal data processing in computer systems.
As summarized in Figure 7, the multi-image programming paradigm is based on the following principles:
1. When multimodal data is received (identified, measured, or generated), the time of its receipt is recorded.
2. Data of differentiated modalities that define a particular object under study are considered and processed together with data of other modalities that define the same object.
3. The data is obtained from heterogeneous sources and in arbitrary formats.
4. Synchronization and aggregation of data of different modalities form the basis of the computing model for processing multi-image data.
5. Multi-image processing is performed using the operations and relations defined in the algebraic system of aggregates.
6. The processing flow is based on the temporality property of multimodal data.
7. The multi-image is the main entity on which the program's data representation and processing are based.
8. The paradigm is implemented through the use of a basic computing model and a computing model for digital twin technology.
Thus, the implementation of models and
algorithms for multimodal data processing in
computer systems will face new challenges in line
with innovative changes. Moreover, in-depth
research will lead to increased attention to
improving models and algorithms for multimodal
data processing.
6 Conclusion
As a result of the analysis of models and algorithms
for processing multimodal data, it was found that
the development of a new class of software and
hardware systems based on digital twin technology
and multimedia technology will expand the
capabilities and increase the efficiency of human
activity in complex or non-standard conditions.
Increased returns from the use of these models and algorithms for processing multimodal data, the attention they attract, their novelty in use, and the prospects and opportunities in this area are only a part of all possible positive effects of using models and algorithms for processing multimodal data in professional activities.
Since the digital twin technology is characterized
not only by technical characteristics or behavioral
data but also by a visual model, it is advisable to
develop this technology in conjunction with
multimedia technology, which involves operating
not only audiovisual data but also data of other
modalities that transmit other types of information
perceived by the human senses. The application of
this approach will synergistically enhance the
capabilities of both technologies and contribute to
the development of a new class of software and
hardware systems for processing multimodal data
and solving a wide range of tasks remotely.
The practical significance of the study lies in the
fact that the conclusions and recommendations
developed by the author and proposed in the article
can be used to select tools for implementing the
proposed models and algorithms for processing
multimodal data. Further research can be aimed at
improving and developing methods for studying the
practical principles of implementation and studying
models and algorithms for processing multimodal
data in computer systems. In future research, it
would be interesting to analyze the capabilities of
models and algorithms for multimodal data
processing in differentiated industries.
A strength of the study is that the use of models in multimodal data processing is evaluated in both theoretical and practical aspects. For this, in addition to the analysis of scientific works, a survey of experts and specialists working in this area was conducted.
Further research in this area may include the
development of new architectures and algorithms
that would be able to efficiently process different
types of data and combine them into a single model.
For example, deep neural networks can be
developed that can process images and text
simultaneously.
References:
[1] Sîrghi, S., Sîrghi, A. Design for online teaching and learning in the context of digital education. Știinţa culturii fizice, No. 35/1, 50-54. 2020. Online available from https://doi.org/10.52449/1857-4114.2020.35-1.08.
[2] Sulema, Ye., Dychka, I., Sulema, O. Multimodal Data Representation Models for Virtual, Remote, and Mixed Laboratories Development. In Lecture Notes in Networks and Systems, Springer Cham, vol. 47, pp. 559-569. 2018.
[3] Dychka, I. A., Sulema, E. S. Multimodal data representation model for a comprehensive description of observation objects. Bulletin of the Vinnytsia Polytechnic Institute, (1), 53-60. 2020. Online available from https://doi.org/10.31649/1997-9266-2020-148-1-53-60.
[4] Sulema, E. S. Methods, models, and tools for processing multimodal data of digital duplicates of researched objects. The National Technical University of Ukraine "Kyiv Polytechnic Institute named after Igor Sikorsky", Kyiv, 343 p. 2020.
[5] Nusrat, J. S., Li-Minn, A., Kah Phooi Seng, D. M., Motiur, R., Tanveer, Z. Multimodal big data affective analytics: A comprehensive survey using text, audio, visual and physiological signals. Journal of Network and Computer Applications, vol. 149, 102447. 2020. Online available from https://doi.org/10.1016/j.jnca.2019.102447.
[6] Calvo, R., D'Mello, S. Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications. IEEE Transactions on Affective Computing, 1, 18-37. 2010. Online available from http://dx.doi.org/10.1109/T-AFFC.2010.1.
[7] Scollon, R., Scollon, S. Multimodality and language: a retrospective and prospective view. In C. Jewitt (Ed.), The Routledge Handbook of Multimodal Analysis, pp. 170-180. 2009. London: Routledge.
[8] Jewitt, C. Multimodal methods for researching digital technologies. In S. Price, C. Jewitt, & B. Brown (Eds.), The Sage Handbook of Digital Technology Research, pp. 250-265. 2013. London: Sage.
[9] Argelaguet, R., Cuomo, A. S. E., Stegle, O., Marioni, J. C. Computational principles and challenges in single-cell data integration. Nature Biotechnology, 39, 1202-1215. 2021. Online available from DOI: 10.1038/s41587-021-00895-7.
[10] Csurka, G. A Comprehensive Survey on Domain Adaptation for Visual Applications. Advances in Computer Vision and Pattern Recognition, pp. 1-35. 2017. Online available from DOI: 10.1007/978-3-319-58347-1_1.
[11] Zhao, J., Xie, X., Xu, X., Sun, S. Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38, 43-54. 2017. Online available from DOI: 10.1016/J.INFFUS.2017.02.007.
[12] Lance, C., Luecken, M. D., Burkhardt, D. B., Cannoodt, R., Rautenstrauch, P., Laddach, A., Ubingazhibov, A., Cao, Z.-J., Deng, K., Khan, S., Liu, Q., Russkikh, N., Ryazantsev, G., Ohler, U., Pisco, A. O., Bloom, J., Krishnaswamy, S., Theis, F. J. Multimodal single-cell data integration challenge: results and lessons learned. 2022. Online available from https://doi.org/10.1101/2022.04.11.487796.
[13] Bokade, R., Navato, A., Ouyang, R., Jin, X., Chou, C.-A., Ostadabbas, S., Mueller, A. V. A cross-disciplinary comparison of multimodal data fusion approaches and applications: Accelerating learning through trans-disciplinary information sharing. Expert Systems with Applications, 165, Article 113885. 2021. Online available from https://doi.org/10.1016/j.eswa.2020.113885.
[14] Gupta, A., Anpalagan, A., Guan, L., Khwaja, A. S. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues. Array, 100057. 2021. Online available from https://doi.org/10.1016/j.array.2021.100057.
[15] Alkhalaf, S. A robust variance information fusion technique for real-time autonomous navigation systems. Measurement, 179, Article 109441. 2021. Online available from https://doi.org/10.1016/j.measurement.2021.109441.
[16] Cuayahuitl, H. A data-efficient deep learning approach for deployable multimodal social robots. Neurocomputing, 396, 587-598. 2020. Online available from https://doi.org/10.1016/j.neucom.2018.09.104.
[17] Liu, H., Fang, T., Zhou, T., Wang, L. Towards robust human-robot collaborative manufacturing: Multimodal fusion. IEEE Access, 6, 74762-74771. 2018. Online available from https://doi.org/10.1109/ACCESS.2018.2884793.
[18] Ma, M., Sun, C., Chen, X. Deep coupling autoencoder for fault diagnosis with multimodal sensory data. IEEE Transactions on Industrial Informatics, 14, 1137-1145. 2018. Online available from https://doi.org/10.1109/TII.2018.2793246.
[19] Yang, Z., Baraldi, P., Zio, E. A multi-branch deep neural network model for failure prognostics based on multimodal data. Journal of Manufacturing Systems, 59, 42-50. 2021. Online available from https://doi.org/10.1016/j.jmsy.2021.01.007.
[20] Al-Dulaimi, A., Zabihi, S., Asif, A., Mohammadi, A. A multimodal and hybrid deep neural network model for remaining useful life estimation. Computers in Industry, 108, 186-196. 2019. Online available from https://doi.org/10.1016/j.compind.2019.02.004.
[21] Kumar, S., Kolekar, T., Patil, S., Bongale, A., Kotecha, K., Zaguia, A., Prakash, C. A low-cost multi-sensor data acquisition system for fault detection in fused deposition modeling. Sensors, 22, 517. 2022. Online available from https://doi.org/10.3390/s22020517.
[22] Lu, Y., Liu, C., Wang, K. I-K., Huang, H., Xu, X. Digital Twin-driven smart manufacturing: connotation, reference model, applications and research issues. Robotics and Computer-Integrated Manufacturing, vol. 61, pp. 1-14. 2020.
[23] Alam, K. M., El Saddik, A. C2PS: A digital twin architecture reference model for the cloud-based cyber-physical systems. IEEE Access, vol. 5, pp. 2050-2062. 2017.
[24] Redelinghuys, A. J. H., Basson, A. H., Kruger, K. A Six-Layer Digital Twin Architecture for a Manufacturing Cell. Studies in Computational Intelligence, vol. 803, pp. 412-423. 2018.
[25] Keith, D. Understanding Key-Value Databases. Dataversity. 2020. Online available from https://www.dataversity.net/understanding-key-value-databases/.
[26] Buchanan, W. J., Li, S., Asif, R. Lightweight cryptography methods. Journal of Cyber Security Technology, vol. 1, issue 3-4, pp. 187-201. 2017.
[27] Ronen, E., Shamir, A. Extended functionality attacks on IoT devices: The case of smart lights. Proceedings of the 2016 IEEE European Symposium on Security and Privacy (SP'16), pp. 3-12. 2016.
[28] Dhanda, S. S., Singh, B., Jindal, P. Lightweight Cryptography: A Solution to Secure IoT. Wireless Personal Communications, vol. 112, pp. 1947-1980. 2020.
[29] Dutta, I. K., Ghosh, B., Bayoumi, M. Lightweight Cryptography for Internet of Insecure Things: A Survey. Proceedings of the IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC 2019), pp. 0475-0481. 2019.
[30] Maharjan, R., Shrestha, A. K., Basnet, R. Image Steganography: Protection of Digital Properties against Eavesdropping. arXiv, 8 p. 2019.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US