Models and Algorithms for Multimodal Data Processing
NATALIYA BOYKO
Artificial Intelligence Department,
Lviv Polytechnic National University,
UKRAINE
Abstract: - Information technologies and computer equipment are used in almost all areas of activity; new areas of their use keep emerging, the level of ICT implementation is deepening, and more and more functions that were once the prerogative of humans are being assigned to computers. As science and technology develop, new technologies and technical means enable a human-centered approach to software development, better adaptation of human-machine interfaces to user needs, and improved ergonomics of software products. These developments create fundamentally new opportunities for representing and processing, in computer systems, information about the real-world objects with which an individual interacts in production, educational, and everyday activities. The article aims to identify
current models and algorithms for processing multimodal data in computer systems based on a survey of
company employees, and to analyze these models and algorithms to determine the benefits of their use. Research methods: comparative analysis; systematization;
generalization; survey. Results. It has been established that the recommended multimodal data representation
models (the mixed model, the spatiotemporal linked model, and the multilevel ontological model) allow for
representing the digital twin of the object under study at differentiated levels of abstraction, and these
multimodal data processing models can be combined to obtain the most informative way to describe the
physical twin. As a result of the study, it was found that, for the "general judgment of the experience of using models and algorithms for multimodal data processing", respondents rated the item "Personally, I would say that models and algorithms for multimodal data processing are practical" with a mean of 8.16 (SD = 1.70) and the item "Personally, I would say that models and algorithms for multimodal data processing are understandable (not confusing)" with a mean of 7.52 (SD = 2.54). It has been determined that respondents positively evaluate (with scores above 5.0) models and algorithms for processing multimodal data in work environments as practical, understandable, manageable, and original.
Key-Words: - Models, algorithms, multimodal data processing, modalities.
Received: May 11, 2022. Revised: January 19, 2023. Accepted: February 14, 2023. Published: March 14, 2023.
1 Introduction
At the beginning of the 21st century, major global
changes took place, characterized by the progressive
development of digital and innovative technologies,
the revolution in the information space, and the
acceleration of globalization and digitalization in all
areas of professional activity, [1].
To solve various engineering problems, it is
important to have an accurate, complete, and
consistent representation of multimodal data about
the object of observation. However, for some tasks,
it is crucial to apply an integrated approach when
the data of differentiated modalities are
interconnected and time-specific. It is especially
important to use multimodal data in cases where it is
necessary to analyze the dynamics of changes in
differentiated parameters of an object
simultaneously.
Multimodal data processing is one of the most
complex tasks in the field of natural language
processing and machine learning. This is because
multimodal data processing requires processing
different types of data, such as images, video, text,
and sound, and combining them into a single model
or algorithm.
Today, there are no relatively universal models and algorithms for representing multimodal digital data. Therefore, forming models and algorithms for representing multimodal data that could be used to identify digital objects of observation is an urgent task. Multimodal data includes both a behavioral model and a visual model; thus, one of the modalities in these applications is information about the appearance of a three-dimensional object.
The idea of storing "volumetric" graphical information, represented in pixel and voxel graphics, should be extended to the storage of information of other modalities, such as audio information and information about the physical properties and parameters of an object (temperature, humidity, density). Thus, multimodal information about the object of observation can be stored based on a volumetric element of the most generalized type, which stores information on all possible modalities represented in time, [2], [3].
The relevance of this area of research is
explained by the need for further informatization of
society and the application of a human-centered
approach based on the combination of multimodal
data technology to solve a wide range of problems
in situations involving risk or limiting the possibility
of direct human participation. The need for this new
approach has been demonstrated by many man-
made and environmental disasters that have
occurred in recent decades, [4].
The research aims to identify current models and
algorithms for processing multimodal data in
computer systems based on a survey of company
employees and analyze these models and
algorithms. Moreover, it is crucial to determine the
benefits of using models and algorithms for
processing multimodal data.
Research objectives of the article:
1. To analyze the proposed multimodal data
representation models: the mixed model, the
spatiotemporal linked model, and the multilevel
ontological model.
2. To analyze the ways of presenting aggregated
data and specifying a multi-image.
3. To make a comparative characterization of
five commonly used models and algorithms for
processing multimodal data in computer systems.
4. To analyze the paradigm of multi-image
programming.
5. To survey company employees to establish a general
judgment about the experience of using models and
algorithms for processing multimodal data in
computer systems.
2 Problem Formulation
Previously, researchers focused on studying one
modality in the form of text, speech, or images, [5].
However, with the advancement of computer processing power and the development of sophisticated sensors, multimodal approaches can now be applied, which yields more accurate and detailed results. Affective data analysis can be
performed on all types of multimedia, such as text,
images, speech, physiological signals, and video,
[6].
Multimodality refers to forms of communication
and meaning-making that go beyond spoken or
written language, [7]. This includes speaking,
writing, and "visual, auditory, embodied, and spatial
aspects of interaction", [8].
When analyzing multimodal data, it is
additionally necessary to consider differentiated
feature spaces, as well as common and unique
variations between modalities and between
packages, [9]. These problems are not unique to
multimodal data, as domain adaptation, [10], and
multiobjective learning, [11], are well-known
machine learning problems that are directly
applicable to multimodal data integration, [12].
Several methodologies exist that can integrate
information from multimodal data to improve the
effectiveness of monitoring, diagnosis, and
prognostics. Multimodal data fusion is a dynamic area of research applied in various industries and their interdisciplinary fields, such as automation, manufacturing, and robotics, [13]. The main goal is to process information from multiple heterogeneous sources to form a near-accurate view of the structural, functional, and behavioral states of a machine. Based on the
observation of these states, specialists and
professionals in the relevant field derive elements of
analysis that can offer several reasoning tools with
digital interactive services for human-robot
recognition, [14], [15], and interaction, [16], [17].
Indeed, in some complex environments, it is necessary to use multiple multimodal data sources that provide additional information about the same
situation and offer excellent problem-solving
capabilities.
Multimodal processing mechanisms have proven useful for defect detection and diagnosis, [18], fault prediction, [19], and estimation of the remaining useful life (RUL), [20], of systems in industrial plants. An approach based on multimodal RUL data may be appropriate for evaluating cutting tools during milling, [21].
In the context of fault diagnosis, researchers recommend the use of the DCAE (deep coupling autoencoder) model, which captures multimodal sensor signals related to the measured space, such as sound and vibration data, and incorporates multimodal feature extraction directly into the diagnosis of fault modes, [18].
In scientific works, the basic architecture of a software system for processing multimodal data was
presented, [22], [23], [24]. The proposed architecture provides the following main components, [22]:
1. A software component that implements an
information model that provides an abstract
specification of the technical characteristics of a
physical object.
2. A software component that implements the
mechanism of communication between digital
and physical twins.
3. A software component that implements
procedures for processing, analyzing, and
searching multimodal data to obtain up-to-date
information about a physical twin.
4. A key-value pair storage, [25], designed to store digital twin data (a minimal storage sketch follows this list). The proposed architecture does not include a component for visualizing the digital twin; the result of such a software system is analytical data only.
The input data for the multi-image creation
method are temporal data sets and the data type
(modality) of each set. This method consists of
seven stages. The result of the multi-image creation
method is a multi-image of the object under study,
which is presented as an ordered set of temporal
multimodal data. When solving some tasks of
analyzing temporal multimodal data, it may be
necessary to synchronize several multi-images. Let
us analyze a method that allows for such
synchronization, including in the case when the time
values are not clearly defined.
The multi-image synchronization method is based on the use of interval relations and the implementation of synchronization rules, which are divided into three types:
1. The universal synchronization rule allows for accurate synchronization of temporal multimodal data and can be applied to arbitrary tuples of time values, but it requires operations that cannot be reduced.
2. A synchronization template, created by combining basic synchronization rules, increases the efficiency of multimodal data processing.
3. Fuzzy synchronization rules allow synchronization with a certain permissible error in determining the elements of the tuples of time values in the synchronized multi-images; they can also be used to create a synchronization template to increase the efficiency of multi-image data processing (a fuzzy-matching sketch follows this list).
Let's analyze the mixed model, which allows us
to consider the digital twin of the object under study
as a solid object defined by a set of temporal
multimodal data, each of which is defined by a
specific multi-image. The idea of a mixed model is
based on the fact that one of the important
modalities for the visual display of a digital twin is
information about the appearance of its physical
twin, a real three-dimensional object. This model can then be defined through the concept of a muxel, a single element of the object under study, which is described by an ordered set of temporal multimodal data: a multi-image.
The spatiotemporal linked model represents the
object under study through a set of its discrete states, each determined at a unique moment in time by a set of object characteristics represented as multimodal data. At each moment, an object has one and only
one state, which is determined by its multi-image.
At the same time, the temporal tuple of each multi-
image contains only one value that determines the
moment in time when the object had a certain
specific state. This approach allows for the use of
the time value as a key that uniquely identifies this
particular state in an ordered sequence of states of
the object under study.
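A minimal sketch of this idea, with the time value acting as the unique key of a state; the class and method names are assumptions for illustration:

```python
from collections import OrderedDict

class SpatiotemporalTwin:
    """Sketch of the spatiotemporal linked model: the object is an ordered
    sequence of discrete states, each uniquely keyed by its time value."""

    def __init__(self):
        # time value -> multi-image (here simplified to a dict of modalities);
        # states are assumed to be added in chronological order.
        self._states = OrderedDict()

    def add_state(self, t, multi_image):
        if t in self._states:
            raise ValueError(f"object already has a state at t={t}")
        self._states[t] = multi_image

    def state_at(self, t):
        return self._states[t]  # the time value acts as a unique key

twin = SpatiotemporalTwin()
twin.add_state(1.0, {"position": (0, 0, 0), "temperature": 21.5})
twin.add_state(2.0, {"position": (0, 0, 1), "temperature": 21.7})
print(twin.state_at(2.0)["position"])
```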
In a multilevel ontological model, the object
under study is represented as a composition of its
constituents (components), which are characterized
by certain multimedia parameters and behavioral
characteristics determined by the corresponding
multimodal data. A multilevel ontological model is
defined as an oriented graph in which each node is
described by a set of multimodal data. The links
between nodes show the semantic subordination of
components to each other.
The multimodal data about the object under
study, which is provided utilizing a digital twin,
may be confidential. If such data is stored in a service such as Azure Data Lake Storage, confidentiality is guaranteed by the data protection mechanisms of the storage service itself. However, if the amount of multimodal data is relatively small, the data can be stored locally, but it may still need to be protected from unauthorized access. Ensuring confidential
storage for different multimodal data can be
achieved in differentiated ways, including the use of
lightweight cryptography algorithms, which have
recently been used to protect data in IoT
applications, [26], [27], [28], [29], and
steganography algorithms, which are used to protect
multimedia data, [30].
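A minimal sketch of such confidential local storage follows, using AES-GCM from the third-party cryptography package as a stand-in for a dedicated lightweight cipher of the kind surveyed in [26], [28]; the nonce-plus-ciphertext blob layout is an assumption of this example:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_blob(key, plaintext, associated_data):
    nonce = os.urandom(12)  # 96-bit nonce, never reused with the same key
    ct = AESGCM(key).encrypt(nonce, plaintext, associated_data)
    return nonce + ct       # store the nonce alongside the ciphertext

def decrypt_blob(key, blob, associated_data):
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, associated_data)

key = AESGCM.generate_key(bit_length=128)
# The associated data binds the ciphertext to its modality label.
blob = encrypt_blob(key, b"raw audio samples", b"modality=audio")
print(decrypt_blob(key, blob, b"modality=audio"))
```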
Thus, current models and algorithms for multimodal data processing are only sparsely reflected in scientific publications, whether as theoretical research or practical studies. However,
the issue of promoting the development,
implementation, and application of models and
algorithms for multimodal data processing remains
relevant and open for further research.
3 Materials and Methods
The realization of the goal of this study involves the use of the following research methods:
1. Systematization of the main features of the implementation of the three proposed models for processing multimodal data: the mixed model, the spatiotemporal linked model, and the multilevel ontological model.
2. System and logical analysis and the method of information synthesis, applied to the ways of presenting aggregated data and the specification of a multi-image.
3. Generalization of the latest scientific publications related to the multi-image programming paradigm.
4. A comparison method for distinguishing the characteristics of five commonly used models and algorithms for processing multimodal data in computer systems.
To determine the experience of applying models and algorithms for processing multimodal data in computer systems, a survey was conducted to establish a general judgment about that experience; the data, collected via MS Forms Pro, were analyzed using descriptive statistics. An online survey was conducted
from October 10, 2022, to January 30, 2023,
collecting information from 550 respondents. These
participants answered questions about their
experience with the use of models and algorithms
for processing multimodal data in computer
systems, motivation, expectations, and overall
satisfaction with the use of these models and
algorithms.
4 Results
A sequence of multicomponent values can be viewed as a function of many variables:
A = F(t, a_1, a_2, ..., a_n),
where t is a time value and a_1, a_2, ..., a_n are the values that define properties 1 through n of the object. Such a view of aggregated data allows the calculus of many variables to be applied in problems in which it is appropriate to represent and process multimodal data using the operations and relations defined in the ASA (algebraic system of aggregates); that is, the use of the ASA does not preclude the use of other mathematical concepts. The ways of presenting aggregated data are illustrated in Figure 1.
Fig. 1: Ways of presenting aggregated data
Notes: t denotes a time value; a_1, a_2, ..., a_n are the multi-component values that define properties 1, 2, ..., n of the object.
Source: Compiled by the authors based on official data of Sulema, [2].
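To make the notion of an aggregate concrete, a minimal sketch follows; the Aggregate class and its at() operation are illustrative assumptions, not the formal ASA operations:

```python
from dataclasses import dataclass

@dataclass
class Aggregate:
    """An aggregate as an ordered set of tuples: a tuple of time values plus
    one tuple of values per property, all of the same length."""
    times: tuple       # tuple of time values t
    properties: dict   # property name -> tuple of values a_i, aligned with times

    def at(self, t):
        """Return the multicomponent value A = F(t, a_1, ..., a_n) for a stored t."""
        i = self.times.index(t)
        return {name: values[i] for name, values in self.properties.items()}

agg = Aggregate(times=(0, 1, 2),
                properties={"temperature": (20.1, 20.4, 20.9),
                            "humidity": (0.41, 0.40, 0.39)})
print(agg.at(1))  # {'temperature': 20.4, 'humidity': 0.4}
```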
If the data about the object of observation is obtained taking into account the time of observation or measurement, then the aggregate must contain a tuple of time values corresponding to the moments when the values of the tuples of multimodal data were obtained; such an aggregate is what we call the multi-image of the digital twin of the object under study. A multi-image is an aggregate whose first tuple is a nonempty tuple of time values.
These values can be natural numbers or any other
values that provide clarity and unambiguity of
information about the moments in time when the
elements of other tuples of the multi-image were
obtained. The schematic definition of a multi-image
is shown in Figure 2, each block of which contains
the name of the data sequence (tuple), the modality
of the data (the set to which it belongs), and the
length of the data tuple.
Fig. 2: Schematic specification of multi-image
Source: Compiled by the authors based on official data of
Sulema, [2].
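A minimal sketch of a multi-image structure matching this definition; the class name and the (name, modality, values) layout mirror Figure 2 but are otherwise assumptions:

```python
class MultiImage:
    """A multi-image: a nonempty tuple of time values followed by named data
    tuples, each with a declared modality (the set it belongs to)."""

    def __init__(self, time_values, data_tuples):
        if not time_values:
            raise ValueError("a multi-image requires a nonempty tuple of time values")
        self.time_values = tuple(time_values)
        # data_tuples: iterable of (name, modality, values) entries, as in Figure 2.
        self.data_tuples = [(name, modality, tuple(values))
                            for name, modality, values in data_tuples]

    def describe(self):
        for name, modality, values in self.data_tuples:
            print(f"{name}: modality={modality}, length={len(values)}")

mi = MultiImage((0.0, 0.5, 1.0),
                [("sound", "audio", (0.1, -0.2, 0.05)),
                 ("temp", "temperature", (21.0, 21.2, 21.1))])
mi.describe()
```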
To represent data for each level of abstraction (a
solid object; an object defined by a set of states; an
object that is a composition of its constituents), it is
necessary to develop appropriate models that will
allow for high-quality representation and efficient
processing of multimodal data of the digital twin of
the object under study.
The mixed model of the digital twin of the object
under study is based on the decomposition of the
object space, as shown in Figure 3.
Fig. 3: The mixed model of the object under study
Source: Compiled by the authors based on official data of Sulema, [2].
An example of a schematic multi-image specification for describing a muxel is shown in Figure 4.
Fig. 4: Schematic multi-image specification for describing a muxel
Source: Compiled by the authors based on official data of Sulema, [2].
The following mathematical specification corresponds to this schematic specification:
M_muxel = <(t_1, ..., t_K)_T, (x, y, z)_R3, (g_1, ..., g_r)_RQ, (s_1, ..., s_N)_Z3, (tau_1, ..., tau_K)_Mt, (d_1, ..., d_K)_Md, (h_1, ..., h_K)_Mh>,
where T is a set of time values; R3 is the set of Cartesian coordinates of a point in the object's space; RQ is a set of graphical data components; Z3 is a set of values of acoustic components; Mt is a set of temperature data; Md is a set of material density values; Mh is a set of material moisture values; and r, N, and K are the numbers of elements in the corresponding data tuples. It is advisable to use the mixed model when detailed information is needed for a comprehensive description of a small object.
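A minimal sketch of a muxel record following this specification; the field names are assumptions:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Muxel:
    """One volumetric element of the mixed model, holding tuples drawn from
    the sets of the specification above."""
    times: Tuple[float, ...]              # elements of T, length K
    point: Tuple[float, float, float]     # a point of the object's space, from R3
    graphics: Tuple[float, ...]           # r graphical components from RQ
    acoustics: Tuple[int, ...]            # N acoustic components from Z3
    temperature: Tuple[float, ...]        # elements of Mt
    density: Tuple[float, ...]            # elements of Md
    moisture: Tuple[float, ...]           # elements of Mh

m = Muxel(times=(0.0, 1.0), point=(0.1, 0.2, 0.3),
          graphics=(0.9, 0.5, 0.3), acoustics=(12, -4),
          temperature=(21.4, 21.6), density=(2.7, 2.7), moisture=(0.12, 0.13))
print(m.point, m.temperature)
```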
The spatiotemporal linked model of the object
under study is based on the definition of a specific
state in an ordered sequence of states of the object
under study, as shown in Figure 5.
The multilevel ontological model of the object
under study is based on the description of the
multimodal data set of the object under study as
shown in Figure 6.
Fig. 5: The spatiotemporal linked model of the object under study
Source: Compiled by the authors based on official data of Sulema, [2].
Fig. 6: The multilevel ontological model of the object under study
A general judgment on the experience of
applying models and algorithms for processing
multimodal data is shown in Table 1.
Table 1. Questionnaire on the experience of using models and algorithms for multimodal data processing (assessment on a 1-10 scale; mean (SD))
1. "I would say that models and algorithms for processing multimodal data are practical": 8.16 (1.70)
2. "I would say that the models and algorithms for processing multimodal data are understandable": 7.52 (2.54)
3. "I would say that models and algorithms for multimodal data processing are positively contributing to multimodal data processing": 8.24 (1.96)
4. "I have found that models and algorithms for multimodal data processing contribute to original processing of multimodal data": 7.20 (2.55)
5. "I have established that models and algorithms for processing multimodal data are modifiable to make the work easier": 6.36 (2.94)
6. "I have discovered that models and algorithms for processing multimodal data were easy (1) / difficult (10) to use in the course of my activities": 4.00 (3.06)
7. "I have found that the models and algorithms for processing multimodal data are unpleasant (1) / pleasant (10) to use": 7.00 (2.16)
Source: Compiled by the authors based on official data.
To determine the most commonly used
models and algorithms for processing multimodal
data, a survey was conducted, the results of which
are shown in Table 2.
Table 2. Comparison of five commonly used models and algorithms for multimodal data processing (ability to apply the most informative ways to process multimodal data; convenience of the model/algorithm)
The multilevel ontological model: strong; convenient
The spatiotemporal linked model: very strong; convenient
The mixed model: strong; very convenient
The lightweight cryptography algorithm: strong; convenient
The steganographic algorithm: very strong; convenient
Source: Compiled by the authors based on official data.
The multi-image programming paradigm is
focused on solving the problems of processing
temporal multimodal data. In this case, the
developer must take into account the
interconnection of data in terms of the time interval
of their existence (obtaining, determining,
generating, measuring, etc.) and data modality, that
is, operate with such an entity as a multi-image of
an object. In addition, the concepts of the source and the receiver of multimodal data are abstracted through specialized libraries that convert data from one form of representation to another, without requiring the programmer to develop their own code for preparing multimodal data (see Figure 7); a converter-registry sketch follows.
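A minimal sketch of this source/receiver abstraction; the converter registry and function names are hypothetical, not an actual library API:

```python
# Hypothetical converter functions registered per modality turn raw inputs
# into tuples ready for a multi-image, so application code does not prepare
# multimodal data by hand.
converters = {}

def converter(modality):
    def register(func):
        converters[modality] = func
        return func
    return register

@converter("audio")
def wav_to_tuple(raw_bytes):
    # Placeholder decoding: a real library call would parse the format here.
    return tuple(raw_bytes)

@converter("temperature")
def csv_to_tuple(text):
    return tuple(float(x) for x in text.split(","))

def ingest(modality, payload):
    return converters[modality](payload)

print(ingest("temperature", "21.0,21.2,21.1"))
```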
Fig. 7: The multi-image programming paradigm
Source: Compiled by the authors based on official data of Sulema, [4].
5 Discussion
The results of the study of models and algorithms
for multimodal data processing led to the following
conclusions. Three models for multimodal data
processing were proposed: the mixed model, the
spatiotemporal linked model, and the multilevel
ontological model. These models make it possible to
represent the digital twin of the object under study
at differentiated levels of abstraction: as a
continuous object (the mixed model), as an object
defined by a set of states dynamically changing over
time (the spatiotemporal linked model); as an object
that is a composition of its components, each of
which, in turn, can be considered as a separate
object, forming a hierarchy of objects (the
multilevel ontological model). The proposed models
of multimodal data processing can be combined to
obtain the most informative way to describe a
physical twin, [2], [3], [4].
The evaluation subscale measures the "overall
judgment of experience in applying models and
algorithms for multimodal data processing". In this
subscale, the mean value of 8.16 (SD = 1.70) for the item "I would say that models and algorithms for multimodal data processing are practical" and the mean value of 7.52 (SD = 2.54) for the item "I would say that models and algorithms for multimodal data processing are understandable (not confusing)" stand out. It is notable that respondents
positively evaluate (with scores above 5.0) models
and algorithms for processing multimodal data in
work environments as practical, understandable,
manageable, and original. In this regard, it would be worthwhile to investigate whether the application of models and algorithms for multimodal data processing can further increase the efficiency of temporal multimodal data processing in computer systems.
As summarized in Figure 7, the multi-image programming paradigm is based on the following principles:
1. When multimodal data is received (identified, measured, or generated), the time of its receipt is recorded.
2. Data of differentiated modalities that define a particular object under study are considered and processed together with data of other modalities that define the same object.
3. The data is obtained from heterogeneous sources and in arbitrary formats.
4. Synchronization and aggregation of data of different modalities form the basis of the computing model for processing multi-image data.
5. Multi-image processing is performed using the operations and relations defined in the algebraic system of aggregates.
6. The processing flow is based on the temporality property of multimodal data.
7. The multi-image is the main entity on which the program's data representation and processing are based.
8. The paradigm is implemented through the use of a basic computing model and a computing model for digital twin technology.
Thus, the implementation of models and
algorithms for multimodal data processing in
computer systems will face new challenges in line
with innovative changes. Moreover, in-depth
research will lead to increased attention to
improving models and algorithms for multimodal
data processing.
6 Conclusion
As a result of the analysis of models and algorithms
for processing multimodal data, it was found that
the development of a new class of software and
hardware systems based on digital twin technology
and multimedia technology will expand the
capabilities and increase the efficiency of human
activity in complex or non-standard conditions.
Increased returns from the use of these models and algorithms for processing multimodal data, the attention they attract, their novelty in use, and the prospects and opportunities in this area are only a part of all possible positive effects of using models and algorithms for processing multimodal data in professional activities.
Since the digital twin technology is characterized
not only by technical characteristics or behavioral
data but also by a visual model, it is advisable to
develop this technology in conjunction with
multimedia technology, which involves operating
not only audiovisual data but also data of other
modalities that transmit other types of information
perceived by the human senses. The application of
this approach will synergistically enhance the
capabilities of both technologies and contribute to
the development of a new class of software and
hardware systems for processing multimodal data
and solving a wide range of tasks remotely.
The practical significance of the study lies in the
fact that the conclusions and recommendations
developed by the author and proposed in the article
can be used to select tools for implementing the
proposed models and algorithms for processing
multimodal data. Further research can be aimed at
improving and developing methods for studying the
practical principles of implementation and studying
models and algorithms for processing multimodal
data in computer systems. In future research, it
would be interesting to analyze the capabilities of
models and algorithms for multimodal data
processing in differentiated industries.
A strength of the study is that the use of models in multimodal data processing is evaluated in both theoretical and practical aspects. For this, in addition to the analysis of scientific works, a survey of experts and specialists working in this area was conducted.
Further research in this area may include the
development of new architectures and algorithms
that would be able to efficiently process different
types of data and combine them into a single model.
For example, deep neural networks can be
developed that can process images and text
simultaneously.
References:
[1] Sîrghi, S., Sîrghi, A. Design for online teaching and learning in the context of digital education. Știinţa culturii fizice, No. 35/1, 50-54. 2020. Online available from https://doi.org/10.52449/1857-4114.2020.35-1.08.
[2] Sulema, Ye., Dychka, I., Sulema, O. Multimodal Data Representation Models for Virtual, Remote, and Mixed Laboratories Development. In Lecture Notes in Networks and Systems, Springer Cham, vol. 47, pp. 559-569. 2018.
[3] Dychka, I. A., Sulema, E. S. Multimodal data representation model for a comprehensive description of observation objects. Bulletin of the Vinnytsia Polytechnic Institute, (1), 53-60. 2020. Online available from https://doi.org/10.31649/1997-9266-2020-148-1-53-60.
[4] Sulema, E. S. Methods, models, and tools for processing multimodal data of digital duplicates of researched objects. The National Technical University of Ukraine "Kyiv Polytechnic Institute named after Igor Sikorsky", Kyiv, 343 p. 2020.
[5] Nusrat, J. S., Li-Minn, A., Kah Phooi Seng, D. M., Motiur, R., Tanveer, Z. Multimodal big data affective analytics: A comprehensive survey using text, audio, visual and physiological signals. Journal of Network and Computer Applications, vol. 149, 102447. 2020. Online available from https://doi.org/10.1016/j.jnca.2019.102447.
[6] Calvo, R., D'Mello, S. Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications. IEEE Transactions on Affective Computing, 1, 18-37. 2010. Online available from http://dx.doi.org/10.1109/T-AFFC.2010.1.
[7] Scollon, R., Scollon, S. Multimodality and language: a retrospective and prospective view. In C. Jewitt (Ed.), The Routledge Handbook of Multimodal Analysis, pp. 170-180. 2009. London: Routledge.
[8] Jewitt, C. Multimodal methods for researching digital technologies. In S. Price, C. Jewitt, & B. Brown (Eds.), The Sage Handbook of Digital Technology Research, pp. 250-265. 2013. London: Sage.
[9] Argelaguet, R., Cuomo, A. S. E., Stegle, O., Marioni, J. C. Computational principles and challenges in single-cell data integration. Nature Biotechnology, 39, 1202-1215. 2021. Online available from DOI: 10.1038/s41587-021-00895-7.
[10] Csurka, G. A Comprehensive Survey on Domain Adaptation for Visual Applications. Advances in Computer Vision and Pattern Recognition, pp. 1-35. 2017. Online available from DOI: 10.1007/978-3-319-58347-1_1.
[11] Zhao, J., Xie, X., Xu, X., Sun, S. Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38, 43-54. 2017. Online available from DOI: 10.1016/J.INFFUS.2017.02.007.
[12] Lance, C., Luecken, M. D., Burkhardt, D. B., Cannoodt, R., Rautenstrauch, P., Laddach, A., Ubingazhibov, A., Cao, Z.-J., Deng, K., Khan, S., Liu, Q., Russkikh, N., Ryazantsev, G., Ohler, U., Pisco, A. O., Bloom, J., Krishnaswamy, S., Theis, F. J. Multimodal single-cell data integration challenge: results and lessons learned. 2022. Online available from https://doi.org/10.1101/2022.04.11.487796.
[13] Bokade, R., Navato, A., Ouyang, R., Jin, X., Chou, C.-A., Ostadabbas, S., Mueller, A. V. A cross-disciplinary comparison of multimodal data fusion approaches and applications: Accelerating learning through trans-disciplinary information sharing. Expert Systems with Applications, 165, Article 113885. 2021. Online available from https://doi.org/10.1016/j.eswa.2020.113885.
[14] Gupta, A., Anpalagan, A., Guan, L., Khwaja, A. S. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues. Array, 100057. 2021. Online available from https://doi.org/10.1016/j.array.2021.100057.
[15] Alkhalaf, S. A robust variance information fusion technique for real-time autonomous navigation systems. Measurement, 179, Article 109441. 2021. Online available from https://doi.org/10.1016/j.measurement.2021.109441.
[16] Cuayahuitl, H. A data-efficient deep learning approach for deployable multimodal social robots. Neurocomputing, 396, 587-598. 2020. Online available from https://doi.org/10.1016/j.neucom.2018.09.104.
[17] Liu, H., Fang, T., Zhou, T., Wang, L. Towards robust human-robot collaborative manufacturing: Multimodal fusion. IEEE Access, 6, 74762-74771. 2018. Online available from https://doi.org/10.1109/ACCESS.2018.2884793.
[18] Ma, M., Sun, C., Chen, X. Deep coupling autoencoder for fault diagnosis with multimodal sensory data. IEEE Transactions on Industrial Informatics, 14, 1137-1145. 2018. Online available from https://doi.org/10.1109/TII.2018.2793246.
[19] Yang, Z., Baraldi, P., Zio, E. A multi-branch deep neural network model for failure prognostics based on multimodal data. Journal of Manufacturing Systems, 59, 42-50. 2021. Online available from https://doi.org/10.1016/j.jmsy.2021.01.007.
[20] Al-Dulaimi, A., Zabihi, S., Asif, A., Mohammadi, A. A multimodal and hybrid deep neural network model for remaining useful life estimation. Computers in Industry, 108, 186-196. 2019. Online available from https://doi.org/10.1016/j.compind.2019.02.004.
[21] Kumar, S., Kolekar, T., Patil, S., Bongale, A., Kotecha, K., Zaguia, A., Prakash, C. A low-cost multi-sensor data acquisition system for fault detection in fused deposition modeling. Sensors, 22, 517. 2022. Online available from https://doi.org/10.3390/s22020517.
[22] Lu, Y., Liu, C., Wang, K. I-K., Huang, H., Xu, X. Digital Twin-driven smart manufacturing: connotation, reference model, applications and research issues. Robotics and Computer-Integrated Manufacturing, vol. 61, pp. 1-14. 2020.
[23] Alam, K. M., El Saddik, A. C2PS: A digital twin architecture reference model for the cloud-based cyber-physical systems. IEEE Access, vol. 5, pp. 2050-2062. 2017.
[24] Redelinghuys, A. J. H., Basson, A. H., Kruger, K. A Six-Layer Digital Twin Architecture for a Manufacturing Cell. Studies in Computational Intelligence, vol. 803, pp. 412-423. 2018.
[25] Keith, D. Understanding Key-Value Databases. Dataversity. 2020. Online available from https://www.dataversity.net/understanding-key-value-databases/.
[26] Buchanan, W. J., Li, S., Asif, R. Lightweight cryptography methods. Journal of Cyber Security Technology, vol. 1, issue 3-4, pp. 187-201. 2017.
[27] Ronen, E., Shamir, A. Extended functionality attacks on IoT devices: The case of smart lights. Proceedings of the 2016 IEEE European Symposium on Security and Privacy (SP'16), pp. 3-12. 2016.
[28] Dhanda, S. S., Singh, B., Jindal, P. Lightweight Cryptography: A Solution to Secure IoT. Wireless Personal Communications, vol. 112, pp. 1947-1980. 2020.
[29] Dutta, I. K., Ghosh, B., Bayoumi, M. Lightweight Cryptography for Internet of Insecure Things: A Survey. Proceedings of the IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC 2019), pp. 0475-0481. 2019.
[30] Maharjan, R., Shrestha, A. K., Basnet, R. Image Steganography: Protection of Digital Properties against Eavesdropping. arXiv, 8 p. 2019.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US