However, if an ESB is not available, the Component
Control module must handle the sequencing and
prioritization of multiple simultaneous queries. It can
even break down queries into smaller parts to free up machine time across the entire technological infrastructure, as described in Section 3.4.
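As a rough illustration of this sequencing role, the sketch below orders pending queries in a priority queue. The class and method names (ComponentControl, submit, run_next) and the execute callback are illustrative assumptions, not part of the paper's design:

```python
import heapq
import itertools

class ComponentControl:
    """Sequences pending queries so the highest-priority one runs first.
    Lower numbers mean higher priority; arrival order breaks ties."""

    def __init__(self):
        self._heap = []                   # entries: (priority, arrival, query)
        self._arrival = itertools.count()

    def submit(self, query, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), query))

    def run_next(self, execute):
        """Pop and execute the next query; returns None when idle."""
        if not self._heap:
            return None
        _, _, query = heapq.heappop(self._heap)
        return execute(query)
```

A full implementation would also segment oversized queries before enqueuing them, along the lines of Sections 3.3 and 3.4.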
3.2 Data Quality Verification
Stored historical data retains the quality with which it was acquired in real time;
however, multiple factors can affect its quality and
precision. A viable option to ensure a response with
highly reliable data is to integrate a Validation
module. This module is responsible for analyzing the
data request in a query and applying specific
validation, verification, and completeness
algorithms. If it detects inconsistencies, it estimates the corresponding replacement data and informs the requester of the actions taken in calculating the response.
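To make this flow concrete, here is a minimal sketch of one possible validation step, assuming a simple gap-filling strategy. The function name, the use of None/NaN to mark gaps, and the neighbour-averaging estimate are illustrative assumptions, not the paper's algorithms:

```python
import math

def validate_series(values):
    """Detect missing samples, estimate replacements from the nearest valid
    neighbours, and report the actions taken to the requester."""
    # Normalise gaps (None or NaN) to None first.
    cleaned = [None if v is None or (isinstance(v, float) and math.isnan(v)) else v
               for v in values]
    actions = []
    for i in range(len(cleaned)):
        if cleaned[i] is not None:
            continue
        prev = next((cleaned[j] for j in range(i - 1, -1, -1)
                     if cleaned[j] is not None), None)
        nxt = next((cleaned[j] for j in range(i + 1, len(cleaned))
                    if cleaned[j] is not None), None)
        if prev is None and nxt is None:
            continue  # no valid neighbours to estimate from
        estimate = prev if nxt is None else nxt if prev is None else (prev + nxt) / 2
        cleaned[i] = estimate
        actions.append(f"sample {i}: missing value replaced with estimate {estimate}")
    return {"data": cleaned, "actions": actions}
```

Returning the actions alongside the data is what lets the requester know how the response was calculated.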
Within this Validation module, the quality of raw data can be verified in several ways, for example (the first two checks are sketched in code after this list):
- Integrity: Counting the number of records available for a data series in a defined period.
- Consistency: Validating a data set against the electrical or physical laws that model it.
- Accuracy: Comparing a data set with external, redundant, or manual measurements taken during the same period, or integrating measured values at different points in the ESP.
- Behavior: Comparing the profile of a data set with the typical profile of that measurement.
- Validity: Cross-comparing measured values with similar measurements, geographically close measurements, or calculated values.
- AI: Additionally, given the complexity of the data, it is feasible to train Artificial Intelligence (AI) algorithms to perform much more comprehensive validations, for example identifying and applying typical profiles, autonomous autoregression, predictive models, correlation with exogenous variables, comparison with nearby data points (case-based reasoning), and automatic clustering, among others.
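As an illustration, the sketch below implements the Integrity and Consistency checks under simplifying assumptions: epoch-second timestamps, a fixed sampling interval, and a power-balance rule whose parameter names and tolerance are made up for the example:

```python
def check_integrity(timestamps, period_start, period_end, interval_s):
    """Integrity: compare the record count in a period against the count a
    fixed sampling interval would produce."""
    found = sum(1 for t in timestamps if period_start <= t < period_end)
    expected = int((period_end - period_start) / interval_s)
    return {"found": found, "expected": expected, "complete": found >= expected}

def check_consistency(p_in_mw, p_out_mw, losses_mw, tolerance_mw=0.5):
    """Consistency: a power-balance rule; injected power must match delivered
    power plus losses within a tolerance."""
    return abs(p_in_mw - (p_out_mw + losses_mw)) <= tolerance_mw
```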
3.3 Handling Large Data Sets
If the database does not impose query restrictions, a
request can yield a substantial amount of data as a
response. This situation could lead to the saturation
or collapse of the technological platform, causing
delays in all other concurrently running processes.
To address this issue, the Component Control
module, working in conjunction with the Response
Builder module, can adopt a strategy to prioritize,
segment, or break down queries into smaller parts.
This approach allows multiple partial responses to be handled so that the user who requested the data ultimately receives a complete response. In this way, the data is processed in packages that the technological platform can manage. Consequently, all other concurrent users are served, and the waiting time is distributed among them. As a result, high-priority users receive their answers within the required timeframe, while users with large-volume requests (typically not high priority) receive their responses only slightly later than if the query had been executed directly (in any case, the processing time will be considerably longer than for simple queries).
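A minimal sketch of this segment-and-interleave idea follows, assuming a hypothetical fetch_chunk(query, offset, limit) backend and requests expressed as dicts with id, query, and priority fields; none of these names come from the paper:

```python
from collections import deque

def serve_in_packages(requests, fetch_chunk, chunk_size=10_000):
    """Split each request into fixed-size packages and serve requests in
    turns, so bulk extractions do not block smaller, higher-priority queries."""
    # Start with the highest-priority requests (lower number = higher priority).
    pending = deque(sorted(requests, key=lambda r: r["priority"]))
    results = {r["id"]: [] for r in requests}
    offsets = {r["id"]: 0 for r in requests}
    while pending:
        req = pending.popleft()
        rows = fetch_chunk(req["query"], offsets[req["id"]], chunk_size)
        results[req["id"]].extend(rows)
        offsets[req["id"]] += chunk_size
        if len(rows) == chunk_size:   # more data likely remains: take turns
            pending.append(req)
    return results  # every requester ends up with a complete response
```

The round-robin requeueing is what distributes the waiting time among concurrent users, as described above.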
3.4 Database Operational Security
Another specific issue with the traditional architecture is that the operational stability of the technological platform is not guaranteed. As explained in Section 2.3, it is relatively easy to disrupt the platform through uncontrolled use.
The solution proposed by the Optimal Extractor, as shown in Fig. 2, involves breaking down queries into smaller parts to manage the machine time of the technological infrastructure. In this regard, the Component Control module is responsible for executing the following actions (a sketch of this flow follows the list):
- Calculate the amount of data that a user request will query.
- If the amount exceeds an empirically defined limit (based on the hardware resources of the technological platform and the granularity of the stored data), segment the query into sections and notify the Query Constructor and Response Builder modules.
- Generate multiple partial queries.
- Introduce a waiting period between queries (the duration is also determined empirically, using the same criteria as the data limit).
- Consolidate the partial responses into a single coherent response.
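The sketch below strings these actions together. The estimate_rows and run_segment callbacks are assumed stand-ins for the Query Constructor and the database access layer, and the limit and wait values are illustrative placeholders for the empirically tuned ones:

```python
import time

ROW_LIMIT = 50_000     # empirically defined limit (illustrative value)
WAIT_SECONDS = 0.5     # empirically tuned pause between partial queries

def extract(query, estimate_rows, run_segment):
    """Size the request, segment it if it exceeds the limit, pause between
    partial queries, and consolidate the partial responses."""
    total = estimate_rows(query)                  # 1. calculate the data amount
    if total <= ROW_LIMIT:
        return run_segment(query, 0, total)       # small enough to run directly
    parts = []
    for offset in range(0, total, ROW_LIMIT):     # 2-3. generate partial queries
        parts.extend(run_segment(query, offset, ROW_LIMIT))
        if offset + ROW_LIMIT < total:
            time.sleep(WAIT_SECONDS)              # 4. wait between queries
    return parts                                  # 5. single coherent response
```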