Wavelet Based Financial Forecast Ensemble Featuring Hybrid
Quantum-Classical LSTM Model
PETER BIGICA, XIAODI WANG
Department of Mathematics and Computer Science
Western Connecticut State University
181 White Street, Danbury, Connecticut
UNITED STATES
Abstract: One of the most sought-after goals in the financial world is a reliable method by which investors can
predict a stock price movement consistently. Advancements in stock prediction via the use of machine learning
have improved the accuracy of such predictions and yielded better ideas about value investments in the stock
market. However, with the addition of the state-of-art tool M-band wavelet transform as a preprocessing step, we
can denoise our raw data set (prior stock prices) and refine it to make the forecast even more accurate. Following
this preprocessing step, multiple machine learning techniques are deployed to construct a robust, non-parametric
hybrid forecasting model. To demonstrate the novelty of our algorithm, we present a case study on stock price
prediction employing a discrete 4-band wavelet transform integrated with a hybrid machine learning ensemble.
Our results underscore the potential and importance of such ensemble methods in refining the accuracy and reli-
ability of financial forecasts. Furthermore, in theory, quantum computing can further optimize these algorithms,
potentially leading to more precise stock price forecasts. In particular, our ensemble will feature a hybrid quantum-
classical LSTM Model to demonstrate the potential of both wavelets and quantum computing in stock forecasting.
Key-Words: Machine Learning, M-Band Wavelet Transform, Neural Network Ensemble, Quantum Computing,
Stock forecasting, Long Short-Term Memory (LSTM) networks
Received: December 16, 2023. Revised: July 15, 2024. Accepted: August 9, 2024. Published: September 18, 2024.
1 Introduction
The financial realm is characterized by its volatile
nature, with stock price movements often being de-
scribed as “random”. As such, one of the most sought-
after goals in the financial world is finding a consis-
tent, reliable method by which these movements can
be anticipated. Over the years as investors and ana-
lysts have attempted to find such a solution, techno-
logical advancements (particularly in machine learn-
ing) have provided new techniques by which stock
prices can be forecasted. These advances have not only elevated the precision of stock predictions but have also allowed for the creation of new algorithms that increase the precision of market predictions further, [1].
However, while these techniques continue to ad-
vance and boundaries are continually being pushed,
handling financial data and trying to make an ac-
curate/consistent prediction remains a daunting task.
All predictions must be based on historical data which
can often be limited in its sample size, is virtually al-
ways non-linear, has no clear trends, and can have
high volatility. In other words, financial data can have
a lot of “noise” which makes the task of forecasting
financial movements a difficult one, [2].
To combat such volatility in the quest to develop
a more reliable model, we introduce the M-Band
wavelet. As a state-of-the-art preprocessing tool, the M-
band wavelet transform has the profound ability to de-
noise a raw dataset, [3]. It refines and purifies the
data, setting the stage for even more accurate fore-
casting than what was previously achievable.
The M-band wavelet transform can separate data into different frequency components, specifically one low-frequency component and M-1 high-frequency components, and more often than not the "noise" embedded in the raw data that makes stock forecasting so difficult is contained within the high-frequency components of the data, [4]. Once the wavelet transform and denoising procedure have been applied, we are left with a "clean" data set that better reflects the true trends within the data, and ultimately allows us to use machine learning techniques to achieve a better prediction.
To test the validity of this method, several differ-
ent techniques and neural networks will be employed
through a financial ensemble to not only see the effec-
tiveness of different neural networks but also to test
the validity of the Wavelet denoise algorithm applied
to the data. An ensemble forecast will give us more results to compare, and should those results align, more confidence as to what financial decision to make in the future, [5].
Testing error will be measured with the Root Mean Squared Error (RMSE).
In this research, the following techniques will be used: the Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM) networks, Support Vector Regression (SVR), and Recurrent Neural Networks (RNN).
Lastly, we will continue our exploration into con-
structing a more reliable forecasting model by ven-
turing into the realm of quantum computing. Quan-
tum computing is an emerging advancement that of-
fers several advantages over traditional methods in-
cluding but not limited to more computational power,
more efficient optimization, and the potential for new
algorithms to make even better predictions using ma-
chine learning methods, [6].
The goal of this research is to find a more reli-
able method for investors to make smarter financial
decisions by employing alternative machine learning
methods. We will set out to prove the potential of
wavelets in financial forecasting, while also demon-
strating the power of utilizing an ensemble forecast
as opposed to standard single-method forecasts when
making financial decisions to help create better judg-
ment. Additionally, we will show the potential of quantum computing in financial forecasting as we get closer to a new era of computing.
2 Literature Review
Stock forecasting (for our purposes the area of trying
to predict future stock prices via machine learning)
has been an area of intense research in the last sev-
eral decades. After 50 years of advances in the field,
neural networks have gained significant attention for
their capacity to model complex non-linear relation-
ships predominant in stock markets.
Historically, stock market predictions relied heav-
ily on statistical methods like the ARIMA and
GARCH models, [7]. As deep/machine learning
methods have developed leading into the 21st cen-
tury, stock forecasters started utilizing neural net-
works, with initial experiments focusing primarily on
Feedforward Neural Networks (FNN) and Recurrent
Neural Networks (RNN). As techniques developed,
more sophisticated networks such as Long Short-
Term Memory (LSTM) units and Convolutional Neu-
ral Networks (CNN) have become predominant in the
modern day, [5].
Advantages of neural networks over statistical
methods include:
- Modeling non-linear data sets
- Better proficiency in working with time series data
- Integration of multiple, diverse data sets to train models
Despite their advances, neural networks still face challenges:
- Overfitting of data sets, making models less generalized
- Limited interpretability, acting like a "black box", which complicates relying on models for financial decisions
- The required use of significant computational resources, which can limit their ability to perform real-time predictions.
In this paper, our contribution is to overcome these
challenges by creating a forecasting ensemble con-
sisting of various machine learning algorithms all
based on a Wavelet Transform to denoise prior stock
data and in turn produce better, more accurate fore-
casts that we can study and evaluate.
3 Preliminaries
3.1 M-Band Wavelet
An orthogonal M-band Discrete Wavelet Transform (DWT) is totally determined by M sets of filters in a filter bank with certain properties, [4]. In any such filter bank, there are one low-pass filter α and M-1 high-pass filters β^(j), for j = 1, 2, ..., M-1, with N vanishing moments. These filters satisfy the following conditions:

$$\sum_{i=1}^{n} \alpha_i = \sqrt{M} \qquad (1)$$

$$\sum_{i=1}^{n} i^{k} \beta_i^{(j)} = 0 \quad \text{for } k = 1, 2, \dots, N-1 \text{ and } j = 1, \dots, M-1 \qquad (2)$$

$$\|\alpha\| = \|\beta^{(j)}\| = 1 \quad \text{for } j = 1, \dots, M-1 \qquad (3)$$

$$\langle \alpha, \beta^{(j)} \rangle = 0 \quad \text{for } j = 1, \dots, M-1 \qquad (4)$$

$$\langle \beta^{(i)}, \beta^{(j)} \rangle = 0 \quad \text{for } i, j = 1, \dots, M-1 \text{ and } i \neq j \qquad (5)$$
In this research, the wavelet transform was performed with the following filter banks:

α = [0.06737176, 0.09419511, 0.40580489, 0.56737176, 0.56737176, 0.40580489, 0.09419511, 0.06737176],
β = [0.09419511, 0.06737176, 0.56737176, 0.40580489, 0.40580489, 0.56737176, 0.06737176, 0.0941951],
γ = [0.09419511, 0.06737176, 0.56737176, 0.4058048, 0.4058048, 0.5673717, 0.06737176, 0.0941951],
δ = [0.06737176, 0.09419511, 0.40580489, 0.56737176, 0.56737176, 0.40580489, 0.09419511, 0.06737176]
If the signal we are working with, S, is contained within R^{4k} (k ∈ N, k ≥ 2), we can create a 4k × 4k wavelet transform matrix T_1 by shifting and wrapping around the filters (as shown in Figure 1).

Fig. 1: An example of 16x16 Wavelet Transform matrix

Let C_1, C_2, ..., C_{Mk} be the column vectors of T_1^T. Then C_1^T, C_2^T, ..., C_{Mk}^T are the row vectors of T_1. Since T_1 is an orthogonal matrix, {C_1, C_2, ..., C_{Mk}} forms an orthonormal basis of R^{Mk}. Therefore, the components of S̃_1 ≜ T_1 S are the coordinates of S under this wavelet basis, and hence ||S|| = ||T_1 S|| = ||S̃_1||. The components of S̃_1 are also called the wavelet coefficients of S. Moreover, the M-band DWT of S decomposes S into M different frequency components, with a^(1) being the lowest-frequency component (or trend) and d_1^(1), ..., d_{M-1}^(1) being the higher-frequency components (or fluctuations) of S. If necessary, and if k is divisible by M, we can apply the DWT to a^(1) using a k × k DWT matrix T_2 such that

$$T_2\, a^{(1)} = \left[a^{(2)}, d_1^{(2)}, \dots, d_{M-1}^{(2)}\right]^{T}, \qquad (6)$$

where

$$a^{(2)} = \left[a_1^{(2)}, a_2^{(2)}, a_3^{(2)}, \dots, a_{k/M}^{(2)}\right]^{T}, \qquad (7)$$

and

$$d_i^{(2)} = \left[d_{i,1}^{(2)}, d_{i,2}^{(2)}, d_{i,3}^{(2)}, \dots, d_{i,k/M}^{(2)}\right]^{T}, \quad i = 1, \dots, M-1. \qquad (8)$$

The M-band DWT of a^(1) decomposes a^(1) into M different frequency components, with a^(2) being the lowest-frequency component and d_i^(2) (i = 1, ..., M-1) being the higher-frequency components of a^(1).

Let
$$T = \begin{bmatrix} T_2 & 0 \\ 0 & I \end{bmatrix} T_1,$$
where the lower-corner 0 is an (M-1)k × k zero matrix, the upper-corner 0 is a k × (M-1)k zero matrix, and I is an (M-1)k × (M-1)k identity matrix. Then,

$$T S = \left[a^{(2)}, d_1^{(2)}, \dots, d_{M-1}^{(2)}, d_1^{(1)}, \dots, d_{M-1}^{(1)}\right]^{T} \triangleq \tilde{S}_2, \qquad (9)$$

and S̃_2 is the vector of second-level wavelet coefficients of S. Since {C_1, C_2, ..., C_{Mk}} is an orthonormal basis of R^{Mk},

$$S = s_1 C_1 + s_2 C_2 + \dots + s_n C_n, \qquad (10)$$

where n = Mk and s_i = C_i^T S = ⟨C_i, S⟩, the inner product of C_i and S, for i = 1, 2, ..., n. Therefore,

$$s_i = \begin{cases} a_i^{(1)} & \text{for } i = 1, 2, \dots, k \\ d_{1,\,i-k}^{(1)} & \text{for } i = k+1, k+2, \dots, 2k \\ \quad\vdots & \\ d_{M-1,\,i-(M-1)k}^{(1)} & \text{for } i = (M-1)k+1, \dots, Mk \end{cases} \qquad (11)$$

Let

$$A^{(1)} = a_1^{(1)} C_1 + \dots + a_k^{(1)} C_k, \qquad D_i^{(1)} = d_{i,1}^{(1)} C_{ik+1} + \dots + d_{i,k}^{(1)} C_{(i+1)k} \qquad (12)$$

for i = 1, ..., M-1. Then A^(1) corresponds to a^(1) and D_i^(1) corresponds to d_i^(1) for i = 1, ..., M-1. If we let

$$V^{(1)} = \operatorname{span}\{C_1, \dots, C_k\} \qquad (13)$$

and

$$W_i^{(1)} = \operatorname{span}\{C_{ik+1}, \dots, C_{(i+1)k}\} \qquad (14)$$

for i = 1, ..., M-1, then V^(1), W_1^(1), ..., and W_{M-1}^(1) are orthogonal subspaces of R^{Mk}, and therefore R^{Mk} can be represented as the following direct sum,
$$\mathbb{R}^{Mk} = V^{(1)} \oplus W_1^{(1)} \oplus \dots \oplus W_{M-1}^{(1)}. \qquad (15)$$

So, for any S ∈ R^{Mk}, S can be written uniquely as

$$S = A^{(1)} + D_1^{(1)} + \dots + D_{M-1}^{(1)}, \qquad (16)$$

where A^(1) = Proj_{V^(1)} S is the orthogonal projection of S onto V^(1), and D_i^(1) = Proj_{W_i^(1)} S is the orthogonal projection of S onto W_i^(1), for i = 1, ..., M-1.

If we apply the second-level DWT to S, then the subspaces V^(2), W_1^(2), ..., W_{M-1}^(2), W_1^(1), ..., and W_{M-1}^(1) are orthogonal to each other, and therefore

$$\mathbb{R}^{Mk} = V^{(2)} \oplus W_1^{(2)} \oplus \dots \oplus W_{M-1}^{(2)} \oplus W_1^{(1)} \oplus \dots \oplus W_{M-1}^{(1)}, \qquad (17)$$

$$V^{(1)} = V^{(2)} \oplus W_1^{(2)} \oplus \dots \oplus W_{M-1}^{(2)}, \qquad (18)$$

$$S = A^{(2)} + D_1^{(2)} + \dots + D_{M-1}^{(2)} + D_1^{(1)} + \dots + D_{M-1}^{(1)}, \qquad (19)$$

$$A^{(1)} = A^{(2)} + D_1^{(2)} + \dots + D_{M-1}^{(2)}. \qquad (20)$$
3.2 Autoregressive Integrated Moving
Average (ARIMA)
The ARIMA model, primarily based on linear rela-
tionships, has been a cornerstone in the field of time
series analysis for several decades, celebrated for its
simplicity and ability to handle non-stationary data.
Its strengths lie in its capacity to model and forecast
based on both prior data and prior errors, while in-
corporating autoregressive and moving average com-
ponents, [7]. However, despite its strengths, ARIMA
models do possess limitations especially when deal-
ing with nonlinear data sets which more often than
not are to be expected in real-world scenarios, [8].
ARIMA models have primarily been designed for making predictions involving time-series data. The model is typically configured using three values denoted (p, d, q), where p signifies the order of the autoregressive part, d indicates the degree of differencing, and q represents the number of lagged forecast errors in the model, [8].
We can see how ARIMA is derived from the AR and MA models below. A pure autoregressive (AR) model is one where Y_t depends only on its own lags:

$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p}, \qquad (21)$$

where β_n is the coefficient of the lag that the model estimates.

On the other hand, a pure moving average (MA) model is one where Y_t depends only on the lagged forecast errors:

$$Y_t = \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \dots + \phi_q \varepsilon_{t-q}. \qquad (22)$$

Now, the ARIMA model can be derived from both of these equations combined with an integrated (I) term which controls the differencing in the model. The ARIMA model is then given by:

$$Y'_t = \alpha + \beta_1 Y'_{t-1} + \dots + \beta_p Y'_{t-p} + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \dots + \phi_q \varepsilon_{t-q}. \qquad (23)$$
3.3 Long Short-Term Memory Network (LSTM)
Long Short-Term Memory (LSTM) networks, an advancement of Recurrent Neural Networks (RNNs), have been an important focus of research involving long-
range modeling of time series data. Introduced by
German computer scientists Hochreiter and Schmid-
huber in 1995, LSTMs were designed to overcome the
predominant challenges faced by RNN models at the
time, more specifically the vanishing and exploding
gradient problems, [3]. In RNNs, during the back-
propagation process, gradients are calculated to up-
date the network’s weights. However, when dealing
with larger data sets, these gradients can become ex-
tremely small (vanish) or very large (explode). These
issues made it difficult for RNNs to learn and retain
information over longer intervals, constraining their performance in many tasks, [9].
LSTMs, with their enhanced architecture, effec-
tively address these challenges which limited RNN
models. Their architecture is comprised of three types
of gates: the input, forget, and output gates. Each of
these gates is responsible for regulating the flow of in-
formation in the network. This flow allows LSTMs to
decide what information to store, discard, or output at
each time step. These gates and their ability to deter-
mine what information should be kept or discarded at
each step in the sequence, allow the network to main-
tain a more stable gradient over time and solve the
problems which RNN suffers from, [1]. It also has the
added benefit of being able to maintain information
over longer intervals and thus capture intricate pat-
terns and dependencies in time series data. Over time,
various modifications to the standard LSTM model
have led to new architectures, such as the Gated Re-
current Units (GRUs).
As mentioned, the LSTM unit is composed of a
cell, an input gate, an output gate, and a forget gate.
The cell remembers values over arbitrary time inter-
vals, and the three gates regulate the flow of informa-
tion into and out of the cell.
Let us denote:
- x_t as the input at time step t.
- h_t as the hidden state at time step t.
- c_t as the cell state at time step t.
- f_t as the forget gate's activation.
- i_t as the input gate's activation.
- o_t as the output gate's activation.
- σ as the sigmoid activation function.
The LSTM can then be described by the following set of equations. Additionally, the equation for the root mean squared error loss function, which will be used to test the accuracy of the model, is given below:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \qquad (24)$$

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \qquad (25)$$

$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \qquad (26)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (27)$$

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \qquad (28)$$

$$h_t = o_t \odot \tanh(c_t) \qquad (29)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - h_t)^2} \qquad (30)$$
where:
- W_f, W_i, W_c, and W_o are weight matrices.
- b_f, b_i, b_c, and b_o are bias vectors.
- ⊙ represents element-wise multiplication.
- N represents the number of observations.
- y_t and h_t are the target and predicted outputs at time t.
The LSTM’s ability to forget, learn, and output
information gives it the capacity to model time se-
ries and sequences with long-range dependencies and
avoid the vanishing and exploding gradient problems
of traditional RNNs.
- The forget gate (f_t) decides what information from the cell state should be thrown away or kept.
- The input gate (i_t) updates the cell state with new information.
- The output gate (o_t) determines what the next hidden state should be.
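For concreteness, Eqs. (24)-(29) can be written as a single NumPy time step as sketched below; the weight and bias dictionaries are hypothetical stand-ins for trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (24)-(29). W and b hold the weight
    matrices and bias vectors for the forget, input, candidate, and output gates."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate, Eq. (24)
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate, Eq. (25)
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate state, Eq. (26)
    c_t = f_t * c_prev + i_t * c_tilde           # cell state update, Eq. (27)
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate, Eq. (28)
    h_t = o_t * np.tanh(c_t)                     # hidden state, Eq. (29)
    return h_t, c_t
```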
3.4 Support Vector Regression (SVR)
Support Vector Regression (SVR), an extension of the
widely acclaimed Support Vector Machines (SVM)
used for regression tasks, is known for its ability to
handle high-dimensional data. SVR operates by map-
ping input data into a higher-dimensional space where
it seeks to find a hyperplane that best fits the data. The
central idea is to identify a function that, for a given
tolerance of error, has the smallest possible deviation
from the actual training data while maintaining a max-
imal margin, [3]. SVR typically uses an ε-insensitive loss function that allows errors falling within a defined threshold (ε) and penalizes errors outside this range much more heavily.
Over the years, SVR’s application has spanned
across diverse domains, from finance to environmen-
tal modeling. One of the defining features of SVR is
its utilization of kernel functions, such as polynomial,
radial basis function (RBF), sigmoid, and wavelets
which allow it to model complex, non-linear relation-
ships in data by transforming it into a space where lin-
ear regression techniques become applicable, [3]. The
flexibility to choose and craft kernels grants SVR a
versatile edge over many traditional regression mod-
els. While SVR’s robustness and efficacy in high-
dimensional spaces are notable, challenges like tun-
ing hyperparameters and potential computational in-
efficiencies in very large datasets are areas of current
research.
Given a dataset (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), the SVR optimization problem can be formulated as:

$$\text{minimize: } \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*}) \qquad (31)$$

$$\text{subject to: } \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i, \, \xi_i^{*} \ge 0 \end{cases} \qquad (32)$$
where:
- w is the weight vector.
- b is the bias term.
- ξ_i and ξ_i* are slack variables.
- C is a regularization parameter.
3.5 Recurrent Neural Network (RNN)
Recurrent Neural Networks (RNNs) have gathered
substantial attention in the realm of deep learning
due to their ability to process sequential data, making
them an ideal choice for tasks involving time series.
RNNs are designed to maintain a memory of past in-
cidents in their internal state, allowing them to exhibit
temporal dynamic behavior and recognize patterns
across time steps. Unlike traditional feed-forward
neural networks, RNNs possess loops, enabling the
propagation of information across sequences. This
inherent feature makes them distinctively powerful in
handling tasks where context from earlier steps is vi-
tal for understanding subsequent data points, [1].
However, while RNNs at the time were revolu-
tionary, they unfortunately came with some detrimen-
tal challenges. One significant limitation is RNN’s
difficulty in learning long-term dependencies due to
the infamous vanishing and exploding gradient prob-
lems. This challenge is still within RNN models,
however, it did lead to the development of more so-
phisticated architectures like Long Short-Term Mem-
ory (LSTM) networks and Gated Recurrent Units
(GRUs), both of which were designed to overcome
the shortcomings of RNNs.
An RNN can be defined as:

$$h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h) \qquad (33)$$

$$y_t = W_{hy} h_t + b_y \qquad (34)$$

where:
- x_t is the input at time step t.
- h_t is the hidden state at time step t.
- y_t is the output at time step t.
- W_{xh}, W_{hh}, and W_{hy} are weight matrices.
- b_h and b_y are bias vectors.
- σ is the activation function.
While RNNs are powerful for modeling sequential
data, they have challenges such as the vanishing and
exploding gradient problems. These challenges make
it difficult to learn long-range dependencies in se-
quences. Advanced RNN architectures, like the Long
Short-Term Memory (LSTM) and the Gated Recur-
rent Unit (GRU), were developed to overcome these
limitations.
3.6 Hybrid Quantum-Classical Model
The integration of quantum computing with classical
machine learning models is one such area of research
that has emerged over recent years with the slow but
steady creation of new tools to put quantum comput-
ing (or rather quantum simulation) in the hands of
users. Quantum computing promises unprecedented
advances, allowing users to explore new areas that
classical computers could not. For our purpose, quan-
tum computing has the potential to be combined with
current classical machine learning algorithms in order
to enhance their prediction abilities, [9].
Hybrid quantum-classical LSTM models aim
to intertwine the time series forecasting strengths
of LSTMs with quantum-enhanced data processing
techniques such as quantum gates, superposition, and
entanglement, [5]. Preliminary studies into this hy-
brid model have shown potential advantages includ-
ing improved accuracy and more efficient process-
ing speeds. Quantum algorithms, such as the Quan-
tum Approximate Optimization Algorithm (QAOA)
or Quantum Support Vector Machines (QSVM), have
shown the potential in solving optimization problems
more efficiently, [10]. Integrating these methods with
classical LSTMs can potentially lead to faster model
training and enhanced prediction accuracy. However,
the creation of these hybrid models remains a chal-
lenge due to quantum computing still being young and
the lack of infrastructure. Additionally, the complex-
ity involved in effectively combining quantum and
classical processes adds yet another layer of challenge
for working with quantum computing. Continued advancements in quantum hardware and algorithms
will likely pave the way for more robust and scalable
hybrid models in financial forecasting and find use in
other fields as well.
In the quantum version of the LSTM cell, quantum gates and qubits replace classical gates and bits. The core memory cell would comprise qubits, and quantum gates would handle the gating mechanisms:
- The input, forget, and output gates could be implemented using quantum gates that control the flow of quantum information.
- Entanglement could potentially be used to capture and maintain long-range dependencies in the sequence.

Quantum computations result in a superposition of states. A measurement collapses this superposition to classical bits. The outcome of the quantum LSTM cell would be interfaced with classical neural network layers:
- Quantum states (from qubits) of the memory cell would be measured and converted to classical information.
- This classical information would then be processed by traditional LSTM layers or other neural network architectures.
4 Experiment Design and Procedure
4.1 Wavelet Transform
For this experiment, in order to denoise our prior data set (in this case, AMZN is the stock we will forecast, using its prior closing price data), a 4-band wavelet transform is applied as follows:
$$T S = \begin{bmatrix} a \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}, \qquad (35)$$

where

$$a = \begin{bmatrix} \langle T_1, S \rangle \\ \langle T_2, S \rangle \\ \vdots \\ \langle T_{n/4-1}, S \rangle \\ \langle T_{n/4}, S \rangle \end{bmatrix}, \qquad (36)$$

and

$$\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} \langle T_{n/4+1}, S \rangle \\ \langle T_{n/4+2}, S \rangle \\ \vdots \\ \langle T_n, S \rangle \end{bmatrix}, \qquad (37)$$

where T is the 4-band wavelet transform matrix, S is the prior stock closing data, a is the approximation portion (low frequency), d_i are the detail portions for i = 1, 2, 3, and T_i is the i-th row of T. Additionally, n is the number of days (in the form n = 4k, where k ∈ N). In this research, 256 days of prior data were used, so k = 64, and hence the wavelet transform matrix was of size 256 × 256.
Thresholding was then accomplished by redefining the detail coefficients d_i using the following hard threshold, to prevent over-smoothing of the data:

$$\lambda = \sigma \sqrt{2 \log(N)}, \qquad (38)$$

where σ is the standard deviation of the noise estimated from the wavelet coefficients and N is the number of elements in the band. After applying the threshold, we obtain the thresholded coefficient vector Ŝ. Letting T^T be the transpose of the wavelet matrix, we can use it to reconstruct the smoothed data from the wavelet coefficients and finally obtain our smoothed, usable data S̃:

$$T^T \hat{S} = \tilde{S} \approx S. \qquad (39)$$
Figure 3 shows the denoised AMZN data that will be used to train the models in our ensemble, compared with the original AMZN data in Figure 2. As we can see, the denoised data retains the shape of the original data and the vital time points, and does not over-smooth the data.
Fig. 2: Original AMZN stock data
Fig. 3: Denoised AMZN stock data
4.2 Long Short-Term Memory Network (LSTM)
For the LSTM model, the model was created in Python and trained both on 248 days of original data and on wavelet-denoised data to compare the two; a 10-day prediction was then output by the model using each respective dataset. The network was con-
structed using TensorFlow’s Keras API with the fol-
lowing layers:
1. Conv1D layer with 64 filters and a kernel size of
2. This convolutional layer helps to extract fea-
tures from the sequence data, which can be bene-
ficial in identifying trends or patterns over time.
2. Two LSTM layers with 60 units each, designed
to capture long-term dependencies in sequence
data. The first LSTM returns sequences to pro-
vide a three-dimensional array as input to the next
LSTM layer.
3. Two Dense layers with 30 and 4 units respec-
tively, using ReLU activation. These are fully
connected layers that help in learning non-linear
combinations of the features.
4. A final Dense layer outputs a single value, repre-
senting the model’s prediction for the next time
step.
In total, this model is comprised of six layers and relies on the Adam optimizer to minimize the difference between the predicted and actual values during training. The Adam optimizer was chosen for its adaptive learning rate, which is very effective in handling the noisy gradients of financial time series data.
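A minimal sketch of this architecture using TensorFlow's Keras API is given below; the input window length and the mean squared error training loss are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_lstm_model(window: int):
    """Conv1D feature extractor, two LSTM layers, and dense layers,
    mirroring the layer sizes described above."""
    model = tf.keras.Sequential([
        layers.Input(shape=(window, 1)),
        layers.Conv1D(filters=64, kernel_size=2),
        layers.LSTM(60, return_sequences=True),   # pass the full sequence to the next LSTM
        layers.LSTM(60),
        layers.Dense(30, activation="relu"),
        layers.Dense(4, activation="relu"),
        layers.Dense(1),                          # prediction for the next time step
    ])
    model.compile(optimizer="adam", loss="mse")   # Adam optimizer; MSE training loss assumed
    return model
```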
The dataset was segmented into 80% training, 10% validation, and 10% testing sets, in order to create an effective learning curve. Table 1 below features a comparison of RMSE values for 1-day-historical, 5-day-historical, and 10-day-historical values for AMZN stock, both with and without wavelet denoising.
Below, the training of the LSTM model using original and denoised data can be seen in Fig. 4 and Fig. 5, alongside the respective original and denoised predictions in Fig. 6 and Fig. 7. Lastly, the 10-day predictions using each data set are given in Table 1:
Fig. 4: LSTM training with original data
Fig. 5: LSTM training with denoised data
Fig. 6: LSTM prediction with original data
Fig. 7: LSTM prediction with denoised data
Day Original Value Denoised Value
1 138.34 138.35
2 138.73 138.90
3 138.90 139.16
4 138.95 139.24
5 138.89 139.22
6 138.65 139.11
7 138.37 138.93
8 138.16 138.71
9 137.82 138.26
10 138.47 137.86
RMSE Values
Original 6.27
Denoised 5.94
Table 1: LSTM comparison of Predicted Stock
Values
As we can see in Table 1, our LSTM model fore-
casts AMZN prices staying the same with a minor
downward trend over the next 10 days. This would
suggest to investors a good strategy would be to hold
over the next coming days. Additionally, we can also
see our model had a lower RMSE value when trained
with denoised data vs original data, which suggests
that our wavelet denoising had a positive impact on
the result.
4.3 Recurrent Neural Network (RNN)
For the RNN model, a 2-layer model was created in Python and trained both with original and de-
noised data to compare the results. The model is
comprised of a SimpleRNN layer and a Dense layer
which utilizes a ReLU activation function in order
to mitigate the vanishing gradient problem which oc-
curs when training neural networks. Likewise, the
’adam’ optimizer is used once again. The figures be-
low feature a comparison of RMSE values and pre-
dicted stock price between the original and denoised
data in Fig.8 and Fig. 9:
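A minimal Keras sketch of this two-layer network is shown below; the SimpleRNN width and the input window length are placeholders, since they are not specified above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_rnn_model(window: int, units: int = 50):    # `units` is a placeholder width
    """SimpleRNN layer followed by a Dense output layer with ReLU activation."""
    model = tf.keras.Sequential([
        layers.Input(shape=(window, 1)),
        layers.SimpleRNN(units),
        layers.Dense(1, activation="relu"),            # ReLU output layer as described above
    ])
    model.compile(optimizer="adam", loss="mse")        # MSE training loss assumed
    return model
```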
Day Original Value Denoised Value
1 142.04 142.87
2 141.53 142.51
3 141.05 142.15
4 140.60 141.81
5 140.19 141.47
6 139.79 141.15
7 139.42 140.83
8 139.07 140.53
9 138.74 140.23
10 138.43 138.43
RMSE Values
Original .0389
Denoised .0297
Table 2: RNN comparison of Predicted Values
As we can see in Table 2, our RNN model likewise
forecasts AMZN prices staying the same with a subtle
downward trend over the next 10 days. This would
suggest to investors a good strategy would be to hold
over the next coming days, or potentially sell before a
price decrease occurs. Additionally, we can also see
our model had a lower RMSE value when trained with
denoised data vs original data, which suggests that our
wavelet denoising had a positive impact on the result.
4.4 Support Vector Regression (SVR)
The SVR model is likewise trained once with de-
noised data and once with original data for compar-
ison. The model employs the RBF kernel, which is
defined as such:
$$K(x_i, x_j) = \exp\!\left(-\gamma \, \|x_i - x_j\|^2\right) \qquad (40)$$
This kernel was chosen due to its ability to handle
nonlinear relationships.
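A minimal scikit-learn sketch of such an RBF-kernel SVR is given below; the hyperparameter values (C, epsilon, gamma) are placeholders rather than the tuned values used in this study.

```python
import numpy as np
from sklearn.svm import SVR

def fit_svr(X: np.ndarray, y: np.ndarray):
    """RBF-kernel SVR: C and epsilon correspond to the regularization and
    error-tolerance terms in Eqs. (31)-(32), gamma to the kernel width in Eq. (40)."""
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")  # placeholder values
    model.fit(X, y)          # X: lagged price windows, y: next-day closing prices
    return model
```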
Fig. 8: RNN prediction with original data
Fig. 9: RNN prediction with denoised data
The hyperparameters in this model were carefully chosen to balance model complexity with generalization capability, thereby minimizing overfitting, a
critical consideration for financial time-series predic-
tions. Below, the stock predictions made with the original and denoised data are given, along with their RMSE values, in Fig. 10 and Fig. 11:
Day Original Value Denoised Value
1 142.00 145.62
2 141.76 149.87
3 141.53 152.17
4 141.34 154.45
5 141.16 156.79
6 141.00 159.12
7 140.86 161.40
8 140.73 163.58
9 140.62 165.63
10 140.59 165.70
RMSE Values
Original 3.87
Denoised 2.54
Table 3: SVR comparison of Predicted Stock Val-
ues
As we can see in Table 3, our SVR model fore-
casts a drastic increase in AMZN prices over the next
10 days. This would suggest to investors a good strat-
egy would be to buy before the next 10 days occur, to
capitalize on the “bullish” behavior of the market to
come. Additionally, we can also see our model had a
lower RMSE value when trained with de-noised data
vs original data, which suggests that our wavelet de-
noising had a positive impact on the result.
4.5 Autoregressive Integrated Moving
Average (ARIMA)
For the ARIMA model, the model was created in Python, with the model's parameters (p, d, q) chosen from the Partial Auto-correlation Function (PACF), as seen in Figure 12, and the Auto-correlation Function (ACF), as seen in Figure 13, based on the prior stock data. Lastly, a 10-day forecast was made using both denoised (Figure 15) and original stock data (Figure 14); a sketch of this order-selection step is given below:
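The following is a minimal statsmodels sketch of selecting the order from the ACF/PACF plots and producing the 10-day forecast, assuming a pandas Series `prices` of prior closing prices; differencing before inspecting the plots and the order (1, 1, 1) are placeholders, not the choices actually made here.

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

def select_order_and_forecast(prices, order=(1, 1, 1), horizon=10):
    """Inspect ACF/PACF plots to choose q and p, then fit ARIMA and forecast."""
    diffed = prices.diff().dropna()     # differencing before inspection (assumed step)
    plot_acf(diffed)                    # significant ACF lags suggest q
    plot_pacf(diffed)                   # significant PACF lags suggest p
    plt.show()
    fitted = ARIMA(prices, order=order).fit()
    return fitted.forecast(steps=horizon)
```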
Fig. 10: SVR prediction with original data
Fig. 11: SVR prediction with denoised data
Fig. 12: Auto-correlation Function
Fig. 13: Partial Auto-correlation Function
Day Original Data Denoised Data
1 142.32 139.30
2 142.39 141.37
3 142.47 138.17
4 142.48 139.49
5 142.42 139.97
6 142.39 138.63
7 142.42 139.62
8 142.44 139.34
9 142.45 138.98
10 142.46 139.04
RMSE Values
Original 4.35
Denoised 4.02
Table 4: ARIMA comparison of Predicted Stock
Values
As we can see in Table 4, our ARIMA model fore-
casts AMZN prices staying the same over the next 10
days with minor volatility. This would suggest to in-
vestors a good strategy would be to hold over the next
coming days. Additionally, we can also see our model
had a lower RMSE value when trained with denoised
data vs original data, which suggests that our wavelet
denoising had a positive impact on the result.
4.6 Hybrid Quantum-Classical LSTM Method
For the final model in our ensemble, an LSTM model equipped with a quantum layer, via the PennyLane package in Python, was deployed and trained with our wavelet-denoised stock price data to likewise give a 10-day stock prediction. Using PennyLane, the quantum layer simulates 4 qubits (the qubit being the quantum counterpart of a bit in classical computing). Angle embedding within the quantum circuit was used to map classical data (closing stock prices) to a quantum state, and an entangling layer was used within the circuit to simulate quantum entanglement. The quantum circuit's output was then fed back into the classical neural network, specifically an LSTM model which contains a convolutional layer, 2 LSTM layers, a dropout layer, and 3 dense layers (9 layers in total).
The goal of this model was to leverage the bene-
fits of quantum computing, with the use of wavelets,
to process data in ways that might be more efficient
or effective than classical-only approaches. Quantum
computing can potentially identify patterns in data
that classical algorithms might miss since it has addi-
tional tools that classical methods do not such as su-
perposition and entanglement, [9]. For our purposes,
the qubits we are utilizing can exist in multiple states
simultaneously and can entangle with each other, al-
lowing our model to not only analyze data sets more
efficiently but also evaluate numerous future possibil-
ities for our forecast which classical computing could
not, [10]. These benefits allow quantum methods to
be an ideal candidate as a preprocessing step.
The training of the model using wavelet de-noised
data is as seen below in Figure 16:
Fig. 14: ARIMA prediction with original data
Fig. 15: ARIMA prediction with denoised data
Lastly, the final prediction of our ensemble is
given in Figure 17:
Day Predicted Value
1 126.15
2 125.83
3 125.76
4 125.84
5 125.98
6 126.19
7 126.40
8 126.61
9 126.80
10 126.90
RMSE: 12.82
Table 5: Predicted stock values and RMSE score for
the Hybrid Quantum Classical Model
As we can see from Table 5, the hybrid quantum-
classical model was in line with our other models
as it suggested AMZN stock price would remain un-
changed with little movement. These results paired
with our other models, in addition to the low RMSE
value, show that this is a quality forecast and gives a
glimpse into the power of quantum computing in fi-
nancial forecasts.
5 Conclusion
The results from our newly developed ensemble model demonstrate the power of wavelets, machine learning, and quantum computing in several ways. Firstly, the different models offer different lenses through which the future of AMZN stock can be interpreted.
The majority of the models predict that the price of
AMZN stock over the next 10 days will either stay the
same or mildly decrease, suggesting to a potential in-
vestor that holding may be a good financial move. We
can also see across the board that using the de-noised
data aided in the training of our models, as across the
board RMSE values were lower when models utilized
de-noised data vs raw data, demonstrating the power
of the wavelet de-noising algorithm.
Additionally, the hybrid quantum-classical model
was able to capture trends from prior data well and
give a prediction that was in line with the other mod-
els. This demonstrates the potential for such hybrid
quantum-classical models and shows that there is po-
tential for them in the field. We believe that further re-
search into quantum computing can prove to be fruit-
ful as a new emerging method to make financial fore-
casts.
Overall, the results of this research show that wavelet de-noising is a useful tool in financial forecasting, making our predictions more reliable and efficient. While training each model, we can
see an overall decrease in RMSE values when train-
ing with wavelet denoised data as opposed to origi-
nal data. This means that not only did each model
capture the history and trends of the data better, but
it also gave us more confidence in the quality of the
prediction leading to better financial decisions. This
clearly indicates that wavelets show great promise in
financial forecasting and contribute to making a more
sound financial decision.
Additionally, our ensemble forecast clearly shows
the usefulness of such models when making finan-
cial decisions. Employing multiple methods for
stock forecasting helps to eliminate the aforemen-
tioned “black box” issue. Utilizing multiple tech-
niques yields a more diverse range of results which
can aid in making financial decisions. In our case,
when each model yields similar results about the fu-
ture of a stock price, it can give investors more confi-
dence in their decision.
Fig. 16: Hybrid Quantum Classical LSTM model training
Fig. 17: Hybrid Quantum Classical LSTM model prediction
In the future, we can extend our current methods
by combining them into a hybrid model, as opposed to
separate models. Additionally, there is a large amount
of potential for tweaks to the quantum model such as
using other packages, adjusting the number of qubits,
etc. We believe the results of this research can also be
improved by better selecting a wide variety of stocks
to train our models vs using only one.
Acknowledgment:
Dr. Xiaodi Wang, PhD, Western Connecticut
State University for his time and support
The WCSU Department of Mathematics for
making this research possible
The WCSU Student Government Association
for their constant support for this research
The WCSU Foundation for their contributions
to this research
References:
[1] M. Goldani, “Comparative analysis on
forecasting methods and how to choose a
suitable one: case study in financial time series,”
Journal of Mathematics and Modeling in
Finance, pp. 35–59, 2023.
[2] S. Selvaraj, N. Vijayalakshmi, S. S. Kumar, and
G. D. Kumar, “Maximizing Profit Prediction:
Forecasting Future Trends with LSTM
Algorithm and compared with Loss function and
Mean error code using Python,” Ushus Journal
of Business Management, vol. 22, no. 4, pp.
15–28, 2023.
[3] A. H. Rahimyar, H. Q. Nguyen, and X. Wang,
“Stock forecasting using M-band wavelet-based
SVR and RNN-LSTMs models,” in 2019 2nd
International Conference on Information
Systems and Computer Aided Education
(ICISCAE), 2019, pp. 234–240.
[4] V. H. Shah, “Machine learning techniques for
stock prediction,” Foundations of Machine
Learning| Spring, vol. 1, no. 1, pp. 6–12, 2007.
[5] Y. Yang and J. Wang, “Forecasting wavelet
neural hybrid network with financial ensemble
empirical mode decomposition and MCID
evaluation,” Expert Systems with Applications,
vol. 166, pp. 114097, 2021.
[6] V. Novykov, C. Bilson, A. Gepp, G. Harris, and
B. J. Vanstone, “Deep learning applications in
investment portfolio management: a systematic
literature review,” Journal of Accounting
Literature, 2023.
[7] J. Fattah, L. Ezzine, Z. Aman, H. El Moussami,
and A. Lachhab, “Forecasting of demand using
ARIMA model,” International Journal of
Engineering Business Management, vol. 10, pp.
1847979018808673, 2018.
[8] L. Rubio, A. Palacio Pinedo, A. Mejía Castaño,
and F. Ramos, “Forecasting volatility by using
wavelet transform, ARIMA and GARCH
models,” Eurasian Economic Review, pp. 1–28,
2023.
[9] E. Paquet and F. Soleymani, “QuantumLeap:
Hybrid quantum neural network for financial
predictions,” Expert Systems with Applications,
vol. 195, pp. 116583, 2022.
[10] S. Y.-C. Chen, S. Yoo, and Y.-L. L. Fang,
“Quantum long short-term memory,” in ICASSP
2022-2022 IEEE International Conference on
Acoustics, Speech and Signal Processing
(ICASSP), 2022, pp. 8622–8626.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
Peter Bigica carried out all experiments and
modeling
Xiaodi Wang was responsible for advising and
contributing the wavelet filter banks.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The Western Connecticut State University Stu-
dent Government Association
The Western Connecticut State University Foun-
dation.
The authors have no conflicts of interest to
declare that are relevant to the content of this
article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US
Conflict of Interest
Disclosure:
During the preparation of this work, the author(s)
used generative AI to assist in generating initial
drafts and refining the language of sections related to
the explanation of machine learning techniques and
stock forecasting methods. After using this tool, the
author(s) reviewed and edited the content as needed
and take(s) full responsibility for the content of the
publication.