Wavelet Based Financial Forecast Ensemble Featuring Hybrid
Quantum-Classical LSTM Model
PETER BIGICA, XIAODI WANG
Department of Mathematics and Computer Science
Western Connecticut State University
181 White Street, Danbury, Connecticut
UNITED STATES
Abstract: One of the most sought-after goals in the financial world is a reliable method by which investors can
predict a stock price movement consistently. Advancements in stock prediction via the use of machine learning
have improved the accuracy of such predictions and yielded better ideas about value investments in the stock
market. However, with the addition of the state-of-art tool M-band wavelet transform as a preprocessing step, we
can denoise our raw data set (prior stock prices) and refine it to make the forecast even more accurate. Following
this preprocessing step, multiple machine learning techniques are deployed to construct a robust, non-parametric
hybrid forecasting model. To demonstrate the novelty of our algorithm, we present a case study on stock price
prediction employing a discrete 4-band wavelet transform integrated with a hybrid machine learning ensemble.
Our results underscore the potential and importance of such ensemble methods in refining the accuracy and reli-
ability of financial forecasts. Furthermore, in theory, quantum computing can further optimize these algorithms,
potentially leading to more precise stock price forecasts. In particular, our ensemble will feature a hybrid quantum-
classical LSTM Model to demonstrate the potential of both wavelets and quantum computing in stock forecasting.
Key-Words: Machine Learning, M-Band Wavelet Transform, Neural Network Ensemble, Quantum Computing,
Stock forecasting, Long Short-Term Memory (LSTM) networks
Received: December 16, 2023. Revised: July 15, 2024. Accepted: August 9, 2024. Published: September 18, 2024.
1 Introduction
The financial realm is characterized by its volatile
nature, with stock price movements often being de-
scribed as “random”. As such, one of the most sought-
after goals in the financial world is finding a consis-
tent, reliable method by which these movements can
be anticipated. Over the years as investors and ana-
lysts have attempted to find such a solution, techno-
logical advancements (particularly in machine learn-
ing) have provided new techniques by which stock
prices can be forecasted. These advances have not only elevated the precision of stock predictions but have also allowed for the creation of new algorithms that increase the precision of market predictions further, [1].
However, while these techniques continue to ad-
vance and boundaries are continually being pushed,
handling financial data and trying to make an ac-
curate/consistent prediction remains a daunting task.
All predictions must be based on historical data which
can often be limited in its sample size, is virtually al-
ways non-linear, has no clear trends, and can have
high volatility. In other words, financial data can have
a lot of “noise” which makes the task of forecasting
financial movements a difficult one, [2].
To combat such volatility in the quest to develop
a more reliable model, we introduce the M-Band
wavelet. As a state-of-the-art preprocessing tool, the M-
band wavelet transform has the profound ability to de-
noise a raw dataset, [3]. It refines and purifies the
data, setting the stage for even more accurate fore-
casting than what was previously achievable.
The M-band wavelet transform can separate data into different frequency components, specifically one low-frequency component and M-1 high-frequency components, and more often than not the "noise" embedded in the raw data that makes stock forecasting so difficult is contained within the high-frequency components of the data, [4]. Once the wavelet transform and denoising procedure have been applied, we are left with a "clean" data set that better reflects the true trends within the data, and ultimately allows us to use machine learning techniques to achieve a better prediction.
To test the validity of this method, several differ-
ent techniques and neural networks will be employed
through a financial ensemble to not only see the effec-
tiveness of different neural networks but also to test
the validity of the Wavelet denoise algorithm applied
to the data. An ensemble forecast will give us more results to compare, and should those results align, more confidence as to what financial decision to make in the future, [5].
Testing error will be measured with the Root Mean Squared Error (RMSE).
In this research, the following techniques will be used: the Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM) networks, Support Vector Regression (SVR), and Recurrent Neural Networks (RNN).
Lastly, we will continue our exploration into con-
structing a more reliable forecasting model by ven-
turing into the realm of quantum computing. Quan-
tum computing is an emerging advancement that of-
fers several advantages over traditional methods in-
cluding but not limited to more computational power,
more efficient optimization, and the potential for new
algorithms to make even better predictions using ma-
chine learning methods, [6].
The goal of this research is to find a more reli-
able method for investors to make smarter financial
decisions by employing alternative machine learning
methods. We will set out to prove the potential of
wavelets in financial forecasting, while also demon-
strating the power of utilizing an ensemble forecast
as opposed to standard single-method forecasts when
making financial decisions to help create better judg-
ment. Additionally, we will show the potential of quantum computing in financial forecasting as we get closer to a new era of computing.
2 Literature Review
Stock forecasting (for our purposes the area of trying
to predict future stock prices via machine learning)
has been an area of intense research in the last sev-
eral decades. After 50 years of advances in the field,
neural networks have gained significant attention for
their capacity to model complex non-linear relation-
ships predominant in stock markets.
Historically, stock market predictions relied heav-
ily on statistical methods like the ARIMA and
GARCH models, [7]. As deep/machine learning
methods have developed leading into the 21st cen-
tury, stock forecasters started utilizing neural net-
works, with initial experiments focusing primarily on
Feedforward Neural Networks (FNN) and Recurrent
Neural Networks (RNN). As techniques developed,
more sophisticated networks such as Long Short-
Term Memory (LSTM) units and Convolutional Neu-
ral Networks (CNN) have become predominant in the
modern day, [5].
Advantages of neural networks over statistical
methods include:
- Modeling non-linear data sets
- Better proficiency in working with time series data
- Integration of multiple, diverse data sets to train models
Despite their advances, neural networks still face challenges:
- Overfitting of data sets, making models less generalized
- Limited interpretability, acting like a "black box", which complicates relying on models for financial decisions
- The required use of significant computational resources, which can limit their ability to perform real-time predictions.
In this paper, our contribution is to overcome these
challenges by creating a forecasting ensemble con-
sisting of various machine learning algorithms all
based on a Wavelet Transform to denoise prior stock
data and in turn produce better, more accurate fore-
casts that we can study and evaluate.
3 Preliminaries
3.1 M-Band Wavelet
An orthogonal M-band Discrete Wavelet Transform (DWT) is totally determined by M sets of filters in a filter bank with certain properties, [4]. In any such filter bank, there are one low-pass filter α and M-1 high-pass filters β^(j), for j = 1, 2, ..., M-1, with N vanishing moments. These filters satisfy the following conditions:

$$\sum_{i=1}^{n} \alpha_i = \sqrt{M} \qquad (1)$$

$$\sum_{i=1}^{n} i^{k} \beta_i^{(j)} = 0 \quad \text{for } k = 1, 2, \dots, N-1 \text{ and } j = 1, \dots, M-1 \qquad (2)$$

$$\|\alpha\| = \|\beta^{(j)}\| = 1 \quad \text{for } j = 1, \dots, M-1 \qquad (3)$$

$$\langle \alpha, \beta^{(j)} \rangle = 0 \quad \text{for } j = 1, \dots, M-1 \qquad (4)$$

$$\langle \beta^{(i)}, \beta^{(j)} \rangle = 0 \quad \text{for } i, j = 1, \dots, M-1 \text{ and } i \neq j \qquad (5)$$
In this research, the wavelet transform was performed with the following filter banks:

α = [0.06737176, 0.09419511, 0.40580489, 0.56737176, 0.56737176, 0.40580489, 0.09419511, 0.06737176],
β = [0.09419511, 0.06737176, 0.56737176, 0.40580489, 0.40580489, 0.56737176, 0.06737176, 0.0941951],
γ = [0.09419511, 0.06737176, 0.56737176, 0.4058048, 0.4058048, 0.5673717, 0.06737176, 0.0941951],
δ = [0.06737176, 0.09419511, 0.40580489, 0.56737176, 0.56737176, 0.40580489, 0.09419511, 0.06737176]
If the signal we are working with, S, is contained within R^{4k} (k ∈ N, k ≥ 2), we can create a 4k × 4k wavelet transform matrix T_1 by shifting and wrapping around the filters (as shown in Figure 1).

Fig. 1: An example of 16x16 Wavelet Transform matrix

Let C_1, C_2, ..., C_{Mk} be the column vectors of T_1^T. Then C_1^T, C_2^T, ..., C_{Mk}^T are the row vectors of T_1. Since T_1 is an orthogonal matrix, {C_1, C_2, ..., C_{Mk}} forms an orthonormal basis of R^{Mk}. Therefore, the components of S̃_1 ≜ T_1 S are the coordinates of S under this wavelet basis, and hence ||S|| = ||T_1 S|| = ||S̃_1||. The components of S̃_1 are also called the wavelet coefficients of S. Moreover, the M-band DWT of S decomposes S into M different frequency components, with a^(1) being the lowest-frequency component (or trend) and d_1^(1), ..., d_{M-1}^(1) being the higher-frequency components (or fluctuations) of S. If necessary, and if k is divisible by M, we can apply the DWT to a^(1) using a k × k DWT matrix T_2 such that

$$T_2\, a^{(1)} = \left[a^{(2)}, d_1^{(2)}, \dots, d_{M-1}^{(2)}\right]^{T}, \qquad (6)$$

where

$$a^{(2)} = \left[a_1^{(2)}, a_2^{(2)}, a_3^{(2)}, \dots, a_{k/M}^{(2)}\right]^{T}, \qquad (7)$$

and

$$d_i^{(2)} = \left[d_{i,1}^{(2)}, d_{i,2}^{(2)}, d_{i,3}^{(2)}, \dots, d_{i,k/M}^{(2)}\right]^{T}, \quad i = 1, \dots, M-1. \qquad (8)$$

The M-band DWT of a^(1) decomposes a^(1) into M different frequency components, with a^(2) being the lowest-frequency component and d_i^(2) (i = 1, ..., M-1) being the higher-frequency components of a^(1).

Let
$$T = \begin{bmatrix} T_2 & 0 \\ 0 & I \end{bmatrix} T_1,$$
where the lower-corner 0 is an (M-1)k × k zero matrix, the upper-corner 0 is a k × (M-1)k zero matrix, and I is an (M-1)k × (M-1)k identity matrix. Then,

$$T S = \left[a^{(2)}, d_1^{(2)}, \dots, d_{M-1}^{(2)}, d_1^{(1)}, \dots, d_{M-1}^{(1)}\right]^{T} \triangleq \tilde{S}_2, \qquad (9)$$

and S̃_2 is the vector of second-level wavelet coefficients of S. Since {C_1, C_2, ..., C_{Mk}} is an orthonormal basis of R^{Mk},

$$S = s_1 C_1 + s_2 C_2 + \dots + s_n C_n, \qquad (10)$$

where n = Mk and s_i = C_i^T S = ⟨C_i, S⟩, the inner product of C_i and S, for i = 1, 2, ..., n. Therefore,

$$s_i = \begin{cases} a_i^{(1)} & \text{for } i = 1, 2, \dots, k \\ d_{1,\,i-k}^{(1)} & \text{for } i = k+1, k+2, \dots, 2k \\ \quad\vdots & \\ d_{M-1,\,i-(M-1)k}^{(1)} & \text{for } i = (M-1)k+1, \dots, Mk \end{cases} \qquad (11)$$

Let

$$A^{(1)} = a_1^{(1)} C_1 + \dots + a_k^{(1)} C_k, \qquad D_i^{(1)} = d_{i,1}^{(1)} C_{ik+1} + \dots + d_{i,k}^{(1)} C_{(i+1)k} \qquad (12)$$

for i = 1, ..., M-1. Then A^(1) corresponds to a^(1) and D_i^(1) corresponds to d_i^(1) for i = 1, ..., M-1. If we let

$$V^{(1)} = \operatorname{span}\{C_1, \dots, C_k\} \qquad (13)$$

and

$$W_i^{(1)} = \operatorname{span}\{C_{ik+1}, \dots, C_{(i+1)k}\} \qquad (14)$$

for i = 1, ..., M-1, then V^(1), W_1^(1), ..., and W_{M-1}^(1) are orthogonal subspaces of R^{Mk}, and therefore R^{Mk} can be represented as the following direct sum,
$$\mathbb{R}^{Mk} = V^{(1)} \oplus W_1^{(1)} \oplus \dots \oplus W_{M-1}^{(1)}. \qquad (15)$$

So, for any S ∈ R^{Mk}, S can be written uniquely as

$$S = A^{(1)} + D_1^{(1)} + \dots + D_{M-1}^{(1)}, \qquad (16)$$

where A^(1) = Proj_{V^(1)} S is the orthogonal projection of S onto V^(1), and D_i^(1) = Proj_{W_i^(1)} S is the orthogonal projection of S onto W_i^(1), for i = 1, ..., M-1.

If we apply the second-level DWT to S, then the subspaces V^(2), W_1^(2), ..., W_{M-1}^(2), W_1^(1), ..., and W_{M-1}^(1) are orthogonal to each other, and therefore

$$\mathbb{R}^{Mk} = V^{(2)} \oplus W_1^{(2)} \oplus \dots \oplus W_{M-1}^{(2)} \oplus W_1^{(1)} \oplus \dots \oplus W_{M-1}^{(1)}, \qquad (17)$$

$$V^{(1)} = V^{(2)} \oplus W_1^{(2)} \oplus \dots \oplus W_{M-1}^{(2)}, \qquad (18)$$

$$S = A^{(2)} + D_1^{(2)} + \dots + D_{M-1}^{(2)} + D_1^{(1)} + \dots + D_{M-1}^{(1)}, \qquad (19)$$

$$A^{(1)} = A^{(2)} + D_1^{(2)} + \dots + D_{M-1}^{(2)}. \qquad (20)$$
3.2 Autoregressive Integrated Moving
Average (ARIMA)
The ARIMA model, primarily based on linear rela-
tionships, has been a cornerstone in the field of time
series analysis for several decades, celebrated for its
simplicity and ability to handle non-stationary data.
Its strengths lie in its capacity to model and forecast
based on both prior data and prior errors, while in-
corporating autoregressive and moving average com-
ponents, [7]. However, despite its strengths, ARIMA
models do possess limitations especially when deal-
ing with nonlinear data sets which more often than
not are to be expected in real-world scenarios, [8].
ARIMA models have primarily been designed for making predictions involving time-series data. The model is typically configured using three values denoted (p, d, q), where p signifies the order of the autoregressive part, d indicates the degree of differencing, and q represents the number of lagged forecast errors in the model, [8].
We can see how ARIMA is derived from the AR and MA models below. A pure autoregressive (AR) model is one where Y_t depends only on its own lags:

$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p}, \qquad (21)$$

where β_n is the coefficient of the lag that the model estimates.

On the other hand, a pure moving average (MA) model is one where Y_t depends only on the lagged forecast errors:

$$Y_t = \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \dots + \phi_q \varepsilon_{t-q}. \qquad (22)$$

Now, the ARIMA model can be derived from both of these equations combined with an integrated (I) term which controls the differencing in the model. The ARIMA model is then given by:

$$Y'_t = \alpha + \beta_1 Y'_{t-1} + \dots + \beta_p Y'_{t-p} + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \dots + \phi_q \varepsilon_{t-q}. \qquad (23)$$
3.3 Long Short-Term Memory Network (LSTM)
Long Short-Term Memory (LSTM) networks, an advancement of Recurrent Neural Networks (RNNs), have been an important focus of research involving long-
range modeling of time series data. Introduced by
German computer scientists Hochreiter and Schmid-
huber in 1995, LSTMs were designed to overcome the
predominant challenges faced by RNN models at the
time, more specifically the vanishing and exploding
gradient problems, [3]. In RNNs, during the back-
propagation process, gradients are calculated to up-
date the network’s weights. However, when dealing
with larger data sets, these gradients can become ex-
tremely small (vanish) or very large (explode). These
issues made it difficult for RNNs to learn and retain
information over longer intervals, constraining their performance in many tasks, [9].
LSTMs, with their enhanced architecture, effec-
tively address these challenges which limited RNN
models. Their architecture is comprised of three types
of gates: the input, forget, and output gates. Each of
these gates is responsible for regulating the flow of in-
formation in the network. This flow allows LSTMs to
decide what information to store, discard, or output at
each time step. These gates and their ability to deter-
mine what information should be kept or discarded at
each step in the sequence, allow the network to main-
tain a more stable gradient over time and solve the
problems which RNN suffers from, [1]. It also has the
added benefit of being able to maintain information
over longer intervals and thus capture intricate pat-
terns and dependencies in time series data. Over time,
various modifications to the standard LSTM model
have led to new architectures, such as the Gated Re-
current Units (GRUs).
As mentioned, the LSTM unit is composed of a
cell, an input gate, an output gate, and a forget gate.
The cell remembers values over arbitrary time inter-
vals, and the three gates regulate the flow of informa-
tion into and out of the cell.
Let us denote:
- x_t as the input at time step t.
- h_t as the hidden state at time step t.
- c_t as the cell state at time step t.
- f_t as the forget gate's activation.
- i_t as the input gate's activation.
- o_t as the output gate's activation.
- σ as the sigmoid activation function.
The LSTM can then be described by the following set of equations. Additionally, the equation for the root mean squared error loss function, which will be used to test the accuracy of the model, is given below:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \qquad (24)$$

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \qquad (25)$$

$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \qquad (26)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (27)$$

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \qquad (28)$$

$$h_t = o_t \odot \tanh(c_t) \qquad (29)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - h_t)^2} \qquad (30)$$
where:
- W_f, W_i, W_c, and W_o are weight matrices.
- b_f, b_i, b_c, and b_o are bias vectors.
- ⊙ represents element-wise multiplication.
- N represents the number of observations.
- y_t and h_t are the target and predicted outputs at time t.
The LSTM’s ability to forget, learn, and output
information gives it the capacity to model time se-
ries and sequences with long-range dependencies and
avoid the vanishing and exploding gradient problems
of traditional RNNs.
- The forget gate (f_t) decides what information from the cell state should be thrown away or kept.
- The input gate (i_t) updates the cell state with new information.
- The output gate (o_t) determines what the next hidden state should be.
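For concreteness, Eqs. (24)-(29) can be written as a single NumPy time step as sketched below; the weight and bias dictionaries are hypothetical stand-ins for trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (24)-(29). W and b hold the weight
    matrices and bias vectors for the forget, input, candidate, and output gates."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate, Eq. (24)
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate, Eq. (25)
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate state, Eq. (26)
    c_t = f_t * c_prev + i_t * c_tilde           # cell state update, Eq. (27)
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate, Eq. (28)
    h_t = o_t * np.tanh(c_t)                     # hidden state, Eq. (29)
    return h_t, c_t
```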
3.4 Support Vector Regression (SVR)
Support Vector Regression (SVR), an extension of the
widely acclaimed Support Vector Machines (SVM)
used for regression tasks, is known for its ability to
handle high-dimensional data. SVR operates by map-
ping input data into a higher-dimensional space where
it seeks to find a hyperplane that best fits the data. The
central idea is to identify a function that, for a given
tolerance of error, has the smallest possible deviation
from the actual training data while maintaining a max-
imal margin, [3]. SVR typically uses an ε-insensitive loss function that allows errors falling within a defined threshold (ε) and penalizes errors outside this range much more heavily.
Over the years, SVR’s application has spanned
across diverse domains, from finance to environmen-
tal modeling. One of the defining features of SVR is
its utilization of kernel functions, such as polynomial,
radial basis function (RBF), sigmoid, and wavelets
which allow it to model complex, non-linear relation-
ships in data by transforming it into a space where lin-
ear regression techniques become applicable, [3]. The
flexibility to choose and craft kernels grants SVR a
versatile edge over many traditional regression mod-
els. While SVR’s robustness and efficacy in high-
dimensional spaces are notable, challenges like tun-
ing hyperparameters and potential computational in-
efficiencies in very large datasets are areas of current
research.
Given a dataset (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), the SVR optimization problem can be formulated as:

$$\text{minimize: } \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*}) \qquad (31)$$

$$\text{subject to: } \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i, \, \xi_i^{*} \ge 0 \end{cases} \qquad (32)$$
where:
- w is the weight vector.
- b is the bias term.
- ξ_i and ξ_i* are slack variables.
- C is a regularization parameter.
3.5 Recurrent Neural Network (RNN)
Recurrent Neural Networks (RNNs) have gathered
substantial attention in the realm of deep learning
due to their ability to process sequential data, making
them an ideal choice for tasks involving time series.
RNNs are designed to maintain a memory of past in-
cidents in their internal state, allowing them to exhibit
temporal dynamic behavior and recognize patterns
across time steps. Unlike traditional feed-forward
neural networks, RNNs possess loops, enabling the
propagation of information across sequences. This
inherent feature makes them distinctively powerful in
handling tasks where context from earlier steps is vi-
tal for understanding subsequent data points, [1].
However, while RNNs at the time were revolu-
tionary, they unfortunately came with some detrimen-
tal challenges. One significant limitation is RNN’s
difficulty in learning long-term dependencies due to
the infamous vanishing and exploding gradient prob-
lems. This challenge is still within RNN models,
however, it did lead to the development of more so-
phisticated architectures like Long Short-Term Mem-
ory (LSTM) networks and Gated Recurrent Units
(GRUs), both of which were designed to overcome
the shortcomings of RNNs.
An RNN can be defined as:

$$h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h) \qquad (33)$$

$$y_t = W_{hy} h_t + b_y \qquad (34)$$

where:
- x_t is the input at time step t.
- h_t is the hidden state at time step t.
- y_t is the output at time step t.
- W_{xh}, W_{hh}, and W_{hy} are weight matrices.
- b_h and b_y are bias vectors.
- σ is the activation function.
While RNNs are powerful for modeling sequential
data, they have challenges such as the vanishing and
exploding gradient problems. These challenges make
it difficult to learn long-range dependencies in se-
quences. Advanced RNN architectures, like the Long
Short-Term Memory (LSTM) and the Gated Recur-
rent Unit (GRU), were developed to overcome these
limitations.
3.6 Hybrid Quantum-Classical Model
The integration of quantum computing with classical
machine learning models is one such area of research
that has emerged over recent years with the slow but
steady creation of new tools to put quantum comput-
ing (or rather quantum simulation) in the hands of
users. Quantum computing promises unprecedented
advances, allowing users to explore new areas that
classical computers could not. For our purpose, quan-
tum computing has the potential to be combined with
current classical machine learning algorithms in order
to enhance their prediction abilities, [9].
Hybrid quantum-classical LSTM models aim
to intertwine the time series forecasting strengths
of LSTMs with quantum-enhanced data processing
techniques such as quantum gates, superposition, and
entanglement, [5]. Preliminary studies into this hy-
brid model have shown potential advantages includ-
ing improved accuracy and more efficient process-
ing speeds. Quantum algorithms, such as the Quan-
tum Approximate Optimization Algorithm (QAOA)
or Quantum Support Vector Machines (QSVM), have
shown the potential in solving optimization problems
more efficiently, [10]. Integrating these methods with
classical LSTMs can potentially lead to faster model
training and enhanced prediction accuracy. However,
the creation of these hybrid models remains a chal-
lenge due to quantum computing still being young and
the lack of infrastructure. Additionally, the complex-
ity involved in effectively combining quantum and
classical processes adds yet another layer of challenge
for working with quantum computing. Continued advancements in quantum hardware and algorithms
will likely pave the way for more robust and scalable
hybrid models in financial forecasting and find use in
other fields as well.
In the quantum version of the LSTM cell, quantum gates and qubits replace classical gates and bits. The core memory cell would comprise qubits, and quantum gates would handle the gating mechanisms:
- The input, forget, and output gates could be implemented using quantum gates that control the flow of quantum information.
- Entanglement could potentially be used to capture and maintain long-range dependencies in the sequence.

Quantum computations result in a superposition of states. A measurement collapses this superposition to classical bits. The outcome of the quantum LSTM cell would be interfaced with classical neural network layers:
- Quantum states (from qubits) of the memory cell would be measured and converted to classical information.
- This classical information would then be processed by traditional LSTM layers or other neural network architectures.
4 Experiment Design and Procedure
4.1 Wavelet Transform
For this experiment, in order to denoise our prior data set (in this case, AMZN is the stock we will forecast, using its prior closing price data), a 4-band wavelet transform is applied as follows:
$$T S = \begin{bmatrix} a \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}, \qquad (35)$$

where

$$a = \begin{bmatrix} \langle T_1, S \rangle \\ \langle T_2, S \rangle \\ \vdots \\ \langle T_{n/4-1}, S \rangle \\ \langle T_{n/4}, S \rangle \end{bmatrix}, \qquad (36)$$

and

$$\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} \langle T_{n/4+1}, S \rangle \\ \langle T_{n/4+2}, S \rangle \\ \vdots \\ \langle T_n, S \rangle \end{bmatrix}, \qquad (37)$$

where T is the 4-band wavelet transform matrix, S is the prior stock closing data, a is the approximation portion (low frequency), d_i are the detail portions for i = 1, 2, 3, and T_i is the i-th row of T. Additionally, n is the number of days (in the form n = 4k, where k ∈ N). In this research, 256 days of prior data were used, so k = 64, and hence the wavelet transform matrix was of size 256 × 256.
Thresholding was then accomplished by redefining the detail coefficients d_i using the following hard threshold, to prevent over-smoothing of the data:

$$\lambda = \sigma \sqrt{2 \log(N)}, \qquad (38)$$

where σ is the standard deviation of the noise estimated from the wavelet coefficients and N is the number of elements in the band. After applying the threshold, we obtain the thresholded coefficient vector Ŝ. Letting T^T be the transpose of the wavelet matrix, we can use it to reconstruct the smoothed data from the wavelet coefficients and finally obtain our smoothed, usable data S̃:

$$T^T \hat{S} = \tilde{S} \approx S. \qquad (39)$$
Figure 3 shows the denoised AMZN data that will be used to train the models in our ensemble, compared with the original AMZN data in Figure 2. As we can see, the denoised data retains the shape of the original data and the vital time points, and does not over-smooth the data.
Fig. 2: Original AMZN stock data
Fig. 3: Denoised AMZN stock data
4.2 Long Short-Term Memory Network (LSTM)
For the LSTM model, the model was created in Python and trained both on 248 days of original data and on wavelet-denoised data to compare the two; a 10-day prediction was then output by the model using each respective dataset. The network was con-
structed using TensorFlow’s Keras API with the fol-
lowing layers:
1. Conv1D layer with 64 filters and a kernel size of
2. This convolutional layer helps to extract fea-
tures from the sequence data, which can be bene-
ficial in identifying trends or patterns over time.
2. Two LSTM layers with 60 units each, designed
to capture long-term dependencies in sequence
data. The first LSTM returns sequences to pro-
vide a three-dimensional array as input to the next
LSTM layer.
3. Two Dense layers with 30 and 4 units respec-
tively, using ReLU activation. These are fully
connected layers that help in learning non-linear
combinations of the features.
4. A final Dense layer outputs a single value, repre-
senting the model’s prediction for the next time
step.
In total, this model is comprised of six layers and relies on the Adam optimizer to minimize the difference between the predicted and actual values during training. The Adam optimizer was chosen for its adaptive learning rate, which is very effective in handling the noisy gradients of financial time series data.
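A minimal sketch of this architecture using TensorFlow's Keras API is given below; the input window length and the mean squared error training loss are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_lstm_model(window: int):
    """Conv1D feature extractor, two LSTM layers, and dense layers,
    mirroring the layer sizes described above."""
    model = tf.keras.Sequential([
        layers.Input(shape=(window, 1)),
        layers.Conv1D(filters=64, kernel_size=2),
        layers.LSTM(60, return_sequences=True),   # pass the full sequence to the next LSTM
        layers.LSTM(60),
        layers.Dense(30, activation="relu"),
        layers.Dense(4, activation="relu"),
        layers.Dense(1),                          # prediction for the next time step
    ])
    model.compile(optimizer="adam", loss="mse")   # Adam optimizer; MSE training loss assumed
    return model
```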
The dataset was segmented into 80% training, 10% validation, and 10% testing sets, in order to create an effective learning curve. Table 1 below features a comparison of RMSE values for 1-day-historical, 5-day-historical, and 10-day-historical values for AMZN stock, both with and without wavelet denoising.
Below, the training of the LSTM model using original and denoised data can be seen in Fig. 4 and Fig. 5, alongside the respective original and denoised predictions in Fig. 6 and Fig. 7. Lastly, the 10-day predictions using each data set are given in Table 1:
Fig. 4: LSTM training with original data
Fig. 5: LSTM training with denoised data
Fig. 6: LSTM prediction with original data
Fig. 7: LSTM prediction with denoised data
Day Original Value Denoised Value
1 138.34 138.35
2 138.73 138.90
3 138.90 139.16
4 138.95 139.24
5 138.89 139.22
6 138.65 139.11
7 138.37 138.93
8 138.16 138.71
9 137.82 138.26
10 138.47 137.86
RMSE Values
Original 6.27
Denoised 5.94
Table 1: LSTM comparison of Predicted Stock
Values
As we can see in Table 1, our LSTM model fore-
casts AMZN prices staying the same with a minor
downward trend over the next 10 days. This would
suggest to investors a good strategy would be to hold
over the next coming days. Additionally, we can also
see our model had a lower RMSE value when trained
with denoised data vs original data, which suggests
that our wavelet denoising had a positive impact on
the result.
4.3 Recurrent Neural Network (RNN)
For the RNN model, a 2-layer model was created in Python and trained both with original and de-
noised data to compare the results. The model is
comprised of a SimpleRNN layer and a Dense layer
which utilizes a ReLU activation function in order
to mitigate the vanishing gradient problem which oc-
curs when training neural networks. Likewise, the
’adam’ optimizer is used once again. The figures be-
low feature a comparison of RMSE values and pre-
dicted stock price between the original and denoised
data in Fig.8 and Fig. 9:
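A minimal Keras sketch of this two-layer network is shown below; the SimpleRNN width and the input window length are placeholders, since they are not specified above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_rnn_model(window: int, units: int = 50):    # `units` is a placeholder width
    """SimpleRNN layer followed by a Dense output layer with ReLU activation."""
    model = tf.keras.Sequential([
        layers.Input(shape=(window, 1)),
        layers.SimpleRNN(units),
        layers.Dense(1, activation="relu"),            # ReLU output layer as described above
    ])
    model.compile(optimizer="adam", loss="mse")        # MSE training loss assumed
    return model
```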
Day Original Value Denoised Value
1 142.04 142.87
2 141.53 142.51
3 141.05 142.15
4 140.60 141.81
5 140.19 141.47
6 139.79 141.15
7 139.42 140.83
8 139.07 140.53
9 138.74 140.23
10 138.43 138.43
RMSE Values
Original .0389
Denoised .0297
Table 2: RNN comparison of Predicted Values
As we can see in Table 2, our RNN model likewise
forecasts AMZN prices staying the same with a subtle
downward trend over the next 10 days. This would
suggest to investors a good strategy would be to hold
over the next coming days, or potentially sell before a
price decrease occurs. Additionally, we can also see
our model had a lower RMSE value when trained with
denoised data vs original data, which suggests that our
wavelet denoising had a positive impact on the result.
4.4 Support Vector Regression (SVR)
The SVR model is likewise trained once with de-
noised data and once with original data for compar-
ison. The model employs the RBF kernel, which is
defined as such:
$$K(x_i, x_j) = \exp\!\left(-\gamma \, \|x_i - x_j\|^2\right) \qquad (40)$$
This kernel was chosen due to its ability to handle
nonlinear relationships.
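A minimal scikit-learn sketch of such an RBF-kernel SVR is given below; the hyperparameter values (C, epsilon, gamma) are placeholders rather than the tuned values used in this study.

```python
import numpy as np
from sklearn.svm import SVR

def fit_svr(X: np.ndarray, y: np.ndarray):
    """RBF-kernel SVR: C and epsilon correspond to the regularization and
    error-tolerance terms in Eqs. (31)-(32), gamma to the kernel width in Eq. (40)."""
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")  # placeholder values
    model.fit(X, y)          # X: lagged price windows, y: next-day closing prices
    return model
```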
Fig. 8: RNN prediction with original data
Fig. 9: RNN prediction with denoised data
The hyperparameters in this model were carefully chosen to balance model complexity with generalization capability, thereby minimizing overfitting, a
critical consideration for financial time-series predic-
tions. Below, the stock predictions made with the original and denoised data are given, along with their RMSE values, in Fig. 10 and Fig. 11:
Day Original Value Denoised Value
1 142.00 145.62
2 141.76 149.87
3 141.53 152.17
4 141.34 154.45
5 141.16 156.79
6 141.00 159.12
7 140.86 161.40
8 140.73 163.58
9 140.62 165.63
10 140.59 165.70
RMSE Values
Original 3.87
Denoised 2.54
Table 3: SVR comparison of Predicted Stock Val-
ues
As we can see in Table 3, our SVR model fore-
casts a drastic increase in AMZN prices over the next
10 days. This would suggest to investors a good strat-
egy would be to buy before the next 10 days occur, to
capitalize on the “bullish” behavior of the market to
come. Additionally, we can also see our model had a
lower RMSE value when trained with de-noised data
vs original data, which suggests that our wavelet de-
noising had a positive impact on the result.
4.5 Autoregressive Integrated Moving
Average (ARIMA)
For the ARIMA model, the model was created in Python, with the model's parameters (p, d, q) chosen from the Partial Auto-correlation Function (PACF), as seen in Figure 12, and the Auto-correlation Function (ACF), as seen in Figure 13, based on the prior stock data. Lastly, a 10-day forecast was made using both denoised (Figure 15) and original stock data (Figure 14); a sketch of this order-selection step is given below:
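The following is a minimal statsmodels sketch of selecting the order from the ACF/PACF plots and producing the 10-day forecast, assuming a pandas Series `prices` of prior closing prices; differencing before inspecting the plots and the order (1, 1, 1) are placeholders, not the choices actually made here.

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

def select_order_and_forecast(prices, order=(1, 1, 1), horizon=10):
    """Inspect ACF/PACF plots to choose q and p, then fit ARIMA and forecast."""
    diffed = prices.diff().dropna()     # differencing before inspection (assumed step)
    plot_acf(diffed)                    # significant ACF lags suggest q
    plot_pacf(diffed)                   # significant PACF lags suggest p
    plt.show()
    fitted = ARIMA(prices, order=order).fit()
    return fitted.forecast(steps=horizon)
```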
Fig. 10: SVR prediction with original data
Fig. 11: SVR prediction with denoised data
Fig. 12: Auto-correlation Function
Fig. 13: Partial Auto-correlation Function
Day Original Data Denoised Data
1 142.32 139.30
2 142.39 141.37
3 142.47 138.17
4 142.48 139.49
5 142.42 139.97
6 142.39 138.63
7 142.42 139.62
8 142.44 139.34
9 142.45 138.98
10 142.46 139.04
RMSE Values
Original 4.35
Denoised 4.02
Table 4: ARIMA comparison of Predicted Stock
Values
As we can see in Table 4, our ARIMA model fore-
casts AMZN prices staying the same over the next 10
days with minor volatility. This would suggest to in-
vestors a good strategy would be to hold over the next
coming days. Additionally, we can also see our model
had a lower RMSE value when trained with denoised
data vs original data, which suggests that our wavelet
denoising had a positive impact on the result.
4.6 Hybrid Quantum-Classical LSTM Method
For the final model in our ensemble, an LSTM model equipped with a quantum layer, via the PennyLane package in Python, was deployed and trained with our wavelet-denoised stock price data to likewise give a 10-day stock prediction. Using PennyLane, the quantum layer simulates 4 qubits (the qubit being the quantum counterpart of a bit in classical computing). Angle embedding within the quantum circuit was used to map classical data (closing stock prices) to a quantum state, and an entangling layer was used within the circuit to simulate quantum entanglement. The quantum circuit's output was then fed back into the classical neural network, specifically an LSTM model which contains a convolutional layer, 2 LSTM layers, a dropout layer, and 3 dense layers (9 layers in total).
The goal of this model was to leverage the bene-
fits of quantum computing, with the use of wavelets,
to process data in ways that might be more efficient
or effective than classical-only approaches. Quantum
computing can potentially identify patterns in data
that classical algorithms might miss since it has addi-
tional tools that classical methods do not such as su-
perposition and entanglement, [9]. For our purposes,
the qubits we are utilizing can exist in multiple states
simultaneously and can entangle with each other, al-
lowing our model to not only analyze data sets more
efficiently but also evaluate numerous future possibil-
ities for our forecast which classical computing could
not, [10]. These benefits allow quantum methods to
be an ideal candidate as a preprocessing step.
The training of the model using wavelet de-noised
data is as seen below in Figure 16:
Fig. 14: ARIMA prediction with original data
Fig. 15: ARIMA prediction with denoised data
Lastly, the final prediction of our ensemble is
given in Figure 17:
Day Predicted Value
1 126.15
2 125.83
3 125.76
4 125.84
5 125.98
6 126.19
7 126.40
8 126.61
9 126.80
10 126.90
RMSE: 12.82
Table 5: Predicted stock values and RMSE score for
the Hybrid Quantum Classical Model
As we can see from Table 5, the hybrid quantum-
classical model was in line with our other models
as it suggested AMZN stock price would remain un-
changed with little movement. These results paired
with our other models, in addition to the low RMSE
value, show that this is a quality forecast and gives a
glimpse into the power of quantum computing in fi-
nancial forecasts.
5 Conclusion
The results from our newly developed ensemble model demonstrate the power of wavelets, machine learning, and quantum computing in several ways. Firstly, the different models offer different lenses through which the future of AMZN stock can be interpreted.
The majority of the models predict that the price of
AMZN stock over the next 10 days will either stay the
same or mildly decrease, suggesting to a potential in-
vestor that holding may be a good financial move. We
can also see across the board that using the de-noised
data aided in the training of our models, as across the
board RMSE values were lower when models utilized
de-noised data vs raw data, demonstrating the power
of the wavelet de-noising algorithm.
Additionally, the hybrid quantum-classical model
was able to capture trends from prior data well and
give a prediction that was in line with the other mod-
els. This demonstrates the potential for such hybrid
quantum-classical models and shows that there is po-
tential for them in the field. We believe that further re-
search into quantum computing can prove to be fruit-
ful as a new emerging method to make financial fore-
casts.
Overall, the results of this research show that wavelet de-noising is a useful tool in financial forecasting, making our predictions more reliable and efficient. While training each model, we can
see an overall decrease in RMSE values when train-
ing with wavelet denoised data as opposed to origi-
nal data. This means that not only did each model
capture the history and trends of the data better, but
it also gave us more confidence in the quality of the
prediction leading to better financial decisions. This
clearly indicates that wavelets show great promise in
financial forecasting and contribute to making a more
sound financial decision.
Additionally, our ensemble forecast clearly shows
the usefulness of such models when making finan-
cial decisions. Employing multiple methods for
stock forecasting helps to eliminate the aforemen-
tioned “black box” issue. Utilizing multiple tech-
niques yields a more diverse range of results which
can aid in making financial decisions. In our case,
when each model yields similar results about the fu-
ture of a stock price, it can give investors more confi-
dence in their decision.
Fig. 16: Hybrid Quantum Classical LSTM model training
Fig. 17: Hybrid Quantum Classical LSTM model prediction
In the future, we can extend our current methods
by combining them into a hybrid model, as opposed to
separate models. Additionally, there is a large amount
of potential for tweaks to the quantum model such as
using other packages, adjusting the number of qubits,
etc. We believe the results of this research can also be
improved by better selecting a wide variety of stocks
to train our models vs using only one.
Acknowledgment:
Dr. Xiaodi Wang, PhD, Western Connecticut
State University for his time and support
The WCSU Department of Mathematics for
making this research possible
The WCSU Student Government Association
for their constant support for this research
The WCSU Foundation for their contributions
to this research
References:
[1] M. Goldani, “Comparative analysis on
forecasting methods and how to choose a
suitable one: case study in financial time series,”
Journal of Mathematics and Modeling in
Finance, pp. 35–59, 2023.
[2] S. Selvaraj, N. Vijayalakshmi, S. S. Kumar, and
G. D. Kumar, “Maximizing Profit Prediction:
Forecasting Future Trends with LSTM
Algorithm and compared with Loss function and
Mean error code using Python,” Ushus Journal
of Business Management, vol. 22, no. 4, pp.
15–28, 2023.
[3] A. H. Rahimyar, H. Q. Nguyen, and X. Wang,
“Stock forecasting using M-band wavelet-based
SVR and RNN-LSTMs models,” in 2019 2nd
International Conference on Information
Systems and Computer Aided Education
(ICISCAE), 2019, pp. 234–240.
[4] V. H. Shah, “Machine learning techniques for
stock prediction,” Foundations of Machine
Learning| Spring, vol. 1, no. 1, pp. 6–12, 2007.
[5] Y. Yang and J. Wang, “Forecasting wavelet
neural hybrid network with financial ensemble
empirical mode decomposition and MCID
evaluation,” Expert Systems with Applications,
vol. 166, pp. 114097, 2021.
[6] V. Novykov, C. Bilson, A. Gepp, G. Harris, and
B. J. Vanstone, “Deep learning applications in
investment portfolio management: a systematic
literature review,” Journal of Accounting
Literature, 2023.
[7] J. Fattah, L. Ezzine, Z. Aman, H. El Moussami,
and A. Lachhab, “Forecasting of demand using
ARIMA model,” International Journal of
Engineering Business Management, vol. 10, pp.
1847979018808673, 2018.
[8] L. Rubio, A. Palacio Pinedo, A. Mejía Castaño,
and F. Ramos, “Forecasting volatility by using
wavelet transform, ARIMA and GARCH
models,” Eurasian Economic Review, pp. 1–28,
2023.
[9] E. Paquet and F. Soleymani, “QuantumLeap:
Hybrid quantum neural network for financial
predictions,” Expert Systems with Applications,
vol. 195, pp. 116583, 2022.
[10] S. Y.-C. Chen, S. Yoo, and Y.-L. L. Fang,
“Quantum long short-term memory,” in ICASSP
2022-2022 IEEE International Conference on
Acoustics, Speech and Signal Processing
(ICASSP), 2022, pp. 8622–8626.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
Peter Bigica carried out all experiments and
modeling
Xiaodi Wang was responsible for advising and
contributing the wavelet filter banks.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
The Western Connecticut State University Stu-
dent Government Association
The Western Connecticut State University Foun-
dation.
The authors have no conflicts of interest to
declare that are relevant to the content of this
article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International , CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US
Conflict of Interest
Disclosure:
During the preparation of this work, the author(s)
used generative AI to assist in generating initial
drafts and refining the language of sections related to
the explanation of machine learning techniques and
stock forecasting methods. After using this tool, the
author(s) reviewed and edited the content as needed
and take(s) full responsibility for the content of the
publication.