
include maximum pooling and average pooling. In the pooling operation, small deviations in the feature maps are discarded, which improves accuracy while averting the phenomenon of overfitting. If the feature map acquired in the $t$-th convolutional layer is $c^{t} = [c^{t}_{1}, c^{t}_{2}, \ldots, c^{t}_{n}]$, then the maximum value of $c^{t}$ can be obtained by using the maximum pooling strategy as illustrated in Eq. (2).
$p^{t} = \max(c^{t})$  (2)
where $p^{t}$ expresses the pooling outcome of the $t$-th convolutional layer.
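As a rough illustration of Eq. (2), the following sketch applies global max pooling to one convolutional feature map; the array values and the helper name max_pool are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def max_pool(feature_map: np.ndarray) -> float:
    """Global max pooling: keep only the strongest response of a feature map (Eq. (2))."""
    return float(np.max(feature_map))

# Hypothetical feature map produced by the t-th convolutional filter over a text sequence.
c_t = np.array([0.12, 0.87, 0.05, 0.43])
p_t = max_pool(c_t)  # -> 0.87; small deviations in the map are discarded
print(p_t)
```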
2.1.4. Fully Connected Layer. The function of the fully connected layer is to map the values extracted by the pooling layer into a feature vector. Assume that the fully connected layer has $m$ neurons; a text feature vector is created after applying the ReLU activation function, as in Eq. (3):
$F = f(W \cdot P + b)$  (3)
where $f$ denotes the ReLU activation function, $P$ is the output of the learning resource text information on the pooling layer, $b$ is the bias, and $W$ denotes the weight.
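To make Eq. (3) concrete, here is a minimal sketch of the fully connected layer with ReLU; the dimensions and random values are assumptions chosen only for illustration.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def fully_connected(P: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Eq. (3): F = f(W·P + b), with f the ReLU activation."""
    return relu(W @ P + b)

# Hypothetical sizes: 4 pooled values mapped onto m = 3 neurons.
rng = np.random.default_rng(0)
P = rng.normal(size=4)          # pooling-layer output
W = rng.normal(size=(3, 4))     # weight matrix
b = np.zeros(3)                 # bias
F = fully_connected(P, W, b)    # text feature vector of length m
print(F.shape)
```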
2.2. Long- and Short-time Bidirectional Recurrent Neural Network
The neurons of the Long Short-Term Memory (LSTM) artificial neural network accept data only from the neurons of adjacent layers, yet the words both before and after a given word affect its semantic connections. In contrast, the Bidirectional Long Short-Term Memory Recurrent Neural Network (BiLSTM-RNN) consists of two long short-term memory recurrent neural networks with opposite learning directions, which enables a better understanding of the related semantics than the LSTM. The LSTM is composed mainly of four elements: the input gate $i_t$, the forgetting gate $f_t$, the memory unit $c_t$, and the output gate $o_t$. The input gate controls the flow of data into the memory unit; the forgetting gate determines how much of the previous state data is retained in the memory unit, so that the memory state is updated from the current input data; and the output gate determines the output value of the memory unit passed to the next state. The relevant computation procedure is illustrated in Eq. (4) to Eq. (9).
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$  (4)
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$  (5)
$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$  (6)
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$  (7)
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$  (8)
$h_t = o_t \odot \tanh(c_t)$  (9)
where $c_t$ represents the memory cell state, $x_t$ is the data input, $b$ is the bias term, $W$ denotes the weight matrix applied through matrix multiplication, $\odot$ is the element-wise (dot) product operation, and $\sigma$ is the sigmoid function.
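A minimal sketch of one LSTM time step following Eqs. (4)-(9); the dimensions, the random weights, and the gate parameter layout are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step, Eqs. (4)-(9). W and b hold parameters for the four gates."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    i = sigmoid(W["i"] @ z + b["i"])            # input gate, Eq. (4)
    f = sigmoid(W["f"] @ z + b["f"])            # forgetting gate, Eq. (5)
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate memory, Eq. (6)
    c = f * c_prev + i * c_tilde                # memory unit update, Eq. (7)
    o = sigmoid(W["o"] @ z + b["o"])            # output gate, Eq. (8)
    h = o * np.tanh(c)                          # hidden state output, Eq. (9)
    return h, c

# Hypothetical sizes: input dimension 4, hidden dimension 3.
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(3, 7)) for k in "ifco"}
b = {k: np.zeros(3) for k in "ifco"}
h, c = lstm_step(rng.normal(size=4), np.zeros(3), np.zeros(3), W, b)
print(h.shape, c.shape)
```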
The BiLSTM-RNN model consists of a forward LSTM and a backward LSTM whose learned features are denoted by $\overrightarrow{h_{t}}$ and $\overleftarrow{h_{t}}$, respectively. Eq. (10) expresses the time-dimensional feature $T$, which forms the final representation of the LSTM model structure.
$T = \overrightarrow{h_{t}} \oplus \overleftarrow{h_{t}}$  (10)
where $\oplus$ denotes the concatenation operator. This operation enables the BiLSTM model to fully exploit the contextual data of the input words.
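A small sketch of the concatenation in Eq. (10); the hidden-state values and dimensions are hypothetical, standing in for the outputs of the forward and backward LSTMs at one time step.

```python
import numpy as np

def bilstm_feature(h_forward: np.ndarray, h_backward: np.ndarray) -> np.ndarray:
    """Eq. (10): concatenate the forward and backward hidden states into T."""
    return np.concatenate([h_forward, h_backward])

# Hypothetical hidden states at time t from the forward and backward LSTMs (dim 3 each).
h_fwd = np.array([0.2, -0.1, 0.5])
h_bwd = np.array([0.7, 0.0, -0.3])
T = bilstm_feature(h_fwd, h_bwd)   # time-dimensional feature of length 6
print(T)
```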
2.3 Multiscale Feature Fusion
The spatial-dimensional feature $F$ and the time-dimensional feature $T$ can be merged as depicted in Fig. 3, where a multiscale feature synthesis attention process is used to achieve this goal.
Fig. 3. Multiscale feature synthesis attention process.
Firstly, the matching matrices $F_A$ and $F_B$, which represent the matching between the dimensional features denoted by $F_i$ and the attribute features denoted by $T_i$, are determined as expressed in Eq. (11).
$F_A = F_i \times T_i^{\mathrm{T}}$ & $F_B = T_i \times F_i^{\mathrm{T}}$  (11)
Secondly, the SoftMax function is used to find the attention distribution weights $w_1$ and $w_2$ of the matching matrices. Then, the attention representation matrices $F'_i$ and $T'_i$ are calculated by multiplying the weights $w_1$ and $w_2$ with the individual scale features, as expressed in Eq. (12).
$F'_i = w_1 \times F_i$ & $T'_i = w_2 \times T_i$  (12)
where $\times$ denotes matrix multiplication.
At last, the inter-scale mutual attention matrices $F_1$ and $F_2$ are calculated by using a multiplicative gating process that multiplies each attention representation with the other single-scale feature element by element, as expressed in Eq. (13).
$F_1 = F'_i \odot T_i$ & $F_2 = T'_i \odot F_i$  (13)
where $\odot$ denotes element-wise multiplication.
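The following sketch strings Eqs. (11)-(13) together for small hypothetical matrices; the exact equation forms, the softmax axis, and the feature shapes are assumptions reconstructed from the description above, not taken verbatim from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical single-scale features: spatial F_i and temporal T_i, both (n tokens x d dims).
rng = np.random.default_rng(2)
F_i = rng.normal(size=(5, 4))
T_i = rng.normal(size=(5, 4))

# Eq. (11): matching matrices between the two scales (assumed cross-correlation form).
F_A = F_i @ T_i.T
F_B = T_i @ F_i.T

# Eq. (12): softmax attention weights, then the attention representation of each scale.
w1 = softmax(F_A, axis=-1)
w2 = softmax(F_B, axis=-1)
F_prime = w1 @ F_i
T_prime = w2 @ T_i

# Eq. (13): multiplicative gating with the other scale's feature (element-wise).
F_1 = F_prime * T_i
F_2 = T_prime * F_i
print(F_1.shape, F_2.shape)
```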