Research on Chinese Emotion Classification using BERT-RCNN-ATT
FENG LI, YINTONG HUO, LINGLING WANG*
School of Management Science and Engineering,
Anhui University of Finance and Economics,
Bengbu 233030,
CHINA
Abstract: - Emotion classification is the process of analyzing and reasoning about subjective texts with emotional color, that is, determining whether their emotional tendency is positive or negative. Existing Chinese short-text emotion classification algorithms face massive data and nonstandard words; the traditional BERT model does not clearly distinguish the semantics of words in identical sentence patterns, and its multi-layer Transformer training is slow, time-consuming, and energy-intensive. To address these problems, this paper proposes classifying users' emotions with a BERT-RCNN-ATT model, extracting deep text features with an RCNN combined with an attention mechanism, and using multi-task learning to improve classification accuracy and generalization ability. The experimental results show that the proposed model understands and conveys semantic information more accurately than traditional models. Compared with the traditional CNN, LSTM, and GRU models, the accuracy of text emotion recognition improves by at least 4.558%, the recall rate by more than 5.69%, and the F1 value by more than 5.324%, which is conducive to the sustainable development of emotion intelligence that combines Chinese emotion classification with AI technology.
Key-Words: Online; Comment Text; LSTM; Sentiment Analysis.
Received: May 18, 2022. Revised: January 9, 2023. Accepted: February 15, 2023. Published: March 17, 2023.
1 Introduction
According to the 49th Statistical Report on Internet
Development in China [1] released by China
Internet Network Information Center (CNNIC) in
Beijing on February 25, 2022, as of December 2021,
the number of Internet users in China has reached
1.032 billion, an increase of 42.96 million over
December 2020, and the Internet penetration rate
has reached 73.0%. The scale of Internet users in China has grown steadily, and the Internet has become as much a part of our lives as food, clothing, housing, and transportation. For the massive amount of text information on the network, how to automatically and efficiently analyze these comments and the emotions they contain has become a
focus of attention [2]. Research on natural language processing (NLP) arose to meet this need, but it still faces a series of difficulties and challenges. Information technology is driving a paradigm shift in communication science, increasing the discipline's dependence on text data mining technology [3].
Emotion classification is an important branch of NLP and is widely applied, for example in automated customer service and emotional soothing, screening of depressive patients, and psychological research assisting criminal investigation [4].
Traditional emotion classification research is
mainly based on emotion dictionary and machine
learning. Early text emotion analysis work usually
focused on building an emotion dictionary,
establishing a direct mapping relationship between
the dictionary and emotion, and then using statistical
methods to extract features for analysis [5]. Because such approaches extract text information only shallowly, neural networks were proposed as a way to realize machine learning. As neural networks matured, researchers turned to deep learning and proposed "word vectors" to alleviate data sparsity in high-dimensional space and to incorporate additional features [6]. Pang et al. (2002) [7] were the first to apply machine learning methods to sentiment orientation classification; their experiments showed that unigram features combined with Naive Bayes and SVM classifiers achieved good results.
Deep learning is regarded as a new research field within machine learning and has received increasing attention in recent years. Zhao et al. described the present challenges and future opportunities of multi-modal emotion recognition based on
deep learning [8]. Devlin et al. pointed out in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" that BERT is conceptually simple and empirically powerful, and it obtained new state-of-the-art results on eleven natural language processing tasks [9].
In October 2018, the Google AI Research Institute proposed the BERT (Bidirectional Encoder Representations from Transformers) pre-training model [10], which differs from traditional emotion classification techniques and has achieved state-of-the-art results on many popular NLP tasks. The BERT model not only draws on the bidirectional encoding idea of LSTM-based models, but also uses the Transformer employed in GPT for feature extraction; it therefore has a strong ability to extract text features and can learn the latent syntactic and semantic information in sentences. Bai Qingchun et al. devised a position-gated recurrent neural network that dynamically integrates sentence-level global and local information to achieve aspect-based text emotion classification [11]. Duan et al. proposed a Chinese short-text classification algorithm based on the Bidirectional Encoder Representations from Transformers (BERT) [12]. To address poor multimodal fusion and the inability to fully exploit the key emotional information in specific time periods and from multiple views, a research team at the Chengdu University of Information Engineering put forward a temporal multimodal emotion classification model based on multi-view learning [13]. Suzhou University proposed a few-shot emotion classification method based on knowledge distillation from large and small teacher models, which reduces the frequency of querying the large teacher model, shortens the distillation time when training student models, lowers resource consumption, and improves classification accuracy [14]. To improve existing deep-learning-based Chinese comment emotion classification methods and raise their accuracy and efficiency, Fan Anmin et al. improved the traditional BERT model using the TensorFlow framework; on the NLPCC2014 [15] dataset, their model outperformed BERT by 1.30%, 0.54%, 2.32%, and 1.44% on the respective metrics. The research shows that this model performs well in classifying the emotions of Chinese comments and is better than previous deep learning network models [16].
On this basis, to further handle the "emotional phenomena" of subjectivity, emotion, mood, attitude, and feeling in text [17], Lv Xueqiang, Peng Chen, et al. proposed a multi-label text classification (MLTC) method based on the TLA-BERT model, which integrates BERT with label semantic attention; unlike multi-class text classification, multi-label classification can refine the text's focus from the perspectives of multiple labels [18]. Zheng Yangyu and Jiang Hongwei fully capture the emotional information implied in the context by using a local-context and gated convolutional network model [19]. Reference [20] proposed a multi-channel emotion classification method that fuses part-of-speech and word-position feature vectors with attention mechanisms and achieved high accuracy on a crawled microblog dataset. Reference [21] added an attention mechanism to a multi-channel CNN and BiGRU, and its classification effect is better than that of a single-channel network model. However, the word vectors used in the above studies are static and cannot represent rich emotional semantic information.
This paper analyzes users' Chinese emotions through an emotion classification technique based on the BERT-RCNN-ATT model, drawing inspiration from research on news text classification with an improved BERT-CNN model [22] and on medical information classification with a BERT-ATT-BiLSTM model [23]. Applying these techniques, studying the BERT model in combination with the Transformer, and collecting suitable datasets to classify users' Chinese emotions helps improve existing deep-learning-based Chinese comment emotion classification methods and raises their accuracy and efficiency. The BERT model absorbs the design ideas of unsupervised models such as the autoencoder and word2vec, exploits information such as within-sentence (unordered) relations and sentence-to-sentence relations, and proposes new unsupervised objective functions for the Transformer. For this contribution, BERT deserves to be called the first pre-trained language representation model to capture the bidirectional relationships in text.
2 Related Work
Among the methods for studying Chinese emotion
analysis, there are currently three categories:
methods based on emotion dictionaries [24], methods based on machine learning [25], and methods based on deep learning [26]. Dictionary-based methods require building an emotion dictionary, and classification depends mainly on the quality and size of that dictionary; however, building a complete emotion dictionary is difficult, and keeping it up to date requires considerable manpower and financial resources [27]. Machine-learning-based methods require extensive manual annotation: machine learning models are trained on labeled data, and the trained classifier is then used to analyze the emotional orientation of the text. Early emotion classification mainly relied on manually formulated rules. Words were represented as one-hot vectors, but this representation is high-dimensional and highly redundant. To improve word representation, the neural-network-based pre-training models Word2Vec [28] and GloVe [29] were proposed; through training on large corpora, they map text into low-dimensional vectors and extract features automatically. FastText [30-31] adds n-gram features; compared with Word2Vec, its input is the context information of the whole sentence.
This paper uses the BERT-RCNN-Att model to study Chinese emotion classification. The BERT model is a pre-training model built on the Transformer's bidirectional encoding, so every word is predicted bidirectionally with respect to the whole semantics; it can fully extract the emotional information in texts, and, integrated with the attention mechanism, it performs better in emotion classification. ALBERT is a pre-trained language model improved on the basis of BERT; compared with BERT, it reduces the number of parameters and also improves running speed [32]. The ALBERT model decomposes the input vector into a low-dimensional matrix that is transferred to the hidden layer through vector mapping; this factorization significantly reduces the number of parameters used when converting the input text [33]. The model also realizes parameter sharing: in ALBERT, the Transformer shares parameters across layers, which increases the depth of the model while reducing the number of parameters, markedly speeding up training and reducing memory consumption. Whereas BERT learns the correlation between sentences through the next sentence prediction (NSP) task, ALBERT proposed sentence order prediction (SOP) to replace NSP, improving both accuracy and efficiency [34]. Hu Shengli et al. [35] used an ALBERT-CNN model to analyze takeout comments: ALBERT first extracts the global features of the text vector, so that the same word can be distinguished by meaning in different contexts, and CNN then extracts the local feature information of the text. Their experiments show that the model reaches an accuracy of 91.3%, proving its effectiveness. Because the CNN model must set the length of context dependence through the window size, while the RNN model cannot retain long-term memory, the RCNN (recurrent convolutional neural network) model was introduced for emotion classification [36]. The RCNN model replaces the convolution layer of a traditional convolutional neural network with a recurrent convolution layer; it combines the advantages of CNN and RNN, makes uniform use of the contextual information of words, and achieves better performance. Li Yuechen et al. [37] compared experimental data and found that when the original data are scarce, the BERT-RCNN model has stronger semantic feature extraction ability than traditional models.
In text analysis, an RCNN combined with attention can link the learned representation of each word to the words needed for prediction, thereby obtaining information. Its main function is to focus on the most critical information among many signals and to mine deeper semantic features. Zeng Ziming et al. [38] proposed a model integrating two-level attention to improve sentiment analysis performance: they used BiLSTM and two-level attention to extract sentence-level features and the feature weight distribution of each level, and finally obtained the emotional classification of the text, proving that the model achieves good results. This paper uses BERT, the RCNN model, and the attention mechanism to construct a BERT-RCNN-Att model for Chinese emotion classification, which offers advantages over other models.
3 Methodology
The overall architecture of the model proposed in this paper is shown in Fig. 1.
3.1 Word Embedding
The BERT pre-training model consists of an input layer, an encoding layer, and an output layer. Google has provided two BERT models: the base model, with 12 Transformer layers, 12
attention heads, 768 hidden units, and 110 million parameters, and the large model, with 24 Transformer layers, 16 attention heads, 1024 hidden units, and 340 million parameters [39].
During the pre-training of the BERT model, there are two pre-training tasks: Task 1, masked language modeling, and Task 2, next sentence prediction [40], that is, predicting the following sentence. The embedding of the BERT model is the sum of the word (token) vector, the position vector, and the sentence (segment) feature vector, which preserves the correct order of words in the text and provides sentence-level representation ability, thereby enriching the vector representation and facilitating downstream tasks.
(1) Word vector: the input text is converted into real-valued vectors through the word vector matrix. Suppose the one-hot code corresponding to the input sequence $x$ is $e_t \in \mathbb{R}^{N \times |m|}$; then the corresponding word vector is
$$V_t = e_t W_t$$
where $W_t \in \mathbb{R}^{|m| \times e}$ is the trainable word vector matrix, $|m|$ is the vocabulary size, and $e$ is the dimension of the word vector.
(2) Block vector: its code is the block number of the current word, starting from 0. If the input sequence is a single block (single-sentence text classification), the block code of every word is 0; if the input consists of two blocks (sentence-pair classification), each word in the first sentence has block code 0 and each word in the second sentence has block code 1, while the [CLS] at the start and the [SEP] at the end both correspond to code 0. A trainable block vector matrix $W_s \in \mathbb{R}^{|s| \times e}$ is used ($|s|$ is the number of blocks and $e$ is the dimension of the block vector). The block code $e_s \in \mathbb{R}^{N \times |s|}$ is converted into a real-valued vector to obtain the block vector
$$V_s = e_s W_s$$
Fig. 1: Architecture of BERT-RCNN-ATT model
(The figure depicts the pipeline: input text → normalization → BERT embedding and Transformer encoder → RCNN layer over the encoder outputs → attention layer → softmax → emotion category output.)
Fig. 2: Word embedding graph
(3) Position vector: the position vector encodes the absolute position of each word. Each word in the input sequence is converted into a one-hot position code according to its index, and the position vector matrix then converts this one-hot code into a real-valued vector to obtain the position vector
$$V_p = e_p W_p$$
where $W_p \in \mathbb{R}^{N \times e}$, $N$ is the maximum sequence length, $e$ is the dimension of the position vector, $e_p$ is the one-hot position code, and $V_p$ is the position vector.
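To make the composition of these three vectors concrete, the following PyTorch sketch (illustrative only, not the authors' code; the class name, the vocabulary size of 21128, and the trailing LayerNorm are assumptions) sums trainable token, segment, and position embeddings as described above:

```python
import torch
import torch.nn as nn

class BertStyleEmbedding(nn.Module):
    """Sums token, segment (block), and position embeddings, as described above."""
    def __init__(self, vocab_size, num_segments=2, max_len=512, dim=768):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)       # V_t = e_t W_t
        self.segment_emb = nn.Embedding(num_segments, dim)   # V_s = e_s W_s
        self.position_emb = nn.Embedding(max_len, dim)       # V_p = e_p W_p
        self.norm = nn.LayerNorm(dim)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        e = self.token_emb(token_ids) + self.segment_emb(segment_ids) + self.position_emb(positions)
        return self.norm(e)

# usage: a batch of 2 sequences of 6 tokens, all from segment 0
emb = BertStyleEmbedding(vocab_size=21128)
tokens = torch.randint(0, 21128, (2, 6))
segments = torch.zeros(2, 6, dtype=torch.long)
print(emb(tokens, segments).shape)   # torch.Size([2, 6, 768])
```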
3.2 Transformer Bidirectional Prediction
BERT uses only the encoder part of the Transformer, whose structure is shown in the figure. This part is a stack of Transformer encoders; because each encoder contains two residual connections, the model's performance does not deteriorate as encoders are stacked. Under the action of multiple encoders, the semantic information of the sentence can be fully captured and then passed to downstream tasks. Since the self-attention mechanism cannot model the position information of the input sequence, while position information reflects the logical structure of the sequence and plays a vital role in the computation, positional encoding is added at the input layer [41].
(1) Word vector and positional encoding: since the Transformer model has no recurrent (iterative) operation as in a recurrent neural network, the position of each word must be supplied so that the Transformer can identify the order relations in the language. First, define the dimensions of the input as [batch_size, sequence_length, embedding_dimension], where sequence_length is the length of a sentence, i.e., the number of tokens it contains, and embedding_dimension is the dimension of each word vector. The positional encoding is computed as
$$PE_{(pos,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)$$
$$PE_{(pos,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)$$
where $pos$ is the position of the token and $d_{model}$ is the embedding dimension.
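As a small illustration of the sinusoidal encoding above (a sketch under the assumption that the standard Transformer formulation is intended; the function name and shapes are chosen for the example):

```python
import math
import torch

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Returns a (seq_len, d_model) matrix following the PE formulas above."""
    position = torch.arange(seq_len).unsqueeze(1).float()   # pos
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sin
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cos
    return pe

pe = sinusoidal_position_encoding(seq_len=128, d_model=768)
print(pe.shape)   # torch.Size([128, 768])
```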
(2) Self-attention mechanism: first, we input a sequence $x_i$, where each $x_i$ can be regarded as one word; multiplying $x_i$ by an embedding matrix $W$ gives the embedded input $a_i$. Each $a_i$ is associated with three matrices: the query matrix (used to query other words), the key matrix (queried by other words), and the value matrix (representing the information to be extracted). $Q$, $K$, and $V$ are obtained by multiplying $a_i$ with these three matrices, respectively. Finally, each query is dot-multiplied with every key to obtain the attention:
$$Attention(Q,K,V) = softmax\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
(3) Residual connection and layer normalization: in the previous step we obtained the value matrix weighted by the attention matrix, Attention(Q, K, V), and reshaped it to match the dimensions of $X_{embedding}$, namely [batch_size, sequence_length, embedding_dimension]. Because the dimensions are consistent, the two are added element-wise to form the residual connection:
$$X_{attention} = X_{embedding} + Attention(Q,K,V)$$
In subsequent operations, each module adds its input and output to form a residual connection, so that during training the gradient can be back-propagated directly to the initial layers through this shortcut:
$$X = LayerNorm\big(X + SubLayer(X)\big)$$
The output of the BERT model includes character-level vectors and a sentence-level vector. This paper uses the weighted sentence-level vector as the semantic feature. Compared with traditional text representation methods, this reduces the steps of feature extraction and feature vector concatenation and therefore has certain advantages.
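For illustration, one way to obtain both the character-level vectors and the sentence-level vector with the Hugging Face transformers library is sketched below; the bert-base-chinese checkpoint and the example sentence are assumptions, not details taken from the paper:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
model = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("这家酒店的服务非常好", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

token_vectors = outputs.last_hidden_state   # character-level vectors, (1, seq_len, 768)
sentence_vector = outputs.pooler_output     # sentence-level [CLS] vector, (1, 768)
print(token_vectors.shape, sentence_vector.shape)
```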
Fig. 3: RCNN structure diagram
(The figure depicts, for each word, the left context, word embedding, and right context combined in the recurrent structure, followed by the max-pooling layer and the output.)
3.3 RCNN Model
The RCNN model consists of six parts: the input layer, convolution layer, concatenation layer, pooling layer, fully connected layer, and output layer. In the convolution layer, multiple convolution kernels scan the input matrix and the convolution operation is carried out, and the features of each kernel are obtained from the detected patterns. In the CNN part, the key link is the pooling layer: to condense the feature vectors produced by the convolution, the pooling layer extracts the important features from them, producing a matrix of fixed size. Finally, the data are passed to the fully connected layer for processing and classification. The model structure is shown in Fig. 3.
This paper uses the RCNN to deeply extract local features: the sequence features (E1, E2, ..., En) output by the last layer of BERT are weighted and used as the word embeddings of the convolution operation, and the local features of each entity in the feature sequence are extracted. The calculation of the convolution features is as follows:
(1) Convolution layer: if a sentence has $n$ words and each word vector has length $m$, the input matrix is $n \times m$, analogous to a single-channel "image". One-dimensional convolutions are then applied with $k$ different kernel sizes (region_size); let the number of kernels of each size (filters) be $t$. The width of each kernel equals the word-vector dimension $m$, and the height $h$ is a hyperparameter, giving a total of $k \times t$ feature maps [42]. By combining the information of the left and right context words in the recurrent structure, the context vectors and the embedding vectors of the words are concatenated and the latent semantic feature vector of each word is calculated.
The calculation formulas are as follows, where $[\,;\,]$ denotes row-wise concatenation, $f$ is a nonlinear activation function, and $b$ denotes the bias term:
$$c_l(w_i) = f\big(W^{(l)} c_l(w_{i-1}) + W^{(sl)} E(w_{i-1})\big)$$
$$c_r(w_i) = f\big(W^{(r)} c_r(w_{i+1}) + W^{(sr)} E(w_{i+1})\big)$$
where $c_l(w_i)$ is the left context of word $w_i$, $c_r(w_i)$ is its right context, $E(w_i)$ is its embedding vector, and $W^{(l)}$, $W^{(r)}$, $W^{(sl)}$, $W^{(sr)}$ are weight matrices. The left context is propagated from the left context $c_l(w_{i-1})$ of the previous word combined with the semantics of that word's embedding $E(w_{i-1})$, and the right context is propagated analogously from the following word. The representation of each word and its latent semantic vector are then
$$x_i = \big[c_l(w_i);\, E(w_i);\, c_r(w_i)\big]$$
$$y_i^{(2)} = f\big(W^{(2)} x_i + b^{(2)}\big)$$
Each row of the resulting matrix represents the extraction results of the $t$ convolution kernels at the same position in the sentence matrix; since the results of all $t$ kernels are collected, the row vector $v_i$ in $S$ represents all the convolution features extracted at that position of the sentence.
(2) Pooling layer: feature maps produced by kernels of different sizes have different lengths. A pooling function is applied to each feature map so that their dimensions become the same, and the results are concatenated into the final $k \times t$-dimensional vector. In this experiment, max-pooling is used to take the maximum value of each convolved column vector; after pooling we obtain a row vector of dimension num_filters, that is, the maximum values of all convolution kernels are concatenated, which eliminates differences in sentence length:
$$y^{(3)} = \max_{i=1,\dots,n} y_i^{(2)}$$
where the maximum is taken element-wise over the positions $i$.
(3) Output layer: the most representative key features of the text, obtained from the max-pooling layer above, are fed into the fully connected layer, and the
classification result is finally obtained through the softmax function [43]:
$$y^{(4)} = W^{(4)} y^{(3)} + b^{(4)}$$
$$p_i = \frac{\exp\big(y_i^{(4)}\big)}{\sum_{k} \exp\big(y_k^{(4)}\big)}$$
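A compact PyTorch sketch of the RCNN head described by the formulas above is given below. It is one common realization, not the authors' code: a bidirectional GRU supplies the left and right contexts c_l and c_r, which are concatenated with the word embeddings, passed through a nonlinearity, max-pooled over positions, and classified with softmax; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class RCNNHead(nn.Module):
    """Sketch of the RCNN block above: [c_l ; E ; c_r] -> y^(2) -> max-pool -> softmax."""
    def __init__(self, emb_dim=768, ctx_dim=256, hidden_dim=256, num_classes=2):
        super().__init__()
        self.ctx_dim = ctx_dim
        # a bidirectional GRU supplies c_l(w_i) (forward state) and c_r(w_i) (backward state)
        self.context = nn.GRU(emb_dim, ctx_dim, bidirectional=True, batch_first=True)
        self.hidden = nn.Linear(emb_dim + 2 * ctx_dim, hidden_dim)  # y_i^(2) = f(W^(2) x_i + b^(2))
        self.classifier = nn.Linear(hidden_dim, num_classes)        # y^(4) = W^(4) y^(3) + b^(4)

    def forward(self, E):                         # E: (batch, n, emb_dim), e.g. BERT outputs
        ctx, _ = self.context(E)                  # (batch, n, 2*ctx_dim)
        c_l, c_r = ctx[:, :, :self.ctx_dim], ctx[:, :, self.ctx_dim:]
        x = torch.cat([c_l, E, c_r], dim=-1)      # x_i = [c_l(w_i); E(w_i); c_r(w_i)]
        y2 = torch.tanh(self.hidden(x))           # latent semantic vectors y_i^(2)
        y3, _ = torch.max(y2, dim=1)              # element-wise max-pooling over positions -> y^(3)
        return torch.softmax(self.classifier(y3), dim=-1)  # class probabilities p_i

head = RCNNHead()
E = torch.randn(2, 32, 768)                       # a batch of 2 sequences of 32 BERT output vectors
print(head(E).shape)                              # torch.Size([2, 2])
```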
3.4 Attention
The attention mechanism helps the model assign different weights to each part of the input $X_i$, extract the more critical information, and make more accurate judgments on the polarity in emotion classification, while adding little overhead to computation and storage; this is why this paper uses the attention mechanism several times. The model mainly uses the attention mechanism to fuse the emotional feature vector $h$ of the text with the directional feature vector $b$ of the political entity and to compute their attention scores $a$. The attention mechanism originates from the fact that humans selectively focus on the key parts of all available information while ignoring the rest. The training input of the neural network is composed of $Q$, $K$, and $V$: the elements of the Source can be imagined as a series of <Key, Value> pairs. For a given Query element in the Target, the weight coefficient of each Key's corresponding Value is obtained by computing the similarity or correlation between the Query and that Key, and the Values are then weighted and summed to obtain the final attention value. Essentially, therefore, the attention mechanism is a weighted sum of the Values of the elements in the Source, with Query and Key used to compute the weight coefficients of the corresponding Values. Its essence can be written as the following formula:
$$Attention(Query, Source) = \sum_{i=1}^{L_x} Similarity(Query, Key_i) \cdot Value_i$$
In this paper, the output $H_t$ produced by the RCNN's deep extraction of text context information is used as the input of the attention layer; the model structure is shown in Figure 4. Suppose the word vectors $x_1, x_2, \dots, x_n$ are learned to derive a context vector $g_i$ that focuses on specific important words. When predicting the sentence category, the mechanism should attend to the important words in the sentence, weighting and combining words with different weights:
$$g_i = \sum_{j=1}^{n} \alpha_{i,j}\, x_j$$
where $\alpha_{i,j}$ is called the attention weight, with $\alpha_{i,j} \ge 0$ and $\sum_{j} \alpha_{i,j} = 1$, which is realized through softmax normalization. The formula describing the attention mechanism is as follows:
Fig. 4: Experiment dataset
$$\alpha_i = \frac{\exp\big(score(x_i, y_i)\big)}{\sum_{j} \exp\big(score(x_j, y_j)\big)}$$
$$score(x_i, y_i) = v^{T}\tanh\big(W[x_i; y_i]\big)$$
The score value is calculated from the RCNN output and is used to model the correlation of words: an $x$ with a larger score carries more weight in the context.
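The attention layer above can be sketched as additive attention pooling over the RCNN outputs H_t (an illustrative sketch; the dimension of 256 and the module name are assumptions):

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Additive attention over RCNN outputs, matching the alpha/score formulas above."""
    def __init__(self, dim=256):
        super().__init__()
        self.W = nn.Linear(dim, dim)             # projection inside tanh
        self.v = nn.Linear(dim, 1, bias=False)   # scoring vector v^T

    def forward(self, H):                        # H: (batch, n, dim)
        scores = self.v(torch.tanh(self.W(H)))   # (batch, n, 1)
        alpha = torch.softmax(scores, dim=1)     # attention weights, sum to 1 over positions
        g = (alpha * H).sum(dim=1)               # weighted combination g = sum_j alpha_j h_j
        return g, alpha.squeeze(-1)

pool = AttentionPooling(dim=256)
H = torch.randn(2, 32, 256)
g, alpha = pool(H)
print(g.shape, alpha.shape)   # torch.Size([2, 256]) torch.Size([2, 32])
```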
4 Experimental Evaluation
4.1 Dataset
This experiment uses review texts about hotels, takeout, microblogs, and other user comments: more than 7,000 hotel reviews (over 5,000 positive and over 2,000 negative); more than 4,000 positive and 8,000 negative user reviews collected from a takeout platform; and more than 100,000 Sina Weibo posts with emotion annotations, roughly 50,000 positive and 50,000 negative.
Each dataset has two columns: a review column, which is the model input x, and a label column, which is the target y. Since the labels take two values, the task is binary classification.
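As a minimal sketch of reading such a two-column dataset (the CSV file name follows the dataset name in Table 1, and the column names review and label are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# assumed file layout: one CSV per corpus with columns "review" and "label" (1 = positive, 0 = negative)
df = pd.read_csv("ChnSentiCorp_htl_all.csv")
texts, labels = df["review"].astype(str).tolist(), df["label"].tolist()

# hold out a test set for the evaluation in Section 4.2
train_x, test_x, train_y, test_y = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)
print(len(train_x), len(test_x))
```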
4.2 Evaluation Criteria
To evaluate the proposed method, this paper uses four indicators: accuracy (Acc), precision (P), recall (R), and the F1 value (F-score). Accuracy is the proportion of correctly predicted positive and negative comment samples among all samples. Precision is the proportion of correctly classified negative comments among all samples predicted as negative; from the perspective of the prediction results, it shows how many of the samples predicted for a class truly belong to it. Recall is the proportion of correctly classified negative comments among all samples that are truly negative; from the perspective of the original samples, it describes how many true instances are correctly predicted [44]. To compare different algorithms, the F1 value is defined on the basis of precision and recall to evaluate them jointly.
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
Here, TP denotes comments that are actually negative and identified by the model as negative, FP denotes comments that are actually positive but identified as negative, FN denotes comments that are actually negative but identified as positive, and TN denotes comments that are actually positive and identified as positive.
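For reference, these four indicators can be computed with scikit-learn as follows, treating the negative-comment class as the positive label, consistent with the definitions above (the toy labels are made up for illustration):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# toy labels: 1 = negative comment (the "positive" class above), 0 = positive comment
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Acc      :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, pos_label=1))
print("Recall   :", recall_score(y_true, y_pred, pos_label=1))
print("F1       :", f1_score(y_true, y_pred, pos_label=1))
```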
4.3 Implementation Process
First, the baseline CNN, LSTM, and GRU models are trained: the review and label columns are read, the text is segmented with jieba, and stop words and punctuation are removed; word2vec is trained to build the vocabulary and the embedding matrix. The models to be compared are then built and initialized. During training, each sample is fed into the model to obtain its output, the cross-entropy with the label is calculated to obtain the loss, and the parameters are updated through gradient back-propagation.
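An illustrative sketch of this preprocessing step with jieba and gensim (the stop-word set shown is a placeholder, not the authors' list, and train_x refers to the loading sketch in Section 4.1):

```python
import jieba
from gensim.models import Word2Vec

stopwords = {"的", "了", "，", "。"}   # illustrative stop-word set, not the authors' list

def tokenize(text: str):
    """jieba segmentation with stop words and punctuation removed."""
    return [w for w in jieba.lcut(text) if w.strip() and w not in stopwords]

corpus = [tokenize(t) for t in train_x]            # train_x from the loading sketch above
w2v = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=2, workers=4)
print(len(w2v.wv), w2v.wv.vector_size)             # vocabulary size and embedding dimension
```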
Second, the environment for the BERT model was configured as follows: GPU NVIDIA RTX A4000 (24 GB); CPU E5-2680 v4; CUDA v11.2; PyTorch v1.10.
The existing pre-trained BERT model is then fine-tuned. The fine-tuning process is as follows:
1. Read the data, i.e., the review column and the label column.
2. Initialize the tokenizer; BERT (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding) uses WordPiece segmentation.
3. Tokenize all reviews and convert them into input_ids and attention_mask.
4. Create a dataset that combines input_ids and attention_mask, and build a data-reading iterator with a DataLoader.
5. Load the BERT model and stack a fully connected classification layer on top.
6. Train the model: feed each batch of input_ids and attention_mask into the model to obtain the output, compute the cross-entropy with the labels to obtain the loss, and finally perform gradient back-propagation and update the parameters.
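A condensed sketch of steps 2-6 using the Hugging Face transformers library (the bert-base-chinese checkpoint, batch size, learning rate, and maximum length are assumptions; train_x and train_y refer to the loading sketch in Section 4.1):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")                                        # step 2
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2).to(device)  # step 5

enc = tokenizer(train_x, padding=True, truncation=True, max_length=128, return_tensors="pt")         # step 3
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_y))              # step 4
loader = DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for input_ids, attention_mask, labels in loader:                                                     # step 6
    optimizer.zero_grad()
    out = model(input_ids=input_ids.to(device),
                attention_mask=attention_mask.to(device),
                labels=labels.to(device))
    out.loss.backward()     # cross-entropy is computed internally when labels are given
    optimizer.step()
```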
4.4 Results and Discussion
The model is first trained on the training set and then evaluated on the test set. As shown in the loss curve of this model on the training set (Fig. 5), the loss keeps decreasing and the model gradually converges.
Fig. 5: Loss function
To evaluate the proposed model more objectively, it is compared with the previous traditional models on three different types of datasets, and the evaluation metrics are then analyzed. First, observe the loss curves. Because of problems such as excessive model complexity, noisy samples, or inconsistent feature distributions between the training and test sets, the loss on the validation set may follow a "first decreasing, then slightly increasing" trend, which indicates a risk of overfitting; therefore, the dynamics of accuracy and loss should be observed jointly.
Fig. 6: Loss diagram of CNN model
Fig. 7: Loss diagram of GRU
Fig. 8: Loss diagram of LSTM
Table 1. Experiment results on ChnSentiCorp_htl_all

Model           Accuracy   Precision  Recall     F1
BERT-RCNN-Att   0.93834    0.93799    0.93815    0.93807
CNN             0.79047    0.78705    0.71168    0.72960
GRU             0.86389    0.84379    0.84638    0.84506
LSTM            0.87334    0.86016    0.84607    0.85246
Table 2. Experiment results on waimai_10k

Model           Accuracy   Precision  Recall     F1
BERT-RCNN-Att   0.91271    0.90016    0.90255    0.90134
CNN             0.86117    0.84250    0.84082    0.84165
GRU             0.86788    0.85900    0.83407    0.84454
LSTM            0.86620    0.85338    0.83703    0.84429

Table 3. Experiment results on weibo_senti_100k

Model           Accuracy   Precision  Recall     F1
BERT-RCNN-Att   0.96191    0.96248    0.96188    0.96191
CNN             0.92161    0.92235    0.92168    0.92159
GRU             0.93460    0.93473    0.93463    0.93460
LSTM            0.92322    0.92341    0.92326    0.92322
It can be seen from the figures that, as the network becomes more complex, the LSTM model involves more computation and converges more slowly during training; on the whole, however, while the training-set loss keeps decreasing and its accuracy keeps improving, the validation-set loss also decreases and its accuracy rises. The training set is used to train the model and evaluate the training effect, and the test set is used to evaluate the model's accuracy. To guard against chance results, this experiment iterates the RCNN model 40 times to obtain the various experimental results and evaluation values; each iteration produces multiple models, whose efficiency is evaluated separately and then tested on the test set after optimization. The accuracy on the initial test set is about 85%, and after multiple iterations the model stabilizes above 90%.
The common evaluation indicators are accuracy, precision, recall, and F1 score; for the three different types of datasets, the comparison results on the test sets are given in Tables 1-3.
This paper analyzes the internal structure and principles of the BERT and RCNN models, in which the recurrent structure and max-pooling play a key role, retaining and deeply capturing a wide range of textual information, and tests the model's effect on text classification tasks. The experimental results show that the accuracy of the model with RCNN is about 7.6% higher than that of the CNN model, which indicates that, unlike CNN, which cannot store memory over long ranges, the RCNN model, as a combination of RNN and CNN, extracts contextual information accurately, improves classification accuracy, and holds a definite advantage in text classification. The attention mechanism improves the model's ability to focus on the more important sequence information; the weight of each position relative to every other position can be computed in parallel, which is much faster than an LSTM given sufficient computing resources, and further improves the
model accuracy. Through the design of the pre-training tasks, the datasets above are used to fine-tune the already-trained model, and the model performs better: its accuracy and precision improve slightly. The accuracy of the model in this paper is about 5.5% higher than that of the LSTM and GRU models, and it exceeds existing methods on many Chinese text classification datasets. At the same time, compared with traditional window-based neural networks, the RCNN experiments show less noise, which indicates that the model has strong generality.
5 Conclusion
To address user sentiment analysis, this paper uses the BERT model to classify Chinese emotions, extracting information through the Transformer and making multi-layer, bidirectional predictions on sentences to better understand their deeper meaning. At the same time, the RCNN model combined with the attention mechanism is used for deep feature extraction, which can effectively analyze users' positive and negative emotions from their comments and the tendency of public opinion; this helps enterprises and the government take timely measures based on the analysis and unlock greater social value. In addition, the Chinese emotion analysis in this paper involves the integration of artificial intelligence and computer science and promotes the development of artificial intelligence. Emotion analysis is a research area with broad application prospects, and we believe more achievements will follow in the near future. This study also has shortcomings; in follow-up work we will expand the scope of data for in-depth research to provide better suggestions for Chinese emotion analysis.
Acknowledgements:
This work was supported in part by the Innovation
and entrepreneurship training program for
college students under Grant No. 202210378351.
References:
[1] Started in November 1997, it is one of the
most authoritative reports on Internet
development data released by China Internet
Network Information Center (CNNIC).
[2] Zhang Xiaoyan. Sentiment Analysis of
Chinese online Comments based on weighted
fusion word vector [J]. Application Research
of Computers,2022,39(01):31-36.
[3] Shi Hao. Application, Challenge and
Opportunity of natural language processing in
computational communication research [J].
Transmission and copyright,2021(04):55-58.
[4] Chen Guowei, Zhang Pengzhou, Wang Ting,
Ye Qiankun. A review of multi-modal
sentiment analysis.Journal of Communication
University of China (Natural Science Edition),
2022,29(02):70-78.
[5] Wang Suge. Research on Web-based
sentiment classification of comment text [D].
Shanghai: Shanghai University ,2018.
[6] Wang Yingjie, Zhu Jiuqi, Wang Zumin, Bai
Fengbo, Gong Jian. A survey on the
application of natural language processing in
text sentiment analysis [J]. Journal of
Computer Applications, 2022,42(04):1011-
1020.
[7] Pang B., Lee L., Vaithyanathan S. Thumbs up?: Sentiment classification using machine learning techniques [C]// Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10. Association for Computational Linguistics, 2002: 79-86.
[8] Zhao Xiaoming, Yang Yijiao, Zhang
Shiqing.Research progress of multi-modal
emotion recognition for deep learning [J].
Computer Science and Exploration,
2022,16(07):1479-1503.
[9] Jacob Devlin, Ming-Wei Chang, Kenton
Lee, Kristina Toutanova, et al.BERT: Pre-
training of Deep Bidirectional Transformers
for Language Understanding,2019.
[10] DEVLIN J,CHANG M W,LEE K,et al. BERT:
Pre-training of deep bidirectional transformers
for language understanding [C]// Proceedings
of the 2019 Conference of the North-
American Chapter of the Association for
Computational Linguistics:Human Language
Technologies.
Stroudsburg,PA:ACL,2019:4171-4186.
[11] Bai Qingchun, Xiao Jun, Wang Lamei.An
attribute-level text sentiment classification
method based on location-gated recurrent
neural network [P]. Shanghai:
CN114996454A,2022-09-02.
[12] Duan Dandan, Tang Jiashan, Wen Yong,
Yuan Kehai. A new method for Chinese text
classification based on BERT model
[J].Computer Engineering. 2021,47(01)
[13] Tao Quanhui, An Junxiu, Dai Yurui, Chen
Hongsong, Huang Ping.Research on temporal
multi-model sentiment Classification based on
multi-view Learning [J]. Computer
Application Research.10.19734/j.issn.1001-
3695.2022.06.0298.
[14] Li Shoushan, Chang Xiaoqin, Zhou Guodong.
A small sample sentiment classification
method based on knowledge distillation of
large and small mentors [P]. Jiangsu Province:
CN114722805A,2022-07-08.
[15] NLPCC, short for Natural Language
Processing and Chinese Computing, is the
first international conference in the field of
NLP natural language processing in China,
and also the first choice in the field of Chinese
computing.
[16] Fan Anmin , Li Chunhui ,Improved BERT
model for Chinese comment sentiment
classification [J]. Software Guide,
2022,21(02):13-20.
[17] A New Perspective of Sentiment analysis
Research, Caroline Brun(CSDN),2020.3.10,
https://europe.naverlabs.com/blog/new-
horizons-in-sentiment-analysis-research/
[18] Lv Xueqiang, Peng Chen, Zhang Le, Dong
Zhian, You Xindong. A Text Multi-Label
Classification Method Combining BERT and
Label Semantic Attention [J]. Journal of
Computer Applications, 2022,42(01):57-63.
[19] Zheng Yangyu, Jiang Hongwei.Aspect-level
sentiment classification model based on local
context and GCN [J]. Journal of Background
Information Science and Technology
University (Natural Science
Edition),2022,37(01):76-81.
[20] Han Pu, Zhang Wei, Zhang Zhanpeng, et al.
Sentiment Analysis of Public Health
Emergency in Micro-Blog Based on Feature
Fusion and Multi-Channel[J]. Data Analysis
and Knowledge Discovery, 2021, 5(11): 68-
79.
[21] Cheng Y, Yao L, Xiang G, et al. Text
Sentiment Orientation Analysis Based on
Multi-Channel CNN and Bidirectional GRU
With Attention Mechanism[J]. IEEE Access,
2020, 8: 134964-134975.
[22] Zhang Xiaowei, Shao Jianfei. Research on
news text classification based on improved
BERT-CNN model [J]. Television
Technology,2021,45(07),146-150.
[23] Yu Zhangxian, Hu Kongfa. A new model for
the classification of medical information
based on BERT-Att-BiLSTM model [J].
Computer Age,2020,(03),1-4.
[24] Wu Jiesheng, Lu Kui, Wang
Shibing.Sentiment Analysis of Movie reviews
based on multi-sentiment dictionary and SVM.
Journal of Fuyang Teachers University
(Natural Science Edition),2019,36(02):68-72.
[25] Cheng Zhengshuang, Wang Liang.Sentiment
analysis method of online reviews based on
support Vector machine. Electronic
Technology and Software
Engineering,2019,36(02):68-72.
[26] Cui Weijian. Text sentiment analysis based on
deep learning [D]. Jilin University,2018.
[27] Xu Minlin. Research on text Sentiment
Analysis based on Sentiment Dictionary and
Neural Network [D].Jiangxi University of
Science and Technology,2020.
[28] Mikolov T, Chen K, Corrado G, et al.
Efficient Estimation of Word Representations
in Vector Space[J]. arXiv preprint arXiv:
1301.3781, 2013.
[29] Pennington J, Socher R, Manning C. Glove:
Global Vectors for Word Representation[C].
In: Conference on Empirical Methods. 2014.
[30] Bojanowski P , Grave E , Joulin A , et al.
Enriching Word Vectors with Subword
Information[J]. 2016.
[31] Choi J, Lee S W. Improving FastText with
Inverse Document Frequency of Subwords[J].
Pattern Recognition Letters, 2020, 133: 165-
172.
[32] Zhi Shiyao, Wu Zhenru, Chen Tao, Li
Shengda, Peng Dong. Research on sentiment
analysis of Micro-blog comments based on
ALBERT-BiLSTM-Att[J]. Computer
Age,2022(02):19-22.
[33] Cai Lei. Text sentiment analysis based on
ALBERT-BIGRUATT[D]. Xinjiang
University,2021.
[34] Gao Ying. Text sentiment analysis based on
ALBERT-SABL model [D]. Shenyang
Normal University,2022.
[35] Hu Shengli, Zhang Liping. Sentiment analysis
of takeaway comments based on ALBERT-
CNN [J]. Modern Information Technology,
2022,6(10):157-160.
[36] Wu Hao, Pan Shanliang.A new approach to
Chinese comment recognition based on
BERT-RCNN [J]. Journal of Information
Science and Technology,2019,36(01):92-103.
[37] Li Yuechen, Qian Lingfei, Ma Jing.Early
rumor detection based on BERT-RCNN
model [J]. Information Theory &
Practice,2021,44(07):173-177+151.
[38] Zeng Ziming, Wan Pinyu.A novel micro-blog
sentiment analysis for public security events
based on Bi-level attention and Bi-LSTM [J].
Information Science, 2019,37(06):23-29.
[39] Zhang Yanhua, Yang Shuo, Liu Chao.
Training models of BERT based education
equipment supply chain[J]. Journal of public
opinion report system and innovation of
science and technology, 2022 (16) 48-
51.DOI:10.15913/j.cnki.kjycx.2022.16.015.
[40] Sun Dandan, Zheng Ruikun.Application of
BERT-DPCNN model in network Public
opinion sentiment analysis [J]. Network
Security Technology and
Application,2022(08):24-27.
[41] Zhao Hong, Fu Zhaoyang, Zhao Fan.Micro-
blog sentiment analysis based on BERT and
hierarchical Attention [J]. Computer
Engineering and Applications,
2022,58(05):156-162.
[42] Bai Jing, Li Fei, Ji Donghong.A new model
for the detection of Chinese micro-blog
position-based on BiLSTM-CNN[J].
Computer Applications and
Software,2018,35(03):266-274.
[43] Wang Haochang, Sun Mingze.Chinese short
text classification based on ERNIE-RCNN
model [J]. Computer Technology and
Development, 2022,32(06):28-33.
[44] Liu Siqin, Feng Xurui.Text sentiment
classification based on BERT [J]. Information
Security Research,2020,6(03):220-227.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
This work was supported in part by the Innovation and entrepreneurship training program for college students under Grant No. 202210378351.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en
_US