With the rapid development of the economy, noise problems such as industrial noise and automobile noise have become increasingly prominent. Traditional passive noise control technology[1] relies on methods such as sound absorption and sound insulation and is effective for medium- and high-frequency noise. Active noise control[2], which reduces the noise signal through destructive interference of sound waves, is more effective for low-frequency narrow-band noise. This has led to adaptive denoising systems, which generate an anti-noise signal and continuously correct it as the noise changes in order to complete active noise control.

The FxLMS algorithm is widely used in active noise control systems because of its simple circuit structure, simple implementation and small computational load. Researchers have implemented the algorithm in hardware using FPGA[3], DSP[4], MCU[5], ASIC[6] and other design methods. Reference [3] puts forward an FPGA-based hardware implementation of the FxLMS algorithm that divides the computation into a filtering part and an update part: the filtering part is an FIR filter, that is, a one-dimensional convolution, and the update part is the weight update of the LMS algorithm block, that is, a multiply-accumulate (MAC) process. Reference [5] proposes an implementation of the FxLMS algorithm on an STM32F407 microprocessor with a Cortex-M4 core, and proposes a fixed step size method to reduce the computation and address the problem of floating-point operation.

This paper designs an audio denoising coprocessor based on a RISC-V custom instruction set extension. Starting from the traditional hardware implementation of the FxLMS algorithm, the algorithm is partitioned by software and hardware co-design: filling and moving the data to be processed is handed over to the MCU, while the computation-heavy convolution and MAC operations are designed as hardware accelerators, and the coprocessor is attached in the instruction pipeline mode. Finally, hardware acceleration is performed by the coprocessor, which forms a heterogeneous SoC combined with a hardware accelerator.

The RISC-V instruction set has been widely welcomed all over the world since it was published in 2014. Its design is simple and efficient. Its modular instruction set organization gives the RISC-V architecture more configuration choices, so that a unified architecture can target a wide range of applications, an advantage that the x86 and ARM instruction set architectures do not possess. Instruction extensibility is a prominent feature of the RISC-V architecture: users can define custom instructions in the reserved instruction encoding space, which gives the coprocessor better portability.

Audio Denoising Coprocessor Based on RISC-V Custom Instruction Set Extension

JUN YUAN1, QIANG ZHAO1, WEI WANG, XIANGSHENG MENG1, JUN LI1, QIN LI2

1School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications,

Chong Qing 400065, CHINA

2Chongqing Marketing Department of Southwest Oil & Gas Field Company, Chong Qing 401120,

CHINA

Abstract—As a typical active noise control algorithm, FxLMS is widely used in the field of audio denoising. In

this paper, an audio denoising coprocessor based on RISC-V custom instruction set extension was designed,

and the idea of software and hardware co-design was adopted. Starting from the traditional pure-hardware implementation, the accelerator design was optimized, and the accelerator was connected to the RISC-V core in the form of a coprocessor. Meanwhile, the corresponding custom instructions were designed, the compiling environment was established, and a library of coprocessor acceleration functions was built with inline assembly. Finally, the ANC system was built and tested on the E203 SoC, and the test data were collected by an audio analyzer. The results showed that the audio denoising algorithm could be realized by combining a heterogeneous SoC with a hardware accelerator, and the denoising effect was about 8 dB. With custom instructions, the number of instructions consumed by the tested operations was reduced by about 60%, a significant acceleration.

Keywords—RISC-V, Custom instruction, ANC, Coprocessor.

Received: August 29, 2021. Revised: April 12, 2022. Accepted: May 9, 2022. Published: June 25, 2022.

1. Introduction

2. RISC-V and Hbird E203 Core

WSEAS TRANSACTIONS on COMMUNICATIONS

DOI: 10.37394/23204.2022.21.23

Jun Yuan, Qiang Zhao, Wei Wang,

Xiangsheng Meng, Jun Li, Qin Li

E-ISSN: 2224-2864

189

Volume 21, 2022

To realize the audio denoising coprocessor based on the RISC-V instruction set, an appropriate RISC-V processor core must be selected as the carrier. Among the many open-source RISC-V cores, such as Rocket[7], BOOM[8] and RI5CY[9], the Hbird E203 core adopts a two-stage pipeline design and supports RV32I/E/A/M/C instruction subset configurations, and its supporting SoC provides a large number of IP modules, including UART, IIC and SPI[10]. Its performance is benchmarked against the ARM Cortex-M0+, and its microarchitecture is shown in Figure 1.

[Figure 1: block diagram. Pipeline first stage: ① Mini_decode, ② Branch Prediction, ③ PC Generation, Address Selection, Instruction Fetch, ITCM, BIU, Flash. Pipeline second stage: ④ Decode & Dispatch, ⑤ interrupt, exception and branch prediction decision, ⑥ ALU, ⑦ LSU, ⑧ MUL\DIV, ⑨ Custom Instruction, OITF, RD_Regfile, Writeback Arbitration, WB_Regfile.]

Figure 1 Schematic diagram of E203 microarchitecture

The E203 core adopts a two-stage pipeline structure: the first stage is instruction fetch (IF), and the second stage performs instruction decode (ID), execute (EX), memory access (MEM) and writeback (WB).

The first pipeline stage includes a simple decode block, a branch predictor and a PC generator. The simple decode block (tag 1 in the figure) partially decodes the fetched instruction to obtain some instruction information, including whether it is an ordinary instruction or a branch/jump instruction and, for the latter, its type and details. For a branch/jump instruction, the static branch predictor (tag 2 in the figure) predicts the jump and produces the predicted target address. The PC generator (tag 3 in the figure) generates the PC value of the next instruction to be fetched according to the fetch type (fetch after reset, sequential fetch, branch fetch or fetch after a pipeline flush), and accesses the instruction tightly coupled memory (ITCM) or the bus interface unit (BIU) through the ICB bus to fetch the instruction. The PC value and the corresponding instruction value are stored in the PC register and the instruction register (IR), respectively.

The second pipeline stage mainly includes decode and dispatch (tag 4 in the figure), the arithmetic logic unit (tag 6 in the figure), the memory access unit (tag 7 in the figure), long-latency instructions (tag 8 in the figure), custom instructions (tag 9 in the figure), and commit and pipeline flushing (tag 5 in the figure). Decode and dispatch decodes each instruction and dispatches the related information to the ALU, which forwards it to the appropriate execution unit. Single-cycle instructions such as logic operations, addition, subtraction and shifts are handled by the ordinary ALU. Branch and jump instructions are committed by checking the prediction, and a misprediction requires the instruction pipeline to be flushed. Memory access instructions are allocated to the load-store unit, which loads and stores data. Long-latency coprocessor instructions are assigned to the coprocessor unit for execution.

3. FxLMS Algorithm and Its Hardware and Software Co-design Conception

The architecture of the active noise control system is shown in Figure 2; the processing part is the classical FxLMS algorithm[11-13].

[Figure 2: block diagram. Acoustic area: noise source, primary path P(z), secondary path S(z). Electronic area (FXLMS algorithm): adaptive filter W(z), secondary path estimate Ŝ(z), LMS. Signals: x(n), d(n), e(n), y(n), ŷ(n).]

Figure 2 Schematic diagram of ANC system structure

The FxLMS algorithm involves two different acoustic paths. The primary noise is sampled with a reference microphone, the speaker then emits an anti-noise signal, and the error sensor measures the residual error signal. In this process, the acoustic path between the reference noise source and the error sensor is called the primary path, and the electro-acoustic path between the speaker and the error microphone is called the secondary path. The FxLMS algorithm contains two parts: the least mean square algorithm and adaptive filtering.

3.1 LMS Algorithm Principle

The least mean square (LMS) algorithm is based on the minimum mean square error criterion and the gradient method. By simplifying the calculation of the gradient of the mean square error, the algorithm can be expressed by the recursive formulas in Equations (1), (2) and (3)[14-16]:

$$y(n) = \mathbf{W}^{T}(n)\,\mathbf{X}(n) \tag{1}$$

$$e(n) = d(n) - y(n) \tag{2}$$

$$\mathbf{W}(n+1) = \mathbf{W}(n) + 2\mu\,e(n)\,\mathbf{X}(n) \tag{3}$$

where $\mathbf{W}(n)$ is the weight vector of the filter; $\mathbf{X}(n)$ is the vector of input signal samples; $y(n)$ is the output signal; $d(n)$ is the desired signal; $e(n)$ is the error signal; and $\mu$ is the step size factor. The larger $\mu$ is, the faster the algorithm converges, and vice versa. However, faster convergence comes with worse steady-state performance, so the step size factor must be constrained. In this design, to reduce algorithm complexity and keep processing flexible, the step size is chosen according to the fixed step proposed in [3].
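
The recursion in Equations (1) to (3) can be illustrated with a short floating-point sketch. It is not the paper's fixed-point hardware flow: the function name `lms_identify`, the 4-tap test system and the step size `mu = 0.01` are assumptions chosen only to show the weights converging toward a known system.

```python
import random

def lms_identify(x, d, order, mu):
    """Run the LMS recursion of Equations (1)-(3).

    Returns the final weight vector and the list of error samples.
    """
    w = [0.0] * order      # W(n), initialised to zero
    buf = [0.0] * order    # X(n), most recent sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                        # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, buf))   # Eq. (1)
        e = dn - y                                   # Eq. (2)
        w = [wi + 2.0 * mu * e * xi                  # Eq. (3)
             for wi, xi in zip(w, buf)]
        errors.append(e)
    return w, errors

# Hypothetical test case: identify a known 4-tap FIR system from white noise.
random.seed(0)
true_w = [0.5, -0.3, 0.2, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
hist = [0.0] * len(true_w)
d = []
for xn in x:
    hist = [xn] + hist[:-1]
    d.append(sum(t * h for t, h in zip(true_w, hist)))

w, errors = lms_identify(x, d, order=4, mu=0.01)
```

A larger `mu` shortens the convergence time in this sketch but, with noisy data, leaves a larger residual error, which is the trade-off described above.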

The adaptive filtering part is an FIR filter, given by Equation (4):

$$y(n) = \mathbf{W}^{T}(n)\,\mathbf{X}(n) \tag{4}$$

where $y(n)$ is the output generated by passing $\mathbf{X}(n)$ through an FIR filter with weight coefficients $\mathbf{W}(n)$. Each time a secondary sound source sample $y(n)$ is generated, the weight coefficients $\mathbf{W}(n)$ are updated by the LMS operation. The coefficients are therefore time-varying: they adapt automatically and continuously to the given signal to obtain the desired response, which completes the adaptive filtering.
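
To show how the FIR filtering and the LMS update combine into FxLMS, the following is a minimal single-channel simulation sketch. Everything specific in it is an assumption made for illustration: the path coefficients `s` and `w_opt`, the perfectly known secondary-path estimate `s_hat`, the step size, and the helper names. The reference signal is filtered by `s_hat` before it enters the weight update, which is what distinguishes FxLMS from plain LMS.

```python
import random

def fir(coeffs, history):
    """Dot product of FIR coefficients with a most-recent-first history."""
    return sum(c * h for c, h in zip(coeffs, history))

def push(history, sample):
    """Return a new history with `sample` shifted in at the front."""
    return [sample] + history[:-1]

def fxlms(x, p, s, s_hat, order, mu):
    """Simulate feedforward FxLMS; return final weights and residual errors."""
    w = [0.0] * order
    x_w = [0.0] * order          # reference history seen by W(z)
    x_p = [0.0] * len(p)         # reference history seen by P(z)
    x_s = [0.0] * len(s_hat)     # reference history seen by the S(z) estimate
    xf = [0.0] * order           # filtered-reference history x'(n)
    y_hist = [0.0] * len(s)      # anti-noise history seen by S(z)
    errors = []
    for xn in x:
        x_w, x_p, x_s = push(x_w, xn), push(x_p, xn), push(x_s, xn)
        d = fir(p, x_p)                      # noise at the error microphone
        y = fir(w, x_w)                      # Eq. (4): anti-noise sample
        y_hist = push(y_hist, y)
        e = d - fir(s, y_hist)               # residual after cancellation
        xf = push(xf, fir(s_hat, x_s))       # reference filtered by S estimate
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, xf)]
        errors.append(e)
    return w, errors

# Assumed toy paths: P(z) = S(z) * W_opt(z), so exact cancellation exists.
s = [0.8, 0.4]
w_opt = [0.5, -0.2, 0.1]
p = [0.0] * (len(s) + len(w_opt) - 1)
for i, si in enumerate(s):
    for j, wj in enumerate(w_opt):
        p[i + j] += si * wj

random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(20000)]
w, errors = fxlms(x, p, s, s_hat=s, order=3, mu=0.01)
head_mse = sum(e * e for e in errors[:200]) / 200
tail_mse = sum(e * e for e in errors[-200:]) / 200
```

In this toy setup the residual power at the end of the run (`tail_mse`) is far below its initial value (`head_mse`). A real system would use a measured secondary-path estimate, so the cancellation would be less complete.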

The coprocessor is connected to the main processor in the instruction pipeline mode through the NICE interface, and the hardware acceleration functions are invoked from the software flow through custom instructions, as shown in Figure 3.

[Figure 3: flow chart. Steps shown: configuring peripherals; obtain reference signal; obtain error signal; data movement; Load instruction; load data to SRAM; Conv instruction; MAC instruction; Conv operation; MAC array operation; data calling; Store instruction; data transfer to external storage; Update instruction; update weight cache.]

Figure 3 Flow chart of software and hardware co-design

The gray and pink parts in Figure 3 belong to the software flow: acquiring the external signals of the ANC system, configuring the function blocks, and storing the collected data at the corresponding addresses. The light gray part is the set of custom instructions, which realize the software-hardware interaction between the main processor and the coprocessor. The dark gray part is the hardware acceleration, which comprises the convolution operation that implements the adaptive filtering part of the algorithm, the multiply-accumulate array operation that implements the LMS weight update, and the cache units used in data handling.

Because the algorithm itself is inherently serial, the process has two steps. First, the black flow line is the adaptive filtering operation in the main path, after which the relevant peripherals perform the electro-acoustic conversion that generates the secondary sound source. The second step is the acquisition of the error signal and the update of the weight coefficients, which must follow the first step. The design flow therefore includes two paths, and one ANC denoising iteration is complete only when both paths have run.

The audio denoising accelerator designed in this paper optimizes the weight update and filtering modules of the traditional design. A parallel one-dimensional convolution structure in the form of an adder tree replaces the serial MAC arithmetic unit in the filtering part, a parallel MAC array replaces the original weight update part, and the coprocessor-related modules are added.

In the traditional hardware implementation of the FxLMS algorithm, the filter module uses a single multiply-accumulate (MAC) unit and filters serially. This limits the arithmetic performance of the filter module, and many repeated operations are needed when the filter order is high. This design therefore trades area for performance and uses an adder tree structure, which greatly improves parallelism, to realize the one-dimensional convolution. It also uses a parallel MAC array to update the weight coefficients. Finally, the design reuses data and uses data distributors to reduce the resource consumption of the weight coefficient SRAM. The circuit structure is shown in Figure 4.

[Figure 4: block diagram. Blocks: X_SRAM, W_SRAM, E_SRAM, 1:FILTER_Order dispatchers, FILTER_Order:1 dispatcher, CONV, MAC, MAC_Array, reg, DMUX, Y_FIFO. Bus width labels: DATA_WIDTH*FILTER_Order and DATA_WIDTH*2^FILTER_Order.]

Figure 4 Structure diagram of hardware accelerator

The acceleration circuit comprises a reference signal data buffer (X_SRAM), a weight coefficient data buffer (W_SRAM), an error signal data buffer (E_SRAM), data distributors, a one-dimensional convolution operation block, a MAC array operation block and a data integrator.

The most critical parts of the accelerator circuit are the one-dimensional convolution operation block and the MAC array operation block. The one-dimensional convolution block implements the FIR filtering part of the algorithm, and the MAC array block implements the parallel weight coefficient update.

3.2 Adaptive Filtering

3.3 Software and Hardware Co-design

4. Hardware Design Part

4.1 Optimization of Operation Structure Design

Its circuit structure is shown in Figure 5.

[Figure 5: circuit diagram. CONV unit built from MUL and ADD stages; MAC unit built from MUL and ADD with W_SRAM.]

Figure 5 Circuit structure diagram of arithmetic unit

The audio denoising algorithm must generate the secondary sound source quickly after collecting the reference signal, and generating the secondary sound source requires adaptive filtering. In this design, the one-dimensional convolution used for filtering is therefore built as a parallel adder tree, which raises the operation speed and the degree of parallelism so that the secondary sound source is produced faster. Once the secondary sound source has been generated, the error signal must be collected to update the filter weight coefficients, so the algorithm is inherently sequential, and the speed of the weight update directly affects the generation of the next secondary sound source. This design therefore uses a MAC array to update all weights in parallel, so that the updated weights needed by the next convolution are available within the same cycle. Appropriate use of the data distributors and the data integrator greatly improves the operation speed.
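
The difference between the serial MAC unit and the adder-tree convolution can be sketched in software. This is a behavioural illustration only, not the paper's RTL: the function names and the 8-tap test vectors are assumptions. Each list comprehension stands for one layer of hardware operating in parallel, so the number of adder layers grows with the logarithm of the tap count, while the serial form needs one step per tap.

```python
def adder_tree_dot(weights, samples):
    """Dot product as parallel multipliers followed by a pairwise adder tree.

    Returns (result, adder_levels).
    """
    level = [w * x for w, x in zip(weights, samples)]   # parallel multipliers
    depth = 0
    while len(level) > 1:
        if len(level) % 2:          # odd count: pad with a zero input
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return (level[0] if level else 0), depth

def serial_mac_dot(weights, samples):
    """Reference form: a single MAC unit reused once per tap."""
    acc, steps = 0, 0
    for w, x in zip(weights, samples):
        acc += w * x
        steps += 1
    return acc, steps

taps = [1, -2, 3, 4, 0, 5, -1, 2]
data = [3, 1, 4, 1, 5, 9, 2, 6]
tree_result, tree_depth = adder_tree_dot(taps, data)
mac_result, mac_steps = serial_mac_dot(taps, data)
```

For these 8 taps both forms return the same sum, with 3 adder layers in the tree against 8 sequential MAC steps. The price is the area of 8 multipliers and 7 adders, which matches the area-for-performance trade described above.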

After the operation structure is designed, the core instruction cooperation unit is added to extend it into a coprocessor, and a decoder, a data extractor and a configuration enable block are added to complete the hardware design of the audio denoising coprocessor. The circuit structure is shown in Figure 6.

[Figure 6: block diagram. Co_Processor: the accelerator blocks of Figure 4 (X_SRAM, W_SRAM, E_SRAM, dispatchers, CONV, MAC, MAC_Array, reg, DMUX, Y_FIFO) plus Decode, Config and DATA Fetcher. NICE_Interface: request channel, response channel, memory request channel, memory response channel. Core side: IFU, EXU, LSU, WB_Reg, BIU, ITCM, DTCM.]

Figure 6 Circuit structure diagram of audio denoising coprocessor

The NICE controller handles the timing of the coprocessor interface and passes the instruction information and source operands received on the request channel to the decoder. The decoder decodes the custom instructions, which in this design fall into two types: configuration instructions, and data load and store instructions. For a configuration instruction, the configuration information is passed to the configuration module, which generates the enable and control signals required by the corresponding module to configure each operation block. For a data load or store instruction, the memory access information is passed to the data extractor. For a load, the address and the read signal are sent through the memory request channel and the data are read from the corresponding memory module; the read data are then returned to the data extractor through the memory feedback channel and distributed by the data extractor to the corresponding cache module. For a store, the address is sent through the memory request channel, and the write data are taken from the data extractor and written to the memory location corresponding to the address. If the instruction writes back a result, the write-back data are sent through the data extractor to the feedback channel to complete the write-back to the general-purpose register.

After the software program is burned to the MCU, the main processor fetches the instructions in sequence, decodes them, and determines from the opcode whether each one is a custom instruction. In this design, the four custom opcodes reserved by RISC-V (custom-0 to custom-3) are used as the custom instruction opcodes, and the R-type format is used for custom instruction encoding, as shown in Figure 7.

Figure 7 32-bit custom instruction encoding format
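
As an illustration of an R-type custom instruction word, the sketch below packs and unpacks the fields named in Table 1. The four opcode values are the custom-0 to custom-3 major opcodes of the RISC-V base encoding. The placement of xd, xs1 and xs2 in bits 14 to 12 is an assumption based on the common RoCC-style layout, since the content of Figure 7 is not reproduced in the text. The funct7 value and register numbers in the example are hypothetical, as are the function names.

```python
# Major opcodes reserved for custom extensions in the RISC-V base encoding.
CUSTOM_OPCODES = {0: 0x0B, 1: 0x2B, 2: 0x5B, 3: 0x7B}

def encode_custom(opcode_index, funct7, rd, rs1, rs2, xd, xs1, xs2):
    """Pack an R-type custom instruction into a 32-bit word.

    Assumed layout: funct7[31:25] rs2[24:20] rs1[19:15]
    xd[14] xs1[13] xs2[12] rd[11:7] opcode[6:0].
    """
    fields = (("funct7", funct7, 7), ("rd", rd, 5), ("rs1", rs1, 5),
              ("rs2", rs2, 5), ("xd", xd, 1), ("xs1", xs1, 1), ("xs2", xs2, 1))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (xd << 14)
            | (xs1 << 13) | (xs2 << 12) | (rd << 7)
            | CUSTOM_OPCODES[opcode_index])

def decode_custom(word):
    """Unpack a 32-bit word into the same fields (inverse of encode_custom)."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "xd": (word >> 14) & 0x1,
        "xs1": (word >> 13) & 0x1,
        "xs2": (word >> 12) & 0x1,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }

# Hypothetical instruction: funct7 = 1, rd = x10, rs1 = x11, rs2 = x12,
# reading both source registers and writing a result back.
word = encode_custom(0, funct7=1, rd=10, rs1=11, rs2=12, xd=1, xs1=1, xs2=1)
fields = decode_custom(word)
```

With this layout, an instruction that needs no register operands, such as a reset, would simply clear xd, xs1 and xs2.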

For a custom instruction, the xs1 and xs2 bits determine whether the source operands are read. During this process the main processor tracks data dependencies; if there is a data hazard, the data channel is stalled until the dependency is resolved. If a result is written back, the destination register indexed by rd is also included in the dependency check. The instruction information is then sent to the coprocessor through the NICE interface. The coprocessor decodes the instruction and dispatches it to the appropriate unit according to its type. Finally, the coprocessor returns the execution result to the main processor through the response channel, where it is written to the rd destination register, or writes the result to the corresponding storage location through the memory request channel.

4.2 Coprocessor Design

The data flow of the audio denoising coprocessor designed in this work is shown in Figure 8. The signals to be processed are obtained by external sensors or receivers and are transferred to the ICB peripheral bus through the interface IP mounted on the SoC. When the data need to be processed, they are read and written through the memory request and feedback channels. After processing, the result is written back to a general-purpose register of the main processor or to the corresponding memory, and the main processor sends it to the external module through the interface IP to produce the generated signal.

[Figure 8: data flow diagram. Blocks: voice sensor (voice data), peripheral interfaces, MCU, audio noise reduction coprocessor (connected through NICE), WM8731 audio codec, processing results.]

Figure 8 System data flow diagram

Memory-access custom instructions move large blocks of data from main processor memory into the coprocessor, which reduces the coprocessor's accesses to main processor memory and lowers power consumption. At the same time, the parallel operation units in the coprocessor ensure the operation speed. Finally, compared with an accelerator attached externally to the SoC, a coprocessor in the instruction pipeline mode does not need frequent data access, reduces data movement, and has better real-time processing performance.

The custom instructions of the audio denoising coprocessor

based on FxLMS algorithm are shown in Table 1.

Table 1 Custom instruction table of coprocessor

Instruction Funct7 Rd Xd Rs1 Xs1 Rs2 Xs2
Load.X: X_MemoryAddress; Length|X_BaseAddress
Load.E: E_MemoryAddress; Length|E_BaseAddress
Store.Y: Y_MemoryAddress
Cfg.Conv: Filter order; En_Conv
Cfg.MAC: Filter order; En_MAC
Updata.W: Filter order; En_Up.W
Rst

There are 7 custom instructions in two groups: data load and store instructions, and configuration enable instructions. The data load instructions load the reference signal and the error signal from the corresponding addresses and store them in the corresponding buffers of the coprocessor. The data store instruction writes the secondary sound source signal to the corresponding memory address through the memory request channel. The configuration enable instructions configure the filter order and enable the relevant functional modules.

The custom instructions are used as follows. First, the Load.X instruction loads the reference signal: the data are requested through the memory request channel, returned through the memory feedback channel, and loaded into the X_SRAM cache. Next, the Cfg.Conv instruction reads the reference signal and the weight coefficients from X_SRAM and W_SRAM and sends them through the data distributors to the corresponding DMUX. Under the control of the enable signal, the DMUX performs the convolution and generates the secondary sound source until the convolution of the reference signal ends. The Store.Y instruction then writes the secondary sound source data in the FIFO back to the corresponding address through the memory request channel. Then the Load.E instruction loads the error signal data in the same way as Load.X, and the Cfg.MAC instruction configures the MAC array and updates the weight coefficients. Finally, the Updata.W instruction configures W_SRAM to write the updated weights, which completes one accelerated adaptive denoising operation. In addition, the Rst instruction resets the coprocessor.
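
The instruction sequence above can be mirrored by a small behavioural model. This is a sketch for reasoning about the ordering of the seven instructions, not the authors' hardware or library: the class `CoprocessorModel`, its method signatures, the flat `memory` list and the step size handling are all assumptions, and the weight update uses the plain LMS form of Equation (3).

```python
class CoprocessorModel:
    """Hypothetical software stand-in for the coprocessor's instruction flow."""

    def __init__(self, mu):
        self.mu = mu
        self.rst()

    def rst(self):
        """Rst: clear all buffers."""
        self.x_sram, self.e_sram, self.w_sram = [], [], []
        self.y_fifo, self.pending_w = [], []

    def load_x(self, memory, address, length):
        """Load.X: copy reference samples from memory into X_SRAM."""
        self.x_sram = memory[address:address + length]

    def load_e(self, memory, address, length):
        """Load.E: copy error samples from memory into E_SRAM."""
        self.e_sram = memory[address:address + length]

    def cfg_conv(self, filter_order, enable):
        """Cfg.Conv: run one convolution and queue the result in the FIFO."""
        if not enable:
            return
        if len(self.w_sram) != filter_order:
            self.w_sram = [0.0] * filter_order
        taps = self.x_sram[:filter_order]
        self.y_fifo.append(sum(w * x for w, x in zip(self.w_sram, taps)))

    def store_y(self, memory, address):
        """Store.Y: write the oldest FIFO entry back to memory."""
        memory[address] = self.y_fifo.pop(0)

    def cfg_mac(self, filter_order, enable):
        """Cfg.MAC: compute all updated weights (one MAC per tap)."""
        if enable:
            e = self.e_sram[0]
            self.pending_w = [w + 2.0 * self.mu * e * x for w, x in
                              zip(self.w_sram, self.x_sram[:filter_order])]

    def update_w(self, filter_order, enable):
        """Updata.W: commit the updated weights to W_SRAM."""
        if enable:
            self.w_sram = self.pending_w[:filter_order]

# One iteration, followed by a second convolution with the updated weights.
memory = [0.0] * 16
memory[0:4] = [1.0, 2.0, -1.0, 0.5]   # reference samples
memory[8] = 0.5                        # error sample
cop = CoprocessorModel(mu=0.25)
cop.load_x(memory, 0, 4)
cop.cfg_conv(4, True)
cop.store_y(memory, 12)
first_y = memory[12]
cop.load_e(memory, 8, 1)
cop.cfg_mac(4, True)
cop.update_w(4, True)
cop.cfg_conv(4, True)
cop.store_y(memory, 12)
second_y = memory[12]
```

The model makes the serial dependency visible: the second convolution only sees the new weights because `update_w` ran after `cfg_mac`, which in turn needed the error sample loaded by `load_e`.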

After the custom coprocessor instructions are defined, the coprocessor can be driven from assembly language. However, developing in assembly is inefficient, so inline assembly embedded in C/C++ is commonly used. The first task is therefore to wrap the instructions in C library functions using the inline assembly syntax, which completes the library function design of the coprocessor. The designed library function interfaces are shown in Table 2.

Table 2 Library functions of custom instructions and their introduction

Function Interface | Function
int Load_X(unsigned int X_MemoryAddress, unsigned int X_BaseAddress, unsigned int Length) | Load reference signal X into X_SRAM
int Load_E(unsigned int E_MemoryAddress, unsigned int E_BaseAddress, unsigned int Length) | Load error signal E into E_SRAM
int Store_Y(unsigned int Y_MemoryAddress) | Store secondary source Y
int Cfg_Conv(int Filter_order, int En_Conv) | Configure convolution operation length and enable
int Cfg_MAC(int Filter_order, int En_MAC) | Configure MAC operation length and enable
int Updata_W(int Filter_order, int En_UP.W) | Configure weight coefficient length and enable
void Rst() | Reset coprocessor

The Cfg.Conv library function is taken as an example; its inline assembly form is shown in Figure 9.

Figure 9 Library function of Cfg.Conv instruction

After the hardware and software design of the coprocessor is complete, the whole ANC system is designed. The design is based on the Hbird E203 SoC platform: the original SoC is modified by deleting unnecessary peripheral interfaces and adding the IIS interface peripheral needed for audio data transmission. The overall system structure is shown in Figure 10.

5. Software Design Part

5.1 Custom Instruction Design

5.2 Coprocessor Library Function Design

6. Application and Evaluation of FxLMS Algorithm

6.1 Overall Design of ANC System

[Figure 10: block diagram. E203_SOC: E203_CORE (EXU, LSU, BIU, WB_Reg), Coprocessor (NICE_Interface, Decode, Cfg, Data Fetch, SRAM, Conv, Mac), ICB_APB peripherals UART, IIC and IIS (3.072MHz). WM8731_Config: I2C_cfg, I2C_setup, I2C SCLK, I2C SDAT. WM8731_Audio_Interface: Audio_data_rx, Audio_data_tx, LRCK, DACDAT, ADCDAT. WM8731: ADC, DAC, reference microphone, error microphone, secondary sound source. MATLAB: reference noise, anti noise, residual noise.]

Figure 10 Circuit structure diagram of ANC system

The main processor sends the initial configuration over the IIC bus so that the WM8731 audio codec module works normally. A probe collects the audio signal, which the ADC built into the module converts into a digital audio signal. The signal is then transferred to the ICB bus through the IIS audio interface, and the coprocessor reads the IIS audio data from the bus through the LSU and loads them into the corresponding cache. After the enable instruction is configured, the convolution is carried out and the anti-noise signal is generated. The anti-noise signal is then written through the memory request channel to the address of the IIS interface data, converted to an analog signal by the DAC built into the module, and emitted as the secondary noise. The module then collects the residual noise signal, which is loaded into the corresponding buffer. After the enable instruction is configured, the MAC array updates the weight coefficients, which completes one accelerated denoising iteration of the ANC system.

After the software, hardware and system design are completed, the denoising performance is measured on the MCU200T development board. The test setup is shown in Figure 11.

[Figure 11: photograph of the test setup. Labels: noise source, reference microphone, error microphone, secondary sound source, sensor, WM8731, MCU200T, DCDC power supply, audio signal analyzer, AWA6290 signal analysis software.]

Figure 11 Real scene diagram of test scene

The noise signal is collected before and after denoising with dedicated instruments, using pink noise for the acoustic test. The collected data are visualized in MATLAB to show the change before and after denoising in the audible frequency band, as shown in Figure 12.

Figure 12 Schematic diagram of noise acquisition at the same position

before and after denoising

As panels A and B of Figure 12 show, the sound pressure level of the same noise differs between consecutive time periods, so the denoising effect is unstable: it is good at some individual positions but does not hold across the whole low-frequency band. Panel C of Figure 12 shows that the denoising effect of the algorithm in the middle and high frequency bands is not ideal, which follows from the principle of active denoising itself. Panel D of Figure 12 shows that an average denoising effect of 8 dB is achieved in the 200-2000 Hz range under pink noise, which demonstrates that the algorithm can be realized by combining a heterogeneous SoC with a hardware accelerator.
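
To put the 8 dB figure in linear terms, the conversion below uses the standard definitions of sound pressure level (20 log10 of a pressure ratio, 10 log10 of a power ratio). It is a small worked calculation under those definitions, and the function names are illustrative.

```python
def pressure_ratio(reduction_db):
    """Residual-to-original sound pressure ratio for a given dB reduction."""
    return 10 ** (-reduction_db / 20)

def power_ratio(reduction_db):
    """Residual-to-original acoustic power ratio for a given dB reduction."""
    return 10 ** (-reduction_db / 10)

residual_pressure = pressure_ratio(8.0)
residual_power = power_ratio(8.0)
```

An 8 dB reduction therefore leaves roughly 40% of the original sound pressure amplitude and roughly 16% of the original acoustic power.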

To evaluate the performance of the coprocessor, convolution and MAC operations are implemented in two ways: one uses only the standard RISC-V I/M instruction set, and the other uses the coprocessor custom instructions designed in this paper together with the RISC-V I/M instruction set. The number of instructions executed by the two methods is then compared. The software is written in the IDE and burned to the development board, and the execution results and instruction counts are printed through the serial port. The experimental results are shown in Table 3.

Table 3 Number of instructions required by different arithmetic units

to run under different instruction sets

Algorithm    RV32 I/M instructions    Coprocessor instructions
Conv         4582                     1324
MAC          656                      256

6.2 Evaluation Analysis

The instruction counts show that, with the coprocessor, the Conv and MAC operations need far fewer instructions than with the standard instruction set. On the one hand, the coprocessor performs convolution and MAC in a dedicated hardware acceleration unit, while the main processor can only perform them in software with instructions such as addition, subtraction, multiplication and division. On the other hand, the system data flow diagram in Figure 8 shows that the coprocessor implementation reduces repeated data movement, which further improves the processing speed of the algorithm.
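
The relative savings implied by Table 3 can be checked with a short calculation. The instruction counts are taken from the table; the dictionary layout and function name are only illustrative.

```python
# Instruction counts from Table 3: (RV32 I/M only, with coprocessor instructions).
counts = {"Conv": (4582, 1324), "MAC": (656, 256)}

def reduction_percent(baseline, accelerated):
    """Percentage of instructions saved relative to the baseline."""
    return 100.0 * (baseline - accelerated) / baseline

reductions = {name: reduction_percent(base, acc)
              for name, (base, acc) in counts.items()}
ratios = {name: base / acc for name, (base, acc) in counts.items()}
```

This gives a reduction of about 71% for Conv and about 61% for MAC, that is, instruction-count ratios of roughly 3.5 and 2.6, in line with the reduction of about 60% quoted in the abstract.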

Based on the optimized hardware accelerator design, the coprocessor was designed, the ANC system was built on the E203 SoC, and the denoising test was carried out in a quiet indoor environment. The sound pressure level data before and after denoising were obtained with an audio analysis and acquisition instrument. The data show that the FxLMS algorithm realized by combining a heterogeneous SoC with a hardware accelerator achieves a denoising effect of nearly 8 dB. Two implementations were then compared to test the acceleration effect of the coprocessor, and the coprocessor custom instruction set shows a significant acceleration effect for convolution and MAC operations.

This research was supported by the Science and Technology

Major Project of Chongqing Municipal Science and

Technology Bureau (cstc2018jszx-cyztzxX0054), and the

Chongqing Municipal Science and Technology Commission

Major Project of Integrated Circuit Industry

(cstc2018jszx-cyztzx0217)

[1] Meng H, Chen S. Particle swarm optimization based novel

adaptive step-size FxLMS algorithm with reference signal

smoothing processor for feedforward active noise control

systems[J]. Applied Acoustics, 2021, 174: 107796.

[2] Sookpuwong C, Chompoo-inwai C. A Multi-Channel

Feedforward ANC with FXLMS Algorithm for

Aviation-Noise Suppression[C]//2019 53rd Asilomar

Conference on Signals, Systems, and Computers. IEEE,

2019: 1374-1378.

[3] Abdi F, Amiri P. Design and implementation of adaptive

FxLMS on FPGA for online active noise cancellation[J].

Journal of the Chinese Institute of Engineers, 2018, 41(2):

132-140.

[4] Liu L, Su Q, Li W, et al. Real Time Implementation and

Experiments of Multi-channel Active Noise Control

System for ICU[C]//2021 IEEE International Conference

on Electro Information Technology (EIT). IEEE, 2021:

395-400.

[5] Shyu K K, Ho C Y, Chang C Y. A study on us ing

microcontroller to design active noise control

systems[C]//2014 IEEE Asia Pacific Conference on

Circuits and Systems (APCCAS). IEEE, 2014: 443-446.

[6] Vu H S, Chen K H, Sun S F, et al. A 6.42 mW low-power

feed-forward FxLMS ANC VLSI design for in-ear

headphones[C]//2015 IEEE International Symposium on

Circuits and Systems (ISCAS). IEEE, 2015: 2585-2588.

[7] Asanovic K, Avizienis R, Bachrach J, et al. The rocket chip

generator[J]. EECS Department, University of California,

Berkeley, Tech. Rep. UCB/EECS-2016-17, 2016, 4.

[8] Asanovic K, Patterson D A, Celio C. The berkeley

out-of-order machine (boom): An i ndustry-competitive,

synthesizable, parameterized risc-v processor[R].

University of California at Berkeley Berkeley United

States, 2015.

[9] Traber A, Zaruba F, Stucki S, et al. PULPino: A small

single-core RISC-V SoC[C]//3rd RISCV Workshop. 2016.

[10] Wu N, Jiang T, Zhang L, et al. A reconfigurable

convolutional neural network-accelerated coprocessor

based on RISC-V instruction set[J]. Electronics, 2020, 9(6):

1005.

[11] Félix F B, de Castro Magalhães M, de Souza Papini G. An

improved Anc algorithm for the attenuation of industrial

fan noise[J]. Journal of Vibration Engineering &

Technologies, 2021, 9(2): 279-289.

[12] Munir M W, Abdulla W H. On FxLMS scheme for active

noise control at remote location[J]. IEEE Access, 2020, 8:

214071-214086.

[13] Kang M S. FxLMS Algorithm for Active Vibration

Control of Structure By Using Inertial Damper with

Displacement Constraint[J]. Journal of the Korea Institute

of Military Science and Technology, 2021, 24(5): 545-557.

[14] Rabiman R, Nurtanto M, Kholifah N. Design and

Development E-Learning System by Learning

Management System (LMS) in Vocational Education[J].

Online Submission, 2020, 9(1): 1059-1063.

[15] Yang F, Guo J, Yang J. Stochastic analysis of the filtered-x

LMS algorithm for active noise control[J]. IEEE/ACM

Transactions on Audio, Speech, and Language Processing,

2020, 28: 2252-2266.

[16] Jalal B, Yang X, Liu Q, et al. Fast and robust

variable-step-size LMS algorithm for adaptive

beamforming[J]. IEEE Antennas and Wireless

Propagation Letters, 2020, 19(7): 1206-1210.

Jun Yuan, received B.E. and M.E. degrees in

Electrical Engineering in 2006, 2009

respectively, from Southwest Jiaotong

University, China. And then in 2012 he received

D.Eng. degree from Kochi University of

Technology, Japan. Then he joined School of

Optoelectronic Engineering, Chongqing

University of P osts and Telecommunications,

China. His areas of research interests are

analog-digital mixed signal IC design, DFT

research and noise processing IC design.

7. Conclusion

Acknowledgment

References

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative

Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
