FPGA Implementation of High-Performance Truncated Rounding-Based Approximate Multiplier with High-Level Synchronous XOR-MUX Full Adder

G. ERNA1a, G. SRIHARI2b, M. PURNA KISHORE3, ASHOK NAYAK B.4c, M. BHARATHI5d

1Department of Electronics and Communication Engineering, PACE Institute of Technology & Sciences (UGC Autonomous), Ongole-523272, Andhra Pradesh, INDIA

2Department of CSE, School of Technology, The Apollo University, Chittoor, Andhra Pradesh, INDIA

3Department of Electronics and Communication Engineering, KKR & KSR Institute of Technology and Sciences (UGC Autonomous), Guntur, Andhra Pradesh, INDIA

4Department of ECE, Marri Laxman Reddy Institute of Technology and Management (UGC Autonomous), Dundigal, Telangana, INDIA

5Department of ECE, School of Engineering & Technology, Mohan Babu University, Erstwhile Sree Vidyanikethan Engineering College, A.P, Tirupati, INDIA

aORCiD: 0000-0002-4427-9002

bORCiD: 0000-0002-0233-4958

cORCiD: 0009-0003-0628-9861

dORCiD: 0000-0002-8633-1921

Abstract: - Rounding-based approximate signed and unsigned multipliers are among the fastest-emerging research topics in digital signal processing and image processing. In the present research, we propose a high-performance logic-simplification technique for processing discrete cosine transform (DCT) and discrete wavelet transform (DWT) images for sharpening. The technique yields a truncated shifter combined with a logical XOR-MUX full adder. A reliable and cost-effective approximate signed and unsigned multiplier was created for the rounding method. Although this technology covers many approximate multipliers, combining signed and unsigned capabilities sacrifices the ability to find the closest integer of a rounded value, which results in higher absolute errors than other rounding-based approximate multipliers. This work introduces a novel Truncated Shifter Rounding-based Approximate Multiplier integrated with a High-Level Synchronous XOR-MUX Full Adder design to minimize the number of logic gates and the power consumption of the multiplier architecture. The truncated RoBA (Rounding-based Approximate Multiplier) with XOR-MUX full adder reduces the logic size of the shifter and the arithmetic circuit, and thereby the area, delay, and power consumption. The proposed architecture makes two fundamental changes to the existing RoBA: first, the barrel shifter is replaced with a truncated shifter multiplier with XOR-MUX full adder; second, the parallel-prefix Brent-Kung adder is replaced with a carry-save adder with XOR-MUX full adder. The architecture was designed in Verilog-HDL and synthesized for the Xilinx Virtex-5 FPGA family, targeting the device Xc7Vx485tFFg1157-1. Compared to the existing RoBA, it reduced LUT area by 34%, power by 1%, delay by 32%, and error analysis by 75%.

WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS
DOI: 10.37394/23201.2023.22.13
G. Erna, G. Srihari, M. Purna Kishore,
Ashok Nayak B., M. Bharathi
E-ISSN: 2224-266X
111
Volume 22, 2023

Key-Words: - FPGA, Truncated shifter, Barrel shifter, Prefix adder, Carry-save adder, XOR-MUX full adder.

Received: January 11, 2023. Revised: October 6, 2023. Accepted: November 7, 2023. Published: December 4, 2023.

1 Introduction

Rounding is one of the most effective ways to condition input data before processing. By exploiting the inherent error resilience of applications such as digital signal processing, multimedia, and machine learning, inexact circuits reduce the hardware required, [1]. In recent years, approximate computing has become popular as a viable approach to designing digital systems that use less energy. For approximate computing to be successful, the system or application must tolerate a certain amount of error or loss of optimality in the computed result. Multiplication in hardware involves three steps: (1) partial product generation, (2) accumulation, and (3) final addition. Truncating the least significant columns of the partial products is frequently used to build fast and efficient multipliers, [2]. Almost all electronic systems face the design challenge of minimizing energy, particularly portable devices such as tablets, smartphones, and other electronic gadgets. The goal is to achieve this minimization with a minimal performance (speed) penalty. In such mobile devices, signal processing blocks are vital components that realize various multimedia applications. The core computational block is the arithmetic logic unit, in which multiplication accounts for most of the arithmetic operations in DSP systems. Hence, to improve the efficiency of the processor, it is necessary to improve the speed, power, and energy characteristics of the multiplier. Several DSP cores implement image and video processing algorithms whose final outputs are pictures or videos intended for human viewing. This makes it possible to use approximation to improve speed or energy efficiency, [3]. Arithmetic units can be approximated at different design abstraction levels, including the circuit, logic, and architecture levels as well as the software layer of the system. Some approximation techniques allow timing violations (for example, voltage over-scaling or over-clocking), whereas functional approximation methods modify the Boolean function of the block. Functional approximation has been applied to arithmetic building blocks such as adders and multipliers at various design levels, [4].

Rounding the operands to the nearest value and operating on shifted products aims at high speed, low dynamic switching activity, and energy-efficient operation of the approximate multiplier. The resulting approximate multiplier targets error-resilient DSP applications, [5]. The recently developed truncated carry-save adder approximate multiplier achieves area efficiency by modifying the conventional multiplication approach and its Boolean function at the algorithm level, under the assumption that the input values are rounded. The proposed method is a truncated shifter rounding-based approximate (RoBA) multiplier integrated with a carry-save adder using an XOR-MUX adder, [6]. This concept halves the partial products in the final addition. Image and video processing applications also depend on the accuracy of the arithmetic operations in conjunction with the system's functionality, [7]. Approximate multipliers handle signed and unsigned integers and rely on quantization techniques, i.e., truncation and rounding. Truncation discards all bits on the LSB side; in an 8-bit RoBA, it cuts off the LSB part of the nearest-value multiplier, [8]. Approximate computation aims to shrink the arithmetic circuit. Rounding and truncation reduce hardware complexity, which saves power, but the multiplication output is no longer exact: the approximate multiplier produces errors relative to an exact multiplier. These errors become smaller as the operand size increases, [9]. Rounding lower-order bits results in less error than rounding higher-order bits.

The approximate truncated shifter multiplier is designed using three approaches, which constitute the contributions of the present work:
1. Approximate partial product generation, with removal of the least significant bits (LSBs) from the partial products.
2. Truncation of the LSB part of the partial products, so that the logic gates that would generate them are omitted.
3. Use of an inexact circuit with fewer elements to enhance the performance of the truncated shifter inexact multiplier.
These three approaches shrink the partial products and yield low-power combinational logic with fewer hardware components. The approximate multiplier uses arithmetic gates such as XOR and XNOR, and can also be adapted to NAND gates. Approximate computing also avoids overflow and underflow situations because half of the product is discarded, [10].

2 Literature Review on Inaccurate Multiplier

This section reviews the existing literature on high-speed, energy-efficient, low-power DSP applications.

A fine-granularity, digit-reconfigurable finite impulse response (FIR) filter architecture, [11], offers compact, low-power operation over a wide range of precisions and filter lengths. The architecture comprises an 8-digit reconfigurable FIR filter implemented in a 0.35 µm single-poly quadruple-metal CMOS technology. Results showed that the fabricated chip operates at 86 MHz and consumes 16.5 mW from a 2.5 V power supply.

FIR filters are widely used in DSP applications because of their stability, linear phase, low finite-precision error, and efficient implementation. The reconfigurable architecture of [12] reduces FIR filter complexity by using programmable shifts. It modifies the adder architecture by integrating a carry-save adder in place of a conventional adder unit. The architecture achieved a 12% reduction in power and area compared with the existing reconfigurable FIR filter. It was implemented and evaluated on a Spartan-3 xc3s200-5pq208 field-programmable gate array (FPGA), [12].

A transpose-form FIR filter is inherently pipelined and supports multiple constant multiplication (MCM) techniques, which significantly reduce computation. However, unlike the direct-form configuration, the transpose-form configuration does not directly support block processing. The work in [13] explores the realization of block FIR filters in the transpose-form configuration, with reduced area and delay for higher-order filters, in both fixed and reconfigurable applications. It presents a detailed analysis of the computations in the transpose-form FIR filter based on the canonical signed digit representation and optimizes the complexity of the transpose-form block filter. A general multiplier-based architecture is constructed for the block filter in reconfigurable applications, and a low-complexity MCM-based design is used for the block implementation of the FIR filter. Compared with the existing block implementation of the direct-form structure, the suggested architecture lowers the area-delay product (ADP) and the energy per sample (EPS) for larger filter lengths, whereas for short filters the direct-form block implementation has the lower ADP and EPS. Application-specific integrated circuit synthesis results for a block size of 4 and a filter length of 64 show that ADP and EPS are about 40% lower than those of the existing FIR filter structure. For the same filter length and block size, the structure achieves 13% lower ADP and 12.8% lower EPS than the existing direct-form FIR structure, [13].

A new architecture for truncated multiplication is presented in [14] and compared with previous implementations; it is the first DSP device to incorporate a truncated multiplier into a programmable DSP block. When used as part of a complete DSP system, the column-wise implementation has been shown to improve the power-SNR trade-off of arithmetic units based on truncated multipliers with very little overhead. Software-based error compensation proves to be a realistic and efficient solution. The efficiency of programmable truncation has been tested thoroughly in a DSP block designed for simple yet arithmetically intensive algorithms and evaluated at the system level. The findings show that, conservatively, power savings of up to 15% of the full DSP power can be achieved while maintaining acceptable SNR levels. The practical advantages of truncated multiplication are demonstrated using several real-world algorithms. The architecture was fabricated in the TSMC 90 nm process, and samples showed that it worked adequately in terms of functionality and power savings.

A multiplier with lower power and a shorter critical path than traditional multipliers is proposed for high-performance DSP applications in [15]. It employs a newly developed approximate adder that limits carry propagation to the nearest neighbours for fast partial product accumulation. Configurable error recovery achieves different levels of accuracy by using different numbers of most significant bits (MSBs) for error reduction. Most errors are small, since the approximate multiplier has a low mean error distance. Compared with the Wallace multiplier, a 16-bit approximate multiplier implemented in a 28 nm CMOS process reduces delay by 20% and power by up to 69%. With sufficient error recovery, the approximate multiplier achieves processing accuracy equivalent to that of traditional exact multipliers, with significant gains in power and efficiency.

3 Existing Technique on RoBA

Figure 1 shows the existing RoBA multiplier, which consists of a sign detector, rounding blocks, and three barrel shifters for higher speed and energy efficiency. The sign detector determines whether each input is positive or negative; a negative input is converted to its absolute value by taking the 2's complement before rounding. The rounding block then rounds each absolute value to its nearest power of two, so the rounded values Ar and Br have the form 2^n. These values are passed to the three barrel shifter blocks, which form the products of the n-bit operands; the shift amounts are set by log2 Ar or log2 Br, depending on the operand, [16]. The approach is thus based on rounding the operands to the nearest power of two. Multiplication is treated as the key operation for improving speed and energy consumption at minimal cost and error. The approach is suitable for both signed and unsigned multiplication, and three hardware implementations of the approximate multiplier are provided, covering both signed and unsigned operations. The performance of the RoBA multiplier has been compared with that of approximate and exact multipliers over various design parameters. The existing RoBA with barrel shifter has also been evaluated in image processing applications such as image sharpening and smoothing, [17].

Fig. 1: Existing RoBA Multiplier architecture with Brent Kung Adder
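The rounding and shifting performed by the existing RoBA can be sketched in a few lines of Python. This is a behavioural model only, not the Verilog-HDL design: it assumes the usual RoBA identity A×B ≈ Ar×B + Br×A - Ar×Br, and the function names and the tie-breaking rule (a value exactly halfway between two powers of two rounds up, except 3, which rounds to 2) are illustrative assumptions.

```python
def round_to_power_of_two(x: int) -> int:
    """Round a non-negative integer to its nearest power of two.

    Assumed tie rule: halfway values round up, except 3, which rounds to 2.
    """
    if x <= 0:
        return 0
    if x == 3:
        return 2
    p = x.bit_length() - 1            # x lies in [2**p, 2**(p + 1))
    low, high = 1 << p, 1 << (p + 1)
    return high if 2 * x >= low + high else low


def roba_multiply(a: int, b: int) -> int:
    """Approximate a * b using only shifts, one addition and one subtraction."""
    negative = (a < 0) != (b < 0)     # sign detector
    m, n = abs(a), abs(b)             # absolute values
    if m == 0 or n == 0:
        return 0
    mr, nr = round_to_power_of_two(m), round_to_power_of_two(n)
    sm, sn = mr.bit_length() - 1, nr.bit_length() - 1   # log2 of rounded values
    # Three shifted products: Ar*B, Br*A and Ar*Br
    approx = (n << sm) + (m << sn) - (1 << (sm + sn))
    return -approx if negative else approx              # sign set


if __name__ == "__main__":
    print(roba_multiply(68, 104), 68 * 104)
```

Because both rounded operands are powers of two, each of the three products reduces to a shift, which is the role the barrel shifters play in Figure 1.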

4 RoBA Integrated with Barrel Shifter and Brent-Kung Adder

In rounding-based approximate multipliers, the barrel shifter shifts the binary operands by a power of two to form the products Br*A, Ar*B, and Ar*Br. For example, if the operands are shifted left by one position, the resulting partial products are half the size of the original partial products, and the number of partial products required for the multiplication is halved. Besides reducing the number of partial products, barrel shifters can also perform the rounding in nearest-value approximate multipliers. The approximation introduced by rounding causes errors in the final result, which may be acceptable in applications such as image sharpening in image and video processing. In a rounding-based approximate multiplier with a Brent-Kung adder, the inputs are first rounded to the nearest power of two. Very few partial products are then needed, which reduces the number of adder operations. The partial products are added using a Brent-Kung adder, a parallel-prefix structure that reduces the propagation delay, although its delay is longer than that of the Kogge-Stone adder. Because of its reduced logic size, a rounding-based approximate multiplier with a Brent-Kung adder can give significant savings in power consumption and chip area compared with traditional multipliers. However, the error introduced by rounding may not be acceptable in some applications, [18]. The structure of the 8-bit Brent-Kung adder is presented in Figure 2.

Fig. 2: Structure of 8-bit Brent-Kung adder
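As an illustration of the parallel-prefix idea, the following Python sketch models a Brent-Kung adder at bit level. It is a software model under stated assumptions (unsigned operands, a width that is a power of two, carry-in fixed at 0), and the function name is hypothetical; it is not the netlist of Figure 2.

```python
def brent_kung_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned width-bit integers with a Brent-Kung prefix network."""
    # Bitwise generate and propagate signals, LSB first.
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]                 # group signals, updated in place

    def combine(hi: int, lo: int) -> None:
        # Prefix operator: merge the group ending at lo into the group at hi.
        G[hi] = G[hi] | (P[hi] & G[lo])
        P[hi] = P[hi] & P[lo]

    # Up-sweep: build group signals over spans of 2, 4, 8, ... bits.
    step = 1
    while step < width:
        for i in range(2 * step - 1, width, 2 * step):
            combine(i, i - step)
        step *= 2

    # Down-sweep: fill in the prefixes the up-sweep did not produce.
    step //= 2
    while step >= 1:
        for i in range(3 * step - 1, width, 2 * step):
            combine(i, i - step)
        step //= 2

    # G[i] is now the carry out of bit i; the carry into bit 0 is 0.
    carries = [0] + G[:-1]
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total | (G[width - 1] << width)   # append the carry-out bit
```

The up-sweep and down-sweep together use about 2·log2(width) - 1 prefix levels, which is why this structure trades some delay for fewer prefix cells compared with a Kogge-Stone network.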

5 Proposed Method of Truncated Shifter RoBA Carry-Save Adder

The main contributions of the proposed method are twofold:
1. Truncated shifter RoBA multiplication with a modified full adder design using the XOR-MUX approach.
2. A reconfigurable three-hardware architecture with truncated shifter carry-save adder RoBA multiplication for both signed and unsigned operations, [19].

The 8-bit truncated shifter multiplier architecture combines an XOR-MUX full adder design, an 8-bit carry-save adder, and an 8-bit subtractor. The truncated shifter rounding-based approximate multiplier (TSRoBA), integrated with a carry-save adder built from synchronous XOR-MUX full adders, is a digital circuit that performs approximate multiplication with reduced hardware complexity and power consumption, [20]. The truncated shifter truncates or rounds the partial products generated by the multiplier to reduce the number of bits that must be added, which lowers the circuit's overall hardware complexity and power consumption. The carry-save adder performs fast and efficient addition of the partial products. The high-level synchronous XOR-MUX full adder uses XOR and MUX gates to implement the sum and carry logic, respectively, and is designed to operate synchronously with the clock signal. The synchronous design helps ensure that the outputs become stable and valid at the same time, which can improve the overall performance and reliability of the circuit. The approximate multiplier may introduce some error in the multiplication result, but this error can be controlled by adjusting the precision of the truncated shifter and the level of approximation used. The carry-save adder and the synchronous XOR-MUX full adder improve the performance of the circuit by reducing the propagation delay and improving the power efficiency, [21].

Fig. 3: Overall architecture of the proposed TSRoBA (Truncated Shifter Rounding-Based Approximate Multiplier)

5.1 An Implementation of Truncated RoBA Multiplier

This study aimed to create a truncated carry-save adder RoBA multiplier that can be used for both signed and unsigned operations. The proposed TSRoBA is tested using three multipliers: truncated, automatic, and rounded (nearest-value-based). First, the signs of the inputs are determined, and for each negative input the absolute value is generated. Next, the nearest-value block finds the closest value to each absolute value in the form 2^n. The bit width of the output of this block is n, because the most significant bit (MSB) of the absolute value of an n-bit number in 2's complement format is zero. The rounding block uses the following equations to compute each output bit of the closest value for inputs A and B. In the planned architecture the operand is denoted M instead of A and B, and its rounded approximation is denoted Mr, [22].

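$$M_r[3] = \left(\overline{M[3]}\cdot M[2]\cdot M[1] + M[3]\cdot\overline{M[2]}\right)\cdot\prod_{i=4}^{n-1}\overline{M[i]} \tag{1}$$

$$M_r[2] = M[2]\cdot\overline{M[1]}\cdot\prod_{i=3}^{n-1}\overline{M[i]} \tag{2}$$

$$M_r[1] = M[1]\cdot\prod_{i=2}^{n-1}\overline{M[i]} \tag{3}$$

$$M_r[0] = M[0]\cdot\prod_{i=1}^{n-1}\overline{M[i]} \tag{4}$$

$$M_r[n-1] = \overline{M[n-1]}\cdot M[n-2]\cdot M[n-3] + M[n-1]\cdot\overline{M[n-2]} \tag{5}$$

$$M_r[n-2] = \left(\overline{M[n-2]}\cdot M[n-3]\cdot M[n-4] + M[n-2]\cdot\overline{M[n-3]}\right)\cdot\overline{M[n-1]} \tag{6}$$

$$M_r[i] = \left(\overline{M[i]}\cdot M[i-1]\cdot M[i-2] + M[i]\cdot\overline{M[i-1]}\right)\cdot\prod_{k=i+1}^{n-1}\overline{M[k]} \tag{7}$$

$$M_r[3] = \left(\overline{M[3]}\cdot M[2]\cdot M[1] + M[3]\cdot\overline{M[2]}\right)\cdot\prod_{i=4}^{n-1}\overline{M[i]} \tag{8}$$

$$M_r[2] = M[2]\cdot\overline{M[1]}\cdot\prod_{i=3}^{n-1}\overline{M[i]} \tag{9}$$

$$M_r[1] = M[1]\cdot\prod_{i=2}^{n-1}\overline{M[i]} \tag{10}$$

$$M_r[0] = M[0]\cdot\prod_{i=1}^{n-1}\overline{M[i]} \tag{11}$$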

Based on equations (1) - (11), Mr[i] is 1 in two cases:
Case 1: M[i] = 1 and all bits to its left are 0, while M[i-1] = 0.
Case 2: M[i] = 0 and all bits to its left are 0, while M[i-1] = 1 and M[i-2] = 1.
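The two cases can be checked with a small bit-level model. The sketch below assumes the usual reading of the rounding equations: Case 1 contributes M[i] AND NOT M[i-1], Case 2 contributes NOT M[i] AND M[i-1] AND M[i-2], every higher bit must be 0, and bits 2, 1 and 0 use their shorter special forms. The function name `round_bits` is hypothetical, and the input is assumed to be an absolute value whose MSB is 0 (below 2^(n-1)).

```python
def round_bits(m: int, n: int = 8) -> int:
    """Evaluate the rounding equations bit by bit for an n-bit magnitude m."""

    def bit(i: int) -> int:
        return (m >> i) & 1

    result = 0
    for i in range(n):
        # Product term: all bits to the left of position i must be 0.
        higher_zero = all(bit(k) == 0 for k in range(i + 1, n))
        if i >= 3:
            case1 = bit(i) & (1 - bit(i - 1))                 # M[i] = 1, M[i-1] = 0
            case2 = (1 - bit(i)) & bit(i - 1) & bit(i - 2)    # M[i] = 0, next two bits 1
            term = case1 | case2
        elif i == 2:
            term = bit(2) & (1 - bit(1))
        else:
            term = bit(i)
        if term and higher_zero:
            result |= 1 << i
    return result


if __name__ == "__main__":
    for value in (3, 6, 68, 96, 104):
        print(value, "->", round_bits(value))
```

For every magnitude from 1 to 127 exactly one output bit is set, so the result is always a single power of two that can drive a shifter directly.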

The truncated shifter forms the products by shifting. The shift amount is determined by log2 Mr - 1 (or log2 Nr - 1) for the operand M (or N). The input bit width of the shifter blocks is n, while their output bit width is 2n. This arrangement supports a higher fan-out with the minimal usable operand width, [23].

5.2 Full Adder Designs using XOR-MUX Gates Logic

The standard full adder in the reduction step of the truncated RoBA multiplier is replaced by a modified full adder to reduce power and area. In the truncated shifter RoBA shown in Figure 3, the right (least significant) half of the product is discarded. Power reduction in the TSRoBA multiplier is accomplished by using an XOR-MUX-based full adder in the reduction process. The full adder combines XOR gates with a 2:1 MUX and is incorporated in the carry-save adder; the resulting path delay is given in equation (12). Two stages of XOR gates generate the sum, while the MUX generates the carry. Figure 4 shows the full adder design with two stages of XOR gates for the sum logic. As a result, the power dissipation caused by short-circuit current can be avoided. Furthermore, switching activity is reduced, since the primary inputs directly drive all internal nodes. However, this power reduction comes at the expense of increased area, [24]. Both the truncated shifter and the XOR-MUX full adder are used in the TSRoBA, so that accuracy can be traded for reduced power consumption or area in comparison with existing approximate multipliers.

Fig. 4: XOR-MUX full adder circuit design

$$\text{Delay} = \text{NOT} + 2\,\text{MUX} \tag{12}$$

5.3 Reconfiguration of Truncated Shifter RoBA

Truncated multiplication is an efficient way to reduce the hardware requirements of rounded parallel multipliers. In truncated multiplication, only the n + k most significant columns of the multiplication matrix are used to compute the product. Ignoring the n - k least significant columns and rounding the multiplication result to n bits introduces an error. In the constant-correction truncated shifter multiplier of Figure 3, a constant is added to columns n - 1 to n - k of the multiplication matrix. The RTL schematic is shown in Figure 8 and the simulation outcome in Figure 9. In this scenario, M = 68 and N = 104, so ordinary multiplication gives the exact product 7072. In the existing rounding-based approximate multiplier (RoBA), the operands are rounded to Mr = 64 and Nr = 128, and the approximate output is Mr*N + Nr*M - Mr*Nr = 6656 + 8704 - 8192 = 7168. The truncated shifter RoBA output is 38, because the least significant part is discarded in the planned architecture shown in Figure 3, so the truncated architecture indicates a reduction in the relative error as per the formula. The relative errors of the existing RoBA and the planned truncated RoBA, obtained from the simulator results, are calculated as follows, [25].

Relative error (%) = [(Mr*Nr)approximate - (M*N)exact] * 100 / (M*N)exact

Existing RoBA: (7168 - 7072) * 100 / 7072 = 1.35%

The truncated RoBA multiplication output is 38:

Truncated RoBA: (38 - 7072) * 100 / 7072 = -99.46%

The constant helps compensate for the reduction error caused by omitting the n - k least significant columns and for the error caused by rounding the product to n bits (called the rounding error). E-total is the expected value of the sum of these errors.
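The column truncation described above can be modelled as follows. This is a minimal sketch assuming unsigned n-bit operands; `correction` is a hypothetical placeholder for the correction constant, whose value is not given in the text, and the final rounding of the result to n bits is left out.

```python
def truncated_product(a: int, b: int, n: int = 8, k: int = 2,
                      correction: int = 0) -> int:
    """Sum only the n + k most significant columns of the n x n
    partial-product matrix; columns 0 .. n-k-1 are discarded."""
    first_kept_column = n - k
    acc = correction
    for i in range(n):                    # bit i of operand a
        if not (a >> i) & 1:
            continue
        for j in range(n):                # bit j of operand b
            if (b >> j) & 1 and i + j >= first_kept_column:
                acc += 1 << (i + j)       # partial-product bit in column i + j
    return acc


if __name__ == "__main__":
    exact = 68 * 104
    kept = truncated_product(68, 104, n=8, k=2)
    print(exact, kept, exact - kept)
```

With k = n every column is kept and the result is exact; smaller k removes more partial-product bits and increases the reduction error that the constant is meant to offset.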

Fig. 5: Truncated Rounded Multiplication

The significant benefits of the truncated shifter RoBA multiplier (Figure 5) are as follows, [26]:
1. It provides a multiplier design for both signed and unsigned operation.
2. Minimal logic size.
3. Minimal power and delay.
4. Trimmed partial products in the approximate multiplier.
5. Several hardware logic primitives are available in this research.
6. Improved performance.

5.4 Truncated Shifter Integrated With Carry Save Adder

Because the array contains many nearly identical critical paths, improving the performance of the structure through transistor sizing incurs substantial cost. When the output carry bits are passed diagonally downward instead of only to the right, as shown in Figure 6, the multiplication result remains unchanged. An extra adder, called a vector-merging adder, is included to produce the final result. The resulting multiplier is a carry-save multiplier, because the carry bits are not added immediately but are "saved" for the next adder stage. In the final step, a fast carry-propagate (e.g., carry-lookahead) adder stage merges the carries and sums, at the cost of the one extra adder, [27].

 = + (  1) + (13)

Fig. 6: Architecture of CSA (CARRY SAVE ADDER)

The carry-save adder is used in the design of ALUs for processor units, where new designs are needed to increase computation speed. Adders and multipliers are used throughout CMOS architectures, including high-performance processors. The carry-save adder differs from other digital adders in that it sums three or more n-bit binary numbers, and its outputs are obtained as separate sum and carry bit sequences. With the LSB part eliminated, the carry-save adder comprises two lookup tables: one for the carry-out bit and the other for the sum bit. The FPGA implementation of the CSA uses fewer logic gates, with an effective XOR-MUX full adder design for the carry-propagate addition (CPA) and a minimal LUT implementation, [28].

5.5 Carry Save Adder With Logic Gate

CSA stands for Carry Save Adder, a digital logic

circuit that can quickly add multiple numbers. It is

commonly used in high-speed arithmetic circuits, such

as multipliers and digital signal processors.

Fig. 7: Addition operation of CSA with Full Adder

The operation of a CSA gate involves three main steps:
1. Carry-save operation: the input numbers A and B are added without considering any carry from the previous bit position. The sum output S and the carry output C are generated.
2. Cascaded carry-save operation: multiple CSA gates are cascaded to add more than two numbers. The output of each CSA gate is connected to the input of the next gate. This allows the carry outputs of each CSA gate to be saved and used in the final addition step.
3. Final addition: the saved carry outputs are added to the sum outputs of the last CSA gate, which generates the final result of the addition operation.
The CSA gate can be implemented using various logic circuits, such as full adders and XOR gates. The specific implementation depends on the desired performance and application requirements, [29]. The addition operation of the CSA with full adders is presented in Figure 7.
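The three steps above can be sketched in Python with word-level bit operations. The model assumes non-negative integer operands; the function names are illustrative and the built-in addition stands in for the final carry-propagate adder.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: reduce three operands to a sum word and a carry word.

    No carry ripples between bit positions; each position behaves like an
    independent full adder.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (y & z) | (x & z)) << 1   # carries move one bit left
    return sum_word, carry_word


def csa_sum(operands: list[int]) -> int:
    """Add several operands with cascaded carry-save stages."""
    pending = list(operands)
    while len(pending) > 2:
        s, c = carry_save_add(pending[0], pending[1], pending[2])
        pending = pending[3:] + [s, c]     # saved carries feed a later stage
    return sum(pending)                    # final carry-propagate addition


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))
    print(csa_sum([6656, 8704, 5, 17]))
```

Only the last step needs a carry-propagate adder, which is why the intermediate stages stay fast regardless of operand width.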

5.6 The Ex-Or Mux Subtractor

In a truncated shifter rounding-based approximate multiplier for signed binary numbers, the EX-OR MUX subtractor is used to multiply two binary numbers with approximate accuracy using a combination of shifts, additions, and subtraction. It implements the third option, which selects the original value of the partial product instead of shifting or adding/subtracting it. This is accomplished by using an exclusive-OR (EX-OR) gate to select between the original partial product and its two's complement (i.e., its negation), based on the operand M and its rounded value Mr. The resulting products from each group are then added together, with intermediate results rounded and truncated as necessary to reduce the number of bits and improve performance. The overall accuracy of the multiplier is determined by the size and distribution of the partial products and by the accuracy of the rounding and truncation operations. The sign set block's primary function is to adjust the sign of the final multiplication result: when the signs of the two input values (obtained from the sign detector) differ, the subtractor's output is negated, [30], [31].
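Selecting between a value and its two's complement with EX-OR gates can be sketched as below. The sketch assumes a fixed word width (16 bits by default, chosen only for illustration) and uses hypothetical function names; it shows the general XOR-and-increment technique rather than the exact subtractor circuit.

```python
def conditional_negate(value: int, negate: bool, width: int = 16) -> int:
    """Return value or its two's complement as an unsigned width-bit word.

    Each bit is XORed with the negate flag (bitwise inversion when set),
    and the flag is also added as a carry-in of 1.
    """
    mask = (1 << width) - 1
    flip = mask if negate else 0
    return ((value ^ flip) + (1 if negate else 0)) & mask


def to_signed(word: int, width: int = 16) -> int:
    """Interpret an unsigned width-bit word as a two's complement number."""
    return word - (1 << width) if (word >> (width - 1)) & 1 else word


if __name__ == "__main__":
    negated = conditional_negate(7168, True)
    print(negated, to_signed(negated))
```

The same XOR row serves both cases, so the sign adjustment adds only one gate per bit plus the carry-in.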

5.7 Results and Discussion

The proposed TSRoBA has 0 slices with a total availability of 11,440. The total number of LUTs used is 370, of which 365 are used as logic, with an availability value of 5,720. According to the utilization review, the proposed RoBA registers have a utilization of 1%, slice LUTs have a utilization of 6%, and logic slices have a utilization of 6%. The layout of the proposed RoBA multiplier design can be seen in Figure 3. The shifter, adder, subtractor, sign detector, sign set, and rounding blocks are all parts of the proposed architecture. The second parameter studied is power utilization. The proposed RoBA method consumes 0.014 watts, implying that the proposed solution substantially decreases total energy consumption in high-power applications.

WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS

DOI: 10.37394/23201.2023.22.13

G. Erna, G. Srihari, M. Purna Kishore,

Ashok Nayak B., M. Bharathi

E-ISSN: 2224-266X

119

Volume 22, 2023

6 Hardware Implementation

Fig. 8: RTL schematic for Truncated Shifter Rounding Based Approximated Multiplier

Fig. 9: Simulation outcome of Truncated shifter RoBA multiplier

Fig. 10: Power consumption of Truncated shifter RoBA multiplier

Table 1. Comparison between the barrel shifter with Brent-Kung adder and the truncated shifter multiplier with carry-save adder in RoBA.

| Parameters | Existing ROBA Multiplier (Barrel Shifter Multiplier - Brent Kung Adder), [16] | Proposed ROBA Multiplier (Truncated Shifter - carry save adder) | Reduction (%) |
|---|---|---|---|
| Slice Register | 100 | | |
| LUTs | 370 | 243 | |
| Occupied Slices | 142 | 102 | |
| IOB | 32 | 24 | 25 |
| Delay (ns) | 52.093 | 35.311 | |
| Power (mW) | 0.015 | 0.014 | |
| Error Analysis | 1.35 | -99.99 | |

The proposed RoBA architecture effectively reduces the amount of power and energy used. The overall description of the proposed TSRoBA architecture is shown in Figure 3. The proposed RoBA consumes a minimum of 1440 bytes of memory without being used. The number of occupied slices is 102 out of a total of 1,430, a utilization rate of 7 percent. The number of flip-flops reported is 243 out of a total of 243, meaning that this resource is fully used. The overall latency was reduced by eliminating the right-hand (least significant) partial products.

According to the implementation results, the proposed TSRoBA has lower gate and net delays per LUT. All LUTs in the proposed TSRoBA have a gate delay of 0.254 ns, and the net delay values vary slightly between LUTs. The implementation shows that the proposed RoBA significantly reduces delay and power consumption. An overall summary of the findings is given in Table 1. The proposed design uses substantially fewer LUTs than the existing RoBA, a reduction of 34%. The proposed RoBA occupies 102 slices, while the existing RoBA occupies 142, a 28 percent reduction, and the slice registers are almost entirely eliminated in the truncated RoBA, as shown in Table 1.

The existing design in Table 1 uses a barrel shifter, a digital circuit that can shift its data input by a specified number of positions in a single clock cycle.

Fig. 11: Comparison chart between existing and proposed RoBA

It dissipates more power because its operation relies on a combinational circuit, and it is used in processor multiplication circuits. The Brent-Kung adder is a parallel-prefix adder that has more stages than the Kogge-Stone (KS) adder but uses fewer lookup tables, so its output takes longer to settle. A truncated multiplier is a multiplier that discards the least significant bits of the product, providing a truncated result.
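To make the effect of truncation concrete, the following minimal Python sketch drops every partial-product bit that falls in the least significant columns of an unsigned array multiplier. It is an illustration only, not the proposed circuit: the function name `truncated_multiply` and the parameters `width` and `dropped_columns` are hypothetical, and no correction constant is added to compensate for the discarded bits.

```python
def truncated_multiply(a: int, b: int, width: int = 8, dropped_columns: int = 4) -> int:
    """Unsigned multiply that ignores partial-product bits in the low columns.

    Bit j of `a` ANDed with bit i of `b` lands in column i + j of the
    partial-product array. Columns below `dropped_columns` are discarded,
    so the result never exceeds the exact product.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")
    result = 0
    for i in range(width):
        if not (b >> i) & 1:
            continue  # this row of partial products is all zeros
        for j in range(width):
            if (a >> j) & 1 and i + j >= dropped_columns:
                result += 1 << (i + j)
    return result


if __name__ == "__main__":
    exact = 15 * 15
    approx = truncated_multiply(15, 15, width=8, dropped_columns=4)
    print("exact:", exact, "truncated:", approx, "error:", exact - approx)
```

With `dropped_columns=0` the function returns the exact product. For 15 x 15 with four dropped columns it returns 176 instead of 225, which shows why truncated designs usually bound or compensate the discarded weight.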

Truncated shifters are typically used in arithmetic where only a limited number of bits is needed for the fractional part of the result. Carry-save adders are used in the early stages of multiplier circuits to sum partial products efficiently, and they support add-subtract operations on wider operands in approximate multipliers. A TSRoBA (Truncated Shifter Rounding Based Approximated Multiplier) may combine a truncated multiplier, a barrel shifter, and a carry-save adder built from XOR-MUX full adders, which reduces dynamic power and logic size.
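As an illustration of the adder structure mentioned above, the sketch below models a full adder in Python, assuming the common XOR-MUX arrangement in which the sum is produced by two XOR gates and the carry by a 2:1 multiplexer whose select line is a XOR b. It then uses that cell for a bitwise carry-save addition of three operands. The names `xor_mux_full_adder` and `carry_save_add` and the `width` parameter are hypothetical, and the gate-level details of the proposed design may differ.

```python
def xor_mux_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: two XORs for the sum, one 2:1 MUX for the carry."""
    p = a ^ b                 # propagate signal, also the MUX select
    s = p ^ cin               # sum bit
    cout = cin if p else a    # MUX: pass cin when a != b, else pass a (== b)
    return s, cout


def carry_save_add(x: int, y: int, z: int, width: int = 8) -> tuple[int, int]:
    """Reduce three `width`-bit operands to a sum word and a carry word.

    No carry ripples between bit positions; each carry is simply placed
    one position to the left, so x + y + z == sum_word + carry_word.
    """
    sum_word = 0
    carry_word = 0
    for i in range(width):
        s, c = xor_mux_full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        sum_word |= s << i
        carry_word |= c << (i + 1)
    return sum_word, carry_word


if __name__ == "__main__":
    s, c = carry_save_add(5, 9, 14)
    print("sum word:", s, "carry word:", c, "total:", s + c)
```

Here `carry_save_add(5, 9, 14)` returns `(2, 26)`, and a single final carry-propagate addition of the two words gives 28. Deferring carry propagation to that last step is what makes carry-save reduction attractive for summing many partial products.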

Similarly, compared with the existing (traditional) RoBA, the proposed RoBA decreases delay and power by 32% and 53%, respectively. The existing RoBA has a delay of 52.093 ns, while the proposed RoBA has a delay of 35.311 ns, a 32 percent reduction in delay. The power of the existing RoBA is calculated to be 0.030 mW, while the proposed RoBA consumes 0.014 mW, a 53 percent reduction in power consumption. The results show that the proposed TSRoBA performs significantly better than the existing RoBA. The power consumption is shown in Figure 10, and a comparison chart between the existing and proposed RoBA is presented in Figure 11.
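The reduction percentages quoted in this section follow from the relative difference (existing - proposed) / existing. The short Python sketch below reproduces them from the figures given in the text; the helper name `reduction_percent` and the dictionary layout are illustrative assumptions, and the power entry uses the 0.030 mW and 0.014 mW values quoted in the paragraph above.

```python
def reduction_percent(existing: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `existing`, in percent."""
    return (existing - proposed) / existing * 100.0


# (existing, proposed) pairs as quoted in this section
figures = {
    "LUTs": (370, 243),
    "Occupied slices": (142, 102),
    "IOB": (32, 24),
    "Delay (ns)": (52.093, 35.311),
    "Power (mW)": (0.030, 0.014),
}

for name, (existing, proposed) in figures.items():
    print(f"{name}: {reduction_percent(existing, proposed):.1f}% reduction")
```

Rounded to whole percentages, these evaluate to 34, 28, 25, 32, and 53, matching the values stated in the discussion.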

7 Conclusion

In this work, we introduced the TSRoBA multiplier, an energy-efficient approximate multiplier that bypasses the gates of the least significant partial products. The proposal implements a Truncated Shifter Rounding Method multiplier with a high-level parallel XOR-MUX adder to minimize power consumption and the number of logic gates in the multiplier. Using the truncated RoBA multiplier with the XOR-MUX adder reduces the logic required for the arithmetic and shifting operations. This work reconsiders the rounding multiplier, focusing on field approximation and capacity reduction. The power dissipation and area of rounding-based parallel multipliers can be effectively decreased by using truncated shifter RoBA multiplication. The synthesis results show that, with the XOR-MUX full adder, TSRoBA (Truncated Shifter Rounding Based Approximated Multiplier) reduces delay by 32% and lowers slice utilization relative to the existing RoBA multiplier for operand sizes of 8 bits. Additionally, the effectiveness of the presented approximate multiplication method has been investigated in two image processing applications: smoothing and brightening. The

comparison showed that the image parameters matched those of exact multiplication techniques.

Acknowledgements: We thank the college management and principal of PACE Institute of Technology & Sciences, Ongole, AP, India, for their continuous support of research in the domain of VLSI in signal processing.

Authors' contribution: In this innovation, the least significant columns are eliminated to minimize the primitive gate count. The truncated shifter RoBA multiplier is therefore used in DCT and DWT to sharpen the image pixels and, at the same time, compress the data in JPEG and MPEG.

Funding: Applicable. Before moving to a semi-custom ASIC, we implemented a prototype truncated RoBA for the digital signal processing operations required in the MAC and for image compression in the Discrete Cosine Transform. Once it is validated using the model simulation waveform, we proceed to the netlist in the physical design using the Cadence tool.

Coding Accessibility: Users can find the necessary schematic right here.

References

[1] Mahalingam, V., & Ranganathan, N. (2006). Improving accuracy in Mitchell's logarithmic multiplication using operand decomposition. IEEE Transactions on Computers, 55(12), pp.1523-1535.

[2] Kelly, D., Phillips, B., & Al-Sarawi, S. (2009).

Approximate signed binary integer multipliers

for arithmetic data value speculation

[3] Schulte, M. J., & Swartzlander, E. E. (1993, October). Truncated multiplication with correction constant [for DSP]. In Proceedings of IEEE workshop on VLSI signal processing (pp. 388-396). IEEE.

[4] Han, J., & Orshansky, M. (2013, May). Approximate computing: An emerging paradigm for energy-efficient design. In 2013 18th IEEE European Test Symposium (ETS) (pp. 1-6). IEEE.

[5] Momeni, A., Han, J., Montuschi, P., & Lombardi, F. (2014). Design and analysis of approximate compressors for multiplication. IEEE Transactions on Computers, 64(4), pp.984-994.

[6] Narayanamoorthy, S., Moghaddam, H. A., Liu, Z., Park, T., & Kim, N. S. (2014). Energy-efficient approximate multiplication for digital signal processing and classification applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(6), 1180-1184.

[7] Murugeswari, S., & Mohideen, S. K. (2014, July). Design of area efficient and low power multipliers using multiplexer based full adder. In Second International Conference on Current Trends In Engineering and Technology-ICCTET 2014 (pp. 388-392). IEEE.

[8] Schulte, M. J., & Swartzlander, E. E. (1993,

October). Truncated multiplication with

correction constant [for DSP]. In Proceedings of

IEEE workshop on VLSI signal processing (pp.

388-396). IEEE.

[9] Huang, Y. H., Ma, H. P., Liou, M. L., & Chiueh, T. D. (2004). A 1.1 G MAC/s sub-word-parallel digital signal processor for wireless communication applications. IEEE Journal of Solid-State Circuits, 39(1), 169-183.

[10] Radhakrishnan, P., & Themozhi, G. (2020). FPGA implementation of XOR-MUX full adder based DWT for signal processing applications. Microprocessors and Microsystems, 73, 102961.

[11] Chen, K. H., & Chiueh, T. D. (2003, May). Design and implementation of a reconfigurable FIR filter. In 2003 IEEE International Symposium on Circuits and Systems (ISCAS) (Vol. 4, pp. IV-IV). IEEE.

[12] Umasankar, A., Vasudevan, N., & Kirubanandasarathy, N. (2015). Area Efficient and Low Power Reconfigurable Fir Filter. International Journal of Computer Science and Network Security (IJCSNS), 15(8), 50.

[13] Mohanty, B. K., Meher, P. K., Singhal, S. K., & Swamy, M. N. S. (2016). A high-performance VLSI architecture for reconfigurable FIR using distributed arithmetic. Integration, 54, 37-46.

[14] de la Guia Solaz, M., Han, W., & Conway, R.

(2012). A flexible low power DSP with a

programmable truncated multiplier. IEEE

Transactions on Circuits and Systems I: Regular

Papers, 59(11), 2555-2568.

[15] Liu, C., Han, J., & Lombardi, F. (2014, March).

A low-power, high-performance approximate

multiplier with configurable partial error

recovery. In 2014 Design, Automation & Test in

Europe Conference & Exhibition (DATE) (pp. 1-

4). IEEE.

[16] Zendegani, R., Kamal, M., Bahadori, M., Afzali-Kusha, A., & Pedram, M. (2016). RoBA multiplier: A rounding-based approximate multiplier for high-speed yet energy-efficient digital signal processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(2), 393-401.

[17] Manogna, M. (2020). RoBA Multiplier: A

Rounding-Based Approximate Multiplier for

High-Speed yet Energy-Efficient Digital Signal

Processing (No. 4801). EasyChair.

[18] Mittal, A., Nandi, A., & Yadav, D. (2017).

Comparative study of 16‐order FIR filter design

using different multiplication techniques. IET

Circuits, Devices & Systems, 11(3), 196-200.

[19] de la Guia Solaz, M., & Conway, R. (2014).

Razor based programmable truncated multiply

and accumulate, energy-reduction for efficient

digital signal processing. IEEE Transactions on

Very Large Scale Integration (VLSI)

Systems, 23(1), 189-193.

[20] Balasubramanian, P., & Mastorakis, N. E.

(2009). High speed gate level synchronous full

adder designs. WSEAS Transactions on Circuits

and Systems, 8(2), 290-300.

[21] Erna, G., Saidulu, V., Srihari, G., & Rao, K. B.

(2024). FPGA Implementation of Adaptive

Hold Logic Vedic Fused Dot Product Floating-

Point Multiplier Using Razor Flip-

Flop. International Journal of Intelligent

Systems and Applications in

Engineering, 12(2s), 420-434.

[22] Kuang, S. R., & Wang, J. P. (2007). Design of

power-efficient pipelined truncated multipliers

with various output precision. IET Computers &

Digital Techniques, 1(2), 129-136.

[23] Osta, M., Ibrahim, A., Chible, H., & Valle, M.

(2017, September). Approximate multipliers

based on inexact adders for energy efficient data

processing. In 2017 New Generation of CAS

(NGCAS) (pp. 125-128). IEEE.

[24] Kumudha, G., & Ramesh, K. B. (2022). High

Speed Gate Level Synchronous Full Adder

Designs. Journal of VLSI Design and its

Advancement, 4(3).

[25] Suhasini, P., & Selvakumar, D. J. (2018).

Modified Rounding Based Approximate

Multiplier (MROBA) and MAC Unit Design for

Digital Signal Processing. International Journal

of Pure and Applied Mathematics, 118(18),

1539-1545.

[26] Swartzlander, E. E. (1999, October). Truncated

multiplication with approximate rounding.

In Conference Record of the Thirty-Third

Asilomar Conference on Signals, Systems, and

Computers (Cat. No. CH37020) (Vol. 2, pp.

1480-1483). IEEE.

[27] Schulte, M. J., Stine, J. E., & Jansen, J. G. (1999, March). Reduced power dissipation through truncated multiplication. In Proceedings IEEE Alessandro Volta Memorial Workshop on Low-Power Design (pp. 61-69). IEEE.

[28] Erle, M. A., Schulte, M. J., & Hickmann, B. J.

(2007, June). Decimal floating-point

multiplication via carry-save addition. In 18th

IEEE Symposium on Computer Arithmetic

(ARITH'07) (pp. 46-55). IEEE.

[29] Koc, C. K., & Hung, C. Y. (1990). Multi

operand modulo addition using carry save

adders. Electronics Letters, 26(6), 361-363.

[30] Alioto, M., & Palumbo, G. (2000). Modeling and optimized design of current mode MUX/XOR and D flip-flop. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(5), 452-461.

[31] Kumudha, G., & Ramesh, K. B. (2022). High

Speed Gate Level Synchronous Full Adder

Designs. Journal of VLSI Design and its

Advancement, 4(3).

Contribution of Individual Authors to the Creation

of a Scientific Article (Ghostwriting Policy)

- G. ERNA

Conceptualization: G. ERNA contributed the problem statement, identifying the problems in fixed multipliers: they multiply only unsigned operands, generate more partial products, and incur more switching activity. So, I decided to research innovative ideas in approximate multipliers with new architectural specifications, and finalized a truncated shifter rounding multiplier with a suitable adder.

Focusing: Writing the RTL code in Verilog HDL for the truncated shifter.

Validation: I conducted extensive simulations and validation experiments to ensure the simulation results had a lower error rate in digital signal processing.

- G. SRIHARI

Software: G. SRIHARI chose the Xilinx 14.7 software; he synthesized and implemented the truncated shifter and evaluated its performance.

Methodology: He studied fixed and approximate multipliers for image enhancement in recently published articles, together with possible rounding techniques. To reduce power dissipation in parallel multipliers, he concentrated on the problem analysis and was responsible for designing and formulating the methodologies employed.

- M. PURNA KISHORE

Hardware Implementation: M. PURNA KISHORE handled the Virtex-5 FPGA implementation and synthesis design aspects, ensuring the proposed techniques could be realized in hardware. He found prototype methods to minimize the logic gate count and identified power-saving opportunities through hardware selections in approximate multipliers.

Data Analysis: He conducted in-depth data analysis and interpretation of the simulation results obtained from the experiments using model simulation.

- ASHOK NAYAK B.

Writing - Original Draft Preparation: ASHOK NAYAK B. took the lead in drafting the initial manuscript, including the introduction, methodology, and results sections.

Writing - Review & Editing: He contributed to reviewing and editing the manuscript, providing critical input for clarity and coherence.

- M. BHARATHI

M. BHARATHI contributed an approach to saving circuit area in unsigned and signed designs by truncating some LSBs or portions of the partial products (PPs) of the input operands. The approach gave comparatively reduced magnitude errors and was used in image processing and machine learning applications.

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this

experiment.

Conflict of Interest

There is no conflict of interest between us.

Acceptance of participation. We have decided to share and publish this research in the journal.

Moral Endorsement. This is a portion of my proposed study results and is novel, with citations to resources.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_
