FPGA Implementation of High-Performance Truncated Rounding based
Approximate Multiplier with High-Level Synchronous XOR-MUX Full
Adder
G. ERNA1a, G. SRIHARI2b, M. PURNA KISHORE3, ASHOK NAYAK B.4c, M. BHARATHI5d
1Department of Electronics and Communication Engineering,
PACE Institute of Technology & Sciences (UGC Autonomous), Ongole-523272,
Andhra Pradesh,
INDIA
2Department of CSE, School of Technology,
The Apollo University, Chittoor, Andhra Pradesh,
INDIA
3Department of Electronics and Communication Engineering,
KKR & KSR Institute of Technology and Sciences (UGC Autonomous), Guntur, Andhra Pradesh,
INDIA
4Department of ECE, Marri Laxman Reddy Institute of Technology and Management (UGC
Autonomous), Dundigal, Telangana,
INDIA
5Department of ECE, School of Engineering & Technology,
Mohan Babu University, Erstwhile Sree Vidyanikethan Engineering College, A.P, Tirupati,
INDIA
aORCiD: 0000-0002-4427-9002
bORCiD: 0000-0002-0233-4958
cORCiD: 0009-0003-0628-9861
dORCiD: 0000-0002-8633-1921
Abstract: - Rounding-based approximate signed and unsigned multipliers are among the most active research topics in digital signal processing and image processing. In the present research, we propose performance-oriented logic simplification techniques for processing Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) images for sharpening. The approach yields a truncated shifter combined with XOR-MUX full adder logic, giving a reliable and cost-effective rounding-based approximate multiplier for signed and unsigned operands. Although many approximate multipliers build on this rounding technique, combining signed and unsigned capability sacrifices the ability to find the closest integer of a rounded value, which results in higher absolute errors than other rounding-based approximate multipliers. This work introduces a novel Truncated Shifter Rounding-based Approximate Multiplier integrated with a High-Level Synchronous XOR-MUX Full Adder to minimize the number of logic gates and the power consumption of the multiplier architecture. The truncated RoBA (Rounding-based Approximate Multiplier) with the XOR-MUX full adder reduces the logic size of the shifter and the arithmetic circuit, modifying the rounding-based approximate multiplier to minimize area, delay, and power consumption. The proposed architecture makes two fundamental changes: first, the barrel shifter is replaced with a truncated shifter multiplier using the XOR-MUX full adder, and second, the parallel prefix Brent-Kung adder is replaced with a carry-save adder using the XOR-MUX full adder. Finally, the architecture was designed in Verilog HDL and synthesized for the Xilinx Virtex-7 FPGA family, targeting the device xc7vx485tffg1157-1. Compared with the existing RoBA, it reduced area (LUTs) by 34%, power by 1%, delay by 32%, and error (per the error analysis) by 75%.
WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS
DOI: 10.37394/23201.2023.22.13
G. Erna, G. Srihari, M. Purna Kishore,
Ashok Nayak B., M. Bharathi
E-ISSN: 2224-266X
111
Volume 22, 2023
Key-Words: - FPGA, Truncated shifter, Barrel Shifter, Prefix adder, Carry save adder, XOR-MUX Full adder.
Received: January 11, 2023. Revised: October 6, 2023. Accepted: November 7, 2023. Published: December 4, 2023.
1 Introduction
One of the most effective ways to arrange input data before processing is the rounding approach. By exploiting the inherent error resilience of applications such as digital signal processing, multimedia, and machine learning, inexact circuits reduce the required hardware, [1]. In recent years, approximate computing has become popular as an achievable approach for designing energy-efficient digital systems. For approximate computing to succeed, the system or application must tolerate a certain amount of error or loss of optimality in the computed result. Multiplication in a circuit involves three steps: 1. partial product generation, 2. partial product accumulation, and 3. final addition. Truncating the least significant columns of the partial product matrix is frequently used to build fast and efficient multipliers, [2]. Almost all electronic systems face the design challenge of minimizing energy, especially portable devices such as tablets, smartphones, and other electronic gadgets. The goal is to reduce energy with a minimal performance penalty. In these mobile devices, signal processing blocks are vital components for realizing multimedia applications. The core computational blocks in DSP systems include an arithmetic logic unit with a multiplier that carries out the most demanding arithmetic operations. Hence, to improve processor efficiency, it is necessary to improve multiplier characteristics such as speed, power, and energy. Many DSP cores implement image and video processing algorithms whose final outputs, pictures or video, are intended for human use. This fact motivates the pursuit of efficiency in terms of speed or energy, [3]. Arithmetic units can be approximated at various design abstraction levels: circuit, logic, architecture, and the software layer of the system. Some approximation techniques rely on the timing violations caused by voltage over-scaling or over-clocking, whereas functional approximation methods modify the Boolean function itself. Functional approximation can be applied to arithmetic building blocks, such as adders and multipliers, at various design levels, [4].
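The three multiplication steps, together with truncation of the low-order partial-product columns, can be modelled at the bit level. The Python sketch below is illustrative only: it assumes unsigned n-bit operands, and the names `partial_products`, `truncated_multiply`, `max_truncation_error`, and the parameter `k` (number of discarded columns) are hypothetical, not part of the original design.

```python
def partial_products(a: int, b: int, n: int = 8) -> list[int]:
    """Step 1: row j is a AND b[j], shifted left by j positions."""
    return [(a << j) if (b >> j) & 1 else 0 for j in range(n)]


def truncated_multiply(a: int, b: int, n: int = 8, k: int = 4) -> int:
    """Steps 2 and 3 with truncation.

    Bits in the k least significant columns of the partial-product
    matrix are discarded; the remaining bits are accumulated and added.
    """
    keep_mask = ~((1 << k) - 1)  # clears columns 0 .. k-1
    return sum(row & keep_mask for row in partial_products(a, b, n))


def max_truncation_error(k: int = 4) -> int:
    """Worst-case shortfall for k <= n: every discarded bit is 1,
    and column c (c < n) holds c + 1 partial-product bits."""
    return sum((c + 1) << c for c in range(k))
```

Because only discarded bits are lost, the truncated result never exceeds the exact product, and the shortfall is at most `max_truncation_error(k)`.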
Rounding to the nearest value and shifting the products aim to achieve high speed, low dynamic switching activity, and energy-efficient performance in the approximate multiplier. Such approximate multipliers target error-resilient DSP applications, [5]. The recently developed truncated carry-save adder approximate multiplier improves area efficiency by modifying the conventional multiplication approach and the Boolean functions at different algorithm levels, under the assumption that the input values are rounded. The proposed method is a truncated shifter rounding-based approximate (RoBA) multiplier integrated with a carry-save adder that uses an XOR-MUX adder, [6]. This concept halves the partial products in the final addition. Image and video processing applications, in turn, depend on the accuracy of arithmetic operations together with the functionality of the system, [7]. Approximate multipliers for signed and unsigned integers rely on quantization techniques, namely truncation and rounding. Truncation discards all bits on the LSB side; an 8-bit RoBA, for example, cuts off the LSB part of the nearest-value multiplier, [8]. Approximating the computation reduces the arithmetic circuit: rounding and truncation minimize hardware complexity and therefore save power, but the multiplication output is no longer exact. The approximate multiplier generates errors relative to an exact multiplier, and these errors become relatively smaller as the operand size increases, [9]. By simple logic, rounding off lower-order bits results in less error than rounding off higher-order bits.
The approximate truncated shifter multipliers are designed using three approaches, which form the contributions of the present work:
1. Approximate partial product generation and removal of the least significant bits (LSBs) from the partial product coefficients.
2. Cutting the LSB partial products and omitting their logic gates using the truncation technique.
3. Using an inexact circuit with fewer elements to enhance the performance of the truncated shifter inexact multiplier.
These three approaches shrink the partial products and use low-power combinational logic with fewer hardware components; gates such as XOR and XNOR, which can be adapted to NAND gates, are used in the approximate multipliers. In this design, approximate computing avoids overflow and underflow because half of the product is discarded, [10].
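To illustrate why discarding fewer low-order bits costs less accuracy, and how truncation differs from rounding, the sketch below quantizes an exact product by removing its k least significant bits. It is a generic illustration under assumed parameters (8-bit unsigned operands, all input pairs weighted equally); `truncate_lsbs`, `round_lsbs`, and `mean_abs_error` are hypothetical helper names.

```python
def truncate_lsbs(x: int, k: int) -> int:
    """Discard the k least significant bits (round toward zero for x >= 0)."""
    return (x >> k) << k


def round_lsbs(x: int, k: int) -> int:
    """Round x >= 0 to the nearest multiple of 2**k (ties round up)."""
    if k == 0:
        return x
    return ((x + (1 << (k - 1))) >> k) << k


def mean_abs_error(quantize, k: int, n: int = 8) -> float:
    """Mean |exact - quantized| over all pairs of n-bit unsigned operands."""
    total = 0
    for a in range(1 << n):
        for b in range(1 << n):
            p = a * b
            total += abs(p - quantize(p, k))
    return total / (1 << (2 * n))
```

Truncation error per product lies in [0, 2^k), while rounding error is at most 2^(k−1), so for every product rounding is never worse than truncation; both mean errors grow as k grows.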
2 Literature Review on Inaccurate
Multiplier
This section reviews existing literature for high-speed,
energy-efficient, low-power DSP applications. The
existing literature considered for evaluation is
presented as follows.
A digit-reconfigurable finite impulse response (FIR) filter architecture with fine granularity, [11], is compact and low power while supporting a wide range of precisions and filter lengths. The architecture comprises an 8-digit reconfigurable FIR filter implemented in a 0.35 µm single-poly quadruple-metal CMOS process. Simulation results showed that the fabricated chip operates at 86 MHz and consumes 16.5 mW from a 2.5 V supply.
FIR filters are widely adopted in DSP applications because of their stability, linear phase, low finite-precision error, and efficient processing. A reconfigurable architecture with programmable shifts minimizes FIR filter complexity by replacing the conventional adder unit with a carry-save adder. It achieved a 12% reduction in power and area compared with the existing reconfigurable FIR filter, and it was implemented and evaluated on a Spartan-3 xc3s200-5pq208 field-programmable gate array (FPGA), [12].
Transposed-form FIR filters are inherently pipelined and support multiple constant multiplication (MCM) techniques, which significantly reduce computation. However, unlike the direct form, the transposed form does not directly support block processing. The developed technique explores block processing for transposed-form FIR filters to reduce the area-delay of higher-order filters, for both fixed and reconfigurable coefficients, based on a detailed analysis of the computations in the transposed-form FIR filter using canonical signed digit (CSD) coefficients. It addresses the difficulty of optimizing block transposed-form filters: a general multiplier-based block filter is constructed for reconfigurable applications, and an MCM scheme provides a low-complexity block implementation of the FIR filter. Compared with the existing block direct-form structure at larger filter lengths, the suggested architecture achieves a lower area-delay product (ADP) and energy per sample (EPS), whereas for short filter lengths the block direct-form structure retains lower EPS and ADP. Application-specific integrated circuit (ASIC) results for a block size of 4 and a filter length of 64 show that ADP and EPS are reduced by about 40% compared with the existing FIR filter structure. For the same filter length and block size, the proposed structure achieves 13% lower ADP and 12.8% lower EPS than the existing direct-form block FIR structure, [13].
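MCM schemes such as the one above commonly encode each fixed coefficient in canonical signed digit (CSD) form, so that a constant multiplication becomes a small number of shift-and-add or shift-and-subtract operations. The sketch below is a generic illustration of CSD encoding, not the specific architecture of [13]; the function names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(c: int) -> list[int]:
    """Canonical signed digit (non-adjacent form) digits of c, LSB first.

    Every digit is -1, 0 or +1 and no two adjacent digits are non-zero,
    which minimizes the number of non-zero digits (i.e. adders).
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # c mod 4 == 1 -> +1, c mod 4 == 3 -> -1
            c -= d           # c is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def csd_multiply(x: int, c: int) -> int:
    """Multiply x by the constant c using only shifts and additions/subtractions."""
    return sum(d * (x << i) for i, d in enumerate(to_csd(c)) if d != 0)
```

For example, `to_csd(7)` gives `[-1, 0, 0, 1]`, i.e. 7 = 8 − 1, so multiplying by 7 needs one subtraction instead of two additions.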
A new architecture for truncated multiplication has been presented and compared with previous implementations; it is the first DSP device to incorporate a truncated multiplier into a programmable DSP block, [14]. When used as part of a complete DSP system, the column-wise implementation has been shown to improve the power-SNR trade-off of arithmetic units based on truncated multipliers with very little overhead. In practice, software-based error compensation is a realistic and efficient solution. The efficiency of programmable truncation has been thoroughly tested in a DSP block designed for simple yet arithmetically demanding algorithms and evaluated at the system level. The findings show that conservative power savings of up to 15% of the full DSP power can be achieved while maintaining acceptable SNR levels. The practical advantages of truncated
multiplication are demonstrated using several real-world algorithms. The architecture was fabricated in a TSMC 90 nm process, and the samples worked correctly, delivering the expected functionality and power savings.
An approximate multiplier with lower power and a shorter critical path than traditional multipliers for high-performance DSP applications, [15], employs a newly developed approximate adder that limits carry propagation to the nearest neighbours for fast partial product accumulation. Configurable error recovery achieves different levels of accuracy by using different numbers of the most significant bits (MSBs) for error reduction. Because the approximate multiplier has a small mean error distance, most errors are small. Compared with the Wallace multiplier, a 16-bit approximate multiplier implemented in a 28 nm CMOS process reduces delay by 20% and power by up to 69%. With sufficient error recovery, this approximate multiplier achieves processing accuracy comparable to traditional exact multipliers, with significant power and efficiency gains.
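The idea of limiting carry propagation to nearby bit positions can be illustrated with a generic windowed-carry adder, in which the carry into each bit is estimated from only the w preceding bit positions. This is an assumed, simplified model for illustration; it is not the specific adder or error-recovery scheme of [15], and `windowed_add` and its parameter `w` are hypothetical.

```python
def windowed_add(a: int, b: int, n: int = 8, w: int = 2) -> int:
    """Approximate n-bit addition: the carry into bit i is computed from
    bits i-w .. i-1 only, assuming a carry of 0 into that window."""
    def bit(x: int, i: int) -> int:
        return (x >> i) & 1

    result = 0
    for i in range(n + 1):  # bit n holds the carry-out
        carry = 0
        for j in range(max(0, i - w), i):
            # ripple the carry through the short window only
            g = bit(a, j) & bit(b, j)
            p = bit(a, j) ^ bit(b, j)
            carry = g | (carry & p)
        result |= (bit(a, i) ^ bit(b, i) ^ carry) << i
    return result
```

When the window covers all lower bits (w ≥ n) the adder is exact; errors appear only for carry chains longer than w. For instance, `windowed_add(0b01111111, 1)` returns 120 instead of 128, because the carry generated at bit 0 cannot reach bits 3 and above.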
3 Existing Technique on RoBA
Figure 1 shows the existing RoBA multiplier, which consists of a sign detector, a rounding block, and three barrel shifters for high speed and energy efficiency. The sign detector determines whether each input is positive or negative; a negative input is converted by two's complementing into its absolute value before rounding. The rounding block then rounds each absolute value to the nearest value of the form 2^n. The rounded values Ar and Br are passed to three barrel shifters that form the products Ar×B, Br×A, and Ar×Br; since Ar and Br are powers of two, each product is a shift by log2 Ar or log2 Br positions, with the shifter width set by the n-bit operand size, [16]. The approach thus rounds the operands to neighbouring powers of two. In this method, multiplication is the key target for improving speed and energy consumption at minimal cost and error. The approach is applicable to both signed and unsigned multiplication, and it was realized in three hardware implementations of the approximate multiplier covering unsigned and signed operation. The performance of the RoBA multiplier was compared with exact and other approximate multipliers across various design parameters. The existing RoBA multiplier with barrel shifters was also evaluated in image processing applications such as image sharpening and smoothing, [17].
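The arithmetic behind the RoBA scheme can be written as A×B ≈ Ar×B + Br×A − Ar×Br, which differs from the exact product by (A − Ar)(B − Br). Because Ar and Br are powers of two, each of the three products is a single shift. The Python sketch below models this at the integer level, including sign handling; it is an illustrative model, not the authors' Verilog, and the function names are hypothetical. Its tie rule (round up when the bit just below the leading one is set, except that 3 rounds to 2) is an assumption chosen to match the rounding equations of Section 5.1.

```python
def round_pow2(m: int) -> int:
    """Nearest power of two to m >= 0 (0 maps to 0).

    Assumed tie rule: round up when the bit just below the leading one
    is set, except for m = 3, which is rounded down to 2.
    """
    if m == 0:
        return 0
    k = m.bit_length() - 1  # position of the leading one
    if k >= 2 and (m >> (k - 1)) & 1:
        return 1 << (k + 1)
    return 1 << k


def shift_mul(x: int, p2: int) -> int:
    """x * p2 for p2 a power of two (or 0), realized as a left shift."""
    return x << (p2.bit_length() - 1) if p2 else 0


def roba_multiply(a: int, b: int) -> int:
    """Approximate a*b as Ar*B + Br*A - Ar*Br on absolute values,
    then restore the sign (sign detector and sign set stages)."""
    negative = (a < 0) != (b < 0)
    A, B = abs(a), abs(b)
    Ar, Br = round_pow2(A), round_pow2(B)
    p = shift_mul(B, Ar) + shift_mul(A, Br) - shift_mul(Br, Ar)
    return -p if negative else p
```

For the operands used in Section 5.3 (M = 68, N = 104), the sketch rounds to 64 and 128 and returns 7168, against the exact product 7072. When either operand is already a power of two, the result is exact.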
Fig. 1: Existing RoBA Multiplier architecture with Brent Kung Adder
4 RoBA Integrated with Barrel Shifter
Included With Brent Kung Adder
In rounding-based approximate multipliers, the barrel shifter multiplies a binary number by a power of two by shifting it. For example, the products Br×A, Ar×B, and Ar×Br are formed by left shifts; in this scheme, the resulting partial products are half the size of the original partial products, and the number of partial products required for multiplication is halved. In addition to reducing the number of partial products, barrel shifters can also perform the rounding in nearest-value approximate multipliers. However, the approximation introduced by rounding causes errors in the final result, which may be acceptable in applications such as image sharpening in image and video processing. In a rounding-based approximate multiplier using a Brent-Kung adder, the inputs are first rounded to the nearest power of two. Very few partial products are then needed, which reduces the number of adder operations, and they are added using a Brent-Kung adder. The Brent-Kung adder uses a parallel prefix structure that reduces propagation delay; compared with the Kogge-Stone adder, it has a longer delay but requires fewer prefix cells and less wiring. Combining rounding-based approximate multipliers with a Brent-Kung adder reduces logic size and can yield significant savings in power consumption and chip area compared with traditional multipliers. However, the errors introduced by rounding may not be acceptable in some applications, [18]. The structure of the 8-bit Brent-Kung adder is presented in Figure 2.
Fig. 2: Structure of 8-Bit Brent-Kung adder
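The prefix structure of an 8-bit Brent-Kung adder can be checked with a small behavioural model. The sketch below computes carries with the usual generate/propagate (G, P) combine operator: an up-sweep forms group terms over power-of-two spans, and a down-sweep fills in the remaining carries. It is a Python model for illustration, not the paper's Verilog; `brent_kung_add` is a hypothetical name, and n is assumed to be a power of two.

```python
def brent_kung_add(a: int, b: int, n: int = 8) -> int:
    """Add two n-bit unsigned numbers with a Brent-Kung prefix network.

    Returns an (n + 1)-bit result (the top bit is the carry-out).
    n must be a power of two.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate
    G, P = g[:], p[:]

    def combine(i: int, j: int) -> None:
        # (G, P)[i] <- (G, P)[i] o (G, P)[j], where j covers the lower bits
        G[i] = G[i] | (P[i] & G[j])
        P[i] = P[i] & P[j]

    # Up-sweep: spans of 2, 4, 8, ... ending at positions 2d-1, 4d-1, ...
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2

    # Down-sweep: fill in the remaining prefix positions
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2

    # G[i] is now the carry out of bits 0..i (carry-in = 0)
    s = p[0]
    for i in range(1, n):
        s |= (p[i] ^ G[i - 1]) << i
    return s | (G[n - 1] << n)
```

For n = 8 the network uses 11 combine cells in 2·log2(8) − 1 = 5 levels, which is the area-for-depth trade-off that distinguishes Brent-Kung from Kogge-Stone.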
5 Proposed Method of Truncated
Shifter RoBA Carry Save Adder
The main contributions of this proposed method are twofold:
1. Truncated shifter RoBA multiplication with a modified full adder design based on XOR-MUX logic.
2. A reconfigurable architecture with three hardware variants of truncated shifter carry-save adder RoBA multiplication for both signed and unsigned operation, [19].
The 8-bit truncated shifter multiplier architecture combines an XOR-MUX full adder design, an 8-bit carry-save adder, and an 8-bit subtractor. The truncated shifter rounding-based approximate multiplier integrated with a carry-save adder and a synchronous XOR-MUX full adder is a digital circuit that performs approximate multiplication with reduced hardware complexity and power consumption, [20]. The truncated shifter truncates or rounds the partial products generated by the multiplier to reduce the number of bits that must be added, which lowers the overall hardware complexity and power consumption of the TSRoBA (truncated shifter rounding-based approximate multiplier). The carry-save adder performs fast, efficient addition of the partial products. The high-level synchronous XOR-MUX full adder implements the sum logic with XOR gates and the carry logic with multiplexers, and it operates synchronously with the clock signal. This synchronous design ensures that all outputs are stable and valid at the same time, which can improve the overall performance and reliability of the circuit. The approximate multiplier may introduce some error into the multiplication result; however, this error can be controlled by adjusting the precision of the truncated shifter and the approximation level used. The carry-save adder and synchronous XOR-MUX full adder improve circuit performance by reducing propagation delay and improving power efficiency, [21].
Fig. 3: The proposed TSRoBA (Truncated Shifter Rounding-Based Approximate Multiplier) overall architecture
5.1 An Implementation of Truncated RoBA
Multiplier
This study aimed to create a truncated carry-save adder RoBA multiplier that can be used for both signed and unsigned operations. The proposed TSRoBA is tested against three multipliers: truncated, accurate, and rounding (nearest-value) based. First, the sign of each input is determined, and negative inputs are converted to their absolute values. Next, the nearest-value block computes the power of two closest to each absolute value. The output of this block is n bits wide; for n-bit two's-complement operands, the most significant bit (MSB) of the absolute value is 0, except for the most negative input, −2^(n−1). The rounding block computes each output bit with the following equations to find the closest value for inputs A and B. In the planned architecture, the equations are written for a generic operand M (standing for A or B) whose rounded value is Mr, [22].
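$$M_r[3] = \left(\overline{M[3]}\,M[2]\,M[1] + M[3]\,\overline{M[2]}\right)\prod_{i=4}^{n-1}\overline{M[i]} \tag{1}$$
$$M_r[2] = M[2]\,\overline{M[1]}\prod_{i=3}^{n-1}\overline{M[i]} \tag{2}$$
$$M_r[1] = M[1]\prod_{i=2}^{n-1}\overline{M[i]} \tag{3}$$
$$M_r[0] = M[0]\prod_{i=1}^{n-1}\overline{M[i]} \tag{4}$$
$$M_r[n-1] = \overline{M[n-1]}\,M[n-2]\,M[n-3] + M[n-1]\,\overline{M[n-2]} \tag{5}$$
$$M_r[n-2] = \left(\overline{M[n-2]}\,M[n-3]\,M[n-4] + M[n-2]\,\overline{M[n-3]}\right)\overline{M[n-1]} \tag{6}$$
$$M_r[i] = \left(\overline{M[i]}\,M[i-1]\,M[i-2] + M[i]\,\overline{M[i-1]}\right)\prod_{j=i+1}^{n-1}\overline{M[j]} \tag{7}$$
$$M_r[3] = \left(\overline{M[3]}\,M[2]\,M[1] + M[3]\,\overline{M[2]}\right)\prod_{j=4}^{n-1}\overline{M[j]} \tag{8}$$
$$M_r[2] = M[2]\,\overline{M[1]}\prod_{j=3}^{n-1}\overline{M[j]} \tag{9}$$
$$M_r[1] = M[1]\prod_{j=2}^{n-1}\overline{M[j]} \tag{10}$$
$$M_r[0] = M[0]\prod_{j=1}^{n-1}\overline{M[j]} \tag{11}$$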
Based on equations (1)-(11), Mr[i] = 1 in two cases:
Case 1: M[i] = 1, all bits to the left of M[i] are 0, and M[i−1] = 0 (the value is rounded down to 2^i).
Case 2 (i ≥ 3): M[i] = 0, all bits to the left of M[i] are 0, and M[i−1] = M[i−2] = 1 (the value is rounded up to 2^i).
Because Mr and Nr are powers of two, the truncated shifter computes the products involving the rounded operands by shifting the other operand M (or N) left by log2 Mr (or log2 Nr) positions. For an n-bit input, the shifter block produces an output up to 2n bits wide. However, this approach leads to a higher fan-out, even at the minimal usable operand width, [23].
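Equations (1)-(11) can be checked mechanically by evaluating them bit by bit and comparing the result against an arithmetic definition of the rounding. In the Python sketch below, `round_bits` implements Eq. (7) for i ≥ 3 and Eqs. (2)-(4) for the three lowest bits; the helper names and the reference function `nearest_pow2` are hypothetical, and the input is assumed to be the absolute value of an n-bit two's-complement operand, so it never exceeds 2^(n−1).

```python
def bit(m: int, i: int) -> int:
    """Bit i of m."""
    return (m >> i) & 1


def round_bits(m: int, n: int = 8) -> int:
    """Rounded value Mr built bit by bit from the Boolean rounding equations."""
    mr = 0
    for i in range(n):
        # product of complemented bits M[j] for j = i+1 .. n-1
        upper_zero = int(all(bit(m, j) == 0 for j in range(i + 1, n)))
        if i >= 3:  # Eq. (7): round up from bit i-1 or round down from bit i
            term = ((1 - bit(m, i)) & bit(m, i - 1) & bit(m, i - 2)) | (
                bit(m, i) & (1 - bit(m, i - 1)))
        elif i == 2:  # Eq. (2)
            term = bit(m, 2) & (1 - bit(m, 1))
        else:  # Eqs. (3) and (4)
            term = bit(m, i)
        mr |= (term & upper_zero) << i
    return mr


def nearest_pow2(m: int) -> int:
    """Arithmetic reference with the same tie rule (3 -> 2, 6 -> 8, 12 -> 16)."""
    if m == 0:
        return 0
    k = m.bit_length() - 1
    if k >= 2 and (m >> (k - 1)) & 1:
        return 1 << (k + 1)
    return 1 << k
```

Evaluating both functions on every value from 0 to 128 (n = 8) shows that the equations implement nearest-power-of-two rounding, with ties resolved upward except for the value 3.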
5.2 Full Adder Designs using XOR-MUX Gates
Logic
The standard full adder in the reduction stage of the truncated RoBA multiplier is replaced by a modified full adder to reduce power and area. In the truncated shifter RoBA shown in Figure 3, the right half of the partial products is discarded. Power reduction in the TSRoBA multiplier was achieved by using an XOR-MUX-based full adder in the reduction process. This full adder is built around a 2:1 multiplexer and is incorporated into the carry-save adder; the resulting path delay is given in equation (12). In the full adder architecture, two stages of XOR gates handle the sum logic, as shown in Figure 4, while the multiplexer handles the carry logic. As a result, power dissipation caused by short-circuit current can be prevented. Furthermore, switching activity is reduced because the primary inputs drive all internal nodes directly. However, this power reduction comes at the expense of increased area, [24]. The truncated shifter and the XOR-MUX full adder were both designed into the TSRoBA, so these techniques trade accuracy for reduced power consumption or area compared with existing methods. Figure 4 illustrates how XOR gates and multiplexers are used in the approximate multiplier.
Fig. 4: Full adder circuit design using XOR-MUX logic
Delay=NOT+2MUX (12)
5.3 Reconfiguration of Truncated Shifter
RoBA
Truncated multiplication is an efficient method to minimize the hardware requirements of rounded parallel multipliers. In truncated multiplication, only the n + k most significant columns of the multiplication matrix are used to compute the product. The error caused by ignoring the n − k least significant columns and rounding the multiplication result to n bits is determined using the architecture in Figure 3. In the constant-correction truncated shifter multiplier presented in Figure 3, a constant is added to columns n − 1 to n − k of the multiplication matrix; the RTL schematic is shown in Figure 8 and the simulation outcome in Figure 9. In this scenario, ordinary (exact) multiplication of M = 68 and N = 104 produces the product 7072. The existing rounding-based approximate multiplier (RoBA) rounds the operands to Mr = 64 and Nr = 128 and produces the approximate output 7168. The truncated shifter RoBA output is 38, because the least significant part is discarded in the planned architecture shown in Figure 3, so the truncated architecture indicates a reduction in the relative errors as per the formula. The relative errors of the existing RoBA and the planned truncated RoBA, computed from the simulation results, are as follows, [25].
$$\text{Relative error (\%)} = \frac{(M \times N)_{\text{approximate}} - (M \times N)_{\text{exact}}}{(M \times N)_{\text{exact}}} \times 100$$
For the existing RoBA: (7168 − 7072) × 100 / 7072 ≈ 1.35%
The truncated RoBA multiplication output is 38, so its relative error is:
(38 − 7072) × 100 / 7072 ≈ −99.46%
The constant helps compensate for the reduction error caused by omitting the n − k least significant columns and for the error caused by rounding the product to n bits (called the rounding error). The total error, E_total, is the sum of these errors.
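The effect of the correction constant on the reduction error can be estimated with a small model. The sketch below keeps only the partial-product bits in columns n − k and above (discarding the n − k least significant columns) and adds a constant equal to the expected value of the discarded bits, rounded to a multiple of 2^(n−k) so that it falls in the retained columns. The constant assumes uniformly distributed operand bits, so each partial-product bit is 1 with probability 1/4. This is a simplified, hypothetical model: it ignores the final rounding to n bits, and the function names are illustrative.

```python
def truncated_product(a: int, b: int, n: int, k: int) -> int:
    """Sum of partial-product bits in columns n-k .. 2n-1 only."""
    mask = ~((1 << (n - k)) - 1)
    return sum((a << j) & mask for j in range(n) if (b >> j) & 1)


def correction_constant(n: int, k: int) -> int:
    """Expected value of the discarded bits, rounded to a multiple of 2**(n-k).

    Column c (< n) holds c + 1 partial-product bits, each 1 with probability 1/4.
    """
    expected = sum((c + 1) * 2**c for c in range(n - k)) / 4
    step = 1 << (n - k)
    return round(expected / step) * step


def mean_error(n: int, k: int, corrected: bool) -> float:
    """Mean signed error (exact - approximate) over all n-bit operand pairs."""
    const = correction_constant(n, k) if corrected else 0
    total = 0
    for a in range(1 << n):
        for b in range(1 << n):
            total += a * b - (truncated_product(a, b, n, k) + const)
    return total / (1 << (2 * n))
```

For n = 8 and k = 2, the uncorrected truncated product underestimates by 80.25 on average; adding the constant 64 reduces the mean error to 16.25.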
Fig. 5: Truncated Rounded Multiplication
The main benefits of the truncated shifter RoBA multiplier (Figure 5) are as follows, [26].
1. It provides an effective multiplier design for both signed and unsigned operation.
2. Minimal logic size.
3. Minimal power and delay.
4. Trimmed partial products in the approximate multiplier.
5. Several hardware logic primitives are available for its implementation.
6. Improved performance.
5.4 Truncated Shifter Integrated With Carry
Save Adder
Because many operations lie on similar critical paths, improving the efficiency of the structure by transistor sizing incurs substantial costs. When the output carry bits are passed diagonally downward to the next row instead of just to the right, as shown in Figure 6, the multiplication result remains unchanged. An extra adder, the final adder, is included to produce the desired result. The resulting multiplier is a carry-save multiplier, because the carry bits are not propagated immediately but are "saved" for the next adder stage. In the final step, a fast carry-propagate adder stage combines the carries and the sums, [27].
 = + ( 1) + (13)
Fig. 6: Architecture of CSA (CARRY SAVE ADDER)
Carry-save adders are used in the design of processor ALUs, where the need for higher computation speed motivated new designs. Adders and multipliers are core components of CMOS architectures, including high-performance processors. Unlike a conventional two-operand adder, a carry-save adder sums three or more n-bit binary numbers, and its outputs are obtained as a sum bit sequence and a carry bit sequence. In the proposed design, the LSB part is eliminated from the carry-save adder, which is mapped onto two lookup tables: one for the carry-out bit and one for the sum bit. The FPGA implementation of the CSA uses fewer logic gates by employing the XOR-MUX full adder design for the carry-propagate addition (CPA), minimizing LUT usage, [28].
5.5 Carry Save Adder With Logic Gate
CSA stands for Carry Save Adder, a digital logic
circuit that can quickly add multiple numbers. It is
commonly used in high-speed arithmetic circuits, such
as multipliers and digital signal processors.
Fig. 7: Addition operation of CSA with Full Adder
The operation of a CSA involves three main steps:
1. Carry-Save Operation: The input numbers are added bit by bit without propagating any carry from the previous bit position, generating a sum output S and a carry output C.
2. Cascaded Carry-Save Operation: Multiple CSA stages are cascaded to add more than two numbers, with the outputs of each CSA stage connected to the inputs of the next. In the TSRoBA, the carry outputs of each CSA stage are saved and used in the final addition step.
3. Final Addition: The saved carry outputs are added to the sum outputs of the last CSA stage, generating the final result of the addition operation.
A CSA can be implemented using various logic circuits, such as full adders and XOR gates; the specific implementation depends on the desired performance and application requirements, [29]. The addition operation of the CSA with full adders is presented in Figure 7.
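The three steps above can be modelled on integers with bitwise operations: a 3:2 carry-save stage produces a sum word (bitwise XOR) and a carry word (bitwise majority, shifted one column to the left), cascaded stages reduce many operands to two words, and a single carry-propagate addition produces the final result. This is a behavioural sketch assuming unsigned operands; `csa` and `csa_sum` are hypothetical names.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save stage: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z                           # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1  # per-bit carry, moved to the next column
    return s, c


def csa_sum(operands: list[int]) -> int:
    """Reduce any number of operands with cascaded CSA stages,
    then perform one final carry-propagate addition."""
    if not operands:
        return 0
    s, c = operands[0], 0
    for x in operands[1:]:
        s, c = csa(s, c, x)  # the carry word is saved, not propagated
    return s + c             # final carry-propagate addition (CPA)
```

Each stage has a delay of one full adder regardless of word length; only the final addition propagates carries across the word.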
5.6 The Ex-Or Mux Subtractor
In a truncated shifter rounding-based approximate multiplier for signed binary numbers, the EX-OR MUX subtractor is used to multiply two binary numbers with approximate accuracy through a combination of shifts, additions, and subtraction. The EX-OR MUX subtractor implements the third option, which selects the original value of the partial product rather than shifting or adding/subtracting it. The truncated shifter uses an exclusive-OR (EX-OR) gate to select between the original partial product and its two's complement (i.e., its negation), based on the operands M and Mr. The resulting products from each group are then added together, with intermediate results rounded and truncated as necessary to reduce the number of bits and improve performance. The overall accuracy of the multiplier is determined by the size and distribution of the partial products and by the accuracy of the rounding and truncation operations. The primary function of the sign set block is to adjust the sign of the final multiplication result: when the signs obtained from the sign detector for the two inputs differ, the sign set block modifies the subtractor's output accordingly, [30], [31].
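The XOR-based sign correction relies on the two's-complement identity −x = (x XOR 11…1) + 1: XORing every bit with a sign flag inverts the word only when the flag is 1, and feeding the same flag in as the carry-in completes the negation. The sketch below models this on a fixed-width word; the width, the function names `conditional_negate` and `to_signed`, and their use on the final product are assumptions for illustration.

```python
def conditional_negate(x: int, neg: int, width: int = 16) -> int:
    """Return x (neg = 0) or its two's complement -x (neg = 1) as a width-bit word."""
    mask = (1 << width) - 1
    flip = mask if neg else 0         # every XOR gate receives the sign flag
    return ((x ^ flip) + neg) & mask  # the flag doubles as the carry-in


def to_signed(word: int, width: int = 16) -> int:
    """Interpret a width-bit word as a two's-complement integer."""
    return word - (1 << width) if word >> (width - 1) else word
```

In a sign set stage, `neg` would be the XOR of the two input sign bits, so the product is negated only when exactly one operand is negative.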
5.7 Results and Discussion
The proposed TSRoBA uses 0 slice registers out of 11,440 available. The total number of LUTs used is 370, of which 365 are used as logic, out of 5,720 available. According to the utilization report, the proposed RoBA has a register utilization of 1%, a slice LUT utilization of 6%, and a logic utilization of 6%. The layout of the proposed RoBA multiplier design is shown in Figure 3; the shifter, adder, subtractor, sign detector, sign set, and rounding blocks are all parts of the proposed architecture. The second parameter studied is power consumption. The proposed RoBA method consumes 0.014 W, which suggests that the proposed solution can reduce total energy consumption in high-power applications.
6 Hardware Implementation
Fig. 8: RTL schematic for Truncated Shifter Rounded Based Approximated Multiplier
Fig. 9: Simulation outcome of Truncated shifter RoBA multiplier
Fig. 10: Power consumption of Truncated shifter RoBA multiplier
Table 1. Comparisons between Barrel shifter with Brent Kung adder and Truncated shifter multiplier with carry save adder in RoBA.

| Parameters | Existing RoBA Multiplier (Barrel Shifter Multiplier - Brent Kung Adder), [16] | Proposed RoBA Multiplier (Truncated Shifter - Carry Save Adder) | Reduction (%) |
|---|---|---|---|
| Slice Registers | 62 | 0 | 100 |
| LUTs | 370 | 243 | 34 |
| Occupied Slices | 142 | 102 | 28 |
| IOBs | 32 | 24 | 25 |
| Delay (ns) | 52.093 | 35.311 | 32 |
| Power (mW) | 0.015 | 0.014 | 1 |
| Error Analysis (%) | 1.35 | -99.99 | 75 |
The proposed RoBA architecture effectively
reduces power and energy consumption. The overall
structure of the proposed TSRoBA architecture is
shown in Figure 3. The proposed RoBA leaves a
minimum of 1,440 bytes of memory unused. It
occupies 102 of the 1,430 available slices, a
utilization of 7%. Of the 243 LUT flip-flop pairs
used, all 243 have an unused flip-flop, consistent
with the design using no slice registers. The overall
latency is reduced because the right-hand (least
significant) partial products are eliminated.
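The saving from removing the right-hand partial products can be illustrated with a small behavioural model. The Python sketch below uses hypothetical helpers (`truncated_multiply`, `max_truncation_error`), not the authors' RTL: it forms the n×n array of partial-product bits of an unsigned multiplication and simply never generates the bits that fall in the k least significant columns. No correction constant is added, which is an assumption of this sketch.

```python
def truncated_multiply(a: int, b: int, n: int = 8, k: int = 4) -> int:
    """Unsigned n x n multiply that drops partial-product bits in columns < k.

    Bit j of `a` ANDed with bit i of `b` lands in column i + j; bits in the
    columns below k (the right-hand side of the array) are never generated.
    """
    total = 0
    for i in range(n):                    # one partial-product row per bit of b
        if (b >> i) & 1:
            for j in range(n):
                if (a >> j) & 1 and i + j >= k:
                    total += 1 << (i + j)
    return total


def max_truncation_error(k: int) -> int:
    """Worst-case value of the discarded columns (assumes k <= n):
    column c holds at most c + 1 bits, each of weight 2**c."""
    return sum((c + 1) << c for c in range(k))


print(truncated_multiply(255, 255), 255 * 255)   # 64976 65025
```

With k = 0 the function returns the exact product; for larger k the result is never above the exact value and falls short by at most `max_truncation_error(k)`, which is the accuracy traded for the AND gates and adder cells of the removed columns.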
According to the implementation results, the LUTs of
the proposed TSRoBA have lower gate and net delays.
Every LUT in the proposed TSRoBA has a gate delay
of 0.254 ns, while the net delay varies slightly from
one LUT to another. These results show that the
proposed RoBA significantly reduces delay and power
consumption. Table 1 summarizes the findings. The
proposed design uses 34% fewer LUTs than the
existing RoBA. It occupies 102 slices, compared with
142 for the existing RoBA, a 28% reduction, and the
slice registers are eliminated entirely (from 62 to 0)
in the truncated RoBA, as shown in Table 1.
The existing design in Table 1 uses a barrel shifter,
a digital circuit that can shift a data word by a
specified number of bit positions in a single clock cycle.
Fig. 11: Comparison chart between existing and proposed RoBA
Because the barrel shifter is a large combinational
circuit, it dissipates more power. Barrel shifters are
used in processor multiplication circuits. The Brent-
Kung adder is a parallel-prefix adder that has more
logic stages than the Kogge-Stone (KS) adder but uses
fewer lookup tables; its longer path to the output
makes it slower. A truncated multiplier discards the
least significant bits of the product, yielding a
truncated result.
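For reference, the Brent-Kung structure of the existing design in Table 1 can be modelled at bit level. The Python sketch below computes generate and propagate signals, combines them with the standard prefix operator in an up-sweep tree followed by a down-sweep tree, and forms the sum bits. The function name, the power-of-two width restriction, and the zero carry-in are assumptions of this illustration, not details of the compared hardware.

```python
def brent_kung_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned width-bit integers with a Brent-Kung prefix network.

    Returns the (width + 1)-bit result, i.e. the sum bits plus the carry-out.
    """
    if width < 1 or width & (width - 1):
        raise ValueError("this sketch assumes a power-of-two width")
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate
    G, P = g[:], p[:]   # G[i], P[i] end up as group signals for bits i..0

    def combine(hi: int, lo: int) -> None:
        # prefix operator: (G, P)[hi] <- (G, P)[hi] o (G, P)[lo]
        G[hi] = G[hi] | (P[hi] & G[lo])
        P[hi] = P[hi] & P[lo]

    d = 1
    while d < width:                      # up-sweep: spans of 2, 4, 8, ... bits
        for i in range(2 * d - 1, width, 2 * d):
            combine(i, i - d)
        d *= 2
    d = width // 4
    while d >= 1:                         # down-sweep: fill in remaining prefixes
        for i in range(3 * d - 1, width, 2 * d):
            combine(i, i - d)
        d //= 2

    carries = [0] + G                     # carry into bit i is G[i - 1]; carry-in is 0
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i  # sum bit = propagate XOR incoming carry
    return total | (carries[width] << width)


print(brent_kung_add(200, 100))   # 300
```

The two sweeps need 2·log2(n) − 1 levels of prefix cells (5 for n = 8), against log2(n) levels for a Kogge-Stone tree, which is the extra depth mentioned above; in exchange, Brent-Kung uses fewer prefix cells (11 versus 17 for n = 8).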
Truncated shifters are typically used in arithmetic
where only a limited number of bits is needed for the
fractional part of the result. Carry-save adders are
used in the early stages of multiplier circuits to
accumulate partial products efficiently, and they
support add and subtract operations on wider
operands in approximate multipliers. The TSRoBA
(Truncated Shifter Rounding-Based Approximate
Multiplier) can combine a truncated multiplier, a
barrel shifter, and a carry-save adder built from
XOR-MUX full adders; this combination reduces
dynamic power and logic size.
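A carry-save adder built from XOR-MUX full adders can be sketched as follows. Each full adder computes the sum with two XOR stages and selects the carry with a 2:1 multiplexer driven by a XOR b (carry = c_in when a ≠ b, otherwise a), a common XOR-MUX formulation. The carry-save stage applies one full adder per bit column, so three operands collapse into a sum word and a carry word without any carry propagation. The helper names and the reduction loop over partial-product rows are assumptions for illustration, not the authors' design.

```python
def xor_mux_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: XOR gates for the sum, a 2:1 MUX for the carry."""
    p = a ^ b                 # first XOR (propagate), also the MUX select line
    s = p ^ cin               # second XOR gives the sum bit
    cout = cin if p else a    # MUX: pass cin when a != b, else a (which equals b)
    return s, cout


def carry_save_add(x: int, y: int, z: int, width: int) -> tuple[int, int]:
    """Reduce three width-bit words to (sum, carry) with x + y + z == sum + carry."""
    s_word = c_word = 0
    for i in range(width):
        s, c = xor_mux_full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        s_word |= s << i
        c_word |= c << (i + 1)    # carries move one column left, with no ripple
    return s_word, c_word


def csa_sum(rows: list[int], width: int) -> int:
    """Sum many rows (e.g. partial products) with carry-save stages and one final add.

    `width` must cover the largest possible total.
    """
    rows = list(rows)
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(x, y, z, width))
    return sum(rows)          # final carry-propagate addition


a, b = 173, 94
rows = [(a << i) if (b >> i) & 1 else 0 for i in range(8)]   # 8x8 partial products
print(csa_sum(rows, 16) == a * b)   # True
```

Only the final `sum` call propagates carries; every earlier stage is a row of independent full adders, which is why carry-save accumulation keeps the partial-product reduction shallow.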
Similarly, compared with the existing RoBA, the
proposed RoBA reduces delay and power by 32% and
53%, respectively. The existing RoBA has a delay of
52.093 ns, whereas the proposed RoBA has a nominal
delay of 35.311 ns, a 32% reduction. The power of
the existing RoBA is 0.030 mW, whereas the proposed
RoBA consumes 0.014 mW, a 53% reduction. These
results show that the proposed TSRoBA performs
significantly better than the existing RoBA. The
power consumption of the proposed multiplier is
shown in Figure 10, and a comparison chart between
the existing and proposed RoBA is presented in
Figure 11.
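The reduction percentages quoted in this section follow from reduction = (existing − proposed) / existing × 100. The short Python sketch below recomputes them from the values reported in Table 1 and, for power, from the 0.030 mW and 0.014 mW figures given in the text; the dictionary is just a convenient container.

```python
def reduction_percent(existing: float, proposed: float) -> float:
    """Relative saving of the proposed design with respect to the existing one."""
    return 100.0 * (existing - proposed) / existing


# (existing, proposed) pairs as reported in this section
reported = {
    "Slice registers": (62, 0),
    "LUTs": (370, 243),
    "Occupied slices": (142, 102),
    "IOBs": (32, 24),
    "Delay (ns)": (52.093, 35.311),
    "Power (mW, from text)": (0.030, 0.014),
}

for name, (old, new) in reported.items():
    print(f"{name}: {reduction_percent(old, new):.1f} %")
```

Rounded to whole numbers, these give 100%, 34%, 28%, 25%, 32%, and 53%.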
7 Conclusion
In this work, we introduced the TSRoBA multiplier,
an energy-efficient approximate multiplier that
bypasses the gates of the least significant partial
products. The design combines a truncated-shifter
rounding-based multiplier with a high-level parallel
XOR-MUX adder to minimize power consumption and
the number of logic gates in the multiplier. Using this
truncated RoBA multiplier with the XOR-MUX adder
reduces the logic needed for the arithmetic and
shifting operations. This work revisits the rounding-
based multiplier, focusing on approximation and
hardware reduction. Truncated-shifter RoBA
multiplication effectively decreases the power
dissipation and area of rounding-based parallel
multipliers. The synthesis results show that,
compared with the existing RoBA multiplier, the
TSRoBA (Truncated Shifter Rounding-Based
Approximate Multiplier) with XOR-MUX full adders
reduces delay by 32%, slice registers by 100%, and
relative error by 75% for 8-bit operands.
Additionally, the effectiveness of the presented
approximate multiplication method has been
investigated in two image processing applications,
smoothing and brightening. The
comparison showed that the image quality parameters
matched those obtained with exact multiplication.
Acknowledgements
We thank the management and principal of PACE
Institute of Technology & Sciences, Ongole, Andhra
Pradesh, India, for their continuous support of our
research on VLSI for signal processing.
Authors' contribution to this innovation: the least
significant columns are eliminated to minimize
primitive gate size. This truncated shifter RoBA
multiplier is therefore used in DCT and DWT to
sharpen image pixels while also compressing the data
in JPEG and MPEG.
Funding applicable: before moving to a semi-custom
ASIC, we implemented a prototype truncated RoBA
for the digital signal processing operations required
in MAC units and in image compression with the
Discrete Cosine Transform. Once it is validated using
ModelSim simulation waveforms, we proceed to the
netlist in physical design using Cadence tools.
Coding Accessibility: users can find the necessary
schematic here.
References
[1] Mahalingam, V., & Ranganathan, N. (2006). Improving accuracy in Mitchell's logarithmic multiplication using operand decomposition. IEEE Transactions on Computers, 55(12), 1523-1535.
[2] Kelly, D., Phillips, B., & Al-Sarawi, S. (2009). Approximate signed binary integer multipliers for arithmetic data value speculation.
[3] Schulte, M. J., & Swartzlander, E. E. (1993, October). Truncated multiplication with correction constant [for DSP]. In Proceedings of IEEE Workshop on VLSI Signal Processing (pp. 388-396). IEEE.
[4] Han, J., & Orshansky, M. (2013, May). Approximate computing: An emerging paradigm for energy-efficient design. In 2013 18th IEEE European Test Symposium (ETS) (pp. 1-6). IEEE.
[5] Momeni, A., Han, J., Montuschi, P., & Lombardi, F. (2014). Design and analysis of approximate compressors for multiplication. IEEE Transactions on Computers, 64(4), 984-994.
[6] Narayanamoorthy, S., Moghaddam, H. A., Liu, Z., Park, T., & Kim, N. S. (2014). Energy-efficient approximate multiplication for digital signal processing and classification applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(6), 1180-1184.
[7] Murugeswari, S., & Mohideen, S. K. (2014, July). Design of area efficient and low power multipliers using multiplexer based full adder. In Second International Conference on Current Trends in Engineering and Technology (ICCTET 2014) (pp. 388-392). IEEE.
[8] Schulte, M. J., & Swartzlander, E. E. (1993, October). Truncated multiplication with correction constant [for DSP]. In Proceedings of IEEE Workshop on VLSI Signal Processing (pp. 388-396). IEEE.
[9] Huang, Y. H., Ma, H. P., Liou, M. L., & Chiueh, T. D. (2004). A 1.1 G MAC/s sub-word-parallel digital signal processor for wireless communication applications. IEEE Journal of Solid-State Circuits, 39(1), 169-183.
[10] Radhakrishnan, P., & Themozhi, G. (2020). FPGA implementation of XOR-MUX full adder based DWT for signal processing applications. Microprocessors and Microsystems, 73, 102961.
[11] Chen, K. H., & Chiueh, T. D. (2003, May). Design and implementation of a reconfigurable FIR filter. In 2003 IEEE International Symposium on Circuits and Systems (ISCAS) (Vol. 4, pp. IV-IV). IEEE.
[12] Umasankar, A., Vasudevan, N., & Kirubanandasarathy, N. (2015). Area efficient and low power reconfigurable FIR filter. International Journal of Computer Science and Network Security (IJCSNS), 15(8), 50.
[13] Mohanty, B. K., Meher, P. K., Singhal, S. K., & Swamy, M. N. S. (2016). A high-performance VLSI architecture for reconfigurable FIR using distributed arithmetic. Integration, 54, 37-46.
[14] de la Guia Solaz, M., Han, W., & Conway, R. (2012). A flexible low power DSP with a programmable truncated multiplier. IEEE Transactions on Circuits and Systems I: Regular Papers, 59(11), 2555-2568.
[15] Liu, C., Han, J., & Lombardi, F. (2014, March). A low-power, high-performance approximate multiplier with configurable partial error recovery. In 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 1-4). IEEE.
[16] Zendegani, R., Kamal, M., Bahadori, M., Afzali-Kusha, A., & Pedram, M. (2016). RoBA multiplier: A rounding-based approximate multiplier for high-speed yet energy-efficient digital signal processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(2), 393-401.
[17] Manogna, M. (2020). RoBA Multiplier: A Rounding-Based Approximate Multiplier for High-Speed yet Energy-Efficient Digital Signal Processing (No. 4801). EasyChair.
[18] Mittal, A., Nandi, A., & Yadav, D. (2017). Comparative study of 16-order FIR filter design using different multiplication techniques. IET Circuits, Devices & Systems, 11(3), 196-200.
[19] de la Guia Solaz, M., & Conway, R. (2014). Razor based programmable truncated multiply and accumulate, energy-reduction for efficient digital signal processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(1), 189-193.
[20] Balasubramanian, P., & Mastorakis, N. E. (2009). High speed gate level synchronous full adder designs. WSEAS Transactions on Circuits and Systems, 8(2), 290-300.
[21] Erna, G., Saidulu, V., Srihari, G., & Rao, K. B. (2024). FPGA Implementation of Adaptive Hold Logic Vedic Fused Dot Product Floating-Point Multiplier Using Razor Flip-Flop. International Journal of Intelligent Systems and Applications in Engineering, 12(2s), 420-434.
[22] Kuang, S. R., & Wang, J. P. (2007). Design of power-efficient pipelined truncated multipliers with various output precision. IET Computers & Digital Techniques, 1(2), 129-136.
[23] Osta, M., Ibrahim, A., Chible, H., & Valle, M. (2017, September). Approximate multipliers based on inexact adders for energy efficient data processing. In 2017 New Generation of CAS (NGCAS) (pp. 125-128). IEEE.
[24] Kumudha, G., & Ramesh, K. B. (2022). High Speed Gate Level Synchronous Full Adder Designs. Journal of VLSI Design and its Advancement, 4(3).
[25] Suhasini, P., & Selvakumar, D. J. (2018). Modified Rounding Based Approximate Multiplier (MROBA) and MAC Unit Design for Digital Signal Processing. International Journal of Pure and Applied Mathematics, 118(18), 1539-1545.
[26] Swartzlander, E. E. (1999, October). Truncated multiplication with approximate rounding. In Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems, and Computers (Cat. No. CH37020) (Vol. 2, pp. 1480-1483). IEEE.
[27] Schulte, M. J., Stine, J. E., & Jansen, J. G. (1999, March). Reduced power dissipation through truncated multiplication. In Proceedings IEEE Alessandro Volta Memorial Workshop on Low-Power Design (pp. 61-69). IEEE.
[28] Erle, M. A., Schulte, M. J., & Hickmann, B. J. (2007, June). Decimal floating-point multiplication via carry-save addition. In 18th IEEE Symposium on Computer Arithmetic (ARITH'07) (pp. 46-55). IEEE.
[29] Koc, C. K., & Hung, C. Y. (1990). Multi-operand modulo addition using carry save adders. Electronics Letters, 26(6), 361-363.
[30] Alioto, M., & Palumbo, G. (2000). Modeling and optimized design of current mode MUX/XOR and D flip-flop. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(5), 452-461.
[31] Kumudha, G., & Ramesh, K. B. (2022). High Speed Gate Level Synchronous Full Adder Designs. Journal of VLSI Design and its Advancement, 4(3).
Contribution of Individual Authors to the Creation
of a Scientific Article (Ghostwriting Policy)
- G. ERNA
Conceptualization: G. ERNA contributed the
problem statement and identified the problems of
fixed multipliers: they multiplied only unsigned
operands, generated more partial products, and
suffered from more switching activity. So, I decided
to research innovative ideas in approximate
multipliers with new architectural specifications,
and finalized a truncated shifter rounding multiplier
with a suitable adder.
Focusing: Writing the RTL code in Verilog HDL
for the truncated shifter.
Validation: I conducted extensive simulations and
validation experiments to ensure that the simulation
results had a lower error rate in digital signal
processing.
- G. SRIHARI
Software: G. SRIHARI chose the Xilinx 14.7
software, in which he synthesized the truncated
shifter and evaluated its implementation performance.
Methodology: He studied fixed and inexact
multipliers for image enhancement in recent articles,
together with possible rounding techniques. To
reduce power dissipation in parallel multipliers, he
concentrated on the problem analysis and was
responsible for designing and formulating the
methodologies employed.
- M. PURNA KISHORE
Hardware Implementation: M. PURNA KISHORE
handled the Virtex-5 FPGA implementation and
synthesis design aspects, ensuring that the proposed
techniques could be realized in hardware. He found
prototype methods to minimize the logic gate count
and identified power-saving opportunities through
hardware choices in approximate multipliers.
Data Analysis: He conducted in-depth analysis
and interpretation of the simulation results obtained
from the ModelSim experiments.
- ASHOK NAYAK.B
Writing - Original Draft Preparation: ASHOK
NAYAK.B took the lead in drafting the initial
manuscript, including the introduction,
methodology, and results sections.
Writing - Review & Editing: He contributed to
reviewing and editing the manuscript, providing
critical input for clarity and coherence.
- M. BHARATHI
M. BHARATHI contributed the observation that an
effective way to save circuit area in unsigned and
signed designs is to truncate some LSBs or portions
of the partial products (PPs) of the input operands;
this gives comparatively small magnitude errors and
has been used in image processing and machine
learning applications.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this
experiment.
Conflict of Interest
There is no conflict of interest between us.
Acceptance of Participation
We have decided to share and publish this research
in the journal.
Moral Endorsement
This is a portion of my proposed study results; it is
novel and cites its sources.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US