Design of Modified Booth’s Encoder Using SPST Technique
1VASUDEVA. G, 2BHARATHI GURURAJ
1Department of ECE, Dayananda Sagar Academy of Technology and Management, Bangalore-74,
Karnataka, INDIA
2Department of ECE, ACS College of Engineering, Bangalore, Karnataka, INDIA
Abstract: This paper presents the design and implementation of a signed-unsigned Modified Booth Encoding
(SUMBE) multiplier. The existing Modified Booth Encoding (MBE) multiplier and the Baugh-Wooley
multiplier perform multiplication on signed numbers only. The modified Booth encoder circuit generates half
the partial products in parallel. By extending the sign bit of the operands and generating an additional partial
product, the SUMBE multiplier is obtained. A Carry Save Adder (CSA) tree and a final Carry Look-ahead (CLA)
adder are used to speed up the multiplier operation. Since signed and unsigned multiplication are performed by
the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power
dissipation and cost of a system. The proposed radix-2 modified Booth algorithm MAC with SPST gives a
factor of 5 less delay and 7% less power consumption compared to an array MAC.
Keywords: CLA, CSA, SUMBE, Booth Encoder
Received: October 23, 2022. Revised: April 28, 2023. Accepted: June 11, 2023. Published: July 18, 2023.
1. Introduction
1.1 Common Features of Multipliers:
1.1.1 Counterflow Organization:
A novel multiplier organization is introduced, in
which the data bits flow in one direction, and the
Booth commands [1] are piggybacked on the
acknowledgments flowing in the opposite direction.
1.1.2 Merged Arithmetic/Shifter Unit:
An architectural optimization is introduced that
merges the arithmetic operations and the shift
operation into the same function unit, thereby
obtaining significant improvement in area, energy
and speed[1].
1.1.3 Overlapped Execution:
The entire design is pipelined at the bit level,
which allows overlapped execution of multiple
iterations of the Booth algorithm, including across
successive multiplications. As a result, both the
cycle time per Booth iteration and the overall cycle
time per multiplication are significantly
improved [2].
1.1.4 Modular Design:
The design is quite modular, which allows the
implementation to be scaled to arbitrary operand
widths[2] without the need for gate resizing, and
without incurring any overhead on iteration time.
1.1.5 Precision-Energy Trade-Off:
Finally, the architecture can be easily modified to
allow dynamic specification of operand widths, i.e.,
successive operations of a given multiplier
implementation could operate upon different word
lengths [4].
This work also draws on a new architecture of
multiplier and accumulator (MAC) for high-speed
arithmetic. By combining multiplication with
accumulation and devising a hybrid type of carry
save adder (CSA), the performance was improved.
Since the accumulator, which has the largest delay
in a MAC, was merged into the CSA, the overall
performance was elevated. The proposed CSA tree
uses a 1’s-complement-based radix-2 modified
Booth’s algorithm (MBA) and has a modified array
for the sign extension in
International Journal of Electrical Engineering and Computer Science
DOI: 10.37394/232027.2023.5.9
Vasudeva. G, Bharathi Gururaj
E-ISSN: 2769-2507
73
Volume 5, 2023
order to increase the bit density of the operands. The
proposed MAC shows superior properties to the
standard design in many ways, with performance
twice that of previous research at a similar clock
frequency. We expect that the proposed MAC [6]
can be adapted to various fields requiring high
performance, such as signal processing.
Advanced digital processors now have fast
bit-parallel multipliers embedded in them.
Multipliers for unsigned numbers are designed in a
wide variety of ways, each method having its own
advantages and tradeoffs. In recent years,
high-speed multipliers have played an important
role in the design of any architecture, and
researchers are still working on many factors to
increase the speed of operation of these basic
elements [7]. Algorithms for designing high-speed
multipliers have been modified and developed for
better efficiency.
The increased complexity of various applications
demands not only faster multiplier chips but also
smarter and more efficient multiplying algorithms
that can be implemented in the chips [8]. The
tradeoffs to be considered depend on the need of
the hour and on the application in which the
multiplier is used. Generally, the efficiency of
multipliers is classified based on the variation in
speed, area and configuration. Due to the rapidly
growing system-on-chip industry, not only faster
units but also smaller area and lower power have
become major concerns in designing very large
scale integration (VLSI) circuits [11]. Digital
circuits make use of digital arithmetic. Among the
various arithmetic operations, multiplication is one
of the fundamental operations and is performed
using an adder.
There are many ways to build a multiplier, each
providing a trade-off between delay and other
characteristics [15], such as area and energy
dissipation. The objective of a good multiplier and
accumulator (MAC) [24] is to provide a physically
compact, high-speed and low-power chip. To save
significant power in a VLSI design [16], it is a good
direction to reduce its dynamic power, which is the
major part of total power dissipation.
This paper proposes a high-speed MAC
adopting the new Spurious Power Suppression
Technique (SPST) implementing approach. This
multiplier and accumulator is designed by
equipping the SPST on a modified Booth encoder,
which is controlled by a detection unit using an
AND gate. The modified Booth encoder reduces
the number of partial products generated by a factor
of 2. The SPST adder avoids unwanted additions
and thus minimizes the switching power
dissipation [17].
2. Related Work
The number and variety of products that include
some form of digital signal processing (DSP) has
grown dramatically over the last years [25]. DSP has
become a key component in many consumer,
communications, medical, and industrial products.
These products use a variety of hardware
approaches [26] to implement DSP, ranging from
the use of off-the-shelf microprocessors to
field-programmable gate arrays (FPGAs) to custom
integrated circuits (ICs). Programmable “DSP
processors,” a class of microprocessors optimized
for DSP [12], are a popular solution for several
reasons. In comparison to fixed-function solutions,
they have the advantage of potentially being
reprogrammed in the field, allowing product
upgrades or fixes. They are often more
cost-effective (and less risky) than custom hardware,
particularly for low-volume applications, where the
development cost of custom ICs [13] may be
prohibitive. In comparison to other types of
microprocessors, DSP processors often have an
advantage in terms of speed, cost, and energy
efficiency [1].
There are many implementation media available
for signal processing. These implementations vary
in programmability from fixed-functionality
hardware such as ASICs [9] to fully programmable
devices such as general-purpose processors.
The emergence of the new architecture [5], which
offers the same computational attributes as
fixed-functionality architectures in a package that
can be customized in the field, is driven by a need for
real-time performance within the given operational
parameters of a target system [14] and a need to
adapt to changing data sets, computing conditions,
and execution environments of DSP applications. In
this paper we use three main ideas: VHDL,
architecture pipelining, and implementation on
FPGAs. More details on FPGAs can be found in
[2]. This design has been simulated using the
ModelSim software and then implemented on an FPGA.
Table 1. Recoding of bits using Modified Booth’s Encoder
Xi+1   Xi   Xi-1   Partial product
0      0    0       0 × Y
0      0    1       1 × Y
0      1    0       1 × Y
0      1    1       2 × Y
1      0    0      -2 × Y
1      0    1      -1 × Y
1      1    0      -1 × Y
1      1    1       0 × Y
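The recoding in Table 1 can be sketched in a few lines of Python. This is only an illustrative model, not the authors' hardware description: the names `BOOTH_TABLE` and `booth_digits` are hypothetical, and the sketch assumes an even operand width with an implicit 0 appended to the right of the LSB.

```python
# Recoding table taken from Table 1:
# (x[i+1], x[i], x[i-1]) -> multiple of the multiplicand Y
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(x, width):
    """Return the Booth digits of a width-bit two's-complement value.

    The digit for the least significant group comes first.
    `width` is assumed to be even (an assumption of this sketch).
    """
    if width % 2:
        raise ValueError("width must be even")
    x &= (1 << width) - 1
    padded = x << 1  # implicit 0 to the right of the LSB
    digits = []
    for i in range(0, width, 2):
        triple = (padded >> i) & 0b111  # bits x[i+1], x[i], x[i-1]
        key = ((triple >> 2) & 1, (triple >> 1) & 1, triple & 1)
        digits.append(BOOTH_TABLE[key])
    return digits


if __name__ == "__main__":
    print(booth_digits(20, 10))
```

Each digit selects one row of Table 1, so a `width`-bit multiplier yields `width / 2` partial products instead of `width`.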
3. Modified Booth Encoder
In order to achieve high speed, multiplication
algorithms using parallel counters, such as the
modified Booth algorithm [18], have been proposed,
and some multipliers based on these algorithms have
been implemented for practical use. This type of
multiplier operates much faster than an array
multiplier for longer operands because its
computation time is proportional to the logarithm of
the word length of the operands [19]. Booth
multiplication is a technique that recodes the
multiplier term [20]: we consider the bits in blocks
of three, such that each block overlaps the previous
block by one bit. Grouping starts from the LSB, and
the first block only uses two bits of the multiplier.
Figure 1 shows the grouping of bits from the
multiplier term for use in modified Booth encoding.
Figure 1. Recoding of bits
                0 0 0 0 0 1 0 0 0      8
              × 0 0 0 0 1 0 1 0 0      20
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0      0 × Y
  0 0 0 0 0 0 0 0 0 0 1 0 0 0          1 × Y
+ 0 0 0 0 0 0 0 0 1 0 0 0              1 × Y
  0 0 0 0 0 0 0 0 0 0                  0 × Y
  0 0 0 0 0 0 0 0                      0 × Y
  0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0      160
Figure 2. Example for Modified Booth Encoder
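The 8 × 20 example can be reproduced with a short Python sketch that forms one shifted partial product per Booth digit and sums them. The function name `booth_multiply` and its return shape are assumptions made for illustration; the digit table is the one listed in Table 1, and the operand width is assumed to be even.

```python
def booth_multiply(y, x, width):
    """Multiply y by the width-bit two's-complement value x.

    Returns (product, rows), where rows holds one (digit, partial_product)
    pair per Booth group, least significant group first.
    """
    # Three-bit group (x[i+1], x[i], x[i-1]) -> multiple of y, as in Table 1
    digit_of = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    padded = (x & ((1 << width) - 1)) << 1  # implicit 0 right of the LSB
    rows = []
    product = 0
    for k in range(width // 2):
        digit = digit_of[(padded >> (2 * k)) & 0b111]
        partial = (digit * y) << (2 * k)  # each group is weighted by 4**k
        rows.append((digit, partial))
        product += partial
    return product, rows


if __name__ == "__main__":
    product, rows = booth_multiply(8, 20, 10)
    for digit, partial in rows:
        print(f"{digit:+d} x Y -> {partial}")
    print("product =", product)
```

With a 10-bit multiplier the sketch produces five rows, matching the five partial products of the example, of which only two are non-zero.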
4. Proposed Low Power High Performance Multiplier and Accumulator
The proposed high-speed low-power
multiplier is designed by equipping the SPST on a
tree multiplier. There are two distinguishing design
considerations in designing the proposed multiplier,
as described in the following.
Applying the SPST on the Modified Booth Encoder
Figure 8 shows a computing example of Booth
multiplying two numbers, “2AC9” and “006A”. The
shadow denotes that the numbers in this part of the
Booth multiplication are all zero, so this part of the
computation can be neglected. Saving those
computations can significantly reduce the power
consumption caused by transient signals. According
to the analysis of the multiplication shown in
Figure 8, we propose the SPST-equipped modified
Booth encoder, which is controlled by a detection
unit. The detection unit has one of the two operands
as its input and decides whether the Booth encoder
calculates redundant computations, as shown in
Figure 9. The latches can freeze the inputs of MUX-4
to MUX-7 when PP4 to PP7 are zero, or only those of
MUX-6 to MUX-7 when PP6 to PP7 are zero, to reduce
the transition power dissipation. Figure 10 shows the
Booth partial product generation circuit, which
includes AND/OR/EX-OR logic.
The former SPST has been discussed in [9] and [10].
The data are separated into the Most Significant
Part (MSP) and the Least Significant Part (LSP) [23].
Figure 4 shows the five cases of a 16-bit addition in
which spurious switching activities occur. The 1st
case illustrates a transient state in which spurious
transitions of carry signals occur in the MSP
although the final result of the MSP is
unchanged [21]. The 2nd and 3rd cases describe the
situations of one negative operand added to one
positive operand without and with carry from the
LSP, respectively. Moreover, the 4th and 5th cases
respectively demonstrate the addition of two
negative operands without and with carry-in from
the LSP [22]. In those cases, the results of the MSP
are predictable; therefore, the computations in the
MSP are useless and can be neglected.
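The role of the detection unit in the SPST-equipped Booth encoder described earlier in this section can be illustrated with a small behavioural sketch in Python. It assumes that the detection unit inspects the 16-bit Booth-encoded operand and that PP0 to PP7 correspond to the eight three-bit groups of Table 1; the names `booth_digits16` and `first_frozen_pp` are hypothetical, and the sketch models only the decision, not the latch and MUX circuitry.

```python
# Three-bit group (x[i+1], x[i], x[i-1]) -> Booth digit, as in Table 1
DIGIT_OF = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
            0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_digits16(b):
    """Return the eight Booth digits of a 16-bit operand, PP0 first."""
    padded = (b & 0xFFFF) << 1  # implicit 0 to the right of the LSB
    return [DIGIT_OF[(padded >> (2 * k)) & 0b111] for k in range(8)]


def first_frozen_pp(b):
    """Decide which partial products can be frozen for operand b.

    Returns 4 when PP4..PP7 are all zero, 6 when only PP6..PP7 are zero,
    and None when no upper partial products can be skipped.
    """
    digits = booth_digits16(b)
    if all(d == 0 for d in digits[4:]):
        return 4
    if all(d == 0 for d in digits[6:]):
        return 6
    return None


if __name__ == "__main__":
    for operand in (0x006A, 0x0400, 0x2AC9):
        print(hex(operand), booth_digits16(operand), first_frozen_pp(operand))
```

For the operand “006A” of the example, the four upper digits are zero, so under these assumptions the inputs of MUX-4 to MUX-7 could be frozen.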
The SPST uses a detection logic circuit to detect the
effective data range of arithmetic units, e.g., adders
or multipliers [24]. When a portion of the data does
not affect the final computing results, the data
controlling circuits of the SPST latch this portion to
avoid useless data transitions occurring inside the
arithmetic units. In addition, a data asserting
control, realized using registers, further filters
out the useless spurious signals of the arithmetic
unit every time the latched portion is turned
on. This asserting control brings evident power
reduction. Figure 5 shows the design of the low-power
adder/subtractor with SPST.
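The idea that the MSP result is predictable whenever both MSP inputs are pure sign extension can be checked with a behavioural Python model. This is a sketch under stated assumptions, not the authors' circuit: the 16-bit word is assumed to be split into an 8-bit MSP and an 8-bit LSP, and the name `spst_add` and the returned `msp_active` flag are hypothetical.

```python
MASK16 = 0xFFFF
LSP_BITS = 8  # assumed split point between LSP and MSP


def spst_add(a, b):
    """Add two 16-bit two's-complement words.

    Returns (sum, msp_active). msp_active is False when the MSP adder
    could stay latched because its result follows from the operand signs
    and the LSP carry alone.
    """
    a &= MASK16
    b &= MASK16
    lsp_mask = (1 << LSP_BITS) - 1
    msp_mask = MASK16 >> LSP_BITS

    # The LSP addition always runs.
    lsp_sum = (a & lsp_mask) + (b & lsp_mask)
    carry = lsp_sum >> LSP_BITS
    lsp = lsp_sum & lsp_mask

    a_msp, b_msp = a >> LSP_BITS, b >> LSP_BITS
    # Detection logic: each MSP is all zeros or all ones (sign extension).
    predictable = a_msp in (0, msp_mask) and b_msp in (0, msp_mask)
    if predictable:
        # An all-ones MSP equals -1 modulo 2**8, so the MSP result depends
        # only on how many operands are negative and on the LSP carry.
        negatives = (a_msp != 0) + (b_msp != 0)
        msp = (carry - negatives) & msp_mask
        msp_active = False
    else:
        msp = (a_msp + b_msp + carry) & msp_mask
        msp_active = True
    return (msp << LSP_BITS) | lsp, msp_active


if __name__ == "__main__":
    print(spst_add(0x0012, 0x0034))
    print(spst_add(0xFFFE, 0x0003))
    print(spst_add(0x1234, 0x0001))
```

The predictable branch covers the combinations of positive and negative small operands, with and without a carry from the LSP, so in those situations the model never needs the 8-bit MSP addition.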
The adder/subtractor is divided into two parts, the
most significant part (MSP) and the least significant
part (LSP). The MSP of the original adder/subtractor
is modified to include detection logic circuits, data
controlling circuits, sign extension circuits, and
logic for calculating the carry-in and carry-out
signals. The most important part of this study is the
design of the control signal asserting circuits,
denoted as asserting circuits in Figure 2. Although
this asserting circuit brings evident power
reduction, it may induce additional delay. There are
two implementing approaches for the control signal
assertion circuits. The first approach uses registers,
as illustrated in Figure 6. The three output
signals of the detection logic are close, Carr_ctrl
and sign. The timing restriction needed to keep the
registers from latching wrong control values
usually decreases the overall speed of the applied
designs.
Figure 4. Booth’s Encoder using LSP and MSP adder
Figure 5. Illustration of Multiplication using Modified Booth Encoding
Figure 6. Booth Partial Product Selector Logic
Figure 7. SPST Equipped Modified Booth encoder
A. Applying the SPST on the Compression Tree
The proposed SPST-equipped multiplier is
illustrated in Figure 11. The PP generator generates
five candidates of the partial products, i.e., {-2A, -A,
0, A, 2A}. These are then selected according to the
Booth encoding results of the operand B. When the
operand other than the Booth-encoded one has a
small absolute value, there are opportunities to
reduce the spurious power dissipation.
The radix-2 modified Booth MAC with SPST
performs both multiplication and accumulation. The
multiplication result is obtained by multiplying the
multiplicand and the multiplier. This multiplication
with the SPST module is shown in Figure 8.
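A behavioural sketch of one multiply-accumulate step is given below in Python. It assumes 16-bit two's-complement inputs and a 32-bit accumulator that wraps on overflow, and it selects each partial product from the five candidates {-2A, -A, 0, A, 2A} according to the Booth digits of B. The names `to_signed` and `mac_step` are hypothetical, and the SPST latching itself is not modelled.

```python
MASK32 = 0xFFFFFFFF

# Three-bit group (b[i+1], b[i], b[i-1]) -> Booth digit, as in Table 1
DIGIT_OF = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
            0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac_step(acc, a, b):
    """Return (acc + a * b) as a 32-bit two's-complement word.

    a and b are 16-bit two's-complement words; b is the Booth-encoded one.
    """
    a = to_signed(a, 16)
    # The five partial product candidates produced by the PP generator.
    candidates = {-2: -2 * a, -1: -a, 0: 0, 1: a, 2: 2 * a}
    padded = (b & 0xFFFF) << 1  # implicit 0 to the right of the LSB
    product = 0
    for k in range(8):
        digit = DIGIT_OF[(padded >> (2 * k)) & 0b111]
        product += candidates[digit] << (2 * k)
    return (acc + product) & MASK32


if __name__ == "__main__":
    acc = 0
    for a, b in ((0x2AC9, 0x006A), (0xFFFD, 0x0007)):
        acc = mac_step(acc, a, b)
    print(hex(acc), to_signed(acc, 32))
```

Keeping the accumulator as a masked 32-bit word mirrors the 16-bit input and 32-bit output widths reported for the RTL schematic.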
Figure 8. MAC with SPST Module
The schematic of the MAC unit with the SPST module
is obtained as an RTL schematic from the Xilinx tool.
The RTL schematic of the MAC with SPST module
consists of a 16-bit input and a 32-bit output.
Figure 9. RTL Schematic of MAC Unit.
5. Results and Discussion
The MBE code is written in Verilog, then synthesized
and simulated using the Xilinx and ModelSim tools.
The netlist generated by the RC compiler tool from
Cadence is given as an input to the physical design.
The partitioning and floor planning are done using
the Cadence Assura tool. The W/L ratio is fixed at
0.33 for all the horizontal and vertical pads and
rings, and also for the stripes. Next, power planning
is done. After that, the pads are placed in such a way
that the aspect ratio is minimized. Next, routing is
done for interconnection between the wires. Finally,
GDSII generation is done and the ASCII code is
generated, which is taken as an input to the
fabrication process.
The FPGA and ASIC results are shown in
Figures 10 and 11. Figure 10 shows the waveform,
i.e., the simulation result for an 8x8-bit MBE. The
final 32-bit output is obtained and is divided into
y0, y1, y2 and y3. The physical design output is
obtained using the Assura tool from Cadence, and
the netlist is generated by the RC compiler tool from
Cadence. Finally, we compare the FPGA and ASIC
methodologies.
Figure 10. Simulation results for MBE using ModelSim
Figure 11. Final Routing Output
Table 2. Comparison of Modified Booth encoder using FPGA and ASIC
Parameter            FPGA        ASIC
Area                 62%         16.3%
Power dissipation    123 mW      32763.567 nW
Delay                1.901 ns    7950 ps
Hence ASIC is preferred when compared to FPGA.
6. Conclusion and Scope for Future Work
A radix-2 Booth multiplier is implemented
here, and the complete implementation gives a
higher speed of operation. The four cycles of the
shifting process, including addition and subtraction,
are available. The RTL schematic generated here
allows straightforward execution and can be
implemented on an FPGA/CPLD kit to give the
proper output. This RTL schematic of the radix-2
Booth multiplier is compared with the implemented
RTL of the radix-4 Booth encoder multiplier in terms
of speed and circuit complexity. The radix-4 Booth
multiplier gives higher speed than the radix-2 Booth
multiplier, and its circuit complexity is also lower.
This depends entirely on the algorithm used in both
multipliers.
The MAC process is coded in VHDL and
synthesized using Xilinx ISE 6.2i. The MAC is
implemented on a Xilinx xc3s1000-5fg456 FPGA
device. The synthesis results of the MAC unit are
given in Table 2. Here, the same FPGA device (part
number and speed grade) with the same design
constraints has been targeted for the synthesis of
the MAC unit. This MAC unit is generally preferred
for simpler designs. The experimental tests validate
the results.
In this project, we propose a high-speed low-power
multiplier and accumulator (MAC) adopting the
new SPST implementing approach. This MAC is
designed by equipping the Spurious Power
Suppression Technique (SPST) on a modified Booth
encoder, which is controlled by a detection unit using
an AND gate. The modified Booth encoder reduces
the number of partial products generated by a
factor of 2. The SPST adder avoids unwanted
additions and thus minimizes the switching power
dissipation. The SPST MAC implementation with
AND gates has extremely high flexibility in
adjusting the data asserting time. This improves the
robustness of the SPST, which can attain a 30% speed
improvement and a 22% power reduction in the
modified Booth encoder. This design can be verified
using the ModelSim and Xilinx tools with Verilog.
References
[1] A. D. Booth, "A signed binary multiplication technique," Quarterly Journal of Mechanics and Applied Mathematics, vol. 2, pp. 236-240, 1951.
[2] O. L. MacSorley, "High-speed arithmetic on binary computers," IRE Transaction on Electronic Computers, vol. 49, pp. 67-91, 1961.
[3] A. R. Cooper, "Parallel Architecture Modified Booth Multiplier," Proceedings of the Institution of Electrical Engineers, 1989.
[4] J. Fadavi-Ardekani, "M × N booth encoded multiplier generator using optimized Wallace trees," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 2, pp. 120-125, Jun. 1993.
[5] W.-C. Yeh and C.-W. Jen, "High-speed booth encoded parallel multiplier design," IEEE Trans. Comput., vol. 49, no. 7, pp. 692-701, Jul. 2000.
[6] Ayman A. Fayed, Magdy A. Bayoumi, "A Novel Architecture for Low-Power Design of Parallel Multipliers," IEEE Computer Society Workshop on VLSI, pp. 0149, 2001.
[7] Razaidi Hussin, Ali Yeon Md. Shakaff, Norina Idris, Zaliman Sauli, Rizalafande Che Ismail and Afzan Kama, "An Efficient Modified Booth Multiplier Architecture," International Conference on Electronic Design, 2008.
[8] Shiann-Rong Kuang and Jiun-Ping Wang, "Modified Booth Multipliers with a Regular Partial Product Array," IEEE Transactions on Circuits and Systems II, vol. 56, no. 5, 2009.
[9] Soojin Kim and Kyeongsoon Cho, "Design of High-speed Modified Booth Multipliers Operating at GHz Ranges," World Academy of Science, Engineering and Technology, vol. 7, no. 2, 2010.
[10] S. K. Sahoo and C. Shekhar, "A fast final adder for a 54-bit parallel multiplier for DSP application," International Journal of Electronics, vol. 98, no. 12, pp. 1625-1638, 2011.
[11] S. K. Sahoo and C. Shekhar, "Delay Optimized Array Multiplier for Signal and Image Processing," 2011 International Conference on Image Information Processing, 2011.
[12] C. Senthilpari, "A low power and High performance Radix-4 Multiplier design using a Modified Pass Transistor Logic Technique," IETE Journal of Research, vol. 57, issue 2, 2012.
[13] Neil H. Weste and Kamran Eshraghian, "Principles of CMOS VLSI Design," Addison-Wesley, ISBN 0201733897, pp. 361-380, 1993.
[14] Kiat-Seng Yeo and Kaushik Roy, "Low-Voltage, Low-Power VLSI Subsystems," Tata McGraw-Hill, ISBN-13 978-0-07-067750-0, pp. 119-146, 2009.
[15] Pratap Kumar Dakua, Anamika Sinha, Shivdhari and Gourab, "Hardware Implementation of MAC unit," International Journal of Electronics Communication and Computer Engineering, vol. 3, 2012.
[16] M. Jeevitha, R. Muthaiah, P. Swaminathan, "Review Article: Efficient Multiplier Architecture in VLSI Design," Journal of Theoretical and Applied Information Technology, vol. 38, no. 2, April 2012.
[17] Ravi Shankar Mishra, Puran Gour, Braj Bihari Soni, "Design and Implements of Booth and Robertson's multipliers algorithm on FPGA," International Journal of Engineering Research and Applications, vol. 1, issue 3, pp. 905-910, 2011.
[18] Tung Thanh Hoang, Magnus Själander, Per Larsson-Edefors, "A High-Speed, Energy-Efficient Two-Cycle Multiply-Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit," IEEE Transactions on Circuits & Systems, vol. 57, no. 12, pp. 3073-3081, Dec. 2010.
[19] A. Abdelgawad, Magdy Bayoumi, "High Speed and Area-Efficient Multiply Accumulate (MAC) Unit for Digital Signal Processing Applications," IEEE International Symposium on Circuits & Systems, pp. 3199-3202, 2007.
[20] Berkeley Design Technology, Inc., "Choosing a DSP Processor," World Wide Web, http://www.bdti.com/articles/choose_2000.pdf, 2000.
[21] Jennifer Eyre and Jeff Bier, "The Evolution of DSP Processors," Berkeley Design Technology, Inc., http://www.bdti.com/articles/evolution.pdf, 2000.
[22] G. Lakshmi Narayanan and B. Venkataramani, "Optimization Techniques for FPGA-Based Wave Pipelined DSP Blocks," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 7, pp. 783-792, July 2005.
[23] L. Benini, G. D. Micheli, A. Macii, E. Macii, M. Poncino, and R. Scarsi, "Glitching Power Minimization by Selective Gate Freezing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 3, pp. 287-297, June 2000.
[24] S. Henzler, G. Georgakos, J. Berthold, and D. Schmitt-Landsiedel, "Fast Power-Efficient Circuit-Block Switch-off Scheme," Electron. Lett., vol. 40, no. 2, pp. 103-104, Jan. 2004.
[25] H. Lee, "A Power-Aware Scalable Pipelined Booth Multiplier," in Proc. IEEE Int. SOC Conf., 2004, pp. 123-126.
[26] J. Choi, J. Jeon, and K. Choi, "Power Minimization of Functional Units by Partially Guarded Computation," in Proc. IEEE Int. Symp. Low Power Electron. Des., 2000, pp. 131-136.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors equally contributed in the present
research, at all stages from the formulation of the
problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US