Design of Modified Booth’s Encoder Using SPST technique

1VASUDEVA.G, 2BHARATHI GURURAJ

1Department of ECE, Dayananda Sagar Academy of Technology and Management, Bangalore-74, Karnataka, INDIA

2Department of ECE, ACS College of Engineering, Bangalore, Karnataka, INDIA

Abstract: This paper presents the design and implementation of a signed-unsigned Modified Booth Encoding (SUMBE) multiplier. The existing Modified Booth Encoding (MBE) multiplier and the Baugh-Wooley multiplier perform multiplication on signed numbers only. The modified Booth encoder circuit generates half the partial products in parallel. The SUMBE multiplier is obtained by extending the sign bit of the operands and generating an additional partial product. A Carry Save Adder (CSA) tree and a final Carry Look-Ahead (CLA) adder are used to speed up the multiplier operation. Since signed and unsigned multiplication are performed by the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power dissipation and cost of a system. The proposed radix-2 modified Booth algorithm MAC with SPST gives a factor of 5 less delay and 7% less power consumption compared to an array MAC.

Keywords: CLA, CSA, SUMBE, Booth Encoder

Received: October 23, 2022. Revised: April 28, 2023. Accepted: June 11, 2023. Published: July 18, 2023.

1. Introduction

1.1 Common Features of Multipliers:

1.1.1 Counterflow Organization:

A novel multiplier organization is introduced, in

which the data bits flow in one direction, and the

Booth commands [1] are piggybacked on the

acknowledgments flowing in the opposite direction.

1.1.2 Merged Arithmetic/Shifter Unit:

An architectural optimization is introduced that

merges the arithmetic operations and the shift

operation into the same function unit, thereby

obtaining significant improvement in area, energy

and speed[1].

1.1.3 Overlapped Execution:

The entire design is pipelined at the bit level, which allows overlapped execution of multiple iterations of the Booth algorithm, including across successive multiplications. As a result, both the cycle time per Booth iteration and the overall cycle time per multiplication are significantly improved [2].

1.1.4 Modular Design:

The design is quite modular, which allows the

implementation to be scaled to arbitrary operand

widths[2] without the need for gate resizing, and

without incurring any overhead on iteration time.

1.1.5 Precision-Energy Trade-Off:

Finally, the architecture can be easily modified to allow dynamic specification of operand widths, i.e., successive operations of a given multiplier implementation could operate upon different word lengths [4].

A new architecture of multiplier and accumulator (MAC) for high-speed arithmetic has been proposed. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance improved. The proposed CSA tree uses the 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for the sign extension in

International Journal of Electrical Engineering and Computer Science

DOI: 10.37394/232027.2023.5.9

Vasudeva. G, Bharathi Gururaj

E-ISSN: 2769-2507

Volume 5, 2023

order to increase the bit density of the operands. The proposed MAC showed superior properties to the standard design in many ways, with performance twice that of previous research at a similar clock frequency. We expect that the proposed MAC [6] can be adapted to various fields requiring high performance, such as signal processing.

Advanced digital processors now have fast bit-parallel multipliers embedded in them. Multipliers for unsigned numbers are designed in a wide variety of ways, each method having its own advantages and trade-offs. In recent years, high-speed multipliers have played an important role in the design of any architecture, and researchers are still working on many factors to increase the speed of operation of these basic elements [7]. Algorithms for designing high-speed multipliers have been modified and developed for better efficiency.

The increased complexity of various applications demands not only faster multiplier chips but also smarter and more efficient multiplication algorithms that can be implemented in the chips [8]. The application in which the multiplier is implemented determines which trade-offs need to be considered. Generally, the efficiency of multipliers is classified based on the variation in speed, area and configuration. Due to the rapidly growing system-on-chip industry, not only faster units but also smaller area and lower power have become major concerns in designing very large scale integration (VLSI) circuits [11]. Digital circuits make use of digital arithmetic. Among the various arithmetic operations, multiplication is one of the fundamental operations and is performed using an adder.

There are many ways to build a multiplier, each providing a trade-off between delay and other characteristics [15], such as area and energy dissipation. The objective of a good multiplier and accumulator (MAC) [24] is to provide a physically compact, high-speed and low-power chip. To significantly reduce the power consumption of a VLSI design [16], it is a good direction to reduce its dynamic power, which is the major part of the total power dissipation.

This paper proposes a high-speed MAC adopting the new Spurious Power Suppression Technique (SPST) implementing approach. This multiplier and accumulator is designed by equipping the SPST on a modified Booth encoder, which is controlled by a detection unit using an AND gate. The modified Booth encoder reduces the number of partial products generated by a factor of 2. The SPST adder avoids unwanted additions and thus minimizes the switching power dissipation [17].

2. Related Work

The number and variety of products that include some form of digital signal processing (DSP) has grown dramatically over the last years [25]. DSP has become a key component in many consumer, communications, medical, and industrial products.

These products use a variety of hardware

approaches[26] to implement DSP, ranging from

the use of off-the-shelf microprocessors to field-

programmable gate arrays (FPGAs) to custom

integrated circuits (ICs). Programmable “DSP

processors,” a class of microprocessors optimized

for DSP[12], are a popular solution for several

reasons. In comparison to fixed-function solutions,

they have the advantage of potentially being

reprogrammed in the field, allowing product

upgrades or fixes. They are often more cost-

effective (and less risky) than custom hardware,

particularly for low-volume applications, where the

development cost of custom ICs[13] may be

prohibitive. And in comparison to other types of

microprocessors, DSP processors often have an

advantage in terms of speed, cost, and energy

efficiency [1].

There are many implementation media available for signal processing. These implementations vary in terms of programmability, from fixed-functionality hardware such as ASICs [9] to fully programmable general-purpose processors.

The emergence of the new architecture[5], which

offers the same computational attributes as fixed-

functionality architectures in a package that can be

customized in the field, is driven by a need for real-

time performance within the given operational

parameters of a target system [14] and a need to adapt to changing data sets, computing conditions, and execution environments of DSP applications. In this paper we used three main ideas: VHDL, architecture pipelining, and implementation on FPGAs. More details on FPGAs can be found in [2]. This design has been simulated using the ModelSim software and then implemented on an FPGA.

Table 1. Recoding of bits using Modified Booth’s Encoder

Xi+1   Xi   Xi-1   Action
0      0    0      0 × Y
0      0    1      1 × Y
0      1    0      1 × Y
0      1    1      2 × Y
1      0    0      -2 × Y
1      0    1      -1 × Y
1      1    0      -1 × Y
1      1    1      0 × Y

3. Modified Booth Encoder

In order to achieve high speed, multiplication algorithms using parallel counters, such as the modified Booth algorithm [18], have been proposed, and some multipliers based on these algorithms have been implemented for practical use. This type of multiplier operates much faster than an array multiplier for longer operands because its computation time is proportional to the logarithm of the word length of the operands [19]. Booth multiplication is a technique that recodes the multiplier term so that fewer partial products are generated. To Booth recode the multiplier term [20], we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block only uses two bits of the multiplier, with a zero assumed to the right of the LSB. Figure 1 shows the grouping of bits from the multiplier term for use in modified Booth encoding.
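The overlapping three-bit grouping described above can be sketched in a few lines of Python. This is only an illustrative software model, not the authors' hardware description; the function names `booth_recode` and `booth_value` are hypothetical, and the digit set {-2, -1, 0, 1, 2} follows the actions listed in Table 1.

```python
def booth_recode(x: int, width: int) -> list:
    """Recode a two's-complement multiplier into digits in {-2,-1,0,1,2}.

    Digit i is derived from bits (x[2i+1], x[2i], x[2i-1]) with x[-1] = 0,
    so each 3-bit block overlaps the previous block by one bit.
    """
    if width % 2:
        raise ValueError("width must be even")
    bits = x & ((1 << width) - 1)        # keep the low `width` bits
    padded = bits << 1                   # implicit 0 to the right of the LSB
    digits = []
    for i in range(width // 2):
        block = (padded >> (2 * i)) & 0b111
        hi, mid, lo = (block >> 2) & 1, (block >> 1) & 1, block & 1
        digits.append(-2 * hi + mid + lo)  # Table 1 action for this block
    return digits


def booth_value(digits: list) -> int:
    """Recombine the digits as sum(d_i * 4**i) to check the recoding."""
    return sum(d * 4 ** i for i, d in enumerate(digits))
```

For a 16-bit multiplier this yields eight digits, i.e. half as many partial products as the operand has bits; `booth_value` recovers the signed value of the operand, which is a convenient sanity check on the recoding.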

Figure 1. Recoding of bits

Figure 2. Example for Modified Booth Encoder

4. Proposed Low Power High Performance Multiplier and Accumulator

The proposed high-speed low-power multiplier is designed by equipping the SPST on a tree multiplier. There are two distinguishing design considerations in designing the proposed multiplier, as described in the following.

Applying the SPST on the Modified Booth Encoder

Figure 8 shows a computing example of Booth multiplying two numbers, “2AC9” and “006A”. The shadow denotes that the numbers in this part of the Booth multiplication are all zero, so this part of the computation can be neglected. Saving those computations can significantly reduce the power consumption caused by the transient signals. According to the analysis of the multiplication shown in Figure 8, we propose the SPST-equipped modified Booth encoder, which is controlled by a detection unit. The detection unit has one of the two operands as its input to decide whether the Booth encoder performs redundant computations, as shown in Figure 9. The latches can freeze the inputs of MUX-4 to MUX-7, or only those of MUX-6 to MUX-7, when PP4 to PP7 or PP6 to PP7, respectively, are zero, in order to reduce the transition power dissipation. Figure 10 shows the Booth partial product generation circuit, which includes AND/OR/EX-OR logic.

The former SPST has been discussed in [9] and [10]. The data are separated into the Most Significant Part (MSP) and the Least Significant Part (LSP) [23]. Figure 4 shows the five cases of a 16-bit addition in which spurious switching activities occur. The 1st case illustrates a transient state in which spurious transitions of carry signals occur in the MSP although the final result of the MSP is unchanged [21]. The 2nd and 3rd cases describe the situations of adding one negative operand to one positive operand without and with a carry from the LSP, respectively. Moreover, the 4th and 5th cases demonstrate the addition of two negative operands without and with a carry-in from the LSP, respectively [22]. In those cases, the results of the MSP are predictable. Therefore, the computations in the MSP are useless and can be neglected.
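The “2AC9” by “006A” example can be reproduced with a small Python model of the detection unit. This is a sketch under stated assumptions: the names `booth_digits`, `frozen_from` and `spst_booth_multiply` are hypothetical, the Booth-encoded operand is assumed to be “006A”, and treating an all-ones upper field like an all-zeros one is an added assumption (a 111 block also recodes to zero).

```python
WIDTH = 16                 # operand width of the example in the text
NUM_PP = WIDTH // 2        # partial products PP0 .. PP7


def booth_digits(b: int) -> list:
    """Booth digits in {-2,-1,0,1,2} of the 16-bit operand b."""
    padded = (b & 0xFFFF) << 1          # implicit 0 right of the LSB
    out = []
    for i in range(NUM_PP):
        blk = (padded >> (2 * i)) & 0b111
        out.append(-2 * (blk >> 2) + ((blk >> 1) & 1) + (blk & 1))
    return out


def frozen_from(b: int) -> int:
    """Hypothetical detection unit: index of the first PP that can be frozen.

    Returns 4 when PP4..PP7 are zero, 6 when only PP6..PP7 are zero,
    and 8 when nothing can be frozen.
    """
    b &= 0xFFFF
    if (b >> 7) in (0x000, 0x1FF):      # bits 15..7 all equal
        return 4
    if (b >> 11) in (0x00, 0x1F):       # bits 15..11 all equal
        return 6
    return NUM_PP


def spst_booth_multiply(a: int, b: int) -> int:
    """Multiply a by b, skipping the partial products that are frozen."""
    digits = booth_digits(b)
    active = digits[:frozen_from(b)]    # frozen PPs are never computed
    return sum(d * a * 4 ** i for i, d in enumerate(active))
```

With `b = 0x006A` the digits are -2, -1, -1, 2 followed by four zeros, so the model freezes PP4 to PP7 and still returns the full product; only the upper bits of one operand are inspected to make that decision.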

The SPST uses a detection logic circuit to detect the effective data range of arithmetic units, e.g., adders or multipliers [24]. When a portion of the data does not affect the final computing results, the data controlling circuits of the SPST latch this portion to avoid useless data transitions occurring inside the arithmetic units. In addition, a data asserting control, realized using registers, further filters out the useless spurious signals of the arithmetic unit every time the latched portion is turned on. This asserting control brings evident power reduction. Figure 5 shows the design of the low-power adder/subtractor with SPST.
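The idea of detecting the effective data range can be illustrated with a simplified behavioural model of a 16-bit addition split into an 8-bit MSP and an 8-bit LSP. This is an assumption-laden sketch rather than the circuit of the paper: `msp_is_sign_extension` and `spst_add16` are hypothetical names, and the rule used to predict the MSP from the two sign bits and the LSP carry is one simple way to express that the MSP result is predictable in the cases described above.

```python
MASK8 = 0xFF


def msp_is_sign_extension(x: int) -> bool:
    """True when the upper byte of a 16-bit word is all zeros or all ones."""
    return ((x >> 8) & MASK8) in (0x00, 0xFF)


def spst_add16(a: int, b: int):
    """Add two 16-bit words; return (sum, msp_frozen).

    msp_frozen reports whether the MSP adder inputs could stay latched.
    """
    a &= 0xFFFF
    b &= 0xFFFF
    lsp = (a & MASK8) + (b & MASK8)       # the LSP adder always runs
    carry = lsp >> 8                      # carry from the LSP into the MSP
    if msp_is_sign_extension(a) and msp_is_sign_extension(b):
        # Both upper bytes carry only sign information, so the MSP result
        # follows from the two sign bits and the LSP carry alone.
        sign_a, sign_b = a >> 15, b >> 15
        msp = (carry - sign_a - sign_b) & MASK8
        frozen = True
    else:
        msp = ((a >> 8) + (b >> 8) + carry) & MASK8   # full MSP addition
        frozen = False
    return (msp << 8) | (lsp & MASK8), frozen
```

For example, adding 0x0012 and 0xFFF0 (18 and -16) gives 0x0002 with the MSP frozen, while adding 0x1234 and 0x0001 requires the full MSP addition.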

The adder/subtractor is divided into two parts, the most significant part (MSP) and the least significant part (LSP). The MSP of the original adder/subtractor is modified to include detection logic circuits, data controlling circuits, sign extension circuits, and logic for calculating the carry-in and carry-out signals. The most important part of this study is the design of the control signal asserting circuits, denoted as asserting circuits in Figure 2. Although this asserting circuit brings evident power reduction, it may induce additional delay. There are two implementing approaches for the control signal assertion circuits. The first implementing approach uses registers, as illustrated in Figure 6. The three output signals of the detection logic are close, Carr_ctrl, and sign. The timing restriction that keeps the registers from latching wrong control values usually decreases the overall speed of the applied designs.

Figure 4. Booth’s Encoder using LSP and MSP adder

Figure 5. Illustration of Multiplication using Modified Booth Encoding

Figure 6. Booth Partial Product Selector Logic

Figure 7. SPST Equipped Modified Booth encoder

A. Applying the SPST on the Compression Tree

The proposed SPST-equipped multiplier is

illustrated in Figure 11. The PP generator generates five candidates of the partial products, i.e., {-2A, -A, 0, A, 2A}. These are then selected according to the Booth encoding results of the operand B. When the operand other than the Booth-encoded one has a small absolute value, there are opportunities to reduce the spurious power dissipation. The radix-2 modified Booth MAC with SPST performs both multiplication and accumulation. The multiplication result is obtained by multiplying the multiplicand and the multiplier. This multiplication with the SPST module is shown in Figure 8.

Figure 8. MAC with SPST Module

The schematic of the MAC unit with the SPST module is obtained as an RTL schematic from the Xilinx tool. The RTL schematic of the MAC with the SPST module consists of a 16-bit input and a 32-bit output.

Figure 9. RTL Schematic of MAC Unit.
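The data path of the MAC described in this section, with 16-bit inputs, five partial-product candidates and a 32-bit output, can be mimicked by the following behavioural Python sketch. It is not the RTL of the paper: the names `to_signed`, `booth_select` and `Mac` are hypothetical, and wrapping the accumulator at 32 bits without guard bits is an assumption made for illustration.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def booth_select(a: int, b: int) -> list:
    """Weighted partial products of a*b for 16-bit signed operands.

    Each product is chosen from the candidates {-2A, -A, 0, A, 2A}
    according to the Booth digit of operand B, then shifted by 2*i bits.
    """
    a = to_signed(a, 16)
    candidates = {-2: -2 * a, -1: -a, 0: 0, 1: a, 2: 2 * a}
    padded = (b & 0xFFFF) << 1            # implicit 0 right of the LSB
    pps = []
    for i in range(8):
        blk = (padded >> (2 * i)) & 0b111
        digit = -2 * (blk >> 2) + ((blk >> 1) & 1) + (blk & 1)
        pps.append(candidates[digit] << (2 * i))
    return pps


class Mac:
    """Multiply-accumulate unit with a 32-bit accumulator (wraps on overflow)."""

    def __init__(self):
        self.acc = 0

    def step(self, a: int, b: int) -> int:
        """Accumulate a*b and return the 32-bit accumulator contents."""
        self.acc = (self.acc + sum(booth_select(a, b))) & 0xFFFFFFFF
        return self.acc
```

Calling `step` repeatedly accumulates successive products; `to_signed(mac.acc, 32)` recovers the signed accumulator value when negative results are expected.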

5. Results and Discussion

The MBE code is written in Verilog, and synthesized and simulated using the Xilinx and ModelSim tools. The netlist generated by the RC Compiler tool from Cadence is given as an input to the physical design. Partitioning and floor planning are done using the Cadence Assura tool. The W/L ratio is fixed at 0.33 for all the horizontal and vertical pads and rings, and also for the stripes. Next, power planning is done. After that, the pads are placed in such a way that the aspect ratio is minimized. Next, routing is done to interconnect the wires. Finally, GDSII generation is done and the ASCII code is generated, which is then taken as an input to the fabrication process.

The results for the FPGA and ASIC are shown in Figures 10 and 11. Figure 10 shows the waveform, i.e., the simulation result for an 8x8-bit MBE. The final 32-bit output is obtained and is divided into y0, y1, y2 and y3. The physical design output is obtained using the Assura tool from Cadence, and the netlist is generated by the RC Compiler tool from Cadence. Finally, we compare the FPGA and ASIC methodologies.

Figure 10. Simulation results for MBE using ModelSim

Figure 11. Final Routing Output

Table 2. Comparison of Modified Booth encoder using FPGA and ASIC

Parameter           FPGA       ASIC
Area                62%        16.3%
Power dissipation   123 mW     32763.567 nW
Delay               1.901 ns   7950 ps

Hence ASIC is preferred when compared to FPGA.

6. Conclusion and Scope for Future Work

A Radix-2 Booth multiplier is implemented here, and the implementation gives a higher speed of operation. The four-cycle shifting process, including addition and subtraction, is available. The generated RTL schematic supports its straightforward execution and can be implemented on an FPGA or CPLD kit to produce the correct output. The RTL schematic of the Radix-2 Booth multiplier is compared with the implemented RTL Radix-4 Booth encoder multiplier in terms of speed and circuit complexity. The Radix-4 Booth multiplier gives higher speed than the Radix-2 Booth multiplier, and its circuit complexity is also lower. This depends entirely on the algorithm used in each multiplier.

The MAC process is coded in VHDL and synthesized using Xilinx ISE 6.2i. The MAC process is implemented on the xc3s1000-5fg456 Xilinx FPGA device. The synthesis results of the MAC unit are given in Table 2. Here, the same FPGA device (part number and speed grade) with the same design constraints has been targeted for the synthesis of the MAC unit. This MAC unit is generally preferred for simpler designs. The experimental test shows that the results have been validated.

In this project, we propose a high-speed low-power multiplier and accumulator (MAC) adopting the new SPST implementing approach. This MAC is designed by equipping the Spurious Power Suppression Technique (SPST) on a modified Booth encoder, which is controlled by a detection unit using an AND gate. The modified Booth encoder reduces the number of partial products generated by a factor of 2. The SPST adder avoids unwanted additions and thus minimizes the switching power dissipation. The SPST MAC implementation with AND gates has extremely high flexibility in adjusting the data asserting time. This facilitates the robustness of the SPST, which can attain a 30% speed improvement and 22% power reduction in the modified Booth encoder. This design can be verified using ModelSim and Xilinx with Verilog.

References

[1] A. D. Booth, "A signed binary multiplication technique," Quarterly Journal of Mechanics and Applied Mathematics, vol. 2, pp. 236-240, 1951.

[2] O. L. MacSorley, "High-speed arithmetic on binary computers," IRE Transaction on Electronic Computers, vol. 49, pp. 67-91, 1961.

[3] A. R. Cooper, "Parallel Architecture Modified Booth Multiplier," Proceedings of the Institution of Electrical Engineers, 1989.

[4] J. Fadavi-Ardekani, "M × N booth encoded multiplier generator using optimized Wallace trees," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 2, pp. 120–125, Jun. 1993.

[5] W.-C. Yeh and C.-W. Jen, "High-speed booth encoded parallel multiplier design," IEEE Trans. Comput., vol. 49, no. 7, pp. 692–701, Jul. 2000.

[6] B Ayman A. Fayed, Magdy A. Bayoumi, "A Novel Architecture for Low-Power Design of Parallel Multipliers," vlsi, pp. 0149, IEEE Computer Society Workshop on VLSI 2001.

[7] Razaidi Hussin, Ali Yeon Md. Shakaff, Norina Idris, Zaliman Sauli, Rizalafande Che Ismail and Afzan Kama, "An Efficient Modified Booth Multiplier Architecture," International Conference on Electronic Design, 2008.

[8] Shiann-Rong Kuang and Jiun-Ping Wang, "Modified Booth Multipliers with a Regular Partial Product Array," IEEE Transactions on Circuits and Systems II, Vol. 56, No. 5, 2009.

[9] Soojin Kim and Kyeongsoon Cho, "Design of High-speed Modified Booth Multipliers Operating at GHz Ranges," World Academy of Science, Engineering and Technology, Vol. 7, No. 2, 2010.

[10] S. K. Sahoo and C. Shekhar, "A fast final adder for a 54-bit parallel multiplier for DSP application," International Journal of Electronics, vol. 98, no. 12, pp. 1625-1638, 2011.

[11] S. K. Sahoo and C. Shekhar, "Delay Optimized Array Multiplier for Signal and Image Processing," 2011 International Conference on Image Information Processing, 2011.

[12] C. Senthilpari, "A low power and High performance Radix-4 Multiplier design using a

Modified Pass Transistor Logic Technique," IETE Journal of Research, Vol. 57, Issue 2, 2012.

[13] Neil H. Weste and Kamran Eshraghian, "Principles of CMOS VLSI Design," Addison-Wesley, ISBN 0201733897, pp. 361-380, 1993.

[14] Kiat-Seng Yeo and Kaushik Roy, "Low-Voltage, Low-Power VLSI Subsystems," Tata McGraw-Hill, ISBN-13 978-0-07-067750-0, pp. 119-146, 2009.

[15] Pratap Kumar Dakua, Anamika Sinha, Shivdhari & Gourab, "Hardware Implementation of MAC unit," International Journal of Electronics Communication and Computer Engineering, vol. 3, 2012.

[16] M. Jeevitha, R. Muthaiah, P. Swaminathan, "Review Article: Efficient Multiplier Architecture in VLSI Design," Journal of Theoretical and Applied Information Technology, vol. 38, no. 2, April 2012.

[17] Ravi Shankar Mishra, Puran Gour, Braj Bihari Soni, "Design and Implements of Booth and Robertson’s multipliers algorithm on FPGA," International Journal of Engineering Research and Applications, Vol. 1, Issue 3, pp. 905-910, 2011.

[18] Tung Thanh Hoang, Magnus Själander, Per Larsson-Edefors, "A High-Speed, Energy-Efficient Two-Cycle Multiply-Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit," IEEE Transactions on Circuits & Systems, vol. 57, no. 12, pp. 3073-3081, Dec. 2010.

[19] A. Abdelgawad, Magdy Bayoumi, "High Speed and Area-Efficient Multiply Accumulate (MAC) Unit for Digital Signal Processing Applications," IEEE International Symposium on Circuits & Systems, pp. 3199-3202, 2007.

[20] Berkeley Design Technology, Inc., "Choosing a DSP Processor," World Wide Web, http://www.bdti.com/articles/choose_2000.pdf, 2000.

[21] Jennifer Eyre and Jeff Bier, "The Evolution of DSP Processors," Berkeley Design Technology, Inc., http://www.bdti.com/articles/evolution.pdf, 2000.

[22] G. Lakshmi Narayanan and B. Venkataramani, "Optimization Techniques for FPGA-Based Wave Pipelined DSP Blocks," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 7, pp. 783-792, July 2005.

[23] L. Benini, G. D. Micheli, A. Macii, E. Macii, M. Poncino, and R. Scarsi, "Glitching Power Minimization by Selective Gate Freezing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 3, pp. 287-297, June 2000.

[24] S. Henzler, G. Georgakos, J. Berthold, and D. Schmitt-Landsiedel, "Fast Power-Efficient Circuit-Block Switch-off Scheme," Electron. Lett., vol. 40, no. 2, pp. 103-104, Jan. 2004.

[25] H. Lee, "A Power-Aware Scalable Pipelined Booth Multiplier," in Proc. IEEE Int. SOC Conf., 2004, pp. 123-126.

[26] J. Choi, J. Jeon, and K. Choi, "Power Minimization of Functional units by Partially Guarded Computation," in Proc. IEEE Int. Symp. Low Power Electron. Des., 2000, pp. 131-136.

Contribution of Individual Authors to the

Creation of a Scientific Article (Ghostwriting

Policy)

The authors equally contributed in the present

research, at all stages from the formulation of the

problem to the final findings and solution.

Sources of Funding for Research Presented in a

Scientific Article or Scientific Article Itself

No funding was received for conducting this study.

Conflict of Interest

The authors have no conflicts of interest to declare

that are relevant to the content of this article.

Creative Commons Attribution License 4.0

(Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the

Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US
