FPGA Implementation of Enhanced Throughput Design of AES
Architecture using Nikhilam Sutra
BINDU SWETHA PASULURI, V. J. K. KISHOR SONTI
School of Electronics & Electrical Engineering
Sathyabama Institute of Science & Technology, Chennai- 600119,
INDIA
Abstract: The exponential growth in the number of internet and communications users has made security a fundamental design requirement for data transmission. The Advanced Encryption Standard (AES) is perhaps the most widely used cryptographic algorithm standardized by NIST. This paper proposes a high-throughput design for the AES algorithm with large key sizes. AES is a block cipher that secures data using key lengths of 128, 192 and 256 bits. The design focuses on the 256-bit key size, since a large key is required to ensure strong security. The key expansion and encryption/decryption processes run simultaneously and are pipelined to maximize speed. The sub-processes of the key expansion module are parallelized to reduce the critical path latency. The S-box, comprising the sub-byte and inverse sub-byte operations, is built with composite field arithmetic to further reduce delay and area. With the modified Nikhilam sutra multiplier, throughput increased by 50%, area was reduced by 34.32%, and latency was reduced by 20% compared to the conventional approach. The integrated AES encryption/decryption design is implemented on the FPGA ZedBoard using Verilog HDL in Xilinx Vivado.
Key-words: - Advanced encryption standard, throughput, FPGA, high security, nikhilam sutra
Received: September 17, 2021. Revised: August 18, 2022. Accepted: September 21, 2022. Published: October 31, 2022.
1 Introduction
The need to record every meaningful event of everyday life has become one of the key motives for digital communication. Messages must be protected from unauthorized persons. Encryption has become one of the security mechanisms used to safeguard data that is accessible to the public. Cryptography is a technique for securing communication in which data is transformed from one representation to another in order to conceal and protect it [1], [5], [6], [7], [8], [9], [10].
Cryptography's value continues to grow in lockstep with the volume of sensitive data exchanged across open networks. The more data is delivered in a machine-readable format, the more exposed we become to automated espionage. Cryptography is critical in both resistant and certifiable applications.
On January 2, 1997, the National Institute of Standards and Technology (NIST) called for proposals for a new Advanced Encryption Standard (AES) algorithm. The goal was to supersede the older Data Encryption Standard (DES), published in November 1976, in response to the disclosure of DES's vulnerability [10]. Following two rounds of evaluation, Rijndael was selected and announced as the AES algorithm on November 26, 2001. Rijndael is a block cipher that operates on a fixed-length block of bits. Its name derives from its two Belgian designers, Joan Daemen and Vincent Rijmen. AES is used in many sectors, such as database servers, ATMs, mobile networking, and optical video recorders. AES can be implemented in both hardware and software. Hardware implementation, however, is better suited to high-speed real-world applications [11].
Fig. 1: The fundamental concept of the AES
algorithm.
International Journal of Electrical Engineering and Computer Science
DOI: 10.37394/232027.2022.4.8
Bindu Swetha Pasuluri, V. J. K. Kishor Sonti
E-ISSN: 2769-2507
51
Volume 4, 2022
AES uses three different round counts, depending on the key size. It takes an input block of 128 bits and produces an output block of the same size. The transformation requires a second input, the key. AES supports three alternative key sizes: 128, 192, and 256 bits. Each of the three key sizes has been deemed acceptable for U.S. Federal Government applications ranging from classified to top secret. Table 1 summarizes the number of rounds for the three AES variants. Regardless of the cipher key size, the round key used in each round is 128 bits wide. The first round key is initialized from 128 bits of the input key [12], [13], [19]. The key expansion function expands the input key to create one round key per round. The AES method operates on a single 4x4 byte array known as the state. During encryption, the state undergoes four transformations: Add Round Key, Substitution Byte (sub-byte), Shift Row, and MixColumn (a combination of these four operations is referred to as a round); during decryption, the state undergoes four transformations: Add Round Key, inverse sub-byte, inverse Shift Row, and Inverse MixColumn. The AddRoundKey transformation applies a bitwise XOR between the current state and the round key produced by the key expansion function. In our integrated architecture, encryption or decryption is selected by setting the multiplexer select input to 0 or 1.

Table 1. Round key width and number of rounds in AES

| Cipher key size | Number of rounds (Nr) | Round key size |
|-----------------|-----------------------|----------------|
| 256 bits        | 14                    | 128 bits       |
| 192 bits        | 12                    | 128 bits       |
| 128 bits        | 10                    | 128 bits       |
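The AddRoundKey step described above can be sketched in software as a byte-wise XOR. This is a minimal Python model for illustration only, not the authors' Verilog design; the function name `add_round_key` and the flat 16-byte representation of the state are assumptions made for this sketch.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR a 16-byte state with a 16-byte (128-bit) round key."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))


# Illustrative inputs: a 128-bit block and a 128-bit round key.
state = bytes.fromhex("00112233445566778899aabbccddeeff")
round_key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")

mixed = add_round_key(state, round_key)
print(mixed.hex())  # 00102030405060708090a0b0c0d0e0f0

# XOR is its own inverse, so the same step is reused for decryption.
restored = add_round_key(mixed, round_key)
```

Because XOR is self-inverse, applying the same round key twice restores the original state, which is why AddRoundKey appears unchanged in both the encryption and decryption paths.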
There are two ways to perform the sub-byte substitution: (i) via a ROM look-up table and (ii) through composite field arithmetic (CFA). Substitution is the costliest and most time-consuming operation. Hardware optimization of this step in a VLSI implementation is therefore crucial for reducing the area and power consumption of the AES architecture. The ROM-based method consumes a large amount of memory and introduces substantial latency due to the ROM access time. Consequently, composite field arithmetic is used here to implement the S-box. The practical use of cryptography for encryption is contingent upon properly managing cryptographic keys. Longer keys are considered more secure, and top-secret military applications may demand a key length of 256 bits. Consequently, this research emphasizes providing strong protection with a large key size.
The remainder of this paper is organized as follows: Section 2 discusses the integrated encryption/decryption architecture and the composite field arithmetic needed to implement the S-box. Sections 3 and 4 describe the Shift Row and Mix Columns transformations and their inverses. Section 5 details the Vedic multiplier and the proposed modified Nikhilam multiplier. Section 6 presents the results of the FPGA implementation and the comparison with the conventional design. Section 7 concludes the paper.
2 Integrated Encryption/Decryption
Architecture
Figure 2 illustrates the proposed integrated encryption/decryption AES round architecture. The design of each module is discussed in detail in the following sections, as are the approaches for maximizing throughput. In the sub-byte transformation, every byte of the state array is replaced by another byte, obtained by taking its multiplicative inverse in $GF(2^8)$ and then applying an affine transformation.
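The sub-byte definition above (multiplicative inverse followed by an affine transformation) can be checked with a short software model. The sketch below works directly in $GF(2^8)$ with the standard AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ instead of the composite field decomposition used in this paper, so it is only a reference model for the expected S-box values. All function names are hypothetical.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf256_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


def affine(b: int) -> int:
    """AES affine transformation: XOR of four left rotations plus 0x63."""
    def rotl(v: int, n: int) -> int:
        return ((v << n) | (v >> (8 - n))) & 0xFF
    return b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63


def sub_byte(b: int) -> int:
    """Sub-byte: multiplicative inverse followed by the affine transformation."""
    return affine(gf256_inv(b))


# Build the forward table and derive the inverse sub-byte table from it.
SBOX = [sub_byte(b) for b in range(256)]
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value

print(hex(SBOX[0x00]), hex(SBOX[0x53]))  # 0x63 0xed
```

A hardware S-box built from composite field arithmetic should produce the same 256 values, so a table like `SBOX` can serve as a golden reference in a test bench.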
Fig. 2: Hybrid Encryption/Decryption Round
Topology with Sub-pipelines
The Inverse Sub-Byte transformation is performed in the same way as its counterpart, using the inverse affine transformation.
Fig. 3: Block diagram for determining the multiplicative inverse of the S-box.
Isomorphic and inverse isomorphic mapping
The multiplicative inverse is obtained by decomposing the more complex $GF(2^8)$ into the lower-order fields $GF(2)$, $GF(2^2)$, and $GF((2^2)^2)$. This is done using the irreducible polynomials listed below:
$$
\begin{aligned}
GF(2^2) \rightarrow GF(2) &: \; x^2 + x + 1 \\
GF((2^2)^2) \rightarrow GF(2^2) &: \; x^2 + x + \varphi \\
GF(((2^2)^2)^2) \rightarrow GF((2^2)^2) &: \; x^2 + x + \lambda
\end{aligned}
\tag{2.1}
$$
The multiplicative inverse computation in the composite field cannot be applied directly to an element expressed in $GF(2^8)$. The element must first be mapped to its composite field representation by an isomorphic function $\delta$. Likewise, the result of the multiplicative inverse must be mapped back from its composite field representation to $GF(2^8)$ by the inverse isomorphic function $\delta^{-1}$. Both $\delta$ and $\delta^{-1}$ can be represented as 8x8 matrices. Let $q$ be an element of $GF(2^8)$; the isomorphic mapping and its inverse are then written as $\delta q$ and $\delta^{-1} q$.
Multiplying every input byte, ordered from the most significant to the least significant bit, by the isomorphic matrix, and multiplying the output of the multiplicative inverse by the inverse isomorphic matrix, gives expressions that can be realized logically as:
Fig. 4: Logical Implementation of δ*q.
Squaring:
The first nibble of the isomorphic mapping output is squared. The operation is derived as follows.
Let $k = q^2$, where $k$ and $q$ are elements of $GF(2^4)$ with bit representations $\{k_3, k_2, k_1, k_0\}$ and $\{q_3, q_2, q_1, q_0\}$.
$$k = k_H x + k_L = (q_H x + q_L)^2 \tag{2.2}$$
The expression has the form $(a+b)^2$ and expands as $a^2 + 2ab + b^2$; the two cross terms cancel because addition is XOR:
$$k = q_H^2 x^2 + q_H q_L x + q_H q_L x + q_L^2 = q_H^2 x^2 + q_L^2$$
The $x^2$ term is reduced with the irreducible polynomial $x^2 + x + \varphi$, which yields
$$k = q_H^2 (x + \varphi) + q_L^2 = q_H^2 x + (q_L^2 + q_H^2 \varphi)$$
which is of the form $bx + c$, with $k_H = q_H^2$ and $k_L = q_L^2 + q_H^2 \varphi$ now expressed in $GF(2^2)$. Further decomposing $k_H$ and $k_L$ to $GF(2)$ yields:
$$k_H = k_3 x + k_2 = (q_3 x + q_2)^2 = q_3 x^2 + q_2$$
This is reduced with the irreducible polynomial $x^2 + x + 1$:
$$k_H = q_3 (x + 1) + q_2 = q_3 x + (q_2 + q_3)$$
Similarly, with $\varphi = \{10\} = x$,
$$k_L = k_1 x + k_0 = (q_1 x + q_0)^2 + (q_3 x + q_2)^2 x = (q_2 + q_1) x + (q_3 + q_1 + q_0)$$
From the above equations, we obtain
$$k_3 = q_3, \quad k_2 = q_2 \oplus q_3, \quad k_1 = q_2 \oplus q_1, \quad k_0 = q_3 \oplus q_1 \oplus q_0$$
From the above equations, the logical implementation for squaring is
Fig. 5: Logical implementation of squaring in $GF(2^4)$.
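The bit-level squaring result can be cross-checked against a generic composite-field multiplier. The Python sketch below models $GF(2^2)$ with $x^2 + x + 1$ and $GF(2^4)$ with $x^2 + x + \varphi$, $\varphi = \{10\}$, and compares the XOR-only squaring network ($k_3 = q_3$, $k_2 = q_3 \oplus q_2$, $k_1 = q_2 \oplus q_1$, $k_0 = q_3 \oplus q_1 \oplus q_0$) with $q \cdot q$. The function names are hypothetical and the code is a software check, not the gate-level design.

```python
PHI = 0b10  # the constant phi = {10} in GF(2^2)


def gf2_2_mul(a: int, b: int) -> int:
    """Multiply two 2-bit values in GF(2^2) modulo x^2 + x + 1."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    hi = (a1 & b1) ^ (a1 & b0) ^ (a0 & b1)
    lo = (a1 & b1) ^ (a0 & b0)
    return (hi << 1) | lo


def gf2_4_mul(q: int, w: int) -> int:
    """Multiply two 4-bit values in GF((2^2)^2) modulo x^2 + x + phi."""
    qh, ql = q >> 2, q & 0b11
    wh, wl = w >> 2, w & 0b11
    hh = gf2_2_mul(qh, wh)
    hi = hh ^ gf2_2_mul(qh, wl) ^ gf2_2_mul(ql, wh)
    lo = gf2_2_mul(hh, PHI) ^ gf2_2_mul(ql, wl)
    return (hi << 2) | lo


def gf2_4_square_bits(q: int) -> int:
    """Squaring expressed with XOR gates only, one expression per output bit."""
    q3, q2, q1, q0 = (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1
    k3 = q3
    k2 = q3 ^ q2
    k1 = q2 ^ q1
    k0 = q3 ^ q1 ^ q0
    return (k3 << 3) | (k2 << 2) | (k1 << 1) | k0


# Exhaustive comparison over all 16 field elements.
squares_match = all(gf2_4_square_bits(q) == gf2_4_mul(q, q) for q in range(16))
print(squares_match)  # True
```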
Multiplication of a nibble with constant λ:
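Let $k = q\lambda$, where $k$, $q$ and $\lambda = \{1100\}$ are elements of $GF(2^4)$. The upper two bits of each operand are denoted $q_H$ and $\lambda_H$, and the lower two bits $q_L$ and $\lambda_L$:
$$k = (q_H x + q_L)(\lambda_H x + \lambda_L) = q_H \lambda_H x^2 + q_L \lambda_H x$$
(the terms in $\lambda_L$ vanish because $\lambda_L = \{00\}$). The expression is reduced by substituting $x^2 = x + \varphi$:
$$k = q_H \lambda_H (x + \varphi) + q_L \lambda_H x$$
Separating the higher and lower terms gives $k_H = (q_H + q_L)\lambda_H$ and $k_L = q_H \lambda_H \varphi$. With $\lambda_H = \{11\} = x + 1$ in $GF(2^2)$,
$$k_H = (q_3 x + q_2)(x + 1) + (q_1 x + q_0)(x + 1) = q_3 x^2 + (q_3 + q_2) x + q_2 + q_1 x^2 + (q_1 + q_0) x + q_0$$
Substituting $x^2 = x + 1$ gives
$$k_H = q_3 (x + 1) + (q_3 + q_2) x + q_2 + q_1 (x + 1) + (q_1 + q_0) x + q_0$$
so that
$$k_3 x + k_2 = (q_2 + q_0) x + (q_3 + q_2 + q_1 + q_0)$$
and for the lower terms
$$k_1 x + k_0 = q_3 x + q_2$$
$$k_3 = q_2 \oplus q_0, \quad k_2 = q_3 \oplus q_2 \oplus q_1 \oplus q_0, \quad k_1 = q_3, \quad k_0 = q_2$$
This can be implemented as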
Fig. 6: Logical implementation of multiplication with the constant λ.
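The constant multiplication by $\lambda = \{1100\}$ reduces to a few XOR gates: $k_3 = q_2 \oplus q_0$, $k_2 = q_3 \oplus q_2 \oplus q_1 \oplus q_0$, $k_1 = q_3$, $k_0 = q_2$. As a sanity check, the sketch below compares that network with a general multiplication by $\lambda$ in $GF((2^2)^2)$, assuming the irreducible polynomials $x^2 + x + 1$ and $x^2 + x + \varphi$ with $\varphi = \{10\}$. Function names are hypothetical.

```python
PHI = 0b10       # phi = {10} in GF(2^2)
LAMBDA = 0b1100  # lambda = {1100} in GF(2^4)


def gf2_2_mul(a: int, b: int) -> int:
    """Multiply two 2-bit values in GF(2^2) modulo x^2 + x + 1."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    return ((((a1 & b1) ^ (a1 & b0) ^ (a0 & b1)) << 1)
            | ((a1 & b1) ^ (a0 & b0)))


def gf2_4_mul(q: int, w: int) -> int:
    """Multiply two 4-bit values in GF((2^2)^2) modulo x^2 + x + phi."""
    qh, ql = q >> 2, q & 0b11
    wh, wl = w >> 2, w & 0b11
    hh = gf2_2_mul(qh, wh)
    hi = hh ^ gf2_2_mul(qh, wl) ^ gf2_2_mul(ql, wh)
    lo = gf2_2_mul(hh, PHI) ^ gf2_2_mul(ql, wl)
    return (hi << 2) | lo


def mul_lambda_bits(q: int) -> int:
    """Multiplication by lambda written as the XOR network from the derivation."""
    q3, q2, q1, q0 = (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1
    k3 = q2 ^ q0
    k2 = q3 ^ q2 ^ q1 ^ q0
    k1 = q3
    k0 = q2
    return (k3 << 3) | (k2 << 2) | (k1 << 1) | k0


lambda_match = all(mul_lambda_bits(q) == gf2_4_mul(q, LAMBDA) for q in range(16))
print(lambda_match)  # True
```

Replacing a general multiplier with a fixed XOR network is what makes constant multiplication cheap in hardware: no AND gates are needed at all.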
$GF(2^4)$ Multiplication
Let $k = qw$, where $k$, $q$ and $w$ are elements of $GF(2^4)$.
$$k = \{q_3 q_2 q_1 q_0\}\{w_3 w_2 w_1 w_0\} = (q_H x + q_L)(w_H x + w_L)$$
Expanding the product and substituting $x^2 = x + \varphi$ gives
$$k = q_H w_H (x + \varphi) + (q_H w_L + q_L w_H) x + q_L w_L$$
This can be logically implemented as in Figure 7, and the multiplication in $GF(2^2)$ can be implemented as in Figure 8.
Fig. 7: Logical implementation of multiplication in $GF(2^4)$.
Fig. 8: Logical implementation of multiplication in $GF(2^2)$.
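Multiplication with constant φ
Let $k = q\varphi$, where $k$, $q$ and $\varphi = \{10\}$ are elements of $GF(2^2)$.
$$k = (q_1 x + q_0)\,x = q_1 x^2 + q_0 x$$
Substituting $x^2 = x + 1$ gives $k = (q_1 + q_0) x + q_1$, that is, $k_1 = q_1 \oplus q_0$ and $k_0 = q_1$, which can be logically implemented as
Fig. 9: Logical implementation of multiplication with the constant φ.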
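Multiplication by $\varphi = \{10\}$ in $GF(2^2)$ is small enough to verify exhaustively. Reducing $q_1 x^2 + q_0 x$ with $x^2 = x + 1$ gives $(q_1 \oplus q_0)x + q_1$, and the sketch below compares that result with a general $GF(2^2)$ multiplication for all four elements. The function names are hypothetical.

```python
PHI = 0b10  # phi = {10}, i.e. the element x in GF(2^2)


def gf2_2_mul(a: int, b: int) -> int:
    """Multiply two 2-bit values in GF(2^2) modulo x^2 + x + 1."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    hi = (a1 & b1) ^ (a1 & b0) ^ (a0 & b1)
    lo = (a1 & b1) ^ (a0 & b0)
    return (hi << 1) | lo


def mul_phi_bits(q: int) -> int:
    """Multiplication by phi as one XOR gate plus rewiring."""
    q1, q0 = (q >> 1) & 1, q & 1
    k1 = q1 ^ q0
    k0 = q1
    return (k1 << 1) | k0


phi_table = [mul_phi_bits(q) for q in range(4)]
print(phi_table)  # [0, 2, 3, 1]
```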
The multiplicative inverse in GF(24)
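The individual bits of the inverse [1], [2], [3], [4], [5] can be computed as follows:
$$
\begin{aligned}
q_3^{-1} &= q_3 \oplus q_3 q_2 q_1 \oplus q_3 q_0 \oplus q_2 \\
q_2^{-1} &= q_3 q_2 q_1 \oplus q_3 q_2 q_0 \oplus q_3 q_0 \oplus q_2 \oplus q_2 q_1 \\
q_1^{-1} &= q_3 \oplus q_3 q_2 q_1 \oplus q_3 q_1 q_0 \oplus q_2 \oplus q_2 q_0 \oplus q_1 \\
q_0^{-1} &= q_3 q_2 q_1 \oplus q_3 q_2 q_0 \oplus q_3 q_1 \oplus q_3 q_1 q_0 \oplus q_3 q_0 \oplus q_2 \oplus q_2 q_1 \oplus q_2 q_1 q_0 \oplus q_1 \oplus q_0
\end{aligned}
$$
These equations can be logically implemented as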
Fig. 10: Logical implementation for the
multiplicative inverse.
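A useful check on any set of bit-level inverse expressions is that $q \cdot q^{-1} = 1$ for every non-zero $q$. The sketch below writes the four output bits as AND/XOR expressions and multiplies each result back in $GF((2^2)^2)$, assuming the polynomials $x^2 + x + 1$ and $x^2 + x + \varphi$ with $\varphi = \{10\}$. The names are hypothetical, and the code is a software check rather than the authors' circuit.

```python
PHI = 0b10  # phi = {10} in GF(2^2)


def gf2_2_mul(a: int, b: int) -> int:
    """Multiply two 2-bit values in GF(2^2) modulo x^2 + x + 1."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    return ((((a1 & b1) ^ (a1 & b0) ^ (a0 & b1)) << 1)
            | ((a1 & b1) ^ (a0 & b0)))


def gf2_4_mul(q: int, w: int) -> int:
    """Multiply two 4-bit values in GF((2^2)^2) modulo x^2 + x + phi."""
    qh, ql = q >> 2, q & 0b11
    wh, wl = w >> 2, w & 0b11
    hh = gf2_2_mul(qh, wh)
    hi = hh ^ gf2_2_mul(qh, wl) ^ gf2_2_mul(ql, wh)
    lo = gf2_2_mul(hh, PHI) ^ gf2_2_mul(ql, wl)
    return (hi << 2) | lo


def gf2_4_inv_bits(q: int) -> int:
    """Multiplicative inverse in GF(2^4), one AND/XOR expression per output bit."""
    q3, q2, q1, q0 = (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1
    i3 = q3 ^ (q3 & q2 & q1) ^ (q3 & q0) ^ q2
    i2 = (q3 & q2 & q1) ^ (q3 & q2 & q0) ^ (q3 & q0) ^ q2 ^ (q2 & q1)
    i1 = q3 ^ (q3 & q2 & q1) ^ (q3 & q1 & q0) ^ q2 ^ (q2 & q0) ^ q1
    i0 = ((q3 & q2 & q1) ^ (q3 & q2 & q0) ^ (q3 & q1) ^ (q3 & q1 & q0)
          ^ (q3 & q0) ^ q2 ^ (q2 & q1) ^ (q2 & q1 & q0) ^ q1 ^ q0)
    return (i3 << 3) | (i2 << 2) | (i1 << 1) | i0


# Every non-zero element times its inverse must give 1; 0 maps to 0.
inverse_ok = all(gf2_4_mul(q, gf2_4_inv_bits(q)) == 1 for q in range(1, 16))
print(inverse_ok)  # True
```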
Each output byte of the multiplicative inverse is
multiplied by an affine transformation, and in
inverse sub-byte transformation, every input byte is
multiplied by an inverse affine transformation.
3 Shift Row and Inverse Shift Row
Transformation
The 128-bit output of the S-box is arranged as a 4x4 byte matrix. Each row is cyclically shifted to the left by x bytes, where x is the row index starting from 0, as follows:
The first row is shifted left by 0 positions, that is, it is left unchanged. The second row is shifted left by one position. The third row is shifted left by two positions. The fourth row is shifted left by three positions.
Fig. 11: Shift Row Transformation.
The inverse Shift Row transformation takes the output bytes of the inverse sub-byte as its input. In this process, each row of the state is cyclically shifted to the right by its row index. The steps are as follows:
The first row is shifted right by 0 positions, that is, it is left unchanged. The second row is shifted right by one position. The third row is shifted right by two positions. The fourth row is shifted right by three positions.
Fig. 12: Inverse Shift Row Transformation.
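The two row rotations can be sketched in a few lines of Python. The state is modeled here as a list of four rows of four bytes each; that layout and the function names are assumptions chosen for readability, and a hardware design would realize the same permutation purely by wiring.

```python
def shift_rows(state):
    """Cyclically rotate row r of a 4x4 state left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Cyclically rotate row r of a 4x4 state right by r positions."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(state)]


# Illustrative state: byte values 0..15 laid out row by row.
state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

shifted = shift_rows(state)
for row in shifted:
    print(row)
# [0, 1, 2, 3]
# [5, 6, 7, 4]
# [10, 11, 8, 9]
# [15, 12, 13, 14]

restored = inv_shift_rows(shifted)
```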
4 Mix Columns and Inverse Mix Columns
Transformation
In the Mix Columns and Inverse Mix Columns transformations, every byte of a column is mapped to a new value that is a function of all four bytes in that column. Each transformation multiplies every column of the Shift Row or inverse Shift Row output by a fixed matrix, and the result is stored back in the state [19].
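As an illustration of the column-by-matrix product, the sketch below applies the standard AES Mix Columns matrix and its inverse to a single column, with byte products taken in $GF(2^8)$ modulo $x^8 + x^4 + x^3 + x + 1$. It uses an ordinary shift-and-add multiplier rather than the Nikhilam-based multiplier proposed in this paper, and the function names are hypothetical.

```python
def xtime(a: int) -> int:
    """Multiply a byte by x (i.e. by 2) in GF(2^8) with the AES polynomial."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a & 0xFF


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Fixed matrices of the standard AES Mix Columns and Inverse Mix Columns.
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MIX = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def mix_column(column, matrix):
    """Multiply one 4-byte column by a 4x4 matrix over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(byte, coeff)
        out.append(acc)
    return out


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(column, MIX)
print([hex(b) for b in mixed])  # ['0x8e', '0x4d', '0xa1', '0xbc']

restored = mix_column(mixed, INV_MIX)
```

The inverse matrix has larger coefficients (9, 11, 13, 14), which is why decryption places heavier demands on the multiplier than encryption does.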
5 Vedic Multiplier
The Vedic multiplier is based on the Vedic multiplication formulas (sutras). These sutras have traditionally been used to multiply two decimal integers. This paper extends the same concepts to the binary system to develop an efficient solution for digital electronics [12], [13]. This paper provides an overview and application of high-speed multiplier designs. Multiplier designs based on the Nikhilam and Urdhva Tiryagbhyam (UT) sutras have been researched extensively.
Nikhilam Sutra
Nikhilam Sutra literally means 'all from nine and the last from ten.' Although it applies to all cases of multiplication, it is most efficient when the numbers involved are large. Because it works with the complement of a large number with respect to its nearest base, the larger the original number, the lower the complexity of the multiplication. We first illustrate this sutra with the multiplication of two decimal integers (96 * 93), where the chosen base is 100, the nearest base greater than both numbers. Figure 13 depicts the Nikhilam sutra.
Fig. 13: Illustration of Nikhilam Sutra.
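The 96 * 93 example can be worked through in code. With base 100 the deficits are 4 and 7; the left part of the result is 96 - 7 = 89 and the right part is 4 * 7 = 28, giving 8928. The Python sketch below is a minimal decimal illustration of the sutra, with a hypothetical function name; it relies on the identity a*b = base*(a - (base - b)) + (base - a)*(base - b).

```python
def nikhilam_multiply(a: int, b: int, base: int) -> int:
    """Multiply a and b using their deficits from a common base."""
    deficit_a = base - a          # complement of a with respect to the base
    deficit_b = base - b          # complement of b with respect to the base
    left = a - deficit_b          # cross-subtraction (equals b - deficit_a)
    right = deficit_a * deficit_b # small product of the two deficits
    return left * base + right


product = nikhilam_multiply(96, 93, 100)
print(product)  # 8928
```

The only true multiplication left is the product of the two deficits, which is small when both operands are close to the base; this is the property the sutra exploits.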
Proposed Modified Nikhilam Multiplier
Multiplication is a crucial operation in implementing the AES algorithm, since it requires more resources than the other operations. The multiplier's efficiency depends mainly on the adders used in the design to compute the final result. A new multiplication unit based on the KSA and the Nikhilam Vedic sutra multiplier is proposed to address earlier shortcomings. This multiplier offers lower power consumption, lower latency, and smaller area.
Fig. 14: Modified Nikhilam Vedic multiplier.
The Nikhilam Vedic multiplier is applied to two binary integers. In this design, the CSA is replaced with a high-speed KSA. The multiplier operates by multiplying the operands bit by bit, summing the partial products, and producing the output [14], [15]. Figure 14 depicts the block diagram of the proposed Vedic multiplier, which employs the KSA instead of the RCA or CLA. The KSA offers greater benefits than the CSA, since it permits high-speed processing and lowers propagation delay. As a result, using the KSA [15], the result of the Nikhilam multiplier is obtained in less time [16], [17], [18].
The proposed Nikhilam multiplier is
implemented by replacing the existing conventional
multiplier.
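To make the role of the adder concrete, the sketch below models a parallel-prefix adder in Python, assuming that KSA refers to the Kogge-Stone adder. Carries are resolved in log2(width) prefix stages instead of rippling bit by bit, which is the source of the shorter propagation delay. The function name and the default 8-bit width are assumptions for illustration; this is a behavioral model, not the authors' Verilog.

```python
def kogge_stone_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned width-bit integers with a Kogge-Stone prefix network.

    Returns a (width + 1)-bit result that includes the carry-out.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = a & b        # bit i generates a carry
    propagate = a ^ b       # bit i propagates an incoming carry
    sum_bits = propagate    # keep the per-bit propagate for the final sum

    distance = 1
    while distance < width:
        # Combine each bit's (generate, propagate) with the group `distance` below it.
        generate |= propagate & (generate << distance)
        propagate &= propagate << distance
        distance <<= 1

    # generate[i] is now the carry out of bit i, i.e. the carry into bit i + 1.
    carries = generate << 1
    return (sum_bits ^ carries) & ((1 << (width + 1)) - 1)


total = kogge_stone_add(200, 100)
print(total)  # 300
```

For an 8-bit adder the loop runs three times (distances 1, 2 and 4), compared with up to eight sequential carry steps in a ripple-carry adder.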
6 Results
Fig. 15: Simulation Result for Encryption.
Fig. 16: Simulation Result for Decryption.
For the 128-bit plaintext '00112233445566778899aabbccddeeff' (hexadecimal) and the 256-bit key '000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f', with enc-dec set to 1, rst set to 0 and the clock applied, the 128-bit encrypted data '8ea2b7ca516745bfeafc49904b496089' is generated.
By setting the multiplexer selector to 1, the encrypted data '8ea2b7ca516745bfeafc49904b496089' with the 256-bit key '000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f' is decrypted back to '00112233445566778899aabbccddeeff'.
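From Table 2, it is evident that the proposed method shows an improvement in terms of area, latency and throughput. Some relevant studies can be found in [19], [20], [21], [22].

Table 2. Performance Comparison

| Parameter     | Existing design with the conventional multiplier | Proposed design | % Improvement |
|---------------|--------------------------------------------------|-----------------|---------------|
| Throughput    | 23                                               | 35              | 50            |
| Area (slices) | 3289                                             | 2120            | 34            |
| Latency       | 0.5                                              | 0.41            | 20            |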
7 Conclusion
Internal pipelining of the composite field S-box is used to construct an improved integrated encryption/decryption architecture for the Advanced Encryption Standard algorithm. This pipelining enables the columns of the state array to be processed simultaneously and allows the S-box to be shared between the main round unit and the key expansion unit. Furthermore, the design generates all round keys on the fly, which removes the need for a large memory to hold the keys and eliminates the extra delay caused by pre-computing and storing all round keys. The new design outperforms earlier S-box designs in terms of throughput. The 256-bit key size offers the highest degree of security and is suited to top-secret military applications. Compared to the conventional technique, throughput improved by 50%, slice count by 34%, and latency by 20%. The design was created using the Xilinx Vivado tool and implemented on the Xilinx Zynq FPGA board.
References:
[1] Bahram Rashidi, Implementation of An Optimized and Pipelined Combinational Logic Rijndael S-Box on FPGA, I. J. Computer Network and Information Security, Vol.1, 2013, pp. 41-48, DOI: 10.5815/ijcnis.2013.01.05.
[2] Salma Hesham, Mohamed A. Abd ElGhany, and Klaus Hofmann, High Throughput Architecture for the Advanced Encryption Standard Algorithm, IEEE, 2014.
[3] Tanzilur Rahman, Shengyi Pan, and Qi Zhang, Design of a High Throughput 128-bit AES (Rijndael Block Cipher), International Multiconference of Engineers and Computer Scientists, 2010.
[4] Shen-Fu Hsiao, Ming-Chih Chen, and Chia-Shin Tu, Memory-Free Low-Cost Designs of Advanced Encryption Standard Using Common Subexpression Elimination for Subfunctions in Transformations, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol.53, No. 3, 2006.
[5] Xinmiao Zhang, and Keshab K. Parhi, High-Speed VLSI Architectures for the AES Algorithm, IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, Vol.12, No.9, 2004.
[6] Naga M. Kosaraju, Murali Varanasi, and Saraju P. Mohanty, A High-Performance VLSI Architecture for Advanced Encryption Standard (AES) Algorithm.
[7] Hardware Implementation of High-
Performance AES Using MinimalResources,
International Journal of Engineering
Research, Vol.3, No.2, pp. 68-72.
[8] Xinmiao Zhang, Student Member, and
Keshab K. Parhi, Fellow, High-Speed VLSI
Architectures for the AES Algorithm, IEEE
transactions on very large-scale integration
(VLSI) systems, Vol.12, No.9, 2004.
[9] Purnima Gehlot, S. R. Biradar, and B. P.
Singh MITS University, Implementation of
modified Two fish Algorithm using 128 and
192-bit keys on VHDL, International Journal
of Computer Applications, Vol.70, No.13,
2013.
[10] S. Karthik, and A. Muruganandam, Research
Scholar, Periyar University, Salem,
Tamilnadu, India, Data Encryption and
Decryption by Using Triple-DES and
performance Analysis of Crypto System,
International Journal of Scientific
Engineering and Research (IJSER) ISSN
(Online), 2014, pp. 2347-3878.
[11] Snehal Wankhade, and Rashmi Mahajan
Dynamic Partial Reconfiguration
Implementation of AES Algorithm,
International Journal of computer
applications, Vol.97, No.3, 2014.
[12] H. D. Tiwari, G. Gankhuyag, C. M. Kim, and
Y. B. Cho, Multiplier design based on ancient
Indian Vedic mathematics, Proc. Int SoC
Design Conf, pp. 65-68. 2008.
[13] S. Patil, Design of speed and power-efficient
multipliers using Vedic Mathematics with
VLSI implementation, IEEE, 2014.
[14] M. Akila, C. Gowribala, and S. M. Shaby,
Implementation of high-speed Vedic
multiplier using modified adder, IEEE
Conference on Communication and signal
processing (ICCSP), 2016, pp. 2244 -2248.
[15] N. Singh, and M. Singh, Design and
Implementation of 16 X 16 High-speed Vedic
multiplier using Brent Kung adder,
International Journal of Science and
Research (IJSR), Vol.5, No.12, 2016, pp. 239-
242.
[16] B. S. Pasuluri, V. J. K. Kishor Sonti,
Performance Analysis of 8-Bit Vedic
Multipliers Using HDL Programming,
ICDSMLA 2019, Lecture Notes in Electrical
Engineering, Springer, Singapore, Vol.601,
2020.
[17] R. Nikhil Mistri, S. B. Somani, and Dr. V. V.
Shete, Design & Comparison of Multiplier
using Vedic Mathematics, Proceedings of
IEEE.
[18] Bindu Swetha Pasuluri, and V. J. K. Kishor Sonti, Design of Vedic multiplier-based FIR filter for signal processing applications, J Phys: Conf. Ser, 2021.
[19] B. S. Pasuluri, and V. J. K. Kishor,
Application of UT multiplier in AES
algorithm and analysis of its performance,
Information Technology in Industry ITI,
Vol.9, No.3, 2021, pp. 647-652.
[20] B. Pasuluri, V. J. K. Kishor Sonti, S. M. M.
Trinath, and N. Bala Dastagiri, Design of
CMOS 6T and 8T SRAM for memory
Applications, Proceedings of Second
International Conference on Smart Energy
and communication. Algorithms for Intelligent
Systems Springer Singapore, 2021,
doi.org/10.1007/978-981-15-6707-0_44.
[21] Bindu swetha Pasuluri, and V. J. K. Kishor
Sonti, Design and Analysis of Instrumentation
amplifier using 45nm technology, in
Informatica Journal, Vol.32, No. 11, 2021.
[22] Suganthi Venkatachalam, and Seok-Bum Ko, Design of Power and Area Efficient approximate multipliers, IEEE Transactions on VLSI Systems, Vol.25, No. 5, 2017.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US