WSEAS Transactions on Computers
Print ISSN: 1109-2750, E-ISSN: 2224-2872
Volume 13, 2014
Towards an Optimal Multicore Processor Design for Cryptographic Algorithms – A Case Study on RSA
Authors: ,
Abstract: This paper aims to identify the optimal multicore processor configuration for cryptographic applications. The RSA encryption algorithm is taken as a case study, and a comprehensive design space exploration (DSE) is performed to obtain the optimal processor configuration that can serve as either a standalone processor or a coprocessor for security applications. The DSE is based on four figures of merit: performance, power consumption, energy dissipation, and lifetime reliability of the processor. A parallel version of the RSA algorithm has been implemented and used as the experimental workload. Direct program execution and full-system simulation are used to evaluate each candidate processor configuration against these figures of merit. Our analysis is based on commodity processors so as to arrive at a realistic optimal configuration in terms of clock rate, number of cores, number of hardware threads, process technology, and cache hierarchy. Our results indicate that the optimal multicore processor for parallel cryptographic algorithms must have a large number of cores, a large number of hardware threads, and a small feature size, and should support dynamic frequency scaling. Executing our parallel RSA algorithm on the identified optimal configuration yields four observations. First, the parallel algorithm achieves a 79% performance improvement over the serial implementation of the same algorithm. Second, running the optimal configuration at the highest possible clock rate achieves a 40.13% energy saving compared to the same configuration at the lowest clock rate. Third, running the optimal configuration at the lowest clock rate achieves a 19.7% power saving compared to the same configuration at the highest clock rate. Fourth, the optimal configuration at a low clock rate achieves a 109.85% higher mean time to failure (MTTF), on average, than the high-frequency configuration. Consequently, the optimal configuration always has the same number of cores, hardware threads, and process technology, but the clock rate should be adjusted according to the design constraints and the system requirements.
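The second and third observations can look contradictory: the high clock rate saves energy, while the low clock rate saves power. They are consistent because energy is power multiplied by execution time. The sketch below is a rough illustration only. It assumes that both percentages refer to average power and total energy for the same workload, which the abstract does not state, and the function name `implied_runtime_ratio` is hypothetical.

```python
def implied_runtime_ratio(energy_saving_high: float, power_saving_low: float) -> float:
    """Return t_low / t_high implied by E = P * t.

    energy_saving_high: fractional energy saving of the high-clock run
                        relative to the low-clock run (assumed meaning).
    power_saving_low:   fractional power saving of the low-clock run
                        relative to the high-clock run (assumed meaning).
    """
    energy_ratio = 1.0 / (1.0 - energy_saving_high)  # E_low / E_high
    power_ratio = 1.0 - power_saving_low             # P_low / P_high
    # E_low / E_high = (P_low * t_low) / (P_high * t_high)
    return energy_ratio / power_ratio


# Figures quoted in the abstract: 40.13% energy saving, 19.7% power saving.
ratio = implied_runtime_ratio(0.4013, 0.197)
print(f"{ratio:.2f}")
```

Under these assumptions the low-clock run would take roughly twice as long as the high-clock run, so its lower power draw is outweighed by the longer execution time and its total energy is higher.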
Pages: 54-77
WSEAS Transactions on Computers, ISSN / E-ISSN: 1109-2750 / 2224-2872, Volume 13, 2014, Art. #6