WSEAS Transactions on Computers
Print ISSN: 1109-2750, E-ISSN: 2224-2872
Volume 22, 2023
Residue Number Systems Quantization for Deep Learning Inference
Author:
Abstract: Quantizing learned CNN weights to a Residue Number System (RNS) representation can reduce inference
latency by taking advantage of fast and exact low-bit integer arithmetic. In this paper we review the mathematical
aspects of RNS operations on signed integer values and evaluate implementation choices for converting conventional
floating-point PyTorch weights of CNN models to an RNS representation. We also present a workflow that converts
the weights of PyTorch neural network layers specific to the computer vision domain to 4-bit RNS moduli sets that
maintain classification accuracy within 5% of an 8-bit quantization baseline.
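
As a minimal illustration of the signed RNS arithmetic referred to in the abstract, the following Python sketch encodes signed integers as residues, operates on each residue channel independently, and decodes with the Chinese Remainder Theorem. The moduli set (7, 11, 13, 15) and all function names are assumptions chosen for illustration (each modulus fits in 4 bits and the moduli are pairwise coprime); they are not necessarily the moduli sets or the implementation used in this paper.

```python
from math import prod

# Hypothetical 4-bit moduli set: pairwise coprime, each modulus <= 15.
MODULI = (7, 11, 13, 15)
M = prod(MODULI)        # dynamic range: 15015
HALF = (M - 1) // 2     # signed range is [-HALF, HALF] because M is odd


def to_rns(x: int) -> tuple:
    """Encode a signed integer as one residue per modulus."""
    if not -HALF <= x <= HALF:
        raise ValueError(f"{x} is outside the signed range [-{HALF}, {HALF}]")
    # Python's % returns a non-negative result for a positive modulus,
    # so negative inputs map to their residue class representative directly.
    return tuple(x % m for m in MODULI)


def from_rns(residues: tuple) -> int:
    """Decode residues with the Chinese Remainder Theorem, then restore the sign."""
    total = 0
    for r, m in zip(residues, MODULI):
        m_i = M // m                          # product of the other moduli
        total += r * m_i * pow(m_i, -1, m)    # modular inverse (Python 3.8+)
    total %= M
    # Values in the upper half of [0, M) represent negative integers.
    return total - M if total > HALF else total


def rns_add(a: tuple, b: tuple) -> tuple:
    """Add two RNS values channel by channel, with no carries between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a: tuple, b: tuple) -> tuple:
    """Multiply two RNS values channel by channel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


if __name__ == "__main__":
    w, x = to_rns(-37), to_rns(25)
    print(w)                        # residues of -37
    print(from_rns(rns_add(w, x)))  # decodes to -37 + 25
    print(from_rns(rns_mul(w, x)))  # decodes to -37 * 25
```

Results are exact only while the true sum or product stays inside the signed range [-7507, 7507]; beyond that the value wraps modulo 15015, which is why the choice of moduli set bounds the accumulations a layer can perform.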