Abstract
Approximate computing is a rapidly growing technique that speeds up error-resilient applications such as machine learning and deep learning with less computational effort while maintaining acceptable accuracy. The inherent error tolerance of machine and deep learning gives the designer freedom to simplify the circuitry and speed up computation at the expense of the accuracy of the computed results. Adders are fundamental blocks of any computation. To optimize them for better performance, this work proposes 2-bit multi-bit approximate adders (MAPX) that break the lengthy carry chain. In contrast with other larger-width approximate adders, which use accurate adders for the most significant part, the proposed 2-bit MAPX-1 and MAPX-2 adders are arranged in various ways to compose both the most and least significant parts. The resulting 8-bit and 16-bit adders are evaluated for their performance and error characteristics. The proposed 2-bit MAPX-2 shows better error characteristics, with a mean error distance (MED) of 0.250, while occupying less area, and MAPX-1 achieves lower power and delay at the cost of accuracy. Among the extended adders, MAPX 8-bit Design1 outperforms the best-performing APX-based 8-bit Design1. Its error performance is improved by 14%, 42.1% and 50.4% compared to the existing APX 8-bit Design1, Design2 and Design3, respectively. Similarly, the proposed MAPX 16-bit Design1 clearly outperforms the best-performing APX 16-bit Design1, and its error performance is improved by 24.3%, 34.9% and 50.3% compared to APX 16-bit Design1, Design2 and Design3, respectively. To evaluate the proposed adder in a real application, the extended MAPX 16-bit Design1 is fitted into the convolution layer of the Low Weights Digit Detector (LWDD), a convolutional neural network-based digit classification system. The modified system accelerates computation by a factor of 1.25 while achieving an accuracy of 91%, which makes it well suited to error-tolerant real applications. All the adders are synthesized and implemented on the Intel Cyclone IV EP4CE22F17C6N FPGA.
Introduction
In recent years, convolutional neural network (CNN) based deep learning techniques have attracted significant attention among researchers developing solutions for a wide range of cognitive challenges such as object recognition, digit detection, image classification and natural language processing [1]. The convolution process in a CNN is computationally demanding, and most machine learning applications require real-time performance. CNNs are also becoming deeper to improve the inference accuracy of applications [2]. As the number of layers increases, the number of network parameters and the computation involving those parameters also increase. This creates the need for computation accelerators to speed up CNN processing [3].
Hardware accelerators are preferred for speeding up CNN computation. These architectures are developed mainly for inference, since training can be accomplished in many ways, such as on a local CPU, a TPU or clusters [4]. CNN accelerators aim for high throughput while keeping power consumption and latency low across a wide range of applications [5]. Various hardware accelerator approaches in the literature focus on multicore CPUs, GPUs, FPGAs and ASICs. Since most deep learning applications target portable devices, it is necessary to choose an accelerator that speeds up inference while consuming little power and area [6].
Current trends in accelerator development seek to exploit reconfigurability so that accelerators can be adapted and upgraded frequently as CNNs evolve rapidly. Field Programmable Gate Arrays (FPGAs) are suitable and flexible candidates for building an accelerator that meets all the aforementioned requirements [1, 5–8]. Moreover, FPGAs can be integrated efficiently alongside other onboard devices, which allows the FPGA's results to be used within integrated systems [9].
Acceleration through an FPGA can be achieved by incorporating the following [1, 7, 10]: 1) implementing a dedicated hardware block for the convolution operation; 2) using fixed-point arithmetic instead of floating-point calculations; 3) deploying approximate computing for processing; and 4) reducing memory access time by organizing the weights of the network.
The proposed work aims to accelerate a CNN for the digit classification application by developing dedicated fixed-point inexact (approximate) arithmetic blocks for the convolution operation.
Approximate computing plays a vital role in speeding up error-tolerant applications, especially image processing and neural network-based processing [10]. It is a trade-off that gains speed at the expense of computational accuracy. This trade-off is limited by the application's inherent ability to tolerate error, so each application needs its own approximation limits.
CNNs inherit error tolerance from their learning process, in which randomly initialized weights are updated iteratively. In this work, we take advantage of the error-resilient property of CNNs and focus on the most common block used in the convolution computation. The fundamental block involved in nearly all CNN computation is the adder. The adder contributes to the entire system’s performance and influences the total power consumption. Thus, introducing inexactness into the addition process curtails power, area and delay while improving the whole system’s performance [11].
Inexact adders can be designed using two approaches to limit carry propagation [12, 13]: 1) inexact full adders (single-bit), and 2) block-based inexact adders or equal segment adders (multi-bit).
The first approach approximates the least significant bits of the result and computes the most significant bits with accurate adders, which substantially reduces power and area consumption. Since the least significant bits are computed inexactly, there is a significant loss of information in the processed data. This loss can be managed by using more accurate adders to keep the accuracy at a level set by the application’s error tolerance. This approach improves the power, delay and area performance of the design [14–17].
The second approach approximates block adders rather than single full adders. An adder is divided into multiple sub-adder blocks of a specific size, and these smaller blocks either overlap or are kept disjoint. The sum of each sub-adder is produced by speculating its carry-in. Since accurate adders are commonly used as the sub-adder blocks, errors in this method are relatively small, and it has higher speed, area and power in comparison with the first method [12, 18–20].
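The disjoint variant of the block-based approach can be sketched in a few lines of Python. This is a minimal behavioural model, assuming unsigned operands, equal-sized blocks and a speculated carry-in of 0 for every block; the function name `segmented_add` and its parameters are illustrative and do not come from the cited works.

```python
def segmented_add(a: int, b: int, width: int = 8, block: int = 2) -> int:
    """Approximate a + b using disjoint blocks whose carry-in is speculated as 0.

    Assumes 0 <= a, b < 2**width and that width is a multiple of block.
    """
    mask = (1 << block) - 1
    result = 0
    for shift in range(0, width, block):
        # Each block is an exact adder on its own slice of the operands.
        block_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        if shift + block < width:
            # Drop the block's carry-out: it never reaches the next block.
            block_sum &= mask
        # The top block keeps its carry-out as the final carry bit.
        result |= block_sum << shift
    return result


print(segmented_add(0b0101, 0b0010, width=4))  # no carry crosses a block boundary
print(segmented_add(0b0011, 0b0001, width=4))  # the carry out of the low block is lost
```

Because no carry crosses a block boundary, the critical path is bounded by a single block, and every error comes from a dropped inter-block carry.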
This work proposes a multi-bit sub-adder that trades off between the single-bit and segmented approaches. The proposed 2-bit sub-adder is designed by breaking the lengthy carry chain, which is the primary source of processing delay in existing larger-width adders. Area efficiency and low power are achieved by carefully approximating the sum and carry logic expressions used in the design while keeping sufficient accuracy. The 2-bit sub-adder with relatively better accuracy is then used as the building block of larger-width adders. These adders fit well in fixed-point CNN accelerator designs.
The remainder of this manuscript is organized as follows. Section II reviews the background and related work on state-of-the-art single-bit and block-based approximate adders. Section III discusses the error characterization and performance measures of approximate adders. Section IV presents the proposed 2-bit, 8-bit and 16-bit adders and compares them with existing adders. Section V evaluates the proposed adder in a CNN digit classification application. Finally, Section VI summarizes the significance of this work and concludes.
Related works
Inexact computing has gained the attention of researchers in the past few years, and many approximate adders are available in the literature. These works explore either approximate full adders or block-based adders; only one work in the literature trades off both methods and explores the design space by designing a multi-bit adder [21]. In this section, we present the works in the literature that are most closely related to ours.
Mahdiani et al. [17] proposed Bio-inspired Imprecise Computational blocks (BICs), which place OR gates in the lower part of the adder. A multi-bit adder is split into a precise part and an imprecise part, along with one AND gate that generates the input carry. The precise part uses regular, accurate full adders, and the imprecise part uses only OR gates to compute the sum. This method reduces the resources needed for implementation while keeping power and delay low. The authors claim that the adder is suitable for inherently error-tolerant soft computing applications.
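The precise/imprecise split described above can be modelled as follows. This is a minimal sketch rather than the circuit from [17]: it assumes unsigned operands and assumes that the single AND gate combines the most significant bits of the two lower parts to form the carry-in of the precise part. The name `lower_or_add` and the parameter `lower` are illustrative.

```python
def lower_or_add(a: int, b: int, width: int = 8, lower: int = 4) -> int:
    """Approximate a + b with an OR-based lower part and an exact upper part.

    Assumes 0 < lower < width.
    """
    a &= (1 << width) - 1
    b &= (1 << width) - 1
    low_mask = (1 << lower) - 1
    a_lo, b_lo = a & low_mask, b & low_mask
    a_hi, b_hi = a >> lower, b >> lower
    # Imprecise part: a bitwise OR replaces the chain of full adders.
    sum_lo = a_lo | b_lo
    # Assumed carry generation: one AND gate on the top bits of the lower part.
    carry_in = (a_lo >> (lower - 1)) & (b_lo >> (lower - 1)) & 1
    # Precise part: exact addition of the upper bits plus the predicted carry.
    sum_hi = a_hi + b_hi + carry_in
    return (sum_hi << lower) | sum_lo


print(lower_or_add(0b00010001, 0b00100010))  # operands share no set bits
print(lower_or_add(0b00001000, 0b00001000))  # both lower-part top bits are set
```

The OR is exact whenever the lower parts share no set bits, so errors appear only when both operands have a 1 in the same lower position.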
Shafique et al. [18] proposed a novel overlapped reconfigurable adder with error correction (GeAr), which is the most generic framework of its kind. The adder's accuracy can be configured by choosing proper values for the overlap, the carry prediction bits and the bits contributing to the final sum. The accuracy of the adder is determined by the number of bits contributed to the final sum. If the number of overlapping bits and sum-contributing bits is taken as (1, 1), it becomes an overlapped 2-bit adder design similar to our work. However, it requires more adders, and the addition is carried out repeatedly for the bit positions involved in the overlap.
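The overlapped scheme can be illustrated with a short behavioural model. This is a sketch under stated assumptions, not the design from [18]: every sub-adder spans `r + p` bits, starts with a carry-in of 0, and contributes only its top `r` bits (the first sub-adder contributes all of its bits); error correction is omitted. The names `overlapped_add`, `r` and `p` are illustrative.

```python
def overlapped_add(a: int, b: int, width: int = 8, r: int = 2, p: int = 2) -> int:
    """Approximate a + b with overlapping sub-adders of length r + p.

    r: bits each sub-adder contributes to the final sum.
    p: overlapping lower bits used only to predict the carry.
    Assumes 0 <= a, b < 2**width.
    """
    length = r + p
    if width < length or (width - length) % r != 0:
        raise ValueError("width must equal (r + p) plus a multiple of r")
    window = (1 << length) - 1
    count = (width - length) // r + 1
    result = 0
    sub_sum = 0
    for k in range(count):
        start = k * r
        # Every sub-adder starts with carry-in 0 and sees only its own window.
        sub_sum = ((a >> start) & window) + ((b >> start) & window)
        if k == 0:
            # The first sub-adder contributes all of its sum bits.
            result |= sub_sum & window
        else:
            # Later sub-adders contribute only their top r bits.
            result |= ((sub_sum & window) >> p) << (start + p)
    # The carry-out of the last sub-adder becomes the final carry bit.
    result |= (sub_sum >> length) << width
    return result


print(overlapped_add(0b00001100, 0b00000100))  # carry generated inside the overlap
print(overlapped_add(0b00001111, 0b00000001))  # carry generated below the overlap
```

The two calls differ in where the carry originates: a carry generated inside a sub-adder's prediction bits is captured, whereas one that would have to propagate from below the window is missed. The model also makes the cost noted above visible, since the bits in each overlap are added more than once.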
Yu Gong et al. [19] improved the GeAr adder and applied the resulting adder structure as a component of a better CNN system architecture.
Li Luo et al. [20] implemented a single clock cycle approximate adder (SCCA) with hybrid prediction and error compensation methods. The adder balances computing accuracy and energy efficiency while producing its output in one clock cycle to decrease the critical path delay. The prediction is called hybrid because more bits are allocated to predicting the most significant bits, while fewer bits (simplified prediction) are used for the least significant bits. They evaluated the adder on a CNN.
The idea of approximating a multi-bit adder was proposed by Sarvenaz Tajasob et al. [21]. They proposed 2-bit approximate adder (APX) circuits and extended them to larger adders. Each adder has a different level of approximation in its sum and carry outputs. The proposed adders were compared with other baseline adders and exhibited good area and power performance.
Error characteristics of inexact adders
Inexact adders are characterized by the deviation of their computed results from the accurate results. Error parameters such as error distance (ED), mean error distance (MED) and error rate (ER) are used to characterize them, and each captures a different aspect of inexactness. The error distance is the absolute difference between the computed result and the accurate result. The mean error distance is the average of all the error distances of an approximate adder circuit and is given in Equation (1). The error rate is the percentage of erroneous results among all the results that an approximate adder can produce and is calculated using Equation (2).
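The definitions above can be written compactly as follows. This is a sketch in assumed notation and may differ in form from the original Equations (1) and (2): $N$ is the number of input combinations evaluated, $S'_i$ is the approximate result and $S_i$ is the accurate result for the $i$-th input combination.

```latex
\begin{align}
\mathrm{ED}_i &= \left| S'_i - S_i \right| \notag \\
\mathrm{MED} &= \frac{1}{N} \sum_{i=1}^{N} \mathrm{ED}_i \tag{1} \\
\mathrm{ER} &= \frac{\left| \{\, i : \mathrm{ED}_i \neq 0 \,\} \right|}{N} \times 100\% \tag{2}
\end{align}
```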
Among the available error metrics, MED is considered a representative metric for optimizing approximate adders and for comparing approximate adders of similar width for practical applications [21, 22]. In this work, we also use MED as the comparative metric, with uniformly distributed inputs for all adders.
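For small widths, MED and ER can be obtained by exhaustive enumeration, which corresponds to uniformly distributed inputs. The sketch below assumes an adder modelled as a Python function of two unsigned operands without a carry-in; `error_metrics` and the stand-in `or_adder` are illustrative names, and the OR-based adder is only a placeholder, not one of the adders evaluated in this work.

```python
from itertools import product
from typing import Callable, Tuple


def error_metrics(approx_add: Callable[[int, int], int], width: int) -> Tuple[float, float]:
    """Return (MED, ER in percent) over all pairs of width-bit unsigned operands."""
    total = 0
    erroneous = 0
    distance_sum = 0
    for a, b in product(range(1 << width), repeat=2):
        # Error distance: absolute deviation from the exact sum.
        distance = abs(approx_add(a, b) - (a + b))
        distance_sum += distance
        erroneous += distance != 0
        total += 1
    return distance_sum / total, 100.0 * erroneous / total


def or_adder(a: int, b: int) -> int:
    # Placeholder approximate adder used only to exercise the metric code.
    return a | b


med, er = error_metrics(or_adder, width=2)
print(f"MED = {med}, ER = {er}%")
```

Exhaustive enumeration grows as 4^width, so for 16-bit adders a random sample of input pairs would be a more practical choice.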
This section explores the approximate adder design space by introducing the multi-bit approximate adder. Most works in the literature fall into one of the two categories mentioned in Section I. This work instead applies approximation to multiple bits together. To develop a fundamental block, we first developed a 2-bit approximate adder and later extended it to larger-width adders.
We designed two different 2-bit approximate adders with an optimized number of erroneous outputs and MED. While designing them, we considered the delay caused by carry propagation from one sub-module to the next. Most approximate adder works in the literature follow the ripple carry structure to construct adders of larger bit width [11–13].
The major problem with single full adder approximation and block-based approximation is the lengthy carry chain. It is caused by the use of accurate adders for the higher-order bits and, in some cases, by approximate adders in the lower-order bits that still follow the carry rippling principle. In our proposed design, we carefully managed and broke the carry chain to minimize the path delay. We take 2 bits as a set and approximate its sum (S0, S1) and carry (Cout) outputs. In this way, the time an adder waits for the carry is reduced by 50% because the sub-carry is calculated internally. This design method not only reduces the carry delay but also increases the accuracy of the result by minimizing the error.
Moreover, in our 2-bit multi-bit approximate adder (MAPX) Design 1 and Design 2, the carry out is generated using multiplexers, as given in Table 1 by Equations (5) and (8). The carry input selects the carry out, which in turn minimizes the gate delay needed to generate Cout in the proposed adders.
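The multiplexer idea can be illustrated with an exact 2-bit adder. The sketch below is not the MAPX logic, whose Boolean functions belong to Table 1; it only shows, under that caveat, how two carry-out candidates can be prepared from the operands alone so that the incoming carry merely drives a 2:1 select. The function name is illustrative.

```python
from typing import Tuple


def two_bit_add_mux_carry(a: int, b: int, cin: int) -> Tuple[int, int, int]:
    """Exact 2-bit addition returning (S0, S1, Cout), with Cout chosen by a mux."""
    a &= 3
    b &= 3
    cin &= 1
    # Both candidates depend only on a and b, so they can be evaluated
    # in parallel, before the incoming carry arrives.
    cout_if_cin_0 = (a + b) >> 2
    cout_if_cin_1 = (a + b + 1) >> 2
    # 2:1 multiplexer: the incoming carry only drives the select line.
    cout = cout_if_cin_1 if cin else cout_if_cin_0
    total = (a + b + cin) & 3
    return total & 1, (total >> 1) & 1, cout


s0, s1, cout = two_bit_add_mux_carry(0b11, 0b01, 0)
print(s0, s1, cout)
```

With this arrangement the carry input passes through a single multiplexer per 2-bit block instead of rippling through two full adders.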
Boolean functions for designing proposed 2-Bit sub-adders
The circuit diagrams of our proposed 2-bit multi-bit adder Design 1 (MAPX-1) and Design 2 (MAPX-2) are shown in Figs. 1 and 2, respectively.

Proposed 2-Bit Multi-bit Adder Design 1 (MAPX-1).

Proposed 2-Bit Multi-bit Adder Design 2 (MAPX-2).
Table 2 compares the proposed adders with the existing adders in terms of MED, ER, critical path delay, logic elements (LE) used and power delay product (PDP). From Table 2, it is evident that the proposed MAPX-2 has a better MED with a lower ER, while MAPX-1 has a relatively good MED with very low power consumption and path delay compared to the other adders. According to [21, 22], an adder with a better MED is suitable for real applications. Therefore, our proposed MAPX-2 is suitable for real applications.
Comparison of various 2-Bit sub-adders implementation
Fig. 3a and 3b compare the mean error distance and the error rate, respectively, of the proposed 2-bit adders with those of existing adders.

a. Mean Error Distance of various adders. b. Error rates of various adders.
Based on the above results, we designed various larger-width adders from the proposed adders to make them fit real applications. Instead of using conventional accurate adders to compute the most significant bits, we used the proposed 2-bit MAPX adders in different combinations to compose the adders. The compositions of the various 8-bit and 16-bit adders, along with those of the adders in existing works, are listed in Table 3.
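The composition step can be pictured as chaining 2-bit blocks from the least to the most significant position. The sketch below is a generic model under stated assumptions: each block maps (a, b, cin) to a 2-bit sum and a carry-out, and the carry-out of one block feeds the next. The block `or_block` is a hypothetical placeholder and does not represent MAPX-1, MAPX-2 or any composition in Table 3.

```python
from typing import Callable, List, Tuple

# A 2-bit block maps (a, b, cin) to (2-bit sum, carry-out).
Block = Callable[[int, int, int], Tuple[int, int]]


def exact_block(a: int, b: int, cin: int) -> Tuple[int, int]:
    total = a + b + cin
    return total & 3, total >> 2


def or_block(a: int, b: int, cin: int) -> Tuple[int, int]:
    # Hypothetical placeholder approximation, NOT one of the proposed adders:
    # OR for the sum bits, AND of the upper bits for the carry, cin ignored.
    return a | b, (a >> 1) & (b >> 1)


def compose_adder(blocks: List[Block]) -> Callable[[int, int], int]:
    """Build a (2 * len(blocks))-bit adder; blocks[0] handles the lowest 2 bits."""
    def add(a: int, b: int) -> int:
        result, carry = 0, 0
        for index, block in enumerate(blocks):
            shift = 2 * index
            block_sum, carry = block((a >> shift) & 3, (b >> shift) & 3, carry)
            result |= (block_sum & 3) << shift
        # The carry-out of the top block is the final carry bit.
        return result | (carry << (2 * len(blocks)))
    return add


mixed_8bit = compose_adder([or_block, exact_block, exact_block, exact_block])
print(mixed_8bit(0b00000010, 0b00000010))
```

Swapping entries in the `blocks` list is enough to model a different arrangement of sub-adders across the least and most significant positions.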
Composition of approximate adders with existing APX and proposed MAPX 2-bit adders
The composed 8-bit and 16-bit adders are evaluated for their accuracy, and their error characteristics are listed in Table 4.
Error characteristics of existing and proposed approximate adders
Our 2-bit MAPX-based 8-bit Design1 outperforms the best-performing APX-based 8-bit Design1. The error performance of the proposed 8-bit design is improved by 14%, 42.1% and 50.4% compared to the existing APX 8-bit Design1, Design2 and Design3, respectively. Similarly, the proposed 16-bit MAPX Design1 clearly outperforms the best-performing 16-bit APX Design1, and its error performance is improved by 24.3%, 34.9% and 50.3% compared to APX 16-bit Design1, Design2 and Design3, respectively.
All the proposed adders are synthesized with Intel Quartus II version 13 and implemented on the Intel Cyclone IV EP4CE22F17C6N FPGA. The implementation results are listed in Table 5.
Synthesized Results of the Proposed MAPX Based Approximate Adders
The synthesis results show a trade-off among area, power and delay when choosing an adder for an application. MAPX 8-Bit Design1 occupies less area and has a much lower delay than the others while consuming slightly more power. Among the 16-bit adders, MAPX 16-Bit Design1 occupies less area with moderate power and delay compared to adders of the same kind. MAPX 16-Bit Design2 has a moderate area and MED, but it exhibits relatively better power and delay performance.
The proposed MAPX 16-Bit Design1, which performs well and has a low MED, is extended to fit in the accelerator to improve the speed of convolutional neural network (CNN) based digit classification.
CNN system architecture for digit classification
Recent developments in convolutional neural network (CNN) system architectures have improved the performance of deep learning algorithms.
A CNN, or ConvNet, is a multi-layer filter specially designed to extract information from data through preprocessing operations. In a CNN, the input parameter size decreases layer by layer as the size of the filter increases. Detailed layer-by-layer dimension variations are presented in [2]. The system architecture of a CNN for digit identification is shown in Fig. 4.

Conventional CNN architecture.
In this work, we concentrate only on the VGG-based CNN system architecture for the digit recognition application. Simonyan and Zisserman developed the VGGNet CNN architecture in 2014 [23]. VGGNet is a standardized architecture with 16 weight layers and small convolution filters. This CNN architecture is well suited to image processing applications such as identification and feature extraction.
Roman A. Solovyev et al. [6] restructured VGGNet into the Low Weights Digit Detector (LWDD) architecture to detect the digits 0 to 9. They optimized the structure in various layers to minimize the total number of weight parameters and represented the weights in 16-bit fixed point. Since these modifications significantly reduce the storage needed by the parameters, the network fits well on an FPGA. The layer structure, image sizes and parameters of the LWDD network are given in Table 6.
Low Weights Digit Detector Neural Network Architecture
The LWDD architecture receives 28×28 digit images and classifies them. The network was trained on digit images from the MNIST dataset, and we use the trained parameters for digit classification inference.
To enhance the hardware performance of the VGG-based network, we replaced the conventional fixed-point adders used in the convolutional layers of the LWDD architecture with our proposed fixed-point multi-bit approximate adder and evaluated the result. The hardware for processing the convolution layer is much simpler in our proposed architecture than in other works. The LWDD processing steps and intermediate image sizes are shown in Table 6.
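Replacing the adder in a convolution can be pictured as making the accumulation step pluggable. The sketch below is a simplified software model, not the LWDD datapath: it assumes non-negative integer pixels and weights, no fixed-point scaling, stride 1 and no padding, and all names are illustrative. The OR-based stand-in only demonstrates the swap and is not a proposed adder.

```python
from typing import Callable, List

Adder = Callable[[int, int], int]


def conv2d_valid(image: List[List[int]], kernel: List[List[int]], add: Adder) -> List[List[int]]:
    """Sliding-window multiply-accumulate where every addition goes through `add`."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            acc = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    # The multiplication stays exact; only the accumulation is swapped.
                    acc = add(acc, image[r + i][c + j] * kernel[i][j])
            row.append(acc)
        output.append(row)
    return output


def exact_add(a: int, b: int) -> int:
    return a + b


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv2d_valid(image, kernel, exact_add))
print(conv2d_valid(image, kernel, lambda a, b: a | b))  # placeholder approximate adder
```

Comparing the two outputs element by element gives a quick view of how adder error accumulates within a single convolution window.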
The performance of the proposed system is measured at the convolution layer level as the total number of clock cycles needed for processing. The clock cycle counts of the LWDD system with a fixed-point adder and of the modified system with our proposed MAPX 16-Bit Design1 extended to 22 bits are listed in Table 7.
Comparison of Clock Cycles Taken for Computation by the Convolution Layers in the LWDD Network
From the results, it is evident that our proposed adder reduces the clock cycle requirement uniformly and increases the system speed by a factor of 1.25. A clock cycle comparison of the existing fixed-point adder based LWDD system and the proposed LWDD system with the MAPX adder is illustrated in Fig. 5. The modified accelerator architecture with the extended MAPX 16-Bit Design1 is implemented on the Cyclone IV FPGA, and the results are listed in Table 8. The detection accuracy of our system, evaluated on the test dataset, is 91%.

Clock cycles taken by different levels of convolution layer for different systems.
Resource Utilization by the LWDD System
In this paper, two 2-bit multi-bit approximate adders are proposed by approximating two bits together. The performance of the proposed adders is demonstrated in comparison with closely related multi-bit adders. Of the proposed adders, MAPX-2 is much more accurate than the others, while MAPX-1 keeps moderate accuracy with better power and delay performance. The designed adders are extended to various 8-bit and 16-bit adders. Among them, MAPX 8-Bit Design1 achieves lower MED, area and delay. MAPX 16-Bit Design1 is relatively accurate with less area and tolerable power and delay. The real application performance of the adder is evaluated by replacing the adder in the convolutional layers of the LWDD CNN architecture. Our modified digit recognition system speeds up computation by a factor of 1.25 while maintaining an accuracy of 91%. This shows that the presented adder is well suited to real error-tolerant applications. Further acceleration can be achieved by developing an approximate multiply-and-accumulate unit for the convolution computation.
