ISSN 2319 – 2518 www.ijeetc.com Vol. 5, No. 4, October 2016 © 2016 IJEETC. All Rights Reserved

#### **Research Paper**

# IMPLEMENTATION OF POWER EFFICIENT 12-BIT VEDIC MULTIPLIER BY USING ANT ARCHITECTURE

Bala Kiran P<sup>1\*</sup> and Padma Priya K<sup>2</sup>

\*Corresponding Author: Bala Kiran P, 🖂 penumakakiran@gmail.com

In this paper, we propose a power-efficient, low-area Vedic multiplier integrated with the Algorithmic Noise Tolerant (ANT) architecture to meet the demand for high precision and low power. Truncation errors are reduced by combining ancient Vedic multiplication techniques with modern probability and statistics methods, and the hardware complexity of the multiplier is greatly simplified. For the 12×12-bit Vedic multiplier, the effective total on-chip power is decreased by 15% and the circuit area is lowered by 25% compared with the existing fixed-width RPR multiplier. The proposed design therefore uses power efficiently.

Keywords: Vedic multiplier, ANT architecture, Truncation errors

#### INTRODUCTION

The need for ultra-low power consumption is driven by the demand for portable electronic devices and Digital Signal Processors (DSPs). Researchers have demonstrated many low-power techniques to reduce power dissipation. Supply voltage scaling is one of the most effective methods in deep sub-micrometer technologies, including CMOS, because the dynamic power consumption of a CMOS circuit is proportional to the square of the supply voltage (The International Technology Roadmap for Semiconductors, 2009). However, Voltage Over-Scaling (VOS) (Hedge and Shanbhag, 1999) degrades the signal-to-noise ratio (SNR). DSPs perform complex arithmetic operations such as addition and multiplication, and one of the most complex is array multiplication: as the size of the array increases, the power consumption increases drastically. To compensate for these factors, the Algorithmic Noise Tolerant (ANT) technique (Shim *et al.*, 2004) combines voltage over-scaling in the main block with a Reduced Precision Replica (RPR) to minimize the errors while still achieving energy savings.

<sup>1</sup> PG Scholar, Department of ECE, JNTUK, Kakinada, India.

<sup>2</sup> Professor, JNTUK, Kakinada, India.

The ANT architecture is preferred because it mitigates soft errors without compromising circuit area or power consumption, even under supply voltage scaling. Instead of a full-width RPR (Kidambi et al., 1996), the previous technique designs a fixed-width RPR, using probability, statistics and partial product term analysis to find an accurate compensation vector for an efficient RPR design. It gives accurate and efficient results but still suffers from area overhead. We therefore design and implement a 12×12-bit ANT architecture with an ancient but modified Vedic multiplier (Premananda et al., 2013), which simplifies the circuit and improves the critical path delay without compromising chip area and power consumption.
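The quadratic dependence of dynamic power on supply voltage, noted in the introduction, is what makes voltage scaling attractive. The sketch below assumes the textbook first-order model P = α·C·V²·f (switching activity, switched capacitance, supply voltage, clock frequency) and ignores leakage. The function names and the 1.0 V to 0.6 V operating point are hypothetical illustrations.

```python
def dynamic_power(activity: float, c_load: float, vdd: float, freq: float) -> float:
    """First-order CMOS switching power P = activity * C * Vdd**2 * f, in watts."""
    return activity * c_load * vdd ** 2 * freq


def power_ratio_after_scaling(vdd_scaled: float, vdd_nominal: float) -> float:
    """Fraction of dynamic power left when only Vdd changes (C, f, activity fixed)."""
    return (vdd_scaled / vdd_nominal) ** 2


if __name__ == "__main__":
    # Hypothetical example: scaling the supply from 1.0 V down to 0.6 V.
    ratio = power_ratio_after_scaling(0.6, 1.0)
    print(f"remaining dynamic power: {ratio:.2f} of nominal")
```

Scaling to 0.6 V leaves about 36% of the dynamic power. Pushing the voltage further (VOS) is what makes the MDSP miss timing, which the ANT error-correction block then has to absorb.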

In general, the speed of a multiplier is limited by the speed of the adders used to accumulate the weighted partial product terms. Here, the partial product terms are accumulated using a carry-skip technique. Vedic mathematics offers several multiplication algorithms, and one of the best and most basic is Urdhva Tiryagbhyam (Premananda *et al.*, 2013), also known as the Vertical-Crosswise algorithm. This method is used in various branches of engineering for computation and digital signal processing. In this work, the Vedic multiplier is integrated with the ANT architecture, its error precision is evaluated, and its power consumption is greatly reduced.
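The vertical-crosswise idea can be shown bit by bit. Each column k of the product collects every crosswise pair a_i·b_j with i + j = k, adds the carry from the previous column, keeps one bit and passes the remainder on. The sketch below assumes unsigned operands. `urdhva_multiply` is a hypothetical name, and the loop models the arithmetic of the algorithm rather than the gate-level circuit.

```python
def urdhva_multiply(a: int, b: int, width: int) -> int:
    """Vertical-crosswise (Urdhva Tiryagbhyam) product of two unsigned `width`-bit ints."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, 0                    # the carry into the first step is zero
    for k in range(2 * width - 1):          # one step per column of weight 2**k
        # Crosswise products: every pair (i, j) with i + j == k.
        column = sum(a_bits[i] * b_bits[k - i]
                     for i in range(max(0, k - width + 1), min(k, width - 1) + 1))
        total = column + carry              # add the carry from the previous step
        result |= (total & 1) << k          # keep one result bit
        carry = total >> 1                  # pass the rest on to the next step
    result |= carry << (2 * width - 1)      # the final carry forms the top bit(s)
    return result


if __name__ == "__main__":
    # Exhaustive check against Python's built-in multiplication for 6-bit operands.
    assert all(urdhva_multiply(a, b, 6) == a * b
               for a in range(64) for b in range(64))
    print(urdhva_multiply(0b001100, 0b010101, 6))  # 12 * 21 = 252
```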

## EXISTING ANT ARCHITECTURE DESIGN

The existing ANT architecture consists of a Main Digital Signal Processor (MDSP) block and an Error Correction (EC) block, as shown in Figure 1. Voltage over-scaling (VOS) is applied to the MDSP block. If the critical path delay $T_{cp}$ of the MDSP is longer than the sampling period $T_{sam}$, soft errors occur. The EC block is an exact copy of the MDSP but with reduced precision and a shorter computation delay. The output of the MDSP is $y_a[n]$, which contains soft errors, and the output of the RPR block is $y_r[n]$. Because the critical path delay of the RPR is smaller than $T_{sam}$, $y_r[n]$ can be used to detect errors in $y_a[n]$. The difference $|y_a[n] - y_r[n]|$ is compared against a threshold $Th$: if the difference is larger than $Th$, the output is $\hat{y}[n] = y_r[n]$; otherwise, $\hat{y}[n] = y_a[n]$.
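The output-selection rule just described is a single comparison per sample. The sketch below is a minimal version: `ant_select` is a hypothetical name, and the replica estimate and threshold in the usage lines are made-up values for illustration, not results from this design.

```python
def ant_select(y_a: int, y_r: int, th: int) -> int:
    """ANT decision: keep the MDSP output y_a unless it differs from the
    RPR estimate y_r by more than the threshold th, then fall back to y_r."""
    return y_r if abs(y_a - y_r) > th else y_a


if __name__ == "__main__":
    exact = 12 * 21                 # 252, the error-free product
    y_r = 256                       # hypothetical reduced-precision estimate
    th = 63                         # hypothetical threshold
    print(ant_select(exact, y_r, th))        # 252: small deviation, keep MDSP
    print(ant_select(exact ^ 128, y_r, th))  # 256: MSB timing error, use RPR
```

Because VOS errors tend to hit the long carry paths that produce the most significant bits, they show up as large deviations that a coarse replica can still detect.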

Th is determined by,

$$Th = \max_{\forall \text{input}} \left| y_o[n] - y_r[n] \right| \qquad \dots (1)$$

where $y_o[n]$ is the error-free output signal. This technique lowers power consumption while keeping SNR degradation small. Compared with a full-width design, a fixed-width multiplier avoids the growth of bit width in DSP datapaths. A fixed-width DSP block with n-bit inputs and an n-bit output is best obtained by truncating the n least significant bits of the 2n-bit product, but this introduces a rounding error. The rounding error can be compensated either with a constant correction value (Lim, 1992; Schulte and Swartzlander, 1993; and Jou and Wang, 2000) or with a variable correction value (Curticapean and Niittylahti, 2001; Strollo et al., 2005; Kuang and Wang, 2006; Petra et al., 2010; and Wey and Wang, 2010). In the fixed-width design, the compensation value is chosen to offset the overall truncation error with respect to the MDSP block. The error compensation algorithm uses the partial product terms with the largest weight in the truncated LSB part, together with probability, statistics and linear regression analysis, to determine the error compensation value (Jou *et al.*, 1999).
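The effect of direct truncation and of a constant correction can be checked numerically. The sketch below assumes unsigned n-bit operands and exhaustive, uniformly distributed inputs. `truncated_product` and `error_stats` are hypothetical names, and the correction is kept at full resolution rather than folded into the n-bit output as hardware would do. If this truncated product were used as the replica output $y_r[n]$, its maximum error would play the role of $Th$ in Eq. (1).

```python
from itertools import product
from statistics import mean


def truncated_product(x: int, y: int, n: int) -> int:
    """Direct-truncation fixed-width product: add only the partial products
    x_i*y_j whose column i + j is >= n; the LSB columns are never formed."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n:
                total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total


def error_stats(n: int, correction: int = 0):
    """Mean and worst-case |error| of eps = P - (P_t + correction) over all inputs."""
    errors = [x * y - (truncated_product(x, y, n) + correction)
              for x, y in product(range(2 ** n), repeat=2)]
    return mean(errors), max(abs(e) for e in errors)


if __name__ == "__main__":
    n = 6
    mean_err, max_err = error_stats(n)
    constant = round(mean_err)      # illustrative constant correction value
    print("uncorrected (mean, max):", mean_err, max_err)
    print("corrected   (mean, max):", error_stats(n, constant))
    # max_err would be Th of Eq. (1) if y_r[n] were this truncated product.
```

For n = 6 the mean error drops from 80.25 to 0.25 after adding the constant 80, while the worst case only falls from 321 to 241. A constant removes the bias but not the input-dependent spread, which is the motivation for variable (input-dependent) correction.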

In this technique, no dedicated fixed-width RPR block is designed; instead, fixed-width multiplier circuits are combined with the full-width multiplier circuit to compensate the error value. The error compensation circuit should be placed on a non-critical path of the fixed-width RPR so that it does not increase the critical path delay.

#### **Error Compensation Vector**

The main function of the RPR block is to correct the errors that occur in the output of the MDSP and to maintain the SNR of the whole system while operating at a low supply voltage. The compensation vector circuit is shown in Figure 2. In the MDSP of the ANT architecture, the n-bit inputs X and Y of the Baugh-Wooley multiplier are given by



$$X = \sum_{i=0}^{n-1} x_i \cdot 2^i, \qquad Y = \sum_{j=0}^{n-1} y_j \cdot 2^j \qquad \dots (2)$$

The multiplication result P is

$$P = \sum_{k=0}^{2n-1} p_k \cdot 2^k = \sum_{j=0}^{n-1} \sum_{i=0}^{n-1} x_i y_j \cdot 2^{i+j} \qquad \dots (3)$$
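Eq. (3) states that the product is the weighted sum of the n² partial-product bits $x_i y_j$, arranged in columns k = i + j. The sketch below checks that identity for unsigned operands only; the Baugh-Wooley sign handling is omitted, and the function names are hypothetical.

```python
def partial_product_array(x: int, y: int, n: int) -> dict[int, list[int]]:
    """Group the partial products x_i*y_j of Eq. (3) by column k = i + j."""
    columns: dict[int, list[int]] = {k: [] for k in range(2 * n - 1)}
    for j in range(n):
        for i in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    return columns


def product_from_array(columns: dict[int, list[int]]) -> int:
    """Evaluate Eq. (3): every partial product weighted by 2**(i + j)."""
    return sum(sum(bits) << k for k, bits in columns.items())


if __name__ == "__main__":
    cols = partial_product_array(0b1101, 0b1011, 4)
    print([len(bits) for bits in cols.values()])  # column heights 1,2,3,4,3,2,1
    print(product_from_array(cols) == 0b1101 * 0b1011)
```

Column 2n-1 holds no partial products; the bit $p_{2n-1}$ of Eq. (3) is produced only by carries out of the lower columns.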

The (n/2)-bit unsigned full-width Baugh-Wooley partial product array is divided into four subsets: the Most Significant Part (MSP), the Input Correction Vector ICV ($\beta$), the Minor Input Correction Vector MICV ($\alpha$) and the Least Significant Part (LSP). In the fixed-width design, only the MSP is computed and the remaining subsets are truncated. ICV ($\beta$) and MICV ($\alpha$) are used to construct the error compensation algorithm because they carry the highest weight among the truncated terms.
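The claim that ICV and MICV carry the most weight among the truncated terms can be checked by splitting an m-bit array (m = n/2 in the notation above) and averaging each subset's contribution. The sketch below assumes the column layout used in the fixed-width literature: MSP holds the columns at or above m, ICV ($\beta$) is column m-1, MICV ($\alpha$) is column m-2, and LSP holds everything below. `split_array` is a hypothetical name.

```python
from itertools import product
from statistics import mean


def split_array(x: int, y: int, m: int) -> dict[str, int]:
    """Weighted value each subset of the m x m unsigned partial-product array
    contributes to x*y, under the assumed layout:
    MSP = columns >= m, ICV = column m-1, MICV = column m-2, LSP = columns < m-2."""
    parts = {"MSP": 0, "ICV": 0, "MICV": 0, "LSP": 0}
    for i in range(m):
        for j in range(m):
            k = i + j
            bit = ((x >> i) & 1) & ((y >> j) & 1)
            if k >= m:
                name = "MSP"
            elif k == m - 1:
                name = "ICV"
            elif k == m - 2:
                name = "MICV"
            else:
                name = "LSP"
            parts[name] += bit << k
    return parts


if __name__ == "__main__":
    m = 6
    samples = [split_array(x, y, m) for x, y in product(range(2 ** m), repeat=2)]
    for name in ("ICV", "MICV", "LSP"):
        print(name, mean(s[name] for s in samples))
```

For m = 6 the mean contributions are 48 (ICV), 20 (MICV) and 12.25 (all remaining LSP columns combined). Under this layout, column m-2 has exactly half the weight of column m-1, matching the statement that $\alpha$ is weighted half as much as $\beta$.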

The accuracy of the fixed-width RPR is measured by the error

$$\varepsilon = \mathsf{P} - \mathsf{P}_{\mathsf{t}} \qquad \dots (4)$$

where P is the output of the MDSP and $P_t$ is the output of the fixed-width RPR. The average truncation error is distributed between $\beta$ and $\beta + 1$. By analysing multiple compensation vectors for the cases in which the average error is greater than $0.5 \cdot 2^{3n/2}$, the compensation error can be lowered effectively. The multiple compensation vectors are constructed by combining $f(\text{ICV})$ and $f(\text{MICV})$ (I-Chyn Wey *et al.*, 2015). The weight of vector $\alpha$ is half that of $\beta$.

#### Fixed Width RPR Multiplier with Compensation Constructed by ICV and MICV

To realize the fixed-width RPR, we construct one directly injected ICV ($\beta$) and one MICV ($\alpha$) to detect the cases of insufficient error compensation. The ICV is realized by $C_1, C_2, C_3, \ldots, C_{(n/2)-1}$, as shown in Figure 3. The other compensation vector is constructed with one conditionally controlled OR gate. One input of the OR gate is $X_{(n/2)}Y_{(n-1)}$; the other input is conditionally controlled by judging whether $\beta = 0$ and $\beta_1$ 0 as well. $C_{m1}$ denotes the controlled input of the AND gate, and $C_m$ and $X_{(n/2)}Y_{(n-1)}$ are fed into the two-input OR gate. This gives an optimal solution that achieves low power consumption and an area-efficient architecture.

#### OVERVIEW OF VEDIC MULTIPLIER

Multipliers (binary multipliers) are critical components in DSP devices. The performance of a DSP processor depends mainly on the complex calculations performed by its multipliers (array multiplications), so the multiplier design has a major impact on overall performance. Vedic multipliers are designed based on the 16 sutras (principles) of Vedic mathematics, which are used in different branches of engineering for computation. Applying Vedic mathematics techniques to multiplication saves computation time and enhances the speed of operation. These methods can also be applied directly to both differential and integral calculus.

# PROPOSED 12-BIT POWER EFFICIENT VEDIC MULTIPLIER

In the previous method, the RPR block is constructed to compensate for the truncation (rounding) errors, so its circuitry, built from combinational logic gates, is complex. Here, we implement a 12×12-bit Vedic multiplier using Urdhva Tiryagbhyam. It is a vertical-crosswise algorithm in which the bits of the multiplier and multiplicand are multiplied vertically and crosswise and the products are added to the carry from the previous step. Each step leaves a result bit and a carry; the carry is added in the next step, and the process continues. At the first step, the carry is taken as zero. The RPR block is implemented with a 6-bit vector using full adders. The 12-bit multiplier block is designed using four 6-bit multipliers, two 8-bit adders and two 12-bit adders. The 12-bit multiplier circuit is shown below.
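Building a 12×12 multiplier from four 6×6 blocks rests on splitting each operand into a high and a low half. The sketch below models only that decomposition. `vedic6` is a hypothetical stand-in that simply multiplies (any exact 6-bit multiplier, such as an Urdhva Tiryagbhyam block, could replace it), and the specific adder widths used in the paper (8-bit and 12-bit adders) are collapsed into integer additions.

```python
MASK6 = (1 << 6) - 1  # selects the low 6 bits of an operand or product


def vedic6(a: int, b: int) -> int:
    """Hypothetical placeholder for the 6x6 Vedic block: any exact 6-bit
    unsigned multiplier can stand in here."""
    return a * b


def vedic12(a: int, b: int) -> int:
    """12x12 unsigned product from four 6x6 products.

    With a = aH*2**6 + aL and b = bH*2**6 + bL:
        a*b = aH*bH*2**12 + (aH*bL + aL*bH)*2**6 + aL*bL
    """
    a_lo, a_hi = a & MASK6, a >> 6
    b_lo, b_hi = b & MASK6, b >> 6
    ll = vedic6(a_lo, b_lo)
    lh = vedic6(a_lo, b_hi)
    hl = vedic6(a_hi, b_lo)
    hh = vedic6(a_hi, b_hi)
    out_low = ll & MASK6                 # lower 6 bits of aL*bL go straight out
    mid = (ll >> 6) + lh + hl            # upper 6 bits of aL*bL plus cross terms
    high = hh + (mid >> 6)               # carries from the middle into the top
    return (high << 12) | ((mid & MASK6) << 6) | out_low


if __name__ == "__main__":
    print(vedic12(0xABC, 0x123) == 0xABC * 0x123)  # True
```

The lower 6 bits of the first product never pass through an adder, which mirrors the way the six LSBs of the low-by-low product go directly to the output.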

First, the six least significant bits of the two operands, (a5-a0) and (b5-b0), are multiplied. Of the resulting 12-bit product, the lower 6 bits go directly to the output, and the upper 6 bits, with (n/2) zeros appended, are passed to a 12-bit Carry Look-ahead Adder (CLA). The remaining three partial products are processed in the same way. Finally, all the 12-bit intermediate results are added together to give the 24-bit product. A CLA is a fast adder used in digital logic; it improves speed by reducing the time required to determine the carry bits. It relies on two things:

- Calculating, for each digit position, whether that position is going to propagate a carry if one comes in from the right.
- Combining these calculated values to be able to deduce quickly whether, for each group of digits, that group is going to propagate a carry that comes in from the right.
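The two steps listed above correspond to per-bit generate/propagate signals and their combination into carries. The sketch below uses the fully expanded sum-of-products form, which in hardware lets all carries be evaluated in parallel; the Python loop only models the logic, not timing. `cla_add` is a hypothetical name, and the grouping of a practical 12-bit CLA into smaller look-ahead blocks is not modelled.

```python
def cla_add(a: int, b: int, width: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two unsigned `width`-bit numbers with carry look-ahead logic.

    Returns (sum, carry_out). Every carry is expanded directly from the
    generate/propagate signals and carry_in:
        c[i+1] = g[i] | p[i]&g[i-1] | ... | p[i]&...&p[0]&c[0]
    so, in hardware, no carry waits for the carry below it.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate
    carries = [carry_in]
    for i in range(width):
        c_next, prefix = 0, 1            # prefix = p[i] & p[i-1] & ... so far
        for k in range(i, -1, -1):
            c_next |= prefix & g[k]
            prefix &= p[k]
        c_next |= prefix & carry_in
        carries.append(c_next)
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width]


if __name__ == "__main__":
    print(cla_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```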

The multiplier is integrated with the proposed ANT architecture, and its error precision is evaluated using ModelSim-Altera Starter Edition. After simulating the Verilog code, the total on-chip power is found to be reduced to 24.01 W. Hence, the Vedic multiplier approach is power efficient.
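The relative savings implied by Table 1 can be computed directly from its entries. The sketch below only performs arithmetic on the reported values; `reduction_percent` is a hypothetical helper. Table 1 lists the two designs in different technologies (90 nm and 45 nm CMOS), so the figures compare the two implementations as reported, not the architectures at equal process nodes.

```python
# Values copied from Table 1: (12-bit fixed-width RPR, 12-bit Vedic multiplier).
TABLE_1 = {
    "total on-chip power (W)": (38.829, 24.013),
    "chip size (um^2)": (4616.5, 2545.2),
    "transistor/gate count": (1372, 1240),
    "delay (ns)": (9.868, 7.940),
}


def reduction_percent(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    for metric, (rpr, vedic) in TABLE_1.items():
        print(f"{metric}: {reduction_percent(rpr, vedic):.1f} % lower")
```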

# SIMULATION RESULTS



[Figure: ModelSim simulation waveform of the RPR block, showing signals /rpr/a, /rpr/b, /rpr/p, /rpr/cm1, /rpr/cm2, /rpr/cm and /rpr/s over a 0 to 1000 ps time axis]








Table 1: Performance Comparison of Baugh-Wooley and Vedic Multiplier

| Design Name | 12-bit Fixed Width RPR Multiplier | 12-bit Power Efficient Vedic Multiplier |
|---|---|---|
| Technology (Process) | 90 nm CMOS | 45 nm CMOS |
| Word Length | 12 | 16 |
| Product Word Length (n) | 24 | 32 |
| Transistor/Gate Count | 1372 | 1240 |
| Power Supply | 1 | 1 |
| Max Frequency | 200 MHz | 250 MHz |
| Delay (ns) | 9.868 | 7.940 |
| Total On-Chip Power | 38.829 W | 24.013 W |
| Chip Size | 4616.5 µm² | 2545.2 µm² |

#### CONCLUSION

In this paper, a 12-bit low-error and area-efficient fixed-width RPR-based ANT multiplier design in 90 nm CMOS technology was presented. The proposed 12-bit Vedic multiplier with an 8-bit fixed-width RPR circuit is implemented in 45 nm CMOS technology. Under a 0.6 V supply voltage and a 200 MHz operating frequency, the total power consumption of the fixed-width RPR design is 38.36 W, whereas the 12-bit Vedic multiplier consumes only 24.013 W. Hence, the power is used efficiently.

### REFERENCES

1. Curticapean F and Niittylahti J (2001), "A Hardware Efficient Direct Digital Frequency Synthesizer", in *Proc. 8th IEEE Int. Conf. Electron., Circuits, Syst.*, Vol. 1, September, pp. 51-54.
2. Hedge R and Shanbhag N R (1999), "Energy-Efficient Signal Processing via Algorithmic Noise-Tolerance", in *Proc. IEEE Int. Symp. Low Power Electron. Des.*, August, pp. 30-35.
3. I-Chyn Wey, Chien-Chang Peng and Feng-Yu Liao (2015), "Reliable Low-Power Multiplier Design Using Fixed-Width Replica Redundancy Block", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 23, No. 1, pp. 78-87.
4. Jou J M, Kuang S R and Chen R D (1999), "Design of Low-Error Fixed-Width Multipliers for DSP Applications", *IEEE Trans. Circuits Syst.*, Vol. 46, No. 6, pp. 836-842.
5. Jou S J and Wang H H (2000), "Fixed-Width Multiplier for DSP Application", in *Proc. IEEE Int. Symp. Comput. Des.*, September, pp. 318-322.
6. Kidambi S S, El-Guibaly F and Antoniou A (1996), "Area-Efficient Multipliers for Digital Signal Processing Applications", *IEEE Trans. Circuits Syst. II, Exp. Briefs*, Vol. 43, No. 2, pp. 90-95.
7. Kuang S R and Wang J P (2006), "Low-Configurable Error Truncated Multipliers for Multiply-Accumulate Applications", *Electron. Lett.*, Vol. 42, No. 16, pp. 904-905.
8. Lim Y C (1992), "Single-Precision Multiplier with Reduced Circuit Complexity for Signal Processing Applications", *IEEE Trans. Comput.*, Vol. 41, No. 10, pp. 1333-1336.
9. Petra N, Caro D D, Garofalo V, Napoli N and Strollo A G M (2010), "Truncated Binary Multipliers with Variable Correction and Minimum Mean Square Error", *IEEE Trans. Circuits Syst.*, Vol. 57, No. 6, pp. 1312-1325.
10. Premananda B S, Samarth S Pai, Shashank B and Shashanl S Bhat (2013), "Design and Implementation of 8-Bit Vedic Multiplier", *International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering*, Vol. 2, No. 12, pp. 5877-5882.
11. Schulte M J and Swartzlander E E (1993), "Truncated Multiplication with Correction Constant", in *Proc. Workshop VLSI Signal Process.*, Vol. 6, pp. 388-396.
12. Shim B, Sridhara S and Shanbhag N R (2004), "Reliable Low-Power Digital Signal Processing via Reduced Precision Redundancy", *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, Vol. 12, No. 5, pp. 497-510.
13. Strollo A G M, Petra N and Caro D D (2005), "Dual-Tree Error Compensation for High Performance Fixed-Width Multipliers", *IEEE Trans. Circuits Syst. II, Exp. Briefs*, Vol. 52, No. 8, pp. 501-507.
14. The International Technology Roadmap for Semiconductors [Online] (2009), available http://public.itrs.net/
15. Wey I C and Wang C C (2010), "Low-Error and Area-Efficient Fixed Width Multiplier by Using Minor Input Correction Vector", in *Proc. IEEE Int. Conf. Electron. Inf. Eng.*, Vol. 1, August, pp. 118-122, Kyoto, Japan.