ISSN 2319-2518 www.ijeetc.com Vol. 4, No. 2, April 2015 © 2015 IJEETC. All Rights Reserved

## Research Paper

# POWER EFFICIENT ALU DESIGN WITH CLOCK AND CONTROL-SIGNAL GATING TECHNIQUE

Anil Kumar Yadav<sup>1</sup>\* and Mohammed Aneesh Y<sup>1</sup>

\*Corresponding Author: Anil Kumar Yadav, Manil8210yadav@gmail.com

To design a low-power processor, a low-power Arithmetic and Logic Unit (ALU) is required, since the ALU, which carries out the arithmetic and logic operations, is one of the core components and one of the most power-hungry sections of a microprocessor. Clock gating is a power-saving technique for low-power design: it switches off the modules that are not active, as decided by the selection line. This technique is applied to the ALU to reduce its clock power and dynamic power consumption. Control-signal gating saves power by reducing unnecessary switching activity on datapath buses: when a bus is not going to be used, it is held in a quiescent state by stopping the propagation of switching activity through the module(s) driving the bus. The proposed power-efficient ALU uses both clock gating and control-signal gating to minimize clock power and to reduce unnecessary switching activity in the datapath. The functionality of the proposed ALU is verified using the Xilinx tool, and power analysis is carried out using the Xilinx XPower analysis tool.
Keywords: Clock gating, Control-signal gating, Data buses, Low power, Switching activity, Dynamic power

## INTRODUCTION

Power dissipation is becoming a limiting factor in high-performance microprocessor design because of ever-increasing device counts and clock rates, and it continues to grow as an important challenge in deep-submicron chip design. Because it is becoming a bottleneck for future technologies, lowering power consumption is a major issue not only for increasing battery life in portable devices but also for improving reliability and reducing heat-removal cost in high-performance devices. As technology scales, power dissipation becomes a major concern in microprocessor system design. The total power dissipated in a CPU today is typically split among clocks (40%), memory (20%), datapaths (15%), the control unit (15%), and I/O pads (10%), as shown in Figure 1. Hence, 55% of the total CPU power is dissipated in the clocks and datapaths alone. To reduce clock and datapath power, clock gating and control-signal gating techniques are introduced.

<sup>1</sup> Department of Electronics Engineering, Pondicherry University, Pondicherry, India.

A simple microprocessor contains register files, an ALU, and a control circuit. The ALU is the core of the microprocessor, where all computations are performed, and it is one of the most power-hungry sections in terms of both datapath and clock power. Designing a power-efficient ALU is therefore a major step toward reducing clock power and the switching activity of the datapaths. According to Kapadia et al. (1999) and Kamaraju et al. (2010), power optimization, traditionally relegated to the synthesis and circuit levels, has now shifted to the system level and the Register-Transfer Level (RTL). This is possible due to clock gating and control-signal gating.
Clock gating turns off the inactive units of the design, and control-signal gating stops the propagation of switching activity in the datapath by using the selection line to detect which bus is not active.

### **Clock Gating Techniques**

Clock power constitutes a significant portion of dynamic power. In many designs, data are loaded into registers infrequently, but the clock signal continues to switch at every clock cycle, often driving a large capacitive load. A significant amount of power can therefore be saved by identifying when the registers are inactive and disabling the clock during these periods. This technique has been used extensively for dynamic power reduction in low-power applications. Moreover, at any particular instant only a single module may be functional, and unnecessary clocking of the other modules leads to a lot of power dissipation. Clock gating is thus a power-down methodology that selectively clocks modules as and when required while keeping the inactive modules in quiescent mode. The power consumed by charging and discharging the clock at unused gates is avoided in this strategy. There are three types of clock gating:

- a) Latch based clock gating
- b) Flip flop based clock gating
- c) AND/OR gate based clock gating

#### Latch Based Clock Gating

The latch-based clock gating technique adds a level-sensitive latch to the design, as shown in Figure 2, to hold the enable signal from the active edge of the clock until the inactive edge of the clock (Frank Emnett and Mark Biegel, 2000). Since the latch captures the state of the enable signal and holds it until the complete clock pulse has been generated, the enable signal only needs to be stable around the rising edge of the clock.
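The behaviour of the latch-based gate can be illustrated with a small sample-by-sample Python model. This is only a sketch under stated assumptions, not the circuit of Figure 2: the latch is assumed to be transparent while the clock is low and to hold while the clock is high, its reset state is assumed to be 0, and the function names and waveforms are hypothetical. A plain AND gate is modelled alongside it for comparison.

```python
def latch_gated_clock(clk, en):
    """Latch-based gate: the latch follows EN while CLK is low (assumed
    polarity) and holds its value while CLK is high."""
    latched = 0  # assumed reset state of the latch
    gated = []
    for c, e in zip(clk, en):
        if c == 0:           # latch transparent during the low phase
            latched = e
        gated.append(c & latched)
    return gated


def and_gated_clock(clk, en):
    """Plain AND gating with no latch: EN reaches the clock pin directly."""
    return [c & e for c, e in zip(clk, en)]


# Hypothetical waveforms, two samples per clock phase.
# EN is deasserted before the second pulse but glitches high at sample 7,
# in the middle of the high phase of the clock.
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]

print(latch_gated_clock(clk, en))  # [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(and_gated_clock(clk, en))    # [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

In this model the glitch on the enable signal produces a spurious partial pulse with plain AND gating, whereas the latch-based gate passes only the complete first pulse.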
#### Flip-Flop Based Clock Gating

The flip-flop based clock gating technique uses a flip-flop, rather than a level-sensitive latch, as shown in Figure 3, to hold the enable signal from the active edge to the inactive edge of the clock.

#### AND/OR Gate Based Clock Gating

The AND/OR gate based clock gating technique uses an AND or OR gate to control the clock signal. With AND-gate clock gating, the gated clock is active on the rising edge of the input clock, whereas with OR-gate clock gating it is active on the falling edge of the clock.

### **Control-Signal Gating Techniques**

The control-signal gating technique exploits a fine-granularity analysis to reduce switching activity on the datapath buses. The method is based on the Observability Don't Care (ODC) concept, which is used to detect when a bus is not used and to stop the propagation of switching activity through the module(s) driving the bus. There are many ways to implement control-signal gating. The simplest is to put an AND/OR gate in the signal path to stop the propagation of the signal when it needs to be masked. Another technique is to use a latch or flip-flop to block the propagation of the signal. Sometimes a transmission gate or a tristate buffer can be used in place of a latch if charge leakage is not a concern. All signal-gating methods require control signals to stop the propagation of switching activity, which is why the technique is called control-signal gating, as shown in Figure 4.

## CONVENTIONAL 16-BIT ALU WITHOUT GATING

### **Function of Conventional 16-Bit ALU**

The basic blocks of a computer are the Central Processing Unit (CPU), the memory unit, and the input/output unit. The CPU plays the same role in a computer as the brain does in a human being. It contains the registers, the control unit, and the Arithmetic Logic Unit (ALU). The ALU is considered one of the most common and most crucial components of a microprocessor.
Usually, an ALU consists of a number of functional blocks/units for different arithmetic, logic, and shift operations, which are realized using combinational circuits. Each functional block performs a specific arithmetic, logic, or shift operation. The ALU is therefore one of the hottest spots in the datapath. The ALU takes 16-bit operands as inputs, processes the operand data, and gives a 16-bit output. The operations performed by the conventional ALU are listed in Table 1.

Table 1: Function of ALU

| Select | Arithmetic Operation | Select | Logic Operation |
|--------|----------------------|--------|-----------------|
| 0000 | Add | 1000 | AND |
| 0001 | Subtract | 1001 | OR |
| 0010 | Increment | 1010 | NOT |
| 0011 | Decrement | 1011 | EXOR |
| 0100 | Multiply | 1100 | Shift left |
| 0101 | Division | 1101 | Shift right |
| 0110 | Clear | 1110 | Rotate left |
| 0111 | Set | 1111 | Rotate right |

The top-level schematic view and timing waveform of the conventional ALU are shown in Figures 5 and 6, respectively.

## PROPOSED ALU MODEL WITH CLOCK AND CONTROL-SIGNAL GATING

### **Architecture of Proposed ALU**

Instead of being designed as a single module, the ALU is divided into sixteen functional blocks, one of which is selected at a time by the clock gating technique. The problem is that while one module is functioning, the input bit lines of the other fifteen modules remain active, which causes more switching power dissipation. To remove this problem, we introduce control-signal gating, which activates only the input lines required for the operation and deactivates the other functional lines, in the same way that clock gating does for the clock. The proposed architecture is shown in Figure 8; it consists of a clock gating circuit and a control-signal gating circuit feeding the ALU module.
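As a behavioural reference for the sixteen functional blocks, the opcode map of Table 1 can be sketched in Python. The paper does not give the exact semantics of each operation, so the following are assumptions made for illustration only: results wrap to 16 bits, the single-operand operations act on the first operand, shifts and rotates move by one bit, Clear and Set return all zeros and all ones, and division by zero returns all ones. The names `OPS` and `alu` are hypothetical.

```python
MASK = 0xFFFF  # 16-bit datapath


def rotl(x):
    """Rotate a 16-bit value left by one bit (assumed rotate amount)."""
    return ((x << 1) | (x >> 15)) & MASK


def rotr(x):
    """Rotate a 16-bit value right by one bit (assumed rotate amount)."""
    return ((x >> 1) | (x << 15)) & MASK


# One entry per functional block; opcodes follow Table 1.
OPS = {
    0b0000: ("Add", lambda a, b: a + b),
    0b0001: ("Subtract", lambda a, b: a - b),
    0b0010: ("Increment", lambda a, b: a + 1),
    0b0011: ("Decrement", lambda a, b: a - 1),
    0b0100: ("Multiply", lambda a, b: a * b),                # low 16 bits only (assumption)
    0b0101: ("Division", lambda a, b: a // b if b else MASK),  # divide-by-zero value is an assumption
    0b0110: ("Clear", lambda a, b: 0),
    0b0111: ("Set", lambda a, b: MASK),
    0b1000: ("AND", lambda a, b: a & b),
    0b1001: ("OR", lambda a, b: a | b),
    0b1010: ("NOT", lambda a, b: ~a),
    0b1011: ("EXOR", lambda a, b: a ^ b),
    0b1100: ("Shift left", lambda a, b: a << 1),
    0b1101: ("Shift right", lambda a, b: a >> 1),
    0b1110: ("Rotate left", lambda a, b: rotl(a)),
    0b1111: ("Rotate right", lambda a, b: rotr(a)),
}


def alu(sel, a, b):
    """Apply the block selected by the 4-bit opcode and wrap to 16 bits."""
    _name, fn = OPS[sel]
    return fn(a & MASK, b & MASK) & MASK


print(hex(alu(0b0000, 0x00FF, 0x0001)))  # 0x100
print(hex(alu(0b1010, 0x00FF, 0)))       # 0xff00
```

Keeping one table entry per opcode mirrors the partitioning into sixteen blocks, so each entry corresponds to one unit that can be clocked and fed independently.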
Clock Gating Circuit: It is a combination of a 4 x 16 decoder and 16 AND gates, in which the clock signal is ANDed with the outputs of the decoder, so that one of the sixteen AND gates is selected as decided by the opcode. The clock gating circuit thus acts as a clock selector that can select whichever of the sixteen blocks of the architecture requires the clock.

Control Signal Gating Circuit: It is a combination of two 1 x 16 demultiplexers controlled by a control signal (as shown in Figure 8). The input signals are applied to this block, and it provides sixteen pairs of output lines, which feed the ALU blocks as input pairs. The control line selects only one pair of output lines, i.e., at any time only one pair is active and the other fifteen are off. In this way, switching power consumption on the datapath buses is saved. The clock distribution of the proposed ALU is shown in Figure 7.

### **Function of Proposed ALU**

The function of the proposed ALU can be explained as follows. There are two input lines in the proposed ALU architecture: one is the clock signal (Clock) and the other is the selection line (SEL). SEL is fed to the control-signal gating circuit, to the clock-gating circuit, and to the output mux. It acts as the opcode for the decoder in the clock-gating circuit, which selects one of the sixteen AND gates that the clock feeds. SEL is also fed to the control-signal gating circuit as its controlling signal. In this way, the SEL line acts as the control signal for ALU operation. For example, when the selection line is "0000", the opcode is "0000", which selects AND gate 1; this in turn sends the clock signal to the first functional block of the ALU, while the clock is blocked for the remaining fifteen units.
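The decoder-plus-AND clock selector and the pair of 1 x 16 demultiplexers described above can be sketched as a small Python model. It is a behavioural illustration rather than the authors' Verilog: the function names are hypothetical, and inactive operand lines are assumed to be held at a constant 0 (a hardware implementation might instead hold the previous value).

```python
N_BLOCKS = 16


def decoder_4x16(sel):
    """One-hot decode of the 4-bit opcode SEL."""
    return [1 if i == sel else 0 for i in range(N_BLOCKS)]


def gated_clocks(clk, sel):
    """AND the clock with each decoder output: one clock line per block."""
    return [clk & en for en in decoder_4x16(sel)]


def demux_1x16(value, sel, idle=0):
    """Route `value` to output `sel`; hold the other outputs at `idle`
    (the idle level is an assumption of this sketch)."""
    return [value if i == sel else idle for i in range(N_BLOCKS)]


def gate_operands(a, b, sel):
    """Two 1 x 16 demuxes sharing SEL give sixteen (A, B) input pairs."""
    return list(zip(demux_1x16(a, sel), demux_1x16(b, sel)))


sel = 0b0000
clocks = gated_clocks(1, sel)               # clock high, first block selected
pairs = gate_operands(0x1234, 0x00FF, sel)

print(clocks)                               # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(pairs[0], pairs[1])                   # (4660, 255) (0, 0)
print(clocks.count(0) / N_BLOCKS)           # 0.9375
```

With SEL = "0000", only the first block receives the clock and the operands, and fifteen of the sixteen clock lines stay low.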
The control-signal gating circuit activates only the input lines of the first block and deactivates the other fifteen input lines, and the output mux then selects the output of the first functional block, because it is fed the same selection line. Hence, about 93.75% (15/16) of the switching-activity power of the input datapath and about 93.75% (15/16) of the clock power are saved. The timing waveform and top-level schematic of the proposed ALU are shown in Figure 9 and Figure 10, respectively.

## RESULTS

### **Power Comparison**

The dynamic and total power of the conventional and proposed models are calculated with the Xilinx XPower Analyzer tool, targeting the Virtex-6 Low Power FPGA device with speed grade -1L. The power of both models is given in Table 2, and the dynamic power comparison is shown in Figure 11.

Table 2: Power Comparison of Both Models

| Parameter | Conventional Model | Proposed Model |
|-----------------|--------------------|----------------|
| Total power | 1176 mW | 1171 mW |
| Dynamic power | 77 mW | 72 mW |
| Quiescent power | 1099 mW | 1099 mW |

Figure 11: Dynamic Power Comparison

From the above synthesis results and table, it is observed that the dynamic power of the proposed model is 5 mW lower than that of the conventional model (72 mW against 77 mW, a reduction of about 6.5%), while the quiescent power is unchanged at 1099 mW. Hence, clock and control-signal gating can be implemented as a power-efficient technique to reduce the dynamic power of the device.

### **Device Utilization Summary**

The device utilization summaries of the conventional and proposed models are shown in Figure 12 and Figure 13, respectively.
Figure 12: Device Utilization Summary of Conventional Model

| Item | Utilization |
|------|-------------|
| **Slice Logic Utilization:** | |
| Number of Slice LUTs: | 598 out of 2400 |
| Number used as Logic: | 598 out of 2400 |
| **Slice Logic Distribution:** | |
| Number of LUT Flip Flop pairs used: | 598 |
| Number with an unused Flip Flop: | 598 out of 598 |
| Number with an unused LUT: | 0 out of 598 |
| Number of fully used LUT-FF pairs: | 0 out of 598 |
| Number of unique control sets: | |
| **IO Utilization:** | |
| Number of IOs: | 58 |
| Number of bonded IOBs: | 58 out of 102 |
| IOB Flip Flops/Latches: | 17 |

Figure 13: Device Utilization Summary of Proposed Model

| Item | Utilization |
|------|-------------|
| Selected Device: | 6vlx75tlff484-1l |
| **Slice Logic Utilization:** | |
| Number of Slice Registers: | 656 out of 93120 |
| Number of Slice LUTs: | 424 out of 46560 |
| Number used as Logic: | 424 out of 46560 |
| **Slice Logic Distribution:** | |
| Number of LUT Flip Flop pairs used: | 759 |
| Number with an unused Flip Flop: | 103 out of 759 |
| Number with an unused LUT: | 335 out of 759 |
| Number of fully used LUT-FF pairs: | 321 out of 759 |
| Number of unique control sets: | 31 |
| **IO Utilization:** | |
| Number of IOs: | 53 |
| Number of bonded IOBs: | 53 out of 240 |
| IOB Flip Flops/Latches: | 16 |

## CONCLUSION

There are several approaches to reducing dynamic power. In this work, an ALU model with clock gating and control-signal gating is designed in Verilog HDL and simulated and synthesized with Xilinx ISE 13.2, targeting the Virtex-6 Low Power device with speed grade -1L, in order to optimize power. It is found that the ALU with both clock gating and control-signal gating consumes less power than the conventional ALU.
Hence, it is concluded that clock and control-signal gating is a power-efficient technique for optimizing the power of the clock and the datapath buses. Some improvements could still be made in the future to improve the performance of the ALU, for example by introducing low-power adder and multiplier units. The clock and control-signal gating circuits could also be improved by replacing them with more appropriate circuit models. These techniques can also be implemented at the architecture level: by introducing clock gating and other low-power design techniques, the dynamic power performance of microprocessors and other power-hungry devices can be improved.

### REFERENCES

1. Bishwajeet Pandey and Manisha Pattanaik (2013), "Clock Gating Aware Low Power ALU Design and Implementation on FPGA", *International Journal of Future Computer and Communication*, Vol. 2, No. 5.
2. Christian Piguet (2005), "Low-Power CMOS Circuits: Technology, Logic Design and CAD Tools", CRC Press.
3. Frank Emnett and Mark Biegel (2000), "Power Reduction Through RTL Clock Gating", SNUG, San Jose.
4. Hamid Mahmoodi, Tirumalashetty V, Matthew Cooke and Kaushik Roy (2009), "Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 17, No. 1.
5. Ireneusz Brzozowski and Andrzej Kos (1999), "Minimization of Power Consumption in Digital Integrated Circuits by Reduction of Switching Activity", EUROMICRO Conference, pp. 1376, doi:10.1109/EURMIC.1999.794494.
6. Jagrit Kathuria, Ayoub Khan M and Arti Noor (2011), "A Review of Clock Gating Techniques", *MIT International Journal of Electronics and Communication Engineering*, Vol. 1, No. 2, pp. 106-114.
7. Javier Castro, Pilar Parra and Antonio J Acosta (2010), "Optimization of Clock-Gating Structures for Low Leakage High-Performance Applications", *IEEE International Symposium on Circuit and System*, May 10, pp. 3320-3223.
8. Kamaraju M, Lal Kishore K and Tilak A V N (2010), "Power Optimized ALU for Efficient Datapath", *International Journal of Computer Applications*, Vol. 11, No. 11.
9. Kapadia H, Benini L and De Micheli G (1999), "Reducing Switching Activity on Datapath Buses with Control-Signal Gating", *IEEE J. Solid-State Circuits*, Vol. 34, March, pp. 405-414.
10. Pietro Babighian, Luca Benini and Enrico Macii (2005), "A Scalable Algorithm for RTL Insertion of Gated Clocks Based on ODCs Computation", *IEEE Trans. Computer-Aided Design*, Vol. 24, No. 1, pp. 29-42.