ISSN 2319 – 2518 www.ijeetc.com Vol. 5, No. 4, October 2016 © 2016 IJEETC. All Rights Reserved

#### **Research Paper**

# LOW POWER CLOCK DISTRIBUTION NETWORK USING CLOCK PAIRED SHARED FLIP FLOP

Kishore P A K<sup>1</sup>\* and Padma Priya K<sup>2</sup>

\*Corresponding Author: Kishore PAK, 🖂 adityakk43620@gmail.com

In VLSI design, dynamic power consumption and delay play a very significant role in an integrated circuit. We implement a new method for distributing the clock in a clock distribution network from a common point to all the elements equally. A current mode signalling scheme, used for one-to-many transmission, is implemented in this method. To attain this, we implement a high performance Clock Paired Shared Flip Flop (CPSFF). When the CPSFF is integrated with the current mode transmitter, the Clock Distribution Network (CDN) exhibits 25% lower average power consumption than the previously implemented Current Mode Pulsed Flip Flop with Enable (CMPFFE) clock distribution network. The design is implemented in 45 nm CMOS technology, so that area and delay are also greatly reduced.

Keywords: CMPFFE, CPSFF, Power gating technique, Clock distribution network, Current mode

#### INTRODUCTION

Portable electronic devices achieve low power consumption by using low power components, which is made possible largely by the scaling of Complementary Metal Oxide Semiconductor (CMOS) technology. Scaling has become critical in Application Specific Integrated Circuits (ASICs) and System-on-Chips (SoCs) because interconnect in Very Large Scale Integration (VLSI) technologies expends a huge amount of power. The major consumers of this power are Clock Distribution Networks (CDNs) and flip flops. Even a typical microprocessor consumes a large amount of power in this way. In scaled technologies, however, power consumption is greatly reduced, to 25% every year. A clock distribution network distributes the clock equally to all the sequential elements that need it. Large latencies (Guthaus *et al.*, 2013) occur in conventional flip flops and CDNs due to global interconnect delay (Sylvester and Hu, 2001; and Katoch *et al.*, 2005). Current Mode (CM) logic is an efficient and attractive high speed signalling scheme, since it offers favourable static power compared to traditional voltage mode schemes. Low-swing signalling is an efficient way to realise current mode logic. Current mode global interconnect also has lower latency than voltage mode interconnect.

<sup>1</sup> PG Scholar, Department of ECE, JNTUK, Kakinada, India.

<sup>2</sup> Professor, JNTUK, Kakinada, India.

Researchers have shown that in any digital system the clock signals account for a large share of the system power. Reducing the clock power can therefore have a great impact on the total system power of a VLSI digital device. Clock switching causes unwanted gate transitions. Single-Edge (SE) triggered flip flops are sensitive to only the rising or the falling edge, so the power dissipated on the other edge is wasted. Double-Edge (DE) triggered flip flops are sensitive to both the rising and the falling edge, so the clock frequency can be halved while the data rate stays the same, which results in a 50% power saving in the flip flop.
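The 50% figure follows from the standard switching power relation P = α·C·V<sub>dd</sub><sup>2</sup>·f, which is linear in clock frequency. The short sketch below illustrates this with assumed values; the capacitance, supply voltage and frequency are hypothetical and are not taken from this paper, and the saving applies only to the idealised clock switching power.

```python
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """Switching power of a clocked node: P = alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


# Assumed illustrative values (hypothetical, not from the paper).
C_CLK = 1e-12   # 1 pF of clock load
VDD = 1.0       # supply voltage in volts
F_SE = 1e9      # clock frequency a single-edge design needs for the data rate

p_se = dynamic_power(1.0, C_CLK, VDD, F_SE)      # single-edge triggered
p_de = dynamic_power(1.0, C_CLK, VDD, F_SE / 2)  # double-edge: same data rate, half the clock
saving = 1 - p_de / p_se

print(f"SE: {p_se * 1e3:.3f} mW, DE: {p_de * 1e3:.3f} mW, saving: {saving:.0%}")
```

Because only `f` changes between the two calls, the ratio is independent of the assumed capacitance and supply voltage.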

In this paper, we present a new Clock Paired Shared Flip Flop (CPSFF) that integrates with a current mode clock distribution network. The flip flop has a clock (CLK) input for synchronous operation, a data input (D) and outputs (Q, Q\_b). The key contributions of this paper are:

- Overview of current mode logic
- Operation of CMPFFE and CM CDN
- Power gating technique
- Proposed CPSFF using Power gating method
- Implementation of CDN using CPSFF
- Power comparisons of flip flops

#### CURRENT MODE LOGIC

Current Mode Logic (CML), or Source-Coupled Logic (SCL), is a differential digital logic family intended to transmit data at speeds between 312.5 Mbit/s and 3.125 Gbit/s across standard printed circuit boards.

The transmission is point-to-point and unidirectional, and is usually terminated at the destination with 50 $\Omega$ resistors to V<sub>cc</sub> on both differential lines. CML is frequently used in interfaces to fibre optic components. CML signals have also been found useful for connections between modules. CML is the physical layer used in DVI and HDMI video links, the interfaces between a display controller and a monitor.

This technology has been widely used in the design of high-speed integrated systems, such as telecommunication systems (serial data transceivers, frequency synthesizers, etc.). The fast operation of CML circuits is mainly due to their lower output voltage swing compared to static CMOS circuits, as well as the very fast current switching that takes place at the input differential pair transistors. One of the primary requirements of a current-mode logic circuit is that the current bias transistor must remain in the saturation region in order to maintain a constant current.

# TRADITIONAL CURRENT MODE SIGNALING SCHEMES

Three current mode signalling schemes have been implemented previously.

The first uses an expensive CM transimpedance amplifier receiver (Narasimhan *et al.*, 2005). The transmitter takes a voltage mode input and transmits a current with minimum voltage swing into a transmission line. The receiver converts the current to a voltage to provide a full swing output. The receiver voltage swings around a common mode voltage, which results in a Vcm shift (Kancharapu *et al.*, 2011) and leads to large latencies.

The second uses a dynamic overdriving transmitter, with a strong and a weak driver, alongside a low-gain inverter amplifier receiver (Dave *et al.*, 2012). It leads to a rise time and fall time mismatch at the output.

The third is an expensive variation tolerant CM signalling method, which has a CM transmitter with corner aware bias circuitry (Dave *et al.*, 2012). It leads to large static and dynamic power due to the low impedance at the threshold point.

# EXISTING CURRENT MODE PULSED FLIP FLOP WITH ENABLE

In this scheme, the current mode receiver is integrated into the flip flops, which receive the current signal directly; this reduces the power consumption of the device and the silicon area. The design is similar to the previously implemented CMPFF (Islam and Guthaus, 2014) but uses an active low enable (EN\_b) signal. It has a current comparator (CC) stage, a register stage and a static storage cell. The current comparator stage compares the input push-pull current with a reference current and conditionally amplifies the clock signal. The push-pull current enables the transmitter circuit and maintains a constant low-swing bias voltage. The flip flop is sensitive to the unidirectional push current, which triggers the positive edge. A complementary CC handles the negative edge using the pull current. For the input current to be received efficiently, the input impedance must be low, according to the following equation.

$$Z_{in} = \frac{1}{g_{m1} + g_{m2}} \qquad \dots (1)$$
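As a quick numerical illustration of Equation (1), the sketch below evaluates the input impedance for a pair of transconductances. The values of g<sub>m1</sub> and g<sub>m2</sub> are assumed for illustration only and are not measurements from this design.

```python
def input_impedance(gm1_siemens, gm2_siemens):
    """Input impedance of the receiver input stage, Zin = 1 / (gm1 + gm2)."""
    total_gm = gm1_siemens + gm2_siemens
    if total_gm <= 0:
        raise ValueError("total transconductance must be positive")
    return 1.0 / total_gm


# Assumed transconductances (hypothetical): 1 mS for each input transistor.
z_in = input_impedance(1e-3, 1e-3)
print(f"Zin = {z_in:.1f} ohm")

# Doubling both transconductances halves the input impedance.
z_in_wide = input_impedance(2e-3, 2e-3)
print(f"Zin with doubled gm = {z_in_wide:.1f} ohm")
```

The relation shows why larger input transistors (higher transconductance) help the receiver accept the clock current, at the cost of area and bias current.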

When the active-low enable (EN\_b) is low (0), transistor M7 is disabled and M4 is connected to Vdd. In standby mode, when EN\_b is high (1), the static current I1 is disabled and node B is disconnected; M7 is then required to ground the internal clock node, which produces a stacking effect (Roy et al., 2003) in the comparator stage and reduces the leakage current in M4. In the input stage, the reference voltage generator, rvg (M<sub>2</sub>-M<sub>3</sub>), creates I<sub>ref</sub>, which is mirrored by M4 to generate I1. The M1-M2 pair creates I<sub>ref2</sub>, which combines with the input current (i\_in) and is then mirrored by M5 to give I2. M<sub>r1</sub> is added to replicate the voltage drop of M3. Using a global rvg reduces static power and saves two transistors per flip flop. I1 and I2 are compared by inverter amplifier A1 at node B, and the result is extended at node C by A2. The inverter pair (X1-X2) generates the required pulse duration before the feedback connection to M6. The circuit is shown in Figure 2.

### Current Mode Clock Distribution Network

When the Current Mode Pulsed Flip Flop with Enable (CMPFFE) (Riadul Islam and Matthew Guthaus, 2015) is integrated into the network, a transmitter drives a push-pull current into the clock network, which distributes the required current to each flip flop. The network circuit is shown in Figure 3. The transmitter receives $V_{clk}$ from a phase locked loop at the root of the H-tree (Eby Friedman, 2001) network and supplies a pulsed current to the interconnect at near constant voltage. The clock distribution is a symmetric H-tree with equal impedance 'Z' in each branch, so the current is divided equally. In the transmitter, a NAND gate uses clk and clk\_b to generate a negative pulse that briefly turns on M1. In parallel, a NOR gate uses the positive edge of the clk and clk\_b signals to turn on M2. M1 acts as a current source while M2 acts as a sink, which removes short circuit current from the transmitter. The circuit performance depends on the length of the wires used in the network. Analysis of both voltage mode and current mode clock networks shows how the total power consumption increases with the number of sinks and the chip surface area.
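The equal division of current in a symmetric H-tree can be illustrated with an ideal current divider. The following is a minimal sketch, assuming purely resistive branches and ideal sinks; the function names and the impedance values are hypothetical and do not come from the simulated network.

```python
def split_current(i_total, impedances):
    """Ideal current divider: each parallel branch takes a share of the
    current proportional to its admittance (1 / Z)."""
    admittances = [1.0 / z for z in impedances]
    total = sum(admittances)
    return [i_total * y / total for y in admittances]


def leaf_current(i_root, levels):
    """Current reaching one sink of a symmetric tree that splits into two
    equal-impedance branches at each of `levels` branching points."""
    return i_root / (2 ** levels)


# Equal branch impedances (assumed 50 ohm each): the current divides equally.
print(split_current(1.0, [50.0, 50.0, 50.0, 50.0]))

# A mismatched branch receives a different share, which would skew the sinks.
print(split_current(1.0, [50.0, 100.0]))

# Four levels of binary branching give 16 sinks, each with 1/16 of the root current.
print(leaf_current(1.0, 4))
```

The mismatched case shows why the symmetry of the H-tree matters: any imbalance in branch impedance changes the current delivered to the flip flops on that branch.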

#### POWER GATING TECHNIQUE

Power gating is a technique used in integrated circuit design to reduce power consumption, by shutting off the current to blocks of the circuit that are not in use. In addition to reducing stand-by or leakage power, power gating has the benefit of enabling Iddq testing. Power gating affects design architecture more than clock gating. It increases time delays, as power gated modes have to be safely entered and exited.



Architectural trade-offs exist between designing for the amount of leakage power saving in low power modes and the energy dissipation to enter and exit the low power modes. Shutting down the blocks can be accomplished either by software or hardware. Driver software can schedule the power down operations. Hardware timers can be utilized. A dedicated power management controller is another option. An externally switched power supply is a very basic form of power gating to achieve long term leakage power reduction. To shut off the block for small intervals of time, internal power gating is more suitable. CMOS switches that provide power to the circuitry are controlled by power gating controllers. Outputs of the power gated block discharge slowly. Hence output voltage levels spend more time in threshold voltage level. This can lead to larger short circuit current.
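The trade-off between leakage saving and the energy spent entering and exiting a low power mode is often expressed as a break-even idle time. The sketch below assumes a simple model in which gating costs a fixed transition energy and reduces leakage power by a constant amount; all names and numbers are hypothetical.

```python
def break_even_time(e_overhead_j, p_leak_on_w, p_leak_gated_w):
    """Shortest idle interval for which power gating saves energy:
    t = E_overhead / (P_leak_on - P_leak_gated)."""
    saved_power = p_leak_on_w - p_leak_gated_w
    if saved_power <= 0:
        raise ValueError("gating must reduce leakage power")
    return e_overhead_j / saved_power


def worth_gating(idle_time_s, e_overhead_j, p_leak_on_w, p_leak_gated_w):
    """True when the expected idle time exceeds the break-even time."""
    return idle_time_s > break_even_time(e_overhead_j, p_leak_on_w, p_leak_gated_w)


# Assumed values (hypothetical): 9 nJ to enter and exit sleep,
# 1 mW leakage when powered, 0.1 mW leakage when gated.
t_be = break_even_time(9e-9, 1e-3, 1e-4)
print(f"break-even idle time: {t_be * 1e6:.1f} us")
print(worth_gating(20e-6, 9e-9, 1e-3, 1e-4))  # long idle period
print(worth_gating(5e-6, 9e-9, 1e-3, 1e-4))   # short idle period
```

A power management controller or driver software could apply a check of this kind before scheduling a power-down, gating a block only when the expected idle interval is longer than the break-even time.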

Power gating uses low-leakage PMOS transistors as header switches to shut off power supplies to parts of a design in standby or sleep mode. NMOS footer switches can also be used as sleep transistors. Inserting the sleep transistors splits the chip's power network into a permanent power network connected to the power supply and a virtual power network that drives the cells and can be turned off. Typically, high-Vt sleep transistors are used for power gating, in a technique also known as multi-threshold CMOS (MTCMOS). The sleep transistor sizing is an important design parameter.

The quality of this complex power network is critical to the success of a power-gating design. Two of the most critical parameters are the IR-drop and the penalties in silicon area and routing resources. Power gating can be implemented using cell-based or cluster-based (fine-grained) approaches or a distributed coarse-grained approach.

# PROPOSED CLOCK PAIR SHARED FLIP FLOP

In the previous method, an active low enable pin is used to switch the data from one point to another, which adds an extra transistor and results in comparatively more leakage current. The surface area of the flip flop is also increased, and changing the operating point of one transistor introduces extra delay. To address these factors, we adopt a novel approach and implement a new flip flop known as the clock pair shared flip flop. The circuit is shown in Figure 4. In this design, the number of clocked transistors is lower than in the previously implemented CMPFFE. It contains an input stage with a comparator, and a register stage at the output to store the data.

Here, a pseudo NMOS transistor P1 is used to charge the internal node X, instead of the two clocked pre-charging transistors used in the CDMFF (Yamashina and Yamada, 1992). In this flip flop the clock load is only four transistors, and node X is held at the supply voltage $V_{dd}$. Node X is therefore never floating, which improves the noise margin at X. If D = 1 then Q = 1: N5 is HIGH, and N1 is kept LOW to avoid unnecessary transitions at node X, which would produce short circuit current. Transistor P2 pulls Q from 0 to 1. If D = 0 then Q = 0: the N2 branch pulls Q down to 0, so Y = 1 when the input clock is HIGH. In this stage, the PMOS present in N1 should turn on N2. Short circuit current occurs when D changes state from 0 to 1, and it is disconnected for a delay of two clocks. The clocked pseudo NMOS scheme is used in the flip flop. The transistors P1, N1, N3 and N4 must be sized properly so that an adequate noise margin is maintained. The low-swing input clock does not drive any PMOS transistor, so low-swing clocking is possible in the flip flop. N1 and N2 share the clocked pair (N3, N4). The flip flop is implemented in 45 nm CMOS technology with 8 metal layers. Simulation gives a dynamic power consumption of 0.35 $\mu$W and a surface area of 43.1 $\mu$m<sup>2</sup>, both lower than for the CMPFFE. The total power decreases by 35% compared to the CMPFFE. When the CPSFF (Yamashina and Yamada, 1992) is combined with the current mode clock distribution network, it exhibits 0.59 $\mu$W at a 2 GHz frequency of operation. The circuit of the CPSFF clock distribution network is shown in Figure 5.



### SIMULATION RESULTS









Table 1: Comparison of CMPFFE and CPSFF

| Parameters                  | CMPFFE  |      | CPSFF |      |
|-----------------------------|---------|------|-------|------|
| No. of Transistors          | 13      | 15   | 12    | 15   |
| Capacitance (pF)            | 2.22    | 5.1  | 0.1   | 0.1  |
| Metal Capacitance (pF)      | 2.16    | 2.68 | 0.08  | 0.08 |
| No. of Contacts with metals | 28      | 18   | 10    | 12   |
| **Time Parameters**         |         |      |       |      |
| tp HL (ps)                  | 114     |      | 30    |      |
| tp LH (ps)                  | 1111    |      | 31    |      |
| Delay time (ps)             | 612.5   |      | 30.5  |      |
| **Electrical Parameters**   |         |      |       |      |
| Load Capacitance (pF)       | 2 14.08 |      | 4.46  |      |
| Vdd (V)                     | 0.4     |      | 0.4   |      |
| Idd (mA)                    | 0.083   |      | 0.05  |      |
| Frequency (MHz)             | 250     |      | 500   |      |
| Dynamic power (µW)          | 1.339   |      | 0.35  |      |
| Static power (mW)           | 0.03    |      | 0.02  |      |
| **Geometric Parameters**    |         |      |       |      |
| Width (µm)                  | 20.9    |      | 11.7  |      |
| Height (µm)                 | 3.9     |      | 3.7   |      |
| Surface Area (µm²)          | 81.3    |      | 43.1  |      |
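Several rows of Table 1 can be cross-checked from the other rows. The sketch below is a consistency check only, assuming that the delay time is the mean of tp HL and tp LH, that the static power is V<sub>dd</sub> × I<sub>dd</sub>, and that the surface area is width × height; these relations are inferred from the numbers and are not stated in the paper.

```python
# Values copied from Table 1; the dictionary keys are hypothetical names.
TABLE = {
    "CMPFFE": {"tp_hl_ps": 114, "tp_lh_ps": 1111, "vdd_v": 0.4,
               "idd_ma": 0.083, "width_um": 20.9, "height_um": 3.9},
    "CPSFF": {"tp_hl_ps": 30, "tp_lh_ps": 31, "vdd_v": 0.4,
              "idd_ma": 0.05, "width_um": 11.7, "height_um": 3.7},
}


def derived(row):
    """Recompute the derived rows of the table from the primary ones."""
    return {
        "delay_ps": (row["tp_hl_ps"] + row["tp_lh_ps"]) / 2,  # mean propagation delay
        "static_mw": row["vdd_v"] * row["idd_ma"],            # V * mA gives mW
        "area_um2": row["width_um"] * row["height_um"],       # bounding-box area
    }


for name, row in TABLE.items():
    d = derived(row)
    print(f"{name}: delay {d['delay_ps']:.1f} ps, "
          f"static {d['static_mw']:.2f} mW, area {d['area_um2']:.1f} um^2")
```

Under these assumptions the recomputed delay and static power agree with the tabulated values, and the recomputed areas are close to the reported 81.3 µm² and 43.1 µm².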

## CONCLUSION

In this paper, we first implemented a CMPFFE and described its role in the clock distribution network. It exhibits a large load capacitance. The proposed CPSFF with the shared clock technique is 40% faster, and the leakage current is reduced to 10%. The CPSFF enables a 38.2% reduction in power consumption compared to the conventional CMPFF. It also reduces the number of clocked transistors and eliminates the floating node, using the 45 nm CMOS technology design rules.

### REFERENCES

- Chithra R and Gowri Sankaran B (2015), "Design of Low Power Clocked Pair Shared Flip-Flop Using Power Gating Technique", in *Proc. IJIRCCE*, Vol. 3, April, pp. 41-46.
- Dave M, Jain M, Baghini S and Sharma D (2012), "A Variation Tolerant Current-Mode Signaling Scheme for On-Chip Interconnects", *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, Vol. 99, pp. 1-12.

- Eby G Friedman (2001), "Clock Distribution Networks in Synchronous Digital Integrated Circuits", *Proc. IEEE*, Vol. 89, No. 5, pp. 665-692.
- Guthaus M R, Wilke G and Reis R (2013), "Revisiting Automated Physical Synthesis of High-Performance Clock Networks", *ACM Trans. Design Autom. Electron. Syst.*, Vol. 18, No. 2, pp. 31:1-31:27.
- Islam R and Guthaus M (2014), "Current-Mode Clock Distribution", in *Proc. ISCAS*, June, pp. 1203-1206.
- Kancharapu N K, Dave M, Masimukkula V, Baghini M S and Sharma D K (2011), "A Low-Power Low-Skew Current-Mode Clock Distribution Network in 90 nm CMOS Technology", in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI* (ISVLSI), July, pp. 132-137.
- Katoch A, Veendrick H and Seevinck E (2005), "High Speed Current-Mode Signaling Circuits for On-Chip Interconnects", in *Proc. ISCAS*, May, pp. 4138-4141.

- Narasimhan A, Divekar S, Elakkumanan P and Sridhar R (2005), "A Low Power Current-Mode Clock Distribution Scheme for Multi-GHz NoC Based SoCs", in *Proc. 18th Int. Conf. VLSI Design*, January, pp. 130-135.
- Riadul Islam and Matthew R Guthaus (2015), "Low Power Clock Distribution Using Current-Pulsed Clocked Flip-Flop", in *IEEE Transactions on Circuits and Systems*, Vol. 62, No. 4, pp. 1156-1164.
- Roy K, Mukhopadhyay S and Mahmoodi-Meimand H (2003), "Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits", *Proc. IEEE*, Vol. 91, No. 2, pp. 305-327.
- Sylvester D and Hu C (2001), "Analytical Modeling and Characterization of Deep-Submicrometer Interconnect", *Proc. IEEE*, Vol. 89, No. 5, pp. 634-664.
- Yamashina M and Yamada H (1992), "An MOS Current Mode Logic (MCML) Circuit for Low-Power Sub-GHz Processors", *IEICE Trans. Electron.*, Vol. E75-C, No. 10, pp. 1181-1187.