# Development and Analysis of a Dual Stack Approach for Low Leakage and High Performance VLSI Design

A thesis submitted to
The Department of Electrical and Electronic Engineering (EEE)
for partial fulfillment of the requirements for the degree of
Master of Science in EEE (M.Sc. in EEE)

By

**Most. Sultana Nasrin** Student no: 040806231P

**Thesis Supervisor:** Professor Md. Shafiqul Islam, *PhD* 

June 2011



This thesis entitled "Development and Analysis of a Dual Stack Approach for Low Leakage and High Performance VLSI Design" submitted by Most. Sultana Nasrin, Roll No. 040806231P to the Department of Electrical & Electronic Engineering, BUET, Dhaka has been accepted as satisfactory in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN ELECTRICAL AND ELECTRONIC ENGINEERING.

## **BOARD OF EXAMINERS**

| Examiner | Role |
|----------|------|
| Dr. Md. Shafiqul Islam, Professor, Department of EEE, BUET, Dhaka-1000. | <b>Chairman</b><br>(Supervisor) |
| Dr. Md. Saifur Rahman, Professor and Head, Department of EEE, BUET, Dhaka-1000. | <b>Member</b><br>(Ex-Officio) |
| Dr. A. B. M Harun-Ur-Rashid, Professor, Department of EEE, BUET, Dhaka-1000. | Member |
| Dr. Kazi M. A. Salam, Assistant Professor, Department of Electrical Engineering and Computer Science, North South University, Dhaka-1229. | Member<br>(External) |

# Bangladesh University of Engineering and Technology (BUET)

# **DECLARATION**

It is hereby declared that this thesis or any part of it has not been submitted elsewhere for the award of any degree or diploma.

Signature of the Candidate:

Most. Sultana Nasrin (040806231P)

# **ACKNOWLEDGEMENTS**

I would like to express my sincere gratitude and appreciation to everyone who made this thesis possible. Most of all, I would like to thank my supervisor, Professor Md. Shafiqul Islam, for his guidance of my research and his patience during my M.Sc. study. With his knowledge and experience, he has guided me to successfully achieve my research objective. Second, I would like to thank all my colleagues who have helped me to complete this thesis. Last, but most importantly, I would especially like to thank my husband for his love and support during my M.Sc. study. He encouraged me to finish my research.

# TABLE OF CONTENTS

| Section | Title |
|---------|-------|
| | BOARD OF EXAMINERS |
| | DECLARATION |
| | ACKNOWLEDGEMENTS |
| | TABLE OF CONTENTS |
| | LIST OF TABLES |
| | LIST OF FIGURES |
| | ABSTRACT |
| CHAPTER I | INTRODUCTION |
| 1.1 | Problem Statement |
| 1.2 | Motivation |
| 1.3 | Thesis organization |
| CHAPTER II | DESIGN CONSIDERATION |
| 2.1 | Leakage power |
| 2.2 | SRAM Cell leakage paths |
| 2.3 | Switching power and delay tradeoffs |
| CHAPTER III | PREVIOUS WORKS |
| 3.1 | Static Power Reduction VLSI Research |
| 3.1.1 | Static Power Reduction Research for Generic Logic Circuits |
| 3.1.1.1 | Base case |
| 3.1.1.2 | Sleep transistor |
| 3.1.1.3 | Forced stack |
| 3.1.1.4 | Sleepy stack |
| 3.1.1.5 | Sleepy keeper |
| 3.1.1.6 | Dual sleep |
| 3.1.1.7 | Dual stack |
| 3.2 | Power Reduction Research Using Voltage Scaling |
| 3.2.1 | Multiple Vdd and V<sub>th</sub> Optimization |
| CHAPTER IV | PROPOSED METHOD |
| 4.1 | Proposed structure |
| 4.2 | Operation of the proposed structure |
| 4.3 | Delay model |
| 4.4 | Analytical Comparison of Proposed approach vs. Sleepy Stack Inverter |
| 4.4.1 | Delay model of sleepy stack inverter |
| 4.4.2 | Delay model of proposed inverter |
| 4.5 | Analytical Comparison of Proposed approach vs. Dual Stack Inverter |
| 4.5.1 | Delay model of dual stack inverter |
| 4.5.2 | Delay model of proposed inverter |
| CHAPTER V | APPLYING PROPOSED METHOD |
| 5.1 | Application of proposed method in A Chain of Four Inverters |
| 5.2 | Application of proposed method in SRAM Cell |
| CHAPTER VI | SIMULATION RESULTS |
| 6.1 | Simulation results for a chain of four inverters |
| 6.2 | Simulation results for SRAM cell |
| 6.3 | Summary |
| CHAPTER VII | CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH |
| 7.1 | Conclusions |
| 7.2 | Suggestions for future work |
| | REFERENCES |
| | APPENDICES |
| APPENDIX-A | DESIGN LAYOUTS OF A CHAIN OF FOUR INVERTERS |
| A1 | Layout of a chain of four inverters in sleepy stack method |
| A2 | Layout of a chain of four inverters in dual sleep method |
| A3 | Layout of a chain of four inverters in dual stack method |
| A4 | Layout of a chain of four inverters in proposed method |
| APPENDIX-B | DESIGN LAYOUTS OF SRAM CELL |
| B1 | Layout of SRAM cell in sleepy stack method |
| B2 | Layout of SRAM cell in dual sleep method |
| B3 | Layout of SRAM cell in dual stack method |
| B4 | Layout of SRAM cell in proposed method |
| APPENDIX-C | SIMULATION DATA (A CHAIN OF FOUR INVERTERS) |
| C1 | 130 nm technology |
| C3 | 90 nm technology |
| C4 | 65 nm technology |
| C5 | 45 nm technology |
| C6 | 32 nm technology |
| APPENDIX-D | SIMULATION DATA (SRAM CELL) |
| D1 | 130 nm technology |
| D3 | 90 nm technology |
| D4 | 65 nm technology |
| D5 | 45 nm technology |
| D6 | 32 nm technology |

# LIST OF TABLES

| Table | Caption |
|-------|---------|
| Table I | Leakage model parameters (0.5µ technology) |
| Table II | Power supply voltage for different technologies |
| Table III | A chain of four inverters in 65nm technology |
| Table IV | Comparison between different circuit techniques for a chain of four inverters in 65nm technology (values in percentage) |
| Table V | A chain of four inverters in 65nm technology |
| Table VI | Comparison between different circuit techniques for SRAM cell in 65nm technology (values in percentage) |

# LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| Figure 2.1 | Subthreshold leakage of an nFET |
| Figure 2.2 | Impact of short-channel effects on drain current. As channel length is reduced, subthreshold swing increases ($S_2 > S_1$) and threshold voltage decreases ($V_{TH,2} < V_{TH,1}$) |
| Figure 2.3 | (a) A single transistor (b) Stacked transistors |
| Figure 2.4 | SRAM cell leakage paths |
| Figure 3.1 | Base case |
| Figure 3.2 | Sleep transistor technique |
| Figure 3.3 | Forced stack |
| Figure 3.3.1 | A single transistor |
| Figure 3.3.2 | A stacked transistor |
| Figure 3.4 | Sleepy stack transistor |
| Figure 3.5 | Sleepy Keeper |
| Figure 3.6 | Dual sleep method |
| Figure 3.7 | Dual stack method |
| Figure 4.1 | Proposed method |
| Figure 4.2 | (a) Inverter logic circuit (left) (b) RC equivalent circuit (right) |
| Figure 4.3 | (a) Sleepy stack technique inverter (left) (b) RC equivalent circuit (right) |
| Figure 4.4 | (a) Proposed inverter (left) (b) RC equivalent circuit (right) |
| Figure 4.5 | (a) Dual stack technique inverter (left) (b) RC equivalent circuit (right) |
| Figure 5.1 | A chain of four inverters |
| Figure 5.2 | Four Step Inverter using proposed method |
| Figure 5.3 | SRAM cell using proposed method |
| Figure 6.1 | Static Power Dissipation of a chain of 4 inverters |
| Figure 6.2 | Dynamic Power Dissipation of a chain of 4 inverters |
| Figure 6.3 | Propagation Delay of a chain of 4 inverters |
| Figure 6.4 | Area of a chain of 4 inverters |
| Figure 6.5 | Static Power Dissipation of SRAM cell |
| Figure 6.6 | Dynamic Power Dissipation of SRAM cell |
| Figure 6.7 | Propagation Delay of SRAM cell |
| Figure 6.8 | Area of SRAM cell |

# **ABSTRACT**

The main objective of this thesis is to provide Very Large Scale Integration (VLSI) designers with new solutions for reducing leakage power. Although leakage power was negligible at 0.18 µm technology and above, in nanoscale technology, such as 0.07 µm, leakage power is almost equal to dynamic power consumption. At 65 nm and below, leakage accounts for 30-40% of processor power.

In this thesis we propose a new technique to reduce leakage power with minimum area. It is a state-saving technique, which makes it better than the traditional sleep transistor technique. Because it saves state, it can be used in memory design, e.g., in a Static Random Access Memory (SRAM) cell. Although the proposed approach incurs some delay, an SRAM cell using the proposed method can achieve ultra-low leakage power consumption while suppressing the two main leakage paths in the cell.

Unlike the stack approach (which saves state), the proposed approach works well with dual-Vth technologies, reducing leakage by several orders of magnitude relative to the stack approach in single-Vth technology. Compared with the most common approaches in VLSI design (sleepy stack, dual stack and dual sleep), the proposed method shows lower leakage power (almost 50% less than dual stack and 70% less than dual sleep), lower dynamic power dissipation than dual stack, dual sleep and sleepy stack, and better speed than sleepy stack and dual sleep. Moreover, the area required by the proposed method is much less than that of the sleepy stack and dual stack approaches.

# **CHAPTER I**

#### INTRODUCTION

Power consumption is one of the top concerns of Very Large Scale Integration (VLSI) circuit design, for which Complementary Metal Oxide Semiconductor (CMOS) is the primary technology. Today's focus on low power is not only a result of the growing demand for mobile applications: power consumption was a fundamental problem even before the mobile era. To solve the power dissipation problem, many researchers have proposed different ideas from the device level to the architectural level and above. However, there is no universal way to avoid tradeoffs between power, delay and area, and thus designers are required to choose appropriate techniques that satisfy application and product needs.

Power consumption in CMOS consists of dynamic and static components. Dynamic power is consumed when transistors are switching, and static power is consumed regardless of transistor switching. Dynamic power consumption was previously (at 0.18 µm technology and above) the single largest concern for low-power chip designers, since dynamic power accounted for 90% or more of the total chip power. As the technology feature size scales down, however, static power has become a great challenge for current and future technologies. Based on the International Technology Roadmap for Semiconductors (ITRS), Kim *et al.* reported that subthreshold leakage power dissipation of a chip may exceed dynamic power dissipation at the 65nm feature size [1], [2].

One of the main causes of the increase in leakage power is the increase in subthreshold leakage power. When technology feature size scales down, supply voltage and threshold voltage also scale down. Subthreshold leakage power increases exponentially as threshold voltage decreases. Furthermore, the structure of a short-channel device lowers the threshold voltage even further. In addition to subthreshold leakage, another contributor to leakage power is gate-oxide leakage power due to the tunneling current through the gate-oxide insulator. Since gate-oxide thickness is reduced as the technology scales down, in nanoscale technology gate-oxide leakage power may be comparable to subthreshold leakage power if not handled properly.

In this thesis, we provide a novel approach as a new remedy for designers in terms of leakage power, area and speed. Here we explore the basic structure of this approach. Also, we study various proposed circuits including generic logic circuits and memory.

#### 1.1 Problem Statement

Power consumption is a major concern in VLSI systems. Until recently, dynamic power was the main concern. With shrinking technology feature sizes, however, static power has become the foremost concern in VLSI circuits. In fact, static power increases exponentially in nanoscale VLSI systems, so reducing leakage power consumption is an important issue.

Techniques for reducing leakage power can be grouped into two categories: state-preserving techniques, where circuit state (the present value) is retained, and state-destructive techniques, where the current Boolean value might be lost. The advantage of a state-preserving technique over a state-destructive one is that the circuitry can resume operation at a much later point in time without having to regenerate state.

There are several methods for reducing leakage power, but each has some limitations. In this thesis we propose a novel approach that achieves ultra-low leakage power consumption while maintaining logic state, with minimum area requirements and less delay. It can therefore be used in systems that remain inactive for long periods but must respond quickly.

#### 1.2 Motivation

Most portable systems, such as cellular communication devices and laptop computers, operate from a limited power supply. Devices like cell phones have long idle times and operate in standby mode when not in use. Consequently, extending battery-based operation time is a significant design goal, which can be achieved by controlling the leakage current flowing through the CMOS gate. Low power consumption in high-performance VLSI circuits is highly desirable, as it directly relates to battery life, reliability, packaging, and heat removal costs. With the continuous trend of technology scaling, leakage power is becoming a major contributor to the total power consumption in CMOS circuits. Scaling of Vdd reduces dynamic power consumption but also degrades the performance of the circuit. This can be partially compensated by lowering Vth, but at the cost of increased leakage power. Minimizing leakage power consumption is currently an extremely challenging area of research.

Several methods are available for reducing leakage power consumption. One of the most common is the sleep transistor method, which cuts off the Vdd and/or Gnd connections of transistors to save leakage power. However, when transistors are allowed to float, a system may have to wait a long time to reliably restore lost state and thus may experience seriously degraded performance. Therefore, retaining state is crucial for a system that requires fast response even while in an inactive state. As the sleep transistor technique is state-destructive, it is not well suited to VLSI logic that must retain state. Another method is the stack approach, but its area requirement is a serious problem; its dynamic power dissipation is also high, and its propagation delay is worse still. The next two important proposals are the sleepy stack and sleepy keeper approaches. The sleepy keeper approach consumes more static and dynamic power than sleepy stack, whereas the sleepy stack approach is slower and consumes more area than sleepy keeper. Dynamic power is increased in the dual sleep method, and the area requirement is higher in the dual stack technique. Hence we sought a new way to trade off these characteristics; the method proposed in this thesis achieves an excellent tradeoff among them.

#### 1.3 Thesis Organization

The thesis is organized into seven chapters.

CHAPTER I: INTRODUCTION. This chapter introduces power consumption issues in VLSI. This chapter also explains motivation and organization of the thesis.

CHAPTER II: DESIGN CONSIDERATION. This chapter explains design criteria used throughout this thesis.

CHAPTER III: PREVIOUS WORKS. This chapter describes previous work in power reduction research and explains key differences between our solutions and previous work.

CHAPTER IV: PROPOSED METHOD. This chapter introduces the novel proposed technique. The structure of the proposed technique is described, followed by a detailed explanation of its operation. The delay model of the proposed method is then compared analytically with those of some previous techniques.

CHAPTER V: APPLYING PROPOSED METHOD. This chapter explores various applications of the proposed approach. The applications include generic logic and memory circuits. For each application of the proposed technique, comparisons with the best known prior low-leakage techniques are carried out using benchmark circuits.

CHAPTER VI: SIMULATION RESULTS. This chapter discusses the simulation results from various applications of the proposed approach. The proposed technique is empirically compared to well-known previous approaches. The comparisons are assessed in terms of area, dynamic power, static power and propagation delay.

CHAPTER VII: CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH. This chapter summarizes the major accomplishments of this thesis and offers suggestions for future research.

# **CHAPTER II**

#### **DESIGN CONSIDERATION**

Power consumption in CMOS consists of dynamic and static components. Dynamic power is consumed when transistors are switching, and static power is consumed regardless of transistor switching. Dynamic power consumption was previously (at 180nm technology and above) the single largest concern for low-power chip designers, since dynamic power accounted for 90% or more of the total chip power. Therefore, many previously proposed techniques, such as voltage and frequency scaling, focused on dynamic power reduction. However, as the feature size shrinks, e.g., to 90nm, 65nm and below, static power has become a great challenge for current and future technologies. Based on the ITRS, Kim *et al.* [2] reported that subthreshold leakage power dissipation of a chip may exceed dynamic power dissipation at the 65nm feature size. Four characteristics are considered in the design process and in comparing previous methods with the proposed technique: static power, dynamic power, propagation delay and area.

#### 2.1 Leakage Power



Figure 2.1: Subthreshold leakage of an NMOS.

The current flow in the channel of an FET depends on creating and sustaining an inversion layer on the surface. If the gate bias voltage is not sufficient to invert the surface ($V_{GS} < V_{TO}$), the carriers (electrons) in the channel face a potential barrier that blocks the flow. Increasing the gate voltage reduces this potential barrier and eventually allows the flow of carriers under the influence of the channel electric field. In small-geometry MOSFETs, the potential barrier is controlled by both the gate-to-source voltage $V_{GS}$ and the drain-to-source voltage $V_{DS}$. If the drain voltage is increased, the potential barrier in the channel decreases, leading to drain-induced barrier lowering (DIBL). The reduction of the potential barrier eventually allows electron flow between the source and the drain even if the gate-to-source voltage is lower than the threshold voltage. The channel current that flows under the condition $V_{GS} < V_{TO}$ is called the subthreshold current.

One of the main contributors to static power consumption in CMOS is subthreshold leakage current shown in Figure 2.1, i.e., the drain to source current when the gate voltage is smaller than the transistor threshold voltage. Since subthreshold current increases exponentially as the threshold voltage decreases, nanoscale technologies with scaled down threshold voltages will severely suffer from subthreshold leakage power consumption.

The relation between the subthreshold current ($I_d$) and the gate voltage is

$$I_{d} = I_{S} \exp \left( \frac{V_{gs}}{\xi V_{T}} \right) \tag{2.1}$$

The threshold voltage is the value of the gate voltage that turns on the transistor by inducing a highly conductive path in the channel from the source to the drain. The subthreshold swing is the change of the gate voltage in the subthreshold region that is required for an order-of-magnitude change of the drain current. As the channel length (*L*) of a typical MOSFET is reduced with all other device parameters held constant, the threshold voltage decreases, and the subthreshold swing increases, as illustrated in Figure 2.2. Collectively, threshold voltage roll off and subthreshold swing rollup are commonly known as short-channel effects (SCE). In consequence of SCEs, as seen in Figure 2.2, the ratio of the drive (ON) current to the leakage (OFF) current is substantially reduced, which imposes severe trade-offs between circuit speed and standby power. In addition, SCEs amplify the impact of process variations on integrated circuits (IC), impairing their reliability and even functionality. Therefore, suppression of SCEs to acceptable levels is of the highest importance during MOSFET scaling.
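As a rough illustration of how Equation 2.1 relates to the subthreshold swing defined above, the sketch below computes the gate-voltage change needed for a tenfold change in $I_d$. It assumes that $V_T$ in Equation 2.1 is the thermal voltage $kT/q$ and that $\xi$ is a dimensionless slope factor; neither is defined explicitly in the text, and the values of $\xi$ used here are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temperature_k):
    """Thermal voltage kT/q in volts (assumed meaning of V_T in Eq. 2.1)."""
    return BOLTZMANN * temperature_k / ELECTRON_CHARGE


def subthreshold_swing(xi, temperature_k=300.0):
    """Gate-voltage change (V) per decade of drain current.

    Setting exp(dV / (xi * V_T)) = 10 in Eq. 2.1 gives dV = xi * V_T * ln(10).
    """
    return xi * thermal_voltage(temperature_k) * math.log(10)


def current_ratio(delta_vgs, xi, temperature_k=300.0):
    """Factor by which I_d changes when V_gs changes by delta_vgs volts."""
    return math.exp(delta_vgs / (xi * thermal_voltage(temperature_k)))


for xi in (1.0, 1.5):  # hypothetical slope factors, not taken from the thesis
    swing_mv = subthreshold_swing(xi) * 1e3
    print(f"xi = {xi}: swing = {swing_mv:.1f} mV/decade")
```

Under these assumptions the swing is roughly 60 mV per decade at 300 K for $\xi = 1$ and grows in proportion to $\xi$, which is consistent with the reduced ON/OFF current ratio described for short-channel devices.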



Figure 2.2: Impact of short-channel effects on drain current. As channel length is reduced, subthreshold swing increases ($S_2 > S_1$) and threshold voltage decreases ($V_{TH,2} < V_{TH,1}$).

Another contributor to leakage power is gate oxide leakage due to the tunneling current through the gate oxide insulator. Since gate oxide thickness is reduced as the technology scales down, in nanoscale technology gate oxide leakage power may be comparable to subthreshold leakage power. High-K dielectric gate insulators may provide a solution to reduce gate leakage.



Figure 2.3: (a) A single transistor (left) and (b) stacked transistors (right)

Subthreshold leakage can be reduced by stacking transistors, i.e., taking advantage of the so-called "stack effect" [3]. The stack effect occurs when two or more stacked transistors are turned off together; the result is reduced leakage power consumption (Fig. 2.3).

#### 2.2 SRAM Cell Leakage Paths

In this section, we explain the major subthreshold leakage components in a 6-T SRAM cell. The subthreshold leakage current in an SRAM cell is typically categorized into two kinds [4], as shown in Figure 2.4: (i) cell leakage current that flows from Vdd to Gnd internal to the cell and (ii) bitline leakage current that flows from bitline (or bitline') to Gnd. Although an SRAM cell has two bitline (BL) leakage paths, the bitline leakage current and the bitline' (BL') leakage current differ according to the value stored in the SRAM bit. If an SRAM cell holds '1' as shown in Figure 2.4, the bitline leakage current passing through N3 and N2 is effectively suppressed for two reasons. First, after precharging bitline and bitline' both to '1,' the source voltage and the drain voltage of N3 are the same, and thus potentially no current flows through N3. Second, the two stacked and turned-off transistors (N2 and N3) induce the stack effect.



Figure 2.4: SRAM cell leakage paths

Meanwhile, for this case where the SRAM bit holds value '1,' a large bitline' leakage current flows passing through N4 and N1. If, on the other hand, the SRAM cell holds '0,' a large bitline leakage current flows while bitline' leakage current is suppressed.
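The dependence of bitline leakage on the stored value can be summarized as a small lookup. This is only a qualitative sketch of the behaviour described above; the function name and the labels "large" and "suppressed" are illustrative and are not quantities from the thesis.

```python
def bitline_leakage(stored_bit):
    """Qualitative bitline leakage of a 6-T SRAM cell for a stored bit.

    Returns a dict mapping each bitline to "large" or "suppressed",
    following the description of Figure 2.4 in the text.
    """
    if stored_bit not in (0, 1):
        raise ValueError("stored_bit must be 0 or 1")
    if stored_bit == 1:
        # Stored '1': the bitline path (through N3 and N2) is suppressed,
        # while a large bitline' current flows through N4 and N1.
        return {"bitline": "suppressed", "bitline'": "large"}
    # Stored '0': the roles of the two bitlines are exchanged.
    return {"bitline": "large", "bitline'": "suppressed"}


for bit in (0, 1):
    print(bit, bitline_leakage(bit))
```

Whichever value is stored, one of the two bitline paths leaks heavily, so a leakage-reduction technique has to address both paths.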

#### 2.3 Switching Power and Delay Tradeoffs

In this section, we explain tradeoffs between switching power and delay. In CMOS, power consumption consists of leakage power and dynamic power, where dynamic power includes both switching power and short-circuit power. Switching power is consumed when a gate charges its output load capacitance, and short-circuit power is consumed when a pull-up network and a pull-down network are on together for an instant while transistors are turning on and off. For 180nm channel lengths and above, leakage power is very small compared to dynamic power. Furthermore, short-circuit power is also less than 10% of the dynamic power for a typical CMOS design, and the ratio between dynamic power and short-circuit power does not change as long as the ratio between supply voltage and threshold voltage remains the same [5]. Since, for 180nm and above, short-circuit power and leakage power are relatively small compared to switching power, the power consumption of a particular CMOS gate under consideration can be represented by the following switching power ($P_{switching}$) equation:

$$P_{switching} = \rho C_L V_{dd}^2 f \tag{2.2}$$

where $C_L$, $V_{dd}$, and $f$ denote the load capacitance of a CMOS gate, the supply voltage and the clock frequency, respectively [6]. The notation $\rho$ denotes the switching ratio of a gate output, i.e., the average number of times the gate's output changes from Gnd to $V_{dd}$ per clock cycle. Note that when the output capacitance discharges from $V_{dd}$ to Gnd, switching power is not consumed because no power is drawn from $V_{dd}$ (e.g., discharging to Gnd does not consume battery power). The switching ratio varies with the input vectors and benchmark programs, and thus the average value for each benchmark may be used as the switching ratio.
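Equation 2.2 can be evaluated directly, as in the minimal sketch below. It treats $\rho$ as a dimensionless activity factor (average Gnd-to-$V_{dd}$ output transitions per clock cycle), which is an assumption made so that the units work out to watts. The operating point (activity factor, load capacitance, frequency and supply voltages) is hypothetical and is not taken from the thesis.

```python
def switching_power(rho, c_load, vdd, freq):
    """Equation 2.2: P_switching = rho * C_L * Vdd^2 * f, in watts."""
    return rho * c_load * vdd ** 2 * freq


# Hypothetical operating point, chosen only for illustration.
RHO = 0.1        # assumed activity factor (transitions per clock cycle)
C_LOAD = 10e-15  # load capacitance: 10 fF
FREQ = 1e9       # clock frequency: 1 GHz

p_full = switching_power(RHO, C_LOAD, 1.2, FREQ)  # Vdd = 1.2 V
p_half = switching_power(RHO, C_LOAD, 0.6, FREQ)  # Vdd = 0.6 V

print(f"P at 1.2 V: {p_full * 1e6:.2f} uW")
print(f"P at 0.6 V: {p_half * 1e6:.2f} uW")
print(f"ratio: {p_full / p_half:.1f}")
```

Halving the supply voltage in this example cuts the switching power by a factor of four, since the other terms of Equation 2.2 are held fixed.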

Equation 2.2 shows that lowering $V_{dd}$ decreases CMOS switching power consumption quadratically. However, this power reduction unfortunately entails an increase in the gate delay of a CMOS circuit, as shown in the following approximate equation:

$$T_d \propto \frac{V_{dd}}{\left(V_{dd} - V_{th}\right)^{\alpha}} \tag{2.3}$$

Where $T_d$, $V_{th}$, and $\alpha$ denote the gate delay in a CMOS circuit, the threshold voltage and the velocity saturation index of a transistor, respectively. It is well known that $\alpha$ is close to 2 for feature sizes above $2.0\mu$, between 1.3 and 1.5 for $0.25\mu$, and close to 1 below $0.1\mu$ [7], [8]. However, instead of scaling down the $\alpha$ value along with the technology feature size, CMOS technology may take a constant $\alpha$ value to avoid hot-carrier related problems [9]. A constant $\alpha$ value could be accomplished by changing $V_{th}$ because $\alpha$ is a function of gate-source voltage [10]. If we scale down $V_{dd}$, switching power in Equation 2.2 decreases, while the gate delay in Equation 2.3 increases. Therefore, CMOS circuit speed can be traded against switching power consumption, as shown in Equations 2.2 and 2.3. When there exist tradeoffs between multiple criteria, e.g., power and delay, we may say one design is better than another design in specific criteria. A point in the design space is called a Pareto point if no other point is at least as good in every objective and strictly better in at least one [11].

In this thesis we estimate leakage power consumption by measuring static power when transistors are not switching. Furthermore, we estimate active power consumption by measuring power when transistors are switching. This active power includes dynamic power consumption and leakage power consumption.
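The tradeoff expressed by Equations 2.2 and 2.3 can be illustrated numerically. The following is a minimal sketch and not part of the thesis measurements: the function names and the sample values (switching ratio, load capacitance, frequency, threshold voltage and $\alpha$) are assumptions chosen only for illustration, and the delay is reported relative to a reference supply voltage because Equation 2.3 is a proportionality.

```python
def switching_power(rho, c_load, vdd, freq):
    """Equation 2.2: P_switching = rho * C_L * Vdd^2 * f."""
    return rho * c_load * vdd ** 2 * freq


def relative_delay(vdd, vth, alpha, vdd_ref):
    """Equation 2.3 evaluated relative to a reference supply voltage.

    T_d is only known up to a constant factor, so the function returns
    T_d(vdd) / T_d(vdd_ref).
    """
    if vdd <= vth or vdd_ref <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")

    def unscaled(v):
        return v / (v - vth) ** alpha

    return unscaled(vdd) / unscaled(vdd_ref)


# Illustrative (assumed) values, not taken from the thesis.
RHO, C_LOAD, FREQ = 0.1, 10e-15, 1e9
VTH, ALPHA, VDD_REF = 0.3, 1.3, 1.8

for vdd in (1.8, 1.5, 1.2, 0.9):
    power = switching_power(RHO, C_LOAD, vdd, FREQ)
    delay = relative_delay(vdd, VTH, ALPHA, VDD_REF)
    print(f"Vdd={vdd:.1f} V  P_switching={power:.3e} W  relative delay={delay:.2f}")
```

Under these assumed values, lowering the supply voltage reduces the computed switching power while the relative delay grows, which is the tradeoff described above.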

In theory, a new technology generation with transistor width, length, and oxide thickness scaled down by 30% will accomplish all of these three goals.

Previously, scaling was done at constant voltage until the 0.8µ feature size was reached. This approach keeps the dynamic power consumption resulting from charging/discharging of capacitances per transistor the same for different technologies, as seen in Equation 2.4:

$$Power = C \times V_{DD}^{2} \times f \tag{2.4}$$

This approach led to a dramatic increase in power consumption as the transistor count and design complexity (and therefore switching activity) increased. For this reason, after  $0.8\mu$  technology, constant electric field scaling was employed instead of constant voltage scaling. In constant electric field scaling, the supply voltage is scaled down by the same amount as the feature size. So, for a 0.7 scaling factor, this approach leads to a ~50% reduction in power consumption per transistor, as seen in Equation 2.5:

$$\begin{split} Power &\propto 0.7 \times C \times 0.7^2 \times V_{DD}^2 \times f / 0.7 \\ &= 0.49 \times C \times V_{DD}^2 \times f \end{split} \tag{2.5}$$
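The arithmetic behind Equation 2.5 can be checked with a short sketch. The function names and the parameter `s` (the linear scaling factor) are illustrative assumptions; the model simply scales capacitance and supply voltage by `s` and frequency by `1/s`, as described above for constant electric field scaling.

```python
def constant_field_power_ratio(s):
    """Relative per-transistor dynamic power after constant-field scaling.

    Capacitance scales by s, supply voltage by s, and frequency by 1/s,
    so C * Vdd^2 * f scales by s * s**2 / s = s**2.
    """
    capacitance_factor = s
    voltage_factor = s
    frequency_factor = 1.0 / s
    return capacitance_factor * voltage_factor ** 2 * frequency_factor


def constant_voltage_power_ratio(s):
    """Same quantity when the supply voltage is held constant (Equation 2.4)."""
    capacitance_factor = s
    frequency_factor = 1.0 / s
    return capacitance_factor * 1.0 ** 2 * frequency_factor


print(round(constant_field_power_ratio(0.7), 2))    # 0.49
print(round(constant_voltage_power_ratio(0.7), 2))  # 1.0
```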

Even though reduced power supply voltages decrease dynamic power consumption per transistor, the trend of reducing power supply voltage led to a significant increase in leakage power consumption because of the necessity to reduce the threshold voltage in order to compensate for the drive loss caused by the reduced supply voltage. As a common practice, the threshold voltage for a process is usually chosen to be smaller than one quarter of the supply voltage to ensure that performance does not suffer excessively [12]. This approach, combined with the exponential relationship of leakage current to threshold voltage, led to a significant increase in the percentage of leakage power consumption in total system power consumption. If this trend continues, the leakage power consumption will be equal to the dynamic power consumption in a couple of technology generations. Since leakage energy is, in a sense, "wasted" energy, half of the energy dissipation will be wasted in future technologies if significant improvements in device, circuit, architecture, and software are not introduced.

# **CHAPTER III**

#### PREVIOUS WORKS

In this chapter, we review important prior work that is closely related to our research. Furthermore, the previous work is compared to our research. We explore the prior work targeting leakage power reduction.

#### 3.1 Static Power Reduction VLSI Research

In this section, we discuss previous low-power techniques that primarily target reducing leakage power consumption of CMOS circuits. Techniques for leakage power reduction can be grouped into two categories: (i) state-saving techniques where circuit state (present value) is retained and (ii) state-destructive techniques where the current Boolean output value of the circuit might be lost [13]. A state-saving technique has an advantage over a state-destructive technique in that with a state-saving technique the circuitry can immediately resume operation at a point much later in time without having to somehow regenerate state. We characterize each low-leakage technique according to this criterion.

#### 3.1.1 Static Power Reduction Research for Generic Logic Circuits

This section explains low-leakage techniques for generic logic circuits. Although our research focuses on techniques which save state, we also review the state-destructive techniques for the purposes of comparison.

#### **3.1.1.1** Base case

The base case circuit contains only the PMOS network and the NMOS network, with no mechanism to reduce leakage. For clarity, a base case inverter is shown in Figure 3.1.

Although the base case retains state and requires the minimum possible area, it is not efficient in terms of either static or dynamic power.



Figure 3.1: Base case

#### 3.1.1.2 Sleep transistor



Figure 3.2: Sleep transistor technique

State-destructive techniques cut off transistor (pull-up or pull-down or both) networks from supply voltage or ground using sleep transistors [14]. These types of techniques are also called gated-Vdd and gated-Gnd (note that a gated clock is generally used for dynamic power reduction). Mutoh *et al.* propose a technique they call Multi-Threshold-Voltage CMOS (MTCMOS) [14], which adds high-Vth sleep transistors between pull-up networks and Vdd and between pull-down networks and ground as shown in Figure 3.2, while logic circuits use low-Vth transistors in order to maintain fast logic switching speeds. The sleep transistors are turned off when the logic circuits are not in use. By isolating the logic networks using sleep transistors, the sleep transistor technique dramatically reduces leakage power during sleep mode. However, the additional sleep transistors increase area and delay. Furthermore, the pull-up and pull-down networks will have floating values and thus will lose state during sleep mode. These floating values significantly impact the wakeup time and energy of the sleep technique due to the requirement to recharge transistors which lost state during sleep (this issue is nontrivial, especially for registers and flip-flops).

#### **Effect of introducing sleep Transistor(s) in Active Mode:**

During the normal mode of circuit operation, the sleep transistors can be modeled as a resistor R. Assuming that the current flowing into the transistor is I, this resistance will cause a voltage drop across it, say  $V_{\text{sleep}}$ . Therefore, the gate driving capability reduces to  $V_{\text{dd}}$  -  $V_{\text{sleep}}$  from  $V_{\text{dd}}$ . This reduction in driving capability causes degradation in circuit performance.

To overcome this problem, it is essential to lower the resistance R of the transistor as much as possible. This in turn implies increasing the size (width) of the transistor, since the resistance of the transistor is inversely proportional to its width. This, however, comes at the expense of increased area and dynamic power dissipation. Conversely, a small transistor would degrade the circuit speed. A solution to this problem would be to reduce the threshold voltage V<sub>th</sub>, but then the sub-threshold current, and hence the leakage power, would increase exponentially. Hence, there is a clear trade-off between the area, power and delay metrics of a circuit for low leakage designs.
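The resistor model of the sleep transistor can be turned into a small numerical sketch. The names below and the example values are assumptions for illustration only; in particular, `unit_resistance` is a hypothetical on-resistance of a unit-width sleep transistor, and the resistance is modeled as inversely proportional to the width as stated above.

```python
def sleep_voltage_drop(current, unit_resistance, width):
    """Voltage drop V_sleep = I * R, with R modeled as unit_resistance / width."""
    if width <= 0:
        raise ValueError("width must be positive")
    return current * unit_resistance / width


def effective_drive(vdd, current, unit_resistance, width):
    """Remaining drive voltage Vdd - V_sleep seen by the logic gate."""
    return vdd - sleep_voltage_drop(current, unit_resistance, width)


# Assumed example values: 1 V supply, 100 uA active current, and a
# hypothetical 1 kOhm resistance for a unit-width sleep transistor.
for width in (1, 2, 4, 8):
    drive = effective_drive(1.0, 100e-6, 1e3, width)
    print(f"relative width {width}: effective drive = {drive:.4f} V")
```

In this simple model, each doubling of the sleep transistor width halves the voltage drop, which is the area-versus-delay trade-off discussed above.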

#### So, sleep transistors

- Increase area and delay
- Lose state during sleep mode

#### 3.1.1.3 Forced stack

Another technique to reduce leakage power is transistor stacking (Figure 3.3). Transistor stacking exploits the stack effect, which results in a substantial sub-threshold leakage current reduction when two or more stacked transistors are turned off together. Narendra *et al.* studied the effectiveness of the stack effect, including effects from increasing the channel length [15]. Since forced stacking of what previously was a single transistor increases delay, Johnson *et al.* propose an algorithm that finds circuit input vectors which maximize the stacked transistors of existing complex logic [16].



Figure 3.3: Forced stack

#### Reducing leakage power through stack effect

Subthreshold leakage can be reduced by stacking transistors, i.e., taking advantage of the so-called "stack effect" [3]. The stack effect occurs when two or more stacked transistors are turned off together; the result is reduced leakage power consumption. Let us explain an important stack effect leakage reduction model. The model we explain here is based on the leakage models [3], [17].



Figure 3.3.1: A single transistor



Figure 3.3.2: A stacked transistor

For a turned off single transistor shown in Figure 3.3.1, leakage current ( $I_{sub0}$ ) can be expressed as follows:

$$I_{sub0} = Ae^{\frac{1}{nV_{\theta}} \left(V_{gs0} - V_{th0} - \gamma V_{sb0} + \eta V_{ds0}\right)} \left(1 - e^{-V_{ds0}/V_{\theta}}\right)$$

$$= Ae^{\frac{1}{nV_{\theta}} \left(-V_{th0} + \eta V_{dd}\right)} \tag{3.1}$$

Where,  $A = \mu_0 C_{ox} \left( W / L_{eff} \right) V_\theta^2 e^{1.8}$ 

n = subthreshold swing coefficient

 $V_{\theta}$  = thermal voltage.

 $V_{gs0}$ ,  $V_{th0}$ ,  $V_{bs0}$  and  $V_{ds0}$  are the gate-to-source voltage, the zero-bias threshold voltage, the base-to-source voltage and the drain-to-source voltage, respectively.  $\gamma$  is the body-bias effect coefficient, and  $\eta$  is the Drain Induced Barrier Lowering (DIBL) coefficient,  $\mu_0$  is zero-bias mobility,  $C_{ox}$  is the gate-oxide capacitance, W is the width of the transistor, and  $L_{eff}$  is the effective channel length [18]. (Note that throughout this project we assume  $\mu_n = 2\mu_p$ , i.e., NMOS carrier mobility is twice PMOS carrier mobility). Also note that we use a W/L ratio based on an actual transistor size, in which way a W/L ratio properly characterizes circuit models used in this case.

Let us assume that the two stacked transistors (M1 and M2) in Figure 3.3.2 are turned off. We also assume that the transistor width of each of M1 and M2 is the same as the transistor width of M0 ($W_{M0} = W_{M1} = W_{M2}$). The two leakage currents, $I_{sub1}$ of transistor M1 and $I_{sub2}$ of transistor M2, can be expressed as follows:

$$I_{sub1} = Ae^{\frac{1}{nV_{\theta}}(V_{gs1} - V_{th0} - \gamma V_{sb1} + \eta V_{ds1})} (1 - e^{-V_{ds1}/V_{\theta}}) \tag{3.3}$$

$$= Ae^{\frac{1}{nV_{\theta}}(-V_{x}-V_{th0}-\gamma V_{x}+\eta (V_{dd}-V_{x}))} \tag{3.4}$$

$$I_{sub2} = Ae^{\frac{1}{nV_{\theta}}(V_{gs2} - V_{th0} - \gamma V_{sb2} + \eta V_{ds2})} (1 - e^{-V_{ds2}/V_{\theta}}) \tag{3.5}$$

$$= Ae^{\frac{1}{nV_{\theta}}(-V_{th0} + \eta V_{x})} \left(1 - e^{-V_{x}/V_{\theta}}\right) \tag{3.6}$$

Where,  $V_X$  is the voltage at the node between M1 and M2.

Now consider leakage current reduction between  $I_{sub0}$  and  $I_{sub1}(=I_{sub2})$ . The reduction factor X can be expressed as follows:

$$X = \frac{I_{sub0}}{I_{sub1}} = \frac{Ae^{\frac{1}{nV_{\theta}}(-V_{th0} + \eta V_{dd})}}{Ae^{\frac{1}{nV_{\theta}}(-V_{x} - V_{th0} - \gamma V_{x} + \eta (V_{dd} - V_{x}))}} = e^{\frac{V_{x}}{nV_{\theta}}(1 + \gamma + \eta)} \tag{3.7}$$

 $V_X$  in Equation (3.7) can be derived by letting  $I_{sub1} = I_{sub2}$  and by solving the following equation:

$$1 = e^{\frac{1}{nV_{\theta}}(\eta V_{dd} - V_{x}(1 + 2\eta + \gamma))} + e^{\frac{-V_{x}}{V_{\theta}}} \tag{3.8}$$

If all the parameters are known, we can calculate the stack effect leakage power reduction using Equations (3.7) and (3.8). As an example, we consider the leakage model parameter values targeting 0.5$\mu$ technology in Table I [3]. From Equation (3.8), we calculate $V_X = 0.0443V$, and from Equation (3.7) we obtain the leakage reduction factor X = 4.188. Although the reduction is 4.188X at 0.5µ technology, the reduction increases at nanoscale technology because η increases as technology feature size shrinks.

The threshold voltage of a CMOS transistor can be controlled using body bias. In general, we apply Vdd to the body (e.g., an n-well or n-tub) of a PMOS transistor and apply Gnd to the body (e.g., a p-well or p-substrate) of an NMOS transistor. This condition, in which the source voltage and body voltage of a transistor are the same, is called Zero-Body Bias (ZBB). The threshold voltage at ZBB is called the ZBB threshold voltage. When the body voltage is lower than the source voltage, achieved by applying a negative bias to the body, the condition is called Reverse-Body Bias (RBB). Alternatively, when the body voltage is higher than the source voltage, achieved by applying a positive bias to the body, the condition is called Forward-Body Bias (FBB). When RBB is applied to a transistor, the threshold voltage increases, and when FBB is applied to a transistor, the threshold voltage decreases. This phenomenon is called the body-bias effect, and it is frequently used to control threshold voltage dynamically [31].

Table I: Leakage model parameters (0.5µ tech)

| Parameter                          | Value   |
|------------------------------------|---------|
| Vdd                                | 1V      |
| Vth                                | 0.2V    |
| n (subthreshold slope coefficient) | 1.5     |
| η (DIBL coefficient)               | 0.05V/V |
| γ (body-bias effect coefficient)   | 0.24V/V |
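Equation (3.8) has no closed-form solution for $V_X$, so it is usually solved numerically. The following is a minimal sketch using bisection with the Table I parameters. The function names are illustrative, and the thermal voltage `V_THETA` is an assumption (roughly room temperature) because Table I does not list it; the computed $V_X$ and X therefore land in the neighbourhood of, but not exactly at, the values quoted in the text.

```python
import math


def stack_node_voltage(vdd, n, eta, gamma, v_theta, tol=1e-12):
    """Solve Equation 3.8 for V_X by bisection on the interval [0, Vdd]."""

    def residual(vx):
        first = math.exp((eta * vdd - vx * (1 + 2 * eta + gamma)) / (n * v_theta))
        second = math.exp(-vx / v_theta)
        return first + second - 1.0

    lo, hi = 0.0, vdd
    # The residual decreases monotonically in V_X, is positive at 0 and
    # negative at Vdd, so a single root lies in between.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def stack_reduction_factor(vx, n, eta, gamma, v_theta):
    """Equation 3.7: X = exp(V_X * (1 + gamma + eta) / (n * V_theta))."""
    return math.exp(vx * (1 + gamma + eta) / (n * v_theta))


# Table I values; V_THETA is an assumption (about 300 K), not given in Table I.
VDD, N, ETA, GAMMA = 1.0, 1.5, 0.05, 0.24
V_THETA = 0.02585

vx = stack_node_voltage(VDD, N, ETA, GAMMA, V_THETA)
x = stack_reduction_factor(vx, N, ETA, GAMMA, V_THETA)
print(f"V_X = {vx:.4f} V, X = {x:.3f}")
```

Changing `V_THETA` slightly shifts both results, which is one plausible reason for small differences from the quoted figures.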

#### 3.1.1.4 Sleepy stack

Another technique to reduce leakage power is the sleepy stack structure [19], [20], which combines the forced stack and the sleep transistor techniques. Figure 3.4 shows a sleepy stack inverter. The sleepy stack technique divides each existing transistor into two transistors, each typically with a width W1 that is half of the original single transistor's width W0 (i.e., W1 = W0/2), thus maintaining equivalent input capacitance. Sleep transistors are added in parallel to one of the transistors in each set of two stacked transistors. The sleep transistor width is also half of the original transistor width. During active mode all sleep transistors are turned on. This sleepy stack structure can potentially reduce circuit delay in two ways. First, since the sleep transistors are always on during active mode, the sleepy stack structure achieves a faster switching time than the forced stack structure. Second, high-Vth transistors (which are slow but 1000X or so less leaky) can be used for the sleep transistors and the transistors parallel to the sleep transistors without incurring a large delay increase.

During sleep mode both of the sleep transistors are turned off. Although the sleep transistors are turned off, the sleepy stack structure maintains exact logic state. The leakage reduction of the sleepy stack structure occurs in two ways. First, leakage power is suppressed by high-Vth transistors, which are applied to the sleep transistors and the transistors parallel to the sleep transistors. Second, two stacked and turned off transistors induce the stack effect, which also suppresses leakage power consumption. By combining these two effects, the sleepy stack structure achieves ultra-low leakage power consumption during sleep mode while retaining exact logic state. The price for this is increased area.



Figure 3.4: Sleepy stack inverter

#### 3.1.1.5 Sleepy keeper

Another approach is the leakage feedback approach [21], in which a PMOS transistor is placed in parallel to the sleep transistor (S) and an NMOS transistor is placed in parallel to the sleep transistor (S'). The two transistors are driven by the output of an inverter which is itself driven by the output of the circuit. During sleep mode, the sleep transistors are turned off and one of the transistors in parallel to the sleep transistors keeps the connection with the appropriate power rail.

The sleepy keeper approach [22] is shown in Figure 3.5. To maintain a value of '1' in sleep mode, given that the '1' value has already been calculated, the sleepy keeper approach uses this output value of '1' and an NMOS transistor connected to VDD to maintain an output value equal to '1' when in sleep mode. As shown in Figure 3.5, an additional single NMOS transistor placed in parallel to the pull-up sleep transistor connects VDD to the pull-up network. When in sleep mode, this NMOS transistor is the only source of VDD to the pull-up network since the sleep transistor is off. Similarly, to maintain a value of '0' in sleep mode, given that the '0' value has already been calculated, the sleepy keeper approach uses this output value of '0' and a PMOS transistor connected to Gnd to maintain an output value equal to '0' when in sleep mode. As shown in Figure 3.5, an additional single PMOS transistor placed in parallel to the pull-down sleep transistor is the only source of Gnd to the pull-down network, which is the dual of the output '1' case explained above.



Figure 3.5: Sleepy keeper

#### **3.1.1.6 Dual sleep**

In the dual sleep method (Figure 3.6), two sleep transistors are used in each NMOS or PMOS block [23]. One sleep transistor is turned on in the ON state and the other is turned on in the OFF state. In the OFF state, each block thus contains both PMOS and NMOS transistors in order to reduce the leakage power.

The dual sleep approach takes advantage of two extra pull-up and two extra pull-down sleep transistors, which are used in either the OFF state or the ON state. When S=1, the pull-down NMOS transistor is ON and, since S'=0, the pull-up PMOS transistor is ON, so the arrangement works as a normal device in the ON state. During the OFF state, S is forced to 0; hence, in the pull-down network the NMOS transistor is OFF and the PMOS transistor is ON, while in the pull-up network the PMOS transistor is OFF and the NMOS transistor is ON. So in the OFF state a PMOS is in series with an NMOS in both the pull-up and pull-down circuits, which reduces power.



Figure 3.6: Dual sleep method

Besides power, a major advantage is area reduction. Since the dual sleep portion can be made common to all logic circuitry, fewer transistors are needed to implement a given logic circuit. For example, for a chain of 4 inverters, both the sleepy stack and sleepy keeper approaches require 24 transistors whereas the dual sleep method requires only 12 transistors, thus saving 12 transistors and achieving a considerable area reduction.

#### **3.1.1.7 Dual stack**

In the dual stack method (Figure 3.7), two pairs of transistors, one in the pull-up network and another in the pull-down network, are used in order to retain state [24]. Here sleep transistors N5 and P5 are each in parallel with another set of sleep transistors, and each of those sets is made up of one pair of transistors. P5 is parallel to the NMOS set with N6 and N7. N5 is parallel to the PMOS set with P6 and P7.

In sleep mode the single sleep transistors are off, i.e., transistors N5 and P5 are off. This is done by making S=0 and hence S'=1. The other four transistors, P6, P7 and N6, N7, then connect the main circuit to the power rails. Here two PMOS transistors are used in the pull-down network and two NMOS transistors in the pull-up network. The advantage is that an NMOS transistor degrades the high logic level while a PMOS transistor degrades the low logic level, and the body effect decreases the voltage level further. So, the pass transistors decrease the voltage applied across the main circuit. As static power is proportional to the voltage applied, the power decreases with the reduced voltage while the advantage of state retention is still achieved.



Figure 3.7: Dual stack method

In active mode, i.e., S=1 and S'=0, both the single sleep transistors (N5 and P5) and the parallel transistors (P6, P7 and N6, N7) are on. The set of one PMOS in parallel with two NMOS works as a transmission gate, and the power connection is again established without degradation. Furthermore, the set of one PMOS in parallel with two series NMOS and the set of one NMOS in parallel with two series PMOS present less resistance, since connecting resistances in parallel reduces the net resistance. With less resistance, more current can flow through the circuit block; the output capacitor is charged and discharged faster, and thus the delay is smaller.

#### 3.2 Power Reduction Research Using Voltage Scaling

#### 3.2.1 Multiple Vdd and V<sub>th</sub> Optimization

High-level synthesis based on voltage scaling can be extended to circuits with multiple supply voltages. A multiple voltage supply system can, for example, assign a low supply voltage ( $V_{ddl}$ ) to non-critical paths while assigning a high supply voltage ( $V_{ddh}$ ) to critical paths. The voltage level of each operation unit (each collection of logic circuits) is decided so that the power is reduced while preserving timing constraints: this is called Multiple-Voltage Scheduling (MVS) [25]. Raje *et al.* propose a behavioral level MVS algorithm that uses a data flow graph to abstract a system; thus, an algorithm can be applied to minimize power consumption at the system or chip level [26].

In a multiple-Vdd system, the co-existence of multiple voltages in circuits potentially induces two problems. One is the extra wiring needed to properly supply multiple Vdd values, potentially causing large area overhead. The other problem is the placement of level converters. If a  $V_{ddl}$  gate drives the input of a  $V_{ddh}$  gate, the voltage level of the output of the  $V_{ddl}$  gate is not high enough to drive the input of the  $V_{ddh}$  gate; thus, if no level converter is used, the incompletely cut-off PMOS transistor of the  $V_{ddh}$  gate may incur static current flowing from  $V_{ddh}$  to ground (Gnd). This phenomenon can be prevented by placing a level converter that shifts the voltage level of the  $V_{ddl}$  gate output to  $V_{ddh}$ . These two problems are potentially serious for Vdd optimization because many extra wires and level converters may be required. Therefore, Johnson and Chang tackle the MVS problem with the consideration of level converters [25], [27].

While [25], [26] and [27] focus on solutions within high-level synthesis frameworks, Usami and Horowitz propose clustered voltage scaling, which handles level converter overhead in gate placement. Clustered voltage scaling minimizes the number of level converters by clustering gates having the same supply voltage and placing  $V_{\rm ddh}$  gate clusters before  $V_{\rm ddl}$  gate clusters if possible [28]. Usami *et al.* also tackle the placement problem of wires carrying different voltages by placing  $V_{\rm ddh}$  and  $V_{\rm ddl}$  wires row-by-row [29]. In placing gates using different supply voltages, the easier way is to place  $V_{\rm ddh}$  and  $V_{\rm ddl}$  gates in two separate areas, which is called area-by-area placement. However, area-by-area placement requires long interconnections between  $V_{\rm ddh}$  and  $V_{\rm ddl}$  cells. The row-by-row scheme first places cells without considering voltages and then chooses the voltage level of each Vdd wire based on the majority of cells. The cells in a row of a different Vdd value (e.g.,  $V_{\rm ddl}$ ) are relocated to the nearest row where cells use the same Vdd (e.g.,  $V_{\rm ddl}$ ).
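The row-by-row scheme described above (majority vote per row, then relocation of mismatched cells to the nearest matching row) can be sketched as follows. This is only an illustration of the idea as summarized in this section, not the algorithm of [29] itself: the data layout, the function names, the tie-breaking rule and the absence of any row capacity limit are all assumptions.

```python
from collections import Counter


def assign_row_voltages(rows):
    """Pick each row's supply by majority vote over the cells placed in it.

    `rows` is a list of rows; each row is a list of (cell_name, voltage)
    tuples, where voltage is "H" (V_ddh) or "L" (V_ddl). Ties go to "H".
    """
    voltages = []
    for row in rows:
        counts = Counter(voltage for _, voltage in row)
        voltages.append("H" if counts["H"] >= counts["L"] else "L")
    return voltages


def relocate_minority_cells(rows, row_voltages):
    """Move every mismatched cell to the nearest row with a matching supply."""
    placed = [[] for _ in rows]
    for index, row in enumerate(rows):
        for name, voltage in row:
            candidates = [i for i, v in enumerate(row_voltages) if v == voltage]
            if not candidates:
                raise ValueError(f"no row supplies voltage {voltage!r}")
            # A cell already in a matching row has distance 0 and stays put.
            target = min(candidates, key=lambda i: abs(i - index))
            placed[target].append((name, voltage))
    return placed


# Hypothetical initial placement made without considering voltages.
rows = [
    [("a", "H"), ("b", "H"), ("c", "L")],
    [("d", "L"), ("e", "L"), ("f", "H")],
    [("g", "H"), ("h", "H")],
]
row_voltages = assign_row_voltages(rows)
placement = relocate_minority_cells(rows, row_voltages)
print(row_voltages)
print(placement)
```

After relocation every row contains only cells of its own supply voltage, so a single Vdd wire per row suffices in this simplified picture.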

# **CHAPTER IV**

# PROPOSED METHOD

In this chapter we discuss our new method with regard to static and dynamic power, propagation delay and area. First we discuss its structure, and then we present its operating principle.

#### 4.1 Proposed Structure



Figure 4.1: Proposed method

Sleep transistors are an important part of any low-leakage power design. They are used to cut off the connection to the power rail during sleep mode. The problem is that the circuit loses its output state during sleep mode. So, in our proposed method (Figure 4.1), to retain state we use two parallel transistors, one in the pull-up network and another in the pull-down network. One NMOS transistor is used in the pull-down network to degrade the lower logic level.

The circuit block of the inverter chain has an aspect ratio of W/L=3 for NMOS and W/L=6 for PMOS (assuming  $\mu_n$=2$\mu_p$ ). Since the proposed portion can be made common to all logic circuitry, fewer transistors are needed to implement a given logic circuit. For example, for a chain of 4 inverters, both the sleepy stack and sleepy keeper approaches require 24 transistors whereas the proposed method requires only 13 transistors, thus saving 11 transistors and achieving a considerable area reduction.
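The transistor counts quoted for the four-inverter chain can be reproduced with a small sketch. The totals (24, 12 and 13) come from the text; the per-gate and shared breakdowns used below are assumptions inferred from the structure descriptions (for example, five shared devices N5, P5, N6, P6 and N7 for the proposed method), and the function names are illustrative.

```python
def per_gate_overhead_count(gates, logic_per_gate, extra_per_gate):
    """Total transistors when every gate carries its own low-leakage devices."""
    return gates * (logic_per_gate + extra_per_gate)


def shared_overhead_count(gates, logic_per_gate, shared_extra):
    """Total transistors when one set of low-leakage devices serves all gates."""
    return gates * logic_per_gate + shared_extra


GATES = 4  # chain of four inverters

# Assumed breakdown: a sleepy stack inverter uses 6 transistors in total,
# counted here as 2 logic devices plus 4 additional devices per gate.
sleepy_stack = per_gate_overhead_count(GATES, 2, 4)
# Assumed breakdown: 2 logic + 2 sleep + 2 keeper transistors per gate.
sleepy_keeper = per_gate_overhead_count(GATES, 2, 4)
# Shared portions: 4 sleep transistors (dual sleep), 5 devices (proposed).
dual_sleep = shared_overhead_count(GATES, 2, 4)
proposed = shared_overhead_count(GATES, 2, 5)

print(sleepy_stack, sleepy_keeper, dual_sleep, proposed)  # 24 24 12 13
print("saving vs. sleepy stack:", sleepy_stack - proposed)  # 11
```

The saving grows with the number of gates that share the same low-leakage portion, since the shared overhead is counted only once.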

We use the minimum transistor width for the sleep transistors of the proposed method. Although we use the minimum width, changing the sleep transistor width may provide an additional trade-off between delay, power and area. All transistors have W/L=1 except the transistors of the logic circuits and the NMOS transistor (N7), which is used for degradation of the lower logic level.

We used three high threshold transistors (N6, P6 and N7) to reduce static power consumption. The RBB technique is applied to increase the threshold voltage of N6 and N7: their source voltages are zero, but their body voltages are kept at  $-V_{dd}$. For P6, the source voltage is  $V_{dd}$  and the body voltage is  $2V_{dd}$; biasing the body of a PMOS transistor above its source likewise raises the magnitude of its threshold voltage.

#### 4.2 Operation of the Proposed Method

In this section, the structure and operation of our proposed low-leakage-power design is described. It is also compared with well-known previous approaches, i.e., the sleepy stack, dual sleep and dual stack methods. N7 is always on (in both active mode and sleep mode). In active mode, S=1 and S'=0. The sleep transistors (N5 and P5) and one parallel transistor (N6), connected with the pull-up network, are "on", and the other parallel transistor (P6), connected with the pull-down network, is "off". They work as a transmission gate and the power connection is available. Furthermore, they decrease the dynamic power. In sleep mode, the sleep transistors N5 and P5 are "off" and both parallel transistors N6 and P6 are "on". We do so by making S=0 and S'=1. The voltage of  $V_{gnd}$  then rises to  $V_{tp}$. The transistor N6 degrades the higher logic level and both the transistors N7 and P6 raise the lower logic level of the main logic circuit. So the voltage across the logic circuit is decreased, and static power decreases with this lower voltage. In addition, the source voltage of the NMOS transistors of the logic circuit is higher in sleep mode than in active mode, while the body voltage of these NMOS transistors remains zero. Due to this reverse body biasing, the threshold voltage of the NMOS transistors increases, and so sub-threshold leakage decreases. The high threshold voltage of N6, P6, and N7 further reduces leakage power.
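The link between a raised threshold voltage and lower sub-threshold leakage follows from the exponential term of Equation 3.1. The sketch below evaluates only that term, holding every other voltage fixed, so it is a rough illustration rather than a model of the proposed circuit. The function name and the example numbers (a 0.1 V threshold shift and a thermal voltage of about 0.02585 V) are assumptions; n = 1.5 is taken from Table I.

```python
import math


def leakage_ratio_from_threshold_shift(delta_vth, n, v_theta):
    """Ratio I_sub(new) / I_sub(old) when the threshold rises by delta_vth.

    From Equation 3.1 with all other voltages held fixed, I_sub is
    proportional to exp(-V_th / (n * V_theta)).
    """
    return math.exp(-delta_vth / (n * v_theta))


# Assumed example: threshold raised by 0.1 V through reverse body bias.
ratio = leakage_ratio_from_threshold_shift(0.1, 1.5, 0.02585)
print(f"leakage scales by a factor of {ratio:.3f}")
```

Because the dependence is exponential, even a modest increase in threshold voltage gives a large reduction in sub-threshold leakage in this simplified picture.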

#### 4.3 Delay Model



Figure 4.2: (a) Inverter logic circuit (left) and (b) RC equivalent circuit (right)

Generally the transistor delay of a conventional inverter shown in Figure 4.2 can be expressed using the following equation:

$$T_{d0} = C_L R_t \tag{4.1}$$

Where  $C_L$  is the load capacitance and  $R_t$  is the transistor resistance.  $C_{in}$  in Figure 4.2(b) indicates input capacitance. Although the non-saturation mode equation is complicated, we can predict the adequate first-order gate delay from Equation 4.1 [30].

#### 4.4 Analytical Comparison of Proposed approach vs. Sleepy Stack Inverter

#### 4.4.1 Delay model of sleepy stack inverter

Now the delay model of the inverter with the sleepy stack technique [20] is derived (shown in Figure 4.3). Since we assume that we break each existing transistor into two half sized transistors, the resistance of each transistor of the sleepy stack is doubled, i.e.  $2R_t$ , compared to the standard inverter; furthermore, in this way we can maintain input capacitance equal to Figure 4.2(b). In Figure 4.3,  $C_{X1}$  is internal node capacitance between the two pull-down transistors.



Figure 4.3: (a) Sleepy stack technique inverter (left) and (b) RC equivalent circuit (right)

Using the Elmore equation [31], we can express the delay of the sleepy stack inverter as follows:

$$T_{d1} = (2R_t + R_t)C_L + R_tC_{X1}$$

$$= 3R_tC_L + R_tC_{X1}$$

$$= 3(R_tC_L + 0.33R_tC_{X1})$$

$$= 3K \quad [\text{Assuming } R_tC_L + 0.33R_tC_{X1} = K] \tag{4.2}$$
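The Elmore expression above can be checked with a generic RC-ladder helper. This is a minimal sketch: the function name, the ladder representation and the component values are assumptions for illustration. The sleepy stack pull-down path is modeled as a resistance $R_t$ (the sleep transistor in parallel with one half-sized transistor) leading to the internal node $C_{X1}$, followed by $2R_t$ leading to the output node $C_L$.

```python
def elmore_delay(segments):
    """Elmore delay of an RC ladder driven from the supply rail.

    `segments` lists (resistance, capacitance) pairs ordered from the rail
    towards the output: each resistance is the series element leading to a
    node, and each capacitance is that node's capacitance to ground.
    """
    delay = 0.0
    upstream_resistance = 0.0
    for resistance, capacitance in segments:
        upstream_resistance += resistance
        delay += upstream_resistance * capacitance
    return delay


# Assumed example values (not from the thesis).
RT = 1e3      # transistor resistance R_t in ohms
CL = 10e-15   # load capacitance C_L in farads
CX1 = 2e-15   # internal node capacitance C_X1 in farads

ladder = [(RT, CX1), (2 * RT, CL)]
td1_ladder = elmore_delay(ladder)
td1_formula = 3 * RT * CL + RT * CX1  # Equation 4.2 before introducing K
print(td1_ladder, td1_formula)
```

The same helper can be applied to other pull-down paths by listing their series resistances and node capacitances in order from the rail to the output.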

#### 4.4.2 Delay model of proposed inverter

If $C_1$ and $C_2$ are internal node capacitances, then using the Elmore equation the delay of the proposed inverter (Figure 4.4) is as follows:

$$T_{d2} = (2R_t + R_t/2) C_L + (R_t/2 + R_t) C_1 + (R_t/2) C_2 \tag{4.4}$$

 $C_1$  is the capacitance from two transistors connected while  $C_2$  is the capacitance from three transistors connected. Then

$$C_2 = 3C_1/2 \tag{4.5}$$



Figure 4.4: (a) Proposed inverter (left) and (b) RC equivalent circuit (right)

Using equation 4.5 we get

$$\begin{split} T_{d2} &= (2R_t + R_t/2) \; C_L + (R_t/2 + R_t) \; C_1 + (R_t/2) \; (3C_1/2) \\ &= 2.5 R_t C_L + 2.25 R_t C_1 \end{split}$$

Now  $C_{X1}$  is the capacitance from three transistors connected while  $C_1$  is the capacitance from two transistors connected. Then

$$C_1 = 2C_{X1}/3$$

So,

$$T_{d2} = 2.5R_tC_L + 2.25R_t (2C_{X1}/3)$$

$$= 2.5R_tC_L + 1.5R_tC_{X1}$$

$$= 2.5(R_tC_L + 0.6R_tC_{X1}) \tag{4.6}$$

The internal node capacitances are primarily due to the source and drain diffusion capacitances of the transistors, and are not as large as the output node capacitance. The output node can only take two possible values, Vdd and ground; each internal node, however, stores less charge than the output node, so the term  $0.33R_tC_{X1}$  can be taken as almost equal to the term  $0.6R_tC_{X1}$.

So, we can assume  $R_tC_L + 0.33R_tC_{X1} = R_tC_L + 0.6R_tC_{X1} = K$ 

Now from equation 4.6,

$$T_{d2} = 2.5K \tag{4.7}$$

$$= 2.5T_{d1}/3 \quad [\text{From equation 4.2, } T_{d1} = 3K]$$

$$= 0.83T_{d1} \tag{4.8}$$

So the delay of our proposed method is about 0.83 times the delay of the sleepy stack, i.e., the two are almost equal.

#### 4.5 Analytical Comparison of Proposed approach vs. Dual Stack Inverter

#### 4.5.1 Delay model of dual stack inverter

Now, in the case of the dual stack structure, the dual stack inverter and its RC equivalent circuit are shown in Figure 4.5.



Figure 4.5: (a) Dual stack technique inverter (left) and (b) RC equivalent circuit (right)

If $C_3$ and $C_4$ are internal node capacitances, then using the Elmore equation the delay of the dual stack inverter is as follows:

$$T_{d3} = (R_t + 2 R_t/3) C_L + (2 R_t/3) C_3 + R_t C_4$$

 $C_3$  is the capacitance from three transistors connected while  $C_4$  is the capacitance from two transistors connected. Then

$$C_3 = 3C_4/2 \tag{4.9}$$

Then,

$$T_{d3} = (R_t + 2 R_t/3) C_L + (2 R_t/3) (3C_4/2) + R_t C_4$$

$$= 5 R_t C_L/3 + 2R_t C_4 \tag{4.10}$$

Again, in the sleepy stack approach  $C_{X1}$  is the capacitance for three connected transistors, and here in the dual stack approach  $C_4$  is the capacitance for two connected transistors. So,

$$C_4 = 2C_{X1}/3 \tag{4.11}$$

Now from equation 4.10,

$$T_{d3} = 5 R_t C_L / 3 + 2 R_t (2C_{X1} / 3) \tag{4.12}$$

$$= 5 R_t C_L / 3 + 4R_t C_{X1} / 3$$

$$= 5(R_t C_L + 4R_t C_{X1} / 5) / 3$$

$$= 5(R_t C_L + 0.8R_t C_{X1}) / 3 \tag{4.13}$$

As internal node has less charge stored than the output node, the values  $0.33R_tC_{X1}$ ,  $0.6R_tC_{X1}$  and  $0.8R_tC_{X1}$  are almost equal.

So,

$$R_tC_L + 0.33R_tC_{X1} = R_tC_L + 0.6R_tC_{X1} = R_tC_L + 0.8R_tC_{X1} = K \tag{4.14}$$

Now from equation 4.13,

$$T_{d3}=1.67K$$

#### 4.5.2 Delay model of proposed inverter

From equation 4.7, the delay of the proposed inverter is

$$T_{d2} = 2.5K$$

$$= 2.5T_{d3}/1.67$$

$$= 1.5T_{d3} \quad \text{[From equations 4.13 and 4.14]} \tag{4.15}$$

So delay of proposed inverter is slightly larger than dual stack inverter.
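The effect of the approximation in equation 4.14 can be checked numerically. The following is a minimal Python sketch, not part of the original analysis, that evaluates the three delay expressions without replacing the bracketed terms by a common K. It assumes, from the way equations 4.7 and 4.8 are derived, that equation 4.3 has the form $T_{d1} = 3(R_tC_L + R_tC_{X1}/3)$ and equation 4.6 the form $T_{d2} = 2.5(R_tC_L + 0.6R_tC_{X1})$. The function names and the chosen capacitance ratios are illustrative only.

```python
def elmore_delays(r_t, c_l, c_x1):
    """Return (T_d1, T_d2, T_d3): sleepy stack, proposed and dual stack delays."""
    t_d1 = 3.0 * (r_t * c_l + r_t * c_x1 / 3.0)          # assumed form of eq. 4.3
    t_d2 = 2.5 * (r_t * c_l + 0.6 * r_t * c_x1)          # assumed form of eq. 4.6
    t_d3 = (5.0 / 3.0) * (r_t * c_l + 0.8 * r_t * c_x1)  # eq. 4.13
    return t_d1, t_d2, t_d3


def delay_ratios(cap_ratio):
    """Return (T_d2/T_d1, T_d2/T_d3) for a given C_X1/C_L; R_t and C_L cancel."""
    t_d1, t_d2, t_d3 = elmore_delays(1.0, 1.0, cap_ratio)
    return t_d2 / t_d1, t_d2 / t_d3


if __name__ == "__main__":
    # Illustrative C_X1/C_L values; not taken from the simulations.
    for cap_ratio in (0.0, 0.25, 0.5, 1.0):
        vs_sleepy, vs_dual = delay_ratios(cap_ratio)
        print(f"C_X1/C_L = {cap_ratio:4.2f}: "
              f"T_d2/T_d1 = {vs_sleepy:.3f}, T_d2/T_d3 = {vs_dual:.3f}")
```

When $C_{X1}$ is negligible the ratios reduce to 0.83 and 1.5, as in equations 4.8 and 4.15. As $C_{X1}$ approaches $C_L$, the first ratio rises toward 1 and the second falls toward 1.33 under these assumed expressions.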

## **CHAPTER V**

## APPLYING PROPOSED METHOD

The proposed method can be implemented in both logic circuits and memory designs. To verify this, the method is applied to two circuits:

1. A chain of four inverters
2. An SRAM cell

### 5.1 Application of the Proposed Method in a Chain of Four Inverters

A chain of 4 inverters (Figure 5.1) is chosen because an inverter is one of the most basic CMOS circuits and is typically used to study circuit characteristics. We size each transistor of the inverter to have equal rise and fall times in each stage. Instead of using the minimum possible size of the transistor in a given technology, we use W/L = 6 for PMOS and W/L = 3 for the NMOS.



Figure 5.1 : A chain of four inverters

Starting from this base case, we construct the proposed circuit for a chain of four inverters, as shown in Figure 5.2. The same sleep transistors can be shared by all logic blocks that perform the same function. This reduces the number of transistors needed to build the logic and hence reduces the area significantly. The transistor sizes are shown in Figure 5.2.



Figure 5.2: A chain of four inverters using the proposed method

### 5.2 Application of the Proposed Method in an SRAM Cell

An SRAM cell is designed based on the proposed technique. The conventional 6-T SRAM cell consists of two coupled inverters and two wordline pass transistors.

The subthreshold leakage current in an SRAM cell is typically categorized into two kinds:

- Cell leakage current that flows from Vdd to Gnd internal to the cell and
- Bitline leakage current that flows from bitline (or bitline') to Gnd.

The proposed method is applied to both leakage paths: 1) the bitline leakage path and 2) the cell leakage path. The SRAM cell using the proposed method is shown in Figure 5.3.



Figure 5.3: SRAM cell using the proposed method

## **CHAPTER VI**

## SIMULATION RESULTS

We compare the proposed method with the sleepy stack, dual sleep and dual stack techniques. The four design approaches are compared in terms of power consumption (dynamic and static), delay and area. To show that the approach is applicable to both general logic and memory design, we choose a chain of four inverters (Figure 5.2) and an SRAM cell (Figure 5.3). We use Synopsys HSPICE [32] to estimate delay and power consumption. Area is estimated using MICROWIND.

The transistor sizes used for the inverter chain are as follows:

- Dual sleep (Figure 3.6): each of the four inverters uses W/L = 6 for PMOS and W/L = 3 for NMOS. The sleep transistors that are turned on in the ON state use W/L = 6 for PMOS and W/L = 3 for NMOS, and the sleep transistors that are turned on in the OFF state use W/L = 1.5.
- Dual stack (Figure 3.7): each of the four inverters uses W/L = 6 for PMOS and W/L = 3 for NMOS. The other transistors use W/L = 1.
- Sleepy stack (Figure 4.4): W/L = 3 for PMOS and W/L = 1.5 for NMOS.
- Proposed approach: the transistor sizes are shown in Figure 5.2.

The chosen technologies use the BSIM4 PTM models [33], and their supply voltages are given in Table II.

Table II: Power supply voltage for different technologies

| 130nm | 90nm | 65nm | 45nm | 32nm |
|------|------|------|------|------|
| 1.3V | 1.2V | 1.1V | 1.0V | 0.9V |

### 6.1 Simulation Results for a Chain of Four Inverters

Static power is the power consumed while the gate's output is stable at logic "1" or logic "0". It is calculated by multiplying the supply voltage by the stable supply current. Dynamic power is dissipated while the output is rising or falling, and it is calculated by multiplying the supply voltage by the total charge drawn from the supply during the rising and falling output transitions.

The propagation delay is taken as the average of the propagation delays for rising and falling outputs.
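The static power and propagation delay definitions above can be written as two small helper functions. This is only a sketch under stated assumptions: the function names are hypothetical, the supply voltage is the 65nm value from Table II, and the current and rise/fall delay values are placeholders chosen for illustration, not simulation results.

```python
def static_power(vdd, i_stable):
    """Static power (W): supply voltage (V) times the stable supply current (A)."""
    return vdd * i_stable


def average_propagation_delay(t_plh, t_phl):
    """Propagation delay: mean of the rising-output and falling-output delays."""
    return 0.5 * (t_plh + t_phl)


if __name__ == "__main__":
    vdd_65nm = 1.1     # V, supply voltage for 65nm from Table II
    i_stable = 0.6e-9  # A, hypothetical stable current (placeholder)
    p_static_nw = static_power(vdd_65nm, i_stable) * 1e9
    print(f"static power = {p_static_nw:.3f} nW")

    # Hypothetical rise and fall delays in ps (placeholders)
    t_pd = average_propagation_delay(60.0, 50.0)
    print(f"propagation delay = {t_pd:.1f} ps")
```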

First we explore the impact of technology scaling. Figures 6.1 to 6.4 show the simulation results for the chain of four inverters for four techniques: sleepy stack, dual sleep, dual stack and the proposed method.



Figure 6.1: Static Power Dissipation for a chain of 4 inverters



Figure 6.2: Dynamic Power Dissipation for a chain of 4 inverters



Figure 6.3: Propagation Delay for a chain of 4 inverters



Figure 6.4: Area for a chain of 4 inverters

We now focus on the 65nm implementation of each benchmark. The data for the dual sleep method are taken from [34]. As no reference data are available for the sleepy stack and dual stack methods, our own simulated data are used for them.

Table III: A chain of four inverters in 65nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack    | 1.6469       | 6.33          | 55.73       | 10.52       |
| Dual sleep [34] | 2.128        | 8.1733        | 36.43       | 5.29        |
| Dual stack      | 1.31         | 3.92          | 42.33       | 4.82        |
| Proposed        | 0.678        | 3.768         | 58.82       | 4.62        |

Table IV: Comparison between different circuit techniques for a chain of four inverters in 65nm technology

| Circuit techniques | Static power | Dynamic power | Propagation<br>delay | Area     |
|--------------------|--------------|---------------|----------------------|----------|
| Sleepy stack       | +142.9%      | +67.99%       | -5.25%               | +127.71% |
| Dual sleep         | +213.86%     | +116.91%      | -38.06%              | +14.50%  |
| Dual stack         | +93.21%      | +4.03%        | -28.03%              | +4.33%   |

Here '+' denotes improved and '-' denotes degraded performance of the proposed technique with respect to the other methods, and each percentage is expressed relative to the value obtained with the proposed method. Data for the dual sleep inverter are taken from [34]; for the sleepy stack and dual stack methods we use our own simulated data, as reference data are unavailable. For a chain of four inverters in 65nm technology, the proposed method shows improvements of 93.21%, 4.03% and 4.33% over the dual stack method in static power, dynamic power and area respectively, and a 28.03% degradation in propagation delay. Compared with the dual sleep method, it shows improvements of 213.86%, 116.91% and 14.50% in static power, dynamic power and area respectively, and a 38.06% degradation in propagation delay. Compared with the sleepy stack technique, it shows improvements of 142.9%, 67.99% and 127.71% in static power, dynamic power and area respectively, and a 5.25% degradation in propagation delay.
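The entries of Table IV can be reproduced from Table III. The sketch below is illustrative rather than the procedure used in the thesis: it assumes each entry is the difference between the other technique's value and the proposed technique's value, expressed as a percentage of the proposed value, which matches the tabulated numbers to within rounding. The dictionary and function names are hypothetical.

```python
# 65nm inverter-chain data from Table III:
# (static power nW, dynamic power uW, propagation delay ps, area um^2)
TABLE_III = {
    "Sleepy stack": (1.6469, 6.33, 55.73, 10.52),
    "Dual sleep": (2.128, 8.1733, 36.43, 5.29),
    "Dual stack": (1.31, 3.92, 42.33, 4.82),
    "Proposed": (0.678, 3.768, 58.82, 4.62),
}


def relative_to_proposed(other, proposed):
    """Percentage by which `other` exceeds `proposed`, relative to `proposed`.

    Positive: the proposed value is smaller (improved).
    Negative: the proposed value is larger (degraded).
    """
    return 100.0 * (other - proposed) / proposed


def comparison_table(data, reference="Proposed"):
    """Compare every technique except `reference` against `reference`."""
    ref = data[reference]
    return {
        name: tuple(relative_to_proposed(v, r) for v, r in zip(values, ref))
        for name, values in data.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, row in comparison_table(TABLE_III).items():
        print(f"{name:13s}", "  ".join(f"{x:+8.2f}%" for x in row))
```

The same two functions apply unchanged to the SRAM data if the tuples are replaced by the corresponding 65nm SRAM values.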

Thus the proposed method gives better results than all the previous methods in area, static power consumption and dynamic power consumption. Its delay is higher than that of the dual sleep and dual stack methods, and comparable to that of the sleepy stack technique.

According to the analytical comparison (equation 4.8), the delay of the proposed inverter should be 0.83 times that of the sleepy stack inverter, i.e. 17% lower. The simulation results for a chain of four inverters in 65nm technology, however, give 58.82 ps for the proposed method against 55.73 ps for sleepy stack, a difference of 5.25% of the proposed delay in the opposite direction. Similarly, the delay of the proposed inverter should analytically be 1.5 times that of the dual stack inverter (equation 4.15), i.e. 50% larger. In simulation the proposed method gives 58.82 ps against 42.33 ps for dual stack, a difference of 28.03% of the proposed delay, so the proposed delay is about 1.39 times the dual stack delay. These differences between the analytical and simulation results are due to the assumptions made in the analytical comparison (equation 4.14).
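The gap between analytical and simulated delay can also be examined across technology nodes. The following sketch is an illustration and not part of the original analysis. It divides the simulated propagation delay of the proposed inverter chain by that of the sleepy stack and dual stack chains, using the values in Appendix C, so that the results can be set against the analytical ratios of 0.83 (equation 4.8) and 1.5 (equation 4.15). The variable and function names are hypothetical.

```python
# Propagation delay (ps) of the four-inverter chain, from Appendix C:
# technology node (nm) -> (sleepy stack, dual stack, proposed)
CHAIN_DELAY_PS = {
    130: (77.45, 64.219, 86.44),
    90: (67.74, 45.629, 63.36),
    65: (55.73, 42.33, 58.82),
    45: (50.079, 55.14, 61.33),
    32: (52.81, 69.79, 82.42),
}

ANALYTICAL_VS_SLEEPY = 0.83     # equation 4.8
ANALYTICAL_VS_DUAL_STACK = 1.5  # equation 4.15


def simulated_ratios(delays):
    """Return {node: (proposed / sleepy stack, proposed / dual stack)}."""
    return {
        node: (proposed / sleepy, proposed / dual_stack)
        for node, (sleepy, dual_stack, proposed) in delays.items()
    }


if __name__ == "__main__":
    print(f"analytical: {ANALYTICAL_VS_SLEEPY:.2f} (vs sleepy stack), "
          f"{ANALYTICAL_VS_DUAL_STACK:.2f} (vs dual stack)")
    for node, (vs_sleepy, vs_dual) in simulated_ratios(CHAIN_DELAY_PS).items():
        print(f"{node:>3d}nm: {vs_sleepy:.3f} (vs sleepy stack), "
              f"{vs_dual:.3f} (vs dual stack)")
```

At 65nm the simulated ratios are about 1.06 and 1.39. The other nodes show that the ratios vary with technology, which a single analytical constant cannot capture.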

### 6.2 Simulation Results for an SRAM Cell

Figures 6.5 to 6.8 show the simulation results for the SRAM cell. The same four techniques are considered: sleepy stack, dual sleep, dual stack and the proposed method.



Figure 6.5: Static Power Dissipation for an SRAM cell



Figure 6.6: Dynamic Power Dissipation for an SRAM cell



Figure 6.7: Propagation Delay for an SRAM cell



Figure 6.8: Area for an SRAM cell

We again focus on the 65nm implementation of each benchmark. The data for the dual stack method are taken from [35]. As no reference data are available for the sleepy stack and dual sleep methods, our own simulated data are used for them.

Table V: SRAM cell in 65nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack    | 1.332        | 18.65         | 201.6       | 6.416       |
| Dual sleep      | 2.169        | 16.72         | 263.4       | 4.48        |
| Dual stack [35] | 1.4909       | 7.452         | 258         | 14.4        |
| Proposed        | 0.694        | 7.01          | 358.5       | 7.37        |

Table VI: Comparison between different circuit techniques for SRAM cell in 65nm technology

| Circuit techniques | Static power | Dynamic power | Propagation<br>delay | Area    |
|--------------------|--------------|---------------|----------------------|---------|
| Sleepy stack       | +91.93%      | +166.05%      | -43.76%              | -12.94% |
| Dual sleep         | +212.54%     | +138.52%      | -26.53%              | -39.21% |
| Dual stack         | +114.83%     | +6.31%        | -28.03%              | +95.38% |

Thus, relative to the dual stack method, the proposed method shows improvements of 114.83%, 6.31% and 95.38% in static power, dynamic power and area respectively, and a 28.03% degradation in propagation delay. Relative to dual sleep, it shows improvements of 212.54% and 138.52% in static power and dynamic power, and degradations of 26.53% and 39.21% in propagation delay and area respectively. Relative to sleepy stack, it shows improvements of 91.93% and 166.05% in static power and dynamic power, and degradations of 12.94% and 43.76% in area and propagation delay respectively.

As with the chain of four inverters, the proposed method performs better than all previous methods in static and dynamic power consumption. In area, the proposed method is better than the dual stack method but larger than the dual sleep and sleepy stack methods (by 39.21% and 12.94% respectively, Table VI). Propagation delay is higher than in the previous techniques because of the high threshold transistors used in the proposed method.

### 6.3 Summary

The proposed technique is compared with existing techniques in terms of static power, dynamic power, delay and area. Although it incurs additional delay compared with the dual stack and dual sleep techniques, it shows 45-55% leakage reduction compared with dual stack and 60-70% leakage reduction compared with the dual sleep method. It also shows lower dynamic power than dual sleep and dual stack, with less area than the dual stack technique.
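The leakage reduction ranges quoted in this summary are expressed relative to the technique being replaced, which differs from the convention of Tables IV and VI, where percentages are relative to the proposed method. As a hedged illustration, the sketch below computes the static power reduction of the proposed SRAM cell relative to the dual sleep and dual stack cells from the Appendix D data, assuming reduction = (other - proposed) / other × 100. The names are hypothetical.

```python
# Static power (nW) of the SRAM cell, from Appendix D:
# technology node (nm) -> (dual sleep, dual stack, proposed)
SRAM_STATIC_NW = {
    130: (3.31, 2.3549, 1.017),
    90: (2.801, 1.9824, 0.858),
    65: (2.169, 1.4909, 0.694),
    45: (1.503, 0.9701, 0.481),
    32: (1.308, 0.7803, 0.395),
}


def reduction_percent(other, proposed):
    """Leakage reduction of `proposed`, as a percentage of `other`."""
    return 100.0 * (other - proposed) / other


def leakage_reduction(data):
    """Return {node: (reduction vs dual sleep, reduction vs dual stack)}."""
    return {
        node: (reduction_percent(dual_sleep, proposed),
               reduction_percent(dual_stack, proposed))
        for node, (dual_sleep, dual_stack, proposed) in data.items()
    }


if __name__ == "__main__":
    for node, (vs_sleep, vs_stack) in leakage_reduction(SRAM_STATIC_NW).items():
        print(f"{node:>3d}nm: {vs_sleep:.1f}% vs dual sleep, "
              f"{vs_stack:.1f}% vs dual stack")
```

Under this definition the SRAM reductions fall at roughly 49 to 57% against dual stack and 68 to 70% against dual sleep.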

## **CHAPTER VII**

## CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH

### 7.1 Conclusions

In nanometer scale CMOS technology, subthreshold leakage power consumption is a great challenge. In this dissertation we propose a new structure to tackle the leakage problem. It is an improved version of the dual stack method and shows better performance than the dual stack technique in terms of static power, dynamic power and area; only the delay is increased relative to dual stack. Since the proposed structure can retain state, it can be used in both generic logic circuits and memory, i.e. SRAM cells. In 65nm technology, the leakage power of the dual stack method is about 93% larger than that of the proposed method for a chain of four inverters (Table IV), and more than 100% larger for an SRAM cell (Table VI). A reduction in power consumption provides several benefits. Less heat is generated, which reduces problems associated with high temperature, such as the need for heat sinks; this provides the consumer with a product that costs less. Furthermore, the reliability of the system increases because of lower temperature stress gradients on the device. An additional benefit is extended battery life in battery-powered systems.

Although previous approaches are effective in some ways, no perfect solution for reducing leakage power consumption is yet known. Designers therefore choose techniques based on technology and design criteria. The proposed method offers VLSI designers a new choice. It is applicable to single- and multi-threshold voltages. Among the techniques compared, it gives the lowest leakage power and, for the inverter chain, the smallest area, and it offers a good trade-off between dynamic power, static power and area. As such, the proposed method compares favourably with the other state-saving low power VLSI design techniques considered here.

### 7.2 Suggestions for Future Work

Further leakage reduction techniques should be explored based on the technique proposed in this work. NMOS and PMOS transistors can be added to some of the gates in a circuit to increase the controllability of its internal signals and to decrease the leakage current of the gates through the "stack effect". This should, however, be done carefully, so that minimum leakage is achieved subject to a delay constraint on every input-to-output path in the circuit.

We used HSPICE to obtain static power, dynamic power and propagation delay, and MICROWIND to calculate area. Cadence software can provide more realistic results, so it can be used to estimate these parameters more precisely. The dependence of static power, dynamic power and propagation delay on threshold voltage scaling and temperature variation can also be investigated.

## **REFERENCES**

- [1] International Technology Roadmap for Semiconductors by Semiconductor Industry Association, http://public.itrs.net, 2002.3.
- [2] N. S. Kim et al., "Leakage Current: Moore's Law Meets Static Power," IEEE Computer, Vol. 36, Issue 12, pp. 68-75, December 2003.
- [3] JOHNSON, M. C., SOMASEKHAR, D., and ROY, K., "Models and Algorithms for Bounds on Leakage in CMOS Circuits," *IEEE Transactions on Computer Aided Design on Integrated Circuits and Systems*, vol. 18, no. 6, pp. 714–725, June 1999.
- [4] KIM, C. and ROY, K., "Dynamic Vt SRAM: a Leakage Tolerant Cache Memory for Low Voltage Microprocessors," *Proceedings of the International Symposium on Low Power Electronics and Design*, pp. 251–254, August 2002.
- [5] NOSE, K. and SAKURAI, T., "Analysis and Future Trend of Short-Circuit Power," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 19, no. 9, pp. 1023–1030, September 2000.
- [6] CHANDRAKASAN, A. P., SHENG, S., and BRODERSEN, R. W., "Low-Power CMOS Digital Design," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 4, pp. 473–484, April 1992.
- [7] KHELLAH, M. M. and ELMASRY, M. I., "Power Minimization of High-Performance Submicron CMOS Circuits Using a Dual-Vdd Dual-Vth (DVDV) Approach," *Proceedings of the International Symposium on Low Power Electronics and Design*, pp. 106–108, 1999.
- [8] SAKURAI, T. and NEWTON, A. R., "Alpha-Power Law MOSFET Model and Its Application to CMOS Inverter Delay and Other Formulas," *IEEE Journal of Solid-State Circuits*, vol. 25, no. 2, pp. 584–593, April 1990.

- [9] BOWMAN, K. A., AUSTIN, B. L., EBLE, J. C., TANG, X., and MEINDL, J. D., "A Physical Alpha-Power Law MOSFET Model," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 10, pp. 1410–1414, October 1999.
- [10] "Crusoe: Features and Benefits," http://www.transmeta.com/crusoe/features.html.
- [11] MICHELI, G. D., Synthesis and Optimization of Digital Circuits. USA:McGraw-Hill Inc., 1994.
- [12] C. Hu, "Device and Technology Impact on Low Power Electronics," in Low Power Design Methodologies, J. M. Rabaey and M. Pedram, Eds.: Kluwer Academic Publishers, 1996, pp. 21-35.
- [13] KIM, N., AUSTIN, T., BLAAUW, D., MUDGE, T., FLAUTNER, K., HU, J., IRWIN, M., KANDEMIR, M., and NARAYANAN, V., "Leakage Current: Moore's Law Meets Static Power," *IEEE Computer*, vol. 36, pp. 68–75, December 2003.
- [14] MUTOH, S., DOUSEKI, T., MATSUYA, Y., AOKI, T., SHIGEMATSU, S., and YAMADA, J., "1-V Power Supply High-speed Digital Circuit Technology with Multithreshold Voltage CMOS," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 8, pp. 847–854, August 1995.
- [15] NARENDRA, S., BORKAR, S., DE, V., ANTONIADIS, D., and CHANDRAKASAN, A., "Scaling of Stack Effect and its Application for Leakage Reduction," *Proceedings of the International Symposium on Low Power Electronics and Design*, pp. 195–200, August 2001.
- [16] JOHNSON, M., SOMASEKHAR, D., CHIOU, L.-Y., and ROY, K., "Leakage Control with Efficient Use of Transistor Stacks in Single Threshold CMOS," *IEEE Transactions on VLSI Systems*, vol. 10, no. 1, pp. 1–5, February 2002.
- [17] NARENDRA, S., DE, V., BORKAR, S., ANTONIADIS, D. A., and CHANDRAKASAN, A. P., "Full-Chip Subthreshold Leakage Power Prediction and Reduction Techniques for Sub-0.18um CMOS," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 2, pp. 501–510, February 2004.
- [18] SHEU, B., SCHARFETTER, D., KO, P.-K., and JENG, M.-C., "BSIM: Berkeley short-channel IGFET model for MOS transistors," *IEEE Journal of Solid-State Circuits*, vol. 22, pp. 558–566, August 1987.
- [19] J.C. Park, V. J. Mooney III and P. Pfeiffenberger, "Sleepy Stack Reduction of Leakage Power," Proceeding of the International Workshop on Power and Timing Modeling, Optimization and Simulation, pp. 148-158, September 2004.
- [20] J. Park, "Sleepy Stack: a New Approach to Low Power VLSI and Memory," Ph.D. Dissertation, School of Electrical and Computer Engineering, Georgia Institute of Technology, 2005. [Online]. Available: http://etd.gatech.edu/theses
- [21] J. Kao and A. Chandrakasan, "MTCMOS sequential circuits," Proceedings of European Solid-State Circuits Conference, pp. 332-335, September 2001.
- [22] Se Hun Kim, V.J. Mooney, "Sleepy Keeper: a New Approach to Low-leakage Power VLSI Design," Proceeding of the 2006 IFIP International Conference on Very Large Scale Integration, pp. 367-372, Oct. 2006.
- [23] N. Karmakar, M. Z. Sadi, M. K. Alam and M. S. Islam, "A novel dual sleep approach to low leakage and area efficient VLSI design," *Proc. 2009 IEEE Regional Symposium on Micro and Nano Electronics (RSM2009)*, Kota Bharu, Malaysia, August 10-12, 2009, pp. 409-414.
- [24] M. S. Islam, M. Sultana Nasrin, Nuzhat Mansur, Naila Tasneem, "Dual Stack Method: A Novel Approach to Low Leakage and Speed Power Product VLSI Design," Proc. 2010 IEEE International Conference on Electrical and Computer Engineering (ICECE 2010), Dhaka, Bangladesh, December 18 – 20, 2010, pp. 89 – 92.
- [25] CHANG, J.-M. and PEDRAM, M., "Energy Minimization Using Multiple Supply Voltages," *IEEE Transactions on VLSI Systems*, vol. 5, no. 4, pp. 436–443, December 1997.
- [26] RAJE, S. and SARRAFZADEH, M., "Variable Voltage Scheduling," *Proceedings of the International Symposium on Low Power Electronics and Design*, pp. 9–14, August 1995.
- [27] JOHNSON, M. C. and ROY, K., "Datapath Scheduling with Multiple Supply Voltages and Level Converters," *ACM Transactions on Design Automation of Electronic Systems*, vol. 2, no. 3, pp. 227–248, July 1997.
- [28] USAMI, K. and HOROWITZ, M., "Clustered Voltage Scaling Technique for Low-Power Design," *Proceedings of the International Symposium on Low Power Electronics and Design*, pp. 3–8, April 1995.
- [29] USAMI, K., IGARASHI, M., MINAMI, F., ISHIKAWA, T., KANZAWA, M., ICHIDA, M., and NOGAMI, K., "Automated Low-Power Technique Exploiting Multiple Supply Voltages Applies to a Media Processor," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 3, pp. 463–472, March 1998.
- [30] CHANDRAKASAN, A. P., SHENG, S., and BRODERSEN, R. W., "Low-Power CMOS Digital Design," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 4, pp. 473–484, April 1992.
- [31] UYEMURA, J. P., CMOS Logic Circuit Design Second Edition. Norwell, Massachusetts USA: Kluwer Academic Publishers, 1999.
- [32] Synopsys Inc. [Online]. Available: http://www.synopsis.com.
- [33] http://www.eas.asu.edu/~ptm/
- [34] Nittaranjan Karmakar, Mohammed Khorshed Alam, Mehdi Zahid Sadi, "Dual Sleep Method: A Novel Approach to Low Leakage and Area Efficient VLSI Design," B.Sc. Dissertation, Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, 2009.
- [35] Nuzhat Mansur, Naila Tasneem, "Dual Stack Method: A Novel Approach to Low Leakage and Area Efficient VLSI Design," B.Sc. Dissertation, Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, 2009.

## **APPENDICES**

### APPENDIX A: DESIGN LAYOUTS FOR A CHAIN OF FOUR INVERTERS

A1: Layout of a chain of four inverters in sleepy stack method



A2: Layout of a chain of four inverters in dual sleep method



A3: Layout of a chain of four inverters in dual stack method



A4: Layout of a chain of four inverters in proposed method



### APPENDIX B: DESIGN LAYOUTS OF SRAM CELL

B1: Layout of SRAM cell in sleepy stack method



B2: Layout of SRAM cell in dual sleep method



B3: Layout of SRAM cell in dual stack method



B4: Layout of SRAM cell in proposed method



### APPENDIX C: SIMULATION DATA (A CHAIN OF FOUR INVERTERS)

#### C1: 130nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack       | 3.4859            | 19.2               | 77.45                        | 42.11      |
| Dual sleep         | 3.328             | 23.799             | 52.81                        | 21.18      |
| Dual stack         | 2.289             | 11.54              | 64.219                       | 19.25      |
| Proposed           | 0.987             | 11.36              | 86.44                        | 18.46      |

#### C2: 90nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack | 2.476        | 11.43         | 67.74         | 20.18       |
| Dual sleep   | 2.7981       | 14.023        | 41.823        | 10.15       |
| Dual stack   | 1.8389       | 6.811         | 45.629        | 9.226       |
| Proposed     | 0.843        | 6.45          | 63.36         | 8.85        |

#### C3: 65nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack | 1.6469       | 6.33          | 55.73         | 10.52       |
| Dual sleep   | 2.128        | 8.1733        | 36.43         | 5.29        |
| Dual stack   | 1.31         | 3.92          | 42.33         | 4.82        |
| Proposed     | 0.678        | 3.768         | 58.82         | 4.62        |

#### C4: 45nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack       | 0.9294            | 2.917              | 50.079                       | 5.04       |
| Dual sleep         | 1.4113            | 4.1342             | 31.395                       | 2.54       |
| Dual stack         | 0.765             | 1.933              | 55.14                        | 2.306      |
| Proposed           | 0.446             | 1.849              | 61.33                        | 2.21       |

#### C5: 32nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack       | 0.70529           | 1.425              | 52.81                        | 2.55       |
| Dual sleep         | 1.184             | 2.087              | 38.831                       | 1.28       |
| Dual stack         | 0.546             | 0.9695             | 69.79                        | 1.166      |
| Proposed           | 0.357             | 0.926              | 82.42                        | 1.12       |

### APPENDIX D: SIMULATION DATA (SRAM CELL)

#### D1: 130nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack | 2.751        | 45.51         | 193.19            | 25.66       |
| Dual sleep   | 3.31         | 41.63         | 265.09            | 17.94       |
| Dual stack   | 2.3549       | 18.95         | 273               | 57.54       |
| Proposed     | 1.017        | 19.45         | 382.79            | 29.48       |

#### D2: 90nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack | 1.972        | 29.12         | 195.29        | 12.3               |
| Dual sleep   | 2.801        | 26.96         | 267.9         | 8.6                |
| Dual stack   | 1.9824       | 11.38         | 243           | 27.57              |
| Proposed     | 0.858        | 11.42         | 360.8         | 14.13              |

#### D3: 65nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack | 1.332        | 18.65         | 201.6       | 6.416       |
| Dual sleep   | 2.169        | 16.72         | 263.4       | 4.48        |
| Dual stack   | 1.4909       | 7.452         | 258         | 14.4        |
| Proposed     | 0.694        | 7.01          | 358.5       | 7.37        |

#### D4: 45nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack | 0.771        | 9.715         | 210.29            | 3.075       |
| Dual sleep   | 1.503        | 8.744         | 259.2             | 2.15        |
| Dual stack   | 0.9701       | 3.8693        | 268               | 6.89        |
| Proposed     | 0.481        | 3.63          | 363.09            | 3.53        |

#### D5: 32nm technology

| Circuit techniques | Static power (nW) | Dynamic power (µW) | Propagation delay (ps) | Area (µm²) |
|--------------------|-------------------|--------------------|------------------------|------------|
| Sleepy stack | 0.603        | 5.18          | 223.59        | 1.555       |
| Dual sleep   | 1.308        | 4.647         | 274.6         | 1.087       |
| Dual stack   | 0.7803       | 1.9917        | 289           | 3.483       |
| Proposed     | 0.395        | 1.88          | 400.39        | 1.78        |