# A Low Dynamic Power and Low Leakage Power 90-nm CMOS Square-Root Circuit

Tadayoshi Enomoto and Nobuaki Kobayashi

Chuo University, Graduate School of Science and Engineering, Information and System Engineering Course, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-0881, Japan

**Abstract** - To drastically reduce the dynamic power ($P_{\rm AT}$) and the leakage power ($P_{\rm ST}$) of a CMOS square-root (SR) circuit while maintaining its speed, we developed a new algorithm, new architectures, and a new leakage reduction circuit. Using these techniques, a 90-nm CMOS LSI was fabricated. The $P_{\rm AT}$ and $P_{\rm ST}$ of the new SR circuit were reduced to about 1/4 and 1/33, respectively, of those of a conventional SR circuit. Measured results agreed well with simulated results.

## 1. INTRODUCTION

Low-power circuit techniques are needed for battery-driven portable systems. To reduce both the dynamic power ($P_{\rm AT} = GCf_{\rm c}V_{\rm DD}^2$) and the leakage power ($P_{\rm ST} = GI_{\rm L}V_{\rm DD}$) of CMOS circuits, where $C$ is the average load capacitance per logic gate, we must reduce the total number of logic gates ($G$), the supply voltage ($V_{\rm DD}$), and/or the leakage current ($I_{\rm L}$) of an individual logic gate while maintaining the required clock frequency ($f_{\rm c}$). Improving both algorithms and architectures can reduce $G$. Shortening the critical path, that is, decreasing the number of logic gates in the critical path ($G_{\rm c}$), allows $V_{\rm DD}$ to be lowered, because a shorter path still meets the same $f_{\rm c}$ with slower, lower-voltage gates. Lowering $V_{\rm DD}$ is also effective in lowering $I_{\rm L}$. Furthermore, to drastically decrease $I_{\rm L}$, we developed a special leakage current reduction circuit. To examine the effects of these low-power techniques on both $P_{\rm AT}$ and $P_{\rm ST}$, we applied them to a square-root circuit for uses such as computer graphics applications.

## 2. TECHNIQUE FOR LOWERING SUPPLY VOLTAGE

Let $Q = .q_1q_2\cdots q_m$ be the square root of $A = .a_1a_2\cdots a_{2m-1}a_{2m}$. The $m$th bit of the square root ($q_m$) is obtained as a carry signal when the $m$th remainder ($.R_m$) is calculated [1]. $R_m$ is obtained by

$$R_m = .R_{m-1}a_{2m-1}a_{2m} - .00\cdots0q_1q_2\cdots q_{m-2}q_{m-1}01 \tag{1}$$

when $q_{m-1}$ is 1, and as

$$R_m = .R_{m-1}a_{2m-1}a_{2m} + .00\cdots0q_1q_2\cdots q_{m-2}q_{m-1}11 \tag{2}$$

when $q_{m-1}$ is 0. For $m$ of 4, these two equations are carried out by the square-root (SR) circuit shown in Fig. 1 [1]. This conventional $2m$-bit SR circuit (C-SR) for $m$ of 4 can be constructed with a four-stage ripple-carry adder array that consists of 20 full adders with a subtraction function (CASs). Bold solid lines in Fig. 1 indicate the critical path. Including buffer inverters, the C-SR has $G$ of 189 gates and a critical path of $G_{\rm c}$ = 60 gates.
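Eqs. 1 and 2 form a non-restoring square-root recurrence: a negative remainder is not restored but corrected by an addition in the next step. The Python sketch below models that arithmetic behaviorally on integers, assuming the usual integer scaling in which appending the bit pair $a_{2m-1}a_{2m}$ to $R_{m-1}$ gives $4R_{m-1}$ plus the pair, and appending 01 or 11 to the partial root gives $4Q + 1$ or $4Q + 3$. The function name `sqrt_nonrestoring` is hypothetical, and the code models the recurrence only, not the CAS array of Fig. 1.

```python
import math


def sqrt_nonrestoring(a: int, m: int) -> int:
    """Integer square root of a 2m-bit value A via the recurrence of Eqs. (1)/(2).

    Integer scaling (assumed): appending the pair a_{2m-1}a_{2m} to R_{m-1}
    gives 4*R + pair; appending 01 or 11 to the partial root Q gives
    4*Q + 1 or 4*Q + 3.
    """
    r, q = 0, 0
    last_q = 1  # before the first step, treat the previous root bit as 1 (subtract)
    for i in range(m):
        pair = (a >> (2 * (m - 1 - i))) & 0b11  # next bit pair of A, MSB first
        if last_q == 1:
            r = 4 * r + pair - (4 * q + 1)  # Eq. (1): previous root bit is 1
        else:
            r = 4 * r + pair + (4 * q + 3)  # Eq. (2): previous root bit is 0
        last_q = 1 if r >= 0 else 0         # new root bit: 1 iff remainder >= 0
        q = 2 * q + last_q
    return q


if __name__ == "__main__":
    # Exhaustive check for m = 4 (all 8-bit inputs), the case of Fig. 1
    assert all(sqrt_nonrestoring(a, 4) == math.isqrt(a) for a in range(256))
    print(sqrt_nonrestoring(0b11000100, 4))  # A = 196, prints 14
```

The self-check compares every 8-bit input against Python's `math.isqrt`, which is the $m = 4$ case the C-SR implements.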

Replacing the CASs with CAS-1s and CAS-2s (Fig. 2) can drastically reduce $G_{\rm c}$: for $m$ of 4, $G$ and $G_{\rm c}$ of the SR circuit would be reduced to 179 and 40, respectively. To further reduce $G$ and $G_{\rm c}$, we modified Eq. 1 as

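$$R_m = .R_{m-1}a_{2m-1}a_{2m} + .11\cdots1q_{1,B}q_{2,B}\cdots q_{m-2,B}q_{m-1,B}11, \tag{3}$$

where $q_{1,B}$, $q_{2,B}$, and so on are the inverses of $q_1$, $q_2$, and so on. Eq. 3 follows from Eq. 1 because subtracting a binary number is equivalent to adding its two's complement: every bit is inverted, and adding 1 at the least significant position turns the inverted trailing 10 into 11. Furthermore, because Eq. 3 applies only when $q_{m-1}$ is 1, the leading 1s and $q_{m-1,B}$ in Eq. 3 can be replaced by $q_{m-1}$ and 0, respectively. Similarly, because Eq. 2 applies only when $q_{m-1}$ is 0, the leading 0s and $q_{m-1}$ in Eq. 2 can be replaced by $q_{m-1}$ and 0, respectively. Since $q_i \oplus q_{m-1}$ equals $q_{i,B}$ when $q_{m-1}$ is 1 and $q_i$ when $q_{m-1}$ is 0, both Eqs. 2 and 3 can then be expressed by the same equation:

$$R_m = .R_{m-1}a_{2m-1}a_{2m} + .q_{m-1}q_{m-1}\cdots q_{m-1}(q_1 \oplus q_{m-1})(q_2 \oplus q_{m-1})\cdots(q_{m-2} \oplus q_{m-1})011. \tag{4}$$

For $m$ of 4, $G$ and $G_{\rm c}$ of an SR circuit (not shown) based on Eq. 4 were greatly reduced to 128 and 32, respectively.

Fig. 1. A conventional square-root circuit (C-SR) for *m* of 4.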
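To check that Eq. 4 reproduces Eqs. 1 and 2 using additions only, the sketch below builds the Eq. 4 addend bit by bit (sign extension by the previous root bit, then each $q_i \oplus q_{m-1}$, then 11) and feeds it to a fixed-width wrap-around adder, reading each new root bit from the sign of the result. The word width of $m + 3$ bits, the use of the sign bit in place of the carry signal, and the names `eq4_addend` and `sqrt_eq4` are assumptions of this illustration, not details of the N-SR hardware.

```python
import math


def eq4_addend(q: int, k: int, s: int, width: int) -> int:
    """Addend word of Eq. (4) as a width-bit pattern.

    q: partial root Q = q_1...q_k (k bits); s: previous root bit q_{m-1}.
    Layout: s repeated (sign extension), q_j XOR s for each bit of Q, then '11'.
    XORing the last bit of Q with itself yields the '0' in front of '11'.
    """
    word = 0
    for j in range(k - 1, -1, -1):               # q_1 (MSB) first
        word = (word << 1) | (((q >> j) & 1) ^ s)
    word = (word << 2) | 0b11                      # trailing "11"
    if s:                                          # leading bits equal to q_{m-1} = 1
        word |= ((1 << width) - 1) & ~((1 << (k + 2)) - 1)
    return word


def sqrt_eq4(a: int, m: int) -> int:
    """Square root of a 2m-bit A using only additions of the Eq. (4) addend."""
    width = m + 3                                  # assumed; every |R| < 2**(m + 1)
    mask = (1 << width) - 1
    r, q, s = 0, 0, 1                              # s = 1: first step subtracts, as in Eq. (1)
    for k in range(m):
        pair = (a >> (2 * (m - 1 - k))) & 0b11
        r = (((r << 2) | pair) + eq4_addend(q, k, s, width)) & mask
        s = 1 - (r >> (width - 1))                 # root bit is 1 when the sign bit is 0
        q = (q << 1) | s
    return q


if __name__ == "__main__":
    # The XOR-built addend equals -(4Q+1) mod 2**w when s = 1 and 4Q+3 when s = 0
    w = 10
    for k in range(1, 6):
        for q in range(1 << k):
            s = q & 1                              # s is the last bit of Q
            expected = (-(4 * q + 1)) % (1 << w) if s else 4 * q + 3
            assert eq4_addend(q, k, s, w) == expected
    for m in range(1, 7):
        assert all(sqrt_eq4(a, m) == math.isqrt(a) for a in range(4 ** m))
    print("Eq. 4 addition-only recurrence matches math.isqrt")
```

The first loop confirms the algebra of Eqs. 2 to 4 (one XOR-built word covers both the add and subtract cases); the second confirms that the resulting addition-only recurrence still yields the integer square root.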

Because Eq. 4 is computed mostly by additions, most of the complicated full adders with subtraction functions (CAS-1 and CAS-2) can be replaced by either smaller full adders (FA, FA-1, FA-2) or simple half adders (HA-1, HA-2). This significantly simplifies the SR circuit (Fig. 2). $G$ and $G_{\rm c}$ of the new SR circuit (N-SR) were reduced to 95 and 30, respectively, which are about 50% of those of C-SR.

At a $V_{\rm DD}$ of 1.0 V, the simulated maximum operating clock frequency ($f_{\rm c}$) of N-SR was 946 MHz, 1.66 times that of C-SR (570 MHz). This large improvement in $f_{\rm c}$ is due to the considerable reduction in $G_{\rm c}$. The simulated $P_{\rm AT}$ values of C-SR and N-SR for $m$ of 4 at an $f_{\rm c}$ of 570 MHz are plotted as solid lines in Fig. 3. Between 0.5 V and 1.5 V, $P_{\rm AT}$ of N-SR is less than 50% of that of C-SR. At 0.77 V, the minimum $V_{\rm DD}$ that ensures 570-MHz operation, $P_{\rm AT}$ of N-SR is 131 μW, which is 27.1% of that of C-SR (484 μW at 1 V and 570 MHz). The simulated $P_{\rm ST}$ values of C-SR and N-SR for $m$ of 4 are plotted as solid lines in Fig. 4. $P_{\rm ST}$ of N-SR at 0.77 V is 276 nW, which is less than 1/4 of that of C-SR (1,147 nW). Table 1 summarizes the characteristics of C-SR and N-SR.
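A first-order check of the dynamic-power figures: with $P_{\rm AT} = GCf_{\rm c}V_{\rm DD}^2$ at a fixed $f_{\rm c}$ of 570 MHz, the N-SR/C-SR ratio reduces to $(G_{\rm N}/G_{\rm C})(V_{\rm N}/V_{\rm C})^2$. The sketch assumes, beyond what the paper states, that $C$ per gate is equal in both circuits; the function name `dynamic_power_ratio` is hypothetical.

```python
def dynamic_power_ratio(g_new: int, g_old: int, vdd_new: float, vdd_old: float) -> float:
    """P_AT(new)/P_AT(old) from P_AT = G*C*f_c*V_DD**2 at equal f_c.

    Assumes (not stated in the paper) that C per gate is the same in both circuits.
    """
    return (g_new / g_old) * (vdd_new / vdd_old) ** 2


estimate = dynamic_power_ratio(g_new=95, g_old=189, vdd_new=0.77, vdd_old=1.0)
reported = 131 / 484  # simulated N-SR/C-SR P_AT ratio quoted in the text
print(f"first-order estimate: {estimate:.1%}, simulated: {reported:.1%}")
```

The estimate is about 29.8%, close to the simulated 27.1%. Any remaining gap could reflect per-gate capacitance or switching-activity differences between the two circuits, which this model ignores.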

## 3. LEAKAGE CURRENT REDUCTION CIRCUIT

To further reduce $P_{\rm ST}$, we developed a leakage current reduction circuit called a "self-controllable-voltage-level (SVL)" circuit (Fig. 5). N-SR incorporating the SVL circuits is called N-SR-S. When N-SR is active (CL = 1, CLB = 0), the upper SVL circuit (U-SVL) and the lower SVL circuit (L-SVL) supply a maximum $V_{\rm D}$ (= $V_{\rm DD}$) and a minimum $V_{\rm S}$ (= $V_{\rm SS}$ = 0 V), respectively. When N-SR is in stand-by (CL = 0, CLB = 1), the U-SVL and L-SVL instead supply a decreased $V_{\rm D}$ (< $V_{\rm DD}$) and an increased $V_{\rm S}$ (> 0 V), respectively.

The SVL circuits simultaneously reduce the drain-to-source voltage ($V_{\rm ds}$) and increase the substrate bias ($V_{\rm sub}$) of cut-off MOSFETs, which decreases their sub-threshold currents [2]. The SVL circuits also reduce the gate-to-drain electric fields of cut-off MOSFETs and the gate-to-source electric fields of turned-on MOSFETs; they therefore reduce not only gate-induced drain leakage (GIDL) currents in the cut-off MOSFETs [3] but also gate tunneling currents in the turned-on MOSFETs. Consequently, $P_{\rm ST}$ of the SR circuit is considerably reduced.
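The sub-threshold part of this argument can be illustrated with a textbook first-order MOSFET model in which the threshold voltage rises with source-to-substrate bias (body effect) and falls with $V_{\rm ds}$ (drain-induced barrier lowering, DIBL). This sketch is not the authors' analysis: apart from the 0.22-V n-MOSFET $V_{\rm th}$ reported in Section 4, every parameter, including the stand-by levels $V_{\rm D}$ = 0.9 V and $V_{\rm S}$ = 0.1 V and the assumed bias configuration, is a hypothetical placeholder, and GIDL and gate tunneling currents are not modeled.

```python
import math

# Hypothetical device parameters (NOT from the paper), except VTH0 from Section 4.
V_THERMAL = 0.026   # kT/q near 300 K [V]
N_FACTOR = 1.5      # subthreshold swing factor (assumed)
VTH0 = 0.22         # n-MOSFET threshold voltage reported in Section 4 [V]
GAMMA = 0.2         # body-effect coefficient [V^0.5] (assumed)
PHI2 = 0.8          # 2*phi_F surface potential term [V] (assumed)
ETA_DIBL = 0.1      # DIBL coefficient (assumed)


def threshold(v_sb: float, v_ds: float) -> float:
    """Threshold voltage raised by body effect (V_sb) and lowered by DIBL (V_ds)."""
    body = GAMMA * (math.sqrt(PHI2 + v_sb) - math.sqrt(PHI2))
    return VTH0 + body - ETA_DIBL * v_ds


def subthreshold_current(v_gs: float, v_ds: float, v_sb: float) -> float:
    """Subthreshold drain current in arbitrary units (prefactor I0 = 1)."""
    vth = threshold(v_sb, v_ds)
    return math.exp((v_gs - vth) / (N_FACTOR * V_THERMAL)) * (1.0 - math.exp(-v_ds / V_THERMAL))


def off_nmos_leakage(v_d: float, v_s: float) -> float:
    """Cut-off n-MOSFET with gate and source at V_S, drain at V_D and
    p-substrate at 0 V (assumed stand-by bias configuration)."""
    return subthreshold_current(v_gs=0.0, v_ds=v_d - v_s, v_sb=v_s)


active = off_nmos_leakage(v_d=1.0, v_s=0.0)    # SVL passes V_DD and 0 V
standby = off_nmos_leakage(v_d=0.9, v_s=0.1)   # hypothetical SVL stand-by levels
print(f"stand-by / active subthreshold leakage = {standby / active:.2f}")
```

Both mechanisms named in the text appear separately: lowering $V_{\rm D}$ alone reduces the current through DIBL, and raising $V_{\rm S}$ additionally adds body bias. The printed ratio depends entirely on the placeholder parameters.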

At 1.0 V, the maximum $f_{\rm c}$ of N-SR-S was 918 MHz, 3% lower than that of N-SR (946 MHz). The simulated $P_{\rm AT}$ of N-SR-S for $m$ of 4 at 570 MHz is plotted in Fig. 3. At 0.78 V, $P_{\rm AT}$ was reduced to 132 μW, which is 27.3% of that of C-SR. The simulated $P_{\rm ST}$ of N-SR-S is plotted in Fig. 4. $P_{\rm ST}$ of N-SR-S at 0.78 V is 34 nW, only 3% of that of C-SR and 12% of that of N-SR. The SVL circuit is thus very effective in reducing $P_{\rm ST}$, while its speed overhead (3%) is small.

| SR circuits                                          | C-SR            | N-SR<br>(N-SR/C-SR) | N-SR-S<br>(N-SR-S/C-SR) |
|------------------------------------------------------|-----------------|---------------------|-------------------------|
| No. of logic gates, $G$                              | 189<br>(100%)   | 95<br>(50.3%)       | 97<br>(51.3%)           |
| No. of logic gates in critical path, $G_{\rm c}$     | 60<br>(100%)    | 30<br>(50.0%)       | 30<br>(50.0%)           |
| Supply voltage $V_{\rm DD}$ [V] \*                   | 1.0             | 0.77                | 0.78                    |
| Dynamic power $P_{\rm AT}$ [μW] \*\*                 | 484<br>(100%)   | 131<br>(27.1%)      | 132<br>(27.3%)          |
| Leakage power $P_{\rm ST}$ [nW]                      | 1,147<br>(100%) | 276<br>(24%)        | 34<br>(3%)              |

Table 1. Characteristics of C-SR, N-SR and N-SR-S.

\* Minimum $V_{\rm DD}$ that ensures 570-MHz operation.

\*\* $P_{\rm AT}$ evaluated at an $f_{\rm c}$ of 570 MHz.
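The percentages in Table 1 and in the text follow directly from the absolute values. The short script below recomputes them as an arithmetic check; the dictionary layout and names are only for illustration.

```python
# Absolute values from Table 1: (C-SR, N-SR, N-SR-S)
table = {
    "G": (189, 95, 97),
    "G_c": (60, 30, 30),
    "P_AT [uW]": (484, 131, 132),
    "P_ST [nW]": (1147, 276, 34),
}

ratios = {}
for name, (c_sr, n_sr, n_sr_s) in table.items():
    ratios[name] = (n_sr / c_sr, n_sr_s / c_sr)
    print(f"{name:10s} N-SR/C-SR = {n_sr / c_sr:6.1%}  N-SR-S/C-SR = {n_sr_s / c_sr:6.1%}")

# P_ST of N-SR-S relative to N-SR, and the "about 1/33" reduction from the abstract
print(f"P_ST N-SR-S/N-SR = {34 / 276:.1%}, C-SR/N-SR-S = {1147 / 34:.1f}x")
```

The recomputed ratios match the table (for example, 34/1147 is about 3.0%, and 1147/34 is about 33.7, the "about 1/33" of the abstract).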

## 4. LSI FABRICATION AND EXPERIMENTAL RESULTS

C-SR, N-SR and N-SR-S were fabricated for $m$ of 4, as shown in Fig. 6, using a 90-nm CMOS process with six Cu wiring layers. The threshold voltage ($V_{\rm th}$) was 0.22 V for the n-MOSFETs and -0.24 V for the p-MOSFETs. The measured $P_{\rm AT}$ and $P_{\rm ST}$ values of the three circuits are plotted in Figs. 3 and 4, respectively, and agree well with the SPICE simulation results.

## 5. SUMMARY

We have developed a new SR algorithm, compact circuit architectures, and a leakage current reduction circuit to reduce $P_{\rm AT}$ and $P_{\rm ST}$ while maintaining operating speed. At the same 570-MHz clock frequency, these techniques reduced $P_{\rm AT}$ to about 1/4 and $P_{\rm ST}$ to 3% of those of the conventional circuit. These power reduction techniques are therefore expected to play a major role in the future development of sub-100-nm CMOS circuits.

Acknowledgment - The authors wish to thank their colleagues at the Institute of Science and Engineering, Chuo University, for their support of this work. The VLSI chips used in this study were fabricated through the VLSI Design and Education Center (VDEC) of the University of Tokyo in collaboration with STARC and ASPLA.

## References

- [1] K. Hwang, "Computer Arithmetic: Principles, Architecture, and Design," John Wiley & Sons, Inc., Section 11.2, 1979.
- [2] H. J. M. Veendrick, "Deep-Submicron CMOS ICs," Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 73-75, 1998.
- [3] M. Rosar, B. Leroy and G. Schweeger, "A New Model for Description of Gate Voltage and Temperature Dependence of Gate Induced Drain Leakage (GIDL) in the Low Electric Field Region," IEEE Trans. on Electron Devices, vol. 47, no. 1, pp. 154-159, January 2000.