Supercond. Sci. Technol. 15 (2002) 1744–1748

# Design and operation of a rapid single flux quantum demultiplexer

Masaaki Maezawa, Motohiro Suzuki and Akira Shoji

National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan

E-mail: masaaki.maezawa@aist.go.jp

Received 24 June 2002, in final form 29 August 2002

Published 22 November 2002

Online at stacks.iop.org/SUST/15/1744

### Abstract

A demultiplexer (DMUX) is a key subsystem of rapid single flux quantum (RSFQ) circuits and systems in practical applications. High-speed data from RSFQ circuits should be converted into sufficiently low-speed data for transmission to, and processing by, room-temperature electronics. We designed, fabricated and successfully tested an RSFQ DMUX based on the synchronous shift-and-dump architecture, whose advantages are modularity and compactness. A 1-to-8 DMUX was implemented using our standard cell library in a 1.6 kA cm<sup>-2</sup> Nb trilayer technology. Fully functional operation was confirmed by low-speed testing. Experimental bias margins were as large as $\pm 16\%$. Results of average voltage measurements implied that the DMUX was operated at data rates up to 25 Gb s<sup>-1</sup>.

# 1. Introduction

Rapid single flux quantum (RSFQ) [1, 2] is a high-speed, low-power digital technology based on superconductivity and the Josephson effect. RSFQ is capable of simultaneously carrying out ultrafast digital processing and metrologically precise quantization of electric signals. This unique advantage makes RSFQ attractive as a supplement to future electronic systems that must meet the demand for increasing complexity and variety. For instance, high-performance analog-to-digital converters are expected to be one of the killer applications of RSFQ technology, and their excellent performance has been successfully demonstrated.

In practical use of RSFQ circuits, most systems will include external electronics consisting of ordinary semiconductor devices and circuits. An interface between the cryogenic RSFQ and room temperature electronics is of great importance. The RSFQ circuitry operates at internal clock frequencies of several tens of GHz or higher. Output data rates of the RSFQ circuit are too high to transmit to room temperature electronics at acceptably small bit error rates because the distance between low- and room-temperature stages is much longer than the wavelength of the data signals. Also, today's semiconductor digital circuits cannot directly process 10 GHz data. A practical way of transmitting data is to down-convert high-speed RSFQ data into sufficiently low-speed data. Thus, a demultiplexer (DMUX) [3–10] is a key subsystem for practical use of RSFQ circuits and systems.

In this paper we present the design, fabrication and test results of an RSFQ DMUX for applications such as an analog-to-digital converter. The DMUX was designed on the synchronous shift-and-dump architecture. A 1-to-8 DMUX was implemented with our standard cell library and fabricated using a 1.6 kA cm<sup>-2</sup> Nb trilayer technology. Fully functional operation was confirmed by low-speed testing. Results of average voltage measurements implied that the DMUX was operated at input data rates up to 25 Gb s<sup>-1</sup>.

# 2. Design

## 2.1. Architecture

**Table 1.** AIST standard library cells. $I_{\rm B}$ is the bias current at the nominal bias voltage of 2.5 mV. $\Delta I_{\rm B}$ is the operating margin of $I_{\rm B}$ obtained by simulation.

| Name   | Description                           | Width $\times$ Height $(\mu m \times \mu m)$ | $I_{\rm B}$ (mA) | $\Delta I_{\rm B}$ (%) | Number of JJs |
|--------|---------------------------------------|----------------------------------------------|------------------|------------------------|---------------|
| jtl50  | Straight JTL ($l = 50\ \mu m$)        | $50 \times 50$                               | 0.35             |                        | 2             |
| jtl75  | Straight JTL ($l = 75\ \mu m$)        | $75 \times 50$                               | 0.35             |                        | 2             |
| jtl100 | Straight JTL ($l = 100\ \mu m$)       | $100 \times 50$                              | 0.35             |                        | 2             |
| jtl.L  | L-shaped JTL                          | $50 \times 50$                               | 0.35             |                        | 2             |
| jtl.S  | S-shaped JTL                          | $25 \times 100$                              | 0.35             |                        | 2             |
| jtl.U  | U-shaped JTL                          | $25 \times 100$                              | 0.35             |                        | 2             |
| jtl.X  | Crossing JTLs                         | $50 \times 100$                              | 0.70             |                        | 4             |
| dc/sfq | DC-to-SFQ converter                   | $175 \times 50$                              | 0.36             | -50/+43                | 6             |
| sfq/dc | SFQ-to-DC converter                   | $175 \times 50$                              | 1.30             | -33/+43                | 12            |
| term   | Terminator                            | $50 \times 50$                               | 0.35             |                        | 2             |
| sp1    | Splitter                              | $50 \times 100$                              | 0.43             | -41/+41                | 3             |
| sp2    | Splitter                              | $50 \times 100$                              | 0.43             | -43/+41                | 3             |
| sp3    | Splitter                              | $50 \times 100$                              | 0.43             | -43/+41                | 3             |
| cb1    | Confluence buffer                     | $50 \times 100$                              | 0.63             | -44/+41                | 6             |
| cb3    | Confluence buffer                     | $50 \times 100$                              | 0.63             | -42/+41                | 6             |
| tff    | T flip-flop                           | $50 \times 100$                              | 0.53             | -37/+39                | 5             |
| dff    | D flip-flop                           | $50 \times 100$                              | 1.21             | -39/+39                | 8             |
| d2     | D2 cell                               | $100 \times 150$                             | 2.37             | -34/+37                | 20            |
| dffc   | D flip-flop with complementary outputs | $75 \times 100$                             | 1.27             | -31/+33                | 12            |

**Figure 1.** Block diagram of a shift-and-dump DMUX. Input data are loaded in the shift register by the shift signal S and then dumped to the outputs by the dump signal R.

Two types of RSFQ DMUX have been proposed and demonstrated so far. One is based on the binary tree architecture [4–10] and the other is based on the shift-and-dump architecture [3]. The binary tree type DMUX has been more widely investigated. Combined with an asynchronous timing scheme based on the dual-rail coding of SFQ data, high-throughput operation of a single cell of the binary tree DMUX has been reported [9]. However, the binary tree structure occupies a rather large area and consumes extra bias currents because Josephson transmission lines (JTLs) are used to interconnect the DMUX cells. In addition, if a synchronous timing scheme based on the RSFQ basic convention [1] is employed, which is useful and convenient for some applications such as analog-to-digital converters, timing design becomes more complicated with increasing demultiplexing factor *n*. On the other hand, the shift-and-dump architecture has the advantages of good modularity and compactness. The shift-and-dump DMUX consists of a clock control circuit and a shift register with parallel outputs (figure 1). Input data are stored in the shift register synchronized with the shift signal S, and then dumped to the parallel output ports by the dump signal R when the shift register becomes full. The role of the clock controller is to generate the shift and dump signals, S and R, from the clock. As shown in figure 1, the circuit structure is regular, resulting in easy timing design and compact layout even if the synchronous timing scheme is used. We selected the shift-and-dump architecture with the synchronous timing scheme because our target application is an analog-to-digital converter operating with a sampling clock pulse.
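The shift-and-dump behaviour described above can be illustrated with a minimal behavioural sketch. It is not a circuit model: the function name `shift_and_dump`, the ordering of data arrival relative to the clock, and the choice of 7 shift pulses followed by 1 dump pulse per 8-bit word (taken from the voltage ratios quoted in section 4) are assumptions made for illustration only.

```python
def shift_and_dump(bits, n=8):
    """Behavioural sketch of a 1-to-n shift-and-dump DMUX (illustrative only).

    Assumptions: each data bit is stored in the first cell before the
    corresponding clock pulse; the clock controller issues n-1 shift
    pulses (S) followed by one dump pulse (R) per n-bit word.
    Returns (list of n-bit output words, number of S pulses, number of R pulses).
    """
    cells = [0] * n          # cells[0] is the cell nearest the serial input
    words = []
    s_count = r_count = 0
    for k, bit in enumerate(bits):
        cells[0] = bit                      # store the incoming bit
        if (k + 1) % n:                     # shift pulse S
            cells = [0] + cells[:-1]        # move every bit one cell along
            s_count += 1
        else:                               # dump pulse R (register is full)
            words.append(cells[::-1])       # oldest bit first
            cells = [0] * n                 # destructive readout empties the cells
            r_count += 1
    return words, s_count, r_count


# Example: the first three words of the test pattern quoted in figure 5.
pattern = [int(c) for c in "111111110000111111110000"]
words, s_count, r_count = shift_and_dump(pattern)
```

Each returned word reproduces one 8-bit group of the serial input, and the S and R pulse counts stay in a 7:1 ratio.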

## 2.2. Implementation

The first RSFQ implementation of the shift-and-dump DMUX was reported by Kaplan and Mukhanov [3]. In this original design, the clock controller simply consisted of a cascade of toggle flip-flops (TFFs) and a unit cell of the shift register was a non-destructive readout cell (NDRO). The implementation was simple and compact, and the operating margins were sufficiently large. However, there are two drawbacks. (1) The critical path in the clock controller of a 1-to-*n* DMUX includes a cascade of $\log_2 n$ TFFs, where $n = 2, 4, 8, 16, \ldots$ is the demultiplexing factor. The critical path therefore becomes longer with increasing *n*, resulting in tight racing between S and R. (2) The NDRO in the shift register requires resetting after the dump operation, which is done by an additional shift signal before input of the next data set. This narrows the timing window and limits the throughput and margins.
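The growth of the critical path with *n* in a plain TFF cascade can be seen in a small behavioural sketch. The class `TFF` and the function `ripple_divider` are hypothetical names, and the model only counts how many TFFs a clock pulse ripples through; it is an assumption-laden illustration of a divide-by-*n* cascade, not the circuit of [3].

```python
class TFF:
    """Toggle flip-flop sketch: one output pulse for every second input pulse."""

    def __init__(self):
        self.state = 0

    def pulse(self):
        self.state ^= 1
        return self.state == 0   # emit an output when toggling back to 0


def ripple_divider(n, n_clock):
    """Cascade of log2(n) TFFs dividing the clock by n (n a power of two).

    Returns (clock indices at which the last TFF fires,
             worst-case number of TFFs a single clock pulse ripples through).
    """
    stages = [TFF() for _ in range(n.bit_length() - 1)]   # log2(n) stages
    fired, worst = [], 0
    for k in range(1, n_clock + 1):
        depth = 0
        for tff in stages:
            depth += 1
            if not tff.pulse():   # no carry: the pulse stops at this stage
                break
        else:
            fired.append(k)       # carry propagated through every stage
        worst = max(worst, depth)
    return fired, worst


fired, worst = ripple_divider(8, 16)
```

For `n = 8` the last stage fires on every eighth clock pulse and the worst-case ripple depth is three TFFs; for `n = 16` it is four, which is the $\log_2 n$ dependence noted above.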

Figure 2 shows our implementation of a 1-to-8 shift-and-dump DMUX. To improve the throughput and margins, we employed pipelining in the clock controller and a destructive readout cell for the shift register. The pipeline structure is implemented by inserting a D flip-flop (DFF) between TFFs. The critical path and racing are localized so that the throughput of the clock controller is independent of the demultiplexing factor *n*. The shift register consists of D2 cells [11], which are destructive readout cells with two readout terminals. The dump signal R empties the D2 cells so that the shift register is ready for data input immediately after the dump operation. Simulation predicts global bias current margins of over $\pm 25\%$ at 20 GHz. The 1-to-8 DMUX core consists of approximately 300 junctions, occupies a 1.2 mm $\times$ 0.25 mm area and consumes 0.1 mW of dc power at the nominal bias voltage of 2.5 mV.
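As a rough consistency check on the quoted power (simple arithmetic, not a value reported in the text), and assuming that all of the dc power is drawn through the bias network at the nominal bias voltage $V_{\rm B}$, the total bias current of the DMUX core would be approximately

$$
I_{\rm B,total} \approx \frac{P}{V_{\rm B}} = \frac{0.1\ \mathrm{mW}}{2.5\ \mathrm{mV}} = 40\ \mathrm{mA}.
$$

The symbol $I_{\rm B,total}$ is introduced here only for this estimate.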

## 2.3. Standard cell library

All the components of the DMUX in figure 2 are included in our standard cell library, which has been developed for an efficient design process. The standard cell library consists of RSFQ elementary cells that are frequently used to build large circuits (table 1). The library cells were designed on our 1.6 kA cm<sup>-2</sup> Nb junction technology described in the next section. For direct connection of the cells, input/output ports were positioned on a 50 $\mu$m grid and were made of 4 $\mu$m wide striplines in the top wiring layer. For the cell design we used the PSCAN, COWBOY and LMETER programs<sup>1</sup> on the CADENCE platform. All the cells were individually fabricated and tested, and the design and layout were experimentally confirmed.

<sup>1</sup> http://rsfq1.physics.sunysb.edu/RSFQ/software.html

**Figure 2.** Block diagram of a 1-to-8 RSFQ shift-and-dump DMUX. Symbols T, D, DFFC and D2 denote a toggle flip-flop (TFF), a D flip-flop (DFF), a D flip-flop with complementary outputs and a D flip-flop with two readout ports, respectively. Circuits in dashed boxes are added in order to estimate throughput by the average voltage technique.

**Figure 3.** Cross-section of AIST standard Nb process for RSFQ circuits.

**Table 2.** Summary of AIST standard process.

| Layer | Material (thickness)       | Function          |
|-------|----------------------------|-------------------|
| GP    | Al (6 nm)/Nb (200 nm)      | Ground plane      |
| I1    | SiO<sub>2</sub> (200 nm)   | Insulation layer  |
| BE    | Nb (100 nm)                | Base electrode    |
| BAR   | Al-AlO<sub>x</sub> (8 nm)  | Tunnel barrier    |
| CE    | Nb (125 nm)                | Counter electrode |
| I2    | SiO<sub>2</sub> (100 nm)   | Insulation layer  |
| R     | Ti (2 nm)/Pd (55 nm)       | Resistor          |
| I3    | SiO<sub>2</sub> (100 nm)   | Insulation layer  |
| W     | Nb (300 nm)                | Wiring            |

| Parameter                                 | Value                         |
|-------------------------------------------|-------------------------------|
| Critical current density                  | 1.6 kA cm<sup>-2</sup>        |
| Minimum junction size                     | 2.8 $\mu$m $\times$ 2.8 $\mu$m |
| Sheet resistance                          | 1.2 Ω                         |
| Dielectric constant of insulation layers  | 4                             |

# 3. Fabrication

**Figure 4.** Photograph of a 1-to-8 DMUX.

**Figure 5.** Operation of a 1-to-8 DMUX at $\sim$1 kHz clock frequency for a test pattern (11111111 00001111 11110000 1010101...).

We fabricated the circuit using our standard Nb trilayer technology. The target critical current density is 1.6 kA cm<sup>-2</sup>. The minimum size of the junction is 2.8 $\mu$m $\times$ 2.8 $\mu$m. Resistors are made of a Pd film with a sheet resistance of 1.2 $\Omega$. The chip size is 5 mm $\times$ 5 mm. Figure 3 shows a cross-section of the layout and table 2 summarizes the parameters.
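For orientation, the process parameters above translate into device values through two standard relations: critical current equals critical current density times junction area, and resistance equals sheet resistance times the number of squares. The sketch below applies them. The function names are hypothetical, the junction size is treated as the final (not drawn) dimension, and the 10 $\mu$m $\times$ 4 $\mu$m resistor is an invented example geometry. None of the resulting numbers are reported by the authors.

```python
def critical_current_uA(jc_kA_per_cm2, side_um):
    """Critical current (in uA) of a square junction with the given side (in um).

    Unit conversion: 1 kA/cm^2 = 10 uA/um^2.
    """
    return jc_kA_per_cm2 * 10.0 * side_um ** 2


def resistor_ohm(sheet_ohm_per_sq, length_um, width_um):
    """Resistance of a thin-film strip: sheet resistance times number of squares."""
    return sheet_ohm_per_sq * length_um / width_um


# Minimum-size junction at the target critical current density.
ic_min = critical_current_uA(1.6, 2.8)      # about 125 uA

# Hypothetical Pd resistor, 10 um long and 4 um wide (2.5 squares).
r_example = resistor_ohm(1.2, 10.0, 4.0)    # about 3 ohm
```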

The circuits were fabricated in a class 10 000 clean room. Nb and Al films were deposited by dc magnetron sputtering in a load-locked system including two deposition chambers for Nb and Al. The background pressures were kept below  $5 \times 10^{-5}$  Pa. Insulation SiO<sub>2</sub> layers were deposited by rf magnetron sputtering with a substrate bias voltage of -200 V for I1 (see figure 3 and table 2), and without a substrate bias voltage for I2 and I3. All lithography processes were done with an i-line (365 nm) 5-to-1 stepper.

The fabrication started with the deposition of an etch stop Al layer on a 75 mm diameter bare Si wafer, followed by the deposition of a 200 nm Nb ground plane (GP). The GP layer was patterned by reactive ion etching (RIE) with CF<sub>4</sub> gas at 20 Pa. The Al etch stop layer was removed by wet etching in a bath of nitric acid at 50 °C. A 200 nm SiO<sub>2</sub> layer (I1) was deposited with a substrate bias voltage of -200 V. The vias in the I1 layer were made by electron cyclotron resonance (ECR) plasma etching with CF<sub>4</sub> gas at 0.4 Pa. The Nb/Al-AlO<sub>x</sub>/Nb trilayer was fabricated as follows: a 100 nm Nb base electrode (BE) was deposited at a pressure of 1.2 Pa and a deposition rate of 86 nm min<sup>-1</sup>. The wafer was cooled for more than 20 min, and an 8 nm Al layer was then deposited. An AlO<sub>x</sub> barrier (BAR) was formed with an O<sub>2</sub> exposure of typically 1000 Pa min at a substrate temperature of 20 °C. A 125 nm Nb counter electrode (CE) was then deposited under the same conditions as the BE layer. After trilayer fabrication, the CE layer was defined by RIE with CF<sub>4</sub> at 13 Pa. The critical dimension loss for the CE layer was estimated to be 0.4 $\mu$m per side. The BAR layer was removed by wet etching with nitric acid and the BE layer was then defined by RIE with CF<sub>4</sub> at 20 Pa. After deposition of a 100 nm SiO<sub>2</sub> layer (I2), the resistor (R), consisting of a 55 nm Pd film with a 2 nm Ti adhesion layer, was fabricated by dc magnetron sputtering and a lift-off technique. A 100 nm SiO<sub>2</sub> layer (I3) was sputtered, and the vias in I2 and I3 were made by ECR etching with CF<sub>4</sub> at 0.4 Pa. Finally, a 300 nm Nb wiring layer (W) was deposited and then patterned by RIE with CF<sub>4</sub> at 20 Pa. Figure 4 shows a photograph of the 1-to-8 DMUX.

**Figure 6.** Maximum throughput estimated by the average voltage measurements as a function of the bias voltage for the DMUX core.

# 4. Testing

The 1-to-8 DMUX was tested at low speed (<1 kHz) using the superconductor circuit tester OCTOPUX [12]. The chip was mounted on a 60-pin probe with a double-layer  $\mu$ -metal shield and cooled with liquid helium at 4.2 K.

Figure 5 shows the result of low-speed functional tests. Fully functional operation was confirmed. Experimental bias margins of the DMUX core were as large as  $\pm 16\%$  at  $\sim 1$  kHz of clock frequency.

The DMUX has extra terminals, shown in dashed boxes in figure 2, for estimating throughput by average voltage measurements. Injection of a dc current $I_{\rm IN}$ into the junction $J_0$ generates a stream of input data (11111...) at a data rate of $V_{\rm IN}/(2\Phi_0)$, where $V_{\rm IN}$ is the dc voltage across $J_0$ and $\Phi_0$ is the flux quantum. During correct operation of the DMUX, the dc voltages at the terminals should be $V_S = 7V_{\rm IN}/16$, $V_Q = 0$ and $V_R = V_{\rm IN}/16$. Although these relationships are not sufficient to confirm correct operation, any deviation from them indicates an error in the operation. Figure 6 shows the maximum throughput as a function of the bias voltage for the DMUX core. The result suggests that the DMUX can be operated at input data rates up to 25 Gb s<sup>-1</sup>.
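The voltage relationships above can be turned into numbers with a short sketch. The function name `expected_voltages` is hypothetical, and the generalization to an arbitrary demultiplexing factor `n` is an extrapolation from the 1-to-8 case given in the text (it reduces to $7V_{\rm IN}/16$ and $V_{\rm IN}/16$ for `n = 8`). The value of the flux quantum is the standard $h/2e$.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum h/(2e), in Wb (= V s)


def expected_voltages(data_rate_hz, n=8):
    """Expected dc voltages (in volts) for an all-ones input stream.

    Uses data rate = V_IN / (2 * PHI0) from the text. The expressions
    for general n are an assumption that reproduces the quoted n = 8 values.
    """
    v_in = 2.0 * PHI0 * data_rate_hz
    return {
        "V_IN": v_in,
        "V_S": (n - 1) * v_in / (2 * n),   # 7 V_IN / 16 for n = 8
        "V_R": v_in / (2 * n),             # V_IN / 16 for n = 8
        "V_Q": 0.0,
    }


# Voltages corresponding to an input data rate of 25 Gb/s.
v = expected_voltages(25e9)
for name, value in v.items():
    print(f"{name} = {value * 1e6:.2f} uV")
```

At 25 Gb s<sup>-1</sup> this gives an input voltage of roughly 103 $\mu$V, so the measurement has to resolve the terminal voltages at the microvolt level.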

# 5. Conclusions

A demultiplexer (DMUX) is important for the use of RSFQ circuits and systems in most practical applications. We designed, fabricated and successfully tested an RSFQ DMUX based on the shift-and-dump architecture. Compared with the binary tree architecture, this architecture has advantages such as modularity and compactness, especially if the synchronous timing scheme is employed. The DMUX comprises a clock controller and a shift register with parallel outputs. To improve throughput and margins, we employed a pipeline structure for the clock controller and a destructive readout cell as the unit cell of the shift register. A 1-to-8 DMUX was implemented using our standard cell library based on a 1.6 kA cm<sup>-2</sup> Nb trilayer technology. Low-speed testing confirmed fully functional operation with bias margins as large as $\pm 16\%$. Results of average voltage measurements implied that the DMUX was operated at input data rates up to 25 Gb s<sup>-1</sup>.

### Acknowledgments

This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology through Special Coordination Funds for promoting Science and Technology. We are grateful to F Hirayama for his help in fabrication and testing.

### References

- [1] Likharev K K and Semenov V K 1991 IEEE Trans. Appl. Supercond. 1 3
- [2] Bunyk P I, Likharev K and Zinoviev D 2001 Int. J. High Speed Electron. Syst. 11 257 and references therein
- [3] Kaplan S B and Mukhanov O A 1995 IEEE Trans. Appl. Supercond. 5 2835
- [4] Kirichenko A F, Semenov V K, Kwon Y K and Nandakumar V 1995 IEEE Trans. Appl. Supercond. 5 2857
- [5] Deng J Z, Whiteley S R and Van Duzer T 1995 Extended Abstracts of ISEC95 p 189
- [6] Miller D L, Przybysz J X, Worsham A and Kang J 1997 IEEE Trans. Appl. Supercond. 7 2690
- [7] Maezawa M, Kameda Y, Kurosawa I and Nanya T 1997 IEEE Trans. Appl. Supercond. 7 2705
- [8] Yoshikawa N, Deng Z J, Whiteley S R and Van Duzer T 1997 Extended Abstracts of ISEC97 p 353
- [9] Kirichenko A F 1999 IEEE Trans. Appl. Supercond. 9 4046
- [10] Zheng L, Yoshikawa N, Deng J, Meng X, Whiteley S R and Van Duzer T 1999 IEEE Trans. Appl. Supercond. 9 3310
- [11] Zinoviev D Y and Likharev K K 1997 IEEE Trans. Appl. Supercond. 7 3155
- [12] Zinoviev D and Polyakov Y 1997 IEEE Trans. Appl. Supercond. 7 3240