# A Delay-Locked Loop using a Synthesizer-based Phase Shifter for 3.2 Gb/s Chip-to-Chip Communication

Chun-Ming Hsu, Charlotte Y. Lau, Michael H. Perrott
Microsystems Technology Laboratories, Massachusetts Institute of Technology, Cambridge, MA, USA
cmhsu@mit.edu, http://www-mtl.mit.edu/~perrott/

*Abstract*: A delay-locked loop in 0.18-µm CMOS for chip-to-chip communication at 3.2 Gb/s is presented. By leveraging the fractional-N synthesizer technique, this architecture provides fine-resolution, infinite-range delay and is less sensitive to process, temperature, and voltage variations than conventional techniques using a phase interpolator. A key element of the proposed structure is a digital Sigma-Delta modulator architecture that allows a high clock rate with compact area and reasonable power dissipation. The custom prototype IC operates at a 1.8-V supply voltage with a current consumption of 55 mA. The phase resolution is 1.4°, and the measured differential and single-ended rms clock jitter are 3.6 ps and 4.8 ps, respectively. The core circuits occupy 0.42 mm<sup>2</sup>.

## I. INTRODUCTION

Delay-locked loops (DLLs) using analog phase interpolators as phase shifters have become popular because they provide reasonable phase resolution with an infinite phase range [1]. These structures rely on interpolation circuits that are implemented in the analog domain and must be accurately controlled to maintain the linearity of the phase shifter. Analog DLLs constructed with such phase interpolators usually provide good jitter performance, but the relatively high analog complexity of these blocks complicates their design. In addition, the need for low-voltage operation in future CMOS processes will make such structures harder still to design. For low-voltage design, digital DLLs using digital phase interpolators can be implemented [2], but they usually provide only moderate phase resolution and jitter performance.

We propose to use a simple voltage-controlled oscillator (VCO) instead of a phase interpolator to achieve the phase shifting functionality within a DLL. By implementing the VCO as a standard ring oscillator, this approach offers a simple yet highly digital implementation that can achieve fine phase shifting and infinite phase range. By applying feedback to the VCO in the form of a fractional-N synthesizer, the phase resolution can be digitally controlled and is less sensitive to process, temperature, and voltage (PVT) variations than conventional structures based on phase interpolators.

In the following section, we discuss the proposed DLL architecture in detail. We then provide details of the circuit implementation, which includes a proposed second-order digital Sigma-Delta structure that allows high clock rates in a low-power, compact implementation. Finally, we present the measurement results and conclusions.

## II. PROPOSED DLL ARCHITECTURE

## A. Synthesizer-based Phase Shifter

We begin by discussing the application of a VCO as a phase shifter. As shown in Fig. 1, a VCO can be modeled as an integrator whose output is the VCO phase. The input voltage, $V_{ctrl}(t)$, is multiplied by the VCO gain, $K_v$, and integrated to become the phase, $\Phi_{out}(t)$. Thus, if a positive or negative rectangular pulse with a height of $\Delta V$ and a width of $T_p$ is fed into the VCO, the VCO phase increases or decreases by $2\pi \cdot K_v \cdot \Delta V \cdot T_p$ with each pulse. Through proper adjustment of these parameters, very fine phase resolution can be achieved with infinite phase range. Within a DLL application, the phase would be appropriately shifted to a desired value according to the control signal of the DLL.

Figure 1. Application of a VCO as a phase shifter.
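As a worked form of the integrator model above, the phase step produced by one rectangular pulse can be written out explicitly. The numerical example is only illustrative: it borrows the $K_v$ of 140 MHz/V reported in the measurement results, targets a step of $2\pi/256$, and assumes a hypothetical pulse width of $T_p = 1$ ns, which is not a value from the design.

$$
\Phi_{out}(t) = 2\pi K_v \int_0^{t} V_{ctrl}(\tau)\, d\tau
\quad\Longrightarrow\quad
\Delta\Phi = 2\pi K_v\, \Delta V\, T_p
$$

$$
\Delta\Phi = \frac{2\pi}{256}
\;\Rightarrow\;
\Delta V \cdot T_p = \frac{1}{256 \cdot 140\ \text{MHz/V}} \approx 2.79 \times 10^{-11}\ \text{V}\cdot\text{s},
\qquad
T_p = 1\ \text{ns} \;\Rightarrow\; \Delta V \approx 27.9\ \text{mV}
$$

The small product $\Delta V \cdot T_p$ suggests why setting the step accurately with analog quantities alone is difficult.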

When considering the VCO as a standalone element, it is quite difficult to accurately control $\Delta V$, $K_v$, and $T_p$ and to set the nominal oscillation frequency of the VCO such that it is locked to the received clock of the DLL. However, by placing the VCO within a Sigma-Delta fractional-N frequency synthesizer [3], we can control the VCO with digital precision. Fig. 2(a) illustrates this concept, with discrete-time impulses of value $\Delta f$ or $-\Delta f$ being fed into the Sigma-Delta modulator (SDM) input of a fractional-N synthesizer. This concept can be understood through the model of the synthesizer shown in Fig. 2(b) [3]. The input, $n_{sd}[k]$, is modulated, accumulated, and then filtered by the PLL loop filter to generate the desired $\Phi_{out}(t)$. Thus, whenever an impulse is fed into the synthesizer, $\Phi_{out}(t)$ increases or decreases by $2\pi\Delta f \cdot T$. A phase resolution of $2\pi/2^n$ can be achieved by simply setting the number of fractional bits in the SDM to $n$. Thus, the resolution can be accurately and finely controlled and is independent of PVT variations. For example, when an 8-bit SDM is used, the phase resolution is $360^{\circ}/2^8 \approx 1.4^{\circ}$. Compared to a phase interpolator, both the linearity and resolution of the proposed phase shifter are improved.
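The relationship between the number of fractional bits and the phase step can be checked with a short Python sketch. It is an idealized illustration rather than the authors' model: it assumes the PLL settles completely after every impulse, and the function names `phase_resolution_deg` and `apply_impulses` are hypothetical.

```python
from fractions import Fraction


def phase_resolution_deg(n_bits):
    """Phase step 2*pi / 2**n_bits, expressed in degrees."""
    return 360.0 / (1 << n_bits)


def apply_impulses(impulses, n_bits=8):
    """Accumulate +1 / -1 LSB impulses into an output phase.

    The phase is returned in units of VCO cycles (1 == 2*pi).
    Assumption: the loop has fully settled after each impulse, so the
    loop-filter dynamics are ignored.
    """
    lsb = Fraction(1, 1 << n_bits)
    phase = Fraction(0)
    trajectory = []
    for step in impulses:
        phase += step * lsb
        trajectory.append(phase)
    return trajectory


resolution = phase_resolution_deg(8)      # 360 / 256 degrees
full_turn = apply_impulses([1] * 256)[-1]  # 256 up-steps give one full cycle
```

Because the step is a ratio of integers, 256 identical steps return the phase to exactly one full cycle, which is the sense in which the range is unbounded.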

## B. DLL Architecture

Fig. 3 shows the proposed DLL architecture. The DLL adjusts the VCO phase such that the adjusted clock is aligned to the center of the received data, data(t). To do so, a bang-bang phase detector (BBPD) is used to control the phase setting of a fractional-N synthesizer. Aside from achieving fine resolution and infinite range in the phase adjustment, the proposed DLL structure also allows us to easily multiply the incoming clock without requiring a 50% duty cycle on that clock. Therefore, we can easily support double data rate (DDR) applications in which the incoming clock needs to be multiplied by two.

Figure 2. (a) Schematic of the synthesizer-based phase shifter. (b) Model of this phase shifter.

Figure 3. Proposed DLL architecture.

We now mention a few details of the proposed structure. First, the output clock frequency of the DLL is the input clock frequency multiplied by the ratio N/M; we have chosen N=6 and M=3 in the prototype, so that the input clock frequency is multiplied by two. Assuming a 1.6 GHz input clock, the output clock is then 3.2 GHz and the Sigma-Delta clock is 533 MHz. The bandwidth of the fractional-N PLL is set to approximately 4 MHz, and the rate of the pulses entering the SDM input, as shown in Fig. 2, is set to approximately $f_d = 1/T_d = 1$ MHz so that the PLL properly settles after each pulse is applied. The output of the BBPD is fed into a saturating integrator that allows its output to be averaged and converted from a three-level signal (1,0,-1) to a two-level signal (1,-1). The integrator output is then sampled by a D-flip-flop (DFF) with a period of $T_d$. The sampled signal is then fed into the input of the SDM.
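The conversion from three-level BBPD decisions to a two-level update sampled once per $T_d$ can be sketched behaviorally in Python. This is an assumption-laden illustration, not the circuit: the clamp value `limit`, the tie-break toward +1 when the state is zero, the absence of a reset, and the parameter `samples_per_update` are all hypothetical choices made for the example.

```python
def saturating_integrator(bbpd_outputs, limit=16):
    """Integrate BBPD decisions (+1, 0, -1), clamping the state to +/-limit."""
    state = 0
    states = []
    for decision in bbpd_outputs:
        state = max(-limit, min(limit, state + decision))
        states.append(state)
    return states


def sample_two_level(states, samples_per_update):
    """Limit the integrator state to its sign and sample it once per T_d.

    Assumption: a state of exactly zero resolves to +1.
    """
    sampled = []
    for i in range(samples_per_update - 1, len(states), samples_per_update):
        sampled.append(1 if states[i] >= 0 else -1)
    return sampled


decisions = [1, 0, 1, -1, -1, -1, -1, 0]
states = saturating_integrator(decisions)
updates = sample_two_level(states, samples_per_update=4)
```

For the decisions listed, the integrator state ends the first group of four positive and the second group negative, so the two sampled updates are +1 and then -1.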

## C. Sigma-Delta Modulator

As mentioned above, the SDM operates at a relatively high speed of 533 MHz, which could potentially lead to high power dissipation and circuit complexity (for instance, if pipelining were required). Instead of using a standard second-order SDM, we propose a more compact and power-saving second-order modulator architecture, as shown in Fig. 4. The central part of this modulator is a first-order SDM, whose signal transfer function (STF) and noise transfer function (NTF) are $Z^{-1}$ and $1-Z^{-1}$, respectively. A digital differentiator, whose transfer function is $1-Z^{-1}$, is then added to obtain a cascaded NTF of $(1-Z^{-1})^2$, which is equivalent to that of a second-order SDM. However, this results in a cascaded STF of $Z^{-1}(1-Z^{-1})$. This STF is undesirable, but it can be easily fixed by adding a digital accumulator, whose transfer function is $1/(1-Z^{-1})$, before the first-order SDM so that the overall cascaded STF becomes $Z^{-1}$. Thus, both the STF and NTF of the proposed second-order SDM are the same as those of a standard topology. The output of the proposed structure is a three-valued signal (1, 0, -1).
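The cascade described above can be exercised with a small behavioral sketch in Python. It makes several simplifying assumptions that are not taken from the paper: the first-order SDM is modeled as an accumulator whose carry-out is the 1-bit output (so the one-sample delay of the $Z^{-1}$ STF is omitted), the input `codes` stands for the output of the preceding accumulator, and everything runs at a single clock rate. The function names are hypothetical.

```python
def first_order_sdm(codes, n_bits=8):
    """First-order modulator as an n-bit accumulator; the carry is the output.

    Each code must lie in the range 0 .. 2**n_bits - 1.
    """
    modulus = 1 << n_bits
    residue = 0
    carries = []
    for code in codes:
        total = residue + code
        carries.append(total // modulus)  # 0 or 1
        residue = total % modulus
    return carries


def differentiator(samples):
    """Digital differentiator 1 - z^-1 with zero initial state."""
    previous = 0
    out = []
    for sample in samples:
        out.append(sample - previous)
        previous = sample
    return out


def running_sum(samples):
    """Discrete-time integration, as performed by the divider and VCO phase."""
    total = 0
    out = []
    for sample in samples:
        total += sample
        out.append(total)
    return out


codes = [64] * 256                      # constant phase code of 64/256 cycle
carries = first_order_sdm(codes)
output = differentiator(carries)        # three-level stream
recovered = running_sum(output)         # equals the carry stream again
```

With a constant code of 64, the carry stream averages 64/256 of a cycle, the differentiated output takes only the values 1, 0, and -1, and integrating that output reproduces the carry stream, so the average phase shift is set by the code.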

The advantage of the proposed SDM is not clear until one examines the impact of applying a multi-rate clock to the structure as shown in Fig. 4. To see how a multi-rate implementation can be applied, first notice that the input to the SDM is being updated at a rate of approximately $f_d = 1$ MHz while the output is being updated at $f_{ref} = 533$ MHz. Thus, the first-stage accumulator can be simplified to an up/down counter (UDC) that operates at a rate of approximately 1 MHz, whereas the third-stage differentiator must operate at 533 MHz. To connect these different sample rates, the second stage (first-order SDM) must be progressively clocked from low to high frequencies. We achieve this goal by cascading three first-order SDMs with different resolutions and clock rates. By using this approach, only a small portion of the overall SDM circuit operates at the highest frequency. Thus, the power consumption and design complexity are reduced at the expense of a slightly larger area.

Figure 4. Proposed SDM.

In designing the multi-rate, first-order SDM, the bit lengths of the lower clock frequency stages are chosen to be larger than those of the higher clock frequency stages in order to ensure that the total quantization noise is dominated by the last (highest frequency) stage. By gradually changing the clock rate through the structure, metastability and synchronization problems are avoided.

In order to accommodate overflows in the first-stage UDC, its up and down overflow signals are propagated and realigned by DFFs in each clock domain before being added to the output, as shown in Fig. 4. Interestingly, despite the addition of the overflow signals, the output remains at three levels (1,0,-1).

## III. CIRCUIT IMPLEMENTATION

As previously mentioned, the DLL prototype was designed for a 3.2 Gb/s DDR RAM application with an input clock frequency of 1.6 GHz. An 8-bit SDM was chosen to provide a phase resolution of $1.4^{\circ}$. The update clock for the input to the SDM, $f_{d}$, is generated by dividing $f_{ref}$ by 512; $f_{ref}$ is, in turn, generated from the VCO clock after being divided by N=6. The bandwidth of the PLL is chosen to be 4 MHz to jointly minimize the impact of VCO phase noise and SDM quantization noise. The behavior of this system was verified with the CppSim behavioral-level C++ simulator.
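The clock frequencies quoted for the prototype follow from the divide ratios, which the following Python sketch reproduces. The function name `clock_plan` and its dictionary keys are hypothetical; the numerical inputs (1.6 GHz input clock, N=6, M=3, division by 512, 8 fractional bits) are the values stated in the text.

```python
def clock_plan(f_in_hz=1.6e9, n_div=6, m_div=3, update_div=512, n_bits=8):
    """Derive the clock frequencies and phase step from the divide ratios."""
    f_out = f_in_hz * n_div / m_div   # output clock: input multiplied by N/M
    f_ref = f_out / n_div             # SDM clock: VCO clock divided by N
    f_d = f_ref / update_div          # update rate of the SDM input
    return {
        "f_out_hz": f_out,
        "f_ref_hz": f_ref,
        "f_d_hz": f_d,
        "phase_step_deg": 360.0 / (1 << n_bits),
        "pll_bw_to_update_ratio": 4e6 / f_d,  # 4 MHz loop bandwidth vs f_d
    }


plan = clock_plan()
for name, value in plan.items():
    print(f"{name}: {value:.6g}")
```

The resulting update rate is slightly above 1 MHz, roughly a quarter of the 4 MHz loop bandwidth, which is consistent with the stated goal of letting the PLL settle after each pulse.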

Fig. 5 shows a simplified schematic of the circuits excluding the SDM. In order to achieve a compact design, we use a ring oscillator similar to that proposed in [5]. A divider based on [6] is designed to provide a divide ratio from five to seven. The XOR phase-frequency detector (PFD) in [3] with current-mode logic is used. A differential-to-single-ended charge pump and an on-chip loop filter are used as shown in the figure. Since the $K_v$ of the VCO is more linear when its control voltage is lower, a source follower is inserted in the loop filter to shift the voltage level. A standard BBPD [4] is used and followed by two differential-to-single-ended converters. The integrator is composed of a current pump and a capacitor. An inverter following the integrator is used as a limiter.

Figure 5. Schematic of the DLL excluding the SDM.

As shown in Fig. 5, only simple analog circuits are required in the proposed DLL architecture, and none of them requires good matching between its elements. The overall architecture is primarily digital and well suited for more advanced CMOS processes. Area and power could be dramatically reduced with a more advanced process since full-swing logic could be used rather than current-mode logic in the divider, PFD, and BBPD. Current-mode logic was required in this case in order to meet the high-speed requirements of the design in 0.18-µm CMOS.

## IV. RESULTS

The prototype chip was fabricated in a 0.18-$\mu$m digital CMOS process. The die photograph is shown in Fig. 6, and its active area is 600 $\mu$m × 700 $\mu$m. It was packaged and mounted on a printed circuit board for measurement. The chip operates at 1.8 V, and the DLL draws 55 mA excluding the input and output buffers. The measured phase noise and $K_v$ of the VCO are $-118$ dBc/Hz at a 20 MHz offset and 140 MHz/V, respectively.
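For reference, the supply and area figures above can be converted to power and to the core area quoted in the abstract. The power value is derived here from the stated supply voltage and current and is not a separately reported measurement.

$$
P = 1.8\ \text{V} \times 55\ \text{mA} = 99\ \text{mW},
\qquad
A = 0.6\ \text{mm} \times 0.7\ \text{mm} = 0.42\ \text{mm}^2
$$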

The measured single-ended recovered clock and data jitter under different conditions are summarized in Table I. Setting the synthesizer in integer-N mode indicates the intrinsic jitter performance, whereas the synthesizer is set to fractional-N mode to test the actual DLL performance. Note that clock jitter increases when the data output driver is turned on due to the coupling between the clock and data output drivers through the shared bias circuits.

Figure 6. Chip microphotograph.

TABLE I. Measured single-ended RMS clock/data jitter (ps). Each entry is given as clock jitter/data jitter.

| Testing condition | 3.2 Gb/s, 2<sup>31</sup>-1 PRBS | 3.2 Gb/s, 2<sup>7</sup>-1 PRBS | 1.6 Gb/s, 2<sup>7</sup>-1 PRBS |
|---|---|---|---|
| Integer-N PLL with data output driver off | 3.4/- | 3.4/- | 3.1/- |
| Integer-N PLL with data output driver on | 4.3/30.1 | 4.1/30.2 | 4.3/4.7 |
| DLL in synchronous mode | 4.8/30.5 | 4.7/29.8 | 4.7/5.2 |
| DLL in asynchronous mode (frequency offset = 3 kHz) | 4.8/30.0 | 4.6/30.7 | 4.6/5.0 |

Fig. 7 illustrates the eye diagram of the recovered data and clock when the input data is a 3.2 Gb/s PRBS $2^{31}$-1 sequence, and reveals 4.8 ps single-ended clock jitter and 30 ps single-ended data jitter. A separate differential clock measurement reveals jitter less than 3.6 ps, which indicates that part of the 4.8 ps single-ended clock jitter is due to common-mode noise. The high data jitter is due to intersymbol interference, which is likely introduced by insufficient bandwidth in the BBPD and output buffer. Consistent with this explanation, Fig. 8 shows that the output data jitter is reduced to 5.2 ps with a 1.6 Gb/s PRBS $2^7$-1 input sequence. Note that the bit-error rate of the DLL is less than $10^{-12}$ in all of the measurements.

The DLL was also tested in asynchronous mode by introducing a frequency offset between the input data and clock. By doing so, the phase between the data and clock increases linearly so that the DLL must constantly rotate its output phase. As Table I reveals, the resulting jitter with a frequency offset of 3 kHz is very close to that obtained in synchronous mode, which implies very good linearity of the synthesizer-based phase shifter. The bit-error rate remains less than  $10^{-12}$  in these measurements.
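A rough upper bound on the frequency offset that such a loop can follow can be estimated from the update rate and the phase step. The Python sketch below is an inference under stated assumptions, not a result from the paper: it assumes the phase moves by exactly one step of 1/256 cycle per update at $f_d$, that every BBPD decision points in the correct direction, and that the frequency offset is referred to the 3.2 GHz output clock. The function name `max_tracking_offset_hz` is hypothetical.

```python
def max_tracking_offset_hz(f_update_hz, n_bits=8):
    """Largest phase rotation rate (cycles per second) the loop can follow.

    Assumption: one LSB of 1 / 2**n_bits cycle is applied per update.
    """
    return f_update_hz / (1 << n_bits)


f_d_hz = 3.2e9 / 6 / 512                 # update rate from the divide ratios
limit_hz = max_tracking_offset_hz(f_d_hz)
margin = limit_hz / 3e3                  # relative to the 3 kHz test offset
print(f"limit = {limit_hz:.0f} Hz, margin = {margin:.2f}x")
```

Under these assumptions the bound is slightly above 4 kHz, so the 3 kHz offset used in the measurement sits below it, which is consistent with the loop tracking the rotating phase without a jitter penalty.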

## V. CONCLUSIONS

A 3.2 Gb/s DLL in 0.18-$\mu$m CMOS for chip-to-chip communication was presented. By leveraging the fractional-N synthesizer technique, this architecture provides a digitally controlled phase adjustment with fine resolution and infinite range that is less sensitive to PVT variations than conventional techniques. The prototype operates at a 1.8-V supply voltage with a current consumption of 55 mA. The phase resolution and differential rms clock jitter are 1.4° and 3.6 ps, respectively.



Figure 7. Measured recovered data and clock when the input data is a 3.2 Gb/s PRBS 2<sup>31</sup>-1 sequence.



Figure 8. Measured recovered data and clock when the input data is a 1.6 Gb/s PRBS 2<sup>7</sup>-1 sequence.

## ACKNOWLEDGMENT

This work was funded by the MARCO Focus Center for Circuit & System Solutions (C2S2, www.c2s2.org) under contract 2003-CT-888. The authors also wish to thank National Semiconductor for chip fabrication, and Peter Holloway in particular for his help on this project.

## References

- [1] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for a 18 Mbit, 500 Megabytes/s DRAM," *IEEE J. Solid State Circuits*, vol. 29, pp. 1491-1496, Dec. 1994.
- [2] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE J. Solid State Circuits*, vol. 34, pp. 632-644, May 1999.
- [3] M. H. Perrott, M. D. Trott, and C. G. Sodini, "A modeling approach for sigma-delta fractional-N frequency synthesizers allowing straightforward noise analysis," *IEEE J. Solid State Circuits*, vol. 37, pp. 1028-1038, Aug. 2002.
- [4] J. Lee, K. S. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE J. Solid State Circuits*, vol. 39, pp. 1571-1580, Sep. 2004.
- [5] Y. A. Eken and J. P. Uyemura, "A 5.9-GHz voltage-controlled ring oscillator in 0.18-um CMOS," *IEEE J. Solid State Circuits*, vol. 39, pp. 230-233, Jan. 2004.
- [6] S. Vaucher, "A family of low-power truly modular programmable divider," *IEEE J. Solid State Circuits*, vol. 35, pp. 1039-1045, July 2000.