# A Fast-Locking All-Digital Deskew Buffer with DCC using Digital-Controlled Delay Line

## A. Ashwini<sup>1</sup>, H. Shravan Kumar<sup>2</sup>

<sup>1</sup>PG Student [DECS], Department of ECE, Vardhaman College of Engineering, Hyderabad, India

<sup>2</sup>Assistant Professor, Department of ECE, Vardhaman College of Engineering, Hyderabad, India

Abstract: This paper presents a wide-range, fast-locking all-digital deskew buffer using a digital-controlled delay line, which achieves low jitter, fast lock, low power consumption and 50% duty cycle correction. A cyclic time-to-digital converter is introduced to decrease the locking time of the conventional register-controlled delay-locked loop. A balanced edge combiner that produces an output clock with 50% duty cycle is also presented. A circuit is designed in 0.18 µm technology to demonstrate the feasibility of the proposed architecture with a better figure of merit. The circuit accepts input clock rates from 250 MHz to 1 GHz and generates output clocks with close to 50% duty cycle, low jitter and low phase noise. It retains closed-loop operation with low power consumption.

**Keywords:** Delay-Locked Loop (DLL), Digital-Controlled Delay Line (DCDL), Duty Cycle Correction (DCC), Edge Combiner, Time-to-Digital Converter (TDC)

# 1. Introduction

With the rapid advances in semiconductor technologies, digital systems operating at several hundred megahertz have been in successful use for many years. As more and more IC modules are integrated on the same printed-circuit board (PCB), clock skew becomes significant and is one of the bottlenecks for high-performance systems. The clock-skew problem arises in several different situations. For example, the input clock driver in any chip introduces an uncertain delay between the internal clock and the external clock. Thus, the internal clocks in a multi-chip system become asynchronous, and problems occur when data must be transferred between chips. This phenomenon becomes more serious for chips operating at gigahertz rates. In addition, on a large PCB, the trace lengths between the clock generator and the different chips may differ from each other, so clock skew inevitably exists. Similar problems also arise in a multiboard system in which different boards are connected through cables. In short, the clock-skew problem comes from different propagation delays of system clocks on the board or in a chip, and usually depends on process, voltage, temperature, and loading (PVTL), which makes it a complicated issue.

Reducing the clock skew not only allows a higher system clock frequency but also avoids system malfunction. Phase-locked loops (PLLs) and delay-locked loops (DLLs) have been widely used to suppress clock skew in high-speed memory interface circuits and microprocessors. Compared with conventional techniques using PLLs, DLLs are superior in deskew buffer design due to several merits, including lower cost, less jitter, lower power consumption and better loop stability. DLLs can be classified into analog and digital types. Even though the analog DLL scheme [1], [2] can achieve better jitter minimization and skew cancellation, it has the drawbacks of long locking time, process dependence and a lengthy design cycle. The digital DLL, in contrast, can speed up the locking procedure with the aid of digital circuit techniques [3]–[12]. It is also portable across technology migrations and well suited to low-power and low-voltage designs.

In this paper, an all-digital deskew buffer using a digital-controlled delay line is proposed that preserves the capability of closed-loop operation. To achieve fast locking, a cyclic time-to-digital converter (CTDC) is introduced for deriving the coarse setting of the control codes required for duty-cycle correction. To correct the duty cycle and cancel the clock skew in the same circuit, a quad-state controller is also presented for determining the operating sequence. A balanced edge combiner with equal delay in the setting and resetting signal paths is proposed to minimize the output duty-cycle distortion.

# 2. Proposed Architecture

The simplified architecture of the proposed DLL with DCC is drawn in Fig. 1. It applies the digital DCC technique with an edge combiner to generate an output clock with approximately 50% duty cycle. To meet the fast-locking requirement, a new cyclic time-to-digital converter (CTDC) is employed to obtain the coarse setting of the control codes (MSBs in CC1 and CC2) for the two half delay lines (HDLs). Two phase detectors provide lead-lag information during the DCC and deskew operations. The quad-state controller determines the operating sequence so as to derive the fine setting of the control codes (LSBs in CC1 and CC2) by adjusting the values of the shift registers and the latch array inside.



Figure 1: Simplified architecture of the proposed deskew buffer with DCC

# 3. Circuit Implementation

## 3.1 Digital-Controlled Delay Line (DCDL)

Fig. 2(a) shows the entire binary-weighted digital-controlled delay line. The delay line is divided into two sections, a coarse section and a fine section. The coarse section consists of 4 delay stages, and each stage provides 8 unit delays. Delay stages with a longer delay time suffer from larger variations in the clock duty cycle, especially at low supply voltages. To avoid such variations, more stages with equal delay time are cascaded in the design at the cost of slightly increased chip area.

The schematic of each delay stage is shown in Fig. 2(b). An important goal in the design of the delay stages is to preserve the 50% clock duty cycle. Usually, the propagation delay through a delay stage for a HIGH-to-LOW transition is not equal to that for a LOW-to-HIGH transition. This mismatch inevitably results in undesired pulse shrinkage and duty cycle distortion of the output clock.
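To make the pulse-shrinkage effect concrete, the following minimal sketch estimates how a mismatch between the two transition delays accumulates over cascaded non-inverting stages. The function name, the stage count and all delay values are illustrative assumptions, not measurements from the design.

```python
def output_duty_cycle(period_ps, duty_in, t_plh_ps, t_phl_ps, stages):
    """Duty cycle after `stages` identical non-inverting delay stages.

    Each stage delays the rising edge by t_plh_ps and the falling edge by
    t_phl_ps, so the high pulse width changes by (t_phl_ps - t_plh_ps)
    per stage.
    """
    high_in = duty_in * period_ps
    high_out = high_in + stages * (t_phl_ps - t_plh_ps)
    # Clamp to the physically meaningful range [0, period].
    high_out = max(0.0, min(float(period_ps), high_out))
    return high_out / period_ps


# Hypothetical numbers: 1 GHz clock, 50% input duty, 5 ps mismatch per stage.
duty = output_duty_cycle(1000.0, 0.5, t_plh_ps=60.0, t_phl_ps=55.0, stages=8)
print(f"output duty cycle: {duty:.3f}")
```

With matched transition delays the function returns the input duty cycle unchanged, which is the property the delay-stage design aims for.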



Figure 2: (a) Binary-weighted digital-controlled delay line



Figure 2: (b) Schematic of single delay stage employing interpolation

In order to achieve high resolution in a low-voltage digital DLL, the unit delay is obtained from the difference between the delay times of the two signal paths in Fig. 2(b). When the control signal is 0, the upper path is selected, and vice versa. The two paths are identical except for the added capacitors in the lower path. Instead of using larger MOS capacitors, MOSFETs of the same size are combined in parallel to obtain better linearity. Fig. 3 shows the waveforms of a single delay element for the fast and slow paths. The larger the offset delay is, the longer the loop delay becomes. T<sub>delay</sub> can then be written as

$$T_{delay} = T_{constant} + T_{excess} = T_{constant} + n \cdot T_{unit}$$

where T<sub>constant</sub>, T<sub>excess</sub>, T<sub>unit</sub> and n denote the constant delay, the excess delay, the unit delay, and the control word of the delay line, respectively.
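As a small worked example of the delay expression, the sketch below evaluates T<sub>delay</sub> for a given control word and searches for the smallest control word that reaches a target delay. The helper names and the numeric values (in picoseconds) are assumptions chosen for illustration only.

```python
def delay_line_delay(t_constant_ps, t_unit_ps, n):
    """T_delay = T_constant + n * T_unit for a non-negative control word n."""
    if n < 0:
        raise ValueError("control word must be non-negative")
    return t_constant_ps + n * t_unit_ps


def control_word_for(target_ps, t_constant_ps, t_unit_ps, n_max):
    """Smallest control word whose delay reaches target_ps, clipped to n_max."""
    excess = max(0, target_ps - t_constant_ps)
    n = -(-excess // t_unit_ps)  # ceiling division on integers
    return min(n, n_max)


# Hypothetical values: 200 ps constant delay, 20 ps unit delay.
n = control_word_for(500, 200, 20, n_max=63)
print(n, delay_line_delay(200, 20, n))
```

The constant term cannot be tuned away, so a target below T<sub>constant</sub> simply maps to the minimum control word.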



Figure 3: Transient analysis of single digital delay element

## 3.2 Cyclic Time-to-Digital Converter

Figure 4: (a) Cyclic time-to-digital converter

Volume 4 Issue 7, July 2015

In order to obtain the coarse setting of the control codes of all HDLs quickly, the CTDC shown in Fig. 4(a) is proposed to achieve fast conversion. Its architecture is simply composed of a period generator and a gated oscillator. As shown in Fig. 4(b), the period generator consists of two DFFs and one NOR gate. When the *Start* signal is activated, the period generator detects the first and second rising edges of CLK<sub>IN</sub> and produces a one-shot signal, *Period*, with its pulse width equal to one cycle time of the input clock, CLK<sub>IN</sub>. The *Period* signal serves as the enable signal of the succeeding part, the gated oscillator, which emits pulses that trigger the shift registers inside the quad-state controller, discussed in Section 3.5. The number of pulses is counted by the shifting function of the shift registers. The circuit diagram of the gated oscillator is depicted in Fig. 4(c). Before the gated oscillator is activated, the *Start* signal ensures that the generated clock, CLK<sub>TDC</sub>, is in the low state. On receiving the notification from the period generator, i.e., the rising edge of the *Period* signal, the output of the DFF is triggered to the high state. The first buffer (BUF1) provides sufficient pulse width for this trigger signal to guarantee correct operation. Then, the output transition of the DFF traverses the succeeding buffers. The second buffer (BUF2) is dedicated to resetting the DFF to the low state again. Finally, CLK<sub>TDC</sub> is generated after the third buffer (BUF3) and fed back to the trigger input of the DFF. While the *Period* signal stays high, i.e., for one cycle time of the input clock (CLK<sub>IN</sub>), the multiplexer selects the feedback path so that the circuit oscillates.
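The pulse-counting principle of the CTDC can be illustrated with a short behavioral sketch. It assumes that the gated oscillator runs with a fixed period while *Period* is high and that each complete oscillator pulse shifts a one-hot shift register by one position; the function name, the register length and the timing values are hypothetical.

```python
def ctdc_measure(clk_period_ps, osc_period_ps, register_length):
    """Behavioral model of a cyclic TDC.

    Counts the complete gated-oscillator pulses that fit into one input
    clock period and returns the count together with the resulting
    one-hot shift-register contents.
    """
    if osc_period_ps <= 0 or register_length <= 0:
        raise ValueError("oscillator period and register length must be positive")
    pulses = clk_period_ps // osc_period_ps
    # The register saturates at its last position if the count is too large.
    position = min(pulses, register_length - 1)
    one_hot = [1 if i == position else 0 for i in range(register_length)]
    return pulses, one_hot


# Hypothetical numbers: 625 MHz input clock (1600 ps), 200 ps oscillator period.
pulses, register = ctdc_measure(1600, 200, register_length=32)
print(pulses, register.index(1))
```

Because the measurement completes within the one-cycle *Period* window, the coarse code is available almost immediately, which is what shortens the lock time compared with stepping the delay line one stage at a time.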

## 3.3 Edge Combiner



Figure 5: (a) Edge Combiner



The conventional edge combiner using two DFFs [8] has unequal propagation delays in the set and reset paths due to the nonzero reset time of the DFF. The unbalanced path delay seriously degrades the accuracy of the duty cycle, especially at high frequency. Balanced edge combiners [2], [8] have been proposed to reduce the distortion by providing equal loading in both paths. However, their circuit architectures are still unmatched and suffer large deviations caused by PVTL variations. To alleviate the path delay problem, a new edge combiner is proposed and depicted in Fig. 5(a). It consists of a cross-coupled SR-latch, two short pulse generators (SPGs) at the input and two transmission gates at the output. The short pulse generator illustrated in Fig. 5(b), which has the shortest response time, accepts the output clock from the half delay lines and produces a positive one-shot signal to trigger the SR-latch. Then the internal signals, X and Y, inside the edge combiner are brought out complementarily to the transmission gates and combined into the single-ended output clock, CLK<sub>EC</sub>. The signal paths from the input signals, S and R, to the transmission gates therefore have equal loading. Hence the duty cycle of CLK<sub>EC</sub> tracks the timing of S and R more closely.
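The effect of path balance on the output duty cycle can be sketched numerically. The model below assumes an idealized SR-latch that is set by the rising edge of S and reset by the rising edge of R, where R is S delayed by the half delay line; the function name and all timing values are illustrative assumptions.

```python
def combined_duty_cycle(period_ps, half_delay_ps, set_path_ps, reset_path_ps):
    """Duty cycle of the combined clock from an idealized SR-latch combiner.

    The output rises set_path_ps after the S edge and falls reset_path_ps
    after the R edge, which itself trails S by half_delay_ps.
    """
    t_rise = set_path_ps
    t_fall = half_delay_ps + reset_path_ps
    high_ps = (t_fall - t_rise) % period_ps
    return high_ps / period_ps


# Hypothetical numbers: 1600 ps period with the half delay line locked to 800 ps.
balanced = combined_duty_cycle(1600, 800, set_path_ps=50, reset_path_ps=50)
unbalanced = combined_duty_cycle(1600, 800, set_path_ps=50, reset_path_ps=90)
print(f"balanced: {balanced:.3f}, unbalanced: {unbalanced:.3f}")
```

In this model only the edge timing of S and R matters, so any mismatch between the two path delays translates directly into duty-cycle error, and the error grows in relative terms as the clock period shrinks.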

## 3.4 Phase Detector

Fig. 6 illustrates the circuit topology of the phase detectors, PD1 and PD2. Since PD1 is used twice, in the first fine measurement and in the deskew phase (i.e., S2 and S4), the clock to be aligned is switched from CLK<sub>DL</sub> to CLK<sub>OUT</sub>. The upper two DFFs and three NAND gates determine whether the compared signal resides in the locking window generated by CLK<sub>IN</sub>. If the compared signal is still far away from CLK<sub>IN</sub>, the lead/lag signals (RT1, LT1) are activated so as to move the setting value inside the shift registers to the left or right. As soon as the compared signal is sufficiently close to CLK<sub>IN</sub>, the lock signal (LD1) is enabled. The locking window for the first fine measurement is equal to two LSBs (2\*t<sub>FB</sub>) because HDL1 and HDL2 change their control codes concurrently during the first fine measurement (S2). When the system reaches the deskew phase (S4), the locking window is further shrunk to one gate delay (t<sub>FB</sub>) since only HDL1 is adjusted. In order to maintain system stability, the phase comparison is performed every two input clock cycles so that the produced control codes can be sent to the HDLs correctly. This interval accounts for the response time required from the phase detector to the delay line. The lower DFF is a simple divide-by-2 circuit that accomplishes this function. The short pulse generator (SPG) is identical to the one in Fig. 5(b). Whenever it is necessary to adjust the delay of the HDLs, CLK<sub>PD1</sub> emits pulses that move the corresponding control codes in the shift registers.

Figure 6: (a) Phase Detector PD1

## International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438

Figure 6: (b) Phase Detector PD2

The second phase detector (PD2) is quite similar to PD1, except that it only accepts the output of HDL3 (CLK<sub>HDL</sub>) and works during the second fine measurement (S3). Unlike PD1, PD2 is turned off when entering the deskew phase (S4) so as to store the erroneous timing information for interpolation. Since there are two phase detectors in this design, two independent control loops exist, and they could interact with each other. However, PD1 is active during the first fine measurement (S2) and then halts automatically after entering the second fine measurement (S3) because CLK<sub>PD1</sub> is inhibited. Thus, PD1 and PD2 never generate output clock pulses (CLK<sub>PD1</sub> and CLK<sub>PD2</sub>) in the same state of operation, which maintains system stability.
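The lead/lag/lock decision and the two-cycle comparison interval can be summarized in a behavioral sketch. It assumes that the detector locks when the magnitude of the phase error is within the locking window and that each comparison moves the delay by one fixed step; the function names, the sign convention and the numbers are hypothetical and do not model the gate-level circuit of Fig. 6.

```python
def phase_decision(error_ps, window_ps):
    """Classify the phase error (compared edge minus reference edge)."""
    if abs(error_ps) <= window_ps:
        return "lock"
    # A positive error means the compared edge arrives late.
    return "decrease_delay" if error_ps > 0 else "increase_delay"


def fine_lock(initial_error_ps, step_ps, window_ps, max_cycles=200):
    """Step the delay until lock, comparing once every two input clock cycles.

    Returns (cycle_at_lock, residual_error_ps), or None if no lock occurs.
    """
    error = initial_error_ps
    for cycle in range(0, max_cycles, 2):
        decision = phase_decision(error, window_ps)
        if decision == "lock":
            return cycle, error
        error += -step_ps if decision == "decrease_delay" else step_ps
    return None


# Hypothetical numbers: 20 ps step, two-LSB window (S2) versus one-LSB window (S4).
print(fine_lock(-100, step_ps=20, window_ps=40))
print(fine_lock(-100, step_ps=20, window_ps=20))
```

A wider window locks sooner but leaves a larger residual error, which is consistent with using the two-LSB window first and the one-LSB window in the final deskew phase.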

## 3.5 Quad State Controller

The quad-state controller is designed to generate the clocks needed to determine the control codes of the two half delay lines. Its circuit topology is shown in Fig. 7. It consists of a state generator, two shift clock selectors (SCS1 and SCS2), and two sets of shift registers (SR1 and SR3).

The state generator depicted in Fig. 8 enters the first state (S1) when both the *Start* and CLK<sub>IN</sub> signals are received. After two input clock cycles, the state becomes S2. Then the state generator accepts the locking signals (LD1 and LD2) from the phase detectors so as to change into the corresponding states (S3 and S4) and to determine the multiplexer selection signal (*MSEL*) in front of SCS1 for CTDC operation. After the deskew operation is completed, the *Lock* signal is asserted.
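The state sequence can be sketched as a small finite-state machine. The transition conditions below are an interpretation of the text: S1 lasts two input clock cycles, LD1 ends S2, LD2 ends S3, and LD1 ends S4 because PD1 is the detector active in the deskew phase. The `IDLE` and `LOCKED` states and all names are assumptions added for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    S1 = 1      # CTDC mode (coarse measurement)
    S2 = 2      # first fine measurement (PD1)
    S3 = 3      # second fine measurement (PD2)
    S4 = 4      # deskew phase (PD1)
    LOCKED = 5  # Lock signal asserted


def next_state(state, cycles_in_state, start, ld1, ld2):
    """Return the next controller state for one input clock cycle."""
    if state is State.IDLE:
        return State.S1 if start else State.IDLE
    if state is State.S1:
        return State.S2 if cycles_in_state >= 2 else State.S1
    if state is State.S2:
        return State.S3 if ld1 else State.S2
    if state is State.S3:
        return State.S4 if ld2 else State.S3
    if state is State.S4:
        return State.LOCKED if ld1 else State.S4
    return State.LOCKED


# Walk through one hypothetical locking sequence.
state = next_state(State.IDLE, 0, start=True, ld1=False, ld2=False)
state = next_state(state, 2, start=True, ld1=False, ld2=False)
print(state)
```

Keeping the transitions strictly sequential reflects the requirement that only one phase detector drives the shift registers in any given state.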





www.ijsr.net Licensed Under Creative Commons Attribution CC BY

The detailed circuit diagram of the shift clock selector is illustrated in Fig. 9. When operating in CTDC mode (S1), only the coarse shift clock (CLK<sub>C1</sub>) is generated to activate the coarse part of the first shift register (SR1). During the first fine measurement (S2) and the deskew phase (S4), with the first phase detector (PD1) in operation, CLK<sub>F1</sub> shifts the lower bits of the control codes (CC1) for the fine section. Since all the HDLs are initialized with minimum delay, SCS1 has to deal with the situation in which the fine delay line exceeds its maximum delay setting. When this happens, the value that selects the output of the fine section is circulated from CC1[n] back into CC1[1], and an overflow signal (OV1) is generated. OV1 from SR1 then forces CLK<sub>C1</sub> to pulse once to increase the delay provided by the coarse delay line. The second shift clock selector (SCS2) is identical to SCS1. In the proposed quad-state controller, the coarse part of SR3 (MSB in CC3) is also determined in CTDC mode (S1) so as to reduce the locking time. SCS2 then works during the second fine measurement (S3) to adjust the setting of HDL3. The coarse measurement in the first state (S1) takes only two input clock cycles for CTDC operation. The second and third states (S2 and S3) each take at most (2\*n) cycles. The number of clock cycles required in the deskew phase depends on the skew time (t<sub>skew</sub>) caused by the propagation delay of the edge combiner, interpolator and output buffer. The overall locking time (in input clock cycles) can then be estimated by summing the cycles spent in the four states.
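The overflow handling of the shift clock selector can be sketched as follows. The model treats the coarse and fine settings as integer pointers rather than one-hot registers, and the saturation of the coarse pointer at its last stage is an assumption; the function name and stage counts are hypothetical.

```python
def shift_fine_up(coarse, fine, fine_stages, coarse_stages):
    """Advance the delay by one fine step.

    When the fine pointer passes its last stage it wraps to the first one,
    an overflow flag is raised, and the coarse pointer advances once.
    Returns (coarse, fine, overflow).
    """
    fine += 1
    overflow = fine >= fine_stages
    if overflow:
        fine = 0
        # Assumed behavior: the coarse pointer saturates at its last stage.
        coarse = min(coarse + 1, coarse_stages - 1)
    return coarse, fine, overflow


# Hypothetical sizes: 8 fine stages and 32 coarse stages.
print(shift_fine_up(3, 7, fine_stages=8, coarse_stages=32))
print(shift_fine_up(3, 2, fine_stages=8, coarse_stages=32))
```

Wrapping the fine pointer and carrying into the coarse pointer lets the fine search continue past the end of the fine section without restarting the coarse measurement.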



# 4. Results

The all-digital deskew buffer architecture was designed in Cadence Virtuoso in 0.18 µm CMOS technology. Each half delay line of the DLL contains an 8-stage fine delay line and a 32-stage coarse delay line. The all-digital DLL (ADDLL) can operate at input clock rates from 250 MHz to 1 GHz. It can tolerate a duty cycle variation of the input clock from 30% to 70% and generates an output clock with approximately 50% duty cycle. Fig. 10 shows the input and output waveforms at 625 MHz. Fig. 11 shows the state operation of the controller at a clock rate of 625 MHz. The CTDC mode takes 2 cycles after the start signal is activated. Then the first fine measurement spends 4 cycles measuring the period of the input clock and, in the third state, further cycles are consumed to accomplish the second fine measurement. Finally, cancelling the skew caused by the edge combiner and output buffer takes 14 cycles in the deskew phase. The overall locking time is no greater than 18 cycles. The power consumption of the proposed ADDLL is 14.43 mW and 27.4012 mW at 250 MHz and 625 MHz, respectively.



Figure 10: Input and Output waveform



Figure 11: State Operation and locking transient response

Table 1: Performance Summary and Comparison

| Parameter               | TVLSI02 [1]          | Proposed           |
|-------------------------|----------------------|--------------------|
| Technology              | 0.18 µm              | 0.18 µm            |
| Supply                  | 1.8 V                | 1.8 V              |
| Frequency Range         | 250 MHz to 650 MHz   | 250 MHz to 1 GHz   |
| Correction Range        | 30% to 70%           | 30% to 70%         |
| Delay Line Architecture | Coarse-Fine          | Coarse-Fine        |
| Closed Loop             | Yes                  | Yes                |
| Locking Time            | <36 cycles           | <18 cycles         |
| Pk-Pk Jitter            | 21.1 ps @ 625 MHz    | 17.46 ps @ 625 MHz |
| Power                   | 28.156 mW @ 250 MHz  | 8.89 mW @ 250 MHz  |
|                         | 49.4012 mW @ 625 MHz | 14.46 mW @ 650 MHz |
|                         |                      | 28.62 mW @ 1 GHz   |

# 5. Conclusion

A fast-locking all-digital deskew buffer using a digital-controlled delay line with duty cycle correction is presented in this paper. The cyclic time-to-digital converter reduces the long coarse locking time of the conventional register-controlled scheme to only two input clock cycles. The balanced edge combiner achieves an output clock with approximately 50% duty cycle by providing equal loading on the setting and resetting paths. A quad-state controller is also designed to carry out the necessary sequence of DCC and deskew operations. A design in 0.18 µm technology demonstrates the feasibility of the proposed architecture. It accepts clock rates from 250 MHz to 1 GHz and provides closed-loop control with low jitter and low power consumption.

# References

- [1] You-Gang Chen, Hen-Wai Tsao, and Chorng-Sii Hwang, "A Fast-Locking All-Digital Deskew Buffer With Duty-Cycle Correction," *IEEE Transactions on VLSI Systems*, vol. 21, no. 2, Feb. 2013.
- [2] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 377–384, Mar. 2000.
- [3] J.-S. Wang, C.-Y. Cheng, J.-C. Liu, Y.-C. Liu, and Y.-M. Wang, "A duty-cycle-distortion-tolerant half-delay-line low-power fast-lock-in all-digital delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 45, no. 5, pp. 1036–1047, May 2010.
- [4] H.-H. Chang and S.-I. Liu, "A wide-range and fast-locking all-digital cycle-controlled delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 40, no. 3, pp. 661–670, Mar. 2005.
- [5] S.-K. Kao and S.-I. Liu, "A 62.5–625-MHz anti-reset all digital delay locked loop," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 54, no. 7, pp. 566–570, Jul. 2007.
- [6] G.-K. Dehng and S.-I. Liu, "Clock-deskew buffer using a SAR-controlled delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 35, no. 8, pp. 1128–1136, Aug. 2000.
- [7] Y.-M. Wang and J.-S. Wang, "An ultra-low-power fast-lock-in small jitter all-digital DLL," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2005, pp. 422–423.
- [8] R.-J. Yang and S.-I. Liu, "A 40–550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 361–373, Feb. 2007.
- [9] S.-K. Kao and S.-I. Liu, "All-digital fast-locked synchronous duty cycle corrector," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 12, pp. 1363–1367, Dec. 2006.
- [10] D. Shin, J. Song, H. Chae, and C. Kim, "A 7 ps jitter 0.053 mm<sup>2</sup> fast lock all-digital DLL with a wide range and high resolution DCC," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, Sep. 2009.
- [11] T. Matano, Y. Takai, T. Takahashi, Y. Sakito, and I. Fujii, "A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer," *IEEE J. Solid-State Circuits*, vol. 38, no. 5, pp. 762–768, May 2003.
- [12] Y.-M. Wang and J.-S. Wang, "An all-digital 50% duty-cycle corrector," in *IEEE Int. Circuits Syst. Symp.*, 2004, pp. 925–928.