WO2007125519A2 - Latency optimized resynchronization solution for DDR/DDR2 SDRAM read path

Info

Publication number: WO2007125519A2
Application number: PCT/IB2007/051617
Authority: WO
Inventors: Jan Vink
Original assignee: Nxp B.V.
Priority date: 2006-05-03
Filing date: 2007-05-02
Publication date: 2007-11-08
Also published as: WO2007125519A3

Abstract

An apparatus for synchronizing memory data signals is provided. The apparatus comprises a first interface circuit (110) that is configured to generate a differential clock signal in a strobe domain and to convey a data signal to a data bus (110), a second interface circuit (120) in a clock domain that is configured to receive the data signal (170) from the data bus and a synchronization circuit that is configured to adjust the data signal (170) between the strobe domain and the clock domain such that integrity of information encoded by the data signal is preserved. Methods of using the apparatus are also disclosed.

Description

LATENCY OPTIMIZED RESYNCHRONIZATION SOLUTION FOR DDR/DDR2 SDRAM READ PATH

This disclosure relates generally to the field of computing devices and more particularly to memory devices.

Data processing capabilities of computing devices, in particular, central processing units ("CPUs"), historically have increased greatly over relatively short life cycles. These processing capabilities can be limited by the CPU's ability to obtain data from associated memory components. Although capabilities of memory units have increased along with processing abilities of CPUs, advances in memory units generally have lagged behind CPU advances.

In a traditional computing architecture, when a CPU needs to either read data to process or write results of data processing, the CPU attempts to access a component in a memory hierarchy, such as an on-chip cache, a volatile memory such as a random access memory ("RAM"), or a non-volatile memory such as a magnetic disk drive. Generally, these memory components are accessed in this order because data storage capacities of each device are limited and amounts of time needed to read data from or write data to each of these components increase dramatically as each successive component in the hierarchy is accessed to obtain needed data. During times when needed data is being read or written, the CPU can be idle or stalled and does not perform any computational tasks. Although various techniques can be employed to attempt to minimize CPU idle time, memory access time can be a significant limiting factor for overall performance of a computing system.

RAM architectures have attempted to address memory access times. In particular, dynamic RAM ("DRAM") and synchronous dynamic RAM ("SDRAM") architectures have improved memory access times. Additionally, data transfer techniques and standards such as double data rate ("DDR") and double data rate two ("DDR2") have further improved RAM component performance. DDR2 RAM is gaining wide acceptance as a preferred memory architecture. The systems and components disclosed herein can be employed in a Mobile/DDR/DDR2 SDRAM architecture to assist in managing data transfer operations efficiently.

As electronic devices continue to evolve and incorporate computing features, memory components, including components that can employ the components and methods disclosed herein, will continue to be needed. Devices such as televisions, mobile and cellular telephones, personal digital assistants (PDAs), portable music players, portable gaming devices, and general-purpose computers all can incorporate such memory components in their architectures.

The following presents a simplified summary in order to provide a basic understanding and a high-level survey. This summary is not an extensive overview and is neither intended to identify key or critical elements nor to delineate the scope of any claims. The sole purpose of this summary is to present some concepts in a simplified form as a prelude to the more detailed description that follows. Additionally, section headings used herein are provided merely for convenience and should not be taken as limiting in any way.

An apparatus for synchronizing memory data signals comprises a first interface circuit that is configured to generate a differential clock signal in a strobe domain and to convey a data signal to a data bus, a second interface circuit in a clock domain that is configured to receive the data signal from the data bus, and a synchronization circuit that is configured to adjust the data signal between the strobe domain and the clock domain such that integrity of information encoded by the data signal is preserved. The first interface circuit can be further configured to generate a strobe signal that is associated with a data read operation. The strobe signal can travel with the data signal. The second interface circuit can be further configured to generate a strobe signal that is associated with a data write operation. The strobe signal can travel with the data signal.

An apparatus for use in data transfer operations between an application-specific integrated circuit and a memory comprises a first control circuit including a plurality of flip-flops that are controlled by a first clock signal having a first phase, a second control circuit that includes a plurality of flip-flops that are controlled by a second clock signal having a second phase, and a memory controller that is configured to manage memory access functions, wherein data traveling from a strobe domain to an internal clock domain is synchronized to preserve integrity of the data. The strobe domain can be a domain of one circuit selected from the group consisting of an integrated circuit and a clocked memory circuit. The strobe domain can be a domain of an application-specific integrated circuit or a domain of a dynamic random access memory circuit. The dynamic random access memory circuit can be a dynamic random access memory circuit selected from the group consisting of a synchronous dynamic random access memory circuit, a double data rate dynamic random access memory circuit, a double data rate two dynamic random access memory circuit, a double data rate synchronous dynamic random access memory circuit, and a double data rate two synchronous dynamic random access memory circuit. The internal clock domain can be a domain of a circuit selected from the group consisting of an integrated circuit and a clocked memory circuit. The internal clock domain can be a domain of a dynamic random access memory circuit. The dynamic random access memory circuit can be a dynamic random access memory circuit selected from the group consisting of a synchronous dynamic random access memory circuit, a double data rate dynamic random access memory circuit, a double data rate two dynamic random access memory circuit, a double data rate synchronous dynamic random access memory circuit, and a double data rate two synchronous dynamic random access memory circuit.

A method for managing data access functions comprises generating, for a data transfer control circuit of a first clock domain, a data signal that encodes information for temporary storage at one or more flip-flops of a second clock domain; generating, in the first clock domain, a strobe signal to travel from the first clock domain to the second clock domain; and adjusting, in the second clock domain, the data signal to account for a timing differential between a clock pulse of the first clock domain and a clock pulse of the second clock domain. The method can further comprise using the strobe signal in the second clock domain as an indicator of validity of the data signal at the one or more flip-flops of the second clock domain. Still further, the method can comprise generating a round-robin control signal to manage reading of the data signal from the one or more flip-flops of the second clock domain. The method can further still comprise reading the data signal from the one or more flip-flops of the second clock domain. Storing information from the data signal in a memory unit associated with the second clock domain can also be part of the method as well as using the information stored in the memory unit associated with the second clock domain in a data processing operation.

The disclosed and described components and methods comprise one or more of the features that are particularly pointed out and distinctly claimed herein. The following description, including the drawings, sets forth in detail certain illustrative or exemplary components and methods. However, these illustrative or exemplary components and methods illustrate only a few of the various ways in which these components and methods can be employed. Specific implementations of the disclosed and described components and methods can include some, many, or all of such components and methods, as well as their equivalents. Variations of the specific implementations and examples presented will be apparent from the following detailed description.

FIG. 1A is a system block diagram of an interface between an integrated circuit and a memory component.

FIG. 1B is a chart of signal waveforms.

FIG. 2 is a system block diagram of a clocked memory read path.

FIG. 3 is a chart of signal waveforms.

FIG. 4 is a system block diagram of an interface between two clock domains.

FIG. 5 is a chart of signal waveforms.

As used in this application, the terms "component," "system," "module," and the like are intended to refer to a computer-related entity, such as hardware, software (for instance, in execution), and/or firmware. For example, a component can be a process running on a processor, a processor, an object, an executable, a program, and/or a computer. Also, both an application running on a server and the server can be components. One or more components can reside within a process and a component can be localized on one computer and/or distributed between two or more computers.

Disclosed components and methods are described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed subject matter. It may be evident, however, that certain of these specific details can be omitted or combined with others in a specific implementation. In other instances, certain structures and devices are shown in block diagram form in order to facilitate description. Further, it should be noted that although specific examples presented herein include or reference specific components, an implementation of the components and methods disclosed and described herein is not necessarily limited to those specific components and can be employed in other contexts as well.

It should also be appreciated that although specific examples presented may describe or depict systems or methods that are based upon components of personal computers, the use of components and methods disclosed and described herein is not limited to that domain. For example, the disclosed and described components and methods can be used in a distributed or network computing environment. Additionally or alternatively, the disclosed and described components and methods can be used on a single server accessed by multiple clients. Those of ordinary skill in the art will readily recognize that the disclosed and described components and methods can be used to create other components and execute other methods on a wide variety of computing devices.

FIG. 1A is a system block diagram of an interface 100 between an integrated circuit and a memory component. In this specific example, an application-specific integrated circuit (ASIC) 110 and a double data rate synchronous dynamic random access memory (DDR SDRAM) module 120 are illustrated. It should be appreciated that as with all the examples presented in this disclosure, the components illustrated in FIG. 1 are exemplary only and that other implementations between components are both possible and expressly contemplated. Such other implementations will be apparent to practitioners of ordinary skill in this art from reading this disclosure. In particular, the components disclosed and described herein can be used with a wide variety of computing and memory components including ASICs and other data processors as well as various types of memory architectures such as some of the architectures previously mentioned.

The ASIC 110 and the DDR SDRAM module 120 can engage in data communication using a group of timing, information, and control signals. These timing, information, and control signals can be transmitted between the ASIC 110 and the DDR SDRAM module 120 using a bidirectional data bus or other appropriate data communication path (shown logically). The ASIC 110 can contain a memory controller and can generate a differential clock signal that can be connected to the DDR SDRAM module 120. Specifically, the ASIC 110 can send two clock signals, clkp 130 and clkn 140, to the DDR SDRAM module 120.

A combined address and command signal 150 can also be sent from the ASIC 110 to the DDR SDRAM module 120 in a manner that is synchronous to the differential clock signal. During a data read operation, the DDR SDRAM module 120 can deliver desired data over the data bus that connects the ASIC 110 and the DDR SDRAM module 120. In this example, the data bus is bidirectional and two bits per clock cycle can be delivered by using both the rising edge and the falling edge of the clock signal.

A strobe signal dqs 160 can be sent bidirectionally, either from the ASIC 110 to the DDR SDRAM module 120 or conversely from the DDR SDRAM module 120 to the ASIC 110. A data transfer signal dq 170 can carry data between components. A data mask signal dqm 180 can also be employed for write operations.

FIG. 1B is a signal waveform chart that illustrates waveforms of the timing, information, and control signals between the ASIC 110 and the DDR SDRAM module 120. During read operations 185, clock signals clkp 130 and clkn 140 alternate between high and low signals over time. In this example, as with other waveform examples, high and low signals are used to designate logical 1 and logical 0, respectively. Those of ordinary skill in this art will readily recognize that these designations can be reversed, depending upon a specific implementation.

Strobe signal dqs 160 can be used to indicate the presence of a valid data signal that can be read. Assertion of strobe signal dqs 160 is synchronized with clock signal clkp 130. Levels of the strobe signal dqs 160 stabilize within the boundaries of the vertical time indication lines to provide time for the data signal dq 170 to stabilize before the strobe signal 160 is used to indicate valid data. As can be seen here, the strobe signal dqs 160 travels with the data and delay of the strobe is approximately equal to delay of the data. Relative delay variations between the strobe signal dqs 160 and the data signal dq 170 influence the maximum achievable bus frequency and not the delay itself. It should be recognized that data is captured with the strobe but can only be used in the ASIC 110 when the data is available in the internal clock domain of the ASIC 110. The components and methods disclosed and described herein provide a flexible, latency-optimized solution for resynchronization of this data from the strobe domain to the internal clock domain.

For write operations 190, the process is similar. The strobe signal dqs 160 is used to indicate the validity of values of the data signal dq 170 and the data mask dqm 180. Illustration of signal timing with respect to address and command operations 195 shows signals present for use across both falling and rising edges of the clock signals clkp 130 and clkn 140.

FIG. 2 is a system block diagram of a clocked memory read path 200. The read path includes a series of connected flip-flops, each of which is clocked as illustrated. A data signal to be captured, dq_from_pad 210, is clocked into the read path 200 as follows. Two signals, dqs90 and dqs270, are derived from the read strobe signal, dqs_from_pad (not shown). Care should be taken to align clock phases for these operations. The read strobe signal, dqs_from_pad, can be generated by the DDR SDRAM memory module 120 illustrated in FIG. 1A. The first data bit D0 from the data signal dq_from_pad 210 is captured at the first flip-flop using the dqs90 signal. The second data bit D1 is captured using the dqs270 signal and the first data bit is transferred to the next flip-flop. Both data bits, D0 and D1, are then transferred through the flip-flops clocked using the retclk_180 and clk180 signals. Data in signals in_dq_even 220 and in_dq_odd 230 can then be used in flip-flops running under the control of a clock in the component that received the signal dq_from_pad 210.
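As a purely behavioral illustration of the read path 200, the following Python sketch splits a per-pin bit stream into the even and odd streams. Timing and the retiming stage clocked by retclk_180 and clk180 are not modeled. The function name capture_read_path and the list-based signal representation are assumptions made for illustration and do not appear in the figures.

```python
def capture_read_path(dq_from_pad):
    """Split a per-pin DDR bit stream into even and odd streams.

    Bits at even positions are treated as sampled on a dqs90 edge (D0),
    bits at odd positions as sampled on a dqs270 edge (D1).
    Behavioral model only: no delays, setup, or hold times.
    """
    in_dq_even = []
    in_dq_odd = []
    held_d0 = None
    for index, bit in enumerate(dq_from_pad):
        if index % 2 == 0:
            # dqs90 edge: the first flip-flop captures D0.
            held_d0 = bit
        else:
            # dqs270 edge: D0 moves to the next flip-flop while D1 is captured.
            in_dq_even.append(held_d0)
            in_dq_odd.append(bit)
    return in_dq_even, in_dq_odd


even, odd = capture_read_path([1, 0, 0, 1, 1, 1])
print(even, odd)
```

Each pair of entries at the same position in the two returned lists corresponds to one clock cycle of data presented on in_dq_even and in_dq_odd.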

FIG. 3 is a signal waveform chart. The signal waveform chart illustrates an exemplary no-delay situation or an ideal/low-frequency solution. In a specific implementation, there will ordinarily be a delay in signals dq_from_pad and dqs_from_pad such that these signals will shift to the right.

Data to be captured arrives on edge 4 in FIG. 3. To ensure that the data actually arrives on edge 4, the signal retclk_180 shifts with the signals dq_from_pad and dqs_from_pad. The signal retclk_180 is generated in a specific manner such that two conditions hold. Specifically, retclk_180 should always have more delay than clk180 to ensure that edge 2 is timed slightly later than the rising edge of clk180 at a clock cycle before edge 3. Additionally, delay shifts partly with both signal dqs90 and signal dqs270 to ensure that edge 2 is timed later than edge 1. In this manner, edge 2 remains timed between edge 1 and edge 3 and can be seen as bridging the signal towards edge 4. The signal retclk_180 can be created with the aid of a non-bonded bi-directional IO cell using the signal clk as the input. The inverted output can be the signal retclk_180.

It should be understood that in a typical implementation, the data bus is often 32 bits wide. Consequently, the signal retclk_180 will be routed to 32 different circuits such as the exemplary circuit shown in FIG. 2. Balancing this signal towards all connections is critical; the signal needs to arrive at all 32 read-paths at the same time. As operating frequencies increase in a specific implementation, delay of the signals dq_from_pad, dqs90, and dqs270 can become too large to have a sufficient margin such that edge 2 of retclk_180 can be timed in between edge 1 and edge 3. Typically, the delay of dq_from_pad, dqs90, and dqs270 can be as high as 2 to 4.5 ns (CMOS 12 numbers, variation due to PVT conditions). For operating frequencies in the order of 250 to 300 MHz, the disclosed and described components become critical because of operational failures above such frequencies. To solve the problem of operational failures at higher frequencies, another bridge flip-flop can be introduced but at a cost of increased complexity.
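The frequency limit can be made concrete by expressing the quoted 2 to 4.5 ns delay range in units of the clock period. The Python sketch below does only that arithmetic. Comparing the delay against a full clock period is a simplification assumed here for illustration; the actual margin depends on the edge positions of FIG. 3. The function names are hypothetical.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def delay_in_cycles(delay_ns, freq_mhz):
    """Express a delay as a fraction of the clock period."""
    return delay_ns / clock_period_ns(freq_mhz)


BEST_DELAY_NS = 2.0   # shortest delay quoted in the text
WORST_DELAY_NS = 4.5  # longest delay quoted in the text

for freq in (250, 300):
    period = clock_period_ns(freq)
    shortest = delay_in_cycles(BEST_DELAY_NS, freq)
    longest = delay_in_cycles(WORST_DELAY_NS, freq)
    print(f"{freq} MHz: period {period:.2f} ns, "
          f"delay spans {shortest:.2f} to {longest:.2f} cycles")
```

At 250 MHz the period is 4 ns, so the longest quoted delay already exceeds one full clock period while the shortest is half a period, which leaves little room for a single retimed edge that must be correct across all PVT conditions.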

FIG. 4 is a system block diagram of an interface system 400 between two clock domains. An internal clock domain is illustrated on the left and a strobe domain on the right. Each of the internal clock domain and the strobe domain is separately clocked as illustrated and explained herein.

In the internal clock domain, signal clk_intern 405 is used to govern a counter 410 and a D flip-flop 415 that controls the signal in_dq_even 420. The counter 410 can generate a select signal 425 that is used to drive a multiplexer 430. The multiplexer 430 can select from among a plurality of D flip-flops 435, 440, 445 in the strobe domain. The D flip-flops 435, 440, 445 are enabled using a counter 450. The counter 450 and the plurality of D flip-flops 435, 440, 445 in the strobe domain are commonly clocked using a signal dqs90. When enabled, each of the plurality of D flip-flops 435, 440, 445 in the strobe domain can receive data from the signal dq_from_pad 455.

The interface system 400 can operate as follows. In the strobe domain, the first data bit that arrives in signal dq_from_pad 455 is captured in D flip-flop 435 with the aid of dqs90. The signal dqs90 is also connected to the counter 450 that selects the next flip-flop, for example, D flip-flop 440. The second data bit that arrives in the signal dq_from_pad 455 is captured in D flip-flop 440, also using clock signal dqs90. The process continues to capture successive data bits in successive D flip-flops.

In the internal clock domain, the counter 410 has, after start-up, an output value select 425 that is used by the multiplexer 430 to select the respective Q output of a corresponding D flip-flop. As illustrated, the multiplexer 430 selects D flip-flop 435, the flip-flop that captured the first data bit from the signal dq_from_pad 455 as described above. After giving a read command, the D flip-flop 415 passes the read data to signal in_dq_even 420. Clock timings of these operations are shown in the signal waveform chart of FIG. 5. As disclosed and described herein, an advantage is that operational impact of signal delay is lessened or mitigated from a level of impact experienced with other memory control architectures. In particular, an amount of delay can have any value, but the read path can be easily adapted to deal with such delay by increasing the number of parallel flip-flops in the strobe domain.

To implement a full solution for a Mobile, DDR, or DDR2 SDRAM interface using the components and methods herein disclosed and described, a circuit such as the one presented in FIG. 4 is required twice per data pin. The first circuit can be clocked by dqs90 and the second circuit clocked by dqs270. Also, a control input is needed. The input must be controlled when the data is transferred from the strobe-domain flip-flops to the internal clock-domain flip-flop.
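The round-robin capture and readout of the interface system 400 can be sketched as a small event-ordered model in Python. This is an idealized illustration under stated assumptions, not the disclosed circuit: it models one capture lane clocked by dqs90 with one bit per clock period, ignores setup and hold times, and treats a strobe edge that coincides with a clock edge as occurring first. The names StrobeDomainBuffer, simulate, and transfer_cycle are hypothetical.

```python
class StrobeDomainBuffer:
    """N parallel flip-flops written round-robin on strobe edges."""

    def __init__(self, depth):
        self.flops = [None] * depth
        self.write_select = 0  # models counter 450 (strobe domain)
        self.read_select = 0   # models counter 410 (internal clock domain)

    def strobe_edge(self, dq_bit):
        # dqs90 edge: the enabled flip-flop captures dq_from_pad.
        self.flops[self.write_select] = dq_bit
        self.write_select = (self.write_select + 1) % len(self.flops)

    def clock_edge(self):
        # clk_intern edge: the multiplexer output is sampled.
        value = self.flops[self.read_select]
        self.read_select = (self.read_select + 1) % len(self.flops)
        return value


def simulate(bits, depth, delay_ns, period_ns, transfer_cycle):
    """Return the bits seen in the clock domain for a given pad delay."""
    WRITE, READ = 0, 1
    events = []
    for k, bit in enumerate(bits):
        events.append((k * period_ns + delay_ns, WRITE, bit))
        events.append(((k + transfer_cycle) * period_ns, READ, None))
    # Order by time; on a tie the strobe edge is processed first.
    events.sort(key=lambda event: (event[0], event[1]))
    buffer = StrobeDomainBuffer(depth)
    received = []
    for _, kind, bit in events:
        if kind == WRITE:
            buffer.strobe_edge(bit)
        else:
            received.append(buffer.clock_edge())
    return received


burst = [1, 0, 1, 1, 0, 0, 1, 0]
for delay in (2.0, 4.5):
    intact = simulate(burst, depth=3, delay_ns=delay,
                      period_ns=4.0, transfer_cycle=2) == burst
    print(f"delay {delay} ns, 3 flip-flops: data intact = {intact}")
```

In this model the same fixed transfer cycle recovers the burst for both the shortest and the longest delay because each captured bit stays in its flip-flop for several clock periods before being overwritten; with a single flip-flop the shortest-delay case overwrites data before it is read.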

The number of bits read must also be controlled. This information is available in the memory controller as the controller is typically implemented to account for the number of words it reads from memory for a particular transfer. The dqs90 and dqs270 signals must be generated and should be generated from the dqs_from_pad signal. The strobe must also be prevented from adversely affecting the dqs90/dqs270 clock signals during high-Z periods of the strobe.
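One way to picture the strobe control described above is a gate that passes only the strobe edges belonging to the expected burst. The Python sketch below is an assumption made for illustration, since the disclosure does not specify how the gating is implemented: edges are represented as timestamps, the window start is taken as known from the read command, and the expected edge count is taken from the memory controller. The name gate_strobe is hypothetical.

```python
def gate_strobe(edge_times_ns, window_start_ns, expected_edges):
    """Pass only strobe edges that belong to the expected read burst.

    Edges before window_start_ns (for example, noise while the strobe
    is high-Z) are dropped, and so are edges after the expected count
    has been reached.
    """
    accepted = []
    for edge_time in edge_times_ns:
        if edge_time < window_start_ns:
            continue  # strobe not yet driven: ignore
        if len(accepted) >= expected_edges:
            break     # burst complete: ignore trailing activity
        accepted.append(edge_time)
    return accepted


# Spurious edges at 1.0 ns and 23.0 ns surround a four-edge burst.
raw_edges = [1.0, 6.0, 10.0, 14.0, 18.0, 23.0]
print(gate_strobe(raw_edges, window_start_ns=5.0, expected_edges=4))
```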

To design a specific solution for a system, the following analyses must be performed. First, the worst-case (longest) and best-case (shortest) delay must be calculated. Flexibility should be considered. In case there are doubts as to timing, an additional margin can be added to the worst/best-case delay as a safety margin. In case the latency is critical for system performance, extra effort can be invested to minimize the worst-case (longest) delay. The worst-case (longest) delay determines at what cycle the data is transferred from the strobe-domain flip-flops to the clock-domain flip-flop. In the example shown in FIG. 5, this occurs at clock edge 7.

The difference between the worst-case (longest) delay and the best-case (shortest) delay can be used to determine the number of flip-flops that are to be used in the strobe-domain (N). For flexibility, the cycle at which data is transferred to the internal clock domain can be made programmable. Depending upon actual delay, determined by PVT conditions, the moment can be adjusted.
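The disclosure does not give a formula for N, so the Python sketch below uses a sizing rule that is an assumption derived from an idealized model: bit k is captured at k·T plus the pad delay and read at clock edge k + C, with zero setup and hold time and no skew. Under that model the transfer cycle C must satisfy C·T ≥ worst-case delay, and a flip-flop must not be overwritten before it is read, which requires N·T + best-case delay > C·T. The function names are hypothetical.

```python
import math


def transfer_cycle(worst_delay_ns, period_ns):
    """Earliest clock cycle C at which worst-case data is already captured."""
    return math.ceil(worst_delay_ns / period_ns)


def min_strobe_flops(worst_delay_ns, best_delay_ns, period_ns):
    """Smallest N such that no flip-flop is overwritten before it is read.

    Idealized rule (assumption): N > C - best_delay / T.
    """
    cycle = transfer_cycle(worst_delay_ns, period_ns)
    return math.floor(cycle - best_delay_ns / period_ns) + 1


period = 4.0  # ns, i.e. a 250 MHz clock
print(transfer_cycle(4.5, period), min_strobe_flops(4.5, 2.0, period))
```

A practical design would add margin for setup and hold times, strobe-to-data skew, and any programmable adjustment of the transfer cycle, all of which tend to increase N beyond this idealized minimum.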

What has been described above includes illustrative examples of certain components and methods. It is, of course, not possible to describe every conceivable combination of components or methodologies, but one of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, all such alterations, modifications, and variations are intended to fall within the spirit and scope of the appended claims.

In particular and in regard to the various functions performed by the above described components, devices, circuits, systems and the like, the terms (including a reference to a "means") used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (for example, a functional equivalent), even though not structurally equivalent to the disclosed structure, which performs the function in the herein illustrated examples. In this regard, it will also be recognized that the disclosed and described components and methods can include a system as well as a computer-readable medium having computer-executable instructions for performing the acts and/or events of the various disclosed and described methods.

In addition, while a particular feature may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "includes," and "including" and variants thereof are used in either the detailed description or the claims, these terms are intended to be inclusive in a manner similar to the term "comprising."

Claims

CLAIMS:

1. An apparatus for synchronizing memory data signals, comprising: a first interface circuit (110) that is configured to generate a differential clock signal (130, 140) in a strobe domain and to convey a data signal (170) to a data bus (110); a second interface circuit (120) in a clock domain that is configured to receive the data signal from the data bus; and a synchronization circuit (110) that is configured to adjust the data signal between the strobe domain and the clock domain such that integrity of information encoded by the data signal is preserved.

2. The apparatus of claim 1, wherein the first interface circuit is further configured to generate a strobe signal (160) that is associated with a data read operation.

3. The apparatus of claim 2, wherein the strobe signal (160) travels with the data signal (170).

4. The apparatus of claim 1, wherein the second interface circuit (120) is further configured to generate a strobe signal that is associated with a data write operation (160).

5. The apparatus of claim 4, wherein the strobe signal (160) travels with the data signal (170, 180).

6. An apparatus for use in data transfer operations between an application-specific integrated circuit and a memory, comprising: a first control circuit including a plurality of flip-flops (435, 440, 445) that are controlled by a first clock signal having a first phase; a second control circuit including a flip-flop (415) that is controlled by a second clock signal (405) having a second phase; and a memory controller (430) that is configured to manage memory access functions; wherein data traveling from a strobe domain to an internal clock domain is synchronized to preserve integrity of the data.

7. The apparatus of claim 6, wherein the strobe domain is a domain of one circuit selected from the group consisting of an integrated circuit (110) and a clocked memory circuit (120).

8. The apparatus of claim 7, wherein the strobe domain is a domain of an application-specific integrated circuit (110).

9. The apparatus of claim 7, wherein the strobe domain is a domain of a dynamic random access memory circuit (120).

10. The apparatus of claim 9, wherein the dynamic random access memory circuit (120) is a dynamic random access memory circuit selected from the group consisting of a synchronous dynamic random access memory circuit, a double data rate dynamic random access memory circuit, a double data rate two dynamic random access memory circuit, a double data rate synchronous dynamic random access memory circuit, a mobile double data rate dynamic random access memory circuit, and a double data rate two synchronous dynamic random access memory circuit.

11. The apparatus of claim 7, wherein the internal clock domain is a domain of one circuit selected from the group consisting of an integrated circuit (110) and a clocked memory circuit (120).

12. The apparatus of claim 11, wherein the internal clock domain is a domain of a dynamic random access memory circuit.

13. The apparatus of claim 12, wherein the dynamic random access memory circuit (120) is a dynamic random access memory circuit selected from the group consisting of a synchronous dynamic random access memory circuit, a double data rate dynamic random access memory circuit, a double data rate two dynamic random access memory circuit, a double data rate synchronous dynamic random access memory circuit, a mobile double data rate dynamic random access memory circuit, and a double data rate two synchronous dynamic random access memory circuit.

14. A method for managing data access functions, comprising: generating, for a data transfer control circuit of a first clock domain, a data signal that encodes information for temporary storage at one or more flip-flops of a second clock domain; generating, in the first clock domain, a strobe signal to travel from the first clock domain to the second clock domain; and adjusting, in the second clock domain, the data signal to account for a timing differential between a clock pulse of the first clock domain and a clock pulse of the second clock domain.

15. The method of claim 14, further comprising using the strobe signal in the second clock domain as an indicator of validity of the data signal at the one or more flip-flops of the second clock domain.

16. The method of claim 15, further comprising generating a round-robin control signal to manage reading of the data signal from the one or more flip-flops of the second clock domain.

17. The method of claim 16, further comprising reading the data signal from the one or more flip-flops of the second clock domain.

18. The method of claim 17, further comprising storing information from the data signal in a memory unit associated with the second clock domain.

19. The method of claim 16, further comprising using the information stored in the memory unit associated with the second clock domain in a data processing operation.