US6343344B1 - System bus directory snooping mechanism for read/castout (RCO) address transaction - Google Patents

Info

Publication number
US6343344B1
Authority
US
United States
Prior art keywords
address
data access
cast out
access operation
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US09/368,221
Inventor
Ravi Kumar Arimilli
John Steven Dodson
Guy Lynn Guthrie
Jody B. Joyner
Jerry Don Lewis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US09/368,221
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Reassignment: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARIMILLI, RAVI K., DODSON, JOHN S., Guthrie, Guy L., JOYNER, JODY B., LEWIS, JERRY DON
Application granted
Publication of US6343344B1
Anticipated expiration
Expired - Fee Related (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0804 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0855 Overlapped cache accessing, e.g. pipeline
    • G06F12/0859 Overlapped cache accessing, e.g. pipeline with reload from main memory

Definitions

  • the data access operation and related cast out also require multiple directory accesses. Whether the storage device is a vertically in-line device to which the data access and cast out operations are directed, or a horizontal storage device snooping the data access and cast out operations, separate directory lookups and the associated tag comparisons must be performed for these discrete operations.
  • FIG. 1 depicts a block diagram of a data processing system in which a preferred embodiment of the present invention may be implemented
  • FIGS. 2A-2B depict an addressing scheme and a combined address for related data access and cast out operations for transmission in accordance with a preferred embodiment of the present invention
  • FIGS. 3A-3C depict diagrams of a cache and related cache control logic for utilizing the combined address for related data access and cast out operations to improve directory accesses in accordance with a preferred embodiment of the present invention
  • FIG. 4 is a high level flow chart for a process of utilizing the combined address for related data access and cast out operations to improve directory accesses in accordance with a preferred embodiment of the present invention.
  • FIGS. 5A-5B depict timing diagrams for data access and cast out operations in accordance with the known art and for a combined data access and cast out operation in accordance with a preferred embodiment of the present invention.
  • Data processing system 100 is a symmetric multiprocessor (SMP) system including a plurality of processors 102 aa through 102 an and 102 ma through 102 mn (where “m” and “n” are integers).
  • Each processor 102 aa - 102 mn includes a respective level one (L1) cache 104 aa - 104 mn , preferably on chip with the processor and bifurcated into separate instruction and data caches.
  • Each processor 102 aa - 102 mn is coupled via a processor bus 106 aa - 106 l to a level two cache 108 a - 108 l (where “l” is an integer), which are in-line caches shared by multiple processors in the exemplary embodiment.
  • each L2 cache may be shared by four processors, and a total of sixteen L2 caches may be provided.
  • Each L2 cache 108 a - 108 l is connected to a level three (L3) cache 110 a - 110 l and to system bus 112 .
  • L3 caches 110 a - 110 l are actually in-line caches rather than lookaside caches as FIG. 1 suggests, but operations received from a vertical L2 cache (e.g., L2 cache 108 a ) are initiated both within the L3 cache 110 a and on system bus 112 concurrently to reduce latency. If the operation produces a cache hit within the L3 cache 110 a , the operation is cancelled or aborted on system bus 112 . On the other hand, if the operation produces a cache miss within the L3 cache 110 a , the operation is allowed to proceed on system bus 112 .
  • L2 caches 108 a - 108 l and L3 caches 110 a - 110 l are employed to stage data to the L1 caches 104 aa - 104 mn and typically have progressively larger storage capacities but longer access latencies.
  • L2 caches 108 a - 108 l and L3 caches 110 a - 110 l thus serve as intermediate storage between processors 102 aa - 102 mn and system memory 114 , which typically has a much larger storage capacity but may have an access latency many times that of L3 caches 110 a - 110 l .
  • Both the number of levels in the cache hierarchy and the cache hierarchy configuration (i.e., shared versus private, in-line versus lookaside) employed in data processing system 100 may vary.
  • L2 caches 108 a - 108 l and L3 caches 110 a - 110 l are connected to system memory 114 via system bus 112 .
  • Also connected to system bus 112 are a memory mapped device 116 , such as a graphics adapter providing a connection for a display (not shown), and an input/output (I/O) bus bridge 118 .
  • I/O bus bridge 118 couples system bus 112 to I/O bus 120 , which may provide connections for I/O devices 122 , such as a keyboard and mouse, and nonvolatile storage 124 , such as a hard disk drive.
  • System bus 112 , I/O bus bridge 118 , and I/O bus 120 thus form an interconnect coupling the attached devices, for which alternative implementations are known in the art.
  • Non-volatile storage 124 stores an operating system and other software controlling operation of system 100 , which are loaded into system memory 114 in response to system 100 being powered on.
  • data processing system 100 may include many additional components not shown in FIG. 1, such as serial and parallel ports, connections to networks or attached devices, a memory controller regulating access to system memory 114 , etc. Such modifications and variations are within the spirit and scope of the present invention.
  • the combined address is transmitted on system bus 112 by an L2 cache 108 a - 108 l in response to a cache miss for a data access operation within the L2 cache.
  • the combined address may be employed for transmission on any bus by any storage device initiating related data access and cast out operations.
  • the data access operation is a READ in the exemplary embodiment, but may be any data access operation (e.g., WRITE, etc.).
  • Cache lines are stored within the cache in congruence classes, sets of cache lines identified by a common index field within the system addresses for the cache lines in a congruence class.
  • bits 0-35 of a 56-bit cache line address are the tag
  • bits 36-46 are the index
  • the remaining bits (47-55) are an intracache line address.
  • the index field of the address is employed by the cache directory and the cache memory to locate congruence classes.
  • Cache directory stores tags for cache lines contained within cache memory within the congruence class identified by the index, and compares the tag of a target address to the tags within the congruence class. If a match is identified, the corresponding cache line within cache memory is the target data.
  • in the known art, the address for a data access operation and the address for a related cast out are transmitted in separate system bus operations.
  • the target data of a data access operation and the victim selected by the replacement policy are members of the same congruence class. Therefore the index field will be identical for both the data access and the cast out.
  • the index for the congruence class containing the target cache lines for both the data access and the cast out (“Index”) is combined with the tags for the cache line targeted by the data access (“Tag RD”) and the cache line targeted by the cast out (“Tag CO”).
  • the combined address of the present invention may be employed whenever the need to preserve some unique aspect of data arises.
  • the MESI coherency protocol includes the modified (M), exclusive (E), shared (S), and invalid (I) coherency states.
  • a modified cache segment should be written to lower level storage when selected to be replaced.
  • the modified state indicates that cache data has been modified with respect to corresponding data in system memory without also modifying the system memory data, such that the only valid copy of the data is within the cache entry storing the modified cache line or segment.
  • in the exclusive, shared, and invalid states, the cache segment selected for replacement need not be written to lower level storage since either (1) a valid copy already exists elsewhere in storage, or (2) the contents of the cache segment are invalid.
  • the exclusive state indicates that the cache entry is consistent with system memory but is only found, within all caches at that level of the storage hierarchy, in the subject cache.
  • the shared state indicates that the cache entry may be found in the subject cache and at least one other cache at the same level in the storage hierarchy, with all copies of the data being consistent with the corresponding data in system memory
  • the invalid state indicates that a given cache entry, both the data and the address tag, is no longer coherent with either system memory or other caches in the storage hierarchy.
  • Coherency states implemented as extensions to the basic MESI protocol may also require a cast out, or elect to perform a cast out, and therefore benefit from the present invention.
  • the recent (R) state, essentially a variant of the shared state, indicates that the cache entry may be found in both the subject cache and at least one other cache at the same level in the storage hierarchy, and that all copies of the data in the subject cache and other caches are consistent with the corresponding data in system memory, but also indicates that the subject cache, of all caches containing the shared data, most recently received the data in a system bus transaction such as a read from system memory. While a cast out is not necessary to preserve data integrity in such a case, a cast out operation may be useful to accurately maintain the recent state, and the combined address bus transaction of the present invention may be utilized for that purpose.
  • the combined address of the present invention will save bus cycles over the dual operation scheme of the known art. If each index or tag requires a full bus cycle to completely transmit, the combined address of the present invention may be transmitted in three bus cycles (neglecting the optional state information), rather than four bus cycles as would be required for separate data access and cast out operations. The additional bus cycle is saved because the index field need only be transmitted once for both operations.
  • the resulting system bus transaction condenses, within a single address, the information required for both the data access operation and the related cast out.
  • the combined index and tags may be transmitted in any predefined order, and may be transmitted on a single bus cycle as shown in FIG. 2B or over multiple consecutive bus cycles. If the combined address is transmitted over multiple bus cycles, the index should be transmitted first to allow the receiving devices to begin a directory lookup at the earliest possible time.
  • the tags may be transmitted during subsequent cycles and still be timely for the comparators employed to compare directory tags to the target tag(s). See commonly assigned, copending U.S. patent application Ser. No. 09/345,302 entitled “CACHE INDEX BASED SYSTEM ADDRESS BUS,” incorporated herein by reference.
  • Referring to FIGS. 3A through 3C, diagrams of a cache and related cache control logic for formulating, transmitting and utilizing the combined address for related data access and cast out operations in accordance with a preferred embodiment of the present invention are depicted.
  • the elements depicted are employed in L2 caches 108 a - 108 l and in L3 caches 110 a - 110 l .
  • a cache controller 302 receives and transmits operations relating to data within cache memory 304 from upstream and downstream buses through bus interface units (“BIU”) 306 a and 306 b .
  • a directory lookup 308 is employed to locate cache lines within cache memory 304 and an LRU unit 310 implements the replacement policy for updating cache lines within cache memory 304 .
  • the logical organization of data within the cache is in tables containing cache directory entries 312 and a corresponding data array 314 .
  • the cache directory entries 312 contain the address tag for the corresponding cache lines within data array 314 , as well as the coherency state, the LRU status, and an inclusivity (“I”) state for the respective cache line.
  • the coherency state indicates the cache line consistency with other copies of the cache line in other storage devices within the system.
  • the LRU status indicates the LRU position for the cache line within a congruence class.
  • the inclusivity state indicates whether the cache line is stored within a logically in-line, higher level cache.
  • cache controller 302 may trigger the LRU 310 to select a victim, then look up the selected victim to determine if a cast out would be required to update the corresponding cache line and, if so, retrieve the tag for the current contents of the potential victim. This may be performed concurrently with the directory lookup and tag comparison employed to determine whether the received data access operation generates a cache hit or miss.
  • FIG. 3B depicts a detail of the portion of a cache employed to formulate and transmit a combined address for related data access and cast out operations.
  • the identity and address tag 316 for the potential victim are determined from the replacement policy (LRU) and cache directory.
  • the index field and address tag 318 for the data access operation are supplied within the operation.
  • a multiplexer 320 receives, as one input 322 , the index field and address tag 318 for the data access operation. As the other input 324 , multiplexer 320 receives the index field and address tag 318 combined with the address tag 316 for the potential cast out. Multiplexer 320 is controlled by a cast out signal 326 indicating whether a cast out may be required for the data access operation. This may be determined by examining the coherency state of the potential victim and whether the current access was a miss. If the potential victim does not contain valid and unique data (e.g., the coherency state is “shared” or “invalid”), the cast out signal 326 is not asserted. If the potential victim contains unique and valid data (e.g., the coherency state is “modified”), cast out signal 326 is asserted.
  • Multiplexer 320 is also controlled by a cache hit signal 328 , taken from the end of the directory lookup and tag comparison functions within the cache and asserted if the cache contains the target data for the received data access operation. If cache hit signal 328 is asserted, the first input 322 is passed to the bus interface unit regardless of whether the cast out signal 326 was asserted. If a cache hit occurs, the target data is within the cache and no need to select a victim exists. Furthermore, no need exists to transmit the address for the data access operation to lower level storage devices, except perhaps to allow the lower level storage devices to update their coherency state and/or LRU information relating to the target cache line.
  • if cache hit signal 328 is not asserted and cast out signal 326 is also not asserted, the first input 322 is similarly passed to the bus interface unit. In this circumstance, no need to perform a cast out exists (e.g., the victim was “invalid”).
  • the address for the data access operation will be transmitted to the lower storage levels, however, and the cast out tag may optionally be transmitted with the index and tag for the data access operation to allow lower level devices to update status information (e.g., “recent” version of shared coherency state or LRU position).
  • the second input 324 is passed to the bus interface unit only when the cache hit signal 328 is not asserted (i.e., cache miss) and the cast out signal 326 is asserted. In this case, both the address tag for the data access and the cast out tag are passed through multiplexer 320, allowing a lower level cache to process both operations.
  • FIG. 3C depicts a detail of the portion of a cache employed to utilize a combined address for related data access and cast out operations to improve directory access performance and response latency.
  • a lower level cache device 330 receiving the combined address of the present invention routes the index field (“Index”) to the cache directory 332 and the cache memory 334 to perform a lookup of the congruence class identified by the index.
  • the lower level cache thus requires only one directory access for the combined address, rather than the two separate directory accesses required by the discrete data access and cast out operations of the known art.
  • the data access and cast out tags (“Tag RD” and “Tag CO,” respectively) are routed to separate sets of comparators 336 a and 336 b for concurrent comparison to the address tags within the congruence class identified by the index.
  • Separate signals (“CO HIT” and “RD HIT”) are generated for matches of the cast out tag and the data access tag with tags within the indexed congruence class.
  • the RD HIT signal may be employed to select corresponding data within the cache memory 334 for the data access operation.
  • the CO HIT signal may be utilized to update the status of cache directory 332 or to trigger some other function. If the target of the data access operation is contained within cache memory 334 , the cache line may be sourced to the higher level cache initiating the combined data access and cast out operation, where the cache line is stored in the location previously occupied by the victim of the cast out operation.
  • step 402 depicts receiving a combined data access and cast out operation address, including the index field identifying the congruence class containing both the data access target and the cast out victim as well as the address tags for the data access target and the cast out victim.
  • the process then passes to step 404 , which illustrates looking up the congruence class corresponding to the index received within the combined address.
  • from step 404 , the process passes to step 406 , which depicts comparing the address tags within the identified congruence class to the address tag for the data access operation, and then to step 408 , which illustrates a determination of whether a match was found. If so, the process proceeds to step 410 , which depicts asserting the data access cache hit signal.
  • from step 404 the process also passes, in parallel, to step 412 , which depicts comparing the address tags within the identified congruence class to the address tag for the cast out operation, and then to step 414 , which illustrates a determination of whether a match was found. If so, the process proceeds to step 416 , which depicts asserting the cast out cache hit signal. Thus, only a single directory lookup for the congruence class identified by the index received within the combined address is required. From either or both of steps 410 and/or 416 , the process passes to step 418 , which illustrates the process becoming idle until another combined data access and cast out operation address is received.
  • Referring to FIGS. 5A and 5B, timing diagrams for data access and cast out operations in accordance with the known art and for a combined data access and cast out operation in accordance with a preferred embodiment of the present invention are depicted.
  • FIG. 5A depicts a timing diagram for data access and cast out operations in accordance with the known art.
  • the data access operation is initiated by transmitting the index and tag for the target on the address bus by the storage device (“L2”) requiring the target data.
  • the index and tag are presumed to require separate bus cycles for transmission in the example shown.
  • a response window of two bus cycles is presumed. During those two bus cycles, the index and tag for the related cast out operation may be transmitted.
  • the combined response to the data access operation (“CR RD”) is then driven by the system bus response logic within the memory controller (MC). Two bus cycles later, the combined response to the cast out operation (“CR CO”) is driven. The data access operation cannot be successfully completed until this response is received.
  • FIG. 5B depicts the combined data access and cast out operation of the present invention.
  • the combined address is driven by the storage device (L2) initiating the combined operation, and takes fewer cycles.
  • the combined response is received one bus cycle sooner with the present invention than with the known art, and the complete operation requires fewer bus cycles (4 bus cycles used over a 6 cycle period) than the known art (6 bus cycles used over a 7 cycle period).
  • the present invention reduces the number of directory accesses required to perform related data access and cast out operations initiated by a higher level storage device. Latency is also improved over conventional systems, which require separate data access and cast out operations each having their own response period. In the present invention, a single response period suffices for both the data access and cast out operations. Address bus bandwidth utilization is also improved since fewer address bus cycles are consumed for the combined operation than for separate operations.
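The master-side selection described for FIG. 3B, where multiplexer 320 chooses between the plain data access address (input 322) and the combined address (input 324) based on cache hit signal 328 and cast out signal 326, can be sketched in Python as follows. This is a minimal, hypothetical model rather than IBM's logic: `split_address`, `BusAddress`, and `select_bus_address` are illustrative names, and the field widths follow the 56-bit layout given earlier (tag in bits 0-35, index in bits 36-46, line offset in bits 47-55), assuming bit 0 is the most significant bit. The optional transmission of the cast out tag on a miss that needs no cast out (for LRU or coherency-state updates) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

ADDR_BITS = 56
TAG_BITS = 36                                    # bits 0-35
INDEX_BITS = 11                                  # bits 36-46
OFFSET_BITS = ADDR_BITS - TAG_BITS - INDEX_BITS  # bits 47-55, i.e. 9 bits


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 56-bit address into (tag, index, offset); bit 0 is the MSB."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


@dataclass(frozen=True)
class BusAddress:
    index: int
    tag_rd: int
    tag_co: Optional[int] = None  # set only in the combined (RCO) form


def select_bus_address(rd_addr: int, victim_tag: int,
                       cache_hit: bool, cast_out: bool) -> BusAddress:
    """Hypothetical model of multiplexer 320 choosing input 322 or input 324."""
    tag, index, _ = split_address(rd_addr)
    if not cache_hit and cast_out:
        # Miss with a victim holding unique data: combined address (input 324).
        return BusAddress(index, tag, victim_tag)
    # Hit, or miss with no cast out required: plain address (input 322).
    return BusAddress(index, tag)
```

Because the victim is always drawn from the same congruence class as the target, one `index` field serves both tags; that shared field is what the combined form saves.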

Abstract

In response to receiving a combined address for related data access and cast out operations, including an index identifying a congruence class containing both the target of the data access and the victim of the cast out, a single directory access is performed utilizing the index to locate the congruence class. Address tags within the congruence class are then compared to the address tag for the data access operation and the address tag for the cast out operation concurrently, generating separate hit signals as appropriate. Only a single directory access is required, however, rather than two separate directory accesses as required in the known art, taking advantage of the fact that both the data access target and the cast out victim belong to a single congruence class. Response latency is also improved, as is address bus bandwidth utilization.
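The snooper-side behavior summarized in the abstract can be modeled roughly as follows. The `Directory` and `DirectoryEntry` classes are hypothetical, a dictionary keyed by index stands in for the directory array, and skipping invalid entries during tag comparison is an assumption the document does not spell out. In hardware the two comparisons run concurrently in separate comparator banks (336a and 336b in FIG. 3C); this sketch performs both in one pass over a congruence class that it reads only once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DirectoryEntry:
    tag: int
    state: str  # coherency state: "M", "E", "S", or "I"


class Directory:
    """Hypothetical snooper directory: one entry list per congruence class."""

    def __init__(self) -> None:
        self.classes: Dict[int, List[DirectoryEntry]] = {}
        self.accesses = 0  # number of congruence-class reads performed

    def snoop_rco(self, index: int, tag_rd: int,
                  tag_co: int) -> Tuple[Optional[int], Optional[int]]:
        """Return (RD HIT way, CO HIT way) using a single directory access."""
        entries = self.classes.get(index, [])  # the one directory access
        self.accesses += 1
        rd_hit: Optional[int] = None
        co_hit: Optional[int] = None
        for way, entry in enumerate(entries):
            if entry.state == "I":
                continue  # assumed: invalid entries never produce a hit
            if entry.tag == tag_rd:
                rd_hit = way
            if entry.tag == tag_co:
                co_hit = way
        return rd_hit, co_hit
```

The RD HIT result would select the line to source for the read, and the CO HIT result could drive a directory status update, as described for FIG. 3C.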

Description

RELATED APPLICATIONS
The present invention is related to the subject matter of commonly assigned, copending U.S. patent applications: Ser. No. 09/368,222, entitled “MULTIPROCESSOR SYSTEM BUS WITH READ/CASTOUT (RCO) ADDRESS TRANSACTION”; Ser. No. 09/368,225, entitled “PRECISE INCLUSIVITY MECHANISM FOR SYSTEM BUS WITH READ/DEALLOCATED (RDA) ADDRESS TRANSACTION”; Ser. No. 09/368,224, entitled “MULTIPROCESSOR SYSTEM BUS WITH CACHE STATE AND LRU SNOOP RESPONSES FOR READ/CASTOUT (RCO) ADDRESS TRANSACTION”; Ser. No. 09/368,223, entitled “UPGRADING OF SNOOPER CACHE STATE MECHANISM FOR SYSTEM BUS WITH READ/CASTOUT (RCO) ADDRESS TRANSACTIONS”; Ser. No. 09/368,227 now U.S. Pat. No. 6,279,086 entitled “MULTIPROCESSOR SYSTEM BUS WITH COMBINED SNOOP RESPONSES IMPLICITLY UPDATING SNOOPER LRU POSITION”; Ser. No. 09/368,226 now U.S. Pat. No. 6,275,909 entitled “MULTIPROCESSOR SYSTEM BUS WITH SYSTEM CONTROLLER EXPLICITLY UPDATING SNOOPER CACHE STATE INFORMATION”; Ser. No. 09/368,229 entitled “MULTIPROCESSOR SYSTEM BUS WITH SYSTEM CONTROLLER EXPLICITLY UPDATING SNOOPER LRU INFORMATION”; Ser. No. 09/368,228 entitled “MULTIPROCESSOR SYSTEM BUS WITH COMBINED SNOOP RESPONSES EXPLICITLY CANCELLING MASTER VICTIM SYSTEM BUS TRANSACTION”; Ser. No. 09/368,230 entitled “MULTIPROCESSOR SYSTEM BUS WITH COMBINED SNOOP RESPONSES EXPLICITLY CANCELLING MASTER ALLOCATION OF READ DATA”; and Ser. No. 09/368,231 entitled “MULTIPROCESSOR SYSTEM BUS WITH COMBINED SNOOP RESPONSES EXPLICITLY INFORMING SNOOPERS TO SCARF DATA”. The content of the above-identified applications is incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to directory accesses necessary for data access operations in data processing systems and in particular to directory lookups and tag comparisons for related data access and cast out operations. Still more particularly, the present invention relates to concurrent directory lookups and tag comparisons for related data access and cast out operations to improve storage device performance and response latency.
2. Description of the Related Art
High performance data processing systems typically include a number of levels of caching between the processor(s) and system memory to improve performance, reducing latency in data access operations. When utilized, multiple cache levels are typically employed in progressively larger sizes with a trade off to progressively longer access latencies. Smaller, faster caches are employed at levels within the storage hierarchy closer to the processor or processors, while larger, slower caches are employed at levels closer to system memory. Smaller amounts of data are maintained in upper cache levels, but may be accessed faster.
Within such systems, data access operations frequently give rise to a need to make space for the subject data. For example, when retrieving data from lower storage levels such as system memory or lower level caches, a cache may need to overwrite other data already within the cache because no further unused space is available for the retrieved data. A replacement policy, typically a least-recently-used (LRU) replacement policy, is employed to decide which cache location(s) should be utilized to store the new data.
Often the cache location (commonly referred to as a “victim”) to be overwritten contains only data which is invalid or otherwise unusable from the perspective of a memory coherency model being employed, or for which valid copies are concurrently stored in other devices within the system storage hierarchy. In such cases, the new data may be simply written to the cache location without regard to preserving the existing data at that location.
At other times, however, the cache location selected to receive the new data contains modified data, or data which is otherwise unique or special within the storage hierarchy. In such instances, the replacement of data within a selected cache location (a process often referred to as "updating" the cache) requires that any modified data associated with the cache location selected by the replacement policy be written back to lower levels of the storage hierarchy for preservation. The process of writing modified data from a victim to system memory or a lower cache level is generally called a cast out or eviction.
When a cache must service a data access (for instance, after a cache miss for a READ operation originating with a processor), it typically initiates a data access operation (READ or WRITE) on a bus coupling the cache to lower storage levels. If the replacement policy requires that a modified cache line be overwritten, compelling a cast out for coherency purposes, the cache also initiates the cast out, but on a subsequent bus cycle. The data access operation thus requires multiple operations, and multiple bus cycles, to complete.
In other storage devices within the system, the data access operation and related cast out also require multiple directory accesses. Whether the storage device is a vertically in-line device to which the data access and cast out operations are directed, or a horizontal storage device snooping the data access and cast out operations, separate directory lookups and the associated tag comparisons must be performed for these discrete operations.
It would be desirable, therefore, to reduce the number of directory accesses associated with data access operations requiring a victim cast out. It would further be advantageous to improve latency associated with responses to data access operations requiring a cast out.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide improved directory accesses necessary for data access operations in data processing systems.
It is another object of the present invention to provide improved directory lookups and tag comparisons for related data access and cast out operations.
It is yet another object of the present invention to provide concurrent directory lookups and tag comparisons for related data access and cast out operations to improve storage device performance and response latency.
The foregoing objects are achieved as is now described. In response to receiving a combined address for related data access and cast out operations, including an index identifying a congruence class containing both the target of the data access and the victim of the cast out, a single directory access is performed utilizing the index to locate the congruence class. Address tags within the congruence class are then compared to the address tag for the data access operation and the address tag for the cast out operation concurrently, generating separate hit signals as appropriate. Only a single directory access is required, however, rather than two separate directory accesses as required in the known art, taking advantage of the fact that both the data access target and the cast out victim belong to a single congruence class. Response latency is also improved, as is address bus bandwidth utilization.
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.
BRIEF DESCRIPTION OF THE DRAWINGS
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself however, as well as a preferred mode of use, further objects and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a block diagram of a data processing system in which a preferred embodiment of the present invention may be implemented;
FIGS. 2A-2B depict an addressing scheme and a combined address for related data access and cast out operations for transmission in accordance with a preferred embodiment of the present invention;
FIGS. 3A-3C depict diagrams of a cache and related cache control logic for utilizing the combined address for related data access and cast out operations to improve directory accesses in accordance with a preferred embodiment of the present invention;
FIG. 4 is a high level flow chart for a process of utilizing the combined address for related data access and cast out operations to improve directory accesses in accordance with a preferred embodiment of the present invention; and
FIGS. 5A-5B depict timing diagrams for data access and cast out operations in accordance with the known art and for a combined data access and cast out operation in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
With reference now to the figures, and in particular with reference to FIG. 1, a block diagram of a data processing system in which a preferred embodiment of the present invention may be implemented is depicted. Data processing system 100 is a symmetric multiprocessor (SMP) system including a plurality of processors 102 aa through 102 an and 102 ma through 102 mn (where “m” and “n” are integers). Each processor 102 aa-102 mn includes a respective level one (L1) cache 104 aa-104 mn, preferably on chip with the processor and bifurcated into separate instruction and data caches. Each processor 102 aa-102 mn is coupled via a processor bus 106 aa-106 l to a level two cache 108 a-108 l (where “l” is an integer), which are in-line caches shared by multiple processors in the exemplary embodiment.
Although in the exemplary embodiment only two processors are depicted as sharing each L2 cache, and only two L2 caches are depicted, those skilled in the art will appreciate that additional processors and L2 caches may be utilized in a multiprocessor data processing system in accordance with the present invention. For example, each L2 cache may be shared by four processors, and a total of sixteen L2 caches may be provided.
Each L2 cache 108 a-108 l is connected to a level three (L3) cache 110 a-110 l and to system bus 112. L3 caches 110 a-110 l are actually in-line caches rather than lookaside caches as FIG. 1 suggests, but operations received from a vertical L2 cache (e.g., L2 cache 108 a) are initiated both within the L3 cache 110 a and on system bus 112 concurrently to reduce latency. If the operation produces a cache hit within the L3 cache 110 a, the operation is cancelled or aborted on system bus 112. On the other hand, if the operation produces a cache miss within the L3 cache 110 a, the operation is allowed to proceed on system bus 112.
The lower cache levels (L2 caches 108 a-108 l and L3 caches 110 a-110 l) are employed to stage data to the L1 caches 104 a-104 l and typically have progressively larger storage capacities but longer access latencies. L2 caches 108 a-108 l and L3 caches 110 a-110 l thus serve as intermediate storage between processors 102 a-102 l and system memory 114, which typically has a much larger storage capacity but may have an access latency many times that of L3 caches 110 a-110 l. Both the number of levels in the cache hierarchy and the cache hierarchy configuration (i.e., shared versus private, in-line versus lookaside) employed in data processing system 100 may vary.
L2 caches 108 a-108 l and L3 caches 110 a-110 l are connected to system memory 114 via system bus 112. Also connected to system bus 112 may be a memory mapped device 116, such as a graphics adapter providing a connection for a display (not shown), and input/output (I/O) bus bridge 118. I/O bus bridge 118 couples system bus 112 to I/O bus 120, which may provide connections for I/O devices 122, such as a keyboard and mouse, and nonvolatile storage 124, such as a hard disk drive. System bus 112, I/O bus bridge 118, and I/O bus 120 thus form an interconnect coupling the attached devices, for which alternative implementations are known in the art.
Non-volatile storage 124 stores an operating system and other software controlling operation of system 100, which are loaded into system memory 114 in response to system 100 being powered on. Those skilled in the art will recognize that data processing system 100 may include many additional components not shown in FIG. 1, such as serial and parallel ports, connections to networks or attached devices, a memory controller regulating access to system memory 114, etc. Such modifications and variations are within the spirit and scope of the present invention.
Referring to FIGS. 2A and 2B, an addressing scheme and a combined address for related data access and cast out operations for transmission in accordance with a preferred embodiment of the present invention are illustrated. In the exemplary embodiment, the combined address is transmitted on system bus 112 by an L2 cache 108 a-108 l in response to a cache miss for a data access operation within the L2 cache. However, the combined address may be employed for transmission on any bus by any storage device initiating related data access and cast out operations. Similarly, the data access operation is a READ in the exemplary embodiment, but may be any data access operation (e.g., WRITE, etc.).
When a cache miss occurs within the L2 cache for a data access operation, the cache controller for the L2 cache should be able to determine whether a cast out will be required to preserve data within the cache location selected to be updated by the replacement policy. Moreover, an indexed cache organization is employed for caches within the preferred embodiment. Cache lines are stored within the cache in congruence classes, sets of cache lines identified by a common index field within the system addresses for the cache lines in a congruence class.
An exemplary addressing scheme for data processing system 100 is shown in FIG. 2A. In the example shown, bits 0 . . . 35 of a 56-bit cache line address are the tag, bits 36 . . . 46 are the index, and the remaining bits are an intracache line address. The index field of the address is employed by the cache directory and the cache memory to locate congruence classes. The cache directory stores the tags for the cache lines held in cache memory within the congruence class identified by the index, and compares the tag of a target address to the tags within that congruence class. If a match is identified, the corresponding cache line within cache memory is the target data.
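The field split described for FIG. 2A can be sketched as follows. This is a minimal illustration, assuming IBM-style bit numbering in which bit 0 is the most significant bit of the 56-bit address (so bits 47 . . . 55 form a 9-bit intracache line address); the helper names `split_address` and `join_address` are hypothetical.

```python
from __future__ import annotations

# FIG. 2A layout, assuming bit 0 is the most significant bit (IBM convention).
ADDR_BITS = 56
TAG_BITS = 36                                    # bits 0..35
INDEX_BITS = 11                                  # bits 36..46
OFFSET_BITS = ADDR_BITS - TAG_BITS - INDEX_BITS  # bits 47..55, i.e. 9 bits


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 56-bit address into (tag, index, intra-line offset)."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address must fit in 56 bits")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def join_address(tag: int, index: int, offset: int = 0) -> int:
    """Inverse of split_address: reassemble the three fields into one address."""
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS) | offset


# The index selects the congruence class; the tag is what the directory compares.
addr = join_address(0x123456789, 0x2A5, 0x1F)
print(split_address(addr))  # (4886718345, 677, 31)
```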
In the prior art, the address for a data access operation and the address for a related cast out are transmitted in separate system bus operations. However, within an indexed cache organization of the type described, the target data of a data access operation and the victim selected by the replacement policy are members of the same congruence class. Therefore the index field will be identical for both the data access and the cast out. In the present invention, the index for the congruence class containing the target cache lines for both the data access and the cast out (“Index”) is combined with the tags for the cache line targeted by the data access (“Tag RD”) and the cache line targeted by the cast out (“Tag CO”). The directory state (“CO State”) of the cast out victim cache line—i.e., coherency state and/or LRU state—may also be appended to the address.
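Because the target and the victim share an index, forming the combined address amounts to checking that shared index and keeping both tags. The sketch below assumes the FIG. 2A field widths; `CombinedAddress`, `make_combined`, and the string encoding of "CO State" are hypothetical illustrations, not a defined bus format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

OFFSET_BITS = 9   # intracache line address bits (FIG. 2A layout)
INDEX_BITS = 11   # congruence-class index bits


def index_of(addr: int) -> int:
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


def tag_of(addr: int) -> int:
    return addr >> (OFFSET_BITS + INDEX_BITS)


@dataclass(frozen=True)
class CombinedAddress:
    index: int                       # "Index": the shared congruence class
    tag_rd: int                      # "Tag RD": target of the data access
    tag_co: int                      # "Tag CO": victim of the cast out
    co_state: Optional[str] = None   # optional "CO State" (coherency and/or LRU)


def make_combined(rd_addr: int, victim_addr: int,
                  co_state: Optional[str] = None) -> CombinedAddress:
    """Fold a data access address and its victim's address into one."""
    # Target and victim come from the same congruence class, so one index suffices.
    if index_of(rd_addr) != index_of(victim_addr):
        raise ValueError("target and victim must share a congruence class")
    return CombinedAddress(index_of(rd_addr), tag_of(rd_addr),
                           tag_of(victim_addr), co_state)
```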
The combined address of the present invention may be employed whenever the need to preserve some unique aspect of data arises. Under the basic MESI coherency protocol, which includes the modified (M), exclusive (E), shared (S), and invalid (I) coherency states, a modified cache segment should be written to lower level storage when selected to be replaced. The modified state indicates that cache data has been modified with respect to corresponding data in system memory without also modifying the system memory data, such that the only valid copy of the data is within the cache entry storing the modified cache line or segment.
For exclusive, shared, or invalid cache segments, the cache segment selected for replacement need not be written to lower level storage since either (1) a valid copy already exists elsewhere in storage, or (2) the contents of the cache segment are invalid. The exclusive state indicates that the cache entry is consistent with system memory but is only found, within all caches at that level of the storage hierarchy, in the subject cache. The shared state indicates that the cache entry may be found in the subject cache and at least one other cache at the same level in the storage hierarchy, with all copies of the data being consistent with the corresponding data in system memory. Finally, the invalid state indicates that both the data and the address tag within a given cache entry are no longer coherent with either system memory or other caches in the storage hierarchy.
Coherency states implemented as extensions to the basic MESI protocol may also require a cast out, or elect to perform a cast out, and therefore benefit from the present invention. For example, the recent (R) state, essentially a variant of the shared state, indicates that the cache entry may be found in both the subject cache and at least one other cache at the same level in the storage hierarchy, and that all copies of the data in the subject cache and other caches are consistent with the corresponding data in system memory, but also indicates that the subject cache, of all caches containing the shared data, most recently received the data in a system bus transaction such as a read from system memory. While a cast out is not necessary to preserve data integrity in such a case, a cast out operation may be useful to accurately maintain the recent state, and the combined address bus transaction of the present invention may be utilized for that purpose.
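The cast out rules above reduce to a small predicate over the coherency state. In this sketch only the modified state forces a cast out, while the recent state may optionally elect one; the `maintain_recent` flag and both function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"
    R = "recent"   # shared variant: this cache most recently received the line


def castout_required(state: State) -> bool:
    """Only a modified line holds the sole valid copy, so only M must be written back."""
    return state is State.M


def castout_elected(state: State, maintain_recent: bool = True) -> bool:
    """Required cast outs, plus the optional R cast out used to keep the recent state accurate."""
    return castout_required(state) or (maintain_recent and state is State.R)
```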
The combined address of the present invention will save bus cycles over the dual operation scheme of the known art. If each index or tag requires a full bus cycle to completely transmit, the combined address of the present invention may be transmitted in three bus cycles (neglecting the optional state information), rather than four bus cycles as would be required for separate data access and cast out operations. The additional bus cycle is saved because the index field need only be transmitted once for both operations.
The resulting system bus transaction condenses, within a single address, the information required for both the data access operation and the related cast out. The combined index and tags may be transmitted in any predefined order, and may be transmitted on a single bus cycle as shown in FIG. 2B or over multiple consecutive bus cycles. If the combined address is transmitted over multiple bus cycles, the index should be transmitted first to allow the receiving devices to begin a directory lookup at the earliest possible time. The tags may be transmitted during subsequent cycles and still be timely for the comparators employed to compare directory tags to the target tag(s). See commonly assigned, copending U.S. patent application Ser. No. 09/345,302 entitled "CACHE INDEX BASED SYSTEM ADDRESS BUS," incorporated herein by reference.
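The cycle count and the index-first ordering can be made concrete with a toy beat sequence. This sketch assumes, as the preceding cycle comparison does, that each index or tag field occupies one full bus cycle; the beat representation and function names are hypothetical.

```python
from typing import List, Tuple

Beat = Tuple[str, int]   # (field name, value); one field per bus cycle is assumed


def separate_beats(index: int, tag_rd: int, tag_co: int) -> List[Beat]:
    """Known art: two address transactions, each carrying its own index and tag."""
    return [("index", index), ("tag_rd", tag_rd),
            ("index", index), ("tag_co", tag_co)]


def combined_beats(index: int, tag_rd: int, tag_co: int) -> List[Beat]:
    """Combined address: the index is sent once, and first, so snoopers can
    start the directory lookup while the tags are still arriving."""
    return [("index", index), ("tag_rd", tag_rd), ("tag_co", tag_co)]


def lookup_start_cycle(beats: List[Beat]) -> int:
    """First cycle (0-based) on which a receiver holds the index."""
    return next(i for i, (name, _) in enumerate(beats) if name == "index")


print(len(separate_beats(0x10, 0xABC, 0x123)), len(combined_beats(0x10, 0xABC, 0x123)))  # 4 3
```

The saved cycle is exactly the second copy of the index; sending the index first means the directory read can overlap the tag transfer.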
With reference now to FIGS. 3A through 3C, diagrams of a cache and related cache control logic for formulating, transmitting and utilizing the combined address for related data access and cast out operations in accordance with a preferred embodiment of the present invention are depicted. The elements depicted are employed in L2 caches 108 a-108 l and in L3 caches 110 a-110 l. A cache controller 302 receives and transmits operations relating to data within cache memory 304 from upstream and downstream buses through bus interface units ("BIU") 306 a and 306 b. A directory lookup 308 is employed to locate cache lines within cache memory 304 and an LRU unit 310 implements the replacement policy for updating cache lines within cache memory 304.
The logical organization of data within the cache is in tables containing cache directory entries 312 and a corresponding data array 314. The cache directory entries 312 contain the address tag for the corresponding cache lines within data array 314, as well as the coherency state, the LRU status, and an inclusivity (“I”) state for the respective cache line. The coherency state indicates the cache line consistency with other copies of the cache line in other storage devices within the system. The LRU status indicates the LRU position for the cache line within a congruence class. The inclusivity state indicates whether the cache line is stored within a logically in-line, higher level cache.
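A directory entry with the four fields just described might be modeled as below. The encoding of LRU status as a true-LRU position (0 for most recently used) is an assumption, since the text does not fix one, and `touch` is a hypothetical helper showing how that position could be updated on an access.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    tag: int          # address tag of the corresponding line in data array 314
    state: str        # coherency state, e.g. "M", "E", "S", "I" or "R"
    lru: int          # LRU position within the congruence class: 0 = most recent
    inclusive: bool   # "I" state: also held in a logically in-line, higher level cache


def touch(cclass: list[DirectoryEntry], way: int) -> None:
    """Make `way` most recently used, aging every entry that was more recent."""
    old = cclass[way].lru
    for entry in cclass:
        if entry.lru < old:
            entry.lru += 1
    cclass[way].lru = 0
```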
When a data access operation is received from a processor or higher level storage device, cache controller 302 may trigger LRU unit 310 to select a victim, then look up the selected victim to determine whether a cast out would be required to update the corresponding cache line and, if so, retrieve the tag for the current contents of the potential victim. This may be performed concurrently with the directory lookup and tag comparison employed to determine whether the received data access operation generates a cache hit or miss.
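The hit check and victim lookup can share one read of the congruence class, as in this sketch. It assumes pure LRU victim selection (no preference for invalid ways) and treats only a modified victim as requiring a cast out; the `Entry`, `Lookup`, and `lookup_with_victim` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    tag: int
    state: str   # "M", "E", "S", "I" or "R"
    lru: int     # 0 = most recently used; the largest value marks the LRU line


@dataclass
class Lookup:
    hit_way: Optional[int]   # way holding the target, or None on a miss
    victim_way: int          # way the replacement policy would overwrite
    castout: bool            # victim holds modified data that must be written back
    victim_tag: int          # candidate "Tag CO" for a combined address


def lookup_with_victim(cclass: List[Entry], tag: int) -> Lookup:
    """Hit check and victim selection over a single congruence class read."""
    hit_way = next((w for w, e in enumerate(cclass)
                    if e.state != "I" and e.tag == tag), None)
    victim_way = max(range(len(cclass)), key=lambda w: cclass[w].lru)
    victim = cclass[victim_way]
    return Lookup(hit_way, victim_way, victim.state == "M", victim.tag)
```

Whether the cast out is actually issued still depends on the access missing; that gating is left to the address-selection step.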
FIG. 3B depicts a detail of the portion of a cache employed to formulate and transmit a combined address for related data access and cast out operations. When a data access operation is received, the identity and address tag 316 for the potential victim are determined from the replacement policy (LRU) and cache directory. The index field and address tag 318 for the data access operation are supplied within the operation.
A multiplexer 320 receives, as one input 322, the index field and address tag 318 for the data access operation. As the other input 324, multiplexer 320 receives the index field and address tag 318 combined with the address tag 316 for the potential cast out. Multiplexer 320 is controlled by a cast out signal 326 indicating whether a cast out may be required for the data access operation. This may be determined by examining the coherency state of the potential victim and whether the current access was a miss. If the potential victim does not contain valid and unique data (e.g., the coherency state is "shared" or "invalid"), cast out signal 326 is not asserted. If the potential victim contains unique and valid data (e.g., the coherency state is "modified"), cast out signal 326 is asserted.
Multiplexer 320 is also controlled by a cache hit signal 328, taken from the end of the directory lookup and tag comparison functions within the cache and asserted if the cache contains the target data for the received data access operation. If cache hit signal 328 is asserted, the first input 322 is passed to the bus interface unit regardless of whether the cast out signal 326 was asserted. If a cache hit occurs, the target data is within the cache and no need to select a victim exists. Furthermore, no need exists to transmit the address for the data access operation to lower level storage devices, except perhaps to allow the lower level storage devices to update their coherency state and/or LRU information relating to the target cache line.
If neither the cast out signal 326 nor the cache hit signal 328 is asserted, the first input 322 is similarly passed to the bus interface unit. In this circumstance, no need to perform a cast out exists (e.g., the victim was "invalid"). The address for the data access operation will be transmitted to the lower storage levels, however, and the cast out tag may optionally be transmitted with the index and tag for the data access operation to allow lower level devices to update status information (e.g., the "recent" version of the shared coherency state or the LRU position). Otherwise, the second input 324 is passed to the bus interface unit only when the cache hit signal 328 is not asserted (i.e., a cache miss) and the cast out signal 326 is asserted. In this case, both the address tag for the data access and the cast out tag are passed through multiplexer 320 and transmitted, giving a lower level cache the chance to process both operations.
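The selection rules for multiplexer 320 form a two-input truth table, sketched below. The function name `mux_320` and the tuple representation of inputs 322 and 324 are hypothetical, and the optional transmission of the cast out tag on a miss without a required cast out is not modeled.

```python
from typing import Tuple


def mux_320(index: int, tag_rd: int, tag_co: int,
            cache_hit: bool, castout: bool) -> Tuple[int, ...]:
    """Select the address forwarded to the bus interface unit.

    Input 322 is (index, Tag RD); input 324 is (index, Tag RD, Tag CO).
    """
    input_322 = (index, tag_rd)
    input_324 = (index, tag_rd, tag_co)
    # A hit never needs a victim, and a miss without a required cast out
    # forwards the plain data access address.
    if cache_hit or not castout:
        return input_322
    return input_324  # miss with cast out signal 326 asserted
```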
FIG. 3C depicts a detail of the portion of a cache employed to utilize a combined address for related data access and cast out operations to improve directory access performance and response latency. A lower level cache device 330 receiving the combined address of the present invention routes the index field (“Index”) to the cache directory 332 and the cache memory 334 to perform a lookup of the congruence class identified by the index. The lower level cache thus requires only one directory access for the combined address, rather than the two separate directory accesses required by the discrete data access and cast out operations of the known art.
The data access and cast out tags (“Tag RD” and “Tag CO,” respectively) are routed to separate sets of comparators 336 a and 336 b for concurrent comparison to the address tags within the congruence class identified by the index. Separate signals (“CO HIT” and “RD HIT”) are generated for matches of the cast out tag and the data access tag with tags within the indexed congruence class. The RD HIT signal may be employed to select corresponding data within the cache memory 334 for the data access operation. The CO HIT signal may be utilized to update the status of cache directory 332 or to trigger some other function. If the target of the data access operation is contained within cache memory 334, the cache line may be sourced to the higher level cache initiating the combined data access and cast out operation, where the cache line is stored in the location previously occupied by the victim of the cast out operation.
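The two comparator banks can be modeled as two scans over the same congruence class, already read once using the index. In hardware, comparators 336 a and 336 b operate in parallel; this sequential sketch only shows that both hit signals derive from a single read. The function name and the use of `None` for an invalid way are assumptions.

```python
from typing import List, Optional, Tuple


def compare_tags(cclass_tags: List[Optional[int]], tag_rd: int,
                 tag_co: int) -> Tuple[Optional[int], Optional[int]]:
    """Model comparators 336a/336b over one congruence class.

    `cclass_tags` holds the valid tag of each way (None for an invalid way).
    Returns (RD HIT way, CO HIT way); None means the signal is not asserted.
    """
    rd_hit = next((w for w, t in enumerate(cclass_tags) if t == tag_rd), None)
    co_hit = next((w for w, t in enumerate(cclass_tags) if t == tag_co), None)
    return rd_hit, co_hit
```

Returning the matching way, rather than a bare boolean, mirrors how RD HIT selects the corresponding data within cache memory 334.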
Referring to FIG. 4, a high level flow chart for a process of utilizing the combined address for related data access and cast out operations to improve directory accesses in accordance with a preferred embodiment of the present invention is illustrated. The process begins at step 402, which depicts receiving a combined data access and cast out operation address, including the index field identifying the congruence class containing both the data access target and the cast out victim as well as the address tags for the data access target and the cast out victim.
The process then passes to step 404, which illustrates looking up the congruence class corresponding to the index received within the combined address. The process next passes to step 406, which depicts comparing the address tags within the identified congruence class to the address tag for the data access operation, and then to step 408, which illustrates a determination of whether a match was found. If so, the process proceeds to step 410, which depicts asserting the data access cache hit signal.
From step 404, the process also passes, in parallel, to step 412, which depicts comparing the address tags within the identified congruence class to the address tag for the cast out operation, and then to step 414, which illustrates a determination of whether a match was found. If so, the process proceeds to step 416, which depicts asserting the cast out cache hit signal. Thus, only a single directory lookup for the congruence class identified by the index received within the combined address is required. From either or both of steps 410 and/or 416, the process passes to step 418, which illustrates the process becoming idle until another combined data access and cast out operation address is received.
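The saving in directory accesses can be shown by counting congruence-class reads in a toy directory. The sketch contrasts the FIG. 4 flow (one read, two comparisons) with the known-art flow (one read per operation); `Directory`, `snoop_combined`, and `snoop_separate` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class Directory:
    """Toy directory: index -> tag of each way (None for an invalid way)."""

    def __init__(self, classes: Dict[int, List[Optional[int]]]):
        self.classes = classes
        self.accesses = 0   # number of congruence-class reads performed

    def read_class(self, index: int) -> List[Optional[int]]:
        self.accesses += 1
        return self.classes.get(index, [])


def snoop_combined(d: Directory, index: int, tag_rd: int, tag_co: int) -> Tuple[bool, bool]:
    """FIG. 4: one lookup (step 404), then both tag comparisons (steps 406-416)."""
    tags = d.read_class(index)
    return tag_rd in tags, tag_co in tags


def snoop_separate(d: Directory, index: int, tag_rd: int, tag_co: int) -> Tuple[bool, bool]:
    """Known art: the data access and the cast out each look up the directory."""
    rd_hit = tag_rd in d.read_class(index)
    co_hit = tag_co in d.read_class(index)
    return rd_hit, co_hit
```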
With reference to FIGS. 5A and 5B, timing diagrams for data access and cast out operations in accordance with the known art and for a combined data access and cast out operation in accordance with a preferred embodiment of the present invention are depicted. FIG. 5A depicts a timing diagram for data access and cast out operations in accordance with the known art. The data access operation is initiated by transmitting the index and tag for the target on the address bus by the storage device (“L2”) requiring the target data. The index and tag are presumed to require separate bus cycles for transmission in the example shown.
In the example depicted, a response window of two bus cycles is presumed. During those two bus cycles, the index and tag for the related cast out operation may be transmitted. The combined response to the data access operation ("CR RD") is then driven by the system bus response logic within the memory controller (MC). Two bus cycles later, the combined response to the cast out operation ("CR CO") is driven. The data access operation cannot be successfully completed until this response is received.
FIG. 5B depicts the combined data access and cast out operation of the present invention. The combined address is driven by the storage device (L2) initiating the combined operation, and takes fewer cycles. The combined response—to both the data access and the cast out operation—is driven two bus cycles after transmission of the address is complete. In the example shown, the combined response is received one bus cycle sooner with the present invention than with the known art, and the complete operation requires fewer bus cycles (4 bus cycles used over a 6 cycle period) than the known art (6 bus cycles used over a 7 cycle period).
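The cycle counts quoted for FIGS. 5A and 5B follow from simple arithmetic. This sketch assumes cycles numbered from 1, one bus cycle per index or tag field, and a combined response driven on the cycle after a two-cycle response window; under those assumptions it reproduces the figures stated above. All names are hypothetical.

```python
from typing import Tuple

WINDOW = 2  # response window in bus cycles, as presumed for FIGS. 5A and 5B


def response_cycle(addr_end: int, window: int = WINDOW) -> int:
    """Cycle on which a combined response is driven after the address ends."""
    return addr_end + window + 1


def known_art(window: int = WINDOW) -> Tuple[int, int, int]:
    """FIG. 5A: returns (CR RD cycle, CR CO cycle, bus cycles used)."""
    rd_end = 2            # index and tag for the READ on cycles 1-2
    co_end = rd_end + 2   # index and tag for the cast out on cycles 3-4
    used = 4 + 2          # four address cycles plus two combined responses
    return response_cycle(rd_end, window), response_cycle(co_end, window), used


def combined(window: int = WINDOW) -> Tuple[int, int]:
    """FIG. 5B: returns (combined response cycle, bus cycles used)."""
    addr_end = 3          # index, Tag RD and Tag CO on cycles 1-3
    used = 3 + 1          # three address cycles plus one combined response
    return response_cycle(addr_end, window), used


print(known_art())  # (5, 7, 6): 6 cycles used over a 7-cycle period
print(combined())   # (6, 4): 4 cycles used over a 6-cycle period
```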
The present invention reduces the number of directory accesses required to perform related data access and cast out operations initiated by a higher level storage device. Latency is also improved over conventional systems, which require separate data access and cast out operations each having their own response period. In the present invention, a single response period suffices for both the data access and cast out operations. Address bus bandwidth utilization is also improved since fewer address bus cycles are consumed for the combined operation than for separate operations.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (17)

What is claimed is:
1. A method of accessing data, comprising:
receiving a combined address including an address tag for a data access operation, an address tag for a cast out operation related to the data access operation, and an address index for both the data access operation and the cast out operation;
identifying a congruence class corresponding to the address index that includes both a target of the data access operation and a victim of the cast out operation; and
comparing address tags within the congruence class to the address tag for the data access operation and to the address tag for the cast out operation.
2. The method of claim 1, wherein the step of receiving a combined address including an address index for a data access operation, an address tag for the data access operation, and an address tag for a cast out operation related to the data access operation further comprises:
receiving the address index in a first bus cycle and the address tags for the data access and cast out operations in subsequent bus cycles following the first bus cycle.
3. The method of claim 1, wherein the step of receiving a combined address including an address index for a data access operation, an address tag for the data access operation, and an address tag for a cast out operation related to the data access operation further comprises:
receiving directory state information for a victim of the cast out operation.
4. The method of claim 1, wherein the step of identifying a congruence class corresponding to the address index further comprises:
performing a single directory access for both the data access operation and the cast out operation.
5. The method of claim 1, wherein the step of comparing address tags within the congruence class to the address tag for the data access operation and to the address tag for the cast out operation further comprises:
concurrently comparing address tags within the congruence class to the address tag for the data access operation and to the address tag for the cast out operation.
6. A method of accessing data, comprising:
receiving a combined address for a data access operation and a cast out operation related to the data access operation;
accessing a directory a single time utilizing the combined address; and
in response to accessing said directory said single time, responding to the data access operation based on contents of the directory and updating a status of an entry within the directory based on the cast out operation.
7. The method of claim 6, wherein the step of receiving a combined address for a data access operation and a cast out operation related to the data access operation further comprises:
receiving an index identifying a congruence class including both a target of the data access operation and a victim of the cast out operation.
8. The method of claim 6, further comprising:
concurrently comparing a target address tag and a victim address tag within the combined address to address tags within a congruence class identified by an index within the combined address to determine whether a target for the data access operation and a victim for the cast out operation are contained within a memory for a storage device receiving the combined address.
9. A system for accessing data, comprising:
a bus; and
a storage device coupled to the bus, the storage device receiving a combined address including an address tag for a data access operation, an address tag for a cast out operation related to the data access operation on the bus, and an address index for both the data access operation and the cast out operation,
wherein the storage device, responsive to receiving the combined address on the bus, identifies a congruence class corresponding to the address index that includes both a target of the data access and a victim of the cast out operation, and compares address tags within the congruence class to the address tag for the data access operation and to the address tag for the cast out operation.
10. The system of claim 9, wherein the storage device receives the address index within the combined address in a first bus cycle and the address tags for the data access and cast out operations in subsequent bus cycles following the first bus cycle.
11. The system of claim 9, wherein the storage device receives directory state information for a victim of the cast out operation with the combined address.
12. The system of claim 9, wherein the storage device performs a single directory access for both the data access operation and the cast out operation.
13. The system of claim 9, wherein the storage device concurrently compares address tags within the congruence class to the address tag for the data access operation and to the address tag for the cast out operation.
14. A system for accessing data, comprising:
a bus;
a storage device coupled to the bus, the storage device:
receiving a combined address for a data access operation and a cast out operation related to the data access operation;
accessing a directory within the storage device a single time utilizing the combined address; and
in response to said single access, responding to the data access operation based on contents of the directory and updating a status of an entry within the directory based on the cast out operation.
15. The system of claim 14, wherein the storage device receives, within the combined address, an index identifying a congruence class including both a target of the data access operation and a victim of the cast out operation.
16. The system of claim 14, wherein the storage device concurrently compares a target address tag and a victim address tag within the combined address to address tags within a congruence class identified by an index within the combined address to determine whether a target for the data access operation and a victim for the cast out operation are contained within a memory for a storage device receiving the combined address.
17. The system of claim 14, further comprising:
a second storage device coupled to the bus and transmitting the combined address;
a plurality of processors initiating data access operations including the data access operation corresponding to the combined address; and
a system memory containing data targeted by the data access operation corresponding to the combined address.
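The combined-address lookup recited in claims 9 through 16 can be illustrated with a small software model. This is a hypothetical sketch, not the patent's implementation. The MESI state names, the `CombinedAddress`, `SnoopingDirectory` and `make_combined` names, the address-splitting arithmetic, the one-field-per-beat bus ordering, and the policy of copying the transmitted victim state into the victim's directory entry are all assumptions. The sketch shows a single directory access indexed by the shared congruence class, with the target tag and the victim tag checked against the same set of entries in one pass.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical MESI-style states; the claims do not name a protocol.
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"


@dataclass(frozen=True)
class CombinedAddress:
    index: int           # congruence class holding both target and victim
    target_tag: int      # tag for the data access (read) operation
    victim_tag: int      # tag for the related cast-out operation
    victim_state: State  # victim's directory state, sent with the address (claim 11)

    def bus_beats(self):
        # Claim 10 ordering: index in the first bus cycle, tags afterwards.
        # One field per beat is an assumption of this sketch.
        return [self.index, self.target_tag, self.victim_tag]


def make_combined(target_addr, victim_addr, victim_state, num_sets, line_size):
    """Split two byte addresses into one shared index and two tags (hypothetical helper)."""
    t_index = (target_addr // line_size) % num_sets
    v_index = (victim_addr // line_size) % num_sets
    if t_index != v_index:
        raise ValueError("target and victim must share a congruence class")
    t_tag = target_addr // (line_size * num_sets)
    v_tag = victim_addr // (line_size * num_sets)
    return CombinedAddress(t_index, t_tag, v_tag, victim_state)


@dataclass
class Entry:
    tag: int
    state: State = State.INVALID


class SnoopingDirectory:
    """Hypothetical set-associative directory of a snooping storage device."""

    def __init__(self, num_sets, assoc):
        self.sets = [[Entry(tag=-1) for _ in range(assoc)] for _ in range(num_sets)]
        self.directory_reads = 0  # number of congruence-class lookups

    def snoop_rco(self, rco):
        # A single directory access serves both the read and the cast-out.
        congruence_class = self.sets[rco.index]
        self.directory_reads += 1

        target = victim = None
        for entry in congruence_class:
            # Hardware compares both tags against every way concurrently;
            # here both comparisons are made in the same pass over the set.
            if entry.state is State.INVALID:
                continue
            if entry.tag == rco.target_tag:
                target = entry
            if entry.tag == rco.victim_tag:
                victim = entry

        # Respond to the data access from the directory contents.
        response = "miss" if target is None else f"hit-{target.state.value}"

        # Update the victim's entry for the cast-out. Assumed policy: adopt
        # the state that traveled with the combined address.
        if victim is not None:
            victim.state = rco.victim_state
        return response, victim is not None
```

For example, with 4 congruence classes and 64-byte lines, byte addresses 1856 and 832 both map to class 1 (tags 7 and 3). If the directory holds both lines in the Shared state, one `snoop_rco` call returns `("hit-S", True)`, moves the victim entry to the transmitted state, and advances `directory_reads` by only one.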

Priority Applications (1)

Application Number: US09/368,221
Publication Number: US6343344B1
Priority Date: 1999-08-04
Filing Date: 1999-08-04
Title: System bus directory snooping mechanism for read/castout (RCO) address transaction

Applications Claiming Priority (1)

Application Number: US09/368,221
Publication Number: US6343344B1
Priority Date: 1999-08-04
Filing Date: 1999-08-04
Title: System bus directory snooping mechanism for read/castout (RCO) address transaction

Publications (1)

Publication Number: US6343344B1
Publication Date: 2002-01-29

Family

ID=23450358

Family Applications (1)

Application Number: US09/368,221
Status: Expired - Fee Related
Publication Number: US6343344B1
Priority Date: 1999-08-04
Filing Date: 1999-08-04
Title: System bus directory snooping mechanism for read/castout (RCO) address transaction

Country Status (1)

Country Link
US (1) US6343344B1 (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040010644A1 (en) * 2002-07-11 2004-01-15 International Business Machines Corporation System and method for providing improved bus utilization via target directed completion
US20040091966A1 (en) * 1999-08-30 2004-05-13 Martin Zeidler Polypeptide regulation by conditional inteins
US20070168617A1 (en) * 2006-01-19 2007-07-19 International Business Machines Corporation Patrol snooping for higher level cache eviction candidate identification
US20090199197A1 (en) * 2008-02-01 2009-08-06 International Business Machines Corporation Wake-and-Go Mechanism with Dynamic Allocation in Hardware Private Array
US20090199184A1 (en) * 2008-02-01 2009-08-06 Arimilli Ravi K Wake-and-Go Mechanism With Software Save of Thread State
US20090199029A1 (en) * 2008-02-01 2009-08-06 Arimilli Ravi K Wake-and-Go Mechanism with Data Monitoring
US20090199030A1 (en) * 2008-02-01 2009-08-06 Arimilli Ravi K Hardware Wake-and-Go Mechanism for a Data Processing System
US20090216955A1 (en) * 2008-02-22 2009-08-27 International Business Machines Corporation Method, system and computer program product for lru compartment capture
US20100269115A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Managing Threads in a Wake-and-Go Engine
US20100268791A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Programming Idiom Accelerator for Remote Update
US20100268790A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Complex Remote Update Programming Idiom Accelerator
US20100293341A1 (en) * 2008-02-01 2010-11-18 Arimilli Ravi K Wake-and-Go Mechanism with Exclusive System Bus Response
US20100293340A1 (en) * 2008-02-01 2010-11-18 Arimilli Ravi K Wake-and-Go Mechanism with System Bus Response
US20110173423A1 (en) * 2008-02-01 2011-07-14 Arimilli Ravi K Look-Ahead Hardware Wake-and-Go Mechanism
US20110173419A1 (en) * 2008-02-01 2011-07-14 Arimilli Ravi K Look-Ahead Wake-and-Go Engine With Speculative Execution
US20110173417A1 (en) * 2008-02-01 2011-07-14 Arimilli Ravi K Programming Idiom Accelerators
US8127080B2 (en) 2008-02-01 2012-02-28 International Business Machines Corporation Wake-and-go mechanism with system address bus transaction master
US8171476B2 (en) 2008-02-01 2012-05-01 International Business Machines Corporation Wake-and-go mechanism with prioritization of threads
US8225120B2 (en) 2008-02-01 2012-07-17 International Business Machines Corporation Wake-and-go mechanism with data exclusivity
US8312458B2 (en) 2008-02-01 2012-11-13 International Business Machines Corporation Central repository for wake-and-go mechanism
US8341635B2 (en) 2008-02-01 2012-12-25 International Business Machines Corporation Hardware wake-and-go mechanism with look-ahead polling
US8516484B2 (en) 2008-02-01 2013-08-20 International Business Machines Corporation Wake-and-go mechanism for a data processing system
US8725992B2 (en) 2008-02-01 2014-05-13 International Business Machines Corporation Programming language exposing idiom calls to a programming idiom accelerator
US8732683B2 (en) 2008-02-01 2014-05-20 International Business Machines Corporation Compiler providing idiom to idiom accelerator
US8880853B2 (en) 2008-02-01 2014-11-04 International Business Machines Corporation CAM-based wake-and-go snooping engine for waking a thread put to sleep for spinning on a target address lock
US8886919B2 (en) 2009-04-16 2014-11-11 International Business Machines Corporation Remote update programming idiom accelerator with allocated processor resources

Citations (16)

Publication number Priority date Publication date Assignee Title
US4797814A (en) 1986-05-01 1989-01-10 International Business Machines Corporation Variable address mode cache
US5369753A (en) 1990-06-15 1994-11-29 Compaq Computer Corporation Method and apparatus for achieving multilevel inclusion in multilevel cache hierarchies
US5493668A (en) 1990-12-14 1996-02-20 International Business Machines Corporation Multiple processor system having software for selecting shared cache entries of an associated castout class for transfer to a DASD with one I/O operation
US5564035A (en) 1994-03-23 1996-10-08 Intel Corporation Exclusive and/or partially inclusive extension cache system and method to minimize swapping therein
US5636355A (en) * 1993-06-30 1997-06-03 Digital Equipment Corporation Disk cache management techniques using non-volatile storage
US5829038A (en) 1996-06-20 1998-10-27 Intel Corporation Backward inquiry to lower level caches prior to the eviction of a modified line from a higher level cache in a microprocessor hierarchical cache structure
US5829040A (en) 1994-04-11 1998-10-27 Samsung Electronics Co., Ltd. Snooper circuit of a multi-processor system
US5895495A (en) 1997-03-13 1999-04-20 International Business Machines Corporation Demand-based larx-reserve protocol for SMP system buses
US5946709A (en) 1997-04-14 1999-08-31 International Business Machines Corporation Shared intervention protocol for SMP bus using caches, snooping, tags and prioritizing
US5966729A (en) 1997-06-30 1999-10-12 Sun Microsystems, Inc. Snoop filter for use in multiprocessor computer systems
US6018791A (en) 1997-04-14 2000-01-25 International Business Machines Corporation Apparatus and method of maintaining cache coherency in a multi-processor computer system with global and local recently read states
US6021468A (en) 1997-04-14 2000-02-01 International Business Machines Corporation Cache coherency protocol with efficient write-through aliasing
US6023747A (en) * 1997-12-17 2000-02-08 International Business Machines Corporation Method and system for handling conflicts between cache operation requests in a data processing system
US6029204A (en) 1997-03-13 2000-02-22 International Business Machines Corporation Precise synchronization mechanism for SMP system buses using tagged snoop operations to avoid retries
US6058456A (en) 1997-04-14 2000-05-02 International Business Machines Corporation Software-managed programmable unified/split caching mechanism for instructions and data
US6195729B1 (en) 1998-02-17 2001-02-27 International Business Machines Corporation Deallocation with cache update protocol (L2 evictions)


Non-Patent Citations (3)

Title
Handy, Jim; The Cache Memory Book; Academic Press, Inc.; 1993; pp. 77-82.
Lebeck et al., Request combining in multiprocessors with arbitrary interconnection networks, IEEE digital library, vol. 5, No. 11, pp. 1140-1155, Nov. 1994.*
Texas Instruments Incorporated, TMS32010 User's Guide, 1983, 3 pages.

Cited By (43)

Publication number Priority date Publication date Assignee Title
US20040091966A1 (en) * 1999-08-30 2004-05-13 Martin Zeidler Polypeptide regulation by conditional inteins
US6973520B2 (en) 2002-07-11 2005-12-06 International Business Machines Corporation System and method for providing improved bus utilization via target directed completion
US20040010644A1 (en) * 2002-07-11 2004-01-15 International Business Machines Corporation System and method for providing improved bus utilization via target directed completion
US7577793B2 (en) * 2006-01-19 2009-08-18 International Business Machines Corporation Patrol snooping for higher level cache eviction candidate identification
US20070168617A1 (en) * 2006-01-19 2007-07-19 International Business Machines Corporation Patrol snooping for higher level cache eviction candidate identification
TWI417723B (en) * 2006-01-19 2013-12-01 Ibm Method for cache line replacement
US8341635B2 (en) 2008-02-01 2012-12-25 International Business Machines Corporation Hardware wake-and-go mechanism with look-ahead polling
US8225120B2 (en) 2008-02-01 2012-07-17 International Business Machines Corporation Wake-and-go mechanism with data exclusivity
US20090199029A1 (en) * 2008-02-01 2009-08-06 Arimilli Ravi K Wake-and-Go Mechanism with Data Monitoring
US8880853B2 (en) 2008-02-01 2014-11-04 International Business Machines Corporation CAM-based wake-and-go snooping engine for waking a thread put to sleep for spinning on a target address lock
US8788795B2 (en) 2008-02-01 2014-07-22 International Business Machines Corporation Programming idiom accelerator to examine pre-fetched instruction streams for multiple processors
US8732683B2 (en) 2008-02-01 2014-05-20 International Business Machines Corporation Compiler providing idiom to idiom accelerator
US8725992B2 (en) 2008-02-01 2014-05-13 International Business Machines Corporation Programming language exposing idiom calls to a programming idiom accelerator
US20100293341A1 (en) * 2008-02-01 2010-11-18 Arimilli Ravi K Wake-and-Go Mechanism with Exclusive System Bus Response
US20100293340A1 (en) * 2008-02-01 2010-11-18 Arimilli Ravi K Wake-and-Go Mechanism with System Bus Response
US20110173423A1 (en) * 2008-02-01 2011-07-14 Arimilli Ravi K Look-Ahead Hardware Wake-and-Go Mechanism
US20110173419A1 (en) * 2008-02-01 2011-07-14 Arimilli Ravi K Look-Ahead Wake-and-Go Engine With Speculative Execution
US20110173417A1 (en) * 2008-02-01 2011-07-14 Arimilli Ravi K Programming Idiom Accelerators
US8015379B2 (en) 2008-02-01 2011-09-06 International Business Machines Corporation Wake-and-go mechanism with exclusive system bus response
US8640142B2 (en) 2008-02-01 2014-01-28 International Business Machines Corporation Wake-and-go mechanism with dynamic allocation in hardware private array
US8127080B2 (en) 2008-02-01 2012-02-28 International Business Machines Corporation Wake-and-go mechanism with system address bus transaction master
US8640141B2 (en) 2008-02-01 2014-01-28 International Business Machines Corporation Wake-and-go mechanism with hardware private array
US8145849B2 (en) 2008-02-01 2012-03-27 International Business Machines Corporation Wake-and-go mechanism with system bus response
US8171476B2 (en) 2008-02-01 2012-05-01 International Business Machines Corporation Wake-and-go mechanism with prioritization of threads
US8612977B2 (en) 2008-02-01 2013-12-17 International Business Machines Corporation Wake-and-go mechanism with software save of thread state
US20090199030A1 (en) * 2008-02-01 2009-08-06 Arimilli Ravi K Hardware Wake-and-Go Mechanism for a Data Processing System
US20090199197A1 (en) * 2008-02-01 2009-08-06 International Business Machines Corporation Wake-and-Go Mechanism with Dynamic Allocation in Hardware Private Array
US8250396B2 (en) 2008-02-01 2012-08-21 International Business Machines Corporation Hardware wake-and-go mechanism for a data processing system
US8312458B2 (en) 2008-02-01 2012-11-13 International Business Machines Corporation Central repository for wake-and-go mechanism
US8316218B2 (en) 2008-02-01 2012-11-20 International Business Machines Corporation Look-ahead wake-and-go engine with speculative execution
US20090199184A1 (en) * 2008-02-01 2009-08-06 Arimilli Ravi K Wake-and-Go Mechanism With Software Save of Thread State
US8386822B2 (en) 2008-02-01 2013-02-26 International Business Machines Corporation Wake-and-go mechanism with data monitoring
US8452947B2 (en) 2008-02-01 2013-05-28 International Business Machines Corporation Hardware wake-and-go mechanism and content addressable memory with instruction pre-fetch look-ahead to detect programming idioms
US8516484B2 (en) 2008-02-01 2013-08-20 International Business Machines Corporation Wake-and-go mechanism for a data processing system
US8180970B2 (en) 2008-02-22 2012-05-15 International Business Machines Corporation Least recently used (LRU) compartment capture in a cache memory system
US20090216955A1 (en) * 2008-02-22 2009-08-27 International Business Machines Corporation Method, system and computer program product for lru compartment capture
US8230201B2 (en) 2009-04-16 2012-07-24 International Business Machines Corporation Migrating sleeping and waking threads between wake-and-go mechanisms in a multiple processor data processing system
US8145723B2 (en) 2009-04-16 2012-03-27 International Business Machines Corporation Complex remote update programming idiom accelerator
US8082315B2 (en) 2009-04-16 2011-12-20 International Business Machines Corporation Programming idiom accelerator for remote update
US20100268790A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Complex Remote Update Programming Idiom Accelerator
US20100268791A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Programming Idiom Accelerator for Remote Update
US20100269115A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Managing Threads in a Wake-and-Go Engine
US8886919B2 (en) 2009-04-16 2014-11-11 International Business Machines Corporation Remote update programming idiom accelerator with allocated processor resources

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ARIMILLI, RAVI K.;DODSON, JOHN S.;GUTHRIE, GUY L.;AND OTHERS;REEL/FRAME:010158/0134

Effective date: 19990803

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20100129