US5787243A - Main memory system and checkpointing protocol for fault-tolerant computer system - Google Patents
Main memory system and checkpointing protocol for fault-tolerant computer system
- Publication number
- US5787243A (application US08/674,660)
- Authority
- US
- United States
- Prior art keywords
- memory
- buffer memory
- cache
- data
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1405—Saving, restoring, recovering or retrying at machine instruction level
- G06F11/1407—Checkpointing the instruction stream
Definitions
- This invention provides a mechanism for maintaining a consistent state in main memory without constraining normal computer operation, thereby enabling a computer system to recover from faults without loss of data integrity or processing continuity.
- a processor and input/output elements are connected to a main memory via a memory bus.
- a shadow memory element, which includes a buffer memory and a main storage element, is also attached to this memory bus.
- data written to primary memory is also captured by the buffer memory of the shadow memory element.
- the data previously captured in the buffer memory is then copied to the main storage element of the shadow memory element.
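The capture-then-copy behavior summarized above can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented hardware: the class name `ShadowMemoryElement`, its methods `capture` and `commit`, and the helper `write` are hypothetical, and memory is modeled as a Python dict keyed by physical address.

```python
class ShadowMemoryElement:
    """Toy model of a shadow memory element (buffer memory plus main storage)."""

    def __init__(self):
        self.buffer = []        # (address, data) pairs captured since the last checkpoint
        self.main_storage = {}  # last consistent (checkpointed) state

    def capture(self, address, data):
        # Every write to primary memory is also captured here, in arrival order.
        self.buffer.append((address, data))

    def commit(self):
        # At a checkpoint, replay the captured writes into main storage in order,
        # so that a later write to the same address wins.
        for address, data in self.buffer:
            self.main_storage[address] = data
        self.buffer.clear()


primary = {}
shadow = ShadowMemoryElement()


def write(address, data):
    # A bus write updates primary memory and is captured by the shadow's buffer.
    primary[address] = data
    shadow.capture(address, data)


write(0x10, "a")
write(0x10, "b")
shadow.commit()     # checkpoint: shadow main storage now reflects both writes
write(0x20, "c")    # captured, but not yet part of any checkpoint
```

In this model the shadow's main storage only ever changes at `commit`, which is what lets it serve as a consistent state to fall back on.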
- FIG. 1 is a block diagram of a fault tolerant computer system which uses the main memory structure of the present invention;
- FIG. 2 is a block diagram illustrating in more detail a processing unit with a cache and a shadowed main memory;
- FIG. 3 is a more detailed block diagram of the shadow memory shown in FIG. 2;
- FIG. 4 is a more detailed block diagram of the memory control logic shown in FIG. 3;
- FIG. 5 is a diagram of memory locations used by the processing units to maintain main memory consistency; and
- FIG. 6 is a flowchart describing how each processing unit controls flushing of its cache to maintain main memory consistency.
- a computer system must guarantee the existence of a consistent state in main memory (i.e., a "checkpoint") to which all application programs can return following a fault if it is to be able to recover transparently from the fault. It is highly desirable for the computer system to provide this capability without placing any special requirements on application programs using it. In some currently available computer systems, a consistent state is assured by storing all modifiable data in two physically disjoint locations (a primary location and a shadow location) in main memory. U.S. Pat. Nos. 4,654,819 and 4,819,154 further describe such a computer system.
- each processor in the computer system must have a blocking cache; that is, the processor cannot write any cache line back to main memory unless it writes back all currently modified lines at the same time.
- any cache overflow or request for data in the cache from another processor forces the processor to flush the entire cache.
- FIG. 1 is a block diagram of one embodiment of a fault tolerant computer system 11 embodying the invention.
- One or more processing elements 14 and 16 are connected to one or more main memory systems 18 and 20 via one or more buses 10 and 12.
- One or more input/output (I/O) subsystems 22 and 24 are also connected to the bus 10 (12).
- Each I/O subsystem comprises an input/output (I/O) element 26 (28) and one or more buses 30 and 32 (34 and 36).
- An I/O element 26 (28) may also be connected to any standard I/O bus 38 (40), such as a VME bus.
- each processing element (e.g., 14) includes a processing unit 44 connected to a cache 42. This connection also connects the processing unit 44 and the cache 42 to the bus 10.
- the processing unit 44 may be any standard microprocessor unit (MPU). For example, the PENTIUM microprocessor, available from Intel Corporation, is suitable for this purpose.
- the processing unit 44 operates in accordance with any suitable operating system, as is conventional.
- a processing element 14 may include dual processing units 44 for self-checking purposes.
- the cache 42 is either a write-through or a write-back type of cache and has an arbitrary size and associativity.
- the processing unit 44 may store in the cache 42 either data only or both computer program instructions and data.
- an additional similar instruction cache 43 may be connected to the processing unit 44 for the processing unit 44 to store computer program instructions. This connection also connects the instruction cache 43 to the bus 10. If this system is a symmetric multiprocessing computer system, each processing unit 44 may use any conventional mechanism to maintain cache coherency, such as bus snooping.
- the cache 42 is connected to a main memory system, e.g., 18, via bus 10.
- the main memory system includes a primary memory element (PME) 46 and a shadow memory element (SME) 48 which are interconnected and connected to bus 10.
- PME: primary memory element
- SME: shadow memory element
- the SME 48 includes a buffer memory 52 and a main storage element 50, each having data inputs, data outputs and control inputs including access control and address inputs.
- the buffer memory and main storage element are typically implemented as dynamic, volatile, random-access memories (DRAMs), in the form of integrated circuits, typically, single in-line memory modules (SIMMs).
- a bus transceiver 55 connects the inputs of a data input buffer 54 and data outputs of the main storage element 50 to bus 10. Outputs of the data input buffer 54 are connected to the data inputs of buffer memory 52 and the data inputs of main storage element 50.
- the data outputs of the buffer memory 52 are also connected to the data inputs of the main storage element 50.
- Memory control logic 58 has control outputs which are connected to control inputs of each of the buffer memory 52, main storage element 50, data input buffer 54 and bus transceiver 55 to control the flow of data among those elements, in a manner which is described below.
- Memory control logic 58 also has data paths connected to bus 10 through the bus transceiver 55, a first address input connected to the address portion of bus 10 via bus transceiver 57 and a second address input connected to the data outputs of an address buffer memory 56.
- the address buffer memory 56 is also connected to outputs of an address input buffer 59, the inputs of which are connected to the address portion of bus 10 via bus transceiver 57. Both bus transceiver 57 and address input buffer 59 have a control input connected to the memory control logic 58.
- the memory control logic 58 also controls storage in the address buffer memory 56 of addresses which correspond to data stored in the buffer memory 52, in a manner which is described below.
- the non-memory logic elements may be implemented using conventional circuitry, custom or semi-custom integrated circuits or programmable gate arrays.
- the PMEs 46 may also have the same structure as the SMEs 48.
- the buffer memory 52 in a memory element used as a PME 46 may store computer program instructions or read-only data which does not have to be shadowed.
- the memory control logic 58 in a memory element is preferably programmable to enable the memory element to be either a PME or an SME.
- the buffer memory 52 should be large enough to capture all data modified between any pair of cache flushes. Given the process described below for using this system, the total capacity of all of the buffer memories 52 combined in computer system 11 should preferably be (at least) larger than the combined capacity of the caches 42 in the computer system 11.
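The sizing guideline above amounts to a simple capacity comparison. A minimal sketch, assuming capacities are given in a common unit; the function name `buffers_cover_caches` and the example sizes are hypothetical:

```python
def buffers_cover_caches(buffer_sizes, cache_sizes):
    """Return True if the combined buffer-memory capacity exceeds the combined cache capacity."""
    return sum(buffer_sizes) > sum(cache_sizes)


# Hypothetical configuration, sizes in KiB: two shadow buffers, two processor caches.
ok = buffers_cover_caches([512, 512], [256, 256])
too_small = buffers_cover_caches([256], [256, 256])
```

The check is deliberately strict (`>` rather than `>=`) to match the stated preference that the buffers be larger than the caches combined.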
- the memory control logic 58 is illustrated in more detail in FIG. 4. It includes a command register 68 which has a data input connected to bus 10 via the bus transceiver 55. A status register 66 has an output also connected to the bus 10 via bus transceiver 55. Buffer memory control circuit 60 and main storage control circuit 62 provide the row address strobe (RAS), column address strobe (CAS), row and column addresses and write enable (WE) control signals to the buffer memory 52 and main storage element 50, respectively. Control circuits 60 and 62 also have connections for coordinating data transfer between buffer memory 52 and main storage element 50. Buffer memory control circuit 60 has an output connected to the input of the status register 66 to indicate how full the buffer memory 52 is and whether copying from the buffer memory 52 to main storage element 50 is complete.
- RAS: row address strobe
- CAS: column address strobe
- WE: write enable
- Buffer memory control circuit 60 also has an input connected to the output of command register 68 which indicates whether it should copy data between the buffer memory 52 and the main storage element 50.
- the command register also indicates whether the memory element is a primary memory element or a shadow memory element.
- An I/O interface control 64 controls the flow of information through the status register 66 and command register 68, and coordinates data transfers through the bus transceivers 55 and 57 with the buffer memory control circuit 60 and main storage control circuit 62.
- the I/O interface control 64 also accepts inputs from the address portion of bus 10, so as to recognize addresses to the command and status registers and to the main memory system itself.
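The roles of the command and status registers can be illustrated with a small encoding sketch. The text only says what the registers indicate; the bit positions, the names `Command`, `encode_status`, and `decode_status`, and the sixteen-step fill level below are all assumptions made for illustration.

```python
from enum import IntFlag


class Command(IntFlag):
    # Bit positions are illustrative assumptions, not taken from the patent.
    START_COPY = 0x1  # copy buffer memory contents into the main storage element
    IS_SHADOW = 0x2   # set: behave as a shadow memory element; clear: primary


def encode_status(fill_fraction, copy_complete):
    """Pack a hypothetical status byte: bit 7 = copy complete, bits 0-3 = fill level in sixteenths."""
    level = min(15, int(fill_fraction * 16))
    return (0x80 if copy_complete else 0) | level


def decode_status(status):
    """Unpack the hypothetical status byte produced by encode_status."""
    return {"copy_complete": bool(status & 0x80), "fill_level": status & 0x0F}


command = Command.START_COPY | Command.IS_SHADOW
status = encode_status(0.5, copy_complete=True)
```

A processing unit polling such a status value could compare `fill_level` against a threshold to decide whether to initiate a flush.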
- This process allows data to be passed from one processing element 14 to another processing element 16 without requiring the entire cache 42 of processing element 14 to be flushed. Since all processing units 44 in the computer system 11 have access to all buses, each processing unit 44 may use conventional bus snooping methods to assure cache coherency. If not all processing units 44 have access to all system buses, the processing units 44 may use other well-known cache coherency techniques instead.
- each shadow memory element 48 allows consistency to be maintained in the main memory system 18 in the event of a fault. All data lines that are stored in one primary memory element 46 are also stored in the buffer memory 52, along with their corresponding memory (physical) addresses which are stored in the associated address buffer memory 56 in the shadow memory element 48.
- the protocol also applies to lines written to the primary memory element 46 when a cache 42 is flushed by the operating system using either specially designed flushing hardware or conventional cache flushing processor instructions. Flushing operations by the processing units 44 are synchronized. When all processing units 44 have completed their flush, the operating system instructs the shadow memory element 48, using command register 68, to copy the contents of the buffer memory 52 into its main storage element 50 using main storage control circuit 62. To maintain consistency, once a processing element 14 has begun a flush, it cannot resume normal operation until all other processing elements 14 have completed their flushes.
- Processor cache flushing is synchronized because the buffer memory needs to know which data should be copied to the main storage element 50, and which data should not. That is, the buffer memory needs to distinguish between post-flush and pre-flush data. Thus, if the buffer does not know which processor is sending data, all processors must complete their flushes before normal operation can resume in order to maintain consistency. Synchronization is preferably controlled using a test-and-set lock operation using a designated location in main memory 18, such as indicated at 80 in FIG. 5, to store the lock value. At periodic intervals, each processing unit 44 determines whether it should initiate a flush operation as indicated at step 90 in FIG. 6. The processing unit 44 can make this determination in a number of different ways.
- one or more bits in the status register 66 of the shadow memory element 48 could be used to indicate the remaining capacity of the buffer memory 52. If the buffer memory 52 is too full, a processing unit 44 initiates a flush. Also, a flush may be initiated after a fixed period of time has elapsed. If this processing unit 44 does not need to initiate a flush, then it examines the designated memory location 80 to determine whether another processing unit 44 has already set the lock (step 92). If the lock is not set, this process ends as indicated at 94. Otherwise, if the lock is set, this processing unit 44 flushes its cache 42 in step 96.
- the effect of the flushing operation is to store all lines in the cache (or preferably only those lines that have been modified since the last flush) to the primary memory element 46, and, because of the aforementioned properties of the shadow memory element 48, to the buffer memory 52 of the shadow memory element 48 as well.
- the processing unit 44 saves its state in the cache 42 so that this information is flushed as well.
- If the processing unit 44 determines in step 90 that it should initiate a flush, it then determines whether the lock is already set in step 98, similar to step 92. If the lock is already set, the processing unit 44 continues by flushing its cache 42 in step 96. Otherwise, it sets the lock in step 100, and identifies itself as the initiator of the flush before flushing its cache 42.
- After each processing unit 44 flushes its cache 42 in step 96, it increments its corresponding flush counter in step 102.
- each processing unit 44 has a flush counter, such as shown at 82 and 84, which are predetermined designated locations in main memory 18.
- the processing unit 44 determines whether it is the initiator of this flush sequence (step 104). If it is not the initiator, it then waits until the lock is released in step 106. When the lock is released, this process ends in step 108 and the processing unit 44 may resume normal operations.
- If the processing unit 44 is the initiator of the flush as determined in step 104, it then waits until all flush counters (82-84) are incremented in step 110. Once all flush counters have been incremented, this processing unit 44 instructs the shadow memory element 48 to begin copying data in the buffer memory 52 into the main storage element 50, by sending a command to the command register 68, and releases the lock (step 112). Receipt of the command notifies the shadow memory element 48 that the flush has completed and causes the buffer memory control 60 in conjunction with the main storage control 62 to move the data that was stored in the buffer memory 52 into the appropriate locations (as determined by the corresponding physical address stored in address buffer memory 56) in the main storage element 50. Once this command has been sent, the flush lock is released and the processing units 44 can resume normal processing.
- the loops around steps 106 and 110 should have time-out protection which triggers fault recovery procedures, in the event of a failure during flushing operations.
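The synchronized flush of FIG. 6 (steps 90 through 112) can be sketched with threads standing in for processing units. This is a simplified software model under stated assumptions: the names `SharedMemory`, `wait_until`, `checkpoint_step`, and `run_demo` are hypothetical, the copy command to the shadow memory element is reduced to a counter, the initiator treats "all counters incremented" as "all counters equal my own", and the demo runs a single flush only.

```python
import threading
import time


class SharedMemory:
    """Designated main-memory locations: the lock (80) and per-processor flush counters (82-84)."""

    def __init__(self, n):
        self._guard = threading.Lock()  # makes test-and-set atomic in this model
        self.lock_set = False
        self.counters = [0] * n
        self.commits = 0                # times the shadow was told to copy its buffer

    def test_and_set(self):
        with self._guard:
            was_set = self.lock_set
            self.lock_set = True
            return was_set


def wait_until(predicate, timeout=2.0):
    # Time-out protection for the loops around steps 106 and 110.
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise TimeoutError("flush did not complete; start fault recovery")
        time.sleep(0.001)


def checkpoint_step(mem, pid, wants_flush, flush_cache):
    """One pass through FIG. 6 for processor `pid`. Returns True if it took part in a flush."""
    if wants_flush:                                  # step 90
        initiator = not mem.test_and_set()           # steps 98 and 100
    else:
        if not mem.lock_set:                         # step 92
            return False                             # step 94
        initiator = False
    flush_cache(pid)                                 # step 96
    mem.counters[pid] += 1                           # step 102
    if initiator:                                    # step 104
        target = mem.counters[pid]
        wait_until(lambda: all(c == target for c in mem.counters))  # step 110
        mem.commits += 1                             # step 112: command the shadow to copy
        mem.lock_set = False                         #           and release the lock
    else:
        wait_until(lambda: not mem.lock_set)         # step 106
    return True                                      # step 108


def run_demo(n=3):
    mem = SharedMemory(n)
    flushed = []

    def worker(pid):
        # Processor 0 decides to initiate; the others poll until they see the lock set.
        while not checkpoint_step(mem, pid, wants_flush=(pid == 0),
                                  flush_cache=flushed.append):
            time.sleep(0.001)

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem, sorted(flushed)
```

Because the initiator holds the lock until every counter has advanced, no processor in this model can miss the flush: the lock stays visible to late pollers until they have flushed too.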
- the buffer memory control 60 can distinguish between pre-flush and post-flush data, for example, by storing the last address of buffer memory 52 in which data is stored at the end of each synchronized flushing operation. There are other ways to identify such a boundary, for example by monitoring the addresses of buffer memory 52 from which data has been copied, or by counting how much data has been written to buffer memory 52. All data stored in addresses in buffer memory 52 between the address stored for the (i-1)th flush and the address stored for the ith flush is pre-flush data. Any data stored in an address outside of that range is post-flush data which is not copied to the main storage element 50. Any (i+1)th flush data may be placed in any area of the buffer memory 52 which has been copied to the main storage element.
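One way to picture the boundary bookkeeping is as two indices into an append-only log. This is a minimal sketch, assuming the buffer is a simple list rather than a reusable memory region; the class `BufferMemory` and its attributes `copied_upto` and `flush_end` are hypothetical names for the stored (i-1)th and ith flush boundaries.

```python
class BufferMemory:
    """Toy buffer memory that separates pre-flush from post-flush data by position."""

    def __init__(self):
        self.entries = []     # (address, data) in arrival order
        self.copied_upto = 0  # boundary stored for the (i-1)th flush
        self.flush_end = 0    # boundary stored for the ith flush

    def write(self, address, data):
        self.entries.append((address, data))

    def mark_flush_complete(self):
        # Record the last buffer position written when the synchronized flush ends.
        self.flush_end = len(self.entries)

    def copy_to(self, main_storage):
        # Copy only pre-flush data: the entries between the two stored boundaries.
        for address, data in self.entries[self.copied_upto:self.flush_end]:
            main_storage[address] = data
        self.copied_upto = self.flush_end


buf = BufferMemory()
main = {}
buf.write(1, "a")
buf.write(2, "b")
buf.mark_flush_complete()  # end of the ith flush
buf.write(1, "c")          # post-flush data: arrives before the copy finishes
buf.copy_to(main)          # copies only the two pre-flush entries
```

The post-flush write to address 1 stays in the buffer and reaches main storage only after the next flush boundary is recorded.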
- Following a fault, the contents of the shadow memory element 48 can either be copied to the corresponding primary memory element 46, if it is still operational, or the shadow memory element 48 can take over the role of the primary memory element 46. In either event, normal processing can resume from that saved state.
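The two recovery options can be written out as a short decision function. This is a sketch under the assumption that memory images are Python dicts; the function name `recover` and the flag `primary_operational` are hypothetical.

```python
def recover(primary, shadow_main_storage, primary_operational):
    """Return the memory image from which processing resumes after a fault."""
    if primary_operational:
        # Restore the last checkpoint into the primary memory element.
        primary.clear()
        primary.update(shadow_main_storage)
        return primary
    # Otherwise the shadow memory element takes over the primary's role.
    return shadow_main_storage


primary = {1: "x", 2: "uncheckpointed"}
shadow = {1: "x"}
resumed = recover(primary, shadow, primary_operational=True)
```

Either branch yields the same saved state; the difference is only which memory element serves it afterwards.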
- Overflow of a buffer memory 52 is also not fatal.
- the contents of the associated shadow memory element 48 can always be restored by copying the contents of its associated primary memory element 46. Since the system may not be able to recover from a fault during this interval, however, it is important that the probability of such an overflow be kept to a minimum.
- This checkpointing protocol allows data to be written to a primary memory element 46 at any time. Consequently, a single cache line can be written to a primary memory element 46 without forcing the entire cache 42 to be flushed, thereby relaxing the requirement for a large, associative cache. Further, data can be passed from cache 42 of one processing unit 44 to cache 42 of another processing unit 44 so long as it is simultaneously updated in the primary memory element 46 and in the buffer memory 52 in the shadow memory element 48. Significant performance advantages can be obtained using this protocol in a multiprocessing system in which shared data is frequently passed from one processing element (e.g., 14) to another processing element (e.g., 16).
- a shadow memory element 48 remains passive so far as the bus 10 is concerned. It simply stores in its buffer memory 52 all data written to its corresponding primary memory element 46. In order for the shadow memory element 48 to accept data synchronously with the primary memory element 46, the data input buffer 54 temporarily stores the data, because a line may be in the process of being copied from the buffer memory 52 to the main storage 50 at the time of the write.
- Some performance advantage can be gained if certain non-standard bus protocols are also implemented. For example, if the bus protocol allows the shadow memory element 48 to distinguish between processing elements 14, or at least to identify whether a line being stored has been written by a processing element 14 that has completed its ith flush or is still executing its ith flush, a processing element 14 does not have to wait until all other processing elements have completed their flushes before it resumes normal operation. In this case, consistency is maintained in main memory by requiring a processing element 14 to suspend normal operation after completing its ith flush, only until all other processing elements 16 have also at least begun (but not necessarily completed) their ith flushes. This relaxed synchronization restriction still achieves checkpoint consistency.
- This less restrictive synchronization protocol can be allowed if the logic associated with the buffer memory 52 can distinguish between data that is being written as part of the flushing operation (and hence must be stored in the part of the buffer memory 52 that is to be stored to the main storage element 50 as soon as all processing elements 14 have completed their flushes) and data that is being written by a processing element 14 that has completed its flush (and hence is not to be transferred to main storage element 50 until the next flush is completed).
- To implement this less restrictive synchronization, the order and placement of steps 96 and 102 in FIG. 6 are reversed.
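The relaxed protocol depends on the buffer logic knowing which processing element issued each write. A minimal sketch of that bookkeeping, assuming each write carries a processor id; the class `TaggedBuffer` and its fields are hypothetical, and the model does not enforce the rule that a finished processor waits until all others have begun their flush.

```python
class TaggedBuffer:
    """Toy shadow buffer that sorts writes by whether the writer has finished its flush."""

    def __init__(self, n_processors):
        self.n = n_processors
        self.done = set()      # processors that have completed the current flush
        self.current = []      # writes belonging to the checkpoint being formed
        self.pending = []      # post-flush writes, held for the next checkpoint
        self.main_storage = {}

    def write(self, pid, address, data):
        target = self.pending if pid in self.done else self.current
        target.append((address, data))

    def flush_complete(self, pid):
        self.done.add(pid)
        if len(self.done) == self.n:
            # All flushes are complete: commit this checkpoint, then start the next one.
            for address, data in self.current:
                self.main_storage[address] = data
            self.current, self.pending = self.pending, []
            self.done.clear()


buf = TaggedBuffer(2)
buf.write(0, 1, "a")
buf.flush_complete(0)     # processor 0 finishes first and resumes
buf.write(0, 2, "post")   # held back: written after processor 0's flush
buf.write(1, 3, "b")      # still part of processor 1's flush
buf.flush_complete(1)     # last flush completes, checkpoint is committed
```

Here processor 0 keeps running while processor 1 is still flushing, yet its new write stays out of the committed checkpoint.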
- Non-standard bus protocol features, while also not necessary to support memory consistency, can be introduced to decrease recovery times following a fault by reducing memory-to-memory copy time.
- Two such features are the ability to support "dual-write" and "copy" memory access modes. If a line is stored in dual-write mode, both the primary memory element 46 and the shadow memory element 48 store the line in the main storage element 50. (Thus, the shadow memory element 48 does not store this data in the associated buffer memory 52.) In copy mode, the primary memory element 46 sources the addressed line and the shadow memory element 48 stores the resulting data to the corresponding location in the main storage element 50.
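The difference between the access modes can be shown side by side. A minimal sketch, assuming dict-based memories and a list-based shadow buffer; the function name `store` and the mode string `"normal"` for ordinary shadowed writes are hypothetical.

```python
def store(mode, address, data, primary, shadow_buffer, shadow_main):
    """Apply one bus write in the given access mode."""
    if mode == "normal":
        primary[address] = data
        shadow_buffer.append((address, data))    # captured until the next checkpoint
    elif mode == "dual-write":
        primary[address] = data
        shadow_main[address] = data              # bypasses the shadow's buffer memory
    elif mode == "copy":
        shadow_main[address] = primary[address]  # primary sources the line; `data` is ignored
    else:
        raise ValueError(f"unknown access mode: {mode}")


primary, shadow_buffer, shadow_main = {}, [], {}
store("normal", 1, "a", primary, shadow_buffer, shadow_main)
store("dual-write", 2, "b", primary, shadow_buffer, shadow_main)
store("copy", 1, None, primary, shadow_buffer, shadow_main)
```

Both special modes write straight to the shadow's main storage, which is why they shorten memory-to-memory copying during recovery.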
- It may also be useful to provide the capability for a memory element to operate in a "phantom mode" in which it acts like a primary memory element for accesses over some designated range of addresses, but like a shadow memory element for all other addresses. This mode allows the computer system 11 to operate with some PMEs 46 shadowed and others unshadowed. Such a feature may be useful, for example, when a portion of the primary memory has failed and no replacement is immediately available, but the remainder of primary memory is still functioning normally.
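Phantom mode reduces to an address-range test that selects the element's role per access. A minimal sketch, assuming the designated range is a Python `range` of addresses; the function name `role_for` and the example range are hypothetical.

```python
def role_for(address, primary_range):
    """Return the role a phantom-mode memory element plays for this address."""
    return "primary" if address in primary_range else "shadow"


# Hypothetical configuration: act as primary for the first 4 KiB, as shadow elsewhere.
PRIMARY_RANGE = range(0x0000, 0x1000)
inside = role_for(0x0100, PRIMARY_RANGE)
outside = role_for(0x2000, PRIMARY_RANGE)
```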
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/674,660 US5787243A (en) | 1994-06-10 | 1996-07-02 | Main memory system and checkpointing protocol for fault-tolerant computer system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25816594A | 1994-06-10 | 1994-06-10 | |
US08/674,660 US5787243A (en) | 1994-06-10 | 1996-07-02 | Main memory system and checkpointing protocol for fault-tolerant computer system |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US25816594A Continuation | | 1994-06-10 | 1994-06-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
US5787243A (en) | 1998-07-28 |
Family
ID=22979374
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/674,660 Expired - Lifetime US5787243A (en) | 1994-06-10 | 1996-07-02 | Main memory system and checkpointing protocol for fault-tolerant computer system |
Country Status (6)
Country | Link |
---|---|
US (1) | US5787243A (en) |
EP (1) | EP0764302B1 (en) |
JP (1) | JPH10506483A (en) |
AU (1) | AU2663095A (en) |
DE (1) | DE69506404T2 (en) |
WO (1) | WO1995034860A1 (en) |
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5878250A (en) * | 1997-04-07 | 1999-03-02 | Altera Corporation | Circuitry for emulating asynchronous register loading functions |
US5948081A (en) * | 1997-12-22 | 1999-09-07 | Compaq Computer Corporation | System for flushing queued memory write request corresponding to a queued read request and all prior write requests with counter indicating requests to be flushed |
US5958070A (en) * | 1995-11-29 | 1999-09-28 | Texas Micro, Inc. | Remote checkpoint memory system and protocol for fault-tolerant computer system |
US5968168A (en) * | 1996-06-07 | 1999-10-19 | Kabushiki Kaisha Toshiba | Scheduler reducing cache failures after check points in a computer system having check-point restart function |
US5991852A (en) * | 1996-10-28 | 1999-11-23 | Mti Technology Corporation | Cache ram using a secondary controller and switching circuit and improved chassis arrangement |
US6101599A (en) * | 1998-06-29 | 2000-08-08 | Cisco Technology, Inc. | System for context switching between processing elements in a pipeline of processing elements |
US6108707A (en) * | 1998-05-08 | 2000-08-22 | Apple Computer, Inc. | Enhanced file transfer operations in a computer system |
US6173386B1 (en) | 1998-12-14 | 2001-01-09 | Cisco Technology, Inc. | Parallel processor with debug capability |
US6298431B1 (en) * | 1997-12-31 | 2001-10-02 | Intel Corporation | Banked shadowed register file |
US6438661B1 (en) | 1999-03-03 | 2002-08-20 | International Business Machines Corporation | Method, system, and program for managing meta data in a storage system and rebuilding lost meta data in cache |
US6493795B1 (en) * | 1998-12-30 | 2002-12-10 | Emc Corporation | Data storage system |
KR100365891B1 (en) * | 2000-12-13 | 2002-12-27 | 한국전자통신연구원 | Backup/recovery Apparatus and method for non-log processing of real-time main memory database system |
US6502174B1 (en) | 1999-03-03 | 2002-12-31 | International Business Machines Corporation | Method and system for managing meta data |
US6505269B1 (en) | 2000-05-16 | 2003-01-07 | Cisco Technology, Inc. | Dynamic addressing mapping to eliminate memory resource contention in a symmetric multiprocessor system |
US6513097B1 (en) | 1999-03-03 | 2003-01-28 | International Business Machines Corporation | Method and system for maintaining information about modified data in cache in a storage system for use during a system failure |
US6513108B1 (en) | 1998-06-29 | 2003-01-28 | Cisco Technology, Inc. | Programmable processing engine for efficiently processing transient data |
US6622263B1 (en) | 1999-06-30 | 2003-09-16 | Jack Justin Stiffler | Method and apparatus for achieving system-directed checkpointing without specialized hardware assistance |
US6662252B1 (en) | 1999-11-03 | 2003-12-09 | Cisco Technology, Inc. | Group and virtual locking mechanism for inter processor synchronization |
US6862668B2 (en) | 2002-02-25 | 2005-03-01 | International Business Machines Corporation | Method and apparatus for using cache coherency locking to facilitate on-line volume expansion in a multi-controller storage system |
US20050076262A1 (en) * | 2003-09-23 | 2005-04-07 | Revivio, Inc. | Storage management device |
US6898700B2 (en) * | 1998-03-31 | 2005-05-24 | Intel Corporation | Efficient saving and restoring state in task switching |
US6920562B1 (en) | 1998-12-18 | 2005-07-19 | Cisco Technology, Inc. | Tightly coupled software protocol decode with hardware data encryption |
US20050165966A1 (en) * | 2000-03-28 | 2005-07-28 | Silvano Gai | Method and apparatus for high-speed parsing of network messages |
US20060242456A1 (en) * | 2005-04-26 | 2006-10-26 | Kondo Thomas J | Method and system of copying memory from a source processor to a target processor by duplicating memory writes |
EP1734534A2 (en) | 2005-05-19 | 2006-12-20 | Honeywell International Inc. | Interface between memories having different write times |
US20070150675A1 (en) * | 2005-12-22 | 2007-06-28 | International Business Machines Corporation | Validity of address ranges used in semi-synchronous memory copy operations |
US7239581B2 (en) | 2004-08-24 | 2007-07-03 | Symantec Operating Corporation | Systems and methods for synchronizing the internal clocks of a plurality of processor modules |
US20070180312A1 (en) * | 2006-02-01 | 2007-08-02 | Avaya Technology Llc | Software duplication |
US7287133B2 (en) | 2004-08-24 | 2007-10-23 | Symantec Operating Corporation | Systems and methods for providing a modification history for a location within a data store |
US7296008B2 (en) | 2004-08-24 | 2007-11-13 | Symantec Operating Corporation | Generation and use of a time map for accessing a prior image of a storage device |
US7409587B2 (en) | 2004-08-24 | 2008-08-05 | Symantec Operating Corporation | Recovering from storage transaction failures using checkpoints |
US20080307182A1 (en) * | 2005-12-22 | 2008-12-11 | International Business Machines Corporation | Efficient and flexible memory copy operation |
US7480909B2 (en) | 2002-02-25 | 2009-01-20 | International Business Machines Corporation | Method and apparatus for cooperative distributed task management in a storage subsystem with multiple controllers using cache locking |
US7536583B2 (en) | 2005-10-14 | 2009-05-19 | Symantec Operating Corporation | Technique for timeline compression in a data store |
US20100077164A1 (en) * | 2005-12-13 | 2010-03-25 | Jack Justin Stiffler | Memory-controller-embedded apparatus and procedure for achieving system-directed checkpointing without operating-system kernel support |
US7725760B2 (en) | 2003-09-23 | 2010-05-25 | Symantec Operating Corporation | Data storage system |
US7730222B2 (en) | 2004-08-24 | 2010-06-01 | Symantec Operating System | Processing storage-related I/O requests using binary tree data structures |
US20100192041A1 (en) * | 2009-01-23 | 2010-07-29 | Micron Technology, Inc. | Memory devices and methods for managing error regions |
US7827362B2 (en) | 2004-08-24 | 2010-11-02 | Symantec Corporation | Systems, apparatus, and methods for processing I/O requests |
US7904428B2 (en) | 2003-09-23 | 2011-03-08 | Symantec Corporation | Methods and apparatus for recording write requests directed to a data store |
US7991748B2 (en) | 2003-09-23 | 2011-08-02 | Symantec Corporation | Virtual data store creation and use |
USRE45632E1 (en) * | 2005-01-03 | 2015-07-28 | O'shantel Software L.L.C. | Memory-controller-embedded apparatus and procedure for achieving system-directed checkpointing without operating-system kernel support |
US20160140044A1 (en) * | 2012-10-11 | 2016-05-19 | Soft Machines, Inc. | Systems and methods for non-blocking implementation of cache flush instructions |
US9858151B1 (en) * | 2016-10-03 | 2018-01-02 | International Business Machines Corporation | Replaying processing of a restarted application |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5745672A (en) * | 1995-11-29 | 1998-04-28 | Texas Micro, Inc. | Main memory system and checkpointing protocol for a fault-tolerant computer system using a read buffer |
US8812781B2 (en) * | 2005-04-19 | 2014-08-19 | Hewlett-Packard Development Company, L.P. | External state cache for computer processor |
Citations (87)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2028517A1 (en) * | 1969-01-17 | 1970-10-09 | Plessey Btr Ltd | |
US3588829A (en) * | 1968-11-14 | 1971-06-28 | Ibm | Integrated memory system with block transfer to a buffer store |
US3736566A (en) * | 1971-08-18 | 1973-05-29 | Ibm | Central processing unit with hardware controlled checkpoint and retry facilities |
US3761881A (en) * | 1971-06-30 | 1973-09-25 | Ibm | Translation storage scheme for virtual memory system |
US3803560A (en) * | 1973-01-03 | 1974-04-09 | Honeywell Inf Systems | Technique for detecting memory failures and to provide for automatically for reconfiguration of the memory modules of a memory system |
US3864670A (en) * | 1970-09-30 | 1975-02-04 | Yokogawa Electric Works Ltd | Dual computer system with signal exchange system |
US3889237A (en) * | 1973-11-16 | 1975-06-10 | Sperry Rand Corp | Common storage controller for dual processor system |
US3979726A (en) * | 1974-04-10 | 1976-09-07 | Honeywell Information Systems, Inc. | Apparatus for selectively clearing a cache store in a processor having segmentation and paging |
US4020466A (en) * | 1974-07-05 | 1977-04-26 | Ibm Corporation | Memory hierarchy system with journaling and copy back |
US4044337A (en) * | 1975-12-23 | 1977-08-23 | International Business Machines Corporation | Instruction retry mechanism for a data processing system |
US4164017A (en) * | 1974-04-17 | 1979-08-07 | National Research Development Corporation | Computer systems |
JPS5541528A (en) * | 1978-09-18 | 1980-03-24 | Fujitsu Ltd | Dual file system |
JPS55115121A (en) * | 1979-02-28 | 1980-09-04 | Nec Corp | Input and output control unit possible for duplicated recording |
US4228496A (en) * | 1976-09-07 | 1980-10-14 | Tandem Computers Incorporated | Multiprocessor system |
US4373179A (en) * | 1978-06-26 | 1983-02-08 | Fujitsu Limited | Dynamic address translation system |
JPS5831651A (en) * | 1981-08-20 | 1983-02-24 | Nec Corp | Restart processing system for electronic exchanger |
US4393500A (en) * | 1979-09-04 | 1983-07-12 | Fujitsu Fanuc Limited | Method of modifying data stored in non-volatile memory and testing for power failure occurring during modification |
US4403284A (en) * | 1980-11-24 | 1983-09-06 | Texas Instruments Incorporated | Microprocessor which detects leading 1 bit of instruction to obtain microcode entry point address |
US4413327A (en) * | 1970-06-09 | 1983-11-01 | The United States Of America As Represented By The Secretary Of The Navy | Radiation circumvention technique |
US4426682A (en) * | 1981-05-22 | 1984-01-17 | Harris Corporation | Fast cache flush mechanism |
US4435762A (en) * | 1981-03-06 | 1984-03-06 | International Business Machines Corporation | Buffered peripheral subsystems |
WO1984002409A1 (en) * | 1982-12-09 | 1984-06-21 | Sequoia Systems Inc | Memory backup system |
US4459658A (en) * | 1982-02-26 | 1984-07-10 | Bell Telephone Laboratories Incorporated | Technique for enabling operation of a computer system with a consistent state of a linked list data structure after a main memory failure |
US4484273A (en) * | 1982-09-03 | 1984-11-20 | Sequoia Systems, Inc. | Modular computer system |
US4503534A (en) * | 1982-06-30 | 1985-03-05 | Intel Corporation | Apparatus for redundant operation of modules in a multiprocessing system |
US4509554A (en) * | 1982-11-08 | 1985-04-09 | Failla William G | High and low pressure, quick-disconnect coupling |
US4566106A (en) * | 1982-01-29 | 1986-01-21 | Pitney Bowes Inc. | Electronic postage meter having redundant memory |
US4703481A (en) * | 1985-08-16 | 1987-10-27 | Hewlett-Packard Company | Method and apparatus for fault recovery within a computing system |
EP0260625A1 (en) * | 1986-09-19 | 1988-03-23 | Asea Ab | Method for bumpless changeover from active units to back-up units in computer equipment and a device for carrying out the method |
US4734855A (en) * | 1983-10-17 | 1988-03-29 | Inria-Institut National De Recherche En Informatique Et En Automatique | Apparatus and method for fast and stable data storage |
US4740969A (en) * | 1986-06-27 | 1988-04-26 | Hewlett-Packard Company | Method and apparatus for recovering from hardware faults |
FR2606184A1 (en) * | 1986-10-31 | 1988-05-06 | Thomson Csf | RECONFIGURABLE CALCULATION DEVICE |
US4751639A (en) * | 1985-06-24 | 1988-06-14 | Ncr Corporation | Virtual command rollback in a fault tolerant data processing system |
EP0299511A2 (en) * | 1987-07-15 | 1989-01-18 | Fujitsu Limited | Hot standby memory copy system |
US4819232A (en) * | 1985-12-17 | 1989-04-04 | Bbc Brown, Boveri & Company, Limited | Fault-tolerant multiprocessor arrangement |
US4819154A (en) * | 1982-12-09 | 1989-04-04 | Sequoia Systems, Inc. | Memory back up system with one cache memory and two physically separated main memories |
US4823261A (en) * | 1986-11-24 | 1989-04-18 | International Business Machines Corp. | Multiprocessor system for updating status information through flip-flopping read version and write version of checkpoint data |
US4905196A (en) * | 1984-04-26 | 1990-02-27 | Bbc Brown, Boveri & Company Ltd. | Method and storage device for saving the computer status during interrupt |
US4912707A (en) * | 1988-08-23 | 1990-03-27 | International Business Machines Corporation | Checkpoint retry mechanism |
US4924466A (en) * | 1988-06-30 | 1990-05-08 | International Business Machines Corp. | Direct hardware error identification method and apparatus for error recovery in pipelined processing areas of a computer system |
US4958273A (en) * | 1987-08-26 | 1990-09-18 | International Business Machines Corporation | Multiprocessor system architecture with high availability |
US4959774A (en) * | 1984-07-06 | 1990-09-25 | Ampex Corporation | Shadow memory system for storing variable backup blocks in consecutive time periods |
US4965719A (en) * | 1988-02-16 | 1990-10-23 | International Business Machines Corporation | Method for lock management, page coherency, and asynchronous writing of changed pages to shared external store in a distributed computing system |
US4996687A (en) * | 1988-10-11 | 1991-02-26 | Honeywell Inc. | Fault recovery mechanism, transparent to digital system function |
EP0441087A1 (en) * | 1990-02-08 | 1991-08-14 | International Business Machines Corporation | Checkpointing mechanism for fault-tolerant systems |
EP0457308A2 (en) * | 1990-05-18 | 1991-11-21 | Fujitsu Limited | Data processing system having an input/output path disconnecting mechanism and method for controlling the data processing system |
DE4136729A1 (en) * | 1990-11-05 | 1992-05-21 | Mitsubishi Electric Corp | Cache control unit for fault tolerant computer system |
US5157663A (en) * | 1990-09-24 | 1992-10-20 | Novell, Inc. | Fault tolerant computer system |
US5214652A (en) * | 1991-03-26 | 1993-05-25 | International Business Machines Corporation | Alternate processor continuation of task of failed processor |
US5239637A (en) * | 1989-06-30 | 1993-08-24 | Digital Equipment Corporation | Digital data management system for maintaining consistency of data in a shadow set |
US5247618A (en) * | 1989-06-30 | 1993-09-21 | Digital Equipment Corporation | Transferring data in a digital data processing system |
US5263144A (en) * | 1990-06-29 | 1993-11-16 | Digital Equipment Corporation | Method and apparatus for sharing data between processors in a computer system |
US5269017A (en) * | 1991-08-29 | 1993-12-07 | International Business Machines Corporation | Type 1, 2 and 3 retry and checkpointing |
US5271013A (en) * | 1990-05-09 | 1993-12-14 | Unisys Corporation | Fault tolerant computer system |
US5276848A (en) * | 1988-06-28 | 1994-01-04 | International Business Machines Corporation | Shared two level cache including apparatus for maintaining storage consistency |
US5293613A (en) * | 1991-08-29 | 1994-03-08 | International Business Machines Corporation | Recovery control register |
US5301309A (en) * | 1989-04-28 | 1994-04-05 | Kabushiki Kaisha Toshiba | Distributed processing system with checkpoint restart facilities wherein checkpoint data is updated only if all processors were able to collect new checkpoint data |
US5313647A (en) * | 1991-09-20 | 1994-05-17 | Kendall Square Research Corporation | Digital data processor with improved checkpointing and forking |
US5325517A (en) * | 1989-05-17 | 1994-06-28 | International Business Machines Corporation | Fault tolerant data processing system |
US5325519A (en) * | 1991-10-18 | 1994-06-28 | Texas Microsystems, Inc. | Fault tolerant computer with archival rollback capabilities |
US5327532A (en) * | 1990-05-16 | 1994-07-05 | International Business Machines Corporation | Coordinated sync point management of protected resources |
US5363503A (en) * | 1992-01-22 | 1994-11-08 | Unisys Corporation | Fault tolerant computer system with provision for handling external events |
US5369757A (en) * | 1991-06-18 | 1994-11-29 | Digital Equipment Corporation | Recovery logging in the presence of snapshot files by ordering of buffer pool flushing |
US5381544A (en) * | 1991-01-22 | 1995-01-10 | Hitachi, Ltd. | Copyback memory system and cache memory controller which permits access while error recovery operations are performed |
US5394542A (en) * | 1992-03-30 | 1995-02-28 | International Business Machines Corporation | Clearing data objects used to maintain state information for shared data at a local complex when at least one message path to the local complex cannot be recovered |
US5398331A (en) * | 1992-07-08 | 1995-03-14 | International Business Machines Corporation | Shared storage controller for dual copy shared data |
US5408649A (en) * | 1993-04-30 | 1995-04-18 | Quotron Systems, Inc. | Distributed data access system including a plurality of database access processors with one-for-N redundancy |
US5408636A (en) * | 1991-06-24 | 1995-04-18 | Compaq Computer Corp. | System for flushing first and second caches upon detection of a write operation to write protected areas |
US5418940A (en) * | 1993-08-04 | 1995-05-23 | International Business Machines Corporation | Method and means for detecting partial page writes and avoiding initializing new pages on DASD in a transaction management system environment |
US5418916A (en) * | 1988-06-30 | 1995-05-23 | International Business Machines | Central processing unit checkpoint retry for store-in and store-through cache systems |
US5420996A (en) * | 1990-04-27 | 1995-05-30 | Kabushiki Kaisha Toshiba | Data processing system having selective data save and address translation mechanism utilizing CPU idle period |
US5448719A (en) * | 1992-06-05 | 1995-09-05 | Compaq Computer Corp. | Method and apparatus for maintaining and retrieving live data in a posted write cache in case of power failure |
US5463733A (en) * | 1993-06-14 | 1995-10-31 | International Business Machines Corporation | Failure recovery apparatus and method for distributed processing shared resource control |
US5485585A (en) * | 1992-09-18 | 1996-01-16 | International Business Machines Corporation | Personal computer with alternate system controller and register for identifying active system controller |
US5488716A (en) * | 1991-10-28 | 1996-01-30 | Digital Equipment Corporation | Fault tolerant computer system with shadow virtual processor |
US5495590A (en) * | 1991-08-29 | 1996-02-27 | International Business Machines Corporation | Checkpoint synchronization with instruction overlap enabled |
US5504861A (en) * | 1994-02-22 | 1996-04-02 | International Business Machines Corporation | Remote data duplexing |
US5530801A (en) * | 1990-10-01 | 1996-06-25 | Fujitsu Limited | Data storing apparatus and method for a data processing system |
US5530946A (en) * | 1994-10-28 | 1996-06-25 | Dell Usa, L.P. | Processor failure detection and recovery circuit in a dual processor computer system and method of operation thereof |
US5557735A (en) * | 1994-07-21 | 1996-09-17 | Motorola, Inc. | Communication system for a network and method for configuring a controller in a communication network |
US5566297A (en) * | 1994-06-16 | 1996-10-15 | International Business Machines Corporation | Non-disruptive recovery from file server failure in a highly available file system for clustered computing environments |
US5568380A (en) * | 1993-08-30 | 1996-10-22 | International Business Machines Corporation | Shadow register file for instruction rollback |
US5574874A (en) * | 1992-11-03 | 1996-11-12 | Tolsys Limited | Method for implementing a checkpoint between pairs of memory locations using two indicators to indicate the status of each associated pair of memory locations |
US5583987A (en) * | 1994-06-29 | 1996-12-10 | Mitsubishi Denki Kabushiki Kaisha | Method and apparatus for initializing a multiprocessor system while resetting defective CPU's detected during operation thereof |
US5630047A (en) * | 1995-09-12 | 1997-05-13 | Lucent Technologies Inc. | Method for software error recovery using consistent global checkpoints |
US5644742A (en) * | 1995-02-14 | 1997-07-01 | Hal Computer Systems, Inc. | Processor structure and method for a time-out checkpoint |
US5649152A (en) * | 1994-10-13 | 1997-07-15 | Vinca Corporation | Method and system for providing a static snapshot of data stored on a mass storage system |
1995
- 1995-06-07 AU AU26630/95A (AU2663095A): not active, Abandoned
- 1995-06-07 EP EP95921611A (EP0764302B1): not active, Expired - Lifetime
- 1995-06-07 DE DE69506404T (DE69506404T2): not active, Expired - Fee Related
- 1995-06-07 JP JP8502301A (JPH10506483A): active, Pending
- 1995-06-07 WO PCT/US1995/007168 (WO1995034860A1): active, IP Right Grant

1996
- 1996-07-02 US US08/674,660 (US5787243A): not active, Expired - Lifetime
Patent Citations (94)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3588829A (en) * | 1968-11-14 | 1971-06-28 | Ibm | Integrated memory system with block transfer to a buffer store |
FR2028517A1 (en) * | 1969-01-17 | 1970-10-09 | Plessey Btr Ltd | |
US4413327A (en) * | 1970-06-09 | 1983-11-01 | The United States Of America As Represented By The Secretary Of The Navy | Radiation circumvention technique |
US3864670A (en) * | 1970-09-30 | 1975-02-04 | Yokogawa Electric Works Ltd | Dual computer system with signal exchange system |
US3761881A (en) * | 1971-06-30 | 1973-09-25 | Ibm | Translation storage scheme for virtual memory system |
US3736566A (en) * | 1971-08-18 | 1973-05-29 | Ibm | Central processing unit with hardware controlled checkpoint and retry facilities |
US3803560A (en) * | 1973-01-03 | 1974-04-09 | Honeywell Inf Systems | Technique for detecting memory failures and to provide for automatically for reconfiguration of the memory modules of a memory system |
US3889237A (en) * | 1973-11-16 | 1975-06-10 | Sperry Rand Corp | Common storage controller for dual processor system |
US3979726A (en) * | 1974-04-10 | 1976-09-07 | Honeywell Information Systems, Inc. | Apparatus for selectively clearing a cache store in a processor having segmentation and paging |
US4164017A (en) * | 1974-04-17 | 1979-08-07 | National Research Development Corporation | Computer systems |
US4020466A (en) * | 1974-07-05 | 1977-04-26 | Ibm Corporation | Memory hierarchy system with journaling and copy back |
US4044337A (en) * | 1975-12-23 | 1977-08-23 | International Business Machines Corporation | Instruction retry mechanism for a data processing system |
US4228496A (en) * | 1976-09-07 | 1980-10-14 | Tandem Computers Incorporated | Multiprocessor system |
US4817091A (en) * | 1976-09-07 | 1989-03-28 | Tandem Computers Incorporated | Fault-tolerant multiprocessor system |
US4373179A (en) * | 1978-06-26 | 1983-02-08 | Fujitsu Limited | Dynamic address translation system |
JPS5541528A (en) * | 1978-09-18 | 1980-03-24 | Fujitsu Ltd | Dual file system |
JPS55115121A (en) * | 1979-02-28 | 1980-09-04 | Nec Corp | Input and output control unit possible for duplicated recording |
US4393500A (en) * | 1979-09-04 | 1983-07-12 | Fujitsu Fanuc Limited | Method of modifying data stored in non-volatile memory and testing for power failure occurring during modification |
US4403284A (en) * | 1980-11-24 | 1983-09-06 | Texas Instruments Incorporated | Microprocessor which detects leading 1 bit of instruction to obtain microcode entry point address |
US4435762A (en) * | 1981-03-06 | 1984-03-06 | International Business Machines Corporation | Buffered peripheral subsystems |
US4426682A (en) * | 1981-05-22 | 1984-01-17 | Harris Corporation | Fast cache flush mechanism |
JPS5831651A (en) * | 1981-08-20 | 1983-02-24 | Nec Corp | Restart processing system for electronic exchanger |
US4566106A (en) * | 1982-01-29 | 1986-01-21 | Pitney Bowes Inc. | Electronic postage meter having redundant memory |
US4459658A (en) * | 1982-02-26 | 1984-07-10 | Bell Telephone Laboratories Incorporated | Technique for enabling operation of a computer system with a consistent state of a linked list data structure after a main memory failure |
US4503534A (en) * | 1982-06-30 | 1985-03-05 | Intel Corporation | Apparatus for redundant operation of modules in a multiprocessing system |
US4484273A (en) * | 1982-09-03 | 1984-11-20 | Sequoia Systems, Inc. | Modular computer system |
US4509554A (en) * | 1982-11-08 | 1985-04-09 | Failla William G | High and low pressure, quick-disconnect coupling |
WO1984002409A1 (en) * | 1982-12-09 | 1984-06-21 | Sequoia Systems Inc | Memory backup system |
US4654819A (en) * | 1982-12-09 | 1987-03-31 | Sequoia Systems, Inc. | Memory back-up system |
US4819154A (en) * | 1982-12-09 | 1989-04-04 | Sequoia Systems, Inc. | Memory back up system with one cache memory and two physically separated main memories |
US4734855A (en) * | 1983-10-17 | 1988-03-29 | Inria-Institut National De Recherche En Informatique Et En Automatique | Apparatus and method for fast and stable data storage |
US4905196A (en) * | 1984-04-26 | 1990-02-27 | Bbc Brown, Boveri & Company Ltd. | Method and storage device for saving the computer status during interrupt |
US4959774A (en) * | 1984-07-06 | 1990-09-25 | Ampex Corporation | Shadow memory system for storing variable backup blocks in consecutive time periods |
US4751639A (en) * | 1985-06-24 | 1988-06-14 | Ncr Corporation | Virtual command rollback in a fault tolerant data processing system |
US4703481A (en) * | 1985-08-16 | 1987-10-27 | Hewlett-Packard Company | Method and apparatus for fault recovery within a computing system |
US4819232A (en) * | 1985-12-17 | 1989-04-04 | Bbc Brown, Boveri & Company, Limited | Fault-tolerant multiprocessor arrangement |
US4740969A (en) * | 1986-06-27 | 1988-04-26 | Hewlett-Packard Company | Method and apparatus for recovering from hardware faults |
US4941087A (en) * | 1986-09-19 | 1990-07-10 | Asea Aktiebolag | System for bumpless changeover between active units and backup units by establishing rollback points and logging write and read operations |
EP0260625A1 (en) * | 1986-09-19 | 1988-03-23 | Asea Ab | Method for bumpless changeover from active units to back-up units in computer equipment and a device for carrying out the method |
FR2606184A1 (en) * | 1986-10-31 | 1988-05-06 | Thomson Csf | RECONFIGURABLE CALCULATION DEVICE |
US4823261A (en) * | 1986-11-24 | 1989-04-18 | International Business Machines Corp. | Multiprocessor system for updating status information through flip-flopping read version and write version of checkpoint data |
EP0299511A2 (en) * | 1987-07-15 | 1989-01-18 | Fujitsu Limited | Hot standby memory copy system |
US5123099A (en) * | 1987-07-15 | 1992-06-16 | Fujitsu Ltd. | Hot standby memory copy system |
US4958273A (en) * | 1987-08-26 | 1990-09-18 | International Business Machines Corporation | Multiprocessor system architecture with high availability |
US4965719A (en) * | 1988-02-16 | 1990-10-23 | International Business Machines Corporation | Method for lock management, page coherency, and asynchronous writing of changed pages to shared external store in a distributed computing system |
US5276848A (en) * | 1988-06-28 | 1994-01-04 | International Business Machines Corporation | Shared two level cache including apparatus for maintaining storage consistency |
US4924466A (en) * | 1988-06-30 | 1990-05-08 | International Business Machines Corp. | Direct hardware error identification method and apparatus for error recovery in pipelined processing areas of a computer system |
US5418916A (en) * | 1988-06-30 | 1995-05-23 | International Business Machines | Central processing unit checkpoint retry for store-in and store-through cache systems |
US4912707A (en) * | 1988-08-23 | 1990-03-27 | International Business Machines Corporation | Checkpoint retry mechanism |
US4996687A (en) * | 1988-10-11 | 1991-02-26 | Honeywell Inc. | Fault recovery mechanism, transparent to digital system function |
US5301309A (en) * | 1989-04-28 | 1994-04-05 | Kabushiki Kaisha Toshiba | Distributed processing system with checkpoint restart facilities wherein checkpoint data is updated only if all processors were able to collect new checkpoint data |
US5325517A (en) * | 1989-05-17 | 1994-06-28 | International Business Machines Corporation | Fault tolerant data processing system |
US5239637A (en) * | 1989-06-30 | 1993-08-24 | Digital Equipment Corporation | Digital data management system for maintaining consistency of data in a shadow set |
US5247618A (en) * | 1989-06-30 | 1993-09-21 | Digital Equipment Corporation | Transferring data in a digital data processing system |
US5235700A (en) * | 1990-02-08 | 1993-08-10 | International Business Machines Corporation | Checkpointing mechanism for fault-tolerant systems |
EP0441087A1 (en) * | 1990-02-08 | 1991-08-14 | International Business Machines Corporation | Checkpointing mechanism for fault-tolerant systems |
US5420996A (en) * | 1990-04-27 | 1995-05-30 | Kabushiki Kaisha Toshiba | Data processing system having selective data save and address translation mechanism utilizing CPU idle period |
US5271013A (en) * | 1990-05-09 | 1993-12-14 | Unisys Corporation | Fault tolerant computer system |
US5327532A (en) * | 1990-05-16 | 1994-07-05 | International Business Machines Corporation | Coordinated sync point management of protected resources |
EP0457308A2 (en) * | 1990-05-18 | 1991-11-21 | Fujitsu Limited | Data processing system having an input/output path disconnecting mechanism and method for controlling the data processing system |
US5263144A (en) * | 1990-06-29 | 1993-11-16 | Digital Equipment Corporation | Method and apparatus for sharing data between processors in a computer system |
US5157663A (en) * | 1990-09-24 | 1992-10-20 | Novell, Inc. | Fault tolerant computer system |
US5530801A (en) * | 1990-10-01 | 1996-06-25 | Fujitsu Limited | Data storing apparatus and method for a data processing system |
DE4136729A1 (en) * | 1990-11-05 | 1992-05-21 | Mitsubishi Electric Corp | Cache control unit for fault tolerant computer system |
US5381544A (en) * | 1991-01-22 | 1995-01-10 | Hitachi, Ltd. | Copyback memory system and cache memory controller which permits access while error recovery operations are performed |
US5214652A (en) * | 1991-03-26 | 1993-05-25 | International Business Machines Corporation | Alternate processor continuation of task of failed processor |
US5369757A (en) * | 1991-06-18 | 1994-11-29 | Digital Equipment Corporation | Recovery logging in the presence of snapshot files by ordering of buffer pool flushing |
US5408636A (en) * | 1991-06-24 | 1995-04-18 | Compaq Computer Corp. | System for flushing first and second caches upon detection of a write operation to write protected areas |
US5293613A (en) * | 1991-08-29 | 1994-03-08 | International Business Machines Corporation | Recovery control register |
US5495587A (en) * | 1991-08-29 | 1996-02-27 | International Business Machines Corporation | Method for processing checkpoint instructions to allow concurrent execution of overlapping instructions |
US5495590A (en) * | 1991-08-29 | 1996-02-27 | International Business Machines Corporation | Checkpoint synchronization with instruction overlap enabled |
US5269017A (en) * | 1991-08-29 | 1993-12-07 | International Business Machines Corporation | Type 1, 2 and 3 retry and checkpointing |
US5313647A (en) * | 1991-09-20 | 1994-05-17 | Kendall Square Research Corporation | Digital data processor with improved checkpointing and forking |
US5325519A (en) * | 1991-10-18 | 1994-06-28 | Texas Microsystems, Inc. | Fault tolerant computer with archival rollback capabilities |
US5488716A (en) * | 1991-10-28 | 1996-01-30 | Digital Equipment Corporation | Fault tolerant computer system with shadow virtual processor |
US5363503A (en) * | 1992-01-22 | 1994-11-08 | Unisys Corporation | Fault tolerant computer system with provision for handling external events |
US5394542A (en) * | 1992-03-30 | 1995-02-28 | International Business Machines Corporation | Clearing data objects used to maintain state information for shared data at a local complex when at least one message path to the local complex cannot be recovered |
US5448719A (en) * | 1992-06-05 | 1995-09-05 | Compaq Computer Corp. | Method and apparatus for maintaining and retrieving live data in a posted write cache in case of power failure |
US5398331A (en) * | 1992-07-08 | 1995-03-14 | International Business Machines Corporation | Shared storage controller for dual copy shared data |
US5485585A (en) * | 1992-09-18 | 1996-01-16 | International Business Machines Corporation | Personal computer with alternate system controller and register for identifying active system controller |
US5574874A (en) * | 1992-11-03 | 1996-11-12 | Tolsys Limited | Method for implementing a checkpoint between pairs of memory locations using two indicators to indicate the status of each associated pair of memory locations |
US5408649A (en) * | 1993-04-30 | 1995-04-18 | Quotron Systems, Inc. | Distributed data access system including a plurality of database access processors with one-for-N redundancy |
US5463733A (en) * | 1993-06-14 | 1995-10-31 | International Business Machines Corporation | Failure recovery apparatus and method for distributed processing shared resource control |
US5418940A (en) * | 1993-08-04 | 1995-05-23 | International Business Machines Corporation | Method and means for detecting partial page writes and avoiding initializing new pages on DASD in a transaction management system environment |
US5568380A (en) * | 1993-08-30 | 1996-10-22 | International Business Machines Corporation | Shadow register file for instruction rollback |
US5504861A (en) * | 1994-02-22 | 1996-04-02 | International Business Machines Corporation | Remote data duplexing |
US5566297A (en) * | 1994-06-16 | 1996-10-15 | International Business Machines Corporation | Non-disruptive recovery from file server failure in a highly available file system for clustered computing environments |
US5583987A (en) * | 1994-06-29 | 1996-12-10 | Mitsubishi Denki Kabushiki Kaisha | Method and apparatus for initializing a multiprocessor system while resetting defective CPU's detected during operation thereof |
US5557735A (en) * | 1994-07-21 | 1996-09-17 | Motorola, Inc. | Communication system for a network and method for configuring a controller in a communication network |
US5649152A (en) * | 1994-10-13 | 1997-07-15 | Vinca Corporation | Method and system for providing a static snapshot of data stored on a mass storage system |
US5530946A (en) * | 1994-10-28 | 1996-06-25 | Dell Usa, L.P. | Processor failure detection and recovery circuit in a dual processor computer system and method of operation thereof |
US5644742A (en) * | 1995-02-14 | 1997-07-01 | Hal Computer Systems, Inc. | Processor structure and method for a time-out checkpoint |
US5649136A (en) * | 1995-02-14 | 1997-07-15 | Hal Computer Systems, Inc. | Processor structure and method for maintaining and restoring precise state at any instruction boundary |
US5630047A (en) * | 1995-09-12 | 1997-05-13 | Lucent Technologies Inc. | Method for software error recovery using consistent global checkpoints |
Non-Patent Citations (25)
Title |
---|
1991 IEEE, IACOPONI, "Hardware Assisted Real-Time Rollback in the Advanced Fault-Tolerant Data Processor," pp. 269-274. * |
A.Feridun and K.Shin, "A Fault-Tolerant Multiprocessor System with Rollback Recovery Capabilities," 1981 IEEE Transactions on Computers, pp. 283-298. * |
C.Kubiak et al., "Penelope: A Recovery Mechanism for Transient Hardware Failures and Software Errors," 1982 IEEE Transactions on Computers, pp. 127-133. * |
IBM Technical Disclosure Bulletin, vol. 34, No. 10A, Mar. 1992, "Memory Recovery Facility for Computer Systems," pp. 341-342. * |
IBM Technical Disclosure Bulletin, vol. 36, No. 08, Aug. 1993, "Efficient Cache Access Through Compression," pp. 161-165. * |
IEEE, Dec. 1991, The Diffusion Model Based Task Remapping for Distributed Real-pp. 2-11, by Morikazu Takegaki, et al. * |
IEEE, No. 2, Feb. 1988, "Sequoia: A Fault-Tolerant Tightly Coupled Multiprocessor Transaction Processing," pp. 37-45, by Philip A. Bernstein. * |
IEEE, vol. 4, No. 6, Dec. 1992, Incremental Recovery in Main Memory Database pp. 529-540, by Eliezer Levy, et al. * |
International Search Report as cited in PCT Application No. PCT/US95/07168, dated Oct. 11, 1995. * |
M. Banatre, A. Gefflaut, C. Morin, "Scalable Shared Memory Multi-Processors: Some Ideas to Make Them Reliable". * |
N.Bowen and D.Pradhan, "Processor- and Memory-Based Checkpoint and Rollback Recovery," 1993 IEEE Transactions on Computers, pp. 22-30. * |
P.Lee et al., "A Recovery Cache for the PDP-11," IEEE Transactions on Computers, vol. C-29, No. 6, Jun. 1980, pp. 546-549. * |
Y.Lee and K.Shin, "Rollback Propagation Detection and Performance Evaluation of FTMR2 M--A Fault Tolerant Multiprocessor," 1982 IEEE Transactions on Computers, pp. 171-180. * |
Cited By (69)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5958070A (en) * | 1995-11-29 | 1999-09-28 | Texas Micro, Inc. | Remote checkpoint memory system and protocol for fault-tolerant computer system |
US6279027B1 (en) | 1996-06-07 | 2001-08-21 | Kabushiki Kaisha Toshiba | Scheduler reducing cache failures after check points in a computer system having check-point restart functions |
US5968168A (en) * | 1996-06-07 | 1999-10-19 | Kabushiki Kaisha Toshiba | Scheduler reducing cache failures after check points in a computer system having check-point restart function |
US5991852A (en) * | 1996-10-28 | 1999-11-23 | Mti Technology Corporation | Cache ram using a secondary controller and switching circuit and improved chassis arrangement |
US5878250A (en) * | 1997-04-07 | 1999-03-02 | Altera Corporation | Circuitry for emulating asynchronous register loading functions |
US5948081A (en) * | 1997-12-22 | 1999-09-07 | Compaq Computer Corporation | System for flushing queued memory write request corresponding to a queued read request and all prior write requests with counter indicating requests to be flushed |
US6298431B1 (en) * | 1997-12-31 | 2001-10-02 | Intel Corporation | Banked shadowed register file |
US6898700B2 (en) * | 1998-03-31 | 2005-05-24 | Intel Corporation | Efficient saving and restoring state in task switching |
US6108707A (en) * | 1998-05-08 | 2000-08-22 | Apple Computer, Inc. | Enhanced file transfer operations in a computer system |
US6513108B1 (en) | 1998-06-29 | 2003-01-28 | Cisco Technology, Inc. | Programmable processing engine for efficiently processing transient data |
US6101599A (en) * | 1998-06-29 | 2000-08-08 | Cisco Technology, Inc. | System for context switching between processing elements in a pipeline of processing elements |
US7895412B1 (en) | 1998-06-29 | 2011-02-22 | Cisco Technology, Inc. | Programmable arrayed processing engine architecture for a network switch |
US6173386B1 (en) | 1998-12-14 | 2001-01-09 | Cisco Technology, Inc. | Parallel processor with debug capability |
US6920562B1 (en) | 1998-12-18 | 2005-07-19 | Cisco Technology, Inc. | Tightly coupled software protocol decode with hardware data encryption |
US6493795B1 (en) * | 1998-12-30 | 2002-12-10 | Emc Corporation | Data storage system |
US20030051113A1 (en) * | 1999-03-03 | 2003-03-13 | Beardsley Brent Cameron | Method and system for managing meta data |
US6513097B1 (en) | 1999-03-03 | 2003-01-28 | International Business Machines Corporation | Method and system for maintaining information about modified data in cache in a storage system for use during a system failure |
US6438661B1 (en) | 1999-03-03 | 2002-08-20 | International Business Machines Corporation | Method, system, and program for managing meta data in a storage system and rebuilding lost meta data in cache |
US20020138695A1 (en) * | 1999-03-03 | 2002-09-26 | Beardsley Brent Cameron | Method and system for recovery of meta data in a storage controller |
US6658542B2 (en) | 1999-03-03 | 2003-12-02 | International Business Machines Corporation | Method and system for caching data in a storage system |
US6988171B2 (en) | 1999-03-03 | 2006-01-17 | International Business Machines Corporation | Method and system for recovery of meta data in a storage controller |
US6981102B2 (en) | 1999-03-03 | 2005-12-27 | International Business Machines Corporation | Method and system for managing meta data |
US6502174B1 (en) | 1999-03-03 | 2002-12-31 | International Business Machines Corporation | Method and system for managing meta data |
US6622263B1 (en) | 1999-06-30 | 2003-09-16 | Jack Justin Stiffler | Method and apparatus for achieving system-directed checkpointing without specialized hardware assistance |
US6662252B1 (en) | 1999-11-03 | 2003-12-09 | Cisco Technology, Inc. | Group and virtual locking mechanism for inter processor synchronization |
US20050165966A1 (en) * | 2000-03-28 | 2005-07-28 | Silvano Gai | Method and apparatus for high-speed parsing of network messages |
US6505269B1 (en) | 2000-05-16 | 2003-01-07 | Cisco Technology, Inc. | Dynamic addressing mapping to eliminate memory resource contention in a symmetric multiprocessor system |
KR100365891B1 (en) * | 2000-12-13 | 2002-12-27 | 한국전자통신연구원 | Backup/recovery Apparatus and method for non-log processing of real-time main memory database system |
US6862668B2 (en) | 2002-02-25 | 2005-03-01 | International Business Machines Corporation | Method and apparatus for using cache coherency locking to facilitate on-line volume expansion in a multi-controller storage system |
US7930697B2 (en) | 2002-02-25 | 2011-04-19 | International Business Machines Corporation | Apparatus for cooperative distributed task management in a storage subsystem with multiple controllers using cache locking |
US7480909B2 (en) | 2002-02-25 | 2009-01-20 | International Business Machines Corporation | Method and apparatus for cooperative distributed task management in a storage subsystem with multiple controllers using cache locking |
US20090119666A1 (en) * | 2002-02-25 | 2009-05-07 | International Business Machines Corporation | Apparatus for cooperative distributed task management in a storage subsystem with multiple controllers using cache locking |
US20050076262A1 (en) * | 2003-09-23 | 2005-04-07 | Revivio, Inc. | Storage management device |
US7991748B2 (en) | 2003-09-23 | 2011-08-02 | Symantec Corporation | Virtual data store creation and use |
US7904428B2 (en) | 2003-09-23 | 2011-03-08 | Symantec Corporation | Methods and apparatus for recording write requests directed to a data store |
US7725760B2 (en) | 2003-09-23 | 2010-05-25 | Symantec Operating Corporation | Data storage system |
US7272666B2 (en) | 2003-09-23 | 2007-09-18 | Symantec Operating Corporation | Storage management device |
US7725667B2 (en) | 2003-09-23 | 2010-05-25 | Symantec Operating Corporation | Method for identifying the time at which data was written to a data store |
US7287133B2 (en) | 2004-08-24 | 2007-10-23 | Symantec Operating Corporation | Systems and methods for providing a modification history for a location within a data store |
US7296008B2 (en) | 2004-08-24 | 2007-11-13 | Symantec Operating Corporation | Generation and use of a time map for accessing a prior image of a storage device |
US7409587B2 (en) | 2004-08-24 | 2008-08-05 | Symantec Operating Corporation | Recovering from storage transaction failures using checkpoints |
US8521973B2 (en) | 2004-08-24 | 2013-08-27 | Symantec Operating Corporation | Systems and methods for providing a modification history for a location within a data store |
US7239581B2 (en) | 2004-08-24 | 2007-07-03 | Symantec Operating Corporation | Systems and methods for synchronizing the internal clocks of a plurality of processor modules |
US7827362B2 (en) | 2004-08-24 | 2010-11-02 | Symantec Corporation | Systems, apparatus, and methods for processing I/O requests |
US7730222B2 (en) | 2004-08-24 | 2010-06-01 | Symantec Operating Corporation | Processing storage-related I/O requests using binary tree data structures |
USRE45632E1 (en) * | 2005-01-03 | 2015-07-28 | O'shantel Software L.L.C. | Memory-controller-embedded apparatus and procedure for achieving system-directed checkpointing without operating-system kernel support |
US20060242456A1 (en) * | 2005-04-26 | 2006-10-26 | Kondo Thomas J | Method and system of copying memory from a source processor to a target processor by duplicating memory writes |
US7590885B2 (en) * | 2005-04-26 | 2009-09-15 | Hewlett-Packard Development Company, L.P. | Method and system of copying memory from a source processor to a target processor by duplicating memory writes |
US20100064092A1 (en) * | 2005-05-19 | 2010-03-11 | Honeywell International Inc. | Interface for writing to memories having different write times |
US7698511B2 (en) | 2005-05-19 | 2010-04-13 | Honeywell International Inc. | Interface for writing to memories having different write times |
EP1734534A3 (en) * | 2005-05-19 | 2008-06-11 | Honeywell International Inc. | Interface between memories having different write times |
EP1734534A2 (en) | 2005-05-19 | 2006-12-20 | Honeywell International Inc. | Interface between memories having different write times |
US7536583B2 (en) | 2005-10-14 | 2009-05-19 | Symantec Operating Corporation | Technique for timeline compression in a data store |
US7840768B2 (en) * | 2005-12-13 | 2010-11-23 | Reliable Technologies, Inc. | Memory-controller-embedded apparatus and procedure for achieving system-directed checkpointing without operating-system kernel support |
US20100077164A1 (en) * | 2005-12-13 | 2010-03-25 | Jack Justin Stiffler | Memory-controller-embedded apparatus and procedure for achieving system-directed checkpointing without operating-system kernel support |
US8140801B2 (en) * | 2005-12-22 | 2012-03-20 | International Business Machines Corporation | Efficient and flexible memory copy operation |
US7506132B2 (en) * | 2005-12-22 | 2009-03-17 | International Business Machines Corporation | Validity of address ranges used in semi-synchronous memory copy operations |
US20070150675A1 (en) * | 2005-12-22 | 2007-06-28 | International Business Machines Corporation | Validity of address ranges used in semi-synchronous memory copy operations |
US20090182968A1 (en) * | 2005-12-22 | 2009-07-16 | International Business Machines Corp. | Validity of address ranges used in semi-synchronous memory copy operations |
US7882321B2 (en) * | 2005-12-22 | 2011-02-01 | International Business Machines Corporation | Validity of address ranges used in semi-synchronous memory copy operations |
US20080307182A1 (en) * | 2005-12-22 | 2008-12-11 | International Business Machines Corporation | Efficient and flexible memory copy operation |
US20070180312A1 (en) * | 2006-02-01 | 2007-08-02 | Avaya Technology Llc | Software duplication |
US20100192041A1 (en) * | 2009-01-23 | 2010-07-29 | Micron Technology, Inc. | Memory devices and methods for managing error regions |
US20160140044A1 (en) * | 2012-10-11 | 2016-05-19 | Soft Machines, Inc. | Systems and methods for non-blocking implementation of cache flush instructions |
US9842056B2 (en) * | 2012-10-11 | 2017-12-12 | Intel Corporation | Systems and methods for non-blocking implementation of cache flush instructions |
US10585804B2 (en) | 2012-10-11 | 2020-03-10 | Intel Corporation | Systems and methods for non-blocking implementation of cache flush instructions |
US9858151B1 (en) * | 2016-10-03 | 2018-01-02 | International Business Machines Corporation | Replaying processing of a restarted application |
US10540233B2 (en) | 2016-10-03 | 2020-01-21 | International Business Machines Corporation | Replaying processing of a restarted application |
US10896095B2 (en) | 2016-10-03 | 2021-01-19 | International Business Machines Corporation | Replaying processing of a restarted application |
Also Published As
Publication number | Publication date |
---|---|
WO1995034860A1 (en) | 1995-12-21 |
EP0764302A1 (en) | 1997-03-26 |
DE69506404D1 (en) | 1999-01-14 |
EP0764302B1 (en) | 1998-12-02 |
AU2663095A (en) | 1996-01-05 |
DE69506404T2 (en) | 1999-05-27 |
JPH10506483A (en) | 1998-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5787243A (en) | Main memory system and checkpointing protocol for fault-tolerant computer system | |
US5864657A (en) | Main memory system and checkpointing protocol for fault-tolerant computer system | |
EP0864126B1 (en) | Remote checkpoint memory system and method for fault-tolerant computer system | |
JP2916420B2 (en) | Checkpoint processing acceleration device and data processing method | |
JP4073464B2 (en) | Main memory system and checkpointing protocol for fault tolerant computer system using read buffer | |
US6148416A (en) | Memory update history storing apparatus and method for restoring contents of memory | |
US5133074A (en) | Deadlock resolution with cache snooping | |
US7055060B2 (en) | On-die mechanism for high-reliability processor | |
JP3086779B2 (en) | Memory state restoration device | |
US5673414A (en) | Snooping of I/O bus and invalidation of processor cache for memory data transfers between one I/O device and cacheable memory in another I/O device | |
JP2004046455A (en) | Information processor | |
JP3083786B2 (en) | Memory update history storage device and memory update history storage method | |
JP3348420B2 (en) | Information processing device with memory copy function | |
JPH0744459A (en) | Cache control method and cache controller | |
JP3862777B2 (en) | Duplex data matching method and duplex control device | |
KR100582782B1 (en) | How to keep cache consistent | |
JP2008171058A (en) | System controller, processor, information processing system, and information processing program | |
JPH10222423A (en) | Cache memory control system | |
JPH04195646A (en) | Multiplexed system | |
JP2000181738A (en) | Duplex system and memory control method | |
JPH10149307A (en) | Check point processing method and recording medium | |
JPH0469747A (en) | Arithmetic processor | |
JPH0944273A (en) | Storage device initializing circuit | |
JPH01231131A (en) | Dual synchronous system | |
JP2000501216A (en) | Main memory system and checkpointing protocol for fault-tolerant computer systems using read buffers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: TEXAS MICRO, INC., TEXAS; Free format text: CHANGE OF NAME; ASSIGNOR: SEQUOIA SYSTEMS, INC.; REEL/FRAME: 008744/0650; Effective date: 19970416 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | FEPP | Fee payment procedure | Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | AS | Assignment | Owner name: RADISYS CPD, INC., OREGON; Free format text: CHANGE OF NAME; ASSIGNOR: TEXAS MICRO, INC.; REEL/FRAME: 011541/0866; Effective date: 19990913 |
 | AS | Assignment | Owner name: RADISYS CORPORATION, OREGON; Free format text: MERGER; ASSIGNOR: RADISYS CPD, INC.; REEL/FRAME: 011862/0144; Effective date: 20010227 |
 | FEPP | Fee payment procedure | Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | REFU | Refund | Free format text: REFUND - PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: R283); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: REFUND - 3.5 YR SURCHARGE - LATE PMT W/IN 6 MO, SMALL ENTITY (ORIGINAL EVENT CODE: R286); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | REMI | Maintenance fee reminder mailed | |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | SULP | Surcharge for late payment | |
 | FPAY | Fee payment | Year of fee payment: 8 |
 | FPAY | Fee payment | Year of fee payment: 12 |
 | AS | Assignment | Owner name: HCP-FVG, LLC, NEW YORK; Free format text: SECURITY INTEREST; ASSIGNORS: RADISYS CORPORATION; RADISYS INTERNATIONAL LLC; REEL/FRAME: 044995/0671; Effective date: 20180103 |
 | AS | Assignment | Owner name: MARQUETTE BUSINESS CREDIT, LLC, CALIFORNIA; Free format text: SECURITY INTEREST; ASSIGNOR: RADISYS CORPORATION; REEL/FRAME: 044540/0080; Effective date: 20180103 |