WO2006039713A9 - Configurable computing machine and related systems and methods - Google Patents

Configurable computing machine and related systems and methods

Info

Publication number
WO2006039713A9
Authority
WO
WIPO (PCT)
Prior art keywords
circuit
pipeline
processor
operable
data
Prior art date
Application number
PCT/US2005/035818
Other languages
French (fr)
Other versions
WO2006039713A3 (en)
WO2006039713A2 (en)
Inventor
John Rapp
Scott Hellenbach
Chandan Mathur
Mark Jones
Joseph A Capizzi
Troy Cherasaro
Original Assignee
Lockheed Corp
John Rapp
Scott Hellenbach
Chandan Mathur
Mark Jones
Joseph A Capizzi
Troy Cherasaro
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lockheed Corp, John Rapp, Scott Hellenbach, Chandan Mathur, Mark Jones, Joseph A Capizzi, Troy Cherasaro filed Critical Lockheed Corp
Publication of WO2006039713A2 publication Critical patent/WO2006039713A2/en
Publication of WO2006039713A9 publication Critical patent/WO2006039713A9/en
Publication of WO2006039713A3 publication Critical patent/WO2006039713A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2038Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2028Failover techniques eliminating a faulty processor or activating a spare
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1438Restarting or rejuvenating

Definitions

  • a peer-vector computing machine, which is described in the following U.S. Patent Publications, includes a pipeline accelerator that can often perform mathematical computations ten to one hundred times faster than a conventional processor-based computing machine can perform these computations: 2004/0133763; 2004/0181621; 2004/0136241; 2004/0170070; and 2004/0130927, which are incorporated herein by reference.
  • the pipeline accelerator can often perform mathematical computations faster than a processor because unlike a processor, the accelerator processes data in a pipelined fashion while executing few, if any, software instructions.
  • a peer-vector computing machine may lack some of the popular features of a conventional processor-based computing machine.
  • a peer-vector computing machine may lack the ability to configure itself to operate with the installed hardware that composes the pipeline accelerator; and may lack the ability to reconfigure itself in response to a change in this hardware.
  • a conventional processor-based computing machine can configure its software and settings during its start-up routine to operate with the hardware installed in the machine, and can also reconfigure its software and settings in response to a change in this hardware. For example, assume that while the processor-based machine is "off," one increases the amount of the machine's random-access memory (RAM).
  • the machine detects the additional RAM, and reconfigures its operating system to recognize and exploit the additional RAM during subsequent operations.
  • the machine detects the card and configures its operating system to recognize and allow a software application such as a web browser to use the card (the machine may need to download the card's driver via a CD-ROM or the internet). Consequently, to install new hardware in a typical processor-based machine, an operator merely inserts the hardware into the machine, which then configures or reconfigures the machine's software and settings without additional operator input.
  • a peer-vector machine may lack the ability to configure or reconfigure itself to operate the hardware that composes the pipeline accelerator. For example, assume that one wants the peer-vector machine to instantiate a pre-designed circuit on multiple programmable-logic integrated circuits (PLICs) such as field-programmable gate arrays (FPGAs), each of which is disposed on a respective pipeline unit of the pipeline accelerator. Typically, one manually generates configuration-firmware files for each of the PLICs, and loads these files into the machine's configuration memory. During a start-up routine, the peer- vector machine causes each of the PLICs to download a respective one of these files. Once the PLICs have downloaded these firmware files, the circuit is instantiated on the PLICs. But if one modifies the circuit, or modifies the type or number of pipeline units in the pipeline accelerator, then he may need to manually generate new configuration-firmware files and load them into the configuration memory before the machine can instantiate the modified circuit on the pipeline accelerator.
  • the peer-vector computing machine may lack the ability to continue operating if a component of the machine fails.
  • Some conventional processor-based computing machines have redundant components that allow a machine to be fault tolerant, i.e., to continue operating when a component fails or otherwise exhibits a fault or causes a fault in the machine's operation.
  • a multi-processor-based computing machine may include a redundant processor that can "take over" for one of the main processors if and when a main processor fails.
  • a peer-vector machine may have a lower level of fault tolerance than a fault-tolerant processor-based machine.
  • a computing machine includes programmable integrated circuits, a configuration registry, and a processor.
  • the registry stores a file that defines a circuit having portions, and the processor is, in response to the file, operable to instantiate one of the circuit portions on one of the programmable integrated circuits.
  • a computing machine comprises an electronic circuit operable to perform a function, a first programmable integrated circuit, and a first processor.
  • the first processor is operable to detect a failure of the electronic circuit and configure the programmable integrated circuit to perform the function of the electronic circuit in response to detecting the failure.
  • By allowing a first type of circuit to take over for a failed second type of circuit, such a computing machine can be fault-tolerant without having redundant versions of each component.
  • a computing machine allows a programmable integrated circuit such as a field-programmable gate array (FPGA) to "take over" for a failed electronic circuit such as another FPGA, an ASIC, or a processor. Consequently, by allowing an FPGA to "take over" for an ASIC and for a processor, such a computing machine can omit a redundant ASIC and a redundant processor, and may thus allow a reduction in the cost and size of the computing machine.
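  • As a hedged illustration of the take-over idea above (not the patent's implementation), the C++ sketch below shows a host-side monitor that, on detecting a failed component, configures a spare FPGA with a firmware file that implements the failed component's function. The Component, SpareFpga, loadFirmware, and heartbeatOk names are hypothetical.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical model of a monitored component (ASIC, processor, or FPGA)
// and of a spare FPGA that can be configured to take over its function.
struct Component {
    std::string name;
    std::string firmwareForFunction;      // firmware file that implements this function on an FPGA
    std::function<bool()> heartbeatOk;    // returns false when the component has failed
};

struct SpareFpga {
    bool inUse = false;
    void loadFirmware(const std::string& file) {
        // In a real system this would write the configuration memory and
        // trigger the PLIC to download the firmware file.
        std::cout << "spare FPGA configured with " << file << '\n';
        inUse = true;
    }
};

// One pass of a failover monitor: configure the spare FPGA to perform the
// function of the first component whose heartbeat indicates a failure.
void monitorOnce(std::vector<Component>& parts, SpareFpga& spare) {
    for (auto& c : parts) {
        if (!c.heartbeatOk() && !spare.inUse) {
            std::cout << c.name << " failed\n";
            spare.loadFirmware(c.firmwareForFunction);
        }
    }
}

int main() {
    SpareFpga spare;
    std::vector<Component> parts = {
        {"FFT ASIC", "fft_pipeline.fw", [] { return false; }},   // simulated failure
        {"host CPU", "cpu_emulation.fw", [] { return true; }},
    };
    monitorOnce(parts, spare);
}
```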
  • a computing machine comprises a hardwired pipeline operable to perform a function and a processor operable to detect a failure of the pipeline and perform the function in response to detecting the failure.
  • a computing machine comprises a pipeline accelerator, a host processor coupled to the pipeline accelerator, and a redundant processor, a redundant pipeline unit, or both, coupled to the host processor and to the pipeline accelerator.
  • the computing machine may also include a system-restore server and a system-restore bus that allow the machine to periodically save its state in case of a failure.
  • Such a computing machine has a fault-tolerant scheme that is often more flexible than conventional schemes. For example, if the pipeline accelerator has more extra "space" than the host processor, then one can add to the computing machine one or more redundant pipeline units that can provide redundancy to both the pipeline and the host processor. Therefore, the computing machine can include redundancy for the host processor even though it has no redundant processing units. Likewise, if the host processor has more extra "space" than the pipeline accelerator, then one can add to the computing machine one or more redundant processing units that can provide redundancy to both the pipeline and the host processor.

BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a schematic block diagram of a peer-vector computing machine according to an embodiment of the invention.
  • FIG. 2 is a schematic block diagram of a pipeline unit from the pipelined accelerator of FIG. 1 and including a PLIC according to an embodiment of the invention.
  • FIG. 3 is a block diagram of the circuitry that composes the interface-adapter and framework-services layers of the PLIC of FIG. 2 according to an embodiment of the invention.
  • FIG. 4 is a block diagram of the accelerator/host-processor- configuration registry of FIG. 1 according to an embodiment of the invention.
  • FIG. 5 is a diagram of a hardware-description file that describes, in a top-down fashion the layers of circuitry to be instantiated on a simple PLIC according to an embodiment of the invention.
  • FIG. 6 is a block diagram of the accelerator-template library of FIG. 4 according to an embodiment of the invention.
  • FIG. 7 is a block diagram of the software-object library of FIG. 4 according to an embodiment of the invention.
  • FIG. 8 is a block diagram of the circuit-definition library of FIG. 4 according to an embodiment of the invention.
  • FIG. 9 is a block diagram of the accelerator-firmware library of FIG. 4 according to an embodiment of the invention.
  • FIG. 10 is a functional block diagram of the host processor of FIG. 1 according to an embodiment of the invention.
  • FIG. 11 is a schematic block diagram of a circuit defined by a file in the circuit-definition library of FIGS. 4 and 8 for instantiation on the pipeline accelerator of FIG. 1 according to an embodiment of the invention.
  • FIG. 12 is a functional block diagram of the data paths between the PLICs of FIG. 11 according to an embodiment of the invention.
  • FIG. 13 is a schematic block diagram of the circuit of FIG. 11 instantiated on fewer PLICs according to an embodiment of the invention.
  • FIG. 14 is a functional block diagram of the data paths between the portions of the circuit of FIG. 11 instantiated on the pipeline accelerator of FIG. 1 and a software-application thread that the processing unit of FIG. 10 executes to perform the function of an un-instantiated portion of the circuit according to an embodiment of the invention.
  • FIG. 15 is a block diagram of a peer-vector computing machine having redundancy according to an embodiment of the invention.
  • FIG. 16 is a block diagram of a peer-vector computing machine having a system-restore server and a system-restore bus according to an embodiment of the invention.
  • FIG. 17 is a block diagram of a hardwired pipeline that includes a save/restore circuit according to an embodiment of the invention.
  • FIG. 18 is a more-detailed block diagram of the hardwired pipeline of FIG. 17 according to an embodiment of the invention.
  • An accelerator-configuration manager that, according to embodiments of the invention, configures a peer-vector machine to operate with the hardware that composes the machine's pipeline accelerator, and that reconfigures the machine to recognize and operate with newly modified accelerator hardware, is discussed below in conjunction with FIGS. 10 - 14.
  • an accelerator-configuration registry that, according to embodiments of the invention, facilitates the configuration manager's ability to configure and reconfigure the peer-vector machine is discussed below in conjunction with FIGS. 4 - 9.
  • But first, an overview of peer-vector-machine concepts is presented in conjunction with FIGS. 1 - 3 to facilitate the reader's understanding of the above-mentioned configuration manager, configuration registry, and fault-tolerant techniques.
  • FIG. 1 is a schematic block diagram of a computing machine, i.e., the peer-vector machine 10, which has a peer-vector architecture according to an embodiment of the invention.
  • the peer-vector machine 10 includes a pipelined accelerator 14, which is operable to process at least a portion of the data processed by the machine 10. Therefore, the host processor 12 and the accelerator 14 are "peers" that can transfer data messages back and forth.
  • because the accelerator 14 includes hardwired circuits (typically logic circuits) instantiated on one or more PLICs, it executes few, if any, program instructions, and thus for a given clock frequency it often performs mathematically intensive operations on data significantly faster than a bank of computer processors can.
  • the machine 10 has many of the same abilities as, but can often process data faster than, a conventional processor-based computing machine.
  • providing the accelerator 14 with a communication interface that is compatible with the interface of the host processor 12 facilitates the design and modification of the machine 10, particularly where the communication interface is an industry standard.
  • where the accelerator 14 includes multiple pipeline units (not shown in FIG. 1), which are sometimes called daughter cards, providing each of these units with this compatible communication interface facilitates the design and modification of the accelerator, particularly where the communication interface is an industry standard.
  • the machine 10 may also provide other advantages as described in the following previously incorporated U.S. Patent Publication Nos.: 2004/0133763; 2004/0181621 ; 2004/0136241 ; 2004/0170070; and, 2004/0130927.
  • the peer-vector computing machine 10 includes a processor memory 16, an interface memory 18, a pipeline bus 20, a firmware memory 22, an optional raw-data input port 24, an optional processed-data output port 26, and an optional router 31.
  • the host processor 12 includes a processing unit 32 and a message handler 34
  • the processor memory 16 includes a processing-unit memory 36 and a handler memory 38, which respectively serve as both program and working memories for the processing unit and the message handler.
  • the processor memory 36 also includes an accelerator/host-processor-configuration registry 40 and a message-configuration registry 42.
  • the registry 40 stores configuration data that allows a configuration manager (not shown in FIG. 1) executed by the host processor 12 to configure the functioning of the accelerator 14 and, in some situations as discussed below in conjunction with FIGS. 10 - 14, the functioning of the host processor.
  • the pipelined accelerator 14 includes at least one pipeline unit
  • the firmware memory 22 stores the configuration-firmware files for the PLIC(s) of the accelerator 14. If the accelerator 14 is disposed on multiple PLICs, then these PLICs and their respective firmware memories may be disposed on multiple pipeline units.
  • the accelerator 14 and pipeline units are discussed further in previously incorporated U.S. Patent Application Publication Nos. 2004/0136241 , 2004/0181621 , and 2004/0130927. The pipeline units are also discussed below in conjunction with FIGS. 2 - 3.
  • the pipelined accelerator 14 receives data from one or more data-processor software applications running on the host processor 12, processes this data in a pipelined fashion with one or more logic circuits that perform one or more mathematical operations, and then returns the resulting data to the data-processing application(s).
  • because the logic circuits execute few, if any, software instructions, they often process data one or more orders of magnitude faster than the host processor 12 can for a given clock frequency.
  • because the logic circuits are instantiated on one or more PLICs, one can often modify these circuits merely by modifying the firmware stored in the firmware memory 22; that is, one can often modify these circuits without modifying the hardware components of the accelerator 14 or the interconnections between these components.
  • FIG. 2 is a schematic block diagram of a pipeline unit 50 of the pipeline accelerator 14 of FIG. 1 according to an embodiment of the invention.
  • the unit 50 includes a circuit board 52 on which are disposed the firmware memory 22, a platform-identification memory 54, a bus connector 56, a data memory 58, and a PLIC 60.
  • the firmware memory 22 stores the configuration-firmware file that the PLIC 60 downloads to instantiate one or more logic circuits, at least some of which compose the hardwired pipeline(s) 44.
  • the platform memory 54 stores one or more values, i.e., platform identifiers, that respectively identify the one or more platforms with which the pipeline unit 50 is compatible.
  • a platform specifies a unique set of physical attributes that a pipeline unit may possess. Examples of these attributes include the number of external pins (not shown) on the PLIC 60, the width of the bus connector 56, the size of the PLIC, and the size of the data memory 58. Consequently, a pipeline unit 50 is compatible with a platform if the unit possesses all of the attributes that the platform specifies. So a pipeline unit 50 having a bus connector 56 with thirty-two bits is incompatible with a platform that specifies a bus connector with sixty-four bits. Some platforms may be compatible with the peer-vector machine 10 (FIG. 1).
  • the platform identifier(s) stored in the memory 54 may allow a configuration manager (not shown in FIG. 2) executed by the host processor 12 (FIG. 1) to determine whether the pipeline unit 50 is compatible with the platform(s) supported by the machine 10. And where the pipeline unit 50 is so compatible, the platform identifier(s) may also allow the configuration manager to determine how to configure the PLIC 60 or other portions of the pipeline unit as discussed below in conjunction with FIGS. 10 - 14.
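  • The following is a minimal C++ sketch of the compatibility test just described, assuming a platform and a pipeline unit can each be summarized by a small attribute record. The PlatformAttributes fields and the meets-or-exceeds rule (except for the bus width, which mirrors the 32-bit versus 64-bit example above) are illustrative assumptions, not the patent's definition.

```cpp
#include <iostream>

// Hypothetical attribute set that a platform or a pipeline unit might report.
struct PlatformAttributes {
    int plicPins;        // number of external PLIC pins
    int busWidthBits;    // width of the bus connector (e.g., 32 or 64)
    int plicGates;       // size of the PLIC
    int dataMemoryBytes; // size of the data memory
};

// A pipeline unit is compatible with a platform if it possesses every
// attribute that the platform specifies (here: meets or exceeds it, and
// matches the bus width exactly).
bool isCompatible(const PlatformAttributes& unit, const PlatformAttributes& platform) {
    return unit.plicPins >= platform.plicPins &&
           unit.busWidthBits == platform.busWidthBits &&
           unit.plicGates >= platform.plicGates &&
           unit.dataMemoryBytes >= platform.dataMemoryBytes;
}

int main() {
    PlatformAttributes platform64{400, 64, 1'000'000, 1 << 20};
    PlatformAttributes unit32{400, 32, 1'000'000, 1 << 20};  // 32-bit bus connector

    // Mirrors the example in the text: a 32-bit unit is incompatible with a
    // platform that specifies a 64-bit bus connector.
    std::cout << std::boolalpha << isCompatible(unit32, platform64) << '\n';  // false
}
```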
  • the bus connector 56 is a physical connector that interfaces the PLIC 60, and perhaps other components of the pipeline unit 50, to the pipeline bus 20 (FIG. 1).
  • the data memory 58 acts as a buffer for storing data that the pipeline unit 50 receives from the host processor 12 (FIG. 1) and for providing this data to the PLIC 60.
  • the data memory 58 may also act as a buffer for storing data that the PLIC 60 generates for sending to the host processor 12, or as a working memory for the hardwired pipeline(s) 44.
  • Instantiated on the PLIC 60 are logic circuits that compose the hardwired pipeline(s) 44, and a hardware interface layer 62, which interfaces the hardwired pipeline(s) to the external pins (not shown) of the PLIC 60, and which thus interfaces the pipeline(s) to the pipeline bus 20 (via the connector 56), to the firmware and platform-identification memories 22 and 54, and to the data memory 58.
  • because the topology of the interface layer 62 is primarily dependent upon the attributes specified by the platform(s) with which the pipeline unit 50 is compatible, one can often modify the pipeline(s) 44 without modifying the interface layer. For example, if a platform with which the unit 50 is compatible specifies a thirty-two-bit bus, then the interface layer 62 provides a thirty-two-bit bus connection to the bus connector 56 regardless of the topology or other attributes of the pipeline(s) 44.
  • the hardware-interface layer 62 includes three circuit layers that are instantiated on the PLIC 60: an interface-adapter layer 70, a framework-services layer 72, and a communication layer 74, which is hereinafter called a communication shell.
  • the interface-adapter layer 70 includes circuitry, e.g., buffers and latches, that interface the framework-services layer 72 to the external pins (not shown) of the PLIC 60.
  • the framework-services layer 72 provides a set of services to the hardwired pipeline(s) 44 via the communication shell 74. For example, the layer 72 may synchronize data transfer between the pipeline(s) 44, the pipeline bus 20 (FIG. 1), and the data memory 58 (FIG. 2), and may control the sequence(s) in which the pipeline(s) operate.
  • the communication shell 74 includes circuitry, e.g., latches, that interface the framework-services layer 72 to the pipeline(s) 44.
  • the memory 54 may be omitted, and the platform identifier(s) may be stored in the firmware memory 22, or by a jumper-configurable or hardwired circuit (not shown) disposed on the circuit board 52.
  • although the framework-services layer 72 is shown as isolating the interface-adapter layer 70 from the communication shell 74, the interface-adapter layer may, at least at some circuit nodes, be directly coupled to the communication shell.
  • although the communication shell 74 is shown as isolating the interface-adapter layer 70 and the framework-services layer 72 from the pipeline(s) 44, the interface-adapter layer or the framework-services layer may, at least at some circuit nodes, be directly coupled to the pipeline(s).
  • FIG. 3 is a schematic block diagram of the circuitry that composes the interface-adapter layer 70 and the framework-services layer 72 of FIG. 2 according to an embodiment of the invention.
  • a communication interface 80 and an optional industry-standard bus interface 82 compose the interface-adapter layer 70, and a controller 84, exception manager 86, and configuration manager 88 compose the framework-services layer 72.
  • the configuration manager 88 is local to the PLIC 60, and is thus different from the configuration manager executed by the host processor 12 as discussed above in conjunction with FIG. 1 and below in conjunction with FIGS. 10 - 14.
  • the communication interface 80 transfers data between a peer, such as the host processor 12 (FIG. 1) or another pipeline unit 50 (FIG. 2), and the firmware memory 22, the platform-identifier memory 54, the data memory 58, and the following circuits instantiated within the PLIC 60: the hardwired pipeline(s) 44 (via the communication shell 74), the controller 84, the exception manager 86, and the configuration manager 88.
  • the optional industry-standard bus interface 82 couples the communication interface 80 to the bus connector 56.
  • the interfaces 80 and 82 may be merged such that the functionality of the interface 82 is included within the communication interface 80.
  • the controller 84 synchronizes the hardwired pipeline(s) 44 and monitors and controls the sequence in which it/they perform the respective data operations in response to communications, i.e., "events," from other peers.
  • a peer such as the host processor 12 may send an event to the pipeline unit 50 via the pipeline bus 20 to indicate that the peer has finished sending a block of data to the pipeline unit and to cause the hardwired pipeline(s) 44 to begin processing this data.
  • the exception manager 86 monitors the status of the hardwired pipeline(s) 44, the communication interface 80, the communication shell 74, the controller 84, and the bus interface 82 (if present), and reports exceptions to another exception manager (not shown in FIG. 3) executed by the host processor 12 (FIG. 1). For example, if a buffer (not shown) in the communication interface 80 overflows, then the exception manager 86 reports this to the host processor 12. The exception manager may also correct, or attempt to correct, the problem giving rise to the exception.
  • the exception manager 86 may increase the size of the buffer, either directly or via the configuration manager 88 as discussed below.
  • the configuration manager 88 sets the "soft" configuration of the hardwired pipeline(s) 44, the communication interface 80, the communication shell 74, the controller 84, the exception manager 86, and the interface 82 (if present) in response to soft-configuration data from the host processor 12 (FIG. 1). As discussed in previously incorporated U.S. Patent Application Publication No.
  • the "hard" configuration of a circuit within the PLIC 60 denotes the actual instantiation, on the transistor and circuit-block level, of the circuit, and the soft configuration denotes the settable physical parameters (e.g., data type, table size buffer depth) of the instantiated component. That is, soft-configuration data is similar to the data that one can load into a register of a processor (not shown in FIG. 3) to set the operating mode (e.g., burst-memory mode, page mode) of the processor.
  • the host processor 12 may send to the PLIC 60 soft-configuration data that causes the configuration manager 88 to set the number and respective priority levels of queues (not shown) within the communication interface 80.
  • the exception manager 86 may also send soft-configuration data that causes the configuration manager 88 to, e.g., increase the size of an overflowing buffer in the communication interface 80.
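  • As a rough sketch only, the following C++ fragment models the kind of soft-configuration message described above (settable parameters of already-instantiated circuitry such as queue count, queue priorities, and buffer depth); the SoftConfig fields and the applySoftConfig function are hypothetical.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical soft-configuration message: it changes settable parameters of
// circuitry that is already instantiated (queue count, priorities, buffer
// depth), not the hard configuration of the PLIC itself.
struct SoftConfig {
    std::uint8_t queueCount;                 // number of queues in the communication interface
    std::vector<std::uint8_t> queuePriority; // one priority level per queue
    std::uint32_t bufferDepthWords;          // e.g., enlarged after an overflow exception
};

// Stand-in for the on-PLIC configuration manager applying the settings.
void applySoftConfig(const SoftConfig& cfg) {
    std::cout << "queues=" << int(cfg.queueCount)
              << " bufferDepth=" << cfg.bufferDepthWords << '\n';
}

int main() {
    SoftConfig cfg{4, {3, 2, 1, 0}, 4096};
    applySoftConfig(cfg);  // the host would send cfg over the pipeline bus
}
```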
  • the pipeline unit 50 may include multiple PLICs.
  • the pipeline unit 50 may include two interconnected PLICs, where the circuitry that composes the interface-adapter layer 70 and framework-services layer 72 are instantiated on one of the PLICs, and the circuitry that composes the communication shell 74 and the hardwired pipeline(s) 44 are instantiated on the other PLIC.
  • FIG. 4 is a block diagram of the accelerator/host-processor configuration registry 40 of FIG. 1 according to an embodiment of the invention.
  • the registry 40 includes configuration data 100, an accelerator-template library 102, a software-object library 104, a circuit-definition library 106, and an accelerator-firmware library 108.
  • the configuration data 100 contains instructions that the configuration manager (not shown in FIG. 4) executed by the host processor 12 (FIG. 1) follows to configure the accelerator 14 (FIG. 1), and is further discussed below in conjunction with FIGS. 10 - 14. These instructions may be written in any conventional language or format.
  • the accelerator-template library 102 contains templates that define one or more interface-adapter layers 70, framework-services layers 72, communication shells 74, and hardwired pipelines 44 that the configuration manager executed by the host processor 12 (FIG. 1) can instantiate on the PLICs 60 (FIG. 2).
  • the library 102 is further discussed below in conjunction with FIGS. 5 - 6.
  • the software-object library 104 contains one or more software objects that, when executed, respectively perform in software the same (or similar) functions that the pipelines 44 defined by the templates in the accelerator-template library 102 perform in hardware. These software objects give the configuration manager executed by the host processor 12 the flexibility of instantiating in software at least some of the pipelined functions specified by the configuration data 100.
  • the library 104 is further discussed below in conjunction with FIG. 7.
  • the circuit-definition library 106 contains one or more circuit-definition files that each define a respective circuit for instantiation on the accelerator 14 (FIG. 1). Each circuit typically includes one or more interconnected hardwired pipelines 44 (FIG. 1), which are typically defined by corresponding templates in the library 102. The library 106 is further discussed below in conjunction with FIG. 8.
  • the accelerator-firmware library 108 contains one or more firmware-configuration files that each PLIC 60 (FIG. 2) of the accelerator 14 (FIG. 1) respectively downloads to set its internal circuit-node connections so as to instantiate a respective interface-adapter layer 70, framework-services layer 72, communication shell 74, and hardwired pipeline(s) 44. The library 108 is further discussed below in conjunction with FIG. 9.
  • FIG. 5 is a block diagram of a hardware-description file 120 from which the configuration manager (not shown in FIG. 5) executed by the host processor 12 (FIG. 1) can generate firmware for setting the circuit-node connections within a PLIC such as the PLIC 60 (FIGS. 2 - 3) according to an embodiment of the invention.
  • the accelerator-template library 102 contains templates that one can arrange to compose the file 120; consequently, an understanding of the hardware-description file 120 should facilitate the reader's understanding of the accelerator-template library 102, which is discussed below in conjunction with FIG. 6.
  • the below-described templates of the hardware-description file 120 are written in a conventional hardware description language (HDL) such as Verilog® HDL, and are organized in a top-down structure that resembles the top-down structure of software source code that incorporates software objects.
  • the hardware-description file 120 includes a top-level template 121, which includes a top-level definition 122 of the interface-adapter layer 70, a top-level definition 124 of the framework-services layer 72, and a top-level definition 126 of the communication shell 74.
  • the definitions 122, 124, and 126 compose a top-level definition 123 of the hardware-interface layer 62 of a PLIC such as the PLIC 60 (FIGS. 2 - 3).
  • the template 121 also defines the connections between the external pins (not shown) of the PLIC and the interface-adapter layer 70 (and in some cases between the external pins and the framework-services layer 72), and also defines the connections between the framework-services layer and the communication shell 74 (and in some cases between the interface-adapter layer and the communication shell).
  • the top-level definition 122 of the interface-adapter layer 70 (FIGS. 2 - 3) incorporates an interface-adapter-layer template 128, which further defines the portions of the interface-adapter layer defined by the top-level definition 122.
  • for example, suppose the top-level definition 122 defines a data-input buffer (not shown) in terms of its input and output nodes; that is, it defines the data-input buffer as a functional block having defined input and output nodes.
  • the template 128 defines the circuitry that composes this functional buffer block, and defines the connections between this circuitry and the buffer input and output nodes already defined in the top-level definition 122.
  • the template 128 may incorporate one or more lower-level templates 129 that further define the data buffer or other components of the interface-adapter layer 70 already defined in the template 128.
  • these one or more lower-level templates 129 may each incorporate one or more even lower-level templates (not shown), and so on, until all portions of the interface-adapter layer 70 are defined in terms of circuit components (e.g., flip-flops, logic gates) that a PLIC synthesizing and routing tool (not shown) recognizes.
  • a PLIC synthesizing and routing tool is a conventional tool, typically provided by the PLIC manufacturer, that can generate from the hardware-description file 120 configuration firmware for a PLIC.
  • the top-level definition 124 of the framework-services layer 72 incorporates a framework-services-layer template 130, which further defines the portions of the framework-services layer defined by the top-level definition 124.
  • the top-level definition 124 defines a counter (not shown) in terms of its input and output nodes.
  • the template 130 defines the circuitry that composes this counter, and defines the connections between this circuitry and the counter input and output nodes already defined in the top-level definition 124.
  • the template 130 may incorporate a hierarchy of one or more lower-level templates 131 and even lower-level templates (not shown), and so on, such that all portions of the framework-services layer 72 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes.
  • for example, where the counter includes an up/down selector, the template 130 may incorporate a lower-level template 131 that defines the circuitry within this up/down selector and the connections between this circuitry and the selector's input and output nodes already defined by the template 130.
  • the top-level definition 126 of the communication shell 74 incorporates a communication-shell template 132, which further defines the portions of the communication shell defined by the definition 126, and which also includes a top-level definition 133 of the hardwired pipeline(s) 44 disposed within the communication shell.
  • the definition 133 defines the connections between the communication shell 74 and the hardwired pipeline(s) 44.
  • the top-level definition 133 of the pipeline(s) 44 incorporates for each defined pipeline a respective hardwired-pipeline template 134, which further defines the portions of the respective pipeline 44 already defined by the definition 133.
  • the template or templates 134 may each incorporate a hierarchy of one or more lower-level templates 135, and even lower-level templates, such that all portions of the respective pipeline 44 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes.
  • the communication-shell template 132 may incorporate a hierarchy of one or more lower-level templates 136, and even lower-level templates, such that all portions of the communication shell 74 other than the pipeline(s) 44 are, at some level of the hierarchy, also defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes.
  • a configuration template 138 provides definitions for one or more parameters having values that one can set to configure the circuitry that the templates 121, 128, 129, 130, 131, 132, 134, 135, and 136, and the even lower-level templates (not shown) define.
  • the bus interface 82 (FIG. 3) of the interface-adapter layer 70 (FIG. 3) is configurable to have either a thirty-two-bit or a sixty-four-bit interface to the bus connector 56.
  • the configuration template 138 defines a parameter BUS-WIDTH, the value of which determines the width of the interface 82.
  • for example, BUS-WIDTH = 0 configures the interface 82 to have a thirty-two-bit interface, and BUS-WIDTH = 1 configures the interface 82 to have a sixty-four-bit interface.
  • Other parameters that may be configurable in this manner include the depth of a first-in-first-out (FIFO) data buffer (not shown) disposed within the data memory 58 (FIGS. 2 - 3), the lengths of messages received and transmitted by the interface adapter layer 70, the precision and data type (e.g., integer, floating-point) of the pipeline(s) 44, and a constant coefficient of a mathematical expression (e.g., "a" in ax 2 ) that a pipeline executes.
  • the PLIC synthesizer and router tool (not shown) configures the interface-adapter layer 70, the framework-services layer 72, the communication shell 74, and the hardwired pipeline(s) 44 (FIGS. 2 - 3) according to the set values in the template 138 during the synthesis of the hardware-description file 120. Consequently, to reconfigure the circuit parameters associated with the parameters defined in the configuration template 138, one need only modify the values of these parameters in the configuration template, and then rerun the synthesizer and router tool on the file 120.
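  • The sketch below illustrates that reconfiguration flow under stated assumptions: a host-side helper rewrites a parameter value in a configuration template and then re-runs a synthesizing and routing tool. The "parameter NAME = VALUE;" syntax, the file name config_template.v, and the plic_synth command are placeholders, not any vendor's actual tool interface.

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

// Rewrite "parameter NAME = VALUE;" style lines in a configuration template.
// The parameter syntax shown here is an assumption for illustration.
void setParameter(const std::string& path, const std::string& name, int value) {
    std::ifstream in(path);
    std::string text((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    std::regex pat("parameter\\s+" + name + "\\s*=\\s*\\d+;");
    text = std::regex_replace(text, pat, "parameter " + name + " = " + std::to_string(value) + ";");
    std::ofstream(path) << text;
}

int main() {
    const std::string configTemplate = "config_template.v";  // hypothetical file name
    setParameter(configTemplate, "BUS_WIDTH", 1);            // select the 64-bit interface

    // Re-run the (hypothetical) synthesizing and routing tool to regenerate
    // the firmware file from the modified hardware-description file 120.
    return std::system("plic_synth --top hardware_description_120 --out pipeline_unit.fw");
}
```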
  • templates that do not incorporate settable parameters such as those provided by the configuration template 138 are sometimes called modules or entities, and are typically lower-level templates that include Boolean expressions that a synthesizer and router tool (not shown) converts into circuitry for implementing the expressions.
  • the hardware-description file 120 may define circuitry for instantiation on an application-specific integrated circuit (ASIC).
  • FIG. 6 is a block diagram of the accelerator-template library 102 of FIG. 4 according to an embodiment of the invention.
  • the library 102 contains one or more versions of the templates described above in conjunction with FIG. 5. For clarity, however, the optional lower-level templates 129, 131, 135, and 136 are omitted from FIG. 6. Furthermore, a library similar to the library 102 is described in previously incorporated U.S. Patent App. Ser. No. (Attorney Docket No. 1934-023-03).
  • the library 102 has m+1 sections: m sections 140 1 - 140 m for the respective m platforms that the library supports, and a section 142 for the hardwired pipelines 44 (FIGS. 1 - 3) that the library supports.
  • the library section 140 1 is discussed in detail, it being understood that the other library sections 140 2 - 140 m are similar.
  • the library section 140 1 also includes n communication-shell templates 132 1,1 - 132 1,n, which respectively correspond to the hardwired-pipeline templates 134 1 - 134 n in the library section 142.
  • the communication shell 74 interfaces a hardwired pipeline or hardwired pipelines 44 to the framework-services layer 72. Because each hardwired pipeline 44 is different and, therefore, typically has different interface specifications, the communication shell 74 is typically different for each hardwired pipeline. Consequently, in this embodiment, a designer creates a unique version of the communication shell 74 for each hardwired pipeline 44 by writing a unique communication-shell template 132 for that pipeline.
  • the group of communication-shell templates 132 1,1 - 132 1,n corresponds only to the version of the framework-services layer 72 that is defined by the template 130 1; consequently, if there are multiple versions of the framework-services layer 72 that are compatible with the platform 1, then the library section 140 1 includes a respective group of n communication-shell templates 132 for each version of the framework-services layer.
  • the library section 140 1 includes a configuration template 138 1, which defines for the other templates in this library section (and possibly for the hardwired-pipeline templates 134 in the section 142) configuration constants having designer-selectable values as discussed above in conjunction with the configuration template 138 of FIG. 5.
  • each template within the library section 140 1 includes, or is associated with, a respective template description 144 1 - 152 1.
  • the descriptions 144 1 - 150 1,n describe the operational and other parameters of the circuitry that the respective templates 121 1, 128 1, 130 1, and 132 1,1 - 132 1,n respectively define.
  • the template description 152 1 describes the settable parameters in the configuration template 138 1, the values that these parameters can have, and the meanings of these values.
  • Examples of parameters that a template description 144 1 - 150 1,n may describe include the width of the data bus and the depths of FIFO buffers that the circuit defined by the corresponding template includes, the latency of the circuit, and the type and precision of the values received and generated by the circuit.
  • Each of the template descriptions 144 1 - 152 1 may be embedded within the template 121 1, 128 1, 130 1, 132 1,1 - 132 1,n, and 138 1 to which it corresponds.
  • the IAL template description 146 1 may be embedded within the interface-adapter-layer template 128 1 as extensible markup language (XML) tags or comments that are readable by both a human and the host processor 12 (FIG. 1) as discussed below in conjunction with FIGS. 10 - 14.
  • each of the template descriptions 144 1 - 152 1 may be disposed in a separate file that is linked to the template to which the description corresponds, and this file may be written in a language other than XML.
  • the top-level-template description 144 1 may be disposed in a file that is linked to the top-level template 121 1.
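  • As an illustration of how host software might read such a description, the C++ sketch below pulls individual values out of an XML-style template description with a crude string search; the tag names (bus_width, latency_cycles, data_type) are assumptions, and a real implementation would presumably use a proper XML parser.

```cpp
#include <iostream>
#include <string>

// Extract the text between <tag> and </tag>; returns an empty string if absent.
// A minimal stand-in for real XML parsing, for illustration only.
std::string xmlValue(const std::string& doc, const std::string& tag) {
    const std::string open = "<" + tag + ">", close = "</" + tag + ">";
    auto b = doc.find(open);
    if (b == std::string::npos) return "";
    b += open.size();
    auto e = doc.find(close, b);
    return e == std::string::npos ? "" : doc.substr(b, e - b);
}

int main() {
    // Hypothetical description embedded in (or linked to) a template.
    std::string description =
        "<template_description>"
        "  <bus_width>64</bus_width>"
        "  <latency_cycles>12</latency_cycles>"
        "  <data_type>float32</data_type>"
        "</template_description>";

    std::cout << "bus width: " << xmlValue(description, "bus_width") << '\n';
    std::cout << "data type: " << xmlValue(description, "data_type") << '\n';
}
```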
  • the section 140 1 of the library 102 also includes a description 154 1 of the corresponding platform.
  • the host processor 12 may use the description 154 1 to determine which platform(s) the library 102 supports as discussed below in conjunction with FIGS. 10 - 14. Examples of parameters that the description 154 1 may describe include: 1) for each interface, the message specification, which lists the transmitted variables and the constraints for those variables, and 2) a behavior specification and any behavior constraints. Messages that the host processor 12 (FIG. 1) sends to the pipeline units 50 (FIG. 2) and that the pipeline units send among themselves are further discussed in previously incorporated U.S. Patent Publication No. 2004/0181621.
  • the platform description 154 1 may be written in XML or in another language.
  • the section 142 of the library 102 includes n hardwired-pipeline templates 134 1 - 134 n, which each define a respective hardwired pipeline 44 1 - 44 n (FIGS. 1 - 3).
  • because the templates 134 1 - 134 n are platform independent (the corresponding communication-shell templates 132 m,1 - 132 m,n respectively define the specified interfaces between the pipelines 44 and the framework-services layer 72), the library 102 stores only one template 134 for each hardwired pipeline 44. That is, each hardwired pipeline 44 does not require a separate template 134 for each platform m that the library 102 supports.
  • each hardwired-pipeline template 134 includes, or is associated with, a respective template description 156 1 - 156 n, which describes parameters of the hardwired pipeline 44 that the template defines.
  • parameters that a template description 156 1 - 156 n may describe include the type (e.g., floating point or integer) and precision of the data values that the corresponding hardwired pipeline 44 can receive and generate, and the latency of the pipeline.
  • each of the descriptions 156 1 - 156 n may be respectively embedded within the hardwired-pipeline template 134 1 - 134 n to which the description corresponds as, e.g., XML tags, or may be disposed in a separate file that is linked to the corresponding hardwired-pipeline template.
  • each library section 140 1 - 140 m may include a single description that describes all of the templates within that library section. For example, this single description may be embedded within or linked to the top-level template 121 or to the configuration template 138.
  • although each library section 140 1 - 140 m is described as including a respective communication-shell template 132 for each hardwired-pipeline template 134 in the library section 142, each section 140 may include fewer communication-shell templates, at least some of which are compatible with, and thus correspond to, more than one pipeline template 134.
  • each library section 140 1 - 140 m may include only a single communication-shell template 132, which is compatible with all of the hardwired-pipeline templates 134 in the library section 142.
  • the library section 142 may include respective versions of each pipeline template 134 for each communication-shell template 132 in the library sections 140 1 - 140 m.
  • FIG. 7 is a block diagram of the software-object library 104 of FIG. 4 according to an embodiment of the invention.
  • the library 104 includes software objects 160 1 - 160 q, at least some of which can cause the host processor 12 (FIG. 1) to perform in software the same functions that respective ones of the hardwired pipelines 44 1 - 44 n (FIGS. 2 - 3) can perform in hardware. For example, if the pipeline 44 1 squares a value v (v 2) input to the pipeline, then a corresponding software object 160 1 can cause the host processor 12 to square an input value v (v 2).
  • the software objects 160 1 - 160 q may be directly executable by the host processor 12, or may cause the host processor to generate corresponding programming code that the host processor can execute.
  • the software objects 160 1 - 160 q may be written in any conventional programming language such as C++. Because object-oriented software architectures are known, further details of the software objects 160 are omitted for brevity.
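  • A minimal C++ sketch of a software object mirroring the v-squared example above, assuming a simple block-processing interface; the PipelineObject and SquareObject class names are hypothetical and are not taken from the patent.

```cpp
#include <iostream>
#include <vector>

// Hypothetical interface shared by software objects that stand in for
// hardwired pipelines: consume a block of input values, produce a block of
// output values.
class PipelineObject {
public:
    virtual ~PipelineObject() = default;
    virtual std::vector<double> process(const std::vector<double>& in) = 0;
};

// Software counterpart of a pipeline that squares each input value v (v^2).
class SquareObject : public PipelineObject {
public:
    std::vector<double> process(const std::vector<double>& in) override {
        std::vector<double> out;
        out.reserve(in.size());
        for (double v : in) out.push_back(v * v);
        return out;
    }
};

int main() {
    SquareObject square;
    for (double y : square.process({1.0, 2.0, 3.0})) std::cout << y << ' ';  // 1 4 9
    std::cout << '\n';
}
```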
  • the library 104 also includes respective object descriptions 162 1 - 162 q of the software objects 160 1 - 160 q.
  • the object descriptions 162 may describe parameters and other features of the software objects 160, such as the function(s) that they cause the host processor 12 (FIG. 1) to perform, and the latency and the type and precision of the values accepted and generated by the host processor 12 while performing the function(s).
  • the descriptions 162 may be written in a conventional language, such as XML, that the host processor 12 recognizes, and may be embedded, e.g., as comment tags, within the respective software objects 160 or may be contained within separate files that correspond to the respective software objects.
  • the software objects 160 provide the host processor 12 flexibility in configuring the pipeline accelerator 14, and in reconfiguring the peer-vector machine 10 in the event of a failure.
  • for example, suppose the configuration data 100 calls for instantiating eight hardwired pipelines 44 on the accelerator 14, but the accelerator has room for only seven pipelines.
  • the host processor 12 may execute a software object that corresponds to the eighth pipeline so as to perform the function that the eighth pipeline otherwise would have performed.
  • or suppose the configuration data 100 calls for instantiating a pipeline 44 that performs a function (e.g., sin(v)), but no such pipeline is available.
  • the host processor 12 may execute a software object 160 so as to perform the function.
  • FIG. 8 is a block diagram of the circuit-definition library 106 of FIG. 4 according to an embodiment of the invention.
  • the library 106 includes circuit-definition files 170 1 - 170 p, which each define a respective circuit for instantiation on one or more PLICs (FIGS. 2 - 3) of the pipeline accelerator 14 (FIG. 1) in terms of templates from the accelerator-template library 102 (FIG. 4).
  • the circuit file 170 identifies from the template library 102 (FIGS. 4 and 6) the templates that define the circuit's hardwired pipelines 44, and the file 170 defines the interconnections between these pipelines.
  • the circuit file 170 identifies for each PLIC the templates that define the circuitry to be instantiated on that PLIC, and also defines the interconnections between the PLICs. An example of a circuit defined by a circuit file 170 is described below in conjunction with FIG. 11.
  • the library 106 also includes circuit descriptions 172 1 - 172 p that correspond to the circuit-definition files 170 1 - 170 p.
  • the description 172 may also identify the platform(s) with which the corresponding circuit is compatible, and may include values for the constants defined by the configuration template(s) 138 (FIGS. 5 - 6) that the circuit definition file 170 identifies.
  • each of the circuit descriptions 172 may be written in a conventional language, such as XML, that the host processor 12 (FIG. 1) recognizes, and may be embedded (e.g., as comment tags) within a respective circuit-definition file 170 or may be contained within a separate file that is linked to the respective circuit-definition file.
  • Files similar to the circuit-definition files 170 and a tool for generating these files are disclosed in previously incorporated U.S. Patent Application (Attorney Docket No. 1934-023-03). Furthermore, although described as defining circuits for instantiation on one or more PLICs, some of the circuit-definition files 170 may define circuits for instantiation on an ASIC.
  • FIG. 9 is a block diagram of the accelerator-firmware library 108 of FIG. 4 according to an embodiment of the invention.
  • the library 108 includes firmware files 180 1 - 180 r, each of which, when downloaded by a PLIC, configures the PLIC to instantiate a respective circuit.
  • the respective circuit typically includes an interface-adapter layer 70, framework-services layer 72, communication shell 74, and one or more hardwired pipelines 44, although the circuit may have a different topology.
  • a PLIC synthesizing and routing tool may generate one or more of the firmware files 180 from templates in the accelerator-template library 102 (FIG. 4), or in another manner.
  • the firmware files 180 1 - 180 r are the only files within the accelerator/host-processor-configuration registry 40 (FIG. 4) that can actually configure a PLIC to instantiate a circuit. That is, although the templates in the library 102 (FIG. 4) and the circuit-definition files 170 in the library 106 (FIG. 4) define circuits, the configuration manager (not shown in FIG. 9) executed by the host processor 12 (FIG. 1) cannot instantiate such a defined circuit on the pipeline accelerator 14 (FIG. 1) until the respective template(s) and/or circuit-definition files are converted into one or more corresponding firmware files 180 using, for example, a PLIC synthesizing and routing tool (not shown).
  • the library 108 also includes respective descriptions 182 1 - 182 r of the firmware files 180 1 - 180 r.
  • the description 182 may also identify the platform(s) with which the corresponding circuit is compatible, and may also identify the type(s) of PLIC on which the circuit can be instantiated.
  • the descriptions 182 may be written in a conventional language, such as XML, that the host processor 12 (FIG. 1) recognizes, and may be embedded (e.g., as comment tags) within the respective firmware files 180 or may be contained within separate files that are linked to the respective firmware files.
  • FIG. 10 is a functional block diagram of the host processor 12, the interface memory 18, and the pipeline bus 20 of FIG. 1 according to an embodiment of the invention.
  • the processing unit 32 executes one or more software applications
  • the message handler 34 executes one or more software objects (different from the software objects in the library 104 of FIG. 4) that transfer data between the software application(s) and the pipeline accelerator 14 (FIG. 1). Splitting the data-processing, data-transferring, and other functions among different applications and objects allows for easier design and modification of the host-processor software.
  • where a software application is described as performing a particular function, it is understood that in actual operation, the processing unit 32 or message handler 34 executes the software application and performs this function under the control of the application.
  • where a software object is described as performing a particular function, it is understood that in actual operation, the processing unit 32 or message handler 34 executes the software object and performs this function under the control of the object.
  • where a manager application (e.g., the configuration manager) is described as performing a particular function, it is understood that in actual operation, the processing unit 32 or message handler 34 executes the manager application and performs this function under the control of the manager application.
  • the processing unit 32 executes at least one data-processing application 190, an accelerator-exception-manager application (hereinafter the exception manager) 192, and an accelerator-configuration-manager application (hereinafter the configuration manager) 194, which are collectively referred to as the processing-unit applications. Furthermore, the exception and configuration managers 192 and 194 are executed by the processing unit 32, and are thus different from the exception and configuration managers 86 and 88 disposed on the PLIC 60 of FIG. 3.
  • the data-processing application 190 processes data in cooperation with the pipeline accelerator 14 (FIG. 1).
  • the data-processing application 190 may receive raw sonar data via the port 24, parse the data, and send the parsed data to the accelerator 14, and the accelerator may perform a fast Fourier transform (FFT) on the parsed data and return the FFT output data to the data-processing application for further processing.
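  • The following C++ sketch mirrors that division of labor under stated assumptions: the host parses raw samples and hands blocks to the accelerator, which returns transformed data for further processing. The acceleratorFft and parseRaw functions are placeholders (the accelerator round trip is simulated with an identity transform), not the patent's message interface.

```cpp
#include <complex>
#include <iostream>
#include <vector>

using Sample = std::complex<float>;

// Placeholder for the accelerator round trip: in the real machine the parsed
// block would travel over the pipeline bus and come back as FFT output.
// Here it is simulated with an identity transform.
std::vector<Sample> acceleratorFft(const std::vector<Sample>& block) {
    return block;
}

// Parse raw values from the input port into complex samples (trivial stand-in).
std::vector<Sample> parseRaw(const std::vector<float>& raw) {
    std::vector<Sample> parsed;
    for (float r : raw) parsed.emplace_back(r, 0.0f);
    return parsed;
}

int main() {
    std::vector<float> raw = {0.1f, 0.4f, -0.2f, 0.3f};   // stand-in for raw sonar data
    auto parsed = parseRaw(raw);                          // data-processing application side
    auto spectrum = acceleratorFft(parsed);               // pipeline accelerator side
    std::cout << "received " << spectrum.size() << " FFT bins for further processing\n";
}
```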
  • the exception manager 192 handles exception messages from the pipeline accelerator 14 (FIG. 1), and may detect and handle exceptions that result from the operation of the host processor 12.
  • the PLIC exception manager(s) 88 (FIG. 3) typically generate the exception messages that the exception manager 192 receives from the pipeline accelerator 14.
  • the configuration manager 194 downloads the firmware files 180 from the library 108 (FIGS. 4 and 9) into the accelerator firmware memory or memories 22 (FIGS. 1 - 3) during initialization of the peer-vector machine 10 (FIG. 1), and may also reconfigure the pipeline accelerator 14 (FIG. 1) after the initialization in response to, e.g., a malfunction of the peer-vector machine.
  • the configuration manager 194 may perform additional functions as described below in conjunction with FIGS. 11 - 14.
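  • A hedged sketch of the initialization step described above: for each detected pipeline unit, the configuration manager selects a firmware file and writes it into that unit's firmware memory so the PLIC can download it. The PipelineUnit structure and the platform-to-firmware map are illustrative stand-ins for the registry contents, not the patent's data layout.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Illustrative stand-ins for a pipeline unit and for the firmware library:
// a map from platform identifier to firmware file name.
struct PipelineUnit {
    int id;
    std::string platformId;     // read from the platform-identification memory
    std::string firmwareLoaded; // contents of the firmware memory (file name only, here)
};

void initializeAccelerator(std::vector<PipelineUnit>& units,
                           const std::map<std::string, std::string>& firmwareLibrary) {
    for (auto& u : units) {
        auto it = firmwareLibrary.find(u.platformId);
        if (it == firmwareLibrary.end()) {
            std::cout << "unit " << u.id << ": no firmware for platform " << u.platformId << '\n';
            continue;  // a real manager might report an exception here
        }
        u.firmwareLoaded = it->second;  // write into the unit's firmware memory
        std::cout << "unit " << u.id << ": PLIC will download " << it->second << '\n';
    }
}

int main() {
    std::vector<PipelineUnit> units = {{0, "platform64"}, {1, "platform32"}};
    std::map<std::string, std::string> firmwareLibrary = {
        {"platform64", "fft_64bit.fw"}, {"platform32", "fft_32bit.fw"}};
    initializeAccelerator(units, firmwareLibrary);
}
```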
  • the processing-unit applications 190, 192, and 194 may communicate with each other directly as indicated by the dashed lines 196, 198, and 200, or may communicate with each other via the data-transfer objects 202, which are described below. Furthermore, the processing-unit applications 190, 192, and 194 communicate with the pipeline accelerator 14 (FIG. 1) via the data-transfer objects 202.
  • the message handler 34 executes the data-transfer objects 202, a communication object 204, input- and output-reader objects 206 and 208, and input- and output-queue objects 210 and 212.
  • the data-transfer objects 202 transfer data between the communication object 204 and the processing-unit applications 190, 192, and 194, and may use the interface memory 18 as one or more data buffers to allow the processing-unit applications and the pipeline accelerator 14 (FIG. 1) to operate independently.
  • the memory 18 allows the accelerator 14, which is often faster than the data-processing application 190, to operate without "waiting" for the data-processing application.
  • the communication object 204 transfers data between the data-transfer objects 202 and the pipeline bus 20.
  • the input- and output-reader objects 206 and 208 control the data-transfer objects 202 as they transfer data between the communication object 204 and the processing-unit applications 190, 192, and 194. And, when executed, the input- and output-queue objects 210 and 212 cause the input- and output-reader objects 206 and 208 to synchronize this transfer of data according to a desired priority.
  • the message handler 34 instantiates and executes an object factory 214, which instantiates the data-transfer objects 202 from configuration data stored in the message-configuration registry 42 (FIG. 1).
  • the message handler 34 also instantiates the communication object 204, the input- and output-reader objects 206 and 208, and the input- and output-queue objects 210 and 212 from the configuration data stored in the message-configuration registry 42. Consequently, one can design and modify the objects 202 - 212, and thus their data-transfer parameters, by merely designing or modifying the configuration data stored in the registry 42. This is typically less time consuming than designing or modifying each software object individually. The structure and operation of the processing unit 32 and the message handler 34 are further described in previously incorporated U.S. Patent Publication No. 2004/0181621.
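  • As a rough C++ sketch of that factory pattern, the code below builds data-transfer objects from registry-style configuration data, so transfer parameters change by editing data rather than code; the DataTransferObject class, the registry map, and the channel names are assumptions for illustration.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical data-transfer object: moves data for one named channel with a
// buffer size taken from configuration data rather than hard-coded.
class DataTransferObject {
public:
    DataTransferObject(std::string channel, std::size_t bufferBytes)
        : channel_(std::move(channel)), bufferBytes_(bufferBytes) {}
    void describe() const {
        std::cout << channel_ << ": buffer " << bufferBytes_ << " bytes\n";
    }
private:
    std::string channel_;
    std::size_t bufferBytes_;
};

// Object factory: instantiates one data-transfer object per registry entry.
std::vector<std::unique_ptr<DataTransferObject>>
buildFromRegistry(const std::map<std::string, std::size_t>& registry) {
    std::vector<std::unique_ptr<DataTransferObject>> objects;
    for (const auto& [channel, bytes] : registry)
        objects.push_back(std::make_unique<DataTransferObject>(channel, bytes));
    return objects;
}

int main() {
    // Stand-in for the message-configuration registry contents.
    std::map<std::string, std::size_t> registry = {{"raw-data-in", 65536}, {"fft-out", 32768}};
    for (const auto& obj : buildFromRegistry(registry)) obj->describe();
}
```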
  • FIG. 11 is a block diagram of a circuit 220 that is designed for instantiation on the pipeline accelerator 14 (FIG. 1) according to an embodiment of the invention.
  • the clock signals, power signals, and other signals are omitted from FIG. 11 for clarity.
  • the circuit 220 generates, in a pipelined fashion, a stream of output values y from streams of input values x and z, which are related by the following equation:
  • the circuit 220 is designed for instantiation on a pipeline accelerator 14 (FIG. 1) that supports a platform that specifies sixty-four-bit data transfers and busses.
• the circuit 220 includes eight hardwired pipelines 44 1 - 44 8 (pipelines 44 5 and 44 6 are the same) and eight hardware-interface layers 62 1 - 62 8 respectively instantiated on eight PLICs 60 1 - 60 8.
• the pipeline 44 1 on the PLIC 60 1 receives the stream of input values x and generates a stream of values sin(x).
• the pipeline 44 2 on the PLIC 60 2 receives the stream of input values z and generates a stream of values bz 3.
• the pipeline 44 3 on the PLIC 60 3 receives the stream x and generates a stream ax 4.
• the pipeline 44 4 on the PLIC 60 4 receives the stream z and generates a stream cos(z).
• the pipeline 44 5 on the PLIC 60 5 receives from the PLICs 60 1 and 60 2 the streams sin(x) and bz 3 and generates a stream of values bz 3 sin(x), and the pipeline 44 6 on the PLIC 60 6 receives from the PLICs 60 3 and 60 4 the streams ax 4 and cos(z) and generates a stream ax 4 cos(z).
• the pipeline 44 7 on the PLIC 60 7 receives from the PLICs 60 5 and 60 6 the streams bz 3 sin(x) and ax 4 cos(z) and generates a stream bz 3 sin(x) + ax 4 cos(z).
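The equation referenced above does not survive in this text. Taken together, the pipeline descriptions imply the following relationship, reconstructed here as an inference rather than a quotation of the original equation (the function of the final pipeline 44 8 is not detailed in this excerpt):

```latex
% Reconstructed from the pipeline descriptions above -- an inference from this
% text, not a quotation of the patent's equation:
y \;=\; b\,z^{3}\sin(x) \;+\; a\,x^{4}\cos(z)
```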
• FIG. 12 is a block diagram of the data paths between the PLICs 60 1 - 60 8 of FIG. 11 according to an embodiment of the invention, and is further described below.
  • FIG. 13 is a block diagram of the circuit 220 modified for instantiation on seven PLICs 60 instead of eight PLICs (as shown in FIG. 11) according to an embodiment of the invention, and is further described below.
• FIG. 14 is a block diagram of the data paths between the PLICs 60 and the host processor 12 for a modified instantiation of the circuit 220 according to an embodiment of the invention, and is further described below.
• The operation of the configuration manager 194 during the initialization of the peer-vector machine 10 (FIG. 1) is discussed in conjunction with FIGS. 10 - 14 according to embodiments of the invention. Although a number of detailed operational examples are provided below, the following is a general overview of the configuration manager 194 and some of the advantages that it may provide.
• the configuration manager 194 initializes the peer-vector machine 10 (FIG. 1) when the machine is "turned on," restarted, or is otherwise reset.
• the configuration manager 194 determines the desired configuration of the pipeline accelerator 14 (FIG. 1) from the configuration data 100 (FIG. 4) within the accelerator/host-processor-configuration registry 40 (FIGS. 1 and 4), and also determines the physical composition (e.g., the number of pipeline units 50 (FIGS. 2 - 3) and the platform(s) that they support) of the pipeline accelerator.
• because the configuration manager 194 configures the pipeline accelerator 14 (FIG. 1) in response to the configuration data 100 (FIG. 4), one can typically change the accelerator configuration merely by "turning off" the peer-vector machine 10 (FIG. 1), changing the configuration data, and then restarting the machine.
• the configuration manager 194 can detect changes to the accelerator (e.g., the removal or addition of a pipeline unit 50 (FIGS. 2 - 3)), and can often "fit" the circuit(s) specified by the configuration data 100 (FIG. 4) into an altered accelerator. That is, the configuration manager 194 can often detect a physical change to the accelerator 14 and modify the specified circuit instantiation(s) accordingly so that the circuit(s) can fit onto the modified accelerator and process data as desired despite the change.
  • the configuration data 100 points to a circuit-definition file 170 (FIGS. 4 and 8) that defines the circuit 220 of FIG. 11, and instructs the configuration manager 194 to instantiate the circuit 220 on the pipeline accelerator 14 (FIG. 1) according to this circuit-definition file.
• the configuration manager 194 reads the configuration data 100, determines from the configuration data the desired configuration of the pipeline accelerator 14 (FIG. 1), and also determines the physical composition of the pipeline accelerator. Regarding the former determination, the configuration manager 194 first determines that it is to read the circuit-definition file 170 pointed to by the configuration data 100. Next, the configuration manager 194 reads the file 170, and determines that the manager is to instantiate on each of the eight PLICs 60 1 - 60 8 a respective pipeline 44 1 - 44 8 (pipelines 44 5 and 44 6 are the same) and hardware-interface layer 62 1 - 62 8.
• the pipeline bus 20 may include slots for receiving pipeline units 50 (FIGS. 2 - 3), and the configuration manager 194 may, for each slot, read a conventional indicator associated with the slot, or use another technique, to determine whether or not a pipeline unit is inserted into the slot. [135] Next, the configuration manager 194 determines whether the configuration indicated by the configuration data 100 (FIG. 4) is compatible with the physical composition of the pipeline accelerator 14 (FIG. 1).
• the configuration manager 194 determines whether the accelerator 14 includes eight pipeline units 50 each having a respective one of the PLICs 60 1 - 60 8 on which the configuration manager can instantiate the pipelines 44 1 - 44 8 and the hardware-interface layers 62 1 - 62 8.
  • the configuration manager 194 determines that the desired configuration of the pipeline accelerator 14 (FIG. 1) is compatible with the physical composition of the accelerator.
• the configuration manager 194 next determines whether the pipeline accelerator 14 (FIG. 1) supports the platform(s) that the circuit-definition file 170 specifies as being compatible with the circuit 220. More specifically, the configuration manager 194 reads from the file 170 the specified platform(s), and reads from the platform-identifier memory 54 (FIGS. 2 - 3) on each pipeline unit 50 (FIGS. 2 - 3) the identity/identities of the platform(s) that the pipeline units support. Then, the configuration manager 194 compares the specified platform(s) from the file 170 to the identified platform(s) from the memories 54.
  • the configuration manager 194 determines that the platform(s) supported by the pipeline accelerator 14 is/are compatible with the platform(s) specified by the file 170.
• the file 170 indicates that the circuit 220 is compatible with platform 1 (FIG. 6), and the platform-identifier memory 54 on each pipeline unit 50 indicates that the respective pipeline unit is compatible with this platform; consequently, the configuration manager 194 determines that the pipeline accelerator 14 is compatible with the platform (i.e., platform 1) specified by the circuit-definition file 170.
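A minimal sketch of this platform check, under the assumption of hypothetical data structures (the patent defines no programming interface): the platform(s) named in the circuit-definition file are compared against the platform identifier(s) read from each pipeline unit's platform-identifier memory 54.

```python
# Hypothetical model of the platform-compatibility check described above.

def accelerator_supports_circuit(circuit_platforms, pipeline_unit_platforms):
    """True if every pipeline unit supports at least one platform named by the file."""
    return all(
        any(p in circuit_platforms for p in unit_platforms)
        for unit_platforms in pipeline_unit_platforms
    )

# Example: the circuit-definition file specifies platform 1, and each of the
# eight pipeline units identifies platform 1 in its platform-identifier memory.
circuit_platforms = {"platform 1"}
pipeline_unit_platforms = [{"platform 1"}] * 8
print(accelerator_supports_circuit(circuit_platforms, pipeline_unit_platforms))  # True
```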
  • the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) supports the platform(s) that the circuit-definition file 170 (FIG. 8) specifies.
• the configuration manager 194 next determines whether the firmware library 108 (FIGS. 4 and 9) includes firmware files 180 that, when downloaded by the PLICs 60 1 - 60 8, will respectively instantiate on these PLICs the pipelines 44 1 - 44 8 and the hardware-interface layers 62 1 - 62 8.
  • the configuration manager 194 makes this determination by reading the firmware descriptions 182 in the library 108.
• the configuration manager 194 matches the firmware file 180 1 to the PLIC 60 1 in the circuit 220.
• the configuration manager 194 determines that the library 108 (FIGS. 4 and 9) contains firmware files 180 1 - 180 7 for each of the PLICs 60 1 - 60 8 — the firmware file 180 5 is for both the PLICs 60 5 and 60 6 — because the pipelines 44 5 - 44 6 are the same.
• the configuration manager 194 next downloads these firmware files 180 1 - 180 7 (FIG. 9) to the PLICs 60 1 - 60 8 (FIG. 11) via the pipeline bus 20.
  • Techniques for downloading these firmware files are described in previously incorporated U.S. Patent Publication No. 2004/0170070.
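The firmware-matching step described above might be modeled as follows; the file names, pipeline labels, and the "output" function assigned to PLIC 60 8 are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of the firmware-matching step: for each PLIC, find a
# firmware file whose description provides the pipeline that the
# circuit-definition file assigns to that PLIC.  Identical pipelines
# (44 5 and 44 6) can share one file.

firmware_descriptions = {   # firmware file -> pipeline it instantiates (illustrative)
    "180_1": "sin(x)", "180_2": "b*z^3", "180_3": "a*x^4", "180_4": "cos(z)",
    "180_5": "multiply", "180_6": "add", "180_7": "output",
}

required_pipelines = {      # PLIC -> pipeline it must host (illustrative)
    "60_1": "sin(x)", "60_2": "b*z^3", "60_3": "a*x^4", "60_4": "cos(z)",
    "60_5": "multiply", "60_6": "multiply", "60_7": "add", "60_8": "output",
}

def match_firmware(required, descriptions):
    plan, missing = {}, []
    for plic, pipeline in required.items():
        file = next((f for f, d in descriptions.items() if d == pipeline), None)
        if file is None:
            missing.append(plic)
        else:
            plan[plic] = file
    return plan, missing

plan, missing = match_firmware(required_pipelines, firmware_descriptions)
print(plan["60_5"], plan["60_6"])   # both 180_5: one file serves two PLICs
print(missing)                      # [] -> download can proceed (else see Example 5)
```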
• the configuration manager 194 determines the topology that the circuit-definition file 170 (FIG. 8) specifies for interconnecting the PLICs 60 1 - 60 8 of the circuit 220 (FIG. 11).
• the circuit-definition file 170 (FIG. 8) specifies that the PLICs 60 1 - 60 8 (FIG. 11) are to be interconnected via the host processor 12 (FIG. 1) as shown in FIG. 12.
• the configuration manager 194 instantiates in the interface memory 18 buffers 230 1 - 230 15, and instantiates in the message handler 34 data-transfer objects 202 1 - 202 23.
• the PLIC 60 1 needs a path on which to provide the stream of values sin(x) to the corresponding input pin of the PLIC 60 5.
• the configuration manager 194 forms this path by instantiating in the interface memory 18 the buffers 230 1 and 230 2, and by instantiating in the message handler 34 the data-transfer objects 202 1, 202 2, and 202 3.
• the PLIC 60 1 provides the stream of values sin(x) to the data-transfer object 202 1 via the pipeline bus 20 and communication object 204, and the data-transfer object 202 1 sequentially loads these values into the buffer 230 1.
• the data-transfer object 202 2 sequentially transfers the values sin(x) from the buffer 230 1 to the buffer 230 2 in first-in-first-out fashion.
• the data-transfer object 202 3 transfers the values sin(x) from the buffer 230 2 in first-in-first-out fashion to the corresponding input pin of the PLIC 60 5 via the communication object 204 and the pipeline bus 20.
  • the configuration manager 194 forms the remaining paths interconnecting the PLICs in a similar manner.
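As a purely illustrative model (assuming Python queues in place of the interface memory 18), one host-mediated path of FIG. 12 can be sketched as follows; the object and buffer names mirror the reference numerals above, but the code itself is not from the patent.

```python
# Hypothetical model of one host-mediated data path described above:
# PLIC 60_1 -> object 202_1 -> buffer 230_1 -> object 202_2 -> buffer 230_2
#           -> object 202_3 -> PLIC 60_5, with each hand-off in FIFO order.
from collections import deque
import math

buffer_230_1, buffer_230_2 = deque(), deque()

def obj_202_1(value):                 # loads values arriving from PLIC 60_1
    buffer_230_1.append(value)

def obj_202_2():                      # moves values between the two buffers
    while buffer_230_1:
        buffer_230_2.append(buffer_230_1.popleft())

def obj_202_3(deliver):               # unloads values toward PLIC 60_5
    while buffer_230_2:
        deliver(buffer_230_2.popleft())

received_by_60_5 = []
for x in (0.0, 0.5, 1.0):
    obj_202_1(math.sin(x))            # stand-in for the sin(x) stream from 60_1
obj_202_2()
obj_202_3(received_by_60_5.append)
print(received_by_60_5)               # values arrive in first-in-first-out order
```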
• the PLIC 60 2 transfers the values bz 3 to the corresponding input pin of the PLIC 60 5 via the data-transfer objects 202 4 - 202 6 and the buffers 230 3 - 230 4.
• the PLIC 60 3 transfers the values ax 4 to the corresponding input pin of the PLIC 60 6 via the data-transfer objects 202 7 - 202 9 and the buffers 230 5 - 230 6, and so on.
• the PLICs 60 1 - 60 4 may receive the values x and z via the raw-data input port 24, or from the data-processing application 190 via respective buffers 230 and data-transfer objects 202 (omitted from FIG. 12 for brevity) that the configuration manager 194 instantiates in response to the circuit-definition file 170.
  • the data-processing application 190 may provide the values x to a first data-transfer object 202 (not shown), which loads the values x into a buffer 230 (not shown).
• a second data-transfer object 202 (not shown) unloads the values x from the buffer 230 and provides these values to the corresponding input pins of the PLICs 60 1 and 60 3 via the communication object 204 and the pipeline bus 20.
• after instantiating the data-transfer objects 202 1 - 202 23 and the buffers 230 1 - 230 15 (and possibly the data-transfer objects and buffers described in the preceding paragraph), the configuration manager 194 sends to the configuration managers 88 (FIG. 3) on each of the PLICs 60 1 - 60 8 any soft-configuration data specified by the circuit-definition file 170. For example, the configuration manager 194 may send to the configuration managers 88 on the PLICs 60 2 and 60 3 soft-configuration data that sets the values of the constants a and b.
• the configuration manager 194 may send to the configuration managers 88 on the PLICs 60 1 and 60 4 soft-configuration data that causes the respective exception managers 86 on these PLICs to indicate exceptions for values of sin(x) and cos(z) outside of the ranges -1 ≤ sin(x) ≤ 1 and -1 ≤ cos(z) ≤ 1, respectively.
• the configuration manager 194 sends this soft-configuration data to the PLICs 60 1 - 60 8 via one or more data-transfer objects 202 that the configuration manager has instantiated for this purpose.
  • the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) includes fewer than eight PLICs 60, and thus arranges the circuit 220 to "fit" onto the available PLICs.
• the configuration manager 194 determines that the pipeline accelerator 14 includes only seven PLICs 60 1 - 60 5 and 60 7 - 60 8. [150] The configuration manager 194 sends this information to a circuit-design tool such as the circuit-design tool described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket No. 1934-023-03).
  • the tool may be executed by the host processor 12 (FIG. 1), and the configuration manager 194 may communicate with the tool via one or more data-transfer objects 202.
  • the circuit-design tool determines that the circuit 220 cannot fit onto the pipeline accelerator 14 (FIG. 1), and notifies the configuration manager 194, which generates an appropriate error message.
  • the host processor 12 may display this message via a display or by another conventional technique.
• an operator (not shown) can install into the peer-vector machine 10 (FIG. 1) an additional pipeline unit 50 (FIG. 2) that includes the PLIC 60 6 so that the configuration manager 194 can then instantiate the circuit 220 on the pipeline accelerator 14 as described above in Example 1.
• the circuit-design tool accesses the library 102 and discovers a template 134 9 for a dual-multiplication pipeline 44 9 (two multipliers in a single pipeline), and determines from the corresponding hardwired-pipeline-template description 156 9 that this pipeline (along with a corresponding hardware-interface layer 62 9) can fit into the PLIC 60 5 and can give the circuit 220 the desired operating parameters (as included in the circuit-definition file 170 that defines the circuit 220). Then, using this dual-multiplication pipeline 44 9, the tool redesigns the circuit 220 as shown in FIG. 13.
• the configuration manager 194 then instantiates the redesigned circuit 220 in a manner similar to that discussed above in conjunction with Example 1. If, however, the firmware library 108 includes no firmware file 180 for instantiating the dual-multiplier pipeline 44 9 on the PLIC 60 5, then the circuit-design tool or the configuration manager 194 may notify an operator (not shown), who manually generates this firmware file and loads it into the firmware library.
  • the circuit-design tool or the configuration manager 194 may cause a PLIC synthesizing and routing tool (not shown) to generate this firmware file from the appropriate templates in the accelerator-template library 102 (FIGS. 4 and 6). Once this firmware file is generated and stored in the library 108, the configuration manager 194 proceeds to instantiate the redesigned circuit 220 of FIG. 13 in a manner similar to that discussed above in conjunction with Example 1.
• Example 2 describes placing two multipliers on a single PLIC 60 5.
• the configuration manager 194 and/or the circuit-design tool may fit the functions of multiple ones of the other pipelines 44 1 - 44 8 of the circuit 220 on a single PLIC, including placing on a single PLIC a single pipeline that generates y in equation (2).
• the circuit-design tool may instantiate multiple interconnected ones of the pipelines 44 1 - 44 8 (FIG. 11) on a single PLIC instead of searching for existing pipelines that each perform multiple ones of the functions performed by the pipelines 44 1 - 44 8.
• the configuration manager 194 effectively replaces a hardwired pipeline 44 with a software object 160 (FIG. 7) from the software-object library 104 (FIGS. 4 and 7).
• the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) includes only seven PLICs 60 1 - 60 6 and 60 8. [156] The configuration manager 194 next reads the software-object descriptions 162 (FIG. 7) and determines that the software object 160 1 can sum two values such as bz 3 sin(x) and ax 4 cos(z).
• the configuration manager 194 instantiates the software object 160 1 (FIG. 7) as part of a data-processing application thread 240 that, after the instantiation of the remaining portions of the circuit 220 on the pipeline accelerator 14 (FIG. 1), receives bz 3 sin(x) and ax 4 cos(z) from the PLICs 60 5 and 60 6, respectively, sums corresponding values from these two streams, and then provides bz 3 sin(x) + ax 4 cos(z) to the PLIC 60 8.
• the thread 240 receives bz 3 sin(x) from the PLIC 60 5 via the pipeline bus 20, communication object 204, data-transfer object 202 24, buffer 230 15, and data-transfer object 202 25.
• the thread 240 receives ax 4 cos(z) from the PLIC 60 6 via the pipeline bus 20, communication object 204, data-transfer object 202 26, buffer 230 16, and data-transfer object 202 27.
• the thread provides bz 3 sin(x) + ax 4 cos(z) to the PLIC 60 8 via the data-transfer object 202 28, buffer 230 17, data-transfer object 202 29, communication object 204, and pipeline bus 20.
  • the configuration manager 194 instantiates these data-transfer objects and buffers as described above in conjunction with Example 1. Furthermore, the operation and instantiation of application threads such as the thread 240 are described in previously incorporated U.S. Patent Publication No. 2004/0181621.
  • the configuration manager 194 proceeds to instantiate the remaining portions of the circuit 220 on the pipeline accelerator 14 (FIG. 1) in a manner similar to that discussed above in conjunction with Example 1.
• Example 3 describes replacing a single pipeline 44 7 with a data-processing application thread 240 that executes a single corresponding software object 160 (FIG. 7).
  • the configuration manager 194 may replace any number of the pipelines 44 1 - 44 8 in the circuit 220 (FIG. 11) with one or more threads that execute corresponding software objects.
  • the configuration manager 194 may combine the concepts described in conjunction with Examples 2 and 3 by fitting multiple pipelines 44 or multiple pipeline functions on each of one or more PLICs, and replacing other pipelines 44 with one or more data-processing application threads that execute corresponding software objects 160.
• This example is similar to Example 1 except that the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) does not support the platform(s) that the circuit-definition file 170 (FIG. 8) specifies as being compatible with the circuit 220.
  • the configuration manager 194 generates an error message, and, in response, an operator (not shown) replaces the pipeline units 50 (FIG. 2) that do not support the specified platform(s) with pipeline units that do support the specified platform(s).
• the configuration manager 194 instantiates a circuit that performs the same function as the circuit 220 (i.e., generates y in equation (1)) by downloading into the available PLICs firmware files 180 (FIG. 9) that instantiate the hardwired pipelines 44 1 - 44 8 with respective hardware-interface layers 62 that are compatible with the platform(s) supported by the pipeline accelerator 14 (FIG. 1). If the library 108 (FIGS. 4 and 9) does not contain such firmware files 180, then the configuration manager 194 and/or a circuit-design tool such as that described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket No. 1934-023-03) may generate these firmware files from the templates in the library 102 (FIGS. 4 and 6) as discussed above in conjunction with Example 2.
• the configuration manager 194 instantiates the function of the circuit 220 (i.e., generates y in equation (1)) in one or more data-processing application threads 240 as discussed above in conjunction with Example 3.
• the configuration manager 194 instantiates a portion of the circuit 220 on the pipeline accelerator 14 per the above-described second embodiment of Example 4, and effectively instantiates the remaining portion of the circuit 220 in one or more data-processing application threads per the preceding paragraph.
  • Example 5
• This example is similar to Example 1 except that the configuration manager 194 determines that the library 108 (FIGS. 4 and 9) lacks at least one of the firmware files 180 1 - 180 7 for the PLICs 60 1 - 60 8 (the firmware file 180 5 corresponds to both the PLICs 60 5 and 60 6).
• the configuration manager 194 generates an error message, and, in response, an operator loads the missing firmware file(s) 180 (FIG. 9) into the library 108 (FIGS. 4 and 9) so that the configuration manager can proceed with instantiating the circuit 220 on the PLICs 60 1 - 60 8 as discussed above in conjunction with Example 1.
• In a second embodiment, the configuration manager 194 and/or a circuit-design tool such as that described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket No. 1934-023-03) generates these firmware files from the templates (FIG. 6) in the library 102 (FIGS. 4 and 6) as discussed above in conjunction with Example 2. Then, the configuration manager 194 loads these generated firmware files 180 into the library 108, and instantiates the circuit 220 on the PLICs 60 1 - 60 8 as discussed above in conjunction with Example 1. [168] In a third embodiment, the configuration manager 194 instantiates the function of a pipeline 44 corresponding to a missing firmware file 180 in a data-processing application thread 240 as discussed above in conjunction with Example 3. [169] In a fourth embodiment, the configuration manager 194 instantiates on the pipeline accelerator 14 (FIG. 1) a portion of the circuit 220 per the above-described second embodiment of Example 5, and effectively instantiates the remaining portion of the circuit 220 in one or more data-processing application threads 240 per the preceding paragraph.
• Example 6
• the circuit-definition file 170 (FIG. 8) that defines the circuit 220 specifies that the PLICs 60 1 - 60 8 are to be "directly" interconnected via the pipeline bus 20 (FIG. 1). That is, the PLIC 60 1 provides the stream of values sin(x) to the PLIC 60 5 without going through the message handler 34 and memory 18 as shown in FIG. 12.
• the firmware files 180 1 - 180 7 (file 180 5 is used twice) instantiate the communication interfaces 80 (FIG. 3) of the PLICs 60 1 - 60 8 to generate and send message objects (not shown) that identify the recipient PLIC and to recognize and receive messages from specified sender PLICs.
  • message objects are described in previously incorporated U.S. Patent Publication No. 2004/0181621.
  • these message objects each include an address header that identifies the destination PLIC or PLICs.
• the communication interface 80 (FIG. 3) of the PLIC 60 1 generates message objects that carry values sin(x) to the PLIC 60 5.
• These message objects each include an address header that includes the address of the PLIC 60 5.
• when the communication interface 80 of the PLIC 60 5 detects on the pipeline bus 20 (FIG. 1) a message object having this address, the interface uploads this message object from the bus.
• the remaining PLICs 60 2 - 60 8 receive and generate message objects in a similar manner.
• the configuration manager 194 soft configures the communication interfaces 80 (FIG. 3) of the PLICs 60 1 - 60 8 to receive and generate message objects per the preceding paragraph by sending appropriate soft-configuration data to the configuration managers 88 (FIG. 3) of the PLICs as discussed above in conjunction with Example 1.
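A hedged sketch of the direct-interconnection scheme of Example 6, with invented class names standing in for the communication interfaces 80 and the pipeline bus 20: each message object carries an address header, and each interface uploads only the messages addressed to its PLIC.

```python
# Hypothetical sketch: message objects with address headers on a shared bus.
from dataclasses import dataclass

@dataclass
class MessageObject:
    destination: str      # address header identifying the recipient PLIC
    payload: float        # e.g. one value of the sin(x) stream

class CommunicationInterface:
    def __init__(self, plic_address):
        self.plic_address = plic_address
        self.received = []

    def send(self, bus, destination, payload):
        bus.append(MessageObject(destination, payload))

    def poll(self, bus):
        # Upload only the messages whose header matches this PLIC's address.
        for msg in [m for m in bus if m.destination == self.plic_address]:
            self.received.append(msg.payload)
            bus.remove(msg)

pipeline_bus = []                                # stands in for pipeline bus 20
iface_60_1 = CommunicationInterface("60_1")
iface_60_5 = CommunicationInterface("60_5")
iface_60_1.send(pipeline_bus, "60_5", 0.8415)    # one sin(x) value for PLIC 60_5
iface_60_5.poll(pipeline_bus)
print(iface_60_5.received)                       # [0.8415]
```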
• the configuration data 100 may include meta-data that describes an algorithm, such as the algorithm represented by equation (1), and the configuration manager 194 may cause the peer-vector machine 10 to implement the algorithm based on this meta-data. More specifically, the configuration manager 194 may first determine the attributes of the peer-vector machine 10 as previously described. Next, based on the meta-data and the determined attributes of the peer-vector machine 10, the configuration manager 194 may define an implementation of the algorithm that is compatible with the platform(s) supported by, and the components present within, the peer-vector machine.
• the configuration manager 194 may define the implementation using one or more templates from the library 102, one or more software objects 160 from the library 104, one or more circuit-definition files from the library 106, and one or more firmware files 180 from the library 108, or using any combination of these items. Then, the configuration manager 194 may instantiate the implementation on the peer-vector machine 10 using any technique described above, any other technique(s), or any combination of these described/other techniques.
• the configuration data 100 may include both meta-data that describes an algorithm and a pointer to a circuit-definition file 170 that defines a circuit for implementing the algorithm. If for some reason the circuit defined by the file 170 is incompatible with the peer-vector machine 10, then the configuration manager 194 may define an implementation of the algorithm per above.
  • the configuration manager 194 may transfer a function previously performed by a failed pipeline unit 50 (FIG. 2) of the pipeline accelerator 14 (FIG. 1) to the host processor 12 (FIG. 1), and vice versa. That is, the host processor 12 may provide redundancy to the accelerator 14, and vice versa. Consequently, instead of adding redundant processing units to the host processor 12, it may be less expensive and less complex from a design perspective to add extra pipeline units 50 to the accelerator 14, where the configuration manager 194 can use these extra units to provide redundancy to both the host processor 12 and the accelerator.
• the peer-vector machine 10 may include both extra processing units 32 and extra pipeline units 50, and may also include extras of other components of the host processor 12 and the accelerator 14.
• examples of a soft failure include, e.g., corrupted configuration firmware or soft-configuration data stored in the PLIC 60 1, a buffer overflow, and a value sin(x) that is generated by the pipeline 44 1 on the PLIC 60 1 but that is out of the predetermined range -1 ≤ sin(x) ≤ 1.
• the accelerator-exception manager 192 detects the failure of the PLIC 60 1.
• the exception manager 192 detects the failure in response to an exception received from the exception manager 86 on board the PLIC 60 1. For example, because -1 ≤ sin(x) ≤ 1, the exception manager 86 may be programmed to generate an exception if a value generated by the pipeline 44 1 is less than -1 or greater than 1. Or, the exception manager 86 may be programmed to send an exception if an input buffer for the value x on the data memory 58 overflows.
• the exception manager 192 detects the failure in response to an improper value of data provided to or generated by the pipeline 44 1 on the PLIC 60 1.
• the exception manager 192 may periodically analyze the stream of values x provided to the PLIC 60 1, or the stream of values sin(x) provided by the PLIC 60 1, and detect a failure of the PLIC 60 1 if any of the analyzed values are less than -1 or greater than 1.
• the exception manager 192 detects the failure in response to the PLIC 60 1 failing to provide the stream of values sin(x).
• the exception manager 192 may periodically analyze the stream of values sin(x) provided by the PLIC 60 1, and detect a failure of the PLIC 60 1 if the PLIC 60 1 stops generating sin(x) despite continuing to receive the stream of input values x.
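Two of the detection strategies just described, range checking and stall checking, might look like the following sketch; the thresholds and function names are assumptions for illustration only.

```python
# Hypothetical sketch of two failure checks described above for PLIC 60_1:
# (1) range check on the sin(x) stream, (2) stall check when input keeps
# arriving but no output is produced.

def range_exception(values, low=-1.0, high=1.0):
    """Return the offending values, if any, in a sampled sin(x) stream."""
    return [v for v in values if v < low or v > high]

def stalled(inputs_seen, outputs_seen):
    """True if the PLIC keeps receiving x but has stopped producing sin(x)."""
    return inputs_seen > 0 and outputs_seen == 0

print(range_exception([0.2, -0.7, 1.3]))          # [1.3] -> report a soft failure
print(stalled(inputs_seen=128, outputs_seen=0))   # True  -> report a soft failure
```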
• the exception manager 192 notifies the configuration manager 194 that the PLIC 60 1 has experienced a soft failure.
• the configuration manager 194 first halts the processing of data by the PLICs 60 1 - 60 8 and any related data-processing applications 190 that the processing unit 32 is executing.
  • Examples of a related data-processing application 190 include an application that generates the values x or z or that receives and processes the values y.
• the configuration manager 194 reloads this data into the PLIC 60 1 and restarts the processing of data by the PLICs 60 1 - 60 8 and any related data-processing applications 190 that the processing unit 32 is executing.
• if the exception manager 192 detects no failure of the PLIC 60 1 after the restart, then the configuration manager 194 allows the PLICs 60 1 - 60 8 and any related data-processing applications 190 to continue processing data.
• the configuration manager 194 causes the PLIC 60 1 to re-download the firmware file 180 (FIG. 9) that the PLIC 60 1 downloaded during initialization of the peer-vector machine 10 (FIG. 1), and restarts the PLICs 60 1 - 60 8 and any related data-processing applications 190 for a second time.
• the configuration manager 194 allows the PLICs 60 1 - 60 8 and any related data-processing applications 190 to continue processing data. [189] But if the exception manager 192 detects a failure of the PLIC 60 1 after this second restart,
• the configuration manager 194 again halts the processing of data by the PLICs 60 1 - 60 8 and any related data-processing applications 190.
• the configuration manager 194 determines whether the pipeline accelerator 14 (FIG. 1) includes an extra PLIC 60 that is the same as or is similar to the PLIC 60 1.
• the extra PLIC may be a PLIC that is reserved to replace a failed PLIC, or may merely be a PLIC that is unused. Also, the extra PLIC may be on an extra pipeline unit 50, or on a pipeline unit 50 that includes other, non-extra PLICs. If the pipeline accelerator 14 (FIG. 1) does include an extra PLIC 60 that is the same as or is similar to the PLIC 60 1, then
• the configuration manager 194 causes the extra PLIC to download the same firmware file 180 (FIG. 9) previously downloaded by the PLIC 60 1 during initialization of the peer-vector machine 10 (FIG. 1) and prior to the second restart.
• the configuration manager 194 allows the PLICs 60 2 - 60 8, the extra PLIC, and any related data-processing applications 190 to continue processing data.
• the configuration manager 194 halts for a fourth time the processing of data by the PLICs 60 1 - 60 8 and any related data-processing applications 190 if the data processing is not already halted.
• the configuration manager 194 may replace the failed PLIC 60 1 with this other extra PLIC, and restart the data processing as discussed above.
• the configuration manager 194 determines whether the circuit 220 can "fit" into the remaining PLICs 60 2 - 60 8 in a manner similar to that discussed above in conjunction with Example 2.
• the configuration manager 194 reinstantiates the circuit 220 on these remaining PLICs in a manner similar to that discussed above in conjunction with Example 2, and restarts the PLICs 60 2 - 60 8 and any related data-processing applications 190.
• the configuration manager 194 allows the PLICs 60 2 - 60 8 and any related data-processing applications 190 to continue processing data.
• the configuration manager 194 halts the processing of data by the PLICs 60 2 - 60 8 and any corresponding data-processing applications 190 if the data processing is not already halted.
• the configuration manager 194 reads the software-object descriptions 162 (FIG. 7) to determine whether the library 104 (FIGS. 4 and 7) includes a software object 160 (FIG. 7) that can generate sin(x).
• the configuration manager 194 instantiates on the processing unit 32 a data-processing application thread that executes the object 160 for generating sin(x) in a manner similar to that discussed above in conjunction with Example 3 and FIG. 14, and restarts the data processing.
• the configuration manager 194 allows the PLICs 60 2 - 60 8, the sin(x) application thread that executes the sin(x) software object 160, and any related data-processing applications to continue processing data.
• otherwise, the configuration manager 194 generates an error message, in response to which an operator (not shown) may take corrective action such as replacing the PLIC 60 1 or replacing the pipeline unit 50 on which the defective PLIC 60 1 is disposed.
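The escalation just described in Example 7 can be summarized by the following sketch; each step is a stand-in for the corresponding halt/reconfigure/restart operation, and the order shown is only the one used in this example (the next paragraph notes that steps may be omitted or reordered).

```python
# Hypothetical sketch of the escalating recovery sequence of Example 7.
# Each step is modeled as a callable that returns True when the retried
# configuration runs without a further failure.

def recover(steps):
    """Try each recovery step in order; report the one that succeeded, if any."""
    for name, attempt in steps:
        # halt -> apply the step -> restart -> check for a repeated failure
        if attempt():
            return f"recovered by: {name}"
    return "error message: operator intervention required"

steps = [
    ("reload soft-configuration data",        lambda: False),
    ("re-download firmware file 180",         lambda: False),
    ("switch to an extra PLIC",               lambda: False),
    ("refit circuit 220 on remaining PLICs",  lambda: False),
    ("run a sin(x) software object instead",  lambda: True),
]
print(recover(steps))
```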
• in alternate embodiments of Example 7, the configuration manager 194 may omit any number of the above-described steps, and perform the non-omitted steps in any order.
• the configuration manager 194 may generate an application thread that executes a sin(x) software object 160 (FIG. 7) without first trying to reconfigure the PLIC 60 1, to re-download the respective firmware file 180 (FIG. 9) to the PLIC 60 1, to replace the PLIC 60 1 with an extra PLIC, or to "fit" the circuit 220 on the remaining PLICs 60 2 - 60 8.
• the exception manager 192 may be omitted, and the configuration manager 194 may directly detect the failure of one or more of the PLICs 60 1 - 60 8.
• the configuration manager 194 may halt other portions of the peer-vector machine 10 as well, including halting the entire peer-vector machine.
• Example 8
  • a data-processing application 190 is generating y of equation (1) and experiences a failure while the peer-vector machine 10 (FIG. 1) is operating.
• examples of such a failure include, e.g., a mechanical failure of one or more processors that compose the processing unit 32, or the inability of the data-processing application 190 to process data at or above a specified speed.
  • the exception manager 192 detects the failure of the data-processing application 190.
  • the exception manager 192 detects the failure in response to an improper value of x or z being provided to the data- processing application 190, or in response to an improper value of y being generated by the application.
  • the exception manager 192 may periodically analyze the respective streams of values x and z provided to the data-processing application 190, or the stream of values y generated by the data-processing application, and detect a failure of the data-processing application if, e.g., the analyzed values are outside of a predetermined range or the data-processing application stops generating output values y despite continuing to receive the values x and z.
• the exception manager 192 detects the failure in response to the frequency at which the data-processing application 190 generates the values y being below a predetermined frequency. [209] Next, the exception manager 192 notifies the configuration manager 194 that the data-processing application 190 has failed.
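The frequency check just mentioned might be modeled as a simple rate watchdog; the window length and threshold are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of the rate check: flag the data-processing application
# if it emits y values more slowly than a predetermined frequency.
import time

def below_rate(output_timestamps, min_hz, window_s=1.0, now=None):
    now = time.monotonic() if now is None else now
    recent = [t for t in output_timestamps if now - t <= window_s]
    return len(recent) / window_s < min_hz

t0 = 1000.0                                   # fixed reference time for the example
stamps = [t0 - 0.9, t0 - 0.5, t0 - 0.1]       # three y values in the last second
print(below_rate(stamps, min_hz=10, now=t0))  # True -> report the application failure
```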
  • the configuration manager 194 first halts the processing of data by the data-processing application 190 and any related PLICs 60 (FIG. 3) of the pipeline accelerator 14 (FIG. 1).
• Examples of a related PLIC include a PLIC that generates the input values x or z for the data-processing application 190 or that receives the values y from the application.
  • the configuration manager 194 determines whether the data-processing application 190 can be loaded onto and run by another portion of the processing unit such as an extra processor.
  • the configuration manager 194 loads the data-processing application onto the other portion of the processing unit 32, and restarts the application and any related PLICs.
• the configuration manager 194 attempts to instantiate on the pipeline accelerator 14 (FIG. 1) a circuit, such as the circuit 220, for generating the stream of values y of equation (1) in place of the failed data-processing application 190.
  • the configuration manager 194 determines whether the library 108 (FIGS. 4 and 9) includes a firmware file 180 (FIG. 9) that can instantiate such a circuit on a single PLIC 60.
  • the configuration manager 194 downloads the file to a PLIC 60 of the pipeline accelerator 14 (FIG. 1), generates data-transfer objects 202 for transferring x and z to the PLIC and for transferring y from the PLIC, and starts the accelerator.
  • the configuration manager 194 may omit some or all of the data-transfer objects 202 if the pipeline accelerator 14 receives x or z via the input port 24 or provides y via the output port 26.
• the configuration manager 194 allows the PLIC 60 and any related data-processing application 190 (e.g., an application that provides x or z to, or receives y from, the PLIC 60) to continue processing data.
• the configuration manager 194 determines whether the library 106 (FIGS. 4 and 8) includes a circuit-definition file 170 (FIG. 8) that describes a circuit, such as the circuit 220, for generating y of equation (1).
• the configuration manager 194 downloads the corresponding firmware files 180 (FIG. 9) to the corresponding PLICs 60 of the pipeline accelerator 14 (FIG. 1). For example, if the circuit-definition file 170 describes the circuit 220 of FIG. 11, then the configuration manager 194 downloads the firmware files 180 1 - 180 7 into the respective PLICs 60 1 - 60 8 (the file 180 5 is downloaded into both the PLICs 60 5 and 60 6 as discussed above in conjunction with Example 1).
• the configuration manager 194 may, as discussed above in conjunction with Example 5, generate the omitted firmware file from templates in the library 102 (FIGS. 4 and 6), store the generated firmware file in the library 108, and download the stored firmware file into the respective PLIC 60.
• the configuration manager 194 may use the circuit-design tool (not shown) described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket No. 1934-023-03) to generate such a circuit-definition file as discussed above in conjunction with Example 5. Next, the configuration manager 194 generates (if necessary) and downloads the corresponding firmware files 180 1 - 180 7 into the corresponding PLICs 60 1 - 60 8 as described in the preceding paragraph. [222] After downloading the firmware files 180 1 - 180 7 into the PLICs 60 1 - 60 8,
• the configuration manager 194 instantiates the data-transfer objects 202 1 - 202 23 (FIG. 12) as discussed above in conjunction with Example 1, and starts the PLICs 60 1 - 60 8 and any related data-processing applications 190.
• if the configuration manager 194 cannot instantiate on the pipeline accelerator 14 (FIG. 1) a circuit for generating y of equation (1), then the configuration manager generates an error message in response to which an operator (not shown) can take corrective action.
  • the configuration manager 194 may be unable to instantiate such a circuit because, e.g., the accelerator 14 lacks sufficient resources or does not support a compatible platform, or the library 102 (FIGS. 4 and 6) lacks the proper templates.
• alternate embodiments of Example 8 are contemplated.
• the configuration manager 194 may omit any number of the above-described steps, and perform the non-omitted steps in any order.
• the exception manager 192 may be omitted, and the configuration manager 194 may directly detect the failure of the data-processing application 190 that generates y of equation (1), and may directly detect the failure of any other portion of the peer-vector machine 10 (FIG. 1).
• FIG. 15 is a block diagram of the peer-vector machine 10, which, in addition to the host processor 12 and pipeline accelerator 14, includes at least one redundant processing unit 250 and at least one redundant pipeline unit 252 according to an embodiment of the invention.
• the redundant processing units 250 and the redundant pipeline units 252 provide fault-tolerant capabilities in addition to the dynamic-reconfiguration capabilities described above in conjunction with Examples 7 and 8. For example, if a PLIC 60 (FIG. 3) in the pipeline accelerator 14 fails, then the configuration manager 194 (FIG. 10) may dynamically reconfigure a redundant PLIC (not shown) on a redundant pipeline unit 252 to replace the failed PLIC 60 in a manner that is similar to that described above in conjunction with Example 7. Similarly, if the processing unit 32 (FIG. 10) of the host processor 12 fails, then the configuration manager 194 may dynamically reconfigure a redundant processing unit 250 to replace the failed processing unit in a manner that is similar to that described above in conjunction with Example 8.
  • the configuration manager 194 may dynamically reconfigure a redundant processing unit 250 to replace a failed portion of the pipeline accelerator 14, or may dynamically reconfigure one or more redundant PLICs on one or more of the redundant pipeline units 252 to replace a failed processing unit 32 or another failed portion of the host processor 12 in a manner that is similar to that described above in conjunction with Examples 7 and 8.
  • the dynamic reconfiguration of the host processor 12 and the pipeline accelerator 14 may destroy the states of the, e.g., registers (not shown), in the host processor and in the pipeline accelerator. Consequently, once restarted after dynamic reconfiguration, the host processor 12 and pipeline accelerator 14 may need to reprocess all of the data processed prior to the failure that initiated the reconfiguration.
  • FIG. 16 is a block diagram of the peer-vector machine 10, which includes system-restore capabilities according to an embodiment of the invention.
  • this embodiment of the machine 10 periodically saves the states of some or all of the, e.g., registers, within the host processor 12 and the pipeline accelerator 14. Therefore, in the event of a failure and a subsequent restart, the peer-vector machine 10 can respectively restore the last-saved states to the host processor 12 and to the pipeline accelerator 14 so as to reduce or eliminate the volume of pre-failure data that the machine must reprocess.
• in addition to the pipeline bus 20, the optional router 31, the optional redundant processing unit(s) 250, and the optional redundant pipeline unit(s) 252, this embodiment of the peer-vector machine 10 includes a system-restore server 260 and a system-restore bus 262.
  • the registers and other data-storing components of the host processor 12 and the pipeline accelerator 14 (and the redundant processing unit(s) 250 and pipeline unit(s) 252 if present and in use) periodically "dump" their contents onto the system-restore server 260 via the system-restore bus 262.
  • the separation of the system-restore bus 262 from the pipeline bus 20 reduces or eliminates a data-processing-speed penalty that this data dump may cause, and otherwise prevents a "bottleneck" on the bus 20.
• after a dynamic reconfiguration but before a restart of the peer-vector machine 10, the host processor 12 causes the server 260 to upload the last-saved set of data into the respective registers and other data-storing components.
  • the peer-vector machine 10 starts processing data from the point in time of the last-dumped set of data, and thus reprocesses only the pre-failure data that it processed between the last data dump and the failure.
  • the system-restore server 260 and the system-restore bus 262 provide a reduction in the overall data-processing time whenever the configuration manager 194 dynamically reconfigures and restarts the peer-vector machine.
• alternatively, the system-restore bus 262 may be omitted, and the host processor 12 and the pipeline accelerator 14 (and the redundant processing unit(s) 250 and the redundant pipeline units 252 if present and in use) dump data to the system-restore server 260 via the pipeline bus 20.
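A minimal model of the checkpoint-and-restore behavior described above, with an in-memory stand-in for the system-restore server 260; the dump interval and register contents are illustrative assumptions.

```python
# Hypothetical model: register states are periodically dumped to a restore
# server, and after a reconfiguration only the data processed since the last
# dump needs to be reprocessed.

class SystemRestoreServer:
    def __init__(self):
        self.checkpoint = None      # last-saved register states and input index

    def dump(self, registers, processed_count):
        self.checkpoint = (dict(registers), processed_count)

    def restore(self):
        regs, count = self.checkpoint
        return dict(regs), count

server = SystemRestoreServer()
registers = {"accumulator": 0}

inputs = list(range(10))
for i, value in enumerate(inputs):
    registers["accumulator"] += value
    if (i + 1) % 4 == 0:                       # periodic dump every 4 inputs
        server.dump(registers, i + 1)

# A failure and dynamic reconfiguration happen here; afterwards, reload the
# last checkpoint and reprocess only the inputs that followed it.
registers, resume_at = server.restore()
for value in inputs[resume_at:]:
    registers["accumulator"] += value
print(registers["accumulator"], sum(inputs))   # both 45
```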
  • FIG. 17 is a block diagram of a hardwired pipeline 44 that includes a save/restore circuit 270 according to an embodiment of the invention.
  • the circuit 270 allows the pipeline 44 to periodically "dump" the data within the pipeline's working registers (not shown in FIG. 17), and to restore the dumped data, as discussed above in conjunction with FIGS. 15- 16.
  • the save/restore circuit 270 is part of the framework-services layer 72 (FIGS. 2-3), and causes the working registers (not shown in FIG. 17) of the hardwired pipeline 44 to dump their data to the system-restore server 260 (FIG. 16) via the system-restore bus 262 under the control of a data-save manager 272, which is executed by the processing unit 32 of the host processor 12 (FIG. 10).
  • the data-save manager 272 and the circuit 270 may communicate with one another by sending messages over the system-restore bus 262, or over the pipeline bus 20 (FIG. 16).
  • the data-save manager 272 may be a part of the configuration manager 194 (FIG. 10), or the configuration manager 194 may perform the function(s) of the data-save manager.
• the save/restore circuit 270 causes the working registers (not shown in FIG. 17) of the hardwired pipeline 44 to load saved data (typically the last-saved data) from the system-restore server 260 (FIG. 16) via the system-restore bus 262 under the control of a data-restore manager 274, which is executed by the processing unit 32 of the host processor 12 (FIG. 10).
  • the data-restore manager 274 and the circuit 270 may communicate with one another by sending messages over the system-restore bus 262, or over the pipeline bus 20 (FIG. 16).
  • the data-restore manager 274 may be a part of the configuration manager 194 (FIG. 10), or the configuration manager 194 may perform the function(s) of the data-restore manager.
  • the hardwired pipeline 44 includes one or more configurable registers and logic 276, and one or more exception registers and logic 278.
  • the configurable registers and logic 276 receive and store configuration data from the configuration manager 88 (see also FIG. 3), and use this stored configuration data to configure the pipeline 44 as previously described.
• the exception registers and logic 278 generate and store exception data in response to exceptions that occur during operation of the pipeline 44, and provide this data to the exception manager 86 (see also FIG. 3) for handling as previously described.
  • the data-save manager 272 periodically causes the save/restore circuit 270 to dump the data from selected working registers (not shown in FIG. 17) within the hardwired pipeline 44 to the system-restore server 260 via the system-restore bus 262.
  • Data within the configurable register(s) 276 typically selects which working registers dump their data, and the data may so select any number of the working registers.
  • the data-save manager 272 may cause the save/restore circuit 270 to dump data from the selected working registers via the pipeline bus 20.
• if an error occurs during this data dump, then the exception register(s) and logic 278 may send a corresponding exception to the exception manager 86.
  • the configuration manager 194 may repeat the data-dump operation, at least for the hardwired pipeline(s) 44 that generate the exception.
• the data-restore manager 274 causes the save/restore circuit 270 to load previously dumped and saved data from the system-restore server 260 (FIG. 16) into the respective working registers (not shown in FIG. 17) within the hardwired pipeline 44 via the system-restore bus 262 or the pipeline bus 20.
• the configuration manager 194 (FIG. 10) typically loads the configurable register(s) 276 with data that selects which working registers are to load data. Alternatively, data identifying the working registers that are to load restored data may have been previously stored in nonvolatile memory within the configurable register(s) and logic 276.
  • the save/restore circuit 270 may then run a check to make sure that it properly loaded the restore data.
• if this check indicates that the restore data was not properly loaded, then the exception register(s) and logic 278 may send a corresponding exception to the exception manager 86.
  • the configuration manager 194 may repeat the system-restore operation, at least for the hardwired pipeline(s) 44 that generate the exception.
  • FIG. 18 is a more-detailed block diagram of the hardwired pipeline 44 of FIG. 17 according to an embodiment of the invention.
  • the hardwired pipeline 44 includes one or more working registers 280 (for clarity, only one working register is shown in FIG. 18), a respective input-data multiplexer 282 for each working register, a load port 281, a data-input port 283, and a data-output port 285.
• the save/restore circuit 270 includes a respective data-save register 284 and a respective data-restore register 286 for each working register 280, saved-data transmit logic 288, and restored-data receive logic 290. [246] Still referring to FIG. 18, the operation of the hardwired pipeline 44 and the save/restore circuit 270 is discussed according to an embodiment of the invention.
  • the data-save manager 272 causes the data-save register 284 to download the data from the corresponding working register 280 during each predetermined number of cycles of the save-restore clock.
  • the data-save manager 272 also causes the transmit logic 288 to transfer the data from the register 284 to the system-restore server 260 (FIG. 16), typically at the same rate at which the register 284 downloads data from the working register 280.
  • the working register 280 may load data from the data-input port 283 via the multiplexer 282 in response to a hardwired-pipeline clock and a load command on the load port 281, and may provide data via the data-output port 285.
• the save/restore circuit 270 may include fewer data-save registers 284 than working registers 280, such that a single data-save register may serve multiple, perhaps even all, of the working registers within the pipeline 44.
  • a data-save register 284 cooperates with the transmit logic 288 to download data from the corresponding working registers 280 in a serial fashion.
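A sketch of that serial sharing, assuming hypothetical register names: one data-save register snapshots each selected working register in turn, and the transmit logic forwards each snapshot toward the system-restore server 260.

```python
# Hypothetical sketch of a single shared data-save register serially dumping
# several working registers; the register names and values are illustrative.

working_registers = {"r0": 11, "r1": 22, "r2": 33}   # selected working registers
server_log = []                                      # stands in for server 260

def dump_serially(working, send):
    for name, value in working.items():   # one working register per save cycle
        data_save_register = (name, value)
        send(data_save_register)          # transmit logic 288 forwards the snapshot

dump_serially(working_registers, server_log.append)
print(server_log)   # [('r0', 11), ('r1', 22), ('r2', 33)]
```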
  • the configuration manager 194 may return the hardwired pipeline 44 to normal operation.
  • the save-restore circuit 270 may include fewer data-restore registers 286 than working registers 280, such that a single data-restore register may serve multiple working registers, perhaps all of the working registers in the pipeline 44.
  • such a data-restore register 286 cooperates with the receive logic 290 to upload data to the corresponding working registers 280 in a serial fashion.
• referring to FIGS. 1 - 18, alternate embodiments of the peer-vector machine 10 are contemplated.
  • some or all of the components of the peer vector machine 10, such as the host processor 12 (FIG. 1) and the pipeline units 50 (FIG. 3) of the pipeline accelerator 14 (FIG. 1), may be disposed on a single integrated circuit.

Abstract

A computing machine includes programmable integrated circuits, a configuration registry, and a processor. The registry stores a file that defines a circuit having portions, and the processor is, in response to the file, operable to instantiate one of the circuit portions on one of the programmable integrated circuits. Consequently, by accessing a file that defines a circuit, such a computing machine can often instantiate the circuit on a pipeline accelerator regardless of the hardware that composes the accelerator and despite modifications to the circuit or to the hardware. That is, the computing machine can often 'fit' the circuit into the pipeline accelerator regardless of its composition. A computing machine comprises an electronic circuit operable to perform a function, a programmable integrated circuit such as an FPGA, and a processor. The processor is operable to detect a failure of the electronic circuit and to configure the programmable integrated circuit to perform the function of the electronic circuit in response to detecting the failure. Alternatively, the computing machine comprises a hardwired pipeline operable to perform a function and a processor operable to detect a failure of the pipeline and to perform the function in response to detecting the failure. By allowing a first type of circuit (e.g., an FPGA) to take over for a failed second type of circuit (e.g., a processor), such a computing machine can be fault-tolerant without having redundant versions of each component, and may thus be less expensive and smaller than computing machines of comparable computing power. According to an embodiment of the invention, a computing machine comprises a pipeline accelerator and a host processor coupled to the pipeline accelerator, and supports a redundancy scheme that is often more flexible than conventional schemes. For example, if the pipeline accelerator has more extra 'space' than the host processor, then one can add to the computing machine one or more redundant pipeline units that can provide redundancy to both the pipeline and the host processor. Therefore, the computing machine can include redundancy for the host processor even though it has no redundant processing units.

Description

CONFIGURABLE COMPUTING MACHINE AND RELATED SYSTEMS
AND METHODS
CLAIM OF PRIORITY
[1] This application claims priority to U.S. Provisional Application Serial Nos. 60/615,192, 60/615,157, 60/615,170, 60/615,158, 60/615,193, and 60/615,050, filed on 01 October 2004, which are incorporated by reference.
CROSS REFERENCE TO RELATED APPLICATIONS
[2] This application is related to U.S. Patent Application Serial Nos. (Attorney Docket
Nos. 1934-021-03, 1934-023-03, 1934-024-03, 1934-026-03, 1934-031-03, 1934-035-03, and 1934-036-03), which have a common filing date of 03 October 2005 and assignee and which are incorporated by reference.
BACKGROUND [3] A peer-vector computing machine, which is described in the following U.S. Patent Publications, includes a pipeline accelerator that can often perform mathematical computations ten to one hundred times faster than a conventional processor-based computing machine can perform these computations: 2004/0133763; 2004/0181621 ; 2004/0136241 ; 2004/0170070; and, 2004/0130927, which are incorporated herein by reference. The pipeline accelerator can often perform mathematical computations faster than a processor because unlike a processor, the accelerator processes data in a pipelined fashion while executing few, if any, software instructions.
[4] Unfortunately, despite its oft-superior data-processing speed, a peer-vector computing machine may lack some of the popular features of a conventional processor-based computing machine. [5] For example, a peer-vector computing machine may lack the ability to configure itself to operate with the installed hardware that composes the pipeline accelerator; and may lack the ability to reconfigure itself in response to a change in this hardware. [6] Typically, a conventional processor-based computing machine can configure its software and settings during its start-up routine to operate with the hardware installed in the machine, and can also reconfigure its software and settings in response to a change in this hardware. For example, assume that while the processor-based machine is "off1, one increases the amount of the machine's random-access memory (RAM). During the next start-up routine, the machine detects the additional RAM, and reconfigures its operating system to recognize and exploit the additional RAM during subsequent operations. Similarly, assume that one adds a wireless-router card to the bus while the processor-based machine is off. During the next start-up routine, the machine detects the card and configures its operating system to recognize and allow a software application such as a web browser to use the card (the machine may need to download the card's driver via a CD-ROM or the internet). Consequently, to install new hardware in a typical processor-based machine, an operator merely inserts the hardware into the machine, which then configures or reconfigures the machine's software and settings without additional operator input.
[7] But a peer-vector machine may lack the ability to configure or reconfigure itself to operate the hardware that composes the pipeline accelerator. For example, assume that one wants the peer-vector machine to instantiate a pre-designed circuit on multiple programmable-logic integrated circuits (PLICs) such as field-programmable gate arrays (FPGAs), each of which is disposed on a respective pipeline unit of the pipeline accelerator. Typically, one manually generates configuration-firmware files for each of the PLICs, and loads these files into the machine's configuration memory. During a start-up routine, the peer- vector machine causes each of the PLICs to download a respective one of these files. Once the PLICs have downloaded these firmware files, the circuit is instantiated on the PLICs. But if one modifies the circuit, or modifies the type or number of pipeline units in the pipeline accelerator, then he may need to manually generate new configuration-firmware files and load them into the configuration memory before the machine can instantiate the modified circuit on the pipeline accelerator.
[8] Furthermore, the peer-vector computing machine may lack the ability to continue operating if a component of the machine fails.
[9] Some conventional processor-based computing machines have redundant components that allow a machine to be fault tolerant, i.e., to continue operating when a component fails or otherwise exhibits a fault or causes a fault in the machine's operation. For example, a multi-processor-based computing machine may include a redundant processor that can "take over" for one of the main processors if and when a main processor fails.
[10] But unfortunately, a peer-vector machine may have a lower level of fault tolerance than a fault-tolerant processor-based machine.
[11] Moreover, existing fault-tolerant techniques may add significant cost and complexity to a computing machine. Per the above example, assume that a processor-based computing machine includes a redundant processor. If the machine has only one main processor, then adding the redundant processor may double the area that the processors occupy, and may double the costs for procuring and maintaining the processors.
[12] Therefore, a need has arisen for a peer-vector computing machine that can configure itself to operate with the hardware that composes the pipeline accelerator, and that can reconfigure itself to recognize and operate with newly installed or modified accelerator hardware. [13] A need has also arisen for a peer-vector machine having a higher level of fault tolerance.
[14] Furthermore, a need has arisen for a fault-tolerant technique that is less costly and complex than providing redundancy solely by the inclusion of dedicated redundant components.
SUMMARY
[15] According to an embodiment of the invention, a computing machine includes programmable integrated circuits, a configuration registry, and a processor. The registry stores a file that defines a circuit having portions, and the processor is, in response to the file, operable to instantiate one of the circuit portions on one of the programmable integrated circuits.
[16] Consequently, by accessing a file that defines a circuit, such a computing machine can often instantiate the circuit on a pipeline accelerator regardless of the hardware that composes the accelerator and despite modifications to the circuit or to the hardware. That is, the computing machine can often "fit" the circuit into the pipeline accelerator regardless of the accelerator's composition.
[17] According to an embodiment of the invention, a computing machine comprises an electronic circuit operable to perform a function, a first programmable integrated circuit, and a first processor. The first processor is operable to detect a failure of the electronic circuit and to configure the first programmable integrated circuit to perform the function of the electronic circuit in response to detecting the failure.
[18] By allowing a first type of circuit to take over for a failed second type of circuit, such a computing machine can be fault-tolerant without having redundant versions of each component. For example, such a computing machine allows a programmable integrated circuit such as a field-programmable gate array (FPGA) to "take over" for a failed electronic circuit such as another FPGA, an ASIC, or a processor. Consequently, by allowing an FPGA to "take over" for an ASIC and for a processor, such a computing machine can omit a redundant ASIC and a redundant processor, and may thus allow a reduction in the cost and size of the computing machine. [19] According to another embodiment of the invention, a computing machine comprises a hardwired pipeline operable to perform a function and a processor operable to detect a failure of the pipeline and perform the function in response to detecting the failure.
[20] By allowing a processor to take over for a hardwired pipeline disposed on, e.g., an FPGA, such a computing machine can omit redundant hardware, and may thus allow a reduction in the cost and size of the computing machine as described above.
[21] According to an embodiment of the invention, a computing machine comprises a pipeline accelerator, a host processor coupled to the pipeline accelerator, and a redundant processor, a redundant pipeline unit, or both, coupled to the host processor and to the pipeline accelerator. The computing machine may also include a system-restore server and a system-restore bus that allow the machine to periodically save its state so that the state can be restored in case of a failure.
[22] Such a computing machine has a fault-tolerant scheme that is often more flexible than conventional schemes. For example, if the pipeline accelerator has more extra "space" than the host processor, then one can add to the computing machine one or more redundant pipeline units that can provide redundancy to both the pipeline and the host processor. Therefore, the computing machine can include redundancy for the host processor even though it has no redundant processing units. Likewise, if the host processor has more extra "space" than the pipeline accelerator, then one can add to the computing machine one or more redundant processing units that can provide redundancy to both the pipeline and the host processor.
BRIEF DESCRIPTION OF THE DRAWINGS
[23] FIG. 1 is a schematic block diagram of a peer-vector computing machine according to an embodiment of the invention.
[24] FIG. 2 is a schematic block diagram of a pipeline unit from the pipelined accelerator of FIG. 1 and including a PLIC according to an embodiment of the invention.
[25] FIG. 3 is a block diagram of the circuitry that composes the interface-adapter and framework-services layers of the PLIC of FIG. 2 according to an embodiment of the invention. [26] FIG. 4 is a block diagram of the accelerator/host-processor-configuration registry of FIG. 1 according to an embodiment of the invention.
[27] FIG. 5 is a diagram of a hardware-description file that describes, in a top-down fashion, the layers of circuitry to be instantiated on a simple PLIC according to an embodiment of the invention.
[28] FIG. 6 is a block diagram of the accelerator-template library of FIG. 4 according to an embodiment of the invention.
[29] FIG. 7 is a block diagram of the software-object library of FIG. 4 according to an embodiment of the invention.
[30] FIG. 8 is a block diagram of the circuit-definition library of FIG. 4 according to an embodiment of the invention.
[31] FIG. 9 is a block diagram of the accelerator-firmware library of FIG. 4 according to an embodiment of the invention.
[32] FIG. 10 is a functional block diagram of the host processor of FIG. 1 according to an embodiment of the invention.
[33] FIG. 11 is a schematic block diagram of a circuit defined by a file in the circuit-definition library of FIGS. 4 and 8 for instantiation on the pipeline accelerator of FIG. 1 according to an embodiment of the invention. [34] FIG. 12 is a functional block diagram of the data paths between the PLICs of FIG. 11 according to an embodiment of the invention.
[35] FIG. 13 is a schematic block diagram of the circuit of FIG. 11 instantiated on fewer PLICs according to an embodiment of the invention. [36] FIG. 14 is a functional block diagram of the data paths between the portions of the circuit of FIG. 11 instantiated on the pipeline accelerator of FIG. 1 and a software-application thread that the processing unit of FIG. 10 executes to perform the function of an un-instantiated portion of the circuit according to an embodiment of the invention. [37] FIG. 15 is a block diagram of a peer-vector computing machine having redundancy according to an embodiment of the invention.
[38] FIG. 16 is a block diagram of a peer-vector computing machine having a system-restore server and a system-restore bus according to an embodiment of the invention. [39] FIG. 17 is a block diagram of a hardwired pipeline that includes a save/restore circuit according to an embodiment of the invention.
[40] FIG. 18 is a more-detailed block diagram of the hardwired pipeline of FIG. 17 according to an embodiment of the invention.
DETAILED DESCRIPTION
Introduction
[41] An accelerator-configuration manager that, according to embodiments of the invention, configures a peer-vector machine to operate with the hardware that composes the machine's pipeline accelerator and that reconfigures the machine to recognize and operate with newly modified accelerator hardware is discussed below in conjunction with FIGS. 10 - 14. And an accelerator-configuration registry that, according to embodiments of the invention, facilitates the configuration manager's ability to configure and reconfigure the peer-vector machine is discussed below in conjunction with FIGS. 4 - 9.
[42] Furthermore, improved fault-tolerant techniques that, according to embodiments of the invention, allow a peer-vector machine to continue operating if a portion of the machine fails are discussed below in conjunction with FIGS. 10 - 16.
[43] But first is presented in conjunction with FIGS. 1 - 3 an overview of peer-vector-machine concepts that should facilitate the reader's understanding of the above-mentioned configuration manager, configuration registry, and fault-tolerant techniques.
Overview Of Peer-Vector-Machine Concepts
[44] FIG. 1 is a schematic block diagram of a computing machine
10, which has a peer-vector architecture according to an embodiment of the invention. In addition to a host processor 12, the peer-vector machine 10 includes a pipelined accelerator 14, which is operable to process at least a portion of the data processed by the machine 10. Therefore, the host- processor 12 and the accelerator 14 are "peers" that can transfer data messages back and forth. Because the accelerator 14 includes hardwired circuits (typically logic circuits) instantiated on one or more PLICs, it executes few, if any, program instructions, and thus for a given clock frequency, often performs mathematically intensive operations on data significantly faster than a bank of computer processors can. Consequently, by combining the decision-making ability of the processor 12 and the number-crunching ability of the accelerator 14, the machine 10 has many of the same abilities as, but can often process data faster than, a conventional processor-based computing machine. Furthermore, as discussed below and in previously incorporated U.S. Patent Application Publication No. 2004/0136241 , providing the accelerator 14 with a communication interface that is compatible with the interface of the host processor 12 facilitates the design and modification of the machine 10, particularly where the communication interface is an industry standard. And where the accelerator 14 includes multiple pipeline units (not shown in FIG. 1) which are sometimes called daughter cards, providing each of these units with this compatible communication interface facilitates the design and modification of the accelerator, particularly where the communication interface is an industry standard. Moreover, the machine 10 may also provide other advantages as described in the following previously incorporated U.S. Patent Publication Nos.: 2004/0133763; 2004/0181621 ; 2004/0136241 ; 2004/0170070; and, 2004/0130927. [45] Still referring to FIG. 1 , in addition to the host processor 12 and the pipelined accelerator 14, the peer-vector computing machine 10 includes a processor memory 16, an interface memory 18, a pipeline bus 20, a firmware memory 22, an optional raw-data input port 24, an optional processed-data output port 26, and an optional router 31.
[46] The host processor 12 includes a processing unit 32 and a message handler 34, and the processor memory 16 includes a processing-unit memory 36 and a handler memory 38, which respectively serve as both program and working memories for the processing unit and the message handler. The processing-unit memory 36 also includes an accelerator/host-processor-configuration registry 40 and a message-configuration registry 42. The registry 40 stores configuration data that allows a configuration manager (not shown in FIG. 1) executed by the host processor 12 to configure the functioning of the accelerator 14 and, in some situations as discussed below in conjunction with FIGS. 10 - 14, the functioning of the host processor. Similarly, the registry 42 stores configuration data that allows the host processor 12 to configure the structure of the messages that the message handler 34 sends and receives, and the paths over which the message handler sends and receives these messages. [47] The pipelined accelerator 14 includes at least one pipeline unit
(not shown in FIG. 1) on which is disposed at least one PLIC (not shown in FIG. 1). On the at least one PLIC is disposed at least one hardwired pipeline, which processes data while executing few, if any, program instructions. The firmware memory 22 stores the configuration-firmware files for the PLIC(s) of the accelerator 14. If the accelerator 14 is disposed on multiple PLICs, then these PLICs and their respective firmware memories may be disposed on multiple pipeline units. The accelerator 14 and pipeline units are discussed further in previously incorporated U.S. Patent Application Publication Nos. 2004/0136241 , 2004/0181621 , and 2004/0130927. The pipeline units are also discussed below in conjunction with FIGS. 2 - 3.
[48] Generally, in one mode of operation of the peer-vector computing machine 10, the pipelined accelerator 14 receives data from one or more data-processing software applications running on the host processor 12, processes this data in a pipelined fashion with one or more logic circuits that perform one or more mathematical operations, and then returns the resulting data to the data-processing application(s). As stated above, because the logic circuits execute few, if any, software instructions, they often process data one or more orders of magnitude faster than the host processor 12 can for a given clock frequency. Furthermore, because the logic circuits are instantiated on one or more PLICs, one can often modify these circuits merely by modifying the firmware stored in the memory 22; that is, one can often modify these circuits without modifying the hardware components of the accelerator 14 or the interconnections between these components.
[49] The operation of the peer-vector machine 10 is further discussed in previously incorporated U.S. Patent Application Publication No. 2004/0133763, the functional topology and operation of the host processor 12 is further discussed in previously incorporated U.S. Patent Application Publication No. 2004/0181621 , and the topology and operation of the accelerator 14 is further discussed in previously incorporated U.S. Patent Application Publication No. 2004/0136241.
[50] FIG. 2 is a schematic block diagram of a pipeline unit 50 of the pipeline accelerator 14 of FIG. 1 according to an embodiment of the invention.
[51] The unit 50 includes a circuit board 52 on which are disposed the firmware memory 22, a platform-identification memory 54, a bus connector 56, a data memory 58, and a PLIC 60. [52] As discussed above in conjunction with FIG. 1, the firmware memory 22 stores the configuration-firmware file that the PLIC 60 downloads to instantiate one or more logic circuits, at least some of which compose the hardwired pipeline(s) 44.
[53] The platform memory 54 stores one or more values, i.e., platform identifiers, that respectively identify the one or more platforms with which the pipeline unit 50 is compatible. Generally, a platform specifies a unique set of physical attributes that a pipeline unit may possess. Examples of these attributes include the number of external pins (not shown) on the PLIC 60, the width of the bus connector 56, the size of the PLIC, and the size of the data memory 58. Consequently, a pipeline unit 50 is compatible with a platform if the unit possesses all of the attributes that the platform specifies. So a pipeline unit 50 having a bus connector 56 with thirty-two bits is incompatible with a platform that specifies a bus connector with sixty-four bits. Some platforms may be compatible with the peer-vector machine 10 (FIG. 1), and others may be incompatible. Consequently, the platform identifier(s) stored in the memory 54 may allow a configuration manager (not shown in FIG. 2) executed by the host processor 12 (FIG. 1) to determine whether the pipeline unit 50 is compatible with the platform(s) supported by the machine 10. And where the pipeline unit 50 is so compatible, the platform identifier(s) may also allow the configuration manager to determine how to configure the PLIC 60 or other portions of the pipeline unit as discussed below in conjunction with FIGS. 10 - 14.
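For illustration only, the following C++ sketch shows one way a host-side routine might apply the compatibility rule described above. The attribute names, the exact-match test, and the function names are assumptions of this sketch, not part of the described embodiment.

    #include <iostream>

    // Hypothetical attribute set; the text lists PLIC pin count, bus-connector
    // width, PLIC size, and data-memory size as examples of platform attributes.
    struct PlatformAttributes {
        int plicExternalPins;
        int busConnectorWidthBits;
        int plicLogicCells;   // stand-in for "size of the PLIC"
        int dataMemoryBytes;  // stand-in for "size of the data memory 58"
    };

    // A pipeline unit is compatible with a platform only if it possesses every
    // attribute that the platform specifies (modeled here as exact equality).
    bool isCompatible(const PlatformAttributes& unit, const PlatformAttributes& platform) {
        return unit.plicExternalPins      == platform.plicExternalPins &&
               unit.busConnectorWidthBits == platform.busConnectorWidthBits &&
               unit.plicLogicCells        == platform.plicLogicCells &&
               unit.dataMemoryBytes       == platform.dataMemoryBytes;
    }

    int main() {
        PlatformAttributes unit    {512, 32, 100000, 64 * 1024 * 1024};
        PlatformAttributes platform{512, 64, 100000, 64 * 1024 * 1024};
        // The 32-bit bus connector makes this unit incompatible with a platform
        // that specifies a 64-bit connector, as in the example above.
        std::cout << (isCompatible(unit, platform) ? "compatible" : "incompatible") << "\n";
        return 0;
    }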
[54] The bus connector 56 is a physical connector that interfaces the PLIC 60, and perhaps other components of the pipeline unit 50, to the pipeline bus 20 (FIG. 1).
[55] The data memory 58 acts as a buffer for storing data that the pipeline unit 50 receives from the host processor 12 (FIG. 1) and for providing this data to the PLIC 60. The data memory 58 may also act as a buffer for storing data that the PLIC 60 generates for sending to the host processor 12, or as a working memory for the hardwired pipeline(s) 44.
[56] Instantiated on the PLIC 60 are logic circuits that compose the hardwired pipeline(s) 44, and a hardware interface layer 62, which interfaces the hardwired pipeline(s) to the external pins (not shown) of the PLIC 60, and which thus interfaces the pipeline(s) to the pipeline bus 20 (via the connector 56), to the firmware and platform-identification memories 22 and 54, and to the data memory 58. Because the topology of the interface layer 62 is primarily dependent upon the attributes specified by the platform(s) with which the pipeline unit 50 is compatible, one can often modify the pipeline(s) 44 without modifying the interface layer. For example, if a platform with which the unit 50 is compatible specifies a thirty-two-bit bus, then the interface layer 62 provides a thirty-two-bit bus connection to the bus connector 56 regardless of the topology or other attributes of the pipeline(s) 44.
[57] The hardware-interface layer 62 includes three circuit layers that are instantiated on the PLIC 60: an interface-adapter layer 70, a framework-services layer 72, and a communication layer 74, which is hereinafter called a communication shell. The interface-adapter layer 70 includes circuitry, e.g., buffers and latches, that interfaces the framework-services layer 72 to the external pins (not shown) of the PLIC 60. The framework-services layer 72 provides a set of services to the hardwired pipeline(s) 44 via the communication shell 74. For example, the layer 72 may synchronize data transfer between the pipeline(s) 44, the pipeline bus 20 (FIG. 1), and the data memory 58 (FIG. 2), and may control the sequence(s) in which the pipeline(s) operate. The communication shell 74 includes circuitry, e.g., latches, that interfaces the framework-services layer 72 to the pipeline(s) 44.
[58] Still referring to FIG. 2, alternate embodiments of the pipeline unit 50 are contemplated. For example, the memory 54 may be omitted, and the platform identifier(s) may be stored in the firmware memory 22, or by a jumper-configurable or hardwired circuit (not shown) disposed on the circuit board 52. Furthermore, although the framework-services layer 72 is shown as isolating the interface-adapter layer 70 from the communication shell 74, the interface-adapter layer may, at least at some circuit nodes, be directly coupled to the communication shell. Furthermore, although the communication shell 74 is shown as isolating the interface-adapter layer 70 and the framework-services layer 72 from the pipeline(s) 44, the interface-adapter layer or the framework-services layer may, at least at some circuit nodes, be directly coupled to the pipeline(s).
[59] A pipeline unit similar to the unit 50 is discussed in previously incorporated U.S. Patent Application Publication No. 2004/0136241.
[60] FIG. 3 is a schematic block diagram of the circuitry that composes the interface-adapter layer 70 and the framework-services layer 72 of FIG. 2 according to an embodiment of the invention.
[61] A communication interface 80 and an optional industry-standard bus interface 82 compose the interface-adapter layer 70, and a controller 84, exception manager 86, and configuration manager 88 compose the framework-services layer 72. The configuration manager 88 is local to the PLIC 60, and is thus different from the configuration manager executed by the host processor 12 as discussed above in conjunction with FIG. 1 and below in conjunction with FIGS. 10 - 14. [62] The communication interface 80 transfers data between a peer, such as the host processor 12 (FIG. 1) or another pipeline unit 50 (FIG. 2), and the firmware memory 22, the platform-identifier memory 54, the data memory 58, and the following circuits instantiated within the PLIC 60: the hardwired pipeline(s) 44 (via the communication shell 74), the controller 84, the exception manager 86, and the configuration manager 88. If present, the optional industry-standard bus interface 82 couples the communication interface 80 to the bus connector 56. Alternatively, the interfaces 80 and 82 may be merged such that the functionality of the interface 82 is included within the communication interface 80.
[63] The controller 84 synchronizes the hardwired pipeline(s) 44 and monitors and controls the sequence in which it/they perform the respective data operations in response to communications, i.e., "events," from other peers. For example, a peer such as the host processor 12 may send an event to the pipeline unit 50 via the pipeline bus 20 to indicate that the peer has finished sending a block of data to the pipeline unit and to cause the hardwired pipeline(s) 44 to begin processing this data. An event that includes data is typically called a message, and an event that does not include data is typically called a "door bell." [64] The exception manager 86 monitors the status of the hardwired pipeline(s) 44, the communication interface 80, the communication shell 74, the controller 84, and the bus interface 82 (if present), and reports exceptions to another exception manager (not shown in FIG. 3) executed by the host processor 12 (FIG. 1). For example, if a buffer (not shown) in the communication interface 80 overflows, then the exception manager 86 reports this to the host processor 12. The exception manager may also correct, or attempt to correct, the problem giving rise to the exception. For example, for an overflowing buffer, the exception manager 86 may increase the size of the buffer, either directly or via the configuration manager 88 as discussed below. [65] The configuration manager 88 sets the "soft" configuration of the hardwired pipeline(s) 44, the communication interface 80, the communication shell 74, the controller 84, the exception manager 86, and the interface 82 (if present) in response to soft-configuration data from the host processor 12 (FIG. 1). As discussed in previously incorporated U.S. Patent Application Publication No. 2004/0133763, the "hard" configuration of a circuit within the PLIC 60 denotes the actual instantiation, on the transistor and circuit-block level, of the circuit, and the soft configuration denotes the settable physical parameters (e.g., data type, table size, buffer depth) of the instantiated component. That is, soft-configuration data is similar to the data that one can load into a register of a processor (not shown in FIG. 3) to set the operating mode (e.g., burst-memory mode, page mode) of the processor. For example, the host processor 12 may send to the PLIC 60 soft-configuration data that causes the configuration manager 88 to set the number and respective priority levels of queues (not shown) within the communication interface 80. And as discussed in the preceding paragraph, the exception manager 86 may also send soft-configuration data that causes the configuration manager 88 to, e.g., increase the size of an overflowing buffer in the communication interface 80.
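The following minimal C++ sketch illustrates the kind of soft-configuration message the host processor 12 might send to the configuration manager 88. The field names and the encoding are illustrative assumptions, since the text does not define a message format.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Hypothetical soft-configuration message; the text gives queue count, queue
    // priorities, and buffer depth as examples of settable ("soft") parameters,
    // as opposed to the "hard" configuration fixed by the firmware.
    struct SoftConfigMessage {
        uint16_t queueCount;                 // number of queues in the communication interface 80
        std::vector<uint8_t> queuePriority;  // one priority level per queue
        uint32_t bufferDepthWords;           // depth of an input buffer, e.g. after an overflow
    };

    // Stand-in for the host-side routine that would encode the message and send
    // it over the pipeline bus 20 to the configuration manager 88 on the PLIC.
    void sendSoftConfig(const SoftConfigMessage& msg) {
        std::cout << "soft-config: " << msg.queueCount << " queues, buffer depth "
                  << msg.bufferDepthWords << " words\n";
    }

    int main() {
        SoftConfigMessage msg{4, {0, 1, 2, 3}, 2048};
        sendSoftConfig(msg);  // e.g. the exception manager could resend this with a larger depth
        return 0;
    }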
[66] The communication interface 80, optional industry-standard bus interface 82, controller 84, exception manager 86, and configuration manager 88 are further discussed in previously incorporated U.S. Patent Application Publication No. 2004/0136241.
[67] Referring again to FIGS. 2 - 3, although the pipeline unit 50 is disclosed as including only one PLIC 60, the pipeline unit may include multiple PLICs. For example, as discussed in previously incorporated U.S. Patent Application Publication No. 2004/0136241, the pipeline unit 50 may include two interconnected PLICs, where the circuitry that composes the interface-adapter layer 70 and framework-services layer 72 is instantiated on one of the PLICs, and the circuitry that composes the communication shell 74 and the hardwired pipeline(s) 44 is instantiated on the other PLIC.
[68] FIG. 4 is a block diagram of the accelerator/host-processor configuration registry 40 of FIG. 1 according to an embodiment of the invention.
[69] The registry 40 includes configuration data 100, an accelerator-template library 102, a software-object library 104, a circuit-definition library 106, and an accelerator-firmware library 108.
[70] The configuration data 100 contains instructions that the configuration manager (not shown in FIG. 4) executed by the host processor 12 (FIG. 1) follows to configure the accelerator 14 (FIG. 1), and is further discussed below in conjunction with FIGS. 10 - 14. These instructions may be written in any conventional language or format.
[71] The accelerator-template library 102 contains templates that define one or more interface-adapter layers 70, framework-services layers 72, communication shells 74, and hardwired pipelines 44 that the configuration manager executed by the host processor 12 (FIG. 1) can instantiate on the PLICs 60 (FIG. 2). The library 102 is further discussed below in conjunction with FIGS. 5 - 6.
[72] The software-object library 104 contains one or more software objects that, when executed, respectively perform in software the same (or similar) functions that the pipelines 44 defined by the templates in the accelerator-template library 102 perform in hardware. These software objects give the configuration manager executed by the host processor 12 the flexibility of instantiating in software at least some of the pipelined functions specified by the configuration data 100. The library 104 is further discussed below in conjunction with FIG. 7.
[73] The circuit-definition library 106 contains one or more circuit-definition files that each define a respective circuit for instantiation on the accelerator 14 (FIG. 1). Each circuit typically includes one or more interconnected hardwired pipelines 44 (FIG. 1), which are typically defined by corresponding templates in the library 102. The library 106 is further discussed below in conjunction with FIG. 8. [74] The accelerator-firmware library 108 contains one or more firmware-configuration files that each PLIC 60 (FIG. 2) of the accelerator 14 (FIG. 1) respectively downloads to set its internal circuit-node connections so as to instantiate a respective interface-adapter layer 70, framework-services layer 72, communication shell 74, and hardwired pipeline(s) 44. The library 108 is further discussed below in conjunction with FIG. 9.
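As a rough illustration of the registry layout described above, the C++ sketch below models the registry 40 and its four libraries as simple in-memory records. The field names and the use of XML strings for the descriptions are assumptions of this sketch, not a specification of the actual storage format.

    #include <string>
    #include <vector>

    // Hypothetical in-memory view of the accelerator/host-processor-configuration
    // registry 40; the actual on-disk or in-memory format is not specified.
    struct TemplateEntry       { std::string name; std::string hdlPath; std::string descriptionXml; };
    struct SoftwareObjectEntry { std::string name; std::string library; std::string descriptionXml; };
    struct CircuitEntry        { std::string name; std::vector<std::string> templateNames; std::string descriptionXml; };
    struct FirmwareFileEntry   { std::string name; std::string bitstreamPath; std::string descriptionXml; };

    struct ConfigurationRegistry {
        std::string configurationData;                        // instructions followed by the configuration manager
        std::vector<TemplateEntry>       templateLibrary;        // accelerator-template library 102
        std::vector<SoftwareObjectEntry> softwareObjectLibrary;  // software-object library 104
        std::vector<CircuitEntry>        circuitDefinitionLibrary; // circuit-definition library 106
        std::vector<FirmwareFileEntry>   firmwareLibrary;         // accelerator-firmware library 108
    };

    int main() { ConfigurationRegistry registry; (void)registry; return 0; }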
[75] FIG. 5 is a block diagram of a hardware-description file 120 from which the configuration manager (not shown in FIG. 5) executed by the host processor 12 (FIG. 1) can generate firmware for setting the circuit-node connections within a PLIC such as the PLIC 60 (FIGS. 2 - 3) according to an embodiment of the invention. The accelerator-template library 102 contains templates that one can arrange to compose the file 120; consequently, an understanding of the hardware-description file 120 should facilitate the reader's understanding of the accelerator-template library 102, which is discussed below in conjunction with FIG. 6.
[76] Typically, the below-described templates of the hardware-description file 120 are written in a conventional hardware description language (HDL) such as Verilog® HDL, and are organized in a top-down structure that resembles the top-down structure of software source code that incorporates software objects. A similar hardware-description file is described in previously incorporated U.S. Patent
App. Ser. No. (Attorney Docket No. 1934-023-03).
Furthermore, techniques for generating PLIC firmware from the file 120 are discussed below in conjunction with FIGS. 10 - 14. [77] The hardware-description file 120 includes a top-level template
121, which contains respective top-level definitions 122, 124, and 126 of the interface-adapter layer 70, the framework-services layer 72, and the communication shell 74 — together, the definitions 122, 124, and 126 compose a top-level definition 123 of the hardware-interface layer 62 — of a PLIC such as the PLIC 60 (FIGS. 2 - 3). The template 121 also defines the connections between the external pins (not shown) of the PLIC and the interface-adapter layer 70 (and in some cases between the external pins and the framework-services layer 72), and also defines the connections between the framework-services layer and the communication shell 74 (and in some cases between the interface-adapter layer and the communication shell).
[78] The top-level definition 122 of the interface-adapter layer 70
(FIGS. 2 - 3) incorporates an interface-adapter-layer template 128, which further defines the portions of the interface-adapter layer defined by the top-level definition 122. For example, suppose that the top-level definition 122 defines a data-input buffer (not shown) in terms of its input and output nodes. That is, suppose the top-level definition 122 defines the data-input buffer as a functional block having defined input and output nodes. The template 128 defines the circuitry that composes this functional buffer block, and defines the connections between this circuitry and the buffer input nodes and output nodes already defined in the top-level definition 122. Furthermore, the template 128 may incorporate one or more lower-level templates 129 that further define the data buffer or other components of the interface-adapter layer 70 already defined in the template 128. Moreover, these one or more lower-level templates 129 may each incorporate one or more even lower-level templates (not shown), and so on, until all portions of the interface-adapter layer 70 are defined in terms of circuit components (e.g., flip-flops, logic gates) that a PLIC synthesizing and routing tool (not shown) recognizes. A PLIC synthesizing and routing tool is a conventional tool, typically provided by the PLIC manufacturer, that can generate from the hardware-description file 120 configuration firmware for a PLIC.
[79] Similarly, the top-level definition 124 of the framework-services layer 72 (FIGS. 2 - 3) incorporates a framework-services-layer template 130, which further defines the portions of the framework-services layer defined by the top-level definition 124. For example, suppose that the top-level definition 124 defines a counter (not shown) in terms of its input and output nodes. The template 130 defines the circuitry that composes this counter, and defines the connections between this circuitry and the counter input and output nodes already defined in the top-level definition 124. Furthermore, the template 130 may incorporate a hierarchy of one or more lower-level templates 131 and even lower-level templates (not shown), and so on such that all portions of the framework-services layer 72 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes. For example, suppose that the template 130 defines the counter as including a count-up/down selector having input and output nodes. The template 130 may incorporate a lower-level template 131 that defines the circuitry within this up/down selector and the connections between this circuitry and the selector's input and output nodes already defined by the template 130.
[80] Likewise, the top-level definition 126 of the communication shell 74 (FIGS. 2 - 3) incorporates a communication-shell template 132, which further defines the portions of the communication shell defined by the definition 126, and which also includes a top-level definition 133 of the hardwired pipeline(s) 44 disposed within the communication shell. For example, the definition 133 defines the connections between the communication shell 74 and the hardwired pipeline(s) 44. [81] The top-level definition 133 of the pipeline(s) 44 (FIGS. 2 - 3) incorporates for each defined pipeline a respective hardwired-pipeline template 134, which further defines the portions of the respective pipeline 44 already defined by the definition 133. The template or templates 134 may each incorporate a hierarchy of one or more lower-level templates 135, and even lower-level templates, such that all portions of the respective pipeline 44 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes. [82] Moreover, the communication-shell template 132 may incorporate a hierarchy of one or more lower-level templates 136, and even lower-level templates, such that all portions of the communication shell 74 other than the pipeline(s) 44 are, at some level of the hierarchy, also defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes.
[83] Still referring to FIG. 5, a configuration template 138 provides definitions for one or more parameters having values that one can set to configure the circuitry that the templates 121, 128, 129, 130, 131, 132, 134, 135, and 136, and the even lower-level templates (not shown) define. For example, suppose that the bus interface 82 (FIG. 3) of the interface-adapter layer 70 (FIG. 3) is configurable to have either a thirty-two-bit or a sixty-four-bit interface to the bus connector 56. The configuration template 138 defines a parameter BUS_WIDTH, the value of which determines the width of the interface 82. For example, BUS_WIDTH = 0 configures the interface 82 to have a thirty-two-bit interface, and BUS_WIDTH = 1 configures the interface 82 to have a sixty-four-bit interface. Examples of other parameters that may be configurable in this manner include the depth of a first-in-first-out (FIFO) data buffer (not shown) disposed within the data memory 58 (FIGS. 2 - 3), the lengths of messages received and transmitted by the interface-adapter layer 70, the precision and data type (e.g., integer, floating-point) of the pipeline(s) 44, and a constant coefficient of a mathematical expression (e.g., "a" in ax^2) that a pipeline executes.
[84] One or more of the templates 121, 128, 129, 130, 131, 132,
134, 135, and 136 and the lower-level templates (not shown) incorporate the parameter(s) defined in the configuration template 138. The PLIC synthesizer and router tool (not shown) configures the interface-adapter layer 70, the framework-services layer 72, the communication shell 74, and the hardwired pipeline(s) 44 (FIGS. 2 - 3) according to the set values in the template 138 during the synthesis of the hardware-description file 120. Consequently, to reconfigure the circuit parameters associated with the parameters defined in the configuration template 138, one need only modify the values of these parameters in the configuration template, and then rerun the synthesizer and router tool on the file 120. Alternatively, if one or more of the parameters in the configuration template 138 can be sent to the PLIC as soft-configuration data after instantiation of the circuit, then one can modify the corresponding circuit parameters by merely modifying the soft-configuration data. Therefore, according to this alternative, one may avoid rerunning the synthesizer and router tool on the file 120. Moreover, templates (e.g., 121, 128, 129, 130, 131, 132, 134, 135, and 136) that do not incorporate settable parameters such as those provided by the configuration template 138 are sometimes called modules or entities, and are typically lower-level templates that include Boolean expressions that a synthesizer and router tool (not shown) converts into circuitry for implementing the expressions.
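The following C++ sketch illustrates, under stated assumptions, how a host-side tool might track the parameter values of a configuration template 138 and decide whether a change requires rerunning the synthesizer and router tool or can instead be applied as soft-configuration data. The data structure and the softSettable flag are illustrative, not part of the described embodiment.

    #include <iostream>
    #include <map>
    #include <string>

    // Hypothetical parameter table mirroring a configuration template 138.
    // "softSettable" marks parameters that can be changed after instantiation via
    // soft-configuration data; the others require re-synthesizing the firmware.
    struct ConfigParameter { int value; bool softSettable; };

    using ConfigTemplate = std::map<std::string, ConfigParameter>;

    void setParameter(ConfigTemplate& tmpl, const std::string& name, int newValue) {
        ConfigParameter& p = tmpl.at(name);
        p.value = newValue;
        if (p.softSettable) {
            std::cout << name << " updated via a soft-configuration message\n";
        } else {
            std::cout << name << " changed; rerun the synthesizer and router tool on the HDL file\n";
        }
    }

    int main() {
        ConfigTemplate tmpl = {
            {"BUS_WIDTH",  {0, false}},    // 0 = thirty-two-bit, 1 = sixty-four-bit bus interface
            {"FIFO_DEPTH", {1024, true}},  // depth of a FIFO buffer in the data memory 58
        };
        setParameter(tmpl, "BUS_WIDTH", 1);
        setParameter(tmpl, "FIFO_DEPTH", 4096);
        return 0;
    }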
[85] Alternate embodiments of the hardware-description file 120 are contemplated. For example, although described as defining circuitry for instantiation on a PLIC, the file 120 may define circuitry for instantiation on an application-specific integrated circuit (ASIC).
[86] FIG. 6 is a block diagram of the accelerator-template library 102 of FIG. 4 according to an embodiment of the invention. The library 102 contains one or more versions of the templates described above in conjunction with FIG. 5. For clarity, however, the optional lower-level templates 129, 131, 135, and 136 are omitted from FIG. 6. Furthermore, a library similar to the library 102 is described in previously incorporated U.S. Patent App. Ser. No. (Attorney Docket No. 1934-023-03).
[87] The library 102 has m+1 sections: m sections 1401 - 140m for the respective m platforms that the library supports, and a section 142 for the hardwired pipelines 44 (FIGS. 1 - 3) that the library supports.
[88] For example purposes, the library section 1401 is discussed in detail, it being understood that the other library sections 1402 - 140m are similar.
[89] The library section 1401 includes a top-level template 1211, which is similar to the template 121 of FIG. 5, and which thus includes top-level definitions 1221, 1241, and 1261 of the respective versions of the interface-adapter layer (IAL) 70, the framework-services layer (FSL) 72, and the communication shell(s) 74 (FIGS. 2 - 3) that are compatible with the platform m=1, i.e., platform 1.
[90] In this embodiment, we assume that there is only one version of the interface-adapter layer 70 and one version of the framework-services layer 72 (FIGS. 2 - 3) available for each platform m, and, therefore, that the library section 1401 includes only one interface-adapter-layer template 1281 and only one framework-services-layer template 1301. But in an embodiment where each platform m includes multiple versions of the interface-adapter layer 70 and multiple versions of the framework-services layer 72, the library section 1401 would include multiple interface-adapter- and framework-services-layer templates 128 and 130.
[91] The library section 1401 also includes n communication-shell templates 1321,1 - 1321,n, which respectively correspond to the hardwired-pipeline templates 1341 - 134n in the library section 142. As stated above in conjunction with FIGS. 2 - 3, the communication shell 74 interfaces a hardwired pipeline or hardwired pipelines 44 to the framework-services layer 72. Because each hardwired pipeline 44 is different and, therefore, typically has different interface specifications, the communication shell 74 is typically different for each hardwired pipeline. Consequently, in this embodiment, a designer creates a unique version of the communication shell 74 for each hardwired pipeline 44 by writing a unique communication-shell template 132 for each hardwired pipeline. Of course the group of communication-shell templates 1321,1 - 1321,n corresponds only to the version of the framework-services layer 72 that is defined by the template 1301; consequently, if there are multiple versions of the framework-services layer 72 that are compatible with the platform 1, then the library section 1401 includes a respective group of n communication-shell templates 132 for each version of the framework-services layer.
[92] Furthermore, the library section 1401 includes a configuration template 1381, which defines for the other templates in this library section (and possibly for the hardwired-pipeline templates 134 in the section 142) configuration constants having designer-selectable values as discussed above in conjunction with the configuration template 138 of FIG. 5.
[93] In addition, each template within the library section 1401 includes, or is associated with, a respective template description 1441 - 1521. The descriptions 1441 - 1501,n describe the operational and other parameters of the circuitry that the respective templates 1211, 1281, 1301, and 1321,1 - 1321,n respectively define. Similarly, the template description 1521 describes the settable parameters in the configuration template 1381, the values that these parameters can have, and the meanings of these values. Examples of parameters that a template description 1441 - 1501,n may describe include the width of the data bus and the depths of FIFO buffers that the circuit defined by the corresponding template includes, the latency of the circuit, and the type and precision of the values received and generated by the circuit. An example of a settable parameter and the associated selectable values that the description 1521 may describe is BUS_WIDTH, which represents the width of the interface between the communication interface 80 and the bus connector 56 (FIGS. 2 - 3), where BUS_WIDTH = 0 sets this interface to thirty-two bits and BUS_WIDTH = 1 sets this interface to sixty-four bits.
[94] Each of the template descriptions 1441 - 1521 may be embedded within the template 1211, 1281, 1301, 1321,1 - 1321,n, and 1381 to which it corresponds. For example, the IAL template description 1461 may be embedded within the interface-adapter-layer template 1281 as extensible markup language (XML) tags or comments that are readable by both a human and the host processor 12 (FIG. 1) as discussed below in conjunction with FIGS. 10 - 14.
[95] Alternatively, each of the template descriptions 1441 - 1521 may be disposed in a separate file that is linked to the template to which the description corresponds, and this file may be written in a language other than XML. For example, the top-level-template description 1441 may be disposed in a file that is linked to the top-level template 1211.
[96] The section 1401 of the library 102 also includes a description 1541, which describes parameters specified by the platform m = 1. The host processor 12 (FIG. 1) may use the description 1541 to determine which platform(s) the library 102 supports as discussed below in conjunction with FIGS. 10 - 14. Examples of parameters that the description 1541 may describe include: 1) for each interface, the message specification, which lists the transmitted variables and the constraints for those variables, and 2) a behavior specification and any behavior constraints. Messages that the host processor 12 (FIG. 1) sends to the pipeline units 50 (FIG. 2) and that the pipeline units send among themselves are further discussed in previously incorporated U.S. Patent Publication No. 2004/0181621. Examples of other parameters that the description 1541 may describe include the size and resources (e.g., the number of multipliers and the amount of available memory) that the platform specifies for the PLICs that compose a compatible pipeline accelerator 14 (FIG. 1). Furthermore, like the template descriptions 1441 - 1521, the platform description 1541 may be written in XML or in another language.
[97] Still referring to FIG. 6, the section 142 of the library 102 includes n hardwired-pipeline templates 1341 - 134n, which each define a respective hardwired pipeline 441 - 44n (FIGS. 1 - 3). As discussed above in conjunction with FIG. 5, because the templates 1341 - 134n are platform independent (the corresponding communication-shell templates 132m,1 - 132m,n respectively define the specified interfaces between the pipelines 44 and the framework-services layer 72), the library 102 stores only one template 134 for each hardwired pipeline 44. That is, each hardwired pipeline 44 does not require a separate template 134 for each platform m that the library 102 supports. As discussed in previously incorporated U.S.
Patent Application Ser. No. (Attorney Docket No. 1934-023-03), an advantage of this top-down design is that one need generate only a single template 134 to define a respective hardwired pipeline 44, not m templates.
[98] Furthermore, each hardwired-pipeline template 134 includes, or is associated with, a respective template description 1561 - 156n, which describes parameters of the hardwired pipeline 44 that the template defines. Examples of parameters that a template description 1561 - 156n may describe include the type (e.g., floating point or integer) and precision of the data values that the corresponding hardwired pipeline 44 can receive and generate, and the latency of the pipeline. Like the template descriptions 1441 - 1521, each of the descriptions 1561 - 156n may be respectively embedded within the hardwired-pipeline template 1341 - 134n to which the description corresponds as, e.g., XML tags, or may be disposed in a separate file that is linked to the corresponding hardwired-pipeline template.
[99] Still referring to FIG. 6, alternate embodiments of the library
102 are contemplated. For example, instead of each template within each library section 1401 - 140m being associated with a respective description 144 - 152, each library section 1401 - 140m may include a single description that describes all of the templates within that library section. For example, this single description may be embedded within or linked to the top-level template 121 or to the configuration template 138. Furthermore, although each library section 1401 - 140m is described as including a respective communication-shell template 132 for each hardwired-pipeline template 134 in the library section 142, each section 140 may include fewer communication-shell templates, at least some of which are compatible with, and thus correspond to, more than one pipeline template 134. In the extreme, each library section 1401 - 140m may include only a single communication-shell template 132, which is compatible with all of the hardwired-pipeline templates 134 in the library section 142. In addition, the library section 142 may include respective versions of each pipeline template 134 for each communication-shell template 132 in the library sections 1401 - 140m.
[100] FIG. 7 is a block diagram of the software-object library 104 of
FIG. 4 according to an embodiment of the invention.
[101] The library 104 includes software objects 1601 - 160q, at least some of which can cause the host processor 12 (FIG. 1) to perform in software the same functions that respective ones of the hardwired pipelines 441 - 44n (FIGS. 2 - 3) can perform in hardware. For example, if the pipeline 441 squares a value v (v^2) input to the pipeline, then a corresponding software object 1601 can cause the host processor 12 to square an input value v (v^2). The software objects 1601 - 160q may be directly executable by the host processor 12, or may cause the host processor to generate corresponding programming code that the host processor can execute. Furthermore, the software objects 1601 - 160q may be written in any conventional programming language such as C++. Because object-oriented software architectures are known, further details of the software objects 160 are omitted for brevity.
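Because the text notes that the software objects 160 may be written in C++, the following minimal sketch shows a software object corresponding to a pipeline that squares its input value v. The PipelineObject interface and its method name are assumptions of this sketch, not part of the described embodiment.

    #include <iostream>
    #include <vector>

    // Hypothetical common interface for the software objects 160; the text does
    // not define their class hierarchy, only that each object performs in software
    // the function that a corresponding hardwired pipeline 44 performs in hardware.
    class PipelineObject {
    public:
        virtual ~PipelineObject() = default;
        virtual double process(double v) const = 0;
    };

    // Software counterpart of a pipeline that squares its input value v.
    class SquareObject : public PipelineObject {
    public:
        double process(double v) const override { return v * v; }
    };

    int main() {
        SquareObject square;
        std::vector<double> inputs{1.0, 2.5, 4.0};
        for (double v : inputs) {
            std::cout << v << " squared = " << square.process(v) << "\n";
        }
        return 0;
    }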
[102] The library 104 also includes respective object descriptions 1621 - 162q of the software objects 1601 - 160q. The object descriptions 162 may describe parameters and other features of the software objects 160, such as the function(s) that they cause the host processor 12 (FIG. 1) to perform, and the latency and the type and precision of the values accepted and generated by the host processor 12 while performing the function(s). Furthermore, the descriptions 162 may be written in a conventional language, such as XML, that the host processor 12 recognizes, and may be embedded, e.g., as comment tags, within the respective software objects 160 or may be contained within separate files that correspond to the respective software objects.
[103] Referring to FIGS. 1 and 7, as discussed below in conjunction with FIGS. 10 - 14, the software objects 160 provide the host processor 12 flexibility in configuring the pipeline accelerator 14, and in reconfiguring the peer-vector machine 10 in the event of a failure. For example, suppose that the configuration data 100 calls for instantiating eight hardwired pipelines 44 on the accelerator 14, but the accelerator has room for only seven pipelines. The host processor 12 may execute a software object that corresponds to the eighth pipeline so as to perform the function that the eighth pipeline otherwise would have performed. Or, suppose that the configuration data 100 calls for instantiating a pipeline 44 that performs a function (e.g., sin (v)), but no such pipeline is available. The host processor 12 may execute a software object 160 so as to perform the function.
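The following C++ sketch illustrates the fallback described in this example: pipelines that fit are instantiated on the accelerator, and the remainder are handled by software objects on the host processor. The slot-counting model, the pipeline names, and the function names are simplifying assumptions of this sketch.

    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical fitting step: the configuration data names the pipelines the
    // circuit needs, and the accelerator reports how many pipeline slots it has.
    // Pipelines that do not fit fall back to the corresponding software objects
    // executed by the host processor.
    void assignPipelines(const std::vector<std::string>& requested, std::size_t hardwareSlots) {
        for (std::size_t i = 0; i < requested.size(); ++i) {
            if (i < hardwareSlots) {
                std::cout << requested[i] << " -> instantiate on accelerator 14\n";
            } else {
                std::cout << requested[i] << " -> run software object on host processor 12\n";
            }
        }
    }

    int main() {
        std::vector<std::string> requested{"fft", "fir", "square", "matmul",
                                           "beamform", "sort", "cordic", "sin"};
        assignPipelines(requested, 7);  // eight pipelines requested, room for only seven
        return 0;
    }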
[104] FIG. 8 is a block diagram of the circuit-definition library 106 of FIG. 4 according to an embodiment of the invention. [105] The library 106 includes circuit-definition files 1701 - 170p, which each define a respective circuit for instantiation on one or more PLICs (FIGS. 2 - 3) of the pipeline accelerator 14 (FIG. 1) in terms of templates from the accelerator-template library 102 (FIG. 4). To define a circuit for instantiation on a single PLIC, the circuit file 170 identifies from the template library 102 (FIGS. 4 and 6) the respective top-level template 121, interface-adapter-layer template 128, framework-services-layer template 130, communication-shell template 132, hardwired-pipeline template(s) 134, configuration template 138, and corresponding lower-level templates that define the circuitry to be instantiated on that PLIC. And if the PLIC is to include multiple hardwired pipelines 44, then the file 170 defines the interconnections between these pipelines. To define a circuit for instantiation on multiple PLICs, the circuit file 170 identifies for each PLIC the templates that define the circuitry to be instantiated on that PLIC, and also defines the interconnections between the PLICs. An example of a circuit defined by a circuit file 170 is described below in conjunction with FIG. 11.
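As a hedged illustration of what a circuit-definition file 170 might capture, the C++ sketch below models the per-PLIC template lists and the inter-PLIC connections. The structure and field names are assumptions of this sketch, since the text does not specify a file format.

    #include <string>
    #include <vector>

    // Hypothetical structure of a circuit-definition file 170: for each PLIC it
    // names the templates to instantiate, and it lists the connections between
    // pipelines and between PLICs.
    struct PlicDefinition {
        std::string topLevelTemplate;                           // e.g. a template 121 for a given platform
        std::string interfaceAdapterTemplate;                   // a template 128
        std::string frameworkServicesTemplate;                  // a template 130
        std::vector<std::string> communicationShellTemplates;   // templates 132
        std::vector<std::string> hardwiredPipelineTemplates;    // templates 134
        std::string configurationTemplate;                      // a template 138
    };

    struct Connection {
        std::string fromPlic, fromPort;
        std::string toPlic,   toPort;
    };

    struct DefinedCircuit {
        std::vector<PlicDefinition> plics;
        std::vector<Connection> interconnections;  // pipeline-to-pipeline and PLIC-to-PLIC links
    };

    int main() { DefinedCircuit circuit; (void)circuit; return 0; }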
[106] The library 106 also includes circuit descriptions 1721 - 172p that correspond to the circuit-definition files 1701 - 170p. A description 172 typically describes the function (e.g., y = x^2 + z^3) performed by the circuit that the corresponding file 170 defines, and the operating parameters and other features of the circuit, such as the latency and the type and precision of the values accepted and generated by each PLIC that composes the circuit. The description 172 may also identify the platform(s) with which the corresponding circuit is compatible, and may include values for the constants defined by the configuration template(s) 138 (FIGS. 5 - 6) that the circuit-definition file 170 identifies. Furthermore, each of the circuit descriptions 172 may be written in a conventional language, such as XML, that the host processor 12 (FIG. 1) recognizes, and may be embedded (e.g., as comment tags) within a respective circuit-definition file 170 or may be contained within a separate file that is linked to the respective circuit-definition file.
[107] Files similar to the circuit-definition files 170 and a tool for generating these files are disclosed in previously incorporated U.S. Patent Application (Attorney Docket No. 1934-023-03). Furthermore, although described as defining circuits for instantiation on one or more PLICs, some of the circuit-definition files 170 may define circuits for instantiation on an ASIC.
[108] FIG. 9 is a block diagram of the accelerator-firmware library 108 of FIG. 4 according to an embodiment of the invention.
[109] The library 108 includes firmware files 1801 - 180r, each of which, when downloaded by a PLIC, configures the PLIC to instantiate a respective circuit. As described above in conjunction with FIGS. 2 - 3, the respective circuit typically includes an interface-adapter layer 70, framework-services layer 72, communication shell 74, and one or more hardwired pipelines 44, although the circuit may have a different topology. A PLIC synthesizing and routing tool (not shown) may generate one or more of the firmware files 180 from templates in the accelerator-template library 102 (FIG. 4), or in another manner.
[110] The firmware files 1801 - 180r are the only files within the accelerator/host-processor-configuration registry 40 (FIG. 4) that can actually configure a PLIC to instantiate a circuit. That is, although the templates in the library 102 (FIG. 4) and the circuit-definition files 170 in the library 106 (FIG. 4) define circuits, the configuration manager (not shown in FIG. 9) executed by the host processor 12 (FIG. 1) cannot instantiate such a defined circuit on the pipeline accelerator 14 (FIG. 1) until the respective template(s) and/or circuit-definition files are converted into one or more corresponding firmware files 180 using, for example, a PLIC synthesizing and routing tool (not shown). [111] Still referring to FIG. 9, the library 108 also includes respective descriptions 1821 - 182r of the firmware files 1801 - 180r. Each description 182 typically describes the function (e.g., y = x^2 + z^3) performed by the circuit that the corresponding firmware file 180 can instantiate, and the parameters and other features of the circuit, such as the latency and the type and precision of the values accepted and generated by the circuit. The description 182 may also identify the platform(s) with which the corresponding circuit is compatible, and may also identify the type(s) of PLIC on which the circuit can be instantiated. Furthermore, the descriptions 182 may be written in a conventional language, such as XML, that the host processor 12 (FIG. 1) recognizes, and may be embedded (e.g., as comment tags) within the respective firmware files 180 or may be contained within separate files that are linked to the respective firmware files.
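The following C++ sketch illustrates one way the configuration manager might use the descriptions 182 to select a firmware file that matches a requested function, a platform, and a PLIC type. The record fields and the matching rule are assumptions of this sketch.

    #include <optional>
    #include <string>
    #include <vector>

    // Hypothetical record distilled from a firmware-file description 182: the
    // function the instantiated circuit performs, the platforms it supports, and
    // the PLIC types onto which it can be loaded.
    struct FirmwareRecord {
        std::string path;
        std::string function;                 // e.g. "y = x^2 + z^3"
        std::vector<std::string> platforms;
        std::vector<std::string> plicTypes;
    };

    static bool contains(const std::vector<std::string>& v, const std::string& s) {
        for (const std::string& e : v) if (e == s) return true;
        return false;
    }

    // Returns the first firmware file whose description matches the requested
    // function and the installed pipeline unit's platform and PLIC type.
    std::optional<FirmwareRecord> selectFirmware(const std::vector<FirmwareRecord>& library,
                                                 const std::string& function,
                                                 const std::string& platform,
                                                 const std::string& plicType) {
        for (const FirmwareRecord& fw : library) {
            if (fw.function == function && contains(fw.platforms, platform) &&
                contains(fw.plicTypes, plicType)) {
                return fw;
            }
        }
        return std::nullopt;  // no match: fall back to synthesis or to a software object
    }

    int main() {
        std::vector<FirmwareRecord> lib{{"fw/square_p1.bit", "square", {"platform1"}, {"fpgaA"}}};
        std::optional<FirmwareRecord> fw = selectFirmware(lib, "square", "platform1", "fpgaA");
        return fw ? 0 : 1;
    }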
[112] FIG. 10 is a functional block diagram of the host processor 12, the interface memory 18, and the pipeline bus 20 of FIG. 1 according to an embodiment of the invention. Generally, the processing unit 32 executes one or more software applications, and the message handler 34 executes one or more software objects (different from the software objects in the library 104 of FIG. 4) that transfer data between the software application(s) and the pipeline accelerator 14 (FIG. 1). Splitting the data-processing, data-transferring, and other functions among different applications and objects allows for easier design and modification of the host-processor software. Furthermore, although in the following description a software application is described as performing a particular function, it is understood that in actual operation, the processing unit 32 or message handler 34 executes the software application and performs this function under the control of the application. Moreover, although in the following description a software object is described as performing a particular function, it is understood that in actual operation, the processing unit 32 or message handler 34 executes the software object and performs this function under the control of the object. In addition, although in the following description a manager application (e.g., configuration manager) is described as performing a particular function, it is understood that in actual operation, the processing unit 32 or message handler 34 executes the manager application and performs this function under the control of the manager application.
[113] Still referring to FIG. 10, the processing unit 32 executes at least one data-processing application 190, an accelerator-exception-manager application (hereinafter the exception manager) 192, and an accelerator-configuration-manager application (hereinafter the configuration manager) 194, which are collectively referred to as the processing-unit applications. Furthermore, the exception and configuration managers 192 and 194 are executed by the processing unit 32, and are thus different from the exception and configuration managers 86 and 88 disposed on the PLIC 60 of FIG. 3. [114] The data-processing application 190 processes data in cooperation with the pipeline accelerator 14 (FIG. 1). For example, the data-processing application 190 may receive raw sonar data via the port 24, parse the data, and send the parsed data to the accelerator 14, and the accelerator may perform a fast Fourier transform (FFT) on the parsed data and return the FFT output data to the data-processing application for further processing.
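For illustration, the C++ sketch below walks through the host-side round trip of this sonar example: parse the raw data, hand it to the accelerator, and receive the FFT output for further processing. The function sendToAccelerator() is a stand-in for the real message path over the pipeline bus and simply returns a placeholder result so the sketch is self-contained.

    #include <complex>
    #include <iostream>
    #include <vector>

    // Stand-in for the data-transfer path to the pipeline accelerator 14: in the
    // described machine the parsed data would travel via the data-transfer objects
    // and the pipeline bus, and the accelerator would return the FFT output.
    std::vector<std::complex<float>> sendToAccelerator(const std::vector<float>& parsed) {
        return std::vector<std::complex<float>>(parsed.size());  // placeholder spectrum
    }

    int main() {
        std::vector<float> raw{0.1f, 0.7f, -0.3f, 0.4f};  // raw sonar samples from port 24
        std::vector<float> parsed = raw;                  // parsing step (trivial here)
        std::vector<std::complex<float>> spectrum = sendToAccelerator(parsed);
        std::cout << "received " << spectrum.size() << " FFT bins for further processing\n";
        return 0;
    }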
[115] The exception manager 192 handles exception messages from the pipeline accelerator 14 (FIG. 1), and may detect and handle exceptions that result from the operation of the host processor 12. The PLIC exception manager(s) 88 (FIG. 3) typically generate the exception messages that the exception manager 192 receives from the pipeline accelerator 14.
[116] And, as discussed further below in conjunction with FIGS. 11 -
14, the configuration manager 194 downloads the firmware files 180 from the library 108 (FIGS. 4 and 9) into the accelerator firmware memory or memories 22 (FIGS. 1 - 3) during initialization of the peer-vector machine 10 (FIG. 1), and may also reconfigure the pipeline accelerator 14 (FIG. 1) after the initialization in response to, e.g., a malfunction of the peer-vector machine. The configuration manager 194 may perform additional functions as described below in conjunction with FIGS. 11 - 14. [117] The processing-unit applications 190, 192, and 194 may communicate with each other directly as indicated by the dashed lines 196, 198, and 200, or may communicate with each other via the data-transfer objects 202, which are described below. Furthermore, the processing-unit applications 190, 192, and 194 communicate with the pipeline accelerator 14 (FIG. 1) via the data-transfer objects 202.
[118] The message handler 34 executes the data-transfer objects
202, a communication object 204, and input and output reader objects 206 and 208, and may also execute input- and output-queue objects 210 and 212. The data-transfer objects 202 transfer data between the communication object 204 and the processing-unit applications 190, 192, and 194, and may use the interface memory 18 as one or more data buffers to allow the processing-unit applications and the pipeline accelerator 14 (FIG. 1) to operate independently. For example, the memory 18 allows the accelerator 14, which is often faster than the data-processing application 190, to operate without "waiting" for the data-processing application. The communication object 204 transfers data between the data-transfer objects 202 and the pipeline bus 20. The input- and output-reader objects 206 and 208 control the data-transfer objects 202 as they transfer data between the communication object 204 and the processing-unit applications 190, 192, and 194. And, when executed, the input- and output-queue objects 210 and 212 cause the input- and output-reader objects 206 and 208 to synchronize this transfer of data according to a desired priority
[119] Furthermore, during initialization of the peer-vector machine 10
(FIG. 1), the message handler 34 instantiates and executes an object factory 214, which instantiates the data-transfer objects 202 from configuration data stored in the message-configuration registry 42 (FIG. 1). The message handler 34 also instantiates the communication object 204, the input- and output-reader objects 206 and 208, and the input- and output-queue objects 210 and 212 from the configuration data stored in the message-configuration registry 42. Consequently, one can design and modify the objects 202- 212, and thus their data-transfer parameters, by merely designing or modifying the configuration data stored in the registry 42. This is typically less time consuming than designing or modifying each software object individually. [120] The structure and operation of the processing unit 32 and the message handler 34 are further described in previously incorporated U.S. Patent Publication No. 2004/0181621.
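As an illustration of this registry-driven instantiation, the sketch below builds data-transfer objects from a list of configuration entries; the class names, fields, and registry layout are assumptions for this sketch rather than the actual implementation.

```python
# Minimal sketch of an object factory that builds data-transfer objects from
# configuration data, in the spirit of the object factory 214 and the
# message-configuration registry 42. All class and field names are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DataTransferObject:
    name: str
    source: str        # e.g. "PLIC_60_1" or "buffer_230_1"
    destination: str   # e.g. "buffer_230_1" or "PLIC_60_5"
    word_size_bits: int

class ObjectFactory:
    def __init__(self, registry_entries):
        # registry_entries stands in for the configuration data in registry 42
        self._entries = registry_entries

    def instantiate_all(self):
        return [DataTransferObject(**entry) for entry in self._entries]

registry = [
    {"name": "202_1", "source": "PLIC_60_1", "destination": "buffer_230_1",
     "word_size_bits": 64},
    {"name": "202_2", "source": "buffer_230_1", "destination": "buffer_230_2",
     "word_size_bits": 64},
]

objects = ObjectFactory(registry).instantiate_all()
print([o.name for o in objects])
```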
[121] The operation of the exception manager 192 and the configuration manager 194 is further discussed below in conjunction with FIGS. 11 - 14 according to an embodiment of the invention.
[122] FIG. 11 is a block diagram of a circuit 220 that is designed for instantiation on the pipeline accelerator 14 (FIG. 1) according to an embodiment of the invention. The clock signals, power signals, and other signals are omitted from FIG. 11 for clarity. [123] During operation, the circuit 220 generates, in a pipelined fashion, a stream of output values y from streams of input values x and z, which are related by the following equation:
(1)    y = √(ax⁴cos(z) + bz³sin(x))
where each x and z is a sixty-four-bit floating-point number, each y is a sixty-four-bit floating-point number, and a and b are respective sixty-four-bit floating-point constant coefficients. Therefore, the circuit 220 is designed for instantiation on a pipeline accelerator 14 (FIG. 1) that supports a platform that specifies sixty-four-bit data transfers and busses. [124] As initially designed, the circuit 220 includes eight hardwired pipelines 44₁ - 44₈ (pipelines 44₅ and 44₆ are the same) and eight hardware-interface layers 62₁ - 62₈ respectively instantiated on eight PLICs 60₁ - 60₈. The pipeline 44₁ on the PLIC 60₁ receives the stream of input values x and generates a stream of values sin(x). Similarly, the pipeline 44₂ on the PLIC 60₂ receives the stream of input values z and generates a stream of values bz³, the pipeline 44₃ on the PLIC 60₃ receives the stream x and generates a stream ax⁴, and the pipeline 44₄ on the PLIC 60₄ receives the stream z and generates a stream cos(z). Furthermore, the pipeline 44₅ on the PLIC 60₅ receives from the PLICs 60₁ and 60₂ the streams sin(x) and bz³ and generates a stream of values bz³sin(x), and the pipeline 44₆ on the PLIC 60₆ receives from the PLICs 60₃ and 60₄ the streams ax⁴ and cos(z) and generates a stream ax⁴cos(z). In addition, the pipeline 44₇ on the PLIC 60₇ receives from the PLICs 60₅ and 60₆ the streams bz³sin(x) and ax⁴cos(z) and generates a stream bz³sin(x) + ax⁴cos(z). Finally, the pipeline 44₈ on the PLIC 60₈ receives from the PLIC 60₇ the stream bz³sin(x) + ax⁴cos(z) and generates a stream y = √(ax⁴cos(z) + bz³sin(x)) per equation (1).
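The following is a minimal software model of this partition of equation (1); the Python generators only illustrate how the eight pipelines divide the computation, and the constant values chosen for a and b are arbitrary examples.

```python
# Software model of the eight-stage partition of equation (1) shown in FIG. 11.
# Each function stands in for one hardwired pipeline; the generator-based
# streaming is only an illustration of the pipelined data flow.
import math

A, B = 2.0, 3.0  # example values for the constants a and b

def stage_sin(xs):     return (math.sin(x) for x in xs)         # pipeline 44_1
def stage_bz3(zs):     return (B * z**3 for z in zs)            # pipeline 44_2
def stage_ax4(xs):     return (A * x**4 for x in xs)            # pipeline 44_3
def stage_cos(zs):     return (math.cos(z) for z in zs)         # pipeline 44_4
def stage_mul(us, vs): return (u * v for u, v in zip(us, vs))   # pipelines 44_5, 44_6
def stage_add(us, vs): return (u + v for u, v in zip(us, vs))   # pipeline 44_7
def stage_sqrt(us):    return (math.sqrt(u) for u in us)        # pipeline 44_8

def circuit_220(xs, zs):
    xs, zs = list(xs), list(zs)
    left  = stage_mul(stage_bz3(zs), stage_sin(xs))   # b*z^3*sin(x)
    right = stage_mul(stage_ax4(xs), stage_cos(zs))   # a*x^4*cos(z)
    return stage_sqrt(stage_add(right, left))

xs, zs = [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]
for x, z, y in zip(xs, zs, circuit_220(xs, zs)):
    assert abs(y - math.sqrt(A*x**4*math.cos(z) + B*z**3*math.sin(x))) < 1e-12
    print(y)
```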
[125] FIG. 12 is a block diagram of the data paths between the PLICs 6O1 - 6O8 of FIG. 11 according to an embodiment of the invention, and is further described below.
[126] FIG. 13 is a block diagram of the circuit 220 modified for instantiation on seven PLICs 60 instead of eight PLICs (as shown in FIG. 11) according to an embodiment of the invention, and is further described below.
[127] FIG. 14 is a block diagram of the data paths between the
PLICs 60₁ and 60₃ - 60₈ of FIG. 11 and a software-application thread that effectively replaces the pipeline 44₂ of FIG. 11 according to an embodiment of the invention, and is further described below.
Operation of the Configuration Manager 194 (FIG. 10) During Initialization Of The Peer-Vector Machine 10 (FIG. 1)
[128] The operation of the configuration manager 194 during the initialization of the peer-vector machine 10 (FIG. 1) is discussed in conjunction with FIGS. 10 - 14 according to embodiments of the invention. Although a number of detailed operational examples are provided below, the following is a general overview of the configuration manager 194 and some of the advantages that it may provide.
[129] The configuration manager 194 initializes the peer-vector machine 10 (FIG. 1) when the machine is "turned on," restarted, or is otherwise reset.
[130] At the beginning of the initialization, the configuration manager
194 determines the desired configuration of the pipeline accelerator 14 (FIG. 1) from the configuration data 100 (FIG. 4) within the accelerator/host-processor-configuration registry 40 (FIGS. 1 and 4), and also determines the physical composition (e.g., the number of pipeline units 50 (FIGS. 2 - 3) and the platform(s) that they support) of the pipeline accelerator.
[131] Therefore, because the configuration manager 194 configures the pipeline accelerator 14 (FIG. 1) in response to the configuration data 100 (FIG. 4), one can typically change the accelerator configuration merely by "turning off" the peer-vector machine 10 (FIG. 1), changing the configuration data, and then restarting the machine.
[132] Furthermore, by determining the composition of the pipeline accelerator 14 (FIG. 1) at the beginning of each initialization, the configuration manager 194 can detect changes to the accelerator (e.g., the removal or addition of a pipeline unit 50 (FIGS. 2 - 3)), and can often "fit" the circuit(s) specified by the configuration data 100 (FIG. 4) into the altered accelerator. That is, the configuration manager 194 can often detect a physical change to the accelerator 14 and modify the specified circuit instantiation(s) accordingly so that the circuit(s) can fit onto the modified accelerator and process data as desired despite the change.
Example 1
[133] Referring to FIGS. 10 - 12, in this example, the configuration data 100 (FIG. 4) points to a circuit-definition file 170 (FIGS. 4 and 8) that defines the circuit 220 of FIG. 11, and instructs the configuration manager 194 to instantiate the circuit 220 on the pipeline accelerator 14 (FIG. 1) according to this circuit-definition file.
[134] At the beginning of the initialization of the peer-vector machine 10 (FIG. 1), the configuration manager 194 reads the configuration data 100, determines from the configuration data the desired configuration of the pipeline accelerator 14 (FIG. 1), and also determines the physical composition of the pipeline accelerator. Regarding the former determination, the configuration manager 194 first determines that it is to read the circuit-definition file 170 pointed to by the configuration data 100. Next, the configuration manager 194 reads the file 170, and determines that the manager is to instantiate on each of the eight PLICs 6O1 - 6O8 a respective pipeline 44-, - 448 (pipelines 445 and 446 are the same) and hardware-interface layer 621 - 628. Regarding the determination of the composition of the pipeline accelerator 14, the pipeline bus 20 (FIG. 1) may include slots for receiving pipeline units 50 (FIGS. 2 - 3), and the configuration manager 194 may, for each slot, read a conventional indicator associated with the slot, or use another technique, for determining whether or not a pipeline unit is inserted into the slot. [135] Next, the configuration manager 194 determines whether the configuration indicated by the configuration data 100 (FIG. 4) is compatible with the physical composition of the pipeline accelerator 14 (FIG. 1). Specifically, in this example, the configuration manager 194 determines whether the accelerator 14 includes eight pipeline units 50 each having a respective one of the PLICs 6O1 - 6O8 on which the configuration manager can instantiate the pipeline units 44ή - 448 and the hardware-interface layers 62ή - 628.
[136] In this example, the configuration manager 194 determines that the desired configuration of the pipeline accelerator 14 (FIG. 1) is compatible with the physical composition of the accelerator.
[137] Consequently, the configuration manager 194 next determines whether the pipeline accelerator 14 (FIG. 1) supports the platform(s) that the circuit-definition file 170 specifies as being compatible with the circuit 220. More specifically, the configuration manager 194 reads from the file 170 the specified platform(s), and reads from the platform-identifier memory 54 (FIGS. 2 - 3) on each pipeline unit 50 (FIGS. 2 - 3) the identity/identities of the platform(s) that the pipeline units support. Then, the configuration manager 194 compares the specified platform(s) from the file 170 to the identified platform(s) from the memories 54. If at least one platform from the file 170 matches at least one platform from the memories 54, then the configuration manager 194 determines that the platform(s) supported by the pipeline accelerator 14 is/are compatible with the platform(s) specified by the file 170. In this example, the file 170 indicates that the circuit 220 is compatible with platform 1 (FIG. 6), and the platform-identifier memory 54 on each pipeline unit 50 indicates that the respective pipeline unit is compatible with this platform; consequently, the configuration manager 194 determines that the pipeline accelerator 14 is compatible with the platform (i.e., platform 1) specified by the circuit-definition file 170.
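A minimal sketch of this platform check is shown below; the function and data names are assumptions, and the rule that every installed pipeline unit must share at least one platform with the circuit-definition file is a simplifying reading of the comparison described above.

```python
# Minimal sketch of the platform-compatibility test: the platform(s) named in
# a circuit-definition file are compared against the platform identifiers read
# from the platform-identifier memories 54 of the installed pipeline units.
# Function and variable names are illustrative assumptions.

def platforms_compatible(specified_platforms, pipeline_unit_platforms):
    """Return True if at least one specified platform is supported by
    every installed pipeline unit (a conservative reading of the test)."""
    return all(
        set(specified_platforms) & set(unit_platforms)
        for unit_platforms in pipeline_unit_platforms
    )

# Circuit 220 is specified as compatible with platform 1; each of the eight
# pipeline units reports (via its memory 54) the platforms it supports.
specified = ["platform-1"]
installed = [["platform-1"], ["platform-1", "platform-2"]] * 4  # eight units
print(platforms_compatible(specified, installed))   # True in Example 1
```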
[138] In this example, the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) supports the platform(s) that the circuit-definition file 170 (FIG. 8) specifies.
[139] Therefore, the configuration manager 194 next determines whether the firmware library 108 (FIGS. 4 and 9) includes firmware files 180 that, when downloaded by the PLICs 60₁ - 60₈, will respectively instantiate on these PLICs the pipelines 44₁ - 44₈ and the hardware-interface layers 62₁ - 62₈. The configuration manager 194 makes this determination by reading the firmware descriptions 182 in the library 108. For example, if the description 182₁ indicates that the corresponding firmware file 180₁ will instantiate the pipeline 44₁ within a hardware-interface layer (i.e., the hardware-interface layer 62₁) that is compatible with platform 1, then the configuration manager 194 matches the firmware file 180₁ to the PLIC 60₁ in the circuit 220.
[140] In this example, the configuration manager 194 determines that the library 108 (FIGS. 4 and 9) contains firmware files 180₁ - 180₇ for the PLICs 60₁ - 60₈ (the firmware file 180₅ is for both the PLICs 60₅ and 60₆ because the pipelines 44₅ and 44₆ are the same).
[141] Consequently, the configuration manager 194 next downloads these firmware files 180₁ - 180₇ (FIG. 9) to the PLICs 60₁ - 60₈ (FIG. 11) via the pipeline bus 20. Techniques for downloading these firmware files are described in previously incorporated U.S. Patent Publication No. 2004/0170070.
[142] Then, the configuration manager 194 determines the topology that the circuit-definition file 170 (FIG. 8) specifies for interconnecting the PLICs 60₁ - 60₈ of the circuit 220 (FIG. 11). [143] In this example, the circuit-definition file 170 (FIG. 8) specifies that the PLICs 60₁ - 60₈ (FIG. 11) are to be interconnected via the host processor 12 (FIG. 1) as shown in FIG. 12.
[144] Therefore, referring to FIG. 12, the configuration manager 194 instantiates in the interface memory 18 buffers 230₁ - 230₁₅, and instantiates in the message handler 34 data-transfer objects 202₁ - 202₂₃.
[145] Referring to FIG. 11, the PLIC 60₁ needs a path on which to provide the stream of values sin(x) to the corresponding input pin of the PLIC 60₅. The configuration manager 194 forms this path by instantiating in the interface memory 18 the buffers 230₁ and 230₂, and by instantiating in the message handler 34 the data-transfer objects 202₁, 202₂, and 202₃. In operation, the PLIC 60₁ provides the stream of values sin(x) to the data-transfer object 202₁ via the pipeline bus 20 and communication object 204, and the data-transfer object 202₁ sequentially loads these values into the buffer 230₁. Then, the data-transfer object 202₂ sequentially transfers the values sin(x) from the buffer 230₁ to the buffer 230₂ in first-in-first-out fashion, and the data-transfer object 202₃ transfers the values sin(x) from the buffer 230₂ in first-in-first-out fashion to the corresponding input pin of the PLIC 60₅ via the communication object 204 and the pipeline bus 20. The configuration manager 194 forms the remaining paths interconnecting the PLICs in a similar manner. Therefore, in operation the PLIC 60₂ transfers the values bz³ to the corresponding input pin of the PLIC 60₅ via the data-transfer objects 202₄ - 202₆ and the buffers 230₃ - 230₄, the PLIC 60₃ transfers the values ax⁴ to the corresponding input pin of the PLIC 60₆ via the data-transfer objects 202₇ - 202₉ and the buffers 230₅ - 230₆, and so on. Finally, the PLIC 60₈ provides the values y = √(ax⁴cos(z) + bz³sin(x)) to the data-processing application 190 via the data-transfer objects 202₂₂ and 202₂₃ and the buffer 230₁₅. Furthermore, the PLICs 60₁ - 60₄ may receive the values x and z via the raw-data input port 24, or from the data-processing application 190 via respective buffers 230 and data-transfer objects 202 (omitted from FIG. 12 for brevity) that the configuration manager 194 instantiates in response to the circuit-definition file 170. For example, the data-processing application 190 may provide the values x to a first data-transfer object 202 (not shown), which loads the values x into a buffer 230 (not shown). Then, a second data-transfer object 202 (not shown) unloads the values x from the buffer 230 and provides these values to the corresponding input pins of the PLICs 60₁ and 60₃ via the communication object 204 and the pipeline bus 20.
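The sketch below models one such path (PLIC 60₁ to PLIC 60₅) with simple FIFO queues; the Buffer and DataTransferObject classes are illustrative assumptions for this sketch, not the patented implementation.

```python
# Minimal sketch of one data path of FIG. 12: a producer PLIC's output stream
# is moved through two interface-memory buffers by three data-transfer
# objects, in first-in-first-out order, to a consumer PLIC's input.
from collections import deque
import math

class Buffer:                       # stands in for a buffer 230 in memory 18
    def __init__(self):
        self._fifo = deque()
    def push(self, value):
        self._fifo.append(value)
    def pop(self):
        return self._fifo.popleft()

class DataTransferObject:           # stands in for a data-transfer object 202
    def __init__(self, unload, load):
        self._unload, self._load = unload, load
    def pump(self):
        self._load(self._unload())  # move one value along the path

# Path: PLIC 60_1 -> 202_1 -> buffer 230_1 -> 202_2 -> buffer 230_2 -> 202_3 -> PLIC 60_5
plic_60_1_output = deque(math.sin(x) for x in (0.1, 0.2, 0.3))
plic_60_5_input = deque()
buf_230_1, buf_230_2 = Buffer(), Buffer()

obj_202_1 = DataTransferObject(plic_60_1_output.popleft, buf_230_1.push)
obj_202_2 = DataTransferObject(buf_230_1.pop, buf_230_2.push)
obj_202_3 = DataTransferObject(buf_230_2.pop, plic_60_5_input.append)

for _ in range(3):                  # move three sin(x) values along the path
    obj_202_1.pump(); obj_202_2.pump(); obj_202_3.pump()
print(list(plic_60_5_input))
```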
[146] After instantiating the data-transfer objects 202₁ - 202₂₃ and the buffers 230₁ - 230₁₅ (and possibly the data-transfer objects and buffers described in the preceding paragraph), the configuration manager 194 sends to the configuration managers 88 (FIG. 3) on each of the PLICs 60₁ - 60₈ any soft-configuration data specified by the circuit-definition file 170. For example, the configuration manager 194 may send to the configuration managers 88 on the PLICs 60₂ and 60₃ soft-configuration data that sets the values of the constants a and b. Or, the configuration manager 194 may send to the configuration managers 88 on the PLICs 60₁ and 60₄ soft-configuration data that causes the respective exception managers 86 on these PLICs to indicate exceptions for values of sin(x) and cos(z) outside of the ranges -1 ≤ sin(x) ≤ 1 and -1 ≤ cos(z) ≤ 1, respectively. In one embodiment, the configuration manager 194 sends this soft-configuration data to the PLICs 60₁ - 60₈ via one or more data-transfer objects 202 that the configuration manager has instantiated for this purpose.
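As an illustration, this soft-configuration step might be modeled as a handful of small records sent to the on-PLIC configuration managers, as in the sketch below; the message layout and field names are assumptions.

```python
# Minimal sketch of soft configuration: the host-side configuration manager
# sends small data records that set the constants a and b and the allowed
# output ranges, without re-downloading any firmware.
soft_config_messages = [
    {"plic": "60_2", "set": {"b": 3.0}},
    {"plic": "60_3", "set": {"a": 2.0}},
    {"plic": "60_1", "exception_range": {"min": -1.0, "max": 1.0}},  # sin(x)
    {"plic": "60_4", "exception_range": {"min": -1.0, "max": 1.0}},  # cos(z)
]

def send_soft_configuration(messages, send):
    for msg in messages:
        send(msg["plic"], msg)     # e.g. via a data-transfer object 202

send_soft_configuration(soft_config_messages,
                        send=lambda plic, msg: print(f"to PLIC {plic}: {msg}"))
```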
[147] After soft configuring the PLICs 60₁ - 60₈ and configuring any remaining portions of the pipeline accelerator 14 (FIG. 1), the interface memory 18, and the message handler 34, the configuration manager 194 exits the initialization mode and relinquishes control of the peer-vector machine 10 (FIG. 1) back to the host processor 12, which enters an operational mode where the PLICs 60₁ - 60₈ cooperate as described above to generate a stream of output values y = √(ax⁴cos(z) + bz³sin(x)) in a pipelined fashion.
Example 2
[148] Referring to FIGS. 10 - 13, this example is similar to Example
1 , except that the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) includes fewer than eight PLICs 60, and thus arranges the circuit 220 to "fit" onto the available PLICs.
[149] More specifically, the configuration manager 194 determines that the pipeline accelerator 14 includes only seven PLICs 60₁ - 60₅ and 60₇ - 60₈. [150] The configuration manager 194 sends this information to a circuit-design tool such as the circuit-design tool described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket
No. 1934-023-03). The tool may be executed by the host processor 12 (FIG. 1), and the configuration manager 194 may communicate with the tool via one or more data-transfer objects 202.
[151] In a first embodiment, the circuit-design tool determines that the circuit 220 cannot fit onto the pipeline accelerator 14 (FIG. 1), and notifies the configuration manager 194, which generates an appropriate error message. The host processor 12 (FIG. 1) may display this message via a display or by another conventional technique. In response to this message, an operator (not shown) can install into the peer-vector machine 10 (FIG. 1) an additional pipeline unit 50 (FIG. 2) that includes the PLIC 6O6 so that the configuration manager 194 can then instantiate the circuit 220 on the pipeline accelerator 14 as described above in Example 1.
[152] Referring to FIGS. 6 and 10 - 13, in a second embodiment, the circuit-design tool (not shown) accesses the library 102 and discovers a template 134₉ for a dual-multiplication pipeline 44₉ (two multipliers in a single pipeline), and determines from the corresponding hardwired-pipeline-template description 156₉ that this pipeline (along with a corresponding hardware-interface layer 62₉) can fit into the PLIC 60₅ and can give the circuit 220 the desired operating parameters (as included in the circuit-definition file 170 that defines the circuit 220). Then, using this dual-multiplication pipeline 44₉, the tool redesigns the circuit 220 as shown in FIG. 13 for instantiation on seven PLICs 60₁ - 60₅ and 60₇ - 60₈, generates a circuit-definition file 170 corresponding to the redesigned circuit 220, and stores this circuit-definition file in the library 106 (FIGS. 4 and 8). The configuration manager 194 then instantiates the redesigned circuit 220 in a manner similar to that discussed above in conjunction with Example 1. If, however, the firmware library 108 includes no firmware file 180 for instantiating the dual-multiplier pipeline 44₉ on the PLIC 60₅, then the circuit-design tool or the configuration manager 194 may notify an operator (not shown), who manually generates this firmware file and loads it into the firmware library. Alternatively, the circuit-design tool or the configuration manager 194 may cause a PLIC synthesizing and routing tool (not shown) to generate this firmware file from the appropriate templates in the accelerator-template library 102 (FIGS. 4 and 6). Once this firmware file is generated and stored in the library 108, the configuration manager 194 proceeds to instantiate the redesigned circuit 220 of FIG. 13 in a manner similar to that discussed above in conjunction with Example 1.
[153] Alternate embodiments of Example 2 are contemplated. For example, although Example 2 describes placing two multipliers on a single PLIC 60₅, the configuration manager 194 and/or the circuit-design tool (not shown) may fit the functions of multiple ones of the other pipelines 44₁ - 44₈ of the circuit 220 on a single PLIC, including placing on a single PLIC a single pipeline that generates y in equation (1). Moreover, the circuit-design tool (not shown) may instantiate multiple interconnected ones of the pipelines 44₁ - 44₈ (FIG. 11) on a single PLIC instead of searching for existing pipelines that each perform multiple ones of the functions performed by the pipelines 44₁ - 44₈.
Example 3
[154] Referring to FIGS. 10 - 14, this example is similar to Example
2, except that the configuration manager 194 effectively replaces a hardwired pipeline 44 with a software object 160 (FIG. 7) from the software-object library 104 (FIGS. 4 and 7).
[155] More specifically, the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) includes only seven PLICs 60₁ - 60₆ and 60₈. [156] The configuration manager 194 next reads the software-object descriptions 162 (FIG. 7) and determines that the software object 160₁ can sum two values such as bz³sin(x) and ax⁴cos(z).
[157] Consequently, referring to FIGS. 11 and 14, the configuration manager 194 instantiates the software object 160₁ (FIG. 7) as part of a data-processing application thread 240 that, after the instantiation of the remaining portions of the circuit 220 on the pipeline accelerator 14 (FIG. 1), receives bz³sin(x) and ax⁴cos(z) from the PLICs 60₅ and 60₆, respectively, sums corresponding values from these two streams, and then provides bz³sin(x) + ax⁴cos(z) to the PLIC 60₈. More specifically, the thread 240 receives bz³sin(x) from the PLIC 60₅ via the pipeline bus 20, communication object 204, data-transfer object 202₂₄, buffer 230₁₅, and data-transfer object 202₂₅. Similarly, the thread 240 receives ax⁴cos(z) from the PLIC 60₆ via the pipeline bus 20, communication object 204, data-transfer object 202₂₆, buffer 230₁₆, and data-transfer object 202₂₇. And the thread provides bz³sin(x) + ax⁴cos(z) to the PLIC 60₈ via the data-transfer object 202₂₈, buffer 230₁₇, data-transfer object 202₂₉, communication object 204, and pipeline bus 20. The configuration manager 194 instantiates these data-transfer objects and buffers as described above in conjunction with Example 1. Furthermore, the operation and instantiation of application threads such as the thread 240 are described in previously incorporated U.S. Patent Publication No. 2004/0181621.
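A minimal sketch of such a thread is shown below; the queue plumbing stands in for the data-transfer objects and buffers, and the names are assumptions.

```python
# Minimal sketch of a data-processing application thread, in the spirit of
# thread 240: it consumes the bz^3*sin(x) and ax^4*cos(z) streams from two
# input queues, sums corresponding values, and forwards the result toward the
# square-root PLIC.
import queue, threading

def adder_thread(in_left, in_right, out, n_values):
    for _ in range(n_values):
        left = in_left.get()        # next bz^3*sin(x) value
        right = in_right.get()      # next ax^4*cos(z) value
        out.put(left + right)       # forwarded to the PLIC that computes sqrt

in_left, in_right, out = queue.Queue(), queue.Queue(), queue.Queue()
t = threading.Thread(target=adder_thread, args=(in_left, in_right, out, 3))
t.start()
for left, right in [(0.5, 1.0), (0.25, 2.0), (0.0, 4.0)]:
    in_left.put(left); in_right.put(right)
t.join()
print([out.get() for _ in range(3)])   # [1.5, 2.25, 4.0]
```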
[158] Next, the configuration manager 194 proceeds to instantiate the remaining portions of the circuit 220 on the pipeline accelerator 14 (FIG. 1) in a manner similar to that discussed above in conjunction with Example 1.
[159] Although Example 3 describes replacing a single pipeline 446 with a data-processing application thread 240 that executes a single corresponding software object 160 (FIG. 7), the configuration manager 194 may replace any number of the pipelines 441 - 448 in the circuit 220 (FIG. 11) with one or more threads that execute corresponding software objects. Moreover, the configuration manager 194 may combine the concepts described in conjunction with Examples 2 and 3 by fitting multiple pipelines 44 or multiple pipeline functions on each of one or more PLICs, and replacing other pipelines 44 with one or more data-processing application threads that execute corresponding software objects 160.
Example 4
[160] Referring to FIGS. 10 - 12 and 14, this example is similar to
Example 1 , except that the configuration manager 194 determines that the pipeline accelerator 14 (FIG. 1) does not support the platform(s) that circuit-definition file 170 (FIG. 8) specifies as being compatible with the circuit 220.
[161] In a first embodiment, the configuration manager 194 generates an error message, and, in response, an operator (not shown) replaces the pipeline units 50 (FIG. 2) that do not support the specified platform(s) with pipeline units that do support the specified platform(s).
[162] In a second embodiment, the configuration manager 194 instantiates a circuit that performs the same function as the circuit 220 (i.e., generates y in equation (1)) by downloading into the available PLICs firmware files 180 (FIG. 9) that instantiate the hardwired pipelines 44₁ - 44₈ with respective hardware-interface layers 62 that are compatible with the platform(s) supported by the pipeline accelerator 14 (FIG. 1). If the library 108 (FIGS. 4 and 9) does not contain such firmware files 180, then the configuration manager 194 and/or a circuit-design tool such as that described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket No. 1934-023-03) may generate these firmware files from the templates in the library 102 (FIGS. 4 and 6) as discussed above in conjunction with Example 2. [163] In a third embodiment, the configuration manager 194 instantiates the function of the circuit 220 (i.e., generates y in equation (1)) in one or more data-processing application threads 240 as discussed above in conjunction with Example 3. [164] In a fourth embodiment, the configuration manager 194 instantiates a portion of the circuit 220 on the pipeline accelerator 14 per the above-described second embodiment of Example 4, and effectively instantiates the remaining portion of the circuit 220 in one or more data-processing application threads per the preceding paragraph.
Example 5
[165] Referring to FIGS. 10 - 12, and 14, this example is similar to
Example 1, except that the configuration manager 194 determines that the library 108 (FIGS. 4 and 9) lacks at least one of the firmware files 180₁ - 180₇ for the PLICs 60₁ - 60₈ (the firmware file 180₅ corresponds to both the PLICs 60₅ and 60₆).
[166] In a first embodiment, the configuration manager 194 generates an error message, and, in response, an operator loads the missing firmware file(s) 180 (FIG. 9) into the library 108 (FIGS. 4 and 9) so that the configuration manager can proceed with instantiating the circuit 220 on the PLICs 60₁ - 60₈ as discussed above in conjunction with Example 1.
[167] In a second embodiment, the configuration manager 194 and/or a circuit-design tool such as that described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket
No. 1934-023-03) generates these firmware files from the templates (FIG. 6) in the library 102 (FIGS. 4 and 6) as discussed above in conjunction with Example 2. Then, the configuration manager 194 loads these generated firmware files 180 into the library 108, and instantiates the circuit 220 on the PLICs 60₁ - 60₈ as discussed above in conjunction with Example 1. [168] In a third embodiment, the configuration manager 194 instantiates the function of a pipeline 44 corresponding to a missing firmware file 180 in a data-processing application thread 240 as discussed above in conjunction with Example 3. [169] In a fourth embodiment, the configuration manager 194 instantiates on the pipeline accelerator 14 (FIG. 1) a portion of the circuit 220 per the above-described second embodiment of Example 5, and effectively instantiates the remaining portion of the circuit 220 in one or more data-processing application threads 240 per the preceding paragraph.
Example 6
[170] Referring to FIGS. 10 - 11, this example is similar to Example
1, except that the circuit-definition file 170 (FIG. 8) that defines the circuit 220 specifies that the PLICs 60₁ - 60₈ are to be "directly" interconnected via the pipeline bus 20 (FIG. 1). That is, the PLIC 60₁ provides the stream of values sin(x) to the PLIC 60₅ without going through the message handler 34 and memory 18 as shown in FIG. 12.
[171] In a first embodiment, the corresponding firmware files 180₁ - 180₇ (file 180₅ is used twice) instantiate the communication interfaces 80 (FIG. 3) of the PLICs 60₁ - 60₈ to generate and send message objects (not shown) that identify the recipient PLIC and to recognize and receive messages from specified sender PLICs. Such message objects are described in previously incorporated U.S. Patent Publication No. 2004/0181621. In summary, these message objects each include an address header that identifies the destination PLIC or PLICs. For example, the communication interface 80 (FIG. 3) of the PLIC 60₁ generates message objects that carry values sin(x) to the PLIC 60₅. These message objects each include an address header that includes the address of the PLIC 60₅. Therefore, when the communication interface 80 of the PLIC 60₅ detects on the pipeline bus 20 (FIG. 1) a message object having this address, the interface uploads this message object from the bus. The remaining PLICs 60₂ - 60₈ receive and generate message objects in a similar manner.
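The sketch below models this address-header filtering on a shared bus; the classes and field names are assumptions for this sketch, not the message format of the incorporated publication.

```python
# Minimal sketch of direct PLIC-to-PLIC messaging over a shared bus: each
# message object carries an address header naming the destination PLIC, and a
# communication interface uploads only the messages addressed to it.
from dataclasses import dataclass

@dataclass
class MessageObject:
    destination: str     # address header, e.g. "PLIC_60_5"
    payload: float       # e.g. one sin(x) value

class CommunicationInterface:
    def __init__(self, address):
        self.address = address
        self.received = []
    def snoop(self, message):
        # Upload the message only if the header matches this PLIC's address.
        if message.destination == self.address:
            self.received.append(message.payload)

bus = [MessageObject("PLIC_60_5", 0.0998), MessageObject("PLIC_60_6", 1.9968)]
iface_60_5 = CommunicationInterface("PLIC_60_5")
iface_60_6 = CommunicationInterface("PLIC_60_6")
for msg in bus:
    iface_60_5.snoop(msg)
    iface_60_6.snoop(msg)
print(iface_60_5.received, iface_60_6.received)   # [0.0998] [1.9968]
```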
[172] In a second embodiment, the configuration manager 194 soft configures the communication interfaces 80 (FIG. 3) of the PLICs 60₁ - 60₈ to receive and generate message objects per the preceding paragraph by sending appropriate soft-configuration data to the configuration managers 88 (FIG. 3) of the PLICs as discussed above in conjunction with Example 1.
[173] Referring to FIGS. 4 - 14, other embodiments of the peer-vector machine 10 (FIG. 1) are contemplated. For example, instead of pointing to a circuit-definition file 170 in the circuit-definition library 106, the configuration data 100 may include meta-data that describes an algorithm, such as the algorithm represented by equation (1), and the configuration manager 194 may cause the peer-vector machine 10 to implement the algorithm based on this meta-data. More specifically, the configuration manager 194 may first determine the attributes of the peer-vector machine 10 as previously described. Next, based on the meta-data and the determined attributes of the peer-vector machine 10, the configuration manager 194 may define an implementation of the algorithm that is compatible with the platform(s) supported by, and the components present within, the peer-vector machine. The configuration manager 194 may define the implementation using one or more templates from the library 102, one or more software objects 160 from the library 104, one or more circuit-definition files from the library 106, and one or more firmware files 180 from the library 108, or using any combination of these items. Then, the configuration manager 194 may instantiate the implementation on the peer-vector machine 10 using any technique described above, any other technique(s), or any combination of these described/other techniques. Of course the configuration data 100 may include both meta-data that describes an algorithm and a pointer to a circuit-definition file 170 that defines a circuit for implementing the algorithm. If for some reason the circuit defined by the file 170 is incompatible with the peer-vector machine 10, then the configuration manager 194 may define an implementation of the algorithm per above. Moreover, one may write such meta-data manually, or use a tool, such as that described in previously incorporated U.S. Patent App. Ser. No. (Attorney Docket Nos. 1934-23-3 and 1934-35-3), to generate the meta-data.
Reconfiguration of the Peer-Vector Machine 10 of FIG. 1 by the Configuration Manager 194 While The Peer-Vector Machine Is Operating (Dynamic Reconfiguration)
[174] Dynamic reconfiguration of the peer-vector machine 10 (FIG. 1) by the configuration manager 194 (FIG. 10) is discussed below in conjunction with FIGS. 10 - 12 and 14 - 16 according to embodiments of the invention. Although a number of detailed examples are provided below, the following is a general overview of dynamic reconfiguration and some of the advantages that it provides.
[175] Conventional fault-tolerant computing machines (not shown) often have built-in redundancy such that if one portion of the machine fails during operation, another, redundant, portion can take over for the failed part. For example, if a processor fails, then a redundant processor can take over for the failed processor. Typically, if one wants to add redundancy for a component of the machine, then he adds to the machine a like redundant component. For example, if one wants to add redundancy to a bank of processors, then he typically adds to the machine at least one redundant processor. The same is true for other components such as hard drives. [176] The configuration manager 194 can render the peer-vector machine 10 (FIG. 1) fault tolerant in a manner that is often more flexible and less costly than a conventional redundancy scheme. For example, as discussed below, the configuration manager 194 may transfer a function previously performed by a failed pipeline unit 50 (FIG. 2) of the pipeline accelerator 14 (FIG. 1) to the host processor 12 (FIG. 1), and vice versa. That is, the host processor 12 may provide redundancy to the accelerator 14, and vice versa. Consequently, instead of adding redundant processing units to the host processor 12, it may be less expensive and less complex from a design perspective to add extra pipeline units 50 to the accelerator 14, where the configuration manager 194 can use these extra units to provide redundancy to both the host processor 12 and the accelerator. Or, instead of adding extra pipeline units 50 to the accelerator 14, it may be less expensive and less complex from a design perspective to add extra processing units 32 to the host processor 12, where the configuration manager 194 can use these extra processing units to provide redundancy to both the host processor and the accelerator. Of course the peer-vector machine 10 may include both extra processing units 32, and pipeline units 50 and may also include extras of other components of the host-processor 12 and the accelerator 14. Example 7
[177] The PLIC 60₁ (FIG. 11), or another portion of the pipeline unit 50₁ on which the PLIC 60₁ is disposed, experiences a "soft" failure while the peer-vector machine 10 (FIG. 1) is operating, and the circuit 220 is executing equation (1). Examples of a soft failure include, e.g., corrupted configuration firmware or soft-configuration data stored in the PLIC 60₁, a buffer overflow, and a value sin(x) that is generated by the pipeline 44₁ on the PLIC 60₁ but that is out of the predetermined range -1 ≤ sin(x) ≤ 1.
[178] First, the accelerator-exception manager 192 detects the failure of the PLIC 60₁.
[179] In one embodiment, the exception manager 192 detects the failure in response to an exception received from the exception manager 86 on board the PLIC 60₁. For example, because -1 ≤ sin(x) ≤ 1, the exception manager 86 may be programmed to generate an exception if a value generated by the pipeline 44₁ is less than -1 or greater than 1. Or, the exception manager 86 may be programmed to send an exception if an input buffer for the value x on the data memory 58 overflows.
[180] In a second embodiment, the exception manager 192 detects the failure in response to an improper value of data provided to or generated by the pipeline 44₁ on the PLIC 60₁. For example, the exception manager 192 may periodically analyze the stream of values x provided to the PLIC 60₁, or the stream of values sin(x) provided by the PLIC 60₁, and detect a failure of the PLIC 60₁ if any of the analyzed values are less than -1 or greater than 1. [181] In a third embodiment, the exception manager 192 detects the failure in response to the PLIC 60₁ failing to provide the stream of values sin(x). For example, per the previous paragraph, the exception manager 192 may periodically analyze the stream of values sin(x) provided by the PLIC 60₁, and detect a failure of the PLIC 60₁ if the PLIC 60₁ stops generating sin(x) despite continuing to receive the stream of input values x.
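A minimal sketch of these two host-side checks follows; the function names and thresholds are assumptions.

```python
# Minimal sketch of the failure checks described above: flag a failure if a
# sampled sin(x) value falls outside [-1, 1], or if the PLIC keeps receiving
# inputs but stops producing outputs.
def out_of_range(samples, lo=-1.0, hi=1.0):
    return any(v < lo or v > hi for v in samples)

def stalled(inputs_consumed, outputs_produced, tolerance=0):
    # Inputs keep arriving but no corresponding outputs appear.
    return inputs_consumed > 0 and outputs_produced <= tolerance

assert not out_of_range([0.1, -0.9, 0.99])
assert out_of_range([0.1, 1.7])            # corrupted sin(x) value
assert stalled(inputs_consumed=128, outputs_produced=0)
print("exception checks behave as expected")
```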
[182] Next, the exception manager 192 notifies the configuration manager 194 that the PLIC 60₁ has experienced a soft failure.
[183] In response to this notification, the configuration manager 194 first halts the processing of data by the PLICs 60₁ - 60₈ and any related data-processing applications 190 that the processing unit 32 is executing. Examples of a related data-processing application 190 include an application that generates the values x or z or that receives and processes the values y.
[184] Next, if the configuration manager 194 previously loaded soft-configuration data into the PLIC 60₁ during initialization of the peer-vector machine 10 (FIG. 1), then the configuration manager reloads this data into the PLIC 60₁ and restarts the processing of data by the PLICs 60₁ - 60₈ and any related data-processing applications 190 that the processing unit 32 is executing. [185] If the exception manager 192 detects no failure of the PLIC 60₁ after the restart, then the configuration manager 194 allows the PLICs 60₁ - 60₈ and any related data-processing applications 190 to continue processing data. [186] But if the configuration manager 194 did not load soft-configuration data into the PLIC 60₁ during initialization of the peer-vector machine 10, or if the exception manager 192 detects a failure of the PLIC 60₁ after the restart, then the configuration manager again halts the processing of data by the PLICs 60₁ - 60₈ and any related data-processing applications 190.
[187] Next, the configuration manager 194 causes the PLIC 6O1 to re-download the firmware file 180 (FIG. 9) that the PLIC 6O1 downloaded during initialization of the peer-vector machine 10 (FIG. 1), and restarts the PLICs 6O1 - 6O8 and any related data-processing applications 190 for a second time.
[188] If the exception manager 192 detects no failure of the PLIC 6O1 after the restart, then the configuration manager 194 allows the PLICs 6O1 - 6O8 and any related data-processing applications 190 to continue processing data. [189] But if the exception manager 192 detects a failure of the PLIC
60₁ after the second restart, then the configuration manager 194 again halts the processing of data by the PLICs 60₁ - 60₈ and any related data-processing applications 190.
[190] Then, the configuration manager 194 determines whether the pipeline accelerator 14 (FIG. 1) includes an extra PLIC 60 that is the same as or is similar to the PLIC 60₁. The extra PLIC may be a PLIC that is reserved to replace a failed PLIC, or may merely be a PLIC that is unused. Also, the extra PLIC may be on an extra pipeline unit 50, or on a pipeline unit 50 that includes other, non-extra PLICs. [191] If the pipeline accelerator 14 (FIG. 1) does include an extra PLIC, then the configuration manager 194 causes the extra PLIC to download the same firmware file 180 (FIG. 9) previously downloaded by the PLIC 60₁ during initialization of the peer-vector machine 10 (FIG. 1) and prior to the second restart.
[192] Next, the configuration manager 194 restarts the PLICs 6O2 -
6O8, the extra PLIC, and any related data-processing applications 190 such that the extra PLIC takes the place of the failed PLIC 6O1 in the circuit 220.
[193] If the exception manager 192 detects no failure of the extra PLIC after the third restart, then the configuration manager 194 allows the PLICs 6O2 - 6O8, the extra PLIC, and any related data-processing applications 190 to continue processing data.
[194] But if the pipeline accelerator 14 (FIG. 1) includes no extra
PLIC, or if the exception manager 192 detects a failure of the extra PLIC after the third restart, then the configuration manager 194 halts for a fourth time the processing of data by the PLICs 6O1 - 6O8 and any related data-processing applications 190 if the data processing is not already halted.
[195] Then, if the pipeline accelerator 14 (FIG. 1) includes another extra PLIC, then the configuration manager 194 may replace the failed PLIC 6O1 with this other extra PLIC, and restart the data processing as discussed above.
[196] But if the pipeline accelerator 14 (FIG. 1) contains no other extra PLICs (or if these extra PLICs fail), then the configuration manager 194 determines whether the circuit 220 can "fit" into the remaining PLICs 6O2 - 6O8 in a manner similar to that discussed above in conjunction with Example 2.
[197] If the circuit 220 can "fit" into the remaining PLICs 60₂ - 60₈, then the configuration manager 194 reinstantiates the circuit 220 on these remaining PLICs in a manner similar to that discussed above in conjunction with Example 2, and restarts the PLICs 60₂ - 60₈ and any related data-processing applications 190.
[198] If the exception manager 192 detects no failure of the reinstantiated circuit 220 after the restart, then the configuration manager 194 allows the PLICs 60₂ - 60₈ and any related data-processing applications 190 to continue processing data.
[199] But if the circuit 220 cannot fit into the PLICs 60₂ - 60₈, or if the exception manager 192 detects a failure of the reinstantiated circuit 220 after the restart, then the configuration manager 194 halts the processing of data by the PLICs 60₂ - 60₈ and any corresponding data-processing applications 190 if the data processing is not already halted.
[200] Next, the configuration manager 194 reads the software-object descriptions 162 (FIG. 7) to determine whether the library 104 (FIGS. 4 and 7) includes a software object 160 (FIG. 7) that can generate sin(x).
[201] If the library 104 (FIGS. 4 and 7) includes such a sin(x) software object 160 (FIG. 7), then the configuration manager 194 instantiates on the processing unit 32 a data-processing application thread that executes the object 160 for generating sin(x) in a manner similar to that discussed above in conjunction with Example 3 and FIG. 14, and restarts the data processing.
[202] If the exception manager 192 detects no failure of the circuit 220 (including the application thread executing the sin(x) software object 160) after the restart, then the configuration manager 194 allows the PLICs 60₂ - 60₈, the application thread that executes the sin(x) software object 160, and any related data-processing applications to continue processing data.
[203] But if the library 104 (FIGS. 4 and 7) includes no sin(x) software object 160 (FIG. 7), then the configuration manager 194 generates an error message, in response to which an operator (not shown) may take corrective action such as replacing the PLIC 60₁ or replacing the pipeline unit 50 on which the defective PLIC 60₁ is disposed.
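Taken together, the steps of this example form an ordered escalation that stops at the first remedy that restores operation; the sketch below expresses that ordering, with each step stubbed out as a callable whose name and outcome are assumptions.

```python
# Minimal sketch of the recovery sequence of Example 7, expressed as an
# ordered list of attempts that is walked until one succeeds. Each step is a
# callable returning True on success.
def recover_failed_plic(steps):
    for name, attempt in steps:
        if attempt():
            return name            # first remedy that restores operation
    return "operator intervention required"

steps = [
    ("reload soft-configuration data",     lambda: False),
    ("re-download firmware file 180",      lambda: False),
    ("switch to an extra PLIC",            lambda: False),
    ("refit circuit onto remaining PLICs", lambda: False),
    ("run sin(x) software object on host", lambda: True),
]
print(recover_failed_plic(steps))   # "run sin(x) software object on host"
```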
[204] Still referring to FIGS. 3, 10, and 11, alternate embodiments of Example 7 are contemplated. For example, the configuration manager 194 may omit any number of the above-described steps, and perform the non-omitted steps in any order. As an example, in response to a failure of the PLIC 60₁, the configuration manager 194 may generate an application thread that executes a sin(x) software object 160 (FIG. 7) without first trying to reconfigure the PLIC 60₁, to re-download the respective firmware file 180 (FIG. 9) to the PLIC 60₁, to replace the PLIC 60₁ with an extra PLIC, or to "fit" the circuit 220 on the remaining PLICs 60₂ - 60₈. Furthermore, the exception manager 192 may be omitted, and the configuration manager 194 may directly detect the failure of one or more PLICs 60₁ - 60₈. Moreover, although described as halting the PLICs 60₁ - 60₈ and related data-processing applications 190 in response to a failure of one of the PLICs 60₁ - 60₈, the configuration manager 194 may halt other portions of the peer-vector machine 10 as well, including halting the entire peer-vector machine.
Example 8
[205] Referring to FIGS. 3 and 10 - 11, in this example, a data-processing application 190 is generating y of equation (1) and experiences a failure while the peer-vector machine 10 (FIG. 1) is operating. Examples of such failure include, e.g., a mechanical failure of one or more processors that compose the processing unit 32, or the inability of the data-processing application 190 to process data at or above a specified speed.
[206] First, the exception manager 192 detects the failure of the data-processing application 190. [207] In a first embodiment, the exception manager 192 detects the failure in response to an improper value of x or z being provided to the data- processing application 190, or in response to an improper value of y being generated by the application. For example, the exception manager 192 may periodically analyze the respective streams of values x and z provided to the data-processing application 190, or the stream of values y generated by the data-processing application, and detect a failure of the data-processing application if, e.g., the analyzed values are outside of a predetermined range or the data-processing application stops generating output values y despite continuing to receive the values x and z.
[208] In a second embodiment, the exception manager 192 detects the failure in response to the frequency at which the data-processing application 190 generates the values y being below a predetermined frequency. [209] Next, the exception manager 192 notifies the configuration manager 194 that the data-processing application 190 has failed.
[210] In response to this notification, the configuration manager 194 first halts the processing of data by the data-processing application 190 and any related PLICs 60 (FIG. 3) of the pipeline accelerator 14 (FIG. 1). Examples of a related PLIC include a PLIC that generates the input values x or z for the data-processing application 190 or that receive the values y from the application.
[211] Next, if the failure is due to a mechanical failure of a portion of the processing unit 32, then the configuration manager 194 determines whether the data-processing application 190 can be loaded onto and run by another portion of the processing unit such as an extra processor.
[212] If the data-processing application 190 can be loaded onto and run by another portion of the processing unit 32, then the configuration manager 194 loads the data-processing application onto the other portion of the processing unit 32, and restarts the application and any related PLICs.
[213] If the exception manager 192 detects no failure of the data-processing application 190 after the restart, then the configuration manager 194 allows the application and any related PLICs to continue processing data.
[214] But if the configuration manager 194 cannot load and run the data-processing application 190 on another portion of the processing unit 32, or if the exception manager 192 detects a failure of the application after the restart, then the configuration manager halts the processing of data by the application and any related PLICs 60.
[215] Next, the configuration manager 194 attempts to instantiate on the pipeline accelerator 14 (FIG. 1) a circuit, such as the circuit 220, for generating the stream of values y of equation (1 ) in place of the failed data-processing application 190.
[216] First, the configuration manager 194 determines whether the library 108 (FIGS. 4 and 9) includes a firmware file 180 (FIG. 9) that can instantiate such a circuit on a single PLIC 60.
[217] If the library 108 (FIGS. 4 and 9) includes such a firmware file 180 (FIG. 9), then the configuration manager 194 downloads the file to a PLIC 60 of the pipeline accelerator 14 (FIG. 1), generates data-transfer objects 202 for transferring x and z to the PLIC and for transferring y from the PLIC, and starts the accelerator. Alternatively, the configuration manager 194 may omit some or all of the data-transfer objects 202 if the pipeline accelerator 14 receives x or z via the input port 24 or provides y via the output port 26.
[218] If the exception manager 192 detects no failure of the single
PLIC 60 after the start of the pipeline accelerator 14 (FIG. 1), then the configuration manager 194 allows the PLIC 60 and any related data-processing application 190 (e.g., to provide x or z to receive y from the PLIC 60) to continue processing data.
[219] But if the library 108 (FIGS. 4 and 9) includes no such firmware file 180 (FIG. 9), then the configuration manager 194 determines whether the library 106 (FIGS. 4 and 8) includes a circuit-definition file 170 (FIG. 8) that describes a circuit, such as the circuit 220, for generating y of equation
(1).
[220] If the library 106 (FIGS. 4 and 8) includes such a circuit-definition file 170, then the configuration manager 194 downloads the corresponding firmware files 180 (FIG. 9) to the corresponding PLICs 60 of the pipeline accelerator 14 (FIG. 1). For example, if the circuit-definition file 170 describes the circuit 220 of FIG. 11, then the configuration manager 194 downloads the firmware files 180₁ - 180₇ into the respective PLICs 60₁ - 60₈ (the file 180₅ is downloaded into both the PLICs 60₅ and 60₆ as discussed above in conjunction with Example 1). If, however, the library 108 lacks at least one of the firmware files 180 corresponding to the circuit-definition file 170, then the configuration manager 194 may, as discussed above in conjunction with Example 5, generate the omitted firmware file from templates in the library 102 (FIGS. 4 and 6), store the generated firmware file in the library 108, and download the stored firmware file into the respective PLIC 60.
[221] But if the library 106 (FIGS. 4 and 8) includes no such circuit-definition file 170 (FIG. 8), then the configuration manager 194 may use the circuit-design tool (not shown) described in previously incorporated U.S. Patent Application Ser. No. (Attorney Docket No. 1934-023-03) to generate such a circuit-definition file as discussed above in conjunction with Example 5. Next, the configuration manager 194 generates (if necessary) and downloads the corresponding firmware files 180₁ - 180₇ into the corresponding PLICs 60₁ - 60₈ as described in the preceding paragraph. [222] After downloading the firmware files 180₁ - 180₇ into the PLICs
60₁ - 60₈, the configuration manager 194 instantiates the data-transfer objects 202₁ - 202₂₁ (FIG. 12) as discussed above in conjunction with Example 1, and starts the PLICs 60₁ - 60₈ and any related data-processing applications 190.
[223] But if the configuration manager 194 cannot instantiate on the pipeline accelerator 14 (FIG. 1) a circuit for generating y of equation (1), then the configuration manager generates an error message in response to which an operator (not shown) can take corrective action. The configuration manager 194 may be unable to instantiate such a circuit because, e.g., the accelerator 14 lacks sufficient resources or does not support a compatible platform, or the library 102 (FIGS. 4 and 6) lacks the proper templates.
[224] Still referring to FIGS. 3 and 10 - 11, alternate embodiments of Example 8 are contemplated. For example, the configuration manager 194 may omit any number of the above-described steps, and perform the non-omitted steps in any order. Furthermore, the exception manager 192 may be omitted, and the configuration manager 194 may directly detect the failure of the data-processing application 190 that generates y of equation (1), and may directly detect the failure of any other portion of the peer-vector machine 10 (FIG. 1).
System Save, Restore, and Redundancy
[225] FIG. 15 is a block diagram of the peer-vector machine 10, which, in addition to the host processor 12 and pipeline accelerator 14, includes at least one redundant processing unit 250 and at least one redundant pipeline unit 252 according to an embodiment of the invention.
[226] The redundant processing units 250 and the redundant pipeline units 252 provide fault-tolerant capabilities in addition to the dynamic-reconfiguration capabilities described above in conjunction with Examples 7 and 8. For example, if a PLIC 60 (FIG. 3) in the pipeline accelerator 14 fails, then the configuration manager 194 (FIG. 10) may dynamically reconfigure a redundant PLIC (not shown) on a redundant pipeline unit 252 to replace the failed PLIC 60 in a manner that is similar to that described above in conjunction with Example 7. Similarly, if the processing unit 32 (FIG. 10) of the host processor 12 fails, then the configuration manager 194 may dynamically reconfigure a redundant processing unit 250 to replace the failed processing unit in a manner that is similar to that described above in conjunction with Example 8. In addition, the configuration manager 194 may dynamically reconfigure a redundant processing unit 250 to replace a failed portion of the pipeline accelerator 14, or may dynamically reconfigure one or more redundant PLICs on one or more of the redundant pipeline units 252 to replace a failed processing unit 32 or another failed portion of the host processor 12 in a manner that is similar to that described above in conjunction with Examples 7 and 8.
[227] Referring to FIGS. 10 - 15 and Examples 7 and 8, the dynamic reconfiguration of the host processor 12 and the pipeline accelerator 14 may destroy the states of, e.g., the registers (not shown) in the host processor and in the pipeline accelerator. Consequently, once restarted after dynamic reconfiguration, the host processor 12 and pipeline accelerator 14 may need to reprocess all of the data processed prior to the failure that initiated the reconfiguration.
[228] Unfortunately, the reprocessing of pre-failure data may adversely affect some applications of the peer-vector machine 10, such as the processing of data from a sonar array or other application where the peer-vector machine processes data in real time.
[229] FIG. 16 is a block diagram of the peer-vector machine 10, which includes system-restore capabilities according to an embodiment of the invention. Generally, this embodiment of the machine 10 periodically saves the states of some or all of the, e.g., registers, within the host processor 12 and the pipeline accelerator 14. Therefore, in the event of a failure and a subsequent restart, the peer-vector machine 10 can respectively restore the last-saved states to the host processor 12 and to the pipeline accelerator 14 so as to reduce or eliminate the volume of pre-failure data that the machine must reprocess.
[230] In addition to the host processor 12, the pipeline accelerator
14, the pipeline bus 20, the optional router 31, the optional redundant processing unit(s) 250, and the optional redundant pipeline unit(s) 252, this embodiment of the peer-vector machine 10 includes a system-restore server 260 and a system-restore bus 262.
[231] During operation of the peer-vector machine 10, the registers and other data-storing components of the host processor 12 and the pipeline accelerator 14 (and the redundant processing unit(s) 250 and pipeline unit(s) 252 if present and in use) periodically "dump" their contents onto the system-restore server 260 via the system-restore bus 262. The separation of the system-restore bus 262 from the pipeline bus 20 reduces or eliminates a data-processing-speed penalty that this data dump may cause, and otherwise prevents a "bottleneck" on the bus 20.
[232] After a dynamic reconfiguration but before a restart of the peer-vector machine 10, the host processor 12 causes the server 260 to upload the last-saved set of data into the respective registers and other data-storing components.
[233] Therefore, after the restart, the peer-vector machine 10 starts processing data from the point in time of the last-dumped set of data, and thus reprocesses only the pre-failure data that it processed between the last data dump and the failure.
[234] Consequently, by reducing the amount of pre-failure data that the peer-vector machine 10 reprocesses, the system-restore server 260 and the system-restore bus 262 provide a reduction in the overall data-processing time whenever the configuration manager 194 dynamically reconfigures and restarts the peer-vector machine.
[235] Still referring to FIG. 16, other embodiments of the peer-vector machine 10 are contemplated. For example, the system-restore bus 262 may be omitted, and the host processor 12 and the pipeline accelerator 14 (and the redundant processing unit(s) 250 and the redundant pipeline units 252 if present and in use) dump data to the system-restore server 260 via the pipeline bus 20.
[236] FIG. 17 is a block diagram of a hardwired pipeline 44 that includes a save/restore circuit 270 according to an embodiment of the invention. The circuit 270 allows the pipeline 44 to periodically "dump" the data within the pipeline's working registers (not shown in FIG. 17), and to restore the dumped data, as discussed above in conjunction with FIGS. 15-16.
[237] The save/restore circuit 270 is part of the framework-services layer 72 (FIGS. 2-3), and causes the working registers (not shown in FIG. 17) of the hardwired pipeline 44 to dump their data to the system-restore server 260 (FIG. 16) via the system-restore bus 262 under the control of a data-save manager 272, which is executed by the processing unit 32 of the host processor 12 (FIG. 10). The data-save manager 272 and the circuit 270 may communicate with one another by sending messages over the system-restore bus 262, or over the pipeline bus 20 (FIG. 16). Furthermore, the data-save manager 272 may be a part of the configuration manager 194 (FIG. 10), or the configuration manager 194 may perform the function(s) of the data-save manager.
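The patent does not define the messages that the data-save manager 272 and the save/restore circuit 270 exchange over the system-restore bus 262 or the pipeline bus 20; purely as an assumption, such messages might carry fields along the lines of the following C sketch.

    /* Hypothetical message header for manager/circuit traffic; the fields
     * are assumptions and not part of the disclosure.                      */
    #include <stdint.h>

    typedef enum {
        MSG_SAVE_REQUEST    = 1,  /* data-save manager 272 -> circuit 270   */
        MSG_RESTORE_REQUEST = 2,  /* data-restore manager 274 -> circuit    */
        MSG_DUMP_DATA       = 3,  /* circuit 270 -> system-restore server   */
        MSG_EXCEPTION       = 4   /* circuit 270 -> exception manager 86    */
    } msg_type_t;

    typedef struct {
        msg_type_t type;
        uint32_t   pipeline_id;   /* which hardwired pipeline 44            */
        uint32_t   register_mask; /* which working registers are involved   */
        uint32_t   payload_len;   /* bytes of register data that follow     */
    } restore_msg_header_t;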
[238] During a system restore, the save/restore circuit 270 causes the working registers (not shown in FIG. 17) of the hardwired pipeline 44 to load saved data (typically the last-saved data) from the system-restore server 260 (FIG. 16) via the system-restore bus 262 under the control of a data-restore manager 274, which is executed by the processing unit 32 of the host processor 12 (FIG. 10). The data-restore manager 274 and the circuit 270 may communicate with one another by sending messages over the system-restore bus 262, or over the pipeline bus 20 (FIG. 16). Furthermore, the data-restore manager 274 may be a part of the configuration manager 194 (FIG. 10), or the configuration manager 194 may perform the function(s) of the data-restore manager.
[239] In addition to the save/restore circuit 270, the hardwired pipeline 44 includes one or more configurable registers and logic 276, and one or more exception registers and logic 278. The configurable registers and logic 276 receive and store configuration data from the configuration manager 88 (see also FIG. 3), and use this stored configuration data to configure the pipeline 44 as previously described. The exception registers and logic 278 generate and store exception data in response to exceptions that occur during operation of the pipeline 44, and provide this data to the exception manager 86 (see also FIG. 3) for handling as previously described.
[240] Still referring to FIG. 17, the operation of the hardwired pipeline
44 is described according to an embodiment of the invention.
[241] During normal operation, the data-save manager 272 periodically causes the save/restore circuit 270 to dump the data from selected working registers (not shown in FIG. 17) within the hardwired pipeline 44 to the system-restore server 260 via the system-restore bus 262. Data within the configurable register(s) 276 typically selects which working registers dump their data, and this data may select any number of the working registers. Alternatively, the data-save manager 272 may cause the save/restore circuit 270 to dump data from the selected working registers via the pipeline bus 20. Furthermore, if a data dump from one or more of the selected working registers fails, then the exception register(s) and logic 278 may send a corresponding exception to the exception manager 86. In response to such an exception, the configuration manager 194 may repeat the data-dump operation, at least for the hardwired pipeline(s) 44 that generate the exception.
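As a rough software model of this periodic dump (not the circuit itself), a mask held in the configurable register(s) 276 can be pictured as selecting which working registers dump, with a failed transfer reported as an exception. The names below (select_mask, send_to_restore_server, and so on), the stub bodies, and the retry policy are assumptions.

    /* Software model of the periodic dump: iterate over a register-select
     * mask, send each selected register's value, and raise an exception if
     * a transfer fails.  Names, stubs, and the retry policy are assumed.   */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_WORKING_REGS 32

    static volatile uint32_t working_reg[NUM_WORKING_REGS]; /* pipeline regs */
    static uint32_t select_mask = 0x0000000Fu;  /* config reg 276: regs 0..3 */

    /* Stub for the transfer over the system-restore bus 262 (or the
     * pipeline bus 20 in the alternative embodiment).                       */
    static bool send_to_restore_server(int idx, uint32_t value)
    {
        printf("dump working register %d = 0x%08lx\n",
               idx, (unsigned long)value);
        return true;
    }

    /* Stub for reporting a failed dump to the exception manager 86.         */
    static void raise_exception(int idx)
    {
        fprintf(stderr, "dump of working register %d failed\n", idx);
    }

    static void periodic_dump(int max_retries)
    {
        for (int i = 0; i < NUM_WORKING_REGS; i++) {
            if (!(select_mask & (1u << i)))
                continue;                    /* register not selected        */
            int tries = 0;
            while (!send_to_restore_server(i, working_reg[i])) {
                raise_exception(i);          /* configuration manager 194
                                                may then repeat the dump     */
                if (++tries >= max_retries)
                    break;
            }
        }
    }

    int main(void) { periodic_dump(3); return 0; }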
[242] During a system-restore operation, the data-restore manager
274 causes the save/restore circuit 270 to load previously dumped and saved data from the system-restore server 260 (FIG. 16) into the respective working registers (not shown in FIG. 17) within the hardwired pipeline 44 via the system-restore bus 262 or the pipeline bus 20. Before loading the data, the configuration manager 194 (FIG. 10) typically loads the configurable register(s) 276 with data that selects which working registers are to load data. Alternatively, data identifying the working registers that are to load restored data may have been previously stored in nonvolatile memory within the configurable register(s) and logic 276. The save/restore circuit 270 may then run a check to make sure that it properly loaded the restored data. If the check fails, then the exception register(s) and logic 278 may send a corresponding exception to the exception manager 86. In response to such an exception, the configuration manager 194 may repeat the system-restore operation, at least for the hardwired pipeline(s) 44 that generate the exception.
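Again purely as a software model, the restore half of the operation might look like the following; the read-back style of check is an assumption, since the description says only that the save/restore circuit 270 may run a check, and all names and the mask value are hypothetical.

    /* Software model of the restore: write saved values back into the
     * selected working registers and verify them by reading them back.
     * working_reg stands in for memory-mapped hardware registers.           */
    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_WORKING_REGS 32

    static volatile uint32_t working_reg[NUM_WORKING_REGS];
    static uint32_t restore_mask = 0x0000000Fu;  /* from config reg 276      */

    /* Returns false if any read-back mismatches, in which case an exception
     * would be raised and the configuration manager 194 could repeat the
     * system-restore operation.                                             */
    static bool restore_registers(const uint32_t saved[NUM_WORKING_REGS])
    {
        bool ok = true;
        for (int i = 0; i < NUM_WORKING_REGS; i++) {
            if (!(restore_mask & (1u << i)))
                continue;                  /* register not selected          */
            working_reg[i] = saved[i];     /* load via the input multiplexer */
            if (working_reg[i] != saved[i])
                ok = false;                /* read-back check failed         */
        }
        return ok;
    }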
[243] FIG. 18 is a more-detailed block diagram of the hardwired pipeline 44 of FIG. 17 according to an embodiment of the invention.
[244] In addition to the save-restore circuit 270, the hardwired pipeline 44 includes one or more working registers 280 (for clarity, only one working register is shown in FIG. 18), a respective input-data multiplexer 282 for each working register, a load port 281, a data-input port 283, and a data-output port 285.
[245] The save-restore circuit 270 includes a respective data-save register 284 and a respective data-restore register 286 for each working register 280, saved-data transmit logic 288, and restored-data receive logic 290.
[246] Still referring to FIG. 18, the operation of the hardwired pipeline
44 is described according to an embodiment of the invention.
[247] During normal operation, the data-save manager 272 causes the data-save register 284 to download the data from the corresponding working register 280 once during each predetermined number of cycles of the save-restore clock. The data-save manager 272 also causes the transmit logic 288 to transfer the data from the register 284 to the system-restore server 260 (FIG. 16), typically at the same rate at which the register 284 downloads data from the working register 280. Furthermore, the working register 280 may load data from the data-input port 283 via the multiplexer 282 in response to a hardwired-pipeline clock and a load command on the load port 281, and may provide data via the data-output port 285. Alternatively, the save-restore circuit 270 may include fewer data-save registers 284 than working registers 280, such that a single data-save register may serve multiple, perhaps even all, of the working registers within the pipeline 44. In such an alternative embodiment, such a data-save register 284 cooperates with the transmit logic 288 to download data from the corresponding working registers 280 in a serial fashion.
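The alternative with a single shared data-save register can be pictured as a round-robin: on each save-restore clock period the shared register captures the next working register and hands it to the transmit logic. The C sketch below is only an analogy with assumed names (save_tick, the round-robin index handling), not the disclosed circuit.

    /* Analogy for a shared data-save register 284 serving several working
     * registers in serial fashion; one working register is captured and
     * transmitted per save-restore clock period.  Names are assumptions.    */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_WORKING_REGS 8

    static volatile uint32_t working_reg[NUM_WORKING_REGS];
    static uint32_t data_save_reg;        /* shared data-save register 284   */

    /* One tick of the save-restore clock: capture the next working register
     * into the shared save register and pass it to the transmit logic 288
     * (modeled here as a printf).                                           */
    static void save_tick(int *next)
    {
        data_save_reg = working_reg[*next];
        printf("transmit working register %d = 0x%08lx\n",
               *next, (unsigned long)data_save_reg);
        *next = (*next + 1) % NUM_WORKING_REGS;  /* serial round-robin       */
    }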
[248] During a system-restore operation, the data-restore manager
274 causes the receive logic 290 to load previously saved data from the system-restore server 260 (FIG. 16) into the data-restore registers 286 once during each predetermined number of cycles of the save-restore clock. The data-restore manager 274 also causes each data-restore register 286 to load the previously saved data back into a respective working register 280 via a respective multiplexer 282. Once all of the working registers 280 are loaded with respective previously saved data, the configuration manager 194 may return the hardwired pipeline 44 to normal operation. Alternatively, the save-restore circuit 270 may include fewer data-restore registers 286 than working registers 280, such that a single data-restore register may serve multiple, perhaps all, of the working registers in the pipeline 44. In such an alternative embodiment, such a data-restore register 286 cooperates with the receive logic 290 to upload data to the corresponding working registers 280 in a serial fashion.
[249] Referring to FIGS. 1-18, alternate embodiments of the peer-vector machine 10 are contemplated. For example, some or all of the components of the peer-vector machine 10, such as the host processor 12 (FIG. 1) and the pipeline units 50 (FIG. 3) of the pipeline accelerator 14 (FIG. 1), may be disposed on a single integrated circuit.
[250] The preceding discussion is presented to enable a person skilled in the art to make and use the invention. Various modifications to the embodiments will be readily apparent to those skilled in the art, and the generic principles herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Claims

WHAT IS CLAIMED IS:
1. A computing machine, comprising: programmable integrated circuits; a configuration registry operable to store a file that defines a circuit having portions; and a processor coupled to the registry and, in response to the file, operable to instantiate a first one of the circuit portions on a first one of the programmable integrated circuits.
2. The computing machine of claim 1 wherein each of the programmable integrated circuits comprises a respective field-programmable gate array.
3. The computing machine of claim 1 wherein each of the circuit portions comprises a respective hardwired pipeline.
4. The computing machine of claim 1 wherein the processor is further operable to: instantiate a second of the circuit portions on a second one of the programmable integrated circuits; and generate and execute a data-transfer object that is operable to receive data from the first instantiated circuit portion and to provide the data to the second instantiated circuit portion.
5. The computing machine of claim 1 wherein the processor is further operable to: instantiate a second of the circuit portions on a second one of the programmable integrated circuits; and generate and execute a data-transfer object that is operable to receive data from the first integrated circuit and to provide the data to the second integrated circuit.
6. The computing machine of claim 1 wherein: the registry is further operable to store a firmware file that corresponds to the first circuit portion; and the processor is operable to instantiate the first circuit portion by providing the firmware file to the first integrated circuit.
7. The computing machine of claim 1 wherein: the processor is operable to generate a firmware file that corresponds to the first circuit portion; and the processor is operable to instantiate the first circuit portion by providing the firmware file to the first integrated circuit.
8. The computing machine of claim 1 wherein: the registry is further operable to store a firmware file that corresponds to the first circuit portion; the processor is further operable to generate a firmware file that corresponds to a second one of the circuit portions; and the processor is further operable to instantiate the first and second circuit portions by providing the stored firmware file to the first integrated circuit and by providing the generated firmware file to a second one of the integrated circuits.
9. The computing machine of claim 1 wherein: the registry is further operable to store a firmware file that corresponds to a second one of the circuit portions; the processor is further operable to generate a firmware file that corresponds to the first circuit portion; and the processor is further operable to instantiate the first and second circuit portions by providing the stored firmware file to a second one of the integrated circuits and by providing the generated firmware file to the first integrated circuit.
10. The computing machine of claim 1 wherein the processor is further operable to configure an operating parameter of the instantiated first circuit portion by sending configuration data to the first integrated circuit.
11. The computing machine of claim 1 wherein the processor is further operable to: determine whether the programmable integrated circuits are together operable to hold all of the circuit portions; and if the programmable integrated circuits are together operable to hold all of the circuit portions, then instantiating each of the circuit portions on a respective one of the integrated circuits.
12. The computing machine of claim 1 wherein the processor is further operable to: determine whether the programmable integrated circuits are together operable to hold all of the circuit portions; and if the programmable integrated circuits are not together operable to hold all of the circuit portions, then instantiating a second one of the circuit portions on the first integrated circuit.
13. The computing machine of claim 1 wherein: each of the circuit portions is operable to perform a respective function; and the processor is further operable to, determine whether the programmable integrated circuits are together operable to hold all of the circuit portions; and if the programmable integrated circuits are not together operable to hold all of the circuit portions, then executing the function of a second one of the circuit portions.
14. A computing machine, comprising: a configuration registry operable to store a file that defines a circuit having one or more portions that are each operable to perform a respective function; and a processor coupled to the registry and, in response to the file, operable to, determine whether a programmable integrated circuit is available to hold any of the circuit portions, and if no programmable integrated circuit is available to hold any of the circuit portions, then executing the functions of the circuit portions.
15. A system, comprising: a computing machine, comprising, programmable integrated circuits, a configuration registry operable to store a file that defines a circuit having portions, and a processor coupled to the registry and, in response to the file, operable to instantiate a first one of the circuit portions on a first one of the programmable integrated circuits.
16. A system, comprising: a computing machine, comprising, a configuration registry operable to store a file that defines a circuit having one or more portions that are each operable to perform a respective function, and a processor coupled to the registry and, in response to the file, operable to, determine whether a programmable integrated circuit is available to hold any of the circuit portions, and if no programmable integrated circuit is available to hold any of the circuit portions, then executing the functions of the circuit portions.
17. A method, comprising: reading with a processor a file that defines a circuit having portions; and instantiating with the processor and, in response to the file a first one of the circuit portions on a first programmable integrated circuit.
18. The method of claim 17 wherein the first programmable integrated circuit comprises a field-programmable gate array.
19. The method of claim 17, further comprising performing a pipelined operation with the instantiated first circuit portion.
20. The method of claim 17, further comprising: instantiating with the processor a second circuit portion on a second integrated circuit; executing with the processor a data-transfer object that is operable to receive data from the first instantiated circuit portion and to provide the data to the second instantiated circuit portion.
21. The method of claim 17, further comprising: instantiating with the processor a second circuit portion on a second integrated circuit; executing with the processor a data-transfer object that is operable to receive data from the first integrated circuit and to provide the data to the second integrated circuit.
22. The method of claim 17 wherein the processor is operable to instantiate the first circuit portion by providing a corresponding firmware file to the first integrated circuit.
23. The method of claim 17, further comprising: generating with the processor a firmware file that corresponds to the first circuit portion; and wherein the processor is operable to instantiate the first circuit portion by providing the firmware file to the first integrated circuit.
24. The method of claim 17, further comprising configuring an operating parameter of the instantiated first circuit portion by sending configuration data to the first integrated circuit with the processor.
25. The method of claim 17, further comprising: determining with the processor whether the first integrated circuit is operable to hold the first circuit portion and a second circuit portion; and if the first integrated circuit is operable to hold the first and second circuit portions, then instantiating with the processor the first and second circuit portions on the first integrated circuit.
26. The method of claim 17, further comprising: determining with the processor whether a group of one or more programmable integrated circuits that includes the first integrated circuit are together operable to hold a group of one or more circuit portions that includes the first circuit portion; and if the group of programmable integrated circuits is not operable to hold all of the one or more circuit portions, then executing with the processor the function of a second one of the circuit portions.
27. A method, comprising: determining with a processor whether a programmable integrated circuit is available to hold any of one or more portions of a circuit, each circuit portion operable to perform a respective function; and if no programmable integrated circuit is available to hold any of the one or more circuit portions, then executing with the processor the functions of the one or more circuit portions.
28. A computing machine, comprising: an electronic circuit operable to perform a function; a first programmable integrated circuit; and a first processor coupled to the electronic circuit and to the programmable integrated circuit and operable to, detect a failure of the electronic circuit, and configure the programmable integrated circuit to perform the function in response to detecting the failure.
29. The computing machine of claim 28 wherein the electronic circuit comprises a hardwired pipeline.
30. The computing machine of claim 28 wherein the electronic circuit is disposed on a second integrated circuit.
31. The computing machine of claim 28 wherein the electronic circuit is disposed on a second programmable integrated circuit.
32. The computing machine of claim 28 wherein: the electronic circuit is disposed on the first programmable integrated circuit; and the processor is operable to configure the first programmable integrated circuit to perform the function by modifying the electronic circuit.
33. The computing machine of claim 28 wherein the electronic circuit comprises a second processor operable to execute instructions that cause the second processor to perform the function.
34. The computing machine of claim 28 wherein the first programmable circuit comprises a field-programmable gate array.
35. The computing machine of claim 28 wherein the processor is operable to configure the first programmable integrated circuit to perform the function by instantiating the electronic circuit on the first programmable integrated circuit.
36. A system, comprising: a computing machine, comprising, an electronic circuit operable to perform a function, a first programmable integrated circuit, and a first processor coupled to the electronic circuit and to the programmable integrated circuit and operable to, detect a failure of the electronic circuit, and configure the programmable integrated circuit to perform the function in response to detecting the failure.
37. A computing machine, comprising: a hardwired pipeline operable to perform a function; and a processor coupled to the pipeline and operable to, detect a failure of the pipeline, and perform the function in response to detecting the failure.
38. The computing machine of claim 37 wherein the hardwired pipeline is disposed on a programmable integrated circuit.
39. The computing machine of claim 37 wherein the hardwired pipeline is disposed on a field-programmable gate array.
40. The computing machine of claim 37 wherein the processor is operable to perform the function by executing a software object that causes the processor to perform the function.
41. A system, comprising: a computing machine, comprising, a hardwired pipeline operable to perform a function, and a processor coupled to the pipeline and operable to, detect a failure of the pipeline, and perform the function in response to detecting the failure.
42. A computing machine, comprising: a programmable integrated circuit; and a processor coupled to the programmable integrated circuit and, operable to perform a function, and if the processor becomes unable to perform the function, operable to, instantiate on the programmable integrated circuit a hardwired pipeline operable to perform the function, and cause the hardwired pipeline to perform the function.
43. The computing machine of claim 42 wherein the programmable integrated circuit comprises a field-programmable gate array.
44. A system, comprising: a computing machine, comprising, a programmable integrated circuit, and a processor coupled to the programmable integrated circuit and, operable to perform a function, and if the processor becomes unable to perform the function, operable to, instantiate on the programmable integrated circuit a hardwired pipeline operable to perform the function, and cause the hardwired pipeline to perform the function.
45. A method, comprising: detecting a failure of an electronic circuit to perform a predetermined function; and configuring with a processor a programmable integrated circuit to perform the function in response to detecting the failure.
46. The method of claim 45 wherein configuring the programmable integrated circuit comprises instantiating the electronic circuit on the programmable integrated circuit.
47. The method of claim 45, further comprising: wherein the electronic circuit is disposed on the programmable integrated circuit; and configuring the programmable integrated circuit comprises modifying the electronic circuit.
48. A method, comprising: detecting a failure of a hardwired pipeline to perform a function; and performing the function with a processor in response to detecting the failure.
49. The method of claim 48 wherein performing the function comprises executing with the processor a software object that causes the processor to perform the function.
50. A method, comprising: with a first processor, instantiating a hardwired pipeline on a programmable integrated circuit if a second processor fails to perform a predetermined function; and with the first processor, causing the hardwired pipeline to perform the function.
51. The method of claim 50 wherein the second processor comprises the first processor.
52. A computing machine, comprising: a pipeline accelerator; a host processor coupled to the pipeline accelerator; and a redundant processor coupled to the host processor and to the pipeline accelerator.
53. The computing machine of claim 52, further comprising: wherein the pipeline accelerator comprises a pipeline unit; and a redundant pipeline unit having a redundant programmable integrated circuit and coupled to the pipeline accelerator, the host processor, and the redundant processor.
54. An electronic system, comprising: a computing machine, comprising, a pipeline accelerator, a host processor coupled to the pipeline accelerator, and a redundant processor coupled to the host processor and to the pipeline accelerator.
55. A computing machine, comprising: a pipeline accelerator having a pipeline unit; a host processor coupled to the pipeline accelerator; and a redundant pipeline unit coupled to the pipeline accelerator and to the host processor and including a redundant programmable integrated circuit.
56. A system, comprising: a computing machine, comprising, a pipeline accelerator having a pipeline unit, a host processor coupled to the pipeline accelerator, and a redundant pipeline unit coupled to the pipeline accelerator and to the host processor and including a redundant programmable integrated circuit.
57. A computing machine, comprising: a pipeline accelerator; a host processor coupled to the pipeline accelerator; and a recovery device coupled to the pipeline accelerator and to the host processor and operable to, periodically save first data representing a state of the pipeline accelerator during a respective predetermined period, and periodically save second data representing a state of the host processor during the respective predetermined period.
58. The computing machine of claim 57 wherein after a failure of the pipeline accelerator, the recovery device is further operable to restore to the pipeline accelerator the most recent first data.
59. The computing machine of claim 57 wherein after a failure of the pipeline accelerator, the recovery device is further operable to restore to the host processor the most recent second data.
60. The computing machine of claim 57 wherein after a failure of the host processor, the recovery device is further operable to restore to the pipeline accelerator the most recent first data.
61. The computing machine of claim 57 wherein after a failure of the host processor, the recovery device is further operable to restore to the host processor the most recent second data.
62. The computing machine of claim 57, further comprising: a pipeline bus; wherein the host processor is coupled to the pipeline accelerator via the pipeline bus; and wherein the recovery device is coupled to the pipeline accelerator and to the host processor via the pipeline bus.
63. The computing machine of claim 57, further comprising: a pipeline bus; a recovery bus that is separate from the pipeline bus; wherein the host processor is coupled to the pipeline accelerator via the pipeline bus; and wherein the recovery device is coupled to the pipeline accelerator and to the host processor via the recovery bus.
64. A system, comprising: a computing machine, comprising, a pipeline accelerator, a host processor coupled to the pipeline accelerator, and a recovery device coupled to the pipeline accelerator and to the host processor and operable to, periodically save first data representing a state of the pipeline accelerator during a respective predetermined period, and periodically save second data representing a state of the host processor during the respective predetermined period.
65. A method, comprising: processing data with a pipeline accelerator and a host processor coupled to the pipeline accelerator; and processing the data with the pipeline accelerator and a redundant processor coupled to the pipeline accelerator if the host processor fails.
66. A method, comprising: processing data with a pipeline unit of a pipeline accelerator and with a host processor coupled to the pipeline accelerator; and processing the data with a redundant pipeline unit coupled to the pipeline accelerator and to the host processor if the pipeline unit fails, the redundant pipeline unit including a redundant programmable integrated circuit.
67. A method, comprising: periodically saving first data representing a state of a pipeline accelerator during a respective predetermined period; and periodically saving second data representing a state of a host processor during the respective predetermined period, the host processor being coupled to the pipeline accelerator.
68. The method of claim 67, further comprising restoring to the pipeline accelerator after a failure of the pipeline accelerator the most recent first data.
69. The method of claim 67, further comprising restoring to the host processor after a failure of the pipeline accelerator the most recent second data.
70. The method of claim 67, further comprising restoring to the pipeline accelerator after a failure of the host processor the most recent first data.
71. The method of claim 67, further comprising restoring to the host processor after a failure of the host processor the most recent second data.
72. The method of claim 67 wherein periodically saving the first and second data comprises periodically saving the first and second data via a recovery bus that is separate from a bus over which the host processor and pipeline accelerator transfer processed data.
73. The method of claim 67 wherein periodically saving the first and second data comprises periodically saving the first and second data via a bus over which the host processor and pipeline accelerator transfer processed data.
PCT/US2005/035818 2004-10-01 2005-10-03 Configurable computing machine and related systems and methods WO2006039713A2 (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US61515804P 2004-10-01 2004-10-01
US61519304P 2004-10-01 2004-10-01
US61519204P 2004-10-01 2004-10-01
US61515704P 2004-10-01 2004-10-01
US61505004P 2004-10-01 2004-10-01
US60/615,193 2004-10-01
US60/615,157 2004-10-01
US60/615,192 2004-10-01
US60/615,158 2004-10-01
US60/615,170 2004-10-01
US60/615,050 2004-10-01

Publications (3)

Publication Number Publication Date
WO2006039713A2 WO2006039713A2 (en) 2006-04-13
WO2006039713A9 true WO2006039713A9 (en) 2006-08-17
WO2006039713A3 WO2006039713A3 (en) 2006-09-28

Family

ID=36143162

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/035818 WO2006039713A2 (en) 2004-10-01 2005-10-03 Configurable computing machine and related systems and methods

Country Status (1)

Country Link
WO (1) WO2006039713A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7676649B2 (en) 2004-10-01 2010-03-09 Lockheed Martin Corporation Computing machine with redundancy and related systems and methods
US7987341B2 (en) 2002-10-31 2011-07-26 Lockheed Martin Corporation Computing machine using software objects for transferring data that includes no destination information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2352548B (en) * 1999-07-26 2001-06-06 Sun Microsystems Inc Method and apparatus for executing standard functions in a computer system
US7061485B2 (en) * 2002-10-31 2006-06-13 Hewlett-Packard Development Company, Lp. Method and system for producing a model from optical images

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7987341B2 (en) 2002-10-31 2011-07-26 Lockheed Martin Corporation Computing machine using software objects for transferring data that includes no destination information
US8250341B2 (en) 2002-10-31 2012-08-21 Lockheed Martin Corporation Pipeline accelerator having multiple pipeline units and related computing machine and method
US7676649B2 (en) 2004-10-01 2010-03-09 Lockheed Martin Corporation Computing machine with redundancy and related systems and methods
US7809982B2 (en) 2004-10-01 2010-10-05 Lockheed Martin Corporation Reconfigurable computing machine and related systems and methods

Also Published As

Publication number Publication date
WO2006039713A3 (en) 2006-09-28
WO2006039713A2 (en) 2006-04-13

Similar Documents

Publication Publication Date Title
US7676649B2 (en) Computing machine with redundancy and related systems and methods
US7987341B2 (en) Computing machine using software objects for transferring data that includes no destination information
CA2503622C (en) Computing machine having improved computing architecture and related system and method
US5968185A (en) Transparent fault tolerant computer system
US7941698B1 (en) Selective availability in processor systems
WO2004042562A2 (en) Pipeline accelerator and related system and method
US20070061779A1 (en) Method and System and Computer Program Product For Maintaining High Availability Of A Distributed Application Environment During An Update
WO1997022930A9 (en) Transparent fault tolerant computer system
US7127638B1 (en) Method and apparatus for preserving data in a high-availability system preserving device characteristic data
US7194614B2 (en) Boot swap method for multiple processor computer systems
CN101334735B (en) Non-disruptive code update of a single processor in a multi-processor computing system
US7103639B2 (en) Method and apparatus for processing unit synchronization for scalable parallel processing
WO2006039713A9 (en) Configurable computing machine and related systems and methods
US7472224B1 (en) Reconfigurable processing node including first and second processor cores
US7584271B2 (en) Method, system, and computer readable medium for delaying the configuration of a shared resource
KR940017582A (en) Dual Operation Method of Control System in Electronic Switching System
TWI244031B (en) Booting switch method for computer system having multiple processors
JPH05216852A (en) Data processor
US20070136499A1 (en) Method for designing a completely decentralized computer architecture
KR20170056269A (en) Multi-booting method and apparatus for managing data transport system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase