US20060230377A1 - Computer-based tool and method for designing an electronic circuit and related system - Google Patents
Computer-based tool and method for designing an electronic circuit and related system
- Publication number
- US20060230377A1
- Authority
- US
- United States
- Prior art keywords
- circuit
- templates
- algorithm
- template
- operable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/34—Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
- G06F30/343—Logical level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1405—Saving, restoring, recovering or retrying at machine instruction level
- G06F11/1407—Checkpointing the instruction stream
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1417—Boot up procedures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/142—Reconfiguring to eliminate the error
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2025—Failover techniques using centralised failover control functionality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2051—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant in regular structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1668—Details of memory controller
- G06F13/1694—Configuration of memory controller to different memory types
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8053—Vector processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/327—Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/34—Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q9/00—Arrangements in telecontrol or telemetry systems for selectively calling a substation from a main station, in which substation desired apparatus is selected for applying a control signal thereto or for obtaining measured values therefrom
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2035—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2038—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
Definitions
- PLICs: programmable logic integrated circuits
- FPGAs: field-programmable gate arrays
- ASICs: application-specific integrated circuits
- When a software programmer writes source code for a software application, he can often save time by incorporating into the application previously written and debugged software objects from a software-object library.
- a software-object library includes a first software object for squaring a value (here x), a second software object for cubing a value (here z), and a third software object for summing two values (here x² and z³).
- a compiler effectively merges these objects into the software application while compiling the source code.
- the object library allows the programmer to write the software application in a shorter time and with less effort because the programmer does not have to “reinvent the wheel” by writing and debugging pieces of source code that respectively square x, cube z, and sum x² and z³. Furthermore, if the programmer needs to modify the software application, he can do so without modifying and re-debugging the first, second, and third software objects.
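The reuse described above can be sketched in a few lines of Python. The function names (`square`, `cube`, `add`, `application`) and the sample inputs are assumptions made for illustration; the source does not name the software objects or give their code.

```python
# Hypothetical software-object library: three small, already-debugged objects.
def square(x):
    """Return x squared."""
    return x * x


def cube(z):
    """Return z cubed."""
    return z * z * z


def add(a, b):
    """Return the sum of two values."""
    return a + b


def application(x, z):
    """Application code: reuses the library objects to compute x^2 + z^3."""
    return add(square(x), cube(z))


if __name__ == "__main__":
    print(application(3, 2))  # 3*3 + 2*2*2 = 17
```

Changing `application` (for example, to sum three terms) would leave `square`, `cube`, and `add` untouched, which is the point the passage makes about not re-debugging the library objects.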
- the first circuit portion squares x, cubes z, and sums x² and z³, and the second circuit portion interfaces the first circuit portion to the external pins of the PLIC.
- the engineer then compiles the source code with a PLIC design tool (typically provided by the PLIC manufacturer), which synthesizes and routes the circuit and then generates the configuration firmware that, when loaded into the PLIC, instantiates the circuit.
- the engineer loads the firmware into the PLIC and debugs the instantiated circuit.
- the synthesizing and routing steps are often not trivial, and may take a number of hours or even days depending upon the size and complexity of the circuit. And even if the engineer makes only a minor modification to a small portion of the circuit, he typically must repeat the synthesizing, routing, and debugging steps for the entire circuit.
- a PLIC design tool typically recognizes only hardware-specific source code.
- a mathematician who writes an equation using mathematical symbols (e.g., “+,” “x²,” and “z³”) may wish to instantiate on a PLIC a circuit that solves for a variable in a complex equation that includes, e.g., partial derivatives and integrations.
- a computer-based circuit design tool includes a front end, an interpreter coupled to the front end, and an integrator coupled to the interpreter.
- the front end receives symbols that define a logical expression, and the interpreter parses the expression into respective portions.
- the integrator identifies a corresponding circuit template for each of the expression portions, and logically interconnects the identified templates into a representation of an electronic circuit that is operable to execute the expression.
- such a tool may shorten the time and reduce the effort that an engineer expends designing a circuit for instantiation on a PLIC by allowing the engineer to build the circuit from templates of previously designed and debugged circuits.
- the front end of the design tool recognizes mathematical symbols so that one can design a PLIC circuit for executing a mathematical expression with little or no assistance from a hardware engineer.
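The front-end, interpreter, and integrator flow summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the template names, the `n0`, `n1`, ... node labels, the restriction to addition and to powers of 2 and 3, and the use of Python's `ast` module as a stand-in for the interpreter. It is not the tool described in the source.

```python
import ast

# Hypothetical template library: maps an operation to a circuit-template name.
TEMPLATE_LIBRARY = {
    "square": "square_pipeline",
    "cube": "cube_pipeline",
    "add": "adder_pipeline",
}


def interpret(expression):
    """Interpreter step: parse the expression into portions, listed in
    evaluation order as (operation, operands, result_node) tuples."""
    portions = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id  # an input variable such as x or z
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            op = "add"
            operands = (visit(node.left), visit(node.right))
        elif (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Pow)
              and isinstance(node.right, ast.Constant)
              and node.right.value in (2, 3)):
            op = "square" if node.right.value == 2 else "cube"
            operands = (visit(node.left),)
        else:
            raise ValueError("unsupported expression portion")
        result = f"n{len(portions)}"  # label for this portion's output node
        portions.append((op, operands, result))
        return result

    visit(ast.parse(expression, mode="eval").body)
    return portions


def integrate(portions):
    """Integrator step: pick a template per portion and record its wiring."""
    return [
        {"template": TEMPLATE_LIBRARY[op], "inputs": operands, "output": result}
        for op, operands, result in portions
    ]


if __name__ == "__main__":
    for block in integrate(interpret("x**2 + z**3")):
        print(block)
```

For `x**2 + z**3` the sketch selects a squaring template fed by `x`, a cubing template fed by `z`, and an adder template fed by the two intermediate nodes.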
- FIG. 1 is a block diagram of a peer-vector computing machine having a pipelined accelerator that one can design with a design tool according to an embodiment of the invention.
- FIG. 2 is a block diagram of a pipeline unit that includes a PLIC and that can be included in the pipelined accelerator of FIG. 1 according to an embodiment of the invention.
- FIG. 3 is a diagram of the circuit layers that compose the hardware interface layer within the PLIC of FIG. 2 according to an embodiment of the invention.
- FIG. 4 is a block diagram of the circuitry that composes the interface adapter and framework services layers of FIG. 3 according to an embodiment of the invention.
- FIG. 5 is a diagram of a hardware-description file for a circuit that one can instantiate on a PLIC according to an embodiment of the invention.
- FIG. 6 is a block diagram of a PLIC circuit-template library according to an embodiment of the invention.
- FIG. 7 is a block diagram of a circuit-design system that includes a computer-based tool for designing a circuit using templates from the library of FIG. 6 according to an embodiment of the invention.
- FIG. 8 illustrates the parsing of a mathematical expression according to an embodiment of the invention.
- FIG. 9 illustrates a table of hardwired-pipeline library templates corresponding to the hardwired-pipelines available for executing respective portions of the parsed mathematical expression of FIG. 8 according to an embodiment of the invention.
- FIG. 10 is a block diagram of a circuit that the tool of FIG. 7 generates from circuit templates downloaded from the library of FIG. 6 according to an embodiment of the invention.
- FIG. 11 is a block diagram of a circuit that the tool of FIG. 7 generates from circuit templates downloaded from the library of FIG. 6 according to another embodiment of the invention.
- FIG. 12 is a block diagram of a circuit that the tool of FIG. 7 generates from circuit templates downloaded from the library of FIG. 6 according to yet another embodiment of the invention.
- FIG. 13 is a block diagram of a circuit that the tool of FIG. 7 generates for implementing a function as a series expansion according to an embodiment of the invention.
- FIG. 14 is a block diagram of a circuit that the tool of FIG. 7 generates for implementing the function of FIG. 13 as a series expansion according to another embodiment of the invention.
- FIG. 15 is a block diagram of a power-of-x term generator that the tool of FIG. 7 generates as a replacement for the power-of-x multipliers of FIGS. 13 and 14 according to an embodiment of the invention.
- FIG. 16 is a block diagram of a circuit that the tool of FIG. 7 generates for implementing another function as a series expansion according to an embodiment of the invention.
- FIG. 17 is a block diagram of a sign determiner from FIG. 16 according to an embodiment of the invention.
- a computer-based circuit design tool according to an embodiment of the invention is discussed below in conjunction with FIGS. 7-10.
- But first, an overview of concepts that are related to the design tool is presented in conjunction with FIGS. 1-6 according to an embodiment of the invention. An understanding of these concepts should facilitate the reader's understanding of the design tool.
- FIG. 1 is a schematic block diagram of a computing machine 10 , which has a peer-vector architecture according to an embodiment of the invention.
- In addition to a host processor 12, the peer-vector machine 10 includes a pipelined accelerator 14, which is operable to process at least a portion of the data processed by the machine 10. Therefore, the host processor 12 and the accelerator 14 are “peers” that can transfer data messages back and forth. Because the accelerator 14 includes hardwired logic circuits instantiated on one or more PLICs, it executes few, if any, program instructions, and thus typically performs mathematically intensive operations on data significantly faster than a bank of computer processors can for a given clock frequency.
- the machine 10 has the same abilities as, but can often process data faster than, a conventional processor-based computing machine. Furthermore, as discussed below and in U.S. Patent Publication No. 2004/0136241, which is incorporated by reference, providing the accelerator 14 with a communication interface that is compatible with the interface of the host processor 12 facilitates the design and modification of the machine 10 , particularly where the communication interface is an industry standard. And where the accelerator 14 includes multiple pipeline units ( FIG. 2 ), providing each of these units with this compatible communication interface facilitates the design and modification of the accelerator, particularly where the communication interface is an industry standard. Moreover, the machine 10 may also provide other advantages as described in the following other patent publications, which are incorporated by reference: 2004/0133763; 2004/0181621; 2004/0170070; and, 2004/0130927.
- the peer-vector computing machine 10 includes a processor memory 16 , an interface memory 18 , a bus 20 , a firmware memory 22 , an optional raw-data input port 24 , an optional processed-data output port 26 , and an optional router 31 .
- the host processor 12 includes a processing unit 32 and a message handler 34, and the processor memory 16 includes a processing-unit memory 36 and a handler memory 38, which respectively serve as both program and working memories for the processing unit and the message handler.
- the processor memory 36 also includes an accelerator-configuration registry 40 and a message-configuration registry 42 , which store respective configuration data that allow the host processor 12 to configure the functioning of the accelerator 14 and the structure of the messages that the message handler 34 sends and receives.
- the pipelined accelerator 14 includes at least one PLIC (FIG. 2) on which are disposed hardwired pipelines 44₁-44ₙ, which process respective data while executing few, if any, program instructions.
- the firmware memory 22 stores the configuration firmware for the PLIC(s) of the accelerator 14 . If the accelerator 14 is disposed on multiple PLICs, these PLICs and their respective firmware memories may be disposed on multiple circuit boards that are often called daughter cards or pipeline units ( FIG. 2 ).
- the accelerator 14 and pipeline units are discussed further in previously incorporated U.S. Patent Publication Nos. 2004/0136241, 2004/0181621, and 2004/0130927. The pipeline units are also discussed below in conjunction with FIGS. 2-4 .
- the pipelined accelerator 14 receives data from one or more software applications running on the host processor 12 , processes this data in a pipelined fashion with one or more logic circuits that execute one or more mathematical algorithms, and then returns the resulting data to the application(s).
- Because the logic circuits execute few, if any, software instructions, they often process data one or more orders of magnitude faster than the host processor 12.
- Because the logic circuits are instantiated on one or more PLICs, one can modify these circuits merely by modifying the firmware stored in the firmware memory 22; that is, one need not modify the hardware components of the accelerator 14 or the interconnections between these components.
- FIG. 2 is a diagram of a pipeline unit 50 of the pipelined accelerator 14 of FIG. 1 according to an embodiment of the invention.
- the unit 50 includes a circuit board 52 on which are disposed the firmware memory 22 , a platform-identification memory 54 , a bus connector 56 , a data memory 58 , and a PLIC 60 .
- the firmware memory 22 stores the configuration firmware that the PLIC 60 downloads to instantiate one or more logic circuits.
- the platform memory 54 stores a value that identifies the one or more platforms with which the pipeline unit 50 is compatible.
- a platform specifies a unique set of physical attributes that a pipeline unit may possess. Examples of these attributes include the number of external pins (not shown) on the PLIC 60 , the width of the bus connector 56 , the size of the PLIC, and the size of the data memory. Consequently, a pipeline unit 50 is compatible with a platform if the unit possesses all of the attributes that the platform specifies. So a pipeline unit 50 having a bus connector 56 with thirty-two bits is incompatible with a platform that specifies a bus connector with sixty-four bits. Some platforms may be compatible with the peer vector machine 10 ( FIG. 1 ), and others may be incompatible.
- the platform identifier stored in the memory 54 may allow the host processor 12 ( FIG. 1 ) to determine whether the pipeline unit 50 is compatible with the platforms supported by the machine 10 . And where the pipeline unit 50 is so compatible, the platform identifier may also allow the host processor 12 to determine how to configure the PLIC 60 or other portions of the pipeline unit.
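The compatibility rule above (a unit is compatible with a platform only if it possesses every attribute the platform specifies) can be sketched as a simple check. The attribute names, values, and platform identifiers below are hypothetical; the source gives only the thirty-two-bit versus sixty-four-bit bus-connector example.

```python
def is_compatible(unit, platform):
    """True if the unit possesses every attribute the platform specifies."""
    return all(
        name in unit and unit[name] == required
        for name, required in platform.items()
    )


def supported_platforms(unit, platforms):
    """Return the identifiers of all platforms the unit is compatible with."""
    return [pid for pid, spec in platforms.items() if is_compatible(unit, spec)]


# Hypothetical attribute names and values, for illustration only.
UNIT = {"bus_width_bits": 32, "plic_pins": 1020, "data_memory_mb": 64}
PLATFORMS = {
    "platform_a": {"bus_width_bits": 32, "plic_pins": 1020},
    "platform_b": {"bus_width_bits": 64, "plic_pins": 1020},
}

if __name__ == "__main__":
    print(supported_platforms(UNIT, PLATFORMS))  # ['platform_a']
```

A host could run a check of this kind after reading the platform identifier, then use the matching platform's attributes to decide how to configure the PLIC.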
- the bus connector 56 is a physical connector that interfaces the PLIC 60 , and perhaps other components of the pipeline unit 50 , with the pipeline bus 20 of FIG. 1 .
- the data memory 58 acts as a buffer for storing data that the pipeline unit 50 receives from the host processor 12 ( FIG. 1 ) and for providing this data to the PLIC 60 .
- the data memory 58 may also act as a buffer for storing data that the PLIC 60 generates for sending to the host processor 12 , or as a working memory for the hardwired pipelines 44 .
- Instantiated on the PLIC 60 are logic circuits that compose the hardwired pipeline(s) 44 and a hardware interface layer 62, which interfaces the hardwired pipelines to the external pins (not shown) of the PLIC 60, and which thus interfaces the pipelines to the pipeline bus 20 (via the connector 56), the firmware and platform-identification memories 22 and 54, and the data memory 58. Because the topology of the interface layer 62 is primarily dependent upon the attributes specified by the platform(s) with which the pipeline unit 50 is compatible, one can often modify the pipeline(s) 44 without modifying the interface layer.
- For example, the interface layer 62 provides a thirty-two-bit bus connection to the bus connector 56 regardless of the topology or other attributes of the pipeline(s) 44. Consequently, as discussed below in conjunction with FIGS. 7-10, an embodiment of the computer-based design tool allows one to design and debug the pipeline(s) 44 independently of the interface layer 62, and vice versa.
- the memory 54 may be omitted, and the platform identifier may be stored in the firmware memory 22, or provided by a jumper-configurable or hardwired circuit (not shown).
- FIG. 3 is a diagram of the hardware layers that compose the hardware interface layer 62 within the PLIC 60 of FIG. 2 according to an embodiment of the invention.
- the hardware interface layer 62 includes three layers of circuitry that are instantiated on the PLIC 60: an interface-adapter layer 70, a framework-services layer 72, and a communication layer 74, which is hereinafter called a communication shell.
- the interface-adapter layer 70 includes circuitry, e.g., buffers and latches, that interfaces the framework-services layer 72 to the external pins (not shown) of the PLIC 60 .
- the framework-services layer 72 provides a set of services to the hardwired pipeline(s) 44 via the communication shell 74 .
- the layer 72 may synchronize data transfer between the pipeline(s) 44 , the pipeline bus 20 ( FIG. 1 ), and the data memory 58 ( FIG. 2 ), and may control the sequence(s) in which the pipeline(s) operate.
- the communication shell 74 includes circuitry, e.g., latches, that interfaces the framework-services layer 72 to the pipeline(s) 44.
- alternate embodiments of the hardware-interface layer 62 are contemplated.
- Although the framework-services layer 72 is shown as isolating the interface-adapter layer 70 from the communication shell 74, the interface-adapter layer may, at least at some circuit nodes, be directly coupled to the communication shell.
- Likewise, although the communication shell 74 is shown as isolating the interface-adapter layer 70 and the framework-services layer 72 from the hardwired pipeline(s) 44, the interface-adapter layer or the framework-services layer may, at least at some circuit nodes, be directly coupled to the pipeline(s).
- FIG. 4 is a schematic block diagram of the circuitry that composes the interface-adapter layer 70 and the framework-services layer 72 of FIG. 3 according to an embodiment of the invention.
- a communication interface 80 and an optional industry-standard bus interface 82 compose the interface-adapter layer 70 , and a controller 84 , exception manager 86 , and configuration manager 88 compose the framework-services layer 72 .
- the communication interface 80 transfers data between a peer, such as the host processor 12 (FIG. 1) or another pipeline unit 50 (FIG. 2), and the firmware memory 22, the platform-identifier memory 54, the data memory 58, and the following components instantiated within the PLIC 60: the hardwired pipelines 44 (via the communication shell 74), the controller 84, the exception manager 86, and the configuration manager 88.
- the optional industry-standard bus interface 82 couples the communication interface 80 to the bus connector 56 .
- the interfaces 80 and 82 may be combined such that the functionality of the interface 82 is included within the communication interface 80 .
- the controller 84 synchronizes the hardwired pipelines 44₁-44ₙ and monitors and controls the sequence in which they perform the respective data operations in response to communications, i.e., “events,” from other peers.
- a peer such as the host processor 12 may send an event to the pipeline unit 50 via the pipeline bus 20 to indicate that the peer has finished sending a block of data to the pipeline unit and to cause the hardwired pipelines 44₁-44ₙ to begin processing this data.
- An event that includes data is typically called a message, and an event that does not include data is typically called a “door bell.”
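The distinction between messages and door bells, and the controller's role in starting the pipelines, can be sketched as follows. The `Event` and `Controller` classes, the buffering of message data, and the choice to start processing on a door bell are assumptions made for illustration; the source says only that an event can signal the end of a data block and cause the pipelines to begin processing.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Event:
    """Hypothetical event: a message carries data, a door bell does not."""
    sender: str
    data: Optional[Any] = None

    @property
    def kind(self):
        return "message" if self.data is not None else "door bell"


class Controller:
    """Buffers message data; a door bell starts the pipelines on the block."""

    def __init__(self, pipelines):
        self.pipelines = pipelines  # callables standing in for pipelines 44
        self.buffer = []

    def handle(self, event):
        if event.kind == "message":
            self.buffer.append(event.data)
            return None
        # Door bell: the peer has finished sending, so process the block.
        block, self.buffer = self.buffer, []
        return [pipeline(block) for pipeline in self.pipelines]


if __name__ == "__main__":
    controller = Controller([sum, max])
    controller.handle(Event("host", 2))
    controller.handle(Event("host", 5))
    print(controller.handle(Event("host")))  # [7, 5]
```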
- the exception manager 86 monitors the status of the hardwired pipelines 44₁-44ₙ, the communication interface 80, the communication shell 74, the controller 84, and the bus interface 82 (if present), and reports exceptions to the host processor 12 (FIG. 1). For example, if a buffer (not shown) in the communication interface 80 overflows, then the exception manager 86 reports this to the host processor 12.
- the exception manager may also correct, or attempt to correct, the problem giving rise to the exception. For example, for an overflowing buffer, the exception manager 86 may increase the size of the buffer, either directly or via the configuration manager 88 as discussed below.
- the configuration manager 88 sets the “soft” configuration of the hardwired pipelines 44₁-44ₙ, the communication interface 80, the communication shell 74, the controller 84, the exception manager 86, and the interface 82 (if present) in response to soft-configuration data from the host processor 12 (FIG. 1).
- the “hard” configuration of a component within the PLIC 60 denotes the actual instantiation, on the transistor and circuit-block level, of the component, whereas the soft configuration denotes the physical parameters (e.g., data width, table size) of the instantiated component.
- soft-configuration data is similar to the data that one can load into a register of a processor (not shown in FIG. 4 ) to set the operating mode (e.g., burst-memory mode) of the processor.
- the host processor 12 may send to the PLIC 60 soft-configuration data that causes the configuration manager 88 to set the number and respective priority levels of queues (not shown) within the communication interface 80 .
- the exception manager 86 may also send soft-configuration data that causes the configuration manager 88 to, e.g., increase the size of an overflowing buffer in the communication interface 80 .
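The interplay between the exception manager and the configuration manager described above can be modeled in software as a rough sketch. The class and method names, the doubling policy, and the retry after resizing are all assumptions made for illustration; the source states only that the exception manager reports the overflow and may cause the configuration manager to increase the buffer size.

```python
class CommunicationInterface:
    """Toy model of a component with one soft-configuration parameter."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size  # soft configuration: changeable at run time
        self.buffer = []

    def push(self, item):
        if len(self.buffer) >= self.buffer_size:
            raise OverflowError("buffer full")
        self.buffer.append(item)


class ConfigurationManager:
    """Applies soft-configuration data (name/value pairs) to a component."""

    def apply(self, component, soft_config):
        for name, value in soft_config.items():
            setattr(component, name, value)


class ExceptionManager:
    """Reports exceptions and, as an assumed policy, doubles the buffer."""

    def __init__(self, config_manager):
        self.config_manager = config_manager
        self.reports = []  # stands in for reports sent to the host processor

    def guarded_push(self, interface, item):
        try:
            interface.push(item)
        except OverflowError as exc:
            self.reports.append(str(exc))
            new_size = interface.buffer_size * 2  # assumed resize policy
            self.config_manager.apply(interface, {"buffer_size": new_size})
            interface.push(item)  # retry once the buffer has grown


if __name__ == "__main__":
    iface = CommunicationInterface(buffer_size=1)
    manager = ExceptionManager(ConfigurationManager())
    manager.guarded_push(iface, "a")
    manager.guarded_push(iface, "b")
    print(iface.buffer_size, iface.buffer, manager.reports)
```

Only the `buffer_size` parameter changes here; the component itself is not rebuilt, which mirrors the distinction between soft and hard configuration.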
- the communication interface 80, optional industry-standard bus interface 82, controller 84, exception manager 86, and configuration manager 88 are further discussed in previously incorporated U.S. Patent Publication No. 2004/0136241.
- the pipeline unit 50 may include multiple PLICs.
- the pipeline unit 50 may include two interconnected PLICs, where the circuitry that composes the interface-adapter layer 70 and framework-services layer 72 is instantiated on one of the PLICs, and the circuitry that composes the communication shell 74 and the hardwired pipelines 44 is instantiated on the other PLIC.
- FIG. 5 is a diagram of a hardware-description file 100 from which a conventional PLIC synthesizer and router tool (not shown) can generate the configuration firmware for the PLIC 60 of FIGS. 2-4 according to an embodiment of the invention.
- the hardware-description file 100 includes templates that are written in a conventional hardware description language (HDL) such as Verilog® HDL.
- the top-down structure of the file 100 resembles the top-down structure of software source code that incorporates software objects.
- Such a top-down structure for software source code provides at least two advantages. First, it allows a programmer to avoid writing and debugging source code for a function when a software object that performs the function has already been written and debugged.
- the top-down structure of the file 100 provides similar advantages. For example, it allows one to incorporate in the file 100 existing templates that define an already-debugged hardware-interface layer 62 ( FIGS. 2-3 ). Furthermore, it allows one to change an existing hardwired pipeline 44 or to add to a circuit a new hardwired pipeline 44 with little or no rewriting and debugging of the templates that define the layer 62 .
- the hardware-description file 100 includes a top-level template 101 , which includes respective top-level definitions 102 , 104 , and 106 of the interface-adapter layer 70 , the framework-services layer 72 , and the communication shell 74 (collectively the hardware-interface layer 62 ) of the PLIC 60 ( FIGS. 2-4 ).
- the template 101 also defines the connections between the external pins (not shown) of the PLIC 60 and the interface-adapter 70 (and in some cases the framework-services layer 72 ), and also defines the connections between the framework-services layer (and in some cases the interface-adapter layer) and the communication shell 74 .
- the top-level definition 102 of the interface-adapter layer 70 incorporates an interface-adapter-layer template 108 , which further defines the portions of the interface-adapter layer defined by the top-level definition 102 .
- for example, suppose the top-level definition 102 defines a data-input buffer (not shown) in terms of its input and output nodes; that is, the definition treats the data-input buffer as a functional block having defined input and output nodes.
- the template 108 defines the circuitry that composes this functional buffer block, and defines the connections between this circuitry and the buffer input nodes and output nodes recited in the top-level definition 102 .
- the template 108 may incorporate one or more lower-level templates 109 that further define the data buffer or other components of the interface-adapter layer 70 recited in the template 108 .
- these one or more lower-level templates 109 may each incorporate one or more even lower-level templates (not shown), and so on, until all portions of the interface-adapter layer 70 are defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool (not shown) recognizes.
- the top-level definition 104 of the framework-services layer 72 incorporates a framework-services-layer template 110 , which further defines the portions of the framework-services layer defined by the definition 104 .
- a framework-services-layer template 110 defines the circuitry that composes this counter, and defines the connections between this circuitry and the counter input and output nodes recited by the top-level definition 104 .
- the template 110 may incorporate a hierarchy of one or more lower-level templates 111 and even lower-level templates (not shown), and so on, such that all portions of the framework-services layer 72 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes.
- the template 110 may incorporate a lower-level template 111 that defines the circuitry within the selector circuit and defines the connections between this circuitry and the selector circuit's input and output nodes defined by the template 110 .
- the top-level definition 106 of the communication shell 74 incorporates a communication-shell template 112 , which further defines the portions of the communication shell defined by the definition 106 and which also includes a top-level definition 113 of the hardwired pipeline(s) 44 disposed within the communication shell.
- the definition 113 defines the connections between the communication shell 74 and the hardwired pipeline(s) 44 .
- the top-level definition 113 of the hardwired pipeline(s) 44 incorporates one or more hardwired-pipeline templates 114 , which further define the portions of the hardwired pipeline(s) 44 defined by the definition 113 .
- the template or templates 114 may each incorporate a hierarchy of one or more lower-level templates 115 and even lower-level templates (not shown) such that all portions of the respective pipeline(s) 44 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes.
- the communication-shell template 112 may incorporate a hierarchy of one or more lower-level templates 116 and even lower-level templates (not shown) such that all portions of the communication shell 74 other than the hardwired pipeline(s) 44 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes.
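- The template hierarchy described above can be pictured as a tree that is expanded until only recognized circuit components remain. The following is a minimal Python sketch of that idea, not HDL and not the tool's actual implementation; the `Template` class, the `PRIMITIVES` set, and all example names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical primitives that a synthesizer and router tool is assumed to recognize.
PRIMITIVES = {"flip_flop", "logic_gate"}


@dataclass
class Template:
    name: str
    # Each child is either a lower-level Template or the name of a primitive.
    children: list = field(default_factory=list)


def expand(node):
    """Return the primitive components reached by descending the hierarchy."""
    if isinstance(node, str):
        if node not in PRIMITIVES:
            raise ValueError(f"unrecognized component: {node}")
        return [node]
    leaves = []
    for child in node.children:
        leaves.extend(expand(child))
    return leaves


# Illustrative hierarchy loosely mirroring templates 101 -> 108 -> 109.
t109 = Template("data_buffer_109", ["flip_flop", "flip_flop", "logic_gate"])
t108 = Template("interface_adapter_108", [t109, "logic_gate"])
t101 = Template("top_level_101", [t108])

print(expand(t101))
```

- In this sketch, replacing `t109` with a different lower-level template changes the expanded result without any change to `t108` or `t101`, which is the reuse property that the top-down structure is meant to provide.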
- a configuration template 118 provides definitions for one or more parameters having values that one can set to configure the circuitry that the templates 101 , 108 , 110 , 112 , 114 and lower-level templates 109 , 111 , 115 , and 116 define.
- the bus interface 82 of the interface-adapter layer 70 ( FIG. 4 ) is configurable to have either a thirty-two-bit or a sixty-four-bit interface with the bus connector 56 .
- the configuration template 118 defines a parameter BUS-WIDTH, the value of which determines the width of the interface between the interface 82 and the connector 56.
- Other parameters that may be configurable include the depth of a first-in-first-out (FIFO) data buffer (not shown) disposed within the framework-services layer 72 ( FIGS. 2-4 ), the lengths of messages received and transmitted by the interface-adapter layer 70 , and the precision and data structure (e.g., integer, floating-point) of the hardwired pipeline(s) 44 .
- One or more of the templates 101 , 108 , 110 , 112 , 114 and the lower-level templates incorporate the parameters defined in the configuration template 118 .
- the PLIC synthesizer and router tool (not shown) configures the interface-adapter layer 70 , the framework-services layer 72 , the communication shell 74 , and the hardwired pipeline(s) 44 ( FIGS. 3-4 ) according to the values in the template 118 during the synthesis of this circuitry. Consequently, to reconfigure the circuit parameters represented by the parameters in the configuration template 118 , one need only modify the values of these parameters in the template 118 , and then rerun the synthesizer and router tool on the file 100 .
- alternatively, if the parameters in the configuration template 118 can be sent to the PLIC as soft-configuration data after instantiation of the circuit, then one can modify the corresponding circuit parameters by merely modifying the soft-configuration data. Therefore, according to this alternative, one may avoid rerunning the synthesizer and router tool on the file 100.
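- The role of the configuration template can be illustrated with a small Python sketch in which every derived property is computed from one table of settable values. This is an assumption-laden illustration only: the `CONFIG` dictionary, the `FIFO_DEPTH` value of 16, and the function `describe_bus_interface` are hypothetical, and only the thirty-two-bit and sixty-four-bit widths come from the text.

```python
# Hypothetical stand-in for the configuration template 118: one place for settable values.
CONFIG = {"BUS_WIDTH": 32, "FIFO_DEPTH": 16}

# The two bus widths named in the text.
ALLOWED_BUS_WIDTHS = (32, 64)


def describe_bus_interface(config):
    """Derive interface properties from the configuration values."""
    width = config["BUS_WIDTH"]
    if width not in ALLOWED_BUS_WIDTHS:
        raise ValueError(f"unsupported bus width: {width}")
    return {"data_lines": width, "bytes_per_transfer": width // 8}


print(describe_bus_interface(CONFIG))
# Reconfiguring means editing only the value and regenerating.
print(describe_bus_interface({**CONFIG, "BUS_WIDTH": 64}))
```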
- templates that do not incorporate settable parameters such as those provided by the configuration template 118 are sometimes called modules or entities, and are typically lower-level templates that include Boolean expressions that a synthesizer and router tool (not shown) converts into circuitry for implementing the expressions.
- the hardware-description file 100 may define circuitry for instantiation on an ASIC.
- FIG. 6 is a block diagram of a library 120 that stores PLIC circuit templates, such as the templates 101 , 108 , 110 , 112 , and 114 (and any existing lower-level templates) of FIG. 5 , according to an embodiment of the invention.
- the library 120 has m+1 sections: m sections 122 1 - 122 m for the respective m platforms that the library supports, and a section 124 for the hardwired-pipelines 44 ( FIGS. 2-4 ) that the library supports.
- the library section 122 1 is discussed in detail, it being understood that the other library sections 122 2 - 122 m are similar.
- the library section 122 1 includes only one interface-adapter-layer template 108 1 and only one framework-services-layer template 110 1 .
- the library section 122 1 would include multiple interface-adapter- and framework-services-layer templates 108 and 110 .
- the library section 122 1 also includes n communication-shell templates 112 1,1 - 112 1,n, which respectively correspond to the hardwired-pipeline templates 114 1 - 114 n in the library section 124 .
- the communication shell 74 interfaces a hardwired pipeline or hardwired-pipelines 44 to the framework-services layer 72 . Because each hardwired pipeline 44 is different and typically has different interface specifications, the communication shell 74 is typically adapted for each hardwired pipeline. Consequently, in this embodiment, one provides design adjustments to create a unique version of the communication shell 74 for each hardwired pipeline 44 . The designer provides these design adjustments by writing a unique communication-shell template 112 for each hardwired pipeline.
- the library section 122 1 includes a configuration template 118 1 , which defines configuration constants having designer-selectable values as discussed above in conjunction with the configuration template 118 of FIG. 5 .
- each template within the library section 122 1 includes, or is associated with, a respective description 126 1 - 134 1 .
- the descriptions 126 1 - 132 1,n describe the operational and other parameters of the circuitry that the respective templates 101 1 , 108 1 , 110 1 , and 112 1,1 - 112 1,n define.
- the description 134 1 describes the settable parameters in the configuration template 118 1 , the values that these parameters can have, and the meanings of these values.
- the design tool discussed below in conjunction with FIGS. 7-11 uses the descriptions 126 1 - 134 1 to design and simulate a circuit that includes a combination of the hardwired pipelines 44 1 - 44 n , which are respectively defined by the templates 114 1 - 114 n .
- Examples of parameters that the descriptions 126 1 - 132 1,n may describe include the width of the data bus and the depths of buffers that the circuit defined by the corresponding template includes, the latency of the circuit, and the precision of the values received and generated by the circuit.
- Each of the descriptions 126 1 - 134 1 may be embedded within the respective template 101 1 , 108 1 , 110 1 , 112 1,1 - 112 1,n , and 118 1 to which it corresponds.
- the description 128 1 may be embedded within the template 108 1 as extensible markup language (XML) tags or comments that are readable by both a human and the tool discussed below in conjunction with FIGS. 7-11 .
- each description 126 1 - 134 1 may be disposed in a separate file that is linked to the template to which the description corresponds, and this file may be written in a language other than XML.
- the description 126 1 may be disposed in a file that is linked to the top-level template 101 1 .
- the design tool discussed below in conjunction with FIGS. 7-11 may use the description 136 1 to determine which platforms the library 120 supports. Examples of parameters that the description 136 1 may describe include 1) for each interface, the message specification, which lists the transmitted variables and the constraints for those variables, and 2) a behavior specification and any behavior constraints. Messages that the host processor 12 ( FIG. 1 ) sends to the pipeline units 50 ( FIG. 2 ) and that the pipeline units send among themselves are further discussed in previously incorporated U.S. Patent Publication No. 2004/0181621.
- Examples of other parameters that the description 136 1 may describe include the size and resources (e.g., the number of multipliers and the amount of available memory) of the PLIC 60 ( FIGS. 2-4 ). Furthermore, the platform description 136 1 may be written in XML or in another language.
- the section 124 of the library 120 includes n hardwired-pipeline templates 114 1 - 114 n , which each define a respective hardwired pipeline 44 1 - 44 n ( FIGS. 2-4 ).
- because the templates 114 1 - 114 n are platform independent (the corresponding communication-shell templates 112 m,1 - 112 m,n define the specified interface to the interface-adapter and framework-services layers 70 and 72 of FIGS. 3-4 ), the library 120 stores only one template 114 for each hardwired pipeline 44 ( FIGS. 2-4 ). That is, each hardwired pipeline 44 does not require a separate template 114 for each platform that the library 120 supports.
- an advantage of this top-down design is that one need only create a single template 114 to define a hardwired pipeline 44 , not m templates.
- each hardwired-pipeline template 114 includes, or is associated with, a respective description 138 1 - 138 n , which describes the parameters of the hardwired-pipeline 44 that the template defines.
- the design tool discussed below in conjunction with FIGS. 7-11 uses the descriptions 138 to design and simulate a circuit that includes a combination of the hardwired pipelines 44 1 - 44 n , which are respectively defined by the templates 114 1 - 114 n .
- parameters that the descriptions 138 1 - 138 n may describe include the type (e.g., floating point or integer) and precision of the data that the corresponding hardwired pipeline 44 can receive and generate, and the latency of the pipeline.
- each of the descriptions 138 1 - 138 n may be embedded within the respective template 114 1 - 114 n to which the description corresponds as, e.g., XML tags, or may be disposed in a separate file that is linked to the template to which the description corresponds.
- each library section 122 1 - 122 m may include a single description that describes all of the templates within that library section.
- this single description may be embedded within or linked to the top-level template 101 or the configuration template 118 .
- although each library section 122 1 - 122 m is described as including a respective communication-shell template 112 for each hardwired-pipeline template 114 in the library section 124 , each section 122 may include fewer communication-shell templates, at least some of which are compatible with, and thus correspond to, more than one pipeline template 114 .
- each library section 122 1 - 122 m may include only a single communication-shell template 112 , which is compatible with all of the hardwired-pipeline templates 114 in the library section 124 .
- the library section 124 may include respective versions of each pipeline template 114 for each communication-shell template 112 in the library sections 122 1 - 122 m .
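- The organization of the library into platform sections and a single pipeline section can be sketched as a pair of Python data classes. This is only an illustrative model under stated assumptions: the class names `PlatformSection` and `Library`, the method `templates_for`, and the string identifiers are hypothetical, and the sketch follows the embodiment in which each platform section holds one communication-shell template per pipeline template.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformSection:
    """One section 122: templates specific to a single platform."""
    interface_adapter: str
    framework_services: str
    config: str
    comm_shells: dict = field(default_factory=dict)  # pipeline name -> shell template


@dataclass
class Library:
    platforms: dict = field(default_factory=dict)  # platform name -> PlatformSection
    pipelines: dict = field(default_factory=dict)  # pipeline name -> template (section 124)

    def templates_for(self, platform, pipeline):
        """Collect the templates needed to place one pipeline on one platform."""
        section = self.platforms[platform]
        return [
            section.interface_adapter,
            section.framework_services,
            section.comm_shells[pipeline],
            self.pipelines[pipeline],
        ]


lib = Library()
lib.pipelines = {"sqrt": "114_sqrt", "add": "114_add"}
for p in ("platform_1", "platform_2"):
    lib.platforms[p] = PlatformSection(
        interface_adapter=f"108_{p}",
        framework_services=f"110_{p}",
        config=f"118_{p}",
        comm_shells={name: f"112_{p}_{name}" for name in lib.pipelines},
    )

print(lib.templates_for("platform_2", "sqrt"))
# With m platforms and n pipelines, only n pipeline templates are stored, not m * n.
print(len(lib.pipelines))
```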
- FIG. 7 is a block diagram of a circuit-design system 150 , which includes a computer-based software tool 152 for designing a circuit using templates from the library 120 of FIG. 6 according to an embodiment of the invention.
- the tool 152 allows one to design a circuit that includes a combination of one or more previously designed and debugged hardware-interface layers 62 ( FIG. 2 ) and hardwired pipelines 44 ( FIGS. 2-4 ). Because another has already tested and debugged the one or more layers 62 and pipelines 44 , the tool 152 may significantly decrease the time required for one to design such a combination circuit as compared to a conventional design progression.
- the tool 152 allows a designer to define the circuit with an expression of conventional mathematical symbols, where the expression defines the algorithm; consequently, one having little or no experience in circuit design can use the tool to design a circuit for executing an algorithm.
- the system 150 includes a processor (not shown) for executing the software code that composes the tool 152 . Consequently, in response to the code, the processor performs the functions that are attributed to the tool 152 in the discussion below. But for clarity of explanation, the tool 152 , not the processor, is described as performing the actions.
- the system 150 includes an input device 154 , a display device 155 , and the library 120 of FIG. 6 .
- the input device 154 , which may include a keyboard and a mouse, allows one to provide to the tool 152 information that describes an algorithm and that describes a circuit for executing the algorithm. Such information may include an expression of mathematical symbols, circuit parameters (e.g., buffer width, latency), operation exceptions (e.g., a divide by zero), and the platform on which one wishes to instantiate the circuit.
- the display device 155 displays the input information and other information.
- the library 120 includes the templates that the tool 152 uses to build the circuit and to generate a file that defines the circuit.
- the tool 152 includes a symbolic-math front end 156 , an interpreter 158 , a generator 160 for generating a file 162 that defines a circuit, and a simulator 164 .
- the front end 156 receives from the input device 154 the mathematical expression that defines the algorithm that the circuit is to execute and other design information, and converts this information into a form that is readable by the interpreter 158 .
- the front end 156 includes a web browser that accepts XML with a schema for Math Markup Language (MathML).
- MathML is a software standard that allows one to enter expressions using conventional mathematical symbols.
- the schema of MathML is a conventional plug in that imparts to a web browser this same ability, i.e., the ability to enter expressions using mathematical symbols.
- the front end 156 may utilize another technique for allowing one to define a circuit using a mathematical expression.
- an example of another such technique is the technique used by the conventional software mathematical-expression solver MathCAD.
- one may enter the identity of a platform or pipeline accelerator 14 ( FIG. 1 ) on which he wants the circuit instantiated, and may enter test data with which the simulator 164 will simulate the operation of the circuit.
- one may enter valid-range constraints for any variables within the entered mathematical expression and constraints on execution of the expression, and may specify the action(s) to be taken if the constraints are violated.
- the front end 156 then converts all of the entered information into a format, such as HDL, that is compatible with the interpreter 158 . Moreover, as discussed above, the front end 156 may cause the device 155 to display the input information and other related information. For example, the front end 156 may cause the device 155 to display the mathematical expression that the designer enters to define the algorithm to be executed by the circuit.
- the interpreter 158 parses the information from the front end 156 and determines: 1) whether the library 120 includes templates 114 ( FIG. 6 ) defining hardwired pipelines 44 ( FIGS. 2-4 ) that, when combined, can execute the algorithm entered by the designer, and 2) if the answer to (1) is “yes,” which, if any, available pipeline accelerators 14 ( FIG. 1 ) described by the description 140 in the library 120 have sufficient resources to instantiate a circuit that can execute the algorithm. For example, suppose the algorithm includes the mathematical operation $\sqrt{v}$. If the library 120 does not include a template 114 ( FIG. 6 ) defining a hardwired pipeline 44 ( FIGS. 2-4 ) that can execute this operation, then the interpreter 158 determines that the tool 152 cannot generate a file 162 that defines a circuit for executing the algorithm.
- Furthermore, suppose that the circuit for executing the algorithm requires the resources of at least five PLICs 60 ( FIGS. 2-4 ). If the description 140 indicates that the available accelerators 14 each have only three pipeline units 50 ( FIG. 2 ), and thus each have only three PLICs 60 , then the interpreter 158 determines that even though the tool 152 may be able to generate a file 162 that defines a circuit for executing the algorithm, one cannot implement this circuit on an available accelerator.
- the interpreter 158 makes a similar determination if the designer indicates that he wants the algorithm executed by a circuit having a sixty-four-bit bus width, but the available platforms support only a thirty-two-bit bus width. In situations where the interpreter 158 determines that the tool 152 cannot generate a circuit for executing the desired algorithm or that one cannot implement the circuit on an existing platform and/or accelerator 14 , the interpreter 158 causes the device 155 to display an appropriate error message (e.g., “no library template for instantiating $\sqrt{v}$,” “insufficient PLIC resources,” “bus-width not supported”).
- the interpreter 158 determines whether the circuit can be instantiated on the identified platform or accelerator. But if the circuit cannot be so instantiated, the interpreter 158 may determine that the circuit can be instantiated on another platform or accelerator, and thus may so inform the designer with an appropriate message via the display device 155 . This allows the designer the choice of instantiating the circuit on another platform or accelerator 14 .
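- The feasibility determinations attributed to the interpreter 158 can be sketched as a function that collects error messages. This is a hedged illustration rather than the tool's actual logic: the function `check_feasibility`, its parameters, and the operation names are hypothetical, while the three message strings and the five-versus-three PLIC example are taken from the text.

```python
def check_feasibility(required_ops, library_ops, plics_needed, plics_available,
                      bus_width, supported_widths):
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    for op in required_ops:
        if op not in library_ops:
            errors.append(f"no library template for instantiating {op}")
    if plics_needed > plics_available:
        errors.append("insufficient PLIC resources")
    if bus_width not in supported_widths:
        errors.append("bus-width not supported")
    return errors


# Mirrors the examples in the text: no square-root template, five PLICs needed
# but only three available, and a 64-bit bus requested on a 32-bit platform.
print(check_feasibility(["sqrt"], {"add", "mul"}, 5, 3, 64, {32}))
# A request that passes all three checks.
print(check_feasibility(["add"], {"add", "mul"}, 1, 3, 32, {32}))
```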
- if the interpreter 158 determines that the library 120 includes a sufficient number of hardwired-pipeline templates 114 ( FIG. 6 ) to define a circuit that can execute the desired algorithm, and also determines that the circuit can be instantiated on an available platform and accelerator 14 ( FIG. 1 ), then the interpreter provides to the file generator 160 the identities of the hardwired-pipeline templates 114 that correspond to portions of the algorithm.
- the file generator 160 combines the hardwired pipelines 44 ( FIGS. 2-4 ) defined by the identified hardwired-pipeline templates 114 such that the combination forms a circuit that can execute the algorithm.
- the generator 160 then generates the file 162 , which defines the circuit for executing the algorithm in terms of the hardwired pipelines 44 ( FIGS. 2-4 ) and the hardware-interface layers 62 ( FIG. 2 ) that compose the circuit, the PLIC(s) 60 ( FIGS. 2-3 ) on which the pipelines are disposed, and the interconnections between the pipelines (if multiple pipelines on a PLIC) and/or between the PLICs (if the pipelines are disposed on more than one PLIC).
- the host processor 12 can use the file 162 to instantiate on the pipeline accelerator 14 ( FIG. 1 ) the defined circuit as discussed in previously incorporated U.S. patent application Ser. No. (Attorney Docket No. 1934-25-3).
- the host processor 12 may instantiate some or all portions of the defined circuit in software executed by the processing unit 32 .
- the simulator 164 receives the file 162 from the generator 160 and receives from the front end 156 designer-entered test data, such as a test vector, designer-entered constraint data, and a designer-entered exception-handling protocol, and then simulates operation of the circuit defined by the file 162 .
- the simulator 164 also gathers parameter information (e.g., precision, latency) from the description files 138 ( FIG. 6 ) that correspond to the hardwired-pipeline templates 114 that define the pipelines 44 that compose the circuit.
- the simulator 164 may retrieve this parameter information directly from the library 120 , or the generator 160 may include this parameter information in the file 162 .
- FIG. 8 illustrates the parsing of a symbolic mathematical expression by the interpreter 158 according to an embodiment of the invention.
- the syntax of the design language is the same as that used by mathematicians for writing algebraic equations.
- the explanations that follow show how a symbolic mathematical expression is a sufficient syntax for defining the hardwired pipelines 44 from a simple set of circuit primitives.
- FIG. 9 illustrates a table of hardwired-pipeline templates 114 , which correspond to the hardwired pipelines 44 ( FIGS. 2-4 ) that the interpreter 158 ( FIG. 7 ) identifies for executing portions of the parsed algorithm ( FIG. 8 ) according to an embodiment of the invention.
- $y = \sqrt{x^{4}\cos(z) + z^{3}\sin(x)}$ (2) Also suppose that x, y, and z are thirty-two-bit floating-point values.
- the designer enters equation (2) into the front end 156 of the tool 152 by entering the following sequence of mathematical symbols: “√”, “$x^{4}$”, “·”, “cos(z)”, “+”, “$z^{3}$”, “·”, and “sin(x)”.
- the designer also enters information specifying the input and output message specifications, for example indicating that x, y, and z are thirty-two-bit floating-point values.
- the designer may also enter information indicating desired operating parameters, such as the desired latency, in clock cycles, from inputs x and z to output y, and the desired types and precision of any intermediate values, such as cos(z) and sin(x), generated during the calculation of y.
- the designer may enter information that identifies a desired platform or pipeline accelerator 14 ( FIG. 1 ) on which he wants the circuit instantiated. Moreover, the designer may specify the accuracy of any mathematical approximations that the tool 152 may make. For example, if the tool 152 approximates cos(z) using a Taylor series expansion, then by specifying the accuracy of this approximation, the designer effectively specifies the number of terms needed in the expansion. Alternatively, the designer may directly specify the number of terms in the expansion. The implementation of a function as a Taylor series expansion is further described below in conjunction with FIGS. 13-17 .
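- The relationship between a specified accuracy and the number of Taylor-series terms can be illustrated for cos(z). The sketch below uses the standard Lagrange remainder bound for the Maclaurin series of the cosine, $|R_n| \le |z|^{2n}/(2n)!$ after n terms; this is a textbook bound adopted here as an assumption, since this passage does not state how the tool 152 derives the term count. The function names and the input range are hypothetical.

```python
import math


def cos_terms_needed(z_max, tolerance):
    """Smallest number of terms n with |z_max|^(2n) / (2n)! <= tolerance."""
    n = 1
    while abs(z_max) ** (2 * n) / math.factorial(2 * n) > tolerance:
        n += 1
    return n


def cos_taylor(z, n_terms):
    """Evaluate the first n_terms of the Maclaurin series for cos(z)."""
    return sum((-1) ** k * z ** (2 * k) / math.factorial(2 * k)
               for k in range(n_terms))


# Assumed input range |z| <= pi/2 and an assumed accuracy of 1e-6.
n = cos_terms_needed(math.pi / 2, 1e-6)
approx = cos_taylor(1.0, n)
print(n, abs(approx - math.cos(1.0)) <= 1e-6)
```

- Under these assumptions, tightening the tolerance raises the term count, which in a hardware realization would correspond to more multiply and add stages.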
- the front end 156 converts these mathematical symbols and the other information into a format compatible with the interpreter 158 if this information is not already in a compatible format.
- the interpreter 158 determines whether any of the hardwired-pipeline templates 114 in the library 120 defines a hardwired pipeline 44 that can solve for y in equation (2) within the specified behavior and operating parameters and that can be instantiated within the desired platform and on the desired pipeline accelerator 14 ( FIG. 1 ).
- the interpreter 158 informs the designer, via the display device 155 , that a conventional FPGA synthesizing and routing tool can generate firmware for instantiating this hardwired pipeline 44 from the identified template 114 , the corresponding communication-shell template 112 , and the corresponding top-level template 101 .
- the interpreter 158 parses the equation (2) into portions, and determines whether the library includes templates 114 that define hardwired pipelines 44 for executing these portions within the specified behavior, operating parameters, and platform and on the specified pipeline accelerator 14 ( FIG. 1 ).
- the interpreter 158 parses the equation (2) according to a top-down parsing sequence as discussed below.
- this top-down parsing sequence corresponds to the known algebraic laws for the order of operations.
- the interpreter 158 parses the equation (2) into the following two portions: “√”, which is portion 170 in FIG. 8 , and “$x^{4}\cos(z) + z^{3}\sin(x)$”, which is portion 172 .
- if the interpreter 158 determines that the library 120 includes at least two hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing the portions 170 and 172 of equation (2), then the interpreter passes the identity of these templates to the file generator 160 .
- the interpreter 158 determines that although the library 120 includes a hardwired-pipeline template 114 that defines a pipeline 44 for executing the square-root operation 170 of equation (2), the library includes no hardwired-pipeline template that defines a pipeline for executing the portion 172 .
- the interpreter 158 parses the portion 172 of equation (2). Specifically, the interpreter 158 parses the portion 172 into the following three respective portions 174 , 176 , and 178 : “$x^{4}\cos(z)$”, “+”, and “$z^{3}\sin(x)$”.
- if the interpreter 158 determines that the library 120 includes at least three hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing the portions 174 , 176 , and 178 of equation (2), then the interpreter passes the identity of these templates to the file generator 160 .
- the interpreter 158 determines that although the library 120 includes a hardwired-pipeline template 114 that defines a hardwired pipeline 44 for executing the summing operation 176 of equation (2), the library includes no templates 114 that define hardwired pipelines for executing the portions 174 or 178 .
- the interpreter 158 parses the portions 174 and 178 of equation (2). Specifically, the interpreter 158 parses the portion 174 into three portions 180 (“$x^{4}$”), 182 (“·”), and 184 (“cos(z)”), and parses the portion 178 into three portions 186 (“$z^{3}$”), 188 (“·”), and 190 (“sin(x)”).
- if the interpreter 158 determines that the library 120 does not include hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing each of the portions 180 , 182 , 184 , 186 , 188 , and 190 , then the interpreter displays via the device 155 an error message indicating that the library does not support a circuit that can solve for y in equation (2).
- the library 120 includes hardwired-pipeline templates 114 that provide the primitive operations for multiplication and for raising variables to a power (e.g., cubing a value by using two multipliers in sequence) for single- or double-precision floating-point data types, and for data-type conversion.
- the tool 152 recognizes common factors, for example that x is a factor of $x^{3}$ if $\sin(x^{3})$ were needed instead of sin(x), and generates circuitry to provide these common factors from chained multipliers.
- the interpreter 158 determines that the library 120 includes hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing each portion 180 , 182 , 184 , 186 , 188 , and 190 of equation (2).
- the interpreter 158 provides to the file generator 160 the identities of all the hardwired-pipeline templates 114 that define the hardwired-pipelines 44 for executing the following eight portions of equation (2): 170 (“√”), 176 (“+”), 180 (“$x^{4}$”), 182 (“·”), 184 (“cos(z)”), 186 (“$z^{3}$”), 188 (“·”), and 190 (“sin(x)”).
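- The top-down parsing sequence described above can be sketched as a recursive search over an expression tree: a single template is used for a whole subtree when one exists, and otherwise the subtree is split into its operator and operands. The Python below is a minimal sketch under stated assumptions; the tuple encoding of equation (2), the operation names such as `pow4` and `mul`, the dictionary-based library, and the functions `to_text` and `resolve` are all hypothetical and are not the interpreter's actual data structures.

```python
def to_text(node):
    """Canonical text for a subtree, used to look up a whole-subtree template."""
    if isinstance(node, str):
        return node
    op, *args = node
    return f"{op}({', '.join(to_text(a) for a in args)})"


def resolve(node, library):
    """Top-down: use one template for the whole subtree if available, else split."""
    if isinstance(node, str):  # a bare variable needs no pipeline
        return []
    whole = to_text(node)
    if whole in library:  # a single pipeline covers this entire subtree
        return [library[whole]]
    op, *args = node
    if op not in library:
        raise LookupError(f"no library template for instantiating {op}")
    found = [library[op]]
    for arg in args:
        found.extend(resolve(arg, library))
    return found


# Equation (2): y = sqrt(x^4 * cos(z) + z^3 * sin(x))
eq2 = ("sqrt", ("add", ("mul", ("pow4", "x"), ("cos", "z")),
                ("mul", ("pow3", "z"), ("sin", "x"))))

# Hypothetical library that holds only primitive operations.
library = {op: f"template_{op}" for op in
           ("sqrt", "add", "mul", "pow4", "pow3", "cos", "sin")}

templates = resolve(eq2, library)
print(len(templates), len(set(templates)))
```

- With this hypothetical library the search reaches eight portions served by seven distinct templates, because the multiplication template is used twice. Removing any primitive from the library causes `resolve` to raise an error, which corresponds to the error message displayed when the library cannot support the circuit.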
- the file generator 160 generates a table 192 ( FIG. 9 ) of the hardwired-pipeline templates 114 identified by the interpreter 158 , and displays this table via the device 155 .
- in a first column 194 , the table 192 lists the portions 170 (“√”), 176 (“+”), 180 (“$x^{4}$”), 182 (“·”), 184 (“cos(z)”), 186 (“$z^{3}$”), 188 (“·”), and 190 (“sin(x)”) of equation (2).
- in a second column 196 , the table 192 lists, for each portion, the hardwired-pipeline template or templates 114 that define a hardwired pipeline 44 for executing the respective portion of equation (2).
- the table 192 lists parameters, such as the latency (in units of cycles of the signal that clocks the defined pipeline 44 ) and the input and output precision, of the hardwired pipeline(s) 44 defined by the templates 114 in the second column 196 .
- the seven hardwired-pipeline templates 114 1 - 114 7 in column 196 define hardwired pipelines 44 1 - 44 7 for respectively executing the corresponding portions of equation (2) in column 194 .
- the pipeline templates may include multiple templates 114 that define respective pipelines for executing each of the eight portions 170 , 176 , 180 , 182 , 184 , 186 , 188 , and 190 of equation (2).
- the file generator 160 selects the pipelines 44 from which to build a circuit that solves for y in equation (2).
- the generator 160 selects these pipelines 44 based on the behavior(s), operating parameter(s), platform(s), and pipeline accelerator(s) 14 ( FIG. 1 ) that the designer specified. For example, if the designer specified that x, y, and z are thirty-two-bit floating-point quantities, then the generator 160 selects pipelines 44 that operate on thirty-two-bit floating-point numbers. If the available pipelines 44 for a particular portion of the equation (2) do not meet all of the designer's specifications, then the generator 160 may use a default set of rules to select the best pipeline.
- the rules may indicate that if there is no available pipeline 44 that meets the specified latency and precision requirements, then, with the designer's authorization, the generator 160 defaults to the pipeline having the specified precision and the latency closest to the specified latency. Otherwise a new pipeline 44 with the specified latency is placed in the library, or the designer can select another pipeline from the table 192 .
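- The default selection rule described above, which falls back to the pipeline having the specified precision and the latency closest to the specified latency, can be sketched as follows. The function `select_pipeline`, the dictionary fields, and the candidate values are hypothetical, and the step of obtaining the designer's authorization is omitted from this sketch.

```python
def select_pipeline(candidates, precision, latency):
    """Pick an exact match if present; otherwise same precision, closest latency."""
    same_precision = [c for c in candidates if c["precision"] == precision]
    if not same_precision:
        return None  # nothing usable; a new pipeline would have to be added
    # An exact latency match has a distance of zero and therefore wins.
    return min(same_precision, key=lambda c: abs(c["latency"] - latency))


# Hypothetical candidates for one portion of the equation.
candidates = [
    {"name": "cos_a", "precision": 32, "latency": 12},
    {"name": "cos_b", "precision": 32, "latency": 20},
    {"name": "cos_c", "precision": 64, "latency": 16},
]

print(select_pipeline(candidates, precision=32, latency=18)["name"])
```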
- two versions of an x 4 circuit may be represented by respective hardwired-pipeline templates 114 in the library 120 : a pipelined version using two fully registered multipliers in a cascade, or an in-place version using a single, fully registered multiplier, a one-bit counter, and a multiplexer.
- the pipelined version consumes roughly twice the circuit resources but accepts one input value every clock cycle.
- the in-place version consumes fewer circuit resources but accepts a new input value only every other clock cycle.
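The trade-off between the two x 4 versions can be illustrated with a small behavioral model. The function names are hypothetical, and the model counts only the clock cycles needed to accept the inputs; it is a sketch of the stated behavior, not a description of the actual templates.

```python
def x4_pipelined(samples):
    """Two cascaded multipliers: the first squares x, the second squares x^2."""
    results = [(x * x) * (x * x) for x in samples]
    accept_cycles = len(samples)  # one new input accepted every clock cycle
    return results, accept_cycles


def x4_in_place(samples):
    """One multiplier reused twice per sample under a one-bit counter."""
    results = []
    accept_cycles = 0
    for x in samples:
        operand = x
        for counter in (0, 1):  # 0: multiplexer takes new input, 1: recirculates
            operand = operand * operand  # the single shared multiplier
            accept_cycles += 1
        results.append(operand)
    return results, accept_cycles
```

Under these assumptions both versions produce the same values, but the in-place version needs twice as many cycles to accept the same input stream.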
- the file generator 160 interconnects the selected hardwired pipelines 44 to form a circuit 200 ( FIG. 10 ) that can solve for y in equation (2).
- the generator 160 also generates a schematic diagram of the circuit 200 for display via the device 155 .
- the file generator 160 first determines how the selected hardwired pipelines 44 1 - 44 7 can “fit” into the resources of a specified accelerator 14 ( FIG. 1 ) (or a default accelerator if the designer does not specify one). For example, the file generator 160 calculates the number of PLICs 60 ( FIG. 3 ) needed to contain the eight instances of the pipelines 44 1 - 44 7 (this includes two instances of the pipeline 44 5 ).
- the generator 160 determines that each PLIC 60 ( FIG. 3 ) can hold only a respective one of the pipelines 44 1 - 44 7 ; consequently, the generator 160 determines that eight pipeline units 50 1 - 50 8 are needed to instantiate the circuit 200 .
- the generator 160 “inserts” into each of the PLICs 60 1 - 60 8 of the pipeline units 50 1 - 50 8 a respective hardware-interface layer 62 1 - 62 8 .
- the generator 160 generates the layers 62 1 - 62 8 from the following templates in section 122 1 of the library 120 : the interface-adapter-layer template 108 1 , the framework-services-layer template 110 1 , and the communication-shell templates 112 1,1 - 112 1,7, which respectively correspond to the pipeline templates 114 1 - 114 7 , and thus to the pipelines 44 1 - 44 7 .
- the generator 160 generates the hardware-interface layer 62 1 from the interface-adapter-layer template 108 1 , the framework-services-layer template 110 1 , and the communication-shell template 112 1,1 .
- the generator 160 generates the hardware-interface layer 62 2 from the templates 108 1 , 110 1 , and 112 1,2, the hardware-interface layer 62 3 from the templates 108 1 , 110 1 , and 112 1,3, and so on.
- the generator 160 generates both of the hardware-interface layers 62 5 and 62 6 from the interface-adapter and framework-services templates 108 1 and 110 1 and from the communication-shell template 112 1,5 ; consequently, the hardware-interface layers 62 5 and 62 6 are identical but are instantiated on respective PLICs 60 5 and 60 6 . Moreover, the generator 160 generates the hardware-interface layer 62 7 from the templates 108 1 , 110 1 , and 112 1,6 , and the hardware-interface layer 62 8 from the templates 108 1 , 110 1 , and 112 1,7 .
- the generator 160 “inserts” into each hardware-interface layer 62 1 - 62 8 a respective hardwired pipeline 44 1 - 44 7 (the generator 160 inserts the pipeline 44 5 into both of the hardware-interface layers 62 5 and 62 6 , the pipeline 44 6 into the hardware-interface layer 62 7 , and the pipeline 44 7 into the hardware-interface layer 62 8 ). More specifically, the generator 160 inserts the pipelines 44 1 - 44 7 into the hardware-interface layers 62 1 - 62 8 by respectively inserting the hardwired-pipeline templates 114 1 - 114 7 into the communication-shell templates 112 1,1 - 112 1,7 .
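The correspondence between the eight hardware-interface layers and the library templates can be written out as a lookup. The dictionary and function below are hypothetical; the reference numerals are rendered as strings purely for illustration.

```python
# Layer 62_n -> index k of its communication-shell template 112_1,k and
# hardwired-pipeline template 114_k. Layers 5 and 6 share index 5 because
# the multiplier pipeline 44_5 is instantiated twice.
SHELL_INDEX_FOR_LAYER = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5, 7: 6, 8: 7}


def templates_for_layer(layer: int) -> dict:
    """Return the templates combined to build hardware-interface layer 62_layer."""
    k = SHELL_INDEX_FOR_LAYER[layer]
    return {
        "interface_adapter": "108_1",
        "framework_services": "110_1",
        "communication_shell": f"112_1,{k}",
        "hardwired_pipeline": f"114_{k}",
    }
```

Layers 5 and 6 resolve to identical template sets, which matches the statement that the layers 62 5 and 62 6 are identical but instantiated on different PLICs.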
- the generator 160 interconnects the pipeline units 50 1 - 50 8 to form the circuit 200 , which generates the value y from equation (2) at its output (i.e., the output of the pipeline unit 50 8 ).
- the circuit 200 includes an input stage 206 , first and second intermediate stages 208 and 210 , and an output stage 212 , and operates as follows.
- the input stage 206 includes the hardwired pipelines 44 1 - 44 4 and operates as follows.
- the pipeline 44 1 receives a stream of values x via an input portion of the hardware-interface layer 62 1 and generates, in a pipelined fashion, a corresponding stream of values sin(x) via an output portion of the layer 62 1 .
- the pipeline 44 2 receives a stream of values z via an input portion of the hardware-interface layer 62 2 and generates, in a pipelined fashion, a corresponding stream of values z 3 via an output portion of the layer 62 2 .
- the pipeline 44 3 receives the stream of values x via an input portion of the hardware-interface layer 62 3 and generates, in a pipelined fashion, a corresponding stream of values x 4 via an output portion of the layer 62 3 .
- the pipeline 44 4 receives the stream of values z via an input portion of the hardware-interface layer 62 4 and generates, in a pipelined fashion, a corresponding stream of values cos(z) via an output portion of the layer 62 4 .
- the first intermediate stage 208 of the circuit 200 includes two instantiations of the pipeline 44 5 and operates as follows.
- the pipeline 44 5 in the PLIC 60 5 receives the streams of values sin(x) and z 3 from the input stage 206 via an input portion of the hardware-interface layer 62 5 and generates, in a pipelined fashion, a corresponding stream of values z 3 sin(x) via an output portion of the layer 62 5 .
- the pipeline 44 5 in the PLIC 60 6 receives the streams of values x 4 and cos(z) from the input stage 206 via an input portion of the hardware-interface layer 62 6 and generates, in a pipelined fashion, a corresponding stream of values x 4 cos(z) via an output portion of the layer 62 6 .
- the second intermediate stage 210 of the circuit 200 includes the hardwired pipeline 44 6 , which receives the streams of values z 3 sin(x) and x 4 cos(z) from the first intermediate stage 208 via an input portion of the hardware-interface layer 62 7 , and generates, in a pipelined fashion, a corresponding stream of values z 3 sin(x)+x 4 cos(z) via an output portion of the layer 62 7 .
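The stage-by-stage data flow can be mirrored in software. The sketch below assumes that the output stage applies the square-root pipeline 44 7 to the sum, so that the circuit computes y = sqrt(z^3 sin(x) + x^4 cos(z)); the function name `circuit_200` is illustrative only.

```python
import math


def circuit_200(x: float, z: float) -> float:
    """Software model of the four stages (assumes a square-root output stage)."""
    # Input stage 206: pipelines 44_1 - 44_4
    sin_x = math.sin(x)
    z_cubed = z ** 3
    x_fourth = x ** 4
    cos_z = math.cos(z)
    # First intermediate stage 208: two instantiations of multiplier 44_5
    left = z_cubed * sin_x
    right = x_fourth * cos_z
    # Second intermediate stage 210: adder 44_6
    total = left + right
    # Output stage: square-root pipeline 44_7 (assumed form of equation (2))
    return math.sqrt(total)
```

The model ignores latency and streaming; it only shows which intermediate value each pipeline contributes.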
- the designer may choose to alter the circuit 200 via the input device 154 .
- the designer may swap out one or more of the pipelines 44 1 - 44 7 with one or more other pipelines from the table 192 .
- the square-root pipeline 44 7 has a high precision but a relatively long latency per the default rules that the generator 160 follows as discussed above.
- if the table 192 includes another square-root pipeline having a shorter latency, then the designer may replace the pipeline 44 7 with the other square-root pipeline, for example by using the input device 154 to “drag” the other pipeline from the table into the schematic representation of the PLIC 60 8 .
- the designer may swap out one or more of the hardwired pipelines 44 1 - 44 7 with a symbolically defined polynomial series (i.e., a Taylor Series equivalent) that approximates one of the pipelined operations.
- the available square-root pipeline 44 7 has insufficient mathematical accuracy per the designer's specification and the default rules that the generator 160 follows as discussed above. If the designer then specifies a new square-root function as a series summation of related monomials, then the front end 156 , interpreter 158 , and file generator 160 concatenate a series of parameterized monomial circuit templates into a circuit that solves for square roots. In this way the designer replaces the default pipeline 44 7 with the higher-precision square-root circuit using symbolic design.
- This example illustrates the symbolic use of polynomials to define new mathematical functions as established by Taylor's Theorem. A more detailed example is discussed below in conjunction with FIGS. 13-17 .
- the designer may also change the topology of the circuit 200 .
- the generator 160 places each instantiation of the hardwired pipelines 44 1 - 44 7 into a separate PLIC 60 .
- each PLIC 60 has sufficient resources to hold multiple pipelines 44 . Consequently, to reduce the number of pipeline units 50 that the circuit 200 occupies, the designer may, using the input device 154 , move some of the pipelines 44 into the same PLIC. For example, the designer may move both instantiations of the multiplier pipeline 44 5 out of the PLICs 60 5 and 60 6 and into the PLIC 60 7 with the adder pipeline 44 6 , thus reducing by two the number of PLICs that the circuit 200 occupies.
- the designer then manually interconnects the two instantiations of the pipeline 44 5 to the pipeline 44 6 within the PLIC 60 7 , or may instruct the generator 160 to perform this interconnection.
- the library 120 may not include a communication-shell template 112 that defines a communication shell 74 for this combination of multiple pipelines 44 5 and 44 6 .
- the designer or another may write such a template and debug the communication shell that the template defines without having to rewrite the interface-adapter-layer and framework-services templates 108 1 and 110 1 and, therefore, without having to re-debug the layers that these templates define.
- This rearranging of pipelines 44 within the PLICs 60 is also called “refactoring” the circuit 200 .
- the designer may decide to break down one or more of the pipelines 44 1 - 44 7 into multiple, less complex pipelines 44 .
- the designer may decide to break down the x 4 pipeline 44 3 into two x 2 pipelines (not shown) and a multiplier pipeline 44 5 .
- the designer may decide to replace the sin(x) pipeline 44 1 with a combination of pipelines (not shown) that represents sin(x) in a series-expansion form (e.g. Taylor series, MacLaurin series).
- the file 162 also identifies the eight PLICs 60 1 - 60 8 on the eight pipeline units 50 1 - 50 8 , and for each PLIC, identifies the templates in the library 120 that define the circuitry to be instantiated on the PLIC. For example, referring to FIGS. 6 and 10 , the file 162 indicates that the combination of the following templates in the library 120 defines the circuitry to be instantiated on the PLIC 60 1 : 101 1 , 108 1 , 110 1 , 112 1,1 , 114 1 , and 116 1 . Furthermore, the file 162 includes the values of all constants defined in the configuration template 118 1 .
- the file 162 may also include one or more of the descriptions 128 - 134 and 138 corresponding to these templates, or portions of these descriptions. Moreover, the file 162 defines the interconnections between the PLICs 60 1 - 60 8 and the message specifications for these interconnections. The file 162 also defines any designer-specified range constraints for generated values, exceptions, and exception-handling routines.
- the generator 160 may write the file 162 in XML or in another language with XML tags so that both humans and other tools/machines can read the file. Alternatively, the generator 160 may write the file 162 in a language other than XML and without XML tags.
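As a rough illustration of an XML form of such a file, the sketch below builds a small document listing the templates for one PLIC and one interconnection. The element and attribute names (`circuit`, `plic`, `template`, `connection`) are invented for this example; the source does not specify a schema.

```python
import xml.etree.ElementTree as ET


def build_circuit_file(plics: dict, connections: list) -> str:
    """Serialize a circuit description to XML (hypothetical schema)."""
    root = ET.Element("circuit", name="circuit_200")
    for plic_id, templates in plics.items():
        plic = ET.SubElement(root, "plic", id=plic_id)
        for template in templates:
            # Each entry names a library template instantiated on this PLIC.
            ET.SubElement(plic, "template", ref=template)
    for source, sink in connections:
        ET.SubElement(root, "connection", source=source, sink=sink)
    return ET.tostring(root, encoding="unicode")


# Templates listed above for the PLIC 60_1, plus one interconnection.
xml_text = build_circuit_file(
    {"60_1": ["101_1", "108_1", "110_1", "112_1,1", "114_1", "116_1"]},
    [("60_1", "60_5")],
)
```

Because the output is ordinary XML, both a person and another tool can read it back with a standard parser.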
- the designer may instruct the simulator 164 , via the input device 154 , to simulate the circuit 200 using a conventional simulation algorithm.
- the simulator 164 uses the information in the file 162 and the test vectors provided by the designer to simulate the operation of the circuit 200 .
- the simulator 164 first determines the operating parameters of the hardware-interface layers 62 1 - 62 8 and of the hardwired pipelines 44 1 - 44 7 from the file 162 , or by extracting this information directly from the description files 128 1 , 130 1 , 132 1,1 - 132 1,7, and 138 1 - 138 7 in the library 120 .
- these parameters include, e.g., circuit latencies, and the precision (e.g., thirty-two-bit integer, sixty-four-bit floating point) of the values that the pipelines 44 1 - 44 7 receive and generate.
- the simulator 164 determines the latency of the PLIC 60 1 from the time a value x enters the hardware-interface layer 62 1 until the time that the layer 62 1 provides sin(x) on an external pin (not shown) of the PLIC 60 1 .
- the latency information in these description files may be estimated information, or may be actual information derived from an analysis of an instantiation of the pipeline 44 1 and the hardware-interface layer 62 1 on the PLIC 60 1 .
- the simulator 164 estimates the latencies and other operating parameters of the PLICs 60 2 - 60 8 , and simulates the operation of the circuit 200 to generate an output test stream of values y in response to input test streams of values x and z.
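One part of such a simulation, estimating the end-to-end latency from the per-PLIC latencies, can be sketched as a longest-path computation over the circuit topology. The latency values below are made-up placeholders, not figures from the description files, and `circuit_latency` is a hypothetical helper.

```python
def circuit_latency(latency: dict, inputs: dict) -> dict:
    """Longest-path latency at the output of each node.

    latency: node -> latency of that PLIC in clock cycles
    inputs:  node -> list of upstream nodes feeding it
    """
    done = {}

    def arrival(node):
        if node not in done:
            upstream = max((arrival(p) for p in inputs.get(node, [])), default=0)
            done[node] = upstream + latency[node]
        return done[node]

    return {node: arrival(node) for node in latency}


# Topology of the eight-PLIC circuit; latencies are illustrative only.
inputs = {"60_5": ["60_1", "60_2"], "60_6": ["60_3", "60_4"],
          "60_7": ["60_5", "60_6"], "60_8": ["60_7"]}
latency = {"60_1": 10, "60_2": 4, "60_3": 4, "60_4": 10,
           "60_5": 3, "60_6": 3, "60_7": 2, "60_8": 12}
total = circuit_latency(latency, inputs)["60_8"]
```

With these placeholder values the slowest path through either multiplier determines when the output PLIC can produce y.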
- FIG. 11 is a schematic diagram of the circuit 200 of FIG. 10 disposed on a single pipeline unit 50 and in a single PLIC 60 according to an embodiment of the invention.
- the generator 160 determines that all of the hardwired pipelines 44 1 - 44 7 (the multiplier pipeline 44 5 is instantiated twice) can fit within a single PLIC 60 with the same topology shown in FIG. 10 .
- the tool 152 derives the operational parameters and message specifications of the hardware-interface layer 62 from the description files 128 1 , 130 1 , 132 1,1 - 132 1,4 , and 132 1,7 . Because the PLIC 60 incorporates the interface-adapter layer 70 and framework-services layer 72 defined by the templates 108 1 and 110 1 , the tool 152 estimates the input and output operational parameters, e.g., input and output latencies, and the message specifications of the layers 70 and 72 directly from the description files 128 1 and 130 1 . Then, referring to FIGS.
- the tool 152 derives the input operating parameters of the communication shell 74 of FIG. 11 from the description files 132 1,1 - 132 1,4 , which describe the communication shells for the pipelines 44 1 - 44 4 . For example, if the operational parameters of these communication shells are similar, then the tool 152 may merely estimate that the input-side operational parameters for the shell 74 are the same as the parameters from one of the description files 132 1,1 - 132 1,4 .
- the tool 152 may estimate that an intermediate data-type translation is needed for the input-side operational parameters of the communication shell 74 , or that an averaging operation is needed for the input-side operational parameters of the communication shell, if the respective input-side parameters in the description files 132 1,1 - 132 1,4 do not match.
- the tool 152 derives the output operating parameters for the communication shell 74 from the description file 132 1,7 , which describes the communication shell for the pipeline 44 7 .
- the tool 152 may estimate that the output-side operational parameters for the shell 74 are the same as the output-side parameters from the description file 132 1,7 .
- the generator 160 generates the file 162 , which defines the circuit 200 of FIG. 11 , and the simulator 164 simulates the circuit using the operational parameters calculated for the hardware-interface layer 62 by the generator 160 .
- FIG. 12 is a block diagram of a circuit 220 , for which the tool 152 of FIG. 7 generates a file 162 according to an embodiment of the invention where the circuit solves for a variable in an equation that includes constant coefficients.
- the circuit 220 is similar to the circuit 200 except that the hardwired pipelines 44 2 and 44 3 respectively generate bz 3 and ax 4 instead of z 3 and x 4 , where a and b are constant coefficients.
- one way for the tool 152 to generate such a circuit is to modify the circuit 200 by parsing equation (3) into portions including “a·x 4 ” and “b·z 3 ”, and by adding two corresponding PLICs (not shown) on which the multiplication pipeline 44 5 is instantiated: one such multiplier PLIC between the PLICs 60 2 and 60 5 and receiving as inputs z 3 and b, and the other such multiplier PLIC between the PLICs 60 3 and 60 6 and receiving as inputs x 4 and a.
- the tool 152 generates the circuit 220 by replacing the pipelines 44 2 and 44 3 in the circuit 200 with pipelines 44 8 and 44 9 , which respectively perform the operations bz 3 and ax 4 .
- the section 124 of the library 120 ( FIG. 6 ) includes corresponding hardwired-pipeline templates 114 8 and 114 9 .
- the generator 160 then generates the file 162 to include the entered values for the coefficients a and b. These values may be contained within one or more XML tags or be present in some other form.
- the values of a and b may be provided to the configuration managers 88 ( FIG. 3 ) of the PLICs 60 3 and 60 2 as soft-configuration data. More specifically, a configuration manager (not shown and different from the configuration managers 88 ), which is described in previously incorporated U.S. patent application Ser. No. (Attorney Docket No. 1934-25-3, 1934-26-3, and 1934-36-3) and which is executed by the host processor 12 ( FIG. 1 ), initializes the values of a and b by sending configuration messages for a and b to the pipeline units 50 3 and 50 2 .
- the accelerator-configuration registry 40 ( FIG. 1 ) may store a and b as XML files to initialize the configuration messages created and sent by the configuration manager executed by the host processor 12 .
- the tool 152 can use similar techniques to set the values of constant coefficients for other types of circuit portions such as filters, Fast Fourier Transformers (FFTs), and Inverse Fast Fourier Transformers (IFFTs).
- referring to FIGS. 7-12 , other embodiments of the tool 152 and its operation are contemplated.
- one or more of the functions of the tool 152 may be performed by a functional block (e.g., front end 156 , interpreter 158 ) other than the block to which the function is attributed in the above discussion.
- the tool 152 may be described using more or fewer functional blocks.
- although the tool 152 is described as either fitting the eight instantiations of the hardwired pipelines 44 1 - 44 7 in eight PLICs 60 1 - 60 8 ( FIGS. 10 and 12 ) or in a single PLIC 60 ( FIG. 11 ), the tool 152 may fit these pipelines in more than one but fewer than eight PLICs, depending on the resources available on each PLIC.
- although the tool 152 is described as allowing one to design a circuit for instantiation on a PLIC, the tool 152 may also allow one to design a circuit for instantiation on an ASIC.
- although the tool 152 is described as generating a file 162 that defines an algorithm-implementing circuit, such as the circuit 200 ( FIG. 11 ), for instantiation on a specific pipeline accelerator 14 ( FIG. 1 ) or on a pipeline accelerator that is compatible with a specific platform, the tool may generate, in addition to or instead of the file 162 , a file (not shown) that more generally defines the algorithm.
- such a file may include algorithm-definition data that is sometimes called “meta-data,” and may allow the host processor 12 ( FIG. 1 ) to implement the algorithm in any manner (e.g., hardwired pipeline(s), software, a combination of both pipeline(s) and software) supported by the peer vector machine 10 ( FIG. 1 ).
- meta-data describes something, such as an algorithm or another file, but is not executable.
- the information in the description files 126 - 134 may include meta-data.
- a meta-data file that defines an algorithm may allow the host processor 12 to configure the peer vector machine 10 for implementing the algorithm even where the machine does not support the implementation(s) specified by the file 162 .
- Such configuring of the peer vector machine 10 is described in U.S. patent application Ser. No. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3), which were previously incorporated by reference.
- the tool 152 may generate, and the library 120 ( FIG. 6 ) may store, one or more meta-data files (not shown) for describing the messages that carry data to/from the PLICs 60 (or software equivalents) of a circuit, such as the circuit 200 ( FIG. 10 ).
- a meta-data file specifies this.
- the file 162 ( FIG. 7 ) incorporates or points to these meta-data files so that the host processor 12 ( FIG. 1 ) can instantiate the message objects that generate such messages as discussed in previously incorporated U.S. patent application Ser. Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3).
- the tool 152 may generate, and the library 120 ( FIG. 6 ) may store, one or more meta-data files (not shown) for describing the exceptions that the PLICs 60 (or software equivalents) of a circuit, such as the circuit 200 ( FIG. 10 ), generate.
- a meta-data file specifies this.
- the file 162 ( FIG. 7 ) incorporates or points to these meta-data files so that the host processor 12 ( FIG. 1 ) can instantiate corresponding exception handlers as discussed in previously incorporated U.S. patent application Ser. Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3).
- the tool 152 may generate, and the library 120 ( FIG. 6 ) may store, one or more meta-data files (not shown) for describing the PLICs 60 (or software equivalents) of a circuit, such as the circuit 200 ( FIG. 10 ).
- a meta-data file may describe the mathematical operation performed by, and the input and output specifications of, circuitry to be instantiated on a corresponding PLIC (or a software equivalent of the circuitry).
- the file 162 ( FIG. 7 ) incorporates or points to these meta-data files so that the host processor 12 ( FIG. 1 ) can (1) determine which firmware files (or software equivalents) stored in the library 120 or in another library will respectively cause the PLICs (or the host processor 12 ) to instantiate the desired circuitry, or (2) generate one or more of these firmware files (or software equivalents) that are not otherwise available, as described in previously incorporated U.S. patent application Ser. Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3).
- the library 120 may store one or more of the files 162 ( FIG. 7 ) that the tool 152 generates, so that a designer can incorporate previously designed circuits, such as the circuit 200 ( FIG. 10 ), into a new larger and more complex circuit.
- the tool 152 may then generate a new file 162 that defines this new circuit.
- the tool 152 ( FIG. 7 ) allows one to design a circuit for implementing virtually any complex function f(x) by expanding the function into an equivalent infinite series.
- a combination of summing and multiplying hardwired pipelines 44 interconnected to generate ax+bx 2 +cx 3 + . . . +vx n can implement any function f(x) that one can expand into a MacLaurin series, where the only differences in this combination of pipelines from function to function are the values of the constant coefficients a, b, c, . . ., v. Therefore, if the tool 152 is programmed with, or otherwise has access to, the coefficients for a number of functions f(x), then the tool can implement any of these functions as a series expansion.
- the tool 152 may set the number of expansion terms that the interconnected pipelines 44 generate based on the level of accuracy for f(x) that the circuit designer (not shown) enters into the tool.
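How a requested accuracy could translate into a number of expansion terms can be sketched for cos(x), whose Maclaurin series alternates in sign. The helpers below are hypothetical and assume a tolerance no greater than 1 and inputs bounded by `x_max`; under those assumptions the first omitted term bounds the truncation error.

```python
import math


def cos_coefficients(n_terms: int) -> list:
    """Coefficients of 1, x^2, x^4, ... in the Maclaurin series of cos(x)."""
    return [(-1) ** k / math.factorial(2 * k) for k in range(n_terms)]


def terms_needed(x_max: float, tolerance: float) -> int:
    """Smallest term count whose first omitted term is below tolerance.

    Assumes tolerance <= 1, so the stopping point lies where the term
    magnitudes are already decreasing (alternating-series bound).
    """
    k = 0
    while x_max ** (2 * k) / math.factorial(2 * k) >= tolerance:
        k += 1
    return k


def cos_series(x: float, n_terms: int) -> float:
    """Evaluate the truncated series, as the interconnected pipelines would."""
    return sum(c * x ** (2 * k) for k, c in enumerate(cos_coefficients(n_terms)))
```

For inputs bounded by 1 and a tolerance of 1e-6, this rule keeps the constant term and the first four power-of-x terms. Only the coefficient list changes from function to function, which is the point made above about reusing the same arrangement of pipelines.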
- a designer may directly enter a function f(x) into the front end 156 ( FIG. 7 ) of the tool 152 in series-expansion form.
- FIG. 13 shows only the adders, multipliers, and delay blocks that compose the circuit 240 , it being understood that the tool 152 may define the circuit for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12 ) per one of the techniques described above in conjunction with FIGS. 7-12 .
- the circuit 240 may be part of a larger circuit (not shown) for implementing an algorithm having cos(x) as one of its portions.
- the circuit 240 includes a term-generating section 242 and a term-summing section 244 .
- for clarity, only the parts of these sections that respectively generate and sum the first four power-of-x terms of the cos(x) series expansion are shown, it being understood that any remaining portions of these sections for respectively generating and summing the fifth and higher power-of-x terms are similar.
- the term-generating section 242 includes a chain of multipliers 246 1 - 246 p (only multipliers 246 1 - 246 8 are shown) and delay blocks 248 1 - 248 q (only delay blocks 248 1 - 248 3 are shown) that generate the power-of-x terms of the cos(x) series expansion.
- the delay blocks 248 ensure that the multipliers 246 only multiply powers of x from the same sample time.
- the term-summing section 244 includes two summing paths: a path 250 for positive numbers, and a path 252 for negative numbers.
- the path 250 includes a chain of adders 254 1 - 254 r (only adders 254 1 - 254 2 are shown) and delay blocks 256 1 - 256 s (only blocks 256 1 and 256 2 are shown).
- the path 252 includes a chain of adders 258 1 - 258 t (only adder 258 1 is shown) and delay blocks 260 1 - 260 u (only blocks 260 1 and 260 2 are shown).
- a final adder 262 sums the cumulative positive and negative sums from the paths 250 and 252 to provide the value for cos(x).
- although the adder 262 is shown as summing the first five terms of the expansion (1 and the first four power-of-x terms), it is understood that the final adder 262 may be disposed further down the paths 250 and 252 if the circuit 240 generates additional terms of the cos(x) expansion.
- where the numbers being summed are floating-point numbers, exceptions, such as a mantissa-register underflow, may occur when a positive number is summed with a negative number that is almost equal to the positive number. But by providing separate summing paths 250 and 252 for positive and negative numbers, respectively, the circuit 240 limits the number of possible locations where such exceptions can occur to a single adder 262 .
- providing the separate paths 250 and 252 may significantly reduce the frequency of such floating-point exceptions, and thus may reduce the time that the peer-vector machine 10 ( FIG. 1 ) consumes handling such exceptions and the size and complexity of the exception manager 86 ( FIG. 4 ).
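The two-path summation can be mimicked in software to show where the only mixed-sign addition occurs. This is a behavioral sketch with a hypothetical function name; it does not model the delay blocks or the exception manager.

```python
import math


def cos_two_path(x: float, n_terms: int = 5) -> float:
    """Sum the cos(x) series with separate positive and negative accumulators."""
    positive = 0.0  # models path 250: 1 + x^4/4! + x^8/8! + ...
    negative = 0.0  # models path 252: -x^2/2! - x^6/6! - ...
    for k in range(n_terms):
        term = x ** (2 * k) / math.factorial(2 * k)
        if k % 2 == 0:
            positive += term  # same-sign addition
        else:
            negative -= term  # same-sign addition
    # Models the final adder 262: the single place where a positive and a
    # negative value meet, and therefore where cancellation can occur.
    return positive + negative
```

Every addition inside the loop combines values of the same sign, so any cancellation between nearly equal positive and negative values is confined to the last line.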
- each of the multipliers 246 , adders 254 and 258 has a latency (i.e., delay) D of one clock cycle.
- a value x is present at the inputs of the multiplier 246 1 , and after the first clock edge, the value x 2 is present at the output of the multiplier 246 1 .
- it is understood, however, that the multipliers 246 and adders 254 and 258 may have different latencies and latencies other than one, and that the delays provided by the blocks 248 , 256 , and 260 may be adjusted accordingly.
- a value x 1 is present at the input of the multiplier 246 1 where the subscript “1” denotes the time or position of x 1 relative to the other values of x.
- a value x 2 is present at the input of the multiplier 246 1 and x 1 2 is present at the output of this multiplier.
- this example follows only the propagation of x 1 , it being understood that the propagation of x 2 and subsequent values of x is similar but delayed relative to the propagation of x 1 .
- x 1 is hereinafter referred to as “x” in this example.
- −x 2 /2! is present at the output of the multiplier 246 2 .
- x 4 is present at the output of the multiplier 246 3 .
- x 2 is available at the output of the block 248 1 .
- −x 6 /6! is present at the output of the multiplier 246 6 .
- x 8 is present at the output of the multiplier 246 7 .
- x 2 is available at the output of the block 248 3 .
- “1+x 4 /4!” is available at the output of the adder 254 1 .
- x 8 /8! is present at the output of the multiplier 246 8 .
- “1+x 4 /4!” is available at the output of the block 256 2 .
- “−x 2 /2!−x 6 /6!” is available at the output of the adder 258 1 .
- the adder 262 is located after (to the right in FIG. 13 ) the adder that sums the highest generated term to a preceding term, and the operation continues as above.
- the circuit 240 may include multipliers and adders to generate and sum the odd power-of-x terms (e.g., x, x 3 , x 5 ) with the coefficients of these terms set to zero.
- Such an alternate circuit 240 is more flexible because it allows one to implement function expansions that include odd powers of x, but in this case would have a greater latency than seven clock cycles.
- the circuit 270 has a topology that reduces the number of delay blocks and the latency as compared to the circuit 240 of FIG. 13 .
- FIG. 14 shows only the adders, multipliers, and delay blocks that compose the circuit 270 , it being understood that the tool 152 may define the circuit for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS.
- the circuit 270 may be part of a larger circuit (not shown) for implementing an algorithm having cos(x) as one of its portions.
- the circuit 270 includes a term-generating section 272 and a term-summing section 274 .
- for clarity, only the parts of these sections that respectively generate and sum the first four power-of-x terms of the cos(x) series expansion are shown, it being understood that any remaining portions of these sections for respectively generating and summing the fifth and higher power-of-x terms are similar.
- the term-generating section 272 includes a hierarchy of multipliers 276 1 - 276 p (only multipliers 276 1 - 276 8 are shown) and delay blocks 278 1 - 278 q (only delay blocks 278 1 - 278 2 are shown) that generate the power-of-x terms of the cos(x) series expansion.
- the delay blocks 278 ensure that the multipliers 276 only multiply powers of x from the same sample time.
- the term-summing section 274 includes two summing paths: a path 280 for positive numbers, and a path 282 for negative numbers.
- the path 280 includes a chain of adders 284 1 - 284 r (only adders 284 1 - 284 2 are shown) and delay blocks 286 1 - 286 s (only block 286 1 is shown).
- the path 282 includes a chain of adders 288 1 - 288 t (only adder 288 1 is shown) and delay blocks 290 1 - 290 u (only block 290 1 is shown).
- a final adder 292 sums the cumulative positive and negative sums from the paths 280 and 282 to provide the value for cos(x).
- although the adder 292 is shown as summing the first five terms of the expansion (1 and the first four power-of-x terms), it is understood that the final adder 292 may be disposed further down the paths 280 and 282 if the circuit 270 generates additional terms of the cos(x) expansion.
- each of the multipliers 276 and adders 284 and 288 has a latency (i.e., delay) D of one clock cycle. It is understood, however, that the multipliers 276 and adders 284 and 288 may have different latencies and latencies other than one, and that the delays provided by the blocks 278 , 286 , and 290 may be adjusted accordingly.
- a value x is present at the input of the multiplier 276 1 .
- x 2 is present at the output of the multiplier 276 1 .
- x 4 is present at the output of the multiplier 276 2 , and x 2 is available at the output of the block 278 1 .
- −x 6 /6! is present at the output of the multiplier 276 7 .
- x 8 /8! is present at the output of the multiplier 276 8 .
- −x 2 /2! is available at the output of the block 290 1 .
- “1+x 4 /4!” is available at the output of the adder 284 1 .
- the latency of the circuit 270 is six clock cycles, which is one fewer clock cycle than the latency of the circuit 240 of FIG. 13 . But as the number of the power-of-x terms increases beyond four, the gap between the latencies of the circuits 270 and 240 increases such that the circuit 270 provides an even greater improvement in the latency.
- the adder 292 is located after (to the right in FIG. 14 ) the adder that sums the highest generated term to a preceding term, and the operation continues as above.
- the circuit 270 may include multipliers and adders to generate and sum the odd power-of-x terms (e.g., x, x 3 , x 5 ) with the coefficients of these terms set to zero.
- Such an alternate circuit 270 may be more flexible because it allows one to implement function expansions that include odd powers of x without increasing the circuit's latency for a given highest power of x. That is, where the highest power of x generated by the circuit 270 is x 8 , adding multipliers and adders to generate x 3 , x 5 , and x 7 would not increase the latency of the circuit 270 beyond six clock cycles. This is because the circuit 270 would generate the power-of-x terms in parallel, not serially like the circuit 240 of FIG. 13 .
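The latency difference between the serial chain and the hierarchy can be illustrated by counting only the multiplier stages needed to reach a given even power of x. This simplified model assumes one-cycle multipliers and ignores the coefficient multipliers, adders, and delay blocks, so it reproduces the one-cycle gap at x 8 but not the absolute latencies of six and seven cycles; both function names are hypothetical.

```python
def chain_depth(k: int) -> int:
    """Stages to reach x^(2k) when each stage multiplies the previous power by x^2."""
    return k


def hierarchy_depth(k: int) -> int:
    """Stages to reach x^(2k) when any two previously generated powers may be multiplied.

    One stage produces x^2; forming (x^2)^k from earlier powers then needs
    ceil(log2(k)) further stages, computed here with integer arithmetic.
    """
    return 1 + (k - 1).bit_length()


# Gap in multiplier stages for the highest term x^(2k)
gaps = {2 * k: chain_depth(k) - hierarchy_depth(k) for k in (4, 8, 16)}
```

The gap grows with the highest power generated, which is consistent with the statement that the hierarchical topology offers a greater latency improvement as the number of power-of-x terms increases.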
- FIG. 15 is a block diagram of a power-of-x term generator 300 that the tool 152 ( FIG. 7 ) defines to replace the power-of-x-term odd multipliers 246 3 , 246 5 , 246 7 , . . . of the term-generating section 242 of FIG. 13 and the power-of-x-term multipliers 276 1 , 276 2 , 276 3 , 276 4 , . . . of FIG. 14 according to an embodiment of the invention.
- the generator 300 includes fewer multipliers (here one) than the term-generating sections 242 and 272 (which each include eight multipliers), but may have a higher latency for a given number of generated power-of-x terms.
- FIG. 15 shows only the multipliers and other components that compose the term generator 300 , it being understood that the tool 152 may define a circuit that includes the term generator for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12 ) per one of the techniques described above in conjunction with FIGS. 7-12 .
- the term generator 300 includes a register 302 for storing x, a multiplier 304 , a multiplexer 306 , and term-storage registers 308 1 - 308 p (only registers 308 1 - 308 4 are shown). For clarity, only the parts of the generator 300 that generate the first four power-of-x terms of the cos(x) series expansion are shown, it being understood that any remaining portions of the generator for generating the fifth and higher power-of-x terms are similar.
- each of the register 302 , multiplier 304 , and registers 308 has a respective latency (i.e., delay) of one clock cycle, and that the multiplexer 306 is not clocked, i.e., is asynchronous. It is understood, however, that the register 302 , multiplier 304 , and registers 308 may have different latencies and latencies other than one, that the multiplexer 306 may be clocked and have a latency of one or more clock cycles, and that the term-summing sections 244 and 274 of FIGS. 13 and 14 , respectively, may be adjusted accordingly.
- a value x is present at the input of the register 302 .
- the current value of x is loaded into, and thus is present at the output of, the register 302 , and is present at the output of the multiplexer 306 , which couples its input 312 to its output.
- the register 302 is then disabled. Alternatively, the register 302 is not disabled but the value of x at the input of this register does not change.
- x 2 is present at the output of the multiplier 304 , and the multiplexer changes state and couples its input 314 to its output such that x 2 is also present at the output of the multiplexer 306 .
- x 2 is loaded into, and thus is available at the output of, the register 310 1 , and x 3 is available at the output of the multiplier 304 and at the output of the multiplexer 306 .
- x 4 is available at the output of the multiplier 304 and at the output of the multiplexer 306 .
- x 4 is loaded into, and thus is available at the output of, the register 310 2
- x 5 is available at the output of the multiplier 304 and at the output of the multiplexer 306 .
- x 6 is available at the output of the multiplier 304 and at the output of the multiplexer 306 .
- x 6 is loaded into, and thus is available at the output of, the register 310 3
- x 7 is available at the output of the multiplier 304 and at the output of the multiplexer 306 .
- x 8 is available at the output of the multiplier 304 and at the output of the multiplexer 306 .
- x 8 is loaded into, and thus is available at the output of, the register 310 4 , the next value of x is loaded into the register 302 . But if the generator 300 generates powers of x higher than x 8 , the generator continues operating in the described manner before loading the next value of x into the register 302 .
- the register 302 , multiplier 304 , multiplexer 306 , and registers 310 repeat the above procedure for each subsequent value of x.
- the generator 300 may be modified to load x 2 into the register 302 so that the multiplier 304 thereafter generates only even powers of x.
- the registers 308 may be eliminated, and the multiplexer 306 may feed the respective powers of x directly to the term multipliers, e.g., the term multipliers 246 2 , 246 4 , 246 6 , 246 8 , . . . of FIG. 13 and the term multipliers 276 5 , 276 6 , 276 7 , 276 8 , . . . of FIG. 14 .
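The sequence described above for the generator 300 (load x, multiply the fed-back value by x once per step, and store each even power) can be modeled in software. The Python sketch below is a behavioral approximation under the assumption that one multiplication happens per step; the function name `generate_even_powers` and its parameters are hypothetical.

```python
def generate_even_powers(x, highest_power=8):
    """Model the single-multiplier loop: multiply by x each step, keep even powers."""
    stored = []        # models the term-storage registers
    mux_out = x        # the multiplexer first passes x from the input register
    power = 1
    while power < highest_power:
        mux_out = mux_out * x   # multiplier output, fed back through the multiplexer
        power += 1
        if power % 2 == 0:      # only even powers are stored for the cos(x) expansion
            stored.append(mux_out)
    return stored
```

Calling `generate_even_powers(2.0)` yields the values of x², x⁴, x⁶, and x⁸ in the order in which they would be stored. The loop makes the trade-off visible: one multiplier is reused for every power, so the number of steps grows with the highest power generated.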
- the circuit 320 is similar to the circuit 240 of FIG. 13 , but because the odd power-of-x terms for the e x expansion may be positive or negative, the circuit 320 also includes sign determiners (described below and in conjunction with FIG. 17 ) that respectively provide these odd-power-of-x terms to the proper path (positive or negative) of the term-summing section. For clarity, FIG.
- the tool 152 may define the circuit for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12 ) per one of the techniques described above in conjunction with FIGS. 7-12 .
- the circuit 320 may be part of a larger circuit (not shown) for implementing an algorithm having e x as one of its portions.
- $e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \frac{1}{4!}x^4 + \frac{1}{5!}x^5 + \cdots$ (5)
- the circuit 320 includes a term-generating section 322 and a term-summing section 324 , which includes positive- and negative-value summing paths 326 and 328 . For clarity, only the parts of these sections that respectively generate and sum the first five power-of-x terms of the e x series expansion are shown, it being understood that any remaining portions of these sections for respectively generating and summing the sixth and higher power-of-x terms are similar.
- the term-generating section 322 includes a chain of multipliers 330 1 - 330 p (only multipliers 330 1 - 330 8 are shown) and delay blocks 332 1 - 332 q (only delay blocks 332 1 - 332 4 are shown) that generate the power-of-x terms of the e x series expansion.
- the section 322 also includes, for each odd-power-of-x term (e.g., x, x 3 , x 5 , . . . ), a respective sign determiner 334 1 - 334 v (only determiners 334 1 - 334 3 are shown) that directs positive values of the odd-power-of-x term to the positive summing path 326 of the term-summing section 324 , and that directs negative values of the odd-power-of-x term to the negative summing path 328 .
- the positive-value path 326 of the term-summing section 324 includes a chain of adders 336 1 - 336 r (only adders 336 1 - 336 5 are shown) and delay blocks 338 1 - 338 s (only blocks 338 1 - 338 3 are shown).
- the negative-value path 328 includes a chain of adders 340 1 - 340 t (only adders 340 1 - 340 2 are shown) and delay blocks 342 1 - 342 u (only blocks 342 1 - 342 2 are shown).
- a final adder 344 sums the cumulative positive and negative sums from the paths 326 and 328 to provide the value for e x .
- the final adder 344 is shown as summing the first six terms of the e x expansion (“1” and the first five power-of-x terms), it is understood that the final adder may be disposed further down the paths 326 and 328 if the circuit 320 generates additional terms of the expansion.
- each of the multipliers 330 , sign determiners 334 , and adders 336 and 340 has a latency (i.e., delay) D of one clock cycle. It is understood, however, that the multipliers 330 , sign determiners 334 , and adders 336 and 340 may have different latencies and latencies other than one, and that the delays provided by the blocks 332 , 338 , and 342 may be adjusted accordingly.
- a value x is present at both inputs of the multiplier 330 1 , at the input of the delay block 332 1 , and at the input of the sign determiner 334 1 .
- x 2 is available at the output of the multiplier 330 1
- x is available at the output of the delay block 332 1
- “1” is available at the output of the delay block 338 1 .
- x and logic “0” are respectively available at the (+) and (−) outputs of the sign determiner 334 1 ; conversely, if x is negative, logic “0” and x are respectively available at the (+) and (−) outputs of the determiner 334 1 .
- x 2 /2! is available at the output of the multiplier 330 2
- x 3 is present at the output of the multiplier 330 3
- x is available at the output of the delay block 332 2 .
- x is positive
- x 3 /3! is available at the output of the multiplier 330 4
- x 4 is available at the output of the multiplier 330 5
- x is available at the output of the delay block 332 3
- “1+x+x 2 /2!” (x positive) or “1+x 2 /2!” (x negative) is available at the output of the adder 336 2 .
- x 4 /4! is present at the output of the multiplier 330 6
- x 5 is present at the output of the multiplier 330 7
- x is available at the output of the block 332 4
- “1+x+x 2 /2!” (x positive) or “1+x 2 /2!” (x negative) is available at the output of the delay block 338 2 .
- x 3 /3! and thus x, is positive
- x 3 /3! and logic “0” are respectively present at the (+) and (−) outputs of the sign determiner 334 2 ; conversely, if x 3 /3!, and thus x, is negative, logic “0” and x 3 /3!
- x 5 /5! is available at the output of the multiplier 330 8
- “1+x+x 2 /2!+x 3 /3!” (x positive) or “1+x 2 /2!” (x negative) is available at the output of the adder 336 3
- x 4 /4! is available at the output of the delay block 338 3
- “0” (x positive) or “−x−x 3 /3!” (x negative) is available at the output of the adder 340 1 .
- the latency of the circuit 320 is eight clock cycles. Furthermore, if the adder 344 , while summing a positive and a negative floating-point number, generates an exception, the exception manager 86 ( FIG. 4 ) or the host processor 12 ( FIG. 1 ) may handle this exception using a conventional floating-point-exception routine.
- the adder 344 is located after (to the right in FIG. 16 ) the adder 336 or 340 that sums the highest generated term to a preceding term, and the operation continues as above.
- circuit 320 may replace the term-generating section 322 with a section similar to the term-generating section 272 of FIG. 14 , or may replace the chain of multipliers 330 with a power-of-x generator similar to the generator 300 of FIG. 15 .
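The two-path summation described above for the circuit 320 can be illustrated with a short software model. The Python sketch below assumes that every non-negative term (including the leading "1" and all even-power terms) is accumulated on a positive path, that every negative odd-power term is accumulated on a negative path, and that a final addition combines the two sums; the name `exp_two_path` and its parameters are illustrative only, and clock-cycle timing is not modeled.

```python
import math

def exp_two_path(x, highest_power=5):
    """Approximate e^x by summing positive and negative terms on separate paths."""
    positive_sum = 1.0   # the leading "1" travels down the positive path
    negative_sum = 0.0
    power_of_x = 1.0
    for n in range(1, highest_power + 1):
        power_of_x *= x                       # next power of x
        term = power_of_x / math.factorial(n)
        if term < 0.0:
            negative_sum += term              # negative odd-power term
        else:
            positive_sum += term              # even-power or positive odd-power term
    return positive_sum + negative_sum        # models the final adder
```

With the default of five power-of-x terms, the result tracks `math.exp(x)` closely for small |x|; the only step that mixes signs is the last addition, which mirrors the single point at which the text notes a floating-point exception might need handling.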
- FIG. 17 is a block diagram of the sign determiner 334 1 of FIG. 16 according to an embodiment of the invention, it being understood that the sign determiners 334 2 - 334 v are similar.
- the sign determiner 334 1 includes an input node 350 , a (−) output node 352 , a (+) output node 354 , a register 356 that stores a logic “0”, and demultiplexers 358 and 360 .
- the demultiplexer 358 includes a control node 362 coupled to receive a sign bit of the value at the input node 350 , a (−) input node 364 coupled to the input node 350 , a (+) input node 366 coupled to the register 356 , and an output node 368 coupled to the (−) output node 352 .
- the demultiplexer 360 includes a control node 370 coupled to receive the sign bit of the value at the input node 350 , a (−) input node 372 coupled to the register 356 , a (+) input node 374 coupled to the input node 350 , and an output node 376 coupled to the (+) output node 354 .
- the sign determiner 334 1 receives at its input node 350 a positive (+) value v, which, therefore, includes a positive sign bit.
- This sign bit is typically the most-significant bit of v, although the sign bit may be any other bit of v.
- the demultiplexer 360 couples v (including the sign bit) from its (+) input node 374 to its output node 376 , and thus to the (+) output node 354 of the sign determiner 334 1 .
- the demultiplexer 358 couples the logic “0” stored in the register 356 from the (+) input node 366 to the output node 368 , and thus to the (−) output node 352 of the sign determiner 334 1 .
- the sign determiner 334 1 receives at its input node 350 a negative (−) value v, which, therefore, includes a negative sign bit.
- the demultiplexer 358 couples v (including the sign bit) from its (−) input node 364 to its output node 368 , and thus to the (−) output node 352 of the sign determiner 334 1 .
- the demultiplexer 360 couples the logic “0” stored in the register 356 from the (−) input node 372 to the output node 376 , and thus to the (+) output node 354 of the sign determiner 334 1 .
- sign determiner 334 1 may replace the logic “0” register with a component, such as a pull-down resistor, coupled to a logic “0” voltage level, such as ground.
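The routing behavior of the sign determiner can be sketched in software as two selectors that share one control input, the sign bit of the incoming value. The Python model below assumes that the value is an IEEE 754 double whose most-significant bit is the sign bit; the helper names `sign_bit`, `select`, and `sign_determiner` are hypothetical and are not taken from the specification.

```python
import struct

def sign_bit(v):
    """Return the most-significant (sign) bit of v viewed as a 64-bit float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", v))
    return bits >> 63

def select(control, minus_input, plus_input):
    """Pass the (-) input when the sign bit is set, otherwise the (+) input."""
    return minus_input if control else plus_input

def sign_determiner(v):
    """Return (plus_output, minus_output) for the input value v."""
    zero = 0.0                          # models the stored logic "0"
    s = sign_bit(v)
    minus_output = select(s, v, zero)   # v when negative, else 0
    plus_output = select(s, zero, v)    # 0 when negative, else v
    return plus_output, minus_output
```

For example, `sign_determiner(2.5)` places 2.5 on the (+) output and 0.0 on the (−) output, and `sign_determiner(-1.5)` does the reverse, so exactly one summing path receives a nonzero value.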
- the peer vector machine 10 may be disposed on a single integrated circuit.
Abstract
A computer-based circuit-design tool includes a front end, an interpreter coupled to the front end, and a generator coupled to the interpreter. The front end receives symbols that define an algorithm, and the interpreter parses the algorithm into respective algorithm portions. The generator identifies a corresponding circuit template for each of the algorithm portions, each template defining a circuit for executing the respective algorithm portion, and interconnects the identified templates such that the interconnected templates define a circuit that is operable to execute the algorithm. As compared to prior design tools, this tool may decrease the time and effort required to design a circuit for instantiation on a programmable logic integrated circuit (PLIC) or on an application-specific integrated circuit (ASIC) by allowing one to construct the circuit from previously written templates that define previously tested and debugged circuits.
Description
- This application claims priority to U.S. Provisional Application Ser. Nos. 60/615,192, 60/615,157, 60/615,170, 60/615,158, 60/615,193, and 60/615,050, filed on Oct. 1, 2004, which are incorporated by reference.
- This application is related to U.S. patent application Ser. Nos. ______ (Attorney Docket Nos. 1934-21-3, 1934-24-3, 1934-25-3, 1934-26-3, 1934-31-3, 1934-35-3, and 1934-36-3), which have a common filing date and assignee and which are incorporated by reference.
- Electronics engineers often instantiate circuits, such as logic circuits, on programmable logic integrated circuits (PLICs) such as field-programmable gate arrays (FPGAs), and on application-specific integrated circuits (ASICs). Because an engineer typically configures with firmware the circuit components and interconnections inside of a PLIC, he can modify a circuit instantiated on the PLIC merely by modifying and reloading the firmware. An example of a computer architecture that exploits the ability to configure and reconfigure circuitry within a PLIC with firmware is described in U.S. Patent Publication No. 2004/0133763, which is incorporated herein by reference.
- But unfortunately, it is often difficult and time consuming to design a circuit for instantiation on a PLIC, and the level of design difficulty and the time required to complete the design often increase with the routing resources, component density, and component variety on the PLIC.
- Comparatively, when a software programmer writes source code for a software application, he can often save time by incorporating into the application previously written and debugged software objects from a software-object library. Suppose the programmer wishes to write a software application that solves for y in the following equation:
$y = x^2 + z^3$ (1)
Further suppose that a software-object library includes a first software object for squaring a value (here x), a second software object for cubing a value (here z), and a third software object for summing two values (here x² and z³). When the programmer incorporates pointers to these three objects in the source code, a compiler effectively merges these objects into the software application while compiling the source code. Therefore, the object library allows the programmer to write the software application in a shorter time and with less effort because the programmer does not have to “reinvent the wheel” by writing and debugging pieces of source code that respectively square x, cube z, and sum x² and z³. Furthermore, if the programmer needs to modify the software application, he can do so without modifying and re-debugging the first, second, and third software objects.
- In contrast, there are typically no time- or effort-saving equivalents of software objects available to a hardware engineer who wishes to design a circuit for instantiation on a PLIC; consequently, when a hardware engineer designs a circuit for instantiation on a PLIC, he typically must write the source code (e.g., in Verilog or in VHSIC Hardware Description Language (VHDL)) “from scratch.” Suppose that an engineer wishes to design a logic circuit that solves for y in equation (1). Because there are typically no hardware equivalents of the first, second, and third software objects described in the preceding paragraph, the engineer may write source code that describes first and second portions of a circuit for solving equation (1). The first circuit portion squares x, cubes z, and sums x² and z³, and the second circuit portion interfaces the first circuit portion to the external pins of the PLIC.
The engineer then compiles the source code with a PLIC design tool (typically provided by the PLIC manufacturer), which synthesizes and routes the circuit and then generates the configuration firmware that, when loaded into the PLIC, instantiates the circuit. Next, the engineer loads the firmware into the PLIC and debugs the instantiated circuit. Unfortunately, the synthesizing and routing steps are often not trivial, and may take a number of hours or even days depending upon the size and complexity of the circuit. And even if the engineer makes only a minor modification to a small portion of the circuit, he typically must repeat the synthesizing, routing, and debugging steps for the entire circuit.
- Another factor that may add to the time and effort that an engineer expends while designing a circuit for instantiation on a PLIC is that a PLIC design tool typically recognizes only hardware-specific source code. Suppose that a mathematician, who writes an equation using mathematical symbols (e.g., “+,” “−,” “≦,” “Σ,” “∫,” “∂,” “x²,” “z³,” and “√”), wishes to instantiate on a PLIC a circuit that solves for a variable in a complex equation that includes, e.g., partial derivatives and integrations. Because a PLIC design tool typically recognizes few, if any, mathematical symbols, the mathematician often must explain the equation and the desired operating parameters (e.g., latency and precision) of the circuit to a hardware engineer, who then translates the equation and operating parameters into source code that the design tool recognizes. These explanation and translation steps are often time consuming and difficult for the engineer, particularly where the equation is mathematically complex or the circuit has stringent operating parameters (e.g., high speed, high precision).
- Therefore, a need has arisen for a new methodology and for a new tool for designing a circuit for instantiation on a PLIC.
- According to an embodiment of the invention, a computer-based circuit design tool includes a front end, an interpreter coupled to the front end, and an integrator coupled to the interpreter. The front end receives symbols that define a logical expression, and the interpreter parses the expression into respective portions. The integrator identifies a corresponding circuit template for each of the expression portions, and logically interconnects the identified templates into a representation of an electronic circuit that is operable to execute the expression.
- As compared to prior circuit design tools, such a tool may shorten the time and reduce the effort that an engineer expends designing a circuit for instantiation on a PLIC by allowing the engineer to build the circuit from templates of previously designed and debugged circuits.
- According to a related embodiment of the invention, the front end of the design tool recognizes mathematical symbols so that one can design a PLIC circuit for executing a mathematical expression with little or no assistance from a hardware engineer.
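The flow summarized above (receive symbols, parse the expression into portions, and identify a template for each portion) can be sketched for equation (1). The Python example below is a loose illustration only: it uses Python's standard `ast` module as a stand-in for the interpreter, and the `TEMPLATE_LIBRARY` contents, the template names, and the function `select_templates` are all hypothetical rather than part of the described tool.

```python
import ast

# Hypothetical template library: expression portion -> template name.
TEMPLATE_LIBRARY = {
    ("pow", 2): "square_template",
    ("pow", 3): "cube_template",
    ("add", None): "adder_template",
}

def select_templates(expression):
    """Parse an expression and list one template per portion, inputs first."""
    selected = []

    def visit(node):
        if isinstance(node, ast.Name):
            return  # an input value such as x or z needs no template
        if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Pow)
                and isinstance(node.right, ast.Constant)):
            visit(node.left)
            selected.append(TEMPLATE_LIBRARY[("pow", node.right.value)])
        elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            visit(node.left)
            visit(node.right)
            selected.append(TEMPLATE_LIBRARY[("add", None)])
        else:
            raise ValueError("no template for this expression portion")

    visit(ast.parse(expression, mode="eval").body)
    return selected
```

Calling `select_templates("x**2 + z**3")` lists a squaring template, a cubing template, and an adding template in the order in which their outputs would be interconnected. A real tool would also have to account for operating parameters such as latency and precision, which this sketch omits.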
- FIG. 1 is a block diagram of a peer-vector computing machine having a pipelined accelerator that one can design with a design tool according to an embodiment of the invention.
- FIG. 2 is a block diagram of a pipeline unit that includes a PLIC and that can be included in the pipelined accelerator of FIG. 1 according to an embodiment of the invention.
- FIG. 3 is a diagram of the circuit layers that compose the hardware interface layer within the PLIC of FIG. 2 according to an embodiment of the invention.
- FIG. 4 is a block diagram of the circuitry that composes the interface adapter and framework services layers of FIG. 3 according to an embodiment of the invention.
- FIG. 5 is a diagram of a hardware-description file for a circuit that one can instantiate on a PLIC according to an embodiment of the invention.
- FIG. 6 is a block diagram of a PLIC circuit-template library according to an embodiment of the invention.
- FIG. 7 is a block diagram of a circuit-design system that includes a computer-based tool for designing a circuit using templates from the library of FIG. 6 according to an embodiment of the invention.
- FIG. 8 illustrates the parsing of a mathematical expression according to an embodiment of the invention.
- FIG. 9 illustrates a table of hardwired-pipeline library templates corresponding to the hardwired pipelines available for executing respective portions of the parsed mathematical expression of FIG. 8 according to an embodiment of the invention.
- FIG. 10 is a block diagram of a circuit that the tool of FIG. 7 generates from circuit templates downloaded from the library of FIG. 6 according to an embodiment of the invention.
- FIG. 11 is a block diagram of a circuit that the tool of FIG. 7 generates from circuit templates downloaded from the library of FIG. 6 according to another embodiment of the invention.
- FIG. 12 is a block diagram of a circuit that the tool of FIG. 7 generates from circuit templates downloaded from the library of FIG. 6 according to yet another embodiment of the invention.
- FIG. 13 is a block diagram of a circuit that the tool of FIG. 7 generates for implementing a function as a series expansion according to an embodiment of the invention.
- FIG. 14 is a block diagram of a circuit that the tool of FIG. 7 generates for implementing the function of FIG. 13 as a series expansion according to another embodiment of the invention.
- FIG. 15 is a block diagram of a power-of-x term generator that the tool of FIG. 7 generates as a replacement for the power-of-x multipliers of FIGS. 13 and 14 according to an embodiment of the invention.
- FIG. 16 is a block diagram of a circuit that the tool of FIG. 7 generates for implementing another function as a series expansion according to an embodiment of the invention.
- FIG. 17 is a block diagram of a sign determiner from FIG. 16 according to an embodiment of the invention.
Introduction
- A computer-based circuit design tool according to an embodiment of the invention is discussed below in conjunction with FIGS. 7-10.
- But first, an overview of concepts that are related to the design tool according to an embodiment of the invention is presented in conjunction with FIGS. 1-6. An understanding of these concepts should facilitate the reader's understanding of the design tool.
Overview of Concepts Related to Design Tool
- FIG. 1 is a schematic block diagram of a computing machine 10, which has a peer-vector architecture according to an embodiment of the invention. In addition to a host processor 12, the peer-vector machine 10 includes a pipelined accelerator 14, which is operable to process at least a portion of the data processed by the machine 10. Therefore, the host processor 12 and the accelerator 14 are “peers” that can transfer data messages back and forth. Because the accelerator 14 includes hardwired logic circuits instantiated on one or more PLICs, it executes few, if any, program instructions, and thus typically performs mathematically intensive operations on data significantly faster than a bank of computer processors can for a given clock frequency. Consequently, by combining the decision-making ability of the processor 12 and the number-crunching ability of the accelerator 14, the machine 10 has the same abilities as, but can often process data faster than, a conventional processor-based computing machine. Furthermore, as discussed below and in U.S. Patent Publication No. 2004/0136241, which is incorporated by reference, providing the accelerator 14 with a communication interface that is compatible with the interface of the host processor 12 facilitates the design and modification of the machine 10, particularly where the communication interface is an industry standard. And where the accelerator 14 includes multiple pipeline units (FIG. 2), providing each of these units with this compatible communication interface facilitates the design and modification of the accelerator, particularly where the communication interface is an industry standard. Moreover, the machine 10 may also provide other advantages as described in the following other patent publications, which are incorporated by reference: 2004/0133763; 2004/0181621; 2004/0170070; and 2004/0130927.
- Still referring to FIG. 1, in addition to the host processor 12 and the pipelined accelerator 14, the peer-vector computing machine 10 includes a processor memory 16, an interface memory 18, a bus 20, a firmware memory 22, an optional raw-data input port 24, an optional processed-data output port 26, and an optional router 31.
- The host processor 12 includes a processing unit 32 and a message handler 34, and the processor memory 16 includes a processing-unit memory 36 and a handler memory 38, which respectively serve as both program and working memories for the processing unit and the message handler. The processor memory 36 also includes an accelerator-configuration registry 40 and a message-configuration registry 42, which store respective configuration data that allow the host processor 12 to configure the functioning of the accelerator 14 and the structure of the messages that the message handler 34 sends and receives.
- The pipelined accelerator 14 includes at least one PLIC (FIG. 2) on which are disposed hardwired pipelines 44 1 - 44 n , which process respective data while executing few, if any, program instructions. The firmware memory 22 stores the configuration firmware for the PLIC(s) of the accelerator 14. If the accelerator 14 is disposed on multiple PLICs, these PLICs and their respective firmware memories may be disposed on multiple circuit boards that are often called daughter cards or pipeline units (FIG. 2). The accelerator 14 and pipeline units are discussed further in previously incorporated U.S. Patent Publication Nos. 2004/0136241, 2004/0181621, and 2004/0130927. The pipeline units are also discussed below in conjunction with FIGS. 2-4.
- Generally, in one mode of operation of the peer-vector computing machine 10, the pipelined accelerator 14 receives data from one or more software applications running on the host processor 12, processes this data in a pipelined fashion with one or more logic circuits that execute one or more mathematical algorithms, and then returns the resulting data to the application(s). As stated above, because the logic circuits execute few, if any, software instructions, they often process data one or more orders of magnitude faster than the host processor 12. Furthermore, because the logic circuits are instantiated on one or more PLICs, one can modify these circuits merely by modifying the firmware stored in the memory 52; that is, one need not modify the hardware components of the accelerator 14 or the interconnections between these components. The operation of the peer-vector machine 10 is further discussed in previously incorporated U.S. Patent Publication No. 2004/0133763, the functional topology and operation of the host processor 12 are further discussed in previously incorporated U.S. Patent Publication No. 2004/0181621, and the topology and operation of the accelerator 14 are further discussed in previously incorporated U.S. Patent Publication No. 2004/0136241.
-
FIG. 2 is a diagram of a pipeline unit 50 of the pipelined accelerator 14 of FIG. 1 according to an embodiment of the invention.
- The unit 50 includes a circuit board 52 on which are disposed the firmware memory 22, a platform-identification memory 54, a bus connector 56, a data memory 58, and a PLIC 60.
- As discussed above in conjunction with FIG. 1, the firmware memory 22 stores the configuration firmware that the PLIC 60 downloads to instantiate one or more logic circuits.
- The platform memory 54 stores a value that identifies the one or more platforms with which the pipeline unit 50 is compatible. Generally, a platform specifies a unique set of physical attributes that a pipeline unit may possess. Examples of these attributes include the number of external pins (not shown) on the PLIC 60, the width of the bus connector 56, the size of the PLIC, and the size of the data memory. Consequently, a pipeline unit 50 is compatible with a platform if the unit possesses all of the attributes that the platform specifies. So a pipeline unit 50 having a bus connector 56 with thirty-two bits is incompatible with a platform that specifies a bus connector with sixty-four bits. Some platforms may be compatible with the peer-vector machine 10 (FIG. 1), and others may be incompatible. Therefore, the platform identifier stored in the memory 54 may allow the host processor 12 (FIG. 1) to determine whether the pipeline unit 50 is compatible with the platforms supported by the machine 10. And where the pipeline unit 50 is so compatible, the platform identifier may also allow the host processor 12 to determine how to configure the PLIC 60 or other portions of the pipeline unit.
- The bus connector 56 is a physical connector that interfaces the PLIC 60, and perhaps other components of the pipeline unit 50, with the pipeline bus 20 of FIG. 1.
- The data memory 58 acts as a buffer for storing data that the pipeline unit 50 receives from the host processor 12 (FIG. 1) and for providing this data to the PLIC 60. The data memory 58 may also act as a buffer for storing data that the PLIC 60 generates for sending to the host processor 12, or as a working memory for the hardwired pipelines 44.
- Instantiated on the PLIC 60 are logic circuits that compose the hardwired pipeline(s) 44 and a hardware interface layer 62, which interfaces the hardwired pipelines to the external pins (not shown) of the PLIC 60, and which thus interfaces the pipelines to the pipeline bus 20 (via the connector 56), the firmware and platform-identification memories 22 and 54, and the data memory 58. Because the topology of the interface layer 62 is primarily dependent upon the attributes specified by the platform(s) with which the pipeline unit 50 is compatible, one can often modify the pipeline(s) 44 without modifying the interface layer. For example, if a platform with which the pipeline unit 50 is compatible specifies a thirty-two-bit bus, then the interface layer 62 provides a thirty-two-bit bus connection to the bus connector 60 regardless of the topology or other attributes of the pipeline(s) 44. Consequently, as discussed below in conjunction with FIGS. 7-10, an embodiment of the computer-based design tool allows one to design and debug the pipeline(s) 44 independently of the interface layer 62, and vice versa.
- Still referring to FIG. 2, alternate embodiments of the pipeline unit 50 are contemplated. For example, the memory 54 may be omitted, and the platform identifier may be stored in the firmware memory 22, or by a jumper-configurable or hardwired circuit (not shown).
- A pipeline unit similar to the unit 50 is discussed in previously incorporated U.S. Patent Publication No. 2004/0136241.
-
FIG. 3 is a diagram of the hardware layers that compose the hardware interface layer 62 within the PLIC 60 of FIG. 2 according to an embodiment of the invention. The hardware interface layer 62 includes three layers of circuitry that are instantiated on the PLIC 60: an interface-adapter layer 70, a framework-services layer 72, and a communication layer 74, which is hereinafter called a communication shell. The interface-adapter layer 70 includes circuitry, e.g., buffers and latches, that interfaces the framework-services layer 72 to the external pins (not shown) of the PLIC 60. The framework-services layer 72 provides a set of services to the hardwired pipeline(s) 44 via the communication shell 74. For example, the layer 72 may synchronize data transfer between the pipeline(s) 44, the pipeline bus 20 (FIG. 1), and the data memory 58 (FIG. 2), and may control the sequence(s) in which the pipeline(s) operate. The communication shell 74 includes circuitry, e.g., latches, that interfaces the framework-services layer 72 to the pipeline(s) 44. - Still referring to
FIG. 3 , alternate embodiments of the hardware-interface layer 62 are contemplated. For example, although the framework-services layer 72 is shown as isolating the interface-adapter layer 70 from thecommunication shell 74, the interface-adapter layer may, at least at some circuit nodes, be directly coupled to the communication shell. Furthermore, although thecommunication shell 74 is shown as isolating the interface-adapter layer 70 and the framework-services layer 72 from the hardwired pipeline(s) 44, the interface-adapter layer or the framework-services layer may, at least at some circuit nodes, be directly coupled to the pipeline(s). -
FIG. 4 is a schematic block diagram of the circuitry that composes the interface-adapter layer 70 and the framework-services layer 72 ofFIG. 3 according to an embodiment of the invention. - A
communication interface 80 and an optional industry-standard bus interface 82 compose the interface-adapter layer 70, and a controller 84, exception manager 86, and configuration manager 88 compose the framework-services layer 72. - The
communication interface 80 transfers data between a peer, such as the host processor 12 (FIG. 1) or another pipeline unit 50 (FIG. 2), and the firmware memory 22, the platform-identifier memory 54, the data memory 58, and the following components instantiated within the PLIC 60: the hardwired pipelines 44 (via the communication shell 74), the controller 84, the exception manager 86, and the configuration manager 88. If present, the optional industry-standard bus interface 82 couples the communication interface 80 to the bus connector 56. Alternatively, the interfaces 80 and 82 may be combined such that the functionality of the interface 82 is included within the communication interface 80. - The
controller 84 synchronizes the hardwired pipelines 44 1-44 n and monitors and controls the sequence in which they perform the respective data operations in response to communications, i.e., “events,” from other peers. For example, a peer such as thehost processor 12 may send an event to thepipeline unit 50 via thepipeline bus 20 to indicate that the peer has finished sending a block of data to the pipeline unit and to cause the hardwired pipelines 44 1-44 n to begin processing this data. An event that includes data is typically called a message, and an event that does not include data is typically called a “door bell.” - The
exception manager 86 monitors the status of the hardwired pipelines 44 1-44 n, thecommunication interface 80, thecommunication shell 74, thecontroller 84, and the bus interface 82 (if present), and reports exceptions to the host processor 12 (FIG. 1 ). For example, if a buffer (not shown) in thecommunication interface 80 overflows, then theexception manager 86 reports this to thehost processor 12. The exception manager may also correct, or attempt to correct, the problem giving rise to the exception. For example, for an overflowing buffer, theexception manager 86 may increase the size of the buffer, either directly or via the configuration manager 88 as discussed below. - The configuration manager 88 sets the “soft” configuration of the hardwired pipelines 44 1-44 n, the
communication interface 80, thecommunication shell 74, thecontroller 84, theexception manager 86, and the interface 82 (if present) in response to soft-configuration data from the host processor 12 (FIG. 1 ). As discussed in previously incorporated U.S. Patent Publication No. 2004/0133763, the “hard” configuration of a component within thePLIC 60 denotes the actual instantiation, on the transistor and circuit-block level, of the component, and the soft configuration denotes the physical parameters (e.g., data width, table size) of the instantiated component. That is, soft-configuration data is similar to the data that one can load into a register of a processor (not shown inFIG. 4 ) to set the operating mode (e.g., burst-memory mode) of the processor. For example, thehost processor 12 may send to thePLIC 60 soft-configuration data that causes the configuration manager 88 to set the number and respective priority levels of queues (not shown) within thecommunication interface 80. Theexception manager 86 may also send soft-configuration data that causes the configuration manager 88 to, e.g., increase the size of an overflowing buffer in thecommunication interface 80. - The
communication interface 80, optional industry-standard bus interface 82,controller 84,exception manager 86, and configuration manager 88 are further discussed in previously incorporated U.S. Patent Publication No. 2004/0136241. - Referring again to
FIG. 2 , although thepipeline unit 50 is disclosed as including only onePLIC 60, the pipeline unit may include multiple PLICs. For example, as discussed in previously incorporated U.S. Patent Publication No. 2004/0136241, thepipeline unit 50 may include two interconnected PLICs, where the circuitry that composes the interface-adapter layer 70 and framework-services layer 72 is instantiated on one of the PLICs, and the circuitry that composes thecommunication shell 74 and thehardwired pipelines 44 is instantiated on the other PLIC. -
FIG. 5 is a diagram of a hardware-description file 100 from which a conventional PLIC synthesizer and router tool (not shown) can generate the configuration firmware for thePLIC 60 ofFIGS. 2-4 according to an embodiment of the invention. Typically, the hardware-description file 100 includes templates that are written in a conventional hardware description language (HDL) such as Verilog® HDL. The top-down structure of thefile 100 resembles the top-down structure of software source code that incorporates software objects. Such a top-down structure for software source code provides at least two advantages. First, it allows a programmer to avoid writing and debugging source code for a function when a software object that performs the function has already been written and debugged. Second, it allows the programmer to change or add a function by modifying an existing object or writing a new object with little or no rewriting and debugging of the source code that incorporates the object. As discussed below, the top-down structure of thefile 100 provides similar advantages. For example, it allows one to incorporate in thefile 100 existing templates that define an already-debugged hardware-interface layer 62 (FIGS. 2-3 ). Furthermore, it allows one to change an existinghardwired pipeline 44 or to add to a circuit a newhardwired pipeline 44 with little or no rewriting and debugging of the templates that define thelayer 62. - The hardware-
description file 100 includes a top-level template 101, which includes respective top-level definitions 102, 104, and 106 of the interface-adapter layer 70, the framework-services layer 72, and the communication shell 74 (collectively the hardware-interface layer 62) of the PLIC 60 (FIGS. 2-4). The template 101 also defines the connections between the external pins (not shown) of the PLIC 60 and the interface-adapter layer 70 (and in some cases the framework-services layer 72), and also defines the connections between the framework-services layer (and in some cases the interface-adapter layer) and the communication shell 74. - The top-
level definition 102 of the interface-adapter layer 70 (FIGS. 3-4 ) incorporates an interface-adapter-layer template 108, which further defines the portions of the interface-adapter layer defined by the top-level definition 102. For example, suppose that the top-level definition 102 defines a data-input buffer (not shown) in terms of its input and output nodes. That is, suppose the top-level definition 102 defines the data-input buffer as a functional block having defined input and output nodes. Thetemplate 108 defines the circuitry that composes this functional buffer block, and defines the connections between this circuitry and the buffer input nodes and output nodes recited in the top-level definition 102. Furthermore, thetemplate 108 may incorporate one or more lower-level templates 109 that further define the data buffer or other components of the interface-adapter layer 70 recited in thetemplate 108. Moreover, these one or more lower-level templates 109 may each incorporate one or more even lower-level templates (not shown), and so on, until all portions of the interface-adapter layer 70 are defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool (not shown) recognizes. - Similarly, the top-
level definition 104 of the framework-services layer 72 (FIGS. 3-4 ) incorporates a framework-services-layer template 110, which further defines the portions of the framework-services layer defined by thedefinition 104. For example, suppose the top-level definition 104 defines a counter (not shown) in terms of its input and output nodes. Thetemplate 110 defines the circuitry that composes this counter, and defines the connections between this circuitry and the counter input and output nodes recited by the top-level definition 104. Furthermore, thetemplate 110 may incorporate a hierarchy of one or more lower-level templates 111 and even lower-level templates (not shown), and so on, such that all portions of the framework-services layer 72 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes. For example, suppose thetemplate 110 defines the counter as including a count-up/down-selector circuit having input and output nodes. Thetemplate 110 may incorporate a lower-level template 111 that defines the circuitry within the selector circuit and defines the connections between this circuitry and the selector circuit's input and output nodes defined by thetemplate 110. - Likewise, the top-
level definition 106 of the communication shell 74 (FIGS. 3-4 ) incorporates a communication-shell template 112, which further defines the portions of the communication shell defined by thedefinition 106 and which also includes a top-level definition 113 of the hardwired pipeline(s) 44 disposed within the communication shell. For example, thedefinition 113 defines the connections between thecommunication shell 74 and the hardwired pipeline(s) 44. - The top-
level definition 113 of the hardwired pipeline(s) 44 (FIGS. 3-4 ) incorporates one or more hardwired-pipeline templates 114, which further define the portions of the hardwired pipeline(s) 44 defined by thedefinition 113. The template ortemplates 114 may each incorporate a hierarchy of one or more lower-level templates 115 and even lower-level templates (not shown) such that all portions of the respective pipeline(s) 44 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes. - Moreover, the communication-
shell template 112 may incorporate a hierarchy of one or more lower-level templates 116 and even lower-level templates (not shown) such that all portions of thecommunication shell 74 other than the hardwired pipeline(s) 44 are, at some level of the hierarchy, defined in terms of circuit components (e.g., flip-flops, logic gates) that the PLIC synthesizing and routing tool recognizes. - Still referring to
FIG. 5, a configuration template 118 provides definitions for one or more parameters having values that one can set to configure the circuitry that the templates and lower-level templates define. For example, suppose that the bus interface 82 (FIG. 4) is configurable to have either a thirty-two-bit or a sixty-four-bit interface with the bus connector 56. The configuration template 118 defines a parameter BUS-WIDTH, the value of which determines the width of the interface between the interface 82 and the connector 56. For example, BUS-WIDTH=0 configures the interface 82 to have a thirty-two-bit interface, and BUS-WIDTH=1 configures the interface 82 to have a sixty-four-bit interface. Examples of other parameters that may be configurable include the depth of a first-in-first-out (FIFO) data buffer (not shown) disposed within the framework-services layer 72 (FIGS. 2-4), the lengths of messages received and transmitted by the interface-adapter layer 70, and the precision and data structure (e.g., integer, floating-point) of the hardwired pipeline(s) 44. - One or more of the
templates incorporate the parameters defined in the configuration template 118, so that the synthesizer and router tool configures the interface-adapter layer 70, the framework-services layer 72, the communication shell 74, and the hardwired pipeline(s) 44 (FIGS. 3-4) according to the values in the template 118 during the synthesis of this circuitry. Consequently, to reconfigure the circuit parameters represented by the parameters in the configuration template 118, one need only modify the values of these parameters in the template 118, and then rerun the synthesizer and router tool on the file 100. Alternatively, if one or more of the parameters in the configuration template 118 can be sent to the PLIC as soft-configuration data after instantiation of the circuit, then one can modify the corresponding circuit parameters by merely modifying the soft-configuration data. Therefore, according to this alternative, one may avoid rerunning the synthesizer and router tool on the file 100. Moreover, templates (e.g., 101, 108, 109, 110, 111, 112, 114, 115, and 116) that do not incorporate settable parameters such as those provided by the configuration template 118 are sometimes called modules or entities, and are typically lower-level templates that include Boolean expressions that a synthesizer and router tool (not shown) converts into circuitry for implementing the expressions. - Alternate embodiments of the hardware-
description file 100 are contemplated. For example, although described as defining circuitry for instantiation on a PLIC, thefile 100 may define circuitry for instantiation on an ASIC. -
FIG. 6 is a block diagram of a library 120 that stores PLIC circuit templates, such as the templates of FIG. 5, according to an embodiment of the invention. - The
library 120 has m+1 sections: m sections 122 1-122 m for the respective m platforms that the library supports, and asection 124 for the hardwired-pipelines 44 (FIGS. 2-4 ) that the library supports. - For example purposes, the library section 122 1 is discussed in detail, it being understood that the other library sections 122 2-122 m are similar.
- The library section 122 1 includes a top-
level template 101 1, which is similar in structure to the template 101 of FIG. 5, and which thus includes top-level definitions of the versions of the interface-adapter layer 70, the framework-services layer 72, and the communication shell 74 that are compatible with the platform m=1. - In this embodiment, we assume that there is only one version of the interface-
adapter layer 70 and one version of the framework-services layer 72 available for each platform m, and, therefore, that the library section 122 1 includes only one interface-adapter-layer template 108 1 and only one framework-services-layer template 110 1. But in an embodiment that includes multiple versions of the interface-adapter layer 70 and multiple versions of the framework-services layer 72 for each platform m, the library section 122 1 would include multiple interface-adapter-layer and framework-services-layer templates. - The library section 122 1 also includes n communication-shell templates 112 1,1-112 1,n, which respectively correspond to the hardwired-pipeline templates 114 1-114 n in the
library section 124. As stated above in conjunction withFIG. 3 , thecommunication shell 74 interfaces a hardwired pipeline or hardwired-pipelines 44 to the framework-services layer 72. Because eachhardwired pipeline 44 is different and typically has different interface specifications, thecommunication shell 74 is typically adapted for each hardwired pipeline. Consequently, in this embodiment, one provides design adjustments to create a unique version of thecommunication shell 74 for eachhardwired pipeline 44. The designer provides these design adjustments by writing a unique communication-shell template 112 for each hardwired pipeline. Of course the group of communication-shell templates 112 1,1-112 1,n corresponds only to the version of the framework-services layer 72 that is defined by thetemplate 110 1; consequently, if there are multiple versions of the framework-services layer 72 that are compatible with the platform m=1, then the library section 122 1 includes a respective group of n communication-shell templates 112 for each version of the framework-services layer. - In addition, the library section 122 1 includes a configuration template 118 1, which defines configuration constants having designer-selectable values as discussed above in conjunction with the configuration template 118 of
FIG. 5 . - Furthermore, each template within the library section 122 1 includes, or is associated with, a respective description 126 1-134 1. The descriptions 126 1-132 1,n describe the operational and other parameters of the circuitry that the
respective templates define. The design tool discussed below in conjunction with FIGS. 7-11 uses the descriptions 126 1-134 1 to design and simulate a circuit that includes a combination of the hardwired pipelines 44 1-44 n, which are respectively defined by the templates 114 1-114 n. Examples of parameters that the descriptions 126 1-132 1,n may describe include the width of the data bus and the depths of buffers that the circuit defined by the corresponding template includes, the latency of the circuit, and the precision of the values received and generated by the circuit. Furthermore, an example of a settable parameter that the description 134 1 may describe, together with its selectable values, is BUS-WIDTH, which represents the width of the interface between the communication interface 80 and the bus connector 56 (FIG. 4): BUS-WIDTH=0 sets the bus width to thirty-two bits, and BUS-WIDTH=1 sets the width to sixty-four bits. - Each of the descriptions 126 1-134 1 may be embedded within the
respective template to which the description corresponds. For example, the corresponding description may be embedded within the template 108 1 as extensible markup language (XML) tags or comments that are readable by both a human and the tool discussed below in conjunction with FIGS. 7-11. - Alternatively, each description 126 1-134 1 may be disposed in a separate file that is linked to the template to which the description corresponds, and this file may be written in a language other than XML. For example, the
description 126 1 may be disposed in a file that is linked to the top-level template 101 1. - The section 122 1 of the
library 120 also includes a description 136 1, which describes the parameters of the platform m=1. The design tool discussed below in conjunction withFIGS. 7-11 may use the description 136 1 to determine which platforms thelibrary 120 supports. Examples of parameters that the description 136 1 may describe include 1) for each interface, the message specification, which lists the transmitted variables and the constraints for those variables, and 2) a behavior specification and any behavior constraints. Messages that the host processor 12 (FIG. 1 ) sends to the pipeline units 50 (FIG. 2 ) and that the pipeline units send among themselves are further discussed in previously incorporated U.S. Patent Publication No. 2004/0181621. Examples of other parameters that the description 136 1may describe include the size and resources (e.g., the number of multipliers and the amount of available memory) of the PLIC 60 (FIGS. 2-4 ). Furthermore, the platform description 136 1 may be written in XML or in another language. - Still referring to
FIG. 6, the section 124 of the library 120 includes n hardwired-pipeline templates 114 1-114 n, which each define a respective hardwired pipeline 44 1-44 n (FIGS. 2-4). As discussed above in conjunction with FIG. 5, because the templates 114 1-114 n are platform independent (the corresponding communication-shell templates 112 m,1-112 m,n define the specified interface to the interface-adapter and framework-services layers 70 and 72 of FIGS. 3-4), the library 120 stores only one template 114 for each hardwired pipeline 44 (FIGS. 2-4). That is, each hardwired pipeline 44 does not require a separate template 114 for each platform that the library 120 supports. As discussed above, an advantage of this top-down design is that one need only create a single template 114 to define a hardwired pipeline 44, not m templates. - Furthermore, each hardwired-
pipeline template 114 includes, or is associated with, a respective description 138 1-138 n, which describes the parameters of the hardwired-pipeline 44 that the template defines. Like the descriptions 126 1-134 1 discussed above, the design tool discussed below in conjunction withFIGS. 7-11 uses the descriptions 138 to design and simulate a circuit that includes a combination of the hardwired pipelines 44 1-44 n, which are respectively defined by the templates 114 1-114 n. Examples of parameters that the descriptions 138 1-138 n may describe include the type (e.g., floating point or integer) and precision of the data that the correspondinghardwired pipeline 44 can receive and generate, and the latency of the pipeline. Also like the descriptions 126 1-134 1, each of the descriptions 138 1-138 n may be embedded within the respective template 114 1-114 n to which the description corresponds as, e.g., XML tags, or may be disposed in a separate file that is linked to the template to which the description corresponds. - Referring again to the library section 122 1, this section also includes a description 140 of the one or more available pipeline accelerators 14 (
FIG. 1 ) that support the platform m=1. More specifically, the description 140 describes the resources that each of thepipeline accelerators 14 includes. For example, the description 140 may indicate that oneavailable accelerator 14 includes only one pipeline unit 50 (FIG. 2 ), while another available accelerator includes five pipeline units. The description 140 may be written in XML or in another language. - Still referring to
FIG. 6 , alternate embodiments of thelibrary 120 are contemplated. For example, instead of each template within each library section 122 1-122 m being associated with a respective description 126-134, each library section 122 1-122 m may include a single description that describes all of the templates within that library section. For example, this single description may be embedded within or linked to the top-level template 101 or the configuration template 118. Furthermore, although each library section 122 1-122 m is described as including a respective communication-shell template 112 for each hardwired-pipeline template 114 in thelibrary section 124, each section 122 may include fewer communication-shell templates, at least some of which are compatible with, and thus correspond to, more than onepipeline template 114. In an extreme, each library section 122 1-122 m may include only a single communication-shell template 112, which is compatible with all of the hardwired-pipeline templates 114 in thelibrary section 124. In addition, thelibrary section 124 may include respective versions of eachpipeline template 114 for each communication-shell template 112 in the library sections 122 1-122 m. -
FIG. 7 is a block diagram of a circuit-design system 150, which includes a computer-basedsoftware tool 152 for designing a circuit using templates from thelibrary 120 ofFIG. 6 according to an embodiment of the invention. By using library templates, thetool 152 allows one to design a circuit that includes a combination of one or more previously designed and debugged hardware-interface layers 62 (FIG. 2 ) and hardwired pipelines 44 (FIGS. 2-4 ). Because another has already tested and debugged the one ormore layers 62 andpipelines 44, thetool 152 may significantly decrease the time required for one to design such a combination circuit as compared to a conventional design progression. Furthermore, where one wants to design a circuit for executing an algorithm, thetool 152 allows him to define the circuit with an expression of conventional mathematical symbols, where the expression defines the algorithm; consequently, one having little or no experience in circuit design can use the tool to design a circuit for executing an algorithm. - The
system 150 includes a processor (not shown) for executing the software code that composes thetool 152. Consequently, in response to the code, the processor performs the functions that are attributed to thetool 152 in the discussion below. But for clarity of explanation, thetool 152, not the processor, is described as performing the actions. - In addition to the processor, the
system 150 includes aninput device 154, adisplay device 155, and thelibrary 120 ofFIG. 6 . Theinput device 154, which may include a keyboard and a mouse, allows one to provide to thetool 152 information that describes an algorithm and that describes a circuit for executing the algorithm. Such information may include an expression of mathematical symbols, circuit parameters (e.g., buffer width, latency), operation exceptions (e.g., a divide by zero), and the platform on which one wishes to instantiate the circuit. And as described below, thedevice 155 displays the input information and other information, and thelibrary 120 includes the templates that thetool 152 uses to build the circuit and to generate a file that defines the circuit. - The
tool 152 includes a symbolic-mathfront end 156, aninterpreter 158, agenerator 160 for generating afile 162 that defines a circuit, and asimulator 164. - The
front end 156 receives from the input device 154 the mathematical expression that defines the algorithm that the circuit is to execute and other design information, and converts this information into a form that is readable by the interpreter 158. To allow one to define a circuit in terms of the mathematical expression that defines the algorithm that the circuit is to execute, in one embodiment the front end 156 includes a web browser that accepts XML with a schema for Mathematical Markup Language (MathML). MathML is a software standard that allows one to enter expressions using conventional mathematical symbols. The schema of MathML is a conventional plug-in that imparts to a web browser this same ability, i.e., the ability to enter expressions using mathematical symbols. Alternatively, the front end 156 may utilize another technique for allowing one to define a circuit using a mathematical expression. An example of such a technique is the one used by the conventional software mathematical-expression solver MathCAD. Furthermore, as discussed below, one may enter the identity of a platform or pipeline accelerator 14 (FIG. 1) on which he wants the circuit instantiated, and may enter test data with which the simulator 164 will simulate the operation of the circuit. Moreover, one may enter valid-range constraints for any variables within the entered mathematical expression and constraints on execution of the expression, and may specify the action(s) to be taken if the constraints are violated. For example, because −1 ≤ sin(x) ≤ 1 for all values of x, for an expression that includes sin(x), one may enter this constraint, and specify that any data generated from a value of sin(x) outside of this range is to be disregarded. Or, because division by zero is undefined, one may specify that data generated in response to a division by zero is to be disregarded.
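The constraint handling described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the names `RangeConstraint` and `evaluate_with_constraints`, and the choice of `None` to mark a disregarded result, are assumptions made only for this illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RangeConstraint:
    """Hypothetical valid-range constraint on an intermediate value."""
    low: float
    high: float

    def allows(self, value: float) -> bool:
        return self.low <= value <= self.high


def evaluate_with_constraints(
    xs: Iterable[float],
    numerator: float,
    sine_constraint: RangeConstraint,
) -> List[Optional[float]]:
    """Evaluate numerator / sin(x) for each x.

    A result is disregarded (recorded as None) when sin(x) falls outside
    the designer-entered range or when the division would be by zero.
    """
    results: List[Optional[float]] = []
    for x in xs:
        s = math.sin(x)
        if not sine_constraint.allows(s):
            results.append(None)   # range constraint violated
        elif s == 0.0:
            results.append(None)   # division by zero
        else:
            results.append(numerator / s)
    return results


if __name__ == "__main__":
    constraint = RangeConstraint(-1.0, 1.0)
    print(evaluate_with_constraints([0.0, math.pi / 2], 2.0, constraint))
```

Marking a violating result rather than raising an error mirrors the "disregard" action in the text; a different designer-specified action, such as halting, would replace the two `None` branches.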
Thefront end 156 then converts all of the entered information into a format, such as HDL, that is compatible with theinterpreter 158. Moreover, as discussed above, thefront end 156 may cause thedevice 155 to display the input information and other related information. For example, thefront end 156 may cause thedevice 155 to display the mathematical expression that the designer enters to define the algorithm to be executed by the circuit. - The
interpreter 158 parses the information from the front end 156 and determines: 1) whether the library 120 includes templates 114 (FIG. 6) defining hardwired pipelines 44 (FIGS. 2-4) that, when combined, can execute the algorithm entered by the designer, and 2) if the answer to (1) is “yes,” which, if any, available pipeline accelerators 14 (FIG. 1) described by the description 140 in the library 120 have sufficient resources to instantiate a circuit that can execute the algorithm. For example, suppose the algorithm includes the mathematical operation √v. If the library 120 does not include a template 114 (FIG. 6) defining a hardwired pipeline 44 (FIGS. 2-4) that calculates the square root of a value, then the interpreter 158 determines that the tool 152 cannot generate a file 162 that defines a circuit for executing the algorithm. Furthermore, suppose that the circuit for executing the algorithm requires the resources of at least five PLICs 60 (FIGS. 2-4). If the description 140 indicates that the available accelerators 14 each have only three pipeline units 50 (FIG. 2), and thus each have only three PLICs 60, then the interpreter 158 determines that even though the tool 152 may be able to generate a file 162 that defines a circuit for executing the algorithm, one cannot implement this circuit on an available accelerator. The interpreter 158 makes a similar determination if the designer indicates that he wants the algorithm executed by a circuit having a sixty-four-bit bus width, but the available platforms support only a thirty-two-bit bus width.
In situations where the interpreter 158 determines that the tool 152 cannot generate a circuit for executing the desired algorithm or that one cannot implement the circuit on an existing platform and/or accelerator 14, the interpreter 158 causes the device 155 to display an appropriate error message (e.g., “no library template for instantiating √v,” “insufficient PLIC resources,” “bus-width not supported”). Furthermore, where the designer identifies a platform or accelerator 14 on which he desires to instantiate the resulting circuit, the interpreter 158 determines whether the circuit can be instantiated on the identified platform or accelerator. But if the circuit cannot be so instantiated, the interpreter 158 may determine that the circuit can be instantiated on another platform or accelerator, and thus may so inform the designer with an appropriate message via the display device 155. This allows the designer the choice of instantiating the circuit on another platform or accelerator 14. - If the
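The two-stage determination described above can be sketched in Python as follows. This is an illustrative assumption rather than the interpreter's actual logic: the `Accelerator` record, the `check_feasibility` function, the representation of operations as strings, and the message wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Accelerator:
    """Hypothetical summary of one accelerator's resources."""
    name: str
    plic_count: int
    bus_width_bits: int


def check_feasibility(
    required_ops: Iterable[str],
    library_ops: Iterable[str],
    plics_needed: int,
    bus_width_bits: int,
    accelerators: List[Accelerator],
) -> Tuple[List[str], List[str]]:
    """Return (names of usable accelerators, error messages)."""
    errors: List[str] = []

    # Stage 1: every operation in the algorithm needs a pipeline template.
    missing = sorted(set(required_ops) - set(library_ops))
    if missing:
        for op in missing:
            errors.append(f"no library template for instantiating {op}")
        return [], errors

    # Stage 2: look for accelerators with enough PLICs and the right bus width.
    if not any(a.plic_count >= plics_needed for a in accelerators):
        errors.append("insufficient PLIC resources")
    if not any(a.bus_width_bits == bus_width_bits for a in accelerators):
        errors.append("bus-width not supported")

    usable = [
        a.name
        for a in accelerators
        if a.plic_count >= plics_needed and a.bus_width_bits == bus_width_bits
    ]
    if not usable and not errors:
        # Each requirement is met by some accelerator, but by no single one.
        errors.append("no single accelerator meets all requirements")
    return usable, errors


if __name__ == "__main__":
    accs = [Accelerator("acc-A", plic_count=3, bus_width_bits=32)]
    print(check_feasibility({"add", "sqrt"}, {"add", "mul"}, 1, 32, accs))
    print(check_feasibility({"add"}, {"add", "mul"}, 5, 32, accs))
```

Returning early when a template is missing reflects the ordering in the text, where the resource check is made only if the library can supply every required pipeline.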
interpreter 158 determines that thelibrary 120 includes a sufficient number of hardwired-pipeline templates 114 (FIG. 6 ) to define a circuit that can execute the desired algorithm, and also determines that the circuit can be instantiated on an available platform and accelerator 14 (FIG. 1 ), then the interpreter provides to thefile generator 160 the identities of the hardwired-pipeline templates 114 that correspond to portions of the algorithm. - The
file generator 160 combines the hardwired pipelines 44 (FIGS. 2-4 ) defined by the identified hardwired-pipeline templates 114 such that the combination forms a circuit that can execute the algorithm. - The
generator 160 then generates thefile 162, which defines the circuit for executing the algorithm in terms of the hardwired pipelines 44 (FIGS. 2-4 ) and the hardware-interface layers 62 (FIG. 2 ) that compose the circuit, the PLIC(s) 60 (FIGS. 2-3 ) on which the pipelines are disposed, and the interconnections between the pipelines (if multiple pipelines on a PLIC) and/or between the PLICs (if the pipelines are disposed on more than one PLIC). - Next, the host processor 12 (
FIG. 1 ) can use thefile 162 to instantiate on the pipeline accelerator 14 (FIG. 1 ) the defined circuit as discussed in previously incorporated U.S. patent application Ser. No. (Attorney Docket No. 1934-25-3). Alternatively, also as discussed in U.S. patent application Ser. No. (Attorney Docket No. 1934-25-3), thehost processor 12 may instantiate some or all portions of the defined circuit in software executed by theprocessing unit 32. Or, one can instantiate the circuit defined by thefile 162 in another manner. - The
simulator 164 receives the file 162 from the generator 160 and receives from the front end 156 designer-entered test data, such as a test vector, designer-entered constraint data, and a designer-entered exception-handling protocol, and then simulates operation of the circuit defined by the file 162. The simulator 164 also gathers parameter information (e.g., precision, latency) from the description files 138 (FIG. 6) that correspond to the hardwired-pipeline templates 114 that define the pipelines 44 that compose the circuit. The simulator 164 may retrieve this parameter information directly from the library 120, or the generator 160 may include this parameter information in the file 162. -
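To make the contents of a circuit-definition file such as the file 162 more concrete, the following sketch serializes PLICs, the templates assigned to each PLIC, and the PLIC-to-PLIC interconnections. It assumes, purely for illustration, an XML encoding; the element names (`circuit`, `plic`, `template`, `link`), the attribute names, and the `build_circuit_file` helper are hypothetical and do not reflect any schema given in the description.

```python
import xml.etree.ElementTree as ET


def build_circuit_file(platform, plics, links):
    """Serialize a circuit description to an XML string.

    platform: platform identifier (hypothetical attribute).
    plics:    dict mapping a PLIC id to the list of template ids placed on it.
    links:    list of (source PLIC id, destination PLIC id) interconnections.
    """
    root = ET.Element("circuit", platform=str(platform))
    for plic_id, templates in plics.items():
        plic = ET.SubElement(root, "plic", id=str(plic_id))
        for template_id in templates:
            ET.SubElement(plic, "template", ref=str(template_id))
    for src, dst in links:
        ET.SubElement(root, "link", src=str(src), dst=str(dst))
    return ET.tostring(root, encoding="unicode")


# Illustrative two-PLIC fragment: a multiplier PLIC feeding an adder PLIC.
xml_text = build_circuit_file(
    platform=1,
    plics={"60_5": ["112_1,5", "114_5"], "60_7": ["112_1,6", "114_6"]},
    links=[("60_5", "60_7")],
)
print(xml_text)
```

A simulator or host processor could read such a file back with `ET.fromstring` and walk the `plic` and `link` elements; a tagged text format keeps the file readable by both people and tools.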
FIG. 8 illustrates the parsing of a symbolic mathematical expression by the interpreter 158 according to an embodiment of the invention. In other words, the syntax of the design language is the same as that used by mathematicians for writing algebraic equations. The explanations that follow show how a symbolic mathematical expression is a sufficient syntax for defining the hardwired pipelines 44 from a simple set of circuit primitives. -
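As a rough illustration of parsing a symbolic expression against a template library, the sketch below walks an expression tree from the top down. At each level it first looks for a single template that covers the whole subexpression, and only if none exists does it split the subexpression into its root operation and operands. The nested-tuple encoding, the dictionary-based library, the template labels, and the `select_templates` function are hypothetical choices made for this example.

```python
def select_templates(node, library, chosen=None):
    """Top-down selection of (portion, template) pairs for an expression tree.

    node:    a variable name (str) or a tuple (op, operand, ...);
             powers are written ("pow", base, exponent).
    library: dict mapping an operation key, or a whole subtree, to a template id.
    """
    if chosen is None:
        chosen = []
    if isinstance(node, str):          # a bare input variable needs no pipeline
        return chosen
    if node in library:                # one template covers the whole subtree
        chosen.append((node, library[node]))
        return chosen
    op, *operands = node
    key = (op, operands[1]) if op == "pow" else op
    if key not in library:
        raise LookupError(f"no library template for {key!r}")
    chosen.append((key, library[key]))
    # The exponent of a power is a constant, so only its base is parsed further.
    for operand in (operands[:1] if op == "pow" else operands):
        select_templates(operand, library, chosen)
    return chosen


# y = sqrt(x^4 * cos(z) + z^3 * sin(x)), written as a tree.
EQ2 = ("sqrt", ("add", ("mul", ("pow", "x", 4), ("cos", "z")),
                       ("mul", ("pow", "z", 3), ("sin", "x"))))

# Hypothetical library contents; labels are illustrative only.
LIBRARY = {"sin": "114_1", ("pow", 3): "114_2", ("pow", 4): "114_3",
           "cos": "114_4", "mul": "114_5", "add": "114_6", "sqrt": "114_7"}

for portion, template in select_templates(EQ2, LIBRARY):
    print(portion, "->", template)
```

With this library the walk yields eight portions but only seven distinct templates, because the multiplier template is selected twice.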
FIG. 9 illustrates a table of hardwired-pipeline templates 114, which correspond to the hardwired pipelines 44 (FIGS. 2-4) that the interpreter 158 (FIG. 7) identifies for executing portions of the parsed algorithm (FIG. 8) according to an embodiment of the invention. - Referring to
FIGS. 5-9, the operation of the tool 152 is discussed according to an embodiment of the invention. - Suppose that one wishes to design a circuit that solves for a value y, which equals a mathematical expression according to the following equation:
$$y = \sqrt{x^{4}\cos(z) + z^{3}\sin(x)} \qquad (2)$$
Also suppose that x, y, and z are thirty-two-bit floating-point values. - Using the
input device 154, the designer enters equation (2) into the front end 156 of the tool 152 by entering the following sequence of mathematical symbols: “√”, “$x^4$”, “·”, “cos(z)”, “+”, “$z^3$”, “·”, and “sin(x)”. The designer also enters information specifying the input and output message specifications, for example indicating that x, y, and z are thirty-two-bit floating-point values. The designer may also enter information indicating desired operating parameters, such as the desired latency, in clock cycles, from inputs x and z to output y, and the desired types and precision of any intermediate values, such as cos(z) and sin(x), generated during the calculation of y. Furthermore, the designer may enter information that identifies a desired platform or pipeline accelerator 14 (FIG. 1) on which he wants the circuit instantiated. Moreover, the designer may specify the accuracy of any mathematical approximations that the tool 152 may make. For example, if the tool 152 approximates cos(z) using a Taylor series expansion, then by specifying the accuracy of this approximation, the designer effectively specifies the number of terms needed in the expansion. Alternatively, the designer may directly specify the number of terms in the expansion. The implementation of a function as a Taylor series expansion is further described below in conjunction with FIGS. 13-17. - The
front end 156 converts these mathematical symbols and the other information into a format compatible with the interpreter 158 if this information is not already in a compatible format. - Next, the
interpreter 158 determines whether any of the hardwired-pipeline templates 114 in the library 120 defines a hardwired pipeline 44 that can solve for y in equation (2) within the specified behavior and operating parameters and that can be instantiated within the desired platform and on the desired pipeline accelerator 14 (FIG. 1). - If the
library 120 does include such a template 114, then the interpreter 158 informs the designer, via the display device 155, that a conventional FPGA synthesizing and routing tool can generate firmware for instantiating this hardwired pipeline 44 from the identified template 114, the corresponding communication-shell template 112, and the corresponding top-level template 101. - If, however, the
library 120 includes no template 114 that defines a hardwired pipeline 44 that can solve for y in equation (2), then the interpreter 158 parses the equation (2) into portions, and determines whether the library includes templates 114 that define hardwired pipelines 44 for executing these portions within the specified behavior, operating parameters, and platform and on the specified pipeline accelerator 14 (FIG. 1). - To identify a circuit that can solve for y in equation (2) but that includes the fewest number of
hardwired pipelines 44, the interpreter 158 parses the equation (2) according to a top-down parsing sequence as discussed below. Typically, this top-down parsing sequence corresponds to the known algebraic laws for the order of operations. - First, the
interpreter 158 parses the equation (2) into the following two portions: “√”, which is portion 170 in FIG. 8, and “$x^4\cos(z)+z^3\sin(x)$”, which is portion 172. - If the
interpreter 158 determines that the library 120 includes at least two hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing the portions 170 and 172, then the interpreter provides the identities of these templates to the file generator 160. - In this example, however, the
interpreter 158 determines that although the library 120 includes a hardwired-pipeline template 114 that defines a pipeline 44 for executing the square-root operation 170 of equation (2), the library includes no hardwired-pipeline template that defines a pipeline for executing the portion 172. - Next, the
interpreter 158 parses the portion 172 of equation (2). Specifically, the interpreter 158 parses the portion 172 into the following three respective portions: 174 (“$x^4\cos(z)$”), 176 (“+”), and 178 (“$z^3\sin(x)$”). - If the
interpreter 158 determines that the library 120 includes at least three hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing the portions 174, 176, and 178, then the interpreter provides the identities of these templates to the file generator 160. - In this example, however, the
interpreter 158 determines that although the library 120 includes a hardwired-pipeline template 114 that defines a hardwired pipeline 44 for executing the summing operation 176 of equation (2), the library includes no templates 114 that define hardwired pipelines for executing the portions 174 and 178. - Next, the
interpreter 158 parses the portions 174 and 178. Specifically, the interpreter 158 parses the portion 174 into three portions 180 (“$x^4$”), 182 (“·”), and 184 (“cos(z)”), and parses the portion 178 into three portions 186 (“$z^3$”), 188 (“·”), and 190 (“sin(x)”). - If the
interpreter 158 determines that the library 120 does not include hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing each of the portions 180, 182, 184, 186, 188, and 190, then the interpreter displays via the device 155 an error message indicating that the library does not support a circuit that can solve for y in equation (2). In one embodiment of the invention, however, the library 120 includes hardwired-pipeline templates 114 that provide the primitive operations for multiplication and for raising variables to a power (e.g., cubing a value by using two multipliers in sequence) for single- or double-precision floating-point data types, and for data-type conversion. Also in this embodiment, the tool 152 recognizes common factors, for example that x is a factor of $x^3$ if $\sin(x^3)$ was needed instead of sin(x), and generates circuitry to provide these common factors from chained multipliers. - In this example, however, the
interpreter 158 determines that the library 120 includes hardwired-pipeline templates 114 that define hardwired pipelines 44 for respectively executing each of the portions 180, 182, 184, 186, 188, and 190. - Then, the
interpreter 158 provides to the file generator 160 the identities of all the hardwired-pipeline templates 114 that define the hardwired pipelines 44 for executing the following eight portions of equation (2): 170 (“√”), 176 (“+”), 180 (“$x^4$”), 182 (“·”), 184 (“cos(z)”), 186 (“$z^3$”), 188 (“·”), and 190 (“sin(x)”). - Referring to
FIGS. 6-10, the file generator 160 generates a table 192 (FIG. 9) of the hardwired-pipeline templates 114 identified by the interpreter 158, and displays this table via the device 155. In a first column 194, the table 192 lists the portions 170 (“√”), 176 (“+”), 180 (“$x^4$”), 182 (“·”), 184 (“cos(z)”), 186 (“$z^3$”), 188 (“·”), and 190 (“sin(x)”) of equation (2). In a second column 196, the table 192 lists the hardwired-pipeline template or templates 114 that define a hardwired pipeline 44 for executing the respective portion of equation (2). And in a third column 198, the table 192 lists parameters, such as the latency (in units of cycles of the signal that clocks the defined pipeline 44) and the input and output precision, of the hardwired pipeline(s) 44 defined by the templates 114 in the second column 196. As shown in the table 192, in this example the seven hardwired-pipeline templates 114 1-114 7 in column 196 define hardwired pipelines 44 1-44 7 for respectively executing the corresponding portions of equation (2) in column 194. There are only seven pipeline templates 114 1-114 7 for the eight portions of equation (2) because the template 114 5 defines a multiplier pipeline 44 5 that can execute both “·” portions 182 and 188. Moreover, the library 120, and thus the table 192, may include multiple templates 114 that define respective pipelines for executing each of the eight portions. - Next, using the table 192, the
file generator 160 selects the pipelines 44 from which to build a circuit that solves for y in equation (2). The generator 160 selects these pipelines 44 based on the behavior(s), operating parameter(s), platform(s), and pipeline accelerator(s) 14 (FIG. 1) that the designer specified. For example, if the designer specified that x, y, and z are thirty-two-bit floating-point quantities, then the generator 160 selects pipelines 44 that operate on thirty-two-bit floating-point numbers. If the available pipelines 44 for a particular portion of the equation (2) do not meet all of the designer's specifications, then the generator 160 may use a default set of rules to select the best pipeline. For example, the rules may indicate that if there is no available pipeline 44 that meets the specified latency and precision requirements, then, with the designer's authorization, the generator 160 defaults to the pipeline having the specified precision and the latency closest to the specified latency. Otherwise a new pipeline 44 with the specified latency is placed in the library, or the designer can select another pipeline from the table 192. As an example of satisfying the latency requirements, two versions of an $x^4$ circuit may be represented by respective hardwired-pipeline templates 114 in the library 120: a pipelined version using two fully registered multipliers in a cascade, or an in-place version using a single, fully registered multiplier, a one-bit counter, and a multiplexer. The pipelined version consumes roughly twice the circuit resources but accepts one input value every clock cycle. In contrast, the in-place version consumes fewer circuit resources but accepts a new input value only every other clock cycle. - Then, the
file generator 160 interconnects the selected hardwired pipelines 44 to form a circuit 200 (FIG. 10) that can solve for y in equation (2). The generator 160 also generates a schematic diagram of the circuit 200 for display via the device 155. - To form the
circuit 200, the file generator 160 first determines how the selected hardwired pipelines 44 1-44 7 can “fit” into the resources of a specified accelerator 14 (FIG. 1) (or a default accelerator if the designer does not specify one). For example, the file generator 160 calculates the number of PLICs 60 (FIG. 3) needed to contain the eight instances of the pipelines 44 1-44 7 (this includes two instances of the pipeline 44 5). - In this example, the
generator 160 determines that each PLIC 60 (FIG. 3) can hold only a respective one of the pipelines 44 1-44 7; consequently, the generator 160 determines that eight pipeline units 50 1-50 8 are needed to instantiate the circuit 200. - Next, based on the platform that the designer specifies, the
generator 160 “inserts” into each of the PLICs 60 1-60 8 of the pipeline units 50 1-50 8 a respective hardware-interface layer 62 1-62 8. Assuming that the designer specifies platform m=1, the generator 160 generates the layers 62 1-62 8 from the following templates in section 122 1 of the library 120: the interface-adapter-layer template 108 1, the framework-services-layer template 110 1, and the communication-shell templates 112 1,1-112 1,7, which respectively correspond to the pipeline templates 114 1-114 7, and thus to the pipelines 44 1-44 7. More specifically, the generator 160 generates the hardware-interface layer 62 1 from the interface-adapter-layer template 108 1, the framework-services-layer template 110 1, and the communication-shell template 112 1,1. Similarly, the generator 160 generates the hardware-interface layer 62 2 from the templates 108 1, 110 1, and 112 1,2, the hardware-interface layer 62 3 from the templates 108 1, 110 1, and 112 1,3, and the hardware-interface layer 62 4 from the templates 108 1, 110 1, and 112 1,4. For the two instances of the multiplier pipeline 44 5, the generator 160 generates both of the hardware-interface layers 62 5 and 62 6 from the interface-adapter-layer and framework-services-layer templates 108 1 and 110 1 and the communication-shell template 112 1,5; consequently, the hardware-interface layers 62 5 and 62 6 are identical. Finally, the generator 160 generates the hardware-interface layer 62 7 from the templates 108 1, 110 1, and 112 1,6, and the hardware-interface layer 62 8 from the templates 108 1, 110 1, and 112 1,7. - Then, the
generator 160 “inserts” into each hardware-interface layer 62 1-62 8 a respective hardwired pipeline 44 1-44 7 (the generator 160 inserts the pipeline 44 5 into both of the hardware-interface layers 62 5 and 62 6, the pipeline 44 6 into the hardware-interface layer 62 7, and the pipeline 44 7 into the hardware-interface layer 62 8). More specifically, the generator 160 inserts the pipelines 44 1-44 7 into the hardware-interface layers 62 1-62 8 by respectively inserting the hardwired-pipeline templates 114 1-114 7 into the communication-shell templates 112 1,1-112 1,7. - Next, the
generator 160 interconnects the pipeline units 50 1-50 8 to form thecircuit 200, which generates the value y from equation (2) at its output (i.e., the output of the pipeline unit 50 8). - Referring to
FIG. 10 , thecircuit 200 includes aninput stage 206, first and secondintermediate stages output stage 212, and operates as follows. Theinput stage 206 includes the hardwired pipelines 44 1-44 4 and operates as follows. Thepipeline 44 1 receives a stream of values x via an input portion of the hardware-interface layer 62 1 and generates, in a pipelined fashion, a corresponding stream of values sin(x) via an output portion of thelayer 62 1. Likewise, thepipeline 40 2 receives a stream of values z via an input portion of the hardware-interface layer 62 2 and generates, in a pipelined fashion, a corresponding stream of values z3 via an output portion of thelayer 62 2, thepipeline 44 3 receives the stream of values x via an input portion of the hardware-interface layer 62 3 and generates, in a pipelined fashion, a corresponding stream of values x4 via an output portion of thelayer 62 3, and thepipeline 44 4 receives the stream of values z via an input portion of the hardware-interface layer 62 4 and generates, in a pipelined fashion, a corresponding stream of values cos(z) via an output portion of thelayer 62 4. - The first
intermediate stage 208 of thecircuit 200 includes two instantiations of thepipelines 44 5 and operates as follows. Thepipeline 44 5 in thePLIC 60 5 receives the streams of values sin(x) and z3 from theinput stage 206 via an input portion of the hardware-interface layer 62 5 and generates, in a pipelined fashion, a corresponding stream of values z3 sin(x) via an output portion of thelayer 62 5. Similarly, thepipeline 44 5 in thePLIC 60 6 receives the streams of values x4 and cos(z) from theinput stage 206 via an input portion of the hardware-interface layer 62 6 and generates, in a pipelined fashion, a corresponding stream of values x4 cos(z) via an output portion of thelayer 62 6. - The second
intermediate stage 210 of thecircuit 200 includes thehardwired pipeline 44 6, which receives the streams of values z3 sin(x) and x4 cos(z) from the firstintermediate stage 208 via an input portion of the hardware-interface layer 62 7, and generates, in a pipelined fashion, a corresponding stream of values z3 sin(x)+x4 cos(z) via an output portion of thelayer 62 7. - And the
output stage 212 of thecircuit 200 includes thehardwired pipeline 44 7, which receives the stream of values z3 sin(x)+x4 cos(z) from the secondintermediate stage 210 via an input portion of the hardware-interface layer 62 8, and generates, in a pipelined fashion, a corresponding stream of values y=√{square root over (z3 sin(x)+x4 cos(z))} via an output portion of thelayer 62 8. - Referring to
FIGS. 7, 9, and 10, the designer may choose to alter the circuit 200 via the input device 154. - For example, the designer may swap out one or more of the pipelines 44 1-44 7 with one or more other pipelines from the table 192. Suppose the square-
root pipeline 44 7 has a high precision but a relatively long latency per the default rules that the generator 160 follows as discussed above. If the table 192 includes another square-root pipeline having a shorter latency, then the designer may replace the pipeline 44 7 with the other square-root pipeline, for example by using the input device 154 to “drag” the other pipeline from the table into the schematic representation of the PLIC 60 8. - In addition, the designer may swap out one or more of the hardwired pipelines 44 1-44 7 with a symbolically defined polynomial series (i.e., a Taylor Series equivalent) that approximates one of the pipelined operations. Suppose the available square-
root pipeline 44 7 has insufficient mathematical accuracy per the designer's specification and the default rules that the generator 160 follows as discussed above. If the designer then specifies a new square-root function as a series summation of related monomials, then the front end 156, interpreter 158, and file generator 160 concatenate a series of parameterized monomial circuit templates into a circuit that solves for square roots. In this way the designer replaces the default pipeline 44 7 with the higher-precision square-root circuit using symbolic design. This example illustrates the symbolic use of polynomials to define new mathematical functions as established by Taylor's Theorem. A more detailed example is discussed below in conjunction with FIGS. 13-17. - The designer may also change the topology of the
circuit 200. Suppose that according to the default rules discussed above, the generator 160 places each instantiation of the hardwired pipelines 44 1-44 7 into a separate PLIC 60. But also suppose that each PLIC 60 has sufficient resources to hold multiple pipelines 44. Consequently, to reduce the number of pipeline units 50 that the circuit 200 occupies, the designer may, using the input device 154, move some of the pipelines 44 into the same PLIC. For example, the designer may move both instantiations of the multiplier pipeline 44 5 out of the PLICs 60 5 and 60 6 and into the PLIC 60 7 with the adder pipeline 44 6, thus reducing by two the number of PLICs that the circuit 200 occupies. The designer then manually interconnects the two instantiations of the pipeline 44 5 to the pipeline 44 6 within the PLIC 60 7, or may instruct the generator 160 to perform this interconnection. Although the library 120 may not include a communication-shell template 112 that defines a communication shell 74 for this combination of multiple pipelines services templates pipelines 44 within the PLICs 60 is also called “refactoring” the circuit 200. - Moreover, the designer may decide to break down one or more of the pipelines 44 1-44 7 into multiple, less
complex pipelines 44. For example, to equalize the latencies in the stage 206 of the circuit 200, the designer may decide to break down the $x^4$ pipeline 44 3 into two $x^2$ pipelines (not shown) and a multiplier pipeline 44 5. Or, the designer may decide to replace the sin(x) pipeline 44 1 with a combination of pipelines (not shown) that represents sin(x) in a series-expansion form (e.g., Taylor series, Maclaurin series). - Referring to
FIGS. 7 and 10, after the designer has made any desired changes to the circuit 200, the generator 160 generates the file 162, which describes the circuit in terms of the pipeline units 50, the PLICs 60, the library templates that compose the circuit, and the interconnections between the pipeline units. Specifically, assuming that the designer has not modified the circuit 200 from the layout shown in FIG. 10, the file 162 indicates that the circuit is designed for instantiation on eight pipeline units 50 1-50 8 of a pipeline accelerator 14 (FIG. 1) that is compatible with platform m=1. The file 162 also identifies the eight PLICs 60 1-60 8 on the eight pipeline units 50 1-50 8, and for each PLIC, identifies the templates in the library 120 that define the circuitry to be instantiated on the PLIC. For example, referring to FIGS. 6 and 10, the file 162 indicates that the combination of the following templates in the library 120 defines the circuitry to be instantiated on the PLIC 60 1: 101 1, 108 1, 110 1, 112 1,1, 114 1, and 116 1. Furthermore, the file 162 includes the values of all constants defined in the configuration template 118 1. The file 162 may also include one or more of the descriptions 128-134 and 138 corresponding to these templates, or portions of these descriptions. Moreover, the file 162 defines the interconnections between the PLICs 60 1-60 8 and the message specifications for these interconnections. The file 162 also defines any designer-specified range constraints for generated values, exceptions, and exception-handling routines. The generator 160 may write the file 162 in XML or in another language with XML tags so that both humans and other tools/machines can read the file. Alternatively, the generator 160 may write the file 162 in a language other than XML and without XML tags. - Referring to
FIGS. 6, 7, 9, and 10, the designer may instruct the simulator 164, via the input device 154, to simulate the circuit 200 using a conventional simulation algorithm. The simulator 164 uses the information in the file 162 and the test vectors provided by the designer to simulate the operation of the circuit 200. The simulator 164 first determines the operating parameters of the hardware-interface layers 62 1-62 8 and of the hardwired pipelines 44 1-44 7 from the file 162, or by extracting this information directly from the description files 128 1, 130 1, 132 1,1-132 1,7, and 138 1-138 7 in the library 120. As discussed above, these parameters include, e.g., circuit latencies, and the precision (e.g., thirty-two-bit integer, sixty-four-bit floating point) of the values that the pipelines 44 1-44 7 receive and generate. For example, from the description files 128 1, 130 1, 132 1,1, and 138 1, the simulator 164 determines the latency of the PLIC 60 1 from the time a value x enters the hardware-interface layer 62 1 until the time that the layer 62 1 provides sin(x) on an external pin (not shown) of the PLIC 60 1. The latency information in these description files may be estimated information, or may be actual information derived from an analysis of an instantiation of the pipeline 44 1 and the hardware-interface layer 62 1 on the PLIC 60 1. The simulator 164 then estimates the latencies and other operating parameters of the PLICs 60 2-60 8, and simulates the operation of the circuit 200 to generate an output test stream of values y in response to input test streams of values x and z. -
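One way to picture how per-PLIC latencies combine into a circuit-level figure is a longest-path calculation over the eight-PLIC wiring described above. The sketch below is a simplified model, not the simulation algorithm itself: the node names, the `arrival` and `input_skew` helpers, and every cycle count are hypothetical values chosen for illustration.

```python
# Which PLICs feed each PLIC, following the stage-by-stage wiring described above.
INPUTS = {
    "plic1_sin": [], "plic2_z3": [], "plic3_x4": [], "plic4_cos": [],
    "plic5_mul": ["plic1_sin", "plic2_z3"],
    "plic6_mul": ["plic3_x4", "plic4_cos"],
    "plic7_add": ["plic5_mul", "plic6_mul"],
    "plic8_sqrt": ["plic7_add"],
}

# Hypothetical pin-to-pin latencies in clock cycles (not from any description file).
LATENCY = {
    "plic1_sin": 20, "plic2_z3": 6, "plic3_x4": 6, "plic4_cos": 20,
    "plic5_mul": 3, "plic6_mul": 3, "plic7_add": 2, "plic8_sqrt": 12,
}


def arrival(node, inputs, latency):
    """Cycle at which `node` emits its first result: slowest input plus own latency."""
    start = max((arrival(p, inputs, latency) for p in inputs[node]), default=0)
    return start + latency[node]


def input_skew(node, inputs, latency):
    """Cycles by which the earliest input of `node` leads its latest input."""
    times = [arrival(p, inputs, latency) for p in inputs[node]]
    return max(times) - min(times) if times else 0


print("latency from x, z to y:", arrival("plic8_sqrt", INPUTS, LATENCY))
print("skew at first multiplier:", input_skew("plic5_mul", INPUTS, LATENCY))
```

The skew figure shows how many cycles the faster input stream would have to be delayed or buffered before the multiplier, which is one reason a designer might want to equalize stage latencies.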
FIG. 11 is a schematic diagram of the circuit 200 of FIG. 10 disposed on a single pipeline unit 50 and in a single PLIC 60 according to an embodiment of the invention. - Referring to
FIGS. 6, 7, 9, and 11, the operation of the tool 152 is discussed according to another embodiment of the invention. - Following the same steps described above in conjunction with the formation of the
circuit 200 of FIG. 10, the generator 160 determines that all of the hardwired pipelines 44 1-44 7 (the multiplier pipeline 44 5 is instantiated twice) can fit within a single PLIC 60 with the same topology shown in FIG. 10. - Although the
library 120 includes no communication-shell templates 112 for this combination of the hardwired pipelines 44 1-44 7, for simulation purposes the tool 152 derives the operational parameters and message specifications of the hardware-interface layer 62 from the description files 128 1, 130 1, 132 1,1-132 1,4, and 132 1,7. Because the PLIC 60 incorporates the interface-adapter layer 70 and framework-services layer 72 defined by the templates 108 1 and 110 1, the tool 152 estimates the input and output operational parameters, e.g., input and output latencies, and the message specifications of the layers 70 and 72. Referring to FIGS. 10-11, because the values x and z are input in parallel to the pipelines 44 1-44 4, the tool 152 derives the input operating parameters of the communication shell 74 of FIG. 11 from the description files 132 1,1-132 1,4, which describe the communication shells for the pipelines 44 1-44 4. For example, if the operational parameters of these communication shells are similar, then the tool 152 may merely estimate that the input-side operational parameters for the shell 74 are the same as the parameters from one of the description files 132 1,1-132 1,4. Alternatively, the tool 152 may estimate that an intermediate data-type translation is needed for the input-side operational parameters of the communication shell 74, or that an averaging operation is needed for the input-side operational parameters of the communication shell, if the respective input-side parameters in the description files 132 1,1-132 1,4 do not match. Similarly, because the values y are output from the pipeline 44 7, the tool 152 derives the output operating parameters for the communication shell 74 from the description file 132 1,7, which describes the communication shell for the pipeline 44 7. For example, the tool 152 may estimate that the output-side operational parameters for the shell 74 are the same as the output-side parameters from the description file 132 1,7. - Next, the
generator 160 generates the file 162, which defines the circuit 200 of FIG. 11, and the simulator 164 simulates the circuit using the operational parameters calculated for the hardware-interface layer 62 by the generator 160. -
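The estimate of input-side parameters for a combined communication shell, as discussed above, can be illustrated for a single numeric parameter such as input latency. This sketch assumes one plausible reading of that discussion: reuse a value when the individual shells agree, and otherwise fall back to an average. The function name `estimate_input_latency` and the `tolerance` parameter are hypothetical.

```python
def estimate_input_latency(shell_latencies, tolerance=0):
    """Estimate the input latency of a combined communication shell.

    shell_latencies: input latencies (clock cycles) read from the description
                     files of the individual communication shells.
    tolerance:       largest spread, in cycles, still treated as "similar".
    """
    if not shell_latencies:
        raise ValueError("at least one shell latency is required")
    spread = max(shell_latencies) - min(shell_latencies)
    if spread <= tolerance:
        # Parameters are similar: reuse the value from one description file.
        return shell_latencies[0]
    # Parameters do not match: fall back to an averaged estimate.
    return sum(shell_latencies) / len(shell_latencies)


print(estimate_input_latency([4, 4, 4, 4]))   # shells agree
print(estimate_input_latency([4, 6, 4, 6]))   # shells differ
```

Because the result is only an estimate, it would be reasonable to replace it with measured values once the combined shell has actually been instantiated and analyzed.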
FIG. 12 is a block diagram of a circuit 220, for which the tool 152 of FIG. 7 generates a file 162 according to an embodiment of the invention where the circuit solves for a variable in an equation that includes constant coefficients. The circuit 220 is similar to the circuit 200 except that the hardwired pipelines - In this embodiment, the designer wants to design a circuit to solve for y in the following equation:
$$y = \sqrt{a\,x^{4}\cos(z) + b\,z^{3}\sin(x)} \qquad (3)$$
The only difference between equation (3) and equation (2) is the presence of the constant coefficients a and b. - Referring to
FIG. 10, one way for the tool 152 to generate such a circuit is to modify the circuit 200 by parsing equation (3) into portions including “$a\cdot x^4$” and “$b\cdot z^3$”, and adding two corresponding PLICs (not shown) on which is instantiated the multiplication pipeline 44 5: one such multiplier PLIC between the PLICs 60 2 and 60 5 and receiving as inputs $z^3$ and b, and the other such multiplier PLIC between the PLICs 60 3 and 60 6 and receiving as inputs $x^4$ and a. - Although such a modified
circuit 200 is contemplated to accommodate the constant coefficients a and b, this circuit would require two additional pipeline units 50. - Referring to
FIGS. 7, 10 , and 12, in this embodiment, however, thetool 152 generates the circuit 220 by replacing thepipelines circuit 200 withpipelines section 124 of the library 120 (FIG. 6 ) includes corresponding hardwired-pipeline templates - Referring to
FIGS. 7 and 12, to set the values of the coefficients a and b, the designer may enter the values as part of equation (3), or may enter the values separately. Assume that the designer wants a=2.0 and b=3.5. According to the former entry method, he enters equation (3) as: “$y=\sqrt{2x^4\cos(z)+3.5z^3\sin(x)}$”. And according to the latter entry method, he enters equation (3) as “$y=\sqrt{ax^4\cos(z)+bz^3\sin(x)}$”, and then enters “a=2.0, b=3.5.” - The
generator 160 then generates the file 162 to include the entered values for the coefficients a and b. These values may be contained within one or more XML tags or be present in some other form. - In another variation, the values of a and b may be provided to the configuration managers 88 (
FIG. 3 ) of thePLICs FIG. 1 ), initializes the values of a and b by sending configuration messages for a and b to thepipeline units FIG. 1 ) may store a and b as XML files to initialize the configuration messages created and sent by the configuration manager executed by thehost processor 12. - Still referring to
FIGS. 7 and 12, the tool 152 can use similar techniques to set the values of constant coefficients for other types of circuit portions such as filters, Fast Fourier Transformers (FFTs), and Inverse Fast Fourier Transformers (IFFTs). - Referring to
FIGS. 7-12, other embodiments of the tool 152 and its operation are contemplated. - For example, one or more of the functions of the
tool 152 may be performed by a functional block (e.g., front end 156, interpreter 158) other than the block to which the function is attributed in the above discussion. - Furthermore, the
tool 152 may be described using more or fewer functional blocks. In addition, although the tool 152 is described as either fitting the eight instantiations of the hardwired pipelines 44 1-44 7 in eight PLICs 60 1-60 8 (FIGS. 10 and 12) or in a single PLIC 60 (FIG. 11), the tool 152 may fit these pipelines in more than one but fewer than eight PLICs, depending on the resources available on each PLIC. - Moreover, although described as allowing a designer to define a circuit using conventional mathematical symbols, alternate embodiments of the
front end 156 of the tool 152 may lack this ability, or may allow one to define a circuit using other formats or languages such as C++ or VHDL. - Furthermore, although the
tool 152 is described as allowing one to design a circuit for instantiation on a PLIC, the tool 152 may also allow one to design a circuit for instantiation on an ASIC. - In addition, although the
tool 152 is described as generating a file 162 that defines an algorithm-implementing circuit, such as the circuit 200 (FIG. 11), for instantiation on a specific pipeline accelerator 14 (FIG. 1) or on a pipeline accelerator that is compatible with a specific platform, the tool may generate, in addition to or instead of the file 162, a file (not shown) that more generally defines the algorithm. Such a file may include algorithm-definition data that is sometimes called “meta-data,” and may allow the host processor 12 (FIG. 1) to implement the algorithm in any manner (e.g., hardwired pipeline(s), software, a combination of both pipeline(s) and software) supported by the peer vector machine 10 (FIG. 1). Typically, meta-data describes something, such as an algorithm or another file, but is not executable. For example, the information in the description files 126-134 (FIG. 6) may include meta-data. But a processor, such as the host processor 12, may be able to generate executable code from meta-data. Consequently, a meta-data file that defines an algorithm may allow the host processor 12 to configure the peer vector machine 10 for implementing the algorithm even where the machine does not support the implementation(s) specified by the file 162. Such configuring of the peer vector machine 10 is described in U.S. patent application Ser. Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3), which were previously incorporated by reference. - Moreover, the
tool 152 may generate, and the library 120 (FIG. 6 ) may store, one or more meta-data files (not shown) for describing the messages that carry data to/from the PLICs 60 (or software equivalents) of a circuit, such as the circuit 200 (FIG. 10 ). For example, if the data generated by thePLICs 60 is floating-point data, then a meta-data file specifies this. The file 162 (FIG. 7 ) incorporates or points to these meta-data files so that the host processor 12 (FIG. 1 ) can instantiate the message objects that generate such messages as discussed in previously incorporated U.S. patent application Ser. Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3). - Furthermore, the
tool 152 may generate, and the library 120 (FIG. 6 ) may store, one or more meta-data files (not shown) for describing the exceptions that the PLICs 60 (or software equivalents) of a circuit, such as the circuit 200 (FIG. 10 ), generate. For example, if aPLIC 60 implements a divide-by-zero exception, then a meta-data file specifies this. The file 162 (FIG. 7 ) incorporates or points to these meta-data files so that the host processor 12 (FIG. 1 ) can instantiate corresponding exception handlers as discussed in previously incorporated U.S. patent application Ser. Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3). - In addition, the
tool 152 may generate, and the library 120 (FIG. 6 ) may store, one or more meta-data files (not shown) for describing the PLICs 60 (or software equivalents) of a circuit, such as the circuit 200 (FIG. 10 ). For example, such a meta-data file may describe the mathematical operation performed by, and the input and output specifications of, circuitry to be instantiated on a corresponding PLIC (or a software equivalent of the circuitry). The file 162 (FIG. 7 ) incorporates or points to these meta-data files so that the host processor 12 (FIG. 1 ) can 1) determine which firmware files (or software equivalents) stored in thelibrary 120 or in another library will respectively cause the PLICs (or the host processor 12) to instantiate the desired circuitry, or 2) generate one or more of these firmware files (or software equivalents) that are not otherwise available, as described in previously incorporated U.S. patent application Ser. Nos. (Attorney Docket Nos. 1934-25-3, 1934-26-3, and 1934-36-3). - Moreover, the library 120 (
FIG. 6 ) may store one or more of the files 162 (FIG. 7 ) that thetool 152 generates, so that a designer can incorporate previously designed circuits, such as the circuit 200 (FIG. 10 ), into a new larger and more complex circuit. Thetool 152 may then generate anew file 162 that defines this new circuit. - Referring to
FIGS. 13-17, according to another embodiment of the invention, the tool 152 (FIG. 7) allows one to design a circuit for implementing virtually any complex function f(x) by expanding the function into an equivalent infinite series. Many functions, such as f(x)=cos(x) and f(x)=e^x, can be expanded into an infinite series, such as the Taylor series or the following MacLaurin series, which is the special case of the Taylor series expanded about a=0:
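In standard notation (the textbook definition, given here as a reconstruction rather than as the original equation), the MacLaurin series of a function f(x) is:

```latex
f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}\,x^{n}
     = f(0) + f'(0)\,x + \frac{f''(0)}{2!}\,x^{2} + \frac{f'''(0)}{3!}\,x^{3} + \cdots
```

Each constant coefficient of a power of x is therefore the corresponding derivative of f evaluated at zero, divided by the factorial of the power.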
Consequently, a combination of summing and multiplying hardwired pipelines 44 interconnected to generate ax+bx^2+cx^3+ . . . +vx^n can implement any function f(x) that one can expand into a MacLaurin series, where the only differences in this combination of pipelines from function to function are the values of the constant coefficients a, b, c, . . ., v. Therefore, if the tool 152 is programmed with, or otherwise has access to, the coefficients for a number of functions f(x), then the tool can implement any of these functions as a series expansion. Furthermore, because the accuracy of the implementation of a function f(x) increases with the number of expansion terms calculated and summed together, the tool 152 may set the number of expansion terms that the interconnected pipelines 44 generate based on the level of accuracy for f(x) that the circuit designer (not shown) enters into the tool. Alternatively, a designer may directly enter a function f(x) into the front end 156 (FIG. 7) of the tool 152 in series-expansion form. -
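The idea of fixing the structure and varying only the coefficients and the number of terms can be sketched in software. The following Python sketch is an illustration only, not the tool's implementation: the function names are hypothetical, cos(x) is used as the example coefficient table, and the accuracy rule (stop when the first omitted term falls below a tolerance) is an assumed heuristic.

```python
import math

def maclaurin_cos_coeffs(num_terms):
    """Coefficients of 1, x^2, x^4, ... for cos(x): (-1)^k / (2k)!."""
    return [(-1) ** k / math.factorial(2 * k) for k in range(num_terms)]

def eval_even_series(coeffs, x):
    """Sum coeffs[k] * x^(2k), building each power from the previous one."""
    total, power = 0.0, 1.0
    for c in coeffs:
        total += c * power
        power *= x * x
    return total

def terms_for_accuracy(x_max, tol, max_terms=50):
    """Smallest term count whose first omitted term is below tol at |x| = x_max.

    Assumed heuristic: for an alternating series with shrinking terms, the
    first omitted term bounds the truncation error.
    """
    for n in range(1, max_terms + 1):
        if x_max ** (2 * n) / math.factorial(2 * n) < tol:
            return n
    raise ValueError("tolerance not reached within max_terms")

n = terms_for_accuracy(1.0, 1e-6)
approx = eval_even_series(maclaurin_cos_coeffs(n), 1.0)
```

Swapping in a different coefficient list would evaluate a different function with the same loop, which mirrors the point that only the constants change from function to function.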
FIG. 13 is a block diagram of a circuit 240 that the tool 152 (FIG. 7) defines for implementing f(x)=cos(x) as a MacLaurin series according to an embodiment of the invention. For clarity, FIG. 13 shows only the adders, multipliers, and delay blocks that compose the circuit 240, it being understood that the tool 152 may define the circuit for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12) per one of the techniques described above in conjunction with FIGS. 7-12. Furthermore, the circuit 240 may be part of a larger circuit (not shown) for implementing an algorithm having cos(x) as one of its portions. - The function f(x)=cos(x) is represented by the following MacLaurin series:
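Written out in standard notation (the textbook series, which agrees with the terms through x^8 that the circuit 240 is described as generating), the expansion is:

```latex
\cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^{k}}{(2k)!}\,x^{2k}
        = 1 - \frac{x^{2}}{2!} + \frac{x^{4}}{4!} - \frac{x^{6}}{6!} + \frac{x^{8}}{8!} - \cdots
```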
The circuit 240 includes a term-generating section 242 and a term-summing section 244. For clarity, only the parts of these sections that respectively generate and sum the first four power-of-x terms of the cos(x) series expansion are shown, it being understood that any remaining portions of these sections for respectively generating and summing the fifth and higher power-of-x terms are similar.
- The term-generating section 242 includes a chain of multipliers 246 1-246 p (only multipliers 246 1-246 8 are shown) and delay blocks 248 1-248 q (only delay blocks 248 1-248 3 are shown) that generate the power-of-x terms of the cos(x) series expansion. The delay blocks 248 ensure that the multipliers 246 only multiply powers of x from the same sample time.
- The term-summing section 244 includes two summing paths: a path 250 for positive numbers, and a path 252 for negative numbers. The path 250 includes a chain of adders 254 1-254 r (only adders 254 1-254 2 are shown) and delay blocks 256 1-256 s (only blocks 256 1 and 256 2 are shown). Similarly, the path 252 includes a chain of adders 258 1-258 t (only adder 258 1 is shown) and delay blocks 260 1-260 u (only blocks 260 1 and 260 2 are shown). A final adder 262 sums the cumulative positive and negative sums from the paths 250 and 252 to provide the value for cos(x). Although the adder 262 is shown as summing the first five terms of the expansion (1 and the first four power-of-x terms), it is understood that the final adder 262 may be disposed further down the paths 250 and 252 if the circuit 240 generates additional terms of the cos(x) expansion. Where the numbers being summed are floating-point numbers, exceptions, such as a mantissa-register underflow, may occur when a positive number is summed with a negative number that is almost equal to the positive number. But by providing separate summing paths 250 and 252 for positive and negative numbers, respectively, the circuit 240 limits the number of possible locations where such exceptions can occur to a single adder 262. Consequently, providing the separate paths 250 and 252 may significantly reduce the frequency of such floating-point exceptions, and thus may reduce the time that the peer-vector machine 10 (FIG. 1) consumes handling such exceptions and the size and complexity of the exception manager 86 (FIG. 4).
- Still referring to
FIG. 13, the operation of the circuit 240 is discussed according to an embodiment of the invention. For purposes of explanation, it is assumed that each of the multipliers 246 and adders 254 and 258 has a latency (i.e., delay) D of one clock cycle. For example, prior to a first clock edge, a value x is present at the inputs of the multiplier 246 1, and after the first clock edge, the value x^2 is present at the output of the multiplier 246 1. It is understood, however, that the multipliers 246 and adders 254 and 258 may have different latencies and latencies other than one, and that the delays provided by the blocks 248, 256, and 260 may be adjusted accordingly.
- At a start time, a value x_1 is present at the input of the multiplier 246 1, where the subscript “1” denotes the time or position of x_1 relative to the other values of x.
- In response to a first clock edge, a value x_2 is present at the input of the multiplier 246 1 and x_1^2 is present at the output of this multiplier. For brevity, this example follows only the propagation of x_1, it being understood that the propagation of x_2 and subsequent values of x is similar but delayed relative to the propagation of x_1. Furthermore, for clarity, x_1 is hereinafter referred to as “x” in this example.
- In response to a second clock edge, −x^2/2! is present at the output of the multiplier 246 2, x^4 is present at the output of the multiplier 246 3, and x^2 is available at the output of the block 248 1.
- In response to a third clock edge, “1” is present at the output of the block 256 1, x^4/4! is present at the output of the multiplier 246 4, x^6 is present at the output of the multiplier 246 5, and x^2 is available at the output of the block 248 2.
- In response to a fourth clock edge, −x^6/6! is present at the output of the multiplier 246 6, x^8 is present at the output of the multiplier 246 7, x^2 is available at the output of the block 248 3, and “1+x^4/4!” is available at the output of the adder 254 1.
- In response to a fifth clock edge, x^8/8! is present at the output of the multiplier 246 8, “1+x^4/4!” is available at the output of the block 256 2, and “−x^2/2!−x^6/6!” is available at the output of the adder 258 1.
- In response to a sixth clock edge, “1+x^4/4!+x^8/8!” is available at the output of the adder 254 2, and “−x^2/2!−x^6/6!” is available at the output of the block 260 2.
- And in response to a seventh clock edge, “cos(x)=1−x^2/2!+x^4/4!−x^6/6!+x^8/8!” (cos(x) approximated to the first four power-of-x terms of the MacLaurin series expansion) is available at the output of the adder 262. Therefore, in this example the latency of the circuit 240 (i.e., the number of clock cycles from when x is available at the inputs of the multiplier 246 1 to when cos(x) is available at the output of the adder 262) is seven clock cycles. Furthermore, if the adder 262, while summing a positive number and a negative floating-point number, generates an exception, the exception manager 86 (FIG. 4) or the host processor 12 (FIG. 1) may handle this exception using a conventional floating-point-exception routine.
- Alternatively, if the circuit 240 calculates one or more higher power-of-x terms, then the adder 262 is located after (to the right in FIG. 13) the adder that sums the highest generated term to a preceding term, and the operation continues as above.
- Still referring to
FIG. 13, alternate embodiments of the circuit 240 are contemplated. For example, the circuit 240 may include multipliers and adders to generate and sum the odd power-of-x terms (e.g., x, x^3, x^5) with the coefficients of these terms set to zero. Such an alternate circuit 240 is more flexible because it allows one to implement function expansions that include odd powers of x, but in this case would have a greater latency than seven clock cycles. -
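The separate positive and negative summing paths of the circuit 240 can be mimicked in software. The Python sketch below is a behavioral illustration only: it ignores clocking and delay blocks, and the function name and return shape are hypothetical. It keeps one accumulator per path and performs a single mixed-sign addition at the end, which corresponds to the role described for the final adder 262.

```python
import math

def cos_split_paths(x, num_power_terms=4):
    """Approximate cos(x) with separate positive and negative accumulators.

    Returns (positive_sum, negative_sum, total).
    """
    positive_sum = 1.0   # the constant term "1" enters the positive path
    negative_sum = 0.0
    power = 1.0
    for k in range(1, num_power_terms + 1):
        power *= x * x                        # x^(2k)
        term = power / math.factorial(2 * k)  # magnitude of the k-th term
        if k % 2 == 0:
            positive_sum += term              # +x^4/4!, +x^8/8!, ...
        else:
            negative_sum -= term              # -x^2/2!, -x^6/6!, ...
    # Only this addition combines a positive with a negative number.
    return positive_sum, negative_sum, positive_sum + negative_sum

pos, neg, total = cos_split_paths(1.0)
```

Because every intermediate addition combines numbers of the same sign, only the last line can subtract nearly equal magnitudes.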
FIG. 14 is a block diagram of a circuit 270 that the tool 152 (FIG. 7) defines for implementing f(x)=cos(x) as a MacLaurin series according to another embodiment of the invention. The circuit 270 has a topology that reduces the number of delay blocks and the latency as compared to the circuit 240 of FIG. 13. Furthermore, like FIG. 13, FIG. 14 shows only the adders, multipliers, and delay blocks that compose the circuit 270, it being understood that the tool 152 may define the circuit for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12) per one of the techniques described above in conjunction with FIGS. 7-12. Furthermore, like the circuit 240, the circuit 270 may be part of a larger circuit (not shown) for implementing an algorithm having cos(x) as one of its portions.
- The circuit 270 includes a term-generating section 272 and a term-summing section 274. For clarity, only the parts of these sections that respectively generate and sum the first four power-of-x terms of the cos(x) series expansion are shown, it being understood that any remaining portions of these sections for respectively generating and summing the fifth and higher power-of-x terms are similar.
- The term-generating section 272 includes a hierarchy of multipliers 276 1-276 p (only multipliers 276 1-276 8 are shown) and delay blocks 278 1-278 q (only delay blocks 278 1-278 2 are shown) that generate the power-of-x terms of the cos(x) series expansion. The delay blocks 278 ensure that the multipliers 276 only multiply powers of x from the same sample time.
- The term-summing section 274 includes two summing paths: a path 280 for positive numbers, and a path 282 for negative numbers. The path 280 includes a chain of adders 284 1-284 r (only adders 284 1-284 2 are shown) and delay blocks 286 1-286 s (only block 286 1 is shown). Similarly, the path 282 includes a chain of adders 288 1-288 t (only adder 288 1 is shown) and delay blocks 290 1-290 u (only block 290 1 is shown). A final adder 292 sums the cumulative positive and negative sums from the paths 280 and 282 to provide the value for cos(x). Although the adder 292 is shown as summing the first five terms of the expansion (1 and the first four power-of-x terms), it is understood that the final adder 292 may be disposed further down the paths 280 and 282 if the circuit 270 generates additional terms of the cos(x) expansion.
- Still referring to
FIG. 14, the operation of the circuit 270 is discussed according to an embodiment of the invention. For purposes of explanation, it is assumed that each of the multipliers 276 and adders 284 and 288 has a latency (i.e., delay) D of one clock cycle. It is understood, however, that the multipliers 276 and adders 284 and 288 may have different latencies and latencies other than one, and that the delays provided by the blocks 278, 286, and 290 may be adjusted accordingly.
- At a start time, a value x is present at the input of the multiplier 276 1.
- In response to a first clock edge, x^2 is present at the output of the multiplier 276 1.
- In response to a second clock edge, x^4 is present at the output of the multiplier 276 2, and x^2 is available at the output of the block 278 1.
- In response to a third clock edge, “1” is present at the output of the block 286 1, x^4/4! is present at the output of the multiplier 276 6, x^6 is present at the output of the multiplier 276 4, −x^2/2! is available at the output of the multiplier 276 5, and x^8 is available at the output of the multiplier 276 3.
- In response to a fourth clock edge, −x^6/6! is present at the output of the multiplier 276 7, x^8/8! is present at the output of the multiplier 276 8, −x^2/2! is available at the output of the block 290 1, and “1+x^4/4!” is available at the output of the adder 284 1.
- In response to a fifth clock edge, “1+x^4/4!+x^8/8!” is available at the output of the adder 284 2, and “−x^2/2!−x^6/6!” is available at the output of the adder 288 1.
- And in response to a sixth clock edge, “cos(x)=1−x^2/2!+x^4/4!−x^6/6!+x^8/8!” (cos(x) approximated to the first four power-of-x terms of the MacLaurin series expansion) is available at the output of the
adder 292. Therefore, in this example the latency of the circuit 270 is six clock cycles, which is one fewer clock cycle than the latency of the circuit 240 of FIG. 13. But as the number of the power-of-x terms increases beyond four, the gap between the latencies of the circuits 240 and 270 widens, such that the circuit 270 provides an even greater improvement in the latency.
- Alternatively, if the circuit 270 calculates one or more higher power-of-x terms, then the adder 292 is located after (to the right in FIG. 14) the adder that sums the highest generated term to a preceding term, and the operation continues as above.
- Still referring to
FIG. 14, alternate embodiments of the circuit 270 are contemplated. For example, the circuit 270 may include multipliers and adders to generate and sum the odd power-of-x terms (e.g., x, x^3, x^5) with the coefficients of these terms set to zero. Such an alternate circuit 270 may be more flexible because it allows one to implement function expansions that include odd powers of x without increasing the circuit's latency for a given highest power of x. That is, where the highest power of x generated by the circuit 270 is x^8, adding multipliers and adders to generate x^3, x^5, and x^7 would not increase the latency of the circuit 270 beyond six clock cycles. This is because the circuit 270 would generate the power-of-x terms in parallel, not serially like the circuit 240 of FIG. 13. -
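The difference between serial and parallel generation of the powers of x can be put in numbers with a simple model. The Python sketch below is an assumption-laden illustration, not a description of the figures: it counts only the clock edges until the highest power x^(2n) is available, assumes one-cycle multipliers, and assumes the hierarchy forms each power by multiplying two powers that are already available. Both function names are hypothetical.

```python
def chain_power_edges(n_terms):
    """Clock edges until x^(2n) leaves a serial multiplier chain (FIG. 13 style).

    Each stage multiplies the previous even power by x^2, so the n-th even
    power appears after n edges.
    """
    return n_terms

def tree_power_edges(n_terms):
    """Clock edges until x^(2n) leaves a multiplier hierarchy (FIG. 14 style).

    Starting from x, a power m needs ceil(log2(m)) multiplication levels;
    (m - 1).bit_length() computes that value with integer arithmetic.
    """
    return (2 * n_terms - 1).bit_length()

for n in (1, 2, 3, 4, 8, 16):
    print(n, chain_power_edges(n), tree_power_edges(n))
```

For four power-of-x terms the model gives four edges for the chain and three for the hierarchy, matching the edges at which x^8 appears in the two walkthroughs above; the model leaves out the term multipliers and the summing paths, so it does not by itself give the full circuit latency.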
FIG. 15 is a block diagram of a power-of-x term generator 300 that the tool 152 (FIG. 7) defines to replace the odd-numbered power-of-x-term multipliers 246 3, 246 5, 246 7, . . . of the term-generating section 242 of FIG. 13 and the power-of-x-term multipliers 276 1, 276 2, 276 3, 276 4, . . . of FIG. 14 according to an embodiment of the invention. Generally, the generator 300 includes fewer multipliers (here one) than the term-generating sections 242 and 272 (which each include eight multipliers), but may have a higher latency for a given number of generated power-of-x terms. Furthermore, like FIGS. 13-14, FIG. 15 shows only the multipliers and other components that compose the term generator 300, it being understood that the tool 152 may define a circuit that includes the term generator for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12) per one of the techniques described above in conjunction with FIGS. 7-12.
- The term generator 300 includes a register 302 for storing x, a multiplier 304, a multiplexer 306, and term-storage registers 308 1-308 p (only registers 308 1-308 4 are shown). For clarity, only the parts of the generator 300 that generate the first four power-of-x terms of the cos(x) series expansion are shown, it being understood that any remaining portions of the generator for generating the fifth and higher power-of-x terms are similar.
- Still referring to FIG. 15, the operation of the generator 300 is discussed according to an embodiment of the invention. For purposes of explanation, it is assumed that each of the register 302, multiplier 304, and registers 308 has a respective latency (i.e., delay) of one clock cycle, and that the multiplexer 306 is not clocked, i.e., is asynchronous. It is understood, however, that the register 302, multiplier 304, and registers 308 may have different latencies and latencies other than one, that the multiplexer 306 may be clocked and have a latency of one or more clock cycles, and that the term-summing sections 244 and 274 of FIGS. 13 and 14, respectively, may be adjusted accordingly.
- At a start time, a value x is present at the input of the register 302.
- In response to a first clock edge, the current value of x is loaded into, and thus is present at the output of, the register 302, and is present at the output of the multiplexer 306, which couples its input 312 to its output. The register 302 is then disabled. Alternatively, the register 302 is not disabled but the value of x at the input of this register does not change.
- In response to a second clock edge, x^2 is present at the output of the multiplier 304, and the multiplexer changes state and couples its input 314 to its output such that x^2 is also present at the output of the multiplexer 306.
- In response to a third clock edge, x^2 is loaded into, and thus is available at the output of, the register 310 1, and x^3 is available at the output of the multiplier 304 and at the output of the multiplexer 306.
- In response to a fourth clock edge, x^4 is available at the output of the multiplier 304 and at the output of the multiplexer 306.
- In response to a fifth clock edge, x^4 is loaded into, and thus is available at the output of, the register 310 2, and x^5 is available at the output of the multiplier 304 and at the output of the multiplexer 306.
- In response to a sixth clock edge, x^6 is available at the output of the multiplier 304 and at the output of the multiplexer 306.
- In response to a seventh clock edge, x^6 is loaded into, and thus is available at the output of, the register 310 3, and x^7 is available at the output of the multiplier 304 and at the output of the multiplexer 306.
- In response to an eighth clock edge, x^8 is available at the output of the multiplier 304 and at the output of the multiplexer 306.
- And in response to a ninth clock edge, x^8 is loaded into, and thus is available at the output of, the register 310 4, and the next value of x is loaded into the register 302. But if the generator 300 generates powers of x higher than x^8, the generator continues operating in the described manner before loading the next value of x into the register 302.
- After the generator 300 generates all of the specified powers of the current value of x, the register 302, multiplier 304, multiplexer 306, and registers 310 repeat the above procedure for each subsequent value of x.
- Alternative embodiments of the
generator 300 are contemplated. For example, to generate the odd powers of x for a function other than cos(x), one can merely add registers 310 to store these values, because the multiplier 304 inherently generates these odd powers. Alternatively, the generator 300 may be modified to load x^2 into the register 302 so that the multiplier 304 thereafter generates only even powers of x. Moreover, one or more of the registers 308 may be eliminated, and the multiplexer 306 may feed the respective powers of x directly to the term multipliers, e.g., the term multipliers 246 2, 246 4, 246 6, 246 8, . . . of FIG. 13 and the term multipliers 276 5, 276 6, 276 7, 276 8, . . . of FIG. 14. -
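The single-multiplier scheme of the generator 300 can be sketched as a loop that reuses one running product. The Python generator below is a behavioral illustration under the one-cycle latencies assumed in the walkthrough above; the function name and the (clock edge, power, value) tuple format are hypothetical.

```python
def power_generator(x, highest_power=8):
    """Yield (clock_edge, stored_power, value) for each even power of x.

    Models one multiplier whose output is fed back as an operand: x is loaded
    on edge 1, x^k appears at the multiplier output on edge k, and an even
    power is latched into a storage register on the following edge.
    """
    product = x                      # edge 1: x sits in the input register
    for power in range(2, highest_power + 1):
        product *= x                 # edge `power`: multiplier outputs x^power
        if power % 2 == 0:
            yield power + 1, power, product   # latched one edge later

stored = list(power_generator(2.0))
```

For x = 2 the stored values are 4, 16, 64, and 256, latched on edges 3, 5, 7, and 9. Keeping the odd powers as well would only require yielding on every iteration, in line with the first alternative embodiment above.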
FIG. 16 is a block diagram of a circuit 320 that the tool 152 (FIG. 7) defines for implementing f(x)=e^x as a MacLaurin series according to an embodiment of the invention. The circuit 320 is similar to the circuit 240 of FIG. 13, but because the odd power-of-x terms for the e^x expansion may be positive or negative, the circuit 320 also includes sign determiners (described below and in conjunction with FIG. 17) that respectively provide these odd-power-of-x terms to the proper path (positive or negative) of the term-summing section. For clarity, FIG. 16 shows only the adders, multipliers, delay blocks, and sign determiners that compose the circuit 320, it being understood that the tool 152 may define the circuit for instantiation on one or more PLICs 60 using one or more hardwired pipelines 44 and one or more hardware-interface layers 62 (e.g., FIGS. 10 and 12) per one of the techniques described above in conjunction with FIGS. 7-12. Furthermore, the circuit 320 may be part of a larger circuit (not shown) for implementing an algorithm having e^x as one of its portions. - The function f(x)=e^x is represented by the following MacLaurin series:
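Written out in standard notation (the textbook series, which agrees with the terms through x^5 that the circuit 320 is described as generating), the expansion is:

```latex
e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}
      = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \frac{x^{4}}{4!} + \frac{x^{5}}{5!} + \cdots
```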
The circuit 320 includes a term-generating section 322 and a term-summing section 324, which includes positive- and negative-value summing paths 326 and 328. For clarity, only the parts of these sections that respectively generate and sum the first five power-of-x terms of the e^x series expansion are shown, it being understood that any remaining portions of these sections for respectively generating and summing the sixth and higher power-of-x terms are similar.
- The term-generating section 322 includes a chain of multipliers 330 1-330 p (only multipliers 330 1-330 8 are shown) and delay blocks 332 1-332 q (only delay blocks 332 1-332 4 are shown) that generate the power-of-x terms of the e^x series expansion. The section 322 also includes, for each odd-power-of-x term (e.g., x, x^3, x^5, . . . ), a respective sign determiner 334 1-334 v (only determiners 334 1-334 3 are shown) that directs positive values of the odd-power-of-x term to the positive summing path 326 of the term-summing section 324, and that directs negative values of the odd-power-of-x term to the negative summing path 328.
- The positive-value path 326 of the term-summing section 324 includes a chain of adders 336 1-336 r (only adders 336 1-336 5 are shown) and delay blocks 338 1-338 s (only blocks 338 1-338 3 are shown). Similarly, the negative-value path 328 includes a chain of adders 340 1-340 t (only adders 340 1-340 2 are shown) and delay blocks 342 1-342 u (only blocks 342 1-342 2 are shown). A final adder 344 sums the cumulative positive and negative sums from the paths 326 and 328 to provide the value for e^x. Although the final adder 344 is shown as summing the first six terms of the e^x expansion (“1” and the first five power-of-x terms), it is understood that the final adder may be disposed further down the paths 326 and 328 if the circuit 320 generates additional terms of the expansion.
- Still referring to
FIG. 16, the operation of the circuit 320 is discussed according to an embodiment of the invention. For purposes of explanation, it is assumed that each of the multipliers 330, sign determiners 334, and adders 336 and 340 has a latency (i.e., delay) D of one clock cycle. It is understood, however, that the multipliers 330, sign determiners 334, and adders 336 and 340 may have different latencies and latencies other than one, and that the delays provided by the blocks 332, 338, and 342 may be adjusted accordingly.
- At a start time, a value x is present at both inputs of the multiplier 330 1, at the input of the delay block 332 1, and at the input of the sign determiner 334 1.
- In response to a first clock edge, x^2 is available at the output of the multiplier 330 1, x is available at the output of the delay block 332 1, and “1” is available at the output of the delay block 338 1. Furthermore, if x is positive, x and logic “0” are respectively available at the (+) and (−) outputs of the sign determiner 334 1; conversely, if x is negative, logic “0” and x are respectively available at the (+) and (−) outputs of the determiner 334 1.
- In response to a second clock edge, x^2/2! is available at the output of the multiplier 330 2, x^3 is present at the output of the multiplier 330 3, and x is available at the output of the delay block 332 2. Furthermore, if x is positive, “1+x” is available at the output of the adder 336 1; conversely, if x is negative, “1+0=1” is present at the output of the adder 336 1.
- In response to a third clock edge, x^3/3! is available at the output of the multiplier 330 4, x^4 is available at the output of the multiplier 330 5, x is available at the output of the delay block 332 3, and “1+x+x^2/2!” (x positive) or “1+x^2/2!” (x negative) is available at the output of the adder 336 2.
- In response to a fourth clock edge, x^4/4! is present at the output of the multiplier 330 6, x^5 is present at the output of the multiplier 330 7, x is available at the output of the block 332 4, and “1+x+x^2/2!” (x positive) or “1+x^2/2!” (x negative) is available at the output of the delay block 338 2. Furthermore, if x^3/3!, and thus x, is positive, x^3/3! and logic “0” are respectively present at the (+) and (−) outputs of the sign determiner 334 2; conversely, if x^3/3!, and thus x, is negative, logic “0” and x^3/3! are respectively present at the (+) and (−) outputs of the determiner 334 2. Moreover, if x is negative, then x is available at the output of the delay block 342 1; conversely, if x is positive, then logic “0” is available at the output of the delay block 342 1.
- In response to a fifth clock edge, x^5/5! is available at the output of the multiplier 330 8, “1+x+x^2/2!+x^3/3!” (x positive) or “1+x^2/2!” (x negative) is available at the output of the adder 336 3, x^4/4! is available at the output of the delay block 338 3, and “0” (x positive) or “−x−x^3/3!” (x negative) is available at the output of the adder 340 1.
- In response to a sixth clock edge, if x^5/5!, and thus x, is positive, x^5/5! and logic “0” are respectively available at the (+) and (−) outputs of the sign determiner 334 3; conversely, if x^5/5!, and thus x, is negative, logic “0” and x^5/5! are respectively available at the (+) and (−) outputs of the determiner 334 3. Furthermore, “1+x+x^2/2!+x^3/3!+x^4/4!” (x positive) or “1+x^2/2!+x^4/4!” (x negative) is available at the output of the adder 336 4, and “0” (x positive) or “−x−x^3/3!” (x negative) is available at the output of the delay block 342 2.
- In response to a seventh clock edge, “1+x+x^2/2!+x^3/3!+x^4/4!+x^5/5!” (x positive) or “1+x^2/2!+x^4/4!” (x negative) is available at the output of the adder 336 5, and “0” (x positive) or “−x−x^3/3!−x^5/5!” (x negative) is available at the output of the adder 340 2.
- And in response to an eighth clock edge, “e^x=1+x+x^2/2!+x^3/3!+x^4/4!+x^5/5!” (x positive) or “e^x=1−x+x^2/2!−x^3/3!+x^4/4!−x^5/5!” (x negative) is available at the output of the adder 344.
- Therefore, in this example, the latency of the circuit 320 is eight clock cycles. Furthermore, if the adder 344, while summing a positive number and a negative floating-point number, generates an exception, the exception manager 86 (FIG. 4) or the host processor 12 (FIG. 1) may handle this exception using a conventional floating-point-exception routine.
- Alternatively, if the circuit 320 calculates one or more power-of-x terms higher than the fifth power, then the adder 344 is located after (to the right in FIG. 16) the adder 336 or 340 that sums the highest generated term to a preceding term, and the operation continues as above.
- Still referring to
FIG. 16, alternate embodiments of the circuit 320 are contemplated. For example, one may replace the term-generating section 322 with a section similar to the term-generating section 272 of FIG. 14, or may replace the chain of multipliers 330 with a power-of-x generator similar to the generator 300 of FIG. 15. -
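The routing of odd-power terms by sign in the circuit 320 can be sketched in software as follows. This Python sketch is a behavioral illustration only: it ignores clocking and delay blocks, the function names are hypothetical, and `math.copysign` stands in for reading the sign bit of a floating-point value.

```python
import math

def sign_determiner(value):
    """Return (plus_output, minus_output); the unused output carries 0."""
    if math.copysign(1.0, value) > 0:    # inspect the sign bit
        return value, 0.0
    return 0.0, value

def exp_split_paths(x, num_power_terms=5):
    """Approximate e^x with separate positive and negative accumulators."""
    positive_sum, negative_sum = 1.0, 0.0
    power = 1.0
    for k in range(1, num_power_terms + 1):
        power *= x                        # x^k
        term = power / math.factorial(k)
        if k % 2 == 1:
            # Odd powers take the sign of x, so they are routed by sign.
            plus, minus = sign_determiner(term)
            positive_sum += plus
            negative_sum += minus
        else:
            positive_sum += term          # even powers are never negative
    return positive_sum + negative_sum    # single mixed-sign addition

approx_pos = exp_split_paths(0.5)
approx_neg = exp_split_paths(-0.5)
```

With five power-of-x terms the result agrees with `math.exp` to within the size of the first omitted term, x^6/6!, for the small arguments tried here.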
FIG. 17 is a block diagram of the sign determiner 334,ofFIG. 16 according to an embodiment of the invention, it being understood that the sign determiners 334 2-334 v are similar. - The sign determiner 334 1 includes an
input node 350, a (−) output node 352, a (+) output node 354, a register 356 that stores a logic “0”, and demultiplexers 358 and 360.
The demultiplexer 358 includes a control node 362 coupled to receive a sign bit of the value at the input node 350, a (−) input node 364 coupled to the input node 350, a (+) input node 366 coupled to the register 356, and an output node 368 coupled to the (−) output node 352.
Similarly, the demultiplexer 360 includes a control node 370 coupled to receive the sign bit of the value at the input node 350, a (−) input node 372 coupled to the register 356, a (+) input node 374 coupled to the input node 350, and an output node 376 coupled to the (+) output node 354.
Still referring to FIG. 17, two operating modes of the sign determiner 334 1 are described according to an embodiment of the invention.
In one operating mode, the sign determiner 334 1 receives at its input node 350 a positive (+) value v, which, therefore, includes a positive sign bit. This sign bit is typically the most-significant bit of v, although the sign bit may be any other bit of v. In response to the positive sign bit, the demultiplexer 360 couples v (including the sign bit) from its (+) input node 374 to its output node 376, and thus to the (+) output node 354 of the sign determiner 334 1. Furthermore, the demultiplexer 358 couples the logic “0” stored in the register 356 from the (+) input node 366 to the output node 368, and thus to the (−) output node 352 of the sign determiner 334 1.
In the other operating mode, the sign determiner 334 1 receives at its input node 350 a negative (−) value v, which, therefore, includes a negative sign bit. In response to the negative sign bit, the demultiplexer 358 couples v (including the sign bit) from its (−) input node 364 to its output node 368, and thus to the (−) output node 352 of the sign determiner 334 1. Furthermore, the demultiplexer 360 couples the logic “0” stored in the register 356 from the (−) input node 372 to the output node 376, and thus to the (+) output node 354 of the sign determiner 334 1.
Still referring to FIG. 17, alternative embodiments of the sign determiner 334 1 are contemplated. For example, one may replace the logic “0” register with a component, such as a pull-down resistor, coupled to a logic “0” voltage level, such as ground.
Referring to FIGS. 1-17, alternate embodiments of the peer vector machine 10 are contemplated. For example, some or all of the components of the peer vector machine 10, such as the host processor 12 (FIG. 1) and the pipeline units 50 (FIG. 3) of the pipeline accelerator 14 (FIG. 1), may be disposed on a single integrated circuit.
The preceding discussion is presented to enable a person skilled in the art to make and use the invention. Various modifications to the embodiments will be readily apparent to those skilled in the art, and the generic principles herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
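For illustration only, the two operating modes of the sign determiner described with reference to FIG. 17 may be modeled in software as follows. This is a minimal sketch and not part of the disclosed circuit: the function name `sign_determiner`, the `width` parameter, and the two's-complement encoding with the sign bit in the most-significant position are assumptions made for the example.

```python
def sign_determiner(v: int, width: int = 16) -> tuple[int, int]:
    """Return (neg_out, pos_out) for a width-bit two's-complement value v.

    Assumption: the sign bit is the most-significant bit of the word.
    """
    mask = (1 << width) - 1
    word = v & mask                        # value presented at input node 350
    sign_bit = (word >> (width - 1)) & 1   # drives control nodes 362 and 370
    zero = 0                               # logic "0" held in register 356
    neg_out = word if sign_bit else zero   # demultiplexer 358 -> (-) output node 352
    pos_out = zero if sign_bit else word   # demultiplexer 360 -> (+) output node 354
    return neg_out, pos_out


if __name__ == "__main__":
    print(sign_determiner(5, width=8))
    print(sign_determiner(-3, width=8))
```

With `width=8`, the value 5 yields `(0, 5)` and the value −3 yields `(253, 0)`, where 253 is the 8-bit two's-complement encoding of −3. In each mode exactly one output carries v and the other carries the logic “0”.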
Claims (26)
1. A computer-based design tool, comprising:
a front end operable to receive symbols that define an algorithm;
an interpreter coupled to the front end and operable to parse the algorithm into respective algorithm portions; and
a generator coupled to the interpreter and operable to
identify a corresponding circuit template for each of the algorithm portions, each template defining a circuit for executing the respective algorithm portion, and
interconnect the identified templates such that the interconnected templates define a circuit that is operable to execute the algorithm.
2. The design tool of claim 1 wherein the generator is operable to generate a file comprising:
a respective pointer to each of the identified templates within a template library; and
a list of the interconnections between the identified templates.
3. The design tool of claim 1 wherein the symbols comprise mathematical symbols.
4. The design tool of claim 1 wherein the interpreter is operable to parse the algorithm into respective algorithm portions that each correspond to a template in a library.
5. The design tool of claim 1 wherein the generator is operable to identify the corresponding templates by accessing a library that includes the identified templates.
6. The design tool of claim 1, further comprising a library coupled to the generator and operable to store the identified templates.
7. The design tool of claim 1, further comprising a simulator coupled to the generator and operable to simulate operation of the circuit by determining a transfer function of the circuit defined by the interconnected templates.
8. The design tool of claim 1 wherein:
the front end is further operable to receive a desired operational characteristic of the circuit; and
the generator is further operable to identify the corresponding template for each of the algorithm portions such that the interconnection of identified templates defines the circuit having the desired operational characteristic.
9. The design tool of claim 8 wherein the desired operational characteristic comprises a latency of the circuit.
10. The design tool of claim 1 wherein:
the front end is further operable to receive an identity of a platform; and
the generator is further operable to,
identify a hardware-abstraction-layer template corresponding to the platform, and
interconnect the identified circuit templates to the identified layer template such that the interconnection of the templates defines the electronic circuit.
11. The tool of claim 1 wherein:
the front end is further operable to receive an identity of a platform; and
the generator is further operable to determine whether the circuit defined by the interconnection of the identified templates can be instantiated on the identified platform.
12. A computer-based design tool, comprising:
a front end operable to receive symbols that define an algorithm;
a generator coupled to the front end and operable to,
identify a template that defines a first electronic circuit that is operable to execute the algorithm,
identify a template that defines a hardware interface that is compatible with the first electronic circuit, and
interconnect the identified templates to define a resulting electronic circuit that includes the first circuit interconnected to the hardware interface.
13. A method, comprising:
parsing an algorithm into a combination of respective smaller algorithms;
identifying a corresponding template for each of the smaller algorithms, each template defining a respective circuit that is operable to execute the respective smaller algorithm; and
interconnecting the identified templates such that the interconnected templates define an electronic circuit that is operable to execute the algorithm.
14. The method of claim 13, further comprising:
generating a respective pointer to each of the identified templates within a library; and
generating a list of the interconnections between the identified templates.
15. The method of claim 13, further comprising:
receiving an expression of mathematical symbols that defines the algorithm; and
wherein parsing the algorithm comprises parsing the expression into groups of symbols that respectively define the smaller algorithms.
16. The method of claim 13 wherein identifying the corresponding templates comprises searching a library that includes the corresponding templates.
17. The method of claim 13, further comprising simulating the electronic circuit by:
determining a transfer function of the circuit from characteristics of the interconnected templates; and
determining a signal output from the circuit in response to a signal input to the circuit.
18. The method of claim 13 wherein identifying the corresponding templates comprises identifying the templates such that the interconnection of the templates represents the electronic circuit having a predetermined operational characteristic.
19. The method of claim 13, further comprising:
identifying an interface template that defines a hardware interface that is compatible with a predetermined platform on which the electronic circuit can be instantiated; and
wherein interconnecting the templates comprises interconnecting the circuit templates to the interface template such that the interconnection of the circuit and interface templates defines the electronic circuit.
20. The method of claim 13, further comprising determining whether the electronic circuit can be instantiated on a predetermined platform.
21. A method, comprising:
identifying a circuit template that defines a first electronic circuit that is operable to execute an algorithm;
identifying an interface template that defines a hardware interface that is compatible with the first electronic circuit; and
interconnecting the identified circuit and interface templates to generate a definition of a resulting electronic circuit that includes the first circuit interconnected to the hardware interface.
22. The method of claim 21, further comprising using the definition to instantiate the resulting electronic circuit on a programmable logic circuit.
23. The method of claim 21, further comprising using the definition to instantiate the resulting electronic circuit on a programmable logic circuit having signal pins such that the hardware interface is disposed between the pins and the first circuit.
24. The method of claim 21, further comprising simulating operation of the resulting electronic circuit based on information included in the circuit and interface templates.
25. The method of claim 21, further comprising simulating operation of the resulting electronic circuit based on information included in a description file that corresponds to the circuit and interface templates.
26. A computer-readable medium, that when executed by a processor, causes the processor to:
parse an algorithm into a combination of respective smaller algorithms;
identify a corresponding template for each of the smaller algorithms, each template defining a respective circuit that is operable to execute the respective smaller algorithm; and
interconnect the identified templates such that the interconnected templates define an electronic circuit that is operable to execute the algorithm.
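By way of a non-limiting illustration, the parsing, template-identifying, and interconnecting steps recited above, together with the pointer-and-interconnection-list file of claims 2 and 14, might be sketched as follows. Everything named here is hypothetical: the `TEMPLATE_LIBRARY` contents, the template paths, the port names `in0`, `in1`, and `out`, and the use of Python's `ast` module as a stand-in for the interpreter are assumptions for the example, not the tool's actual implementation.

```python
import ast
from dataclasses import dataclass, field

# Hypothetical template library: one circuit template per arithmetic operator.
TEMPLATE_LIBRARY = {
    ast.Add: "templates/adder",
    ast.Sub: "templates/subtractor",
    ast.Mult: "templates/multiplier",
}


@dataclass
class CircuitDefinition:
    pointers: list = field(default_factory=list)          # (instance, template pointer)
    interconnections: list = field(default_factory=list)  # (source, destination)


def generate(expression: str) -> CircuitDefinition:
    """Parse an expression into operator-sized portions and wire up their templates."""
    circuit = CircuitDefinition()

    def visit(node: ast.AST) -> str:
        # A bare name is an input signal and needs no template.
        if isinstance(node, ast.Name):
            return node.id
        # Each binary operation is one algorithm portion with one template.
        if isinstance(node, ast.BinOp) and type(node.op) in TEMPLATE_LIBRARY:
            left, right = visit(node.left), visit(node.right)
            instance = f"u{len(circuit.pointers)}"
            circuit.pointers.append((instance, TEMPLATE_LIBRARY[type(node.op)]))
            circuit.interconnections.append((left, f"{instance}.in0"))
            circuit.interconnections.append((right, f"{instance}.in1"))
            return f"{instance}.out"
        raise ValueError(f"no template for {ast.dump(node)}")

    output = visit(ast.parse(expression, mode="eval").body)
    circuit.interconnections.append((output, "result"))
    return circuit


if __name__ == "__main__":
    definition = generate("a*b + c")
    print(definition.pointers)
    print(definition.interconnections)
```

For the expression `a*b + c`, the sketch selects a multiplier template for `a*b` and an adder template for the sum, and records that the multiplier output feeds the first adder input. A real tool would also weigh operational characteristics such as latency and platform compatibility when choosing among candidate templates, which this sketch omits.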
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/243,509 US20060230377A1 (en) | 2004-10-01 | 2005-10-03 | Computer-based tool and method for designing an electronic circuit and related system |
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US61515804P | 2004-10-01 | 2004-10-01 | |
US61519304P | 2004-10-01 | 2004-10-01 | |
US61519204P | 2004-10-01 | 2004-10-01 | |
US61515704P | 2004-10-01 | 2004-10-01 | |
US61517004P | 2004-10-01 | 2004-10-01 | |
US61505004P | 2004-10-01 | 2004-10-01 | |
US11/243,509 US20060230377A1 (en) | 2004-10-01 | 2005-10-03 | Computer-based tool and method for designing an electronic circuit and related system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060230377A1 (en) | 2006-10-12 |
Family ID: 35645569
Family Applications (8)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/243,527 Expired - Fee Related US7487302B2 (en) | 2004-10-01 | 2005-10-03 | Service layer architecture for memory access system and method |
US11/243,459 Abandoned US20060101250A1 (en) | 2004-10-01 | 2005-10-03 | Configurable computing machine and related systems and methods |
US11/243,509 Abandoned US20060230377A1 (en) | 2004-10-01 | 2005-10-03 | Computer-based tool and method for designing an electronic circuit and related system |
US11/243,508 Expired - Fee Related US7809982B2 (en) | 2004-10-01 | 2005-10-03 | Reconfigurable computing machine and related systems and methods |
US11/243,528 Expired - Fee Related US7619541B2 (en) | 2004-10-01 | 2005-10-03 | Remote sensor processing system and method |
US11/243,507 Expired - Fee Related US7676649B2 (en) | 2004-10-01 | 2005-10-03 | Computing machine with redundancy and related systems and methods |
US11/243,502 Expired - Fee Related US8073974B2 (en) | 2004-10-01 | 2005-10-03 | Object oriented mission framework and system and method |
US11/243,506 Abandoned US20060085781A1 (en) | 2004-10-01 | 2005-10-03 | Library for computer-based tool and related system and method |
Country Status (2)
Country | Link |
---|---|
US (8) | US7487302B2 (en) |
WO (2) | WO2006039711A1 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060101253A1 (en) * | 2004-10-01 | 2006-05-11 | Lockheed Martin Corporation | Computing machine with redundancy and related systems and methods |
US20060200788A1 (en) * | 2005-03-03 | 2006-09-07 | Lsi Logic Corporation | Method for describing and deploying design platform sets |
US20060265927A1 (en) * | 2004-10-29 | 2006-11-30 | Lockheed Martin Corporation | Projectile accelerator and related vehicle and method |
US20070044044A1 (en) * | 2005-08-05 | 2007-02-22 | John Wilson | Automating power domains in electronic design automation |
US7366998B1 (en) * | 2005-11-08 | 2008-04-29 | Xilinx, Inc. | Efficient communication of data between blocks in a high level modeling system |
US20100046175A1 (en) * | 2008-06-18 | 2010-02-25 | Lockheed Martin Corporation | Electronics module, enclosure assembly housing same, and related systems and methods |
US20100046177A1 (en) * | 2008-06-18 | 2010-02-25 | Lockheed Martin Corporation | Enclosure assembly housing at least one electronic board assembly and systems using same |
US20110107252A1 (en) * | 2009-10-30 | 2011-05-05 | Synopsys, Inc. | Technique for generating an analysis equation |
US7987341B2 (en) | 2002-10-31 | 2011-07-26 | Lockheed Martin Corporation | Computing machine using software objects for transferring data that includes no destination information |
US7984581B2 (en) | 2004-10-29 | 2011-07-26 | Lockheed Martin Corporation | Projectile accelerator and related vehicle and method |
US20120017187A1 (en) * | 2010-07-13 | 2012-01-19 | Satish Padmanabhan | Automatic optimal integrated circuit generator from algorithms and specification |
US8635567B1 (en) * | 2012-10-11 | 2014-01-21 | Xilinx, Inc. | Electronic design automation tool for guided connection assistance |
US8739103B1 (en) * | 2013-03-04 | 2014-05-27 | Cypress Semiconductor Corporation | Techniques for placement in highly constrained architectures |
US9569581B1 (en) * | 2015-08-10 | 2017-02-14 | International Business Machines Corporation | Logic structure aware circuit routing |
US10922463B1 (en) * | 2019-10-20 | 2021-02-16 | Xilinx, Inc. | User dialog-based automated system design for programmable integrated circuits |
Families Citing this family (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7124318B2 (en) * | 2003-09-18 | 2006-10-17 | International Business Machines Corporation | Multiple parallel pipeline processor having self-repairing capability |
US7263672B2 (en) * | 2004-09-03 | 2007-08-28 | Abb Research Ltd. | Methods, systems, and data models for describing an electrical device |
US7797661B2 (en) * | 2004-09-03 | 2010-09-14 | Abb Research Ag | Method and apparatus for describing and managing properties of a transformer coil |
KR100633099B1 (en) * | 2004-10-15 | 2006-10-11 | 삼성전자주식회사 | System using data bus and method for operation controlling thereof |
US7409475B2 (en) * | 2004-10-20 | 2008-08-05 | Kabushiki Kaisha Toshiba | System and method for a high-speed shift-type buffer |
US7721241B2 (en) * | 2005-07-29 | 2010-05-18 | Abb Research Ltd. | Automated method and tool for documenting a transformer design |
US7571395B1 (en) * | 2005-08-03 | 2009-08-04 | Xilinx, Inc. | Generation of a circuit design from a command language specification of blocks in matrix form |
US7913223B2 (en) * | 2005-12-16 | 2011-03-22 | Dialogic Corporation | Method and system for development and use of a user-interface for operations, administration, maintenance and provisioning of a telecommunications system |
US7577870B2 (en) * | 2005-12-21 | 2009-08-18 | The Boeing Company | Method and system for controlling command execution |
US8055444B2 (en) * | 2006-04-04 | 2011-11-08 | Yahoo! Inc. | Content display and navigation interface |
JP5437556B2 (en) * | 2006-07-12 | 2014-03-12 | 日本電気株式会社 | Information processing apparatus and processor function changing method |
US7856546B2 (en) * | 2006-07-28 | 2010-12-21 | Drc Computer Corporation | Configurable processor module accelerator using a programmable logic device |
CN100428174C (en) * | 2006-10-31 | 2008-10-22 | 哈尔滨工业大学 | Embedded fault injection system and its method |
US8127113B1 (en) | 2006-12-01 | 2012-02-28 | Synopsys, Inc. | Generating hardware accelerators and processor offloads |
US8289966B1 (en) | 2006-12-01 | 2012-10-16 | Synopsys, Inc. | Packet ingress/egress block and system and method for receiving, transmitting, and managing packetized data |
US8706987B1 (en) | 2006-12-01 | 2014-04-22 | Synopsys, Inc. | Structured block transfer module, system architecture, and method for transferring |
US8614633B1 (en) * | 2007-01-08 | 2013-12-24 | Lockheed Martin Corporation | Integrated smart hazard assessment and response planning (SHARP) system and method for a vessel |
EP1986369B1 (en) * | 2007-04-27 | 2012-03-07 | Accenture Global Services Limited | End user control configuration system with dynamic user interface |
US7900168B2 (en) * | 2007-07-12 | 2011-03-01 | The Mathworks, Inc. | Customizable synthesis of tunable parameters for code generation |
US8145894B1 (en) * | 2008-02-25 | 2012-03-27 | Drc Computer Corporation | Reconfiguration of an accelerator module having a programmable logic device |
JP5536760B2 (en) | 2008-05-30 | 2014-07-02 | アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド | Redundancy method and apparatus for shader rows |
US8195882B2 (en) | 2008-05-30 | 2012-06-05 | Advanced Micro Devices, Inc. | Shader complex with distributed level one cache system and centralized level two cache |
US8502832B2 (en) * | 2008-05-30 | 2013-08-06 | Advanced Micro Devices, Inc. | Floating point texture filtering using unsigned linear interpolators and block normalizations |
US20100106668A1 (en) * | 2008-10-17 | 2010-04-29 | Louis Hawthorne | System and method for providing community wisdom based on user profile |
US8856463B2 (en) * | 2008-12-16 | 2014-10-07 | Frank Rau | System and method for high performance synchronous DRAM memory controller |
US8127262B1 (en) * | 2008-12-18 | 2012-02-28 | Xilinx, Inc. | Communicating state data between stages of pipelined packet processor |
US8156264B2 (en) * | 2009-04-03 | 2012-04-10 | Analog Devices, Inc. | Digital output sensor FIFO buffer with single port memory |
US8276159B2 (en) | 2009-09-23 | 2012-09-25 | Microsoft Corporation | Message communication of sensor and other data |
US8914672B2 (en) * | 2009-12-28 | 2014-12-16 | Intel Corporation | General purpose hardware to replace faulty core components that may also provide additional processor functionality |
US8705052B2 (en) * | 2010-04-14 | 2014-04-22 | Hewlett-Packard Development Company, L.P. | Communicating state data to a network service |
US8887054B2 (en) | 2010-04-15 | 2014-11-11 | Hewlett-Packard Development Company, L.P. | Application selection user interface |
JP5555116B2 (en) * | 2010-09-29 | 2014-07-23 | キヤノン株式会社 | Information processing apparatus and inter-processor communication control method |
DE102010062191B4 (en) * | 2010-11-30 | 2012-06-28 | Siemens Aktiengesellschaft | Pipeline system and method for operating a pipeline system |
US8943113B2 (en) * | 2011-07-21 | 2015-01-27 | Xiaohua Yi | Methods and systems for parsing and interpretation of mathematical statements |
US8898516B2 (en) * | 2011-12-09 | 2014-11-25 | Toyota Jidosha Kabushiki Kaisha | Fault-tolerant computer system |
US9015289B2 (en) * | 2012-04-12 | 2015-04-21 | Netflix, Inc. | Method and system for evaluating the resiliency of a distributed computing service by inducing a latency |
US9716802B2 (en) | 2012-04-12 | 2017-07-25 | Hewlett-Packard Development Company, L.P. | Content model for a printer interface |
US10270709B2 (en) | 2015-06-26 | 2019-04-23 | Microsoft Technology Licensing, Llc | Allocating acceleration component functionality for supporting services |
US10140129B2 (en) | 2012-12-28 | 2018-11-27 | Intel Corporation | Processing core having shared front end unit |
US9361116B2 (en) | 2012-12-28 | 2016-06-07 | Intel Corporation | Apparatus and method for low-latency invocation of accelerators |
US9417873B2 (en) | 2012-12-28 | 2016-08-16 | Intel Corporation | Apparatus and method for a hybrid latency-throughput processor |
US10346195B2 (en) | 2012-12-29 | 2019-07-09 | Intel Corporation | Apparatus and method for invocation of a multi threaded accelerator |
US9990212B2 (en) * | 2013-02-19 | 2018-06-05 | Empire Technology Development Llc | Testing and repair of a hardware accelerator image in a programmable logic circuit |
US9898339B2 (en) * | 2013-03-12 | 2018-02-20 | Itron, Inc. | Meter reading data validation |
JP6455132B2 (en) * | 2014-12-22 | 2019-01-23 | 富士通株式会社 | Information processing apparatus, processing method, and program |
WO2016125202A1 (en) * | 2015-02-04 | 2016-08-11 | Renesas Electronics Corporation | Data transfer apparatus |
US10198294B2 (en) | 2015-04-17 | 2019-02-05 | Microsoft Licensing Technology, LLC | Handling tenant requests in a system that uses hardware acceleration components |
US9792154B2 (en) | 2015-04-17 | 2017-10-17 | Microsoft Technology Licensing, Llc | Data processing system having a hardware acceleration plane and a software plane |
US20160308649A1 (en) * | 2015-04-17 | 2016-10-20 | Microsoft Technology Licensing, Llc | Providing Services in a System having a Hardware Acceleration Plane and a Software Plane |
KR20160148952A (en) * | 2015-06-17 | 2016-12-27 | 에스케이하이닉스 주식회사 | Memory system and operating method of memory system |
US10216555B2 (en) | 2015-06-26 | 2019-02-26 | Microsoft Technology Licensing, Llc | Partially reconfiguring acceleration components |
US10275160B2 (en) | 2015-12-21 | 2019-04-30 | Intel Corporation | Method and apparatus to enable individual non volatile memory express (NVME) input/output (IO) Queues on differing network addresses of an NVME controller |
US10013168B2 (en) | 2015-12-24 | 2018-07-03 | Intel Corporation | Disaggregating block storage controller stacks |
US10200376B2 (en) | 2016-08-24 | 2019-02-05 | Intel Corporation | Computer product, method, and system to dynamically provide discovery services for host nodes of target systems and storage resources in a network |
US10176116B2 (en) | 2016-09-28 | 2019-01-08 | Intel Corporation | Computer product, method, and system to provide discovery services to discover target storage resources and register a configuration of virtual target storage resources mapping to the target storage resources and an access control list of host nodes allowed to access the virtual target storage resources |
US10545770B2 (en) * | 2016-11-14 | 2020-01-28 | Intel Corporation | Configurable client hardware |
US10545925B2 (en) | 2018-06-06 | 2020-01-28 | Intel Corporation | Storage appliance for processing of functions as a service (FaaS) |
CN112631168A (en) * | 2020-12-09 | 2021-04-09 | 广东电网有限责任公司 | FPGA-based deformation detector control circuit design method |
Citations (89)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4703475A (en) * | 1985-12-04 | 1987-10-27 | American Telephone And Telegraph Company At&T Bell Laboratories | Data communication method and apparatus using multiple physical data links |
US4782461A (en) * | 1984-06-21 | 1988-11-01 | Step Engineering | Logical grouping of facilities within a computer development system |
US4862407A (en) * | 1987-10-05 | 1989-08-29 | Motorola, Inc. | Digital signal processing apparatus |
US4873626A (en) * | 1986-12-17 | 1989-10-10 | Massachusetts Institute Of Technology | Parallel processing system with processor array having memory system included in system memory |
US4914653A (en) * | 1986-12-22 | 1990-04-03 | American Telephone And Telegraph Company | Inter-processor communication protocol |
US4956771A (en) * | 1988-05-24 | 1990-09-11 | Prime Computer, Inc. | Method for inter-processor data transfer |
US4985832A (en) * | 1986-09-18 | 1991-01-15 | Digital Equipment Corporation | SIMD array processing system with routing networks having plurality of switching stages to transfer messages among processors |
US5185871A (en) * | 1989-12-26 | 1993-02-09 | International Business Machines Corporation | Coordination of out-of-sequence fetching between multiple processors using re-execution of instructions |
US5283883A (en) * | 1991-10-17 | 1994-02-01 | Sun Microsystems, Inc. | Method and direct memory access controller for asynchronously reading/writing data from/to a memory with improved throughput |
US5317752A (en) * | 1989-12-22 | 1994-05-31 | Tandem Computers Incorporated | Fault-tolerant computer system with auto-restart after power-fall |
US5421028A (en) * | 1991-03-15 | 1995-05-30 | Hewlett-Packard Company | Processing commands and data in a common pipeline path in a high-speed computer graphics system |
US5440682A (en) * | 1993-06-04 | 1995-08-08 | Sun Microsystems, Inc. | Draw processor for a high performance three dimensional graphic accelerator |
US5524075A (en) * | 1993-05-24 | 1996-06-04 | Sagem S.A. | Digital image processing circuitry |
US5603043A (en) * | 1992-11-05 | 1997-02-11 | Giga Operations Corporation | System for compiling algorithmic language source code for implementation in programmable hardware |
US5623418A (en) * | 1990-04-06 | 1997-04-22 | Lsi Logic Corporation | System and method for creating and validating structural description of electronic system |
US5640107A (en) * | 1995-10-24 | 1997-06-17 | Northrop Grumman Corporation | Method for in-circuit programming of a field-programmable gate array configuration memory |
US5648732A (en) * | 1995-10-04 | 1997-07-15 | Xilinx, Inc. | Field programmable pipeline array |
US5655069A (en) * | 1994-07-29 | 1997-08-05 | Fujitsu Limited | Apparatus having a plurality of programmable logic processing units for self-repair |
US5732107A (en) * | 1995-08-31 | 1998-03-24 | Northrop Grumman Corporation | Fir interpolator with zero order hold and fir-spline interpolation combination |
US5752071A (en) * | 1995-07-17 | 1998-05-12 | Intel Corporation | Function coprocessor |
US5784636A (en) * | 1996-05-28 | 1998-07-21 | National Semiconductor Corporation | Reconfigurable computer architecture for use in signal processing applications |
US5801958A (en) * | 1990-04-06 | 1998-09-01 | Lsi Logic Corporation | Method and system for creating and validating low level description of electronic design from higher level, behavior-oriented description, including interactive system for hierarchical display of control and dataflow information |
US5867399A (en) * | 1990-04-06 | 1999-02-02 | Lsi Logic Corporation | System and method for creating and validating structural description of electronic system from higher-level and behavior-oriented description |
US5892962A (en) * | 1996-11-12 | 1999-04-06 | Lucent Technologies Inc. | FPGA-based processor |
US5909565A (en) * | 1995-04-28 | 1999-06-01 | Matsushita Electric Industrial Co., Ltd. | Microprocessor system which efficiently shares register data between a main processor and a coprocessor |
US5910897A (en) * | 1994-06-01 | 1999-06-08 | Lsi Logic Corporation | Specification and design of complex digital systems |
US5916307A (en) * | 1996-06-05 | 1999-06-29 | New Era Of Networks, Inc. | Method and structure for balanced queue communication between nodes in a distributed computing application |
US5933356A (en) * | 1990-04-06 | 1999-08-03 | Lsi Logic Corporation | Method and system for creating and verifying structural logic model of electronic design from behavioral description, including generation of logic and timing models |
US5931959A (en) * | 1997-05-21 | 1999-08-03 | The United States Of America As Represented By The Secretary Of The Air Force | Dynamically reconfigurable FPGA apparatus and method for multiprocessing and fault tolerance |
US6018793A (en) * | 1997-10-24 | 2000-01-25 | Cirrus Logic, Inc. | Single chip controller-memory device including feature-selectable bank I/O and architecture and methods suitable for implementing the same |
US6023742A (en) * | 1996-07-18 | 2000-02-08 | University Of Washington | Reconfigurable computing architecture for providing pipelined data paths |
US6049222A (en) * | 1997-12-30 | 2000-04-11 | Xilinx, Inc | Configuring an FPGA using embedded memory |
US6096091A (en) * | 1998-02-24 | 2000-08-01 | Advanced Micro Devices, Inc. | Dynamically reconfigurable logic networks interconnected by fall-through FIFOs for flexible pipeline processing in a system-on-a-chip |
US6108693A (en) * | 1997-10-17 | 2000-08-22 | Nec Corporation | System and method of data communication in multiprocessor system |
US6112288A (en) * | 1998-05-19 | 2000-08-29 | Paracel, Inc. | Dynamic configurable system of parallel modules comprising chain of chips comprising parallel pipeline chain of processors with master controller feeding command and data |
US6115047A (en) * | 1996-07-01 | 2000-09-05 | Sun Microsystems, Inc. | Method and apparatus for implementing efficient floating point Z-buffering |
US6192384B1 (en) * | 1998-09-14 | 2001-02-20 | The Board Of Trustees Of The Leland Stanford Junior University | System and method for performing compound vector operations |
US6202139B1 (en) * | 1998-06-19 | 2001-03-13 | Advanced Micro Devices, Inc. | Pipelined data cache with multiple ports and processor with load/store unit selecting only load or store operations for concurrent processing |
US6205516B1 (en) * | 1997-10-31 | 2001-03-20 | Brother Kogyo Kabushiki Kaisha | Device and method for controlling data storage device in data processing system |
US6216252B1 (en) * | 1990-04-06 | 2001-04-10 | Lsi Logic Corporation | Method and system for creating, validating, and scaling structural description of electronic device |
US6216191B1 (en) * | 1997-10-15 | 2001-04-10 | Lucent Technologies Inc. | Field programmable gate array having a dedicated processor interface |
US6247118B1 (en) * | 1998-06-05 | 2001-06-12 | Mcdonnell Douglas Corporation | Systems and methods for transient error recovery in reduced instruction set computer processors via instruction retry |
US6247134B1 (en) * | 1999-03-31 | 2001-06-12 | Synopsys, Inc. | Method and system for pipe stage gating within an operating pipelined circuit for power savings |
US6253276B1 (en) * | 1998-06-30 | 2001-06-26 | Micron Technology, Inc. | Apparatus for adaptive decoding of memory addresses |
US20010014937A1 (en) * | 1997-12-17 | 2001-08-16 | Huppenthal Jon M. | Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem |
US6282627B1 (en) * | 1998-06-29 | 2001-08-28 | Chameleon Systems, Inc. | Integrated processor and programmable data path chip for reconfigurable computing |
US6282578B1 (en) * | 1995-06-26 | 2001-08-28 | Hitachi, Ltd. | Execution management method of program on reception side of message in distributed processing system |
US6308311B1 (en) * | 1999-05-14 | 2001-10-23 | Xilinx, Inc. | Method for reconfiguring a field programmable gate array from a host |
US6363465B1 (en) * | 1996-11-25 | 2002-03-26 | Kabushiki Kaisha Toshiba | Synchronous data transfer system and method with successive stage control allowing two more stages to simultaneous transfer |
US20020066910A1 (en) * | 2000-12-01 | 2002-06-06 | Hiroshi Tamemoto | Semiconductor integrated circuit |
US20020080174A1 (en) * | 1997-08-18 | 2002-06-27 | National Instruments Corporation | System and method for configuring an instrument to perform measurement functions utilizing conversion of graphical programs into hardware implementations |
US20020087829A1 (en) * | 2000-12-29 | 2002-07-04 | Snyder Walter L. | Re-targetable communication system |
US20020120883A1 (en) * | 2001-02-27 | 2002-08-29 | International Business Machines Corporation | Synchronous to asynchronous to synchronous interface |
US20020144175A1 (en) * | 2001-03-28 | 2002-10-03 | Long Finbarr Denis | Apparatus and methods for fault-tolerant computing using a switching fabric |
US6470482B1 (en) * | 1990-04-06 | 2002-10-22 | Lsi Logic Corporation | Method and system for creating, deriving and validating structural description of electronic system from higher level, behavior-oriented description, including interactive schematic design and simulation |
US20020162086A1 (en) * | 2001-04-30 | 2002-10-31 | Morgan David A. | RTL annotation tool for layout induced netlist changes |
US20030009651A1 (en) * | 2001-05-15 | 2003-01-09 | Zahid Najam | Apparatus and method for interconnecting a processor to co-processors using shared memory |
US6516420B1 (en) * | 1999-10-25 | 2003-02-04 | Motorola, Inc. | Data synchronizer using a parallel handshaking pipeline wherein validity indicators generate and send acknowledgement signals to a different clock domain |
US20030061409A1 (en) * | 2001-02-23 | 2003-03-27 | Rudusky Daryl | System, method and article of manufacture for dynamic, automated product fulfillment for configuring a remotely located device |
US6606360B1 (en) * | 1999-12-30 | 2003-08-12 | Intel Corporation | Method and apparatus for receiving data |
US6611920B1 (en) * | 2000-01-21 | 2003-08-26 | Intel Corporation | Clock distribution system for selectively enabling clock signals to portions of a pipelined circuit |
US20030177223A1 (en) * | 2002-03-12 | 2003-09-18 | Erickson Michael J. | Verification of computer programs |
US6624819B1 (en) * | 2000-05-01 | 2003-09-23 | Broadcom Corporation | Method and system for providing a flexible and efficient processor for use in a graphics processing system |
US6625749B1 (en) * | 1999-12-21 | 2003-09-23 | Intel Corporation | Firmware mechanism for correcting soft errors |
US6684314B1 (en) * | 2000-07-14 | 2004-01-27 | Agilent Technologies, Inc. | Memory controller with programmable address configuration |
US20040019883A1 (en) * | 2001-01-26 | 2004-01-29 | Northwestern University | Method and apparatus for automatically generating hardware from algorithms described in matlab |
US20040045015A1 (en) * | 2002-08-29 | 2004-03-04 | Kazem Haji-Aghajani | Common interface framework for developing field programmable device based applications independent of target circuit board |
US6704816B1 (en) * | 1999-07-26 | 2004-03-09 | Sun Microsystems, Inc. | Method and apparatus for executing standard functions in a computer system using a field programmable gate array |
US20040064198A1 (en) * | 2002-05-06 | 2004-04-01 | Cyber Switching, Inc. | Method and/or system and/or apparatus for remote power management and monitoring supply |
US20040061147A1 (en) * | 2001-01-19 | 2004-04-01 | Ryo Fujita | Electronic circuit device |
US20040130927A1 (en) * | 2002-10-31 | 2004-07-08 | Lockheed Martin Corporation | Pipeline accelerator having multiple pipeline units and related computing machine and method |
US6769072B1 (en) * | 1999-09-14 | 2004-07-27 | Fujitsu Limited | Distributed processing system with registered reconfiguration processors and registered notified processors |
US20040153752A1 (en) * | 2002-12-02 | 2004-08-05 | Marvell International Ltd. | Self-reparable semiconductor and method thereof |
US6839873B1 (en) * | 2000-06-23 | 2005-01-04 | Cypress Semiconductor Corporation | Method and apparatus for programmable logic device (PLD) built-in-self-test (BIST) |
US20050104743A1 (en) * | 2003-11-19 | 2005-05-19 | Ripolone James G. | High speed communication for measurement while drilling |
US20050149898A1 (en) * | 1998-10-14 | 2005-07-07 | Hakewill James R.H. | Method and apparatus for managing the configuration and functionality of a semiconductor design |
US6925549B2 (en) * | 2000-12-21 | 2005-08-02 | International Business Machines Corporation | Asynchronous pipeline control interface using tag values to control passing data through successive pipeline stages |
US6982976B2 (en) * | 2000-08-11 | 2006-01-03 | Texas Instruments Incorporated | Datapipe routing bridge |
US7024654B2 (en) * | 2002-06-11 | 2006-04-04 | Anadigm, Inc. | System and method for configuring analog elements in a configurable hardware device |
US7024660B2 (en) * | 1998-02-17 | 2006-04-04 | National Instruments Corporation | Debugging a program intended to execute on a reconfigurable device using a test feed-through configuration |
US20060123282A1 (en) * | 2004-10-01 | 2006-06-08 | Gouldey Brent I | Service layer architecture for memory access system and method |
US7073158B2 (en) * | 2002-05-17 | 2006-07-04 | Pixel Velocity, Inc. | Automated system for designing and developing field programmable gate arrays |
US7117390B1 (en) * | 2002-05-20 | 2006-10-03 | Sandia Corporation | Practical, redundant, failure-tolerant, self-reconfiguring embedded system architecture |
US20060236018A1 (en) * | 2001-05-18 | 2006-10-19 | Xilinx, Inc. | Programmable logic device including programmable interface core and central processing unit |
US7143368B1 (en) * | 2004-06-10 | 2006-11-28 | Altera Corporation | DSP design system level power estimation |
US7178112B1 (en) * | 2003-04-16 | 2007-02-13 | The Mathworks, Inc. | Management of functions for block diagrams |
US7177310B2 (en) * | 2001-03-12 | 2007-02-13 | Hitachi, Ltd. | Network connection apparatus |
US7228520B1 (en) * | 2004-01-30 | 2007-06-05 | Xilinx, Inc. | Method and apparatus for a programmable interface of a soft platform on a programmable logic device |
US7360196B1 (en) * | 2004-06-02 | 2008-04-15 | Altera Corporation | Technology mapping for programming and design of a programmable logic device by equating logic expressions |
Family Cites Families (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2006114A (en) * | 1931-04-27 | 1935-06-25 | Rosenmund Karl Wilhelm | Aliphatic-aromatic amine and process of making same |
US3665173A (en) * | 1968-09-03 | 1972-05-23 | Ibm | Triple modular redundancy/sparing |
CH658567GA3 (en) | 1984-03-28 | 1986-11-28 | ||
US4774574A (en) | 1987-06-02 | 1988-09-27 | Eastman Kodak Company | Adaptive block transform image coding method and apparatus |
US4915653A (en) * | 1988-12-16 | 1990-04-10 | Amp Incorporated | Electrical connector |
US5212777A (en) | 1989-11-17 | 1993-05-18 | Texas Instruments Incorporated | Multi-processor reconfigurable in single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) modes and method of operation |
US5553002A (en) * | 1990-04-06 | 1996-09-03 | Lsi Logic Corporation | Method and system for creating and validating low level description of electronic design from higher level, behavior-oriented description, using milestone matrix incorporated into user-interface |
JPH0581216A (en) * | 1991-09-20 | 1993-04-02 | Hitachi Ltd | Parallel processor |
US5159449A (en) * | 1991-12-26 | 1992-10-27 | Workstation Technologies, Inc. | Method and apparatus for data reduction in a video image data reduction system |
JP2500038B2 (en) * | 1992-03-04 | 1996-05-29 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Multiprocessor computer system, fault tolerant processing method and data processing system |
EP0566015A3 (en) * | 1992-04-14 | 1994-07-06 | Eastman Kodak Co | Neural network optical character recognition system and method for classifying characters in a moving web |
US5339413A (en) * | 1992-08-21 | 1994-08-16 | International Business Machines Corporation | Data stream protocol for multimedia data streaming data processing system |
US5383187A (en) * | 1992-09-18 | 1995-01-17 | Hughes Aircraft Company | Adaptive protocol for packet communications network and method |
US5361373A (en) * | 1992-12-11 | 1994-11-01 | Gilson Kent L | Integrated circuit computing device comprising a dynamically configurable gate array having a microprocessor and reconfigurable instruction execution means and method therefor |
US5490088A (en) * | 1994-02-28 | 1996-02-06 | Motorola, Inc. | Method of handling data retrieval requests |
US5583964A (en) * | 1994-05-02 | 1996-12-10 | Motorola, Inc. | Computer utilizing neural network and method of using same |
US5568614A (en) | 1994-07-29 | 1996-10-22 | International Business Machines Corporation | Data streaming between peer subsystems of a computer system |
US5710910A (en) * | 1994-09-30 | 1998-01-20 | University Of Washington | Asynchronous self-tuning clock domains and method for transferring data among domains |
US5649135A (en) * | 1995-01-17 | 1997-07-15 | International Business Machines Corporation | Parallel processing system and method using surrogate instructions |
US5692183A (en) | 1995-03-31 | 1997-11-25 | Sun Microsystems, Inc. | Methods and apparatus for providing transparent persistence in a distributed object operating environment |
US5649176A (en) | 1995-08-10 | 1997-07-15 | Virtual Machine Works, Inc. | Transition analysis and circuit resynthesis method and device for digital circuit modeling |
JP4075078B2 (en) * | 1995-10-09 | 2008-04-16 | 松下電器産業株式会社 | optical disk |
JPH09106407A (en) * | 1995-10-12 | 1997-04-22 | Toshiba Corp | Design supporting system |
JPH09148907A (en) | 1995-11-22 | 1997-06-06 | Nec Corp | Synchronous semiconductor logic device |
US5963454A (en) * | 1996-09-25 | 1999-10-05 | Vlsi Technology, Inc. | Method and apparatus for efficiently implementing complex function blocks in integrated circuit designs |
US6028939A (en) * | 1997-01-03 | 2000-02-22 | Redcreek Communications, Inc. | Data security system and method |
US5978578A (en) * | 1997-01-30 | 1999-11-02 | Azarya; Arnon | Openbus system for control automation networks |
US5941999A (en) * | 1997-03-31 | 1999-08-24 | Sun Microsystems | Method and system for achieving high availability in networked computer systems |
US5996059A (en) | 1997-07-01 | 1999-11-30 | National Semiconductor Corporation | System for monitoring an execution pipeline utilizing an address pipeline in parallel with the execution pipeline |
US5987620A (en) | 1997-09-19 | 1999-11-16 | Thang Tran | Method and apparatus for a self-timed and self-enabled distributed clock |
EP0945788B1 (en) | 1998-02-04 | 2004-08-04 | Texas Instruments Inc. | Data processing system with digital signal processor core and co-processor and data processing method |
US6230253B1 (en) | 1998-03-31 | 2001-05-08 | Intel Corporation | Executing partial-width packed data instructions |
US5916037A (en) | 1998-09-11 | 1999-06-29 | Hill; Gaius | Golf swing training device and method |
US6237054B1 (en) | 1998-09-14 | 2001-05-22 | Advanced Micro Devices, Inc. | Network interface unit including a microcontroller having multiple configurable logic blocks, with a test/program bus for performing a plurality of selected functions |
US6405266B1 (en) * | 1998-11-30 | 2002-06-11 | Hewlett-Packard Company | Unified publish and subscribe paradigm for local and remote publishing destinations |
US6477170B1 (en) | 1999-05-21 | 2002-11-05 | Advanced Micro Devices, Inc. | Method and apparatus for interfacing between systems operating under different clock regimes with interlocking to prevent overwriting of data |
EP1061438A1 (en) | 1999-06-15 | 2000-12-20 | Hewlett-Packard Company | Computer architecture containing processor and coprocessor |
EP1061439A1 (en) | 1999-06-15 | 2000-12-20 | Hewlett-Packard Company | Memory and instructions in computer architecture containing processor and coprocessor |
US20030014627A1 (en) | 1999-07-08 | 2003-01-16 | Broadcom Corporation | Distributed processing in a cryptography acceleration chip |
JP2001142695A (en) | 1999-10-01 | 2001-05-25 | Hitachi Ltd | Loading method of constant to storage place, loading method of constant to address storage place, loading method of constant to register, deciding method of number of code bit, normalizing method of binary number and instruction in computer system |
US6526430B1 (en) * | 1999-10-04 | 2003-02-25 | Texas Instruments Incorporated | Reconfigurable SIMD coprocessor architecture for sum of absolute differences and symmetric filtering (scalable MAC engine for image processing) |
US6326806B1 (en) | 2000-03-29 | 2001-12-04 | Xilinx, Inc. | FPGA-based communications access point and system for reconfiguration |
US6532009B1 (en) * | 2000-05-18 | 2003-03-11 | International Business Machines Corporation | Programmable hardwired geometry pipeline |
US6817005B2 (en) | 2000-05-25 | 2004-11-09 | Xilinx, Inc. | Modular design method and system for programmable logic devices |
DE10026118C2 (en) * | 2000-05-26 | 2002-11-28 | Ronald Neuendorf | Device for moistening liquid-absorbing agents, such as toilet paper |
US7196710B1 (en) | 2000-08-23 | 2007-03-27 | Nintendo Co., Ltd. | Method and apparatus for buffering graphics data in a graphics system |
US6829697B1 (en) * | 2000-09-06 | 2004-12-07 | International Business Machines Corporation | Multiple logical interfaces to a shared coprocessor resource |
GB0023409D0 (en) * | 2000-09-22 | 2000-11-08 | Integrated Silicon Systems Ltd | Data encryption apparatus |
US6708239B1 (en) * | 2000-12-08 | 2004-03-16 | The Boeing Company | Network device interface for digitally interfacing data channels to a controller via a network |
US6785841B2 (en) * | 2000-12-14 | 2004-08-31 | International Business Machines Corporation | Processor with redundant logic |
AU2002234212A1 (en) * | 2001-01-03 | 2002-08-19 | University Of Southern California | System level applications of adaptive computing (slaac) technology |
US6662285B1 (en) | 2001-01-09 | 2003-12-09 | Xilinx, Inc. | User configurable memory system having local and global memory blocks |
US7036059B1 (en) | 2001-02-14 | 2006-04-25 | Xilinx, Inc. | Techniques for mitigating, detecting and correcting single event upset effects in systems using SRAM-based field programmable gate arrays |
JP2002269063A (en) | 2001-03-07 | 2002-09-20 | Toshiba Corp | Messaging program, messaging method of distributed system, and messaging system |
JP2002281079A (en) | 2001-03-21 | 2002-09-27 | Victor Co Of Japan Ltd | Image data transmitting device |
US7219309B2 (en) | 2001-05-02 | 2007-05-15 | Bitstream Inc. | Innovations for the display of web pages |
US6985975B1 (en) * | 2001-06-29 | 2006-01-10 | Sanera Systems, Inc. | Packet lockstep system and method |
US20030086595A1 (en) * | 2001-11-07 | 2003-05-08 | Hui Hu | Display parameter-dependent pre-transmission processing of image data |
US7106715B1 (en) * | 2001-11-16 | 2006-09-12 | Vixs Systems, Inc. | System for providing data to multiple devices and method thereof |
US7143418B1 (en) * | 2001-12-10 | 2006-11-28 | Xilinx, Inc. | Core template package for creating run-time reconfigurable cores |
US20040013258A1 (en) * | 2002-07-22 | 2004-01-22 | Web. De Ag | Communications environment having a connection device |
US7137020B2 (en) * | 2002-05-17 | 2006-11-14 | Sun Microsystems, Inc. | Method and apparatus for disabling defective components in a computer system |
US20030231649A1 (en) | 2002-06-13 | 2003-12-18 | Awoseyi Paul A. | Dual purpose method and apparatus for performing network interface and security transactions |
US7076681B2 (en) * | 2002-07-02 | 2006-07-11 | International Business Machines Corporation | Processor with demand-driven clock throttling power reduction |
EP1383042B1 (en) | 2002-07-19 | 2007-03-28 | STMicroelectronics S.r.l. | A multiphase synchronous pipeline structure |
WO2004042562A2 (en) | 2002-10-31 | 2004-05-21 | Lockheed Martin Corporation | Pipeline accelerator and related system and method |
AU2003287317B2 (en) | 2002-10-31 | 2010-03-11 | Lockheed Martin Corporation | Pipeline accelerator having multiple pipeline units and related computing machine and method |
US7200114B1 (en) * | 2002-11-18 | 2007-04-03 | At&T Corp. | Method for reconfiguring a router |
US7260794B2 (en) * | 2002-12-20 | 2007-08-21 | Quickturn Design Systems, Inc. | Logic multiprocessor for FPGA implementation |
US20040203383A1 (en) * | 2002-12-31 | 2004-10-14 | Kelton James Robert | System for providing data to multiple devices and method thereof |
US7096444B2 (en) * | 2003-06-09 | 2006-08-22 | Kuoching Lin | Representing device layout using tree structure |
CN1802622A (en) * | 2003-06-10 | 2006-07-12 | 皇家飞利浦电子股份有限公司 | Embedded computing system with reconfigurable power supply and/or clock frequency domains |
US7284225B1 (en) * | 2004-05-20 | 2007-10-16 | Xilinx, Inc. | Embedding a hardware object in an application system |
ITPD20040058U1 (en) * | 2004-07-06 | 2004-10-06 | Marchioro Spa | MODULAR CAGE STRUCTURE |
WO2006039713A2 (en) | 2004-10-01 | 2006-04-13 | Lockheed Martin Corporation | Configurable computing machine and related systems and methods |
TWM279747U (en) * | 2004-11-24 | 2005-11-01 | Jing-Jung Chen | Improved structure of a turbine blade |
US20070030816A1 (en) * | 2005-08-08 | 2007-02-08 | Honeywell International Inc. | Data compression and abnormal situation detection in a wireless sensor network |
JP5009979B2 (en) | 2006-05-22 | 2012-08-29 | コーヒレント・ロジックス・インコーポレーテッド | ASIC design based on execution of software program in processing system |
KR20080086423A (en) | 2008-09-01 | 2008-09-25 | 김태진 | Secondary comb for cutting hair |
-
2005
- 2005-10-03 WO PCT/US2005/035814 patent/WO2006039711A1/en active Application Filing
- 2005-10-03 US US11/243,527 patent/US7487302B2/en not_active Expired - Fee Related
- 2005-10-03 WO PCT/US2005/035813 patent/WO2006039710A2/en active Application Filing
- 2005-10-03 US US11/243,459 patent/US20060101250A1/en not_active Abandoned
- 2005-10-03 US US11/243,509 patent/US20060230377A1/en not_active Abandoned
- 2005-10-03 US US11/243,508 patent/US7809982B2/en not_active Expired - Fee Related
- 2005-10-03 US US11/243,528 patent/US7619541B2/en not_active Expired - Fee Related
- 2005-10-03 US US11/243,507 patent/US7676649B2/en not_active Expired - Fee Related
- 2005-10-03 US US11/243,502 patent/US8073974B2/en not_active Expired - Fee Related
- 2005-10-03 US US11/243,506 patent/US20060085781A1/en not_active Abandoned
Patent Citations (100)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4782461A (en) * | 1984-06-21 | 1988-11-01 | Step Engineering | Logical grouping of facilities within a computer development system |
US4703475A (en) * | 1985-12-04 | 1987-10-27 | American Telephone And Telegraph Company At&T Bell Laboratories | Data communication method and apparatus using multiple physical data links |
US4985832A (en) * | 1986-09-18 | 1991-01-15 | Digital Equipment Corporation | SIMD array processing system with routing networks having plurality of switching stages to transfer messages among processors |
US4873626A (en) * | 1986-12-17 | 1989-10-10 | Massachusetts Institute Of Technology | Parallel processing system with processor array having memory system included in system memory |
US4914653A (en) * | 1986-12-22 | 1990-04-03 | American Telephone And Telegraph Company | Inter-processor communication protocol |
US4862407A (en) * | 1987-10-05 | 1989-08-29 | Motorola, Inc. | Digital signal processing apparatus |
US4956771A (en) * | 1988-05-24 | 1990-09-11 | Prime Computer, Inc. | Method for inter-processor data transfer |
US5317752A (en) * | 1989-12-22 | 1994-05-31 | Tandem Computers Incorporated | Fault-tolerant computer system with auto-restart after power-fall |
US5185871A (en) * | 1989-12-26 | 1993-02-09 | International Business Machines Corporation | Coordination of out-of-sequence fetching between multiple processors using re-execution of instructions |
US6216252B1 (en) * | 1990-04-06 | 2001-04-10 | Lsi Logic Corporation | Method and system for creating, validating, and scaling structural description of electronic device |
US5623418A (en) * | 1990-04-06 | 1997-04-22 | Lsi Logic Corporation | System and method for creating and validating structural description of electronic system |
US5867399A (en) * | 1990-04-06 | 1999-02-02 | Lsi Logic Corporation | System and method for creating and validating structural description of electronic system from higher-level and behavior-oriented description |
US6470482B1 (en) * | 1990-04-06 | 2002-10-22 | Lsi Logic Corporation | Method and system for creating, deriving and validating structural description of electronic system from higher level, behavior-oriented description, including interactive schematic design and simulation |
US5933356A (en) * | 1990-04-06 | 1999-08-03 | Lsi Logic Corporation | Method and system for creating and verifying structural logic model of electronic design from behavioral description, including generation of logic and timing models |
US5801958A (en) * | 1990-04-06 | 1998-09-01 | Lsi Logic Corporation | Method and system for creating and validating low level description of electronic design from higher level, behavior-oriented description, including interactive system for hierarchical display of control and dataflow information |
US5421028A (en) * | 1991-03-15 | 1995-05-30 | Hewlett-Packard Company | Processing commands and data in a common pipeline path in a high-speed computer graphics system |
US5283883A (en) * | 1991-10-17 | 1994-02-01 | Sun Microsystems, Inc. | Method and direct memory access controller for asynchronously reading/writing data from/to a memory with improved throughput |
US5603043A (en) * | 1992-11-05 | 1997-02-11 | Giga Operations Corporation | System for compiling algorithmic language source code for implementation in programmable hardware |
US5524075A (en) * | 1993-05-24 | 1996-06-04 | Sagem S.A. | Digital image processing circuitry |
US5440682A (en) * | 1993-06-04 | 1995-08-08 | Sun Microsystems, Inc. | Draw processor for a high performance three dimensional graphic accelerator |
US5910897A (en) * | 1994-06-01 | 1999-06-08 | Lsi Logic Corporation | Specification and design of complex digital systems |
US5655069A (en) * | 1994-07-29 | 1997-08-05 | Fujitsu Limited | Apparatus having a plurality of programmable logic processing units for self-repair |
US5909565A (en) * | 1995-04-28 | 1999-06-01 | Matsushita Electric Industrial Co., Ltd. | Microprocessor system which efficiently shares register data between a main processor and a coprocessor |
US6282578B1 (en) * | 1995-06-26 | 2001-08-28 | Hitachi, Ltd. | Execution management method of program on reception side of message in distributed processing system |
US5752071A (en) * | 1995-07-17 | 1998-05-12 | Intel Corporation | Function coprocessor |
US5732107A (en) * | 1995-08-31 | 1998-03-24 | Northrop Grumman Corporation | Fir interpolator with zero order hold and fir-spline interpolation combination |
US5648732A (en) * | 1995-10-04 | 1997-07-15 | Xilinx, Inc. | Field programmable pipeline array |
US5640107A (en) * | 1995-10-24 | 1997-06-17 | Northrop Grumman Corporation | Method for in-circuit programming of a field-programmable gate array configuration memory |
US5784636A (en) * | 1996-05-28 | 1998-07-21 | National Semiconductor Corporation | Reconfigurable computer architecture for use in signal processing applications |
US5916307A (en) * | 1996-06-05 | 1999-06-29 | New Era Of Networks, Inc. | Method and structure for balanced queue communication between nodes in a distributed computing application |
US6115047A (en) * | 1996-07-01 | 2000-09-05 | Sun Microsystems, Inc. | Method and apparatus for implementing efficient floating point Z-buffering |
US6023742A (en) * | 1996-07-18 | 2000-02-08 | University Of Washington | Reconfigurable computing architecture for providing pipelined data paths |
US5892962A (en) * | 1996-11-12 | 1999-04-06 | Lucent Technologies Inc. | FPGA-based processor |
US6363465B1 (en) * | 1996-11-25 | 2002-03-26 | Kabushiki Kaisha Toshiba | Synchronous data transfer system and method with successive stage control allowing two more stages to simultaneous transfer |
US5931959A (en) * | 1997-05-21 | 1999-08-03 | The United States Of America As Represented By The Secretary Of The Air Force | Dynamically reconfigurable FPGA apparatus and method for multiprocessing and fault tolerance |
US20020080174A1 (en) * | 1997-08-18 | 2002-06-27 | National Instruments Corporation | System and method for configuring an instrument to perform measurement functions utilizing conversion of graphical programs into hardware implementations |
US6784903B2 (en) * | 1997-08-18 | 2004-08-31 | National Instruments Corporation | System and method for configuring an instrument to perform measurement functions utilizing conversion of graphical programs into hardware implementations |
US6216191B1 (en) * | 1997-10-15 | 2001-04-10 | Lucent Technologies Inc. | Field programmable gate array having a dedicated processor interface |
US6108693A (en) * | 1997-10-17 | 2000-08-22 | Nec Corporation | System and method of data communication in multiprocessor system |
US6018793A (en) * | 1997-10-24 | 2000-01-25 | Cirrus Logic, Inc. | Single chip controller-memory device including feature-selectable bank I/O and architecture and methods suitable for implementing the same |
US6205516B1 (en) * | 1997-10-31 | 2001-03-20 | Brother Kogyo Kabushiki Kaisha | Device and method for controlling data storage device in data processing system |
US20010014937A1 (en) * | 1997-12-17 | 2001-08-16 | Huppenthal Jon M. | Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem |
US6049222A (en) * | 1997-12-30 | 2000-04-11 | Xilinx, Inc | Configuring an FPGA using embedded memory |
US7024660B2 (en) * | 1998-02-17 | 2006-04-04 | National Instruments Corporation | Debugging a program intended to execute on a reconfigurable device using a test feed-through configuration |
US6096091A (en) * | 1998-02-24 | 2000-08-01 | Advanced Micro Devices, Inc. | Dynamically reconfigurable logic networks interconnected by fall-through FIFOs for flexible pipeline processing in a system-on-a-chip |
US6112288A (en) * | 1998-05-19 | 2000-08-29 | Paracel, Inc. | Dynamic configurable system of parallel modules comprising chain of chips comprising parallel pipeline chain of processors with master controller feeding command and data |
US6247118B1 (en) * | 1998-06-05 | 2001-06-12 | Mcdonnell Douglas Corporation | Systems and methods for transient error recovery in reduced instruction set computer processors via instruction retry |
US20010025338A1 (en) * | 1998-06-05 | 2001-09-27 | The Boeing Company | Systems and methods for transient error recovery in reduced instruction set computer processors via instruction retry |
US6785842B2 (en) * | 1998-06-05 | 2004-08-31 | Mcdonnell Douglas Corporation | Systems and methods for use in reduced instruction set computer processors for retrying execution of instructions resulting in errors |
US6202139B1 (en) * | 1998-06-19 | 2001-03-13 | Advanced Micro Devices, Inc. | Pipelined data cache with multiple ports and processor with load/store unit selecting only load or store operations for concurrent processing |
US6282627B1 (en) * | 1998-06-29 | 2001-08-28 | Chameleon Systems, Inc. | Integrated processor and programmable data path chip for reconfigurable computing |
US6253276B1 (en) * | 1998-06-30 | 2001-06-26 | Micron Technology, Inc. | Apparatus for adaptive decoding of memory addresses |
US6192384B1 (en) * | 1998-09-14 | 2001-02-20 | The Board Of Trustees Of The Leland Stanford Junior University | System and method for performing compound vector operations |
US20050149898A1 (en) * | 1998-10-14 | 2005-07-07 | Hakewill James R.H. | Method and apparatus for managing the configuration and functionality of a semiconductor design |
US6247134B1 (en) * | 1999-03-31 | 2001-06-12 | Synopsys, Inc. | Method and system for pipe stage gating within an operating pipelined circuit for power savings |
US6308311B1 (en) * | 1999-05-14 | 2001-10-23 | Xilinx, Inc. | Method for reconfiguring a field programmable gate array from a host |
US6704816B1 (en) * | 1999-07-26 | 2004-03-09 | Sun Microsystems, Inc. | Method and apparatus for executing standard functions in a computer system using a field programmable gate array |
US6769072B1 (en) * | 1999-09-14 | 2004-07-27 | Fujitsu Limited | Distributed processing system with registered reconfiguration processors and registered notified processors |
US6516420B1 (en) * | 1999-10-25 | 2003-02-04 | Motorola, Inc. | Data synchronizer using a parallel handshaking pipeline wherein validity indicators generate and send acknowledgement signals to a different clock domain |
US20040019771A1 (en) * | 1999-12-21 | 2004-01-29 | Nhon Quach | Firmware mechanism for correcting soft errors |
US6625749B1 (en) * | 1999-12-21 | 2003-09-23 | Intel Corporation | Firmware mechanism for correcting soft errors |
US6606360B1 (en) * | 1999-12-30 | 2003-08-12 | Intel Corporation | Method and apparatus for receiving data |
US6611920B1 (en) * | 2000-01-21 | 2003-08-26 | Intel Corporation | Clock distribution system for selectively enabling clock signals to portions of a pipelined circuit |
US6624819B1 (en) * | 2000-05-01 | 2003-09-23 | Broadcom Corporation | Method and system for providing a flexible and efficient processor for use in a graphics processing system |
US6839873B1 (en) * | 2000-06-23 | 2005-01-04 | Cypress Semiconductor Corporation | Method and apparatus for programmable logic device (PLD) built-in-self-test (BIST) |
US6684314B1 (en) * | 2000-07-14 | 2004-01-27 | Agilent Technologies, Inc. | Memory controller with programmable address configuration |
US6982976B2 (en) * | 2000-08-11 | 2006-01-03 | Texas Instruments Incorporated | Datapipe routing bridge |
US20020066910A1 (en) * | 2000-12-01 | 2002-06-06 | Hiroshi Tamemoto | Semiconductor integrated circuit |
US6925549B2 (en) * | 2000-12-21 | 2005-08-02 | International Business Machines Corporation | Asynchronous pipeline control interface using tag values to control passing data through successive pipeline stages |
US20020087829A1 (en) * | 2000-12-29 | 2002-07-04 | Snyder Walter L. | Re-targetable communication system |
US20040061147A1 (en) * | 2001-01-19 | 2004-04-01 | Ryo Fujita | Electronic circuit device |
US20040019883A1 (en) * | 2001-01-26 | 2004-01-29 | Northwestern University | Method and apparatus for automatically generating hardware from algorithms described in matlab |
US20030061409A1 (en) * | 2001-02-23 | 2003-03-27 | Rudusky Daryl | System, method and article of manufacture for dynamic, automated product fulfillment for configuring a remotely located device |
US20020120883A1 (en) * | 2001-02-27 | 2002-08-29 | International Business Machines Corporation | Synchronous to asynchronous to synchronous interface |
US7177310B2 (en) * | 2001-03-12 | 2007-02-13 | Hitachi, Ltd. | Network connection apparatus |
US20020144175A1 (en) * | 2001-03-28 | 2002-10-03 | Long Finbarr Denis | Apparatus and methods for fault-tolerant computing using a switching fabric |
US20020162086A1 (en) * | 2001-04-30 | 2002-10-31 | Morgan David A. | RTL annotation tool for layout induced netlist changes |
US20030009651A1 (en) * | 2001-05-15 | 2003-01-09 | Zahid Najam | Apparatus and method for interconnecting a processor to co-processors using shared memory |
US20060236018A1 (en) * | 2001-05-18 | 2006-10-19 | Xilinx, Inc. | Programmable logic device including programmable interface core and central processing unit |
US20030177223A1 (en) * | 2002-03-12 | 2003-09-18 | Erickson Michael J. | Verification of computer programs |
US20040064198A1 (en) * | 2002-05-06 | 2004-04-01 | Cyber Switching, Inc. | Method and/or system and/or apparatus for remote power management and monitoring supply |
US20060206850A1 (en) * | 2002-05-17 | 2006-09-14 | Mccubbrey David L | Automated system for designing and developing field programmable gate arrays |
US7073158B2 (en) * | 2002-05-17 | 2006-07-04 | Pixel Velocity, Inc. | Automated system for designing and developing field programmable gate arrays |
US7117390B1 (en) * | 2002-05-20 | 2006-10-03 | Sandia Corporation | Practical, redundant, failure-tolerant, self-reconfiguring embedded system architecture |
US7024654B2 (en) * | 2002-06-11 | 2006-04-04 | Anadigm, Inc. | System and method for configuring analog elements in a configurable hardware device |
US20040045015A1 (en) * | 2002-08-29 | 2004-03-04 | Kazem Haji-Aghajani | Common interface framework for developing field programmable device based applications independent of target circuit board |
US20040133763A1 (en) * | 2002-10-31 | 2004-07-08 | Lockheed Martin Corporation | Computing architecture and related system and method |
US20040130927A1 (en) * | 2002-10-31 | 2004-07-08 | Lockheed Martin Corporation | Pipeline accelerator having multiple pipeline units and related computing machine and method |
US20040136241A1 (en) * | 2002-10-31 | 2004-07-15 | Lockheed Martin Corporation | Pipeline accelerator for improved computing architecture and related system and method |
US20040181621A1 (en) * | 2002-10-31 | 2004-09-16 | Lockheed Martin Corporation | Computing machine having improved computing architecture and related system and method |
US20040170070A1 (en) * | 2002-10-31 | 2004-09-02 | Lockheed Martin Corporation | Programmable circuit and related computing machine and method |
US20040153752A1 (en) * | 2002-12-02 | 2004-08-05 | Marvell International Ltd. | Self-reparable semiconductor and method thereof |
US20070055907A1 (en) * | 2002-12-02 | 2007-03-08 | Sehat Sutardja | Self-reparable semiconductor and method thereof |
US20070157138A1 (en) * | 2003-04-16 | 2007-07-05 | The Mathworks, Inc. | Management of functions for block diagrams |
US7178112B1 (en) * | 2003-04-16 | 2007-02-13 | The Mathworks, Inc. | Management of functions for block diagrams |
US20050104743A1 (en) * | 2003-11-19 | 2005-05-19 | Ripolone James G. | High speed communication for measurement while drilling |
US7228520B1 (en) * | 2004-01-30 | 2007-06-05 | Xilinx, Inc. | Method and apparatus for a programmable interface of a soft platform on a programmable logic device |
US7360196B1 (en) * | 2004-06-02 | 2008-04-15 | Altera Corporation | Technology mapping for programming and design of a programmable logic device by equating logic expressions |
US7143368B1 (en) * | 2004-06-10 | 2006-11-28 | Altera Corporation | DSP design system level power estimation |
US20060123282A1 (en) * | 2004-10-01 | 2006-06-08 | Gouldey Brent I | Service layer architecture for memory access system and method |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7987341B2 (en) | 2002-10-31 | 2011-07-26 | Lockheed Martin Corporation | Computing machine using software objects for transferring data that includes no destination information |
US8250341B2 (en) | 2002-10-31 | 2012-08-21 | Lockheed Martin Corporation | Pipeline accelerator having multiple pipeline units and related computing machine and method |
US20060101253A1 (en) * | 2004-10-01 | 2006-05-11 | Lockheed Martin Corporation | Computing machine with redundancy and related systems and methods |
US8073974B2 (en) | 2004-10-01 | 2011-12-06 | Lockheed Martin Corporation | Object oriented mission framework and system and method |
US7676649B2 (en) | 2004-10-01 | 2010-03-09 | Lockheed Martin Corporation | Computing machine with redundancy and related systems and methods |
US7984581B2 (en) | 2004-10-29 | 2011-07-26 | Lockheed Martin Corporation | Projectile accelerator and related vehicle and method |
US20060265927A1 (en) * | 2004-10-29 | 2006-11-30 | Lockheed Martin Corporation | Projectile accelerator and related vehicle and method |
US7814696B2 (en) | 2004-10-29 | 2010-10-19 | Lockheed Martin Corporation | Projectile accelerator and related vehicle and method |
US7331031B2 (en) * | 2005-03-03 | 2008-02-12 | Lsi Logic Corporation | Method for describing and deploying design platform sets |
US20060200788A1 (en) * | 2005-03-03 | 2006-09-07 | Lsi Logic Corporation | Method for describing and deploying design platform sets |
US8271928B2 (en) | 2005-08-05 | 2012-09-18 | Mentor Graphics Corporation | Automating power domains in electronic design automation |
US20090276742A1 (en) * | 2005-08-05 | 2009-11-05 | John Wilson | Automating power domains in electronic design automation |
US7574683B2 (en) * | 2005-08-05 | 2009-08-11 | John Wilson | Automating power domains in electronic design automation |
US20070044044A1 (en) * | 2005-08-05 | 2007-02-22 | John Wilson | Automating power domains in electronic design automation |
US7870522B1 (en) | 2005-11-08 | 2011-01-11 | Xilinx, Inc. | Efficient communication of data between blocks in a high level modeling system |
US7366998B1 (en) * | 2005-11-08 | 2008-04-29 | Xilinx, Inc. | Efficient communication of data between blocks in a high level modeling system |
US20100046175A1 (en) * | 2008-06-18 | 2010-02-25 | Lockheed Martin Corporation | Electronics module, enclosure assembly housing same, and related systems and methods |
US8773864B2 (en) | 2008-06-18 | 2014-07-08 | Lockheed Martin Corporation | Enclosure assembly housing at least one electronic board assembly and systems using same |
US8189345B2 (en) | 2008-06-18 | 2012-05-29 | Lockheed Martin Corporation | Electronics module, enclosure assembly housing same, and related systems and methods |
US20100046177A1 (en) * | 2008-06-18 | 2010-02-25 | Lockheed Martin Corporation | Enclosure assembly housing at least one electronic board assembly and systems using same |
US8225269B2 (en) * | 2009-10-30 | 2012-07-17 | Synopsys, Inc. | Technique for generating an analysis equation |
US20110107252A1 (en) * | 2009-10-30 | 2011-05-05 | Synopsys, Inc. | Technique for generating an analysis equation |
US8370784B2 (en) * | 2010-07-13 | 2013-02-05 | Algotochip Corporation | Automatic optimal integrated circuit generator from algorithms and specification |
US20130263067A1 (en) * | 2010-07-13 | 2013-10-03 | Algotochip Corporation | Automatic optimal integrated circuit generator from algorithms and specification |
US20120017187A1 (en) * | 2010-07-13 | 2012-01-19 | Satish Padmanabhan | Automatic optimal integrated circuit generator from algorithms and specification |
US8635567B1 (en) * | 2012-10-11 | 2014-01-21 | Xilinx, Inc. | Electronic design automation tool for guided connection assistance |
US8739103B1 (en) * | 2013-03-04 | 2014-05-27 | Cypress Semiconductor Corporation | Techniques for placement in highly constrained architectures |
US9569581B1 (en) * | 2015-08-10 | 2017-02-14 | International Business Machines Corporation | Logic structure aware circuit routing |
US20170083657A1 (en) * | 2015-08-10 | 2017-03-23 | International Business Machines Corporation | Logic structure aware circuit routing |
US20170083656A1 (en) * | 2015-08-10 | 2017-03-23 | International Business Machines Corporation | Logic structure aware circuit routing |
US9659135B2 (en) * | 2015-08-10 | 2017-05-23 | International Business Machines Corporation | Logic structure aware circuit routing |
US9672314B2 (en) * | 2015-08-10 | 2017-06-06 | International Business Machines Corporation | Logic structure aware circuit routing |
US10922463B1 (en) * | 2019-10-20 | 2021-02-16 | Xilinx, Inc. | User dialog-based automated system design for programmable integrated circuits |
Also Published As
Publication number | Publication date |
---|---|
US20060101253A1 (en) | 2006-05-11 |
US20060101307A1 (en) | 2006-05-11 |
WO2006039710A3 (en) | 2006-06-01 |
WO2006039710A2 (en) | 2006-04-13 |
US20060087450A1 (en) | 2006-04-27 |
WO2006039710A9 (en) | 2006-07-20 |
US8073974B2 (en) | 2011-12-06 |
US20060085781A1 (en) | 2006-04-20 |
US7619541B2 (en) | 2009-11-17 |
US20060101250A1 (en) | 2006-05-11 |
US20060123282A1 (en) | 2006-06-08 |
US7487302B2 (en) | 2009-02-03 |
US7676649B2 (en) | 2010-03-09 |
US20060149920A1 (en) | 2006-07-06 |
WO2006039711A1 (en) | 2006-04-13 |
US7809982B2 (en) | 2010-10-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: LOCKHEED MARTIN CORPORATION, MARYLAND. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: RAPP, JOHN; HELLENBACH, SCOTT; KURIAN, T. J.; AND OTHERS; REEL/FRAME: 017416/0633; SIGNING DATES FROM 20051028 TO 20051101 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |