WO1993020505A2 - Superscalar risc instruction scheduling - Google Patents


Info

Publication number
WO1993020505A2
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
register
instructions
dependencies
register file
Prior art date
Application number
PCT/JP1993/000375
Other languages
French (fr)
Other versions
WO1993020505A3 (en
Inventor
Sanjiv Garg
Kevin Ray Iadonato
Original Assignee
Seiko Epson Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed litigation Critical https://patents.darts-ip.com/?family=25333867&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=WO1993020505(A2) "Global patent litigation dataset” by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by Seiko Epson Corporation filed Critical Seiko Epson Corporation
Priority to DE69311330T priority Critical patent/DE69311330T2/en
Priority to JP51729393A priority patent/JP3730252B2/en
Priority to EP93906834A priority patent/EP0636256B1/en
Publication of WO1993020505A2 publication Critical patent/WO1993020505A2/en
Publication of WO1993020505A3 publication Critical patent/WO1993020505A3/en
Priority to KR1019940703382A priority patent/KR950701101A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/3013Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824Operand accessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • G06F9/384Register renaming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3854Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3856Reordering of instructions, e.g. using queues or age tags

Definitions

  • the present invention relates to superscalar reduced instruction set computers (RISC). More particularly, the present invention relates to instruction scheduling, including register renaming and instruction issuing, for superscalar RISC computers.
  • RISC superscalar reduced instruction set computers
  • True dependencies (sometimes called “flow dependencies” or “write-read” dependencies) are often grouped with anti-dependencies (also called “read-write” dependencies) and output dependencies (also called “write-write” dependencies) into a single group of instruction dependencies. The reason for this grouping is that each of these dependencies manifests itself through use of registers or other storage locations. However, it is important to distinguish true dependencies from the other two. True dependencies represent the flow of data and information through a program. Anti- and output dependencies arise because, at different points in time, registers or other storage locations hold different values for different computations.
  • register identifier precisely identifies the value contained in the corresponding register.
  • correspondence between registers and values breaks down, and values conflict for registers. This problem is severe when the goal of register allocation is to keep as many values in as few registers as possible. Keeping a large number of values in a small number of registers creates a large number of conflicts when the execution order is changed from the order assumed by the register allocator.
  • Anti- and output dependencies are more properly called “storage conflicts” because reusing storage locations (including registers) causes instructions to interfere with one another even though conflicting instructions are otherwise independent. Storage conflicts constrain instruction issue and reduce performance. But storage conflicts, like other resource conflicts, can be reduced or eliminated by duplicating the troublesome resource.
  • Johnson also discusses in detail various dependency mechanisms, including: software, register renaming, register renaming with a reorder buffer, register renaming with a future buffer, interlocks, the copying of operands in the instruction window to avoid dependencies, and partial renaming.
  • a conventional hardware implementation relies on software to enforce dependencies between instructions.
  • a compiler or other code generator can arrange the order of instructions so that the hardware cannot possibly see an instruction until it is free of true dependencies and storage conflicts.
  • This approach runs into several problems.
  • Software does not always know the latency of processor operations, and thus, cannot always know how to arrange instructions to avoid dependencies.
  • a scalar processor with low operation latencies, software can insert "no-ops" in the code to satisfy data dependencies without too much overhead.
  • the no-ops use a precious resource, the instruction cache, to encode dependencies between instructions.
  • One technique for removing storage conflicts is by providing additional registers that are used to reestablish the correspondence between registers and values. The additional registers are conventionally allocated dynamically by hardware, and the registers are associated with values needed by the program using "register renaming."
  • processors typically allocate a new register for every new value produced (i.e., for every instruction that writes a register). An instruction identifying the original register, for the purpose of reading its value, obtains instead the value in the newly allocated register.
  • hardware renames the original register identifier in the instruction to identify the new register and correct value. The same register identifier in several different instructions may access different hardware registers, depending on the locations of register references with respect to register assignments.
  • Each assignment to a register creates a new "instance" of the register, denoted by an alphabetic subscript.
  • the creation of a new instance for R3 in the third instruction avoids the anti- and output dependencies on the second and first instructions, respectively, and yet does not interfere with correctly supplying an operand to the fourth instruction.
  • the assignment to R3 in the third instruction supersedes the assignment to R3 in the first instruction, causing R3c to become the new R3 seen by subsequent instructions until another instruction assigns a value to R3.
  • Hardware that performs renaming creates each new register instance and destroys the instance when its value is superseded and there are no outstanding references to the value. This removes anti- and output dependencies and allows more instruction parallelism. Registers are still reused, but reuse is in line with the requirements of parallel execution. This is particularly helpful with out-of-order issue, because storage conflicts introduce instruction issue constraints that are not really necessary to produce correct results. For example, in the preceding instruction sequence, renaming allows the third instruction to be issued immediately, whereas, without renaming, the instruction must be delayed until the first instruction is complete and the second instruction is issued.
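The renaming idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rename`, the `(dest, [sources])` instruction format, and the four-instruction program are hypothetical and are not taken from the patent. Each write creates a new instance with an alphabetic suffix, and each read picks up the most recent instance.

```python
from collections import defaultdict

def rename(instructions):
    """Give every register write a fresh instance such as 'R3a', 'R3b'.

    Each instruction is (dest, [sources]). A source reads the most recent
    instance of its register; a register never written earlier in the
    sequence is left unrenamed.
    """
    next_instance = defaultdict(int)   # register -> instances created so far
    current = {}                       # register -> current instance name
    renamed = []
    for dest, sources in instructions:
        # Look up sources before renaming the destination, so an
        # instruction such as R3 := R3 + R5 reads the old instance.
        new_sources = [current.get(s, s) for s in sources]
        suffix = chr(ord("a") + next_instance[dest])
        next_instance[dest] += 1
        current[dest] = dest + suffix
        renamed.append((current[dest], new_sources))
    return renamed

# Hypothetical program: the third instruction reuses R3 for a new value.
program = [
    ("R3", ["R3", "R5"]),
    ("R4", ["R3"]),
    ("R3", ["R5"]),
    ("R7", ["R3", "R4"]),
]
for dest, sources in rename(program):
    print(dest, "<-", sources)
```

In this sketch the third instruction writes a new instance of R3, so it no longer conflicts with the first two instructions, while the fourth instruction still reads the value the third one produces.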
  • Another technique for reducing dependencies is to associate a single bit (called a "scoreboard bit") with each register.
  • the scoreboard bit is used to indicate that a register has a pending update.
  • the processor sets the associated scoreboard bit.
  • the scoreboard bit is reset when the write actually occurs. Because there is only one scoreboard bit indicating whether or not there is a pending update, there can be only one such update for each register.
  • the scoreboard stalls instruction decoding if a decoded instruction will update a register that already has a pending update (indicated by the scoreboard bit being set). This avoids output dependencies by allowing only one pending update to a register at any given time.
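A minimal sketch of the scoreboard-bit mechanism described above, assuming one bit per register. The class and method names (`Scoreboard`, `try_decode`, `write_back`) are hypothetical and chosen for illustration only.

```python
class Scoreboard:
    """One pending-update bit per register."""

    def __init__(self, num_registers):
        self.pending = [False] * num_registers

    def try_decode(self, dest):
        """Return True if decode may proceed, False if the decoder must stall."""
        if self.pending[dest]:
            # The register already has a pending update: stall.
            return False
        self.pending[dest] = True   # set the bit when the update is scheduled
        return True

    def write_back(self, dest):
        """Reset the bit when the write actually occurs."""
        self.pending[dest] = False


sb = Scoreboard(8)
print(sb.try_decode(3))   # first writer of register 3 is accepted
print(sb.try_decode(3))   # second writer must stall
sb.write_back(3)
print(sb.try_decode(3))   # accepted again after the write completes
```

Because there is a single bit per register, the sketch can track at most one pending update per register, which is the limitation the text contrasts with multi-bit renaming tags.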
  • Register renaming uses multiple-bit tags to identify the various uncomputed values, some of which values may be destined for the same processor register (that is, the same program-visible register).
  • Conventional renaming requires hardware to allocate tags from a pool of available tags that are not currently associated with any value and requires hardware to free the tags to the pool once the values have been computed.
  • scoreboarding allows only one pending update to a given register, the processor is not concerned about which update is the most recent.
  • a further technique for reducing dependencies is using register renaming with a "reorder buffer" which uses associative lookup.
  • the associative lookup maps the register identifier to the reorder buffer entry as soon as the entry is allocated, and, to avoid output dependencies, the lookup is prioritized so that only the value for the most recent assignment is obtained if the register is assigned more than once. A tag is obtained if the result is not yet available.
  • the values for the different instances are written from the reorder buffer to the register file in sequential order. When the value for the final instance is written to the register file, the reorder buffer no longer maps the register; the register file contains the only instance of the register, and this is the most recent instance.
  • renaming with a reorder buffer relies on the associative lookup in the reorder buffer to map register identifiers to values.
  • the associative lookup is prioritized so that the reorder buffer always provides the most recent value in the register of interest (or a tag).
  • the reorder buffer also writes values to the register file in order, so that, if the value is not in the reorder buffer, the register file must contain the most recent value.
  • associative lookup can be eliminated using a "future file."
  • the future file does not have the properties of the reorder buffer discussed in the preceding paragraph.
  • a value presented to the future file to be written may not be the most recent value destined for the corresponding register, and the value cannot be treated as the most recent value unless it actually is.
  • the future file therefore keeps track of the most recent update and checks that each write corresponds to the most recent update before it actually performs the write.
  • When an instruction is decoded, it accesses tags in the future file along with the operand values. If the register has one or more pending updates, the tag identifies the update value required by the decoded instruction. Once an instruction is decoded, other instructions may overwrite this instruction's source operands without being constrained by anti-dependencies, because the operands are copied into the instruction window. Output dependencies are handled by preventing the writing of a result into the future file if the result does not have a tag for the most recent value. Both anti- and output dependencies are handled without stalling instruction issue.
  • interlocks must be used to enforce dependencies. An interlock simply delays the execution of an instruction until the instruction is free of dependencies.
  • instruction from being executed: one way is to prevent the instruction from being decoded, and the other is to prevent the instruction from being issued.
  • the dispatch stack is an instruction window that augments each instruction in the window with dependency counts.
  • the dependency counts are set by comparing the instruction's register identifiers with the register identifiers of all instructions already in the dispatch stack. As instructions complete, the dependency counts of instructions that are still in the window are decremented based on the source and destination register identifiers of completing instructions (the counts are decremented by a variable amount, depending on the number of instructions completed). An instruction is independent when all of its counts are zero. The use of counts avoids having to compare all instructions in the dispatch stack to all other instructions on every cycle.
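The dependency-count idea can be sketched as follows. This is a simplified illustration, not the dispatch stack as actually built: it keeps a single count per instruction and tracks true dependencies only, whereas the text describes counts driven by both source and destination identifiers. The class name `DispatchStack` and its methods are hypothetical.

```python
class DispatchStack:
    """Instruction window that keeps a true-dependency count per entry."""

    def __init__(self):
        self.entries = []

    def add(self, dest, sources):
        # Count earlier, still-incomplete instructions that write one of
        # this instruction's source registers.
        count = sum(
            1 for e in self.entries
            if not e["done"] and e["dest"] in sources
        )
        self.entries.append(
            {"dest": dest, "sources": sources, "count": count, "done": False}
        )
        return len(self.entries) - 1

    def complete(self, index):
        finished = self.entries[index]
        finished["done"] = True
        # Later entries that were waiting on this result need one fewer.
        for e in self.entries[index + 1:]:
            if not e["done"] and finished["dest"] in e["sources"]:
                e["count"] -= 1

    def ready(self):
        # An instruction is independent when its count is zero.
        return [i for i, e in enumerate(self.entries)
                if not e["done"] and e["count"] == 0]


ds = DispatchStack()
ds.add("R1", ["R2", "R3"])
ds.add("R4", ["R1", "R5"])
ds.add("R6", ["R1", "R4"])
print(ds.ready())
ds.complete(0)
print(ds.ready())
```

The comparison against other instructions happens once, when an entry is added; afterwards only the counts are updated, which is the saving the text attributes to this scheme.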
  • Anti-dependencies can be avoided altogether by copying operands to the instruction window (for example, to the reservation stations) during instruction decode. In this manner, the operands cannot be overwritten by subsequent register updates. Operands can be copied to eliminate anti-dependencies in any approach, independent of register renaming. The alternative to copying operands is to interlock anti-dependencies, but the comparators and/or counters required for these interlocks are costly, considering the number of combinations of source and result registers to be compared.
  • a tag can be supplied for the operand rather than the operand itself. This tag is simply a means for the hardware to identify which value the instruction requires, so that, when the operand value is produced, it can be matched to the instruction.
  • the register identifier can serve as a tag (as with scoreboarding). If there can be more than one pending update to a register (as with renaming), there must be a mechanism for allocating result tags and ensuring uniqueness.
  • An alternative to scoreboarding interlocking is to allow multiple pending updates of registers to avoid stalling the decoder for output dependencies, but to handle anti-dependencies by copying operands (or tags) during decode.
  • An instruction in the window is not issued until it is free of output dependencies, so the updates to each register are performed in the same order in which they would be performed with in-order completion, except that updates for different registers are out of order with respect to each other.
  • This alternative has almost all of the capabilities of register renaming, lacking only the capability to issue instructions so that updates to the same register occur out of order.
  • the present invention is directed to instruction scheduling including register renaming and instruction issuing for superscalar RISC computers.
  • a Register Rename Circuit which is part of the scheduling logic allows a computer's Instruction Execution Unit (IEU) to execute several instructions at the same time while avoiding dependencies.
  • IEU Instruction Execution Unit
  • the present invention does not actually rename register addresses.
  • the RRC of the present invention temporarily buffers the instruction results, and the results of out-of-order instruction execution are not transferred to the register file until all previous instructions are done.
  • the RRC also performs result forwarding to provide temporarily buffered operands (results) to dependent instructions.
  • the RRC contains three subsections: a Data Dependency Checker (DDC), Tag Assign Logic (TAL) and Register file Port MUXes (RPM).
  • DDC Data Dependency Checker
  • TAL Tag Assign Logic
  • RPM Register file Port MUXes
  • the function of the DDC is to locate the dependencies between the instructions for a group of instructions.
  • the DDC does this by comparing the addresses of the source registers of each instruction to the addresses of the destination registers of each previous instruction in the group. For example, if instruction A reads a value from a register that is written to by instruction B, then instruction A is dependent upon instruction B and instruction A cannot start until instruction B has finished.
  • the DDC outputs indicate these dependencies.
  • the outputs of the DDC go to the TAL. Because it is possible for an instruction to be dependent on more than one previous instruction, the TAL must determine which of those previous instructions will be the last one to be executed.
  • the present invention automatically maps each instruction to a predetermined temporary buffer location; hence, the present invention does not need prioritized associative look-up as used by conventional reorder buffers, thereby saving chip area/cost and execution speed.
  • Out-of-order results for several instructions being executed at the same time are stored in a set of temporary buffers, rather than the file register designated by the instruction. If the DDC determines, for example, that a register that is instruction 6's source is written to by instructions 2, 3 and 5, then the TAL will indicate that instruction 6 must wait for instruction 5 by outputting the "tag" of instruction 5 for instruction 6.
  • the tag of instruction 5 shows the temporary buffer location where instruction 5's result is stored. It also contains a one bit signal (called a "done flag") that indicates if instruction 5 is finished or not.
  • the TAL will output three tags for each instruction, because each instruction can have three source registers.
  • the TAL will output the register file address of the instruction's input, rather than a temporary buffer's address.
  • the last part of the RRC is the RPMs, or Register file Port MUXes.
  • the inputs of the RPMs are the outputs of the TAL, and the select lines for the RPMs come from another part of the IEU called the Instruction Scheduler or Issuer.
  • the Instruction Scheduler chooses which instruction to execute (this decision is based partly on the done flags) and then uses the RPMs to select the tags of that instruction. These tags go to the read address ports of the computer's register files. In the previous example, once instruction 5 has finished, the Instruction Scheduler will start instruction 6.
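The tag selection described above (instruction 6 waits on instruction 5 when instructions 2, 3 and 5 all write its source register) can be sketched as a search for the latest earlier writer. The function name `assign_tag`, the tuple return format, and the register names are assumptions made for illustration; returning the register file address when no earlier instruction writes the source is likewise an assumed reading of the text.

```python
def assign_tag(source_reg, consumer_index, destinations):
    """Pick where a source operand should be read from.

    destinations[i] is the destination register of instruction i, or None.
    Returns ("buffer", i) for the latest earlier writer i, otherwise
    ("regfile", source_reg).
    """
    # Scan backwards so the most recent writer wins.
    for producer in range(consumer_index - 1, -1, -1):
        if destinations[producer] == source_reg:
            return ("buffer", producer)
    return ("regfile", source_reg)


# Hypothetical window of eight instructions: 2, 3 and 5 all write R9.
destinations = ["R1", "R2", "R9", "R9", "R4", "R9", "R7", "R8"]
print(assign_tag("R9", 6, destinations))
print(assign_tag("R5", 6, destinations))
```

Because each instruction's result has a fixed temporary buffer slot, the producer's index alone is enough to locate the value in this sketch; no associative search over buffer contents is needed.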
  • FIG. 1 shows a representative high level block diagram of the register renaming circuit of the present invention.
  • FIG. 2 shows a representative block diagram of the data dependency check circuit of the present invention.
  • FIG. 3 shows a representative block diagram of the tag assignment logic of the present invention.
  • FIG. 4 shows a representative block diagram of the register port file multiplexer of the present invention.
  • FIG. 5 is a representative flowchart showing a data dependency check method for IXS1 and IYS/D in accordance with the present invention.
  • FIGS. 6A and 6B are representative flowcharts showing a tag assignment method in accordance with the present invention.
  • FIG. 7 shows a representative block diagram which compares an instruction Y source/destination operand with each operand of an instruction X in accordance with an embodiment of the present invention.
  • FIG. 8 shows a representative circuit diagram for comparator block 706 of FIG. 7.
  • FIG. 9 shows a representative block diagram of a Priority Encoder in accordance with an embodiment of the present invention.
  • FIG. 10 shows a representative block diagram of the instruction scheduling logic of the present invention.
  • FIG. 1 shows a representative high level block diagram of an Instruction Execution Unit (IEU) 100 associated with the present invention.
  • IEU Instruction Execution Unit
  • the goal of IEU 100 is to execute as many instructions as possible in the shortest amount of time. There are two basic ways to accomplish this: optimize IEU 100 so that each instruction takes as little time as possible or optimize IEU 100 so that it can execute several instructions at the same time.
  • Instructions are sent to IEU 100 from an Instruction Fetch Unit (IFU, not shown) through an instruction FIFO (first-in-first-out register stack storage device) 101 in groups of four called "buckets."
  • IEU 100 can decode and schedule up to two buckets of instructions at one time.
  • FIFO 101 stores 16 total instructions in four buckets labeled 0-3.
  • IEU 100 looks at an instruction window 102.
  • window 102 comprises eight instructions (buckets 0 and 1). Every cycle IEU 100 tries to issue a maximum number of instructions from window 102.
  • Window 102 functions as an instruction buffer register. Once the instructions in a bucket are executed and their results stored in the processor's register file (see block 117), the bucket is flushed out at a bottom 104 and a new bucket is dropped in at a top 106.
  • RRC Register Rename Circuit
  • Input dependencies occur when an instruction, call it A, performs an operation on the result of a previous instruction, call it B.
  • Output dependencies occur when the outputs of A and B are to be stored in the same place.
  • Anti-dependencies occur when instruction A comes before B in the instruction stream and B's result will be stored in the same place as one of A's inputs.
  • Input dependencies are handled by not executing instructions until their inputs are available.
  • RRC 112 is used to locate the input dependencies between current instructions and then to signal an Instruction Scheduler or Issuer 118 when all inputs for a particular instruction are ready.
  • RRC 112 compares the register file addresses of each instruction's inputs with the addresses of each previous instruction's output using a data dependency circuit (DDC) 108. If one instruction's input comes from a register where a previous instruction's output will be stored, then the latter instruction must wait for the former to finish.
  • DDC data dependency circuit
  • RRC 112 can check eight instructions at the same time, so a current instruction is defined as any one of those eight from window 102. It should become evident to those skilled in the art that the present invention can easily be adapted to check more or fewer instructions.
  • instructions can have from 0 to 3 inputs and 0 or 1 outputs. Most instructions' inputs and outputs come from, or are stored in, one of several register files.
  • Each register file 117 (e.g., separate integer, floating and boolean register files)
  • This movement of results from temporary buffers 116 to register file 117 is called "retirement" and is controlled by termination logic, as should become evident to those skilled in the art. More than one instruction may be retired at a time. Retirement comprises updating the "official state" of the machine including the computer's Program Counter, as will become evident to those skilled in the art. For example, if instruction I0 happens to complete directly before instruction I1, both results can be stored directly into register file 117. But if instruction I3 then completes, its result must be stored in temporary buffer 116 until instruction I2 completes. By having IEU 100 store each instruction's result in its preassigned place in the temporary buffers 116, IEU 100 can execute instructions out of program order and still avoid the problems caused by output and anti-dependencies.
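The in-order retirement rule in the example above can be sketched as a scan for consecutive completed instructions. The function name `retire` and the list-of-flags representation are hypothetical; the actual termination logic is not described in this extract.

```python
def retire(done, next_to_retire):
    """Return the instruction indices that may retire this cycle.

    done[i] is True once instruction i has finished executing. Retirement
    proceeds in program order and stops at the first unfinished instruction.
    """
    retired = []
    i = next_to_retire
    while i < len(done) and done[i]:
        retired.append(i)
        i += 1
    return retired


# I0 and I1 are done, I2 is not, I3 is done: I3 must wait in its buffer.
print(retire([True, True, False, True], 0))
# Once I2 completes, I2 and I3 can retire together.
print(retire([True, True, True, True], 2))
```

More than one instruction can retire per call, which matches the statement that several instructions may be retired at a time.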
  • RRC 112 sends a bit map to an Instruction Scheduler 118 via a bus 12 indicating which instructions in window 102 are ready for issuing.
  • Instruction decode logic (not shown) indicates to Issuer 118 the resource requirements for each instruction over a bus 123.
  • Issuer 118 scans this information and selects the first and subsequent instructions for issuing by sending issue signals over bus 121.
  • the issue signals select a group of Register File Port MUXes (RPMs) 124 inside RRC 112 whose inputs are the addresses of each instruction's inputs.
  • RPMs Register File Port MUXes
  • RRC 112 contains three subsections: a Data Dependency Checker (DDC) 108, Tag Assign Logic (TAL) 122 and Register File Port MUXes (RPM) 124.
  • DDC 108 determines where the input dependencies are between the current instructions.
  • TAL 122 monitors the dependencies for Issuer 118 and controls result forwarding.
  • RPM 124 is controlled by Issuer 118 and directs the outputs of TAL 122 to the appropriate register file address ports 119.
  • Instructions are passed to DDC 108 via bus 110.
  • All source registers are compared with all previous destination registers for each instruction in window 102. Each instruction has only one destination, which may be a double register in one embodiment.
  • An instruction can only depend on a previous instruction and may have up to three source registers. There are various register file source and destination addresses that need to be checked against each other for any dependencies. As noted above, the eight bottom instructions corresponding to the lower two buckets are checked by DDC 108. All source register addresses are compared with all previous destination register addresses for the instructions in window 102.
  • a program has the following instruction sequence:
      add R0, R1, R2 (0)
      add R0, R2, R3 (1)
      add R4, R5, R2 (2)
      add R2, R3, R4 (3)
  • the first two registers in each instruction 0-3 are the source registers, and the last listed register in each instruction is the destination register.
  • R0 and R1 are the source registers for instruction 0 and R2 is the destination register.
  • Instruction 0 adds the contents of registers 0 and 1 and stores the result in R2.
  • the following are the comparisons needed to evaluate all of the dependencies: I1S1, I1S2 vs. I0D
  • IXRS1 is the address of source (input) number 1 of instruction X
  • IXRS2 is the address of source (input) number 2 of instruction X
  • IXD is the address of the destination (output) of instruction X.
  • RRC 112 can ignore the fact that instruction 2 is output dependent on instruction 0, because the processor has a temporary buffer where instruction 2's result can be stored without interfering with instruction 0's result. As discussed before, instruction 2's result will not be moved from temporary buffers 116 to register file 117 until instructions 0 and 1's results are moved to register file 117.
  • the number of instructions that can be checked by RRC 112 is easily scalable.
  • I5S1, I5S2 vs I4D, I3D, I2D, I1D, I0D
    I6S1, I6S2 vs I5D, I4D, I3D, I2D, I1D, I0D
    I7S1, I7S2 vs I6D, I5D, I4D, I3D, I2D, I1D, I0D
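The comparison pattern listed above (each instruction's sources against every previous instruction's destination) can be sketched as below. The function names and the `(src1, src2, dest)` tuple format are hypothetical, and the four-instruction program assumes that the second instruction reads R0 and R2. The sketch reports every matching pair; choosing the latest writer among several matches is a separate step.

```python
def comparison_pairs(window_size):
    """All (X, Y) pairs where instruction Y precedes instruction X."""
    return [(x, y) for x in range(window_size) for y in range(x)]


def find_dependencies(program):
    """program is a list of (src1, src2, dest) register names.

    Returns the set of (consumer, producer) pairs in which a source of the
    consumer equals the destination of an earlier producer.
    """
    deps = set()
    for x, y in comparison_pairs(len(program)):
        src1, src2, _ = program[x]
        dest = program[y][2]
        if dest in (src1, src2):
            deps.add((x, y))
    return deps


program = [
    ("R0", "R1", "R2"),   # (0)
    ("R0", "R2", "R3"),   # (1)
    ("R4", "R5", "R2"),   # (2)
    ("R2", "R3", "R4"),   # (3)
]
print(sorted(find_dependencies(program)))
print(len(comparison_pairs(8)))
```

For a window of eight instructions the number of instruction pairs is 7 + 6 + ... + 1 = 28, so the cost of widening the window grows roughly with the square of its size.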
  • RRC 112 must handle in order to do the dependency check.
  • Another special case occurs when a program contains instructions that generate 64 bit outputs (called long-word operations). These instructions need two registers in which to store their results.
  • registers must be sequential
  • RRC 112 is checking instruction 4's dependencies and instruction 1 is a long-word operation, then it must do the following comparisons: I4S1, I4S2 vs. I3D, I2D, I1D, I1D+1, I0D
  • RRC 112 must ignore any dependencies between instructions without destination registers and any future instructions. Also, instructions may have only one valid source register, so RRC 112 must ignore any dependencies between the unused source register (usually S2) and any previous instructions.
  • RRC 112 is also capable of dealing with multiple register files. When using multiple register files, dependencies only occur when one instruction's source register has the same address and is in the same register file as some other instruction's destination register. RRC 112 treats the information regarding which register file a particular address is from as part of the address. For example, in an implementation using four 32 bit register files, RRC 112 would do 7 bit compares instead of 5 bit compares (5 for the address and 2 for the register file).
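Treating the register file number as part of the address can be sketched as simple bit concatenation. The function name `full_address` is hypothetical, and placing the two register-file bits above the five address bits is an assumption; the text only states that 7 bits are compared.

```python
def full_address(register_file, register):
    """Combine a 2-bit register file number with a 5-bit register address."""
    if not (0 <= register_file < 4 and 0 <= register < 32):
        raise ValueError("expected a 2-bit file number and a 5-bit address")
    # Assumed layout: file number in bits [6:5], register address in bits [4:0].
    return (register_file << 5) | register


# Same 5-bit address in different register files: no dependency.
print(full_address(0, 5) == full_address(1, 5))
# Same file and same address: the 7-bit values match.
print(full_address(2, 7) == full_address(2, 7))
```

With this encoding a single equality test on the 7-bit value covers both conditions, so no separate register-file comparison is needed.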
  • A block diagram of DDC 108 is shown in FIG. 2.
  • Source address signals arrive from IFIFO 101 for all eight instructions of window 102. Additional inputs include long-word load operation flags, register file decode signals, invalid destination register flags, destination address signals and addressing mode flags for all eight instructions.
  • DDC 108 comprises 28 data dependency blocks 204.
  • Each block 204 is described in a KEY 206.
  • Each block 204 receives 3 inputs, IXS1, IXS2 and IXS/D.
  • IXS1 is the address of source (input) number 1 of instruction X
  • IXS2 is the address of source (input) number 2 of instruction X
  • IXS/D is the address of the source/destination (input) of instruction X.
  • Each block 204 also receives input IYS/D, which is the destination register address for some previous instruction Y.
  • a top row 208 receives I0S/D, which is the destination register address for instruction
  • Each block 204 outputs the data dependency results to one of a corresponding b line 114.
  • the address of I2S/D must be checked with operand address S1, S2 and S/D of instructions 7, 6, 5, 4, and 3.
  • IXS2 IYS/D
  • IXS/D IYS/D
  • the three comparisons are represented by three comparator blocks 702, 704 and 706, respectively.
  • One set of inputs to comparator blocks 702, 704 and 706 are the bits of the IYS/D field, which is represented by number 708.
  • Comparator block 702 has as its second set of inputs the bits of the IXS1.
  • comparator block 704 has as its second set of inputs the bits of the IXS2, and comparator block 706 has as its second set of inputs the bits of the IXS/D.
  • the comparison of comparator block 706 can be performed by random logic.
  • An example of random logic for comparator block 706 is shown in FIG. 8.
  • Instruction Y's source/destination bits [6:0] are shown input from the right at reference number 802 and instruction X's source/destination bits [6:0] are shown input from the top at reference number 804.
  • the most significant bit (MSB) is bit 6 and the least significant bit (LSB) is bit 0.
  • the corresponding bits from the two operands are fed to a set of seven exclusive NOR gates (XNORs) 806.
  • the outputs of XNORs 806 are then ANDed by a seven input AND gate 808. If the corresponding bits are the same, the output of XNOR 806 will be logic high. When all bits are the same, all seven XNOR 806 outputs are logic high and the output of AND gate 808 is logic high. This indicates that there is a dependency between IXS/D and IYS/D.
  • the random logic for comparator blocks 702 and 704 will be identical to that shown in FIG. 8.
  • the present invention contemplates many other random logic circuits for performing data dependency checking, as will become evident to those skilled in the art without departing from the spirit of this example.
  • An illustrative special data dependency checking case is for long word handling. As mentioned before, if a long word operation writes to register X, the first 32 bits are written to register X and the second 32 bits are written to register X+1. The data dependency checker therefore needs to check both registers when doing a comparison. In a preferred embodiment, register X is an even register, X+1 is an odd register and thus they only differ by the LSB. The easiest way to check both registers at the same time is to simply ignore the LSB.
  • DDC 108 first checks whether IXS1 and IYS/D are in the same register file, as shown at conditional block 502. If they are not in the same register file, there is no dependency. This is shown at a block 504. If they are in the same register file, DDC 108 then determines whether IXS1 and IYS/D are in the same register, as shown at a block 506. If they are not in the same register, flow proceeds to a conditional block 508 where DDC 108 determines whether IY is a long word operation. If IY is not a long word operation, there is no dependency and flow proceeds to a block 504.
  • IY is a long word operation
  • flow then proceeds to a conditional statement 510 where DDC 108 determines whether IXS1 and IYS/D+1 are the same register. If they are not, there is no dependency and flow proceeds to a block 504. If IXS1 and IYS/D+1 are the same register, flow proceeds to a conditional block 512 where DDC 108 determines if IY has a valid destination. If it does not have a valid destination, there is no dependency and flow proceeds to block 504. If IY does have a valid destination, flow proceeds to a conditional block 514 where DDC 108 determines if IXS1 has a valid source register.
  • TAL 122 Once TAL 122 has determined where the real dependencies are, it must locate the inputs for each instruction.
  • the inputs can come from the actual register file or an array of temporary buffers 116
  • RRC 112 assumes that if an instruction has no dependencies, its inputs are all in the register file. In this case, RRC 112 passes the IXS1, IXS2 and IXS/D addresses that came from IFIFO 102 to the register file. If an instruction has a dependency, then RRC 112 assumes that the data is in temporary buffers 116.
  • RRC 112 Since RRC 112 knows which previous instruction each instruction depends on, and since each instruction always writes to the same place in temporary buffers 116, RRC 112 can determine where in temporary buffers 116 an instruction's inputs are stored. It sends these addresses to register file read ports 119 and register file 117 outputs the data from temporary buffers 116 so that the instruction can use it.
  • tag assignments The following is an example of tag assignments:
  • I3S1 has two possible dependencies, I0S/D and I2S/D. Because TAL 122 must pick the last one (highest numbered one), I2S/D is chosen.
  • While TAL 122 is preparing the tags, it is also monitoring the outputs of DCL 130 and passing them on to Issuer 118 using bus 120. TAL 122 chooses the proper outputs of DCL 130 to pass to Issuer 118 by the same method that it chooses the tags that it sends to RPM 124.
  • I0S2 INFO = 1
  • I0S/D INFO = 1
  • the DONE signals come from DCL 130 via a bus 132.
  • the term “done” means the result of the instruction is in a temporary buffer or otherwise available at the output of a functional unit. Contrastingly, the term “terminate” means the result of the instruction is in the register file.
  • TAL 122 comprises 8 tag assignment logic blocks 302.
  • Each TAL block 302 receives the corresponding data dependency results via buses 114, as well as further signals that come from the computer's Instruction Decode and control logic (not shown).
  • the BKT bit signal forms the least significant bit of the tag.
  • DONE[X] flags are for instructions 0 through 6, and indicate if instruction X is done.
  • DBLREG[X] flags indicate which, if any, of the instructions is a double (long) word.
  • Each TAL block 302 also receives its own instruction's register addresses as inputs.
  • the Misc. signals, DBLREG and BKT signals are all implementation dependent control signals.
  • Each TAL block 302 outputs 3 TAGs 126 labeled IXS1, IXS2 and IXS/D, which are 6 bits.
  • TAL 122 outputs the least significant 5 bits of each TAG signal to RPMs 124 and the most significant TAG bit to Issuer 118.
  • Each block 302 of FIG. 3 comprises three Priority Encoders (PE), one for S1, one for S2 and one for S/D. There is one exception however. I0 requires no tag assignment. Its tags are the same as the original S1, S2 and S/D addresses, because I0 is always independent.
  • PE Priority Encoders
  • PE 902 has eight inputs 904 and eight outputs 906.
  • Inputs 904 for PE 902 are outputs 114 from DDC 108 which show where dependencies exist.
  • for the I7S1 tag assignment, seven of PE 902's inputs are the seven outputs 114 of DDC 108 that indicate whether I7S1 is dependent on I6D, whether I7S1 is dependent on I5D, and so on down to whether I7S1 is dependent on I0D.
  • An eighth input, shown at reference number 908, is always tied high because there should always be an output from PE 902.
  • PE 902 will select and output only the most previous instruction (in program order) on which there is a dependency. This is accomplished by connecting the signal showing if there is a dependency on the most previous instruction to the highest priority input of the PE 902 and the signal showing if there is a dependency on the second most previous instruction to the input of PE 902 with the second highest priority and so on for all previous instructions.
  • the input of the PE 902 with the lowest priority is always tied high so that at least one of PE 902's outputs will be asserted.
  • Outputs 906 are used as select lines for a MUX 910.
  • MUX 910 has eight inputs 912 to which the tags for each instruction are applied.
  • TAL 122 monitors up to three dependencies for each instruction and sends three vectors for each instruction (totalling 24 vectors) to Issuer 118. If an instruction is independent, TAL 122 signals to Issuer 118 that the instruction can begin immediately.
  • the MSB of the tag outputs which are sent to RPMs 124 is used to indicate if the address is a register file address or a temporary buffer address. If an instruction is independent, then the five LSB outputs indicate the source register address. For instructions that have dependencies: the second MSB indicates that the address is for a 64 bit value; the third through fifth MSB outputs specify the temporary buffer address; and the LSB output indicates which bucket is the current bucket, which is equal to the BKT signal in TAL 122.
  • TAL 122 has numerous implementation-dependent (i.e., special) cases that it handles.
  • register number 0 of the register file is always equal to 0. Therefore, even if one instruction writes to register 0 and another reads from register 0, there will be no dependency between them.
  • TAL 122 receives three signals from Instruction Decode Logic (IDL; not shown) for each instruction to indicate if one of that instruction's sources is register 0. If any of those is asserted, TAL 122 will ignore any dependencies for that particular input of that instruction.
  • IDL Instruction Decode Logic
  • TAG assignment of instruction 7's source 1 is shown in a flowchart in FIGs. 6A-6B.
  • TAL 122 first determines whether I7S1 is register 0, as shown at a conditional block 602. If the first source operand for I7 is register 0, the TAG is set equal to zero, and I7S1's INFO flag is set equal to one, as shown in block 604. If the first source operand (S1) for I7 is not register 0, TAL 122 then determines if I7S1 is dependent on I6S/D, as shown at a conditional block 606.
  • If I7S1 is dependent on I6S/D, flow then proceeds to a block 610 where I7S1's TAG is set equal to {1,DBLREG[6],0,1,0,BKT} and I7S1's INFO flag is set equal to DONE[6]. If either of the conditions tested at conditional block 606 is not met, flow proceeds to conditional block 612 where TAL 122 determines if I7S1 is dependent on I5S/D.
  • I7S1 TAG signals are forwarded directly to the register file port MUXes of register file 117.
  • the I7S1 INFO signals are sent to Issuer 118 to tell it when I7's S1 input is ready.
  • Issuer 118 has one scanner block 1002 for each resource (functional unit) that has to be allocated.
  • Issuer 118 has scanner blocks FU1, FU2, FU3, FU4 through FUn.
  • Requests for functional units are generated from instruction information by decoding logic (not shown) in a known manner, which are sent to scanners 1002 via bus 123.
  • Each scanner block 1002 scans from instruction I0 to I7 and selects the first request for the corresponding functional unit to be serviced during that cycle.
  • Issuer 118 is capable of issuing instructions having operands stored in different register files.
  • an ADD instruction may have a first operand from the floating point register file and a second operand from the integer register file. Instructions with operands from different register files are typically given higher issue priority (i.e., they are issued first). This issuing technique conserves processor execution time and functional unit resources.
  • IEU 100 may include two ALU's
  • ALU scanning becomes a bit more complicated. For speed reasons, one ALU scanner block scans from I0 to I7, while the other scanner block scans from I7 to I0. This is how two ALU requests are selected. With this scheme it is possible that an ALU instruction in bucket 1 will get issued before an ALU instruction in bucket 0, while increasing scanning efficiency.
  • Scanner outputs 1003 are selected by MUXing logic 1004.
  • a set of SELect inputs 1006 for MUX 1004 receive three 8-bit vectors (one for each operand) from TAL 122 via bus 120. The vectors indicate which of the eight instructions have no dependencies and are ready to be issued. Issuer 118 must wait for this information before it can start to issue any instructions. Issuer 118 monitors these vectors and when all three go high for a particular instruction, Issuer 118 knows that the inputs for that instruction are ready. Once the necessary functional unit is ready, the issuer can issue that instruction and send select signals to the register file port MUXes to pass the corresponding instruction's outputs to register file 117.
  • Issuer 118 After Issuer 118 is done it provides two 8-bit vectors per register file back to RRC 112 via MUXOUTputs 1008 to bus 121. These vectors, which indicate which instructions are issued this cycle, are used as select lines for RPMs 124.
  • the maximum number of instructions that can be issued simultaneously for each register file is restricted by the number of register file read ports available.
  • a data dependency with a previous uncompleted instruction may prevent an instruction from being issued.
  • an instruction may be prevented from being issued if the necessary functional unit is allocated to another instruction.
  • RRC 112 The last section of RRC 112 is the register file port MUX (RPM) section 124.
  • the function of RPMs 124 is to provide a way for Issuer 118 to get data out of register files 117 for each instruction to use.
  • RPMs 124 receive tag information via bus 126, and the select lines for RPMs 124 come from Issuer 118 via a bus 121 and also from the computer's IEU control logic.
  • the selected TAGs comprise read addresses that are sent to a predetermined set of ports 119 of register file 117 using bus 128.
  • RPMs 124 depend on the number of register files and the number of ports on each register file.
  • RPMs 124 comprise 3 register file port MUXes 402, 404 and 406.
  • MUX 402 receives as inputs the TAGs of instructions 0-7 corresponding to the source register field S1 that are generated by TAL 122.
  • MUX 404 receives as inputs the TAGs of instructions 0-7 corresponding to the source register field S2 that are generated by TAL 122.
  • MUX 406 receives as inputs the TAGs of instructions 0-7 corresponding to the source/destination register field S/D that are generated by TAL 122.
  • the outputs of MUXes 402, 404 and 406 are connected to the read address ports of register file 117 via bus 128.
  • RRC 112 and Issuer 118 allow the processor to execute instructions simultaneously and out of program order.
  • An IEU for use with the present invention is disclosed in commonly owned, co-pending application Serial No. 07/817,810 (Attorney Docket No. SP015/1397.0280001), the disclosure of which is incorporated herein by reference.
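The comparisons attributed to DDC 108 in the list above can be sketched in Python. This is an illustrative model rather than the circuit of FIGs. 7 and 8: the names `Operand`, `full_address`, `compare7` and `depends_on` are hypothetical, the 2 register file bits are assumed to sit above the 5 address bits, and two steps that the excerpt leaves open are filled in by assumption (a same-register match proceeds to the valid-destination check at block 512, and a valid source at block 514 yields a dependency).

```python
from dataclasses import dataclass

@dataclass
class Operand:
    reg_file: int        # 2-bit register file number (assumed to be the upper bits)
    reg: int             # 5-bit register address
    valid: bool = True   # models the valid source / valid destination flags

def full_address(op: Operand) -> int:
    """Treat the register file number as part of the address (7 bits total)."""
    return (op.reg_file << 5) | op.reg

def compare7(x: int, y: int) -> int:
    """Bitwise model of FIG. 8: seven XNORs feeding one seven-input AND."""
    result = 1
    for i in range(7):
        xnor = 1 - (((x >> i) & 1) ^ ((y >> i) & 1))  # high when the bits agree
        result &= xnor
    return result

def depends_on(ixs1: Operand, iysd: Operand, iy_long_word: bool) -> bool:
    """Follow conditional blocks 502-514 for one source/destination pair."""
    if ixs1.reg_file != iysd.reg_file:                    # block 502
        return False                                      # block 504
    if not compare7(full_address(ixs1), full_address(iysd)):       # block 506
        if not iy_long_word:                              # block 508
            return False
        # Assumes a long word destination is an even register, so reg + 1
        # still fits in 5 bits.
        upper = Operand(iysd.reg_file, iysd.reg + 1)
        if not compare7(full_address(ixs1), full_address(upper)):  # block 510
            return False
    if not iysd.valid:                                    # block 512
        return False
    return ixs1.valid                                     # block 514 (assumed outcome)
```

For example, `depends_on(Operand(0, 5), Operand(0, 4), True)` reports a dependency because a long word write to register 4 also covers register 5. When the destination is known to be even, the same effect can be had by comparing the addresses with the LSB dropped, as the text suggests.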
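The tag selection described for TAL 122 (priority encoder 902 driving MUX 910, the register 0 special case, and the 6-bit tag layout) can be modeled roughly as follows. The function names are hypothetical, and several points are assumptions rather than statements from the text: an MSB of 1 is taken to mean a temporary buffer address, the lowest-priority MUX input is taken to carry the register file address, and an independent operand is taken to have INFO equal to one.

```python
def make_register_tag(reg_addr: int) -> int:
    """Independent operand: MSB = 0, five LSBs hold the register address."""
    assert 0 <= reg_addr < 32
    return reg_addr

def make_buffer_tag(is_64_bit: bool, buffer_addr: int, bkt: int) -> int:
    """Dependent operand: {1, 64-bit flag, 3-bit buffer address, BKT}."""
    assert 0 <= buffer_addr < 8 and bkt in (0, 1)
    return (1 << 5) | (int(is_64_bit) << 4) | (buffer_addr << 1) | bkt

def priority_encode(inputs):
    """One-hot output for the first asserted input; inputs[0] has top priority."""
    outputs = [0] * len(inputs)
    for i, bit in enumerate(inputs):
        if bit:
            outputs[i] = 1
            break
    return outputs

def assign_tag(is_reg0, deps, buffer_tags, done, reg_addr):
    """Return (TAG, INFO) for one operand.

    deps, buffer_tags and done are ordered nearest previous instruction first,
    so the most previous dependency wins.
    """
    if is_reg0:
        return 0, 1                                  # register 0: TAG = 0, INFO = 1
    select = priority_encode(list(deps) + [1])       # lowest-priority input tied high
    mux_tags = list(buffer_tags) + [make_register_tag(reg_addr)]
    mux_info = list(done) + [1]                      # assumed: independent means ready
    index = select.index(1)
    return mux_tags[index], mux_info[index]
```

For I7S1, `deps` would hold the seven dependency results against I6 down to I0, and the returned INFO value follows the DONE flag of the selected instruction.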
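The issue step described for Issuer 118 can be approximated in a few lines. This sketch is a simplification: it folds the readiness vectors and the scanner blocks into one pass, whereas the text describes scanners 1002 followed by MUXing logic 1004. The unit names, the bit ordering of the vectors (bit i for instruction Ii) and the rule that a single ALU request is not given to both ALUs are assumptions made for illustration.

```python
def ready_mask(s1_vec: int, s2_vec: int, sd_vec: int) -> int:
    """Bit i is set when all three operand vectors are high for instruction Ii."""
    return s1_vec & s2_vec & sd_vec

def scan(requests, unit, ready, order):
    """Return the first ready instruction in `order` that requests `unit`."""
    for i in order:
        if requests[i] == unit and (ready >> i) & 1:
            return i
    return None

def issue(requests, ready, units):
    """One scanner per functional unit; two ALU scanners work from opposite ends."""
    n = len(requests)
    issued = {}
    for unit in units:
        issued[unit] = scan(requests, unit, ready, range(n))            # I0 -> I7
    if "ALU" in units:
        second = scan(requests, "ALU", ready, range(n - 1, -1, -1))     # I7 -> I0
        # Assumed tie rule: a lone ALU request goes to the first ALU only.
        issued["ALU2"] = second if second != issued["ALU"] else None
    return issued
```

Because the second ALU scanner starts at I7, a later ALU instruction can be picked ahead of an earlier one, which mirrors the out-of-order effect noted in the text.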

Abstract

A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands. A data dependence check circuit is included for determining data dependencies between the instructions. A tag assignment circuit generates one or more tags to specify the location of operands, based on the data dependencies determined by the data dependence check circuit. A set of register file port multiplexers select the tags generated by the tag assignment circuit and pass the tags onto the read address ports of the register file for storing execution results.

Description

DESCRIPTION
SUPERSCALAR RISC INSTRUCTION SCHEDULING
CROSS-REFERENCE TO RELATED APPLICATIONS
The following are commonly owned, co-pending applications:
* "Semiconductor Floor Plan and Method for a Register Renaming Circuit", Serial No. 07/860,718, filed 3/31/92 concurrently filed with the present application (Attorney Docket No. SP041);
* "High Performance RISC Microprocessor Architecture", Serial No. 07/817,810, filed 1/8/92 (Attorney Docket No. SP015);
* "Extensible RISC Microprocessor Architecture", Serial No. 07/817,809, filed 1/8/92 (Attorney Docket No. SP021).
The disclosures of the above applications are incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to superscalar reduced instruction set computers (RISC). More particularly, the present invention relates to instruction scheduling including register renaming and instruction issuing for superscalar RISC computers.
2. Related Art
A more detailed description of some of the basic concepts discussed in this application is found in a number of references, including Mike Johnson, Superscalar Microprocessor Design (Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1991); John L. Hennessy et al., Computer Architecture - A Quantitative Approach (Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990). Johnson's text, particularly Chapters 2, 6 and 7, provides an excellent discussion of the register renaming issues addressed by the present invention.
A major consideration in a superscalar RISC processor is how to execute multiple instructions in parallel and out-of-order, without incurring data errors due to dependencies inherent in such execution. Data dependency checking, register renaming and instruction scheduling are integral aspects of the solution.
2.1 Storage Conflicts and Register Renaming
True dependencies (sometimes called "flow dependencies" or "write-read" dependencies) are often grouped with anti-dependencies (also called "read-write" dependencies) and output dependencies (also called "write-write" dependencies) into a single group of instruction dependencies. The reason for this grouping is that each of these dependencies manifests itself through use of registers or other storage locations. However, it is important to distinguish true dependencies from the other two. True dependencies represent the flow of data and information through a program. Anti- and output dependencies arise because, at different points in time, registers or other storage locations hold different values for different computations.
When instructions are issued in order and complete in order, there is a one-to-one correspondence between registers and values. At any given point in execution, a register identifier precisely identifies the value contained in the corresponding register. When instructions are issued out of order and complete out of order, correspondence between registers and values breaks down, and values conflict for registers. This problem is severe when the goal of register allocation is to keep as many values in as few registers as possible. Keeping a large number of values in a small number of registers creates a large number of conflicts when the execution order is changed from the order assumed by the register allocator.
Anti- and output dependencies are more properly called "storage conflicts" because reusing storage locations (including registers) causes instructions to interfere with one another even though conflicting instructions are otherwise independent. Storage conflicts constrain instruction issue and reduce performance. But storage conflicts, like other resource conflicts, can be reduced or eliminated by duplicating the troublesome resource.
2.2 Dependency Mechanisms
Johnson also discusses in detail various dependency mechanisms, including: software, register renaming, register renaming with a reorder buffer, register renaming with a future buffer, interlocks, the copying of operands in the instruction window to avoid dependencies, and partial renaming.
A conventional hardware implementation relies on software to enforce dependencies between instructions. A compiler or other code generator can arrange the order of instructions so that the hardware cannot possibly see an instruction until it is free of true dependencies and storage conflicts. Unfortunately, this approach runs into several problems. Software does not always know the latency of processor operations, and thus, cannot always know how to arrange instructions to avoid dependencies. There is the question of how the software prevents the hardware from seeing an instruction until it is free of dependencies. In a scalar processor with low operation latencies, software can insert "no-ops" in the code to satisfy data dependencies without too much overhead. If the processor is attempting to fetch several instructions per cycle, or if some operations take several cycles to complete, the number of no-ops required to prevent the processor from seeing dependent instructions rapidly becomes excessive, causing an unacceptable increase in code size. The no-ops use a precious resource, the instruction cache, to encode dependencies between instructions.
When a processor permits out-of-order issue, it is not at all clear what mechanism software should use to enforce dependencies. Software has little control over the behavior of the processor, so it is hard to see how software prevents the processor from decoding dependent instructions. The second consideration is that no existing binary code for any scalar processor enforces the dependencies in a superscalar processor, because the mode of execution is very different in the superscalar processor. Relying on software to enforce dependencies requires that the code be regenerated for the superscalar processor. Finally, the dependencies in the code are directly determined by the latencies in the hardware, so that the best code for each version of a superscalar processor depends on the implementation of that version.
On the other hand, there is some motivation against hardware dependency techniques, because they are inherently complex. Assuming instructions with two input operands and one output value, as holds for typical RISC instructions, then there are five possible dependencies between any two instructions: two true dependencies, two anti-dependencies, and one output dependency. Furthermore, the number of dependencies between a group of instructions, such as a group of instructions in a window, varies with the square of the number of instructions in the group, because each instruction must be considered against every other instruction. Complexity is further multiplied by the number of instructions that the processor attempts to decode, issue, and complete in a single cycle. These actions introduce dependencies. The only aid in reducing complexity is that the dependencies can be determined incrementally, over many cycles to help reduce the scope and complexity of the dependency hardware.
One technique for removing storage conflicts is by providing additional registers that are used to reestablish the correspondence between registers and values. The additional registers are conventionally allocated dynamically by hardware, and the registers are associated with values needed by the program using "register renaming." To implement register renaming, processors typically allocate a new register for every new value produced (i.e., for every instruction that writes a register). An instruction identifying the original register, for the purpose of reading its value, obtains instead the value in the newly allocated register. Thus, hardware renames the original register identifier in the instruction to identify the new register and correct value. The same register identifier in several different instructions may access different hardware registers, depending on the locations of register references with respect to register assignments.
Consider the following code sequence where "op" is an operation, "Rn" represents a numbered register, and ":=" represents assignment:
R3b := R3a op R5a (1)
R4b := R3b + 1 (2)
R3c := R5a + 1 (3)
R7b := R3c op R4b (4)
Each assignment to a register creates a new "instance" of the register, denoted by an alphabetic subscript. The creation of a new instance for R3 in the third instruction avoids the anti- and output dependencies on the second and first instructions, respectively, and yet does not interfere with correctly supplying an operand to the fourth instruction. The assignment to R3 in the third instruction supersedes the assignment to R3 in the first instruction, causing R3c to become the new R3 seen by subsequent instructions until another instruction assigns a value to R3.
Hardware that performs renaming creates each new register instance and destroys the instance when its value is superseded and there are no outstanding references to the value. This removes anti- and output dependencies and allows more instruction parallelism. Registers are still reused, but reuse is in line with the requirements of parallel execution. This is particularly helpful with out-of-order issue, because storage conflicts introduce instruction issue constraints that are not really necessary to produce correct results. For example, in the preceding instruction sequence, renaming allows the third instruction to be issued immediately, whereas, without renaming, the instruction must be delayed until the first instruction is complete and the second instruction is issued.
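The instance numbering used in the four-instruction sequence can be reproduced with a short Python sketch. It is only an illustration of conventional renaming as described in this background section, not of the invention; the `rename` function and the `Rn.k` naming of instances are hypothetical (instance 0 plays the role of subscript "a", instance 1 of "b", and so on).

```python
def rename(instructions):
    """instructions: list of (dest, [sources]) in program order.

    Returns the same instructions with every register replaced by a
    specific instance of that register.
    """
    version = {}                                   # register -> latest instance number
    renamed = []
    for dest, sources in instructions:
        # Sources read the most recent instance of each register.
        new_sources = [f"{r}.{version.get(r, 0)}" for r in sources]
        # Every write creates a new instance of the destination register.
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}.{version[dest]}", new_sources))
    return renamed

# The four-instruction sequence from the text (operations omitted).
program = [("R3", ["R3", "R5"]), ("R4", ["R3"]), ("R3", ["R5"]), ("R7", ["R3", "R4"])]
```

Calling `rename(program)` gives the third instruction its own instance of R3, so it no longer conflicts with the first two instructions, while the fourth instruction still reads the instance written by the third.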
Another technique for reducing dependencies is to associate a single bit (called a "scoreboard bit") with each register. The scoreboard bit is used to indicate that a register has a pending update. When an instruction is decoded that will write a register, the processor sets the associated scoreboard bit. The scoreboard bit is reset when the write actually occurs. Because there is only one scoreboard bit indicating whether or not there is a pending update, there can be only one such update for each register. The scoreboard stalls instruction decoding if a decoded instruction will update a register that already has a pending update (indicated by the scoreboard bit being set). This avoids output dependencies by allowing only one pending update to a register at any given time.
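A scoreboard of this kind can be sketched as one flag per register. The class and method names below are hypothetical, and the register count of 32 is an assumption chosen for the example.

```python
class Scoreboard:
    """One pending-update bit per register."""

    def __init__(self, num_registers: int = 32):
        self.pending = [False] * num_registers

    def try_decode(self, dest: int) -> bool:
        """Return False (stall) if dest already has a pending update."""
        if self.pending[dest]:
            return False
        self.pending[dest] = True      # set the bit when the writer is decoded
        return True

    def write_back(self, dest: int) -> None:
        """Reset the bit when the write actually occurs."""
        self.pending[dest] = False
```

A second writer to the same register stalls at decode until `write_back` clears the bit, which is how the single bit rules out output dependencies.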
Register renaming, in contrast, uses multiple-bit tags to identify the various uncomputed values, some of which values may be destined for the same processor register (that is, the same program-visible register). Conventional renaming requires hardware to allocate tags from a pool of available tags that are not currently associated with any value and requires hardware to free the tags to the pool once the values have been computed. Furthermore, since scoreboarding allows only one pending update to a given register, the processor is not concerned about which update is the most recent.
A further technique for reducing dependencies is using register renaming with a "reorder buffer" which uses associative lookup. The associative lookup maps the register identifier to the reorder buffer entry as soon as the entry is allocated, and, to avoid output dependencies, the lookup is prioritized so that only the value for the most recent assignment is obtained if the register is assigned more than once. A tag is obtained if the result is not yet available. There can be as many instances of a given register as there are reorder buffer entries, so there are no storage conflicts between instructions. The values for the different instances are written from the reorder buffer to the register file in sequential order. When the value for the final instance is written to the register file, the reorder buffer no longer maps the register; the register file contains the only instance of the register, and this is the most recent instance.
However, renaming with a reorder buffer relies on the associative lookup in the reorder buffer to map register identifiers to values. In the reorder buffer, the associative lookup is prioritized so that the reorder buffer always provides the most recent value in the register of interest (or a tag). The reorder buffer also writes values to the register file in order, so that, if the value is not in the reorder buffer, the register file must contain the most recent value.
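The prioritized lookup described above can be illustrated with a small Python model. The entry layout (`reg`, `value`, `tag`) and the `lookup` function are hypothetical, and a linear search stands in for the associative hardware.

```python
def lookup(reorder_buffer, register_file, reg):
    """Return ("value", v) or ("tag", t) for the most recent instance of reg.

    reorder_buffer is a list of entries, oldest first; each entry is a dict
    with keys "reg", "value" and "tag". value is None until it is computed.
    """
    for entry in reversed(reorder_buffer):        # newest assignment has priority
        if entry["reg"] == reg:
            if entry["value"] is not None:
                return ("value", entry["value"])
            return ("tag", entry["tag"])          # result not yet available
    return ("value", register_file[reg])          # no pending instance in the buffer
```

With two buffered writes to the same register, only the newer one is returned, and a register with no buffered write is read from the register file.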
In a still further technique for reducing dependencies, associative lookup can be eliminated using a "future file." The future file does not have the properties of the reorder buffer discussed in the preceding paragraph. A value presented to the future file to be written may not be the most recent value destined for the corresponding register, and the value cannot be treated as the most recent value unless it actually is. The future file therefore keeps track of the most recent update and checks that each write corresponds to the most recent update before it actually performs the write.
When an instruction is decoded, it accesses tags in the future file along with the operand values. If the register has one or more pending updates, the tag identifies the update value required by the decoded instruction. Once an instruction is decoded, other instructions may overwrite this instruction's source operands without being constrained by anti-dependencies, because the operands are copied into the instruction window. Output dependencies are handled by preventing the writing of a result into the future file if the result does not have a tag for the most recent value. Both anti- and output dependencies are handled without stalling instruction issue.
If dependencies are not removed through renaming, "interlocks" must be used to enforce dependencies. An interlock simply delays the execution of an instruction until the instruction is free of dependencies. There are two ways to prevent an instruction from being executed: one way is to prevent the instruction from being decoded, and the other is to prevent the instruction from being issued.
To improve performance over scoreboarding, interlocks are moved from the decoder to the instruction window using a "dispatch stack." The dispatch stack is an instruction window that augments each instruction in the window with dependency counts. There is a dependency count associated with the source register of each instruction in the window, giving the number of pending prior updates to the source register and thus the number of updates that must be completed before all possible true dependencies are removed. There are two similar dependency counts associated with the destination register of each instruction in the window, giving both the number of pending prior uses of the register (which is the number of anti-dependencies) and the number of pending prior updates to the register (which is the number of output dependencies).
When an instruction is decoded and loaded into the dispatch stack, the dependency counts are set by comparing the instruction's register identifiers with the register identifiers of all instructions already in the dispatch stack. As instructions complete, the dependency counts of instructions that are still in the window are decremented based on the source and destination register identifiers of completing instructions (the counts are decremented by a variable amount, depending on the number of instructions completed). An instruction is independent when all of its counts are zero. The use of counts avoids having to compare all instructions in the dispatch stack to all other instructions on every cycle.
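How the counts might be initialized when an instruction enters the dispatch stack can be sketched as follows. The function names and the tuple representation of an instruction are hypothetical, and the decrement-on-completion step is left out to keep the sketch short.

```python
def dependency_counts(window, new_instr):
    """Set the counts for new_instr against instructions already in the window.

    window: list of (dest, [sources]) for earlier, uncompleted instructions.
    new_instr: (dest, [sources]) for the instruction being loaded.
    """
    dest, sources = new_instr
    # One count per source: pending prior updates to that source register.
    true_counts = [sum(1 for d, _ in window if d == s) for s in sources]
    # Pending prior uses of the destination register (anti-dependencies).
    anti = sum(1 for _, srcs in window for s in srcs if s == dest)
    # Pending prior updates to the destination register (output dependencies).
    output = sum(1 for d, _ in window if d == dest)
    return {"true": true_counts, "anti": anti, "output": output}

def is_independent(counts) -> bool:
    """An instruction is independent when all of its counts are zero."""
    return not any(counts["true"]) and counts["anti"] == 0 and counts["output"] == 0
```

For an instruction that writes R3 and reads R5, loaded after two instructions that both read R3 and one of which writes R3, the sketch reports two anti-dependencies and one output dependency.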
Anti-dependencies can be avoided altogether by copying operands to the instruction window (for example, to the reservation stations) during instruction decode. In this manner, the operands cannot be overwritten by subsequent register updates. Operands can be copied to eliminate anti-dependencies in any approach, independent of register renaming. The alternative to copying operands is to interlock anti-dependencies, but the comparators and/or counters required for these interlocks are costly, considering the number of combinations of source and result registers to be compared.
A tag can be supplied for the operand rather than the operand itself. This tag is simply a means for the hardware to identify which value the instruction requires, so that, when the operand value is produced, it can be matched to the instruction. If there can be only one pending update to a register, the register identifier can serve as a tag (as with scoreboarding). If there can be more than one pending update to a register (as with renaming), there must be a mechanism for allocating result tags and ensuring uniqueness.
An alternative to scoreboarding interlocking is to allow multiple pending updates of registers to avoid stalling the decoder for output dependencies, but to handle anti-dependencies by copying operands (or tags) during decode. An instruction in the window is not issued until it is free of output dependencies, so the updates to each register are performed in the same order in which they would be performed with in-order completion, except that updates for different registers are out of order with respect to each other. This alternative has almost all of the capabilities of register renaming, lacking only the capability to issue instructions so that updates to the same register occur out of order.
There appears to be no better alternative to renaming other than with a reorder buffer. Underlying the discussion of dependencies has been the assumption that the processor performs out-of-order issue and already has a reorder buffer for recovering from mispredicted branches. Out-of-order issue makes it unacceptable to stall the decoder for dependencies. If the processor has an instruction window, it is inconsistent to limit the look ahead capability of the processor by interlocking the decoder. There are then only two alternatives: implement anti- and output dependency interlocks in the window or remove these altogether with renaming.
SUMMARY OF THE INVENTION
The present invention is directed to instruction scheduling, including register renaming and instruction issuing, for superscalar RISC computers. A Register Rename Circuit (RRC), which is part of the scheduling logic, allows a computer's Instruction Execution Unit (IEU) to execute several instructions at the same time while avoiding dependencies. In contrast to conventional register renaming, the present invention does not actually rename register addresses. The RRC of the present invention temporarily buffers the instruction results, and the results of out-of-order instruction execution are not transferred to the register file until all previous instructions are done. The RRC also performs result forwarding to provide temporarily buffered operands (results) to dependent instructions. The RRC contains three subsections: a Data Dependency Checker (DDC), Tag Assign Logic (TAL) and Register file Port MUXes (RPM).
The function of the DDC is to locate the dependencies between the instructions for a group of instructions. The DDC does this by comparing the addresses of the source registers of each instruction to the addresses of the destination registers of each previous instruction in the group. For example, if instruction A reads a value from a register that is written to by instruction B, then instruction A is dependent upon instruction B and instruction A cannot start until instruction B has finished. The DDC outputs indicate these dependencies.
The outputs of the DDC go to the TAL. Because it is possible for an instruction to be dependent on more than one previous instruction, the TAL must determine which of those previous instructions will be the last one to be executed. The present invention automatically maps each instruction to a predetermined temporary buffer location; hence, the present invention does not need the prioritized associative look-up used by conventional reorder buffers, thereby saving chip area and cost and improving execution speed.
Out-of-order results for several instructions being executed at the same time are stored in a set of temporary buffers, rather than in the register file location designated by the instruction. If the DDC determines, for example, that a source register of instruction 6 is written to by instructions 2, 3 and 5, then the TAL will indicate that instruction 6 must wait for instruction 5 by outputting the "tag" of instruction 5 for instruction 6. The tag of instruction 5 shows the temporary buffer location where instruction 5's result is stored. It also contains a one bit signal (called a "done flag") that indicates whether instruction 5 is finished. The TAL will output three tags for each instruction, because each instruction can have three source registers. If an instruction is not dependent on any previous instruction, the TAL will output the register file address of the instruction's input, rather than a temporary buffer's address.
The last part of the RRC is the RPMs, or Register file Port MUXes. The inputs of the RPMs are the outputs of the TAL, and the select lines for the RPMs come from another part of the IEU called the Instruction Scheduler or Issuer. The Instruction Scheduler chooses which instruction to execute (this decision is based partly on the done flags) and then uses the RPMs to select the tags of that instruction. These tags go to the read address ports of the computer's register files. In the previous example, once instruction 5 has finished, the Instruction Scheduler will start instruction 6. It will set the RPM select lines so that the address of instruction 5's result (its tag) is sent to the register file, and the register file will make the result of instruction 5 available to instruction 6.
The foregoing and other features and advantages of the present invention will be apparent from the following more particular description of the preferred embodiments of the invention, as illustrated in the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood if reference is made to the accompanying drawings.
FIG. 1 shows a representative high level block diagram of the register renaming circuit of the present invention.
FIG. 2 shows a representative block diagram of the data dependency check circuit of the present invention.
FIG. 3 shows a representative block diagram of the tag assignment logic of the present invention.
FIG. 4 shows a representative block diagram of the register file port multiplexers of the present invention.
FIG. 5 is a representative flowchart showing a data dependency check method for IXS1 and IYS/D in accordance with the present invention.
FIGs. 6A and 6B are representative flowcharts showing a tag assignment method in accordance with the present invention.
FIG. 7 shows a representative block diagram which compares an instruction Y source/destination operand with each operand of an instruction X in accordance with an embodiment of the present invention.
FIG. 8 shows a representative circuit diagram for comparator block 706 of FIG. 7.
FIG. 9 shows a representative block diagram of a Priority Encoder in accordance with an embodiment of the present invention.
FIG. 10 shows a representative block diagram of the instruction scheduling logic of the present invention.
DETAILED DESCRIPTION
FIG. 1 shows a representative high level block diagram of an Instruction Execution Unit (IEU) 100 associated with the present invention. The goal of IEU 100 is to execute as many instructions as possible in the shortest amount of time. There are two basic ways to accomplish this: optimize IEU 100 so that each instruction takes as little time as possible or optimize IEU 100 so that it can execute several instructions at the same time.
Instructions are sent to IEU 100 from an Instruction Fetch Unit (IFU, not shown) through an instruction FIFO (first-in-first-out register stack storage device) 101 in groups of four called "buckets." IEU 100 can decode and schedule up to two buckets of instructions at one time. FIFO 101 stores 16 total instructions in four buckets labeled 0-3. IEU 100 looks at an instruction window 102. In one embodiment of the present invention, window 102 comprises eight instructions (buckets 0 and 1). Every cycle IEU 100 tries to issue a maximum number of instructions from window 102. Window 102 functions as an instruction buffer register. Once the instructions in a bucket are executed and their results stored in the processor's register file (see block 117), the bucket is flushed out a bottom 104 and a new bucket is dropped in at a top 106.
In order to execute instructions in parallel or out of order, care must be taken so that the data that each instruction needs is available when the instruction needs it and also so that the result of each instruction is available for any future instructions that might need it. A Register Rename Circuit (RRC), which is part of the scheduling logic of the computer's IEU, performs this function by locating dependencies between current instructions and then renaming the sources (inputs) of the instruction.
As noted above, there are three types of dependencies: input dependencies, output dependencies and anti-dependencies. Input dependencies occur when an instruction, call it A, performs an operation on the result of a previous instruction, call it B. Output dependencies occur when the outputs of A and B are to be stored in the same place. Anti-dependencies occur when instruction A comes before B in the instruction stream and B's result will be stored in the same place as one of A's inputs.
Input dependencies are handled by not executing instructions until their inputs are available. RRC 112 is used to locate the input dependencies between current instructions and then to signal an Instruction Scheduler or Issuer 118 when all inputs for a particular instruction are ready. In order to locate these dependencies, RRC 112 compares the register file addresses of each instruction's inputs with the addresses of each previous instruction's output using a data dependency circuit (DDC) 108. If one instruction's input comes from a register where a previous instruction's output will be stored, then the latter instruction must wait for the former to finish.
This implementation of RRC 112 can check eight instructions at the same time, so a current instruction is defined as any one of those eight from window 102. It should become evident to those skilled in the art that the present invention can easily be adapted to check more or fewer instructions.
In one embodiment of the present invention, instructions can have from 0 to 3 inputs and 0 or 1 outputs. Most instructions' inputs and outputs come from, or are stored in, one of several register files. Each register file 117 (e.g., separate integer, floating and boolean register files) has 32 real entries plus the group of 8 temporary buffers 116. When an instruction completes (the term "complete" means that the operation is complete and the operand is ready to be written to its destination register), its result is stored in its preassigned location in the temporary buffers 116. Its result is later moved to the appropriate place in register file 117 after all previous instructions' results have been moved to their places in the register file. This movement of results from temporary buffers 116 to register file 117 is called "retirement" and is controlled by termination logic, as should become evident to those skilled in the art. More than one instruction may be retired at a time. Retirement comprises updating the "official state" of the machine, including the computer's Program Counter, as will become evident to those skilled in the art. For example, if instruction I0 happens to complete directly before instruction I1, both results can be stored directly into register file 117. But if instruction I3 then completes, its result must be stored in temporary buffer 116 until instruction I2 completes. By having IEU 100 store each instruction's result in its preassigned place in the temporary buffers 116, IEU 100 can execute instructions out of program order and still avoid the problems caused by output and anti-dependencies.
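The in-order retirement rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `retire` function and the `done` list are hypothetical names, one temporary buffer slot is assumed per window instruction, and the register file write itself is not modeled.

```python
def retire(done, retired_upto):
    """Advance the retirement pointer past every consecutive completed instruction.

    done[i] is True once instruction i has written its preassigned temporary buffer.
    Instructions below retired_upto have already been moved to the register file.
    """
    while retired_upto < len(done) and done[retired_upto]:
        retired_upto += 1
    return retired_upto


# Eight window slots, one preassigned temporary buffer per instruction (assumption).
done = [False] * 8
ptr = 0

done[0] = done[1] = True           # I0 and I1 complete
ptr = retire(done, ptr)            # both results can move to the register file

done[3] = True                     # I3 completes before I2
ptr_after_i3 = retire(done, ptr)   # I3's result waits in its temporary buffer

done[2] = True                     # I2 completes
ptr_final = retire(done, ptr_after_i3)  # I2 and I3 retire in the same step
```

The pointer only moves over an unbroken run of completed instructions, which is why I3's result stays buffered until I2 is done and why more than one instruction can retire at a time.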
RRC 112 sends a bit map to an Instruction Scheduler 118 via a bus 120 indicating which instructions in window 102 are ready for issuing. Instruction decode logic (not shown) indicates to Issuer 118 the resource requirements for each instruction over a bus 123. For each resource in IEU 100 (e.g., each functional unit being an adder, multiplier, shifter, or the like), Issuer 118 scans this information and selects the first and subsequent instructions for issuing by sending issue signals over bus 121. The issue signals select a group of Register File Port MUXes (RPMs) 124 inside RRC 112 whose inputs are the addresses of each instruction's inputs.
Because the results may stay in temporary buffer 116 several cycles before going to register file 117, a mechanism is provided to get results from temporary buffer 116 before they go to register file 117, so the information can be used as operands for other instructions. This mechanism is called "result forwarding," and without it Issuer 118 would not be able to issue instructions out of order. This result forwarding is done in register file 117 and is controlled by RRC 112. The control signals necessary for performing the result forwarding will become evident to those skilled in the art, as should the random logic used for generating such control signals.
If an instruction is not dependent on any of the current instructions, result forwarding is not necessary since the instruction's inputs are already in register file 117. When Issuer 118 decides to execute that instruction, RRC 112 tells register file 117 to output its data.
RRC 112 contains three subsections: a Data Dependency Checker (DDC) 108, Tag Assign Logic (TAL) 122 and Register File Port MUXes (RPM) 124. DDC 108 determines where the input dependencies are between the current instructions. TAL 122 monitors the dependencies for Issuer 118 and controls result forwarding. RPM 124 is controlled by Issuer 118 and directs the outputs of TAL 122 to the appropriate register file address ports 119.
Instructions are passed to DDC 108 via bus 110. All source registers are compared with all previous destination registers for each instruction in window 102. Each instruction has only one destination, which may be a double register in one embodiment. An instruction can only depend on a previous instruction and may have up to three source registers. There are various register file source and destination addresses that need to be checked against each other for any dependencies. As noted above, the eight bottom instructions corresponding to the lower two buckets are checked by DDC 108. All source register addresses are compared with all previous destination register addresses for the instructions in window 102.
For example, suppose a program has the following instruction sequence:
add R0, R1, R2 (0)
add R0, R2, R3 (1)
add R4, R5, R2 (2)
add R2, R3, R4 (3)
The first two registers in each instruction 0-3 are the source registers, and the last listed register in each instruction is the destination register. For example, R0 and R1 are the source registers for instruction 0 and R2 is the destination register.
Instruction 0 adds the contents of registers 0 and 1 and stores the result in R2. For instructions 1-3 in this example, the following are the comparisons needed to evaluate all of the dependencies:
I1S1, I1S2 vs. I0D
I2S1, I2S2 vs. I1D, I0D
I3S1, I3S2 vs. I2D, I1D, I0D
The key to the above is as follows: IXS1 is the address of source (input) number 1 of instruction X; IXS2 is the address of source (input) number 2 of instruction X; and IXD is the address of the destination (output) of instruction X.
Note also that RRC 112 can ignore the fact that instruction 2 is output dependent on instruction 0, because the processor has a temporary buffer where instruction 2's result can be stored without interfering with instruction 0's result. As discussed before, instruction 2's result will not be moved from temporary buffers 116 to register file 117 until instructions 0 and 1's results are moved to register file 117.
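As a rough software illustration of the comparisons for the four-instruction example, the following Python sketch compares each source with every previous destination. The `program` list and the `input_dependencies` function are hypothetical names chosen for this illustration; the actual DDC performs these comparisons in parallel hardware.

```python
# Each instruction is (source 1, source 2, destination), taken from the example sequence.
program = [
    ("R0", "R1", "R2"),  # (0)
    ("R0", "R2", "R3"),  # (1)
    ("R4", "R5", "R2"),  # (2)
    ("R2", "R3", "R4"),  # (3)
]


def input_dependencies(instrs):
    """Compare every source of instruction X with the destination of every previous Y."""
    deps = []
    for x, (s1, s2, _) in enumerate(instrs):
        for y in range(x):
            dest = instrs[y][2]
            if s1 == dest:
                deps.append((f"I{x}S1", f"I{y}D"))
            if s2 == dest:
                deps.append((f"I{x}S2", f"I{y}D"))
    return deps


deps = input_dependencies(program)
for source, dest in deps:
    print(source, "depends on", dest)
```

Only input dependencies are reported. The output dependency between instructions 0 and 2 (both write R2) is deliberately not checked, matching the point that the temporary buffers make it harmless.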
The number of instructions that can be checked by RRC 112 is easily scalable. In order to check eight instructions at a time instead of four, the following additional comparisons would also need to be made:
I4S1, I4S2 vs. I3D, I2D, I1D, I0D
I5S1, I5S2 vs. I4D, I3D, I2D, I1D, I0D
I6S1, I6S2 vs. I5D, I4D, I3D, I2D, I1D, I0D
I7S1, I7S2 vs. I6D, I5D, I4D, I3D, I2D, I1D, I0D
There are several special cases that RRC 112 must handle in order to do the dependency check. First, there are some instructions that use the same register as an input and an output. Thus, RRC 112 must compare this source/destination register address with the destination register addresses of all previous instructions. So for instruction 7, the following comparisons would be necessary:
I7S1, I7S2, I7S/D vs. I6D, I5D, I4D, I3D, I2D, I1D, I0D
Another special case occurs when a program contains instructions that generate 64 bit outputs (called long-word operations). These instructions need two registers in which to store their results. In this embodiment, these registers must be sequential. Thus if RRC 112 is checking instruction 4's dependencies and instruction 1 is a long-word operation, then it must do the following comparisons:
I4S1, I4S2 vs. I3D, I2D, I1D, I1D+1, I0D
Sometimes, instructions do not have destination registers. Thus RRC 112 must ignore any dependencies between instructions without destination registers and any future instructions. Also, instructions may have only one valid source register, so RRC 112 must ignore any dependencies between the unused source register (usually S2) and any previous instructions.
RRC 112 is also capable of dealing with multiple register files. When using multiple register files, dependencies only occur when one instruction's source register has the same address and is in the same register file as some other instruction's destination register. RRC 112 treats the information regarding which register file a particular address is from as part of the address. For example, in an implementation using four 32 bit register files, RRC 112 would do 7 bit compares instead of 5 bit compares (5 for the address and 2 for the register file).
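A minimal Python sketch of treating the register file number as part of the address follows. The function name `full_address` is hypothetical, and placing the two register file bits above the five register bits is an assumption made for this illustration; the text only states that the compare is 7 bits wide.

```python
def full_address(reg_file, reg):
    """Pack a 2-bit register file number and a 5-bit register number into 7 bits.

    Putting the file number in the upper bits is an assumption for illustration.
    """
    if not (0 <= reg_file < 4 and 0 <= reg < 32):
        raise ValueError("expected a 2-bit file number and a 5-bit register number")
    return (reg_file << 5) | reg


# Same register number in the same file: the 7-bit addresses match.
same = full_address(1, 7) == full_address(1, 7)

# Same register number in different files: no match, hence no dependency.
different_file = full_address(0, 7) == full_address(1, 7)
```

With the file number folded into the address, a single equality compare covers both conditions (same file and same register).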
Signals indicating which instructions are long-word operations or have invalid source or destination registers are sent to RRC 112 from Instruction Decode Logic (IDL; not shown). IDL also tells RRC 112 which register file each instruction's sources and destinations will come from or go to.
A block diagram of DDC 108 is shown in FIG. 2. Source address signals arrive from IFIFO 101 for all eight instructions of window 102. Additional inputs include long-word load operation flags, register file decode signals, invalid destination register flags, destination address signals and addressing mode flags for all eight instructions.
DDC 108 comprises 28 data dependency blocks 204. Each block 204 is described in a KEY 206. Each block 204 receives 3 inputs, IXS1, IXS2 and IXS/D. IXS1 is the address of source (input) number 1 of instruction X; IXS2 is the address of source (input) number 2 of instruction X; and IXS/D is the address of the source/destination (input) of instruction X. Each block 204 also receives input IYS/D, which is the destination register address for some previous instruction Y. A top row 208, for example, receives I0S/D, which is the destination register address for instruction 0. Each block 204 outputs the data dependency results to a corresponding one of bus lines 114. For example, the address of I2S/D must be checked with operand addresses S1, S2 and S/D of instructions 7, 6, 5, 4, and 3. Each block 204 performs the three comparisons.
To illustrate these comparisons, consider a generic block 700 shown in FIG. 7, which compares instruction Y's source/destination operand with each operand of instruction X. In this example, the three following comparisons must be made:
IXS1 = IYS/D
IXS2 = IYS/D
IXS/D = IYS/D
These comparisons are represented by three comparator blocks 702, 704 and 706, respectively. One set of inputs to comparator blocks 702, 704 and 706 are the bits of the IYS/D field, which is represented by number 708. Comparator block 702 has as its second set of inputs the bits of the IXS1. Similarly, comparator block 704 has as its second set of inputs the bits of the IXS2, and comparator block 706 has as its second set of inputs the bits of the IXS/D.
In a preferred embodiment, the comparisons performed by blocks 702, 704 and 706 can be performed by random logic. An example of random logic for comparator block 706 is shown in FIG. 8. Instruction Y's source/destination bits [6:0] are shown input from the right at reference number 802 and instruction X's source/destination bits [6:0] are shown input from the top at reference number 804. The most significant bit (MSB) is bit 6 and the least significant bit (LSB) is bit 0. The corresponding bits from the two operands are fed to a set of seven exclusive NOR gates (XNORs) 806. The outputs of XNORs 806 are then ANDed by a seven input AND gate 808. If the corresponding bits are the same, the output of XNOR 806 will be logic high. When all bits are the same, all seven XNOR 806 outputs are logic high and the output of AND gate 808 is logic high. This indicates that there is a dependency between IXS/D and IYS/D.
The random logic for comparator blocks 702 and 704 will be identical to that shown in FIG. 8. The present invention contemplates many other random logic circuits for performing data dependency checking, as will become evident to those skilled in the art without departing from the spirit of this example.
As will further become evident to those skilled in the art, various implementation specific special cases can arise which require additional random logic to perform data dependency checking. An illustrative special data dependency checking case is for long word handling. As mentioned before, if a long word operation writes to register X, the first 32 bits are written to register X and the second 32 bits are written to register X+1. The data dependency checker therefore needs to check both registers when doing a comparison. In a preferred embodiment, register X is an even register, X+1 is an odd register and thus they only differ by the LSB. The easiest way to check both registers at the same time is to simply ignore the LSB. In the case of a store long (STLG) or load long (LDLG) operation, if X and Y only differ by the LSB bit [0], the logic in FIG. 8 would cause there to be no dependency, when there really is a dependency. Therefore, for a long word operation the STLG and LDLG flags must be ORed with the output of the [0] bit XNOR to assure that all dependencies are detected.
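The XNOR-and-AND comparator, together with the long-word adjustment on bit 0, can be modeled bit by bit in Python. This is a behavioral sketch only: `xnor_compare` is a hypothetical name, and a single `long_word` argument stands in for the ORed STLG and LDLG flags.

```python
def xnor_compare(x, y, long_word=False, width=7):
    """Model seven XNOR gates feeding one AND gate.

    x, y: 7-bit operand addresses (bit 6 is the MSB, bit 0 the LSB).
    long_word: stands in for the STLG/LDLG flags; it is ORed with the bit-0 XNOR
    output so that the even/odd register pair compares as equal.
    """
    bit_matches = []
    for i in range(width):
        equal = ((x >> i) & 1) == ((y >> i) & 1)  # XNOR of bit i
        if i == 0:
            equal = equal or long_word            # OR the long-word flag into bit 0
        bit_matches.append(equal)
    return all(bit_matches)                       # seven-input AND


exact = xnor_compare(0b0000100, 0b0000100)                      # same register
neighbor = xnor_compare(0b0000100, 0b0000101)                   # X vs. X+1, not long word
neighbor_long = xnor_compare(0b0000100, 0b0000101, long_word=True)
other_long = xnor_compare(0b0000100, 0b0000110, long_word=True)
```

Forcing the bit-0 result high is the software equivalent of ignoring the LSB, so an even register and its odd partner are treated as one destination while any difference in the upper six bits still prevents a match.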
A data dependency check flowchart for IXS1 and IYS/D is shown in FIG. 5. DDC 108 first checks whether IXS1 and IYS/D are in the same register file, as shown at a conditional block 502. If they are not in the same register file, there is no dependency. This is shown at a block 504. If they are in the same register file, DDC 108 then determines whether IXS1 and IYS/D are in the same register, as shown at a block 506. If they are not in the same register, flow proceeds to a conditional block 508 where DDC 108 determines whether IY is a long word operation. If IY is not a long word operation, there is no dependency and flow proceeds to block 504. If IY is a long word operation, flow then proceeds to a conditional block 510 where DDC 108 determines whether IXS1 and IYS/D+1 are the same register. If they are not, there is no dependency and flow proceeds to block 504. If IXS1 and IYS/D+1 are the same register, flow proceeds to a conditional block 512 where DDC 108 determines if IY has a valid destination. If it does not have a valid destination, there is no dependency and flow proceeds to block 504. If IY does have a valid destination, flow proceeds to a conditional block 514 where DDC 108 determines if IXS1 has a valid source register. Again, if no valid source register is detected, there is no dependency and flow proceeds to block 504. If a valid source register is detected, DDC 108 has determined that there is a dependency between IXS1 and IYS/D, as shown at a block 516. A more detailed discussion of data dependency checking is found in commonly owned, copending application Serial No. 07/860,718 (Attorney Docket No. SP041), the disclosure of which is incorporated herein by reference.
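The decision sequence of the flowchart can be written as a short Python function. This is a sketch under assumptions: the function and parameter names are hypothetical, and the case where IXS1 and IYS/D are the same register is assumed to continue to the valid-destination check, a path the text does not state explicitly.

```python
def depends(x_file, x_reg, x_src_valid, y_file, y_reg, y_long, y_dest_valid):
    """Return True if source IXS1 depends on the destination IYS/D.

    x_file, x_reg: register file and register number of IXS1
    x_src_valid:   IXS1 is a valid (used) source register
    y_file, y_reg: register file and register number of IYS/D
    y_long:        IY is a long word operation (also writes y_reg + 1)
    y_dest_valid:  IY has a valid destination register
    """
    if x_file != y_file:            # block 502: different register files
        return False
    if x_reg != y_reg:              # block 506: not the same register
        if not y_long:              # block 508: only long words touch a second register
            return False
        if x_reg != y_reg + 1:      # block 510: compare against IYS/D+1
            return False
    if not y_dest_valid:            # block 512
        return False
    return x_src_valid              # block 514, then 516 or 504


same_reg = depends(0, 4, True, 0, 4, False, True)
other_file = depends(0, 4, True, 1, 4, False, True)
long_partner = depends(0, 5, True, 0, 4, True, True)
```

In hardware these tests are evaluated together as combinational logic; the sequential form here only mirrors the order in which the flowchart presents them.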
Because it is possible that an instruction might get one of its inputs from a register that was written to by several other instructions, the present invention must choose which one is the real dependency. For example, if instructions 2 and 5 write to register 4 and instruction 7 reads register 4, then instruction 7 has two possible dependencies. In this case, it is assumed that since instruction 5 came after instruction 2 in the program, the programmer intended instruction 7 to use instruction 5's result and not instruction 2's. So, if an instruction can be dependent on several previous instructions, RRC 112 will consider it to be dependent on the highest numbered previous instruction.
Once TAL 122 has determined where the real dependencies are, it must locate the inputs for each instruction. In a preferred embodiment of the present invention, the inputs can come from the actual register file or an array of temporary buffers 116. RRC 112 assumes that if an instruction has no dependencies, its inputs are all in the register file. In this case, RRC 112 passes the IXS1, IXS2 and IXS/D addresses that came from IFIFO 101 to the register file. If an instruction has a dependency, then RRC 112 assumes that the data is in temporary buffers 116. Since RRC 112 knows which previous instruction each instruction depends on, and since each instruction always writes to the same place in temporary buffers 116, RRC 112 can determine where in temporary buffers 116 an instruction's inputs are stored. It sends these addresses to register file read ports 119 and register file 117 outputs the data from temporary buffers 116 so that the instruction can use it. The following is an example of tag assignments:
0: add r0, r1, r2
1: add r0, r2, r3
2: add r4, r5, r2
3: add r2, r3, r4
The following are the dependencies for the above operations (dependencies are represented by the symbol "#"):
I1S2 # I0S/D
I3S1 # I0S/D
I3S1 # I2S/D
I3S2 # I1S/D
First, look at I0; since it has no dependencies, its tags are equal to its original source register addresses:
I0S1 TAG = I0S1 = r0
I0S2 TAG = I0S2 = r1
I0S/D TAG = I0S/D = r2
I1 has one dependency, and its tags are as follows:
I1S1 TAG = I1S1 = r0
I1S2 TAG = I0S/D = t0 (where t0 = inst. 0's slot in temporary buffer)
I1S/D TAG = I1S/D = r3
I2 is also independent:
I2S1 TAG = I2S1 = r4
I2S2 TAG = I2S2 = r5
I2S/D TAG = I2S/D = r2
I3S1 has two possible dependencies, I0S/D and I2S/D. Because TAL 122 must pick the last one (highest numbered one), I2S/D is chosen.
I3S1 TAG = I2S/D = t2
I3S2 TAG = I1S/D = t1
I3S/D TAG = I3S/D = r4
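The tag assignments in this example can be reproduced with a small Python sketch. The names `program` and `assign_tags` are hypothetical, temporary buffer slots are written as `t0`, `t1`, and so on as in the example, and the S/D operand is assumed to act only as a destination (as it does for these add instructions), so its tag is always the register itself.

```python
# (source 1, source 2, source/destination) for instructions 0-3 of the example.
program = [
    ("r0", "r1", "r2"),
    ("r0", "r2", "r3"),
    ("r4", "r5", "r2"),
    ("r2", "r3", "r4"),
]


def assign_tags(instrs):
    """Return one (S1 tag, S2 tag, S/D tag) tuple per instruction."""
    tags = []
    for x, (s1, s2, sd) in enumerate(instrs):
        row = []
        for operand in (s1, s2):
            tag = operand                  # default: read from the register file
            for y in range(x):             # later (higher numbered) writers win
                if instrs[y][2] == operand:
                    tag = f"t{y}"          # instruction y's temporary buffer slot
            row.append(tag)
        row.append(sd)                     # assumption: S/D is destination only here
        tags.append(tuple(row))
    return tags


tags = assign_tags(program)
for i, row in enumerate(tags):
    print(f"I{i}:", row)
```

Scanning the previous instructions in ascending order and overwriting the tag on each match is a simple way to end up with the highest numbered writer, which is the choice TAL 122 makes for I3S1.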
These tags are then sent to RPM 124 via bus 126 to be selected by Issuer 118. At the same time TAL 122 is preparing the tags, it is also monitoring the outputs of DCL 130 and passing them on to Issuer 118 using bus 120. TAL 122 chooses the proper outputs of DCL 130 to pass to Issuer 118 by the same method that it chooses the tags that it sends to RPM 124.
Continuing the example, TAL 122 sends the following ready signals to Issuer 118:
I0S1 INFO = 1 (Inst 0 is independent so it can start immediately)
I0S2 INFO = 1
I0S/D INFO = 1
I1S1 INFO = 1
I1S2 INFO = DONE[0] (DONE[0] = 1 when I0 is done)
I1S/D INFO = 1
I2S1 INFO = 1
I2S2 INFO = 1
I2S/D INFO = 1
I3S1 INFO = DONE[2]
I3S2 INFO = DONE[1]
I3S/D INFO = 1
(The DONE signals come from DCL 130 via a bus 132. In connection with the present invention, the term "done" means the result of the instruction is in a temporary buffer or otherwise available at the output of a functional unit. Contrastingly, the term "terminate" means the result of the instruction is in the register file.)
Turning now to FIG. 3, a representative block diagram of TAL 122 will be discussed. TAL 122 comprises 8 tag assignment logic blocks 302. Each TAL block 302 receives the corresponding data dependency results via buses 114, as well as further signals that come from the computer's Instruction Decode and control logic (not shown). The BKT bit signal forms the least significant bit of the tag. DONE[X] flags are for instructions 0 through 6, and indicate if instruction X is done. DBLREG[X] flags indicate which, if any, of the instructions is a double (long) word. Each TAL block 302 also receives its own instruction's register addresses as inputs. The Misc. signals, DBLREG and BKT signals are all implementation dependent control signals. Each TAL block 302 outputs 3 TAGs 126 labeled IXS1, IXS2 and IXS/D, which are 6 bits. TAL 122 outputs the least significant 5 bits of each TAG signal to RPMs 124 and the most significant TAG bit to Issuer 118.
Each block 302 of FIG. 3 comprises three Priority Encoders (PE), one for S1, one for S2 and one for S/D. There is one exception, however. I0 requires no tag assignment. Its tags are the same as the original S1, S2 and S/D addresses, because I0 is always independent.
An illustrative PE is shown in FIG. 9. PE 902 has eight inputs 904 and eight outputs 906. Inputs 904 for PE 902 are outputs 114 from DDC 108 which show where dependencies exist. For example, in the case of source register 1 (S1), I7S1 tag assign PE 902's seven inputs are the seven outputs 114 of DDC 108 that indicate whether I7S1 is dependent on I6D, whether I7S1 is dependent on I5D, and so on down to whether I7S1 is dependent on I0D. An eighth input, shown at reference number 908, is always tied high because there should always be an output from PE 902.
As stated before, if an instruction depends on several previous instructions, PE 902 will select and output only the most previous instruction (in program order) on which there is a dependency. This is accomplished by connecting the signal showing if there is a dependency on the most previous instruction to the highest priority input of the PE 902 and the signal showing if there is a dependency on the second most previous instruction to the input of PE 902 with the second highest priority and so on for all previous instructions. The input of the PE 902 with the lowest priority is always tied high so that at least one of PE 902's outputs will be asserted.
Outputs 906 are used as select lines for a MUX 910. MUX 910 has eight inputs 912 to which the tags for each instruction are applied.
To illustrate this, assume that I7 depends on I6 and I5. Since I6 has a higher priority than I5, the bit corresponding to I6 at outputs 906 of PE 902 will be high. At the corresponding input 912 of MUX 910 will be I6's tag for S1 (recall PE 902 is for I7S1). Because I7 is dependent on I6, the location of I6's result must be output from MUX 910 so that it can be used by I7. I6's tag will therefore be selected and output on an output line 914. I6's done flag, DONE[6], must also be output from MUX 910 so that Issuer 118 will know when I7's input is ready. This data is passed to Issuer 118 via bus 120. Since an instruction can have up to three sources, TAL 122 monitors up to three dependencies for each instruction and sends three vectors for each instruction (totalling 24 vectors) to Issuer 118. If an instruction is independent, TAL 122 signals to Issuer 118 that the instruction can begin immediately.
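The priority encoder and MUX pair can be modeled behaviorally in Python as follows. The function names, the use of list index 7 for the tied-high fallback input, and the string tags are all assumptions made for this sketch; the circuit of FIG. 9 is not reproduced here.

```python
def priority_encode(dep_bits):
    """One-hot select for an operand of I7.

    dep_bits[y] is True if the operand depends on instruction y (y = 0..6).
    The highest numbered dependency wins. Index 7 models the input that is
    always tied high, so exactly one select line is asserted.
    """
    select = [False] * 8
    for y in range(6, -1, -1):
        if dep_bits[y]:
            select[y] = True
            return select
    select[7] = True
    return select


def select_tag(dep_bits, temp_tags, done, reg_address):
    """MUX: return (tag, ready flag) for the selected input."""
    select = priority_encode(dep_bits)
    mux_inputs = [(temp_tags[y], done[y]) for y in range(7)] + [(reg_address, True)]
    return mux_inputs[select.index(True)]


temp_tags = [f"t{y}" for y in range(7)]
done = [True, True, True, True, True, True, False]

# I7S1 depends on both I5 and I6: I6 has priority, and I6 is not done yet.
dep_on_5_and_6 = [False, False, False, False, False, True, True]
tag, ready = select_tag(dep_on_5_and_6, temp_tags, done, "r9")

# No dependencies: the register file address is passed through and is ready.
free_tag, free_ready = select_tag([False] * 7, temp_tags, done, "r9")
```

The tied-high fallback is what guarantees that an independent operand still produces a tag (its register file address) and a ready signal of 1.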
The MSB of the tag outputs which are sent to RPMs 124 is used to indicate if the address is a register file address or a temporary buffer address. If an instruction is independent, then the five LSB outputs indicate the source register address. For instructions that have dependencies: the second MSB indicates that the address is for a 64 bit value; the third through fifth MSB outputs specify the temporary buffer address; and the LSB output indicates which bucket is the current bucket, which is equal to the BKT signal in TAL 122.
Like DDC 108, TAL 122 has numerous implementation dependent special cases that it handles. First, in an embodiment of the present invention, register number 0 of the register file is always equal to 0. Therefore, even if one instruction writes to register 0 and another reads from register 0, there will be no dependency between them. TAL 122 receives three signals from Instruction Decode Logic (IDL; not shown) for each instruction to indicate if one of that instruction's sources is register 0. If any of those is asserted, TAL 122 will ignore any dependencies for that particular input of that instruction.
Another special case occurs because under some circumstances, an instruction in bucket 0 will be guaranteed to not have any of the instructions in bucket 1 dependent on it. A four bit signal called BKT1_NODEP is sent to RRC 112 from the IEU control logic (not shown), and if BKT1_NODEP[X] = 1 then RRC 112 knows to ignore any dependencies between instructions 4, 5, 6 or 7 and instruction X.
An example for TAG assignment of instruction 7's source 1 (I7S1) is shown in a flowchart in FIGs. 6A-6B. TAL 122 first determines whether I7S1 is register 0, as shown at a conditional block 602. If the first source operand for I7 is register 0, the TAG is set equal to zero, and I7S1's INFO flag is set equal to one, as shown in a block 604. If the first source operand (S1) for I7 is not register 0, TAL 122 then determines if I7S1 is dependent on I6S/D, as shown at a conditional block 606. If I7S1 is dependent on I6S/D, flow then proceeds to a block 610 where I7S1's TAG is set equal to {1,DBLREG[6],0,1,0,BKT} and I7S1's INFO flag is set equal to DONE[6]. If the condition tested at conditional block 606 is not met, flow proceeds to a conditional block 612 where TAL 122 determines if I7S1 is dependent on I5S/D. If there is a dependency, flow then proceeds to a block 616 where TAL 122 sets I7S1's TAG equal to {1,DBLREG[5],0,0,1,BKT} and I7S1's INFO flag is set equal to DONE[5]. If the condition tested at block 612 is not met, flow proceeds to a block 618 where TAL 122 determines if I7S1 is dependent on I4S/D.
As evident by inspection of the remaining sections of FIG's. 6A and 6B, simila TAG determinations are made depending on whether I7S1 is dependent on I4S/D I3S/D, I2S D, I1S/D and I0S/D, as shown at sections 620, 622, 624, 626 and 628 respectively. Finally, if instruction 7 is independent of instruction 0 or if al instructions in bucket 1 are independent of instruction 0 (i.e., if BKT1_NODEP[0] 1), as tested at a conditional block 630, the flow proceeds to block 632 where TA 122 sets I7Sl's TAG equal to {0,I7S1} and I7Sl's INFO flag equal to 1. It should b noted for the above example that I7S1 TAG signals are forwarded directly th register file port MUXes of register file 117. The I7S1 INFO signals are sent Issuer 118 to tell it when I7's SI input is ready.
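The priority chain of FIGs. 6A-6B can be approximated by the following minimal sketch, offered only as an illustration. The function `assign_tag` and its tuple encoding of a TAG are hypothetical: the leading element mirrors the first bit of the TAGs described above (1 for a result produced by an earlier instruction, 0 for a register file address), but the remaining bit fields, including BKT, are not modeled. Applying BKT1_NODEP to every bucket-0 producer, and not only to instruction 0, is an assumption based on the description of that signal.

```python
def assign_tag(i, src_reg, depends_on, done, dblreg, bkt1_nodep):
    """Return (tag, info) for one source operand of instruction i.

    depends_on[j] -- True if the operand matches the S/D field of earlier
                     instruction j.
    done[j]       -- 1 once instruction j has produced its result.
    dblreg[j]     -- double-register flag of instruction j.
    """
    if src_reg == 0:
        # Register 0 always reads as 0: tag of zero, operand always ready.
        return (0, 0), 1
    # Check the nearest earlier instruction first (I6, then I5, ... I0 for I7).
    for j in range(i - 1, -1, -1):
        ignored = i >= 4 and j < 4 and bkt1_nodep[j] == 1
        if depends_on[j] and not ignored:
            # Operand comes from instruction j; ready only when j is done.
            return (1, dblreg[j], j), done[j]
    # No dependency: read the register file directly, always ready.
    return (0, src_reg), 1
```

For instance, with dependencies on both instruction 6 and instruction 2, the sketch selects instruction 6 as the producer, which corresponds to testing I6S/D before any earlier instruction.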
A representative block diagram of Issuer 118 is shown in FIG. 10. In a preferred embodiment, Issuer 118 has one scanner block 1002 for each resource (functional unit) that has to be allocated. In this example, Issuer 118 has scanner blocks FU1, FU2, FU3, FU4 through FUn. Requests for functional units are generated from instruction information by decoding logic (not shown) in a known manner and are sent to scanners 1002 via bus 123. Each scanner block 1002 scans from instruction I0 to I7 and selects the first request for the corresponding functional unit to be serviced during that cycle. In the case of multiple register files (integer, floating and/or boolean), Issuer 118 is capable of issuing instructions having operands stored in different register files. For example, an ADD instruction may have a first operand from the floating point register file and a second operand from the integer register file. Instructions with operands from different register files are typically given higher issue priority (i.e., they are issued first). This issuing technique conserves processor execution time and functional unit resources.
In a further embodiment in which IEU 100 may include two ALUs, ALU scanning becomes a bit more complicated. For speed reasons, one ALU scanner block scans from I0 to I7, while the other scanner block scans from I7 to I0. This is how two ALU requests are selected. With this scheme it is possible that an ALU instruction in bucket 1 will get issued before an ALU instruction in bucket 0, while increasing scanning efficiency.
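The priority rule described above, in which instructions drawing operands from more register files are issued first, can be illustrated with a short sketch. The function `issue_order`, the representation of a request as a pair, and the use of program order to break ties are assumptions made for the example; the text does not specify a tie-breaking rule.

```python
def issue_order(requests):
    """Order instructions for issue.

    requests -- list of (instruction_index, register_files) pairs, where
                register_files is the set of register files that supply
                the instruction's operands (hypothetical representation).
    Instructions using more register files come first; ties fall back to
    program order, which is an assumption of this sketch.
    """
    ranked = sorted(requests, key=lambda r: (-len(r[1]), r[0]))
    return [index for index, _files in ranked]
```

Given four requests where instructions 1 and 3 each read both the integer and floating point register files, the sketch returns those two ahead of instructions 0 and 2.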
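The scanning behavior can be sketched as follows, purely as an illustration. The names `scan_forward`, `scan_backward` and `scan_two_alus` are hypothetical, and the handling of the case in which both scanners land on the same request (the second ALU is left idle) is an assumption, since the text does not say how that case is resolved.

```python
def scan_forward(requests):
    """Return the index of the first asserted request scanning I0 to I7."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None


def scan_backward(requests):
    """Return the index of the first asserted request scanning I7 to I0."""
    for i in range(len(requests) - 1, -1, -1):
        if requests[i]:
            return i
    return None


def scan_two_alus(requests):
    """Select up to two ALU requests using opposite scan directions."""
    first = scan_forward(requests)
    second = scan_backward(requests)
    if first == second:
        # Zero or one request pending: assume the second ALU gets nothing.
        return first, None
    return first, second
```

With requests pending for instructions 1, 5 and 6, the sketch selects instructions 1 and 6, so the bucket-1 instruction 6 is issued ahead of instruction 5.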
Scanner outputs 1003 are selected by MUXing logic 1004. A set of SELect inputs 1006 for MUX 1004 receive three 8-bit vectors (one for each operand) from TAL 122 via bus 120. The vectors indicate which of the eight instructions have no dependencies and are ready to be issued. Issuer 118 must wait for this information before it can start to issue any instructions. Issuer 118 monitors these vectors and when all three go high for a particular instruction, Issuer 118 knows that the inputs for that instruction are ready. Once the necessary functional unit is ready, the issuer can issue that instruction and send select signals to the register file port MUXes to pass the corresponding instruction's outputs to register file 117.
In a preferred embodiment of the present invention, after Issuer 118 is done, it provides two 8-bit vectors per register file back to RRC 112 via MUXOUTputs 1008 to bus 121. These vectors, which indicate which instructions are issued this cycle, are used as select lines for RPMs 124.
The maximum number of instructions that can be issued simultaneously for each register file is restricted by the number of register file read ports available. A data dependency with a previous uncompleted instruction may prevent an instruction from being issued. In addition, an instruction may be prevented from being issued if the necessary functional unit is allocated to another instruction.
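The readiness test on the three 8-bit vectors amounts to a bitwise AND, as the following sketch illustrates. The bit ordering (bit i corresponds to instruction i) and the function names are assumptions of the example; functional unit availability is not modeled.

```python
def ready_mask(s1_ready, s2_ready, sd_ready):
    """Combine the three per-operand vectors into one 8-bit ready vector.

    Bit i is 1 only when all three operands of instruction i are ready.
    Mapping bit i to instruction i is an assumption of this sketch.
    """
    return s1_ready & s2_ready & sd_ready & 0xFF


def ready_instructions(mask):
    """List the instruction indices whose bit is set in mask."""
    return [i for i in range(8) if (mask >> i) & 1]
```

An instruction appears in the result only when its bit is high in all three vectors; whether it is actually issued then depends on the required functional unit being free.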
Several instructions, such as load immediate instructions, Boolean operations and relative conditional branches, may be issued independently, because they may not require resources other than register file read ports or they may potentially have no dependencies.
The last section of RRC 112 is the register file port MUX (RPM) section 124. The function of RPMs 124 is to provide a way for Issuer 118 to get data out of register files 117 for each instruction to use. RPMs 124 receive tag information via bus 126, and the select lines for RPMs 124 come from Issuer 118 via a bus 121 and also from the computer's IEU control logic. The selected TAGs comprise read addresses that are sent to a predetermined set of ports 119 of register file 117 using bus 128.
The number and design of RPMs 124 depend on the number of register files and the number of ports on each register file. One embodiment of RPMs 124 is shown in FIG. 4. In this embodiment, RPMs 124 comprises three register file port MUXes 402, 404 and 406. MUX 402 receives as inputs the TAGs of instructions 0-7 corresponding to the source register field S1 that are generated by TAL 122. MUX 404 receives as inputs the TAGs of instructions 0-7 corresponding to the source register field S2 that are generated by TAL 122. MUX 406 receives as inputs the TAGs of instructions 0-7 corresponding to the source/destination register field S/D that are generated by TAL 122. The outputs of MUXes 402, 404 and 406 are connected to the read address ports of register file 117 via bus 128.
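As a rough illustration of the three-MUX arrangement, each MUX can be modeled as an 8-to-1 selection driven by a select vector. Treating the select vector as one-hot, with bit i selecting instruction i, is an assumption made for this sketch, and the names `mux8` and `rpm_read_addresses` are hypothetical.

```python
def mux8(tags, select_vector):
    """Return the tag chosen by a one-hot 8-bit select vector.

    tags          -- the eight TAGs for one register field (S1, S2 or S/D).
    select_vector -- assumed one-hot; bit i selects instruction i.
    """
    chosen = [tag for i, tag in enumerate(tags) if (select_vector >> i) & 1]
    if len(chosen) != 1:
        raise ValueError("select vector must have exactly one bit set")
    return chosen[0]


def rpm_read_addresses(s1_tags, s2_tags, sd_tags, select_vector):
    """Model the three MUXes: one read address per register field."""
    return (
        mux8(s1_tags, select_vector),
        mux8(s2_tags, select_vector),
        mux8(sd_tags, select_vector),
    )
```

Selecting instruction 3, for example, passes that instruction's S1, S2 and S/D TAGs to the three read address outputs.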
RRC 112 and Issuer 118 allow the processor to execute instructions simultaneously and out of program order. An IEU for use with the present invention is disclosed in commonly owned, co-pending application Serial No. 07/817,810 (Attorney Docket No. SP015/1397.0280001), the disclosure of which is incorporated herein by reference.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. Thus the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

What is claimed is: 1. A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands, the system comprising: (a) data dependance check means for determining data dependencies between the instructions; (b) tag assignment means for generating one or more tags to specify the location of operands, based on said data dependencies determined by said data dependance check means; and (c) register file port means for selecting said tags generated by said tag assignment means and passing said tags onto the read address ports of the register file for storing execution results.
2. The system of claim 1, wherein said data dependance check means determines said data dependencies by comparing the addresses of the source register field of each instruction to the addresses of the destination register fields.
3. The system of claim 1, further comprising temporary storage means for temporarily storing out-of-order execution results, wherein said out-of-order execution results are passed to the register files in order after execution of the set of instructions is completed.
4. A register renaming method for performing out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands, the method comprising the steps of: (1) determining data dependencies between the instructions; (2) performing at least one of in-order and out-of-order issuing of two or more of the instructions in the instruction execution unit; (3) storing, temporarily, any out-of-order results in temporary storage means; (4) generating one or more tags to specify the location of said out-of-order results based on said data dependencies; (5) selecting appropriate ones of said tags corresponding to the issued instruction; and (6) passing said selected tags onto the read address ports of the register file for one of: (i) accessing said out-of-order results; and (ii) storing execution results.
5. The method of claim 4, wherein said determining step further comprises the step of comparing the addresses of the source register field of each instruction to the addresses of the destination register fields.
6. The method of claim 4, further comprising the step of storing in-order results in the register file and the temporary storage means.
7. The method of claim 4, further comprising the step of passing said out-of-order results to the register files in-order after execution of the set of instructions is completed.
8. A method for issuing instructions in a superscalar reduced instruction set computer having an instruction issuer system capable of issuing a plurality of instructions in a single cycle, the system having more than one register file sets, the method comprising the steps of: prioritizing instructions to be issued according to the number of different register file sets furnishing operands to the instruction; and issuing instructions according to said prioritizing step.
9. The method according to claim 8, wherein said prioritizing step assigns a higher priority to those instructions having operands furnished from the greater number of register file sets.
10. An instruction issuer system in a superscalar reduced instruction set computer, the system capable of issuing a plurality of instructions in a single cycle, the system having more than one set of register files, the system comprising: first means for prioritizing instructions to be issued according to the number of different register files furnishing operands to the instruction; and second means, responsive to said first means, for issuing instructions according to said priority.
11. The system according to claim 10, wherein said first means further assigns a higher priority to those instructions having operands furnished from the greater number of register files.
PCT/JP1993/000375 1992-03-31 1993-03-26 Superscalar risc instruction scheduling WO1993020505A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
DE69311330T DE69311330T2 (en) 1992-03-31 1993-03-26 COMMAND SEQUENCE PLANNING FROM A RISC SUPER SCALAR PROCESSOR
JP51729393A JP3730252B2 (en) 1992-03-31 1993-03-26 Register name changing method and name changing system
EP93906834A EP0636256B1 (en) 1992-03-31 1993-03-26 Superscalar risc processor instruction scheduling
KR1019940703382A KR950701101A (en) 1992-03-31 1994-09-28 Superscalar RCS command scheduling

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US86071992A 1992-03-31 1992-03-31
US07/860,719 1992-03-31

Publications (2)

Publication Number Publication Date
WO1993020505A2 true WO1993020505A2 (en) 1993-10-14
WO1993020505A3 WO1993020505A3 (en) 1993-11-25

Family

ID=25333867

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1993/000375 WO1993020505A2 (en) 1992-03-31 1993-03-26 Superscalar risc instruction scheduling

Country Status (6)

Country Link
US (7) US5497499A (en)
EP (1) EP0636256B1 (en)
JP (7) JP3730252B2 (en)
KR (3) KR950701101A (en)
DE (1) DE69311330T2 (en)
WO (1) WO1993020505A2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1991020031A1 (en) * 1990-06-11 1991-12-26 Supercomputer Systems Limited Partnership Method for optimizing instruction scheduling
EP0533337A1 (en) * 1991-09-20 1993-03-24 Advanced Micro Devices, Inc. Apparatus and method for resolving dependencies among a plurality of instructions within a storage device

Family Cites Families (226)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3346851A (en) 1964-07-08 1967-10-10 Control Data Corp Simultaneous multiprocessing computer system
US3718912A (en) * 1970-12-22 1973-02-27 Ibm Instruction execution unit
US3789365A (en) * 1971-06-03 1974-01-29 Bunker Ramo Processor interrupt system
US3718851A (en) * 1971-08-16 1973-02-27 Gen Electric Means responsive to an overvoltage condition for generating a frequency increasing control signal
US3771138A (en) 1971-08-31 1973-11-06 Ibm Apparatus and method for serializing instructions from two independent instruction streams
US3913074A (en) 1973-12-18 1975-10-14 Honeywell Inf Systems Search processing apparatus
US4034349A (en) 1976-01-29 1977-07-05 Sperry Rand Corporation Apparatus for processing interrupts in microprocessing systems
US4128880A (en) 1976-06-30 1978-12-05 Cray Research, Inc. Computer vector register processing
US4212076A (en) 1976-09-24 1980-07-08 Giddings & Lewis, Inc. Digital computer structure providing arithmetic and boolean logic operations, the latter controlling the former
US4315314A (en) * 1977-12-30 1982-02-09 Rca Corporation Priority vectored interrupt having means to supply branch address directly
US4200927A (en) * 1978-01-03 1980-04-29 International Business Machines Corporation Multi-instruction stream branch processing mechanism
US4228495A (en) 1978-12-19 1980-10-14 Allen-Bradley Company Multiprocessor numerical control system
US4315308A (en) * 1978-12-21 1982-02-09 Intel Corporation Interface between a microprocessor chip and peripheral subsystems
US4296470A (en) 1979-06-21 1981-10-20 International Business Machines Corp. Link register storage and restore system for use in an instruction pre-fetch micro-processor interrupt system
JPS5616248A (en) 1979-07-17 1981-02-17 Matsushita Electric Ind Co Ltd Processing system for interruption
JPS6028015B2 (en) 1980-08-28 1985-07-02 日本電気株式会社 information processing equipment
US4434461A (en) * 1980-09-15 1984-02-28 Motorola, Inc. Microprocessor with duplicate registers for processing interrupts
JPS5757345A (en) 1980-09-24 1982-04-06 Toshiba Corp Data controller
US4574349A (en) * 1981-03-30 1986-03-04 International Business Machines Corp. Apparatus for addressing a larger number of instruction addressable central processor registers than can be identified by a program instruction
US4814979A (en) * 1981-04-01 1989-03-21 Teradata Corporation Network to transmit prioritized subtask pockets to dedicated processors
JPS57204125A (en) 1981-06-10 1982-12-14 Hitachi Ltd Electron-ray drawing device
US4482950A (en) 1981-09-24 1984-11-13 Dshkhunian Valery Single-chip microcomputer
US4498134A (en) * 1982-01-26 1985-02-05 Hughes Aircraft Company Segregator functional plane for use in a modular array processor
JPS58151655A (en) 1982-03-03 1983-09-08 Fujitsu Ltd Information processing device
JPS5932045A (en) 1982-08-16 1984-02-21 Hitachi Ltd Information processor
US4500963A (en) * 1982-11-29 1985-02-19 The United States Of America As Represented By The Secretary Of The Army Automatic layout program for hybrid microcircuits (HYPAR)
US4597054A (en) 1982-12-02 1986-06-24 Ncr Corporation Arbiter circuit and method
US4594655A (en) 1983-03-14 1986-06-10 International Business Machines Corporation (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions
US4807115A (en) 1983-10-07 1989-02-21 Cornell Research Foundation, Inc. Instruction issuing mechanism for processors with multiple functional units
GB8329509D0 (en) 1983-11-04 1983-12-07 Inmos Ltd Computer
JPS60120439A (en) * 1983-12-05 1985-06-27 Nec Corp Operation processor
US4561051A (en) 1984-02-10 1985-12-24 Prime Computer, Inc. Memory access method and apparatus in multiple processor systems
JPS60225943A (en) 1984-04-25 1985-11-11 Hitachi Ltd Exceptional interruption processing system
US4648045A (en) * 1984-05-23 1987-03-03 The Board Of Trustees Of The Leland Stanford Jr. University High speed memory and processor system for raster display
JPS6140650A (en) 1984-08-02 1986-02-26 Nec Corp Microcomputer
US4775927A (en) 1984-10-31 1988-10-04 International Business Machines Corporation Processor including fetch operation for branch instruction with control tag
US4991081A (en) * 1984-10-31 1991-02-05 Texas Instruments Incorporated Cache memory addressable by both physical and virtual addresses
JPH0652784B2 (en) 1984-12-07 1994-07-06 富士通株式会社 Gate array integrated circuit device and manufacturing method thereof
US4829467A (en) 1984-12-21 1989-05-09 Canon Kabushiki Kaisha Memory controller including a priority order determination circuit
US5255384A (en) 1985-02-22 1993-10-19 Intergraph Corporation Memory address translation system having modifiable and non-modifiable translation mechanisms
US4714994A (en) 1985-04-30 1987-12-22 International Business Machines Corp. Instruction prefetch buffer control
JPH0762823B2 (en) 1985-05-22 1995-07-05 株式会社日立製作所 Data processing device
US4613941A (en) 1985-07-02 1986-09-23 The United States Of America As Represented By The Secretary Of The Army Routing method in computer aided customization of a two level automated universal array
US4945479A (en) 1985-07-31 1990-07-31 Unisys Corporation Tightly coupled scientific processing system
US4777588A (en) 1985-08-30 1988-10-11 Advanced Micro Devices, Inc. General-purpose register file optimized for intraprocedural register allocation, procedure calls, and multitasking performance
US4719569A (en) * 1985-10-11 1988-01-12 Sun Microsystems, Inc. Arbitrator for allocating access to data processing resources
US4722049A (en) 1985-10-11 1988-01-26 Unisys Corporation Apparatus for out-of-order program execution
JPS62152043A (en) 1985-12-26 1987-07-07 Nec Corp Control system for instruction code access
EP0239081B1 (en) 1986-03-26 1995-09-06 Hitachi, Ltd. Pipelined data processor capable of decoding and executing plural instructions in parallel
JP2545789B2 (en) 1986-04-14 1996-10-23 株式会社日立製作所 Information processing device
US4903196A (en) * 1986-05-02 1990-02-20 International Business Machines Corporation Method and apparatus for guaranteeing the logical integrity of data in the general purpose registers of a complex multi-execution unit uniprocessor
US4811208A (en) * 1986-05-16 1989-03-07 Intel Corporation Stack frame cache on a microprocessor chip
JP2684362B2 (en) 1986-06-18 1997-12-03 株式会社日立製作所 Variable length data storage method
US4814978A (en) * 1986-07-15 1989-03-21 Dataflow Computer Corporation Dataflow processing element, multiprocessor, and processes
JPS6324428A (en) * 1986-07-17 1988-02-01 Mitsubishi Electric Corp Cache memory
US4766566A (en) 1986-08-18 1988-08-23 International Business Machines Corp. Performance enhancement scheme for a RISC type VLSI processor using dual execution units for parallel instruction processing
JPS6393041A (en) 1986-10-07 1988-04-23 Mitsubishi Electric Corp Computer
US4841453A (en) 1986-11-10 1989-06-20 Ibm Corporation Multidirectional scan and print capability
JPH0793358B2 (en) 1986-11-10 1995-10-09 日本電気株式会社 Block placement processing method
JPS63131230A (en) * 1986-11-21 1988-06-03 Hitachi Ltd Information processor
JPH0810430B2 (en) 1986-11-28 1996-01-31 株式会社日立製作所 Information processing device
US5283903A (en) * 1986-12-25 1994-02-01 Nec Corporation Priority selector
US5226170A (en) 1987-02-24 1993-07-06 Digital Equipment Corporation Interface between processor and special instruction processor in digital data processing system
US5179689A (en) * 1987-03-13 1993-01-12 Texas Instruments Incorporated Dataprocessing device with instruction cache
US4833599A (en) 1987-04-20 1989-05-23 Multiflow Computer, Inc. Hierarchical priority branch handling for parallel execution in a parallel processor
US4858116A (en) 1987-05-01 1989-08-15 Digital Equipment Corporation Method and apparatus for managing multiple lock indicators in a multiprocessor computer system
US4811296A (en) 1987-05-15 1989-03-07 Analog Devices, Inc. Multi-port register file with flow-through of data
JPH07113903B2 (en) * 1987-06-26 1995-12-06 株式会社日立製作所 Cache storage control method
US4992938A (en) * 1987-07-01 1991-02-12 International Business Machines Corporation Instruction control mechanism for a computing system with register renaming, map table and queues indicating available registers
US4901233A (en) * 1987-07-20 1990-02-13 International Business Machines Corporation Computer system with logic for writing instruction identifying data into array control lists for precise post-branch recoveries
US5134561A (en) 1987-07-20 1992-07-28 International Business Machines Corporation Computer system with logic for writing instruction identifying data into array control lists for precise post-branch recoveries
US5150309A (en) 1987-08-04 1992-09-22 Texas Instruments Incorporated Comprehensive logic circuit layout system
US4980817A (en) 1987-08-31 1990-12-25 Digital Equipment Vector register system for executing plural read/write commands concurrently and independently routing data to plural read/write ports
US4991078A (en) * 1987-09-29 1991-02-05 Digital Equipment Corporation Apparatus and method for a pipelined central processing unit in a data processing system
EP0312764A3 (en) 1987-10-19 1991-04-10 International Business Machines Corporation A data processor having multiple execution units for processing plural classes of instructions in parallel
US5089951A (en) * 1987-11-05 1992-02-18 Kabushiki Kaisha Toshiba Microcomputer incorporating memory
US5197136A (en) * 1987-11-12 1993-03-23 Matsushita Electric Industrial Co., Ltd. Processing system for branch instruction
US4823201A (en) * 1987-11-16 1989-04-18 Technology, Inc. 64 Processor for expanding a compressed video signal
US5185878A (en) * 1988-01-20 1993-02-09 Advanced Micro Devices, Inc. Programmable cache memory as well as system incorporating same and method of operating programmable cache memory
US4926323A (en) 1988-03-03 1990-05-15 Advanced Micro Devices, Inc. Streamlined instruction processor
JPH01228865A (en) 1988-03-09 1989-09-12 Minolta Camera Co Ltd Printer controller
US5187796A (en) * 1988-03-29 1993-02-16 Computer Motion, Inc. Three-dimensional vector co-processor having I, J, and K register files and I, J, and K execution units
US5301278A (en) * 1988-04-29 1994-04-05 International Business Machines Corporation Flexible dynamic memory controller
US5003462A (en) * 1988-05-31 1991-03-26 International Business Machines Corporation Apparatus and method for implementing precise interrupts on a pipelined processor with multiple functional units with separate address translation interrupt means
US4897810A (en) * 1988-06-13 1990-01-30 Advanced Micro Devices, Inc. Asynchronous interrupt status bit circuit
US5261057A (en) 1988-06-30 1993-11-09 Wang Laboratories, Inc. I/O bus to system interface
US5097409A (en) * 1988-06-30 1992-03-17 Wang Laboratories, Inc. Multi-processor system with cache memories
JP2761506B2 (en) 1988-07-08 1998-06-04 株式会社日立製作所 Main memory controller
US5032985A (en) 1988-07-21 1991-07-16 International Business Machines Corporation Multiprocessor system with memory fetch buffer invoked during cross-interrogation
US5148536A (en) 1988-07-25 1992-09-15 Digital Equipment Corporation Pipeline having an integral cache which processes cache misses and loads data in parallel
JPH0673105B2 (en) 1988-08-11 1994-09-14 株式会社東芝 Instruction pipeline type microprocessor
US5291615A (en) * 1988-08-11 1994-03-01 Kabushiki Kaisha Toshiba Instruction pipeline microprocessor
US4974155A (en) 1988-08-15 1990-11-27 Evans & Sutherland Computer Corp. Variable delay branch system
US5101341A (en) * 1988-08-25 1992-03-31 Edgcore Technology, Inc. Pipelined system for reducing instruction access time by accumulating predecoded instruction bits a FIFO
US5167035A (en) 1988-09-08 1992-11-24 Digital Equipment Corporation Transferring messages between nodes in a network
EP0365188B1 (en) * 1988-10-18 1996-09-18 Hewlett-Packard Company Central processor condition code method and apparatus
JP2810068B2 (en) 1988-11-11 1998-10-15 株式会社日立製作所 Processor system, computer system, and instruction processing method
JPH0769811B2 (en) 1988-12-21 1995-07-31 松下電器産業株式会社 Data processing device
US5148533A (en) 1989-01-05 1992-09-15 Bull Hn Information Systems Inc. Apparatus and method for data group coherency in a tightly coupled data processing system with plural execution and data cache units
US5125092A (en) 1989-01-09 1992-06-23 International Business Machines Corporation Method and apparatus for providing multiple condition code fields to allow pipelined instructions contention free access to separate condition codes
JP2736092B2 (en) * 1989-01-10 1998-04-02 株式会社東芝 Buffer device
US5127091A (en) 1989-01-13 1992-06-30 International Business Machines Corporation System for reducing delay in instruction execution by executing branch instructions in separate processor while dispatching subsequent instructions to primary processor
US5142634A (en) 1989-02-03 1992-08-25 Digital Equipment Corporation Branch prediction
US5125083A (en) 1989-02-03 1992-06-23 Digital Equipment Corporation Method and apparatus for resolving a variable number of potential memory access conflicts in a pipelined computer system
US4985825A (en) * 1989-02-03 1991-01-15 Digital Equipment Corporation System for delaying processing of memory access exceptions until the execution stage of an instruction pipeline of a virtual memory system based digital computer
US5222223A (en) 1989-02-03 1993-06-22 Digital Equipment Corporation Method and apparatus for ordering and queueing multiple memory requests
US5109495A (en) * 1989-02-03 1992-04-28 Digital Equipment Corp. Method and apparatus using a source operand list and a source operand pointer queue between the execution unit and the instruction decoding and operand processing units of a pipelined data processor
US5167026A (en) 1989-02-03 1992-11-24 Digital Equipment Corporation Simultaneously or sequentially decoding multiple specifiers of a variable length pipeline instruction based on detection of modified value of specifier registers
US5067069A (en) * 1989-02-03 1991-11-19 Digital Equipment Corporation Control of multiple functional units with parallel operation in a microcoded execution unit
US5142633A (en) * 1989-02-03 1992-08-25 Digital Equipment Corporation Preprocessing implied specifiers in a pipelined processor
US5133074A (en) 1989-02-08 1992-07-21 Acer Incorporated Deadlock resolution with cache snooping
US5293500A (en) * 1989-02-10 1994-03-08 Mitsubishi Denki K.K. Parallel processing method and apparatus
US5226166A (en) 1989-02-10 1993-07-06 Mitsubishi Denki K.K. Parallel operation processor with second command unit
US5226126A (en) * 1989-02-24 1993-07-06 Nexgen Microsystems Processor having plurality of functional units for orderly retiring outstanding operations based upon its associated tags
US5768575A (en) 1989-02-24 1998-06-16 Advanced Micro Devices, Inc. Semi-Autonomous RISC pipelines for overlapped execution of RISC-like instructions within the multiple superscalar execution units of a processor having distributed pipeline control for speculative and out-of-order execution of complex instructions
US5119485A (en) 1989-05-15 1992-06-02 Motorola, Inc. Method for data bus snooping in a data processing system by selective concurrent read and invalidate cache operation
US5155809A (en) * 1989-05-17 1992-10-13 International Business Machines Corp. Uncoupling a central processing unit from its associated hardware for interaction with data handling apparatus alien to the operating system controlling said unit and hardware
US5072364A (en) 1989-05-24 1991-12-10 Tandem Computers Incorporated Method and apparatus for recovering from an incorrect branch prediction in a processor that executes a family of instructions in parallel
JPH02308330A (en) * 1989-05-23 1990-12-21 Nec Corp Knowledge information processing device
CA2016068C (en) 1989-05-24 2000-04-04 Robert W. Horst Multiple instruction issue computer architecture
US5129067A (en) 1989-06-06 1992-07-07 Advanced Micro Devices, Inc. Multiple instruction decoder for minimizing register port requirements
US5136697A (en) 1989-06-06 1992-08-04 Advanced Micro Devices, Inc. System for reducing delay for execution subsequent to correctly predicted branch instruction using fetch information stored with each block of instructions in cache
JP2550213B2 (en) * 1989-07-07 1996-11-06 株式会社日立製作所 Parallel processing device and parallel processing method
JPH07120284B2 (en) 1989-09-04 1995-12-20 三菱電機株式会社 Data processing device
US5303382A (en) * 1989-09-21 1994-04-12 Digital Equipment Corporation Arbiter with programmable dynamic request prioritization
US5179530A (en) * 1989-11-03 1993-01-12 Zoran Corporation Architecture for integrated concurrent vector signal processor
US5226125A (en) 1989-11-17 1993-07-06 Keith Balmer Switch matrix having integrated crosspoint logic and method of operation
DE68928980T2 (en) 1989-11-17 1999-08-19 Texas Instruments Inc Multiprocessor with coordinate switch between processors and memories
US5487156A (en) * 1989-12-15 1996-01-23 Popescu; Valeri Processor architecture having independently fetching issuing and updating operations of instructions which are sequentially assigned and stored in order fetched
JPH03186928A (en) * 1989-12-16 1991-08-14 Mitsubishi Electric Corp Data processor
US5179673A (en) * 1989-12-18 1993-01-12 Digital Equipment Corporation Subroutine return prediction mechanism using ring buffer and comparing predicated address with actual address to validate or flush the pipeline
US5197130A (en) 1989-12-29 1993-03-23 Supercomputer Systems Limited Partnership Cluster architecture for a highly parallel scalar/vector multiprocessor system
JPH063583B2 (en) * 1990-01-11 1994-01-12 インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン Digital computer and method for generating an address during an instruction decode cycle
US5251306A (en) * 1990-01-16 1993-10-05 Advanced Micro Devices, Inc. Apparatus for controlling execution of a program in a computing device
JPH061463B2 (en) 1990-01-16 1994-01-05 インターナショナル・ビジネス・マシーンズ・コーポレーション Multiprocessor system and its private cache control method
US5222240A (en) 1990-02-14 1993-06-22 Intel Corporation Method and apparatus for delaying writing back the results of instructions to a processor
US5241636A (en) 1990-02-14 1993-08-31 Intel Corporation Method for parallel instruction execution in a computer
US5230068A (en) 1990-02-26 1993-07-20 Nexgen Microsystems Cache memory system for dynamically altering single cache memory line as either branch target entry or pre-fetch instruction queue based upon instruction sequence
US5185872A (en) * 1990-02-28 1993-02-09 Intel Corporation System for executing different cycle instructions by selectively bypassing scoreboard register and canceling the execution of conditionally issued instruction if needed resources are busy
US5120083A (en) 1990-03-19 1992-06-09 Henkels & Mccoy, Inc. Expansion joint for conduit for cables
JP2818249B2 (en) 1990-03-30 1998-10-30 株式会社東芝 Electronic computer
IT1247640B (en) 1990-04-26 1994-12-28 St Microelectronics Srl BOOLEAN OPERATIONS BETWEEN TWO ANY BITS OF TWO ANY REGISTERS
US5201056A (en) * 1990-05-02 1993-04-06 Motorola, Inc. RISC microprocessor architecture with multi-bit tag extended instructions for selectively attaching tag from either instruction or input data to arithmetic operation output
US5214763A (en) * 1990-05-10 1993-05-25 International Business Machines Corporation Digital computer system capable of processing two or more instructions in parallel and having a cache and instruction compounding mechanism
EP0457403B1 (en) 1990-05-18 1998-01-21 Koninklijke Philips Electronics N.V. Multilevel instruction cache and method for using said cache
US5249286A (en) 1990-05-29 1993-09-28 National Semiconductor Corporation Selectively locking memory locations within a microprocessor's on-chip cache
CA2038264C (en) 1990-06-26 1995-06-27 Richard James Eickemeyer In-memory preprocessor for a scalable compound instruction set machine processor
DE69127936T2 (en) 1990-06-29 1998-05-07 Digital Equipment Corp Bus protocol for processor with write-back cache
US5155843A (en) 1990-06-29 1992-10-13 Digital Equipment Corporation Error transition mode for multi-processor system
CA2045773A1 (en) 1990-06-29 1991-12-30 Compaq Computer Corporation Byte-compare operation for high-performance processor
US5197132A (en) * 1990-06-29 1993-03-23 Digital Equipment Corporation Register mapping system having a log containing sequential listing of registers that were changed in preceding cycles for precise post-branch recovery
US5778423A (en) * 1990-06-29 1998-07-07 Digital Equipment Corporation Prefetch instruction for improving performance in reduced instruction set processor
EP0463965B1 (en) * 1990-06-29 1998-09-09 Digital Equipment Corporation Branch prediction unit for high-performance processor
JPH0480823A (en) * 1990-07-23 1992-03-13 Nec Corp Conversion system for machine language instruction train
JP3035324B2 (en) 1990-09-03 2000-04-24 日本電信電話株式会社 How to change satellite spin axis
USH1291H (en) 1990-12-20 1994-02-01 Hinton Glenn J Microprocessor in which multiple instructions are executed in one clock cycle by providing separate machine bus access to a register file for different types of instructions
US5222244A (en) * 1990-12-20 1993-06-22 Intel Corporation Method of modifying a microinstruction with operands specified by an instruction held in an alias register
US5303362A (en) * 1991-03-20 1994-04-12 Digital Equipment Corporation Coupled memory multiprocessor computer system including cache coherency management protocols
US5261071A (en) * 1991-03-21 1993-11-09 Control Data System, Inc. Dual pipe cache memory with out-of-order issue capability
US5287467A (en) * 1991-04-18 1994-02-15 International Business Machines Corporation Pipeline for removing and concurrently executing two or more branch instructions in synchronization with other instructions executing in the execution unit
US5488729A (en) * 1991-05-15 1996-01-30 Ross Technology, Inc. Central processing unit architecture with symmetric instruction scheduling to achieve multiple instruction launch and execution
US5355457A (en) * 1991-05-21 1994-10-11 Motorola, Inc. Data processor for performing simultaneous instruction retirement and backtracking
US5630157A (en) 1991-06-13 1997-05-13 International Business Machines Corporation Computer organization for multiple and out-of-order execution of condition code testing and setting instructions
US5278963A (en) * 1991-06-21 1994-01-11 International Business Machines Corporation Pretranslation of virtual addresses prior to page crossing
EP0547240B1 (en) * 1991-07-08 2000-01-12 Seiko Epson Corporation Risc microprocessor architecture implementing fast trap and exception state
US5961629A (en) 1991-07-08 1999-10-05 Seiko Epson Corporation High performance, superscalar-based computer system with out-of-order instruction execution
US5539911A (en) 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
EP0886209B1 (en) 1991-07-08 2005-03-23 Seiko Epson Corporation Extensible risc microprocessor architecture
WO1993001565A1 (en) * 1991-07-08 1993-01-21 Seiko Epson Corporation Single chip page printer controller
US5493687A (en) 1991-07-08 1996-02-20 Seiko Epson Corporation RISC microprocessor architecture implementing multiple typed register sets
US5440752A (en) 1991-07-08 1995-08-08 Seiko Epson Corporation Microprocessor architecture with a switch network for data transfer between cache, memory port, and IOU
US5826055A (en) 1991-07-08 1998-10-20 Seiko Epson Corporation System and method for retiring instructions in a superscalar microprocessor
GB2260628A (en) 1991-10-11 1993-04-21 Intel Corp Line buffer for cache memory
JPH0820949B2 (en) * 1991-11-26 1996-03-04 松下電器産業株式会社 Information processing device
US5285527A (en) * 1991-12-11 1994-02-08 Northern Telecom Limited Predictive historical cache memory
US5617554A (en) 1992-02-10 1997-04-01 Intel Corporation Physical address size selection and page size selection in an address translator
US5398330A (en) * 1992-03-05 1995-03-14 Seiko Epson Corporation Register file backup queue
JPH07504773A (en) 1992-03-18 1995-05-25 セイコーエプソン株式会社 System and method for supporting multi-width memory subsystems
JP3730252B2 (en) 1992-03-31 2005-12-21 トランスメタ コーポレイション Register name changing method and name changing system
US5371684A (en) 1992-03-31 1994-12-06 Seiko Epson Corporation Semiconductor floor plan for a register renaming circuit
DE69308548T2 (en) 1992-05-01 1997-06-12 Seiko Epson Corp DEVICE AND METHOD FOR COMPLETING THE COMMAND IN A SUPER-SCALAR PROCESSOR.
US5442756A (en) * 1992-07-31 1995-08-15 Intel Corporation Branch prediction and resolution apparatus for a superscalar computer processor
US5619668A (en) * 1992-08-10 1997-04-08 Intel Corporation Apparatus for register bypassing in a microprocessor
US6735685B1 (en) 1992-09-29 2004-05-11 Seiko Epson Corporation System and method for handling load and/or store operations in a superscalar microprocessor
US5524225A (en) 1992-12-18 1996-06-04 Advanced Micro Devices Inc. Cache system and method for providing software controlled writeback
US5604912A (en) * 1992-12-31 1997-02-18 Seiko Epson Corporation System and method for assigning tags to instructions to control instruction execution
US5628021A (en) 1992-12-31 1997-05-06 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
WO1994016384A1 (en) * 1992-12-31 1994-07-21 Seiko Epson Corporation System and method for register renaming
US5627984A (en) 1993-03-31 1997-05-06 Intel Corporation Apparatus and method for entry allocation for a buffer resource utilizing an internal two cycle pipeline
JPH09500989A (en) 1993-05-14 1997-01-28 インテル・コーポレーション Inference history in branch target buffer
US5577217A (en) 1993-05-14 1996-11-19 Intel Corporation Method and apparatus for a branch target buffer with shared branch pattern tables for associated branch predictions
JPH0728695A (en) 1993-07-08 1995-01-31 Nec Corp Memory controller
US5613132A (en) * 1993-09-30 1997-03-18 Intel Corporation Integer and floating point register alias table within processor device
US5446912A (en) 1993-09-30 1995-08-29 Intel Corporation Partial width stalls within register alias table
US5630149A (en) 1993-10-18 1997-05-13 Cyrix Corporation Pipelined processor with register renaming hardware to accommodate multiple size registers
US5689672A (en) 1993-10-29 1997-11-18 Advanced Micro Devices, Inc. Pre-decoded instruction cache and method therefor particularly suitable for variable byte-length instructions
EP0651321B1 (en) 1993-10-29 2001-11-14 Advanced Micro Devices, Inc. Superscalar microprocessors
JP3218524B2 (en) 1993-12-22 2001-10-15 村田機械株式会社 Extrusion detection device for work holder
US5574935A (en) 1993-12-29 1996-11-12 Intel Corporation Superscalar processor with a multi-port reorder buffer
US5630075A (en) 1993-12-30 1997-05-13 Intel Corporation Write combining buffer for sequentially addressed partial line operations originating from a single instruction
US5604877A (en) * 1994-01-04 1997-02-18 Intel Corporation Method and apparatus for resolving return from subroutine instructions in a computer processor
US5619664A (en) * 1994-01-04 1997-04-08 Intel Corporation Processor with architecture for improved pipelining of arithmetic instructions by forwarding redundant intermediate data forms
US5627985A (en) 1994-01-04 1997-05-06 Intel Corporation Speculative and committed resource files in an out-of-order processor
US5452426A (en) 1994-01-04 1995-09-19 Intel Corporation Coordinating speculative and committed state register source data and immediate source data in a processor
US5577200A (en) 1994-02-28 1996-11-19 Intel Corporation Method and apparatus for loading and storing misaligned data on an out-of-order execution computer system
US5608885A (en) * 1994-03-01 1997-03-04 Intel Corporation Method for handling instructions from a branch prior to instruction decoding in a computer which executes variable-length instructions
US5586278A (en) 1994-03-01 1996-12-17 Intel Corporation Method and apparatus for state recovery following branch misprediction in an out-of-order microprocessor
US5625788A (en) 1994-03-01 1997-04-29 Intel Corporation Microprocessor with novel instruction for signaling event occurrence and for providing event handling information in response thereto
US5630083A (en) 1994-03-01 1997-05-13 Intel Corporation Decoder for decoding multiple instructions in parallel
US5564056A (en) * 1994-03-01 1996-10-08 Intel Corporation Method and apparatus for zero extension and bit shifting to preserve register parameters in a microprocessor utilizing register renaming
US5623628A (en) * 1994-03-02 1997-04-22 Intel Corporation Computer system and method for maintaining memory consistency in a pipelined, non-blocking caching bus request queue
US5394351A (en) * 1994-03-11 1995-02-28 Nexgen, Inc. Optimized binary adder and comparator having an implicit constant for an input
US5574927A (en) * 1994-03-25 1996-11-12 International Meta Systems, Inc. RISC architecture computer configured for emulation of the instruction set of a target computer
US5490280A (en) * 1994-03-31 1996-02-06 Intel Corporation Apparatus and method for entry allocation for a resource buffer
US5615126A (en) * 1994-08-24 1997-03-25 Lsi Logic Corporation High-speed internal interconnection technique for integrated circuits that reduces the number of signal lines through multiplexing
BR9509845A (en) * 1994-12-02 1997-12-30 Intel Corp Microprocessor with compacting operation of composite operating elements
US5819101A (en) 1994-12-02 1998-10-06 Intel Corporation Method for packing a plurality of packed data elements in response to a pack instruction
US5666494A (en) 1995-03-31 1997-09-09 Samsung Electronics Co., Ltd. Queue management mechanism which allows entries to be processed in any order
US6385634B1 (en) 1995-08-31 2002-05-07 Intel Corporation Method for performing multiply-add operations on packed data
US5745375A (en) 1995-09-29 1998-04-28 Intel Corporation Apparatus and method for controlling power usage
US5778210A (en) * 1996-01-11 1998-07-07 Intel Corporation Method and apparatus for recovering the state of a speculatively scheduled operation in a processor which cannot be executed at the speculated time
US5832205A (en) * 1996-08-20 1998-11-03 Transmeta Corporation Memory controller for a microprocessor for detecting a failure of speculation on the physical nature of a component being addressed
JP4096132B2 (en) 1997-07-24 2008-06-04 富士ゼロックス株式会社 Time-series information specific section determination device, information storage device, and information storage / playback device
US6418529B1 (en) 1998-03-31 2002-07-09 Intel Corporation Apparatus and method for performing intra-add operation
JP4054638B2 (en) 2002-08-30 2008-02-27 スミダコーポレーション株式会社 Optical pickup
US7897110B2 (en) 2005-12-20 2011-03-01 Asml Netherlands B.V. System and method for detecting at least one contamination species in a lithographic apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
COMPUTER ARCHITECTURE NEWS vol. 17, no. 2, April 1989, NEW YORK US pages 290 - 302 M. D. SMITH ET AL. 'Limits on Multiple Instruction Issue' *
IBM TECHNICAL DISCLOSURE BULLETIN. vol. 27, no. 7A, December 1984, NEW YORK US pages 3735 - 3736 R. D. DEGROOT 'Method for Prioritizing Waiting Arithmetic Instructions' *
IEEE MICRO. vol. 11, no. 3, June 1991, NEW YORK US pages 10 - 13 & 63 - 73 V. POPESCU ET AL. 'The Metaflow Architecture' *
SIGPLAN NOTICES vol. 21, no. 7, July 1986, PALO ALTO, CA, US pages 11 - 16 P. B. GIBBONS, S. S. MUCHNICK 'Efficient Instruction Scheduling for a Pipelined Architecture' *

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2265481B (en) * 1992-03-25 1995-12-20 Hewlett Packard Co Memory processor that permits aggressive execution of load instructions
WO1994016384A1 (en) * 1992-12-31 1994-07-21 Seiko Epson Corporation System and method for register renaming
US6092176A (en) * 1992-12-31 2000-07-18 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
US6360309B1 (en) 1992-12-31 2002-03-19 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
US5896542A (en) * 1992-12-31 1999-04-20 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
US6757808B2 (en) 1992-12-31 2004-06-29 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
US7043624B2 (en) 1992-12-31 2006-05-09 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
US7430651B2 (en) 1992-12-31 2008-09-30 Seiko-Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
EP0605875A1 (en) * 1993-01-08 1994-07-13 International Business Machines Corporation Method and system for single cycle dispatch of multiple instruction in a superscalar processor system
US5465373A (en) * 1993-01-08 1995-11-07 International Business Machines Corporation Method and system for single cycle dispatch of multiple instructions in a superscalar processor system
US5630149A (en) * 1993-10-18 1997-05-13 Cyrix Corporation Pipelined processor with register renaming hardware to accommodate multiple size registers
EP0779577A2 (en) * 1993-10-18 1997-06-18 Cyrix Corporation Microprocessor pipe control and register translation
EP0779577A3 (en) * 1993-10-18 1997-09-17 Cyrix Corp
US6073231A (en) * 1993-10-18 2000-06-06 Via-Cyrix, Inc. Pipelined processor with microcontrol of register translation hardware
EP0649085A1 (en) 1993-10-18 1995-04-19 Cyrix Corporation Microprocessor pipe control and register translation
US6138230A (en) * 1993-10-18 2000-10-24 Via-Cyrix, Inc. Processor with multiple execution pipelines using pipe stage state information to control independent movement of instructions between pipe stages of an execution pipeline
US5996067A (en) * 1994-04-26 1999-11-30 Advanced Micro Devices, Inc. Range finding circuit for selecting a consecutive sequence of reorder buffer entries using circular carry lookahead
US5689693A (en) * 1994-04-26 1997-11-18 Advanced Micro Devices, Inc. Range finding circuit for selecting a consecutive sequence of reorder buffer entries using circular carry lookahead
US6351801B1 (en) 1994-06-01 2002-02-26 Advanced Micro Devices, Inc. Program counter update mechanism
US5799162A (en) * 1994-06-01 1998-08-25 Advanced Micro Devices, Inc. Program counter update mechanism
US6035386A (en) * 1994-06-01 2000-03-07 Advanced Micro Devices, Inc. Program counter update mechanism
US5559975A (en) * 1994-06-01 1996-09-24 Advanced Micro Devices, Inc. Program counter update mechanism
EP0685788A1 (en) * 1994-06-01 1995-12-06 Advanced Micro Devices, Inc. Programme counter update mechanism
EP0709769A3 (en) * 1994-10-24 1997-04-16 Ibm Apparatus and method for the analysis and resolution of operand dependencies
US5878244A (en) * 1995-01-25 1999-03-02 Advanced Micro Devices, Inc. Reorder buffer configured to allocate storage capable of storing results corresponding to a maximum number of concurrently receivable instructions regardless of a number of instructions received
US5664120A (en) * 1995-08-25 1997-09-02 International Business Machines Corporation Method for executing instructions and execution unit instruction reservation table within an in-order completion processor
US5826089A (en) * 1996-01-04 1998-10-20 Advanced Micro Devices, Inc. Instruction translation unit configured to translate from a first instruction set to a second instruction set
WO1997025669A1 (en) * 1996-01-04 1997-07-17 Advanced Micro Devices, Inc. Method and apparatus to translate a first instruction set to a second instruction set
US6249862B1 (en) 1996-05-17 2001-06-19 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US6209084B1 (en) 1996-05-17 2001-03-27 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US6108769A (en) * 1996-05-17 2000-08-22 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US5961634A (en) * 1996-07-26 1999-10-05 Advanced Micro Devices, Inc. Reorder buffer having a future file for storing speculative instruction execution results
US5946468A (en) * 1996-07-26 1999-08-31 Advanced Micro Devices, Inc. Reorder buffer having an improved future file for storing speculative instruction execution results
US5915110A (en) * 1996-07-26 1999-06-22 Advanced Micro Devices, Inc. Branch misprediction recovery in a reorder buffer having a future file
US5983342A (en) * 1996-09-12 1999-11-09 Advanced Micro Devices, Inc. Superscalar microprocessor employing a future file for storing results into multiportion registers
WO1998033116A2 (en) * 1997-01-29 1998-07-30 Advanced Micro Devices, Inc. A line-oriented reorder buffer for a superscalar microprocessor
EP1204022A1 (en) * 1997-01-29 2002-05-08 Advanced Micro Devices, Inc. A line-oriented reorder buffer for a superscalar microprocessor
WO1998033116A3 (en) * 1997-01-29 1998-11-05 Advanced Micro Devices Inc A line-oriented reorder buffer for a superscalar microprocessor
GB2337142B (en) * 1997-02-21 2002-04-17 Richard Byron Wilmot Method and apparatus for forwarding of operands in a computer system
WO1998037485A1 (en) * 1997-02-21 1998-08-27 Richard Byron Wilmot Method and apparatus for forwarding of operands in a computer system
GB2337142A (en) * 1997-02-21 1999-11-10 Richard Byron Wilmot Method and apparatus for forwarding of operands in a computer system
WO1999008185A1 (en) * 1997-08-06 1999-02-18 Advanced Micro Devices, Inc. A dependency table for reducing dependency checking hardware
WO1999026132A3 (en) * 1997-11-17 1999-09-16 Advanced Micro Devices Inc Processor configured to generate lookahead results from collapsed moves, compares and simple arithmetic instructions
WO1999026132A2 (en) * 1997-11-17 1999-05-27 Advanced Micro Devices, Inc. Processor configured to generate lookahead results from collapsed moves, compares and simple arithmetic instructions
US6112293A (en) * 1997-11-17 2000-08-29 Advanced Micro Devices, Inc. Processor configured to generate lookahead results from operand collapse unit and for inhibiting receipt/execution of the first instruction based on the lookahead result
US6810475B1 (en) 1998-10-06 2004-10-26 Texas Instruments Incorporated Processor with pipeline conflict resolution using distributed arbitration and shadow registers
EP1004959A2 (en) * 1998-10-06 2000-05-31 Texas Instruments Incorporated Processor with pipeline protection
EP1004959A3 (en) * 1998-10-06 2003-02-05 Texas Instruments Incorporated Processor with pipeline protection
EP1006439A1 (en) * 1998-11-30 2000-06-07 Nec Corporation Instruction-issuing circuit and method for out-of-order execution that set reference dependency information in an instruction when a succeeding instruction is stored in an instruction buffer
US6553484B1 (en) 1998-11-30 2003-04-22 Nec Corporation Instruction-issuing circuit that sets reference dependency information in a preceding instruction when a succeeding instruction is stored in an instruction out-of-order buffer
GB2372120B (en) * 2000-11-29 2005-08-03 Nec Corp Data processor with an improved data dependence detector
GB2372120A (en) * 2000-11-29 2002-08-14 Nec Corp Data dependence detector
US6931514B2 (en) 2000-11-29 2005-08-16 Nec Corporation Data dependency detection using history table of entry number hashed from memory address
US7418583B2 (en) 2000-11-29 2008-08-26 Nec Corporation Data dependency detection using history table of entry number hashed from memory address
WO2002057908A3 (en) * 2001-01-16 2002-11-07 Sun Microsystems Inc A superscalar processor having content addressable memory structures for determining dependencies
WO2002057908A2 (en) * 2001-01-16 2002-07-25 Sun Microsystems, Inc. A superscalar processor having content addressable memory structures for determining dependencies
US6862676B1 (en) 2001-01-16 2005-03-01 Sun Microsystems, Inc. Superscalar processor having content addressable memory structures for determining dependencies
GB2563582A (en) * 2017-06-16 2018-12-26 Imagination Tech Ltd Methods and systems for inter-pipeline data hazard avoidance
GB2563582B (en) * 2017-06-16 2020-01-01 Imagination Tech Ltd Methods and systems for inter-pipeline data hazard avoidance
US10817301B2 (en) 2017-06-16 2020-10-27 Imagination Technologies Limited Methods and systems for inter-pipeline data hazard avoidance
US11200064B2 (en) 2017-06-16 2021-12-14 Imagination Technologies Limited Methods and systems for inter-pipeline data hazard avoidance
US11698790B2 (en) 2017-06-16 2023-07-11 Imagination Technologies Limited Queues for inter-pipeline data hazard avoidance
US11900122B2 (en) 2017-06-16 2024-02-13 Imagination Technologies Limited Methods and systems for inter-pipeline data hazard avoidance

Also Published As

Publication number Publication date
JP3571265B2 (en) 2004-09-29
US20030005260A1 (en) 2003-01-02
US5497499A (en) 1996-03-05
US7802074B2 (en) 2010-09-21
JP3571263B2 (en) 2004-09-29
US20080059770A1 (en) 2008-03-06
JP3571264B2 (en) 2004-09-29
JP2000148486A (en) 2000-05-30
EP0636256A1 (en) 1995-02-01
JP2000148488A (en) 2000-05-30
KR100371927B1 (en) 2003-02-12
KR100371930B1 (en) 2003-02-12
EP0636256B1 (en) 1997-06-04
JP2000148485A (en) 2000-05-30
JP2000148484A (en) 2000-05-30
JP3571266B2 (en) 2004-09-29
JP2000148487A (en) 2000-05-30
JPH07505494A (en) 1995-06-15
KR950701101A (en) 1995-02-20
JP3730252B2 (en) 2005-12-21
DE69311330T2 (en) 1997-09-25
JP3571267B2 (en) 2004-09-29
US6289433B1 (en) 2001-09-11
DE69311330D1 (en) 1997-07-10
US7051187B2 (en) 2006-05-23
US5737624A (en) 1998-04-07
WO1993020505A3 (en) 1993-11-25
US5974526A (en) 1999-10-26
US20060041736A1 (en) 2006-02-23
JP2000148489A (en) 2000-05-30

Similar Documents

Publication Publication Date Title
US5974526A (en) Superscalar RISC instruction scheduling
US5371684A (en) Semiconductor floor plan for a register renaming circuit
KR0122529B1 (en) Method and system for single cycle dispatch of multiple instruction in a superscalar processor system
US7650486B2 (en) Dynamic recalculation of resource vector at issue queue for steering of dependent instructions
US6272617B1 (en) System and method for register renaming
EP0762270B1 (en) Microprocessor with load/store operation to/from multiple registers
EP0638183A1 (en) A system and method for retiring instructions in a superscalar microprocessor.
US5978900A (en) Renaming numeric and segment registers using common general register pool
JP2001092657A (en) Central arithmetic unit and compile method and recording medium recording compile program
US5765017A (en) Method and system in a data processing system for efficient management of an indication of a status of each of multiple registers

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A2; Designated state(s): JP KR
AL Designated countries for regional patents Kind code of ref document: A2; Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE
AK Designated states Kind code of ref document: A3; Designated state(s): JP KR
AL Designated countries for regional patents Kind code of ref document: A3; Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase Ref document number: 1019940703382; Country of ref document: KR
WWE WIPO information: entry into national phase Ref document number: 1993906834; Country of ref document: EP
WWP WIPO information: published in national office Ref document number: 1993906834; Country of ref document: EP
WWG WIPO information: grant in national office Ref document number: 1993906834; Country of ref document: EP
WWE WIPO information: entry into national phase Ref document number: 1020007014694; Country of ref document: KR; Ref document number: 1020007014693; Country of ref document: KR