US20040049657A1 - Extended register space apparatus and methods for processors - Google Patents

Extended register space apparatus and methods for processors

Info

Publication number
US20040049657A1
Authority
US
United States
Prior art keywords
instruction
register space
processor
field
registers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/238,276
Inventor
Ralph Kling
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to US10/238,276
Assigned to INTEL CORPORATION. Assignment of assignors interest (see document for details). Assignors: KLING, RALPH M.
Publication of US20040049657A1
Priority to US11/830,473 (US7676654B2)
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134 Register stacks; shift registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30138 Extension of register space, e.g. register cache

Definitions

  • the present disclosure relates generally to microprocessors and, more particularly, to apparatus and methods that extend the register space available to a processor without requiring modification of the instruction set encodings associated with that processor.
  • the architectural register set or register space of a processor is typically physically integrated within the processor (i.e., is on-chip).
  • Register space or registers may be used to facilitate the rapid execution of instructions and manipulation of operand values by a processor.
  • the registers making up a register space are not a shared resource and, as a result, can be accessed more rapidly than other resources that are physically external or separate from the processor chip (i.e., off-chip) and/or which are shared with other agent resources.
  • the register space of a processor is not subject to memory coherency schemes (such as those that are used within multiprocessor systems) and other operational overhead associated with the management of shared memory resources. Also, using a memory stack in lieu of a larger register file introduces additional overhead associated with address calculations.
  • Some microprocessors or processors provide a relatively limited register space or architectural register set.
  • the thirty-two bit Intel processor families which are collectively referred to as IA-32 processors, provide eight thirty-two bit general purpose registers, which are located on-chip.
  • IPC: instruction-per-clock-cycle
  • many compiler optimizations which are usually used to increase the effective instruction-per-clock-cycle (IPC) rate of processors, typically require more than eight general purpose registers.
  • a larger number of registers is generally beneficial because a larger number of registers enables program execution to be carried out using fewer memory-based operations, thereby reducing the overhead associated with accessing stack-based operands and, thus, reducing cache occupation and bandwidth (i.e., cache ports) overhead. Reducing the number of stack-based memory operations performed by a processor can free a substantial amount of cache space and bandwidth for use by other load, store and prefetch instructions, which can substantially increase the IPC rate of the processor.
  • FIG. 1 is a block diagram of an example processor system that uses the extended register space apparatus and methods described herein;
  • FIG. 2 is a more detailed block diagram of the processor shown in FIG. 1;
  • FIG. 3 is a block diagram that depicts an example manner in which an instruction encoding can be used by the processor shown in FIGS. 1 and 2 to access an extended register space;
  • FIG. 4 is a flow diagram that depicts an example manner in which the processor shown in FIGS. 1 and 2 can process an instruction encoding to access an extended register space;
  • FIG. 5 is a block diagram that depicts another example manner in which an instruction encoding can be used by the processor shown in FIGS. 1 and 2 to access an extended register space.
  • FIG. 1 is a block diagram of an example processor system 10 that uses the extended register space apparatus and methods described herein.
  • the processor system 10 includes a processor 12 that is coupled to an interconnection bus or network 14 .
  • the processor 12 includes an architectural register set or register space 16 , which is depicted in FIG. 1 as being entirely on-chip, but which could alternatively be located entirely or partially off-chip and directly coupled to the processor 12 via dedicated electrical connections and/or via the interconnection network or bus 14 .
  • the processor 12 may be any suitable processor, processing unit or microprocessor such as, for example, an Intel Itanium™ processor, Intel X-Scale™ processor, Intel Pentium™ processor, etc. However, in the example described in detail below, the processor 12 is a thirty-two bit Intel processor, which is commonly referred to as an IA-32 processor.
  • the register space 16 is extended to provide more than the eight thirty-two bit general purpose registers that are currently provided by existing IA-32 processors.
  • the system 10 may be a multi-processor system and, thus, may include one or more additional processors that are identical or similar to the processor 12 and which are coupled to the interconnection bus or network 14 .
  • the processor 12 of FIG. 1 is coupled to a chipset 18 , which includes a memory controller 20 and an input/output (I/O) controller 22 .
  • a chipset typically provides I/O and memory management functions as well as a plurality of general purpose and/or special purpose registers, timers, etc. that are accessible or used by one or more processors coupled to the chipset.
  • the memory controller 20 performs functions that enable the processor 12 (or processors if there are multiple processors) to access a system memory 24 , which may include any desired type of volatile memory such as, for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.
  • SRAM: static random access memory
  • DRAM: dynamic random access memory
  • the I/O controller 22 performs functions that enable the processor 12 to communicate with peripheral input/output (I/O) devices 26 and 28 via an I/O bus 30 .
  • the I/O devices 26 and 28 may be any desired type of I/O device such as, for example, a keyboard, a video display or monitor, a mouse, etc. While the memory controller 20 and the I/O controller 22 are depicted in FIG. 1 as separate functional blocks within the chipset 18 , the functions performed by these blocks may be integrated within a single semiconductor circuit or may be implemented using two or more separate integrated circuits.
  • FIG. 2 is a more detailed block diagram of the processor 12 shown in FIG. 1.
  • the register space 16 of the processor 12 includes eight on-chip general purpose registers 36 that are currently provided by existing IA-32 processors and an extended on-chip register space or set of registers 38 .
  • the processor 12 includes instruction processing hardware and/or logic 40 which, in addition to the pipeline hardware provided with known IA-32 processors, includes two decoding blocks 42 and 44 that are adapted to process or decode instructions or portions of an instruction in parallel.
  • the processor 12 includes microcode 46 that, in addition to enabling the processor 12 to carry out the functions of a known IA-32 processor, enables the processor 12 to utilize the extended register space 38 for carrying out existing IA-32 instruction set encodings.
  • FIG. 3 is a block diagram that depicts an example manner in which an existing or standard IA-32 instruction encoding can be used by the processor 12 of FIGS. 1 and 2 to access the extended register space 38 .
  • the encoding fields 50 of a standard instruction for an IA-32 processor include an optional prefix field 52 , an opcode field 54 , an Mr/m field 56 , an Sib field 58 , a displacement addressing field 60 , and an immediate addressing field 62 .
  • because the IA-32 processor instruction encoding fields 50 shown in FIG. 3 are well known, additional detailed description of these fields is not required. However, for purposes of facilitating an understanding of the examples described herein, some additional description of the purpose and operation of these fields is provided below.
  • the opcode field 54 contains the binary encoding, which in this example is one-byte or eight bits of encoding, required to carry out a particular processor operation such as, for example, an arithmetic operation, a memory access operation, a register contents manipulation (e.g., shift), or any combination of these operations.
  • the Mr/m field 56 is a one-byte field that determines the addressing mode to be used in carrying out an instruction (e.g., execution of an instruction by a processor such as the processor 12 shown in FIG. 1). For example, a displacement addressing mode or an immediate addressing mode may be used depending on the status of the bits within the Mr/m field 56 .
  • a displacement addressing mode uses the contents of the displacement field 60 to address an operand associated with an instruction relative to another memory address such as, for example, the starting address of the instruction.
  • an immediate addressing mode uses the immediate addressing field 62 to address an operand associated with the instruction based on the contents of the immediate addressing field 62 .
  • the immediate addressing field 62 typically contains an absolute (as opposed to a relative) memory address, which is associated with an operand of the instruction.
  • the example instruction described in connection with FIG. 3 is an add with carry instruction, which is represented mnemonically as ADC.
  • ADC: add with carry instruction
  • the ADC instruction for an IA-32 processor requires two operands, one of which is referred to as a source (SRC) operand and the other of which is referred to as a destination (DEST) operand.
  • SRC: source
  • DEST: destination
  • the register address represented in the Mr/m field 56 may be either the location of the SRC operand or the DEST operand.
  • the on-chip register is the DEST operand and existing IA-32 processors use the displacement field 60 to address a portion of system memory (e.g., a portion of the memory 24 shown in FIG. 1) for the SRC operand.
  • the on-chip register is the SRC operand and existing IA-32 processors use the displacement field 60 to address system memory for the DEST operand.
  • the register space 16 (FIG. 1) is extended and, thus, contains more than the eight traditional general purpose registers currently provided with IA-32 processors.
  • the register space 16 is extended to contain an additional 1024 thirty-two bit registers.
  • any other number of additional registers may be used instead.
  • the apparatus and methods described herein enable the instruction encoding fields 50 shown in FIG. 3 to access the register space 16 of the processor 12 .
  • as described in connection with FIG. 4, when executing an ADC instruction in a displacement addressing mode,
  • the processor 12 reads the most significant (i.e., the upper) twenty bits of the displacement field 60 as a page identifier or tag and then compares this page identifier or tag to a predetermined identifier value associated with the extended register space 38 . As described in detail in connection with FIG. 4 below, if the page identifier or tag read from the displacement field 60 matches the identifier value associated with the extended register space 38 , the processor 12 processes the instruction by using the lower twelve bits of the displacement field 60 to access one of the two operands of the instruction within the extended register space 38 .
  • the lower twelve bits or offset of the displacement field 60 are used as a register index to the extended register space 38 .
  • bits two to eleven are used to address the 1024 thirty-two bit registers. The lowest two bits (i.e., zero and one) are ignored because these bits correspond to (i.e., may be used to individually address or select) the four bytes making up each thirty-two bit register word.
  • bits three to five of the Mr/m field 56 address the SRC operand
  • bits two to eleven of the displacement field 60 are used by the processor 12 to address the DEST operand within the extended register space 38 .
  • bits three to five of the Mr/m field 56 address the DEST operand
  • bits two to eleven of the displacement field 60 are used by the processor 12 to address the SRC operand within the extended register space 38 .
  • although the example of FIG. 3 uses a single page identifier or tag that corresponds to a four kilobyte page or 1024 thirty-two bit words within the memory map of the processor 12, additional page identifiers or tags could be used to enable the processor 12 to access more than 1024 thirty-two bit registers within the extended register space 38.
  • fewer than 1024 thirty-two bit registers may be provided within the extended register space 38 , in which case some of the register addresses provided by the lower twelve bits of the displacement field 60 may be unused or ignored.
  • a tag having more than twenty bits may be used to access registers within the extended register space 38 .
  • the offset or register index portion of the displacement field 60 would have fewer than twelve bits and, thus, would enable addressing and access to fewer than 1024 thirty-two bit registers.
  • any other instruction using memory operands could be used instead.
  • although the example of FIG. 3 is based on an instruction set for an IA-32 processor, other instruction sets associated with other processor types could be used instead; in that case, the fields associated with the native register address and memory address would be used instead of the IA-32 fields “Mr/m” and “displacement.”
  • the processor 12 is an IA-32 processor and the register space 16 includes the eight general purpose on-chip registers that are traditionally provided by known IA-32 processors and an additional 1024 thirty-two bit on-chip registers, which have not previously been provided with IA-32 processors.
  • the processor 12 includes microarchitecture (e.g., microcode) for causing the processor 12 to carry out the instruction processing technique described in detail in connection with FIG. 4 below.
  • the operating system (OS) and/or basic input/output system (BIOS) of the computer system 10 is configured so that the memory map of the system 10 reserves the memory page associated with the extended register space 38 for exclusive use by the processor 12 .
  • the memory page identifier that would normally be used by existing IA-32 processors to address a physical page of memory within the system memory 24 is instead used exclusively by the processor 12 (i.e., is not shared by other resources within the system 10 ) to address registers within the extended register space 38 .
  • FIG. 4 is a flow diagram that depicts an example manner in which the processor 12 shown in FIGS. 1 and 2 can process existing or standard IA-32 instruction encodings to access the extended register space 38 .
  • the flow diagram shown in FIG. 4 depicts an example manner in which the front-end instruction processing pipeline within the instruction processing hardware or logic 40 of the processor 12 is configured to operate when processing a standard IA-32 instruction encoding such as, for example, the instruction depicted in FIG. 3.
  • the processor 12 accesses the cache (block 100 ), fetches the next instruction to be processed (block 102 ) and decodes the length of the instruction to be processed (block 104 ).
  • decoding the length of an instruction enables a processor to parse the instruction into its component encoding fields (i.e., opcode field, Mr/m field, displacement field, etc.).
  • the instruction to be processed by the processor 12 is then decoded (blocks 106 and 108 ), renamed (block 110 ) and then queued for execution (block 112 ).
  • the processor 12 is adapted to perform additional activities in parallel to the instruction processing activities associated with blocks 100 - 112 described in connection with FIG. 4.
  • the processor 12 uses the decoding blocks 42 and 44 to carry out the decoding activities associated with blocks 106 and 108 .
  • the decoding blocks 42 and 44 are used to determine whether the page identifier or tag portion of the displacement field 60 matches an identifier value or tag associated with the extended register space 38 of the processor 12 (block 114 ). If the tag portion of the displacement field 60 does not match the tag associated with the extended register space 38 of the processor 12 , then the decoding hardware or logic performing parallel decoding (i.e., in parallel to blocks 106 and 108 ) of the instruction currently being processed takes no further action in connection with the instruction.
  • the processor 12 uses one of the decoders 42 and 44 to decode (block 116 ) the register pointer bits (i.e., bits three to five) of the Mr/m field 56 and the register index bits (i.e., the lower twelve bits) of the displacement field 60 to determine whether the SRC operand or DEST operand is located within the extended register space 38 and, thus, is to be addressed by the register index portion of the displacement field 60 .
  • the number of clock cycles required to decode an instruction that utilizes the extended register space 38 can be minimized by providing additional decoding hardware and/or logic that performs register decoding operations (e.g., block 116 ) in parallel to instruction decoding activities (e.g., blocks 106 and 108 ).
  • the addressing mode used by the instruction affects the extent to which instruction decoding and register decoding operations can be performed in parallel.
  • displacement addressing is used. With displacement addressing, an operand address is directly encoded within the instruction (i.e., within the displacement field 60 and/or the Mr/m field 56 ), thereby enabling substantial parallel processing of the encoding fields within the instruction.
  • the technique shown in FIG. 4 may be used to compare (block 114 ) the value stored in the base register to the tag or value associated with the extended register space 38 .
  • a comparison may be speculative because the comparison is performed at the front-end of the instruction processing pipeline and a subsequent processor operation could change the value stored in the base register.
  • the processor 12 is preferably configured to track changes to the base register and, upon recognition of changes to the base register value, restart any instruction affected by the change.
  • a standard or known IA-32 instruction set or encodings can be used to enable an IA-32 processor having an extended register space (e.g., the extended register space 38 of the processor 12 ) to use that extended register space to store operand values that would traditionally be stored within system memory (e.g., within off-chip shared memory).
  • the use of register-based operations in place of operations that would otherwise be memory-based reduces the use of stack-based operations and other memory access overhead, thereby resulting in an increased IPC rate for the processor having the extended register space.
  • the extended register space 38 provided within the processor 12 can be more or less than 1024 thirty-two bit words (e.g., more than one page) if desired.
  • the tag match or comparison (block 114 ) shown in FIG. 4 compares the tag portion of the displacement field 60 of each instruction executed in the thread to identifier values or tags that correspond to the multiple pages of register space. If any one of the identifiers or tags matches the tag portion of the displacement field 60 , the processor 12 carries out the register decoding (block 116 ) as described in connection with FIG. 4 above.
  • each thread or process can be associated with a different page identifier or tag so that each thread or process has its own page of register space.
  • the tag match or comparison block 114 shown in FIG. 4 compares the tag portion of the displacement field 60 to the identifier associated with the page used for the current thread or process.
  • the processor 12 may execute multiple threads or processes where some or all of those threads or processes use a plurality of pages within the extended register space 38 . In other words, there may be multiple threads and each of those threads may have access to more than one page within the extended register space 38 .
  • the tag match or comparison (block 114 ) compares the tag portion of the displacement field 60 to the identifier values or tags associated with the current thread.
  • the operating system is preferably adapted to save and restore the extended register space 38 for each thread or process in response to a context switch (i.e., when switching from execution of one process or thread to another process or thread). Additionally, an efficient transfer of operands between the eight traditional on-chip general purpose registers and the extended register space 38 can be implemented by mapping the traditional registers into the extended register space 38 .
  • the eight traditional registers associated with known IA-32 processors may be kept physically and logically separate from the extended register space 38 and specific encodings of the Mr/m field 56 can be used to indicate that a source or destination operand is located in one of the eight traditional on-chip registers.
  • FIG. 5 is a block diagram that depicts another example manner in which instruction encoding fields 150 of a standard IA-32 instruction can be used by the processor 12 shown in FIG. 1 to access the extended register space 38 .
  • the example instruction is composed using standard IA-32 processor instruction encoding fields (i.e., the encoding fields that are used with IA-32 processors having only eight on-chip general purpose registers).
  • the example encoding fields 150 include a prefix field 152 , an opcode field 154 , an Mr/m byte or field 156 , an Sib field 158 , a displacement addressing field 160 and an immediate addressing field 162 .
  • bits three to five of the Mr/m field 156 and an offset portion (i.e., bits zero to eleven) 163 of the displacement field 160 are used by the processor 12 to access three operands within three different registers.
  • bits within the Mr/m field 156 and the offset portion 163 of the displacement field 160 are decoded as a three operand add with carry (ADC) instruction 164 .
  • ADC (three-operand form): three-operand add with carry
  • the principles depicted in FIG. 5 could be applied to any other instruction.
  • the ADC instruction 164 can be depicted as DEST ← SRC1 + SRC2 + CF.
  • the processor 12 executes the register decode process (block 116 of FIG. 4) so that bits three to five of the Mr/m field 156 and bits ten and eleven of the offset 163 are used to address the destination (DEST) operand, bits five to nine of the offset 163 are used to address the first source operand (SRC1) and bits zero to four of the offset 163 are used to address the second source operand (SRC2).
  • each of the three operands shown in FIG. 5 is represented by a five-bit value and, as a result, each of the operands can randomly access any one of thirty-two registers located in the extended register space 38 of the processor 12 .
  • The example manner of enabling the processor 12 to access an extended register space depicted in FIG. 5 is similar to the technique depicted in FIG. 4. However, as can be seen from a comparison of FIGS. 3 and 5, the manner in which the bits of the displacement field are decoded enables native backward compatibility of software written using the standard IA-32 encodings on known IA-32 processors.
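
The FIG. 5 bit assignment described above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The function names `decode_three_operand` and `adc` are hypothetical, and the way the three Mr/m bits and the two offset bits are combined into the five-bit DEST index (offset bits as the high bits) is an assumption, since the text does not state the ordering.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF  # registers are thirty-two bits wide


def decode_three_operand(mrm: int, disp: int) -> Tuple[int, int, int]:
    """Split the Mr/m byte and displacement offset into (dest, src1, src2)."""
    offset = disp & 0xFFF            # offset portion: bits 0..11 of the displacement
    reg = (mrm >> 3) & 0b111         # bits 3..5 of the Mr/m field
    dest_hi = (offset >> 10) & 0b11  # bits 10..11 of the offset
    dest = (dest_hi << 3) | reg      # ASSUMED bit ordering, not stated in the source
    src1 = (offset >> 5) & 0b11111   # bits 5..9 of the offset
    src2 = offset & 0b11111          # bits 0..4 of the offset
    return dest, src1, src2


def adc(regs: List[int], dest: int, src1: int, src2: int, carry_in: int) -> int:
    """DEST <- SRC1 + SRC2 + CF on 32-bit registers; returns the carry out."""
    total = regs[src1] + regs[src2] + carry_in
    regs[dest] = total & MASK32
    return total >> 32


# Usage: thirty-two extended registers, each addressable by a five-bit index.
regs = [0] * 32
regs[7], regs[3] = 0xFFFFFFFF, 1
mrm = 0b00_010_000                        # bits 3..5 hold 0b010
disp = (0b01 << 10) | (7 << 5) | 3        # tag bits left at zero for brevity
dest, src1, src2 = decode_three_operand(mrm, disp)
carry_out = adc(regs, dest, src1, src2, carry_in=1)
```

Each operand index is five bits wide, which matches the thirty-two registers per operand mentioned above. The tag comparison of block 114 is omitted here and would precede this decode.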

Abstract

Methods and apparatus for accessing an extended register space associated with a processor are disclosed. In an example method, a first portion of an encoding field of an instruction is compared to a value associated with the extended register space. A first operand of the instruction is associated with a second portion of the encoding field if the first portion of the encoding field matches the value associated with the extended register space.
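
As a rough illustration of the comparison summarized in the abstract, the following Python sketch uses the field widths given for the FIG. 3 example: the upper twenty bits of a thirty-two bit displacement act as the tag, and bits two to eleven act as the register index. The function name `decode_displacement` and the tag value `0xFFFFE` are hypothetical. The source does not specify an actual tag value.

```python
from typing import Iterable, Optional, Tuple

OFFSET_BITS = 12        # lower twelve bits of the displacement field
EXT_REG_TAG = 0xFFFFE   # HYPOTHETICAL page tag; the source gives no concrete value


def decode_displacement(
    disp: int, tags: Iterable[int] = (EXT_REG_TAG,)
) -> Optional[Tuple[int, int]]:
    """Return (tag, register_index) if disp targets the extended register
    space, or None if it should be treated as an ordinary memory operand."""
    disp &= 0xFFFFFFFF
    tag = disp >> OFFSET_BITS            # most significant twenty bits
    if tag not in set(tags):
        return None                      # no match: normal memory access
    offset = disp & ((1 << OFFSET_BITS) - 1)
    # Bits 0..1 select a byte within a thirty-two bit register and are ignored;
    # bits 2..11 select one of 1024 registers.
    return tag, offset >> 2


hit = decode_displacement(0xFFFFE000 | (5 << 2))
miss = decode_displacement(0x12345678)
```

Passing more than one value in `tags` models the variant in which several pages of register space are available, for example one page per thread.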

Description

    FIELD OF THE DISCLOSURE
  • [0001] The present disclosure relates generally to microprocessors and, more particularly, to apparatus and methods that extend the register space available to a processor without requiring modification of the instruction set encodings associated with that processor.
    BACKGROUND
  • [0002] The architectural register set or register space of a processor is typically physically integrated within the processor (i.e., is on-chip). Register space or registers may be used to facilitate the rapid execution of instructions and manipulation of operand values by a processor. As is well known, the registers making up a register space are not a shared resource and, as a result, can be accessed more rapidly than other resources that are physically external or separate from the processor chip (i.e., off-chip) and/or which are shared with other agent resources. The register space of a processor is not subject to memory coherency schemes (such as those that are used within multiprocessor systems) and other operational overhead associated with the management of shared memory resources. Also, using a memory stack in lieu of a larger register file introduces additional overhead associated with address calculations.
  • [0003] Some microprocessors or processors provide a relatively limited register space or architectural register set. For example, the thirty-two bit Intel processor families, which are collectively referred to as IA-32 processors, provide eight thirty-two bit general purpose registers, which are located on-chip. Unfortunately, many compiler optimizations, which are usually used to increase the effective instruction-per-clock-cycle (IPC) rate of processors, typically require more than eight general purpose registers. Additionally, a larger number of registers is generally beneficial because a larger number of registers enables program execution to be carried out using fewer memory-based operations, thereby reducing the overhead associated with accessing stack-based operands and, thus, reducing cache occupation and bandwidth (i.e., cache ports) overhead. Reducing the number of stack-based memory operations performed by a processor can free a substantial amount of cache space and bandwidth for use by other load, store and prefetch instructions, which can substantially increase the IPC rate of the processor.
  • [0004] While it is a relatively simple matter to redesign a processor to have a larger register space, such a processor redesign typically requires changes to the instruction set encodings to enable the redesigned processor to efficiently use the additional register space. Furthermore, instruction set encoding changes are typically not backward compatible with earlier versions of the processor that have a smaller register space.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • [0005] FIG. 1 is a block diagram of an example processor system that uses the extended register space apparatus and methods described herein;
  • [0006] FIG. 2 is a more detailed block diagram of the processor shown in FIG. 1;
  • [0007] FIG. 3 is a block diagram that depicts an example manner in which an instruction encoding can be used by the processor shown in FIGS. 1 and 2 to access an extended register space;
  • [0008] FIG. 4 is a flow diagram that depicts an example manner in which the processor shown in FIGS. 1 and 2 can process an instruction encoding to access an extended register space; and
  • [0009] FIG. 5 is a block diagram that depicts another example manner in which an instruction encoding can be used by the processor shown in FIGS. 1 and 2 to access an extended register space.
    DESCRIPTION OF THE PREFERRED EXAMPLES
  • [0010] FIG. 1 is a block diagram of an example processor system 10 that uses the extended register space apparatus and methods described herein. As shown in FIG. 1, the processor system 10 includes a processor 12 that is coupled to an interconnection bus or network 14. The processor 12 includes an architectural register set or register space 16, which is depicted in FIG. 1 as being entirely on-chip, but which could alternatively be located entirely or partially off-chip and directly coupled to the processor 12 via dedicated electrical connections and/or via the interconnection network or bus 14. The processor 12 may be any suitable processor, processing unit or microprocessor such as, for example, an Intel Itanium™ processor, Intel X-Scale™ processor, Intel Pentium™ processor, etc. However, in the example described in detail below, the processor 12 is a thirty-two bit Intel processor, which is commonly referred to as an IA-32 processor.
  • [0011] In the example shown in FIG. 1, regardless of whether the register space 16 is implemented on-chip, off-chip, or some combination of on-chip and off-chip, the register space 16 is extended to provide more than the eight thirty-two bit general purpose registers that are currently provided by existing IA-32 processors. Although not shown in FIG. 1, the system 10 may be a multi-processor system and, thus, may include one or more additional processors that are identical or similar to the processor 12 and which are coupled to the interconnection bus or network 14.
  • [0012] The processor 12 of FIG. 1 is coupled to a chipset 18, which includes a memory controller 20 and an input/output (I/O) controller 22. As is well known, a chipset typically provides I/O and memory management functions as well as a plurality of general purpose and/or special purpose registers, timers, etc. that are accessible or used by one or more processors coupled to the chipset. The memory controller 20 performs functions that enable the processor 12 (or processors if there are multiple processors) to access a system memory 24, which may include any desired type of volatile memory such as, for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc. The I/O controller 22 performs functions that enable the processor 12 to communicate with peripheral input/output (I/O) devices 26 and 28 via an I/O bus 30. The I/O devices 26 and 28 may be any desired type of I/O device such as, for example, a keyboard, a video display or monitor, a mouse, etc. While the memory controller 20 and the I/O controller 22 are depicted in FIG. 1 as separate functional blocks within the chipset 18, the functions performed by these blocks may be integrated within a single semiconductor circuit or may be implemented using two or more separate integrated circuits.
  • FIG. 2 is a more detailed block diagram of the [0013] processor 12 shown in FIG. 1. In the example of FIG. 2, the register space 16 of the processor 12 includes eight on-chip general purpose registers 36 that are currently provided by existing IA-32 processors and an extended on-chip register space or set of registers 38. In addition, the processor 12 includes instruction processing hardware and/or logic 40 which, in addition to the pipeline hardware provided with known IA-32 processors, includes two decoding blocks 42 and 44 that are adapted to process or decode instructions or portions of an instruction in parallel. Still further, the processor 12 includes microcode 46 that, in addition to enabling the processor 12 to carry out the functions of a known IA-32 processor, enables the processor 12 to utilize the extended register space 38 for carrying out existing IA-32 instruction set encodings.
  • FIG. 3 is a block diagram that depicts an example manner in which an existing or standard IA-32 instruction encoding can be used by the [0014] processor 12 of FIGS. 1 and 2 to access the extended register space 38. As shown in FIG. 3, the encoding fields 50 of a standard instruction for an IA-32 processor include an optional prefix field 52, an opcode field 54, an Mr/m field 56, an Sib field 58, a displacement addressing field 60, and an immediate addressing field 62. Because the IA-32 processor instruction encoding fields 50 shown in FIG. 3 are well known, additional detailed description of these fields is not required. However, for purposes of facilitating an understanding of the examples described herein, some additional description of the purpose and operation of these fields is provided below.
  • The [0015] opcode field 54 contains the binary encoding, which in this example is one-byte or eight bits of encoding, required to carry out a particular processor operation such as, for example, an arithmetic operation, a memory access operation, a register contents manipulation (e.g., shift), or any combination of these operations. The Mr/m field 56, among other things, is a one-byte field that determines the addressing mode to be used in carrying out an instruction (e.g., execution of an instruction by a processor such as the processor 12 shown in FIG. 1). For example, a displacement addressing mode or an immediate addressing mode may be used depending on the status of the bits within the Mr/m field 56. As is known, a displacement addressing mode uses the contents of the displacement field 60 to address an operand associated with an instruction relative to another memory address such as, for example, the starting address of the instruction. On the other hand, an immediate addressing mode uses the immediate addressing field 62 to address an operand associated with the instruction based on the contents of the immediate addressing field 62. In other words, if used, the immediate addressing field 62 typically contains an absolute (as opposed to a relative) memory address, which is associated with an operand of the instruction.
  • The example instruction described in connection with FIG. 3 is an add with carry instruction, which is represented mnemonically as ADC. As is known, the ADC instruction for an IA-32 processor requires two operands, one of which is referred to as a source (SRC) operand and the other of which is referred to as a destination (DEST) operand. With existing IA-32 processors, one of the two operands (i.e., SRC or DEST) must be located within an on-chip register and the other one of the operands may be located within system memory. When executed by an existing IA-32 processor, the ADC instruction results in the summation of the contents of the SRC, the DEST and the carry flag (CF) and storage of the sum in the location associated with the DEST operand. Mnemonically, this operation can be represented as DEST<=DEST+SRC+CF. Thus, the DEST location functions as both an operand and a storage location for the result of the instruction. For processor architectures that allow more than one memory operand, the methods described herein can be individually applied to each memory operand. [0016]
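  • By way of illustration only, the operation DEST<=DEST+SRC+CF can be sketched in software as follows. The function name adc32 and the constant MASK32 are hypothetical names chosen for this sketch and do not appear in the figures; the sketch assumes thirty-two bit operands and a one-bit carry flag.

```python
MASK32 = 0xFFFFFFFF  # operands are assumed to be thirty-two bits wide


def adc32(dest, src, carry_in):
    """Model DEST <= DEST + SRC + CF for thirty-two bit operands.

    Returns the new DEST value and the carry out of bit thirty-one.
    """
    total = (dest & MASK32) + (src & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32


# The sum wraps to zero and produces a carry out.
new_dest, carry_out = adc32(0xFFFFFFFF, 0x00000001, 0)
```

  • The returned carry can be supplied as the carry flag of a following add with carry operation, which is how multi-word additions are typically chained.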
  • When executing an ADC instruction, existing or known IA-32 processors interpret bits three to five of the Mr/[0017] m field 56 as an address for one of the eight known or traditional general purpose on-chip registers (e.g., registers zero to seven). Depending on the particular encodings used for the ADC instruction, the register address represented in the Mr/m field 56 may be either the location of the SRC operand or the DEST operand. In the example depicted in connection with reference numeral 66, the on-chip register is the DEST operand and existing IA-32 processors use the displacement field 60 to address a portion of system memory (e.g., a portion of the memory 24 shown in FIG. 1) for the SRC operand. On the other hand, in the example depicted in connection with reference numeral 68, the on-chip register is the SRC operand and existing IA-32 processors use the displacement field 60 to address system memory for the DEST operand.
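  • As a minimal sketch of the register field extraction described above, the following example reads bits three to five of the Mr/m byte and reports whether the addressed register is the SRC or the DEST operand. The opcode values 0x11 and 0x13 are the published IA-32 encodings for the thirty-two bit register/memory forms of ADC; the function and constant names are hypothetical.

```python
ADC_RM32_R32 = 0x11  # register named by the Mr/m field is the SRC operand
ADC_R32_RM32 = 0x13  # register named by the Mr/m field is the DEST operand


def modrm_register(modrm):
    """Return bits three to five of the Mr/m byte (register zero to seven)."""
    return (modrm >> 3) & 0b111


def register_role(opcode):
    """Report which operand the on-chip register supplies for this opcode."""
    if opcode == ADC_R32_RM32:
        return "DEST"
    if opcode == ADC_RM32_R32:
        return "SRC"
    raise ValueError("not a thirty-two bit register/memory ADC opcode")


# Mr/m byte 0x0D: register field is 001 and the memory operand is a
# thirty-two bit displacement with no base register.
register_number = modrm_register(0x0D)
role = register_role(ADC_R32_RM32)
```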
  • For the example IA-32 processor of FIG. 2, the register space [0018] 16 (FIG. 1) is extended and, thus, contains more than the eight traditional general purpose registers currently provided with IA-32 processors. In the example of FIG. 2, the register space 16 is extended to contain an additional 1024 thirty-two bit registers. However, any other number of additional registers may be used instead. As described in greater detail in connection with FIG. 4 below, the apparatus and methods described herein enable the instruction encoding fields 50 shown in FIG. 3 to access the register space 16 of the processor 12. In particular, when executing an ADC instruction in a displacement addressing mode as depicted in FIG. 3, the processor 12 reads the most significant (i.e., the upper) twenty bits of the displacement field 60 as a page identifier or tag and then compares this page identifier or tag to a predetermined identifier value associated with the extended register space 38. As described in detail in connection with FIG. 4 below, if the page identifier or tag read from the displacement field 60 matches the identifier value associated with the extended register space 38, the processor 12 processes the instruction by using the lower twelve bits of the displacement field 60 to access one of the two operands of the instruction within the extended register space 38.
  • As depicted in FIG. 3, the lower twelve bits or offset of the [0019] displacement field 60 are used as a register index to the extended register space 38. Specifically, bits two to eleven are used to address the 1024 thirty-two bit registers. The lowest two bits (i.e., zero and one) are ignored because these bits correspond to (i.e., may be used to individually address or select) the four bytes making up each thirty-two bit register word. Thus, if bits three to five of the Mr/m field 56 address the SRC operand, then bits two to eleven of the displacement field 60 are used by the processor 12 to address the DEST operand within the extended register space 38. On the other hand, if bits three to five of the Mr/m field 56 address the DEST operand, then bits two to eleven of the displacement field 60 are used by the processor 12 to address the SRC operand within the extended register space 38.
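  • The division of the displacement field into a twenty bit tag and a twelve bit offset, and the derivation of a register index from bits two to eleven of the offset, can be sketched as follows. The tag value 0xFFFFE is purely an assumed example; the description does not specify a particular identifier value, and the function names are hypothetical.

```python
EXT_REG_TAG = 0xFFFFE  # assumed example page identifier, not from the figures


def split_displacement(disp32):
    """Split a thirty-two bit displacement into (tag, offset)."""
    tag = (disp32 >> 12) & 0xFFFFF   # upper twenty bits
    offset = disp32 & 0xFFF          # lower twelve bits
    return tag, offset


def register_index(offset):
    """Use bits two to eleven of the offset; bits zero and one are ignored."""
    return (offset >> 2) & 0x3FF


tag, offset = split_displacement(0xFFFFE01C)
index = register_index(offset) if tag == EXT_REG_TAG else None
```

  • Because the two lowest bits select a byte within a thirty-two bit word, offsets 0x01C through 0x01F all resolve to the same register index in this sketch.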
  • [0020] Although the example described in connection with FIG. 3 uses a single page identifier or tag that corresponds to a four kilobyte page or 1024 thirty-two bit words within the memory map of the processor 12, additional page identifiers or tags could be used to enable the processor 12 to access more than 1024 thirty-two bit registers within the extended register space 38. Likewise, fewer than 1024 thirty-two bit registers may be provided within the extended register space 38, in which case some of the register addresses provided by the lower twelve bits of the displacement field 60 may be unused or ignored. Alternatively, a tag having more than twenty bits may be used to access registers within the extended register space 38. In that case, the offset or register index portion of the displacement field 60 would have fewer than twelve bits and, thus, would enable addressing and access to fewer than 1024 thirty-two bit registers. Additionally, although the example depicted in FIG. 3 is based on an add with carry instruction, any other instruction using memory operands could be used instead. Still further, while the example depicted in FIG. 3 is based on using an instruction set for an IA-32 processor, other instruction sets associated with other processor types could be used instead. In particular, for implementations based on these other instruction sets and processor types, the fields associated with the native register address and memory address would be used instead of the IA-32 fields “Mr/m” and “displacement.”
  • In the example described in connection with FIGS. [0021] 1-3, the processor 12 is an IA-32 processor and the register space 16 includes the eight general purpose on-chip registers that are traditionally provided by known IA-32 processors and an additional 1024 thirty-two bit on-chip registers, which have not previously been provided with IA-32 processors. To enable the processor 12 to access the extended register space 38 using instruction encodings compatible with existing IA-32 processors (i.e., processors which do not have the extended register space 38), the processor 12 includes microarchitecture (e.g., microcode) for causing the processor 12 to carry out the instruction processing technique described in detail in connection with FIG. 4 below. In addition, the operating system (OS) and/or basic input/output system (BIOS) of the computer system 10 is configured so that the memory map of the system 10 reserves the memory page associated with the extended register space 38 for exclusive use by the processor 12. In other words, the memory page identifier that would normally be used by existing IA-32 processors to address a physical page of memory within the system memory 24 is instead used exclusively by the processor 12 (i.e., is not shared by other resources within the system 10) to address registers within the extended register space 38.
  • [0022] FIG. 4 is a flow diagram that depicts an example manner in which the processor 12 shown in FIGS. 1 and 2 can process existing or standard IA-32 instruction encodings to access the extended register space 38. In particular, the flow diagram shown in FIG. 4 depicts an example manner in which the front-end instruction processing pipeline within the instruction processing hardware or logic 40 of the processor 12 is configured to operate when processing a standard IA-32 instruction encoding such as, for example, the instruction depicted in FIG. 3. As shown in FIG. 4, the processor 12 accesses the cache (block 100), fetches the next instruction to be processed (block 102) and decodes the length of the instruction to be processed (block 104). As is known, decoding the length of an instruction enables a processor to parse the instruction into its component encoding fields (i.e., opcode field, Mr/m field, displacement field, etc.). The instruction to be processed by the processor 12 is then decoded (blocks 106 and 108), renamed (block 110) and then queued for execution (block 112). It should be recognized that the activities associated with blocks 100-112 of FIG. 4 are currently employed by existing IA-32 processors and, thus, are well known and are not described in greater detail herein.
  • The [0023] processor 12 is adapted to perform additional activities in parallel to the instruction processing activities associated with blocks 100-112 described in connection with FIG. 4. The processor 12 uses the decoding blocks 42 and 44 to carry out the decoding activities associated with blocks 106 and 108. In addition, the decoding blocks 42 and 44 are used to determine whether the page identifier or tag portion of the displacement field 60 matches an identifier value or tag associated with the extended register space 38 of the processor 12 (block 114). If the tag portion of the displacement field 60 does not match the tag associated with the extended register space 38 of the processor 12, then the decoding hardware or logic performing parallel decoding (i.e., in parallel to blocks 106 and 108) of the instruction currently being processed takes no further action in connection with the instruction. On the other hand, if the page identifier or tag portion of the displacement field 60 does match the tag associated with the extended register space 38, then the processor 12 uses one of the decoders 42 and 44 to decode (block 116) the register pointer bits (i.e., bits three to five) of the Mr/m field 56 and the register index bits (i.e., the lower twelve bits) of the displacement field 60 to determine whether the SRC operand or DEST operand is located within the extended register space 38 and, thus, is to be addressed by the register index portion of the displacement field 60.
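  • The tag match (block 114) and register decode (block 116) activities described above can be modeled, in a simplified and sequential form, as follows. This is only a software sketch of logic that the description places in parallel decoding hardware; the tag value, the ExtendedAccess type and the function name are assumptions made for illustration.

```python
from typing import NamedTuple, Optional

EXT_REG_TAG = 0xFFFFE  # assumed example page identifier


class ExtendedAccess(NamedTuple):
    extended_operand: str      # "SRC" or "DEST" operand held in the extended space
    extended_index: int        # register index taken from the displacement field
    traditional_register: int  # register addressed by bits three to five of Mr/m


def register_decode(modrm, disp32, reg_is_dest,
                    tag=EXT_REG_TAG) -> Optional[ExtendedAccess]:
    """Return the decoded access, or None when the tag does not match."""
    # Block 114: compare the upper twenty bits with the extended space tag.
    if (disp32 >> 12) & 0xFFFFF != tag:
        return None  # ordinary memory operand; no further action is taken
    # Block 116: decode the register pointer bits and the register index bits.
    traditional = (modrm >> 3) & 0b111
    index = (disp32 & 0xFFF) >> 2
    # The operand not held in the traditional register is the extended one.
    extended = "SRC" if reg_is_dest else "DEST"
    return ExtendedAccess(extended, index, traditional)


hit = register_decode(0x0D, 0xFFFFE01C, reg_is_dest=True)
miss = register_decode(0x0D, 0x0040101C, reg_is_dest=True)
```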
  • [0024] As can be seen from the example in FIG. 4, the number of clock cycles required to decode an instruction that utilizes the extended register space 38 can be minimized by providing additional decoding hardware and/or logic that performs register decoding operations (e.g., block 116) in parallel to instruction decoding activities (e.g., blocks 106 and 108). For example, with the example processor 12 shown in FIG. 2, one of the decoders 42 and 44 can be used for register decoding operations while the other one of the decoders 42 and 44 is used for instruction decoding activities. However, the addressing mode used by the instruction affects the extent to which instruction decoding and register decoding operations can be performed in parallel. For instance, for the example instruction shown and described in connection with FIG. 3, displacement addressing is used. With displacement addressing, an operand address is directly encoded within the instruction (i.e., within the displacement field 60 and/or the Mr/m field 56), thereby enabling substantial parallel processing of the encoding fields within the instruction.
  • In the case where the page identifier or tag portion of the [0025] displacement field 60 is contained within a register (i.e., the tag value is stored in the register) such as, for example, addressing that uses indirection through a base register, the technique shown in FIG. 4 may be used to compare (block 114) the value stored in the base register to the tag or value associated with the extended register space 38. However, such a comparison may be speculative because the comparison is performed at the front-end of the instruction processing pipeline and a subsequent processor operation could change the value stored in the base register. Thus, with indirect or other more complex addressing modes, the processor 12 is preferably configured to track changes to the base register and, upon recognition of changes to the base register value, restart any instruction affected by the change. In any event, changes to the page identifier or tag portion (i.e., the upper twenty bits) of the base register are a relatively rare occurrence and, thus, instruction restarts and the like would have a minimal impact on overall execution speed or the effective IPC rate of the processor 12.
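  • One possible way to model the tracking of base register changes described above is sketched below. The class BaseRegisterTracker and its methods are hypothetical and are not taken from the figures; the sketch assumes that only a change to the upper twenty bits of the base register invalidates a speculative tag comparison.

```python
class BaseRegisterTracker:
    """Record speculative tag comparisons and report instructions to restart."""

    def __init__(self):
        # instruction id -> (base register name, tag observed at decode time)
        self.speculated = {}

    def record(self, instr_id, base_reg, base_value):
        """Note the tag seen in the base register when the comparison was made."""
        self.speculated[instr_id] = (base_reg, (base_value & 0xFFFFFFFF) >> 12)

    def on_write(self, base_reg, new_value):
        """Return ids of instructions whose observed tag is no longer valid."""
        new_tag = (new_value & 0xFFFFFFFF) >> 12
        stale = [instr_id
                 for instr_id, (reg, tag) in self.speculated.items()
                 if reg == base_reg and tag != new_tag]
        for instr_id in stale:
            del self.speculated[instr_id]
        return stale


tracker = BaseRegisterTracker()
tracker.record(1, "EBX", 0xFFFFE000)
unchanged = tracker.on_write("EBX", 0xFFFFE010)  # only the offset bits changed
restarted = tracker.on_write("EBX", 0x12345000)  # the tag bits changed
```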
  • From the above example, it can be seen that a standard or known IA-32 instruction set or encodings can be used to enable an IA-32 processor having an extended register space (e.g., the [0026] extended register space 38 of the processor 12) to use that extended register space to store operand values that would traditionally be stored within system memory (e.g., within off-chip shared memory). The use of register-based operations in place of operations that would otherwise be memory-based reduces the use of stack-based operations and other memory access overhead, thereby resulting in an increased IPC rate for the processor having the extended register space.
  • Software written for a processor having an extended register set such as the example processor described in connection with FIGS. [0027] 1-4 above is backward compatible with (i.e., can run natively on or can be executed by) an existing IA-32 processor having only the eight traditional on-chip general purpose registers. To enable such backward compatibility, software or instructions utilizing the extended register set are compiled so that an instruction requiring access to a register within the extended register set is reduced to a memory access operation. However, the BIOS and/or OS executed by the existing IA-32 processor must ensure that the system memory used as register space is available to the existing IA-32 processor. In other words, if software is written for use by an IA-32 processor having an additional 1024 thirty-two bit on-chip registers, executing this software on a currently available IA-32 processor having only eight on-chip general purpose registers requires the BIOS and/or OS of the existing IA-32 processor to map a page (i.e., 1024 thirty-two bit words) with the same base address as the extended register tag within its system memory. However, executing software that makes use of the extended register space 38 on an existing IA-32 processor does not provide a performance advantage (e.g., an increased IPC rate) because operands addressed within the extended register space physically reside within system memory and, thus, accessing these operands involves memory operations and the processing overhead associated therewith.
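  • To illustrate why such software reduces to ordinary memory accesses on an existing IA-32 processor, the following sketch computes the memory address that corresponds to a given extended register index when the page is mapped at the base address formed from the tag. The tag value and the function name are assumptions made for this example.

```python
EXT_REG_TAG = 0xFFFFE  # assumed example page identifier


def legacy_address(index, tag=EXT_REG_TAG):
    """Address of the thirty-two bit word that stands in for a register."""
    page_base = (tag & 0xFFFFF) << 12        # base address of the mapped page
    return page_base | ((index & 0x3FF) << 2)  # four bytes per register word


address = legacy_address(7)
```

  • In this sketch, register index seven corresponds to the word at address 0xFFFFE01C, which is the same displacement value that a processor having the extended register space would interpret as a register access.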
  • As noted above, the [0028] extended register space 38 provided within the processor 12 can be more or less than 1024 thirty-two bit words (e.g., more than one page) if desired. For example, in a case where the processor 12 is executing a single thread or process that uses multiple pages of register space within the extended register space 38, the tag match or comparison (block 114) shown in FIG. 4 compares the tag portion of the displacement field 60 of each instruction executed in the thread to identifier values or tags that correspond to the multiple pages of register space. If any one of the identifiers or tags matches the tag portion of the displacement field 60, the processor 12 carries out the register decoding (block 116) as described in connection with FIG. 4 above.
  • [0029] On the other hand, in a case where the processor 12 uses its operating system to carry out multiple threads or processes, each thread or process can be associated with a different page identifier or tag so that each thread or process has its own page of register space. Thus, in the case where the processor 12 is executing multiple threads or processes, each of which is associated with a different page identifier or tag, the tag match or comparison (block 114) shown in FIG. 4 compares the tag portion of the displacement field 60 to the identifier associated with the page used for the current thread or process.
  • [0030] Still further, the processor 12 may execute multiple threads or processes where some or all of those threads or processes use a plurality of pages within the extended register space 38. In other words, there may be multiple threads and each of those threads may have access to more than one page within the extended register space 38. In this case, the tag match or comparison (block 114) compares the tag portion of the displacement field 60 to the identifier values or tags associated with the current thread.
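  • The single-thread, multi-thread and multi-page cases described above can all be expressed as a comparison against the set of tags associated with the current thread. The following is a minimal sketch under that assumption; the table layout, thread names and tag values are hypothetical.

```python
def tag_matches(disp32, thread_id, tags_by_thread):
    """Block 114 generalized: compare against every tag of the current thread."""
    tag = (disp32 >> 12) & 0xFFFFF
    return tag in tags_by_thread.get(thread_id, frozenset())


# Thread "A" owns two pages of register space and thread "B" owns one.
tags_by_thread = {
    "A": frozenset({0xFFFFE, 0xFFFFD}),
    "B": frozenset({0xFFFFC}),
}

a_hit = tag_matches(0xFFFFD010, "A", tags_by_thread)
b_miss = tag_matches(0xFFFFD010, "B", tags_by_thread)
```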
  • For single- or multi-threaded processors (i.e., processors that execute multiple processes simultaneously) that have the extended [0031] register space 38, the operating system is preferably adapted to save and restore the extended register space 38 for each thread or process in response to a context switch (i.e., when switching from execution of one process or thread to another process or thread). Additionally, an efficient transfer of operands between the eight traditional on-chip general purpose registers and the extended register space 38 can be implemented by mapping the traditional registers into the extended register space 38. Alternatively, the eight traditional registers associated with known IA-32 processors may be kept physically and logically separate from the extended register space 38 and specific encodings of the Mr/m field 56 can be used to indicate that a source or destination operand is located in one of the eight traditional on-chip registers.
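  • A save and restore of the extended register space on a context switch could be modeled as shown below. This is a simplified sketch rather than an operating system implementation; the class, the function and the use of a dictionary as the save area are assumptions, and a thread that has never run is assumed to start with zeroed registers.

```python
class ExtendedRegisterSpace:
    """Software stand-in for the extended register space."""

    def __init__(self, count=1024):
        self.registers = [0] * count


def context_switch(space, save_area, outgoing, incoming):
    """Save the outgoing thread's registers and restore the incoming thread's."""
    save_area[outgoing] = list(space.registers)
    fresh = [0] * len(space.registers)
    space.registers = list(save_area.get(incoming, fresh))


space = ExtendedRegisterSpace(count=4)
save_area = {}
space.registers[2] = 99                      # thread "A" writes a register
context_switch(space, save_area, "A", "B")   # switch to thread "B"
space.registers[0] = 5                       # thread "B" writes a register
context_switch(space, save_area, "B", "A")   # switch back to thread "A"
```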
  • [0032] Further optimization of the use of the extended register space 38 can be achieved with processors having trace cache-based microarchitectures. In particular, when a processor having a trace cache-based microarchitecture identifies an instruction that requires access to the extended register space 38, information relating to that instruction and the extended register space to which it requires access can be stored in the microcode trace to enable more efficient processing of that instruction during subsequent invocations of the instruction.
  • FIG. 5 is a block diagram that depicts another example manner in which instruction encoding fields [0033] 150 of a standard IA-32 instruction can be used by the processor 12 shown in FIG. 1 to access the extended register space 38. As shown in FIG. 5, the example instruction is composed using standard IA-32 processor instruction encoding fields (i.e., the encoding fields that are used with IA-32 processors having only eight on-chip general purpose registers). As with the instruction shown in FIG. 3, the example encoding fields 150 include a prefix field 152, an opcode field 154, an Mr/m byte or field 156, an Sib field 158, a displacement addressing field 160 and an immediate addressing field 162.
  • [0034] As depicted in FIG. 5, bits three to five of the Mr/m field 156 and an offset portion (i.e., bits zero to eleven) 163 of the displacement field 160 are used by the processor 12 to access three operands within three different registers. In the example shown in FIG. 5, bits within the Mr/m field 156 and the offset portion 163 of the displacement field 160 are decoded as a three operand add with carry (ADC) instruction 164. However, the principles depicted in FIG. 5 could be applied to any other instruction. Mnemonically, the ADC instruction 164 can be depicted as DEST<=SRC1+SRC2+CF.
  • To process the instruction shown in FIG. 5, the [0035] processor 12 executes the register decode process (block 116 of FIG. 4) so that bits three to five of the Mr/m field 156 and bits ten and eleven of the offset 163 are used to address the destination (DEST) operand, bits five to nine of the offset 163 are used to address the first source operand (SRC1) and bits zero to four of the offset 163 are used to address the second source operand (SRC2). Thus, each of the three operands shown in FIG. 5 is represented by a five-bit value and, as a result, each of the operands can randomly access any one of thirty-two registers located in the extended register space 38 of the processor 12.
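  • The three operand field assignment described above can be sketched as follows. The description does not state how the three Mr/m bits and the two offset bits are ordered within the five-bit DEST value; this sketch assumes that the Mr/m bits form the upper three bits and offset bits ten and eleven form the lower two bits. The function name is hypothetical.

```python
def decode_three_operands(modrm, disp32):
    """Return (DEST, SRC1, SRC2) as five-bit register numbers."""
    offset = disp32 & 0xFFF                 # offset portion, bits zero to eleven
    modrm_bits = (modrm >> 3) & 0b111       # bits three to five of the Mr/m field
    offset_high = (offset >> 10) & 0b11     # bits ten and eleven of the offset
    dest = (modrm_bits << 2) | offset_high  # assumed bit ordering, see above
    src1 = (offset >> 5) & 0b11111          # bits five to nine of the offset
    src2 = offset & 0b11111                 # bits zero to four of the offset
    return dest, src1, src2


# Offset 0x8E3 carries 0b10 in bits ten and eleven, seven in bits five to
# nine and three in bits zero to four; Mr/m byte 0x0D has register field 001.
dest, src1, src2 = decode_three_operands(0x0D, 0xFFFFE8E3)
```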
  • [0036] The example manner of enabling the processor 12 to access an extended register space depicted in FIG. 5 is similar to the technique depicted in FIG. 4. However, as can be seen from a comparison of FIGS. 3 and 5, the manner in which the bits of the displacement field are decoded in the example of FIG. 3 enables native backward compatibility of software written using the standard IA-32 encodings on known IA-32 processors.
  • On the other hand, software written using the standard IA-32 instruction encodings for a processor such as that shown in the example of FIG. 5 is not natively backward compatible with known IA-32 processors. However, backward compatibility can be achieved by using a modified exception handler. In particular, because the tag field of the pseudo memory displacement points to an unmapped memory address, the fault handler can be used to inspect an instruction that is attempting to access this unmapped memory, and emulate the functionality of the instruction. Upon completion, the fault handler returns program execution to the instruction following the emulated instruction. Of course, a substantial performance penalty is incurred as a result of using a fault handler to emulate each software instruction that attempts to access the extended register space within a processor that does not have the extended register space. [0037]
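  • The emulation performed by such a fault handler might be modeled as in the following sketch, which emulates the three operand add with carry operation DEST<=SRC1+SRC2+CF against a software-held copy of thirty-two registers. The tag value, the state layout, the function name and the assumed ordering of the DEST bits (Mr/m bits as the upper three bits) are all assumptions made for illustration and are not taken from the figures.

```python
MASK32 = 0xFFFFFFFF
EXT_REG_TAG = 0xFFFFE  # assumed tag that points to an unmapped page


def handle_fault(state, modrm, disp32, instr_length):
    """Emulate the faulting instruction; return False for an unrelated fault."""
    if (disp32 >> 12) & 0xFFFFF != EXT_REG_TAG:
        return False  # a genuine memory fault, not an extended register access
    offset = disp32 & 0xFFF
    dest = (((modrm >> 3) & 0b111) << 2) | ((offset >> 10) & 0b11)
    src1 = (offset >> 5) & 0b11111
    src2 = offset & 0b11111
    total = state["ext"][src1] + state["ext"][src2] + state["cf"]
    state["ext"][dest] = total & MASK32
    state["cf"] = total >> 32
    # Resume at the instruction following the emulated instruction.
    state["eip"] += instr_length
    return True


state = {"ext": [0] * 32, "cf": 1, "eip": 0x1000}
state["ext"][7] = 0xFFFFFFFF
# One opcode byte, one Mr/m byte and four displacement bytes: six bytes.
handled = handle_fault(state, 0x0D, 0xFFFFE8E3, 6)
```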
  • Although certain methods and apparatus implemented in accordance with the teachings of the invention have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all embodiments of the teachings of the invention fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents. [0038]

Claims (45)

What is claimed is:
1. A method of accessing an extended register space associated with a processor, the method comprising:
comparing a first portion of a first encoding field of an instruction to a value associated with the extended register space; and
associating a first operand of the instruction with a second portion of the first encoding field if the first portion of the first encoding field matches the value associated with the extended register space.
2. The method of claim 1, wherein comparing the first portion of the first encoding field to the value associated with the extended register space includes comparing a portion of a displacement field of the instruction to the value associated with the extended register space.
3. The method of claim 2, wherein comparing the portion of the displacement field of the instruction to the value associated with the extended register space includes comparing a page identifier portion of the displacement field to a page identifier associated with the extended register space.
4. The method of claim 3, wherein comparing the page identifier portion of the displacement field to the page identifier associated with the extended register space includes comparing a predetermined number of most significant bits of the displacement field to the page identifier associated with the extended register space.
5. The method of claim 1, wherein associating the first operand of the instruction with the second portion of the first encoding field if the first portion of the first encoding field matches the value associated with the extended register space includes associating one of a source and a destination operand with the second portion of the first encoding field.
6. The method of claim 1, further including configuring a memory associated with the processor so that a portion of the memory corresponding to the extended register space is used exclusively by the processor.
7. The method of claim 6, further including storing information from the extended register space within the portion of the memory corresponding to the extended register space in response to a context switch.
8. The method of claim 1, further including associating a portion of a second encoding field of the instruction with a second operand of the instruction.
9. The method of claim 8, wherein associating the portion of the second encoding field with the second operand includes associating a portion of an Mr/m field with one of a source and a destination operand.
10. The method of claim 1, further including associating third and fourth portions of the first encoding field and a portion of a second encoding field of the instruction with second and third operands if the first portion of the first encoding field matches the value associated with the extended register space.
11. A method of accessing a register space associated with a processor, the method comprising:
comparing a first portion of a displacement field of an instruction to a value associated with the register space; and
associating an operand of the instruction with a second portion of the displacement field if the first portion of the displacement field matches the value associated with the register space.
12. The method of claim 11, wherein comparing the first portion of the displacement field to the value associated with the register space includes comparing a tag portion of the displacement field to a tag associated with the register space.
13. The method of claim 12, wherein comparing the tag portion of the displacement field to the tag associated with the register space includes comparing a predetermined number of most significant bits of the displacement field to the tag associated with the register space.
14. The method of claim 11, wherein associating the operand of the instruction with the second portion of the displacement field if the first portion of the displacement field matches the value associated with the register space includes associating one of a source and a destination operand with the second portion of the displacement field.
15. The method of claim 11, further including configuring a system memory so that a portion of the system memory corresponding to the register space is not shared.
16. The method of claim 11, further including associating third and fourth portions of the displacement field and a portion of an Mr/m field with second and third operands if the first portion of the displacement field matches the value associated with the register space.
17. A method of processing an instruction requiring access to a register space associated with a processor, the method comprising:
comparing a tag portion of a first encoding field of the instruction to a value associated with the register space; and
decoding the instruction to associate first and second operands of the instruction with respective first and second registers within the register space.
18. The method of claim 17, wherein comparing the tag portion of the first encoding field of the instruction to the value associated with the register space includes comparing a portion of a displacement field to the value associated with the register space.
19. The method of claim 17, wherein decoding the instruction to associate the first and second operands of the instruction with the respective first and second registers includes associating a register index portion of the first encoding field with the first operand and a portion of a second encoding field of the instruction with the second operand.
20. The method of claim 19, wherein associating the register index portion of the first encoding field with the first operand and the portion of the second encoding field with the second operand includes associating a portion of a displacement field with the first operand and a portion of an Mr/m field with the second operand.
21. The method of claim 17, further including tracking changes within a base register and restarting the instruction in response to detection of a change affecting the instruction.
22. The method of claim 17, further including comparing the tag portion of the first encoding field of the instruction to a plurality of values, each of which is associated with a portion of the register space.
23. The method of claim 17, further including saving the information stored within the register space in response to a context switch.
24. The method of claim 17, further including mapping registers not located within the register space into the register space.
25. The method of claim 17, further including using a microcode trace to store information associated with accessing the register space.
26. The method of claim 17, further including emulating the functionality of the instruction within a fault handler in response to an attempt by the instruction to access an unmapped memory address.
27. A processor, comprising:
a register space;
an instruction decoding pipeline; and
microcode adapted to cause the processor to process an instruction requiring access to the register space within the instruction decoding pipeline so that a first portion of a first encoding field of the instruction is compared to a value associated with the register space and so that a first operand of the instruction is associated with a second portion of the first encoding field if the first portion of the first encoding field matches the value associated with the register space.
28. The processor of claim 27, wherein the register space includes first and second portions and wherein the first portion contains fewer registers than the second portion.
29. The processor of claim 27, wherein the register space is physically integrated within the processor.
30. The processor of claim 27, wherein the instruction decoding pipeline includes a plurality of decoders adapted to perform parallel decoding of the instruction.
31. The processor of claim 27, wherein the first encoding field is a displacement addressing field and wherein the first portion of the first encoding field is one of a memory page identifier and a tag.
32. The processor of claim 27, wherein the second portion of the first encoding field is one of a register index and an offset.
33. A computer system, comprising:
a memory controller;
a system memory coupled to the memory controller; and
a processor having a register space and coupled to the memory controller, wherein the processor is programmed to process an instruction requiring access to the register space so that a first portion of an encoding field of the instruction is compared to a value associated with the register space and so that an operand of the instruction is associated with a second portion of the encoding field if the first portion of the encoding field matches the value associated with the register space.
34. The computer system of claim 33, wherein a portion of the register space corresponds to a page of the system memory.
35. The computer system of claim 33, wherein the register space is physically integrated within the processor.
36. The computer system of claim 33, wherein the register space includes first and second portions and wherein the first portion contains fewer registers than the second portion.
37. A method of processing computer readable instructions, the method comprising:
providing a first processor having a first number of registers; and
defining a field in an instruction set associated with a second processor having a second number of registers less than the first number of registers so that an instruction from the instruction set that accesses an off-chip resource when executed by the second processor accesses an on-chip resource when executed by the first processor.
38. The method of claim 37, wherein providing the first processor having the first number of registers includes providing a processor having more than eight general purpose registers.
39. The method of claim 37, wherein defining the field in the instruction set includes defining a tag within a displacement field of the instruction set.
40. The method of claim 39, wherein defining the field in the instruction set associated with the second processor having the second number of registers less than the first number of registers so that the instruction from the instruction set that accesses the off-chip resource when executed by the second processor accesses the on-chip resource when executed by the first processor includes using a portion of the field to access the on-chip resource.
41. A method of executing an instruction, comprising:
providing a processor with an extended set of on-chip registers;
encoding the instruction with an address of a first register in the extended set of on-chip registers; and
executing the instruction using only data from the first register and data from a second on-chip register associated with the processor.
42. The method of claim 41, wherein the second register is in the extended set of registers.
43. The method of claim 41, wherein the second register is in a second set of on-chip registers associated with the processor.
44. A processor, comprising:
a first set of registers;
an extended set of registers; and
a decoder to decode an instruction so that the instruction is executed using only data from at least one of the first set and extended set of registers.
45. The processor of claim 44, wherein the decoder comprises:
a first decoder to decode an opcode portion of an instruction; and
a second decoder to decode a tag portion of the instruction at substantially the same time the first decoder decodes the opcode portion of the instruction.
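
The tag comparison and operand association recited in claims 17 to 20 can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the 32-bit displacement width, the 12-bit split between tag and register index, the constants REGISTER_SPACE_TAG and NUM_EXTENDED_REGS, and the names decode_register_access and DecodedOperands are all hypothetical and do not appear in the claims. The only fixed fact relied on is that the reg field of an x86 ModR/M byte occupies bits 5 to 3.

```python
from dataclasses import dataclass
from typing import Optional

# Every constant here is an illustrative assumption, not a value from the claims.
TAG_SHIFT = 12                      # assume the low 12 bits carry the register index
INDEX_MASK = (1 << TAG_SHIFT) - 1   # mask selecting the register index portion
REGISTER_SPACE_TAG = 0xFFFE0        # hypothetical value associated with the register space
NUM_EXTENDED_REGS = 64              # hypothetical number of registers in the register space


@dataclass(frozen=True)
class DecodedOperands:
    first_reg: int   # taken from the register index portion of the displacement field
    second_reg: int  # taken from the reg bits of the ModR/M byte


def decode_register_access(displacement: int, modrm: int) -> Optional[DecodedOperands]:
    """Return register operands when the displacement tag matches, else None.

    None stands for "treat the instruction as an ordinary memory access".
    """
    displacement &= 0xFFFFFFFF            # assume a 32-bit displacement field
    tag = displacement >> TAG_SHIFT       # tag portion of the first encoding field
    if tag != REGISTER_SPACE_TAG:
        return None                       # no match: not a register space access

    index = displacement & INDEX_MASK     # register index portion of the same field
    if index >= NUM_EXTENDED_REGS:
        return None                       # assumption: out-of-range indices fall back to memory

    second = (modrm >> 3) & 0b111         # reg field, bits 5..3 of the ModR/M byte
    return DecodedOperands(first_reg=index, second_reg=second)


if __name__ == "__main__":
    print(decode_register_access(0xFFFE0005, 0x15))  # tag matches
    print(decode_register_access(0x00001000, 0x15))  # tag does not match
```

With these assumed constants, a displacement of 0xFFFE0005 matches the tag and selects register index 5, while a displacement of 0x00001000 does not match and is left to the normal memory path. Comparing against a plurality of values, as in claim 22, would amount to replacing the single equality test with a lookup over several tags.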
US10/238,276 2002-09-10 2002-09-10 Extended register space apparatus and methods for processors Abandoned US20040049657A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/238,276 US20040049657A1 (en) 2002-09-10 2002-09-10 Extended register space apparatus and methods for processors
US11/830,473 US7676654B2 (en) 2002-09-10 2007-07-30 Extended register space apparatus and methods for processors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/238,276 US20040049657A1 (en) 2002-09-10 2002-09-10 Extended register space apparatus and methods for processors

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/830,473 Continuation US7676654B2 (en) 2002-09-10 2007-07-30 Extended register space apparatus and methods for processors

Publications (1)

Publication Number Publication Date
US20040049657A1 (en) 2004-03-11

Family

ID=31990940

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/238,276 Abandoned US20040049657A1 (en) 2002-09-10 2002-09-10 Extended register space apparatus and methods for processors
US11/830,473 Expired - Fee Related US7676654B2 (en) 2002-09-10 2007-07-30 Extended register space apparatus and methods for processors

Country Status (1)

Country Link
US (2) US20040049657A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040255094A1 (en) * 2003-06-11 2004-12-16 Arm Limited Address offset generation within a data processing system
US20050198466A1 (en) * 2004-03-08 2005-09-08 Estlick Michael D. Partial address compares stored in translation lookaside buffer
US20060155911A1 (en) * 2005-01-13 2006-07-13 Ahmed Gheith Extended register microprocessor
US7694110B1 (en) * 2003-07-08 2010-04-06 Globalfoundries Inc. System and method of implementing microcode operations as subroutines
US20110047355A1 (en) * 2009-08-24 2011-02-24 International Business Machines Corporation Offset Based Register Address Indexing
CN102750187A (en) * 2012-07-11 2012-10-24 北京联时空网络通信设备有限公司 Out-of-process interaction method and device
JP2015514242A (en) * 2012-03-15 2015-05-18 インターナショナル・ビジネス・マシーンズ・コーポレーション (International Business Machines Corporation) Convert discontinuous instruction specifier to continuous instruction specifier
US20160188382A1 (en) * 2014-12-24 2016-06-30 Elmoustapha Ould-Ahmed-Vall Systems, apparatuses, and methods for data speculation execution
US9785442B2 (en) 2014-12-24 2017-10-10 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10061589B2 (en) 2014-12-24 2018-08-28 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10061583B2 (en) 2014-12-24 2018-08-28 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10387156B2 (en) 2014-12-24 2019-08-20 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10387158B2 (en) 2014-12-24 2019-08-20 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10942744B2 (en) 2014-12-24 2021-03-09 Intel Corporation Systems, apparatuses, and methods for data speculation execution

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8060724B2 (en) * 2008-08-15 2011-11-15 Freescale Semiconductor, Inc. Provision of extended addressing modes in a single instruction multiple data (SIMD) data processor
US9507599B2 (en) 2013-07-22 2016-11-29 Globalfoundries Inc. Instruction set architecture with extensible register addressing

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4942541A (en) * 1988-01-22 1990-07-17 Oms, Inc. Patchification system
US5142634A (en) * 1989-02-03 1992-08-25 Digital Equipment Corporation Branch prediction
US5301285A (en) * 1989-03-31 1994-04-05 Hitachi, Ltd. Data processor having two instruction registers connected in cascade and two instruction decoders
US5303356A (en) * 1990-05-04 1994-04-12 International Business Machines Corporation System for issuing instructions for parallel execution subsequent to branch into a group of member instructions with compoundability in dictation tag
US5420984A (en) * 1992-06-30 1995-05-30 Genroco, Inc. Apparatus and method for rapid switching between control of first and second DMA circuitry to effect rapid switching beween DMA communications
US5506980A (en) * 1991-10-22 1996-04-09 Hitachi, Ltd. Method and apparatus for parallel processing of a large data array utilizing a shared auxiliary memory
US5630163A (en) * 1991-08-09 1997-05-13 Vadem Corporation Computer having a single bus supporting multiple bus architectures operating with different bus parameters
US5680632A (en) * 1992-12-24 1997-10-21 Motorola, Inc. Method for providing an extensible register in the first and second data processing systems
US5845331A (en) * 1994-09-28 1998-12-01 Massachusetts Institute Of Technology Memory system including guarded pointers
US5881217A (en) * 1996-11-27 1999-03-09 Hewlett-Packard Company Input comparison circuitry and method for a programmable state machine
US6134676A (en) * 1998-04-30 2000-10-17 International Business Machines Corporation Programmable hardware event monitoring method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3781811A (en) * 1967-09-14 1973-12-25 Tokyo Shibaura Electric Co Memory protective systems for computers
JPH0784851A (en) * 1993-09-13 1995-03-31 Toshiba Corp Shared data managing method
US5625276A (en) * 1994-09-14 1997-04-29 Coleman Powermate, Inc. Controller for permanent magnet generator
US5845129A (en) * 1996-03-22 1998-12-01 Philips Electronics North America Corporation Protection domains in a single address space
US5903919A (en) * 1997-10-07 1999-05-11 Motorola, Inc. Method and apparatus for selecting a register bank
US6014739A (en) * 1997-10-27 2000-01-11 Advanced Micro Devices, Inc. Increasing general registers in X86 processors
US6230259B1 (en) * 1997-10-31 2001-05-08 Advanced Micro Devices, Inc. Transparent extended state save
US7085889B2 (en) * 2002-03-22 2006-08-01 Intel Corporation Use of a context identifier in a cache memory

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040255094A1 (en) * 2003-06-11 2004-12-16 Arm Limited Address offset generation within a data processing system
US7120779B2 (en) * 2003-06-11 2006-10-10 Arm Limited Address offset generation within a data processing system
US7694110B1 (en) * 2003-07-08 2010-04-06 Globalfoundries Inc. System and method of implementing microcode operations as subroutines
US20050198466A1 (en) * 2004-03-08 2005-09-08 Estlick Michael D. Partial address compares stored in translation lookaside buffer
US7206916B2 (en) * 2004-03-08 2007-04-17 Sun Microsystems, Inc. Partial address compares stored in translation lookaside buffer
US20060155911A1 (en) * 2005-01-13 2006-07-13 Ahmed Gheith Extended register microprocessor
US7231509B2 (en) * 2005-01-13 2007-06-12 International Business Machines Corporation Extended register bank allocation based on status mask bits set by allocation instruction for respective code block
US20110047355A1 (en) * 2009-08-24 2011-02-24 International Business Machines Corporation Offset Based Register Address Indexing
JP2015514242A (en) * 2012-03-15 2015-05-18 インターナショナル・ビジネス・マシーンズ・コーポレーション (International Business Machines Corporation) Convert discontinuous instruction specifier to continuous instruction specifier
CN102750187A (en) * 2012-07-11 2012-10-24 北京联时空网络通信设备有限公司 Out-of-process interaction method and device
US20160188382A1 (en) * 2014-12-24 2016-06-30 Elmoustapha Ould-Ahmed-Vall Systems, apparatuses, and methods for data speculation execution
CN107003853A (en) * 2014-12-24 2017-08-01 英特尔公司 (Intel Corporation) Systems, apparatuses, and methods for data speculation execution
US9785442B2 (en) 2014-12-24 2017-10-10 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10061589B2 (en) 2014-12-24 2018-08-28 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10061583B2 (en) 2014-12-24 2018-08-28 Intel Corporation Systems, apparatuses, and methods for data speculation execution
TWI657371B (en) * 2014-12-24 2019-04-21 美商英特爾股份有限公司 (Intel Corporation) Systems, apparatuses, and methods for data speculation execution
US10303525B2 (en) * 2014-12-24 2019-05-28 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10387156B2 (en) 2014-12-24 2019-08-20 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10387158B2 (en) 2014-12-24 2019-08-20 Intel Corporation Systems, apparatuses, and methods for data speculation execution
US10942744B2 (en) 2014-12-24 2021-03-09 Intel Corporation Systems, apparatuses, and methods for data speculation execution

Also Published As

Publication number Publication date
US7676654B2 (en) 2010-03-09
US20070266227A1 (en) 2007-11-15

Similar Documents

Publication Publication Date Title
US7676654B2 (en) Extended register space apparatus and methods for processors
EP1320800B1 (en) Cpu accessing an extended register set in an extended register mode and corresponding method
US7827390B2 (en) Microprocessor with private microcode RAM
US5517651A (en) Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes
EP3343353A1 (en) Processors, methods, systems, and instructions to check and store indications of whether memory addresses are in persistent memory
US9710385B2 (en) Method and apparatus for accessing physical memory from a CPU or processing element in a high performance manner
CN111124498A (en) Apparatus and method for speculative execution side channel mitigation
US20040030870A1 (en) Software breakpoints with tailoring for multiple processor shared memory or multiple thread systems
EP3550437B1 (en) Adaptive spatial access prefetcher apparatus and method
US9317285B2 (en) Instruction set architecture mode dependent sub-size access of register with associated status indication
EP1910919A2 (en) Instruction cache having fixed number of variable length instructions
US20210073144A1 (en) Processing method and apparatus for translation lookaside buffer flush instruction
EP3629155A1 (en) Processor core supporting a heterogeneous system instruction set architecture
CN113535236A (en) Method and apparatus for instruction set architecture based and automated load tracing
US5903919A (en) Method and apparatus for selecting a register bank
KR101528130B1 (en) System, apparatus, and method for segment register read and write regardless of privilege level
EP4202663A1 (en) Asymmetric tuning
EP4160406A1 (en) User-level interprocessor interrupts
US11907712B2 (en) Methods, systems, and apparatuses for out-of-order access to a shared microcode sequencer by a clustered decode pipeline
WO2021061626A1 (en) Instruction executing method and apparatus
US20220179949A1 (en) Compiler-directed selection of objects for capability protection
US20230205605A1 (en) Dynamic asymmetric resources
EP4155911A1 (en) Shared prefetch instruction and support
EP3905034A1 (en) A code prefetch instruction
US11275588B2 (en) Context save with variable save state size

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KLING, RALPH M.;REEL/FRAME:013519/0090

Effective date: 20020829

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION