US20050257037A1

US20050257037A1 - Controlling execution of a block of program instructions within a computer processing system

Info

Publication number: US20050257037A1
Application number: US11/121,184
Authority: US
Inventors: Matthew Elwood; Vladimir Vasekin
Original assignee: ARM Ltd
Current assignee: ARM Ltd
Priority date: 2003-04-04
Filing date: 2005-05-04
Publication date: 2005-11-17
Also published as: GB2400198B; TW200421174A; AU2003292401A1; US7386709B2; GB0307821D0; GB2400198A; WO2004088504A1; US20040199754A1

Abstract

A data processing apparatus and method are disclosed. The data processing apparatus comprises: an instruction fetching circuit operable to fetch a sequence of program instructions from a sequence of memory locations; an instruction decoder responsive to program instructions within the sequence of program instructions fetched by the instruction fetching circuit to control data processing operations specified by the program instructions; and an execution circuit operable under control of the instruction decoder to execute the data processing operations, wherein the instruction decoder is responsive to an execute block instruction within the sequence of program instructions to trigger fetching of a block of two or more program instructions by the instruction fetching circuit and execution of the block of two or more program instructions by the execution circuit, the block of two or more instructions containing a number of program instructions specified by a block length field within the execute block instruction and being stored at a memory location specified by a location field within the execute block instruction. The apparatus further comprises execute block instruction logic operable in response to the execute block instruction to store an indication of a memory location of an instruction following the execute block instruction and to determine which instruction in the block of two or more program instructions is being processed, the execute block instruction logic being further operable when it is determined that a last instruction in the block of two or more program instructions is being processed to provide to the instruction fetching circuit the indication of the memory location of the instruction following the execute block instruction so that the instruction following the execute block instruction is fetched for execution immediately following the last instruction in the block of two or more program instructions.
Providing the indication of the memory location of the instruction following the execute block instruction to the instruction fetching circuit causes the fetch unit to fetch that instruction so that the correct sequence of instructions is fetched by the fetch unit which avoids the need to flush instructions.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates to the field of data processing systems. More particularly, the present invention relates to the control of execution of a block of program instructions within a data processing system.
2. Description of the Prior Art
It is known that computer programs often contain sequences of program instructions that are frequently repeated within the computer program. In order to produce a computer program with a smaller code size, it is known to arrange such blocks of computer program instructions into functions or subroutines which can be called from various positions within the computer program.
It is normal for such subroutines to terminate with a return instruction which commands the data processing system to return to the instruction immediately following the point in the computer program from where the call to the subroutine was made. When the subroutine or block of instructions is short in length, then the overhead of providing a return instruction at the end of the subroutine can form a significant proportion of the size of the subroutine itself. As an example, if the subroutine block of program instructions being called is only three instructions in length, then the necessary return instruction at the end of the block increases this length to four instructions and results in a significant increase in code size when this is repeated across a large number of such subroutines which may be included within a computer program as a whole.
It is also known within IBM370 architecture computers to provide an execute instruction which commands the system to execute an instruction found at a memory location specified by the execute instruction. This type of operation can be considered as a one-for-one replacement of the execute instruction within the program code by a different instruction pointed to by that execute instruction. This type of functionality is particularly useful for debugging and diagnostic purposes but does not yield significant code density improvements.
It is also known to provide data processing systems including a dictionary function whereby an instruction in the program is a dictionary instruction which triggers a reference to be made to a stored dictionary table where there is a pointer to a memory location storing a sequence of program instructions to be executed in response to that dictionary function. The dictionary table may also include an indication of the length of that block of instructions. The dictionary table approach has the disadvantage that an additional memory construct, namely the dictionary table, needs to be provided within the data processing system as well as additional registers for keeping track of full length memory addresses for the dictionary instruction and the position within the block of program instructions called. In the context of blocks of program instructions which are very short in length, the storage requirements of the dictionary table entries relating to those small blocks of program instructions form a significant proportion of the storage requirements for those blocks of instructions in a manner which is disadvantageous.
A further disadvantage of the dictionary table approach is that it is a more radical change to an existing data processing system architecture if it is to be added to such an existing data processing system architecture.
It is desired to provide a technique whereby frequently repeated sequences of program instructions within a computer program can be executed by a data processing apparatus in an efficient manner whilst minimising the degree of any architectural changes which may be required.

SUMMARY OF THE INVENTION

Viewed from one aspect the present invention provides a data processing apparatus comprising: an instruction fetching circuit operable to fetch a sequence of program instructions from a sequence of memory locations; an instruction decoder responsive to program instructions within the sequence of program instructions fetched by the instruction fetching circuit to control data processing operations specified by the program instructions; and an execution circuit operable under control of the instruction decoder to execute the data processing operations, wherein the instruction decoder is responsive to an execute block instruction within the sequence of program instructions to trigger fetching of a block of two or more program instructions by the instruction fetching circuit and execution of the block of two or more program instructions by the execution circuit, the block of two or more instructions containing a number of program instructions specified by a block length field within the execute block instruction and being stored at a memory location specified by a location field within the execute block instruction, the apparatus further comprising execute block instruction logic operable in response to the execute block instruction to store an indication of a memory location of an instruction following the execute block instruction and to determine which instruction in the block of two or more program instructions is being processed, the execute block instruction logic being further operable when it is determined that a last instruction in the block of two or more program instructions is being processed to provide to the instruction fetching circuit the indication of the memory location of the instruction following the execute block instruction so that the instruction following the execute block instruction is fetched for execution immediately following the last instruction in the block of two or more program instructions.
The present technique recognises that for a large number of blocks of program instructions that can advantageously be the subject of calls from different points within a program, the return instruction represents a significant overhead. Combined with this is the realisation that such small blocks of program instructions rarely need to include therein branch instructions such that when they are started they will with a high degree of probability always be run to their conclusion, i.e. result in a fixed number of program instructions being fetched and executed. Accordingly, the execute block instruction provided by the present technique specifies within the execute block instruction both the location of the block of program instructions to be executed as well as the length of that block of program instructions. Accordingly, there is no need for the block of program instructions to include a return instruction, since the length of the block is already known as specified within the execute block instruction and the return to the main program can be triggered when the final instruction within the block of program instructions has been executed. This execute block instruction extends the advantages of program instruction calls to small blocks of program instructions. The technique is also particularly well suited to use by program compilers which can identify frequently occurring small blocks of instructions within a program image and replace these by execute block instructions. The occurrence of a block in the normal code can be used as the target of branch instructions without the need to separately store the block of instructions elsewhere.
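The code-size argument above can be made concrete with a rough tally. The following is a minimal sketch only, under assumptions that are not part of the description: every instruction occupies one slot, a subroutine call costs one instruction per use, and one occurrence of the block is left in place in the normal code and reused by execute block instructions at the other sites. The function names are hypothetical.

```python
def inline_size(block_len: int, uses: int) -> int:
    """Slots used when the block is simply repeated at every use."""
    return block_len * uses


def subroutine_size(block_len: int, uses: int) -> int:
    """Slots used when the block is a subroutine ending in a return.

    One copy of the body, one return instruction, one call per use.
    """
    return (block_len + 1) + uses


def execute_block_size(block_len: int, uses: int) -> int:
    """Slots used with execute block instructions (assumed scheme).

    One occurrence stays in the normal code; each other use is replaced
    by a single execute block instruction. No return is stored.
    """
    return block_len + (uses - 1)


if __name__ == "__main__":
    # Three-instruction block, as in the example discussed in the background.
    for uses in (2, 5, 10):
        print(uses,
              inline_size(3, uses),
              subroutine_size(3, uses),
              execute_block_size(3, uses))
```

Under these assumptions the execute block form is always two slots smaller than the subroutine form, which matters most when the block itself is only a few instructions long.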
Also, the present invention further recognises that even when the last instruction in the block of instructions is not itself a branch instruction, this instruction can be viewed as a branch instruction in addition to the encoded data processing operation. This is because, in typical operation, a return will occur to an instruction following the execute block instruction. Hence, a non-sequential instruction or operation will be required to be executed following execution of that last instruction. However, it will be appreciated that the last instruction itself would not typically provide an indication of the memory location of the instruction following the execute block instruction or contain any information which would normally enable such an indication to be provided. Also, because the last instruction is typically not an instruction which would normally be interpreted as being encoded as a branch, the likelihood that executing that instruction will result in a branch would not normally ever be considered as a possibility and, hence, the fetch unit would typically assume that it is required to linearly access instructions following the last instruction. It will be appreciated that once the last instruction is executed, a recognition will be made that a non-sequential instruction needs to be executed next. However, this instruction would not have been fetched by the fetch unit and so any following instructions will need to be flushed whilst the required instruction is fetched and processed. It will be appreciated that this situation will typically occur relatively frequently since, in order to maximise code compression, the occurrence of execute block instructions may be high. Accordingly, the number of flushes that would need to be performed would also be relatively high, which may have an adverse effect on overall performance.
Accordingly, execute block instruction logic is provided which stores an indication of a memory location of an instruction following the execute block instruction. The execute block instruction logic determines when the last instruction in the block of two or more program instructions is being processed. When it is determined that the last instruction in the block of two or more program instructions is being processed then an indication of the memory location of an instruction following the execute block instruction is provided to the instruction fetching circuit. Providing the indication of the memory location of the instruction following the execute block instruction to the instruction fetching circuit causes the fetch unit to fetch that instruction. It will be appreciated that in some fetch units which take multiple cycles to fetch instructions or where the fetch unit fetches blocks of instructions, one or more instructions linearly following the last instruction may still be fetched by the fetch unit but these instructions will not be executed. In this way, the correct sequence of instructions is fetched by the fetch unit which avoids the need to flush instructions, with all the attendant performance implications which result from such flushing. Hence, it will be appreciated that the provision of the execute block logic enables the execute block instruction to be implemented in an efficient manner whilst minimising the degree of any architectural changes which may be required.
In embodiments, the execute block instruction logic is operable in response to the execute block instruction to store an indication of the memory location specified by the location field within the execute block instruction, the execute block instruction logic being further operable when it is determined that the execute block instruction is being processed to provide to the instruction fetching circuit the indication of the memory location specified by the location field within the execute block instruction so that instructions in the block of two or more program instructions are fetched for execution immediately following the execute block instruction.
Accordingly, execute block instruction logic is provided which stores an indication of a memory location of an instruction pointed to by the location field within the execute block instruction. The execute block instruction logic determines when the execute block instruction is being processed. When it is determined that the execute block instruction is being processed then an indication of the memory location of the instruction referred to by the execute block instruction is provided to the instruction fetching circuit. Providing the indication of the memory location of the instruction referenced by the execute block instruction to the instruction fetching circuit causes the fetch unit to fetch that instruction. As mentioned previously, in fetch units which take multiple cycles to fetch instructions or where the fetch unit fetches blocks of instructions, one or more instructions sequentially following the execute block instruction may still be fetched by the fetch unit but these instructions will not be executed. In this way, the correct sequence of instructions is fetched by the fetch unit which avoids the need to flush instructions, with all the attendant performance implications which result from such flushing.
In embodiments, the execute block instruction logic comprises storage operable to store the indication of the memory location of the instruction following the execute block instruction, which indication being associated as a target memory location with the last instruction in the block of two or more program instructions.
In embodiments, the storage is further operable to store the indication of the memory location specified by the location field within the execute block instruction, which indication being associated as a target memory location with the execute block instruction.
It will be appreciated that the storage could be provided anywhere in the data processing apparatus. In embodiments, the storage is a branch target buffer, associated with the fetch circuit.
In embodiments, the storage contains a number of entries, each entry having a field for storing an indication of the memory location of an instruction and an associated field for storing an indication of a corresponding target memory location.
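The entry layout just described can be pictured as a small lookup structure. The following is a minimal sketch, not the disclosed circuit: it assumes fixed 4-byte instructions, keys entries on the instruction address alone, and uses hypothetical names (`Entry`, `TargetStore`, `fetch_sequence`) and illustrative addresses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

INSTR_SIZE = 4  # assumption: fixed-length 32-bit instructions


@dataclass
class Entry:
    instr_addr: int   # memory location of an instruction
    target_addr: int  # target memory location associated with it


class TargetStore:
    """Toy stand-in for the storage (for example a branch target buffer)."""

    def __init__(self) -> None:
        self._entries: Dict[int, Entry] = {}

    def record(self, instr_addr: int, target_addr: int) -> None:
        self._entries[instr_addr] = Entry(instr_addr, target_addr)

    def lookup(self, instr_addr: int) -> Optional[int]:
        entry = self._entries.get(instr_addr)
        return entry.target_addr if entry is not None else None


def fetch_sequence(store: TargetStore, start: int, count: int) -> List[int]:
    """Addresses fetched when stored targets redirect the fetch unit."""
    addrs: List[int] = []
    addr = start
    for _ in range(count):
        addrs.append(addr)
        target = store.lookup(addr)
        # Redirect to the stored target if there is one, else go sequentially.
        addr = target if target is not None else addr + INSTR_SIZE
    return addrs


# Execute block instruction at 0x100 calling a 3-instruction block at 0x40.
store = TargetStore()
store.record(0x100, 0x40)   # execute block instruction -> start of block
store.record(0x48, 0x104)   # last block instruction -> instruction after it
trace = fetch_sequence(store, 0xFC, 7)
```

Because the redirects are available at fetch time, the fetched addresses already follow the intended order and nothing needs to be flushed. The sketch redirects on address alone; deciding whether the last instruction of a block is actually being processed as part of an execute block call is left to the rest of the execute block instruction logic.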
Hence, the storage is arranged to readily associate instructions with target instruction memory locations. In embodiments, each entry comprises further fields.
In embodiments, the storage comprises a return stack operable to store the indication of the memory location of the instruction following the execute block instruction, which indication being associated as the target memory location with the last instruction in the block of two or more program instructions.
Such a return stack will typically be arranged to store the indications in a first-in, last-out arrangement. The return stack may or may not be shared with normal subroutines and, if not shared, may degenerate to a single entry (single execute block) return buffer.
In embodiments, the execute block instruction logic is further operable to push the indication of the memory location of the instruction following the execute block instruction onto the return stack in response to the execute block instruction.
Accordingly, the stack will provide an indication of the memory location of the instruction following the last execute block instruction encountered.
In embodiments, the execute block instruction logic is operable, in the event that it is determined that the last instruction in the block of two or more program instructions is being processed and the storage provides a target memory location associated with the last instruction in the block of two or more program instructions, to disregard the target memory location associated with the last instruction in the block of two or more program instructions and instead to pop the return stack and to provide to the instruction fetching circuit the indication of the memory location popped from the return stack.
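The reason for preferring the popped value becomes clearer when the same block is called from two places. The following is a minimal sketch with hypothetical names and addresses, assuming 4-byte instructions: the target stored against the last instruction of the block still points back to the first call site, whereas the return stack holds the address for the call actually in progress.

```python
from typing import List, Optional

INSTR_SIZE = 4  # assumption: fixed-length 32-bit instructions


class ReturnStack:
    """First-in, last-out store of return addresses."""

    def __init__(self) -> None:
        self._addrs: List[int] = []

    def push(self, addr: int) -> None:
        self._addrs.append(addr)

    def pop(self) -> int:
        return self._addrs.pop()


def on_execute_block(stack: ReturnStack, emb_addr: int) -> None:
    # Push the location of the instruction following the execute block instruction.
    stack.push(emb_addr + INSTR_SIZE)


def on_last_instruction(stack: ReturnStack, stored_target: Optional[int]) -> int:
    # The stored target may be stale, so it is deliberately disregarded
    # and the return stack is popped instead.
    return stack.pop()


stack = ReturnStack()

on_execute_block(stack, 0x100)                     # first call site
first = on_last_instruction(stack, None)

on_execute_block(stack, 0x200)                     # second call site, same block
second = on_last_instruction(stack, first)         # stored target still says 0x104
```

Here `first` is the address after the first call site and `second` is the address after the second, even though the stored target offered for the second call was the stale one.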
In embodiments, the execute block instruction logic comprises interrupt handling logic which, in the event of an interrupt, stores an indication of the memory location of the execute block instruction and an indication of which instruction in the block of two or more program instructions is being executed when the interrupt occurs to enable upon completion of handling of the interrupt restarting execution of the block of two or more program instructions at a program instruction within the block of two or more instructions indicated by the indication of the memory location of the execute block instruction and the indication of which instruction in the block of two or more program instructions is being executed.
Storing an indication of the memory location of the execute block instruction and an indication of which instruction in the block of two or more program instructions is being executed enables execution of the instructions to be restarted at the appropriate point once the interrupt handling has completed.
In embodiments, the interrupt handling logic prevents the pushing of the indication of the memory location of the instruction following the execute block instruction onto the return stack in response to the execute block instruction being processed following the interrupt.
Preventing the pushing of the indication of the memory location of the instruction following the execute block instruction onto the return stack following completion of the handling of the interrupt prevents duplication (of the entry corresponding to the EMB instruction) on the return stack which would otherwise occur and result in non-optimal operation.
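The interaction between interrupt handling and the return stack can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, instructions are assumed to be 4 bytes, and a single `resuming` flag stands in for whatever mechanism an implementation would use to recognise that the execute block instruction is being reprocessed after an interrupt.

```python
from typing import List, Optional, Tuple

INSTR_SIZE = 4  # assumption: fixed-length 32-bit instructions


class ExecuteBlockLogic:
    def __init__(self) -> None:
        self.return_stack: List[int] = []
        self.saved: Optional[Tuple[int, int]] = None
        self.resuming = False

    def on_execute_block(self, emb_addr: int) -> None:
        if self.resuming:
            # Reprocessing after an interrupt: the return address is
            # already on the stack, so the push is suppressed.
            self.resuming = False
            return
        self.return_stack.append(emb_addr + INSTR_SIZE)

    def on_interrupt(self, emb_addr: int, block_index: int) -> None:
        # Remember the execute block instruction and the position in the block.
        self.saved = (emb_addr, block_index)

    def on_interrupt_return(self) -> Tuple[int, int]:
        if self.saved is None:
            raise RuntimeError("no interrupted execute block to resume")
        saved, self.saved = self.saved, None
        self.resuming = True
        return saved


logic = ExecuteBlockLogic()
logic.on_execute_block(0x100)        # pushes 0x104
logic.on_interrupt(0x100, 2)         # interrupted at the third block instruction
emb_addr, block_index = logic.on_interrupt_return()
logic.on_execute_block(emb_addr)     # reprocessed: no second push
```

After the sequence above the return stack holds a single entry for the call, rather than two.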
In embodiments, the execute block instruction logic comprises prediction logic operable to receive an indication of each instruction being fetched by the instruction fetching circuit and to determine whether that instruction has associated therewith a target memory address and, if so, to provide the target memory address to the instruction fetching circuit so that the instruction at that memory address is fetched for execution immediately following that instruction.
It will be appreciated that this functionality may advantageously be provided by standard branch prediction logic.
In embodiments, the data processing apparatus further comprises prediction prevention logic operable to prevent prediction logic from providing the target memory address to the instruction fetching circuit for non-branch encoded instructions, the execute block instruction logic comprising prediction prevention override logic operable to inhibit operation of the prediction prevention logic when the last instruction in the block of two or more program instructions is being processed.
Accordingly, when reusing existing hardware which prevents non-branch instructions from causing the prediction logic to determine whether a target memory address is associated therewith, prediction prevention override logic is provided which overrides this restriction when the last instruction in the block is being processed. This enables that instruction to be treated in a manner analogous to a branch instruction despite there being no decoded control signals associated with that instruction to indicate this.
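The gating described in the two preceding paragraphs reduces to a small piece of combinational logic. The following is a minimal sketch with hypothetical names; the stored targets are modelled as a plain dictionary and the two boolean inputs stand in for signals that real hardware would derive from decode and from the block counter.

```python
from typing import Dict, Optional


def predicted_target(targets: Dict[int, int],
                     addr: int,
                     is_branch_encoded: bool,
                     is_last_block_instr: bool) -> Optional[int]:
    """Return the target passed to the fetch circuit, or None for no redirect."""
    # Prediction prevention logic: non-branch encodings may not redirect.
    prevented = not is_branch_encoded
    # Prediction prevention override logic: lift the restriction for the
    # last instruction of an executed block.
    if is_last_block_instr:
        prevented = False
    if prevented:
        return None
    return targets.get(addr)


targets = {0x48: 0x104}  # last block instruction -> instruction after the call

blocked = predicted_target(targets, 0x48, False, False)
allowed = predicted_target(targets, 0x48, False, True)
```

With the override inactive the non-branch instruction at 0x48 yields no redirect; with the override active the stored target is passed on.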
In embodiments, the execute block instruction logic comprises a counter operable to provide an indication of which instruction in the block of two or more program instructions is being processed.
In embodiments, the execute block instruction logic is operable when it is determined that a last instruction in the block of two or more program instructions is being processed to provide to the instruction fetching circuit the indication of the memory location of the instruction immediately following the execute block instruction.
It will be appreciated that whilst it is possible that a block of program instructions being called could include a branch overriding the action of the execute block instruction and any return calculated from the length of the block specified in the execute block instruction, some embodiments utilise the length of the block as specified in the execute block instruction to trigger a return to a program instruction outside of the block of program instructions once the execution of the block of program instructions has completed.
Whilst the return could be made to a variety of different program locations, such as specified in a final instruction of the block of program instructions, it is normal and advantageous that the return should be made by default to a program instruction immediately following the execute block instruction within the sequence of memory locations storing the main computer program.
The location field within the execute block instruction may specify the location of the block of program instructions in a variety of different ways, such as an absolute address value, but advantageously uses an offset field as this is typically more space efficient and can be embedded within the size of bit field available within the execute block instruction.
In embodiments there may be provided a program counter to store an address of a memory location of a program instruction being executed together with a block counter register storing a block count value indicative of a location of a program instruction being executed within a called block of program instructions. In this type of arrangement, when a call is made to a block of program instructions, the program counter register stores the memory location of the execute block instruction whilst the block counter register stores a block counter value indicative of the position within the block of program instructions that has been reached.
The use of a block count register and a program counter register in the above way is particularly useful during exception handling whereby an exception routine can be triggered and a return made after exception handling to the point within the block of instructions that was interrupted in dependence upon the program counter register referencing the execute block instruction and then reference to the block counter value to find the point reached within the block of program instructions.
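How the two registers together identify the interrupted instruction can be shown with a one-line calculation. The following is a minimal sketch, assuming 4-byte instructions and a byte offset that is subtracted from the program counter value; the function name and the `emb_offset_bytes` parameter are hypothetical.

```python
from typing import Optional

INSTR_SIZE = 4  # assumption: fixed-length 32-bit instructions


def executing_address(pc: int,
                      block_count: int,
                      emb_offset_bytes: Optional[int] = None) -> int:
    """Address of the instruction currently being executed.

    Outside a called block (emb_offset_bytes is None) this is simply pc.
    Inside a called block, pc holds the address of the execute block
    instruction; its offset field locates the block and block_count
    selects the instruction within it.
    """
    if emb_offset_bytes is None:
        return pc
    block_start = pc - emb_offset_bytes
    return block_start + block_count * INSTR_SIZE


# Execute block instruction at 0x100, block 0xC0 bytes earlier, third instruction.
resume_at = executing_address(0x100, 2, 0xC0)
```

On return from an exception only the program counter value and the block count value need to be restored; the block location is recovered by reading the execute block instruction again.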
Viewed from another aspect the present invention provides a method of processing data, comprising fetching a sequence of program instructions from a sequence of memory locations with an instruction fetching circuit; controlling data processing operations specified by the program instructions with an instruction decoder responsive to program instructions within the sequence of program instructions fetched by the instruction fetching circuit; and executing the data processing operations with an execution circuit under control of the instruction decoder, wherein the instruction decoder is responsive to an execute block instruction within the sequence of program instructions to trigger fetching of a block of two or more program instructions by the instruction fetching circuit and execution of the block of two or more program instructions by the execution circuit, the block of two or more instructions containing a number of program instructions specified by a block length field within the executed block instruction and being stored at a memory location specified by a location field within the execute block instruction, the method further comprising, storing, in response to the execute block instruction, an indication of a memory location of an instruction following the execute block instruction and determining which instruction in the block of two or more program instructions is being processed, and when it is determined that a last instruction in the block of two or more program instructions is being processed, providing to the instruction fetching circuit the indication of the memory location of the instruction following the execute block instruction so that the instruction following the execute block instruction is fetched for execution immediately following the last instruction in the block of two or more program instructions.
The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates a data processing apparatus of a type suitable for executing execute block instructions;
FIG. 2 schematically illustrates a call by an execute block instruction;
FIG. 3 is a flow diagram schematically illustrating the execution of a block of program instructions in a non-pipelined environment;
FIG. 4 schematically illustrates the interruption of a block of program instructions that has been called;
FIG. 5 schematically illustrates the architecture of a general purpose computer according to embodiments;
FIG. 6 illustrates an arrangement of a data processing apparatus according to an embodiment;
FIGS. 7a to 7c illustrate a preferred arrangement of the BTB shown in FIG. 6; and
FIG. 8 shows an example code sequence incorporating an execute block instruction.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows a data processing apparatus 2 including a register bank 4, a multiplier 6, a shifter 8, an adder 10, an instruction pipeline 12, an instruction decoder 14, a prefetch unit 16, a program counter register 18 and an interrupt controller 20. It will be appreciated that the data processing apparatus 2 as illustrated in FIG. 1 will typically include many further circuit elements, but these have been omitted for the sake of clarity. In operation, instructions are fetched from a memory under control of the prefetch unit 16 and a memory location as specified in the program counter register 18 into the fetch stage of the instruction pipeline 12. The instructions progress along the instruction pipeline 12 to a decode stage and then to an execute stage in accordance with normal microprocessing techniques. The instruction decoder 14 decodes the program instructions in the decode stage and generates control signals which are used to configure the circuit elements, such as the register bank 4, the multiplier 6, the shifter 8 and the adder 10, to perform specified data processing operations. The register bank 4, the multiplier 6, the shifter 8 and the adder 10 can be considered to be an execute circuit for executing processing operations as specified by program instructions and under control of the control signals generated by the instruction decoder 14. The interrupt controller 20 is responsive to interrupt signals irq to interrupt normal processing activity and trigger execution of an exception handling routine. The interrupt controller 20 forces the prefetch unit 16 to start fetching instructions from the start of the exception handling routine. Upon completion of the exception handling routine, the previous processing is resumed with a restore being made of the processor state at the point at which the interrupt occurred.
In accordance with the present technique, an execute block instruction is added which specifies an offset address (in this example a negative offset to the current program count value, but may also be a positive offset address or a non-relative address) to a block of program instructions to be fetched. The block has a length specified within the execute block instruction (e.g. up to 16 instructions as specified by a 4-bit field). The instruction decoder 14 has a block counter register 22 added to it to keep track of the number of instructions from the called block of instructions that have been executed so that when the end of that block has been reached the prefetch unit 16 can be commanded to restart fetching instructions from a memory location immediately following the execute block instruction within the initial program flow.
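The fields of such an instruction can be sketched as a simple pack and unpack pair. The layout below is entirely hypothetical and is not taken from the description: the length field is assumed to hold the block length minus one (so that four bits can express up to 16 instructions), the offset is assumed to be counted in instructions and to occupy 20 bits, and opcode bits are omitted.

```python
from typing import Tuple

LEN_BITS = 4       # 4-bit block length field, as in the example above
OFFSET_BITS = 20   # assumption: width of the offset field
INSTR_SIZE = 4     # assumption: fixed-length 32-bit instructions


def encode_emb_fields(offset_instrs: int, length: int) -> int:
    """Pack the offset and block length into one integer (hypothetical layout)."""
    if not 2 <= length <= (1 << LEN_BITS):
        raise ValueError("block length must be between 2 and 16")
    if not 0 <= offset_instrs < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in the offset field")
    # Assumption: the length is stored as length - 1.
    return (offset_instrs << LEN_BITS) | (length - 1)


def decode_emb_fields(fields: int) -> Tuple[int, int]:
    """Recover (offset_instrs, length) from the packed fields."""
    length = (fields & ((1 << LEN_BITS) - 1)) + 1
    offset_instrs = fields >> LEN_BITS
    return offset_instrs, length


def block_start(pc: int, offset_instrs: int) -> int:
    """Start of the block for a negative offset from the program count value."""
    return pc - offset_instrs * INSTR_SIZE


fields = encode_emb_fields(48, 3)
offset_instrs, length = decode_emb_fields(fields)
start = block_start(0x100, offset_instrs)
```

An offset relative to the program count value needs far fewer bits than a full address, which is what allows both fields to fit inside a single instruction word.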
FIG. 2 illustrates an area of main program code 24 which is being sequentially executed. The program instructions are fetched in turn by the prefetch unit 16 to the instruction pipeline 12 where they are decoded and then executed. The program counter register 18 keeps a track of the address value corresponding to the program instruction currently being executed. In this example the program instructions are 32-bit instructions and accordingly the program counter value increments between instructions in steps of four bytes.
An execute block instruction (EMB: Execute Macroblock Instruction) 26 is located within the main code 24 at a PC value of “x”. This execute block instruction includes within it as fields an offset value being a negative memory offset to a starting location of a reference block of program instructions 28 together with a length field specifying the number of program instructions (y) within the block of program instructions 28. The instruction decoder 14 is responsive to the execute block instruction 26 to start fetching program instructions, using the prefetch unit 16, from the new memory location pointed to by the offset field. These instructions are then executed. During execution of this block of program instructions the program counter value stored within the program counter register 18 is not incremented but is instead held at the value corresponding to the execute block instruction 26 itself, namely (x). The block counter register 22 is incremented by the instruction decoder starting from a value of zero up to a value of (y−1) corresponding to the end of the block of program instructions 28. When the instruction decoder 14 detects that the block counter value has reached a value of (y−1), indicating that the final instruction of a block of the length originally specified within the execute block instruction 26 has been reached, the instruction decoder 14 then forces the return to the instruction immediately following the execute block instruction 26 by controlling the prefetch unit 16 to fetch that instruction (at location x+4) into the instruction pipeline 12.
As an alternative to the normal return behaviour, it is possible that a branch instruction may be embedded within the block of program instructions 28 directing a branch to an instruction at another memory location. If such a branch occurs, then it serves to clear the pending execute block instruction behaviour and processing proceeds starting from the target of the branch instruction in the normal way. It is also possible to have such a branch instruction as the last instruction within the block of program instructions 28 to trigger a return to a point in the main program code 24 other than the instruction immediately following the execute block instruction 26.
FIG. 3 is a flow diagram schematically illustrating the behaviour in a non-pipeline system (i.e. not the pipeline system of FIG. 1) of a data processing system responsive to an execute block instruction. At step 30 the instruction decoder detects whether an execute block instruction has been received. When such an execute block instruction has been received, processing proceeds to step 32 at which the address offset value and block length are read from the execute block instruction. At step 34 the first instruction from the block of program instructions pointed to by the current microPC (block counter register 22) is loaded into the system for decoding and execution, starting from the memory location pointed to by the current program counter value minus the specified offset value. At step 36 the current program instruction from the block of program instructions is executed.
It will be appreciated that an external interrupt, as will be discussed later, can occur before the step 36 or an internal interrupt during the step 36. Whilst the occurrence of such external and internal interrupts is conventional, the way in which the return location after interrupt servicing is tracked is different when using the present technique and will be discussed in relation to FIG. 4.
After step 36, the block counter value is incremented at step 38 and then step 40 determines whether the last instruction within the block of program instructions has yet been reached. If the last instruction has not yet been reached, then processing proceeds to step 42 at which the next instruction is loaded into the system and a return is made to step 36. If the last instruction has been reached, then processing proceeds to step 44 at which a return is made to the main program code in which the execute block instruction occurred at a location immediately following that execute block instruction and the microPC value is set to zero.
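The flow of FIG. 3 may be sketched, purely as a behavioural model, as follows. The memory layout, the mnemonics and the tuple encoding of the execute block instruction are hypothetical placeholders, and interrupts and embedded branches are not modelled.

```python
INSTRUCTION_BYTES = 4


def run(memory, start_pc):
    """Run `memory` from `start_pc`; return a trace of
    (program counter, block counter, mnemonic) per executed instruction."""
    pc, block_counter, trace = start_pc, 0, []
    while pc in memory:
        op = memory[pc]
        if op[0] == "EMB":                        # step 30: execute block detected
            _, offset, length = op                # step 32: read offset and length
            base = pc - offset
            while True:
                # steps 34 and 42: load the instruction indexed by the block counter
                mnemonic = memory[base + INSTRUCTION_BYTES * block_counter][0]
                trace.append((pc, block_counter, mnemonic))  # step 36: execute
                block_counter += 1                # step 38: increment block counter
                if block_counter == length:       # step 40: last instruction reached?
                    break
            block_counter = 0                     # step 44: reset and return
        else:
            trace.append((pc, block_counter, op[0]))
        pc += INSTRUCTION_BYTES                   # next instruction of the main code
    return trace


# Hypothetical program: a three-instruction block at 0x080 called from 0x104.
memory = {
    0x080: ("LDR",), 0x084: ("MUL",), 0x088: ("STR",),
    0x100: ("ADD",), 0x104: ("EMB", 0x84, 3), 0x108: ("SUB",),
}
trace = run(memory, 0x100)
```

The trace shows the program counter held at the address of the execute block instruction while the block counter advances, and execution resuming at the following instruction once the block completes.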
FIG. 4 schematically illustrates the occurrence of an interrupt during execution of a block of program instructions called by an execute block instruction. Firstly, an execute block instruction occurs at point 46 triggering the start of execution of the block of program instructions at point 48. During the execution of the block of program instructions, the program counter value is held at a value corresponding to the execute block instruction whilst the block counter value is incremented to indicate the position within the block of program instructions concerned. An interrupt occurs during the execution of the block of program instructions and this triggers the start of execution of interrupt handling code at step 50. The state of the data processing apparatus 2 is saved for later restarting by saving the program counter value and the block counter value, as well as other state variables in the normal way. When the interrupt handling code has finished, these saved program counter values and block counter values are restored. The program counter values can be saved in accordance with the normal exception handling mechanisms of processors such as the ARM processors designed by ARM Limited of Cambridge, England. The block counter value can be saved in the context of such processors by using a bit field within the PSR register to be saved as a saved program status register configuration parameter. If the blocks of program instructions that may be called using an execute block instruction are restricted in length to a maximum size of 16 instructions, then the block counter value need only be four bits in length, which can be conveniently represented by a small bit field within the PSR register of an ARM processor.
Termination of the interrupt handling code at point 52 serves to trigger the restoring of the program counter value and the block counter value and resumption of the block of program instructions at point 54. After the block of program instructions has completed execution at point 56, a return is made to the main program code at point 58.
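Saving a four-bit block counter in a bit field of a status register may be sketched as below. The bit position `FIELD_SHIFT` and the function names are assumptions made for illustration; the description does not specify which bits of the PSR register are used.

```python
FIELD_SHIFT = 20  # hypothetical bit position; not specified in the description
FIELD_MASK = 0xF  # four bits, matching the four-bit block counter in the text


def save_block_counter(psr, block_counter):
    """Return a copy of `psr` with the block counter stored in the bit field."""
    if not 0 <= block_counter <= FIELD_MASK:
        raise ValueError("block counter does not fit in four bits")
    cleared = psr & ~(FIELD_MASK << FIELD_SHIFT)
    return cleared | (block_counter << FIELD_SHIFT)


def restore_block_counter(saved_psr):
    """Extract the block counter from a saved status register value."""
    return (saved_psr >> FIELD_SHIFT) & FIELD_MASK


# Interrupt taken while the block counter is 5; the PSR value is hypothetical.
saved = save_block_counter(0x6000001F, 5)
resumed_counter = restore_block_counter(saved)
```

On return from the interrupt, the restored program counter identifies the execute block instruction and the restored counter identifies the position within its block.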
FIG. 5 schematically illustrates a general purpose computer 200 of the type that may be used to implement the described techniques. The general purpose computer 200 includes a central processing unit 202, a random access memory 204, a read only memory 206, a network interface card 208, a hard disk drive 210, a display driver 212 and monitor 214 and a user input/output circuit 216 with a keyboard 218 and mouse 220 all connected via a common bus 222. In operation the central processing unit 202 will execute computer program instructions that may be stored in one or more of the random access memory 204, the read only memory 206 and the hard disk drive 210 or dynamically downloaded via the network interface card 208. The results of the processing performed may be displayed to a user via the display driver 212 and the monitor 214. User inputs for controlling the operation of the general purpose computer 200 may be received via the user input output circuit 216 from the keyboard 218 or the mouse 220. It will be appreciated that the computer program could be written in a variety of different computer languages. The computer program may be stored and distributed on a recording medium or dynamically downloaded to the general purpose computer 200. When operating under control of an appropriate computer program, the general purpose computer 200 can perform the above described techniques and can be considered to form an apparatus for performing the above described technique. The architecture of the general purpose computer 200 could vary considerably and FIG. 5 is only one example.
FIG. 6 illustrates an arrangement of a data processing apparatus according to an embodiment. The data processing apparatus 100 comprises a fetch unit 110 coupled with a decode unit 120 and an execute unit 130.
The fetch unit 110 contains a fetch address register 122 which is loaded with an instruction address of an instruction to be retrieved from an instruction cache 112. The required instructions from the instruction cache 112 are queued in an instruction queue 114. The instruction cache 112 is preferably operable to return or provide multiple instructions with a single fetch.
The fetched instructions are provided from the instruction queue 114 to the decode unit 120. As will be explained in more detail below, the instructions are then decoded by the decoder unit 120 and a number of control signals are provided over the path 125 to control the operation of the execute unit 130.
The fetch unit 110 contains a Branch Target Buffer (BTB) 124 which receives the contents of the fetch address register 122 and is operable to determine whether the fetched instruction has associated therewith a predicted branch target address. The BTB 124 is coupled with a return stack 116 which contains return addresses which have been pushed thereon as will be described in more detail below.
A preferred arrangement of the BTB 124 will now be discussed with reference to FIG. 7a.
The BTB 124 contains a number of entries. Each entry has a tag field 700, a type field 710 and a target address 705. The address stored in the fetch address register 122 may be split into bit fields that match an entry in the tag field 700 and select an index 750 in the BTB 124.
When an address is provided to the BTB 124, the address or a bit field within the address is matched against each tag field 700 entry. When a hit occurs, the contents of the type field 710 may be used as a control signal for the return stack 116 and the multiplexer 126. Also, the associated contents of the target field 705 are provided to the multiplexer 126.
Although the illustrated BTB 124 is shown as being direct mapped, the BTB 124 may be multiple set-associative or fully associative. Additional fields in the BTB 124 may also be provided.
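A direct-mapped buffer of the kind described may be sketched as follows. The class and method names, the number of index bits and the string values used for the type field are illustrative assumptions; the sketch only shows how an address may be split into an index and a tag, and how a hit yields the type and target fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BTBEntry:
    tag: int                # cf. tag field 700
    type: str               # cf. type field 710, e.g. "push" or "pop" (assumed encoding)
    target: Optional[int]   # cf. target address 705; None when no target is stored


class BTB:
    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)

    def _split(self, address):
        word = address >> 2  # drop the byte offset within a 32-bit instruction
        index = word & ((1 << self.index_bits) - 1)
        tag = word >> self.index_bits
        return index, tag

    def allocate(self, address, type, target=None):
        index, tag = self._split(address)
        self.entries[index] = BTBEntry(tag, type, target)

    def lookup(self, address):
        """Return the matching entry on a hit, or None on a miss."""
        index, tag = self._split(address)
        entry = self.entries[index]
        return entry if entry is not None and entry.tag == tag else None


btb = BTB()
btb.allocate(0x110, "push", 0x200)
hit = btb.lookup(0x110)
miss = btb.lookup(0x150)  # same index as 0x110 but a different tag
```

A set-associative or fully associative arrangement would replace the single indexed slot with a search over several candidate entries.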
Returning now to FIG. 6, when an instruction is fetched from the instruction cache 112 the address of that instruction is provided to the BTB 124. Should the address provided to the BTB 124 match any addresses stored therein then a hit occurs and a target address indicative of the predicted branch associated with the fetched instruction is provided over the path 111 to the multiplexer 118. This value is then in turn provided to the fetch address register 122.
As mentioned above, the content of the fetch address register 122 is used to indicate to the instruction cache 112 address values to be retrieved. Hence, in this way, it will be appreciated that the target address outputted by the BTB 124 in response to a fetched instruction can be used to cause instructions associated with that target address to be retrieved from the instruction cache 112.
In addition to providing an indication of a target address when a hit occurs, the BTB 124 also outputs attribute signals associated with the instruction. One such attribute signal is an indication of whether or not the instruction is associated with a sub-routine return. In the event that the instruction is associated with a sub-routine return then a control signal is provided over the path 115 which causes the multiplexer 126 to select an address provided by the return stack 116 over the path 111 to the multiplexer 118 and into the fetch address register 122 instead of any target address provided by the BTB 124. Hence, in the event that the BTB 124 indicates that the instruction is associated with a sub-routine return then a control signal is provided over the path 117 to cause the return stack 116 to pop data to the multiplexer 126.
The execute unit 130 contains logic operable to detect and repair unpredicted branches or mispredicted branches and causes the entries in the BTB 124 to be updated or allocated in order to help prevent the unpredicted or mispredicted branch from occurring again. The execute unit 130 includes a program counter register 500 which contains an address of the instruction being executed by the execute unit 130. In the event of a branch mispredict (i.e. a non-sequential instruction is required to follow the instruction currently being executed by the execute unit 130 and this has not been predicted which means that the required instruction is currently not available to be executed) the execute unit 130 provides information to associate the two instructions which can then be used to help prevent a misprediction from occurring again. For example, an address of the mispredicted branch may be provided over the path 550 via an adder 570 and the multiplexer 118 to the fetch address register 122. This path provides the actual memory address of the next instruction to be processed following the current instruction to the fetch unit. It will be appreciated that if this return path is used then this means that the instructions currently in the pipeline are not the instructions which are required to be executed by the execute unit 130. Accordingly, those instructions will need to be flushed and the required instruction fetched and then propagated through the pipeline to the execute unit 130. It will be appreciated that flushing the pipeline in this way imposes a large performance penalty on the operation of the data processing apparatus 100.
Hence, the operation of the BTB 124 helps to reduce the frequency at which mispredicted branches occur by storing an indication of branch prediction addresses and associating these with the instructions which caused a branch mispredict during the execution of that instruction in the execution unit 130. Accordingly, if the branch is predictable then, as mentioned above, logic within the execute unit 130 generates signals which are propagated over the path 555 to the fetch unit 110 in order to update the BTB 124 and/or the return stack 116.
As mentioned above, a previously unencountered instruction which causes a branch will not result in a hit in the BTB 124 and therefore will propagate through the pipeline with subsequent sequential instructions following behind. However, once that instruction reaches the execute unit 130 a determination will be made that the instruction to be executed following the current instruction is not the next instruction in the pipeline. Hence, a misprediction will occur and the address of the instruction which ought to follow the current instruction will be loaded into the BTB in an entry associated with the address of the current instruction and having a target field 705 indicating the target address of the next instruction (i.e. the non-sequential or branch destination instruction) to be executed.
Thereafter, subsequent fetches of this instruction will result in a hit occurring in the BTB 124 and the target address of the next instruction to be executed being provided to the instruction cache 112 such that the correct sequence of instructions are provided into the pipeline.
To illustrate the operation of the fetch unit 110 and the execute unit 130 in more detail, an illustrative sequence of instructions is shown in FIG. 8.
Assuming that the BTB 124 is initially empty, the instructions starting at 0×100 will be fetched in sequence. The adder 132 operates to increment the instruction address by a predetermined fixed amount, in this case 0×008, and this address is stored in the fetch address register 122. The address stored in the fetch address register 122 is provided to the BTB 124, but no hits will occur because the BTB 124 is empty. Accordingly, fetching will occur linearly past the execute block instruction.
When the execute block instruction enters the execute unit 130, control signals provided over the path 520 as a result of decode signals generated by the decode unit 120 in response to the Execute block instruction will control the branch resolution hardware 545 and 560. Also, these control signals operate to freeze the program counter 500 at address 0×110 and the count register 515 will be loaded based on the block length field within the execute block instruction.
Because no branch was predicted and the decoded instruction being executed in the execute unit 130 indicates that a branch should occur, a mispredict will occur. The mispredict causes control signals to be provided over the path 550 and 555 which results in the entire pipeline being flushed because the next instruction which needs to be executed is not present. The target (or branch destination) address will be calculated by the control logic 545 and provided over the path 550 to the fetch address register 122 to cause that instruction to be accessed from the instruction cache 112. Also, the Execute block instruction is identified as a predictable branch.
In addition, the signals provided over the path 555 will cause an entry to be allocated in the BTB 124 as illustrated in FIG. 7b. The BTB 124 will be loaded with an entry 720 which signifies that there is a branch at address 0×110, having a target address of 0×200 and that the return stack 116 should be pushed.
Returning now to FIG. 8, fetching will resume at address 0×200 within the block of instructions, and the subsequent fetched instructions in the block will be decoded and executed in turn until the last instruction in the block is reached. A microPC counter 600 is incremented within the execute unit 130 during the execution of each instruction within the block. Processing of instructions within the block is terminated when the microPC counter 600 reaches the value stored within the count register 515 which indicates that the last instruction in the block has been reached. The microPC counter 600 is then reset to zero. On the first occurrence of this event the fetch unit 110 will know nothing of the instruction block termination and will continue to fetch instructions linearly at addresses past 0×208. Hence, the instructions in the pipeline behind the instruction at address 0×208 are not required to be executed and so the branch resolution hardware 560 and 545 will initiate a pipeline flush and indicate that a branch mispredict has occurred. The address returned to the fetch unit 110 will be the program counter value of the Execute block instruction stored in the program counter register 500 (which was frozen) plus an offset of 4, in this example address 0×114 (which is the address of the instruction immediately following the Execute block instruction). As mentioned previously, it will be appreciated that causing the pipeline to flush results in a large performance cost.
Hence, although the last instruction within the block at address 0×208 is not encoded as a branch instruction, a branch occurs following the execution of this instruction. The control signals sent over path 555 are not based on the decode of the instruction at memory address 0×208 itself (as would be the case for other predictable branches) but are instead based on the micro pc value 600 reaching a predetermined value (in this case the value stored in the count register 515).
Hence, the control signals provided over path 555 indicate that the instruction at address 0×208 resulted in a predictable branch. The functionality of this predictable branch is similar to that of a sub-routine return as the address of the next instruction to be processed is the address immediately following that of the calling Execute block instruction (in this case the immediately following instruction is the instruction at address 0×114). Accordingly, the control signals provided over the path 555 cause a second entry 730 to be allocated in the BTB 124. This entry 730 is associated with instruction address 0×208 (the last instruction in the embedded block). As can be seen, in this example, no target address is stored in the entry 730 of the BTB 124 but, instead, the branch type field is set to indicate a return stack 116 pop.
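The two mispredicts of a first pass, and the entries they cause to be allocated, may be modelled behaviourally as below. This is a simplified sketch under stated assumptions: the buffer is a plain dictionary keyed by instruction address, one instruction is fetched per step in increments of four bytes, the block holds three instructions at 0×200 to 0×208, and the function name `run_once` and the "push"/"pop" type values are hypothetical.

```python
def run_once(btb, emb=0x110, target=0x200, length=3, start=0x100, end=0x118):
    """Walk the example sequence once, comparing the predicted next fetch
    address with the actual next address. Returns the number of flushes and
    allocates a buffer entry for each mispredict."""
    flushes, return_stack = 0, []
    addr, micro_pc, in_block = start, 0, False
    while addr != end:
        # Fetch side: predict the next address from the buffer and return stack.
        kind, predicted_target = btb.get(addr, (None, None))
        if kind == "push":
            return_stack.append(addr + 4)
            predicted = predicted_target
        elif kind == "pop" and return_stack:
            predicted = return_stack.pop()
        else:
            predicted = addr + 4

        # Execute side: determine the address that must actually follow.
        if addr == emb:
            actual, in_block, micro_pc = target, True, 0
            update = ("push", target)        # cf. entry 720
        elif in_block:
            micro_pc += 1
            if micro_pc == length:           # counter matches the count register
                actual, in_block, micro_pc = emb + 4, False, 0
                update = ("pop", None)       # cf. entry 730
            else:
                actual, update = addr + 4, None
        else:
            actual, update = addr + 4, None

        if predicted != actual:              # mispredict: flush and allocate
            flushes += 1
            btb[addr] = update
        addr = actual
    return flushes


btb = {}
cold_flushes = run_once(btb)  # first pass with an empty buffer
warm_flushes = run_once(btb)  # second pass with both entries allocated
```

Under these assumptions the first pass incurs one flush on entry to the block and one on exit, and the second pass incurs none.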
Thereafter, subsequent fetches by the fetch unit 110 of the code sequence shown in FIG. 8 when starting at address 0×100 will fetch linearly until address 0×110. On the occurrence of address 0×110, the BTB 124 will hit on entry 720 using the address value 0×110 of the Execute block instruction. The BTB hit will cause the target address of 0×200 to be provided to the fetch address register 122 to redirect fetching to the target instruction at address 0×200. In addition, since the branch type indicates a return stack push, the return stack 116 will be pushed with the address 0×114 immediately following the Execute block instruction (the next sequential linear address). Hence, the instructions will be retrieved by the instruction cache 112 and stored in the instruction queue 114 to be provided to the decode unit 120 over the bus 500 in the correct order, namely, the instruction at address 0×110 immediately followed by the instruction at address 0×200.
When the last instruction in the embedded block is fetched by the fetch unit 110 then the BTB 124 will hit on entry 730 for address 0×208. Because the entry 730 indicates a return stack pop then the return stack 116 will be popped to provide the next fetch address to be provided over the path 111 to the fetch address register 122 instead of the target address stored in the target address field 705 (which in this case contains a null value).
It will be appreciated that this also results in the instructions being accessed from the instruction cache 112 and passed to the instruction queue 114 in the correct order, namely, the instruction at address 0×208 will be followed by the instruction at address 0×114.
Meanwhile, the logic in the execute unit 130 will continue to operate as before, except, in this case, no flush will be required on either entry to the instruction block or exit from the instruction block as long as it is determined that these branches were properly predicted.
Typically, logic is provided (in this example, within logic 560) which only allows branch prediction on branch-type instructions and prevents branch prediction occurring for instructions which are not branch-type instructions. However, it will be appreciated that the last instruction in the block (in this example the instruction at address 0×208) is in fact not a branch-type instruction. Hence, in this example, any control signal provided over the path from the decode unit 120 indicating that the instruction is not a branch-type instruction is overridden when the value stored in the microPC counter 600 reaches the value stored within the count register 515. In this way, it will be appreciated that branch prediction can occur for those instructions which are not necessarily coded as branch-type instructions.
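The override may be expressed as a simple predicate, sketched below. The function name and the `in_block` flag are assumptions introduced for illustration; the description only states that the non-branch indication is overridden when the microPC value reaches the count register value.

```python
def prediction_allowed(decoded_as_branch, in_block, micro_pc, count):
    """Return True when a prediction may be made for the current instruction.

    Normally only branch-type instructions qualify; the last instruction of
    an embedded block qualifies regardless of how it was decoded."""
    end_of_block = in_block and micro_pc == count
    return decoded_as_branch or end_of_block


last_in_block = prediction_allowed(False, True, 3, 3)   # non-branch, end of block
mid_block = prediction_allowed(False, True, 1, 3)       # non-branch, within block
```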
As mentioned above, in the event that a branch occurs within the embedded block (for example the embedded block terminates with a branch instruction or a branch occurs within the embedded block due, for example, to a conditional branch) then the execution of the embedded block is terminated. The microPC counter 600 is reset to zero, the address returned to the fetch unit 110 will be the decoded destination address of that branch instruction, the program counter value stored in the program counter register 500 (which was frozen) plus the offset is disregarded, as is the address popped from the return stack 116. Also, the control signals provided over the path 555 cause an entry to be allocated in the BTB 124. This entry is associated with instruction address of the branch instruction. A target address may be stored in the entry of the BTB 124 or alternatively, the branch type field is set to indicate a return stack 116 pop and the decoded destination address of that branch instruction is pushed onto the return stack 116.
It will be appreciated that exceptions may occur during the execution of the instruction blocks. When this occurs the pipeline needs to be able to return to the state it was in prior to the exception occurring. Hence, certain information must be stored as architectural state. Accordingly, the value of the program counter 500 (which contains the address of the execute block instruction) is stored. Also, the value stored in the microPC counter 600 is stored as a sub-field of the micro PSR register 605.
Upon return from the exception, the fetch unit 110 will be restarted. The address of the execute block instruction will be restored, together with the value of the microPC counter 600 which is restored from the PSR register 605.
Thereafter, on fetching the instruction at address 0×110, a hit will occur in the BTB 124 for the entry 720. This will redirect fetching to the stored target address of 0×200. Also, the entry 720 will indicate a return stack push. However, it will be appreciated that performing this return stack push would result in incorrect operation since the return stack 116 will already hold an entry with the address immediately following the execute block instruction, this having been pushed onto the return stack 116 prior to the exception occurring. To prevent this duplication, the return stack push will be inhibited if the value stored in the microPC counter 600 is not equal to zero (for example, path 117 will be blocked). This will be indicated by a control signal sent from the execute unit 130 to the fetch unit 110.
Hence, if the microPC value stored in the micro PSR register 605 is not equal to zero then this indicates that the instruction at address 0×200 will already have been executed, together with a number of other instructions in the block of instructions. Hence, the instruction to be executed is indexed by the microPC value in order that the correct instruction is identified to be restarted following the completion of the exception.
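The restart behaviour may be sketched as follows. The function name is hypothetical, and the sketch assumes that the restored microPC value is the index of the instruction to be restarted and that indexing is performed by adding four bytes per instruction to the stored target address; neither detail is stated explicitly in the description.

```python
def resume_fetch(btb_entry, emb_address, micro_pc, return_stack):
    """Return the fetch address used when the execute block instruction is
    refetched, pushing the return address only when the block is starting
    afresh (microPC equal to zero)."""
    kind, target = btb_entry
    if kind == "push" and micro_pc == 0:
        return_stack.append(emb_address + 4)
    # A non-zero microPC means the return address is already on the stack,
    # so the push is inhibited and fetching restarts part-way into the block.
    return target + 4 * micro_pc


# Exception taken with the instruction at index 2 of the block outstanding.
stack_after_exception = [0x114]
restart_address = resume_fetch(("push", 0x200), 0x110, 2, stack_after_exception)

# Normal first entry to the block, for comparison.
fresh_stack = []
entry_address = resume_fetch(("push", 0x200), 0x110, 0, fresh_stack)
```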
Accordingly, techniques are described which recognise that whilst the last instruction in the block of instructions is typically not itself a branch instruction, this instruction can be viewed as a branch instruction. However, the last instruction itself would not typically provide an indication of the memory location of the instruction following the execute block instruction or contain any information which would normally enable such an indication to be provided. By storing an indication of a memory location of an instruction following the execute block instruction when an execute block instruction is encountered, an indication of that memory location can be provided to the instruction fetching circuit when it is determined that the last instruction in the block is being processed. Providing the indication of the memory location of the instruction following the execute block instruction to the instruction fetching circuit causes the fetch unit to fetch that instruction. In this way, the correct sequence of instructions is fetched by the fetch unit, which avoids the need to flush instructions. This enables the execute block instruction to be implemented in an efficient manner whilst enabling much existing architecture to be utilised with little modification.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims.

Claims

1. A data processing apparatus comprising:

an instruction fetching circuit operable to fetch a sequence of program instructions from a sequence of memory locations;

an instruction decoder responsive to program instructions within said sequence of program instructions fetched by said instruction fetching circuit to control data processing operations specified by said program instructions; and

an execution circuit operable under control of said instruction decoder to execute said data processing operations, wherein said instruction decoder is responsive to an execute block instruction within said sequence of program instructions to trigger fetching of a block of two or more program instructions by said instruction fetching circuit and execution of said block of two or more program instructions by said execution circuit, said block of two or more instructions containing a number of program instructions specified by a block length field within said executed block instruction and being stored at a memory location specified by a location field within said execute block instruction, said apparatus further comprising

execute block instruction logic operable in response to said execute block instruction to store an indication of a memory location of an instruction following said execute block instruction and to determine which instruction in said block of two or more program instructions is being processed, said execute block instruction logic being further operable when it is determined that a last instruction in said block of two or more program instructions is being processed to provide to said instruction fetching circuit said indication of said memory location of said instruction following said execute block instruction so that said instruction following said execute block instruction is fetched for execution immediately following said last instruction in said block of two or more program instructions.

2. The data processing apparatus as claimed in claim 1, wherein

said execute block instruction logic is operable in response to said execute block instruction to store an indication of said memory location specified by said location field within said execute block instruction, said execute block instruction logic being further operable when it is determined that said execute block instruction is being processed to provide to said instruction fetching circuit said indication of said memory location specified by said location field within said execute block instruction so that instructions in said block of two or more program instructions are fetched for execution immediately following said execute block instruction.

3. The data processing apparatus as claimed in claim 2, wherein

said execute block instruction logic comprises storage operable to store said indication of said memory location of said instruction following said execute block instruction, which indication being associated as a target memory location with said last instruction in said block of two or more program instructions.

4. The data processing apparatus as claimed in claim 3, wherein

said storage is further operable to store said indication of said memory location specified by said location field within said execute block instruction, which indication being associated as a target memory location with said execute block instruction.

5. The data processing apparatus as claimed in claim 3, wherein

said storage contains a number of entries, each entry having a field for storing an indication of said memory location of an instruction and an associated field for storing an indication of a corresponding target memory location.

6. The data processing apparatus as claimed in claim 3, wherein

said storage comprises a return stack operable to store said indication of said memory location of said instruction following said execute block instruction, which indication being associated as said target memory location with said last instruction in said block of two or more program instructions.

7. The data processing apparatus as claimed in claim 6, wherein

said execute block instruction logic is further operable to push said indication of said memory location of said instruction following said execute block instruction onto said return stack in response to said execute block instruction.

8. The data processing apparatus as claimed in claim 7, wherein

said execute block instruction logic is operable, in the event that it is determined that said last instruction in said block of two or more program instructions is being processed and said storage provides a target memory location associated with said last instruction in said block of two or more program instructions, to disregard said target memory location associated with said last instruction in said block of two or more program instructions and instead to pop said return stack and to provide to said instruction fetching circuit said indication of said memory location popped from said return stack.

9. The data processing apparatus as claimed in claim 7, wherein

said execute block instruction logic comprises interrupt handling logic which, in the event of an interrupt, stores an indication of said memory location of said execute block instruction and an indication of which instruction in said block of two or more program instructions is being executed when said interrupt occurs to enable upon completion of handling of said interrupt restarting execution of said block of two or more program instructions at a program instruction within said block of two or more instructions indicated by said indication of said memory location of said execute block instruction and said indication of which instruction in said block of two or more program instructions is being executed.

10. The data processing apparatus as claimed in claim 9, wherein

said interrupt handling logic prevents the pushing of said indication of said memory location of said instruction following said execute block instruction onto said return stack in response to said execute block instruction being processed following said interrupt.

11. The data processing apparatus as claimed in claim 4, wherein

said execute block instruction logic comprises prediction logic operable to receive an indication of each instruction being fetched by said instruction fetching circuit and to determine whether that instruction has associated therewith a target memory address and, if so, to provide said target memory address to said instruction fetching circuit so that the instruction at that memory address is fetched for execution immediately following that instruction.

12. The data processing apparatus as claimed in claim 11, further comprising

prediction prevention logic operable to prevent said prediction logic from providing said target memory address to said instruction fetching circuit for non-branch encoded instructions, said execute block instruction logic comprising prediction prevention override logic operable to inhibit operation of said prediction prevention logic when said last instruction in said block of two or more program instructions is being processed.
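
By way of illustration only, the interaction between the prediction logic, the prediction prevention logic and the override of claims 11 and 12 may be expressed as a small decision function. The Python sketch below is a hedged example: the function name `predicted_target`, the dictionary `target_cache` and the two Boolean inputs are assumptions made for this example and do not describe any particular circuit.

```python
def predicted_target(pc, target_cache, is_branch_encoded, is_last_in_block):
    """Return the target address to pass to the fetch stage, or None.

    target_cache: dict mapping an instruction address to a target address
                  (hypothetical model of the prediction storage).
    """
    target = target_cache.get(pc)
    if target is None:
        return None  # no target is associated with this instruction
    # Prevention logic: normally suppress targets for non-branch encodings.
    prevention_active = not is_branch_encoded
    # Override logic: the last instruction of an executed block is usually
    # not a branch encoding, yet its target must still reach the fetch stage.
    if is_last_in_block:
        prevention_active = False
    return None if prevention_active else target
```

Without the override, the prevention logic would discard the target associated with the last instruction of the block, because that instruction is typically an ordinary data processing instruction rather than a branch.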

13. The data processing apparatus as claimed in claim 1, wherein

said execute block instruction logic comprises a counter operable to provide an indication of which instruction in said block of two or more program instructions is being processed.

14. The data processing apparatus as claimed in claim 1, wherein

said execute block instruction logic is operable when it is determined that a last instruction in said block of two or more program instructions is being processed to provide to said instruction fetching circuit said indication of said memory location of said instruction immediately following said execute block instruction.

15. A method of processing data, comprising

fetching a sequence of program instructions from a sequence of memory locations with an instruction fetching circuit;

controlling data processing operations specified by said program instructions with an instruction decoder responsive to program instructions within said sequence of program instructions fetched by said instruction fetching circuit; and

executing said data processing operations with an execution circuit under control of said instruction decoder, wherein

said instruction decoder is responsive to an execute block instruction within said sequence of program instructions to trigger fetching of a block of two or more program instructions by said instruction fetching circuit and execution of said block of two or more program instructions by said execution circuit, said block of two or more instructions containing a number of program instructions specified by a block length field within said execute block instruction and being stored at a memory location specified by a location field within said execute block instruction, said method further comprising,

storing, in response to said execute block instruction, an indication of a memory location of an instruction following said execute block instruction and determining which instruction in said block of two or more program instructions is being processed, and

when it is determined that a last instruction in said block of two or more program instructions is being processed, providing to said instruction fetching circuit said indication of said memory location of said instruction following said execute block instruction so that said instruction following said execute block instruction is fetched for execution immediately following said last instruction in said block of two or more program instructions.
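
By way of illustration only, the steps of claim 15 may be traced with a small software model. The following Python sketch is a hedged example and not the claimed apparatus: the tuple instruction format, the mnemonic `EXECB`, the function name `run` and the unit address increment are assumptions made for this example, and nested execute block instructions are deliberately not handled.

```python
def run(memory, start=0):
    """Simulate fetch and execute with a hypothetical EXECB instruction.

    memory maps an address to an instruction tuple:
      ("OP", name)              ordinary instruction
      ("EXECB", location, n)    execute n instructions stored at `location`
      ("HALT",)                 stop the simulation
    Returns the names of the ordinary instructions in execution order.
    """
    trace = []
    pc = start
    return_addr = None  # stored location of the instruction after EXECB
    remaining = 0       # counter of block instructions still to execute
    while True:
        instr = memory[pc]
        if instr[0] == "HALT":
            return trace
        if instr[0] == "EXECB":
            _, location, length = instr
            return_addr = pc + 1  # store the following instruction's location
            remaining = length    # taken from the block length field
            pc = location         # taken from the location field
            continue
        trace.append(instr[1])
        if remaining > 0:
            remaining -= 1
            if remaining == 0:
                # Last instruction of the block: fetch the instruction
                # following EXECB immediately afterwards.
                pc = return_addr
                continue
        pc += 1


program = {
    0: ("OP", "a"),
    1: ("EXECB", 10, 2),
    2: ("OP", "b"),
    3: ("HALT",),
    10: ("OP", "x"),
    11: ("OP", "y"),
    12: ("OP", "z"),  # not part of the two-instruction block
}
order = run(program)  # ["a", "x", "y", "b"]
```

Note that the block itself contains no return instruction: control returns because the counter identifies the last instruction of the block, which is why the instruction at address 12 is never executed.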

16. The method as claimed in claim 15, comprising

storing, in response to said execute block instruction, an indication of said memory location specified by said location field within said execute block instruction and, when it is determined that said execute block instruction is being processed, providing to said instruction fetching circuit said indication of said memory location specified by said location field within said execute block instruction so that instructions in said block of two or more program instructions are fetched for execution immediately following said execute block instruction.

17. The method as claimed in claim 16, comprising

providing storage operable to store said indication of said memory location of said instruction following said execute block instruction, said indication being associated as a target memory location with said last instruction in said block of two or more program instructions.

18. The method as claimed in claim 17, comprising

providing storage operable to store said indication of said memory location specified by said location field within said execute block instruction, said indication being associated as a target memory location with said execute block instruction.

19. The method as claimed in claim 17, wherein

20. The method as claimed in claim 17, wherein

21. The method as claimed in claim 20, comprising

pushing said indication of said memory location of said instruction following said execute block instruction onto said return stack in response to said execute block instruction.

22. The method as claimed in claim 21, comprising

disregarding, in the event that it is determined that said last instruction in said block of two or more program instructions is being processed and said storage provides a target memory location associated with said last instruction in said block of two or more program instructions, said target memory location associated with said last instruction in said block of two or more program instructions and instead popping said return stack and providing to said instruction fetching circuit said indication of said memory location popped from said return stack.

23. The method as claimed in claim 21, comprising

storing, in the event of an interrupt, an indication of said memory location of said execute block instruction and an indication of which instruction in said block of two or more program instructions is being executed when said interrupt occurs to enable upon completion of handling of said interrupt restarting execution of said block of two or more program instructions at a program instruction within said block of two or more instructions indicated by said indication of said memory location of said execute block instruction and said indication of which instruction in said block of two or more program instructions is being executed.

24. The method as claimed in claim 23, comprising

preventing the pushing of said indication of said memory location of said instruction following said execute block instruction onto said return stack in response to said execute block instruction being processed following said interrupt.

25. The method as claimed in claim 18, comprising

receiving an indication of each instruction being fetched by said instruction fetching circuit and determining whether that instruction has associated therewith a target memory address and, if so, providing said target memory address to said instruction fetching circuit so that the instruction at that memory address is fetched for execution immediately following that instruction.

26. The method as claimed in claim 25, comprising

inhibiting the operation of prediction prevention logic which prevents said target memory address from being provided to said instruction fetching circuit for non-branch encoded instructions when said last instruction in said block of two or more program instructions is being processed.

27. The method as claimed in claim 15, comprising

providing an indication of which instruction in said block of two or more program instructions is being processed.

28. The method as claimed in claim 15, comprising

providing, when it is determined that a last instruction in said block of two or more program instructions is being processed, said instruction fetching circuit with said indication of said memory location of said instruction immediately following said execute block instruction.