VLIW COMPUTER PROCESSING ARCHITECTURE HAVING THE PROGRAM COUNTER STORED IN A REGISTER FILE REGISTER
CROSS-REFERENCES TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/187,738, filed on Mar. 8, 2000 and entitled "Computer Processing Architecture Having the Program Counter Stored in a Register File Register," the entirety of which is incorporated by reference herein for all purposes.
BACKGROUND OF THE INVENTION
The present invention relates generally to a novel computer processing architecture, and more particularly to a processing core in which one register in a register file stores the program counter and another register in the register file always stores a zero value, making address calculations, program jumps, and branches quicker and more efficient.
Computer architecture designers are constantly trying to increase the speed and efficiency of computer processors. For example, they have attempted to increase processing speeds by raising clock speeds and by employing latency-hiding techniques, such as data pre-fetching and cache memories. Other techniques, such as instruction-level parallelism using very long instruction word (VLIW) designs and embedded DRAM, have also been tried. However, one of the best methods of improving speed and efficiency in computer processors is to improve the efficiency of the instruction set, in particular by decreasing the number of instructions required to perform a particular function.
In prior art computer architectures, the program counter typically is stored in a special register separate from the register file used by the processing pipeline. After instructions have been executed, the program counter typically is updated so that the computer processor progresses properly through program execution. In these prior art architectures, branch and jump instructions typically are individual instructions within an instruction set. Thus, to perform a branch or jump, the target program location first is calculated, and then the jump instruction is executed. As one skilled in the art will appreciate, as many as 6 or 7 instructions may be executed to perform a proper jump or branch function. Thus, it is desirable to have a processing architecture that can perform jumps and branches more efficiently.
Similarly, in prior art processing architectures, retrieving data from a memory location can require as many as 6 to 10 program instructions. For example, address calculation values must be retrieved, the memory address must be calculated from the retrieved values, and a load or fetch from memory must be executed. In addition, if data fetch errors occur, the data fetch process typically is repeated. Thus, data fetches from memory can be a significant processing bottleneck, especially when fetch errors occur. Therefore, it would be advantageous to simplify the data load or data fetch process.
SUMMARY OF THE INVENTION
According to the invention, a processing core comprises a processing pipeline having N-number of processing paths, each of which processes instructions on M-bit data words. In addition, the processing core includes one or more register files, each having Q-number of registers that are M bits wide. One of the Q-number of registers in at least one of the register files is a program counter register for holding a program counter, and one of the Q-number of registers in at least one of the register files is a zero register for holding a zero value. In this manner, program jumps can be executed by adding values to the program counter in the program counter register, and memory address values can be calculated by adding values to the program counter stored in the program counter register or to the zero value stored in the zero register.
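The following Python is a minimal behavioral sketch, not the patented hardware. It shows how placing a zero register and a program counter register in the same register file reduces jumps and address calculations to ordinary additions. Several details are hypothetical assumptions that the text above does not specify: the register indices `ZERO_REG = 0` and `PC_REG = 63`, the `RegisterFile` class and its `add_imm` method, and the rule that writes to the zero register are discarded.

```python
MASK64 = (1 << 64) - 1  # registers are M = 64 bits wide

# Hypothetical register assignments; the text does not fix these indices.
ZERO_REG = 0   # always reads as zero
PC_REG = 63    # holds the program counter


class RegisterFile:
    """Behavioral model of one register file with Q = 64 registers (assumed)."""

    def __init__(self, q: int = 64) -> None:
        self.regs = [0] * q

    def read(self, r: int) -> int:
        # The zero register always yields 0, whatever was written to it.
        return 0 if r == ZERO_REG else self.regs[r]

    def write(self, r: int, value: int) -> None:
        # Assumed behavior: writes to the zero register are discarded.
        if r != ZERO_REG:
            self.regs[r] = value & MASK64

    def add_imm(self, dst: int, src: int, imm: int) -> None:
        """dst <- src + imm (wrapping at 64 bits)."""
        self.write(dst, self.read(src) + imm)


rf = RegisterFile()
rf.write(PC_REG, 0x1000)

# Relative jump: add an offset directly to the program counter register.
rf.add_imm(PC_REG, PC_REG, 0x40)   # PC becomes 0x1040

# Absolute address: zero register plus an offset.
rf.add_imm(5, ZERO_REG, 0x2000)    # r5 = 0x2000

# PC-relative address: program counter plus an offset.
rf.add_imm(6, PC_REG, 0x10)        # r6 = 0x1050
```

In this model, a relative jump, an absolute address, and a PC-relative address all use the same add operation. No separate branch-target or address-generation instruction is needed.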
In accordance with one embodiment of the present invention, a processing instruction comprises N-number of P-bit instructions appended together to form a very long instruction word (VLIW), and the N-number of processing paths process the N-number of P-bit instructions in parallel. In accordance with one embodiment of the invention, M is 64, Q is 64, and P is 32. Accordingly, each of the N-number of processing paths processes 32-bit instructions on 64-bit data words, and each of the register files has 64 registers that are 64 bits wide.
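To illustrate how N-number of P-bit instructions might be appended into one VLIW, here is a small Python sketch using P = 32 from the embodiment above. Three details are assumptions for illustration only: the value N = 4 (the text leaves N open), the slot ordering with slot 0 in the least-significant bits, and the function names.

```python
P = 32                     # bits per instruction (from the embodiment above)
N = 4                      # hypothetical number of processing paths
SLOT_MASK = (1 << P) - 1


def pack_vliw(instrs: list[int]) -> int:
    """Append N P-bit instructions into one N*P-bit very long instruction word.

    Slot 0 occupies the least-significant bits (an assumed ordering).
    """
    if len(instrs) != N:
        raise ValueError(f"expected {N} instructions, got {len(instrs)}")
    word = 0
    for slot, ins in enumerate(instrs):
        if ins & ~SLOT_MASK:
            raise ValueError(f"instruction in slot {slot} exceeds {P} bits")
        word |= ins << (slot * P)
    return word


def unpack_vliw(word: int) -> list[int]:
    """Split a VLIW back into N P-bit instructions, one per processing path."""
    return [(word >> (slot * P)) & SLOT_MASK for slot in range(N)]


vliw = pack_vliw([0x11111111, 0x22222222, 0x33333333, 0x44444444])
for path, ins in enumerate(unpack_vliw(vliw)):
    # Each processing path would receive its own 32-bit slot in parallel.
    print(f"path {path}: {ins:#010x}")
```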
In addition, in accordance with another embodiment of the present invention, the registers in the one or more register files are either private or global registers. When data is written to a global register in one of the register files, the data is propagated to the corresponding global register in each of the other register files. In this manner, the global registers in each of the register files hold the same data. Conversely, when data is written to a private register in a register file, that data is not propagated to the other register files.
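A minimal Python model of private and global registers across several register files follows. It assumes the global register indices are supplied directly as a hypothetical `global_regs` set. It also assumes propagation happens instantly, since the text does not describe how or when propagation occurs in hardware.

```python
Q = 64  # registers per file


class RegisterFileSet:
    """Several register files whose global registers are kept identical."""

    def __init__(self, num_files: int, global_regs: set[int]) -> None:
        self.files = [[0] * Q for _ in range(num_files)]
        # Hypothetical: the indices of registers treated as global.
        self.global_regs = global_regs

    def write(self, file_idx: int, reg: int, value: int) -> None:
        if reg in self.global_regs:
            # Global register: propagate the write to the same register in every file.
            for f in self.files:
                f[reg] = value
        else:
            # Private register: only the addressed file is updated.
            self.files[file_idx][reg] = value

    def read(self, file_idx: int, reg: int) -> int:
        return self.files[file_idx][reg]


rfs = RegisterFileSet(num_files=2, global_regs={1})
rfs.write(0, 1, 42)   # global: both files now hold 42 in register 1
rfs.write(0, 2, 7)    # private: only file 0 holds 7 in register 2
```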
To indicate whether a register in a register file is private or global, a 64-bit special register is used. Each bit in the 64-bit special register corresponds to one of the registers in the register file, and the setting of each bit determines the status of the corresponding register (i.e., private or global). For example, assuming that a 1 bit designates a private register and a 0 bit designates a global register, if the first bit in the special register is 0, then the first register in the register file is a global register. Similarly, if bit number 32 in the special register is 1, then register 32 in the register file is a private register, and so on.
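The status-bit convention in the example above can be sketched in Python as follows. Two details are assumptions: the mapping of bit i (bit 0 being least significant) to register i, and the helper names `is_private` and `set_private`. The text states only that each bit corresponds to one register, with 1 meaning private and 0 meaning global in the example.

```python
def is_private(special_reg: int, reg: int) -> bool:
    """Return True if register `reg` is private under the 64-bit status mask.

    Assumes bit i (bit 0 = least significant) describes register i,
    with 1 = private and 0 = global.
    """
    if not 0 <= reg < 64:
        raise ValueError("register index must be in 0..63")
    return ((special_reg >> reg) & 1) == 1


def set_private(special_reg: int, reg: int, private: bool) -> int:
    """Return a new mask with register `reg` marked private (1) or global (0)."""
    if not 0 <= reg < 64:
        raise ValueError("register index must be in 0..63")
    bit = 1 << reg
    return (special_reg | bit) if private else (special_reg & ~bit)


mask = set_private(0, 32, True)                   # only register 32 is private
print(is_private(mask, 0), is_private(mask, 32))  # False True
```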
In accordance with one embodiment of the present invention, the program counter register, the zero register, or both can be global registers. Therefore, when the program counter is updated in one register file, the program counter value is propagated to the program counter registers in the other register files. Thus, all processing paths can perform jumps and can use the program counter to calculate address values.
In accordance with yet another embodiment of the present invention, a scalable computer processing architecture comprises a plurality of processor chips, each including the processing core of the present invention, connected together in parallel. In this manner, a plurality of multi-processing-path pipelines can be connected together to form a powerful parallel processor. Each processor chip may comprise one or more register files.
A more complete understanding of the present invention may be derived by referring to the detailed description of preferred embodiments and claims when considered in connection with the figures.