WO2000068782A1

WO2000068782A1 - Method for developing semiconductor integrated circuit

Info

Publication number: WO2000068782A1
Application number: PCT/JP1999/002348
Authority: WO
Inventors: Hirotsugu Kojima; Tadaaki Tanimoto; Haruo Kamimaki; Tetsuya Nakagawa; Yuki Inoue
Original assignee: Hitachi, Ltd.
Priority date: 1999-05-06
Filing date: 1999-05-06
Publication date: 2000-11-16

Abstract

A method for developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing the instructions of a first instruction set comprises the step of designing the accelerator so that it can execute the instructions of a second instruction set. The second instruction set includes instructions that are the same as instructions in the first instruction set, or backward-compatible instructions that have the same instruction code as an instruction in the first instruction set but produce a different execution result. Because the second instruction set is restricted in this way, the logic in the accelerator that fetches and decodes instructions and controls the instruction execution procedure adds no new function to the control logic of the processor and is defined by reducing the functions of that logic. The time required to develop a semiconductor integrated circuit combining a processor and an accelerator can therefore be shortened.

Description

Specification: Method for developing a semiconductor integrated circuit

Technical field

The present invention relates to a method for developing a semiconductor integrated circuit by combining a processor and an accelerator, and to a recording medium storing design data and a development program used in the development method. It relates to technology that is effective when applied to developing a system LSI (Large Scale Integrated circuit) by an ASIC (Application Specific Integrated Circuit) approach.

Background art

In the design of a semiconductor integrated circuit, a functional unit to be provided in it, such as an arithmetic function or a signal control function, is called a block (module). One kind of block is the hard macro block (hard module): the layout of that part on the semiconductor substrate is already designed, and data representing the multiple mask patterns that form the layout is provided to the chip designer as a component. Recently, such a hard macro block is also called a hard IP (Intellectual Property) module. When a hard macro block is provided to a chip designer, the data representing it includes data describing the function of the block in a language such as HDL (Hardware Description Language), together with the mask pattern data that represents the layout of the circuit on the semiconductor substrate (for example, data for forming the mask patterns). In contrast to the hard macro block, there is the so-called soft macro block. In a soft macro block, the function of the block (circuit) is specified by a description such as HDL, and that description is provided as a component to a chip designer. Such a soft macro block is also called a soft IP module, as opposed to a hard IP module. The circuit scale of macro blocks such as the hard IP module and soft IP module may extend to functional units such as SRAM (Static Random Access Memory), DRAM (Dynamic Random Access Memory), CPU (Central Processing Unit), and DSP (Digital Signal Processor).

IP modules and related topics are described in "Verification of system LSI using IP core," Nikkei Electronics No. 723 (August 10, 1998), pages 99 to 109.

The present inventor studied the case where an ASIC-type system LSI is developed using a soft IP module of a processor. Specifically, the inventor considered the case where the required performance cannot be obtained if all of the required processing is realized by software on the processor, so that part of the processing is realized by dedicated hardware such as an accelerator. Here, the term “accelerator” means a dedicated circuit that takes over part of the functions that could be realized on the processor.

The design method of the system LSI studied by the inventor is shown in FIG. 13. The contents of FIG. 13 are not publicly known. First, a program (base program) that runs on the processor is developed. The base program is developed using the program development environment of the processor, that is, development tools such as a compiler, a software simulator, and a debugger. Verification that the developed base program satisfies the functional specifications is performed at this stage. Next, it is verified whether the base program satisfies the time constraints of the applied system. If the constraints are satisfied, the system LSI can be realized using the processor as it is. If the time constraints cannot be satisfied, the system LSI is realized by improving the processing capacity through the introduction of an accelerator.

Verification of time constraints is performed in the base program development environment. The time constraints to be verified at this stage are determined by the relationship between the system LSI and the outside world, and may be as long as a few milliseconds to a few seconds. Software simulation is generally thousands to tens of thousands of times faster than logic simulation, and is useful for such verification.

Next, the base program is analyzed to extract application-specific operations and the like, and the specifications of the accelerator are determined. The hardware side of the accelerator is completed through the design of arithmetic circuits, the definition of microinstructions, and the design of control circuits. On the software side, it is necessary to modify the base program and to develop a microprogram that controls the accelerator. The change to the base program replaces the part extracted for the accelerator with processing for communication with the accelerator. The microprogram that controls the accelerator, on the other hand, must form the desired control sequence by combining the defined microinstructions.

Finally, the verification of the environment in which the processor and the accelerator are integrated is performed by logic simulation or the like, thereby completing the design of the desired system LSI.

The design of an accelerator is usually divided into a part that performs data operations and a part that controls the sequence. The design of the part that performs data operations is relatively easy, but the design of the part that controls the sequence is complicated and time-consuming even for engineers with expertise in LSI design. This is a major burden when developing system LSIs in a short time.

The inventor has found the following problems in the above development method. The first problem is that no development environment exists for developing the microprogram that controls the accelerator. Since the microprogram and control circuit of the accelerator are designed optimally for the application, that is, designed individually, a software development environment for each such microprogram, namely a software simulator corresponding to the defined microinstructions, is usually not provided for reasons of time and cost. Therefore, verification that the microprogram operates correctly is performed by logic simulation or the like at the time of hardware design. At this stage, not only defects in the microprogram itself but also defects in the definition of the microinstructions and in the control circuit are often found, causing significant design rework. In short, the prolonged development period of the semiconductor integrated circuit results from freely allowing a unique design for the control circuit and the like of the accelerator.

The second problem is the difficulty of verifying the integrated functions and performance of the system LSI. Even if the function of the base program is verified at the initial stage, only simulation for hardware design, such as logic simulation, can be used to verify whether the entire system LSI operates correctly as a whole, for example whether the accelerator realizes the intended function and whether the program change for communication with the accelerator is correct. In general, hardware simulation is several orders of magnitude slower than the software simulation of a software development tool, so detailed verification at the system level is practically impossible. This difficulty of integrated verification as a system LSI also arises because a unique design is allowed for the accelerator.

The present inventor thus found the inefficiency that the development environment already provided for the processor, such as its software simulator, could hardly be used for developing the accelerator or for integrated verification with the accelerator.

An object of the present invention is to provide a development method capable of shortening the period for developing a semiconductor integrated circuit that combines a processor and an accelerator. Another object of the present invention is to provide a method for developing a semiconductor integrated circuit in which a development environment for a semiconductor integrated circuit combining a processor and an accelerator can be prepared easily and reliable verification can be performed in a short time.

Still another object of the present invention is to enable the above-mentioned development method to be immediately executed using a computer.

Still another object of the present invention is to make it possible to easily realize the above development method from the viewpoint of design data or design assets.

Another object of the present invention is to provide a semiconductor integrated circuit that speeds up data processing by combining a processor and an accelerator, in which the development cost is reduced and the operational reliability is improved.

The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

Disclosure of the invention

[1] The method for developing a semiconductor integrated circuit according to the present invention is a method for developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing the instructions included in a first instruction set. The method includes a process of designing the accelerator so that it can execute the instructions of a second instruction set, which includes instructions that are the same as instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as an instruction included in the first instruction set but a different execution result. The processor has, for example, a reduced instruction set computer (RISC) architecture and basically executes each instruction in one clock cycle. The configuration in which the second instruction set includes the same or backward-compatible instructions with respect to the instructions in the first instruction set specifies that the logic that performs instruction fetching, decoding, and control of the instruction execution procedure in the accelerator is realized by reducing functions, without adding new functions to the control logic of the processor. That is, the accelerator is configured as a subset of the processor. For example, the code of the addition instruction in the first instruction set is assigned to a Galois field multiplication function in the second instruction set. The types of control signals obtained by decoding the instruction code are the same as the decoding result in the processor, but these control signals are diverted to the operation control signals of a Galois field multiplier added to the accelerator. The Galois field multiplier adopts a logic configuration that can perform Galois field multiplication using those control signals. When the processor has a RISC architecture, it is very easy to use the control signals obtained by decoding the instruction code to control an arithmetic unit different from that of the processor, such as a Galois field multiplier.
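The reuse of one instruction code for a different operation can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and does not come from the specification: the opcode value `OP_ADD`, the control signal names, the 8-bit data width, and the choice of GF(2^8) with the reduction polynomial 0x11B. The point it shows is only that the decoder is shared unchanged while the execution unit behind it differs.

```python
# Hypothetical opcode value and control-signal names (not from the patent).
OP_ADD = 0x1


def decode(opcode):
    """Shared decode logic: identical for processor and accelerator."""
    if opcode == OP_ADD:
        return {"arith_enable": True, "write_back": True}
    raise ValueError("unsupported opcode")


def gf256_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8), reducing modulo the assumed polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce back to 8 bits
        b >>= 1
    return result


def processor_execute(opcode, x, y):
    """Processor: the decoded signals drive an ordinary 8-bit adder."""
    ctrl = decode(opcode)
    return (x + y) & 0xFF if ctrl["arith_enable"] else None


def accelerator_execute(opcode, x, y):
    """Accelerator: the same signals are diverted to a Galois field multiplier."""
    ctrl = decode(opcode)
    return gf256_mul(x, y) if ctrl["arith_enable"] else None
```

With these assumptions, `processor_execute(OP_ADD, 0x57, 0x83)` performs an addition, while `accelerator_execute(OP_ADD, 0x57, 0x83)` performs a Galois field multiplication, even though both calls pass through the same `decode` function.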

As described above, the accelerator is not allowed to be built from a circuit unique to each application (unrelated to the processor architecture) and is instead configured as a subset of the processor. Consequently, the development environment for the accelerator can be obtained easily by reusing the debugger, compiler, and software simulator that make up the processor's development environment. For example, if the functional description of the operations to be replaced in the accelerator is replaced in the software simulator of the processor, the result can be used as it is as the software simulator of the accelerator. By combining the accelerator software simulator generated in this way with the software simulator of the processor, integrated verification of the entire semiconductor integrated circuit, such as a system LSI equipped with the processor and the accelerator, can also be realized easily. Therefore, a development environment for a semiconductor integrated circuit combining a processor and an accelerator can be prepared easily, reliable verification can be performed in a relatively short time, and the development period can be shortened. A semiconductor integrated circuit developed by such a method achieves reduced development cost and improved operational reliability.

[2] A method of developing a semiconductor integrated circuit according to a more detailed aspect of the present invention includes: a process of extracting a part of the processing of a base program developed to operate on a processor having a first instruction set; a process of designing the logical configuration of an accelerator that realizes the extracted processing using a second instruction set, the second instruction set including instructions that are the same as instructions included in the first instruction set or backward-compatible instructions that have the same instruction code as an instruction included in the first instruction set but a different execution result; a process of designing an accelerator control program that causes the accelerator to execute the extracted processing; a process of changing the base program so that the extracted processing is replaced with processing by the accelerator control program; and a process of performing an integrated simulation of the processor running the changed base program and the accelerator running the accelerator control program.

The process of designing the accelerator control program includes a process of verifying the accelerator control program using a software simulator of the accelerator, and the software simulator of the accelerator can be obtained by changing a part of the software simulator of the processor.

The process of performing the integrated simulation is performed using an integrated simulator, and the integrated simulator can be obtained by combining the software simulator of the accelerator with the software simulator of the processor.

[3] A semiconductor integrated circuit designed using the above-described development method includes a processor and an accelerator. The processor has a control unit that fetches and decodes the instructions included in the first instruction set and controls their execution procedure, and an execution unit that performs arithmetic processing in accordance with the control signals generated by the control unit. The accelerator has a control unit that fetches and decodes the instructions included in the second instruction set and controls their execution procedure, and an execution unit that performs arithmetic processing in accordance with the control signals generated by that control unit. The second instruction set includes instructions that are the same as instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as an instruction included in the first instruction set but a different execution result. The control unit of the accelerator has a logical function obtained by deleting a part of the logical function of the control unit of the processor.

Further, a semiconductor integrated circuit according to another aspect designed using the above-described development method includes a processor and an accelerator similar to the above, and the second instruction set includes the same instructions as those included in the first instruction set or the backward-compatible instructions described above. The execution unit of the accelerator may adopt a configuration in which some execution functions are deleted from the execution unit of the processor and a new execution function is added in their place.

The control unit includes, for example, a program counter, an instruction decoder, and sequence control means. The sequence control means causes the program counter to hold the address of the instruction to be executed next, and causes the instruction decoder to output control signals based on the instruction fetched according to the instruction address held by the program counter. The execution unit includes a register circuit and an arithmetic unit whose operations are controlled by the control signals.

[4] A recording medium stores, in computer-readable form, a program to be executed by a computer to implement the development method. The program causes the computer to execute: a process of designing the logical configuration of an accelerator that realizes a part of the processing extracted from a base program developed to operate on a processor having a first instruction set, using a second instruction set that includes instructions that are the same as instructions included in the first instruction set or backward-compatible instructions that have the same instruction code as an instruction included in the first instruction set but a different execution result; and a process of designing an accelerator control program that causes the accelerator to execute the extracted part of the processing.

The program may further include a process of changing the base program so that the extracted part of the processing is replaced with processing by the accelerator control program, and a process of performing an integrated simulation of the processor running the changed base program and the accelerator running the accelerator control program. By providing the above program, the above development method can be put to use immediately.

[5] In the above development method, design data such as the IP module data of a processor can be used. Given that the design data is to be used for developing an accelerator positioned as a subset of the processor, the design data contains function definition information that defines the functions of the processor, together with control signal information, and is provided recorded on a recording medium in computer-readable form.

The function definition information includes first definition information, which defines the function of a control unit that fetches and decodes the instructions included in the first instruction set and controls their execution procedure, and second definition information, which defines the function of an execution unit that performs arithmetic processing in accordance with the control signals generated by the control unit. The function defined by each can be changed. As a result, when the design data of this processor is used in the design of the accelerator, the control unit can be given the functions required for the subset without adding new functions, and the execution unit can be given a new function for arithmetic processing unique to the accelerator. The control signal information explicitly specifies the control signals between the control unit and the execution unit. That is, the meaning and logical values of the control signals between the part that controls the operation sequence of the processor and the part that performs the data operations are disclosed to the designer of the semiconductor integrated circuit. This makes it easy to determine which of the control signals from the control unit should be assigned to controlling a new arithmetic function in the execution unit. Therefore, using the above design data makes the design of an accelerator extremely easy.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a flowchart showing an example of a method for developing a semiconductor integrated circuit according to the present invention.

FIG. 2 is an explanatory diagram showing an example of a base program.

FIG. 3 is a block diagram showing an example of an arithmetic circuit of the processor.

FIG. 4 is a block diagram showing an example of the arithmetic circuit of the accelerator.

FIG. 5 is an explanatory diagram illustrating an instruction set of a processor.

FIG. 6 is an explanatory diagram illustrating an instruction set of an accelerator.

FIG. 7 is a block diagram of a system LSI configured by combining a processor with an accelerator.

FIG. 8 is an explanatory diagram showing an example of a program for an accelerator.

FIG. 9 is an explanatory diagram showing an example of the IP module data of the processor.

FIG. 10 is an explanatory diagram showing an example of the software simulation of the processor.

FIG. 11 is a perspective view showing an example of a computer used for developing the system LSI.

FIG. 12 is a block diagram of a system LSI having two accelerators together with a processor.

FIG. 13 is a flowchart showing a method of designing the system LSI studied by the inventor previously.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 1 is a flowchart showing an example of a method for developing a semiconductor integrated circuit according to the present invention. First, a program (base program) that runs on the processor is developed (S1). The target processor here is in the form of a reduced instruction set computer (RISC). The base program is developed using a program development environment of a processor, that is, a development tool such as a compiler, a software simulator, a debugger, or the like. Verification that the developed base program satisfies the functional specifications is performed at this stage using a software simulator.

Next, it is verified whether the base program satisfies the time constraints of the applied system (S2). Verification of the time constraints is performed in the base program development environment. The time constraints to be verified at this stage are determined by the relationship between the system LSI and the outside world. Software simulation is generally thousands to tens of thousands of times faster than logic simulation, and is useful for such verification.

If the time constraints are satisfied there is no problem, and the system LSI is realized using the processor as it is (S3). If the time constraints cannot be satisfied, the system LSI is realized by improving the processing capacity through the introduction of an accelerator. When an accelerator is introduced, the processing routine to be realized in the accelerator is extracted from the base program (S4). The extraction criterion focuses on parts that become a bottleneck in data processing performance when implemented on the processor, such as simple processing repeated many times or application-specific operations.

If an application-specific operation has been extracted in step S4, an arithmetic circuit suited to the extracted operation is designed (S5). Application-specific operations include, for example, floating-point and double-precision operations in computer graphics, root-mean-square operations in error calculation, and Galois field operations in error correction. Such special operations are often not directly supported by general-purpose processors and are implemented by combining existing instructions, so they often require multiple instruction processing steps.

Next, the instruction set of the accelerator is defined (S6). That is, the arithmetic instructions of the processor are redefined as special operations by replacing only the type of operation, without changing the number or type of operands, and the control instructions are restricted to the minimum necessary; the resulting instructions are defined as the instruction set. This definition of the accelerator instruction set has the following significance. The instruction set of the accelerator (second instruction set) includes instructions that are the same as instructions included in the instruction set of the processor (first instruction set), or backward-compatible instructions that have the same instruction code as an instruction included in the first instruction set but a different execution result. That the second instruction set includes the same or backward-compatible instructions with respect to the instructions included in the first instruction set means that the control logic circuit that performs instruction fetching, decoding, and control of the instruction execution procedure in the accelerator adds no new function to the control logic circuit of the processor and is realized by reducing the functions of the processor. That is, the accelerator is configured as a subset of the processor.

Next, unnecessary functions are deleted from the control logic circuit of the processor so that it conforms to the accelerator instruction set defined above, and the control logic circuit of the accelerator is thereby designed (S7). The relationship between the control logic circuit of the accelerator and the control logic circuit of the processor is as described above. If the processor's control logic circuit is given in a high-level description such as a register-transfer-level description, the unsupported functions are deleted from the description and the control logic circuit is re-synthesized. If it is defined by truth tables, restricting the functions removes the unnecessary combinations of input values and so reduces the size of the truth table. An output whose value no longer changes as a result of the reduction allows the reduction of any other truth table that takes that value as input, and ultimately the circuit scale of the entire control logic circuit is reduced.
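The truth-table reduction described for step S7 can be sketched in Python as follows. The table contents, opcode names, and output signal names are hypothetical and chosen only for illustration; they are not the control logic of any actual processor. The sketch restricts a decoder table to the opcodes the accelerator supports and then reports which outputs have become constant, since those are the outputs that allow further reduction downstream.

```python
# Hypothetical decoder truth table: opcode -> (alu_op, use_multiplier, use_shifter).
PROCESSOR_TABLE = {
    "ADD": ("add", 0, 0),
    "SUB": ("sub", 0, 0),
    "SHL": ("pass", 0, 1),
    "AND": ("and", 0, 0),
    "MUL": ("pass", 1, 0),
}
OUTPUT_NAMES = ("alu_op", "use_multiplier", "use_shifter")

# Assumed accelerator instruction set: a subset of the processor opcodes.
ACCELERATOR_OPCODES = {"ADD", "SUB", "MUL"}


def restrict(table, supported):
    """Delete the rows (input combinations) the accelerator never uses."""
    return {op: outputs for op, outputs in table.items() if op in supported}


def constant_outputs(table, names):
    """Return the outputs that take a single value across all remaining rows."""
    constants = {}
    for index, name in enumerate(names):
        values = {row[index] for row in table.values()}
        if len(values) == 1:
            constants[name] = values.pop()
    return constants


reduced_table = restrict(PROCESSOR_TABLE, ACCELERATOR_OPCODES)
fixed = constant_outputs(reduced_table, OUTPUT_NAMES)
```

In this example the reduced table has three rows instead of five, and `use_shifter` is fixed at 0, so any logic that takes `use_shifter` as an input could in turn be simplified.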

On the software side, it is necessary to change the base program (S8) and to develop a program that controls the accelerator (S9). The change to the base program replaces the part extracted for the accelerator with processing for communication with the accelerator, that is, transfer of data to the accelerator, activation of the accelerator, monitoring of the accelerator's operation, acceptance of a termination interrupt, and transfer of the results. The program that controls the accelerator, on the other hand, is not one that forms the desired control sequence by combining newly defined microinstructions. As described above, the instruction set of the accelerator is defined by restricting the control instructions of the processor and redefining its operation instructions, so the routine extracted for the accelerator can simply be rewritten accordingly. Since only the type of operation is changed, the software simulator of the accelerator can be created easily by changing the operation definitions in the software simulator of the processor. A dedicated simulator for an accelerator that adopts a unique microprogram sequence has in practice hardly ever been created, because its development and design require a great deal of time and effort. With the development method shown in FIG. 1, however, the software simulator of the accelerator is easy to develop, and the functional specifications of the accelerator can be verified at an early stage of design.

Verification of a system that integrates the processor and the accelerator can be performed easily by combining the software simulator of the processor with the software simulator of the accelerator (S10). The hardware and software simulator of the accelerator are derived from the processor hardware and software simulator, which have already been fully verified, with only restrictions of function and minor changes (replacement of the type of operation). Therefore, the possibility of human error in the hardware and software simulator of the accelerator is quite low. Since the control algorithm of the base program also is not changed and only special operations are introduced, the possibility of human error there is considerably lower than before. Moreover, because an integrated software simulation environment is provided, even if human errors occur they can be discovered and corrected early in the design process. In addition, the software simulation environment of the integrated system of processor and accelerator runs 100 to 10,000 times faster than a logic simulation environment, so verification can be carried out far more thoroughly. If the simulation environment of the integrated system is at the logic simulation level, only a few dozen simulation steps per second can be obtained. If the integrated system is verified by software simulation, on the other hand, thousands to hundreds of thousands of steps per second can be obtained. This dramatically increases the complexity of the functions that can be verified, resulting in extremely high design quality. Even a special condition that occurs once in hundreds of millions of steps can be verified within several hours by software simulation of the integrated system, whereas logic simulation would take tens of thousands of hours, which is unrealistic.
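The orders of magnitude in this comparison can be checked with a short calculation. The specific figures below are assumptions picked from within the ranges stated in the text (hundreds of millions of steps, a few dozen steps per second for logic simulation, up to hundreds of thousands of steps per second for software simulation); they are not values given in the specification.

```python
def simulation_hours(total_steps, steps_per_second):
    """Wall-clock hours needed to simulate total_steps at a given rate."""
    return total_steps / steps_per_second / 3600.0


# Assumed example values, chosen from within the ranges quoted in the text.
RARE_EVENT_STEPS = 900_000_000   # "hundreds of millions of steps"
LOGIC_SIM_RATE = 24              # "a few dozen" steps per second
SOFTWARE_SIM_RATE = 100_000      # "hundreds of thousands" of steps per second

logic_hours = simulation_hours(RARE_EVENT_STEPS, LOGIC_SIM_RATE)
software_hours = simulation_hours(RARE_EVENT_STEPS, SOFTWARE_SIM_RATE)
```

Under these assumptions the software simulation finishes in 2.5 hours, while the logic simulation needs more than 10,000 hours, which is consistent with the contrast drawn above.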

Next, a specific example of designing an accelerator from a certain processor will be described.

Fig. 2 shows an example of the target base program. Lines MSE_01 to MSE_14 form a routine that calculates the root-mean-square error. The number of repetitions is as large as 128, and the routine performs squaring, an operation that is not defined as a single instruction in the processor's instruction set. This routine is extracted as one suitable for implementation in the accelerator. Note that it is preferable to select a routine such that the part of the base program immediately following it does not need the accelerator's result. Even if the accelerator operates independently of the processor, the speed-up of data processing is small if the base program itself has to wait for the accelerator's result.
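The listing of Fig. 2 is not reproduced here, so the following Python sketch only illustrates the general shape of such a routine under stated assumptions: the function name, the argument names, and the exact loop body are hypothetical, and the final averaging and square root are left out. It shows why the routine is a natural candidate for extraction: a fixed count of 128 iterations, each using only subtraction, squaring, and addition.

```python
ITERATIONS = 128  # repetition count stated in the text


def sum_squared_error(reference, measured):
    """Accumulate squared differences over 128 samples (hypothetical routine)."""
    accumulator = 0
    for i in range(ITERATIONS):
        diff = reference[i] - measured[i]   # subtraction
        accumulator += diff * diff          # squaring, then addition
    return accumulator
```

For instance, with every reference sample equal to 3 and every measured sample equal to 1, each iteration contributes 4 and the routine returns 512.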

FIG. 3 illustrates an arithmetic circuit of the processor. The processor is equipped with a register file 10 having 16 registers R0 to R15, an ALU (Arithmetic Logic Unit) 11, and a multiplier (MULT) 12, and the multiplier 12 has a dedicated output register MAC 13.

The only arithmetic operations used in the root-mean-square routine extracted from FIG. 2 for the accelerator are addition, subtraction, and squaring. The registers used are the six registers R0, R4, R5, R10, R11, and MAC. Fig. 4 shows the arithmetic circuit of the accelerator designed on this basis. Since the register file 14 need only contain the registers that are actually used, it is limited to the five registers R0, R4, R5, R10, and R11, and the rest are deleted. The ALU 11 is replaced by an adder/subtractor 15, and since the multiplier 12 is used only for squaring, its inputs are short-circuited to form a squarer 16, which simplifies the circuit. In general, a squarer can be realized with a simpler circuit than a general-purpose multiplier. In addition, since the squarer 16 operates at high speed, the output register MAC 13 is deleted and the squarer output is connected to the general-purpose registers.
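To illustrate why an adder/subtractor and a squarer are enough for this kind of routine, the following Python sketch accumulates squared differences using only those three operations. It is not the program of Fig. 2 or Fig. 8: the function names are hypothetical, and any final scaling or root extraction is assumed to be handled outside the loop.

```python
def add(a: int, b: int) -> int:
    """Adder/subtractor in add mode."""
    return a + b


def sub(a: int, b: int) -> int:
    """Adder/subtractor in subtract mode."""
    return a - b


def sqa(a: int) -> int:
    """Squarer: behaves like a multiplier whose two inputs are tied together."""
    return a * a


def sum_squared_error(xs, ys):
    """Accumulate (x - y)^2 over all sample pairs using only add, sub and sqa."""
    acc = 0
    for x, y in zip(xs, ys):
        diff = sub(x, y)
        acc = add(acc, sqa(diff))
    return acc


print(sum_squared_error([1, 2, 3], [1, 0, 5]))
```

In the description, the loop body runs 128 times under the REPEAT instruction; the three-element lists here are only for demonstration.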

With the simplification of the functions described above, the number of control signal lines can be reduced. As shown in FIG. 3, the control signals of the processor are 12 bits, 5 bits, and 2 bits for the register file 10, the ALU 11, and the multiplier 12, respectively. The register file 10 needs 4 bits to select one of its 16 registers, and this selection is required for three independent systems in total (two plus one), giving 12 bits. In Fig. 4, the number of registers to be selected is reduced to five, so the register selection signal can be 3 bits for each of the 3 systems, for a total of 9 bits. In Fig. 3, the control signal for the ALU 11 requires 5 bits to select among arithmetic operations such as addition, subtraction, and shift and logical operations such as OR and AND. Since the only arithmetic processing performed in the accelerator is addition and subtraction, one bit is sufficient to select between them. In addition, in the accelerator shown in FIG. 4, the input/output control of the output register MAC 13 of the multiplier 12 becomes completely unnecessary.
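The bit counts in this paragraph follow from the number of items each selection signal has to distinguish. A minimal Python sketch of that arithmetic is given below; the helper names are hypothetical and the three selection systems per register file are taken from the text.

```python
def select_bits(n_choices: int) -> int:
    """Minimum number of bits needed to select one of n_choices items."""
    return max(1, (n_choices - 1).bit_length())


def register_select_bits(n_registers: int, n_systems: int = 3) -> int:
    """Total register-selection bits for n_systems independent selections."""
    return select_bits(n_registers) * n_systems


processor_reg_bits = register_select_bits(16)   # 16 registers, 3 systems
accelerator_reg_bits = register_select_bits(5)  # 5 registers, 3 systems
accelerator_alu_bits = select_bits(2)           # add or subtract

print(processor_reg_bits, accelerator_reg_bits, accelerator_alu_bits)
```

The sketch gives 12 bits and 9 bits for register selection in the processor and the accelerator, and 1 bit for the add/subtract choice, matching the figures in the description.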

FIG. 5 shows an example of a processor instruction set, and FIG. 6 shows an example of an instruction set for the accelerator. The 17 instructions supported by the processor are reduced to 7 instructions in the accelerator. The instruction codes of these seven instructions are the same as the instruction codes of the corresponding instruction numbers in the processor; that is, their bit arrangements are the same.

Of processor instruction numbers 1 to 6, only the REPEAT instruction (repeat instruction) of instruction number 6 is required in the accelerator. Of processor instruction numbers 7 to 11, the addition and subtraction instructions of instruction numbers 7 and 8 are retained, and the multiplication instruction MUL of instruction number 11 is replaced with the square operation instruction SQA and included in the instruction set of the accelerator. Needless to say, the multiplication instruction MUL of instruction number 11 and the square operation instruction SQA have the same bit arrangement. The instruction set of the accelerator does not include processor instruction numbers 12, 13, and 14 or the data transfer instruction, and does not include the load register instruction for the MAC register. Through this reduction in the number of supported instructions, in addition to the reduction of the control signal lines to the arithmetic circuit described above, the circuit size of the accelerator is reduced to less than half that of the processor. Needless to say, depending on the type of operation, the arithmetic circuit of the accelerator may be larger than that of the processor. This is in contrast to the control circuits of the accelerator, such as the sequence section and the instruction decoder, whose logic scale never exceeds that of the processor.
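The relationship between the two instruction sets can be expressed as a simple check. In the Python sketch below, only the instructions named in the text are listed; the mnemonics `ADD` and `SUB` are assumed names for the addition and subtraction instructions, and the helper functions are hypothetical. The actual bit patterns of Fig. 5 and Fig. 6 are not reproduced, so instruction numbers stand in for instruction codes.

```python
# Instruction number -> mnemonic (partial; only entries named in the text).
PROCESSOR_INSTRUCTIONS = {6: "REPEAT", 7: "ADD", 8: "SUB", 11: "MUL"}
ACCELERATOR_INSTRUCTIONS = {6: "REPEAT", 7: "ADD", 8: "SUB", 11: "SQA"}


def uses_only_processor_codes(accel: dict, proc: dict) -> bool:
    """True if every accelerator instruction reuses a code the processor has."""
    return set(accel) <= set(proc)


def redefined_instructions(accel: dict, proc: dict) -> list:
    """Codes that exist in both sets but produce a different result."""
    return sorted(n for n in accel if proc.get(n) != accel[n])


print(uses_only_processor_codes(ACCELERATOR_INSTRUCTIONS, PROCESSOR_INSTRUCTIONS))
print(redefined_instructions(ACCELERATOR_INSTRUCTIONS, PROCESSOR_INSTRUCTIONS))
```

Because no new codes are introduced, the instruction decoder of the accelerator never has to recognize a bit pattern that the decoder of the processor does not already handle; only the meaning attached to instruction number 11 changes.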

Fig. 7 shows a system LSI configured by combining the designed accelerator with a processor. The processor 20 includes a program counter (PC) 21, an arithmetic circuit (Ex) 22, an instruction decoder (Inst-DEC) 23, and a state machine 24 that generates the control procedure. The state machine 24 constitutes a sequence control unit that causes the program counter 21 to hold the address of the instruction to be executed next and causes the instruction decoder 23 to output control signals based on the instruction fetched in accordance with the instruction address held in the program counter 21. The arithmetic circuit 22 constitutes an execution unit having a register circuit and an arithmetic unit whose operations are controlled by the control signals.

In the same way, the accelerator 30 consists of a program counter (PC) 31, an arithmetic circuit (Ex) 32, an instruction decoder (Inst-DEC) 33, and a state machine 34 that generates the control procedure. The state machine 34 constitutes a sequence control unit that causes the program counter 31 to hold the address of the instruction to be executed next and causes the instruction decoder 33 to output control signals based on the instruction fetched in accordance with the instruction address held in the program counter 31. The arithmetic circuit 32 constitutes an execution unit having a register circuit and an arithmetic unit whose operations are controlled by the control signals. However, the functions of the constituent circuits of the accelerator 30 are limited in comparison with those of the processor 20, as described above.

The processor 20 is connected to the main memory 42 via the common address bus 40 and the common data bus 41, and operates by reading a program stored in the main memory 42. The accelerator 30 is connected to a local memory 43 via a local bus 44. When instructed to operate by the processor 20, the accelerator 30 reads and executes the program stored in the local memory 43. The data to be processed is also transferred to the local memory 43 in advance. In FIG. 7, data transfer between the main memory 42 and the local memory 43 is controlled by the processor 20.

After the transfer of the data and the like to the local memory 43 is completed, the processor 20 issues a start instruction to the accelerator 30 and then continues with other processing itself. The accelerator 30 notifies the processor 20 of the end of its processing by an interrupt or the like and then waits. Various methods have been proposed for connecting the processor 20 and the accelerator 30, and the connection may be made in whatever manner is optimal for the application system. In addition, the common buses 40 and 41 and the local bus 44 may be configured so that instructions and data are transferred on the same bus, or so that instructions and data are carried on separate buses.

In the example of FIG. 7, the system LSI 19 includes the processor 20, the accelerator 30, and the local memory 43, and is formed on one semiconductor chip. Although FIG. 7 does not show the address output paths for operand fetch in the processor 20 and the accelerator 30, in practice an address arithmetic unit separate from the program counters 21 and 31 is provided, and this address arithmetic unit outputs an address signal to the common address bus 40. Such an address arithmetic unit is provided in, for example, the arithmetic circuits 22 and 32.

Fig. 8 shows an example of the program of the accelerator. The program shown in the figure implements the processing of lines MSE-01 to MSE-14 extracted from the base program of FIG. 2. In the accelerator program of Fig. 8, the loop that is repeated 128 times by the REPEAT instruction takes 5 steps instead of 7. Processing that required 7 × 128 = 896 steps on the processor is thus reduced to 5 × 128 = 640 steps by moving it to the accelerator. In addition, the performance improvement is extremely large because the processor can perform other processing during that period.

FIG. 9 shows an example of the soft IP module data of the processor. The soft IP module data shown in the figure adopts a data structure that facilitates designing the accelerator as a subset, as described for the method of developing the system LSI.

In FIG. 9, the processor IP module data 50 includes control function definition information 51 and execution function definition information 52 as function definition information for defining the processor functions, together with control signal information 53.

The control function definition information 51 is data, written in a computer language, describing that a control block, which fetches and decodes the instructions included in the first instruction set and controls their execution procedure, is composed of, for example, a program counter, an instruction decoder, a sequence control unit, and the like. The execution function definition information 52 is data describing that an execution block, which performs arithmetic processing according to the control signals generated by the control block, is composed of a register file, an arithmetic unit, an internal bus, and the like. The control function definition information 51 and the execution function definition information 52 are written in a computer language such as an HDL, and each of the functions can be changed.

The control signal information 53 is information that clearly specifies a control signal between the control block and the execution block. That is, the meaning and the logical value of the control signal between the control block for controlling the operation sequence of the processor and the execution block for performing the data calculation are disclosed to the user of the IP module data. Such control signal information 53 is, for example, a register selection signal, a calculation type selection signal, a flag output, and the like.

As described above, the soft IP module data specifies the functions and operations of each block in a form convenient for reconfiguring the functions, and it makes the design constraints explicit. For example, the sequence control unit is described as a state transition diagram. To limit its functions when it is reused for an accelerator, unnecessary states or state transition paths are simply deleted. The instruction decoder is defined as a truth table that takes instruction codes, states, and the like as inputs and outputs control signals for register selection, operation type selection, and so on; removing the unnecessary control signals and performing logic synthesis again yields the logic of the instruction decoder of the accelerator. The same applies to the program counter: for example, functions may be removed from the HDL description and the logic re-synthesized. As for the execution unit, the number of registers and the number of ports of the register file are described and can be reduced as necessary, and for the arithmetic unit it suffices to replace the description of the arithmetic circuit, provided that no control signals are added. At this time, the redesign need only satisfy the constraints on characteristics such as delay specified in the soft IP module of the processor. It is convenient for verification if the constraints on characteristics such as delay are given in the form of a logic synthesis constraint file.
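The reduction steps described here, deleting states and transition paths and removing decoder outputs, can be pictured with plain data structures. The following Python sketch is a toy model under stated assumptions: the state names, signal names, and table contents are invented for illustration and do not come from the soft IP module data, and real designs would apply these reductions to HDL followed by logic synthesis.

```python
def prune_states(transitions: dict, kept_states: set) -> dict:
    """Delete unnecessary states and any transition paths that lead to them."""
    return {
        state: [nxt for nxt in next_states if nxt in kept_states]
        for state, next_states in transitions.items()
        if state in kept_states
    }


def reduce_decoder(truth_table: dict, kept_instructions: set, removed_signals: set) -> dict:
    """Keep only supported instructions and drop control signals no longer needed."""
    return {
        instr: {sig: val for sig, val in outputs.items() if sig not in removed_signals}
        for instr, outputs in truth_table.items()
        if instr in kept_instructions
    }


# Hypothetical processor descriptions (illustrative only).
PROCESSOR_STATES = {
    "FETCH": ["DECODE"],
    "DECODE": ["EXECUTE", "MAC_WAIT"],
    "EXECUTE": ["FETCH"],
    "MAC_WAIT": ["FETCH"],
}
PROCESSOR_DECODER = {
    "ADD": {"alu_op": "add", "reg_write": 1, "mac_write": 0},
    "SUB": {"alu_op": "sub", "reg_write": 1, "mac_write": 0},
    "AND": {"alu_op": "and", "reg_write": 1, "mac_write": 0},
}

accel_states = prune_states(PROCESSOR_STATES, {"FETCH", "DECODE", "EXECUTE"})
accel_decoder = reduce_decoder(PROCESSOR_DECODER, {"ADD", "SUB"}, {"mac_write"})
print(accel_states)
print(accel_decoder)
```

Both helpers only remove entries, which mirrors the constraint that the control block of the accelerator is a functional subset of the control block of the processor.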

When the processor design data is reused for the design of the accelerator, the control block is given only the necessary functions as a subset, without adding new functions, while the execution block can be given new functions for the arithmetic processing unique to the accelerator. Since the control signal information 53 clearly specifies the control signals between the control block and the execution block, it is easy to determine which of the control signals from the control block should be assigned to control a new arithmetic function in the execution block. Therefore, the use of design data such as the soft IP module data makes it extremely easy to design an accelerator.

FIG. 10 shows an example of a software simulator that can be used in the development method. The software simulator 60 of the processor is designed so that the control routine 61 has functions equivalent to those of the hardware. The data operation routine 62 is defined as a set of functions, one for each operation function, and is used through function calls from the control routine 61. In addition, the software simulator is combined with a user interface routine 63 and utility tools 64 such as an assembler, a disassembler, and a debugger. To generate the software simulator of the accelerator, it suffices at minimum to replace, in the data operation routine 62, the function definition of each operation that has been replaced. It is even more convenient if a syntax check for the removed instructions and operands is added to the assembler or the debugger. As for control, just as in the hardware, what is done is essentially only function reduction, so the control routine 61 does not need to be modified. As described above, the software simulator of the accelerator can be generated very easily.
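The split between an unchanged control routine and a replaceable data operation routine can be sketched as follows. This Python example is a toy model, not the simulator of FIG. 10: the instruction format `(code, dst, src_a, src_b)`, the use of instruction numbers 7, 8, and 11 as codes, and all function names are assumptions made for illustration.

```python
def control_routine(program, operations, registers):
    """Fetch each instruction in order and dispatch it to its operation function."""
    regs = list(registers)
    pc = 0
    while pc < len(program):
        code, dst, src_a, src_b = program[pc]
        regs[dst] = operations[code](regs[src_a], regs[src_b])
        pc += 1
    return regs


# Data operation routine of the processor simulator.
PROCESSOR_OPS = {
    7: lambda a, b: a + b,    # addition
    8: lambda a, b: a - b,    # subtraction
    11: lambda a, b: a * b,   # MUL
}

# Accelerator simulator: same control routine, one function definition replaced.
ACCELERATOR_OPS = dict(PROCESSOR_OPS)
ACCELERATOR_OPS[11] = lambda a, b: a * a   # SQA: second operand is ignored

program = [
    (8, 2, 0, 1),    # r2 = r0 - r1
    (11, 3, 2, 1),   # r3 = MUL(r2, r1) on the processor, SQA(r2) on the accelerator
]
print(control_routine(program, PROCESSOR_OPS, [5, 3, 0, 0]))
print(control_routine(program, ACCELERATOR_OPS, [5, 3, 0, 0]))
```

The same instruction code yields a different result in register 3 on the two simulators, while `control_routine` is shared without modification.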

By using the software simulator of the accelerator together with the software simulator of the processor, the logic of the accelerator can be verified both alone and in combination with the processor. Compared to a design environment in which no software simulator is provided, the design quality of the accelerator can be dramatically improved.

FIG. 11 shows an example of a computer, such as an engineering workstation, a personal computer, or a design apparatus, used in the development of the system LSI. The computer 70 shown in Fig. 11 is configured by connecting representatively shown peripheral devices such as a display 72, a keyboard 73, and a disk drive 74 to a main body 71 containing a processor board, on which a processor and memory are mounted, and various interface boards. The soft IP module data and the data of the software simulator of the processor are recorded on a recording medium 75 such as a magnetic tape, a floppy disk, a hard disk, a CD-ROM, or an MO (magneto-optical disk) so as to be readable by a computer. Although not particularly limited, the recording medium 75 is mounted on the disk drive 74, and the soft IP module data and the software simulator data stored on it are read into the computer main body 71. The computer main body 71 executes the program for controlling the system LSI development procedure shown in FIG. 1 (the system LSI development support program) using the soft IP module data and other data that have been read.

The system LSI development support program is a machine language program (object program) obtained by compiling a source program, which describes in a high-level language such as C the processing of the system LSI development method described with reference to Fig. 1 and elsewhere, into object code specific to the computer on which it is to run.

Although not particularly limited, the system LSI development support program is stored on a recording medium 76 illustrated in FIG. 11 so as to be readable by a computer. As the recording medium 76, a magnetic tape, a floppy disk, a hard disk, a CD-ROM, an MO (magneto-optical disk), or the like may be used.

Although not particularly limited, the recording medium 76 is mounted on the disk drive 74, and the system LSI development support program stored on it is read into the computer main body 71. For example, the system LSI development support program that has been read is loaded into the memory of the main body 71, and the system LSI development support operation described above is performed while the loaded program is decoded sequentially. Alternatively, the system LSI development support program read from the recording medium 76 may be installed on the magnetic recording medium of a hard disk device provided in the computer main body 71, and loaded into the memory from there and executed whenever needed. In this case, the system LSI development support program may be stored on the recording medium 76 in a compressed state and decompressed when it is installed on the hard disk.

In this way, by making hardware function description data such as the soft IP module data of the processor and design assets such as the software simulator of the processor available to the designer of the accelerator, a high-quality accelerator can be designed in a short time.

In the above description, the hardware of the control circuit is reduced in the design of the accelerator to the extent that the functions of the control circuit are limited. As a second example, a method that deliberately does not reduce the hardware of the control circuit may be adopted in the design of the accelerator. In the method shown in Fig. 1, even when the hardware is reduced, only unnecessary functions are removed from the circuit of a processor that has already been verified, so the method is effective. However, since the circuit is changed, at least some verification work is necessary, in which only the necessary ones of the simulations used in verifying the processor are rerun. If the designers of the processor and the designers of the accelerator are different, selecting only the necessary simulations may be time-consuming. In such a case, the control circuit of the processor may be used as it is, limiting the functions without reducing the hardware. Then there is no room for human error, and verification work is essentially unnecessary. Even if verification is performed just in case, it can be completed in a very short time.

Furthermore, as a third example, the arithmetic circuit of the processor may be applied to the accelerator without change. In this case, the effect of reducing the number of processing steps by defining an application-specific special operation cannot be obtained. On the other hand, as described above, verification of the arithmetic circuit is essentially unnecessary, and even if it is performed just in case, it can be completed in a very short time. Furthermore, since the simulator of the processor can be used as it is, there is no need to change the design of the simulator. In addition, the routine extracted from the base program for the accelerator becomes the control program of the accelerator as it is, and if the base program has been sufficiently verified, there is essentially no need to repeat the verification.

As a fourth example, one accelerator may be shared among a plurality of routines. For example, if the operation specific to the application is a double-precision operation, a configuration in which a double-precision arithmetic unit is provided in the arithmetic circuit of the accelerator is adopted. In a digital servo, double-precision arithmetic is used because 16 bits do not give sufficient arithmetic precision. However, since the control is not very complicated, a processor of 16 bits or less is sufficient. By applying the present invention, an accelerator that combines the control unit of an 8-bit processor with an arithmetic circuit of 24-bit or 32-bit precision may be realized in some cases. The processing performed in such an accelerator is varied, including filter processing of different orders and error calculation for feedback, so it is desirable to support a variety of processing sequences. For example, in FIG. 7, the program storage area of the local memory of the accelerator may be made a rewritable memory, and a program corresponding to the processing to be executed may be transferred to it before the accelerator is started. This has the effect that the processing algorithm of the accelerator can be changed dynamically while the system LSI is operating. It is effective, for example, when a digital servo dynamically adjusts the order of its filter according to the behavior of the controlled system. If the processing sequences can be predicted in advance, the local memory of the accelerator may instead be a read-only memory in which a plurality of processing programs are stored. By changing the start address of the program to be executed according to the required processing, an accelerator that can handle multiple kinds of processing is realized. Naturally, if the programs are stored in the read-only memory in advance, this is advantageous in terms of chip size, and there is no overhead for transferring the programs.
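Selecting among several programs held in a read-only local memory by start address can be modeled in a few lines. The Python sketch below is purely illustrative: the memory contents, the `END` marker, the start-address table, and the function name are all assumptions and do not correspond to any figure in this description.

```python
# Hypothetical read-only program memory holding two routines back to back.
LOCAL_ROM = (
    "filter_step_1",   # address 0: start of a filter routine
    "filter_step_2",
    "END",
    "error_step_1",    # address 3: start of an error calculation routine
    "END",
)

# Hypothetical table of start addresses, one per kind of processing.
START_ADDRESS = {"filter": 0, "error": 3}


def run_from(rom, start_address):
    """Execute (here: simply record) instructions from start_address until END."""
    executed = []
    pc = start_address
    while rom[pc] != "END":
        executed.append(rom[pc])
        pc += 1
    return executed


print(run_from(LOCAL_ROM, START_ADDRESS["filter"]))
print(run_from(LOCAL_ROM, START_ADDRESS["error"]))
```

In this model the processor only has to supply a different start address to obtain a different processing sequence, so no program transfer is needed before each start.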

As a fifth example, a system LSI may be configured with a plurality of accelerators. As shown in Fig. 12, the configuration of Fig. 7 is extended to include a plurality of accelerators 30a and 30b. The individual accelerators 30a and 30b have their own local memories 43a and 43b, so that the accelerators 30a and 30b can operate simultaneously in parallel. In a multitasking system, the operating system can run on the processor 20 and each task can run on an individual accelerator 30a or 30b. Time management becomes easier because an accelerator is not disturbed by another accelerator performing a different task. If further tasks are added, accelerators can easily be added as well, because the functions are only weakly coupled. It is also easy to support operations specific to each task. In a multiprocessor system built from identical processors, by contrast, the addition of special operation instructions is inherently limited by restrictions on instruction codes and raises compatibility problems, so it is virtually impossible.

As a sixth example, methods of starting and stopping the accelerator are described. The accelerator may be started by a start instruction or by writing parameters to a dedicated control register. In complex systems such as those of the fourth and fifth examples, starting by writing control parameters is preferable because it is not restricted by the instruction set of the processor. For example, if dozens of types of processing on one or more accelerators were to be started directly by a start instruction, an identifier of about 5 bits would have to be embedded in the instruction code, which limits the freedom of the instruction set. With control register access, the number of bits in the control register can be chosen freely, so no restriction is imposed on the instruction set of the processor.

The end of processing may be signaled by an interrupt or by a status register. Although the use of interrupts is limited by the number of interrupts the processor can accept, it has the effect that the transition from the end of processing to the next processing can be made quickly. If the number of interrupts accepted by the processor is less than the number of interrupts to be issued by the accelerators, the problem can be solved by a conventional method such as taking the logical sum of the interrupts from multiple accelerators, inputting it to the processor, and determining the interrupt source in the interrupt processing routine of the processor. On the other hand, in systems where the overhead of interrupts is a burden, the processor should monitor the state of the accelerator through the status register.
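The start and end signaling described in this example can be modeled as follows. The Python sketch is a behavioral toy under stated assumptions: the class, its register names, and the control values are hypothetical, and the labels "30a" and "30b" are borrowed from Fig. 12 only to name the two instances.

```python
class AcceleratorModel:
    """Toy model of an accelerator with a control register and a status flag."""

    def __init__(self, name: str):
        self.name = name
        self.control_register = 0
        self.status_done = False

    def write_control(self, value: int) -> None:
        """Starting by register write: the value selects the processing to run."""
        self.control_register = value
        self.status_done = False

    def finish(self) -> None:
        """Called when processing ends; raises the done status."""
        self.status_done = True


def combined_interrupt(accelerators) -> bool:
    """Logical sum (OR) of the end-of-processing signals on one interrupt input."""
    return any(acc.status_done for acc in accelerators)


def find_interrupt_sources(accelerators) -> list:
    """What an interrupt routine would do: read each status to find the source."""
    return [acc.name for acc in accelerators if acc.status_done]


accels = [AcceleratorModel("30a"), AcceleratorModel("30b")]
accels[0].write_control(0x1234)   # register width is not tied to any instruction code
accels[1].write_control(7)
before = combined_interrupt(accels)
accels[1].finish()
after = combined_interrupt(accels)
sources = find_interrupt_sources(accels)
print(before, after, sources)
```

The same status flags also support the polling alternative: a processor that wants to avoid interrupt overhead can call `find_interrupt_sources` periodically instead of waiting for the combined interrupt.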

Each of the second to sixth examples has the above-described unique effects, but a combination of a plurality of examples is also sufficiently effective.

According to the above, the design of an accelerator can be realized in a short period of time and with high quality.

By providing an instruction set that is quasi-compatible with the processor, the software development environment can be easily provided for developing control programs for the accelerator.

By combining the development environment of the accelerator with the development environment of the processor, the integrated functions and performance of the system LSI can be verified in a short time and with high reliability. The time required to develop the accelerator can be reduced, and a high-quality system LSI can be provided through integrated verification from the early stages of design.

The invention made by the present inventor has been specifically described based on the embodiments, but the present invention is not limited thereto, and can be variously modified without departing from the gist thereof.

For example, the number of accelerators mounted on a semiconductor integrated circuit such as the system LSI may be three or more. The main memory may be included in a semiconductor integrated circuit such as a system LSI. Conversely, the local memory may be external, although the number of external terminals (external pins) of the semiconductor integrated circuit then increases. A semiconductor integrated circuit such as a system LSI may include other peripheral circuits. The function description language may be RTL (Register Transfer Language). The content of HDL is standardized as IEEE 1364. The processor may be a superscalar processor.

Industrial applicability

The present invention can be widely applied not only to the development of semiconductor integrated circuits called system LSIs, but also to the development of various data processing LSIs and logic LSIs such as those called single-chip microcomputers and data processors.

Claims

The scope of the claims

1. A method for developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing instructions included in a first instruction set, the method comprising: designing the accelerator so that it can execute instructions of a second instruction set including backward compatible instructions, each of which is either the same as an instruction included in the first instruction set or has the same instruction code as an instruction included in the first instruction set but a different execution result.

2. A method for developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing instructions included in a first instruction set, the method comprising: a process of designing an accelerator for realizing a part of the processing extracted from a base program developed to operate on the processor, using a second instruction set including backward compatible instructions, each of which is either the same as an instruction included in the first instruction set or has the same instruction code as an instruction included in the first instruction set but a different execution result; and

a process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing.

3. A method for developing a semiconductor integrated circuit, comprising: a process of extracting a part of the processing of a base program developed to operate on a processor having a first instruction set;

a process of designing a logical configuration of an accelerator for realizing the extracted processing using a second instruction set including backward compatible instructions, each of which is either the same as an instruction included in the first instruction set or has the same instruction code as an instruction included in the first instruction set but a different execution result, and a process of designing an accelerator control program for causing the accelerator to execute the extracted processing;

a process of changing the base program so that the extracted processing by the base program is replaced with processing by the accelerator control program; and

a process of performing an integrated simulation of the processor operating according to the changed base program and the accelerator operating according to the accelerator control program.

4. The method for developing a semiconductor integrated circuit according to claim 3, wherein the process of designing the accelerator control program includes a process of verifying the accelerator control program using a software simulator of the accelerator, and the software simulator of the accelerator is obtained by partially changing a software simulator of the processor.

5. The method for developing a semiconductor integrated circuit according to claim 4, wherein the process of performing the integrated simulation is performed using an integrated simulator, and the integrated simulator is a combination of the software simulator of the accelerator and the software simulator of the processor.

6. A semiconductor integrated circuit comprising: a processor having a control unit that fetches and decodes the instructions included in a first instruction set and controls the execution procedure of the instructions included in the first instruction set, and an execution unit that performs arithmetic processing according to a control signal generated by the control unit; and

an accelerator having a control unit that fetches and decodes the instructions included in a second instruction set and controls the execution procedure of the instructions included in the second instruction set, and an execution unit that performs arithmetic processing according to a control signal generated by the control unit, wherein the second instruction set includes backward compatible instructions, each of which is either the same as an instruction included in the first instruction set or has the same instruction code as an instruction included in the first instruction set but a different execution result, and

the control unit of the accelerator has a logic function in which a part of the logic function of the control unit of the processor is deleted.

7. A semiconductor integrated circuit comprising: a processor having a control unit that fetches and decodes the instructions included in a first instruction set and controls the execution procedure of the instructions included in the first instruction set, and an execution unit that performs arithmetic processing according to a control signal generated by the control unit; and

an accelerator having a control unit that fetches and decodes the instructions included in a second instruction set and controls the execution procedure of the instructions included in the second instruction set, and an execution unit that performs arithmetic processing according to a control signal generated by the control unit, wherein

the second instruction set includes backward compatible instructions, each of which is either the same as an instruction included in the first instruction set or has the same instruction code as an instruction included in the first instruction set but a different execution result, and

the execution unit of the accelerator is configured such that, relative to the execution function of the execution unit of the processor, a part of the execution function is deleted and a new execution function is added.

9. The semiconductor integrated circuit according to claim 7, wherein the control unit of the accelerator has a logic function in which a part of the logic function of the control unit of the processor is deleted.

The control unit has a program counter, an instruction decoder, and a sequence control means. The sequence control means causes the program counter to store an instruction address to be executed next, and stores an instruction address held by the program counter. The instruction fetched based on the address is decoded by an instruction decoder to output a control signal.

8. The semiconductor integrated circuit according to claim 6, wherein the execution unit includes a register circuit and an operation unit whose operations are controlled by the control signal.
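A control unit and an execution unit divided in this way can be modeled in a few lines. The following is a sketch under assumptions: the 16-bit instruction encoding, the four mnemonics, and the names `encode`, `decode`, `ControlUnit`, and `ExecutionUnit` are all hypothetical and chosen only to show the program counter, the instruction decoder, the sequence control, and a register circuit and operation unit driven by the decoded control signal.

```python
from dataclasses import dataclass, field

# Hypothetical encoding: [15:12] opcode, [11:10] dst, [9:8] src, [7:0] imm.
OPCODES = {"HALT": 0, "LOADI": 1, "ADD": 2, "JMP": 3}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(op, dst=0, src=0, imm=0):
    """Pack one instruction word in the hypothetical encoding."""
    return (OPCODES[op] << 12) | (dst << 10) | (src << 8) | (imm & 0xFF)


@dataclass
class ControlSignal:
    op: str   # operation selected in the operation unit
    dst: int  # destination register index
    src: int  # source register index
    imm: int  # immediate operand or jump target


def decode(word):
    """Instruction decoder: turn an instruction word into a control signal."""
    return ControlSignal(MNEMONICS[word >> 12], (word >> 10) & 3,
                         (word >> 8) & 3, word & 0xFF)


@dataclass
class ExecutionUnit:
    """Register circuit plus operation unit, driven only by control signals."""
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])

    def apply(self, sig):
        if sig.op == "LOADI":
            self.regs[sig.dst] = sig.imm
        elif sig.op == "ADD":
            self.regs[sig.dst] = (self.regs[sig.dst] + self.regs[sig.src]) & 0xFFFF


@dataclass
class ControlUnit:
    pc: int = 0  # program counter

    def step(self, memory, eu):
        sig = decode(memory[self.pc])  # fetch at the address held by the PC
        if sig.op == "HALT":
            return False
        eu.apply(sig)
        # Sequence control: choose the address of the next instruction.
        self.pc = sig.imm if sig.op == "JMP" else self.pc + 1
        return True


def run(memory):
    """Execute until HALT and return the final register contents."""
    cu, eu = ControlUnit(), ExecutionUnit()
    while cu.step(memory, eu):
        pass
    return eu.regs
```

For example, `run([encode("LOADI", dst=0, imm=5), encode("LOADI", dst=1, imm=7), encode("ADD", dst=0, src=1), encode("HALT")])` leaves the sum of the two immediates in register 0.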

0. A process of designing a logical configuration of an accelerator for realizing a part of the processing, extracted from a base program developed to operate on a processor having a first instruction set, by using a second instruction set that includes instructions identical to instructions included in the first instruction set or backward-compatible instructions that have the same instruction code as an instruction included in the first instruction set but a different execution result,

A process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing. A recording medium characterized in that a program for causing a computer to execute these processes is recorded so as to be readable by the computer.

1. A process of designing a logical configuration of an accelerator for realizing a part of the processing, extracted from a base program developed to operate on a processor having a first instruction set, by using a second instruction set that includes instructions identical to instructions included in the first instruction set or backward-compatible instructions that have the same instruction code as an instruction included in the first instruction set but a different execution result,

A process of designing an accelerator control program for causing the accelerator to execute the extracted part of the process; and

A process of changing the base program so that the extracted part of the processing in the base program is replaced by processing by the accelerator control program; and a process of performing an integrated simulation of the processor based on the changed base program and the accelerator based on the accelerator control program. A recording medium characterized by storing a program for causing a computer to execute these processes.
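The sequence of processes in this claim can be sketched as follows. Everything in the sketch is an illustrative assumption: the function names, the sum-of-squares kernel chosen as the extracted part, the `AcceleratorSim` stand-in for an accelerator model, and in particular the choice to compare the changed base program against the original one, which the claim itself does not prescribe.

```python
def hot_kernel(samples):
    """Part of the base program selected for extraction (hypothetical)."""
    return sum(x * x for x in samples)


def base_program(samples, kernel=hot_kernel):
    """Base program written for the processor; `kernel` is the extracted part."""
    scaled = [x * 2 for x in samples]  # processing that stays on the processor
    return kernel(scaled) + 1          # +1 stands in for later processor work


class AcceleratorSim:
    """Stand-in for a software model of the accelerator (hypothetical)."""

    def __init__(self):
        self.acc = 0

    def mac(self, a, b):
        # Multiply-accumulate, assumed here to be the accelerator's operation.
        self.acc += a * b


def accelerator_control_program(samples):
    """Drives the accelerator model to perform the extracted processing."""
    accelerator = AcceleratorSim()
    for x in samples:
        accelerator.mac(x, x)
    return accelerator.acc


def modified_base_program(samples):
    """Base program with the extracted part replaced by the accelerator."""
    return base_program(samples, kernel=accelerator_control_program)


def integrated_simulation(test_vectors):
    """Run both versions and return the inputs on which they disagree."""
    return [v for v in test_vectors
            if base_program(v) != modified_base_program(v)]
```

An empty list returned by `integrated_simulation` indicates that, for the supplied test vectors, moving the extracted part to the accelerator did not change the result of the base program.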

2. The recording medium according to claim 11, wherein the process of designing the accelerator control program includes a process of verifying the accelerator control program by using a software simulator of the accelerator, and the software simulator of the accelerator is obtained by changing a part of a software simulator of the processor.

3. The recording medium according to claim 12, wherein the process of performing the integrated simulation is performed by using an integrated simulator, and the integrated simulator is a combination of the software simulator of the accelerator and the software simulator of the processor.

4. Function definition information that defines the functions of a processor, and control signal information, are recorded so as to be readable by a computer,

The function definition information separately and changeably includes first information that defines a function of a control unit that fetches and decodes instructions included in the first instruction set and controls an execution procedure of the instructions included in the first instruction set, and second information that defines a function of an execution unit that performs arithmetic processing according to a control signal generated by the control unit,

The recording medium according to claim 1, wherein the control signal information is information specifying a control signal between the control unit and the execution unit.
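One way to picture design data organized in this manner is sketched below. The data layout, the signal names, and the `check` function are assumptions made for illustration, not the format used in the specification. The sketch keeps the control-unit definition and the execution-unit definition as separate parts, treats the control signal information as the interface between them, and shows that each part can be changed on its own and then checked against that interface.

```python
from dataclasses import dataclass, replace


@dataclass
class FunctionDefinition:
    control: dict    # first information: opcode -> control signal name
    execution: dict  # second information: control signal name -> operation


# Control signal information: the signals allowed between the two units.
# The names are hypothetical.
CONTROL_SIGNALS = {"alu_add", "alu_sub", "alu_mul"}


def check(defn, signals=CONTROL_SIGNALS):
    """List places where either part disagrees with the signal interface."""
    problems = []
    for opcode, sig in sorted(defn.control.items()):
        if sig not in signals:
            problems.append(f"opcode {opcode:#04x} emits undeclared signal {sig}")
        elif sig not in defn.execution:
            problems.append(f"signal {sig} has no execution function")
    return problems


PROCESSOR = FunctionDefinition(
    control={0x01: "alu_add", 0x02: "alu_sub", 0x03: "alu_mul"},
    execution={
        "alu_add": lambda a, b: a + b,
        "alu_sub": lambda a, b: a - b,
        "alu_mul": lambda a, b: a * b,
    },
)

# Change the control part only: stop decoding the subtract instruction.
accel_control = {op: s for op, s in PROCESSOR.control.items() if s != "alu_sub"}

# Change the execution part only: drop subtract, alter the multiply result.
accel_execution = {s: f for s, f in PROCESSOR.execution.items() if s != "alu_sub"}
accel_execution["alu_mul"] = lambda a, b: (a * b) & 0xFFFF

ACCELERATOR = replace(PROCESSOR, control=accel_control, execution=accel_execution)
```

Because both parts refer only to signal names, `check(ACCELERATOR)` can confirm that the reduced control definition still drives only signals that the changed execution definition implements.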