WO2000068782A1 - Method for developing semiconductor integrated circuit - Google Patents

Info

Publication number
WO2000068782A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
instruction set
processor
program
accelerator
Prior art date
Application number
PCT/JP1999/002348
Other languages
French (fr)
Japanese (ja)
Inventor
Hirotsugu Kojima
Tadaaki Tanimoto
Haruo Kamimaki
Tetsuya Nakagawa
Yuki Inoue
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd.
Priority to PCT/JP1999/002348
Publication of WO2000068782A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor

Definitions

  • The present invention relates to a method for developing a semiconductor integrated circuit by combining a processor and an accelerator, and to a recording medium storing design data and a development program used in the development method. It relates to technology that is effective when applied, for example, to a method of developing a system LSI (Large Scale Integrated circuit) using an ASIC (Application Specific Integrated Circuit) approach.
  • In the design of a semiconductor integrated circuit, a functional unit to be provided in the circuit, such as an arithmetic function or a signal control function, is called a block (module).
  • When the layout design of such a block on the semiconductor substrate has been completed and data representing the mask patterns for forming that layout is provided to the chip designer as a component, the block is called a hard macro block (hard module).
  • A hard macro block is also called a hard IP (Intellectual Property) module.
  • For a hard IP module, a description of the function in an HDL (Hardware Description Language), the mask pattern data that represents the layout of the circuit on the semiconductor substrate, and the like are provided.
  • In contrast to a hard macro block, there is a so-called soft macro block.
  • In a soft macro block, the function of the block (circuit) is specified by a description such as HDL, and the description is provided as a component to the chip designer.
  • Such a soft macro block is also called a soft IP module, in contrast to a hard IP module.
  • The circuit scale of macro blocks such as the hard IP module and the soft IP module may extend to functional units such as an SRAM (Static Random Access Memory), a DRAM (Dynamic Random Access Memory), a CPU (Central Processing Unit), and a DSP (Digital Signal Processor).
  • The present inventor studied the case where an ASIC-type system LSI is developed using a soft IP module of a processor. Specifically, the inventor considered the case where the required performance cannot be obtained if all of the required processing is realized by software on the processor, so that part of the processing is realized by dedicated hardware such as an accelerator.
  • Here, an accelerator means a dedicated circuit that takes over a part of the functions that can be realized on the processor.
  • the design method of the system LSI studied by the inventor is shown in FIG.
  • The contents of FIG. 13 are not publicly known.
  • a program (base program) that runs on the processor is developed.
  • The base program is developed using the program development environment of the processor, that is, development tools such as a compiler, a software simulator, and a debugger. Verification that the developed base program satisfies the functional specifications is performed at this stage.
  • the time constraint to be verified at this stage is a constraint determined by the relationship between the system LSI and the outside world, and may be as long as a few milliseconds to a few seconds.
  • Software simulation is generally thousands to tens of thousands of times faster than logic simulation, and is useful for such verification.
  • The hardware side of the accelerator is completed through the design of arithmetic circuits, the definition of microinstructions, and the design of control circuits.
  • On the software side, it is necessary to modify the base program and to develop a microprogram that controls the accelerator.
  • The change to the base program replaces the part extracted for the accelerator with processing for communicating with the accelerator.
  • The microprogram that controls the accelerator must form the desired control sequence by combining the defined microinstructions.
  • The design of an accelerator is usually divided into a part that performs the data calculation and a part that controls the sequence.
  • The design of the part that performs data operations is relatively easy, but the design of the part that controls the sequence is complicated and time-consuming even for engineers with expertise in LSI design. This is a major burden when developing system LSIs in a short time.
  • the inventor has found the following problems in the above development method.
  • The first problem is that there is no development environment for the microprograms that control the accelerator. Since the microprograms and control circuits of an accelerator are designed optimally for each application, that is, individually, a software development environment for such individually designed microprograms, namely a software simulator corresponding to the defined microinstructions, is usually not provided for reasons of time and cost. Verification that the microprogram operates correctly is therefore performed by logic simulation or the like at the hardware design stage. At this stage, not only defects in the microprogram itself but also defects in the definition of the microinstructions and in the control circuits are often found, causing significant design rework. In short, the prolonged development period of the semiconductor integrated circuit results from freely permitting an original design for the control circuit and the like of the accelerator.
  • The second problem is the difficulty of verifying the integrated functions and performance of the system LSI. Even if the function of the base program is verified at the initial stage, the only means of verifying whether the entire system LSI operates correctly as a whole, for example whether the function is correctly realized by the accelerator and whether the program change for communication with the accelerator is correct, is a simulator for hardware design such as a logic simulator. In general, hardware simulation is several orders of magnitude slower than the software simulation of a software development tool, so detailed verification at the system level is practically impossible. This difficulty of integrated verification as a system LSI also arises because an original design is permitted for the accelerator.
  • the present inventor has developed a software simulation of a processor which has already been provided.
  • Still another object of the present invention is to enable the above-mentioned development method to be immediately executed using a computer.
  • Still another object of the present invention is to make it possible to easily realize the above development method from the viewpoint of design data or design assets.
  • Another object of the present invention is to provide a semiconductor integrated circuit that speeds up data processing by combining a processor and an accelerator, and in which the development cost is reduced and the operation reliability is improved.
  • The method for developing a semiconductor integrated circuit according to the present invention includes, for a processor having a first instruction set, a process of designing an accelerator so that it can execute instructions of a second instruction set, where the second instruction set includes the same instructions as instructions included in the first instruction set, or lower-compatible instructions that have the same instruction code as an instruction included in the first instruction set and a different execution result.
  • the processor is, for example, a reduced instruction set computer (RISC) architecture, and basically executes each instruction in one clock cycle.
  • RISC reduced instruction set computer
  • The configuration in which the second instruction set includes the same or lower-compatible instructions with respect to the instructions included in the first instruction set means that the logic that controls instruction fetching, decoding, and the instruction execution procedure in the accelerator is realized by reducing functions of the control logic of the processor, without adding new functions to it. That is, the accelerator is configured as a subset of the processor. For example, the code of the addition instruction included in the first instruction set is assigned to a Galois field multiplication function in the second instruction set. The types of control signals obtained by decoding the instruction code are the same as the decoding result in the processor, but these control signals are diverted to the arithmetic control signals of a Galois field multiplier added to the accelerator.
  • the Galois field multiplier will adopt a logic configuration that can perform Galois field multiplication using the control signal.
  • If the processor has a RISC architecture, it is very easy to use the control signals obtained by decoding the instruction code to control an arithmetic unit that differs from those of the processor, such as a Galois field multiplier.
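The idea of reusing an instruction code with a different execution result can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the opcode value `OP_ADD`, the choice of the field GF(2^8) with reduction polynomial 0x11B, and the table-based dispatch are not taken from the patent, which only states that the addition instruction code is reassigned to a Galois field multiplication.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two elements of GF(2^8) (assumed field, assumed polynomial)."""
    result = 0
    for _ in range(8):
        if b & 1:           # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:       # reduce modulo the field polynomial
            a ^= poly
    return result


OP_ADD = 0x1  # hypothetical instruction code shared by both instruction sets

# Same instruction code, different execution result in each execution unit.
PROCESSOR_EXEC = {OP_ADD: lambda x, y: (x + y) & 0xFFFFFFFF}
ACCELERATOR_EXEC = {OP_ADD: gf256_mul}


def execute(exec_table, opcode, x, y):
    """Dispatch one two-operand instruction through the given execution unit."""
    return exec_table[opcode](x, y)
```

The decode step (looking up `opcode`) is identical for both tables; only the operation bound to the code differs, which mirrors the claim that the control logic is reused while the arithmetic unit is replaced.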
  • a semiconductor integrated circuit developed by such a method will realize a reduction in development cost and an improvement in operation reliability.
  • A method of developing a semiconductor integrated circuit in a more detailed embodiment according to the present invention includes: a process of extracting a part of the processing of a base program developed to operate on a processor having a first instruction set; a process of designing a logical configuration of an accelerator for realizing the extracted part of the processing using a second instruction set that includes the same instructions as instructions included in the first instruction set, or lower-compatible instructions having the same instruction code as an instruction included in the first instruction set and a different execution result; and a process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing.
  • The software simulator of the accelerator can be obtained by changing a part of the software simulator of the processor.
  • The process of performing the integrated simulation is performed using an integrated simulator, and the integrated simulator can be obtained by combining the software simulator of the accelerator with the software simulator of the processor.
  • A semiconductor integrated circuit designed using the above-described development method includes a processor and an accelerator. The processor has a control unit that fetches and decodes instructions included in the first instruction set and controls the execution procedure of those instructions, and an execution unit that performs arithmetic processing in accordance with the control signals generated by the control unit.
  • The accelerator has a control unit that fetches and decodes instructions included in the second instruction set and controls the execution procedure of those instructions, and an execution unit that performs arithmetic processing in accordance with the control signals generated by that control unit.
  • The second instruction set includes the same instructions as instructions included in the first instruction set, or lower-compatible instructions that have the same instruction code as an instruction included in the first instruction set and a different execution result.
  • the control unit of the accelerator has a logical function in which a part of the logical function in the control unit of the processor is deleted.
  • Alternatively, a semiconductor integrated circuit includes a processor and an accelerator similar to the above, the second instruction set includes the same instructions as instructions included in the first instruction set or the lower-compatible instructions described above, and the execution unit of the accelerator may adopt a configuration in which some execution functions are deleted from the execution unit of the processor and new execution functions are added in their place.
  • the control unit includes, for example, a program counter, an instruction decoder, and a sequencer.
  • The sequence control means causes the program counter to hold the address of the instruction to be executed next, and causes the instruction decoder to output a control signal based on the instruction fetched according to the instruction address held by the program counter.
  • the execution unit includes a register circuit whose operation is controlled by the control signal and an arithmetic unit.
  • A recording medium storing a program to be executed by a computer to implement the development method records, in a computer-readable form, a program for causing the computer to execute: a process of designing a logical configuration of an accelerator for realizing a part of the processing, extracted from a base program developed to operate on a processor having a first instruction set, by using a second instruction set that includes the same instructions as instructions included in the first instruction set or lower-compatible instructions having the same instruction code and a different execution result; and a process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing.
  • The program may further include a process of changing the base program so that the extracted part of the processing by the base program is replaced with processing by the accelerator control program.
  • The program may further include a process of performing an integrated simulation of the processor running the changed base program and the accelerator running the accelerator control program.
  • design data such as IP module data of a processor can be used.
  • When the design data is to be used for the development of an accelerator that is positioned as a subset of the processor, the design data contains function definition information for defining the functions of the processor, and control signal information. This information is recorded on a recording medium in a computer-readable manner and provided.
  • The function definition information includes first definition information that defines the function of a control unit that fetches and decodes the instructions included in the first instruction set and controls the execution procedure of those instructions, and second definition information that defines the function of an execution unit that performs arithmetic processing in accordance with the control signal generated by the control unit. Each defines its function in a form that can be changed.
  • FIG. 1 is a flowchart showing an example of a method for developing a semiconductor integrated circuit according to the present invention.
  • FIG. 2 is an explanatory diagram showing an example of a base program.
  • FIG. 3 is a block diagram showing an example of an arithmetic circuit of the processor.
  • FIG. 4 is a block diagram showing an example of the arithmetic circuit of the accelerator.
  • FIG. 5 is an explanatory diagram illustrating an instruction set of a processor.
  • FIG. 6 is an explanatory diagram exemplifying an instruction set of the accelerator.
  • FIG. 7 is a block diagram of a system LSI configured by combining a processor with an accelerator.
  • FIG. 8 is an explanatory diagram showing an example of a program for the accelerator.
  • FIG. 9 is an explanatory diagram showing an example of the IP module data of the processor.
  • FIG. 10 is an explanatory diagram showing an example of the software simulation of the processor.
  • FIG. 11 is a perspective view showing an example of a computer used for developing the system LSI.
  • FIG. 12 is a block diagram of a system LSI having two accelerators together with a processor.
  • FIG. 13 is a flowchart showing a method of designing the system LSI studied by the inventor previously.
  • FIG. 1 is a flowchart showing an example of a method for developing a semiconductor integrated circuit according to the present invention.
  • a program that runs on the processor is developed (S1).
  • the target processor here is in the form of a reduced instruction set computer (RISC).
  • the base program is developed using a program development environment of a processor, that is, a development tool such as a compiler, a software simulator, a debugger, or the like. Verification that the developed base program satisfies the functional specifications is performed at this stage using a software simulator.
  • the time constraints to be verified at this stage are constraints determined by the relationship between the system LSI and the outside world.
  • the software simulation is generally thousands to tens of thousands of times faster than the logic simulation, and is useful for such verification.
  • If the time constraints can be satisfied, the system LSI is realized using the processor as it is (S3). If the time constraints cannot be satisfied, the system LSI is realized with its processing capacity improved by introducing an accelerator.
  • a processing routine to be realized in the accelerator is extracted from a base program (S4). The extraction criterion focuses on the fact that it is a bottleneck in data processing performance when implemented on a processor, such as simple repetition of processing many times or application-specific operations.
  • step S4 if an operation specific to the application has been extracted, an arithmetic circuit suitable for the extracted operation is designed (S5).
  • The operations specific to the application include, for example, floating-point operations and double-precision operations in computer graphics, root-mean-square operations in error calculation, and Gaussian operations in error correction. Such special operations are often not directly supported by general-purpose processors and are implemented by combining existing instructions, and therefore often require multiple instruction processing steps.
  • Next, the instruction set for the accelerator is defined (S6).
  • The accelerator instruction set is defined by redefining arithmetic operation instructions of the processor as special operations and by restricting the control instructions to the minimum necessary set. In other words, a special operation is supported by replacing only the type of operation, without changing the number or type of operands in the operation instruction of the processor.
  • The instruction set of the accelerator (second instruction set) includes the same instructions as instructions included in the instruction set of the processor (first instruction set), or lower-compatible instructions that have the same instruction code as an instruction included in the first instruction set and a different execution result.
  • The functions of the control logic circuit of the processor are then limited so as to conform to the accelerator instruction set defined above, and the control logic circuit of the accelerator is thereby designed (S7).
  • The relationship between the control logic circuit of the accelerator and the control logic circuit of the processor is as described above. If the processor's control logic circuit is given in a high-level description such as a register transfer level description, the unsupported functions are deleted from the description and the control logic circuit is re-synthesized. If it is defined by truth tables, limiting the functions means that unnecessary combinations of input values are deleted, which reduces the size of the truth table. An output whose value no longer changes after this reduction in turn allows the reduction of any other truth table that takes that value as input, so that eventually the circuit size of the entire control logic circuit is reduced.
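The truth-table reduction described above can be sketched as follows. The table contents, the signal names `alu_op` and `mac_write`, and the instruction names are hypothetical and serve only to show the mechanism: deleting unsupported input rows, then detecting outputs that have become constant and can therefore simplify downstream logic.

```python
# Hypothetical decoder truth table: instruction name -> control signal values.
PROCESSOR_DECODER = {
    "ADD": {"alu_op": 0, "mac_write": 0},
    "SUB": {"alu_op": 1, "mac_write": 0},
    "AND": {"alu_op": 2, "mac_write": 0},
    "MUL": {"alu_op": 0, "mac_write": 1},
}


def restrict(table, supported):
    """Delete the rows (input combinations) that are no longer supported."""
    return {name: row for name, row in table.items() if name in supported}


def constant_outputs(table):
    """Return the outputs that take the same value in every remaining row."""
    rows = list(table.values())
    return {sig: val for sig, val in rows[0].items()
            if all(row[sig] == val for row in rows)}


# Accelerator decoder obtained purely by deletion, never by addition.
accelerator_decoder = restrict(PROCESSOR_DECODER, {"ADD", "SUB"})
```

In this toy table, `mac_write` becomes constant once `MUL` is removed, so any logic driven by it could be tied off during re-synthesis.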
  • Verification of the system that integrates the processor and the accelerator can be performed easily by combining the software simulator of the processor with the software simulator of the accelerator (S10).
  • The hardware and the software simulator of the accelerator are derived from the processor hardware and software simulator, which have already been fully verified, with only functional restrictions and minor changes (replacement of the type of operation). The possibility of human error in the hardware and the software simulator of the accelerator is therefore quite low. Since the base program also keeps its control algorithm and only introduces special operations, the possibility of human error there is considerably lower than before. Moreover, because an integrated software simulation environment is provided, any human errors that do occur can be discovered and corrected early in the design process.
  • The software simulation environment of the integrated system of the processor and the accelerator can simulate 100 to 10,000 times faster than a logic simulation environment, so detailed verification can be performed. If the simulation environment of the integrated system is at the logic simulation level, only a few dozen simulation steps per second can be obtained. If the integrated system is verified by software simulation, an environment with thousands to hundreds of thousands of steps per second can be obtained. This dramatically increases the complexity of the functions that can be verified, resulting in extremely high design quality. Even a special condition that occurs only once in hundreds of millions of steps can be verified within several hours by software simulation, whereas logic simulation would take tens of thousands of hours, which is unrealistic.
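The orders of magnitude above can be checked with simple arithmetic. The specific step count and simulation rates below are assumed sample values chosen within the ranges the text mentions; they are not figures from the patent.

```python
def simulation_hours(total_steps: int, steps_per_second: float) -> float:
    """Wall-clock hours needed to simulate total_steps at a given rate."""
    return total_steps / steps_per_second / 3600


# Assumed sample values: 360 million steps, 100,000 steps/s for software
# simulation, 36 steps/s for logic simulation.
STEPS = 360_000_000
software_hours = simulation_hours(STEPS, 100_000)
logic_hours = simulation_hours(STEPS, 36)
```

With these sample values the software simulation finishes in about an hour while the logic simulation needs thousands of hours; slower logic simulators or longer runs push the latter into the tens of thousands.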
  • Fig. 2 shows an example of the target base program.
  • Line numbers MSE-01 to MSE-14 are a routine for calculating the root-mean-square error. The number of repetitions is as large as 128, and the routine performs a square calculation, which is not defined as a single instruction in the processor instruction set.
  • This routine is extracted as one suitable for implementation in the accelerator. Note that it is preferable to select a routine such that the part of the base program immediately following it does not require the calculation result of the accelerator. Even if the accelerator operates independently of the processor, the speed-up of data processing is small if the base program itself has to wait for the operation result of the accelerator.
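As a rough illustration of the kind of routine being extracted, the following sketch accumulates squared differences over a block of samples. FIG. 2 is not reproduced in this text, so the function name, the argument layout, and the exact accumulation are assumptions; only the use of subtraction, squaring, addition, and a 128-iteration loop follows the description.

```python
def sum_squared_error(xs, ys):
    """Accumulate (x - y)^2 over paired samples (assumed form of the routine)."""
    acc = 0
    for x, y in zip(xs, ys):
        d = x - y       # subtraction
        acc += d * d    # square (multiplier with both inputs equal), then add
    return acc


# 128 repetitions, as in the routine described above; data is illustrative.
reference = list(range(128))
measured = [v + 1 for v in reference]
error = sum_squared_error(reference, measured)
```

Only three operation types appear in the loop body, which is what makes the routine a good fit for a reduced execution unit.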
  • FIG. 3 illustrates an arithmetic circuit of the processor.
  • The processor is equipped with a register file 10 having 16 registers R0 to R15, an ALU (Arithmetic Logic Unit) 11, and a multiplier (MULT) 12, and the multiplier 12 has a dedicated output register MAC 13.
  • The arithmetic operations used in the root-mean-square error routine extracted from FIG. 2 for the accelerator are only addition, subtraction, and squaring.
  • The registers used are likewise only six: R0, R4, R5, R10, R11, and MAC.
  • Fig. 4 shows the arithmetic circuit of the accelerator designed on this basis. Since the register file 14 only needs to provide the registers actually used, it is limited to the five registers R0, R4, R5, R10, and R11, and the rest are deleted.
  • The ALU 11 is replaced by an adder/subtractor 15. Since the multiplier 12 is used only for squaring, its inputs are short-circuited to form a squarer 16, and the circuit is simplified. In general, a squarer can be realized with a simpler circuit than a general-purpose multiplier. In addition, since the squarer 16 operates at high speed, the output register MAC 13 is deleted and the squarer is connected to the general-purpose registers.
  • the control signals of the processor are 12 bits, 5 bits, and 2 bits for the register file 10, ALU 11, and multiplier 12, respectively, as shown in FIG.
  • The signal for the register file 10 needs 4 bits to select one of the 16 registers, and this is required for three independent selection systems in total (two plus one), giving 12 bits.
  • In the accelerator, the number of registers to be selected is reduced to five, so the register selection signal can be 3 bits for each of the 3 systems, for a total of 9 bits.
  • The control signal for the ALU 11 required 5 bits to select arithmetic operations such as addition, subtraction, and shift, and logical operations such as OR and AND.
  • Since the arithmetic processing performed in the accelerator is limited, only one bit is required to select addition or subtraction. In addition, in the accelerator shown in FIG. 4, input/output control of the output register MAC 13 of the multiplier 12 becomes completely unnecessary.
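The bit counts above follow from the number of selectable items. A short sketch of that arithmetic, with function names chosen here for illustration:

```python
def select_bits(n_items: int) -> int:
    """Minimum number of bits needed to select one of n_items (n_items >= 2)."""
    return (n_items - 1).bit_length()


def register_select_bits(n_registers: int, n_systems: int = 3) -> int:
    """Total register-selection bits for n_systems independent selections."""
    return select_bits(n_registers) * n_systems


processor_bits = register_select_bits(16)    # 4 bits x 3 systems
accelerator_bits = register_select_bits(5)   # 3 bits x 3 systems
add_sub_bits = select_bits(2)                # add or subtract
```

Going from 16 to 5 registers saves one bit per selection system, and restricting the ALU to addition and subtraction reduces its operation select from 5 bits to 1.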
  • FIG. 5 shows an example of a processor instruction set
  • FIG. 6 shows an example of an instruction set for the accelerator.
  • The 17 instructions supported by the processor are reduced to 7 instructions in the accelerator.
  • The instruction codes of the seven instructions are the same as the instruction codes of the corresponding instruction numbers in the processor, that is, the bit arrangement is the same.
  • The instruction set of the accelerator does not include the processor instruction numbers 12, 13, and 14 or the data transfer instructions, and does not include the load instruction to the MAC register.
  • As a result, the circuit size of the accelerator is reduced to less than half the circuit size of the processor. It goes without saying that, depending on the type of operation, the scale of the arithmetic circuit in the accelerator may be larger than that of the arithmetic circuit of the processor. This differs from the control circuits, such as the sequence section and the instruction decoder of the accelerator, whose logic scale never exceeds that of the processor.
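The subset relation between the two instruction sets can be expressed as a simple check. The mnemonics and instruction codes below are hypothetical, since FIG. 5 and FIG. 6 are not reproduced here; the sketch only shows what "same instruction code for every retained instruction" means.

```python
# Hypothetical mnemonics and codes; not the contents of FIG. 5 or FIG. 6.
PROCESSOR_ISA = {"ADD": 0x01, "SUB": 0x02, "MUL": 0x03,
                 "AND": 0x04, "MOV": 0x05, "REPEAT": 0x06}
ACCELERATOR_ISA = {"ADD": 0x01, "SUB": 0x02, "MUL": 0x03, "REPEAT": 0x06}


def is_code_compatible_subset(sub, full):
    """True if every instruction in sub exists in full with the same code."""
    return all(name in full and full[name] == code
               for name, code in sub.items())
```

A check like this could guard against accidentally introducing a new instruction code while trimming the instruction set.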
  • Fig. 7 shows a system LSI configured by combining the designed accelerator with a processor.
  • the processor 20 includes a program counter (PC) 21, an arithmetic circuit (Ex) 22, an instruction decoder (Inst-DEC) 23, and a state machine 24 for generating a control procedure.
  • The state machine 24 constitutes a sequence control unit that causes the program counter 21 to hold the address of the instruction to be executed next, and causes the instruction decoder 23 to output a control signal based on the instruction fetched in accordance with the instruction address held in the program counter 21.
  • The arithmetic circuit 22 constitutes an execution unit having a register circuit whose operation is controlled by the control signal and an arithmetic unit.
  • The accelerator 30 consists of a program counter (PC) 31, an arithmetic circuit (Ex) 32, an instruction decoder (Inst-DEC) 33, and a state machine 34 that generates control procedures.
  • The state machine 34 constitutes a sequence control unit that causes the program counter 31 to hold the address of the instruction to be executed next, and causes the instruction decoder 33 to output a control signal based on the instruction fetched in accordance with the instruction address held by the program counter 31.
  • The arithmetic circuit 32 constitutes an execution unit having a register circuit whose operation is controlled by the control signal and an arithmetic unit.
  • the functions of the constituent circuits of the accelerator 30 are limited as compared with the constituent circuits of the processor 20 as described above.
  • the processor 20 is connected to the main memory 42 via the common address bus 40 and the common data bus 41, and operates by reading a program stored in the main memory 42.
  • The accelerator 30 is connected to a local memory 43 via a local bus 44. When instructed to operate by the processor 20, the accelerator 30 reads and executes the program stored in the local memory 43. The data is also transferred to the local memory 43 in advance. In FIG. 7, the processor 20 controls data transfer between the main memory 42 and the local memory 43.
  • The processor 20 issues an activation instruction to the accelerator 30, and the processor itself continues other processing.
  • When its processing ends, the accelerator 30 notifies the processor of the end of processing by an interrupt or the like and waits.
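The activation and completion handshake described above can be modelled in software as a minimal sketch. A thread stands in for the accelerator and a `threading.Event` stands in for the completion interrupt; both are modelling assumptions and say nothing about how the actual hardware signals are implemented.

```python
import threading


def run_offloaded(accelerator_task, other_work):
    """Start the accelerator task, keep working, then collect its result."""
    done = threading.Event()          # stands in for the completion interrupt
    results = {}

    def accelerator():
        results["accelerator"] = accelerator_task()
        done.set()                    # notify the processor that work has ended

    worker = threading.Thread(target=accelerator)
    worker.start()                    # activation instruction
    results["processor"] = other_work()   # processor continues other processing
    done.wait()                       # result is needed only from here on
    worker.join()
    return results


outcome = run_offloaded(lambda: sum(i * i for i in range(128)),
                        lambda: "other processing")
```

The benefit depends on how much useful work fits between `start()` and `wait()`, which is why the text recommends offloading routines whose results are not needed immediately.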
  • Various methods have been proposed for the connection between the processor 20 and the accelerator 30. The connection may be made in whatever manner is optimal for the application system.
  • The common buses 40 and 41 and the local bus 44 may be configured so that instructions and data are transferred on the same bus, or so that instructions and data are carried on separate buses.
  • the system LSI 19 includes a processor 20, an accelerator 30, and a local memory 43, and is configured as one semiconductor chip.
  • The address output path for operand fetch is not shown for the processor 20 and the accelerator 30, but in practice an address arithmetic unit separate from the program counters 21 and 31 is provided, and the address arithmetic unit outputs an address signal to the common address bus 40.
  • Such an address arithmetic unit is provided in, for example, the arithmetic circuits 22 and 32.
  • FIG. 8 shows an example of the program of the accelerator.
  • the program shown in the figure implements the processing of the line numbers MSE-01 to MSE-14 extracted from the base program of FIG.
  • The number of steps in the loop that is repeated 128 times by the REPEAT instruction is reduced from 7 steps to 5 steps.
  • the effect of the performance improvement is extremely high because the processor can perform other processing during that period.
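The loop-level saving can be worked out directly from the figures given above. The helper name is illustrative, and the count covers only the loop body, not any setup steps outside it.

```python
def loop_steps(iterations: int, steps_per_iteration: int) -> int:
    """Total instruction steps executed inside the repeated loop."""
    return iterations * steps_per_iteration


steps_on_processor = loop_steps(128, 7)
steps_on_accelerator = loop_steps(128, 5)
steps_saved = steps_on_processor - steps_on_accelerator
```

The direct reduction is a little under 30 percent of the loop steps; the larger gain comes from the processor being free to do other work while the accelerator runs.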
  • FIG. 9 shows an example of the software IP module data of the processor.
  • The soft IP module data shown in the figure adopts a data structure that facilitates the design of the accelerator as a subset, as described in the method of developing the system LSI.
  • The processor IP module data 50 includes control function definition information 51 and execution function definition information 52 as function definition information for defining processor functions, and control signal information 53.
  • The control function definition information 51 consists of data, written in a computer language, describing that a control block, which fetches and decodes instructions included in the first instruction set and controls the execution procedure of those instructions, is composed of, for example, a program counter, an instruction decoder, and a sequence control unit.
  • The execution function definition information 52 consists of data describing that an execution block, which performs arithmetic processing according to the control signals generated by the control block, is composed of a register file, arithmetic units, an internal bus, and the like. The control function definition information 51 and the execution function definition information 52 are described in a computer language such as HDL, and each of the functions can be changed.
  • the control signal information 53 is information that clearly specifies a control signal between the control block and the execution block. That is, the meaning and the logical value of the control signal between the control block for controlling the operation sequence of the processor and the execution block for performing the data calculation are disclosed to the user of the IP module data.
  • Such control signal information 53 is, for example, a register selection signal, a calculation type selection signal, a flag output, and the like.
  • the functions and operations of each block are specified in the soft IP module data so that it is convenient for reconfiguring the function, and design constraints are clarified.
  • the control unit is described in a state transition diagram. To limit the functions when the control unit is diverted to an application, an unnecessary state or state transition path may be deleted.
  • the instruction decoder is defined as a truth table that receives instruction codes, states, and the like as inputs and outputs control signals for register selection, operation type selection, and the like. Removing the unnecessary control signals and performing logic synthesis again generates the logic of the instruction decoder of the accelerator. The same applies to the program counter.
  • the functions may be reduced from those described in HDL and the logic may be re-synthesized.
  • for the execution unit, the number of registers and the number of ports of the register file are described, so they can be reduced as necessary; for the computation, it suffices to replace the description of the arithmetic circuit as long as no control signals are added. At this time, the redesign only has to satisfy the constraints on characteristics such as delay specified in the processor's soft IP module. Verification is convenient if constraints on characteristics such as delay are given in the form of a logic synthesis constraint file.
  • when the processor design data is diverted to the design of the accelerator, the control block is given only the necessary functions as a subset, without adding new functions, while the execution block can be given new functions for the arithmetic processing unique to the accelerator. Since the control signal information 53 clearly specifies the control signals between the control block and the execution block, it is easy to determine which of the control signals from the control block should be assigned to control a new arithmetic function in the execution block. Therefore, the use of design data such as the soft IP module data makes it extremely easy to design an accelerator.
  • FIG. 10 shows an example of a software simulator that can be used in the development method.
  • the processor software simulator 60 is designed so that the control routine 61 has a function equivalent to that of the hardware.
  • the data operation routine 62 is defined as functions, one for each calculation function.
  • the data operation routine 62 is used through function calls from the control routine 61.
  • the software simulator is combined with a user interface routine 63 and utility tools 64 such as an assembler, a disassembler, and a debugger.
  • for each replaced calculation, the function definition in the data operation routine 62 should be replaced. It would be even easier if a syntax check for the reduced instructions and operands could be added to the assembler or debugger.
  • as for control, only function reduction is performed, as in the hardware, so that the control routine 61 does not need to be modified.
  • the software simulator of the accelerator can thus be generated very easily.
  • FIG. 11 shows an example of a computer such as an engineering workstation, a personal computer, or a design device used in the development of the system LSI.
  • the computer 70 shown in Fig. 11 is configured by connecting representatively illustrated peripheral devices, such as a display 72, a keyboard 73, and a disk drive 74, to a main body 71 equipped with a processor board carrying a processor and memory and with various interface boards.
  • the soft IP module data and the data of the software simulator of the processor are recorded on a recording medium 75 such as a magnetic tape, a floppy disk, a hard disk, a CD-ROM, or an MO (magneto-optical disk), and are stored so as to be readable by a computer.
  • the recording medium 75, although not particularly limited, is mounted on the disk drive 74, and the soft IP module data and the software simulator data stored on it are read into the computer main body 72.
  • the computer main body 72 executes the program (system LSI development support program) for controlling the system LSI development procedure shown in FIG. 1 by using the read soft IP module data and the like.
  • the system LSI development support program is a machine language program (object program) obtained by compiling a source program, which describes the processing of the system LSI development method explained with reference to Fig. 1 and other figures in a high-level language such as C, into object code specific to the target computer.
  • the development support program of the system LSI is not particularly limited, but is stored in a recording medium 76 illustrated in FIG. 11 so as to be readable by a computer.
  • as the recording medium 76, a magnetic tape, a floppy disk, a hard disk, a CD-ROM, an MO (magneto-optical disk), or the like may be used.
  • the recording medium 76 is mounted on the disk drive 74, and the system LSI development support program stored therein is read into the computer main body 71.
  • the read system LSI development support program is loaded into the memory of the main unit 71, and the above-described system LSI development support operation is performed while sequentially decoding the loaded programs.
  • the system LSI development support program read from the recording medium 76 may be installed on the magnetic recording medium of the hard disk device provided in the computer main body 71, and may be loaded into the memory and executed from there whenever needed.
  • the system LSI development support program may be stored in the recording medium 76 in a compressed state, and decompressed when the program is installed on the hard disk.
  • in the design of the accelerator, the hardware of the control circuit is reduced by the amount by which the functions of the control circuit are limited.
  • a method that deliberately does not reduce the hardware of the control circuit may be adopted. In the method shown in Fig. 1, even though the hardware is reduced, only unnecessary functions are removed from a processor circuit that has already been verified, so the method is effective. However, since the circuit is changed, at least some verification work is necessary: the necessary simulations are selected from those used in the processor verification and run again. If the designers of the processor and the designers of the accelerator are different, selecting only the necessary simulations may be time-consuming. In such a case, the control circuit of the processor may be used as it is, without reducing the hardware. There is then no room for human error, and verification work is essentially unnecessary. Even if verification work is performed just in case, it can be done in a very short time.
  • one accelerator may be used repeatedly in a plurality of routines.
  • the operation specific to the application is a double-precision operation
  • a configuration in which a double-precision arithmetic unit is provided in the arithmetic circuit of the accelerator is adopted.
  • double-precision arithmetic is used because the arithmetic precision is insufficient with 16 bits.
  • a processor with 16 bits or less is sufficient.
  • an accelerator that combines a control unit of an 8-bit processor with an arithmetic circuit having a 24-bit or 32-bit precision may be realized in some cases.
  • the processing performed in the accelerator may vary widely, for example filter processing and error calculation processing with different degrees of feedback, and it is desirable to support each of these processing sequences.
  • a method is adopted in which the program storage area of the local memory of the accelerator is set as a rewritable memory and a program corresponding to the processing to be executed is transferred before the start of the accelerator. This has the effect that the processing algorithm of the accelerator can be dynamically changed during the operation of the system LSI.
  • this is effective, for example, when a digital servo dynamically adjusts the order of the filter according to the behavior of the controlled system.
  • the local memory of the Accelerator is configured with a read-only memory and a plurality of processing programs are stored.
  • by changing the start address of the program to be executed according to the required processing, it is possible to realize an accelerator that can handle multiple kinds of processing.
  • since the program is stored in the read-only memory in advance, this is advantageous in terms of chip size, and there is no overhead for transferring the program.
  • a system LSI may be configured by mounting a plurality of accelerators.
  • the configuration shown in Fig. 7 is expanded to include a plurality of accelerators 30A and 30B.
  • the individual accelerators 30A and 30B have local memories 43A and 43B, respectively, so that the accelerators 30A and 30B can operate simultaneously in parallel.
  • the operating system can be run on the processor 20 and each task can be run on an individual accelerator 30A or 30B. Time management becomes easier because an individual accelerator is not disturbed by another accelerator performing another task. If tasks are added later, accelerators can easily be added because the coupling between the functions is weak. It is also easy to support operations specific to each task. In a multiprocessor system using identical processors, adding special operation instructions is limited by the restrictions on instruction codes, and there are also compatibility issues, so it is virtually impossible.
  • the activation of the accelerator may be performed by using a start instruction, or by writing parameters in a dedicated control register.
  • starting by writing control parameters is preferable because it is not restricted by the instruction set of the processor.
  • with a start instruction, an identifier of about 5 bits has to be embedded in the instruction code, which limits the degree of freedom.
  • with access to a control register, the number of bits in the control register can be freely determined, so that there is no restriction from the instruction set of the processor.
  • an interrupt or a status register may be used to signal the end of processing.
  • although the number of interrupts is limited by the number of interrupts the processor accepts, interrupts have the effect that the transition from the end of processing to the next processing can be performed quickly. If the number of interrupts accepted by the processor is less than the number of interrupts to be issued from the accelerators, the problem can be solved by a conventional method, such as inputting the logical sum (OR) of the interrupts from multiple accelerators to the processor and determining the interrupt source in the processor's interrupt processing routine. On the other hand, in systems where the overhead due to interrupts is a burden, the end of processing should be monitored through the status register.
  • Each of the second to sixth examples has the above-described unique effects, but a combination of a plurality of examples is also sufficiently effective.
  • the design of an accelerator can be realized in a short period of time and with high quality.
  • the software development environment can be easily provided for developing control programs for the accelerator.
  • the number of accelerators mounted on a semiconductor integrated circuit such as the system LSI may be three or more.
  • the main memory may be included in a semiconductor integrated circuit such as a system LSI.
  • an external local memory may be used.
  • the number of external terminals (external pins) of the semiconductor integrated circuit increases.
  • a semiconductor integrated circuit such as a system LSI may include other peripheral circuits.
  • the function description language may be RTL (Register Transfer language).
  • the content of HDL is standardized as IEEE1364.
  • the processor may be a superscalar processor. Industrial applicability
  • the present invention can be widely applied not only to the development of semiconductor integrated circuits called system LSIs, but also to the development of various data processing LSIs or logic LSIs called single-chip microcomputers and data processors.
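The interrupt sharing described in the list above, where the interrupt requests of several accelerators are ORed into one processor input and the interrupt routine identifies the source, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Accelerator` class, its `status_done` flag, and the function names are hypothetical.

```python
class Accelerator:
    """Toy accelerator with a one-bit status register (hypothetical layout)."""

    def __init__(self, name):
        self.name = name
        self.status_done = False  # set when the accelerator finishes its task

    def finish(self):
        self.status_done = True

    @property
    def irq(self):
        # The interrupt request line simply mirrors the status bit here.
        return self.status_done


def shared_irq(accelerators):
    """Logical OR of all accelerator requests onto one processor input."""
    return any(acc.irq for acc in accelerators)


def interrupt_handler(accelerators):
    """Find the interrupt sources by reading each status register, then clear them."""
    sources = [acc.name for acc in accelerators if acc.status_done]
    for acc in accelerators:
        if acc.status_done:
            acc.status_done = False
    return sources
```

A system that wants to avoid interrupt overhead could instead read `status_done` in a polling loop, which corresponds to the status register alternative mentioned above.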

Abstract

A method for developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing the instructions included in a first instruction set comprises the step of designing the accelerator so that it can execute the instructions of a second instruction set. The second instruction set includes instructions that are identical to instructions in the first instruction set, or backward-compatible instructions that have the same instruction codes as instructions in the first instruction set but produce a different result. Because of this relationship between the two instruction sets, the accelerator logic that fetches and decodes instructions and controls the instruction execution procedure adds no functions to the control logic of the processor and is realized by function reduction. The time required to develop a semiconductor integrated circuit combining a processor and an accelerator can thereby be shortened.

Description

Specification
Method for developing semiconductor integrated circuit
Technical field
The present invention relates to a method for developing a semiconductor integrated circuit by combining a processor and an accelerator, and to a recording medium or the like that stores design data or a development program used in the development method. For example, it relates to technology that is effective when applied to a method of developing a system LSI (Large Scale Integrated circuit) using an ASIC (Application Specific Integrated Circuit) approach.
Background art
When designing a semiconductor integrated circuit or the like, a functional unit to be provided in it, such as an arithmetic function or a signal control function, is called a block (module). One kind of block is the hard macro block (hard module): the placement of that part on the semiconductor substrate, that is, its layout design, has been completed, and data representing the multiple mask patterns for forming the layout is provided to the chip designer as a component. Recently, such hard macro blocks are also called hard IP (Intellectual Property) modules. When a hard macro block is provided to a chip designer, the data representing it includes data describing the function of the block in a computer language such as HDL (Hardware Description Language), together with mask pattern data representing the layout of the circuit on the semiconductor substrate (for example, drawing data for forming the mask patterns). In contrast to such hard macro blocks, there are so-called soft macro blocks. In a soft macro block, the function of the block (circuit) is specified by a description in HDL or the like, and that description is provided to the chip designer as a component. Such a soft macro block is also called a soft IP module, as opposed to a hard IP module. The circuit scale of macro blocks such as the hard IP modules and soft IP modules described above may extend to functional units such as an SRAM (Static Random Access Memory), a DRAM (Dynamic Random Access Memory), a CPU (Central Processing Unit), or a DSP (Digital Signal Processor).
Regarding IP modules and the like, "Verification of system LSI using IP cores" is described on pages 99 to 109 of Nikkei Electronics No. 723 (August 10, 1998).
The present inventor studied the case of developing an ASIC system LSI using a soft IP module of a processor. Specifically, the inventor studied the case where the required performance cannot be obtained if all of the required processing is implemented in software on the processor, so that part of the processing is implemented using dedicated hardware such as an accelerator. Here, an accelerator means a dedicated circuit that takes over part of the functions that could be implemented on the processor.
The system LSI design method studied by the present inventor is shown in FIG. 13. The content of FIG. 13 is not publicly known. First, a program that runs on the processor (the base program) is developed. The base program is developed using the program development environment of the processor, that is, development tools such as a compiler, a software simulator, and a debugger. Verification that the developed base program satisfies the functional specification is performed at this stage. Next, it is verified whether the base program satisfies the time constraints of the target system. If the constraints are satisfied, the system LSI is realized using the processor as it is. If the time constraints cannot be satisfied, the system LSI is realized by improving the processing capability, for example by introducing an accelerator.
Verification of the time constraints is performed in the development environment of the base program. The time constraints verified at this stage are determined by the relationship between the system LSI and the outside world, and range from a few milliseconds to as long as several seconds. A software simulator is generally thousands to tens of thousands of times faster than a logic simulator and is useful for such verification.
The base program is analyzed to extract application-specific operations and the like, and the specification of the accelerator is determined. The hardware side of the accelerator is completed through the design of the arithmetic circuit, the definition of the microcommands, and the design of the control circuit. On the software side, the base program must be modified and a microprogram for controlling the accelerator must be developed. The base program is modified by replacing the part extracted for the accelerator with processing for communicating with the accelerator. The microprogram that controls the accelerator, on the other hand, must compose the desired control sequence by combining the defined microinstructions.
Finally, verification in an environment that integrates the processor and the accelerator is performed by logic simulation or the like, which completes the design of the desired system LSI.
The design of an accelerator is usually divided into a part that performs data operations and a part that controls the sequence. Designing the data operation part is relatively easy, but designing the sequence control part is complex and time-consuming even for engineers with expertise in LSI design. This is a major burden when a system LSI has to be developed in a short period.
The present inventor found the following problems in the development method described above. The first problem is that there is no development environment for developing the microprogram that controls the accelerator. The microprogram and control circuit of an accelerator are optimized, that is, individually designed, for each application. From the standpoint of time and cost, a software development environment for such a one-of-a-kind microprogram, that is, a software simulator or the like corresponding to the defined microinstructions, is usually not provided. Verification that the microprogram operates correctly is therefore performed by logic simulation or the like during hardware design. If, at this stage, not only defects in the microprogram itself but also deficiencies in the definition of the microinstructions or in the control circuit are discovered, major design rework often results. In short, the resulting lengthening of the development period of the semiconductor integrated circuit is caused by freely allowing an original design for the control circuit and the like of the accelerator.
The second problem is the difficulty of verifying the integrated functions and performance of the system LSI. Even if the functional correctness of the base program was verified at the first stage, the only means available to verify whether the system LSI as a whole operates correctly (for example, whether the accelerator realizes the desired functions and whether the program changes for communicating with the accelerator were made correctly) is simulation intended for hardware design, such as logic simulation. In general, hardware simulation is several orders of magnitude slower than the software simulator of the software development tools, so detailed system-level verification is practically impossible. This difficulty of integrated verification of the system LSI is also caused by allowing an original design for the accelerator.
The present inventor found it inefficient that the already available development environment of the processor, such as its software simulator, can hardly be used for developing the accelerator or for integrated verification with the accelerator.
An object of the present invention is to provide a development method that can shorten the period for developing a semiconductor integrated circuit that combines a processor and an accelerator. Another object of the present invention is to provide a method for developing a semiconductor integrated circuit in which the development environment for a semiconductor integrated circuit combining a processor and an accelerator can be readily assembled and reliable verification is possible in a short time.
Still another object of the present invention is to enable the above development method to be carried out readily using a computer.
Still another object of the present invention is to make the above development method easy to realize from the standpoint of design data or design assets.
A further object of the present invention is to provide a semiconductor integrated circuit that is intended to speed up data processing by combining a processor and an accelerator and that achieves reduced development cost and improved operational reliability.
The above and other objects and novel features of the present invention will become apparent from the following description of this specification and the accompanying drawings.
Disclosure of the invention
[1] The method for developing a semiconductor integrated circuit according to the present invention is a method of developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing the instructions included in a first instruction set. It includes a process of designing the accelerator so that it can execute the instructions of a second instruction set, which includes instructions identical to instructions in the first instruction set or backward-compatible instructions that have the same instruction code as an instruction in the first instruction set but a different execution result. The processor has, for example, a RISC (Reduced Instruction Set Computer) architecture and basically executes each instruction in one clock cycle.
The fact that the second instruction set includes instructions identical to, or backward-compatible with, instructions of the first instruction set means that the accelerator logic that performs instruction fetch and decoding and controls the instruction execution procedure adds no new functions to the control logic of the processor and is realized by function reduction. That is, the accelerator is configured as a subset of the processor. For example, the code of an addition instruction in the first instruction set is assigned to a Galois field multiplication function in the second instruction set. The control signals obtained by decoding that instruction code are of the same kinds as those obtained by decoding in the processor, but they are diverted for use as the operation control signals of a Galois field multiplier added to the accelerator. The Galois field multiplier adopts a logic configuration that can perform Galois field multiplication using those control signals. When the processor has a RISC architecture, it is very easy to make the control signals obtained by decoding an instruction code usable for controlling an arithmetic unit different from that of the processor, such as a Galois field multiplier.
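The reuse of an addition instruction code for Galois field multiplication can be illustrated with a small Python model. This is only a sketch under stated assumptions: the opcodes, the `alu_sel` signal name, the 8-bit data width, and the reduction polynomial `0x11B` are all hypothetical choices made for the example and do not come from the specification. The accelerator decoder is derived from the processor decoder purely by dropping rows, and the unchanged control signal of the addition code selects a different arithmetic unit.

```python
# Illustrative model only: opcodes, signal names, and the GF(2^8)
# reduction polynomial are assumptions, not taken from the patent.

# Processor decoder "truth table": instruction code -> control signals.
PROCESSOR_DECODER = {
    0x1: {"alu_sel": 0, "reg_write": True},  # ADD on the processor
    0x2: {"alu_sel": 1, "reg_write": True},  # SUB on the processor
    0x3: {"alu_sel": 2, "reg_write": True},  # AND on the processor
}


def reduce_decoder(decoder, kept_codes):
    """Function reduction: keep a subset of rows and add nothing new."""
    return {code: decoder[code] for code in kept_codes}


def gf256_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) by shift-and-XOR with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


# Execution blocks: the same alu_sel value drives different arithmetic units.
PROCESSOR_ALU = {
    0: lambda a, b: (a + b) & 0xFF,
    1: lambda a, b: (a - b) & 0xFF,
    2: lambda a, b: a & b,
}
ACCELERATOR_ALU = {0: gf256_mul}  # ADD's control signal selects the GF multiplier

ACCELERATOR_DECODER = reduce_decoder(PROCESSOR_DECODER, kept_codes=[0x1])


def execute(decoder, alu, code, a, b):
    """Decode the instruction code, then let the control signal pick the unit."""
    signals = decoder[code]
    return alu[signals["alu_sel"]](a, b)
```

With these assumed tables, `execute(PROCESSOR_DECODER, PROCESSOR_ALU, 0x1, 0x57, 0x83)` yields `0xDA` (8-bit addition), while `execute(ACCELERATOR_DECODER, ACCELERATOR_ALU, 0x1, 0x57, 0x83)` yields `0xC1` (the GF(2^8) product under the assumed polynomial), even though both decoders return the same control signals for code `0x1`.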
As described above, configuring the accelerator as an original circuit specific to each application (unrelated to the architecture of the processor) is not allowed; the accelerator is configured as a subset of the processor. Consequently, the development environment of the accelerator can be obtained easily by reusing the debugger, compiler, and software simulator that constitute the development environment of the processor. For example, if the functional descriptions of the operations to be replaced in the accelerator are swapped in the software simulator of the processor, it can be used as is as the software simulator of the accelerator. Combining the accelerator software simulator generated in this way with the processor software simulator also makes it easy to perform integrated verification of an entire semiconductor integrated circuit, such as a system LSI, that carries both the processor and the accelerator. Therefore, the development environment for developing a semiconductor integrated circuit combining a processor and an accelerator can be readily assembled, reliable verification becomes possible in a relatively short time, and the development period of the semiconductor integrated circuit can be shortened. A semiconductor integrated circuit developed by such a method achieves reduced development cost and improved operational reliability.
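How an accelerator simulator might be derived from a processor simulator by swapping functional descriptions can be sketched as follows. The sketch assumes a very simple simulator structure: the three-operand instruction tuples, the 16-bit operations, and the saturating addition `sat_add16` are hypothetical stand-ins for an application-specific operation. The control routine `run` is shared unchanged, and only the table of data operation functions differs.

```python
def run(program, data_ops, registers):
    """Control routine: fetch, decode, dispatch. Shared by both simulators."""
    pc = 0
    while pc < len(program):
        mnemonic, dst, src1, src2 = program[pc]
        registers[dst] = data_ops[mnemonic](registers[src1], registers[src2])
        pc += 1
    return registers


# Data operation routines of the processor simulator, one function per operation.
PROCESSOR_OPS = {
    "ADD": lambda a, b: (a + b) & 0xFFFF,
    "SUB": lambda a, b: (a - b) & 0xFFFF,
    "AND": lambda a, b: a & b,
}


def make_accelerator_ops(processor_ops, kept, replaced):
    """Keep a subset of operations and replace some definitions; add no new mnemonics."""
    unknown = set(replaced) - set(processor_ops)
    if unknown:
        raise ValueError(f"not in the processor instruction set: {sorted(unknown)}")
    ops = {name: processor_ops[name] for name in kept}
    ops.update(replaced)
    return ops


def sat_add16(a, b):
    """Hypothetical application-specific operation: 16-bit saturating add."""
    return min(a + b, 0xFFFF)


ACCELERATOR_OPS = make_accelerator_ops(
    PROCESSOR_OPS, kept=["SUB"], replaced={"ADD": sat_add16}
)
```

Running the same one-instruction program `[("ADD", 2, 0, 1)]` through `run` with each table shows the design choice: the mnemonic and the control flow are identical, and only the result of the replaced operation differs.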
[2] A method of developing a semiconductor integrated circuit according to a more detailed aspect of the present invention will now be described. The method includes: a process of extracting part of the processing of a base program developed to run on a processor having a first instruction set; a process of designing the logic configuration of an accelerator for realizing the extracted processing using a second instruction set, the second instruction set including instructions identical to instructions in the first instruction set, or backward-compatible instructions that have the same instruction code as an instruction in the first instruction set but a different execution result; a process of designing an accelerator control program for causing the accelerator to execute the extracted processing; a process of modifying the base program so that the extracted processing performed by the base program is replaced with processing performed by the accelerator control program; and a process of performing an integrated simulation of the processor running the modified base program and the accelerator running the accelerator control program.
The process of designing the accelerator control program includes a process of verifying the accelerator control program using a software simulator of the accelerator, and the software simulator of the accelerator can be obtained by modifying part of the software simulator of the processor.
The process of performing the integrated simulation is carried out using an integrated simulator, and the integrated simulator can be obtained by combining the software simulator of the accelerator with the software simulator of the processor.
[3] A semiconductor integrated circuit designed using the above development method includes a processor and an accelerator. The processor has a control unit that fetches and decodes the instructions included in a first instruction set and controls the execution procedure of those instructions, and an execution unit that performs arithmetic processing in accordance with control signals generated by the control unit. The accelerator has a control unit that fetches and decodes the instructions included in a second instruction set and controls the execution procedure of those instructions, and an execution unit that performs arithmetic processing in accordance with control signals generated by that control unit. The second instruction set includes instructions identical to instructions in the first instruction set, or backward-compatible instructions that have the same instruction code as an instruction in the first instruction set but a different execution result. The control unit of the accelerator has the logic functions of the control unit of the processor with some of those functions removed.
A semiconductor integrated circuit according to another aspect, designed using the above development method, likewise includes a processor and an accelerator, and the second instruction set includes instructions identical to instructions in the first instruction set or the backward-compatible instructions. In this aspect, the accelerator may adopt an execution unit obtained by removing some arithmetic functions from the execution unit of the processor and adding new arithmetic functions in their place.
The control unit has, for example, a program counter, an instruction decoder, and sequence control means. The sequence control means causes the program counter to hold the address of the instruction to be executed next, and causes the instruction decoder to output control signals based on the instruction fetched according to the instruction address held in the program counter. The execution unit has a register circuit and an arithmetic unit whose operations are controlled by the control signals.
[4] A recording medium stores a program that a computer executes to implement the development method. Recorded on it in computer-readable form is a program for causing a computer to execute: a process of designing the logic configuration of an accelerator for realizing part of the processing extracted from a base program developed to run on a processor having a first instruction set, using a second instruction set that includes instructions identical to instructions in the first instruction set, or backward-compatible instructions that have the same instruction code as an instruction in the first instruction set but a different execution result; and a process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing.
The program may further include a process of modifying the base program so that the extracted part of the processing performed by the base program is replaced with processing performed by the accelerator control program, and a process of performing an integrated simulation of the processor running the modified base program and the accelerator running the accelerator control program. Providing such a program makes the development method immediately usable.
[5] The development method can use design data such as IP module data of a processor. With a view to using that design data to develop an accelerator positioned as a subset of the processor, the design data is provided with function definition information for defining the functions of the processor and control signal information recorded on a recording medium in computer-readable form.
The function definition information separately holds, each in a form whose function can be modified, first information that defines the function of the control unit, which fetches and decodes the instructions included in the first instruction set and controls the execution procedure of those instructions, and second information that defines the function of the execution unit, which performs arithmetic processing in accordance with the control signals generated by the control unit. Thus, when the design data of this processor is reused to design an accelerator, the control unit can be given the functions needed as a subset without adding any new function, and the execution unit can be given new functions for the arithmetic processing specific to the accelerator. The control signal information explicitly identifies each control signal between the control unit and the execution unit. That is, the meaning, logic values, and so on of the control signals between the part that controls the operation sequence of the processor and the part that performs data operations are disclosed to the designer of the semiconductor integrated circuit. This makes it easy to decide which of the control signals from the control unit to assign to the control of a new arithmetic function in the execution unit. Using the above design data therefore makes the design of the accelerator extremely easy.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flowchart showing an example of a method for developing a semiconductor integrated circuit according to the present invention.
FIG. 2 is an explanatory diagram showing an example of a base program.
FIG. 3 is a block diagram showing an example of the arithmetic circuit of a processor.
FIG. 4 is a block diagram showing an example of the arithmetic circuit of an accelerator.
FIG. 5 is an explanatory diagram illustrating the instruction set of the processor.
FIG. 6 is an explanatory diagram illustrating the instruction set of the accelerator.
FIG. 7 is a block diagram of a system LSI configured by combining the processor with the accelerator.
FIG. 8 is an explanatory diagram showing an example of a program for the accelerator.
FIG. 9 is an explanatory diagram showing an example of IP module data of the processor.
FIG. 10 is an explanatory diagram showing an example of a software simulator of the processor.
FIG. 11 is a perspective view showing an example of a computer used for developing the system LSI.
FIG. 12 is a block diagram of a system LSI having two accelerators together with a processor.
FIG. 13 is a flowchart showing a system LSI design method studied earlier by the present inventors.
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 is a flowchart showing an example of a method for developing a semiconductor integrated circuit according to the present invention. First, a program that runs on a processor (the base program) is developed (S1). The target processor here is assumed to be of the RISC (Reduced Instruction Set Computer) type. The base program is developed using the processor's program development environment, that is, development tools such as a compiler, a software simulator, and a debugger. Verification that the developed base program satisfies the functional specification is performed at this stage using the software simulator.
Next, whether the base program satisfies the time constraints of the target system is verified (S2). The time constraints are verified in the development environment of the base program. The time constraints verified at this stage are determined by the relationship between the system LSI and the outside world, and range from a few milliseconds to, at the long end, several seconds. A software simulator is generally thousands to tens of thousands of times faster than a logic simulator, and is useful for this kind of verification.
If the time constraints are satisfied there is no problem, and the system LSI is realized using the processor as it is (S3). If the time constraints cannot be satisfied, the system LSI is realized by raising the processing capability through a measure such as introducing an accelerator. When an accelerator is introduced, the processing routine to be realized by the accelerator is extracted from the base program (S4). The criterion for extraction is to focus on points that become a bottleneck in data processing performance when implemented on the processor, such as many repetitions of a simple operation or an application-specific operation.
If an application-specific operation has been extracted in step S4, an arithmetic circuit suited to it is designed (S5). Application-specific operations include, for example, floating-point and double-precision operations in computer graphics, mean-square calculation in error computation, and Galois field operations in error correction. Such special operations are often not directly supported by a general-purpose processor and are realized by combining existing instructions, so they take multiple instruction steps; if a dedicated arithmetic circuit can be introduced, they can often be realized in a single step.
Next, the instruction set of the accelerator is defined (S6). That is, the accelerator's instruction set is defined by replacing only the kind of operation in the processor's arithmetic instructions, without changing the number or types of operands, to define the special operations, and by further limiting the control instructions to the necessary minimum. In other words, the special operations are supported by replacing only the kind of operation in the processor's arithmetic instructions, leaving the number and types of operands unchanged. Defining the accelerator's instruction set in this way has the following significance. The accelerator's instruction set (the second instruction set) includes instructions identical to instructions in the processor's instruction set (the first instruction set), or backward-compatible instructions that have the same instruction code as an instruction in the first instruction set but a different execution result. That the second instruction set includes such identical or backward-compatible instructions means that the control logic circuit which fetches and decodes instructions and controls the instruction execution procedure in the accelerator is realized by removing functions from the processor's control logic circuit, with no new functions added to it. In short, the accelerator is configured as a subset of the processor.
Next, the control logic circuit of the accelerator is designed by deleting unnecessary functions from the control logic circuit of the processor so that it conforms to the accelerator instruction set defined above (S7). The relationship of the accelerator's control logic circuit to the processor's control logic circuit is as described above. If the processor's control logic circuit is given as a high-level description such as a register transfer language, the unsupported functions are deleted from the description and the control logic circuit is re-synthesized. If it is defined by truth tables, the truth tables are reduced by deleting the input combinations that have become unnecessary as a result of limiting the functions. An output whose value no longer changes after a truth table is reduced allows further reduction of any other truth table that takes it as input, and in the end the circuit scale of the entire control logic circuit is reduced.
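The truth-table reduction in step S7 can be sketched as follows. The decoder table, its mnemonics, and its output names are hypothetical; the sketch only shows the mechanism of dropping unsupported input rows and then finding outputs that have become constant, which are candidates for simplifying downstream logic.

```python
# Hypothetical decoder truth table: instruction -> control outputs.
DECODER = {
    "ADD": {"alu_en": 1, "mul_en": 0, "branch": 0},
    "SUB": {"alu_en": 1, "mul_en": 0, "branch": 0},
    "MUL": {"alu_en": 0, "mul_en": 1, "branch": 0},
    "JMP": {"alu_en": 0, "mul_en": 0, "branch": 1},
}

def reduce_table(table, supported):
    """Keep only supported rows; report outputs that became constant."""
    reduced = {op: row for op, row in table.items() if op in supported}
    constant = {}
    for name in next(iter(reduced.values())):
        values = {row[name] for row in reduced.values()}
        if len(values) == 1:
            # This output no longer changes, so any table that takes it
            # as an input can be reduced in turn.
            constant[name] = values.pop()
    return reduced, constant

reduced, constant = reduce_table(DECODER, {"ADD", "SUB", "MUL"})
```

In this illustration, removing the jump instruction leaves the branch output stuck at 0, so the logic it feeds can be pruned as well.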
On the software side, the base program must be modified (S8) and a program for controlling the accelerator must be developed (S9). In modifying the base program, the part extracted for the accelerator is replaced with processing for communicating with the accelerator: transferring data to the accelerator, starting the accelerator, monitoring the accelerator's operation or accepting its completion interrupt, and transferring the results. The program that controls the accelerator, on the other hand, is not one that builds the desired control sequence by combining newly defined microinstructions. As described above, the accelerator's instruction set is defined by limiting the processor's control instructions and redefining its arithmetic instructions, so the routine extracted for the accelerator only needs to be rewritten accordingly. Because only the kind of operation has been changed, the accelerator's software simulator can be created easily by changing the operation definitions in the processor's software simulator. For an accelerator that adopts its own microprogram sequence, building a dedicated simulator demands so much development time and effort that it was, in practice, hardly ever done. With the development method of FIG. 1, developing a simulator for the accelerator becomes easy, and the accelerator's functional specification can be verified at an early stage of design.
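The four-step replacement made in the base program (transfer data, start, monitor, transfer results) can be sketched as below. FakeAccelerator and offload() are hypothetical stand-ins written for this illustration; the actual interface between the processor and the accelerator is not specified at this point in the text.

```python
class FakeAccelerator:
    """Illustrative stand-in for an accelerator with its local memory."""

    def __init__(self):
        self.local_memory = {}
        self.done = False

    def start(self):
        # Stands in for running the accelerator control program.
        xs = self.local_memory["x"]
        ys = self.local_memory["y"]
        self.local_memory["result"] = sum((a - b) ** 2 for a, b in zip(xs, ys))
        self.done = True

def offload(accelerator, xs, ys):
    """What replaces the extracted routine in the modified base program."""
    accelerator.local_memory["x"] = list(xs)   # 1. transfer data
    accelerator.local_memory["y"] = list(ys)
    accelerator.start()                        # 2. start the accelerator
    while not accelerator.done:                # 3. monitor (polling here)
        pass
    return accelerator.local_memory["result"]  # 4. transfer the result

total = offload(FakeAccelerator(), [1, 2, 3], [1, 0, 0])
```

Polling is used only to keep the sketch short; the text equally allows accepting a completion interrupt in step 3.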
Verification of the system that integrates the processor and the accelerator can be performed easily by combining the processor's software simulator with the accelerator's software simulator (S10). The accelerator's hardware and software simulator are derived from the processor's hardware and software simulator, which have already been thoroughly verified, through nothing more than functional restriction and minor changes (replacing the kind of operation). The likelihood of human error entering the accelerator's hardware or software simulator is therefore quite low. The base program likewise keeps its control algorithm unchanged and merely introduces the special operations, so the likelihood of human error is considerably lower than before. Furthermore, because an integrated software simulation environment is provided, any human error that does occur can be found and corrected at an early stage of design. The software simulation environment for the integrated processor and accelerator system can also simulate one hundred to ten thousand times faster than a logic simulation environment, so verification comparable to that of the original base program development is possible. If the simulation environment for the integrated system were at the logic simulation level, only about several tens of steps per second could be simulated. Verifying the integrated system by software simulation instead gives an environment of several thousand to several hundred thousand steps per second. This dramatically increases the complexity of the functions that can be verified and makes the design quality extremely high. Even a special condition that occurs only once in several hundred million steps can be verified within a few hours of simulation if the integrated system is verified by software simulation, whereas logic simulation would take tens of thousands of hours, which is unrealistic.
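The orders of magnitude quoted above can be checked with a short calculation. The concrete numbers are assumptions chosen from within the stated ranges (900 million steps, 20 steps per second for logic simulation, 100,000 steps per second for software simulation); the text itself gives only ranges.

```python
def simulation_hours(steps, steps_per_second):
    """Wall-clock hours needed to simulate the given number of steps."""
    return steps / steps_per_second / 3600

STEPS = 9e8                 # assumed: "several hundred million" steps
LOGIC_RATE = 20             # assumed: "several tens" of steps per second
SOFTWARE_RATE = 100_000     # assumed: upper end of the software range

logic_hours = simulation_hours(STEPS, LOGIC_RATE)
software_hours = simulation_hours(STEPS, SOFTWARE_RATE)
```

Under these assumptions the logic simulation needs about 12,500 hours while the software simulation needs about 2.5 hours, which is consistent with the contrast drawn in the text.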
Next, a specific example of designing an accelerator from a given processor will be described.
FIG. 2 shows an example of the target base program. Line numbers MSE_01 to MSE_14 form a routine that computes the mean square error; it is repeated as many as 128 times and performs squaring, an operation that is not defined as a single instruction in the processor's instruction set. This routine is extracted as one well suited to being realized by the accelerator. One point to note is that it is preferable to choose a routine such that the base program processing immediately following it does not need the accelerator's result. Even though the accelerator operates independently of the processor, the speed-up in data processing is small if the base program itself has to wait for the accelerator's result.
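The kind of routine being extracted can be sketched in a few lines. This is not the program of FIG. 2, whose listing is not reproduced here; it is a generic mean-square-error loop over 128 samples, with hypothetical names, showing that each iteration needs only a subtraction, a squaring, and an accumulation.

```python
def mean_square_error(xs, ys):
    """Mean of squared differences between two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0
    for x, y in zip(xs, ys):
        diff = x - y          # subtraction
        total += diff * diff  # squaring, then accumulation by addition
    return total / len(xs)

# 128 iterations, matching the repetition count mentioned in the text.
reference = list(range(128))
measured = [value + 2 for value in reference]
mse = mean_square_error(reference, measured)
```

On a processor without a squaring instruction, the diff * diff step maps to a general multiply, which is what makes the loop a candidate for a dedicated squarer.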
FIG. 3 illustrates the arithmetic circuit of the processor. The processor has a register file 10 with 16 registers R0 to R15, an ALU (Arithmetic Logic Unit) 11, and a multiplier (MULT) 12; MULT 12 has a dedicated output register MAC 13.
The only operations used in the mean-square routine extracted from the routine of FIG. 2 for the accelerator are addition, subtraction, and squaring, and the registers it occupies are the six registers R0, R4, R5, R10, R11, and MAC. FIG. 4 shows the accelerator arithmetic circuit designed on this basis. Register file 14 needs only the registers that are used, so it is limited to the five registers R0, R4, R5, R10, and R11, and the rest are deleted. ALU 11 is replaced with an adder/subtractor 15. Since multiplier 12 only needs to square, its inputs are tied together to form a squarer 16 and the circuit is simplified; in general, a squarer can be realized with a simpler circuit than a general-purpose multiplier. Furthermore, because squarer 16 operates at high speed, the output MAC register 13 is deleted and the squarer is connected to the general-purpose registers.
Along with this simplification of functions, the control signal lines can also be reduced. As shown in FIG. 3, the control signals of the processor are 12 bits, 5 bits, and 2 bits for register file 10, ALU 11, and multiplier 12, respectively. For register file 10, selecting one of 16 registers takes 4 bits, and three independent selections are needed, for two read ports and one write port, giving 12 bits in total. In the accelerator, as shown in FIG. 4, the number of registers to select from is reduced to five, so the register selection signals need only 3 bits for each of the three ports, 9 bits in total. As shown in FIG. 3, the control signal for ALU 11 needed 5 bits to select among arithmetic operations such as addition, subtraction, and shift and logical operations such as OR and AND, whereas the accelerator of FIG. 4 needs only 1 bit to select between arithmetic addition and subtraction. In addition, the accelerator of FIG. 4 needs no input/output control at all for the output register MAC 13 of multiplier 12.
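The register-select widths quoted above follow from a simple rule: each port needs enough bits to distinguish every register, and there are three ports (two read, one write). A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def register_select_bits(register_count, ports=3):
    """Total select-signal width: bits per port times number of ports."""
    bits_per_port = (register_count - 1).bit_length()  # ceil(log2(n))
    return bits_per_port * ports

processor_bits = register_select_bits(16)   # 4 bits x 3 ports
accelerator_bits = register_select_bits(5)  # 3 bits x 3 ports
```

Note that five registers still need 3 bits per port, so the saving comes from crossing the power-of-two boundary at eight, not from the exact count of five.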
FIG. 5 illustrates the instruction set of the processor, and FIG. 6 illustrates the instruction set of the accelerator. The 17 instructions supported by the processor are reduced to 7 instructions in the accelerator. The instruction codes of the accelerator's 7 instructions are identical, that is, identical in bit pattern, to the instruction codes of the instructions with the corresponding instruction numbers in the processor.
Of the processor's control instructions, instruction numbers 1 to 6, the only one the accelerator needs is the REPEAT instruction, number 6. Of the processor's arithmetic instructions, numbers 7 to 11, the addition and subtraction instructions, numbers 7 and 8, are kept, and the multiplication instruction MUL, number 11, is replaced with the squaring instruction SQA and included in the accelerator's instruction set. Needless to say, the multiplication instruction MUL of number 11 and the squaring instruction SQA have the same bit pattern. The accelerator's instruction set keeps the processor's data transfer instructions, numbers 12, 13, and 14, and does not include the load and store instructions for the MAC register. By reducing the supported instructions in addition to the reduction of control signal lines to the arithmetic circuit described above, the circuit scale of the accelerator becomes half or less of that of the processor. Note that, depending on the kind of operation, the accelerator's arithmetic circuit may of course be larger than the arithmetic circuit of the processor. This differs from the accelerator's control circuits, such as the sequence section and the instruction decoder, whose logic scale never exceeds that of the processor's.
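The selection just described can be written down as a set computation over instruction numbers. Only the numbers and the mnemonics REPEAT, MUL, and SQA come from the text; the variable names are hypothetical, and the remaining mnemonics of FIG. 5 are deliberately left out because they are not given here.

```python
# Instruction numbers 1..17 of the processor (17 instructions in total).
PROCESSOR_INSTRUCTIONS = set(range(1, 18))

# Kept unchanged: REPEAT (6), add and subtract (7, 8), transfers (12-14).
KEPT_UNCHANGED = {6, 7, 8, 12, 13, 14}

# Kept with the same instruction code but a redefined operation.
REDEFINED = {11: "SQA"}  # number 11 was MUL on the processor

accelerator_instructions = KEPT_UNCHANGED | set(REDEFINED)

# The accelerator introduces no instruction number the processor lacks.
is_subset = accelerator_instructions <= PROCESSOR_INSTRUCTIONS
```

The subset check captures the property the method relies on: the accelerator's decoder never has to recognize a code that the processor's decoder does not already handle.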
FIG. 7 shows a system LSI configured by combining the designed accelerator with the processor. Processor 20 consists of a program counter (PC) 21, an arithmetic circuit (Ex) 22, an instruction decoder (Inst_DEC) 23, and a state machine 24 that generates the control procedure. State machine 24 constitutes a sequence control unit that causes program counter 21 to hold the address of the instruction to be executed next, and causes instruction decoder 23 to output control signals based on the instruction fetched according to the instruction address held in program counter 21. Arithmetic circuit 22 constitutes an execution unit having a register circuit and an arithmetic unit whose operations are controlled by the control signals.
The accelerator 30 likewise comprises a program counter (PC) 31, an arithmetic circuit (Ex) 32, an instruction decoder (Inst-DEC) 33, and a state machine 34 that generates control procedures. The state machine 34 constitutes a sequence control unit that causes the program counter 31 to hold the address of the instruction to be executed next, and causes the instruction decoder 33 to output control signals based on the instruction fetched according to the instruction address held in the program counter 31. The arithmetic circuit 32 constitutes an execution unit having register circuits and arithmetic units whose operation is controlled by the control signals. As described above, however, the circuits that make up the accelerator 30 have more limited functions than the corresponding circuits of the processor 20.
The processor 20 is connected to the main memory 42 via the common address bus 40 and the common data bus 41, and operates by reading a program stored in the main memory 42. The accelerator 30 is connected to the local memory 43 via the local bus 44. When instructed to operate by the processor 20, the accelerator 30 reads and executes the program stored in the local memory 43. The data is also transferred to the local memory 43 in advance. In FIG. 7, the processor 20 controls data transfer between the main memory 42 and the local memory 43.
After the transfer of data to the local memory 43 and the like is complete, the processor 20 issues a start instruction for the accelerator 30 and then continues with other processing. The accelerator 30 notifies the processor of the end of its processing by an interrupt or the like and then waits. Many methods of connecting the processor 20 and the accelerator 30 have been proposed, so the method best suited to the application system may be used. The common buses 40, 41 and the local bus 44 may each transfer instructions and data over the same bus, or instructions and data may use separate, independent buses.
In the example of FIG. 7, the system LSI 19 includes the processor 20, the accelerator 30, and the local memory 43, and is formed on a single semiconductor chip. FIG. 7 omits the address output paths used for operand fetch in the processor 20 and the accelerator 30. In practice, an address arithmetic unit separate from the program counters 21, 31 is provided, and it outputs address signals to the common address bus 40. Such an address arithmetic unit is provided, for example, in the arithmetic circuits 22, 32.
FIG. 8 shows an example of an accelerator program. The program shown implements the processing of line numbers MSE-01 to MSE-14 extracted from the base program of FIG. 2. In the accelerator program of FIG. 8, the number of steps inside the loop that the REPEAT instruction repeats 128 times is reduced from 7 to 5. Processing that took 7 × 128 = 896 steps on the processor is shortened to 5 × 128 = 640 steps by moving it to the accelerator, and the processor can perform other processing during that period, so the performance gain is very large.

FIG. 9 shows an example of the soft IP module data of the processor. The soft IP module data shown adopts a data structure that makes it easy to design the accelerator as a subset, as explained in the system LSI development method.
In FIG. 9, the IP module data 50 of the processor comprises control function definition information 51 and execution function definition information 52, which together form the function definition information defining the functions of the processor, and control signal information 53.
The control function definition information 51 consists, for example, of data describing in a computer language that a control block, which fetches and decodes the instructions in the first instruction set and controls the execution procedure of those instructions, is made up of a program counter, an instruction decoder, a sequence control unit, and the like. The execution function definition information 52 consists of data describing in a computer language that an execution block, which performs arithmetic processing according to the control signals generated by the control block, is made up of a register file, arithmetic units, internal buses, and the like. The control function definition information 51 and the execution function definition information 52 are written in a computer language such as an HDL, and the functions of each can be modified.
The control signal information 53 explicitly specifies each control signal between the control block and the execution block. That is, the meaning, logic values, and so on of the control signals between the control block, which controls the operation sequence of the processor, and the execution block, which performs data operations, are disclosed to users of the IP module data. Such control signal information 53 covers, for example, register selection signals, operation type selection signals, and flag outputs.
As described above, the soft IP module data states the function and operation of each block explicitly and makes the design constraints clear, so that it is convenient for reconfiguring into an accelerator. For example, the sequence control unit is described as a state transition diagram. To restrict its functions when it is reused for the accelerator, unnecessary states or state transition paths are deleted. The instruction decoder is defined by a truth table whose inputs are the instruction code, the state, and so on, and whose outputs are control signals such as register selection and operation type selection. The logic of the accelerator's instruction decoder can be generated by deleting the unnecessary control signals and running logic synthesis again. The same applies to the program counter: for example, functions are removed from the HDL description and logic synthesis is run again. For the execution unit, the number of registers and the number of ports of the register file are described, so these are reduced as needed, and for operations, the description of the arithmetic circuit is replaced within the range that adds no control signals. The redesign must satisfy the constraints on characteristics such as delay that were specified in the processor's soft IP module data. Verification is easier if such constraints are given in the form of a logic synthesis constraint file.
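The decoder reduction described above can be illustrated with a small table-pruning sketch. It only models the idea of dropping truth-table rows for removed instructions and then dropping control signals that are never asserted. A real flow would edit the HDL and rerun logic synthesis. The signal names (reg_sel, op_add, op_mul, mac_load, loop_en), the opcode number 15, and the single "exec" state are hypothetical.

```python
# Truth table of a toy instruction decoder:
# (instruction number, state) -> control signal values.
# All signal names and the opcode 15 entry are assumptions for this sketch.
DECODER = {
    (6, "exec"): {"reg_sel": 0, "op_add": 0, "op_mul": 0, "mac_load": 0, "loop_en": 1},
    (7, "exec"): {"reg_sel": 1, "op_add": 1, "op_mul": 0, "mac_load": 0, "loop_en": 0},
    (11, "exec"): {"reg_sel": 1, "op_add": 0, "op_mul": 1, "mac_load": 0, "loop_en": 0},
    (15, "exec"): {"reg_sel": 1, "op_add": 0, "op_mul": 0, "mac_load": 1, "loop_en": 0},
}


def prune_decoder(truth_table, kept_opcodes):
    """Keep rows for retained opcodes, then drop signals that are never asserted."""
    rows = {key: signals for key, signals in truth_table.items()
            if key[0] in kept_opcodes}
    used = {name for signals in rows.values()
            for name, value in signals.items() if value}
    return {key: {name: value for name, value in signals.items() if name in used}
            for key, signals in rows.items()}


ACCELERATOR_DECODER = prune_decoder(DECODER, kept_opcodes={6, 7, 11})
```

The pruned table never gains a signal or a row, which mirrors the point in the text that the accelerator's control logic is a strict reduction of the processor's.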
When the processor's design data is reused for the accelerator design, the control block is given the functions it needs as a subset, without any new functions being added, and the execution block can be given new functions for the arithmetic processing specific to the accelerator. Because the control signal information 53 explicitly specifies each control signal between the control block and the execution block, it is easy to decide which of the control signals from the control block to assign to the control of a new arithmetic function in the execution block. Using design data such as the soft IP module data therefore makes the accelerator design very easy.
FIG. 10 shows an example of a software simulator that can be used in the development method. In the processor's software simulator 60, the control routine 61 is designed to be functionally equivalent to the hardware, and the data operation routines 62 are defined as functions according to their arithmetic functions. The data operation routines 62 are invoked by function calls from the control routine 61. The software simulator is also combined with a user interface routine 63 and with utility tools 64 such as an assembler, a disassembler, and a debugger. To generate the accelerator's software simulator, it suffices at minimum to replace, in the data operation routines 62, the function definitions of the operations that were replaced. The simulator becomes easier to use if syntax checks for the removed instructions and operands can also be added to the assembler and the debugger. Because the hardware control block only undergoes what is in substance a reduction of functions, the control routine 61 needs no modification. As described above, the accelerator's software simulator can be generated very easily.
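The simulator structure described above can be sketched as follows: one unchanged control routine dispatches to data operation functions, and the accelerator simulator differs only in the function bound to instruction number 11. This is a minimal sketch, not the simulator of FIG. 10. The instruction format (opcode, destination, two sources), the register count, and in particular the behavior of SQA (assumed here to square its first operand) are assumptions.

```python
# Data operation routines, defined as plain functions.
def op_add(a, b):
    return a + b


def op_sub(a, b):
    return a - b


def op_mul(a, b):
    return a * b


def op_sqa(a, b):
    # Assumed semantics: square the first operand, ignore the second.
    return a * a


def control_routine(program, data_ops, initial_regs):
    """Fetch each instruction and call the matching data operation.

    This routine is shared by the processor and accelerator simulators.
    """
    regs = list(initial_regs)
    for opcode, dst, src1, src2 in program:
        if opcode not in data_ops:
            raise ValueError(f"unsupported instruction number {opcode}")
        regs[dst] = data_ops[opcode](regs[src1], regs[src2])
    return regs


PROCESSOR_OPS = {7: op_add, 8: op_sub, 11: op_mul}
# Accelerator simulator: same table, with the function for number 11 replaced.
ACCELERATOR_OPS = {**PROCESSOR_OPS, 11: op_sqa}

PROGRAM = [(8, 2, 0, 1), (11, 3, 2, 1)]   # r2 = r0 - r1; r3 = op11(r2, r1)
processor_result = control_routine(PROGRAM, PROCESSOR_OPS, [3, 5, 0, 0])
accelerator_result = control_routine(PROGRAM, ACCELERATOR_OPS, [3, 5, 0, 0])
```

Running the same program on both tables shows the same instruction code producing different results, while `control_routine` itself is untouched.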
By using the accelerator's software simulator together with the processor's software simulator, the logic can be verified both for the accelerator alone and in combination with the processor. Compared with a design environment that provides no software simulator for the accelerator, the design quality of the accelerator can be raised dramatically.
FIG. 11 shows an example of a computer, such as an engineering workstation, a personal computer, or a design apparatus, used to develop the system LSI. The computer 70 shown in FIG. 11 is configured by connecting representative peripheral devices, such as a display 72, a keyboard 73, and a disk drive 74, to a main body 71 that houses a processor board carrying a processor, memory, and so on, and various interface boards.

The IP module data and the data of the processor's software simulator are stored in computer-readable form on a recording medium 75 such as magnetic tape, a floppy disk, a hard disk, a CD-ROM, or an MO (magneto-optical disk). Although not limiting, the recording medium 75 is mounted in the disk drive 74, and the IP module data and software simulator data stored on it are read into the computer main body 71. Using the soft IP module data and other data so read, the computer main body 71 executes a program that controls the system LSI development procedure shown in FIG. 1 (a system LSI development support program).
The system LSI development support program is a machine language program (object program) obtained by compiling a source program, which describes in a high-level language such as C the processing of the system LSI development method explained with reference to FIG. 1 and other figures, into object code specific to the target computer.
Although not limiting, this system LSI development support program is stored in computer-readable form on the recording medium 76 illustrated in FIG. 11. The recording medium 76 may be magnetic tape, a floppy disk, a hard disk, a CD-ROM, an MO (magneto-optical disk), or the like.
Although not limiting, the recording medium 76 is mounted in the disk drive 74, and the system LSI development support program stored on it is read into the computer main body 71. For example, the program so read is loaded into the memory of the computer main body 71, and the system LSI development support operations described above are carried out as the loaded program is decoded sequentially. Alternatively, the program read from the recording medium 76 may be installed on the magnetic recording medium of a hard disk device provided in the computer main body 71, and loaded from there into memory and executed as needed. In that case, the program may be stored on the recording medium 76 in compressed form and decompressed when it is installed on the hard disk.
In this way, by disclosing to accelerator designers the hardware function description data, such as the processor's IP module data, and design assets such as the processor's software simulator, a high-quality accelerator can be designed in a short period.
In the description above, the hardware of the accelerator's control circuit is reduced in proportion to the functions that were removed. As a second example, the accelerator may be designed without reducing the control circuit hardware at all. In the method of FIG. 1, even when hardware is reduced, unnecessary functions are merely removed from processor circuits that have already been verified, so human error is unlikely to creep in and the design takes little time. However, because the circuit is changed, at least some verification work is necessary: of the simulations used to verify the processor, only the necessary ones are rerun. If the processor designer and the accelerator designer are different people, selecting only the necessary simulations can itself be time-consuming. In that case, the processor's control circuit may be used in the accelerator as is, without reducing the hardware. There is then no room for human error, verification is essentially unnecessary, and even if verification is performed as a precaution it takes very little time.
Further, as a third example, the processor's arithmetic circuit may be applied to the accelerator unchanged. In this case, the benefit of reducing the number of processing steps by defining application-specific special operations is not obtained. On the other hand, as in the second example, verification of the arithmetic circuit is essentially unnecessary, and even if it is performed as a precaution it takes very little time. Moreover, the processor's simulator can be used as is, so no design change to the simulator is needed. In addition, the routine extracted from the base program for the accelerator serves directly as the accelerator's control program, and if it has been sufficiently verified within the base program, there is essentially no need to repeat the verification.
As a fourth example, a single accelerator may be shared among multiple routines. For example, when the operation specific to the application is double-precision arithmetic, the accelerator's arithmetic circuit is provided with a double-precision arithmetic unit. In digital servo applications and the like, 16-bit arithmetic precision is insufficient, so double-precision arithmetic is used, but the control is not very complex, so a processor of 16 bits or less suffices. By applying the present invention, it may even be possible to realize an accelerator that combines the control unit of an 8-bit processor with an arithmetic circuit of 24-bit or 32-bit precision. The processing performed by such an accelerator may be wide-ranging, including filter processing that differs in order or in the presence of feedback, and error calculation, so it is desirable for a single double-precision accelerator to handle a wide variety of processing sequences. For example, in FIG. 7, the program storage area of the accelerator's local memory is made rewritable, and the program for the processing to be executed is transferred before the accelerator is started. This allows the accelerator's processing algorithm to be changed dynamically while the system LSI is operating, which is effective when, in a digital servo, the filter order is adjusted dynamically according to the behavior of the controlled system. When the processing sequence can be predicted, the accelerator's local memory is configured as read-only memory holding multiple processing programs. By changing the start address of the program to be executed according to the processing required, an accelerator that handles multiple kinds of processing can be realized. Storing the programs in read-only memory in advance is, naturally, advantageous for chip size and also avoids the overhead of transferring programs.
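The read-only memory variant described above, in which several programs share one local memory and the start address selects among them, can be sketched as below. The memory contents, the routine names (fir, iir, error), and the END marker are hypothetical; real entries would be accelerator instructions.

```python
# Local ROM holding three programs back to back.
# Each string stands in for one accelerator instruction (hypothetical content).
LOCAL_ROM = [
    "FIR_step0", "FIR_step1", "END",
    "IIR_step0", "IIR_step1", "IIR_step2", "END",
    "ERR_step0", "END",
]

# Start address of each program inside the ROM.
START_ADDRESS = {"fir": 0, "iir": 3, "error": 7}


def run_from(rom, start):
    """Execute sequentially from the given start address until END is reached."""
    pc = start
    trace = []
    while rom[pc] != "END":
        trace.append(rom[pc])
        pc += 1
    return trace


iir_trace = run_from(LOCAL_ROM, START_ADDRESS["iir"])
```

Selecting a different routine costs only a different initial program counter value, with no program transfer beforehand.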
As a fifth example, a system LSI may be configured with multiple accelerators. As shown in FIG. 12, the configuration of FIG. 7 is extended to carry multiple accelerators 30A, 30B. Because the accelerators 30A, 30B each have their own local memory 43A, 43B, they can operate concurrently. In a multitasking system, the operating system can run on the processor 20 and each task can run on its own accelerator 30A, 30B. Since no accelerator is disturbed by another accelerator executing a different task, time management becomes easy. Further tasks can also be added easily, because the accelerators are only loosely coupled to one another. Operations specific to each task can also be supported easily. In a multiprocessor system built from identical processors, adding special operation instructions is inherently limited by instruction code constraints and also raises compatibility problems, so adding such instructions is practically impossible.
As a sixth example, methods of starting and terminating the accelerator are described. The accelerator may be started by a start instruction or by writing parameters to a dedicated control register. In complex systems such as those of the fourth and fifth examples, starting by writing control parameters is preferable because it is not constrained by the processor's instruction set. For example, if dozens of kinds of processing on one or more accelerators were to be started directly by a start instruction, an identifier of about 5 bits would have to be embedded in the instruction code, which restricts the freedom of the instruction encoding. With control register access, the number of bits in the control register can be chosen freely, so the processor's instruction set imposes no constraint.
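The start-by-control-register approach can be sketched as packing a start flag, a task identifier, and an accelerator selector into one register word. The field layout below (bit 0 for start, 8 bits of task identifier, 2 bits of accelerator select) and all names are assumptions for illustration, since the text leaves the register format open. The helper `id_bits` shows why dozens of tasks need an identifier of roughly 5 bits.

```python
START_BIT = 0x1
TASK_SHIFT, TASK_MASK = 1, 0xFF    # hypothetical 8-bit task identifier field
ACC_SHIFT, ACC_MASK = 9, 0x3       # hypothetical 2-bit accelerator select field


def id_bits(task_count):
    """Minimum number of bits needed to distinguish task_count tasks."""
    return max(1, (task_count - 1).bit_length())


def make_start_word(accelerator, task_id):
    """Build the word the processor would write to the control register."""
    if not 0 <= accelerator <= ACC_MASK:
        raise ValueError("accelerator index out of range")
    if not 0 <= task_id <= TASK_MASK:
        raise ValueError("task identifier out of range")
    return START_BIT | (task_id << TASK_SHIFT) | (accelerator << ACC_SHIFT)


def decode_start_word(word):
    """Split a control register word back into its fields."""
    return {
        "start": bool(word & START_BIT),
        "task_id": (word >> TASK_SHIFT) & TASK_MASK,
        "accelerator": (word >> ACC_SHIFT) & ACC_MASK,
    }


word = make_start_word(accelerator=1, task_id=37)
fields = decode_start_word(word)
```

Widening the task field here changes only the register layout, not any instruction encoding, which is the advantage the text attributes to this method.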
The end of accelerator processing may be signaled by an interrupt or through a status register. Interrupts are limited by the number of interrupt lines the processor allows, but they have the advantage that the transition from completion to the next processing is quick. If the processor accepts fewer interrupts than the accelerators need to issue, the problem can be solved by a long-established technique: the interrupts from multiple accelerators are combined by logical OR and input to the processor, and the processor's interrupt handling routine determines the interrupt source. On the other hand, in systems where the overhead of interrupts is a burden, the state of the accelerator can be made observable through a status register.
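The shared interrupt technique mentioned above can be sketched as follows: the completion flags of several accelerators are ORed onto one request line, and the handler reads each flag to find the source. The class, its `done` flag (standing in for a status register bit), and the handler are hypothetical. The names 30A and 30B follow FIG. 12.

```python
class Accelerator:
    """Toy accelerator with a one-bit completion status."""

    def __init__(self, name):
        self.name = name
        self.done = False   # stands in for a status register bit

    def finish(self):
        self.done = True


def irq_line(accelerators):
    """Single interrupt request: logical OR of all completion flags."""
    return any(acc.done for acc in accelerators)


def interrupt_handler(accelerators):
    """Identify which accelerators raised the request and clear their flags."""
    served = []
    for acc in accelerators:
        if acc.done:
            served.append(acc.name)
            acc.done = False
    return served


accelerators = [Accelerator("30A"), Accelerator("30B")]
idle_before = irq_line(accelerators)
accelerators[1].finish()
pending = irq_line(accelerators)
served = interrupt_handler(accelerators)
idle_after = irq_line(accelerators)
```

The same `done` flags could instead be polled without any interrupt, which corresponds to the status register alternative for systems where interrupt overhead matters.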
Each of the second to sixth examples has its own effect as described above, and combining several of them is also fully effective.
According to the above, an accelerator can be designed in a short period and with high quality.
Providing the accelerator with an instruction set that is quasi-compatible with that of the processor makes it easy to provide a software development environment for developing the accelerator's control program.
By combining the accelerator's development environment with the processor's development environment, the integrated function and performance of the system LSI can be verified in a short time and with high reliability.

The time required to develop an accelerator can be shortened, and integrated verification from an early stage of design makes it possible to provide a high-quality system LSI.
The invention made by the present inventors has been described concretely above on the basis of embodiments, but the present invention is not limited to them, and various modifications are possible without departing from its gist.
For example, the number of accelerators mounted on a semiconductor integrated circuit such as a system LSI may be three or more. The main memory may be included in the semiconductor integrated circuit. Conversely, the local memory may be external, although this increases the number of external terminals (external pins) of the semiconductor integrated circuit. The semiconductor integrated circuit may also carry other peripheral circuits. The function description language may be RTL (Register Transfer Language). The content of HDL is standardized as IEEE 1364. The processor may be a superscalar processor.

Industrial applicability
The present invention is applicable not only to the development of semiconductor integrated circuits called system LSIs, but also widely to the development of various data processing LSIs or logic LSIs called single-chip microcomputers, data processors, and the like.

Claims

Scope of Claims
1. A method for developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing the instructions included in a first instruction set, the method comprising a process of designing the accelerator so that it can execute the instructions of a second instruction set, the second instruction set including instructions identical to instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as instructions included in the first instruction set but a different execution result.
2. A method for developing a semiconductor integrated circuit by adding an accelerator to a processor capable of executing the instructions included in a first instruction set, the method comprising:
a process of designing an accelerator for realizing part of the processing extracted from a base program developed to run on the processor, using a second instruction set that includes instructions identical to instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as instructions included in the first instruction set but a different execution result; and
a process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing.
3. A method for developing a semiconductor integrated circuit, comprising:
a process of extracting a part of the functions of a base program developed to run on a processor having a first instruction set;
a process of designing a logical configuration of an accelerator for implementing the extracted processing using a second instruction set that includes instructions identical to instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as instructions included in the first instruction set but produce a different execution result;
a process of designing an accelerator control program for causing the accelerator to execute the extracted processing;
a process of modifying the base program so that the extracted processing performed by the base program is replaced by processing performed by the accelerator control program; and
a process of performing an integrated simulation of the processor running the modified base program and the accelerator running the accelerator control program.
4. The method for developing a semiconductor integrated circuit according to claim 3, wherein the process of designing the accelerator control program includes a process of verifying the accelerator control program using a software simulator of the accelerator, and the software simulator of the accelerator is obtained by modifying a part of a software simulator of the processor.
5. The method for developing a semiconductor integrated circuit according to claim 4, wherein the process of performing the integrated simulation is performed using an integrated simulator, and the integrated simulator is formed by combining the software simulator of the accelerator with the software simulator of the processor.
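The simulator arrangement in the claims above (an accelerator simulator obtained by changing part of the processor simulator, and an integrated simulator combining the two) might be sketched as follows. This is a toy model under stated assumptions, not the simulators of the patent: the class names, the `LOAD`/`ADD`/`MUL` mnemonics, the `CALL_ACC` hook used by the modified base program, and the register-copy handoff are all hypothetical.

```python
class ProcessorSim:
    """Toy instruction-set simulator for the base processor (hypothetical)."""

    def __init__(self):
        self.regs = {"r0": 0, "r1": 0, "r2": 0}
        self.ops = {"LOAD": self.op_load, "ADD": self.op_add, "MUL": self.op_mul}

    def op_load(self, reg, value):
        self.regs[reg] = value

    def op_add(self):
        self.regs["r0"] += self.regs["r1"]

    def op_mul(self):
        self.regs["r0"] *= self.regs["r1"]

    def run(self, program):
        for name, *args in program:
            self.ops[name](*args)
        return self.regs


class AcceleratorSim(ProcessorSim):
    """Accelerator simulator made by changing part of ProcessorSim."""

    def __init__(self):
        super().__init__()
        del self.ops["ADD"]  # a processor function the accelerator drops

    def op_mul(self):
        # Same instruction as the processor's MUL, different result:
        # multiply-accumulate into r2 (an assumed example).
        self.regs["r2"] += self.regs["r0"] * self.regs["r1"]


class IntegratedSim:
    """Combines both simulators; CALL_ACC is an assumed handoff hook."""

    def __init__(self, control_program):
        self.cpu = ProcessorSim()
        self.acc = AcceleratorSim()
        self.control_program = control_program
        self.cpu.ops["CALL_ACC"] = self.call_accelerator

    def call_accelerator(self):
        self.acc.regs.update(self.cpu.regs)        # pass operands
        self.acc.run(self.control_program)         # run the control program
        self.cpu.regs["r2"] = self.acc.regs["r2"]  # return the result

    def run(self, modified_base_program):
        return self.cpu.run(modified_base_program)
```

A modified base program such as `[("LOAD", "r0", 3), ("LOAD", "r1", 4), ("CALL_ACC",)]` with the control program `[("MUL",)]` exercises both simulators in one run. Subclassing is only one way to express "partially changed"; a real flow could equally patch a generated simulator.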
6. A semiconductor integrated circuit comprising:
a processor having a control unit that fetches and decodes instructions included in a first instruction set and controls the execution procedure of the instructions included in the first instruction set, and an execution unit that performs arithmetic processing in accordance with control signals generated by the control unit; and
an accelerator having a control unit that fetches and decodes instructions included in a second instruction set and controls the execution procedure of the instructions included in the second instruction set, and an execution unit that performs arithmetic processing in accordance with control signals generated by that control unit,
wherein the second instruction set includes instructions identical to instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as instructions included in the first instruction set but produce a different execution result, and
the control unit of the accelerator has a logic function obtained by deleting a part of the logic function of the control unit of the processor.
7. A semiconductor integrated circuit comprising:
a processor having a control unit that fetches and decodes instructions included in a first instruction set and controls the execution procedure of the instructions included in the first instruction set, and an execution unit that performs arithmetic processing in accordance with control signals generated by the control unit; and
an accelerator having a control unit that fetches and decodes instructions included in a second instruction set and controls the execution procedure of the instructions included in the second instruction set, and an execution unit that performs arithmetic processing in accordance with control signals generated by that control unit,
wherein the second instruction set includes instructions identical to instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as instructions included in the first instruction set but produce a different execution result, and
the execution unit of the accelerator is formed by deleting some arithmetic functions from, and adding new arithmetic functions to, the arithmetic functions of the execution unit of the processor.
8. The semiconductor integrated circuit according to claim 7, wherein the control unit of the accelerator has a logic function obtained by deleting a part of the logic function of the control unit of the processor.
9. The semiconductor integrated circuit according to claim 6 or 7, wherein the control unit has a program counter, an instruction decoder, and sequence control means, the sequence control means causing the program counter to hold the address of the instruction to be executed next, and causing the instruction decoder to decode the instruction fetched on the basis of the instruction address held by the program counter and to output control signals, and
the execution unit has a register circuit and an arithmetic unit whose operations are controlled by the control signals.
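The split described above, a control unit (program counter, instruction decoder, sequence control) that emits control signals and an execution unit (registers and arithmetic unit) that obeys them, can be sketched in a few lines of Python. The 8-bit instruction format (4-bit opcode, 2-bit destination, 2-bit source), the opcode values, and the `ControlSignal` fields are assumptions chosen for the example and are not specified in the claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlSignal:
    alu_op: str  # operation the arithmetic unit should perform
    dst: int     # destination register index
    src: int     # source register index
    halt: bool = False


class ControlUnit:
    """Program counter + instruction decoder + trivial sequence control."""

    # Hypothetical opcode assignment.
    DECODE = {0x0: "halt", 0x1: "add", 0x2: "sub", 0x3: "mov"}

    def __init__(self):
        self.pc = 0

    def decode(self, word: int) -> ControlSignal:
        op = self.DECODE[(word >> 4) & 0xF]
        return ControlSignal(op, (word >> 2) & 0x3, word & 0x3, halt=(op == "halt"))

    def step(self, memory: List[int]) -> ControlSignal:
        word = memory[self.pc]  # fetch at the address held by the PC
        self.pc += 1            # sequence control: address of next instruction
        return self.decode(word)


class ExecutionUnit:
    """Register file and arithmetic unit driven only by control signals."""

    def __init__(self, regs: List[int]):
        self.regs = list(regs)

    def apply(self, sig: ControlSignal) -> None:
        a, b = self.regs[sig.dst], self.regs[sig.src]
        if sig.alu_op == "add":
            self.regs[sig.dst] = a + b
        elif sig.alu_op == "sub":
            self.regs[sig.dst] = a - b
        elif sig.alu_op == "mov":
            self.regs[sig.dst] = b


def run(memory: List[int], regs: List[int]) -> List[int]:
    cu, eu = ControlUnit(), ExecutionUnit(regs)
    while True:
        sig = cu.step(memory)
        if sig.halt:
            return eu.regs
        eu.apply(sig)
```

Because the only coupling between the two classes is the `ControlSignal` value, either side can be trimmed or extended separately, which is the property the circuit claims rely on when deriving the accelerator from the processor.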
10. A computer-readable recording medium on which is recorded a program for causing a computer to execute:
a process of designing a logical configuration of an accelerator for implementing a part of the processing, extracted from a base program developed to run on a processor having a first instruction set, using a second instruction set that includes instructions identical to instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as instructions included in the first instruction set but produce a different execution result; and
a process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing.
11. A computer-readable recording medium on which is recorded a program for causing a computer to execute:
a process of designing a logical configuration of an accelerator for implementing a part of the processing, extracted from a base program developed to run on a processor having a first instruction set, using a second instruction set that includes instructions identical to instructions included in the first instruction set, or backward-compatible instructions that have the same instruction code as instructions included in the first instruction set but produce a different execution result;
a process of designing an accelerator control program for causing the accelerator to execute the extracted part of the processing;
a process of modifying the base program so that the extracted part of the processing performed by the base program is replaced by processing performed by the accelerator control program; and
a process of performing an integrated simulation of the processor running the modified base program and the accelerator running the accelerator control program.
12. The recording medium according to claim 11, wherein the process of designing the accelerator control program includes a process of verifying the accelerator control program using a software simulator of the accelerator, and the software simulator of the accelerator is obtained by modifying a part of a software simulator of the processor.
13. The recording medium according to claim 12, wherein the process of performing the integrated simulation is performed using an integrated simulator, and the integrated simulator is formed by combining the software simulator of the accelerator with the software simulator of the processor.
14. A recording medium on which function definition information for defining the functions of a processor and control signal information are recorded in a computer-readable manner,
wherein the function definition information separately contains, each in a form whose function can be changed, first information defining the function of a control unit that fetches and decodes instructions included in a first instruction set and controls the execution procedure of the instructions included in the first instruction set, and second information defining the function of an execution unit that performs arithmetic processing in accordance with control signals generated by the control unit, and
the control signal information is information that explicitly specifies each of the control signals between the control unit and the execution unit.
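One way to picture the design data in the claim above is as three separate records: a control-unit definition, an execution-unit definition, and an explicit list of the control signals between them. The Python sketch below is only an illustration under assumptions: the field names, the representation of each definition as a dictionary, the signal names, and the `derive_accelerator` helper are hypothetical and are not the recorded format described in the patent.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class DesignData:
    # First information: instruction code -> control signal the control unit issues.
    control_def: Dict[int, str]
    # Second information: control signal -> operation the execution unit performs.
    execution_def: Dict[str, str]
    # Control signal information: every signal allowed between the two units.
    control_signals: FrozenSet[str]

    def check(self) -> bool:
        """Both definitions must agree with the explicit signal list."""
        issued = set(self.control_def.values())
        handled = set(self.execution_def)
        return (
            issued <= self.control_signals
            and handled <= self.control_signals
            and issued <= handled
        )


def derive_accelerator(
    base: DesignData, drop_opcodes: Set[int], changed_ops: Dict[str, str]
) -> DesignData:
    """Edit each part separately while keeping the signal interface fixed."""
    control = {op: s for op, s in base.control_def.items() if op not in drop_opcodes}
    still_issued = set(control.values())
    execution = {s: d for s, d in base.execution_def.items() if s in still_issued}
    execution.update(changed_ops)  # redefine operations behind existing signals
    return DesignData(control, execution, base.control_signals)
```

Keeping the signal list as its own record means a change to either definition can be checked against a fixed interface, which is a plausible reason for recording the control signals explicitly.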
PCT/JP1999/002348 1999-05-06 1999-05-06 Method for developing semiconductor integrated circuit WO2000068782A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP1999/002348 WO2000068782A1 (en) 1999-05-06 1999-05-06 Method for developing semiconductor integrated circuit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP1999/002348 WO2000068782A1 (en) 1999-05-06 1999-05-06 Method for developing semiconductor integrated circuit

Publications (1)

Publication Number Publication Date
WO2000068782A1 true WO2000068782A1 (en) 2000-11-16

Family

ID=14235613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1999/002348 WO2000068782A1 (en) 1999-05-06 1999-05-06 Method for developing semiconductor integrated circuit

Country Status (1)

Country Link
WO (1) WO2000068782A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009110176A (en) * 2007-10-29 2009-05-21 Fujitsu Ltd Data processor and data processing method
US7873937B2 (en) 2003-10-07 2011-01-18 Asml Netherlands B.V. System and method for lithography simulation
JP2012252374A (en) * 2011-05-31 2012-12-20 Renesas Electronics Corp Information processor

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS641031A (en) * 1987-02-27 1989-01-05 Nec Corp Data processor
JPH01142967A (en) * 1987-11-30 1989-06-05 Nec Corp Coprocessor control system
JPH09512651A (en) * 1994-05-03 1997-12-16 アドバンスド リスク マシーンズ リミテッド Multiple instruction set mapping

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7873937B2 (en) 2003-10-07 2011-01-18 Asml Netherlands B.V. System and method for lithography simulation
US8209640B2 (en) 2003-10-07 2012-06-26 Asml Netherlands B.V. System and method for lithography simulation
US8516405B2 (en) 2003-10-07 2013-08-20 Asml Netherlands B.V. System and method for lithography simulation
US8893067B2 (en) 2003-10-07 2014-11-18 Asml Netherlands B.V. System and method for lithography simulation
JP2009110176A (en) * 2007-10-29 2009-05-21 Fujitsu Ltd Data processor and data processing method
JP2012252374A (en) * 2011-05-31 2012-12-20 Renesas Electronics Corp Information processor
US9354893B2 (en) 2011-05-31 2016-05-31 Renesas Electronics Corporation Device for offloading instructions and data from primary to secondary data path

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP US

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 616499

Kind code of ref document: A

Format of ref document f/p: F