CN100514281C - Data by-passage technology in digital signal processor - Google Patents


Publication number
CN100514281C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004100167561A
Other languages
Chinese (zh)
Other versions
CN1664775A (en)
Inventor
陈晓毅
刘鹏
姚庆栋
李东晓
俞国军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CNB2004100167561A
Publication of CN1664775A
Application granted
Publication of CN100514281C
Anticipated expiration
Expired - Fee Related

Abstract

The invention relates to microprocessors and computer systems and provides a memory-oriented digital signal processor (DSP) structure, in particular a new data bypass technique. The circuit forwards data on six paths: four paths each select, in parallel and by priority, among eleven data sources, and two paths each select among three data sources. The invention reduces hazard-induced stalls in the pipeline and improves real-time processing capability. In the new six-stage pipeline structure, the four critical paths use parallel processing, whereas the conventional approach needs ten data selectors on every path to select the data serially.

Description

A data bypass system applied in a digital signal processor pipeline
Technical field
The present invention relates to microprocessors and computer systems. More specifically, it relates to a memory-oriented digital signal processor (DSP) structure, and in particular to a data bypass system applied in the pipeline of such a digital signal processor.
Background technology
With the development of modern microelectronic technology and the growth of demand in practical applications, digital signal processors oriented toward memory operations are becoming increasingly popular. Their most prominent feature is that, within one clock cycle, two data can be fetched simultaneously from the on-chip data memories and used in logic, arithmetic and similar operations. Because electronic equipment is becoming more personalized and customized, digital signal processors must pursue ever higher operating speeds, and under a given process technology a multi-stage pipeline structure is one way to relieve the clock bottleneck. The working principle of the data bypass circuit (BPU) in a digital signal processor is explained below, taking a 6-stage pipeline structure as the example.
The working principle of the data bypass circuit in the digital signal processor pipeline is described with reference to Fig. 1.
Stage 1, the instruction fetch stage (IF stage): program counter 101 provides a virtual address S1 to on-chip instruction memory 102, which outputs the 32-bit instruction S2 at that address. Depending on the case, the instruction is either a register-oriented instruction (basic instruction) or a memory-oriented DSP instruction (DSP instruction). The basic instruction vector comprises the general-purpose register addresses (rs, rt), the destination address (rd) and the operation control code (op). The DSP instruction vector comprises the addresses of the destination register (rd), of two address registers (ARm, ARn) and of two index registers (IR0, IR1), together with the addressing mode code (mode) and the operation control code (op).
Stage 2, the instruction decode stage (ID stage): instruction vector S2 is delayed by one clock cycle in interface register 103 and is then sent from output 103a of that register to instruction decoder 105. Decoder 105 decodes the instruction vector and outputs control code S3 to the control module, register access address S5 to the register file, the current read-register address S6 to the data bypass circuit, and the control and data signals S4 used by the later stages to interface register 108. According to access address S5, the register file outputs the corresponding data S7 to data bypass circuit 107; signal S7 comprises six 32-bit values: rs, rt, ARm, ARn, IR0 and IR1. According to the dependency and priority relationships, the data bypass circuit selects the six corresponding values S8..S13 from the signal set S22, S23, S27, S31, S33, S7 and sends them to interface register 108 to be latched.
Stage 3, the address computation stage (DA stage): the two address logic units (DALU) 109 and 110 operate on operands S10a, S12a, S11a and S13a according to operation control codes S20 and S21. The results S22 and S23 are used to access the two on-chip data memories 111 and 112; at the same time they are latched into interface register 113 and fed back to data bypass circuit 107 as bypass data. The address registers are updated according to the addressing mode code, and their updated values S35 and S36 are also latched into 113.
Stage 4, the memory access stage (DM stage): according to addresses S22 and S23, the on-chip data memories output the two corresponding data S25 and S26 after a certain access time, and these are latched into interface register 114 on the clock. Control signal S24 is also latched into 114.
Stage 5, the execute stage (EX stage): the arithmetic logic unit (ALU) 115 performs the operation specified by operation control code S15 on operands S17 and S18, and the result S16 is input to interface register 116.
Stage 6, the write-result stage (WB stage): the aggregate signal S33 held in interface register 116, which comprises the ALU result, the two address registers, the two index registers and one of the memory data words read from the dual-port on-chip data memory, is written to register file 106.
The operation of the 6-stage pipeline structure described above can be followed in Table 1 below.
Table 1
Cycle | Instruction fetch | Decode | Address computation | Memory access | Execute | Write result
0 | Instruction 1 | | | | |
1 | Instruction 2 | Instruction 1 | | | |
2 | Instruction 3 | Instruction 2 | Instruction 1 | | |
3 | Instruction 4 | Instruction 3 | Instruction 2 | Instruction 1 | |
4 | Instruction 5 | Instruction 4 | Instruction 3 | Instruction 2 | Instruction 1 |
5 | Instruction 6 | Instruction 5 | Instruction 4 | Instruction 3 | Instruction 2 | Instruction 1
6 | Instruction 7 | Instruction 6 | Instruction 5 | Instruction 4 | Instruction 3 | Instruction 2
In cycle 0, instruction 1 is in the instruction fetch stage and is fetched from on-chip instruction memory 102. In cycle 1, instruction 1 moves to the decode stage, where register values are read at the register addresses specified in the instruction; meanwhile, instruction 2 enters the instruction fetch stage. In cycle 2, instruction 1 enters the address computation stage: the values of the address registers and index registers are supplied to address logic units 109 and 110, which operate on them according to the control code. Instruction 2 moves to the decode stage and instruction 3 enters the instruction fetch stage. In cycle 3, instruction 1 enters the memory access stage, and the read results S25 and S26 are latched into interface register 114 to await the next operation; the following instructions each advance one stage, and new instruction 4 enters the instruction fetch stage. In cycle 4, instruction 1 enters the execute stage, where the two data S17 and S18 are operated on according to control code S15; the following instructions each advance one stage, and new instruction 5 enters the instruction fetch stage. Finally, in cycle 5, instruction 1 enters the write-result stage: the result of ALU 115, the address registers, the index registers and one of the memory data words read from the dual-port on-chip data memory are written to the register file. The following instructions each advance one stage, and new instruction 6 enters the instruction fetch stage.
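The staggered occupancy in Table 1 follows a simple rule: instruction i (numbered from 1) occupies stage s (numbered from 0) in cycle (i - 1) + s. The Python sketch below regenerates the table under that rule. The function name `pipeline_occupancy` and the short stage labels are illustrative choices and do not appear in the patent.

```python
STAGES = ["IF", "ID", "DA", "DM", "EX", "WB"]  # six stages, in pipeline order

def pipeline_occupancy(n_cycles):
    """Return one row per cycle; row[s] is the instruction number in stage s, or None."""
    rows = []
    for cycle in range(n_cycles):
        row = []
        for s in range(len(STAGES)):
            instr = cycle - s + 1  # instruction i enters IF in cycle i - 1
            row.append(instr if instr >= 1 else None)
        rows.append(row)
    return rows

if __name__ == "__main__":
    print("cycle", *STAGES)
    for cycle, row in enumerate(pipeline_occupancy(7)):
        cells = ["-" if i is None else f"I{i}" for i in row]
        print(cycle, *cells)
```

Each row lists the stages from IF to WB, so the row for cycle 5 is the first one in which every stage is occupied.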
Table 1 shows that by the time the result of instruction 1 is written into register file 106, the operands of instructions 2, 3, 4 and 5 are being read or have already been read. If instructions 2, 3, 4 or 5 need the result of instruction 1 when reading operands or executing, that result must be forwarded in advance to bypass circuit 107 in the ID stage; otherwise a data hazard occurs and the pipeline has to stall and wait. For example, if instruction 2 uses the ALU result of instruction 1, the pipeline must stall until cycle 5, when instruction 1 writes its result and the result is bypassed to module 107.
Consider the following program fragment:
ADD ARm, B, C      # B + C; the result is stored in address register ARm
SUB E, [ARm], D    # [ARm] - D; the result is stored in register E.
                   # [ARm] denotes an access to on-chip data memory 111 at the address held in ARm.
Instruction N1
Instruction N2
The execution order of the program fragment above is shown in the pipeline chart of Table 2 below.
As Table 2 shows, the result of B+C is written to address register ARm, but ARm is also needed to access on-chip data memory 111. The pipeline must therefore stall and wait until the ARm value is written; at that moment the value is also bypassed to the ID stage. Otherwise the expected result would not be obtained.
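The stall pattern of Table 2 can be reproduced with a small cycle-by-cycle model. The sketch below is only an illustration under one assumption taken from the paragraph above: the instruction in the ID stage may advance only once its producer has reached the write-result stage. The function name `simulate`, the `depends_on` mapping and the use of unique strings as instruction names are all hypothetical.

```python
def simulate(program, depends_on, n_cycles):
    """Simulate a 6-stage pipeline (IF, ID, DA, DM, EX, WB) with ID-stage stalls.

    program:    list of unique instruction names in fetch order.
    depends_on: maps a consumer instruction to the instruction whose result it needs.
    Assumption of this sketch: a result is usable only once its producer is in WB.
    Returns one list per cycle; index 0 is IF and index 5 is WB.
    """
    pipe = [None] * 6
    next_fetch = 0
    history = []
    for _ in range(n_cycles):
        producer = depends_on.get(pipe[1])
        # Stall while the producer is still in DA, DM or EX.
        stall = producer is not None and producer in pipe[2:5]
        if stall:
            # IF and ID hold; a bubble enters DA; later stages advance.
            pipe = pipe[:2] + ["NOP"] + pipe[2:5]
        else:
            fetched = program[next_fetch] if next_fetch < len(program) else None
            next_fetch += 1
            pipe = [fetched] + pipe[:5]
        history.append(list(pipe))
    return history

if __name__ == "__main__":
    trace = simulate(["ADD", "SUB", "N1", "N2"], {"SUB": "ADD"}, 7)
    for cycle, row in enumerate(trace):
        print(cycle, *["-" if x is None else x for x in row])
```

With SUB depending on ADD, the model holds SUB in the decode stage during cycles 2 to 5 and inserts three bubbles behind ADD.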
In a reduced instruction set (RISC) processor, the pipeline structure differs considerably from that of a digital signal processor. The execute stage of a RISC processor usually comes before the memory access, and the typical structure is instruction fetch, decode, execute, memory access and write-back. A RISC processor is also register-oriented: its two operands can only be two registers, so its data bypass circuit is much simpler than that of a digital signal processor. A digital signal processor has address registers, index registers, base registers and so on, and may use 4 register values at the same time. Its pipeline is also deeper in order to raise the operating clock, so the design of the data bypass circuit must meet the delay requirements as well. To preserve real-time processing, pipeline stalls must be reduced as far as possible. In summary, the design of the data bypass circuit is a key factor in a digital signal processor.
At present, the data bypass in common digital signal processor pipeline structures uses serial data address conflict detection and serial data selection.
Table 2
Cycle | Instruction fetch | Decode | Address computation | Memory access | Execute | Write result
0 | ADD ARm,B,C | | | | |
1 | SUB E,[ARm],D | ADD ARm,B,C | | | |
2 | Instruction N1 | SUB E,[ARm],D | ADD ARm,B,C | | |
3 | Instruction N1 | SUB E,[ARm],D | NOP | ADD ARm,B,C | |
4 | Instruction N1 | SUB E,[ARm],D | NOP | NOP | ADD ARm,B,C |
5 | Instruction N1 | SUB E,[ARm],D | NOP | NOP | NOP | ADD ARm,B,C
6 | Instruction N2 | Instruction N1 | SUB E,[ARm],D | NOP | NOP | NOP
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a memory-oriented digital signal processor (DSP) structure, in particular a data bypass system applied in a digital signal processor pipeline.
To solve the above technical problem, the invention is realized by the following technical solution:
The invention proposes a data bypass system applied in a digital signal processor pipeline. The pipeline has six stages: the instruction fetch stage, the decode stage, the address computation stage, the memory access stage, the execute stage and the write-result stage. The data bypass system comprises a parallel data bypass circuit with priority ordering, which forwards 6 paths of data from the general-purpose registers, address registers and index registers: 4 of the paths each perform prioritized parallel data bypass over 11 data, and 2 of the paths each perform prioritized parallel data bypass over 3 data;
The parallel data bypass circuit with priority ordering consists of two parts: one part is a parallel data address conflict detection circuit comprising 10 5-bit address comparators; the other part is a prioritized data selection circuit, which selects among 11 data by priority according to the result of the parallel data address conflict detection.
As an improvement, in the parallel data bypass circuit with priority ordering used for data forwarding, the parallel data address conflict detection circuit comprises the first to fourteenth comparators (CMP1, CMP2, CMP3, CMP4, CMP5, CMP6, CMP7, CMP8, CMP9, CMP10, CMP11, CMP12, CMP13, CMP14), and the prioritized data selection circuit comprises the first to fifth NOT gates (NO1, NO2, NO3, NO4, NO5), the first to eighteenth AND gates (AND1, AND2, ..., AND18), the first to eighth NOR gates (NOR1, NOR2, ..., NOR8), the first to eighth OR gates (OR1, OR2, ..., OR8) and one data selector (MUX1).
Compared with the prior art, the beneficial effects of the invention are as follows:
Hazard-induced stalls in the pipeline are reduced, the delay is reduced and the processor clock can be raised, which improves real-time processing capability. In the data bypass technique designed for the 6-stage pipeline digital signal processor of the invention, all 4 critical paths use parallel processing, whereas the usual approach needs 10 data selectors on every path to select the data serially.
Description of drawings
Fig. 1 is a schematic diagram of the 6-stage pipeline structure of the invention.
Fig. 2 is a schematic diagram of the data bypass circuit in the digital signal processor pipeline according to one embodiment of the invention.
Fig. 3 is a schematic diagram of the data selection circuit that selects among 11 data by priority in the data bypass technique of the invention.
Embodiment
The technical solution of the invention is described in detail below with reference to specific embodiments:
The invention proposes a data bypass system applied in a digital signal processor pipeline. The data bypass technique forwards 6 paths of data: 4 of the paths each perform prioritized parallel data bypass over 11 data, and 2 of the paths each perform prioritized parallel data bypass over 3 data. The 4 paths forward the data values of the rs, rt, ARm and ARn registers respectively. The 2 paths forward the data of the IR0 and IR1 registers.
The parallel data bypass circuit with priority ordering in the data bypass system consists of two parts: parallel data address conflict detection and prioritized data selection. The parallel data address conflict detection comprises 10 5-bit address comparators. The first 4 comparators form one group and compare the first 4 pairs of addresses in parallel. The next 4 comparators form another group and compare the next 4 pairs of addresses in parallel. The last two comparators compare the remaining 2 pairs of addresses in parallel. The prioritized data selection chooses among the 11 data by priority according to the result of the parallel data address conflict detection.
In the data bypass circuit of the invention, 10 of the 11 data on each of the 4 paths are shared among the paths. They are: the updated values of ARm and ARn from the DA stage (not yet latched), the latched values of ARm and ARn from the DM stage, the latched values of ARm and ARn from the EX stage, the latched values of ARm and ARn from the WB stage, the latched ALU result, and one of the memory data words read from the dual-port on-chip data memory. The datum that is not shared is the register value that each path reads from the register file. Data listed earlier have higher priority, so the updated values of ARm and ARn from the DA stage have the highest priority. On the 2 paths, 2 of the 3 data are shared: the latched ALU result from the WB stage and one of the memory data words read from the dual-port on-chip data memory. The datum that is not shared is the register value that each path reads from the register file. Here too, data listed earlier have higher priority.
The invention uses 8 address registers in total, which occupy the 8th to the 15th positions of the register file of 32 32-bit registers. They can also be used as general-purpose registers (rs, rt), that is, they are multiplexed. The index register addresses are fixed at the 24th and 25th positions of the 32, and they are likewise multiplexed with rs and rt.
For example, suppose that after decoding, the current instruction is found to read the address register at the 14th position of the 32-register file, and that the parallel data address conflict detection circuit finds this address to match the 1st and 5th of the 11 data. The 1st and 5th comparators then output 32-bit all-"1" values and the others output all "0". The prioritized data selection circuit selects the 1st datum, because its priority is higher than that of the 5th, and the 5th value is ignored.
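The selection rule in the example above can be written behaviourally: all comparisons are evaluated at once, and the highest-priority match wins. The Python sketch below is a software illustration only, not the circuit of Fig. 3. The function name `select_bypass`, the tuple layout and the boolean `active` flag (standing for "this comparator is enabled") are assumptions introduced for the example.

```python
def select_bypass(read_addr, sources, rf_value):
    """Pick the forwarded value for one read port.

    sources:  list of (dest_addr, active, data) tuples ordered from highest to
              lowest priority; `active` is True when that comparator is enabled.
    rf_value: value read from the register file, used when nothing matches.
    """
    # All address comparisons are independent, mirroring the parallel comparators.
    hits = [active and dest == read_addr for dest, active, _ in sources]
    # Priority selection: the first hit in priority order supplies the data.
    for hit, (_, _, data) in zip(hits, sources):
        if hit:
            return data
    return rf_value

if __name__ == "__main__":
    # Ten forwarded candidates; the 1st and 5th both target register 14.
    sources = [(14, True, 101), (3, True, 102), (4, True, 103), (5, True, 104),
               (14, True, 105), (6, True, 106), (7, True, 107), (9, True, 108),
               (10, True, 109), (11, True, 110)]
    print(select_bypass(14, sources, rf_value=999))
```

The eleventh datum is the register-file value, which is returned only when none of the ten comparisons match.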
The technical solution of the invention is illustrated by Fig. 2 and Fig. 3: Fig. 2 is the overall diagram, and Fig. 3 details the component Prl_sel of Fig. 2. Referring first to Fig. 2, the inputs of the data bypass circuit from the ID stage are four decoded 5-bit register addresses, namely m1 (or ID_Reg_Addr1[4:0]), m3 (or ID_Reg_Addr2[4:0]), m5 (or ID_Reg_ARm[4:0]) and m7 (or ID_Reg_ARn[4:0]), and six 32-bit register values read from the register file, namely m2 (or RF_Rd_Data1[31:0]), m4 (or RF_Rd_Data2[31:0]), m6 (or RF_Rd_ARm[31:0]), m8 (or RF_Rd_ARn[31:0]), m9 (or RF_Rd_IR0[31:0]) and m10 (or RF_Rd_IR1[31:0]). The other signals come from the other stages, as follows:
From the DA stage: the addresses, write-enable signals and updated data of ARm and ARn, namely m35 (or DA_ARm[4:0]), m36 (or DA_ARm_wr), m37 (or DA_ARm_din[31:0]), m38 (or DA_ARn[4:0]), m39 (or DA_ARn_wr) and m40 (or DA_ARn_din[31:0]). A write-enable signal that is active indicates that the register is a destination register of that instruction.
From the DM stage: the addresses, write-enable signals and latched data of ARm and ARn, namely m11 (or DA_DM_ARm[4:0]), m12 (or DA_DM_ARm_wr), m13 (or DA_DM_ARm_din[31:0]), m14 (or DA_DM_ARn[4:0]), m15 (or DA_DM_ARn_wr) and m16 (or DA_DM_ARn_din[31:0]).
From the EX stage: the addresses, write-enable signals and latched data of ARm and ARn, namely m17 (or DM_EX_ARm[4:0]), m18 (or DM_EX_ARm_wr), m19 (or DM_EX_ARm_din[31:0]), m20 (or DM_EX_ARn[4:0]), m21 (or DM_EX_ARn_wr) and m22 (or DM_EX_ARn_din[31:0]).
From the WB stage: the latched ALU result with its register address and write-enable signal, namely m25 (or EX_WB_Dest1_din[31:0]), m23 (or EX_WB_Dest1[4:0]) and m24 (or EX_WB_Dest1_wr); one of the memory data words read from the dual-port on-chip data memory, with the register address it is to be written to and its write-enable signal, namely m28 (or EX_WB_Dest2_din[31:0]), m26 (or EX_WB_Dest2[4:0]) and m27 (or EX_WB_Dest2_wr); and the addresses, write-enable signals and latched data of ARm and ARn, namely m29 (or EX_WB_ARm[4:0]), m30 (or EX_WB_ARm_wr), m31 (or EX_WB_ARm_din[31:0]), m32 (or EX_WB_ARn[4:0]), m33 (or EX_WB_ARn_wr) and m34 (or EX_WB_ARn_din[31:0]).
Referring next to Fig. 3, which details modules 201, 202, 203 and 204 of Fig. 2: its function is parallel data conflict detection and prioritized data selection. Its input signals are the compared address n1; the 10 comparison addresses n2, n5, n8, n11, n14, n17, n20, n23, n26 and n29; the 10 comparator enable signals n3, n6, n9, n12, n15, n18, n21, n24, n27 and n30; and the 11 data to be selected, n4, n7, n10, n13, n16, n19, n22, n25, n28, n31 and n32.
From the signal definitions above together with Fig. 2 and Fig. 3, it can be seen that the data bypasses of rs, rt, ARm and ARn all use the module Prl_sel (see Fig. 3). Their structures are very similar and differ only in the address being compared, so the description below concentrates on the bypass principle for rs.
The address m1 of the rs register is output by decoding circuit 105 and input to the Prl_sel module as signal n1. The value of the rs register is read from register file 106 at that address and input to the Prl_sel module as signal n32. The remaining signals n2 to n31 correspond to m11 to m40; they are the data, addresses and write-enable signals coming from the various stages. Referring to Fig. 3, the compared address n1 is fed to all 10 comparators, and the other operand of each comparator is the address fed back from one of the stages. The write-enable signals control whether the comparators are enabled. If the enable signal is "0", the comparator works normally; if it is "1", the comparator does not work and always outputs "0". When a comparator finds its two inputs equal, it outputs a 32-bit all-"1" signal; otherwise it outputs 32 bits of all "0". The 10 comparators can therefore output the 10 comparison results cc1 to cc10 in parallel. The 10 results are divided into 3 groups: cc1, cc2, cc3 and cc4 form group 1; cc5, cc6, cc7 and cc8 form group 2; cc9 and cc10 form group 3.
Consider group 1 first. In this group cc1 has the highest priority and cc4 the lowest. cc1 is ANDed with n4, the input datum of the same comparator, to give sc1. Suppose addresses n1 and n2 are equal. Then cc1 is 32'hFFFFFFFF, ANDing it with n4 leaves n4 unchanged, and sc1 = n4. At the same time, through NOT gate NO1 and NOR gates NOR1 and NOR2, cc1 forces the outputs sc2, sc3 and sc4 of AND gates AND2, AND3 and AND4 to all "0", so the results of comparators CMP2, CMP3 and CMP4 are ignored. OR gate OR1 then outputs tc1 with the value of n4. This implements a 4-to-1 selection with priority.
Similarly, suppose n1 is found equal to n5 (with comparator CMP2 enabled by n6), so that cc2 is 32'hFFFFFFFF. For cc2 to take effect, cc1 must be 32'h00000000, because cc1 has higher priority and would otherwise mask the result of cc2. With cc1 at all "0", AND gate AND1 makes output sc1 all "0". cc2 in turn makes outputs sc3 and sc4 all "0" through NOR gates NOR1 and NOR2. The two control inputs of AND gate AND2, namely the inverse of cc1 and cc2 itself, are both all "1", so sc2 is in effect the value of n7, and OR gate OR1 outputs the group result tc1 equal to n7, which again completes the data selection. The other comparison results follow the same logic.
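The masking logic of group 1 can be checked with 32-bit bitwise operations. The sketch below is one plausible reading of the description, stated as an assumption: each candidate is gated by its own comparator output and by the inverted OR of all higher-priority comparator outputs. The exact wiring of NO1, NOR1 and NOR2 in Fig. 3 is not reproduced, and the names `inv` and `group_select` are illustrative.

```python
MASK = 0xFFFFFFFF  # 32-bit all-ones, the "match" output of a comparator

def inv(x):
    """Bitwise NOT restricted to 32 bits."""
    return ~x & MASK

def group_select(cc, data):
    """Priority 4-to-1 selection for one comparator group.

    cc:   four comparator outputs (each 0 or MASK), highest priority first.
    data: the four 32-bit candidates paired with them (for group 1: n4, n7, n10, n13).
    Returns (tc, g): the selected value and the group hit flag (0 or MASK).
    """
    cc1, cc2, cc3, cc4 = cc
    d1, d2, d3, d4 = data
    sc1 = cc1 & d1                          # highest priority, never masked
    sc2 = inv(cc1) & cc2 & d2               # masked when cc1 hits
    sc3 = inv(cc1 | cc2) & cc3 & d3         # masked when cc1 or cc2 hits
    sc4 = inv(cc1 | cc2 | cc3) & cc4 & d4   # masked by any higher-priority hit
    tc = sc1 | sc2 | sc3 | sc4              # at most one term is non-zero
    g = cc1 | cc2 | cc3 | cc4
    return tc, g

if __name__ == "__main__":
    tc, g = group_select((MASK, MASK, 0, 0), (0x11, 0x22, 0x33, 0x44))
    print(hex(tc), hex(g))
```

Because every lower-priority term is forced to zero whenever a higher-priority comparator matches, the final OR never mixes two candidates.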
Group 2 works in the same way as group 1. The two groups are peers, and neither group circuit encodes a priority over the other. The priority between them is implemented by data selector MUX1. If any one of the conditions cc1, cc2, cc3 or cc4 holds, their OR result g1 is all "1", and Dout1 selects the group 1 result tc1. If g1 is "0", then regardless of g2, Dout1 selects the group 2 result tc2. Group 1 therefore takes precedence over group 2. The combined hit condition cfr of groups 1 and 2 is the OR of g1 and g2: if any condition among cc1 to cc8 holds, cfr holds.
Group 3 has only two comparators, CMP9 and CMP10, which produce the two results cc9 and cc10. As stated earlier, these two conditions have the highest priority, but their data are the updated values from address logic units 109 and 110, so they arrive with a larger delay. While the result Dout1 is being computed, sc9 and sc10 are still being computed, and Dout1 is usually available before sc9 and sc10. Dout1 can therefore be used as the third data input of group 3, with cfr as the third comparison result. On this assumption, group 3 works in the same way as the first two groups. If none of the conditions cc1 to cc10 holds, the register of this instruction has no data to bypass, so the value n32 read from the register file is selected and output as the final result.
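Putting the three groups together gives the complete selection for one of the four critical paths. The behavioural Python sketch below follows the structure described above (two 4-input groups, MUX1, then group 3 taking Dout1 and cfr as its third entry, with n32 as the fallback). It models the selection result only, not the gate-level timing, and the names `first_hit` and `prl_sel` are assumptions made for illustration.

```python
MASK = 0xFFFFFFFF  # 32-bit all-ones, the "match" output of a comparator

def first_hit(cc, data):
    """Return (value, hit) for the highest-priority asserted entry of one group."""
    for c, d in zip(cc, data):
        if c == MASK:
            return d, MASK
    return 0, 0

def prl_sel(cc, data, rf_value):
    """Behavioural model of one Prl_sel path.

    cc:       comparator outputs cc1..cc10 (each 0 or MASK).
    data:     the ten forwarded candidates paired with cc1..cc10.
    rf_value: n32, the value read from the register file.
    """
    tc1, g1 = first_hit(cc[0:4], data[0:4])   # group 1: cc1..cc4
    tc2, g2 = first_hit(cc[4:8], data[4:8])   # group 2: cc5..cc8
    dout1 = tc1 if g1 else tc2                # MUX1: group 1 beats group 2
    cfr = g1 | g2                             # any hit among cc1..cc8
    # Group 3: cc9 and cc10 first, then (cfr, Dout1) as the third entry.
    value, hit = first_hit([cc[8], cc[9], cfr], [data[8], data[9], dout1])
    return value if hit else rf_value         # no hit at all: use n32

if __name__ == "__main__":
    data = list(range(101, 111))              # candidate k carries the value 100 + k
    cc = [0] * 10
    cc[0] = cc[4] = MASK                      # cc1 and cc5 both match
    print(prl_sel(cc, data, rf_value=999))
```

The effective priority order of this model is cc9, cc10, then cc1 to cc4, then cc5 to cc8, and finally the register-file value.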
The bypass circuits of IR0 and IR1 are simpler than those of rs, rt, ARm and ARn. One reason is that they select among fewer data, only 3, so the forwarding of these two registers can never become the critical path of the circuit. Taking IR0 as an example, since there are only 3 data, only 2 comparators are needed, CMP11 and CMP12. The working principle is similar to that of the Prl_sel module, only simpler.
As Fig. 2 and Fig. 3 show, the parallel data address conflict detection circuit and the prioritized data selection circuit in this embodiment are composed and connected as follows:
The first to tenth comparators compare in parallel to perform the parallel data address conflict detection. The comparison results are connected to the first to fifth NOT gates, the first to twelfth AND gates and the first to sixth OR gates for priority selection. The ninth to twelfth AND gates and the sixth OR gate produce the final output and complete the prioritized data selection. Four identical modules of this kind produce the data bypass results of general-purpose registers rs and rt and of address registers ARm and ARn respectively;
The results of the simultaneous comparisons by the eleventh and twelfth comparators are connected to NOT gate 4 and NOR gate 7, then pass through the two-input AND gates 13 to 15, and the ANDed results are ORed by OR gate 7 to produce the bypass result of index register 0;
The results of the simultaneous comparisons by the thirteenth and fourteenth comparators are connected to NOT gate 5 and NOR gate 8, then pass through the two-input AND gates 16 to 18, and the ANDed results are ORed by OR gate 8 to produce the bypass result of index register 1.
As described above, the composition of these circuit modules and the way they are connected achieve the object of the invention.
Compared with the background art, the invention therefore has the following advantages: the circuit delay is reduced, giving higher efficiency; the operating clock of a digital signal processor using the invention can be raised accordingly; and the characteristics of a DSP are better realized.
Two application examples of the invention follow.
Example 1. Consider the following 2 instructions:
ADD A, B, C
SUB D, A, E
The first instruction computes B+C and stores the result in register A. The second instruction computes A-E and stores the result in register D. Because source register A of the second instruction uses the result of the first instruction, there is a data dependency between the two instructions. When the second instruction is in the decode stage, the control module derives the data conflict control information S35 and stalls the IF and ID stages of the pipeline while the other stages continue to execute. When the first instruction reaches the WB stage, the parallel data conflict detection in the data bypass circuit compares the address values, and comparator CMP5 finds that the read address and the write address are the same, both being register A. The output cc5 of comparator CMP5 becomes all "1", so the prioritized data selection circuit outputs the value of register A. That is, output S8 of the data bypass circuit is the value of register A. The data bypass on this path is thus accomplished, and the pipeline resumes execution.
Example 2. Consider the following 3 instructions:
ADD ARm, B, C
SUB D, *ARm++(IR0), *ARn++(IR1)
ADD E, *ARm+(4), F
The first instruction is an addition: it computes B+C and stores the result in address register ARm. The second instruction is more complex. It is a subtraction whose two operands are the memory data at the addresses given by ARm+IR0 and ARn+IR1. At the same time, the value of the ARm register is updated to ARm+IR0 and the value of the ARn register is updated to ARn+IR1. The result of the subtraction is stored in register D. The third instruction is again an addition: one operand is a memory datum whose address is ARm+4, and the other operand comes from register F. In this example, the first and second instructions have a data dependency on ARm, because an operand of the second instruction is the result of the first. The second and third instructions also have a data dependency. The second instruction updates address register ARm and must change the corresponding value in register file 106, so ARm is both a source operand and a destination operand there. The third instruction uses ARm as an address register for memory addressing, that is, as a source operand, so the second and third instructions are dependent as well.
When the second instruction reaches the decoding circuit, the control circuit detects the data conflict and stalls the IF and ID stages of the pipeline while the other stages continue, until the first instruction reaches the write-result stage. At that moment, comparator CMP5 in the third bypass path, module 203 of the data bypass circuit, detects that the two addresses n1 (or ID_Reg_ARm[4:0]) and n14 (or EX_WB_Dest1[4:0]) are equal, while all other comparisons are unequal. Module 203 therefore outputs the value n16, which is the value of destination register ARm of the first instruction, and the data bypass is accomplished. The pipeline then resumes: the second instruction enters the DA stage and the third instruction enters the ID stage. At this moment, comparator CMP10 in the third bypass path, module 203 of the data bypass circuit, detects that the addresses n1 (or ID_Reg_ARm[4:0]) and n14 (or DA_ARm[4:0]) are equal, both being ARm, while all other comparisons are unequal. Module 203 therefore outputs the value n28, which is the updated value of address register ARm in the second instruction. The data bypass function is thus accomplished, and an unnecessary pipeline stall in DSP processing is avoided.
Finally, it should be noted that the above are only specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or infer from the content disclosed by the invention shall be considered to fall within the scope of protection of the invention.

Claims (1)

1. A data bypass system applied in a digital signal processor pipeline, the digital signal processor pipeline adopting six pipeline stages, namely the instruction fetch stage, the decode stage, the address calculation stage, the memory access stage, the execute stage and the result write-back stage, characterized in that the data bypass system comprises a parallel data bypass circuit with priority ordering, used for forwarding 6 ways of data from the general-purpose registers, the address registers and the index registers: 4 of the ways perform prioritized parallel data bypassing over 11 data sources, and 2 of the ways perform prioritized parallel data bypassing over 3 data sources;
The parallel data bypass circuit with priority ordering consists of two parts: one part is a parallel data address conflict detection circuit comprising ten 5-bit address comparators; the other part is a prioritized data selection circuit, which performs prioritized data selection among the 11 data sources according to the parallel data address conflict detection results;
In the parallel data bypass circuit with priority ordering used for data forwarding: the parallel data address conflict detection circuit comprises first to fourteenth comparators; the prioritized data selection circuit comprises first to fifth NOT gates, first to eighteenth AND gates, first to eighth NOR gates, first to eighth OR gates and a data selector;
The structure of the circuit module is as follows: the first to tenth comparators compare in parallel to perform the parallel data address conflict detection, and the comparison results are connected to the first to third NOT gates, the first to twelfth AND gates and the first to sixth OR gates for priority selection; the ninth to twelfth AND gates and the sixth OR gate produce a final output result, completing the prioritized data selection; four identical circuit modules of this kind respectively produce the data bypass results for the address data of the first general-purpose register (rs), the second general-purpose register (rt), the first address register (ARm) and the second address register (ARn);
The eleventh and twelfth comparators compare simultaneously, and their results are connected to the fourth NOT gate and the seventh NOR gate and then pass through the three-input thirteenth to fifteenth AND gates, where they are ANDed with the inputs; the ANDed results are ORed together, and the seventh OR gate produces the bypass result for address index register 0;
The thirteenth and fourteenth comparators compare simultaneously, and their results are connected to the fifth NOT gate and the eighth NOR gate and then pass through the three-input sixteenth to eighteenth AND gates, where they are ANDed with the inputs; the ANDed results are ORed together, and the eighth OR gate produces the bypass result for address index register 1.
CNB2004100167561A 2004-03-03 2004-03-03 Data by-passage technology in digital signal processor Expired - Fee Related CN100514281C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100167561A CN100514281C (en) 2004-03-03 2004-03-03 Data by-passage technology in digital signal processor

Publications (2)

Publication Number Publication Date
CN1664775A CN1664775A (en) 2005-09-07
CN100514281C true CN100514281C (en) 2009-07-15

Family

ID=35035884

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100167561A Expired - Fee Related CN100514281C (en) 2004-03-03 2004-03-03 Data by-passage technology in digital signal processor

Country Status (1)

Country Link
CN (1) CN100514281C (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100395737C (en) * 2006-06-08 2008-06-18 杭州华三通信技术有限公司 Method for transmitting data between internal memory and digital signal processor
CN101751244B (en) * 2010-01-04 2013-05-08 清华大学 Microprocessor
CN103440210A (en) * 2013-08-21 2013-12-11 复旦大学 Register file reading and isolating method controlled by asynchronous clock
CN111026445A (en) * 2019-12-17 2020-04-17 湖南长城银河科技有限公司 Intelligent identification method and chip
CN117331603B (en) * 2023-09-18 2024-04-09 中国人民解放军军事科学院国防科技创新研究院 Depth pipeline forward bypass based on priority determination

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103154884A (en) * 2010-10-27 2013-06-12 惠普发展公司,有限责任合伙企业 Pattern detection
US9342709B2 (en) 2010-10-27 2016-05-17 Hewlett-Packard Enterprise Development LP Pattern detection
CN103154884B (en) * 2010-10-27 2016-08-10 惠普发展公司,有限责任合伙企业 Mode detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090715

Termination date: 20120303