A Memory Controller
FIELD OF THE INVENTION
This invention relates to a memory controller for controlling access to a Random Access Memory (RAM) device and in particular to a memory controller for random bank allocation of a banked RAM device. The invention is particularly suitable for addressing a banked Dynamic RAM (DRAM) device. The method and apparatus have particular application for use with processors running a wide range of applications including 3-d graphics applications and in particular for operations such as reading bit-maps from memory and reading lists of polygons to be rendered.
BACKGROUND OF THE INVENTION
Main memory for computers is commonly provided by Dynamic Random Access Memory (DRAM) devices. Each DRAM device consists of a large array of memory capacitors which are arranged in rows and columns. To operate on a memory location in the array, a control circuit calculates the row of the location, places the row address on the DRAM's address pins and toggles the Row Address Select (RAS) pin, causing the DRAM to read the row address. The DRAM connects the selected row to the sense amplifiers. The control circuit then calculates the column of the location, places the column address on the DRAM's address pins, and toggles the Column Address Select (CAS) pin, causing the DRAM to read the column address, which it uses to select the output of the sense amplifier corresponding to the selected column.
Once the control circuit has selected a first row, it can select several columns from the first row by successively placing different column addresses on the address pins and toggling the CAS pin whilst keeping the first row selected. The resulting access time for successive operations is reduced and the technique is particularly useful for accessing data stored at successive addresses in memory. Hence, access times for addressing further columns in the same row (page) are shorter than access times for addressing a new page.
When banking is used, the address of a location in DRAM also incorporates a bank address specifying the bank. Fastest access times are possible when the bank and row addresses are kept constant from one access to the next. If the row address (page) is altered and the bank address kept constant, slowest access results. If the row address is altered and the bank address is simultaneously altered, there is potential for the access speed to be somewhere between the slowest and fastest access time. Hence, when implementing a DRAM addressing system for banked DRAM, it is advantageous to arrange the system such that when a row address is changed, the bank address is likely to change simultaneously.
Fast-page access mode is only available if the page can be kept open between memory accesses, and a bank is only able to keep one page open for fast-page access. Therefore, a page in a bank can only be kept open if the next memory access to that bank is to another location within the open page or if the next memory access is to a different bank. The restrictions of fast-page access mean that it is only possible to swap between requestors without any access speed slow down if the requestors are each accessing an open page.
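The three access cases described above can be illustrated with a small bookkeeping model. The following C sketch is illustrative only and is not taken from any particular DRAM device: the names BankState, bank_state_init and classify_access are hypothetical, four banks are assumed, and the model records only which page each bank holds open and which bank was accessed last. An access to an open page is classed as fast, a change of row within the bank just accessed as slow, and any other access as intermediate.

```c
#include <stdint.h>

#define MODEL_BANKS 4 /* assumed number of banks */

typedef enum { ACCESS_FAST, ACCESS_INTERMEDIATE, ACCESS_SLOW } AccessClass;

typedef struct {
    int32_t open_row[MODEL_BANKS]; /* page held open per bank, -1 if none */
    int32_t last_bank;             /* bank of the previous access, -1 at start */
} BankState;

/* Start with no page open in any bank. */
static void bank_state_init(BankState *s)
{
    for (int b = 0; b < MODEL_BANKS; b++)
        s->open_row[b] = -1;
    s->last_bank = -1;
}

/* Classify one access and record the page it leaves open. */
static AccessClass classify_access(BankState *s, int32_t bank, int32_t row)
{
    AccessClass c;
    if (s->open_row[bank] == row)
        c = ACCESS_FAST;          /* page already open in this bank */
    else if (bank == s->last_bank)
        c = ACCESS_SLOW;          /* new row in the bank just accessed */
    else
        c = ACCESS_INTERMEDIATE;  /* new row, but in a different bank */
    s->open_row[bank] = row;      /* a bank keeps only one page open */
    s->last_bank = bank;
    return c;
}
```

In this model two requestors that alternate between open pages in different banks see only fast accesses, whereas two requestors that alternate between different rows of one bank see only slow accesses.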
Known memory organisations for banked memory include high-order interleave, low-order interleave and some intermediate order interleave systems. A memory system using high-order interleave uses the high order address bits to select a particular bank of memory. Thus, the address bits which specify the bank are placed at the top of the memory map and the bank contains a block of consecutive addresses. Conceptually, the N banks appear as blocks of memory utilising 1/N of the address space each. When used with DRAM it makes particular sense to use the intermediate P bits of the processor generated address to address the row (page) of the DRAM and the least significant R bits of the processor generated address as the column address of the DRAM. Figure 1 shows schematically how the conceptual memory map is organised with the banks each occupying 1/4 of the address space and the pages occupying 1/(N-1) of each bank. The column address (not shown in the schematic) would be used to specify a particular column of a page.
If multiple processors, or requestors, each use a different bank of memory, all the banks can transfer data simultaneously. When the number of requestors exceeds the number of banks of memory, it becomes difficult to assign requestors to banks to ensure that access speeds are maximised. Assuming four banks specified using 2 bank bits, row addresses of 12 row bits and column addresses of 8 column bits, a high-order interleave system, or most significant bit (msb) addressing mode, would have an address map as follows:
AddressMap = Bank[1 downto 0] & Row[11 downto 0] & Column[7 downto 0]
where & is the VHDL concatenate operator.
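By way of illustration only, the msb address map can be sketched in C. The sketch assumes that the processor generated address is a 22-bit index in units of one DRAM column, matching the 2 + 12 + 8 bit split of the example; the names MsbAddress and decode_msb are hypothetical.

```c
#include <stdint.h>

/* Field widths from the example: 2 bank bits, 12 row bits, 8 column bits. */
#define MSB_BANK_BITS 2u
#define MSB_ROW_BITS  12u
#define MSB_COL_BITS  8u

typedef struct {
    uint32_t bank;
    uint32_t row;
    uint32_t column;
} MsbAddress;

/* High-order interleave: Bank[1..0] & Row[11..0] & Column[7..0]. */
static MsbAddress decode_msb(uint32_t addr)
{
    MsbAddress d;
    d.column = addr & ((1u << MSB_COL_BITS) - 1u);
    d.row    = (addr >> MSB_COL_BITS) & ((1u << MSB_ROW_BITS) - 1u);
    d.bank   = (addr >> (MSB_COL_BITS + MSB_ROW_BITS)) & ((1u << MSB_BANK_BITS) - 1u);
    return d;
}
```

With this split the bank field changes only once every 2^20 column addresses, so each bank holds one contiguous quarter of the address space.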
In a low-order interleave system, consecutive addresses are placed in alternate banks by using the least significant address bits to specify the bank. A requestor therefore needs to access more than one of the banks as it executes its program or transfers data. When low-order interleave is employed, reading a number of consecutive locations in the memory map cycles through the same number of banks and hence one requestor may effectively lock open a page for each bank preventing other requestors from achieving fast memory access. Low-order interleave can, therefore, result in a memory conflict if two requestors request simultaneous access to the same bank of memory.
Intermediate-order interleave systems can also be designed. By designing an appropriate intermediate-order interleave system, it is possible for a number of requestors each to keep a page open as long as the pages map into different banks. For a memory arranged into N banks, it is possible for N requestors each to have a page open in one of the N banks and hence intermediate-order interleave can be particularly useful in a system with a plurality of requestors. For DRAM with banking, a particularly useful address map is one in which the most significant bits specify the row (page) address, the intermediate bits specify the bank address and the least significant bits specify the column address. This type of interleaving, which we term least significant bit (lsb) addressing, places consecutive addresses in the same bank for a whole page, then changes bank at the end of the page. Figure 2a shows schematically how the conceptual memory map is organised with the pages each occupying 1/(N-1) of the address space and the banks occupying 1/4 of each page. The column address (not shown in the schematic) would be used to specify a particular column of a page in a particular bank. The order that the banks appear in the conceptual memory map is the same for each page in lsb addressing. When lsb addressing is used, instead of carefully trying to place requestors in different banks, the aim is to randomise the placement of requestors in banks.
For example, consider the lsb addressing mode where the most significant P bits of the address map are used to specify the row address, the next most significant Q bits of the address map are used to specify the bank address and the least significant R bits are used to specify the column address. Assuming a four bank system requiring Q=2 bank bits, and row and column addresses having respectively P=12 row bits and R=8 column bits, the memory map will cycle through the four banks (0, 1, 2, 3) every 32KB (assuming an 8KB row (or page)). The conceptual address map for the lsb addressing mode is:
AddressMap= Row[11 downto 0] & Bank[1 downto 0] & Column[7 downto 0]
This intermediate-order interleave system is designed to change bank each page which helps to ensure fast access speeds.
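By way of illustration only, the lsb address map can be sketched in C. As an assumption, the processor generated address is again treated as a 22-bit index in units of one DRAM column; the names LsbAddress and decode_lsb are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t row;
    uint32_t bank;
    uint32_t column;
} LsbAddress;

/* lsb addressing: Row[11..0] & Bank[1..0] & Column[7..0]. */
static LsbAddress decode_lsb(uint32_t addr)
{
    LsbAddress d;
    d.column = addr & 0xFFu;          /* R = 8 least significant bits */
    d.bank   = (addr >> 8) & 0x3u;    /* Q = 2 intermediate bits */
    d.row    = (addr >> 10) & 0xFFFu; /* P = 12 most significant bits */
    return d;
}
```

Stepping through consecutive addresses keeps the bank constant for 256 columns, then moves to the next bank, and returns to bank 0 with the row incremented after all four banks have been visited.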
SUMMARY OF THE INVENTION
We have appreciated that a problem with lsb addressing is that there is at least one common memory access pattern that results in a slow overall memory access speed due to the need to access consecutively different row addresses in the same bank. This can occur when a first requestor (processor, device or the like) requires access to consecutive addresses in the memory, a second requestor likewise requires access to consecutive addresses in the memory and both requestors happen to require access to the same bank. As the two requestors increment the addresses they are accessing, the location being addressed in the DRAM repeatedly falls in the same bank and the access time is maximised, resulting in poor overall performance.
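The access pattern described above can be reproduced with a short C sketch. It is a simplified model under stated assumptions: the two requestors are taken to issue one access each in lockstep, addresses are in units of one DRAM column, and the lsb map with P=12, Q=2 and R=8 is used. The function names are hypothetical.

```c
#include <stdint.h>

/* lsb address map: Row[11..0] & Bank[1..0] & Column[7..0]. */
static uint32_t lsb_bank(uint32_t addr) { return (addr >> 8) & 0x3u; }
static uint32_t lsb_row(uint32_t addr)  { return (addr >> 10) & 0xFFFu; }

/* Count lockstep steps in which both requestors address the same bank
 * but different rows, which is the slowest case for a banked DRAM. */
static unsigned count_lsb_conflicts(uint32_t start_a, uint32_t start_b,
                                    unsigned steps)
{
    unsigned conflicts = 0;
    for (unsigned i = 0; i < steps; i++) {
        uint32_t a = start_a + i; /* first requestor, incrementing */
        uint32_t b = start_b + i; /* second requestor, incrementing */
        if (lsb_bank(a) == lsb_bank(b) && lsb_row(a) != lsb_row(b))
            conflicts++;
    }
    return conflicts;
}
```

When the two start addresses differ by a whole number of four-page cycles, for example 0 and 0x1000, every step is a conflict because the cyclic bank order keeps the two requestors aligned. When the start addresses differ by a single page, for example 0 and 0x100, no step is a conflict.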
We have appreciated that this problem is due to the cyclic nature of the order that the banks appear in the conceptual memory map. By memory map we mean the table that relates processor generated addresses arranged in ascending or descending order to a corresponding row, bank and column address of a banked RAM device. By "conceptual" we mean that the conceptual memory map may never physically exist as a look-up table, list or the like; a row, bank and column address may be generated on-the-fly by applying a suitable mapping function to the processor generated address. We have appreciated that by substantially randomising the order in which the banks appear in the conceptual memory map, the problem can be ameliorated. By randomising the order in which the banks appear in the conceptual memory map we mean arranging the conceptual memory map such that the order of the banks at a first row (page) address is randomly different from that at a second and subsequent row addresses. Although the order of banks may have to be repeated for more than one page (assuming the number of banks to be small compared to the number of pages), the random way in which the bank order is repeated means that two requestors consecutively addressing the memory and happening to address the same bank at one stage in the addressing are unlikely to continue sequentially to access the same bank as they increment their addressing.
The invention is defined in the claims to which reference is now directed.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 is a schematic showing high-order interleave applied to a banked memory;
Figure 2a is a schematic showing an intermediate-order interleave applied to a banked memory;
Figure 2b is a schematic showing the randomised least significant bit addressing mode applied to a banked memory in accordance with the invention;
Figure 3 is a schematic diagram of a computer system in accordance with the invention;
Figure 4 is a schematic flow-chart showing mapping of the processor generated address to a bank, row and column address in accordance with a preferred embodiment of the invention; and
Figure 5 is a schematic diagram showing in more detail the configuration of the memory controller in accordance with a preferred embodiment of the invention.
DESCRIPTION OF A PREFERRED EMBODIMENT
The invention is based on the appreciation that by using intermediate-order interleaving and forcing the order of the banks in the conceptual memory map to be irregular and non-cyclic, the average access speed of memory can be improved for a multiple requestor computer system. We term this randomised lsb addressing. Figure 2b shows schematically how the conceptual memory map is organised with the highest order bits specifying the row (page) address and each page occupying 1/(N-1) of the address space, the next most significant bits specifying the bank with each bank occupying 1/4 of each page address space. The column address (not shown in the schematic) would be used to specify a particular column of a page. Of particular note is the fact that the order that the banks appear is random and hence differs between pages.
It will be immediately apparent to the skilled man that any number of randomising functions could be used to implement randomised lsb addressing. Any function which, when applied, maps the bits used to specify the bank in an uncorrelated manner so that the result is a random order of banks in the conceptual memory map would meet the requirements of the randomised lsb addressing scheme. In the presently preferred embodiment, a portion of the processor generated address is used to generate the randomising function.
Figure 3 shows a computer system 10 comprising a plurality of requestors 12a, 12b, ..., 12c, a memory controller 14 and a memory 16 comprising four banks of DRAM 18a, 18b, 18c and 18d. The requestors 12 are coupled to the memory controller 14. The requestors 12 may be separate processors or devices. The requestors 12 input processor generated addresses to the memory controller 14 and may transfer data to memory or receive data from memory via the memory controller 14. The memory controller 14 is coupled to each of the banks of DRAM 18a, 18b, 18c and 18d and outputs DRAM addresses comprising row, bank and column addresses to the DRAM. The memory controller 14 may receive data from a DRAM bank or cause data to be written to a DRAM bank. The timing and addressing protocols required for reading from and writing to a DRAM device are provided by the memory controller 14 and are not discussed here in detail.
A function of the memory controller 14 is to take a processor generated address and map it into a DRAM address by applying a randomising function which results in the banks appearing in an irregular and non-cyclic order in the conceptual memory map. Incremental addressing by two or more requestors is thereby more evenly distributed amongst the banks of RAM to improve average access speed by reducing conflicts generated when two separate processors attempt to access the same bank but a different row of DRAM.
Figure 4 is a flowchart showing a preferred method of mapping the processor generated address into bank, row and column addresses for the banked DRAM. A requestor generates a processor generated address 20. The memory controller receives the processor generated address 22 and separates the processor generated address into three portions, the first portion relating to the row address, the second portion relating to the bank and the third portion relating to the column address 24. The P most significant bits of the processor generated address are used as the first portion and are directly used to specify the row address of DRAM. In the presently preferred embodiment, P=12 and the twelve most significant bits of the processor generated address are used as the row address. The third portion is formed by taking the R least significant bits of the processor generated address. In the presently preferred embodiment the eight least significant bits of the processor generated address are used directly as the column address of DRAM.
In order to achieve the randomisation of the bank order in the conceptual memory map, the intermediate Q bits (that is, the Q next most significant bits after the P most significant bits of the processor generated address) are used to generate a bank address to select the bank of DRAM. In the presently preferred embodiment, there are four banks of DRAM and Q=2. The memory controller 14 is configured to map the second portion to a bank address by applying a randomising function to the second portion 26, the application of the randomising function having the effect of randomising the order in which the banks occur in the conceptual memory map relating the addresses generated by the processor to the addresses of the RAM device. The memory controller outputs the bank address, row (page) address and column address 28 in accordance with the timing and addressing protocols required by the particular DRAM device.
In the presently preferred embodiment, the bank address is formulated using both the bits of the processor generated address used to specify the row and also the Q next most significant bits (the bank bits) of the processor generated address. The R bit third portion 34 is used as the column address and is not used in the randomising function in the presently preferred embodiment. Figure 5 shows schematically more detail of the configuration of the memory controller 14. The bits used for specifying the row (the most significant P bits of the processor generated address) 30 are taken and applied as an input to a randomiser 36. The randomiser 36 maps the input to an uncorrelated, but reproducible, output producing a randomising function. In the presently preferred embodiment, the randomiser 36 is based on the "Lucifer" encryption system and is basically a cascade of 4-bit look-up tables and exclusive-or gates. The steps performed by the randomiser 36 are (i) taking each nibble of the input word and putting it through one of two 4-bit look-up tables, (ii) XORing the results of the look-up tables and (iii) XORing the top 2 bits of the result with the bottom 2 bits. A suitable randomiser may be implemented by the following C code:
#include <stdint.h>

typedef uint32_t UINT32;

UINT32 SboxRandSimple(UINT32 Seed)
{
    /* Two 4-bit look-up tables, applied to alternate nibbles of Seed. */
    UINT32 Sbox0[] = {12, 15, 7, 10, 14, 13, 11, 0, 2, 6, 3, 1, 9, 4, 5, 8};
    UINT32 Sbox1[] = {7, 2, 14, 9, 3, 11, 0, 4, 12, 13, 1, 10, 6, 15, 8, 5};
    UINT32 Stage1[8];
    UINT32 Stage2;
    UINT32 Result;

    /* (i) Put each nibble of the input word through one of the two tables. */
    Stage1[0] = Sbox0[Seed & 0xF];
    Stage1[1] = Sbox1[(Seed >> 4) & 0xF];
    Stage1[2] = Sbox0[(Seed >> 8) & 0xF];
    Stage1[3] = Sbox1[(Seed >> 12) & 0xF];
    Stage1[4] = Sbox0[(Seed >> 16) & 0xF];
    Stage1[5] = Sbox1[(Seed >> 20) & 0xF];
    Stage1[6] = Sbox0[(Seed >> 24) & 0xF];
    Stage1[7] = Sbox1[(Seed >> 28) & 0xF];

    /* (ii) XOR the results of the look-up tables into one 4-bit value. */
    Stage2 = Stage1[0] ^ Stage1[1] ^ Stage1[2] ^ Stage1[3] ^
             Stage1[4] ^ Stage1[5] ^ Stage1[6] ^ Stage1[7];

    /* (iii) XOR the top 2 bits of the result with the bottom 2 bits. */
    Result = (Stage2 & 3) ^ (Stage2 >> 2);
    return Result;
}
The randomising function is applied by supplying the output of the randomiser 36 as one of two inputs to an exclusive-or gate 38, the second input to the exclusive-or being the Q bank bits of the processor generated address 32. The output of the exclusive-or 38 is used as the bank address to select the bank. The resulting order of banks in the conceptual memory map is substantially randomised. For example, with four banks the order may be of the type [0 3 2 1], [1 0 2 3], [3 1 0 2] and so on in a substantially non-repeating, irregular way. This fulfils the requirement that the order in which the banks appear in the conceptual memory map is irregular and prevents the cyclic ordering which can result in slow memory access.
The conceptual address map for the randomised lsb addressing mode is shown schematically in figure 2b and can be expressed as:
AddressMap= Row[11 downto 0] & Bank[randomised function of first and second portions of the processor generated address] & Column[7 downto 0]
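By way of illustration only, the complete mapping of the bank bits can be sketched in C by combining a randomiser of the kind described with the exclusive-or stage. The sketch assumes that the processor generated address is a 22-bit index in units of one DRAM column with P=12, Q=2 and R=8; the names sbox_rand_simple and randomised_bank are hypothetical, and the randomiser is written as a loop equivalent to the three steps (i) to (iii).

```c
#include <stdint.h>

/* Randomiser: nibble-wise look-up, XOR reduction, fold from 4 bits to 2. */
static uint32_t sbox_rand_simple(uint32_t seed)
{
    static const uint32_t sbox0[16] =
        {12, 15, 7, 10, 14, 13, 11, 0, 2, 6, 3, 1, 9, 4, 5, 8};
    static const uint32_t sbox1[16] =
        {7, 2, 14, 9, 3, 11, 0, 4, 12, 13, 1, 10, 6, 15, 8, 5};
    uint32_t stage2 = 0;

    /* Even nibbles use sbox0, odd nibbles use sbox1; XOR all outputs. */
    for (unsigned n = 0; n < 8; n++) {
        uint32_t nibble = (seed >> (4u * n)) & 0xFu;
        stage2 ^= (n % 2u == 0u) ? sbox0[nibble] : sbox1[nibble];
    }
    /* XOR the top 2 bits of the 4-bit value with the bottom 2 bits. */
    return (stage2 & 3u) ^ (stage2 >> 2);
}

/* Bank address = bank bits XOR randomiser(row bits). */
static uint32_t randomised_bank(uint32_t addr)
{
    uint32_t row       = (addr >> 10) & 0xFFFu; /* first portion, P = 12 */
    uint32_t bank_bits = (addr >> 8) & 0x3u;    /* second portion, Q = 2 */
    return bank_bits ^ sbox_rand_simple(row);
}
```

Because the randomiser output depends only on the row bits and is combined with the bank bits by exclusive-or, the four values of the bank bits within any one page still select four different banks; only the order in which they are visited changes from page to page.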
Whilst the mapping of the bank bits to a bank address has been described in terms of a hardware solution, the mapping could be implemented in software. For example, a look up table could be used in which the most significant bits of the processor generated address (relating to the row address) could be used to access a row of the look up table and the next most significant bits to access a column of the table which would yield a bank address which met the required random order criterion. Other software and/or hardware implementations are within the implementation abilities of the skilled man.
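A look-up table of the kind suggested above might be sketched in C as follows. This is a sketch under stated assumptions and not a description of any particular implementation: the table has one row per page (4096 rows for P=12) and one column per value of the bank bits, and each row is filled with a shuffled order of the four banks. The shuffle and the small linear congruential generator driving it are illustrative choices, and all names are hypothetical.

```c
#include <stdint.h>

#define LUT_ROWS  4096u /* one table row per DRAM page, P = 12 */
#define LUT_BANKS 4u    /* one table column per bank-bit value, Q = 2 */

static uint8_t bank_lut[LUT_ROWS][LUT_BANKS];

/* Small reproducible generator (illustrative choice only). */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Fill every table row with a shuffled order of the banks. */
static void init_bank_lut(uint32_t seed)
{
    uint32_t state = seed;
    for (uint32_t row = 0; row < LUT_ROWS; row++) {
        for (uint32_t b = 0; b < LUT_BANKS; b++)
            bank_lut[row][b] = (uint8_t)b;
        /* Fisher-Yates shuffle of the four entries in this row. */
        for (uint32_t i = LUT_BANKS - 1u; i > 0u; i--) {
            uint32_t j = lcg_next(&state) % (i + 1u);
            uint8_t tmp = bank_lut[row][i];
            bank_lut[row][i] = bank_lut[row][j];
            bank_lut[row][j] = tmp;
        }
    }
}

/* Row bits select the table row, bank bits select the table column. */
static uint32_t lookup_bank(uint32_t addr)
{
    uint32_t row       = (addr >> 10) & 0xFFFu;
    uint32_t bank_bits = (addr >> 8) & 0x3u;
    return bank_lut[row][bank_bits];
}
```

The table must be initialised once with init_bank_lut before lookup_bank is used, and the same seed must be used for the lifetime of the stored data so that the mapping remains reproducible. Unlike the exclusive-or arrangement, a table of this kind can hold any ordering of the banks for each page.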
With respect to the above description, it is to be realised that equivalent apparatus and methods are deemed readily apparent to one skilled in the art, and all equivalent apparatus and methods to those illustrated in the drawings and described in the specification are intended to be encompassed by the present invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation shown and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention. It should be noted that features described at different points of the description may be used in combinations other than those particularly described or shown.