US7200721B1 - Verification of memory operations by multiple processors to a shared memory - Google Patents
- Publication number: US7200721B1
- Authority: US (United States)
- Prior art keywords: partition, memory, threads, values, parameters
- Legal status: Expired - Fee Related
Classifications
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
- G11C29/56—External testing equipment for static stores, e.g. automatic test equipment [ATE]; Interfaces therefor
Definitions
- the present invention relates to testing cache coherency of a multiprocessor data processing system.
- Verifying data integrity under these conditions is challenging. If no control is exerted over concurrent write operations, no prediction can be made as to those data values that constitute valid data. However, too much program control may defeat the purpose of the test. That is, with too much program control of concurrency the test program is essentially validated instead of the system cache coherency.
- the invention tests cache coherency in a multiprocessor data processing arrangement; in a feedback loop, the parameters used to test cache coherency are updated based on system performance characteristics. Selected values are written to memory by a plurality of threads, and consistency of the values in the memory with the values written by the plurality of threads is verified. Performance characteristics of the data processing system are measured while writing the values, and in response to the performance characteristics relative to target performance characteristics, parameters that control writing by the plurality of threads are selectively adjusted.
- FIG. 1 illustrates the relationship between a thread control table, an area of memory to which data values are written, and a reference count matrix that indicates the validity of data read from the memory;
- FIG. 2 illustrates a table used to manage partition schemes for partitions of the area of memory used for cache coherency testing in accordance with another embodiment of the invention;
- FIG. 3 is a flowchart of a process for controlling a cache coherency test in accordance with one embodiment of the invention;
- FIG. 4 is a flowchart of an example process by which a thread makes access to a partition;
- FIG. 5 is a flowchart of an example process for verifying the validity of values written by multiple threads to the area of memory;
- FIG. 6 is a flowchart of a process performed by a feedback activity; and
- FIG. 7 is a flowchart of a process performed by the feedback activity in optimizing a selected parameter value.
- FIG. 1 illustrates the relationship between the thread control table 102 , an area 104 of memory to which data values are written, and a reference count matrix 106 that indicates the validity of data read from the memory.
- multiple active threads concurrently execute on different physical processors, each thread having the ability to update and access data within the same bytes, cache-lines, and pages.
- the number of threads may vary over time. Assigning each thread to a processor reduces the overhead incurred by context switches, operating system overhead, and the associated cache flushes and invalidations. However, no specific knowledge of the processor on which each thread is executing is required for verifying data coherency.
- Multiple threads are created at the beginning of a test, the number depending on specific test requirements.
- Information on the state of each thread is stored in a thread control table 102 .
- Each thread need only have access to its associated entry.
- a main controller thread uses the information in the thread control table to monitor whether threads are active or inactive. Information concerning the state of each thread can be output when an error is detected.
- the memory area 104 is created for use by the threads. Within certain pre-defined rules, the threads are allowed to simultaneously read, write and perform input/output to this area.
- the data in memory area 104 is continually validated for expected values. Data validation can be performed by any thread, and the timing is not dependent on the status of other threads. There is no requirement that the memory area be restricted to any particular memory module or physical memory range.
- the virtual-to-absolute address translation rules ensure that consecutive bytes in virtual address space represent consecutive bytes in real address space for all bytes within a page boundary.
- One embodiment of the invention also allows threads to vary memory access patterns.
- the invention allows the size of the target memory to be varied over the course of a test.
- the target memory size may be varied from a single cache line or byte to many pages or mega-bytes of data.
- the target memory size is controlled through logical partitioning.
- the logical partitioning of the memory area 104 involves dividing the memory range into multiple, non-overlapping areas of memory called partitions. A partition need not be defined by contiguous memory locations. Because the partitioning is logical, the partitioning can be performed prior to commencing memory access and prior to creating any threads. Multiple partitioning schemes can be defined in advance of being used, thereby limiting the number of extraneous references needed while testing is underway. Only one partitioning scheme can be active at a time. However, a test has the option to switch from one partitioning scheme to another for different testing scenarios.
- the values within “[ ]” represent the values at bytes in the memory area, “1” indicates a byte of data that belongs to partition 1, “2” indicates a byte of data that belongs to partition 2, etc.
- Example 1 all the bytes are in a single partition.
- Example 2 the memory size is limited to 4 kilo-bytes, and there is only a single partition.
- Example 3 the memory is divided into 4 contiguous partitions of 4 mega-bytes.
- Example 4: 80 words of the 16 mega-byte memory are divided into three non-contiguous partitions, where each partition is assigned every third word, and assuming 8 bytes/word:
- [11111111 22222222 33333333 11111111 22222222 33333333 ... 22222222 33333333] (word 1 through word 80)
- Example 5 there are two partitions that together cover 16 mega-bytes, with the first partition covering odd bytes and the second partition covering even bytes.
- Example 6: each byte in a cache line is in a different partition (n partitions, where n is the number of bytes/cache line), and m cache lines are tested.
- Example 7 represents a random, non-uniform partitioning of the entire memory area.
- each partition is defined by a beginning offset, an ending offset, a contiguous byte count, and an increment value.
- the beginning offset is the position in the memory area 104 , relative to the starting address of the memory area, at which the partition begins.
- the ending offset is the relative address in the memory area at which the partition ends.
- the contiguous byte count is the number of consecutive bytes that belong to the partition before a byte not in the partition is encountered.
- the increment value is the number of bytes that are not in the partition and are between two sets of contiguous bytes of the partition.
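The four fields above fully determine which byte offsets belong to a partition. A minimal sketch of how a partition's offsets might be enumerated (field and function names are illustrative, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class PartitionDef:
    begin: int       # beginning offset, relative to the start of the memory area
    end: int         # ending offset (inclusive)
    contiguous: int  # consecutive bytes in the partition before a gap
    increment: int   # bytes not in the partition between two contiguous runs

def partition_offsets(p: PartitionDef):
    """Yield every byte offset belonging to the partition."""
    offset = p.begin
    while offset <= p.end:
        for i in range(p.contiguous):
            if offset + i > p.end:
                return
            yield offset + i
        offset += p.contiguous + p.increment
```

Under this encoding, a partition of Example 4 would use contiguous=8 (one 8-byte word) and increment=16 (the two intervening words belonging to the other partitions).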
- One of the facilities provided by the aforementioned partitioning scheme is that data coherency can be verified under varied timing conditions and in a manner not possible with other partitioning schemes.
- the number of contiguous bytes, the number of increment bytes, and their groupings determine the duration and frequency of both read and write operations. This generates considerable variation in the timing of data access operations, provides greater coverage of infrequent access conditions, and prevents the test from becoming “grooved”.
- the thread control table 102 , memory area 104 , and reference count matrix 106 are used in verifying cache coherency. Cache coherency is inferred when expected values are read from the memory area 104 .
- the threads described in the thread control table write values to the memory area consistent with the associated partition definitions, reference count ranges, and access routines. A verification routine is called upon to verify the validity of the values residing in the memory area.
- the partition to which a thread writes has an associated reference count range, as shown in table 102 .
- the values in the reference count range are the values that are written to the partition.
- the thread writes one of the reference count values to each address in the partition, and the verification routine is then called upon to read the values at the addresses of the partition and verify the values read are those expected.
- the verification routine reads a value from the partition, and uses the value, along with the partition number, to index into the reference count matrix 106 .
- the reference count matrix has a row for each partition, and a column for each reference count value. A value in the reference count matrix indicates whether or not the indexing reference count value is being used in the corresponding partition.
- a value of 1 indicates that the indexing reference count is in use in the partition, and a value of 0 indicates that the reference count is not in use. Thus, if the value read from the partition, along with the corresponding partition, indexes into the reference count matrix 106 to a value 1, the value in the partition is valid. A 0 value indicates that the value read from the partition is invalid, and therefore, may be indicative of a cache coherency problem.
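The lookup described above amounts to a two-dimensional table indexed by partition number and the value read from memory. A hypothetical sketch (sizes and names are assumptions for illustration):

```python
NUM_PARTITIONS = 4
NUM_REF_COUNTS = 256  # a one-byte reference count value

# ref_count_matrix[partition][ref_count] == 1 means the value is in use
ref_count_matrix = [[0] * NUM_REF_COUNTS for _ in range(NUM_PARTITIONS)]

def mark_in_use(partition: int, ref_count: int, in_use: bool) -> None:
    """Set or clear the in-use flag for a reference count in a partition."""
    ref_count_matrix[partition][ref_count] = 1 if in_use else 0

def value_is_valid(partition: int, value_read: int) -> bool:
    """A value read from a partition is valid iff its reference count
    is marked as in use for that partition."""
    return ref_count_matrix[partition][value_read] == 1
```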
- the thread control table 102 includes an entry for each available thread. Each entry includes a thread identifier (ID), an access routine specification, a status, and a partition and associated reference count range.
- ID identifies the thread.
- the access routine specification specifies the type of access made to the partition by the thread. For example, access is viewed as being in one of the following general categories: read operations, write operations, both read and write operations, write operations and immediate verification, and verification.
- the selected access routine depends on test objectives.
- the status indicates whether the thread is active or idle.
- FIG. 2 illustrates a table used to manage partition schemes for partitions of the area of memory used for cache coherency testing in accordance with another embodiment of the invention.
- Partition definitions are associated in groups to form a partitioning scheme. Multiple partitioning schemes are created prior to testing and stored in an M ⁇ N matrix 150 , where M is the number of partitioning schemes, and N is the maximum number of partition definitions used in a partitioning scheme. One partition scheme is selected and used during a test.
- partitions can also be used to limit the amount of memory that is accessed or to force more concurrency on particular architectural boundaries, such as cache lines, supersets, segments or pages.
- the size of the area and the logical partitioning both influence the amount of simultaneous access that can be expected with a given number of threads.
- FIG. 3 is a flowchart of a process for controlling a cache coherency test in accordance with one embodiment of the invention.
- a main test execution process controls the selection of partitioning schemes and activation of threads in performing a desired test.
- Feedback of system operational characteristics is used to adjust the test characteristics in order to develop desirable parameter values and achieve certain test objectives.
- the test execution begins by getting a set of parameter values (step 202 ).
- the parameters include, for example, the number of threads, the types of access routines, the partitioning scheme, and reference count ranges for different thread access functions.
- the memory area 104 to be used by the threads is allocated (step 206 ), with the quantity determined by the particular test.
- the memory area is partitioned according to the selected partitioning scheme (step 208 ) by associating the partition definitions with the threads in the control table 102 .
- the partitioning also entails establishing reference count ranges for the threads. For threads operating within the same partition, the assigned reference count ranges must not overlap. As between two threads that operate in different partitions, the reference count ranges may overlap.
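The rule above can be stated compactly: two reference count ranges conflict only when they overlap and are assigned to the same partition. A sketch (the function name is mine):

```python
def ranges_conflict(range_a, range_b, same_partition: bool) -> bool:
    """Reference count ranges (inclusive lo, hi) may overlap only if the
    threads using them write to different partitions."""
    lo_a, hi_a = range_a
    lo_b, hi_b = range_b
    overlap = lo_a <= hi_b and lo_b <= hi_a
    return overlap and same_partition
```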
- the use of reference count ranges supports detection of error conditions such as old data, which is data that was previously valid but subsequently overwritten with new data. Error conditions related to invalidation, purge, flush, and ownership would also be revealed with the described partitioning and reference counts.
- a feedback activity monitors system performance characteristics and adjusts parameter values in order to achieve a certain test objective ( FIG. 6 ).
- the feedback activity is used to evolve parameter sets. Further information on evolving parameter sets is found in the following patent applications: “METHOD AND SYSTEM FOR TESTING A COMPUTING ARRANGEMENT” by Lang et al., filed on Jun. 6, 2002, having patent application Ser. No. 10/164,877; and “METHOD AND SYSTEM FOR GENERATING SETS OF PARAMETER VALUES FOR TEST SCENARIOS” by Lang et al., filed on Jun. 6, 2002, having patent application Ser. No. 10/165,506. The contents of these applications are incorporated herein by reference.
- the test execution process incorporates changes to the parameters (step 212 ) before continuing with the remainder of the test (step 214 ). Example changes include the number of active threads, a change in partitioning scheme, and changes in reference count ranges.
- FIG. 4 is a flowchart of an example process by which a thread makes access to a partition.
- the thread writes values to the assigned partition of the memory area, and the verification thread verifies that the expected values are present in the partition.
- the thread obtains the definition of the assigned partition (step 254 ) from the partitioning scheme table 150 .
- the assigned partition is determined from the thread control table 102 .
- the initial value of the index used to update the memory area is set to the beginning offset of the partition definition.
- the thread also obtains the reference count range to be used in writing to the partition (step 256 ).
- the first value in the reference count range is selected to be written to the partition.
- the specific value that is selected depends on test requirements.
- the current reference count range for a given access function is determined by the test. However, in an example embodiment for a given range of reference counts the actual utilization begins with the lowest value in that range and progresses to the highest. For example, if the range 10–15 is assigned to a particular access thread, the thread begins by writing 10 then 11 and so on.
- the reference count concept requires that ranges be allocated consecutively and that values within each range be referenced sequentially. When the test begins reference count ranges are allocated beginning at 0 and continuing until the reference counts are exhausted. Each thread that selects a write function will acquire a reference count range that begins with the end of the previously allocated reference count range +1 and extends for some number of sequential values. This continues until all available reference counts have been allocated.
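The consecutive allocation described above might be sketched as follows: each write thread's range begins one past the end of the previously allocated range, starting from 0 (a simplified sketch; it does not model exhaustion of the reference count space):

```python
def allocate_ranges(range_sizes):
    """Allocate consecutive reference count ranges, each new range
    beginning at the previous range's end + 1."""
    ranges = []
    next_value = 0
    for size in range_sizes:
        ranges.append((next_value, next_value + size - 1))
        next_value += size
    return ranges
```

With sizes of 10 and 6, the second thread would receive the range 10 to 15, matching the 10–15 example above.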
- the selected reference count value is written to the memory area as addressed by the memory index (step 258 ).
- the reference count is used in combination with an address tag. For example, if the memory area is restricted to a 16 Mbyte area, a single byte reference count can be combined with a 3-byte address tag to uniquely identify the location of any word (4 bytes) of data. These 4 bytes must align with a 4-byte boundary relative to the beginning of the partition, but would not need to be transferred in one instruction. For partitions containing contiguous byte ranges of fewer than 4 bytes, or access routines with non-sequential memory access, the data can be transferred a single byte at a time.
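The combination of a one-byte reference count with a three-byte address tag can be sketched as a pack/unpack pair; for a memory area of at most 16 Mbytes, three bytes suffice to identify any byte offset. The byte ordering here is an assumption for illustration; the patent does not fix a layout:

```python
def pack_word(ref_count: int, offset: int) -> bytes:
    """Pack a 1-byte reference count and a 3-byte address tag (an
    offset within a <=16 Mbyte area) into one 4-byte word."""
    assert 0 <= ref_count <= 0xFF and 0 <= offset < (1 << 24)
    return bytes([ref_count]) + offset.to_bytes(3, "big")

def unpack_word(word: bytes):
    """Recover the reference count and address tag from a 4-byte word."""
    return word[0], int.from_bytes(word[1:4], "big")
```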
- each thread must adhere to the following rules when modifying a partition: 1) all bytes in the assigned partition must be modified; 2) the assigned reference count range must be used; 3) all the bytes in the partition must be written with one reference count value before beginning to write another value in the range; and 4) no other partitions can be modified by the thread. Even though every byte in the partition must be modified with a reference count value before using another value, the values in the range need not be used sequentially. Thus, the values could be used in reverse order or any other order.
- Various access patterns can be incorporated to stress different aspects of the cache. For example, for a relatively contiguous partition, an access pattern involves updating a single byte from each cache-line before going back and writing the next byte.
- the bytes of a partition may be written in a random order, provided that all unwritten bytes are updated.
- a partition may also be updated using instructions that update multiple words.
- After writing the current reference count value to the partition, the thread advances the memory index according to the partition definition. Specifically, the memory index is incremented by one until the contiguous byte count (of the partition definition) is reached. When the contiguous byte count is reached, the memory index is incremented by the increment value. It will be appreciated that the memory index need not be advanced incrementally through the partition. However, in such an embodiment the thread must track which bytes of the partition remain to be written, and an unwritten byte must be selected for the new value of the memory index.
- When all the bytes in the partition have been written with a reference count value (decision step 262), the entry in the reference count matrix 106 for the previous reference count value is marked invalid (step 264). This indicates that there should be no remaining bytes in the partition having the previous reference count value. If there are unused values in the reference count range (decision step 266), the thread selects an unused reference count value to use as the current reference count value (step 268), and the thread resumes writing the current reference count value to the partition (step 258). Otherwise, the thread idles (step 270) until the controlling test execution process reactivates the thread.
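Rules 1 through 3 above, together with the invalidation of the previous value, can be sketched as a write pass. Here `mark_invalid` stands in for clearing the previous value's entry in the reference count matrix, and sequential use of the range is assumed (names are illustrative):

```python
def write_pass(memory: bytearray, offsets, ref_range, mark_invalid):
    """Write each value in the reference count range (inclusive lo, hi)
    to every byte of the partition before moving to the next value;
    once a value has fully covered the partition, mark the previous
    value invalid."""
    offsets = list(offsets)  # the partition's byte offsets
    previous = None
    for value in range(ref_range[0], ref_range[1] + 1):
        for off in offsets:          # cover every byte with this value
            memory[off] = value
        if previous is not None:
            mark_invalid(previous)   # previous value should be gone now
        previous = value
```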
- FIG. 5 is a flowchart of an example process for verifying the validity of values written by multiple threads to the area of memory. Verification of memory contents can be performed at any time during thread execution. There is no requirement that updates to a partition are stalled or complete for verification to take place. Verification can be performed at any of the following times: 1) the same thread performing a write operation may immediately verify the results; 2) any thread may verify a partition after the partition has been written at least once; and 3) any thread may verify a partition any time after the entire target memory has been written at least once.
- a thread executing on one processor may check the integrity of a memory location at the same time that that location is being modified by a thread on another processor.
- the reference count matrix, which is accessible to all threads in the program, is used to save information on expected values. If multiple threads are permitted to target the same partition, several reference count values are valid in the partition.
- the process begins with setting a memory index to the beginning offset of the partition (step 302 ).
- the value at the address of the memory index is then read (step 304 ).
- the reference count status is read from the reference count matrix (step 306 ). If the reference count status is valid (decision step 308 ), the memory index is advanced according to the partition definition (step 310 ). The process returns to read another value from the memory (step 304 ) until the end of the partition is reached (step 312 ).
- the value read from memory may still be valid in the event that the reference count was marked invalid after the value was read from memory.
- the data read from the memory is saved as a previous reference count (step 314), and the value is reread from memory as addressed by the memory index (step 316). If the current reference count is equal to the previous reference count (decision step 318), an error is flagged (step 320) and the verification process exits.
- the current reference count is used as an index into the reference count array to read the reference count status (step 322 ). If the reference count status is valid (decision step 324 ), the process returns to advance the memory index (step 310 ). Otherwise, the current status of all the reference counts for the partition is saved as the previous reference count array (step 326 ). The value at the memory index is reread from the memory as the current reference count (step 328 ). The current reference count is used to index and read the reference count status from both the reference count matrix and the previous reference count array (steps 330 , 332 ).
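A simplified sketch of the verification loop (steps 304 through 320), with a single level of rereading rather than the full previous-reference-count-array mechanism described above; `is_valid` stands for the reference count matrix lookup:

```python
def verify_partition(memory, offsets, is_valid):
    """Verify that every byte of a partition holds an in-use reference
    count value. A value whose matrix entry reads invalid is reread:
    an error is flagged only if the same value is read twice, since a
    concurrent writer may have just invalidated the first reading."""
    for off in offsets:
        value = memory[off]
        if is_valid(value):
            continue
        previous = value          # step 314: save the suspect value
        value = memory[off]       # step 316: reread the location
        if value == previous:
            return False          # step 318/320: stable invalid value
        if not is_valid(value):
            return False          # new value is also invalid
    return True
```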
- FIG. 6 is a flowchart of a process performed by a feedback activity.
- the feedback activity routine attempts to find the parameter values that come closest to achieving a system-level test objective and attempts to do so in a minimal amount of time.
- the feedback activity concentrates on those parameters that appear to have the largest positive influence (relative to the system-level goal) on the system behavior. Less time is spent adjusting parameters that have been found to have insignificant or negative effects on the outcome. Parameters that require a large amount of time or overhead to change are also adjusted less frequently than other, more responsive parameters.
- the particular parameters selected for adjustment depend on the current test objective and the platform parameters that are available for measurement. For example, if the current test objective is to verify data coherency while maximizing main memory access, the parameters that might be adjusted include the number of threads, the partitioning scheme, and the reference count ranges.
- the feedback activity can begin to make adjustments.
- the point at which changes to parameter values are incorporated in the executing test is determined by the speed with which the executing code can both recognize and respond to the changes.
- the feedback activity follows these general rules: 1) values of critical parameters are not changed; 2) the type of the parameter (fixed, restricted, unrestricted) is maintained wherever possible; 3) random parameters are not changed to another type (except under special conditions); 4) changes are made in progressive increments, attempting to remain as close to the original focus area as possible; 5) values of parameters that are more easily acted on by the test execution unit(s) are changed first; and 6) parameter values are selected with the intent of getting closer to the system level goal. Further explanation of the various types of parameters (e.g., critical, fixed, restricted, and unrestricted), as well as focus area are found in the referenced patents/applications.
- the parameter definitions further include information that indicates levels of difficulty (“difficulty levels”) associated with changing the parameter type and the parameter value relative to incorporation by a test execution unit. This information is used to assess how difficult, how quickly, and at what level a parameter can be adjusted during execution. For example, in 2200 systems, assigning files and creating new processes involve different levels of system overhead. Thus, parameters associated with assigning files will have a different difficulty level than parameters associated with creating new processes. The difficulty levels are assigned by a user.
- parameters at the highest difficulty level require significant time and/or overhead to change. For example, changes to a parameter having a high difficulty level may require that execution be temporarily halted or require that changes be made only by the test execution engine, and only at specific times during execution.
- Parameters having the lowest difficulty levels require no special setup. These parameters can be adjusted at any time, and are frequently reread in the main-path of a test execution unit. Generally, parameters having higher difficulty levels are adjusted less frequently than parameters having lower difficulty levels.
- the difficulty levels are divided into two measures.
- One measure relates to the difficulty involved in reducing the current value of the parameter.
- the other relates to the difficulty involved in increasing the current value of the parameter. For example, marking a file invalid in a file table for the parameter of the number of target files requires relatively low overhead. However, assigning a new file requires significantly more overhead and can only be accomplished at relatively infrequent intervals during test execution for parameters such as the number of files/device or total number of files.
- the parameter type indicates whether the parameter is a fixed value parameter, a restricted range parameter, or a random parameter.
- the type is not intrinsic to the parameter, but is determined by initial settings.
- the feedback activity attempts to maintain the parameter type during the course of execution. Fixed parameters remain fixed, restricted range parameters remain restricted range, and random parameters remain random whenever possible.
- parameters can be further subdivided according to the type and number of allowable values.
- Parameter types refer to the current or assigned parameter values whereas value types refer to the allowable values and methodology for a parameter's modification.
- each parameter can be assigned a value type of discrete range, continuous range, or limited choice.
- Discrete range parameters are characterized by a relatively small number of possible values. Typically, a discrete range parameter would have less than a hundred possible discrete values.
- Continuous range parameters, although not necessarily mathematically continuous, have a sufficiently large number of possible values that attempting them all is impractical. In addition, minor variations typically have little effect. For example, in adjusting the word count in data transfers, the range of allowable values is quite large, and variations in system behavior when adjusting the value by only a few words are not expected to be significant.
- a variant of the divide-and-conquer approach is used. The initial value is used to logically divide the range into two portions, and a measurement from each portion is taken. The best measurement becomes the basis for subdividing the portion in which it belongs into two new portions. The process of subdividing the range is repeated until no further progress is made, or the range has reached some minimum size.
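Under the assumption that the measurement is unimodal over the range, the subdividing search described above behaves like an integer ternary search. A hypothetical sketch (the patent's procedure also stops when no further progress is made; this sketch stops on range size alone, and `measure` returns higher values closer to the goal):

```python
def optimize_continuous(lo, hi, measure):
    """Repeatedly subdivide [lo, hi], keeping the portion whose probe
    gives the better measurement, until the range is minimal."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if measure(m1) < measure(m2):
            lo = m1 + 1   # the better portion is the upper one
        else:
            hi = m2 - 1   # the better portion is the lower one
    return max(range(lo, hi + 1), key=measure)
```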
- Limited choice parameters have a fixed, small number of choices, for example, not more than four or five different values.
- each different value has unknown effects on the system. For example, finding that an improvement occurred when changing from a first algorithm to a second algorithm allows no prediction as to the effect of changing from the second algorithm to a third algorithm.
- In selecting values for parameters of this type, each value is selected in turn. Selections can be made in any order.
- An example limited choice parameter is a workload distribution algorithm.
- the feedback activity begins by sorting parameters by difficulty level (step 402 ), and then obtains baseline system performance measurements (step 404 ). Assuming three levels of difficulty, parameters are divided into three groups. Groups may include a combination of fixed value, restricted range, or randomly generated parameters depending on the associated difficulty levels. Critical parameters are not included, as they represent parameters that may not be adjusted.
- Random (unrestricted) parameters are established as such because these parameters are expected to have an insignificant effect on the ability of the environment to achieve the associated system-level goal. Thus, random parameters are not optimal candidates for change. However, because interactions of variables cannot always be predicted, some exploration of these parameters is included.
- the feedback activity first attempts to optimize the set of parameters at the lowest difficulty level.
- a set of parameters is considered optimized when changes to any one of the set results in a measurement that is equal to the system-level goal, or as is more likely the case, further from the system-level goal than a previous reading.
- An individual parameter is considered optimized when both positive and negative changes have an adverse effect on the system measurement. It will be appreciated that the optimization is not absolute, but is dependent on the current values of other environmental parameters.
- parameters at higher levels of difficulty are optimized individually, and parameters at the lower difficulty level are optimized based on changes at the higher levels.
- the original (higher difficulty level) parameter is neither readjusted nor treated as part of this optimization set. This limits the number of changes that are made to higher difficulty level parameters.
- the optimization is then considered stable. At this time, boundaries between difficulty levels are removed, and the parameters are treated as one optimized set. In this way, more interactions between parameters can be explored.
- the feedback activity proceeds in several stages. Except for the initial optimization stage, the other stages are repeated. The stages are shown in FIG. 6 and further described in the following paragraphs.
- each parameter is optimized individually using the initial environment parameter values. The best value for a parameter is found while the current values of all other parameters are fixed. Once all parameters have been optimized once, the process is repeated. Because parameters may interact with each other, a change in one parameter value may mean a new optimal value for another. Recall that when fixed value parameters are adjusted, they are adjusted by assigning another fixed value, ideally a value that results in a measurement closer to the system-level goal. An example of this technique for optimizing a fixed value parameter follows.
- Suppose the parameter's current value is 3. The first step in optimizing this parameter is to test the two adjacent values, 2 and 4, for the effects on the system performance measurement, and to record the effects of the change relative to the baseline. The next action depends on whether the change is negative, positive, or there is no change.
- If there is no change, the next adjacent values (1 and 5) are selected. If there is still no change, values at the upper and lower bounds of the range are checked. If there is still no change, the parameter is (temporarily) abandoned, and attention turns to another fixed parameter.
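The probing of adjacent values for a fixed value parameter might be sketched as follows; `measure` is the system performance measurement (higher is better) and `baseline` is the recorded baseline. This is a hypothetical simplification covering a single probing step, not the full bound-checking procedure:

```python
def adjust_fixed(value, lo, hi, measure, baseline):
    """Probe the two neighbors of a fixed parameter value (e.g. 2 and 4
    for a current value of 3) and move toward whichever improves the
    measurement; keep the current value if neither does."""
    candidates = [v for v in (value - 1, value + 1) if lo <= v <= hi]
    scored = [(measure(v), v) for v in candidates]
    best_score, best_value = max(scored)
    if best_score > baseline:
        return best_value   # positive change: adopt the neighbor
    return value            # no improvement: keep the current value
```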
- a range is selected instead of a single value.
- the size of the range is left unchanged.
- the range is shifted to the right or left (numerically), looking for positive or negative changes in the performance measurement.
- the shift is one value to the right (or left).
- the shift is a selected percentage of the size of the original range, for example, 50%. As with fixed parameters, the shifts continue in the direction of positive change.
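A minimal sketch of this range-shifting adjustment, under the same assumptions (an illustrative `measure` callback that scores a candidate range; names are hypothetical):

```python
def shift_range(rng, measure, lo, hi, frac=0.5):
    """Shift a restricted range right or left by a fraction of its own
    size (50% by default), keeping the shift only if it improves the
    performance measurement; otherwise the original range stays."""
    start, end = rng
    step = max(1, int((end - start) * frac))
    best, best_score = rng, measure(rng)
    for delta in (step, -step):           # try right, then left
        s, e = start + delta, end + delta
        if lo <= s and e <= hi:           # stay inside the defined bounds
            score = measure((s, e))
            if score > best_score:
                best, best_score = (s, e), score
    return best
```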
- parameters are grouped by response level and degree of interaction (step 408 ). As adjustments are made, the feedback activity tracks those parameters that appear to have little effect on the performance measurement, those that cause a negative change, and those that cause a significant positive change. The parameters are then grouped by relative response level and degree of interaction within each difficulty level. Parameters that cause a positive change are placed in a higher level group within the difficulty level. Parameters that have a strong (positive) interaction with other parameters, specifically those that require frequent adjustments on subsequent passes through the parameter list, are also placed in a higher level group. Optimization concentrates on the parameters with the most significant and positive interactions.
- This reordering and grouping of parameters is dynamic, and may change as new parameters or changes are introduced, and new interactions are found.
- the number of groups within a difficulty level is implementation specific. At a minimum, three groups would be required, one for parameters with positive responses, one for negative responses, and one for limited response.
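The three minimal groups can be sketched as a simple classifier. This is illustrative only; the `eps` threshold for treating a response as "limited" is an assumption, not taken from the patent:

```python
def group_parameters(responses, eps=0.01):
    """Place parameters into the three minimal response groups named
    above. `responses` maps a parameter name to the change it caused
    in the performance measurement (positive, negative, or near zero)."""
    groups = {"positive": [], "negative": [], "limited": []}
    for name, delta in responses.items():
        if abs(delta) < eps:
            groups["limited"].append(name)     # little or no effect
        elif delta > 0:
            groups["positive"].append(name)    # optimization focuses here
        else:
            groups["negative"].append(name)
    return groups
```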
- optimization is performed within the response groups (step 410 ). During the initial optimization stage, several passes through the parameters are made. Information gained with each pass through the parameters is used to separate these parameters into groups according to their level of positive response, and degree of interaction with other parameters.
- the level of positive response is parameter-specific, relative to the system-level goal, and independent of other parameters.
- the degree of interaction is a quantitative measurement of how other parameter optimizations are affected by a change in a particular parameter value.
- if optimization of a response group results in no changes, some steps can be skipped. For example, if optimization of the parameters in a lower-level response group results in no adjustment from the initial values, no optimization of higher-level response groups is required, because there was no change in the lower-level response group to affect them. In practice, parameters from the limited or negative interaction sets would infrequently result in changes.
- the next stage of optimization optimizes parameters in successive increasing levels of difficulty (step 412 ).
- the parameters at each level of difficulty are optimized before continuing with parameters at the next level of difficulty.
- a parameter value is adjusted until an optimal value is found. Again, the process of FIG. 7 is used for optimizing the parameters.
- the parameter set is expanded and further optimization is performed after optimizing by increasing levels of difficulty (step 414 ). Optimization of the expanded parameter set occurs when no further optimization progress can be made by separately optimizing difficult parameters. This stage involves treating all parameters, regardless of their difficulty level, as a single parameter set. Parameters are sorted into response groups according to the amount of positive influence and degree of interaction with other parameters in the set. Optimization by group continues as described above, until no further progress can be made.
- the final stage of optimization is called the exploratory stage (step 414 ).
- previously unexplored ranges and parameter values are tested in search of a more optimal solution. Less emphasis is placed on maintaining the same degree of environmental focus, and fewer assumptions are made about the individual functional relationships of parameters.
- values of random parameters are allowed to vary.
- the values for one or more parameters are selected at random, from the set of previously unselected values. This defines a new starting point for the optimization.
- in genetic-algorithm terms, genes are allowed to mutate.
- this stage may be moved ahead of the optimization of expanded parameter set stage.
- the exploratory stage for a restricted range parameter involves first contracting, then expanding the current range. Contracting a restricted parameter range uses a divide-and-conquer method similar to that described above in association with adjusting continuous range parameter values.
- the exploratory stage also allows ranges to expand into previously unexplored areas of the range.
- the original range is abandoned, and a new, larger range is defined.
- the new range spans the entire range of defined values for the parameter.
- the divide-and-conquer method is then used to find some subset of this range that produces results closer to the system level goal being measured.
- the first step is implied, as the range of a random parameter is, by definition, the entire range of parameter values.
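The divide-and-conquer contraction can be sketched as follows. This is one possible reading, with a hypothetical `measure` callback scoring candidate ranges:

```python
def contract_range(rng, measure, min_size=1):
    """Divide-and-conquer contraction: repeatedly split the current
    range in half and move to the better-measuring half, stopping when
    neither half improves on the whole or the range is minimal."""
    best, best_score = rng, measure(rng)
    while best[1] - best[0] > min_size:
        lo, hi = best
        mid = (lo + hi) // 2
        half = max([(lo, mid), (mid, hi)], key=measure)
        score = measure(half)
        if score <= best_score:           # neither half is an improvement
            break
        best, best_score = half, score
    return best
```

The same routine serves for a newly expanded range: the search simply starts from the larger bounds.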
- the feedback activity could monitor the behavior of the system in other (non-goal) areas. For example, if the system level goal of the environment is to maximize the number of write bytes transferred per second, the feedback activity routine could simultaneously measure the average I/O queue size, the total number of requests per second, and the number of split requests. If desired, separate records could be kept for each of these measurements as well. Parameters are adjusted for maximum bytes per second. However, if during the course of this optimization one of the other measurements was found to exceed the previous “best” measurement, the saved set of parameter values could be updated.
- This auxiliary information could be used to supply initial starting parameter values for the environment if the user wishes to override the environment goal. For example, the user may want to verify the integrity of word addressable files under conditions of maximum page faults, rather than maximum I/O volume.
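One way to sketch this bookkeeping of auxiliary "best" records; all names here are illustrative, not the patent's:

```python
def update_best_records(records, params, metrics):
    """Keep a separate 'best so far' record per auxiliary measurement
    (e.g. average I/O queue size, requests per second), updating a
    record whenever a measurement exceeds its previous best value."""
    for name, value in metrics.items():
        current = records.get(name)
        if current is None or value > current[0]:
            records[name] = (value, dict(params))  # snapshot the parameter set
    return records
```

A record saved this way can later seed the starting parameter values when the user overrides the environment goal.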
- FIG. 7 is a flowchart of a process performed by the feedback activity in optimizing a selected parameter value.
- the optimization process monitors the performance of the system by taking a performance measurement in a selected area of the system (step 452 ).
- the number and type of performance measurements that can be taken to support a given test objective depend on the specific platform under test.
- Example performance measurements include:
- the performance measurement is compared to a target value for the environment (step 454 ), and the set of parameter values is stored if the performance measurement is one of the n best measurements taken. In one embodiment of the invention, the set of parameter values is stored by type of computing arrangement.
- the process ends when it determines either that the performance measurement has reached a desired level or that no further progress is expected (decision step 458 ). Otherwise, one or more parameters are selected for generation of new parameter values (steps 460 , 462 ). The response flags associated with the changed parameter values are set for the test execution units (step 464 ). When all the test execution units have responded by clearing their respective response flags (step 466 ), the optimization process returns to take another performance measurement (step 452 ).
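The loop of FIG. 7 might be outlined as follows. The callbacks are stand-ins for the real measurement and test-execution-unit signaling, so this is a sketch of the control flow only, not the patented implementation:

```python
import heapq

def feedback_loop(measure, target, propose, apply_params, n_best=5, max_iters=100):
    """Sketch of the FIG. 7 loop: take a measurement (step 452), keep
    the n best parameter sets (step 454), stop once the target is
    reached (step 458), otherwise generate new values (steps 460/462)
    and hand them to the test execution units (steps 464/466, modeled
    here as a plain callback)."""
    best = []                          # min-heap of (score, iteration, params)
    params = propose(None)             # initial parameter values
    for i in range(max_iters):
        apply_params(params)           # stands in for setting/clearing response flags
        score = measure(params)        # step 452: take a performance measurement
        heapq.heappush(best, (score, i, params))
        if len(best) > n_best:
            heapq.heappop(best)        # discard the worst, keeping the n best sets
        if score >= target:            # step 458: desired level reached
            break
        params = propose(params)       # steps 460/462: new parameter values
    return sorted(best, reverse=True)  # best-first (score, iteration, params)
```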
- the present invention provides, among other aspects, a method and apparatus for verification of data coherency.
- Other aspects and embodiments of the present invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and illustrated embodiments be considered as examples only, with a true scope and spirit of the invention being indicated by the following claims.
[11111111111111111111111111.....11111111111111111111111111111111111111]
1                                                                  16M

[111111...1111111]
1              4K

[1111...1112222...2223333...3334444...444]
1        4M        8M        12M      16M

[111111112222222233333333111111112222222233333333...2222222233333333]
word 1  word 2  word 3  word 4  word 5  word 6   ...       word 8*10

[1212121212121212...12121212]
1                        16M

[123456...n123456...n123456...n...123456...n]

[111112222222222222233111111111122333333333333344444.......xxxx11111]
1                                                               16M
- 1. partition memory size—the size of the memory under test affects the residency of data in the system caches and consequently, the number of memory accesses made;
- 2. contiguous byte/increment byte counts—each of these has a direct bearing on the number of accesses to a given architectural structure (e.g., cache line, page, etc.) and consequently affects the number of memory accesses;
- 3. ratio of read to write functions—this ratio has a direct bearing on cache invalidation and the resulting number of memory accesses;
- 4. processors on which the access threads are assigned for execution—this affects the distribution of access requests from multiple platform access units; and
- 5. the number of active threads—this parameter affects the total number of memory accesses.
Each of these parameters has a direct impact on the amount of write concurrency (which causes cache line invalidation) that takes place across the entire range of the memory partition by multiple requesters, as well as the read operations that are necessary to verify data coherency.
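For illustration, the five control parameters listed above might be collected in a structure such as the following; the field names and default values are assumptions for the sketch, not values from the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CoherencyTestParams:
    """Illustrative container for the five control parameters."""
    partition_memory_size: int = 16 * 2**20          # 1. size of memory under test
    contiguous_bytes: int = 64                       # 2. contiguous byte count
    increment_bytes: int = 4096                      # 2. increment byte count
    read_write_ratio: float = 1.0                    # 3. reads per write
    processors: List[int] = field(default_factory=lambda: [0, 1, 2, 3])  # 4.
    active_threads: int = 4                          # 5. number of active threads
```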
- 1. the number of data read requests per requester—represents the number of read operations made by each requester to cache/memory;
- 2. the number of data write requests per requester—represents the number of data modification requests made by each requester to cache/memory;
- 3. the number of cache misses per requester (or cache data fetch operations)—indicates the number of times memory must be accessed to obtain the latest copy of the data; and
- 4. the number of cache invalidations per requester—indicates the number of data areas that must be written to memory.
These measurements allow the test to determine the number of memory accesses and by which requesters the memory accesses are made. If it is determined that the access pattern from a number of requesters is causing a high level of cache hits, the processing can be modified to cause requests to be made to memory instead. If it is determined that the ratio of read to write requests is not as desired, the ratio of read and write access functions, along with the designated processors for execution, can be modified to achieve the desired result.
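The read-to-write ratio check described above can be sketched as follows; the tolerance is an assumed value, and the function name is hypothetical:

```python
def suggest_adjustment(reads, writes, target_ratio, tolerance=0.05):
    """Compare the observed read/write ratio against the desired ratio
    and report which direction to adjust the access functions."""
    observed = reads / max(writes, 1)
    if abs(observed - target_ratio) <= tolerance * target_ratio:
        return "ratio on target"
    return "increase writes" if observed > target_ratio else "increase reads"
```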
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/268,238 US7200721B1 (en) | 2002-10-09 | 2002-10-09 | Verification of memory operations by multiple processors to a shared memory |
Publications (1)
Publication Number | Publication Date |
---|---|
US7200721B1 true US7200721B1 (en) | 2007-04-03 |
Family
ID=37897709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/268,238 Expired - Fee Related US7200721B1 (en) | 2002-10-09 | 2002-10-09 | Verification of memory operations by multiple processors to a shared memory |
Country Status (1)
Country | Link |
---|---|
US (1) | US7200721B1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6301616B1 (en) * | 1997-04-11 | 2001-10-09 | Microsoft Corporation | Pledge-based resource allocation system |
US6430648B1 (en) * | 2000-01-05 | 2002-08-06 | International Business Machines Corporation | Arranging address space to access multiple memory banks |
US6535905B1 (en) * | 1999-04-29 | 2003-03-18 | Intel Corporation | Method and apparatus for thread switching within a multithreaded processor |
US6889159B2 (en) * | 2002-07-22 | 2005-05-03 | Finisar Corporation | Scalable multithreaded system testing tool |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050149700A1 (en) * | 2003-12-19 | 2005-07-07 | Samra Nicholas G. | Virtual multithreading translation mechanism including retrofit capability |
US7606977B2 (en) * | 2004-07-27 | 2009-10-20 | Texas Instruments Incorporated | Context save and restore with a stack-based memory structure |
US20060026357A1 (en) * | 2004-07-27 | 2006-02-02 | Texas Instruments Incorporated | Context save and restore with a stack-based memory structure |
US7546437B2 (en) * | 2004-07-27 | 2009-06-09 | Texas Instruments Incorporated | Memory usable in cache mode or scratch pad mode to reduce the frequency of memory accesses |
US20070067337A1 (en) * | 2005-09-22 | 2007-03-22 | Morris John M | Method of managing retrieval of data objects from a storage device |
US8055970B1 (en) * | 2005-11-14 | 2011-11-08 | Raytheon Company | System and method for parallel processing of data integrity algorithms |
US20080150949A1 (en) * | 2006-12-22 | 2008-06-26 | Jian Wei | Quick pixel rendering processing |
US8207972B2 (en) * | 2006-12-22 | 2012-06-26 | Qualcomm Incorporated | Quick pixel rendering processing |
US20080168230A1 (en) * | 2007-01-05 | 2008-07-10 | Xiaowei Shen | Color-based cache monitoring |
US7895392B2 (en) * | 2007-01-05 | 2011-02-22 | International Business Machines | Color-based cache monitoring |
US8799581B2 (en) * | 2007-01-05 | 2014-08-05 | International Business Machines Corporation | Cache coherence monitoring and feedback |
US20080168239A1 (en) * | 2007-01-05 | 2008-07-10 | Ibm Corporation | Architecture support of memory access coloring |
US20080168237A1 (en) * | 2007-01-05 | 2008-07-10 | Ibm Corporation | Cache coherence monitoring and feedback |
US8671248B2 (en) | 2007-01-05 | 2014-03-11 | International Business Machines Corporation | Architecture support of memory access coloring |
US8289981B1 (en) * | 2009-04-29 | 2012-10-16 | Trend Micro Incorporated | Apparatus and method for high-performance network content processing |
US9734090B2 (en) | 2012-06-21 | 2017-08-15 | Microsoft Technology Licensing, Llc. | Partitioned reference counter |
US20140297833A1 (en) * | 2013-03-29 | 2014-10-02 | Alcatel Lucent | Systems And Methods For Self-Adaptive Distributed Systems |
US20140310484A1 (en) * | 2013-04-16 | 2014-10-16 | Nvidia Corporation | System and method for globally addressable gpu memory |
US10248479B2 (en) * | 2015-05-25 | 2019-04-02 | Fujitsu Limited | Arithmetic processing device storing diagnostic results in parallel with diagnosing, information processing apparatus and control method of arithmetic processing device |
US9928127B2 (en) | 2016-01-29 | 2018-03-27 | International Business Machines Corporation | Testing a data coherency algorithm |
US10558510B2 (en) | 2016-01-29 | 2020-02-11 | International Business Machines Corporation | Testing a data coherency algorithm |
US11099919B2 (en) | 2016-01-29 | 2021-08-24 | International Business Machines Corporation | Testing a data coherency algorithm |
US10685031B2 (en) * | 2018-03-27 | 2020-06-16 | New Relic, Inc. | Dynamic hash partitioning for large-scale database management systems |
US11144235B1 (en) | 2019-08-07 | 2021-10-12 | Xlnx, Inc. | System and method for evaluating memory system performance |
CN116820786A (en) * | 2023-08-31 | 2023-09-29 | 本原数据(北京)信息技术有限公司 | Data access method and device of database, electronic equipment and storage medium |
CN116820786B (en) * | 2023-08-31 | 2023-12-19 | 本原数据(北京)信息技术有限公司 | Data access method and device of database, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7200721B1 (en) | Verification of memory operations by multiple processors to a shared memory | |
US5778436A (en) | Predictive caching system and method based on memory access which previously followed a cache miss | |
US4779189A (en) | Peripheral subsystem initialization method and apparatus | |
US5689679A (en) | Memory system and method for selective multi-level caching using a cache level code | |
US4875155A (en) | Peripheral subsystem having read/write cache with record access | |
US4463424A (en) | Method for dynamically allocating LRU/MRU managed memory among concurrent sequential processes | |
US8245060B2 (en) | Memory object relocation for power savings | |
DE69822534T2 (en) | Shared memory usage with variable block sizes for symmetric multiprocessor groups | |
US8380933B2 (en) | Multiprocessor system including processor cores and a shared memory | |
US6738865B1 (en) | Method, system, and program for demoting data from cache based on least recently accessed and least frequently accessed data | |
US20030046493A1 (en) | Hardware updated metadata for non-volatile mass storage cache | |
EP0077453A2 (en) | Storage subsystems with arrangements for limiting data occupancy in caches thereof | |
EP0471434B1 (en) | Method and apparatus for controlling a multi-segment cache memory | |
JPH023215B2 (en) | ||
JPH0415839A (en) | Distributed data base control device | |
US7065676B1 (en) | Multi-threaded memory management test system with feedback to adjust input parameters in response to performance | |
US7032133B1 (en) | Method and system for testing a computing arrangement | |
US6950902B2 (en) | Cache memory system | |
WO2013080434A1 (en) | Dynamic process/object scoped memory affinity adjuster | |
US20040193801A1 (en) | System, apparatus, and process for evaluating projected cache sizes | |
US8700851B2 (en) | Apparatus and method for information processing enabling fast access to program | |
JP2000298618A (en) | Set associative cache memory device | |
JP2005293300A (en) | Set associative cache system and control method of cache memory | |
CN115168247A (en) | Method for dynamically sharing memory space in parallel processor and corresponding processor | |
US7308683B2 (en) | Ordering of high use program code segments using simulated annealing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: JOHNSON, CHARLES A., MINNESOTA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LANG, MICHELLE J.;YOHN, WILLIAM JUDGE;REEL/FRAME:013382/0120;SIGNING DATES FROM 20021004 TO 20021009 |
|
AS | Assignment |
Owner name: CITIBANK, N.A.,NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNORS:UNISYS CORPORATION;UNISYS HOLDING CORPORATION;REEL/FRAME:018003/0001 Effective date: 20060531 Owner name: CITIBANK, N.A., NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNORS:UNISYS CORPORATION;UNISYS HOLDING CORPORATION;REEL/FRAME:018003/0001 Effective date: 20060531 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023312/0044 Effective date: 20090601 Owner name: UNISYS HOLDING CORPORATION, DELAWARE Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023312/0044 Effective date: 20090601 Owner name: UNISYS CORPORATION,PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023312/0044 Effective date: 20090601 Owner name: UNISYS HOLDING CORPORATION,DELAWARE Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023312/0044 Effective date: 20090601 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023263/0631 Effective date: 20090601 Owner name: UNISYS HOLDING CORPORATION, DELAWARE Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023263/0631 Effective date: 20090601 Owner name: UNISYS CORPORATION,PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023263/0631 Effective date: 20090601 Owner name: UNISYS HOLDING CORPORATION,DELAWARE Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:023263/0631 Effective date: 20090601 |
|
AS | Assignment |
Owner name: DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERA Free format text: PATENT SECURITY AGREEMENT (PRIORITY LIEN);ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:023355/0001 Effective date: 20090731 |
|
AS | Assignment |
Owner name: DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERA Free format text: PATENT SECURITY AGREEMENT (JUNIOR LIEN);ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:023364/0098 Effective date: 20090731 |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
AS | Assignment |
Owner name: GENERAL ELECTRIC CAPITAL CORPORATION, AS AGENT, IL Free format text: SECURITY AGREEMENT;ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:026509/0001 Effective date: 20110623 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:DEUTSCHE BANK TRUST COMPANY;REEL/FRAME:030004/0619 Effective date: 20121127 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERAL TRUSTEE;REEL/FRAME:030082/0545 Effective date: 20121127 |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
AS | Assignment |
Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS COLLATE Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:042354/0001 Effective date: 20170417 Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS COLLATERAL TRUSTEE, NEW YORK Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:042354/0001 Effective date: 20170417 |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT, ILLINOIS Free format text: SECURITY INTEREST;ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:044144/0081 Effective date: 20171005 Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT Free format text: SECURITY INTEREST;ASSIGNOR:UNISYS CORPORATION;REEL/FRAME:044144/0081 Effective date: 20171005 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION (SUCCESSOR TO GENERAL ELECTRIC CAPITAL CORPORATION);REEL/FRAME:044416/0358 Effective date: 20171005 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20190403 |
|
AS | Assignment |
Owner name: UNISYS CORPORATION, PENNSYLVANIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION;REEL/FRAME:054231/0496 Effective date: 20200319 |