WO1993022725A1 - Synthetic perturbation tuning of computer programs - Google Patents

Synthetic perturbation tuning of computer programs

Info

Publication number
WO1993022725A1
Authority
WO
WIPO (PCT)
Prior art keywords
program
perturbation
synthetic
code
perturbations
Prior art date
Application number
PCT/US1993/004143
Other languages
French (fr)
Inventor
Gordon E. Lyon
Original Assignee
THE UNITED STATES GOVERNMENT as represented by THE SECRETARY OF COMMERCE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by THE UNITED STATES GOVERNMENT as represented by THE SECRETARY OF COMMERCE
Publication of WO1993022725A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3404 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for parallel or distributed programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3414 Workload generation, e.g. scripts, playback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing
    • G06F11/261 Functional testing by simulating additional hardware, e.g. fault simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G06F11/3423 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time where the assessed time is active or idle time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/865 Monitoring of software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/87 Monitoring of transactions

Definitions

  • the present invention relates to the field of tuning computer programs, and in particular, to a method and system for tuning computer programs to be run on parallel computer systems.
  • profiling tools have been developed to aid computer programmers in debugging and improving the efficiency of programs to be run on serial uniprocessor computer systems. These profiling tools reveal which software functions require the most execution time and which software functions are called most frequently. Once these critical software functions are identified, a programmer can focus on editing the corresponding code to improve the overall efficiency of the program by, for example, decreasing the overall run time of the sequential computer program.
  • a simple application of a sequential profiler on a multiprocessor can measure the sum of time that copies of a segment of code spent on each processor. However, the total processor time credited to a source code segment is not related in a simple way to parallel run time. Similarly, a sequential profiler can identify the number of times a segment of code is entered during program performance. However, as with total execution time, it is not easy to relate use count information to run time effect in the parallel domain. Interpreting use count information can be difficult because non-productive wait states can conceal true contributions to performance.
  • a simple profile of a parallel program by a sequential profiler may not offer a clear plan of attack. Alternatively, it could encourage an incorrect interpretation of results. If the results from a profile generated by a sequential profiler are interpreted incorrectly, or if a profile is not available, a programmer can waste time and effort improving code that has little impact on overall performance.
  • the present invention is a method and a system for tuning computer programs that run on parallel computer systems by using synthetic perturbations.
  • This invention involves placing synthetic perturbations into selected locations of the code of a computer program to be tuned.
  • the selected locations correspond to various identifiable segments of the program, e.g., subroutines called by the main program, or data paths to variables and constants.
  • the synthetic perturbations may typically be time delays. The impact of these time delays on the overall performance of the computer program, as quantified by its run time, is then determined.
  • time-delay values are selected for the perturbations.
  • some perturbations have a finite non-zero time delay selected, while the rest have zero time delay.
  • the computer program is then executed and its run time measured. By running different trials with different values selected for the time delays, a set of resulting run times is generated.
  • Fig. 1 is a flow chart representation of the method of synthetic perturbation tuning of computer programs of the present invention.
  • Referring to Fig. 1, there is shown a flow chart representation of synthetic perturbation computer program tuning method 4 of the present invention.
  • Execution of tuning method 4 begins at node 8 and proceeds to location identification box 12, wherein the locations for synthetic perturbation within the code of the program to be tuned are identified.
  • Execution of tuning method 4 continues to value selection box 16, wherein the sets of values for the synthetic perturbations are selected.
  • In perturbation insertion box 20, the synthetic perturbations are inserted into the code of the computer program to be tuned at the locations identified in location identification box 12.
  • the loadable versions of the program are generated in generation box 24 of tuning method 4.
  • Execution of synthetic perturbation computer program tuning method 4 then proceeds to execution box 28, wherein the various program versions are run on the computer.
  • In quantification box 32, the performances of these computer runs are quantified by measuring program responses.
  • In analysis box 36, the measured responses from quantification box 32 are analyzed to determine how the various sets of perturbation values affected overall program performance. This analysis is used to identify in segment identification box 40 those segments of program code that are critical to program performance. Using the results of segment identification box 40, the code of these critical segments can be improved to improve overall program performance in code improvement box 44.
  • Tuning method 4 concludes at node 48.
  • The purpose of synthetic perturbation tuning method 4 of the present invention is to identify those segments of code within a computer program that are critical to the efficient performance of that program.
  • Synthetic perturbation tuning method 4 begins with the user identifying locations within the code of the program for insertion of synthetic perturbations as shown in location identification box 12. This location identification requires the user to determine test parts of the program code that correspond to the various in-line code segments, references to storage (e.g., variables), routines, subroutines, and other functions called or entered throughout the program. A synthetic perturbation is to be inserted by the user into each of these selected locations.
  • each synthetic perturbation may have a perturbation function that is an adjustable time delay function.
  • the time delay function used can be any computer operation that utilizes run time while not affecting the computational integrity of the program itself.
  • the time delay function may be adjustable in that it may accept an input parameter representative of the magnitude of the delay to be implemented. For example, one possible time delay function would involve a sequence of repeated computer multiplications as dictated by the input parameter. Thus, if the input parameter to the time delay function is seven, the computer performs seven identical multiplications in sequence. If, on the other hand, the input parameter is zero, no multiplications are performed.
  • each program version may correspond to a perturbation set in which only one of the perturbations has a non-zero input parameter. In such case, different program versions have different perturbations as their non-zero perturbation. Running each of these program versions may then be used to identify critical code segments, by analyzing the effects of each individual perturbation.
  • Design-of-experiments is a conventional statistical technique that applies macro modelling for system simplification, prediction, comparison, or optimization.
  • the design-of-experiments approach treats the system to be optimized as a "black box" with controllable input parameters and measurable output features.
  • Design-of-experiments suggests combinations of input parameters to be applied to the black box. These combinations are designed to isolate the effects of both individual input parameters and combinations of input parameters.
  • the resulting measured output features are then analyzed by the user applying empirical analysis methods, e.g., conventional multiple linear regression techniques, to characterize the effects that these individual input parameters and combinations of input parameters had on system performance.
  • the black box of the design-of-experiments approach is the computer program to be tuned, data for this program, the program perturbations, and the host system.
  • the input parameters are the synthetic perturbations, and each different combination of input parameters is a different perturbation set.
  • a typical measurable output feature is program run time.
  • Design-of-experiments suggests the sets of synthetic perturbation values to be selected by the user for the various program versions as shown in set selection box 16. Typically, in each perturbation set, some of the perturbations have the same non-zero time delay value selected as input parameters, while the rest have an input of zero, implying that no delay is applied at those locations.
  • the design-of-experiments approach may suggest perturbation sets that are geometrically balanced to explore the input space, thereby limiting the number of perturbation sets employed to a reasonable and manageable level.
  • the synthetic perturbations inserted in perturbation insertion box 20 may be inserted by the user into the source code of the computer program to be tuned or into its machine loadable code.
  • the user may edit the source code by inserting the time delay function invocations into the code at the various locations and with the input parameters as selected by the user. If the delay function invocation replaces a variable reference in source code, the variable reference may itself become an actual parameter to the delay function, which will delay according to one argument, and return the value of the variable according to another argument. The user may then compile the program source code into its machine loadable version.
  • In tuning method 4 of the present invention, the user may instruct the compiler to ignore the time delay function invocation when compiling the source code for those locations having a perturbation input parameter of zero.
  • the user may implement the perturbation with a jump out of the loadable code at the selected location to a sequence of steps comprising the program instruction line replaced by the jump out, a time delay function, and a jump back to the loadable code.
  • the user may be required to make no change to the corresponding segments of loadable code.
  • Whether starting with source code or loadable code, the user generates a unique loadable version of the program in generation box 24 for each unique synthetic perturbation set.
  • the user executes the various program versions on the computer in random order, such that each version is eventually run more than one time as shown in execution box 28.
  • Each program version is run a plurality of times in order to characterize and account for the inherent variability of program performance on parallel computer systems. The user may select a different random sequence for executing the various program versions for each repetition.
  • Each time a program version is run, the user quantifies its performance as shown in quantification box 32. For example, the run time of each version execution may be measured.
  • Certain perturbations or certain combinations of perturbations may result in larger effects to the overall run time of the computer program than other perturbations or other combinations thereof.
  • Some perturbation sets may have negligible impact on program run time, while others may have a relatively large impact on run time.
  • Some combinations of time delays may result in increased program run time and others may result in decreased program run time.
  • the results from the various computer runs are then analyzed by the user to identify and characterize the effects the various perturbation sets had on overall program performance as shown in analysis box 36.
  • the user may perform multiple linear regression analysis on the run time measurements to determine the effects of the various time delays on overall program run time. This analysis is typically performed using the average run time for the plurality of runs of each program version as the run time for that program version. This in turn may identify the segments of program code that are most critical to program performance as shown in segment identification box 40 of tuning method 4. These identified code segments can then be altered by the user to improve the performance of the overall computer program as shown in code improvement box 44.
  • improving the performance of the program may involve different activities depending on the needs of the programmer.
  • the user may edit or rewrite the identified critical code segments to improve their efficiency and thereby improve the efficiency of the program as a whole.
  • Analysis of perturbation impacts also identifies those segments of code that can expand with only minor performance penalties. This information may be useful in implementing new substantive program requirements as they arise.
  • the user may apply the results from the various computer runs to identify a particular perturbation set to become a permanent feature of the final computer program.
  • the selected permanent perturbation set may be one of the perturbation sets previously tested or it may be a different one.
  • Synthetic perturbation tuning method 4 of the present invention may be implemented iteratively.
  • the user implements tuning method 4 to identify and then improve the most critical segment of program code.
  • a second implementation of the tuning method 4 may then be performed by the user to identify and improve the next most critical segment of program code. This sequence of analysis and improvement may be repeated until the user determines that program performance has reached acceptable levels.
  • synthetic perturbation tuning method 4 of the present invention provides tuning of programs for serial computer systems as well. Moreover, synthetic perturbation tuning method 4 may be used to tune computer programs for both multiple-instruction, multiple-data and single-instruction, multiple-data parallel computer systems. In addition, tuning method 4 of the present invention may be applied to computer systems with either shared-memory or distributed-memory architecture.
  • quantification box 32 of tuning method 4 may include measuring the memory space used during execution of each program version.
  • the goal of synthetic perturbation computer program tuning method 4 of the present invention may be to improve the performance of the program to be tuned in terms of such memory space usage. Analysis of these memory space usage measurements may then be used to characterize how and to what degree the various segments of program code influence memory space during program execution. In this embodiment of tuning method 4, the critical code segments may then be improved to reduce the amount of memory space used during execution of the program.
  • Synthetic perturbation tuning method 4 of the present invention may also be used to improve program performance by increasing transaction throughput, where transaction throughput is understood to be the number of complete transactions the program can perform in a fixed period of time.
  • execution of the various program versions is quantified in quantification box 32 by measuring transaction throughput. Analysis may then characterize the effects of different segments of program code on the overall transaction throughput of the program to be tuned. The critical code segments may then be improved to increase transaction throughput.
  • Synthetic perturbation tuning method 4 may be implemented by quantifying several different types of response at the same time in quantification box 32. In these embodiments, any combination of program run time, memory space usage, and transaction throughput may be measured for each program execution of execution box 28. In such cases, several distinct tuning evaluations may be performed jointly.
  • the present invention may be implemented by synthetic perturbation tuning system 4 that accepts the computer program to be tuned as input.
  • the user of such a synthetic perturbation tuning system of the present invention may be given the option of manually selecting the locations for insertion of synthetic perturbations or allowing the system to perform that selection process automatically.
  • the automatic selection process may select locations for synthetic perturbations according to the basic structure of the program to be tuned in terms of in-line code segments, data references, routines, subroutines, and function calls.
  • the operator may also be able to combine the automatic and manual selection features to delete or add manually to the list of selections made automatically.
  • Synthetic perturbation tuning system 4 of the present invention may also automatically select the various perturbation sets to be applied to the computer program to be tuned. Tuning system 4 of the present invention may then automatically generate, execute, and quantify the performance of the various program versions corresponding to those selected perturbation sets. Tuning system 4 may then analyze the measured responses to identify the critical segments of program code. A particular perturbation set to be inserted as a permanent feature of the program may also be identified by tuning system 4.
  • Synthetic perturbation tuning method 4 identifies critical segments of program code, that is, those segments of code with greatest impact upon performance. Conventional sequential profiling tools do not achieve this basic goal of tuning computer programs to be run on parallel processing systems. Synthetic perturbation tuning as performed by method 4 also avoids the time and effort of tedious segment-by-segment recoding or algorithm revamping otherwise required for set-up when tuning programs for parallel environments.
  • Another advantage of tuning method 4 is that the user need not have more than a slight understanding of the computer program being tuned in order to apply tuning method 4. At most, the user needs to understand the basic structure of the program in terms of general useful main program locations, routines, and calls to subroutines and functions to select perturbation locations. However, the user is not required to have any understanding of the actual substantive processing performed by the program to get a tuning recommendation from tuning method 4. A deeper understanding may assist the user in improving the program as in improvement box 44, but such deeper understanding is not always necessary.
  • Still another advantage of synthetic perturbation tuning method 4 is that it maintains the computational integrity of the program being tuned. None of the changes made to the code during the diagnostic steps of tuning method 4, that is, from location identification box 12 through segment identification box 40, have any effect on the substance of the program.
  • a synthetic perturbation may involve either a simple delay or a utilization of buffer space, neither of which has any impact on the numerical computations of a well-written program.
  • tuning method 4 of the present invention may be used to quantify, as well as qualify, program performance in terms of the impacts of adding or subtracting different amounts of substantive code from the selected code segments. Since the duration of a particular time delay is controlled by the perturbation input parameter, input parameters may be selected to simulate the performance of particular sequences of substantive code instructions. In this way, the impact of a particular substantive change upon program performance can be quantified and evaluated without actually editing the substantive code. Alternatively, such use of input parameters may provide insight into the effects of subtracting different amounts of substantive code without actually making those substantive changes.
  • synthetic perturbation tuning method 4 avoids the dependency upon metrics or upon detailed global state information inherent in profiling tools of the prior art, such as that taught by Anderson and Lazowska.
  • synthetic perturbation tuning as performed by tuning method 4 avoids many of the problems of interpretation and of information capture of those prior art approaches.
  • synthetic perturbation tuning method 4 is direct. If a perturbation has an effect, tuning method 4 identifies the effect. Consequently, synthetic perturbation tuning method 4 is fairly independent of architectural aspects of a system. It focuses largely upon perturbation types, perturbation sets, and program performance responses. Structural interpretations within the program to be tuned or the system upon which the program is executed matter less.
  • tuning method 4 of the present invention addresses the interdependencies that predefined tuning metrics can easily miss. While many metrics identify obvious code segments for improvement, not all improvements are equally important. In some performance states, small perturbations matter little. Yet, the same time delay in another state may induce a change in run time that cannot be explained by interpreting a level-of-parallelism metric. Synthetic perturbation tuning method 4 addresses these interprocess communication effects by focusing upon overall response, for example, program run time.
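The adjustable time delay function described in the definitions above (a run of identical multiplications governed by an input parameter, plus a variant that wraps a variable reference and returns its value) might be sketched as follows. The names `delay` and `delayed_ref` are hypothetical and do not come from the patent; this is only an illustration of the idea under those assumptions.

```python
def delay(count):
    """Use run time by performing `count` identical multiplications.

    Hypothetical helper: returns how many multiplications were performed,
    so a count of zero does no work at all.
    """
    x = 1.0000001
    performed = 0
    for _ in range(count):
        _ = x * x  # result is discarded, so program state is untouched
        performed += 1
    return performed


def delayed_ref(count, value):
    """Delay according to `count`, then return `value` unchanged.

    Hypothetical wrapper standing in for a perturbed variable reference.
    """
    delay(count)
    return value


# A perturbed reference to `item` inside an otherwise unchanged loop.
total = 0
for item in [1, 2, 3]:
    total += delayed_ref(7, item)
```

Because the wrapper hands back its second argument untouched, the sum computed by the loop is the same as it would be without the perturbation; only run time changes.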

Abstract

The present invention is a method and a system for tuning computer programs that run on parallel computer systems by using synthetic perturbations. This invention involves placing synthetic perturbations (20), e.g., time delays, into selected locations (12) of the code of a computer program to be tuned. The impact of these time delays (16) on the overall performance of the computer program, as quantified by its run time, is then determined. By running different trials with different values selected (16) for the time delays, a set of resulting run times (32) is generated. Statistical analysis (36) is then performed on these results to identify the segments of program code that are most critical (40) in terms of affecting the performance of the program. The user can then optimize those code segments (44).

Description

SYNTHETIC PERTURBATION TUNING OF COMPUTER PROGRAMS
BACKGROUND OF THE INVENTION
1) Field of the Invention.
The present invention relates to the field of tuning computer programs, and in particular, to a method and system for tuning computer programs to be run on parallel computer systems.
2) Background Art.
Over the years, numerous profiling tools have been developed to aid computer programmers in debugging and improving the efficiency of programs to be run on serial uniprocessor computer systems. These profiling tools reveal which software functions require the most execution time and which software functions are called most frequently. Once these critical software functions are identified, a programmer can focus on editing the corresponding code to improve the overall efficiency of the program by, for example, decreasing the overall run time of the sequential computer program.
Programmers working on parallel architecture computer systems have a similar need for tools helpful in analyzing and improving the performance of their programs. However, the conventional profiling tools developed for serial architecture computers do not solve the problems particular to running programs on parallel processing systems. Considerations such as process interaction, idle times, and busy-waiting play an important role in the performance of parallel programs. These considerations usually do not exist in the same combinations in sequential programs. The asynchronous interdependent nature of events on parallel computer systems makes efficient tuning particularly difficult.
A simple application of a sequential profiler on a multiprocessor can measure the sum of time that copies of a segment of code spent on each processor. However, the total processor time credited to a source code segment is not related in a simple way to parallel run time. Similarly, a sequential profiler can identify the number of times a segment of code is entered during program performance. However, as with total execution time, it is not easy to relate use count information to run time effect in the parallel domain. Interpreting use count information can be difficult because non-productive wait states can conceal true contributions to performance. A simple profile of a parallel program by a sequential profiler may not offer a clear plan of attack. Alternatively, it could encourage an incorrect interpretation of results. If the results from a profile generated by a sequential profiler are interpreted incorrectly, or if a profile is not available, a programmer can waste time and effort improving code that has little impact on overall performance.
One tuning tool for shared-memory systems is described by T.E. Anderson and E.D. Lazowska in "Quartz: A Tool for Tuning Parallel Program Performance," Proc. SIGMETRICS 1990 Conference, May 1990, pp. 115-125. The method taught by Anderson and Lazowska exemplifies the state-based approach of using metrics to quantify program performance. The principal metric of their method is "normalized processor time," which is the total time spent in each code segment divided by the level of parallelism during that time. This metric is essentially a local state occupancy time divided by a global state, viz., level of parallelism. Its effect is to draw attention to segments of code that have either large amounts of time or low levels of global parallelism. However, systems such as that taught by Anderson and Lazowska have problems of interpretation and of information capture.
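As a rough illustration of the metric described above, the sketch below computes a normalized time per code segment from a list of intervals, dividing each interval's duration by the level of parallelism during that interval. The interval format, the segment names, and the numbers are invented for illustration; this is a simplified reading of the description, not the Quartz implementation.

```python
from collections import defaultdict


def normalized_processor_time(intervals):
    """Sum duration / parallelism per segment.

    `intervals` is a hypothetical list of
    (segment_name, seconds, level_of_parallelism) tuples.
    """
    totals = defaultdict(float)
    for segment, seconds, parallelism in intervals:
        totals[segment] += seconds / parallelism
    return dict(totals)


# Invented sample data: "solve" accumulates the most raw time, but mostly
# while four processors are busy, so its normalized score is modest.
intervals = [
    ("setup", 2.0, 1),
    ("solve", 8.0, 4),
    ("solve", 2.0, 2),
    ("reduce", 3.0, 1),
]
scores = normalized_processor_time(intervals)
```

In this made-up data, the short serial `reduce` segment scores as high as `solve`, which is how the metric draws attention to segments that run at low parallelism.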
SUMMARY OF THE INVENTION
The present invention is a method and a system for tuning computer programs that run on parallel computer systems by using synthetic perturbations. This invention involves placing synthetic perturbations into selected locations of the code of a computer program to be tuned. The selected locations correspond to various identifiable segments of the program, e.g., subroutines called by the main program, or data paths to variables and constants. When the goal of tuning is to decrease the run time of the program, the synthetic perturbations may typically be time delays. The impact of these time delays on the overall performance of the computer program, as quantified by its run time, is then determined.
In each trial, time-delay values are selected for the perturbations. Typically, some perturbations have a finite non-zero time delay selected, while the rest have zero time delay. Using the selected time delays, the computer program is then executed and its run time measured. By running different trials with different values selected for the time delays, a set of resulting run times is generated.
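One way to picture the trials is as a list of perturbation sets, each assigning either zero or a common non-zero delay to every selected location. The sketch below enumerates a full two-level factorial design, which is one conventional choice and an assumption here; the text only requires that some delays be non-zero and the rest zero. The location names and the delay value are hypothetical.

```python
from itertools import product


def perturbation_sets(locations, delay_value):
    """Return every on/off combination of `delay_value` over `locations`."""
    sets = []
    for levels in product((0, delay_value), repeat=len(locations)):
        sets.append(dict(zip(locations, levels)))
    return sets


# Hypothetical perturbation locations in the program to be tuned.
locations = ["read_input", "exchange_halo", "update_cells"]
trials = perturbation_sets(locations, delay_value=1000)
```

With three locations this yields eight trials, from the all-zero baseline to the set with every delay switched on. A fractional design would use fewer trials when the number of locations grows.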
Statistical analysis is then performed on these results to correlate the various perturbation time delays with program run times. This analysis identifies those time delays that have the greatest effect on program run time. This in turn identifies the segments of program code that are most critical in terms of affecting the performance of the program. The user can then optimize those code segments.
It is believed that if a segment of code is highly sensitive to synthetic perturbation, then overall performance of the program may be most effectively improved by improving that code segment. For example, if adding a time delay to a certain subroutine results in a large increase in program run time, then concentrating program editing efforts on that subroutine may be an effective way to decrease overall program run time.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a flow chart representation of the method of synthetic perturbation tuning of computer programs of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Referring to Fig. 1, there is shown a flow chart representation of synthetic perturbation computer program tuning method 4 of the present invention. Execution of tuning method 4 begins at node 8 and proceeds to location identification box 12, wherein the locations for synthetic perturbation within the code of the program to be tuned are identified. Execution of tuning method 4 continues to value selection box 16, wherein the sets of values for the synthetic perturbations are selected. Next, in perturbation insertion box 20, the synthetic perturbations are inserted into the code of the computer program to be tuned at the locations identified in location identification box 12. After insertion of the synthetic perturbations, the loadable versions of the program are generated in generation box 24 of tuning method 4.
Execution of synthetic perturbation computer program tuning method 4 then proceeds to execution box 28, wherein the various program versions are run on the computer. In quantification box 32, the performances of these computer runs are quantified by measuring program responses. Next, in analysis box 36, the measured responses from quantification box 32 are analyzed to determine how the various sets of perturbation values affected overall program performance. This analysis is used to identify in segment identification box 40 those segments of program code that are critical to program performance. Using the results of segment identification box 40, the code of these critical segments can be improved to improve overall program performance in code improvement box 44. Tuning method 4 concludes at node 48.
The goal of synthetic perturbation tuning method 4 of the present invention is to identify those segments of code within a computer program that are critical to the efficient performance of that program. Synthetic perturbation tuning method 4 begins with the user identifying locations within the code of the program for insertion of synthetic perturbations as shown in location identification box 12. This location identification requires the user to determine test parts of the program code that correspond to the various in-line code segments, references to storage (e.g., variables), routines, subroutines, and other functions called or entered throughout the program. A synthetic perturbation is to be inserted by the user into each of these selected locations.
In synthetic perturbation tuning method 4 of the present invention, each synthetic perturbation may have a perturbation function that is an adjustable time delay function. The time delay function used can be any computer operation that utilizes run time while not affecting the computational integrity of the program itself. The time delay function may be adjustable in that it may accept an input parameter representative of the magnitude of the delay to be implemented. For example, one possible time delay function would involve a sequence of repeated computer multiplications as dictated by the input parameter. Thus, if the input parameter to the time delay function is seven, the computer performs seven identical multiplications in sequence. If, on the other hand, the input parameter is zero, no multiplications are performed.
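The adjustable time delay function described above may be sketched as follows. This is a minimal illustration only; the function name `synthetic_delay` and the operand value are assumptions, not part of the original disclosure. The function returns the number of multiplications performed so that its behavior can be checked.

```python
def synthetic_delay(magnitude: int) -> int:
    """Consume run time with `magnitude` identical multiplications.

    Hypothetical helper: the name and the operand are illustrative.
    A magnitude of zero performs no multiplications at all.
    """
    operand = 1.000001
    performed = 0
    for _ in range(magnitude):
        _unused = operand * operand  # result is discarded; only the time matters
        performed += 1
    return performed


# An input parameter of seven yields seven multiplications; zero yields none.
print(synthetic_delay(7), synthetic_delay(0))
```

In an interpreted language the loop overhead may dominate the cost of the multiplication itself, so the delay per unit of `magnitude` would need to be calibrated on the host system.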
In order to identify those segments of code that are critical to program performance, the execution or running of the program is repeated using different perturbation sets. A perturbation set corresponds to a particular set of values selected for the input parameters of the synthetic perturbations. Each different perturbation set corresponds to a different version of the program. In tuning method 4 of the present invention, each program version may correspond to a perturbation set in which only one of the perturbations has a non-zero input parameter. In such case, different program versions have different perturbations as their non-zero perturbation. Running each of these program versions may then be used to identify critical code segments, by analyzing the effects of each individual perturbation.
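The case in which each program version has exactly one non-zero perturbation may be sketched as follows. The location names and the delay value are hypothetical and serve only to illustrate how such perturbation sets could be enumerated.

```python
def one_at_a_time_sets(locations, delay):
    """Build one perturbation set per location.

    In each set, exactly one location receives `delay` and every other
    location receives zero. Names and values are illustrative assumptions.
    """
    sets = []
    for active in locations:
        sets.append({loc: (delay if loc == active else 0) for loc in locations})
    return sets


# Three hypothetical perturbation locations give three program versions.
for perturbation_set in one_at_a_time_sets(["init", "solve", "reduce"], 50):
    print(perturbation_set)
```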
In tuning method 4 of the present invention, the user may rely on the design-of-experiments approach in selecting perturbation sets. Design-of-experiments is a conventional statistical technique that applies macro modelling for system simplification, prediction, comparison, or optimization. The design-of-experiments approach treats the system to be optimized as a "black box" with controllable input parameters and measurable output features. Design-of-experiments suggests combinations of input parameters to be applied to the black box. These combinations are designed to isolate the effects of both individual input parameters and combinations of input parameters. The resulting measured output features are then analyzed by the user applying empirical analysis methods, e.g., conventional multiple linear regression techniques, to characterize the effects that these individual input parameters and combinations of input parameters had on system performance.
When design-of-experiments is applied in tuning method 4 of the present invention, the black box of the design-of-experiments approach is the computer program to be tuned, data for this program, the program perturbations, and the host system. The input parameters are the synthetic perturbations, and each different combination of input parameters is a different perturbation set. A typical measurable output feature is program run time. Design-of-experiments suggests the sets of synthetic perturbation values to be selected by the user for the various program versions as shown in value selection box 16. Typically, in each perturbation set, some of the perturbations have the same non-zero time delay value selected as input parameters, while the rest have an input of zero, implying that no delay is applied at those locations. Although the number of different possible perturbation sets is infinite, the design-of-experiments approach may suggest perturbation sets that are geometrically balanced to explore the input space, thereby limiting the number of perturbation sets employed to a reasonable and manageable level.
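One balanced family of perturbation sets is the two-level full factorial design, in which every location takes either zero or a single common non-zero delay. The sketch below assumes that design purely for illustration; the description above does not prescribe a particular design, and the names used are hypothetical.

```python
from itertools import product


def two_level_full_factorial(locations, delay):
    """Enumerate every combination of {0, delay} over the given locations.

    Assumed design for illustration: with k locations this yields 2**k
    perturbation sets, and each location is at `delay` in exactly half of them.
    """
    return [
        dict(zip(locations, levels))
        for levels in product((0, delay), repeat=len(locations))
    ]


design = two_level_full_factorial(["a", "b", "c"], 20)
print(len(design))  # 8 perturbation sets for 3 locations
```

Because the number of sets doubles with each added location, a fractional factorial design, which runs only a balanced subset of these combinations, may be preferred when many locations are perturbed.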
The synthetic perturbations inserted in perturbation insertion box 20 may be inserted by the user into the source code of the computer program to be tuned or into its machine loadable code. When source code is used, the user may edit the source code by inserting the time delay function invocations into the code at the various locations and with the input parameters as selected by the user. If the delay function invocation replaces a variable reference in source code, the variable reference may itself become an actual parameter to the delay function, which will delay according to one argument, and return the value of the variable according to another argument. The user may then compile the program source code into its machine loadable version. In tuning method 4 of the present invention, the user may instruct the compiler to ignore the time delay function invocation when compiling the source code for those locations having a perturbation input parameter of zero.
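The replacement of a variable reference by a delay function invocation may be sketched as follows. The function `delayed_ref` and the constant `DELAY_WEIGHT` are hypothetical names introduced for illustration; the function delays according to its first argument and returns the value of the variable passed as its second argument.

```python
def delayed_ref(magnitude, value):
    """Delay by `magnitude` multiplications, then return `value` unchanged.

    Hypothetical wrapper: it stands in for a plain variable reference so that
    the surrounding expression computes exactly the same result.
    """
    operand = 1.000001
    for _ in range(magnitude):
        _unused = operand * operand  # time is consumed, result is discarded
    return value


DELAY_WEIGHT = 100  # illustrative perturbation input parameter
weight = 2.5
total = 10.0

# Original statement:   total = total + weight
# Perturbed statement:
total = total + delayed_ref(DELAY_WEIGHT, weight)
print(total)
```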
When the synthetic perturbations are inserted by the user into the program after its loadable code is generated, the user may implement the perturbation with a jump out of the loadable code at the selected location to a sequence of steps comprising the program instruction line replaced by the jump out, a time delay function, and a jump back to the loadable code. For those perturbations with an input parameter of zero, the user may be required to make no change to the corresponding segments of loadable code.
Whether starting with source code or loadable code, the user generates a unique loadable version of the program in generation box 24 for each unique synthetic perturbation set. Typically, the user executes the various program versions on the computer in random order, such that each version is eventually run more than one time as shown in execution box 28. Each program version is run a plurality of times in order to characterize and account for the inherent variability of program performance on parallel computer systems. The user may select a different random sequence for executing the various program versions for each repetition. Each time a computer version is run, the user quantifies its performance as shown in quantification box 32. For example, the run time of each version execution may be measured. Certain perturbations or certain combinations of perturbations may result in larger effects to the overall run time of the computer program than other perturbations or other combinations thereof. Some perturbation sets may have negligible impact on program run time, while others may have a relatively large impact on run time. Some combinations of time delays may result in increased program run time and others may result in decreased program run time.
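The repeated execution of the program versions in random order may be sketched as follows. The function names, the use of a seed, and the representation of each program version as a callable are assumptions made for illustration.

```python
import random
import time


def randomized_schedule(versions, repetitions, seed=None):
    """Return an execution order in which every version runs once per
    repetition, with a fresh random order drawn for each repetition."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(repetitions):
        order = list(versions)
        rng.shuffle(order)
        schedule.extend(order)
    return schedule


def run_trials(programs, repetitions, seed=None):
    """Run each program version `repetitions` times and record run times.

    `programs` maps a hypothetical version name to a zero-argument callable
    that executes that version.
    """
    run_times = {name: [] for name in programs}
    for name in randomized_schedule(list(programs), repetitions, seed):
        start = time.perf_counter()
        programs[name]()
        run_times[name].append(time.perf_counter() - start)
    return run_times
```

Drawing a new order for each repetition spreads any drift in the state of the host system across all versions, rather than concentrating it on whichever version happens to run last.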
This non-intuitive result of decreased run time is due to the communication conflicts that can arise in parallel processing systems where multiple processors are competing for limited resources. A decreased run time results when the applied synthetic perturbations lessen the degree of communication conflict as compared to that in the original program.
The results from the various computer runs are then analyzed by the user to identify and characterize the effects the various perturbation sets had on overall program performance as shown in analysis box 36. In tuning method 4 of the present invention, the user may perform multiple linear regression analysis on the run time measurements to determine the effects of the various time delays on overall program run time. This analysis is typically performed using the average run time for the plurality of runs of each program version as the run time for that program version. This in turn may identify the segments of program code that are most critical to program performance as shown in segment identification box 40 of tuning method 4. These identified code segments can then be altered by the user to improve the performance of the overall computer program as shown in code improvement box 44.
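The multiple linear regression step may be sketched as an ordinary least-squares fit of average run time against the delay applied at each location. The sketch below assumes a purely additive model with no interaction terms and a non-singular design; the function name and the sample figures are hypothetical and do not represent measured results.

```python
def fit_linear_model(delays, run_times):
    """Least-squares fit of run_time = b0 + b1*d1 + ... + bk*dk.

    `delays` holds one row of delay values per program version and
    `run_times` holds the average run time of each version.
    Returns [b0, b1, ..., bk]. Assumes the design matrix has full rank.
    """
    rows = [[1.0] + [float(d) for d in row] for row in delays]
    n = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, run_times)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


# Illustrative, made-up data for two perturbation locations.
design = [[0, 0], [0, 10], [10, 0], [10, 10]]
averages = [100.0, 101.0, 120.0, 121.0]
print(fit_linear_model(design, averages))
```

A larger fitted coefficient indicates that run time is more sensitive to delay at that location, which marks the corresponding code segment as a candidate for improvement.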
Many systems may be analyzed using multinomial macro-models, wherein a system is treated as a "black box" with known input parameters and measurable output responses. When the underlying system is too complex to be approximated by such a multinomial macro-model, but synthetic perturbations clearly have beneficial effects, the design-of-experiments approach may change emphasis to an informal optimization. This may occur in a parallel system that is very unpredictable in its responses to changes in perturbations. In such case, input settings are still geometrically balanced to explore the input space, but, based upon observed responses, the program perturbation set that corresponds to the best observed performance is selected. Since this selection is made directly from measured responses, no deeper analysis model is built or used.
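The informal optimization described above, in which the best observed perturbation set is selected directly from measured responses, may be sketched as follows. The function name and the sample figures are hypothetical, and lower mean run time is assumed to be the measure of best performance.

```python
def best_observed_set(results):
    """Pick the perturbation set with the lowest mean measured run time.

    `results` is a list of (perturbation_set, run_times) pairs. No model is
    fitted; the choice is made directly from the observations.
    """
    best_set, _ = min(results, key=lambda item: sum(item[1]) / len(item[1]))
    return best_set


# Illustrative, made-up observations for three perturbation sets.
observations = [
    ({"a": 0, "b": 0}, [10.0, 10.2]),
    ({"a": 5, "b": 0}, [9.1, 9.3]),
    ({"a": 0, "b": 5}, [10.5, 10.4]),
]
print(best_observed_set(observations))
```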
Thus, improving the performance of the program may involve different activities depending on the needs of the programmer. Where large positive undesired effects on program run time are detected, the user may edit or rewrite the identified critical code segments to improve their efficiency and thereby improve the efficiency of the program as a whole. Analysis of perturbation impacts also identifies those segments of code that can expand with only minor performance penalties. This information may be useful in implementing new substantive program requirements as they arise.
Alternatively or in addition, where certain perturbation sets result in desired responses, e.g., decreased program run time, the user may apply the results from the various computer runs to identify a particular perturbation set to become a permanent feature of the final computer program. The selected permanent perturbation set may be one of the perturbation sets previously tested or it may be a different one.
Synthetic perturbation tuning method 4 of the present invention may be implemented iteratively. In such case, the user implements tuning method 4 to identify and then improve the most critical segment of program code. A second implementation of the tuning method 4 may then be performed by the user to identify and improve the next most critical segment of program code. This sequence of analysis and improvement may be repeated until the user determines that program performance has reached acceptable levels.
Although developed to handle the problems particular to tuning computer programs on parallel computer systems, synthetic perturbation tuning method 4 of the present invention provides tuning of programs for serial computer systems as well. Moreover, synthetic perturbation tuning method 4 may be used to tune computer programs for both multiple-instruction, multiple- data and single-instruction, multiple-data parallel computer systems. In addition, tuning method 4 of the present invention may be applied to computer systems with either shared-memory or distributed-memory architecture.
As an alternative to measuring the run time of each program version, quantification box 32 of tuning method 4 may include measuring the memory space used during execution of each program version. In such case, the goal of synthetic perturbation computer program tuning method 4 of the present invention may be to improve the performance of the program to be tuned in terms of such memory space usage. Analysis of these memory space usage measurements may then be used to characterize how and to what degree the various segments of program code influence memory space during program execution. In this embodiment of tuning method 4, the critical code segments may then be improved to reduce the amount of memory space used during execution of the program.
Synthetic perturbation tuning method 4 of the present invention may also be used to improve program performance by increasing transaction throughput, where transaction throughput is understood to be the number of complete transactions the program can perform in a fixed period of time. In this embodiment of tuning method 4, execution of the various program versions is quantified in quantification box 32 by measuring transaction throughput. Analysis may then characterize the effects of different segments of program code on the overall transaction throughput of the program to be tuned. The critical code segments may then be improved to increase transaction throughput.
Synthetic perturbation tuning method 4 may be implemented by quantifying several different types of response at the same time in quantification box 32. In these embodiments, any combination of program run time, memory space usage, and transaction throughput may be measured for each program execution of execution box 28. In such cases, several distinct tuning evaluations may be performed jointly.
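Quantifying several responses in a single execution may be sketched as follows. The function `quantify`, its result keys, and the caller-supplied transaction count are assumptions for illustration; peak memory is taken from Python's `tracemalloc` module, and throughput is approximated as transactions completed per second of run time.

```python
import time
import tracemalloc


def quantify(run_version, transactions):
    """Measure run time, peak traced memory, and throughput of one execution.

    `run_version` is a zero-argument callable that executes a program version;
    `transactions` is the (hypothetical) number of transactions it completes.
    """
    tracemalloc.start()
    start = time.perf_counter()
    run_version()
    elapsed = time.perf_counter() - start
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    throughput = transactions / elapsed if elapsed > 0 else float("inf")
    return {
        "run_time_s": elapsed,
        "peak_bytes": peak,
        "throughput_per_s": throughput,
    }
```

Memory tracing itself adds run-time overhead, so where precise run times are needed the responses may be better measured in separate executions.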
It will be understood by those skilled in the art that synthetic perturbations other than time delays may be employed in tuning method 4 of the present invention. For example, requests for temporary space buffers of adjustable size may be used as perturbations to improve the efficiency of computer program performance, where efficiency may be measured in terms of such parameters as run time, consumed space, and transaction throughput.
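A space perturbation of adjustable size may be sketched as a temporary buffer that is held for the duration of a code segment and then released. The name `synthetic_buffer` and the use of a context manager are illustrative assumptions.

```python
from contextlib import contextmanager


@contextmanager
def synthetic_buffer(size_bytes):
    """Hold a temporary buffer of `size_bytes` while the enclosed code runs.

    A size of zero requests no buffer, which corresponds to an unperturbed
    location. The buffer is never read by the program being tuned.
    """
    buffer = bytearray(size_bytes) if size_bytes > 0 else None
    try:
        yield buffer
    finally:
        del buffer  # drop the reference so the space can be reclaimed


with synthetic_buffer(1024) as buf:
    print(len(buf))  # the perturbed segment would execute here
```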
It will also be understood by those skilled in the art that the present invention may be implemented by synthetic perturbation tuning system 4 that accepts the computer program to be tuned as input. The user of such a synthetic perturbation tuning system of the present invention may be given the option of manually selecting the locations for insertion of synthetic perturbations or allowing the system to perform that selection process automatically. The automatic selection process may select locations for synthetic perturbations according to the basic structure of the program to be tuned in terms of in-line code segments, data references, routines, subroutines, and function calls. The operator may also be able to combine the automatic and manual selection features to delete or add manually to the list of selections made automatically.
Synthetic perturbation tuning system 4 of the present invention may also automatically select the various perturbation sets to be applied to the computer program to be tuned. Tuning system 4 of the present invention may then automatically generate, execute, and quantify the performance of the various program versions corresponding to those selected perturbation sets. Tuning system 4 may then analyze the measured responses to identify the critical segments of program code. A particular perturbation set to be inserted as a permanent feature of the program may also be identified by tuning system 4.
One advantage of synthetic perturbation tuning method 4 is that it identifies critical segments of program code, that is, those segments of code with greatest impact upon performance. Conventional sequential profiling tools do not achieve this basic goal of tuning computer programs to be run on parallel processing systems. Synthetic perturbation tuning as performed by method 4 also avoids the time and effort of tedious segment-by-segment recoding or algorithm revamping otherwise required for set-up when tuning programs for parallel environments.
Another advantage of tuning method 4 is that the user need not have more than a slight understanding of the computer program being tuned in order to apply tuning method 4. At most, the user needs to understand the basic structure of the program in terms of general useful main program locations, routines, and calls to subroutines and functions to select perturbation locations. However, the user is not required to have any understanding of the actual substantive processing performed by the program to get a tuning recommendation from tuning method 4. A deeper understanding may assist the user in improving the program as in improvement box 44, but such deeper understanding is not always necessary.
Still another advantage of synthetic perturbation tuning method 4 is that it maintains the computational integrity of the program being tuned. None of the changes made to the code during the diagnostic steps of tuning method 4, that is, from location identification box 12 through segment identification box 40, have any effect on the substance of the program. For example, a synthetic perturbation may involve either a simple delay or a utilization of buffer space, neither of which has any impact on the numerical computations of a well-written program.
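The preservation of computational integrity may be illustrated by comparing the output of a perturbed routine against its unperturbed output. The routine `perturbed_sum` is a hypothetical example; its synthetic delay discards its own result and so cannot alter the value computed.

```python
def perturbed_sum(values, magnitude):
    """Sum `values`, applying a synthetic delay before each addition.

    The delay multiplies a private operand and discards the product, so the
    returned total does not depend on `magnitude`.
    """
    operand = 1.000001
    total = 0.0
    for value in values:
        for _ in range(magnitude):
            _unused = operand * operand  # synthetic delay only
        total += value
    return total


# The same result is obtained with and without the perturbation.
print(perturbed_sum([1, 2, 3], 0) == perturbed_sum([1, 2, 3], 25))
```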
A further advantage of tuning method 4 of the present invention is that it may be used to quantify, as well as qualify, program performance in terms of the impacts of adding or subtracting different amounts of substantive code from the selected code segments. Since the duration of a particular time delay is controlled by the perturbation input parameter, input parameters may be selected to simulate the performance of particular sequences of substantive code instructions. In this way, the impact of a particular substantive change upon program performance can be quantified and evaluated without actually editing the substantive code. Alternatively, such use of input parameters may provide insight into the effects of subtracting different amounts of substantive code without actually making those substantive changes.
Another further advantage of synthetic perturbation tuning method 4 is that it avoids the dependency upon metrics or upon detailed global state information inherent in profiling tools of the prior art, such as that taught by Anderson and Lazowska. Thus, synthetic perturbation tuning as performed by tuning method 4 avoids many of the problems of interpretation and of information capture of those prior art approaches.
Still another further advantage of synthetic perturbation tuning method 4 is that it is direct. If a perturbation has an effect, tuning method 4 identifies the effect. Consequently, synthetic perturbation tuning method 4 is fairly independent of architectural aspects of a system. It focuses largely upon perturbation types, perturbation sets, and program performance responses. Structural interpretations within the program to be tuned or the system upon which the program is executed matter less.
Yet another further advantage of tuning method 4 of the present invention is that it addresses the interdependencies that predefined tuning metrics can easily miss. While many metrics identify obvious code segments for improvement, not all improvements are equally important. In some performance states, small perturbations matter little. Yet, the same time delay in another state may induce a change in run time that cannot be explained by interpreting a level-of-parallelism metric. Synthetic perturbation tuning method 4 addresses these interprocess communication effects by focusing upon overall response, for example, program run time.
It will be understood that various changes in the details and arrangements of the elements which have been described and illustrated in order to explain the nature of the present invention may be made by those skilled in the art without departing from the principle and scope of the present invention as expressed in the following claims.

Claims

CLAIMS
What is claimed is:
1. A method of improving the performance of a program executed on a computer system, said program having program code, comprising the steps of:
(a) inserting a synthetic perturbation into said program code;
(b) selecting a first perturbation value for said synthetic perturbation;
(c) first executing said program having said inserted synthetic perturbation with said first perturbation value to provide a first program execution;
(d) first quantifying said first program execution;
(e) selecting a second perturbation value for said synthetic perturbation;
(f) second executing said program having said inserted synthetic perturbation with said second perturbation value to provide a second program execution;
(g) second quantifying said second program execution; and,
(h) altering said program in accordance with said first and second quantifyings.
2. The method of claim 1, wherein said first perturbation value is zero.
3. The method of claim 1, wherein said first perturbation value differs from said second perturbation value.
4. The method of claim 1, wherein said synthetic perturbation is adjustable.
5. The method of claim 1, wherein said synthetic perturbation comprises a time delay.
6. The method of claim 1, wherein said synthetic perturbation comprises a buffer space.
7. The method of claim 1, wherein said program code comprises source code.
8. The method of claim 1, wherein said program code comprises loadable code.
9. The method of claim 1, wherein said insertion of said perturbations in step (a) maintains program computational integrity.
10. The method of claim 1, wherein said computer system is a parallel computer system.
11. The method of claim 10, wherein said computer system is a multiple-instruction multiple-data parallel computer system.
12. The method of claim 10, wherein said computer system has shared-memory architecture.
13. The method of claim 8, wherein said computer system has distributed-memory architecture.
14. The method of claim 1, wherein the step of selecting at least one of said first and second perturbation values comprises selecting in accordance with statistical techniques of design-of-experiments.
15. The method of claim 1, wherein step (h) comprises the step of altering said program in accordance with multiple linear regression analysis.
16. The method of claim 1, wherein said first and second quantifyings comprise measuring run time of said first and second program executions, respectively.
17. The method of claim 1, wherein said first and second quantifyings comprise measuring memory space usage of said first and second program executions, respectively.
18. The method of claim 1, wherein said first and second quantifyings comprise measuring throughput of said first and second program executions, respectively.
19. The method of claim 1, wherein step (h) comprises the step of altering said program to decrease program execution run time.
20. The method of claim 1, wherein step (h) comprises the step of altering said program to decrease program execution memory space usage.
21. The method of claim 1, wherein step (h) comprises the step of altering said program to increase program transaction throughput.
22. The method of claim 1, wherein step (h) comprises the step of selecting a third perturbation value for said synthetic perturbation in accordance with said first and second quantifyings.
23. The method of claim 1, wherein step (h) comprises the step of editing a segment of said program code.
24. The method of claim 23, wherein step (h) further comprises the step of identifying, in accordance with said first and second quantifyings, said segment of said program code to be edited.
25. A method of improving the performance of a program executed on a computer system, said program having program code, comprising the steps of:
(a) inserting a plurality of synthetic perturbations into said program code;
(b) selecting a set of perturbation values for said synthetic perturbations;
(c) executing said program having said plurality of inserted synthetic perturbations with said set of perturbation values to provide a program execution;
(d) quantifying said program execution; and,
(e) altering said program in accordance with said quantifying.
26. The method of claim 25, comprising the further steps of:
(f) selecting a second set of perturbation values for said synthetic perturbations, wherein said second set of perturbation values differs from said set of perturbation values of step (b);
(g) executing said program having said plurality of inserted synthetic perturbations with said second set of perturbation values to provide a second program execution; and,
(h) second quantifying said second program execution, wherein said altering of step (e) is in accordance with said second quantifying.
27. A system for improving the performance of a program executed on a computer system, said program having program code, comprising:
insertion means for inserting a synthetic perturbation into said program code;
selection means for selecting a perturbation value for said synthetic perturbation;
quantifying means for quantifying execution of said program having said inserted synthetic perturbation with said perturbation value; and,
analysis means for analyzing the results of said quantifying means and determining impact of said synthetic perturbation with said perturbation value on program execution.
PCT/US1993/004143 1992-04-28 1993-04-28 Synthetic perturbation tuning of computer programs WO1993022725A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US87674592A 1992-04-28 1992-04-28
US07/876,745 1992-04-28

Publications (1)

Publication Number Publication Date
WO1993022725A1 true WO1993022725A1 (en) 1993-11-11

Family

ID=25368477

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/004143 WO1993022725A1 (en) 1992-04-28 1993-04-28 Synthetic perturbation tuning of computer programs

Country Status (1)

Country Link
WO (1) WO1993022725A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3891836A (en) * 1972-04-21 1975-06-24 Mobil Oil Corp Apparatus for optimizing multiunit processing systems
US4040021A (en) * 1975-10-30 1977-08-02 Bell Telephone Laboratories, Incorporated Circuit for increasing the apparent occupancy of a processor
US4638427A (en) * 1984-04-16 1987-01-20 International Business Machines Corporation Performance evaluation for an asymmetric multiprocessor system
US4858147A (en) * 1987-06-15 1989-08-15 Unisys Corporation Special purpose neurocomputer system for solving optimization problems
US5047919A (en) * 1986-04-03 1991-09-10 Harris Corporation Method and apparatus for monitoring software execution in a parallel multiprocessor computer system
US5204956A (en) * 1988-11-09 1993-04-20 Asea Brown Boveri Ltd. Method and apparatus for monitoring the execution time of a computer program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998003917A1 (en) * 1996-07-19 1998-01-29 Unisys Corporation Method of regulating the performance of an application program in a digital computer
CN1099639C (en) * 1996-07-19 2003-01-22 尤尼西斯公司 Method of regulating performance of application program in digital computer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase