WO2006050349A2 - Methods and apparatus for running applications on computer grids - Google Patents

Methods and apparatus for running applications on computer grids

Info

Publication number
WO2006050349A2
Authority
WO
WIPO (PCT)
Prior art keywords
tasks
computational
task
grid
units
Prior art date
Application number
PCT/US2005/039440
Other languages
French (fr)
Other versions
WO2006050349A3 (en)
Inventor
Fabricio Alves Barbosa Da Silva
Silvia Regina De Carvalho
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Publication of WO2006050349A2 publication Critical patent/WO2006050349A2/en
Publication of WO2006050349A3 publication Critical patent/WO2006050349A3/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5033: Allocation of resources to service a request, considering data affinity
    • G06F 9/505: Allocation of resources to service a request, considering the load
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5072: Grid computing


Abstract

A method of running grid applications on a grid, the grid comprising a plurality of computational units (10, 11, 12, 13, 14) and the application comprising a plurality of tasks, the method including the steps of: estimating the task execution times on all computational units comprising the grid; grouping the tasks and assigning said groups to corresponding computational units (10, 11, 12, 13, 14); and, as the computational units complete execution of tasks, replicating tasks onto idle computational units in such a way that the remaining amount of computation is balanced between the computational units (Figure 1).

Description

Methods and Apparatus for Running Applications on Computer Grids
Field of the invention
The invention relates to methods and apparatus for executing applications on computer grids. More particularly, although not exclusively, the invention relates to methods and apparatus for scheduling and running the components of applications, also known as tasks of grid-based applications, on computational units constituting a computational grid or cluster. Even more particularly, although not exclusively, the invention relates to scheduling and running tasks on heterogeneous distributed computational grids where the processing power of the node resources of the grid varies dynamically. The invention may be particularly suitable for scheduling sequential independent tasks, otherwise known as Bag-of-Tasks or Parameter
Sweep applications, on computational grids.
Background to the Invention
A computational grid, or more simply 'grid', can be thought of as a collection of physically distributed, heterogeneous computational units, or nodes. The physical distribution of the grid nodes may range from immediate proximity to wide geographical distribution. Grid nodes may be either heterogeneous or homogeneous with homogeneous grids differing primarily in that the nodes constituting such a grid provide essentially a uniform operating environment and computing capacity. Given the operational characteristics of grids as often being formed across administrative domains and over a wide range of hardware, homogeneous grids are considered a specific case of the general heterogeneous grid concept. The present invention contemplates both types.
In the described example, the present invention contemplates distributed networks of heterogeneous nodes which are desired to be treated as a unified computing resource.
Computational grids are usually built on top of specially designed middleware platforms known as grid platforms. Grid platforms enable the sharing, selection and aggregation of the variety of resources constituting the grid. These resources which constitute the nodes of the grid can include supercomputers, servers, workstations, storage systems, desktop systems and specialized devices that may be owned and operated by different organizations.
The described embodiment of the present invention is concerned with grid applications known as Bag-of-Tasks (BoT) applications. These types of applications can be decomposed into groups of tasks. Tasks for this type of grid application are characterized as being independent in that no communication is required between them while they are running and that there are no dependencies between tasks. That is, each task constituting an element of the grid application as a whole can be executed independently with its result contributing to the overall result of the grid-based computation. Examples of BoT applications include Monte Carlo simulations, massive searches, key breaking, image manipulation and data mining.
In this specification and the exemplary embodiments described therein, we will refer to a BoT application A as being composed of T tasks, A = {T_1, T_2, ..., T_T}. The amount of computation involved with each task T_i is generally predefined and may vary among the tasks of A. Note that the input for each task T_i is one or more (input) files and the output one or more (output) files.
The present exemplary embodiment relates to clusters organized as a master-slave platform. According to this model, a master node is responsible for scheduling computation among the slave nodes and collecting the results. Other grid/cluster models are possible within the scope of the present invention, and may be capable of incorporating the execution/scheduling technique described herein with appropriate modification. For example, a further embodiment is described where the slave components or nodes are themselves clusters.
Grid platforms typically use a non-dedicated network infrastructure such as the internet for inter-node communication. In such a network environment, machine heterogeneity, long and/or variable network delays and variable processor loads and capacities are common. Since tasks belonging to a BoT application do not need to communicate with each other and can be executed independently, it is considered that BoT applications are particularly suitable for execution on such a grid infrastructure.
While heterogeneous grids have been found to be suitable for executing such applications, a significant problem is scalability and dynamic variation in node processing power. The applicant's copending application No. GB0423988.5, the disclosure of which is incorporated in its entirety, is concerned with scaling; the present invention is concerned mainly with scheduling functionality to take into account dynamic load variation in the grid nodes.
In both homogeneous and heterogeneous grids, it is possible that the behavior of the grid nodes may vary over time. This can be due to extraneous loads put on the node. For example, a local user may begin using a node machine in an interactive manner while that node is carrying out an allocated sequence of task calculations for a grid application run by another user. This would have the effect of increasing the anticipated computation completion time for that node. This may also reduce the efficacy of the task allocation technique described in patent application No. GB0423988.5 when the present embodiment of the invention is used in combination with that technique. Initial task grouping is generally based on static information whereby the maximum number of tasks to be assigned to a particular processor in the grid or cluster is calculated according to the ceiling function ⌈T/P⌉, where T is the total number of tasks to be allocated and P is the total number of processors. In the case of single-node processors, if the relative speed of the nodes changes, task distribution and allocation according to this method will be less accurate. It would therefore be desirable to develop scheduling and execution techniques which take into account dynamic variation in grid node processing power.
Disclosure of the invention
In its broadest aspect, the invention provides for a method of running grid applications on a grid, the grid comprising a plurality of computational units and the application comprising a plurality of tasks, the method including the steps of: estimating the task execution times on all computational units comprising the grid; grouping the tasks and assigning said groups to corresponding computational units; and, as the computational units complete execution of tasks, replicating tasks onto idle computational units in such a way that the remaining amount of computation is balanced between the computational units.
In a further aspect, the invention provides for a method of running an application on a computational grid comprising a plurality of computational units, the application comprised of a plurality of tasks, the method including the steps of:
A) grouping the tasks according to the total number of computational units and total number of tasks based on an initial determination or assumption in respect of the relative processing power of the computational units constituting the computational grid;
B) scheduling groups of tasks on computational units of the computational grid using a task queue;
C) while there remain uncompleted tasks perform step D)
D) when a computational unit Pi completes the execution of at least one task, perform the following steps (a) to (d): (a) compute the mean execution time for the completed task on computational unit Pi;
(b) update the task queue;
(c) abort any still running replicas of the completed tasks;
(d) if computational unit Pi is idle perform the following steps
(i) if there are unfinished tasks on slower computational units then replicate the unfinished tasks on computational unit Pi;
E) end
Preferably, tasks are replicated at step (i) so that the amount of outstanding computation is balanced among the computational units.
The initial grouping of tasks may be based on a static determination of the relative processing power of the computational units.
The task queue preferably corresponds to a size ordered list of the tasks constituting the grid application.
Step (i) may comprise replicating one or more tasks or an entire group onto an idle computational unit.
In an alternative embodiment, in step D) if the computational unit has not completed execution in a specified time, it is considered that that computational unit has failed or is offline and the method proceeds to step (d)(i) whereby any incomplete tasks allocated to that failed or offline computational unit are replicated onto an idle computational unit.
Computational units may correspond to processors, nodes, clusters or other types of computing resources which can be considered as a grid resource, aggregated or otherwise.
Preferably the task queue is ordered taking into account input files which are shared between tasks or have a degree of association.
The tasks are preferably grouped in step A) according to a method of scheduling the running of an application on a plurality of computational units, said application comprising a plurality of tasks, each task having at least one input file associated therewith, said method including the steps of: aggregating said plurality of tasks into one or more groups of tasks; and
allocating each group of tasks to a computational unit, wherein the plurality of tasks are aggregated so that tasks which share one or more input files are included in the same group.
Alternatively, the tasks are preferably grouped in step A) according to a method of scheduling tasks among a plurality of computing units, the method including the following steps:
I) define the number of tasks to be assigned in groups to the computing units, where P is the number of computing units;
II) compute the size of each task;
III) rank the task files in a list L in order of increasing size;
IV) for each group, beginning with the group with the largest number of tasks, perform the following steps (a) to (e):
(a) assign the smallest unassigned task file to the group;
(b) set the task file list position index equal to 1;
(c) while the group is not completely populated by task files perform the following steps:
(i) if the position index plus P is less than or equal to the size of the list L, and the task file affinity between the task file at the position index and the task file at the position index + 1 is less than a specified value k, then increment the position index by P; otherwise increment the position index by 1;
(ii) assign to the group the task file located at the position index in list L;
(d) remove assigned task files from list L;
(e) set P = P - 1.
In a further aspect, the invention provides for a network adapted to operate in accordance with the method as hereinbefore defined. In a further aspect, the invention provides for a system adapted to operate in accordance with the method as hereinbefore defined.
In yet a further aspect, the invention provides for a computer adapted to perform the method as hereinbefore defined.
In another aspect, the invention provides for a computer program adapted to perform the steps of the method as hereinbefore defined.
In another aspect, the invention provides for a data carrier adapted to store a computer program as hereinbefore defined.
The invention also provides for a master computer configured to carry out the method as hereinbefore defined.
In a further aspect, the invention provides for a computational grid adapted to operate in accordance with the method as hereinbefore defined.
Brief Description of the Drawings
The invention will now be described by way of example only, with reference to the drawings, in which:
Figure 1: illustrates an embodiment of the invention having a master-slave node configuration; and
Figure 2: illustrates a flow diagram showing the replication of tasks in accordance with an embodiment of the invention.
For the purposes of explanation, a simple execution model will be described initially in relation to a prior art technique for scheduling an application on a homogeneous grid. This will then be compared with an embodiment of the invention. The specific embodiment described herein relates to fine-grain Bag-of-Tasks applications on a dedicated master-slave platform as shown in Figure 1. However, this is not to be construed as limiting and the invention may be applied to other computing contexts with suitable modification. A further class of applications that may benefit from the invention are those composed of tasks with dependencies, where sets of dependent tasks can be grouped and the groups are independent among themselves. The method may be modified slightly to group the tasks according to such dependencies. Referring to Figure 1, the application A is composed of T homogeneous tasks; that is, A = {T_1, T_2, ..., T_T}. The master node (10) is responsible for organizing, scheduling, transmitting and receiving the tasks corresponding to the grid application. Referring to Figure 1, each task goes through three phases during execution:
Initialization phase
This is the process whereby the files constituting the grid application and its data are sent from the master node (10) to the slave nodes (11-14) and the task is started. The duration of this phase is equal to t_init.
The set of files sent may include a parameter file corresponding to a specified task and an executable file which is charged with performing the computational task on the slave processor. The time in this phase includes the overhead incurred by the master node (10) to initiate a data transfer to a slave (11), for example, to initiate a TCP connection. For example, consider a task i that needs to send two files to a slave node before execution. The time t_init can then be computed as follows:
t_init = Lat_i + (Σ_j File_j) / B

where Lat_i is the overhead incurred by the master node to initiate data transfer to the slave node (11-14), Σ_j File_j is the total size in bytes of the input files that have to be transferred to slave node s, and B is the data transfer rate. For simplicity, in this example it is assumed that each task has only one separate parameter file of the same size associated with it.
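As a brief worked illustration of this formula (the numbers are assumed for illustration only, not taken from the patent): with a connection setup overhead Lat_i = 0.1 s, input files totalling Σ_j File_j = 5 MB, and a transfer rate B = 10 MB/s, the initialization time is t_init = 0.1 + 5/10 = 0.6 s.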
Computation Phase
In this phase, the task processes the parameter file at the slave node (11-14) and produces an output file. The duration of this phase is equal to t_comp. Any additional overhead related to the reception of input files by a slave node is also included in this phase.
Completion Phase
During this phase, the output file is sent back to the master node (10) and the task T_i is terminated. The duration of this phase is t_end. This phase may require some processing at the master, mainly related to writing files at the file repository (not shown in (10)). This writing step may be deferred until disk resources are available. It is therefore considered negligible. Thus the initialization phase of one slave can occur concurrently with the completion phase of another slave node.
The total execution time of a task is therefore:
t_total = t_init + t_comp + t_end
The exemplary embodiment described herein corresponds to a dedicated node cluster composed of P+1 homogeneous processors where T ≫ P. The additional processor is the master node (10). Communication between the master (10) and slaves (11-14) is by way of a shared link and, in this embodiment, the master (10) can only send files through the network to a single slave at a given time. The communication link is full duplex. This embodiment of the invention corresponds to a one-port model whereby there are at most two communication processes involving a given master, one being sent and one being received. The one-port embodiment discussed herein is particularly suited to LAN network connections.
A slave node (11-14) is considered to be idle when it is not involved with the execution of any of the three phases of a task. Figure 1 shows the execution of a set of tasks in a system composed of four slave nodes. The initial grouping of tasks is based on static information available at the time of initialization of the application execution and reflects a snapshot of the processing power of the slave nodes.
The effective number of processors P_eff is defined as the maximum number of slave processors needed to run an application with no idle periods on any slave processor. Taking into account the task and platform models of the particular embodiment described herein, a processor may have idle periods if:
t_comp + t_end < (P - 1) · t_init
P_eff is then given by the following equation:

P_eff = ⌊(t_comp + t_end) / t_init⌋ + 1
The maximum number of tasks to be executed on a given processor is:

M = ⌈T / P⌉

where T is the total number of tasks. For a platform with P_eff processors, the total execution time, or makespan, will be:
t_makespan = M · (t_init + t_comp + t_end) + (P - 1) · t_init
The second term on the right side of this equation gives the time needed to start the first P - 1 tasks on the other P - 1 processors. If the platform has more processors than P_eff, then the overall makespan is dominated by communication times between the master and the slaves. Then:
t_makespan = T · t_init + t_comp + t_end
As there are idle periods on every processor, the following inequality holds:

(P - 1) · t_init > (t_comp + t_end)
This inequality applies primarily to two cases: a. for very large platforms (P large); and b. for applications with a large t_init/t_comp ratio, such as fine-grain applications.
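As a worked example under the formulas as reconstructed above (the numbers are illustrative, not from the patent): with t_init = 2 s, t_comp = 10 s and t_end = 2 s, a processor idles once (P - 1) · 2 > 12, i.e. P > 7, so P_eff = ⌊12/2⌋ + 1 = 7. For T = 70 tasks on P = 7 processors, M = ⌈70/7⌉ = 10 and t_makespan = 10 · (2 + 10 + 2) + 6 · 2 = 152 s.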
As described in the applicant's copending patent application XXXX, t_comp may be increased by grouping sets of tasks sharing common input files into a larger task. By doing so, it is possible to increase the effective number of processors, thereby increasing the number of slave processors that can be used effectively. The time corresponding to t_init should ideally not increase in the same proportion as t_comp. Thus, according to one form of task grouping method, tasks which share one or more input files are preferably selected and scheduled so as to run on a common slave node or processor.
This may be achieved by introducing the concept of file affinity, which indicates the reduction in the amount of data that needs to be transferred to a remote node when all tasks of a group are sent to that node.
In this discussion it is assumed that the number of groups is equal to the number of nodes available. This is not however a limitation in this example, and modifications to the scheduling method are viable to take into account different processor/group assignments. For example, for some specific sets of applications/platforms, the optimal execution in terms of the makespan will use a number of groups smaller than the total number of processors. Given a set G of tasks composed of K tasks, G = {T_1, T_2, ..., T_K}, and the set F of the Y input files needed by one or more tasks belonging to group G, F = {f_1, f_2, f_3, ..., f_Y}, the file affinity I_aff is defined in one embodiment as follows:
I_aff = [ Σ_{i=1..Y} (N_i - 1) · |f_i| ] / [ Σ_{i=1..Y} N_i · |f_i| ]
|f_i| is the size in bytes of file f_i and N_i is the number of tasks in group G which have file f_i as an input file, so that 0 ≤ I_aff ≤ 1. An input file affinity of zero indicates that there is no sharing of files among tasks of a group. An input file affinity close to one means that all tasks of a group have a high degree of sharing of input files.
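As a short check of this definition as reconstructed above (with assumed numbers): for a group of three tasks that all share one 100 KB input file f_1, N_1 = 3 and I_aff = (3 - 1) · 100 / (3 · 100) = 2/3, reflecting that two of the three transfers are saved; if each task instead required its own distinct file, every N_i = 1 and I_aff = 0.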
A potential benefit of initially clustering tasks into groups for execution is that the grid scales; that is, increasing the number of nodes results in a decrease in the total processing time for the grid application.
The equation above for file affinity is dominated by the combinatorial function whereby all possible pairs of tasks are considered. For large numbers of tasks, this can lead to very large numbers of combinations of task pairs. For example, there are N(25, 5) ways of clustering 25 tasks into 5 groups, which equates roughly to 10^15 possible combinations. It may therefore be impractical to search the solution space exhaustively for an optimal task grouping. For this reason, a simplified heuristic may be used for determining the optimal task grouping which is based on the general file affinity equation described above.
Consider a group of tasks, each of which requires a different input file. Because there is no input file sharing, there is no file affinity between them. It is desirable to start processing them on slave nodes as soon as possible to minimize t_init. Therefore, the tasks are transferred to slave nodes in size order from smallest to largest, with no account taken of sharing amongst input files (as this is zero).
If an application where all tasks share the same input file is considered, that input file only needs to be transferred once. This is taken into account by including the effect of file affinity. If the file affinity of two consecutive tasks (in size order) is very high, it is advantageous to assign those two tasks to the same processor instead of transferring the same set of input files twice over the network. In the ideal situation described here, this set of files is transferred only once to each processor or node of the network.
This simplified heuristic reduces the size of the possible solution space and provides a viable method of calculating the file affinities for tasks to within a workable level of accuracy. Taking into account file affinity, the simplified embodiment includes the following steps: Initially, for each computing unit or processor, the number of tasks to be aggregated into a group is defined for that computing unit. This is done so that the time needed for processing each group is substantially the same for each computing unit.
Then the total size of each task is calculated. Here, the size of each task corresponds to the sum of the input file sizes for the task concerned. For each group defined in the aggregation step, the tasks are allocated to the group as a function of both the number of tasks determined previously and task affinity. The initial allocation step may be as follows. The reference to
'position' relates to the position of the task input file in the size-ordered list. The smallest size task, task(position), is assigned to a first group. Then the file affinity of the pair task(position) and task(position+1) in the size-ordered list is determined. If the file affinity is greater than a specified value k, task(position+1) is assigned to the first group. If the file affinity is less than the specified value, task(position+1) is assigned to a subsequent group. This process is repeated, filling the groups sequentially in order until the group allocations determined in the initial step are populated with the size-ordered, associated tasks. This can be expressed in pseudocode as follows:
define the number of tasks to be assigned in groups to the computing units,
P = the number of computing units;
o compute the size of each task;
o rank the task files in a list L in order of increasing size;
o for each group, beginning with the group with the largest number of tasks:
■ assign the smallest unassigned task file to the group;
■ task file list position = 1;
■ until the group is completely populated by task files do:
o if (position + P ≤ size of list L) and (task file affinity(task file[position], task file[position+1]) < a specified value, k) then position = position + P;
else position = position + 1;
o assign to the group the task file at position in list L
■ end do
■ remove assigned task files from list L
■ P = P - 1
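The heuristic above can be rendered as a short executable sketch. The following Python is an interpretation of the pseudocode, not the patent's reference implementation; the function name, the 0-based indexing, the range guard, and the affinity callback are illustrative assumptions. It reproduces the worked examples discussed below.

    def group_tasks(sizes, counts, affinity, k=0.5):
        # sizes: per-task input-file byte sums; counts: tasks per group,
        # largest group first; affinity(i, j): file affinity of tasks i, j (0..1).
        L = sorted(range(len(sizes)), key=sizes.__getitem__)  # size-ordered list L
        P = len(counts)                       # number of computing units
        groups = []
        for count in counts:
            group = [L[0]]                    # smallest unassigned task file
            pos = 0                           # 0-based position index into L
            while len(group) < count:
                # Low affinity with the next file: jump P positions so unrelated
                # files of similar size spread across the groups; otherwise step
                # by 1 to keep input-sharing tasks together.
                if pos + P <= len(L) - 1 and affinity(L[pos], L[pos + 1]) < k:
                    pos += P
                else:
                    pos += 1
                pos = min(pos, len(L) - 1)    # range guard for this sketch
                group.append(L[pos])
            L = [t for t in L if t not in group]  # remove assigned task files
            P -= 1
            groups.append(group)
        return groups

    # Zero-affinity example discussed below: ten heterogeneous tasks, three units.
    sizes = [20, 35, 44, 80, 102, 110, 200, 300, 400, 450]   # KB
    print(group_tasks(sizes, [4, 3, 3], affinity=lambda i, j: 0.0))
    # -> [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]],
    #    i.e. {20K, 80K, 200K, 450K}, {35K, 102K, 300K}, {44K, 110K, 400K}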
An example application of this simplified heuristic is as follows. Consider a set of tasks composed of ten task files which are to be distributed on three homogeneous slave processors. The set of input files needed by each task is described as {f_1, ..., f_10}, where f_i is a real value that corresponds to the byte sum of the input files needed by task t_i. As the tasks are heterogeneous, they will share no input files and the file affinity between any pair of tasks will be zero.
The 10 heterogeneous input file tasks are {20K, 35K, 44K, 80K, 102K, 110K, 200K, 300K, 400K, 450K}. Three groups of tasks are generated, one with 4 tasks and the others with 3 tasks. The simplified heuristic in the case of zero file affinity operates as follows. Each task is considered in size order. Thus, 20K is allocated to the first position of group 1. Then the 35K input task is allocated to the next group, following the principle that each group should minimize initial transmission or initialization time. Task 44K is allocated to the third group. Task 80K is then allocated to position two of the first group, 102K to the second position of group two, and so on. This produces the groups of files as follows: {20K, 80K, 200K, 450K}, {35K, 102K, 300K} and {44K, 110K, 400K}. At a first approximation this keeps the amount of transmitted data similar for each group and allows the task transmission/calculation to be pipelined in a reasonably efficient manner. In a preferred embodiment, the transfer of the files occurs in a pipelined manner, i.e. where computation is overlapped with communication. Figure 5 illustrates the pipelined transfer of input files from a master to three slave processors. As can be seen in this example, the transfers to and from the master/nodes are staggered, with the computation on the slaves being overlapped with the communication phase on one or more of the other processor nodes. This reduces t_init when executing a group of tasks on a slave processor.
Another example is that of 10 homogeneous tasks with ten completely homogeneous sets of input files {30K, 30K, 30K, 30K, 30K, 30K, 30K, 30K, 30K, 30K}. Again, three groups of tasks are generated, one with four tasks and the others with three. As the tasks are completely homogeneous, each pair will have a file affinity of 1. Thus, following the simplified embodiment of the heuristic, the three groups of input files will be {30K}, {30K}, and {30K}.
These two extreme examples serve to illustrate how the initial static task grouping may be performed.
The size of each task may be calculated on the basis of the byte sum of all of the input files needed to execute each task on a computing unit or grid node. The file affinity threshold may usefully be defined as k, for which an affinity of 0.5 is considered acceptable as a benchmark for grouping tasks into a specified group. Essentially, this equates to setting the minimum degree of 'association' which is necessary to consider two tasks as related or sharing input files. This ensures that the file affinity is maximized within a group. Thus sending similar sets of files to multiple processors is avoided. As noted above, if the next set of files is different enough (i.e., has a file affinity with a previously allocated task less than the minimum), that task will be located at the next processor position. Firstly, this is done so that tasks with the smallest byte sum are sent initially. Secondly, this is done to guarantee that the groups are as uniform as possible in respect of the number of bytes that need to be transmitted from the master node. Thus, at initialization of the procedure, the number of tasks is allocated to each processor based on the processing power of the processor concerned and the file affinity, and the tasks are dispatched or transferred to the processor in a pipelined way. Here, pipelined means overlapping computational and communication steps.
However, this treatment assumes an initial grouping based on static information relating to the relative processing power of the grid nodes. As noted above, the number of tasks to be assigned to each group is determined such that the time needed for processing each group is substantially the same for each computing unit. This will depend on each processor's relative speed compared with the average speed of the processors in the cluster. For example, if the relative speed of a particular node processor is 1.0 compared to the average speed of the cluster nodes, the maximum number of tasks which should be assigned to that processor will be ⌈T/P⌉.
This initial task allocation is static and based on an assumption that the relative speed of the node processors remains constant throughout the execution of the grid application.
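A small illustration of this static sizing (the proportional-rounding policy and the numbers are assumptions; the text above only fixes the average-speed case):

    import math

    def static_task_counts(total_tasks, relative_speeds):
        # Share tasks in proportion to relative speed, rounding up so that a
        # processor at the average speed receives about ceil(T/P) tasks.
        avg = sum(relative_speeds) / len(relative_speeds)
        per_unit = total_tasks / len(relative_speeds)        # T / P
        return [math.ceil(per_unit * s / avg) for s in relative_speeds]

    print(static_task_counts(30, [0.5, 1.0, 1.5]))  # -> [5, 10, 15]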
In the case of a non-dedicated cluster environment or computational grid, this may not be true. For example, other users or unrelated processes may impose loads on one or more node processors on the grid or cluster. This will have the effect of varying the relative speeds of the node processors and thus reduce the efficacy of the initial task allocation.
According to one embodiment of the invention, dynamic characteristics of grid node power may be taken into account according to the following process, expressed as pseudocode and with reference to the flowchart of Figure 2:
- group tasks on the basis of the relative processing power of the nodes constituting the computational grid and the total number of tasks to be executed on the grid (20);
- schedule groups of tasks on nodes of the computational grid using a task queue (21);
- on processor Pi completing the execution of a task (22), do
  o compute the mean execution time on processor Pi (23);
  o update task queue (24);
  o abort any still running replicas of the completed tasks (25);
  o if processor Pi is idle (26)
    - if there are unfinished tasks on slower processors (29)
      • replicate the unfinished tasks on processor Pi (28);
- end do (22).
According to this embodiment, the tasks are replicated between grid nodes taking into account the dynamic rate of task completion. This implicitly takes into account dynamic variation in the processing power of the grid nodes. This method also takes into account the potential failure of a slave or grid node by dynamically asserting the failure of a machine by considering a dynamically adjustable timeout at step (22). That is, if the master node does not receive any results for a period of time equal to or larger than the timeout, the machine is considered offline and its allocated tasks are replicated on an available grid node. Further, if a node fails, a whole group may be replicated. Otherwise tasks for which processing has not begun on the slowest node are replicated. This has the effect of balancing the outstanding computation among the nodes.
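To make the loop concrete, the following Python sketch simulates it for fixed task costs. The event-queue structure, the use of the most-loaded queue as a stand-in for the slowest processor, and the omission of the failure timeout are simplifying assumptions of this sketch, not requirements of the method.

    import heapq
    from collections import defaultdict

    def run_with_replication(groups, speed_of, task_cost=1.0):
        # groups: one initial task list per processor (static grouping, steps 20-21).
        # speed_of(p): relative speed of processor p, queried at each task start.
        queues = [list(g) for g in groups]   # per-processor task queues
        done, events = set(), []             # events: (finish_time, proc, task)
        durations = defaultdict(list)

        def start_next(p, now):
            while queues[p]:
                t = queues[p].pop(0)
                if t not in done:            # skip tasks already finished elsewhere
                    heapq.heappush(events, (now + task_cost / speed_of(p), p, t))
                    return True
            return False

        now = 0.0
        for p in range(len(queues)):
            start_next(p, now)
        while events:
            now, p, t = heapq.heappop(events)   # processor p completes task t (22)
            if t in done:
                continue                        # replica of a finished task: aborted (25)
            done.add(t)
            durations[p].append(task_cost / speed_of(p))
            mean_exec = sum(durations[p]) / len(durations[p])  # mean execution time (23);
            # a fuller implementation would use mean_exec to update the task queue (24)
            if not start_next(p, now):          # processor p is idle (26)
                laggard = max(queues, key=len)  # most-loaded queue stands in for the
                if laggard:                     # slowest processor (29)
                    queues[p].append(laggard[0])  # replicate; original stays queued (28)
                    start_next(p, now)
        return now                              # application makespan

    # Example: processor 2 runs at half speed; its last task is replicated onto the
    # idle faster processors, finishing at t = 4.0 instead of t = 6.0.
    print(run_with_replication([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                               speed_of=lambda p: 0.5 if p == 2 else 1.0))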
In a further embodiment, it is possible to consider initially grouping tasks on a grid composed of a set of clusters, as opposed to a grid composed of a set of processors. In this embodiment, the number of tasks assigned to each cluster will be calculated based on the requirement that the time needed by the cluster for processing each group is the same for each cluster. As before, this will depend on the processing speed of the cluster aggregate and on the internal structure of the particular cluster, such as the number of processors, load from other users, etc. Once the number of tasks to be assigned to each cluster is determined statically, the allocation method proceeds substantially as described above. Following this initial grouping based on static information, dynamic replication can be used to take into account cluster-level variability.
The invention is further intended to cover the task scheduling/grouping technique in its most general sense as specified in the claims, regardless of the possible size of the solution space for the affinity determination. It is also noted that the described embodiment of the invention may be applied to the distribution of tasks among nodes in a grid system where the computational characteristics of such nodes may take a variety of forms. That is, node processing may take the form of numerical calculation, storage or any other form of processing which might be envisaged as part of distributed application execution. Further, embodiments of the present invention may be included in a broader scheduling system in the context of allocating information to generalized computing resources. Although the invention has been described by way of example and with reference to particular simplified or reduced-scope embodiments, it is to be understood that modification and/or improvements may be made without departing from the scope of the appended claims.
Where in the foregoing description reference has been made to integers or elements having known equivalents, then such equivalents are herein incorporated as if individually set forth.

Claims

1. A method of running grid applications on a grid, the grid comprising a plurality of computational units and the application comprising a plurality of tasks, the method including the steps of: estimating the task execution times on all computational units comprising the grid; grouping the tasks and assigning said groups to corresponding computational units; and, as the computational units complete execution of tasks, replicating tasks onto idle computational units in such a way that the remaining amount of computation is balanced between the computational units.
2. A method as claimed in claim 1 wherein the tasks are placed in a task queue once they have been allocated to a computational unit.
3. A method of running an application on a computational grid comprising a plurality of computational units, the application comprised of a plurality of tasks, the method including the steps of:
A) grouping the tasks according to the total number of computational units and total number of tasks based on an initial determination or assumption in respect of the relative processing power of the computational units constituting the computational grid;
B) scheduling groups of tasks on computational units of the computational grid using a task queue;
C) while there remain uncompleted tasks, perform step D);
D) when a computational unit Pi completes the execution of at least one task, perform the following steps (a) to (d):
(a) compute the mean execution time for the completed task on computational unit Pi;
(b) update the task queue;
(c) abort any still-running replicas of the completed tasks;
(d) if computational unit Pi is idle, perform the following step: (i) if there are unfinished tasks on slower computational units, then replicate the unfinished tasks on computational unit Pi;
E) end
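By way of illustration only (this sketch does not form part of the claims), the handler recited in steps D)(a) to (d) might be realized in Python as follows; the Unit class, the pending set of outstanding tasks, and the helper names are assumptions of the sketch, since the claim prescribes no data model:

    from dataclasses import dataclass, field

    @dataclass
    class Unit:
        name: str
        completed: int = 0
        total_time: float = 0.0
        queue: list = field(default_factory=list)  # tasks scheduled on this unit

        @property
        def mean_time(self):
            return self.total_time / self.completed if self.completed else float("inf")

    def on_task_complete(unit, task, elapsed, units, pending):
        # pending: set of tasks not yet completed on any unit.
        # (a) compute the mean execution time for the completed task on this unit
        unit.completed += 1
        unit.total_time += elapsed
        # (b) update the task queue
        pending.discard(task)
        if task in unit.queue:
            unit.queue.remove(task)
        # (c) abort any still-running replicas of the completed task
        for other in units:
            if other is not unit and task in other.queue:
                other.queue.remove(task)
        # (d)(i) if this unit is idle, replicate unfinished tasks from slower units
        if not unit.queue:
            for slow in units:
                if slow is not unit and slow.mean_time > unit.mean_time:
                    for t in list(slow.queue):
                        if t in pending and t not in unit.queue:
                            unit.queue.append(t)  # replica; the original keeps running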
4. A method as claimed in any of claims 1 to 3 wherein tasks are replicated so that the amount of outstanding computation is balanced among the computational units.
5. A method as claimed in any preceding claim wherein the initial grouping of tasks is based on a static determination of the relative processing power of the computational units.
6. A method as claimed in any of claims 2 to 4 wherein the task queue corresponds to a size-ordered list of the tasks constituting the grid application.
7. A method as claimed in any preceding claim wherein one or more tasks or an entire group are replicated onto an idle computational unit.
8. A method as claimed in claim 3 wherein, in step D), if the computational unit has not completed execution within a specified time, that computational unit is considered to have failed or to be offline and the method proceeds to step (d)(i), whereby any incomplete tasks allocated to that failed or offline computational unit are replicated onto an idle computational unit.
9. A method as claimed in claim 1 or 2 wherein, if a computational unit has not completed execution within a specified time, that computational unit is considered to have failed or to be offline and any incomplete tasks allocated to that failed or offline computational unit are replicated onto an idle computational unit.
10. A method as claimed in any preceding claim wherein the computational units correspond to processors, nodes, clusters or any other type of computing resource which can be considered a grid resource, aggregated or otherwise.
11. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input files which are shared between tasks or have a degree of association.
12. A method as claimed in any preceding claim wherein the tasks are grouped according to a method of scheduling the running of an application on a plurality of computational units, said application comprising a plurality of tasks, each task having at least one input file associated therewith, said method including the steps of:
aggregating said plurality of tasks into one or more groups of tasks; and allocating each group of tasks to a computational unit, wherein the plurality of tasks are aggregated so that tasks which share one or more input files are included in the same group.
13. A method as claimed in any of claims 1 to 12 wherein the tasks are grouped according to a method of scheduling tasks among a plurality of computing units, the method including the following steps:
I) define the number of tasks to be assigned in groups to the computing units, where P is the number of computing units;
II) compute the size of each task;
III) rank the task files in a list L in order of increasing size;
IV) for each group, beginning with the group with the largest number of tasks, perform the following steps (a) to (e):
(a) assign the smallest unassigned task file to the group;
(b) set the task file list position index equal to 1;
(c) while the group is not completely populated by task files, perform the following steps:
(i) if the position index plus P is less than or equal to the size of the list L, and the task file affinity between the task file at the position index and the task file at the position index + 1 is less than a specified value k, then increment the position index by P; otherwise increment the position index by 1;
(ii) assign to the group the task file located at the position index in list L;
(d) remove assigned task files from list L;
(e) set P = P - 1.
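By way of illustration only (not part of the claims), the grouping procedure of steps I) to IV) might be sketched in Python as follows. The affinity function, the threshold k and the precomputed group_sizes are assumed to be supplied by the caller, and the sketch uses 0-based list indices where the claim counts positions from 1:

    def group_by_affinity(task_files, group_sizes, affinity, k):
        # task_files: ranked by increasing size (step III).
        # group_sizes: tasks per group, largest group first (steps I and IV).
        L = list(task_files)       # working copy of the ranked list
        P = len(group_sizes)       # number of computing units (step I)
        groups = []
        for size in group_sizes:
            group = [L[0]]         # (a) smallest unassigned task file
            idx = 0                # (b) position index (0-based here)
            while len(group) < size and idx < len(L):           # (c)
                # (c)(i) skip ahead by P when adjacent files have low affinity
                if idx + P < len(L) and affinity(L[idx], L[idx + 1]) < k:
                    idx += P
                else:
                    idx += 1
                if idx < len(L):
                    group.append(L[idx])                         # (c)(ii)
            for f in group:        # (d) remove assigned task files from L
                L.remove(f)
            P -= 1                 # (e) P = P - 1
            groups.append(group)
        return groups

Under these assumptions, a low-affinity neighbour causes the index to jump by P, spreading unrelated task files across the remaining groups, while high-affinity (file-sharing) neighbours stay adjacent and land in the same group.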
14. A network adapted to operate in accordance with the method as claimed in any of claims 1 to 13.
15. A system adapted to operate in accordance with the method as claimed in any of claims 1 to 13.
16. A computer adapted to perform the method as claimed in any of claims 1 to 13.
17. A computer program adapted to perform the steps of the method as claimed in any of claims 1 to 13.
18. A data carrier adapted to store a computer program as claimed in claim 17.
19. A master computational unit configured to carry out the method as claimed in any of claims 1 to 13.
20. A computational grid adapted to operate in accordance with the method as claimed in any of claims 1 to 13.
21. A scheduling system for an aggregate of computational resources adapted to operate in accordance with the method as claimed in any of claims 1 to 13.
PCT/US2005/039440 2004-10-29 2005-10-28 Methods and apparatus for running applications on computer grids WO2006050349A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0423990.1 2004-10-29
GB0423990A GB2419693A (en) 2004-10-29 2004-10-29 Method of scheduling grid applications with task replication

Publications (2)

Publication Number Publication Date
WO2006050349A2 true WO2006050349A2 (en) 2006-05-11
WO2006050349A3 WO2006050349A3 (en) 2009-04-09

Family

ID=33515734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/039440 WO2006050349A2 (en) 2004-10-29 2005-10-28 Methods and apparatus for running applications on computer grids

Country Status (2)

Country Link
GB (1) GB2419693A (en)
WO (1) WO2006050349A2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8150904B2 (en) * 2007-02-28 2012-04-03 Sap Ag Distribution of data and task instances in grid environments
CN102325255B (en) * 2011-09-09 2017-09-15 深圳融创新技术有限公司 A kind of multi-core CPU video code conversions dispatching method and system
CN103699445B (en) * 2013-12-19 2017-02-15 北京奇艺世纪科技有限公司 Task scheduling method, device and system
CN105022668B (en) * 2015-04-29 2020-11-06 腾讯科技(深圳)有限公司 Job scheduling method and system
CN109542620B (en) * 2018-11-16 2021-05-28 中国人民解放军陆军防化学院 Resource scheduling configuration method for associated task flow in cloud

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6098091A (en) * 1996-12-30 2000-08-01 Intel Corporation Method and system including a central computer that assigns tasks to idle workstations using availability schedules and computational capabilities
AU7114200A (en) * 1999-08-26 2001-03-19 Parabon Computation System and method for the establishment and utilization of networked idle computational processing power
WO2002063479A1 (en) * 2001-02-02 2002-08-15 Datasynapse, Inc. Distributed computing system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3648253A (en) * 1969-12-10 1972-03-07 Ibm Program scheduler for processing systems
US5410696A (en) * 1992-03-16 1995-04-25 Hitachi, Ltd. Method of processing a program by parallel processing, and a processing unit thereof
US6076174A (en) * 1998-02-19 2000-06-13 United States Of America Scheduling framework for a heterogeneous computer network
US6748593B1 (en) * 2000-02-17 2004-06-08 International Business Machines Corporation Apparatus and method for starvation load balancing using a global run queue in a multiple run queue system
US6988139B1 (en) * 2002-04-26 2006-01-17 Microsoft Corporation Distributed computing of a job corresponding to a plurality of predefined tasks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BIZZARRI, P. ET AL.: 'Planning the Execution of Task Groups in Real-Time Systems.' IEEE *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9244751B2 (en) 2011-05-31 2016-01-26 Hewlett Packard Enterprise Development Lp Estimating a performance parameter of a job having map and reduce tasks after a failure
CN102508720A (en) * 2011-11-29 2012-06-20 中能电力科技开发有限公司 Method for improving efficiency of preprocessing module and efficiency of post-processing module and system
CN102508720B (en) * 2011-11-29 2017-02-22 中能电力科技开发有限公司 Method for improving efficiency of preprocessing module and efficiency of post-processing module and system
CN111464659A (en) * 2020-04-27 2020-07-28 广州虎牙科技有限公司 Node scheduling method, node pre-selection processing method, device, equipment and medium

Also Published As

Publication number Publication date
GB2419693A (en) 2006-05-03
WO2006050349A3 (en) 2009-04-09
GB0423990D0 (en) 2004-12-01

Similar Documents

Publication Publication Date Title
US8234652B2 (en) Performing setup operations for receiving different amounts of data while processors are performing message passing interface tasks
US8108876B2 (en) Modifying an operation of one or more processors executing message passing interface tasks
US8127300B2 (en) Hardware based dynamic load balancing of message passing interface tasks
WO2006050349A2 (en) Methods and apparatus for running applications on computer grids
US8701112B2 (en) Workload scheduling
Shih et al. Performance study of parallel programming on cloud computing environments using mapreduce
Genaud et al. Load-balancing scatter operations for grid computing
Malik et al. Optimistic synchronization of parallel simulations in cloud computing environments
James Scheduling in metacomputing systems
Wang et al. MATRIX: MAny-Task computing execution fabRIc at eXascale
Lu et al. Morpho: a decoupled MapReduce framework for elastic cloud computing
Liu et al. Funcpipe: A pipelined serverless framework for fast and cost-efficient training of deep learning models
Rajendran et al. Matrix: Many-task Computing Execution Frabic for Extreme Scales
Senger Improving scalability of Bag-of-Tasks applications running on master–slave platforms
Díaz et al. Derivation of self-scheduling algorithms for heterogeneous distributed computer systems: Application to internet-based grids of computers
Weissman et al. Integrated scheduling: the best of both worlds
Saule et al. Optimizing the stretch of independent tasks on a cluster: From sequential tasks to moldable tasks
Mohamed et al. DDOps: dual-direction operations for load balancing on non-dedicated heterogeneous distributed systems
Choi et al. A taxonomy of desktop grid systems focusing on scheduling
Abawajy Adaptive hierarchical scheduling policy for enterprise grid computing systems
Polo et al. Adaptive task scheduling for multijob mapreduce environments
Mian et al. Managing data-intensive workloads in a cloud
Moens et al. Management of customizable software-as-a-service in cloud and network environments
Cera Providing adaptability to MPI applications on current parallel architectures
Ghazali et al. CLQLMRS: improving cache locality in MapReduce job scheduling using Q-learning

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05815068

Country of ref document: EP

Kind code of ref document: A2