EP1318453A1 - Scheduling system, method and apparatus for a cluster - Google Patents

Scheduling system, method and apparatus for a cluster

Info

Publication number
EP1318453A1
EP1318453A1 EP01410158A EP01410158A EP1318453A1 EP 1318453 A1 EP1318453 A1 EP 1318453A1 EP 01410158 A EP01410158 A EP 01410158A EP 01410158 A EP01410158 A EP 01410158A EP 1318453 A1 EP1318453 A1 EP 1318453A1
Authority
EP
European Patent Office
Prior art keywords
task
jobs
tasks
nodes
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01410158A
Other languages
German (de)
French (fr)
Inventor
The designation of the inventor has not yet been filed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HP Inc
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Co
Priority to EP01410158A
Priority to US10/313,903 (US20030135621A1)
Publication of EP1318453A1
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Definitions

  • The present invention relates to task scheduling in multicomputer systems having a plurality of nodes, that is, a network of processors having independent processors and memories capable of executing different instruction streams simultaneously. More particularly, although not exclusively, the present invention relates to inter- and intra-job scheduling of parallel programs on a heterogeneous cluster of computational resources.
  • the improved scheduling protocol may particularly, but without limitation, be applicable to the control and execution of application programs executing in a cluster of heterogeneous computers.
  • A cluster typically comprises a loosely coupled network of computers having independent processors and memory capable of executing different instruction streams simultaneously.
  • a network provides inter-processor communication in the cluster.
  • Applications that are distributed across the processors of the cluster use either message passing or network shared memory for communication.
  • Programs are often parallelised using MPI libraries for inter-processor communication.
  • a critical aspect of a cluster system is task scheduling.
  • Task schedulers manage the execution of independent jobs or batches of jobs, in support of an application program.
  • An application program performs a specific function for a user.
  • Application programs particularly suited to parallel cluster systems are those with a high degree of mathematical complexity, interdependency and raw microprocessor demand. Examples include finite-element analysis, nuclear and sub-nuclear scattering calculations and data analysis, and multi-dimensional modeling calculations involving sequential or heuristic approaches that typically consume large numbers of microprocessor cycles.
  • One of the primary functions of the task scheduler is to optimize the allocation of available microprocessor resources across a plurality of prioritized jobs.
  • optimizing task scheduling can lead to significant improvements in the apparent processing power or speed of the cluster.
  • Known task-scheduling techniques tend to treat parallel application programs as distinctive monolithic blocks or groups of monolithic blocks whose width corresponds to the number of processors used by the program and whose height represents the estimated computational time for the program.
  • These jobs are organized in a logical structure called a precedent tree or data flow graph, which acts as a constraint on how the parallel program tasks are distributed across the cluster.
  • This scheduling policy approach conceals the parallel program's (or job's) elementary processes (or tasks), and the effect of this is that the parallel program does not constantly utilize the entire number of processors, or nodes, that are, or could be, assigned to it. Idle processors that are not available for use by other jobs in a different parallel application program can thus degrade the apparent throughput of the parallel processing system.
  • optimization of the task-scheduler can therefore lead to significant enhancements in the processing power and speed of a parallel processing cluster and it is an object of the present invention to provide an improved task-scheduling technique that overcomes or at least ameliorates the abovementioned problems.
  • The invention provides for a method of optimizing a task-scheduling system comprising decomposing one or more parallel programs into their component tasks and dynamically moving the parallel programs' tasks into any available idle nodes in such a way that the execution time of the parallel program is decreased.
  • the invention provides for a method of optimizing a task-scheduling system comprising representing one or more parallel programs, or jobs, as unitary two-dimensional blocks equating to the amount of time that the job will take to execute for a specified number of processors, or nodes, wherein the jobs are queued in an array whose width corresponds to the total number of available nodes in any single time interval, wherein each job is positioned in the array according to a block-packing algorithm.
  • the block-packing algorithm is preferably such that the packing of the jobs at the block level is substantially optimized for any arrangement of jobs in the array.
  • the method further includes the step of decomposing one or more jobs into their component time-unitary tasks and dynamically redistributing the tasks into any available idle nodes in such a way as to exploit any idle nodes within the structure of any of the jobs in the array thereby decreasing the execution time of at least one of the jobs.
  • the width of the block represents the needed computational power and the height of the block corresponds to the expected or required duration of the job.
  • the array may be represented by a bin having a horizontal, equally dimensioned array of nodes, and a vertically, equally spaced, time increment.
  • the array may be represented by a bin having a horizontal, unequally dimensioned, array of nodes, and/or a vertically, unequally spaced, time increment.
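The block-level queueing described above can be sketched as follows. This is a minimal illustration only: the text does not specify the block-packing algorithm, so the sketch assumes a homogeneous bin of `total_nodes` equal columns and a simple greedy "skyline" rule (lowest start time, leftmost on ties). The names `Job` and `pack_jobs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    width: int   # number of nodes requested
    height: int  # expected duration in time units


def pack_jobs(jobs, total_nodes):
    """Greedy skyline packing (an assumed rule, not the patent's algorithm).

    Returns ({job name: (first_node, start_time)}, makespan).
    """
    skyline = [0] * total_nodes  # next free time slot for each node
    placement = {}
    for job in jobs:  # jobs are taken in arrival order
        if job.width > total_nodes:
            raise ValueError(f"{job.name} is wider than the cluster")
        best_start, best_node = None, None
        # Try every contiguous window of nodes and keep the lowest start.
        for node in range(total_nodes - job.width + 1):
            start = max(skyline[node:node + job.width])
            if best_start is None or start < best_start:
                best_start, best_node = start, node
        for node in range(best_node, best_node + job.width):
            skyline[node] = best_start + job.height
        placement[job.name] = (best_node, best_start)
    return placement, max(skyline)


jobs = [Job("A", 3, 2), Job("B", 2, 1), Job("C", 4, 1)]
placement, makespan = pack_jobs(jobs, total_nodes=5)
print(placement, makespan)
```

In this example job B drops beside job A, while job C must wait until four adjacent nodes are free, which leaves idle cells under it.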
  • the invention provides for a method of creating and/or modifying a data flow graph in a parallel multicomputer system, comprising the steps of:
  • The tasks may depart from time-unitary duration, thus representing tasks that require varying computational power; when queued, they are represented as vertically distorted tasks.
  • the horizontal axis of the queue bin representing the nodes may be unequally dimensioned, thus representing a heterogeneous cluster of nodes where some nodes have different computational power.
  • the resulting data flow graph includes tasks that have an apparent difference in duration.
  • the allocation of tasks to holes is adapted to take into account the apparent time-distortion of the tasks.
  • the modification of the data flow graph is adapted to take into account the time required by the processor to change its working context.
  • the tasks may be distorted in the time axis to allow for overduration representing the time needed for the processor to change working context.
  • the invention also provides for a network of computing resources adapted to operate in accordance with the method as hereinbefore defined.
  • the invention also provides for a computing device adapted to schedule tasks on a cluster in accordance with the method as hereinbefore defined.
  • the present invention will be described in the context of a cluster similar to that shown in figure 1.
  • This cluster represents a test-bed which has undergone trials by the applicant and consists of 225 Hewlett Packard e-Vectras, each including a PIII microprocessor running at 733 MHz, 256 MB of RAM and a 15 GB hard disk. The machines are linked together by a Fast Ethernet network.
  • The cluster runs Linux Mandrake 7.1 and is managed by Open PBS. In this configuration, the cluster exhibits a capacity of 81.6 Gflops.
  • a convenient first level of abstraction is to consider the group of computers as simply a processor cluster. Thus, in the following discussion, reference will be made to either a cluster or a processor cluster. More realistic treatments will include the effects of RAM, disk resources, administration hardware and other support devices in the cluster.
  • A useful model for visualizing the principle and operation of an embodiment of the invention is illustrated in figure 2.
  • The time-dependent power characteristics of a cluster can be represented by a glass into which cubes of ice, or jobs, are dropped.
  • the glass is analogous to a bin queue which can be described as follows, and is named with reference to the use of the expression 'bins' in signal processing technologies.
  • a bin queue can be represented by the two-dimensional Gantt chart shown in the right of figure 2. This has a horizontal axis corresponding to an evenly spaced array (bins) of microprocessors, nodes or other unit of computing capacity.
  • the vertical axis represents an evenly spaced time scale wherein each unitary step corresponds to a unit of time. For a homogeneous computer cluster, this can correspond to a unit of calculation time or processor cycle time.
  • This representation can be referred to as a bin queue as it represents a plurality of microprocessor 'bins' into which are queued or allocated parallel processing applications or jobs. It is also appropriate to refer to these constructs as Gantt charts as they represent the time progress of a sequence of interrelated tasks.
  • the jobs are initially considered to be solid entities having no internal structure or adaptability to the glass.
  • Each job has a width corresponding to the anticipated number of processors that are needed and the height corresponds to the expected or required execution time for the job.
  • The jobs, like the ice cubes in the glass, arrange themselves in a configuration that attempts to make best use of the available space.
  • There are algorithms for achieving such a packing arrangement, and these attempt to pack a series of regularly shaped blocks efficiently, in a time-sequenced fashion, into a bin.
  • Other packing paradigms may be known in the art and are not to be excluded from the scope of the invention.
  • the internal structure of the jobs can be represented by drops of water associated with the cubes. These are identified with the elementary units, or 'tasks' of the job. As can be seen from figure 2, tasks constitute the structure of a notional block that in turn represents the coarse structure of the job as a whole.
  • the complexity of a job can be quantified by its internal computational complexity as well as the time which is required to do the job. This can be quantified by a recursive analysis of the tasks. That is, the function inside the task, the function inside the function etc.
  • Constraints that affect the job itself include the duration of the tasks in the job and the extent of knowledge of the data flow graph. In the case of online scheduling, the data flow is determined continuously, whereas in the case of offline scheduling it is completely determined.
  • A task's degree reflects its interconnectedness. That is, the higher the degree, the more branches a node has connected thereto.
  • the first exemplary embodiment which is described below will focus on a novel task scheduling system for a static, homogeneous cluster of processors where inter-processor communication and data unpacking time is negligible.
  • the latter issues referred to above will be discussed by reference to a modified form of the exemplary embodiment.
  • the jobs have an internal structure comprising unitary tasks, represented by the circles in the Gantt chart, and vacant power/time space.
  • the internal structure of the jobs will depend on the specific parallel application represented by the job. However, it can be seen that there is idle capacity within the internal task structure of the jobs.
  • a Gantt chart of a bin queue is shown. It is assumed that the jobs, represented by the blocks, have been submitted and characteristics of these jobs (arrival time, expected duration, computational power needs) are known and stored in a database.
  • the jobs are placed in the Gantt chart as shown according to a packing algorithm.
  • As the jobs are 'dropped' (grey blocks in figure 3(a)) into the queue, they define holes, or units of idle time (see figure 3(b)). These units of idle time are logically represented as holes having the same time and power dimensions as the detailed task structure elements of the jobs themselves. Creating a logical representation H of the gross hole structure in the data structure completes the first phase in the process.
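One way to picture the hole structure H is as the set of idle (time, node) cells lying beneath the top of the schedule. The sketch below assumes a homogeneous bin queue with unit time steps and represents occupied cells as a set of tuples. The function `find_holes` and this cell representation are illustrative assumptions, not the data structure used by the applicant.

```python
def find_holes(occupied, total_nodes):
    """Return the sorted idle (time, node) cells below the makespan.

    occupied: set of (time, node) cells covered by queued job blocks.
    """
    if not occupied:
        return []
    makespan = max(time for time, _ in occupied) + 1
    return sorted(
        (time, node)
        for time in range(makespan)
        for node in range(total_nodes)
        if (time, node) not in occupied
    )


# Job A covers nodes 0-1 for two time units; job B then covers nodes 0-2.
job_a = {(t, n) for t in range(2) for n in range(2)}
job_b = {(2, n) for n in range(3)}
print(find_holes(job_a | job_b, total_nodes=3))  # [(0, 2), (1, 2)]
```

Because job B needs three nodes, it cannot start until job A finishes, so node 2 stays idle for the first two time units.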
  • the second phase begins by refining the schedule.
  • Each of the jobs on the job list is scanned in order from the first to arrive to the last, and a set of tasks is created. This is analogous to decomposing the jobs into their unitary task structure at a detailed level while including time dependency information for the tasks.
  • This functionality may be handled by an external application that builds a data flow graph.
  • A new data structure is created which stores the assignment of each of the tasks, i.e. the node identification, job relationship and time sequence of the tasks.
  • The result of this phase is that new holes are added into the data structure as shown in Figures 3(c) and (d).
  • the cluster is homogeneous and static. Therefore, the width of each of the bins in the bin queue is constant. Also, as each of the task time demands is assumed to be the same, the vertical axis is constant.
  • Figure 3(d) illustrates the completed data structure H that represents all of the idle holes.
  • Each task in the Gantt chart is then rescanned from bottom to top, i.e. from the first to the last job, and it is determined whether it is possible to move the tasks down in the schedule. This is done by comparing the position of the task with the position of the lowest holes in the data structure. If the hole is lower than the task, the task is dropped into the hole and the data structure is updated. The second phase is completed when all of the tasks have been scanned and, where possible, moved or promoted.
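The second-phase scan can be sketched as below for the homogeneous, unit-duration case. This is an illustration under stated assumptions rather than the applicant's implementation: the schedule is a dictionary from task name to a (time, node) cell, and the time-dependency information is modelled by a hypothetical `deps` mapping from each task to its predecessors, so that a task is never promoted to a row at or below a predecessor.

```python
def promote_tasks(schedule, total_nodes, deps=None):
    """Move unit-duration tasks down into lower idle cells where allowed.

    schedule: {task: (time, node)}; deps: {task: iterable of predecessors}.
    Returns a new schedule; the input is left unchanged.
    """
    deps = deps or {}
    result = dict(schedule)
    occupied = {cell: task for task, cell in result.items()}
    # Scan from bottom to top, and left to right within a row.
    for task in sorted(schedule, key=lambda name: schedule[name]):
        time, node = result[task]
        # A task may not start before all of its predecessors have finished.
        earliest = max((result[p][0] + 1 for p in deps.get(task, ())), default=0)
        # Lowest, then leftmost, free cell strictly below the task.
        target = next(
            ((t, n)
             for t in range(earliest, time)
             for n in range(total_nodes)
             if (t, n) not in occupied),
            None,
        )
        if target is not None:
            del occupied[(time, node)]
            occupied[target] = task
            result[task] = target
    return result


before = {"a1": (0, 0), "a2": (1, 0), "a3": (1, 1),
          "b1": (2, 0), "b2": (3, 0), "b3": (3, 1)}
deps = {"a2": ["a1"], "a3": ["a1"], "b2": ["b1"], "b3": ["b1"]}
after = promote_tasks(before, total_nodes=2, deps=deps)
print(max(t for t, _ in after.values()) + 1)  # makespan falls from 4 to 3
```

Here the initiating task of the second job drops into the idle cell beside the first job's initiating task, which in turn lets the second job's remaining tasks move down one row.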
  • the advantage derived from the optimal case is calculated.
  • This optimal situation can be represented by a data structure in which all of the scheduled jobs use all of the available processors all of the time. That is, the width of each job is equal to the width of the Gantt chart. This situation is illustrated in figure 4. At the end of the first phase, the data structure is full with no holes resulting from any non-ideal packing.
  • If the granularity is changed to task level (see figure 5(a) and (b)), it is possible to look at the internal data structure of the jobs. Intuitively, one would think that the data structure would not be empty, as the hypothetical nature of distributed computing applications implies that there should exist a master process that is responsible for initiating the other downstream processes as well as a process which detects the termination of the parallel program. This situation is represented in figure 5, where the job in 5(a), which ostensibly resembles full utilization of processors over time, is decomposed into its constituent tasks. When considered at the higher degree of granularity, figure 5(b) shows a task tree for an idealized distributed processor program.
  • The first time period, that is, the first row, is consumed by the task that initiates the rest of the parallel program.
  • the job finishes with a task that detects the end of the job.
  • These holes cannot be populated by tasks from within the job, as the lower holes cannot 'know' about anything happening after the job starts and the upper holes are created after the job is finished. Thus there is what is known as incompressible idle time at the beginning and end of the job.
  • The correlation between the job number N, the width of the bin queue L and the advantage between phase 1 and 2 on a quantity denoted g_m and the amount of idle time g_i can be determined. This is achieved by looking at a sequence of, for example, 4 idealized jobs as shown in figure 6.
  • the advantage shown in figure 6(c) can be obtained.
  • the jobs are decomposed as shown in figure 6(b) in the first phase, and the tasks then scanned from bottom to top and, for convenience, left to right.
  • the first task in the second job can populate a hole in the first job, for example the bottom right.
  • This allows tasks in the second row of the second job to populate upper incompressible idle time in the first job.
  • the data structure shown in figure 6(c). This approach is of limited use when trying to find a general relation that specifies the advantage, so it is useful to consider task promotion that leaves a symmetrical data structure.
  • This scenario is shown in figure 7 whereby after phase 1, the tasks are rearranged in a symmetrical manner as in figure 7(a) and then compressed as shown in figure 7(c).
  • the formula provides the following analysis. Initially, the makespan is equal to 15 time units. After refinement, it becomes 9, so the advantage is
  • N % L corresponds to N modulo L. If a job of smaller width is considered, as shown in figure 8, and if it is assumed that all jobs require a minimum of two processors, then grouping the 'beginner' tasks together at the bottom of the Gantt chart and the ending tasks together gives the result in figure 8. Contrary to the previous situation, the height is not conserved when an equivalent Gantt chart is built by grouping beginner and ending tasks together. Also, some tasks appear in the middle of the bin queue, as the job blocks only incompletely fill the available width.
  • a more complex embodiment can be considered if the situation is considered where the tasks can have non-unitary duration. That is, the expected duration of the tasks or the time required for the tasks to run on a particular processor varies between tasks in the job.
  • This situation is illustrated by the two job structures shown in figure 9.
  • a unitary task job is shown in the left in figure 9.
  • the radius of the tasks is increased as in the job block in the right of figure 9.
  • As the cluster is assumed to be homogeneous in this case, the task nodes must be distorted to make them 'fit' into the logical delimitation of the processor to which they are assigned. It is assumed that conservation of surface of the task applies, that is, the number of instructions in the task is constant.
  • the expected duration of each task must be determined, the data structure must be capable of storing the expected duration for each task and the reallocation of the tasks must check that the reorganized data structure can accommodate the distorted internal structure of the job.
  • The cluster may include computers whose processors or other hardware give them varying speeds and capacities.
  • the nodes may have different speeds reflecting different processor speeds, cache memory or similar.
  • the horizontal axis of the bin queue is distorted to take into account the relative different processing power of the nodes.
  • Where the tasks have the same duration but are performed on processors of different speeds, the 'apparent' task duration changes.
  • A job having the internal unitary task structure is queued in the Gantt chart by distorting the tasks horizontally. This results in additional apparent holes that affect the granularity of the job and thus its amenability to population by tasks in later jobs.
  • the jobs are always considered as rectangles where the height corresponds to the maximum duration required to perform the parallel program. That is, some tasks assigned to a powerful machine will finish before other tasks even if they are on the same level in the data flow graph.
  • the border of the parallel program is not a rectangle but distorted where the upper edge is not a line or segment, but a set of segments.
  • the first situation is relatively straightforward as the first phase of the algorithm is preserved.
  • The second phase is changed to take into account the differences between the computational nodes when a task is to be promoted. This is done by searching for the lowest holes in the data structure. If there is a hole below the task being considered, the duration of the task is multiplied by the coefficient of the processor corresponding to the hole to obtain a duration d'. This is compared with the duration of the hole to determine if there is sufficient space to place the task in it. If there is enough space, the task is moved and the data structure updated. If there is insufficient space, another hole is tested for suitability. If the task cannot be moved, the process moves to the next task and the procedure is repeated.
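The heterogeneous promotion test can be sketched as follows. The sketch assumes that each node has a speed coefficient by which a task's nominal duration is multiplied (a value above 1 meaning a slower node), and that holes are records with a node, a start time and a duration. The names `Hole`, `find_hole_for_task` and `coefficient` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hole:
    node: int
    start: float
    duration: float


def find_hole_for_task(task_start, task_duration, holes, coefficient):
    """Return (hole, d_prime) for the lowest suitable hole, or None.

    coefficient: {node: multiplier applied to the nominal task duration}.
    """
    for hole in sorted(holes, key=lambda h: h.start):  # lowest holes first
        if hole.start >= task_start:
            break  # remaining holes are not below the task
        d_prime = task_duration * coefficient[hole.node]  # apparent duration d'
        if d_prime <= hole.duration:
            return hole, d_prime
    return None


coefficient = {0: 1.0, 1: 2.0, 2: 0.5}
holes = [Hole(1, 0.0, 1.5), Hole(2, 1.0, 1.0), Hole(0, 4.0, 3.0)]
print(find_hole_for_task(3.0, 1.0, holes, coefficient))
```

In this example the lowest hole is rejected because the task would take 2.0 time units on the slow node 1, and the task is placed in the next hole, on the fast node 2, with d' = 0.5.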
  • the second case is more complex as the first phase algorithm needs to be changed to take into account the shape that represents the jobs.
  • the job shape will be irregular and thus the packing algorithm will be more complicated thus increasing the complexity of the jobs.
  • The time required by the processor to change its working context is taken into account. This situation is shown in figures 13 and 14. This period equates to the time between saving a task's data and program of a first job and loading the data and program for the next task or program of another job. Considering a job containing a plurality of tasks, context change time compensation will not be required if there is no alteration in the working context. If there is a change of processor working context, time will be added before the task to allow for the saving and loading of data and the program by the processor. Introducing this "overduration" approximates more closely a real situation, as it is possible that context changing may affect the operation of a processor cluster, particularly where the cluster is heterogeneous.
  • The first and second phase steps need only be modified slightly. If the cluster is homogeneous, the first phase is unchanged as there is only a change of working context at the beginning of each job. Therefore, it is only necessary to include an overduration at the beginning of (i.e. under) the rectangle representing the job. If the cluster is heterogeneous, the overduration will change depending on the power of the node. This can be taken into account by the first phase protocol.
  • The duration of the task is multiplied by the coefficient of the processor corresponding to the hole to obtain a duration d'.
  • The overduration is added to d', thereby obtaining d''.
  • This value is compared with the duration of the hole to determine if there is sufficient space in it to place the task. If there is enough space, the task is promoted and the data structure updated. If not, a new hole is found and the procedure is repeated. If the task cannot be moved, the procedure moves to the next task.
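The fit test with context-change compensation can be sketched as below. It assumes that the overduration is simply added to the scaled duration d' to give d'', and that no overduration is needed when the working context does not change. The function name and its parameters are illustrative assumptions.

```python
def fits_with_context_switch(duration, coefficient, overduration,
                             hole_duration, same_context=False):
    """Check whether a task fits in a hole once overduration is included."""
    d_prime = duration * coefficient  # duration scaled to the hole's node
    # No compensation is needed if the working context is unchanged.
    d_double_prime = d_prime if same_context else d_prime + overduration
    return d_double_prime <= hole_duration


print(fits_with_context_switch(1.0, 1.5, 0.25, hole_duration=2.0))  # True
print(fits_with_context_switch(1.0, 1.5, 0.75, hole_duration=2.0))  # False
```

The second call fails only because of the overduration: the same task would fit if it belonged to the job already running on that node.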
  • Figure 13(a) shows a hole between tasks of the same job.
  • the task in figure 13(b) is to be moved to this hole and it is therefore necessary to add an overduration (figure 13(c)) to take into account the change in the working context of the processor. It is then determined whether or not the distorted task can be placed in the hole, and the data structure updated in response to any move.
  • Another embodiment is where inter-processor communication time is non-zero. This situation is shown in figure 14 whereby the synchronization barrier causes idle time to be introduced between tasks.
  • the invention can be modified to take into account this extra idle time with the reallocation or promotion of tasks depending on the modified internal job structure.
  • a disturbance credit reflects the degree of disturbance that a user causes by introducing a higher priority job into the cluster processing stream.
  • a transfer of disturbance credits results from a user-provoked disturbance in the bin queue whereby a 'wronged' user gains these credits when their job is adversely affected.
  • Figure 15 illustrates an example where a new job having a duration of two units is dropped into a bin queue. The simplest case is shown where the arrival of this job does not introduce a disturbance in the previously computed schedule and the starting time of this job equals the earliest time where two processors are available.
  • Figure 16 shows a more complicated situation where the job has an inherently higher priority and is to be promoted or advanced by one time unit. To allow this, there are two possibilities shown in figure 16. In the Gantt chart at the left, job A might be retarded and in the right, jobs B and C might be retarded. If the resulting schedules are analyzed, it can be seen that the second schedule has introduced less disturbance because the total idle times are lower.
  • the number of jobs moved is greater.
  • the jobs which are moved are owned by users who have been disadvantaged. They are compensated by receiving a disturbance credit that reflects the degree of disturbance. This can be quantified according to the width of the job and the vertical displacement which it undergoes. For example, job A which is 5 units wide, when delayed 2 time units, would accumulate a disturbance credit of 10 units.
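The credit calculation described here is a product of job width and vertical displacement, and can be sketched as follows. The ledger that accumulates credits per user is an assumption added for illustration, and any debit applied to the user who caused the disturbance is not modelled. The names `disturbance_credit` and `award_credits` are hypothetical.

```python
from collections import Counter


def disturbance_credit(job_width, delay):
    """Credit for a delayed job: width in nodes times delay in time units."""
    if job_width < 0 or delay < 0:
        raise ValueError("width and delay must be non-negative")
    return job_width * delay


def award_credits(ledger, delayed_jobs):
    """Add credits to the owners of delayed jobs.

    delayed_jobs: iterable of (owner, job_width, delay) tuples.
    """
    for owner, width, delay in delayed_jobs:
        ledger[owner] += disturbance_credit(width, delay)
    return ledger


ledger = award_credits(Counter(), [("owner_of_A", 5, 2)])
print(ledger["owner_of_A"])  # 10, matching the job A example in the text
```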
  • jobs can have a higher degree of granularity.
  • some jobs can be absorbed by an existing schedule due to the presence of holes having a size equivalent to that of the tasks. Jobs can also be represented by shapes other than rectangles as the set of tasks in the job does not necessarily take up all of the perimeter of the shape. Given this situation, the introduction of the new jobs or set of tasks does not necessarily cause an automatic disturbance. In fact, when such a job is introduced into the scheme, the user can earn new disturbance credits.
  • the invention provides a new approach to task scheduling in clusters.
  • the technique is extensible and can be refined to take into account real-world behavior and attributes of processor clusters such as finite inter-processor communication time and context changing time as well as being amenable to use in heterogeneous clusters. It is envisaged that there are further extensions and modifications that will be developed, however it is considered that these will retain the inventive technique as described herein.
  • The task scheduling technique would be particularly useful in multi-user processor clusters running applications such as finite element analysis, computationally intensive numerical calculations, modeling and statistical analysis of experimental data.

Abstract

The invention provides for a method of optimizing a task-scheduling system where the method comprises decomposing one or more parallel programs into their component tasks and dynamically redistributing the parallel programs' tasks into any available idle nodes in such a way that the execution time of the parallel program is decreased. The parallel programs, or jobs, may be represented as unitary two-dimensional blocks equating to the amount of time that the job will take to execute for a specified number of processors, or nodes, wherein the jobs are queued in, or dropped into, an array whose width corresponds to the total number of available nodes in any single time interval. In one embodiment, the first phase of the technique may implement an algorithm to position each job in the array. The invention also provides extensions to take into account real-world behavior such as finite inter-processor communication time and context switching between jobs. Applications include finite element analysis, computationally intensive numerical calculations, modeling and statistical analysis of experimental data.

Description

    Technical Field
  • The present invention relates to task scheduling in multicomputer systems having a plurality of nodes, that is, a network of processors having independent processors and memories capable of executing different instruction streams simultaneously. More particularly, although not exclusively, the present invention relates to inter- and intra-job scheduling of parallel programs on a heterogeneous cluster of computational resources. The improved scheduling protocol may particularly, but without limitation, be applicable to the control and execution of application programs executing in a cluster of heterogeneous computers.
    Background Art
  • Improvements in microprocessors, memory, buses, high-speed networks and software have made it possible to assemble groups of relatively inexpensive commodity-off-the-shelf (COTS) components having processing power rivaling that of supercomputers. This has had the effect of pushing development in parallel computing away from specialized platforms such as the Cray/SGI to cheaper, general-purpose systems or clusters consisting of loosely coupled components built from single or multi-processor workstations or PCs. Such an approach can provide a substantial advantage, as it is now possible to build relatively inexpensive platforms that are suitable for a large class of applications and workloads.
  • A cluster typically comprises a loosely coupled network of computers having independent processors and memory capable of executing different instruction streams simultaneously. A network provides inter-processor communication in the cluster. Applications that are distributed across the processors of the cluster use either message passing or network shared memory for communication. Programs are often parallelised using MPI libraries for inter-processor communication.
  • It has also been proposed to use conventionally networked computing resources to carry out cluster-style computational tasks. According to a version of this model, jobs are distributed across a number of computers in order to exploit idle time, for example while a network of PCs is unused out of business hours. Discussions related to clusters may be applied equally to loosely coupled heterogeneous networks of computers. Other types of clustered computer resources may include what are known as "blade" systems. This latter cluster topology is not necessarily distributed physically, but may nevertheless be operated as a homogeneous or heterogeneous processor cluster.
  • A critical aspect of a cluster system is task scheduling. A number of task scheduling systems exist in the prior art with many of these existing within operating systems designed for single processor computer systems or multiple processor systems with operating systems designed for shared memory.
  • Task schedulers manage the execution of independent jobs or batches of jobs, in support of an application program. An application program performs a specific function for a user. Application programs particularly suited to parallel cluster systems are those with a high degree of mathematical complexity, interdependency and raw microprocessor demand. Examples include finite-element analysis, nuclear and sub-nuclear scattering calculations and data analysis and multi-dimensional modeling calculations involving sequential or heuristic approaches that typically consume large numbers of microprocessor cycles.
  • One of the primary functions of the task scheduler is to optimize the allocation of available microprocessor resources across a plurality of prioritized jobs. Thus, optimizing task scheduling can lead to significant improvements in the apparent processing power or speed of the cluster.
  • Known task-scheduling techniques tend to treat parallel application programs as distinctive monolithic blocks or groups of monolithic blocks whose width corresponds to the number of processors used by the program and whose height represents the estimated computational time for the program. These jobs are organized in a logical structure called a precedent tree or data flow graph, which acts as a constraint on how the parallel program tasks are distributed across the cluster. This scheduling policy conceals the parallel program's (or job's) elementary processes (or tasks), with the effect that the parallel program does not constantly utilize the entire number of processors, or nodes, that are, or could be, assigned to it. Idle processors that are not available for use by other jobs in a different parallel application program can thus degrade the apparent throughput of the parallel processing system.
  • Optimization of the task-scheduler can therefore lead to significant enhancements in the processing power and speed of a parallel processing cluster and it is an object of the present invention to provide an improved task-scheduling technique that overcomes or at least ameliorates the abovementioned problems.
  • Disclosure of the Invention
  • In one aspect, the invention provides for a method of optimizing a task-scheduling system comprising decomposing one or more parallel programs into their component tasks and dynamically moving the parallel programs' tasks into any available idle nodes in such a way that the execution time of the parallel program is decreased.
  • In an alternative aspect the invention provides for a method of optimizing a task-scheduling system comprising representing one or more parallel programs, or jobs, as unitary two-dimensional blocks equating to the amount of time that the job will take to execute for a specified number of processors, or nodes, wherein the jobs are queued in an array whose width corresponds to the total number of available nodes in any single time interval, wherein each job is positioned in the array according to a block-packing algorithm.
  • The block-packing algorithm is preferably such that the packing of the jobs at the block level is substantially optimized for any arrangement of jobs in the array.
  • Preferably, the method further includes the step of decomposing one or more jobs into their component time-unitary tasks and dynamically redistributing the tasks into any available idle nodes in such a way as to exploit any idle nodes within the structure of any of the jobs in the array thereby decreasing the execution time of at least one of the jobs.
  • Preferably, the width of the block represents the needed computational power and the height of the block corresponds to the expected or required duration of the job.
  • To represent a homogeneous cluster of nodes, the array may be represented by a bin having a horizontal, equally dimensioned array of nodes, and a vertically, equally spaced, time increment.
  • To represent a heterogeneous cluster of nodes, the array may be represented by a bin having a horizontal, unequally dimensioned, array of nodes, and/or a vertically, unequally spaced, time increment.
  • In an alternative aspect, the invention provides for a method of creating and/or modifying a data flow graph in a parallel multicomputer system, comprising the steps of:
    • characterizing one or more jobs in terms of expected execution duration and computational power needs;
    • placing the jobs in a queue, the queue viewed as a two-dimensional array of nodes and time, according to a bin-packing algorithm;
    • locating idle times, or holes, within the jobs;
    • scanning each of the jobs in order to build a data flow graph which includes reference to the holes;
    • scanning the queue from the earliest to the last, and attempting to move each task down in the queue by analyzing the position of each task in comparison to the position of the lowest holes in the data structure and, if the hole is lower than the task, moving the task in the queue to fill the hole and thus updating the data flow graph; and
    • repeating the scanning process until the maximum number of available holes have been filled and a modified data flow graph has been created.
  • In an alternative embodiment, the tasks may have variable duration from time-unitary, thus representing tasks that require varying computational power and when queued, are represented as vertically distorted tasks.
  • In yet an alternative embodiment, the horizontal axis of the queue bin representing the nodes may be unequally dimensioned, thus representing a heterogeneous cluster of nodes where some nodes have different computational power.
  • Where the nodes are unequally spaced, the resulting data flow graph includes tasks that have an apparent difference in duration.
  • In the heterogeneous node case, the allocation of tasks to holes is adapted to take into account the apparent time-distortion of the tasks.
  • In yet a further embodiment, the modification of the data flow graph is adapted to take into account the time required by the processor to change its working context.
  • When the change in working context is taken into account, the tasks may be distorted in the time axis to allow for overduration representing the time needed for the processor to change working context.
  • The invention also provides for a network of computing resources adapted to operate in accordance with the method as hereinbefore defined.
  • The invention also provides for a computing device adapted to schedule tasks on a cluster in accordance with the method as hereinbefore defined.
  • Brief Description of the Drawings
  • The present invention will now be described by way of example only and with reference to the drawings in which:
  • Figure 1:
    Illustrates an example of an embodiment of a topology of a cluster;
    Figure 2:
    Illustrates the analogy between ice-cubes in a glass and job placement in a queued bin;
    Figure 3:
    Illustrates a sequence of Gantt charts for a bin queue and their corresponding hole representation showing the task structure within an array of jobs for a two phase scheduling system according to an embodiment of the invention;
    Figure 4:
    Illustrates a simple example of a Gantt chart of a bin queue showing how a hole representation for consecutive monolithic jobs is represented in phase one, which in this case has no holes;
    Figure 5:
    Illustrates a hypothetical job and its internal task structure;
    Figure 6:
    Illustrates a sequence of Gantt charts for a bin queue showing reallocation of holes in the job structures over two phases for a sequence of idealized jobs;
    Figure 7:
    Illustrates a rearrangement model for calculating the advantage gained by the scheduling method;
    Figure 8:
    Illustrates an example of a more complex task scheduling procedure;
    Figure 9:
    Illustrates how the task size is varied for tasks having variable complexity requirements;
    Figure 10:
    Illustrates how non-unitary tasks in a job are distorted when queued in a homogeneous cluster data structure;
    Figure 11:
    Illustrates how the width of the nodes in the data graph may be varied to represent cluster heterogeneity;
    Figure 12:
    Illustrates the assignment of a unitary-task job to a heterogeneous cluster and how the tasks are distorted;
    Figure 13:
    Illustrates a method by which program context modification may be taken into account;
    Figure 14:
    Illustrates how non-negligible processor communication time may be represented in a job;
    Figure 15:
    Illustrates a prioritization scheme applicable to the invention;
    Figure 16:
    Illustrates how the prioritization scheme can reallocate jobs in a bin queue; and
    Figure 17:
    Illustrates how a task can enter the bin queue without causing any disruption.
  • Best Mode for Carrying Out the Invention
  • The present invention will be described in the context of a cluster similar to that shown in figure 1. This cluster represents a test-bed which has undergone trials by the applicant and consists of 225 Hewlett Packard e-Vectras each including a PIII microprocessor running at 733 MHz, 256 MB of RAM and a 15 GB hard disk. The machines are linked together by a fast Ethernet network. The cluster runs Linux Mandrake 7.1 and is managed by Open PBS. In this configuration, the cluster exhibits a capacity of 81.6 Gflops. A convenient first level of abstraction is to consider the group of computers as simply a processor cluster. Thus, in the following discussion, reference will be made to either a cluster or a processor cluster. More realistic treatments will include the effects of RAM, disk resources, administration hardware and other support devices in the cluster.
  • A useful model for visualizing the principle and operation of an embodiment of the invention is illustrated in figure 2. According to this model, the time-dependent power characteristics of a cluster can be represented by a glass into which cubes of ice, or jobs, are dropped. The glass is analogous to a bin queue which can be described as follows, and is named with reference to the use of the expression 'bins' in signal processing technologies.
  • The simplest example of a bin queue can be represented by the two-dimensional Gantt chart shown in the right of figure 2. This has a horizontal axis corresponding to an evenly spaced array (bins) of microprocessors, nodes or other unit of computing capacity. The vertical axis represents an evenly spaced time scale wherein each unitary step corresponds to a unit of time. For a homogeneous computer cluster, this can correspond to a unit of calculation time or processor cycle time. This representation can be referred to as a bin queue as it represents a plurality of microprocessor 'bins' into which are queued or allocated parallel processing applications or jobs. It is also appropriate to refer to these constructs as Gantt charts as they represent the time progress of a sequence of interrelated tasks.
  • Again referring to figure 2, into the bin queue are 'dropped' parallel programs or 'jobs'. By analogy with ice-cubes, the jobs are initially considered to be solid entities having no internal structure or adaptability to the glass. Each job has a width corresponding to the anticipated number of processors that are needed and the height corresponds to the expected or required execution time for the job. In this first phase, the jobs, like the ice-cubes in the glass, arrange themselves in a configuration that attempts to make best use of the available space. There are a number of algorithms for achieving such a packing arrangement and these attempt to efficiently pack a series of regularly shaped blocks in a time-sequenced fashion, into a bin. Other packing paradigms may be known in the art and are not to be excluded from the scope of the invention.
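  • By way of illustration only, the first phase may be sketched as follows. The description does not name a particular packing algorithm, so this sketch assumes a simple "lowest start time, then leftmost node" first-fit rule; the names `pack_jobs`, `width` and `height` are hypothetical.

```python
def pack_jobs(jobs, num_nodes):
    """Place rectangular jobs in a bin queue, lowest start time first.

    jobs: list of (name, width, height), where width is the number of
    nodes requested and height is the expected duration in time units.
    Returns {name: (start_time, first_node)}.
    """
    free_at = [0] * num_nodes  # earliest time at which each node is free
    placement = {}
    for name, width, height in jobs:
        if width > num_nodes:
            raise ValueError(f"job {name} needs more nodes than the cluster has")
        best = None
        # Try every contiguous window of `width` nodes and keep the lowest.
        for first in range(num_nodes - width + 1):
            start = max(free_at[first:first + width])
            if best is None or start < best[0]:
                best = (start, first)
        start, first = best
        for node in range(first, first + width):
            free_at[node] = start + height
        placement[name] = (start, first)
    return placement


placement = pack_jobs([("A", 2, 3), ("B", 2, 2), ("C", 4, 1)], num_nodes=4)
```

  • In this sketch a job is never placed beneath a job that has already been dropped, so any idle space left under the "skyline" remains as a hole to be exploited in the second phase.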
  • The internal structure of the jobs can be represented by drops of water associated with the cubes. These are identified with the elementary units, or 'tasks' of the job. As can be seen from figure 2, tasks constitute the structure of a notional block that in turn represents the coarse structure of the job as a whole. The complexity of a job can be quantified by its internal computational complexity as well as the time which is required to do the job. This can be quantified by a recursive analysis of the tasks. That is, the function inside the task, the function inside the function etc.
  • Armed with this mental construct, the operation of an embodiment of the invention can be described as follows.
  • Initially, we shall consider a homogeneous processor cluster. This equates to a cluster of processors which have the same or substantially similar calculational capacity. It is also assumed that the size of the cluster is invariant and that there is no latency in the communication links underlying the network.
  • There are other inherent constraints that can affect the operation of the task scheduling system. These include the time required by a processor to switch from one working context to another, the duration of the data packing/unpacking process when circulating on the network and the time that data need to circulate on the network. Constraints that affect the job itself include the duration of the tasks in the job and the extent of the data flow graph knowledge. In the case of online scheduling, the data flow is determined continuously but is completely determined in the case of offline scheduling.
  • Further parameters that might affect the operation of the task scheduling system include the addition of prioritization information for a job and the degree of a task in a data flow graph. Here, a task's degree reflects its interconnectedness. That is, the higher the degree, the more branches a node has connected thereto.
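  • As a minimal sketch of the notion of degree, assuming the data flow graph is held as a list of directed (predecessor, successor) edges, the degree of each task can be counted as the number of branches touching it. The function name `task_degrees` and the edge-list representation are assumptions made for illustration.

```python
from collections import Counter


def task_degrees(edges):
    """Count the branches (incoming plus outgoing) connected to each task."""
    degree = Counter()
    for predecessor, successor in edges:
        degree[predecessor] += 1
        degree[successor] += 1
    return dict(degree)


# A master task fans out to three workers, which join at a terminating task.
edges = [("start", "t1"), ("start", "t2"), ("start", "t3"),
         ("t1", "end"), ("t2", "end"), ("t3", "end")]
degrees = task_degrees(edges)
```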
  • Given these constraints, the first exemplary embodiment which is described below will focus on a novel task scheduling system for a static, homogeneous cluster of processors where inter-processor communication and data unpacking time is negligible. The latter issues referred to above will be discussed by reference to a modified form of the exemplary embodiment.
  • Referring again to figure 2, it can be seen that the jobs have an internal structure comprising unitary tasks, represented by the circles in the Gantt chart, and vacant power/time space. The internal structure of the jobs will depend on the specific parallel application represented by the job. However, it can be seen that there is idle capacity within the internal task structure of the jobs.
  • Referring to figure 3(a), a Gantt chart of a bin queue is shown. It is assumed that the jobs, represented by the blocks, have been submitted and characteristics of these jobs (arrival time, expected duration, computational power needs) are known and stored in a database. In phase 1 of the scheduling process, the jobs are placed in the Gantt chart as shown according to a packing algorithm. As the jobs are 'dropped' (grey blocks in figure 3(a)) into the queue, they define holes, or units of idle time (see figure 3(b)). These units of idle time are logically represented as holes having the same time and power dimensions as the detailed task structure elements of the jobs themselves. Creating a logical representation H of the gross hole structure in the data structure completes the first phase in the process.
  • The second phase begins by refining the schedule. Each of the jobs on the job list is scanned in order from the first to arrive to the last, and a set of tasks is created. This is analogous to decomposing the jobs into their unitary task structure at a detailed level while including time dependency information for the tasks. This functionality may be handled by an external application that builds a data flow graph. A new data structure is created which stores the assignment of each of the tasks, i.e., the node identification, job relationship and time sequence of the tasks. The result of this phase is that new holes are added into the data structure as shown in Figures 3(c) and (d).
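  • The gross hole structure H at the end of the first phase may be sketched as below. The sketch assumes that each placed job is described by a hypothetical tuple (start, first_node, width, height) and that a hole is any unit cell below the top of the schedule that is not covered by a job block.

```python
def gross_holes(blocks, num_nodes):
    """Return the idle (time, node) cells left between packed job blocks.

    blocks: list of (start, first_node, width, height) rectangles.
    Cells are returned lowest time first, then leftmost node first.
    """
    makespan = max((start + height for start, _, _, height in blocks), default=0)
    busy = set()
    for start, first, width, height in blocks:
        for t in range(start, start + height):
            for n in range(first, first + width):
                busy.add((t, n))
    return [(t, n)
            for t in range(makespan)
            for n in range(num_nodes)
            if (t, n) not in busy]


# Two jobs side by side, then a full-width job on top of the taller one.
holes = gross_holes([(0, 0, 2, 3), (0, 2, 2, 2), (3, 0, 4, 1)], num_nodes=4)
```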
  • In the present case the cluster is homogeneous and static. Therefore, the width of each of the bins in the bin queue is constant. Also, as each of the task time demands is assumed to be the same, the vertical axis is constant.
  • Figure 3(d) illustrates the completed data structure H that represents all of the idle holes. Each task in the Gantt chart is then rescanned from bottom to top, i.e., from the first to the last job, and it is determined whether it is possible to move the tasks down in the schedule. This is done by analyzing the position of the task in comparison to the position of the lowest holes in the data structure. If the hole is lower than the task, the task is dropped into the hole and the data structure is updated. The second phase is completed when all of the tasks have been scanned and, where possible, moved or promoted.
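  • The promotion step may be sketched as follows for the homogeneous, unit-duration case. This is an illustrative sketch only: the dictionary layouts, the name `promote_tasks` and the rule that a task may not be placed earlier than one time unit after its latest predecessor are assumptions used to model the time dependency information.

```python
def promote_tasks(tasks, deps, holes):
    """Move tasks down into lower holes where the dependencies allow it.

    tasks: {task_id: (time, node)}
    deps:  {task_id: iterable of predecessor task ids}
    holes: set of idle (time, node) cells
    Both `tasks` and `holes` are updated in place.
    """
    # Scan from bottom to top and, within a row, from left to right.
    for tid in sorted(tasks, key=lambda k: tasks[k]):
        time, node = tasks[tid]
        # A task may not start before all of its predecessors have finished.
        earliest = max((tasks[p][0] + 1 for p in deps.get(tid, ())), default=0)
        candidates = [h for h in holes if earliest <= h[0] < time]
        if candidates:
            target = min(candidates)   # lowest hole, then leftmost
            holes.remove(target)
            holes.add((time, node))    # the vacated cell becomes a hole
            tasks[tid] = target
    return tasks


# Job a: one start task, three parallel tasks, one end task, on 3 nodes.
# Job b: a start task followed by one dependent task.
tasks = {"a0": (0, 0), "a1": (1, 0), "a2": (1, 1), "a3": (1, 2), "a4": (2, 0),
         "b0": (3, 0), "b1": (4, 0)}
deps = {"a1": ["a0"], "a2": ["a0"], "a3": ["a0"],
        "a4": ["a1", "a2", "a3"], "b1": ["b0"]}
holes = {(0, 1), (0, 2), (2, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2)}
promote_tasks(tasks, deps, holes)
```

  • In this example the tasks of job b drop into idle cells left inside job a, while the tasks of job a cannot move because their own dependencies pin them in place.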
  • It is useful to analyze the performance of this technique in order to gauge the effectiveness in improving the performance of the scheduling system. As a first approximation, to measure the advantage provided by the second phase of the process, the advantage derived from the optimal case is calculated. This optimal situation can be represented by a data structure in which all of the scheduled jobs use all of the available processors all of the time. That is, the width of each job is equal to the width of the Gantt chart. This situation is illustrated in figure 4. At the end of the first phase, the data structure is full with no holes resulting from any non-ideal packing.
  • If the granularity is changed to task-level (see figure 5(a) and (b)), it is possible to look at the internal data structure of the jobs. Intuitively, one would think that the data structure would not be empty as the hypothetical nature of distributed computing applications implies that there should exist a master process that is responsible for initiating the other downstream processes as well as a process which detects the termination of the parallel program. This situation is represented in figure 5 where the job in 5(a), which ostensibly resembles full utilization of processors over time, is decomposed into its constituent tasks. When considered at the higher degree of granularity, figure 5(b) shows a task tree for an idealized distributed processor program. The first time period, that is, the first row, is consumed by the task that initiates the rest of the parallel program. The job finishes with a task that detects the end of the job. These holes cannot be populated by tasks from within the job, as the lower holes cannot 'know' about anything happening after the job starts and the upper holes are created after the job is finished. Thus there is what is known as incompressible idle time at the beginning and end of the job.
  • Given this assumed constraint, the correlation between the job number N, the width of the bin queue L and the advantage between phase 1 and 2 on a quantity denoted gm and the amount of idle time gi can be determined. This is achieved by looking at a sequence of, for example, 4 idealized jobs as shown in figure 6. Here, if we consider that the incompressible idle time cannot be populated by tasks in the same job, but can be used by tasks in a following job (so long as they follow the correct time sequence) the advantage shown in figure 6(c) can be obtained. Here, the jobs are decomposed as shown in figure 6(b) in the first phase, and the tasks then scanned from bottom to top and, for convenience, left to right. Thus, the first task in the second job can populate a hole in the first job, for example the bottom right. This allows tasks in the second row of the second job to populate upper incompressible idle time in the first job. Thus, after allowing the tasks to drop into the array of 'allowed' idle time locations, we are left with the data structure shown in figure 6(c). This approach is of limited use when trying to find a general relation that specifies the advantage, so it is useful to consider task promotion that leaves a symmetrical data structure. This scenario is shown in figure 7 whereby after phase 1, the tasks are rearranged in a symmetrical manner as in figure 7(a) and then compressed as shown in figure 7(c).
  • Considering the first phase in figure 7, the advantage corresponds to the difference in the height between tasks placed vertically and horizontally. This difference is equal to
    Figure 00120001
    As the first and third jobs are symmetric, we obtain:
    Figure 00120002
  • In the context of the example shown in figure 7, the formula provides the following analysis. Initially, the makespan is equal to 15 time units. After refinement, it becomes 9, so the advantage is
    Figure 00120003
  • The advantage obtained in terms of the idle time gi can be shown to be:
    Figure 00130001
  • Here, N%L corresponds to N modulo L. If a job of smaller width is considered as is shown in figure 8 and if it is considered that all jobs will require a minimum of two processors, grouping 'beginner' tasks together at the bottom of the Gantt chart and the ending tasks together, the result in figure 8 is obtained. Contrary to the previous situation, the height is not conserved when an equivalent Gantt chart is built by grouping beginner and ending tasks together. Also, some tasks appear in the middle of the bin queue as the job blocks only incompletely fill the available width.
  • A more complex embodiment arises where the tasks can have non-unitary duration. That is, the expected duration of the tasks or the time required for the tasks to run on a particular processor varies between tasks in the job. This situation is illustrated by the two job structures shown in figure 9. A unitary task job is shown in the left in figure 9. To represent the increase in anticipated task complexity, the radius of the tasks is increased as in the job block in the right of figure 9. As the cluster is assumed to be homogeneous in this case, the task nodes must be distorted to make them 'fit' into the logical delimitation of the processor to which they are assigned. It is assumed that conservation of surface of the task applies, that is, the number of instructions in the task is constant. Thus where the width is divided by n, the height has to be multiplied by the same factor. So, the effect of some tasks requiring more processing power than others results in delaying the execution of following tasks. This situation is represented graphically in figure 10 where a job including non-unitary tasks is internally distorted and queued in the cluster represented by the Gantt chart.
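  • The conservation-of-surface rule may be expressed in a few lines. The sketch assumes a task is described by a hypothetical (width, height) pair whose product stands for its instruction count.

```python
def distort_task(width, height, n):
    """Squeeze a task to 1/n of its width while keeping width * height constant."""
    if n <= 0:
        raise ValueError("n must be positive")
    return width / n, height * n


# A task drawn two nodes wide and one time unit high is fitted onto a
# single node, so it occupies that node for two time units instead.
new_width, new_height = distort_task(2.0, 1.0, 2)
```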
  • To implement this possibility, the expected duration of each task must be determined, the data structure must be capable of storing the expected duration for each task and the reallocation of the tasks must check that the reorganized data structure can accommodate the distorted internal structure of the job.
  • Yet another embodiment of the invention extends this functionality to the case where the homogeneity of the cluster is relaxed. That is, the cluster may include computers having processors or other hardware of varying speeds and capacities. Here, it is assumed that the nodes have different speeds reflecting different processor speeds, cache memory or similar. Such a situation is shown in figure 11. To represent this situation, the horizontal axis of the bin queue is distorted to take into account the relative different processing power of the nodes. As the tasks have the same duration, but are performed on processors of different speeds, the 'apparent' task duration changes. Referring to figure 12(a), a job having the internal unitary task structure is queued in the Gantt chart by distorting the tasks horizontally. This results in additional apparent holes that affect the granularity of the job and thus its amenability to population by tasks in later jobs.
  • In terms of implementing this algorithm, there are two possibilities. The first is that the jobs are always considered as rectangles where the height corresponds to the maximum duration required to perform the parallel program. That is, some tasks assigned to a powerful machine will finish before other tasks even if they are on the same level in the data flow graph. In the second case, it may be considered that the border of the parallel program is not a rectangle but distorted where the upper edge is not a line or segment, but a set of segments.
  • The first situation is relatively straightforward as the first phase of the algorithm is preserved. The second phase is changed taking into account the differences between the computational nodes when a task is to be promoted. This is done by searching for the lowest holes in the data structure. If there is a hole below the task being considered, the duration of the task is multiplied by the coefficient of the processor corresponding to the holes to obtain a duration d'. This is compared with the duration of the hole to determine if there is sufficient space to place the task in it. If there is enough space, the task is moved and the data structure updated. If there is insufficient space, another hole is tested for suitability. If the task cannot be moved, the process moves to the next task and the procedure repeated.
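  • The modified second-phase test for a heterogeneous cluster may be sketched as follows. It is assumed, for illustration only, that each hole is a hypothetical tuple (start_time, node, hole_duration) and that the coefficient of a node is a slowdown factor, so that a value above 1 denotes a slower node.

```python
def find_hole_for_task(task_time, duration, holes, coefficient):
    """Return (hole, d_prime) for the lowest hole that can take the task.

    holes: list of (start_time, node, hole_duration)
    coefficient: {node: scaling factor applied to the task duration}
    Returns None if no hole below the task is large enough.
    """
    for hole in sorted(holes):              # lowest holes first
        start, node, hole_duration = hole
        if start >= task_time:
            break                           # remaining holes are not below the task
        d_prime = duration * coefficient[node]
        if d_prime <= hole_duration:
            return hole, d_prime
    return None


holes = [(0, 1, 1.0), (1, 2, 2.0)]
coefficient = {1: 1.5, 2: 0.5}
choice = find_hole_for_task(task_time=4, duration=1.0,
                            holes=holes, coefficient=coefficient)
```

  • Here the lowest hole is rejected because the task would take 1.5 time units on the slower node, and the next hole is tested in turn.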
  • The second case is more complex as the first phase algorithm needs to be changed to take into account the shape that represents the jobs. In this case, the job shape will be irregular and thus the packing algorithm will be more complicated thus increasing the complexity of the jobs.
  • In yet a further variation of the invention, the time required by the processor to change its working context is taken into account. This situation is shown in figures 13 and 14. This period equates to the time between saving a task's data and program of a first job and loading the data and program for the next task or program of another job. Considering a job containing a plurality of tasks, context change time compensation will not be required if there is no alteration in the working context. If there is a change of processor working context, time will be added before the task to allow for the saving and loading of data and the program by the processor. Introducing this "overduration" approximates more closely a real situation as it is possible that context changing may affect the operation of a processor cluster, particularly where the cluster is heterogeneous.
  • In terms of the procedure, the first and second phase steps need only be modified slightly. If the cluster is homogeneous, the first phase is unchanged as there is only a change of working context at the beginning of each job. Therefore, it is only necessary to include an overduration at the beginning of (i.e.; under) the rectangle representing the job. If the cluster is heterogeneous, the overduration will change depending on the power of the node. This can be taken into account by the first phase protocol.
  • For the second phase, it is necessary to add an overduration to the task before checking if it can be moved. This is done by searching for the lowest holes in the data structure. If a hole is below the task, the duration of the task is multiplied by the coefficient of the processor corresponding to the hole to obtain a duration d'. To this is added the overduration thereby obtaining d". This value is compared with the duration of the hole to determine if there is sufficient space in it to place the task. If there is enough space, the task is promoted and the data structure updated. If not, a new hole is found and the procedure is repeated. If the task cannot be moved, the procedure moves to the next task.
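  • The comparison described above may be sketched as a single test. The function and parameter names are hypothetical, and the coefficient is again assumed to be a multiplier applied to the task duration.

```python
def fits_with_overduration(duration, coefficient, overduration, hole_duration):
    """Check whether a task fits a hole once context switching is included.

    Returns (fits, d_double_prime).
    """
    d_prime = duration * coefficient            # duration on the target node
    d_double_prime = d_prime + overduration     # plus the context change time
    return d_double_prime <= hole_duration, d_double_prime


fits, d2 = fits_with_overduration(duration=1.0, coefficient=1.5,
                                  overduration=0.25, hole_duration=2.0)
```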
  • This procedure is illustrated in figure 13(a) which shows a hole between tasks of the same job. The task in figure 13(b) is to be moved to this hole and it is therefore necessary to add an overduration (figure 13(c)) to take into account the change in the working context of the processor. It is then determined whether or not the distorted task can be placed in the hole, and the data structure updated in response to any move.
  • Another embodiment is where inter-processor communication time is non-zero. This situation is shown in figure 14 whereby the synchronization barrier causes idle time to be introduced between tasks. The invention can be modified to take into account this extra idle time with the reallocation or promotion of tasks depending on the modified internal job structure.
  • It is considered that there are other refinements and modifications to the task scheduling system that take into account cluster behavior which is more realistic and complex. These are considered to be within the scope of the present invention and it is envisaged that such modifications may be included without substantially departing from the principles of the invention.
  • In another possible alternative embodiment of the invention, it is useful to consider a multi-user cluster environment incorporating the notion of priority of a parallel application. In the context of the invention, it has been found useful to include the concept of what is known as a "disturbance credit". A disturbance credit reflects the degree of disturbance that a user causes by introducing a higher priority job into the cluster processing stream. A transfer of disturbance credits results from a user-provoked disturbance in the bin queue whereby a 'wronged' user gains these credits when their job is adversely affected.
  • Figure 15 illustrates an example where a new job having a duration of two units is dropped into a bin queue. The simplest case is shown where the arrival of this job does not introduce a disturbance in the previously computed schedule and the starting time of this job equals the earliest time where two processors are available.
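  • The non-disturbing case may be sketched as a search for the earliest free slot. The sketch assumes that the existing schedule is held as a set of occupied (time, node) cells and that the new job requires contiguous nodes; neither assumption is stated in the description, and the name `earliest_start` is hypothetical.

```python
def earliest_start(busy, num_nodes, width, duration):
    """Return (start_time, first_node) of the earliest slot for a new job.

    busy: set of occupied (time, node) cells in the existing schedule.
    The existing schedule is left untouched.
    """
    horizon = max((t for t, _ in busy), default=-1) + 1  # first fully idle row
    for start in range(horizon + 1):
        for first in range(num_nodes - width + 1):
            cells = ((t, n)
                     for t in range(start, start + duration)
                     for n in range(first, first + width))
            if not any(cell in busy for cell in cells):
                return start, first
    raise ValueError("job is wider than the cluster")


busy = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0)}
slot = earliest_start(busy, num_nodes=4, width=2, duration=2)
```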
  • Figure 16 shows a more complicated situation where the job has an inherently higher priority and is to be promoted or advanced by one time unit. To allow this, there are two possibilities shown in figure 16. In the Gantt chart at the left, job A might be retarded and in the right, jobs B and C might be retarded. If the resulting schedules are analyzed, it can be seen that the second schedule has introduced less disturbance because the total idle times are lower.
  • However, in the second case, the number of jobs moved is greater. In both cases, the jobs which are moved are owned by users who have been disadvantaged. They are compensated by receiving a disturbance credit that reflects the degree of disturbance. This can be quantified according to the width of the job and the vertical displacement which it undergoes. For example, job A which is 5 units wide, when delayed 2 time units, would accumulate a disturbance credit of 10 units.
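  • The credit calculation may be sketched as follows. The product of job width and vertical displacement follows the example given above; debiting the same amount from the user who caused the disturbance is an assumption made here to model the "transfer" of credits, and the names used are hypothetical.

```python
def disturbance_credit(width, delay):
    """Credit earned by a job of the given width delayed by `delay` time units."""
    return width * delay


def settle_disturbance(balances, disturber, moved_jobs):
    """Transfer credits from the disturbing user to the owners of moved jobs.

    balances: {user: credit balance}, updated in place.
    moved_jobs: list of (owner, width, delay).
    """
    for owner, width, delay in moved_jobs:
        credit = disturbance_credit(width, delay)
        balances[owner] = balances.get(owner, 0) + credit
        balances[disturber] = balances.get(disturber, 0) - credit
    return balances


# Job A, 5 units wide, is delayed by 2 time units.
balances = settle_disturbance({"alice": 20, "bob": 0}, "alice", [("bob", 5, 2)])
```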
  • Practically, when a user account is opened, the user receives an amount of disturbance credit which he or she must manage when a job is submitted. A user needs to optimize his or her disturbance credits according to earned times. Thus at an overall level this introduces a further level of optimization to the invention which may be useful in certain contexts. This technique can be extended once it is realized that a job is not necessarily confined to a single rectangular block. If there are multiple blocks for a job, further possibilities for scheduling are feasible.
  • It is also noted that jobs can have a higher degree of granularity. In this case, some jobs can be absorbed by an existing schedule due to the presence of holes having a size equivalent to that of the tasks. Jobs can also be represented by shapes other than rectangles as the set of tasks in the job does not necessarily take up all of the perimeter of the shape. Given this situation, the introduction of the new jobs or set of tasks does not necessarily cause an automatic disturbance. In fact, when such a job is introduced into the scheme, the user can earn new disturbance credits.
  • Thus it can be seen that the invention provides a new approach to task scheduling in clusters. The technique is extensible and can be refined to take into account real-world behavior and attributes of processor clusters, such as finite inter-processor communication time and context changing time, as well as being amenable to use in heterogeneous clusters. It is envisaged that further extensions and modifications will be developed; however, it is considered that these will retain the inventive technique as described herein.
  • In terms of suitable applications, it is envisaged that the task scheduling technique would be particularly useful in multi-user processor clusters running applications such as finite element analysis, computationally intensive numerical calculations, modeling and statistical analysis of experimental data.
  • Although the invention has been described by way of example and with reference to particular embodiments, it is to be understood that modifications and/or improvements may be made without departing from the scope of the appended claims.
  • Where in the foregoing description reference has been made to integers or elements having known equivalents, then such equivalents are herein incorporated as if individually set forth.
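The disturbance credit described above (job width multiplied by the delay the job undergoes) can be illustrated with a minimal sketch. The names Job, disturbance_credit and credits_by_owner are hypothetical and do not appear in the specification; the sketch also assumes that a job which is advanced rather than delayed earns no credit, which the description does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    owner: str   # user who submitted the job (hypothetical field)
    width: int   # number of nodes the job occupies
    start: int   # currently scheduled start time, in time units


def disturbance_credit(job: Job, new_start: int) -> int:
    """Credit = job width x delay, as in the job A example (5 x 2 = 10).

    Assumption: a job moved earlier (or not moved) earns zero credit.
    """
    delay = max(0, new_start - job.start)
    return job.width * delay


def credits_by_owner(moves):
    """Sum the credits owed to each owner for a list of (job, new_start) moves."""
    totals = {}
    for job, new_start in moves:
        totals[job.owner] = totals.get(job.owner, 0) + disturbance_credit(job, new_start)
    return totals


# Job A from the description: 5 units wide, delayed by 2 time units.
job_a = Job(name="A", owner="alice", width=5, start=0)
credit_a = disturbance_credit(job_a, new_start=2)
```

Under these assumptions, credit_a reproduces the 10 units quoted for job A, and credits_by_owner could be used to compare the total compensation implied by each of the two schedules of figure 16 once the widths of jobs B and C are known.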

Claims (16)

  1. A method of optimizing a task-scheduling system comprising decomposing one or more parallel programs into their component tasks and dynamically redistributing the parallel programs' tasks into any available idle nodes in such a way that the execution time of the parallel program is decreased.
  2. A method of optimizing a task-scheduling system comprising representing one or more parallel programs, or jobs, as unitary two-dimensional blocks equating to the amount of time that the job will take to execute for a specified number of processors, or nodes, wherein the jobs are queued in an array whose width corresponds to the total number of available nodes in any single time interval, wherein each job is positioned in the array according to a block packing algorithm.
  3. A method of optimizing a task-scheduling system as claimed in claim 2 wherein the block packing algorithm is adapted such that the packing of the jobs at the block level of aggregation is substantially optimized for any arrangement of jobs in the array.
  4. A method of optimizing a task-scheduling system as claimed in any one of claims 2 or 3 further including the step of decomposing one or more jobs into their component time-unitary tasks and dynamically redistributing the tasks into any available idle nodes in such a way as to exploit any idle nodes within the structure of any of the jobs in the array thereby decreasing the execution time of at least one of the jobs.
  5. A method of optimizing a task-scheduling system as claimed in any one of claims 2 to 4 wherein the width of the block represents the needed computational power and the height of the block corresponds to the expected or required duration of the job.
  6. A method of optimizing a task-scheduling system as claimed in any one of claims 2 to 5 wherein in order to represent a homogeneous cluster of nodes, the array is represented by a bin having a horizontal, equally dimensioned array of nodes, and a vertically, equally spaced, time increment.
  7. A method of optimizing a task-scheduling system as claimed in any one of claims 2 to 6 wherein in order to represent a heterogeneous cluster of nodes, the array is represented by a bin having a horizontal, unequally dimensioned, array of nodes, and/or a vertically, unequally spaced, time increment.
  8. A method of creating and/or modifying a data flow graph in a parallel multicomputer system, comprising the steps of:
    characterizing one or more jobs in terms of expected execution duration and computational power needs;
    placing the jobs in a queue, the queue viewed as a two-dimensional array of nodes versus time, according to a bin-packing algorithm;
    locating idle computation periods, or holes, between the jobs;
    scanning each of the jobs in order to build a data flow graph which includes reference to the holes;
    scanning the queue from earliest to the last, and attempting to move each task down in the queue by analyzing the position of each task in comparison to the position of the lowest holes in the data structure and if the hole is lower than the task, moving the task in the queue to fill the hole and thus updating the data flow graph; and
    repeating the scanning process until the maximum number of available holes have been filled and a modified data flow graph has been created.
  9. A method of optimizing a task-scheduling system as claimed in any one of claims 1 to 8 wherein the tasks may have variable duration from time-unitary, thus representing tasks that require varying computational power and when queued, are represented as distorted in the time axis.
  10. A method of optimizing a task-scheduling system as claimed in any one of claims 1 to 9 wherein the horizontal axis of the queue bin representing the nodes is unequally dimensioned, thus representing a heterogeneous cluster of nodes where some nodes have different computational power.
  11. A method of optimizing a task-scheduling system as claimed in any one of claims 1 to 10 wherein, when the nodes are unequally spaced, the resulting data flow graph includes tasks which have an apparent difference in duration.
  12. A method of optimizing a task-scheduling system as claimed in claim 11 wherein the allocation of tasks to holes is adapted to take into account the apparent time-distortion of the tasks.
  13. A method of optimizing a task-scheduling system as claimed in any preceding claim wherein the data flow graph is adapted to take into account the time required by the processor to change its working context.
  14. A method of optimizing a task-scheduling system as claimed in claim 13 wherein the tasks are distorted in the time axis to allow for over-duration representing the time needed for the processor to change working context.
  15. A network of computing resources configured to operate in accordance with any of claims 1 to 14.
  16. A computing device adapted to operate a task scheduling system in accordance with the method as claimed in any of claims 1 to 14.
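
The hole-filling scan recited in claim 8 can be sketched as follows. This is a simplified illustration rather than the claimed method itself: it assumes time-unitary tasks on a homogeneous cluster, represents the queue as a list of rows (one row per time unit, one cell per node, None for an idle cell), and ignores the precedence constraints that a real data flow graph would impose between tasks. The function names find_holes and fill_holes are hypothetical.

```python
def find_holes(grid):
    """Return idle cells as (time, node) pairs, earliest time first."""
    return [
        (t, n)
        for t, row in enumerate(grid)
        for n, cell in enumerate(row)
        if cell is None
    ]


def fill_holes(grid):
    """Repeatedly scan the queue from earliest to latest and move each task
    into the lowest (earliest) hole whenever that hole is lower than the task.

    Assumption: tasks are independent, so any task may move to any hole.
    """
    grid = [row[:] for row in grid]  # work on a copy of the queue
    moved = True
    while moved:                     # repeat until no further hole can be filled
        moved = False
        for t, row in enumerate(grid):
            for n, task in enumerate(row):
                if task is None:
                    continue
                holes = find_holes(grid)
                if holes and holes[0][0] < t:
                    hole_t, hole_n = holes[0]
                    grid[hole_t][hole_n] = task   # fill the hole
                    grid[t][n] = None             # the old position becomes idle
                    moved = True
    # Drop time units at the end of the queue that are now completely idle.
    while grid and all(cell is None for cell in grid[-1]):
        grid.pop()
    return grid


# Three nodes, three time units; job B and the idle cells leave holes.
queue = [
    ["A", "A", None],
    ["B", None, None],
    ["C", "C", "C"],
]
compacted = fill_holes(queue)
```

In this example the compacted queue occupies two time units instead of three, which corresponds to the decrease in execution time that the claims aim at. Each move places a task strictly earlier in the queue, so the repeated scan terminates.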
EP01410158A 2001-12-07 2001-12-07 Scheduling system, method and apparatus for a cluster Withdrawn EP1318453A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP01410158A EP1318453A1 (en) 2001-12-07 2001-12-07 Scheduling system, method and apparatus for a cluster
US10/313,903 US20030135621A1 (en) 2001-12-07 2002-12-06 Scheduling system method and apparatus for a cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP01410158A EP1318453A1 (en) 2001-12-07 2001-12-07 Scheduling system, method and apparatus for a cluster

Publications (1)

Publication Number Publication Date
EP1318453A1 (en) 2003-06-11

Family

ID=8183138

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01410158A Withdrawn EP1318453A1 (en) 2001-12-07 2001-12-07 Scheduling system, method and apparatus for a cluster

Country Status (2)

Country Link
US (1) US20030135621A1 (en)
EP (1) EP1318453A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2254049A3 (en) * 2009-05-13 2012-01-25 Fujitsu Limited Job scheduling apparatus and job scheduling method
US9158591B2 (en) 2012-10-24 2015-10-13 Metric Holdings, Llc System and method for controlled sharing of consumable resources in a computer cluster
US9491114B2 (en) 2012-10-24 2016-11-08 Messageone, Inc. System and method for optimizing resource utilization in a clustered or cloud environment
CN112053135A (en) * 2020-09-10 2020-12-08 云南昆船数码科技有限公司 Business process management and control system and management and control method based on air freight
CN112148442A (en) * 2020-08-06 2020-12-29 武汉达梦数据库有限公司 ETL flow scheduling method and device
US11455191B2 (en) 2020-10-13 2022-09-27 International Business Machines Corporation Parallel task initialization on dynamic compute resources

Families Citing this family (104)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7325123B2 (en) * 2001-03-22 2008-01-29 Qst Holdings, Llc Hierarchical interconnect for configuring separate interconnects for each group of fixed and diverse computational elements
US7590984B2 (en) * 2003-05-29 2009-09-15 International Business Machines Corporation System and method for balancing a computing load among computing resources in a distributed computing problem
US7467180B2 (en) * 2003-05-29 2008-12-16 International Business Machines Corporation Automatically segmenting and populating a distributed computing problem
US8782654B2 (en) 2004-03-13 2014-07-15 Adaptive Computing Enterprises, Inc. Co-allocating a reservation spanning different compute resources types
EP1735706A4 (en) 2004-03-13 2008-08-13 Cluster Resources Inc System and method of co-allocating a reservation spanning different compute resources types
CA2559603A1 (en) 2004-03-13 2005-09-29 Cluster Resources, Inc. System and method for providing advanced reservations in a compute environment
US7890629B2 (en) * 2004-03-13 2011-02-15 Adaptive Computing Enterprises, Inc. System and method of providing reservation masks within a compute environment
WO2005089239A2 (en) 2004-03-13 2005-09-29 Cluster Resources, Inc. System and method of providing a self-optimizing reservation in space of compute resources
US8057307B2 (en) * 2004-04-08 2011-11-15 International Business Machines Corporation Handling of players and objects in massive multi-player on-line games
US7711977B2 (en) * 2004-04-15 2010-05-04 Raytheon Company System and method for detecting and managing HPC node failure
US20050235055A1 (en) * 2004-04-15 2005-10-20 Raytheon Company Graphical user interface for managing HPC clusters
US8190714B2 (en) * 2004-04-15 2012-05-29 Raytheon Company System and method for computer cluster virtualization using dynamic boot images and virtual disk
US9178784B2 (en) * 2004-04-15 2015-11-03 Raytheon Company System and method for cluster management based on HPC architecture
US8335909B2 (en) 2004-04-15 2012-12-18 Raytheon Company Coupling processors to each other for high performance computing (HPC)
US8336040B2 (en) 2004-04-15 2012-12-18 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US20070266388A1 (en) 2004-06-18 2007-11-15 Cluster Resources, Inc. System and method for providing advanced reservations in a compute environment
US8176490B1 (en) 2004-08-20 2012-05-08 Adaptive Computing Enterprises, Inc. System and method of interfacing a workload manager and scheduler with an identity manager
WO2006053093A2 (en) 2004-11-08 2006-05-18 Cluster Resources, Inc. System and method of providing system jobs within a compute environment
US7356770B1 (en) * 2004-11-08 2008-04-08 Cluster Resources, Inc. System and method of graphically managing and monitoring a compute environment
US7475274B2 (en) * 2004-11-17 2009-01-06 Raytheon Company Fault tolerance and recovery in a high-performance computing (HPC) system
US8244882B2 (en) * 2004-11-17 2012-08-14 Raytheon Company On-demand instantiation in a high-performance computing (HPC) system
US7433931B2 (en) * 2004-11-17 2008-10-07 Raytheon Company Scheduling in a high-performance computing (HPC) system
US7461102B2 (en) * 2004-12-09 2008-12-02 International Business Machines Corporation Method for performing scheduled backups of a backup node associated with a plurality of agent nodes
US7730122B2 (en) * 2004-12-09 2010-06-01 International Business Machines Corporation Authenticating a node requesting another node to perform work on behalf of yet another node
US7917625B1 (en) * 2005-01-14 2011-03-29 Sprint Communications Company L.P. Predictive processing resource level control
US7996455B2 (en) 2005-06-17 2011-08-09 Adaptive Computing Enterprises, Inc. System and method for providing dynamic roll-back reservations in time
US8863143B2 (en) 2006-03-16 2014-10-14 Adaptive Computing Enterprises, Inc. System and method for managing a hybrid compute environment
US9231886B2 (en) 2005-03-16 2016-01-05 Adaptive Computing Enterprises, Inc. Simple integration of an on-demand compute environment
CA2603577A1 (en) 2005-04-07 2006-10-12 Cluster Resources, Inc. On-demand access to compute resources
US8170041B1 (en) * 2005-09-14 2012-05-01 Sandia Corporation Message passing with parallel queue traversal
WO2007038445A2 (en) 2005-09-26 2007-04-05 Advanced Cluster Systems, Llc Clustered computer system
JP4781089B2 (en) * 2005-11-15 2011-09-28 株式会社ソニー・コンピュータエンタテインメント Task assignment method and task assignment device
DE102005057697A1 (en) * 2005-12-02 2007-06-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method for computer-aided simulation of technical processes
US20070143759A1 (en) * 2005-12-15 2007-06-21 Aysel Ozgur Scheduling and partitioning tasks via architecture-aware feedback information
US7926057B2 (en) * 2005-12-15 2011-04-12 International Business Machines Corporation Scheduling of computer jobs employing dynamically determined top job party
US8082289B2 (en) 2006-06-13 2011-12-20 Advanced Cluster Systems, Inc. Cluster computing support for application programs
US7844959B2 (en) * 2006-09-29 2010-11-30 Microsoft Corporation Runtime optimization of distributed execution graph
US8201142B2 (en) * 2006-09-29 2012-06-12 Microsoft Corporation Description language for structured graphs
US20080082644A1 (en) * 2006-09-29 2008-04-03 Microsoft Corporation Distributed parallel computing
JP5055942B2 (en) * 2006-10-16 2012-10-24 富士通株式会社 Computer cluster
US20080307422A1 (en) * 2007-06-08 2008-12-11 Kurland Aaron S Shared memory for multi-core processors
US20090064166A1 (en) * 2007-08-28 2009-03-05 Arimilli Lakshminarayana B System and Method for Hardware Based Dynamic Load Balancing of Message Passing Interface Tasks
US8234652B2 (en) * 2007-08-28 2012-07-31 International Business Machines Corporation Performing setup operations for receiving different amounts of data while processors are performing message passing interface tasks
US8127300B2 (en) * 2007-08-28 2012-02-28 International Business Machines Corporation Hardware based dynamic load balancing of message passing interface tasks
US8108876B2 (en) * 2007-08-28 2012-01-31 International Business Machines Corporation Modifying an operation of one or more processors executing message passing interface tasks
US8312464B2 (en) * 2007-08-28 2012-11-13 International Business Machines Corporation Hardware based dynamic load balancing of message passing interface tasks by modifying tasks
US8041773B2 (en) 2007-09-24 2011-10-18 The Research Foundation Of State University Of New York Automatic clustering for self-organizing grids
US8127235B2 (en) 2007-11-30 2012-02-28 International Business Machines Corporation Automatic increasing of capacity of a virtual space in a virtual world
US20090165007A1 (en) * 2007-12-19 2009-06-25 Microsoft Corporation Task-level thread scheduling and resource allocation
US20090164919A1 (en) 2007-12-24 2009-06-25 Cary Lee Bates Generating data for managing encounters in a virtual world environment
US8719801B2 (en) * 2008-06-25 2014-05-06 Microsoft Corporation Timing analysis of concurrent programs
US8812578B2 (en) * 2008-11-07 2014-08-19 International Business Machines Corporation Establishing future start times for jobs to be executed in a multi-cluster environment
EP2437170A4 (en) * 2009-05-25 2013-03-13 Panasonic Corp Multiprocessor system, multiprocessor control method, and multiprocessor integrated circuit
US9015724B2 (en) * 2009-09-23 2015-04-21 International Business Machines Corporation Job dispatching with scheduler record updates containing characteristics combinations of job characteristics
US11720290B2 (en) 2009-10-30 2023-08-08 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US10877695B2 (en) 2009-10-30 2020-12-29 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US9205328B2 (en) 2010-02-18 2015-12-08 Activision Publishing, Inc. Videogame system and method that enables characters to earn virtual fans by completing secondary objectives
US8615764B2 (en) 2010-03-31 2013-12-24 International Business Machines Corporation Dynamic system scheduling
US8806501B2 (en) 2010-03-31 2014-08-12 International Business Machines Corporation Predictive dynamic system scheduling
US9682324B2 (en) 2010-05-12 2017-06-20 Activision Publishing, Inc. System and method for enabling players to participate in asynchronous, competitive challenges
US8707275B2 (en) 2010-09-14 2014-04-22 Microsoft Corporation Simulation environment for distributed programs
US8621070B1 (en) * 2010-12-17 2013-12-31 Netapp Inc. Statistical profiling of cluster tasks
US9268613B2 (en) * 2010-12-20 2016-02-23 Microsoft Technology Licensing, Llc Scheduling and management in a personal datacenter
US9235458B2 (en) 2011-01-06 2016-01-12 International Business Machines Corporation Methods and systems for delegating work objects across a mixed computer environment
US9052968B2 (en) * 2011-01-17 2015-06-09 International Business Machines Corporation Methods and systems for linking objects across a mixed computer environment
US8813088B2 (en) * 2011-05-26 2014-08-19 International Business Machines Corporation Scheduling flows in a multi-platform cluster environment
WO2012164633A1 (en) * 2011-06-03 2012-12-06 Hitachi, Ltd. Storage apparatus and storage apparatus management method
US8584136B2 (en) * 2011-08-15 2013-11-12 Sap Ag Context-aware request dispatching in clustered environments
US8954529B2 (en) 2012-09-07 2015-02-10 Microsoft Corporation Smart data staging based on scheduling policy
US10137376B2 (en) 2012-12-31 2018-11-27 Activision Publishing, Inc. System and method for creating and streaming augmented game sessions
US9323619B2 (en) 2013-03-15 2016-04-26 International Business Machines Corporation Deploying parallel data integration applications to distributed computing environments
US9401835B2 (en) 2013-03-15 2016-07-26 International Business Machines Corporation Data integration on retargetable engines in a networked environment
US9256460B2 (en) 2013-03-15 2016-02-09 International Business Machines Corporation Selective checkpointing of links in a data flow based on a set of predefined criteria
US9329899B2 (en) 2013-06-24 2016-05-03 Sap Se Parallel execution of parsed query based on a concurrency level corresponding to an average number of available worker threads
US9477511B2 (en) 2013-08-14 2016-10-25 International Business Machines Corporation Task-based modeling for parallel data integration
US9189273B2 (en) * 2014-02-28 2015-11-17 Lenovo Enterprise Solutions PTE. LTD. Performance-aware job scheduling under power constraints
US9716738B2 (en) * 2014-05-13 2017-07-25 International Business Machines Corporation Deploying a portion of a streaming application to one or more virtual machines according to cost
US10322351B2 (en) 2014-07-03 2019-06-18 Activision Publishing, Inc. Matchmaking system and method for multiplayer video games
US9886306B2 (en) * 2014-11-21 2018-02-06 International Business Machines Corporation Cross-platform scheduling with long-term fairness and platform-specific optimization
US11351466B2 (en) 2014-12-05 2022-06-07 Activision Publishing, Ing. System and method for customizing a replay of one or more game events in a video game
US10118099B2 (en) 2014-12-16 2018-11-06 Activision Publishing, Inc. System and method for transparently styling non-player characters in a multiplayer video game
US10315113B2 (en) 2015-05-14 2019-06-11 Activision Publishing, Inc. System and method for simulating gameplay of nonplayer characters distributed across networked end user devices
US10471348B2 (en) 2015-07-24 2019-11-12 Activision Publishing, Inc. System and method for creating and sharing customized video game weapon configurations in multiplayer video games via one or more social networks
US10099140B2 (en) 2015-10-08 2018-10-16 Activision Publishing, Inc. System and method for generating personalized messaging campaigns for video game players
US11185784B2 (en) 2015-10-08 2021-11-30 Activision Publishing, Inc. System and method for generating personalized messaging campaigns for video game players
US10232272B2 (en) 2015-10-21 2019-03-19 Activision Publishing, Inc. System and method for replaying video game streams
US10376781B2 (en) 2015-10-21 2019-08-13 Activision Publishing, Inc. System and method of generating and distributing video game streams
US10245509B2 (en) 2015-10-21 2019-04-02 Activision Publishing, Inc. System and method of inferring user interest in different aspects of video game streams
US10300390B2 (en) 2016-04-01 2019-05-28 Activision Publishing, Inc. System and method of automatically annotating gameplay of a video game based on triggering events
US10387454B2 (en) 2016-08-02 2019-08-20 International Business Machines Corporation Method for creating efficient application on heterogeneous big data processing platform
US9880823B1 (en) 2016-09-14 2018-01-30 International Business Machines Corporation Method for translating multi modal execution dependency graph with data interdependencies to efficient application on homogenous big data processing platform
US10500498B2 (en) 2016-11-29 2019-12-10 Activision Publishing, Inc. System and method for optimizing virtual games
US11040286B2 (en) 2017-09-27 2021-06-22 Activision Publishing, Inc. Methods and systems for improved content generation in multiplayer gaming environments
US10974150B2 (en) 2017-09-27 2021-04-13 Activision Publishing, Inc. Methods and systems for improved content customization in multiplayer gaming environments
US10561945B2 (en) 2017-09-27 2020-02-18 Activision Publishing, Inc. Methods and systems for incentivizing team cooperation in multiplayer gaming environments
US10765948B2 (en) 2017-12-22 2020-09-08 Activision Publishing, Inc. Video game content aggregation, normalization, and publication systems and methods
CA3089317C (en) 2018-02-23 2023-10-03 Spidaweb Llc Utility structure modeling and design
US11679330B2 (en) 2018-12-18 2023-06-20 Activision Publishing, Inc. Systems and methods for generating improved non-player characters
CN109710407A (en) * 2018-12-21 2019-05-03 浪潮电子信息产业股份有限公司 Distributed system real-time task scheduling method, device, equipment and storage medium
US11097193B2 (en) 2019-09-11 2021-08-24 Activision Publishing, Inc. Methods and systems for increasing player engagement in multiplayer gaming environments
US11712627B2 (en) 2019-11-08 2023-08-01 Activision Publishing, Inc. System and method for providing conditional access to virtual gaming items
CN111240834B (en) * 2020-01-02 2024-02-02 北京字节跳动网络技术有限公司 Task execution method, device, electronic equipment and storage medium
US11351459B2 (en) 2020-08-18 2022-06-07 Activision Publishing, Inc. Multiplayer video games with virtual characters having dynamically generated attribute profiles unconstrained by predefined discrete values
US11524234B2 (en) 2020-08-18 2022-12-13 Activision Publishing, Inc. Multiplayer video games with virtual characters having dynamically modified fields of view

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5392430A (en) * 1992-10-30 1995-02-21 International Business Machines Hierarchical scheduling method for processing tasks having precedence constraints on a parallel processing system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5701482A (en) * 1993-09-03 1997-12-23 Hughes Aircraft Company Modular array processor architecture having a plurality of interconnected load-balanced parallel processing nodes
US7082606B2 (en) * 2001-05-01 2006-07-25 The Regents Of The University Of California Dedicated heterogeneous node scheduling including backfill scheduling

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AHMAD I ET AL: "ON PARALLELIZING THE MULTIPROCESSOR SCHEDULING PROBLEM", IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, IEEE INC, NEW YORK, US, vol. 10, no. 4, April 1999 (1999-04-01), pages 414 - 431, XP000823958, ISSN: 1045-9219 *
BELKHALE K P ET AL: "TASK SCHEDULING FOR EXPLOITING PARALLELISM AND HIERARCHY IN VLSI CAD ALGORITHMS", IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, IEEE INC. NEW YORK, US, vol. 12, no. 5, 1 May 1993 (1993-05-01), pages 557 - 567, XP000386017, ISSN: 0278-0070 *
EL-REWINI H ET AL: "TASK SCHEDULING IN MULTIPROCESSING SYSTEMS", COMPUTER, IEEE COMPUTER SOCIETY, LONG BEACH., CA, US, US, vol. 28, no. 12, 1 December 1995 (1995-12-01), pages 27 - 37, XP000550425, ISSN: 0018-9162 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2254049A3 (en) * 2009-05-13 2012-01-25 Fujitsu Limited Job scheduling apparatus and job scheduling method
US8429664B2 (en) 2009-05-13 2013-04-23 Fujitsu Limited Job scheduling apparatus and job scheduling method
US9158591B2 (en) 2012-10-24 2015-10-13 Metric Holdings, Llc System and method for controlled sharing of consumable resources in a computer cluster
US9491114B2 (en) 2012-10-24 2016-11-08 Messageone, Inc. System and method for optimizing resource utilization in a clustered or cloud environment
US11321118B2 (en) 2012-10-24 2022-05-03 Messageone, Inc. System and method for controlled sharing of consumable resources in a computer cluster
CN112148442A (en) * 2020-08-06 2020-12-29 武汉达梦数据库有限公司 ETL flow scheduling method and device
CN112148442B (en) * 2020-08-06 2023-07-21 武汉达梦数据库股份有限公司 ETL flow scheduling method and device
CN112053135A (en) * 2020-09-10 2020-12-08 云南昆船数码科技有限公司 Business process management and control system and management and control method based on air freight
CN112053135B (en) * 2020-09-10 2024-02-27 云南昆船设计研究院有限公司 Service flow control system and control method based on air freight
US11455191B2 (en) 2020-10-13 2022-09-27 International Business Machines Corporation Parallel task initialization on dynamic compute resources

Also Published As

Publication number Publication date
US20030135621A1 (en) 2003-07-17

Similar Documents

Publication Publication Date Title
EP1318453A1 (en) Scheduling system, method and apparatus for a cluster
Maiza et al. A survey of timing verification techniques for multi-core real-time systems
Glushkova et al. Mapreduce performance model for Hadoop 2. x
US9442760B2 (en) Job scheduling using expected server performance information
Cho et al. Natjam: Design and evaluation of eviction policies for supporting priorities and deadlines in mapreduce clusters
Yeung et al. Horus: Interference-aware and prediction-based scheduling in deep learning systems
Du et al. Coordinating parallel processes on networks of workstations
US20050081208A1 (en) Framework for pluggable schedulers
Sudarsan et al. ReSHAPE: A framework for dynamic resizing and scheduling of homogeneous applications in a parallel environment
Quan et al. A hierarchical run-time adaptive resource allocation framework for large-scale MPSoC systems
Pandey et al. A heuristic method towards deadline-aware energy-efficient mapreduce scheduling problem in Hadoop YARN
Setia et al. Processor scheduling on multiprogrammed, distributed memory parallel computers
Yang et al. Resource-oriented partitioning for multiprocessor systems with shared resources
Chakrabarti et al. Randomized load balancing for tree-structured computation
Carretero et al. Mapping and scheduling HPC applications for optimizing I/O
Kodase et al. Improving scalability of task allocation and scheduling in large distributed real-time systems using shared buffers
Lee et al. Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems
Chakrabarti et al. Resource scheduling for parallel database and scientific applications
Ehsan et al. LiPS: A cost-efficient data and task co-scheduler for MapReduce
Hamdi et al. Dynamic load-balancing of image processing applications on clusters of workstations
Qu et al. Improving the energy efficiency and performance of data-intensive workflows in virtualized clouds
Kambatla et al. Optimistic scheduling with service guarantees
CN108009074B (en) Multi-core system real-time evaluation method based on model and dynamic analysis
Kakkar Scheduling techniques for operating systems for medical and IoT devices: A review
De Munck et al. Design and performance evaluation of a conservative parallel discrete event core for GES

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

RIN1 Information on inventor provided before grant (corrected)

Inventor name: ROMAGNOLI, EMMANUEL

AKX Designation fees paid
17P Request for examination filed

Effective date: 20040122

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: DE

Ref legal event code: 8566

17Q First examination report despatched

Effective date: 20071218

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20080429