WO1999036851A1 - Scalable single system image operating software architecture for a multi-processing computer system - Google Patents

Scalable single system image operating software architecture for a multi-processing computer system

Info

Publication number
WO1999036851A1
WO1999036851A1 (application PCT/US1998/025586; also published as WO9936851A1)
Authority
WO
WIPO (PCT)
Prior art keywords
computer system
processors
computational
service
processor
Application number
PCT/US1998/025586
Other languages
French (fr)
Inventor
Paul A. Leskar
Jonathan L. Bertoni
Original Assignee
SRC Computers, Inc.
Application filed by SRC Computers, Inc.
Priority to EP98962876A (published as EP1064597A1)
Priority to JP2000540495A (published as JP2002509311A)
Priority to CA002317132A (published as CA2317132A1)
Publication of WO1999036851A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061: Partitioning or combining of resources
    • G06F9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present invention relates, in general, to the field of multiprocessing computer systems. More particularly, the present invention relates to a scalable single system image ("S3I") operating software architecture for a multi-processing computer system.
  • the connection of more than one homogeneous processor to a single, monolithic central memory is denominated as multi-processing.
  • hardware and software limitations have minimized the total number of physical processors that could access a single memory efficiently. These limitations have reduced the ability to maintain computational efficiency as the number of processors in a computer system increased, thus reducing the overall scalability of the system. With the advent of faster and ever more inexpensive microprocessors, large processor count systems are becoming a hardware reality.
  • Hardware advances have allowed interconnected networks to multiplex hundreds of processors to a single memory with a minimal decrease in efficiency. Because of performance issues relating to software locking primitives in large configurations, operating software scalability that is required to efficiently accommodate large numbers of processors has still eluded system architects.
  • the computational processors have no input/output ("I/O") devices mapped directly onto them while the service processors can control attached devices.
  • a single operating system software image presents a single system application programming interface to application programs across all of the processors and a communication mechanism between the computational and service processors allows multiple requests to the service processors and fast, asynchronous interrupt responses to each request.
  • a computational scheduler executes on each computational processor and provides the interface to the service processors where the operating system software executes.
  • the S3I architecture also improves cache performance by reducing cache conflicts. These conflicts are reduced because the operating software no longer forces application data from the cache during the process of servicing application requests. This performance improvement becomes more important as processor speed relative to memory latency increases, as the latest generation of multi-processors demonstrates.
  • the computer system comprises a first plurality of service processors functioning in conjunction with the operating software, the service processors handling all input/output functions for the computer system.
  • a second plurality of computational processors functions in conjunction with a computational scheduler.
  • the operating software and computational scheduler provide a communication medium between the service and computational processors.
  • the computer system comprises a first plurality of service processors and a second plurality of computational processors.
  • Each of the service processors functions in conjunction with the operating software and handles all input/output functionality for the computer system.
  • Each of the computational processors functions in conjunction with a computational scheduler, with the operating software and the computational scheduler providing a communication medium between the service and computational processors.
  • Figs. 1A and 1B are a functional block system overview illustrating a computer system in accordance with an embodiment of the present invention comprising between 1 and 16 segments coupled together by a like number of trunk lines, each segment containing a number of computational and service processors in addition to memory and a crossbar switch assembly;
  • Fig. 2 is a simplified functional block diagram for the interconnect strategy for the computer system of Figs. 1A and 1B;
  • Fig. 3 is a simplified functional block diagram of the computer system of Figs. 1 and 2 illustrating a 16 segment system comprising 256 computational processors and 64 service processors for interfacing to the computer program application software through a scalable single system image ("S3I") in accordance with the present invention;
  • Fig. 4 is a more detailed block diagram of a single segment computer system corresponding to a portion of the system of Fig. 3 illustrating the interface to the computer program application software through a common API/ABI to the computational processor through the computational scheduler and the service processors through the operating software which provides an interrupt response to the computational scheduler.
  • the exemplary computer system 10 comprises, in pertinent part, any number of interconnected segments 12₀ through 12₁₅, although the principles of the present invention are likewise applicable to any scalable system having large numbers of processors.
  • the various segments 12₀ through 12₁₅ are coupled through a number of trunk lines 14₀ through 14₁₅ as will be more fully described hereinafter.
  • Each of the segments 12 comprises a number of functionally differentiated processing elements in the form of service processors 16₀ through 16₃ (service processor 16₀ functions additionally as a master boot device) and computational processors 18₀ through 18₁₅.
  • the service processors 16 are coupled to a number of peripheral component interconnect ("PCI") interface modules 20, and in the embodiment shown, each service processor is coupled to two such modules 20 to enable the service processors 16 to carry out all of the I/O functionality of the segment 12.
  • the computer system 10 further includes a serial-to-PCI interface 22 for coupling a system console 24 to at least one of the segments 12 of the computer system 10.
  • the system console 24 is operational for enabling a user of the computer system 10 to download boot information to the computer system 10, configure devices, monitor status, and perform diagnostic functions. Regardless of how many segments 12 are configured in the computer system 10, only one system console 24 is required.
  • the boot device 26 (for example, a JAZ® removable disk computer mass storage device available from Iomega Corporation, Roy, UT) is also coupled to the master boot service processor 16₀ through one of the PCI modules 20.
  • the PCI modules 20 coupled to service processors 16₁ through 16₃ are utilized to couple the segment 12 to all other peripheral devices such as, for example, disk arrays 28₀ through 28₅, any one or more of which may be replaced by, for example, an Ethernet connection.
  • the computer system 10 comprises sophisticated hardware and building blocks which are commodity based, with some enhancements to accommodate the uniqueness of high-performance computing ("HPC").
  • the base unit for the computer system 10 is a segment 12.
  • Each segment 12 contains computation and service processor 18, 16 elements, memory, power supplies, and a crossbar switch assembly.
  • the computer system 10 is "scalable" in that an end user can configure a system that consists of from 1 to 16 interconnected segments 12.
  • Each segment 12 contains 20 total processors: sixteen computational processors 18 and four service processors 16.
  • the computational processors 18 may reside on an individual assembly that contains four processors (e.g. the Deschutes™ microprocessor available from Intel Corporation, Santa Clara, CA) and eight interface chips (i.e. two per computational processor 18).
  • Each computational processor 18 has an internal processor clock rate greater than 300 MHz and a system clock speed greater than 100 MHz, and the interface chips provide the connection between the computational processors 18 and the memory switches that connect to memory as will be described and shown in greater detail hereafter.
  • the service processors 16 may be contained on a service processor assembly, which is responsible for all input and output for the computer system 10.
  • Each of the service processor assemblies contain a processor (the same type as the computational processor 18), two interface chips, two 1 Mbyte I/O buffers, and two bi-directional PCI buses.
  • Each PCI bus has a single connector. All I/O ports have DMA capability with equal priority to processors.
  • the PCI modules 20 serve dual purposes, depending upon the service processor 16 with which they are used.
  • the PCI connectors on the master boot service processor 16₀ are used to connect to the boot device 26 and the system console 24.
  • the PCI modules 20 on the regular service processors 16₁ through 16₃ are used for all other peripherals.
  • Some of the supported PCI-based interconnects include small computer systems interface (“SCSI”), fiber distributed data interface (“FDDI”), high performance parallel interface (“HIPPI”) and others.
  • Fig. 2 shows in greater detail the interconnect strategy for the computer system 10 of Figs. 1A and 1B in an implementation employing sixteen segments 12₀ through 12₁₅ interconnected by means of sixteen trunk lines 14₀ through 14₁₅.
  • a number of memory banks 50₀ through 50₁₅, each allocated to a respective one of the computational processors 18₀ through 18₁₅ (resulting in sixteen memory banks 50 per segment 12 and two hundred fifty six memory banks 50 in total for a sixteen segment 12 computer system 10), form a portion of the computer system 10 and are respectively coupled to the trunk lines 14₀ through 14₁₅ through a like number of memory switches 52₀ through 52₁₅.
  • the memory utilized in the memory banks 50₀ through 50₁₅ may be synchronous static random access memory ("SSRAM") or other suitable high speed memory devices.
  • each of the segments 12₀ through 12₁₅ includes, for example, twenty processors (four service processors 16₀ through 16₃ and sixteen computational processors 18₀ through 18₁₅) coupled to the trunk lines 14₀ through 14₁₅ through a corresponding one of a like number of processor switches 54₀ through 54₁₅.
  • Each segment 12 interconnects to all other segments 12 through the crossbar switch.
  • the computer system 10 crossbar switch technology enables segments 12 to have uniform memory access times across segment boundaries, as well as within the individual segment 12. It also enables the computer system 10 to employ a single memory access protocol for all the memory in the system.
  • the crossbar switch may utilize high-speed Field Programmable Gate Arrays ("FPGAs")to provide interconnect paths between memory and the processors, regardless of where the processors and memory are physically located. This crossbar switch interconnects every segment 12 and enables the processors and memory located in different segments 12 to communicate with a uniform latency.
  • each crossbar switch has a 1 clock latency per tier, which includes reconfiguration time. For a sixteen segment 12 computer system 10 utilizing three hundred and twenty processors 16, 18, only two crossbar tiers are required.
  • the computer system 10 may preferably utilize SSRAM for the memory banks 50 since it presents a component cycle time of 6 nanoseconds.
  • Each memory bank 50 supports from 64 to 256 Mbytes of memory.
  • Each computational processor 18 supports one memory bank 50, with each memory bank 50 being 256 bits wide, plus 32 parity bits for a total width of 288 bits.
  • the memory bank 50 size may be designed to match the cache line size, resulting in a single bank access for a full cache line. Read and write memory error correction may be provided by completing parity checks on address and data packets.
  • the parity check for address packets may be the same for both read and write functions wherein new and old parity bits are compared to determine whether or not the memory read or write should continue or abort.
  • a parity check may be done on each of the data packets arriving in memory.
  • Each of these data packets has an 8-bit parity code appended to it.
  • a new 8-bit parity code is generated for the data packet and the old and new parity codes are compared.
  • the comparison results in one of two types of codes: single bit error (“SBE") or double-bit or multi-bit error (“DBE").
  • the single-bit error may be corrected on the data packet before it is entered in memory.
  • In the case of a double-bit or multi-bit error, the data packet is not written to memory; instead, the error is reported back to the processor, which retries the data packet reference.
  • when a memory "read" occurs, each of the data packets read from memory generates an 8-bit parity code. This parity code is forwarded with the data to the processor.
  • the processor performs single error correction and double error detection (“SECDED") on each data packet.
  • FIG. 3 a simplified illustration of a sixteen segment 12 computer system 10 is shown comprising a total of sixty four service processors 16 and two hundred and fifty six computational processors 18 for a total of three hundred and twenty processors.
  • the service processors 16 and computational processors 18 interface to the computer program application software by means of the scalable single system image ("S3I") layer 60 as will be more fully described hereinafter.
  • the service processors 16 handle all I/O operations as previously described, as well as the running of the computer system 10 operating system.
  • the S3I layer 60 resides on top of the application 62 and comprises a common API/ABI layer 64 as well as a computational scheduler 66 and operating software 68 layers.
  • the computational scheduler 66 interfaces to the computational processors 18₀ through 18₁₅ while the operating software 68 interfaces to the service processors 16₀ through 16₃.
  • the operating software 68 provides an interrupt response signal 70 to the computational scheduler to control the operation of the computational processors 18 as shown.
  • a number of memory communication buffers 72 receive data from the various computational processors 18₀ through 18₁₅ and, in turn, supply data to the service processors 16₀ through 16₃.
  • the preferred implementation of the scalable single system interconnect architecture of the present invention is on a multiprocessor computer system 10 with uniform memory access across common, shared memory comprising a plurality of memory banks 50₀ through 50ₙ.
  • processor subsystems may be partitioned into two groups: those which have I/O connectivity, i.e. the service processors 16₀ through 16ₙ, and those which have no I/O connectivity, i.e. the computational processors 18₀ through 18ₙ.
  • the S3I utilizes a software environment consisting of service and computational processors 16, 18.
  • a single copy of the operating system software 68 resides across all processors 16 within the service partition.
  • Separate computational schedulers 66 exist in each computational processor 18.
  • This software model guarantees a global resource sharing paradigm, in conjunction with a strong "single system image”.
  • Highly scalable threads of execution must be present in both the operating system software 68 and user application 62 software design model in conjunction with a high degree of software "multithreading" for efficient utilization of this architecture.
  • user applications 62 are able to initiate and terminate multiple threads of execution in application user space, allowing further elimination of operating system software 68 overhead and increased scalability as the number of physical processors increases.
  • a user application 62 makes requests of the system in the normal, system software mechanism.
  • the application 62 has no awareness of whether the application 62 is executing on a service processor 16 or a computational processor 18. If the request is executed on a service processor 16, the request follows the normal operating system path directly into the operating system software 68 for processing.
  • the request is executed on a computational processor 18, the request is processed by the computational scheduler 66.
  • the thread making the request is placed on the run queue of the service processor 16 and the computational processor 18 issues a request to the service processor 16 to examine the queues.
  • the operating software 68 executing in the service processor 16 examines the request queue and processes the request as if it had originated on the service processor 16.
  • the requesting computational processor 18 is either suspended until interrupt acknowledge or placed into the general scheduling tables maintained by the operating software 68 for dispatching of additional work.
  • the service processor 16 acknowledges the original request by queuing an application thread for execution on a computational processor 18 and restoring the original application context, which places the application 62 back into execution.
  • any physical processor 16 in the service partition will be able to execute within the operating system 68 simultaneously.
  • Critical data regions may be locked utilizing standard locking primitives currently found in the underlying hardware.
  • the base component of the software environment is the operating system software 68.
  • the computer system 10 may use, in a preferred embodiment, an enhanced version of the SunSoft® Solaris® 2.6 operating system available from Sun Microsystems, Inc., Palo Alto, CA, which is modified to achieve better performance across multiple computational and service processors 18, 16 by limiting the operating system to execute only in the service processors 16.
  • this technique is further accomplished by utilizing a computational scheduler 66 to communicate operating system requests and scheduling information between the service and computational processors 16, 18.
  • a single copy of the operating system software 68 executes in all service processors 16, while separate computational schedulers 66 reside in each computational processor 18.
  • This software model provides for global resource sharing in conjunction with a strong scalable single system image.
  • the computer system 10 of the present invention provides for highly scalable threads of execution in both the operating system software 68 and user application software 62, in conjunction with a high degree of software "multithreading". While there have been described above the principles of the present invention in conjunction with a specific computer architecture, any number of service and/or computational processors may be utilized and it is to be clearly understood that the foregoing description is made only by way of example and not as a limitation to the scope of the invention.
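The single-bit-correct, double-bit-detect behavior described in the parity bullets above can be sketched with a textbook Hamming SECDED code. This is an illustrative model only, not the patent's hardware logic: the 8-bit packet width matches the 8-bit parity code mentioned above, but the bit layout and the encode/decode procedure are assumptions made for the sketch.

```python
# SECDED sketch: an 8-bit data packet encoded as Hamming(12,8) plus one
# overall parity bit.  Positions 1, 2, 4, 8 hold Hamming check bits,
# position 0 holds overall parity, the rest hold the data bits.

DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]

def encode(data):
    """Return a 13-entry bit list encoding one 8-bit data packet."""
    assert 0 <= data < 256
    word = [0] * 13
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data >> i) & 1
    for p in (1, 2, 4, 8):               # each check bit covers every
        parity = 0                       # position whose index has bit p set
        for pos in range(1, 13):
            if pos != p and (pos & p):
                parity ^= word[pos]
        word[p] = parity
    for b in word[1:]:                   # overall parity: total parity of
        word[0] ^= b                     # the codeword comes out even
    return word

def decode(word):
    """Classify a packet as OK / SBE / DBE; correct an SBE in place."""
    syndrome = 0
    for pos in range(1, 13):
        if word[pos]:
            syndrome ^= pos
    overall = sum(word) & 1              # 0 if total parity is still even
    if syndrome == 0 and overall == 0:
        status = "OK"
    elif overall == 1:                   # odd parity: exactly one bit flipped
        status = "SBE"                   # single-bit error: correctable
        word[syndrome] ^= 1              # syndrome 0 means bit 0 itself
    else:                                # even parity but nonzero syndrome
        status = "DBE"                   # double-bit error: report and retry
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= word[pos] << i
    return status, data

w = encode(0xA7)
print(decode(list(w)))                   # → ('OK', 167)
w[5] ^= 1
print(decode(list(w)))                   # → ('SBE', 167): corrected
```

As the bullets above describe, an SBE is repaired before the packet proceeds, while a DBE is only detected, so the processor must retry the reference.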

Abstract

A scalable single system image ('S3I') operating system architecture for a multi-processing computer system having separate service (16) and computational (18) processors and wherein a unique distinction exists between the processors but both have shared, common access to all of the computer system memory. The computational processors have no input/output ('I/O') devices mapped directly onto them while the service processors have full I/O capability. A single operating system software image presents a single system application programming interface to application programs across all of the processors and a communication mechanism between the computational and service processors allows multiple requests to the service processors and fast, asynchronous interrupt responses to each request. A computational scheduler executes on each computational processor and provides the interface to the service processors where the operating system software executes.

Description

SCALABLE SINGLE SYSTEM IMAGE OPERATING SOFTWARE ARCHITECTURE FOR A MULTI-PROCESSING COMPUTER SYSTEM
BACKGROUND OF THE INVENTION
The present invention relates, in general, to the field of multiprocessing computer systems. More particularly, the present invention relates to a scalable single system image ("S3I") operating software architecture for a multi-processing computer system. The connection of more than one homogeneous processor to a single, monolithic central memory is denominated as multi-processing. Until recently, hardware and software limitations have minimized the total number of physical processors that could access a single memory efficiently. These limitations have reduced the ability to maintain computational efficiency as the number of processors in a computer system increased, thus reducing the overall scalability of the system. With the advent of faster and ever more inexpensive microprocessors, large processor count systems are becoming a hardware reality.
Hardware advances have allowed interconnected networks to multiplex hundreds of processors to a single memory with a minimal decrease in efficiency. Because of performance issues relating to software locking primitives in large configurations, operating software scalability that is required to efficiently accommodate large numbers of processors has still eluded system architects.
SUMMARY OF THE INVENTION
In response to these computer system architecture efficiency issues, SRC Computers, Inc., Colorado Springs, Colorado, has developed an affordable, high performance computer that supplants traditional high performance supercomputers by providing a system with large shared memory utilizing fast commodity processors resulting in high bandwidth input/output ("I/O") functionality. This has been accomplished by creating a balance among processor speed, memory size, and I/O bandwidth to achieve a high degree of efficiency between the system hardware and software, resulting in a greater degree of parallelism.
Disclosed herein is a computer system utilizing a Scalable Single System Image ("S3I") operating software architecture. This architecture allows the efficient scalability of operating software from a few processors to hundreds of processors in a multi-processor environment that effectively presents a single system image. The S3I architecture of the present invention virtually obviates the need for Massively Parallel ("MPP") architectures simply because the need for distributed memories and message passing synchronization primitives no longer exists. A simple, easy to program, flat memory model replaces message passing. A common application programming interface/application binary interface ("API/ABI") is presented to all applications. Performance is much improved both computationally and from an input/output ("I/O") standpoint because of the elimination of the message passing paradigm.
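The contrast drawn above between a flat shared-memory model and MPP-style message passing can be illustrated with a toy Python analogy. Threads stand in for processors and all names are invented for illustration; this is not the patent's hardware or software.

```python
import threading
import queue

N = 8

# Flat shared memory: every "processor" loads and stores directly into one
# common address space; no sends, receives, or data distribution needed.
shared = [0] * N

def smp_worker(i):
    shared[i] = i * i                    # a plain store into shared memory

threads = [threading.Thread(target=smp_worker, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(sum(shared))                       # → 140

# MPP-style message passing: each result must be explicitly packaged,
# sent, and received -- the overhead the S3I architecture eliminates.
inbox = queue.Queue()

def mpp_worker(i):
    inbox.put((i, i * i))                # explicit send

threads = [threading.Thread(target=mpp_worker, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
results = dict(inbox.get() for _ in range(N))   # explicit receives
print(sum(results.values()))             # → 140
```

Both halves compute the same totals; the second requires the explicit send/receive choreography that a flat memory model avoids.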
Disclosed herein is a scalable single system image ("S3I") operating system architecture for a multi-processing computer system having separate service and computational processors and wherein a unique distinction exists between the processors but both have shared, common access to all of the computer system memory. The computational processors have no input/output ("I/O") devices mapped directly onto them while the service processors can control attached devices. A single operating system software image presents a single system application programming interface to application programs across all of the processors and a communication mechanism between the computational and service processors allows multiple requests to the service processors and fast, asynchronous interrupt responses to each request. A computational scheduler executes on each computational processor and provides the interface to the service processors where the operating system software executes. The S3I architecture also improves cache performance by reducing cache conflicts. These conflicts are reduced because the operating software no longer forces application data from the cache during the process of servicing application requests. This performance improvement becomes more important as processor speed relative to memory latency increases, as the latest generation of multi-processors demonstrates.
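The request and asynchronous interrupt-response mechanism just described can be sketched in software. This is an analogy only: a Python thread stands in for the service partition, a queue for the memory communication buffers, and all names are invented for illustration.

```python
import threading
import queue

request_q = queue.Queue()    # stands in for a memory communication buffer

def service_processor():
    """Operating software on the service partition: examine the request
    queue, process each request as if it had originated locally, then
    signal an asynchronous interrupt response back to the requester."""
    while True:
        req = request_q.get()
        if req is None:                      # shutdown sentinel
            return
        name, args, done = req
        done.result = "handled %s%r" % (name, args)   # e.g. perform the I/O
        done.set()                           # interrupt response

def computational_request(name, *args):
    """Computational scheduler: queue the request to the service partition
    and suspend the requesting thread until the acknowledge arrives."""
    done = threading.Event()
    request_q.put((name, args, done))
    done.wait()                              # requesting thread suspended
    return done.result

svc = threading.Thread(target=service_processor, daemon=True)
svc.start()
result = computational_request("read", "/dev/disk0", 4096)
print(result)
request_q.put(None)                          # stop the service thread
```

In the actual architecture the suspended computational processor can instead be given other work from the scheduling tables while the request is outstanding; the sketch shows only the simple suspend-until-acknowledge path.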
Particularly disclosed herein is a multi-processor computer system including operating software. The computer system comprises a first plurality of service processors functioning in conjunction with the operating software, the service processors handling all input/output functions for the computer system. A second plurality of computational processors functions in conjunction with a computational scheduler, with the operating software and computational scheduler providing a communication medium between the service and computational processors.
Further disclosed herein is a multi-processor computer system including operating software. The computer system comprises a first plurality of service processors and a second plurality of computational processors. Each of the service processors functions in conjunction with the operating software and handles all input/output functionality for the computer system. Each of the computational processors functions in conjunction with a computational scheduler, with the operating software and the computational scheduler providing a communication medium between the service and computational processors.
BRIEF DESCRIPTION OF THE DRAWINGS
The aforementioned and other features and objects of the present invention and the manner of attaining them will become more apparent and the invention itself will be best understood by reference to the following description of a preferred embodiment taken in conjunction with the accompanying drawings, wherein:
Figs. 1A and 1B are a functional block system overview illustrating a computer system in accordance with an embodiment of the present invention comprising between 1 and 16 segments coupled together by a like number of trunk lines, each segment containing a number of computational and service processors in addition to memory and a crossbar switch assembly;
Fig. 2 is a simplified functional block diagram for the interconnect strategy for the computer system of Figs. 1A and 1B; Fig. 3 is a simplified functional block diagram of the computer system of Figs. 1 and 2 illustrating a 16 segment system comprising 256 computational processors and 64 service processors for interfacing to the computer program application software through a scalable single system image ("S3I") in accordance with the present invention;
Fig. 4 is a more detailed block diagram of a single segment computer system corresponding to a portion of the system of Fig. 3 illustrating the interface to the computer program application software through a common API/ABI to the computational processor through the computational scheduler and the service processors through the operating software which provides an interrupt response to the computational scheduler.
DESCRIPTION OF A PREFERRED EMBODIMENT
With reference now to Figs. 1A and 1B, a multi-processing computer system 10 in accordance with the present invention is shown. The exemplary computer system 10 comprises, in pertinent part, any number of interconnected segments 12₀ through 12₁₅, although the principles of the present invention are likewise applicable to any scalable system having large numbers of processors. The various segments 12₀ through 12₁₅ are coupled through a number of trunk lines 14₀ through 14₁₅ as will be more fully described hereinafter. Each of the segments 12 comprises a number of functionally differentiated processing elements in the form of service processors 16₀ through 16₃ (service processor 16₀ functions additionally as a master boot device) and computational processors 18₀ through 18₁₅. The service processors 16 are coupled to a number of peripheral component interconnect ("PCI") interface modules 20, and in the embodiment shown, each service processor is coupled to two such modules 20 to enable the service processors 16 to carry out all of the I/O functionality of the segment 12.
The computer system 10 further includes a serial-to-PCI interface 22 for coupling a system console 24 to at least one of the segments 12 of the computer system 10. The system console 24 is operational for enabling a user of the computer system 10 to download boot information to the computer system 10, configure devices, monitor status, and perform diagnostic functions. Regardless of how many segments 12 are configured in the computer system 10, only one system console 24 is required.
The boot device 26 (for example, a JAZ® removable disk computer mass storage device available from Iomega Corporation, Roy, UT) is also coupled to the master boot service processor 16₀ through one of the PCI modules 20. The PCI modules 20 coupled to service processors 16₁ through 16₃ are utilized to couple the segment 12 to all other peripheral devices such as, for example, disk arrays 28₀ through 28₅, any one or more of which may be replaced by, for example, an Ethernet connection. The computer system 10 comprises sophisticated hardware and building blocks which are commodity based, with some enhancements to accommodate the uniqueness of high-performance computing ("HPC"). On the hardware side, the base unit for the computer system 10 is a segment 12. Each segment 12 contains computation and service processor 18, 16 elements, memory, power supplies, and a crossbar switch assembly. The computer system 10 is "scalable" in that an end user can configure a system that consists of from 1 to 16 interconnected segments 12. Each segment 12 contains 20 total processors: sixteen computational processors 18 and four service processors 16. In a preferred embodiment, the computational processors 18 may reside on an individual assembly that contains four processors (e.g. the Deschutes™ microprocessor available from Intel Corporation, Santa Clara, CA) and eight interface chips (i.e. two per computational processor 18). Each computational processor 18 has an internal processor clock rate greater than 300 MHz and a system clock speed greater than 100 MHz, and the interface chips provide the connection between the computational processors 18 and the memory switches that connect to memory as will be described and shown in greater detail hereafter. The service processors 16 may be contained on a service processor assembly, which is responsible for all input and output for the computer system 10.
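The segment arithmetic above can be tabulated directly. The helper below is illustrative only (all names are invented); it uses the figures stated in the text: sixteen computational and four service processors per segment, one memory bank per computational processor, 1 to 16 segments, and the 64 to 256 Mbytes per bank given in the memory discussion elsewhere in the document.

```python
COMP_PER_SEGMENT = 16    # computational processors per segment
SVC_PER_SEGMENT = 4      # service processors per segment
BANKS_PER_SEGMENT = 16   # one memory bank per computational processor

def system_totals(segments, mbytes_per_bank=256):
    """Totals for a 1- to 16-segment configuration (illustrative helper)."""
    assert 1 <= segments <= 16
    assert 64 <= mbytes_per_bank <= 256
    return {
        "computational": segments * COMP_PER_SEGMENT,
        "service": segments * SVC_PER_SEGMENT,
        "processors": segments * (COMP_PER_SEGMENT + SVC_PER_SEGMENT),
        "memory_banks": segments * BANKS_PER_SEGMENT,
        "memory_mbytes": segments * BANKS_PER_SEGMENT * mbytes_per_bank,
    }

# Full sixteen-segment system: 256 computational + 64 service = 320
# processors and 256 memory banks, matching the figures in the text.
print(system_totals(16))
```

A single segment with the minimum 64 Mbyte banks gives 20 processors and 1024 Mbytes of memory, which matches the "scalable from a few processors to hundreds" range claimed for the architecture.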
Each of the service processor assemblies contains a processor (the same type as the computational processor 18), two interface chips, two 1 Mbyte I/O buffers, and two bi-directional PCI buses. Each PCI bus has a single connector. All I/O ports have DMA capability with equal priority to the processors. The PCI modules 20 serve dual purposes, depending upon the service processor 16 with which they are used. The PCI connectors on the master boot service processor 160 are used to connect to the boot device 26 and the system console 24. The PCI modules 20 on the regular service processors 161 through 163 are used for all other peripherals. Some of the supported PCI-based interconnects include the small computer systems interface ("SCSI"), the fiber distributed data interface ("FDDI"), the high performance parallel interface ("HIPPI") and others. Each PCI bus has a corresponding commodity-based host adapter.
The separation of service functions from computing functions allows for concurrent execution of numeric processing and the servicing of operating system duties and external peripherals. With reference additionally now to Fig. 2, the interconnect strategy for the computer system 10 of Figs. 1A and 1B is shown in greater detail in an implementation employing sixteen segments 120 through 1215 interconnected by means of sixteen trunk lines 140 through 1415. As shown, a number of memory banks 500 through 5015, each allocated to a respective one of the computational processors 180 through 1815 (resulting in sixteen memory banks 50 per segment 12 and two hundred fifty six memory banks 50 in total for a sixteen segment 12 computer system 10), form a portion of the computer system 10 and are respectively coupled to the trunk lines 140 through 1415 through a like number of memory switches 520 through 5215. The memory utilized in the memory banks 500 through 5015 may be synchronous static random access memory ("SSRAM") or other suitable high speed memory devices. Also as shown, each of the segments 120 through 1215 includes, for example, twenty processors (four service processors 160 through 163 and sixteen computational processors 180 through 1815) coupled to the trunk lines 140 through 1415 through a corresponding one of a like number of processor switches 540 through 5419.
Each segment 12 interconnects to all other segments 12 through the crossbar switch. The computer system 10 crossbar switch technology enables segments 12 to have uniform memory access times across segment boundaries, as well as within the individual segment 12. It also enables the computer system 10 to employ a single memory access protocol for all the memory in the system. The crossbar switch may utilize high-speed Field Programmable Gate Arrays ("FPGAs") to provide interconnect paths between memory and the processors, regardless of where the processors and memory are physically located. This crossbar switch interconnects every segment 12 and enables the processors and memory located in different segments 12 to communicate with a uniform latency. In a preferred embodiment, each crossbar switch has a 1 clock latency per tier, which includes reconfiguration time. For a sixteen segment 12 computer system 10 utilizing three hundred and twenty processors 16, 18, only two crossbar tiers are required.
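A single memory access protocol implies that every processor decodes any flat physical address into a segment, bank, and offset in exactly the same way, regardless of where that memory physically resides. The following is a minimal sketch of such a decode; the patent does not specify an address format, so the layout below (sixteen segments of sixteen banks, a fixed 256 Mbyte bank) and all names are illustrative assumptions only.

```python
# Hypothetical uniform address decode: pure arithmetic, identical on every
# processor, so remote and local memory follow the same protocol.

SEGMENTS = 16
BANKS_PER_SEGMENT = 16
BANK_BYTES = 256 * 2**20                 # assumed fixed bank size (256 Mbytes)

def decode_address(addr):
    """Map a flat physical address to (segment, bank, offset)."""
    bank_global, offset = divmod(addr, BANK_BYTES)
    segment, bank = divmod(bank_global, BANKS_PER_SEGMENT)
    if segment >= SEGMENTS:
        raise ValueError("address beyond configured memory")
    return segment, bank, offset
```

Because the mapping is the same arithmetic everywhere, no processor needs to know whether a bank is local or in another segment; the uniform-latency crossbar makes the distinction invisible in time as well as in protocol.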
As mentioned previously, the computer system 10 may preferably utilize SSRAM for the memory banks 50 since it presents a component cycle time of 6 nanoseconds. Each memory bank 50 supports from 64 to 256 Mbytes of memory. Each computational processor 18 supports one memory bank 50, with each memory bank 50 being 256 bits wide, plus 32 parity bits for a total width of 288 bits. In addition, the memory bank 50 size may be designed to match the cache line size, resulting in a single bank access for a full cache line. Read and write memory error correction may be provided by completing parity checks on address and data packets.
The parity check for address packets may be the same for both read and write functions wherein new and old parity bits are compared to determine whether or not the memory read or write should continue or abort. When a memory "write" occurs, a parity check may be done on each of the data packets arriving in memory. Each of these data packets has an 8-bit parity code appended to it. As the data packet arrives in memory, a new 8-bit parity code is generated for the data packet and the old and new parity codes are compared. The comparison results in one of two types of codes: single bit error ("SBE") or double-bit or multi-bit error ("DBE"). The single-bit error may be corrected on the data packet before it is entered in memory. In the case of a double-bit or multi-bit error, the data packet is not written to memory, but is reported back to the processor, which retries the data packet reference. When a memory "read" occurs, each of the data packets read from memory generates an 8-bit parity code. This parity code is forwarded with the data to the processor. The processor performs single error correction and double error detection ("SECDED") on each data packet.
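The read-path behavior above — an 8-bit code per data packet with single-error correction and double-error detection ("SECDED") — can be illustrated with a classic Hamming code extended by an overall parity bit. The patent does not disclose the actual code employed, so the Hamming(12,8) layout, function names, and status strings below are purely illustrative assumptions.

```python
# Hypothetical SECDED sketch: Hamming(12,8) plus an overall parity bit.
# Bit positions run 1..12; power-of-two positions hold the check bits.

DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two positions carry data
PARITY_POS = [1, 2, 4, 8]                # power-of-two positions carry check bits

def encode(byte):
    """Return a 12-position codeword (as a dict) plus an overall parity bit."""
    word = {p: 0 for p in range(1, 13)}
    for i, p in enumerate(DATA_POS):
        word[p] = (byte >> i) & 1
    for p in PARITY_POS:
        # Each check bit covers every position whose index has that bit set.
        word[p] = sum(word[q] for q in range(1, 13) if q & p) & 1
    overall = sum(word.values()) & 1     # extends SEC to SECDED
    return word, overall

def decode(word, overall):
    """Classify a packet as ok / single-bit error (corrected) / double-bit error."""
    syndrome = 0
    for p in PARITY_POS:
        if sum(word[q] for q in range(1, 13) if q & p) & 1:
            syndrome |= p                # syndrome = position of a lone flip
    parity_ok = (sum(word.values()) & 1) == overall
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:                  # odd number of flips: assume one
        if syndrome:
            word[syndrome] ^= 1          # correct the flipped bit in place
        status = "sbe"
    else:                                # even flips with a nonzero syndrome
        status = "dbe"                   # uncorrectable: retry the reference
    data = sum(word[p] << i for i, p in enumerate(DATA_POS))
    return status, data
```

Flipping one bit of an encoded byte yields status "sbe" with the original byte recovered; flipping two bits yields "dbe", mirroring the retry behavior described for memory writes above.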
With reference additionally now to Fig. 3, a simplified illustration of a sixteen segment 12 computer system 10 is shown comprising a total of sixty four service processors 16 and two hundred and fifty six computational processors 18 for a total of three hundred and twenty processors. The service processors 16 and computational processors 18 interface to the computer program application software by means of the scalable single system image ("S3I") layer 60, as will be more fully described hereinafter. The service processors 16 handle all I/O operations, as previously described, as well as the running of the computer system 10 operating system.
With reference additionally now to Fig. 4, a more detailed illustration of the S3I layer 60 is shown as it relates to the computer program code application 62 software and the service and computational processors 16, 18. The S3I layer 60 resides on top of the application 62 and comprises a common API/ABI layer 64 as well as a computational scheduler 66 and operating software 68 layers. The computational scheduler 66 interfaces to the computational processors 180 through 1815 while the operating software 68 interfaces to the service processors 160 through 163. The operating software 68 provides an interrupt response signal 70 to the computational scheduler to control the operation of the computational processors 18 as shown. A number of memory communication buffers 72 receive data from the various computational processors 180 through 1815 and, in turn, supply data to the service processors 160 through 163. As previously described and shown, the preferred implementation of the scalable single system interconnect architecture of the present invention is on a multiprocessor computer system 10 with uniform memory access across common, shared memory comprising a plurality of memory banks 500 through 50N. As also previously described, processor subsystems may be partitioned into two groups: those which have I/O connectivity, i.e. the service processors 160 through 16N and those which have no I/O connectivity, i.e. the computational processors 180 through 18N.
The S3I utilizes a software environment consisting of service and computational processors 16, 18. A single copy of the operating system software 68 resides across all processors 16 within the service partition. Separate computational schedulers 66 exist in each computational processor 18. This software model guarantees a global resource sharing paradigm, in conjunction with a strong "single system image". Highly scalable threads of execution must be present in both the operating system software 68 and user application 62 software design model in conjunction with a high degree of software "multithreading" for efficient utilization of this architecture.
Because of the strong shared memory hardware architecture of the computer system 10, this choice of operating software 68 functionality eliminates the need for a message passing paradigm for communication between processors or across hardware boundaries. However, some level of simple communication is required between the computational and service processors 18, 16. No physical hardware boundaries are apparent to the application 62 program. All physical processors 16, 18 present the same Application Programming Interface ("API") to the end user.
To complete the scalability model, user applications 62 are able to initiate and terminate multiple threads of execution in application user space, allowing further elimination of operating system software 68 overhead and increased scalability as the number of physical processors increases.
Under the S3I architecture, a user application 62 makes requests of the system in the normal, system software mechanism. The application 62 has no awareness of whether the application 62 is executing on a service processor 16 or a computational processor 18. If the request is executed on a service processor 16, the request follows the normal operating system path directly into the operating system software 68 for processing.
If the request is executed on a computational processor 18, the request is processed by the computational scheduler 66. The thread making the request is placed on the run queue of the service processor 16 and the computational processor 18 issues a request to the service processor 16 to examine the queues. The operating software 68 executing in the service processor 16 examines the request queue and processes the request as if it had originated on the service processor 16.
The requesting computational processor 18 is either suspended until interrupt acknowledge or placed into the general scheduling tables maintained by the operating software 68 for dispatching of additional work. The service processor 16 acknowledges the original request by queuing an application thread for execution on a computational processor 18 and restoring the original application context, which places the application 62 back into execution.
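The request and acknowledge cycle described in the preceding two paragraphs can be sketched with ordinary queues and threads. This is a minimal illustration, not the patent's implementation: the names, the "read" request, and the Python queues (standing in for the shared-memory communication buffers and interrupt signaling) are all assumptions.

```python
import queue
import threading

request_q = queue.Queue()    # stands in for the service processor's request queue
response_q = queue.Queue()   # stands in for the interrupt acknowledge path

def service_processor():
    # The operating software drains the request queue and processes each
    # request as if it had originated locally on the service processor.
    while True:
        thread_id, request, args = request_q.get()
        if request == "shutdown":
            return
        result = ("done", request, args)          # stand-in for real OS work
        # Acknowledge: queue the application thread for re-execution.
        response_q.put((thread_id, result))

def computational_processor(thread_id):
    # The computational scheduler posts the request; the requesting thread
    # then suspends until the interrupt acknowledge arrives.
    request_q.put((thread_id, "read", (42,)))
    tid, result = response_q.get()                # suspended until acknowledged
    return result

svc = threading.Thread(target=service_processor)
svc.start()
computational_processor(0)
request_q.put((None, "shutdown", ()))
svc.join()
```

The design point this sketch captures is that the service side never distinguishes locally originated requests from queued ones, which is what lets a single operating system copy serve every computational processor.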
It should be noted that outside of the addition of a very small computational scheduler 66 and a small, additional component to the base operating software 68, no major operating software 68 modifications are required. Any physical processor 16 in the service partition will be able to execute within the operating system 68 simultaneously. Critical data regions may be locked utilizing standard locking primitives currently found in the underlying hardware. The base component of the software environment is the operating system software 68. The computer system 10 may use, in a preferred embodiment, an enhanced version of the SunSoft® Solaris® 2.6 operating system available from Sun Microsystems, Inc. Palo Alto, CA which is modified to achieve better performance across multiple computational and service processors 18, 16 by limiting the operating system to execute only in the service processors 16. As previously described, this technique is further accomplished by utilizing a computational scheduler 66 to communicate operating system requests and scheduling information between the service and computational processors 16, 18.
Stated another way, a single copy of the operating system software 68 executes in all service processors 16, while separate computational schedulers 66 reside in each computational processor 18. This software model provides for global resource sharing in conjunction with a strong scalable single system image. The computer system 10 of the present invention provides for highly scalable threads of execution in both the operating system software 68 and user application software 62, in conjunction with a high degree of software "multithreading". While there have been described above the principles of the present invention in conjunction with a specific computer architecture, any number of service and/or computational processors may be utilized and it is to be clearly understood that the foregoing description is made only by way of example and not as a limitation to the scope of the invention. Particularly, it is recognized that the teachings of the foregoing disclosure will suggest other modifications to those persons skilled in the relevant art. Such modifications may involve other features which are already known per se and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure herein also includes any novel feature or any novel combination of features disclosed either explicitly or implicitly or any generalization or modification thereof which would be apparent to persons skilled in the relevant art, whether or not such relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as confronted by the present invention. 
The applicants hereby reserve the right to formulate new claims to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.

Claims

WHAT IS CLAIMED IS:
1. A multi-processor computer system including operating software, said computer system comprising: a first plurality of service processors functioning in conjunction with said operating software, said service processors handling all input/output functionality for said computer system; and a second plurality of computational processors, said computational processors functioning in conjunction with a computational scheduler, said operating software and computational scheduler providing a communication medium between said service and computational processors.
2. The multi-processor computer system of claim 1 wherein said communication medium is operational to enable multiple input/output requests to said service processors.
3. The multi-processor computer system of claim 2 wherein said communication medium is operational to enable asynchronous responses to said input/output requests.
4. The multi-processor computer system of claim 1 wherein said service and computational processors have shared access to a plurality of associated memory banks.
5. The multi-processor computer system of claim 1 further comprising: a common application programming interface operationally coupled to said computational scheduler and said operating software, said application programming interface, said computational scheduler and said operating software comprising a single system image application programming interface.
6. The multi-processor computer system of claim 5 wherein said single system image application programming interface is scalable across said first plurality of service processors and said second plurality of computational processors.
7. The multi-processor computer system of claim 1 further comprising: a system console coupled to at least one of said first plurality of service processors for enabling a user of said computer system to interact therewith.
8. The multi-processor computer system of claim 7 further comprising: a boot device coupled to said at least one of said first plurality of service processors for booting said computer system.
9. The multi-processor computer system of claim 1 further comprising: at least one computer mass storage device coupled to at least one of said first plurality of service processors.
10. The multi-processor computer system of claim 1 wherein said first plurality of service processors and said second plurality of computational processors comprise a first computer system segment and said computer system further comprises at least one additional computer system segment comprising: a third plurality of service processors functioning in conjunction with said operating software; and a fourth plurality of computational processors functioning in conjunction with said computational scheduler.
11. A multi-processor computer system including operating software, said computer system comprising: a first plurality of service processors and a second plurality of computational processors, each of said service processors functioning in conjunction with said operating software and handling all input/output functionality for said computer system and each of said computational processors functioning in conjunction with a computational scheduler, said operating software and computational scheduler providing a communication medium between said service and computational processors.
12. The multi-processor computer system of claim 11 wherein said communication medium is operational to enable multiple input/output requests to said service processors.
13. The multi-processor computer system of claim 12 wherein said communication medium is operational to enable asynchronous responses to said input/output requests.
14. The multi-processor computer system of claim 11 wherein said service and computational processors have shared access to a plurality of associated memory banks.
15. The multi-processor computer system of claim 11 further comprising: a common application programming interface operationally coupled to said computational scheduler and said operating software, said application programming interface, said computational scheduler and said operating software comprising a single system image application programming interface.
16. The multi-processor computer system of claim 15 wherein said single system image application programming interface is scalable across said n computing segments.
17. The multi-processor computer system of claim 11 further comprising: a system console coupled to at least one of said first plurality of service processors of one of said n computing segments for enabling a user of said computer system to interact therewith.
18. The multi-processor computer system of claim 17 further comprising: a boot device coupled to said at least one of said first plurality of service processors of one of said n computing segments for booting said computer system.
19. The multi-processor computer system of claim 11 further comprising: at least one computer mass storage device coupled to at least one of said first plurality of service processors of one of said n computing segments.
PCT/US1998/025586 1998-01-20 1998-12-03 Scalable single system image operating software architecture for a multi-processing computer system WO1999036851A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP98962876A EP1064597A1 (en) 1998-01-20 1998-12-03 Scalable single system image operating software architecture for a multi-processing computer system
JP2000540495A JP2002509311A (en) 1998-01-20 1998-12-03 Scalable single-system image operating software for multi-processing computer systems
CA002317132A CA2317132A1 (en) 1998-01-20 1998-12-03 Scalable single system image operating software architecture for a multi-processing computer system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US887198A 1998-01-20 1998-01-20
US09/008,871 1998-01-20

Publications (1)

Publication Number Publication Date
WO1999036851A1 (en) 1999-07-22

Family

ID=21734176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/025586 WO1999036851A1 (en) 1998-01-20 1998-12-03 Scalable single system image operating software architecture for a multi-processing computer system

Country Status (4)

Country Link
EP (1) EP1064597A1 (en)
JP (1) JP2002509311A (en)
CA (1) CA2317132A1 (en)
WO (1) WO1999036851A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1834237A1 (en) * 2004-12-30 2007-09-19 Koninklijke Philips Electronics N.V. Data processing arrangement

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5109512A (en) * 1990-05-31 1992-04-28 International Business Machines Corporation Process for dispatching tasks among multiple information processors
US5325526A (en) * 1992-05-12 1994-06-28 Intel Corporation Task scheduling in a multicomputer system
US5675795A (en) * 1993-04-26 1997-10-07 International Business Machines Corporation Boot architecture for microkernel-based systems


Also Published As

Publication number Publication date
EP1064597A1 (en) 2001-01-03
CA2317132A1 (en) 1999-07-22
JP2002509311A (en) 2002-03-26


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP MX

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref document number: 2317132

Country of ref document: CA

Ref country code: CA

Ref document number: 2317132

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1998962876

Country of ref document: EP

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 540495

Kind code of ref document: A

Format of ref document f/p: F

WWP Wipo information: published in national office

Ref document number: 1998962876

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1998962876

Country of ref document: EP