WO2017135953A1 - Shared memory access - Google Patents

Shared memory access

Info

Publication number
WO2017135953A1
WO2017135953A1 PCT/US2016/016565 US2016016565W WO2017135953A1 WO 2017135953 A1 WO2017135953 A1 WO 2017135953A1 US 2016016565 W US2016016565 W US 2016016565W WO 2017135953 A1 WO2017135953 A1 WO 2017135953A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
shared
access
compute nodes
memory
Prior art date
Application number
PCT/US2016/016565
Other languages
French (fr)
Inventor
Joaquim Gomes Da Costa Eulalio De SOUZA
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp
Priority to PCT/US2016/016565
Publication of WO2017135953A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support


Abstract

In one example, a system can include non-volatile memory including shared memory storing a plurality of data. The system can include a plurality of compute nodes. Each of the plurality of compute nodes can access the plurality of data directly independent of communicating through an additional node of the plurality of compute nodes and independent of transferring data of the plurality of data via a network.

Description

Background
[0001] Computing systems can utilize hardware, software, and/or logic to execute a number of operations. The computing system can include a number of nodes that access a number of memories. Each of the number of nodes can be associated with a particular portion of the number of memories. When a particular node wants to access memory that is associated with an additional node, the particular node can
communicate with the additional node through a network to access the memory. The particular node can use network communication (such as TCP/IP) based reads to access data from the additional node. The number of nodes can use a disk-centric shared-nothing distributed file system (such as Hadoop Distributed File System (HDFS)) to access memory at the number of nodes.
Brief Description of the Drawings
[0002] Figure 1 illustrates a diagram of an example of a system for shared memory access consistent with the present disclosure.
[0003] Figure 2 illustrates a diagram of an example of a system for shared memory access consistent with the present disclosure.
[0004] Figure 3 illustrates a diagram of an example of a system for shared memory access consistent with the present disclosure.
[0005] Figure 4 illustrates a diagram of an example of a computing system for shared memory access consistent with the present disclosure.
[0006] Figure 5 illustrates a diagram of an example computing device for shared memory access consistent with the present disclosure.
Detailed Description
[0007] A computing framework can include a cluster of hardware and a number of memories. A number of nodes (e.g., computing nodes) in the computing framework can be used to access the memory (e.g., non-volatile memory) through hardware. Each of the number of nodes can access the memory as shared memory. For example, the memory can be shared across the number of nodes without partitioning particular portions of memory to a particular node. A local file system (FS) (e.g., such as Ext4 + DAX) can be used by each node to access data (e.g., input data) stored in the shared memory. In some examples, the local FS can be used instead of a Hadoop Distributed File System (HDFS) protocol. In some examples, the memory can be formed of and/or include memristors that are shared across the nodes.
[0008] The number of nodes can access the memory through a number of layers of protocol and/or applications. For example, the number of layers can include a dataflow engine, a Java interface (e.g., Java Virtual Machine (JVM) interface), an operating system (OS), a file system, and/or a block device (BD). The number of nodes can access the memory by bypassing the JVM interface and avoiding use of the Java language. The JVM interface can be bypassed by using a JVM "shim" ("JNI-fs") which passes to the file system (FS) (e.g., Ext4 + DAX) without further accessing Java, allowing access of the data in native code (e.g., native machine code). In addition, in some examples, the OS, the FS, and/or the BD can be bypassed. In this way, the number of nodes can go directly to using native code and thereby use a thinner software stack. In addition, the number of nodes can bypass going through other nodes using a network, as the memory is shared by the nodes directly.
[0009] Figure 1 illustrates a diagram of an example of a system for shared memory access consistent with the present disclosure. The system in Figure 1 illustrates a number of nodes ("Node 1," "Node 2," "Node 3") 110-1, 110-2, ..., 110-N that can be compute nodes, referenced herein as number of nodes 110. A compute node can refer to a desktop or server system, a virtual machine (VM), a blade in a blade server arrangement, and/or any other suitable computing resource or device. The number of nodes 110-1, 110-2, ..., 110-N can communicate with a number of memories (e.g., non-volatile memories) 128-1, 128-2, 128-3, etc., referenced herein as memories 128. The memories 128 can include a number of non-volatile memories (such as memristors, in some examples). The number of nodes 110 can transfer data from at least one of the memories 128 through a number of communication layers. The communication layers (e.g., a software stack) can include applications ("APP") 112-1, 112-2, 112-3, dataflow engines 114-1, 114-2, 114-3, and Hadoop Distributed File Systems 116-1, 116-2, 116-3. The dataflow engine 114-1 refers to a cluster computing framework designed for a cluster of hardware in a particular computing architecture environment.
[0010] In Figure 1, the particular computing architecture is referred to as a "shared nothing" architecture. In a "shared nothing" architecture, memory resources (e.g., memories 128) can be directly connected to a particular compute node (and not any other compute node). In this architecture, a particular compute node (e.g., node 110-1) uses a network 118 to access memories (e.g., memory 128-2) that are not directly connected to the compute node. For example, the particular node 110-1 would communicate with node 110-2 through network 118 to access memory 128-2. However, the particular node 110-1 can access memory 128-1 without transferring data via the network 118, as the particular node 110-1 is directly connected to memory 128-1.
[0011] A node 110-1 can communicate through the network 118 to a memory 128-1 through additional layers such as a Java layer (e.g., Java Virtual Machine (JVM)) 120-1, an OS 122-1, a file system 124-1, and a block device 126-1. A JVM 120-1 is an abstract computing machine that enables a computing node to run a Java program. The operating system 122-1 is system software that can manage computer hardware and/or software resources. A file system 124-1 can be file system code at a kernel level that is aware of an internal format used to store data, handles structures that represent the data (e.g., such as inodes), and is aware of how to read and/or write the data. The file system 124-1 can refer to a journaling file system that monitors changes not yet committed to the file system's main portion by recording intentions of such changes in a data structure known as a 'journal.' A block device 126-1 refers to a computer data storage device drive that supports reading and/or writing data in fixed-size blocks, sectors, and/or clusters, generally 512 bytes or a multiple thereof in size.
[0012] Data movement in this "share nothing" architecture, both in initial load and in shuffling, can rely on disk-based file systems, such as HDFS 116, to persist data, as this architecture uses disks to store persistent data. The disk-based file systems (e.g., HDFS 116-1, 116-2) can connect a node's (e.g., node 110-1's) file system (e.g., FS 124-1) to another node's (e.g., node 110-2's) file system (e.g., FS 124-2) using a Java-based (e.g., Java 120-1, 120-2) remote-access layer. The disk-based file systems use replication (commonly configured as triplication) to provide availability of data when a node crashes. The "share nothing" architecture moves large amounts of data for processing by the nodes 110 and transfers this data via a network 118, which can slow down the data transferring process. The data is typically written in a managed language such as Java and/or Scala that has a large amount of garbage collection, data copy, and/or processing overheads.
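For contrast, below is a minimal sketch of the kind of remote read that the "share nothing" path implies, written against the standard Hadoop FileSystem API. The cluster address and file path are hypothetical, and the sketch is illustrative only; it is not part of the disclosed system.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SharedNothingRead {
    public static void main(String[] args) throws Exception {
        // Cluster address and file path are hypothetical examples.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/input/part-00000"))) {
            byte[] buf = new byte[4096];
            // Any block that is not stored on this node is served by another
            // node over TCP/IP before it reaches the dataflow engine.
            int n = in.read(buf);
            System.out.println("read " + n + " bytes through the HDFS layers");
        }
    }
}
```

Every byte that is not on a local disk crosses the network 118 and passes through the Java-based HDFS layers before the dataflow engine can process it.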
[0013] While Figure 1 is described above as using a shared nothing architecture, a shared something architecture can also be used by inserting a number of components (e.g., JNI-FS 221-1 in Figure 2) and bypassing and/or avoiding usage of a number of components (e.g., HDFS 116-1 to 116-3, BD 126-1 to 126-3, and, in some examples, OS 122-1 to 122-3). The inserted number of components can be located within the system of Figure 1 but are not illustrated, for ease of description of the shared nothing operation. Similarly, the bypassed components of Figure 1 can be within Figure 2 even though they are bypassed and are not illustrated, for ease of reference of the shared something architecture.
[0014] Figure 2 illustrates a diagram of an example of a system for shared memory access consistent with the present disclosure. The system in Figure 2 can be a multi-compute node system. The system in Figure 2 illustrates a number of nodes ("Node 1," "Node 2," "Node 3") 210-1, 210-2, ..., 210-N that can be compute nodes, referenced herein as number of nodes 210. The number of nodes 210-1, 210-2, ..., 210-N can communicate with memory (e.g., non-volatile memory, illustrated as "MEM") 228. The memory 228 can be a number of memristors. In some examples, the memory 228 can be non-volatile shared memory that is made available and/or shared with the number of nodes 210 through an Ext4/DAX file system. The number of nodes 210 can transfer data from memory 228 through a number of communication layers. The communication layers can include applications 212-1, 212-2, 212-3, dataflow engines 214-1, 214-2, 214-3, Java interfaces 219-1, 219-2, 219-3, an OS 222, and a file system ("FS") 224.
[0015] Figure 2 illustrates a particular computing architecture referred to as a "shared something" architecture. In a "shared something" architecture, memory resources (e.g., "MEM" 228) can be shared by a number of nodes 210 that each connect directly to the memory 228. In this architecture, a particular compute node (e.g., node 210-1) can communicate with (e.g., send and receive data with) memory 228 without using a network (such as network 118 in Figure 1). For example, the particular node 210-1 would communicate through additional layers, including a FS 224 through JNI-fs 221-1, to connect with the memory 228. A file system 224 can include the Ext4 file system.
[0016] Data movement in this "share something" architecture, both in initial load and in shuffling, can transfer data without use of an HDFS (such as HDFS 116 in Figure 1). The "share something" architecture moves large amounts of data for processing by the nodes 210 without transferring data via a network (such as network 118 in Figure 1), which can speed up the data transferring process.
[0017] A node 210-1 of the number of nodes 210 can run an application 212-1 using a Java interface 219-1. The Java interface 219-1 can communicate with a Java Native Interface (JNI-fs) 221-1, which is referred to herein as a file system "shim." A file system "shim" refers to an interface that allows the Java interface 219-1 to interface directly with native code (e.g., C/C++ code) of the data stored in memory 228. The Java interface 219-1 can connect directly with a file system 224 without using additional Java. When possible, the Java interface 219-1 can connect directly to the memory 228 for data loads to be issued from and/or stored into memory 228, using the JNI-fs 221-1 as a shim to communicate directly with the memory 228. In some examples, the file system 224 can interact with an OS 222 in order to check for permission to access. In some examples, the file system 224 can load directly from the memories (e.g., memristors) 228 and bypass the OS 222 kernel. For example, the file system 224 can bypass the OS 222 for a read of an already opened file using mmap-like instructions.
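As a rough illustration of the shim idea, the following Java sketch declares native entry points that a C/C++ library could implement against an Ext4 + DAX mount. The class name, method signatures, and library name are assumptions for illustration and are not the interface disclosed here.

```java
// Hypothetical JNI "shim": Java declares native entry points whose bodies live
// in a C/C++ library that reads the DAX-backed file directly (e.g., via mmap).
public final class JniFsShim {
    static {
        System.loadLibrary("jnifs"); // library name is an assumption
    }

    private JniFsShim() {}

    // Open a file on the shared-memory file system; returns a native handle.
    public static native long open(String pathname);

    // Read 'length' bytes starting at 'byteOffset' into 'dst', performed in
    // native code so the data does not pass through managed Java I/O layers
    // or incur garbage-collection and copy overheads.
    public static native int readAt(long handle, long byteOffset, byte[] dst, int length);

    public static native void close(long handle);
}
```

A dataflow engine could then issue data loads such as JniFsShim.readAt(handle, byteOffset, buffer, buffer.length) without routing the data through additional Java layers or another node.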
[0018] Figure 3 illustrates a diagram of an example of a system for shared memory access consistent with the present disclosure. The system in Figure 3 can include a first multi-compute node system 331 (such as the multi-compute node system illustrated in Figure 2) sharing a first portion of memory 328-1 and a second multi-compute node system 333 sharing a second portion of memory 328-2. The first multi-compute node system 331 can include a software stack including applications ("App") 312-1, 312-2, 312-3, Java Native Interface shims (JNI-fs) 321-1, 321-2, 321-3, Java interfaces 320-1, 320-2, 320-3, a FS 324, operating systems 322-1, 322-2, 322-3, and shared memory 328-1. The second multi-compute node system can include a software stack including the same features (e.g., applications 312-4, 312-5, 312-6, JNI-fs 321-4, 321-5, 321-6, Java interfaces 320-4, 320-5, 320-6, file system 324, operating systems 322-4, 322-5, 322-6, and shared memory 328-2).
[0019] FS 324 can be shared by the first and second multi-compute node systems 331 and 333. A shared memory 328-1 can be shared through a Remote Direct Memory Access (RDMA) 335 with a shared memory 328-2. An RDMA 335 refers to a direct memory access from memory of one computing device into memory of another computing device without involving an operating system associated with either computing device. This can allow for high-throughput, low-latency networking.
[0020] Figures 4 and 5 illustrate examples of a system and a computing device 554 consistent with the present disclosure. Figure 4 illustrates a diagram of an example of a system for shared memory access consistent with the present disclosure. The system can include a database 442, a shared memory access system 444, and/or a number of engines (e.g., locate engine 446, access engine 448, load engine 450). The shared memory access system 444 can be in communication with the database 442 via a communication link, and can include the number of engines (e.g., locate engine 446, access engine 448, load engine 450). The shared memory access system 444 can include additional or fewer engines than are illustrated to perform the various functions described above in Figures 1-3.
[0021] The number of engines (e.g., locate engine 446, access engine 448, load engine 450) can include a combination of hardware and programming, but at least hardware, that is to perform functions described herein (e.g., locate a portion of data of a shared memory (e.g., a memristor memory) using a file pathname and a byte offset, access the located portion of the data from the shared memory, load the located portion of data into a compute node of a plurality of compute nodes using a Java Native Interface, etc.) stored in a memory resource (e.g., computer readable medium, machine readable medium, etc.) as well as hard-wired program (e.g., logic).
[0022] The locate engine 446 can include hardware and/or a combination of hardware and programming, but at least hardware, to locate a portion of data of a shared memory using a file pathname and a byte offset. The portion of data can be located on a memory (e.g., such as memory 228 in Figure 2). The memory (e.g., a memristor memory) can be shared by a number of nodes (e.g., number of nodes 210 in Figure 2). The pathname of the file and a byte offset can allow at least a node of the number of nodes to locate the file in the memory. The byte offset can indicate a location of a file in the memory at a particular position in the memory. The file pathname can identify the file to be accessed in the memory.
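One way to picture locating data by file pathname and byte offset is the standard Java NIO memory-mapping API, sketched below. The pathname, offset, and length are hypothetical values, and this is a plain-Java approximation of what a locate engine might do rather than the disclosed implementation.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LocateByOffset {
    // Map 'length' bytes of the file at 'pathname' starting at 'byteOffset'.
    // On a DAX-capable file system the mapping can reference the shared
    // non-volatile memory without an intermediate block-device copy.
    static MappedByteBuffer locate(String pathname, long byteOffset, long length)
            throws IOException {
        Path path = Paths.get(pathname);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, byteOffset, length);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pathname, offset, and length are illustrative; together they
        // identify where the portion of data sits in the shared memory.
        MappedByteBuffer portion = locate("/mnt/shared/dataset.bin", 4096L, 1024L);
        byte first = portion.get(0); // read without any network transfer
        System.out.println("first byte: " + first);
    }
}
```

Because the mapping references the file's pages directly, a node can reach the located portion without copying it through another node or across a network.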
[0023] The access engine 448 can include hardware and/or a combination of hardware and programming, but at least hardware, to access a located portion of the data from the shared memory. The file can be accessed without transferring data on a network. The file can be accessed without using TCP/IP based reads to access the data from another node as the data is shared across all of the number of nodes.
[0024] The load engine 450 can include hardware and/or a combination of hardware and programming, but at least hardware, to load a located portion of data into a compute node of a plurality of compute nodes using a Java Native Interface (JNI). The located portion of data can be loaded into the compute node without using an additional compute node of the plurality of compute nodes. [0025] Figure 5 illustrates a diagram of an example computing device 554 consistent with the present disclosure. The computing device 554 can utilize software, hardware, firmware, and/or logic to perform functions described herein.
[0026] The computing device 554 can be any combination of hardware and program instructions to share information. The hardware, for example, can include a processing resource 556 and/or a memory resource 570 (e.g., computer-readable medium (CRM), machine readable medium (MRM), database, etc.). A processing resource 556, as used herein, can include any number of processors capable of executing instructions stored by a memory resource 570.
[0027] Processing resource 556 may be implemented in a single device or distributed across multiple devices. The program instructions (e.g., computer readable instructions (CRI)) can include instructions stored on the memory resource 570 and executable by the processing resource 556 to implement a function (e.g., locate a portion of data of a shared memory using a file pathname and a byte offset, access the located portion of the data from the shared memory, load the located portion of data into a compute node of a plurality of compute nodes using a Java Native Interface, etc.).
[0028] The memory resource 570 can be in communication with a processing resource 556. A memory resource 570, as used herein, can include any number of memory components capable of storing instructions that can be executed by processing resource 556. Such memory resource 570 can be a non-transitory CRM or MRM.
Memory resource 570 may be integrated in a single device or distributed across multiple devices. Further, memory resource 570 may be fully or partially integrated in the same device as processing resource 556 or it may be separate but accessible to that device and processing resource 556. Thus, it is noted that the computing device 554 may be implemented on a participant device, on a server device, on a collection of server devices, and/or a combination of the participant device and the server device.
[0029] The memory resource 570 can be in communication with the processing resource 556 via a communication link (e.g., a path) 558. The communication link 558 can be local or remote to a machine (e.g., a computing device) associated with the processing resource 556. Examples of a local communication link 558 can include an electronic bus internal to a machine (e.g., a computing device) where the memory resource 570 is a volatile, non-volatile, fixed, and/or removable storage medium in communication with the processing resource 556 via the electronic bus.
[0030] A number of modules (e.g., locate module 572, access module 574, load module 576) can include CRI that when executed by the processing resource 556 can perform functions. The number of modules such as the locate module 572, the access module 574, and/or the load module 576 can be sub-modules of other modules. For example, the locate module 572 and the access module 574 can be sub-modules and/or contained within the same computing device. In another example, the number of modules (e.g., a locate module 572, an access module 574, a load module 576) can comprise individual modules at separate and distinct locations (e.g., CRM, etc.).
[0031] Each of the number of modules (e.g., locate module 572, access module 574, load module 576) can include instructions that when executed by the processing resource 556 can function as a corresponding engine as described herein. For example, the locate module 572 can include instructions that when executed by the processing resource 556 can function as the locate engine 446.
[0032] As used herein, "logic" is an alternative or additional processing resource to perform a particular action and/or function, etc., described herein, which includes hardware, e.g., various forms of transistor logic, application specific integrated circuits (ASICs), etc., as opposed to computer executable instructions, e.g., software, firmware, etc., stored in memory and executable by a processor. Further, as used herein, "a" or "a number of" something can refer to one or more such things. For example, "a number of widgets" can refer to one or more widgets.
[0033] The above specification, examples and data provide a description of the method and applications, and use of the system and method of the present disclosure. Since many examples can be made without departing from the spirit and scope of the system and method of the present disclosure, this specification merely sets forth some of the many possible example configurations and implementations.

Claims

What is claimed:
1. A system, comprising:
shared memory storing a plurality of shared data;
a plurality of compute nodes, wherein each of the plurality of compute nodes access the plurality of shared data directly independent of:
communicating through an additional node of the plurality of compute nodes; and transferring data of the plurality of data via a network.
2. The system of claim 1, wherein the plurality of compute nodes access the plurality of shared data using native machine code.
3. The system of claim 2, wherein the plurality of compute nodes access the plurality of shared data using a Java Native Interface (JNI) to read data using the native machine code.
4. The system of claim 2, wherein the plurality of compute nodes access the plurality of shared data without using managed Java code.
5. The system of claim 1, wherein the plurality of compute nodes access the plurality of shared data independent of using a Hadoop Distributed File System (HDFS).
6. A system comprising:
a pool of shared non-volatile memory;
a plurality of compute nodes with direct and equal access to the entirety of the pool of shared non-volatile memory, wherein each of the plurality of compute nodes access each portion of the pool of shared non-volatile memory:
independent of sending data over a network; and
using a shared file system.
7. The system of claim 6, comprising an operating system (OS) shared by the plurality of compute nodes.
8. The system of claim 7, wherein the operating system is bypassed by the plurality of compute nodes to access the pool of shared non-volatile memory.
9. The system of claim 8, wherein each of the processors execute instructions to run an application that uses a Java Native Interface to access the native code stored in the shared non-volatile memory.
10. The system of claim 9, wherein the native code stored in the shared non-volatile memory is accessed without using Java or a Java Virtual Machine.
11. The system of claim 6, comprising a non-transitory computer readable medium storing instructions executable by a processor for each of the plurality of compute nodes to run an application.
12. A non-transitory computer-readable medium storing instructions executable by a processor to:
locate a portion of data of a shared memory using a file pathname and a byte offset;
access the located portion of the data from the shared memory; and
load the located portion of data into a compute node of a plurality of compute nodes using a Java Native Interface (JNI), wherein the located portion of data is loaded into the compute node without using an additional compute node of the plurality of compute nodes.
13. The non-transitory computer-readable medium of claim 12, wherein the instructions are executable by the processor to load the located portion of the data into the compute node without accessing a network.
14. The non-transitory computer-readable medium of claim 12, wherein the instructions are executable by the processor to access the located portion of data by an additional compute node of the plurality of compute nodes without communicating through the compute node and without transferring data via a network.
15. The non-transitory computer-readable medium of claim 12, wherein the instructions are executable by the processor to load the located portion of the data without partitioning data on the compute node of the plurality of compute nodes.
PCT/US2016/016565 2016-02-04 2016-02-04 Shared memory access WO2017135953A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2016/016565 WO2017135953A1 (en) 2016-02-04 2016-02-04 Shared memory access

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2016/016565 WO2017135953A1 (en) 2016-02-04 2016-02-04 Shared memory access

Publications (1)

Publication Number Publication Date
WO2017135953A1 true WO2017135953A1 (en) 2017-08-10

Family

ID=59499956

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/016565 WO2017135953A1 (en) 2016-02-04 2016-02-04 Shared memory access

Country Status (1)

Country Link
WO (1) WO2017135953A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070156907A1 (en) * 2005-12-30 2007-07-05 Galin Galchev Session handling based on shared session information
US20110264841A1 (en) * 2010-04-26 2011-10-27 International Business Machines Corporation Sharing of class data among virtual machine applications running on guests in virtualized environment using memory management facility
US20120216216A1 (en) * 2011-02-21 2012-08-23 Universidade Da Coruna Method and middleware for efficient messaging on clusters of multi-core processors
US20150220354A1 (en) * 2013-11-26 2015-08-06 Dynavisor, Inc. Dynamic I/O Virtualization
WO2015195079A1 (en) * 2014-06-16 2015-12-23 Hewlett-Packard Development Company, L.P. Virtual node deployments of cluster-based applications


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16889602

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16889602

Country of ref document: EP

Kind code of ref document: A1