US20120011144A1 - Aggregation in parallel computation environments with shared memory - Google Patents
- Publication number: US20120011144A1 (U.S. application Ser. No. 12/978,194)
- Authority: US (United States)
- Prior art keywords: execution threads, local hash, hash tables
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F 16/24544 — Join order optimisation
- G06F 16/2255 — Hash tables
- G06F 16/24532 — Query optimisation of parallel queries
Definitions
- Some embodiments relate to a data structure. More specifically, some embodiments provide a method and system for a data structure and use of same in parallel computing environments.
- A number of presently developed and developing computer systems include multiple processors in an attempt to provide increased computing performance. Advances in computing performance, including, for example, processing speed and throughput, may be provided by parallel computing systems and devices as compared to single-processor systems that sequentially process programs and instructions.
- FIG. 1 is block diagram of a system according to some embodiments.
- FIG. 2 is a block diagram of an operating environment according to some embodiments.
- FIGS. 3A-3D are illustrative depictions of various aspects of a data structure according to some embodiments.
- FIG. 4 is a flow diagram of a method relating to a data structure, according to some embodiments herein.
- FIGS. 5A-5D provide illustrative examples of some data tables according to some embodiments.
- FIG. 6 is an illustrative depiction of an aggregation flow, in some embodiments herein.
- FIG. 7 is a flow diagram of a method relating to an aggregation flow, according to some embodiments herein.
- In an effort to more fully and efficiently use the resources of a particular computing environment, a data structure and techniques of using that data structure may be developed to fully exploit the design characteristics and capabilities of that particular computing environment.
- In some embodiments herein, a data structure and techniques for using that data structure (i.e., algorithms) are provided for efficient use in a parallel computing environment with shared memory.
- As used herein, the term parallel computation environment with shared memory refers to a system or device having more than one processing unit.
- the multiple processing units may be processors, processor cores, multi-core processors, etc. All of the processing units can access a main memory (i.e., a shared memory architecture). All of the processing units can run or execute the same program(s). As used herein, a running program may be referred to as a thread.
- Memory may be organized in a hierarchy of multiple levels, where faster but smaller memory units are located closer to the processing units. The smaller and faster memory units located nearer the processing units as compared to the main memory are referred to as cache.
- FIG. 1 is a block diagram overview of a device, system, or apparatus 100 that may be used in providing an index hash table or hash map in accordance with some aspects and embodiments herein, as well as in providing a parallel aggregation based on such data structures.
- System 100 may be, for example, associated with any of the devices described herein and may include a plurality of processing units 105 , 110 , and 115 .
- The processing units may comprise one or more commercially available Central Processing Units (CPUs) in the form of single-chip microprocessors or a multi-core processor, coupled to a communication device 120 configured to communicate via a communication network (not shown in FIG. 1 ) with an end client (not shown in FIG. 1 ).
- Device 100 may also include a local cache memory associated with each of the processing units 105 , 110 , and 115 such as RAM memory modules.
- Communication device 120 may be used to communicate, for example, with one or more client devices or business service providers.
- System 100 further includes an input device 125 (e.g., a mouse and/or keyboard to enter content) and an output device 130 (e.g., a computer monitor to display a user interface element).
- Processing units 105 , 110 , and 115 communicate with a shared memory 135 via a system bus 175 .
- System bus 175 also provides a mechanism for the processing units to communicate with a storage device 140 .
- Storage device 140 may include any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, and/or semiconductor memory devices for storing data and programs.
- Storage device 140 stores a program 145 for controlling the processing units 105 , 110 , and 115 and query engine application 150 for executing queries.
- Processing units 105 , 110 , and 115 may perform instructions of the program 145 and thereby operate in accordance with any of the embodiments described herein. For example, the processing units may concurrently execute a plurality of execution threads to build the index hash table data structures disclosed herein.
- query engine 150 may operate to execute a parallel aggregation operation in accordance with aspects herein in cooperation with the processing units and by accessing database 155 .
- Program 145 and other instructions may be stored in a compressed, uncompiled and/or encrypted format.
- Program 145 may also include other program elements, such as an operating system, a database management system, and/or device drivers used by the processing units 105 , 110 , and 115 to interface with peripheral devices.
- storage device 140 includes a database 155 to facilitate the execution of queries based on input table data.
- the database may include data structures (e.g., index hash tables), rules, and conditions for executing a query in a parallel computation environment such as that of FIGS. 1 and 2 .
- the data structure disclosed herein as being developed for use in parallel computing environments with shared memory is referred to as a parallel hash table.
- the parallel hash table may also be referred to as a parallel hash map.
- A hash table may be provided and used as an index structure for data storage to enable fast data retrieval.
- the parallel hash table disclosed herein may be used in a parallel computation environment where multiple concurrently executing (i.e., running) threads insert and retrieve data in tables.
- an aggregation algorithm that uses the parallel hash tables herein is provided for computing an aggregate in a parallel computation environment.
- FIG. 2 provides an illustrative example of a computation environment 200 compatible with some embodiments herein. While computation environment 200 may be compatible with some embodiments of the data structures and the methods herein, the data structures and the methods herein are not limited to the example computation environment 200 . Processes to store, retrieve, and perform operations on data may be facilitated by a database system (DBS) and a data warehouse (DWH).
- DBS 210 is a server.
- DBS 210 further includes a database management system (DBMS) 215 .
- DBMS 215 may comprise software (e.g., programs, instructions, code, applications, services, etc.) that controls the organization of and access to database 225 that stores data.
- Database 225 may include an internal memory, an external memory, or other configurations of memory.
- Database 225 may be capable of storing large amounts of data, including relational data. The relational data may be stored in tables.
- a plurality of clients such as example client 205 , may communicate with DBS 210 via a communication link (e.g., a network) and specified application programming interfaces (APIs).
- the API language provided by DBS 210 is SQL, the Structured Query Language.
- Client 205 may communicate with DBS 210 using SQL to, for example, create and delete tables; insert, update, and delete data; and query data.
- a user may submit a query from client 205 in the form of a SQL query statement to DBS 210 .
- DBMS 215 may execute the query by evaluating the parameters of the query statement and accessing database 225 as needed to produce a result 230 .
- the result 230 may be provided to client 205 for storage and/or presentation to the user.
- One type of query is an aggregation query.
- a parallel aggregation algorithm, process, or operation may be used to compute SQL aggregates.
- Consider client 205 wanting to group or aggregate data of a table stored in database 225 (e.g., a user at client 205 may desire to know the average salaries of the employees in all of a company's departments).
- Client 205 may connect to DBS 210 and issue a SQL query statement that describes and specifies the desired aggregation.
- DBMS 215 may create an executable instance of the parallel aggregation algorithm herein, provide it with the information needed to run the parallel aggregation algorithm (e.g., the name of a table to access, the columns to group by, the columns to aggregate, the aggregation function, etc.), and run the parallel aggregation operation or algorithm.
- the parallel aggregation algorithm herein may create an index hash map 220 .
- the index hash map may be used to keep track of intermediate result data.
- An overall result comprising a result table may be computed based on the index hash map(s) containing the intermediate results.
- the overall parallel aggregation result may be transmitted to client 205 .
- DWHs may be built on top of DBSs.
- a use-case of a DWH may be similar in some respects to DBS 210 of FIG. 2 .
- the computation environment of FIG. 2 may include a plurality of processors that can operate concurrently, in parallel and include a device or system similar to that described in FIG. 1 . Additionally, the computation environment of FIG. 2 may have a memory that is shared amongst the plurality of processors, for example, like the system of FIG. 1 . In order to fully capitalize on the parallel processing power of such a computation environment, the data structures used by the system may be designed, developed or adapted for being efficiently used in the parallel computing environment.
- a hash table is a fundamental data structure in computer science that is used for mapping “keys” (e.g., the names of people) to the associated values of the keys (e.g., the phone number of the people) for fast data look-up.
- a conventional hash table stores key—value pairs.
- Conventional hash tables are designed for sequential processing.
- The index hash map is a lock-free, cache-efficient hash data structure developed for parallel computation environments with shared memory.
- the index hash map may be adapted to column stores.
- the index hash map herein does not store key—value pairs.
- the index hash map herein generates key—index pairs by mapping each distinct key to a unique integer.
- Each time a new distinct key is inserted in the index hash map, the index hash map increments an internal counter and assigns the value of the counter to the key to produce a key—index pair.
- the counter may provide, at any time, the cardinality of an input set of keys that have thus far been inserted in the hash map.
- the key—index mapping may be used to share a single hash map among different columns (or value arrays).
- the associated index for the key has to be calculated just once.
- the use of key—index pairs may facilitate bulk insertion in columnar storages. Inserting a set of key—index pairs may entail inserting the keys in a hash map to obtain a mapping vector containing indexes. This mapping vector may be used to build a value array per value column.
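The key—index mechanism described above can be sketched in a few lines. This is an illustrative model only, not the patented implementation: the class and method names are invented, and a plain dictionary stands in for the underlying hash table.

```python
class IndexHashMap:
    """Minimal sketch of an index hash map: each distinct key is mapped
    to a unique, monotonically increasing integer index."""

    def __init__(self):
        self._indexes = {}   # key -> index (stand-in for the hash table)
        self._counter = 0    # internal counter; equals the current cardinality

    def insert(self, key):
        """Return the index for key, assigning a fresh one if the key is new."""
        if key not in self._indexes:
            self._indexes[key] = self._counter
            self._counter += 1
        return self._indexes[key]

    def cardinality(self):
        """Number of distinct keys inserted so far."""
        return self._counter

# Bulk insertion of a key column yields a mapping vector of indexes,
# one entry per input row.
keys = ["DE", "US", "DE", "FR", "US"]
m = IndexHashMap()
mapping_vector = [m.insert(k) for k in keys]
# mapping_vector == [0, 1, 0, 2, 1]; m.cardinality() == 3
```

Because the map stores key—index pairs rather than key—value pairs, the same mapping vector can then be reused to build one value array per value column.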
- Referring to FIGS. 3A-3D , input data is illustrated in FIG. 3A , including a key array 305 .
- For each inserted key, the index hash map returns an index 320 (i.e., a unique integer), as seen in FIG. 3B .
- Once all keys are inserted, the mapping vector of FIG. 3C results.
- the entries in the mapping of FIG. 3C are the indexes that point to a value array “A” 330 illustrated in FIG. 3D .
- the mapping of FIG. 3C may be used to aggregate the “Kf” columns 310 shown in FIG. 3A .
- the result of the aggregation of column 310 is depicted in FIG. 3D at 335 .
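The FIG. 3 flow can be illustrated as follows. The keys and values below are invented stand-ins for the key array 305 and the "Kf" column 310, and the aggregation function is assumed to be SUM.

```python
# Keys are mapped to indexes once; the resulting mapping vector is then
# reused to aggregate a value column into a compact value array.
keys   = ["a", "b", "a", "c", "b"]   # key array (cf. 305)
values = [10, 20, 5, 7, 3]           # value column (cf. "Kf" 310)

index_of = {}
mapping_vector = []
for k in keys:
    idx = index_of.setdefault(k, len(index_of))  # key -> unique integer
    mapping_vector.append(idx)

# One aggregation slot per distinct key (cf. value array "A" 330).
value_array = [0] * len(index_of)
for row, idx in enumerate(mapping_vector):
    value_array[idx] += values[row]
# value_array == [15, 23, 7]  (a: 10+5, b: 20+3, c: 7)
```

Note that the index for each key is calculated just once; aggregating a second value column would reuse `mapping_vector` without touching the hash map again.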
- index hash maps herein may be designed to avoid locking when being operated on by concurrently executing threads by producing wide data independence.
- index hash maps herein may be described by a framework defining a two-step process. In a first step, input data is split or separated into equal-sized blocks and the blocks are assigned to worker execution threads. These worker execution threads may produce intermediate results by building relatively small local hash tables or hash maps. Each local hash map is private to the thread that produces it. Accordingly, other threads may not see or access the local hash map produced by a given thread.
- the local hash maps including the intermediate results may be merged to obtain a global result by concurrently executing merger threads.
- each of the merger threads may only consider a dedicated range of hash values.
- the merger threads may process hash-disjoint partitions of the local hash maps and produce disjoint result hash tables that may be concatenated to build an overall result.
- FIG. 4 is a flow diagram related to a data structure framework 400 , in accordance with some embodiments herein.
- an input data table is separated or divided into a plurality of partitions.
- the size of the partitions may relate to, or even equal, the size of a memory unit such as, for example, a cache associated with the parallel processing units.
- the partitions are equal in size.
- a first plurality of execution threads running in parallel may each generate a local hash table or hash map. Each local hash map is private to the thread that generated it.
- the second step of the data structure framework herein is depicted in FIG. 4 at S 410 .
- the local hash maps are merged.
- the merging of the local hash maps produces a set of disjoint result hash tables or hash maps.
- each of the merger threads may only consider a dedicated range of hash values. From a logical perspective, the local hash maps may be considered as being partitioned by their hash value.
- One implementation may use, for example, some first bits of the hash value to form a range of hash values. The same ranges are used for all local hash maps, thus the “partitions” of the local hash maps are disjunctive. As an example, if a value “a” is in range 5 of a local hash map, then the value will be in the same range of other local hash maps. In this manner, all identical values of all local hash maps may be merged into a single result hash map. Since the “partitions” are disjunctive, the merged result hash maps may be created without a need for locks. Additionally, further processing on the merged result hash maps may be performed without locks since any execution threads will be operating on disjunctive data.
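The range-based merge can be sketched as below. This is an illustrative model: Python's built-in `hash` modulo a range count stands in for "the first bits of the hash value", and the local maps are assumed to hold already-aggregated sums. Because each range selects a disjoint set of keys across all local maps, the per-range merges are independent and need no locks.

```python
NUM_RANGES = 4  # e.g., derived from the first bits of the hash value

def hash_range(key):
    # Stand-in for taking the first bits of the hash value.
    return hash(key) % NUM_RANGES

def merge_range(local_maps, r):
    """Merge the entries of range r from all local maps into one result map."""
    result = {}
    for local in local_maps:
        for key, value in local.items():
            if hash_range(key) == r:          # same range in every local map
                result[key] = result.get(key, 0) + value
    return result

local_maps = [{"a": 1, "b": 2}, {"a": 3, "c": 4}]

# Each range yields a disjoint result map; concatenating them gives the
# overall result, with no key appearing in more than one part.
parts = [merge_range(local_maps, r) for r in range(NUM_RANGES)]
merged = {}
for p in parts:
    merged.update(p)
# merged == {"a": 4, "b": 2, "c": 4}
```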
- the local (index) hash maps providing the intermediate results may be of a fixed size. Instead of resizing a local hash map, the corresponding worker execution thread may replace its local hash map with a new hash map when a certain load factor is reached and place the current local hash map into a buffer containing hash maps that are ready to be merged.
- the local hash maps may be sized such that they fit in a cache (e.g., L2 or L3). The specific size may depend on the sizes of the caches in a given CPU architecture.
- insertions and lookups of keys may largely take place in cache.
- over-crowded areas within a local hash map may be avoided by maintaining statistical data regarding the local hash maps. The statistical data may indicate when the local hash map should be declared full (independent of an actual load factor).
- the size of a buffer of a computing system and environment holding local hash maps ready to be merged is a tuning parameter, wherein a smaller buffer may induce more merge operations while a larger buffer will necessarily require more memory.
- a global result may be organized into bucketed index hash maps where each result hash map includes multiple fixed-size physical memory blocks.
- cache-efficient merging may be realized, as well as memory allocation being more efficient and sustainable since allocated blocks may be shared between queries.
- the hash map may be resized. Resizing a hash map may be accomplished by increasing its number of memory blocks. Resizing a bucketed index hash map may entail knowing which entries are to be repositioned.
- the map's hash function may be chosen such that its codomain increases by adding further least significant bits as needed during a resize operation. In an effort to avoid too many resize operations, an estimate of the final target size may be determined before an actual resizing of the hash map.
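The growth of the codomain by adding least significant bits can be illustrated as follows, under the assumption that a bucket is selected by masking the low bits of the hash value: with b bits there are 2^b buckets, and when one bit is added, an entry in bucket i can move only to bucket i or i + 2^b, so the entries to be repositioned are known without rehashing from scratch.

```python
def bucket(h, bits):
    """Select a bucket from the low `bits` bits of hash value h."""
    return h & ((1 << bits) - 1)

h = 0b101101
assert bucket(h, 3) == 0b101    # 3-bit codomain: buckets 0..7 -> bucket 5
assert bucket(h, 4) == 0b1101   # 4-bit codomain: buckets 0..15 -> bucket 13
# Adding one bit moves the entry from bucket 5 either to 5 or to 5 + 8 = 13.
assert bucket(h, 4) in (bucket(h, 3), bucket(h, 3) + (1 << 3))
```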
- the index hash map framework discussed above may provide an infrastructure to implement parallelized query processing algorithms or operations.
- One embodiment of a parallelized query processing algorithm includes a hash-based aggregation, as will be discussed in greater detail below.
- a parallelized aggregation refers to a relational aggregation that groups and condenses relational data stored in tables.
- An example of a table that may form an input of a parallel aggregation operation herein is depicted in FIG. 5A .
- Table 500 includes sales data. The sales data is organized in three columns—a Product column 505 , a Country column 510 , and a Revenue column 515 .
- Table 500 may be grouped and aggregated by, for example, four combinations of columns—by Product and Country, by Product, by Country, and by no column at all (an overall total). In the following discussion, the columns by which an aggregation groups the data are referred to as group columns.
- FIGS. 5B-5D Aggregation result tables determined by the four different groupings are illustrated in FIGS. 5B-5D .
- Each of the result tables 520 , 540 , 555 , and 570 contains the distinct values (groups) of the desired group columns and, per group, the aggregated value.
- table 520 includes the results based on grouping by Product and Country.
- Columns 525 and 530 include the distinct Product and Country values (i.e., groups) of the desired Product and Country columns ( FIG. 5A , columns 505 and 510 ) and the aggregated value for each distinct Product and Country group is included in column 535 .
- table 540 includes the results based on grouping by Product.
- Column 545 includes the distinct Product values (i.e., groups) of the desired Product column ( FIG. 5A , column 505 ) and the aggregated value for each distinct Product group is included in column 550 .
- Table 555 includes the results based on grouping by Country, where column 560 includes the distinct Country values (i.e., groups) of the desired Country column ( FIG. 5A , column 510 ) and column 565 includes the aggregated value for each distinct Country group.
- a summation function SUM is used to aggregate values.
- other aggregation functions such as, for example and not as a limitation, a COUNT, a MIN, a MAX, and an AVG aggregation function may be used.
- the column containing the aggregates may be referred to herein as the aggregate column.
- the aggregate columns of result tables 520 , 540 , 555 , and 570 are columns 535 , 550 , 565 , and 575 , respectively.
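A minimal sequential illustration of the grouping described above, using invented data values rather than those of FIG. 5A, and SUM as the aggregation function:

```python
# Group a small sales table by (Product, Country) and SUM the Revenue.
# The rows are invented for illustration.
rows = [
    ("Phone",  "US", 100),
    ("Phone",  "DE", 50),
    ("Laptop", "US", 200),
    ("Phone",  "US", 30),
]

result = {}  # (Product, Country) group -> aggregated Revenue
for product, country, revenue in rows:
    group = (product, country)
    result[group] = result.get(group, 0) + revenue
# result == {("Phone", "US"): 130, ("Phone", "DE"): 50, ("Laptop", "US"): 200}
```

Swapping the `+` accumulation for `min`, `max`, a counter, or a (sum, count) pair yields the MIN, MAX, COUNT, and AVG variants mentioned above.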
- In order to fully use the resources of a parallel computing environment, an aggregation operation should be computed and determined in parallel.
- Otherwise, the processing performance for the aggregation would be bound by the speed of a single processing unit instead of benefiting from the multiple processing units available in the parallel computing environment.
- FIG. 6 is an illustrative depiction of a parallel aggregation flow, according to some embodiments herein.
- the parallel aggregation flow 600 uses the index hash table framework discussed hereinabove.
- two degrees of parallelism are depicted and are achieved by the concurrent execution of two execution threads.
- the concepts conveyed by FIG. 6 may be extended to additional degrees of parallelism, including computation environments now known and those that become known in the future.
- input table 605 is separated into a plurality of partitions.
- Input table 605 is shown divided into partitions 610 and 615 . All of or a portion of table 605 may be split into partitions for parallel aggregation.
- Portions of table 605 not initially partitioned and processed by a parallel aggregation operation may subsequently be partitioned for parallel aggregation processing.
- Table 605 may be partitioned into equal-sized partitions.
- Partitions 610 and 615 are but two example partitions, and additional partitions may exist and be processed in the parallel aggregation operations herein.
- Initially, a first plurality of execution threads (aggregator threads) is running and a second plurality of execution threads is not running or is in a sleep state.
- the concurrently operating aggregator threads operate to fetch an exclusive part of table 605 .
- Partition 610 is fetched by aggregator thread 620 and partition 615 is fetched by aggregator thread 625 .
- Each of the aggregator threads may read their partition and aggregate the values of each partition into a private local hash table or hash map.
- Aggregator thread 620 produces private hash map 630 and aggregator thread 625 produces local hash map 635 . Since each aggregator thread processes its own separate portion of input table 605 , and has its private hash map, the parallel processing of the partitions may be accomplished lock-free.
- the local hash tables may be the same size as the cache associated with the processing unit executing an aggregator thread. Sizing the local hash tables in this manner may function to avoid cache misses.
- input data to aggregate may be read from table 605 and written to the local hash tables row-wise or column-wise.
- the aggregator thread may fetch another, unprocessed partition of input table 605 .
- the aggregator threads move their associated local hash maps into a buffer 640 when the local hash table reaches a threshold size, initiate a new local hash table, and proceed.
- the aggregator threads may wake up a second plurality of execution threads, referred to in the present example as merger threads, and the aggregator threads may enter a sleep state.
- the local hash maps may be retained in buffer 640 until the entire input table 605 is consumed by the aggregator threads 620 and 625 .
- once the input is consumed, the second plurality of execution threads (the merger threads) is awakened and the aggregator threads enter a sleep state.
- Each of the merger threads is responsible for a certain partition of all of the private hash maps in buffer 640 .
- the particular data partition each merger thread is responsible for may be determined by assigning distinct, designated key values of the local hash maps to each of the merger threads. That is, the portion of the data for which each merger thread is responsible may be determined by “key splitting” in the local hash maps.
- merger thread 1 is responsible for designated keys 665
- merger thread 2 is responsible for keys 670 .
- Each of merger threads 1 and 2 operates to iterate over all of the private hash maps in buffer 640 , read its respective data partition as determined by the key splitting, and merge that partition into a thread-local part hash table (or part hash map).
- merger thread 1 ( 662 ) and merger thread 2 ( 664 ) each consider all of the private hash maps in buffer 640 based on the key based partitions they are each responsible for and produce, respectively, part hash map 1 ( 675 ) and part hash map 2 ( 680 ).
- the executing merger threads may acquire responsibility for a new data partition and proceed to process the new data partition as discussed above.
- the merger threads may enter a sleep state and the aggregator threads may return to an active, running state. Upon returning to the active, running state, the processes discussed above may repeat.
- the parallel aggregation operation herein may terminate.
- the results of the aggregation process will be contained in the set of part hash maps (e.g., 675 and 680 ).
- the part hash maps may be seen as forming a parallel result since the part hash maps are disjoint.
- the part hash maps may be processed in parallel.
- a HAVING clause may be evaluated and applied to every group, or parallel sorting and merging may be performed thereon.
- An overall result may be obtained from the disjoint part hash maps by concatenating them together, as depicted in FIG. 6 at 685 .
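The two-phase flow of FIG. 6 can be sketched with ordinary threads as below. This is a simplified, assumption-laden model: the buffer handoff, the sleep/wake protocol, and cache-sized local maps are omitted, and Python's built-in `hash` stands in for the key-splitting function.

```python
import threading

# Phase 1: aggregator threads build private local hash maps over
# exclusive partitions. Phase 2: merger threads each merge one disjoint
# key range of all local maps into a part hash map.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
partitions = [rows[:3], rows[3:]]        # exclusive, equal-sized partitions

local_maps = [None] * len(partitions)    # one private local map per aggregator

def aggregator(i):
    local = {}
    for key, value in partitions[i]:
        local[key] = local.get(key, 0) + value
    local_maps[i] = local                # publish into the "buffer"

NUM_MERGERS = 2
part_maps = [None] * NUM_MERGERS

def merger(r):
    part = {}
    for local in local_maps:
        for key, value in local.items():
            if hash(key) % NUM_MERGERS == r:   # key splitting by hash range
                part[key] = part.get(key, 0) + value
    part_maps[r] = part

# Run phase 1 to completion, then phase 2 (no locks needed in either).
for phase in (aggregator, merger):
    threads = [threading.Thread(target=phase, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

overall = {}
for part in part_maps:                   # concatenate the disjoint parts
    overall.update(part)
# overall == {"a": 10, "b": 7, "c": 4}
```

Because the part maps are disjoint by construction, the final concatenation is a simple union with no conflicting keys.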
- FIG. 7 is an illustrative example of a flow diagram 700 relating to some parallel aggregation embodiments herein.
- exclusive partitions of an input data table are received or retrieved for aggregating in parallel.
- the values of each of the exclusive partitions are aggregated.
- the values of each partition are aggregated into a local hash map by one of a plurality of concurrently running execution threads.
- process 700 operates to generate a global result by assembling the results obtained at S 725 into a composite result table.
- the overall result may be produced by concatenating the part hash maps of S 725 to each other.
- Each system described herein may be implemented by any number of devices in communication via any number of other public and/or private networks. Two or more of the devices herein may be co-located, may be a single device, or may be located remote from one another and may communicate with one another via any known manner of network(s) and/or a dedicated connection. Moreover, each device may comprise any number of hardware and/or software elements suitable to provide the functions described herein as well as any other functions. Other topologies may be used in conjunction with other embodiments.
- All systems and processes discussed herein may be embodied in program code stored on one or more computer-readable media.
- Such media may include, for example, a floppy disk, a CD-ROM, a DVD-ROM, magnetic tape, and solid state Random Access Memory (RAM) or Read Only Memory (ROM) storage units.
- RAM Random Access Memory
- ROM Read Only Memory
- a memory storage unit may be associated with any access pattern and may be independent of the device type (e.g., magnetic, optoelectronic, semiconductor/solid-state, etc.).
- in-memory technologies may be used such that databases, etc. may be completely operated in RAM memory at a processor. Embodiments are therefore not limited to any specific combination of hardware and software.
Abstract
Description
- Some embodiments relate to a data structure. More specifically, some embodiments provide a method and system for a data structure and use of same in parallel computing environments.
- A number of presently developed and developing computer systems include multiple processors in an attempt to provide increased computing performance. Advances in computing performance, including for example processing speed and throughput, may be provided by parallel computing systems and devices as compared to single processing systems that sequentially process programs and instructions.
- For parallel shared-memory aggregation processes, a number of approaches have been proposed. However, the previous approaches each include sequential operations and/or synchronization operations such as, locking, to avoid inconsistencies or lapses in data coherency. Thus, prior proposed solutions for parallel aggregation in parallel computation environments with shared memory either contain a sequential step or require some sort of synchronization on the data structures.
- Accordingly, a method and mechanism for efficiently processing data in parallel computation environments and the use of same in parallel aggregation processes are provided by some embodiments herein.
-
FIG. 1 is block diagram of a system according to some embodiments. -
FIG. 2 is a block diagram of an operating environment according to some embodiments. -
FIGS. 3A-3D are illustrative depictions of various aspects of a data structure according to some embodiments. -
FIG. 4 is a flow diagram of a method relating to a data structure, according to some embodiments herein. -
FIGS. 5A-5D provide illustrative examples of some data tables according to some embodiments. -
FIG. 6 is an illustrative depiction of an aggregation flow, in some embodiments herein. -
FIG. 7 is a flow diagram of a method relating to an aggregation flow, according to some embodiments herein. - In an effort to more fully and efficiently use the resources of a particular computing environment, a data structure and techniques of using that data structure may be developed to fully exploit the design characteristics and capabilities of that particular computing environment. In some embodiments herein, a data structure and techniques for using that data structure (i.e., algorithms) are provided for efficiently using the data structure disclosed herein in a parallel computing environment with shared memory.
- As used herein, the term parallel computation environment with shared memory refers to a system or device having more than one processing unit. The multiple processing units may be processors, processor cores, multi-core processors, etc. All of the processing units can access a main memory (i.e., a shared memory architecture). All of the processing units can run or execute the same program(s). As used herein, a running program may be referred to as a thread. Memory may be organized in a hierarchy of multiple levels, where faster but smaller memory units are located closer to the processing units. The smaller and faster memory units located nearer the processing units as compared to the main memory are referred to as cache.
-
FIG. 1 is a block diagram overview of a device, system, orapparatus 100 that may be used in a providing an index hash table or hash map in accordance with some aspects and embodiments herein, as well as providing a parallel aggregation based on such data structures.System 100 may be, for example, associated with any of the devices described herein and may include a plurality ofprocessing units communication device 120 configured to communicate via a communication network (not shown inFIG. 1 ) to a end client (not shown inFIG. 1 ).Device 100 may also include a local cache memory associated with each of theprocessing units Communication device 515 may be used to communicate, for example, with one or more client devices or business service providers.System 100 further includes an input device 125 (e.g., a mouse and/or keyboard to enter content) and an output device 130 (e.g., a computer monitor to display a user interface element). -
The processing units may communicate with a memory 135 via a system bus 175. The system bus 175 also provides a mechanism for the processing units to communicate with a storage device 140. Storage device 140 may include any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, and/or semiconductor memory devices for storing data and programs. -
Storage device 140 stores a program 145 for controlling the processing units and a query engine application 150 for executing queries. The processing units may execute program 145 and thereby operate in accordance with any of the embodiments described herein. For example, the processing units may concurrently execute a plurality of execution threads to build the index hash table data structures disclosed herein. Furthermore, query engine 150 may operate to execute a parallel aggregation operation in accordance with aspects herein in cooperation with the processing units and by accessing database 155. Program 145 and other instructions may be stored in a compressed, uncompiled and/or encrypted format. Program 145 may also include other program elements, such as an operating system, a database management system, and/or device drivers used by the processing units. - In some embodiments,
storage device 140 includes a database 155 to facilitate the execution of queries based on input table data. The database may include data structures (e.g., index hash tables), rules, and conditions for executing a query in a parallel computation environment such as that of FIGS. 1 and 2. - In some embodiments, the data structure disclosed herein as being developed for use in parallel computing environments with shared memory is referred to as a parallel hash table. In some instances, the parallel hash table may also be referred to as a parallel hash map. In general, a hash table may be provided and used as an index structure for data storage to enable fast data retrieval. The parallel hash table disclosed herein may be used in a parallel computation environment where multiple concurrently executing (i.e., running) threads insert and retrieve data in tables. Furthermore, an aggregation algorithm that uses the parallel hash tables herein is provided for computing an aggregate in a parallel computation environment.
-
FIG. 2 provides an illustrative example of a computation environment 100 compatible with some embodiments herein. While computation environment 100 may be compatible with some embodiments of the data structures and the methods herein, the data structures and the methods herein are not limited to the example computation environment 100. Processes to store, retrieve, and perform operations on data may be facilitated by a database system (DBS) and a data warehouse (DWH). - As shown in
FIG. 2, DBS 210 is a server. DBS 210 further includes a database management system (DBMS) 215. DBMS 215 may comprise software (e.g., programs, instructions, code, applications, services, etc.) that controls the organization of and access to database 225, which stores data. Database 225 may include an internal memory, an external memory, or other configurations of memory. Database 225 may be capable of storing large amounts of data, including relational data. The relational data may be stored in tables. In some embodiments, a plurality of clients, such as example client 205, may communicate with DBS 210 via a communication link (e.g., a network) and specified application programming interfaces (APIs). In some embodiments, the API language provided by DBS 210 is SQL, the Structured Query Language. Client 205 may communicate with DBS 210 using SQL to, for example, create and delete tables; insert, update, and delete data; and query data. - In general, a user may submit a query from
client 205 in the form of a SQL query statement to DBS 210. DBMS 215 may execute the query by evaluating the parameters of the query statement and accessing database 225 as needed to produce a result 230. The result 230 may be provided to client 205 for storage and/or presentation to the user. - One type of query is an aggregation query. As will be explained in greater detail below, a parallel aggregation algorithm, process, or operation may be used to compute SQL aggregates. In general, with reference to
FIG. 2, some embodiments herein may include client 205 wanting to group or aggregate data of a table stored in database 225 (e.g., a user at client 205 may desire to know the average salaries of the employees in all of a company's departments). Client 205 may connect to DBS 210 and issue a SQL query statement that describes and specifies the desired aggregation. DBMS 215 may create an executable instance of the parallel aggregation algorithm herein, provide it with the information needed to run the parallel aggregation algorithm (e.g., the name of a table to access, the columns to group by, the columns to aggregate, the aggregation function, etc.), and run the parallel aggregation operation or algorithm. In the process of running, the parallel aggregation algorithm herein may create an index hash map 220. The index hash map may be used to keep track of intermediate result data. An overall result comprising a result table may be computed based on the index hash map(s) containing the intermediate results. The overall parallel aggregation result may be transmitted to client 205. - As an extension of
FIG. 2, DWHs may be built on top of DBSs. Thus, a use-case of a DWH may be similar in some respects to DBS 210 of FIG. 2. - The computation environment of
FIG. 2 may include a plurality of processors that can operate concurrently, in parallel, and may include a device or system similar to that described in FIG. 1. Additionally, the computation environment of FIG. 2 may have a memory that is shared amongst the plurality of processors, for example, like the system of FIG. 1. In order to fully capitalize on the parallel processing power of such a computation environment, the data structures used by the system may be designed, developed, or adapted for efficient use in the parallel computing environment. - A hash table is a fundamental data structure in computer science that is used for mapping "keys" (e.g., the names of people) to the associated values of those keys (e.g., the people's phone numbers) for fast data look-up. A conventional hash table stores key—value pairs. Conventional hash tables are designed for sequential processing.
- However, for parallel computation environments there exists a need for data structures particularly suitable for use in the parallel computing environment. In some embodiments herein, the data structure of an index hash map is provided. In some aspects, the index hash map provides a lock-free, cache-efficient hash data structure developed for parallel computation environments with shared memory. In some embodiments, the index hash map may be adapted to column stores.
- In a departure from conventional hash tables that store key—value pairs, the index hash map herein does not store key—value pairs. The index hash map herein generates key—index pairs by mapping each distinct key to a unique integer. In some embodiments, each time a new distinct key is inserted in the index hash map, the index hash map increments an internal counter and assigns the value of the counter to the key to produce a key—index pair. The counter may provide, at any time, the cardinality of an input set of keys that have thus far been inserted in the hash map. In some respects, the key—index mapping may be used to share a single hash map among different columns (or value arrays). For example, for processing a plurality of values distributed among different columns, the associated index for the key has to be calculated just once. The use of key—index pairs may facilitate bulk insertion in columnar storages. Inserting a set of key—index pairs may entail inserting the keys in a hash map to obtain a mapping vector containing indexes. This mapping vector may be used to build a value array per value column.
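By way of illustration, the key-index scheme described above may be sketched as follows. This is an illustrative Python sketch only, not the implementation of the embodiments herein; the class name, method names, and sample data are all invented for the example.

```python
class IndexHashMap:
    """Sketch of an index hash map: each distinct key is mapped to a unique
    integer index drawn from an internal counter (no key-value pairs stored)."""

    def __init__(self):
        self._indexes = {}   # key -> index (a real implementation hashes into buckets)
        self._counter = 0    # cardinality of distinct keys inserted so far

    def insert(self, key):
        """Return the index for key, assigning the next counter value if the key is new."""
        if key not in self._indexes:
            self._indexes[key] = self._counter
            self._counter += 1
        return self._indexes[key]

    @property
    def cardinality(self):
        return self._counter


def aggregate_column(keys, values):
    """Insert the key column to obtain a mapping vector of indexes, then use
    that vector to sum a value column into one value array slot per distinct key."""
    hmap = IndexHashMap()
    mapping_vector = [hmap.insert(k) for k in keys]
    value_array = [0] * hmap.cardinality
    for idx, v in zip(mapping_vector, values):
        value_array[idx] += v
    return mapping_vector, value_array


# A key column with repeated groups and a numeric column to aggregate:
mv, agg = aggregate_column(["a", "b", "a", "c", "b"], [1, 2, 3, 4, 5])
# mv == [0, 1, 0, 2, 1]; agg == [4, 7, 4]
```

Note that the same mapping vector could be reused to build a value array for a second value column, which is the point of sharing a single hash map among different columns.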
- Referring to
FIGS. 3A-3D, input data is illustrated in FIG. 3A, including a keys array 305. For each distinct key 315 from keys array 305, the index hash map returns an index 320 (i.e., a unique integer), as seen in FIG. 3B. When all of the keys, from a column for example, have been inserted in the hash map, the mapping vector of FIG. 3C results. The entries in the mapping of FIG. 3C are the indexes that point to a value array "A" 330 illustrated in FIG. 3D. The mapping of FIG. 3C may be used to aggregate the "Kf" column 310 shown in FIG. 3A. The result of the aggregation of column 310 is depicted in FIG. 3D at 335. - To achieve maximum parallel processor utilization, the index hash maps herein may be designed to avoid locking when being operated on by concurrently executing threads by producing wide data independence. In some embodiments, index hash maps herein may be described by a framework defining a two-step process. In a first step, input data is split or separated into equal-sized blocks and the blocks are assigned to worker execution threads. These worker execution threads may produce intermediate results by building relatively small local hash tables or hash maps. Each local hash map is private to the thread that produces it. Accordingly, other threads may not see or access the local hash map produced by a given thread.
- In a second step, the local hash maps including the intermediate results may be merged to obtain a global result by concurrently executing merger threads. When accessing and processing the local hash maps, each of the merger threads may only consider a dedicated range of hash values. The merger threads may process hash-disjoint partitions of the local hash maps and produce disjoint result hash tables that may be concatenated to build an overall result.
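The two-step framework may be sketched sequentially as follows, with the worker and merger steps shown as plain functions rather than concurrent threads. The function names, the block size, and the use of MD5 as the partitioning hash are illustrative assumptions, not details taken from the embodiments herein.

```python
import hashlib


def hash_range(key, num_ranges):
    # Dedicated range for a key; every local map uses the same function,
    # so identical keys fall in the same range of every local map.
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % num_ranges


def build_local_maps(rows, block_size):
    """Step 1: split the input into equal-sized blocks; one private local
    hash map (key -> aggregated value) is built per worker/block."""
    local_maps = []
    for start in range(0, len(rows), block_size):
        local = {}
        for key, value in rows[start:start + block_size]:
            local[key] = local.get(key, 0) + value
        local_maps.append(local)
    return local_maps


def merge_local_maps(local_maps, num_ranges):
    """Step 2: each merger owns one hash range across *all* local maps,
    producing disjoint result maps that need no locking."""
    results = [{} for _ in range(num_ranges)]
    for local in local_maps:
        for key, value in local.items():
            part = results[hash_range(key, num_ranges)]
            part[key] = part.get(key, 0) + value
    return results


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
parts = merge_local_maps(build_local_maps(rows, block_size=2), num_ranges=4)
overall = {k: v for part in parts for k, v in part.items()}  # concatenate
# overall == {"a": 10, "b": 7, "c": 4}
```

Because each key falls into exactly one range, the part results are disjoint and their concatenation is the global result.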
-
FIG. 4 is a flow diagram related to a data structure framework 400, in accordance with some embodiments herein. At S405, an input data table is separated or divided into a plurality of partitions. The size of the partitions may relate to, or even be, the size of a memory unit such as, for example, a cache associated with the parallel processing units. In some embodiments, the partitions are equal in size. Furthermore, a first plurality of execution threads running in parallel may each generate a local hash table or hash map. Each local hash map is private to the thread that generated it. - The second step of the data structure framework herein is depicted in
FIG. 4 at S410. At S410, the local hash maps are merged. The merging of the local hash maps produces a set of disjoint result hash tables or hash maps. - In some embodiments, when accessing and processing the local hash maps, each of the merger threads may only consider a dedicated range of hash values. From a logical perspective, the local hash maps may be considered as being partitioned by their hash value. One implementation may use, for example, some first bits of the hash value to form a range of hash values. The same ranges are used for all local hash maps, thus the “partitions” of the local hash maps are disjunctive. As an example, if a value “a” is in range 5 of a local hash map, then the value will be in the same range of other local hash maps. In this manner, all identical values of all local hash maps may be merged into a single result hash map. Since the “partitions” are disjunctive, the merged result hash maps may be created without a need for locks. Additionally, further processing on the merged result hash maps may be performed without locks since any execution threads will be operating on disjunctive data.
- In some embodiments, the local (index) hash maps providing the intermediate results may be of a fixed size. Instead of resizing a local hash map, the corresponding worker execution thread may replace its local hash map with a new hash map when a certain load factor is reached and place the current local hash map into a buffer containing hash maps that are ready to be merged. In some embodiments, the size of the local hash maps may be sized such that the local hash maps fit in a cache (e.g., L2 or L3). The specific size of the cache may depend on the sizes of caches in a given CPU architecture.
- In some aspects, insertions and lookups of keys may largely take place in cache. In some embodiments, over-crowded areas within a local hash map may be avoided by maintaining statistical data regarding the local hash maps. The statistical data may indicate when the local hash map should be declared full (independent of an actual load factor). In some aspects and embodiments, the size of a buffer of a computing system and environment holding local hash maps ready to be merged is a tuning parameter, wherein a smaller buffer may induce more merge operations while a larger buffer will necessarily require more memory.
- In some embodiments, a global result may be organized into bucketed index hash maps where each result hash map includes multiple fixed-size physical memory blocks. In this configuration, cache-efficient merging may be realized, as well as memory allocation being more efficient and sustainable since allocated blocks may be shared between queries. In some aspects, when a certain load factor within a global result hash map is reached during a merge operation, the hash map may be resized. Resizing a hash map may be accomplished by increasing its number of memory blocks. Resizing of a bucketed index hash map may entail needing to know the entries to be repositioned. In some embodiments, the maps' hash function may be chosen such that its codomain increases by adding further least significant bits of need during a resize operation. In an effort to avoid too many resize operations, an estimate of a final target size may be determined before an actual resizing of the hash map.
- In some embodiments, the index hash map framework discussed above may provide an infrastructure to implement parallelized query processing algorithms or operations. One embodiment of a parallelized query processing algorithm includes a hash-based aggregation, as will be discussed in greater detail below.
- In some embodiments, a parallelized aggregation refers to a relational aggregation that groups and condenses relational data stored in tables. An example of a table that may form an input of a parallel aggregation operation herein is depicted in
FIG. 5A. Table 500 includes sales data. The sales data is organized in three columns—a Product column 505, a Country column 510, and a Revenue column 515. Table 500 may be grouped and aggregated by, for example, four combinations of columns—by Product and Country, by Product, and by Country. In the following discussion, the columns by which an aggregation groups the data are referred to as group columns. - Aggregation result tables determined by the four different groupings are illustrated in
FIGS. 5B-5D. Each of the result tables 520, 540, 555, and 570 contains the distinct values (groups) of the desired group columns and, per group, the aggregated value. For example, table 520 includes the results based on grouping by Product and Country. Its first columns include the distinct values of the desired group columns (FIG. 5A, columns 505 and 510), and the aggregated value for each distinct Product and Country group is included in column 535. Furthermore, table 540 includes the results based on grouping by Product. Column 545 includes the distinct Product values (i.e., groups) of the desired Product column (FIG. 5A, column 505) and the aggregated value for each distinct Product group. Table 555 includes the results based on grouping by Country, where its columns include the distinct Country values (groups) of the desired Country column (FIG. 5A, column 510) and the aggregated value for each distinct Country group. - In some embodiments, such as the examples of
FIGS. 5A-5D, a summation function SUM is used to aggregate values. However, other aggregation functions such as, for example and not as a limitation, a COUNT, a MIN, a MAX, and an AVG aggregation function may be used. The column containing the aggregates may be referred to herein as the aggregate column. Thus, the tables of FIGS. 5A-5E each include such an aggregate column.
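For instance, grouping a small sales table by its group columns with SUM (or another aggregation function) can be sketched as follows. The sample rows and the helper name are illustrative and are not the data of the figures.

```python
from collections import defaultdict

# (Product, Country, Revenue) rows, in the spirit of table 500.
sales = [
    ("Phone", "US", 100), ("Phone", "DE", 80),
    ("Laptop", "US", 200), ("Phone", "US", 50),
]


def aggregate(rows, group_cols, agg_col, func=sum):
    """Group rows by the given group columns and apply an aggregation
    function (SUM by default) to the aggregate column of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_cols)].append(row[agg_col])
    return {g: func(vals) for g, vals in groups.items()}


by_product = aggregate(sales, group_cols=(0,), agg_col=2)            # SUM by Product
# {("Phone",): 230, ("Laptop",): 200}
by_product_country = aggregate(sales, group_cols=(0, 1), agg_col=2)  # SUM by Product, Country
by_country_max = aggregate(sales, group_cols=(1,), agg_col=2, func=max)  # MAX by Country
```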
-
FIG. 6 is an illustrative depiction of a parallel aggregation flow, according to some embodiments herein. In some aspects, the parallel aggregation flow 600 uses the index hash table framework discussed hereinabove. In the example of FIG. 6, two degrees of parallelism are depicted and are achieved by the concurrent execution of two execution threads. However, the concepts conveyed by FIG. 6 may be extended to additional degrees of parallelism, including computation environments now known and those that become known in the future. In FIG. 6, input table 605 is separated into a plurality of partitions, shown as partitions 610 and 615. - In some embodiments, a first plurality of execution threads, aggregator threads, are initially running and a second plurality of execution threads are not initially running or are in a sleep state. The concurrently operating aggregator threads operate to fetch an exclusive part of table 605.
Partition 610 is fetched by aggregator thread 620 and partition 615 is fetched by aggregator thread 625. - Each of the aggregator threads may read its partition and aggregate the values of that partition into a private local hash table or hash map.
Aggregator thread 620 produces private hash map 630 and aggregator thread 625 produces local hash map 635. Since each aggregator thread processes its own separate portion of input table 605 and has its own private hash map, the parallel processing of the partitions may be accomplished lock-free.
- When a partition is consumed by an aggregator thread, the aggregator thread may fetch another, unprocessed partition of input table 605. In some embodiments, the aggregator threads move their associated local hash maps into a
buffer 640 when the local hash table reaches a threshold size, initiate a new local hash table, and proceed. - In some embodiments, when the number of hash tables in
buffer 640 reaches a threshold size, the aggregator threads may wake up a second plurality of execution threads, referred to in the present example as merger threads, and the aggregator threads may enter a sleep state. In some embodiments, the local hash maps may be retained in buffer 640 until the entire input table 605 is consumed by the aggregator threads. - Each of the merger threads is responsible for a certain partition of all of the private hash maps in
buffer 640. The particular data partition each merger thread is responsible for may be determined by assigning distinct, designated key values of the local hash maps to each of the merger threads. That is, the portion of the data for which each merger thread is responsible may be determined by "key splitting" in the local hash maps. As illustrated in FIG. 6, merger thread 1 is responsible for designated keys 665 and merger thread 2 is responsible for keys 670. Each of the merger threads 662 and 664 may access the local hash maps in buffer 640, read its respective data partition as determined by the key splitting, and merge that data partition into a thread-local part hash table (or part hash map). - As further illustrated in
FIG. 6, merger thread 1 (662) and merger thread 2 (664) each consider all of the private hash maps in buffer 640 based on the key-based partitions they are each responsible for and produce, respectively, part hash map 1 (675) and part hash map 2 (680). -
- In the instance there is no more data to be processed by the aggregator threads and the merger threads, the parallel aggregation operation herein may terminate. The results of the aggregation process will be contained in the set of part hash maps (e.g., 675 and 680). In some respects, the part hash maps may be seen as forming a parallel result since the part hash maps are disjoint.
- In some embodiments, the part hash maps may be processed in parallel. As an example, a having clause may be evaluated and applied to every group or parallel sorting and merging may be performed thereon.
- An overall result may be obtained from the disjoint part hash maps by concatenating them together, as depicted in
FIG. 6 at 685. -
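Flow 600 may be approximated with two aggregator threads and two merger threads, as in the following simplified sketch using Python threads. The synchronization here is coarser than a real lock-free implementation, and the data, key split, and names are invented for the example.

```python
import threading

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("c", 6)]
partitions = [rows[:3], rows[3:]]           # exclusive parts of the input table
buffer, buffer_lock = [], threading.Lock()


def aggregator(partition):
    local = {}                              # private local hash map: no locking needed
    for key, value in partition:
        local[key] = local.get(key, 0) + value
    with buffer_lock:                       # only handing the map to the buffer is synchronized
        buffer.append(local)


def merger(key_range, out):
    # Each merger reads all local maps but only its designated keys,
    # so the part hash maps it produces are disjoint.
    for local in buffer:
        for key, value in local.items():
            if key in key_range:
                out[key] = out.get(key, 0) + value


threads = [threading.Thread(target=aggregator, args=(p,)) for p in partitions]
for t in threads: t.start()
for t in threads: t.join()                  # aggregators done; mergers take over

part1, part2 = {}, {}
m1 = threading.Thread(target=merger, args=({"a", "b"}, part1))
m2 = threading.Thread(target=merger, args=({"c"}, part2))
for m in (m1, m2): m.start()
for m in (m1, m2): m.join()

result = {**part1, **part2}                 # concatenate the disjoint part maps
# result == {"a": 4, "b": 7, "c": 10}
```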
FIG. 7 is an illustrative example of a flow diagram 700 relating to some parallel aggregation embodiments herein. At S705, exclusive partitions of an input data table are received or retrieved for aggregating in parallel. At S710, the values of each of the exclusive partitions are aggregated. In some embodiments, the values of each partition are aggregated into a local hash map by one of a plurality of concurrently running execution threads. - At S715, a determination is made whether the aggregating of the input table partitions is complete or whether the buffer is full. In the instance additional partitions remain to be aggregated and
buffer 640 is not full, whether at the end of aggregating a current partition and/or for other considerations, process 700 returns to further aggregate partitions of the input data and store the aggregated values as key—index pairs in local hash tables. In the instance the aggregating of the partitions is complete or the buffer is full, process 700 proceeds to assign designated parts of the local hash tables or hash maps to a second plurality of execution threads at S720. The second plurality of execution threads works to merge the designated parts of the local hash maps into thread-local part hash maps at S725 and to produce result tables.
- At S735, process 700 operates to generate a global result by assembling the results obtained at S725 into a composite result table. In some embodiments, the overall result may be produced by concatenating the part hash maps of S725 to each other.
- Each system described herein may be implemented by any number of devices in communication via any number of other public and/or private networks. Two or more of the devices herein may be co-located, may be a single device, or may be located remote from one another and may communicate with one another via any known manner of network(s) and/or a dedicated connection. Moreover, each device may comprise any number of hardware and/or software elements suitable to provide the functions described herein as well as any other functions. Other topologies may be used in conjunction with other embodiments.
- All systems and processes discussed herein may be embodied in program code stored on one or more computer-readable media. Such media may include, for example, a floppy disk, a CD-ROM, a DVD-ROM, magnetic tape, and solid state Random Access Memory (RAM) or Read Only Memory (ROM) storage units. According to some embodiments, a memory storage unit may be associated with access patterns and may be independent from the device (e.g., magnetic, optoelectronic, semiconductor/solid-state, etc.) Moreover, in-memory technologies may be used such that databases, etc. may be completely operated in RAM memory at a processor. Embodiments are therefore not limited to any specific combination of hardware and software.
- Embodiments have been described herein solely for the purpose of illustration. Persons skilled in the art will recognize from this description that embodiments are not limited to those described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.
Claims (23)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/978,194 US20120011144A1 (en) | 2010-07-12 | 2010-12-23 | Aggregation in parallel computation environments with shared memory |
EP11004931.9A EP2469423B1 (en) | 2010-12-23 | 2011-06-16 | Aggregation in parallel computation environments with shared memory |
US15/016,978 US10127281B2 (en) | 2010-12-23 | 2016-02-05 | Dynamic hash table size estimation during database aggregation processing |
US15/040,501 US10114866B2 (en) | 2010-12-23 | 2016-02-10 | Memory-constrained aggregation using intra-operator pipelining |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US36330410P | 2010-07-12 | 2010-07-12 | |
US12/978,194 US20120011144A1 (en) | 2010-07-12 | 2010-12-23 | Aggregation in parallel computation environments with shared memory |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120011144A1 true US20120011144A1 (en) | 2012-01-12 |
Family
ID=45439313
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/978,044 Active 2031-01-06 US8370316B2 (en) | 2010-07-12 | 2010-12-23 | Hash-join in parallel computation environments |
US12/978,194 Abandoned US20120011144A1 (en) | 2010-07-12 | 2010-12-23 | Aggregation in parallel computation environments with shared memory |
US12/982,767 Active 2032-04-28 US9223829B2 (en) | 2010-07-12 | 2010-12-30 | Interdistinct operator |
US13/742,034 Active 2032-01-25 US9177025B2 (en) | 2010-07-12 | 2013-01-15 | Hash-join in parallel computation environments |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/978,044 Active 2031-01-06 US8370316B2 (en) | 2010-07-12 | 2010-12-23 | Hash-join in parallel computation environments |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/982,767 Active 2032-04-28 US9223829B2 (en) | 2010-07-12 | 2010-12-30 | Interdistinct operator |
US13/742,034 Active 2032-01-25 US9177025B2 (en) | 2010-07-12 | 2013-01-15 | Hash-join in parallel computation environments |
Country Status (1)
Country | Link |
---|---|
US (4) | US8370316B2 (en) |
Families Citing this family (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8370316B2 (en) | 2010-07-12 | 2013-02-05 | Sap Ag | Hash-join in parallel computation environments |
US20120179669A1 (en) * | 2011-01-06 | 2012-07-12 | Al-Omari Awny K | Systems and methods for searching a search space of a query |
US8863146B1 (en) * | 2011-04-07 | 2014-10-14 | The Mathworks, Inc. | Efficient index folding using indexing expression generated using selected pair of indices for parallel operations based on the number of indices exceeding a pre-determined threshold |
WO2013124522A1 (en) | 2012-02-22 | 2013-08-29 | Nokia Corporation | A system, and a method for providing a prediction for controlling a system |
US9009155B2 (en) * | 2012-04-27 | 2015-04-14 | Sap Se | Parallel set aggregation |
US9355146B2 (en) | 2012-06-29 | 2016-05-31 | International Business Machines Corporation | Efficient partitioned joins in a database with column-major layout |
US10387293B2 (en) * | 2012-10-09 | 2019-08-20 | Securboration, Inc. | Systems and methods for automatically parallelizing sequential code |
US10725897B2 (en) | 2012-10-09 | 2020-07-28 | Securboration, Inc. | Systems and methods for automatically parallelizing sequential code |
US9569400B2 (en) * | 2012-11-21 | 2017-02-14 | International Business Machines Corporation | RDMA-optimized high-performance distributed cache |
US9378179B2 (en) | 2012-11-21 | 2016-06-28 | International Business Machines Corporation | RDMA-optimized high-performance distributed cache |
US20140214886A1 (en) | 2013-01-29 | 2014-07-31 | ParElastic Corporation | Adaptive multi-client saas database |
US9600852B2 (en) * | 2013-05-10 | 2017-03-21 | Nvidia Corporation | Hierarchical hash tables for SIMT processing and a method of establishing hierarchical hash tables |
US9411845B2 (en) | 2013-06-13 | 2016-08-09 | Sap Se | Integration flow database runtime |
US9659050B2 (en) | 2013-08-06 | 2017-05-23 | Sybase, Inc. | Delta store giving row-level versioning semantics to a non-row-level versioning underlying store |
CN104424326B (en) * | 2013-09-09 | 2018-06-15 | 华为技术有限公司 | A kind of data processing method and device |
US9558221B2 (en) * | 2013-11-13 | 2017-01-31 | Sybase, Inc. | Multi-pass, parallel merge for partitioned intermediate pages |
US9529849B2 (en) * | 2013-12-31 | 2016-12-27 | Sybase, Inc. | Online hash based optimizer statistics gathering in a database |
US9824106B1 (en) * | 2014-02-20 | 2017-11-21 | Amazon Technologies, Inc. | Hash based data processing |
US9792328B2 (en) | 2014-03-13 | 2017-10-17 | Sybase, Inc. | Splitting of a join operation to allow parallelization |
US9836505B2 (en) | 2014-03-13 | 2017-12-05 | Sybase, Inc. | Star and snowflake join query performance |
US10380183B2 (en) * | 2014-04-03 | 2019-08-13 | International Business Machines Corporation | Building and querying hash tables on processors |
US9684684B2 (en) | 2014-07-08 | 2017-06-20 | Sybase, Inc. | Index updates using parallel and hybrid execution |
US9785660B2 (en) | 2014-09-25 | 2017-10-10 | Sap Se | Detection and quantifying of data redundancy in column-oriented in-memory databases |
US20160378824A1 (en) * | 2015-06-24 | 2016-12-29 | Futurewei Technologies, Inc. | Systems and Methods for Parallelizing Hash-based Operators in SMP Databases |
US10482076B2 (en) | 2015-08-14 | 2019-11-19 | Sap Se | Single level, multi-dimension, hash-based table partitioning |
US10726015B1 (en) * | 2015-11-01 | 2020-07-28 | Yellowbrick Data, Inc. | Cache-aware system and method for identifying matching portions of two sets of data in a multiprocessor system |
US10083206B2 (en) * | 2015-11-19 | 2018-09-25 | Business Objects Software Limited | Visualization of combined table data |
US10528284B2 (en) | 2016-03-29 | 2020-01-07 | Samsung Electronics Co., Ltd. | Method and apparatus for enabling larger memory capacity than physical memory size |
US10678704B2 (en) | 2016-03-29 | 2020-06-09 | Samsung Electronics Co., Ltd. | Method and apparatus for enabling larger memory capacity than physical memory size |
US9983821B2 (en) | 2016-03-29 | 2018-05-29 | Samsung Electronics Co., Ltd. | Optimized hopscotch multiple hash tables for efficient memory in-line deduplication application |
US10496543B2 (en) | 2016-03-31 | 2019-12-03 | Samsung Electronics Co., Ltd. | Virtual bucket multiple hash tables for efficient memory in-line deduplication application |
US9966152B2 (en) | 2016-03-31 | 2018-05-08 | Samsung Electronics Co., Ltd. | Dedupe DRAM system algorithm architecture |
CN109416682B (en) | 2016-06-30 | 2020-12-15 | 华为技术有限公司 | System and method for managing database |
US10685004B2 (en) * | 2016-07-11 | 2020-06-16 | Salesforce.Com, Inc. | Multiple feature hash map to enable feature selection and efficient memory usage |
US11481321B2 (en) | 2017-03-27 | 2022-10-25 | Sap Se | Asynchronous garbage collection in parallel transaction system without locking |
US10726006B2 (en) | 2017-06-30 | 2020-07-28 | Microsoft Technology Licensing, Llc | Query optimization using propagated data distinctness |
US10489348B2 (en) * | 2017-07-17 | 2019-11-26 | Alteryx, Inc. | Performing hash joins using parallel processing |
US10552452B2 (en) | 2017-10-16 | 2020-02-04 | Alteryx, Inc. | Asynchronously processing sequential data blocks |
US10558364B2 (en) | 2017-10-16 | 2020-02-11 | Alteryx, Inc. | Memory allocation in a data analytics system |
US10810207B2 (en) * | 2018-04-03 | 2020-10-20 | Oracle International Corporation | Limited memory and statistics resilient hash join execution |
US11625398B1 (en) | 2018-12-12 | 2023-04-11 | Teradata Us, Inc. | Join cardinality estimation using machine learning and graph kernels |
US11016778B2 (en) | 2019-03-12 | 2021-05-25 | Oracle International Corporation | Method for vectorizing Heapsort using horizontal aggregation SIMD instructions |
US11258585B2 (en) * | 2019-03-25 | 2022-02-22 | Woven Planet North America, Inc. | Systems and methods for implementing robotics frameworks |
US11797539B2 (en) * | 2019-09-12 | 2023-10-24 | Oracle International Corporation | Accelerated building and probing of hash tables using symmetric vector processing |
EP4028907B1 (en) | 2019-09-12 | 2023-10-04 | Oracle International Corporation | Accelerated building and probing of hash tables using symmetric vector processing |
US11138232B1 (en) | 2020-10-15 | 2021-10-05 | Snowflake Inc. | Export data from tables into partitioned folders on an external data lake |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5230047A (en) * | 1990-04-16 | 1993-07-20 | International Business Machines Corporation | Method for balancing of distributed tree file structures in parallel computing systems to enable recovery after a failure |
US5850547A (en) * | 1997-01-08 | 1998-12-15 | Oracle Corporation | Method and apparatus for parallel processing aggregates using intermediate aggregate values |
US5884299A (en) * | 1997-02-06 | 1999-03-16 | Ncr Corporation | Optimization of SQL queries involving aggregate expressions using a plurality of local and global aggregation operations |
US20020051536A1 (en) * | 2000-10-31 | 2002-05-02 | Kabushiki Kaisha Toshiba | Microprocessor with program and data protection function under multi-task environment |
US6708178B1 (en) * | 2001-06-04 | 2004-03-16 | Oracle International Corporation | Supporting B+tree indexes on primary B+tree structures with large primary keys |
US6859808B1 (en) * | 2001-05-31 | 2005-02-22 | Oracle International Corporation | Mapping logical row identifiers for primary B+tree-like structures to physical row identifiers |
US7054872B1 (en) * | 2001-05-29 | 2006-05-30 | Oracle International Corporation | Online tracking and fixing of invalid guess-DBAs in secondary indexes and mapping tables on primary B+tree structures |
US20060182046A1 (en) * | 2005-02-16 | 2006-08-17 | Benoit Dageville | Parallel partition-wise aggregation |
US7124147B2 (en) * | 2003-04-29 | 2006-10-17 | Hewlett-Packard Development Company, L.P. | Data structures related to documents, and querying such data structures |
US20060271568A1 (en) * | 2005-05-25 | 2006-11-30 | Experian Marketing Solutions, Inc. | Distributed and interactive database architecture for parallel and asynchronous data processing of complex data and for real-time query processing |
US7216338B2 (en) * | 2002-02-20 | 2007-05-08 | Microsoft Corporation | Conformance execution of non-deterministic specifications for components |
US20080162409A1 (en) * | 2006-12-27 | 2008-07-03 | Microsoft Corporation | Iterate-aggregate query parallelization |
US20080313128A1 (en) * | 2007-06-12 | 2008-12-18 | Microsoft Corporation | Disk-Based Probabilistic Set-Similarity Indexes |
US20090164412A1 (en) * | 2007-12-21 | 2009-06-25 | Robert Joseph Bestgen | Multiple Result Sets Generated from Single Pass Through a Dataspace |
US20100010967A1 (en) * | 2008-07-11 | 2010-01-14 | Day Management Ag | System and method for a log-based data storage |
US20100082633A1 (en) * | 2008-10-01 | 2010-04-01 | Jurgen Harbarth | Database index and database for indexing text documents |
US20100217953A1 (en) * | 2009-02-23 | 2010-08-26 | Beaman Peter D | Hybrid hash tables |
US20110246503A1 (en) * | 2010-04-06 | 2011-10-06 | Bender Michael A | High-Performance Streaming Dictionary |
US20110252033A1 (en) * | 2010-04-09 | 2011-10-13 | International Business Machines Corporation | System and method for multithreaded text indexing for next generation multi-core architectures |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5742806A (en) * | 1994-01-31 | 1998-04-21 | Sun Microsystems, Inc. | Apparatus and method for decomposing database queries for database management system including multiprocessor digital data processing system |
US6338056B1 (en) * | 1998-12-14 | 2002-01-08 | International Business Machines Corporation | Relational database extender that supports user-defined index types and user-defined search |
US6430550B1 (en) * | 1999-12-03 | 2002-08-06 | Oracle Corporation | Parallel distinct aggregates |
US6507847B1 (en) * | 1999-12-17 | 2003-01-14 | Openwave Systems Inc. | History database structure for Usenet |
US7174429B2 (en) * | 2001-12-28 | 2007-02-06 | Intel Corporation | Method for extending the local memory address space of a processor |
US6952692B1 (en) * | 2002-05-17 | 2005-10-04 | Ncr Corporation | Execution of requests in a parallel database system |
US7356542B2 (en) * | 2003-08-22 | 2008-04-08 | Oracle International Corporation | DML statements for densifying data |
US7426520B2 (en) * | 2003-09-10 | 2008-09-16 | Exeros, Inc. | Method and apparatus for semantic discovery and mapping between data sources |
US9183256B2 (en) * | 2003-09-19 | 2015-11-10 | Ibm International Group B.V. | Performing sequence analysis as a relational join |
US7260563B1 (en) * | 2003-10-08 | 2007-08-21 | Ncr Corp. | Efficient costing for inclusion merge join |
US8145642B2 (en) | 2004-11-30 | 2012-03-27 | Oracle International Corporation | Method and apparatus to support bitmap filtering in a parallel system |
US8126870B2 (en) * | 2005-03-28 | 2012-02-28 | Sybase, Inc. | System and methodology for parallel query optimization using semantic-based partitioning |
US20060288030A1 (en) * | 2005-06-09 | 2006-12-21 | Ramon Lawrence | Early hash join |
US7801912B2 (en) * | 2005-12-29 | 2010-09-21 | Amazon Technologies, Inc. | Method and apparatus for a searchable data service |
US20070250470A1 (en) * | 2006-04-24 | 2007-10-25 | Microsoft Corporation | Parallelization of language-integrated collection operations |
US8122006B2 (en) * | 2007-05-29 | 2012-02-21 | Oracle International Corporation | Event processing query language including retain clause |
US7966343B2 (en) * | 2008-04-07 | 2011-06-21 | Teradata Us, Inc. | Accessing data in a column store database based on hardware compatible data structures |
US8862625B2 (en) * | 2008-04-07 | 2014-10-14 | Teradata Us, Inc. | Accessing data in a column store database based on hardware compatible indexing and replicated reordered columns |
US7970872B2 (en) * | 2007-10-01 | 2011-06-28 | Accenture Global Services Limited | Infrastructure for parallel programming of clusters of machines |
US8005868B2 (en) * | 2008-03-07 | 2011-08-23 | International Business Machines Corporation | System and method for multiple distinct aggregate queries |
US8032503B2 (en) * | 2008-08-05 | 2011-10-04 | Teradata Us, Inc. | Deferred maintenance of sparse join indexes |
US8078646B2 (en) * | 2008-08-08 | 2011-12-13 | Oracle International Corporation | Representing and manipulating RDF data in a relational database management system |
US8150836B2 (en) * | 2008-08-19 | 2012-04-03 | Teradata Us, Inc. | System, method, and computer-readable medium for reducing row redistribution costs for parallel join operations |
US8069210B2 (en) * | 2008-10-10 | 2011-11-29 | Microsoft Corporation | Graph based bot-user detection |
US8620884B2 (en) * | 2008-10-24 | 2013-12-31 | Microsoft Corporation | Scalable blob storage integrated with scalable structured storage |
US8370316B2 (en) | 2010-07-12 | 2013-02-05 | Sap Ag | Hash-join in parallel computation environments |
- 2010
  - 2010-12-23 US US12/978,044 patent/US8370316B2/en active Active
  - 2010-12-23 US US12/978,194 patent/US20120011144A1/en not_active Abandoned
  - 2010-12-30 US US12/982,767 patent/US9223829B2/en active Active
- 2013
  - 2013-01-15 US US13/742,034 patent/US9177025B2/en active Active
Patent Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5230047A (en) * | 1990-04-16 | 1993-07-20 | International Business Machines Corporation | Method for balancing of distributed tree file structures in parallel computing systems to enable recovery after a failure |
US5850547A (en) * | 1997-01-08 | 1998-12-15 | Oracle Corporation | Method and apparatus for parallel processing aggregates using intermediate aggregate values |
US5884299A (en) * | 1997-02-06 | 1999-03-16 | Ncr Corporation | Optimization of SQL queries involving aggregate expressions using a plurality of local and global aggregation operations |
US20020051536A1 (en) * | 2000-10-31 | 2002-05-02 | Kabushiki Kaisha Toshiba | Microprocessor with program and data protection function under multi-task environment |
US7054872B1 (en) * | 2001-05-29 | 2006-05-30 | Oracle International Corporation | Online tracking and fixing of invalid guess-DBAs in secondary indexes and mapping tables on primary B+tree structures |
US6859808B1 (en) * | 2001-05-31 | 2005-02-22 | Oracle International Corporation | Mapping logical row identifiers for primary B+tree-like structures to physical row identifiers |
US6708178B1 (en) * | 2001-06-04 | 2004-03-16 | Oracle International Corporation | Supporting B+tree indexes on primary B+tree structures with large primary keys |
US7809674B2 (en) * | 2001-06-04 | 2010-10-05 | Oracle International Corporation | Supporting B+tree indexes on primary B+tree structures with large primary keys |
US7216338B2 (en) * | 2002-02-20 | 2007-05-08 | Microsoft Corporation | Conformance execution of non-deterministic specifications for components |
US7124147B2 (en) * | 2003-04-29 | 2006-10-17 | Hewlett-Packard Development Company, L.P. | Data structures related to documents, and querying such data structures |
US7779008B2 (en) * | 2005-02-16 | 2010-08-17 | Oracle International Corporation | Parallel partition-wise aggregation |
US20060182046A1 (en) * | 2005-02-16 | 2006-08-17 | Benoit Dageville | Parallel partition-wise aggregation |
US20060271568A1 (en) * | 2005-05-25 | 2006-11-30 | Experian Marketing Solutions, Inc. | Distributed and interactive database architecture for parallel and asynchronous data processing of complex data and for real-time query processing |
US20080162409A1 (en) * | 2006-12-27 | 2008-07-03 | Microsoft Corporation | Iterate-aggregate query parallelization |
US20080313128A1 (en) * | 2007-06-12 | 2008-12-18 | Microsoft Corporation | Disk-Based Probabilistic Set-Similarity Indexes |
US7610283B2 (en) * | 2007-06-12 | 2009-10-27 | Microsoft Corporation | Disk-based probabilistic set-similarity indexes |
US20090164412A1 (en) * | 2007-12-21 | 2009-06-25 | Robert Joseph Bestgen | Multiple Result Sets Generated from Single Pass Through a Dataspace |
US20100010967A1 (en) * | 2008-07-11 | 2010-01-14 | Day Management Ag | System and method for a log-based data storage |
US20100082633A1 (en) * | 2008-10-01 | 2010-04-01 | Jurgen Harbarth | Database index and database for indexing text documents |
US20100217953A1 (en) * | 2009-02-23 | 2010-08-26 | Beaman Peter D | Hybrid hash tables |
US20110246503A1 (en) * | 2010-04-06 | 2011-10-06 | Bender Michael A | High-Performance Streaming Dictionary |
US20110252033A1 (en) * | 2010-04-09 | 2011-10-13 | International Business Machines Corporation | System and method for multithreaded text indexing for next generation multi-core architectures |
Non-Patent Citations (3)
Title |
---|
Wikipedia, "Barrier (computer science)" retrieved 2/19/2016. * |
Wikipedia, "Parallel computing" retrieved 2/19/2016. * |
Wikipedia, "Table (database)" retrieved 2/19/2016. * |
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8719307B2 (en) * | 2010-04-23 | 2014-05-06 | Red Hat, Inc. | Concurrent linked hashed maps |
US20110264687A1 (en) * | 2010-04-23 | 2011-10-27 | Red Hat, Inc. | Concurrent linked hashed maps |
US10114866B2 (en) | 2010-12-23 | 2018-10-30 | Sap Se | Memory-constrained aggregation using intra-operator pipelining |
US10311105B2 (en) * | 2010-12-28 | 2019-06-04 | Microsoft Technology Licensing, Llc | Filtering queried data on data stores |
US20120166447A1 (en) * | 2010-12-28 | 2012-06-28 | Microsoft Corporation | Filtering queried data on data stores |
US20120254252A1 (en) * | 2011-03-31 | 2012-10-04 | International Business Machines Corporation | Input/output efficiency for online analysis processing in a relational database |
US8719312B2 (en) * | 2011-03-31 | 2014-05-06 | International Business Machines Corporation | Input/output efficiency for online analysis processing in a relational database |
US20130013824A1 (en) * | 2011-07-08 | 2013-01-10 | Goetz Graefe | Parallel aggregation system |
US8700822B2 (en) * | 2011-07-08 | 2014-04-15 | Hewlett-Packard Development Company, L.P. | Parallel aggregation system |
US9411853B1 (en) | 2012-08-03 | 2016-08-09 | Healthstudio, LLC | In-memory aggregation system and method of multidimensional data processing for enhancing speed and scalability |
US9836492B1 (en) * | 2012-11-01 | 2017-12-05 | Amazon Technologies, Inc. | Variable sized partitioning for distributed hash tables |
US9213732B2 (en) | 2012-12-28 | 2015-12-15 | Sap Ag | Hash table and radix sort based aggregation |
US9292560B2 (en) | 2013-01-30 | 2016-03-22 | International Business Machines Corporation | Reducing collisions within a hash table |
US9311359B2 (en) | 2013-01-30 | 2016-04-12 | International Business Machines Corporation | Join operation partitioning |
US9317548B2 (en) | 2013-01-30 | 2016-04-19 | International Business Machines Corporation | Reducing collisions within a hash table |
US9665624B2 (en) | 2013-01-30 | 2017-05-30 | International Business Machines Corporation | Join operation partitioning |
US9519668B2 (en) | 2013-05-06 | 2016-12-13 | International Business Machines Corporation | Lock-free creation of hash tables in parallel |
US9317517B2 (en) | 2013-06-14 | 2016-04-19 | International Business Machines Corporation | Hashing scheme using compact array tables |
US9367556B2 (en) | 2013-06-14 | 2016-06-14 | International Business Machines Corporation | Hashing scheme using compact array tables |
US9471710B2 (en) | 2013-06-14 | 2016-10-18 | International Business Machines Corporation | On-the-fly encoding method for efficient grouping and aggregation |
US10592556B2 (en) | 2013-06-14 | 2020-03-17 | International Business Machines Corporation | On-the-fly encoding method for efficient grouping and aggregation |
US9405858B2 (en) | 2013-06-14 | 2016-08-02 | International Business Machines Corporation | On-the-fly encoding method for efficient grouping and aggregation |
US9378264B2 (en) | 2013-06-18 | 2016-06-28 | Sap Se | Removing group-by characteristics in formula exception aggregation |
US10489394B2 (en) | 2013-06-18 | 2019-11-26 | Sap Se | Database query calculation using an operator that explicitly removes group-by characteristics |
US9195599B2 (en) | 2013-06-25 | 2015-11-24 | Globalfoundries Inc. | Multi-level aggregation techniques for memory hierarchies |
WO2015077951A1 (en) * | 2013-11-28 | 2015-06-04 | Intel Corporation | Techniques for block-based indexing |
US10242038B2 (en) | 2013-11-28 | 2019-03-26 | Intel Corporation | Techniques for block-based indexing |
US9672248B2 (en) | 2014-10-08 | 2017-06-06 | International Business Machines Corporation | Embracing and exploiting data skew during a join or groupby |
US10489403B2 (en) | 2014-10-08 | 2019-11-26 | International Business Machines Corporation | Embracing and exploiting data skew during a join or groupby |
US11113237B1 (en) | 2014-12-30 | 2021-09-07 | EMC IP Holding Company LLC | Solid state cache index for a deduplicate storage system |
US10503717B1 (en) * | 2014-12-30 | 2019-12-10 | EMC IP Holding Company LLC | Method for locating data on a deduplicated storage system using a SSD cache index |
US10175894B1 (en) | 2014-12-30 | 2019-01-08 | EMC IP Holding Company LLC | Method for populating a cache index on a deduplicated storage system |
US10289307B1 (en) | 2014-12-30 | 2019-05-14 | EMC IP Holding Company LLC | Method for handling block errors on a deduplicated storage system |
US9922064B2 (en) | 2015-03-20 | 2018-03-20 | International Business Machines Corporation | Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables |
US11061878B2 (en) | 2015-03-20 | 2021-07-13 | International Business Machines Corporation | Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables |
US10394783B2 (en) | 2015-03-20 | 2019-08-27 | International Business Machines Corporation | Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables |
US10303791B2 (en) | 2015-03-20 | 2019-05-28 | International Business Machines Corporation | Efficient join on dynamically compressed inner for improved fit into cache hierarchy |
US10387397B2 (en) | 2015-03-20 | 2019-08-20 | International Business Machines Corporation | Parallel build of non-partitioned join hash tables and non-enforced n:1 join hash tables |
US10650011B2 (en) | 2015-03-20 | 2020-05-12 | International Business Machines Corporation | Efficient performance of insert and point query operations in a column store |
US10108653B2 (en) | 2015-03-27 | 2018-10-23 | International Business Machines Corporation | Concurrent reads and inserts into a data structure without latching or waiting by readers |
US10831736B2 (en) | 2015-03-27 | 2020-11-10 | International Business Machines Corporation | Fast multi-tier indexing supporting dynamic update |
US11080260B2 (en) | 2015-03-27 | 2021-08-03 | International Business Machines Corporation | Concurrent reads and inserts into a data structure without latching or waiting by readers |
US20160350394A1 (en) * | 2015-05-29 | 2016-12-01 | Sap Se | Aggregating database entries by hashing |
US10055480B2 (en) * | 2015-05-29 | 2018-08-21 | Sap Se | Aggregating database entries by hashing |
US9519583B1 (en) * | 2015-12-09 | 2016-12-13 | International Business Machines Corporation | Dedicated memory structure holding data for detecting available worker thread(s) and informing available worker thread(s) of task(s) to execute |
US11816118B2 (en) | 2016-06-19 | 2023-11-14 | Data.World, Inc. | Collaborative dataset consolidation via distributed computer networks |
US10437738B2 (en) * | 2017-01-25 | 2019-10-08 | Samsung Electronics Co., Ltd. | Storage device performing hashing-based translation between logical address and physical address |
US10891234B2 (en) | 2018-04-04 | 2021-01-12 | Sap Se | Cache partitioning to accelerate concurrent workloads |
WO2023034328A3 (en) * | 2021-08-30 | 2023-04-13 | Data.World, Inc. | Correlating parallelized data from disparate data sources to aggregate graph data portions to predictively identify entity data |
Also Published As
Publication number | Publication date |
---|---|
US8370316B2 (en) | 2013-02-05 |
US20120011108A1 (en) | 2012-01-12 |
US20130138628A1 (en) | 2013-05-30 |
US20120011133A1 (en) | 2012-01-12 |
US9177025B2 (en) | 2015-11-03 |
US9223829B2 (en) | 2015-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20120011144A1 (en) | Aggregation in parallel computation environments with shared memory | |
EP2469423B1 (en) | Aggregation in parallel computation environments with shared memory | |
US11157478B2 (en) | Technique of comprehensively support autonomous JSON document object (AJD) cloud service | |
US10572475B2 (en) | Leveraging columnar encoding for query operations | |
US10628419B2 (en) | Many-core algorithms for in-memory column store databases | |
US8660985B2 (en) | Multi-dimensional OLAP query processing method oriented to column store data warehouse | |
Papailiou et al. | H2RDF+: High-performance distributed joins over large-scale RDF graphs |
US11593323B2 (en) | Parallel and efficient technique for building and maintaining a main memory CSR based graph index in a RDBMS | |
US11797509B2 (en) | Hash multi-table join implementation method based on grouping vector | |
US7640257B2 (en) | Spatial join in a parallel database management system | |
WO2013152543A1 (en) | Multidimensional olap query processing method for column-oriented data warehouse | |
US10185743B2 (en) | Method and system for optimizing reduce-side join operation in a map-reduce framework | |
CN104376109A (en) | Multi-dimension data distribution method based on data distribution base | |
Zhao et al. | A practice of TPC-DS multidimensional implementation on NoSQL database systems | |
Gu et al. | Rainbow: a distributed and hierarchical RDF triple store with dynamic scalability | |
Tian et al. | A survey of spatio-temporal big data indexing methods in distributed environment | |
CN108319604B (en) | Optimization method for association of large and small tables in hive | |
US9870399B1 (en) | Processing column-partitioned data for row-based operations in a database system | |
US20200151178A1 (en) | System and method for sharing database query execution plans between multiple parsing engines | |
EP2469424B1 (en) | Hash-join in parallel computation environments | |
US10706055B2 (en) | Partition aware evaluation of top-N queries | |
Yu et al. | MPDBS: A multi-level parallel database system based on B-Tree | |
Shi et al. | HEDC++: an extended histogram estimator for data in the cloud | |
US11775543B1 (en) | Heapsort in a parallel processing framework | |
CN113742346A (en) | Asset big data platform architecture optimization method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: SAP AG, GERMANY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TRANSIER, FREDERIK;MATHIS, CHRISTIAN;BOHNSACK, NICO;AND OTHERS;REEL/FRAME:025565/0018. Effective date: 20101217 |
| | AS | Assignment | Owner name: SAP AG, GERMANY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SANDERS, PETER;MULLER, INGO;SIGNING DATES FROM 20121126 TO 20121127;REEL/FRAME:029742/0678 |
| | AS | Assignment | Owner name: SAP SE, GERMANY. Free format text: CHANGE OF NAME;ASSIGNOR:SAP AG;REEL/FRAME:033625/0223. Effective date: 20140707 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |