WO2010039898A3 - Efficient large-scale filtering and/or sorting for querying of column based data encoded structures - Google Patents
Efficient large-scale filtering and/or sorting for querying of column based data encoded structures
- Publication number
- WO2010039898A3 (PCT/US2009/059118)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rows
- sorting
- querying
- data encoded
- window
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24561—Intermediate data storage techniques for performance improvement
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The subject disclosure relates to querying of column based data encoded structures enabling efficient query processing over large scale data storage, and more specifically with respect to complex queries implicating filter and/or sort operations for data over a defined window. In this regard, in various embodiments, a method is provided that avoids scenarios involving expensive sorting of a high percentage of, or all, rows, either by not sorting any rows at all, or by sorting only a very small number of rows consistent with or smaller than a number of rows associated with the size of the requested window over the data. In one embodiment, this is achieved by splitting an external query request into two different internal sub-requests, a first one that computes statistics about distribution of rows for any specified WHERE clauses and ORDER BY columns, and a second one that selects only the rows that match the window based on the statistics.
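The two-sub-request idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `windowed_query`, its parameters, and the use of an in-memory list of rows are all assumptions made for illustration. The first pass only counts matching rows per ORDER BY value; the second pass selects and sorts only the rows whose value overlaps the requested window.

```python
from collections import Counter


def windowed_query(rows, predicate, sort_key, offset, limit):
    """Return the window [offset, offset + limit) of the filtered, sorted rows
    without sorting every matching row (hypothetical helper, illustration only)."""
    # Sub-request 1: statistics. Count matching rows per ORDER BY value.
    counts = Counter(sort_key(r) for r in rows if predicate(r))

    # Walk the distinct values in order and keep those whose run of rows
    # overlaps the window. Only distinct values are sorted here, not rows.
    lo, hi = offset, offset + limit
    wanted = set()
    first_start = None
    seen = 0
    for value in sorted(counts):
        start, end = seen, seen + counts[value]
        if end > lo and start < hi:
            wanted.add(value)
            if first_start is None:
                first_start = start
        seen = end
        if seen >= hi:
            break

    if not wanted:
        return []

    # Sub-request 2: select only rows whose value overlaps the window,
    # then sort that small candidate set and trim it to the window.
    candidates = [r for r in rows if predicate(r) and sort_key(r) in wanted]
    candidates.sort(key=sort_key)
    skip = lo - first_start
    return candidates[skip:skip + limit]


# Usage: rows are (id, category) pairs; filter on even ids, order by category.
rows = [(i, (i * 7) % 10) for i in range(100)]
page = windowed_query(rows, lambda r: r[0] % 2 == 0, lambda r: r[1], 20, 5)
```

The sketch pays off when the ORDER BY column has far fewer distinct values than the table has rows, as is typical for dictionary-encoded columns; with mostly unique values the statistics pass would need coarser buckets, which this sketch does not attempt.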
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP09818480.7A EP2350868B1 (en) | 2008-10-05 | 2009-09-30 | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures |
JP2011530206A JP5580827B2 (en) | 2008-10-05 | 2009-09-30 | Efficient large-scale filtering and / or sorting for column-based data encoded structure queries |
CN200980139978.3A CN102171680B (en) | 2008-10-05 | 2009-09-30 | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10286308P | 2008-10-05 | 2008-10-05 | |
US61/102,863 | 2008-10-05 | ||
US12/363,637 US8478775B2 (en) | 2008-10-05 | 2009-01-30 | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures |
US12/363,637 | 2009-01-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2010039898A2 (en) | 2010-04-08 |
WO2010039898A3 (en) | 2010-07-01 |
Family
ID=42074198
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2009/059118 WO2010039898A2 (en) | 2008-10-05 | 2009-09-30 | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures |
Country Status (5)
Country | Link |
---|---|
US (1) | US8478775B2 (en) |
EP (1) | EP2350868B1 (en) |
JP (1) | JP5580827B2 (en) |
CN (1) | CN102171680B (en) |
WO (1) | WO2010039898A2 (en) |
Families Citing this family (75)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8352470B2 (en) * | 2008-05-23 | 2013-01-08 | International Business Machines Corporation | Adaptive aggregation: improving the performance of grouping and duplicate elimination by avoiding unnecessary disk access |
US8478775B2 (en) * | 2008-10-05 | 2013-07-02 | Microsoft Corporation | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures |
US8214382B1 (en) * | 2008-11-25 | 2012-07-03 | Sprint Communications Company L.P. | Database predicate constraints on structured query language statements |
US9667269B2 (en) | 2009-04-30 | 2017-05-30 | Oracle International Corporation | Technique for compressing XML indexes |
US8645337B2 (en) * | 2009-04-30 | 2014-02-04 | Oracle International Corporation | Storing compression units in relational tables |
US8452755B1 (en) | 2009-05-12 | 2013-05-28 | Microstrategy Incorporated | Database query analysis technology |
US8577902B1 (en) * | 2009-05-12 | 2013-11-05 | Microstrategy Incorporated | Data organization and indexing related technology |
US8832074B2 (en) * | 2009-10-21 | 2014-09-09 | Teradata Us, Inc. | System, method and computer-readable medium for dynamic skew avoidance for generic queries |
US8832142B2 (en) | 2010-08-30 | 2014-09-09 | Oracle International Corporation | Query and exadata support for hybrid columnar compressed data |
US8682904B1 (en) * | 2010-04-27 | 2014-03-25 | Google Inc. | System of intuitive sorting of a table based on a column containing fractions |
US8346758B2 (en) * | 2010-08-31 | 2013-01-01 | International Business Machines Corporation | Method and system for transmitting a query in a wireless network |
US9110907B2 (en) * | 2010-09-29 | 2015-08-18 | International Business Machines Corporation | Framework for extremely large complex objects (XLCOs) |
US10311105B2 (en) | 2010-12-28 | 2019-06-04 | Microsoft Technology Licensing, Llc | Filtering queried data on data stores |
US9087094B2 (en) | 2011-04-25 | 2015-07-21 | Microsoft Technology Licensing, Llc | Processing records in dynamic ranges |
US10380269B2 (en) | 2011-06-07 | 2019-08-13 | Entit Software Llc | Sideways information passing |
US9171041B1 (en) * | 2011-09-29 | 2015-10-27 | Pivotal Software, Inc. | RLE-aware optimization of SQL queries |
US20130262433A1 (en) * | 2012-04-02 | 2013-10-03 | James Qiu | Filtering a Distributed Join |
US9235527B2 (en) | 2012-07-20 | 2016-01-12 | The Johns Hopkins University | Multiple-cache parallel reduction and applications |
US9563658B2 (en) | 2012-08-20 | 2017-02-07 | Oracle International Corporation | Hardware implementation of the aggregation/group by operation: hash-table method |
US9600522B2 (en) * | 2012-08-20 | 2017-03-21 | Oracle International Corporation | Hardware implementation of the aggregation/group by operation: filter method |
US9727606B2 (en) * | 2012-08-20 | 2017-08-08 | Oracle International Corporation | Hardware implementation of the filter/project operations |
CN103078706B (en) * | 2012-12-31 | 2015-12-23 | 上海宇芯科技有限公司 | A kind of dictionary query compression method and device utilizing redundant digit |
EP2924587A4 (en) * | 2012-12-31 | 2015-12-02 | Huawei Tech Co Ltd | Data processing method and device |
KR102002901B1 (en) * | 2013-01-28 | 2019-07-23 | 삼성전자 주식회사 | Memory device, memory system and control method of the same |
US9165250B2 (en) | 2013-01-30 | 2015-10-20 | Bank Of America Corporation | Dynamic incident response |
US9171043B2 (en) | 2013-01-31 | 2015-10-27 | International Business Machines Corporation | Dynamically determining join order |
US9195711B2 (en) | 2013-03-11 | 2015-11-24 | International Business Machines Corporation | Persisting and retrieving arbitrary slices of nested structures using a column-oriented data store |
US10664474B1 (en) | 2013-03-15 | 2020-05-26 | Progress Software Corporation | Query system |
GB2514779A (en) * | 2013-06-03 | 2014-12-10 | Ibm | Information retrieval from a database system |
WO2015062035A1 (en) * | 2013-10-31 | 2015-05-07 | 华为技术有限公司 | Columnar database processing method and device |
CN104598485B (en) | 2013-11-01 | 2018-05-25 | 国际商业机器公司 | The method and apparatus for handling database table |
WO2015137979A1 (en) * | 2014-03-14 | 2015-09-17 | Hewlett-Packard Development Company, Lp | Column store database compression |
KR101544560B1 (en) * | 2014-04-02 | 2015-08-17 | (주)비아이매트릭스 | An online analytical processing system for big data by caching the results and generating 2-level queries by SQL parsing |
CN104021161B (en) * | 2014-05-27 | 2018-06-15 | 华为技术有限公司 | A kind of clustering storage method and device |
CN104021179B (en) * | 2014-06-05 | 2017-05-31 | 暨南大学 | The Fast Recognition Algorithm of similarity data under a kind of large data sets |
US9633058B2 (en) | 2014-06-16 | 2017-04-25 | International Business Machines Corporation | Predictive placement of columns during creation of a large database |
US9846567B2 (en) | 2014-06-16 | 2017-12-19 | International Business Machines Corporation | Flash optimized columnar data layout and data access algorithms for big data query engines |
CN104320374B (en) * | 2014-08-05 | 2017-06-27 | 杭州安恒信息技术有限公司 | A kind of Oracle transmitted data compressings restoring method |
US10176226B2 (en) * | 2014-11-26 | 2019-01-08 | Looker Data Sciences, Inc. | Relation aware aggregation (RAA) on normalized datasets |
CN104463627B (en) * | 2014-12-05 | 2017-12-19 | 北京国双科技有限公司 | Data processing method and device |
US10089674B1 (en) * | 2015-03-19 | 2018-10-02 | Amazon Technologies, Inc. | Ordering a set of data associated with an item |
CN107077453B (en) * | 2015-03-24 | 2020-11-06 | 华为技术有限公司 | System and method for parallel optimization of database queries using cluster caches |
US9952808B2 (en) | 2015-03-26 | 2018-04-24 | International Business Machines Corporation | File system block-level tiering and co-allocation |
CN104731951B (en) * | 2015-03-31 | 2018-08-07 | 北京奇艺世纪科技有限公司 | A kind of data query method and device |
US9977805B1 (en) * | 2017-02-13 | 2018-05-22 | Sas Institute Inc. | Distributed data set indexing |
CN106407190B (en) | 2015-07-27 | 2020-01-14 | 阿里巴巴集团控股有限公司 | Event record query method and device |
US10198455B2 (en) * | 2016-01-13 | 2019-02-05 | International Business Machines Corporation | Sampling-based deduplication estimation |
US10169364B2 (en) * | 2016-01-13 | 2019-01-01 | International Business Machines Corporation | Gauging accuracy of sampling-based distinct element estimation |
CN107305501B (en) * | 2016-04-25 | 2020-11-17 | 百度在线网络技术(北京)有限公司 | Multi-stream data processing method and system |
EP3449437A4 (en) * | 2016-04-29 | 2019-12-25 | Cisco Technology, Inc. | Dynamic streaming of query responses |
US10489401B2 (en) * | 2016-05-31 | 2019-11-26 | International Business Machines Corporation | Efficient aggregation in a parallel system |
US10762099B2 (en) * | 2016-06-07 | 2020-09-01 | International Business Machines Corporation | Syntactical transformation of database interaction statements |
US10242055B2 (en) * | 2016-12-07 | 2019-03-26 | Medallia, Inc. | Dual filter histogram optimization |
US10884704B2 (en) | 2017-09-21 | 2021-01-05 | International Business Machines Corporation | Sorting a table in analytical databases |
US11050682B2 (en) * | 2017-09-28 | 2021-06-29 | Intel Corporation | Reordering of data for parallel processing |
EP3718023A1 (en) * | 2017-12-01 | 2020-10-07 | MemSQL Inc. | Accelerated filtering, grouping and aggregation in a database system |
WO2019209674A1 (en) * | 2018-04-25 | 2019-10-31 | President And Fellows Of Harvard College | Systems and methods for designing data structures and synthesizing costs |
US11397712B2 (en) * | 2018-05-01 | 2022-07-26 | President And Fellows Of Harvard College | Rapid and robust predicate evaluation |
JP2020135318A (en) * | 2019-02-18 | 2020-08-31 | 株式会社日立製作所 | Data processor, and data processing method |
CN110222046B (en) * | 2019-04-28 | 2023-11-03 | 北京奥星贝斯科技有限公司 | List data processing method, device, server and storage medium |
US11194805B2 (en) | 2019-06-10 | 2021-12-07 | International Business Machines Corporation | Optimization of database execution planning |
US11455303B2 (en) * | 2019-07-19 | 2022-09-27 | Oracle International Corporation | Driving massive scale out through rewrites of analytical functions |
CN112418258A (en) * | 2019-08-22 | 2021-02-26 | 北京京东振世信息技术有限公司 | Feature discretization method and device |
CN111723114B (en) * | 2020-06-24 | 2023-07-25 | 中国工商银行股份有限公司 | Stream statistics method and device and electronic equipment |
US11580123B2 (en) * | 2020-11-13 | 2023-02-14 | Google Llc | Columnar techniques for big metadata management |
US11886404B2 (en) | 2021-01-08 | 2024-01-30 | Optum Services (Ireland) Limited | Automated database modeling |
US11449487B1 (en) | 2021-03-22 | 2022-09-20 | International Business Machines Corporation | Efficient indexing of columns with inappropriate data types in relational databases |
US11593382B2 (en) | 2021-03-22 | 2023-02-28 | International Business Machines Corporation | Efficient storage of columns with inappropriate data types in relational databases |
CN113032400B (en) * | 2021-03-31 | 2022-11-08 | 上海天旦网络科技发展有限公司 | High-performance TopN query method, system and medium for mass data |
CN113204602B (en) * | 2021-05-07 | 2023-08-01 | 星环信息科技(上海)股份有限公司 | Data processing method, device, equipment and storage medium |
US11704312B2 (en) * | 2021-08-19 | 2023-07-18 | Microsoft Technology Licensing, Llc | Conjunctive filtering with embedding models |
US11954134B2 (en) * | 2021-12-08 | 2024-04-09 | Sap Se | Visualization of complex hierarchy data with interactive adjustments |
US11853239B2 (en) * | 2022-04-11 | 2023-12-26 | Western Digital Technologies, Inc. | Hardware accelerator circuits for near storage compute systems |
CN115082135B (en) * | 2022-08-23 | 2022-11-18 | 成都乐超人科技有限公司 | Method, device, equipment and medium for identifying online time difference |
US11899662B1 (en) * | 2022-12-21 | 2024-02-13 | Teradata Us, Inc. | Compression aware aggregations for queries with expressions |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6738755B1 (en) * | 1999-05-19 | 2004-05-18 | International Business Machines Corporation | Query optimization method for incrementally estimating the cardinality of a derived relation when statistically correlated predicates are applied |
US7233939B1 (en) * | 2002-04-30 | 2007-06-19 | Oracle International Corporation | Systems and methods of optimizing database queries for efficient delivery of query data subsets |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4575798A (en) * | 1983-06-03 | 1986-03-11 | International Business Machines Corporation | External sorting using key value distribution and range formation |
US4774657A (en) * | 1986-06-06 | 1988-09-27 | International Business Machines Corporation | Index key range estimator |
US4858106A (en) * | 1987-04-03 | 1989-08-15 | General Electric Company | Automated method implemented on a distributed data processing system for partitioning a data string into two substrings |
JPH0776906B2 (en) * | 1989-06-30 | 1995-08-16 | ディジタル イクイプメント コーポレーション | Speed and memory control for classification accelerators |
US5355478A (en) * | 1991-12-23 | 1994-10-11 | International Business Machines Corporation | Method for avoiding cache misses during external tournament tree replacement sorting procedures |
US5918225A (en) * | 1993-04-16 | 1999-06-29 | Sybase, Inc. | SQL-based database system with improved indexing methodology |
US5619693A (en) * | 1994-05-02 | 1997-04-08 | Tandem Computers Incorporated | Method for sorting and storing data employing dynamic sort tree reconfiguration in volatile memory |
US5644739A (en) * | 1995-01-27 | 1997-07-01 | Microsoft Corporation | Method and system for adding buttons to a toolbar |
US5870752A (en) * | 1997-08-21 | 1999-02-09 | Lucent Technologies Inc. | Incremental maintenance of an approximate histogram in a database system |
US6460027B1 (en) * | 1998-09-14 | 2002-10-01 | International Business Machines Corporation | Automatic recognition and rerouting of queries for optimal performance |
US6457009B1 (en) * | 1998-11-09 | 2002-09-24 | Denison W. Bollay | Method of searching multiples internet resident databases using search fields in a generic form |
US6618719B1 (en) * | 1999-05-19 | 2003-09-09 | Sybase, Inc. | Database system with methodology for reusing cost-based optimization decisions |
US6920453B2 (en) * | 2001-12-31 | 2005-07-19 | Nokia Corporation | Method and system for finding a query-subset of events within a master-set of events |
US7293024B2 (en) * | 2002-11-14 | 2007-11-06 | Seisint, Inc. | Method for sorting and distributing data among a plurality of nodes |
US7330848B2 (en) * | 2003-05-23 | 2008-02-12 | Microsoft Corporation | Method and apparatus for generating statistics on query expressions for optimization |
US20050027690A1 (en) * | 2003-07-29 | 2005-02-03 | International Business Machines Corporation | Dynamic selection of optimal grouping sequence at runtime for grouping sets, rollup and cube operations in SQL query processing |
US7383246B2 (en) * | 2003-10-31 | 2008-06-03 | International Business Machines Corporation | System, method, and computer program product for progressive query processing |
US7302543B2 (en) * | 2004-06-16 | 2007-11-27 | Nec Laboratories America, Inc. | Compressed memory architecture for embedded systems |
US8126870B2 (en) * | 2005-03-28 | 2012-02-28 | Sybase, Inc. | System and methodology for parallel query optimization using semantic-based partitioning |
US7734644B2 (en) * | 2005-05-06 | 2010-06-08 | Seaton Gras | System and method for hierarchical information retrieval from a coded collection of relational data |
US8077059B2 (en) * | 2006-07-21 | 2011-12-13 | Eric John Davies | Database adapter for relational datasets |
US20080065590A1 (en) * | 2006-09-07 | 2008-03-13 | Microsoft Corporation | Lightweight query processing over in-memory data structures |
US8478775B2 (en) * | 2008-10-05 | 2013-07-02 | Microsoft Corporation | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures |
2009
- 2009-01-30 US: application US12/363,637, patent US8478775B2 (en), Active
- 2009-09-30 EP: application EP09818480.7A, patent EP2350868B1 (en), Active
- 2009-09-30 JP: application JP2011530206A, patent JP5580827B2 (en), Expired - Fee Related
- 2009-09-30 CN: application CN200980139978.3A, patent CN102171680B (en), Expired - Fee Related
- 2009-09-30 WO: application PCT/US2009/059118, publication WO2010039898A2 (en), Application Filing
Non-Patent Citations (1)
Title |
---|
See also references of EP2350868A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP2350868A2 (en) | 2011-08-03 |
US8478775B2 (en) | 2013-07-02 |
EP2350868A4 (en) | 2014-08-27 |
CN102171680B (en) | 2014-05-07 |
WO2010039898A2 (en) | 2010-04-08 |
EP2350868B1 (en) | 2020-05-20 |
JP2012504825A (en) | 2012-02-23 |
US20100088315A1 (en) | 2010-04-08 |
CN102171680A (en) | 2011-08-31 |
JP5580827B2 (en) | 2014-08-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2010039898A3 (en) | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures | |
WO2007038229A3 (en) | Non-indexed in-memory data storage and retrieval | |
WO2012173827A3 (en) | Query optimization techniques for business intelligence systems | |
WO2011011063A3 (en) | Method and system for document indexing and data querying | |
CA2894429A1 (en) | Extract operator | |
WO2014035879A3 (en) | Operating a distributed database with foreign tables | |
WO2006137977A3 (en) | Device specific content indexing for optimized device operation | |
WO2012103402A3 (en) | Searching sensor data | |
WO2010120457A3 (en) | Dynamic hash table for efficient data access in a relational database system | |
WO2010014185A3 (en) | Federated community search | |
WO2012166885A3 (en) | Context-based ranking of search results | |
WO2009009192A3 (en) | Adaptive archive data management | |
SG11201804544QA (en) | Data query method and apparatus, and database system | |
WO2010039895A3 (en) | Efficient large-scale joining for querying of column based data encoded structures | |
WO2010120929A3 (en) | Generating user-customized search results and building a semantics-enhanced search engine | |
WO2007075417A3 (en) | System and method for analyzing communications using multi-dimensional hierarchical structures | |
WO2012154992A3 (en) | Systems and methods for performing search and retrieval of electronic documents using a big index | |
WO2012103191A3 (en) | Method of and system for error correction in multiple input modality search engines | |
WO2014078641A8 (en) | Category and attribute specifications for product search queries | |
WO2012025837A3 (en) | Adaptive processing for sequence alignment | |
WO2008102681A1 (en) | Circuit operation worst condition decision system, method, and program | |
IL241640B (en) | Method for executing queries on streaming data using graphic processing units | |
EP2009561A3 (en) | Performing intelligent content indexing in method, signal, data carrier and system | |
WO2008005566A3 (en) | System and method to determine a single sql bom solve | |
GB2508742A (en) | Placement of data in shards on a storage device |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 200980139978.3; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 09818480; Country of ref document: EP; Kind code of ref document: A2 |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 2011530206; Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 2009818480; Country of ref document: EP |