US20140032593A1 - Systems and methods to process a query with a unified storage interface - Google Patents
Systems and methods to process a query with a unified storage interface
- Publication number
- US20140032593A1 (application Ser. No. 13/730,583)
- Authority
- US
- United States
- Prior art keywords
- query
- index
- expression
- expression tree
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06F17/30424—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/337—Profile generation, learning or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Systems and methods to process a query with a unified storage interface are described. The system receives a query from a client machine and generates a query expression tree based on the query. The system generates a cursor expression tree based on the query expression tree. The system executes a plurality of software components in the cursor expression tree to retrieve data from a first storage device. The plurality of software components comprise a first software component that is utilized to retrieve data irrespective of a plurality of storage devices and a second software component that is utilized to retrieve data from the first storage device. Finally, the system communicates search results to the client machine, the search results including at least a portion of the data.
Description
- This application claims priority to U.S. Provisional Application No. 61/675,793, filed on Jul. 25, 2012, and entitled, “SYSTEMS AND METHODS TO BUILD AND UTILIZE A SEARCH INFRASTRUCTURE,” which is hereby incorporated by reference in its entirety.
- This disclosure relates to the technical field of data storage and retrieval. More particularly, it relates to systems and methods to process a query with a unified storage interface.
- A search infrastructure supports the storage of data items in one or more databases and the retrieval of the data items from the one or more databases. Building and utilizing the search infrastructure may present many technical challenges. In particular, the performance, manageability, and quality of service in storing and retrieving the data items may present many opportunities for innovation.
- Embodiments are illustrated, by way of example and not limitation, in the figures of the accompanying drawings, in which:
-
FIG. 1 is a block diagram that illustrates a system, according to an embodiment, to build and utilize a search infrastructure; -
FIG. 2A is a block diagram that illustrates an items table, according to an embodiment; -
FIG. 2B is a block diagram that illustrates the item information, according to an embodiment; -
FIG. 3A is a block diagram that illustrates the items table in association with regions, according to an embodiment; -
FIG. 3B is a block diagram that illustrates regions in association with a column, a column of query node servers and a grid of query node servers, according to an embodiment; -
FIG. 4 is a block diagram illustrating a time-line, according to an embodiment, to generate a full-index and a mini-index; -
FIG. 5A is a block diagram illustrating index information components, according to an embodiment; -
FIG. 5B is a block diagram illustrating a full-index, according to an embodiment; -
FIG. 5C is a block diagram illustrating a mini-index, according to an embodiment; -
FIG. 6A is a block diagram illustrating current bill of material information, according to an embodiment; -
FIG. 6B is a block diagram illustrating full-index bill of material information, according to an embodiment; -
FIG. 6C is a block diagram illustrating mini-index bill of material information, according to an embodiment; -
FIG. 7 is a block diagram illustrating a method, according to an embodiment, to build and utilize a search index; -
FIG. 8A is a block diagram illustrating a method to generate index information components, according to an embodiment; -
FIG. 8B is a block diagram illustrating a method to update index information based on a full-index, according to an embodiment; -
FIG. 8C is a block diagram illustrating a method to update index information based on a mini-index, according to an embodiment; -
FIG. 9A is a block diagram illustrating a data flow, according to an embodiment, to generate a full-index; -
FIG. 9B is a block diagram illustrating a data flow to generate a mini-index, according to an embodiment; -
FIG. 10A is a network diagram illustrating a system, according to an embodiment, to process a query with a unified storage interface; -
FIG. 10B is a block diagram illustrating search back-end servers, according to an embodiment; -
FIG. 10C is a block diagram illustrating a query node server, according to an embodiment; -
FIG. 10D is a block diagram illustrating a query expression tree, according to an embodiment; -
FIG. 10E is a block diagram illustrating a cursor expression tree, according to an embodiment; -
FIG. 10F is a block diagram illustrating software layers, according to an embodiment; -
FIG. 10G is a block diagram illustrating a storage data dictionary, according to an embodiment; -
FIG. 10H is a block diagram illustrating a storage cursor object, according to an embodiment; -
FIG. 10I is a block diagram illustrating a method to process a query with a unified storage interface, according to an embodiment; -
FIG. 10J is a block diagram illustrating a method to generate a cursor expression tree, according to an embodiment; -
FIG. 11 is a network diagram depicting a networked system, according to an embodiment; -
FIG. 12 is a block diagram illustrating marketplace and payment applications, according to an embodiment; -
FIG. 13 is a high-level entity-relationship diagram, according to an embodiment; and -
FIG. 14 shows a diagrammatic representation of a machine in the example form of a computer system, according to an example embodiment. - In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of some example embodiments. It will be evident, however, to one of ordinary skill in the art that embodiments of the present disclosure may be practiced without these specific details.
- As described further below, according to various example embodiments of the disclosed subject matter described and claimed herein, systems and methods to process a query with a unified storage interface are provided. Various embodiments are described below in connection with the figures provided herein.
-
FIG. 1 illustrates a system 10 to build and utilize a search infrastructure, according to an embodiment. The system 10 may include an information storage and retrieval platform 11 that is communicatively coupled over a network (e.g., Internet) (not shown) to a client machine 12 and a client machine 33. - Illustrated on the top left is an operation A that describes a first user operating the
client machine 12 to interact with an application server 14 to store or update a document 16 in a database 18; illustrated in the middle are operations B, C, D, E that describe retrieving and transforming the contents of the database 18, storing the transformed contents in a database 20 that is time-stamped, retrieving the contents from the database 20 to generate a full-index 22 and a set of mini-indexes 24 which are utilized to generate and continually update the index information 26 in the database 28 to be consumed and served by the query node servers 30; and illustrated on the top right is an operation F that describes a second user who operates a client machine 33 to enter a query that is received by one or more query node servers 30 that, in turn, apply the query to the index information 26 to identify and return search results that reference the document 16. The above operations to continually rebuild the index information 26 are performed in real-time and without interruption to the service that is provided to the first and second users, who continue to interact with the system 10. - The
index information 26 may include an inverted index 32 and document information 34. An inverted index (e.g., inverted index 32), as is well known in the art, is an index data structure storing a mapping from content (e.g., content contained by the document 16), such as words or numbers, to its locations in a database file, or in a document (e.g., document 16) or a set of documents. The documents 16 (e.g., document data, column group data) and/or information contained by the documents 16 may be stored in the document information 34. - Merely for example, a “document X” may include the words “apple,” “orange,” and “banana;” a “document Y” may include the words “apple” and “orange,” and a “document Z” may include the word “apple.” An inverted index for the words in documents X, Y, and Z may be generated as follows:
-
Word      Document
apple     X(1), Y(1), Z(1)
orange    X(2), Y(2)
banana    X(3)
- The above inverted index may be utilized to identify the word “apple” as being positioned in the first word of documents X, Y, and Z; the word “orange” as being positioned in the second word of the documents X and Y; and the word “banana” as being positioned as the third word of the document X. Accordingly, the above inverted index may be utilized to map a keyword “apple” contained in a query that is received from a client computer to the documents X, Y, and Z that are further referenced in search results that are returned to the client computer. It is appreciated by one skilled in the art that the
inverted index 32 corresponds to the underlying database that it describes. Accordingly, any update to the underlying database is reflected in a corresponding update to the inverted index 32. Updates to the database 28 may include the addition and deletion of documents 16 in the document information 34 as well as the update of any of the contents contained by the documents 16 in the document information 34. In the present embodiment, the index information 26 may be updated in real time to respond to a query in real time with accurate search results that include the most recent document information 34. To this end, the operations A-F are now further described. - The information storage and
retrieval platform 11 includes multiple components including the application servers 14 that may execute on one or more application server machines (not shown), the database 18, a database 20, a Hadoop distributed file system 23, the database 28, the query node servers 30 that operate on query node server machines (not shown), an HBase Hadoop Cluster 44 comprised of one or more HBase/Hadoop machines (not shown) including an HBase Hadoop Node 49 (e.g., HBase Hadoop machine), an index distribution module 52 executing on an HBase/Hadoop machine, search front-end servers 58 that execute on search machines (not shown), and search back-end servers 60 that execute on search machines (not shown), all being communicatively coupled together. For example, the multiple components may be communicatively coupled with any combination of a wide area network, local area network, wireless network, or any other type of network utilizing various networking technologies. - At operation A, the
document 16, or one or more elements of the document 16, may be communicated from the client machine 12 to the application servers 14 and stored in the database 18 (e.g., Oracle database). The document 16 may include multiple elements including elements a, b, c, d, e, and f that may include strings of text, numeric information, scores, or other discrete quanta of information that are positioned in different sections or fields of the document (e.g., item information). - At operation B, at the
application servers 14, event manager modules 36 may identify updates to the database 18, generate events that correspond to the respective updates, prioritize the events according to the quality of the data in the event, and communicate the prioritized events into event queues 38 that are consumed by consumer modules 40 that service the respective event queues 38. According to an embodiment, the event manager modules 36 and the consumer modules 40 may utilize three event queues 38 to process and prioritize event types. For example, the update of the “element a” in the document 16 in the database 18 may be a price change to item information describing an item for sale that causes the generation of a corresponding event that is associated with a high priority that, in turn, is communicated into a first event queue associated with the high priority that, in turn, is received by a consumer module 40. Similarly, the update of the “element b” in the document 16 in the database 18 may be a change to a title of the item that causes the generation of an event that is associated with a medium priority that, in turn, is communicated into a second event queue associated with the medium priority that, in turn, is received by a consumer module 40. Finally, the update of the “element c” in the document 16 in the database 18 may be a change to a description of the item that causes the generation of an event that is communicated into a third event queue associated with a low priority that, in turn, is received by a consumer module 40. Accordingly, the three event queues 38 may be utilized to communicate events in high, medium, and low priorities to facilitate a preference for the update of high-priority events (e.g., price) over medium-priority events (e.g., title) over low-priority events (e.g., description). In some embodiments the priority for the respective event types may be configured. Other embodiments may include fewer or more event queues 38. - At operation C, the
consumer modules 40 may transform the data in the events and communicate the transformed data via an HBase application programming interface to an HBase master server 42 in an HBase/Hadoop cluster 44 that, in turn, stores the transformed data in one or more tables including an items table 21 in the database 20 (e.g., HBase). The transformed data may be stored according to regions that are managed by region server processes 46. According to an embodiment, the database 20 may be embodied as an open-source non-relational, distributed database (e.g., HBase) that runs on a Hadoop Distributed Filesystem (HDFS) 23. HDFS 23 is an open-source software framework that supports data-intensive distributed applications, known by those skilled in the art. The HBase/Hadoop cluster 44 may further include the HBase master server 42 that is utilized to manage the HBase/HDFS environment, a scheduler module 48, and an HBase/Hadoop node 49 that includes multiple region server processes 46 and a map-reduce job module 50. Each region server process 46 may further be associated with a column (not shown) that corresponds to a range of documents (e.g., or items corresponding to item information in the items table 21) and may be utilized to manage one or more regions (not shown) that respectively correspond to a range of the documents 16. For example, the documents 16 may be uniquely identified with document identifiers (e.g., item identifiers) that are numbered from 0 to X, where each column and region are dedicated to respective overlapping predetermined ranges of documents (e.g., documents 0-100 and documents 0-50), as described further in this document. According to one embodiment, the number of region server processes 46 may be in the hundreds, but scaling is not limited to any fixed number.
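The three-tier prioritization of update events described at operation B can be sketched as follows. This is a minimal sketch, not the patent's implementation: the field-to-priority mapping, queue representation, and method names are illustrative assumptions.

```python
import heapq

# Illustrative mapping of changed field to priority; lower number = served first.
PRIORITY_BY_FIELD = {"price": 0, "title": 1, "description": 2}

class EventQueues:
    """Routes update events into priority buckets, as in operation B."""
    def __init__(self):
        self._heap = []  # entries: (priority, seq, event)
        self._seq = 0    # tie-breaker that preserves arrival order

    def publish(self, item_id, field, value):
        priority = PRIORITY_BY_FIELD.get(field, 2)  # unknown fields -> low
        heapq.heappush(self._heap, (priority, self._seq, (item_id, field, value)))
        self._seq += 1

    def consume(self):
        """Return the highest-priority pending event."""
        return heapq.heappop(self._heap)[2]

queues = EventQueues()
queues.publish(7, "description", "new text")
queues.publish(7, "title", "iPod Nano")
queues.publish(7, "price", 99.95)
print(queues.consume())  # the price event is consumed first
```

A single heap stands in for the patent's three physical queues; the observable behavior is the same: price changes are consumed before title changes, which are consumed before description changes.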
HBase is a technology that provides a fault-tolerant way of storing large quantities of sparse data, featuring compression, in-memory operation, and a space-efficient probabilistic data structure (e.g., Bloom filters) on a per-column basis, as outlined in the original BigTable paper and known by those skilled in the art. An items table 21 in the database 20 (e.g., HBase) may serve as the input and output for one or more map-reduce jobs that are scheduled by the map-reduce job module 50. The map-reduce jobs may be embodied as map jobs and reduce jobs that run in HDFS. The items table 21 in the database 20 may further be accessed through the Java Application Programming Interface (API) but also through representational state transfer (REST) architecture and other APIs. - At operation D, the
scheduler module 48, executing in the HBase/Hadoop cluster 44, may schedule two index-generating sub-operations that process in parallel to generate indexes that are subsequently distributed to the query node servers 30. The sub-operations may execute for the generating of a full-index 22 and the generating of the mini-indexes 24. The sub-operations may further execute for the distribution of the indexes to the query node servers 30. The full-index 22 may be a snapshot of the contents of the items table 21 in the database 20, and the mini-indexes 24 may respectively correspond to a series of consecutive snapshots where each snapshot captures one or more updates to the items table 21 in the database 20 that occurred within an associated period of time. The distribution of the full-indexes 22 and the mini-indexes 24 to the query node servers 30 may be over a network utilizing an index distribution module 52 based on BitTorrent, a peer-to-peer file sharing protocol. In one embodiment, the scheduler module 48 may schedule the generation of the full-index 22 twice in a twenty-four hour period and the generation of mini-indexes 24 every five minutes. The scheduler module 48 may generate a full-index 22 that is associated with a start-time by scheduling a map-reduce job module 50. The map-reduce job module 50 may initiate a map step that divides the job into smaller sub-jobs (e.g., map tasks) and multiple reduce steps that consume the output from the sub-jobs and aggregate results to generate the index information 26. Similarly, the scheduler module 48 may generate a mini-index 24 by scheduling a map-reduce job module 50 for execution on the HBase/Hadoop Node 49. The generation of the mini-index 24 may include a map step but not, according to one embodiment, a reduce step. Accordingly, each mini-index 24 may be associated with events that arrive from the event queues 38 during a particular period of time and is associated with one or more full-indexes 22.
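The map and reduce steps that turn a snapshot of the items table into an inverted index can be shown in miniature. This is a sketch under stated assumptions: the whitespace tokenization and single in-process "reduce" stand in for the Hadoop map tasks and reduce steps the patent describes, and the function names are invented for illustration.

```python
from collections import defaultdict

def map_step(item_id, text):
    """Map task: emit (word, (item_id, position)) pairs for one item."""
    for position, word in enumerate(text.lower().split(), start=1):
        yield word, (item_id, position)

def reduce_step(mapped_pairs):
    """Reduce step: aggregate emitted pairs into posting lists."""
    postings = defaultdict(list)
    for word, posting in mapped_pairs:
        postings[word].append(posting)
    return dict(postings)

# Snapshot of a tiny items table (the documents X, Y, Z from the example above).
snapshot = {"X": "apple orange banana", "Y": "apple orange", "Z": "apple"}

pairs = [p for item_id, text in sorted(snapshot.items())
         for p in map_step(item_id, text)]
inverted_index = reduce_step(pairs)
print(inverted_index["apple"])  # [('X', 1), ('Y', 1), ('Z', 1)]
```

A mini-index, which per the text may skip the reduce step, would simply be the raw mapped pairs for the items changed in one five-minute window.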
Each index 22, 24 (e.g., full and mini) may include bill of material (BOM) information which describes the content of the index information 26. The full-index 22 may include full-index BOM information 54 and the mini-index 24 may include mini-index BOM information 56. The index information 26 may include the inverted index 32 and document information 34, as previously described. - At operation E, each of the
query node servers 30 may receive the full-index 22 and the associated mini-indexes 24. The query node servers 30 may be comprised of a search grid that is arranged in columns of query node servers 30, as described later in this document. Each column of query node servers 30 may be utilized to manage a range of the documents 16, as previously mentioned. The index information 26 may be stored in memory of the query node servers 30 and in the database 28 connected to the query node servers 30. The index information 26 may be updated with the full-index 22 responsive to its arrival at the query node servers 30. Further, the index information 26 may be updated with the mini-index 24 responsive to its arrival at the query node servers 30. The index information 26 is generally updated in sequential order. For example, the index information 26 is generally updated at the query node server 30 in the order in which the full-index 22 and the mini-indexes 24 are generated. To this end, the full-index 22 may be associated with full-index BOM information 54 and the mini-index 24 may be associated with mini-index BOM information 56 that are utilized by the query node server 30 to manage the update of the index information 26. In one embodiment a map-reduce job module 50 may include sub-jobs that execute on the HBase/Hadoop node 49 to generate inverted indices in the form of region sub-indices (not shown) for the part of the region associated with the region server (HBase). The sub-jobs may further merge or stitch the multiple region sub-indices together for the region. - At operation F, a second user who operates the
client machine 33 may enter a query that may be communicated over a network (e.g., Internet) via front-end servers 58 and back-end servers 60 to be received by the query node servers 30, which may be divided into two layers. The two layers may include an aggregation layer and a query execution layer. The aggregation layer may include a query node server 30 that includes a query engine 62 (e.g., query module) that receives the query and, in turn, communicates the query to multiple query engines 62 that respectively execute in the execution layer in multiple query node servers 30 that correspond to the columns. The query engines 62 in the query execution layer may, in turn, respectively apply the same query, in parallel, against the respective index information 26 that was generated for a range of document identifiers (e.g., column) to identify search results (e.g., document 16) in parallel. Finally, the query engines 62, at each query node server 30 in the query execution layer, may communicate their respective partial search results to the query engine 62 in the aggregation layer, which aggregates the multiple sets of partial search results to form a search result for the entire index information 26 and communicates the search result over the network to the second user. -
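Operation F's two-layer arrangement is essentially a scatter-gather: the aggregation node fans the query out to one node per column and merges the partial results. A minimal sketch follows; the per-column index shape, the `(score, item_id)` result format, and the score-based merge are illustrative assumptions, not details from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

# Each column's index covers its own range of item identifiers (assumption:
# a partial result is a list of (score, item_id) pairs, best score highest).
COLUMN_INDEXES = [
    {"apple": [(0.9, 3), (0.4, 7)]},      # column 0: items 0-99
    {"apple": [(0.8, 120)], "pear": []},  # column 1: items 100-199
]

def execute_on_column(index, keyword):
    """Query execution layer: apply the query against one column's index."""
    return index.get(keyword, [])

def aggregate(keyword, top_n=3):
    """Aggregation layer: scatter the query, gather and rank the partials."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda ix: execute_on_column(ix, keyword),
                            COLUMN_INDEXES)
    merged = [hit for partial in partials for hit in partial]
    return sorted(merged, reverse=True)[:top_n]  # best scores first

print(aggregate("apple"))  # [(0.9, 3), (0.8, 120), (0.4, 7)]
```

The thread pool mirrors the "same query, in parallel" behavior: every column is searched concurrently, and only the small partial result lists travel back to the aggregation layer.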
FIG. 2A is a block diagram that illustrates an items table 21, according to an embodiment. The items table 21 may be stored in a database 20 (shown in FIG. 1) that is time-stamped. The items table 21 may include multiple entries of item information 80. According to one embodiment, an entry of item information 80 may be in the form of a document, a listing that describes an item or service that is for sale on a network-based marketplace, or some other unit of information. The item information 80 may be associated with a time-stamp 81. The time-stamp 81 stores a time the item information 80 was most recently added, deleted, or modified. -
FIG. 2B is a block diagram that illustrates item information 80, according to an embodiment. The item information 80 may include fields that describe the item (e.g., document, product, service). According to one embodiment, the fields may include a title 82 that includes alphanumeric text, a description 84 that includes alphanumeric text, a picture 86 of the item, and an item identifier 88 (e.g., 64 bit) that uniquely identifies the item information 80 from other entries in the items table 21. Each of the fields may be associated with a time-stamp 81. The time-stamp 81 stores a time the field was most recently added, deleted, or modified. -
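The per-field time-stamps 81 of FIGS. 2A-2B can be sketched as a record that stamps each field on modification. The class layout, field names beyond those listed, and the injected clock are illustrative assumptions for testability, not the patent's design.

```python
import time

class ItemInformation:
    """An items-table entry whose fields each carry a time-stamp 81."""
    FIELDS = ("title", "description", "picture")

    def __init__(self, item_id, clock=time.time):
        self.item_id = item_id  # 64-bit item identifier 88
        self._clock = clock     # injectable for testing (assumption)
        self._values = {}
        self._stamps = {}       # field -> time of last add/delete/modify

    def set(self, field, value):
        if field not in self.FIELDS:
            raise KeyError(field)
        self._values[field] = value
        self._stamps[field] = self._clock()  # record modification time

    def stamp(self, field):
        return self._stamps[field]

item = ItemInformation(item_id=42, clock=lambda: 1000.0)
item.set("title", "iPod Nano")
print(item.stamp("title"))  # 1000.0
```

Field-level stamps are what let the event manager modules decide, per changed field, which priority queue an update event belongs in.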
FIG. 3A is a block diagram illustrating an items table 21 in association with regions 90 (e.g., R1-RM), according to an embodiment. The items table 21 may be logically divided into regions 90. Each region 90 is a logical construct that corresponds to a predetermined number of items (e.g., item information 80, documents, etc.) in the items table 21 that utilize a particular range of item identifiers. Segmentation of the items table 21 into regions 90 may facilitate an efficient generation of the index information 26. For example, in one embodiment, the regions 90 may be associated with map tasks that may be executed with multiple HBase/Hadoop Nodes 49 (e.g., machines) to process the items in the respective regions 90 to generate the index information 26. The number of regions 90 and HBase/Hadoop Nodes 49 may be scaled. In some embodiments, the regions 90 may further be divided into sub-regions that may be associated with sub-tasks that may be utilized to parallel-process the items in the region 90. -
FIG. 3B is a block diagram illustrating regions 90 in association with a column 98, a column of query node servers 94, and a grid 92 of query node servers 30, according to an embodiment. The grid 92 of servers is comprised of query node servers 30 (e.g., QN) that are arranged in query node columns 94 and query node rows 96. The grid 92 may be utilized to process a query by applying the query to index information 26 (not shown) that is stored at each of the query node servers 30. It may be recalled that each region 90 is a logical construct that corresponds to a predetermined number of items (e.g., item information 80, documents, etc.) that utilize a particular range of item identifiers in the items table 21. FIG. 3B further illustrates, according to an embodiment, the regions 90 (e.g., R1-RM) that respectively correspond to columns 98 (COL-1-COL-N) that respectively correspond to query node columns 94. The column 98 is a logical construct that corresponds to a predetermined number of items (e.g., item information 80, documents, etc.) that utilize a particular range of item identifiers in the items table 21. Segmentation of the grid 92 into columns facilitates efficient processing of a query. For example, a query (e.g., Ipod Nano) may be processed by a single query node server 30 in each query node column 94 of the grid 92, in parallel, to generate search results that are subsequently aggregated together to form the search results. The column 98 may be identified with a column identifier. The query node columns 94 and the query node rows 96 may be independently scaled. The query node rows 96 may be increased to maximize throughput in processing a query and decreased to minimize the resources utilized to process the query. The query node columns 94 may be increased to accommodate an increase in the size of the items table 21 and decreased to accommodate a decrease in the size of the items table 21. -
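The routing implied by the grid can be sketched in two small functions: map an item identifier to the column 98 that owns its range, and pick one query node (row) per column to serve a given query. The contiguous fixed-width ranges and the round-robin row choice are illustrative assumptions; the patent only requires that columns own identifier ranges and rows replicate them.

```python
ITEMS_PER_COLUMN = 100   # width of each column's identifier range (assumption)
NUM_COLUMNS = 4          # COL-1 .. COL-4
NUM_ROWS = 3             # query node rows replicating each column's index

def column_for_item(item_id):
    """Route an item identifier to the column 98 owning its range."""
    return (item_id // ITEMS_PER_COLUMN) % NUM_COLUMNS

def nodes_for_query(query_seq):
    """Pick one query node (row) per column for this query, round-robin."""
    row = query_seq % NUM_ROWS
    return [(col, row) for col in range(NUM_COLUMNS)]

print(column_for_item(250))  # 2
print(nodes_for_query(7))    # [(0, 1), (1, 1), (2, 1), (3, 1)]
```

This also shows why the two dimensions scale independently: adding rows only changes `NUM_ROWS` (more throughput per query), while adding columns only changes `NUM_COLUMNS` (more identifier ranges for a larger items table).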
FIG. 4 is a block diagram illustrating a time-line 100, according to an embodiment, to generate a full-index 22 and a mini-index 24. The time-line 100 moves from left to right. The down arrows correspond to events associated with the generation and deployment of the full-index 22. The up arrows correspond to events associated with the generation of the mini-index 24. -
Callout 102 corresponds to a full snapshot (1) of the items table 21 and callout 104 corresponds to a full deployment of the full snapshot (1). The full snapshot may capture the entire contents of the items table 21 at an instant in time. Further, callout 106 corresponds to a full snapshot (2) that occurs later in time and callout 108 corresponds to a full deployment of the full snapshot (2). The full snapshot (1) and the full snapshot (2) may be utilized to respectively generate the full-index 22 (1) and the full-index 22 (2). -
Callout 102 corresponds to a start-time of a delta snapshot (1) of the items table 21 and callout 110 corresponds to an end-time of the delta snapshot (1). The delta snapshot may capture the changes to the items table 21 that are subsequent to the previous delta snapshot. For example, subsequent to a prior delta snapshot, an entry of item information 80 may be added to the items table 21, an entry of item information 80 may be removed from the items table 21, or an existing item information 80 entry may be modified. These changes are captured with the delta snapshot. Sequential delta snapshots are illustrated including callout 112, which corresponds to a start-time of a delta snapshot (7) of the items table 21, and callout 114, which corresponds to an end-time of the delta snapshot (7). The successive delta snapshots may be utilized to generate the mini-indexes 24 (e.g., mini-index 24 (1), mini-index 24 (2), mini-index 24 (3), etc.). -
- The
index information 26 at thequery node servers 30 may be updated with the full-indexes 22 and the mini-indexes 24 in an order that is sequential. For example, theindex information 26 may be updated based on the order in which the full-index 22 and the mini-indexes 24 are generated and communicated to thequery node servers 30. Further, the mini-indexes 24 may arrive out of sequence at thequery node servers 30. Accordingly, each of thequery node servers 30 may utilizecurrent BOM information 64 at thequery node servers 30, a full-index BOM information 54 associated with the full-index 22, and themini-index BOM information 56 associated with the mini-index 24 to ensure the update is performed in sequential order. In some embodiments a delta snapshot may be skipped if explicitly identified. Further, it will be appreciated that thesame index information 26 at thequery node server 30 may be generated by combining different full and delta snapshots. For example, theindex information 26 may be generated based on the full snapshot associated with the full-index 22 (1) and the delta snapshots respectively associated with the mini-indexes 24 (1-9) or the full-index 22 (2) and the delta snapshots respectively associated with the mini-indexes 24 (7-9). Other equivalent combinations may be formed. For example, theindex information 26 may be generated based on the full snapshot associated with the full-index 22 (1) and the delta snapshots respectively associated with the mini-indexes 24 (1-10) or the full snapshot associated with the full-index 22 (2) and the delta snapshots respectively associated with the mini-indexes 24 (7-10), etc. -
FIG. 5A is a block diagram illustrating index information components 120, according to an embodiment. The index information component 120 is an abstraction that includes the full-index 22 and the mini-index 24. -
FIG. 5B is a block diagram illustrating a full-index 22, according to an embodiment. The full-index 22 is based on a snapshot, at an instant in time, of a set of item information 80 which is identified in the items table 21 by a range of item identifiers 88 (e.g., column). The full-index 22 may include full-index BOM information 54 and section information 121. The section information 121 may include primary key information 122, index information 26, and index properties information 128. The index information 26 is an abstraction of the inverted index 32 and document information 34 (e.g., column group information). The document information 34 is a snapshot, at an instant in time, of a set of item information 80 which is identified in the items table 21 by a range of item identifiers 88 (e.g., column). The primary key information 122 may include an item identifier (e.g., 64-bit identifier) for the items in the column 98 and an internal logical item identifier for each of the items in the column 98. The inverted index 32 may include a posting list for the column 98 that utilizes internal logical item identifiers. The document information 34 may include an array of item information 80 for the items in the column 98 that may be accessed according to the internal logical item identifier. The inverted index 32 and the document information 34 may use internal logical item identifiers, rather than the full 64-bit item identifier, to identify item information 80 in order to reduce space requirements. The primary key information 122 may be utilized to perform mapping operations. For example, the primary key information 122 may be utilized to map an internal logical item identifier to the corresponding 64-bit item identifier. The reverse operation may also be performed. The index properties information 128 may include statistical information that is gathered while the full-index 22 is being generated. -
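The space-saving role of the primary key information 122 can be illustrated with a small sketch. The class and field names below are assumptions for illustration, not the patent's implementation: posting lists and the document array use compact internal logical identifiers, and the primary key mapping converts them back to full 64-bit item identifiers.

```python
class FullIndexSection:
    """Sketch of one column's section: primary key mapping, documents, inverted index."""
    def __init__(self, items):               # items: {64-bit item identifier: text}
        self.primary_key = sorted(items)     # internal logical id -> 64-bit id
        self.documents = [items[i] for i in self.primary_key]   # document information
        self.inverted = {}                   # token -> posting list of logical ids
        for logical, text in enumerate(self.documents):
            for token in text.lower().split():
                self.inverted.setdefault(token, []).append(logical)

    def lookup(self, token):
        # map the compact posting-list entries back to full 64-bit item identifiers
        return [self.primary_key[lid] for lid in self.inverted.get(token, [])]
```

The posting lists store small array offsets instead of 64-bit identifiers, which is the space saving the paragraph above describes.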
FIG. 5C is a block diagram illustrating a mini-index 24, according to an embodiment. The mini-index 24 is based on a snapshot of changes, during a period of time, to a set of item information 80 which is identified in the items table 21 by a range of item identifiers 88 (e.g., column). The mini-index 24 may include the mini-index BOM information 56 and section information 121. The section information 121 may include the same sections as the full-index 22 and a delete information 130 section. The document information 34 is a snapshot of changes, during a period of time, to a set of item information 80 which is identified in the items table 21 by a range of item identifiers 88 (e.g., column). The inverted index 32 is a posting list that may enable access to the document information 34. The delete information 130 may describe items (e.g., item information 80) that were deleted subsequent to the generation of the previous mini-index 24. -
FIG. 6A is a block diagram illustrating current BOM information 64, according to an embodiment. The current BOM information 64 may be stored at a query node server 30 and utilized to manage the updating of the index information 26 with the full-indexes 22 and the mini-indexes 24. The current BOM information 64 may store a current full-index identifier 150 that identifies the most recently updated full-index 22, a current mini-index identifier 152 that identifies the most recently updated mini-index 24, and mini-index storage information 154 that stores mini-indexes 24 that have arrived at the query node server 30 but are not yet merged with the index information 26 presently being utilized by the query node server 30. Generally, as previously described, the index information 26 in the query node server 30 is updated with mini-indexes 24 in sequential order based on the mini-index identifier 162. Mini-indexes 24 that arrive at the query node server 30 out of sequence may be stored for subsequent merger. For example, the current full-index identifier 150 indicates the most recently updated full-index 22 is identified with a full-index identifier 156 of “1,” the current mini-index identifier 152 indicates the full-index 22 has not been updated with a mini-index 24 (e.g., NULL), and the mini-index storage information 154 is illustrated as storing mini-indexes 24 “2,” “3,” and “4,” indicating that these mini-indexes 24 have arrived at the query node server 30 out of sequence (e.g., mini-index 24 “1” is missing) and are not yet merged into the index information 26. Accordingly, the arrival of the mini-index 24 with the mini-index identifier 162 of “1” may result in the sequential merger of the set of mini-indexes 24 with mini-index identifiers 162 of “1,” “2,” “3,” and “4.” -
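The out-of-sequence buffering described above can be sketched as follows. The class shape and the `merge` callback are assumptions for illustration; the behavior shown is the one in the example: mini-indexes "2," "3," and "4" are held until "1" arrives, then the whole contiguous run is merged in order.

```python
# Sketch (hypothetical structure) of current BOM bookkeeping at a query node:
# mini-indexes that arrive out of sequence are stored until the gap is filled,
# then merged in sequential order.
class CurrentBom:
    def __init__(self, full_index_id):
        self.full_index_id = full_index_id
        self.current_mini_id = None          # NULL: no mini-index merged yet
        self.stored = {}                     # mini-index id -> mini-index

    def receive(self, mini_id, mini_index, merge):
        self.stored[mini_id] = mini_index
        expected = 1 if self.current_mini_id is None else self.current_mini_id + 1
        while expected in self.stored:       # merge any now-contiguous run
            merge(self.stored.pop(expected))
            self.current_mini_id = expected
            expected += 1
```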
FIG. 6B is a block diagram illustrating full-index BOM information 54, according to an embodiment. The full-index BOM information 54 may include a full-index identifier 156 that identifies the full-index 22 and a full-index version identifier 158 that identifies the version of the full-index 22. For example, the full-index BOM information 54 is illustrated with a full-index identifier 156 of “1” and a full-index version identifier 158 of “0.01.” -
FIG. 6C is a block diagram illustrating mini-index BOM information 160, according to an embodiment. The mini-index BOM information 160 may include a mini-index identifier 162 that identifies the mini-index 24, a mini-index version identifier 164 that identifies the version of the mini-index 24, compatible full-index identifiers 166 that identify the full-indexes 22 that are compatible with the present mini-index 24, sequencing information 168 that identifies the sequence of mini-indexes 24 that were generated prior to and inclusive of the present mini-index 24, and skip information 170 that identifies mini-indexes 24 that may be skipped. For example, the mini-index BOM information 160 is illustrated with a mini-index identifier 162 of “6,” a mini-index version identifier 164 of “0.02,” compatible full-index identifiers 166 of “1 and 2,” sequencing information 168 of “1, 2, 3, 4, 5 and 6,” and skip information 170 of “5.” Accordingly, the index information 26 in the query node server 30 may be updated with the most recently arrived mini-index 24 (e.g., mini-index “6”) provided that the index information 26 was previously updated with the full-index 22 “1 or 2.” Further, the index information 26 in the query node server 30 may be updated with the mini-index 24 (e.g., mini-index “6”) without updating the index information 26 with the mini-index 24 that is identified with the mini-index identifier 162 of “5” because the skip information 170 identifies the mini-index 24 “5” as being skipped. -
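The checks implied by the mini-index BOM fields can be sketched as follows. The field names and the exact decision rules are assumptions drawn from the example above ("already merged," compatibility with the installed full-index, and the skip information), not a definitive implementation.

```python
def mini_index_valid(bom, current_full_id, current_mini_id):
    """Return True if the arriving mini-index may be merged at all."""
    if bom["mini_index_id"] == current_mini_id:
        return False          # already merged into the index information
    if current_full_id not in bom["compatible_full_ids"]:
        return False          # not compatible with the installed full-index
    return True

def required_predecessors(bom, current_mini_id):
    """Mini-indexes that must be merged first, honoring the skip information."""
    start = 1 if current_mini_id is None else current_mini_id + 1
    return [m for m in range(start, bom["mini_index_id"])
            if m not in bom["skip"]]
```

With mini-index "6" arriving, skip information "5," and mini-indexes merged through "4," no predecessors remain, so "6" may be merged immediately.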
FIG. 7 is a block diagram illustrating a method 300, according to an embodiment, to build and utilize a search index. The method 300 may commence at operation 302 with the information storage and retrieval platform 11 receiving information for an item from a client machine 12. Receipt of the information may cause the HBase master server 42 to add item information 80 (e.g., one entry) to the items table 21. Merely for example, the item information 80 may include a title 82, a description 84, and a picture 86 of a book that is being offered for sale by a seller on the information storage and retrieval platform 11. The item information 80, the title 82, the description 84, and the picture 86 are stored in the items table 21 with a time-stamp that chronicles their respective times of storage. - At
operation 304, the HBase/Hadoop Cluster 44 may include a scheduler module 48 that periodically generates/builds the index information components 120, including the full-index 22 or the mini-index 24. The scheduler module 48 may periodically generate the index information component 120 by scheduling a map-reduce job module 50 that initiates jobs that execute in a map-reduce framework. The map-reduce job module 50 may schedule one set of jobs to generate the full-index 22 and another set of jobs to generate the mini-index 24. The building of the full-index 22 and the mini-index 24 may occur in real time and in parallel while the information storage and retrieval platform 11 remains operational. For example, the scheduler module 48 may schedule the generation of the full-index 22 twice in a twenty-four hour period and the generation of mini-indexes 24 every five minutes. The scheduling and execution of jobs is described more fully in method 400 of FIG. 8A. - At
operation 306, the index distribution module 52 may communicate the index information component 120 to the appropriate query node servers 30. For example, the index distribution module 52 may communicate the full-index 22 to the appropriate column 94 of query node servers 30 in the grid 92 of query node servers 30 responsive to the build of the full-index 22 being completed. Also for example, the index distribution module 52 may communicate the mini-index 24 to the appropriate column 94 of query node servers 30 in the grid 92 of query node servers 30 responsive to the build of the mini-index 24 being completed. - At
operation 308, the query node servers 30 in the query node column 94 may update the index information 26 responsive to receipt of the index information component 120. The query node server 30 may update the index information 26 with the full-index 22 by restarting the query node server 30, as described more fully in method 450 of FIG. 8B. Also for example, the query node server 30 may update the index information 26 with the mini-index 24, as described more fully in method 470 of FIG. 8C. - At
operation 310, the information storage and retrieval platform 11 may receive a search query, over a network, from a client machine 33 and utilize the index information 26 in the grid 92 of query node servers 30 to identify search results that are communicated back to the client machine 33. -
FIG. 8A is a block diagram illustrating a method 400, according to an embodiment, to generate an index information component 120. The method 400 may execute in a loop without end. The method 400 may commence at operation 402 with the scheduler module 48 identifying commencement of the next time increment and initiating execution of the map-reduce job module 50. At decision operation 404, the map-reduce job module 50 may identify whether a full-index 22 is scheduled for generation/build. If a full-index 22 is scheduled for generation/build, then the map-reduce job module 50 may sequentially execute the full-index section job (operation 406), the merger job (operation 408), the index packing job (operation 410), and the transport packing job (operation 412). The respective jobs may generate output that is consumed by the next job in the sequence until the transport packing job communicates the full-index 22 to the appropriate query node column 94 of query node servers 30 in the grid 92 of query node servers 30. The execution of jobs is described more fully in the data flow 550 of FIG. 9A. - At
decision operation 414, the map-reduce job module 50 may identify whether a mini-index 24 is scheduled for generation/build. If a mini-index 24 is scheduled for generation/build, then the map-reduce job module 50 may sequentially execute the mini-index section job (operation 416) and the transport packing job (operation 412). The transport packing job may communicate the mini-index 24 to the appropriate query node column 94 of query node servers 30 in the grid 92 of query node servers 30. The execution of jobs is described more fully in a data flow 570 of FIG. 9B. -
FIG. 8B is a block diagram illustrating a method 450 to update index information 26 at a query node server 30 based on a full-index 22, according to an embodiment. The method 450 commences at operation 453 with the query engine 62, at the query node server 30, receiving the full-index 22. At operation 454, the query engine 62 may identify whether the full-index 22 is valid (e.g., well formed). Recall that the full-index 22 may include full-index BOM information 54 that includes a full-index identifier 156 and a full-index version identifier 158. If the query engine 62 identifies that the full-index identifier 156 is the same as the current full-index identifier 150, then the full-index 22 may be identified as not valid (e.g., the full-index is already installed). Further, if the query engine 62 identifies that the full-index version identifier 158 is not within a predetermined range, then the full-index 22 may be identified as not valid (e.g., the full-index version identifier 158 is not well formed). If the full-index 22 is identified as not valid, then processing ends. Otherwise, processing continues at operation 456. At operation 456, the query engine 62 may update the current full-index identifier 150 in the current BOM information 64 with the full-index identifier 156 in the full-index BOM information 54. At operation 458, the query engine 62 may restart the query node server 30 to identify and initialize the index information 26 in the query node server 30 with the full-index 22. For example, the query node server 30 may utilize the full-index identifier 156 in the full-index BOM information 54 to identify the appropriate full-index 22. -
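The validity check in method 450 can be sketched as below. The dictionary keys and the version range parameters are assumptions for illustration; the two rejection cases mirror the ones the paragraph names (already installed, and version identifier outside a predetermined range).

```python
def full_index_valid(full_bom, current_full_id, min_version, max_version):
    """Sketch of the full-index validity check at a query node server."""
    if full_bom["full_index_id"] == current_full_id:
        return False                     # the full-index is already installed
    if not (min_version <= full_bom["version"] <= max_version):
        return False                     # version identifier is not well formed
    return True
```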
FIG. 8C is a block diagram illustrating a method 470 to update index information 26 in a query node server 30 based on a mini-index 24, according to an embodiment. The method 470 may commence at operation 472 with the query engine 62, at the query node server 30, receiving the mini-index 24. At decision operation 474, the query engine 62 may identify whether the mini-index 24 is valid. For example, if the query engine 62 identifies that the mini-index identifier 162 is the same as the current mini-index identifier 152, then the mini-index 24 may be identified as not valid (e.g., the mini-index 24 is already merged into the index information 26). Further, if the query engine 62 identifies that the mini-index version identifier 164 is not within a predetermined range, then the mini-index 24 may be identified as not valid (e.g., the mini-index version identifier 164 is not well formed). Further, if the query engine 62 identifies that the compatible full-index identifiers 166 do not include at least one full-index identifier 156 that matches the current full-index identifier 150, then the mini-index 24 may be identified as not valid (e.g., the mini-index 24 is not compatible with the full-index 22 utilized to build the index information 26 in the query node server 30). - At
decision operation 478, the query engine 62 may identify whether the received mini-index 24 is identified with a mini-index identifier 162 that identifies the next expected mini-index 24. For example, the query engine 62 may identify whether the mini-index identifier 162 in the mini-index BOM information 56 is equal to the current mini-index identifier 152 plus 1. If the received mini-index 24 is the next in sequence, then processing continues at operation 490. Otherwise, processing continues at operation 480. At operation 480, the query engine 62 may identify whether mini-indexes 24 may be skipped. For example, the query engine 62 may read the skip information 170 in the mini-index BOM information 56 included in the mini-index 24. At operation 482, the query engine 62 may identify whether any mini-indexes 24 have been stored as mini-index storage information 154. At decision operation 484, the query engine 62 may determine whether the update of the index information 26 in the query node server 30 may be performed based on the skip information 170 and the identified stored mini-indexes 24. If the update may be performed, then processing continues at operation 490. Otherwise, processing continues at operation 488. At operation 488, the query engine 62 may store the mini-index 24 that was most recently received as mini-index storage information 154. At operation 490, the query engine 62 may update, in sequential order, the index information 26 in the query node server 30 with the mini-indexes 24 that were identified. For example, the query engine 62 may sequentially update the index information 26 with the one or more mini-indexes 24 stored as mini-index storage information 154 and the mini-index 24 that was most recently received, while skipping any mini-indexes 24 that were identified in the skip information 170. -
FIG. 9A is a block diagram illustrating the data flow 550, according to an embodiment, to generate a full-index 22. The data flow 550 moves from left to right, chronologically, as directed by a scheduler module 48 (not shown). The scheduler module 48 may periodically initiate the map-reduce job module 50 that causes the execution of a set of jobs illustrated at the top of the data flow 550. The set of jobs may include a full-index section job 202, merger jobs 204, an index packing job 206, and a transport job 208. The full-index section job 202 and the merger jobs 204 are components of a map-reduce framework, as known in the art. - The full-index section job 202 may initiate map tasks 552 (e.g., M1, M2, M3, MN), one for each of the regions 90 of the items table 21, as previously described. The map tasks 552 may take full snapshots of the item information 80 corresponding to the item identifiers 88 in the associated region 90. To this end, the map tasks 552 may read item information 80 (e.g., describing items) from the items table 21, according to regions 90, and generate token information 554 and other information, both being utilized to generate the section information 121. The other information may be communicated directly to the reducers 556 (e.g., “R1,” “R2,” “R3,” “RN”). The token information 554 may be communicated to a partitioner 555 which, in turn, partitions the token information 554 for consumption by the reducers 556 (e.g., “R1,” “R2,” “R3,” “RN”). The partitioner 555 may partition the token information 554 based on the contents of the token information 554, including a token element 211, an item identifier 88, and the column identifier. For example, token information 210 may be embodied as follows: -
item 100,column 1.” - Responsive to receiving the token information 210, the
partitioner 555 may identify a particular reducer 556 (e.g., “R1,” “R2,” “R3,” “RN”) based on a hash value that is generated from the token element 211 and the column identifier and send the token information 210 to the identifiedreducer 556. Themerger jobs 204 may initiate thereducers 556 andmap tasks 560 to process thetoken information 554 and other information to generate the full-index 22. Thereducers 556 andmap tasks 560 may execute on the HBase/Hadoop nodes 49. It will be appreciated that processing time to produce the full-index 22 may be minimized by increasing the number ofmap tasks 552,reducers 556,map tasks 560 and HBase/Hadoop nodes 49. Further, resources may be economized by decreasing the same. Each of thereducers 556 may segregate the receivedtoken information 554 according to columns 98 (e.g., “COLUMN 1,” “COLUMN 2,” “COLUMN 3,” COLUMN N). For example, thetoken information 554 and other information for “Column 1” may be segregated asoutput 558 for “COLUMN 1.”Other output 558 may be segregated forother columns 98 in a similar manner. Recall that thecolumns 98 may correspond to aquery node column 94 ofquery node servers 30 in agrid 92 of query node servers 30 (not shown)) that utilize the full-index 22, once generated, to process a query. Thereducers 556 may organize thetoken information 554 and other information intooutput 558 according tocolumns 98 based on column identifiers and distributes theoutput 558 in accordance with thecolumns 98 to themap tasks 560. For example,FIG. 7A illustrates thereducer 556 identified as “R1” as receiving thetoken information 554 for allcolumns 98, generatingoutput 558 that is organized according to the columns “C1,” “C2,” “C3,” “CN” and distributing theoutput 558 for “C1” to themap task 560 “M1.” For clarity sake the other output 558 (e.g., “C2,” “C3,” and “CN”) is not illustrated as being distributed to the other map tasks 560 (e.g., “M1,” “M2,” “M3,” and “MN”). 
Further, the remaining reducers 556 (e.g., “R2,” “R3,” and “RN”) are also illustrated as distributing the output 558 for “C1” to the map task 560 “M1” but, again, for clarity's sake, the full data flow is not illustrated. Broadly, each reducer 556 may generate output 558 for all columns 98 and distribute the output 558 to the map tasks 560, according to columns 98. - The
map task 560 may receive the output 558 for a single column 98. The map task 560 may utilize the output 558 and the other information to generate the section information 121 (e.g., “S1,” “S2,” “S3,” and “SN”) for the particular column 98. - The
index packing job 206 may execute to generate the full-index 22. The index packing job 206 may generate the full-index 22 by packing the sections of the section information 121 together, generating the full-index BOM information 54, and packing the full-index 22. The index packing job 206 may pack the full-index 22 by packing the section information 121, the full-index BOM information 54, and the index properties information 128 into the full-index 22. - Finally, the
transport job 208 may execute to distribute the full-indexes 22, according to columns 94, to the grid 92 of query node servers 30. For example, the transport job 208 may execute to transport the full-index 22 for column 1 to each of the query node servers 30 in column 1 of the grid 92. In one embodiment, the distribution of the full-indexes 22 to the query node servers 30 may be over a network utilizing the index distribution module 52 based on BitTorrent, a peer-to-peer file sharing protocol. -
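The partitioning step in the data flow 550 above can be sketched as follows. The patent only states that a hash value is generated from the token element and the column identifier; the specific hash function (MD5) and key encoding below are assumptions for illustration.

```python
import hashlib

def pick_reducer(token_element, column_id, num_reducers):
    """Route token information to a reducer from (token element, column id)."""
    key = "%s:%s" % (token_element, column_id)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers
```

Because the hash is deterministic, every occurrence of the same token element for the same column lands on the same reducer, which is what lets each reducer assemble a complete posting list for its share of tokens.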
FIG. 9B is a block diagram illustrating the data flow 570 to generate a mini-index 24, according to an embodiment. The data flow 570 moves from left to right, chronologically, as directed by a scheduler module 48 that initiates execution of the map-reduce job module 50, which in turn initiates execution of a mini-index section job 252 and a transport job 208. The mini-index section job 252 is a component of a map-reduce framework, as is known in the art. - The
mini-index section job 252 may initiate map tasks 572 (e.g., “M1,” “M2,” “M3,” and “MN”), one for each column 98. The map tasks 572 may further correspond to two regions 90 of the items table 21, according to an embodiment. Other embodiments may utilize a different ratio of regions 90 to columns 94 to map tasks 572. The map tasks 572 may take a snapshot of changes to the items table 21 that have occurred between a start-time and an end-time. For example, the snapshot may record an addition of item information 80 (e.g., a new item), a deletion of item information 80, and a modification to existing item information 80 (e.g., field addition, field deletion, field modification). The map tasks 572 may further generate the mini-index 24. The map tasks 572 may generate the mini-index 24 by packing the sections of the section information 121 together, generating the mini-index BOM information 56, and packing the mini-index 24. The map tasks 572 may pack the mini-index 24 by packing the section information 121, the mini-index BOM information 56, and the index properties information 128 into the mini-index 24. - The
transport job 208 may execute to distribute the mini-indexes 24, according to columns 98, to the query node column 94 in the grid 92 of query node servers 30. For example, the transport job 208 may execute to transport the mini-index 24 for column 1 to the query node servers 30 (not shown) in column 1 of the grid 92 (not shown). In one embodiment, the distribution of the mini-indexes 24 to the query node servers 30 may be over a network utilizing the index distribution module 52 (not shown) based on BitTorrent, a peer-to-peer file sharing protocol. -
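The snapshot-of-changes step performed by the map tasks in data flow 570 can be sketched as below. The table representation (item id mapped to a time-stamped entry with a deletion flag) is an assumption for illustration; the items table stores entries with time-stamps, which is what makes a windowed change capture possible.

```python
def delta_snapshot(items_table, start_time, end_time):
    """Collect changes to the items table in the window (start_time, end_time]."""
    delta = {"upserts": {}, "deletes": []}
    for item_id, (item_info, timestamp, deleted) in items_table.items():
        if start_time < timestamp <= end_time:
            if deleted:
                delta["deletes"].append(item_id)      # goes to delete information
            else:
                delta["upserts"][item_id] = item_info  # addition or modification
    return delta
```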
FIG. 10A is a network diagram illustrating a system 600, according to an embodiment, to process a query 602 with a unified storage interface. The system 600 may embody the system 10 in FIG. 1 and, accordingly, the same or similar references have been used to indicate the same or similar features unless otherwise indicated. The system 600 may include a client machine 33 and an information storage and retrieval platform 11. Broadly, the information storage and retrieval platform 11 may receive the query 602, over a network (e.g., Internet) (not shown), from the client machine 33, generate a query container 606 that includes query information 604 that is based on the query 602, and process the query information 604 in an execution layer 622 and a storage layer 624 that utilize a unified storage interface to retrieve data from heterogeneous storage devices 625 (e.g., box, circle, rectangle) and communicate at least a portion of the data back over the network to the client machine 33. The unified storage interface hides interactions that are unique to the respective storage devices 625. For example, the unified storage interface may be embodied as an application programming interface that is utilized to retrieve data from the different storage devices 625. For example, the storage devices 625 may differ with respect to the instructions utilized to retrieve the data, the format in which the data is stored, the format in which the data is retrieved, and the location of the storage device 625 (e.g., local or remote). Examples of heterogeneous storage devices 625 may include a relational database that stores relational data as tuples, a directed acyclic word graph (DAWG) database that stores data as a set of strings arranged as a hierarchy of nodes connected by edges that may be traced without forming a loop, and the same two databases being accessed remotely over a network.
The unified storage interface may be accessed in the execution layer 622, which is common for all storage types, to enter the storage layer 624, which is organized according to storage type. Accordingly, operations unique to a particular storage device 625 are contained inside the storage layer 624, which is accessed via the execution layer 622 that, in turn, provides a generalized service to access all storage devices 625 for query processing clients (e.g., query engine 62). One benefit of decoupling the generalized operations from the storage device 625 specific operations is to simplify software development. For example, software that executes in the execution layer 622 may be modified independently of software that executes in the storage layer 624, minimizing the engineering resources needed to achieve interoperability and integration. The system 600 is now discussed more fully in detail. - At operation A, the information storage and
retrieval platform 11 may utilize search front-end servers 58 to receive the query 602 from the client machine 33. For example, the query may include the keywords “BLACK IPOD NANO ACCESSORIES.” The search front-end servers 58 may parse the query 602 to generate query information 604 and store the query information 604 in a query container 606. The query container 606 may contain multiple entries of query information 604, some being parsed from the same query 602 “BLACK IPOD NANO ACCESSORIES” and others being parsed from other queries (not shown). The query information 604 that is illustrated is for the query expression 608 “AND (IPOD, NANO)” being parsed from the example query, “BLACK IPOD NANO ACCESSORIES.” Other query information 604 is not illustrated. The query information 604 may include the query expression 608, output field information 610, sort field information 612, and a primary input table 614. The query expression 608, as described above, may be comprised of keywords that are parsed by the front-end server 58 from the query 602 that is received, and operators that either appear in the query 602 or are implied as being in the query 602. The output field information 610 may identify output fields to be included in the search results. For example, the output field information 610 may identify one or more fields of records (e.g., items, documents) that are included in the search results. The sort field information 612 may identify the one or more field(s) utilized to sort the search results and whether to sort in ascending or descending order. The primary input table 614 may identify an input table from which data is retrieved based on the query expression 608. At least a portion of the data may be returned to the client machine 33 as search results. - At
end servers 58 may communicate thequery container 606 to the search back-end servers 60. The search back-end servers 60 may process thequery information 604 in thequery container 606, as described later in this document. - At operation C, the search back-
end servers 60 may communicate thequery container 606 to aquery node server 30 in anaggregation layer 616 ofquery node servers 30. Thequery node server 30 may respond to receipt of thequery container 606 by invoking a query engine 62 (not shown) to generate aquery expression tree 618 based on thequery expression 608 and store thequery expression tree 618 in thequery container 606. Further, thequery engine 62 may identify a singlequery node server 30 in each of thequery node columns 94 of thegrid 92 ofquery node servers 30 and communicate thequery container 606 to the identifiedquery node servers 30. Further, recall that each of thequery node columns 94 is dedicated to a particular range of documents (e.g., items) in the index information 19. Accordingly, thequery node server 30 in theaggregation layer 616 communicates thequery container 606 to onequery node server 30 in each of thequery node columns 94 in thegrid 92 to retrieve search results for the entire index information 19. - At operation D, the
query node servers 30 in respectivequery node columns 94 may receive thequery container 606 and process thequery information 604 entries in thequery container 606. For example, thequery node server 30 may process eachquery information 604 entry to build (e.g. generate) and execute aquery plan 626. To this end, onequery engine 62 from eachquery node column 94 invokes aquery plan builder 654 to build thequery plan 626 and further executes thequery plan 626. Thequery plan 626 may include acursor expression tree 628 that include expression nodes (not shown) that correspond to cursor objects of the query expression tree 618 (not shown). Thequery plan builder 654 may invoke expansion generators (not shown) that read the expression nodes of thequery expression tree 618 to generate the cursor objects of thecursor expression tree 628. The expansion generators may include a generic expansion generator (not shown) that executes in theexecution layer 622 to generate cursor objects and multiple specific expansion generators (not shown) that execute in the storage layer to generate storage cursor objects. The expression nodes directly correspond to the cursor objects (e.g., one-to-one correspondence). - The
query engine 62 executes the query plan 626. For example, the query engine 62 may execute cursor objects (not shown) and storage cursor objects (not shown) in the query plan 626. The query engine 62 may execute the storage cursor objects to retrieve data from a particular storage device 625. The query node server 30 may store the data that was retrieved in a table container 620 that may subsequently be communicated as search results via the aggregation layer 616 to the search back-end servers 60, to the search front-end servers 58, and to the client machine 33. Accordingly, operations that are unique to a particular storage device 625 are hidden within a storage layer that is accessible via an execution layer that is exposed to query processing clients (e.g., query engine 62), resulting in a unified storage interface. -
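The translation of a query expression tree into a cursor expression tree at operation D can be sketched as follows. The node shapes and the generator-lookup dictionary are assumptions for illustration; the sketch preserves the stated one-to-one correspondence by producing exactly one cursor object per expression node.

```python
def build_cursor_tree(expr_node, generators):
    """Walk the query expression tree bottom-up; each expression node yields
    exactly one cursor object (one-to-one correspondence)."""
    children = [build_cursor_tree(c, generators)
                for c in expr_node.get("children", [])]
    make_cursor = generators[expr_node["kind"]]   # generic or storage-specific
    return make_cursor(expr_node, children)
```

In the patent's terms, the `generators` mapping plays the role of the generic expansion generator in the execution layer and the specific expansion generators in the storage layer.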
FIG. 10B is a block diagram illustrating search back-end servers 60, according to an embodiment. The search back-end servers 60 may include search load balancers 650 and transformers 652. The search load balancers 650 may receive the query container 606 (not shown) from the search front-end servers 58 and communicate the query container 606 (not shown) to a transformer 652 to balance traffic. For example, the search load balancer 650 may communicate the query container 606 to the transformer 652 that is the least loaded. The transformer 652 performs an expansion function and a scatter/gather function for each entry of query information 604 in the query container 606. The expansion function expands the terms of the query expression 608 to widen the scope of the search result. For example, the query expression 608 “AND (IPOD, NANO)” may be expanded to capture plural forms as follows: “AND ((IPOD OR IPODS), NANO).” Other types of expansions may also be performed to capture synonyms, idioms, etc. Further, the transformer 652 may perform the scatter/gather function by iterating the search of the query expression 608. For example, consider a database of item information 80 describing multiple items that are for sale on a network-based marketplace with an auction process and/or a purchase process. A buyer who enters a search query may be interested in receiving search results that include matching item information 80 for both formats. Accordingly, the transformer 652 may generate the desired search result by initiating two searches in parallel. The first search may be for items that are offered for sale with an auction process. The second search may be for items that are offered for sale with a purchase process. Each of the searches would proceed as previously described in operation “C” of FIG. 10A. Further, the transformer 652 may receive and blend the results of the two searches.
For example, the transformer may receive the search results (e.g., item information 80) of the first search in a first table container 620 (e.g., auction process) and the search results of the second search in a second table container 620 (e.g., purchase process) and blend the two search results in a third table container 620 that is returned via the search front-end servers 58 to the client machine 33. The blending to form the third table container 620 may be according to a predetermined percentage, according to one embodiment. For example, the third table container 620 may comprise twenty percent of the item information 80 in the first table container 620 (e.g., auction process) and eighty percent of the item information 80 in the second table container 620 (e.g., purchase process). -
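The blending according to a predetermined percentage may be sketched as follows. The record shapes and the `blend` helper are illustrative assumptions; only the twenty/eighty split is taken from the example above.

```python
# Illustrative sketch of percentage-based blending of two result sets.

def blend(first, second, first_pct, total):
    """Fill a result of the given size with first_pct percent from the
    first table container and the remainder from the second."""
    take_first = round(total * first_pct / 100)
    return first[:take_first] + second[:total - take_first]

# Hypothetical contents of the first (auction) and second (purchase)
# table containers.
auction_results = [f"auction-item-{i}" for i in range(10)]
purchase_results = [f"purchase-item-{i}" for i in range(10)]

# Third table container: 20% auction records, 80% purchase records.
blended = blend(auction_results, purchase_results, first_pct=20, total=10)
```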
FIG. 10C is a block diagram illustrating a query node server 30, according to an embodiment. The query node server 30 may include a query engine 62 that executes a query plan builder 654 to build the query plan 626 and execute the query plan 626, as previously described. The query node server 30 may store index information 26, as previously described. Further, the query node server 30 may be coupled to a database 28 that is local and, over a network (e.g., the Internet), to a database 662 that is remote. The database 28 that is local may store the index information 26, a storage data dictionary 656, relational storage 658 (e.g., M-storage), and directed acyclic word graph (DAWG) storage 660. The data dictionary 656 may be utilized by the query plan builder 654 to identify the appropriate specific expansion generator based on the primary input table 614 in the query information 604. The relational storage 658 may be utilized to store data in a relational format. A relation is defined as a set of tuples that have the same attributes. A tuple represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. The DAWG storage 660 database stores data as a set of strings arranged as a hierarchy of nodes connected by edges that may be traced without forming a loop. The database 662 may be accessed over a network (e.g., the Internet) and may also store relational storage 658 and DAWG storage 664. Accordingly, the query node server 30 may provide access to four storage devices including the relational storage 658 (e.g., M-storage) accessed via the database 28 that is local, the DAWG storage 660 accessed via the database 28 that is local, the relational storage 658 (e.g., M-storage) accessed over the network via the database 662 that is remote, and the DAWG storage 664 accessed over the network via the database 662 that is remote. -
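The DAWG storage described above may be illustrated with the following simplified sketch. A true DAWG additionally merges equivalent suffixes so that equivalent nodes are shared; this sketch uses an unmerged trie solely to show strings stored as a loop-free hierarchy of nodes connected by edges, and all names are illustrative assumptions.

```python
# Simplified illustration of string storage as nodes connected by edges
# that may be traced without forming a loop (an unminimized trie; a DAWG
# would further merge equivalent suffix sub-trees).

class Node:
    def __init__(self):
        self.edges = {}        # character -> child Node
        self.terminal = False  # True if a stored string ends at this node

def insert(root, word):
    node = root
    for ch in word:
        node = node.edges.setdefault(ch, Node())
    node.terminal = True

def contains(root, word):
    node = root
    for ch in word:
        if ch not in node.edges:
            return False
        node = node.edges[ch]
    return node.terminal

root = Node()
for w in ("IPOD", "IPODS", "NANO"):
    insert(root, w)
```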
FIG. 10D is a block diagram illustrating a query expression tree 618, according to an embodiment. The query expression tree 618 may include expression nodes including an operator expression node 680 and a term expression node 684. The operator expression node 680 may be utilized to represent operators (e.g., AND) that are identified in the query expression 608 (not shown). The term expression node 684 may be utilized to represent terms (e.g., IPOD, NANO) that were identified in the query expression 608. -
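The query expression tree 618 of FIG. 10D may be modeled as follows. The class names mirror the expression node types described above but are illustrative assumptions, not part of the disclosed embodiment.

```python
# Illustrative model of a query expression tree: one operator expression
# node with an edge to each term expression node.

class OperatorExpressionNode:
    def __init__(self, operator, children):
        self.operator = operator
        self.children = children  # edges to child expression nodes

class TermExpressionNode:
    def __init__(self, term):
        self.term = term

def build_tree(operator, terms):
    """Build the tree for a flat expression such as "AND (IPOD, NANO)"."""
    return OperatorExpressionNode(operator, [TermExpressionNode(t) for t in terms])

tree = build_tree("AND", ["IPOD", "NANO"])
```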
FIG. 10E is a block diagram illustrating a cursor expression tree 628, according to an embodiment. The cursor expression tree 628 may include software components in the form of cursor objects. The cursor objects may be executed by the query engine 62 to retrieve data from a data storage device 625. A cursor is a movable marker or pointer that indicates a position. In the present example, the position of the cursor may correspond to a particular record in a database (e.g., relational storage, DAWG storage, etc.). The cursor expression tree 628 may include cursor objects 686 and storage cursor objects 688. The cursor object 686 may correspond to an operator expression node 680 (e.g., AND) in the query expression tree 618. The storage cursor object 688 may correspond to a term expression node 684 (e.g., IPOD, NANO) in the query expression tree 618. -
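The cooperation between cursor objects and storage cursor objects described above may be sketched as follows, with a small in-memory record store standing in for the data storage device 625. All names and record shapes are illustrative assumptions.

```python
# Illustrative sketch: storage cursor behavior retrieves the record ids
# matching one term, and the "AND" cursor behavior intersects two
# retrieved sets. The record store is a stand-in for real storage.

RECORDS = {
    1: "IPOD NANO 8GB",
    2: "IPOD SHUFFLE",
    3: "NANO CASE",
    4: "BLACK IPOD NANO",
}

def storage_cursor(term):
    """Retrieve ids of records containing the term (term expression node)."""
    return {rid for rid, text in RECORDS.items() if term in text.split()}

def and_cursor(left, right):
    """Combine two retrieved sets (operator expression node "AND")."""
    return left & right

# Records matching both "IPOD" and "NANO", stored as the results set.
table_container = sorted(and_cursor(storage_cursor("IPOD"), storage_cursor("NANO")))
```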
FIG. 10F is a block diagram illustrating software layers 690, according to an embodiment. The software layers 690 may include an execution layer 622 and a storage layer 624. The execution layer 622 may include a generic expansion generator 692 (e.g., a factory) that is utilized to generate cursor objects for the cursor expression tree 628. For example, the generic expansion generator 692 may generate a cursor object 686 that corresponds to an operator expression node 680 in the query expression tree 618. The storage layer 624 may include storage sub-layers 694 that correspond to types of data storage devices 625. The storage sub-layers 694 may include DAWG storage that is remote, DAWG storage that is local, relational storage that is remote, relational storage that is local, and other types of storage. Associated with each storage sub-layer 694 is a specific expansion generator 696 (e.g., a factory) that is utilized to generate storage cursor objects 688 for the cursor expression tree 628. For example, the specific expansion generator 696 may generate a storage cursor object 688 that corresponds to a term expression node 684 in the query expression tree 618. -
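The two-layer factory arrangement described above may be sketched as follows: a generic expansion generator in the execution layer produces cursor objects for operator nodes, while a specific expansion generator tied to one storage sub-layer produces storage cursor objects for term nodes. All class names and the dictionary-based node representation are illustrative assumptions.

```python
# Illustrative sketch of the generic (execution-layer) and specific
# (storage-layer) expansion generators acting as factories.

class CursorObject:
    def __init__(self, operator):
        self.operator = operator

class StorageCursorObject:
    def __init__(self, term, sub_layer):
        self.term = term
        self.sub_layer = sub_layer

class GenericExpansionGenerator:
    """Execution-layer factory: handles operator expression nodes only."""
    def generate(self, node):
        if node["kind"] == "operator":
            return CursorObject(node["value"])
        return None  # not an operator: defer to a specific generator

class SpecificExpansionGenerator:
    """Storage-layer factory tied to one storage sub-layer."""
    def __init__(self, sub_layer):
        self.sub_layer = sub_layer
    def generate(self, node):
        return StorageCursorObject(node["value"], self.sub_layer)

generic = GenericExpansionGenerator()
local_m = SpecificExpansionGenerator("M-STORAGE (local)")

cursor = generic.generate({"kind": "operator", "value": "AND"})
storage_cursor = local_m.generate({"kind": "term", "value": "IPOD"})
```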
FIG. 10G is a block diagram illustrating a storage data dictionary 698, according to an embodiment. The storage data dictionary 698 may be used to associate the primary input table 614 in the query information 604 with a storage sub-layer 694 that is associated with a specific expansion generator 696. For example, the primary input table "A" may be associated with the storage sub-layer 694 for relational storage (e.g., M-STORAGE) that is associated with the specific expansion generator 696 for relational storage (e.g., M-STORAGE) that is local. -
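The association performed by the storage data dictionary may be sketched as a pair of lookups: the primary input table keys a storage sub-layer, which in turn selects the specific expansion generator. The table names, sub-layer labels, and generator names are illustrative examples only.

```python
# Illustrative sketch of the storage data dictionary lookup chain.

STORAGE_DATA_DICTIONARY = {
    "A": "M-STORAGE (local)",
    "B": "DAWG (local)",
    "C": "M-STORAGE (remote)",
}

SPECIFIC_GENERATORS = {
    "M-STORAGE (local)": "m_storage_local_generator",
    "DAWG (local)": "dawg_local_generator",
    "M-STORAGE (remote)": "m_storage_remote_generator",
}

def generator_for(primary_input_table):
    """Map a primary input table to its specific expansion generator."""
    sub_layer = STORAGE_DATA_DICTIONARY[primary_input_table]
    return SPECIFIC_GENERATORS[sub_layer]
```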
FIG. 10H is a block diagram illustrating a storage cursor object 688, according to an embodiment. The storage cursor object 688 may include methods and memory for storage. For example, the storage cursor object 688 may include methods to set the current position (e.g., a database record), get the next likely position, get a generic value, and other methods (not shown). -
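The storage cursor object methods listed above may be sketched as follows. Backing the cursor with a sorted list of matching record positions is an assumption made purely for illustration; the method names paraphrase the methods named above.

```python
# Illustrative sketch of a storage cursor interface: set the current
# position, get the next likely position, and get the current value.

class StorageCursor:
    def __init__(self, positions):
        self.positions = sorted(positions)  # matching record ids, ascending
        self.current = None

    def set_position(self, position):
        """Set the current position (e.g., a database record id)."""
        self.current = position

    def next_likely_position(self):
        """Return the first matching position at or after the current one."""
        for p in self.positions:
            if self.current is None or p >= self.current:
                return p
        return None  # cursor exhausted

    def get_value(self):
        return self.current

cursor = StorageCursor([3, 7, 12])
cursor.set_position(5)
```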
FIG. 10I is a block diagram illustrating a method 700, according to an embodiment, to process a query with a unified storage interface. The method 700 may commence at operation 702 with a front-end server 58 at the information storage and retrieval platform 11 receiving a query from a client machine 33. For example, the query may include the keywords "BLACK IPOD NANO ACCESSORIES." The search front-end servers 58 may parse the query to generate query information 604, as described, and store the query information 604 in a query container 606. The query information 604 may include a query expression 608 that may be comprised of keywords and operators that either appear in the query or are implied as being in the query. For example, the query expression 608 may include "AND (IPOD, NANO)." The search front-end servers 58 may communicate the query container 606 to the search back-end servers 60, which process the query information 604 in the query container 606, as previously described, and communicate the query container 606 to a query node server 30 in an aggregation layer 616 of query node servers 30. - In one embodiment, the
transformer 652 may perform an expansion function for the query information 604 in the query container 606. For example, the query expression 608 "AND (IPOD, NANO)" may be expanded to capture plural forms, synonyms, idioms, etc. In one embodiment, the transformer 652 may further perform a scatter/gather function by iterating the search of the query expression 608 and blending the results. The transformer 652 may generate the desired search result by initiating two searches in parallel. The transformer 652 may initiate the first search by communicating the query container 606 to a first query node server 30 in the aggregation layer 616 of query node servers 30 to request item information 80 for items that are offered for sale with an auction process, and the second search by communicating the query container 606 to a second query node server 30 in the aggregation layer 616 of query node servers 30 to request item information 80 for items that are offered for sale with a purchase process, as previously described. The search results may be blended into a single search result, as previously described. - At
operation 704, the query node server 30 in the aggregation layer may utilize the query engine 62 to generate a query expression tree 618 for each query information 604 entry in the query container 606. For example, the query engine 62 may generate the query expression tree 618 based on the query expression 608 in the query information 604 and store the query expression tree 618 in the query container 606. The query expression tree 618 may include nodes representing expressions in the query expression 608 that are logically connected with edges. For example, the query expression tree 618 for the query expression "AND (IPOD, NANO)" may include an operator expression node 680 for "AND," a term expression node 684 for "IPOD," and a term expression node 684 for "NANO," where two edges lead away from the "AND" operator expression node 680, one leading to the "IPOD" term expression node 684 and the other leading to the "NANO" term expression node 684. Further, the query engine 62 may identify a single query node server 30 in each of the query node columns 94 of a grid 92 of query node servers 30 and communicate the query container 606 to the identified query node servers 30. - At
operation 706, the query node servers 30 in respective query node columns 94 may receive the query container 606 and invoke the query engine 62 to invoke the query plan builder 654. The query plan builder 654 may process each query information 604 entry in the query container 606 to build (e.g., generate) an associated query plan 626. The query plan builder 654 may build the query plan 626 to include a cursor expression tree 628 that includes cursor objects that correspond to expression nodes in the query expression tree 618, as described more fully in method 750 on FIG. 10J. - At
operation 708, the query engine 62 may execute the cursor expression tree 628 to retrieve data from a storage device 625. For example, the query engine 62 may execute a method in the storage cursor object 688 for "NANO" to retrieve records (e.g., item information 80) that include the string "NANO" from a storage device 625 (e.g., relational storage). Further, the query engine 62 may execute a method in the storage cursor object 688 for "IPOD" to retrieve records (e.g., item information 80) that include the string "IPOD" from a storage device 625 (e.g., relational storage). Finally, the query engine 62 may execute a method in the cursor object 686 for "AND" to "AND" the two sets of retrieved records and store the combined set in a table container 620 as results. - At
operation 710, the query node server 30 may communicate the table container 620 via the aggregation layer 616 to the search back-end servers 60 that, in turn, communicate the table container 620 to the search front-end servers 58 that, in turn, extract the search results from the table container 620 and communicate the search results to the client machine 33. -
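The scatter/gather variant described earlier, in which the transformer initiates two searches in parallel and gathers the results, may be sketched as follows. The search body is a stand-in for the processing of operations 702 through 710; the function names and result shapes are illustrative assumptions.

```python
# Illustrative sketch of scatter/gather: the same query is scattered to
# two parallel searches (auction format and purchase format) and the
# results are gathered for subsequent blending.

from concurrent.futures import ThreadPoolExecutor

def search(query_expression, sale_format):
    # Stand-in for one search proceeding through operations 702-710.
    return [f"{sale_format}: item matching {query_expression}"]

def scatter_gather(query_expression):
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(search, query_expression, "auction")
        second = pool.submit(search, query_expression, "purchase")
        return first.result() + second.result()

results = scatter_gather("AND (IPOD, NANO)")
```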
FIG. 10J is a block diagram illustrating a method 750, according to an embodiment, to generate a cursor expression tree 628. At operation 752, the query plan builder 654, at the query node server 30, may identify the next expression node in the query expression tree 618 as the current expression node. At decision operation 754, the query plan builder 654 may identify whether the current expression node corresponds to a software component. For example, the query plan builder 654 may invoke a generic expansion generator 692 (e.g., a factory) that executes in the execution layer 622 to identify and instantiate a cursor object that corresponds to the current expression node. If a software component was identified, then processing continues at operation 760. Otherwise, processing continues at operation 756. At operation 756, the query plan builder 654 may identify the appropriate specific expansion generator 696 (e.g., a factory) based on the primary input table 614 in the query information 604. For example, the query plan builder 654 may utilize the storage data dictionary 698 to associate the primary input table 614 in the query information 604 with the appropriate storage sub-layer 694 that is associated with a specific expansion generator 696. At operation 758, the query plan builder 654 may invoke the specific expansion generator 696 to identify a software component in the form of a storage cursor object 688. For example, the specific expansion generator 696 (e.g., a factory) may execute in the storage layer 624 to identify and instantiate a storage cursor object 688 that corresponds to the current expression node. At operation 762, the query plan builder 654 may store the software component that was identified at the appropriate position in the cursor expression tree 628. At decision operation 764, the query plan builder 654 may identify whether there are more expression nodes in the query expression tree 618.
If there are more expression nodes in the query expression tree 618, then processing continues at operation 752. -
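Method 750 may be sketched as a recursive walk of the query expression tree: the generic expansion generator is tried first, and the specific expansion generator is used as a fallback for nodes it does not handle. The generator functions and the dictionary-based node representation are illustrative assumptions.

```python
# Illustrative sketch of method 750: build a cursor expression tree that
# mirrors the query expression tree, choosing a generator per node.

def build_cursor_expression_tree(node, generic_generate, specific_generate):
    """Return a cursor tree node mirroring one query expression tree node."""
    component = generic_generate(node)            # decision operation 754
    if component is None:                         # operations 756/758
        component = specific_generate(node)
    children = [                                  # decision operation 764
        build_cursor_expression_tree(c, generic_generate, specific_generate)
        for c in node.get("children", [])
    ]
    return {"component": component, "children": children}

def generic_generate(node):
    # Execution-layer factory: handles operator nodes only.
    return f"cursor:{node['value']}" if node["kind"] == "operator" else None

def specific_generate(node):
    # Storage-layer factory selected via the storage data dictionary.
    return f"storage-cursor:{node['value']}"

query_tree = {"kind": "operator", "value": "AND", "children": [
    {"kind": "term", "value": "IPOD"},
    {"kind": "term", "value": "NANO"},
]}
cursor_tree = build_cursor_expression_tree(query_tree, generic_generate, specific_generate)
```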
FIG. 11 is a network diagram depicting a networked system 800, within which one example embodiment may be deployed. The networked system 800 may embody the system 10 in FIG. 1 and, accordingly, the same or similar references have been used to indicate the same or similar features unless otherwise indicated. A network-based marketplace 812 provides server-side functionality, via a network 814 (e.g., the Internet or a Wide Area Network (WAN)), to one or more clients. FIG. 11 illustrates, for example, a web client 816 (e.g., a browser, such as the Internet Explorer browser developed by Microsoft Corporation of Redmond, Wash. State) executing on client machine 820, a programmatic client 818 executing on client machine 822, and a mobile web client 833 executing on mobile device 811. For example, the mobile web client 833 may be embodied as one or more mobile modules that are used to support the Blackberry™ wireless handheld business or smart phone manufactured by Research In Motion of Waterloo, Ontario. An Application Program Interface (API) server 824 and a web server 826 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 828. The application servers 828 host one or more marketplace applications 830 and payment applications 832. The application servers 828 are, in turn, shown to be coupled to one or more database servers 834 that facilitate access to one or more databases 836. - The
marketplace applications 830 may provide a number of marketplace functions and services to users that access the network-based marketplace 812. The payment applications 832 may likewise provide a number of payment services and functions to users. The payment applications 832 may allow users to accumulate value in accounts and then later redeem the accumulated value for products (e.g., goods or services) that are made available via the marketplace applications 830. The value may be accumulated in a commercial currency, such as the U.S. dollar, or a proprietary currency, such as "points." While the marketplace applications 830 and payment applications 832 are shown in FIG. 11 to both form part of the network-based marketplace 812, it will be appreciated that, in alternative embodiments, the payment applications 832 may form part of a payment service that is separate and distinct from the network-based marketplace 812. - Further, while the networked system 800 shown in
FIG. 11 employs a client-server architecture, embodiments of the present disclosure are of course not limited to such an architecture and could equally well find application in a distributed, or peer-to-peer, architecture system, for example. The various marketplace applications 830 and payment applications 832 could also be implemented as standalone software programs, which do not necessarily have networking capabilities. - The
web client 816 and mobile web client 833 access the various marketplace applications 830 and payment applications 832 via the web interface supported by the web server 826. Similarly, the programmatic client 818 accesses the various services and functions provided by the marketplace applications 830 and payment applications 832 via the programmatic interface provided by the API server 824. The programmatic client 818 may, for example, be a seller application (e.g., the TurboLister application developed by eBay Inc., of San Jose, Calif.) to enable sellers to author and manage listings on the network-based marketplace 812 in an off-line manner, and to perform batch-mode communications between the programmatic client 818 and the network-based marketplace 812. -
FIG. 11 also illustrates a third party application 829, executing on a third party server machine 831, as having programmatic access to the networked system 800 via the programmatic interface provided by the API server 824. - The
mobile device 811 may be embodied as a mobile phone, a personal digital assistant (PDA), a cell phone, or any other wireless device that is capable of communicating with the network-based marketplace 812. For example, the mobile device 811 may be embodied as an iPhone mobile phone manufactured by Apple, Inc. of Cupertino, Calif. or, as previously mentioned, a Blackberry™ mobile phone manufactured by Research In Motion of Waterloo, Ontario. -
FIG. 12 is a block diagram illustrating marketplace applications 830 and payment applications 832 that, in one example embodiment, are provided as part of the networked system 800 of FIG. 11. The marketplace applications 830 and payment applications 832 may be hosted on dedicated or shared server machines, as shown on FIG. 11, that are communicatively coupled to enable communications between server machines. The applications themselves are communicatively coupled (e.g., via appropriate interfaces) to each other and to various data sources, so as to allow information to be passed between the applications or to allow the applications to share and access common data. The applications may furthermore access one or more databases 836 via the database servers 834, as shown on FIG. 11. - The network-based marketplace 812 of FIG. 11 may provide a number of publishing, listing, and price-setting mechanisms whereby a seller may list (or publish information concerning) goods or services for sale; a buyer can express interest in or indicate a desire to purchase such goods or services; and a price can be set for a transaction pertaining to the goods or services. To this end, the marketplace applications 830 are shown to include at least one publication application 840 and one or more auction applications 842 which support auction-format listing and price-setting mechanisms (e.g., English, Dutch, Vickrey, Chinese, Double, Reverse auctions, etc.). The various auction applications 842 may also provide a number of features in support of such auction-format listings, such as a reserve price feature whereby a seller may specify a reserve price in connection with a listing and a proxy-bidding feature whereby a bidder may invoke automated proxy bidding. - A number of fixed-price applications 844 support fixed-price listing formats (e.g., the traditional classified advertisement-type listing or a catalogue listing) and buyout-type listings. Specifically, buyout-type listings (e.g., including the Buy-It-Now (BIN) technology developed by eBay Inc., of San Jose, Calif.) may be offered in conjunction with auction-format listings and may allow a buyer to purchase goods or services, which are also being offered for sale via an auction, for a fixed price that is typically higher than the starting price of the auction. - Store application(s) 846 allows a seller to group listings within a "virtual" store, which may be branded and otherwise personalized by and for the seller. Such a virtual store may also offer promotions, incentives, and features that are specific and personalized to a relevant seller.
-
Reputation applications 848 allow users that transact, utilizing the network-based marketplace 812, to establish, build, and maintain reputations, which may be made available and published to potential trading partners. Consider that where, for example, the network-based marketplace 812 supports person-to-person trading, users may otherwise have no history or other reference information whereby the trustworthiness and credibility of potential trading partners may be assessed. The reputation applications 848 allow a user to establish a reputation within the network-based marketplace 812 over time, for example, through feedback provided by other transaction partners and by the computation of a feedback score based on the feedback. For example, the feedback score may be publicly displayed by the network-based marketplace 812. Other potential trading partners may then reference such a feedback score for the purposes of assessing credibility and trustworthiness. -
Personalization applications 850 allow users of the network-based marketplace 812 to personalize various aspects of their interactions with the network-based marketplace 812. For example, a user may, utilizing an appropriate personalization application 850, create a personalized reference page at which information regarding transactions to which the user is (or has been) a party may be viewed. Further, a personalization application 850 may enable a user to personalize listings and other aspects of their interactions with the networked system 800 and other parties. - The networked system 800 may support a number of marketplaces that are customized, for example, for specific geographic regions. A version of the networked system 800 may be customized for the United Kingdom, whereas another version of the networked system 800 may be customized for the United States. Some of these versions may operate as an independent marketplace, or may be customized (or internationalized) presentations of a common underlying marketplace. The networked system 800 may accordingly include a number of
internationalization applications 852 that customize information (and/or the presentation of information) by the networked system 800 according to predetermined criteria (e.g., geographic, demographic, or marketplace criteria). For example, the internationalization applications 852 may be used to support the customization of information for a number of regional websites that are operated by the networked system 800 and that are accessible via respective servers, as shown in FIG. 11. - Navigation of the network-based marketplace 812 may be facilitated by one or more navigation applications 854. Merely for example, the navigation applications 854 may receive search information in the form of a query to search for items on the network-based marketplace and return search results responsive to the request. A browse application may allow users to browse various category, catalogue, or inventory data structures according to which listings may be classified within the networked system 800. Various other navigation applications may be provided to supplement the search and browsing applications. For example, the navigation applications 854 may include the event manager module 36, the scheduler module 48, and the map-reduce job module 50, included in the system 10 to build and utilize a search infrastructure. Further, the navigation applications 854 may include other modules in the system 10 that are not presently mentioned. In order to make listings available via the networked system 800 as visually informing and attractive as possible, the marketplace applications 830 may include one or more imaging applications 856 with which users may upload images for inclusion within listings. An imaging application 856 also operates to incorporate images within viewed listings. The imaging applications 856 may also support one or more promotional features, such as image galleries that are presented to potential buyers. For example, sellers may pay an additional fee to have an image included within a gallery of images for promoted items. -
Listing creation applications 858 allow sellers to conveniently author listings pertaining to goods or services that they wish to transact via the network-based marketplace 812, while the listing management applications 860 allow sellers to manage such listings. Specifically, where a particular seller has authored and/or published a large number of listings, the management of such listings may present a challenge. The listing creation applications may further include a processing module, communication module, and listing module that facilitate a buyer watching for specific types of listings. The listing management applications 860 provide a number of features (e.g., auto-relisting, inventory level monitors, etc.) to assist the seller in managing such listings. - One or more
post-listing management applications 862 may also assist sellers with a number of activities that may typically occur post-listing. For example, upon completion of an auction facilitated by one or more auction applications 842, a seller may wish to leave feedback regarding a particular buyer. To this end, a post-listing management application 862 may provide an interface to one or more reputation applications 848, so as to allow the seller conveniently to provide feedback regarding multiple buyers to the reputation applications 848. -
Dispute resolution applications 864 provide mechanisms whereby disputes arising between transacting parties may be resolved. For example, the dispute resolution applications 864 may provide guided procedures whereby the parties are guided through a number of steps in an attempt to settle a dispute. In the event that the dispute cannot be settled via the guided procedures, the dispute may be escalated to a third-party mediator or arbitrator. - A number of
fraud prevention applications 866 implement fraud detection and prevention mechanisms to reduce the occurrence of fraud within the network-based marketplace 812. -
Messaging applications 868 are responsible for the generation and delivery of messages to users of the network-based marketplace 812, with such messages, for example, advising users regarding the status of listings at the network-based marketplace 812 (e.g., providing "outbid" notices to bidders during an auction process or providing promotional and merchandising information to users). Respective messaging applications 868 may utilize any one of a number of message delivery networks and platforms to deliver messages to users. For example, messaging applications 868 may deliver electronic mail (e-mail), instant message (IM), Short Message Service (SMS), text, facsimile, or voice (e.g., Voice over IP (VoIP)) messages via wired (e.g., the Internet), Plain Old Telephone Service (POTS), or wireless (e.g., mobile, cellular, WiFi (e.g., IEEE 802.11 technologies including 802.11n, 802.11b, 802.11g, and 802.11a), or Worldwide Interoperability for Microwave Access (WiMAX, IEEE 802.16)) networks. -
Merchandising applications 870 support various merchandising functions that are made available to sellers to enable sellers to increase sales via the network-based marketplace 812. The merchandising applications 870 also operate the various merchandising features that may be invoked by sellers and may monitor and track the success of merchandising strategies employed by sellers. The transaction incentivizing applications 872 operate to provide incentives for buyers and sellers to enter into and complete transactions. -
FIG. 13 is a high-level entity-relationship diagram illustrating various tables 880 and storage structures that may be maintained within the databases 836 of FIG. 11, and that are utilized by and support the marketplace applications 830 and payment applications 832, both of FIG. 12. A user table 882 contains a record for registered users of the network-based marketplace 812 of FIG. 11. A user may operate as a seller, a buyer, or both, within the network-based marketplace 812. In one example embodiment, a buyer may be a user that has accumulated value (e.g., commercial or proprietary currency), and is accordingly able to exchange the accumulated value for items that are offered for sale by the network-based marketplace 812. - The tables 880 also include an items table 884 in which item records are maintained for goods and services that are available to be, or have been, transacted via the network-based marketplace 812. Item records within the items table 884 may furthermore be linked to one or more user records within the user table 882, so as to associate a seller and one or more actual or potential buyers with an item record. - A transaction table 886 contains a record for each transaction (e.g., a purchase or sale transaction or auction) pertaining to items for which records exist within the items table 884.
- An order table 888 is populated with order records, with each order record being associated with an order. Each order, in turn, may be associated with one or more transactions for which records exist within the transaction table 886.
- Bid records within a bids table 890 relate to a bid received at the network-based marketplace 812 in connection with an auction-format listing supported by an auction application 842 of FIG. 12. A feedback table 892 is utilized by one or more reputation applications 848 of FIG. 12, in one example embodiment, to construct and maintain reputation information concerning users in the form of a feedback score. A history table 894 maintains a history of transactions to which a user has been a party. One or more attributes tables 896 record attribute information pertaining to items for which records exist within the items table 884. Considering only a single example of such an attribute, the attributes tables 896 may indicate a currency attribute associated with a particular item, with the currency attribute identifying the currency of a price for the relevant item as specified by a seller. -
Search storage structures 898 may store information that is utilized to search the items table 884 and other tables. For example, the search storage structures 898 may be utilized by the system 10, as illustrated in FIG. 1, to build and utilize a search infrastructure, according to an embodiment. A customization table 899 may store customization records that may be utilized to customize the operation of the network-based marketplace 812. -
FIG. 14 shows a diagrammatic representation of a machine in the example form of a computer system 900 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a PDA, a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. - The
example computer system 900 includes a processor 902 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 904, and a static memory 906, which communicate with each other via a bus 908. The computer system 900 may further include a video display unit 910 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 900 also includes an input device 912 (e.g., a keyboard), a cursor control device 914 (e.g., a mouse), a disk drive unit 916, a signal generation device 918 (e.g., a speaker), and a network interface device 920. - The
disk drive unit 916 includes a machine-readable medium 922 on which is stored one or more sets of instructions (e.g., software 924) embodying any one or more of the methodologies or functions described herein. The instructions (e.g., software 924) may also reside, completely or at least partially, within the main memory 904, the static memory 906, and/or within the processor 902 during execution thereof by the computer system 900. The main memory 904 and the processor 902 also may constitute machine-readable media. The instructions 924 may further be transmitted or received over a network 926 via the network interface device 920. - Applications that may include the apparatus and systems of various embodiments broadly include a variety of electronic and computer systems. Some embodiments implement functions in two or more specific interconnected hardware modules or devices with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the example system is applicable to software, firmware, and hardware implementations. In example embodiments, a computer system (e.g., a standalone, client or server computer system) configured by an application may constitute a “module” that is configured and operates to perform certain operations as described herein. In other embodiments, the “module” may be implemented mechanically or electronically. For example, a module may comprise dedicated circuitry or logic that is permanently configured (e.g., within a special-purpose processor) to perform certain operations. A module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations.
It will be appreciated that the decision to implement a module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations. Accordingly, the term “module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired) or temporarily configured (e.g., programmed) to operate in a certain manner and/or to perform certain operations described herein.
- While the machine-readable medium 922 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present description. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media and magnetic media. As noted, the software may be transmitted over a network using a transmission medium. The term “transmission medium” shall be taken to include any medium that is capable of storing, encoding or carrying instructions for transmission to and execution by the machine, and includes digital or analogue communications signals or other intangible media to facilitate transmission and communication of such software. - The illustrations of embodiments described herein are intended to provide a general understanding of the structure of various embodiments, and they are not intended to serve as a complete description of all the elements and features of apparatus and systems that might make use of the structures described herein. Many other embodiments will be apparent to those of ordinary skill in the art upon reviewing the above description. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The figures provided herein are merely representational and may not be drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized.
Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
- In some embodiments, the methods described herein may be implemented in a distributed or non-distributed software application designed under a three-tier architecture paradigm, whereby the various components of computer code that implement this method may be categorized as belonging to one or more of these three tiers. Some embodiments may include a first tier as an interface (e.g., an interface tier) that is relatively free of application processing. Further, a second tier may be a logic tier that performs application processing in the form of logical/mathematical manipulations of data inputted through the interface tier and communicates the results of these logical/mathematical manipulations to the interface tier and/or to a backend, or storage, tier. These logical/mathematical manipulations may relate to certain business rules or processes that govern the software application as a whole. A third, storage tier may be a persistent storage medium or non-persistent storage medium. In some cases, one or more of these tiers may be collapsed into another, resulting in a two-tier architecture, or even a one-tier architecture. For example, the interface and logic tiers may be consolidated, or the logic and storage tiers may be consolidated, as in the case of a software application with an embedded database. This three-tier architecture may be implemented using one technology, or, as will be discussed below, a variety of technologies. This three-tier architecture, and the technologies through which it is implemented, may be executed on two or more computer systems organized in a server-client, peer-to-peer, or some other suitable configuration. Further, these three tiers may be distributed between multiple computer systems as various software components.
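The tier separation described above can be sketched in a few lines. The following Python sketch is purely illustrative and is not part of the patent disclosure: all class names, the in-memory storage, and the "normalize before storing" business rule are invented stand-ins chosen to show the interface/logic/storage split, nothing more.

```python
# Hypothetical three-tier sketch; every name here is invented for
# illustration and does not come from the patent.

class StorageTier:
    """Storage tier: here an in-memory (non-persistent) key-value store."""
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class LogicTier:
    """Logic tier: applies business rules between interface and storage."""
    def __init__(self, storage):
        self.storage = storage

    def save_normalized(self, key, text):
        # A stand-in "business rule": normalize the value before storing.
        self.storage.put(key, text.upper())


class InterfaceTier:
    """Interface tier: thin, relatively free of application processing."""
    def __init__(self, logic):
        self.logic = logic

    def handle_request(self, key, text):
        self.logic.save_normalized(key, text)
        return self.logic.storage.get(key)


app = InterfaceTier(LogicTier(StorageTier()))
print(app.handle_request("greeting", "hello"))  # prints HELLO
```

Collapsing the logic and storage tiers, as the paragraph above notes for embedded databases, would amount to folding `StorageTier` into `LogicTier` without changing the interface tier at all.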
- Some example embodiments may include the above illustrated tiers, and the processes or operations that make them up, as being written as one or more software components. Common to many of these components is the ability to generate, use, and manipulate data. These components, and the functionality associated with each, may be used by client, server, or peer computer systems. These various components may be implemented by a computer system on an as-needed basis. These components may be written in an object-oriented computer language such that a component-oriented or object-oriented programming technique can be implemented using a Visual Component Library (VCL), Component Library for Cross Platform (CLX), Java Beans (JB), Java Enterprise Beans (EJB), Component Object Model (COM), Distributed Component Object Model (DCOM), or other suitable technique. These components may be linked to other components via various APIs, and then compiled into one complete server, client, and/or peer software application. Further, these APIs may be able to communicate through various distributed programming protocols as distributed computing components.
- Some example embodiments may include remote procedure calls being used to implement one or more of the above illustrated components across a distributed programming environment as distributed computing components. For example, an interface component (e.g., an interface tier) may reside on a first computer system that is remotely located from a second computer system containing a logic component (e.g., a logic tier). These first and second computer systems may be configured in a server-client, peer-to-peer, or some other suitable configuration. These various components may be written using the above illustrated object-oriented programming techniques, and can be written in the same programming language, or a different programming language. Various protocols may be implemented to enable these various components to communicate regardless of the programming language used to write these components. For example, a component written in C++ may be able to communicate with another component written in the Java programming language by using a distributed computing protocol such as a Common Object Request Broker Architecture (CORBA), a Simple Object Access Protocol (SOAP), or some other suitable protocol. Some embodiments may include the use of one or more of these protocols with the various protocols outlined in the Open Systems Interconnection (OSI) model, or Transport Control Protocol/Internet Protocol (TCP/IP) protocol stack model for defining the protocols used by a network to transmit data.
- Some embodiments may utilize the OSI model or TCP/IP protocol stack model for defining the protocols used by a network to transmit data. In applying these models, a system of data transmission between a server and client, or between peer computer systems, is illustrated as a series of roughly five layers comprising: an application layer, a transport layer, a network layer, a data link layer, and a physical layer. In the case of software having a three-tier architecture, the various tiers (e.g., the interface, logic, and storage tiers) reside on the application layer of the TCP/IP protocol stack. In an example implementation using the TCP/IP protocol stack model, data from an application residing at the application layer is loaded into the data load field of a TCP segment residing at the transport layer. This TCP segment also contains port information for a recipient software application residing remotely. This TCP segment is loaded into the data load field of an IP datagram residing at the network layer. Next, this IP datagram is loaded into a frame residing at the data link layer. This frame is then encoded at the physical layer, and the data is transmitted over a network such as an internet, Local Area Network (LAN), Wide Area Network (WAN), or some other suitable network. In some cases, internet refers to a network of networks. These networks may use a variety of protocols for the exchange of data, including the aforementioned TCP/IP, and additionally ATM, SNA, SDI, or some other suitable protocol. These networks may be organized within a variety of topologies (e.g., a star topology) or structures.
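The encapsulation walk-through above (application data into a TCP segment, segment into an IP datagram, datagram into a frame) can be sketched as nested payloads. This Python sketch is a simplified illustration only: the dictionary "headers" are invented stand-ins, not the real TCP/IP/Ethernet wire formats, and the addresses are documentation examples.

```python
# Simplified encapsulation sketch; field layouts are invented stand-ins,
# not real header formats.

def tcp_segment(payload: bytes, dst_port: int) -> dict:
    # Transport layer: application data goes into the segment's data load
    # field, alongside port information for the remote recipient.
    return {"dst_port": dst_port, "data": payload}

def ip_datagram(segment: dict, dst_addr: str) -> dict:
    # Network layer: the whole TCP segment becomes the datagram's payload.
    return {"dst_addr": dst_addr, "data": segment}

def frame(datagram: dict, dst_mac: str) -> dict:
    # Data link layer: the datagram is loaded into a frame before the
    # physical layer encodes it for transmission.
    return {"dst_mac": dst_mac, "data": datagram}

f = frame(
    ip_datagram(tcp_segment(b"GET /", 80), "203.0.113.5"),
    "aa:bb:cc:dd:ee:ff",
)
# Each layer wraps the one above it; the original application bytes sit
# three levels deep.
assert f["data"]["data"]["data"] == b"GET /"
```

The receiving stack reverses the nesting, peeling one layer per hop up the model until the application bytes reach the recipient software at the port named in the segment.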
- Thus, systems and methods to process a query with a unified storage interface are disclosed. While the present disclosure has been described in terms of several example embodiments, those of ordinary skill in the art will recognize that the present disclosure is not limited to the embodiments described, but may be practiced with modification and alteration within the spirit and scope of the appended claims. The description herein is thus to be regarded as illustrative instead of limiting.
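The claimed flow, a query expression parsed into an expression tree, the tree compiled into cursor components (a storage-independent component for an “AND” node, a storage-specific component for a term node), and the components executed to retrieve data, can be sketched as follows. This Python sketch is hypothetical: the patent prescribes no implementation language, every name below is invented, and the per-term storage-device selection via an input table is simplified to a single dictionary.

```python
# Invented sketch of the claimed query pipeline; not the patented
# implementation.

class TermCursor:
    """Storage-specific component: retrieves matches for one keyword."""
    def __init__(self, term, storage):
        self.term, self.storage = term, storage

    def fetch(self):
        return set(self.storage.get(self.term, ()))


class AndCursor:
    """Storage-independent component: intersects its children's results,
    irrespective of which storage device each child reads from."""
    def __init__(self, children):
        self.children = children

    def fetch(self):
        results = [child.fetch() for child in self.children]
        return set.intersection(*results) if results else set()


def compile_cursors(node, storage):
    # Walk the query expression tree, emitting one software component per
    # node; the result is the cursor expression tree.
    if node[0] == "AND":
        return AndCursor([compile_cursors(c, storage) for c in node[1:]])
    return TermCursor(node[1], storage)


# A toy term-to-item-id mapping standing in for a storage device.
storage = {"red": [1, 2, 3], "shoes": [2, 3, 4]}

tree = ("AND", ("TERM", "red"), ("TERM", "shoes"))  # query expression tree
cursor = compile_cursors(tree, storage)             # cursor expression tree
print(sorted(cursor.fetch()))  # prints [2, 3]
```

In the claims below, the "input table" associated with a term expression would select which of several storage devices (relational, directed-graph, and so on) a `TermCursor`-like component reads from; the sketch collapses that choice to one dictionary for brevity.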
Claims (20)
1. A system comprising:
a front-end server to receive a query, over a network, from a client machine, the query includes a query expression that includes at least one keyword;
a query engine to generate a query expression tree based on the query expression, the query expression tree includes a plurality of nodes that are representative of the query expression, the query engine to generate a cursor expression tree based on the query expression tree, the cursor expression tree includes a plurality of software components that correspond to the plurality of nodes in the query expression tree, the query engine to execute the plurality of software components in the cursor expression tree to retrieve data from a first storage device, the plurality of software components comprise a first software component that is utilized to retrieve data irrespective of a plurality of storage devices and a second software component that is utilized to retrieve data from a first storage device that is included in the plurality of storage devices, the front-end server to communicate search results, over the network, to the client machine, the search results include at least a portion of the data.
2. The system of claim 1, wherein the first software component includes a storage cursor object.
3. The system of claim 1, wherein the query engine generates the first software component based on an “AND” expression node in the query expression tree.
4. The system of claim 1, wherein the query engine generates the second software component based on a term expression node in the query expression tree.
5. The system of claim 4, wherein the query engine identifies the first storage device based on an input table that is associated with the term expression node.
6. The system of claim 1, wherein the query engine identifies a second storage device based on a second input table that is associated with a second query expression that is included in a second query that is received from the client machine.
7. The system of claim 6, wherein the second storage device is utilized to store data in a directed graph format.
8. The system of claim 1, wherein the first storage device is utilized to store data in a relational format that includes tuples.
9. The system of claim 1, wherein the query engine stores the first and second software components in the cursor expression tree.
10. A method comprising:
receiving a query, over a network, from a client machine, the query including a query expression that includes at least one keyword;
generating a query expression tree based on the query expression, the query expression tree including a plurality of nodes that are representative of the query expression;
generating a cursor expression tree based on the query expression tree, the cursor expression tree including a plurality of software components that correspond to the plurality of nodes in the query expression tree;
executing the plurality of software components in the cursor expression tree to retrieve data from a first storage device, the plurality of software components comprising a first software component that is utilized to retrieve data irrespective of a plurality of storage devices and a second software component that is utilized to retrieve data from a first storage device that is included in the plurality of storage devices; and
communicating search results, over the network, to the client machine, the search results including at least a portion of the data.
11. The method of claim 10, wherein the first software component includes a storage cursor object.
12. The method of claim 10, wherein the generating the cursor expression tree includes generating the first software component based on an “AND” expression node in the query expression tree.
13. The method of claim 10, wherein the generating the cursor expression tree includes generating the second software component based on a term expression node in the query expression tree.
14. The method of claim 13, further comprising identifying the first storage device based on an input table that is associated with the term expression node.
15. The method of claim 10, further comprising identifying a second storage device based on a second input table that is associated with a second query expression that is included in a second query that is received from the client machine.
16. The method of claim 15, wherein the second storage device is utilized to store data in a directed graph format.
17. The method of claim 10, wherein the first storage device is utilized to store data in a relational format that includes tuples.
18. The method of claim 10, further comprising storing the first and second software components in the cursor expression tree.
19. A machine readable medium storing instructions, which when executed on a processor, cause the processor to perform a method comprising:
receiving a query, over a network, from a client machine, the query including a query expression that includes at least one keyword;
generating a query expression tree based on the query expression, the query expression tree including a plurality of nodes that are representative of the query expression;
generating a cursor expression tree based on the query expression tree, the cursor expression tree including a plurality of software components that correspond to the plurality of nodes in the query expression tree;
executing the plurality of software components in the cursor expression tree to retrieve data from a first storage device, the plurality of software components comprising a first software component that is utilized to retrieve data irrespective of a plurality of storage devices and a second software component that is utilized to retrieve data from a first storage device that is included in the plurality of storage devices; and
communicating search results, over the network, to the client machine, the search results including at least a portion of the data.
20. A system comprising:
a front-end server to receive a query, over a network, from a client machine, the query includes a query expression that includes at least one keyword;
a means for generating a query expression tree based on the query expression, the query expression tree includes a plurality of nodes that are representative of the query expression, a query engine to generate a cursor expression tree based on the query expression tree, the cursor expression tree includes a plurality of software components that correspond to the plurality of nodes in the query expression tree, a means for executing the plurality of software components in the cursor expression tree to retrieve data from a first storage device, the plurality of software components comprise a first software component that is utilized to retrieve data irrespective of a plurality of storage devices and a second software component that is utilized to retrieve data from a first storage device that is included in the plurality of storage devices, the front-end server to communicate search results, over the network, to the client machine, the search results include at least a portion of the data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/730,583 US20140032593A1 (en) | 2012-07-25 | 2012-12-28 | Systems and methods to process a query with a unified storage interface |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261675793P | 2012-07-25 | 2012-07-25 | |
US13/730,583 US20140032593A1 (en) | 2012-07-25 | 2012-12-28 | Systems and methods to process a query with a unified storage interface |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140032593A1 true US20140032593A1 (en) | 2014-01-30 |
Family
ID=49995896
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/711,287 Active 2033-02-21 US9081821B2 (en) | 2012-07-25 | 2012-12-11 | Spell check using column cursor |
US13/730,536 Active 2034-06-27 US9607049B2 (en) | 2012-07-25 | 2012-12-28 | Systems and methods to build and utilize a search infrastructure |
US13/730,583 Abandoned US20140032593A1 (en) | 2012-07-25 | 2012-12-28 | Systems and methods to process a query with a unified storage interface |
US13/854,801 Abandoned US20140032517A1 (en) | 2012-07-25 | 2013-04-01 | System and methods to configure a profile to rank search results |
US15/470,565 Active US10482113B2 (en) | 2012-07-25 | 2017-03-27 | Systems and methods to build and utilize a search infrastructure |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/711,287 Active 2033-02-21 US9081821B2 (en) | 2012-07-25 | 2012-12-11 | Spell check using column cursor |
US13/730,536 Active 2034-06-27 US9607049B2 (en) | 2012-07-25 | 2012-12-28 | Systems and methods to build and utilize a search infrastructure |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/854,801 Abandoned US20140032517A1 (en) | 2012-07-25 | 2013-04-01 | System and methods to configure a profile to rank search results |
US15/470,565 Active US10482113B2 (en) | 2012-07-25 | 2017-03-27 | Systems and methods to build and utilize a search infrastructure |
Country Status (1)
Country | Link |
---|---|
US (5) | US9081821B2 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140046928A1 (en) * | 2012-08-09 | 2014-02-13 | International Business Machines Corporation | Query plans with parameter markers in place of object identifiers |
US20140164452A1 (en) * | 2012-12-06 | 2014-06-12 | Empire Technology Development Llc | Decentralizing a hadoop cluster |
US20140188825A1 (en) * | 2012-12-31 | 2014-07-03 | Kannan Muthukkaruppan | Placement policy |
US20150220583A1 (en) * | 2014-01-31 | 2015-08-06 | Microsoft Corporation | External data access with split index |
US9158768B2 (en) | 2012-07-25 | 2015-10-13 | Paypal, Inc. | System and methods to configure a query language using an operator dictionary |
US20170032003A1 (en) * | 2013-10-01 | 2017-02-02 | Cloudera, Inc. | Background format optimization for enhanced sql-like queries in hadoop |
US9607049B2 (en) | 2012-07-25 | 2017-03-28 | Ebay Inc. | Systems and methods to build and utilize a search infrastructure |
CN107169138A (en) * | 2017-06-13 | 2017-09-15 | 电子科技大学 | A kind of data distributing method of Based on Distributed memory database query engine |
US10073874B1 (en) * | 2013-05-28 | 2018-09-11 | Google Llc | Updating inverted indices |
CN110837585A (en) * | 2019-11-07 | 2020-02-25 | 中盈优创资讯科技有限公司 | Multi-source heterogeneous data association query method and system |
US20200356447A1 (en) * | 2013-11-01 | 2020-11-12 | Cloudera, Inc. | Manifest-based snapshots in distributed computing environments |
US11372823B2 (en) * | 2019-02-06 | 2022-06-28 | President And Fellows Of Harvard College | File management with log-structured merge bush |
US11416180B2 (en) * | 2020-11-05 | 2022-08-16 | International Business Machines Corporation | Temporary data storage in data node of distributed file system |
US20220405263A1 (en) * | 2021-06-21 | 2022-12-22 | International Business Machines Corporation | Increasing Index Availability in Databases |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5705788B2 (en) * | 2012-06-05 | 2015-04-22 | 株式会社日立製作所 | Assembly model similar structure search system and assembly model similar structure search method |
US9607025B2 (en) | 2012-09-24 | 2017-03-28 | Andrew L. DiRienzo | Multi-component profiling systems and methods |
US9805078B2 (en) | 2012-12-31 | 2017-10-31 | Ebay, Inc. | Next generation near real-time indexing |
US9372942B1 (en) * | 2013-03-15 | 2016-06-21 | Dell Software Inc. | System and method for facilitating data visualization via a map-reduce framework |
US9361329B2 (en) * | 2013-12-13 | 2016-06-07 | International Business Machines Corporation | Managing time series databases |
US20150220510A1 (en) * | 2014-01-31 | 2015-08-06 | International Business Machines Corporation | Interactive data-driven optimization of effective linguistic choices in communication |
US9037967B1 (en) * | 2014-02-18 | 2015-05-19 | King Fahd University Of Petroleum And Minerals | Arabic spell checking technique |
US9753818B2 (en) | 2014-09-19 | 2017-09-05 | Splunk Inc. | Data forwarding using multiple data pipelines |
US9838467B2 (en) | 2014-03-17 | 2017-12-05 | Splunk Inc. | Dynamically instantiating dual-queue systems |
US9660930B2 (en) | 2014-03-17 | 2017-05-23 | Splunk Inc. | Dynamic data server nodes |
US9838346B2 (en) | 2014-03-17 | 2017-12-05 | Splunk Inc. | Alerting on dual-queue systems |
US9836358B2 (en) | 2014-03-17 | 2017-12-05 | Splunk Inc. | Ephemeral remote data store for dual-queue systems |
US20160092532A1 (en) * | 2014-09-29 | 2016-03-31 | Facebook, Inc. | Load-balancing inbound real-time data updates for a social networking system |
US10095683B2 (en) * | 2015-04-10 | 2018-10-09 | Facebook, Inc. | Contextual speller models on online social networks |
DE102015216722A1 (en) * | 2015-09-01 | 2017-03-02 | upday GmbH & Co. KG | Data processing system |
US10749766B1 (en) | 2015-10-23 | 2020-08-18 | Amazon Technologies, Inc. | Archival datastore for aggregated metrics |
US11003690B1 (en) * | 2015-10-23 | 2021-05-11 | Amazon Technologies, Inc. | Aggregator systems for storage of data segments |
FR3056797B1 (en) * | 2016-09-29 | 2022-04-15 | Target2Sell | METHOD FOR ESTABLISHING AN ORDERED LIST OF OBJECTS AND SYSTEM FOR IMPLEMENTING THE METHOD. |
US10417234B2 (en) * | 2016-10-07 | 2019-09-17 | Sap Se | Data flow modeling and execution |
US10803034B2 (en) * | 2016-11-23 | 2020-10-13 | Amazon Technologies, Inc. | Global column indexing in a graph database |
CN110462605A (en) * | 2016-12-22 | 2019-11-15 | 酷旺 | Method for the method for terminal user locally dissected and for searching for personal information |
FR3061329B1 (en) * | 2016-12-22 | 2019-08-30 | Qwant | METHOD FOR LOCAL PROFILING OF A USER OF A TERMINAL AND METHOD FOR SEARCHING PRIVATE INFORMATION |
US10705943B2 (en) * | 2017-09-08 | 2020-07-07 | Devfactory Innovations Fz-Llc | Automating identification of test cases for library suggestion models |
US11562006B2 (en) * | 2017-10-03 | 2023-01-24 | Ohio State Innovation Foundation | Apparatus and method for interactive analysis of aviation data |
CN108021636B (en) * | 2017-11-27 | 2021-05-04 | 武汉大学 | Propagation network structure reconstruction method independent of time information |
CN108228107A (en) * | 2018-01-02 | 2018-06-29 | 联想(北京)有限公司 | A kind of data transmission method, data transmission device and electronic equipment |
US11044258B2 (en) * | 2018-08-24 | 2021-06-22 | Kyocera Document Solutions Inc. | Decentralized network for secure distribution of digital documents |
US11093446B2 (en) * | 2018-10-31 | 2021-08-17 | Western Digital Technologies, Inc. | Duplicate request checking for file system interfaces |
JP7193721B2 (en) * | 2019-01-31 | 2022-12-21 | 富士通株式会社 | Information processing device and database search program |
CN111814003B (en) * | 2019-04-12 | 2024-04-23 | 伊姆西Ip控股有限责任公司 | Method, electronic device and computer program product for establishing metadata index |
US11537554B2 (en) * | 2019-07-01 | 2022-12-27 | Elastic Flash Inc. | Analysis of streaming data using deltas and snapshots |
US20220414171A1 (en) * | 2021-06-28 | 2022-12-29 | Flipkart Internet Private Limited | System and method for generating a user query based on a target context aware token |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6526403B1 (en) * | 1999-12-17 | 2003-02-25 | International Business Machines Corporation | Method, computer program product, and system for rewriting database queries in a heterogenous environment |
US20030229639A1 (en) * | 2002-06-07 | 2003-12-11 | International Business Machines Corporation | Runtime query optimization for dynamically selecting from multiple plans in a query based upon runtime-evaluated performance criterion |
US20050010606A1 (en) * | 2003-07-11 | 2005-01-13 | Martin Kaiser | Data organization for database optimization |
US20050015381A1 (en) * | 2001-09-04 | 2005-01-20 | Clifford Paul Ian | Database management system |
US7376642B2 (en) * | 2004-03-30 | 2008-05-20 | Microsoft Corporation | Integrated full text search system and method |
US20090228528A1 (en) * | 2008-03-06 | 2009-09-10 | International Business Machines Corporation | Supporting sub-document updates and queries in an inverted index |
US8601474B2 (en) * | 2011-10-14 | 2013-12-03 | International Business Machines Corporation | Resuming execution of an execution plan in a virtual machine |
Family Cites Families (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE32773E (en) * | 1983-02-22 | 1988-10-25 | Method of creating text using a computer | |
US5347653A (en) * | 1991-06-28 | 1994-09-13 | Digital Equipment Corporation | System for reconstructing prior versions of indexes using records indicating changes between successive versions of the indexes |
US5576734A (en) * | 1993-10-28 | 1996-11-19 | The Mitre Corporation | Keyboard emulator system |
US6009425A (en) * | 1996-08-21 | 1999-12-28 | International Business Machines Corporation | System and method for performing record deletions using index scans |
US6502233B1 (en) | 1998-11-13 | 2002-12-31 | Microsoft Corporation | Automated help system for reference information |
US6438579B1 (en) | 1999-07-16 | 2002-08-20 | Agent Arts, Inc. | Automated content and collaboration-based system and methods for determining and providing content recommendations |
US6701362B1 (en) | 2000-02-23 | 2004-03-02 | Purpleyogi.Com Inc. | Method for creating user profiles |
JP5105456B2 (en) | 2000-05-30 | 2012-12-26 | 株式会社ホットリンク | Distributed monitoring system that provides knowledge services |
US20040199899A1 (en) * | 2003-04-04 | 2004-10-07 | Powers Richard Dickert | System and method for determining whether a mix of system components is compatible |
WO2005027068A1 (en) * | 2003-09-12 | 2005-03-24 | Canon Kabushiki Kaisha | Streaming non-continuous video data |
US7693827B2 (en) | 2003-09-30 | 2010-04-06 | Google Inc. | Personalization of placed content ordering in search results |
US20050071328A1 (en) | 2003-09-30 | 2005-03-31 | Lawrence Stephen R. | Personalization of web search |
US20050283473A1 (en) * | 2004-06-17 | 2005-12-22 | Armand Rousso | Apparatus, method and system of artificial intelligence for data searching applications |
US8495023B1 (en) * | 2004-09-01 | 2013-07-23 | Symantec Operating Corporation | Delta catalogs in a backup system |
US7647580B2 (en) | 2004-09-07 | 2010-01-12 | Microsoft Corporation | General programming language support for nullable types |
US7367019B2 (en) | 2004-09-16 | 2008-04-29 | International Business Machines Corporation | Parameter management using compiler directives |
US8019752B2 (en) * | 2005-11-10 | 2011-09-13 | Endeca Technologies, Inc. | System and method for information retrieval from object collections with complex interrelationships |
US7925676B2 (en) * | 2006-01-27 | 2011-04-12 | Google Inc. | Data object visualization using maps |
US7844603B2 (en) | 2006-02-17 | 2010-11-30 | Google Inc. | Sharing user distributed search results |
US8122019B2 (en) | 2006-02-17 | 2012-02-21 | Google Inc. | Sharing user distributed search results |
US8051385B1 (en) * | 2006-03-29 | 2011-11-01 | Amazon Technologies, Inc. | Content selection and aggregated search results presentation on a handheld electronic device |
US8271452B2 (en) * | 2006-06-12 | 2012-09-18 | Rainstor Limited | Method, system, and database archive for enhancing database archiving |
US7917499B2 (en) * | 2006-06-30 | 2011-03-29 | Microsoft Corporation | Updating adaptive, deferred, incremental indexes |
US7676524B2 (en) * | 2007-01-31 | 2010-03-09 | Microsoft Corporation | Hierarchical cursor-based object model |
US20080208844A1 (en) | 2007-02-27 | 2008-08-28 | Jenkins Michael D | Entertainment platform with layered advanced search and profiling technology |
US8364648B1 (en) * | 2007-04-09 | 2013-01-29 | Quest Software, Inc. | Recovering a database to any point-in-time in the past with guaranteed data consistency |
JP2009064120A (en) * | 2007-09-05 | 2009-03-26 | Hitachi Ltd | Search system |
US8077983B2 (en) * | 2007-10-04 | 2011-12-13 | Zi Corporation Of Canada, Inc. | Systems and methods for character correction in communication devices |
US8494978B2 (en) | 2007-11-02 | 2013-07-23 | Ebay Inc. | Inferring user preferences from an internet based social interactive construct |
US20090248401A1 (en) * | 2008-03-31 | 2009-10-01 | International Business Machines Corporation | System and Methods For Using Short-Hand Interpretation Dictionaries In Collaboration Environments |
US8346749B2 (en) | 2008-06-27 | 2013-01-01 | Microsoft Corporation | Balancing the costs of sharing private data with the utility of enhanced personalization of online services |
US8073847B2 (en) | 2008-06-27 | 2011-12-06 | Microsoft Corporation | Extended user profile |
US8214380B1 (en) | 2009-02-09 | 2012-07-03 | Repio, Inc. | System and method for managing search results |
US9519716B2 (en) | 2009-03-31 | 2016-12-13 | Excalibur Ip, Llc | System and method for conducting a profile based search |
US20100269090A1 (en) | 2009-04-17 | 2010-10-21 | France Telecom | Method of making it possible to simplify the programming of software |
EP2438540A1 (en) * | 2009-06-01 | 2012-04-11 | AOL Inc. | Providing suggested web search queries based on click data of stored search queries |
US8280902B2 (en) | 2009-09-01 | 2012-10-02 | Lockheed Martin Corporation | High precision search system and method |
US20110218986A1 (en) * | 2010-03-06 | 2011-09-08 | David Joseph O'Hanlon | Search engine optimization economic purchasing method |
US8538959B2 (en) | 2010-07-16 | 2013-09-17 | International Business Machines Corporation | Personalized data search utilizing social activities |
US8380711B2 (en) * | 2011-03-10 | 2013-02-19 | International Business Machines Corporation | Hierarchical ranking of facial attributes |
US8943015B2 (en) | 2011-12-22 | 2015-01-27 | Google Technology Holdings LLC | Hierarchical behavioral profile |
US9081821B2 (en) | 2012-07-25 | 2015-07-14 | Ebay Inc. | Spell check using column cursor |
US9158768B2 (en) | 2012-07-25 | 2015-10-13 | Paypal, Inc. | System and methods to configure a query language using an operator dictionary |
- 2012
  - 2012-12-11 US US13/711,287 patent/US9081821B2/en active Active
  - 2012-12-28 US US13/730,536 patent/US9607049B2/en active Active
  - 2012-12-28 US US13/730,583 patent/US20140032593A1/en not_active Abandoned
- 2013
  - 2013-04-01 US US13/854,801 patent/US20140032517A1/en not_active Abandoned
- 2017
  - 2017-03-27 US US15/470,565 patent/US10482113B2/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6526403B1 (en) * | 1999-12-17 | 2003-02-25 | International Business Machines Corporation | Method, computer program product, and system for rewriting database queries in a heterogenous environment |
US20050015381A1 (en) * | 2001-09-04 | 2005-01-20 | Clifford Paul Ian | Database management system |
US20030229639A1 (en) * | 2002-06-07 | 2003-12-11 | International Business Machines Corporation | Runtime query optimization for dynamically selecting from multiple plans in a query based upon runtime-evaluated performance criterion |
US20050010606A1 (en) * | 2003-07-11 | 2005-01-13 | Martin Kaiser | Data organization for database optimization |
US7376642B2 (en) * | 2004-03-30 | 2008-05-20 | Microsoft Corporation | Integrated full text search system and method |
US20090228528A1 (en) * | 2008-03-06 | 2009-09-10 | International Business Machines Corporation | Supporting sub-document updates and queries in an inverted index |
US8601474B2 (en) * | 2011-10-14 | 2013-12-03 | International Business Machines Corporation | Resuming execution of an execution plan in a virtual machine |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9158768B2 (en) | 2012-07-25 | 2015-10-13 | Paypal, Inc. | System and methods to configure a query language using an operator dictionary |
US9460151B2 (en) | 2012-07-25 | 2016-10-04 | Paypal, Inc. | System and methods to configure a query language using an operator dictionary |
US10482113B2 (en) | 2012-07-25 | 2019-11-19 | Ebay Inc. | Systems and methods to build and utilize a search infrastructure |
US9607049B2 (en) | 2012-07-25 | 2017-03-28 | Ebay Inc. | Systems and methods to build and utilize a search infrastructure |
US20140046928A1 (en) * | 2012-08-09 | 2014-02-13 | International Business Machines Corporation | Query plans with parameter markers in place of object identifiers |
US8924373B2 (en) * | 2012-08-09 | 2014-12-30 | International Business Machines Corporation | Query plans with parameter markers in place of object identifiers |
US20140164452A1 (en) * | 2012-12-06 | 2014-06-12 | Empire Technology Development Llc | Decentralizing a hadoop cluster |
US9588984B2 (en) * | 2012-12-06 | 2017-03-07 | Empire Technology Development Llc | Peer-to-peer data management for a distributed file system |
US20140188825A1 (en) * | 2012-12-31 | 2014-07-03 | Kannan Muthukkaruppan | Placement policy |
US10521396B2 (en) | 2012-12-31 | 2019-12-31 | Facebook, Inc. | Placement policy |
US9268808B2 (en) * | 2012-12-31 | 2016-02-23 | Facebook, Inc. | Placement policy |
US10073874B1 (en) * | 2013-05-28 | 2018-09-11 | Google Llc | Updating inverted indices |
US11630830B2 (en) * | 2013-10-01 | 2023-04-18 | Cloudera Inc. | Background format optimization for enhanced queries in a distributed computing cluster |
US11567956B2 (en) * | 2013-10-01 | 2023-01-31 | Cloudera, Inc. | Background format optimization for enhanced queries in a distributed computing cluster |
US20170032003A1 (en) * | 2013-10-01 | 2017-02-02 | Cloudera, Inc. | Background format optimization for enhanced sql-like queries in hadoop |
US10706059B2 (en) * | 2013-10-01 | 2020-07-07 | Cloudera, Inc. | Background format optimization for enhanced SQL-like queries in Hadoop |
US20200356447A1 (en) * | 2013-11-01 | 2020-11-12 | Cloudera, Inc. | Manifest-based snapshots in distributed computing environments |
US11768739B2 (en) * | 2013-11-01 | 2023-09-26 | Cloudera, Inc. | Manifest-based snapshots in distributed computing environments |
US11030179B2 (en) | 2014-01-31 | 2021-06-08 | Microsoft Technology Licensing, Llc | External data access with split index |
US9715515B2 (en) * | 2014-01-31 | 2017-07-25 | Microsoft Technology Licensing, Llc | External data access with split index |
US20150220583A1 (en) * | 2014-01-31 | 2015-08-06 | Microsoft Corporation | External data access with split index |
CN107169138A (en) * | 2017-06-13 | 2017-09-15 | 电子科技大学 | A kind of data distributing method of Based on Distributed memory database query engine |
US11372823B2 (en) * | 2019-02-06 | 2022-06-28 | President And Fellows Of Harvard College | File management with log-structured merge bush |
CN110837585A (en) * | 2019-11-07 | 2020-02-25 | 中盈优创资讯科技有限公司 | Multi-source heterogeneous data association query method and system |
US11416180B2 (en) * | 2020-11-05 | 2022-08-16 | International Business Machines Corporation | Temporary data storage in data node of distributed file system |
US20220405263A1 (en) * | 2021-06-21 | 2022-12-22 | International Business Machines Corporation | Increasing Index Availability in Databases |
WO2022269396A1 (en) * | 2021-06-21 | 2022-12-29 | International Business Machines Corporation | Increasing index availability in databases |
Also Published As
Publication number | Publication date |
---|---|
US9081821B2 (en) | 2015-07-14 |
US9607049B2 (en) | 2017-03-28 |
US20140032566A1 (en) | 2014-01-30 |
US20140032532A1 (en) | 2014-01-30 |
US10482113B2 (en) | 2019-11-19 |
US20140032517A1 (en) | 2014-01-30 |
US20170242911A1 (en) | 2017-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10482113B2 (en) | | Systems and methods to build and utilize a search infrastructure |
US11216430B2 (en) | | Next generation near real-time indexing |
US11586692B2 (en) | | Streaming data processing |
US20220327149A1 (en) | | Dynamic partition allocation for query execution |
US11481396B2 (en) | | Executing untrusted commands from a distributed execution model |
US11163758B2 (en) | | External dataset capability compensation |
US11232100B2 (en) | | Resource allocation for multiple datasets |
US11416528B2 (en) | | Query acceleration data store |
US11461334B2 (en) | | Data conditioning for dataset destination |
US10977260B2 (en) | | Task distribution in an execution node of a distributed execution environment |
US11126632B2 (en) | | Subquery generation based on search configuration data from an external data system |
US10795884B2 (en) | | Dynamic resource allocation for common storage query |
US10726009B2 (en) | | Query processing using query-resource usage and node utilization data |
US11243963B2 (en) | | Distributing partial results to worker nodes from an external data system |
US11411804B1 (en) | | Actionable event responder |
US11604795B2 (en) | | Distributing partial results from an external data system between worker nodes |
US10592561B2 (en) | | Co-located deployment of a data fabric service system |
US20190138642A1 (en) | | Execution of a query received from a data intake and query system |
US10853847B2 (en) | | Methods and systems for near real-time lookalike audience expansion in ads targeting |
WO2020027867A1 (en) | | Generating a subquery for a distinct data intake and query system |
US20190095488A1 (en) | | Executing a distributed execution model with untrusted commands |
US8700652B2 (en) | | Systems and methods to generate and utilize a synonym dictionary |
US9589285B2 (en) | | Representation manipulation language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: EBAY INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LIBENZI, DAVIDE;HENDERSON, RICHARD D.;LAKSHMINATH, ANAND;AND OTHERS;SIGNING DATES FROM 20130214 TO 20140123;REEL/FRAME:032128/0727 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |