WO2007023136A1 - Packing nodes into records to store XML XQuery data model and other hierarchically structured data

Publication number
WO2007023136A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
record
nodes
identifier
sub
Prior art date
Application number
PCT/EP2006/065449
Other languages
French (fr)
Inventor
Yao-Ching Stephen Chen
Yue Huang
Fen-Ling Lin
Brian Thinh-Vinh Tran
Guogen Zhang
Original Assignee
International Business Machines Corporation
IBM United Kingdom Limited
Priority date
Filing date
Publication date
Application filed by International Business Machines Corporation, IBM United Kingdom Limited
Publication of WO2007023136A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G06F16/84 Mapping; Conversion
    • G06F16/86 Mapping to a database

Definitions

  • the present invention relates to hierarchically structured data, and more particularly to the storage of hierarchically structured data in a database.
  • Hierarchically structured data such as extensible Mark-up Language (XML)
  • XML extensible Mark-up Language
  • Another conventional approach is to decompose and store the XML as tables in the relational database. This requires either a special relational schema for each XML schema or a generic relational representation for the XML data model.
  • the result data is relatively large, and the queries are usually slow to execute.
  • Another conventional approach uses an object data model to store XML tree data, where many direct references or pointers are stored in the records for the parent-child relationships.
  • this approach lacks scalability, has a larger data volume due to the references, and is less flexible in the re-organization of records.
  • Another conventional approach decomposes the XML data at a high level into relational data.
  • this approach is inefficient in that it places lower levels and long text into a Character Large Object (CLOB), or it stores the original textual XML redundantly along with the object model.
  • CLOB Character Large Object
  • An improved method and system for storing hierarchically structured data in record data structures uses logical node identifiers to reference the nodes of a hierarchically structured data stored within and across relational data structures, such as records or pages of records.
  • a node identifier index is used to map each logical node identifier to a record identifier for the record that contains the node.
  • a proxy node is used to represent the sub-tree in the parent record.
  • the mapping in the node identifier index reflects the storage of the sub-tree nodes in the separate record. This storage scheme supports document order clustering and sub-document update with the record as the unit.
  • Figure 1 illustrates an example hierarchically structured data tree containing a plurality of nodes
  • Figure 2 is a flowchart illustrating an embodiment of a method for storing hierarchically structured data in a record data structure in accordance with a preferred embodiment of the present invention
  • Figure 3 illustrates an example record storing a hierarchically structured data tree in accordance with a preferred embodiment of the present invention
  • Figure 4 illustrates the local and absolute node identifiers for the example tree in Figure 1;
  • Figure 5 is a flowchart illustrating a search for a node of the hierarchically structured data in accordance with a preferred embodiment of the present invention
  • Figure 6 illustrates example records for storing a tree across multiple records in accordance with a preferred embodiment of the present invention
  • Figure 7 is a flowchart illustrating a method for generating the node identifier indexes for records with proxy nodes in accordance with a preferred embodiment of the present invention
  • Figure 8 illustrates example entries of the node identifier index in accordance with a preferred embodiment of the present invention
  • Figures 9A and 9B illustrate in more detail the tree traversal process used by the method in accordance with a preferred embodiment of the present invention.
  • Figures 10A and 10B illustrate range proxy nodes in accordance with a preferred embodiment of the present invention.
  • Embodiments of the present invention provide improved methods and systems for storing hierarchically structured data in record data structures.
  • Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments.
  • the present invention is not intended to be limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features described herein.
  • the method and system in accordance with a preferred embodiment of the present invention uses logical node identifiers to reference the nodes of a hierarchically structured data stored within and across relational data structures, such as records or pages of records.
  • a node identifier index is then used to map each logical node identifier to a record identifier for the record that contains the node.
  • a proxy node is used to represent the sub-tree in the parent record.
  • the mapping in the node identifier index is then updated to reflect the storage of the sub-tree nodes in the separate record. In this manner, when re-organization of records is desired or needed, only the node identifier index needs to be updated.
  • the logical node identifiers in the records need not be changed.
  • FIG. 1 illustrates an example hierarchically structured data tree containing a plurality of nodes.
  • the tree 101 can represent any type of hierarchically structured data, such as XML. Although the preferred embodiment of the present invention may be described below in the context of XML, one of ordinary skill in the art will understand that the method and system can be applied to other types of hierarchically structured data.
  • the tree 101 has a root node (Node 0) with one child node (Node 1).
  • Node 1 has three child nodes (Nodes 2, 6, and 7).
  • Nodes 2, 6, and 7 are thus sibling nodes.
  • Node 6 is a leaf node (it has no child nodes).
  • Node 2 has three child nodes (Nodes 3, 4, and 5).
  • Node 7 has one child node (Node 8).
  • FIG. 2 is a flowchart illustrating an embodiment of a method for storing hierarchically structured data in a record data structure in accordance with the preferred embodiment of the present invention.
  • the hierarchically structured data comprises a plurality of nodes. Initially, there is no node, a working buffer (wbuf) is empty, and a parent stack is also empty, via step 201. The next node information is then obtained, via step 202. It is then determined if there is more node information, via step 203, i.e., if the traversal of the hierarchically structured data or its equivalent token information has ended. If not, and if the node is the first node, then the node is put in the beginning of the wbuf, via step 204, and the working pointer is advanced.
  • wbuf working buffer
  • if the node is a new node, then it is determined if there is enough space in the wbuf for the new node, via step 205. If not, then the largest sub-tree (or a sequence of sub-trees) of the parent node is stored into one record, via step 207. The taken-out sub-tree (or a sequence of sub-trees) is replaced with a proxy node until there is enough space for the new node. If there is enough space, then the relationship between the node and a current node is determined, via step 208. If the node is the child node of the current node, then the parent node pointer is pushed onto the parent stack, via step 209.
  • if the node is a sibling node of the current node, then step 209 is skipped.
  • the node is put into the wbuf at the location pointed to by the working pointer, the parent's child count is incremented, and the working pointer is advanced, via step 210. If the node is an end of a set of child nodes, then the parent node pointer is popped from the parent stack, via step 206.
  • eventually, there is no more node information, via step 203.
  • the nodes stored in the wbuf are stored into one or more records, via step 211.
  • FIG. 3 illustrates an example record storing a hierarchically structured data tree in accordance with the preferred embodiment of the present invention.
  • nodes are stored within records.
  • a plurality of records is stored within a page.
  • a plurality of pages is stored for a document.
  • This type of record data structure is known in the art and will not be described in detail here.
  • Each document is assigned a document identifier (DocID).
  • DocID document identifier
  • the record contains a record header 301 and nodes 302.
  • the record is assigned a record identifier (RID), which references a physical address of the record.
  • each node is assigned a logical node identifier (node ID).
  • a logical node ID identifies a node based upon its relationship with the other nodes in the tree. It does not identify the physical location where the node is stored.
  • the local node ID of a node is assigned to the node according to its sequence under that particular parent node. Child nodes of different parent nodes are assigned local node ID's independently at each level in the tree.
  • the absolute node ID is a concatenation of the local node ID's from the root node to the node.
  • the local node ID for Node 5 is '06' to indicate that it is the third sibling node at its level, while its absolute node ID is '020206'.
  • the absolute node ID indicates that Node 5 is the third child node of its parent node (Node 2), where its parent node is a first child node of its grandparent node (Node 1), where its grandparent is a first child node of the root node (Node 0).
  • the root node is assigned a local node ID of '00' and is ignored.
  • the record header 301 contains an absolute node ID of the rooted node.
  • Each node 302 within the record contains a node kind, node length, number of children, and the nodes for the children. It also stores its local node ID.
  • Figure 4 illustrates the local and absolute node IDs for the example tree 101 in Figure 1. Logical node ID's are further described in co-pending U.S. patent application serial no. 10/709,415 published with publication number 20060004858 titled "Self-Adaptive Prefix Encoding for Stable Node identifiers", filed on May 4, 2004, and assigned to the assignee of the present application. Applicant hereby incorporates this patent application by reference.
  • the logical node ID provides stable node encodings that allow for arbitrary insertion, deletion or replacement of nodes. Existing node ID's need not be modified when a node is inserted, deleted, or replaced to keep node ID's in document order. This holds true because a logical node ID is not modeled as a fixed string of decimal numbers, but rather as a variable-length binary string.
  • the storage of the tree 101 into records is based on a preorder traversal process, known in the art.
  • a grouping logic keeps track of the sub-tree being constructed for the length of the sub-tree rooted at the current node. For example, assume that the maximum record size, R, is known. A working buffer of 2xR or more in size is used in the construction. If the entire tree is smaller than R, then the entire tree is stored into one record. Otherwise, the tree is split into multiple records.
  • R maximum record size
  • the root node (Node 0) is first stored with an indication that it has one child node. Its child node (Node 1) is then stored with an indication that it has three child nodes. Next, the first child node (Node 2) is stored with an indication that it has three child nodes. These child nodes (Nodes 3, 4, and 5) are then stored. The traversal process returns to Node 2 and continues with the next sibling node (Node 6). Nodes 6 and 7 are then stored, with an indication that Node 7 has one child. Node 8 is then stored after Node 7.
  • the relationships among the nodes of the tree 101 are captured by the nesting structure. No explicit links are used.
  • the node identifier index is searched, via step 501, to obtain the RID corresponding to a logical node ID.
  • the record corresponding to the RID is then traversed to obtain the node corresponding to the logical node ID, via step 502.
  • the same traversal process used when storing the tree is used to locate the node with the local node ID at each level.
  • a hierarchically structured data tree is stored within a single record whenever possible. Occasionally, multiple records are required to store the hierarchically structured data tree.
  • the method in accordance with the preferred embodiment of the present invention stores sub-trees in a separate record, and represents this sub-tree in the parent record with a "proxy node", which itself does not contain a logical node ID.
  • the sub-tree of Node 2, containing Nodes 2, 3, 4, and 5, is then stored in a separate record.
  • each record stores one sub-tree.
  • FIG. 6 illustrates example records for storing a tree across multiple records in accordance with the preferred embodiment of the present invention.
  • the parent record 601 contains a proxy node 603 that represents the sub-tree rooted in Node 2.
  • a node identifier (node ID) index is created. This index maps a node ID to the RID of the record that contains the node with the given node ID. All the node IDs in document order can be viewed as points on a line. The records break this line into a plurality of intervals. The node ID index contains the upper end point of each interval.
  • Figure 7 is a flowchart illustrating a method for generating the node identifier indexes for records with proxy nodes in accordance with the preferred embodiment of the present invention. First, the record is traversed to find the proxy node, via step 701.
  • An entry is then created for the largest logical node ID before the proxy node, via step 702, with a mapping to the record's RID.
  • Another entry is created for the largest node ID in the record, via step 703, with a mapping to the record's RID.
  • node identifier index entries are being created for the records 601 and 602 (Figure 6).
  • the record 601 is traversed to find the proxy node 603, via step 701.
  • Node 1 has the largest logical node ID before the proxy node 603, so an entry 801 is created for Node 1 with a mapping to the RID for the record 601 (rid2), via step 702.
  • Node 8 has the maximum logical node ID in the record 601, so an entry 803 is also created for Node 8 with a mapping to the RID (rid2) of the record 601, via step 703.
  • for the record 602, steps 701 and 702 are skipped.
  • Node 5 has the largest logical node ID for the record 602, so entry 802 is created for Node 5 with a mapping to the RID (rid1) for the record 602, via step 703.
  • a search of the node identifier index for Node 4, with logical node ID '020204', finds the three entries 801-803.
  • the identifier '020204' is greater than '02' of entry 801, but less than '020206' of entry 802.
  • Node 4 is thus mapped to the RID (rid1) for the sub-tree record 602. If Node 8 with logical node ID '020602' is to be located, '020602' is greater than '020206' of entry 802 and equal to '020602' of entry 803. Node 8 is thus mapped to the RID (rid2) for the parent record 601.
  • the storage of hierarchically structured data is significantly more scalable than conventional methods. This is especially true since the nodes of the tree are stored as a few records, and the nodes of sub-trees can be moved together more efficiently.
  • the records may require reorganization once it is discovered that not all nodes of the tree can be stored in one record.
  • a sub-tree that can be stored in a separate record is identified. The nodes of the sub-tree are then replaced with the proxy node. If the records are less clustered, reorganization can be performed to make records in document order again. Because references between records are accomplished through logical node IDs rather than explicit references, this reorganization is significantly more easily accomplished, allowing greater scalability.
  • Figures 9A and 9B illustrate in more detail the tree traversal process used by the method in accordance with the preferred embodiment of the present invention.
  • Figure 9A illustrates the traversal process with one record
  • Figure 9B illustrates the traversal process with two records.
  • a stack is used to track each level of nodes.
  • the node IDs 901-902 are absolute node IDs in a variable-length binary string (2-byte length, followed by the node ID encoding).
  • the length of each local node ID is kept in a separate array.
  • Both the node ID and the lengths of the local node IDs are used as a stack when the tree nodes are traversed.
  • the level is used as the stack top pointer. This way, the (absolute) node ID can be maintained easily, and is always available in a variable-length binary string format.
  • In-scope namespaces in the XQuery data model can be similarly maintained for each node.
  • a sub-tree starts at a current node and ends at the current node start position plus the sub-tree length.
  • a tree can be traversed using two primitives: getFirstChild and getNextSibling.
  • the primitive 'getFirstChild' starts from the current node, and if the number of children is '0', then 'not found' is returned. Otherwise, the next node is the first child.
  • the primitive 'getNextSibling' starts from the current node, and if it is the root node, then 'not found' is returned. Otherwise, the total sub-tree length rooted at the current node is added to the start position of the node to get the next node position. If it is beyond the sub-tree rooted at the parent node, then 'not found' is returned. Otherwise, that next node is the next sibling.
  • the search key for the node ID index is set to '(DocID, node ID)'.
  • the index will return the RID of the record that contains the node. This record is then fetched and the traversal continues. To find a node with a given node ID, a node with the local node ID at each level is found using the above two primitives.
  • a proxy node can represent a sequence of sub-trees contained in a record, and multiple proxy nodes next to each other within a record can be collapsed into a single "range proxy node".
  • a range proxy node 1001 can represent two proxy nodes that are collapsed, each of which represents a sequence of sibling nodes (or sub-trees) 1002-1003 stored in a separate record.
  • a range proxy node 1004 can represent multiple proxy nodes 1005-1007, each corresponding to a record that may contain a sub-tree or multiple sub-trees.
  • An improved method and system for storing hierarchically structured data in relational data structures are disclosed.
  • the method and system uses logical node identifiers to reference the nodes of a hierarchically structured data stored within and across relational data structures, such as records or pages of records.
  • a node identifier index is used to map each logical node ID to a RID for the record that contains the node.
  • a proxy node is used to represent the sub-tree in the parent record.
  • the mapping in the node identifier index is then updated to reflect the storage of the sub-tree nodes in the separate record.
  • the DocID and node ID for the sub-tree root are used.
  • the DocID and the minimum node ID of the nodes, which is also the absolute node ID of the sub-tree root, can be put into separate fields within the record of nodes.
  • a sub-document update can be performed with the record as the unit. Insertion, deletion, or replacement of a sub-tree can be performed easily.
  • even a document can be partitioned based on node ID ranges.
  • the method and system in accordance with the preferred embodiment of the present invention thus is significantly more scalable than conventional approaches. It has a much smaller storage consumption than conventional object approaches that use explicit references between nodes. It can also leverage existing indexing approaches and reuse some of their utilities.
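The two traversal primitives named in the list above, getFirstChild and getNextSibling, can be illustrated with a minimal Python sketch. It rests on several assumptions not taken from the source: the record is modelled as a flat list in preorder, positions and sub-tree lengths are counted in entries rather than bytes, and the parent position is passed explicitly instead of being read from a stack. The names `Node`, `RECORD`, `get_first_child`, and `get_next_sibling` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: int          # node number from the example tree 101
    children: int      # number of child nodes
    subtree_len: int   # entries in the sub-tree rooted here, itself included


# Tree 101 stored in preorder in a single record (assumed layout).
RECORD: List[Node] = [
    Node(0, 1, 9), Node(1, 3, 8), Node(2, 3, 4),
    Node(3, 0, 1), Node(4, 0, 1), Node(5, 0, 1),
    Node(6, 0, 1), Node(7, 1, 2), Node(8, 0, 1),
]


def get_first_child(record: List[Node], pos: int) -> Optional[int]:
    """Return the position of the first child, or None for 'not found'."""
    if record[pos].children == 0:
        return None
    return pos + 1  # in preorder, the first child directly follows its parent


def get_next_sibling(record: List[Node], pos: int,
                     parent_pos: Optional[int]) -> Optional[int]:
    """Return the position of the next sibling, or None for 'not found'."""
    if parent_pos is None:  # the root node has no siblings
        return None
    nxt = pos + record[pos].subtree_len
    parent_end = parent_pos + record[parent_pos].subtree_len
    if nxt >= parent_end:   # beyond the sub-tree rooted at the parent
        return None
    return nxt
```

Starting from Node 1 at position 1, `get_first_child` yields position 2 (Node 2), and repeated calls to `get_next_sibling` with parent position 1 yield positions 6 and 7 (Nodes 6 and 7) before reporting "not found".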

Abstract

A storage of nodes of hierarchically structured data uses logical node identifiers to reference the nodes stored within and across record data structures. A node identifier index is used to map each logical node identifier to a record identifier for the record that contains the node. When a sub-tree is stored in a separate record, a proxy node is used to represent the sub-tree in the parent record. The mapping in the node identifier index reflects the storage of the sub-tree nodes in the separate record. Since the references between the records are through logical node identifiers, there is no limitation to the moving of records across pages, as long as the indices are updated or rebuilt to maintain synchronization with the resulting data pages. This approach is highly scalable and has a much smaller storage consumption than approaches that use explicit references between nodes.
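The node identifier index described here can be sketched in Python under stated assumptions. Node IDs are written as hexadecimal strings, matching the example values given for entries 801-803 ('02', '020206', '020602'), and a record is modelled as a list of node IDs in document order with a `PROXY` marker where a sub-tree was moved out. The names `PROXY`, `index_entries`, and `find_rid` are hypothetical, and the DocID part of the search key is omitted for brevity.

```python
import bisect
from typing import List, Optional, Tuple

PROXY = None  # assumed marker for a proxy node inside a record

Entry = Tuple[str, str]  # (upper end point node ID, RID)


def index_entries(rid: str, items: List[Optional[str]]) -> List[Entry]:
    """Create index entries for one record: one for the largest node ID
    before each proxy node, and one for the largest node ID in the record."""
    entries: List[Entry] = []
    last_id: Optional[str] = None
    for item in items:
        if item is PROXY:
            if last_id is not None and (not entries or entries[-1][0] != last_id):
                entries.append((last_id, rid))
        else:
            last_id = item
    if last_id is not None and (not entries or entries[-1][0] != last_id):
        entries.append((last_id, rid))
    return entries


def find_rid(index: List[Entry], node_id: str) -> str:
    """Map a node ID to the RID of the first interval whose upper end
    point is greater than or equal to the node ID."""
    keys = [key for key, _ in index]
    pos = bisect.bisect_left(keys, node_id)
    if pos == len(index):
        raise KeyError(node_id)
    return index[pos][1]


# Parent record (rid2): Node 1, proxy for the sub-tree of Node 2, Nodes 6, 7, 8.
parent = index_entries("rid2", ["02", PROXY, "0204", "0206", "020602"])
# Sub-tree record (rid1): Nodes 2, 3, 4, 5.
subtree = index_entries("rid1", ["0202", "020202", "020204", "020206"])
index = sorted(parent + subtree)
```

With these inputs, `find_rid(index, "020204")` returns "rid1" and `find_rid(index, "020602")` returns "rid2". The sketch relies on plain string comparison, which agrees with byte-wise comparison only because all IDs here use the same two-digit hexadecimal form.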

Description

PACKING NODES INTO RECORDS TO STORE XML XQUERY DATA MODEL AND OTHER HIERARCHICALLY STRUCTURED DATA
FIELD OF THE INVENTION
The present invention relates to hierarchically structured data, and more particularly to the storage of hierarchically structured data in a database.
BACKGROUND OF THE INVENTION
As hierarchically structured data, such as Extensible Markup Language (XML), becomes widely used as a data format, it also becomes a native data type for database systems. The storage of hierarchically structured data in relational databases, however, poses particular challenges.
One conventional approach is to store XML as text. This approach preserves the original documents and retrieves the entire document. However, it is inefficient in supporting queries and document updates, especially when the document is large.
Another conventional approach is to decompose and store the XML as tables in the relational database. This requires either a special relational schema for each XML schema or a generic relational representation for the XML data model. However, the result data is relatively large, and the queries are usually slow to execute.
Another conventional approach uses an object data model to store XML tree data, where many direct references or pointers are stored in the records for the parent-child relationships. However, this approach lacks scalability, has a larger data volume due to the references, and is less flexible in the re-organization of records.
Another conventional approach decomposes the XML data at a high level into relational data. However, this approach is inefficient in that it places lower levels and long text into a Character Large Object (CLOB), or it stores the original textual XML redundantly along with the object model.
Accordingly, there exists a need for an improved method and system for storing hierarchically structured data in record data structures. The improved method and system should combine the advantages of relational scalability and flexibility for the re-organization of records and the object efficiency for traversal and update.
SUMMARY OF THE INVENTION
An improved method and system for storing hierarchically structured data in record data structures uses logical node identifiers to reference the nodes of hierarchically structured data stored within and across relational data structures, such as records or pages of records. A node identifier index is used to map each logical node identifier to a record identifier for the record that contains the node. When a sub-tree is stored in a separate record, a proxy node is used to represent the sub-tree in the parent record. The mapping in the node identifier index reflects the storage of the sub-tree nodes in the separate record. This storage scheme supports document order clustering and sub-document update with the record as the unit. Since the references between the records are through logical node identifiers, there is no limitation to the moving of records across pages, as long as the indices are updated or rebuilt to maintain synchronization with the resulting data pages. The method and system in accordance with the present invention thus is significantly more scalable than conventional approaches. It has a much smaller storage consumption than conventional object approaches that use explicit references between nodes.
BRIEF DESCRIPTION OF THE FIGURES
Embodiments of the present invention shall now be described, by way of example, with reference to the following drawings:
Figure 1 illustrates an example hierarchically structured data tree containing a plurality of nodes;
Figure 2 is a flowchart illustrating an embodiment of a method for storing hierarchically structured data in a record data structure in accordance with a preferred embodiment of the present invention;
Figure 3 illustrates an example record storing a hierarchically structured data tree in accordance with a preferred embodiment of the present invention;
Figure 4 illustrates the local and absolute node identifiers for the example tree in Figure 1;
Figure 5 is a flowchart illustrating a search for a node of the hierarchically structured data in accordance with a preferred embodiment of the present invention;
Figure 6 illustrates example records for storing a tree across multiple records in accordance with a preferred embodiment of the present invention;
Figure 7 is a flowchart illustrating a method for generating the node identifier indexes for records with proxy nodes in accordance with a preferred embodiment of the present invention;
Figure 8 illustrates example entries of the node identifier index in accordance with a preferred embodiment of the present invention;
Figures 9A and 9B illustrate in more detail the tree traversal process used by the method in accordance with a preferred embodiment of the present invention; and
Figures 10A and 10B illustrate range proxy nodes in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION
Embodiments of the present invention provide improved methods and systems for storing hierarchically structured data in record data structures. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features described herein.
The method and system in accordance with a preferred embodiment of the present invention uses logical node identifiers to reference the nodes of hierarchically structured data stored within and across relational data structures, such as records or pages of records. A node identifier index is then used to map each logical node identifier to a record identifier for the record that contains the node. When a sub-tree is stored in a separate record, a proxy node is used to represent the sub-tree in the parent record. The mapping in the node identifier index is then updated to reflect the storage of the sub-tree nodes in the separate record. In this manner, when re-organization of records is desired or needed, only the node identifier index needs to be updated. The logical node identifiers in the records need not be changed.
To more particularly describe the features of the preferred embodiment of the present invention, please refer to Figures 1 through 10B in conjunction with the discussion below.
Figure 1 illustrates an example hierarchically structured data tree containing a plurality of nodes. The tree 101 can represent any type of hierarchically structured data, such as XML. Although the preferred embodiment of the present invention may be described below in the context of XML, one of ordinary skill in the art will understand that the method and system can be applied to other types of hierarchically structured data. The tree 101 has a root node (Node 0) with one child node (Node 1). Node 1 has three child nodes (Nodes 2, 6, and 7). Nodes 2, 6, and 7 are thus sibling nodes. Node 6 is a leaf node (it has no child nodes). Node 2 has three child nodes (Nodes 3, 4, and 5). Node 7 has one child node (Node 8).
Figure 2 is a flowchart illustrating an embodiment of a method for storing hierarchically structured data in a record data structure in accordance with the preferred embodiment of the present invention. Assume that the hierarchically structured data comprises a plurality of nodes. Initially, there is no node, a working buffer (wbuf) is empty, and a parent stack is also empty, via step 201. The next node information is then obtained, via step 202. It is then determined if there is more node information, via step 203, i.e., if the traversal of the hierarchically structured data or its equivalent token information has ended. If not, and if the node is the first node, then the node is put in the beginning of the wbuf, via step 204, and the working pointer is advanced.
If the node is a new node, then it is determined if there is enough space in the wbuf for the new node, via step 205. If not, then the largest sub-tree (or a sequence of sub-trees) of the parent node is stored into one record, via step 207. The taken-out sub-tree (or a sequence of sub-trees) is replaced with a proxy node until there is enough space for the new node. If there is enough space, then the relationship between the node and a current node is determined, via step 208. If the node is the child node of the current node, then the parent node pointer is pushed onto the parent stack, via step 209. If the node is a sibling node of the current node, then step 209 is skipped. Next, the node is put into the wbuf at the location pointed to by the working pointer, the parent's child count is incremented, and the working pointer is advanced, via step 210. If the node is an end of a set of child nodes, then the parent node pointer is popped from the parent stack, via step 206.
Eventually, there is no more node information, via step 203. At that time, the nodes stored in the wbuf are stored into one or more records, via step 211.
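The bookkeeping in steps 206 and 208 through 210 can be illustrated with a simplified Python sketch. It is not the flowchart of Figure 2 itself: it assumes the node information arrives as a stream of start and end tokens, pushes every opened node onto the parent stack and pops it when the node closes, and leaves out the space check and proxy replacement of steps 205 and 207. The names `Entry`, `build_wbuf`, and `EVENTS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Entry:
    name: int
    child_count: int = 0


def build_wbuf(events: Iterable[Tuple[str, Optional[int]]]) -> List[Entry]:
    """Fill a working buffer in document order while keeping child counts."""
    wbuf: List[Entry] = []        # working buffer; appending advances the working pointer
    parent_stack: List[int] = []  # positions in wbuf of the currently open ancestors
    for kind, name in events:
        if kind == "start":
            if parent_stack:
                # The new node is a child of the innermost open node.
                wbuf[parent_stack[-1]].child_count += 1
            wbuf.append(Entry(name))
            parent_stack.append(len(wbuf) - 1)
        else:  # "end": the node and its set of child nodes are complete
            parent_stack.pop()
    return wbuf


# Token stream for the example tree 101 (assumed representation).
EVENTS = [
    ("start", 0), ("start", 1), ("start", 2),
    ("start", 3), ("end", None), ("start", 4), ("end", None),
    ("start", 5), ("end", None), ("end", None),
    ("start", 6), ("end", None),
    ("start", 7), ("start", 8), ("end", None), ("end", None),
    ("end", None), ("end", None),
]

wbuf = build_wbuf(EVENTS)
```

After the run, `wbuf` holds Nodes 0 through 8 in preorder, with child counts of 1, 3, and 3 for Nodes 0, 1, and 2, and 1 for Node 7.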
Figure 3 illustrates an example record storing a hierarchically structured data tree in accordance with the preferred embodiment of the present invention. In this embodiment, nodes are stored within records. A plurality of records is stored within a page. A plurality of pages is stored for a document. This type of record data structure is known in the art and will not be described in detail here. Each document is assigned a document identifier (DocID). Assume that all nodes of the tree 101 are part of the same document and can be stored within one record. The record contains a record header 301 and nodes 302. The record is assigned a record identifier (RID), which references a physical address of the record. Each node is assigned a logical node identifier (node ID). A logical node ID identifies a node based upon its relationship with the other nodes in the tree. It does not identify the physical location where the node is stored. There are two types of logical node ID's, an absolute node ID and a local or relative node ID. The local node ID of a node is assigned to the node according to its sequence under that particular parent node. Child nodes of different parent nodes are assigned local node ID's independently at each level in the tree. The absolute node ID is a concatenation of the local node ID's from the root node to the node. For example, the local node ID for Node 5 is '06' to indicate that it is the third sibling node at its level, while its absolute node ID is '020206'. The absolute node ID indicates that Node 5 is the third child node of its parent node (Node 2), where its parent node is a first child node of its grandparent node (Node 1), where its grandparent is a first child node of the root node (Node 0). The root node is assigned a local node ID of '00' and is ignored.
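The relationship between local and absolute node IDs can be made concrete with a short Python sketch. It assumes that the n-th sibling receives the two-digit hexadecimal local ID for 2n, which matches the values quoted above ('02' for a first child, '06' for a third) but is otherwise an inference; the actual encoding is variable-length. The names `TREE`, `local_id`, and `absolute_ids` are hypothetical.

```python
from typing import Dict, List

# Example tree 101: parent node -> child nodes in document order.
TREE: Dict[int, List[int]] = {0: [1], 1: [2, 6, 7], 2: [3, 4, 5], 7: [8]}


def local_id(position: int) -> str:
    """Local node ID for the sibling at a 1-based position (assumed: 2 * position)."""
    return format(2 * position, "02x")


def absolute_ids(tree: Dict[int, List[int]], root: int = 0) -> Dict[int, str]:
    """Concatenate local node IDs from the root down to every node."""
    ids = {root: ""}  # the root's local ID '00' is ignored
    stack = [root]
    while stack:
        node = stack.pop()
        for position, child in enumerate(tree.get(node, []), start=1):
            ids[child] = ids[node] + local_id(position)
            stack.append(child)
    return ids


ids = absolute_ids(TREE)
```

Here `ids[5]` is '020206' and `ids[1]` is '02', in agreement with the example for Node 5 above.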
Returning to Figure 3, the record header 301 contains an absolute node ID of the rooted node. Each node 302 within the record contains a node kind, node length, number of children, and the nodes for the children. It also stores its local node ID. Figure 4 illustrates the local and absolute node IDs for the example tree 101 in Figure 1. Logical node ID's are further described in co-pending U.S. patent application serial no. 10/709,415 published with publication number 20060004858 titled "Self-Adaptive Prefix Encoding for Stable Node identifiers", filed on May 4, 2004, and assigned to the assignee of the present application. Applicant hereby incorporates this patent application by reference. The logical node ID provides stable node encodings that allow for arbitrary insertion, deletion or replacement of nodes. Existing node ID's need not be modified when a node is inserted, deleted, or replaced to keep node ID's in document order. This holds true because a logical node ID is not modeled as a fixed string of decimal numbers, but rather as a variable-length binary string.
In this embodiment, the storage of the tree 101 into records is based on a preorder traversal process, known in the art. However, other types of traversal processes can be used. With the preorder traversal processing, as the nodes are constructed, a grouping logic keeps track of the sub-tree being constructed for the length of the sub-tree rooted at the current node. For example, assume that the maximum record size, R, is known. A working buffer of 2xR or more in size is used in the construction. If the entire tree is smaller than R, then the entire tree is stored into one record. Otherwise, the tree is split into multiple records. The storage of a tree in multiple records is described further below.
For example, referring to both Figures 1 and 3, the root node (Node 0) is first stored with an indication that it has one child node. Its child node (Node 1) is then stored with an indication that it has three child nodes. Next, the first child node (Node 2) is stored with an indication that it has three child nodes. These child nodes (Nodes 3, 4, and 5) are then stored. The traversal process returns to Node 2 and continues with the next sibling node (Node 6). Nodes 6 and 7 are then stored, with an indication that Node 7 has one child. Node 8 is then stored after Node 7. Thus, with the preferred embodiment of the present invention, the relationships among the nodes of the tree 101 are captured by the nesting structure. No explicit links are used.
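The preorder packing walked through above can be sketched as follows. The sketch makes several assumptions that are not in the embodiment: the Node class and pack function are hypothetical, sub-tree length is counted in entries rather than bytes, and local node ID's are generated as the even values '02', '04', '06' to match the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def pack(node, local_id="00", out=None):
    """Append the sub-tree rooted at node to out in preorder.

    Each entry is (name, local node ID, number of children,
    sub-tree length in entries). No pointers are stored; the
    nesting is implied by the child counts and lengths.
    """
    if out is None:
        out = []
    slot = len(out)
    out.append(None)  # reserve the slot so the parent precedes its children
    for i, child in enumerate(node.children):
        pack(child, format(2 * (i + 1), "02x"), out)
    out[slot] = (node.name, local_id, len(node.children), len(out) - slot)
    return out


# Tree 101 as described in the text.
tree = Node("Node 0", [
    Node("Node 1", [
        Node("Node 2", [Node("Node 3"), Node("Node 4"), Node("Node 5")]),
        Node("Node 6"),
        Node("Node 7", [Node("Node 8")]),
    ]),
])
record = pack(tree)
```

The resulting list holds Node 0 through Node 8 in that order, and a reader can recover every parent-child relationship from the child counts and sub-tree lengths alone.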
Referring now to Figure 5, to obtain a node with a given logical node ID, such as in response to a query, the node identifier index is searched, via step 501, to obtain the RID corresponding to a logical node ID. The record corresponding to the RID is then traversed to obtain the node corresponding to the logical node ID, via step 502. To locate the node inside the record, the same traversal process used when storing the tree is used to locate the node with the local node ID at each level.
A hierarchically structured data tree is stored within a single record whenever possible. Occasionally, multiple records are required to store the hierarchically structured data tree. When more than one record is required, the method in accordance with the preferred embodiment of the present invention stores sub-trees in a separate record, and represents this sub-tree in the parent record with a "proxy node", which itself does not contain a logical node ID. Assume for example, that the tree 101 in Figure 1 cannot be stored within one record. The sub-tree of Node 2, containing Nodes 2, 3, 4, and 5, is then stored in a separate record. Here, each record stores one sub-tree.
Assuming again that the maximum record size, R, is known, as the nodes are constructed node by node in the preorder traversal process, if the entire tree is larger than R, then the tree is split into multiple records. The largest sub-tree is searched and copied into a separate record. The copied sub-tree is replaced with a proxy node, and the length of the nodes in the separate record is excluded from the calculation of the sub-tree length. Only the length of the proxy node is included. All the length information is updated accordingly. Figure 6 illustrates example records for storing a tree across multiple records in accordance with the preferred embodiment of the present invention. Here, the parent record 601 contains a proxy node 603 that represents the sub-tree rooted in Node 2.
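The splitting rule can be sketched as follows. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: it works on an in-memory tree rather than a working buffer, the node sizes and PROXY_SIZE are made-up values, and the largest sub-tree is chosen among those that fit within the maximum record size R.

```python
from dataclasses import dataclass, field

PROXY_SIZE = 2  # hypothetical length of a proxy node


@dataclass
class Node:
    name: str
    size: int
    children: list = field(default_factory=list)
    is_proxy: bool = False


def subtree_length(node):
    """Length of the sub-tree rooted at node; a proxy counts only itself."""
    return node.size + sum(subtree_length(c) for c in node.children)


def walk_with_parent(node):
    for child in node.children:
        yield node, child
        yield from walk_with_parent(child)


def split_into_records(root, max_record_size):
    """Move sub-trees out until the parent record fits in max_record_size."""
    separate = []
    while subtree_length(root) > max_record_size:
        candidates = [
            (parent, child)
            for parent, child in walk_with_parent(root)
            if not child.is_proxy
            and PROXY_SIZE < subtree_length(child) <= max_record_size
        ]
        if not candidates:
            raise ValueError("no sub-tree can be moved to a separate record")
        parent, largest = max(candidates, key=lambda pc: subtree_length(pc[1]))
        separate.append(largest)
        idx = next(i for i, c in enumerate(parent.children) if c is largest)
        # Replace the moved sub-tree with a proxy node in the parent record.
        parent.children[idx] = Node("proxy", PROXY_SIZE, is_proxy=True)
    return root, separate


def leaf(n):
    return Node("Node %d" % n, 10)


tree = Node("Node 0", 10, [Node("Node 1", 10, [
    Node("Node 2", 10, [leaf(3), leaf(4), leaf(5)]),
    leaf(6),
    Node("Node 7", 10, [leaf(8)]),
])])
parent_record, separate_records = split_into_records(tree, max_record_size=60)
```

With these made-up sizes, the sub-tree rooted at Node 2 is the largest one that fits, so it is moved out and the parent record keeps only a proxy in its place, which mirrors the arrangement shown in Figure 6.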
In order to find the sub-tree nodes represented by a proxy node, a node identifier (node ID) index is created. This index maps a node ID to the RID of the record that contains the node with the given node ID. All the node IDs in document order can be viewed as points in a line. The records break this line into a plurality of intervals. The node ID index contains the upper end point of each interval. Figure 7 is a flowchart illustrating a method for generating the node identifier indexes for records with proxy nodes in accordance with the preferred embodiment of the present invention. First, the record is traversed to find the proxy node, via step 701. An entry is then created for the largest logical node ID before the proxy node, via step 702, with a mapping to the record's RID. Another entry is created for the largest node ID in the record, via step 703, with a mapping to the record's RID. These entries represent the range of logical node ID's that encompass the tree. For a logical node ID that falls between two entries, the RID of the entry with the greater node ID is used to locate the node.
For example, referring to Figure 8, assume that node identifier index entries are being created for the records 601 and 602 (Figure 6). First, the record 601 is traversed to find the proxy node 603, via step 701. Node 1 has the largest logical node ID before the proxy node 603, so an entry 801 is created for Node 1 with a mapping to the RID for the record 601 (rid2), via step 702. Node 8 has the maximum logical node ID in the record 601, so an entry 803 is also created for Node 8 with a mapping to the RID (rid2) of the record 601, via step 703. For record 602, there are no proxy nodes, so steps 701 and 702 are skipped. Node 5 has the largest logical node ID for the record 602, so entry 802 is created for Node 5 with a mapping to the RID (rid1) for the record 602, via step 703.
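Steps 701 to 703 can be sketched as follows for the two example records. The representation is an assumption made for illustration: each record is a list of ("node", absolute node ID) and ("proxy", None) items in document order, the root node (whose ID is ignored) is left out for brevity, and the strings rid1 and rid2 stand for the two RIDs in the example.

```python
def index_entries(record_items, rid):
    """Build node ID index entries for one record.

    Because the items are in document order, the largest node ID
    before a proxy node is simply the last node seen before it.
    """
    entries = {}
    last_id = None
    for kind, node_id in record_items:
        if kind == "proxy":
            if last_id is not None:
                entries[last_id] = rid  # step 702: largest ID before the proxy
        else:
            last_id = node_id
    if last_id is not None:
        entries[last_id] = rid  # step 703: largest ID in the record
    return entries


# Parent record 601: Node 1, proxy for Node 2's sub-tree, Nodes 6, 7, 8.
record_601 = [("node", "02"), ("proxy", None),
              ("node", "0204"), ("node", "0206"), ("node", "020602")]
# Sub-tree record 602: Nodes 2, 3, 4, 5.
record_602 = [("node", "0202"), ("node", "020202"),
              ("node", "020204"), ("node", "020206")]

node_id_index = {}
node_id_index.update(index_entries(record_601, "rid2"))
node_id_index.update(index_entries(record_602, "rid1"))
```

Under these assumptions the index ends up with three entries, for '02', '020206', and '020602', which correspond to entries 801, 802, and 803.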
Thus, to locate Node 4 with logical node ID '020204', for example, a search of the node identifier index finds the three entries 801-803. The identifier '020204' is greater than '02' of entry 801, but less than '020206' of entry 802. Node 4 is thus mapped to the RID (rid1) for the sub-tree record 602. If Node 8 with logical node ID '020602' is to be located, '020602' is greater than '020206' of entry 802 and equal to '020602' of entry 803. Node 8 is thus mapped to the RID (rid2) for the parent record 601.
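The lookup just described amounts to finding the first index entry whose node ID is greater than or equal to the search ID. A minimal sketch, assuming the index is held as a sorted in-memory list (a real system would use its own index structure) and that node ID's are compared as byte strings:

```python
import bisect

# Entries 801, 802, 803 from the example, sorted by node ID.
NODE_ID_INDEX = [("02", "rid2"), ("020206", "rid1"), ("020602", "rid2")]


def find_rid(index, node_id):
    """Return the RID of the record that contains node_id.

    Each index entry is the upper end point of an interval of node IDs,
    so the first entry whose key is >= node_id identifies the record.
    """
    keys = [bytes.fromhex(key) for key, _ in index]
    pos = bisect.bisect_left(keys, bytes.fromhex(node_id))
    if pos == len(keys):
        raise KeyError("node ID is beyond the last index entry")
    return index[pos][1]


rid_for_node4 = find_rid(NODE_ID_INDEX, "020204")  # falls in entry 802's interval
rid_for_node8 = find_rid(NODE_ID_INDEX, "020602")  # equals entry 803's key
```

The same function also resolves Node 6 ('0204') to the parent record and Node 2 ('0202') to the sub-tree record, since each falls in the interval closed by the corresponding entry.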
By using proxy nodes to reference sub-tree nodes stored outside of a parent record, the storage of hierarchically structured data is significantly more scalable than conventional methods. This is especially true since the nodes of the tree are stored as a few records, and the nodes of sub-trees can be moved together more efficiently. When nodes are updated, the records may require reorganization once it is discovered that not all nodes of the tree can be stored in one record. Upon this discovery, a sub-tree that can be stored in a separate record is identified. The nodes of the sub-tree are then replaced with the proxy node. If the records are less clustered, reorganization can be performed to make records in document order again. Because references between records are accomplished through logical node ID's rather than explicit references, this reorganization is significantly more easily accomplished, allowing greater scalability.
Figures 9A and 9B illustrate in more detail the tree traversal process used by the method in accordance with the preferred embodiment of the present invention. Figure 9A illustrates the traversal process with one record, while Figure 9B illustrates the traversal process with two records. A stack is used to track each level of nodes. In Figures 9A and 9B, the node ID's 901-902 are absolute node ID's in a variable-length binary string (2-byte length, followed by the node ID encoding). The length of each local node ID is kept in a separate array. Both node ID and length of node IDs are used as a stack when the tree nodes are traversed. The level is used as the stack top pointer. This way, the (absolute) node ID can be maintained easily, and is always available as a variable-length binary string format. In-scope namespaces in the XQuery data model can be similarly maintained for each node.
Here, a sub-tree starts at a current node and ends at the current node start position plus the sub-tree length. A tree can be traversed using two primitives: getFirstChild and getNextSibling. The primitive 'getFirstChild' starts from the current node, and if the number of children is '0', then 'not found' is returned. Otherwise, the next node is the first child. The primitive 'getNextSibling' starts from the current node, and if it is the root node, then 'not found' is returned. Otherwise, the total sub-tree length rooted at the current node is added to the start position of the node to get the next node position. If it is beyond the sub-tree rooted at the parent node, then 'not found' is returned. Otherwise, that next node is the next sibling.
If a proxy node is encountered, the search key for the node ID index is set to '(DocID, node ID)'. The index will return the RID of the record that contains the node. This record is then fetched and the traversal continues. To find a node with a given node ID, a node with the local node ID at each level is found using the above two primitives.
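The two primitives can be sketched over a flat, preorder-packed record as follows. The layout is an assumption made for the example: each entry is (name, number of children, sub-tree length), positions and lengths are counted in entries rather than bytes, and the caller supplies the parent position (which the embodiment tracks with a stack). None plays the role of 'not found'.

```python
# Tree 101 packed in preorder: (name, number of children, sub-tree length).
RECORD = [
    ("Node 0", 1, 9), ("Node 1", 3, 8), ("Node 2", 3, 4),
    ("Node 3", 0, 1), ("Node 4", 0, 1), ("Node 5", 0, 1),
    ("Node 6", 0, 1), ("Node 7", 1, 2), ("Node 8", 0, 1),
]


def get_first_child(record, pos):
    """Return the position of the first child of the node at pos, or None."""
    _, num_children, _ = record[pos]
    if num_children == 0:
        return None
    return pos + 1  # in preorder, the first child follows its parent


def get_next_sibling(record, pos, parent_pos):
    """Return the position of the next sibling of the node at pos, or None."""
    if parent_pos is None:  # the root node has no siblings
        return None
    next_pos = pos + record[pos][2]  # skip the whole sub-tree at pos
    parent_end = parent_pos + record[parent_pos][2]
    if next_pos >= parent_end:  # beyond the parent's sub-tree
        return None
    return next_pos


def children(record, pos):
    """List the positions of all children of the node at pos."""
    result = []
    child = get_first_child(record, pos)
    while child is not None:
        result.append(child)
        child = get_next_sibling(record, child, pos)
    return result


node1_children = [RECORD[p][0] for p in children(RECORD, 1)]
```

Skipping a whole sub-tree by adding its length is what lets the traversal move between siblings without any stored links.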
To further improve efficiency, a proxy node, called a range proxy node, can represent a sequence of sub-trees contained in a record, and multiple proxy nodes next to each other within a record can be collapsed into a single "range proxy node". For example, as illustrated in Figure 10A, a range proxy node 1001 can represent two proxy nodes that are collapsed, each of which represents a sequence of sibling nodes (or sub-trees) 1002-1003 stored in a separate record. For another example, as illustrated in Figure 10B, a range proxy node 1004 can represent multiple proxy nodes 1005-1007, each corresponding to a record that may contain a sub-tree or multiple sub-trees.
An improved method and system for storing hierarchically structured data in relational data structures are disclosed. The method and system use logical node identifiers to reference the nodes of hierarchically structured data stored within and across relational data structures, such as records or pages of records. A node identifier index is used to map each logical node ID to a RID for the record that contains the node. When a sub-tree is stored in a separate record, a proxy node is used to represent the sub-tree in the parent record. The mapping in the node identifier index is then updated to reflect the storage of the sub-tree nodes in the separate record. This storage scheme supports the following:
Clustering. To support document order clustering, the DocID and node ID for the sub-tree root are used. To improve the efficiency of the clustering, the DocID and the minimum node ID of the nodes, which is also the absolute node ID of the sub-tree root, can be put into separate fields within the record of nodes.
Update. A sub-document update can be performed with the record as the unit. Insert, delete, or replace of a sub-tree can be performed easily.
Re-organization of records. Since the references between the records are through logical node ID's, then there is no limitation to the moving of records across pages, as long as the indices are updated or rebuilt to maintain synchronization with the resulting data pages.
Partitioning. Even a document can be partitioned based on node ID ranges.
The method and system in accordance with the preferred embodiment of the present invention are thus significantly more scalable than conventional approaches. They consume much less storage than conventional object approaches that use explicit references between nodes. They can also leverage existing indexing approaches and reuse some of their utilities.
Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.

Claims

1. A method for storing hierarchically structured data in record data structures, the hierarchically structured data containing a plurality of nodes, comprising: storing each node of the hierarchically structured data in at least one record of a relational data structure; and referencing each node of the hierarchically structured data using a logical node identifier.
2. The method of claim 1, wherein the logical node identifier identifies its corresponding node based upon the corresponding node's relationship with others of the plurality of nodes.
3. The method of claim 1, wherein the storing comprises: storing with each node of the hierarchically structured data a corresponding local node identifier, wherein the local node identifier is assigned according to its sequence under its parent node.
4. The method of claim 3, wherein the hierarchically structured data is traversed using getFirstChild and getNextSibling primitives.
5. The method of claim 1, wherein the storing comprises: storing an absolute node identifier of a context node in a header of the record, wherein the absolute node identifier comprises a concatenation of local node identifiers from a root node to a current node, wherein the context node comprises a parent node of a root node of a sub-tree or root nodes of sub-trees that share a same parent.
6. The method of claim 1, wherein a construction for storing nodes is performed based on a preorder traversal sequence.
7. The method of claim 1, further comprising: constructing a node identifier index; searching the node identifier index for a record identifier corresponding to a given logical node identifier; and traversing a record corresponding to the record identifier to obtain a node corresponding to the given logical node identifier.
8. The method of claim 1, wherein the storing comprises: determining that not all of the plurality of nodes can be stored within one record; finding a sub-tree of the plurality of nodes that can be stored within a separate record; storing nodes of the sub-tree in the separate record; and replacing the nodes of the sub-tree in a parent record with a proxy node.
9. The method of claim 8, further comprising: changing a node identifier index to update a record identifier that corresponds to the nodes of the sub-tree, wherein the logical node identifiers for the nodes of the sub-tree need not be changed.
10. The method of claim 8, further comprising: traversing the record and finding the proxy node; creating an entry in a node identifier index for a largest logical node identifier before the proxy node with a mapping to a record identifier for the record; and creating another entry in the node identifier index for a maximum logical node identifier for the plurality of nodes with a mapping to the record identifier for the record.
11. The method of claim 8, further comprising: collapsing a plurality of proxy nodes in the parent record into a single range proxy node.
12. The method of claim 11, wherein the range proxy node represents a sequence of sibling proxy nodes stored in the separate record.
13. The method of claim 11, wherein the range proxy node represents the plurality of proxy nodes, wherein each proxy node corresponds to a record that contains a sub-tree or multiple sub-trees.
14. A computer readable medium with program instructions for storing hierarchically structured data in record data structures, the hierarchically structured data containing a plurality of nodes, comprising instructions for: storing each node of the hierarchically structured data in at least one record of a relational data structure; and referencing each node of the hierarchically structured data using a logical node identifier.
15. The medium of claim 14, wherein the logical node identifier identifies its corresponding node based upon the corresponding node's relationship with others of the plurality of nodes.
16. The medium of claim 14, wherein the storing comprises: storing with each node of the hierarchically structured data a corresponding local node identifier, wherein the local node identifier is assigned according to its sequence under its parent node.
17. The medium of claim 16, wherein the hierarchically structured data is traversed using getFirstChild and getNextSibling primitives.
18. The medium of claim 14, wherein the storing comprises: storing an absolute node identifier of a context node in a header of the record, wherein the absolute node identifier comprises a concatenation of local node identifiers from a root node to a current node, wherein the context node comprises a parent node of a root node of a sub-tree or root nodes of sub-trees that share a same parent.
19. The medium of claim 14, wherein a construction for the storing nodes is performed based on a preorder traversal process.
20. The medium of claim 14, further comprising: constructing a node identifier index; searching the node identifier index for a record identifier corresponding to a given logical node identifier; and traversing a record corresponding to the record identifier to obtain a node corresponding to the given logical node identifier.
21. The medium of claim 14, wherein the storing comprises: determining that not all of the plurality of nodes can be stored within one record; finding a sub-tree of the plurality of nodes that can be stored within a separate record; storing nodes of the sub-tree in the separate record; and replacing the nodes of the sub-tree in a parent node with a proxy node.
22. The medium of claim 21, further comprising: changing a node identifier index to update a record identifier that corresponds to the nodes of the sub-tree, wherein the logical node identifiers for the nodes of the sub-tree need not be changed.
23. The medium of claim 21, further comprising: traversing the record and finding the proxy node; creating an entry in a node identifier index for a largest logical node identifier before the proxy node with a mapping to a record identifier for the record; and creating another entry in the node identifier index for a maximum logical node identifier for the plurality of nodes with a mapping to the record identifier for the record.
24. The medium of claim 21, further comprising: collapsing a plurality of proxy nodes in the parent node into a single range proxy node.
25. The medium of claim 24, wherein the range proxy node represents a sequence of sibling proxy nodes stored in the separate record.
26. The medium of claim 24, wherein the range proxy node represents the plurality of proxy nodes, wherein each proxy node corresponds to a record that contains a sub-tree or multiple sub-trees.
27. A system, comprising: a relational data structure comprising at least one record, wherein a plurality of nodes of a hierarchically structured data is stored within the at least one record, wherein each node of the hierarchically structured data is referenced using a logical node identifier; and a node identifier index for mapping each logical node identifier for each node to a record identifier corresponding to a record containing the node.
28. The system of claim 27, wherein the at least one record comprises: the plurality of nodes, where each node comprises its corresponding local node identifier, wherein the local node identifier is assigned according to its sequence under its parent node; and a header comprising an absolute node identifier comprising a concatenation of the local node identifiers from a root node to a current node.
29. The system of claim 27, wherein the at least one record comprises a proxy node, wherein the proxy node represents nodes of a sub-tree of the plurality of nodes that cannot be stored within one record, wherein the nodes of the sub-tree are stored in a separate record.
PCT/EP2006/065449 2005-08-22 2006-08-18 Packing nodes into records to store xml xquery data model and other hierarchically structured data WO2007023136A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/209,997 2005-08-22
US11/209,997 US8543614B2 (en) 2005-08-22 2005-08-22 Packing nodes into records to store XML XQuery data model and other hierarchically structured data

Publications (1)

Publication Number Publication Date
WO2007023136A1 true WO2007023136A1 (en) 2007-03-01

Family

ID=37102504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2006/065449 WO2007023136A1 (en) 2005-08-22 2006-08-18 Packing nodes into records to store xml xquery data model and other hierarchically structured data

Country Status (2)

Country Link
US (1) US8543614B2 (en)
WO (1) WO2007023136A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005719A1 (en) * 2006-06-30 2008-01-03 Morris Robert P Methods, systems, and computer program products for providing a program execution environment
US8972377B2 (en) * 2007-10-25 2015-03-03 International Business Machines Corporation Efficient method of using XML value indexes without exact path information to filter XML documents for more specific XPath queries
DE102008024809B3 (en) * 2008-05-23 2009-11-19 Universität Konstanz A method of storing a plurality of revisions of tree-structured data family parts
EP2141615A1 (en) * 2008-07-04 2010-01-06 Software AG Method and system for generating indexes in an XML database management system
US8037404B2 (en) * 2009-05-03 2011-10-11 International Business Machines Corporation Construction and analysis of markup language document representing computing architecture having computing elements
US8495085B2 (en) 2010-09-27 2013-07-23 International Business Machines Corporation Supporting efficient partial update of hierarchically structured documents based on record storage
EP2608054B1 (en) * 2011-12-21 2018-02-21 Siemens Aktiengesellschaft Executing database insert calls in a MES system
GB201315993D0 (en) * 2013-09-06 2013-10-23 Middleton Technology Ltd Element identification in a structural model
US20150127687A1 (en) * 2013-11-04 2015-05-07 Roger Graves System and methods for creating and modifying a hierarchial data structure
US10997119B2 (en) * 2015-10-23 2021-05-04 Nutanix, Inc. Reduced size extent identification

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001042881A2 (en) * 1999-12-06 2001-06-14 B-Bop Associates, Inc. System and method for the storage, indexing and retrieval of xml documents using relational databases

Family Cites Families (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62194533A (en) * 1986-02-21 1987-08-27 Hitachi Ltd Pattern matching system for tree structure data
US5151697A (en) 1990-10-15 1992-09-29 Board Of Regents Of The University Of Washington Data structure management tagging system
JPH0793370A (en) * 1993-09-27 1995-04-07 Hitachi Device Eng Co Ltd Gene data base retrieval system
JPH08190543A (en) 1995-01-10 1996-07-23 Fujitsu Ltd Document processor
US5608904A (en) 1995-02-13 1997-03-04 Hewlett-Packard Company Method and apparatus for processing and optimizing queries having joins between structured data and text data
JP3267142B2 (en) 1996-02-23 2002-03-18 ケイディーディーアイ株式会社 Variable length code generator
US6058397A (en) 1997-04-08 2000-05-02 Mitsubishi Electric Information Technology Center America, Inc. 3D virtual environment creation management and delivery system
US6295526B1 (en) 1997-10-14 2001-09-25 Bellsouth Intellectual Property Corporation Method and system for processing a memory map to provide listing information representing data within a database
US6085188A (en) * 1998-03-30 2000-07-04 International Business Machines Corporation Method of hierarchical LDAP searching with relational tables
US6313766B1 (en) 1998-07-01 2001-11-06 Intel Corporation Method and apparatus for accelerating software decode of variable length encoded information
US6263332B1 (en) 1998-08-14 2001-07-17 Vignette Corporation System and method for query processing of structured documents
JP3484096B2 (en) * 1999-03-03 2004-01-06 インターナショナル・ビジネス・マシーンズ・コーポレーション Logical zoom method in logical zoom device for directed graph
TW428146B (en) 1999-05-05 2001-04-01 Inventec Corp Data file updating method by increment
US6381605B1 (en) 1999-05-29 2002-04-30 Oracle Corporation Heirarchical indexing of multi-attribute data by sorting, dividing and storing subsets
JP3492247B2 (en) 1999-07-16 2004-02-03 富士通株式会社 XML data search system
ATE246824T1 (en) 1999-07-21 2003-08-15 Torben Bach Pedersen METHODS AND SYSTEMS TO MAKE OLAP HIERARCHICES SUMMERIZABLE
US6539396B1 (en) 1999-08-31 2003-03-25 Accenture Llp Multi-object identifier system and method for information service pattern environment
US6353820B1 (en) 1999-09-29 2002-03-05 Bull Hn Information Systems Inc. Method and system for using dynamically generated code to perform index record retrieval in certain circumstances in a relational database manager
US6985898B1 (en) * 1999-10-01 2006-01-10 Infoglide Corporation System and method for visually representing a hierarchical database objects and their similarity relationships to other objects in the database
US6721727B2 (en) 1999-12-02 2004-04-13 International Business Machines Corporation XML documents stored as column data
KR100748772B1 (en) 1999-12-10 2007-08-13 모사이드 테크놀로지스 코포레이션 Method and apparatus for longest match address lookup
US6510434B1 (en) 1999-12-29 2003-01-21 Bellsouth Intellectual Property Corporation System and method for retrieving information from a database using an index of XML tags and metafiles
US6810414B1 (en) 2000-02-04 2004-10-26 Dennis A. Brittain System and methods for easy-to-use periodic network data capture engine with automatic target data location, extraction and storage
US20020029207A1 (en) 2000-02-28 2002-03-07 Hyperroll, Inc. Data aggregation server for managing a multi-dimensional database and database management system having data aggregation server integrated therein
WO2002003245A1 (en) 2000-07-04 2002-01-10 Otoobe Method for storing xml-format information objects in a relational database
US6647391B1 (en) 2000-07-11 2003-11-11 Ian E. Smith System, method and article of manufacture for fast mapping from a propertied document management system to a relational database
US20020105548A1 (en) 2000-12-12 2002-08-08 Richard Hayton Methods and apparatus for creating a user interface using property paths
EP1225516A1 (en) 2001-01-22 2002-07-24 Sun Microsystems, Inc. Storing data of an XML-document in a relational database
US6633242B2 (en) 2001-02-08 2003-10-14 Sun Microsystems, Inc. Entropy coding using adaptable prefix codes
US7274671B2 (en) 2001-02-09 2007-09-25 Boly Media Communications, Inc. Bitwise adaptive encoding using prefix prediction
JP2002269139A (en) 2001-03-08 2002-09-20 Ricoh Co Ltd Method for retrieving document
US20030088639A1 (en) 2001-04-10 2003-05-08 Lentini Russell P. Method and an apparatus for transforming content from one markup to another markup language non-intrusively using a server load balancer and a reverse proxy transcoding engine
US7293028B2 (en) 2001-06-08 2007-11-06 Sap Ag Cache-conscious concurrency control scheme for database systems
US7080065B1 (en) 2001-06-22 2006-07-18 Oracle International Corporation Query pruning using interior rectangles in an R-tree index
US6587057B2 (en) 2001-07-25 2003-07-01 Quicksilver Technology, Inc. High performance memory efficient variable-length coding decoder
US20030023539A1 (en) 2001-07-27 2003-01-30 Wilce Scot D. Systems and methods for facilitating agreement definition via an agreement modeling system
US6889226B2 (en) 2001-11-30 2005-05-03 Microsoft Corporation System and method for relational representation of hierarchical data
US6910040B2 (en) 2002-04-12 2005-06-21 Microsoft Corporation System and method for XML based content management
US6563441B1 (en) 2002-05-10 2003-05-13 Seiko Epson Corporation Automatic generation of program logic to decode variable-length codes
US7346598B2 (en) 2002-06-28 2008-03-18 Microsoft Corporation Schemaless dataflow within an XML storage solution
US20040044959A1 (en) 2002-08-30 2004-03-04 Jayavel Shanmugasundaram System, method, and computer program product for querying XML documents using a relational database system
JP3560043B2 (en) 2002-11-25 2004-09-02 株式会社セック XML data storage method and storage device, program and recording medium storing program
US7072904B2 (en) 2002-12-02 2006-07-04 Microsoft Corporation Deletion and compaction using versioned nodes
CA2414047A1 (en) 2002-12-09 2004-06-09 Corel Corporation System and method of extending scalable vector graphics capabilities
US7043487B2 (en) 2002-12-28 2006-05-09 International Business Machines Corporation Method for storing XML documents in a relational database system while exploiting XML schema
US7062507B2 (en) 2003-02-24 2006-06-13 The Boeing Company Indexing profile for efficient and scalable XML based publish and subscribe system
US20040167915A1 (en) 2003-02-25 2004-08-26 Bea Systems, Inc. Systems and methods for declaratively transforming data objects between disparate representations
US7188308B2 (en) 2003-04-08 2007-03-06 Thomas Weise Interface and method for exploring a collection of data
US7013311B2 (en) 2003-09-05 2006-03-14 International Business Machines Corporation Providing XML cursor support on an XML repository built on top of a relational database system
US7293005B2 (en) 2004-01-26 2007-11-06 International Business Machines Corporation Pipelined architecture for global analysis and index building
US7281002B2 (en) 2004-03-01 2007-10-09 International Business Machine Corporation Organizing related search results
US7302421B2 (en) * 2004-03-17 2007-11-27 Theoris Software, Llc System and method for transforming and using content in other systems
US7246138B2 (en) 2004-04-13 2007-07-17 Bea Systems, Inc. System and method for content lifecycles in a virtual content repository that integrates a plurality of content repositories
US9760652B2 (en) * 2004-06-21 2017-09-12 International Business Machines Corporation Hierarchical storage architecture using node ID ranges
US7716253B2 (en) 2004-07-09 2010-05-11 Microsoft Corporation Centralized KPI framework systems and methods

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AMER-YAHIA S ET AL: "Logical and physical support for heterogeneous data", PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT. CIKM 2002 ACM NEW YORK, NY, USA, 4 November 2002 (2002-11-04) - 9 November 2002 (2002-11-09), pages 270 - 281, XP002404140, ISBN: 1-58113-492-4 *
FIEBIG T ET AL: "Anatomy of a native XML base management system", VLDB JOURNAL, SPRINGER VERLAG, BERLIN, DE, vol. 11, 2002, pages 292 - 314, XP002325045, ISSN: 1066-8888 *
KANNE C-C ET AL: "Efficient Storage of XML Data", TECHNICAL REPORT UNIVERSITY OF MANNHEIM, 1999, pages 1 - 20, XP010378712 *
THORSTEN FIEBIG, SVEN HELMER, CARL-CHRISTIAN KANNE, JULIA MILDENBERGER, GUIDO MOERKOTTE, ROBERT SCHIELE, TILL WESTMANN: "Anatomy of a Native XML Base Management System", TECHNICAL REPORT 01, UNIVERSITY OF MANNHEIM, 2002, pages 1 - 52, XP002404139, Retrieved from the Internet <URL:http://citeseer.ist.psu.edu/fiebig02anatomy.html> [retrieved on 20061123] *

Also Published As

Publication number Publication date
US8543614B2 (en) 2013-09-24
US20070043743A1 (en) 2007-02-22

Similar Documents

Publication Publication Date Title
US8543614B2 (en) Packing nodes into records to store XML XQuery data model and other hierarchically structured data
Cooper et al. A fast index for semistructured data
US20120078942A1 (en) Supporting efficient partial update of hierarchically structured documents based on record storage
US8266151B2 (en) Efficient XML tree indexing structure over XML content
US8489597B2 (en) Encoding semi-structured data for efficient search and browsing
US7529726B2 (en) XML sub-document versioning method in XML databases using record storages
US9576011B2 (en) Indexing hierarchical data
Meier eXist: An open source native XML database
US7774346B2 (en) Indexes that are based on bitmap values and that use summary bitmap values
US20070174309A1 (en) Mtreeini: intermediate nodes and indexes
US8566343B2 (en) Searching backward to speed up query
Li et al. Efficient updates in dynamic XML data: from binary string to quaternary string
US10698953B2 (en) Efficient XML tree indexing structure over XML content
EP1552426A1 (en) A subtree-structured xml database
JP2009503679A (en) Lightweight application program interface (API) for extensible markup language (XML)
Liu et al. Dynamic labeling scheme for XML updates
Ko et al. A binary string approach for updates in dynamic ordered XML data
Hsu et al. UCIS-X: an updatable compact indexing scheme for efficient extensible markup language document updating and query evaluation
GB2409078A (en) Encoding semi-structured data for efficient search and browsing
Norvag V2: a database approach to temporal document management
Alkhatib et al. Compacting XML structures using a dynamic labeling scheme
Mohammad et al. XML structural indexes
Haustein et al. DeweyIDs: the key to fine-grained management of XML documents
KR20050116089A (en) Fixed-rdbms model based indexing technology for massive xml data searching with different schema
Subhash et al. A Research Study on the Issues Related to XML Updates

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 06792893; Country of ref document: EP; Kind code of ref document: A1)