WO2001093599A2 - Method and apparatus for unified query interface for network information - Google Patents

Info

Publication number
WO2001093599A2
Authority
WO
WIPO (PCT)
Prior art keywords
query
terms
source
ontologies
sources
Prior art date
Application number
PCT/KR2001/000886
Other languages
French (fr)
Other versions
WO2001093599A3 (en)
Inventor
Jae-Woo Kang
Original Assignee
Wisengine Inc.
Priority date
Filing date
Publication date
Application filed by Wisengine Inc.
Priority to AU2001260758A (AU2001260758A1)
Publication of WO2001093599A2
Publication of WO2001093599A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/2448 Query languages for particular applications; for extensibility, e.g. user defined types
    • G06F16/2445 Data retrieval commands; View definitions

Definitions

  • the present invention relates to methods and systems for finding or searching information available on a network, in different information formats, and in some embodiments including access over a public network.
  • the present invention in various aspects, involves a method and/or system and/or apparatus for providing a scalable, unified view or search over large numbers of queryable information sources.
  • the invention accomplishes this, in part by sacrificing some expressive power in the set of queries supported.
  • a system provides scalability through three main techniques. First, it uses a collection of ontologies organized into hierarchical namespaces as a medium for expressing data semantics. Second, it employs a declarative query language to describe information sources so that source descriptions can be "executed" at run time instead of being pre-compiled into the system. Third, it utilizes inverted-index style operations to identify a subset of information sources that are relevant to a particular user query.
  • FIG. 1 is a functional block diagram of a system overview according to specific embodiments of the invention.
  • FIG. 2 illustrates two examples of namespaces, according to specific embodiments of the invention.
  • FIG. 3 illustrates an example syntax for a source description language according to specific embodiments of the invention.
  • FIG. 4 is a block diagram showing a snapshot of inverted index after registering sample sources.
  • FIG. 5 is a block diagram showing a representative example logic device in which various aspects of the present invention may be embodied.
  • the present invention involves a search system and/or method that employs namespaces in its query facility and "soft-wrapping" information sources.
  • Figure 1 illustrates main components of an example system according to the present invention.
  • a system according to the present invention aims to provide a general query facility by using a collection of ontologies organized into a hierarchical namespace. Each ontology in a namespace defines a set of terms that describe common concepts.
  • a namespace is used as the medium for expressing data semantics. Both user queries and source descriptions are written using terms in the namespace.
  • a query language (sometimes referred to as IDBQL) that is based on SQL. Queries are expressed using terms from a namespace. When writing a query, users do not need to know about the exported views of each individual information source. Instead, the query engine will identify a set of relevant information sources by using terms that appear in the query to probe an inverted index.
  • the present invention does not generally infer implicit joins. This implies that the invention can answer only a subset of the queries handled by systems that infer implicit joins. However, this also implies that query planning according to the present invention is much simpler than under these prior systems (it requires only simple inverted list lookup operations) and scales to large numbers of sites.
  • the present invention in further aspects utilizes a novel approach referred to as soft-wrapping to wrap information sources.
  • “wrapper” is a declarative query evaluated at runtime. Source descriptions can be executed or evaluated at run time instead of being pre-compiled into the system (or “hard-wrapped”).
  • the advantages of soft-wrapping over hard-wrapping are many. First, it is more flexible and portable, because the writing of source descriptions is independent of any run-time environment. Second, soft-wrappers can be tested and registered dynamically at runtime through a Web interface, without having to restart the system. Third, it is easy to adapt to dynamically changing Web data sources, as recompilation is not needed. Finally, soft wrapping is more secure in that what is registered is a declarative query, and not a pre-compiled wrapper program that must be trusted by whoever executes the wrapper.
  • the present invention uses a collection of ontologies organized into a hierarchical namespace as a medium for expressing data semantics.
  • An ontology according to the invention is a grouping of terms describing a concept. The terms in the ontologies are reusable. When defining an ontology, one can borrow existing terms from other ontologies in the namespace as well as create new terms. An ontology can selectively inherit (or reuse) any subset of the parent ontology. Inheritance from multiple ontologies is also allowed.
  • the IDB namespace functions as a global schema that provides a uniform view over information on the Web. It is an a priori schema as opposed to the a posteriori schema of some prior art systems.
  • In TSIMMIS, for example, user queries are formulated over the view exported by a mediator.
  • the mediated view is, in turn, generated by integrating views of lower level mediators or data sources.
  • any source level changes, such as adding a new source or dropping an existing source, may affect the upper level mediated view over which user queries are formulated.
  • a namespace according to the current invention is defined independently from the views of data sources. In fact, the source view is defined using the terms in the namespace. Because of this, information source level changes do not affect the global view.
  • the invention uses a simple collection of terms as the global schema.
  • if XML namespaces become prevalent, these could be used in place of the IDB ontology.
  • the invention would then be able to reuse a large number of widely used namespaces as schema without having to reinvent them.
  • the movie ontology consists of terms that may be useful to describe movies.
  • product#name from the product namespace is reused in the movie namespace as movie#title. It is advantageous to reuse existing terms, as this increases the number of information sources that can contribute to a given query. For instance, if a user queries on the name of a product using the product ontology, then information sources belonging to the book and movie ontologies are also queried in addition to sources directly belonging to the product ontology. This is because the book#title and movie#title terms are inherited from the product#name term in the product namespace.
  • a query system interacts with information sources using source descriptions.
  • the role of source descriptions is twofold: (1) They export the views and capabilities of information sources; (2) They extract and map local data in the described source to the exported view of the source.
  • the present invention uses a "soft wrapping" scheme that allows source descriptions to be executed at query evaluation time.
  • the source description is, in fact, a query language that "queries" a remote document or database.
  • IDB does not require hard-coded or compiled wrappers to communicate with sources.
  • Prior art "hard wrappers" generally require recompilation each time an information source changes its data presentation.
  • mapping-rule [[and | or] mapping-rule] ...
  • the SELECT clause defines the exported view; the FROM clause specifies the location of the remote database and its query capability; and the WHERE clause defines the mapping rules.
  • Figure 3 shows a source description for amazon.com. After evaluating the source description of amazon.com, an eight-column table of vendor, title, etc. will be generated.
  • the execution starts by evaluating the FROM clause.
  • the FROM clause specifies the location of amazon.com's book database and the query binding that it accepts.
  • Amazon.com's book database is published on the Web through a front-end form interface. This form interface accepts user inputs on the title and author fields, and this information is encoded in the url string in the FROM clause.
  • the url of the document can simply be placed in the FROM clause without any query binding encoding.
  • Once IDB has rewritten the user query into local queries, the placeholders $book#title$ and $book#author$ will be replaced with the corresponding values from the user query. After it opens a url connection and sends the query string, the query result will be returned from the source in an HTML page.
  • the HTML page is parsed into a DOM tree [DOM98]. If a source returns an XML page, then IDB will invoke an XML parser instead to generate the DOM tree. After this parsing step, the remaining query processing steps are transparent to both XML and HTML since the DOM interface is generic to both markup languages.
  • the WHERE clause consists of a set of path expressions and perl-style text operations.
  • the path expressions are evaluated over the DOM tree generated from the result page.
  • the syntax of our path expression is like that of HEL [SA99a, SA99b] and WIDL [All97].
  • HEL also supports perl-style pattern matching.
  • the IDB source description language allows direct mapping from path expressions to the exported view and provides a larger set of text operations. Further, it allows the conjunction and disjunction of path expressions. For instance, depending on the user query binding, the amazon.com database returns two different types of HTML pages. In case the user query binding results in exactly one book entry, it directly returns the HTML page that contains the full book description.
  • a source description language supports popular perl regular expression operations such as match, substitute, join, split, and a custom-designed switch operator.
  • the switch operator is used to normalize the irregularity of output data across multiple sources. For example, some sources represent product availability in graphical symbols and they must be transformed into the text equivalents.
  • the dot(.) in the path expression implies the direct path from the parent element to the child and the arrow(->) implies 0 or more steps exist in between.
  • the SELECT clause provides a global semantics for the local data. It defines the schema of the table that is generated by the execution of the source description. Note that the constant value 'amazon' is materialized into the book#vendor term as all book entries are coming from the same source, amazon.com. The plus (+) sign at the end of an attribute is shorthand for 'IS NOT NULL'.
  • the IDB source description can choose terms selectively from one or more ontologies.
  • the source description need not conform to any namespace nor have any restriction on choosing sets of terms from various ontologies. This allows the source description language to describe sources using terms that are as close as possible to the original semantics of the data.
  • data extracted from the result page can potentially have some nested structure.
  • IDB employs a set of special iterators that are associated with each output attribute.
  • a "buggy" wrapper may cause the data from the wrapped source to be mapped incorrectly, but since it is just a declarative query, it does not pose a security risk to the site executing the soft wrapper query.
  • a query language may be understood as a subset of SQL, with additional keyword predicates.
  • the keyword predicates are added to support a keyword match operation that is perhaps the most popular operation in real-world web queries.
  • a query is formulated using the terms defined in the ontology.
  • the query writer does not need to know about the exported views of each of the individual underlying information sources.
  • a query processor according to specific embodiments of the invention will identify a set of relevant information sources by probing an inverted index using the terms used in the query as described in the next section.
  • a first example query illustrates a basic structure of a query language and the use of the keyword predicate.
  • the keyword operators are especially useful because data may come from autonomous information sources.
  • the presentation format of the data may differ across information sources, and perhaps even the data within one source may have different presentation formats over time.
  • One common example would be the format of person's name. Some sources may put the last name before the first name, and some others first name first.
  • a query language supports three keyword operators and their semantics as defined in Table 1.
  • the first example query was not very selective, as it returned more than 200 entries from various online book vendors.
  • a second example query adds two more selection conditions to the first query and retrieves book availability information along with the original attributes.
  • This query illustrates the use of numeric order predicates and data type coercion.
  • a data model according to the invention is essentially type free. Attribute values are treated as string literals.
  • vendor: Fatbrain; title: Database Management Systems, Second Edition; author: Ramakrishnan, Raghu / Gehrke, Johannes; price: 53.25; year: 1999; stock: Ships same day
  • Example Query 3 is a simple explicit join query. It is provided to illustrate the case where more than one ontology is involved in a query. It retrieves title, actor of movies and vendor, url, format, price of books where the movies are directed by 'Steven Spielberg', books are written by 'Michael Crichton', and both movie and book have the same title. Part of the result table for this query follows the example.
  • a user query is formulated using terms in one or more ontologies.
  • the query engine identifies base tables for each ontology used in the query.
  • a base table is determined by identifying the minimum subset of terms in a given ontology that is required to evaluate the user query.
  • the query engine retrieves a set of source descriptions for each base table from the source description index. This index is, in fact, an inverted index that associates terms in the ontology to the relevant source descriptions.
  • the query engine translates the original user query into local queries using the views exported from the set of source descriptions that were identified in the previous step.
  • the query engine materializes local views at each source, unions results by base tables, and processes remaining predicates (e.g. joins between base tables).
  • Example Query 4 retrieves the vendor, title, price, and review attributes of books that have the keywords 'Database' and 'Systems' in their title.
  • the first step of the query processing is to identify the base tables and the predicate binding implied for the base tables.
  • query4 (adornment fbfff): vendor, title, author, price, review
  • book (adornment fbff): vendor, title, author, price
  • review (adornment bf): title, review
  • keyword predicate on title: 'Database Systems'
  • Predicate adornment is used to illustrate how the binding pattern serves as a filter for pruning out irrelevant sources.
  • title is the only variable that is bound in query4.
  • the base tables in this query are book (adornment fbff; columns vendor, title, author, price) and review (adornment bf; columns title, review).
  • the way base tables are identified is straightforward; all terms used in the SELECT and WHERE clause are gathered and grouped into the ontologies that appear in the FROM clause. Note that the base table is different from the global predicates as discussed, for example, with regard to the Information Manifold [LRO96a, LRO96b].
  • a base table is dynamically generated by projecting out terms from the particular user query.
  • this aspect of the invention uses a much simpler way to identify information sources that are relevant to a particular user query.
  • Previous systems including the Information Manifold and Infomaster [DG97, GKD97] identify information sources through a query rewriting scheme based on the view containment test in their query processing (see [Ull97] for an overview).
  • embodiments of the invention utilize inverted index style operations in query processing. To illustrate this, assume the following information sources, each listed with its adornment and exported columns: amazon (fbbff): vendor, title, author, year, price; borders (fbff): vendor, title, author, price; Book3 (fbff): vendor, title, price, isbn; Book4 (fb): author, isbn.
  • the adornment here has a slightly different meaning than the adornment in the query. It is used here to specify the query capability of a particular source. For instance, the adornment of amazon, fbbff, means that amazon can accept queries on either title or author and returns a table of columns including vendor, title, author, year, and price. Similarly, nytimes (with adornment bff) can answer queries on title and returns a table of title, author, and review.
  • the first inverted index, A, maintains the relation between all terms used in the source description and the identifier of the source description itself.
  • the second index, B, indexes only bound variables (terms). A snapshot of the two example indexes is shown in FIG. 4.
  • For each base table in the query, a method according to this aspect of the invention identifies a subset of source descriptions using the indexes.
  • index A is probed with the terms projected into the base table to find the sources that export all of them.
  • index B is probed with the bound terms. title is the only bound variable in the user query; this probe returns amazon, borders, and book3.
  • intersect the two results to get the subset of sources that are relevant to the user query, specifically for the book base table. Repeat the same process to get nytimes and wpost for the review base table.
  • All sources in the result have the query capability on the title attribute and can produce a table of the projected columns in each base table.
  • the remaining steps of the query processing are straightforward.
  • the system groups the source descriptions on the base tables and executes them.
  • the placeholders in the FROM clause are replaced by the query binding and encoded into a legal url query string.
  • the source descriptions of amazon and borders are executed and generate a book table by unioning the results from both sources.
  • the source descriptions of nytimes and wpost are executed to generate the review table.
  • the last step of query processing is to evaluate join predicates across the ontologies.
  • the book and review tables are joined on title attribute.
  • the first inverted index indexes Source Descriptions (SDs) based on their exported terms.
  • the second inverted index indexes SDs based on their input terms.
  • SDs Source Descriptions
  • the first inverted index is probed with term1, term2, and term3 and the resulting lists of sources are intersected. This step identifies all sources that export all three terms that are needed.
  • the second inverted index is probed with term1 only to identify all sources that are capable of answering queries on term1.
  • the results from the first index and second index are intersected to get sources that can answer queries on term1 and export term2 and term3.
  • wrapping a new source in some implementations can be done very efficiently, in one implementation taking just 10 minutes on average.
  • the query planning stage is effectively instantaneous; the delay in query evaluation is due to waiting for the information sources to respond. With multithreading of the query engine in a system according to the invention, the delays in waiting for the sites to respond are overlapped.
  • FIG. 5 shows digital device 700 that may be understood as a logical apparatus that can read instructions from media 717 and/or network port 719. Apparatus 700 can thereafter use those instructions to direct a method according to the invention.
  • One type of logical apparatus that may embody the invention is a computer system as illustrated in 700, containing CPU 707, optional input devices 709 and 711, disk drives 715 and optional monitor 705.
  • Fixed media 717 may be used to program such a system and could represent a disk-type optical or magnetic media or a memory.
  • Communication port 719 may also be used to program such a system and could represent any type of communication connection.
  • the invention also may be embodied within the circuitry of an application specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • ASIC application specific integrated circuit
  • PLD programmable logic device
  • the invention may be embodied in a computer understandable descriptor language which may be used to create an ASIC or PLD that operates as herein described.
  • the invention also may be embodied within the circuitry or logic processes of other digital apparatus, such as cameras, displays, image editing equipment, etc.
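
The soft-wrapping scheme described in the list above (a source description whose FROM clause carries $term$ placeholders and whose WHERE clause holds mapping rules, evaluated at run time) can be sketched as follows. This is a minimal sketch under stated assumptions and not the IDB source description language: the description is held as a Python dictionary, the mapping rules are plain regular expressions applied to the raw page instead of path expressions over a DOM tree, the page is an inline string instead of a fetched document, and the URL, vendor name, and HTML layout are hypothetical.

```python
import re
from urllib.parse import quote_plus

# Hypothetical source description, held as data rather than compiled code.
# Keys loosely mirror the SELECT / FROM / WHERE clauses; the layout is assumed.
SOURCE_DESCRIPTION = {
    "select": ["book#vendor", "book#title", "book#price"],
    "from": "http://books.example/search?title=$book#title$",
    "where": {
        "book#title": r'<td class="title">(.*?)</td>',
        "book#price": r'<td class="price">\$?([\d.]+)</td>',
    },
    # Constant materialized into every row, like the vendor name in the text.
    "constants": {"book#vendor": "example-vendor"},
}


def bind_url(template, bindings):
    """Replace each $term$ placeholder with the URL-encoded query value."""
    return re.sub(
        r"\$([^$]+)\$",
        lambda match: quote_plus(bindings[match.group(1)]),
        template,
    )


def evaluate(description, page):
    """Apply the mapping rules to a result page and return one dict per row."""
    columns = {
        term: re.findall(pattern, page, re.S)
        for term, pattern in description["where"].items()
    }
    row_count = min(len(values) for values in columns.values())
    rows = []
    for i in range(row_count):
        row = dict(description["constants"])
        for term, values in columns.items():
            row[term] = values[i]
        # Project the row onto the exported view, in SELECT order.
        rows.append({term: row.get(term) for term in description["select"]})
    return rows


# Inline stand-in for the HTML page a source would return.
PAGE = """
<table>
<tr><td class="title">Database Systems</td><td class="price">$53.25</td></tr>
<tr><td class="title">Database Design</td><td class="price">$40.00</td></tr>
</table>
"""

url = bind_url(SOURCE_DESCRIPTION["from"], {"book#title": "Database Systems"})
rows = evaluate(SOURCE_DESCRIPTION, PAGE)
print(url)
print(rows)
```

Because the description is ordinary data, a changed page layout would only require registering a new dictionary; nothing is recompiled, which is the property the text attributes to soft wrappers.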

Abstract

A method and system for providing enhanced searching over multiple queryable data sources uses simplified and scalable descriptors for representing queries to multiple data sources.

Description

METHOD AND APPARATUS FOR UNIFIED QUERY INTERFACE FOR NETWORK INFORMATION
TECHNICAL FIELD
The present invention relates to methods and systems for finding or searching information available on a network, in different information formats, and in some embodiments including access over a public network.
BACKGROUND ART
As the number of information sources on the web continues to grow, the need for good information integration technology increases. For this reason, the problem of information integration has received a great deal of attention from the research community and from the commercial community. The differences between the approaches taken by the research community and the commercial community are striking. At the risk of oversimplifying the issue, the research community has focused on the semantic power of the integration technology at the expense of the scalability of the approach, while the commercial community has focused on scalability at the expense of semantic power.
Research systems such as TSIMMIS, the Information Manifold, and Infomaster provide a general query facility over the integrated view of a number of data sources. They are able to support powerful queries and infer implicit joins through the use of a view containment test. While this provides the very real advantage that a user can get an answer to a query without knowing that a join is required to construct this answer (since the system deduces the implicit join "under the covers"), unfortunately, it also renders query planning and execution expensive. Query planning and execution in such systems is expensive because the size of the plan space that must be searched, and of the generated query plan, grows quickly with the number of sources wrapped. For this reason, these systems are most effective at evaluating complex queries over a relatively small number of sites.
In the commercial world, the information integration space is dominated by comparison shopping services. In contrast to the research systems, these systems do not provide general purpose querying - their goal is to be able to evaluate a small number of canned queries (expressed by forms presented to the user) over a large number of sites.
Another very real barrier to scalability that is largely orthogonal to the semantic power of an integration system is the difficulty of "wrapping" new sites, that is, how hard it is to add new information sources to the system. The research community has not really dealt with this scalability issue (there is little call to wrap hundreds of sites in a research prototype), while the commercial community "solves" this problem by employing a small army of programmers to write wrappers, usually aided by some sort of wrapper generation toolkit.
DISCLOSURE OF THE INVENTION
The present invention, in various aspects, involves a method and/or system and/or apparatus for providing a scalable, unified view or search over large numbers of queryable information sources. In specific embodiments, the invention accomplishes this, in part by sacrificing some expressive power in the set of queries supported.
A system according to one embodiment of the invention provides scalability through three main techniques. First, it uses a collection of ontologies organized into hierarchical namespaces as a medium for expressing data semantics. Second, it employs a declarative query language to describe information sources so that source descriptions can be "executed" at run time instead of being pre-compiled into the system. Third, it utilizes inverted-index style operations to identify a subset of information sources that are relevant to a particular user query.
A further understanding of the invention can be had from the detailed discussion of specific embodiments below. For purposes of clarity, this discussion refers to devices, methods, and concepts in terms of specific examples. However, the method of the present invention may operate in a wide variety of applications. It is therefore intended that the invention not be limited except as provided in the attached claims. Furthermore, it is well known in the art that computer systems can include a wide variety of different components and different functions in a modular fashion. Different embodiments of the present invention can include different mixtures of elements and functions and may group various functions as parts of various elements. For purposes of clarity, the invention is described in terms of systems that include different innovative components and innovative combinations of components. No inference should be taken to limit the invention to combinations containing all of the innovative components listed in any illustrative embodiment in this specification. Furthermore, it is well known in the art of internet applications and software systems that particular file formats, languages, and underlying methods of operation may vary. The disclosure of a particular implementation language or format of an element should not be taken to limit the invention to that particular implementation unless so provided in the attached claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes. The invention will be better understood with reference to the following drawings and detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a functional block diagram of a system overview according to specific embodiments of the invention.
FIG. 2 illustrates two examples of namespaces, according to specific embodiments of the invention.
FIG. 3 illustrates an example syntax for a source description language according to specific embodiments of the invention.
FIG. 4 is a block diagram showing a snapshot of inverted index after registering sample sources.
FIG. 5 is a block diagram showing a representative example logic device in which various aspects of the present invention may be embodied.
BEST MODE FOR CARRYING OUT THE INVENTION
1. System Overview
In particular embodiments, the present invention involves a search system and/or method that employs namespaces in its query facility and "soft-wrapping" of information sources. Figure 1 illustrates the main components of an example system according to the present invention. (One implementation of a system according to the invention is referred to in some associated documents as IDB.) A system according to the present invention aims to provide a general query facility by using a collection of ontologies organized into a hierarchical namespace. Each ontology in a namespace defines a set of terms that describe common concepts. A namespace is used as the medium for expressing data semantics. Both user queries and source descriptions are written using terms in the namespace.
According to further aspects of the invention, a query language is provided (sometimes referred to as IDBQL) that is based on SQL. Queries are expressed using terms from a namespace. When writing a query, users do not need to know about the exported views of each individual information source. Instead, the query engine will identify a set of relevant information sources by using terms that appear in the query to probe an inverted index.
Unlike prior art systems (such as TSIMMIS and the Information Manifold), in specific embodiments, the present invention does not generally infer implicit joins. This implies that the invention can answer only a subset of the queries handled by systems that infer implicit joins. However, this also implies that query planning according to the present invention is much simpler than under these prior systems (it requires only simple inverted list lookup operations) and scales to large numbers of sites.
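
The inverted list lookup mentioned above can be sketched as follows. This is a minimal sketch and not the IDB implementation: it assumes two indexes, one over the terms each source exports and one over the terms each source accepts as bound query inputs, and intersects the probe results. The source names and term sets follow the sample sources discussed elsewhere in this document; the function and variable names are hypothetical.

```python
from collections import defaultdict

# Illustrative registry: source name -> (exported terms, bound terms).
# A bound term is one the source can accept as a query input.
SOURCES = {
    "amazon": ({"vendor", "title", "author", "year", "price"}, {"title", "author"}),
    "borders": ({"vendor", "title", "author", "price"}, {"title"}),
    "book3": ({"vendor", "title", "price", "isbn"}, {"title"}),
    "book4": ({"author", "isbn"}, {"isbn"}),
}


def build_indexes(sources):
    """Build index A (exported term -> sources) and index B (bound term -> sources)."""
    index_a = defaultdict(set)
    index_b = defaultdict(set)
    for name, (exported, bound) in sources.items():
        for term in exported:
            index_a[term].add(name)
        for term in bound:
            index_b[term].add(name)
    return index_a, index_b


def probe(index, terms, universe):
    """Intersect the posting lists of all given terms."""
    result = set(universe)
    for term in terms:
        result &= index.get(term, set())
    return result


def relevant_sources(sources, needed_terms, bound_terms):
    """Sources that export every needed term and accept every bound term."""
    index_a, index_b = build_indexes(sources)
    exporters = probe(index_a, needed_terms, sources)
    answerers = probe(index_b, bound_terms, sources)
    return exporters & answerers


# Book base table of the running example: four projected terms, title bound.
print(sorted(relevant_sources(
    SOURCES, {"vendor", "title", "author", "price"}, {"title"})))
# prints ['amazon', 'borders']
```

Planning here is a handful of set intersections per base table, which is why its cost stays small as sources are added, in contrast to a search over a plan space.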
2. Soft-Wrapping
The present invention in further aspects utilizes a novel approach referred to as soft-wrapping to wrap information sources. According to the present invention, a "wrapper" is a declarative query evaluated at runtime. Source descriptions can be executed or evaluated at run time instead of being pre-compiled into the system (or "hard-wrapped"). The advantages of soft-wrapping over hard-wrapping are many. First, it is more flexible and portable, because the writing of source descriptions is independent of any run-time environment. Second, soft-wrappers can be tested and registered dynamically at runtime through a Web interface, without having to restart the system. Third, it is easy to adapt to dynamically changing Web data sources, as recompilation is not needed. Finally, soft wrapping is more secure in that what is registered is a declarative query, and not a pre-compiled wrapper program that must be trusted by whoever executes the wrapper.
3. Namespaces
The present invention, in particular embodiments, uses a collection of ontologies organized into a hierarchical namespace as a medium for expressing data semantics. An ontology according to the invention is a grouping of terms describing a concept. The terms in the ontologies are reusable. When defining an ontology, one can borrow existing terms from other ontologies in the namespace as well as create new terms. An ontology can selectively inherit (or reuse) any subset of the parent ontology. Inheritance from multiple ontologies is also allowed.
The IDB namespace functions as a global schema that provides a uniform view over information on the Web. It is an a priori schema as opposed to the a posteriori schema of some prior art systems. In TSIMMIS, for example, user queries are formulated over the view exported by a mediator. The mediated view is, in turn, generated by integrating views of lower level mediators or data sources. As a result, any source level change, such as adding a new source or dropping an existing source, may affect the upper level mediated view over which user queries are formulated. A namespace according to the current invention is defined independently from the views of data sources. In fact, the source view is defined using the terms in the namespace. Because of this, information source level changes do not affect the global view.
In particular embodiments, the invention uses a simple collection of terms as the global schema. In the future, if XML namespaces become prevalent, these could be used in place of the IDB ontology. By adopting XML namespaces, the invention would then be able to reuse a large number of widely used namespaces as schema without having to reinvent them.
FIG. 2 illustrates two examples of namespaces, according to specific embodiments of the invention. The movie ontology consists of terms that may be useful to describe movies. The term product#name from the product namespace is reused in the movie namespace as movie#title. It is advantageous to reuse existing terms, as this increases the number of information sources that can contribute to a given query. For instance, if a user queries on the name of a product using the product ontology, then information sources belonging to the book and movie ontologies are also queried in addition to sources directly belonging to the product ontology. This is because the book#title and movie#title terms are inherited from the product#name term in the product namespace.
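The effect of term reuse on query reach can be sketched in a few lines of Python. This is a minimal illustration, not the IDB implementation: the `INHERITS_FROM` table and the `expand_term` helper are hypothetical names, and only the product#name, book#title, and movie#title terms come from the text above.

```python
# Hypothetical inheritance table: child term -> list of parent terms it reuses.
# A list is used because inheritance from multiple ontologies is allowed.
INHERITS_FROM = {
    "book#title": ["product#name"],
    "movie#title": ["product#name"],
}


def expand_term(term):
    """Return the term plus every term that inherits from it, directly or indirectly."""
    expanded = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parents in INHERITS_FROM.items():
            if current in parents and child not in expanded:
                expanded.add(child)
                frontier.append(child)
    return expanded


# A query on product#name also reaches sources described with the inherited terms.
print(sorted(expand_term("product#name")))
# prints ['book#title', 'movie#title', 'product#name']
```

Expansion runs only from parent to child, so a query on book#title does not reach movie sources.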
4. Source Description Language According to Specific Embodiments of the Invention
A query system according to the invention interacts with information sources using source descriptions. The role of source descriptions is twofold: (1) They export the views and capabilities of information sources; (2) They extract and map local data in the described source to the exported view of the source. Unlike traditional "hard wrapping," the present invention, according to specific embodiments, uses a "soft wrapping" scheme that allows source descriptions to be executed at query evaluation time. The source description is, in fact, a query language that "queries" a remote document or database. As a result, IDB does not require hard-coded or compiled wrappers to communicate with sources. Prior art "hard wrappers" generally require recompilation each time an information source changes its data presentation.
An example syntax of the source description language is as follows:
SELECT list-of-terms
FROM url [post | get] [html | xml]
WHERE mapping-rule [[and | or] mapping-rule] ...
The SELECT clause defines the exported view; the FROM clause specifies the location of the remote database and its query capability; and the WHERE clause defines the mapping rules.
As a further example, Figure 3 shows a source description for amazon.com. After evaluating the source description of amazon.com, an eight-column table of vendor, title, etc. will be generated.
The execution starts by evaluating the FROM clause. The FROM clause specifies the location of amazon.com's book database and the query binding that it accepts. Amazon.com's book database is published on the Web through a front-end form interface. This form interface accepts user inputs on the title and author fields, and this information is encoded in the url string in the FROM clause. In the case where the target information source is a document, the url of the document can simply be placed in the FROM clause without any query binding encoding.
Once IDB has rewritten the user query into local queries, the placeholders $book#title$ and $book#author$ will be replaced with the corresponding values from the user query. After it opens a url connection and sends the query string, the query result will be returned from the source in an HTML page. The HTML page is parsed into a DOM tree [DOM98]. If a source returns an XML page, then IDB will invoke an XML parser instead to generate the DOM tree. After this parsing step, the remaining query processing steps are transparent to both XML and HTML since the DOM interface is generic to both markup languages.
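The placeholder replacement step can be sketched as follows. This is an illustration under stated assumptions: the URL template is hypothetical (the actual amazon.com URL of Figure 3 is not reproduced here), `bind_url` is an invented helper name, and replacing an unbound placeholder with an empty string is an assumption, since the text does not say how a missing binding is handled.

```python
import re
from urllib.parse import quote_plus

# Matches placeholders of the form $ontology#term$, e.g. $book#title$.
PLACEHOLDER = re.compile(r"\$([A-Za-z_]\w*#[A-Za-z_]\w*)\$")


def bind_url(template, bindings):
    """Replace each $ontology#term$ placeholder with its URL-encoded value.

    Unbound placeholders become empty strings (an assumption of this sketch).
    """
    def substitute(match):
        return quote_plus(bindings.get(match.group(1), ""))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template standing in for the url string of a FROM clause.
template = "http://books.example.com/search?title=$book#title$&author=$book#author$"
print(bind_url(template, {"book#title": "Database Systems"}))
# prints http://books.example.com/search?title=Database+Systems&author=
```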
The WHERE clause consists of a set of path expressions and perl-style text operations. The path expressions are evaluated over the DOM tree generated from the result page. The syntax of the path expressions is like that of HEL [SA99a, SA99b] and WIDL [All97]. HEL also supports perl-style pattern matching. The IDB source description language, however, allows direct mapping from path expressions to the exported view and provides a larger set of text operations. Further, it allows the conjunction and disjunction of path expressions. For instance, depending on the user query binding, the amazon.com database returns two different types of HTML pages. If the user query binding results in exactly one book entry, it directly returns the HTML page that contains the full book description. Otherwise, it returns an HTML page that contains a list of matching book entries, where each book entry has a short description and a url to the book page. Different path expressions are needed for each of these cases, as shown in Figure 3.
A source description language according to specific embodiments supports popular perl regular expression operations such as match, substitute, join, and split, as well as a custom-designed switch operator. The switch operator is used to normalize the irregularity of output data across multiple sources. For example, some sources represent product availability in graphical symbols, and these must be transformed into their text equivalents. The dot (.) in a path expression denotes the direct path from a parent element to its child, and the arrow (->) indicates that zero or more steps exist in between.
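The dot and arrow semantics can be illustrated with a small evaluator over a parsed tree. This is a minimal sketch, not the IDB path language: it assumes paths contain only tag names, dots, and arrows, that the first name identifies the root element, and that the page is well-formed markup so that Python's `xml.etree.ElementTree` can stand in for a DOM parser. The sample page and the names `parse_path` and `evaluate` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def parse_path(path):
    """Split 'a.b->c' into [('root', 'a'), ('child', 'b'), ('descendant', 'c')]."""
    tokens = re.split(r"(->|\.)", path)
    steps = [("root", tokens[0])]
    for separator, tag in zip(tokens[1::2], tokens[2::2]):
        steps.append(("child" if separator == "." else "descendant", tag))
    return steps


def evaluate(root, path):
    """Return the elements selected by the path, in document order per step."""
    steps = parse_path(path)
    if root.tag != steps[0][1]:
        return []
    current = [root]
    for axis, tag in steps[1:]:
        matched = []
        for node in current:
            if axis == "child":
                matched.extend(node.findall(tag))  # direct children only
            else:
                # iter() walks the whole subtree and yields the node itself too.
                matched.extend(e for e in node.iter(tag) if e is not node)
        seen, current = set(), []
        for element in matched:  # drop duplicates reached through nested matches
            if id(element) not in seen:
                seen.add(id(element))
                current.append(element)
    return current


# Hypothetical result page; titles and prices are taken from Example Results 1.
page = ET.fromstring(
    "<html><body><div><table>"
    "<tr><td>Database Management Systems</td><td>55.95</td></tr>"
    "<tr><td>Fundamentals of Database Systems</td><td>79.75</td></tr>"
    "</table></div></body></html>"
)
cells = [td.text for td in evaluate(page, "html.body->tr.td")]
```

Here `html.body->tr.td` selects all four cells, while `html.body.table` selects nothing because the table is not a direct child of body.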
The SELECT clause provides a global semantics for the local data. It defines the schema of the table that is generated by the execution of the source description. Note that the constant value 'amazon' is materialized into the book#vendor term as all book entries are coming from the same source, amazon.com. The plus (+) sign at the end of an attribute is shorthand for 'IS NOT NULL'.
Although the exported view of Figure 3 consists of terms only from the book ontology, this is not a requirement of the IDB approach. The IDB source description can choose terms selectively from one or more ontologies. The source description need not conform to any namespace nor have any restriction on choosing sets of terms from various ontologies. This allows the source description language to describe sources using terms that are as close as possible to the original semantics of the data. As is the case with amazon.com, data extracted from the result page can potentially have some nested structure. To map this nested data to a flat output table, IDB employs a set of special iterators that are associated with each output attribute.
As we pointed out earlier, since IDB uses a declarative query language for its source descriptions, the traditional pre-compiled "wrapper" is no longer needed. This "soft wrapper" approach is more scalable since the process of writing, testing, and registering wrappers is not dependent on the hardware and software development environment, thus it can be completely decentralized. In fact, the source descriptions can be tested and registered at runtime through the Internet without bringing down the system. Anyone can write and register the source descriptions from anywhere in the Internet. Also, with the soft wrapper approach it is easier to adjust to dynamically changing Web sources, as they need not be recompiled each time the source changes. Finally, it is more secure in that what is registered is a declarative query and not a pre-compiled wrapper program. That is, using the soft-wrapper approach, a "buggy" wrapper may cause the data from the wrapped source to be mapped incorrectly, but since it is just a declarative query, it does not pose a security risk to the site executing the soft wrapper query.
5. Query Language According to Specific Embodiments of the Invention
This section discusses a query language according to a specific embodiment of the invention (at times referred to herein as IDBQL), primarily through examples. In a particular embodiment, the query language may be understood as a subset of SQL with additional keyword predicates. The keyword predicates are added to support the keyword match operation, which is perhaps the most popular operation in real-world web queries.
A query is formulated using the terms defined in the ontology. The query writer does not need to know about the exported views of each of the individual underlying information sources. A query processor according to specific embodiments of the invention will identify a set of relevant information sources by probing an inverted index using the terms used in the query as described in the next section. A first example query illustrates a basic structure of a query language and the use of the keyword predicate.
SELECT B.vendor, B.title, B.author, B.price, B.year
FROM book B
WHERE B.title ~= 'Database Systems'
Example Query 1
This query uses the "book" ontology and retrieves the vendor, title, author, price, and year of books whose titles contain the keywords 'Database' and 'Systems'. The result table will include book entries with titles such as 'Database Management Systems' and 'Readings in Database Systems'. Partial output of the query is shown below.
vendor: Bookpool
title: Database Management Systems
author: Raghu Ramakrishnan, et al
price: 55.95
year: 1999

vendor: Amazon
title: Fundamentals of Database Systems
author: Ramez A. Elmasri/Shamkant B. Navathe
price: 79.75
year: 1999

vendor: Barnes and Noble
title: A First Course in Database Systems
author: Jeffrey D. Ullman, With Jennifer Widom
price: 59.75
year: 1997
Example Results 1
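The keyword predicate used in Example Query 1 can be sketched as a simple containment test. This is a minimal illustration of the behavior described for that query (every keyword must occur in the attribute value); case-insensitive substring matching and the `keyword_match` name are assumptions of this sketch, not the documented operator semantics.

```python
def keyword_match(value, keywords):
    """Return True when every whitespace-separated keyword occurs in value.

    Matching is case-insensitive and by substring, both assumptions here.
    A null (None) attribute value never matches.
    """
    if value is None:
        return False
    haystack = value.lower()
    return all(word.lower() in haystack for word in keywords.split())


titles = [
    "Database Management Systems",
    "Readings in Database Systems",
    "Operating Systems",
]
matches = [t for t in titles if keyword_match(t, "Database Systems")]
# matches keeps the first two titles and drops "Operating Systems"
```

Because the keywords are tested independently, the order in which they appear in the value does not matter.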
The keyword operators are especially useful because data may come from autonomous information sources. The presentation format of the data may differ across information sources, and even the data within one source may have different presentation formats over time. One common example is the format of a person's name. Some sources put the last name before the first name, and others put the first name first. The query language supports three keyword operators, whose semantics are defined in Table 1.
Figure imgf000012_0001
Table 1. Keyword Operator Semantics
Figure imgf000012_0002
Table 2. Data Type Coercion Rules
The first example query was not very selective, as it returned more than 200 entries from various online book vendors. A second example query adds two more selection conditions to the first query and retrieves book availability information along with the original attributes.
SELECT B.vendor, B.title, B.author, B.price, B.year, B.stock
FROM book B
WHERE B.title ~= 'Database Systems' AND B.author ~= 'Ramakrishnan' AND B.year > 1998
Example Query 2
This query illustrates the use of numeric order predicates and data type coercion. A data model according to the invention is essentially type free. Attribute values are treated as string literals. To evaluate an order predicate (<, >, >=, <=, =), a system uses the Lore [MAG+97] coercion rules as shown in Table 2. For instance, in the above query, if the year attribute is not null and can be parsed into a number, the predicate will be evaluated over two numeric values. In a join predicate, such as book.year = movie.year, both operands are attributes. In this case, one of the attributes is first coerced into an appropriate type before the predicate is evaluated using the rules in Table 2. Part of the result table for the second query is shown below.
vendor: Fatbrain
title: Database Management Systems, Second Edition
author: Ramakrishnan, Raghu / Gehrke, Johannes
price: 53.25
year: 1999
stock: Ships same day

vendor: Bookpool
title: Database Management Systems
author: Raghu Ramakrishnan, et al
price: 55.95
year: 1999
stock: In Stock!

vendor: Amazon
title: Database Management Systems
author: Raghu Ramakrishnan, Johannes Gehrke
price: 75.60
year: 1999
stock: Usually ships in 24 hours
Example Results 2
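The numeric case of the order predicate in Example Query 2 can be sketched as follows. Only the behavior stated in the text is modeled: a non-null string attribute that parses as a number is compared numerically. Treating a null or unparseable value as a failed predicate is an assumption of this sketch; the remaining rules of Table 2 are not reproduced, and the helper names are hypothetical.

```python
import operator

OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "=": operator.eq,
}


def to_number(value):
    """Parse a string (or number) as a float; return None for null or unparseable input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def order_predicate(attribute_value, op, constant):
    """Evaluate `attribute op constant` over two numeric values after coercion."""
    left, right = to_number(attribute_value), to_number(constant)
    if left is None or right is None:
        return False  # assumption: a value that cannot be coerced fails the predicate
    return OPS[op](left, right)


# Attribute values arrive as string literals, as in the type-free data model.
rows = [{"year": "1999"}, {"year": "1997"}, {"year": None}, {"year": "n/a"}]
recent = [row for row in rows if order_predicate(row["year"], ">", 1998)]
```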
Example Query 3 is a simple explicit join query. It is provided to illustrate the case where more than one ontology is involved in a query. It retrieves title, actor of movies and vendor, url, format, price of books where the movies are directed by 'Steven Spielberg', books are written by 'Michael Crichton', and both movie and book have the same title. Part of the result table for this query follows the example.
SELECT M.title, M.actor, B.vendor, B.url, B.format, B.price
FROM book B, movie M
WHERE B.author ~= 'Michael Crichton' AND
M.director ~= 'Steven Spielberg' AND
B.title = M.title
Example Query 3
title: Jurassic Park
actor: Morgan Freeman / Nigel Hawthorne / Anthony Hopkins / Sir Anthony Hopkines
vendor: Amazon
url: http://www.amazon.com/exec/obidos/ASIN/0394588169/ ...
format: Hardcover
price: 18.87

title: Jurassic Park
actor: Morgan Freeman / Nigel Hawthorne / Anthony Hopkins / Sir Anthony Hopkines
vendor: Barnes and Noble
url: http://shop.barnesandnoble.com/booksearch/isbnInquiry.asp?isbn=0345370775..
format: Mass Market Paperback
price: 6.39

title: Jurassic Park
actor: Morgan Freeman / Nigel Hawthorne / Anthony Hopkins / Sir Anthony Hopkines
vendor: Borders
url: http://search.borders.com/fcgi-bin/db2www/search/search.d2w/Details?prodID...
format: Paperback
price: 6.39
Example Results 3
6. Example Query Processing
The following are the steps of query processing according to specific embodiments of the invention.
• A user query is formulated using terms in one or more ontologies.
• The query engine identifies base tables for each ontology used in the query. A base table is determined by identifying the minimum subset of terms in a given ontology that is required to evaluate the user query.
• The query engine retrieves a set of source descriptions for each base table from the source description index. This index is, in fact, an inverted index that associates terms in the ontology with the relevant source descriptions.
• The query engine translates the original user query into local queries using the views exported from the set of source descriptions that were identified in the previous step.
• The query engine materializes local views at each source, unions results by base tables, and processes remaining predicates (e.g. joins between base tables).
These steps are illustrated below using a query example. The following are the two ontologies (book and review) that the example query references. The list of terms in each is shown only for illustration. An ontology is a collection of terms that describe a concept, e.g. book and review in the example; a namespace is the collection of all those ontologies organized into a hierarchical semantic graph.
book (vendor, title, url, author, year, price, format, stock, publisher, isbn)
review (isbn, title, author, year, review)
Example Ontology For Example Query 4
An example query is shown below. It retrieves the vendor, title, author, price, and review attributes of books that have the keywords 'Database' and 'Systems' in their title.
SELECT B.vendor, B.title, B.author, B.price, R.review
FROM book B, review R
WHERE B.title ~= 'Database Systems' AND B.title = R.title
Example Query 4
The first step of query processing is to identify the base tables and the predicate bindings implied for them. To illustrate this process, we represent the above query as rules with adornments (written here after a caret, one letter per argument: b for bound, f for free).
query4^fbfff(vendor, title, author, price, review) :-
    book^fbff(vendor, title, author, price),
    review^bf(title, review),
    title ~= 'Database Systems'
Query 4 in Rules with Adornments
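The base tables that appear in the rule body group the terms of the query by the ontology they belong to. A minimal sketch of that grouping for Example Query 4 is given below; it assumes every term is written as alias.term and that the query has both a FROM and a WHERE clause. The `base_tables` helper is a hypothetical name, and the regular expressions are a convenience of this sketch rather than the parser of the invention.

```python
import re


def base_tables(query):
    """Group the alias.term references of a query by the ontology named in FROM."""
    from_clause = re.search(r"\bFROM\b(.*?)\bWHERE\b", query, re.S | re.I).group(1)
    aliases = {}
    for item in from_clause.split(","):
        ontology, alias = item.split()  # e.g. "book B"
        aliases[alias] = ontology
    tables = {ontology: [] for ontology in aliases.values()}
    # Terms are gathered from the SELECT and WHERE clauses alike.
    for alias, term in re.findall(r"\b([A-Za-z]\w*)\.([A-Za-z]\w*)\b", query):
        if alias in aliases and term not in tables[aliases[alias]]:
            tables[aliases[alias]].append(term)
    return tables


QUERY_4 = """
SELECT B.vendor, B.title, B.author, B.price, R.review
FROM book B, review R
WHERE B.title ~= 'Database Systems' AND B.title = R.title
"""
tables = base_tables(QUERY_4)
# tables["book"] holds vendor, title, author, price; tables["review"] holds review, title
```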
Predicate adornment is used to illustrate how the binding pattern serves as a filter for pruning out irrelevant sources. As shown above, the title is the only variable that is bound in query4. The base tables in this query are book^fbff(vendor, title, author, price) and review^bf(title, review). The way base tables are identified is straightforward: all terms used in the SELECT and WHERE clauses are gathered and grouped into the ontologies that appear in the FROM clause. Note that the base table is different from the global predicates as discussed, for example, with regard to the Information Manifold [LRO96a, LRO96b]. Unlike the predefined set of predicates in the Information Manifold, a base table according to specific embodiments of the invention is dynamically generated by projecting out terms from the particular user query. Also, compared to other systems, this aspect of the invention uses a much simpler way to identify information sources that are relevant to a particular user query. Previous systems, including the Information Manifold and Infomaster [DG97, GKD97], identify information sources through a query rewriting scheme based on the view containment test in their query processing (see [Ull97] for an overview). In contrast, in this aspect, embodiments of the invention utilize inverted index style operations in query processing. To illustrate this, assume the following information sources.
amazon^fbbff(vendor, title, author, year, price)
borders^fbff(vendor, title, author, price)
book3^fbff(vendor, title, price, isbn)
book4^fb(author, isbn)
nytimes^bff(title, author, review)
wpost^bbf(title, isbn, review)
Due to space limitations, only the list of terms that each source exports is shown, instead of full source descriptions. For an example of a full source description, see FIG. 3.
The adornment here has a slightly different meaning than the adornment in the query. It is used here to specify the query capability of a particular source. For instance, the adornment of amazon, fbbff, means that amazon can accept queries on either title or author and returns a table of columns including vendor, title, author, year, and price. Similarly, nytimes (with adornment bff) can answer queries on title and returns a table of title, author, and review.
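Reading a source adornment can be sketched as pairing each exported term with its letter. This is a minimal illustration; the `capabilities` helper and the dictionary keys are hypothetical names, and the amazon and nytimes term lists and adornments are the ones given in the text.

```python
def capabilities(terms, adornment):
    """Pair each exported term with its adornment letter ('b' bound, 'f' free)."""
    if len(terms) != len(adornment) or set(adornment) - {"b", "f"}:
        raise ValueError("adornment must have exactly one 'b' or 'f' per term")
    bound = [term for term, flag in zip(terms, adornment) if flag == "b"]
    return {"exports": list(terms), "accepts_queries_on": bound}


amazon = capabilities(["vendor", "title", "author", "year", "price"], "fbbff")
nytimes = capabilities(["title", "author", "review"], "bff")
# amazon accepts queries on title and author; nytimes accepts queries on title only
```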
When a source description is registered, a system according to a specific embodiment of the invention indexes the source into two inverted indexes. The first inverted index, A, maintains the relation between all terms used in the source description and the identifier of the source description itself. The second index, B, indexes only bound variables (terms). A snapshot of two example indexes is shown in FIG. 4.
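Registration into the two indexes can be sketched with two dictionaries of sets. This is a minimal illustration and not the content of FIG. 4: the `register` function is a hypothetical name, terms are qualified with their ontology (as in book#title) as an assumption of this sketch, and the three registered sources use the term lists and adornments given in the text.

```python
from collections import defaultdict

index_a = defaultdict(set)  # term -> identifiers of sources whose description uses the term
index_b = defaultdict(set)  # term -> identifiers of sources that accept the term as a bound input


def register(source_id, terms, adornment):
    """Index one source description into both inverted indexes."""
    for term, flag in zip(terms, adornment):
        index_a[term].add(source_id)
        if flag == "b":
            index_b[term].add(source_id)


register("amazon",
         ["book#vendor", "book#title", "book#author", "book#year", "book#price"],
         "fbbff")
register("book3",
         ["book#vendor", "book#title", "book#price", "book#isbn"],
         "fbff")
register("nytimes",
         ["review#title", "review#author", "review#review"],
         "bff")
```

Because registration only adds entries to two dictionaries, a new source can be added while the system is running, which is in line with the runtime registration described for soft-wrappers.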
For each base table in the query, a method according to this aspect of the invention identifies a subset of source descriptions using the indexes. In the example query, first probe index A using all terms in the book base table, namely vendor, title, author, and price. The result, amazon and borders, is obtained by intersecting the four resulting inverted lists. Second, probe index B using the bound variables in the book base table. Here, title is the only bound variable in the user query. This probe returns amazon, borders, and book3. Finally, intersect the two results to get the subset of sources that are relevant to the user query, specifically for the book base table. Repeat the same process to get nytimes and wpost for the review base table.
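The two probes and the final intersection for the book base table can be sketched as follows. The index contents below are derived from the example source list in the text and cover book sources only; they are not a copy of FIG. 4. The function names are hypothetical, and returning an empty set when no terms are supplied is an assumption of this sketch.

```python
def intersect_lists(index, terms):
    """Intersect the inverted lists of the given terms (empty input gives an empty set)."""
    lists = [index.get(term, set()) for term in terms]
    return set.intersection(*lists) if lists else set()


def relevant_sources(index_a, index_b, needed_terms, bound_terms):
    """Sources that export every needed term and accept every bound term as input."""
    exporting = intersect_lists(index_a, needed_terms)
    queryable = intersect_lists(index_b, bound_terms)
    return exporting & queryable


# Index A: every term used by each book source.
INDEX_A = {
    "vendor": {"amazon", "borders", "book3"},
    "title": {"amazon", "borders", "book3"},
    "author": {"amazon", "borders", "book4"},
    "year": {"amazon"},
    "price": {"amazon", "borders", "book3"},
    "isbn": {"book3", "book4"},
}
# Index B: bound terms only.
INDEX_B = {
    "title": {"amazon", "borders", "book3"},
    "author": {"amazon"},
    "isbn": {"book4"},
}

book_sources = relevant_sources(
    INDEX_A, INDEX_B,
    needed_terms=["vendor", "title", "author", "price"],
    bound_terms=["title"],
)
print(sorted(book_sources))  # prints ['amazon', 'borders']
```

The source book3 passes the probe of index B but is dropped by the probe of index A because it does not export author.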
All sources in the result have the query capability on the title attribute and can produce a table of the projected columns in each base table. The remaining steps of the query processing are straightforward. The system groups the source descriptions on the base tables and executes them. When executing source descriptions, the placeholders in the FROM clause are replaced by the query binding and encoded into a legal url query string. In the example, the source descriptions of amazon and borders are executed and generate a book table by unioning the results from both sources. Similarly, the source descriptions of nytimes and wpost are executed to generate the review table.
Earlier integration systems based on the view containment test may find more results than this procedure. (For instance, the Information Manifold would generate tuples from the sources book3 and book4 by joining on the attribute isbn.) The present invention, because it does not do such inference, would not find this implicit join. However, by giving up some semantic power, the present invention gains flexibility and scalability.
The last step of query processing is to evaluate join predicates across the ontologies. In the example, the book and review tables are joined on the title attribute.
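The final join can be sketched as a hash join over the two unioned tables. This is a minimal illustration: `join_on` is a hypothetical helper, the book rows reuse values from Example Results 2, and the review row is a placeholder rather than real review data.

```python
from collections import defaultdict


def join_on(left_rows, right_rows, key):
    """Equi-join two lists of row dictionaries on a shared attribute."""
    by_key = defaultdict(list)
    for row in right_rows:
        by_key[row[key]].append(row)
    joined = []
    for left in left_rows:
        for right in by_key.get(left[key], []):
            # On shared column names the left (book) value is kept.
            joined.append({**right, **left})
    return joined


book_table = [
    {"vendor": "Bookpool", "title": "Database Management Systems", "price": "55.95"},
    {"vendor": "Amazon", "title": "Database Management Systems", "price": "75.60"},
    {"vendor": "Amazon", "title": "Fundamentals of Database Systems", "price": "79.75"},
]
review_table = [
    {"title": "Database Management Systems", "review": "(review text placeholder)"},
]
result = join_on(book_table, review_table, "title")
# result has two rows: one per vendor offering the reviewed title
```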
By way of further example, consider a further embodiment in which the first inverted index indexes Source Descriptions (SDs) based on their exported terms and the second inverted index indexes SDs based on their input terms. As an example, assume a user query has an input on term1 and requires term2 and term3 to be exported, and, for simplicity, that all terms come from a single ontology. To identify relevant sources, the first inverted index is probed with term1, term2, and term3, and the resulting lists of sources are intersected. This step identifies all sources that export all three terms that are needed. Then, the second inverted index is probed with term1 only, to identify all sources that are capable of answering queries on term1. Finally, the results from the first index and the second index are intersected to get the sources that can answer queries on term1 and export term2 and term3.
7. Implementation Issues
From the teachings provided herein, it will be seen that wrapping a new source can be done very efficiently, in one implementation taking just 10 minutes on average. The query planning stage is effectively instantaneous; the delay in query evaluation is due to waiting for the information sources to respond. With a multithreaded query engine in a system according to the invention, the delays in waiting for the individual sites to respond overlap.
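Overlapping the waits for several sources can be sketched with a thread pool. This is an illustration only: `fetch` is a stand-in that sleeps to mimic network latency instead of executing a real source description, the function names are hypothetical, and the text does not say how the query engine's threads are organized.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(source):
    """Stand-in for executing one source description against a remote site."""
    time.sleep(0.2)  # simulated response delay
    return [{"vendor": source}]


def fetch_all(sources):
    """Query all sources concurrently and union their rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # map() returns results in input order while the waits run concurrently.
        results = list(pool.map(fetch, sources))
    return [row for rows in results for row in rows]


rows = fetch_all(["amazon", "borders"])
# rows is the union of both sources' results, in source order
```

With one worker per source, the total wait is close to the slowest single response rather than the sum of all responses.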
8. Embodiment in a Programmed Digital Apparatus
The invention or aspects thereof may be embodied in a fixed media or transmissible program component containing logic instructions and/or data that, when loaded into an appropriately configured computing device, cause that device to perform interpolation according to the invention. FIG. 5 is a block diagram showing a representative example logic device in which various aspects of the present invention may be embodied.
FIG. 5 shows digital device 700 that may be understood as a logical apparatus that can read instructions from media 717 and/or network port 719. Apparatus 700 can thereafter use those instructions to direct a method according to the invention. One type of logical apparatus that may embody the invention is a computer system as illustrated in 700, containing CPU 707, optional input devices 709 and 711, disk drives 715 and optional monitor 705. Fixed media 717 may be used to program such a system and could represent a disk-type optical or magnetic media or a memory. Communication port 719 may also be used to program such a system and could represent any type of communication connection.
The invention also may be embodied within the circuitry of an application specific integrated circuit (ASIC) or a programmable logic device (PLD). In such a case, the invention may be embodied in a computer understandable descriptor language which may be used to create an ASIC or PLD that operates as herein described.
The invention also may be embodied within the circuitry or logic processes of other digital apparatus, such as cameras, displays, image editing equipment, etc.
9. Conclusion
The invention has now been explained with regard to specific embodiments. Variations on these embodiments and other embodiments will be apparent to those of skill in the art. The invention therefore should not be limited except as provided in the attached claims.
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

Claims

1. An IDB query engine comprising: a global schema providing a standardized view of information; a resource description that defines the view of information sources using ontologies defined in said global schema and describing how to interact with the sources for extracting information; and a query engine processing user queries in a declarative query language based on SQL.
2. A logic system for querying multiple information sources comprising: a collection of ontologies organized into a hierarchical namespace; a declarative query language able to express data relations using said ontologies; an inverted index to identify information sources relevant to a particular query.
3. The system according to claim 2 wherein both queries and information source descriptions may be expressed using terms in one or more ontologies of said namespaces.
4. The system according to claim 2 wherein both queries and information source descriptions may be expressed using said declarative query language.
5. The system according to claim 2 wherein said declarative query language can express information source descriptions that may be evaluated at run time.
6. The system according to claim 2 wherein said query language is related to SQL.
7. The system according to claim 2 wherein queries are expressed using terms from said namespace.
8. The system according to claim 2 further comprising: a query engine that identifies a set of relevant information sources by using terms appearing in a query to probe said inverted index.
9. The system according to claim 2 wherein said inverted index comprises: a first inverted index identifying sources that can return data relevant to a set of ontologies; and a second inverted index identifying sources that can be searched according to particular terms.
10. The system according to claim 2 wherein when writing a query it is not necessary to know the exported views of individual information sources.
11. The system according to claim 2 wherein terms in one of said ontologies are reusable in other ontologies.
12. The system according to claim 2 wherein when defining an ontology existing terms from other ontologies in a namespace may be borrowed as well as creating new terms.
13. The system according to claim 2 wherein an ontology can selectively inherit (or reuse) any subset of terms in a parent ontology.
14. The system according to claim 2 wherein an ontology can selectively inherit (or reuse) any subset of terms in multiple ontologies.
15. The system according to claim 2 wherein a namespace functions as a global schema that provides a uniform view over information from multiple information sources.
16. The system according to claim 2 wherein a namespace is a collection of ontologies organized into a hierarchical graph of terms.
17. The system according to claim 2 wherein a namespace is defined independently from views provided by various information sources and because of this, information source level changes do not affect a global view provided by said namespace.
18. The system according to claim 2 wherein a source view is defined using terms in a namespace.
19. The system wherein a source view is defined using terms in a namespace.
20. The system according to claim 2 wherein at least one of said namespaces is an XML namespace.
21. The system according to claim 2 further comprising: a soft-wrapper for at least one of said information sources wherein said soft-wrapper is a declarative query evaluated at runtime.
22. The system according to claim 2 wherein said information source descriptions may be written independent of any run-time environment.
23. The system according to claim 2 wherein soft-wrappers can be tested and registered dynamically at runtime through a network interface, without having to restart the system.
24. The system according to claim 2 wherein soft wrapping is more secure because what is registered is a declarative query and not a pre-compiled wrapper program.
25. A method of query processing comprising: receiving a user query formulated using terms from one or more ontologies in the namespace; identifying base tables for ontologies used in said query wherein a base table is determined by identifying a minimum subset of terms in an ontology that is required to evaluate said user query; identifying base tables and the predicate binding implied for the base tables; retrieving a set of source descriptions for each identified base table; translating said user query using views exported from said set of source descriptions; materializing local views at each source; receiving results.
26. A method according to claim 25 further comprising: uniting results provided from two or more base tables.
27. A method according to claim 25 further comprising: processing remaining predicates, such as joins between base tables.
28. A method according to claim 25 wherein said set of source descriptions are retrieved from a source description index that is an inverted index that associates terms to relevant source descriptions.
29. A method according to claim 25 wherein queries may be represented in rules with adornments wherein said adornments specify filtering for pruning out irrelevant sources.
30. A method according to claim 25 wherein base tables are identified by gathering terms used in a SELECT and WHERE clause of said query and grouped into ontologies that appear in a FROM clause of said query.
31. A method according to claim 25 wherein a base table is dynamically generated by projecting out terms from a particular user query.
32. A method according to claim 25 wherein information sources that are relevant to a particular user query are identified utilizing inverted index style operations.
33. A method according to claim 25 wherein query sources may be indicated with an adornment that specifies query capability of a particular source.
34. A method according to claim 25 wherein when a source description is registered said source description is indexed into multiple inverted indexes.
35. A method according to claim 25 wherein said inverted indexes comprise: a first inverted index storing relations between all terms used in said source description and said source description; and a second inverted index storing relations between bound terms used in said source description and said source description.
36. A method according to claim 25 further comprising: intersecting two results to get the subset of sources that are relevant to the user query.
37. A method according to claim 25 wherein when executing source descriptions, the placeholders in the FROM clause are replaced by the query binding and encoded into a legal url query string.
PCT/KR2001/000886 2000-06-01 2001-05-26 Method and apparatus for unified query interface for network information WO2001093599A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001260758A AU2001260758A1 (en) 2000-06-01 2001-05-26 Method and apparatus for unified query interface for network information

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US20854400P 2000-06-01 2000-06-01
US60/208,544 2000-06-01
US67605000A 2000-09-28 2000-09-28
US09/676,050 2000-09-28

Publications (2)

Publication Number Publication Date
WO2001093599A2 true WO2001093599A2 (en) 2001-12-06
WO2001093599A3 WO2001093599A3 (en) 2002-08-08

Family

ID=26903282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2001/000886 WO2001093599A2 (en) 2000-06-01 2001-05-26 Method and apparatus for unified query interface for network information

Country Status (3)

Country Link
KR (1) KR20010109206A (en)
AU (1) AU2001260758A1 (en)
WO (1) WO2001093599A2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100484138B1 (en) * 2002-05-08 2005-04-18 삼성전자주식회사 XML indexing method for regular path expression queries in relational database and data structure thereof.
EP1467289A1 (en) * 2003-04-07 2004-10-13 Deutsche Thomson-Brandt Gmbh database model for hierarchical data formats
KR100551954B1 (en) * 2003-12-04 2006-02-20 한국전자통신연구원 System and Method of concept-based retrieval model of protein interaction networks with gene ontology
KR100704285B1 (en) * 2004-06-02 2007-04-10 인하대학교 산학협력단 Apparatus and method for constructing ontology of product data using resource description framework
KR100815563B1 (en) * 2006-08-28 2008-03-20 한국과학기술정보연구원 System and method for knowledge extension and inference service based on DBMS
KR100834536B1 (en) * 2006-09-28 2008-06-02 한국전자통신연구원 Method for information display based on ontology
KR100863121B1 (en) * 2007-01-12 2008-10-15 한국문화콘텐츠진흥원 Ontology search system
KR100993288B1 (en) 2008-12-15 2010-11-09 한국과학기술정보연구원 System and Method for Efficient Reasoning Using View in DBMS-based RDF Triple Store

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5692180A (en) * 1995-01-31 1997-11-25 International Business Machines Corporation Object-oriented cell directory database for a distributed computing environment
WO2000005664A1 (en) * 1998-07-24 2000-02-03 Jarg Corporation Search system and method based on multiple ontologies
US6061675A (en) * 1995-05-31 2000-05-09 Oracle Corporation Methods and apparatus for classifying terminology utilizing a knowledge catalog

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659724A (en) * 1992-11-06 1997-08-19 Ncr Interactive data analysis apparatus employing a knowledge base
US5737591A (en) * 1996-05-23 1998-04-07 Microsoft Corporation Database view generation system
US6076092A (en) * 1997-08-19 2000-06-13 Sun Microsystems, Inc. System and process for providing improved database interfacing using query objects
KR100303153B1 (en) * 1997-12-27 2001-11-22 윤덕용 System for storing and searching html document
KR100625422B1 (en) * 1999-12-07 2006-09-18 주식회사 케이티 Method for integrating schema using multidatabase query language

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002088988A1 (en) * 2001-04-30 2002-11-07 The Commonwealth Of Australia Data processing architecture
US7027055B2 (en) 2001-04-30 2006-04-11 The Commonwealth Of Australia Data view of a modelling system
US7085683B2 (en) 2001-04-30 2006-08-01 The Commonwealth Of Australia Data processing and observation system
US7250944B2 2001-04-30 2007-07-31 The Commonwealth Of Australia Geographic view of a modelling system
US8121973B2 (en) 2001-04-30 2012-02-21 The Commonwealth Of Australia Event handling system
WO2022134878A1 (en) * 2020-12-21 2022-06-30 中兴通讯股份有限公司 Data processing method and apparatus, data querying method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
WO2001093599A3 (en) 2002-08-08
AU2001260758A1 (en) 2001-12-11
KR20010109206A (en) 2001-12-08

Similar Documents

Publication Publication Date Title
Hai et al. Constance: An intelligent data lake system
US7581170B2 (en) Visual and interactive wrapper generation, automated information extraction from Web pages, and translation into XML
Walmsley XQuery
Josifovski et al. Garlic: a new flavor of federated query processing for DB2
US6708186B1 (en) Aggregating and manipulating dictionary metadata in a database system
US5768578A (en) User interface for information retrieval system
US8983931B2 (en) Index-based evaluation of path-based queries
WO2007098320A2 (en) Apparatus and method for federated querying of unstructured data
Zhang et al. Light-weight domain-based form assistant: querying web databases on the fly
CA2473446A1 (en) Identifier vocabulary data access method and system
JP5048956B2 (en) Information retrieval by database crawling
Amann et al. Integrating ontologies and thesauri for RDF schema creation and metadata querying
WO2001093599A2 (en) Method and apparatus for unified query interface for network information
Heibi et al. Enabling text search on SPARQL endpoints through OSCAR
Liu et al. An XML-enabled data extraction toolkit for web sources
Bhowmick et al. Web data management: a warehouse approach
Fuentes‐Lorenzo et al. A RESTful and semantic framework for data integration
May et al. Information extraction from the Web
Manghi et al. Hybrid applications over XML: Integrating the procedural and declarative approaches
Campi et al. Designing service marts for engineering search computing applications
Srivastava et al. Enhancing asset search and retrieval in a services repository using consumption contexts
Yerneni Mediated query processing over autonomous data sources
Roth et al. An Architecture for Transparent Access to Diverse Data Sources
Goldman Integrated query and search of databases, XML, and the Web
Kang et al. IDB: Toward the scalable integration of queryable internet data sources

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FORM 1205A OF 210303)

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP