WO2010137940A1 - A method and system for extendable semantic query interpretation


Publication number
WO2010137940A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
natural language
statement
internal
result
Application number
PCT/MY2010/000084
Other languages
French (fr)
Inventor
Arun Anand Sadanandan
Kow Weng Onn
Mohammad Reza Beik Zadeh
Dickson Lukose
Original Assignee
Mimos Berhad
Application filed by Mimos Berhad
Publication of WO2010137940A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion

Definitions

  • the present invention relates to a method and system for extendable semantic query interpretation.
  • semantic query engines are becoming increasingly accepted as the preferred query processing means to provide meaning-based queries and retrieval of documents from a semantic based Web.
  • Resources in the semantic based Web are semantically annotated in Resource Description Framework (RDF) triple format to provide for ontology knowledge bases.
  • RDF Resource Description Framework
  • AI Artificial Intelligence
  • SPARQL or AI language query format has been well defined to query ontology knowledge bases with its powerful inference engine
  • the query layer remains as the building block for developing semantic technology applications. This is because a user is required to know exactly how and where the information is stored in the knowledge base in order to perform a query on the same. This hinders the efforts of providing a simple and practical approach of performing a query on the semantic based web.
  • the method comprises receiving a structured natural language user query, interpreting the structured natural language user query, generating from the structured natural language user query, an internal query representation statement, performing query enrichment to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query to provide for searching at least one query result from an ontology knowledge base, generating at least one internal query result representation statement and converting the at least one internal query result representation statement to a structured natural language result.
  • the system comprises an intelligent word sense, a semantic query interpreter, a query transformer, a query enricher and a natural language generator.
  • the intelligent word sense comprises means for receiving a structured natural language user query.
  • the semantic query interpreter comprises means for interpreting the structured natural language user query.
  • the query transformer comprises means for generating from the structured natural language user query, an internal query representation statement.
  • the query enricher comprises means for performing query enrichment to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query to provide for searching at least one query result from an ontology knowledge base and generating at least one internal query result representation statement.
  • the natural language generator comprises means for converting the at least one internal query result representation statement to a structured natural language result.
  • FIG. 1 illustrates a flowchart of a method for an extendable semantic query interpretation.
  • FIG. 2 illustrates a block diagram of a system for an extendable semantic query interpretation.
  • this specification will describe the present invention according to the preferred embodiments of the present invention.
  • limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention and it is envisioned that those skilled in the art may devise various modifications and equivalents without departing from the scope of the appended claims.
  • the embodiments of the present invention relate to a method and system for extendable semantic query interpretation.
  • the present invention provides an extendable semantic query interpretation to enable a user to query information stored in ontology knowledge bases, which are essentially knowledge bases in Resource Description Framework (RDF) format, without having in-depth understanding of the underlying knowledge base.
  • the embodiments of the present invention simplify the process of generating valid query statements by creating knowledge base compliant query statements from the user's query, which is received in the form of a structured natural language query. Further, the embodiments of the present invention also translate the query results obtained from the ontology knowledge base in the Resource Description Framework (RDF) format into a structured natural language result. Hence, this substantially eliminates the necessity for the user to explicitly understand the structure of the knowledge base to perform a query on the knowledge base. Additionally, the user is shielded from the complexity of the query processing while having the liberty of constructing complex queries.
  • the embodiments of the present invention address the issue of meaningful query retrieval by employing query enrichment techniques.
  • the user's query terms residing in the form of a structured natural language query are expanded by making reference to a lexical database such as WordNet. Additionally, different word senses, synonyms, antonyms and the like are used to expand the user's query to enable comprehensive and meaningful query processing.
  • FIG. 1 illustrates a flowchart of the method (100) for an extendable semantic query interpretation.
  • FIG. 2 illustrates a block diagram of the system (200) for an extendable semantic query interpretation.
  • the method (100) for the extendable semantic query interpretation comprises receiving a structured natural language user query (102), interpreting the structured natural language user query (104), generating from the structured natural language user query, an internal query representation statement (106), performing query enrichment (108) to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query (110) to provide for searching at least one query result from an ontology knowledge base, generating at least one internal query result representation statement (112) and converting the at least one internal query result representation statement to a structured natural language result (114).
  • the system (200) for the extendable semantic query interpretation comprises an intelligent word sense (202), a semantic query interpreter (206), a query transformer (208), a query enricher (210) and a natural language generator (218).
  • These components operate in tandem with a parser (204), a query module (214), a lexical database (212) and a knowledge base (220) as a semantic query engine for the purpose of providing the extendable semantic query interpretation while the user performs a query on the knowledge base (220).
  • the intelligent word sense (202) is the interface module between the user and the semantic query engine. It has the purpose of preprocessing the query received from the user for the ontology knowledge base query.
  • the intelligent word sense (202) receives the user's query in the form of a structured natural language user query (102).
  • the intelligent word sense (202) provides a plurality of auto-completion suggestions for the terms of the structured natural language user query as the same is being received from the user.
  • the auto-completion suggestions are obtained from a plurality of terms available in the ontology knowledge base (220).
  • the plurality of terms is defined by a list of concepts and relations in the ontology knowledge base (220).
  • This feature of auto-completion suggestions assists the user while typing the query statement in the structured natural language form by providing auto-completion suggestions from terms available in the ontology knowledge base (220) as an option to be adopted by the user.
  • This allows the user to provide a query statement without having knowledge of the variety of complex terms available in the ontology knowledge base (220). Furthermore, this feature provides the basis of efficient and accurate query statements in which the occurrence of typographical errors is reduced.
  • the query statement received as the structured natural language user query by the intelligent word sense (202) is then forwarded to the semantic query interpreter (206). In this module, the structured natural language user query (104) is interpreted before further processing by the query transformer (208).
  • the semantic query interpreter (206) comprises means for interpreting the structured natural language user query (104), which includes two steps. Firstly, using a parser (204), the structured natural language user query is parsed to identify the Part-of-Speech (POS) information of each term in the query statement.
  • the parser (204) may comprise a lexical parser.
  • the Part-of-Speech (POS) information is identified using a word-level tagging lookup.
  • the second step comprises identifying a type of query by matching the parsed structured natural language user query to a set of configurable query templates. From the parsed structured natural language user query, the sub-phrases are chunked together such that by matching the same with a set of configurable query templates, the type of query may be identified.
  • the set of configurable query templates is a predefined set of possible identified query types.
  • the complexity of the structured natural language user query is determined at this stage, that is, whether the structured natural language user query is a union, intersection or negation type of query.
  • the complexity of the user's query may be enhanced by the extensiveness of the set of configurable query templates.
  • the parsed structured natural language user query is classified as being the type of query as follows: "What/WP is/VBZ NP1 IN NP2", where NP1 would refer to "the symptoms" and NP2 would refer to "essential hypertension".
  • the information pertaining to the parsed structured natural language user query and the type of query is forwarded to the query transformer (208) for further processing.
  • the query transformer (208) comprises a Subject-Predicate-Object transformer (208).
  • the Subject-Predicate-Object transformer (208) comprises means for generating from the structured natural language user query, an internal query representation statement (106).
  • the internal query representation statement is generated in the form of Resource Description Framework (RDF) triple format.
  • the Subject-Predicate-Object transformer (208) receives the parsed structured natural language user query and the type of query. Based on the Part-of-Speech (POS) information from the parsed structured natural language user query and the type of query of the same, the Subject-Predicate-Object transformer (208) extracts a subject part, a predicate part and an object part of the structured natural language user query. The subject part, the predicate part and the object part are used to generate at least one internal query representation statement in the Resource Description Framework (RDF) triple format.
  • the subject part refers to the root term of the query statement and the predicate part refers to the term of the query statement connecting the subject part to the object part. Since Resource Description Framework (RDF) stores triples of data in the ontology knowledge base (220), a query may be performed on either one of the subject part, predicate part or the object part.
  • the Subject-Predicate-Object transformer (208) then tags an argument of the at least one internal query representation statement by tagging the subject part, the predicate part or the object part for searching the at least one query result from the ontology knowledge base (220).
  • the tag used for tagging the argument comprises a symbol, normally a question mark, "?".
  • the query enricher (210) comprises means for performing query enrichment (108) to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query (110) to provide for searching at least one query result from an ontology knowledge base (220) and generating at least one internal query result representation statement (112).
  • the query enricher (210) performs query enrichment (108) to generate the at least one enriched internal query representation statement using syntactic matching and semantic matching using a lexical database (212). Query enrichment enables comprehensive and meaningful query processing.
  • Syntactic matching determines the degree or score of matching between terms in the query statement and terms in the ontology knowledge base (220) using syntactic distance techniques such as TF-IDF, Jaro-Winkler and the like. If the matching score is higher than a predetermined threshold score value, which is manually set, the terms are considered matched; otherwise, they are considered not matched. In the case where the terms are matched, the original term in the query statement will be substituted with the term identified from the ontology knowledge base (220).
  • Semantic matching identifies terms in the ontology knowledge base (220) that have a similar meaning to terms in the internal query representation statement. Semantic matching may have several levels of matching including synonym matching, associative matching, "kind-of" matching and "part-of" matching.
  • semantic matching is achieved by matching the terms in the internal query representation statement with terms in a lexical database (212) such as WordNet from which synsets of a term are retrieved and utilized to generate the at least one enriched internal query representation statement.
  • the synsets comprise nouns, verbs, adjectives and adverbs that are grouped into sets of cognitive synonyms, each expressing a distinct concept.
  • query enrichment performed on the internal query representation statement as follows:
  • the query enricher (210) then generates from the at least one enriched internal query representation statement, the knowledge base compliant query (110) to provide for searching the at least one query result from the ontology knowledge base (220). This begins with the query enricher (210) combining the at least one enriched internal query representation statement generated from query enrichment (108). The at least one enriched internal query representation statement is combined to form a valid query statement. Using the valid query statement and based on the properties of the structured natural language user query, the query enricher (210) automatically identifies which knowledge base compliant query statement is to be generated, either a SPARQL query statement or a PROLOG query statement. This knowledge base compliant query statement is used for searching the at least one query result from the ontology knowledge base (220).
  • the query module (214) searches the at least one query result from the ontology knowledge base (220).
  • the query module (214), specifically a Resource Description Framework (RDF) query module, receives the knowledge base compliant query statement from the query enricher (210) in the form of Resource Description Framework (RDF) triple format for a SPARQL query and a PROLOG statement for a PROLOG query. The same is then validated to generate a validated query statement.
  • the query module (214) uses the validated query statement to search for the at least one query result in the ontology knowledge base (220) residing in an appropriate knowledge base server.
  • the query module (214) then returns the at least one query result from the ontology knowledge base (220) to the query enricher (210).
  • the query module (214) returns "headache", "visual_disturbance" and "vomiting" as the at least one query result obtained from the ontology knowledge base (220).
  • the query enricher (210) finally generates the at least one internal query result representation statement (112) by integrating the at least one query result obtained from the query module (214) with the internal query representation statement generated by the Subject-Predicate-Object transformer (208). This is performed by an answer modifier module of the query enricher (210).
  • the answer modifier module replaces the tagged argument of the internal query representation statement with the at least one query result.
  • the answer modifier then tags the at least one query result to generate the at least one internal query result representation statement.
  • the tag used for tagging the at least one query result comprises a symbol, normally an exclamation mark, "!".
  • the final step for the extendable semantic query interpretation comprises the natural language generator (218) converting the at least one internal query result representation statement to a structured natural language result (114).
  • the natural language generator (218) generates the structured natural language result using the structured natural language user query and the at least one internal query result representation statement. More specifically, the natural language generator (218) utilizes the Part-of-Speech (POS) information from the parsed structured natural language user query to generate the structured natural language result by considering linguistic features such as tense, part of speech, vowels, consonants and the like.
  • POS Part-of-Speech
  • the at least one internal query result is provided to the user as the structured natural language result instead of merely a list of query results.
  • the structured natural language result may comprise a plurality of structured natural language results.
  • the structured natural language result for the structured natural language query "What are the symptoms of essential hypertension?" is as follows: "The symptoms of essential hypertension are: Headache Visual_disturbance Vomiting".
  • the structured natural language result will be either a union or intersection of the at least one internal query result obtained from the query module (214).
  • the union or intersection operation is dependent on the type of query as identified by the semantic query interpreter (206).
  • With a PROLOG statement for a PROLOG query, it is possible to create rules that may infer new connections or relationships between concepts in the ontology knowledge base (220). These new connections or relationships may be used for searching the at least one query result from the ontology knowledge base (220). For example, in a given ontology knowledge base that has a "hasParent" relationship, it is possible to infer other relationships such as 'grandparent' and 'sibling'.
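The rule-based inference described in the last item can be illustrated with a minimal sketch. It is written in Python rather than PROLOG, and it is only an assumption about how such rules might behave: the facts, the person names and the function names `grandparents` and `siblings` are hypothetical and do not come from the specification.

```python
# Hypothetical "hasParent" facts stored as (child, parent) pairs.
HAS_PARENT = {
    ("ann", "bob"),
    ("carl", "bob"),
    ("bob", "dora"),
}


def grandparents(has_parent):
    """Infer (person, grandparent) pairs by chaining two hasParent facts."""
    return {
        (child, grandparent)
        for (child, parent) in has_parent
        for (middle, grandparent) in has_parent
        if parent == middle
    }


def siblings(has_parent):
    """Infer (person, sibling) pairs for distinct people sharing a parent."""
    return {
        (a, b)
        for (a, parent_a) in has_parent
        for (b, parent_b) in has_parent
        if parent_a == parent_b and a != b
    }
```

Each function plays the role of one rule: the inferred pairs are not stored in the knowledge base but are derived from the existing "hasParent" facts at query time.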

Abstract

A method (100) and a system (200) for an extendable semantic query interpretation, the system (200) comprises an intelligent word sense (202), a semantic query interpreter (206), a query transformer (208), a query enricher (210) and a natural language generator (218). The intelligent word sense (202) comprises means for receiving a structured natural language user query (102). The semantic query interpreter (206) comprises means for interpreting the structured natural language user query (104). The query transformer (208) comprises means for generating from the structured natural language user query, an internal query representation statement (106). The query enricher (210) comprises means for performing query enrichment (108) to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query (110) to provide for searching at least one query result from an ontology knowledge base (220) and generating at least one internal query result representation statement (112). The natural language generator (218) comprises means for converting the at least one internal query result representation statement to a structured natural language result (114).

Description

A METHOD AND SYSTEM FOR EXTENDABLE SEMANTIC QUERY INTERPRETATION
FIELD OF INVENTION
The present invention relates to a method and system for extendable semantic query interpretation.
BACKGROUND ART
The use of semantic query engines is becoming increasingly accepted as the preferred query processing means to provide meaning-based queries and retrieval of documents from a semantic based Web. Resources in the semantic based Web are semantically annotated in Resource Description Framework (RDF) triple format to provide for ontology knowledge bases.
However, to query a Resource Description Framework (RDF) triple or ontology knowledge base, a user is required to have knowledge of an SQL-like language such as SPARQL or an Artificial Intelligence (AI) language such as PROLOG or LISP. While this is considered an acceptable option for developers and the technically inclined, general users need not be required to have such in-depth knowledge to perform a query on the semantic based web.
Although SPARQL or AI language query format has been well defined to query ontology knowledge bases with its powerful inference engine, the query layer remains as the building block for developing semantic technology applications. This is because a user is required to know exactly how and where the information is stored in the knowledge base in order to perform a query on the same. This hinders the efforts of providing a simple and practical approach of performing a query on the semantic based web.
Furthermore, there has been a rising need to address the issue of meaningful query retrieval. On many occasions, it is observed that users have different ways of constructing a particular query that is required to yield identical query results. It is therefore essential to understand the purpose of a query, accurately interpret the same and provide the precise query results.
SUMMARY OF INVENTION
One embodiment of the present invention is a method for extendable semantic query interpretation. The method comprises receiving a structured natural language user query, interpreting the structured natural language user query, generating from the structured natural language user query, an internal query representation statement, performing query enrichment to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query to provide for searching at least one query result from an ontology knowledge base, generating at least one internal query result representation statement and converting the at least one internal query result representation statement to a structured natural language result.
Another embodiment of the present invention is a system for extendable semantic query interpretation. The system comprises an intelligent word sense, a semantic query interpreter, a query transformer, a query enricher and a natural language generator. The intelligent word sense comprises means for receiving a structured natural language user query. The semantic query interpreter comprises means for interpreting the structured natural language user query. The query transformer comprises means for generating from the structured natural language user query, an internal query representation statement. The query enricher comprises means for performing query enrichment to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query to provide for searching at least one query result from an ontology knowledge base and generating at least one internal query result representation statement. The natural language generator comprises means for converting the at least one internal query result representation statement to a structured natural language result.
The present invention consists of several features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings in which:
FIG. 1 illustrates a flowchart of a method for an extendable semantic query interpretation.
FIG. 2 illustrates a block diagram of a system for an extendable semantic query interpretation.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention relates to a method and system for extendable semantic query interpretation. Hereinafter, this specification will describe the present invention according to the preferred embodiments of the present invention. However, it is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention and it is envisioned that those skilled in the art may devise various modifications and equivalents without departing from the scope of the appended claims.
The embodiments of the present invention relate to a method and system for extendable semantic query interpretation. The present invention provides an extendable semantic query interpretation to enable a user to query information stored in ontology knowledge bases, which are essentially knowledge bases in Resource Description Framework (RDF) format, without having in-depth understanding of the underlying knowledge base.
The embodiments of the present invention simplify the process of generating valid query statements by creating knowledge base compliant query statements from the user's query, which is received in the form of a structured natural language query. Further, the embodiments of the present invention also translate the query results obtained from the ontology knowledge base in the Resource Description Framework (RDF) format into a structured natural language result. Hence, this substantially eliminates the necessity for the user to explicitly understand the structure of the knowledge base to perform a query on the knowledge base. Additionally, the user is shielded from the complexity of the query processing while having the liberty of constructing complex queries. The embodiments of the present invention address the issue of meaningful query retrieval by employing query enrichment techniques. With these techniques, the user's query terms residing in the form of a structured natural language query are expanded by making reference to a lexical database such as WordNet. Additionally, different word senses, synonyms, antonyms and the like are used to expand the user's query to enable comprehensive and meaningful query processing.
Reference is now collectively being made to FIGs. 1 and 2. FIG. 1 illustrates a flowchart of the method (100) for an extendable semantic query interpretation. FIG. 2 illustrates a block diagram of the system (200) for an extendable semantic query interpretation.
The method (100) for the extendable semantic query interpretation comprises receiving a structured natural language user query (102), interpreting the structured natural language user query (104), generating from the structured natural language user query, an internal query representation statement (106), performing query enrichment (108) to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query (110) to provide for searching at least one query result from an ontology knowledge base, generating at least one internal query result representation statement (112) and converting the at least one internal query result representation statement to a structured natural language result (114).
The system (200) for the extendable semantic query interpretation comprises an intelligent word sense (202), a semantic query interpreter (206), a query transformer (208), a query enricher (210) and a natural language generator (218). These components operate in tandem with a parser (204), a query module (214), a lexical database (212) and a knowledge base (220) as a semantic query engine for the purpose of providing the extendable semantic query interpretation while the user performs a query on the knowledge base (220).
The intelligent word sense (202) is the interface module between the user and the semantic query engine. It has the purpose of preprocessing the query received from the user for the ontology knowledge base query. The intelligent word sense (202) receives the user's query in the form of a structured natural language user query (102).
In some embodiments of the present invention, the intelligent word sense (202) provides a plurality of auto-completion suggestions for the terms of the structured natural language user query as the same is being received from the user. The auto-completion suggestions are obtained from a plurality of terms available in the ontology knowledge base (220). The plurality of terms is defined by a list of concepts and relations in the ontology knowledge base (220).
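As a rough illustration of how such suggestions might be produced, the following minimal sketch performs a case-insensitive prefix search over a sorted list of terms. The specification does not state how the suggestions are computed, so the prefix-search approach, the function names `build_index` and `suggest`, and the sample terms are all assumptions made for illustration only.

```python
from bisect import bisect_left


def build_index(terms):
    """Sort the terms by their lower-cased form, keeping the original spelling."""
    return sorted((term.lower(), term) for term in set(terms))


def suggest(index, prefix, limit=5):
    """Return up to `limit` terms whose lower-cased form starts with `prefix`."""
    key = prefix.lower()
    # A one-element tuple sorts before every (lowered, original) pair
    # whose lowered form starts with the same characters.
    position = bisect_left(index, (key,))
    matches = []
    while position < len(index) and len(matches) < limit:
        lowered, original = index[position]
        if not lowered.startswith(key):
            break
        matches.append(original)
        position += 1
    return matches


# Hypothetical concept and relation names, not taken from a real ontology.
ONTOLOGY_TERMS = ["symptom", "symptomOf", "syndrome", "essential_hypertension", "headache"]
INDEX = build_index(ONTOLOGY_TERMS)
```

Calling `suggest(INDEX, "sym")` would offer the user "symptom" and "symptomOf" while the query is still being typed.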
This feature of auto-completion suggestions assists the user while typing the query statement in the structured natural language form by providing auto-completion suggestions from terms available in the ontology knowledge base (220) as an option to be adopted by the user. This allows the user to provide a query statement without having knowledge of the variety of complex terms available in the ontology knowledge base (220). Further more, this feature provides the basis of efficient and accurate query statements where the occurrences typographical errors are reduced. The query statement received as the structured natural language user query by the intelligent word sense (202) is then forwarded to the semantic query interpreter (206). In this module, the structured natural language user query (104) is interpreted before further processing by the query transformer (208).
The semantic query interpreter (206) comprises means for interpreting the structured natural language user query (104), which includes two steps. Firstly, using a parser (204), the structured natural language user query is parsed to identify the Part-of-Speech (POS) information of each term in the query statement. The parser (204) may comprise a lexical parser. Upon parsing the structured natural language user query, the Part-of-Speech (POS) information is identified using a word-level tagging lookup.
The second step comprises identifying a type of query by matching the parsed structured natural language user query to a set of configurable query templates. From the parsed structured natural language user query, the sub-phrases are chunked together such that by matching the same with a set of configurable query templates, the type of query may be identified. The set of configurable query templates are a predefined set of possible identified query types.
The complexity of the structured natural language user query is determined at this stage, that is, whether the structured natural language user query is a union, intersection or negation type of query. The complexity of the queries that can be handled may be increased by extending the set of configurable query templates.
For example, where the structured natural language user query is as follows: "What are the symptoms of essential hypertension", parsing the structured natural language user query would yield as follows: "What/WP are/VBP the/DT symptoms/NNS of/IN essential/JJ hypertension/NN"
From the parsed structured natural language user query, the sub-phrases such as "the symptoms" and "essential hypertension" are chunked together and then matched to the set of configurable query templates. Hence, in this example, the parsed structured natural language user query is classified as being the type of query as follows: "What/WP is/VBZ NP1 IN NP2", where NP1 would refer to "the symptoms" and NP2 would refer to "essential hypertension".
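The chunking and template matching steps can be sketched as follows. This is only an illustration under stated assumptions: the input is already POS-tagged in `word/TAG` form, noun phrases are formed greedily from determiner, adjective and noun tags, and all verb tags are collapsed to `VB` so that "is" and "are" match the same template. The `TEMPLATES` table and the function names are hypothetical.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
NP_TAGS = NOUN_TAGS | {"DT", "JJ"}

# Hypothetical template table: chunk label sequence -> query type.
TEMPLATES = {
    ("WP", "VB", "NP", "IN", "NP"): "What is NP1 IN NP2",
}


def read_tagged(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def chunk(tagged):
    """Group determiner/adjective/noun runs into NP chunks; keep other tokens."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in NP_TAGS:
            current.append(word)
            continue
        if current:
            chunks.append(("NP", " ".join(current)))
            current = []
        # Collapse verb forms (VBP, VBZ, ...) into a single label.
        chunks.append(("VB" if tag.startswith("VB") else tag, word))
    if current:
        chunks.append(("NP", " ".join(current)))
    return chunks


def classify(text):
    """Return (query type or None, list of noun phrases in order)."""
    chunks = chunk(read_tagged(text))
    labels = tuple(label for label, _ in chunks)
    noun_phrases = [words for label, words in chunks if label == "NP"]
    return TEMPLATES.get(labels), noun_phrases


if __name__ == "__main__":
    tagged_query = (
        "What/WP are/VBP the/DT symptoms/NNS of/IN "
        "essential/JJ hypertension/NN"
    )
    print(classify(tagged_query))
```

Because the templates live in a table rather than in code, supporting a new query type in this sketch only requires adding an entry, which mirrors the "configurable" aspect described in the text.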
Once the type of query is identified to be within a particular template, the information pertaining to the parsed structured natural language user query and the type of query is forwarded to the query transformer (208) for further processing.
The query transformer (208) comprises a Subject-Predicate-Object transformer (208). The Subject-Predicate-Object transformer (208) comprises means for generating from the structured natural language user query, an internal query representation statement (106). The internal query representation statement is generated in the form of Resource Description Framework (RDF) triple format.
The Subject-Predicate-Object transformer (208) receives the parsed structured natural language user query and the type of query. Based on the Part-of-Speech (POS) information from the parsed structured natural language user query and the type of query of the same, the Subject-Predicate-Object transformer (208) extracts a subject part, a predicate part and an object part of the structured natural language user query. The subject part, the predicate part and the object part are used to generate at least one internal query representation statement in the Resource Description Framework (RDF) triple format. The subject part refers to the root term of the query statement and the predicate part refers to the term of the query statement connecting the subject part to the object part. Since Resource Description Framework (RDF) stores triples of data in the ontology knowledge base (220), a query may be performed on either one of the subject part, predicate part or the object part.
The Subject-Predicate-Object transformer (208) then tags an argument of the at least one internal query representation statement by tagging the subject part, the predicate part or the object part for searching the at least one query result from the ontology knowledge base (220). The tag used for tagging the argument comprises a symbol, normally a question mark, "?".
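As an illustration of the extraction and tagging just described, the following sketch maps one query type to a Subject-Predicate-Object string. It assumes a single hand-written rule for the "What is NP1 IN NP2" template, in which NP2 becomes the subject, NP1 the predicate and the question word the tagged argument; the rule, the `Triple` type and the helper names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

    def __str__(self):
        return "; ".join(self)


def strip_determiner(phrase):
    """Drop a leading article so 'the symptoms' becomes 'symptoms'."""
    words = phrase.split()
    if words and words[0].lower() in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)


def to_triple(query_type, wh_word, np1, np2):
    """Build an internal statement whose unknown part is tagged with '?'."""
    if query_type == "What is NP1 IN NP2":
        return Triple(
            strip_determiner(np2), strip_determiner(np1), "?" + wh_word
        )
    raise ValueError(f"no transformation rule for query type: {query_type!r}")


if __name__ == "__main__":
    triple = to_triple(
        "What is NP1 IN NP2", "What", "the symptoms", "essential hypertension"
    )
    print(triple)
```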
For example, based on the parsed structured natural language user query as follows:
"What/WP are/VBP the/DT symptoms/NNS of/IN essential/JJ hypertension/NN" and the type of query as follows: "What/WP is/VBZ NP1 IN NP2", where NP1 would refer to "the symptoms" and NP2 would refer to "essential hypertension", obtained from the semantic query interpreter (206), the Subject-Predicate-Object transformer (208) generates the internal query representation statement comprising a Subject-Predicate-Object string as follows: "essential hypertension; symptoms; ?What" where "?What" is the tagged argument.
The internal query representation statement in the Resource Description Framework (RDF) triple format comprising a Subject-Predicate-Object string is then forwarded to the query enricher (210). The query enricher (210) comprises means for performing query enrichment (108) to generate at least one enriched internal query representation statement, generating from the at least one enriched internal query representation statement, a knowledge base compliant query (110) to provide for searching at least one query result from an ontology knowledge base (220) and generating at least one internal query result representation statement (112).
The query enricher (210) performs query enrichment (108) to generate the at least one enriched internal query representation statement using syntactic matching and semantic matching using a lexical database (212). Query enrichment enables comprehensive and meaningful query processing.
Syntactic matching determines the degree or score of matching between terms in the query statement and terms in the ontology knowledge base (220) using syntactic distance techniques such as TF-IDF, Jaro-Winkler and the like. If the matching score is higher than a predetermined threshold score value, which is manually set, the terms are considered matched; otherwise they are considered unmatched. In the case where the terms are matched, the original term in the query statement will be substituted with the term identified from the ontology knowledge base (220).
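The threshold rule can be sketched as below. Note the substitution: instead of TF-IDF or Jaro-Winkler, this example uses the similarity ratio from Python's standard `difflib.SequenceMatcher` as a stand-in string measure, and the threshold of 0.75 is an arbitrary illustrative value. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(term):
    """Lower-case and treat underscores as spaces before comparing."""
    return term.lower().replace("_", " ").strip()


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; not Jaro-Winkler or TF-IDF."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(term, ontology_terms, threshold=0.75):
    """Substitute the best-scoring ontology term if it beats the threshold."""
    scored = [(similarity(term, candidate), candidate) for candidate in ontology_terms]
    score, candidate = max(scored, default=(0.0, term))
    return candidate if score > threshold else term


if __name__ == "__main__":
    terms = ["has_Symptoms", "has_Indication", "High_Blood_Pressure"]
    print(best_match("symptoms", terms))
    print(best_match("fever", terms))
```

The threshold trades recall for precision: a lower value substitutes more query terms with ontology terms, at the risk of replacing a term with an unrelated label.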
Semantic matching identifies terms in the ontology knowledge base (220) that have similar meaning with terms in the internal query representation statement. Semantic matching may have several levels of matching including synonym matching, associative matching, "kind-of" matching and "part-of" matching. In the embodiments of the present invention, semantic matching is achieved by matching the terms in the internal query representation statement with terms in a lexical database (212) such as WordNet, from which synsets of a term are retrieved and utilized to generate the at least one enriched internal query representation statement. The synsets comprise nouns, verbs, adjectives and adverbs that are grouped into sets of cognitive synonyms, each expressing a distinct concept.
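A minimal sketch of how alternatives found by matching might be expanded into enriched statements is given here. It assumes a small hand-written lookup table in place of a real lexical database, and it emits every subject and predicate combination; the source does not say how combinations are selected, so the output of this sketch may contain more statements than an actual embodiment would keep. `ALTERNATIVES` and `enrich` are hypothetical names.

```python
from itertools import product

# Hypothetical lookup standing in for syntactic matches and synset lookups
# that have already been mapped onto ontology terms.
ALTERNATIVES = {
    "essential hypertension": ["essential hypertension", "High_Blood_Pressure"],
    "symptoms": ["has_Symptoms", "has_Indication"],
}


def enrich(statement):
    """Expand 'subject; predicate; object' into all known alternatives."""
    subject, predicate, obj = [part.strip() for part in statement.split(";")]
    subjects = ALTERNATIVES.get(subject, [subject])
    predicates = ALTERNATIVES.get(predicate, [predicate])
    return [f"{s}; {p}; {obj}" for s, p in product(subjects, predicates)]


if __name__ == "__main__":
    for enriched in enrich("essential hypertension; symptoms; ?What"):
        print(enriched)
```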
For example, query enrichment performed on the internal query representation statement as follows:
"essential hypertension; symptoms; ?What"
using syntactic matching and semantic matching, would possibly yield three enriched internal query representation statements as follows:
"essential hypertension; has_Symptoms; ?What"
"High_Blood_Pressure; has_Symptoms; ?What"
"High_Blood_Pressure; has_Indication; ?What"
The query enricher (210) then generates from the at least one enriched internal query representation statement, the knowledge base compliant query (110) to provide for searching the at least one query result from the ontology knowledge base (220). This begins with the query enricher (210) combining the at least one enriched internal query representation statement generated from query enrichment (108). The at least one enriched internal query representation statement is combined to form a valid query statement. Using the valid query statement and based on the properties of the structured natural language user query, the query enricher (210) automatically identifies which knowledge base compliant query statement is to be generated, either a SPARQL query statement or a PROLOG query statement. This knowledge base compliant query statement is used for searching the at least one query result from the ontology knowledge base (220).
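One way the enriched statements could be combined into a single SPARQL query is sketched below, joining one triple pattern per statement with `UNION`. The `ex:` prefix, its IRI, and the convention of replacing spaces with underscores to form local names are assumptions made for this illustration and are not taken from the source.

```python
def to_term(value):
    """Map a statement part to a SPARQL variable or a prefixed name."""
    value = value.strip()
    if value.startswith("?"):
        return value  # tagged argument becomes a query variable
    return "ex:" + value.replace(" ", "_")


def to_sparql(statements, prefix_iri="http://example.org/onto#"):
    """Combine 'subject; predicate; object' statements into one SELECT query."""
    patterns, variables = [], []
    for statement in statements:
        terms = [to_term(part) for part in statement.split(";")]
        if len(terms) != 3:
            raise ValueError(f"not a triple: {statement!r}")
        patterns.append("{ " + " ".join(terms) + " . }")
        for term in terms:
            if term.startswith("?") and term not in variables:
                variables.append(term)
    if not variables:
        raise ValueError("no tagged argument to select")
    body = "\n  UNION\n  ".join(patterns)
    return (
        f"PREFIX ex: <{prefix_iri}>\n"
        f"SELECT DISTINCT {' '.join(variables)} WHERE {{\n  {body}\n}}"
    )


if __name__ == "__main__":
    print(
        to_sparql(
            [
                "essential hypertension; has_Symptoms; ?What",
                "High_Blood_Pressure; has_Symptoms; ?What",
            ]
        )
    )
```

Using `UNION` means a result is returned if any one enriched statement matches, which fits the idea that the enriched statements are alternative phrasings of the same question.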
The query module (214) searches the at least one query result from the ontology knowledge base (220). The query module (214), specifically a Resource Description Framework (RDF) query module receives the knowledge base compliant query statement from the query enricher (210) in the form of Resource Description Framework (RDF) triple format for SPARQL query and PROLOG statement for PROLOG query. The same is then validated to generate a validated query statement. The query module (214) then uses the validated query statement to search for the at least one query result in the ontology knowledge base (220) residing in an appropriate knowledge base server. The query module (214) then returns the at least one query result from the ontology knowledge base (220) to the query enricher (210).
For example, where the internal query representation statement is as follows: "essential hypertension; symptoms; ?What", the query module (214) returns "headache", "visual_disturbance" and "vomiting" as the at least one query result obtained from the ontology knowledge base (220).
The query enricher (210) finally generates the at least one internal query result representation statement (112) by integrating the at least one query result obtained from the query module (214) with the internal query representation statement generated by the Subject-Predicate-Object transformer (208). This is performed by an answer modifier module of the query enricher (210). The answer modifier module replaces the tagged argument of the internal query representation statement with the at least one query result. The answer modifier then tags the at least one query result to generate the at least one internal query result representation statement. The tag used for tagging the at least one query result comprises a symbol, normally an exclamation mark, "!".
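The replacement performed by the answer modifier can be sketched in a few lines. The sketch assumes statements are held as semicolon-separated strings with exactly one "?"-tagged argument; the function name `fill_results` is hypothetical.

```python
def fill_results(statement, results):
    """Replace the '?'-tagged argument with each result, tagged with '!'."""
    parts = [part.strip() for part in statement.split(";")]
    tagged = [i for i, part in enumerate(parts) if part.startswith("?")]
    if len(tagged) != 1:
        raise ValueError("expected exactly one tagged argument")
    filled_statements = []
    for result in results:
        filled = list(parts)
        filled[tagged[0]] = "!" + result
        filled_statements.append("; ".join(filled))
    return filled_statements


if __name__ == "__main__":
    query = "essential hypertension; symptoms; ?What"
    for line in fill_results(query, ["headache", "visual_disturbance", "vomiting"]):
        print(line)
```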
For example, using the internal query representation statement as follows:
"essential hypertension; symptoms; ?What"
and the at least one query result as follows: "headache", "visual_disturbance" and "vomiting", the query enricher (210) generates three internal query result representation statements as follows:
"essential hypertension; symptoms; !headache"
"essential hypertension; symptoms; !visual_disturbance"
"essential hypertension; symptoms; !vomiting"
The final step for the extendable semantic query interpretation comprises the natural language generator (218) converting the at least one internal query result representation statement to a structured natural language result (114). The natural language generator (218) generates the structured natural language result using the structured natural language user query and the at least one internal query result representation statement. More specifically, the natural language generator (218) utilizes the Part-of-Speech (POS) information from the parsed structured natural language user query to generate the structured natural language result by considering linguistic features such as tense, part of speech, vowels, consonants and the like. The at least one internal query result is provided to the user as the structured natural language result instead of merely a list of query results. Based on the structured natural language user query and type of query, the structured natural language result may comprise a plurality of structured natural language results.
For example, the structured natural language result for the structured natural language query "What are the symptoms of essential hypertension?", is as follows: "The symptoms of essential hypertension are: Headache Visual_disturbance Vomiting".
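A simplified sketch of this generation step is shown here. It assumes the only linguistic feature considered is number agreement, choosing "are" or "is" from the POS tag of the head noun of NP1; the source lists further features (tense, vowels, consonants) that are not modelled. The function name and its parameters are hypothetical.

```python
def generate_answer(np1, np1_head_tag, preposition, np2, result_statements):
    """Turn '!'-tagged result statements into one answer sentence."""
    answers = [s.split(";")[-1].strip().lstrip("!") for s in result_statements]
    answers = [a for a in answers if a]
    # Plural noun tags (NNS, NNPS) take "are"; singular tags take "is".
    verb = "are" if np1_head_tag in {"NNS", "NNPS"} else "is"
    subject = np1[0].upper() + np1[1:]
    listed = " ".join(a[0].upper() + a[1:] for a in answers)
    return f"{subject} {preposition} {np2} {verb}: {listed}"


if __name__ == "__main__":
    statements = [
        "essential hypertension; symptoms; !headache",
        "essential hypertension; symptoms; !visual_disturbance",
        "essential hypertension; symptoms; !vomiting",
    ]
    print(
        generate_answer(
            "the symptoms", "NNS", "of", "essential hypertension", statements
        )
    )
```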
In another embodiment of the present invention, in the case where a plurality of internal query representation statements is generated, multiple arguments will be tagged. For this embodiment, the structured natural language result will be either a union or intersection of the at least one internal query result obtained from the query module (214). The union or intersection operation is dependent on the type of query as identified by the semantic query interpreter (206).
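The combination of several result lists can be sketched with set operations. The sketch assumes each internal query representation statement has produced its own list of results and that the query type is passed as the string "union" or "intersection"; both the function name and these labels are hypothetical, and the example result values are made up for illustration.

```python
def combine(result_sets, query_type):
    """Merge per-statement results by union or intersection, sorted for display."""
    sets = [set(results) for results in result_sets]
    if not sets:
        return []
    if query_type == "union":
        combined = set().union(*sets)
    elif query_type == "intersection":
        combined = set.intersection(*sets)
    else:
        raise ValueError(f"unsupported query type: {query_type!r}")
    return sorted(combined)


if __name__ == "__main__":
    per_statement = [["headache", "vomiting"], ["vomiting", "nausea"]]
    print(combine(per_statement, "union"))
    print(combine(per_statement, "intersection"))
```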
Additionally, by using PROLOG statement for PROLOG query, it is possible to create rules that may infer new connections or relationships between concepts in the ontology knowledge base (220). These new connections or relationships may be used for searching the at least one query result from the ontology knowledge base (220). For example, in a given ontology knowledge base that has a "hasParent" relationship, it is possible to infer other relationships such as 'grandparent' and 'sibling'.
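The kind of inference mentioned above can be illustrated without a PROLOG engine. The following sketch, written in Python, derives "grandparent" and "sibling" pairs from "hasParent" facts by applying two rules directly; the fact representation as (child, parent) pairs, the sample names and the function name are assumptions for this example only.

```python
def infer(has_parent):
    """Derive grandparent and sibling relations from (child, parent) facts."""
    # grandparent(C, G) holds when hasParent(C, P) and hasParent(P, G).
    grandparent = {
        (child, grand)
        for child, parent in has_parent
        for middle, grand in has_parent
        if parent == middle
    }
    # sibling(A, B) holds when A and B share a parent and A differs from B.
    sibling = {
        (a, b)
        for a, parent_a in has_parent
        for b, parent_b in has_parent
        if parent_a == parent_b and a != b
    }
    return grandparent, sibling


if __name__ == "__main__":
    facts = {("alice", "bob"), ("carol", "bob"), ("bob", "dave")}
    grandparents, siblings = infer(facts)
    print(sorted(grandparents))
    print(sorted(siblings))
```

The derived pairs are not stored in the knowledge base; they exist only as consequences of the rules, which is what allows such rules to extend the set of answerable queries.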

Claims

1. A method (100) for an extendable semantic query interpretation, the method (100) comprises receiving a structured natural language user query (102); interpreting the structured natural language user query (104); generating from the structured natural language user query, an internal query representation statement (106); performing query enrichment (108) to generate at least one enriched internal query representation statement; generating from the at least one enriched internal query representation statement, a knowledge base compliant query (110) to provide for searching at least one query result from an ontology knowledge base; generating at least one internal query result representation statement (112); and converting the at least one internal query result representation statement to a structured natural language result (114).
2. The method (100) according to claim 1, wherein the internal query representation statement and the at least one internal query result representation statement comprises Resource Description Framework (RDF) triple format.
3. The method (100) according to claim 1, wherein receiving the structured natural language user query (102) further comprises providing from a plurality of terms in the ontology knowledge base, a plurality of auto-completion suggestions for at least one term of the structured natural language user query.
4. The method (100) according to claim 1, wherein interpreting the structured natural language user query (104) further comprises parsing the structured natural language user query to identify a Part-of-Speech (POS) information; and identifying a type of query by matching the parsed structured natural language user query to a set of configurable query templates.
5. The method (100) according to claim 1 and 4, wherein generating from the structured natural language user query, the internal query representation statement (106) further comprises extracting a subject part, a predicate part and an object part based on the Part-of-Speech (POS) information and the type of query to generate at least one internal query representation statement; and tagging an argument of the at least one internal query representation statement by tagging either the subject part, the predicate part or the object part for searching the at least one query result from the ontology knowledge base.
6. The method (100) according to claim 1, wherein performing query enrichment (108) to generate the at least one enriched internal query representation statement query further comprises syntactic matching; and semantic matching using a lexical database.
7. The method (100) according to claim 1, wherein generating from the at least one enriched internal query representation statement, the knowledge base compliant query (110) to provide for searching the at least one query result from the ontology knowledge base further comprises combining the at least one enriched internal query representation statement to form a valid query statement; and generating from the valid query statement, a SPARQL query statement or a PROLOG query statement.
8. The method (100) according to claim 1 and 5, wherein generating the at least one internal query result representation statement (112) further comprises replacing the tagged argument of the internal query representation statement with the at least one query result; and tagging the at least one query result to generate the at least one internal query result representation statement.
9. The method (100) according to claim 1, wherein converting the at least one internal query result representation statement to the structured natural language result (114) further comprises generating the structured natural language result using the structured natural language user query and the at least one internal query result representation statement.
10. A system (200) for an extendable semantic query interpretation, the system (200) comprises an intelligent word sense (202); a semantic query interpreter (206); a Subject-Predicate-Object transformer (208); a query enricher (210); and a natural language generator (218); characterized in that the intelligent word sense (202) comprises means for receiving a structured natural language user query; the semantic query interpreter (206) comprises means for interpreting the structured natural language user query; the query transformer (208) comprises means for generating from the structured natural language user query, an internal query representation statement; the query enricher (210) comprises means for performing query enrichment to generate at least one enriched internal query representation statement; generating from the at least one enriched internal query representation statement, a knowledge base compliant query to provide for searching at least one query result from an ontology knowledge base (220); and generating at least one internal query result representation statement; the natural language generator (218) comprises means for converting the at least one internal query result representation statement to a structured natural language result.
11. The system (200) according to claim 10, wherein the internal query representation statement and the at least one internal query result representation statement comprises Resource Description Framework (RDF) triple format.
12. The system (200) according to claim 10, wherein the intelligent word sense (202) further comprises means for providing from a plurality of terms in the ontology knowledge base (220), a plurality of auto-completion suggestions for at least one term of the structured natural language user query.
13. The system (200) according to claim 10, wherein the semantic query interpreter (206) further comprises means for parsing the structured natural language user query to identify a Part-of-Speech (POS) information using a parser (204); and identifying a type of query by matching the parsed structured natural language user query to a set of configurable query templates.
14. The system (200) according to claim 10 and 13, wherein the query transformer (208) comprises a Subject-Predicate-Object transformer (208) characterized in that the Subject-Predicate-Object transformer (208) comprises means for extracting a subject part, a predicate part and an object part based on the Part-of-Speech (POS) information and the type of query to generate at least one internal query representation statement; and tagging an argument of the at least one internal query representation statement by tagging either the subject part, the predicate part or the object part for searching the at least one query result from the ontology knowledge base (220).
15. The system (200) according to claim 10 and 14, wherein the query enricher (210) further comprises means for syntactic matching; semantic matching using a lexical database (212); combining the at least one enriched internal query representation statement to form a valid query statement; generating from the valid query statement, a SPARQL query statement or a PROLOG query statement; replacing the tagged argument of the internal query representation statement with the at least one query result; and tagging the at least one query result to generate the at least one internal query result representation statement.
16. The system (200) according to claim 10, wherein the natural language generator (218) further comprises means for generating the structured natural language result using the structured natural language user query and the at least one internal query result representation statement.
17. The system (200) according to claim 10, wherein the system (200) further comprises a query module (214) for searching the at least one query result from the ontology knowledge base (220).
PCT/MY2010/000084 2009-05-25 2010-05-25 A method and system for extendable semantic query interpretation WO2010137940A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI20092121A MY168837A (en) 2009-05-25 2009-05-25 A method and system for extendable semantic query interpretation
MYPI20092121 2009-05-25

Publications (1)

Publication Number Publication Date
WO2010137940A1 true WO2010137940A1 (en) 2010-12-02

Family

ID=43222894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2010/000084 WO2010137940A1 (en) 2009-05-25 2010-05-25 A method and system for extendable semantic query interpretation

Country Status (2)

Country Link
MY (1) MY168837A (en)
WO (1) WO2010137940A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5197005A (en) * 1989-05-01 1993-03-23 Intelligent Business Systems Database retrieval system having a natural language interface
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US20020152202A1 (en) * 2000-08-30 2002-10-17 Perro David J. Method and system for retrieving information using natural language queries
US20030069880A1 (en) * 2001-09-24 2003-04-10 Ask Jeeves, Inc. Natural language query processing
US20050289124A1 (en) * 2004-06-29 2005-12-29 Matthias Kaiser Systems and methods for processing natural language queries
US20060036512A1 (en) * 2002-10-01 2006-02-16 Ims Software Services, Ltd. System and method for interpreting sales data through the use of natural language questions
US20060100995A1 (en) * 2004-10-26 2006-05-11 International Business Machines Corporation E-mail based Semantic Web collaboration and annotation
US20070016563A1 (en) * 2005-05-16 2007-01-18 Nosa Omoigui Information nervous system
US20070073680A1 (en) * 2005-09-29 2007-03-29 Takahiro Kawamura Semantic analysis apparatus, semantic analysis method and semantic analysis program
WO2008100849A2 (en) * 2007-02-15 2008-08-21 Cycorp, Inc. Semantics-based method and system for document analysis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"11th International Conference on Applications of Natural Language to Information Systems, NLDB, 2006", vol. 3999, article TOMASSEN S. ET AL.: "Document Space Adapted Ontology: Application in Query Enrichment", pages: 46 - 57, XP019034203 *
BROEKSTRA J. ET AL.: "Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema", AIDMINISTRATOR NEDERLAND B.V., 1 October 2001 (2001-10-01), Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.9588&rep=rep1&type=pdf> [retrieved on 20100707] *
LIANG J. ET AL.: "Ontology-Based Natural Language Query processing for the Biological Domain", PROCEEDINGS OF THE BIONLP WORKSHOP ON LINKING NATURAL LANGUAGE PROCESSING AND BIOLOGY AT HLT-NAACL, 2006, pages 9 - 16, Retrieved from the Internet <URL:http://www.aclweb.org/anthology/W/W06/W06-3302.pdf> [retrieved on 20100707] *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150317572A1 (en) * 2014-05-05 2015-11-05 Sap Ag On-Demand Enrichment of Business Data
WO2016050066A1 (en) * 2014-09-29 2016-04-07 华为技术有限公司 Method and device for parsing interrogative sentence in knowledge base
US10706084B2 (en) 2014-09-29 2020-07-07 Huawei Technologies Co., Ltd. Method and device for parsing question in knowledge base
EP3016006A1 (en) * 2014-10-30 2016-05-04 Fujitsu Limited Information presentation program, information presentation method, and information presentation apparatus
US10360243B2 (en) 2014-10-30 2019-07-23 Fujitsu Limited Storage medium, information presentation method, and information presentation apparatus
CN111639254A (en) * 2020-05-28 2020-09-08 华中科技大学 System and method for generating SPARQL query statement in medical field
US20210382923A1 (en) * 2020-06-04 2021-12-09 Louis Rudolph Gragnani Systems and methods of question answering against system of record utilizing natural language interpretation
US20240004907A1 (en) * 2022-06-30 2024-01-04 International Business Machines Corporation Knowledge graph question answering with neural machine translation

Also Published As

Publication number Publication date
MY168837A (en) 2018-12-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10780844

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10780844

Country of ref document: EP

Kind code of ref document: A1