WO2012091541A1 - A semantic web constructor system and a method thereof - Google Patents
- Publication number
- WO2012091541A1 (PCT/MY2011/000153)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
A semantic web constructor system (100) supported by a plurality of web crawlers (112) in a World Wide Web upon declaring a plurality of trustworthy websites is provided, characterized in that, the system (100) includes at least one web crawler controller (110) engageable to manage the plurality of web crawlers (112), a semantic web database (116) connectable to the plurality of web crawlers (112) and a plurality of data building editors (122, 124, 126) connectable to the at least one web crawler controller (110), wherein a semantic browser (120) is further connectable to the semantic web database (116) to receive at least one natural language query from at least one user.
Description
A SEMANTIC WEB CONSTRUCTOR SYSTEM AND A METHOD THEREOF
FIELD OF INVENTION
The present invention relates to a semantic web constructor system supported by a plurality of web crawlers in a World Wide Web, upon declaring a plurality of trustworthy websites, in order to query a semantic browser for a plurality of websites that are page-ranked.
BACKGROUND OF INVENTION
Most contemporary web search engines function based on keyword searches. Keyword-based search engines usually pull out websites in which one or more of the keywords specified in the search occur. However, keyword-based searching tends to produce large numbers of hits, which burdens the user with ensuring the relevance of the results. Users end up spending a large amount of time going through the resulting websites to identify a relevant document, which is a considerable drain on the user's time.
In addition to the problem of having a large volume of hits, the user also needs to internalize the relevant material in order to make it applicable and utilize it for the problem that is being solved.
With current web browser technology, a user is not able to issue a natural language query, because a keyword-based search engine cannot understand the context of the natural language query and is therefore unable to search the web effectively for relevant information.
Furthermore, the web is made up of a large amount of highly distributed information that may come from questionable sources. Therefore, the user must be able to discern a reliable website from an unreliable one. Most documents available on the World Wide Web are not semantically enabled for computers to interpret and to process beyond simple keyword search. Human input is required to read each document and to interpret the information. This is a very slow and inefficient process, made painfully obvious by the current information explosion online. U.S. Patent No. 6,044,375 describes a method of automatically extracting metadata from a document through a neural network to generate metadata guesses, including word guesses, compound guesses and document guesses, along with confidence factors indicating the likelihood that each of the guesses is correct. However, that invention does not look at the information on the website itself and does not extract data from the website itself.
Therefore, there is a need for a solution that allows users looking for specific and relevant data to find it without spending too long manually reading through long lists of results.
SUMMARY OF INVENTION
Accordingly there is provided a semantic web constructor system supported by a plurality of web crawlers in a World Wide Web upon declaring a plurality of trustworthy websites, characterized in that, the system includes at least one web crawler controller engageable to manage the plurality of web crawlers, a semantic web database connectable to the plurality of web crawlers and a plurality of data building editors connectable to the at least one web crawler controller wherein a semantic browser is further connectable to the semantic web database to receive at least one natural language query from at least one user.
There is also provided a method of constructing a semantic web supported by a plurality of web crawlers in a World Wide Web upon declaring a plurality of trustworthy websites, characterized in that, the method includes the steps of crawling the web to select unprocessed websites, selecting websites based on trustworthiness calculated by comparing a trustworthiness numeric value to a predetermined threshold value, extracting a plurality of text from the selected websites, tokenizing the extracted plurality of text, applying a predetermined set of data transformation rules to the tokenized extracted plurality of text, converting the tokenized extracted plurality of text to metadata and storing the metadata in a semantic web database.
There is further provided a method of querying a semantic web supported by a plurality of web crawlers in a World Wide Web upon declaring a plurality of trustworthy websites, characterized in that, the method includes the steps of receiving a query from a user, parsing the query into an internal representation format, searching the semantic web database using the internal representation format, ranking a plurality of websites based on trustworthiness and returning the ranked plurality of websites to the user.
The present invention consists of several novel features and a combination of parts hereinafter fully described and illustrated in the accompanying description and drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be fully understood from the detailed description given herein below and the accompanying drawings which are given by way of illustration only, and thus are not limitative of the present invention, wherein:
Figure 1 is a block diagram illustrating the architecture of a preferred embodiment of a semantic web constructor system;
Figure 2 is a flowchart illustrating a preferred embodiment of the steps of constructing a semantic web supported by a plurality of web crawlers; and
Figure 3 is a diagram showing an example of a natural language query using the preferred embodiment of a semantic web constructor method and system.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention relates to a semantic web constructor system supported by a plurality of web crawlers in a World Wide Web, upon declaring a plurality of trustworthy websites, in order to query a semantic browser for a plurality of websites that are page-ranked. Hereinafter, this specification will describe the present invention according to the preferred embodiment of the present invention. However, it is to be understood that limiting the description to the preferred embodiment of the invention is merely to facilitate discussion of the present invention, and it is envisioned that those skilled in the art may devise various modifications and equivalents without departing from the scope of the appended claims.
The preferred embodiment will now be described in detail with reference to the attached drawings, either individually or in combination.
The present invention provides a semantic web constructor system (100) supported by a plurality of web crawlers (112) in a World Wide Web upon declaring a plurality of trustworthy websites, as seen in Figure 1. The system (100) includes at least one web crawler controller (110) engageable to manage the plurality of web crawlers (112). A semantic web database (116) is connectable to the plurality of web crawlers (112). A plurality of data building editors (122, 124, 126) is connectable to the at least one web crawler controller (110), and a semantic browser (120) is further connectable to the semantic web database (116) to receive at least one natural language query from at least one user. A trust engine (118) is connectable to the at least one web crawler controller (110).
The plurality of data building editors (122, 124, 126) includes a website list editor (122), a rule editor (124) and a concept editor (126). However, it is to be appreciated by one skilled in the art that the plurality of data building editors may also include other editors of a similar nature in other embodiments of the system (100). In order for the system (100) to be functional, it must first be provided with a plurality of relevant concepts, a predetermined set of transformation rules and an initial set of websites where relevant information associated with the relevant concepts is found. The concept editor (126) is used to specify a plurality of relevant concepts, properties and relationships in a subject area of interest, and uses external resources such as WordNet to expand on a specified concept. The rule editor (124) is used to define data transformation rules. The website list editor (122) is used to specify an initial set of websites where data related to the subject area of interest may be found.
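The concept expansion performed through the concept editor (126) can be sketched as a lookup that maps each declared concept to itself plus its similar terms. This is a minimal sketch under stated assumptions: the hard-coded `LEXICON` below is a hypothetical stand-in for an external lexical database such as WordNet (a real system might instead read lemma names from NLTK's `wordnet.synsets()`), and the function name `expand_concepts` is illustrative, not taken from the patent.

```python
# Hypothetical stand-in for an external lexical database such as WordNet.
# The entries are illustrative only; a real deployment would query WordNet.
LEXICON: dict[str, set[str]] = {
    "restaurant": {"eatery", "diner", "eating place"},
    "halal": {"permissible"},
    "cheap": {"inexpensive", "affordable"},
}


def expand_concepts(concepts: list[str], lexicon: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each concept (lowercased) to itself plus any similar terms in the lexicon."""
    expanded: dict[str, set[str]] = {}
    for concept in concepts:
        key = concept.lower()
        # Concepts without a lexicon entry still map to themselves.
        expanded[key] = {key} | lexicon.get(key, set())
    return expanded


if __name__ == "__main__":
    print(expand_concepts(["Restaurant", "Thai"], LEXICON))
```

Keeping the original concept in its own expansion set means later matching can treat the expanded set as the complete vocabulary for that concept.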
An example of a web crawler (112) is a spider as used in a search engine to retrieve information from the World Wide Web; accordingly, an example of a web crawler controller (110) is a spider controller. The web crawler controller (110), as seen in Figure 1, filters websites to ensure that trustworthy websites are processed first by the plurality of web crawlers (112). Each web crawler (112) is assigned to a different website, and the web crawler controller (110) delegates the websites to be processed across the plurality of web crawlers (112) in a balanced manner. The web crawler controller (110) maintains a plurality of websites that are related to the subject area of interest. Initially, the websites are provided by the user. However, the plurality of websites grows as the web crawlers (112) encounter newly linked Uniform Resource Locators (URLs) on the websites that are processed. An intermediary database (114) is connectable to the plurality of web crawlers (112) and the semantic web database (116). An example of the intermediary database (114) is a merger database.
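The controller behaviour described above (trusted sites first, each crawler on a different site, balanced delegation, and a site list that grows as new URLs are discovered) can be sketched with a priority queue. This is a hypothetical illustration: the class `CrawlerController` and its methods are assumptions, and "balanced" is interpreted here simply as handing at most one site to each crawler per round.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class CrawlerController:
    """Hypothetical spider controller: most trusted sites first, one site per crawler."""

    num_crawlers: int
    # Min-heap of (-trust, url) so the highest trust value pops first.
    _queue: list[tuple[float, str]] = field(default_factory=list, init=False, repr=False)
    # Every URL ever queued, so rediscovered links are not crawled twice.
    _seen: set[str] = field(default_factory=set, init=False, repr=False)

    def add_site(self, url: str, trust: float) -> None:
        """Queue a user-declared or newly discovered site exactly once."""
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._queue, (-trust, url))

    def next_batch(self) -> dict[int, str]:
        """Hand each crawler a different site, highest trust first."""
        batch: dict[int, str] = {}
        for crawler_id in range(self.num_crawlers):
            if not self._queue:
                break
            _, url = heapq.heappop(self._queue)
            batch[crawler_id] = url
        return batch

    def has_work(self) -> bool:
        """True while unassigned URLs remain."""
        return bool(self._queue)
```

Unprocessed URLs reported back by crawlers would simply be passed to `add_site`; the `_seen` set keeps a site that is linked from many pages from being assigned more than once.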
A method of constructing a semantic web supported by a plurality of web crawlers (112) in a World Wide Web upon declaring a plurality of trustworthy websites is described herein, as seen in Figure 1. The method includes crawling the World Wide Web to select unprocessed websites using a plurality of web crawlers (112) such as spiders. An expansion of the initial plurality of relevant concepts is first performed, wherein each concept provided to the concept editor (126) is expanded to include a plurality of similar concepts. This step is carried out by means of an external lexical database such as WordNet. Upon declaration of the initial set of websites, the trust engine (118) determines the trustworthiness of each declared website. Websites are selected based on trustworthiness by comparing a trustworthiness numeric value to a predetermined threshold value; for example, the trustworthiness value may be a number from 0 to 100. The trust engine (118) accepts websites whose trustworthiness value is higher than the predetermined threshold value, and a trusted list of websites is created as shown in Figure 1.
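The threshold comparison performed by the trust engine (118) can be sketched as a filter over precomputed trust values. The description does not say how each trustworthiness value is calculated, so this sketch assumes the values are already known and lie in the 0 to 100 range given as an example; `select_trusted` is a hypothetical name.

```python
def select_trusted(scores: dict[str, float], threshold: float) -> list[str]:
    """Return sites whose trust value is strictly above the threshold, most trusted first.

    `scores` maps a website URL to a precomputed trust value in [0, 100];
    how that value is computed is outside the scope of this sketch.
    """
    for url, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"trust value out of range for {url}: {value}")
    # "Higher than" the threshold: a site exactly at the threshold is rejected.
    trusted = [url for url, value in scores.items() if value > threshold]
    return sorted(trusted, key=lambda url: scores[url], reverse=True)
```

For example, with a threshold of 60, a site scoring exactly 60 is left off the trusted list, matching the "higher than the predetermined threshold value" wording.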
A plurality of text is extracted from the selected trusted list of websites. It is to be understood that the plurality of text may also be extracted from documents found on the World Wide Web. The extracted plurality of text is then tokenized, and a predetermined set of data transformation rules is applied to the tokenized text. The tokenized text is converted to metadata, such as Resource Description Framework (RDF) data, and the RDF data is stored in a semantic web database (116), as seen in Figure 2. New Uniform Resource Locators (URLs) in documents found on the World Wide Web are identified continuously. Upon encountering unprocessed URLs, the plurality of web crawlers (112) pass the unprocessed URLs to the web crawler controller (110) for processing.
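The tokenize, transform and convert steps above can be sketched as follows. The patent does not define a rule syntax, so this sketch assumes each data transformation rule is a regular expression over the normalized token stream paired with an RDF-style predicate and object; the `ex:` prefix, the rules in `RULES` and the page text in the test are hypothetical.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical data transformation rules: (regex over normalized tokens,
# predicate, object). The rule format is an assumption for illustration.
RULES: list[tuple[str, str, str]] = [
    (r"\bhalal\b", "ex:isHalal", "true"),
    (r"\bno alcohol\b", "ex:servesAlcohol", "false"),
    (r"\bthai (food|cuisine|restaurant)\b", "ex:cuisine", "Thai"),
]


def tokenize(text: str) -> list[str]:
    """Split extracted text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def apply_rules(subject: str, tokens: list[str], rules: list[tuple[str, str, str]]) -> set[Triple]:
    """Apply every rule to the re-joined token stream and emit RDF-style triples."""
    normalized = " ".join(tokens)
    return {
        Triple(subject, predicate, obj)
        for pattern, predicate, obj in rules
        if re.search(pattern, normalized)
    }
```

Matching against the re-joined tokens rather than the raw text means punctuation and case differences ("Halal," versus "halal") do not prevent a rule from firing.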
Each web crawler (112) processes its allocated web pages in a non-intrusive manner. Figure 2 shows a method performed by a web crawler (112) to process a web page. First, the document or web page is classified to determine its relevance. Sentence extraction is then performed by identifying a targeted sentence in the document or web page and extracting that sentence. The extracted sentence is tokenized so that a set of predetermined data transformation rules can be applied to it. The tokenized sentence is then converted to metadata such as RDF data, which is stored in the web crawler's local semantic web crawler database (201).
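The first two per-page steps, relevance classification and targeted sentence extraction, can be sketched as below. The description does not specify a classifier, so a simple keyword-overlap heuristic is swapped in here as an assumption: a page counts as relevant when at least `min_hits` distinct (expanded) concepts occur in it, and a sentence is targeted when it mentions any concept. The sentence splitter is deliberately naive.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase word set used for concept matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_relevant(text: str, concepts: set[str], min_hits: int = 2) -> bool:
    """Heuristic stand-in for page classification: count distinct concepts present."""
    return len(_words(text) & concepts) >= min_hits


def target_sentences(text: str, concepts: set[str]) -> list[str]:
    """Keep only sentences that mention at least one concept."""
    return [s for s in split_sentences(text) if _words(s) & concepts]
```

Only the sentences returned by `target_sentences` would go on to tokenization and rule application, which keeps irrelevant page content out of the crawler's local database (201).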
When there are no more URLs to be assigned to any web crawler (112), an intermediary database (114) such as a merger database retrieves the RDF data collected in each local semantic web crawler database (201) and merges the RDF data into the semantic web database (116).
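Because RDF data is a set of triples, the merge step above can be sketched as a set union, which automatically stores a fact found by several crawlers only once. This is a hypothetical sketch: each local semantic web crawler database (201) and the global semantic web database (116) are modelled as in-memory Python sets, whereas a real deployment would use a persistent triple store.

```python
from collections.abc import Iterable

# (subject, predicate, object); an in-memory stand-in for RDF triples.
Triple = tuple[str, str, str]


def merge_local_stores(global_store: set[Triple], local_stores: Iterable[set[Triple]]) -> int:
    """Merge every crawler's local triple set into the global store in place.

    Returns how many triples were new to the global store; duplicates
    collected by more than one crawler are stored once.
    """
    before = len(global_store)
    for local in local_stores:
        global_store |= local
    return len(global_store) - before
```

Returning the count of new triples gives the merger a cheap way to report how much each crawling round added.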
Upon merging all RDF data into the semantic web database (116), the system (100) is ready to respond to natural language or structured natural language queries. A method of querying a semantic web supported by a plurality of web crawlers (112) in a World Wide Web upon declaring a plurality of trustworthy websites is now described, as seen in Figure 1. The method includes the steps of receiving a query from a user and parsing the query into an internal representation format. A semantic browser (120) is used to receive the query from the user, and a query parser is used to parse the query. The internal representation format is then passed to a semantic search engine, which searches the semantic web database (116) using it. The semantic web database (116) responds with a set of answers to the user's query, as well as references to a plurality of websites where the set of answers may be found. The plurality of websites is ranked based on trustworthiness by a page-ranker, and the ranked plurality of websites is then returned to the user who issued the query.
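The parse, search and rank steps of the querying method can be sketched end to end over an in-memory triple set. Everything here is an assumption for illustration: the internal representation format is taken to be a set of required (predicate, object) constraints, the phrase table `PHRASES` stands in for a real natural language query parser, and `source_site` records which website each subject was extracted from.

```python
import re

Triple = tuple[str, str, str]
Constraint = tuple[str, str]

# Hypothetical phrase table mapping query phrases to (predicate, object) constraints.
PHRASES: dict[str, Constraint] = {
    "thai": ("ex:cuisine", "Thai"),
    "halal": ("ex:isHalal", "true"),
    "no alcohol": ("ex:servesAlcohol", "false"),
}


def parse_query(query: str) -> set[Constraint]:
    """Parse a natural language query into a set of required constraints."""
    text = " ".join(re.findall(r"[a-z0-9]+", query.lower()))
    return {c for phrase, c in PHRASES.items() if re.search(rf"\b{re.escape(phrase)}\b", text)}


def search(store: set[Triple], constraints: set[Constraint]) -> set[str]:
    """Return subjects whose facts satisfy every constraint (nothing for an empty query)."""
    if not constraints:
        return set()
    facts_by_subject: dict[str, set[Constraint]] = {}
    for subject, predicate, obj in store:
        facts_by_subject.setdefault(subject, set()).add((predicate, obj))
    return {s for s, facts in facts_by_subject.items() if constraints <= facts}


def rank_by_trust(subjects: set[str], source_site: dict[str, str], trust: dict[str, float]) -> list[str]:
    """Map matching subjects to their source websites, most trustworthy first."""
    sites = {source_site[s] for s in subjects if s in source_site}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(sites, key=lambda site: (-trust.get(site, 0.0), site))
```

Applied to a query phrased like the one in the Figure 3 example, this sketch extracts only the cuisine, halal and alcohol constraints; price ("not too expensive") and location ("near Jurong Point") would need additional, richer rules that are not modelled here.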
An example of the described embodiment of the system (100) and method is seen in Figure 3. A typical website called "Bali Thai" is identified. The plurality of web crawlers (112) then process the website and construct RDF data, as seen in Figure 3. A user may then issue a query as follows:
QUERY: Find me a Thai restaurant that is Halal, not too expensive, no alcohol served, near Jurong Point Shopping Centre in Singapore.
The system (100) then responds with a reference to the website entitled "Bali Thai". It is to be appreciated that it is not possible to obtain meaningful results such as those seen in this embodiment by issuing a natural language query of this nature to a conventional search engine. The meaningful results in this embodiment are specific websites that contain the information searched for by the user.
Each individual web crawler (112) performs data extraction and data transformation of websites locally by using concepts and rules. Local semantic web databases are created and then merged into the global semantic web database (116). This architecture makes the system (100) inherently scalable, which is a critical requirement for creating a semantic web database for the World Wide Web. Ontology-based concepts and rules are used to automatically extract data of interest from websites. It is to be understood that the term "websites" in this description includes web pages, documents, messages and other information in a text format. The formats of all concepts and rules used are compatible with World Wide Web Consortium (W3C) ontology standards, so that they can be created or edited using commercially available ontology editing tools or a specialized editor.
The described method and system can be used to transform data of interest into an RDF knowledge representation to enable downstream semantic search, knowledge discovery and process automation. The described system and method are therefore able to transform natural language queries into appropriate internal semantic queries. This produces results that directly answer the query, rather than relying on users to check each search result for relevancy. The described invention can be applied to, but is not restricted to, creating a knowledge database for any domain from any unstructured data source of semantically unintelligible documents, transforming said documents into a computer-understandable semantic database in order to perform knowledge discovery and semantic information search efficiently and accurately.
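Because the stored metadata is RDF and the formats are meant to be W3C-compatible, one straightforward way to exchange the collected triples is the W3C N-Triples serialization, one `<subject> <predicate> object .` line per statement. The sketch below is an assumption, not the patent's stated format: `Statement` and `to_ntriples` are hypothetical names, IRIs are assumed to be absolute and already valid (they are not checked), and only plain string literals are supported.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str              # absolute IRI, assumed valid
    predicate: str            # absolute IRI, assumed valid
    obj: str                  # IRI or plain string literal
    obj_is_iri: bool = False


def _escape_literal(value: str) -> str:
    """Escape backslash, double quote, newline and carriage return for a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(statements: list[Statement]) -> str:
    """Serialize statements as N-Triples text, one terminated line per statement."""
    lines = []
    for st in statements:
        obj = f"<{st.obj}>" if st.obj_is_iri else f'"{_escape_literal(st.obj)}"'
        lines.append(f"<{st.subject}> <{st.predicate}> {obj} .")
    return "".join(line + "\n" for line in lines)
```

N-Triples is line-based, so output produced this way by separate crawlers can be concatenated before merging, which fits the local-then-global database design described above.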
Claims
A semantic web constructor system (100) supported by a plurality of web crawlers (1 12) in a World Wide Web upon declaring a plurality of trustworthy websites, characterized in that, the system (100) includes:
i. at least one web crawler controller (1 10) engage able to manage the plurality of web crawlers (1 12);
ii. a semantic web database (1 16) connectable to the plurality of web crawlers (1 12); and
iii. a plurality of data building editors (122, 124, 126) connectable to the at least one web crawler controller (110)
wherein a semantic browser (120) is further connectable to the semantic web database (1 16) to receive at least one natural language query from at least one user.
The system (100) as claimed in claim 1, wherein a trust engine (118) is connectable to the at least one web crawler controller (110).
The system (100) as claimed in claim 1, wherein the plurality of data building editors (122, 124, 126) further include a website list editor (122), a rule editor (124) and a concept editor (126).
The system (100) as claimed in claim 1, wherein the plurality of web crawlers (112) is a plurality of spiders.
The system (100) as claimed in claim 1, wherein the at least one web crawler controller (110) is at least one spider controller.
The system as claimed in claim 1, wherein an intermediary database (114) is connectable to the plurality of web crawlers (112) and the semantic web database (116).
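The dependent claim above places an intermediary database (114) between the crawlers and the semantic web database (116). One plausible reading, assumed here rather than stated in the claims, is a staging area where raw page text waits until it has been transformed. The sketch uses Python's built-in `sqlite3` module; the table and column names are hypothetical.

```python
import sqlite3

# In-memory stand-in for the intermediary database (114).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staged_pages ("
    " url TEXT PRIMARY KEY,"
    " body TEXT NOT NULL,"
    " processed INTEGER NOT NULL DEFAULT 0)"
)


def stage(url, body):
    """Crawler side: store raw text; re-crawling a URL replaces the old copy."""
    conn.execute(
        "INSERT OR REPLACE INTO staged_pages (url, body, processed) VALUES (?, ?, 0)",
        (url, body),
    )


def take_unprocessed():
    """Transformer side: fetch pending pages in URL order and mark them processed."""
    rows = conn.execute(
        "SELECT url, body FROM staged_pages WHERE processed = 0 ORDER BY url"
    ).fetchall()
    conn.executemany(
        "UPDATE staged_pages SET processed = 1 WHERE url = ?",
        [(url,) for url, _ in rows],
    )
    return rows
```

Decoupling the two sides this way lets crawlers keep fetching while transformation into the semantic web database runs at its own pace; a second call to `take_unprocessed()` returns only pages staged since the first.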
A method of constructing a semantic web supported by a plurality of web crawlers (112) in a World Wide Web upon declaring a plurality of trustworthy websites, characterized in that, the method includes the steps of:
i. crawling the web to select unprocessed websites;
ii. selecting websites based on trustworthiness calculated by comparing a trustworthiness numeric value to a predetermined threshold value;
iii. extracting a plurality of text from the selected websites;
iv. tokenizing the extracted plurality of text;
v. applying a predetermined set of data transformation rules to the tokenized extracted plurality of text;
vi. converting the tokenized extracted plurality of text to metadata; and
vii. storing the metadata in the semantic web database (116).
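The steps of the construction method can be strung together roughly as follows. Everything concrete here is an assumption: the trust scores and the 0.7 threshold are invented, a page is treated as trustworthy when its score is at or above the threshold, tokenization is a simple regular-expression split, and the single transformation rule turns the pattern "X is the capital of Y" into one triple. Crawling and text extraction (steps i and iii) are replaced by pages already given as plain text.

```python
import re

TRUST_THRESHOLD = 0.7  # hypothetical predetermined threshold value

# Hypothetical crawl output: url -> (trustworthiness value, extracted text)
PAGES = {
    "https://trusted.example/": (0.9, "Kuala Lumpur is the capital of Malaysia."),
    "https://dubious.example/": (0.2, "Atlantis is the capital of Malaysia."),
}

# Hypothetical data transformation rule applied to the joined token stream.
CAPITAL_RULE = re.compile(r"^(.+) is the capital of (.+)$")


def tokenize(text):
    """Step iv: split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)


def apply_rules(tokens):
    """Steps v and vi: match the rule and convert the match into a metadata triple."""
    m = CAPITAL_RULE.match(" ".join(tokens))
    if m is None:
        return []
    city, country = (g.replace(" ", "") for g in m.groups())
    return [("ex:" + country, "ex:capital", "ex:" + city)]


def build(pages, threshold=TRUST_THRESHOLD):
    """Select trusted pages, tokenize, transform, and collect the resulting triples."""
    store = []  # stands in for the semantic web database (116), step vii
    for url, (trust, text) in pages.items():
        if trust < threshold:  # step ii: compare trust value with the threshold
            continue
        store.extend(apply_rules(tokenize(text)))
    return store
```

With the sample data, only the trusted page contributes a triple; lowering `threshold` admits the contradictory claim from the dubious page as well, which is why the selection step matters.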
The method as claimed in claim 8, wherein the metadata used is Resource Description Framework (RDF) data.
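Since this claim names RDF as the metadata format, a triple produced by the construction step could be written out in the W3C N-Triples syntax. This sketch formats the lines by hand to stay free of third-party libraries; the `http://example.org/` namespace and the `Literal` marker type are placeholders, not from the patent, and production code would more likely use an RDF library such as rdflib.

```python
from typing import NamedTuple

BASE = "http://example.org/"  # placeholder namespace, not from the patent


class Literal(NamedTuple):
    """Marks an object value as a plain string literal rather than an IRI."""
    value: str


def iri(local_name):
    """Expand a local name (assumed IRI-safe) into an absolute IRI in angle brackets."""
    return "<" + BASE + local_name + ">"


def quote_literal(text):
    """Quote a string literal, escaping characters N-Triples forbids unescaped."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for s, p, o in triples:
        obj = quote_literal(o.value) if isinstance(o, Literal) else iri(o)
        lines.append(f"{iri(s)} {iri(p)} {obj} .")
    return "".join(line + "\n" for line in lines)
```

For example, `to_ntriples([("KualaLumpur", "name", Literal("Kuala Lumpur"))])` yields the single line `<http://example.org/KualaLumpur> <http://example.org/name> "Kuala Lumpur" .` followed by a newline.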
A method of querying a semantic web supported by a plurality of web crawlers (112) in a World Wide Web upon declaring a plurality of trustworthy websites, characterized in that, the method includes the steps of:
i. receiving a query from a user;
ii. parsing the query into an internal representation format;
iii. searching the semantic web database (116) using the internal representation format;
iv. ranking a plurality of websites based on trustworthiness; and
v. returning the ranked plurality of websites to the user.
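The query method ends by ranking websites by trustworthiness. A minimal sketch, assuming each stored triple keeps the URL it came from and that a per-site trust score is available (both assumptions; the claim does not say how provenance or scores are stored): matching triples are collected, their source sites are deduplicated, and the sites are returned in descending order of trust. The parsed query is taken as an already-built triple pattern with `None` as a wildcard; all data is hypothetical.

```python
# Hypothetical store: (subject, predicate, object, source URL)
STORE = [
    ("ex:Malaysia", "ex:capital", "ex:KualaLumpur", "https://a.example/"),
    ("ex:Malaysia", "ex:capital", "ex:KualaLumpur", "https://b.example/"),
    ("ex:Japan", "ex:capital", "ex:Tokyo", "https://c.example/"),
]

# Hypothetical trustworthiness score per website
TRUST = {
    "https://a.example/": 0.75,
    "https://b.example/": 0.95,
    "https://c.example/": 0.8,
}


def matches(pattern, triple):
    """True if every non-None slot of the pattern equals the triple's slot."""
    return all(want is None or want == have for want, have in zip(pattern, triple))


def search_and_rank(pattern, store=STORE, trust=TRUST):
    """Search with a triple pattern, then rank the distinct source sites by trust."""
    sources = {url for (*triple, url) in store if matches(pattern, triple)}
    # Highest trust first; ties broken by URL so the order is deterministic.
    return sorted(sources, key=lambda url: (-trust.get(url, 0.0), url))
```

Sites with no known score default to 0.0 and therefore sink to the bottom of the ranking rather than being dropped.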
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
MYPI2010006268 | 2010-12-28 | | |
MYPI2010006268A MY176053A (en) | 2010-12-28 | 2010-12-28 | A semantic web constructor system and a method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012091541A1 (en) | 2012-07-05 |
Family
ID=46383351
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/MY2011/000153 WO2012091541A1 (en) | 2010-12-28 | 2011-06-24 | A semantic web constructor system and a method thereof |
Country Status (2)
Country | Link |
---|---|
MY (1) | MY176053A (en) |
WO (1) | WO2012091541A1 (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5642502A (en) * | 1994-12-06 | 1997-06-24 | University Of Central Florida | Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text |
US5694592A (en) * | 1993-11-05 | 1997-12-02 | University Of Central Florida | Process for determination of text relevancy |
WO2002031705A1 (en) * | 2000-10-10 | 2002-04-18 | Science Applications International Corporation | Method and system for facilitating the refinement of data queries |
US20050010553A1 (en) * | 2000-10-30 | 2005-01-13 | Microsoft Corporation | Semi-automatic annotation of multimedia objects |
US7117207B1 (en) * | 2002-09-11 | 2006-10-03 | George Mason Intellectual Properties, Inc. | Personalizable semantic taxonomy-based search agent |
US20070050338A1 (en) * | 2005-08-29 | 2007-03-01 | Strohm Alan C | Mobile sitemaps |
US7603350B1 (en) * | 2006-05-09 | 2009-10-13 | Google Inc. | Search result ranking based on trust |
US20110022598A1 (en) * | 2009-07-24 | 2011-01-27 | Yahoo! Inc. | Mixing knowledge sources for improved entity extraction |
- 2010-12-28: MY application MYPI2010006268A (published as MY176053A), status unknown
- 2011-06-24: WO application PCT/MY2011/000153 (published as WO2012091541A1), active application filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156305A (en) * | 2016-06-30 | 2016-11-23 | 北京奇虎科技有限公司 | Represent the method and device of propelling data |
US10972306B2 (en) | 2016-11-23 | 2021-04-06 | Carrier Corporation | Building management system having event reporting |
US11586938B2 (en) | 2016-11-23 | 2023-02-21 | Carrier Corporation | Building management system having knowledge base |
KR20220094797A (en) * | 2020-12-29 | 2022-07-06 | 케이웨어 (주) | Data management server for managing metadata and control method thereof |
KR102597181B1 (en) | 2020-12-29 | 2023-11-02 | 케이웨어 (주) | Data management server for managing metadata and control method thereof |
CN112667606A (en) * | 2021-01-15 | 2021-04-16 | 中国科学院空天信息创新研究院 | Knowledge base system based on multi-source knowledge acquisition technology and construction method thereof |
Also Published As
Publication number | Publication date |
---|---|
MY176053A (en) | 2020-07-23 |
Similar Documents
Publication | Title |
---|---|
CN108763333B (en) | Social media-based event map construction method |
US9864808B2 (en) | Knowledge-based entity detection and disambiguation |
US9378285B2 (en) | Extending keyword searching to syntactically and semantically annotated data |
US8051080B2 (en) | Contextual ranking of keywords using click data |
KR101040119B1 (en) | Apparatus and Method for Search of Contents |
KR20130060720A (en) | Apparatus and method for interpreting service goal for goal-driven semantic service discovery |
WO2010014082A1 (en) | Method and apparatus for relating datasets by using semantic vectors and keyword analyses |
KR20160007040A (en) | Method and system for searching by using natural language query |
CN102200975A (en) | Vertical search engine system and method using semantic analysis |
US20120130999A1 (en) | Method and Apparatus for Searching Electronic Documents |
WO2012091541A1 (en) | A semantic web constructor system and a method thereof |
JP4864095B2 (en) | Knowledge correlation search engine |
Zhang et al. | Topic level disambiguation for weak queries |
Alafif et al. | Domain and range identifier module for semantic web search engines |
Kumar et al. | An efficient and optimized sematic web enabled framework (EOSWEF) for Google search engine using ontology |
Sahu et al. | Analytical study on intelligent information retrieval system using semantic network |
Czerski et al. | What NEKST?—semantic search engine for polish internet |
Al-Hamami et al. | Development of an opinion blog mining system |
Prabhumoye et al. | Automated query analysis techniques for semantics based question answering system |
Kolthoff et al. | Automated retrieval of graphical user interface prototypes from natural language requirements |
JP2006236254A (en) | Community-dependent information retrieval system and method |
Khattak et al. | Intelligent search in digital documents |
Mountantonakis et al. | Linking Entities from Text to Hundreds of RDF Datasets for Enabling Large Scale Entity Enrichment. Knowledge 2022, 2, 1–25 |
Chun et al. | Semantic annotation and search for deep web services |
Pardakhe et al. | Enhancement of web search engine results using keyword frequency based ranking |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11852653; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 11852653; Country of ref document: EP; Kind code of ref document: A1 |