CN101169780A - Semantic ontology retrieval system and method - Google Patents

Semantic ontology retrieval system and method

Publication number
CN101169780A
Authority
CN
China
Prior art keywords
semantic
index
text
file
semantic body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2006101498039A
Other languages
Chinese (zh)
Inventor
王伟
舒琦
方琦
钟杰萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CNA2006101498039A
Publication of CN101169780A

Abstract

An embodiment of the invention discloses a retrieval system based on a semantic ontology. The system comprises a semantic ontology index database and a semantic ontology search processing unit. The semantic ontology search processing unit obtains a text hit file list and matches it against the semantic ontology index in the semantic ontology index database to obtain a document semantic classification table. The retrieval system can thus identify the semantic information of the retrieved files and present the semantic classification in the search results. The invention also discloses a retrieval method based on a semantic ontology: a semantic ontology index is built for each file that already has a text index and, when a user searches, semantic ontology index matching is performed on the text matching result, so that the final output presents a semantic classification on top of the conventional text matching result, which makes searching easier for the user.

Description

A retrieval system and method based on a semantic ontology
Technical field
The present invention relates to information retrieval technology, and in particular to a retrieval system and method based on a semantic ontology.
Background art
With the rapid development of retrieval technology, text-based information retrieval has gradually matured. A complete set of approaches and well-developed algorithms has formed and is widely used in search engines such as Google, AltaVista, Lycos and Yahoo.
Fig. 1 is a structural block diagram of an existing text search engine. As shown in Fig. 1, the existing text search engine comprises: a spider control module 101, a uniform resource locator (URL) database 102, a web spider 103, a URL extraction module 104, a web page database 105, a link information extraction module 106, a text index module 107, a link database 108, an index database 109, a web page rating module 110 and a query server 111.
The web spider 103 fetches web pages from the Internet and sends them to the web page database 105. The URL extraction module 104 extracts URLs from the web pages fetched by the web spider 103 and sends them to the URL database 102. The spider control module 101 obtains web page URLs from the URL database 102 and directs the web spider 103 to fetch further web pages; these steps are repeated until all web pages have been fetched.
The system obtains text information from the web page database 105 and sends it to the text index module 107, which builds an index and sends it to the index database 109. Meanwhile, the link information extraction module 106 obtains link information from the web page database 105 and sends it to the link database 108. The link information in the link database 108 gives the web page rating module 110 its basis for rating web pages.
When a user submits a query request through the query server 111, the query server 111 looks up the web pages relevant to the query request in the index database 109. At the same time, the web page rating module 110 evaluates the relevance of the search results by combining the user's query request with the link information in the link database 108. The query server 111 then sorts the search results by relevance and assembles the final page returned to the user.
Although existing text retrieval technology can find files that contain the user's text query, it cannot identify the content and meaning of the files it finds. This is because existing text retrieval is based on text string matching. When different words express the same meaning, or one word has different meanings in different contexts, the precision and recall of retrieval are limited, and the results fall far short of the user's needs. For example, when the user's search keyword is "paradise", the engine cannot tell whether a file that meets the search condition is about "paradise games" or "paradise music". The Semantic Web offers an opportunity to solve these problems.
The Semantic Web is a network of web pages whose content can be automatically processed and recognized by computers. It extends web pages on the existing Internet with computer-recognizable data and adds documents intended specifically for computers; that is, web pages are annotated with an ontology language to make their semantics explicit, so that web information can be understood not only by people but also processed and recognized automatically by computers. A semantically annotated web page generally marks up its data with Extensible Markup Language (XML) or Hypertext Markup Language (HTML), uses the Resource Description Framework (RDF) as the data description model, and combines these with a semantic ontology so that the annotated data has explicit semantics. Ontology is a concept that originates in philosophy, where it refers to the theory of existence and its essence and laws; it was later introduced into the field of artificial intelligence, where it refers to an explicit specification of a conceptualization. An ontology can express the concepts of a domain and their relationships explicitly and formally, so that the semantics of terms are expressed explicitly, and it therefore plays an important role in semantic query. Here, a semantic ontology defines the basic terms that make up the concepts of a subject area and the relationships among them, and specifies the rules for combining these terms and relationships to extend the vocabulary.
The purpose of semantic retrieval is to enhance and improve traditional search results using data obtained from the Semantic Web. Fig. 2 is a structural block diagram of an existing semantic search system. As shown in Fig. 2, the existing semantic search system comprises: a query interface 201, a query preprocessing module 202, a semantic ontology inference engine 203, an annotation ontology library 204, a traditional search module 205 and a result return interface 206.
The query interface 201 obtains the user's query information and sends it to the query preprocessing module 202.
The query preprocessing module 202 analyzes the user's query information, segments it into query keywords using word segmentation, and sends them to the semantic ontology inference engine 203.
Based on the ontology concept vocabulary defined in the annotation ontology library 204 and the relationships between concepts, the semantic ontology inference engine 203 matches and infers the ontology concept vocabulary corresponding to the query keywords and returns it to the query preprocessing module 202.
The query preprocessing module 202 sends the ontology concept vocabulary returned by the semantic ontology inference engine 203 to the traditional search module 205, with an instruction to search by semantics. Searching by semantics means that, where web pages carry semantic annotations, string matching is performed against the semantic concepts annotated on the web pages rather than directly against the content of the web pages themselves.
The traditional search module 205 performs the semantic search and sends the search results to the result return interface 206, which returns them to the user.
As can be seen, the above semantic search system matches the user's query keywords against the semantic concept vocabulary annotated on web pages.
In summary, although existing text retrieval technology can find files containing the query keywords, it cannot identify the semantic information of those files. Existing semantic retrieval technology, for its part, no longer performs keyword retrieval, so the files it finds include too many results that do not match the user's query information, and the efficiency of matching user query keywords against semantic concept vocabulary is also unsatisfactory. The search accuracy of existing retrieval technology is therefore not high.
Summary of the invention
In view of this, the main purpose of the embodiments of the invention is to provide a retrieval system based on a semantic ontology so as to improve search accuracy.
Another purpose of the embodiments of the invention is to provide a retrieval method based on a semantic ontology so as to improve search accuracy.
To achieve the above purposes, the technical solution of the invention is realized as follows:
An embodiment of the invention discloses a retrieval system based on a semantic ontology, the system comprising:
a semantic ontology index database, used to store the semantic ontology index; and
a semantic ontology search processing unit, used to obtain a text hit file list and to match the text hit file list against the semantic ontology index in the semantic ontology index database to obtain a document semantic classification table.
An embodiment of the invention also discloses a retrieval method based on a semantic ontology, the method comprising the following steps:
A. obtaining the files for which a text index has been built, and building a semantic ontology index for the obtained files;
B. obtaining a text hit file list, and performing semantic ontology index matching on the text hit file list to obtain a document semantic classification table.
Accordingly, the retrieval system and method based on a semantic ontology provided by the embodiments of the invention have the following advantages: a semantic ontology index is first built for the files that already have a text index; when a user searches, text index matching is performed on the text query information entered by the user to obtain a text hit file list, and semantic ontology index matching is then performed on that list to obtain a document semantic classification table. The text retrieval result thus carries semantic classification information, which improves search accuracy.
Description of drawings
Fig. 1 is a structural block diagram of an existing text search engine;
Fig. 2 is a structural block diagram of an existing semantic search system;
Fig. 3 is a structural block diagram of a retrieval system based on a semantic ontology according to an embodiment of the invention;
Fig. 4 is a flowchart of the semantic ontology index processing unit building the semantic ontology index in an embodiment of the invention;
Fig. 5 is a flowchart of the retrieval system shown in Fig. 3 performing a search for a user in an embodiment of the invention;
Fig. 6 is a schematic diagram of two resource descriptions defined in an embodiment of the invention;
Fig. 7 is a schematic diagram of the result inferred from Fig. 6;
Fig. 8 is a relationship diagram of the semantic ontology vocabulary set up in the annotation ontology library in an embodiment of the invention;
Fig. 9 is the relationship diagram of the semantic ontology vocabulary of Fig. 8 after inference.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 3 is a structural block diagram of a retrieval system based on a semantic ontology according to an embodiment of the invention. As shown in Fig. 3, the system comprises: a search interface module 301, a document semantic classification rule engine 302, a search processing module 303, a semantic ontology inference engine 304, an annotation ontology library 305, an index database 306, an index processing module 307, a document database 308 and a network file fetching module 309. The search processing module 303 comprises a text search processing unit 310, a semantic ontology search processing unit 311 and a ranking processing unit 312; the index database 306 comprises a text index 315 and a semantic ontology index 316; and the index processing module 307 comprises a text index processing unit 313 and a semantic ontology index processing unit 314.
The network file fetching module 309 is mainly responsible for fetching web pages from the Internet and saving them in the document database 308. It is generally a web page fetching program, such as a "web robot" or "web spider", that traverses the page space, scans the websites within a certain range of Internet Protocol (IP) addresses, and follows links from one web page to another and from one website to another to collect network files.
The document database 308 stores the files available for user retrieval, including audio files, video files and text files. These files may be network files or non-network files. Every file in the document database 308 has a unique file identifier (DocID).
The index processing module 307 is mainly responsible for analyzing the files stored in the document database 308, extracting keywords from file content, filtering out duplicate files and so on, and building different types of index information for the files in the document database 308. The index processing module 307 comprises the text index processing unit 313 and the semantic ontology index processing unit 314.
The text index processing unit 313 is a conventional unit for building a text index: it analyzes file content, extracts keywords and the file's identification information, and builds the text index. Since the conventional flow for building a text index is mature prior art, it is not described again here.
The semantic ontology index processing unit 314 is responsible for building a semantic ontology index for files that already have a text index. It first analyzes a file for which a text index has been built and determines whether the file contains semantic annotation information; if so, it extracts the relevant semantic annotation information and the file identification information and builds the semantic ontology index for the file.
The index database 306 stores the index information built by the index processing module 307, namely the text index 315 built by the text index processing unit 313 and the semantic ontology index 316 built by the semantic ontology index processing unit 314.
The search processing module 303 is responsible for processing the user's query request. By matching the user's text query information against the index information of the files, it feeds the files that meet the query condition back to the user in a certain sort order. The search processing module 303 comprises the text search processing unit 310, the semantic ontology search processing unit 311 and the ranking processing unit 312.
The text search processing unit 310 is responsible for matching the text query information entered by the user against the text index 315 and finding the identification information of the text hit files that meet the query condition.
The semantic ontology search processing unit 311 is responsible for matching the text hit file identification information produced by the text search processing unit 310 against the semantic ontology index 316, classifying the text hit files by semantics, and obtaining the document semantic classification table.
The annotation ontology library 305 and the semantic ontology inference engine 304 are responsible for performing semantic reasoning on the ontology concept vocabulary set in the document semantic classification table produced by the semantic ontology search processing unit 311, yielding an extended semantic ontology vocabulary set. The annotation ontology library 305 stores the defined semantic ontology concept vocabulary set and the relationships among the semantic ontology concepts; the semantic ontology inference engine 304 defines the inference rules and carries out the reasoning.
The document semantic classification rule engine 302 triggers its own semantic classification rules according to what the semantic ontology inference engine 304 infers, and extends and consolidates the document semantic classification table.
The ranking processing unit 312 is responsible for optimizing the order of the final results. For the document semantic classification table obtained after a series of processing steps, such as text index matching, semantic ontology index matching and semantic reasoning extension, it computes the relevance and importance of the documents, sorts the retrieved files according to the computed results, and feeds them back to the search interface module 301.
The search interface module 301 is responsible for the interaction between the system and the user: it forwards the text query information entered by the user to the search processing module 303 and feeds the ranking results of the ranking processing unit 312 back to the user.
The text index 315 stored in the index database 306 comprises a text forward index and a text inverted index. Table 1 is the text forward index table and Table 2 is the text inverted index table:
Table 1
File identifier (DocID)    Keywords
1                          paradise, music, ...
2                          application, software, ...
3                          application, ...
4                          paradise, game, ...
...                        ...
Table 2
Keyword        File identifier sequence (DocID)
paradise       1, 4, ...
application    2, 3, ...
...            ...
As the two tables show, the text forward index uses the file identifier as the key and maps each file identifier to its keywords, whereas the text inverted index uses the keyword as the key and maps each keyword to the file identifiers.
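The relationship between the two tables can be sketched in a few lines of Python. This is only an illustration, assuming the keyword lists of Table 1 with the elided entries ("...") left out; the function name `invert` is hypothetical and does not come from the patent.

```python
from collections import defaultdict

# Forward index from Table 1; entries elided with "..." in the source are omitted.
forward_index = {
    1: ["paradise", "music"],
    2: ["application", "software"],
    3: ["application"],
    4: ["paradise", "game"],
}

def invert(forward):
    """Turn a DocID -> keywords mapping into a keyword -> sorted DocID list mapping."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward):
        for keyword in forward[doc_id]:
            inverted[keyword].append(doc_id)
    return dict(inverted)

inverted_index = invert(forward_index)
print(inverted_index["paradise"])     # DocIDs whose keywords include "paradise"
print(inverted_index["application"])  # DocIDs whose keywords include "application"
```

The same inversion applies unchanged when the values are semantic identifiers instead of keywords.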
Likewise, the semantic ontology index 316 stored in the index database 306 comprises a semantic ontology forward index and a semantic ontology inverted index. Table 3 is the semantic ontology forward index table and Table 4 is the semantic ontology inverted index table:
Table 3
File identifier (DocID)    Semantic identifier
1                          Pop music
2                          Classical music
3                          Novel
4                          Computer game
5                          Pop music
...                        ...
Table 4
Semantic identifier    File identifier sequence (DocID)
Pop music              1, 5, ...
Classical music        2, ...
Novel                  3, ...
Computer game          4, ...
...                    ...
The semantic ontology forward index uses the file identifier as the key and maps each file identifier to its semantic identifiers, whereas the semantic ontology inverted index uses the semantic identifier as the key and maps each semantic identifier to the file identifiers.
Fig. 4 is a flowchart of the semantic ontology index processing unit 314 building the semantic ontology index 316 in an embodiment of the invention. The semantic ontology index is built on top of the text index built by the text index processing unit; the trigger condition is that the text index processing unit 313 has built a text index for a file. Referring to Fig. 4, building the semantic ontology index comprises the following steps:
Step 401: the semantic ontology index processing unit 314 first reads a file that has been processed by the text index processing unit 313 and has a text index.
Step 402: the semantic ontology index processing unit 314 determines whether the file read has been semantically annotated. If it has, step 403 is executed; otherwise the flow of building a semantic ontology index for this file ends.
The difference between a semantically annotated file and one that is not is that the annotated file has ontology concept mapping information. For example, suppose a web page has file identifier 9 and address http://grids.ucs.indiana.edu/ptliupages/publications/index.html, and its content mainly describes matters to note when doing research; this web page can then be annotated with the concept "Research". Some existing semantic annotation information is embedded in web pages as comments, and some as XML packets. This example gives semantic annotation information in comment form, annotated with OntoMat, a text annotation tool from Stanford University:
<html>
<head>
<!--<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">
<Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>
</rdf:RDF>
-->
<title>Community Grids Publications</title>
This example states that the content of the web page http://grids.ucs.indiana.edu/ptliupages/publications/index.html is mainly about "Research". For a web page annotated with the OntoMat tool, the semantic annotation information is placed in a comment in the HTML head, beginning with <rdf:RDF and ending with </rdf:RDF>. Therefore, when the semantic ontology index processing unit 314 detects semantic annotation information that begins with <rdf:RDF and ends with </rdf:RDF>, it determines that the web page file has been semantically annotated.
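The detection in step 402 can be illustrated with a short Python sketch. It assumes a well-formed version of the annotation and, for simplicity, scans the whole page rather than only the HTML head; the names `RDF_BLOCK`, `extract_annotation` and `is_annotated` are hypothetical and are not taken from the patent.

```python
import re

# Matches an <rdf:RDF ...> ... </rdf:RDF> block, which the OntoMat example
# places inside an HTML comment in the page head.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)

def extract_annotation(html):
    """Return the first RDF annotation block in the page, or None if absent."""
    match = RDF_BLOCK.search(html)
    return match.group(0) if match else None

def is_annotated(html):
    """Step 402: a page counts as annotated if it carries an RDF block."""
    return extract_annotation(html) is not None

annotated_page = """<html>
<head>
<!--<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">
<Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>
</rdf:RDF>
-->
<title>Community Grids Publications</title>
</head></html>"""
plain_page = "<html><head><title>No annotation</title></head></html>"

print(is_annotated(annotated_page), is_annotated(plain_page))
```

A production unit would likely restrict the search to comments in the head and handle other annotation formats, which the source mentions but does not detail.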
Step 403: the semantic ontology index processing unit 314 reads the semantic annotation information of the file.
In this embodiment, the semantic ontology index processing unit 314 reads the semantic annotation information of the web page with file identifier 9, that is, the comment in the HTML head. Table 5 shows the format of the extracted semantic annotation information:
Table 5
File identifier (DocID)    Semantic annotation information
9                          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:daml="http://www.daml.org/2001/03/daml+oil#" xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#"> <Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/> </rdf:RDF>
Step 404: the semantic ontology index processing unit 314 extracts the semantic ontology concept vocabulary from the semantic annotation information it has read and builds the semantic ontology index.
In this embodiment, the semantic ontology index processing unit 314 calls a suitable RDF document processing application programming interface (API) to extract the semantic ontology concept word "Research" from the semantic annotation information, builds the semantic ontology forward index of web page 9, and at the same time converts it into a semantic ontology inverted index. Table 6 is the semantic ontology forward index of web page 9 and Table 7 is its semantic ontology inverted index:
Table 6
File identifier (DocID)    Semantic identifier
9                          Research
Table 7
Semantic identifier    File identifier (DocID)
Research               9
Step 405: the semantic ontology index processing unit 314 saves the semantic ontology forward index and the semantic ontology inverted index it has built in the index database 306, where they form the content of the semantic ontology index 316.
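Steps 403 to 405 can be sketched as follows. The patent does not name the RDF API it calls, so this sketch substitutes Python's standard `xml.etree.ElementTree` parser as a stand-in and assumes the annotation is well-formed XML; the helper names `concepts_from_annotation` and `index_file` and the in-memory dictionaries are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def concepts_from_annotation(rdf_xml):
    """Return the local names of the elements directly under <rdf:RDF>."""
    root = ET.fromstring(rdf_xml)
    if root.tag != "{%s}RDF" % RDF_NS:
        raise ValueError("not an rdf:RDF block")
    # ElementTree renders qualified names as "{namespace}local".
    return [child.tag.rsplit("}", 1)[-1] for child in root]

annotation = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
 xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">
<Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>
</rdf:RDF>"""

forward = {}                  # DocID -> semantic identifiers (as in Table 6)
inverted = defaultdict(list)  # semantic identifier -> DocIDs (as in Table 7)

def index_file(doc_id, rdf_xml):
    """Step 404: extract the concepts and record them in both indexes."""
    concepts = concepts_from_annotation(rdf_xml)
    forward[doc_id] = concepts
    for concept in concepts:
        inverted[concept].append(doc_id)

index_file(9, annotation)
print(forward, dict(inverted))
```

Persisting `forward` and `inverted` to the index database (step 405) is left out, since the source does not describe the storage format.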
The reason the processing of the text index processing unit 313 must come before the semantic ontology index is built is that, when a user searches, the files matching the user's text query information are found first, and semantic ontology index matching is then performed on those files. This ordering guarantees that every file that has a text index and carries semantic information has corresponding semantic ontology index information in the semantic ontology index 316. It also avoids the situation in which a file has a semantic ontology index but no text index, which could arise if files were read directly from the document database 308 for semantic ontology indexing.
Fig. 5 is a flowchart of the retrieval system shown in Fig. 3 performing a search for a user in an embodiment of the invention. As shown in Fig. 5, the flow comprises the following steps:
Step 501: the search interface module 301 obtains the text query information entered by the user and sends it to the search processing module 303. In this embodiment, the query information entered by the user is assumed to be "paradise".
Step 502: the search processing module 303 receives the text query information sent by the search interface module 301, performs word segmentation preprocessing on it, and sends the resulting query keywords to the text search processing unit 310.
The word segmentation process is described in the existing literature on search engines and is not repeated here. In this embodiment, the text query information "paradise" yields the keyword "paradise" after segmentation preprocessing.
Step 503: the text search processing unit 310 matches the segmented query keywords against the text inverted index and sends the resulting text hit file list to the semantic ontology search processing unit 311.
After receiving the query keywords, the text search processing unit 310 sends the index database 306 a request to read the text inverted index, and the index database 306 returns the text inverted index in the text index 315. The text search processing unit 310 matches the user query keyword "paradise" against the text inverted index to obtain the identifiers of the web page files containing this keyword, that is, the text hit file list, and sends the text hit file list to the semantic ontology search processing unit 311 for processing.
For simplicity, this embodiment assumes that only 20 files have been indexed. Table 8 is the text inverted index table that the index database 306 returns to the text search processing unit 310:
Table 8
Keyword        File identifier sequence
application    01011100100011011010
paradise       11011011111110001011
In Table 8, each row corresponds to one keyword and the file identifier sequence in which the keyword occurs. The total of 20 binary digits in a file identifier sequence represents the total number of indexed files. Each binary digit represents one file, and the position of the digit equals the file identifier: the first digit represents the file with identifier 1, the second digit the file with identifier 2, and so on. A digit of 0 means the keyword does not occur in the corresponding file; a digit of 1 means it does.
The text search processing unit 310 matches the user query keyword "paradise" to the keyword "paradise" in Table 8, takes the file identifier sequence that follows it, namely the text hit file list 11011011111110001011, and sends it to the semantic ontology search processing unit 311. The files whose binary digit is 1 in the text hit file list are the hit files.
Similarly, if the text query information entered by the user is "paradise application", segmentation preprocessing yields the keywords "paradise" and "application". It is then only necessary to match "paradise" and "application" separately in the text inverted index and apply an AND operation to their file identifier sequences, giving the result 01011000100010001010, in which a digit of 1 means that both "paradise" and "application" occur in the corresponding file.
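The bitmap encoding and the AND operation can be checked with a small Python sketch. It assumes the two sequences of Table 8 stored as bit strings; the function names `and_bitmaps` and `doc_ids` are hypothetical.

```python
def and_bitmaps(*bitmaps):
    """Bitwise AND of equal-length bit strings, returned as a bit string."""
    width = len(bitmaps[0])
    if any(len(b) != width for b in bitmaps):
        raise ValueError("bitmaps must have the same length")
    result = int(bitmaps[0], 2)
    for b in bitmaps[1:]:
        result &= int(b, 2)
    # Pad with leading zeros so the position of each digit is preserved.
    return format(result, "0{}b".format(width))

def doc_ids(bitmap):
    """DocIDs whose digit is 1; the leftmost digit is DocID 1."""
    return [pos for pos, bit in enumerate(bitmap, start=1) if bit == "1"]

text_inverted_index = {
    "application": "01011100100011011010",
    "paradise":    "11011011111110001011",
}

hits = and_bitmaps(text_inverted_index["paradise"],
                   text_inverted_index["application"])
print(hits)           # files containing both keywords, as a bitmap
print(doc_ids(hits))  # the same files as DocIDs
```

Converting to integers keeps the AND to a single machine-level operation per keyword, which is presumably why the embodiment stores posting lists as fixed-width bit sequences.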
Step 504: after obtaining the text hit file list, the semantic ontology search processing unit 311 first determines whether to perform semantic ontology inverted index matching.
The semantic ontology search processing unit 311 bases this decision on the number of text hit files. If the number of hit files is greater than a certain threshold, it performs semantic ontology inverted index matching and executes step 505; otherwise it performs semantic ontology forward index matching and executes step 506. The threshold may be stored in the semantic ontology search processing unit 311 as a preset value, or it may be a value that the retrieval system adjusts dynamically according to statistics or other conditions.
After receiving the text hit file list 11011011111110001011, the semantic ontology search processing unit 311 counts the 1s in this binary sequence and obtains 14; that is, the number of text hit files is 14. If the threshold is 10, then because 14 is greater than 10, semantic ontology inverted index matching is performed. If the threshold is 15, then because 14 is less than 15, semantic ontology forward index matching is performed.
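The decision in step 504 amounts to a count and a comparison, as the following minimal Python sketch shows. The function name `choose_matching` and the string labels it returns are hypothetical.

```python
def choose_matching(hit_list, threshold):
    """Return "inverted" when the hit count exceeds the threshold, else "forward"."""
    hit_count = hit_list.count("1")  # each 1 marks one hit file
    return "inverted" if hit_count > threshold else "forward"

hit_list = "11011011111110001011"
print(hit_list.count("1"))
print(choose_matching(hit_list, 10))
print(choose_matching(hit_list, 15))
```

A hit count equal to the threshold falls into the forward branch here, which follows the "greater than, otherwise" wording of the step.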
Step 505: the semantic ontology search processing unit 311 performs semantic ontology inverted index matching on the files in the text hit file list and obtains the document semantic classification table.
First, the semantic ontology search processing unit 311 sends the index database 306 a request message to read the semantic ontology inverted index, and the index database 306 returns the semantic ontology inverted index. The semantic ontology search processing unit 311 then reads each record in the semantic ontology inverted index in turn and intersects the file identifier sequence in the record with the text hit file list, that is, it applies a bitwise AND to the two binary sequences and overwrites the corresponding file identifier sequence in the semantic ontology inverted index table with the result. Finally, it filters out the records whose intersection is empty, and the original semantic ontology inverted index table becomes the document semantic classification table. Step 507 is then executed.
Table 9 is the semantic body inverted index table that index data base 306 returns to semantic body search processing 311 in the present embodiment, and is as shown in table 9:
Table 9
Semantic sign The file identification sequence
Pop music 01011010110001100000
Computer game 10100101000100001011
Classical music 00010000001011000001
Novel 10000000000000000110
The sports star 00000000000000010000
Suppose in the table 9 that 20 files setting up index only relate to five semantic Ontological concepts, promptly the sign of the semanteme in all files has five kinds.File identification sequence after each semantic sign is represented the situation that this Ontological concept occurs in 20 files.Its method for expressing is with the file identification sequence in the text inverted index, and each binary digit is represented a file, and the position number of binary digit is identical with the sign sequence number of file.If certain binary digit is 0, the expression corresponding file does not mark corresponding Ontological concept, if 1 expression has marked corresponding Ontological concept.For example the file identification sequence of pop music is 01011010110001100000, and the expression file identification is the notion that 2,4,5,7,9,10,14,15 file is marked into pop music, has reflected that the content of these files is relevant with pop music.
The semantic body search processing unit 311 reads each file identifier sequence in the semantic body inverted index shown in Table 9 and performs a bitwise AND with the text hit file list 11011011111110001011. The result is written to the position of the corresponding file identifier sequence in Table 9, overwriting the original sequence. Finally, the semantic identifier entries whose intersection is empty, i.e. whose result is all zeros, are filtered out, which yields the file semantic classification table shown in Table 10:
Table 10
Semantic identifier | File identifier sequence
Pop music | 01011010110000000000
Computer game | 10000001000100001011
Classical music | 00010000001010000001
Novel | 10000000000000000010
In this way, the text hit file list 11011011111110001011 has been classified by semantics.
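The inverted index matching walked through above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `match_inverted` and the dictionary layout are hypothetical, while the data is copied from Table 9 and the text hit file list of the embodiment.

```python
# Semantic body inverted index of Table 9: concept -> file identifier sequence.
INVERTED_INDEX = {
    "Pop music":       "01011010110001100000",
    "Computer game":   "10100101000100001011",
    "Classical music": "00010000001011000001",
    "Novel":           "10000000000000000110",
    "Sports star":     "00000000000000010000",
}
TEXT_HITS = "11011011111110001011"


def match_inverted(inverted_index, hits):
    """Bitwise-AND every record with the hit list and drop empty results."""
    width = len(hits)
    hit_mask = int(hits, 2)
    table = {}
    for concept, bits in inverted_index.items():
        common = int(bits, 2) & hit_mask
        if common:  # records whose intersection is all zeros are filtered out
            table[concept] = format(common, f"0{width}b")
    return table


for concept, bits in match_inverted(INVERTED_INDEX, TEXT_HITS).items():
    print(concept, bits)
```

Every record of the index is visited exactly once, which is the full table scan that the description later identifies as the cost of this method.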
Step 506: the semantic body search processing unit 311 performs semantic body forward index matching on the files in the text hit file list to obtain the file semantic classification table.
First, the semantic body search processing unit 311 sends the index database 306 a request message to read the semantic body forward index. Table 11 shows the semantic body forward index table that the index database 306 returns in response to this request:
Table 11
File identifier | Semantic identifiers
1 | Computer game, novel
2 | Pop music
3 | Computer game
4 | Pop music, classical music
5 | Pop music
6 | Computer game
7 | Pop music
8 | Computer game
9 | Pop music
10 | Pop music
11 | Classical music
12 | Computer game
13 | Classical music
14 | Pop music, classical music
15 | Pop music
16 | Sports star
17 | Computer game
18 | Novel
19 | Computer game, novel
20 | Computer game, classical music
The semantic body search processing unit 311 converts the text hit file list 11011011111110001011 into concrete file identifiers: 1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20. It then uses each file identifier as a query condition to look up the corresponding record in the semantic body forward index, which yields a semantic body forward index that contains only these file identifiers, as shown in Table 12:
Table 12
File identifier | Semantic identifiers
1 | Computer game, novel
2 | Pop music
4 | Pop music, classical music
5 | Pop music
7 | Pop music
8 | Computer game
9 | Pop music
10 | Pop music
11 | Classical music
12 | Computer game
13 | Classical music
17 | Computer game
19 | Computer game, novel
20 | Computer game, classical music
Finally, each semantic ontology concept that appears in Table 12 is taken as a key, and the file identifiers under which that key appears are collected. This completes the conversion from forward index to inverted index and produces the file semantic classification table shown in Table 13:
Table 13
Semantic identifier | File identifier sequence
Pop music | 01011010110000000000
Computer game | 10000001000100001011
Classical music | 00010000001010000001
Novel | 10000000000000000010
Step 507 is then executed.
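The forward index matching of step 506 can be sketched in Python in the same spirit. The function name `match_forward` and the data layout are illustrative assumptions; the forward index is copied from Table 11.

```python
# Semantic body forward index of Table 11: file identifier -> concepts.
FORWARD_INDEX = {
    1: ["Computer game", "Novel"], 2: ["Pop music"], 3: ["Computer game"],
    4: ["Pop music", "Classical music"], 5: ["Pop music"],
    6: ["Computer game"], 7: ["Pop music"], 8: ["Computer game"],
    9: ["Pop music"], 10: ["Pop music"], 11: ["Classical music"],
    12: ["Computer game"], 13: ["Classical music"],
    14: ["Pop music", "Classical music"], 15: ["Pop music"],
    16: ["Sports star"], 17: ["Computer game"], 18: ["Novel"],
    19: ["Computer game", "Novel"], 20: ["Computer game", "Classical music"],
}
TEXT_HITS = "11011011111110001011"


def match_forward(forward_index, hits):
    """Look up only the hit files, then invert the result by concept."""
    # Convert the hit list into concrete file identifiers.
    hit_ids = [i for i, bit in enumerate(hits, start=1) if bit == "1"]
    # Restrict the forward index to the hit files (corresponds to Table 12).
    restricted = {fid: forward_index.get(fid, []) for fid in hit_ids}
    # Invert: each concept becomes a key that collects its file identifiers.
    members = {}
    for fid, concepts in restricted.items():
        for concept in concepts:
            members.setdefault(concept, set()).add(fid)
    # Encode each identifier set as a file identifier sequence (Table 13).
    return {
        concept: "".join("1" if i in ids else "0"
                         for i in range(1, len(hits) + 1))
        for concept, ids in members.items()
    }


for concept, bits in match_forward(FORWARD_INDEX, TEXT_HITS).items():
    print(concept, bits)
```

Only the records of the hit files are touched, so the work grows with the number of text hits rather than with the size of the whole index.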
Semantic body index matching is divided into inverted index matching and forward index matching for reasons of efficiency. Inverted index matching must intersect every record of the semantic body inverted index with the text hit file list in turn, and this full table scan of the semantic body inverted index is computationally expensive. Therefore, when the number of text hit files is small, performing semantic body forward index matching reduces the amount of computation. Whichever matching method is used, the resulting file semantic classification table is the same, i.e. Table 13 is identical to Table 10.
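The trade-off described above can be sketched as a simple dispatch on the number of text hits, in line with the threshold comparison recited in claims 11 and 21. The threshold value, the helper `build_forward_index` and the function `classify` are hypothetical; the patent does not specify how the threshold is chosen.

```python
# Semantic body inverted index of Table 9.
INVERTED_INDEX = {
    "Pop music":       "01011010110001100000",
    "Computer game":   "10100101000100001011",
    "Classical music": "00010000001011000001",
    "Novel":           "10000000000000000110",
    "Sports star":     "00000000000000010000",
}


def build_forward_index(inverted_index):
    """Derive file identifier -> concepts from the inverted index."""
    forward = {}
    for concept, bits in inverted_index.items():
        for file_id, bit in enumerate(bits, start=1):
            if bit == "1":
                forward.setdefault(file_id, []).append(concept)
    return forward


def match_inverted(inverted_index, hits):
    """Full scan: bitwise AND of every record with the hit list."""
    hit_mask = int(hits, 2)
    table = {}
    for concept, bits in inverted_index.items():
        common = int(bits, 2) & hit_mask
        if common:
            table[concept] = format(common, f"0{len(hits)}b")
    return table


def match_forward(forward_index, hits):
    """Look up only the hit files and regroup them by concept."""
    members = {}
    for file_id, bit in enumerate(hits, start=1):
        if bit == "1":
            for concept in forward_index.get(file_id, []):
                members.setdefault(concept, set()).add(file_id)
    return {
        concept: "".join("1" if i in ids else "0"
                         for i in range(1, len(hits) + 1))
        for concept, ids in members.items()
    }


def classify(hits, inverted_index, forward_index, threshold):
    """Many hits: scan the inverted index. Few hits: use the forward index."""
    if hits.count("1") > threshold:
        return match_inverted(inverted_index, hits)
    return match_forward(forward_index, hits)


FORWARD_INDEX = build_forward_index(INVERTED_INDEX)
HITS = "11011011111110001011"
HYPOTHETICAL_THRESHOLD = 10  # illustrative value only, not from the patent
print(classify(HITS, INVERTED_INDEX, FORWARD_INDEX, HYPOTHETICAL_THRESHOLD))
```

Both branches return the same table for the same input, so the threshold affects only the amount of computation, not the result.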
Step 507: the semantic body search processing unit 311 uses the semantic ontology inference engine 304, the annotation ontology library 305 and the file semantic classification rule engine 302 to perform reasoning on the semantic vocabulary in the file semantic classification table, expands the table according to the reasoning result, and sends the expanded file semantic classification table to the ranking processing unit 312.
After completing the semantic body index matching, the semantic body search processing unit 311 first sends the semantic ontology concept vocabulary in the file semantic classification table to the semantic ontology inference engine 304 for semantic reasoning. Based on the semantic ontology concepts and relations defined in the annotation ontology library 305 and on its own inference rules, the semantic ontology inference engine 304 produces an RDF document that expresses the relations among the semantic body vocabulary and returns it to the semantic body search processing unit 311. The semantic body search processing unit 311 then matches this RDF document against the trigger conditions of the semantic classification rules defined in the file semantic classification rule engine 302, determines which semantic classification rules need to be triggered, and triggers them, producing the file semantic classification table expanded through reasoning. Finally, the expanded file semantic classification table is sent to the ranking processing unit 312.
In this embodiment, the semantic body search processing unit 311 sends the four semantic ontology concept terms in Table 10 or Table 13 (pop music, computer game, classical music and novel) to the semantic ontology inference engine 304 for reasoning. The semantic ontology inference engine 304 reasons over resources represented as RDF triples by applying the defined inference rules. An RDF triple has the form (subject, predicate, object). For example, two resource descriptions are defined as shown in Figure 6: Shenzhen 601 belongs to Guangdong 602, and Guangdong 602 belongs to China 603. An inference rule is also defined: (?a, belongs to, ?b), (?b, belongs to, ?c) → (?a, belongs to, ?c). This rule means that if a belongs to b and b belongs to c, then it can be inferred that a belongs to c. Therefore, the result shown in Figure 7 can be inferred from the relations shown in Figure 6: Shenzhen 601 belongs to China 603.
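The transitive inference rule above can be illustrated with a small Python sketch that applies the rule repeatedly until no new triple appears. The function name `infer_transitive` is an illustrative assumption, and the sketch stands in for, rather than reproduces, the inference engine 304.

```python
def infer_transitive(triples, predicate):
    """Apply (?a, p, ?b), (?b, p, ?c) -> (?a, p, ?c) until a fixed point."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for (s, p, o) in facts if p == predicate]
        for a, b in links:
            for b2, c in links:
                if b == b2 and (a, predicate, c) not in facts:
                    facts.add((a, predicate, c))
                    changed = True
    return facts


# Resource descriptions of Figure 6.
known = [
    ("Shenzhen", "belongs to", "Guangdong"),
    ("Guangdong", "belongs to", "China"),
]
inferred = infer_transitive(known, "belongs to") - set(known)
print(inferred)  # {('Shenzhen', 'belongs to', 'China')}
```

The loop terminates because the set of possible triples over a finite vocabulary is finite and triples are only ever added.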
Suppose that the annotation ontology library 305 defines the relations shown in Figure 8 for the four ontology concepts of this embodiment: the parent class of pop music 801 is popular music 802, and the parent class of both popular music 802 and classical music 803 is music 804; the parent class of novel 805 is literature 806; the parent class of computer game 807 is recreation 808. After reasoning with the inference rules, the RDF relations of the four ontology concepts are as shown in Figure 9: the parent class of both pop music 801 and classical music 803 is music 804; the parent class of novel 805 is literature 806; the parent class of computer game 807 is recreation 808. The RDF triples are output in the following format:
(pop music, parent, music)
(classical music, parent, music)
(novel, parent, literature)
(computer game, parent, recreation)
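One way to read the four triples above is that each concept is linked to its topmost ancestor in the hierarchy of Figure 8. The following Python sketch reproduces the triples under that assumption; the patent does not state that the inference engine works this way, and the names `PARENT_OF`, `top_level_parent` and `parent_triples` are hypothetical.

```python
# Direct parent relations as described for Figure 8.
PARENT_OF = {
    "pop music": "popular music",
    "popular music": "music",
    "classical music": "music",
    "novel": "literature",
    "computer game": "recreation",
}


def top_level_parent(concept, parent_of):
    """Follow parent links upward until a concept without a parent is reached."""
    seen = {concept}
    current = concept
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            raise ValueError(f"cycle in hierarchy at {current!r}")
        seen.add(current)
    return current


def parent_triples(concepts, parent_of):
    """Emit (concept, 'parent', topmost ancestor) for concepts that have a parent."""
    return [(c, "parent", top_level_parent(c, parent_of))
            for c in concepts if c in parent_of]


concepts = ["pop music", "classical music", "novel", "computer game"]
for triple in parent_triples(concepts, PARENT_OF):
    print(triple)
```

The intermediate concept popular music does not appear in the output, which matches the triples listed above.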
The file semantic classification rule engine 302 defines the following semantic classification rule: if several triples share a common object (the third element of the triple) and their predicate is "parent", a new file class is added to the file semantic classification table. The class name is the name of that object, and its file identifier sequence is the union of the file identifier sequences that correspond to the subjects (the first elements) of those triples, i.e. the result of a bitwise OR. Table 14 shows this semantic classification rule:
Table 14
Trigger condition | Operation performed
Several triples (?X1, parent, ?Y), (?X2, parent, ?Y), ... exist | Add one record to the file semantic classification table. The semantic identifier of the record is ?Y, and its file identifier sequence is the union of the file identifier sequences corresponding to ?X1, ?X2, ...
The file semantic classification table obtained after semantic reasoning and after expansion and integration according to the semantic classification rule is shown in Table 15:
Table 15
Semantic identifier | File identifier sequence
Pop music | 01011010110000000000
Computer game | 10000001000100001011
Classical music | 00010000001010000001
Novel | 10000000000000000010
Music | 01011010111010000001
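The expansion from the four-row classification table to Table 15 can be sketched in Python as follows. The function name `expand_classification` and the data layout are illustrative assumptions; the rule itself (a shared parent across several triples adds a class whose sequence is the bitwise OR of the children's sequences) is taken from Table 14.

```python
# File semantic classification table before expansion (Table 10 / Table 13).
CLASSIFICATION = {
    "Pop music":       "01011010110000000000",
    "Computer game":   "10000001000100001011",
    "Classical music": "00010000001010000001",
    "Novel":           "10000000000000000010",
}
# Triples returned by the reasoning step.
TRIPLES = [
    ("Pop music", "parent", "Music"),
    ("Classical music", "parent", "Music"),
    ("Novel", "parent", "Literature"),
    ("Computer game", "parent", "Recreation"),
]


def expand_classification(classification, triples):
    """Add one class per parent that is shared by at least two subjects."""
    if not classification:
        return {}
    width = len(next(iter(classification.values())))
    # Group the subjects of "parent" triples by their common object.
    children = {}
    for subject, predicate, obj in triples:
        if predicate == "parent" and subject in classification:
            children.setdefault(obj, []).append(subject)
    expanded = dict(classification)
    for parent, subjects in children.items():
        if len(subjects) < 2:
            continue  # the rule triggers only when several triples share ?Y
        union = 0
        for subject in subjects:
            union |= int(classification[subject], 2)  # bitwise OR
        expanded[parent] = format(union, f"0{width}b")
    return expanded


for concept, bits in expand_classification(CLASSIFICATION, TRIPLES).items():
    print(concept, bits)
```

Literature and Recreation each have only one child in this example, so the rule is not triggered for them and only the Music row is added.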
Step 508: the ranking processing unit 312 computes the relevance and importance of the files in the file semantic classification table obtained after semantic reasoning, ranks the files according to the computed results, and finally sends the ranked result together with the file semantic classification information to the search interface module 301.
Step 509: the search interface module 301 returns the received ranking result and semantic classification information to the user as the search result.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.

Claims (24)

1. A retrieval system based on a semantic body, characterized in that the system comprises:
a semantic body index database, configured to store a semantic body index; and
a semantic body search processing unit, configured to obtain a text hit file list and to match the text hit file list against the semantic body index in the semantic body index database to obtain a file semantic classification table.
2. The system according to claim 1, characterized in that the system further comprises a semantic body index processing unit, configured to obtain files for which a text index has been built and to build a semantic body index for the obtained files.
3. The system according to claim 2, characterized in that the system further comprises:
a text index processing unit, configured to build a text index for files;
a text index database, configured to store the text index; and
a text search processing unit, configured to match a user's text query information against the text index in the text index database to obtain the text hit file list.
4. The system according to claim 1, 2 or 3, characterized in that the system further comprises:
a semantic ontology inference engine, configured to perform semantic reasoning on the semantic body vocabulary set in the file semantic classification table according to the semantic body vocabulary set and the relations between semantic body vocabulary items in an annotation ontology library, to obtain an expanded semantic body vocabulary set;
the annotation ontology library, configured to store the semantic body vocabulary set and the relations between semantic body vocabulary items; and
a file semantic classification rule engine, configured to store semantic classification rules, to match and trigger the corresponding semantic classification rule according to the expanded semantic body vocabulary set inferred by the semantic ontology inference engine, and to expand and integrate the file semantic classification table to obtain an expanded file semantic classification table.
5. The system according to claim 4, characterized in that the system further comprises a ranking processing unit, configured to rank the files in the expanded file semantic classification table.
6. The system according to claim 5, characterized in that the system further comprises a search interface module, configured to send the user's text query information to the text search processing unit and to return the ranking result of the ranking processing unit to the user.
7. The system according to claim 3, characterized in that the system further comprises a file database, configured to store files for use by the semantic body index processing unit in building the semantic body index and by the text index processing unit in building the text index.
8. The system according to claim 7, characterized in that the system further comprises a network file crawling module, configured to crawl network files from the Internet and save them to the file database.
9. The system according to claim 3, characterized in that the text index comprises a text forward index and a text inverted index, and the semantic body index comprises a semantic body forward index and a semantic body inverted index.
10. The system according to claim 1, characterized in that the semantic body search processing unit performs semantic body forward index matching or semantic body inverted index matching on the text hit file list.
11. The system according to claim 10, characterized in that the semantic body search processing unit performs semantic body inverted index matching when the number of text hit files in the text hit file list is greater than a threshold, and otherwise performs semantic body forward index matching.
12. The system according to claim 11, characterized in that the threshold is a preset fixed value or a value that can be adjusted dynamically.
13. The system according to claim 3, characterized in that the text search processing unit and the semantic body search processing unit are integrated in a search processing module; the text index processing unit and the semantic body index processing unit are integrated in an index processing module; and the text index database and the semantic body index database are integrated into one index database.
14. The system according to claim 5, characterized in that the text search processing unit, the semantic body search processing unit and the ranking processing unit are integrated in a search processing module.
15. A retrieval method based on a semantic body, characterized in that the method comprises the following steps:
A. obtaining files for which a text index has been built, and building a semantic body index for the obtained files;
B. obtaining a text hit file list, and performing semantic body index matching on the text hit file list to obtain a file semantic classification table.
16. The method according to claim 15, characterized in that
before step A, the method further comprises a step of building a text index for the files in a file database; and
before step B, the method further comprises a step of performing text index matching on a user's text query information to obtain the text hit file list.
17. The method according to claim 15 or 16, characterized in that the method further comprises the following steps:
C. performing semantic reasoning on the semantic body vocabulary set in the file semantic classification table to obtain an expanded semantic body vocabulary set;
D. expanding and integrating the file semantic classification table according to the inferred expanded semantic body vocabulary set to obtain an expanded file semantic classification table.
18. The method according to claim 17, characterized in that the method further comprises a step of ranking the files in the expanded file semantic classification table.
19. The method according to claim 15, characterized in that building the semantic body index in step A comprises building a semantic body forward index and building a semantic body inverted index; and performing semantic body index matching on the text hit file list in step B comprises performing semantic body inverted index matching or performing semantic body forward index matching.
20. The method according to claim 15, characterized in that, before step B, the method further comprises a step of determining whether semantic body inverted index matching or semantic body forward index matching is to be performed on the text hit file list in step B.
21. The method according to claim 20, characterized in that the determining step is: when the number of text hit files in the text hit file list is greater than a threshold, semantic body inverted index matching is performed in step B; otherwise, semantic body forward index matching is performed in step B.
22. The method according to claim 21, characterized in that the threshold is a preset fixed value or a value that can be adjusted dynamically.
23. The method according to claim 16, characterized in that building the text index comprises building a text forward index and building a text inverted index; and performing text index matching on the user's text query information comprises performing text inverted index matching or performing text forward index matching.
24. The method according to claim 16, characterized in that, before the text index is built, the method further comprises a step of building a file database.
CNA2006101498039A 2006-10-25 2006-10-25 Semantic ontology retrieval system and method Pending CN101169780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2006101498039A CN101169780A (en) 2006-10-25 2006-10-25 Semantic ontology retrieval system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2006101498039A CN101169780A (en) 2006-10-25 2006-10-25 Semantic ontology retrieval system and method

Publications (1)

Publication Number Publication Date
CN101169780A true CN101169780A (en) 2008-04-30

Family

ID=39390409

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2006101498039A Pending CN101169780A (en) 2006-10-25 2006-10-25 Semantic ontology retrieval system and method

Country Status (1)

Country Link
CN (1) CN101169780A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20080430