CN103235780A - Method for storing and searching correlation information based on semanteme - Google Patents

Info

Publication number
CN103235780A
Authority
CN
China
Prior art keywords
keyword
relation
retrieval
key words
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310089129XA
Other languages
Chinese (zh)
Inventor
张经纶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201310089129XA priority Critical patent/CN103235780A/en
Publication of CN103235780A publication Critical patent/CN103235780A/en
Pending legal-status Critical Current

Abstract

The invention discloses a semantics-based method for the associative storage and retrieval of information. The method uses a semantic information database, which stores the content information of keywords, and a keyword association database, which stores the keyword association space. The method comprises the following steps: step one, defining the keyword association space; step two, retrieving the associated keywords of a keyword; and step three, searching the semantic information database with the associated keywords to obtain the corresponding semantic information. The keyword association space comprises keyword association relations: the main-auxiliary relation, member relation, similar relation, coordinate relation, opposite relation, and degree relation. By adding semantic analysis of keywords before retrieval, the method widens the scope of retrieval and thereby aims to improve retrieval accuracy. By expressing the semantic relations between keywords, it greatly reduces duplication of information content and thus the size of the semantic information database. The method is simple to implement and widely applicable.

Description

A semantics-based method for the associative storage and retrieval of information
Technical field
The present invention relates to data retrieval and intelligent semantic analysis.
Background art
The information field has now entered the era of big data, and how to retrieve or search for the required information accurately is a problem that concerns every user. In existing search engines such as Baidu or Google, entering one name of the Lantern Festival returns only information indexed under that name, and not information indexed under the festival's other, synonymous name. In fact the two names are equivalent, so searching for one should be equivalent to searching for the other. As another example, suppose a user of an e-commerce platform wants to search for "red shirt". Items described as "magenta", "pink", "purplish red", "orange-red", "dark red", or "peach pink" may all be what the user wants. In a Chinese information processing system, retrieving "red" alone is enough to cover these shades of red, but in English or other language systems the different shades have different words, such as "Red", "Magenta", "Pink", "Purple", "Crimson", "Salmon", and "Peachpuff". If the user needs information on shades close to red, such as "Magenta", "Pink", "Purple", "Crimson", "Salmon", and "Peachpuff", the different words must first be semantically analyzed before retrieval.
Summary of the invention
The problem to be solved by the present invention is to provide an intelligent semantic analysis system within a retrieval system, so that more retrieval result information can be offered to the user.
To solve this problem, the present invention adopts the following scheme:
A semantics-based method for the associative storage and retrieval of information uses a semantic information database and a keyword association database. The semantic information database stores the content information of keywords. The keyword association database stores the keyword association space. The method comprises the following steps:
S1, defining the keyword association space;
S2, retrieving the associated keywords of a keyword;
S3, searching the semantic information database according to the associated keywords of the keyword to obtain the corresponding semantic information.
The keyword association space comprises keyword association relations. The keyword association relations comprise: the main-auxiliary relation, member relation, similar relation, coordinate relation, opposite relation, and degree relation. Together these relations constitute the keyword association space. Retrieving the association relations of a keyword within this space requires a step of keyword association operation retrieval, that is, obtaining the sets of keywords that stand in each association relation to a given keyword by applying association operation rules. The association operation rules comprise transitivity rules, inversion rules, a deduction rule, addition-subtraction rules, and so on. The association operation rules can additionally be stored in the keyword association database as part of the keyword association space; that is, the keyword association space can also comprise the association operation rules. Depending on how the keyword association relations are stored, the step of keyword association operation retrieval can be carried out in step S1 or in step S2. That is, either step S1 or step S2 comprises the step of keyword association operation retrieval.
Further, the method can also comprise a step of extracting keywords from the user's input information.
The technical effects of the present invention are as follows:
1. Adding semantic analysis of keywords before retrieval widens the scope of retrieval and improves retrieval accuracy;
2. Expressing the semantic relations between keywords greatly reduces duplication of information content. In particular, because the present invention defines the main-auxiliary relation, only the keyword information of the main term needs to be stored, which reduces the content of the semantic information database and lightens its load;
3. The semantic relations between keywords can be extended at any time;
4. The method is simple to implement and widely applicable. It can be applied to search engines and to retrieval services in specific fields, such as e-commerce platforms, blogs, and forums.
Description of drawings
None
Embodiment
The present invention is described in further detail below.
1. Semantic information database
The semantic information database in the above method stores keywords together with the data corresponding to each keyword, and is similar to a dictionary. Its information content can be stored in several ways. In practice it need not be a component of the system of the present invention; it may be an external database accessed over a network, for example data content obtained through the Google search engine. The semantic information database may also be a component of the system of the present invention. In that case its content must be defined by the system itself, so the method needs to include a step of storing the semantic information of keywords in the semantic information database. The word "database" here is only a manner of expression. Those skilled in the art will understand that it may be a relational database such as Oracle, MySQL, or DB2, an object-oriented database such as Versant or db4o, or even ordinary files.
2. Keyword association relations
In natural language the relations between words are varied: there are synonym relations and antonym relations, and adjectives or adverbs stand in a degree relation to the objects they modify. The present invention preferably generalizes the relations between words in natural language into six classes of relations between keywords: the main-auxiliary relation, member relation, similar relation, opposite relation, coordinate relation, and degree relation.
a) Main-auxiliary relation. The main-auxiliary relation expresses synonymy between keywords. One keyword is the main term, and the other synonymous keywords are auxiliary terms. In formula notation: s(A, B), where A is the main term and B is the auxiliary term.
b) Member relation. The member relation expresses inclusion between keywords: one keyword is a member of another. Notation: m(A, B), meaning that B is a member of A.
c) Similar relation. Two keywords are semantically close but not fully equivalent. Notation: c(A, B), meaning that A and B are similar.
d) Opposite relation. Two keywords are semantically opposite. Notation: r(A, B), meaning that A and B have opposite meanings.
e) Coordinate relation. Two keywords are both members of another keyword. Notation: b(A, B), meaning that A and B are coordinate. For example, if m(C, A) and m(C, B), that is, A and B are both members of C, then b(A, B).
f) Degree relation. Two keywords are semantically close but differ in degree. Notation: d(A, B, n), meaning that the degree of A plus n gives B, and the degree of B minus n gives A.
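The six relation classes can be pictured as a small data model. The following is a minimal sketch, not the patent's implementation: the `Relation` class, the `coordinates_from_members` helper, and the single-letter kind codes as Python strings are assumptions introduced for illustration, reusing only the notation s, m, c, r, b, d from the text. It shows how b(A, B) can be derived from m(C, A) and m(C, B) as in the example above.

```python
from dataclasses import dataclass
from typing import Optional

# Kind codes reuse the notation of the text: s, m, c, r, b, d.
RELATION_KINDS = frozenset("smcrbd")


@dataclass(frozen=True)
class Relation:
    """One association relation between two keywords, e.g. s(A, B).

    Hypothetical illustration type; the patent does not prescribe a layout.
    """
    kind: str                 # one of RELATION_KINDS
    a: str                    # first keyword (main term for s, container for m)
    b: str                    # second keyword
    n: Optional[int] = None   # degree offset, used only by d(A, B, n)

    def __post_init__(self):
        if self.kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {self.kind}")
        if (self.kind == "d") != (self.n is not None):
            raise ValueError("n is required for d and not allowed otherwise")


def coordinates_from_members(relations):
    """Derive b(A, B) for every pair of distinct members of the same keyword."""
    members = {}
    for rel in relations:
        if rel.kind == "m":
            members.setdefault(rel.a, set()).add(rel.b)
    derived = set()
    for group in members.values():
        for x in group:
            for y in group:
                if x != y:
                    derived.add(Relation("b", x, y))
    return derived
```

Because the dataclass is frozen, relations are hashable and can be kept in a plain `set`, which makes duplicates harmless when relations are derived more than once.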
The keyword association relations together constitute the keyword association space, and the relations can further be combined by association operations. Given the definitions above, the following association operation rules follow directly:
Deduction rule (main-auxiliary relation): if s(A, B) and s(B, C), then s(A, C);
Transitivity rule one (similar relation): if c(A, B) and c(B, C), then c(A, C);
Transitivity rule two (coordinate relation): if b(A, B) and b(B, C), then b(A, C);
Inversion rule one (opposite relation): if r(A, B) and r(B, C), then c(A, C);
Inversion rule two (main-auxiliary relation): if s(A, B) and r(B, C), then r(A, C);
Inversion rule three (similar relation): if c(A, B) and r(B, C), then r(A, C);
Addition-subtraction rule one (degree relation): if d(A, B, n) and d(B, C, m), then d(A, C, m+n);
Addition-subtraction rule two (similar relation): if c(A, B) and c(A, C), then c(A, B) + c(A, C) = c(A, B+C), and c(A, B+C) = c(A, D) with D = B+C, where B, C, and D are sets of keywords in the similar relation to keyword A;
Addition-subtraction rule three (coordinate relation): if b(A, B) and b(A, C), then b(A, B) + b(A, C) = b(A, B+C), and b(A, B+C) = b(A, D) with D = B+C, where B, C, and D are sets of keywords in the coordinate relation to keyword A.
Those skilled in the art will understand that the association relations between keywords include, but are not limited to, the six classes above, and that the association operation rules include, but are not limited to, the rules listed above. Association operations are carried out by the step of keyword association operation retrieval, which obtains the sets of keywords standing in each association relation to a given keyword by applying the association operation rules above.
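The chaining rules above can be applied repeatedly until nothing new is derived. The sketch below is one hedged way to do that, assuming relations are held as plain tuples such as `("s", "A", "B")` or `("d", "A", "B", n)`; the function names `apply_rules_once` and `closure` and the `max_rounds` bound are hypothetical. The two set-union rules for c and b are not coded separately, since collecting derived relations in a Python `set` already merges them.

```python
def apply_rules_once(facts):
    """Return relations derivable in one step by chaining X(A, B) with Y(B, C)."""
    derived = set()
    for f in facts:
        for g in facts:
            if f[2] != g[1]:                # second keyword of f must equal first of g
                continue
            a, c = f[1], g[2]
            pair = (f[0], g[0])
            if pair == ("s", "s"):
                derived.add(("s", a, c))    # s(A, B), s(B, C) gives s(A, C)
            elif pair == ("c", "c"):
                derived.add(("c", a, c))    # c(A, B), c(B, C) gives c(A, C)
            elif pair == ("b", "b"):
                derived.add(("b", a, c))    # b(A, B), b(B, C) gives b(A, C)
            elif pair == ("r", "r"):
                derived.add(("c", a, c))    # r(A, B), r(B, C) gives c(A, C)
            elif pair in (("s", "r"), ("c", "r")):
                derived.add(("r", a, c))    # s or c chained with r gives r(A, C)
            elif pair == ("d", "d"):
                derived.add(("d", a, c, f[3] + g[3]))  # degree offsets add up
    return derived - facts


def closure(facts, max_rounds=10):
    """Apply the rules until a fixed point or until max_rounds is reached.

    max_rounds is an assumed safety bound: cyclic degree relations would
    otherwise keep producing new offsets.
    """
    facts = set(facts)
    for _ in range(max_rounds):
        new = apply_rules_once(facts)
        if not new:
            break
        facts |= new
    return facts
```

For example, starting from r("hot", "cold") and r("cold", "warm"), where the three words are illustrative only, `closure` adds c("hot", "warm").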
3. Definition of the keyword association space
The keyword association database stores the keyword association space. The keyword association space comprises the keyword association relations, which include, but are not limited to, the main-auxiliary relation, member relation, similar relation, opposite relation, coordinate relation, and degree relation described above.
As with the semantic information database, the word "database" in "keyword association database" is only a manner of expression. Those skilled in the art will understand that it may be a relational database such as Oracle, MySQL, or DB2, an object-oriented database such as Versant or db4o, or even ordinary files. In some small systems it can even be held resident in memory. The keyword association relations can also be stored in the keyword association database in several ways. For example, they can be stored by relation, with each relation type kept separately. They can also be stored by keyword, in which case the entry for a keyword contains all of its association relations.
As stated above, the keyword association relations can further be combined according to the association operation rules. The rules can be defined in the program that searches the association relations, or predefined in the keyword association database. In a relational database, for example, the rules can be defined as stored procedures. The present invention preferably defines the association operation rules in the keyword association database. The keyword association database then stores not only the keywords and their association relations but also the association operation rules. In this case the keywords, the keyword association relations, and the association operation rules together form the keyword association space. That is, the keyword association space comprises the association operation rules.
The keyword association space in the keyword association database must be defined in advance, which is why step S1 comprises defining the keyword association space. Defining the keyword association space consists of two sub-steps: first, definition, in which the keyword association space is defined; second, storage, in which the keyword association space is saved in the keyword association database. Note that "defined in advance" refers only to the fact that keyword association retrieval depends on the content of the keyword association database. It does not mean that the content of the database stays fixed once the association relations have been defined. On the contrary, the association relations of keywords change frequently, for example when relations are added to extend the content of the database.
4. Keyword association operation retrieval
Under the six classes of keyword association relations defined above, the six kinds of associated keyword sets are named as follows for convenience: the set of keywords associated by the main-auxiliary relation is the s set; by the member relation, the m set; by the similar relation, the c set; by the coordinate relation, the b set; by the opposite relation, the r set; and by the degree relation, the d set.
As stated above, association relations can be stored by relation, with each relation type kept separately. In this approach a data table is created for each relation type, and each record in a table expresses the association between two keywords. Storing the relations is then very simple, but keyword association operation retrieval becomes complex and has to be implemented by repeated recursion.
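Storage by relation, with one table per relation type and one record per keyword pair, can be sketched with Python's built-in `sqlite3` module. This is an illustrative assumption rather than the patent's schema: the table name `similar`, its columns `a` and `b`, the helper `similar_set`, and the sample colour pairs are all hypothetical. The recursion mentioned in the text is expressed here as a recursive common table expression.

```python
import sqlite3

# One table per relation type; each row is one pair c(a, b).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE similar (a TEXT NOT NULL, b TEXT NOT NULL, PRIMARY KEY (a, b))"
)
conn.executemany(
    "INSERT INTO similar VALUES (?, ?)",
    [("red", "crimson"), ("crimson", "magenta"), ("magenta", "pink")],
)


def similar_set(conn, keyword):
    """Return every keyword reachable from `keyword` by chaining similar pairs."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(kw) AS (
            SELECT b FROM similar WHERE a = ?
            UNION                       -- UNION drops duplicates, so cycles terminate
            SELECT s.b FROM similar AS s JOIN reach AS r ON s.a = r.kw
        )
        SELECT kw FROM reach
        """,
        (keyword,),
    ).fetchall()
    return {row[0] for row in rows} - {keyword}
```

Inserting a pair is a single row write, while every lookup pays for the recursive walk, which is the trade-off the paragraph above describes.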
Association relations can also be stored by keyword, so that the record for a keyword contains all of its association relations. All relation types then share a single data table, and each record holds the sets of keywords standing in each relation to one keyword. In this approach each record in the table may be complete or incomplete. If every record is complete, keyword association operation retrieval must be carried out when association relations are stored. For example, when a user or administrator adds a keyword, the keyword comes with initial s, m, c, b, r, and d sets; the complete s, m, c, b, r, and d sets are then obtained by association operation retrieval and saved in the table, and the association relations of the other affected keywords are modified and saved at the same time. Adding a single record may therefore require modifying many records in the table. In return, retrieving a keyword's association information is simple and fast: finding one record yields the keyword sets of all its association relations. The records may also be incomplete. In that case association operation retrieval may or may not be carried out when relations are stored, and a complete association operation retrieval must be carried out when association information is retrieved. Other approaches are possible as well, for example saving the result of the complete association operation retrieval performed at retrieval time. Which approach is adopted depends on the strategy of the particular system.
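Storage by keyword with complete records can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: records are plain Python dictionaries holding one set per relation code, only the c set is maintained, and `new_record` and `add_similar` are hypothetical names. The function keeps every c set closed under the rule c(A, B), c(B, C) gives c(A, C), and reports how many records it had to modify.

```python
from collections import defaultdict


def new_record():
    """One record per keyword: a set of associated keywords for each relation code."""
    return {kind: set() for kind in "smcbrd"}


def add_similar(records, a, b):
    """Store c(a, b) and keep every stored c set complete.

    Returns the number of records that changed, which illustrates why one
    insertion can touch many rows when records are kept complete.
    """
    # Every keyword that already reaches `a` must now also reach `b` and b's c set.
    sources = {a} | {kw for kw, rec in records.items() if a in rec["c"]}
    targets = {b} | set(records[b]["c"])
    touched = 0
    for src in sources:
        before = len(records[src]["c"])
        records[src]["c"] |= targets - {src}
        touched += len(records[src]["c"]) != before
    return touched


records = defaultdict(new_record)
add_similar(records, "red", "crimson")      # changes the record for "red"
add_similar(records, "crimson", "magenta")  # changes "crimson" and "red"
```

After these two calls, looking up `records["red"]["c"]` is a single record access that already contains both "crimson" and "magenta"; the keywords are illustrative only.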
It follows that, depending on how the keyword association relations are stored, the step of keyword association operation retrieval can be carried out in step S1 or in step S2. That is, either step S1 or step S2 comprises the step of keyword association operation retrieval.
Different systems can adopt different strategies for the algorithm of keyword association operation retrieval. Some systems may perform only a limited retrieval, others a complete one. In a relational database system, complete retrieval usually has to be implemented recursively, which takes a great deal of time. Weighing the need for complete retrieval against the time it consumes, a system may instead recurse a limited number of times, for example 2 or 3 times. In an object-oriented database, complete retrieval takes much less time than in a relational database system and is simpler to implement. If all the data is held in memory, it becomes simpler still. Those skilled in the art will understand that the concrete algorithm can be implemented from the keyword association operation rules defined above.
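The choice between limited and complete retrieval can be sketched for the in-memory case as a breadth-first walk with an optional depth bound. This is an assumption-laden illustration: `edges` is a hypothetical dictionary mapping each keyword to the keywords directly associated with it by one relation type, and `related` and `max_depth` are names introduced here.

```python
def related(edges, keyword, max_depth=None):
    """Keywords reachable from `keyword` in at most `max_depth` relation steps.

    max_depth=None performs the complete retrieval; a small value such as
    2 or 3 gives the limited retrieval described in the text.
    """
    found = set()
    frontier = {keyword}
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        reached = set()
        for kw in frontier:
            reached |= edges.get(kw, set())
        # Keep only keywords not seen before, so cycles cannot loop forever.
        frontier = reached - found - {keyword}
        found |= frontier
        depth += 1
    return found


# Illustrative data only.
edges = {"red": {"crimson"}, "crimson": {"magenta"}, "magenta": {"pink"}}
```

With this data, `related(edges, "red", max_depth=2)` stops after "crimson" and "magenta", while the unbounded call also reaches "pink".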
5. Retrieval of keyword association information
Retrieval of keyword association information means retrieving the association information related to a keyword entered by the user. It comprises the following steps:
a) input a keyword; b) retrieve the associated keywords of this keyword in the keyword association space; c) retrieve the association information of the keyword and of its associated keywords. Step b corresponds to step S2 in the summary of the invention, and step c corresponds to step S3. The keyword in the user's input may not be exact, so a keyword extraction step can be added at the input stage. To extract keywords from the user's input, it is enough to match the input against the keywords defined in the keyword association database. Those skilled in the art will understand that this is not difficult to implement.
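Steps a to c, together with the optional extraction step, can be sketched end to end. The following is a hedged illustration only: `association_space` is assumed to map each defined keyword to its already computed set of associated keywords, `semantic_db` is assumed to map keywords to their stored information, and `extract_keywords` and `search` are hypothetical names. Extraction uses naive case-insensitive substring matching, which a real system would likely replace with proper word segmentation.

```python
def extract_keywords(text, known_keywords):
    """Return the defined keywords that occur in the user's input, longest first."""
    lowered = text.lower()
    hits = [kw for kw in known_keywords if kw.lower() in lowered]
    return sorted(hits, key=lambda kw: (-len(kw), kw))


def search(text, association_space, semantic_db):
    """Steps a to c: extract keywords, expand them, look up semantic information."""
    results = {}
    for kw in extract_keywords(text, association_space):          # step a
        for candidate in [kw, *sorted(association_space[kw])]:    # step b
            if candidate in semantic_db:                          # step c
                results[candidate] = semantic_db[candidate]
    return results


# Illustrative data only.
association_space = {"red": {"crimson", "magenta"}, "shirt": set()}
semantic_db = {
    "red": "items tagged red",
    "crimson": "items tagged crimson",
    "shirt": "items tagged shirt",
}
```

Calling `search("Red shirt", association_space, semantic_db)` returns entries for "red", "crimson", and "shirt"; "magenta" is skipped because this sample semantic database holds nothing for it.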

Claims (6)

1. A semantics-based method for the associative storage and retrieval of information, characterized in that it uses a semantic information database and a keyword association database; the semantic information database is used to store the content information of keywords; the keyword association database is used to store a keyword association space; and the method comprises the following steps:
S1, defining the keyword association space;
S2, retrieving the associated keywords of a keyword;
S3, searching the semantic information database according to the associated keywords of the keyword to obtain the corresponding semantic information.
2. The semantics-based method for the associative storage and retrieval of information according to claim 1, characterized in that the keyword association space comprises keyword association relations; the keyword association relations comprise: the main-auxiliary relation, member relation, similar relation, coordinate relation, opposite relation, and degree relation.
3. The semantics-based method for the associative storage and retrieval of information according to claim 1, characterized in that the keyword association space comprises association operation rules; the association operation rules comprise: a deduction rule, transitivity rules, inversion rules, and addition-subtraction rules.
4. The semantics-based method for the associative storage and retrieval of information according to claim 1, characterized in that step S1 comprises a step of keyword association operation retrieval.
5. The semantics-based method for the associative storage and retrieval of information according to claim 1, characterized in that step S2 comprises a step of keyword association operation retrieval.
6. The semantics-based method for the associative storage and retrieval of information according to claim 1, characterized in that the method further comprises a step of extracting keywords from the user's input information.
CN201310089129XA 2013-03-20 2013-03-20 Method for storing and searching correlation information based on semanteme Pending CN103235780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310089129XA CN103235780A (en) 2013-03-20 2013-03-20 Method for storing and searching correlation information based on semanteme

Publications (1)

Publication Number Publication Date
CN103235780A true CN103235780A (en) 2013-08-07

Family

ID=48883822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310089129XA Pending CN103235780A (en) 2013-03-20 2013-03-20 Method for storing and searching correlation information based on semanteme

Country Status (1)

Country Link
CN (1) CN103235780A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995844A (en) * 2014-05-06 2014-08-20 小米科技有限责任公司 Information search method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5864863A (en) * 1996-08-09 1999-01-26 Digital Equipment Corporation Method for parsing, indexing and searching world-wide-web pages
CN1335574A (en) * 2001-09-05 2002-02-13 罗笑南 Intelligent semantic searching method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王云: "查找同义词和相关词的定义衍生法", 《情报理论与实践》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995844A (en) * 2014-05-06 2014-08-20 小米科技有限责任公司 Information search method and device
CN103995844B (en) * 2014-05-06 2017-11-21 小米科技有限责任公司 Information search method and device

Similar Documents

Publication Publication Date Title
Kara et al. An ontology-based retrieval system using semantic indexing
CN103646032B (en) A kind of based on body with the data base query method of limited natural language processing
US10853361B2 (en) Scenario based insights into structure data
Polfliet et al. Automated mapping generation for converting databases into linked data
Gubanov Polyfuse: A large-scale hybrid data fusion system
US20230205996A1 (en) Automatic Synonyms Using Word Embedding and Word Similarity Models
KR101095866B1 (en) Triple indexing and searching scheme for efficient information retrieval
Hu et al. Scalable aggregate keyword query over knowledge graph
Michel et al. A generic mapping-based query translation from SPARQL to various target database query languages
Drakopoulos et al. Tensor-based document retrieval over Neo4j with an application to PubMed mining
Kalyani et al. Paper on searching and indexing using elasticsearch
US10372736B2 (en) Generating and implementing local search engines over large databases
Bordawekar et al. Enabling cognitive intelligence queries in relational databases using low-dimensional word embeddings
CN110717014B (en) Ontology knowledge base dynamic construction method
Bontcheva et al. Semantic search over documents and ontologies
Marx et al. Exploring term networks for semantic search over RDF knowledge graphs
CN103235780A (en) Method for storing and searching correlation information based on semanteme
Chaware et al. Ontology supported inference system for Hindi and Marathi
Suganyakala et al. Movie related information retrieval using ontology based semantic search
Krishnamurthy et al. Information retrieval models: trends and techniques
Zhang et al. XML-based document retrieval in Chinese diseases question answering system
Condoravdi et al. Natural Language Access to Data: It Takes Common Sense!
Khan Comparative study of information retrieval models used in search engine
Khazalah et al. Automatic mapping rules and OWL ontology extraction for the OBDA Ontop
Narula et al. Improving statistical multimedia information retrieval model by using ontology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130807