CN100524317C

CN100524317C - Method and apparatus for ordering incidence relation search result

Info

Publication number: CN100524317C
Application number: CNB2007101631523A
Authority: CN
Inventors: 舒琦; 文坤梅; 李瑞轩; 孙小林; 赵艳涛
Original assignee: Huawei Technologies Co Ltd; Huazhong University of Science and Technology
Current assignee: Huawei Technologies Co Ltd; Huazhong University of Science and Technology
Priority date: 2007-10-10
Filing date: 2007-10-10
Publication date: 2009-08-05
Anticipated expiration: 2027-10-10
Also published as: CN101140588A

Abstract

The invention discloses a method and a device for ranking association relation search results. The method comprises: parsing the triple information of each ontology instance to construct an instance association graph; traversing, for two input instances, all association paths between them in the instance association graph to generate search result information; calculating the domain relevance and/or the association length and/or the association frequency; and sorting the search result information accordingly. The device comprises an ontology parsing module, an association relation search module, and an association relation ranking module. By calculating parameters such as domain relevance, association length, and association frequency, the invention can rank search results flexibly, so that users can efficiently obtain the information they expect from the search results.

Description

Method and device for ranking association relation search results

Technical field

The present invention relates to a method and a device for ranking search results, and in particular to a method and a device for ranking association relation search results.

Background technology

A main goal of today's networks is information sharing: regardless of platform, language, or protocol, users should be able to access the information they need. This is the business requirement behind contemporary networks. Accordingly, web search research over the past ten years has concentrated on standards that ensure network information resources can be accessed. The development of the Semantic Web and of its layered architecture can make computers more efficient at information resource retrieval.

Current research on semantic search technology concentrates mainly on searching for instances (that is, entities) in Semantic Web resources. In real-world applications, however, what interests people most is often not the entities themselves but the semantic association relations between them. The research focus should therefore shift from searching keywords or semantic annotations, as on the traditional Web, to searching association relations between resources in the Semantic Web. Association search should provide an effective way to answer questions such as "Is there a semantic association between entity X and entity Y?". Research on semantic association already exists and has made some progress.

Another important problem is how to judge, from the user's point of view, the importance of the association relations once they have been found, that is, how to rank the association relation search results. As semantic resources grow richer, the number of association relations between entities will exceed the number of entities themselves, so ranking methods for association relations become particularly important. Studying how to rank association relation search results helps the further development of the Semantic Web and also advances semantic search technology.

Ranking semantic search results is a key problem that semantic search must solve. The number of relations between entities in a knowledge base may far exceed the number of entities. Traditional result ranking methods can only rank textual information and cannot recognize semantic information, so they cannot rank results by semantics. Most current work combines the result ranking algorithms of traditional search engines with information retrieval techniques to try out new semantic ranking methods: it uses the importance of Semantic Web resources to rank the result set, focuses retrieval on semantic metadata, and tries to discover complex relations over the metadata. A ranking method that predicts user needs has also been proposed to identify semantic associations. Research on ranking methods for association search draws on many technologies, such as ontologies, the Semantic Web, link analysis, social networks, and statistics.

An ontology is an explicit, formal specification of a shared conceptualization. It is an active research topic in many fields, such as knowledge engineering, natural language processing, cooperative information systems, intelligent information integration, and information management, and it provides a shared, common understanding of domain-specific knowledge. Within a domain, an ontology defines concepts strictly and fixes the precise meaning of each concept through the relations between concepts. It expresses commonly agreed, shareable knowledge, and thereby addresses the problems of one concept having several terms and one term having several concepts (meanings). Ontology modeling comprises a hierarchical description of the key concepts in a domain; the important attributes of each concept are described through an "attribute-value" mechanism, the relations between concepts are described by corresponding logical statements, and individual instances of interest in the domain are assigned to one or more concepts. Research on association relations is built on ontologies: searching for association relations means searching for relations between entities in an ontology. Understanding the concept and role of ontologies is therefore the basis for studying the ranking of semantic association searches.

The Semantic Web is the next-generation World Wide Web advocated by Tim Berners-Lee, the inventor of the World Wide Web. It aims to give every resource on the Web a unique identifier and to establish machine-processable semantic relations between resources. The concept of semantic search was proposed in 2003, and research in this area has since gradually unfolded and made preliminary progress. The Semantic Web is the application of ontologies to the World Wide Web, and as the next generation of the Web it will affect both how websites are built and how users use them.

Link analysis (hyperlink analysis), also called structure analysis, takes hyperlinks as its main input for studying the properties of the Web, especially its implicit macroscopic properties. Link analysis on the Web rests on the following two hypotheses:

Hypothesis 1: a hyperlink from page A to page B represents a recommendation of page B by the author of page A.

Hypothesis 2: if page A and page B are connected by a hyperlink, they are likely to be about the same topic.

If pages are regarded as vertices and links as directed edges, the whole Web can be regarded as a directed graph, called the Web graph, which can be studied with complex network theory. Well-known link analysis algorithms include Google's PageRank algorithm, the HITS (Hyperlink-Induced Topic Search) algorithm, and the ARC (Automatic Resource Compilation) algorithm. Although these algorithms were designed for the traditional World Wide Web, the underlying idea of link analysis, treating the whole Web as a directed graph, remains relevant here.

Association relations between entities in an ontology resemble, in some respects, the subject matter of social network research, another branch related to computer science. Results from social network research can help identify the association relations that users care about, so a clear understanding of the state of that research is needed. Social network research grew out of disciplines such as sociology, anthropology, and epidemiology, and sociologists gradually developed it into a powerful tool: social network analysis (SNA). By mapping and analyzing interpersonal relations within groups, organizations, and communities, SNA provides rich, systematic methods, tools, and techniques for describing and analyzing networks of social relations. SNA focuses on the relations between actors (the network topology) rather than on the attributes of individual actors, and emphasizes that actors influence and depend on one another, producing emergent collective behavior. A social relation network is a set of nodes (actors) and the lines between them (relations between actors); representing the network with nodes and lines gives social network analysis a good formal definition. Social network data should therefore include at least structural variables and composition variables. A structural variable measures a particular relation between two actors and is the cornerstone of a social network data set; for example, it can measure the flow of information or knowledge between people, or trade and investment between enterprises. A composition variable, or actor attribute variable, usually describes some aspect of a single actor; for example, it can measure a person's gender or profession, or an enterprise's industry or scale.

Statistics is a methodological science that studies random phenomena and is characterized by inference; the idea of generalizing from the part to the whole runs through it. Specifically, it studies how to collect, organize, and analyze numerical data that reflect the overall state of things, and the principles and methods for inferring the characteristics of the whole on that basis. The steps for understanding a phenomenon with statistics are: study design, sampling survey, statistical inference, and conclusion. Study design is the planning of a survey or experiment, the sampling survey is the process of collecting data, and statistical inference is the process of analyzing the data. The main function of statistics is thus inference, and the inference is a form of incomplete induction, because it infers the whole from partial information. The number of association relations between entities is huge, and working out by purely manual calculation which of them matter to the user is impossible. A technique of reasoning and analysis is needed that reaches global conclusions from a partial sample, and this is precisely the subject of statistics.

In existing search technology, keyword retrieval performs only simple word matching in text, that is, retrieval of entities, and its results fall far short of user requirements. Retrieval aimed at the association relations between entities has therefore appeared. Existing association relation retrieval, however, provides no technical solution for ranking the retrieved association relations, so users cannot find the information they want efficiently and accurately, and their needs are not met.

Summary of the invention

An object of the embodiments of the invention is to provide a method and a device for ranking association relation search results, so that users can obtain the information they want from the search results more accurately and efficiently.

To achieve the above object, an embodiment of the invention provides a method for ranking association relation search results, comprising: parsing the triple information of each instance of an ontology and constructing an instance association graph from the triple information of each instance; traversing, according to any two input instances of the ontology, the paths of all association relations between the two instances in the instance association graph, and generating search result information for all association relations between the two instances; calculating, from the search result information, the domain relevance, the association length, or the association frequency; and sorting the search result information according to the domain relevance, the association length, or the association frequency, or according to any combination of them. The domain relevance D_{R} of each piece of search result information is calculated by the following formula:

D_{R} = d + (1 - d) \times \frac{| Y_{i} |}{length (R)} \times (1 - \frac{| N_{i} |}{length (R)})

where R is the association relation corresponding to each piece of search result information:

R = \{O_{1}, P_{1}, O_{2}, P_{2}, O_{3}, ..., O_{n-1}, P_{n-1}, O_{n}\}

where n equals length (R); length (R) is the path length of the association relation; d is an adjustment factor, 0 < d < 1; Y_{i} is the set of instances O_{i} and attributes P_{i} in the association relation that belong to the user's domain of interest D:

Y_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \in D) \cap (P_{i} \in D)\}

N_{i} is the set of instances O_{i} and attributes P_{i} in the association relation that do not belong to the user's domain of interest D:

N_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \notin D) \cap (P_{i} \notin D)\};

The association length L_{R} of each piece of search result information is calculated by the following formula:

L_{R} = \frac{1}{length (R)}

Or

L_{R} = 1 - \frac{1}{length (R)};

The association frequency F_{R} of each piece of search result information is calculated by the following formula:

F_{R} = \frac{{RI}_{R} + {RS}_{R}}{2};

where RS_{R} is the relative out-degree of association relation R, and RI_{R} is the relative in-degree of association relation R.

An embodiment of the invention also provides a device for ranking association relation search results, comprising: an ontology parsing module, configured to parse the triple information of each instance of an ontology and to construct an instance association graph from the triple information of each instance; an association relation search module, configured to traverse, according to any two input instances of the ontology, the paths of all association relations between the two instances in the instance association graph, and to generate search result information for all association relations between the two instances; and an association relation ranking module, configured to calculate, from the search result information, the domain relevance, the association length, or the association frequency, and to sort the search result information according to the domain relevance, the association length, or the association frequency, or according to any combination of them. The domain relevance D_{R} of each piece of search result information is calculated by the following formula:

D_{R} = d + (1 - d) \times \frac{| Y_{i} |}{length (R)} \times (1 - \frac{| N_{i} |}{length (R)})

Y_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \in D) \cap (P_{i} \in D)\}

N_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \notin D) \cap (P_{i} \notin D)\};

L_{R} = \frac{1}{length (R)}

Or

L_{R} = 1 - \frac{1}{length (R)};

F_{R} = \frac{{RI}_{R} + {RS}_{R}}{2};

As can be seen from the above technical solution, by calculating the domain relevance and/or the association length and/or the association frequency, the embodiments of the invention can rank search results flexibly, so that users can obtain the information they want from the search results more accurately and efficiently.

The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.

Description of drawings

Fig. 1 is a flowchart of the association relation ranking method of embodiment one of the invention;

Fig. 2 is a structural diagram of the device for ranking association relation search results of embodiment two of the invention;

Fig. 3 is ontology structure diagram one of embodiment three of the invention;

Fig. 4 is ontology structure diagram two of embodiment three of the invention;

Fig. 5 is a diagram of the user interface of embodiment three of the invention;

Fig. 6 is a diagram of the search result interface of embodiment three of the invention;

Fig. 7 is user interface diagram one of embodiment four of the invention;

Fig. 8 is search result interface diagram one of embodiment four of the invention;

Fig. 9 is user interface diagram two of embodiment four of the invention;

Fig. 10 is search result interface diagram two of embodiment four of the invention;

Fig. 11 is search result interface diagram three of embodiment four of the invention;

Fig. 12 is user interface diagram three of embodiment four of the invention;

Fig. 13 is search result interface diagram four of embodiment four of the invention;

Fig. 14 is a chart comparing the system's ranking results with manual ranking results for the embodiments of the invention.

Embodiment

Semantic association search means searching for semantic associations between instances in an ontology. Two instances in a knowledge domain are said to be semantically associated if they are connected directly through one or more attributes, or indirectly through similar (identical or derived) attributes. These association relations form an association graph in which each instance is a node. Semantic association is based on the notion of property sequences in the Resource Description Framework (RDF) and can be regarded as a labeled path in the knowledge base.

The ranking of semantic association results draws on related techniques such as statistics, link analysis, social networks, and morphology.

In the semantic association ranking method, the embodiments of the invention consider several key ranking criteria, which users can configure according to their own needs when searching.

To design a suitable ranking method, the key factors that influence the ranking must first be identified.

For any search Q = (O_{1}, O_{n}), the user wishes to query the association relations that exist between instances O_{1} and O_{n}. A query result is an association relation R = \{O_{1}, P_{1}, O_{2}, P_{2}, O_{3}, ..., O_{n-1}, P_{n-1}, O_{n}\}, made up of all instances O and attributes P on the association path. The following three factors are the key factors that influence the ranking: domain relevance, semantic association length, and association frequency.

1) Domain relevance is the degree to which all instances and attributes appearing in an association relation relate to the user's domain of interest. Its value is denoted D_{R}.

Domain relevance can be adjusted and assigned by the user. A user may be interested in particular domains, different users have different interests, and a user's interests may change. In a concrete application, several domains can be defined and given different weights, with relatively higher weights for the user's domains of interest. A domain consists of the set of all instances and attributes associated with it. For example, an academic domain contains the instances "teacher", "paper", and "course" and the attributes "publishes" and "teaches". The more instances and attributes of an association relation belong to the user's domain of interest, the greater its domain relevance should be, and the more relevant an association relation is, the more it interests the user.

2) Semantic association length is the influence that the length of the semantic association path connecting two instances has on the ranking of association results. Its value is denoted L_{R}.

For a query result R = \{O_{1}, P_{1}, O_{2}, P_{2}, O_{3}, ..., O_{n-1}, P_{n-1}, O_{n}\}, the length of the association path is n. In general, the shorter the path between two associated instances, the more important their relation. In some situations the opposite holds. For example, in a national foreign exchange authority or a security agency, users expect to discover potential suspects or terrorists through complex association relations; the information of interest may lie in longer semantic associations, and longer association relations should then be given higher relevance than short, ordinary ones.

3) Association frequency is the influence that the in-degrees and out-degrees of all instances appearing in an association relation have on the ranking of association results. Its value is denoted F_{R}.

Before the association frequency is calculated, the notions of the in-degree and out-degree of an instance need to be understood. As in the PageRank technique for ranking web pages, an instance with larger in-degree and out-degree has higher importance. In the education domain, for example, take two instances of the concept "school", "Tsinghua University" and "Changjiang University". If "Tsinghua University" has larger in-degree and out-degree in the RDF graph, this indicates that it is better known than "Changjiang University". Instances with higher in-degree and out-degree can be regarded as more "important", and association relations that contain "important" instances can be given higher weights in the ranking.

For a search Q = (O_{1}, O_{n}), traversing the paths of all association relations between the two instances in the instance association graph yields the set of association relations:

R_{i} = \{O_{1}, P_{1i}, O_{2i}, P_{2i}, O_{3i}, ..., O_{(n-1)i}, P_{(n-1)i}, O_{n}\}, i = 1, 2, ..., m

Embodiment one

Based on these three key factors, an embodiment of the invention proposes an association relation ranking method which, as shown in Fig. 1, comprises the following steps:

Step 1: parse the triple information of each instance of the ontology, and construct an instance association graph from the triple information of each instance;

Step 2: according to any two input instances of the ontology, traverse the paths of all association relations between the two instances in the instance association graph, and generate search result information for all association relations between the two instances;

The two instances entered by the user are searched for in the instance association graph that has been built. The search is a depth-first search of the graph, and the other instances and attributes linking the two instances are recorded during the search. A graph search algorithm finds all association relations that exist between the two instances. The extracted triples are stored in an adjacency matrix, which is the data-structure representation of the graph of direct association relations between all instances in the ontology. The depth-first algorithm starts from a vertex v1 of the graph (the node corresponding to the start instance entered by the user) and visits it, then continues depth-first from each unvisited neighbour of v1 in turn, marking every visited vertex as visited so that it can be skipped if it is encountered again. When the vertex reached is the required end point v2 (the node corresponding to the end instance entered by the user), a path from v1 to v2 has been found; the path is saved and the search continues until all paths connecting v1 and v2 have been found. If no path has been found after all vertices have been explored, there is no association relation between v1 and v2.

Step 3: calculate, from the search result information, the domain relevance and/or the association length and/or the association frequency;

Step 4: sort the search result information according to the domain relevance, the association length, or the association frequency, or according to any combination of them.
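The graph construction of step 1 and the path search of step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it stores the triples in an adjacency list rather than an adjacency matrix, treats each triple as a directed edge, and clears the visited mark when backtracking so that every simple path between the two instances is enumerated. The names `build_graph` and `find_associations` and the sample triples are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Map each subject to its outgoing (property, object) pairs."""
    graph = defaultdict(list)
    for subject, prop, obj in triples:
        graph[subject].append((prop, obj))
    return graph


def find_associations(graph, start, end):
    """Enumerate all simple paths from start to end by depth-first search.

    Each path is the alternating sequence [O1, P1, O2, ..., P(n-1), On].
    """
    results = []
    visited = {start}

    def dfs(node, path):
        if node == end:
            results.append(list(path))  # save a copy of the finished path
            return
        for prop, nxt in graph.get(node, []):
            if nxt in visited:
                continue  # skip vertices already on the current path
            visited.add(nxt)
            path.extend([prop, nxt])
            dfs(nxt, path)
            del path[-2:]         # backtrack: drop the last property and instance
            visited.discard(nxt)  # assumption: unmark so other paths can reuse nxt

    dfs(start, [start])
    return results


# Hypothetical sample data for illustration only.
sample_triples = [
    ("alice", "advises", "bob"),
    ("bob", "wrote", "paper1"),
    ("alice", "wrote", "paper1"),
    ("paper1", "cites", "paper2"),
]
sample_graph = build_graph(sample_triples)
paths = find_associations(sample_graph, "alice", "paper1")
```

An empty result list corresponds to the case in which no association relation exists between the two instances.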

In step 3 above, the domain relevance, association length, and association frequency can be calculated as follows:

(1) Calculating the domain relevance D_{R}

First, the class (the generalized, abstract description of an instance) to which each instance in the association relation belongs is obtained and compared with the classes of the domain of interest selected by the user. A match indicates that the class is one the user wishes to emphasize. All instances and attributes of such classes on all association relation paths are then obtained, and the domain relevance is calculated from them by the following formulas.

For a search Q = (O_{1}, O_{n}), let the user's domain of interest be D.

The set of instances and attributes that belong to domain D is:

Y_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \in D) \cap (P_{i} \in D)\}

The set of instances and attributes that do not belong to domain D is:

N_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \notin D) \cap (P_{i} \notin D)\}

The domain relevance of the association relation is then:

D_{R} = d + (1 - d) \times \frac{| Y_{i} |}{length (R)} \times (1 - \frac{| N_{i} |}{length (R)})

(formula 1)

where length (R) is the length of the association relation and d is an adjustment factor introduced to avoid D_{R} = 0. The value of d lies between 0 and 1 and can be set freely; typically 0 < d < 0.1. The formula shows that the domain relevance increases with the number of instances and attributes that belong to domain D and decreases with the number that do not; that is, the more instances and attributes of an association relation belong to domain D, the greater its domain relevance.
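Formula 1 can be illustrated with a short Python sketch. It rests on assumptions that the text does not settle: the domain D is represented as a plain set of instance and attribute names, every path element is counted either in Y_{i} or in N_{i}, and length (R) is taken as n, the number of instances on the path. Because Y_{i} and N_{i} also count attributes, the result is not guaranteed to stay within [d, 1] under this reading. The function name `domain_relevance` is hypothetical.

```python
def domain_relevance(path, domain, d=0.05):
    """Evaluate formula 1 for a path [O1, P1, O2, ..., On].

    path   -- alternating instances and attributes
    domain -- set of names belonging to the user's domain of interest D
    d      -- adjustment factor, 0 < d < 1
    """
    if not 0 < d < 1:
        raise ValueError("d must lie strictly between 0 and 1")
    if not path:
        raise ValueError("path must contain at least one instance")
    length = (len(path) + 1) // 2                    # assumption: length(R) = n instances
    in_domain = sum(1 for x in path if x in domain)  # |Y_i|
    out_of_domain = len(path) - in_domain            # |N_i|
    return d + (1 - d) * (in_domain / length) * (1 - out_of_domain / length)


# Hypothetical example: three of the five path elements belong to D.
example_path = ["O1", "P1", "O2", "P2", "O3"]
example_domain = {"O1", "P1", "O2"}
example_score = domain_relevance(example_path, example_domain)
```

When no element of the path belongs to D, the function returns d, which matches the stated purpose of the adjustment factor.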

(2) Calculating the semantic association length L_{R}

L_{R} = \frac{1}{length (R)} or L_{R} = 1 - \frac{1}{length (R)}

(formula 2)

Formula (2) covers two cases. The first expression means that the shorter the semantic association path, the more valuable it is, so L_{R} is larger for shorter paths. The second expression means exactly the opposite: it gives larger values of L_{R} to longer semantic association paths. The user can choose the appropriate expression according to actual needs.
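The two cases of formula 2 can be expressed as one small Python function. This is only a sketch; the function name `length_score` and the `prefer_short` flag are hypothetical ways of letting the user choose between the two expressions.

```python
def length_score(path_length, prefer_short=True):
    """Evaluate formula 2 for a path of the given length(R).

    prefer_short=True  uses L_R = 1 / length(R)      (shorter paths score higher)
    prefer_short=False uses L_R = 1 - 1 / length(R)  (longer paths score higher)
    """
    if path_length < 1:
        raise ValueError("path_length must be at least 1")
    inverse = 1.0 / path_length
    return inverse if prefer_short else 1.0 - inverse
```

For a path of length 4, the first expression gives 0.25 and the second gives 0.75, so the same path moves from the bottom to the top of the ranking when the user switches expressions.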

(3) Calculating the association frequency F_{R}

The association frequency is in effect calculated from the relative in-degrees and out-degrees of the nodes in the graph. The out-degree and in-degree of each node on the path are therefore obtained first, and the relative in-degree and relative out-degree of each node are calculated from them. The maximum in-degree Max (I) over all nodes is also determined. These values are substituted into the following formulas.

First, define for instance O_{i} the absolute in-degree AI_{i}, the absolute out-degree AO_{i}, the relative in-degree RI_{i}, and the relative out-degree RO_{i}. The relative in-degree and relative out-degree are defined in order to normalize the association frequency F_{R}.

In the RDF graph, suppose k entities point to instance O_{i} and O_{i} points to p other instances. The absolute in-degree of O_{i} is then defined as AI_{i} = k.

The absolute out-degree of O_{i} is defined as AO_{i} = p.

Its relative in-degree RI_{i} is defined as:

{RI}_{i} = \frac{1}{k} \sum_{j = 1}^{k} S_{ji} .

where S_{ji} is the share of in-degree that instance O_{j}, which points to O_{i}, contributes to O_{i}:

S_{ji} = \frac{1}{{AO}_{j}},

where AO_{j} is the absolute out-degree of node j.

The relative in-degree RI_{i} is therefore:

{RI}_{i} = \frac{1}{k} \sum_{j = 1}^{k} \frac{1}{{AO}_{j}} .

The relative out-degree is defined as:

{RO}_{i} = 1 - \frac{1}{{AO}_{i}},

where AO_{i} is the absolute out-degree of instance O_{i}.

Suppose p instances appear in association relation R. On this basis, the relative in-degree RI_{R} of association relation R can be defined as:

{RI}_{R} = \frac{1}{length (R)} \sum_{i = 1}^{p} \frac{{RI}_{i}}{Max (I)},

where Max (I) is the maximum in-degree among all entities.

That is:

{RI}_{R} = \frac{1}{length (R)} \sum_{i = 1}^{p} \frac{(\frac{1}{k} \sum_{j = 1}^{k} \frac{1}{{AO}_{j}})}{Max (I)} .

The relative out-degree of association relation R is:

{RS}_{R} = \frac{1}{length (R)} \sum_{i = 1}^{p} {RO}_{i},

That is:

{RS}_{R} = \frac{1}{length (R)} \sum_{i = 1}^{p} (1 - \frac{1}{{AO}_{i}}) .

The frequency weight F_{R} of an association relation is determined jointly by the in-degrees and out-degrees of the entities in the association relation:

F_{R} = \frac{{RI}_{R} + {RS}_{R}}{2} = \frac{\frac{1}{length (R)} \sum_{i = 1}^{p} (\frac{{RI}_{i}}{Max (I)} + (1 - \frac{1}{{AO}_{i}}))}{2},

That is:

F_{R} = \frac{{RI}_{R} + {RS}_{R}}{2} = \frac{\frac{1}{length (R)} \sum_{i = 1}^{p} (\frac{(\frac{1}{k} \sum_{j = 1}^{k} \frac{1}{{AO}_{j}})}{Max (I)} + (1 - \frac{1}{{AO}_{i}}))}{2}

(formula 3)

where Max (I) is the maximum in-degree among all entities; for entity O_{i}, k entities point to it; AO_{j} is the absolute out-degree of entity O_{j}; and AO_{i} is the absolute out-degree of entity O_{i}.
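The degree-based quantities and formula 3 can be sketched in Python as follows. The sketch makes several assumptions that the text leaves open: degrees are counted over directed triples, RI_{i} is set to 0 for an instance with no incoming edge, RO_{i} is set to 0 for an instance with no outgoing edge, and length (R) is taken as the number of instances on the path. All function names are hypothetical.

```python
from collections import defaultdict


def degree_tables(triples):
    """Return (out_degree, predecessors) for (subject, property, object) triples."""
    out_degree = defaultdict(int)
    predecessors = defaultdict(list)
    for subject, _prop, obj in triples:
        out_degree[subject] += 1            # contributes to AO of the subject
        predecessors[obj].append(subject)   # subject is one of the k entities pointing to obj
    return out_degree, predecessors


def relative_in_degree(node, out_degree, predecessors):
    """RI_i = (1/k) * sum over the k predecessors j of 1 / AO_j."""
    sources = predecessors.get(node, [])
    if not sources:
        return 0.0  # assumption: RI_i is undefined in the text when k = 0
    return sum(1.0 / out_degree[j] for j in sources) / len(sources)


def relative_out_degree(node, out_degree):
    """RO_i = 1 - 1 / AO_i."""
    ao = out_degree.get(node, 0)
    if ao == 0:
        return 0.0  # assumption: RO_i is undefined in the text when AO_i = 0
    return 1.0 - 1.0 / ao


def association_frequency(path, triples):
    """F_R = (RI_R + RS_R) / 2 for a path [O1, P1, O2, ..., On]."""
    out_degree, predecessors = degree_tables(triples)
    max_in = max((len(v) for v in predecessors.values()), default=0)  # Max(I)
    instances = path[0::2]
    if not instances:
        raise ValueError("path must contain at least one instance")
    length = len(instances)  # assumption: length(R) = number of instances
    ri_sum = sum(relative_in_degree(o, out_degree, predecessors) for o in instances)
    ro_sum = sum(relative_out_degree(o, out_degree) for o in instances)
    ri_r = ri_sum / max_in / length if max_in else 0.0
    rs_r = ro_sum / length
    return (ri_r + rs_r) / 2
```

The degree tables are rebuilt on every call to keep the sketch short; an implementation that scores many paths over the same graph would compute them once and pass them in.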

In the sorting operation of step 4 above, any one of the domain relevance, association length, and association frequency can be used alone as the ranking criterion, or any combination of them can be used. In a preferred embodiment, all three are taken into account, and the total weight of an association relation is calculated as a weighted sum. The weighting coefficient of each factor can be set flexibly according to the user's actual needs, either by user input or by system presets, as follows:

V_{R} = k_{1} \times D_{R} + k_{2} \times L_{R} + k_{3} \times F_{R}

(formula 4)

where k_{1} + k_{2} + k_{3} = 1, and the user can assign the coefficients according to the needs of the application. D_{R} is the domain relevance, L_{R} the semantic association length, and F_{R} the association frequency.

All association relation search results are then sorted in descending order of the calculated total weight.
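The weighted sum of formula 4 and the descending sort can be sketched in Python as follows. The function names, the default coefficients, and the sample scores are hypothetical and chosen only for illustration; the text requires only that the three coefficients sum to 1.

```python
def total_weight(d_r, l_r, f_r, k1=0.4, k2=0.3, k3=0.3):
    """Evaluate formula 4: V_R = k1 * D_R + k2 * L_R + k3 * F_R."""
    if abs(k1 + k2 + k3 - 1.0) > 1e-9:
        raise ValueError("k1 + k2 + k3 must equal 1")
    return k1 * d_r + k2 * l_r + k3 * f_r


def rank_results(results, weights=(0.4, 0.3, 0.3)):
    """Sort (label, D_R, L_R, F_R) tuples by total weight, largest first."""
    return sorted(
        results,
        key=lambda r: total_weight(r[1], r[2], r[3], *weights),
        reverse=True,
    )


# Hypothetical scores for three association relations.
sample_results = [
    ("r1", 0.2, 0.50, 0.1),
    ("r2", 0.9, 0.25, 0.3),
    ("r3", 0.5, 0.33, 0.2),
]
ranked = rank_results(sample_results)
```

Setting one coefficient to 1 and the others to 0 reduces the ranking to a single criterion, which covers the case in which only one of the three factors is used.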

Embodiment two

Embodiments of the invention also provide a device for sorting incidence relation search results which, as shown in Figure 2, comprises:

An ontology parsing module 1, configured to parse the triple information of each instance of the ontology and to construct the instance incidence relation graph according to the triple information of each instance;

An incidence relation search module 2, configured to traverse, according to any two input instances of said ontology, the paths of all incidence relations between said two instances in the instance incidence relation graph, and to generate search result information of all incidence relations between the two instances;

An incidence relation sorting module 3, configured to calculate, according to said search result information, the domain correlation degree and/or the incidence relation length and/or the incidence relation frequency, and to sort said search result information according to the domain correlation degree, the incidence relation length, or the incidence relation frequency, or according to any combination of the three.

The device may further comprise an ontology loading module 4, configured to load an ontology into the device. The loaded ontology may be one built by the user or one obtained by searching the Internet; once the ontology is loaded, the corresponding parsing operation can be carried out.
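The work of the parsing module and the search module can be sketched in Python as follows. This is a hedged illustration only: the function names, the `^-1` marker for traversing a triple against its direction, the choice to traverse edges in both directions, and the `max_nodes` bound are assumptions not stated in the text. The triples are a subset of the relations described for Figure 4 in Embodiment three, with hypothetical property names.

```python
from collections import defaultdict


def build_instance_graph(triples):
    """Build an adjacency list from (subject, property, object) triples."""
    graph = defaultdict(list)
    for subject, prop, obj in triples:
        graph[subject].append((prop, obj))
        # Assumption: a relation may also be followed in reverse.
        graph[obj].append((prop + "^-1", subject))
    return graph


def find_paths(graph, start, end, max_nodes=6):
    """Enumerate simple paths as [O1, P1, O2, ..., On] by depth-first search."""
    paths = []

    def dfs(node, path, visited):
        if node == end:
            paths.append(list(path))
            return
        if len(visited) >= max_nodes:
            return  # bound the search depth
        for prop, nxt in graph.get(node, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            path.extend([prop, nxt])
            dfs(nxt, path, visited)
            del path[-2:]
            visited.discard(nxt)

    dfs(start, [start], {start})
    return paths


triples = [
    ("r1", "owner_of", "r8"),
    ("r1", "owner_of", "r10"),
    ("r1", "shareholder_of", "r5"),
    ("r5", "owner_of", "r4"),
    ("r6", "leader_of", "r5"),
    ("r6", "owner_of", "r8"),
    ("r6", "tutor_of", "r1"),
    ("r6", "shareholder_of", "r9"),
    ("r6", "leader_of", "r9"),
]
graph = build_instance_graph(triples)
paths = find_paths(graph, "r1", "r9")
```

Each returned path alternates instances and properties, matching the form R = {O_1, P_1, O_2, ..., O_n} used in the formulas, so it can be passed directly to a scoring step.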

Embodiment three

Figure 3 is ontology structure schematic diagram 1. Based on an analysis of the ontology, the following main classes were determined:

Bank account (Account);

Books (Book);

Course selected by a student (Course);

Client (Customer);

Employee of an organization (Employee);

Teacher at a school (Faculty): it has the subclass university professor (Professor), and Professor in turn has the subclass professor acting as a tutor (Adviser);

Aircraft flight (Flight);

Leader of an organization (Leader);

Organization or institution (Organization);

Passenger of a flight (Passenger);

Payment method (Payment_Type): Payment_Type has two subclasses, one being credit card (Credit_Card) and the other being frequent flier (Frequent_Flier);

Student (Student), who may be an undergraduate, a master's student, a doctoral student, and so on: the Student class has the subclass postgraduate (Grad_Student), and Grad_Student has the subclass teaching assistant (TA);

Ticket for a flight (Ticket);

Based on the classes built above, corresponding instances can be added to each class. Some of the instances in the ontology are shown in Figure 4, which is ontology structure schematic diagram 2 and shows the relations after some instances are added to the classes Customer and Account. r1, r4, r5, r6, r8, r9, and r10 are instances of these two classes: r1, r5, r6, and r9 are instances of Customer, and r4, r8, and r10 are instances of Account. r1 is the owner of accounts r8 and r10; r1 can withdraw money from r10 and can deposit into account r8. r5 is an organization; r1 is a shareholder of r5, and r5 is the owner of account r4. The leader of organization r5 is r6; r6 is also an owner of account r8 and is the tutor of r1. r9 is also an organization; r6 is a shareholder of r9 and is also its leader.

As the incidence relation graph shows, many different incidence relations exist between the instances. When searching, the user inputs the start instance and the end instance of the search. In the user interface shown in Figure 5, the two input instances are r1 and r9, and the incidence relations between r1 and r9 are retrieved. Using the weighting coefficients that the user enters for the domain correlation degree, the incidence relation length, and the incidence relation frequency, together with the calculation methods of the foregoing embodiment, the total weight value of each incidence relation between r1 and r9 is calculated, and the search results are sorted by total weight value so that the incidence relations the user wishes to obtain are returned first. The sorted search results obtained by the user are shown in Figure 6.

Embodiment four

The sorting method is described below for another ontology. Figure 7 shows the user search interface of this embodiment, in which the incidence relations between two instances, for example Sun Xiaolin and Lu Zhengding, are searched using the system's default sorting criterion. Figure 8 shows the search results. The first result is explained briefly below.

1. Sun Xiaolin -> publishes work -> "Research on Ontology-Based Multi-Domain Access Control Policy Integration" -> author -> Wen Kunmei -> publishes work -> paper005 -> author -> Li Ruixuan -> publishes work -> "Implementing Pluggable Authentication and Access Control in the Java2 Environment" -> author -> Song Wei -> advisor -> Lu Zhengding. Total Value (total weight value of the incidence relation): 0.508706

Here Sun Xiaolin is the start instance of the search. He published the paper "Research on Ontology-Based Multi-Domain Access Control Policy Integration"; this paper has another author, Wen Kunmei, who published another paper, paper005. Following relations of this kind onward eventually leads to an incidence relation with the instance Lu Zhengding.

In most cases, however, the ranking produced by the system's default sorting criterion cannot satisfy the user's needs. For example, a user may wish to find the more direct incidence relations between Sun Xiaolin and Lu Zhengding; to reach such a direct relation under the default ordering, the user may have to page through several, or even dozens of, result pages. The embodiment of the invention lets users set the sorting criterion themselves: the Length weight (weight of the incidence relation length) can be set to 0.7, the Context weight (weight of the domain correlation degree) to 0.2, and the Node In&Out weight (weight of the incidence relation frequency) to 0.1, with the Long Association Prior (long associations preferred) option deselected, as shown in Figure 9.

Figure 10 shows the search results returned after sorting according to the criterion set by the user.

The results show that the direct incidence relation between the two instances is ranked first; this incidence relation indicates that the advisor of Sun Xiaolin is Lu Zhengding. The other short associations are also ranked near the top, which meets the user's needs.
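The effect of the Long Association Prior option on the length factor can be sketched in Python. This is an assumption-laden illustration: it supposes that the option selects between the two L_R formulas given in the claims, L_R = 1/length (R) when short associations are preferred and L_R = 1 - 1/length (R) when long associations are preferred; the function name and the path lengths are hypothetical.

```python
def length_weight(path_length, long_association_prior):
    """L_R for a path; assumes the option switches between the two claimed forms."""
    if path_length < 1:
        raise ValueError("path_length must be at least 1")
    if long_association_prior:
        return 1 - 1 / path_length  # longer paths score higher
    return 1 / path_length  # shorter paths score higher


# A direct association (2 instances) versus a long chain (8 instances).
short_pref = (length_weight(2, False), length_weight(8, False))
long_pref = (length_weight(2, True), length_weight(8, True))
```

With the option deselected and a Length weight of 0.7, the length factor dominates the total weight value, which is consistent with the direct association moving to the top of the list.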

Next, the incidence relations between the two instances Sun Xiaolin and Wen Kunmei are searched using the default sorting criterion (with the long-association-preferred option deselected). The search results are shown in Figure 11.

The first incidence relation shows that Wen Kunmei and Sun Xiaolin are both authors of the paper "Research on Multi-Domain Access Control Policy Integration" in the ontology. However, the user may be interested in paths that connect the two instances through concepts such as "teacher"; in that case the user can emphasize these classes (that is, set the user's field of interest) when setting the sorting criterion, as shown in Figure 12.

After the classes of interest to the user are added to the sorting criterion, the search results are as shown in Figure 13.

The current ranking differs from the previous one: the first few incidence relations now contain more instances of the "teacher" class, such as "Li Ruixuan" and "Lu Zhengding".

Some actual evaluation test data are provided below.

In practical applications, semantic associations can be expressed in many ways, and users' sorting criteria are highly subjective, so it is difficult to evaluate the method against a single unified standard. Five users tested the sorting method, and the test results were evaluated. Different semantic association queries were provided, and a sorting criterion was chosen at random for each query. Instances and incidence relations of various types were also provided so that the users could judge whether a given association was relevant to their field of interest. The users then sorted the incidence relations according to their field of interest and the sorting criterion. Because different users sort results with a certain degree of subjectivity, the mean over all users can be regarded as a reference.

Five fairly representative sorting combinations were defined. In each test query, two factors are emphasized (that is, given higher weights). Based on the ontology structure of Figure 3, the following table lists the sorting criteria and the purpose of each search.

Table 1

No.	Sorting criterion	Meaning
1	Query the incidence relations between instances of "Passenger" and "Organization", emphasizing short incidence relation length and paths that contain transportation-type classes (for example Ticket and Flight).	Tests the ability of the sorting method to capture direct association paths and incidence relations in a field of interest.
2	Query the incidence relations between two instances of the "Customer" class, emphasizing long incidence relation length and paths that contain organization classes (for example Organization).	Tests the search for incidence relations with long paths in a field of interest.
3	Query the incidence relations between two instances of the "Customer" type, emphasizing long incidence relation length and incidence relations that contain important nodes.	Tests the system's ability to find rare associations and associations that contain important nodes.
4	Query the incidence relations between instances of the "Customer" type and the "Account" type, emphasizing incidence relations that contain organization classes (for example Organization) and important nodes (that is, high incidence relation frequency).	Can be used in semantic analysis systems, for example to find links between customers and accounts for anti-money-laundering detection.
5	Query the incidence relations between two instances of the "Customer" type, selecting different weights for the incidence relation length, the incidence relation frequency, and the domain correlation degree.	Gives the ranking obtained when the three sorting criteria are chosen according to the user's needs.

To demonstrate the validity of the sorting method, the data are organized as a chart, shown in Figure 14, which displays the general relationship between the system's ranking results (for query 1, query 2, query 3, query 4, and query 5) and the manual ranking results (that is, the ideal ranking results). "Ideal ranking" denotes the ideal case in which the system's ranking and the manual ranking coincide exactly. The figure shows that the sorting method performs reasonably well in testing: the testers' rankings are fairly close to the system's rankings, and some match the system's results exactly. Although this is only a limited, preliminary evaluation, and the testers' sorting criteria still differ somewhat from one another, the test results indicate that the sorting method is feasible and that it is flexible enough to satisfy the varied preferences of multiple users and give them satisfactory search results.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may still be modified or replaced by equivalents, and such modifications or equivalent replacements do not cause the modified technical solution to depart from the spirit and scope of the technical solution of the present invention.

Claims

1. A method for sorting incidence relation search results, characterized by comprising:

Parsing the triple information of each instance of an ontology, and constructing an instance incidence relation graph according to the triple information of each instance;

According to any two input instances of said ontology, traversing the paths of all incidence relations between said two instances in the instance incidence relation graph, and generating search result information of all incidence relations between the two instances;

Calculating, according to said search result information, the domain correlation degree, the incidence relation length, or the incidence relation frequency;

Sorting said search result information according to the domain correlation degree, the incidence relation length, or the incidence relation frequency, or according to any combination of the domain correlation degree, the incidence relation length, and the incidence relation frequency;

Wherein the domain correlation degree D_R of each piece of search result information is calculated by the following formula:

D_{R} = d + (1 - d) \times \frac{| Y_{i} |}{length (R)} \times (1 - \frac{| N_{i} |}{length (R)})

R = \{O_{1}, P_{1}, O_{2}, P_{2}, O_{3}, \ldots, O_{n-1}, P_{n-1}, O_{n}\}, wherein n equals length (R);

length (R) is the path length of the incidence relation;

d is the adjustment factor, 0 < d < 1;

Y_i is the set of instances O_i and properties P_i that, in the incidence relation corresponding to each piece of search result information, belong to the user's field of interest D:

Y_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \in D) \cap (P_{i} \in D)\}

N_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \notin D) \cap (P_{i} \notin D)\};

L_{R} = \frac{1}{length (R)}

Or

L_{R} = 1 - \frac{1}{length (R)};

F_{R} = \frac{{RI}_{R} + {RS}_{R}}{2};

RS_R is the relative out-degree of incidence relation R;

RI_R is the relative in-degree of incidence relation R.

2. The method according to claim 1, characterized in that sorting said search result information according to any combination of the domain correlation degree, the incidence relation length, and the incidence relation frequency specifically comprises:

According to weighting coefficients for said domain correlation degree, incidence relation length, and incidence relation frequency that are input by the user or preset by the system, computing a weighted sum of said domain correlation degree, incidence relation length, and incidence relation frequency to obtain a comprehensive degree of correlation, and sorting said search result information according to the magnitude of the comprehensive degree of correlation.

3, method according to claim 1 and 2 is characterized in that, described RSR calculates by following formula:

{RS}_{R} = \frac{1}{length (R)} Σ_{i = 1}^{p} R O_{i};

Wherein, length (R) is the path of described incidence relation;

RO _iBe example O _iRelative out-degree;

Described RO _iCalculate by following formula:

{RO}_{i} = 1 - \frac{1}{{AO}_{i}};

Wherein, AO _iBe example O _iAbsolute out-degree, AO _i=p, p are the number of example in the described incidence relation.

4, method according to claim 1 and 2 is characterized in that, described RI _RCalculate by following formula:

{RI}_{R} = \frac{1}{length (R)} Σ_{i = 1}^{p} \frac{{RI}_{i}}{Max (I)};

Wherein, length (R) is the path of described incidence relation;

Max (I) is that maximal phase in all entities is to in-degree;

RI _iBe example O _iRelative in-degree;

P is the number of example in the described incidence relation.

5. The method according to claim 4, characterized in that said RI_i is calculated by the following formula:

{RI}_{i} = \frac{1}{k} Σ_{j = 1}^{k} S_{ji};

Wherein S_ji is the absolute in-degree that instance O_j, which points to instance O_i, distributes to O_i, and k is the absolute in-degree of instance O_i.

6. The method according to claim 5, characterized in that said S_ji is calculated by the following formula:

S_{ji} = \frac{1}{{AO}_{j}};

Wherein AO_j is the absolute out-degree of instance O_j.

7. A device for sorting incidence relation search results, characterized by comprising:

An ontology parsing module, configured to parse the triple information of each instance of an ontology and to construct an instance incidence relation graph according to the triple information of each instance;

An incidence relation search module, configured to traverse, according to any two input instances of said ontology, the paths of all incidence relations between said two instances in the instance incidence relation graph, and to generate search result information of all incidence relations between the two instances;

An incidence relation sorting module, configured to calculate, according to said search result information, the domain correlation degree, the incidence relation length, or the incidence relation frequency, and to sort said search result information according to the domain correlation degree, the incidence relation length, or the incidence relation frequency, or according to any combination of the domain correlation degree, the incidence relation length, and the incidence relation frequency;

D_{R} = d + (1 - d) \times \frac{| Y_{i} |}{length (R)} \times (1 - \frac{| N_{i} |}{length (R)})

length (R) is the path length of the incidence relation;

d is the adjustment factor, 0 < d < 1;

Y_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \in D) \cap (P_{i} \in D)\}

N_{i} = \{O_{i} or P_{i} | (O_{i} \in R) \cap (P_{i} \in R) \cap (O_{i} \notin D) \cap (P_{i} \notin D)\};

L_{R} = \frac{1}{length (R)}

or

L_{R} = 1 - \frac{1}{length (R)};

F_{R} = \frac{{RI}_{R} + {RS}_{R}}{2};

RS_R is the relative out-degree of incidence relation R;

RI_R is the relative in-degree of incidence relation R.

8. The device according to claim 7, characterized by further comprising:

An ontology loading module, configured to load an ontology into the device.