Summary of the Invention
In view of the above problems in the prior art, embodiments of the present invention provide a method and a server for expanding a user's search results, which can intelligently provide the user with richer search results.
To this end, embodiments of the present invention provide the following technical solutions:
A method for expanding a user's search results, comprising:
obtaining a search keyword input by a user in a search interface;
obtaining an associated word that is associated with the search keyword;
querying an index database separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain search results;
deduplicating and sorting the search results; and
sending the sorted search results to a client, so that the client presents the received search results to the user.
Preferably, obtaining the associated word that is associated with the search keyword comprises:
looking up an association rule database according to the search keyword; and
if the association rule database contains an association rule that includes the search keyword, obtaining from that association rule the associated word that is associated with the search keyword.
Preferably, the method further comprises:
setting an association rule that comprises a keyword and an associated word associated with the keyword; and/or generating, according to a plurality of search keywords input by the user, an association rule that comprises a keyword and an associated word associated with the keyword; and
saving the association rule in the association rule database.
Preferably, the method further comprises:
collecting statistics on the search behavior and/or search results of all users;
determining, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule; and
maintaining the association rules in the association rule database according to the determination.
Preferably, determining, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule comprises:
calculating the support and/or confidence of the association rule according to the statistics; and
if the support is greater than a set support threshold and/or the confidence is greater than a preset confidence threshold, determining that the association rule is a strong association; otherwise, determining that it is a weak association.
A server, comprising:
a keyword obtaining unit, configured to obtain a search keyword input by a user in a search interface;
an associated word obtaining unit, configured to obtain an associated word that is associated with the search keyword;
a query unit, configured to query an index database separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain search results;
an organizing unit, configured to deduplicate and sort the search results; and
a sending unit, configured to send the sorted search results to a client, so that the client presents the received search results to the user.
Preferably, the associated word obtaining unit is specifically configured to look up an association rule database according to the search keyword and, if the association rule database contains an association rule that includes the search keyword, to obtain from that association rule the associated word that is associated with the search keyword.
Preferably, the server further comprises: a rule setting unit and/or a rule generating unit, and a saving unit;
the rule setting unit is configured to set an association rule that comprises a keyword and an associated word associated with the keyword;
the rule generating unit is configured to generate, according to a plurality of search keywords input by the user, an association rule that comprises a keyword and an associated word associated with the keyword;
the saving unit is configured to save the association rule in the association rule database.
Preferably, the server further comprises:
a statistics unit, configured to collect statistics on the search behavior and/or search results of all users;
an association degree determining unit, configured to determine, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule; and
a rule maintenance unit, configured to maintain the association rules in the association rule database according to the determination.
Preferably, the association degree determining unit comprises:
a calculation subunit, configured to calculate the support and/or confidence of the association rule according to the statistics; and
an analysis subunit, configured to determine that the association rule is a strong association when the support is greater than a set support threshold and/or the confidence is greater than a preset confidence threshold, and otherwise to determine that it is a weak association.
With the method and server for expanding a user's search results according to the embodiments of the present invention, associated words that have an association with the search keyword input by the user are mined, and the index database is queried separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain the corresponding search results. The search results are thereby expanded: documents that are associated with the search keyword input by the user are also provided to the user, giving the user richer search results.
Detailed Description of the Embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings and implementations.
With the method and server for expanding a user's search results according to the embodiments of the present invention, associated words that have an association with the search keyword input by the user are mined; that is, the user's search behavior and expectations are intelligently predicted. The index database is then queried separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain the corresponding search results. The search results are thereby expanded: documents that are associated with the search keyword input by the user are also provided to the user, giving the user richer search results.
Fig. 1 is a flowchart of the method for expanding a user's search results according to an embodiment of the present invention, which comprises the following basic steps:
Step 101: obtain the search keyword input by the user in the search interface.
The search keyword may be any Chinese or English text, and may be a single word or a phrase; the user may input one or more keywords.
In addition, the user's input may be a phrase that contains one or more keywords. For example, if the user inputs "360 and QQ Great War", the keywords "360", "QQ" and "Great War" can be extracted from it. The specific extraction may follow extraction methods in the prior art, and the embodiments of the present invention do not limit this.
In this case, the server may search separately for documents that match each keyword, to obtain the corresponding search results.
Step 102: obtain the associated word that is associated with the search keyword.
In the embodiments of the present invention, various association rules may be established in advance, each comprising a keyword and an associated word associated with the keyword. To make these association rules easier to maintain, the established rules may also be saved in an association rule database, so that they can be updated, added to or deleted when needed.
For example, some information is highly time-sensitive. As time passes it ceases to be a hot topic and people pay less attention to it; accordingly, the association rules related to such information also need to be updated or deleted, so as to avoid providing the user with unwanted search results.
Accordingly, after receiving the search keyword sent by the client, the server can look up the association rule database according to the search keyword. If the association rule database contains an association rule that includes the search keyword, the associated word that is associated with the search keyword is obtained from that association rule.
It should be noted that the association rules may be established in several ways, for example:
(1) An association rule comprising a keyword and an associated word associated with the keyword is established by setting, that is, by manual editing.
(2) As mentioned above, the user may input multiple keywords. When there are multiple keywords, an association may exist among them. Therefore, the server may also automatically generate, according to the multiple search keywords input by the user, an association rule comprising a keyword and an associated word associated with the keyword. It should be noted that the server may be a search engine server, in which case the users it serves are all users of that search engine.
Of course, both of the above approaches may be used together to establish association rules, and other approaches may also coexist with them; the embodiments of the present invention do not limit this.
For example, the association rule database contains the association rules shown in Table 1:
Table 1:
ID | Rule
1 | Potato => dietary function
2 | QQ => 360
3 | {Zhang San, Li Si} => lawsuit
4 | Law of conservation of mass => Lomonosov
5 | Einstein => relativity
6 | Einstein => Nobel Prize in Physics
As mentioned above, the user may input one or more keywords. When there is only one keyword, looking up the association rule database may yield one or more associated words corresponding to that keyword. For example, if the user inputs the search keyword "Einstein" in the search interface, looking up the association rule database yields two associated words associated with the keyword "Einstein", namely "relativity" and "Nobel Prize in Physics". When there are multiple keywords, the association rule database may be looked up according to all of them. For example, if the user inputs the search keywords "Zhang San" and "Li Si" in the search interface, looking up the association rule database yields the associated word "lawsuit", which is associated with the keywords "Zhang San" and "Li Si".
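The lookup described above can be sketched as follows. This is only a minimal illustration: the in-memory dictionary `RULES` stands in for the association rule database, and the function name `lookup_associated_words` and the subset-matching policy are assumptions, not something the description prescribes.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical in-memory stand-in for the association rule database (Table 1).
# Each antecedent (one or more keywords) maps to its associated words.
RULES: Dict[FrozenSet[str], List[str]] = {
    frozenset({"Potato"}): ["dietary function"],
    frozenset({"QQ"}): ["360"],
    frozenset({"Zhang San", "Li Si"}): ["lawsuit"],
    frozenset({"Law of conservation of mass"}): ["Lomonosov"],
    frozenset({"Einstein"}): ["relativity", "Nobel Prize in Physics"],
}


def lookup_associated_words(keywords: Set[str]) -> List[str]:
    """Collect associated words from every rule whose antecedent is
    fully contained in the user's keywords (assumed matching policy)."""
    found: List[str] = []
    for antecedent, words in RULES.items():
        if antecedent <= keywords:
            for word in words:
                # Skip duplicates and words the user already typed.
                if word not in found and word not in keywords:
                    found.append(word)
    return found
```

With this sketch, a single keyword such as "Einstein" returns both of its associated words, while the rule with two keywords only fires when both "Zhang San" and "Li Si" are present.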
In addition, when the user inputs multiple keywords, these keywords usually have some association with one another. The server may therefore also extract these keywords and generate a corresponding association rule; if the association rule database has no record of this rule, the generated rule is saved in the association rule database. For example, if the user inputs the search keywords "360" and "network security" in the search interface, the server generates the association rule {360 => network security} from the keywords input by the user; if this rule is not yet in the association rule database, the server adds the generated rule {360 => network security} to the database.
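A minimal sketch of this rule generation step is given below. The description does not say how the direction of a generated rule is chosen, so pairing the first keyword with each later keyword is an assumption made here for illustration, as are the names `generate_rules` and `add_new_rules` and the use of a Python set as the rule database.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str]  # (keyword, associated word)


def generate_rules(keywords: List[str]) -> List[Rule]:
    """Assumed policy: the first keyword implies each later keyword."""
    if len(keywords) < 2:
        return []
    head = keywords[0]
    return [(head, other) for other in keywords[1:] if other != head]


def add_new_rules(database: Set[Rule], keywords: List[str]) -> List[Rule]:
    """Save only those generated rules that the database does not hold yet."""
    added: List[Rule] = []
    for rule in generate_rules(keywords):
        if rule not in database:
            database.add(rule)
            added.append(rule)
    return added
```

Calling `add_new_rules` twice with the same keywords adds the rule only once, which mirrors the "save only if not already recorded" condition in the text.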
Step 103: query the index database separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain search results.
For example, if the user inputs QQ and the server, by looking up the association rule database, obtains the associated word 360 that is associated with QQ, the server searches separately for {QQ}, {QQ, 360} and {360} to obtain the corresponding search results.
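The three queries in this example can be produced mechanically. The sketch below assumes a single search keyword and a list of associated words; the function name `build_queries` is hypothetical.

```python
from typing import List, Tuple


def build_queries(keyword: str, associated_words: List[str]) -> List[Tuple[str, ...]]:
    """Return the keyword query, then keyword+associated-word combinations,
    then the associated words on their own."""
    queries: List[Tuple[str, ...]] = [(keyword,)]
    for word in associated_words:
        queries.append((keyword, word))
    for word in associated_words:
        queries.append((word,))
    return queries


# For the example in the text: {QQ}, {QQ, 360}, {360}
print(build_queries("QQ", ["360"]))
```

Each tuple would then be submitted to the index database as a separate query.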
The search results may include a summary or partial content of the relevant documents, and may further include the URLs of the documents, so that the user can click a URL to reach the relevant document.
It should be noted that the index database stores web page information that is collected from the Internet at regular intervals. The specific collection method may follow the prior art, for example using a web crawler to fetch Internet pages, building an index for the information of each page, and storing the index in the index database.
Step 104: deduplicate and sort the search results.
Because a single search may return many search results, deduplicating and sorting them gives the user a better experience.
Deduplication means keeping only one of several identical result documents. The specific implementation may follow the prior art, and the embodiments of the present invention do not limit this.
When sorting the search results, the results for the search keyword may be placed first, followed by the results for the combination of the search keyword and the associated word, and finally the results for the associated word. Of course, other orders may also be used.
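One possible way to combine deduplication with this default order is sketched below. Treating two results as identical when they share a URL, and the field names `url` and `group`, are assumptions for illustration; the description leaves the deduplication method to the prior art.

```python
from typing import Dict, List

# Lower rank sorts first: keyword hits, then combination hits,
# then associated-word hits (the default order described above).
GROUP_RANK = {"keyword": 0, "combination": 1, "associated": 2}


def dedupe_and_sort(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one entry per URL (the one from the best-ranked group),
    then order the survivors by group rank."""
    best: Dict[str, Dict[str, str]] = {}
    for result in results:
        kept = best.get(result["url"])
        if kept is None or GROUP_RANK[result["group"]] < GROUP_RANK[kept["group"]]:
            best[result["url"]] = result
    # sorted() is stable, so results within a group keep their arrival order.
    return sorted(best.values(), key=lambda r: GROUP_RANK[r["group"]])
```

A document returned by more than one query is thus shown once, at the position of the highest-priority query that found it.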
In addition, other factors may be taken into account when sorting the search results. For example, the search results may be sorted by the time at which the source document of each result was produced, with the most recent results first. The search results may also be sorted by the degree to which the source document of each result matches the search keyword, with the best-matching results first; the matching degree may be calculated using methods in the prior art. When several factors are considered together, a different weight may be set for each factor, the priority of each search result is calculated from the weights, and results with higher priority are placed first.
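The weighted ranking can be sketched as a weighted sum of per-result factor scores. The factor names (`recency`, `match`), the assumption that each factor is already normalized to the range 0 to 1, and the example weights are all hypothetical.

```python
from typing import Any, Dict, List


def priority(factors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of factor scores; a missing factor counts as 0."""
    return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())


def rank(results: List[Dict[str, Any]], weights: Dict[str, float]) -> List[Dict[str, Any]]:
    """Order results so that the highest priority comes first."""
    return sorted(results, key=lambda r: priority(r["factors"], weights), reverse=True)


weights = {"recency": 0.4, "match": 0.6}
docs = [
    {"url": "doc1", "factors": {"recency": 0.9, "match": 0.2}},
    {"url": "doc2", "factors": {"recency": 0.1, "match": 0.9}},
]
ranked = rank(docs, weights)
```

Changing the weights changes the outcome: with these example values the better-matching document outranks the more recent one.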
Step 105: send the sorted search results to the client, so that the client presents the received search results to the user.
It should be noted that either all of the sorted search results or only the top-ranked portion may be sent to the client.
It can be seen that, with the method for expanding a user's search results according to the embodiment of the present invention, associated words that have an association with the search keyword input by the user are mined; that is, the user's search behavior and expectations are intelligently predicted. The index database is then queried separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain the corresponding search results. The search results are thereby expanded: documents that are associated with the search keyword input by the user are also provided to the user, giving the user richer search results.
As mentioned above, in the embodiments of the present invention the association rules may be established in several ways, for example by setting, or automatically by the server according to the multiple search keywords input by the user. These association rules may also be saved in the same association rule database.
To further ensure that these association rules remain strongly associated, they may likewise be updated periodically by hand; alternatively, the server may analyze the search behavior and/or search results of all users and automatically maintain the association rules according to the analysis. This is described in detail below.
First, two concepts related to association rules are briefly introduced: support and confidence. Association rule, support and confidence are all concepts that originate in data mining, where:
An association rule can be written as:
A=>B (1)
where A denotes a keyword and B denotes the associated word of A.
Support is defined as:
$$\mathrm{sup}(A \Rightarrow B) = \frac{n(A \cap B)}{N} \qquad (2)$$
where n(A∩B) denotes the number of times A and B occur together, and N denotes the total number of transactions.
Confidence is defined as:
$$\mathrm{conf}(A \Rightarrow B) = \frac{n(A \cap B)}{n(A)} \qquad (3)$$
where n(A) denotes the number of times A occurs.
Support and confidence can express the strength of the association between multiple items.
It should be noted that, as formulas (2) and (3) show, sup(A=>B) always equals sup(B=>A), whereas the values of conf(A=>B) and conf(B=>A) generally differ.
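As a short check of this remark, taking support as n(A∩B)/N and confidence as n(A∩B)/n(A) (the forms used in the worked examples of this description), the symmetry of support and the asymmetry of confidence can be written out:

```latex
\begin{aligned}
\mathrm{sup}(B \Rightarrow A) &= \frac{n(B \cap A)}{N} = \frac{n(A \cap B)}{N} = \mathrm{sup}(A \Rightarrow B), \\
\frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{conf}(B \Rightarrow A)} &= \frac{n(A \cap B)/n(A)}{n(A \cap B)/n(B)} = \frac{n(B)}{n(A)}.
\end{aligned}
```

The two confidences therefore coincide only when n(A) = n(B), assuming n(A∩B) > 0.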
Based on the above principle, the embodiments of the present invention may further comprise the following steps: collecting statistics on the search behavior and/or search results of all users; determining, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule; and maintaining the association rules in the association rule database according to the determination. Specifically, the maintenance may consist of updating, adding or deleting association rules.
When determining, according to the statistics, the strength of the association between the keyword and the corresponding associated word in an association rule, there may be several specific implementations based on the support and/or confidence described above, which are illustrated in detail below.
(1) Determining the strength of the association between the keyword and the corresponding associated word in the association rule according to the search behavior of all users
For example, suppose several users use the search and input the following queries:
1. 360 Great War QQ;
2. QQ sues 360;
3. QQ.
Suppose A is 360 and B is QQ. From these search behaviors we obtain:
n(A∩B) = 2 and N = 3, so sup(A=>B) = 2/3 = 0.667;
n(A) = 2, so conf(A=>B) = 2/2 = 1.0;
similarly, n(B) = 3, so conf(B=>A) = 2/3 = 0.667.
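The calculation above can be reproduced with a few lines of code. In this sketch each query is represented as a set of already-extracted keywords; the function names `support` and `confidence` and the exact keyword sets are illustrative assumptions.

```python
from typing import List, Set


def support(transactions: List[Set[str]], a: str, b: str) -> float:
    """sup(a=>b): share of all queries that contain both a and b."""
    both = sum(1 for t in transactions if a in t and b in t)
    return both / len(transactions)


def confidence(transactions: List[Set[str]], a: str, b: str) -> float:
    """conf(a=>b): share of the queries containing a that also contain b."""
    has_a = sum(1 for t in transactions if a in t)
    both = sum(1 for t in transactions if a in t and b in t)
    return both / has_a if has_a else 0.0


# The three example queries, one keyword set per query.
queries = [
    {"360", "Great War", "QQ"},
    {"QQ", "sues", "360"},
    {"QQ"},
]

print(round(support(queries, "360", "QQ"), 3))     # sup(360=>QQ)
print(confidence(queries, "360", "QQ"))            # conf(360=>QQ)
print(round(confidence(queries, "QQ", "360"), 3))  # conf(QQ=>360)
```

The three printed values correspond to the 0.667, 1.0 and 0.667 obtained by hand in the text.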
(2) Determining the strength of the association between the keyword and the corresponding associated word in the association rule according to the search results of all users
In the embodiments of the present invention, a single query by a user may be called a transaction.
For an association already present in the association rule database, for example 360 being associated with QQ, the result documents obtained by searching for 360, QQ and {360, QQ} are tracked and counted. Suppose the statistics over a certain period are as follows:
the total number of these result documents is N = 100;
the number of documents containing only 360 is n(360) = 10;
the number of documents containing only QQ is n(QQ) = 20;
the number of documents containing both 360 and QQ is n(A∩B) = 70.
The calculation then gives:
sup(360=>QQ) = 70/100 = 0.7;
conf(360=>QQ) = 70/(70+10) = 0.875;
conf(QQ=>360) = 70/(70+20) = 0.778.
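The same three values follow directly from the document counts. The sketch below takes the counts of documents containing only A, only B, and both, as in the example; the function name `rule_metrics` and the dictionary keys are hypothetical.

```python
from typing import Dict


def rule_metrics(only_a: int, only_b: int, both: int, total: int) -> Dict[str, float]:
    """Support and both confidences from result-document counts.

    The number of documents containing A at all is both + only_a,
    and likewise for B."""
    return {
        "sup": both / total,
        "conf_a_to_b": both / (both + only_a),
        "conf_b_to_a": both / (both + only_b),
    }


# Counts from the example: A = 360, B = QQ.
metrics = rule_metrics(only_a=10, only_b=20, both=70, total=100)
```

Recomputing these metrics for successive periods makes the trend of an association visible.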
These three values change dynamically. If all three are decreasing over a period, the association between 360 and QQ is weakening; conversely, if they are increasing, the association is strengthening.
(3) Combining the above two kinds of statistics, that is, taking both the users' search behavior and the search results into account to determine the strength of the association between the keyword and the corresponding associated word in the association rule
For example, specific weights may be assigned to the statistics of the search behavior and of the search results respectively, and the support and confidence are calculated as a weighted average according to these weights. The two weights may be the same or different.
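A weighted average of the two statistics might look like the following sketch. The function name `combine` and the default equal weights are assumptions; the description only says that each source gets its own weight.

```python
def combine(behavior_value: float, result_value: float,
            w_behavior: float = 0.5, w_result: float = 0.5) -> float:
    """Weighted average of a metric measured from search behavior and
    the same metric measured from search results."""
    return (w_behavior * behavior_value + w_result * result_value) / (w_behavior + w_result)


# Example: combine the two support values for 360=>QQ from approaches (1) and (2).
combined_support = combine(0.667, 0.7)
```

The same function can be applied to the two confidence values, possibly with different weights.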
It should be noted that, in statistical approach (1) above, a single search by a user may contain three or more keywords. For example, if the user inputs "360 and QQ lawsuit situation", the keyword set is {360, QQ, lawsuit}. In this case, the support and confidence of each combination may be calculated separately, including:
sup(360=>{QQ, lawsuit}), conf(360=>{QQ, lawsuit});
sup(QQ=>{360, lawsuit}), conf(QQ=>{360, lawsuit});
sup(lawsuit=>{QQ, 360}), conf(lawsuit=>{QQ, 360});
conf({QQ, lawsuit}=>360);
conf({360, lawsuit}=>QQ);
conf({QQ, 360}=>lawsuit);
sup(360=>{QQ}), conf(360=>{QQ});
sup(360=>{lawsuit}), conf(360=>{lawsuit});
conf(lawsuit=>360);
conf(lawsuit=>QQ);
conf(QQ=>360);
sup(QQ=>lawsuit), conf(QQ=>lawsuit).
It can be seen that when a search transaction contains too many keywords, the amount of calculation becomes too large. In practical applications some restrictions may be applied, for example calculating the confidence and support only between pairs of keywords.
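The pairwise restriction can be sketched with the standard library. For k keywords this yields k(k-1)/2 unordered pairs (one support value each, since support is symmetric) and k(k-1) ordered pairs (one confidence value each). The function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import List, Tuple


def keyword_pairs(keywords: List[str]) -> List[Tuple[str, str]]:
    """Unordered keyword pairs: one support value is needed per pair."""
    return list(combinations(sorted(set(keywords)), 2))


def directed_pairs(keywords: List[str]) -> List[Tuple[str, str]]:
    """Ordered keyword pairs: one confidence value is needed per pair."""
    return list(permutations(sorted(set(keywords)), 2))


pairs = keyword_pairs(["360", "QQ", "lawsuit"])
rules = directed_pairs(["360", "QQ", "lawsuit"])
```

Restricting the calculation to pairs keeps the work quadratic in the number of keywords, instead of growing with the number of subsets.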
It should be noted that, after the support and confidence have been calculated, the strength of the association between the keyword and the corresponding associated word in the association rule may be determined from either one of them. For example, a support threshold and a confidence threshold are set respectively; when the calculated support exceeds the support threshold, the association is considered strong, and otherwise weak. Likewise, when the calculated confidence exceeds the confidence threshold, the association is considered strong. Of course, the two values may also be considered together, so that the association is considered strong only when both the calculated support and the calculated confidence exceed their corresponding thresholds.
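Both variants of the threshold test fit in one small function. The threshold values below are illustrative placeholders only, and the name `is_strong` and the `require_both` switch are assumptions.

```python
def is_strong(sup: float, conf: float,
              sup_threshold: float = 0.5, conf_threshold: float = 0.6,
              require_both: bool = False) -> bool:
    """Classify a rule as strong (True) or weak (False).

    require_both=False: either value exceeding its threshold is enough.
    require_both=True: both values must exceed their thresholds."""
    sup_ok = sup > sup_threshold
    conf_ok = conf > conf_threshold
    return (sup_ok and conf_ok) if require_both else (sup_ok or conf_ok)
```

With the example values sup = 0.7 and conf = 0.875 the rule 360=>QQ would be classified as strong under either variant.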
In addition, when the association rules in the association rule database are maintained according to the determination, the strength of each association may be used to decide whether an association rule in the database needs to be deleted, added or modified. For example, once the association in a given rule is determined to be weak, that rule is deleted.
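The deletion case can be sketched as follows. Here the rule database is modeled as a dictionary from rule to its most recently computed confidence; that representation, the function name `prune_weak_rules`, and the threshold value are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (keyword, associated word)


def prune_weak_rules(database: Dict[Rule, float],
                     conf_threshold: float = 0.6) -> List[Rule]:
    """Delete every rule whose confidence does not exceed the threshold
    and return the deleted rules."""
    weak = [rule for rule, conf in database.items() if conf <= conf_threshold]
    for rule in weak:
        del database[rule]
    return weak
```

Running such a pass periodically would let rules tied to fading topics drop out of the database without manual editing.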
It should be noted that the above are merely specific examples of using support and/or confidence to judge the strength of the association in an association rule in the embodiments of the present invention. In practical applications the strength of the association may also be judged in other ways, and the embodiments of the present invention do not limit this.
It can be seen that the method for expanding a user's search results according to the embodiment of the present invention not only intelligently predicts the user's search behavior and expectations and provides the user with documents that are associated with the search keyword input by the user, giving the user richer search results, but also, through automatic maintenance of the association rules, helps ensure the validity and accuracy of the expanded search results.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
Correspondingly, an embodiment of the present invention further provides a server. Fig. 2 is a schematic structural diagram of this server.
In this embodiment, the server comprises:
a keyword obtaining unit 201, configured to obtain the search keyword input by the user in the search interface;
an associated word obtaining unit 202, configured to obtain the associated word that is associated with the search keyword;
a query unit 203, configured to query the index database separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain search results;
an organizing unit 204, configured to deduplicate and sort the search results; and
a sending unit 205, configured to send the sorted search results to the client, so that the client presents the received search results to the user.
In the embodiments of the present invention, various association rules may be established in advance, each comprising a keyword and an associated word associated with the keyword. To make these association rules easier to maintain, the established rules may also be saved in an association rule database, so that they can be updated, added to or deleted when needed.
Accordingly, the associated word obtaining unit 202 is specifically configured to look up the association rule database 205 according to the search keyword and, if the association rule database contains an association rule that includes the search keyword, to obtain from that association rule the associated word that is associated with the search keyword.
It should be noted that the association rule database 205 may be located inside the server or may be independent of the server.
In addition, in the embodiments of the present invention, the server may further comprise: a rule setting unit and/or a rule generating unit, and a saving unit, wherein:
the rule setting unit is configured to set an association rule that comprises a keyword and an associated word associated with the keyword;
the rule generating unit is configured to generate, according to a plurality of search keywords input by the user, an association rule that comprises a keyword and an associated word associated with the keyword;
the saving unit is configured to save the association rule in the association rule database.
That is, the association rules may be generated in several ways: for example, some association rules may be set manually through the rule setting unit, and some may be generated automatically by the rule generating unit. In practical applications the server may comprise only one of the rule setting unit and the rule generating unit, or both. Of course, the embodiments of the present invention are not limited to these implementations; the association rules may also be generated in other ways, or by the above ways together with other ways, which are not enumerated here.
It can be seen that the server of the embodiment of the present invention intelligently predicts the user's search behavior and expectations for the search keyword input by the user, and queries the index database separately according to the search keyword, the associated word, and the combination of the search keyword and the associated word, to obtain the corresponding search results. The search results are thereby expanded: documents that are associated with the search keyword input by the user are also provided to the user, giving the user richer search results.
Fig. 3 is another schematic structural diagram of the server according to an embodiment of the present invention.
The difference from the embodiment shown in Fig. 2 is that, in this embodiment, the server further comprises:
a statistics unit 206, configured to collect statistics on the search behavior and/or search results of all users;
an association degree determining unit 207, configured to determine, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule; and
a rule maintenance unit 208, configured to maintain the association rules in the association rule database according to the determination; specifically, the maintenance may consist of deleting, adding or modifying association rules in the association rule database.
In the embodiments of the present invention, the association degree determining unit 207 may determine the strength of the association between the keyword and the corresponding associated word in the association rule in several ways, for example according to support and/or confidence.
Accordingly, the association degree determining unit 207 comprises:
a calculation subunit, configured to calculate the support and/or confidence of the association rule according to the statistics; and
an analysis subunit, configured to determine that the association rule is a strong association when the support is greater than a set support threshold and/or the confidence is greater than a preset confidence threshold, and otherwise to determine that it is a weak association.
Of course, the embodiments of the present invention are not limited to this implementation. In practical applications the association degree determining unit 207 may also determine the strength of the association in other ways, and the embodiments of the present invention do not limit this.
The server of the embodiment of the present invention not only intelligently predicts the user's search behavior and expectations and provides the user with documents that are associated with the search keyword input by the user, giving the user richer search results, but also, through automatic maintenance of the association rules, helps ensure the validity and accuracy of the expanded search results.
Identical or similar parts of the embodiments in this specification may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus and system embodiments are substantially similar to the method embodiment, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiment. The system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
The above discloses only preferred implementations of the present invention, but the present invention is not limited thereto. Any non-inventive variation that a person skilled in the art can conceive of, and any improvement or modification made without departing from the principle of the present invention, shall fall within the scope of protection of the present invention.