CN102982125A - Method and device for identifying texts with same meaning - Google Patents

Method and device for identifying texts with same meaning

Info

Publication number
CN102982125A
CN102982125A CN2012104570842A CN201210457084A CN102982125A CN 102982125 A CN102982125 A CN 102982125A CN 2012104570842 A CN2012104570842 A CN 2012104570842A CN 201210457084 A CN201210457084 A CN 201210457084A CN 102982125 A CN102982125 A CN 102982125A
Authority
CN
China
Prior art keywords
text
synonym
sequence
candidate
search results
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104570842A
Other languages
Chinese (zh)
Other versions
CN102982125B (en)
Inventor
刘钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210457084.2A priority Critical patent/CN102982125B/en
Publication of CN102982125A publication Critical patent/CN102982125A/en
Application granted granted Critical
Publication of CN102982125B publication Critical patent/CN102982125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a method and a device for determining synonym text. The method comprises: performing word segmentation on a pending text sequence to obtain at least one text fragment; querying the candidate synonym sequence library of the text sequence according to the at least one text fragment, to obtain candidate synonym sequences that contain one or more of the at least one text fragment or its synonyms; and selecting the synonym text of the text sequence from those candidate synonym sequences. Compared with the prior art, the method and the device can obtain synonyms of the pending text sequence that are difficult to recall in the prior art, and can improve the accuracy of synonym judgments for the pending text sequence.

Description

Method and apparatus for determining synonym text
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for determining synonym text.
Background
When searching on the Internet, users may well use different names for the same search object. For example, for the application "where is my water", a user may search with titles such as "crocodile loves taking a bath", "crocodile loves taking a shower" or "little naughty crocodile"; likewise, a user's searches for "palm Baidu" and "palm hundred" may target the same search object. Search technology therefore needs to identify text sequences whose names differ but which denote the same search object.
The prior-art ways of identifying text sequences that denote the same search object include:
1) manual identification and labeling;
2) identification through semantic synonyms, for example recognizing "taking a bath" and "taking a shower" as semantic synonyms, in order to identify text sequences that denote the same search object.
However, manual identification and labeling lags considerably, can cover only a limited number of search objects, and has a high labor cost. Semantic identification has a low recognition rate: for example, text sequences that differ greatly in meaning but still denote the same search object cannot be identified. Both approaches also suffer from low coverage.
Summary of the invention
The purpose of the present invention is to provide a method and apparatus for determining synonym text.
According to one aspect of the present invention, a method for establishing or updating a candidate synonym sequence library is provided, wherein the method comprises the following step:
a. matching the first search results of a pending text sequence against the second search results of a sequence to be mined;
wherein the method further comprises the following step:
x. when the result of the matching meets a first predetermined condition, establishing or updating the candidate synonym sequence library of the pending text sequence according to the sequence to be mined;
wherein the first predetermined condition comprises that the first search results and the second search results contain at least one identical search result item.
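The library-building step can be illustrated with a minimal sketch. It assumes search results are available as lists of result URLs keyed by query; the function names, the `logs` data and the URLs are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def shares_result_item(first_results, second_results):
    """First predetermined condition: at least one identical search result item."""
    return bool(set(first_results) & set(second_results))

def update_candidate_library(library, pending, to_mine, search_results):
    """Add every sequence to be mined whose results overlap those of `pending`."""
    first = search_results.get(pending, [])
    for seq in to_mine:
        if seq == pending:
            continue
        if shares_result_item(first, search_results.get(seq, [])):
            library[pending].add(seq)
    return library

# Hypothetical search logs: query -> list of result URLs.
logs = {
    "where is my water": ["app.example/wmw", "wiki.example/wmw"],
    "crocodile loves taking a bath": ["app.example/wmw"],
    "disk cleanup tool": ["tools.example/cleanup"],
}
library = update_candidate_library(
    defaultdict(set), "where is my water", list(logs), logs
)
print(library["where is my water"])
```

Only the second query shares a result item with the pending sequence, so it is the only entry added to that sequence's library.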
According to another aspect of the present invention, a method for determining synonym text is also provided, wherein the method comprises the following steps:
a. performing word segmentation on a pending text sequence to obtain at least one text fragment;
b. querying the candidate synonym sequence library of the text sequence according to the at least one text fragment, to obtain candidate synonym sequences that contain one or more of the at least one text fragment or its synonyms, as the candidate synonym text of the text sequence, wherein the result of matching the historical search results obtained for the text sequence against the historical search results obtained for the candidate synonym sequence meets a first predetermined condition;
c. selecting the synonym text of the text sequence from the candidate synonym text;
wherein the first predetermined condition comprises that the first search results and the second search results contain at least one identical search result item.
According to another aspect of the present invention, an updating device for establishing or updating a candidate synonym sequence library is also provided, wherein the updating device comprises:
a matching device, configured to match the first search results of a pending text sequence against the second search results of a sequence to be mined;
a library updating device, configured to establish or update the candidate synonym sequence library of the pending text sequence according to the sequence to be mined when the result of the matching meets a first predetermined condition;
wherein the first predetermined condition comprises that the first search results and the second search results contain at least one identical search result item.
According to another aspect of the present invention, a synonym text determining device for determining synonym text is also provided, wherein the synonym text determining device comprises:
a word segmentation device, configured to perform word segmentation on a pending text sequence to obtain at least one text fragment;
a query device, configured to query the candidate synonym sequence library of the text sequence according to the at least one text fragment, to obtain candidate synonym sequences that contain one or more of the at least one text fragment or its synonyms, as the candidate synonym text of the text sequence, wherein the result of matching the historical search results obtained for the text sequence against the historical search results obtained for the candidate synonym sequence meets a first predetermined condition;
a first selecting device, configured to select the synonym text of the text sequence from the candidate synonym text;
wherein the first predetermined condition comprises that the first search results and the second search results contain at least one identical search result item.
Compared with the prior art, the present invention has the following advantages:
1) a pending text sequence and its candidate synonym sequences can be associated through search result items that users clicked in their respective search results, and whether each candidate synonym sequence really is synonym text of the pending text sequence can then be judged in several ways, so that synonyms of the pending text sequence that are difficult to recall in the prior art can be obtained and the accuracy of synonym judgments for the pending text sequence is improved;
2) the pending text sequence and its synonym text can be normalized, ensuring consistency between the two;
3) by searching on both the queried text sequence and its synonym text, search result items can be obtained that are hard to obtain by searching on the text sequence alone but may actually be what the user needs;
4) if a search result item appears in the search results of two text sequences, it can be assumed that, although users entered different text sequences, the objects they wish to find are identical or similar; the present invention mines the candidate synonym sequences of a text sequence on this basis and can obtain candidate synonym sequences that prior-art schemes find difficult to recall;
5) further, if a search result item not only appears in the search results of two text sequences but is also clicked by users in both, it can be assumed that users may regard the two text sequences as identical or similar; the present invention further mines the candidate synonym sequences of a text sequence on this basis and can obtain candidate synonym sequences that prior-art schemes find difficult to recall;
6) the higher the number of clicks, click frequency and so on of the search result items that users clicked in both sets of search results, and the greater the number of such search result items, the more likely it is that users regard the two text sequences as pointing to the same search object; accordingly, a preferred scheme can further screen candidate synonym sequences based on the click information of the search result items clicked in both.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, given with reference to the accompanying drawings:
Fig. 1 is a flow chart of a method for determining synonym text according to a preferred embodiment of the invention;
Fig. 2 is a flow chart of a method for establishing or updating a candidate synonym sequence library according to a preferred embodiment of the invention;
Fig. 3 is a structural diagram of a determining device for determining synonym text according to a preferred embodiment of the invention;
Fig. 4 is a structural diagram of an updating device for establishing or updating a candidate synonym sequence library according to a preferred embodiment of the invention.
The same or similar reference numerals denote the same or similar parts in the drawings.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for determining synonym text according to a preferred embodiment of the invention. The method of this embodiment comprises step S1, step S2 and step S3. The method of this embodiment is mainly implemented by a computer device. The computer device includes but is not limited to a network device or a user device. The network device includes but is not limited to a single network server, a server group formed by a plurality of network servers, or a cloud based on cloud computing and formed by a large number of computers or network servers, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The user device includes but is not limited to a PC, a tablet computer and the like. The network in which the computer device resides includes but is not limited to the Internet, a wide area network, a metropolitan area network, a local area network, a VPN and the like.
It should be noted that the computer devices and networks described above are only examples; other existing or future computing devices or networks, where applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
In step S1, the computer device performs word segmentation on the pending text sequence to obtain at least one text fragment.
The pending text sequence includes any text sequence whose synonym text needs to be determined. Preferably, the pending text sequence includes a network resource name, that is, the name of any resource obtainable on a network, such as an application name or an audio or video title; more preferably, the pending text sequence includes an application name.
The ways in which the computer device obtains the pending text sequence include but are not limited to:
1) the computer device obtains a pre-stored pending text sequence, for example a text sequence pre-stored in the computer device or in another device;
2) the computer device obtains a search sequence from a user in real time and uses it as the pending text sequence.
The computer device can segment the pending text sequence in a variety of ways to obtain its at least one text fragment.
For example, the computer device segments the pending text sequence "little naughty loves taking a shower" according to a dictionary and obtains its 3 text fragments: "little naughty", "loves" and "taking a shower".
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it; those skilled in the art should understand that any implementation that segments a pending text sequence to obtain at least one text fragment falls within the scope of the present invention.
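Dictionary-based segmentation can be sketched with forward maximum matching, one common technique; the patent does not name a specific algorithm, so this choice is an assumption. The sketch also assumes an English rendering of the example in which whitespace-separated tokens stand in for Chinese characters; `segment`, `max_len` and the dictionary contents are hypothetical.

```python
def segment(text, dictionary, max_len=3):
    """Forward maximum matching over whitespace-separated tokens.

    At each position, take the longest run of up to `max_len` tokens that
    is a dictionary entry; fall back to a single token otherwise.
    """
    tokens = text.split()
    fragments, i = [], 0
    while i < len(tokens):
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            piece = " ".join(tokens[i:i + size])
            if size == 1 or piece in dictionary:
                fragments.append(piece)
                i += size
                break
    return fragments

# Hypothetical segmentation dictionary.
dictionary = {"little naughty", "loves", "taking a shower"}
fragments = segment("little naughty loves taking a shower", dictionary)
print(fragments)
```

With this dictionary the call yields three fragments, matching the shape of the dictionary-based example in the text.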
Then, in step S2, the computer device queries the candidate synonym sequence library of the pending text sequence according to the at least one text fragment obtained by segmentation, and obtains candidate synonym sequences that contain one or more of the at least one text fragment or its synonyms, as the candidate synonym text of the pending text sequence.
Here, the result of matching the first search results obtained for the text sequence against the second search results obtained for the candidate synonym sequence meets a first predetermined condition, which comprises that the first search results and the second search results contain at least one identical search result item. Preferably, the first predetermined condition may also comprise other conditions; these are described in detail with reference to the embodiment shown in Fig. 2 and are not repeated here.
A search result item may comprise any search result information, for example a search result link or a search result summary.
The candidate synonym sequence library of the pending text sequence can be determined in advance, before step S2 is performed; the way in which it is determined in advance is described in detail with reference to the embodiment shown in Fig. 2 and is not repeated here.
The computer device can determine the synonyms of a text fragment in a variety of ways: for example, by querying a predetermined synonym dictionary to determine one or more synonyms of the text fragment; or, as another example, by querying a predetermined synonym dictionary in combination with semantic analysis.
Specifically, the ways in which the computer device queries the candidate synonym sequence library of the pending text sequence according to the at least one text fragment obtained by segmentation, and obtains candidate synonym sequences containing one or more of the at least one text fragment or its synonyms as the candidate synonym text of the pending text sequence, include but are not limited to:
1) when the computer device queries and determines that a candidate synonym sequence contains one or more of the text fragments obtained by segmentation, or a synonym of one or more of those text fragments, it determines that this candidate synonym sequence is candidate synonym text of the pending text sequence.
For example, the text fragments of the pending text sequence "crocodile loves taking a shower" are "crocodile", "loves" and "taking a shower", and the candidate synonym sequences include "little naughty loves taking a bath", "crocodile loves taking a bath", "little naughty taking a bath", "having washed" and "how washing". The computer device queries the candidate synonym sequence library of the text sequence "crocodile loves taking a shower" and determines that "little naughty loves taking a bath" contains the text fragment "loves" and "taking a bath", a synonym of "taking a shower"; that "crocodile loves taking a bath" contains the text fragments "crocodile" and "loves" and the synonym "taking a bath"; and that "little naughty taking a bath" contains the synonym "taking a bath". It therefore takes the candidate synonym sequences "little naughty loves taking a bath", "crocodile loves taking a bath" and "little naughty taking a bath" as the candidate synonym text of the pending text sequence "crocodile loves taking a shower".
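A minimal sketch of this first query mode follows. It assumes the library stores each candidate synonym sequence together with its own fragments and that synonyms come from a plain lookup table; the shortened fragment names ("shower", "bath") and all identifiers are hypothetical simplifications of the example.

```python
def expand_with_synonyms(fragments, synonym_dict):
    """Return the fragments plus every synonym listed for them."""
    terms = set(fragments)
    for fragment in fragments:
        terms.update(synonym_dict.get(fragment, ()))
    return terms

def query_candidates(fragments, library, synonym_dict):
    """Mode 1: keep sequences containing any fragment or any synonym of one."""
    terms = expand_with_synonyms(fragments, synonym_dict)
    return [name for name, cand in library.items() if terms & set(cand)]

# Hypothetical library: candidate sequence -> its fragments.
library = {
    "little naughty loves bath": ["little naughty", "loves", "bath"],
    "crocodile loves bath": ["crocodile", "loves", "bath"],
    "little naughty bath": ["little naughty", "bath"],
    "having washed": ["having washed"],
    "how washing": ["how washing"],
}
synonyms = {"shower": {"bath"}}
selected = query_candidates(["crocodile", "loves", "shower"], library, synonyms)
print(selected)
```

The first three sequences are selected because each contains a fragment of the pending sequence or the synonym "bath"; the last two contain neither.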
2) the computer device obtains the synonyms of the at least one text fragment obtained by segmentation, and queries the candidate synonym sequence library of the text sequence to obtain candidate synonym sequences containing those synonyms. When a candidate synonym sequence obtained by the query contains only the synonym, it is used directly as candidate synonym text; when a candidate synonym sequence obtained by the query contains the synonym and other text information, it is used as candidate synonym text if that other text information is partly identical to the pending text sequence.
For example, the text fragments of the pending text sequence "crocodile loves taking a shower" are "crocodile", "loves" and "taking a shower", and the candidate synonym sequence library includes "little naughty loves taking a bath", "crocodile loves taking a bath", "little naughty taking a bath", "having washed" and "how washing".
In the candidate synonym sequence library of the pending text sequence "crocodile loves taking a shower", the computer device finds that the candidate synonym sequence "little naughty loves taking a bath" contains "taking a bath", a synonym of the text fragment "taking a shower". It then judges that the other text information in this candidate synonym sequence, "little naughty loves", shares the partly identical text "loves" with the pending text sequence "crocodile loves taking a shower", and so determines that "little naughty loves taking a bath" is candidate synonym text of the pending text sequence "crocodile loves taking a shower".
Similarly, the computer device continues to query the candidate synonym sequences and determines that "little naughty loves taking a bath", "crocodile loves taking a bath" and "little naughty taking a bath" are candidate synonym text of the pending text sequence "crocodile loves taking a shower".
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation falls within the scope of the present invention if it obtains synonyms of the at least one text fragment obtained by segmentation, queries the candidate synonym sequence library of the text sequence to obtain candidate synonym sequences containing those synonyms, uses a queried candidate synonym sequence directly as candidate synonym text when it contains only the synonym, and, when it contains the synonym and other text information, uses it as candidate synonym text if that other text information is partly identical to the pending text sequence. For example, an implementation also falls within the scope of the present invention if it uses a queried candidate synonym sequence containing a synonym and other text information as candidate synonym text only when that other text information is wholly or partly identical to the text fragments of the pending text sequence other than the text fragment corresponding to the synonym.
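The second query mode can be sketched as follows. The sketch interprets "partly identical" as sharing at least one whole fragment with the pending text sequence, which is an assumption; the data set and all identifiers are hypothetical and deliberately differ from the example in the text.

```python
def query_by_synonym(pending_fragments, library, synonym_dict):
    """Mode 2: require a synonym hit, then check the remaining text."""
    pending = set(pending_fragments)
    selected = []
    for name, cand_fragments in library.items():
        cand = set(cand_fragments)
        # Synonyms of pending fragments that occur in this candidate.
        hits = set()
        for fragment in pending_fragments:
            hits |= synonym_dict.get(fragment, set()) & cand
        if not hits:
            continue
        other = cand - hits
        # Accept when the candidate is only the synonym, or when its
        # remaining text overlaps the pending text sequence.
        if not other or other & pending:
            selected.append(name)
    return selected

# Hypothetical library: candidate sequence -> its fragments.
library = {
    "bath": ["bath"],
    "crocodile loves bath": ["crocodile", "loves", "bath"],
    "bath salts guide": ["bath", "salts guide"],
    "how washing": ["how washing"],
}
synonyms = {"shower": {"bath"}}
selected = query_by_synonym(["crocodile", "loves", "shower"], library, synonyms)
print(selected)
```

Here "bath salts guide" contains the synonym but its remaining text shares nothing with the pending sequence, so it is rejected, while "how washing" is skipped because it contains no synonym at all.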
It should further be noted that those skilled in the art will understand that, within the approach defined by step S2, the computer device can select candidate synonym text from the candidate synonym sequence library in a variety of ways. For example, the computer device can first query and determine all candidate text sequences that contain a synonym of a text fragment and then select the candidate synonym text from among them; alternatively, the computer device can judge one by one whether each candidate text sequence is candidate synonym text.
Then, in step S3, the computer device selects the synonym text of the pending text sequence from the candidate synonym text.
Specifically, the ways in which the computer device selects the synonym text of the text sequence from the candidate synonym text include but are not limited to:
1) the computer device selects the synonym text from the candidate synonym text according to the degree of association between each candidate synonym text and the pending text sequence. The higher the degree of association between a candidate synonym text and the pending text sequence, the more likely that candidate synonym text is to be selected as synonym text.
The degree of association can be determined from many factors. For example, it can be determined from the click information of the search result items that users clicked both in the search results of the candidate synonym text and in the first search results of the pending text sequence, where the click information of a search result item includes but is not limited to its click-through rate, number of clicks, click times and click frequency; preferably, the higher the click-through rate, number of clicks, click frequency and so on, the higher the degree of association. Preferably, the degree of association can also be determined from the predetermined closeness between a synonym contained in the candidate synonym text and the corresponding text fragment of the pending text sequence, from the proportion of the candidate synonym text occupied by the synonyms it contains, and so on.
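One way to turn shared click information into a ranking is sketched below. The patent only says that more clicks imply a higher degree of association; the specific scoring formula (summing the smaller click count over result items clicked under both queries) and all names and figures are assumptions for illustration.

```python
def association_degree(pending_clicks, candidate_clicks):
    """Score from result items clicked under both queries.

    For each shared item, count the smaller of the two click counts,
    so an item must be well clicked on both sides to contribute much.
    """
    shared = pending_clicks.keys() & candidate_clicks.keys()
    return sum(min(pending_clicks[u], candidate_clicks[u]) for u in shared)

def rank_candidates(pending_clicks, clicks_by_candidate):
    """Order candidate synonym texts by descending degree of association."""
    return sorted(
        clicks_by_candidate,
        key=lambda c: association_degree(pending_clicks, clicks_by_candidate[c]),
        reverse=True,
    )

# Hypothetical click logs: result item -> number of clicks.
pending_clicks = {"u1": 10, "u2": 3}
clicks_by_candidate = {
    "candidate A": {"u1": 8},
    "candidate B": {"u2": 1, "u3": 50},
    "candidate C": {"u4": 5},
}
ranking = rank_candidates(pending_clicks, clicks_by_candidate)
print(ranking)
```

Clicks on "u3" do not help candidate B because that item was never clicked under the pending text sequence.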
2) the pending text sequence includes a network resource name, and the computer device selects the synonym text of the text sequence from the candidate synonym text by performing at least one of the following operations A and B on each of all or some of the candidate synonym texts. The computer device can perform operation A and/or B on every candidate synonym text. Alternatively, it can process the candidate synonym texts one by one, in descending order of the degree of association between each candidate synonym text and the pending text sequence, or in descending order of a weight determined from parameters such as the degree of association and a predetermined importance, until a predetermined number (e.g., 30) of synonym texts has been obtained or operation A and/or B has been performed on all candidate synonym texts.
Operations A and B are described below:
Operation A: judge whether the pending text sequence and the candidate synonym text currently being processed have a non-synonym feature.
A non-synonym feature is any characteristic information showing that the pending text sequence and the candidate synonym text are not synonymous. Preferably, the non-synonym features include but are not limited to at least one of the following:
1) the network resource corresponding to the pending text sequence and the network resource corresponding to the candidate synonym text belong to different brands.
For example, applications that belong to different brands, such as QQ mobile phone assistant, which belongs to QQ, and 360 mobile phone assistant, which belongs to 360.
As another example, film and television works that belong to different brands.
Preferably, the computer device can determine whether the network resources corresponding to the pending text sequence and to the candidate synonym text belong to different brands by identifying text information with brand characteristics, such as QQ or 360, in the pending text sequence and the candidate synonym text, or by obtaining brand information for the pending text sequence and the candidate synonym text that was determined in advance by the computer device or by another device.
2) the candidate synonym text contains predetermined resource-derived vocabulary, that is, vocabulary that relates to a network resource but does not denote the network resource itself.
For example, walkthroughs, maps and modifiers, which relate to a game application but are not the game application itself; or, as another example, film reviews, which relate to a film or television work but are not the work itself.
3) the candidate synonym text contains a predetermined resource fragment feature, that is, a feature that describes a specific part of a resource rather than the resource as a whole.
For example, the name of a specific scene in a game; or, as another example, the name of a clip from a film or television work.
4) one of the pending text sequence and the candidate synonym text is an instantiation of the other.
For example, a specific application name is an instantiation of a general application name: "AnTuTu benchmark software" is an instantiation of "benchmark software".
Preferably, the computer device can determine that one of the pending text sequence and the candidate synonym text is an instantiation of the other by identifying whether the category of one is a subcategory of the other, by identifying whether one is a predetermined instantiation of the other, or by obtaining instantiation information for the pending text sequence and the candidate synonym text that was determined in advance by the computer device or by another device.
5) the pending text sequence and the candidate synonym text contain text information in at least two languages, and the result of translating one language into the other has no synonym among the text information in that other language; that is, after all or part of the text information of one of the pending text sequence and the candidate synonym text is translated from one language into another, no corresponding synonym exists in the other.
For example, the pending text sequence "sd card cleaning tool" and the candidate synonym text "disk cleanup tool" contain text information in two languages, English and Chinese, and the Chinese translation "secure digital card" of the English text fragment "sd card" in the pending text sequence has no corresponding synonym in the candidate synonym text "disk cleanup tool".
It should be noted that the above examples of non-synonym features only serve to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any characteristic information showing that the pending text sequence and the candidate synonym text are not synonymous falls within the scope of the present invention.
Specifically, when it is judged that the pending text sequence and the candidate synonym text currently being processed have a non-synonym feature, the computer device does not use that candidate synonym text as synonym text of the pending text sequence.
For example, the computer device does not use "360 mobile phone assistant", which belongs to a different brand from "QQ mobile phone assistant", as synonym text of "QQ mobile phone assistant"; as another example, the computer device does not use the candidate synonym text "AnTuTu benchmark software", an instantiation of the text sequence "benchmark software", as its synonym text.
It should be noted that the above examples only serve to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any implementation that, upon judging that the pending text sequence and the candidate synonym text currently being processed have a non-synonym feature, does not use that candidate synonym text as synonym text of the pending text sequence falls within the scope of the present invention.
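Operation A can be sketched as a filter over three of the listed features: different brands, resource-derived vocabulary, and instantiation. The brand lexicon, the derived-vocabulary list, the instantiation table and all function names below are hypothetical stand-ins for the predetermined data the text refers to.

```python
BRANDS = {"QQ", "360"}  # hypothetical brand lexicon
DERIVED_WORDS = {"walkthrough", "map", "modifier", "review"}  # hypothetical

def brand_of(text):
    """Return the first brand token found in the text, if any."""
    for token in text.split():
        if token in BRANDS:
            return token
    return None

def has_non_synonym_feature(pending, candidate, instance_of=None):
    """Check brand mismatch, derived vocabulary and instantiation."""
    instance_of = instance_of or {}
    b1, b2 = brand_of(pending), brand_of(candidate)
    if b1 and b2 and b1 != b2:
        return True  # resources belong to different brands
    if DERIVED_WORDS & set(candidate.lower().split()):
        return True  # candidate names something derived from the resource
    if instance_of.get(pending) == candidate or instance_of.get(candidate) == pending:
        return True  # one is an instantiation of the other
    return False

def filter_candidates(pending, candidates, instance_of=None):
    """Drop candidates that show a non-synonym feature."""
    return [c for c in candidates
            if not has_non_synonym_feature(pending, c, instance_of)]

kept = filter_candidates(
    "where is my water",
    ["crocodile loves bath", "where is my water walkthrough"],
)
print(kept)
```

The remaining features (resource fragments and cross-language translation checks) would need additional predetermined data and are left out of this sketch.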
Operation B: judge whether the candidate synonym text currently being processed has a corresponding network resource.
When it is judged that a corresponding network resource exists, the computer device selects the candidate synonym text currently being processed as synonym text of the pending text sequence.
Specifically, the ways in which the computer device judges whether the candidate synonym text currently being processed has a corresponding network resource include but are not limited to:
1) the computer device obtains a network resource judgment result for the candidate synonym text that was determined in advance, and uses it to judge whether the candidate synonym text currently being processed has a corresponding network resource.
For example, the network resource judgment result obtained by the computer device, namely whether the candidate synonym text "crocodile loves taking a bath" has a network resource on the network, was determined by the computer device itself or by another device before step S3 is performed; the computer device uses it to judge whether "crocodile loves taking a bath" has a corresponding network resource.
The way in which the network resource judgment result of a candidate synonym text is determined in advance is the same as or similar to the way, in implementation 2) below, in which the computer device judges in real time whether the candidate synonym text currently being processed has a corresponding network resource, and is not repeated here.
2) The computer equipment determines in real time, in step S3, whether a corresponding network resource exists for the candidate synonym text currently being processed.
Preferably, the ways in which the computer equipment makes this determination in real time include but are not limited to:
i) Based on the candidate synonym text currently being processed, the computer equipment performs a resource search on a predetermined network resource website and, according to whether a resource search result is obtained, determines whether a corresponding network resource for that candidate synonym text exists on the predetermined network resource website.
For example, the predetermined network resource website includes an Android website, and the candidate synonym text currently being processed includes "Crocodile Loves Bathing". The computer equipment searches the Android website for "Crocodile Loves Bathing" and, according to whether a resource search result is obtained, determines whether a corresponding network resource for "Crocodile Loves Bathing" exists on the Android website.
ii) Based on the candidate synonym text currently being processed, the computer equipment performs a web search and, according to whether text information matching a predetermined text template can be extracted from the retrieved web pages, determines whether a corresponding network resource exists for that candidate synonym text. The predetermined text template comprises the candidate synonym text currently being processed and a predetermined word whose character distance from it is less than a predetermined threshold. There may be one or more predetermined text templates.
For example, the predetermined text templates include "[XXX] download", "[XXX] mini game" and "[XXX] game download", where "XXX" denotes the candidate synonym text currently being processed, and the character distance between each of the predetermined words "download", "mini game" and "game download" and the candidate synonym text is at most 1 character. The computer equipment then performs a web search based on the candidate synonym text "Crocodile Loves Bathing" and, according to whether text information matching the predetermined text templates "[Crocodile Loves Bathing] download / mini game / game download" can be extracted from the retrieved web pages, determines whether a corresponding network resource exists for "Crocodile Loves Bathing".
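The template check in mode ii) can be sketched as a regular-expression test. This is only an illustrative sketch: the names `PREDETERMINED_WORDS`, `MAX_GAP`, `matches_template` and `has_network_resource` are hypothetical, the pages are assumed to be already retrieved as plain strings, and only the pattern "candidate text followed by a predetermined word" is covered.

```python
import re

# Hypothetical predetermined words, taken from the example templates above.
PREDETERMINED_WORDS = ["download", "mini game", "game download"]
# Assumed threshold: at most 1 character between candidate text and predetermined word.
MAX_GAP = 1


def matches_template(page_text: str, candidate: str) -> bool:
    """Return True if the candidate is followed, within MAX_GAP characters,
    by one of the predetermined words."""
    words = "|".join(re.escape(w) for w in PREDETERMINED_WORDS)
    pattern = re.compile(
        rf"{re.escape(candidate)}.{{0,{MAX_GAP}}}(?:{words})",
        re.IGNORECASE,
    )
    return pattern.search(page_text) is not None


def has_network_resource(pages, candidate: str) -> bool:
    """Judge that a resource exists if any retrieved page matches a template."""
    return any(matches_template(page, candidate) for page in pages)
```

A page containing "Crocodile Loves Bathing download" would match, whereas a page where the predetermined word appears several words after the candidate text would not.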
It should be noted that the computer equipment may both perform a resource search on the predetermined network resource website and perform a web search based on the candidate synonym text currently being processed, and then determine whether a corresponding network resource exists for that candidate synonym text according to both whether a resource search result is obtained and whether text information matching the predetermined text template can be extracted from the retrieved web pages.
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that determines whether a corresponding network resource exists for the candidate synonym text currently being processed falls within the scope of the present invention.
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that selects the synonym texts of the pending text sequence from the candidate synonym texts, for example by randomly selecting a predetermined number of synonym texts from the candidate synonym texts, falls within the scope of the present invention.
In the present embodiment, a pending text sequence and its candidate synonym sequences can be associated through search result items that were clicked by users in both of their respective search results, and several ways are further used to judge whether each candidate synonym sequence really is a synonym text of the pending text sequence. Synonyms of the pending text sequence that are difficult to recall in the prior art can thus be obtained, and the accuracy of synonym judgment for the pending text sequence can be considerably improved.
As one preferred scheme of the present embodiment, the pending text sequence includes an application name, and the method of the present embodiment further comprises the following step: for each synonym text of the pending text sequence, when it is judged that only one of the pending text sequence and the synonym text includes predetermined application additional feature information, the pending text sequence or the synonym text is updated according to the predetermined application additional feature information, so that the pending text sequence and the synonym text either both include or both exclude that information.
Here, the predetermined application additional feature information includes feature information that adds a qualification to the application name; for example, feature information indicating the application version, such as "1" and "2"; feature information indicating an effect, such as "3d"; feature information indicating that the application is free, such as "lite" and "free"; and feature information indicating the devices the application is suited to, such as "HD".
Preferably, the ways in which the computer equipment updates the pending text sequence or the synonym text according to the predetermined application additional feature information, so that both include or both exclude that information, include but are not limited to:
1) The computer equipment adds the application additional feature information to the one that does not include it;
2) The computer equipment deletes the predetermined application additional feature information from the one that includes it.
Moreover, for a given pending text sequence and/or all of its synonym texts, the computer equipment performs only one of the update modes 1) and 2) above, to ensure that the pending text sequence and each synonym text either all include or all exclude the predetermined application additional feature information.
For example, the computer equipment determines in step S3 that the synonym texts of the pending text sequence "Sea World dynamic desktop" include "3d Sea World dynamic desktop". The computer equipment then judges that only one of "Sea World dynamic desktop" and "3d Sea World dynamic desktop" includes predetermined application additional feature information, deletes the predetermined application additional feature information "3d" from the synonym text "3d Sea World dynamic desktop", and thereby updates the synonym text to "Sea World dynamic desktop".
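Update mode 2) above (deleting the additional feature from the side that carries it) might look like the following sketch. The token set `FEATURE_TOKENS` and the function names are assumptions for illustration, and features are assumed to appear as whitespace-separated tokens, which the source does not specify.

```python
# Hypothetical set of application additional feature tokens (lowercased).
FEATURE_TOKENS = {"1", "2", "3d", "lite", "free", "hd"}


def split_features(name: str):
    """Split a name into (feature tokens, remaining tokens)."""
    tokens = name.split()
    features = [t for t in tokens if t.lower() in FEATURE_TOKENS]
    rest = [t for t in tokens if t.lower() not in FEATURE_TOKENS]
    return features, rest


def normalize_pair(pending: str, synonym: str):
    """If exactly one of the two names carries feature tokens, delete them
    from that name so that both names exclude the feature information."""
    p_feats, p_rest = split_features(pending)
    s_feats, s_rest = split_features(synonym)
    if p_feats and not s_feats:
        pending = " ".join(p_rest)
    elif s_feats and not p_feats:
        synonym = " ".join(s_rest)
    return pending, synonym
```

Calling `normalize_pair("Sea World dynamic desktop", "3d Sea World dynamic desktop")` leaves both names as "Sea World dynamic desktop"; pairs in which both or neither name carries a feature are returned unchanged.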
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation in which, for each synonym text of the pending text sequence, when only one of the pending text sequence and the synonym text is judged to include predetermined application additional feature information, the pending text sequence or the synonym text is updated according to that information so that both include or both exclude it falls within the scope of the present invention.
This preferred scheme normalizes the pending text sequence and its synonym texts, keeping the two consistent.
As one preferred scheme of the present embodiment, the method of the present embodiment further comprises the following step: the computer equipment receives a text sequence that a user device requests to search for, searches based on the text sequence and its synonym texts, and provides the search results to the user device.
Specifically, the computer equipment receives the text sequence that the user device requests to search for, searches separately based on the text sequence and each of its synonym texts, merges the search result items obtained for the text sequence and for its synonym texts, and provides the merged result to the user device.
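A minimal sketch of this search-and-merge step follows. The `search` callable stands in for whatever search backend is used and is purely hypothetical; the merge policy (keep first occurrence, drop duplicates) is also an assumption, since the source only says the results are merged.

```python
def search_with_synonyms(query, synonyms, search):
    """Search for the query and each synonym text separately, then merge
    the result items, keeping the first occurrence of each item."""
    merged, seen = [], set()
    for text in [query, *synonyms]:
        for item in search(text):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

Other merge policies, such as interleaving or re-ranking the combined items, would fit the description equally well.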
In the present embodiment, by searching based on both the requested text sequence and its synonym texts, search result items can be obtained that are difficult to obtain by searching on the text sequence alone and that may in fact be what the user needs.
Fig. 2 is a flowchart of a method for establishing or updating a candidate synonym sequence library according to a preferred embodiment of the present invention. The method of the present embodiment comprises step S4 and step S5.
In step S4, the computer equipment matches the first search results of the pending text sequence against the second search results of its sequence to be mined.
Here, the first search results and the second search results may be the results of real-time searches based on the pending text sequence and the sequence to be mined respectively, or may be the historical search results of the pending text sequence and the sequence to be mined respectively.
The computer equipment may match the first search results against the second search results in various ways.
For example, the computer equipment first obtains the first search results and the second search results, and then compares the two.
As another example, when the first and second search results are historical search results, the computer equipment queries the history of the other search results in which each search result item of the first search results has appeared, to determine whether the first search results contain a search result item that also appears in the second search results, and thereby determines the matching result of the first and second search results.
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that matches the first search results of the pending text sequence against the second search results of its sequence to be mined falls within the scope of the present invention.
In step S5, when the result of the matching meets a first predetermined condition, the computer equipment establishes or updates the candidate synonym sequence library of the pending text sequence according to the sequence to be mined. The first predetermined condition includes that the first search results and the second search results contain at least one identical search result item.
For example, when the computer equipment finds in step S4 that the first and second search results both contain search result item C, it either adds the sequence to be mined directly to the candidate synonym sequence library as a candidate synonym sequence of the pending text sequence, or adds it to the library after adjusting it, for instance by removing meaningless information.
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that, when the result of the matching meets the first predetermined condition, establishes or updates the candidate synonym sequence library of the pending text sequence according to the sequence to be mined falls within the scope of the present invention.
As one preferred scheme of the present embodiment, the first predetermined condition includes that the number of identical search result items contained in the first and second search results exceeds a first predetermined threshold. For example, the first and second search results contain more than 30 identical search result items.
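Steps S4 and S5 with the threshold variant can be sketched as a set intersection. The function name `update_library`, the dictionary-of-sets library layout, and the representation of result items as hashable identifiers are assumptions made for illustration only.

```python
def update_library(library, pending, to_mine, first_results, second_results,
                   threshold=0):
    """Add `to_mine` as a candidate synonym sequence of `pending` when the
    number of shared search result items exceeds `threshold`.

    threshold=0 corresponds to "at least one identical item"; a larger value
    corresponds to the first predetermined threshold (e.g. 30)."""
    shared = set(first_results) & set(second_results)
    if len(shared) > threshold:
        library.setdefault(pending, set()).add(to_mine)
    return library
```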
If a search result item appears in the search results of two text sequences, it can be considered that, although the users entered different text sequences, the objects they wished to search for are the same or similar. The present invention mines candidate synonym sequences of a text sequence on this basis, and can obtain candidate synonym sequences that schemes based on the prior art find difficult to recall.
As another preferred scheme of the present embodiment, the first and second search results are the historical search results of the pending text sequence and of the sequence to be mined respectively, and the aforementioned step S4 comprises step S4'.
In step S4', the computer equipment matches the first search results against the second search results according to the user click information of the historical search results of the pending text sequence and of the sequence to be mined.
In this scheme, the first predetermined condition includes that the historical search results of the pending text sequence and of the sequence to be mined contain at least one search result item that is identical and was clicked by users in both.
For example, the historical search results obtained for the pending text sequence query1 contain search result items C1, C2 and C3, of which C1 and C2 were clicked by users when presented in the historical search results of query1. The historical search results obtained for another text sequence, query2, contain search result items C1 and C2, of which C1 was clicked by users when the search results of query2 were presented. The historical search results obtained for a further text sequence, query3, contain search result items C1 and C3, of which C3 was clicked by users when the search results of query3 were presented. The historical search results of query1 (the first search results) and those of query2 (the second search results) therefore share search result item C1, which is identical and was clicked by users in both, so text sequence query2 is a candidate synonym sequence of the pending text sequence query1. Text sequence query3, by contrast, is not a candidate synonym sequence of query1.
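The query1/query2/query3 example can be reproduced with a small sketch. The `history` structure (a mapping from each query to its result items and a flag indicating whether each was clicked) is a hypothetical representation chosen for illustration.

```python
# Hypothetical click history: query -> {result item: clicked by users?}
history = {
    "query1": {"C1": True, "C2": True, "C3": False},
    "query2": {"C1": True, "C2": False},
    "query3": {"C1": False, "C3": True},
}


def co_clicked_items(history, query_a, query_b):
    """Return the result items that appear in both histories and were
    clicked by users in both."""
    clicked_a = {item for item, clicked in history[query_a].items() if clicked}
    clicked_b = {item for item, clicked in history[query_b].items() if clicked}
    return clicked_a & clicked_b


def is_candidate_synonym(history, pending, to_mine):
    """First predetermined condition of step S4': at least one co-clicked item."""
    return len(co_clicked_items(history, pending, to_mine)) >= 1
```

Here query2 qualifies through C1, while query3 does not: C1 was not clicked under query3, and C3 was not clicked under query1.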
Preferably, in the above preferred scheme, the first predetermined condition further includes that the click information of the search result items clicked by users in both meets a second predetermined condition.
Here, the second predetermined condition includes conditions that the click information should meet for a sequence to be mined to be confirmed as a candidate synonym sequence, such as the click-through rate exceeding a predetermined threshold or the click pattern conforming to a predetermined rule.
For example, the second predetermined condition includes that the value obtained by adding up the numbers of clicks, in the first and second search results, of each search result item clicked by users in both exceeds a predetermined threshold.
It should be noted that, preferably, the first predetermined condition may combine the conditions of the above preferred schemes. For example, the first predetermined condition may include that the number of identical search result items clicked by users in both the first and second search results exceeds the first predetermined threshold. As another example, the first predetermined condition may include that the number of such items exceeds the first predetermined threshold and that the click information of these items meets the second predetermined condition.
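The combined condition might be checked as in the sketch below. It assumes that click information is available as per-item click counts for each query, and the parameter names `min_shared` and `min_total_clicks` are hypothetical stand-ins for the first predetermined threshold and the threshold of the second predetermined condition.

```python
def meets_conditions(clicks_a, clicks_b, min_shared, min_total_clicks):
    """clicks_a / clicks_b map result items to click counts for two queries.

    Returns True when the number of co-clicked items exceeds `min_shared`
    and their summed click counts exceed `min_total_clicks`."""
    shared = [item for item, n in clicks_a.items()
              if n > 0 and clicks_b.get(item, 0) > 0]
    total_clicks = sum(clicks_a[item] + clicks_b[item] for item in shared)
    return len(shared) > min_shared and total_clicks > min_total_clicks
```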
As another preferred scheme of the present embodiment, the method of the present embodiment further comprises the following step:
After the matching is finished, the computer equipment selects another sequence as the sequence to be mined for the pending text sequence and repeats steps S1 and S2 until a predetermined stop condition is satisfied, for example an operator stops the repetition, or the number of candidate synonym texts contained in the candidate synonym text library of the pending text sequence reaches 1000.
Preferably, when the number of candidate synonym sequences determined in the above manner exceeds N, the computer equipment may rank the sequences to be mined by the degree of matching between the first and second search results and select the top N as candidate synonym sequences of the pending text sequence, where N is a predetermined sequence quantity threshold.
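The top-N selection can be sketched as a sort over matching scores. How the degree of matching is computed is not fixed by the text; the sketch simply assumes a precomputed numeric score per sequence to be mined (for instance the number of shared result items), and the function name is hypothetical.

```python
def top_n_candidates(match_scores, n):
    """match_scores maps each sequence to be mined to its matching degree
    with the pending text sequence; return the n best-matching sequences."""
    ranked = sorted(match_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [sequence for sequence, _ in ranked[:n]]
```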
If a search result item appears in the search results of two text sequences and was clicked by users in both, it can be considered that, although the users entered different text sequences, the objects they wished to search for are the same or similar. This preferred scheme mines candidate synonym sequences of a text sequence on this basis, and can obtain candidate synonym sequences that schemes based on the prior art find difficult to recall. Further, the higher the number of clicks, click frequency and so on of the items clicked by users in both search results, and the greater the number of such items, the more likely it is that users regard the two search results as pointing to the same search object. Accordingly, this preferred scheme can further screen the candidate synonym sequences based on the click information of the search result items clicked by users in both.
Fig. 3 is a schematic structural diagram of a determining device for determining synonym texts according to a preferred embodiment of the present invention. The determining device of the present embodiment comprises a word segmentation device 1, an inquiry unit 2 and a first selection device 3, and is contained in the computer equipment.
The word segmentation device 1 performs word segmentation on the pending text sequence to obtain at least one text fragment.
Here, the pending text sequence includes any text sequence whose synonym texts need to be determined. Preferably, the pending text sequence includes a network resource name, which includes the name of any resource obtainable on the network, such as an application name or an audio or video title. More preferably, the pending text sequence includes an application name.
The ways in which the word segmentation device 1 obtains the pending text sequence include but are not limited to:
1) The word segmentation device 1 obtains a pre-stored pending text sequence, such as a text sequence pre-stored in the computer equipment or in another device;
2) The word segmentation device 1 obtains a search sequence from a user in real time as the pending text sequence.
The word segmentation device 1 may segment the pending text sequence in various ways to obtain its at least one text fragment.
For example, the word segmentation device 1 segments the pending text sequence "Little Naughty Loves Showering" according to a dictionary, obtaining its three text fragments "Little Naughty", "Loves" and "Showering".
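One common form of dictionary-based segmentation is forward maximum matching, sketched below. The source does not say which algorithm the word segmentation device uses, so this is only one possible reading; the sketch also works on whitespace-separated English words rather than on unsegmented characters, and the dictionary contents are hypothetical.

```python
def segment(text, dictionary, max_words=3):
    """Greedy forward maximum matching over whitespace-separated words:
    at each position take the longest phrase found in the dictionary,
    falling back to a single word."""
    words = text.split()
    fragments, i = [], 0
    while i < len(words):
        for size in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if size == 1 or phrase in dictionary:
                fragments.append(phrase)
                i += size
                break
    return fragments


# Hypothetical dictionary entries for the example above.
dictionary = {"Little Naughty", "Loves", "Showering"}
fragments = segment("Little Naughty Loves Showering", dictionary)
```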
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs word segmentation on the pending text sequence to obtain at least one text fragment falls within the scope of the present invention.
Then, inquiry unit 2 is according at least one text fragments of cutting the word gained, in candidate's synonym sequence library of pending text sequence, inquire about, acquisition comprises the one or more candidate's synonym sequence in described at least one text fragments or its synonym, as candidate's synonym text of pending text sequence.
Wherein, meet the first predetermined condition based on the first Search Results of text sequence gained and matching result based on the second Search Results of candidate's synonym sequence gained, this first predetermined condition comprises that described the first Search Results comprises at least one identical search result items with described the second Search Results.Preferably, this first predetermined condition also can comprise other conditions, and will be described in detail with reference to the embodiment shown in FIG. 4, does not repeat them here.。
Wherein, described search result items can comprise any search result information, for example, and Search Results link, Search Results summary etc.
Wherein, candidate's synonym sequence library of pending text sequence can be determined before inquiry unit 2 executable operations in advance; Should determine in advance that the mode in candidate's synonym text sequence storehouse will be described in detail with reference to the embodiment shown in FIG. 4, not repeat them here.
Wherein, inquiry unit 2 can adopt various ways to determine the synonym of a text fragments; For example, determine one or more synonyms of a text fragments by inquiring about predetermined synonymicon; Again for example, by inquiring about predetermined synonym word dictionary, and determine one or more synonyms etc. of a text fragments in conjunction with semantic analysis.
Particularly, inquiry unit 2 is according to cutting at least one text fragments that word device 1 is cut the word gained, in candidate's synonym sequence library of pending text sequence, inquire about, acquisition comprises the one or more candidate's synonym sequence in described at least one text fragments or its synonym, includes but not limited to as the mode of candidate's synonym text of pending text sequence:
1) when inquiry unit 2 inquiry and when determining that candidate's synonym sequence comprises the synonym of one or more text fragments at least one text fragments of cutting the word gained or this at least one text fragments, determines that this candidate's synonym sequence is candidate's synonym text of pending text sequence.
For example, the text fragments of the pending text sequence "Crocodile Loves Showering" include "Crocodile", "Loves" and "Showering", and the candidate synonym sequences include "Little Naughty Loves Bathing", "Crocodile Loves Bathing", "Little Naughty Bathing", "Washed" and "How to Wash". The inquiry unit 2 queries the candidate synonym sequence library of the text sequence "Crocodile Loves Showering" and determines that "Little Naughty Loves Bathing" includes the text fragment "Loves" and "Bathing", a synonym of "Showering"; that the candidate synonym sequence "Crocodile Loves Bathing" includes the text fragments "Crocodile" and "Loves" and "Bathing", a synonym of "Showering"; and that the candidate synonym sequence "Little Naughty Bathing" includes "Bathing", a synonym of the text fragment "Showering". The inquiry unit 2 therefore takes the candidate synonym sequences "Little Naughty Loves Bathing", "Crocodile Loves Bathing" and "Little Naughty Bathing" as candidate synonym texts of the pending text sequence "Crocodile Loves Showering".
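Query mode 1) can be sketched as a containment test against the fragments and their synonyms. The synonym dictionary `SYNONYMS`, the function name and the use of case-insensitive substring matching are assumptions for illustration; the source does not prescribe how containment is tested.

```python
# Hypothetical synonym dictionary: fragment -> set of synonyms.
SYNONYMS = {"showering": {"bathing"}}


def query_candidates(fragments, library, synonyms=SYNONYMS):
    """Return the candidate synonym sequences that contain at least one
    text fragment or one synonym of a text fragment."""
    terms = {f.lower() for f in fragments}
    for fragment in list(terms):
        terms |= synonyms.get(fragment, set())
    return [seq for seq in library if any(t in seq.lower() for t in terms)]


library = ["Little Naughty Loves Bathing", "Crocodile Loves Bathing",
           "Little Naughty Bathing", "Washed", "How to Wash"]
candidates = query_candidates(["Crocodile", "Loves", "Showering"], library)
```

With this data, the first three sequences are kept and "Washed" and "How to Wash" are dropped, in line with the example above.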
2) The inquiry unit 2 further comprises a first obtaining device (not shown), a subquery device (not shown), a first determining device (not shown) and a second determining device (not shown). The first obtaining device obtains the synonyms of the at least one text fragment obtained by word segmentation. The subquery device queries the candidate synonym sequence library of the text sequence to obtain the candidate synonym sequences that include those synonyms. When a candidate synonym sequence obtained by the query includes only the synonym, the first determining device takes that candidate synonym sequence directly as a candidate synonym text. When a candidate synonym sequence obtained by the query includes the synonym and other text information, the second determining device takes the candidate synonym sequence as a candidate synonym text if the other text information it includes is partly identical to the pending text sequence.
For example, the text fragments of the pending text sequence "Crocodile Loves Showering" include "Crocodile", "Loves" and "Showering", and the candidate synonym sequence library includes "Little Naughty Loves Bathing", "Crocodile Loves Bathing", "Little Naughty Bathing", "Washed" and "How to Wash".
The subquery device finds that the candidate synonym sequence "Little Naughty Loves Bathing", in the candidate synonym sequence library of the pending text sequence "Crocodile Loves Showering", includes "Bathing", the synonym of the text fragment "Showering" obtained by the first obtaining device, and judges that the other text information "Little Naughty Loves" in that candidate synonym sequence and the pending text sequence "Crocodile Loves Showering" share the partly identical text information "Loves". The second determining device then determines that the candidate synonym sequence "Little Naughty Loves Bathing" is a candidate synonym text of the pending text sequence "Crocodile Loves Showering".
Similarly, the first obtaining device, the subquery device, the first determining device and the second determining device in the inquiry unit 2 continue to perform the corresponding operations, and determine that the candidate synonym sequences "Little Naughty Loves Bathing", "Crocodile Loves Bathing" and "Little Naughty Bathing" are candidate synonym texts of the pending text sequence "Crocodile Loves Showering".
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation falls within the scope of the present invention in which the synonyms of the at least one text fragment obtained by word segmentation are obtained; the candidate synonym sequence library of the text sequence is queried to obtain the candidate synonym sequences that include those synonyms; a candidate synonym sequence obtained by the query that includes only the synonym is taken directly as a candidate synonym text; and a candidate synonym sequence obtained by the query that includes the synonym and other text information is taken as a candidate synonym text when that other text information is partly identical to the pending text sequence. For example, when a candidate synonym sequence obtained by the subquery device includes a synonym and other text information, the second determining device takes this candidate synonym sequence as a candidate synonym text only when the other text information it includes is wholly or partly identical to the text fragments of the pending text sequence other than the text fragment corresponding to that synonym.
It should be further noted that those skilled in the art will understand that, within the ways described for the inquiry unit 2, the inquiry unit 2 may select candidate synonym texts from the candidate synonym sequence library in various ways. For example, the inquiry unit 2 may first query and determine all candidate text sequences that include a synonym of a text fragment, and then select the candidate synonym texts from among them; alternatively, the inquiry unit 2 may judge one by one whether each candidate text sequence is a candidate synonym text.
Then, the first selection device 3 selects the synonym texts of the pending text sequence from the candidate synonym texts.
Specifically, the ways in which the first selection device 3 selects the synonym texts of the text sequence from the candidate synonym texts include but are not limited to:
1) The first selection device 3 selects the synonym texts from the candidate synonym texts according to the degree of association between each candidate synonym text and the pending text sequence. The higher the degree of association between a candidate synonym text and the pending text sequence, the more likely the first selection device 3 is to select that candidate synonym text as a synonym text.
The degree of association may be determined from many factors. For example, it may be determined from the click information of the search result items clicked by users in both the search results of the candidate synonym text and the first search results of the pending text sequence, where the click information of a search result item includes but is not limited to its click-through rate, number of clicks, click time and click frequency. Preferably, the higher the click-through rate, number of clicks, click frequency and so on, the higher the degree of association. Preferably, the degree of association may also be determined from a predetermined closeness between a text fragment of the pending text sequence and its synonym included in the candidate synonym text, from the proportion of the candidate synonym text taken up by the synonym it includes, and so on.
2) The pending text sequence includes a network resource name, and the first selection device 3 includes a second sub-selection device (not shown). The second sub-selection device selects the synonym texts of the text sequence from the candidate synonym texts by performing at least one of the following operations A and B on each of all or some of the candidate synonym texts. The second sub-selection device may perform operation A and/or B on every candidate synonym text. Alternatively, it may process the candidate synonym texts one by one, in descending order of the degree of association between the candidate synonym text and the pending text sequence, or in descending order of a weight determined from parameters such as the degree of association and a predetermined importance, until a predetermined number (such as 30) of synonym texts has been obtained or operation A and/or B has been performed on all the candidate synonym texts.
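The ordered, early-stopping variant of this selection can be sketched as follows. The three callables are hypothetical placeholders for the degree of association, operation A and operation B; the sketch applies both operations to every candidate, although the text allows applying only one of them.

```python
def select_synonyms(candidates, relevance, has_non_synonym_feature,
                    has_network_resource, limit=30):
    """Visit candidates in descending order of relevance; keep those that
    pass operation A (no non-synonym feature) and operation B (a network
    resource exists); stop once `limit` synonym texts are collected."""
    selected = []
    for candidate in sorted(candidates, key=relevance, reverse=True):
        if has_non_synonym_feature(candidate):
            continue  # rejected by operation A
        if not has_network_resource(candidate):
            continue  # rejected by operation B
        selected.append(candidate)
        if len(selected) >= limit:
            break
    return selected
```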
Operations A and B are described below:
Operation A: judge whether the pending text sequence and the candidate synonym text currently being processed have a non-synonym feature.
The non-synonym feature comprises any characteristic information showing that the pending text sequence and the candidate synonym text are not synonymous. Preferably, the non-synonym feature includes, but is not limited to, at least one of the following:
1) The network resource corresponding to the pending text sequence and the network resource corresponding to the candidate synonym text belong to different brands.
For example, applications belonging to different brands, such as QQ Mobile Assistant, which belongs to QQ, and 360 Mobile Assistant, which belongs to 360.
As another example, film and television works belonging to different brands.
Preferably, the second sub-selecting device can determine whether the two network resources belong to different brands either by identifying text that carries a brand feature, such as QQ or 360, in the pending text sequence and the candidate synonym text, or by obtaining brand information for the pending text sequence and the candidate synonym text that was determined in advance by the computer equipment or by other equipment.
2) The candidate synonym text contains a predetermined resource-derivative word, that is, a word that is related to a network resource but does not denote the network resource itself.
For example, walkthroughs, maps, and modifiers, which relate to a game application but are not the game itself; or film reviews, which relate to a film or television work but are not the work itself.
3) The candidate synonym text contains a predetermined resource fragment feature, that is, a feature that describes a specific part of a resource rather than the resource as a whole.
For example, the name of a specific scene in a game, or the name of a clip of a film or television work.
4) One of the pending text sequence and the candidate synonym text is an instance of the other.
For example, a specific application name is an instance of a generic application name: "AnTuTu benchmark software" is an instance of "benchmark software".
Preferably, the second sub-selecting device can determine that one of the pending text sequence and the candidate synonym text is an instance of the other by identifying whether the category of one is a subcategory of the category of the other, by identifying whether one is a predetermined instance of the other, or by obtaining instance information for the pending text sequence and the candidate synonym text that was determined in advance by the computer equipment or by other equipment.
5) The pending text sequence and the candidate synonym text contain text in at least two languages, and the result of translating text from one language into the other has no synonym in the text of that other language. That is, after all or part of the text of one of the pending text sequence and the candidate synonym text is translated from one language into another, no corresponding synonym exists in the other text.
For example, the pending text sequence "sd card cleaning tool" and the candidate synonym text "disk cleanup tool" together contain text in two languages, English and Chinese, and the Chinese translation "secure digital card" of the English fragment "sd card" in the pending text sequence has no corresponding synonym in the candidate synonym text "disk cleanup tool".
It should be noted that the above examples of non-synonym features are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any characteristic information showing that the pending text sequence and the candidate synonym text are not synonymous falls within the scope of the present invention.
Specifically, when it judges that the pending text sequence and the candidate synonym text currently being processed have a non-synonym feature, the second sub-selecting device does not take that candidate synonym text as a synonym text of the pending text sequence.
For example, the second sub-selecting device does not take "360 Mobile Assistant", which belongs to a different brand from "QQ Mobile Assistant", as a synonym text of "QQ Mobile Assistant". As another example, it does not take the candidate synonym text "AnTuTu benchmark software", which is an instance of the text sequence "benchmark software", as a synonym text of that sequence.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that, on judging that the pending text sequence and the candidate synonym text currently being processed have a non-synonym feature, does not take that candidate synonym text as a synonym text of the pending text sequence falls within the scope of the present invention.
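Operation A can be illustrated with a minimal sketch that covers only non-synonym features 1) and 2). The word lists, the whitespace tokenisation, and all function names are assumptions made for illustration; the description does not specify how brand words or resource-derivative words are stored or detected.

```python
# Hypothetical lookup tables; their contents are not given in the description.
BRAND_WORDS = {"qq", "360"}
DERIVATIVE_WORDS = {"walkthrough", "map", "modifier", "review"}


def brands_in(text):
    """Return the set of known brand words that occur in the text."""
    return {w for w in text.lower().split() if w in BRAND_WORDS}


def has_non_synonym_feature(pending, candidate):
    """Return True if the pair shows feature 1) or feature 2)."""
    pending_brands = brands_in(pending)
    candidate_brands = brands_in(candidate)
    # Feature 1): both texts name a brand, and the brands differ.
    if pending_brands and candidate_brands and pending_brands != candidate_brands:
        return True
    # Feature 2): the candidate contains a resource-derivative word.
    if any(w in DERIVATIVE_WORDS for w in candidate.lower().split()):
        return True
    return False


def filter_candidates(pending, candidates):
    """Operation A: drop candidates that show a non-synonym feature."""
    return [c for c in candidates if not has_non_synonym_feature(pending, c)]
```

Whitespace tokenisation is used here only to keep the sketch short; Chinese text would first need word segmentation.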
Operation B: judge whether a corresponding network resource exists for the candidate synonym text currently being processed.
When it judges that a corresponding network resource exists, the second sub-selecting device selects the candidate synonym text currently being processed as a synonym text of the pending text sequence.
Specifically, the ways in which the second sub-selecting device judges whether a corresponding network resource exists for the candidate synonym text currently being processed include, but are not limited to:
1) The second sub-selecting device obtains a predetermined network resource judgment result for the candidate synonym text and uses it to judge whether a corresponding network resource exists for the candidate synonym text currently being processed.
For example, the network resource judgment result obtained by the second sub-selecting device was determined in advance, by the device itself or by other equipment, before the first selecting device 3 performs its operation, and indicates whether a network resource exists on the network for the candidate synonym text "Crocodile Loves to Bathe". The second sub-selecting device uses this result to judge whether a corresponding network resource exists for "Crocodile Loves to Bathe".
The manner of determining the network resource judgment result in advance is the same as or similar to the manner, described in implementation 2) below, in which the second sub-selecting device judges in real time whether a corresponding network resource exists for the candidate synonym text currently being processed, and is not repeated here.
2) The second sub-selecting device judges in real time whether a corresponding network resource exists for the candidate synonym text currently being processed.
Preferably, the ways in which the second sub-selecting device makes this judgment in real time include, but are not limited to:
i) Based on the candidate synonym text currently being processed, the second sub-selecting device performs a resource search in a predetermined network resource website and, according to whether a resource search result is obtained, judges whether a corresponding network resource exists for that candidate synonym text in the predetermined network resource website.
For example, the predetermined network resource website comprises an Android website, and the candidate synonym text currently being processed is "Crocodile Loves to Bathe". The second sub-selecting device searches the Android website for "Crocodile Loves to Bathe" and, according to whether a resource search result is obtained, judges whether a corresponding network resource exists for "Crocodile Loves to Bathe" in the Android website.
ii) Based on the candidate synonym text currently being processed, the second sub-selecting device performs a web search and, according to whether text matching a predetermined text template can be extracted from the web pages returned by the search, judges whether a corresponding network resource exists for that candidate synonym text. The predetermined text template comprises the candidate synonym text currently being processed and a predetermined word whose character distance from it is less than a predetermined threshold. There can be one or more predetermined text templates.
For example, the predetermined text templates comprise "[XXX] download", "[XXX] mini game", and "[XXX] game download", where "XXX" denotes the candidate synonym text currently being processed, and the character distance between each of the predetermined words "download", "mini game", and "game download" and the candidate synonym text is at most 1 character. The second sub-selecting device then performs a web search based on the candidate synonym text "Crocodile Loves to Bathe" and, according to whether text matching the predetermined text template "[Crocodile Loves to Bathe] download / mini game / game download" can be extracted from the web pages returned by the search, judges whether a corresponding network resource exists for "Crocodile Loves to Bathe".
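The template check in approach ii) can be sketched with a regular expression. This is only one possible reading: the word list, the gap limit, the function names, and the idea of passing page text in as plain strings are assumptions, and fetching the web pages is left out entirely.

```python
import re

# Predetermined words and maximum character gap, taken from the example above.
PREDETERMINED_WORDS = ["download", "mini game", "game download"]
MAX_GAP = 1


def build_template(candidate):
    """Compile: the candidate text, at most MAX_GAP characters, then a predetermined word."""
    words = "|".join(re.escape(w) for w in PREDETERMINED_WORDS)
    pattern = re.escape(candidate) + ".{0," + str(MAX_GAP) + "}(?:" + words + ")"
    return re.compile(pattern, re.IGNORECASE)


def resource_exists(candidate, pages):
    """Return True if any page text contains a match for the template."""
    template = build_template(candidate)
    return any(template.search(page) for page in pages)
```

A call such as `resource_exists("Crocodile Loves to Bathe", page_texts)` would then stand in for the judgment made in operation B, with `page_texts` being the text of the returned pages.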
It should be noted that the second sub-selecting device can also combine the two approaches: based on the candidate synonym text currently being processed, it performs a resource search in the predetermined network resource website and also performs a web search, and it judges whether a corresponding network resource exists for that candidate synonym text according to both whether a resource search result is obtained and whether text matching the predetermined text template can be extracted from the web pages returned by the search.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that judges whether a corresponding network resource exists for the candidate synonym text currently being processed falls within the scope of the present invention.
It should likewise be noted that any implementation that selects the synonym text of the pending text sequence from the candidate synonym texts, for example by randomly selecting a predetermined number of synonym texts from the candidate synonym texts, falls within the scope of the present invention.
In the present embodiment, the pending text sequence is associated with its candidate synonym sequences through the search result items that users clicked in both of their respective search results, and several further checks determine whether each candidate synonym sequence really is a synonym text of the pending text sequence. Synonyms of the pending text sequence that are difficult to recall in the prior art can thereby be obtained, and the accuracy of synonym judgment for the pending text sequence can be improved considerably.
As one preferred solution of the present embodiment, the pending text sequence comprises an application name, and the determining device of the present embodiment further comprises a text updating device (not shown). For each synonym text of the pending text sequence, when it is judged that only one of the pending text sequence and the synonym text contains predetermined application additional feature information, the text updating device updates the pending text sequence or the synonym text according to the predetermined application additional feature information, so that both contain, or both do not contain, the application additional feature information.
The predetermined application additional feature information comprises characteristic information that adds a qualification to an application name, for example: characteristic information indicating the application version, such as 1 or 2; characteristic information indicating an effect, such as 3d; characteristic information indicating that the application is free, such as lite or free; and characteristic information indicating the device for which the application is suited, such as HD.
Preferably, the ways in which the text updating device updates the pending text sequence or the synonym text according to the predetermined application additional feature information, so that both contain or both do not contain the application additional feature information, include, but are not limited to:
1) The text updating device adds the application additional feature information to whichever of the two does not contain it;
2) The text updating device deletes the predetermined application additional feature information from whichever of the two contains it.
Moreover, for a pending text sequence and/or all of its synonym texts, the text updating device performs only one of update modes 1) and 2) above, to ensure that the pending text sequence and its synonym texts all contain, or all do not contain, the predetermined application additional feature information.
For example, the first selecting device 3 determines that the synonym texts of the pending text sequence "Sea World dynamic desktop" include "3d Sea World dynamic desktop". The text updating device judges that only one of "Sea World dynamic desktop" and "3d Sea World dynamic desktop" contains predetermined application additional feature information, deletes the predetermined application additional feature information "3d" from the synonym text "3d Sea World dynamic desktop", and thereby updates the synonym text to "Sea World dynamic desktop".
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that, for each synonym text of the pending text sequence, on judging that only one of the pending text sequence and the synonym text contains predetermined application additional feature information, updates the pending text sequence or the synonym text according to that information so that both contain or both do not contain it falls within the scope of the present invention.
In this preferred solution, the pending text sequence and its synonym texts can be tidied up so that they remain consistent with each other.
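Update mode 2) can be sketched as follows. The feature list is taken from the examples in the description, but the whitespace tokenisation, the function names, and the choice to strip the feature from every text as soon as any pair disagrees are assumptions made for this illustration.

```python
# Example feature words from the description; a real list would be curated.
ADDITIONAL_FEATURES = {"1", "2", "3d", "lite", "free", "hd"}


def has_feature(text):
    """Return True if any token of the text is an additional-feature word."""
    return any(t.lower() in ADDITIONAL_FEATURES for t in text.split())


def strip_features(text):
    """Remove additional-feature tokens from a whitespace-tokenised name."""
    return " ".join(t for t in text.split() if t.lower() not in ADDITIONAL_FEATURES)


def unify_by_deletion(pending, synonyms):
    """Apply update mode 2) so that no text keeps a feature the others lack."""
    pending_has = has_feature(pending)
    if all(has_feature(s) == pending_has for s in synonyms):
        # Already consistent: every text has a feature, or none does.
        return pending, list(synonyms)
    return strip_features(pending), [strip_features(s) for s in synonyms]
```

Applying deletion to all texts at once is one way to respect the constraint that a single update mode is used for a pending text sequence and all of its synonym texts.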
As one preferred solution of the present embodiment, the determining device of the present embodiment further comprises a receiving device (not shown) and a providing device (not shown). The receiving device receives the text sequence that a user device requests to search; the providing device searches based on the text sequence and its synonym texts and provides the search results to the user device.
Specifically, the receiving device receives the text sequence that the user device requests to search; the providing device searches separately on the text sequence and on each of its synonym texts, merges the search result items obtained from the text sequence and from its synonym texts, and provides the merged results to the user device.
In the present embodiment, searching on both the requested text sequence and its synonym texts can retrieve search result items that are difficult to obtain by searching on the text sequence alone but that the user may actually need.
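The search-and-merge step can be sketched as below. The `search` callable is a hypothetical stand-in for whatever search backend is used, and de-duplicating items while keeping first-seen order is an assumption; the description says only that the result items are merged.

```python
def search_with_synonyms(query, synonyms, search):
    """Search on the query and each synonym text, then merge the result items.

    `search` is a hypothetical callable that maps a text to a list of
    result identifiers. Duplicate items are kept once, in first-seen order.
    """
    merged = []
    seen = set()
    for text in [query] + list(synonyms):
        for item in search(text):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

For a quick trial, `search` can be a dictionary lookup such as `lambda q: index.get(q, [])` over a small in-memory `index`.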
Fig. 4 is a structural diagram of an updating device for establishing or updating a candidate synonym sequence library according to a preferred embodiment of the present invention. The determining device of the present embodiment comprises a matching device 4 and a library updating device 5.
The matching device 4 matches the first search results of a pending text sequence against the second search results of a sequence to be mined for it.
The first search results and the second search results can be the results of real-time searches based on the pending text sequence and the sequence to be mined, respectively, or they can be the historical search results of the pending text sequence and the sequence to be mined, respectively.
The matching device 4 can match the first search results and the second search results in various ways.
For example, the matching device 4 first obtains the first search results and the second search results and then compares the two.
As another example, when the first search results and the second search results are historical search results, the matching device 4 queries the historical record of the other search results in which each search result item of the first search results has appeared, to determine whether the first search results contain a search result item that also appears in the second search results, and thereby determines the matching result of the first search results and the second search results.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that matches the first search results of a pending text sequence against the second search results of a sequence to be mined for it falls within the scope of the present invention.
When the result of the matching meets a first predetermined condition, the library updating device 5 establishes or updates the candidate synonym sequence library of the pending text sequence according to the sequence to be mined. The first predetermined condition comprises that the first search results and the second search results contain at least one identical search result item.
For example, when the matching device 4 finds that the first search results and the second search results both contain search result item C, the library updating device 5 either adds the sequence to be mined directly to the candidate synonym sequence library of the pending text sequence as a candidate synonym sequence, or adds it to the library after adjusting it, for example by removing meaningless information.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that, when the result of the matching meets the first predetermined condition, establishes or updates the candidate synonym sequence library of the pending text sequence according to the sequence to be mined falls within the scope of the present invention.
As one preferred solution of the present embodiment, the first predetermined condition comprises that the number of identical search result items contained in the first search results and the second search results exceeds a first predetermined threshold. For example, the first search results and the second search results contain more than 30 identical search result items.
If a search result item appears in the search results of two text sequences, it can be assumed that, although the users entered different text sequences, the objects they wished to search for are the same or similar. The present invention mines the candidate synonym sequences of a text sequence on this basis and can obtain candidate synonym sequences that are difficult to recall with prior-art solutions.
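The overlap test and the library update can be sketched in a few lines. Representing search results as lists of item identifiers and the library as a dictionary of sets are assumptions made for illustration, as are the function names.

```python
def meets_first_condition(first_results, second_results, threshold=0):
    """Return True if the two result lists share more than `threshold` items.

    With threshold=0 this is the basic condition of at least one identical
    search result item; a larger value gives the preferred variant.
    """
    shared = set(first_results) & set(second_results)
    return len(shared) > threshold


def update_library(library, pending, to_mine, first_results, second_results, threshold=0):
    """Add `to_mine` as a candidate synonym sequence of `pending` if the condition holds."""
    if meets_first_condition(first_results, second_results, threshold):
        library.setdefault(pending, set()).add(to_mine)
    return library
```

Setting `threshold=30` corresponds to the example of more than 30 identical search result items.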
As another preferred solution of the present embodiment, the first search results and the second search results are the historical search results of the pending text sequence and of the sequence to be mined, respectively, and the aforementioned matching device 4 comprises a sub-matching device (not shown).
The sub-matching device matches the first search results and the second search results according to the user click information of the historical search results of the pending text sequence and of the sequence to be mined.
In this solution, the first predetermined condition comprises that the historical search results of the pending text sequence and of the sequence to be mined contain at least one search result item that is identical and was clicked by users in both.
For example, the historical search results obtained for the pending text sequence query1 comprise search result items C1, C2, and C3, of which C1 and C2 were clicked by users when presented in the historical search results of query1. The historical search results obtained for another text sequence, query2, comprise search result items C1 and C2, and C1 was clicked by users when the search results of query2 were presented. The historical search results obtained for another text sequence, query3, comprise search result items C1 and C3, and C3 was clicked by users when the search results of query3 were presented. The historical search results of the pending text sequence query1 (the first search results) and the historical search results of the text sequence query2 (the second search results) therefore contain a search result item, C1, that is identical and was clicked by users in both, so query2 is a candidate synonym sequence of the pending text sequence query1. The text sequence query3 is not a candidate synonym sequence of the pending text sequence query1, because no search result item was clicked in both sets of results.
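The query1, query2, and query3 example can be reproduced with a small sketch. The layout of the `history` dictionary and the function names are assumptions; the data are the ones stated in the example.

```python
# Historical search results and clicked items, as given in the example above.
history = {
    "query1": {"results": ["C1", "C2", "C3"], "clicked": {"C1", "C2"}},
    "query2": {"results": ["C1", "C2"], "clicked": {"C1"}},
    "query3": {"results": ["C1", "C3"], "clicked": {"C3"}},
}


def commonly_clicked(history, pending, to_mine):
    """Return the items that were clicked in the results of both sequences."""
    # Clicked items are a subset of the results, so intersecting the
    # clicked sets is enough to find identical items clicked in both.
    return history[pending]["clicked"] & history[to_mine]["clicked"]


def is_candidate(history, pending, to_mine):
    """Return True if at least one identical item was clicked in both."""
    return len(commonly_clicked(history, pending, to_mine)) >= 1
```

Here query3 is rejected even though it shares C1 and C3 with query1, because neither shared item was clicked in both histories.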
Preferably, in the above preferred solution, the first predetermined condition further comprises that the click information of the search result items clicked by users in both sets of results meets a second predetermined condition.
The second predetermined condition comprises a condition that the click information must meet for a sequence to be mined to be confirmed as a candidate synonym sequence, for example that the click-through rate exceeds a predetermined threshold or that the click pattern follows a predetermined rule.
For example, the second predetermined condition comprises that the sum of the numbers of clicks of each search result item clicked by users in both the first search results and the second search results exceeds a predetermined threshold.
It should be noted that, preferably, the first predetermined condition can comprise all the conditions in the above preferred solutions. For example, the first predetermined condition can comprise that the number of identical search result items clicked by users in both the first search results and the second search results exceeds the first predetermined threshold. As another example, the first predetermined condition can comprise that the number of identical search result items clicked by users in both the first search results and the second search results exceeds the first predetermined threshold, and that the click information of those search result items meets the second predetermined condition.
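The combined condition can be sketched as follows, using the click-sum example as the second predetermined condition. The per-item click-count dictionaries, the parameter names, and the function name are assumptions made for illustration.

```python
def meets_combined_condition(clicks_first, clicks_second, min_shared, min_total_clicks):
    """Check the count threshold and the click-sum threshold together.

    clicks_first / clicks_second map a search result item to the number of
    times it was clicked in the first / second search results.
    """
    shared = {
        item
        for item in clicks_first.keys() & clicks_second.keys()
        if clicks_first[item] > 0 and clicks_second[item] > 0
    }
    # Count part: more than min_shared items clicked in both result sets.
    if len(shared) <= min_shared:
        return False
    # Click part: summed clicks of those items exceed the threshold.
    total = sum(clicks_first[item] + clicks_second[item] for item in shared)
    return total > min_total_clicks
```

With `min_shared=0`, only the click-sum part adds a constraint beyond the requirement of at least one item clicked in both sets of results.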
As a further preferred solution of the present embodiment, the updating device of the present embodiment also comprises an iteration device (not shown).
After the matching ends, the iteration device selects another sequence as the sequence to be mined for the pending text sequence, to trigger the matching device and the library updating device to perform their operations again, until a predetermined stop condition is met, for example an operator stops the repetition, or the number of candidate synonym texts in the candidate synonym text library of the pending text sequence reaches 1000.
Preferably, when the number of candidate synonym sequences determined in the above manner exceeds N, the updating device can rank the sequences to be mined by the degree of matching between the first search results and the second search results and select the top N as candidate synonym sequences of the pending text sequence, where N is a predetermined sequence-count threshold.
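The iteration over sequences to be mined and the top-N selection can be sketched together. Using the number of shared search result items as the degree of matching is an assumption, since the description does not define how that degree is computed; the function name and data layout are likewise illustrative.

```python
def mine_top_candidates(pending_results, results_by_sequence, n):
    """Score every sequence to be mined and keep the N best-matching ones.

    results_by_sequence maps each sequence to be mined to its search results.
    The matching degree is taken to be the number of shared result items.
    """
    pending_set = set(pending_results)
    scored = []
    for sequence, results in results_by_sequence.items():
        overlap = len(pending_set & set(results))
        if overlap >= 1:  # first predetermined condition
            scored.append((overlap, sequence))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sequence for _, sequence in scored[:n]]
```

A stop condition such as a maximum library size could be added by breaking out of the loop early; it is omitted here to keep the sketch short.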
If a search result item appears in the search results of two text sequences and was clicked by users in both, it can be assumed that, although the users entered different text sequences, the objects they wished to search for are the same or similar. This preferred solution mines the candidate synonym sequences of a text sequence on this basis and can obtain candidate synonym sequences that are difficult to recall with prior-art solutions. Further, the higher the number of clicks, click frequency and so on of the search result items clicked by users in both sets of search results, and the greater the number of such items, the more likely it is that users regard the two text sequences as pointing to the same search object. Accordingly, this preferred solution can also further screen the candidate synonym sequences based on the click information of the search result items clicked by users in both.
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware; for example, each device of the present invention can be implemented using an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention can be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
To those skilled in the art, it is evident that the present invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the present invention. No reference sign in the claims should be construed as limiting the claim concerned. In addition, the word "comprising" evidently does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a system claim can also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (26)

1. A method for establishing or updating a candidate synonym sequence library, wherein the method comprises the following step:
A. matching the first search results of a pending text sequence against the second search results of a sequence to be mined for it;
wherein the method further comprises the following step:
X. when the result of the matching meets a first predetermined condition, establishing or updating the candidate synonym sequence library of the pending text sequence according to the sequence to be mined;
wherein the first predetermined condition comprises that the first search results and the second search results contain at least one identical search result item.
2. The method according to claim 1, wherein the first predetermined condition comprises that the number of identical search result items contained in the first search results and the second search results exceeds a first predetermined threshold.
3. The method according to claim 1, wherein the first search results and the second search results are the historical search results of the pending text sequence and of the sequence to be mined, respectively, and wherein step A comprises the following step:
- matching the first search results and the second search results according to the user click information of the historical search results of the pending text sequence and of the sequence to be mined;
wherein the first predetermined condition comprises that the historical search results of the pending text sequence and of the sequence to be mined contain at least one search result item that is identical and was clicked by users in both.
4. The method according to claim 3, wherein the first predetermined condition further comprises that the click information of the search result items clicked by users in both meets a second predetermined condition.
5. The method according to any one of claims 1 to 4, wherein the method further comprises the following step:
- after the matching ends, selecting another sequence as the sequence to be mined for the pending text sequence, and repeating steps A and X.
6. A method for determining a synonym text, wherein the method comprises the following steps:
a. segmenting a pending text sequence into words to obtain at least one text fragment;
b. querying the candidate synonym sequence library of the text sequence according to the at least one text fragment, to obtain candidate synonym sequences that contain one or more of the at least one text fragment or its synonyms, as candidate synonym texts of the text sequence, wherein the result of matching the first search results obtained from the text sequence against the second search results obtained from the candidate synonym sequence meets a first predetermined condition;
c. selecting the synonym text of the text sequence from the candidate synonym texts;
wherein the first predetermined condition comprises that the first search results and the second search results contain at least one identical search result item.
7. The method according to claim 6, wherein step b comprises the following steps:
- obtaining a synonym of the at least one text fragment;
- querying the candidate synonym sequence library of the text sequence to obtain candidate synonym sequences that contain the synonym;
- when a candidate synonym sequence obtained by the query contains only the synonym, directly taking that candidate synonym sequence as a candidate synonym text;
- when a candidate synonym sequence obtained by the query contains the synonym and other text information, taking the candidate synonym sequence as a candidate synonym text if the other text information it contains is identical to part of the text sequence.
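The query logic of claims 6 and 7 can be sketched as follows. This is a simplified illustration under stated assumptions: word segmentation is taken as already done, the synonym dictionary and the library are plain in-memory Python structures, and the name `candidate_synonym_texts` is hypothetical.

```python
def candidate_synonym_texts(text, fragments, synonyms, library):
    """Return library sequences that qualify as candidate synonym texts.

    text      -- the text sequence to be processed
    fragments -- text fragments produced by word segmentation (assumed given)
    synonyms  -- hypothetical dict mapping a fragment to its known synonyms
    library   -- candidate synonym sequence library (a list of strings)
    """
    results = []
    for fragment in fragments:
        for synonym in synonyms.get(fragment, []):
            for sequence in library:
                if synonym not in sequence:
                    continue
                # Whatever remains after removing the synonym is the "other text".
                rest = sequence.replace(synonym, "", 1).strip()
                if not rest:
                    # Case 1: the sequence contains only the synonym.
                    results.append(sequence)
                elif rest in text:
                    # Case 2: the other text is identical to part of the text sequence.
                    results.append(sequence)
    return results
```

A sequence whose extra text does not appear in the original text sequence is rejected, which keeps unrelated variants out of the candidate set.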
8. The method according to claim 6 or 7, wherein the text sequence comprises a network resource name, and step c comprises the following step:
- selecting the synonym text of the text sequence from the candidate synonym texts by performing at least one of the following operations on each of all or some of the candidate synonym texts:
Operation A: judging whether the text sequence and the candidate synonym text currently being processed have a non-synonym feature;
Operation B: judging whether a corresponding network resource exists for the candidate synonym text currently being processed.
9. The method according to claim 8, wherein the non-synonym feature comprises at least one of the following:
- the network resource corresponding to the text sequence and the network resource corresponding to the candidate synonym text belong to different brands;
- the candidate synonym text contains a predetermined resource-derivative word;
- the candidate synonym text contains a predetermined resource fragment feature;
- one of the text sequence and the candidate synonym text is a specific instance of the other;
- the text sequence and the candidate synonym text contain text information in at least two languages, and the result of translating the text in one language into the other language has no synonym in the text information of that other language.
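Two of the non-synonym features listed in claim 9 (resource-derivative words and resource fragment features) lend themselves to a short Python sketch. The word list and the regular expression below are hypothetical examples, not values taken from the patent, and the other three features are not covered.

```python
import re

# Hypothetical resource-derivative words: they name something derived from a
# resource (for example a wallpaper of a game) rather than the resource itself.
DERIVATIVE_WORDS = {"wallpaper", "walkthrough", "cheats"}

# Hypothetical fragment feature: the text names only a part of a resource.
FRAGMENT_PATTERN = re.compile(r"\b(episode|chapter|part)\s+\d+\b")


def has_non_synonym_feature(candidate):
    """Return True if the candidate shows one of the two sketched features."""
    lowered = candidate.lower()
    if set(lowered.split()) & DERIVATIVE_WORDS:
        return True
    if FRAGMENT_PATTERN.search(lowered):
        return True
    return False
```

Candidates for which the function returns True would be dropped in Operation A before the synonym text is selected.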
10. The method according to claim 8 or 9, wherein Operation B comprises:
- performing a resource search on a predetermined network resource website based on the candidate synonym text currently being processed, and judging, according to whether a resource search result is obtained, whether a corresponding network resource exists on the predetermined network resource website for that candidate synonym text.
11. The method according to any one of claims 8 to 10, wherein Operation B comprises:
- performing a web search based on the candidate synonym text currently being processed, and judging, according to whether text information matching a predetermined text template can be extracted from the web pages returned by the search, whether a corresponding network resource exists for that candidate synonym text, wherein the predetermined text template comprises the candidate synonym text currently being processed and a predetermined word whose character distance from it is less than a predetermined threshold.
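The text template of claim 11 can be approximated with a regular expression. The sketch below assumes that "character distance" means the number of characters between the candidate synonym text and the predetermined word; the function name, the parameter names, and the example keyword are hypothetical.

```python
import re


def matches_text_template(page_text, candidate, keyword, max_gap=5):
    """Check whether `keyword` occurs fewer than `max_gap` characters away
    from `candidate` in `page_text`, in either order."""
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")
    cand = re.escape(candidate)
    key = re.escape(keyword)
    gap = max_gap - 1  # "less than max_gap" means at most max_gap - 1 characters
    pattern = rf"{cand}.{{0,{gap}}}{key}|{key}.{{0,{gap}}}{cand}"
    return re.search(pattern, page_text, flags=re.DOTALL) is not None
```

A page containing "pvz 2 free download" would match the template for candidate "pvz 2" and keyword "download" when the threshold is large enough to span " free ", which suggests that a resource with that name exists.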
12. The method according to any one of claims 6 to 11, wherein the text sequence comprises an application name, and the method further comprises the following step:
- for each synonym text of the text sequence, when it is judged that only one of the text sequence and the synonym text contains predetermined application additional feature information, updating the text sequence or the synonym text according to the predetermined application additional feature information, so that the text sequence and the synonym text either both contain or both omit the application additional feature information.
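A minimal sketch of the alignment in claim 12, assuming that the application additional feature information is a whitespace-separated marker such as "hd" or "free". The marker list and the choice to add the missing marker (rather than remove the existing one) are illustrative assumptions.

```python
# Hypothetical additional feature markers for application names.
ADDITIONAL_FEATURES = ("hd", "free", "lite")


def align_additional_feature(text, synonym, features=ADDITIONAL_FEATURES):
    """Return (text, synonym) updated so that any feature marker present in
    only one of the two is appended to the other."""
    text_words = text.split()
    synonym_words = synonym.split()
    for feature in features:
        in_text = feature in [w.lower() for w in text_words]
        in_synonym = feature in [w.lower() for w in synonym_words]
        if in_text and not in_synonym:
            synonym_words.append(feature)
        elif in_synonym and not in_text:
            text_words.append(feature)
    return " ".join(text_words), " ".join(synonym_words)
```

The claim equally allows the opposite update, where the marker is stripped from the side that has it so that neither text contains it.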
13. The method according to any one of claims 6 to 12, wherein the method further comprises the following steps:
- receiving a text sequence that a user device requests to search for;
- searching based on the text sequence and its synonym text, and providing the search results to the user device.
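The search step of claim 13 amounts to query expansion. The sketch below assumes a caller-supplied `search_fn` and an in-memory synonym table, both hypothetical, and merges results in order while removing duplicates.

```python
def search_with_synonyms(query, synonym_table, search_fn):
    """Search for `query` and each of its synonym texts, then merge the results.

    synonym_table -- hypothetical dict mapping a text sequence to its synonym texts
    search_fn     -- hypothetical callable returning a list of result items
    """
    seen = set()
    merged = []
    for q in [query, *synonym_table.get(query, [])]:
        for item in search_fn(q):
            if item not in seen:  # keep the first occurrence only
                seen.add(item)
                merged.append(item)
    return merged
```

Results for the original text sequence come first, so the expansion only adds items that the original query would have missed.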
14. An updating device for establishing or updating a candidate synonym sequence library, wherein the updating device comprises:
a matching device, configured to match a first search result of a text sequence to be processed against a second search result of a sequence to be mined for that text sequence;
a library updating device, configured to establish or update, according to the sequence to be mined, the candidate synonym sequence library of the text sequence to be processed when the result of the matching meets a first predetermined condition;
wherein the first predetermined condition comprises that the first search result and the second search result include at least one identical search result item.
15. The updating device according to claim 14, wherein the first predetermined condition comprises that the number of identical search result items included in both the first search result and the second search result exceeds a first predetermined threshold.
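The first predetermined condition of claims 14 and 15 reduces to a set-overlap test. In this sketch, search results are modelled as lists of result identifiers, which is an assumption; `threshold=0` corresponds to "at least one identical item", and larger values correspond to the threshold of claim 15.

```python
def meets_first_condition(first_results, second_results, threshold=0):
    """Return True if the number of identical result items exceeds `threshold`."""
    shared = set(first_results) & set(second_results)
    return len(shared) > threshold
```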
16. The updating device according to claim 14, wherein the first search result and the second search result are the historical search results of the text sequence to be processed and of the sequence to be mined, respectively, and wherein the matching device comprises:
a sub-matching device, configured to match the first search result against the second search result according to user click information in the historical search results of the text sequence to be processed and of the sequence to be mined;
wherein the first predetermined condition comprises that the historical search results of the text sequence to be processed and of the sequence to be mined include at least one identical search result item that was clicked by users in both.
17. The updating device according to claim 16, wherein the first predetermined condition further comprises that the click information of the search result item clicked in both meets a second predetermined condition.
18. The updating device according to any one of claims 14 to 17, wherein the updating device further comprises:
an iteration device, configured to select, after the matching ends, another sequence as the sequence to be mined for the text sequence to be processed, so as to trigger the matching device and the library updating device to perform their operations.
19. A synonym text determining device for determining synonym text, wherein the synonym text determining device comprises:
a word segmentation device, configured to perform word segmentation on a text sequence to be processed to obtain at least one text fragment;
a query device, configured to query, according to the at least one text fragment, a candidate synonym sequence library of the text sequence to obtain candidate synonym sequences that contain one or more of the at least one text fragment or its synonyms, as candidate synonym texts of the text sequence, wherein the result of matching a first search result obtained based on the text sequence against a second search result obtained based on the candidate synonym sequence meets a first predetermined condition;
a first selecting device, configured to select the synonym text of the text sequence from the candidate synonym texts;
wherein the first predetermined condition comprises that the first search result and the second search result include at least one identical search result item.
20. The synonym text determining device according to claim 19, wherein the query device comprises:
a first obtaining device, configured to obtain a synonym of the at least one text fragment;
a sub-query device, configured to query the candidate synonym sequence library of the text sequence to obtain candidate synonym sequences that contain the synonym;
a first determining device, configured to, when a candidate synonym sequence obtained by the query contains only the synonym, directly take that candidate synonym sequence as a candidate synonym text;
a second determining device, configured to, when a candidate synonym sequence obtained by the query contains the synonym and other text information, take the candidate synonym sequence as a candidate synonym text if the other text information it contains is identical to part of the text sequence.
21. The synonym text determining device according to claim 19 or 20, wherein the text sequence comprises a network resource name, and the first selecting device comprises:
a second sub-selecting device, configured to select the synonym text of the text sequence from the candidate synonym texts by performing at least one of the following operations on each of all or some of the candidate synonym texts:
Operation A: judging whether the text sequence and the candidate synonym text currently being processed have a non-synonym feature;
Operation B: judging whether a corresponding network resource exists for the candidate synonym text currently being processed.
22. The synonym text determining device according to claim 21, wherein the non-synonym feature comprises at least one of the following:
- the network resource corresponding to the text sequence and the network resource corresponding to the candidate synonym text belong to different brands;
- the candidate synonym text contains a predetermined resource-derivative word;
- the candidate synonym text contains a predetermined resource fragment feature;
- one of the text sequence and the candidate synonym text is a specific instance of the other;
- the text sequence and the candidate synonym text contain text information in at least two languages, and the result of translating the text in one language into the other language has no synonym in the text information of that other language.
23. The synonym text determining device according to claim 21 or 22, wherein Operation B comprises:
- performing a resource search on a predetermined network resource website based on the candidate synonym text currently being processed, and judging, according to whether a resource search result is obtained, whether a corresponding network resource exists on the predetermined network resource website for that candidate synonym text.
24. The synonym text determining device according to any one of claims 21 to 23, wherein Operation B comprises:
- performing a web search based on the candidate synonym text currently being processed, and judging, according to whether text information matching a predetermined text template can be extracted from the web pages returned by the search, whether a corresponding network resource exists for that candidate synonym text, wherein the predetermined text template comprises the candidate synonym text currently being processed and a predetermined word whose character distance from it is less than a predetermined threshold.
25. The synonym text determining device according to any one of claims 19 to 24, wherein the text sequence comprises an application name, and the synonym text determining device further comprises:
a text updating device, configured to, for each synonym text of the text sequence, when it is judged that only one of the text sequence and the synonym text contains predetermined application additional feature information, update the text sequence or the synonym text according to the predetermined application additional feature information, so that the text sequence and the synonym text either both contain or both omit the application additional feature information.
26. The synonym text determining device according to any one of claims 19 to 25, wherein the synonym text determining device further comprises:
a receiving device, configured to receive a text sequence that a user device requests to search for;
a providing device, configured to search based on the text sequence and its synonym text, and to provide the search results to the user device.
Application CN201210457084.2A (priority date 2012-11-14, filing date 2012-11-14): A kind of method and apparatus for determining synonym text. Status: Active. Publication: CN102982125B (en)

Priority Applications (1)

| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN201210457084.2A | CN102982125B (en) | 2012-11-14 | 2012-11-14 | A kind of method and apparatus for determining synonym text |

Applications Claiming Priority (1)

| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN201210457084.2A | CN102982125B (en) | 2012-11-14 | 2012-11-14 | A kind of method and apparatus for determining synonym text |

Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN102982125A (en) | 2013-03-20 |
| CN102982125B (en) | 2016-03-02 |

Family

Family ID: 47856143

Family Applications (1)

| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN201210457084.2A | Active | CN102982125B (en) | 2012-11-14 | 2012-11-14 | A kind of method and apparatus for determining synonym text |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN102982125B (en) |

Patent Citations (4)

* Cited by examiner, † Cited by third party

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100082657A1 (en) * | 2008-09-23 | 2010-04-01 | Microsoft Corporation | Generating synonyms based on query log data |
| CN101576916A (en) * | 2009-06-18 | 2009-11-11 | 清华大学 | Method and device for obtaining synonyms |
| US20110282856A1 (en) * | 2010-05-14 | 2011-11-17 | Microsoft Corporation | Identifying entity synonyms |
| CN102760134A (en) * | 2011-04-28 | 2012-10-31 | 北京百度网讯科技有限公司 | Method and device for mining synonyms |

Cited By (13)

* Cited by examiner, † Cited by third party

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105095203A (en) * | 2014-04-17 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Methods for determining and searching synonym, and server |
| CN105095203B (en) | 2014-04-17 | 2018-10-23 | 阿里巴巴集团控股有限公司 | Determination, searching method and the server of synonym |
| WO2016155643A1 (en) * | 2015-04-01 | 2016-10-06 | 北京奇虎科技有限公司 | Input-based candidate word display method and device |
| CN104899322B (en) | 2015-06-18 | 2021-09-17 | 百度在线网络技术(北京)有限公司 | Search engine and implementation method thereof |
| CN104899322A (en) * | 2015-06-18 | 2015-09-09 | 百度在线网络技术(北京)有限公司 | Search engine and implementation method thereof |
| CN106844325B (en) | 2015-12-04 | 2022-01-25 | 北大医疗信息技术有限公司 | Medical information processing method and medical information processing apparatus |
| CN106844325A (en) * | 2015-12-04 | 2017-06-13 | 北大医疗信息技术有限公司 | Medical information processing method and medical information processing unit |
| CN110162753A (en) * | 2018-11-08 | 2019-08-23 | 腾讯科技(深圳)有限公司 | For generating the method, apparatus, equipment and computer-readable medium of text template |
| CN110162753B (en) | 2018-11-08 | 2022-12-13 | 腾讯科技(深圳)有限公司 | Method, apparatus, device and computer readable medium for generating text template |
| CN113221550A (en) * | 2020-02-06 | 2021-08-06 | 百度在线网络技术(北京)有限公司 | Text filtering method, device, equipment and medium |
| CN113221550B (en) | 2020-02-06 | 2023-09-29 | 百度在线网络技术(北京)有限公司 | Text filtering method, device, equipment and medium |
| CN111428478A (en) * | 2020-03-20 | 2020-07-17 | 北京百度网讯科技有限公司 | Evidence searching method, device, equipment and storage medium for term synonymy discrimination |
| CN111428478B (en) | 2020-03-20 | 2023-08-15 | 北京百度网讯科技有限公司 | Entry synonym discrimination evidence searching method, entry synonym discrimination evidence searching device, entry synonym discrimination evidence searching equipment and storage medium |

Also Published As

| Publication number | Publication date |
|---|---|
| CN102982125B (en) | 2016-03-02 |

Similar Documents

| Publication | Publication Date | Title |
|---|---|---|
| CN102982125B (en) | | A kind of method and apparatus for determining synonym text |
| CN108460014B (en) | | Enterprise entity identification method and device, computer equipment and storage medium |
| JP6114403B2 (en) | | Method and apparatus for providing input candidate item corresponding to input character string |
| US20180075013A1 (en) | | Method and system for automating training of named entity recognition in natural language processing |
| CN103092943B (en) | | A kind of method of advertisement scheduling and advertisement scheduling server |
| CN105653701B (en) | | Model generating method and device, word assign power method and device |
| CN105027121A (en) | | Indexing application pages of native applications |
| CN103514230A (en) | | Method and device used for training language model according to corpus sequence |
| JP2014013584A (en) | | System and method for online handwriting recognition in web queries |
| CN103559313B (en) | | Searching method and device |
| CN103942319A (en) | | Searching method and device |
| CN103534696A (en) | | Exploiting query click logs for domain detection in spoken language understanding |
| CN111400586A (en) | | Group display method, terminal, server, system and storage medium |
| CN104809223A (en) | | Method and device for supplying application content search result in application |
| CN114021577A (en) | | Content tag generation method and device, electronic equipment and storage medium |
| CN111090991A (en) | | Scene error correction method and device, electronic equipment and storage medium |
| CN111401044A (en) | | Title generation method and device, terminal equipment and storage medium |
| CN112818200A (en) | | Data crawling and event analyzing method and system based on static website |
| CN111984774A (en) | | Search method, device, equipment and storage medium |
| CN103955480A (en) | | Method and equipment for determining target object information corresponding to user |
| US20140129490A1 (en) | | Image url-based junk detection |
| CN109376362A (en) | | A kind of the determination method and relevant device of corrected text |
| CN108388556A (en) | | The method for digging and system of similar entity |
| CN105095385B (en) | | A kind of output method and device of retrieval result |
| CN112052390A (en) | | Resource screening method and device, electronic equipment and storage medium |

Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |