CN102982025B - Search need recognition method and device - Google Patents

Publication number
CN102982025B
Authority
CN
China
Prior art keywords
keyword
user
search
search request
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110258835.3A
Other languages
Chinese (zh)
Other versions
CN102982025A (en)
Inventor
蓝翔
柴春光
吴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110258835.3A
Publication of CN102982025A
Application granted
Publication of CN102982025B
Legal status: Active
Anticipated expiration legal-status Critical

Abstract

The application discloses a search need recognition method and device. The method comprises: obtaining, from users' historical behavior logs, the keywords that users entered when performing translation operations; counting the frequency of occurrence of the obtained keywords; and, after a search request is received, judging from the statistics whether the frequency of occurrence of the search keywords in the search request exceeds a preset threshold, and if so, determining that the search request has a translation need. With this scheme, users need not type keywords that explicitly signal a translation need, such as "translation" or "what does it mean", when searching. The system determines directly whether the input has a translation need and provides a translation result, which widens the range of application of translation need recognition and makes the search engine more convenient to use.

Description

Search need recognition method and device
Technical field
The application relates to the technical field of Internet applications, and in particular to a search need recognition method and device.
Background
A search engine is a system that collects information from the Internet with specific computer programs according to a certain strategy, organizes and processes that information, provides a retrieval service, and presents the information relevant to a user's search to that user. After receiving a search request (query) submitted by a user, a traditional search engine first extracts the keywords contained in the query and then, by matching on text content, returns the web pages or documents that contain those keywords. As users expect search to be increasingly intelligent, search need recognition has become a research hotspot in the search field.
Search need recognition means analyzing and predicting the user's need from the submitted query, determining the user's intent or field of interest, and then providing the corresponding information. For example, when a user enters a query such as "from Beijing to Shanghai", it can be recognized that the user probably has a strong need for a map query or a ticket query. The search engine can then present map or ticketing content directly when showing the search results, or rank that content ahead of the other results, which makes further browsing easier.
The key technologies involved in search need recognition include semantic analysis, behavior analysis, intelligent human-computer interaction, massive-data computation and processing, and information extraction. Because users phrase queries in many different ways, a commonly used approach at present is to analyze queries separately for different fields, so that search need recognition is more targeted.
Translation is a common need during search. In the prior art, when a user enters a query such as "xxx translation" or "what does xxx mean", the search engine can use phrases that obviously signal a translation need, such as "translation" or "what does ... mean", to recognize fairly reliably that the user wants the word "xxx" translated. In practice, however, a query may contain only a word or phrase, without any phrase that signals a translation need. In that case, existing search engines cannot determine well whether the user currently has a translation need.
Summary of the invention
To solve the above technical problem, the embodiments of the application provide a search need recognition method and device that recognize a user's translation need more effectively. The technical scheme is as follows:
The embodiments of the application provide a search need recognition method, comprising:
obtaining, from users' historical behavior logs, the keywords that users entered when performing translation operations;
counting the frequency of occurrence of the obtained keywords; and
after a search request is received, judging from the statistics whether the frequency of occurrence of the search keywords in the search request exceeds a preset threshold, and if so, determining that the search request has a translation need.
According to one embodiment of the application, obtaining the keywords that users entered when performing translation operations comprises:
if a user has selected, among the search results given by the search engine, a search result that provides a translation service, obtaining the keywords used in that search.
According to one embodiment of the application, obtaining the keywords that users entered when performing translation operations comprises:
if the search request entered by a user makes it clear that the search has a translation need, obtaining the keywords of the part of the search that has the translation need.
According to one embodiment of the application, obtaining the keywords that users entered when performing translation operations comprises:
obtaining the keywords that users entered in translation products.
According to one embodiment of the application, counting the frequency of occurrence of the obtained keywords comprises:
using an n-gram model to count the frequency of each n-gram unit that occurs in the obtained keywords.
According to one embodiment of the application, judging from the statistics, after a search request is received, whether the frequency of occurrence of the search keywords in the search request exceeds a preset threshold comprises:
obtaining, from the statistics, the frequency of each n-gram unit in the search keywords; and
judging whether the sum of the frequency values of the n-gram units in the search keywords exceeds the preset threshold.
According to one embodiment of the application, before the frequency of occurrence of the obtained keywords is counted, the method further comprises:
performing lemmatization and/or stop-word removal on the obtained keywords.
According to one embodiment of the application, before judging whether the frequency of occurrence of the search keywords in the search request exceeds the preset threshold, the method further comprises:
performing lemmatization and/or stop-word removal on the search keywords in the search request.
According to one embodiment of the application, after the search request is determined to have a translation need, the method further comprises presenting the translation result corresponding to the search request in one of the following ways:
presenting the translation result corresponding to the search request in the search box; or
presenting the translation result corresponding to the search request in the form of a search suggestion.
According to one embodiment of the application, after a search request is received and search suggestions are generated, the method further comprises:
judging whether the content of the search suggestions has a translation need.
The embodiments of the application also provide a search need recognition device, comprising:
a translation keyword acquiring unit, configured to obtain, from users' historical behavior logs, the keywords that users entered when performing translation operations;
a translation keyword statistics unit, configured to count the frequency of occurrence of the obtained keywords; and
a translation need recognition unit, configured to judge from the statistics, after a search request is received, whether the frequency of occurrence of the search keywords in the search request exceeds a preset threshold, and if so, to determine that the search request has a translation need.
According to one embodiment of the application, the translation keyword acquiring unit is specifically configured to:
obtain the keywords used in a search when the user has selected, among the search results given by the search engine, a search result that provides a translation service.
According to one embodiment of the application, the translation keyword acquiring unit is specifically configured to:
obtain the keywords of the part of a search that has a translation need when the search request entered by the user makes it clear that the search has a translation need.
According to one embodiment of the application, the translation keyword acquiring unit is specifically configured to:
obtain the keywords that users entered in translation products.
According to one embodiment of the application, the translation keyword statistics unit is specifically configured to:
use an n-gram model to count the frequency of each n-gram unit that occurs in the obtained keywords.
According to one embodiment of the application, the translation need recognition unit is specifically configured to:
obtain, from the statistics, the frequency of each n-gram unit in the search keywords; and
judge whether the sum of the frequency values of the n-gram units in the search keywords exceeds the preset threshold.
According to one embodiment of the application, the device further comprises:
a translation keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the obtained keywords before the translation keyword statistics unit counts their frequency of occurrence.
According to one embodiment of the application, the device further comprises:
a search keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the search keywords in the search request before the translation keyword statistics unit judges whether the frequency of occurrence of the search keywords in the search request exceeds the preset threshold.
According to one embodiment of the application, the device further comprises:
a translation result presentation unit, configured to present the translation result corresponding to the search request after the translation need recognition unit determines that the search request has a translation need, the translation result presentation unit being specifically configured to:
present the translation result corresponding to the search request in the search box; or
present the translation result corresponding to the search request in the form of a search suggestion.
According to one embodiment of the application, the translation need recognition unit is further configured to judge, after a search request is received and search suggestions are generated, whether the content of the search suggestions has a translation need.
The scheme provided by the embodiments of the application first obtains, from the historical behavior logs of a large number of users, the keywords that users entered when performing translation-related operations, and counts the frequency of occurrence of these keywords. In the statistics, the more frequently a word occurs, the stronger the users' translation need for that word. Consequently, if the frequency of occurrence of the search keywords that a user enters during a search meets a certain requirement, that search can be judged to have a translation need.
With the scheme provided by the embodiments of the application, users need not type keywords that explicitly signal a translation need, such as "translation" or "what does it mean", when searching. The system determines directly whether the input has a translation need and provides a translation result, which widens the range of application of translation need recognition and makes the search engine more convenient to use.
Brief description of the drawings
To describe the technical schemes in the embodiments of the application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the application, and those of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flow chart of the search need recognition method of an embodiment of the application;
Fig. 2 is a schematic diagram of a first way of presenting translation results provided by an embodiment of the application;
Fig. 3 is a schematic diagram of a second way of presenting translation results provided by an embodiment of the application;
Fig. 4 is a schematic diagram of a third way of presenting translation results provided by an embodiment of the application;
Fig. 5 is a first structural diagram of the search need recognition device of an embodiment of the application;
Fig. 6 is a second structural diagram of the search need recognition device of an embodiment of the application;
Fig. 7 is a third structural diagram of the search need recognition device of an embodiment of the application.
Detailed description of the invention
In existing search engines, when a user enters a piece of text in the search box, particularly text in a foreign language, the user may want web pages or documents that contain that text (a general search need), or may want to see the translation of the text or bilingual example sentences (a translation need). If the search engine can correctly judge the user's current need, it can build search results that better meet that need and present them to the user, which makes browsing easier.
The embodiments of the application provide a search need recognition method comprising the following steps:
obtaining, from users' historical behavior logs, the keywords that users entered when performing translation operations;
counting the frequency of occurrence of the obtained keywords; and
after a search request is received, judging from the statistics whether the frequency of occurrence of the search keywords in the search request exceeds a preset threshold, and if so, determining that the search request has a translation need.
The method first obtains, from the historical behavior logs of a large number of users, the keywords that users entered when performing translation-related operations, and counts the frequency of occurrence of these keywords. In the statistics, the more frequently a word occurs, the stronger the users' translation need for that word. Consequently, if the frequency of occurrence of the search keywords that a user enters during a search meets a certain requirement, that search can be judged to have a translation need. With this scheme, users need not type keywords that explicitly signal a translation need, such as "translation" or "what does it mean", when searching. The system determines directly whether the input has a translation need and provides a translation result, which widens the range of application of translation need recognition and makes the search engine more convenient to use.
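The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the case-folding, and the sample log are all hypothetical, and a query is treated as a single keyword.

```python
from collections import Counter


def build_statistics(translation_keywords):
    """Step 2: count how often each keyword appears in logged translation operations."""
    return Counter(kw.strip().lower() for kw in translation_keywords if kw.strip())


def has_translation_need(query, stats, threshold):
    """Step 3: the request has a translation need when its frequency exceeds the threshold."""
    return stats[query.strip().lower()] > threshold


# Step 1 (assumed already done): keywords extracted from historical behavior logs.
logged = ["patent", "Patent", "maintenance", "patent", "downtime"]
stats = build_statistics(logged)

print(has_translation_need("patent", stats, threshold=2))
print(has_translation_need("iphone", stats, threshold=0))
```

A `Counter` returns 0 for keywords it has never seen, so unseen queries never exceed a non-negative threshold.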
To help those skilled in the art better understand the technical schemes of the application, the technical schemes in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the application fall within the scope of protection of the application.
Fig. 1 is a flow chart of a search need recognition method of an embodiment of the application. The method may comprise the following steps:
S101: obtain, from users' historical behavior logs, the keywords that users entered when performing translation operations.
The scheme is based on historical data about user behavior: it collects statistics on the keywords of operations in which users clearly intended to translate, and uses them as the basis for recognizing translation needs. For every user of the search engine, the system records the user's various actions in a user log. Common translation operations include the following:
1) The user selects, among the search results given by the search engine, a search result that provides a translation service.
When a user enters a piece of text in the search engine, the search engine returns the corresponding search results, some of which provide a translation service, for example translation websites. If the user then clicks such a result, the text that the user entered in the search box is recorded.
For example, a user enters the query "patent" in the search engine and then clicks a link to a translation website (such as www.iciba.com or dict.youdao.com) on the search results page. The query can be considered to have a translation need, so "patent" is recorded. If the user does not click a translation website after entering a query, for example enters "iphone" and then clicks a shopping website, the query is considered to have no translation need and is not recorded.
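The click-based collection rule might look like the following sketch. The log format (pairs of query and clicked URL), the host set, and the shopping URL are assumptions made for illustration; only the two translation hosts come from the example in the text.

```python
from urllib.parse import urlparse

# Hypothetical set of translation-site hosts; the two entries come from the example in the text.
TRANSLATION_HOSTS = {"www.iciba.com", "dict.youdao.com"}


def keywords_from_click_log(records):
    """Yield the queries whose clicked result points at a translation site.

    Each record is an assumed (query, clicked_url) pair taken from the user log.
    """
    for query, clicked_url in records:
        host = urlparse(clicked_url).netloc.lower()
        if host in TRANSLATION_HOSTS:
            yield query


log = [
    ("patent", "http://www.iciba.com/patent"),
    ("iphone", "http://shop.example.com/iphone"),  # shopping click: not recorded
]
recorded = list(keywords_from_click_log(log))
print(recorded)
```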
2) The search request entered by the user makes it clear that the search has a translation need.
With existing translation need recognition technology, when the query entered by the user contains a phrase that obviously signals a translation need, the search can be considered to have a translation need, and the part of the query that is to be translated is recorded.
For example, a user enters the query "patent translation" in the search engine. The search engine can determine from the phrase "translation", which obviously signals a translation need, that the search has a translation need. It therefore removes that phrase from the query and records only the remaining part, "patent".
As another example, a user enters the query "what does patent mean" in the search engine. The search engine can determine from the phrase "what does ... mean", which obviously signals a translation need, that the search has a translation need. It therefore removes that phrase from the query and records only the remaining part, "patent".
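A minimal sketch of stripping an explicit marker from the end of a query follows. The marker list is an assumption: the English "translation" mirrors the example in the text, and the two Chinese strings are presumed originals of the translated markers; a production system would maintain its own list and would also handle markers that are not suffixes.

```python
# Assumed trailing markers that explicitly signal a translation need.
MARKERS = ("translation", "翻译", "是什么意思")


def extract_translation_keyword(query):
    """Return the query with a trailing marker removed, or None if no marker is present."""
    q = query.strip()
    for marker in MARKERS:
        if q.lower().endswith(marker) and len(q) > len(marker):
            return q[: -len(marker)].strip()
    return None


print(extract_translation_keyword("patent translation"))
print(extract_translation_keyword("iphone"))
```

Queries without a marker return `None` and are simply not recorded by this rule.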
3) The user uses translation products other than the search engine.
Besides the search engine, the keywords that users enter when translating can also be obtained from other translation products. For example, in addition to its basic search engine, Baidu provides products that offer translation directly, such as Baidu Translate (fanyi.baidu.com) and Baidu Dictionary (dict.baidu.com), and the text that users enter in these products obviously has a translation need. Therefore, as long as the content that users enter in other translation products can be obtained by some means, it can be recorded as a basis for the search engine's later recognition of translation needs.
When users perform the translation operations above, the content they enter can be considered to have a clear translation need and can therefore be recorded as a basis for recognizing translation needs. The methods given above for obtaining such keywords can be used separately or in combination. Those skilled in the art can also obtain the keywords in other ways according to the needs of the application, without affecting the implementation of the scheme.
Note also that the scheme records the keywords used by a large number of users when translating as the basis for recognizing translation needs. In practice, therefore, the recorded content does not need to be associated with any particular user.
S102: count the frequency of occurrence of the obtained keywords.
Step S101 yields a large number of keywords. This step counts how often each of them occurs.
In practice, if the queries entered by users are words or phrases, the word or phrase can be used directly as the unit, and the number of occurrences of each identical word or phrase is recorded. If the queries are sentences, each sentence can first be segmented into words, and the number of occurrences is counted with each segmented word as the unit. Besides the raw number of occurrences, the frequency of occurrence of a keyword can also be expressed in other forms, such as the ratio of its occurrences to the total number of occurrences or a tf-idf value; the embodiments of the application do not limit this.
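The counting rule can be illustrated as follows. The sentence test (trailing punctuation) and the letters-only tokenizer are simplifying assumptions, since the text does not specify how words, phrases, and sentences are told apart or segmented.

```python
import re
from collections import Counter


def is_sentence(query):
    """Assumed heuristic: a query ending in sentence punctuation is a sentence."""
    return query.rstrip().endswith((".", "?", "!"))


def count_units(queries):
    """Count whole words/phrases as one unit; split sentences into word units."""
    counts = Counter()
    for q in queries:
        if is_sentence(q):
            counts.update(re.findall(r"[a-z]+", q.lower()))
        else:
            counts[q.strip().lower()] += 1
    return counts


queries = ["patent", "Patent", "try again", "Please try again."]
counts = count_units(queries)

# Alternative representation mentioned in the text: occurrences over total occurrences.
total = sum(counts.values())
relative = {unit: n / total for unit, n in counts.items()}
print(counts["patent"], relative["patent"])
```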
In a preferred embodiment of the application, the following preprocessing can be performed before the occurrences of the keywords are counted:
1) Lemmatization:
Taking English as an example, a word may have several inflected forms, such as the singular and plural of nouns, the tenses of verbs, and adjective/adverb variants. In practice, users' translation needs for the different forms of one word can be treated as one class, so each word is first reduced to its base form (for example, runs, running and ran are reduced to run) before counting. In other words, every variant that occurs in the search keywords is counted under its base form.
Lemmatization can be implemented with existing techniques such as Porter stemming, which are not described in detail here.
2) Stop-word removal:
Stop words fall broadly into two classes. The first class consists of words that are used extremely widely and frequently, such as "i", "is" and "what" in English. The second class consists of words that occur very often in text but carry little meaning of their own, mainly modal particles, adverbs, prepositions and conjunctions. They only play a role within a complete sentence, as with the common "in", "on" and "and".
There is therefore no need to record how often stop words occur. The keywords obtained in step S101 can first be filtered against a preset stop-word list, and the statistics are then computed.
Depending on the application, the two preprocessing steps can be used separately or together; the embodiments of the application do not limit this.
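The two preprocessing steps can be sketched with toy lookup tables. Both tables are hypothetical and deliberately tiny; a real system would use a full stemmer or lemmatizer (the text mentions Porter stemming) and a complete stop-word list.

```python
import re

# Toy tables, assumed for illustration only.
LEMMAS = {"runs": "run", "running": "run", "ran": "run", "is": "be", "problems": "problem"}
STOP_WORDS = {"i", "is", "be", "what", "in", "on", "and", "the", "a", "to"}


def preprocess(text):
    """Lowercase and tokenize, reduce each token to its base form, then drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    return [token for token in lemmas if token not in STOP_WORDS]


print(preprocess("He runs and ran in the running shoes"))
```

Applying the same `preprocess` function to both the logged keywords and the incoming query keeps the two sides comparable.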
S103: after a search request is received, judge from the statistics whether the frequency of occurrence of the search keywords in the search request exceeds a preset threshold, and if so, determine that the search request has a translation need.
Steps S101 and S102 yield, from users' historical behavior, a set of keywords that have a translation need. In this step, when the search engine receives a new search request, it determines from the frequency of occurrence of the search keywords in the request whether the request has a translation need.
The threshold can be set directly from experience. Alternatively, one batch of queries that have a translation need can be selected with the methods described above, together with another batch of queries that do not, preferably of similar size. Both batches are then scored, and a value that clearly separates the two classes is chosen as the threshold.
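One way to pick a separating value from the two scored batches is to try each observed score as a cut-off and keep the one that classifies the most queries correctly. The selection criterion and the sample scores below are assumptions; the text only asks for a value that clearly separates the two classes.

```python
def choose_threshold(positive_scores, negative_scores):
    """Return the cut-off that classifies the most queries correctly.

    A query is predicted to have a translation need when its score is
    strictly greater than the threshold. Ties keep the smallest cut-off.
    """
    candidates = sorted(set(positive_scores) | set(negative_scores) | {0})
    best_threshold, best_correct = None, -1
    for t in candidates:
        correct = sum(s > t for s in positive_scores) + sum(s <= t for s in negative_scores)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


positive = [5, 7, 4, 9]  # hypothetical scores of queries with a translation need
negative = [0, 1, 3, 0]  # hypothetical scores of queries without one
threshold = choose_threshold(positive, negative)
print(threshold)
```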
The simplest approach is to judge whether the currently entered keyword is among the keywords that have a translation need, and if so, to determine that the current search request has a translation need; this is equivalent to setting the threshold to 0. The threshold can also be set to a value greater than 0, so that the current search request is considered to have a translation need only if the entered keyword occurs more than a certain number of times in the statistics. Those skilled in the art will understand that several threshold ranges can also be set as required, in order to determine the strength of the translation need of the current search request. Search requests with different strengths of translation need can be handled differently; for example, for a request with a strong translation need, the translation result can be ranked higher in the search results.
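Several threshold ranges can be mapped to strength levels with a sorted list of bounds. The bounds and labels below are hypothetical; the text does not fix any particular values.

```python
import bisect

# Hypothetical thresholds: a score must exceed a bound to reach the next level.
BOUNDS = [0, 3, 10]
LABELS = ["none", "weak", "medium", "strong"]


def translation_need_strength(score):
    """Map a frequency score to a strength label; bisect_left counts the bounds strictly below the score."""
    return LABELS[bisect.bisect_left(BOUNDS, score)]


for score in (0, 1, 3, 5, 11):
    print(score, translation_need_strength(score))
```

A ranking component could then place the translation result higher for "strong" requests than for "weak" ones.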
As in S102, if the query entered by the user is a word or phrase, the word or phrase can be compared with the statistics directly as a unit. If the query is a sentence, it can first be segmented into words, and each segmented word is compared with the statistics as a unit. In particular, when the current query yields several segmented words, the frequencies recorded for the individual words can be summed and the sum compared with the preset threshold as the basis for recognizing a translation need.
Likewise, if lemmatization or stop-word removal was applied in S102 before the keyword occurrences were counted, the same lemmatization or stop-word removal should be applied to the current query in this step before it is compared with the statistics.
In another embodiment of the application, an n-gram model can be used in S102 to count the frequency of each n-gram that occurs in the obtained keywords.
The n-gram is a language model commonly used in large-vocabulary continuous recognition. It splits a sentence of l words into l-n+1 n-gram units. When n is 1, this is equivalent to the basic word segmentation described above. In practice, the value of n can be chosen according to the average length of the queries obtained in S101: a larger n if the average length is long (for example more than 10), and a smaller n if it is short. In general, n values of 2, 3 and 4 work well.
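Splitting a token list into n-gram units takes one line in Python. The function name is hypothetical; the sketch only demonstrates that a sequence of l tokens yields l-n+1 units.

```python
def ngrams(tokens, n):
    """Return the l - n + 1 contiguous n-gram units of a list of l tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["please", "try", "again", "lat"]
print(ngrams(tokens, 2))
print(len(ngrams(tokens, 2)) == len(tokens) - 2 + 1)
```

When the text is shorter than n, the range is empty and the function returns an empty list.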
The embodiment is described below with n=2 as an example.
Suppose that step S101 yields the following set of queries that have a translation need:
A1) The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
B1) This is a wrong number. Please check up and try again later.
S102a: first segment the two sentences into words and lemmatize them, with the following result:
A2) the server be temporar unable to service your request due to maintenance downtime or capacity problem please try again lat
B2) this be a wrong number. please check up and try again lat
S102b: then remove the stop words from the two sentences, with the following result:
A3) server temporar unable service request due maintenance downtime capacity problem please try again lat
B3) wrong number please check up try again lat
S102c: compute the 2-gram frequency statistics.
All 2-gram units that occur in the two sentences are listed below:
server temporar
temporar unable
unable service
service request
request due
due maintenance
maintenance downtime
downtime capacity
capacity problem
problem please
please try
try again
again lat
wrong number
number please
please check
check up
up try
try again
again lat
Counting the frequency of the 2-grams above and using the frequency as the score of each 2-gram gives a score lookup dictionary: "try again" and "again lat" each occur twice and therefore score 2, and every other unit listed occurs once and scores 1.
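The score dictionary for this example can be reproduced by counting the 2-grams of the two preprocessed sentences A3 and B3. This is a sketch; the variable and function names are hypothetical.

```python
from collections import Counter

# Preprocessed sentences A3 and B3 from the example.
A3 = ("server temporar unable service request due maintenance downtime "
      "capacity problem please try again lat").split()
B3 = "wrong number please check up try again lat".split()


def bigrams(tokens):
    """Return the adjacent token pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


# Frequency of each 2-gram, used directly as its score.
score_dict = Counter(bigrams(A3) + bigrams(B3))

for unit, score in score_dict.items():
    print(" ".join(unit), score)
```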
In S103, suppose that the user enters the new query: "The page you are looking for is temporarily unavailable. Please try again later."
a) First apply word segmentation, lemmatization and stop-word removal as in S102a and S102b, which gives:
page look temporar unavailable please try again lat
For this sentence, look up the value of each 2-gram in the score dictionary and substitute the values into the following formula:
$$\mathrm{Score} = \sum_{i=1}^{l-n+1} f(G_i)$$
where l is the length of the text after lemmatization and stop-word removal (l=8 in this example), G_i is the i-th n-gram unit in the text, and f(G_i) is the score of G_i in the score dictionary. Substituting the scores into the formula gives:
$$
\begin{aligned}
\mathrm{Score} &= \sum_{i=1}^{8-2+1} f(G_i) \\
&= f(\text{page look}) + f(\text{look temporar}) + f(\text{temporar unavailable}) \\
&\quad + f(\text{unavailable please}) + f(\text{please try}) + f(\text{try again}) + f(\text{again lat}) \\
&= 0 + 0 + 0 + 0 + 1 + 2 + 2 \\
&= 5
\end{aligned}
$$
Suppose the preset threshold is 3. Since this query has Score = 5, which exceeds the threshold, the query is judged to have translation demand.
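The scoring and threshold comparison of S103 can be sketched as follows. The names `score` and `has_translation_demand` are illustrative, and `score_dict` here is only an excerpt of the score dictionary containing the entries this query actually hits; every 2-gram that is absent scores 0.

```python
def score(tokens, score_dict, n=2):
    """Sum the dictionary scores of all n-gram units in a preprocessed query."""
    return sum(
        score_dict.get(tuple(tokens[i:i + n]), 0)
        for i in range(len(tokens) - n + 1)
    )


def has_translation_demand(tokens, score_dict, threshold, n=2):
    """A query has translation demand when its score exceeds the threshold."""
    return score(tokens, score_dict, n) > threshold


# Excerpt of the score dictionary from the example (assumed input).
score_dict = {
    ("please", "try"): 1,
    ("try", "again"): 2,
    ("again", "lat"): 2,
}

query = "page look temporar unavailable please try again lat".split()
print(score(query, score_dict))                      # 0+0+0+0+1+2+2
print(has_translation_demand(query, score_dict, 3))
```

The comparison is strict because the text speaks of the score exceeding the threshold; with a threshold of 5 the same query would not qualify.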
In one embodiment provided by the present application, if the search engine can respond to the query in real time as it is entered, then once a search request is determined by the above scheme to have translation demand, the translation result corresponding to the search request can be displayed directly on the search page. The user can thus obtain the desired translation result without entering the search results page.
Figure 2 shows one way of displaying the translation result provided by an embodiment of the present application, in which the translation result is displayed in the search box.
Figure 3 shows another way of displaying the translation result provided by an embodiment of the present application, in which the translation result is displayed in the form of a search suggestion.
In practical applications, the translation result can be displayed as text in different fonts, colors, and other styles, or through other media such as links and pictures. The displayed content may include not only the direct translation result (such as a dictionary definition or a machine translation result) but also other related content, such as part of speech, usage, common collocations, usage context, example sentences, phonetic symbols, and a read-aloud function.
In one embodiment provided by the present application, if the search engine can generate search suggestions in real time for the user's current input, then, provided system resources allow, the search engine can further judge whether these search suggestions have translation demand. If so, the translation content corresponding to a search suggestion can be displayed in the search suggestion box, as shown in Figure 4.
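One possible shape for the suggestion check is sketched below. Everything here is an assumption made for illustration: `has_demand` stands in for the recognition step, `translate` for whatever translation backend is available, and `max_checks` is a hypothetical budget that models the condition that system resources allow the extra work.

```python
def annotate_suggestions(suggestions, has_demand, translate, max_checks=5):
    """Pair each suggestion with a translation when it shows translation demand.

    Only the first `max_checks` suggestions are examined; the rest are passed
    through without a translation.
    """
    annotated = []
    for index, text in enumerate(suggestions):
        checked = index < max_checks
        translation = translate(text) if checked and has_demand(text) else None
        annotated.append((text, translation))
    return annotated


# Hypothetical stand-ins for the recognition step and a translation backend.
demo_keywords = {"patent"}
demo_translations = {"patent": "专利"}

result = annotate_suggestions(
    ["patent", "iphone"],
    has_demand=lambda text: text in demo_keywords,
    translate=demo_translations.get,
)
print(result)
```

Passing the two steps in as callables keeps the display logic independent of how demand is recognized or how translations are produced.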
Corresponding to the method embodiment above, the present application also provides a search demand recognition device which, as shown in Figure 5, comprises:
A translation keyword acquiring unit 501, configured to obtain, from the user's historical behavior log, the keywords used by the user when performing translation operations;
The scheme of the embodiments of the present application is based on historical data of user behavior: the keywords of translation operations that users have explicitly performed are counted and used as the basis for recognizing translation demand. For every user of the search engine, the system records the user's various behaviors in a user log. Depending on the user's common translation operations, the translation keyword acquiring unit 501 can be configured in the following ways:
1) When the user has selected, among the search results given by the search engine, a search result that provides a translation service, obtain the keyword used in that search.
When the user enters text in the search engine, the search engine returns the corresponding search results, some of which may provide a translation service, for example translation websites. If the user then clicks such a translation result, the text the user entered in the search box is recorded.
For example, the user enters the query "patent" in the search engine and then clicks, on the search results page, a link to a translation website (such as www.iciba.com or dict.youdao.com). The query entered by the user can then be considered to have translation demand, so the query "patent" is recorded. If instead the user does not click a translation website after entering a query, for example the user enters "iphone" and then clicks a shopping website, the query is considered to have no translation demand and is not recorded.
2) When the search can be clearly judged, from the search request entered by the user, to have translation demand, obtain the keyword of the part of the search that has translation demand.
According to existing translation demand recognition techniques, when the query entered by the user contains a phrase that explicitly indicates translation demand, the search can be considered to have translation demand, and the part of the query that is to be translated is recorded.
For example, the user enters the query "patent translation" in the search engine. From the phrase "translation", which explicitly indicates translation demand, the search engine can determine that the search has translation demand; the phrase is therefore removed from the query, and only the remaining part, "patent", is recorded.
As another example, the user enters the query "what does patent mean" in the search engine. From the phrase "what does ... mean", which explicitly indicates translation demand, the search engine can determine that the search has translation demand; the phrase is therefore removed from the query, and only the remaining part, "patent", is recorded.
3) Obtain the keywords that the user enters in translation products.
Besides the search engine, the keywords users use when performing translation operations can also be obtained from other translation products. For example, in addition to its basic search engine, Baidu also provides products that offer translation services directly, such as Baidu Translate (fanyi.baidu.com) and Baidu Dictionary (dict.baidu.com), and the text users enter in these products obviously has translation demand. Therefore, as long as the content users enter in other translation products can be obtained by some means, it can be recorded and used as a basis for the search engine's subsequent recognition of translation demand.
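The three acquisition modes can be sketched over a simplified log. The record layout (`source`, `query`, `clicked_url`) is hypothetical, since the patent does not describe the log format, and the marker strings are assumptions about how explicit translation phrases might appear at the end of a query; the site names are the ones mentioned in the text.

```python
from urllib.parse import urlparse

# Translation websites and translation products named in the text.
TRANSLATION_SITES = {"www.iciba.com", "dict.youdao.com"}
TRANSLATION_PRODUCTS = {"fanyi.baidu.com", "dict.baidu.com"}

# Assumed explicit markers ("translation", "what does ... mean") as query suffixes.
EXPLICIT_MARKERS = ("translation", "翻译", "是什么意思")


def extract_keyword(record):
    """Return the translation keyword for one log record, or None."""
    query = record["query"].strip()
    # Mode 3: text entered directly in a translation product.
    if record["source"] in TRANSLATION_PRODUCTS:
        return query
    # Mode 2: query carries an explicit translation phrase; keep the rest.
    for marker in EXPLICIT_MARKERS:
        if query.endswith(marker) and len(query) > len(marker):
            return query[: -len(marker)].strip()
    # Mode 1: user clicked a result that provides a translation service.
    clicked = record.get("clicked_url")
    if clicked and urlparse(clicked).hostname in TRANSLATION_SITES:
        return query
    return None


log = [
    {"source": "search", "query": "patent", "clicked_url": "http://www.iciba.com/patent"},
    {"source": "search", "query": "iphone", "clicked_url": "http://shop.example.com/iphone"},
    {"source": "search", "query": "patent translation", "clicked_url": None},
    {"source": "fanyi.baidu.com", "query": "patent"},
]
keywords = [kw for kw in map(extract_keyword, log) if kw is not None]
print(keywords)
```

The "iphone" record yields nothing because the clicked site is not a translation website, matching the example in mode 1.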
A translation keyword statistics unit 502, configured to count the frequency of occurrence of the obtained keywords;
In practical applications, if the query entered by the user is a word or phrase, the word or phrase can be used directly as the unit, and the number of occurrences of each identical word or phrase is recorded. If the query entered by the user is a sentence, the sentence can first be tokenized, and the number of occurrences is then counted with each token as the unit. Of course, besides the number of occurrences, the frequency of occurrence of a keyword can also be expressed in other forms, such as the ratio of its occurrences to the total number of occurrences or its tf-idf value; the embodiments of the present application do not limit this.
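Two of the frequency representations mentioned above, the raw occurrence count and the ratio of occurrences to the total, can be sketched as follows. The function name and the sample keyword list are illustrative only; tf-idf is left out because it would need a document collection that the text does not describe.

```python
from collections import Counter


def keyword_frequencies(keywords):
    """Return (occurrence counts, share of total occurrences) per keyword."""
    counts = Counter(keywords)
    total = sum(counts.values())
    ratios = {kw: count / total for kw, count in counts.items()}
    return counts, ratios


# Hypothetical keywords collected from the behavior log.
counts, ratios = keyword_frequencies(["patent", "patent", "server", "patent"])
print(counts["patent"], ratios["patent"])
```

Whichever representation is chosen, the threshold used later has to be expressed on the same scale.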
A translation demand recognition unit 503, configured to, after a search request is received, judge according to the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold, and if so, determine that the search request has translation demand.
The threshold can be set directly from experience. Alternatively, a batch of queries with translation demand can be selected by the methods described above, together with a batch of queries without translation demand, preferably of similar size. Both batches are then scored, and a value that clearly separates the two classes of data is chosen as the threshold.
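One simple way to pick a separating value from two scored batches is sketched below. The selection rule (maximize the number of correctly classified sample queries) is an assumption, since the text only asks for a value that clearly separates the two classes, and the sample scores are invented for illustration.

```python
def choose_threshold(positive_scores, negative_scores):
    """Pick the threshold that classifies the most sample queries correctly.

    A query counts as having translation demand when score > threshold.
    Ties are resolved in favor of the smallest candidate.
    """
    candidates = sorted(set(positive_scores) | set(negative_scores))
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum(s > threshold for s in positive_scores)
        correct += sum(s <= threshold for s in negative_scores)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Hypothetical scores for queries with and without translation demand.
with_demand = [5, 4, 6, 3]
without_demand = [0, 1, 0, 2]
print(choose_threshold(with_demand, without_demand))
```

Keeping the two batches at a similar size, as the text recommends, prevents the larger class from dominating the count of correct decisions.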
The simplest approach is to judge whether the currently entered keyword is present among the keywords with translation demand, and if so, determine that the current search request has translation demand; this is equivalent to setting the threshold to 0. The threshold can also be set to a value greater than 0, meaning that the current search request is considered to have translation demand only if the currently entered keyword occurs more than a certain number of times in the statistics. Of course, those skilled in the art will understand that several different threshold ranges can also be set according to actual needs, so as to determine the strength of the translation demand of the current search request. Search requests with different strengths of translation demand can be handled differently; for example, for a search request with stronger translation demand, the translation result can be ranked higher in the search results.
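Mapping a score onto several threshold ranges can be sketched with a sorted list of bounds. The bounds and labels below are hypothetical; the patent does not specify how many ranges to use or where they lie.

```python
from bisect import bisect_left

# Hypothetical range bounds: a score must exceed a bound to reach the next level.
INTENSITY_BOUNDS = [0, 3, 10]
INTENSITY_LABELS = ["none", "weak", "medium", "strong"]


def demand_intensity(score):
    """Return the translation demand strength for a query score."""
    # bisect_left counts the bounds strictly below the score, i.e. the
    # number of thresholds the score exceeds.
    return INTENSITY_LABELS[bisect_left(INTENSITY_BOUNDS, score)]


for value in (0, 1, 5, 11):
    print(value, demand_intensity(value))
```

With the first bound at 0, any keyword that appears in the statistics at all is treated as having at least weak demand, which corresponds to the simplest mode described above.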
As shown in Figure 6, in one embodiment of the present application, the above device may further comprise a translation keyword preprocessing unit 504 and a search keyword preprocessing unit 505:
The translation keyword preprocessing unit 504 is configured to lemmatize and/or remove stop words from the obtained keywords before the translation keyword statistics unit counts the frequency of occurrence of the obtained keywords.
The search keyword preprocessing unit 505 is configured to lemmatize and/or remove stop words from the search keyword in a search request before the translation keyword statistics unit judges whether the frequency of occurrence of the search keyword in the search request exceeds the preset threshold.
In one embodiment of the present application,
the translation keyword statistics unit 502 may be specifically configured to:
use an n-gram model to count the frequency of each n-gram unit occurring in the obtained keywords.
The translation demand recognition unit 503 is specifically configured to:
obtain, according to the statistics, the frequency of each n-gram unit in the search keyword;
judge whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
As shown in Figure 7, in one embodiment of the present application, the above device may further comprise:
A translation result display unit 506, configured to display the translation result corresponding to a search request after the translation demand recognition unit determines that the search request has translation demand.
If the search engine can respond to the query in real time as it is entered, then after the search request is determined to have translation demand, the translation result display unit 506 can display the translation result corresponding to the search request directly on the search page. The user can thus obtain the desired translation result without entering the search results page.
The translation result display unit may be specifically configured to:
display the translation result corresponding to the search request in the search box; the displayed result is shown in Figure 2.
The translation result display unit may also be configured to:
display the translation result corresponding to the search request in the form of a search suggestion; the displayed result is shown in Figure 3.
In practical applications, the translation result can be displayed as text in different fonts, colors, and other styles, or through other media such as links and pictures. The displayed content may include not only the direct translation result (such as a dictionary definition or a machine translation result) but also other related content, such as part of speech, usage, common collocations, usage context, example sentences, phonetic symbols, and a read-aloud function.
In addition, in another embodiment of the present application, the translation demand recognition unit 503 can also be used to judge, after the search engine receives a search request and generates search suggestions, whether the content of the search suggestions has translation demand. If translation demand is recognized, the translation result display unit 506 can display the translation content corresponding to a search suggestion in the search suggestion box, as shown in Figure 4.
For convenience of description, the device above is described in terms of separate units divided by function. Of course, when implementing the present application, the functions of the units can be implemented in one or more pieces of software and/or hardware.
From the description of the embodiments above, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present application, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in parts of the embodiments.
The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, reference can be made between them, and each embodiment focuses on its differences from the others. In particular, the device or system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the description of the method embodiments. The device and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The above are only specific embodiments of the present application. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also be regarded as falling within the protection scope of the present application.

Claims (18)

1. A search demand recognition method, characterized by comprising:
obtaining, according to a user's historical behavior log, the keywords used by the user when performing translation operations;
counting the frequency of occurrence of the obtained keywords;
after a search request is received, judging according to the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold, and if so, determining that the search request has translation demand;
wherein said judging, after a search request is received, according to the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold comprises:
obtaining, according to the statistics, the frequency of each n-gram unit in the search keyword;
judging whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
2. The method according to claim 1, characterized in that said obtaining the keywords used by the user when performing translation operations comprises:
if the user has selected, among the search results given by a search engine, a search result that provides a translation service, obtaining the keyword used in that search.
3. The method according to claim 1, characterized in that said obtaining the keywords used by the user when performing translation operations comprises:
if the search can be clearly judged, from the search request entered by the user, to have translation demand, obtaining the keyword of the part of the search that has translation demand.
4. The method according to claim 1, characterized in that said obtaining the keywords used by the user when performing translation operations comprises:
obtaining the keywords that the user enters in translation products.
5. The method according to claim 1, characterized in that said counting the frequency of occurrence of the obtained keywords comprises:
using an n-gram model to count the frequency of each n-gram unit occurring in the obtained keywords.
6. The method according to any one of claims 1-5, characterized in that, before the frequency of occurrence of the obtained keywords is counted, the method further comprises:
lemmatizing and/or removing stop words from the obtained keywords.
7. The method according to claim 6, characterized in that, before judging whether the frequency of occurrence of the search keyword in the search request exceeds the preset threshold, the method further comprises:
lemmatizing and/or removing stop words from the search keyword in the search request.
8. The method according to any one of claims 1-5, characterized in that, after the search request is determined to have translation demand, the method further comprises displaying the translation result corresponding to the search request, the translation result being displayed by:
displaying the translation result corresponding to the search request in the search box; or
displaying the translation result corresponding to the search request in the form of a search suggestion.
9. The method according to any one of claims 1-5, characterized in that, after a search request is received and search suggestions are generated, the method further comprises:
judging whether the content of the search suggestions has translation demand.
10. A search demand recognition device, characterized by comprising:
a translation keyword acquiring unit, configured to obtain, according to a user's historical behavior log, the keywords used by the user when performing translation operations;
a translation keyword statistics unit, configured to count the frequency of occurrence of the obtained keywords;
a translation demand recognition unit, configured to, after a search request is received, judge according to the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold, and if so, determine that the search request has translation demand;
wherein the translation demand recognition unit is specifically configured to:
obtain, according to the statistics, the frequency of each n-gram unit in the search keyword;
judge whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
11. The device according to claim 10, characterized in that the translation keyword acquiring unit is specifically configured to:
obtain the keyword used in a search when the user has selected, among the search results given by a search engine, a search result that provides a translation service.
12. The device according to claim 10, characterized in that the translation keyword acquiring unit is specifically configured to:
obtain the keyword of the part of a search that has translation demand when the search can be clearly judged, from the search request entered by the user, to have translation demand.
13. The device according to claim 10, characterized in that the translation keyword acquiring unit is specifically configured to:
obtain the keywords that the user enters in translation products.
14. The device according to claim 10, characterized in that the translation keyword statistics unit is specifically configured to:
use an n-gram model to count the frequency of each n-gram unit occurring in the obtained keywords.
15. The device according to any one of claims 10-14, characterized in that the device further comprises:
a translation keyword preprocessing unit, configured to lemmatize and/or remove stop words from the obtained keywords before the translation keyword statistics unit counts the frequency of occurrence of the obtained keywords.
16. The device according to claim 15, characterized in that the device further comprises:
a search keyword preprocessing unit, configured to lemmatize and/or remove stop words from the search keyword in the search request before the translation keyword statistics unit judges whether the frequency of occurrence of the search keyword in the search request exceeds the preset threshold.
17. The device according to any one of claims 10-14, characterized by further comprising:
a translation result display unit, configured to display the translation result corresponding to a search request after the translation demand recognition unit determines that the search request has translation demand, the translation result display unit being specifically configured to:
display the translation result corresponding to the search request in the search box; or
display the translation result corresponding to the search request in the form of a search suggestion.
18. The device according to any one of claims 10-14, characterized in that the translation demand recognition unit is further configured to judge, after a search request is received and search suggestions are generated, whether the content of the search suggestions has translation demand.
CN201110258835.3A 2011-09-02 2011-09-02 A kind of search need recognition methods and device Active CN102982025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110258835.3A CN102982025B (en) 2011-09-02 2011-09-02 A kind of search need recognition methods and device

Publications (2)

Publication Number Publication Date
CN102982025A CN102982025A (en) 2013-03-20
CN102982025B true CN102982025B (en) 2016-05-11

Family

ID=47856064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110258835.3A Active CN102982025B (en) 2011-09-02 2011-09-02 A kind of search need recognition methods and device

Country Status (1)

Country Link
CN (1) CN102982025B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714054B (en) * 2013-12-30 2017-03-15 北京百度网讯科技有限公司 Interpretation method and translating equipment
CN103793364B (en) * 2014-01-23 2018-09-07 北京百度网讯科技有限公司 The method and apparatus that automatic phonetic notation processing and display are carried out to text
CN105677927B (en) * 2016-03-31 2019-04-12 百度在线网络技术(北京)有限公司 For providing the method and apparatus of search result
CN105956038A (en) * 2016-04-26 2016-09-21 宇龙计算机通信科技(深圳)有限公司 Notification message management method and apparatus as well as terminal
CN110147479B (en) * 2017-10-31 2021-06-11 北京搜狗科技发展有限公司 Search behavior recognition method and device and search behavior recognition device
CN112068981B (en) * 2020-09-24 2022-06-21 中国人民解放军国防科技大学 Knowledge base-based fault scanning recovery method and system in Linux operating system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761972A (en) * 2003-03-18 2006-04-19 Nhn株式会社 A method of determining an intention of internet user, and a method of advertising via internet by using the determining method and a system thereof
CN102012900A (en) * 2009-09-04 2011-04-13 阿里巴巴集团控股有限公司 An information retrieval method and system
CN102096717A (en) * 2011-02-15 2011-06-15 百度在线网络技术(北京)有限公司 Search method and search engine

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006036781A2 (en) * 2004-09-22 2006-04-06 Perfect Market Technologies, Inc. Search engine using user intent
US7840538B2 (en) * 2006-12-20 2010-11-23 Yahoo! Inc. Discovering query intent from search queries and concept networks
US20090043749A1 (en) * 2007-08-06 2009-02-12 Garg Priyank S Extracting query intent from query logs
US7949672B2 (en) * 2008-06-10 2011-05-24 Yahoo! Inc. Identifying regional sensitive queries in web search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant