CN102982025A - Identification method and device for searching requirement - Google Patents

Identification method and device for searching requirement

Info

Publication number
CN102982025A
Authority
CN
China
Prior art keywords
keyword
user
translation
search
searching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011102588353A
Other languages
Chinese (zh)
Other versions
CN102982025B (en
Inventor
蓝翔
柴春光
吴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110258835.3A priority Critical patent/CN102982025B/en
Publication of CN102982025A publication Critical patent/CN102982025A/en
Application granted granted Critical
Publication of CN102982025B publication Critical patent/CN102982025B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method and a device for identifying search requirements. The method comprises: obtaining, from users' historical behavior logs, the keywords that users employed when performing translation operations, and counting the occurrence frequency of the obtained keywords; on receiving a search request, judging from the statistics whether the occurrence frequency of the search keywords in the request exceeds a preset threshold; and, if it does, determining that the search request has a translation requirement. With this method and device, the user need not enter cue words such as 'translation' or 'what is the meaning' to signal a translation requirement when searching. The system directly determines whether the entered content needs translating and outputs a translation result, which widens the range of queries for which translation requirements can be identified and makes searching more convenient for users.

Description

A search requirement identification method and device
Technical Field
The application relates to the technical field of Internet applications, and in particular to a search requirement identification method and device.
Background
A search engine is a system that gathers information from the Internet with specific computer programs according to a certain strategy, organizes and processes that information, provides a retrieval service, and displays the information relevant to a user's search. After receiving a search request (query) submitted by a user, a traditional search engine first extracts the keywords contained in the query and then, by text matching, returns the web pages or documents that contain those keywords. As users demand ever more intelligent search, search requirement identification has become a research hotspot in the search field.
Search requirement identification means analyzing and predicting the user's needs from the submitted query, determining the user's intent or field of interest, and then providing the corresponding information. For example, if a user enters the query "from Beijing to Shanghai", the user probably has a strong map query or ticket query requirement. When the search results are displayed, map or ticketing content can then be provided directly, or ranked at the top of the results, so that the user can browse further with less effort.
The key technologies involved in search requirement identification include semantic analysis, behavior analysis, intelligent human-machine interaction, massive-scale computing, and information extraction. Because users phrase queries in many different ways, a commonly used approach at present is to analyze queries separately for each field, so that requirement identification is more targeted.
Translation is a fairly common requirement during search. In the prior art, when the user enters a query such as "xxx translation" or "what does xxx mean", the search engine can rely on the explicit cue ("translation", "what does ... mean", and so on) to identify that the user wants the word "xxx" translated. In practice, however, a query may contain only a word or phrase with no such cue. In that case existing search engines cannot reliably determine whether the user currently has a translation requirement.
Summary of the Invention
To solve the above technical problem, the embodiments of the application provide a search requirement identification method and device that identify users' translation requirements more effectively. The technical solution is as follows:
An embodiment of the application provides a search requirement identification method, comprising:
obtaining, from users' historical behavior logs, the keywords that users employed when performing translation operations;
counting the occurrence frequency of the obtained keywords;
after receiving a search request, judging from the statistics whether the occurrence frequency of the search keywords in the request exceeds a preset threshold and, if so, determining that the search request has a translation requirement.
In one embodiment of the application, obtaining the keywords that users employed when performing translation operations comprises:
if, among the search results returned by the search engine, the user selected a result that provides a translation service, obtaining the keyword used for that search.
In one embodiment of the application, obtaining the keywords that users employed when performing translation operations comprises:
if the search request entered by the user clearly indicates a translation requirement, obtaining the part of the request that is to be translated as the keyword.
In one embodiment of the application, obtaining the keywords that users employed when performing translation operations comprises:
obtaining the keywords that users entered in translation products.
In one embodiment of the application, counting the occurrence frequency of the obtained keywords comprises:
using an n-gram model to count the frequency of each n-gram unit that occurs in the obtained keywords.
In one embodiment of the application, judging from the statistics, after a search request is received, whether the occurrence frequency of the search keywords in the request exceeds the preset threshold comprises:
obtaining from the statistics the frequency of each n-gram unit in the search keywords;
judging whether the sum of the frequency values of the n-gram units in the search keywords exceeds the preset threshold.
In one embodiment of the application, before the occurrence frequency of the obtained keywords is counted, the method further comprises:
lemmatizing the obtained keywords and/or removing stop words from them.
In one embodiment of the application, before judging whether the occurrence frequency of the search keywords in the search request exceeds the preset threshold, the method further comprises:
lemmatizing the search keywords in the search request and/or removing stop words from them.
In one embodiment of the application, after the search request is determined to have a translation requirement, the method further comprises displaying the translation result corresponding to the search request, where the translation result is displayed by:
showing the translation result corresponding to the search request in the search box; or
showing the translation result corresponding to the search request in the form of a search suggestion.
In one embodiment of the application, after a search request is received and search suggestions are generated, the method further comprises:
judging whether the content of a search suggestion has a translation requirement.
An embodiment of the application also provides a search requirement identification device, comprising:
a translation keyword acquisition unit, configured to obtain, from users' historical behavior logs, the keywords that users employed when performing translation operations;
a translation keyword statistics unit, configured to count the occurrence frequency of the obtained keywords;
a translation requirement identification unit, configured to judge from the statistics, after a search request is received, whether the occurrence frequency of the search keywords in the request exceeds a preset threshold and, if so, to determine that the search request has a translation requirement.
In one embodiment of the application, the translation keyword acquisition unit is specifically configured to:
obtain the keyword used for a search when the user, among the search results returned by the search engine, selected a result that provides a translation service.
In one embodiment of the application, the translation keyword acquisition unit is specifically configured to:
obtain the part of a search request that is to be translated as the keyword when the request entered by the user clearly indicates a translation requirement.
In one embodiment of the application, the translation keyword acquisition unit is specifically configured to:
obtain the keywords that users entered in translation products.
In one embodiment of the application, the translation keyword statistics unit is specifically configured to:
use an n-gram model to count the frequency of each n-gram unit that occurs in the obtained keywords.
In one embodiment of the application, the translation requirement identification unit is specifically configured to:
obtain from the statistics the frequency of each n-gram unit in the search keywords;
judge whether the sum of the frequency values of the n-gram units in the search keywords exceeds the preset threshold.
In one embodiment of the application, the device further comprises:
a translation keyword preprocessing unit, configured to lemmatize the obtained keywords and/or remove stop words from them before the translation keyword statistics unit counts their occurrence frequency.
In one embodiment of the application, the device further comprises:
a search keyword preprocessing unit, configured to lemmatize the search keywords in the search request and/or remove stop words from them before it is judged whether the occurrence frequency of the search keywords in the search request exceeds the preset threshold.
In one embodiment of the application, the device further comprises:
a translation result display unit, configured to display the translation result corresponding to the search request after the translation requirement identification unit determines that the request has a translation requirement, the translation result display unit being specifically configured to:
show the translation result corresponding to the search request in the search box; or
show the translation result corresponding to the search request in the form of a search suggestion.
In one embodiment of the application, the translation requirement identification unit is further configured to judge, after a search request is received and search suggestions are generated, whether the content of a search suggestion has a translation requirement.
The solution provided by the embodiments of the application first obtains, from the historical behavior logs of a large number of users, the keywords that users employed when performing translation-related operations, and counts how often these keywords occur. In the statistics, the more often a word occurs, the stronger users' translation requirement for that word. Then, if the occurrence frequency of the search keywords a user enters during a search meets a given requirement, that search can be judged to have a translation requirement.
With this solution, the user is not required to enter cue words such as "translation" or "what does it mean" to state a translation requirement explicitly. The system directly determines whether the entered content has a translation requirement and provides a translation result, which widens the range of queries for which translation requirements can be identified and makes searching more convenient.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the application and of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described here show only some of the embodiments recorded in the application; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of the search requirement identification method of an embodiment of the application;
Fig. 2 is a schematic diagram of a first way of displaying translation results provided by an embodiment of the application;
Fig. 3 is a schematic diagram of a second way of displaying translation results provided by an embodiment of the application;
Fig. 4 is a schematic diagram of a third way of displaying translation results provided by an embodiment of the application;
Fig. 5 is a first structural diagram of the search requirement identification device of an embodiment of the application;
Fig. 6 is a second structural diagram of the search requirement identification device of an embodiment of the application;
Fig. 7 is a third structural diagram of the search requirement identification device of an embodiment of the application.
Detailed Description
In existing search engines, when a user enters a passage of text in the search box, particularly text in a foreign language, the user may want web pages or documents that contain the text (a general search requirement), or may want a translation of the text or bilingual example sentences (a translation requirement). If the search engine can judge the user's current requirement correctly, it can assemble search results that better match that requirement and make browsing easier.
An embodiment of the application provides a search requirement identification method comprising the following steps:
obtaining, from users' historical behavior logs, the keywords that users employed when performing translation operations;
counting the occurrence frequency of the obtained keywords;
after receiving a search request, judging from the statistics whether the occurrence frequency of the search keywords in the request exceeds a preset threshold and, if so, determining that the search request has a translation requirement.
The method first obtains, from the historical behavior logs of a large number of users, the keywords that users employed when performing translation-related operations, and counts how often these keywords occur. In the statistics, the more often a word occurs, the stronger users' translation requirement for that word. Then, if the occurrence frequency of the search keywords a user enters during a search meets a given requirement, that search can be judged to have a translation requirement. With this solution, the user is not required to enter cue words such as "translation" or "what does it mean". The system directly determines whether the entered content has a translation requirement and provides a translation result, which widens the range of queries for which translation requirements can be identified and makes searching more convenient.
So that those skilled in the art can better understand the technical solution of the application, the solution is described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments that a person of ordinary skill in the art obtains on the basis of these embodiments fall within the scope protected by the application.
Fig. 1 is a flowchart of a search requirement identification method according to an embodiment of the application. The method may comprise the following steps:
S101: obtain, from users' historical behavior logs, the keywords that users employed when performing translation operations.
The solution relies on historical user-behavior data: it collects statistics on keywords for which users explicitly performed a translation operation and uses them as the basis for identifying translation requirements. For every user of the search engine, the system records the user's actions in a user log. Common translation operations include the following:
1) The user selected, among the search results returned by the search engine, a result that provides a translation service.
When a user enters text in the search engine, the engine returns the corresponding search results, some of which may provide a translation service, for example translation websites. If the user then clicks such a result, the text the user entered in the search box is recorded.
For example, the user enters the query "patent" and then clicks a link to a translation website (such as www.iciba.com or dict.youdao.com) on the results page. The query can then be considered to have a translation requirement, so the query "patent" is recorded. If instead the user does not click a translation website after entering the query, for example the user enters "iphone" and then clicks a shopping website, the query is considered to have no translation requirement and is not recorded.
2) The search request entered by the user clearly indicates a translation requirement.
Under existing translation requirement identification techniques, when the query contains an explicit translation cue, the search can be considered to have a translation requirement, and the part of the query that is to be translated is recorded.
For example, the user enters the query "patent translation". From the explicit cue "translation" the search engine determines that the search has a translation requirement, removes the cue from the query, and records only the remaining part, "patent".
As another example, the user enters the query "what does patent mean". From the explicit cue "what does ... mean" the search engine determines that the search has a translation requirement, removes the cue from the query, and records only the remaining part, "patent".
3) The user uses translation products other than the search engine.
Besides the search engine itself, the keywords users employ for translation can also be obtained from other translation products. Baidu, for example, provides direct translation services in addition to its basic search engine, such as Baidu Translate (fanyi.baidu.com) and Baidu Dictionary (dict.baidu.com), and the text users enter in these products clearly has a translation requirement. So long as the content users enter in such products can be obtained by some means, it can be recorded and used as a basis for the search engine's later identification of translation requirements.
Content entered during any of the translation operations above can be considered to carry an explicit translation requirement and can therefore be recorded as a basis for identification. The acquisition methods described above can be used separately or in combination. Those skilled in the art may also adopt other ways of obtaining such keywords according to actual application needs without affecting the implementation of the solution.
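The first two acquisition modes can be sketched as follows. This is a minimal illustration, not the patent's implementation: the log schema (`query`, `clicked_url`), the host list, and the English cue suffixes are all assumptions made for the example, and cues are only matched at the end of the query.

```python
from urllib.parse import urlparse

# Assumption: translation sites are recognised by host name. The two hosts
# are the examples named in the text; a real list would be much longer.
TRANSLATION_HOSTS = {"www.iciba.com", "dict.youdao.com"}

# Assumption: explicit cues appear as suffixes of the query.
CUE_SUFFIXES = (" translation", " meaning")


def keyword_from_click(query, clicked_url):
    """Mode 1: keep the query only if the clicked result is a translation site."""
    host = urlparse(clicked_url).netloc.lower()
    return query.strip() if host in TRANSLATION_HOSTS else None


def keyword_from_cue(query):
    """Mode 2: if the query ends with an explicit cue, return the remainder."""
    q = query.strip()
    for cue in CUE_SUFFIXES:
        if q.lower().endswith(cue):
            rest = q[: len(q) - len(cue)].strip()
            return rest or None
    return None


def collect_keywords(log):
    """Scan hypothetical log records and return the recorded keywords."""
    keywords = []
    for record in log:
        kw = keyword_from_cue(record["query"])
        if kw is None and record.get("clicked_url"):
            kw = keyword_from_click(record["query"], record["clicked_url"])
        if kw:
            keywords.append(kw)
    return keywords


log = [
    {"query": "patent", "clicked_url": "http://www.iciba.com/patent"},
    {"query": "iphone", "clicked_url": "https://shop.example.com/iphone"},
    {"query": "patent translation", "clicked_url": None},
]
recorded = collect_keywords(log)
```

Here the "iphone" query is dropped because the clicked host is not a translation site, mirroring the example in the text.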
Note also that the solution uses the keywords recorded from a large number of users' translation operations as the basis for identifying translation requirements. In practice, therefore, the recorded content does not need to be tied to any particular user.
S102: count the occurrence frequency of the obtained keywords.
Step S101 yields a large number of keywords. In this step, the frequency with which these keywords occur is counted.
In practice, if the query entered by the user is a word or phrase, the word or phrase can be taken directly as the unit and the number of occurrences of identical words or phrases recorded. If the query is a sentence, the sentence can first be segmented into words and the occurrences of each resulting word counted. Besides raw occurrence counts, the frequency of a keyword can also be expressed in other forms, such as the ratio of its occurrences to the total number of occurrences, or a tf-idf value. The embodiments of the application place no limit on this.
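A minimal counting sketch for this step is shown below. The source does not say how a phrase is told apart from a sentence, so the `max_phrase_words` cut-off is a hypothetical parameter, and whitespace splitting stands in for a real word segmenter.

```python
from collections import Counter


def count_keywords(keywords, max_phrase_words=3):
    """Count occurrences of recorded keywords.

    Short inputs (words or phrases) are counted as one unit; longer inputs
    are treated as sentences and counted token by token.
    `max_phrase_words` is an assumed cut-off, not taken from the source.
    """
    counts = Counter()
    for kw in keywords:
        tokens = kw.lower().split()
        if not tokens:
            continue
        if len(tokens) <= max_phrase_words:
            counts[" ".join(tokens)] += 1   # word or phrase as a single unit
        else:
            counts.update(tokens)           # sentence: one count per token
    return counts


def relative_frequency(counts):
    """Alternative representation: occurrences divided by total occurrences."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


counts = count_keywords(
    ["patent", "Patent", "prior art", "please try again later today"]
)
freqs = relative_frequency(counts)
```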
In a preferred embodiment of the application, the following preprocessing operations can be performed before the occurrences of the keywords are counted:
1) Lemmatization:
Taking English as an example, a word may occur in several forms, such as the singular and plural of nouns, the tenses of verbs, and the variants of adjectives and adverbs. In practice, translation requirements for different forms of the same word can be treated as one class. The words can therefore first be reduced uniformly to their base form (for example, runs, running, and ran are reduced to run) and counted afterwards. In other words, every variant that occurs in a keyword is counted under its base form.
Lemmatization can be implemented with existing techniques such as Porter stemming (strictly speaking a stemming algorithm), which are not described further here.
2) Stop-word removal:
Stop words fall broadly into two classes. The first class consists of words that are used extremely widely and frequently, such as the English "i", "is", and "what". The second class consists of words that occur very often in text but carry little meaning of their own, mainly modal particles, adverbs, prepositions, and conjunctions. Such words usually have no meaning in isolation and contribute only within a complete sentence, for example the common "in", "on", and "and".
There is no need to record the frequency of stop words separately. The keywords obtained in step S101 can therefore first have their stop words removed according to a preset stop-word list and be counted afterwards.
Depending on actual application needs, the two preferred preprocessing steps can be used separately or in combination. The embodiments of the application place no limit on this.
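The two preprocessing steps might be combined as in the sketch below. The lemma table and stop-word list are toy stand-ins invented for illustration; a real system would use a stemmer or lemmatizer and a full stop-word list, neither of which is specified in the source beyond the examples given.

```python
# Assumption: a tiny lookup table replaces a real stemmer or lemmatizer.
LEMMAS = {"runs": "run", "running": "run", "ran": "run"}

# Assumption: a toy stop-word list built from the examples in the text.
STOP_WORDS = {"i", "is", "what", "in", "on", "and"}


def preprocess(text, lemmatize=True, remove_stop_words=True):
    """Tokenize on whitespace, then optionally lemmatize and drop stop words."""
    tokens = [t.strip(".,!?\"'") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if lemmatize:
        tokens = [LEMMAS.get(t, t) for t in tokens]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


both = preprocess("What is running in the park?")
lemma_only = preprocess("He ran and runs.", remove_stop_words=False)
```

The two flags reflect the statement that the steps can be applied separately or together. Whichever combination is used when building the statistics should also be applied to incoming queries.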
S103: after receiving a search request, judge from the statistics whether the occurrence frequency of the search keywords in the request exceeds a preset threshold and, if so, determine that the search request has a translation requirement.
Steps S101 and S102 produce, from users' historical behavior, a set of keywords that have a translation requirement. In this step, when the search engine receives a new search request, it decides from the occurrence frequency of the search keywords in the request whether the request has a translation requirement.
The threshold can be set directly from experience. Alternatively, one batch of queries that have a translation requirement can be selected using the methods described above, together with another batch of queries that do not, preferably of similar size. Both batches are then scored, and a value that clearly separates the two classes is chosen as the threshold.
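One way to pick such a separating value is sketched below. The source only says to choose a value that clearly distinguishes the two batches; maximizing the number of correctly separated queries under the rule "score > threshold means translation requirement" is an assumption made for this example, as is the tie-breaking toward the lowest candidate.

```python
def choose_threshold(positive_scores, negative_scores):
    """Return the threshold that best separates the two batches.

    A query counts as correct if it has a translation requirement and
    scores above the threshold, or has none and scores at or below it.
    Ties are resolved in favour of the lowest candidate.
    """
    all_scores = list(positive_scores) + list(negative_scores)
    if not all_scores:
        raise ValueError("at least one score is required")
    # Include a value below every score so that "everything is positive"
    # is also a possible outcome.
    candidates = [min(all_scores) - 1] + sorted(set(all_scores))
    best_threshold, best_correct = None, -1
    for t in candidates:
        correct = sum(s > t for s in positive_scores)
        correct += sum(s <= t for s in negative_scores)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


threshold = choose_threshold([5, 6, 4, 7], [0, 1, 3, 2])
```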
The simplest approach is to check whether the keyword currently entered is among the keywords known to have a translation requirement and, if so, to determine that the current search request has a translation requirement. This is equivalent to setting the threshold to 0. The threshold can also be set to a value greater than 0, in which case the current request is considered to have a translation requirement only if the entered keyword occurs more than a certain number of times in the statistics. Those skilled in the art will understand that several threshold ranges can also be set as required, so as to determine the strength of the translation requirement of the current request. Requests with different strengths can be handled differently. For example, for a request with a strong translation requirement, the translation result can be ranked higher in the search results.
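The idea of several threshold ranges can be illustrated with a small lookup. The bounds `(0, 3, 10)` and the four labels are invented for the example; the source does not give concrete ranges.

```python
import bisect

# Assumption: hypothetical upper bounds of the "none", "weak" and "medium"
# ranges. A score must exceed a bound to move into the next range.
BOUNDS = (0, 3, 10)
LABELS = ("none", "weak", "medium", "strong")


def translation_intensity(score, bounds=BOUNDS, labels=LABELS):
    """Map a frequency score to a translation-requirement strength label."""
    return labels[bisect.bisect_left(bounds, score)]


levels = [translation_intensity(s) for s in (0, 1, 3, 4, 10, 11)]
```

With the first bound at 0, a keyword that never occurs in the statistics maps to "none", which matches the simplest case described above.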
As in S102, if the query entered by the user is a word or phrase, the word or phrase can be compared with the statistics directly. If the query is a sentence, it can first be segmented into words and each resulting word compared with the statistics. In particular, when the current query contains several words, the statistical frequencies of the individual words can be summed and the sum compared with the preset threshold as the basis for identifying a translation requirement.
Likewise, if lemmatization or stop-word removal was performed in S102 before the keyword occurrences were counted, the same lemmatization or stop-word removal should be applied to the current query in this step before it is compared with the statistics.
In another embodiment of the application, an n-gram model can be used in S102 to count the frequency of each n-gram that occurs in the obtained keywords.
The n-gram is a language model commonly used in large-vocabulary continuous speech recognition. The model splits a sentence of l words into l - n + 1 n-gram units. When n is 1, this is equivalent to the basic word segmentation described above. In practice, the value of n can be chosen according to the average length of the queries obtained in S101: a larger n if the average length is long (for example more than 10 words), and a smaller n if it is short. In general, n = 2, 3, or 4 works well.
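The l - n + 1 split can be checked with a short sketch. The function name `ngrams` and the tuple representation are choices made for this illustration.

```python
def ngrams(tokens, n):
    """Return the n-gram units of a token list as tuples.

    A list of l tokens yields l - n + 1 units, or none when l < n.
    """
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "page look temporar unavailable please try again lat".split()
units = ngrams(tokens, 2)   # l = 8, n = 2, so 8 - 2 + 1 = 7 units
```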
The embodiment is described below taking n = 2 as an example.
Suppose that step S101 yields the following set of queries with a translation requirement:
A1) The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
B1) This is a wrong number. Please check up and try again later.
S102a: first segment the two sentences into words and lemmatize them, which gives:
A2) the server be temporar unable to service your request due to maintenance downtime or capacity problem please try again lat
B2) this be a wrong number please check up and try again lat
S102b: then remove the stop words from the two sentences, which gives:
A3) server temporar unable service request due maintenance downtime capacity problem please try again lat
B3) wrong number please check up try again lat
S102c: count the 2-gram frequencies.
All 2-gram units that occur in the two sentences above are listed below:
server temporar
temporar unable
unable service
service request
request due
due maintenance
maintenance downtime
downtime capacity
capacity problem
problem please
please try
try again
again lat
wrong number
number please
please check
check up
up try
try again
again lat
Frequency statistics are then computed over the 2-grams above, with the frequency of each 2-gram taken as its score, yielding the score lookup dictionary:
[Table image in the original: score lookup dictionary giving the frequency of each 2-gram listed above]
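The counting in S102c can be sketched with Python's `collections.Counter`. The variable names are hypothetical; the two token sequences are A3 and B3 from the example.

```python
from collections import Counter


def bigrams(tokens):
    """Pair each token with its successor."""
    return list(zip(tokens, tokens[1:]))


a3 = ("server temporar unable service request due maintenance "
      "downtime capacity problem please try again lat").split()
b3 = "wrong number please check up try again lat".split()

# Score dictionary: each 2-gram maps to the number of times it occurs.
score_dict = Counter()
for sentence in (a3, b3):
    score_dict.update(bigrams(sentence))
```

In this sketch "try again" and "again lat" each receive a score of 2, and every other listed 2-gram a score of 1.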
At S103, suppose the user inputs a new query: "The page you are looking for is temporarily unavailable. Please try again later."
a) First perform word segmentation, lemmatization, and stop-word removal following the processing of S102a and S102b, giving:
page look temporar unavailable please try again lat
For this sentence, look up the value of each 2-gram in the score dictionary and sum the values using the following formula:
$$\mathrm{Score} = \sum_{i=1}^{l-n+1} f(G_i)$$
where l is the length of the text after lemmatization and stop-word removal (l=8 in this example), G_i denotes the i-th n-gram unit in the text, and f(G_i) is the score of G_i in the score dictionary. Substituting the scores into the formula gives:
$$\begin{aligned}
\mathrm{Score} &= \sum_{i=1}^{8-2+1} f(G_i) \\
&= f(\text{page look}) + f(\text{look temporar}) + f(\text{temporar unavailable}) \\
&\quad + f(\text{unavailable please}) + f(\text{please try}) + f(\text{try again}) + f(\text{again lat}) \\
&= 0 + 0 + 0 + 0 + 1 + 2 + 2 \\
&= 5
\end{aligned}$$
Suppose the preset threshold is 3. Since this query has Score=5, which exceeds the threshold, the query can be judged to have a translation requirement.
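The scoring and threshold test of S103 can be sketched as follows. The function names are hypothetical, and the score dictionary is rebuilt here from sentences A3 and B3 so that the block stands alone.

```python
from collections import Counter


def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Sentences A3 and B3 from the example, after preprocessing.
TRAINING = [
    "server temporar unable service request due maintenance "
    "downtime capacity problem please try again lat",
    "wrong number please check up try again lat",
]
SCORE_DICT = Counter(g for s in TRAINING for g in ngrams(s.split()))


def translation_score(tokens, n=2):
    """Sum f(G_i) over the l - n + 1 units; unseen units score 0."""
    return sum(SCORE_DICT[g] for g in ngrams(tokens, n))


def has_translation_requirement(tokens, threshold=3):
    """The request has a translation requirement if the score exceeds the threshold."""
    return translation_score(tokens) > threshold


query = "page look temporar unavailable please try again lat".split()
score = translation_score(query)  # 0 + 0 + 0 + 0 + 1 + 2 + 2
```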
In one embodiment provided by the application, if the search engine can identify and react to queries in real time, then once a search request is determined to have a translation requirement according to the scheme above, the translation result corresponding to the search request can be displayed directly on the search page. In this way, the user can obtain the required translation result without entering the search results page.
Figure 2 shows one way of displaying the translation result provided by the embodiment, in which the translation result is displayed in the search box.
Figure 3 shows another way of displaying the translation result provided by the embodiment, in which the translation result is displayed in the form of a search suggestion.
In practice, the translation result may be displayed as text in different fonts, colors, and other styles, or through other media such as links and pictures. The displayed content may include not only the direct translation result (such as a dictionary definition or an automatic translation result) but also other related content, for example part of speech, usage, common collocations, context of use, example sentences, phonetic symbols, and a read-aloud function.
In one embodiment provided by the application, if the search engine can generate search suggestions in real time for the user's current input, then, system resources permitting, the search engine may further judge whether these search suggestions have a translation requirement. If so, the translation content corresponding to a search suggestion can be displayed in the search suggestion box, as shown in Figure 4.
Corresponding to the method embodiments above, the application also provides a search requirement identification device which, referring to Figure 5, comprises:
A translation keyword acquiring unit 501, configured to obtain, from the user's historical behavior log, the keywords the user used when performing translation operations;
The scheme of the embodiment is based on historical data of user behavior: keywords with which users have explicitly performed translation operations are counted and used as the basis for identifying translation requirements. For each user of the search engine, the system records the user's various actions in a user log. According to users' common translation operations, the translation keyword acquiring unit 501 may be specifically configured in the following ways:
1) To obtain the keyword used in a search when the user has selected, from the search results given by the search engine, a search result that provides a translation service.
When the user enters a piece of text into the search engine, the search engine returns the corresponding search results, some of which may provide a translation service, for example translation websites. If the user then clicks such a result, the text the user entered in the search box is recorded.
For example, the user enters the query "patent" in the search engine and then clicks a link to a translation website (such as www.iciba.com or dict.youdao.com) on the search results page. The query can then be considered to have a translation requirement, so the query "patent" is recorded. If instead the user does not click a translation website after entering the query, for example entering "iphone" and then clicking a shopping website, the query is considered to have no translation requirement and is not recorded.
2) To obtain the part of the search keyword that carries the translation requirement when it can be clearly determined, from the search request entered by the user, that the search has a translation requirement.
According to existing translation requirement identification techniques, when the query entered by the user contains an expression that obviously indicates a translation requirement, the search can be considered to have a translation requirement, and the part of the query to be translated is recorded.
For example, the user enters the query "patent translation" in the search engine. From the expression "translation", which obviously indicates a translation requirement, the search engine can determine that this search has a translation requirement. The expression is therefore removed from the query, and only the remaining part, "patent", is recorded.
As another example, the user enters the query "what does patent mean" in the search engine. From the expression "what does ... mean", which obviously indicates a translation requirement, the search engine can determine that this search has a translation requirement. The expression is therefore removed from the query, and only the remaining part, "patent", is recorded.
3) To obtain the keywords the user enters in translation products.
Besides obtaining the keywords used for translation operations from the search engine, such keywords can also be obtained from other translation products. For example, in addition to its basic search engine, Baidu also provides products that offer translation services directly, such as Baidu Translate (fanyi.baidu.com) and Baidu Dictionary (dict.baidu.com), and the text users enter in these products obviously has a translation requirement. Therefore, as long as the content users enter in other translation products can be obtained by some means, it can be recorded and used as a basis for the search engine's subsequent identification of translation requirements.
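Acquisition modes 1) and 2) can be sketched as a filter over a click log. The log record layout (`query`, `clicked_url`), the host set, and the English cue patterns are all assumptions made for illustration; only the two host names come from the text above.

```python
import re
from urllib.parse import urlparse

# Hosts treated as translation sites (the two examples named in the text).
TRANSLATION_HOSTS = {"www.iciba.com", "dict.youdao.com"}

# Hypothetical English renderings of explicit translation cues.
CUE_PATTERNS = [
    re.compile(r"^(?P<kw>.+?)\s+translation$", re.IGNORECASE),
    re.compile(r"^what does\s+(?P<kw>.+?)\s+mean\??$", re.IGNORECASE),
]


def keyword_from_click(record):
    """Mode 1: keep the query if the clicked result is a translation site."""
    host = urlparse(record["clicked_url"]).hostname
    return record["query"] if host in TRANSLATION_HOSTS else None


def keyword_from_cue(query):
    """Mode 2: strip an explicit cue and keep the remaining part."""
    for pattern in CUE_PATTERNS:
        match = pattern.match(query.strip())
        if match:
            return match.group("kw")
    return None


def collect_keywords(log):
    keywords = []
    for record in log:
        kw = keyword_from_click(record) or keyword_from_cue(record["query"])
        if kw:
            keywords.append(kw)
    return keywords


log = [
    {"query": "patent", "clicked_url": "http://www.iciba.com/patent"},
    {"query": "iphone", "clicked_url": "http://shop.example.com/iphone"},
    {"query": "patent translation", "clicked_url": "http://shop.example.com/x"},
]
keywords = collect_keywords(log)
```

Mode 3) would simply append the text entered in a translation product to the same keyword list.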
A translation keyword statistics unit 502, configured to count the frequency of occurrence of the obtained keywords;
In practice, if the query entered by the user is a word or phrase, the word or phrase can be used directly as the unit, and the number of occurrences of words or phrases of the same form is recorded. If the query entered by the user is a sentence, the sentence can first be segmented into words, and the number of occurrences is then counted with each segmented word as the unit. Of course, besides the number of occurrences, the frequency of occurrence of a keyword may also be expressed in other forms, such as the ratio of its occurrences to the total number of occurrences or a tf-idf value; the embodiment does not limit this.
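Two of the frequency representations mentioned above, the raw count and the ratio to the total, can be sketched as follows. Treating "same form" as case-insensitive equality is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter


def keyword_counts(keywords):
    """Count occurrences of each keyword; lowercasing is an assumed normalization."""
    return Counter(k.lower() for k in keywords)


def relative_frequencies(counts):
    """Express each count as a share of all recorded occurrences."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


counts = keyword_counts(["patent", "Patent", "claim", "patent"])
shares = relative_frequencies(counts)
```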
A translation requirement identification unit 503, configured to, after a search request is received, judge from the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold and, if so, determine that the search request has a translation requirement.
The threshold may be set directly from experience. Alternatively, a batch of queries with a translation requirement can be selected according to the method above together with a batch of queries without one, preferably of similar size. Both batches are then scored, and a value that clearly separates the two classes is chosen as the threshold.
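One possible way to pick a separating value is sketched below: try each observed score as a candidate and keep the one that classifies the most queries correctly. This selection criterion is an assumption; the text only asks for a value that clearly distinguishes the two batches, and the sample scores are invented for illustration.

```python
def pick_threshold(pos_scores, neg_scores):
    """Return the candidate t that best separates the two batches.

    A query is classified as having a translation requirement when score > t.
    Ties are resolved in favour of the smallest candidate.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best_t, best_correct = None, -1
    for t in candidates:
        correct = (sum(s > t for s in pos_scores)
                   + sum(s <= t for s in neg_scores))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Invented scores for a batch with and a batch without a translation requirement.
threshold = pick_threshold([5, 6, 4, 7], [0, 1, 0, 2])
```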
The simplest approach is to judge whether the currently entered keyword is among the keywords known to have a translation requirement and, if so, to determine that the current search request has a translation requirement; this is equivalent to setting the threshold to 0. The threshold may also be set to a value greater than 0, so that the current search request is considered to have a translation requirement only if the currently entered keyword occurs more than a certain number of times in the statistics. Of course, those skilled in the art will understand that several different threshold ranges may be set according to actual needs, so as to determine the strength of the translation requirement of the current search request. Search requests with different translation requirement strengths can be handled differently; for example, for a search request with a stronger translation requirement, the translation result can be ranked higher in the search results.
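Several threshold ranges can be mapped to a requirement strength with a sorted list of bounds. The bounds and labels below are hypothetical; only the idea of multiple ranges comes from the text.

```python
from bisect import bisect_left

# Hypothetical bounds: a score must exceed a bound to reach the next level.
BOUNDS = [0, 3, 10]
LABELS = ["none", "weak", "medium", "strong"]


def translation_strength(score):
    """Return the strength label of the range the score falls into."""
    return LABELS[bisect_left(BOUNDS, score)]


strength = translation_strength(5)
```

A ranking component could then place the translation result higher for "strong" than for "weak" requests.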
Referring to Figure 6, in one embodiment of the application, the device may further comprise a translation keyword preprocessing unit 504 and a search keyword preprocessing unit 505:
The translation keyword preprocessing unit 504 is configured to perform lemmatization and/or stop-word removal on the obtained keywords before the translation keyword statistics unit counts their frequency of occurrence.
The search keyword preprocessing unit 505 is configured to perform lemmatization and/or stop-word removal on the search keyword in the search request before the translation keyword statistics unit judges whether the frequency of occurrence of the search keyword in the search request exceeds the preset threshold.
In one embodiment of the application,
the translation keyword statistics unit 502 may be specifically configured to:
use an n-gram model to count the frequency of each n-gram unit appearing in the obtained keywords.
The translation requirement identification unit 503 is specifically configured to:
obtain, from the statistics, the frequency of each n-gram unit in the search keyword;
judge whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
Referring to Figure 7, in one embodiment of the application, the device may further comprise:
A translation result display unit 506, configured to display the translation result corresponding to the search request after the translation requirement identification unit determines that the search request has a translation requirement.
If the search engine can identify and react to queries in real time, then after the search request is determined to have a translation requirement, the translation result display unit 506 can display the translation result corresponding to the search request directly on the search page. In this way, the user can obtain the required translation result without entering the search results page.
The translation result display unit may be specifically configured to:
display the translation result corresponding to the search request in the search box; the displayed result is shown in Figure 2.
The translation result display unit may further be configured to:
display the translation result corresponding to the search request in the form of a search suggestion; the displayed result is shown in Figure 3.
In practice, the translation result may be displayed as text in different fonts, colors, and other styles, or through other media such as links and pictures. The displayed content may include not only the direct translation result (such as a dictionary definition or an automatic translation result) but also other related content, for example part of speech, usage, common collocations, context of use, example sentences, phonetic symbols, and a read-aloud function.
In addition, in another embodiment of the application, the translation requirement identification unit 503 may also be configured to judge, after the search engine receives a search request and generates search suggestions, whether the content of the search suggestions has a translation requirement. If a translation requirement is identified, the translation result display unit 506 can display the translation content corresponding to the search suggestion in the search suggestion box, as shown in Figure 4.
For convenience of description, the device above is described in terms of separate functional units. Of course, when implementing the application, the functions of the units can be realized in one or more pieces of software and/or hardware.
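As a rough illustration of realizing the units in software, the statistics unit 502 and the identification unit 503 can be sketched as two small classes. The class and method names are hypothetical, and preprocessing (units 504 and 505) is assumed to have been applied to the token lists already.

```python
from collections import Counter


class TranslationKeywordStatistics:
    """Sketch of unit 502: counts n-gram units of recorded keywords."""

    def __init__(self, n=2):
        self.n = n
        self.scores = Counter()

    def units(self, tokens):
        # zip over n shifted views of the list yields the n-gram tuples.
        return list(zip(*(tokens[i:] for i in range(self.n))))

    def add(self, tokens):
        self.scores.update(self.units(tokens))


class TranslationRequirementIdentifier:
    """Sketch of unit 503: sums unit frequencies and compares to a threshold."""

    def __init__(self, stats, threshold):
        self.stats = stats
        self.threshold = threshold

    def identify(self, tokens):
        total = sum(self.stats.scores[u] for u in self.stats.units(tokens))
        return total > self.threshold


stats = TranslationKeywordStatistics(n=2)
for _ in range(2):
    stats.add("please try again lat".split())
identifier = TranslationRequirementIdentifier(stats, threshold=3)
decision = identifier.identify("try again lat".split())
```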
From the description of the embodiments above, those skilled in the art can clearly understand that the application can be implemented by software together with the necessary general-purpose hardware platform. Based on this understanding, the essence of the technical scheme of the application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the application.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described relatively briefly because they are basically similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments. The device and system embodiments described above are only schematic: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. In general, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The above is only an embodiment of the application. It should be pointed out that those of ordinary skill in the art can make improvements and modifications without departing from the principle of the application, and these improvements and modifications should also be regarded as falling within the scope of protection of the application.

Claims (20)

1. A search requirement identification method, characterized by comprising:
obtaining, from a user's historical behavior log, the keywords the user used when performing translation operations;
counting the frequency of occurrence of the obtained keywords;
after a search request is received, judging from the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold and, if so, determining that the search request has a translation requirement.
2. The method according to claim 1, characterized in that obtaining the keywords the user used when performing translation operations comprises:
if the user has selected, from the search results given by a search engine, a search result that provides a translation service, obtaining the keyword the user used in that search.
3. The method according to claim 1, characterized in that obtaining the keywords the user used when performing translation operations comprises:
if it can be clearly determined, from the search request entered by the user, that the search has a translation requirement, obtaining the part of the search keyword that carries the translation requirement.
4. The method according to claim 1, characterized in that obtaining the keywords the user used when performing translation operations comprises:
obtaining the keywords the user enters in translation products.
5. The method according to claim 1, characterized in that counting the frequency of occurrence of the obtained keywords comprises:
using an n-gram model to count the frequency of each n-gram unit appearing in the obtained keywords.
6. The method according to claim 1, characterized in that, after a search request is received, judging from the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold comprises:
obtaining, from the statistics, the frequency of each n-gram unit in the search keyword;
judging whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
7. The method according to any one of claims 1-6, characterized by further comprising, before counting the frequency of occurrence of the obtained keywords:
performing lemmatization and/or stop-word removal on the obtained keywords.
8. The method according to claim 7, characterized by further comprising, before judging whether the frequency of occurrence of the search keyword in the search request exceeds the preset threshold:
performing lemmatization and/or stop-word removal on the search keyword in the search request.
9. The method according to any one of claims 1-6, characterized by further comprising, after determining that the search request has a translation requirement, displaying the translation result corresponding to the search request, wherein the way of displaying the translation result comprises:
displaying the translation result corresponding to the search request in the search box; or
displaying the translation result corresponding to the search request in the form of a search suggestion.
10. The method according to any one of claims 1-6, characterized by further comprising, after a search request is received and search suggestions are generated:
judging whether the content of the search suggestions has a translation requirement.
11. A search requirement identification device, characterized by comprising:
a translation keyword acquiring unit, configured to obtain, from a user's historical behavior log, the keywords the user used when performing translation operations;
a translation keyword statistics unit, configured to count the frequency of occurrence of the obtained keywords;
a translation requirement identification unit, configured to, after a search request is received, judge from the statistics whether the frequency of occurrence of the search keyword in the search request exceeds a preset threshold and, if so, determine that the search request has a translation requirement.
12. The device according to claim 11, characterized in that the translation keyword acquiring unit is specifically configured to:
obtain the keyword the user used in a search when the user has selected, from the search results given by a search engine, a search result that provides a translation service.
13. The device according to claim 11, characterized in that the translation keyword acquiring unit is specifically configured to:
obtain the part of the search keyword that carries the translation requirement when it can be clearly determined, from the search request entered by the user, that the search has a translation requirement.
14. The device according to claim 11, characterized in that the translation keyword acquiring unit is specifically configured to:
obtain the keywords the user enters in translation products.
15. The device according to claim 11, characterized in that the translation keyword statistics unit is specifically configured to:
use an n-gram model to count the frequency of each n-gram unit appearing in the obtained keywords.
16. The device according to claim 11, characterized in that the translation requirement identification unit is specifically configured to:
obtain, from the statistics, the frequency of each n-gram unit in the search keyword;
judge whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
17. The device according to any one of claims 11-16, characterized in that the device further comprises:
a translation keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the obtained keywords before the translation keyword statistics unit counts their frequency of occurrence.
18. The device according to claim 17, characterized in that the device further comprises:
a search keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the search keyword in the search request before the translation keyword statistics unit judges whether the frequency of occurrence of the search keyword in the search request exceeds the preset threshold.
19. The device according to any one of claims 11-16, characterized by further comprising:
a translation result display unit, configured to display the translation result corresponding to the search request after the translation requirement identification unit determines that the search request has a translation requirement, the translation result display unit being specifically configured to:
display the translation result corresponding to the search request in the search box; or
display the translation result corresponding to the search request in the form of a search suggestion.
20. The device according to any one of claims 11-16, characterized in that the translation requirement identification unit is further configured to judge, after a search request is received and search suggestions are generated, whether the content of the search suggestions has a translation requirement.
CN201110258835.3A 2011-09-02 2011-09-02 Identification method and device for searching requirement Active CN102982025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110258835.3A CN102982025B (en) 2011-09-02 2011-09-02 Identification method and device for searching requirement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110258835.3A CN102982025B (en) 2011-09-02 2011-09-02 Identification method and device for searching requirement

Publications (2)

Publication Number Publication Date
CN102982025A true CN102982025A (en) 2013-03-20
CN102982025B CN102982025B (en) 2016-05-11

Family

ID=47856064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110258835.3A Active CN102982025B (en) 2011-09-02 2011-09-02 Identification method and device for searching requirement

Country Status (1)

Country Link
CN (1) CN102982025B (en)

Also Published As

Publication number Publication date
CN102982025B (en) 2016-05-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant