CN103440242A

CN103440242A - User search behavior-based personalized recommendation method and system

Info

Publication number: CN103440242A
Application number: CN2013102600388A
Authority: CN
Inventors: 罗峰; 黄苏支; 李娜
Original assignee: BEIJING IZP TECHNOLOGIES Co Ltd
Current assignee: BEIJING IZP TECHNOLOGIES Co Ltd
Priority date: 2013-06-26
Filing date: 2013-06-26
Publication date: 2013-12-11

Abstract

Embodiments of the invention provide a text training method. The method comprises the following steps: acquiring a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, and the user search behavior test document belongs to one of the categories and contains a user identity, search phrases and the corresponding search times; generating training feature terms for each category from that category's training texts, and generating search feature terms for the user search behavior test document from the search phrases; configuring weights for the search feature terms according to the corresponding search times; and constructing a user preference feature classification model from the search feature terms and the training feature terms. The embodiments are simple to compute, require little computation time, achieve high accuracy, and give recommendations a high match rate and a high success rate.

Description

Personalized recommendation method and system based on user search behavior

Technical field

The embodiments of the present application relate to the technical field of data processing, and in particular to a text training method, a text training system, a preference feature classification method based on user search behavior, a preference feature classification system based on user search behavior, a personalized recommendation method based on user search behavior, and a personalized recommendation system based on user search behavior.

Background art

The rapid development of the Internet has brought people into the information society and the era of the Internet economy, and has profoundly affected both the development of enterprises and people's personal lifestyles. At the same time, information overload prevents people from efficiently obtaining the part they need, so the efficiency with which information is used actually decreases.

When people want information that matches their preferences, they often have to search for it manually and then filter out irrelevant results before they obtain it. Clearly, people are unwilling to spend a great deal of time searching endless online information for what they like; they would rather have a system automatically acquire and recommend information according to their preferences. Computing the interest preference category of a user is therefore especially important.

At present, interest preferences can be classified according to the website channels or web pages a user visits, in the following steps:

(1) Manually annotate each channel or web page with the preference type of its audience;

(2) Count the channels or web pages the user visits and the number of visits to each, sort them in descending order of visit count, and take the top N channels or web pages, where N is a positive integer;

(3) If the user has visited a web page of a certain channel, recommend other web pages of that channel from among the channels obtained above, or recommend other web pages of the same channel from among the web pages obtained above.
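As a rough illustration, steps (2) and (3) of this prior-art approach can be sketched in Python. The sketch assumes the visit log is a list of (channel, page) pairs, one per page view, and that the manual annotation of step (1) has already been done elsewhere; the function names are hypothetical.

```python
from collections import Counter


def top_n_pages(visits, n):
    """Step (2): return the N most-visited (channel, page) pairs, most visited first.

    visits: list of (channel, page) tuples, one per page view.
    """
    return [item for item, _ in Counter(visits).most_common(n)]


def recommend_same_channel(top_pages, visited_channel, visited_page):
    """Step (3): recommend the other top pages that share the visited page's channel."""
    return [page for channel, page in top_pages
            if channel == visited_channel and page != visited_page]


visits = [("travel", "p3"), ("sports", "p1"), ("travel", "p3"),
          ("sports", "p1"), ("travel", "p3"), ("sports", "p2")]
top = top_n_pages(visits, 3)
suggestions = recommend_same_channel(top, "sports", "p1")
```

Because the channel is the only link between pages here, the quality of the suggestions depends entirely on how finely the site is divided into channels, which is the weakness discussed next.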

The classification accuracy of this method depends on the granularity of the website's channel division; when the granularity is too coarse, classification accuracy suffers.

Therefore, an urgent technical problem for those skilled in the art is to provide a mechanism for computing the interest preference category of a user that classifies with high accuracy, supports targeted services based on the computed results, and improves service efficiency.

Summary

The technical problem to be solved by the embodiments of the present application is to provide a feature extraction method based on user behavior and a personalized recommendation method based on user behavior. Based on users' behavior information, the methods divide users into groups with similar preferences and extract the features of each group so that these features distinguish the groups; during personalized recommendation, recommendations can then be made quickly and efficiently according to the corresponding features.

Correspondingly, the embodiments of the present application also provide a feature extraction system based on user behavior and a personalized recommendation system based on user behavior, to ensure the implementation and application of the above methods.

An embodiment of the present application discloses a text training method, comprising:

Obtaining a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

Generating the training feature terms of each category from the training texts of that category, and generating the search feature terms of the user search behavior test document from the search phrases;

Configuring a weight for each search feature term according to its corresponding search time, and building a user preference feature classification model from the search feature terms and the training feature terms.

Preferably, the step of generating the training feature terms of each category from its training texts comprises:

In each category, performing word segmentation on each training document;

Counting the number of occurrences of each segmented word in the category;

Sorting the segmented words by number of occurrences, from high to low;

Extracting a preset number M of top-ranked segmented words together with their occurrence counts to generate the training feature terms of the category, where M is a positive integer;

The step of generating the search feature terms of the user search behavior test document from the search phrases comprises:

Performing word segmentation on each search phrase;

Counting the number of occurrences of each segmented word;

Extracting a preset number N of top-ranked segmented words to generate the search feature terms of the user search behavior test document, where N is a positive integer.

Preferably, the step of configuring weights for the search feature terms according to the corresponding search times and building the user classification model from the search feature terms and the training feature terms comprises:

Calculating the prior probability of each category, i.e., the proportion of all training texts that belong to that category;

Taking the occurrence count of each training feature term of a category as the occurrence count, in that category, of the identical search feature term;

Using the occurrence counts to calculate a first conditional probability that each search feature term appears in each category;

Configuring a weight for each search feature term according to its search time;

Using the weights and the first conditional probabilities to calculate a second conditional probability that the user search behavior test document occurs in each category;

Using the prior probabilities and the second conditional probabilities to calculate the posterior probability that the user search behavior test document belongs to each category;

Taking the category with the maximum posterior probability as the category to which the user search behavior test document is assigned;

Judging, from the category to which the user search behavior test document originally belonged and the category assigned by the current calculation, whether a preset condition is met; if so, obtaining the final user preference feature classification model; if not, returning to the sub-step of configuring a weight for each search feature term according to its search time.

Preferably, the first conditional probability is obtained by the following formula:

P(t_j \mid c_k) = \frac{1 + TF(t_j, c_k)}{|V| + \sum_{s=1}^{|V|} TF(t_s, c_k)}

where TF(t_j, c_k) is the number of occurrences of search feature term t_j in category c_k, and |V| is the total number of training feature terms in category c_k.

Preferably, the step of configuring a weight for each search feature term according to its search time comprises:

Counting the number of training texts in which the search feature term appears;

Configuring the half-life of user interest;

Obtaining the weight by the following formula:

w_i = \frac{TF(t_i) \times IDF(t_i)}{\sqrt{\sum_{s=1}^{n} \left( TF(t_s) \times IDF(t_s) \right)^2}} \times e^{-\frac{\ln 2}{hl} \left( d^{today} - d^{last} \right)}

where TF(t_i) is the number of occurrences of search feature term t_i in category c_k;

hl is the half-life of user interest;

d^{today} is the current time, and d^{last} is the search time of search feature term t_i closest to the current time;

IDF(t_i) is the inverse document frequency of t_i, computed from n_i, the number of training texts in which search feature term t_i appears, N, the total number of training texts, and L, an influence factor.

Preferably, the second conditional probability is obtained by the following formula:

P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \cdot \omega_{jk}

where n is the total number of search feature terms, P(t_j | c_k) is the first conditional probability, and ω_{jk} is the weight of search feature term t_j.

Preferably, the category corresponding to the maximum posterior probability is obtained by the following formula:

class(d_i) = \arg\max_{1 \le k \le |C|} P(c_k \mid d_i) = \arg\max_{1 \le k \le |C|} P(c_k)\, P(d_i \mid c_k)

where |C| is the total number of categories, P(c_k | d_i) is the posterior probability, P(c_k) is the prior probability, and P(d_i | c_k) is the second conditional probability.

An embodiment of the present application discloses a preference feature classification method based on user search behavior, comprising:

Collecting the user's raw search behavior information and generating a user search behavior document from it, the user search behavior document containing a user ID, search phrases and the corresponding search times;

Calculating the user's preference feature category from the user search behavior document and the user preference feature classification model;

wherein the user preference feature classification model is generated as follows:

An embodiment of the present application discloses a personalized recommendation method based on user search behavior, comprising:

Obtaining the user's behavior information, which includes a user ID;

Determining the user's preference feature category according to the user ID;

Generating corresponding personalized recommendation information from the preference category;

Recommending the personalized recommendation information to the current user;

wherein the user's preference feature category is generated as follows:

Calculating the user's preference category from the user search behavior document and the user preference feature classification model;
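A minimal sketch of this recommendation flow, assuming each user's preference category has already been computed offline and stored against the user ID. The two dictionaries and the `recommend` function are hypothetical stand-ins for a user profile store and a content catalogue; they are not part of the described system.

```python
def recommend(user_id, user_category, category_items, fallback=()):
    """Return recommendation items for a user from a precomputed category.

    user_category:  {user ID: preference feature category}, filled offline
    category_items: {category: [item, ...]}, hypothetical content catalogue
    fallback:       items returned when the user has no category yet
    """
    category = user_category.get(user_id)   # user ID -> preference category
    if category is None:
        return list(fallback)               # unknown user: nothing precomputed
    return list(category_items.get(category, []))  # category -> items


profiles = {"userA": "sports", "userB": "travel"}
catalogue = {"sports": ["match report", "training tips"], "travel": ["city guide"]}
items = recommend("userA", profiles, catalogue)
```

The lookup is a pair of dictionary reads, so no classification work happens at request time; the cost is paid when the stored categories are refreshed.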

An embodiment of the present application discloses a text training system, comprising:

A corpus acquisition module, configured to obtain a corpus; and

A test document acquisition module, configured to obtain a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

A training feature term generation module, configured to generate the training feature terms of each category from the training texts of that category; and

A search feature term generation module, configured to generate the search feature terms of the user search behavior test document from the search phrases;

A user preference feature classification model construction module, configured to configure a weight for each search feature term according to its corresponding search time, and to build the user preference feature classification model from the search feature terms and the training feature terms.

An embodiment of the present application discloses a preference feature classification system based on user search behavior, comprising:

A user search behavior document generation module, configured to collect the user's raw search behavior information and generate a user search behavior document from it, the user search behavior document containing a user ID, search phrases and the corresponding search times;

A preference feature classification module, configured to calculate the user's preference feature category from the user search behavior document and the user preference feature classification model;

wherein the preference feature classification module comprises the following submodules:

A corpus acquisition submodule, configured to obtain a corpus; and

A test document acquisition submodule, configured to obtain a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

A training feature term generation submodule, configured to generate the training feature terms of each category from the training texts of that category; and

A search feature term generation submodule, configured to generate the search feature terms of the user search behavior test document from the search phrases;

An embodiment of the present application discloses a personalized recommendation system based on user search behavior, comprising:

A user behavior information acquisition module, configured to obtain the user's behavior information, which includes a user ID;

A preference feature category determination module, configured to determine the user's preference feature category according to the user ID;

A personalized recommendation information generation module, configured to generate corresponding personalized recommendation information from the preference category;

A recommendation module, configured to recommend the personalized recommendation information to the current user;

wherein the preference feature category determination module comprises the following submodules:

A user search behavior document generation submodule, configured to collect the user's raw search behavior information and generate a user search behavior document from it, the user search behavior document containing a user ID, search phrases and the corresponding search times;

A preference feature classification submodule, configured to calculate the user's preference feature category from the user search behavior document and the user preference feature classification model;

wherein the preference feature classification submodule comprises the following submodules:

A corpus acquisition module, configured to obtain a corpus; and

A test document acquisition module, configured to obtain a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

A training feature term generation module, configured to generate the training feature terms of each category from the training texts of that category; and

A search feature term generation module, configured to generate the search feature terms of the user search behavior test document from the search phrases;

Compared with the background art, the embodiments of the present application have the following advantages:

The embodiments use a naive Bayes algorithm, which is simple to compute and requires little computation time. The user's search behavior information is weighted according to search time when the user's preference feature category is calculated, so the calculation is highly accurate, the resulting recommendations have a high match rate, and the recommendation success rate is high.

The embodiments pre-configure personalized recommendation information in the user profile. When a user visits, the recommendation information for that user can be obtained directly once the user ID is known, without recomputing the category from the user's search behavior information; this saves system resources and improves the efficiency of personalized recommendation.

Brief description of the drawings

Fig. 1 is a flowchart of the steps of an embodiment of a text training method provided by the embodiments of the present application;

Fig. 2 is a flowchart of the steps of an embodiment of a preference feature classification method based on user search behavior provided by the embodiments of the present application;

Fig. 3 is a flowchart of the steps of an embodiment of a personalized recommendation method based on user search behavior provided by the embodiments of the present application;

Fig. 4 is a structural block diagram of an embodiment of a text training system provided by the embodiments of the present application;

Fig. 5 is a structural block diagram of an embodiment of a preference feature classification system based on user search behavior provided by the embodiments of the present application;

Fig. 6 is a structural block diagram of an embodiment of a personalized recommendation system based on user search behavior provided by the embodiments of the present application.

Detailed description

To make the above objectives, features and advantages of the embodiments of the present application clearer and easier to understand, the embodiments are described in further detail below with reference to the drawings and specific implementations.

Referring to Fig. 1, a flowchart of the steps of an embodiment of a text training method of the present application is shown, which may specifically include the following steps:

Step 101: obtain a corpus; and

Step 102: obtain a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

It will be appreciated that the corpus used by the embodiments may be a fairly large standard Chinese text classification test platform. The corpus may contain a plurality of categories, such as military affairs, sports, travel and health, and the documents under each category may be news reports, articles and other kinds of text.

The test data may be drawn from the search behavior of a number of users whose interest preferences have been annotated manually, so that the interest preference categories of these users are known. These interest preference categories correspond to the categories in the corpus.

The user-related data contains a user ID, search phrases and the corresponding search time, for example:

User A 2012-10-10 phrase 1 phrase 3 phrase 5 phrase 4

User A 2012-10-11 phrase 6 phrase 5 phrase 3 phrase 2

User A 2012-10-12 phrase 9 phrase 8 phrase 1 phrase 4

User A 2012-10-13 phrase 2 phrase 7 phrase 3 phrase 6

In general, the data above can be regarded as a single document consisting of a user and that user's search phrases.
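The grouping that turns raw search records into per-user, per-day lines like those above could look like the following sketch. It assumes raw records arrive as hypothetical (user ID, date, search phrase) tuples; the record layout and the function name are illustrative only.

```python
from collections import defaultdict
from datetime import date


def build_behavior_documents(records):
    """Group (user_id, day, phrase) records into per-user, per-day phrase lists.

    Returns {user_id: [(day, [phrase, ...]), ...]} with days in ascending order,
    mirroring lines such as "User A 2012-10-10 phrase 1 phrase 3 ...".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for user_id, day, phrase in records:
        grouped[user_id][day].append(phrase)
    return {user: sorted(days.items()) for user, days in grouped.items()}


# Hypothetical raw records, deliberately out of date order.
raw = [
    ("User A", date(2012, 10, 11), "phrase 6"),
    ("User A", date(2012, 10, 10), "phrase 1"),
    ("User A", date(2012, 10, 10), "phrase 3"),
]
documents = build_behavior_documents(raw)
```

Keeping the date next to each day's phrases matters later, because the search time is what drives the time-decayed weight of each search feature term.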

Step 103: generate the training feature terms of each category from the training texts of that category; and

Step 104: generate the search feature terms of the user search behavior test document from the search phrases;

It should be noted that training feature terms and search feature terms may be single characters, words or phrases; the embodiments of the present application do not limit this.

Training texts and user search behavior test documents usually contain a very large number of phrases, so training and classifying on them directly would require a very large amount of computation. Therefore, before training and classification, the dimensionality of the features needs to be reduced without affecting classification accuracy.

In a preferred example of the embodiments of the present application, step 103 may include the following sub-steps:

Sub-step S11: in each category, perform word segmentation on each training document;

Sub-step S12: count the number of occurrences of each segmented word in the category;

Sub-step S13: sort the segmented words by number of occurrences, from high to low;

Sub-step S14: extract a preset number M of top-ranked segmented words together with their occurrence counts to generate the training feature terms of the category, where M is a positive integer.
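Sub-steps S11 to S14 amount to a per-category word count followed by a top-M cut, sketched below. The `segment` function is an assumption: it simply splits on whitespace, whereas real Chinese text would need a proper word segmenter, which is outside this sketch.

```python
from collections import Counter


def segment(text):
    # ASSUMPTION: whitespace splitting stands in for real word segmentation.
    return text.split()


def training_feature_terms(corpus, m):
    """corpus: {category: [training_text, ...]} -> {category: [(word, count), ...]}.

    For each category, count every segmented word over all of its training
    texts (S11, S12), sort by count in descending order (S13) and keep the
    top M words with their counts (S14).
    """
    features = {}
    for category, texts in corpus.items():
        counts = Counter()
        for text in texts:
            counts.update(segment(text))
        features[category] = counts.most_common(m)
    return features


corpus = {
    "sports": ["ball goal ball", "goal team ball"],
    "travel": ["city hotel city"],
}
features = training_feature_terms(corpus, 2)
```

`Counter.most_common(m)` performs the sort and the cut in one call; choosing M trades feature dimensionality against how much of each category's vocabulary survives.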

In a preferred example of the embodiments of the present application, step 104 may include the following sub-steps:

Sub-step S21: perform word segmentation on each search phrase;

Sub-step S22: count the number of occurrences of each segmented word;

Sub-step S23: sort the segmented words by number of occurrences, from high to low;

Sub-step S24: extract a preset number N of top-ranked segmented words to generate the search feature terms of the user search behavior test document, where N is a positive integer.
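The same counting applies to the search phrases of one line of a user search behavior test document, as in the sketch below. Whitespace splitting is again an assumed stand-in for word segmentation, and the counts are kept alongside the words to match the "word 1, count a; ..." layout shown later.

```python
from collections import Counter


def search_feature_terms(phrases, n):
    """Top-N (word, count) pairs from the search phrases of one document line.

    Segments each phrase (S21, whitespace split as an assumed stand-in),
    counts the words (S22), sorts by count (S23) and keeps the top N (S24).
    """
    counts = Counter(word for phrase in phrases for word in phrase.split())
    return counts.most_common(n)


terms = search_feature_terms(["cheap flights", "cheap hotels", "cheap flights paris"], 2)
```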

It will be appreciated that word segmentation divides the text of the training texts and user search behavior test documents into characters, words or smaller phrases; the embodiments of the present invention do not limit this.

Each category has corresponding training feature terms, and each user search behavior test document has corresponding search feature terms.

For example, generating the search feature terms of a user search behavior test document may yield data in the following format:

User A 2012-10-10 word 1, count a; word 2, count b; ...; word m, count n

User A 2012-10-11 word 1, count a; word 3, count b; ...; word g, count k

User A 2012-10-12 word 4, count a; word 2, count b; ...; word p, count q

User A 2012-10-13 word 3, count a; word 2, count b; ...; word w, count e

Step 105: configure a weight for each search feature term according to its corresponding search time, and build the user preference feature classification model from the search feature terms and the training feature terms.

It will be appreciated that once the user preference feature classification model has been built, it can be used directly to classify documents in the specified format.

In a preferred embodiment of the present application, step 105 may include the following sub-steps:

Sub-step S31: calculate the prior probability of each category, i.e., the proportion of all training texts that belong to that category;

In a preferred example of the embodiments of the present invention, the prior probability P(c_k) of a category can be obtained by maximum likelihood estimation (MLE), using the following formula:

P(c_k) = \frac{\sum_{i=1}^{|D|} P(c_k \mid d_i)}{|D|}

where D = {d_1, d_2, ..., d_{|D|}} is the set of training texts and |D| is the total number of training texts;

When training text d_i belongs to category c_k, P(c_k | d_i) = 1; when training text d_i does not belong to category c_k, P(c_k | d_i) = 0.
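Because P(c_k | d_i) is an indicator, the MLE prior reduces to counting labels, as in this sketch. It assumes each training text carries exactly one category label; the function name is hypothetical.

```python
from collections import Counter


def prior_probabilities(labels):
    """labels: the category of each training text d_1 .. d_|D|.

    P(c_k) = (1/|D|) * sum_i P(c_k | d_i), where P(c_k | d_i) is 1 when d_i
    belongs to c_k and 0 otherwise, i.e. the share of training texts in c_k.
    """
    total = len(labels)
    return {category: count / total for category, count in Counter(labels).items()}


priors = prior_probabilities(["sports", "sports", "travel", "health"])
```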

Sub-step S32: take the occurrence count of each training feature term of a category as the occurrence count, in that category, of the identical search feature term;

It will be appreciated that the number of occurrences of a search feature term in a category is its term frequency (TF). Training documents of different categories differ greatly in how often certain search feature terms occur, so term frequency is one of the important references for classifying user search behavior test documents; in general, a search feature term with a larger TF value carries a higher weight in that category.

Sub-step S33: use the occurrence counts to calculate the first conditional probability that the search feature term appears in each category;

In a preferred example of the embodiments of the present invention, the first conditional probability can be calculated with a multinomial model, using the following formula:

P(t_j \mid c_k) = \frac{1 + TF(t_j, c_k)}{|V| + \sum_{s=1}^{|V|} TF(t_s, c_k)}
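The formula is a Laplace-smoothed multinomial estimate; a sketch for one category follows. It takes |V| to be the number of training feature terms of c_k, as the text defines it, and gives a search feature term that is not among them TF = 0; the function name is hypothetical.

```python
def first_conditional_probability(term, category_features):
    """P(t_j | c_k) = (1 + TF(t_j, c_k)) / (|V| + sum of TF over c_k's terms).

    category_features: {training feature term: TF(t, c_k)} for one category c_k.
    |V| is taken as the number of training feature terms of c_k. A term absent
    from c_k gets TF = 0 and still receives a small nonzero probability.
    """
    vocab_size = len(category_features)
    total_tf = sum(category_features.values())
    return (1 + category_features.get(term, 0)) / (vocab_size + total_tf)


sports = {"ball": 3, "goal": 2}
p_ball = first_conditional_probability("ball", sports)   # (1 + 3) / (2 + 5)
p_city = first_conditional_probability("city", sports)   # (1 + 0) / (2 + 5)
```

The "+1" in the numerator is what keeps an unseen term from forcing the whole product in sub-step S35 to zero.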

Sub-step S34: configure a weight for each search feature term according to its corresponding search time;

In a preferred example of the embodiments of the present invention, sub-step S34 may further include the following steps:

Sub-step S341: count the number of training texts in which the search feature term appears;

Sub-step S342: configure the half-life of user interest;

Sub-step S343: obtain the weight by the following formula:

w_i = \frac{TF(t_i) \times IDF(t_i)}{\sqrt{\sum_{s=1}^{n} \left( TF(t_s) \times IDF(t_s) \right)^2}} \times e^{-\frac{\ln 2}{hl} \left( d^{today} - d^{last} \right)}

where hl is the half-life of user interest;

TF alone is not sufficient to express how much a search feature term contributes to classification. For example, both user search behavior test documents and training documents may contain function words that contribute nothing to classification (such as interjections, prepositions and conjunctions). These function words generally occur very frequently, i.e., their TF values are large, which harms classification. Moreover, if a search feature term with a high TF value is frequent in all documents, it is hard to tell which category's attributes it represents.

Inverse document frequency (IDF) is based on the idea that the more training documents a search feature term appears in, the less category information it carries, and hence the less important it is.

The influence factor L is set by those skilled in the art according to actual conditions; the embodiments of the present application do not limit it here.

In a preferred example of the embodiments of the present application, the value of L is 0.01.

To address the limitations described above, the embodiments of the present application therefore combine TF with IDF, i.e., use the TF-IDF weight; the normalized TF-IDF value is then multiplied by a time decay factor to obtain the final weight of the search feature term.

d^{today} is the current time, preferably a date; d^{last} is the search time of search feature term t_i closest to the current time, also preferably a date. hl is the half-life: after hl days the user's interest has decayed to half, but the decay is exponential rather than linear. The value of hl is determined by those skilled in the art according to actual conditions; the embodiments of the present invention do not limit it here.
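The weight of sub-step S343 can be sketched as below: an L2-normalized TF-IDF value multiplied by an exponential half-life decay. One part is an assumption: this excerpt names n_i, N and the influence factor L but does not give the IDF formula, so the sketch uses IDF(t_i) = log(N / n_i + L), a common TF-IDF variant in which L keeps the IDF positive even when n_i = N. The guard for n_i = 0 is likewise an assumption.

```python
import math
from datetime import date


def term_weights(tf, doc_freq, n_texts, last_search, today, half_life, influence=0.01):
    """Normalized TF-IDF weight of each search feature term times a time decay.

    tf:          {term: TF(t_i)}
    doc_freq:    {term: n_i}, number of training texts containing the term
    n_texts:     N, total number of training texts
    last_search: {term: d_last}, most recent date the term was searched
    today:       d_today
    half_life:   hl, in days
    influence:   L (0.01 in the preferred example)
    ASSUMPTION: IDF(t_i) = log(N / n_i + L); the formula itself is not given.
    """
    tfidf = {}
    for term, freq in tf.items():
        n_i = max(doc_freq.get(term, 0), 1)  # ASSUMPTION: guard against n_i = 0
        tfidf[term] = freq * math.log(n_texts / n_i + influence)
    norm = math.sqrt(sum(v * v for v in tfidf.values()))  # L2 normalization
    weights = {}
    for term, value in tfidf.items():
        age_days = (today - last_search[term]).days  # d_today - d_last, in days
        decay = math.exp(-math.log(2) / half_life * age_days)
        weights[term] = (value / norm if norm else 0.0) * decay
    return weights


w = term_weights(
    tf={"a": 1, "b": 1},
    doc_freq={"a": 2, "b": 2},
    n_texts=4,
    last_search={"a": date(2012, 10, 13), "b": date(2012, 10, 6)},
    today=date(2012, 10, 13),
    half_life=7,
)
```

With hl = 7, term "b", last searched a week earlier, keeps exactly half the weight of the otherwise identical term "a" searched today.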

Sub-step S35: use the weights and the first conditional probabilities to calculate the second conditional probability that the user search behavior test document occurs in each category;

In a preferred example of the embodiments of the present invention, the second conditional probability can be obtained by the following formula:

P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \cdot \omega_{jk}

As time passes, the weights of the user's search phrases decay accordingly.
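A product of many small factors underflows quickly in floating point, so a practical sketch evaluates the formula as written in log space. It assumes every weight is strictly positive (a zero weight would make the logarithm undefined); the function name is hypothetical.

```python
import math


def log_second_conditional_probability(first_probs, weights):
    """log P(d_i | c_k) for P(d_i | c_k) = prod_j P(t_j | c_k) * w_jk.

    first_probs: {term: P(t_j | c_k)} for one category c_k
    weights:     {term: w_jk}, assumed strictly positive
    Summing logarithms instead of multiplying avoids underflow for long
    documents and leaves the later argmax over categories unchanged.
    """
    return sum(math.log(first_probs[t]) + math.log(weights[t]) for t in weights)


log_p = log_second_conditional_probability({"a": 0.5, "b": 0.25}, {"a": 0.8, "b": 0.5})
```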

Sub-step S36: use the prior probabilities and the second conditional probabilities to calculate the posterior probability that the user search behavior test document belongs to each category;

In a preferred example of the embodiments of the present invention, the posterior probability can be obtained by the following formula:

P(c_k \mid d_i) = \frac{P(c_k)\, P(d_i \mid c_k)}{P(d_i)}

where P(c_k) is the prior probability, P(d_i | c_k) is the second conditional probability, and P(d_i) is the probability of occurrence of the user search behavior test document.

Sub-step S37: take the category with the maximum posterior probability as the category to which the user search behavior test document is assigned;

In a preferred example of the embodiments of the present invention, the category with the maximum posterior probability is obtained by the following formula:

class(d_i) = \arg\max_{1 \le k \le |C|} P(c_k \mid d_i) = \arg\max_{1 \le k \le |C|} P(c_k)\, P(d_i \mid c_k)

In a preferred example of the embodiments of the present invention, because P(d_i) is the same for every category, obtaining class(d_i) only requires finding the category that maximizes

\arg\max_{1 \le k \le |C|} P(c_k)\, P(d_i \mid c_k)
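Sub-steps S36 and S37 can be sketched in log space as follows. `classify` drops P(d_i) as the text allows; `posteriors` is an optional extra that recovers normalized values by assuming the categories are exhaustive, so that P(d_i) is the sum over k of P(c_k) P(d_i | c_k). Both function names are hypothetical.

```python
import math


def classify(log_priors, log_likelihoods):
    """class(d_i) = argmax_k [log P(c_k) + log P(d_i | c_k)].

    P(d_i) is shared by every category, so it is not needed to pick the winner.
    """
    return max(log_priors, key=lambda c: log_priors[c] + log_likelihoods[c])


def posteriors(log_priors, log_likelihoods):
    """P(c_k | d_i) by Bayes' rule, assuming P(d_i) = sum_k P(c_k) P(d_i | c_k)."""
    joint = {c: log_priors[c] + log_likelihoods[c] for c in log_priors}
    peak = max(joint.values())  # subtract the maximum before exp (log-sum-exp)
    log_evidence = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {c: math.exp(v - log_evidence) for c, v in joint.items()}


lp = {"sports": math.log(0.5), "travel": math.log(0.5)}
ll = {"sports": math.log(0.01), "travel": math.log(0.03)}
winner = classify(lp, ll)
post = posteriors(lp, ll)
```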

Sub-step S38: judge, from the category to which the user search behavior test document originally belonged and the category assigned by the current calculation, whether a preset condition is met; if so, obtain the final user preference feature classification model; if not, return to sub-step S34.

In a preferred embodiment of the present application, recall and precision can be used as the judgment criteria.

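For the given user search behavior test documents and each category, the calculation results can be tallied as follows: a is the number of test documents assigned to the category that actually belong to it; b is the number of test documents assigned to the category that do not belong to it; c is the number of test documents that belong to the category but were not assigned to it.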

Then recall is r = a/(a+c) and precision is p = a/(a+b).

The F₁ measure is F₁ = 2pr/(p+r).

Compute the F₁ value of each category and then the mean of these F₁ values; this mean can be used to evaluate the user preference feature classification model as a whole.
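The evaluation above can be sketched as follows, assuming each test document's known category is the ground truth and that a zero denominator yields a score of 0 (a convention the source does not specify). The counts a, b and c follow the definitions given for the tally; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def macro_f1(true_labels: List[str],
             predicted_labels: List[str]) -> Tuple[Dict[str, float], float]:
    """Per-category F1 = 2pr/(p+r) and their mean, using the a/b/c counts.

    a: assigned to the category and actually in it
    b: assigned to the category but not in it
    c: in the category but not assigned to it
    """
    pairs = list(zip(true_labels, predicted_labels))
    categories = sorted(set(true_labels) | set(predicted_labels))
    per_category: Dict[str, float] = {}
    for cat in categories:
        a = sum(1 for truth, pred in pairs if pred == cat and truth == cat)
        b = sum(1 for truth, pred in pairs if pred == cat and truth != cat)
        c = sum(1 for truth, pred in pairs if pred != cat and truth == cat)
        r = a / (a + c) if a + c else 0.0  # recall
        p = a / (a + b) if a + b else 0.0  # precision
        per_category[cat] = 2 * p * r / (p + r) if p + r else 0.0
    mean = sum(per_category.values()) / len(per_category) if per_category else 0.0
    return per_category, mean
```

For true categories ["x", "x", "y", "y"] and predictions ["x", "y", "y", "y"], category x has p = 1 and r = 0.5 (F₁ = 2/3), category y has p = 2/3 and r = 1 (F₁ = 0.8), and the mean F₁ is about 0.733.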

The evaluation criteria and the preset condition can be set by those skilled in the art according to actual conditions; the embodiment of the present application is not limited in this respect.

Of course, the above judgment method is merely an example; when implementing the embodiment of the present application, other judgment methods may be adopted according to actual conditions, and the embodiment of the present application is not limited in this respect.

When the preset condition is met, the final user preference feature classification model is obtained; if not, the weights are reconfigured by resetting the half-life hl of user interest, and text training is carried out again.
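The retraining loop of sub-steps S34 to S38 can be sketched as follows, under stated assumptions: `classify_all` is a hypothetical callback that retrains with a given half-life hl and returns each test document's assigned category, the candidate half-lives are supplied by the caller, and the preset condition is passed in as a function (for example "assignments unchanged" or "mean F₁ above a threshold"). The source leaves all three choices to the implementer.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Assignment = Dict[str, str]  # test document ID -> assigned category


def train_until_condition(
        classify_all: Callable[[float], Assignment],
        candidate_half_lives: Iterable[float],
        condition_met: Callable[[Assignment, Assignment], bool],
) -> Optional[Tuple[float, Assignment]]:
    """Reconfigure hl and retrain until the preset condition holds.

    Returns the accepted half-life and its assignment, or None if no
    candidate satisfies the condition.
    """
    previous: Optional[Assignment] = None
    for hl in candidate_half_lives:
        current = classify_all(hl)  # sub-steps S34 to S37 with this hl
        if previous is not None and condition_met(previous, current):
            return hl, current      # S38 "yes": final model accepted
        previous = current          # S38 "no": reconfigure hl and retrain
    return None
```

Passing the condition as a function keeps the loop independent of whichever evaluation criterion a practitioner chooses.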

Referring to Fig. 2, a flow chart of the steps of an embodiment of a preference feature classification method based on user search behavior according to the embodiment of the present application is shown. The method may specifically comprise the following steps:

Step 201: collect the user's original search behavior information and generate a user search behavior document from it; the user search behavior test document contains a user ID, search phrases and the corresponding search times;

Step 202: calculate the user's preference feature category using the user search behavior document and the user preference feature classification model;

Step 202 may specifically comprise the following sub-steps:

Sub-step 2021: obtain a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

Sub-step 2022: generate the training feature items of each category from that category's training texts, and generate the search feature items of the user search behavior test document from the search phrases;

Sub-step 2023: configure weights for the search feature items according to the corresponding search times, and build the user preference feature classification model according to the search feature items and the training feature items.

The embodiment of the present application may generate the user search behavior document from all of the collected original search behavior information of the user or from only part of it; the embodiment of the present application is not limited in this respect.

In the embodiment of the present application, the way the user preference feature classification model is obtained is substantially similar to that in the text training method embodiment, so the description here is relatively brief; for relevant details, refer to the corresponding description of the text training method embodiment, which is not repeated here.

Referring to Fig. 3, a flow chart of the steps of an embodiment of a personalized recommendation method based on user search behavior according to the embodiment of the present application is shown. The method may specifically comprise the following steps:

Step 301: obtain the user's behavior information, which includes a user ID;

Step 302: determine the user's preference feature category according to the user ID;

Step 303: generate corresponding personalized recommendation information using the preference category;

Step 304: recommend the personalized recommendation information to the current user;

Step 302 may comprise the following sub-steps:

Sub-step S41: collect the user's original search behavior information and generate a user search behavior document from it; the user search behavior test document contains a user ID, search phrases and the corresponding search times;

Sub-step S42: calculate the user's preference category using the user search behavior document and the user preference feature classification model;

Sub-step S42 may further comprise the following sub-steps:

Sub-step S421: obtain a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

Sub-step S422: generate the training feature items of each category from that category's training texts, and generate the search feature items of the user search behavior test document from the search phrases;

Sub-step S423: configure weights for the search feature items according to the corresponding search times, and build the user preference feature classification model according to the search feature items and the training feature items.

In a specific implementation, users can be assigned in advance to the categories corresponding to their search behavior, and an association between each category and the user ID can be established. When a user visits, the user's category can be obtained directly from the user ID and the recommendation information for that category then retrieved, without classifying the user's search behavior again. This saves system resources and makes personalized recommendation more efficient.
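The request-time lookup described above can be sketched as two dictionary lookups, assuming the user-to-category association and the per-category recommendation lists were built offline. The fallback list for users without a precomputed category is an assumption added for illustration; all names are hypothetical.

```python
from typing import Dict, List


def recommend_for_user(user_id: str,
                       user_category: Dict[str, str],
                       category_items: Dict[str, List[str]],
                       fallback: List[str]) -> List[str]:
    """Look up the precomputed category for user_id, then its recommendations.

    user_category is built offline from search behavior, so no
    classification runs at request time.
    """
    category = user_category.get(user_id)
    if category is None:
        # Unknown user: no precomputed category yet
        return list(fallback)
    return list(category_items.get(category, fallback))
```

Because classification happens offline, each request costs only constant-time lookups, which is the resource saving the paragraph describes.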

It should be understood that, for simplicity of description, the method embodiments are expressed as a series of combined actions; however, those skilled in the art should appreciate that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the embodiments of the present application.

Referring to Fig. 4, a structural block diagram of an embodiment of a text training system provided by the embodiment of the present application is shown. The system may specifically comprise the following modules:

Corpus acquisition module 401, for obtaining a corpus; and

Test document acquisition module 402, for obtaining a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;

Training feature item generation module 403, for generating the training feature items of each category from that category's training texts; and

Search feature item generation module 404, for generating the search feature items of the user search behavior test document from the search phrases;

User preference feature classification model construction module 405, for configuring weights for the search feature items according to the corresponding search times, and building the user preference feature classification model according to the search feature items and the training feature items.

In a preferred embodiment of the present application, the training feature item generation module may comprise the following submodules:

First word segmentation submodule, for performing word segmentation on each training document in each category;

First frequency statistics submodule, for counting the number of occurrences of each segmented word in the category;

First sorting submodule, for sorting the segmented words in descending order of occurrence count;

First generation submodule, for extracting the top M segmented words, M being a predetermined number, together with their occurrence counts, to generate the training feature items of the category, where M is a positive integer.
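The four submodules above (segment, count, sort, take the top M) can be sketched per category as follows. Whitespace splitting stands in for Chinese word segmentation here; a real system would pass its own segmenter through the hypothetical `tokenize` parameter. The corpus layout (category name mapped to a list of training documents) is also an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def training_feature_items(
        corpus: Dict[str, List[str]],
        m: int,
        tokenize: Callable[[str], List[str]] = str.split,
) -> Dict[str, List[Tuple[str, int]]]:
    """For each category, return its top-M segmented words with counts."""
    if m <= 0:
        raise ValueError("M must be a positive integer")
    features: Dict[str, List[Tuple[str, int]]] = {}
    for category, docs in corpus.items():
        counts: Counter = Counter()
        for doc in docs:
            counts.update(tokenize(doc))  # segment each training document
        # most_common sorts by count, highest first, and keeps the top M
        features[category] = counts.most_common(m)
    return features
```

For a "sports" category with the documents "ball goal ball" and "goal team ball", M = 2 yields [("ball", 3), ("goal", 2)].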

In a preferred embodiment of the present application, the search feature item generation module may comprise the following submodules:

Second word segmentation submodule, for performing word segmentation on each search phrase;

Second frequency statistics submodule, for counting the number of occurrences of each segmented word;

Second sorting submodule, for sorting the segmented words in descending order of occurrence count;

Second generation submodule, for extracting the top N segmented words, N being a predetermined number, to generate the search feature items of the user search behavior test document, where N is a positive integer.
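A sketch of the search-side counterpart, grouping search records by user ID before taking the top N segmented words. The record layout (user ID, search phrase, search date) mirrors the fields listed for the test document but is an assumed representation; whitespace splitting again stands in for a real segmenter.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: (user ID, search phrase, search date)
SearchRecord = Tuple[str, str, date]


def search_feature_items(
        records: Iterable[SearchRecord],
        n: int,
        tokenize: Callable[[str], List[str]] = str.split,
) -> Dict[str, List[str]]:
    """For each user ID, return the top-N segmented words of their searches."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user_id, phrase, _search_date in records:
        counts[user_id].update(tokenize(phrase))  # segment each search phrase
    return {user: [word for word, _ in c.most_common(n)]
            for user, c in counts.items()}
```

The search date is carried along but unused here; it only matters later, when the time-decayed weights are configured.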

In a preferred embodiment of the present application, the user preference feature classification model construction module may comprise the following submodules:

Prior probability calculation submodule, for calculating the prior probability of each category as the ratio of that category's training texts to all training texts;

Third frequency statistics submodule, for taking the occurrence count of each training feature item of each category as the occurrence count, in that category, of the search feature item identical to that training feature item;

First conditional probability calculation submodule, for calculating, from the occurrence counts, the first conditional probability that a search feature item appears in each category;

Weight configuration submodule, for configuring weights for the search feature items according to the corresponding search times;

Second conditional probability calculation submodule, for calculating, from the weights and the first conditional probabilities, the second conditional probability that the user search behavior test document occurs in each category;

Posterior probability calculation submodule, for calculating, from the prior probabilities and the second conditional probabilities, the posterior probability that the user search behavior test document belongs to each category;

Extraction submodule, for taking the category corresponding to the maximum posterior probability as the category to which the user search behavior test document belongs;

Judgment submodule, for determining, according to the category to which the user search behavior test document previously belonged and the currently calculated category, whether a preset condition is met; if so, calling the acquisition submodule; if not, calling the return submodule;

Acquisition submodule, for obtaining the final user preference feature classification model;

Return submodule, for returning to the weight configuration submodule.

In a preferred example of the embodiment of the present application, the prior probability can be obtained by the following formula:

P(c_k) = \frac{\sum_{i=1}^{|D|} P(c_k \mid d_i)}{|D|}
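For labeled training texts, a natural reading (assumed here, and consistent with the prior probability calculation submodule described above) is that P(c_k | d_i) is 1 when training text d_i belongs to c_k and 0 otherwise, so the formula reduces to each category's share of the |D| training texts. A minimal sketch with a hypothetical function name:

```python
from typing import Dict, List


def prior_probs(corpus: Dict[str, List[str]]) -> Dict[str, float]:
    """P(c_k) = sum_i P(c_k | d_i) / |D| with P(c_k | d_i) in {0, 1}.

    For labeled training texts this is each category's share of |D|.
    """
    total = sum(len(docs) for docs in corpus.values())  # |D|
    if total == 0:
        raise ValueError("corpus contains no training texts")
    return {category: len(docs) / total for category, docs in corpus.items()}
```

A corpus with three "sports" texts and one "news" text gives priors of 0.75 and 0.25.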

In a preferred example of the embodiment of the present application, the first conditional probability can be obtained by the following formula:

P(t_j \mid c_k) = \frac{1 + TF(t_j, c_k)}{|V| + \sum_{s=1}^{|V|} TF(t_s, c_k)}
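This formula is add-one (Laplace) smoothing: a search feature item that never occurs in c_k still receives a small non-zero probability, so a single unseen item cannot drive the whole product to zero. Following claim 4, |V| is taken as the number of training feature items in c_k. The dictionary representation below is an assumption for illustration.

```python
from typing import Dict


def first_conditional_prob(term: str, category_tf: Dict[str, int]) -> float:
    """P(t_j | c_k) = (1 + TF(t_j, c_k)) / (|V| + sum_s TF(t_s, c_k)).

    category_tf maps each training feature item of c_k to its count, so
    |V| = len(category_tf). Unseen terms get 1 / (|V| + total count).
    """
    vocab_size = len(category_tf)            # |V|
    total_count = sum(category_tf.values())  # sum_s TF(t_s, c_k)
    if vocab_size + total_count == 0:
        raise ValueError("category has no training feature items")
    return (1 + category_tf.get(term, 0)) / (vocab_size + total_count)
```

With counts {"ball": 3, "goal": 2, "team": 1}, the denominator is 3 + 6 = 9, so P("ball" | c_k) = 4/9 and an unseen item such as "tennis" gets 1/9.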

In a preferred example of the embodiment of the present application, the weight configuration submodule may further comprise the following submodules:

Statistics submodule, for counting the number of training texts in which the search feature item appears;

Half-life configuration submodule, for configuring the half-life of user interest;

Weight calculation submodule, for obtaining the weight using the following formula:

w_i = \frac{TF(t_i) \times IDF(t_i)}{\sqrt{\sum_{s=1}^{n} \left(TF(t_s) \times IDF(t_s)\right)^{2}}} \cdot e^{-\frac{\ln 2}{hl}\left(d^{today} - d^{lati}\right)}

where hl is the half-life of user interest.
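A minimal sketch of the weight formula, under assumptions that the source does not pin down: IDF(t) is taken as log(|D| / df(t)), where df(t) is the number of training texts containing t (the count the statistics submodule produces); hl is measured in days; and d^{lati} is read as the date on which search feature item t_i was last searched. The L2-normalized TF-IDF factor is multiplied by an exponential decay that halves the weight every hl days.

```python
import math
from datetime import date
from typing import Dict


def decayed_weights(tf: Dict[str, int],
                    doc_freq: Dict[str, int],
                    num_docs: int,
                    last_search: Dict[str, date],
                    today: date,
                    half_life_days: float) -> Dict[str, float]:
    """w_i = normalized TF-IDF(t_i) * exp(-ln2 / hl * (today - last_i)).

    IDF(t) = log(num_docs / doc_freq[t]) is an assumed definition.
    """
    if half_life_days <= 0:
        raise ValueError("half-life must be positive")
    tfidf = {}
    for term, count in tf.items():
        df = doc_freq.get(term, 0)
        if df <= 0:
            raise ValueError(f"no training text contains {term!r}")
        tfidf[term] = count * math.log(num_docs / df)
    norm = math.sqrt(sum(v * v for v in tfidf.values()))  # L2 norm
    if norm == 0.0:
        raise ValueError("all TF-IDF values are zero")
    weights = {}
    for term, value in tfidf.items():
        age_days = (today - last_search[term]).days
        decay = math.exp(-math.log(2) / half_life_days * age_days)
        weights[term] = value / norm * decay
    return weights
```

With two items of equal TF-IDF, hl = 7 days, one searched today and one searched seven days ago, the weights are about 0.707 and 0.354: the older item keeps exactly half of its normalized weight.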

In a preferred example of the embodiment of the present application, the second conditional probability can be obtained by the following formula:

P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \cdot \omega_{jk}

In a preferred example of the embodiment of the present application, the posterior probability can be obtained by the following formula:

P(c_k \mid d_i) = \frac{P(c_k)\,P(d_i \mid c_k)}{P(d_i)}

In a preferred example of the embodiment of the present application, the category corresponding to the maximum posterior probability can be obtained by the following formula:

\mathrm{class}(d_i) = \arg\max_{1 \le k \le |c|} P(c_k \mid d_i) = \arg\max_{1 \le k \le |c|} P(c_k)\,P(d_i \mid c_k)

Referring to Fig. 5, a structural block diagram of an embodiment of a preference feature classification system based on user search behavior provided by the embodiment of the present application is shown. The system may specifically comprise the following modules:

User search behavior document generation module 501, for collecting the user's original search behavior information and generating a user search behavior document from it; the user search behavior test document contains a user ID, search phrases and the corresponding search times;

Preference feature classification module 502, for calculating the user's preference feature category using the user search behavior document and the user preference feature classification model;

The preference feature classification module comprises the following submodules:

Corpus acquisition submodule, for obtaining a corpus; and

Referring to Fig. 6, a structural block diagram of an embodiment of a personalized recommendation system based on user search behavior provided by the embodiment of the present application is shown. The system may specifically comprise the following modules:

User behavior information acquisition module 601, for obtaining the user's behavior information, which includes a user ID;

Preference feature classification determination module 602, for determining the user's preference feature category according to the user ID;

Personalized recommendation information generation module 603, for generating corresponding personalized recommendation information using the preference category;

Recommendation module 604, for recommending the personalized recommendation information to the current user;

The preference feature classification module comprises the following submodules:

Corpus acquisition module, for obtaining a corpus; and

Since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding description of the method embodiments.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.

The embodiments of the present application are described with reference to flow charts and/or block diagrams of methods, terminal devices (systems) and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks in the flow charts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art, once they learn the basic inventive concept, can make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present application.

Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or terminal device comprising that element.

A text training method, a text training system, a preference feature classification method based on user search behavior, a preference feature classification system based on user search behavior, a personalized recommendation method based on user search behavior and a personalized recommendation system based on user search behavior provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the embodiments of the present application; the description of the above embodiments is only intended to help understand the method of the embodiments of the present application and its core idea. Meanwhile, those of ordinary skill in the art may make changes to the specific implementations and the scope of application in accordance with the idea of the embodiments of the present application. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.

Claims

1. A text training method, characterized by comprising:

2. The method according to claim 1, characterized in that the step of generating the training feature items of each category from that category's training texts comprises:

In each category, performing word segmentation on each training document;

Performing word segmentation on each search phrase;

Counting the number of occurrences of each segmented word;

3. The method according to claim 1 or 2, characterized in that the step of configuring weights for the search feature items according to the corresponding search times, and building the user classification model according to the search feature items and the training feature items, comprises:

4. The method according to claim 3, characterized in that the first conditional probability is obtained by the following formula:

P(t_j \mid c_k) = \frac{1 + TF(t_j, c_k)}{|V| + \sum_{s=1}^{|V|} TF(t_s, c_k)}

where TF(t_j, c_k) is the number of occurrences of search feature item t_j in category c_k, and |V| is the total number of training feature items in category c_k.

5. The method according to claim 3, characterized in that the step of configuring weights for the search feature items according to the corresponding search times comprises:

Configuring the half-life of user interest;

Obtaining the weight using the following formula:

w_i = \frac{TF(t_i) \times IDF(t_i)}{\sqrt{\sum_{s=1}^{n} \left(TF(t_s) \times IDF(t_s)\right)^{2}}} \cdot e^{-\frac{\ln 2}{hl}\left(d^{today} - d^{lati}\right)}

where hl is the half-life of user interest;

6. The method according to claim 3, characterized in that the second conditional probability is obtained by the following formula:

P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \cdot \omega_{jk}

7. The method according to claim 3, characterized in that the category corresponding to the maximum posterior probability is obtained by the following formula:

\mathrm{class}(d_i) = \arg\max_{1 \le k \le |c|} P(c_k \mid d_i) = \arg\max_{1 \le k \le |c|} P(c_k)\,P(d_i \mid c_k)

8. A preference feature classification method based on user search behavior, characterized by comprising:

9. A personalized recommendation method based on user search behavior, characterized by comprising:

Determining the user's preference feature category according to the user ID;

Wherein the user's preference feature category is generated as follows:

10. A text training system, characterized by comprising:

Corpus acquisition module, for obtaining a corpus; and

11. A preference feature classification system based on user search behavior, characterized by comprising:

Wherein the preference feature classification module comprises the following submodules:

Corpus acquisition submodule, for obtaining a corpus; and

12. A personalized recommendation system based on user search behavior, characterized by comprising:

Wherein the preference feature classification module comprises the following submodules:

Corpus acquisition module, for obtaining a corpus; and