CN103440242A - User search behavior-based personalized recommendation method and system - Google Patents

Publication number
CN103440242A
Authority
CN
China
Prior art keywords
search
user
classification
characteristics item
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013102600388A
Other languages
Chinese (zh)
Inventor
罗峰
黄苏支
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING IZP TECHNOLOGIES Co Ltd
Original Assignee
BEIJING IZP TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING IZP TECHNOLOGIES Co Ltd
Priority to CN2013102600388A
Publication of CN103440242A

Abstract

The embodiment of the invention provides a text training method. The method comprises the following steps of: acquiring a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, and the user search behavior test document belongs to the category and comprises a user identity, a search phrase and corresponding search time; generating a training characteristic item of the category by using the training text of each category and generating a search characteristic item of the user search behavior test document by using the search phrase; configuring weight according to the search characteristic item and the corresponding search time and constructing a user preference characteristic classification model according to the search characteristic item and the training characteristic item. The embodiment of the invention has the characteristics of easiness in calculation, short calculation time, high calculation accuracy, high matching rate for corresponding recommendation and high success rate for recommendation.

Description

Personalized recommendation method and system based on user search behavior
Technical field
The embodiments of the present application relate to the technical field of data processing, and in particular to a text training method, a text training system, a preference characteristic classification method based on user search behavior, a preference characteristic classification system based on user search behavior, a personalized recommendation method based on user search behavior, and a personalized recommendation system based on user search behavior.
Background
The rapid development of the Internet has brought people into the information society and the age of the Internet economy, and has profoundly affected both the development of enterprises and personal life. At the same time, the excess of information prevents people from efficiently finding the part they need, so the efficiency with which information is used actually decreases.
When people want information that matches their preferences, they often have to search manually and then filter out irrelevant information before they obtain it. Clearly, people are unwilling to spend a great deal of time looking for preferred information in the endless expanse of the Internet; they would rather have a system automatically recommend information according to their own preferences. Computing the classification of a user's interest preferences is therefore very important.
At present, interest preferences can be classified according to the website channels or webpages that a user visits. The steps are:
(1) Manually label each channel or webpage with the preference type of its audience;
(2) Count the channels or webpages that the user visits and the number of visits, sort them in descending order of visit count, and take the top N channels or webpages, where N is a positive integer;
(3) If the user has visited a webpage of a certain channel, recommend other webpages in the channels obtained above, or recommend other webpages of the same channel among the webpages obtained above.
The accuracy of classification under this method depends on the granularity with which the website is divided into channels; when the granularity is too coarse, classification accuracy suffers.
Therefore, a technical problem that those skilled in the art urgently need to solve is how to provide a mechanism for computing the classification of user interest characteristics that has high classification accuracy, so that targeted services can be provided based on the computed result and service efficiency can be improved.
Summary
The technical problem to be solved by the embodiments of the present application is to provide a feature extraction method based on user behavior and a personalized recommendation method based on user behavior, which can divide users into groups with similar preferences based on the users' behavior information and extract the features of each user group, so that these features distinguish the different groups and personalized recommendation can be carried out quickly and efficiently according to the corresponding features.
Correspondingly, the embodiments of the present application also provide a feature extraction system based on user behavior and a personalized recommendation system based on user behavior, so as to ensure the implementation and application of the above methods.
An embodiment of the present application discloses a text training method, comprising:
Obtaining a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
Generating training characteristic items of each category from the training texts of that category, and generating search characteristic items of the user search behavior test document from the search phrases;
Configuring weights for the search characteristic items according to the corresponding search times, and constructing a user preference characteristic classification model according to the search characteristic items and the training characteristic items.
Preferably, the step of generating the training characteristic items of each category from the training texts of that category comprises:
In each category, performing word segmentation on each training document;
Counting the frequency of occurrence of each segmented word in the category;
Sorting the segmented words from high to low by frequency of occurrence;
Extracting the first M segmented words, M being a preset number, together with their frequencies of occurrence, to generate the training characteristic items of the category, wherein M is a positive integer;
The step of generating the search characteristic items of the user search behavior test document from the search phrases comprises:
Performing word segmentation on each search phrase;
Counting the frequency of occurrence of each segmented word;
Sorting the segmented words from high to low by frequency of occurrence;
Extracting the first N segmented words, N being a preset number, to generate the search characteristic items of the user search behavior test document, wherein N is a positive integer.
Preferably, the step of configuring weights for the search characteristic items according to the corresponding search times, and constructing the user preference characteristic classification model according to the search characteristic items and the training characteristic items, comprises:
Calculating the prior probability of each category as the proportion of all training texts that belong to that category;
Taking the frequency of occurrence of each training characteristic item of a category as the frequency of occurrence, in that category, of the search characteristic item identical to that training characteristic item;
Using the frequencies of occurrence to calculate the first conditional probability that each search characteristic item appears in each category;
Configuring a weight for each search characteristic item according to its search time;
Using the weights and the first conditional probabilities to calculate the second conditional probability that the user search behavior test document appears in each category;
Using the prior probabilities and the second conditional probabilities to calculate the posterior probability that the user search behavior test document belongs to each category;
Taking the category with the largest posterior probability as the category to which the user search behavior test document belongs;
Judging, from the category to which the user search behavior test document originally belongs and the currently calculated category, whether a preset condition is met; if so, obtaining the final user preference characteristic classification model; if not, returning to the sub-step of configuring a weight for each search characteristic item according to its search time.
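The final sub-step above loops back to weight configuration until a preset condition holds. A minimal sketch of such a loop follows, under two assumptions that the text does not state: the preset condition is taken to be a minimum fraction of test documents whose calculated category equals their original category, and the quantity reconfigured on each pass is the half-life of user interest. The names `fit_half_life`, `classify_with`, and `min_accuracy` are hypothetical.

```python
def fit_half_life(candidates, classify_with, labeled_docs, min_accuracy):
    """Return the first half-life whose classifications meet the preset condition.

    candidates: half-life values to try, in order (assumption).
    classify_with: callable (doc, half_life) -> calculated category (hypothetical).
    labeled_docs: non-empty list of (doc, original_category) pairs.
    min_accuracy: assumed preset condition, a fraction between 0 and 1.
    """
    for half_life in candidates:
        # Count documents whose calculated category matches the original one.
        hits = sum(
            1 for doc, original in labeled_docs
            if classify_with(doc, half_life) == original
        )
        accuracy = hits / len(labeled_docs)
        if accuracy >= min_accuracy:
            return half_life, accuracy  # condition met: keep this configuration
    return None  # no candidate met the condition
```

Returning `None` when no candidate qualifies is a design choice of this sketch; the text itself does not say what happens if the condition is never met.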
Preferably, the first conditional probability is obtained by the following formula:
$$P(t_j \mid c_k) = \frac{1 + TF(t_j, c_k)}{|V| + \sum_{j=1}^{|V|} TF(t_j, c_k)}$$
Wherein TF(t_j, c_k) is the frequency of occurrence of search characteristic item t_j in category c_k, and |V| is the total number of training characteristic items in category c_k.
Preferably, the step of configuring a weight for each search characteristic item according to its search time comprises:
Counting the number of training texts in which the search characteristic item appears;
Configuring the half-life of user interest;
Obtaining the weight by the following formula:
$$w_i = \frac{TF(t_i) \times IDF(t_i)}{\sum_{i=1}^{n} \left( TF(t_i) \times IDF(t_i) \right)^2} \cdot e^{-\frac{\ln 2}{hl} (d_{today} - d_{last})}$$
Wherein TF(t_i) is the frequency of occurrence of search characteristic item t_i in category c_k;
hl is the half-life of the user interest;
d_today is the current time, and d_last is the search time of search characteristic item t_i nearest to the current time;
IDF(t_i) is given by a formula supplied as an image (Figure BDA00003415377400042), wherein n_i is the number of training texts in which search characteristic item t_i appears, N is the total number of training texts, and L is an influence factor.
Preferably, the second conditional probability is obtained by the following formula:
$$P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \cdot \omega_{jk}$$
Wherein n is the total number of search characteristic items, P(t_j | c_k) is the first conditional probability, and ω_jk is the weight of search characteristic item t_j.
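As an illustration of the second conditional probability, the sketch below multiplies, for every search characteristic item, its first conditional probability by its weight and takes the product over all items, reading the formula literally. The function name and the dictionary layout are assumptions made for the example.

```python
import math

def second_conditional_probability(first_probs, weights):
    """Product over all search characteristic items of P(t_j | c_k) * w_j.

    first_probs: dict term -> first conditional probability for one category.
    weights: dict term -> weight of that term (same keys as first_probs).
    """
    return math.prod(first_probs[term] * weights[term] for term in first_probs)

# Two items with probabilities 0.5 and 0.25 and weights 0.4 and 0.2.
likelihood = second_conditional_probability(
    {"a": 0.5, "b": 0.25}, {"a": 0.4, "b": 0.2}
)
```

With many characteristic items the product becomes very small, so a practical implementation might sum logarithms instead; that variation is not described in the text.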
Preferably, the category corresponding to the maximum posterior probability is obtained by the following formula:
$$class(d_i) = \arg\max_{1 \le k \le |C|} \{ P(c_k \mid d_i) \} = \arg\max_{1 \le k \le |C|} \{ P(c_k) \, P(d_i \mid c_k) \}$$
Wherein |C| is the total number of categories, P(c_k | d_i) is the posterior probability, P(c_k) is the prior probability, and P(d_i | c_k) is the second conditional probability.
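The selection of the category with the maximum posterior probability can be sketched as follows. The function takes the prior probability and the second conditional probability of each category as plain dictionaries; the names and the sample values are illustrative assumptions.

```python
def classify(priors, likelihoods):
    """Return the category c_k that maximizes P(c_k) * P(d_i | c_k).

    priors: dict category -> prior probability P(c_k).
    likelihoods: dict category -> second conditional probability P(d_i | c_k).
    """
    return max(priors, key=lambda category: priors[category] * likelihoods[category])

# Equal priors, so the category with the larger likelihood is selected.
best = classify(
    {"sports": 0.5, "travel": 0.5},
    {"sports": 0.01, "travel": 0.03},
)
```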
An embodiment of the present application discloses a preference characteristic classification method based on user search behavior, comprising:
Collecting the user's original search behavior information, and generating a user search behavior document from the user's original search behavior information, wherein the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
Calculating the user's preference characteristic classification using the user search behavior document and a user preference characteristic classification model;
Wherein the user preference characteristic classification model is generated in the following manner:
Obtaining a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
Generating training characteristic items of each category from the training texts of that category, and generating search characteristic items of the user search behavior test document from the search phrases;
Configuring weights for the search characteristic items according to the corresponding search times, and constructing the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
An embodiment of the present application discloses a personalized recommendation method based on user search behavior, comprising:
Obtaining the user's behavior information, wherein the user's behavior information comprises a user identifier;
Determining the user's preference characteristic classification according to the user identifier;
Generating corresponding personalized recommendation information using the preference classification;
Making recommendations to the current user using the personalized recommendation information;
Wherein the user's preference characteristic classification is generated in the following manner:
Collecting the user's original search behavior information, and generating a user search behavior document from the user's original search behavior information, wherein the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
Calculating the user's preference classification using the user search behavior document and a user preference characteristic classification model;
Wherein the user preference characteristic classification model is generated in the following manner:
Obtaining a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
Generating training characteristic items of each category from the training texts of that category, and generating search characteristic items of the user search behavior test document from the search phrases;
Configuring weights for the search characteristic items according to the corresponding search times, and constructing the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
An embodiment of the present application discloses a text training system, comprising:
A corpus acquisition module, configured to obtain a corpus; and
A test document acquisition module, configured to obtain a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
A training characteristic item generation module, configured to generate training characteristic items of each category from the training texts of that category; and
A search characteristic item generation module, configured to generate search characteristic items of the user search behavior test document from the search phrases;
A user preference characteristic classification model construction module, configured to configure weights for the search characteristic items according to the corresponding search times, and to construct a user preference characteristic classification model according to the search characteristic items and the training characteristic items.
An embodiment of the present application discloses a preference characteristic classification system based on user search behavior, comprising:
A user search behavior document creation module, configured to collect the user's original search behavior information and to generate a user search behavior document from the user's original search behavior information, wherein the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
A preference characteristic classification module, configured to calculate the user's preference characteristic classification using the user search behavior document and a user preference characteristic classification model;
Wherein the preference characteristic classification module comprises the following submodules:
A corpus acquisition submodule, configured to obtain a corpus; and
A test document acquisition submodule, configured to obtain a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
A training characteristic item generation submodule, configured to generate training characteristic items of each category from the training texts of that category; and
A search characteristic item generation submodule, configured to generate search characteristic items of the user search behavior test document from the search phrases;
A user preference characteristic classification model construction module, configured to configure weights for the search characteristic items according to the corresponding search times, and to construct the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
An embodiment of the present application discloses a personalized recommendation system based on user search behavior, comprising:
A user behavior information acquisition module, configured to obtain the user's behavior information, wherein the user's behavior information comprises a user identifier;
A preference characteristic classification determination module, configured to determine the user's preference characteristic classification according to the user identifier;
A personalized recommendation information generation module, configured to generate corresponding personalized recommendation information using the preference classification;
A recommendation module, configured to make recommendations to the current user using the personalized recommendation information;
Wherein the preference characteristic classification determination module comprises the following submodules:
A user search behavior document generation submodule, configured to collect the user's original search behavior information and to generate a user search behavior document from the user's original search behavior information, wherein the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
A preference characteristic classification submodule, configured to calculate the user's preference characteristic classification using the user search behavior document and a user preference characteristic classification model;
Wherein the preference characteristic classification module comprises the following submodules:
A corpus acquisition module, configured to obtain a corpus; and
A test document acquisition module, configured to obtain a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
A training characteristic item generation module, configured to generate training characteristic items of each category from the training texts of that category; and
A search characteristic item generation module, configured to generate search characteristic items of the user search behavior test document from the search phrases;
A user preference characteristic classification model construction module, configured to configure weights for the search characteristic items according to the corresponding search times, and to construct the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
Compared with the background art, the embodiments of the present application have the following advantages:
The embodiments of the present application adopt the naive Bayes algorithm, which is simple to compute and takes little computation time. Weights are configured for the user search behavior information according to search time, and the classification of the user's preference characteristics is calculated from it, so that calculation accuracy is high, the matching rate of the corresponding recommendations is high, and the recommendation success rate is high.
The embodiments of the present application configure the personalized recommendation information in the user information in advance. When the user visits, the recommendation information corresponding to the user can be obtained directly once the user identifier is obtained, without calculating the classification again from the user's search behavior information, which saves system resources and improves the efficiency of personalized recommendation.
Brief description of the drawings
Fig. 1 shows a flow chart of the steps of an embodiment of a text training method provided by the embodiments of the present application;
Fig. 2 shows a flow chart of the steps of an embodiment of a preference characteristic classification method based on user search behavior provided by the embodiments of the present application;
Fig. 3 shows a flow chart of the steps of an embodiment of a personalized recommendation method based on user search behavior provided by the embodiments of the present application;
Fig. 4 shows a structural block diagram of an embodiment of a text training system provided by the embodiments of the present application;
Fig. 5 shows a structural block diagram of an embodiment of a preference characteristic classification system based on user search behavior provided by the embodiments of the present application;
Fig. 6 shows a structural block diagram of an embodiment of a personalized recommendation system based on user search behavior provided by the embodiments of the present application.
Detailed description
To make the above objects, features, and advantages of the embodiments of the present application more apparent and easier to understand, the embodiments are described in further detail below with reference to the drawings and specific implementations.
Referring to Fig. 1, a flow chart of the steps of an embodiment of a text training method of the present application is shown, which may specifically comprise the following steps:
Step 101, obtaining a corpus; and
Step 102, obtaining a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document comprises a user identifier, search phrases, and corresponding search times;
It can be understood that the corpus used in the embodiments of the present application may be a relatively large-scale standard Chinese text classification test platform. The corpus may comprise a plurality of categories, for example military affairs, sports, tourism, and health, and the documents under each category may be various kinds of language material such as news and articles.
The test data may be chosen from data related to the search behavior of some users whose interest preferences can be labeled manually, so that the classification of these users' interest preferences is known. Of course, these interest preference categories correspond to the categories in the corpus.
The user-related data comprises a user identifier, search phrases, and corresponding search times, for example:
User A    2012-10-10    phrase 1    phrase 3    phrase 5    phrase 4
User A    2012-10-11    phrase 6    phrase 5    phrase 3    phrase 2
User A    2012-10-12    phrase 9    phrase 8    phrase 1    phrase 4
User A    2012-10-13    phrase 2    phrase 7    phrase 3    phrase 6
In general, the above data can be regarded as one document made up of a user and that user's search phrases.
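To make the record layout concrete, the sketch below groups raw records into one document per user. It assumes each record is a whitespace-separated line of the form `<user_id> <YYYY-MM-DD> <phrase> ...`, with a user identifier that contains no spaces; the function name `parse_search_log` and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from datetime import date

def parse_search_log(lines):
    """Group raw records into one search behavior document per user.

    Returns a dict: user_id -> list of (search_date, [phrases]) entries.
    """
    documents = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip records without at least one search phrase
        user_id, search_date = parts[0], date.fromisoformat(parts[1])
        documents[user_id].append((search_date, parts[2:]))
    return dict(documents)

log = [
    "userA 2012-10-10 phrase1 phrase3 phrase5 phrase4",
    "userA 2012-10-11 phrase6 phrase5 phrase3 phrase2",
]
documents = parse_search_log(log)
```

Keeping the date with each group of phrases preserves the search time that the later weighting step needs.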
Step 103, generating training characteristic items of each category from the training texts of that category; and
Step 104, generating search characteristic items of the user search behavior test document from the search phrases;
It should be noted that a training characteristic item or a search characteristic item may be a character, a word, or a phrase; the embodiments of the present application do not limit this.
Usually, the training texts and the user search behavior test document contain a very large number of phrases. Training and classifying on them directly would involve a very large amount of computation, so before training and classification the feature dimensionality needs to be reduced, provided that classification accuracy is not affected.
In a preferred example of the embodiments of the present application, step 103 may comprise the following sub-steps:
Sub-step S11, in each category, performing word segmentation on each training document;
Sub-step S12, counting the frequency of occurrence of each segmented word in the category;
Sub-step S13, sorting the segmented words from high to low by frequency of occurrence;
Sub-step S14, extracting the first M segmented words, M being a preset number, together with their frequencies of occurrence, to generate the training characteristic items of the category, wherein M is a positive integer.
In a preferred example of the embodiments of the present application, step 104 may comprise the following sub-steps:
Sub-step S21, performing word segmentation on each search phrase;
Sub-step S22, counting the frequency of occurrence of each segmented word;
Sub-step S23, sorting the segmented words from high to low by frequency of occurrence;
Sub-step S24, extracting the first N segmented words, N being a preset number, to generate the search characteristic items of the user search behavior test document, wherein N is a positive integer.
It can be understood that word segmentation divides the content of the training texts and of the user search behavior test document into characters, into words, or into shorter phrases; the embodiments of the present application do not limit this.
Each category and each user search behavior test document thus has its own training characteristic items or search characteristic items, respectively.
For example, when the search characteristic items of a user search behavior test document are generated, data in the following format may be obtained:
User A    2012-10-10    word 1, count a    word 2, count b    ...    word m, count n
User A    2012-10-11    word 1, count a    word 3, count b    ...    word g, count k
User A    2012-10-12    word 4, count a    word 2, count b    ...    word p, count q
User A    2012-10-13    word 3, count a    word 2, count b    ...    word w, count e
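Sub-steps S11 to S14 and S21 to S24 share the same count, sort, and truncate pattern, which can be sketched as follows. The sketch assumes word segmentation has already been performed, so the input is a list of token lists; the function name `top_characteristic_items` is hypothetical.

```python
from collections import Counter

def top_characteristic_items(token_lists, limit):
    """Return the `limit` most frequent tokens with their counts, highest first.

    token_lists: iterable of already segmented texts (each a list of tokens).
    limit: the preset number M (training texts) or N (search phrases).
    """
    counts = Counter(token for tokens in token_lists for token in tokens)
    return counts.most_common(limit)

segmented = [["ball", "goal"], ["ball", "team"], ["ball", "goal"]]
items = top_characteristic_items(segmented, 2)
```

Truncating to the most frequent tokens is what reduces the feature dimensionality before training and classification.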
Step 105, configuring weights for the search characteristic items according to the corresponding search times, and constructing a user preference characteristic classification model according to the search characteristic items and the training characteristic items.
It can be understood that once the user preference characteristic classification model has been built, it can be used directly to classify documents in the specified format.
In a preferred embodiment of the present application, step 105 may comprise the following sub-steps:
Sub-step S31, calculating the prior probability of each category as the proportion of all training texts that belong to that category;
In a preferred example of the embodiments of the present application, the prior probability P(c_k) of a category may be obtained by maximum likelihood estimation (MLE), using the following formula:
$$P(c_k) = \frac{\sum_{i=1}^{|D|} P(c_k \mid d_i)}{|D|}$$
Wherein D = {d_1, d_2, ..., d_|D|} is the set of training texts, and |D| is the total number of training texts;
When training text d_i belongs to category c_k, P(c_k | d_i) = 1; when training text d_i does not belong to category c_k, P(c_k | d_i) = 0.
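Because each training text contributes 1 to exactly one category, the prior probability reduces to the share of training texts labeled with that category. A minimal sketch, assuming the training set is given as a list of category labels, one per training text:

```python
from collections import Counter

def category_priors(labels):
    """Return P(c_k) for each category as its share of all training texts.

    labels: list with one category label per training text.
    """
    total = len(labels)
    return {category: count / total for category, count in Counter(labels).items()}

priors = category_priors(["sports", "sports", "travel", "military"])
```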
Sub-step S32, taking the frequency of occurrence of each training characteristic item of a category as the frequency of occurrence, in that category, of the search characteristic item identical to that training characteristic item;
It can be understood that the frequency of occurrence of a search characteristic item in a category is the term frequency (TF) of that characteristic item. The training documents of different categories differ greatly in the frequency of occurrence of some search characteristic items, so term frequency is one of the important references for classifying a user search behavior test document; in general, a search characteristic item with a larger TF value has a higher weight in that category.
Sub-step S33, using the frequencies of occurrence to calculate the first conditional probability that each search characteristic item appears in each category;
In a preferred example of the embodiments of the present application, the first conditional probability may be calculated with a multinomial model, using the following formula:
$$P(t_j \mid c_k) = \frac{1 + TF(t_j, c_k)}{|V| + \sum_{j=1}^{|V|} TF(t_j, c_k)}$$
Wherein TF(t_j, c_k) is the frequency of occurrence of search characteristic item t_j in category c_k, and |V| is the total number of training characteristic items in category c_k.
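The first conditional probability is an add-one smoothed frequency, so a search characteristic item that never occurs in a category still receives a small nonzero probability. The sketch below assumes the training characteristic items of one category are stored as a dictionary from term to frequency of occurrence; the function name is hypothetical.

```python
def first_conditional_probability(term, category_tf):
    """Return (1 + TF(t_j, c_k)) / (|V| + sum of TF over the category).

    category_tf: dict term -> frequency of occurrence for one category c_k;
    its length is taken as |V|.
    """
    vocabulary_size = len(category_tf)
    total_frequency = sum(category_tf.values())
    return (1 + category_tf.get(term, 0)) / (vocabulary_size + total_frequency)

sports_tf = {"ball": 3, "goal": 1}
p_seen = first_conditional_probability("ball", sports_tf)     # (1 + 3) / (2 + 4)
p_unseen = first_conditional_probability("opera", sports_tf)  # (1 + 0) / (2 + 4)
```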
Sub-step S34, configuring a weight for each search characteristic item according to its search time;
In a preferred example of the embodiments of the present application, sub-step S34 may further comprise the following sub-steps:
Sub-step S341, counting the number of training texts in which the search characteristic item appears;
Sub-step S342, configuring the half-life of user interest;
Sub-step S343, obtaining the weight by the following formula:
$$w_i = \frac{TF(t_i) \times IDF(t_i)}{\sum_{i=1}^{n} \left( TF(t_i) \times IDF(t_i) \right)^2} \cdot e^{-\frac{\ln 2}{hl} (d_{today} - d_{last})}$$
Wherein TF(t_i) is the frequency of occurrence of search characteristic item t_i in category c_k;
hl is the half-life of the user interest;
d_today is the current time, and d_last is the search time of search characteristic item t_i nearest to the current time;
IDF(t_i) is given by a formula supplied as an image (Figure BDA00003415377400123), wherein n_i is the number of training texts in which search characteristic item t_i appears, N is the total number of training texts, and L is an influence factor.
TF alone is not enough to express how much a search characteristic item contributes to classification. For example, both the user search behavior test document and the training documents may contain function words (such as interjections, prepositions and conjunctions) that contribute nothing to classification; these function words generally occur frequently, so their TF values are large and they harm the classification. Moreover, if a search characteristic item has a high TF value in all documents, it is hard to tell which category it actually characterizes.
The inverse document frequency (IDF) reflects the idea that the more training documents a search characteristic item appears in, the less category information it carries, and hence the less important it is.
The influence factor L is set by those skilled in the art according to the actual situation; the embodiment of the present application does not limit it.
In a preferred example of the embodiment of the present application, L takes the value 0.01.
Therefore, to address the above limitation, the embodiment of the present application combines TF and IDF into a TF-IDF weight; the normalized TF-IDF is then combined with a time decay factor to give the final weight of the search characteristic item.
$d_{\mathrm{today}}$ is the current time, preferably a date. $d_{\mathrm{last}_i}$ is the search time of search characteristic item $t_i$ nearest to the current time, preferably a date. $hl$ denotes the half-life: after $hl$ days the user's interest has decayed to half, although the decay is not linear. The value of $hl$ is determined by those skilled in the art according to the actual situation; the embodiment of the present invention does not limit it.
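The weight of sub-step S343 can be sketched as below. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the IDF values are taken as precomputed inputs because the IDF formula is not reproduced in this text, the normalization is assumed to be the Euclidean norm of the TF-IDF values, times are assumed to be calendar dates, and all names and sample values are hypothetical.

```python
import math
from datetime import date
from typing import Dict


def time_decayed_weights(tf: Dict[str, float],
                         idf: Dict[str, float],
                         last_search: Dict[str, date],
                         today: date,
                         half_life_days: float) -> Dict[str, float]:
    """Normalized TF-IDF multiplied by exp(-(ln 2 / hl) * (d_today - d_last_i))."""
    tfidf = {t: tf[t] * idf[t] for t in tf}
    norm = math.sqrt(sum(v * v for v in tfidf.values()))  # assumed Euclidean norm
    if norm == 0:
        return {t: 0.0 for t in tf}
    weights = {}
    for term, value in tfidf.items():
        age_days = (today - last_search[term]).days
        decay = math.exp(-math.log(2) / half_life_days * age_days)
        weights[term] = value / norm * decay
    return weights


# Hypothetical input: "lens" was last searched exactly one half-life (30 days) ago.
w = time_decayed_weights(
    tf={"camera": 3, "lens": 4},
    idf={"camera": 1.0, "lens": 1.0},
    last_search={"camera": date(2013, 6, 30), "lens": date(2013, 5, 31)},
    today=date(2013, 6, 30),
    half_life_days=30,
)
```

With these inputs the normalized TF-IDF values are 0.6 and 0.8; the item searched today keeps its full value, while the item searched one half-life ago is halved to 0.4.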
Sub-step S35: use the weight and the first conditional probability to calculate the second conditional probability that the user search behavior test document occurs in each category;
In a preferred example of the embodiment of the present invention, the second conditional probability can be obtained by the following formula:
$$P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \times \omega_{jk}$$
where $n$ is the total number of search characteristic items, $P(t_j \mid c_k)$ is the first conditional probability, and $\omega_{jk}$ is the weight of search characteristic item $t_j$.
As time passes, the weight of a user's search phrase decays accordingly.
Sub-step S36: use the prior probability and the second conditional probability to calculate the posterior probability that the user search behavior test document belongs to each category;
In a preferred example of the embodiment of the present invention, the posterior probability can be obtained by the following formula:
$$P(c_k \mid d_i) = \frac{P(c_k)\, P(d_i \mid c_k)}{P(d_i)}$$
where $P(c_k)$ is the prior probability, $P(d_i \mid c_k)$ is the second conditional probability, and $P(d_i)$ is the probability of occurrence of the user search behavior test document.
Sub-step S37: take the category with the maximum posterior probability as the category to which the user search behavior test document belongs;
In a preferred example of the embodiment of the present invention, the category with the maximum posterior probability is obtained by the following formula:
$$\mathrm{class}(d_i) = \arg\max_{1 \le k \le |C|} \{P(c_k \mid d_i)\} = \arg\max_{1 \le k \le |C|} \{P(c_k)\, P(d_i \mid c_k)\}$$
where $|C|$ is the total number of categories, $P(c_k \mid d_i)$ is the posterior probability, $P(c_k)$ is the prior probability, and $P(d_i \mid c_k)$ is the second conditional probability.
In a preferred example of the embodiment of the present invention, since $P(d_i)$ is the same constant for every category, $\mathrm{class}(d_i)$ can be found simply by computing $\arg\max_{1 \le k \le |C|} \{P(c_k)\, P(d_i \mid c_k)\}$.
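Sub-steps S35 to S37 can be illustrated together with the following sketch. It assumes the priors, first conditional probabilities and weights have already been computed and are held in plain dictionaries; the function name, the data layout and every number below are hypothetical. Because P(d_i) is the same for all categories, the sketch compares only P(c_k)P(d_i | c_k).

```python
from typing import Dict, Tuple


def classify(priors: Dict[str, float],
             cond_probs: Dict[str, Dict[str, float]],
             weights: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Return the category maximizing P(c_k) * prod_j P(t_j | c_k) * w_jk, with all scores."""
    scores = {}
    for category, prior in priors.items():
        likelihood = 1.0  # second conditional probability P(d_i | c_k)
        for term, weight in weights[category].items():
            likelihood *= cond_probs[category][term] * weight
        scores[category] = prior * likelihood
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical inputs for two categories and two search characteristic items.
priors = {"sports": 0.5, "music": 0.5}
cond_probs = {
    "sports": {"football": 0.5, "guitar": 0.1},
    "music": {"football": 0.1, "guitar": 0.4},
}
weights = {
    "sports": {"football": 0.8, "guitar": 0.2},
    "music": {"football": 0.8, "guitar": 0.2},
}
best, scores = classify(priors, cond_probs, weights)
```

For a document with many search characteristic items, a product of small factors can underflow in floating point; summing logarithms of the factors yields the same arg max and is a common alternative.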
Sub-step S38: judge, according to the category to which the user search behavior test document originally belongs and the currently calculated category, whether a preset condition is met; if so, obtain the final user preference characteristic classification model; if not, return to sub-step S34.
In a preferred example of the embodiment of the present application, recall and precision can be used as the judgment criteria.
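For a given user search behavior test document and each category, the calculation results can be tallied as follows:
[Table image BDA00003415377400143 is not reproduced here; it tabulates the counts a, b and c. Consistent with the formulas below, a counts the documents correctly assigned to the category, b the documents assigned to the category that do not belong to it, and c the documents of the category that were assigned elsewhere.]
Then the recall is $r = a/(a+c)$ and the precision is $p = a/(a+b)$.
The $F_1$ index is $F_1 = 2pr/(p+r)$.
The $F_1$ value of each category is obtained, then the mean of these $F_1$ values; this mean can then be used to evaluate the user preference characteristic classification model as a whole.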
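The evaluation just described can be sketched as follows. The per-category counts are hypothetical, and the zero-denominator guards are an assumption added so that the sketch runs on empty categories.

```python
from typing import Dict, Tuple


def f1_score(a: int, b: int, c: int) -> float:
    """F1 = 2pr / (p + r), with precision p = a/(a+b) and recall r = a/(a+c)."""
    p = a / (a + b) if (a + b) else 0.0
    r = a / (a + c) if (a + c) else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mean_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Average the per-category F1 values."""
    return sum(f1_score(*abc) for abc in counts.values()) / len(counts)


# Hypothetical (a, b, c) counts per category.
counts = {"sports": (8, 2, 2), "music": (5, 5, 0)}
overall = mean_f1(counts)
```

Here the first category has p = r = 0.8 and so F1 = 0.8; the second has p = 0.5 and r = 1, giving F1 = 2/3; the mean is about 0.733.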
The evaluation criterion, that is, the preset condition, is set by those skilled in the art according to the actual situation; the embodiment of the present application does not limit it.
Of course, the above judgment method is only an example; when implementing the embodiment of the present application, other judgment methods may be set according to the actual situation, and the embodiment of the present application does not limit this.
When the preset condition is met, the final user preference characteristic classification model is obtained; if not, the weights are reconfigured by reconfiguring the half-life hl of the user's interest, and the text training is carried out again.
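One possible reading of this retraining loop is sketched below: candidate half-lives are tried in turn until the evaluation score meets a threshold. The candidate list, the threshold and the `evaluate` callback are assumptions for illustration; in practice `evaluate` would rerun sub-steps S34 to S37 with the given half-life and return, for example, the mean F1 value.

```python
from typing import Callable, Iterable, Optional, Tuple


def tune_half_life(candidates: Iterable[float],
                   evaluate: Callable[[float], float],
                   threshold: float) -> Optional[Tuple[float, float]]:
    """Return the first (half_life, score) whose score meets the threshold, else None."""
    for half_life in candidates:
        score = evaluate(half_life)  # stand-in for retraining and scoring the model
        if score >= threshold:
            return half_life, score
    return None


# Toy stand-in for "retrain and evaluate": made-up scores per half-life in days.
toy_scores = {7: 0.61, 14: 0.70, 30: 0.82, 60: 0.78}
result = tune_half_life([7, 14, 30, 60], toy_scores.__getitem__, threshold=0.8)
```

Returning `None` when no candidate qualifies leaves it to the caller to widen the candidate range or relax the preset condition.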
Referring to Fig. 2, a flowchart of the steps of an embodiment of a preference characteristic classification method based on user search behavior according to the embodiment of the present application is shown, which may specifically comprise the following steps:
Step 201: collect the user's original search behavior information, and generate a user search behavior document from the user's original search behavior information; the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Step 202: use the user search behavior document and the user preference characteristic classification model to calculate the user's preference characteristic classification;
The step 202 may specifically comprise the following sub-steps:
Sub-step 2021: obtain a corpus and a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Sub-step 2022: use the training texts of each category to generate the training characteristic items of that category, and use the search phrases to generate the search characteristic items of the user search behavior test document;
Sub-step 2023: configure a weight for each search characteristic item according to its corresponding search time, and build the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
The embodiment of the present application may generate the user search behavior document from all of the user's collected original search behavior information or from only part of it; the embodiment of the present application does not limit this.
In the embodiment of the present application, the user preference characteristic classification model is obtained in substantially the same way as in the text training method embodiment, so it is described only briefly here; for the relevant details, refer to the corresponding description of the text training method embodiment.
Referring to Fig. 3, a flowchart of the steps of an embodiment of a personalized recommendation method based on user search behavior according to the embodiment of the present application is shown, which may specifically comprise the following steps:
Step 301: obtain the user's behavior information, the user's behavior information comprising a user ID;
Step 302: determine the user's preference characteristic classification according to the user ID;
Step 303: use the preference characteristic classification to generate corresponding personalized recommendation information;
Step 304: use the personalized recommendation information to make recommendations to the current user;
The step 302 may comprise the following sub-steps:
Sub-step S41: collect the user's original search behavior information, and generate a user search behavior document from the user's original search behavior information; the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Sub-step S42: use the user search behavior document and the user preference characteristic classification model to calculate the user's preference characteristic classification;
The sub-step S42 may further comprise the following sub-steps:
Sub-step S421: obtain a corpus and a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Sub-step S422: use the training texts of each category to generate the training characteristic items of that category, and use the search phrases to generate the search characteristic items of the user search behavior test document;
Sub-step S423: configure a weight for each search characteristic item according to its corresponding search time, and build the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
In a specific implementation, users can be assigned in advance to categories according to their search behavior, and an association between each category and the user ID can be established. When a user visits, the user's category can be obtained directly from the user ID, and the recommendation information corresponding to that category can then be obtained, without classifying the user's search behavior again; this saves system resources and improves the efficiency of personalized recommendation.
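A minimal sketch of this precomputed lookup is given below. The in-memory dictionaries, the user IDs and the recommendation strings are all hypothetical; a deployed system would presumably keep these associations in a database or cache.

```python
from typing import Dict, List

# Association between user ID and category, built offline by the classification step.
user_category: Dict[str, str] = {"u1001": "sports", "u1002": "music"}

# Recommendation information prepared per category.
category_recommendations: Dict[str, List[str]] = {
    "sports": ["match highlights", "running shoes"],
    "music": ["concert tickets"],
}


def recommend(user_id: str) -> List[str]:
    """Look up the user's category by ID and return that category's recommendations."""
    category = user_category.get(user_id)
    if category is None:
        return []  # unknown user: nothing precomputed yet
    return category_recommendations.get(category, [])


known = recommend("u1001")
unknown = recommend("u9999")
```

At request time the work is two dictionary lookups, so no classification is run while the user waits.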
In the embodiment of the present application, the user preference characteristic classification model is obtained in substantially the same way as in the text training method embodiment, so it is described only briefly here; for the relevant details, refer to the corresponding description of the text training method embodiment.
It should be understood that, for simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should know that the embodiment of the present application is not limited by the described order of actions, because according to the embodiment of the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the embodiment of the present application.
Referring to Fig. 4, a structural block diagram of an embodiment of a text training system provided by the embodiment of the present application is shown, which may specifically comprise the following modules:
Corpus acquisition module 401, for obtaining a corpus; and
Test document acquisition module 402, for obtaining a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Training characteristic item generation module 403, for using the training texts of each category to generate the training characteristic items of that category; and
Search characteristic item generation module 404, for using the search phrases to generate the search characteristic items of the user search behavior test document;
User preference characteristic classification model construction module 405, for configuring a weight for each search characteristic item according to its corresponding search time, and building the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
In a preferred embodiment of the present application, the training characteristic item generation module may comprise the following submodules:
First word segmentation submodule, for performing word segmentation on each training document in each category;
First frequency statistics submodule, for counting the frequency of occurrence of each segmented word in the category;
First sorting submodule, for sorting the segmented words from high to low by frequency of occurrence;
First generation submodule, for extracting a preset number M of the top-ranked segmented words together with their frequencies of occurrence, to generate the training characteristic items of the category; wherein M is a positive integer.
In a preferred embodiment of the present application, the search characteristic item generation module may comprise the following submodules:
Second word segmentation submodule, for performing word segmentation on each search phrase;
Second frequency statistics submodule, for counting the frequency of occurrence of each segmented word;
Second sorting submodule, for sorting the segmented words from high to low by frequency of occurrence;
Second generation submodule, for extracting a preset number N of the top-ranked segmented words, to generate the search characteristic items of the user search behavior test document; wherein N is a positive integer.
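The segment, count, sort and extract pipeline shared by the two generation modules can be sketched as follows. The whitespace tokenizer is only a stand-in chosen for this sketch: the embodiment does not name a word segmenter here, and Chinese text would need a real segmentation tool. The function name and sample texts are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def top_characteristic_items(texts: Iterable[str],
                             top_k: int,
                             tokenize: Callable[[str], List[str]] = str.split
                             ) -> List[Tuple[str, int]]:
    """Segment each text, count word frequencies, and keep the top_k most frequent words."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top_k)  # already sorted from high to low frequency


# Hypothetical training texts of one category, with M = 2.
training_texts = ["football league football", "league goal football"]
training_items = top_characteristic_items(training_texts, top_k=2)
```

The same function covers the search side by passing the search phrases and N; the frequencies in the returned pairs can simply be dropped where only the words are needed.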
In a preferred embodiment of the present application, the user preference characteristic classification model construction module may comprise the following submodules:
Prior probability calculation submodule, for calculating the prior probability of each category, namely the proportion of all training texts accounted for by the training texts of that category;
Third frequency statistics submodule, for taking the frequency of occurrence of each category's training characteristic items as the frequency of occurrence, in that category, of the search characteristic items identical to those training characteristic items;
First conditional probability calculation submodule, for using the frequency of occurrence to calculate the first conditional probability that the search characteristic item appears in each category;
Weight configuration submodule, for configuring a weight for each search characteristic item according to its corresponding search time;
Second conditional probability calculation submodule, for using the weight and the first conditional probability to calculate the second conditional probability that the user search behavior test document occurs in each category;
Posterior probability calculation submodule, for using the prior probability and the second conditional probability to calculate the posterior probability that the user search behavior test document belongs to each category;
Extraction submodule, for taking the category with the maximum posterior probability as the category to which the user search behavior test document belongs;
Judgment submodule, for judging, according to the category to which the user search behavior test document originally belongs and the currently calculated category, whether a preset condition is met; if so, calling the obtaining submodule; if not, calling the return submodule;
Obtaining submodule, for obtaining the final user preference characteristic classification model;
Return submodule, for returning to the weight configuration submodule.
In a preferred example of the embodiment of the present application, the prior probability can be obtained by the following formula:
$$P(c_k) = \frac{\sum_{i=1}^{|D|} P(c_k \mid d_i)}{|D|}$$
where $D = \{d_1, d_2, \ldots, d_{|D|}\}$ is the set of training texts and $|D|$ is the total number of training texts;
when training text $d_i$ belongs to category $c_k$, $P(c_k \mid d_i) = 1$; when training text $d_i$ does not belong to category $c_k$, $P(c_k \mid d_i) = 0$.
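Since each training text contributes 1 to its own category and 0 to all others, the prior probability reduces to the share of training texts in each category. A minimal sketch, with a hypothetical function name and label list:

```python
from collections import Counter
from typing import Dict, List


def class_priors(labels: List[str]) -> Dict[str, float]:
    """P(c_k) = (number of training texts in c_k) / |D|."""
    counts = Counter(labels)  # per-category sum of the 0/1 membership indicator
    total = len(labels)       # |D|
    return {category: n / total for category, n in counts.items()}


# Hypothetical category labels of four training texts.
priors = class_priors(["sports", "sports", "music", "news"])
```

The resulting priors sum to 1 across categories, here 0.5 for the first category and 0.25 for each of the other two.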
In a preferred example of the embodiment of the present application, the first conditional probability can be obtained by the following formula:
$$P(t_j \mid c_k) = \frac{1 + \mathrm{TF}(t_j, c_k)}{|V| + \sum_{j=1}^{|V|} \mathrm{TF}(t_j, c_k)}$$
where $\mathrm{TF}(t_j, c_k)$ is the frequency of occurrence of search characteristic item $t_j$ in category $c_k$, and $|V|$ is the total number of training characteristic items in category $c_k$.
In a preferred example of the embodiment of the present application, the weight configuration submodule may further comprise the following submodules:
Statistics submodule, for counting the number of training texts in which the search characteristic item appears;
Half-life configuration submodule, for configuring the half-life of the user's interest;
Weight calculation submodule, for obtaining the weight by the following formula:
$$w_i = \frac{\mathrm{TF}(t_i) \times \mathrm{IDF}(t_i)}{\sqrt{\sum_{i=1}^{n} \left(\mathrm{TF}(t_i) \times \mathrm{IDF}(t_i)\right)^2}} \times e^{-\frac{\ln 2}{hl}\left(d_{\mathrm{today}} - d_{\mathrm{last}_i}\right)}$$
where $\mathrm{TF}(t_i)$ is the frequency of occurrence of search characteristic item $t_i$ in category $c_k$;
$hl$ is the half-life of the user's interest;
$d_{\mathrm{today}}$ is the current time, and $d_{\mathrm{last}_i}$ is the search time of search characteristic item $t_i$ nearest to the current time;
[Formula image BDA00003415377400194, which defines $\mathrm{IDF}(t_i)$, is not reproduced here.]
In that formula, $n_i$ is the number of training texts in which search characteristic item $t_i$ appears, $N$ is the total number of training texts, and $L$ is an influence factor.
In a preferred example of the embodiment of the present application, the second conditional probability can be obtained by the following formula:
$$P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \times \omega_{jk}$$
where $n$ is the total number of search characteristic items, $P(t_j \mid c_k)$ is the first conditional probability, and $\omega_{jk}$ is the weight of search characteristic item $t_j$.
In a preferred example of the embodiment of the present application, the posterior probability can be obtained by the following formula:
$$P(c_k \mid d_i) = \frac{P(c_k)\, P(d_i \mid c_k)}{P(d_i)}$$
where $P(c_k)$ is the prior probability, $P(d_i \mid c_k)$ is the second conditional probability, and $P(d_i)$ is the probability of occurrence of the user search behavior test document.
In a preferred example of the embodiment of the present application, the category with the maximum posterior probability can be obtained by the following formula:
$$\mathrm{class}(d_i) = \arg\max_{1 \le k \le |C|} \{P(c_k \mid d_i)\} = \arg\max_{1 \le k \le |C|} \{P(c_k)\, P(d_i \mid c_k)\}$$
where $|C|$ is the total number of categories, $P(c_k \mid d_i)$ is the posterior probability, $P(c_k)$ is the prior probability, and $P(d_i \mid c_k)$ is the second conditional probability.
Referring to Fig. 5, a structural block diagram of an embodiment of a preference characteristic classification system based on user search behavior provided by the embodiment of the present application is shown, which may specifically comprise the following modules:
User search behavior document generation module 501, for collecting the user's original search behavior information and generating a user search behavior document from the user's original search behavior information; the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Preference characteristic classification module 502, for using the user search behavior document and the user preference characteristic classification model to calculate the user's preference characteristic classification;
The preference characteristic classification module comprises the following submodules:
Corpus acquisition submodule, for obtaining a corpus; and
Test document acquisition submodule, for obtaining a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Training characteristic item generation submodule, for using the training texts of each category to generate the training characteristic items of that category; and
Search characteristic item generation submodule, for using the search phrases to generate the search characteristic items of the user search behavior test document;
User preference characteristic classification model construction module, for configuring a weight for each search characteristic item according to its corresponding search time, and building the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
Referring to Fig. 6, a structural block diagram of an embodiment of a personalized recommendation system based on user search behavior provided by the embodiment of the present application is shown, which may specifically comprise the following modules:
User behavior information acquisition module 601, for obtaining the user's behavior information, the user's behavior information comprising a user ID;
Preference characteristic classification determination module 602, for determining the user's preference characteristic classification according to the user ID;
Personalized recommendation information generation module 603, for using the preference characteristic classification to generate corresponding personalized recommendation information;
Recommendation module 604, for using the personalized recommendation information to make recommendations to the current user;
The preference characteristic classification determination module comprises the following submodules:
User search behavior document generation submodule, for collecting the user's original search behavior information and generating a user search behavior document from the user's original search behavior information; the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Preference characteristic classification submodule, for using the user search behavior document and the user preference characteristic classification model to calculate the user's preference characteristic classification;
The preference characteristic classification submodule comprises the following modules:
Corpus acquisition module, for obtaining a corpus; and
Test document acquisition module, for obtaining a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;
Training characteristic item generation module, for using the training texts of each category to generate the training characteristic items of that category; and
Search characteristic item generation module, for using the search phrases to generate the search characteristic items of the user search behavior test document;
User preference characteristic classification model construction module, for configuring a weight for each search characteristic item according to its corresponding search time, and building the user preference characteristic classification model according to the search characteristic items and the training characteristic items.
Since the system embodiments are substantially similar to the method embodiments, they are described only briefly; for the relevant details, refer to the corresponding description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage and the like) that contain computer-usable program code.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they have grasped the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or terminal device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or terminal device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or terminal device that comprises the element.
The text training method, the text training system, the preference characteristic classification method based on user search behavior, the preference characteristic classification system based on user search behavior, the personalized recommendation method based on user search behavior and the personalized recommendation system based on user search behavior provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the embodiments of the present application, and the description of the above embodiments is intended only to help in understanding the method of the embodiments of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the embodiments of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this description should not be construed as limiting the embodiments of the present application.

Claims (12)

1. A text training method, characterized in that it comprises:
obtaining a corpus and a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user ID, search phrases and the corresponding search times;
using the training texts of each category to generate the training characteristic items of that category, and using the search phrases to generate the search characteristic items of the user search behavior test document;
configuring a weight for each search characteristic item according to its corresponding search time, and building a user preference characteristic classification model according to the search characteristic items and the training characteristic items.
2. The method according to claim 1, characterized in that the step of generating the training feature items of each category from the training texts of that category comprises:
segmenting each training document in each category into words;
counting the frequency of occurrence of each word in the category;
sorting the words by frequency of occurrence from high to low;
extracting the first M words, a preset number, together with their frequencies of occurrence, to generate the training feature items of the category, where M is a positive integer;
and the step of generating the search feature items of the user search behavior test document from the search phrases comprises:
segmenting each search phrase into words;
counting the frequency of occurrence of each word;
sorting the words by frequency of occurrence from high to low;
extracting the first N words, a preset number, to generate the search feature items of the user search behavior test document, where N is a positive integer.
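The segment, count, sort and truncate steps of claim 2 can be sketched as follows. This is only an illustration: the function name `top_feature_items` is hypothetical, and whitespace splitting stands in for the word segmentation step, which for Chinese text would need a dedicated segmenter.

```python
from collections import Counter

def top_feature_items(texts, limit, tokenize=str.split):
    """Return the `limit` most frequent words across `texts` with their counts.

    `tokenize` is a placeholder for the word segmentation step; whitespace
    splitting is an assumption made only for this sketch.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Sort by descending frequency; ties are broken alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]

training_texts = ["cheap flight tickets", "flight hotel deals", "hotel booking flight"]
print(top_feature_items(training_texts, 2))
```

The same function serves both halves of the claim: call it with M on the training texts of one category, and with N on a user's search phrases.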
3. method according to claim 1 and 2, is characterized in that, described for the search characteristics item according to corresponding configure weights search time, and, according to described search characteristics item and described training characteristics item, the step that builds user's disaggregated model comprises:
The training text that calculates each classification accounts for the prior probability of the ratio of all training texts;
Using the frequency of occurrence of the training characteristics item of each classification as the search characteristics item frequency of occurrence in described classification identical with described training characteristics item;
Adopt described frequency of occurrence to calculate the first condition probability that described search characteristics item appears in described each classification;
According to being corresponding search characteristics item configure weights search time;
Adopt described weight and described first condition probability calculation the second condition probability of described user search performance testing document to occur in each classification;
Adopt described prior probability and described second condition probability calculation user search performance testing document to belong to the posterior probability of each classification;
Extract the classification of classification corresponding to maximum posterior probability as user search performance testing document ownership;
According to the classification of the former ownership of described user search performance testing document and the classification of current calculating ownership, judge whether to meet pre-conditioned; If obtain final user preferences tagsort model; If not, return to the described sub-step according to being corresponding search characteristics item configure weights search time.
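The prior probability and the check-and-retry loop of claim 3 can be sketched as follows. The claim does not say what the preset condition is or what changes when the weights are reconfigured, so two assumptions are made here purely for illustration: the condition is a minimum fraction of test documents whose calculated category matches their original category, and the reconfigured quantity is the half-life of user interest. All names (`category_priors`, `fit_half_life`, `classify`) are hypothetical.

```python
def category_priors(training_texts_by_category):
    """Prior of each category: its share of all training texts."""
    total = sum(len(texts) for texts in training_texts_by_category.values())
    return {category: len(texts) / total
            for category, texts in training_texts_by_category.items()}

def fit_half_life(test_documents, classify, candidate_half_lives, target_accuracy):
    """Retry the weight configuration until the assumed condition is met.

    test_documents: list of (document, original_category) pairs.
    classify: callable (document, half_life) -> calculated category.
    Returns the first candidate half-life whose accuracy reaches
    target_accuracy, or None if no candidate does.
    """
    if not test_documents:
        raise ValueError("at least one test document is required")
    for half_life in candidate_half_lives:
        correct = sum(1 for document, original in test_documents
                      if classify(document, half_life) == original)
        if correct / len(test_documents) >= target_accuracy:
            return half_life
    return None

print(category_priors({"travel": ["t1", "t2", "t3"], "sports": ["s1"]}))
```

Passing the classifier in as a callable keeps the loop independent of how the weights and probabilities are actually computed.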
4. The method according to claim 3, characterized in that the first conditional probability is obtained by the following formula:
$$P(t_j \mid c_k) = \frac{1 + TF(t_j, c_k)}{|V| + \sum_{j=1}^{|V|} TF(t_j, c_k)}$$
where $TF(t_j, c_k)$ is the frequency of occurrence of search feature item $t_j$ in category $c_k$, and $|V|$ is the total number of training feature items in category $c_k$.
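The formula of claim 4 is an add-one (Laplace) smoothed estimate, so a search feature item that never appears in a category still receives a small non-zero probability. A minimal sketch, with a hypothetical function name and made-up frequencies:

```python
def conditional_probability(term, category_tf):
    """Smoothed P(term | category) following the formula of claim 4.

    category_tf maps each training feature item of the category to its
    frequency of occurrence; |V| is taken as the number of such items.
    """
    vocabulary_size = len(category_tf)
    total_frequency = sum(category_tf.values())
    return (1 + category_tf.get(term, 0)) / (vocabulary_size + total_frequency)

travel_tf = {"flight": 3, "hotel": 2, "ticket": 1}  # illustrative values only
print(conditional_probability("flight", travel_tf))  # (1 + 3) / (3 + 6)
print(conditional_probability("laptop", travel_tf))  # unseen item: (1 + 0) / (3 + 6)
```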
5. The method according to claim 3, characterized in that the step of configuring a weight for each search feature item according to its corresponding search time comprises:
counting the number of training texts in which the search feature item appears;
configuring a half-life of user interest;
obtaining the weight by the following formula:
$$w_i = \frac{TF(t_i) \times IDF(t_i)}{\sum_{i=1}^{n} \left( TF(t_i) \times IDF(t_i) \right)^2} \cdot e^{-\frac{\ln 2}{hl} \left( d_{today} - d_{last} \right)}$$
where $TF(t_i)$ is the frequency of occurrence of search feature item $t_i$ in category $c_k$;
$hl$ is the half-life of user interest;
$d_{today}$ is the current time, and $d_{last}$ is the search time of search feature item $t_i$ closest to the current time;
Figure FDA00003415377300031
where $n_i$ is the number of training texts in which search feature item $t_i$ appears, $N$ is the total number of training texts, and $L$ is an influence factor.
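The exponential factor in the weight of claim 5 halves the weight for every half-life that has passed since the feature item was last searched. The sketch below covers only that factor, because the formula in the referenced figure is not reproduced in the text. Measuring time in days is an assumption, and the function name is hypothetical.

```python
import math
from datetime import date

def time_decay(last_search, today, half_life_days):
    """Decay factor exp(-(ln 2 / hl) * (d_today - d_last)), with time in days."""
    elapsed_days = (today - last_search).days
    return math.exp(-math.log(2) / half_life_days * elapsed_days)

# A term last searched exactly one half-life ago keeps half of its weight.
print(time_decay(date(2013, 6, 1), date(2013, 7, 1), half_life_days=30))
```

A shorter half-life makes the model forget old searches faster; a longer one gives a user's search history more lasting influence.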
6. The method according to claim 3, characterized in that the second conditional probability is obtained by the following formula:
$$P(d_i \mid c_k) = \prod_{j=1}^{n} P(t_j \mid c_k) \cdot \omega_{jk}$$
where $n$ is the total number of search feature items, $P(t_j \mid c_k)$ is the first conditional probability, and $\omega_{jk}$ is the weight of search feature item $t_j$.
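The product in claim 6 can be evaluated directly, as in the sketch below. The function name and the sample numbers are illustrative only.

```python
import math

def document_likelihood(first_probabilities, weights):
    """P(d_i | c_k) as the product over j of P(t_j | c_k) * w_jk."""
    if len(first_probabilities) != len(weights):
        raise ValueError("one weight is required per search feature item")
    return math.prod(p * w for p, w in zip(first_probabilities, weights))

# Two search feature items; the second is older and so carries a lower weight.
print(document_likelihood([0.5, 0.25], [1.0, 0.5]))
```

With many feature items a direct product of small numbers can underflow to zero in floating point; summing logarithms is a common alternative, although the claim itself does not prescribe one.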
7. The method according to claim 3, characterized in that the category corresponding to the maximum posterior probability is obtained by the following formula:
$$class(d_i) = \arg\max_{1 \le k \le |C|} \{ P(c_k \mid d_i) \} = \arg\max_{1 \le k \le |C|} \{ P(c_k) \, P(d_i \mid c_k) \}$$
where $|C|$ is the total number of categories, $P(c_k \mid d_i)$ is the posterior probability, $P(c_k)$ is the prior probability, and $P(d_i \mid c_k)$ is the second conditional probability.
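The second equality in claim 7 holds because, by Bayes' rule, the posterior equals $P(c_k) P(d_i \mid c_k) / P(d_i)$, and the denominator is the same for every category, so it does not affect which category attains the maximum. A minimal sketch with made-up probabilities and a hypothetical function name:

```python
def classify(priors, likelihoods):
    """Return the category that maximizes P(c_k) * P(d_i | c_k)."""
    scores = {category: priors[category] * likelihoods[category]
              for category in priors}
    return max(scores, key=scores.get)

priors = {"travel": 0.5, "sports": 0.3, "tech": 0.2}               # illustrative
likelihoods = {"travel": 0.010, "sports": 0.020, "tech": 0.005}    # illustrative
print(classify(priors, likelihoods))
```

Here "sports" wins even though "travel" has the larger prior, because its likelihood is twice as large.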
8. A preference feature classification method based on user search behavior, characterized by comprising:
collecting original search behavior information of a user, and generating a user search behavior document from the original search behavior information; the user search behavior test document contains a user identifier, search phrases and corresponding search times;
calculating the preference feature classification of the user using the user search behavior document and a user preference feature classification model;
wherein the user preference feature classification model is generated as follows:
obtaining a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user identifier, search phrases and corresponding search times;
generating the training feature items of each category from the training texts of that category, and generating the search feature items of the user search behavior test document from the search phrases;
configuring a weight for each search feature item according to its corresponding search time, and building the user preference feature classification model from the search feature items and the training feature items.
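The first step of claim 8, turning raw search records into one document per user, can be sketched as follows. The record layout `(user_id, phrase, timestamp)` and the function name are assumptions made for illustration; the claim only requires that the document hold a user identifier, search phrases and their search times.

```python
from collections import defaultdict
from datetime import datetime

def build_search_documents(raw_records):
    """Group raw (user_id, phrase, timestamp) records into one document per user.

    Each document is a list of (phrase, timestamp) pairs in chronological order.
    """
    documents = defaultdict(list)
    for user_id, phrase, timestamp in raw_records:
        documents[user_id].append((phrase, timestamp))
    for entries in documents.values():
        entries.sort(key=lambda entry: entry[1])
    return dict(documents)

records = [
    ("u1", "hotel deals", datetime(2013, 6, 20, 9, 0)),
    ("u2", "football scores", datetime(2013, 6, 19, 18, 30)),
    ("u1", "cheap flights", datetime(2013, 6, 18, 12, 0)),
]
print(build_search_documents(records)["u1"])
```

Keeping the timestamp next to each phrase is what later allows a weight to be configured from the search time.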
9. A personalized recommendation method based on user search behavior, characterized by comprising:
obtaining behavior information of a user, the behavior information comprising a user identifier;
determining the preference feature classification of the user according to the user identifier;
generating corresponding personalized recommendation information using the preference category;
making a recommendation to the current user using the personalized recommendation information;
wherein the preference feature classification of the user is generated as follows:
collecting original search behavior information of the user, and generating a user search behavior document from the original search behavior information; the user search behavior test document contains a user identifier, search phrases and corresponding search times;
calculating the preference category of the user using the user search behavior document and a user preference feature classification model;
wherein the user preference feature classification model is generated as follows:
obtaining a corpus and a user search behavior test document, wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user identifier, search phrases and corresponding search times;
generating the training feature items of each category from the training texts of that category, and generating the search feature items of the user search behavior test document from the search phrases;
configuring a weight for each search feature item according to its corresponding search time, and building the user preference feature classification model from the search feature items and the training feature items.
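The lookup flow of claim 9, from user identifier to preference category to recommendation information, can be sketched as follows. The two dictionaries and the function name are hypothetical stand-ins: the claim does not specify how categories are stored or how recommendation information is generated from a category.

```python
def recommend(user_id, user_categories, catalog, limit=3):
    """Return up to `limit` items from the catalog entry for the user's category.

    user_categories: user identifier -> preference category (assumed to have
    been computed beforehand by the classification step).
    catalog: preference category -> ordered list of candidate items.
    """
    category = user_categories.get(user_id)
    if category is None:
        return []  # no known preference, so nothing personalized to offer
    return catalog.get(category, [])[:limit]

user_categories = {"u1": "travel"}
catalog = {"travel": ["flight sale", "hotel guide", "city pass", "rail card"]}
print(recommend("u1", user_categories, catalog))
```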
10. A text training system, characterized by comprising:
a corpus acquisition module, configured to obtain a corpus; and
a test document acquisition module, configured to obtain a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user identifier, search phrases and corresponding search times;
a training feature item generation module, configured to generate the training feature items of each category from the training texts of that category; and
a search feature item generation module, configured to generate the search feature items of the user search behavior test document from the search phrases;
a user preference feature classification model construction module, configured to configure a weight for each search feature item according to its corresponding search time, and to build a user preference feature classification model from the search feature items and the training feature items.
11. A preference feature classification system based on user search behavior, characterized by comprising:
a user search behavior document generation module, configured to collect original search behavior information of a user and to generate a user search behavior document from the original search behavior information; the user search behavior test document contains a user identifier, search phrases and corresponding search times;
a preference feature classification module, configured to calculate the preference feature classification of the user using the user search behavior document and a user preference feature classification model;
wherein the preference feature classification module comprises the following submodules:
a corpus acquisition submodule, configured to obtain a corpus; and
a test document acquisition submodule, configured to obtain a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user identifier, search phrases and corresponding search times;
a training feature item generation submodule, configured to generate the training feature items of each category from the training texts of that category; and
a search feature item generation submodule, configured to generate the search feature items of the user search behavior test document from the search phrases;
a user preference feature classification model construction module, configured to configure a weight for each search feature item according to its corresponding search time, and to build the user preference feature classification model from the search feature items and the training feature items.
12. A personalized recommendation system based on user search behavior, characterized by comprising:
a user behavior information acquisition module, configured to obtain behavior information of a user, the behavior information comprising a user identifier;
a preference feature classification determination module, configured to determine the preference feature classification of the user according to the user identifier;
a personalized recommendation information generation module, configured to generate corresponding personalized recommendation information using the preference category;
a recommendation module, configured to make a recommendation to the current user using the personalized recommendation information;
wherein the preference feature classification determination module comprises the following submodules:
a user search behavior document generation submodule, configured to collect original search behavior information of the user and to generate a user search behavior document from the original search behavior information; the user search behavior test document contains a user identifier, search phrases and corresponding search times;
a preference feature classification submodule, configured to calculate the preference feature classification of the user using the user search behavior document and a user preference feature classification model;
wherein the preference feature classification module comprises the following submodules:
a corpus acquisition module, configured to obtain a corpus; and
a test document acquisition module, configured to obtain a user search behavior test document; wherein the corpus comprises a plurality of categories, each category comprises a plurality of training texts, the user search behavior test document belongs to one of the categories, and the user search behavior test document contains a user identifier, search phrases and corresponding search times;
a training feature item generation module, configured to generate the training feature items of each category from the training texts of that category; and
a search feature item generation module, configured to generate the search feature items of the user search behavior test document from the search phrases;
a user preference feature classification model construction module, configured to configure a weight for each search feature item according to its corresponding search time, and to build the user preference feature classification model from the search feature items and the training feature items.
CN2013102600388A 2013-06-26 2013-06-26 User search behavior-based personalized recommendation method and system Pending CN103440242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102600388A CN103440242A (en) 2013-06-26 2013-06-26 User search behavior-based personalized recommendation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013102600388A CN103440242A (en) 2013-06-26 2013-06-26 User search behavior-based personalized recommendation method and system

Publications (1)

Publication Number Publication Date
CN103440242A true CN103440242A (en) 2013-12-11

Family

ID=49693933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102600388A Pending CN103440242A (en) 2013-06-26 2013-06-26 User search behavior-based personalized recommendation method and system

Country Status (1)

Country Link
CN (1) CN103440242A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110029509A1 (en) * 2009-07-30 2011-02-03 Microsoft Corporation Best-Bet Recommendations
CN101719145A (en) * 2009-11-17 2010-06-02 北京大学 Individuation searching method based on book domain ontology
CN101968802A (en) * 2010-09-30 2011-02-09 百度在线网络技术(北京)有限公司 Method and equipment for recommending content of Internet based on user browse behavior
CN103123653A (en) * 2013-03-15 2013-05-29 山东浪潮齐鲁软件产业股份有限公司 Search engine retrieving ordering method based on Bayesian classification learning

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886090A (en) * 2014-03-31 2014-06-25 北京搜狗科技发展有限公司 Content recommendation method and device based on user favorites
CN103984703B (en) * 2014-04-22 2017-04-12 新浪网技术(中国)有限公司 Mail classification method and device
CN103984703A (en) * 2014-04-22 2014-08-13 新浪网技术(中国)有限公司 Mail classification method and device
CN104462364A (en) * 2014-12-08 2015-03-25 百度在线网络技术(北京)有限公司 Search recommendation method and device
CN104462364B (en) * 2014-12-08 2018-09-07 百度在线网络技术(北京)有限公司 Method and device is recommended in search
US20200349627A1 (en) * 2014-12-18 2020-11-05 Ebay Inc. Expressions of users interest
US11823244B2 (en) * 2014-12-18 2023-11-21 Ebay Inc. Expressions of users interest
CN104809236A (en) * 2015-05-11 2015-07-29 苏州大学 Microblog-based user age classification method and Microblog-based user age classification system
CN104809236B (en) * 2015-05-11 2018-03-27 苏州大学 A kind of age of user sorting technique and system based on microblogging
CN106354709A (en) * 2015-07-15 2017-01-25 富士通株式会社 Analysis device, server and method of user attribute information
CN105183781A (en) * 2015-08-14 2015-12-23 百度在线网络技术(北京)有限公司 Information recommendation method and apparatus
CN105183781B (en) * 2015-08-14 2018-11-20 百度在线网络技术(北京)有限公司 Information recommendation method and device
CN107024217A (en) * 2016-02-01 2017-08-08 北京迈维出行科技有限公司 The method of the route planning of Intercity Transportation, apparatus and system
CN107024217B (en) * 2016-02-01 2019-06-11 北京迈维出行科技有限公司 The method, apparatus and system of the route planning of Intercity Transportation
CN105761564A (en) * 2016-04-21 2016-07-13 刘福金 Intelligent teaching system for ideological and political education
CN105761564B (en) * 2016-04-21 2018-07-24 潍坊科技学院 A kind of ideological and political education intelligent tutoring system
CN105955663A (en) * 2016-04-26 2016-09-21 深圳市八零年代网络科技有限公司 User behavior-based message pushing method and apparatus
CN106021379A (en) * 2016-05-12 2016-10-12 深圳大学 Personalized recommendation method and system based on user preference
CN106021379B (en) * 2016-05-12 2017-08-25 深圳大学 A kind of personalized recommendation method and its system based on user preference
CN109597973A (en) * 2017-09-30 2019-04-09 阿里巴巴集团控股有限公司 A kind of recommendation, generation method and the device of official documents and correspondence information
CN108009877A (en) * 2017-11-24 2018-05-08 阿里巴巴集团控股有限公司 Information mining method and device
CN108009877B (en) * 2017-11-24 2021-10-15 创新先进技术有限公司 Information mining method and device
CN110110267A (en) * 2018-01-25 2019-08-09 北京京东尚科信息技术有限公司 Extract characteristics of objects, the method and apparatus of object search
CN109636481A (en) * 2018-12-19 2019-04-16 未来电视有限公司 User's portrait construction method and device towards domestic consumer
CN111538815A (en) * 2020-04-27 2020-08-14 北京百度网讯科技有限公司 Text query method, device, equipment and storage medium
CN111538815B (en) * 2020-04-27 2023-09-22 北京百度网讯科技有限公司 Text query method, device, equipment and storage medium
CN117150143A (en) * 2023-10-30 2023-12-01 华能信息技术有限公司 Service method and system based on industrial Internet platform
CN117150143B (en) * 2023-10-30 2024-01-26 华能信息技术有限公司 Service method and system based on industrial Internet platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20171103