CN102262653A

CN102262653A - Label recommendation method and system based on user motivation orientation

Info

Publication number: CN102262653A
Application number: CN 201110154353
Authority: CN
Inventors: 李瑞轩; 靳延安; 文坤梅; 辜希武; 李玉华
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2011-06-09
Filing date: 2011-06-09
Publication date: 2011-11-30
Anticipated expiration: 2031-06-09
Also published as: CN102262653B

Abstract

The invention provides a label recommendation method based on user motivation orientation. The method comprises the following steps: calculating, from user triples, the motivation orientation of the user, of each labeled resource, and of the resource to be labeled; selecting from the labeled resources those whose motivation orientation is similar to that of the resource to be labeled, to obtain non-user-dependent similar resources; selecting from the non-user-dependent similar resources those whose motivation orientation is similar to that of the user, to obtain label recommendation candidate resources; merging all labels of the label recommendation candidate resources into a merged label set; calculating the recommendation importance of each label in the merged label set; and recommending labels in descending order of recommendation importance. The method can identify the motivation with which a user labels network information resources and then recommend to the user a list of several labels that matches the user's intention. The invention also provides a label recommendation system based on the method.

Description

Label recommendation method and system based on user motivation orientation

Technical field

The invention belongs to the field of Web information resource processing and utilization. It specifically relates to a method for recommending labels for Web information resources based on user motivation orientation, and to a recommendation system based on this method.

Background technology

With the continued growth of the Internet, network information resources are increasing at an unimaginable speed, and the emergence of Web 2.0 has made this growth even faster. In Web 2.0, the Internet has shifted from a top-down system centrally controlled and dominated by a small number of resource controllers to a bottom-up system led by the collective wisdom and strength of its users. Users are no longer only viewers of network information resources but also their producers. Although user-created content has enriched the sources of information and accelerated its diffusion, it has also caused information overload, heavier search loads, and lower information quality. How users can obtain suitable, high-quality information from this overwhelming mass of network information resources quickly and at low cost, and how these resources can be organized and managed effectively, has therefore become a major research topic that cannot be avoided.

An ideal organization of network information resources should be user-centered, make full use of emerging technology and accumulated human experience, and have a framework that is highly practical and easy to use. In the Web 2.0 environment, social labeling systems play an important role as a very effective way of organizing network information resources. Unlike the traditional top-down, rigid, controlled hierarchical classification system, a social labeling system has the following three advantages: (1) Social labels are generated by users when they label network resources; identical labels, once aggregated, form new categories, so the classification is bottom-up. (2) Social labels are not controlled by experts; users can label with any word they choose, which gives high flexibility, ease of use, and subjectivity, and a network resource can flexibly belong to several popular categories at once. (3) In a social labeling system, users can label network resources along multiple dimensions and at multiple levels, so the resulting structure is non-hierarchical.

Despite these advantages, labeling also has shortcomings, mainly in two respects: (1) Most social labeling systems let users enter labels freely. This makes the labeling behaviour easy for users to control, but the resulting arbitrariness introduces considerable noise: misspellings, ambiguous labels, and user-defined labels with no practical meaning are common, which significantly reduces the usefulness of labels. For this reason, some social labeling systems have had to provide labeling guidelines for users. (2) Data sparsity. Label-based browsing is an emerging way of organizing information and is not yet widely used; in Chinese-language resources in particular, network resources organized in this way are rare. In addition, users are not yet accustomed to adding many labels to network resources, so existing label data on the network is sparse.

In recent years, driven by this practical need, label recommendation technology has attracted wide attention from academia and Internet companies. Label recommendation provides a series of high-quality candidate labels for a network information resource to be labeled by investigating, analyzing, and mining the content of network information resources, users' labeling history, and the explicit or implicit relations among them. The main purposes of recommendation are: (1) to simplify the labeling procedure and make it more convenient for users, thereby increasing the usability and stickiness of the social labeling system; (2) to improve label quality by reducing misspellings and ambiguity, thereby improving the effect of labels in organizing, retrieving, using, and discovering information resources; (3) to change the structure of the label space so that it stabilizes and converges faster, allowing semantics to emerge.

At present, a number of relatively mature social label recommendation systems exist at home and abroad for various kinds of network information resources, and they play an important role in organizing, retrieving, sharing, and discovering information resources. These systems include Amazon, which recommends labels for products; Delicious, for web pages; Flickr, for pictures; Bibsonomy, for scientific papers; Douban, for books and films; and Tudou, for shared videos. Existing label recommendation systems mainly adopt the recommendation techniques traditionally used in e-commerce systems: content-based recommendation, collaborative filtering, association-rule-based recommendation, and hybrids of these. As the basis for recommendation, these traditional techniques rely either on the content of the resource itself or on users' historical labeling results. As algorithms, most use data mining or machine learning. These techniques have to some extent solved the problems of information overload and of organizing, classifying, and retrieving information resources, but their effect is still far from ideal; in particular, they cannot recommend labels that satisfy the user's information needs.

Summary of the invention

To satisfy users' information needs, the invention starts from the user's motivation for using the social labeling system, identifies the user's information goal, and recommends more accurate social labels. To this end, the invention provides a label recommendation method based on user motivation orientation, which recommends to the user a list of several labels that matches the user's intention. The invention also provides a label recommendation system based on this method.

The invention is realized by the following technical scheme. The invention provides a label recommendation method based on user motivation orientation, comprising the following steps:

(1) calculating, from user triples, the motivation orientation of the user, of each labeled resource, and of the resource to be labeled; the user triples comprise the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels;

(2) selecting from the labeled resources those whose motivation orientation is similar to that of the resource to be labeled; the resources obtained are called non-user-dependent similar resources;

(3) selecting from the non-user-dependent similar resources those whose motivation orientation is similar to that of the user; the resources obtained are called label recommendation candidate resources;

(4) merging all labels of the label recommendation candidate resources to obtain a merged label set;

(5) calculating the recommendation importance of each label in the merged label set;

(6) recommending labels in descending order of the recommendation importance of each label.
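
The six steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the motivation-orientation vectors have already been computed, uses cosine similarity for both selection steps, and takes the number of candidate resources carrying a label as that label's recommendation importance. The names `cosine` and `recommend`, the thresholds `alpha` and `beta`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(user_vec, target_vec, labeled, alpha=0.9, beta=0.9, top_k=5):
    """Return up to top_k labels for the resource to be labeled.

    labeled maps a resource id to (motivation vector, list of labels).
    alpha and beta are assumed similarity thresholds for steps (2) and (3).
    """
    # Step (2): labeled resources whose orientation is close to the target's.
    similar = {r: v for r, v in labeled.items() if cosine(v[0], target_vec) > alpha}
    # Step (3): of those, resources whose orientation is close to the user's.
    candidates = {r: v for r, v in similar.items() if cosine(v[0], user_vec) > beta}
    # Steps (4) and (5): merge labels; importance = occurrence count (assumption).
    importance = Counter(t for _, labels in candidates.values() for t in labels)
    # Step (6): recommend in descending order of importance.
    return [t for t, _ in importance.most_common(top_k)]


labeled = {
    "r1": ((1.0, 0.0, 0.0, 0.0, 0.0), ["car", "auto"]),
    "r2": ((0.9, 0.1, 0.0, 0.0, 0.0), ["car"]),
    "r3": ((0.0, 1.0, 0.0, 0.0, 0.0), ["recipe"]),
}
result = recommend((1.0, 0.0, 0.0, 0.0, 0.0), (1.0, 0.05, 0.0, 0.0, 0.0), labeled)
```

In this toy data, resource "r3" is filtered out in step (2) because its vector points in a different direction from that of the resource to be labeled.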

The invention also provides a label recommendation system based on user motivation orientation, comprising a motivation orientation calculation module, a non-user-dependent similar resource selection module, a label recommendation candidate resource selection module, a label merging module, a recommendation importance calculation module, and an output module.

The motivation orientation calculation module calculates the motivation orientation of the user, of each labeled resource, and of the resource to be labeled.

The non-user-dependent similar resource selection module selects from the labeled resources those whose motivation orientation is similar to that of the resource to be labeled, to obtain the non-user-dependent similar resources.

The label recommendation candidate resource selection module selects from the non-user-dependent similar resources those whose motivation orientation is similar to that of the user, to obtain the label recommendation candidate resources.

The label merging module merges all labels of the label recommendation candidate resources to obtain a merged label set.

The recommendation importance calculation module calculates the recommendation importance of each label in the merged label set.

The output module recommends labels in descending order of the recommendation importance of each label.

Existing label recommendation methods in social labeling systems start from the content of the resource itself or from the co-occurrence structure of labels. The method proposed by the invention instead starts directly from the user's relatively stable labeling motivation orientation: it obtains this orientation and recommends labels accordingly, so that the recommended labels better match the user's intention and the recommendation is more effective. The invention can identify the motivation with which a user labels network information resources. Discovering this motivation provides a useful reference for designing label recommendation systems, can guide ontology learning in the label space, and favours the stabilization of the social label structure and the emergence of social label semantics.

Description of drawings

Fig. 1 is a flow chart of label recommendation based on user motivation orientation;

Fig. 2 is a schematic diagram of a special label usage query according to the invention;

Fig. 3 is a label cloud of a description-oriented user according to the invention;

Fig. 4 is a module diagram of the label recommendation system of the invention.

Embodiment

The invention is described in further detail below with reference to the accompanying drawings and examples.

The motivation orientation described in the invention falls mainly into two classes: classification orientation and description orientation. Their characteristics are shown in Table 1.

Table 1. Characteristics of classification orientation and description orientation

Characteristic	Classification orientation	Description orientation
Purpose	Ease of later browsing	Ease of later querying and retrieval
Labels per resource	Low	High
Vocabulary size	Limited	Unlimited
Occurrence of synonyms	Few	Many
Labels taken from the resource title	Few	Many
Cost of changing labels	High	Low

Specifically, a user with a classification orientation uses labels to provide a browsing aid for the labeled resources. Such a user therefore wants to build a stable vocabulary according to personal preference. For ease of browsing, the simpler and less redundant this vocabulary the better, so the user tends to avoid words with the same meaning and chooses words that are clear and easy to remember. For example, when labeling a car, the vocabulary of a classification-oriented user often contains only "car" and not words of equivalent meaning such as "automobile" or "vehicle". Judged by the labeling result, such a vocabulary resembles a semantic classification system. As with a traditional classification system, the cost of changing the categories is relatively high.

A user with a description orientation uses labels to describe the content of the labeled resource accurately, for the sake of later querying and retrieval. To better support this purpose, the user's vocabulary may include many uncommon words and synonyms; when describing a car, for example, "car", "automobile", and "vehicle" may all appear in the vocabulary. In addition, the user often wants to describe a resource from several aspects and does not limit the number of words used. As the user's understanding develops during labeling, the word chosen for the same meaning may also change. The vocabulary of a description-oriented user is therefore open and dynamic.

The invention uses the following symbols for the parameters of a social labeling system: u denotes a user; r denotes a resource, such as a web page; U denotes the set of all users u who have labeled resource r; R denotes the set of all resources labeled by users; R_u denotes all resources labeled by user u, and |R_u| the number of resources in R_u; t denotes an arbitrary label, and t_1, t_2, ..., t_n denote specific labels; T denotes the set of labels given to all labeled resources; T_u denotes the set of all labels used by user u, and |T_u| the number of labels in T_u; T_r denotes all labels given to resource r by all users, and |T_r| the number of labels in T_r; R_u(t) denotes the resources that user u has labeled with label t.
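
As an illustration of this notation, the sets can be built from (user, resource, label) triples as sketched below. The function name `build_index` and the dictionary-of-sets layout are assumptions made for this example; the patent does not prescribe a data structure.

```python
from collections import defaultdict


def build_index(triples):
    """Build the sets of the notation from (user, resource, label) triples."""
    R_u = defaultdict(set)   # user -> resources the user has labeled
    T_u = defaultdict(set)   # user -> labels the user has used
    T_r = defaultdict(set)   # resource -> labels given to it by all users
    U_r = defaultdict(set)   # resource -> users who have labeled it
    R_ut = defaultdict(set)  # (user, label) -> resources the user labeled with it
    for u, r, t in triples:
        R_u[u].add(r)
        T_u[u].add(t)
        T_r[r].add(t)
        U_r[r].add(u)
        R_ut[(u, t)].add(r)
    return R_u, T_u, T_r, U_r, R_ut


triples = [
    ("alice", "p1", "car"),
    ("alice", "p2", "car"),
    ("bob", "p1", "auto"),
]
R_u, T_u, T_r, U_r, R_ut = build_index(triples)
```

Here `len(R_u["alice"])` plays the role of |R_u| and `len(T_r["p1"])` the role of |T_r|.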

The invention provides a label recommendation method based on user motivation orientation, comprising the following steps:

(1) Calculate, from user triples, the motivation orientation of the user, of each labeled resource, and of the resource to be labeled. The user triples comprise the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels; the labels corresponding to a resource include the labels given to it by all users.

The motivation orientation of user u can be represented by the vector M_u, that is:

M_u = (TRR_u, LFTR_u, TRCE_u, TSOF_u, STR_u)    (1)

where TRR_u, LFTR_u, TRCE_u, TSOF_u, and STR_u are five metrics of motivation orientation, whose meaning and calculation are as follows:

A) Average label rate of the user's labeled resources (Tags/Resources Ratio, TRR)

TRR_u measures the average number of labels the user uses per labeled resource. It equals the ratio of the size of the user's vocabulary to the total number of resources the user has labeled, as in formula (2).

TRR_u = e-|T_u|/|R_u|    (2)

A description-oriented user, needing to describe, may choose all kinds of words to describe a resource and is in theory not limited in number. A classification-oriented user, needing to browse, tends to choose fewer words to label resources, so the vocabulary is limited. In general, a classification-oriented user scores clearly lower on this metric than a description-oriented user. That is, the smaller a user's TRR_u, the more likely the user is classification-oriented; the larger TRR_u, the more likely the user is description-oriented.
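
A minimal sketch of this metric follows. It implements only the verbal definition (vocabulary size divided by the number of labeled resources) and applies no further transformation, which is an assumption about how formula (2) should be read. The function name `tags_per_resource` and the sample data are hypothetical.

```python
def tags_per_resource(assignments):
    """Return |T_u| / |R_u| for one user's (resource, label) pairs."""
    pairs = list(assignments)
    resources = {r for r, _ in pairs}  # R_u: distinct labeled resources
    labels = {t for _, t in pairs}     # T_u: the user's vocabulary
    if not resources:
        raise ValueError("the user has not labeled any resource")
    return len(labels) / len(resources)


# A user who reuses a small vocabulary across many resources.
classifier = [("p1", "car"), ("p2", "car"), ("p3", "car"), ("p4", "food")]
# A user who attaches several different labels to each resource.
describer = [("p1", "car"), ("p1", "auto"), ("p1", "vehicle"),
             ("p2", "pasta"), ("p2", "recipe")]

trr_classifier = tags_per_resource(classifier)
trr_describer = tags_per_resource(describer)
```

Consistent with the text, the user who reuses a small vocabulary scores lower than the user who adds several labels to each resource.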

B) Low-frequency label usage rate of the user (Lower Frequency Tag Ratio, LFTR)

To capture how labels are actually used, LFTR_u describes the degree to which the user uses low-frequency labels. It equals the ratio of the number of low-frequency labels to the total number of labels in the user's vocabulary. A low-frequency label is one that the user has used to label only a few resources, that is, a label with a very small usage count. The metric is calculated with formula (3):

LFTR_u = \frac{|T_u^{low}|}{|T_u|}, \quad T_u^{low} = \{ t : |R(t)| \le n \}, \quad n = \left\lceil \frac{|R(t_{max})|}{p} \right\rceil \qquad (3)

where t denotes any label, t_max is the label the user uses most frequently, |R(t)| and |R(t_max)| are the numbers of resources carrying label t and the most frequent label t_max respectively, n is one p-th of the number of resources carrying t_max, rounded up, with 0 < p ≤ 100, T_u^{low} is the set of low-frequency labels of user u, that is, each label in this set labels no more than n resources, and |T_u^{low}| is the number of low-frequency labels of user u.

Clearly 0 ≤ LFTR_u ≤ 1. When LFTR_u = 1, none of the user's labels has been used more than n times; the labels describe resources from different angles and aspects, and the user does not mind using low-frequency words. Such a user can be considered description-oriented. When LFTR_u = 0, the user rarely uses low-frequency labels and regards them as unhelpful for classified browsing: introducing a low-frequency label amounts to introducing noise and undermines the consistency of the classification, so the user is very careful to avoid low-frequency words. Such a user can be considered classification-oriented.
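
The low-frequency label usage rate described above can be sketched as follows. This is an illustrative reading of the text, assuming that |R(t)| counts the distinct resources the user has labeled with t and that n is the count for the most frequent label divided by p, rounded up. The function name `low_frequency_tag_ratio` and the sample data are hypothetical.

```python
import math
from collections import Counter


def low_frequency_tag_ratio(assignments, p=100):
    """Share of a user's labels that are used on at most n resources.

    assignments: (resource, label) pairs of one user; 0 < p <= 100.
    """
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    pairs = set(assignments)                 # ignore duplicate assignments
    count = Counter(t for _, t in pairs)     # |R(t)| for every label t
    if not count:
        raise ValueError("the user has not used any label")
    n = math.ceil(max(count.values()) / p)   # threshold from the top label
    low = [t for t, c in count.items() if c <= n]
    return len(low) / len(count)


# "car" labels 10 resources; "a" and "b" label one resource each.
data = [(f"p{i}", "car") for i in range(10)] + [("p0", "a"), ("p1", "b")]
ratio = low_frequency_tag_ratio(data, p=5)   # n = ceil(10 / 5) = 2
```

With p = 1 the threshold n equals the count of the most frequent label, so every label is low-frequency and the ratio is 1.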

C) Relative conditional entropy of the user's labels (Tag Relative Conditional Entropy, TRCE)

Classification-oriented users want their labels to have maximum discrimination, because only then is browsing most efficient. The process by which a user selects labels can therefore be likened to information encoding. Efficient encoding maximizes the information entropy of the code; likewise, selecting discriminative, useful labels means maximizing the information entropy of the labels. In other words, a classification-oriented user aims for all labels to be used with the same frequency, which maximizes the information entropy of the labels and best supports browsing. Description-oriented users, by contrast, have no interest in this.

When the user encodes resources with labels, the conditional entropy H_u(R|T) reflects the effectiveness of this encoding and can be calculated with formula (4):

H_u(R \mid T) = -\sum_{r \in R} \sum_{t \in T} p(r, t) \log_2 p(r \mid t) \qquad (4)

where p(r, t) is the joint probability of label t and resource r, and p(r|t) is the probability that label t labels resource r.

In calculating the label entropy, labels and resources are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the resource that remains once the label is known; it is mainly influenced by the number of resources and the size of the vocabulary. To compare users, the conditional entropy is normalized: the observed conditional entropy H_u(R|T) is compared with the ideal conditional entropy H_opt(R|T). The ideal conditional entropy is obtained when every label has the same discrimination, that is, when p(r|t) is the same for all labels; the conditional entropy is then also at its maximum. Using the ideal conditional entropy as the normalization factor, the relative conditional entropy of the labels is calculated with formula (5):

TRCE_u = \frac{H_{opt}(R \mid T) - H_u(R \mid T)}{H_{opt}(R \mid T)} \qquad (5)

Clearly 0 ≤ TRCE_u ≤ 1. The closer TRCE_u is to 0, the closer the user's conditional entropy is to the ideal case and the closer the labels are to an equiprobable distribution. The labels then discriminate strongly, and the user is very likely classification-oriented; otherwise the user is very likely description-oriented.
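
Formulas (4) and (5) can be illustrated with the sketch below. The probability estimates are assumptions made for this example: p(r, t) is taken as uniform over the user's distinct (resource, label) assignments, and p(r|t) as one over the number of resources carrying label t. The text gives no closed form for the ideal entropy H_opt, so the caller supplies it. The function names are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(assignments):
    """H(R|T) per formula (4) from one user's (resource, label) pairs."""
    pairs = set(assignments)
    n = len(pairs)
    per_label = Counter(t for _, t in pairs)  # resources carrying each label
    h = 0.0
    for _, t in pairs:
        p_joint = 1 / n                       # assumed estimate of p(r, t)
        p_cond = 1 / per_label[t]             # assumed estimate of p(r | t)
        h -= p_joint * math.log2(p_cond)
    return h


def relative_conditional_entropy(h_u, h_opt):
    """TRCE per formula (5); h_opt is the ideal entropy supplied by the caller."""
    if h_opt <= 0:
        raise ValueError("h_opt must be positive")
    return (h_opt - h_u) / h_opt


h = conditional_entropy([("r1", "a"), ("r2", "a"), ("r3", "b")])
trce = relative_conditional_entropy(0.5, 2.0)
```

In the sample, label "a" leaves one bit of uncertainty about the resource on two of the three assignments, while label "b" identifies its resource exactly.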

D) Semantic overlap factor of the user's labels (Tag Semantic Overlap Factor, TSOF)

A classification-oriented user wants as few synonyms as possible in the vocabulary, since this makes browsing more efficient. A description-oriented user does not care about this; on the contrary, the more comprehensively a resource can be described the better. The user's motivation orientation can therefore be measured by calculating the similarity among the labels the user uses, as in formula (6).

where sim(t_i, t_j) is the similarity between two labels t_i and t_j, calculated with formula (7). In that formula, f(t_i) and f(t_j) are the numbers of times labels t_i and t_j occur in the label set of user u, and f(t_i, t_j) is the number of times they occur together in the user's label set.

sim(t_i, t_j) = \frac{\max(\log f(t_i), \log f(t_j)) - \log f(t_i, t_j)}{\log N - \min(\log f(t_i), \log f(t_j))} \qquad (7)

where N is the total number of words in the user's label set.

When the similarity of all of the user's labels is close to 0, TSOF_u approaches 0 and the user's motivation orientation is classification; otherwise the user's motivation orientation is description.
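
The pairwise measure of formula (7) can be sketched as below. The sketch covers only formula (7), not the aggregation into TSOF_u. It assumes all counts are positive, since the logarithm is undefined at zero, and the function name `label_similarity` is hypothetical. The base of the logarithm cancels in the ratio, so the natural logarithm is used.

```python
import math


def label_similarity(f_i, f_j, f_ij, n_total):
    """sim(t_i, t_j) per formula (7).

    f_i, f_j: occurrence counts of the two labels; f_ij: their co-occurrence
    count; n_total: total number of words in the label set (N).
    """
    if min(f_i, f_j, f_ij) <= 0:
        raise ValueError("counts must be positive for the logarithm")
    log_i, log_j = math.log(f_i), math.log(f_j)
    denominator = math.log(n_total) - min(log_i, log_j)
    if denominator == 0:
        raise ValueError("log N equals the smaller log count")
    return (max(log_i, log_j) - math.log(f_ij)) / denominator


value = label_similarity(4, 2, 1, 16)
```

For two labels that always occur together (f_i = f_j = f_ij), the numerator vanishes and the function returns 0.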

E) Special label usage rate of the user (Special Tags Ratio, STR)

Statistical analysis of labels in social labeling systems shows that when a user labels with interrogative adverbs such as when, what, or how, the user's remaining labels tend to be taken from the title of the resource; comparative analysis shows that such users clearly intend to describe the page content (see Fig. 2). These interrogative adverb labels are defined as special labels. Fig. 3 further shows that such users are also description-oriented when labeling other resources: for the user "breneaux" in the first record, for example, the other labeling records clearly show further features of description orientation, such as the label semantic overlap factor.

The usage rate of special words when labeling resources can therefore also serve as an indicator of the user's motivation orientation. The higher a user's special word usage rate, the stronger the user's description orientation; the lower it is, the weaker the description orientation. The special label usage rate is measured with formula (8).

STR_u = card(t ∈ T_str) / |T_u|    (8)

where T_str = {who, when, what, where, how, howto, ...} is the special label set, which can be set to the set of all interrogative adverbs in English, and card(t ∈ T_str) is the number of labels used by user u that belong to T_str, counting repetitions. Clearly 0 ≤ STR_u ≤ 1. The closer STR_u is to 1, the more likely user u is description-oriented; the closer STR_u is to 0, the more likely user u is classification-oriented.
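
A minimal sketch of formula (8) follows. Because the numerator counts repetitions, the sketch also counts every label assignment in the denominator so that the result stays within [0, 1]; this counting convention, the lower-casing of labels, and the names `SPECIAL_LABELS` and `special_label_ratio` are assumptions made for this example. The special label set contains only the words listed in the text.

```python
SPECIAL_LABELS = frozenset({"who", "when", "what", "where", "how", "howto"})


def special_label_ratio(labels, special=SPECIAL_LABELS):
    """Share of a user's label assignments that fall in the special label set.

    labels: every label the user has assigned, with repetitions.
    """
    normalized = [t.lower() for t in labels]
    if not normalized:
        raise ValueError("the user has not assigned any label")
    hits = sum(t in special for t in normalized)  # counts repetitions
    return hits / len(normalized)


str_describer = special_label_ratio(["howto", "python", "what", "howto"])
str_classifier = special_label_ratio(["car", "food"])
```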

The motivation orientation of each labeled resource can likewise be represented by a vector of five motivation metrics. For a resource r, the motivation orientation is the vector M_r, that is:

M_r = (TRR_r, LFTR_r, TRCE_r, TSOF_r, STR_r)    (9)

where TRR_r, LFTR_r, TRCE_r, TSOF_r, and STR_r are five metrics of the motivation orientation of the resource, whose meaning and calculation are as follows:

A) Average label rate per user of a labeled resource (Tags/Resources Ratio, TRR)

TRR_r measures the average number of labels used by each user who has labeled the resource. It equals the ratio of the number of all labels given to the resource to the number of users who have labeled it, as in formula (10).

TRR_r = e-|T_r|/|U_r|    (10)

where |T_r| is the number of labels given to resource r by all users and |U_r| is the number of users who have given labels to resource r. In general, a classification-oriented resource scores clearly lower on this metric than a description-oriented resource. That is, the smaller TRR_r, the more likely the resource is classification-oriented; the larger TRR_r, the more likely it is description-oriented.

B) Low-frequency label usage rate of a labeled resource (Lower Frequency Tag Ratio, LFTR)

To capture how labels are actually used, LFTR_r describes the degree to which low-frequency labels are given to the resource. It equals the ratio of the number of low-frequency labels to the total number of labels in the vocabulary given to the resource. A low-frequency label is one that is rarely given to the resource. The metric is calculated with formula (11):

LFTR_r = \frac{|T_r^{low}|}{|T_r|}, \quad T_r^{low} = \{ t : |R_u(t)| \le m \}, \quad m = \left\lceil \frac{|R_u(t'_{max})|}{q} \right\rceil \qquad (11)

where t denotes any label of resource r, t'_max is the label most frequently given to resource r, |R_u(t)| is the number of users who have used label t, |R_u(t'_max)| is the number of users who have used the most frequent label t'_max, m is one q-th of the number of users of t'_max, rounded up, with 0 < q ≤ 100, T_r^{low} is the set of low-frequency labels of resource r, that is, each label in this set is used by no more than m users, and |T_r^{low}| is the number of low-frequency labels of resource r.

C) Relative conditional entropy of the labels of a labeled resource (Tag Relative Conditional Entropy, TRCE)

For a classification-oriented resource, the labels given to it should have maximum discrimination, because only then is browsing most efficient. The process of giving labels to a resource can therefore be likened to information encoding. Efficient encoding maximizes the information entropy of the code, and selecting discriminative labels means maximizing the information entropy of the labels. When users encode the resource with labels, the conditional entropy H_r(U|T) reflects the effectiveness of this encoding and can be calculated with formula (12):

H_r(U \mid T) = -\sum_{u \in U} \sum_{t \in T} p(u, t) \log_2 p(u \mid t) \qquad (12)

where p(u, t) is the joint probability that user u uses label t, and p(u|t) is the probability that user u uses label t to label resource r. The ideal conditional entropy H_ropt(U|T) is obtained when every label has the same discrimination, that is, when p(u|t) is the same for all labels; the conditional entropy is then also at its maximum.

In calculating the label entropy, labels and users are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the user that remains once the label is known; it is mainly influenced by the number of users and the size of the vocabulary. To compare resources, the conditional entropy is normalized: the observed conditional entropy H_r(U|T) is compared with the ideal conditional entropy H_ropt(U|T). Using the ideal conditional entropy as the normalization factor, the relative conditional entropy of the labels is calculated with formula (13):

TRCE_r = \frac{H_{ropt}(U \mid T) - H_r(U \mid T)}{H_{ropt}(U \mid T)} \qquad (13)

D) Semantic overlap factor of the labels of a labeled resource (Tag Semantic Overlap Factor, TSOF)

For a classification-oriented resource, the synonyms in its vocabulary should be as few as possible, since this makes browsing more efficient. For a description-oriented resource the opposite holds: the more comprehensively the labels describe the resource the better. The motivation orientation of the resource can therefore be measured by calculating the similarity among the labels given to the resource, as in formula (14).

where sim(t_i, t_j)' is the similarity between two labels t_i and t_j, calculated with formula (15). In that formula, f(t_i)' and f(t_j)' are the numbers of times labels t_i and t_j occur in the label set of resource r, and f(t_i, t_j)' is the number of times the two labels occur together in the label set of resource r.

sim(t_i, t_j)' = \frac{\max(\log f(t_i)', \log f(t_j)') - \log f(t_i, t_j)'}{\log N' - \min(\log f(t_i)', \log f(t_j)')} \qquad (15)

where N' is the total number of words in the label set of resource r.

E) Special label usage rate of a labeled resource (Special Tags Ratio, STR)

Special labels are defined for resources in the same way as for users. The usage rate of special words can likewise serve as an indicator of the labeling motivation orientation of a resource. The higher the usage rate of special words given to a resource, the stronger its description orientation; the lower it is, the stronger its classification orientation. The special label usage rate is measured with formula (16).

STR_{r} = \frac{card(t \in T_{str})'}{|T_{r}|} \qquad (16)

where T_str = {who, when, what, where, how, howto, ...} is the special label set, which can be set to the set of all interrogative words in English. card(t ∈ T_str)′ is the number of labels used on resource r that are contained in T_str, counting repetitions. Clearly 0 ≤ STR_r ≤ 1. The closer STR_r is to 1, the more likely resource r has a description motivation orientation; the closer STR_r is to 0, the more likely it has a classification motivation orientation.
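Formula (16) can be sketched in Python as follows. The sketch assumes that the labels of a resource are given as a list that keeps repetitions, so that the ratio stays between 0 and 1, and that matching is case-insensitive. Both points, as well as the names `SPECIAL_LABELS` and `special_tags_ratio`, are assumptions for illustration only.

```python
# Hypothetical special label set; the patent leaves the full list open.
SPECIAL_LABELS = frozenset({"who", "when", "what", "where", "how", "howto"})


def special_tags_ratio(labels, special=SPECIAL_LABELS):
    """Formula (16): share of label occurrences that are special labels."""
    if not labels:
        return 0.0  # assumption: an unlabeled resource gets ratio 0
    hits = sum(1 for label in labels if label.lower() in special)
    return hits / len(labels)
```

A resource labeled ["howto", "python", "howto", "tutorial"] would get STR_r = 2/4 = 0.5 under these assumptions.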

Similarly, the motivation orientation of the resource to be labeled, \hat{r}, can also be expressed as a vector M_{\hat{r}} of five motivation metrics, that is:

M_{\hat{r}} = (TRR_{\hat{r}}, LFTU_{\hat{r}}, TRCE_{\hat{r}}, TSOF_{\hat{r}}, STR_{\hat{r}}) \qquad (17)

where TRR_{\hat{r}}, LFTU_{\hat{r}}, TRCE_{\hat{r}}, TSOF_{\hat{r}} and STR_{\hat{r}} are the five motivation orientation metrics of resource \hat{r}; they are calculated in the same way as the motivation orientation metrics of a labeled resource.

(2) Select, from the labeled resources, the resources whose motivation orientation is similar to that of the resource to be labeled, obtaining the non-user-dependent similar resources. That is, calculate the similarity between the motivation orientation of each labeled resource and that of the resource to be labeled, select the labeled resources whose similarity is greater than a threshold α, and combine the selected resources into the non-user-dependent similar resources R_sim.

Specifically, when labeling a resource, users tend to reuse labels they have used before. The method therefore first finds the resources similar to the resource to be labeled, and then calculates how well these resources match the user's current motivation orientation. Using the labels of the well-matched resources as the recommendation candidate set yields labels that meet the user's intent. The similarity between the motivation orientation of the resource to be labeled and that of a resource already labeled by the user can be calculated with the cosine measure based on the vector space model, as in formula (18).

sim_{r \in R_{u}}(M_{r}, M_{\hat{r}}) = \frac{M_{r} \cdot M_{\hat{r}}}{|M_{r}| \, |M_{\hat{r}}|} \qquad (18)

where M_r is the motivation orientation vector of a resource labeled by the user and M_{\hat{r}} is the motivation orientation vector of the resource to be labeled. A threshold α is set as a control factor (α takes a value between 0 and 1 and can be determined experimentally; 0.6 is suggested). If the similarity is greater than or equal to α, the motivation orientation of the labeled resource is considered highly similar to that of the resource to be labeled. These highly similar labeled resources are combined, and R_sim denotes the resulting non-user-dependent similar resources.

The similarity can also be calculated with other measures, such as mutual information or Pearson similarity.
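Step (2) can be sketched in Python as follows: the cosine measure of formula (18) is applied to five-dimensional motivation orientation vectors, and the resources whose similarity is at least α are kept. The function names, the dictionary input and the treatment of zero vectors are assumptions for illustration; the default α = 0.6 follows the suggested value.

```python
import math


def cosine_similarity(a, b):
    """Formula (18): cosine of the angle between two motivation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def select_similar_resources(labeled, target_vector, alpha=0.6):
    """Return the ids of labeled resources forming R_sim.

    labeled       -- {resource_id: (TRR, LFTU, TRCE, TSOF, STR)}
    target_vector -- motivation vector of the resource to be labeled
    """
    return [
        resource_id
        for resource_id, vector in labeled.items()
        if cosine_similarity(vector, target_vector) >= alpha
    ]
```

With the hypothetical vectors r1 = (1, 0, 0, 0, 0), r2 = (0, 1, 0, 0, 0) and r3 = (1, 1, 0, 0, 0) and the target (1, 0, 0, 0, 0), the similarities are 1, 0 and about 0.707, so r1 and r3 are selected.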

(3) Select, from the non-user-dependent similar resources R_sim, the resources whose motivation orientation is similar to the user's motivation orientation, obtaining the label recommendation candidate resources R_cad.

Formula (19) is used to calculate the similarity between the motivation orientation of each resource in the non-user-dependent similar resources R_sim and the motivation orientation of the user:

sim_{r \in R_{sim}}(M_{r}, M_{u}) = \frac{M_{r} \cdot M_{u}}{|M_{r}| \, |M_{u}|} \qquad (19)

where M_r is the motivation orientation vector of a resource in R_sim, that is, a resource labeled by the user whose motivation orientation is similar to that of the resource to be labeled, and M_u is the motivation orientation vector of the user. A threshold β is set as a control factor (β takes a value between 0 and 1 and can be determined experimentally; 0.6 is suggested). If the similarity is greater than or equal to β, the labels of the resource can be recommended as labels that meet the user's intent. The non-user-dependent similar resources whose similarity meets the threshold β are selected as the label recommendation candidate resources, denoted R_cad.

(4) Merge all labels of the label recommendation candidate resources R_cad to obtain the combined label set. That is, the labels of every resource in R_cad are merged according to formula (20):

\hat{T}_{u} = \bigcup_{\{r \,\mid\, sim_{r \in R_{sim}}(M_{r}, M_{u}) \geq \beta\}} T_{r} \qquad (20)
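Steps (3) and (4) can be sketched together in Python. The sketch assumes that the similarities of formula (19) have already been calculated for each resource in R_sim; it then keeps the resources with similarity at least β and takes the union of their label sets as in formula (20). The function name and both input dictionaries are hypothetical.

```python
def merge_candidate_labels(similarity_to_user, labels_by_resource, beta=0.6):
    """Select R_cad and build the combined label set of formula (20).

    similarity_to_user -- {resource_id: sim(M_r, M_u)} for resources in R_sim
    labels_by_resource -- {resource_id: iterable of labels T_r}
    """
    candidates = [
        resource_id
        for resource_id, similarity in similarity_to_user.items()
        if similarity >= beta
    ]
    merged = set()
    for resource_id in candidates:
        # Union of the label sets of all candidate resources.
        merged |= set(labels_by_resource.get(resource_id, ()))
    return candidates, merged
```

For instance, with similarities {r1: 0.9, r2: 0.4, r3: 0.6} and β = 0.6, the candidates are r1 and r3, and the combined label set is the union of their labels.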

(5) Calculate the recommendation importance of each label in the combined label set according to formula (21):

p(t | \hat{r}) = \sum_{w \in \hat{r},\, t \in \hat{T}_{u}} p(w) \, s(w, t) \qquad (21)

where p(w) is the content importance, within the resource to be labeled \hat{r}, of each word w in \hat{r}, calculated by formula (22); s(w, t) is the correlation between word w and label t in the combined label set, calculated by formula (23).

p(w) = \log\left(\frac{tf(w, \hat{r})}{N_{\hat{r}}} + 1\right) \log\left(\frac{N_{R_{cad}}}{|R_{cad}(w)|} + 1\right) \qquad (22)

s(w, t) = \frac{\max(\log tf(w, \hat{r}), \log tf(t, \hat{r})) - \log tf(w, t, \hat{r})}{\log N_{\hat{r}} - \min(\log tf(w, \hat{r}), \log tf(t, \hat{r}))} \qquad (23)

where tf(w, \hat{r}) is the number of times word w occurs in resource \hat{r}, tf(t, \hat{r}) is the number of times label t occurs in resource \hat{r}, tf(w, t, \hat{r}) is the number of times word w and label t occur together in resource \hat{r}, N_{\hat{r}} is the number of all words in resource \hat{r}, N_{R_{cad}} is the total number of words contained in all label recommendation candidate resources, and |R_cad(w)| is the number of label recommendation candidate resources that contain word w. A word w here means an English word.
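Formulas (21) to (23) can be combined as in the following Python sketch, which works on counts that are assumed to be precomputed. All function names and input structures are hypothetical. The patent does not state what happens when a count is zero and a logarithm is undefined; skipping such word and label pairs is an assumption made here.

```python
import math


def content_importance(tf_w, n_words, n_cad_words, n_cad_with_w):
    """Formula (22): content importance p(w) of word w in the target resource."""
    return math.log(tf_w / n_words + 1) * math.log(n_cad_words / n_cad_with_w + 1)


def correlation(tf_w, tf_t, tf_wt, n_words):
    """Formula (23): s(w, t); returns None where the formula is undefined."""
    if min(tf_w, tf_t, tf_wt) <= 0:
        return None  # assumption: pairs with a zero count are skipped
    log_w, log_t = math.log(tf_w), math.log(tf_t)
    denominator = math.log(n_words) - min(log_w, log_t)
    if denominator == 0:
        return None
    return (max(log_w, log_t) - math.log(tf_wt)) / denominator


def recommendation_importance(word_stats, label_tf, cooccurrence, n_words, n_cad_words):
    """Formula (21): p(t | r_hat) as the sum over words w of p(w) * s(w, t).

    word_stats   -- {word: (tf(w, r_hat), |R_cad(w)|)}
    label_tf     -- {label: tf(t, r_hat)}
    cooccurrence -- {(word, label): tf(w, t, r_hat)}
    """
    scores = {}
    for label, tf_t in label_tf.items():
        total = 0.0
        for word, (tf_w, n_cad_with_w) in word_stats.items():
            if n_cad_with_w <= 0:
                continue  # word absent from the candidate resources
            s = correlation(tf_w, tf_t, cooccurrence.get((word, label), 0), n_words)
            if s is None:
                continue
            total += content_importance(tf_w, n_words, n_cad_words, n_cad_with_w) * s
        scores[label] = total
    return scores
```

How the co-occurrence count tf(w, t, \hat{r}) is obtained, for example per sentence or per window, is left to the caller, since the text does not specify it.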

(6) Recommend the labels in descending order of their recommendation importance p(t|\hat{r}).
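Step (6) amounts to a descending sort, as in this small Python sketch. The cut-off `k` and the alphabetical tie-break are assumptions for illustration; the text only requires ordering by recommendation importance from largest to smallest.

```python
def recommend_labels(importance, k=5):
    """Return up to k labels ordered by descending recommendation importance.

    importance -- {label: p(t | r_hat)}
    """
    # Sort by score (descending), then by label name to make ties deterministic.
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]
```

For example, the hypothetical scores {"python": 0.9, "web": 0.2, "howto": 0.5} with k = 2 give the list ["python", "howto"].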

The present invention also provides a label recommendation system based on user motivation orientation. As shown in Figure 4, it comprises a motivation orientation calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600).

The motivation orientation calculation module (100) calculates the motivation orientation of the user, the motivation orientation of each labeled resource and the motivation orientation of the resource to be labeled.

The non-user-dependent similar resource selection module (200) selects, from the labeled resources, the resources whose motivation orientation is similar to that of the resource to be labeled, obtaining the non-user-dependent similar resources.

The label recommendation candidate resource selection module (300) selects, from the non-user-dependent similar resources, the resources whose motivation orientation is similar to that of the user, obtaining the label recommendation candidate resources.

The label merging module (400) merges all labels of the label recommendation candidate resources to obtain the combined label set.

The recommendation importance calculation module (500) calculates the recommendation importance of each label in the combined label set.

The output module (600) recommends labels in descending order of their recommendation importance.
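The chaining of the six modules can be sketched in Python as a simple pipeline. The class name, the callable signatures and the data passed between stages are all assumptions for illustration; the text only fixes the role of each module and the order in which they run.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LabelRecommendationSystem:
    compute_orientations: Callable[[Any, Any], Any]  # module (100)
    select_similar: Callable[[Any], Any]             # module (200)
    select_candidates: Callable[[Any, Any], Any]     # module (300)
    merge_labels: Callable[[Any], Any]               # module (400)
    score_labels: Callable[[Any, Any], dict]         # module (500)
    output: Callable[[dict], list]                   # module (600)

    def recommend(self, triples, target):
        """Run the six modules in order for one resource to be labeled."""
        orientations = self.compute_orientations(triples, target)
        similar = self.select_similar(orientations)
        candidates = self.select_candidates(orientations, similar)
        merged = self.merge_labels(candidates)
        importance = self.score_labels(merged, target)
        return self.output(importance)
```

Passing each module in as a callable keeps the sketch independent of how the individual metrics are implemented; each stage only consumes what the previous stages produced.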

The present invention is not limited to the embodiments described above. A person skilled in the art may, based on the disclosure of the present invention, implement it in various other embodiments; any implementation that adopts the design structure and concept of the present invention therefore falls within its scope of protection.

Claims

1. A label recommendation method based on user motivation orientation, comprising the following steps:

2. The label recommendation method according to claim 1, characterized in that the motivation orientation of user u in step (1) is M_u = (TRR_u, LFTU_u, TRCE_u, TSOF_u, STR_u), where TRR_u, LFTU_u, TRCE_u, TSOF_u and STR_u are the motivation orientation metrics of user u, each calculated as follows:

(a) TRR_{u} = e^{-|T_{u}| / |R_{u}|};

where T_u denotes the set of all labels used by user u, |T_u| denotes the number of labels in set T_u, R_u denotes all resources labeled by user u, and |R_u| denotes the number of resources in set R_u;

where t denotes any label, t_max is the label most frequently used by user u, |R(t)| is the number of resources that contain label t, |R(t_max)| is the number of resources that contain the most frequent label t_max, and n is one p-th of the number of resources that contain t_max, rounded up, with 0 < p ≤ 100; the formula further uses the set of low-frequency labels of user u and the number of labels in that set;

where p(r, t) is the joint probability of label t on resource r, p(r|t) is the probability that label t labels resource r, R denotes the set of all resources labeled by user u, T denotes the set of labels assigned to all resources labeled by user u, and H_opt(R|T) is the value of H_u(R|T) when p(r|t) is identical for all labels;

where sim(t_i, t_j) denotes the similarity between two labels t_i and t_j, f(t_i) and f(t_j) are the numbers of times labels t_i and t_j occur in the label set of user u, f(t_i, t_j) is the number of times the two labels t_i and t_j occur together in the label set of user u, and N is the total number of words in the label set of user u;

(e) STR_{u} = card(t \in T_{str}) / |T_{u}|;

where T_str is the special label set and card(t ∈ T_str) is the number of labels used by user u that are contained in T_str, counting repetitions;

The motivation orientation of any resource r among the labeled resources and the resource to be labeled in step (1) is M_r = (TRR_r, LFTU_r, TRCE_r, TSOF_r, STR_r), where TRR_r, LFTU_r, TRCE_r, TSOF_r and STR_r are the motivation orientation metrics of resource r, each calculated as follows:

(a') TRR_{r} = e^{-|T_{r}| / |U_{r}|};

where |T_r| denotes the number of all labels assigned to resource r by all users, and |U_r| denotes the number of all users who have assigned labels to resource r;

where t denotes any label of resource r, t_max′ is the label most frequently used on resource r, |R_u(t)| is the number of users who have used label t, |R_u(t_max′)| is the number of users who have used the most frequent label t_max′, and m is one q-th of the number of users of the most frequent label t_max′, rounded up, with 0 < q ≤ 100; the formula further uses the set of low-frequency labels of resource r and the number of labels in that set;

where p(u, t) is the joint probability that user u uses label t, p(u|t) is the probability that user u uses label t to label resource r, U denotes the set of all users who have labeled resource r, and H_ropt(U|T) is the value of H_r(U|T) when p(u|t) is identical for all labels;

where sim(t_i, t_j)′ is the similarity between two labels t_i and t_j, f(t_i)′ and f(t_j)′ are the numbers of times labels t_i and t_j occur in the label set of resource r, f(t_i, t_j)′ is the number of times the two labels t_i and t_j occur together in the label set of resource r, and N′ is the total number of words in the label set of resource r;

(e') STR_{r} = card(t \in T_{str})' / |T_{r}|;

where T_str is the special label set and card(t ∈ T_str)′ is the number of labels used on resource r that are contained in T_str, counting repetitions.

3. The label recommendation method according to claim 1 or 2, characterized in that the non-user-dependent similar resources are obtained in step (2) as follows:

(3.1) calculate the similarity between the motivation orientation of each labeled resource and the motivation orientation of the resource to be labeled;

(3.2) select the labeled resources whose similarity is greater than a threshold α, thereby obtaining the non-user-dependent similar resources, where 0 < α < 1.

4. The label recommendation method according to claim 1 or 2, characterized in that the label recommendation candidate resources are obtained in step (3) as follows:

(4.1) calculate the similarity between the motivation orientation of each resource in the non-user-dependent similar resources and the motivation orientation of the user;

(4.2) select the non-user-dependent similar resources whose similarity is greater than a threshold β, thereby obtaining the label recommendation candidate resources, where 0 < β < 1.

5. The label recommendation method according to claim 1 or 2, characterized in that the recommendation importance of each label in the combined label set \hat{T}_{u} is calculated in step (5) as follows:

(5.1) calculate the content importance p(w), within the resource to be labeled \hat{r}, of each word w in \hat{r}:

p(w) = \log\left(\frac{tf(w, \hat{r})}{N_{\hat{r}}} + 1\right) \log\left(\frac{N_{R_{cad}}}{|R_{cad}(w)|} + 1\right)

where tf(w, \hat{r}) is the number of times word w occurs in the resource to be labeled \hat{r}, N_{\hat{r}} is the number of all words in the resource to be labeled \hat{r}, N_{R_{cad}} is the total number of words contained in all label recommendation candidate resources, and |R_cad(w)| is the number of label recommendation candidate resources that contain word w;

(5.2) calculate the correlation s(w, t) between word w and label t in the combined label set \hat{T}_{u}:

s(w, t) = \frac{\max(\log tf(w, \hat{r}), \log tf(t, \hat{r})) - \log tf(w, t, \hat{r})}{\log N_{\hat{r}} - \min(\log tf(w, \hat{r}), \log tf(t, \hat{r}))}

where tf(t, \hat{r}) is the number of times label t occurs in the resource to be labeled \hat{r}, and tf(w, t, \hat{r}) is the number of times word w and label t occur together in the resource to be labeled \hat{r};

(5.3) calculate the recommendation importance of label t:

p(t | \hat{r}) = \sum_{w \in \hat{r},\, t \in \hat{T}_{u}} p(w) \, s(w, t)

6. A label recommendation system based on user motivation orientation, comprising a motivation orientation calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600);