CN102262653A - Label recommendation method and system based on user motivation orientation - Google Patents
Label recommendation method and system based on user motivation orientation
- Publication number
- CN102262653A (application number CN 201110154353)
- Authority
- CN
- China
- Prior art keywords
- resource
- label
- user
- motivation
- marked
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention provides a label recommendation method based on user motivation orientation. The method comprises the following steps: calculating, from user triples, the motivation orientation of the user, of each labeled resource, and of the resource to be labeled; selecting, from the labeled resources, those whose motivation orientation is similar to that of the resource to be labeled, to obtain non-user-dependent similar resources; selecting, from the non-user-dependent similar resources, those whose motivation orientation is similar to that of the user, to obtain label recommendation candidate resources; merging all labels of the label recommendation candidate resources to obtain a combined label set; calculating the recommendation importance of each label in the combined label set; and recommending labels in descending order of recommendation importance. The method can identify the user's motivation for labeling network information resources and recommend to the user a list of labels that matches the user's intention. The invention also provides a label recommendation system based on the method.
Description
Technical field
The invention belongs to the field of Web information resource processing and utilization, and specifically relates to a method for recommending labels for Web information resources based on user motivation orientation, and to a recommendation system based on the method.
Background art
With the continuing growth of the Internet, network information resources are increasing at an unprecedented rate, and the emergence of Web 2.0 has accelerated this growth. Under Web 2.0, the Internet has shifted from a top-down model, centrally controlled by a small number of resource owners, to a bottom-up model driven by the collective intelligence of its users. Users are no longer only viewers of network information resources; they are also their producers. Although user-generated content has enriched the sources of information and sped up its diffusion, it has also caused information overload, a heavier search burden, and lower information quality. How users can obtain suitable, high-quality information from this mass of network information resources quickly, cheaply, and effectively, and how those resources should be organized and managed, have therefore become major research questions that cannot be avoided.
An ideal organization of network information resources should be user-centered, should make full use of emerging technologies and accumulated human experience, and should be practical and easy to use. In the Web 2.0 environment, social tagging systems play an important role as an effective way of organizing network information resources. Unlike traditional controlled hierarchical classification systems, which are top-down and rigid, a social tagging system has three advantages: (1) Social labels are produced by users as they label network resources, and identical labels, once aggregated, form new categories, so the classification is bottom-up. (2) Social labels are not controlled by experts. Users can label with any words they choose, which gives the system great flexibility, ease of use, and room for subjective judgment, and a network resource can belong flexibly to several popular categories at once. (3) Users can label network resources along multiple dimensions and at multiple levels, so the resulting structure is non-hierarchical.
Alongside these advantages, labeling also has shortcomings, mainly in two respects. (1) Most social tagging systems let users enter labels freely. This gives users full control over labeling, but the resulting arbitrariness introduces considerable noise: misspellings, ambiguous terms, and user-defined labels with no practical meaning are common, which significantly limits the usefulness of labels. For this reason, some social tagging systems have had to provide labeling guidelines for their users. (2) Data are sparse. Label-based browsing is an emerging way of organizing information and is not yet widely used; network resources organized in this way are especially rare among Chinese-language resources. In addition, users are not yet accustomed to adding many labels to network resources, so existing label data on the network remain scarce.
In response to this practical need, label recommendation technology has attracted wide attention from academia and Internet companies in recent years. Label recommendation examines, analyzes, and mines the content of network information resources, users' labeling histories, and the explicit or implicit relations among them, in order to offer a set of high-quality candidate labels for the network information resource to be labeled. Its main purposes are: (1) to simplify the labeling procedure and make it more convenient, thereby increasing the usability and stickiness of the social tagging system; (2) to improve label quality by reducing misspellings and ambiguity, thereby improving the effectiveness of labels in organizing, retrieving, using, and discovering information resources; and (3) to change the structure of the label space so that it stabilizes and converges faster and semantics can emerge.
At present, a number of relatively mature social label recommendation systems exist, in China and abroad, for various kinds of network information resources, and they play an important role in organizing, retrieving, sharing, and discovering information resources. They include Amazon, which recommends labels for products; Delicious, for web pages; Flickr, for pictures; Bibsonomy, for scientific papers; Douban, for books and films; and Tudou, for shared videos. Existing label recommendation systems mainly adopt the recommendation techniques traditionally used in e-commerce systems: content-based recommendation, collaborative filtering, association-rule-based recommendation, and hybrids of these. In terms of the basis for recommendation, these traditional techniques rely either on the content of the resource itself or on users' historical labeling results. In terms of algorithms, most use data mining or machine learning. These traditional techniques have, to some extent, addressed information overload and the organization, classification, and retrieval of information resources, but their results are still far from ideal; in particular, they cannot recommend labels that satisfy the user's information needs.
Summary of the invention
To satisfy users' information needs, the invention starts from the user's motivation for using a social tagging system, identifies the user's information goal, and recommends more accurate social labels accordingly. The invention provides a label recommendation method based on user motivation orientation, which can recommend to the user a list of labels that matches the user's intention. The invention also provides a label recommendation system based on this method.
The invention is realized by the following technical scheme. The invention provides a label recommendation method based on user motivation orientation, comprising the following steps:
(1) calculating, from user triples, the motivation orientation of the user, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled, wherein the user triples comprise the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels;
(2) selecting, from the labeled resources, the resources whose motivation orientation is similar to that of the resource to be labeled; the resources obtained are called non-user-dependent similar resources;
(3) selecting, from the non-user-dependent similar resources, the resources whose motivation orientation is similar to that of the user; the resources obtained are called label recommendation candidate resources;
(4) merging all labels of the label recommendation candidate resources to obtain a combined label set;
(5) calculating the recommendation importance of each label in the combined label set;
(6) recommending labels in descending order of the recommendation importance of each label.
The invention also provides a label recommendation system based on user motivation orientation, comprising a motivation orientation calculation module, a non-user-dependent similar resource selection module, a label recommendation candidate resource selection module, a label merging module, a recommendation importance calculation module, and an output module;
The motivation orientation calculation module is used to calculate the motivation orientation of the user, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled;
The non-user-dependent similar resource selection module is used to select, from the labeled resources, the resources whose motivation orientation is similar to that of the resource to be labeled, to obtain the non-user-dependent similar resources;
The label recommendation candidate resource selection module is used to select, from the non-user-dependent similar resources, the resources whose motivation orientation is similar to that of the user, to obtain the label recommendation candidate resources;
The label merging module is used to merge all labels of the label recommendation candidate resources to obtain the combined label set;
The recommendation importance calculation module is used to calculate the recommendation importance of each label in the combined label set;
The output module is used to recommend labels in descending order of the recommendation importance of each label.
Existing label recommendation methods in social tagging systems start from the content of the resource itself or from structures such as label co-occurrence. The method proposed by the invention instead starts directly from the user's relatively stable labeling motivation orientation: it obtains this orientation and recommends labels according to it, so the recommended labels better match the user's intention and the recommendation is more effective. The invention can identify the user's motivation for labeling network information resources. Identifying this motivation provides a useful reference for designing label recommendation systems, can guide ontology learning in the label space, and favors the stabilization of the social label structure and the emergence of social label semantics.
Description of drawings
Fig. 1 is the flowchart of label recommendation based on user motivation orientation;
Fig. 2 is a schematic diagram of a special label usage query according to the invention;
Fig. 3 is the label cloud of a user with description motivation orientation according to the invention;
Fig. 4 is the module diagram of the label recommendation system of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and examples.
The motivation orientation described in the invention falls into two main classes, classification motivation orientation and description motivation orientation, whose characteristics are shown in Table 1.
Table 1. Characteristics of classification motivation orientation and description motivation orientation
| Characteristic | Classification motivation orientation | Description motivation orientation |
| Purpose | Easier browsing later | Easier querying and retrieval later |
| Labels per resource | Low | High |
| Vocabulary size | Limited | Unlimited |
| Occurrence of synonyms | Few | Many |
| Labels taken from the resource title | Few | Many |
| Cost of changing labels | High | Low |
Specifically, a labeling user with classification motivation orientation uses labels to build a browsing aid for the labeled resources. Such a user therefore wants to establish a stable vocabulary according to personal preference. For ease of browsing, the simpler and less redundant the vocabulary the better, so the user avoids words with the same meaning and chooses words that are clear and easy to remember. For example, when labeling a resource about an automobile, the vocabulary of a user with classification motivation orientation often contains only "car" and not equivalent words such as "automobile" or "vehicle". Judged by the labeling results, such a vocabulary resembles a semantic classification system. As with traditional classification systems, the cost of changing categories is relatively high.
A labeling user with description motivation orientation uses labels to describe the content of the labeled resource accurately, for later querying and retrieval. To better support querying, the user's vocabulary may include many uncommon words and synonyms; for example, when describing an automobile, "car", "automobile", and "vehicle" may all appear in the vocabulary. In addition, the user often wants to describe a resource from several aspects and does not limit the number of words used. The words used for the same meaning may also change as the user's understanding develops during labeling. The vocabulary of a user with description motivation orientation is therefore open and dynamic.
The invention uses the following symbols for the parameters of a social tagging system: $u$ denotes a user; $r$ denotes a resource, such as a web page; $U$ denotes the set of all users $u$ who have labeled resource $r$; $R$ denotes the set of all resources labeled by users; $R_u$ denotes all resources labeled by user $u$, and $|R_u|$ the number of resources in $R_u$; $t$ denotes any label, and $t_1, t_2, \ldots, t_n$ denote specific labels; $T$ denotes the set of labels assigned to all resources labeled by users; $T_u$ denotes the set of all labels used by user $u$, and $|T_u|$ the number of labels in $T_u$; $T_r$ denotes all labels assigned to resource $r$ by all users, and $|T_r|$ the number of labels in $T_r$; $R_u(t)$ denotes the resources that user $u$ labeled with label $t$.
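The notation above can be illustrated with a small sketch that builds the sets $R_u$, $T_u$, $T_r$, and $R_u(t)$ from annotation triples. The triple layout (user, resource, label), the function name `index_triples`, and the sample data are assumptions made for illustration only; they are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical sample of (user, resource, label) annotation triples.
triples = [
    ("alice", "page1", "car"),
    ("alice", "page1", "travel"),
    ("alice", "page2", "car"),
    ("bob", "page1", "automobile"),
]


def index_triples(triples):
    """Build the sets R_u, T_u, T_r and R_u(t) from annotation triples."""
    R_u = defaultdict(set)   # user -> resources labeled by that user
    T_u = defaultdict(set)   # user -> labels used by that user
    T_r = defaultdict(set)   # resource -> labels assigned by all users
    R_ut = defaultdict(set)  # (user, label) -> resources the user labeled with it
    for u, r, t in triples:
        R_u[u].add(r)
        T_u[u].add(t)
        T_r[r].add(t)
        R_ut[(u, t)].add(r)
    return R_u, T_u, T_r, R_ut


R_u, T_u, T_r, R_ut = index_triples(triples)
# |R_alice|, |T_alice|, |T_page1|
print(len(R_u["alice"]), len(T_u["alice"]), len(T_r["page1"]))  # prints: 2 2 3
```

Keeping one index per symbol keeps each later metric a short lookup over sets rather than a scan over the raw triples.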
The invention provides a label recommendation method based on user motivation orientation, comprising the following steps:
(1) Calculating, from user triples, the motivation orientation of the user, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled. The user triples comprise the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels; the labels corresponding to a resource include the labels assigned to that resource by all users.
The motivation orientation of user $u$ can be represented by the vector $M_u$:
$M_u = (TRR_u, LFTR_u, TRCE_u, TSOF_u, STR_u)$ (1)
where $TRR_u$, $LFTR_u$, $TRCE_u$, $TSOF_u$, and $STR_u$ are five metrics of motivation orientation, whose meaning and calculation are as follows:
A) Average label rate of the user's labeled resources (Tags/Resources Ratio, TRR)
The average label rate $TRR_u$ measures the average number of labels the user uses per labeled resource. It equals the ratio of the size of the user's vocabulary to the total number of resources the user has labeled, as in formula (2).
$TRR_u = e-|T_u|/|R_u|$ (2)
A user with description motivation orientation chooses a variety of words to describe a resource and is, in theory, not limited in their number. A user with classification motivation orientation, labeling for the sake of browsing, tends to choose fewer words, so the vocabulary is limited. In general, users with classification motivation orientation score clearly lower on this metric than users with description motivation orientation. That is, the smaller $TRR_u$ is, the more the user is likely to lean toward classification; the larger $TRR_u$ is, the more the user is likely to lean toward description.
B) The user's low-frequency label usage rate (Lower Frequency Tag Ratio, LFTR)
To characterize how labels are used, $LFTR_u$ describes the degree to which the user uses low-frequency labels. It equals the ratio of the number of low-frequency labels to the total number of labels in the user's vocabulary. A low-frequency label is one that the user rarely uses when labeling resources, that is, a label with very few uses. The metric is calculated with formula (3):
$LFTR_u = \frac{|\{t \in T_u : |R(t)| \le n\}|}{|T_u|}, \quad n = \lceil |R(t_{max})| / p \rceil$ (3)
where $t$ denotes any label; $t_{max}$ is the label this user uses most frequently; $|R(t)|$ and $|R(t_{max})|$ are the numbers of resources labeled with $t$ and with $t_{max}$, respectively; $n$ is one $p$-th of the number of resources labeled with $t_{max}$, rounded up, with $0 < p \le 100$; and the numerator is the number of low-frequency labels of user $u$, that is, labels that each mark no more than $n$ resources.
Clearly $0 \le LFTR_u \le 1$. When $LFTR_u = 1$, every label of the user has been used no more than $n$ times; the labels describe resources from different angles and aspects, and the user does not mind using low-frequency words. Such a user can be regarded as having description motivation orientation. When $LFTR_u = 0$, the user hardly uses low-frequency labels and considers them unhelpful for browsing by category: introducing a low-frequency label amounts to introducing noise and undermines the consistency of the classification, so the user is careful to avoid low-frequency words. Such a user can be regarded as having classification motivation orientation.
C) Relative conditional entropy of the user's labels (Tag Relative Conditional Entropy, TRCE)
Users with classification motivation orientation want labels to be maximally discriminative, because only then is browsing most efficient. The user's choice of labels can therefore be likened to a process of information encoding. Efficient encoding maximizes the information entropy of the code, and choosing discriminative, useful labels likewise means maximizing the information entropy of the labels. In other words, a user with classification orientation aims for all labels to be used equally often, which maximizes the information entropy of the labels and best supports browsing. Users with description motivation orientation, by contrast, have no interest in this.
When the user encodes resources with labels, the conditional entropy $H_u(R|T)$ reflects the effectiveness of this encoding process and can be calculated according to formula (4):
where $p(r,t)$ is the joint probability of label $t$ and resource $r$, and $p(r|t)$ is the probability that label $t$ marks resource $r$.
In calculating label information entropy, labels and resources are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the resource that remains once the label is known, and it is affected mainly by the number of resources and the size of the vocabulary. To make users comparable, the conditional entropy is normalized: the actually observed conditional entropy $H_u(R|T)$ is compared with the ideal conditional entropy $H_{opt}(R|T)$. The ideal conditional entropy $H_{opt}(R|T)$ is obtained when every label is equally discriminative, that is, when $p(r|t)$ is the same for all labels; the conditional entropy is then at its maximum. The ideal conditional entropy can therefore serve as the normalization factor, and on this basis the relative conditional entropy of the labels is calculated as in formula (5).
Clearly $0 \le TRCE_u \le 1$. The closer $TRCE_u$ is to 0, the closer the user's conditional entropy is to the ideal case and the closer the labels are to an equiprobable distribution. The labels then discriminate strongly, and the user very probably has classification motivation orientation. Otherwise, the user very probably has description motivation orientation.
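As an illustration of the conditional entropy discussed above, the sketch below estimates $H(R|T)$ for one user from (resource, label) pairs. It uses the standard definition $H(R|T) = -\sum p(r,t) \log p(r|t)$. Estimating the probabilities from empirical counts and taking the logarithm to base 2 are assumptions, as are the function name and the sample data. The normalization against the ideal entropy is not shown.

```python
import math
from collections import Counter


def conditional_entropy(annotations):
    """Empirical conditional entropy H(R|T) of resources given labels.

    annotations: iterable of (resource, label) pairs for one user.
    Assumption: p(r, t) and p(r | t) are estimated from pair counts.
    """
    pairs = list(annotations)
    if not pairs:
        raise ValueError("the user has no annotations")
    total = len(pairs)
    label_counts = Counter(t for _, t in pairs)
    pair_counts = Counter(pairs)
    h = 0.0
    for (r, t), c in pair_counts.items():
        p_rt = c / total                 # joint probability p(r, t)
        p_r_given_t = c / label_counts[t]  # conditional probability p(r | t)
        h -= p_rt * math.log2(p_r_given_t)
    return h


# One label spread over four resources leaves two bits of uncertainty.
one_label = [("r1", "car"), ("r2", "car"), ("r3", "car"), ("r4", "car")]
# A distinct label per resource leaves no uncertainty about the resource.
unique_labels = [("r1", "a"), ("r2", "b"), ("r3", "c"), ("r4", "d")]
```

Under these assumptions `conditional_entropy(one_label)` evaluates to 2.0 and `conditional_entropy(unique_labels)` to 0.0, since in the second case each label identifies its resource exactly.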
D) Semantic overlap factor of the user's labels (Tag Semantic Overlap Factor, TSOF)
A user with classification motivation orientation wants as few synonyms as possible in the vocabulary, since this makes browsing more efficient. A user with description motivation orientation does not care about this; on the contrary, the more completely the resource is described the better. The user's motivation orientation can therefore be measured by the similarity among the labels the user uses, as in formula (6).
where $sim(t_i, t_j)$ is the similarity between two labels $t_i$ and $t_j$, calculated with formula (7). In that formula, $f(t_i)$ and $f(t_j)$ are the numbers of times labels $t_i$ and $t_j$ occur in the label set of user $u$, and $f(t_i, t_j)$ is the number of times they occur together in the user's label set.
where $N$ is the total number of words in the user's label set.
When the similarity among all of the user's labels approaches 0, $TSOF_u$ approaches 0, which indicates that the user's motivation orientation is classification orientation; otherwise, the user's motivation orientation is description orientation.
E) The user's special label usage rate (Special Tags Ratio, STR)
Statistical analysis of labels in social tagging systems shows that when a user labels with interrogative adverbs such as when, what, or how, the user's remaining labels tend to be taken from the title of the resource; comparative analysis shows that such users clearly intend to describe the page content (see Fig. 2). These interrogative-adverb labels are defined as special labels. As Fig. 3 shows, the motivation of these users when labeling other resources also leans toward description: for example, the other labeling records of user "breneaux", who appears in the first record, clearly exhibit other features of description motivation orientation, such as the semantic overlap factor of labels.
The usage rate of special words when labeling resources can therefore serve as one indicator for discriminating the user's motivation orientation. The higher a user's special-word usage rate, the stronger that user's description motivation orientation; the lower the rate, the weaker it is. The special label usage rate is measured with formula (8).
$STR_u = card(t \in T_{str}) / |T_u|$ (8)
where $T_{str} = \{who, when, what, where, how, howto, \ldots\}$ is the set of special labels, which can be set to the set of all interrogative adverbs in English, and $card(t \in T_{str})$ is the number of labels used by user $u$ that belong to $T_{str}$, counting repetitions. Clearly $0 \le STR_u \le 1$. The closer $STR_u$ is to 1, the more likely user $u$ has description motivation orientation; the closer $STR_u$ is to 0, the more likely user $u$ has classification motivation orientation.
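The special label usage rate can be sketched as a simple counting function. The sketch assumes the user's label uses are given as a flat list with repetitions. It counts repetitions in both the numerator and the denominator so that the value stays within [0, 1]; treating the denominator this way is an assumption, as are the function name, the lowercase matching, and the six-word special set, which is only an illustrative subset of English interrogative adverbs.

```python
# Illustrative subset of special labels; the full set is left open in the text.
SPECIAL_LABELS = frozenset({"who", "when", "what", "where", "how", "howto"})


def special_label_ratio(label_uses, special=SPECIAL_LABELS):
    """Share of label uses that are special (interrogative) labels.

    label_uses: list of labels as used by one user, repetitions included.
    """
    uses = [t.lower() for t in label_uses]
    if not uses:
        raise ValueError("the user has no label uses")
    hits = sum(1 for t in uses if t in special)
    return hits / len(uses)


print(special_label_ratio(["howto", "python", "howto", "what"]))  # prints: 0.75
print(special_label_ratio(["news", "sports"]))                    # prints: 0.0
```

A value near 1, as for the first hypothetical user, points toward description orientation; a value near 0 points toward classification orientation.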
The motivation orientation of each labeled resource can likewise be represented by a vector of five motivation metrics. For a resource $r$, the motivation orientation can be represented by the vector $M_r$:
$M_r = (TRR_r, LFTR_r, TRCE_r, TSOF_r, STR_r)$ (9)
where $TRR_r$, $LFTR_r$, $TRCE_r$, $TSOF_r$, and $STR_r$ are five metrics of the resource's motivation orientation, whose meaning and calculation are as follows:
A) Average per-user label rate of a labeled resource (Tags/Resources Ratio, TRR)
The average per-user label rate $TRR_r$ measures the average number of labels used by each user who has labeled the resource. It equals the ratio of the number of all labels used on the resource to the number of users who labeled it, as in formula (10).
$TRR_r = e-|T_r|/|U_r|$ (10)
where $|T_r|$ is the number of all labels assigned to resource $r$ by all users, and $|U_r|$ is the number of users who assigned labels to resource $r$. In general, resources with classification motivation orientation score clearly lower on this metric than resources with description motivation orientation. That is, the smaller $TRR_r$ is, the more the resource is likely to lean toward classification; the larger $TRR_r$ is, the more the resource is likely to lean toward description.
B) Low-frequency label usage rate of a labeled resource (Lower Frequency Tag Ratio, LFTR)
To characterize how labels are used, $LFTR_r$ describes the degree to which low-frequency labels are assigned to the resource. It equals the ratio of the number of low-frequency labels to the total number of labels in the vocabulary assigned to this resource. A low-frequency label is one that is rarely assigned to this resource. The metric is calculated with formula (11):
$LFTR_r = \frac{|\{t \in T_r : |R_u(t)| \le m\}|}{|T_r|}, \quad m = \lceil |R_u(t'_{max})| / q \rceil$ (11)
where $t$ denotes any label of resource $r$; $t'_{max}$ is the label most frequently used on resource $r$; $|R_u(t)|$ is the number of users who used label $t$, and $|R_u(t'_{max})|$ is the number of users who used $t'_{max}$; $m$ is one $q$-th of the number of users of $t'_{max}$, rounded up, with $0 < q \le 100$; and the numerator is the number of low-frequency labels of resource $r$, that is, labels used by no more than $m$ users.
C) Relative conditional entropy of the labels of a labeled resource (Tag Relative Conditional Entropy, TRCE)
For a resource with classification motivation orientation, the labels assigned to it should be maximally discriminative, because only then is browsing most efficient. Assigning labels to a resource can therefore be likened to a process of information encoding. Efficient encoding maximizes the information entropy of the code, and choosing discriminative labels likewise means maximizing the information entropy of the labels. When users encode the resource with labels, the conditional entropy $H_r(U|T)$ reflects the effectiveness of this encoding process and can be calculated according to formula (12):
where $p(u,t)$ is the joint probability that user $u$ uses label $t$, and $p(u|t)$ is the probability that user $u$ uses label $t$ to label resource $r$. The ideal conditional entropy $H_{ropt}(U|T)$ is obtained when every label is equally discriminative, that is, when $p(u|t)$ is the same for all labels; the conditional entropy is then at its maximum.
In calculating label information entropy, labels and users are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the user that remains once the label is known, and it is affected mainly by the number of users and the size of the vocabulary. To make resources comparable, the conditional entropy is normalized: the actually observed conditional entropy $H_r(U|T)$ is compared with the ideal conditional entropy $H_{ropt}(U|T)$. The ideal conditional entropy can therefore serve as the normalization factor, and on this basis the relative conditional entropy of the labels is calculated as in formula (13).
D) Semantic overlap factor of the labels of a labeled resource (Tag Semantic Overlap Factor, TSOF)
For a resource with classification motivation orientation, the vocabulary should contain as few synonyms as possible, since this makes browsing more efficient. For a resource with description motivation orientation the opposite holds: the more completely the labels describe the resource the better. The motivation orientation of a resource can therefore be measured by the similarity among the labels assigned to it, as in formula (14).
where $sim(t_i, t_j)'$ is the similarity between two labels $t_i$ and $t_j$, calculated with formula (15). In that formula, $f(t_i)'$ and $f(t_j)'$ are the numbers of times labels $t_i$ and $t_j$ occur in the label set of resource $r$, and $f(t_i, t_j)'$ is the number of times the two labels occur together in the label set of resource $r$.
where $N'$ is the total number of words in the label set of resource $r$.
E) Special label usage rate of a labeled resource (Special Tags Ratio, STR)
Special labels are defined for resources in the same way as for users. The usage rate of special words can likewise serve as one indicator for discriminating the labeling motivation orientation of a resource. The higher the usage rate of special words among the labels assigned to a resource, the stronger its description motivation orientation; otherwise, its classification motivation orientation is stronger. The special label usage rate is measured with formula (16).
$STR_r = card(t \in T_{str})' / |T_r|$ (16)
where $T_{str} = \{who, when, what, where, how, howto, \ldots\}$ is the set of special labels, which can be set to the set of all interrogative adverbs in English, and $card(t \in T_{str})'$ is the number of labels used on resource $r$ that belong to $T_{str}$, counting repetitions. Clearly $0 \le STR_r \le 1$. The closer $STR_r$ is to 1, the more likely resource $r$ has description motivation orientation; the closer $STR_r$ is to 0, the more likely resource $r$ has classification motivation orientation.
Likewise, the motivation orientation of the resource to be labeled can be expressed as a vector of five motivation metrics. These five metrics are calculated in the same way as the motivation orientation of a labeled resource.
(2) Select, from the labeled resources, the resources whose motivation tendency is similar to that of the resource to be labeled, obtaining the non-user-dependent similar resources. That is, compute the similarity between the motivation tendency of each labeled resource and that of the resource to be labeled, select the labeled resources whose similarity exceeds the threshold α, and combine the selected resources into the non-user-dependent similar resources $R_{sim}$.
Specifically, when labeling a resource, users tend to reuse labels they have used before. Resources similar to the resource to be labeled are therefore found first, and the degree of match between these resources and the user's current motivation tendency is then computed. Using the labels of the well-matched resources as the recommendation candidate set yields labels that meet the user's intent. The similarity between the motivation tendency of the resource to be labeled and that of a resource the user has labeled can be computed with the vector-space cosine method, as in formula (18).
Here, $M_r$ is the motivation tendency vector of a resource the user has labeled, and the other operand is the motivation tendency vector of the resource to be labeled. A threshold α is set as a control factor (α ranges from 0 to 1 and can be determined experimentally; 0.6 is suggested). A similarity greater than or equal to α means that the labeled resource and the resource to be labeled have highly similar motivation tendencies. These highly similar labeled resources are combined, and $R_{sim}$ denotes the resulting non-user-dependent similar resources.
The similarity can also be computed with other methods, such as mutual information or Pearson similarity.
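The selection in step (2) can be sketched in Python as follows, assuming each motivation tendency is stored as a sequence of five metric values. The names `cosine` and `select_similar` and the resource identifiers are hypothetical. The comparison uses "greater than or equal to α", following the description; the claims state "greater than", so the operator may need adjusting.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_similar(target: Sequence[float],
                   labeled: Dict[str, Sequence[float]],
                   alpha: float = 0.6) -> List[str]:
    """Return the ids of labeled resources whose motivation tendency vector
    has cosine similarity >= alpha with the vector of the resource to be labeled."""
    return [rid for rid, vec in labeled.items() if cosine(vec, target) >= alpha]
```

The same function could be reused for step (3) by passing the user's motivation tendency vector as `target`, the vectors of the resources in $R_{sim}$ as `labeled`, and β as the threshold.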
(3) Select, from the non-user-dependent similar resources $R_{sim}$, the resources whose motivation tendency is similar to the user's motivation tendency, obtaining the label recommendation candidate resources $R_{cad}$.
Formula (19) is used to compute the similarity between the motivation tendency of each resource in $R_{sim}$ and the user's motivation tendency. Here, $M_r$ is the motivation tendency vector of a resource that the user has labeled and whose motivation tendency is similar to that of the resource to be labeled, and $M_u$ is the motivation tendency vector of the user. A threshold β is set as a control factor (β ranges from 0 to 1 and can be determined experimentally; 0.6 is suggested). A similarity greater than or equal to β indicates that the labels of the resource can be recommended as labels that meet the user's intent. The non-user-dependent similar resources whose similarity exceeds β are selected as the label recommendation candidate resources, denoted $R_{cad}$.
(4) Merge all labels of the label recommendation candidate resources $R_{cad}$ to obtain the merged label set; that is, merge all labels of each resource in $R_{cad}$ according to formula (20).
(5) Compute the recommendation importance of each label in the merged label set according to formula (21).
Here, p(w) is the content importance, within the resource to be recommended, of each word w of that resource, computed according to formula (22); s(w, t) is the correlation between word w and label t in the merged label set, computed according to formula (23).
The quantities used in these formulas are: the number of times word w occurs in the resource; the number of times label t occurs in the resource; the number of times word w and label t occur together in the resource; the number of all words in the resource; the total number of words contained in all label recommendation candidate resources; and $|R_{cad}(w)|$, the number of label recommendation candidate resources that contain word w. The word w here refers to an English word.
(6) Recommend the labels in descending order of their recommendation importance p(t|r).
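Steps (4) to (6) can be sketched in Python as follows. The sketch does not implement formulas (21) to (23); it takes the recommendation importance as a caller-supplied function. The names `merge_labels` and `recommend`, the `top_n` cut-off, and the alphabetical tie-break between equally important labels are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Set


def merge_labels(candidates: Dict[str, Iterable[str]]) -> Set[str]:
    """Step (4): union of the labels of all candidate resources."""
    merged: Set[str] = set()
    for labels in candidates.values():
        merged.update(labels)
    return merged


def recommend(candidates: Dict[str, Iterable[str]],
              importance: Callable[[str], float],
              top_n: int = 5) -> List[str]:
    """Steps (5) and (6): score every merged label with the supplied
    importance function and return the labels in descending order of score."""
    merged = merge_labels(candidates)
    # Sort by descending importance; ties are broken alphabetically (assumption).
    ranked = sorted(merged, key=lambda label: (-importance(label), label))
    return ranked[:top_n]
```

A caller would pass a function that returns p(t|r) for each label t, for instance the `get` method of a dictionary of precomputed scores.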
The present invention also provides a label recommendation system based on user motivation tendency. As shown in Figure 4, it comprises a motivation tendency calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600);
The motivation tendency calculation module (100) calculates the user's motivation tendency, the motivation tendency of each labeled resource, and the motivation tendency of the resource to be labeled;
The non-user-dependent similar resource selection module (200) selects, from the labeled resources, the resources whose motivation tendency is similar to that of the resource to be labeled, obtaining the non-user-dependent similar resources;
The label recommendation candidate resource selection module (300) selects, from the non-user-dependent similar resources, the resources whose motivation tendency is similar to the user's motivation tendency, obtaining the label recommendation candidate resources;
The label merging module (400) merges all labels of the label recommendation candidate resources, obtaining the merged label set;
The recommendation importance calculation module (500) calculates the recommendation importance of each label in the merged label set;
The output module (600) recommends labels in descending order of their recommendation importance.
The present invention is not limited to the embodiments described above. Based on the content disclosed herein, those skilled in the art can implement the invention in many other embodiments; therefore, any implementation that adopts the design structure and ideas of the present invention falls within its scope of protection.
Claims (6)
1. A label recommendation method based on user motivation tendency, comprising the following steps:
(1) according to a user triple, calculating the user's motivation tendency, the motivation tendency of each labeled resource, and the motivation tendency of the resource to be labeled; the user triple comprises the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels;
(2) selecting, from the labeled resources, the resources whose motivation tendency is similar to that of the resource to be labeled, the resources obtained being called non-user-dependent similar resources;
(3) selecting, from the non-user-dependent similar resources, the resources whose motivation tendency is similar to the user's motivation tendency, the resources obtained being called label recommendation candidate resources;
(4) merging all labels of the label recommendation candidate resources to obtain a merged label set;
(5) calculating the recommendation importance of each label in the merged label set;
(6) recommending labels in descending order of their recommendation importance.
2. The label recommendation method according to claim 1, characterized in that, in step (1), the motivation tendency of user u is $M_u = (TRR_u, LFTU_u, TRCE_u, TSOF_u, STR_u)$, where $TRR_u$, $LFTU_u$, $TRCE_u$, $TSOF_u$ and $STR_u$ are the motivation tendency metrics of user u, each computed as follows:
(a) $TRR_u = e^{-|T_u|/|R_u|}$;
wherein $T_u$ denotes the set of all labels used by user u, $|T_u|$ denotes the number of labels in the set $T_u$, $R_u$ denotes all resources labeled by user u, and $|R_u|$ denotes the number of resources in the set $R_u$;
(b) $LFTU_u$: wherein t denotes any label, $t_{max}$ is the label most used by user u, $|R(t)|$ is the number of resources containing label t, $|R(t_{max})|$ is the number of resources containing the most frequent label $t_{max}$, n is one p-th of the number of resources containing the most frequent label $t_{max}$, rounded up, with $0 < p \le 100$, and the remaining two quantities are the set of low-frequency labels of user u and the number of low-frequency labels of user u;
(c) $TRCE_u$: wherein $p(r, t)$ is the joint probability of label t on resource r, $p(r|t)$ is the probability that label t labels resource r, R denotes the set of all resources labeled by user u, T denotes the set of labels given to all resources labeled by user u, and $H_{opt}(R|T)$ is the value of $H_u(R|T)$ when $p(r|t)$ is identical for all labels;
(d) $TSOF_u$: wherein $\mathrm{sim}(t_i, t_j)$ denotes the similarity between two labels $t_i$ and $t_j$, $f(t_i)$ and $f(t_j)$ are the numbers of times labels $t_i$ and $t_j$ occur in the label set of user u, $f(t_i, t_j)$ is the number of times the two labels $t_i$ and $t_j$ occur together in the label set of user u, and N is the total number of words in the label set of user u;
(e) $STR_u = \mathrm{card}(t \in T_{str}) / |T_u|$;
wherein $T_{str}$ is the special label set, and $\mathrm{card}(t \in T_{str})$ is the number of labels used by user u that are contained in $T_{str}$, counting repeats;
the motivation tendency of any resource r among the labeled resources and the resource to be labeled in step (1) is $M_r = (TRR_r, LFTU_r, TRCE_r, TSOF_r, STR_r)$, where $TRR_r$, $LFTU_r$, $TRCE_r$, $TSOF_r$ and $STR_r$ are the motivation tendency metrics of resource r, each computed as follows:
(a') $TRR_r = e^{-|T_r|/|U_r|}$;
wherein $|T_r|$ denotes the number of all labels given to resource r by all users, and $|U_r|$ denotes the number of all users who have given labels to resource r;
(b') $LFTU_r$: wherein t denotes any label of resource r, $t_{max}'$ is the most used label of resource r, $|R_u(t)|$ is the number of users who have used label t, $|R_u(t_{max}')|$ is the number of users who have used the most frequent label $t_{max}'$, m is one q-th of the number of users of the most used label $t_{max}'$, rounded up, with $0 < q \le 100$, and the remaining two quantities are the set of low-frequency labels of resource r and the number of low-frequency labels of resource r;
(c') $TRCE_r$: wherein $p(u, t)$ is the joint probability of user u using label t, $p(u|t)$ is the probability that user u uses label t to label resource r, U denotes the set of all users who have labeled resource r, and $H_{ropt}(R|T)$ is the value of $H_r(R|T)$ when $p(u|t)$ is identical for all labels;
(d') $TSOF_r$: wherein $\mathrm{sim}(t_i, t_j)'$ is the similarity between two labels $t_i$ and $t_j$, $f(t_i)'$ and $f(t_j)'$ are the numbers of times labels $t_i$ and $t_j$ occur in the label set of resource r, $f(t_i, t_j)'$ is the number of times the two labels $t_i$ and $t_j$ occur together in the label set of resource r, and $N'$ is the total number of words in the label set of resource r;
(e') $STR_r = \mathrm{card}(t \in T_{str})' / |T_r|$;
wherein $T_{str}$ is the special label set, and $\mathrm{card}(t \in T_{str})'$ is the number of labels used on resource r that are contained in $T_{str}$, counting repeats.
3. The label recommendation method according to claim 1 or 2, characterized in that the non-user-dependent similar resources in step (2) are obtained as follows:
(3.1) computing the similarity between the motivation tendency of each labeled resource and the motivation tendency of the resource to be labeled;
(3.2) selecting the labeled resources whose similarity is greater than a threshold α, thereby obtaining the non-user-dependent similar resources, where 0 < α < 1.
4. The label recommendation method according to claim 1 or 2, characterized in that the label recommendation candidate resources in step (3) are obtained as follows:
(4.1) computing the similarity between the motivation tendency of each resource in the non-user-dependent similar resources and the user's motivation tendency;
(4.2) selecting the non-user-dependent similar resources whose similarity is greater than a threshold β, thereby obtaining the label recommendation candidate resources, where 0 < β < 1.
5. The label recommendation method according to claim 1 or 2, characterized in that the recommendation importance of each label in the merged label set in step (5) is computed as follows:
(5.1) computing the content importance p(w), within the resource to be labeled, of each word w of the resource to be labeled,
wherein the quantities used are: the number of times word w occurs in the resource to be labeled; the number of all words in the resource to be labeled; the total number of words contained in all label recommendation candidate resources; and $|R_{cad}(w)|$, the number of label recommendation candidate resources that contain word w;
wherein the further quantities used are: the number of times label t occurs in the resource to be labeled; and the number of times word w and label t occur together in the resource to be labeled;
6. A label recommendation system based on user motivation tendency, comprising a motivation tendency calculation module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance calculation module (500) and an output module (600);
the motivation tendency calculation module (100) is used to calculate the user's motivation tendency, the motivation tendency of each labeled resource, and the motivation tendency of the resource to be labeled;
the non-user-dependent similar resource selection module (200) is used to select, from the labeled resources, the resources whose motivation tendency is similar to that of the resource to be labeled, obtaining the non-user-dependent similar resources;
the label recommendation candidate resource selection module (300) is used to select, from the non-user-dependent similar resources, the resources whose motivation tendency is similar to the user's motivation tendency, obtaining the label recommendation candidate resources;
the label merging module (400) is used to merge all labels of the label recommendation candidate resources, obtaining the merged label set;
the recommendation importance calculation module (500) is used to calculate the recommendation importance of each label in the merged label set;
the output module (600) is used to recommend labels in descending order of their recommendation importance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110154353 CN102262653B (en) | 2011-06-09 | 2011-06-09 | Label recommendation method and system based on user motivation orientation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102262653A (en) | 2011-11-30 |
CN102262653B (en) | 2013-09-18 |
Family
ID=45009282
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110154353 (Expired - Fee Related) CN102262653B (en) | 2011-06-09 | 2011-06-09 | Label recommendation method and system based on user motivation orientation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102262653B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164463A (en) * | 2011-12-16 | 2013-06-19 | 国际商业机器公司 | Method and device for recommending labels |
CN103544510A (en) * | 2013-09-30 | 2014-01-29 | 小米科技有限责任公司 | Information processing method, information processing device and mobile terminal |
CN104199838A (en) * | 2014-08-04 | 2014-12-10 | 浙江工商大学 | User model building method based on label disambiguation |
CN104216881A (en) * | 2013-05-29 | 2014-12-17 | 腾讯科技(深圳)有限公司 | Method and device for recommending individual labels |
CN105989018A (en) * | 2015-01-29 | 2016-10-05 | 深圳市腾讯计算机系统有限公司 | Label generation method and label generation device |
CN107341242A (en) * | 2017-07-06 | 2017-11-10 | 太原理工大学 | A kind of label recommendation method and system |
CN107833082A (en) * | 2017-09-15 | 2018-03-23 | 广州唯品会研究院有限公司 | A kind of recommendation method and apparatus of commodity picture |
CN108334625A (en) * | 2018-02-09 | 2018-07-27 | 深圳壹账通智能科技有限公司 | Processing method, device, computer equipment and the storage medium of user information |
CN108415971A (en) * | 2018-02-08 | 2018-08-17 | 兰州智豆信息科技有限公司 | Recommend the method and apparatus of supply-demand information using knowledge mapping |
CN111221644A (en) * | 2018-11-27 | 2020-06-02 | 阿里巴巴集团控股有限公司 | Resource scheduling method, device and equipment |
CN112966682A (en) * | 2021-05-18 | 2021-06-15 | 江苏联著实业股份有限公司 | File classification method and system based on semantic analysis |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6655963B1 (en) * | 2000-07-31 | 2003-12-02 | Microsoft Corporation | Methods and apparatus for predicting and selectively collecting preferences based on personality diagnosis |
CN101751448A (en) * | 2009-07-22 | 2010-06-23 | 中国科学院自动化研究所 | Commendation method of personalized resource information based on scene information |
CN102004774A (en) * | 2010-11-16 | 2011-04-06 | 清华大学 | Personalized user tag modeling and recommendation method based on unified probability model |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164463A (en) * | 2011-12-16 | 2013-06-19 | 国际商业机器公司 | Method and device for recommending labels |
US9134957B2 (en) | 2011-12-16 | 2015-09-15 | International Business Machines Corporation | Recommending tags based on user ratings |
CN103164463B (en) * | 2011-12-16 | 2017-03-22 | 国际商业机器公司 | Method and device for recommending labels |
CN104216881A (en) * | 2013-05-29 | 2014-12-17 | 腾讯科技(深圳)有限公司 | Method and device for recommending individual labels |
CN103544510B (en) * | 2013-09-30 | 2016-10-26 | 小米科技有限责任公司 | Information processing method, device and mobile terminal |
CN103544510A (en) * | 2013-09-30 | 2014-01-29 | 小米科技有限责任公司 | Information processing method, information processing device and mobile terminal |
CN104199838B (en) * | 2014-08-04 | 2017-09-29 | 浙江工商大学 | A kind of user model constructing method based on label disambiguation |
CN104199838A (en) * | 2014-08-04 | 2014-12-10 | 浙江工商大学 | User model building method based on label disambiguation |
CN105989018A (en) * | 2015-01-29 | 2016-10-05 | 深圳市腾讯计算机系统有限公司 | Label generation method and label generation device |
CN105989018B (en) * | 2015-01-29 | 2020-04-21 | 深圳市腾讯计算机系统有限公司 | Label generation method and label generation device |
CN107341242A (en) * | 2017-07-06 | 2017-11-10 | 太原理工大学 | A kind of label recommendation method and system |
CN107833082A (en) * | 2017-09-15 | 2018-03-23 | 广州唯品会研究院有限公司 | A kind of recommendation method and apparatus of commodity picture |
CN108415971A (en) * | 2018-02-08 | 2018-08-17 | 兰州智豆信息科技有限公司 | Recommend the method and apparatus of supply-demand information using knowledge mapping |
CN108415971B (en) * | 2018-02-08 | 2021-07-23 | 兰州智豆信息科技有限公司 | Method and device for recommending supply and demand information by using knowledge graph |
CN108334625A (en) * | 2018-02-09 | 2018-07-27 | 深圳壹账通智能科技有限公司 | Processing method, device, computer equipment and the storage medium of user information |
CN108334625B (en) * | 2018-02-09 | 2020-05-29 | 深圳壹账通智能科技有限公司 | User information processing method and device, computer equipment and storage medium |
CN111221644A (en) * | 2018-11-27 | 2020-06-02 | 阿里巴巴集团控股有限公司 | Resource scheduling method, device and equipment |
CN111221644B (en) * | 2018-11-27 | 2023-06-13 | 阿里巴巴集团控股有限公司 | Resource scheduling method, device and equipment |
CN112966682A (en) * | 2021-05-18 | 2021-06-15 | 江苏联著实业股份有限公司 | File classification method and system based on semantic analysis |
Also Published As
Publication number | Publication date |
---|---|
CN102262653B (en) | 2013-09-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130918; Termination date: 20140609 |