CN102262653A - Label recommendation method and system based on user motivation orientation - Google Patents

Publication number
CN102262653A
Authority
CN
China
Prior art keywords
resource
label
user
motivation
marked
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110154353
Other languages
Chinese (zh)
Other versions
CN102262653B (en)
Inventor
李瑞轩
靳延安
文坤梅
辜希武
李玉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN 201110154353
Publication of CN102262653A
Application granted
Publication of CN102262653B
Legal status: Expired - Fee Related

Abstract

The invention provides a label recommendation method based on user motivation orientation. The method comprises the following steps: calculating, from the user triples, the motivation orientation of the user, of each labeled resource, and of the resource to be labeled; selecting from the labeled resources those whose motivation orientation is similar to that of the resource to be labeled, to obtain user-independent similar resources; selecting from the user-independent similar resources those whose motivation orientation is similar to the user's, to obtain label recommendation candidate resources; merging all labels of the candidate resources to obtain a merged label set; calculating the recommendation importance of each label in the merged label set; and finally recommending labels in descending order of recommendation importance. The method can identify the user's motivation for labeling network information resources and then recommend to the user a list of multiple labels that matches the user's intent. The invention also provides a label recommendation system based on the method.

Description

Label recommendation method and system based on user motivation orientation
Technical field
The invention belongs to the field of Web information resource processing and utilization. It relates specifically to a method for recommending labels for Web information resources based on user motivation orientation, and to a recommendation system based on this method.
Background art
With the continued growth of the Internet, network information resources are increasing at an unprecedented rate, and the emergence of Web 2.0 has accelerated this growth further. In Web 2.0, the Internet has shifted from a top-down model, centrally controlled and dominated by a small number of resource owners, to a bottom-up model led by the collective wisdom and strength of its users. Users are no longer only viewers of network information resources but also their producers. Although user-generated content enriches the sources of information and speeds up its diffusion, it also causes information overload, a heavier search burden, and lower information quality. How users can obtain suitable, high-quality information from the mass of network information resources quickly, cheaply, and effectively, and how those resources can be organized and managed, has therefore become a major research topic that cannot be avoided.
An ideal organization of network information resources should be user-centered and make full use of emerging technology and accumulated human experience, and its organizational scheme should be highly practical and easy to use. In the Web 2.0 environment, social tagging systems play an important role as a very effective way of organizing network information resources. Unlike a traditional top-down, rigid, controlled hierarchical classification system, a social tagging system has the following three advantages: (1) Social labels are generated by users when they label network resources; identical labels, once aggregated, form new categories, so the classification is bottom-up. (2) Social labels are not controlled by experts: users can label with any word they choose, which gives high flexibility, ease of use, and subjectivity, and a resource can belong "flexibly" to several popular categories. (3) In a social tagging system, users can label resources along multiple dimensions and at multiple levels, so the structure is non-hierarchical.
However, alongside these advantages, labeling also has shortcomings, mainly in two respects. (1) Most social tagging systems let users enter labels freely. This makes labeling easy for users to control, but the resulting arbitrariness introduces considerable noise: misspellings, ambiguity, and user-defined labels with no practical meaning are common, and they significantly reduce the usefulness of labels. For this reason, some social tagging systems have had to provide labeling guidelines for users. (2) Data sparsity. Label-based browsing is an emerging way of organizing information and is not yet widely used; for Chinese resources in particular, few network resources are organized this way. In addition, users are not yet accustomed to adding many labels to network resources, so labeled resources on the network remain scarce.
In recent years, driven by this practical need, label recommendation has attracted wide attention from academia and Internet companies. Label recommendation investigates, analyzes, and mines the explicit or implicit relationships among the content of network information resources and users' labeling histories, and on that basis offers a set of high-quality candidate labels for a resource to be labeled. Its main purposes are: (1) to simplify the labeling procedure for users, thereby increasing the usability and stickiness of the social tagging system; (2) to improve label quality by reducing misspellings and ambiguity, and so improve the effect of labels in organizing, retrieving, using, and discovering information resources; (3) to change the structure of the label space so that it stabilizes and converges faster, allowing semantics to emerge.
At present, a number of fairly mature social label recommendation systems exist at home and abroad for various kinds of network information resources, and they play an important role in organizing, retrieving, sharing, and discovering information resources. They include Amazon, which recommends labels for products; Delicious, for web pages; Flickr, for pictures; Bibsonomy, for scientific papers; Douban, for books and films; and Tudou, for shared videos. Existing label recommendation systems mainly adopt the recommendation techniques traditionally used in e-commerce systems: content-based recommendation, collaborative filtering, association-rule-based recommendation, and hybrids of these. As a basis for recommendation, these traditional techniques rely either on the content of the resource itself or on the user's labeling history. In terms of algorithms, most use data mining or machine learning. These traditional techniques have, to some extent, addressed information overload and the organization, classification, and retrieval of information resources, but their results are still far from ideal; in particular, they cannot recommend labels that satisfy the user's information needs.
Summary of the invention
To satisfy users' information needs, the invention starts from the user's motivation for using a social tagging system, identifies the user's information goal, and recommends more accurate social labels. It provides a label recommendation method based on user motivation orientation that recommends to the user a list of multiple labels matching the user's intent. The invention also provides a label recommendation system based on this method.
The invention is realized by the following technical scheme. The label recommendation method based on user motivation orientation comprises the following steps:
(1) According to the user triples, calculate the user's motivation orientation, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled. The user triples comprise the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels;
(2) From the labeled resources, select those whose motivation orientation is similar to that of the resource to be labeled; the resources obtained are called user-independent similar resources;
(3) From the user-independent similar resources, select those whose motivation orientation is similar to the user's; the resources obtained are called label recommendation candidate resources;
(4) Merge all labels of the label recommendation candidate resources to obtain a merged label set;
(5) Calculate the recommendation importance of each label in the merged label set;
(6) Recommend labels in descending order of recommendation importance.
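Steps (2) to (6) can be sketched as a small filtering pipeline. This is only an illustration under stated assumptions: the passage does not specify the similarity measure, the selection rule, or how recommendation importance is computed, so cosine similarity, a fixed `threshold`, and a count of candidate resources carrying each label are stand-ins, and the function names and the `labeled` layout are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(m_user, m_target, labeled, threshold=0.9, top_k=5):
    """labeled maps resource id -> (motivation vector, list of labels).

    The layout, the threshold rule, and the importance score are
    assumptions made for this sketch, not taken from the patent text.
    """
    # Step (2): labeled resources similar to the resource to be labeled
    similar = {r: v for r, v in labeled.items()
               if cosine(v[0], m_target) >= threshold}
    # Step (3): of those, resources similar to the user's orientation
    candidates = {r: v for r, v in similar.items()
                  if cosine(v[0], m_user) >= threshold}
    # Steps (4)-(5): merge labels; importance = number of candidate
    # resources that carry the label (assumed scoring)
    importance = Counter(label
                         for _, labels in candidates.values()
                         for label in set(labels))
    # Step (6): labels in descending order of importance
    return [label for label, _ in importance.most_common(top_k)]
```

In a real system the vectors would be the five-component motivation vectors defined later in the description; the sketch works for any vector length.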
The invention also provides a label recommendation system based on user motivation orientation, comprising a motivation orientation calculation module, a user-independent similar resource selection module, a label recommendation candidate resource selection module, a label merging module, a recommendation importance calculation module, and an output module;
The motivation orientation calculation module calculates the user's motivation orientation, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled;
The user-independent similar resource selection module selects, from the labeled resources, those whose motivation orientation is similar to that of the resource to be labeled, obtaining the user-independent similar resources;
The label recommendation candidate resource selection module selects, from the user-independent similar resources, those whose motivation orientation is similar to the user's, obtaining the label recommendation candidate resources;
The label merging module merges all labels of the label recommendation candidate resources to obtain the merged label set;
The recommendation importance calculation module calculates the recommendation importance of each label in the merged label set;
The output module recommends labels in descending order of recommendation importance.
Existing label recommendation methods in social tagging systems start from the content of the resource itself or from the co-occurrence structure of labels. The method proposed here instead starts directly from the user's relatively stable labeling motivation orientation: it obtains that orientation and recommends labels according to it, so that the recommended labels better match the user's intent and the recommendation is more effective. The invention can identify the user's motivation for labeling network information resources. Discovering this motivation provides a useful design reference for label recommendation systems, can guide ontology learning in the label space, and favors the stabilization of social label structure and the emergence of social label semantics.
Description of drawings
Fig. 1 is the flowchart of label recommendation based on user motivation orientation;
Fig. 2 is a schematic diagram of a special tag usage query according to the invention;
Fig. 3 is the label cloud of a description-oriented user according to the invention;
Fig. 4 is the module diagram of the label recommendation system of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings and examples.
The motivation orientation described in the invention is mainly of two kinds, classification motivation orientation and description motivation orientation; their characteristics are shown in Table 1.
Table 1. Characteristics of the classification and description motivation orientations

Characteristic                    | Classification orientation | Description orientation
Purpose                           | Later browsing             | Later query and retrieval
Labels per resource               | Low                        | High
Vocabulary size                   | Limited                    | Unlimited
Occurrence of synonyms            | Few                        | Many
Labels taken from resource title  | Few                        | Many
Cost of changing labels           | High                       | Low
Specifically, a user with a classification motivation orientation uses labels to provide a browsing aid for the labeled resources. Such a user therefore wants to build a stable vocabulary according to personal preference. For ease of browsing, the simpler and less redundant this vocabulary the better, so the user tends to avoid words with the same meaning and chooses words that are clear and easy to remember. For example, when labeling a car, the vocabulary of a classification-oriented user will often contain only "car", not equivalent words such as "automobile" or "vehicle". Judged by the labeling result, such a vocabulary resembles a semantic classification system. As with traditional classification systems, the cost of changing its categories is high.
A user with a description motivation orientation uses labels to describe the content of the labeled resource accurately, for later query and retrieval. To better support querying and browsing, such a user's vocabulary may include many uncommon words and synonyms; for example, when describing a car, "car", "automobile", and "vehicle" may all appear. In addition, the user often wants to describe a resource from many aspects and does not limit the number of words used. As the user's understanding develops during labeling, the word used for the same meaning may also change. The vocabulary of a description-oriented user is therefore open and dynamic.
The following symbols denote the parameters of a social tagging system: $u$ denotes a user; $r$ denotes a resource, such as a web page; $U$ denotes the set of all users $u$ who have labeled resource $r$; $R$ denotes the set of all resources labeled by users; $R_u$ denotes all resources labeled by user $u$, and $|R_u|$ the number of resources in $R_u$; $t$ denotes any label, and $t_1, t_2, \ldots, t_n$ denote specific labels; $T$ denotes the set of labels assigned by users to all labeled resources; $T_u$ denotes the set of all labels used by user $u$, and $|T_u|$ the number of labels in $T_u$; $T_r$ denotes all labels assigned to resource $r$ by all users, and $|T_r|$ the number of labels in $T_r$; $R_u(t)$ denotes the resources that user $u$ has labeled with label $t$.
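The sets in this notation can all be derived from a flat list of (user, resource, label) triples. The following is a minimal sketch of that indexing; the function name `index_triples`, the dictionary keys, and the per-resource user set `U_r` (which appears later in formula (10)) are naming choices made for illustration, not part of the original text.

```python
from collections import defaultdict

def index_triples(triples):
    """Index (user, resource, label) triples into the sets used in the text.

    Returns a dict of dicts of sets:
      T_u[u]       labels used by user u
      R_u[u]       resources labeled by user u
      T_r[r]       labels assigned to resource r by all users
      U_r[r]       users who labeled resource r
      R_ut[(u, t)] resources that user u labeled with label t, i.e. R_u(t)
    """
    idx = {name: defaultdict(set)
           for name in ("T_u", "R_u", "T_r", "U_r", "R_ut")}
    for u, r, t in triples:
        idx["T_u"][u].add(t)
        idx["R_u"][u].add(r)
        idx["T_r"][r].add(t)
        idx["U_r"][r].add(u)
        idx["R_ut"][(u, t)].add(r)
    return idx
```

Set sizes such as $|T_u|$ and $|R_u|$ are then simply `len(idx["T_u"][u])` and `len(idx["R_u"][u])`.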
The invention provides a label recommendation method based on user motivation orientation, comprising the following steps:
(1) According to the user triples, calculate the user's motivation orientation, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled. The user triples comprise the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels. The labels corresponding to a resource comprise the labels assigned to it by all users.
The motivation orientation of user $u$ can be represented by the vector $M_u$, that is:
$$M_u = (TRR_u, LFTR_u, TRCE_u, TSOF_u, STR_u) \qquad (1)$$
where $TRR_u$, $LFTR_u$, $TRCE_u$, $TSOF_u$, and $STR_u$ are the five metrics of motivation orientation, whose meaning and calculation are as follows:
A) Average label rate of the user's labeled resources (Tags/Resources Ratio, TRR)
$TRR_u$ measures the average number of labels the user uses per labeled resource. It equals the ratio of the size of the user's vocabulary to the total number of resources the user has labeled, as in formula (2).
TRR_u = e-|T_u|/|R_u| (2)
A description-oriented user, needing to describe resources, can choose a wide variety of words and is in theory not limited in number. A classification-oriented user, for browsing purposes, tends to choose fewer words, so the vocabulary is limited. In general, a classification-oriented user scores clearly lower on this metric than a description-oriented user. That is, the smaller $TRR_u$ is, the more the user tends toward classification; the larger $TRR_u$ is, the more the user tends toward description.
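As a rough illustration, the ratio described in the text (vocabulary size over number of labeled resources) can be computed as below. This sketch implements only that plain ratio; the leading "e-" that appears in formula (2) as extracted is not modeled, because its intended form is unclear from the text. The function name and the `(resource, label)` input layout are assumptions.

```python
def tags_resources_ratio(assignments):
    """Plain |T_u| / |R_u| for one user.

    assignments: iterable of (resource, label) pairs made by the user
    (hypothetical layout chosen for this sketch).
    """
    pairs = list(assignments)
    labels = {t for _, t in pairs}       # the user's vocabulary T_u
    resources = {r for r, _ in pairs}    # the user's labeled resources R_u
    if not resources:
        raise ValueError("user has no labeled resources")
    return len(labels) / len(resources)
```

A user who files every resource under one of a few fixed labels gets a small value, while a user who attaches several fresh words to each resource gets a large one, matching the interpretation given above.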
B) Low-frequency tag usage rate of the user (Lower Frequency Tag Ratio, LFTR)
To capture how labels are used, $LFTR_u$ characterizes the degree to which the user uses low-frequency tags. It equals the ratio of the number of low-frequency tags to the total number of labels in the user's vocabulary. A low-frequency tag is one the user seldom uses when labeling resources, that is, a label applied in very few annotations. The metric is calculated with formula (3):
$$LFTR_u = \frac{|\{t \in T_u : |R(t)| \le n\}|}{|T_u|}, \qquad n = \left\lceil \frac{|R(t_{max})|}{p} \right\rceil \qquad (3)$$
where $t$ denotes any label; $t_{max}$ is the label this user uses most often; $|R(t)|$ and $|R(t_{max})|$ are the numbers of resources bearing label $t$ and the most frequent label $t_{max}$ respectively; $n$ is one $p$-th of the number of resources bearing $t_{max}$, rounded up, with $0 < p \le 100$; the set $\{t \in T_u : |R(t)| \le n\}$ is the set of low-frequency tags of user $u$, that is, labels each applied to no more than $n$ resources; and its cardinality is the number of low-frequency tags of user $u$.
Clearly $0 \le LFTR_u \le 1$. When $LFTR_u = 1$, none of the user's labels is used more than $n$ times; these labels describe resources from different angles and aspects, and the user does not mind using low-frequency words. Such a user can be considered description-oriented. When $LFTR_u = 0$, the user seldom uses low-frequency tags, regarding them as unhelpful for classified browsing: introducing a low-frequency tag amounts to introducing noise and undermines the consistency of the classification, so this user takes care to avoid low-frequency words. Such a user can be considered classification-oriented.
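The low-frequency tag ratio can be sketched as follows. The code assumes one reading of the definitions above: the cutoff $n$ is the number of resources bearing the most frequent label divided by $p$ and rounded up, and a label is low-frequency when it is applied to at most $n$ resources. The function name and input layout are hypothetical.

```python
import math
from collections import defaultdict

def low_frequency_tag_ratio(assignments, p=100):
    """Share of a user's labels that are applied to at most n resources.

    assignments: iterable of (resource, label) pairs made by one user.
    p: divisor with 0 < p <= 100; n = ceil(|R(t_max)| / p) (assumed reading).
    """
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    resources_by_label = defaultdict(set)
    for r, t in assignments:
        resources_by_label[t].add(r)
    if not resources_by_label:
        raise ValueError("user has no labels")
    most = max(len(rs) for rs in resources_by_label.values())  # |R(t_max)|
    n = math.ceil(most / p)
    low = [t for t, rs in resources_by_label.items() if len(rs) <= n]
    return len(low) / len(resources_by_label)
```

The choice of `p` controls how strict the cutoff is: a large `p` keeps only the rarest labels in the low-frequency set, while `p = 1` makes every label count as low-frequency.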
C) Relative conditional entropy of the user's labels (Tag Relative Conditional Entropy, TRCE)
A classification-oriented user wants labels to have maximum discriminative power, because only then is browsing most efficient. The user's choice of labels can therefore be likened to an information encoding process. Efficient encoding maximizes the information entropy of the code, and choosing discriminative, useful labels likewise maximizes the information entropy of the labels. In other words, a classification-oriented user aims for all labels to be used with the same frequency, which maximizes label entropy and best supports browsing. A description-oriented user, by contrast, has no interest in this.
When the user encodes resources with labels, the conditional entropy $H_u(R|T)$ reflects the effectiveness of this encoding and is calculated by formula (4):
$$H_u(R|T) = -\sum_{r \in R} \sum_{t \in T} p(r,t) \log_2 p(r|t) \qquad (4)$$
where $p(r,t)$ is the joint probability of label $t$ and resource $r$, and $p(r|t)$ is the probability that label $t$ labels resource $r$.
In computing label entropy, labels and resources are treated as random variables. The conditional entropy can be interpreted as the uncertainty about the resource that remains once the label is known; it is mainly affected by the number of resources and the size of the vocabulary. To compare users, the conditional entropy is normalized: the observed conditional entropy $H_u(R|T)$ is compared with the ideal conditional entropy $H_{opt}(R|T)$. When every label is equally discriminative, that is, when $p(r|t)$ is the same for all labels, the ideal conditional entropy $H_{opt}(R|T)$ is obtained, at which point the conditional entropy is also at its maximum. The ideal conditional entropy is therefore used as the normalization factor, and the relative conditional entropy of the labels is calculated by formula (5):
$$TRCE_u = \frac{H_{opt}(R|T) - H_u(R|T)}{H_{opt}(R|T)} \qquad (5)$$
Clearly $0 \le TRCE_u \le 1$. The closer $TRCE_u$ is to 0, the closer the user's conditional entropy is to the ideal case and the closer the labels are to an equiprobable distribution. The labels then discriminate strongly, and the user is very likely classification-oriented. Otherwise the user is very likely description-oriented.
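Formulas (4) and (5) can be sketched with empirical probabilities. The estimates used here are assumptions: each distinct (resource, label) pair of the user is given joint probability $1/N$, and $p(r|t)$ is taken as one over the number of resources bearing $t$. The text does not spell out how $H_{opt}(R|T)$ is computed, so the second function takes it as an input rather than deriving it. Function names are hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(assignments):
    """Empirical H(R|T) per formula (4) for one user.

    assignments: iterable of (resource, label) pairs.
    Assumed estimates: p(r, t) = 1/N over the N distinct pairs,
    and p(r | t) = 1 / (number of resources bearing label t).
    """
    pairs = set(assignments)
    if not pairs:
        return 0.0
    n = len(pairs)
    per_label = Counter(t for _, t in pairs)
    return -sum((1 / n) * math.log2(1 / per_label[t]) for _, t in pairs)

def relative_conditional_entropy(h_observed, h_opt):
    """Formula (5): (H_opt - H_observed) / H_opt, with H_opt supplied by the caller."""
    if h_opt == 0:
        raise ValueError("h_opt must be non-zero")
    return (h_opt - h_observed) / h_opt
```

Keeping `h_opt` as a parameter leaves the choice of ideal distribution to the caller, which avoids committing this sketch to one particular construction of the equally discriminative case.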
D) Semantic overlap factor of the user's labels (Tag Semantic Overlap Factor, TSOF)
A classification-oriented user wants as few synonyms as possible in the vocabulary, which improves browsing efficiency. A description-oriented user does not care about this; on the contrary, the more comprehensively the resource is described the better. The user's motivation orientation can therefore be measured by the similarity of the labels the user uses, as in formula (6).
[Formula (6): equation image not reproduced in the source text]
where $sim(t_i, t_j)$ is the similarity between two labels $t_i$ and $t_j$, calculated by formula (7). In the formula, $f(t_i)$ and $f(t_j)$ are the numbers of times labels $t_i$ and $t_j$ occur in the label set of user $u$, and $f(t_i, t_j)$ is the number of times they occur together in the user's label set.
$$sim(t_i, t_j) = \frac{\max(\log f(t_i), \log f(t_j)) - \log f(t_i, t_j)}{\log N - \min(\log f(t_i), \log f(t_j))} \qquad (7)$$
where $N$ is the total number of words in the user's label set.
When the similarity of all of the user's labels approaches 0, $TSOF_u$ approaches 0, indicating that the user's motivation orientation is a classification orientation; otherwise, the user's motivation orientation is a description orientation.
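The pairwise measure of formula (7) can be sketched directly from occurrence counts. Only formula (7) is covered; how the pairwise values are aggregated into $TSOF_u$ in formula (6) is not recoverable from the text and is not attempted. The function name, argument names, and error handling are choices made for this sketch.

```python
import math

def label_similarity(f_i, f_j, f_ij, n_total):
    """sim(t_i, t_j) per formula (7).

    f_i, f_j : occurrence counts of the two labels in the user's label set
    f_ij     : number of times the two labels occur together
    n_total  : total number of words in the user's label set (N)
    """
    if min(f_i, f_j, f_ij) <= 0:
        raise ValueError("counts must be positive for the logarithms to exist")
    log_i, log_j = math.log(f_i), math.log(f_j)
    denominator = math.log(n_total) - min(log_i, log_j)
    if denominator == 0:
        raise ValueError("formula (7) is undefined when log N equals the smaller log count")
    return (max(log_i, log_j) - math.log(f_ij)) / denominator
```

The base of the logarithm cancels between numerator and denominator, so natural logarithms give the same result as any other base.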
E) Special tag usage rate of the user (Special Tags Ratio, STR)
Statistical analysis of labels in social tagging systems shows that when a user labels with interrogative words such as when, what, and how, the remaining labels tend to be taken from the title of the resource; comparative analysis shows that these users clearly intend to describe the page content (see Fig. 2). These interrogative labels are defined as special tags. Fig. 3 also shows that such users tend toward a description motivation when labeling other resources: for the user "breneaux" in the first record, the other labeling records clearly show further features of a description orientation (such as the tag semantic overlap factor).
The usage rate of special tags when labeling resources can therefore also serve as an indicator of the user's motivation orientation. The higher a user's special tag usage rate, the stronger the user's description orientation; the lower the rate, the weaker the description orientation. The special tag usage rate is measured by formula (8).
$$STR_u = card(t \in T_{str}) / |T_u| \qquad (8)$$
where $T_{str}$ = {who, when, what, where, how, howto, ...} is the special tag set, which can be set to the set of all interrogative words in English, and $card(t \in T_{str})$ is the number of labels used by user $u$ that belong to $T_{str}$, counted with repetition. Clearly $0 \le STR_u \le 1$. The closer $STR_u$ is to 1, the more likely user $u$ is description-oriented; the closer $STR_u$ is to 0, the more likely user $u$ is classification-oriented.
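A minimal sketch of the special tag ratio follows. Two points are assumptions rather than statements of the source: the special tag set below contains only the example words listed in the text, and the denominator is taken as the total number of label uses (with repetition), so that the ratio stays within the stated range of 0 to 1 when the numerator also counts repetitions.

```python
# Example special tags named in the text; a real deployment would extend this set.
SPECIAL_TAGS = frozenset({"who", "when", "what", "where", "how", "howto"})

def special_tag_ratio(label_uses, special=SPECIAL_TAGS):
    """Share of a user's label uses that are special (interrogative) tags.

    label_uses: list of labels as applied by the user, repetitions included.
    Assumption: both numerator and denominator count repeated uses.
    """
    uses = [t.lower() for t in label_uses]
    if not uses:
        raise ValueError("user has no label uses")
    return sum(t in special for t in uses) / len(uses)
```

If $|T_u|$ is instead read as the number of distinct labels, the denominator would be `len(set(uses))`, and the value could exceed 1 for users who repeat special tags often.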
The motivation orientation of each labeled resource can likewise be represented by a vector of the five motivation metrics. For a given resource $r$, its motivation orientation can be represented by the vector $M_r$, that is:
$$M_r = (TRR_r, LFTR_r, TRCE_r, TSOF_r, STR_r) \qquad (9)$$
where $TRR_r$, $LFTR_r$, $TRCE_r$, $TSOF_r$, and $STR_r$ are the five metrics of the resource's motivation orientation, whose meaning and calculation are as follows:
A) Average label rate per user of a labeled resource (Tags/Resources Ratio, TRR)
$TRR_r$ measures the average number of labels used by each user who has labeled the resource. It equals the ratio of the number of all labels applied to the resource to the number of users who labeled it, as in formula (10).
TRR_r = e-|T_r|/|U_r| (10)
where $|T_r|$ is the number of all labels assigned to resource $r$ by all users, and $|U_r|$ is the number of all users who assigned labels to resource $r$. In general, a classification-oriented resource scores clearly lower on this metric than a description-oriented resource. That is, the smaller $TRR_r$ is, the more the resource tends toward classification; the larger $TRR_r$ is, the more the resource tends toward description.
B) Low-frequency tag usage rate of a labeled resource (Lower Frequency Tag Ratio, LFTR)
To capture how labels are used, $LFTR_r$ characterizes the degree to which low-frequency tags are assigned to the resource. It equals the ratio of the number of low-frequency tags to the total number of labels in the vocabulary assigned to this resource. A low-frequency tag is one that is seldom assigned to this resource. The metric is calculated with formula (11):
$$LFTR_r = \frac{|\{t \in T_r : |R_u(t)| \le m\}|}{|T_r|}, \qquad m = \left\lceil \frac{|R_u(t_{max}')|}{q} \right\rceil \qquad (11)$$
where $t$ denotes any label of resource $r$; $t_{max}'$ is the label most frequently applied to resource $r$; $|R_u(t)|$ is the number of users who have used label $t$; $|R_u(t_{max}')|$ is the number of users who have used the most frequent label $t_{max}'$; $m$ is one $q$-th of the number of users of $t_{max}'$, rounded up, with $0 < q \le 100$; the set $\{t \in T_r : |R_u(t)| \le m\}$ is the set of low-frequency tags of resource $r$, that is, labels each used by no more than $m$ users; and its cardinality is the number of low-frequency tags of resource $r$.
C) Relative conditional entropy of the labels of a labeled resource (Tag Relative Conditional Entropy, TRCE)
For a classification-oriented resource, the labels assigned to it should have maximum discriminative power, because only then is browsing most efficient for users. Assigning labels to a resource can therefore be likened to an information encoding process. The most efficient encoding maximizes the information entropy of the code, and choosing discriminative labels maximizes the information entropy of the labels. When users encode the resource with labels, the conditional entropy $H_r(U|T)$ reflects the effectiveness of this encoding and is calculated by formula (12):
$$H_r(U|T) = -\sum_{u \in U} \sum_{t \in T} p(u,t) \log_2 p(u|t) \qquad (12)$$
where $p(u,t)$ is the joint probability of user $u$ using label $t$, and $p(u|t)$ is the probability that user $u$ uses label $t$ to label resource $r$. When every label is equally discriminative, that is, when $p(u|t)$ is the same for all labels, the ideal conditional entropy $H_{ropt}(U|T)$ is obtained, at which point the conditional entropy is also at its maximum.
In computing label entropy, labels and users are treated as random variables. The conditional entropy can be interpreted as the uncertainty in users' use of labels; it is mainly affected by the number of users and the size of the vocabulary. To compare resources, the conditional entropy is normalized: the observed conditional entropy $H_r(U|T)$ is compared with the ideal conditional entropy $H_{ropt}(U|T)$. The ideal conditional entropy is therefore used as the normalization factor, and the relative conditional entropy of the labels is calculated by formula (13):
$$TRCE_r = \frac{H_{ropt}(U|T) - H_r(U|T)}{H_{ropt}(U|T)} \qquad (13)$$
D) Semantic overlap factor of the labels of a labeled resource (Tag Semantic Overlap Factor, TSOF)
For a classification-oriented resource, the synonyms in its vocabulary should be as few as possible, which improves browsing efficiency. For a description-oriented resource the opposite holds: the more comprehensively the labels describe the resource, the better. The motivation orientation of a resource can therefore be measured by the similarity of the labels assigned to it, as in formula (14).
Figure BDA0000067195780000112
where $sim(t_i, t_j)'$ is the similarity between two labels $t_i$ and $t_j$, calculated by formula (15). In the formula, $f(t_i)'$ and $f(t_j)'$ are the numbers of times that labels $t_i$ and $t_j$, respectively, occur in the label set of resource $r$, and $f(t_i, t_j)'$ is the number of times that $t_i$ and $t_j$ occur together in the label set of resource $r$.
$$sim(t_i, t_j)' = \frac{\max(\log f(t_i)', \log f(t_j)') - \log f(t_i, t_j)'}{\log N' - \min(\log f(t_i)', \log f(t_j)')} \tag{15}$$
where $N'$ is the total number of words in the label set of resource $r$.
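Formula (15) can be evaluated directly from three counts and the vocabulary size. The sketch below is illustrative only: the function name is hypothetical, and raising an error for zero counts or a zero denominator is an assumption, since the text does not say how such cases are handled.

```python
import math

def label_similarity(f_i, f_j, f_ij, n_total):
    """Evaluate formula (15) from occurrence and co-occurrence counts."""
    if min(f_i, f_j, f_ij) <= 0:
        raise ValueError("all counts must be positive (assumption)")
    log_i, log_j = math.log(f_i), math.log(f_j)
    denominator = math.log(n_total) - min(log_i, log_j)
    if denominator == 0:
        raise ValueError("log N' equals the smaller log count")
    return (max(log_i, log_j) - math.log(f_ij)) / denominator

# Hypothetical counts: t_i occurs 8 times, t_j 4 times, together 2 times, N' = 64.
value = label_similarity(8, 4, 2, 64)
```

Because the formula is a ratio of logarithm differences, the result does not depend on the logarithm base. Note also that the numerator is 0 when the two labels always occur together.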
E) Special Tags Ratio (STR) of a labeled resource
The special labels of a resource are defined in the same way as the special labels of a user. The usage ratio of special words can likewise serve as one of the indicators for discriminating the labeling motivation orientation of a resource. The higher the ratio of special words assigned to a resource, the stronger its description motivation orientation; conversely, the lower the ratio, the stronger its categorization motivation orientation. The special tags ratio is measured by formula (16).
$$STR_r = card(t \in T_{str})' / |T_r| \tag{16}$$
where $T_{str}$ = {who, when, what, where, how, howto, ...} is the special label set, which can be set to the set of all interrogative adverbs in English. $card(t \in T_{str})'$ is the number of labels used on resource $r$ that are contained in $T_{str}$, counting repetitions. Obviously $0 \le STR_r \le 1$. The closer $STR_r$ is to 1, the more likely resource $r$ has a description motivation orientation; the closer $STR_r$ is to 0, the more likely resource $r$ has a categorization motivation orientation.
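Formula (16) is a simple ratio, sketched below. The function name and sample labels are hypothetical. Two assumptions are made: the denominator $|T_r|$ is read as the total number of label assignments on the resource, counting repetitions, so that the ratio stays within [0, 1]; and matching is case-insensitive.

```python
# Special label set as listed in the text; the "..." there means it may be extended.
SPECIAL_LABELS = {"who", "when", "what", "where", "how", "howto"}

def special_tags_ratio(assigned_labels, special_labels=SPECIAL_LABELS):
    """Formula (16): share of assigned labels that are special labels."""
    if not assigned_labels:
        return 0.0  # assumption: an unlabeled resource gets ratio 0
    hits = sum(1 for label in assigned_labels if label.lower() in special_labels)
    return hits / len(assigned_labels)

ratio = special_tags_ratio(["howto", "python", "how", "code"])
```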
Likewise, the motivation orientation of the resource to be labeled $\hat{r}$ can be expressed as a vector $M_{\hat{r}}$ of five motivation metrics, that is:
$$M_{\hat{r}} = (TRR_{\hat{r}}, LFTU_{\hat{r}}, TRCE_{\hat{r}}, TSOF_{\hat{r}}, STR_{\hat{r}}) \tag{17}$$
where $TRR_{\hat{r}}$, $LFTU_{\hat{r}}$, $TRCE_{\hat{r}}$, $TSOF_{\hat{r}}$ and $STR_{\hat{r}}$ are the five metrics of the motivation orientation of the resource. They are calculated in the same way as the motivation orientation metrics of a labeled resource.
(2) Select, from the labeled resources, the resources whose motivation orientation is similar to that of the resource to be labeled, obtaining the non-user-dependent similar resources. That is, compute the similarity between the motivation orientation of each labeled resource and that of the resource to be labeled, select the labeled resources whose similarity is greater than the threshold $\alpha$, and combine the selected resources into the non-user-dependent similar resources $R_{sim}$.
Specifically, when labeling a resource, users tend to reuse labels they have used before. The method therefore finds the resources similar to the resource to be labeled and computes how well these resources match the user's current motivation orientation. Using the labels of the well-matching resources as the recommendation candidate set yields labels that fit the user's intent. The similarity between the orientation of the resource to be labeled and that of a resource the user has already labeled can be computed with the cosine method in vector space, as in formula (18).
$$sim_{r \in R_u}(M_r, M_{\hat{r}}) = \frac{M_r \cdot M_{\hat{r}}}{|M_r| \, |M_{\hat{r}}|} \tag{18}$$
where $M_r$ is the motivation orientation vector of a resource the user has already labeled, and $M_{\hat{r}}$ is the motivation orientation vector of the resource to be labeled. A threshold $\alpha$ is set as the control factor ($\alpha$ takes a value between 0 and 1, can be determined experimentally, and is suggested to be 0.6). A similarity greater than or equal to $\alpha$ means that the labeled resource and the resource to be labeled are highly similar in motivation orientation. These highly similar labeled resources are combined, and $R_{sim}$ denotes the resulting non-user-dependent similar resources, that is:
Figure BDA0000067195780000131
The similarity can also be computed with other methods, such as mutual information or Pearson correlation.
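Step (2) can be sketched as a cosine similarity over five-dimensional motivation vectors followed by a threshold filter. The function names, resource identifiers and vector values below are hypothetical, and the comparison uses "greater than or equal to", which is one of the two readings the text offers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, as in formula (18)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

def select_similar(vectors, target, threshold=0.6):
    """Return the ids of resources whose vector is similar enough to target."""
    return [rid for rid, vec in vectors.items()
            if cosine_similarity(vec, target) >= threshold]

# Hypothetical (TRR, LFTU, TRCE, TSOF, STR) vectors of two labeled resources.
labeled = {
    "r1": (0.9, 0.1, 0.8, 0.2, 0.1),
    "r2": (0.1, 0.9, 0.1, 0.8, 0.9),
}
to_label = (0.8, 0.2, 0.7, 0.3, 0.1)
r_sim = select_similar(labeled, to_label, threshold=0.6)
```

With these toy values only `r1` passes the suggested threshold of 0.6.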
(3) Select, from the non-user-dependent similar resources $R_{sim}$, the resources similar to the user's motivation orientation, obtaining the label recommendation candidate resources $R_{cad}$.
Formula (19) is used to compute the similarity between the motivation orientation of each resource in $R_{sim}$ and the user's motivation orientation:
$$sim_{r \in R_{sim}}(M_r, M_u) = \frac{M_r \cdot M_u}{|M_r| \, |M_u|} \tag{19}$$
where $M_r$ is the vector of a resource, among those the user has labeled, whose motivation orientation is relatively similar to that of the resource to be labeled, and $M_u$ is the motivation orientation vector of the user. A threshold $\beta$ is set as the control factor ($\beta$ takes a value between 0 and 1, can be determined experimentally, and is suggested to be 0.6). A similarity greater than or equal to $\beta$ indicates that the labels of the resource can be recommended as labels that fit the user's intent. The non-user-dependent similar resources whose similarity is greater than the threshold $\beta$ are selected as the label recommendation candidate resources, denoted $R_{cad}$, that is:
Figure BDA0000067195780000133
(4) Merge all labels of the label recommendation candidate resources $R_{cad}$ to obtain the merged label set. That is, the labels of every resource in $R_{cad}$ are merged according to formula (20):
$$\hat{T}_u = \bigcup_{(r \,|\, sim_{r \in R_{sim}}(M_r, M_u) \ge \beta)} T_r \tag{20}$$
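Steps (3) and (4) together amount to a threshold filter followed by a set union, which the sketch below illustrates. It assumes the similarities to the user vector have already been computed; the function name, identifiers and values are hypothetical.

```python
def merge_candidate_labels(similarity_to_user, labels_by_resource, beta=0.6):
    """Union of the label sets of all resources with similarity >= beta (formula (20))."""
    merged = set()
    for resource_id, similarity in similarity_to_user.items():
        if similarity >= beta:
            merged |= set(labels_by_resource.get(resource_id, ()))
    return merged

# Hypothetical similarities between resources in R_sim and the user vector M_u.
similarities = {"r1": 0.92, "r2": 0.41, "r3": 0.60}
labels = {
    "r1": ["python", "howto"],
    "r2": ["music"],
    "r3": ["python", "tutorial"],
}
merged_labels = merge_candidate_labels(similarities, labels, beta=0.6)
```

A set is used because formula (20) is a union, so a label shared by several candidate resources appears only once in the merged label set.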
(5) Calculate the recommendation importance of each label in the merged label set. That is, the recommendation importance $p(t|\hat{r})$ of each label in the merged label set is calculated according to formula (21):
$$p(t|\hat{r}) = \sum_{w \in \hat{r},\, t \in \hat{T}_u} p(w)\, s(w,t) \tag{21}$$
where $p(w)$ is the content importance of each word $w$ of the resource to be recommended $\hat{r}$ within $\hat{r}$, calculated by formula (22), and $s(w,t)$ is the correlation between word $w$ and label $t$ in the merged label set, calculated by formula (23).
$$p(w) = \log\left(\frac{tf(w,\hat{r})}{N_{\hat{r}}} + 1\right) \log\left(\frac{N_{R_{cad}}}{|R_{cad}(w)|} + 1\right) \tag{22}$$
$$s(w,t) = \frac{\max(\{\log tf(w,\hat{r}), \log tf(t,\hat{r})\}) - \log tf(w,t,\hat{r})}{\log N_{\hat{r}} - \min(\{\log tf(w,\hat{r}), \log tf(t,\hat{r})\})} \tag{23}$$
where $tf(w,\hat{r})$ is the number of times word $w$ occurs in resource $\hat{r}$, $tf(t,\hat{r})$ is the number of times label $t$ occurs in resource $\hat{r}$, $tf(w,t,\hat{r})$ is the number of times word $w$ and label $t$ occur together in resource $\hat{r}$, $N_{\hat{r}}$ is the number of all words in resource $\hat{r}$, $N_{R_{cad}}$ is the number of all words contained in all label recommendation candidate resources, and $|R_{cad}(w)|$ is the number of label recommendation candidate resources that contain word $w$. The word $w$ here refers to a word in English.
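Formulas (22) and (23) can be sketched as two small functions over the counts defined above. The function and parameter names are hypothetical. Returning 0 when a count is zero or the denominator vanishes is an assumption made to keep the sketch total; the text does not describe these cases.

```python
import math

def content_importance(tf_w, n_words, n_cad_words, n_cad_with_w):
    """Formula (22): product of a term-frequency factor and a rarity factor."""
    return math.log(tf_w / n_words + 1) * math.log(n_cad_words / n_cad_with_w + 1)

def word_label_correlation(tf_w, tf_t, tf_wt, n_words):
    """Formula (23) from occurrence and co-occurrence counts."""
    if min(tf_w, tf_t, tf_wt) <= 0:
        return 0.0  # assumption: undefined logarithms are treated as no correlation
    log_w, log_t = math.log(tf_w), math.log(tf_t)
    denominator = math.log(n_words) - min(log_w, log_t)
    if denominator == 0:
        return 0.0  # assumption: avoid division by zero
    return (max(log_w, log_t) - math.log(tf_wt)) / denominator

# Hypothetical counts for one word and one label.
p_w = content_importance(tf_w=1, n_words=1, n_cad_words=3, n_cad_with_w=1)
s_wt = word_label_correlation(tf_w=8, tf_t=4, tf_wt=2, n_words=64)
```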
(6) Recommend the labels in descending order of their recommendation importance $p(t|\hat{r})$.
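Steps (5) and (6) can be sketched as a weighted sum per label followed by a descending sort. The sketch assumes that $p(w)$ and $s(w,t)$ have already been computed and stored in dictionaries; the function name, the `top_k` cut-off and all values are hypothetical.

```python
def rank_labels(word_importance, correlation, candidate_labels, top_k=5):
    """Score each label with formula (21) and return the top_k in descending order."""
    scores = {}
    for label in candidate_labels:
        scores[label] = sum(
            p_w * correlation.get((word, label), 0.0)
            for word, p_w in word_importance.items()
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical p(w) values and s(w, t) values; missing pairs count as 0.
word_importance = {"entropy": 0.5, "label": 0.25}
correlation = {
    ("entropy", "math"): 0.8,
    ("label", "math"): 0.4,
    ("label", "tagging"): 1.0,
}
ranked = rank_labels(word_importance, correlation, ["tagging", "math"])
```

The text only requires descending order; truncating the list to `top_k` entries is an added convenience.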
The present invention also provides a label recommendation system based on user motivation orientation. As shown in Figure 4, it comprises a motivation orientation computing module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance computing module (500) and an output module (600);
The motivation orientation computing module (100) is used to calculate the user's motivation orientation, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled;
The non-user-dependent similar resource selection module (200) is used to select, from the labeled resources, the resources similar to the motivation orientation of the resource to be labeled, obtaining the non-user-dependent similar resources;
The label recommendation candidate resource selection module (300) is used to select, from the non-user-dependent similar resources, the resources similar to the user's motivation orientation, obtaining the label recommendation candidate resources;
The label merging module (400) is used to merge all labels of the label recommendation candidate resources, obtaining the merged label set;
The recommendation importance computing module (500) is used to calculate the recommendation importance of each label in the merged label set;
The output module (600) is used to recommend labels in descending order of the recommendation importance of each label.
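The six modules can be read as stages of a pipeline. The sketch below is a deliberately simplified illustration, not the system of Figure 4: it assumes each module consumes only the output of the previous one, whereas in the description some modules also need earlier results such as the user's motivation vector. The class name and the toy stages are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class LabelRecommendationSystem:
    compute_orientations: Callable[[Any], Any]        # module (100)
    select_similar_resources: Callable[[Any], Any]    # module (200)
    select_candidate_resources: Callable[[Any], Any]  # module (300)
    merge_labels: Callable[[Any], Any]                # module (400)
    score_labels: Callable[[Any], Any]                # module (500)
    output: Callable[[Any], List[str]]                # module (600)

    def recommend(self, labeling_data: Any) -> List[str]:
        """Run the six stages in order, feeding each result to the next stage."""
        state = labeling_data
        for stage in (self.compute_orientations, self.select_similar_resources,
                      self.select_candidate_resources, self.merge_labels,
                      self.score_labels, self.output):
            state = stage(state)
        return state

# Toy stages: the first four pass data through, the fifth returns fixed scores.
passthrough = lambda data: data
system = LabelRecommendationSystem(
    passthrough, passthrough, passthrough, passthrough,
    score_labels=lambda labels: {"a": 0.2, "b": 0.9},
    output=lambda scores: sorted(scores, key=scores.get, reverse=True),
)
recommended = system.recommend(["a", "b"])
```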
The present invention is not limited to the embodiments described above. Based on the disclosure, a person skilled in the art can implement the invention in various other embodiments; any implementation that adopts the design structure and idea of the invention therefore falls within its scope of protection.

Claims (6)

1. A label recommendation method based on user motivation orientation, comprising the following steps:
(1) according to the user triple, calculating the user's motivation orientation, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled; the user triple comprises the user's labeling history, the labeled resources with their corresponding labels, and the resource to be labeled with its corresponding labels;
(2) selecting, from the labeled resources, the resources similar to the motivation orientation of the resource to be labeled, the resources obtained being called non-user-dependent similar resources;
(3) selecting, from the non-user-dependent similar resources, the resources similar to the user's motivation orientation, the resources obtained being called label recommendation candidate resources;
(4) merging all labels of the label recommendation candidate resources to obtain a merged label set;
(5) calculating the recommendation importance of each label in the merged label set;
(6) recommending labels in descending order of the recommendation importance of each label.
2. The label recommendation method according to claim 1, characterized in that the motivation orientation of user $u$ in step (1) is $M_u = (TRR_u, LFTU_u, TRCE_u, TSOF_u, STR_u)$, where $TRR_u$, $LFTU_u$, $TRCE_u$, $TSOF_u$ and $STR_u$ are the metrics of the motivation orientation of user $u$, each calculated as follows:
(a) $TRR_u = e^{-|T_u|/|R_u|}$;
where $T_u$ denotes the set of all labels used by user $u$, $|T_u|$ denotes the number of labels in set $T_u$, $R_u$ denotes all resources labeled by user $u$, and $|R_u|$ denotes the number of resources in set $R_u$;
Figure FDA0000067195770000011
where $t$ denotes any label, $t_{max}$ is the label most frequently used by user $u$, $|R(t)|$ is the number of resources containing label $t$, $|R(t_{max})|$ is the number of resources containing the most frequent label $t_{max}$, and $n$ is one $p$-th of the number of resources containing $t_{max}$, rounded up, with $0 < p \le 100$,
Figure FDA0000067195770000012
is the set of low-frequency labels of user $u$, and
Figure FDA0000067195770000021
is the number of low-frequency labels of user $u$;
Figure FDA0000067195770000023
where $p(r,t)$ is the joint probability of label $t$ on resource $r$, $p(r|t)$ is the probability that label $t$ labels resource $r$, $R$ denotes the set of all resources labeled by user $u$, $T$ denotes the set of labels assigned to all resources labeled by user $u$, and $H_{opt}(R|T)$ is the value of $H_u(R|T)$ when $p(r|t)$ is identical for all labels;
Figure FDA0000067195770000024
Figure FDA0000067195770000025
where $sim(t_i, t_j)$ denotes the similarity between two labels $t_i$ and $t_j$, $f(t_i)$ and $f(t_j)$ are the numbers of times that labels $t_i$ and $t_j$, respectively, occur in the label set of user $u$, $f(t_i, t_j)$ is the number of times that $t_i$ and $t_j$ occur together in the label set of user $u$, and $N$ is the total number of words in the label set of user $u$;
(e) $STR_u = card(t \in T_{str}) / |T_u|$;
where $T_{str}$ is the special label set and $card(t \in T_{str})$ is the number of labels used by user $u$ that are contained in $T_{str}$, counting repetitions;
The motivation orientation of any resource $r$ among the labeled resources and the resource to be labeled in step (1) is $M_r = (TRR_r, LFTU_r, TRCE_r, TSOF_r, STR_r)$, where $TRR_r$, $LFTU_r$, $TRCE_r$, $TSOF_r$ and $STR_r$ are the metrics of the motivation orientation of resource $r$, each calculated as follows:
(a') $TRR_r = e^{-|T_r|/|U_r|}$;
where $|T_r|$ denotes the number of all labels assigned to resource $r$ by all users, and $|U_r|$ denotes the number of all users who assigned labels to resource $r$;
Figure FDA0000067195770000031
where $t$ denotes any label of resource $r$, $t_{max}'$ is the label most frequently used on resource $r$, $|R_u(t)|$ is the number of users who used label $t$, $|R_u(t_{max}')|$ is the number of users who used the most frequent label $t_{max}'$, and $m$ is one $q$-th of the number of users of the most frequent label $t_{max}'$, rounded up, with $0 < q \le 100$,
Figure FDA0000067195770000032
is the set of low-frequency labels of resource $r$, and
Figure FDA0000067195770000033
is the number of low-frequency labels of resource $r$;
Figure FDA0000067195770000034
Figure FDA0000067195770000035
where $p(u,t)$ is the joint probability that user $u$ uses label $t$, $p(u|t)$ is the probability that user $u$ uses label $t$ to label resource $r$, $U$ denotes the set of all users who labeled resource $r$, and $H_{ropt}(U|T)$ is the value of $H_r(U|T)$ when $p(u|t)$ is identical for all labels;
Figure FDA0000067195770000036
Figure FDA0000067195770000037
where $sim(t_i, t_j)'$ is the similarity between two labels $t_i$ and $t_j$, $f(t_i)'$ and $f(t_j)'$ are the numbers of times that labels $t_i$ and $t_j$, respectively, occur in the label set of resource $r$, $f(t_i, t_j)'$ is the number of times that $t_i$ and $t_j$ occur together in the label set of resource $r$, and $N'$ is the total number of words in the label set of resource $r$;
(e') $STR_r = card(t \in T_{str})' / |T_r|$;
where $T_{str}$ is the special label set and $card(t \in T_{str})'$ is the number of labels used on resource $r$ that are contained in $T_{str}$, counting repetitions.
3. The label recommendation method according to claim 1 or 2, characterized in that the non-user-dependent similar resources in step (2) are obtained as follows:
(3.1) calculating, for each labeled resource, the similarity between its motivation orientation and the motivation orientation of the resource to be labeled;
(3.2) selecting the labeled resources whose similarity is greater than the threshold $\alpha$, thereby obtaining the non-user-dependent similar resources, where $0 < \alpha < 1$.
4. The label recommendation method according to claim 1 or 2, characterized in that the label recommendation candidate resources in step (3) are obtained as follows:
(4.1) calculating the similarity between the motivation orientation of each of the non-user-dependent similar resources and the user's motivation orientation;
(4.2) selecting the non-user-dependent similar resources whose similarity is greater than the threshold $\beta$, thereby obtaining the label recommendation candidate resources, where $0 < \beta < 1$.
5. The label recommendation method according to claim 1 or 2, characterized in that the recommendation importance of each label in the merged label set $\hat{T}_u$ in step (5) is calculated as follows:
(5.1) calculating the content importance $p(w)$ of each word $w$ of the resource to be labeled $\hat{r}$ within $\hat{r}$:
$$p(w) = \log\left(\frac{tf(w,\hat{r})}{N_{\hat{r}}} + 1\right) \log\left(\frac{N_{R_{cad}}}{|R_{cad}(w)|} + 1\right)$$
where $tf(w,\hat{r})$ is the number of times word $w$ occurs in the resource to be labeled $\hat{r}$, $N_{\hat{r}}$ is the number of all words in $\hat{r}$, $N_{R_{cad}}$ is the number of all words contained in all label recommendation candidate resources, and $|R_{cad}(w)|$ is the number of label recommendation candidate resources that contain word $w$;
(5.2) calculating the correlation $s(w,t)$ between word $w$ and label $t$ in the merged label set $\hat{T}_u$:
$$s(w,t) = \frac{\max(\{\log tf(w,\hat{r}), \log tf(t,\hat{r})\}) - \log tf(w,t,\hat{r})}{\log N_{\hat{r}} - \min(\{\log tf(w,\hat{r}), \log tf(t,\hat{r})\})}$$
where $tf(t,\hat{r})$ is the number of times label $t$ occurs in the resource to be labeled $\hat{r}$, and $tf(w,t,\hat{r})$ is the number of times word $w$ and label $t$ occur together in $\hat{r}$;
(5.3) calculating the recommendation importance of label $t$:
$$p(t|\hat{r}) = \sum_{w \in \hat{r},\, t \in \hat{T}_u} p(w)\, s(w,t)$$
6. A label recommendation system based on user motivation orientation, comprising a motivation orientation computing module (100), a non-user-dependent similar resource selection module (200), a label recommendation candidate resource selection module (300), a label merging module (400), a recommendation importance computing module (500) and an output module (600);
the motivation orientation computing module (100) is used to calculate the user's motivation orientation, the motivation orientation of each labeled resource, and the motivation orientation of the resource to be labeled;
the non-user-dependent similar resource selection module (200) is used to select, from the labeled resources, the resources similar to the motivation orientation of the resource to be labeled, obtaining the non-user-dependent similar resources;
the label recommendation candidate resource selection module (300) is used to select, from the non-user-dependent similar resources, the resources similar to the user's motivation orientation, obtaining the label recommendation candidate resources;
the label merging module (400) is used to merge all labels of the label recommendation candidate resources, obtaining the merged label set;
the recommendation importance computing module (500) is used to calculate the recommendation importance of each label in the merged label set;
the output module (600) is used to recommend labels in descending order of the recommendation importance of each label.
CN 201110154353 2011-06-09 2011-06-09 Label recommendation method and system based on user motivation orientation Expired - Fee Related CN102262653B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110154353 CN102262653B (en) 2011-06-09 2011-06-09 Label recommendation method and system based on user motivation orientation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110154353 CN102262653B (en) 2011-06-09 2011-06-09 Label recommendation method and system based on user motivation orientation

Publications (2)

Publication Number Publication Date
CN102262653A (en) 2011-11-30
CN102262653B (en) 2013-09-18

Family

ID=45009282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110154353 Expired - Fee Related CN102262653B (en) 2011-06-09 2011-06-09 Label recommendation method and system based on user motivation orientation

Country Status (1)

Country Link
CN (1) CN102262653B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
CN103544510A (en) * 2013-09-30 2014-01-29 小米科技有限责任公司 Information processing method, information processing device and mobile terminal
CN104199838A (en) * 2014-08-04 2014-12-10 浙江工商大学 User model building method based on label disambiguation
CN104216881A (en) * 2013-05-29 2014-12-17 腾讯科技(深圳)有限公司 Method and device for recommending individual labels
CN105989018A (en) * 2015-01-29 2016-10-05 深圳市腾讯计算机系统有限公司 Label generation method and label generation device
CN107341242A (en) * 2017-07-06 2017-11-10 太原理工大学 A kind of label recommendation method and system
CN107833082A (en) * 2017-09-15 2018-03-23 广州唯品会研究院有限公司 A kind of recommendation method and apparatus of commodity picture
CN108334625A (en) * 2018-02-09 2018-07-27 深圳壹账通智能科技有限公司 Processing method, device, computer equipment and the storage medium of user information
CN108415971A (en) * 2018-02-08 2018-08-17 兰州智豆信息科技有限公司 Recommend the method and apparatus of supply-demand information using knowledge mapping
CN111221644A (en) * 2018-11-27 2020-06-02 阿里巴巴集团控股有限公司 Resource scheduling method, device and equipment
CN112966682A (en) * 2021-05-18 2021-06-15 江苏联著实业股份有限公司 File classification method and system based on semantic analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6655963B1 (en) * 2000-07-31 2003-12-02 Microsoft Corporation Methods and apparatus for predicting and selectively collecting preferences based on personality diagnosis
CN101751448A (en) * 2009-07-22 2010-06-23 中国科学院自动化研究所 Commendation method of personalized resource information based on scene information
CN102004774A (en) * 2010-11-16 2011-04-06 清华大学 Personalized user tag modeling and recommendation method based on unified probability model

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
US9134957B2 (en) 2011-12-16 2015-09-15 International Business Machines Corporation Recommending tags based on user ratings
CN103164463B (en) * 2011-12-16 2017-03-22 国际商业机器公司 Method and device for recommending labels
CN104216881A (en) * 2013-05-29 2014-12-17 腾讯科技(深圳)有限公司 Method and device for recommending individual labels
CN103544510B (en) * 2013-09-30 2016-10-26 小米科技有限责任公司 Information processing method, device and mobile terminal
CN103544510A (en) * 2013-09-30 2014-01-29 小米科技有限责任公司 Information processing method, information processing device and mobile terminal
CN104199838B (en) * 2014-08-04 2017-09-29 浙江工商大学 A kind of user model constructing method based on label disambiguation
CN104199838A (en) * 2014-08-04 2014-12-10 浙江工商大学 User model building method based on label disambiguation
CN105989018A (en) * 2015-01-29 2016-10-05 深圳市腾讯计算机系统有限公司 Label generation method and label generation device
CN105989018B (en) * 2015-01-29 2020-04-21 深圳市腾讯计算机系统有限公司 Label generation method and label generation device
CN107341242A (en) * 2017-07-06 2017-11-10 太原理工大学 A kind of label recommendation method and system
CN107833082A (en) * 2017-09-15 2018-03-23 广州唯品会研究院有限公司 A kind of recommendation method and apparatus of commodity picture
CN108415971A (en) * 2018-02-08 2018-08-17 兰州智豆信息科技有限公司 Recommend the method and apparatus of supply-demand information using knowledge mapping
CN108415971B (en) * 2018-02-08 2021-07-23 兰州智豆信息科技有限公司 Method and device for recommending supply and demand information by using knowledge graph
CN108334625A (en) * 2018-02-09 2018-07-27 深圳壹账通智能科技有限公司 Processing method, device, computer equipment and the storage medium of user information
CN108334625B (en) * 2018-02-09 2020-05-29 深圳壹账通智能科技有限公司 User information processing method and device, computer equipment and storage medium
CN111221644A (en) * 2018-11-27 2020-06-02 阿里巴巴集团控股有限公司 Resource scheduling method, device and equipment
CN111221644B (en) * 2018-11-27 2023-06-13 阿里巴巴集团控股有限公司 Resource scheduling method, device and equipment
CN112966682A (en) * 2021-05-18 2021-06-15 江苏联著实业股份有限公司 File classification method and system based on semantic analysis

Also Published As

Publication number Publication date
CN102262653B (en) 2013-09-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130918

Termination date: 20140609