WO2015184619A1

WO2015184619A1 - Method and apparatus for estimating recessive character distribution of users

Info

Publication number: WO2015184619A1
Application number: PCT/CN2014/079258
Authority: WO
Inventors: 陈宽
Original assignee: 深圳市推想大数据信息技术有限公司
Priority date: 2014-06-05
Filing date: 2014-06-05
Publication date: 2015-12-10
Also published as: CN104205100B; CN104205100A

Abstract

A method and apparatus for estimating recessive character distribution of users. The method comprises: acquiring users using a website and dominant characters of the users; acquiring character information of the whole population from a population database, the character information comprising dominant characters and recessive characters; and calculating recessive character distribution of the users according to the character information of the whole population, the users using the website, the dominant characters of the users and a Bayesian algorithm. By means of the method, an estimation result is more accurate when recessive characters of users are estimated.

Description

Method and device for estimating distribution of hidden features of users

[Technical Field]

The present invention relates to the field of network technologies, and in particular, to a method and apparatus for estimating a distribution of hidden features of a user.

[Background]

Usually, users must register before they can use a website, and registration requires filling in information such as a user name and an ID number.

If a website operator wants to target advertising precisely and push different advertisements to different users, the registration information alone is not enough. When more user information is needed, it can be estimated from the information the user has already registered. For example, knowing a user's name, one may want to estimate the user's age, race, gender, and so on.

In the prior art, a recessive feature is estimated from known dominant features by means of the Bayesian equation, as follows:

Suppose x is the recessive feature of the user that we want to estimate, and let t be the dominant feature of the user that we can observe. To estimate x, the Bayesian equation is:

$$P(x \mid t) = \frac{P(t \mid x)\,P(x)}{P(t)}$$

Here the sample space of the Bayesian equation is the national population data. For example, let t be the user's name and x the user's gender. From the national population data one can obtain the probability P(t | x) of the name t within each gender x, the probability P(x) of each gender x, and the probability P(t) of the name t, so that P(x | t) can be calculated.
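The prior-art estimate described above can be sketched in Python as follows. This is only an illustration: the toy `population` records and the function name `naive_posterior` are assumptions made for the example and do not come from the patent.

```python
from collections import Counter

# Hypothetical toy population records: (name t, gender x).
population = [
    ("Jo", "F"), ("Jo", "F"), ("Jo", "M"),
    ("Sam", "M"), ("Sam", "M"), ("Sam", "F"),
    ("Ann", "F"), ("Bob", "M"),
]

def naive_posterior(records, t):
    """Return P(x | t) for every recessive value x using plain Bayes.

    The sample space is the whole population; website usage is ignored.
    """
    n = len(records)
    count_x = Counter(x for _, x in records)                    # people per x
    count_tx = Counter(x for name, x in records if name == t)   # people with t per x
    p_t = sum(count_tx.values()) / n                            # P(t)
    if p_t == 0:
        raise ValueError("dominant feature not found in the population")
    posterior = {}
    for x, cx in count_x.items():
        p_t_given_x = count_tx[x] / cx                          # P(t | x)
        p_x = cx / n                                            # P(x)
        posterior[x] = p_t_given_x * p_x / p_t                  # Bayes
    return posterior

print(naive_posterior(population, "Jo"))
```

The three quantities computed inside the loop correspond to P(t | x), P(x) and P(t) in the text; all of them are taken from the national population only.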

It is worth noting that the sample space of the above Bayesian equation is the national population data, whereas the composition of the users of a website is often very different from the composition of the national population. For example, the majority of Sina Weibo users are young college students, and most users of Renren.com are students still at school. If the Bayesian equation is applied mechanically in such cases, the estimated recessive features will carry a large error, as the following example illustrates:

Suppose a user of a certain website F is observed to have the name Jo (the dominant feature t), and we want to estimate Jo's age group. Let A be the age group 0 to 50 and B the age group 50 to 100, each making up half of the population, so that P(A) = P(B) = 0.5. Assume that nobody in the 50 to 100 age group uses website F, so that P(f | B) = 0. According to the population database, of the people named Jo, 1 is in the 0 to 50 age group and 99 are in the 50 to 100 age group. Then:

$$\frac{P(A \mid t)}{P(B \mid t)} = \frac{P(t \mid A)P(A)/P(t)}{P(t \mid B)P(B)/P(t)} = \frac{P(t \mid A) \times 0.5}{P(t \mid B) \times 0.5} = \frac{P(t \mid A)}{P(t \mid B)} = \frac{1}{99}$$

According to the Bayesian equation, the probability that Jo is in the 0 to 50 age group is 1%, and the probability that Jo is in the 50 to 100 age group is 99%. In reality, however, since nobody aged 50 to 100 uses website F, the probability that Jo is in the 0 to 50 age group is 100% and the probability that Jo is in the 50 to 100 age group is 0%. The discrepancy arises precisely because the composition of the sample space of the users of website F differs from the composition of the national population, yet the national population data is used in the calculation; the mismatched sample spaces cause a serious deviation in the result. Each website usually has its own characteristics, and so do the people it attracts, so the composition of its user population generally differs from that of the national population. If the sample space of the national population data is used to estimate the recessive features of the users, an error in the result is inevitable.
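The Jo example can be checked numerically with the short Python sketch below. The group size of 1000 people per age group and the usage rate P(f | A) = 0.3 are hypothetical values chosen only so that the arithmetic can run; the counts 1 and 99, the priors 0.5 and P(f | B) = 0 come from the example. The correction applied here (multiplying by P(f | x) before normalising) assumes that name and website usage are independent within each age group.

```python
def posterior(p_t_given_x, p_x, p_f_given_x=None):
    """Normalised P(x | t), or P(x | t and f) when p_f_given_x is supplied."""
    weights = {}
    for x in p_x:
        w = p_t_given_x[x] * p_x[x]
        if p_f_given_x is not None:
            w *= p_f_given_x[x]      # restrict the sample space to site users
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# 1 "Jo" in group A and 99 in group B; 1000 people per group is an assumption.
p_t_given_x = {"A": 1 / 1000, "B": 99 / 1000}
p_x = {"A": 0.5, "B": 0.5}
# P(f | B) = 0 is from the example; P(f | A) = 0.3 is a hypothetical value.
p_f_given_x = {"A": 0.3, "B": 0.0}

naive = posterior(p_t_given_x, p_x)
corrected = posterior(p_t_given_x, p_x, p_f_given_x)
print(naive)
print(corrected)
```

The uncorrected call reproduces the 1% / 99% split, while the call that includes website usage assigns all of the probability to group A, matching the actual situation described in the example.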

[Summary of the Invention]

In order to at least partially solve the above problems, the present invention proposes a method and apparatus for estimating a user's implicit feature distribution such that the estimation result is more accurate when estimating a user's recessive feature.

In order to solve the above technical problem, one technical solution adopted by the present invention is a method for estimating the recessive feature distribution of users, which includes: obtaining the users who use a website and the dominant features of those users; acquiring feature information of the whole population from a population database, wherein the feature information includes dominant features and recessive features; and calculating the user recessive feature distribution according to the feature information of the whole population, the users using the website, and the dominant features of the users, in combination with a Bayesian algorithm.

The step of calculating the user recessive feature distribution according to the feature information of the whole population, the users using the website, and the dominant features of the users, in combination with the Bayesian algorithm, is specifically: if, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds, the recessive feature of the user is calculated according to the following formula:

$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$

where L is an integer greater than or equal to 1, x is a recessive feature of the user, t is a dominant feature of the user, and f denotes that the user uses the website.

The method further includes judging whether, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds. The judging specifically includes: calculating the value of P₁ for any user according to the feature information of the whole population, the users using the website, and the dominant features of the users, where P₁ is calculated as follows:

$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$

Calculating the value of P₂ for any user according to the feature information of the whole population, where P₂ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$

If P₁ and P₂ are equal for any given user, the probability independence condition holds.

The method further includes: analyzing the user behavior habit according to the dominant feature and the recessive feature of the user.

In order to solve the above technical problem, another technical solution adopted by the present invention is to provide an apparatus for estimating the recessive feature distribution of users, comprising: a first obtaining module, configured to acquire the users who use the website and the dominant features of the users; a second obtaining module, configured to acquire feature information of the whole population from the national population database, where the feature information includes dominant features and recessive features; and a calculating module, configured to calculate the user recessive feature distribution according to the feature information of the whole population, the users using the website, and the dominant features of the users, in combination with a Bayesian algorithm.

If, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds, the recessive feature of the user is calculated according to the following formula:

$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$

The device further includes a determining module, which is configured to calculate the value of P₁ for any user according to the feature information of the whole population, the users using the website, and the dominant features of the users, where P₁ is calculated as follows:

$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$

to calculate the value of P₂ for any user according to the feature information of the whole population, the users using the website, and the dominant features of the users, where P₂ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$

and to determine whether P₁ and P₂ are equal for that user; if they are equal, the probability independence condition holds.

The device further includes an analysis module, which is configured to analyze the user's behavior habits according to the dominant features and recessive features of the user.

In order to solve the above technical problem, another technical solution adopted by the present invention is to provide an apparatus for estimating the recessive feature distribution of users, the apparatus comprising a processor. The processor is configured to acquire the users who use the website and the dominant features of the users; to obtain feature information of the whole population from the population database, wherein the feature information includes dominant features and recessive features; and to calculate the user recessive feature distribution according to the feature information of the whole population, the users using the website, and the dominant features of the users, in combination with the Bayesian algorithm.

The step in which the processor calculates the user recessive feature distribution is specifically: the processor is configured to calculate the recessive feature of the user according to the following formula if, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds:

$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$

The processor is further configured to determine whether, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds. The determining specifically includes: calculating the value of P₁ for any user according to the feature information of the whole population, the users using the website, and the dominant features of the users, where P₁ is calculated as follows:

$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$

and calculating the value of P₂ for any user, where P₂ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$

If P₁ and P₂ are equal for any given user, the probability independence condition holds.

The processor is further configured to analyze the user behavior habit according to the dominant feature and the recessive feature of the user.

The beneficial effects of the present invention are as follows. Unlike the prior art, the present invention adds the data of the users who use the website when calculating the recessive features of a user. As a result, when calculating the probability that a person with the dominant feature has each recessive feature, the user group of the website, rather than the national population data, serves as the sample space. The mismatch between sample spaces no longer exists, the error it caused is removed, and the calculation result is corrected and therefore more accurate.

[Description of the Drawings]

FIG. 1 is a flow chart of an embodiment of a method for estimating a recessive feature distribution of a user according to the present invention;

FIG. 2 is a schematic diagram of the distribution of dominant features and recessive features in a sample space in the method embodiment for estimating a recessive feature distribution of a user according to the present invention;

FIG. 3 is a schematic diagram of the distribution, in a sample space, of people having the recessive features and using the website in the method embodiment for estimating a recessive feature distribution of a user according to the present invention;

FIG. 4 is a schematic diagram of the distribution of dominant features and recessive features in a sample space in an embodiment of the method for estimating a recessive feature distribution of a user;

FIG. 5 is a schematic structural diagram of a first embodiment of an apparatus for estimating a recessive feature distribution of a user according to the present invention;

FIG. 6 is a schematic structural diagram of a second embodiment of an apparatus for estimating a recessive feature distribution of a user according to the present invention.

[Detailed Description]

The invention will now be described in detail in conjunction with the drawings and embodiments.

Referring to Figure 1, the method includes:

Step S201: Obtaining a user who uses the website and a dominant feature of the user;

The website records information about its users, for example user registration information and user access information. This information is usually stored in the website's back-end statistics, from which it can be determined who uses the website. For example, if the statistics record that Zhang San and Li Si registered as users of the website, the statistics show that Zhang San and Li Si use the website. Of course, the recorded user information is required to be true, for example the real name and the real age.

A dominant feature of a user is a feature that can be obtained directly. For example, the statistics record the real name of each registered user, so the name is a dominant feature of the user.

A recessive feature of a user is a feature that cannot be obtained directly. For example, the statistics do not record the race of a registered user, so the user's race cannot be read directly from the statistics, and race is a recessive feature of the user.

Step S202: Obtain feature information of all populations from a population database, where the feature information includes a dominant feature and a recessive feature;

The population database records in detail the characteristics of the whole population, such as each person's name, gender, and age. It is worth noting that the feature information of the population database includes dominant features and recessive features, where the dominant features correspond to the dominant features of the users and the recessive features correspond to the recessive features of the users. For example, if the user's name is a dominant feature, the name in the population database is a dominant feature; if the user's race is a recessive feature, the race in the population database is a recessive feature. In the embodiment of the present invention, the population database may be a population database published by a national authority, which can be obtained from public sources.

Step S203: Calculate the recessive feature distribution of the users according to the feature information of the whole population, the users using the website and their dominant features, in combination with the Bayesian algorithm;

Before the user recessive feature distribution is calculated with the Bayesian algorithm, it must also be verified whether, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds. Step S203 may specifically be: if, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds, the recessive feature of the user is calculated according to the following formula:

$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)} \quad \text{(Equation 1)}$$

The origin of Equation 1 is explained below for the case L = 1. As noted in the background, because the composition of the users of the website differs from the composition of the national population, applying the Bayesian equation mechanically gives an inaccurate result. To avoid this error, the sample space needs to be corrected by adding the event that the user uses the website to the Bayesian equation. The corrected Bayesian equation is:

$$P(x_1 \mid t \cap f) = \frac{P(t \cap f \mid x_1)\,P(x_1)}{P(t \cap f)} \quad \text{(Equation 2)}$$

If the probability independence condition holds, then P(t ∩ f | x₁) = P(t | x₁) P(f | x₁), and therefore:

$$P(x_1 \mid t \cap f) = \frac{P(t \mid x_1)\,P(f \mid x_1)\,P(x_1)}{P(t \cap f)} \quad \text{(Equation 3)}$$

It can be seen from Equation 3 that a problem involving the joint probability of three events is reduced to one involving only probabilities between pairs of events, which eases the requirements on the data.

Further, Equation 3 and Equation 2 show that the simplified Bayesian equation needs to satisfy the probability independent condition. The specific reasons are as follows:

As shown in Figure 2, suppose the recessive feature x can take only two values, A and B. The figure shows the two regions A and B, whose areas are a and b, respectively. The observable dominant feature t is represented by the small rectangle in the middle; its intersections with the two recessive-feature regions are TA and TB, with areas ta and tb, respectively. The problem to be solved is to find the area ratio between TA and TB; after the two are normalized to sum to 1, the probabilities of the two are obtained.

If A and B together cover the entire population sample space, the simple Bayesian equation is:

$$\frac{P(A \mid t)}{P(B \mid t)} = \frac{P(t \mid A)P(A)/P(t)}{P(t \mid B)P(B)/P(t)} = \frac{P(t \mid A)P(A)}{P(t \mid B)P(B)}$$

Expressed as area ratios in the figure, this is:

$$\frac{\mathit{ta}}{\mathit{tb}} = \frac{\dfrac{\mathit{ta}}{a}\cdot\dfrac{a}{a+b}}{\dfrac{\mathit{tb}}{b}\cdot\dfrac{b}{a+b}}$$

If the sample spaces on both sides of the equation coincide, both being A+B, the equation necessarily holds.

If the sample spaces on the two sides do not match, the area-ratio equation breaks down. As shown in Figure 3, suppose only some of the people in group B use website F; this subset is labeled B', with area b'. The intersection of the dominant feature t with B' is TB', with area tb'. The value we are actually interested in then becomes:

$$\frac{P(A \mid t)}{P(B' \mid t)}$$

The sample space on the left side of the equation is now A+B'. If we simply keep applying the Bayesian equation, the right side of the equation remains:

$$\frac{P(t \mid A)P(A)}{P(t \mid B)P(B)}$$

and the sample space on the right side of the equation is still A+B.

Expressed in terms of area, the left side of the Bayesian equation is ta/tb', while the right side is:

$$\frac{\dfrac{\mathit{ta}}{a}\cdot\dfrac{a}{a+b}}{\dfrac{\mathit{tb}}{b}\cdot\dfrac{b}{a+b}} = \frac{\mathit{ta}}{\mathit{tb}}$$

Obviously ta/tb' ≠ ta/tb, so the left side of the equation is not equal to the right side. Because the two sides of the Bayesian equation are unequal, simply applying it produces an erroneous result.
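The mismatch between the two sides can be illustrated with a few lines of Python. All of the area values below are hypothetical numbers picked for the illustration; only the symbols a, b, ta, tb and tb' follow the figures described in the text.

```python
# Hypothetical areas for the regions of Figures 2 and 3 (not from the source).
a, b = 60.0, 40.0      # areas of A and B (whole population)
ta, tb = 6.0, 8.0      # areas of TA and TB (people with dominant feature t)
tb_prime = 2.0         # area of TB' (people in B with t who also use website F)

# Left side: the ratio we actually want, over the sample space A + B'.
left = ta / tb_prime

# Right side: plain Bayes evaluated over the sample space A + B.
right = ((ta / a) * (a / (a + b))) / ((tb / b) * (b / (a + b)))

print(left, right)
```

With these numbers the plain Bayesian right side reduces to ta/tb, which differs from the wanted ratio ta/tb' whenever only part of B uses the website.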

Obviously, the error in the result arises because the sample spaces on the left and right sides of the equation are not the same. The sample space therefore needs to be corrected so that the sample spaces on the two sides of the equation are consistent.

As shown in Figure 4, the sample composition of TA is the same as that of A, and the sample composition of TB is the same as that of B. When the people in B who use website F are B', then:

$$\frac{\mathit{tb}'}{\mathit{tb}} = \frac{b'}{b}, \qquad \frac{\mathit{ta}'}{\mathit{ta}} = \frac{a'}{a}$$

where the primed quantities denote the corresponding areas restricted to the users of website F. That is, the sample space is corrected such that the sample composition of TA is the same as that of A and the sample composition of TB is the same as that of B.

The Bayesian equation, expressed as areas, then becomes:

$$\frac{\mathit{tb}'}{\mathit{ta} + \mathit{tb}'} = \frac{\dfrac{\mathit{tb}'}{a+b}}{\dfrac{\mathit{ta} + \mathit{tb}'}{a+b}} = \frac{\dfrac{\mathit{tb}}{b}\cdot\dfrac{b'}{b}\cdot\dfrac{b}{a+b}}{\dfrac{\mathit{ta} + \mathit{tb}'}{a+b}}$$

so the corrected Bayesian equation can be written as:

$$P(B' \mid t \cap f) = \frac{P(t \mid B)\,P(f \mid B)\,P(B)}{P(t \cap f)}$$

The population distribution data can now be obtained from the population database, for example how many people under each recessive feature value x_i also have the observed dominant feature value t, and what proportion of the total population they make up. A sufficiently detailed database (such as the Census data in the United States) allows us to identify each person and the corresponding dominant and recessive features. Suppose the database holds the data of N persons in total, the data of the v-th person is (x^(v), t^(v)), and I(·) is the event indicator function. The probabilities in the corrected Bayesian equation can then be calculated as follows:

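$$P(t \mid x_i) = \frac{\sum_{v=1}^{N} I\big(t^{(v)} = t,\ x^{(v)} = x_i\big)}{\sum_{v=1}^{N} I\big(x^{(v)} = x_i\big)}$$

$$P(x_i) = \frac{\sum_{v=1}^{N} I\big(x^{(v)} = x_i\big)}{N}$$

where the sums run over the N persons in the population database, (x^(v), t^(v)) are the recessive and dominant features of the v-th person, and I(·) is the event indicator function.

We also need P(f | x_i), that is, what proportion of the people with each recessive feature value use website F (for example, what proportion of people in the 12-19 age group use the website). In general, the relevant user data is recorded in the website's back-end statistics, from which the required data can be obtained. Further, if the total number of recessive feature values is n, then:

$$\sum_{i=1}^{n} P(x_i \mid t \cap f) = 1$$

Therefore,

$$P(t \cap f) = \sum_{i=1}^{n} P(t \mid x_i)\,P(f \mid x_i)\,P(x_i)$$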

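The above is described taking a single recessive feature as an example. It extends in the same way to multiple recessive features, for which the corrected Bayesian equation is:

$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$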
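A minimal Python sketch of the corrected estimate follows, assuming that population records are available as (t, x) pairs and that the website's statistics supply a usage rate P(f | x) for each recessive value. The function name `corrected_posterior`, the toy records and the usage rates are all hypothetical. A recessive value x may be a tuple of L features, which covers the multi-feature case; P(t ∩ f) is obtained by normalising so that the posterior sums to 1.

```python
from collections import Counter

def corrected_posterior(records, site_rate, t):
    """Estimate P(x | t and f) from counts, assuming t and f are independent given x.

    records:   iterable of (t_v, x_v) pairs from the population database
    site_rate: mapping x -> P(f | x), e.g. from the website's own statistics
    t:         the observed dominant feature value
    """
    records = list(records)
    n = len(records)
    n_x = Counter(x for _, x in records)                # people per x
    n_tx = Counter(x for tv, x in records if tv == t)   # people with t per x
    weights = {
        # P(t | x) * P(f | x) * P(x), each estimated by counting
        x: (n_tx[x] / n_x[x]) * site_rate.get(x, 0.0) * (n_x[x] / n)
        for x in n_x
    }
    total = sum(weights.values())                       # estimate of P(t and f)
    if total == 0:
        raise ValueError("no site user is expected to have this dominant feature")
    return {x: w / total for x, w in weights.items()}

# Hypothetical data: x is a tuple of two recessive features (L = 2).
young_f, old_m = ("young", "F"), ("old", "M")
records = ([("Jo", young_f)] * 2 + [("Al", young_f)] * 8
           + [("Jo", old_m)] * 6 + [("Al", old_m)] * 4)
site_rate = {young_f: 0.9, old_m: 0.1}

result = corrected_posterior(records, site_rate, "Jo")
print(result)
```

In this toy data most people named Jo are in the older group, but because that group rarely uses the site, the corrected estimate favours the younger group.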

It should be noted that correcting the sample space so that the sample composition of TA is the same as that of A and the sample composition of TB is the same as that of B amounts to satisfying the probability independence condition; conversely, when the probability independence condition is satisfied, the sample composition of TA is the same as that of A and the sample composition of TB is the same as that of B. Therefore, before the corrected Bayesian equation is used, it must be verified whether the probability independence condition is met. The method further includes:

Judging whether, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds. The judging specifically includes:

Calculating the value of P₁ for any user according to the feature information of the whole population, the users using the website, and the dominant features of the users, where P₁ is calculated as follows:

$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$

Calculating the value of P₂ for any user according to the feature information of the whole population, where P₂ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$

If P₁ and P₂ are equal for any given user, the probability independence condition holds.

Here L is an integer greater than or equal to 1 (L = 1 corresponds to a single recessive feature), x is a recessive feature of the user, t is a dominant feature of the user, and f denotes that the user uses the website.
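The independence check can be sketched in Python by comparing P(t ∩ f | x) with P(t | x) P(f | x) for every observed pair of dominant value t and recessive value x. The record layout (t, uses_site, x), the function name `independence_holds` and the tolerance `tol` are assumptions for this sketch; the text asks for equality, and a tolerance is used here only because the probabilities are floating-point estimates.

```python
import math
from collections import Counter

def independence_holds(records, tol=1e-9):
    """Check P(t and f | x) == P(t | x) * P(f | x) for all observed (t, x).

    records: iterable of (t, uses_site, x) triples, uses_site being a bool.
    """
    records = list(records)
    n_x = Counter(x for _, _, x in records)                  # people per x
    n_tx = Counter((t, x) for t, _, x in records)            # people with t per x
    n_fx = Counter(x for _, f, x in records if f)            # site users per x
    n_tfx = Counter((t, x) for t, f, x in records if f)      # site users with t per x
    for (t, x), count_tx in n_tx.items():
        p1 = n_tfx[(t, x)] / n_x[x]                          # P(t and f | x)
        p2 = (count_tx / n_x[x]) * (n_fx[x] / n_x[x])        # P(t | x) * P(f | x)
        if not math.isclose(p1, p2, abs_tol=tol):
            return False
    return True

# Hypothetical data: name and site usage are independent within group A ...
independent = [("Jo", True, "A"), ("Jo", False, "A"),
               ("Al", True, "A"), ("Al", False, "A")]
# ... but fully dependent within group B (only people named Jo use the site).
dependent = [("Jo", True, "B"), ("Jo", True, "B"),
             ("Al", False, "B"), ("Al", False, "B")]

print(independence_holds(independent), independence_holds(dependent))
```

On real data exact equality will rarely occur, so a practical implementation would likely need a statistical test or a looser tolerance; that choice is outside what the text specifies.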

Further, once the dominant and recessive features of the users have been obtained, the behavior habits of the website's users can be analyzed according to those features, so that an advertising strategy can be formulated according to the users' behavior habits, suitable value-added services can be pushed to the users, and so on. Having both the dominant and the recessive features of a user makes it possible to determine the user's behavior habits more accurately, which makes the resulting advertising strategy or pushed value-added services more reasonable and improves the success rate.

The invention corrects the sample-space deviation that arises in the recessive-feature estimation problem, so that the estimate is closer to the correct theoretical value; the stronger the deviation of the sample space, the greater the need for the correction provided by the present invention. Many popular websites currently have strongly biased user populations. For example, 2012 data for the foreign social media website Facebook show that 83% of people aged 18 to 29 use it, while only 40% of people over 65 do. Without correction, the ordinary maximum probability method and Bayesian algorithm would inflate each user's probability of being 65 or older, relative to the probability of being 18 to 29, to more than twice the correct value (0.83/0.40 ≈ 2.1). This would significantly affect any calculation and analysis built on it and could seriously bias the final result.
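The size of the distortion in the Facebook example can be worked out with a small Python sketch. The usage rates 0.83 and 0.40 are the 2012 figures quoted above; the uncorrected probabilities in `naive` are hypothetical, and the example is simplified to just two age groups. The correction assumes that the dominant feature and website usage are independent within each age group.

```python
usage = {"18-29": 0.83, "65+": 0.40}   # P(f | x), the figures quoted in the text
naive = {"18-29": 0.7, "65+": 0.3}     # hypothetical uncorrected P(x | t)

# Corrected posterior: weight each age group by its usage rate, then normalise.
weights = {x: naive[x] * usage[x] for x in naive}
total = sum(weights.values())
corrected = {x: w / total for x, w in weights.items()}

# Odds of "65+" against "18-29", before and after the correction.
naive_odds = naive["65+"] / naive["18-29"]
corrected_odds = corrected["65+"] / corrected["18-29"]
inflation = naive_odds / corrected_odds

print(corrected, inflation)
```

Whatever values are chosen for `naive`, the inflation factor equals the ratio of the two usage rates, so the uncorrected odds of the older group are overstated by the same factor for every user.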

In the embodiment of the present invention, the data of the users who use the website is added when the recessive features of a user are calculated. As a result, when calculating the probability that a person with the dominant feature has each recessive feature, the user group of the website, rather than the national population data, serves as the sample space. The mismatch between sample spaces no longer exists, the error it caused is removed, and the calculation result is corrected.

The present invention also provides a first embodiment of an apparatus for estimating a hidden feature distribution of a user. As shown in FIG. 5, the apparatus includes a first obtaining module 301, a second obtaining module 302, and a calculating module 304.

The first acquisition module 301 obtains the user who uses the website and the dominant features of the user. The second acquisition module 302 acquires feature information of all populations from the national population database, wherein the feature information includes dominant features and recessive features.

The calculation module 304 calculates the user recessive feature distribution according to the feature information of the whole population, the users using the website and their dominant features, in combination with the Bayesian algorithm. Specifically, if, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds, the calculation module 304 calculates the recessive feature of the user according to the following formula:

$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$

where L is an integer greater than or equal to 1, x is a recessive feature of the user, t is a dominant feature of the user, and f denotes that the user uses the website. For the origin of the formula, refer to the method embodiment for estimating the user recessive feature distribution; it is not repeated here.

The apparatus can also include a determination module 303 and an analysis module 305. The determining module 303 is configured to calculate the value of P₁ for any user according to the feature information of the whole population, the users using the website, and the dominant features of the users, where P₁ is calculated as follows:

$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$

to calculate the value of P₂, where P₂ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$

and to determine whether P₁ and P₂ are equal for that user; if they are equal, the probability independence condition holds.

The analysis module 305 analyzes the user's behavior habits according to the dominant features and recessive features of the user, so that an advertising strategy can be formulated according to the user's behavior habits, suitable value-added services can be pushed to the user, and so on. Having both the dominant and the recessive features of a user makes it possible to determine the user's behavior habits more accurately, which makes the resulting advertising strategy or pushed value-added services more reasonable and improves the success rate.

In the embodiment of the present invention, the calculation module 304 adds the data of the users who use the website when calculating the recessive features of a user. As a result, when calculating the probability that a person with the dominant feature has each recessive feature, the user group of the website, rather than the national population data, serves as the sample space. The mismatch between sample spaces no longer exists, the error it caused is removed, and the calculation result is corrected.

The present invention also provides a second embodiment of an apparatus for estimating a hidden feature distribution of a user. As shown in FIG. 6, the apparatus includes a processor 401, a memory 402, and a bus 403. Both processor 401 and memory 402 are coupled to bus 403.

The processor 401 is configured to acquire the users who use the website and the dominant features of the users; to obtain feature information of the whole population from the population database, where the feature information includes dominant features and recessive features; and to calculate the user recessive feature distribution according to the feature information of the whole population, the users using the website, and the dominant features of the users, in combination with the Bayesian algorithm.

Further, the step in which the processor 401 calculates the user recessive feature distribution according to the feature information of the whole population, the users using the website, and the dominant features of the users, in combination with the Bayesian algorithm, is specifically: if, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds, the recessive feature of the user is calculated according to the following formula:

$$P(x_1 \cap \dots \cap x_L \mid t \cap f) = \frac{P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)\,P(x_1 \cap \dots \cap x_L)}{P(t \cap f)}$$

where L is an integer greater than or equal to 1, x is a recessive feature of the user, t is a dominant feature of the user, and f denotes that the user uses the website.

The processor 401 also determines whether, conditional on any recessive feature of a user, the probability independence condition between the user using the website and the user having the dominant feature holds. The determining specifically includes calculating:

$$P_1 = P(t \cap f \mid x_1 \cap \dots \cap x_L)$$

$$P_2 = P(t \mid x_1 \cap \dots \cap x_L)\,P(f \mid x_1 \cap \dots \cap x_L)$$

If P₁ and P₂ are equal for any given user, the probability independence condition holds.

The processor 401 is further configured to analyze the user behavior habit according to the dominant features and the recessive features of the user.

It should be noted that the user who uses the website and the explicit features of the user can be obtained by the website background statistics and stored in the memory 402. The processor 401 extracts from the memory 402 the user who uses the website and the dominant features of the user. The content of the population database can also be stored in the memory 402 after being obtained from the public channel in advance, and is extracted from the memory 402 when the population database is needed, or can be obtained from the public channel when needed.

In the embodiment of the present invention, the processor 401 adds the data of the users who use the website when calculating the recessive features of a user. As a result, when calculating the probability that a person with the dominant feature has each recessive feature, the user group of the website, rather than the national population data, serves as the sample space. The mismatch between sample spaces no longer exists, the error it caused is removed, and the calculation result is corrected.

The above description is only an embodiment of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims


1. A method for estimating a distribution of recessive features of users, the method comprising: acquiring users who use a website and dominant features of the users;

acquiring feature information of the whole population from a population database, wherein the feature information includes dominant features and recessive features; and

calculating the recessive feature distribution of the users according to the feature information of the whole population, the users who use the website, and the dominant features of the users, in combination with a Bayesian algorithm.

2. The method of claim 1, wherein

the step of calculating the recessive feature distribution of the users according to the feature information of the whole population, the users who use the website, and the dominant features of the users, in combination with the Bayesian algorithm, is specifically:

if, given the recessive feature of any user, the probability independence condition between the user using the website and the user having the dominant feature is established, calculating the recessive feature of the user according to the following formula:

ρ

3. The method according to claim 2, further comprising: determining whether the probability independence condition between the user using the website and the user having the dominant feature is established given the recessive feature of any user, wherein the determining specifically includes:

calculating the value of $P_1$ of any user, wherein $P_1$ is calculated as follows:

$$P_1 = P(t \cap l \mid x_1 \cap \dots \cap x_i)$$

calculating the value of $P_2$ of any user based on the feature information of the whole population, wherein $P_2$ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_i)\,P(l \mid x_1 \cap \dots \cap x_i)$$

4. The method according to any one of claims 1 to 3, wherein the method further comprises:

analyzing the user's behavior habits according to the dominant features and the recessive features of the user.

5. A device for estimating a distribution of a recessive feature of a user, comprising:

a first obtaining module, configured to acquire a user who uses the website and a dominant feature of the user;

a second obtaining module, configured to acquire feature information of all populations from a national population database, wherein the feature information includes a dominant feature and a recessive feature;

and a calculation module, configured to calculate the recessive feature distribution of the users according to the feature information of the whole population, the users who use the website, and the dominant features of the users, in combination with a Bayesian algorithm.

6. The device according to claim 5, wherein the calculation module is specifically configured to: if, given the recessive feature of any user, the probability independence condition between the user using the website and the user having the dominant feature is established, calculate the recessive feature of the user according to the following formula:

p

7. The device according to claim 6, wherein the device further comprises a determining module, the determining module being configured to calculate the value of $P_1$ of any user according to the feature information of the whole population, the dominant features of the users of the website, and the users who use the website, wherein $P_1$ is calculated as follows:

$$P_1 = P(t \cap l \mid x_1 \cap \dots \cap x_i)$$

and to calculate the value of $P_2$ of the arbitrary user, wherein $P_2$ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_i)\,P(l \mid x_1 \cap \dots \cap x_i)$$

and to determine whether $P_1$ and $P_2$ of the arbitrary user are equal; if they are equal, the probability independence condition is established.

8. The device according to any one of claims 5 to 7, wherein the device further comprises an analysis module;

the analysis module is configured to analyze the user's behavior habits according to the dominant features and the recessive features of the user.

9. An apparatus for estimating a distribution of recessive features of users, characterized in that the apparatus comprises a processor;

the processor is configured to acquire users who use a website and dominant features of the users; to acquire feature information of the whole population from a population database, wherein the feature information includes dominant features and recessive features; and to calculate the recessive feature distribution of the users according to the feature information of the whole population, the users who use the website, and the dominant features of the users, in combination with a Bayesian algorithm.

10. The apparatus according to claim 9, wherein:

the processor calculating the recessive feature distribution of the users according to the feature information of the whole population, the users who use the website, and the dominant features of the users, in combination with the Bayesian algorithm, is specifically: if, given the recessive feature of any user, the probability independence condition between the user using the website and the user having the dominant feature is established, calculating the recessive feature of the user according to the following formula:

ρ

11. The apparatus according to claim 10, wherein:

the processor is further configured to determine whether the probability independence condition between the user using the website and the user having the dominant feature is established given the recessive feature of any user, wherein the determining specifically includes:

calculating the value of $P_1$ of any user, wherein $P_1$ is calculated as follows:

$$P_1 = P(t \cap l \mid x_1 \cap \dots \cap x_i)$$

calculating the value of $P_2$ of any user based on the feature information of the whole population, wherein $P_2$ is calculated as follows:

$$P_2 = P(t \mid x_1 \cap \dots \cap x_i)\,P(l \mid x_1 \cap \dots \cap x_i)$$

12. Apparatus according to any one of claims 9 to 11 wherein: