CN103237094A - Method and device for user identification - Google Patents

Publication number
CN103237094A
Authority
CN
China
Prior art keywords
cookie
user
value
field
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101343184A
Other languages
Chinese (zh)
Other versions
CN103237094B (en)
Inventor
罗峰
黄苏支
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING IZP TECHNOLOGIES Co Ltd
Original Assignee
BEIJING IZP TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING IZP TECHNOLOGIES Co Ltd
Priority to CN201310134318.4A
Publication of CN103237094A
Application granted
Publication of CN103237094B
Legal status: Expired - Fee Related

Abstract

The invention discloses a method and a device for user identification. Long-lived cookie fields that uniquely identify users are obtained statistically for each website. The cookies are then associated according to the jump relationships among user access messages to generate a user cookie-value relation table and the corresponding user IDs (identifiers). Access messages sent by users are collected, then tagged and identified according to the user cookie-value relation table and the corresponding user IDs. This replaces user identification based on information such as ADSL (asymmetric digital subscriber line) and IP (Internet Protocol) address, and can effectively improve the accuracy and efficiency of user identification.

Description

A method and device for identifying users
Technical field
The present invention relates to Internet information processing technology, and in particular to a method and device for identifying users.
Background art
In Internet technology, users can be identified by ADSL, IP address, or UA (User Agent), but each of these methods has limitations in practice: 1) most access messages do not carry ADSL information, so if users are identified by ADSL, most users cannot be identified and recognition efficiency is low; 2) many user computers currently use dynamic IP addresses, i.e., the IP address of the computer changes frequently, so it is difficult to locate a user accurately by IP address; 3) if users are identified by UA, one user generally has several UAs (i.e., uses several browsers), and one UA may correspond to many users, so users cannot be identified accurately either.
Summary of the invention
In view of this, the technical problem to be solved by the present invention is to provide a method and device for identifying users, in which users are associated through cookies so that each user corresponds to a cookie list of their own that identifies them, and users are identified in depth by parsing the cookies carried in access messages.
To achieve the above object, the present invention adopts the following technical solutions:
A method for identifying users, the method comprising:
Generating a user cookie-value relation table and, for each user, a corresponding user ID that uniquely identifies the user, wherein the user cookie-value relation table records the association between the cookie fields used to identify users on each website and the user identity registration information;
Collecting the access messages sent by users;
When an access message carries cookie information, parsing it to obtain the cookie fields and field values of the cookie information;
When a cookie field is present in the user cookie-value relation table, adding the user ID corresponding to that user cookie-value relation table to the access message.
Further, the method also comprises extracting, from the access message to which the user ID has been added, the cookie value that corresponds to a unique user, according to the cookie-values carried in the message.
Further, when the cookie field is not present in the user cookie-value relation table, the field value of the cookie field is matched against the extracted cookie value corresponding to a unique user; if the two are identical, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted cookie value is added to the access message.
Further, the method also comprises: when the access message carries no cookie information, obtaining the information carried in the URL of the access message and matching it against the extracted cookie value corresponding to a unique user; if the two are identical, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted cookie value is added to the access message.
Further, if the cookie value corresponding to a unique user corresponds to multiple user IDs, the multiple user IDs are merged, the correspondence between that cookie value and each user ID is recorded, and the user ID corresponding to the user cookie-value relation table is added to the access message again.
Further, generating the user cookie-value relation table and the user ID that uniquely identifies each user comprises:
Screening single users, and screening websites whose traffic reaches a preset traffic threshold;
Obtaining, based on the single users, the cookie field that each of these websites uses to identify users, and generating a domain-cookie dictionary;
Generating a jump relationship graph of user access messages;
Generating the user cookie-value relation table and the user ID that uniquely identifies each user according to the jump relationship graph of user access messages and the domain-cookie dictionary.
Further, screening single users comprises:
Collecting, over a period of time, the values of the cookie fields that uniquely identify a user on the different websites visited by an ADSL terminal user; if these cookie field values remain unchanged within a predetermined period, the user is judged to be a single user.
Further, obtaining the cookie field that each website uses to identify users based on the single users, and generating the domain-cookie dictionary, comprises:
Collecting the single users and the cookie values corresponding to each cookie field in each website's cookie;
For each cookie field, counting the cookie values that correspond to exactly one single user, and computing the frequency of cookie values that are in one-to-one correspondence with a single user;
For each cookie field, counting the single users that correspond to exactly one cookie value, and computing the frequency of single users that are in one-to-one correspondence with a cookie value;
Filtering the cookie fields of each website according to, for each cookie field, the number of single users, the number of cookie values, and the two one-to-one frequencies above, and selecting the cookie field with a large number of cookie values, a large number of single users, and high one-to-one frequencies as the cookie field used to identify users on that website;
Generating the domain-cookie dictionary from each website's domain and its corresponding user-identifying cookie field.
Further, generating the jump relationship graph of user access messages comprises:
Collecting the access messages of all users over a period of time;
Grouping the messages first by ADSL+UA; grouping any access messages that carry no ADSL information by IP+UA; and sorting each group of messages by access time;
Building the jump relationship graph of user access messages.
Further, building the user cookie-value relation table according to the jump relationship graph of user access messages and the domain-cookie dictionary comprises:
S1: when the host domain of an access message differs from the main domain (domain) in the jump relationship graph, and the cookies corresponding to both website domains are in the domain-cookie dictionary, associating the cookie values of the two website domains to generate a cookie-value pair; if that cookie-value pair already exists, incrementing by 1 the association count of the two websites' cookie-values under this user;
S2: generating a cookie correspondence graph from the user access messages, and counting the associations of each cookie-value pair in the graph;
S3: filtering by a preset association-count threshold, obtaining the connected components of the cookie correspondence graph, and generating the user cookie-value relation table and the user ID that uniquely identifies each user.
Correspondingly, the present invention also discloses a device for identifying users, the device comprising:
A generation module, configured to generate the user cookie-value relation table and the user ID that uniquely identifies each user, wherein the user cookie-value relation table records the association between the cookie fields used to identify users on each website and the user identity registration information;
An acquisition module, configured to collect the access messages sent by users;
A first judging module, configured to judge whether an access message carries cookie information;
A second judging module, configured to judge, when the access message carries cookie information, whether the parsed cookie field is present in the user cookie-value relation table;
A first identification module, configured to parse the cookie information carried in the access message to obtain its cookie fields and field values, and, when a cookie field is present in the user cookie-value relation table, to add the user ID corresponding to that user cookie-value relation table to the access message.
In the technical solution of the present invention, the long-lived cookie field that uniquely identifies users on each website is obtained statistically; these cookies are then associated according to the jump relationships among the websites a user visits, generating the user cookie-value relation table and the corresponding user IDs; the access messages sent by users are collected; and the access messages are tagged and the users identified according to the user cookie-value relation table and the corresponding user IDs. This replaces user identification based on information such as ADSL and IP address, and can effectively improve the accuracy and efficiency of user identification.
Description of drawings
Fig. 1 is a flow chart of the method for identifying users provided by the first embodiment of the present invention;
Fig. 2 is a flow chart of the method for identifying users provided by the second embodiment of the present invention;
Fig. 3 is a flow chart of generating the cookie-value relation table and the user ID that uniquely identifies each user, provided by an embodiment of the present invention;
Fig. 4 is a flow chart of the method for screening the cookie fields that identify users, provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of screening the cookie fields used to identify users, provided by an embodiment of the present invention;
Fig. 6 is a structural block diagram of the device corresponding to the method for identifying users of the first embodiment of the present invention;
Fig. 7 is a structural block diagram of the device corresponding to the method for identifying users of the second embodiment of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
A cookie is a small piece of text that the browser saves on the computer and that is transmitted between the Web server and the browser along with user requests and pages. Each time the user visits a site, the web application can read the information contained in the cookie. A cookie stores only string values. If several browsers (UAs) are installed on one computer, each browser stores its cookies in its own separate space. Because a cookie can both identify the user and contain information about the computer and the browser, a user who logs in with different browsers or from different computers obtains different cookie information. Conversely, for several users who share the same browser on the same computer, the cookie cannot distinguish their identities unless they log in with different user names.
Because reading and writing cookie files depends entirely on the current browser, different browsers cannot share a cookie file, and cookies are easily deleted by the user manually or by software, or expire automatically. Taking these characteristics of cookies into account, the technical solution of the present invention obtains statistically the long-lived cookie field that uniquely identifies users on each website, then associates these cookies according to the jump relationships among user access messages to generate the cookie-value relation table and the corresponding user IDs, collects the access messages sent by users, and tags the access messages and identifies the users according to the user cookie-value relation table and the corresponding user IDs.
Fig. 1 is a flow chart of the method for identifying users provided by the first embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: generate the user cookie-value relation table and the user ID that uniquely identifies each user, wherein the user cookie-value relation table records the association between the cookie fields used to identify users on each website and the user identity registration information.
Fig. 3 is a flow chart of generating the cookie-value relation table and the user ID that uniquely identifies each user, provided by an embodiment of the present invention. As shown in Fig. 3, the flow comprises the following steps:
Step 301: screen single users, and screen websites whose traffic reaches the preset traffic threshold.
Single users are screened by collecting, over a period of time, the values of the cookie fields that uniquely identify a user on the different websites visited by an ADSL terminal user; if these cookie field values remain unchanged within a predetermined time threshold, the user is judged to be a single user.
A website's cookie has many fields. For each ADSL terminal user (ADSL+UA), the values of the cookie fields that can identify users on several top websites are collected over a period of time. Table 1 lists the user-identifying cookie fields of several top websites, for example the Tencent QQ number, i.e., the value of the o_cookie field on the Tencent website, and Baidu's BAIDUID, i.e., the value of the BAIDUID field on the Baidu website. If the values of these user-identifying cookie fields for an ADSL terminal user remain unchanged within the predetermined time threshold (for example 1 month or 2 months), the ADSL terminal user is a single user, i.e., the ADSL terminal was used by only one user within the threshold. The time threshold can be set according to actual needs.
Website    Cookie field
qq         o_cookie
taobao     cna
baidu      BAIDUID
renren     id
sohu       SUV
pptv       PUID
weibo      un
sina       SINAGLOBAL
Table 1
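The single-user screening described above can be sketched in Python as follows. This is only an illustrative sketch: the record layout `(terminal, site, day, cookie_value)`, the function name `find_single_users`, and the use of integer day numbers are assumptions, not details given in the text.

```python
from collections import defaultdict

def find_single_users(records, min_span_days=30):
    """Return the terminals (e.g. ADSL+UA keys) judged to be single users.

    records: iterable of (terminal, site, day, cookie_value), where
    cookie_value is the value of the site's identity cookie field
    (o_cookie for qq, BAIDUID for baidu, ...) and day is an integer.
    """
    values = defaultdict(set)   # (terminal, site) -> distinct cookie values
    days = defaultdict(set)     # terminal -> days on which it was observed
    for terminal, site, day, value in records:
        values[(terminal, site)].add(value)
        days[terminal].add(day)

    # A terminal that shows two different identity values on the same site
    # is assumed to be shared by several users.
    shared = {terminal for (terminal, _), vs in values.items() if len(vs) > 1}

    # Keep terminals whose values stayed constant over a long enough span.
    return {
        terminal
        for terminal, ds in days.items()
        if terminal not in shared and max(ds) - min(ds) >= min_span_days
    }
```

The `min_span_days` parameter plays the role of the predetermined time threshold (1 month, 2 months, and so on) and can be adjusted as needed.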
When screening websites whose traffic reaches the preset traffic threshold, the threshold is set according to actual needs and limits the number of websites sampled for the statistics. To make the statistical results more general, this embodiment takes the 3000 websites with the highest total traffic as the sample.
Step 302: obtain, based on the single users, the cookie field that each of these websites uses to identify users, and generate the domain-cookie dictionary.
Fig. 4 is a flow chart of the method for screening the cookie fields that identify users, provided by an embodiment of the present invention. As shown in Fig. 4, the flow comprises the following steps:
Step 3021: collect the single users and the cookie values corresponding to each cookie field in each website's cookie;
Step 3022: for each cookie field, count the cookie values that correspond to exactly one single user, and compute the frequency of cookie values that are in one-to-one correspondence with a single user;
Step 3023: for each cookie field, count the single users that correspond to exactly one cookie value, and compute the frequency of single users that are in one-to-one correspondence with a cookie value;
Step 3024: filter the cookie fields of each website according to, for each cookie field, the number of single users, the number of cookie values, and the two one-to-one frequencies above, and select the cookie field with a large number of cookie values, a large number of single users, and high one-to-one frequencies as the cookie field used to identify users on that website;
Step 3025: generate the domain-cookie dictionary from each website's domain and its corresponding user-identifying cookie field.
In this embodiment, the cookie values and single users corresponding to each cookie field of each website among the top 3000 by traffic are counted. Take Tencent as an example. The Tencent website cookie comprises several cookie fields; consider the o_cookie field, under which there are m cookie values and n single users. Each cookie value may correspond to several single users, and each single user may correspond to several cookie values. That is, a specific QQ number is one cookie value; the same QQ number may be logged in from different browsers, so one QQ number can correspond to several users, and only some of the QQ numbers under this field are in one-to-one correspondence with users. Likewise, a single user may have several QQ numbers, in which case again only some QQ numbers are in one-to-one correspondence with single users. These two measures together screen out the one-to-one relationships between QQ numbers and users. In addition, the number of cookie values and the number of users under the cookie field must be considered, because statistics over a large sample are more general. Each cookie field of the Tencent website is screened in this way to find the cookie field whose cookie values are in one-to-one correspondence with single users at a high frequency and whose cookie value count and single user count are large. That field is taken as the cookie field used to identify users, which for the Tencent website is the o_cookie field.
By analogy, the user-identifying cookie field of each of the top 3000 websites by traffic can be found, and the domain-cookie dictionary is generated from each website's domain and its corresponding user-identifying cookie field.
Fig. 5 is a schematic diagram of screening the cookie fields used to identify users, provided by an embodiment of the present invention. Specifically:
1) For each cookie field under the top 3000 host domains, let V denote its values and U the users. Cookies stored on the user's client are unstable; for example, when a cookie value expires, a new cookie value is generated the next time the user visits the website. Suppose that under a cookie field there are m cookie values and n users, that cookie value Vi corresponds to Ki users, and that user Ui corresponds to Ti cookie values;
2) Count the cookie values V that correspond to a unique user U, i.e., the number k of values with Ki == 1, and divide by the total number of values V to obtain k/m;
3) Count the users U that correspond to a unique cookie value V, i.e., the number t of users with Ti == 1, and divide by the total number of users U to obtain t/n;
4) Using m, n, k/m and t/n of each cookie field under each host domain as filter conditions, extract the cookie field under that host domain that serves as the user-identifying cookie, and generate the domain-cookie dictionary from each website's domain and its corresponding user-identifying cookie field.
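The statistics m, n, k/m and t/n in steps 1) to 4) can be computed as in the following Python sketch. The function names and the concrete thresholds `min_count` and `min_ratio` are assumptions for illustration; the text only says that fields with large counts and high one-to-one frequencies are selected.

```python
from collections import defaultdict

def field_stats(observations):
    """Compute (m, n, k/m, t/n) for one cookie field of one host domain.

    observations: iterable of (user, cookie_value) pairs seen for the field.
    """
    users_of = defaultdict(set)    # cookie value Vi -> users (Ki = size)
    values_of = defaultdict(set)   # user Ui -> cookie values (Ti = size)
    for user, value in observations:
        users_of[value].add(user)
        values_of[user].add(value)

    m, n = len(users_of), len(values_of)
    if m == 0:
        return 0, 0, 0.0, 0.0
    k = sum(1 for us in users_of.values() if len(us) == 1)   # Ki == 1
    t = sum(1 for vs in values_of.values() if len(vs) == 1)  # Ti == 1
    return m, n, k / m, t / n

def is_identity_field(stats, min_count=1000, min_ratio=0.9):
    """Filter condition of step 4); both thresholds are assumed values."""
    m, n, k_ratio, t_ratio = stats
    return (m >= min_count and n >= min_count
            and k_ratio >= min_ratio and t_ratio >= min_ratio)
```

Running `field_stats` for every cookie field of a host domain and keeping the field that passes `is_identity_field` would yield one entry of the domain-cookie dictionary.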
Step 303: generate the jump relationship graph of user access messages.
Generating the jump relationship graph of user access messages comprises:
Collecting the access messages of all users over a period of time;
Grouping the messages by ADSL+UA and by IP+UA, and sorting each group of messages by access time;
Building the jump relationship graph of user access messages.
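The grouping and ordering steps above can be sketched in Python as follows. The message layout (a dict with `adsl`, `ip`, `ua` and `time` keys) is an assumption, and so is treating two consecutive messages of a group as one jump; the text does not spell out how a jump is derived.

```python
from collections import defaultdict

def group_messages(messages):
    """Group access messages by ADSL+UA, falling back to IP+UA.

    messages: iterable of dicts with keys 'ua', 'ip', 'time' and an
    optional 'adsl' (missing or empty when no ADSL information is carried).
    Returns {group_key: [messages sorted by access time]}.
    """
    groups = defaultdict(list)
    for msg in messages:
        if msg.get("adsl"):
            key = ("adsl", msg["adsl"], msg["ua"])
        else:
            key = ("ip", msg["ip"], msg["ua"])
        groups[key].append(msg)
    for msgs in groups.values():
        msgs.sort(key=lambda m: m["time"])
    return dict(groups)

def jump_edges(sorted_msgs):
    """Assumed jump relation: each message jumps to the next one in time."""
    return list(zip(sorted_msgs, sorted_msgs[1:]))
```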
Step 304: generate the user cookie-value relation table and the user ID that uniquely identifies each user according to the jump relationship graph of user access messages and the domain-cookie dictionary.
This step associates the extracted cookie fields according to the user's jump relationships, so that each user corresponds to a series of cookie fields and values.
The correspondence between cookies is built from the jump relationship graph of the access messages and the generated domain-cookie dictionary. The specific steps are:
(1) When the host domain of an access message differs from the main domain (domain) in the jump relationship graph, and the cookies corresponding to both website domains are in the domain-cookie dictionary, associate the cookie values of the two website domains to generate a cookie-value pair; if that cookie-value pair already exists, increment by 1 the association count of the two websites' cookie-values under this user;
(2) Aggregate the counts to generate the cookie correspondence graph, in which each node represents a cookie-value and each edge carries the association count between two cookie-values;
(3) Remove the edges whose association count is below the threshold, and compute the connected components of the resulting undirected graph. Each component corresponds to one user, which yields the user cookie-value relation table and the corresponding ID;
(4) Label the nodes with the ID of the corresponding user cookie-value relation table, and repeat steps (2)-(4) until the number of generated user cookie-value relation tables no longer changes.
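Step (1) above, counting associations between the identity cookie values of two different domains, might look like the following Python sketch. The message layout (`domain` plus a parsed `cookies` dict) and the choice of consecutive messages as jumps are assumptions made for illustration.

```python
from collections import Counter

def count_associations(group, domain_cookie):
    """Count cookie-value pair associations within one message group.

    group: time-sorted messages of one ADSL+UA or IP+UA group, each a dict
           {'domain': str, 'cookies': {field: value}}.
    domain_cookie: the domain-cookie dictionary (domain -> identity field).
    Returns Counter keyed by frozenset({(domain_a, value_a), (domain_b, value_b)}).
    """
    counts = Counter()
    for prev, cur in zip(group, group[1:]):
        a, b = prev["domain"], cur["domain"]
        # Only jumps between two different domains that both have an
        # identity cookie field in the dictionary create an association.
        if a == b or a not in domain_cookie or b not in domain_cookie:
            continue
        value_a = prev["cookies"].get(domain_cookie[a])
        value_b = cur["cookies"].get(domain_cookie[b])
        if value_a is None or value_b is None:
            continue
        # frozenset makes the pair undirected; repeated pairs add 1 each time.
        counts[frozenset(((a, value_a), (b, value_b)))] += 1
    return counts
```

Summing the counters of all groups gives the edge weights of the cookie correspondence graph in step (2).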
Specifically, the access messages of all users over a period of time are collected and grouped by ADSL+UA; each group of messages records how different users accessed web pages with the same UA during that period. Messages that do not contain ADSL information are grouped by IP+UA, and each such group likewise records how different users accessed web pages with the same UA during that period. The messages grouped by ADSL+UA and by IP+UA are sorted in time order, and a jump relationship graph is built for each group of access messages.
The cookie correspondence graph is built from the jump relationship graph of each group of access messages and the generated domain-cookie dictionary, and the association count of each cookie-value pair is recorded. The pairs are then filtered by the preset association-count threshold, keeping the cookie-value pairs whose association count is greater than or equal to the threshold. A graph search algorithm is used to obtain the connected components of the cookie correspondence graph; each connected component represents one user, which yields the user cookie-value relation table and the corresponding user ID. Those skilled in the art will readily understand that the graph search algorithm may be depth-first search, breadth-first search, or the like.
For example, let graph G denote the cookie correspondence graph generated from the jump relationship graph of the access messages grouped by ADSL+UA and the domain-cookie dictionary. Graph G is: A--B--C--D--B, E--F--M.
As can be seen from the graph, it has 2 connected components, namely
G1: A--B--C--D--B
G2: E--F--M
G1 and G2 represent two users. Nodes A, B, C and D in G1 represent the cookie values with which user 1 visits different websites; nodes E, F and M in G2 represent the cookie values with which user 2 visits different websites. By analogy, each user is given a cookie list of their own that identifies them.
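The thresholding and connected-component step can be sketched in Python with a breadth-first search, one of the graph search algorithms the text mentions. The function name, the sequential integer user IDs, and the dict-based edge input are assumptions; nodes are assumed to be sortable (for example strings).

```python
from collections import defaultdict, deque

def build_user_table(edge_counts, threshold):
    """Map every cookie-value node to a user ID via connected components.

    edge_counts: {(node_a, node_b): association_count}; any 2-element
                 iterable (tuple or frozenset) works as a key.
    threshold:   edges with a count below this value are dropped.
    """
    adjacency = defaultdict(set)
    for pair, count in edge_counts.items():
        if count >= threshold:
            a, b = pair
            adjacency[a].add(b)
            adjacency[b].add(a)

    table = {}
    next_id = 0
    for start in sorted(adjacency):      # sorted only for stable IDs
        if start in table:
            continue
        next_id += 1                     # one new user ID per component
        table[start] = next_id
        queue = deque([start])
        while queue:                     # breadth-first search
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in table:
                    table[neighbour] = next_id
                    queue.append(neighbour)
    return table
```

Applied to graph G from the example, with every listed edge at or above the threshold, nodes A, B, C and D receive one user ID and nodes E, F and M receive another.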
Step 102: collect the access messages sent by users;
Step 103: judge whether the access message contains cookie information; if so, go to step 104; otherwise, end.
When a user interacts with a server, the user's access messages carry cookie information in most cases. The access messages are tagged according to the user cookie-value relation table and the user ID that uniquely identifies each user, and the user is thereby identified.
Step 104: parse the cookie information and judge whether it is present in the user cookie-value relation table; if so, go to step 105; otherwise, end.
Step 105: add the user ID corresponding to that user cookie-value relation table to the access message.
The method for identifying users described in this embodiment identifies access messages that carry cookie information whose parsed content is present in the user cookie-value relation table. Users are identified by adding the ID corresponding to the user cookie-value relation table to such messages, which improves the accuracy and efficiency of user identification.
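Steps 103 to 105 can be illustrated with the following Python sketch, which parses the Cookie header with the standard-library `http.cookies.SimpleCookie`. The message layout (`domain` and `cookie` keys), the `user_id` key written back into the message, and a relation table keyed by `(domain, cookie_value)` are assumptions for illustration.

```python
from http.cookies import SimpleCookie

def tag_message(message, domain_cookie, user_table):
    """Add 'user_id' to an access message when its identity cookie is known.

    message:       dict with 'domain' and an optional raw 'cookie' header.
    domain_cookie: domain -> identity cookie field name.
    user_table:    (domain, cookie_value) -> user ID.
    Returns the user ID, or None when the message cannot be identified.
    """
    header = message.get("cookie")
    if not header:                        # step 103: no cookie information
        return None
    field = domain_cookie.get(message["domain"])
    if field is None:
        return None
    jar = SimpleCookie()
    jar.load(header)                      # step 104: parse fields and values
    if field not in jar:
        return None
    user_id = user_table.get((message["domain"], jar[field].value))
    if user_id is not None:               # step 105: tag the message
        message["user_id"] = user_id
    return user_id
```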
Fig. 2 is a flow chart of the method for identifying users provided by the second embodiment of the present invention. As shown in Fig. 2, the method comprises the following steps:
Step 201: generate the user cookie-value relation table and the user ID that uniquely identifies each user, wherein the user cookie-value relation table records the association between the cookie fields used to identify users on each website and the user identity registration information.
For the process of generating the cookie-value relation table and the user ID that uniquely identifies each user, see the detailed description of this part given for Fig. 1.
Step 202: collect the access messages sent by users;
Step 203: judge whether the access message contains cookie information; if so, go to step 204; otherwise, go to step 209.
When a user interacts with a server, the user's access messages carry cookie information in most cases. The access messages are tagged according to the user cookie-value relation table and the user ID that uniquely identifies each user, and the user is thereby identified.
Step 204: parse the cookie information and judge whether it is present in the user cookie-value relation table; if so, go to step 205; otherwise, go to step 207.
Step 205: add the user ID corresponding to that user cookie-value relation table to the access message.
Step 206: from the access message to which the user ID has been added, extract the cookie value that corresponds to a unique user, according to the cookie-values carried in the message.
The cookie value corresponding to a unique user includes, for example, the user name of the user's mailbox or a QQ number.
Step 207: match the field value of the cookie field against the extracted cookie value corresponding to a unique user; if the match succeeds, go to step 208; otherwise, end.
A successful match means that the field value of the cookie field is identical to the extracted cookie value corresponding to a unique user, or satisfies a specified condition. For example, suppose the extracted cookie value corresponding to a unique user is a 12-character string; when the field value of the cookie field is compared with that string character by character in order and the two are identical, the match is considered successful. When a specified condition is set, for example that the field value and the 12-character string, compared in order, differ in only one character, the match is also judged successful. Naturally, the choice of the specified condition affects the accuracy of identification.
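The matching rule in step 207 can be expressed as a small Python function. The parameter `max_mismatches` is an assumed way to encode the specified condition: 0 means exact equality, and 1 allows the two strings to differ in a single character position.

```python
def values_match(candidate, reference, max_mismatches=0):
    """Compare two strings position by position, in order.

    Returns True when they have the same length and differ in at most
    max_mismatches character positions.
    """
    if len(candidate) != len(reference):
        return False
    mismatches = sum(1 for a, b in zip(candidate, reference) if a != b)
    return mismatches <= max_mismatches
```

A looser condition matches more messages but, as the text notes, affects the accuracy of identification.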
Step 208: Add to the access message the user ID of the cookie-value relation table that corresponds to the extracted cookie value corresponding to a unique user.
Step 209: Obtain the information carried by the URL in the access message.
Step 210: Match the information carried by the URL against the extracted cookie value corresponding to a unique user. If the match succeeds, perform step 208; otherwise, end.
A successful match means that the information carried by the URL is identical to the extracted cookie value corresponding to a unique user, or that the two satisfy a specified condition. For example, suppose the extracted cookie value corresponding to a unique user is a 12-character string. If the information carried by the URL is compared with that string character by character, in order, and every character is identical, the match is deemed successful. If a specified condition is set, for instance that the two strings may differ in only one character when compared in order, such a comparison is also deemed a successful match. The choice of specified condition naturally affects the accuracy of identification.
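Steps 209 and 210 can be sketched with Python's standard `urllib.parse` module. The patent does not define what "the information carried by the URL" is, so this sketch assumes it means the decoded query-string values and the path segments; `url_carried_values`, `identify_by_url`, and the mapping from unique value to user ID are hypothetical names and layouts. Only the exact-match case is shown.

```python
from urllib.parse import parse_qsl, urlsplit


def url_carried_values(url):
    """Collect candidate values from a URL (assumed: query values and path segments)."""
    parts = urlsplit(url)
    values = [value for _, value in parse_qsl(parts.query)]
    values += [segment for segment in parts.path.split("/") if segment]
    return values


def identify_by_url(url, unique_value_to_user_id):
    """Return the user ID whose unique value (mailbox name, QQ number) appears in the URL."""
    for value in url_carried_values(url):
        if value in unique_value_to_user_id:
            return unique_value_to_user_id[value]
    return None
```

For instance, a URL whose query string contains `user=alice%40example.com` yields the decoded value `alice@example.com`, which can then be looked up among the unique values extracted in step 206.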
If the cookie value corresponding to a unique user corresponds to multiple IDs, those IDs are merged, the correspondence between that cookie value and each ID is recorded, and the ID corresponding to the user cookie-value relation table is added to the message again. For example, multiple IDs corresponding to the same mailbox user name or QQ number are merged, the correspondence between the mailbox user name or QQ number and the IDs is recorded, and the ID corresponding to the user cookie-value relation table is added to the message again.
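The merging described above can be sketched in Python under two stated assumptions: each user ID appears under only one unique value (so no transitive merging is needed), and the merged ID is simply the lexicographically smallest of the group. Neither choice is specified by the patent, and the name `merge_user_ids` is hypothetical.

```python
def merge_user_ids(unique_value_to_ids):
    """Merge the IDs that share one unique value (mailbox name or QQ number).

    Returns two mappings:
      merged: unique value -> single merged ID
      alias:  every original ID -> merged ID, used to re-tag messages
    The input mapping itself is the recorded value-to-IDs correspondence.
    """
    merged = {}
    alias = {}
    for value, ids in unique_value_to_ids.items():
        canonical = min(ids)  # assumption: smallest ID stands for the group
        merged[value] = canonical
        for user_id in ids:
            alias[user_id] = canonical
    return merged, alias
```

A message previously tagged with any ID of a group can then be re-tagged by looking that ID up in `alias`.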
The user identification method of this embodiment can cover 55% of messages. In addition, it merges the multiple UAs that correspond to one user by means of a cookie value that corresponds to a unique single user (such as a mailbox or QQ number). Compared with identifying users by means such as ADSL or IP, it can markedly improve the accuracy and efficiency of user identification.
Fig. 6 is a structural block diagram of the device corresponding to the user identification method provided by the first embodiment of the present invention. As shown in Fig. 6, the device comprises:
Generation module 601, configured to generate the users' cookie-value relation tables and, for each table, a corresponding user ID that uniquely identifies a user, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify user identity and the user's identity registration information;
Acquisition module 602, configured to collect the access messages sent by users;
First judging module 603, configured to determine whether the access message carries cookie information;
Second judging module 604, configured to determine, when the access message carries cookie information, whether the parsed cookie field is present in a user cookie-value relation table;
First identification module 605, configured to parse the cookie information carried in the access message to obtain the corresponding cookie field and field value and, when the cookie field is present in a user cookie-value relation table, to add the user ID corresponding to that table to the access message.
Fig. 7 is a structural block diagram of the device corresponding to the user identification method provided by the second embodiment of the present invention. As shown in Fig. 7, the device comprises:
Generation module 701, configured to generate the users' cookie-value relation tables and, for each table, a corresponding user ID that uniquely identifies a user, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify user identity and the user's identity registration information. For the step of generating the cookie-value relation tables and their corresponding user IDs, see the detailed description of that step given with reference to Fig. 1.
Acquisition module 702, configured to collect the access messages sent by users.
First judging module 703, configured to determine whether the access message carries cookie information.
Second judging module 704, configured to determine, when the access message carries cookie information, whether the parsed cookie field is present in a user cookie-value relation table.
First identification module 705, configured to obtain the cookie field and field value corresponding to the cookie information carried in the parsed access message and, when the cookie field is present in a user cookie-value relation table, to add the user ID corresponding to that table to the access message.
Information extraction module 706, configured to extract, from the access messages to which a user ID has been added, the cookie value corresponding to a unique user according to the cookie-value carried in the message. The cookie value corresponding to a unique user includes, for example, the user name of the user's mailbox or a QQ number.
Second identification module 707, configured to match, when the cookie field is not present in any user cookie-value relation table, the field value of the parsed cookie field against the extracted cookie value corresponding to a unique user; if the two are identical, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted cookie value is added to the access message.
Third identification module 708, configured to obtain, when the access message does not carry cookie information, the information carried by the URL in the access message, and to match that information against the extracted cookie value corresponding to a unique user; if the two are identical, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted cookie value is added to the access message.
Information merging module 709, configured to consolidate user IDs: if the cookie value corresponding to a unique user corresponds to multiple IDs, the module merges those IDs, records the correspondence between that cookie value and each ID, and adds the ID corresponding to the user cookie-value relation table to the message again.
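How the judging, identification, and extraction modules of Fig. 7 might cooperate can be sketched as a small Python class. This is an illustrative arrangement only: the class name `Identifier`, the `unique_fields` set naming which cookie fields hold a mailbox name or QQ number, and the (domain, field, value) table layout are all assumptions, and only exact matching is shown.

```python
class Identifier:
    """Sketch of the branch order among modules 703 to 708 (assumed wiring)."""

    def __init__(self, relation_table, unique_fields):
        self.relation_table = relation_table  # (domain, field, value) -> user ID
        self.unique_fields = unique_fields    # hypothetical: fields holding mailbox/QQ
        self.unique_values = {}               # extracted unique value -> user ID

    def identify(self, domain, cookies, url_values=()):
        if cookies:  # first judging module: the message carries cookie information
            for field, value in cookies.items():
                user_id = self.relation_table.get((domain, field, value))
                if user_id is not None:  # second judging + first identification
                    self._extract(cookies, user_id)
                    return user_id
            candidates = cookies.values()  # second identification module
        else:
            candidates = url_values        # third identification module
        for value in candidates:
            if value in self.unique_values:
                return self.unique_values[value]
        return None

    def _extract(self, cookies, user_id):
        # Information extraction module: remember unique values seen in tagged messages.
        for field in self.unique_fields:
            if field in cookies:
                self.unique_values[cookies[field]] = user_id
```

Once a message tagged through the relation table has exposed a QQ number, later messages that carry the same number in an unknown cookie field, or only in the URL, resolve to the same user ID.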
In the technical solution of the present invention, the long-term cookie that each website uses to uniquely identify user identity is obtained by statistical means, and the cookie fields that identify user identity are associated with users to generate the users' cookie-value relation tables and their corresponding user IDs; the access messages sent by users are collected; and the access messages are marked according to the user cookie-value relation tables and their corresponding user IDs so as to identify users, replacing user identification based on information such as ADSL or IP. In addition, the method can merge the multiple UAs that correspond to one user by means of fixed information corresponding to a unique single user (such as a mailbox or QQ number), which improves the accuracy and efficiency of user identification.
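The statistical selection of identity-bearing cookie fields summarized above (and recited in claim 8) can be illustrated with a minimal Python sketch. It counts, per cookie field, how many values map to exactly one single user and how many single users map to exactly one value, then keeps the field with the best one-to-one behaviour. The threshold `min_ratio`, the scoring rule, and the function name `build_domain_cookie_dict` are assumptions; the patent only says that fields with high counts and high one-to-one frequency are selected.

```python
from collections import defaultdict


def build_domain_cookie_dict(observations, min_ratio=0.9):
    """observations: iterable of (domain, cookie_field, cookie_value, single_user)."""
    users_of_value = defaultdict(set)  # (domain, field, value) -> users seen
    values_of_user = defaultdict(set)  # (domain, field, user) -> values seen
    for domain, field, value, user in observations:
        users_of_value[(domain, field, value)].add(user)
        values_of_user[(domain, field, user)].add(value)

    # Per (domain, field): [one-to-one values, values, one-to-one users, users]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    for (domain, field, _), users in users_of_value.items():
        stats[(domain, field)][1] += 1
        stats[(domain, field)][0] += len(users) == 1
    for (domain, field, _), values in values_of_user.items():
        stats[(domain, field)][3] += 1
        stats[(domain, field)][2] += len(values) == 1

    best = {}
    for (domain, field), (v_one, v_all, u_one, u_all) in stats.items():
        value_ratio, user_ratio = v_one / v_all, u_one / u_all
        if value_ratio < min_ratio or user_ratio < min_ratio:
            continue  # field is shared between users or changes too often
        score = (u_all, min(value_ratio, user_ratio))  # assumed ranking rule
        if domain not in best or score > best[domain][0]:
            best[domain] = (score, field)
    return {domain: field for domain, (_, field) in best.items()}
```

A field such as a theme preference fails because one value is shared by many users, and a timestamp-like field fails because one user shows many values; a stable account identifier passes both checks.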
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention and the technical principles applied. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. A method for identifying a user, characterized in that the method comprises:
Generating the users' cookie-value relation tables and, for each table, a corresponding user ID that uniquely identifies a user, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify user identity and the user's identity registration information;
Collecting the access messages sent by users;
When the access message carries cookie information, parsing the cookie information to obtain the corresponding cookie field and field value;
When the cookie field is present in a user cookie-value relation table, adding the user ID corresponding to that user cookie-value relation table to the access message.
2. The method for identifying a user according to claim 1, characterized in that the method further comprises: extracting, from the access messages to which a user ID has been added, the cookie value corresponding to a unique user according to the cookie-value carried in the message.
3. The method for identifying a user according to claim 2, characterized in that the method further comprises: when the cookie field is not present in any user cookie-value relation table, matching the field value of the cookie field against the extracted cookie value corresponding to a unique user; if the two are identical, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted cookie value is added to the access message.
4. The method for identifying a user according to claim 2, characterized in that the method further comprises: when the access message does not carry cookie information, obtaining the information carried by the URL in the access message and matching it against the extracted cookie value corresponding to a unique user; if the two are identical, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted cookie value is added to the access message.
5. The method for identifying a user according to claim 2, 3 or 4, characterized in that, if the extracted cookie value corresponding to a unique user corresponds to multiple user IDs, the multiple user IDs are merged, the correspondence between that cookie value and each user ID is recorded, and the user ID corresponding to the user cookie-value relation table is added to the access message again.
6. The method for identifying a user according to claim 1, characterized in that generating the users' cookie-value relation tables and the corresponding user IDs that uniquely identify users comprises:
Screening out single users and the websites whose traffic reaches a preset traffic threshold;
Obtaining, according to the single users, the cookie field that each of the websites uses to identify user identity, and generating a domain-cookie dictionary;
Generating a jump relation graph of the users' access messages;
Generating the users' cookie-value relation tables and the corresponding user IDs that uniquely identify users according to the jump relation graph of the users' access messages and the domain-cookie dictionary.
7. The method for identifying a user according to claim 6, characterized in that screening out single users comprises:
Collecting, over a period of time, the cookie field values that can uniquely identify user identity on different websites for an ADSL end user; if the cookie field values remain unchanged within a predetermined time period, the user is determined to be a single user.
8. The method for identifying a user according to claim 6, characterized in that obtaining, according to the single users, the cookie field that each of the websites uses to identify user identity and generating the domain-cookie dictionary comprises:
Collecting the single users and the cookie value of each cookie field in each website's cookies;
Counting, for each cookie field, the number of cookie values that correspond to a unique single user, and calculating the frequency of cookie values for which the cookie value and the single user are in one-to-one correspondence;
Counting, for each cookie field, the number of single users that correspond to a unique cookie value, and calculating the frequency of single users for which the cookie value and the single user are in one-to-one correspondence;
Filtering the cookies of each website according to, for each cookie field, the number of single users, the number of cookie values, and the frequencies of cookie values and of single users in one-to-one correspondence, and selecting the cookie field with a high number of cookie values, a high number of single users, and a high frequency of one-to-one correspondence between cookie values and single users as the cookie field used to identify user identity for the corresponding website;
Generating the domain-cookie dictionary from each website's domain and its corresponding cookie field that identifies user identity.
9. The method for identifying a user according to claim 6, characterized in that generating the jump relation graph of the users' access messages comprises:
Collecting the access messages of all users over a period of time;
First grouping the messages by ADSL+UA; if there are access messages that do not carry ADSL information, grouping those by IP+UA; and sorting the messages in each group by access time;
Building the jump relation graph of the users' access messages.
10. The method for identifying a user according to claim 6, characterized in that building the user cookie-value relation tables according to the jump relation graph of the users' access messages and the domain-cookie dictionary comprises:
S1: if the host domain name of an access message differs from the main domain (domain) in the jump relation graph, and the cookies corresponding to both website domain names are in the domain-cookie dictionary, associating the cookie values of the two website domain names to generate a cookie-value pair; when the cookie-value pair is established, the association count of the cookie-values of the two websites under that user is incremented by 1;
S2: generating a cookie correspondence graph from the users' access messages, and counting the association count of each cookie-value pair in the graph;
S3: screening according to a preset association-count threshold, obtaining the connected components of the cookie correspondence graph, and generating the users' cookie-value relation tables and the corresponding user IDs that uniquely identify users.
11. A device for identifying a user, characterized in that the device comprises:
A generation module, configured to generate the users' cookie-value relation tables and, for each table, a corresponding user ID that uniquely identifies a user, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify user identity and the user's identity registration information;
An acquisition module, configured to collect the access messages sent by users;
A first judging module, configured to determine whether the access message carries cookie information;
A second judging module, configured to determine, when the access message carries cookie information, whether the parsed cookie field is present in a user cookie-value relation table;
A first identification module, configured to parse the cookie information carried in the access message to obtain the corresponding cookie field and field value and, when the cookie field is present in a user cookie-value relation table, to add the user ID corresponding to that table to the access message.
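As an illustration of the graph construction recited in claims 9 and 10 (not part of the claims themselves), the following Python sketch groups messages by ADSL+UA with a fallback to IP+UA, sorts each group by time, counts cookie-value associations across domain changes, filters by a threshold, and assigns one user ID per connected component. Several points are assumptions: a "jump" is modelled as two consecutive messages in a group, messages are dictionaries with the keys shown, and IDs are formatted as `user-001`; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def identity_node(message, domain_cookie):
    """Return (domain, field, value) for the site's identity cookie, or None."""
    field = domain_cookie.get(message["domain"])
    value = message["cookies"].get(field) if field else None
    return (message["domain"], field, value) if value else None


def build_user_tables(messages, domain_cookie, min_count=2):
    """messages: dicts with keys adsl (may be None), ip, ua, time, domain, cookies."""
    groups = defaultdict(list)
    for m in messages:
        groups[(m["adsl"] or m["ip"], m["ua"])].append(m)  # ADSL+UA, else IP+UA

    pair_counts = Counter()
    for group in groups.values():
        group.sort(key=lambda m: m["time"])
        for prev, cur in zip(group, group[1:]):
            if prev["domain"] == cur["domain"]:
                continue  # S1 only links cookies of two different domains
            a, b = identity_node(prev, domain_cookie), identity_node(cur, domain_cookie)
            if a and b:
                pair_counts[frozenset((a, b))] += 1  # S1/S2: association count

    adjacency = defaultdict(set)
    for pair, count in pair_counts.items():
        if count >= min_count:  # S3: association-count threshold
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)

    table, seen, next_id = {}, set(), 0
    for start in sorted(adjacency):  # S3: one user ID per connected component
        if start in seen:
            continue
        next_id += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            table[node] = f"user-{next_id:03d}"
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return table
```

The returned dictionary plays the role of the flattened relation tables: every cookie-value in one connected component shares one user ID, and pairs seen fewer than `min_count` times are treated as noise.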
CN201310134318.4A 2013-04-17 2013-04-17 A kind of method and device identifying user Expired - Fee Related CN103237094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310134318.4A CN103237094B (en) 2013-04-17 2013-04-17 A kind of method and device identifying user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310134318.4A CN103237094B (en) 2013-04-17 2013-04-17 A kind of method and device identifying user

Publications (2)

Publication Number Publication Date
CN103237094A true CN103237094A (en) 2013-08-07
CN103237094B CN103237094B (en) 2016-04-13

Family

ID=48885110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310134318.4A Expired - Fee Related CN103237094B (en) 2013-04-17 2013-04-17 A kind of method and device identifying user

Country Status (1)

Country Link
CN (1) CN103237094B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199849A (en) * 2014-08-08 2014-12-10 亿赞普(北京)科技有限公司 Advertisement injecting method and device
CN105100295A (en) * 2014-05-21 2015-11-25 北京秒针信息咨询有限公司 Method and device for identifying independent users
CN106302797A (en) * 2016-08-31 2017-01-04 北京锐安科技有限公司 A kind of cookie accesses De-weight method and device
CN106855864A (en) * 2015-12-09 2017-06-16 北京秒针信息咨询有限公司 A kind of method and apparatus of extraction information
CN107092535A (en) * 2017-04-18 2017-08-25 上海雷腾软件股份有限公司 Method and apparatus for the data storage of test interface
CN107426133A (en) * 2016-05-23 2017-12-01 株式会社理光 A kind of method and device for establishing user identity mapping relations
JP2018018523A (en) * 2016-07-26 2018-02-01 株式会社リコー Method for associating user access log, apparatus, system, program and recording medium
CN108536831A (en) * 2018-04-11 2018-09-14 上海驰骛信息科技有限公司 A kind of user's identifying system and method based on multi-parameter
CN108595657A (en) * 2018-04-28 2018-09-28 成都智信电子技术有限公司 The tables of data classification map method and apparatus of HIS systems
CN109388686A (en) * 2017-08-10 2019-02-26 北京国双科技有限公司 A kind of user identifier method and device
CN110995887A (en) * 2019-12-17 2020-04-10 武汉绿色网络信息服务有限责任公司 ID association method and device
CN112152873A (en) * 2020-09-02 2020-12-29 杭州安恒信息技术股份有限公司 User identification method and device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080235243A1 (en) * 2007-03-21 2008-09-25 Nhn Corporation System and method for expanding target inventory according to browser-login mapping
CN102333092A (en) * 2011-09-30 2012-01-25 北京亿赞普网络技术有限公司 Network user identification method and application server

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100295A (en) * 2014-05-21 2015-11-25 北京秒针信息咨询有限公司 Method and device for identifying independent users
CN104199849A (en) * 2014-08-08 2014-12-10 亿赞普(北京)科技有限公司 Advertisement injecting method and device
CN106855864A (en) * 2015-12-09 2017-06-16 北京秒针信息咨询有限公司 A kind of method and apparatus of extraction information
CN107426133B (en) * 2016-05-23 2020-06-30 株式会社理光 Method and device for identifying user identity information
CN107426133A (en) * 2016-05-23 2017-12-01 株式会社理光 A kind of method and device for establishing user identity mapping relations
JP2018018523A (en) * 2016-07-26 2018-02-01 株式会社リコー Method for associating user access log, apparatus, system, program and recording medium
CN107659602A (en) * 2016-07-26 2018-02-02 株式会社理光 Association user accesses the method, apparatus and system of record
CN106302797A (en) * 2016-08-31 2017-01-04 北京锐安科技有限公司 A kind of cookie accesses De-weight method and device
CN107092535B (en) * 2017-04-18 2020-06-19 上海雷腾软件股份有限公司 Method and apparatus for data storage of test interface
CN107092535A (en) * 2017-04-18 2017-08-25 上海雷腾软件股份有限公司 Method and apparatus for the data storage of test interface
CN109388686A (en) * 2017-08-10 2019-02-26 北京国双科技有限公司 A kind of user identifier method and device
CN108536831A (en) * 2018-04-11 2018-09-14 上海驰骛信息科技有限公司 A kind of user's identifying system and method based on multi-parameter
CN108595657A (en) * 2018-04-28 2018-09-28 成都智信电子技术有限公司 The tables of data classification map method and apparatus of HIS systems
CN108595657B (en) * 2018-04-28 2020-10-09 成都智信电子技术有限公司 Data table classification mapping method and device of HIS (hardware-in-the-system)
CN110995887A (en) * 2019-12-17 2020-04-10 武汉绿色网络信息服务有限责任公司 ID association method and device
CN112152873A (en) * 2020-09-02 2020-12-29 杭州安恒信息技术股份有限公司 User identification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN103237094B (en) 2016-04-13

Similar Documents

Publication Publication Date Title
CN103237094A (en) Method and device for user identification
US11593301B2 (en) Session-based processing method and system
CN108304410B (en) Method and device for detecting abnormal access page and data analysis method
CN100394727C (en) Log analyzing method and system
De Choudhury et al. How does the data sampling strategy impact the discovery of information diffusion in social media?
CN102316130B (en) A kind of behavior based on user judges the method and apparatus of the close and distant degree of itself and good friend
CN106570013B (en) Method and device for processing page access data
US20130318603A1 (en) Security threat detection based on indications in big data of access to newly registered domains
US11816172B2 (en) Data processing method, server, and computer storage medium
CN101409690A (en) Method and system for obtaining internet user behaviors
CN103051637A (en) User identification method and device
CN106708841B (en) The polymerization and device of website visitation path
CN108366012B (en) Social relationship establishing method and device and electronic equipment
CN109359263B (en) User behavior feature extraction method and system
CN105224691A (en) A kind of information processing method and device
CN104202418A (en) Method and system for recommending commercial content distribution network for content provider
CN110392032B (en) Method, device and storage medium for detecting abnormal URL
CN108650145A (en) Phone number characteristic automatic extraction method under a kind of home broadband WiFi
CN105989019B (en) A kind of method and device for cleaning data
CN106844553A (en) Data snooping and extending method and device based on sample data
CN105653674A (en) File management method and system of intelligent terminal
EP3361405B1 (en) Enhancement of intrusion detection systems
CN111611508B (en) Identification method and device for actual website access of user
CN106933860B (en) Malicious Uniform Resource Locator (URL) identification method and device
CN109145307A (en) User's face sketch recognition method, method for pushing, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160413

Termination date: 20170417