WO2017028728A1 - Determining method and device for click through rate (ctr) - Google Patents

Determining method and device for click through rate (ctr)

Info

Publication number
WO2017028728A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample data
sequence
ctr
application
feature values
Prior art date
Application number
PCT/CN2016/094448
Other languages
French (fr)
Chinese (zh)
Inventor
谢海军
邓国
邓琨
贺露
Original Assignee
北京金山安全软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山安全软件有限公司
Publication of WO2017028728A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • the present invention relates to the field of Internet technologies, and in particular to a method and an apparatus for determining a click-through rate (CTR).
  • Internet product promoters can use promotional resources such as an application wall to promote applications for application developers. That is, based on the user's current operation scenario and the determined click-through rate (CTR) of each application, the promoter recommends to the user one or more high-quality applications with a higher CTR. Quickly determining the CTR of an application is therefore particularly important in application promotion.
  • the common CTR determination method is statistics-based: the CTR of an application in different operation scenarios is counted from historical sample data, an operation scenario that is the same as or similar to the current operation scenario is found, and the CTR under that operation scenario is used as the CTR of the application in the current operation scenario.
  • the statistics-based determination method relies on a large amount of historical sample data, and multiple feature dimensions need to be considered when searching for an operation scenario that is the same as or similar to the current operation scenario. As a result, determining the CTR of an application is slow and consumes a large amount of resources.
  • the embodiment of the invention discloses a method and a device for determining a click-through rate (CTR), which can quickly determine the CTR of an application with low resource consumption.
  • the first aspect of the present invention discloses a method for determining a click-through rate (CTR), and the method includes:
  • when a display request for an application is detected, a sequence of feature values is determined for each application, the sequence of feature values being composed of an application feature value sequence for describing application information, a user feature value sequence for describing user information, and a traffic feature value sequence;
  • each sequence of feature values is used as the input of a preset CTR calculation algorithm, and the output of the CTR calculation algorithm corresponding to each sequence of feature values is determined as the CTR of the application corresponding to that sequence of feature values.
  • the CTR calculation algorithm is a logistic regression model based algorithm
  • the calculation formula of the logistic regression model based algorithm is:
  • $y_{CTR}$ is the output of the calculation formula
  • the coefficient of the calculation formula is calculated in advance.
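
The formula itself does not survive in this text. As a hedged reconstruction, the standard logistic regression form that is consistent with the surrounding definitions is shown below. The symbols $\vec{w}$ (the coefficient calculated in advance) and $\vec{x}$ (the sequence of feature values of an application) are assumed names and are not taken from the source.

```latex
y_{CTR} = \frac{1}{1 + e^{-\vec{w}^{\top}\vec{x}}}
```

With this form the output always lies strictly between 0 and 1, which is why it can be read directly as a click-through rate.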
  • before the sequence of feature values of each application is determined when a display request for an application is detected, the method further comprises:
  • each second sample data is composed of the sequence of feature values of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
  • alternatively, before the sequence of feature values of each application is determined when a display request for an application is detected, the method further comprises:
  • each second sample data is composed of the sequence of feature values of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
  • each second sample data included in each sample group is expressed as a likelihood expression, and all the likelihood expressions corresponding to each sample group are multiplied to obtain the product likelihood expression of that sample group
  • the method further includes:
  • the second quantity of second sample data is stored in a memory space with consecutive addresses.
  • the second aspect of the present invention discloses a device for determining a click-through rate (CTR), the device comprising a first determining unit, a first obtaining unit, and a second determining unit, wherein:
  • the first determining unit is configured to determine a sequence of feature values of each application when a display request for an application is detected, where the sequence of feature values is composed of an application feature value sequence for describing application information, a user feature value sequence for describing user information, and a traffic feature value sequence;
  • the first obtaining unit is configured to use each sequence of feature values as the input of a preset CTR calculation algorithm and to obtain the output of the CTR calculation algorithm corresponding to each sequence of feature values;
  • the second determining unit is configured to determine the output of the CTR calculation algorithm corresponding to each sequence of feature values as the CTR of the application corresponding to that sequence of feature values.
  • the CTR calculation algorithm is a logistic regression model based algorithm
  • the calculation formula of the logistic regression model based algorithm is:
  • $y_{CTR}$ is the output of the calculation formula
  • the coefficient of the calculation formula is calculated in advance.
  • the apparatus further includes a first reading unit, a first merging unit, a second obtaining unit, a first calculating unit, and a third determining unit, wherein:
  • the first reading unit is configured to read a first quantity of first sample data from pre-stored sample data, where each sample data is composed of a sequence of feature values, a presentation identifier for identifying whether the sample data is presented, and a click identifier for identifying whether the sample data is clicked;
  • the first merging unit is configured to merge the first sample data that have the same sequence of feature values among the first quantity of first sample data, to obtain a second quantity of second sample data, where each second sample data is composed of the sequence of feature values of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
  • the second obtaining unit is configured to express each of the second sample data as a likelihood expression, and to multiply all the likelihood expressions to obtain a product likelihood expression;
  • the first calculating unit is configured to perform a target number of iterations using a Newton iteration method and an initial iteration parameter, and to calculate the value of the unknown parameter in the product likelihood expression at which the product likelihood expression takes its maximum value;
  • the third determining unit is configured to determine the value of the unknown parameter as the coefficient of the calculation formula.
  • the apparatus further includes a second reading unit, a second merging unit, an equalizing unit, a third obtaining unit, a second calculating unit, and a fourth determining unit, wherein:
  • the second reading unit is configured to read a first quantity of first sample data from pre-stored sample data, where each first sample data is composed of a sequence of feature values, a presentation identifier for identifying whether the sample data is presented, and a click identifier for identifying whether the sample data is clicked;
  • the second merging unit is configured to merge the first sample data that have the same sequence of feature values among the first quantity of first sample data, to obtain a second quantity of second sample data, where each second sample data is composed of the sequence of feature values of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
  • the equalizing unit is configured to divide the second quantity of second sample data equally into sample groups, each including a third quantity of second sample data;
  • the third obtaining unit is configured to express each second sample data included in each sample group as a likelihood expression, and to multiply all the likelihood expressions corresponding to each sample group to obtain the product likelihood expression of that sample group;
  • the second calculating unit is configured to perform one iteration using a Newton iteration method and an initial iteration parameter, to calculate, for each product likelihood expression, a first value of the unknown parameter at which that product likelihood expression takes its maximum value, and then, using the sum of the first values as the initial iteration parameter of the next iteration, to repeat the iteration with the Newton iteration method and the initial iteration parameter until the number of iterations reaches the target number of times;
  • the fourth determining unit is configured to determine, as the coefficient of the calculation formula, the sum of the second values of the unknown parameter in each product likelihood expression calculated after the target number of iterations.
  • the apparatus further includes a storage unit, wherein:
  • the storage unit is configured to store the second quantity of second sample data in a memory space with consecutive addresses.
  • a third aspect of the present invention discloses a server, where the server includes:
  • a memory for storing instructions executable by the processor
  • the processor is configured to read the executable instructions stored in the memory and to run a program corresponding to the instructions, so as to perform the method for determining a click-through rate (CTR) according to an embodiment of the first aspect of the present invention.
  • a fourth aspect of the present invention discloses a computer readable storage medium having instructions stored therein; when the processor of a server executes the instructions, the server performs the method for determining a click-through rate (CTR) according to the first aspect of the present invention.
  • a fifth aspect of the present invention discloses a computer application that, when executed on a processor of a server, performs the method for determining a click-through rate (CTR) according to the first aspect of the present invention.
  • in the embodiments of the present invention, a sequence of feature values of each application is determined, where the sequence of feature values is composed of an application feature value sequence for describing application information, a user feature value sequence for describing user information, and a traffic feature value sequence; each sequence of feature values is used as the input of a preset CTR calculation algorithm, the output of the CTR calculation algorithm corresponding to each sequence of feature values is obtained, and that output is determined as the CTR of the application corresponding to the sequence of feature values.
  • the embodiment of the present invention can therefore quickly calculate the CTR of each application from the determined sequence of feature values of each application and the preset CTR calculation algorithm, without counting the CTR of the application in different operation scenarios from historical sample data and then searching for the CTR in an operation scenario that is the same as or similar to the current operation scenario, so resource consumption is low.
  • FIG. 1 is a schematic flowchart of a method for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of another method for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of still another method for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a device for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of another apparatus for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of still another apparatus for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • the embodiment of the invention discloses a method and a device for determining a click-through rate (CTR), which can quickly calculate the CTR of each application from the determined sequence of feature values of each application and a preset CTR calculation algorithm, with low resource consumption. The details are described below separately.
  • FIG. 1 is a schematic flowchart of a method for determining a click-through rate (CTR) according to an embodiment of the present invention. The method shown in FIG. 1 can be applied to a server. As shown in FIG. 1, the method for determining the click-through rate (CTR) may include the following steps:
  • S101: Determine a sequence of feature values for each application when a display request for an application is detected.
  • the display request for the application may be triggered by the user through the terminal device, or may be actively triggered by the terminal device. The sequence of feature values of each application is composed of an application feature value sequence for describing application information (such as the category of the application and related description information of the application), a user feature value sequence for describing user information (such as the user's gender and interests), and a traffic feature value sequence for describing user behavior information (such as time, place, and language).
  • the sequence of feature values is a feature value vector whose components are 0s and 1s.
  • the application information is the category of the application (shooting application and game application)
  • the user information is the user's gender (male and female)
  • the behavior information is the location (Shanghai and Beijing)
  • a3 and a4 are used to describe the gender of the user
  • a5 and a6 are used to describe the location
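
The example above can be sketched as a simple one-hot encoding. This is a minimal illustration, not the patent's own implementation: the function names, the vocabulary lists, and the assumption that a1 and a2 encode the application category while a5 and a6 encode the location are all hypothetical.

```python
# Hypothetical vocabularies taken from the example in the text; the component
# order a1..a6 (category, gender, location) is an assumption for illustration.
CATEGORIES = ["shooting", "game"]    # a1, a2 (assumed)
GENDERS = ["male", "female"]         # a3, a4
LOCATIONS = ["Shanghai", "Beijing"]  # a5, a6 (assumed)


def one_hot(value, vocabulary):
    """Return a list of 0s and 1s with a single 1 at the position of `value`."""
    return [1 if value == item else 0 for item in vocabulary]


def feature_sequence(category, gender, location):
    """Concatenate the application, user, and traffic feature value sequences."""
    return (one_hot(category, CATEGORIES)
            + one_hot(gender, GENDERS)
            + one_hot(location, LOCATIONS))


# A game application shown to a female user located in Shanghai.
print(feature_sequence("game", "female", "Shanghai"))  # [0, 1, 0, 1, 1, 0]
```

Each group of components contains exactly one 1, so the resulting vector consists only of 0s and 1s, as the text requires.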
  • the terminal device in the embodiment of the present invention may include a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (PAD), a portable multimedia player (PMP), a navigation device, a desktop computer, and so on.
  • each sequence of feature values is used as the input of a preset CTR calculation algorithm, and the output of the CTR calculation algorithm corresponding to each sequence of feature values is obtained.
  • the preset CTR calculation algorithm represents the relationship between the sequence of feature values of an application (input) and the CTR of the application (output); that is, the sequence of feature values of each application is used as the input of the CTR calculation algorithm, and the corresponding output of the CTR calculation algorithm is the CTR of that application.
  • the CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the algorithm based on the logistic regression model is:
  • $y_{CTR}$ is the output of the calculation formula (the CTR of the application)
  • the coefficient of the calculation formula is calculated in advance.
  • the calculation formula is obtained in advance from historical sample data, so in the subsequent CTR determination process the CTR of an application can be determined quickly once the sequence of feature values of the application is obtained.
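
As a minimal sketch of this prediction step, assuming the standard logistic regression form (the formula itself is missing from this text), the CTR of one application can be computed from its feature value sequence and a pre-calculated coefficient vector. The function name `predict_ctr` and the coefficient values are hypothetical.

```python
import math


def predict_ctr(features, coefficients):
    """Logistic-regression output for one 0/1 feature value sequence."""
    z = sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pre-calculated coefficients, one per feature component.
coefficients = [-1.2, -0.4, 0.3, -0.1, 0.2, -0.5]
features = [0, 1, 0, 1, 1, 0]
ctr = predict_ctr(features, coefficients)
print(round(ctr, 4))  # about 0.4256
```

Because the coefficients are fixed before any display request arrives, each prediction is a single dot product followed by one exponential, which is what makes the determination fast compared with searching historical scenarios.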
  • in the embodiment of the present invention, a sequence of feature values of each application is determined, where the sequence of feature values is composed of an application feature value sequence for describing application information, a user feature value sequence for describing user information, and a traffic feature value sequence; each sequence of feature values is used as the input of a preset CTR calculation algorithm, the output of the CTR calculation algorithm corresponding to each sequence of feature values is obtained, and that output is determined as the CTR of the application corresponding to the sequence of feature values.
  • the embodiment of the present invention can therefore quickly calculate the CTR of each application from the determined sequence of feature values of each application and the preset CTR calculation algorithm, without counting the CTR of the application in different operation scenarios from historical sample data and then searching for the CTR in an operation scenario that is the same as or similar to the current operation scenario, so resource consumption is low.
  • FIG. 2 is a schematic flowchart of another method for determining a click-through rate (CTR) according to an embodiment of the present invention. The method shown in FIG. 2 is applied to a server. As shown in FIG. 2, the method for determining the click-through rate (CTR) may include the following steps:
  • each sample data is composed of a sequence of feature values, a presentation identifier pv for identifying whether the sample data is presented, and a click identifier click for identifying whether the sample data is clicked. The sequence of feature values is composed of an application feature value sequence for describing application information (such as the category of the application and related description information of the application), a user feature value sequence for describing user information (such as the user's gender and interests), and a traffic feature value sequence for describing user behavior information (such as time, place, and language). Here, pv being 1 and click being 0 indicates that the sample data is presentation sample data, and pv being 0 and click being 1 indicates that the sample data is click sample data.
  • the first sample data that have the same sequence of feature values among the first quantity of first sample data are merged to obtain a second quantity of second sample data.
  • identical sequences of feature values are the merging criterion: all first sample data having the same sequence of feature values are merged into one second sample data. That is, the first quantity of first sample data is merged into a second quantity of second sample data, and the second quantity is equal to the number of distinct sequences of feature values among the first quantity of first sample data.
  • each second sample data is composed of the sequence of feature values of the first sample data forming the second sample data, the sum of the pv of all first sample data forming the second sample data, and the sum of the click of all first sample data forming the second sample data.
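
The merging step described above can be sketched as a group-by on the feature value sequence. This is an illustrative sketch only; the function name `merge_samples` and the tuple layout `(features, pv, click)` are assumptions, not taken from the source.

```python
from collections import defaultdict


def merge_samples(first_samples):
    """Merge first sample data sharing the same feature value sequence.

    Each first sample is (features, pv, click). Each merged (second) sample
    keeps the shared feature sequence plus the summed pv and summed click.
    """
    totals = defaultdict(lambda: [0, 0])
    for features, pv, click in first_samples:
        key = tuple(features)  # lists are unhashable, so use a tuple as key
        totals[key][0] += pv
        totals[key][1] += click
    return [(list(key), pv, click) for key, (pv, click) in totals.items()]


first_samples = [
    ([0, 1, 0, 1, 1, 0], 1, 0),  # presentation sample
    ([0, 1, 0, 1, 1, 0], 0, 1),  # click sample
    ([0, 1, 0, 1, 1, 0], 1, 0),  # presentation sample
    ([1, 0, 1, 0, 0, 1], 1, 0),  # presentation sample
]
second_samples = merge_samples(first_samples)
print(second_samples)
# [([0, 1, 0, 1, 1, 0], 2, 1), ([1, 0, 1, 0, 0, 1], 1, 0)]
```

Four first sample data collapse into two second sample data here, so every later pass over the data touches one record per distinct feature value sequence instead of one record per event.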
  • each second sample data may be represented as a likelihood expression, wherein the likelihood expression is:
  • n takes every integer from 1 to the second quantity, inclusive
  • $pv_n$ is the sum of the pv of all first sample data forming the nth second sample data
  • $click_n$ is the sum of the click of all first sample data forming the nth second sample data
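
The likelihood expression itself is missing from this text. One common form that fits the definitions of $pv_n$ and $click_n$ is the binomial likelihood below. It is an assumption, not the patent's confirmed formula: it treats every click as belonging to one of the $pv_n$ presentations, and the symbols $\vec{w}$, $\vec{x}_n$, $p_n$, and $N$ (the second quantity) are assumed names.

```latex
p_n = \frac{1}{1 + e^{-\vec{w}^{\top}\vec{x}_n}}, \qquad
L_n(\vec{w}) = p_n^{\,click_n}\,(1 - p_n)^{\,pv_n - click_n}, \qquad
L(\vec{w}) = \prod_{n=1}^{N} L_n(\vec{w})
```

Under this assumption $L(\vec{w})$ plays the role of the product likelihood expression, and $\vec{w}$ is its unknown parameter.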
  • the initial iteration parameter is a preset iteration parameter; that is, the value of the unknown parameter at which the product likelihood expression takes its maximum value is calculated by iterating with the Newton iteration method.
  • the target number m may be a preset number of times, or may be determined according to the angle between two quantities taking its minimum value or according to the modulus of a quantity taking its minimum value, which is not limited in the embodiment of the present invention.
  • the logarithm of the product likelihood expression may first be taken and then multiplied by -1; the Newton iteration method is then iterated the target number of times m to obtain the value of the unknown parameter at which the resulting expression takes its minimum value.
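
A minimal pure-Python sketch of this fitting step follows. It assumes a binomial negative log-likelihood over the merged samples (the patent's exact expression is not reproduced in this text) and runs a fixed number of Newton iterations from a preset starting point. All names are hypothetical, and the example uses an intercept component plus one binary feature so that the Hessian is invertible; one-hot groups that always sum to 1 are collinear and would need regularization or a dropped component.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_fit(samples, initial, iterations):
    """Minimize the assumed negative log-likelihood with Newton's method.

    samples: list of (features, pv, click) merged sample data.
    """
    w = list(initial)
    dim = len(w)
    for _ in range(iterations):
        grad = [0.0] * dim
        hess = [[0.0] * dim for _ in range(dim)]
        for x, pv, click in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(dim):
                grad[i] += (pv * p - click) * x[i]
                for j in range(dim):
                    hess[i][j] += pv * p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)  # Newton step: H^-1 * gradient
        w = [wi - si for wi, si in zip(w, step)]
    return w


# Hypothetical merged samples: (features, pv, click); component 0 is an intercept.
samples = [([1, 0], 100, 10), ([1, 1], 100, 30)]
w = newton_fit(samples, initial=[0.0, 0.0], iterations=10)
print([round(v, 4) for v in w])  # approximately [-2.1972, 1.3499]
```

With these two groups the fitted model reproduces the observed rates, 10/100 and 30/100, which gives a quick sanity check on the iteration.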
  • S205: Determine a sequence of feature values for each application when a display request for the application is detected.
  • the display request for the application may be triggered by the user through the terminal device, or may be actively triggered by the terminal device. The sequence of feature values of each application is composed of an application feature value sequence for describing application information (such as the category of the application and related description information of the application), a user feature value sequence for describing user information (such as the gender of the user), and a traffic feature value sequence.
  • the sequence of feature values is a feature value vector whose components are 0s and 1s.
  • each sequence of feature values is used as the input of a preset CTR calculation algorithm, and the output of the CTR calculation algorithm corresponding to each sequence of feature values is obtained.
  • the CTR calculation algorithm represents the relationship between the sequence of feature values of an application (input) and the CTR of the application (output); that is, the sequence of feature values of each application is used as the input of the CTR calculation algorithm, and the corresponding output of the CTR calculation algorithm is the CTR of that application.
  • the CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the algorithm based on the logistic regression model is:
  • the application with the top CTR ranking may be recommended to the user.
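
A short sketch of this ranking step, under the assumption that the CTR of every candidate application has already been calculated; the function name and the CTR values are hypothetical.

```python
def top_applications(ctr_by_app, k):
    """Return the k application names with the highest CTR, best first."""
    ranked = sorted(ctr_by_app.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical CTR values for three candidate applications.
ctr_by_app = {"app_a": 0.031, "app_b": 0.087, "app_c": 0.054}
print(top_applications(ctr_by_app, 2))  # ['app_b', 'app_c']
```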
  • after performing step S202 and before performing step S203, the following operation may also be performed:
  • the second quantity of second sample data is stored in a memory space with consecutive addresses.
  • the second quantity of second sample data is stored in a contiguous memory space, and head and tail pointer arrays may be used to identify the start address and the end address of each second sample data, thereby speeding up the reading of the second sample data.
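
The storage layout above can be approximated in Python with one flat buffer and two index arrays. This is only a sketch: Python indices stand in for memory pointers, the tail index is exclusive, and the names `pack_samples` and `read_sample` are hypothetical.

```python
from array import array


def pack_samples(second_samples):
    """Store samples back to back in one buffer, with head/tail index arrays."""
    buffer = array("i")
    heads, tails = array("i"), array("i")
    for features, pv, click in second_samples:
        heads.append(len(buffer))   # index of the sample's first value
        buffer.extend(features)
        buffer.extend([pv, click])
        tails.append(len(buffer))   # index one past the sample's last value
    return buffer, heads, tails


def read_sample(buffer, heads, tails, n):
    """Read the n-th sample back using its head and tail positions."""
    values = buffer[heads[n]:tails[n]]
    return list(values[:-2]), values[-2], values[-1]


packed = pack_samples([([0, 1, 0, 1, 1, 0], 2, 1), ([1, 0, 1], 5, 0)])
print(read_sample(*packed, 1))  # ([1, 0, 1], 5, 0)
```

Keeping a head and a tail position per sample lets samples of different lengths sit next to each other while each one can still be located without scanning the buffer.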
  • by implementing the embodiment of the present invention, a calculation formula representing the relationship between the sequence of feature values of an application and the CTR of the application can be obtained through a learning process, and the CTR of an application can then be calculated quickly from the determined sequence of feature values and the calculation formula, so that a suitable application can be displayed quickly in the promotion resource, improving the user experience. In addition, the sample data is merged when the calculation formula is obtained, so resource consumption is low.
  • FIG. 3 is a schematic flowchart of still another method for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • the method shown in FIG. 3 can be applied to a server.
  • the method for determining the click-through rate (CTR) may include the following steps:
  • each sample data is composed of a sequence of feature values, a presentation identifier pv for identifying whether the sample data is presented, and a click identifier click for identifying whether the sample data is clicked. The sequence of feature values is composed of an application feature value sequence for describing application information (such as the category of the application and related description information of the application), a user feature value sequence for describing user information (such as the user's gender and interests), and a traffic feature value sequence for describing user behavior information (such as time, place, and language). Here, pv being 1 and click being 0 indicates that the sample data is presentation sample data, and pv being 0 and click being 1 indicates that the sample data is click sample data.
  • the first sample data that have the same sequence of feature values among the first quantity of first sample data are merged to obtain a second quantity of second sample data.
  • identical sequences of feature values are the merging criterion: all first sample data having the same sequence of feature values are merged into one second sample data. That is, the first quantity of first sample data is merged into a second quantity of second sample data, and the second quantity is equal to the number of distinct sequences of feature values among the first quantity of first sample data.
  • each second sample data is composed of the sequence of feature values of the first sample data forming the second sample data, the sum of the pv of all first sample data forming the second sample data, and the sum of the click of all first sample data forming the second sample data.
  • the third quantity may be less than or equal to the number of CPU cores in the server, so that, once the second quantity of second sample data has been divided equally, the same processing can be performed on the sample groups simultaneously, which speeds up processing.
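
The equal division can be sketched as follows. The text is ambiguous about whether the third quantity counts the groups or the samples per group; this sketch assumes it is the number of groups, capped by the number of CPU cores so that each group could be handed to its own worker. The function name and the cap of 4 are hypothetical.

```python
import os


def split_into_groups(second_samples, group_count):
    """Divide samples into `group_count` groups whose sizes differ by at most one."""
    base, extra = divmod(len(second_samples), group_count)
    groups, start = [], 0
    for g in range(group_count):
        size = base + (1 if g < extra else 0)
        groups.append(second_samples[start:start + size])
        start += size
    return groups


samples = list(range(10))                   # stand-ins for second sample data
group_count = min(4, os.cpu_count() or 1)   # assumed cap: number of CPU cores
groups = split_into_groups(samples, group_count)
print([len(g) for g in groups])
```

Groups of near-equal size keep the per-core workload balanced, which matters because each round of the iteration has to wait for the slowest group to finish.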
  • each product likelihood expression is iterated to obtain a value of the unknown parameter, and the values obtained for all the product likelihood expressions are then summed to serve as the initial iteration parameter of the next iteration.
  • when the determination result in step S307 is YES, step S309 is performed; when the determination result in step S307 is NO, step S308 is performed.
  • the target number m may be a preset number of times, or may be determined according to the modulus of a quantity derived from the sum of the values of the unknown parameters obtained after the (m-1)th iteration taking its minimum value, which is not limited in the embodiment of the present invention.
  • step S305 is performed.
  • S310: Determine a sequence of feature values for each application when a display request for the application is detected.
  • each sequence of feature values is used as the input of a preset CTR calculation algorithm, and the output of the CTR calculation algorithm corresponding to each sequence of feature values is obtained.
  • after performing step S302 and before performing step S303, the following operation may also be performed:
  • the second quantity of second sample data is stored in a memory space with consecutive addresses.
  • the second quantity of second sample data is stored in a contiguous memory space, and head and tail pointer arrays may be used to identify the start address and the end address of each second sample data, thereby speeding up the reading of the second sample data.
  • the embodiments of the present invention can quickly determine the CTR of an application with low resource consumption.
  • FIG. 4 is a schematic structural diagram of a device for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • the device can be installed in a server.
  • the apparatus may include a first determining unit 401, a first obtaining unit 402, and a second determining unit 403, where:
  • the first determining unit 401 is configured to determine a sequence of feature values for each application when a display request for the application is detected.
  • the display request for the application may be triggered by the user through the terminal device, or may be actively triggered by the terminal device, and the sequence of feature values of each application is used to describe the application information (such as the category of the application). And a sequence of application feature values of the related description information of the application, etc., a sequence of user feature values for describing user information (such as the gender of the user and the interest of the user, etc.) and information describing the user behavior (such as time, place, language, etc.)
  • the sequence of traffic characteristic values is composed, and the sequence of feature values is a feature value vector composed of a plurality of 0s and 1s as components.
  • The first obtaining unit 402 is configured to use each feature value sequence as an input of a preset CTR calculation algorithm and to acquire the output of the CTR calculation algorithm corresponding to each feature value sequence.
  • The preset CTR calculation algorithm represents the relationship between the feature value sequence of an application (input) and the CTR of the application (output); that is, when the feature value sequence of each application is used as the input of the CTR calculation algorithm, the corresponding output of the algorithm is the CTR of that application.
  • The second determining unit 403 is configured to determine the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application corresponding to that feature value sequence.
  • The CTR calculation algorithm is an algorithm based on a logistic regression model. In the calculation formula of this algorithm, y_CTR is the output (the CTR of the application), the feature value sequence of the application is the input, and the coefficient of the calculation formula is calculated in advance.
  • The coefficient of the calculation formula is calculated in advance from historical sample data, so that in the subsequent CTR determination process the CTR of an application can be determined quickly once its feature value sequence is acquired.
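  • The formula itself is not reproduced in this text, so the sketch below assumes the standard logistic form y_CTR = 1 / (1 + exp(-(w · x + b))), where x is the feature value sequence and w is the precomputed coefficient vector. The symbols w, x, and b, the function name `predict_ctr`, and all numeric values are assumptions for illustration.

```python
import math


def predict_ctr(features, weights, bias=0.0):
    """Assumed logistic form: 1 / (1 + exp(-(w . x + b))), computed stably."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical precomputed coefficients, one per feature component.
weights = [0.4, -1.2, 0.3, 0.8]

# Hypothetical 0/1 feature value sequences for two candidate applications.
candidates = {
    "app_a": [1, 0, 1, 0],
    "app_b": [0, 1, 0, 1],
}

ctr_by_app = {app: predict_ctr(x, weights) for app, x in candidates.items()}
best_app = max(ctr_by_app, key=ctr_by_app.get)
print(best_app)  # app_a
```

  • Each prediction is a single dot product followed by one exponential, which is consistent with the claim that the CTR can be determined quickly once the coefficients are known.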
  • The device may further include a first reading unit 404, a first merging unit 405, a second obtaining unit 406, a first calculating unit 407, and a third determining unit 408.
  • In this case, the structure of the device can be as shown in FIG. 5.
  • FIG. 5 is a schematic structural diagram of another device for determining a click-through rate (CTR) disclosed in an embodiment of the present invention, in which:
  • The first reading unit 404 is configured to read a first quantity of first sample data from pre-stored sample data.
  • Each sample data consists of a feature value sequence, a presentation identifier pv for identifying whether the sample data is presented, and a click identifier click for identifying whether the sample data is clicked. The feature value sequence is composed of an application feature value sequence for describing application information (such as the category of the application and related description information of the application), a user feature value sequence for describing user information (such as the gender and interests of the user), and a traffic feature value sequence for describing user behavior information (such as time, place, and language). A pv of 1 with a click of 0 indicates that the sample data is presentation sample data, and a pv of 0 with a click of 1 indicates that the sample data is click sample data.
  • The first merging unit 405 is configured to merge the first sample data that have the same feature value sequence among the first quantity of first sample data, so as to obtain a second quantity of second sample data.
  • Merging is keyed on the feature value sequence: first sample data with the same feature value sequence are merged into one second sample data. The first quantity of first sample data is thus merged into a second quantity of second sample data, where the second quantity equals the number of distinct feature value sequences among the first quantity of first sample data.
  • Each second sample data consists of the feature value sequence of the first sample data that form it, the sum of the pv values of all first sample data that form it, and the sum of the click values of all first sample data that form it.
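  • The merging step can be sketched as a group-by on the feature value sequence that sums pv and click. The tuple layout `(features, pv, click)` and the function name `merge_samples` are assumptions made for the example.

```python
from collections import defaultdict


def merge_samples(first_samples):
    """Merge (features, pv, click) records that share the same feature value sequence."""
    totals = defaultdict(lambda: [0, 0])  # feature sequence -> [pv sum, click sum]
    for features, pv, click in first_samples:
        key = tuple(features)
        totals[key][0] += pv
        totals[key][1] += click
    return [(key, pv_sum, click_sum) for key, (pv_sum, click_sum) in totals.items()]


# Hypothetical first sample data.
first_samples = [
    ([1, 0, 1], 1, 0),  # presentation sample data
    ([1, 0, 1], 0, 1),  # click sample data
    ([1, 0, 1], 1, 0),
    ([0, 1, 1], 1, 0),
]
second_samples = merge_samples(first_samples)
print(second_samples)  # [((1, 0, 1), 2, 1), ((0, 1, 1), 1, 0)]
```

  • Here four first sample data collapse into two second sample data, one per distinct feature value sequence, which shrinks the input to the later likelihood computation.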
  • The second obtaining unit 406 is configured to express each second sample data as a likelihood expression and to multiply all the likelihood expressions to obtain a product likelihood expression.
  • The first calculating unit 407 is configured to perform a target number of iterations using Newton's iteration method and initial iteration parameters, so as to calculate the value of the unknown parameter in the product likelihood expression at which the product likelihood expression takes its maximum value.
  • The third determining unit 408 is configured to determine the value of the unknown parameter as the coefficient of the above calculation formula.
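  • The likelihood expression is not reproduced in this text. The sketch below therefore assumes a binomial form per second sample data, p^click · (1 - p)^(pv - click) with p the logistic output, treating the pv sum as the number of presentations and the click sum as the number of clicks. It maximizes the logarithm of the product with Newton's method for a fixed number of iterations. The names `newton_fit` and `solve`, the small `ridge` term that keeps the matrix invertible, and the data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    rows = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - tail) / rows[i][i]
    return x


def newton_fit(second_samples, initial_weights, target_iterations, ridge=1e-6):
    """Newton ascent on sum(click*log(p) + (pv - click)*log(1 - p)) (assumed likelihood)."""
    w = list(initial_weights)
    d = len(w)
    for _ in range(target_iterations):
        gradient = [0.0] * d
        # Negative Hessian of the log-likelihood, plus a small ridge on the diagonal.
        neg_hessian = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        for features, pv, click in second_samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            for i in range(d):
                gradient[i] += (click - pv * p) * features[i]
                for j in range(d):
                    neg_hessian[i][j] += pv * p * (1.0 - p) * features[i] * features[j]
        step = solve(neg_hessian, gradient)
        w = [wi + si for wi, si in zip(w, step)]
    return w


# Hypothetical merged samples: (feature value sequence, pv sum, click sum).
second_samples = [((1, 0), 100, 10), ((0, 1), 50, 20)]
weights = newton_fit(second_samples, initial_weights=[0.0, 0.0], target_iterations=10)
print([round(sigmoid(w), 3) for w in weights])  # [0.1, 0.4]
```

  • With one-hot features the fitted probabilities recover the observed click ratios (10/100 and 20/50), which is a quick sanity check on the update rule.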
  • The device may further include a second reading unit 409, a second merging unit 410, a storage unit 411, an averaging unit 412, a third obtaining unit 413, a second calculating unit 414, and a fourth determining unit 415. In this case, the structure of the device can be as shown in FIG. 6.
  • FIG. 6 is a schematic structural diagram of another device for determining a click-through rate (CTR) disclosed in an embodiment of the present invention, in which:
  • The second reading unit 409 is configured to read a first quantity of first sample data from pre-stored sample data.
  • The second merging unit 410 is configured to merge the first sample data that have the same feature value sequence among the first quantity of first sample data, so as to obtain a second quantity of second sample data.
  • Merging is keyed on the feature value sequence: first sample data with the same feature value sequence are merged into one second sample data. The first quantity of first sample data is thus merged into a second quantity of second sample data, where the second quantity equals the number of distinct feature value sequences among the first quantity of first sample data.
  • Each second sample data consists of the feature value sequence of the first sample data that form it, the sum of the pv values of all first sample data that form it, and the sum of the click values of all first sample data that form it.
  • The storage unit 411 is configured to store the second quantity of second sample data in a memory space with contiguous addresses.
  • The averaging unit 412 is configured to divide the second quantity of second sample data equally into sample groups, each including a third quantity of second sample data.
  • The third quantity may be less than or equal to the number of CPU cores in the server, so that, with the second quantity of second sample data divided equally in this way, the same processing can be performed simultaneously on each group of the third quantity of second sample data, which speeds up processing.
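  • The equal division and the simultaneous per-group processing can be sketched as below. The per-group work shown here (summing pv and click) is only a placeholder for the likelihood computation, and the names `split_into_groups` and `group_totals` are hypothetical. A thread pool is used for brevity; in CPython, CPU-bound work would typically need a process pool to run on several cores at once.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(second_samples, group_size):
    """Split samples into consecutive groups of `group_size` (the last may be smaller)."""
    return [
        second_samples[start:start + group_size]
        for start in range(0, len(second_samples), group_size)
    ]


def group_totals(group):
    """Placeholder per-group work: total pv and total click of the group."""
    return sum(pv for _, pv, _ in group), sum(click for _, _, click in group)


# Hypothetical merged samples: (feature value sequence, pv sum, click sum).
second_samples = [
    ((1, 0, 0), 4, 1),
    ((0, 1, 0), 6, 2),
    ((0, 0, 1), 3, 0),
    ((1, 1, 0), 5, 1),
    ((1, 0, 1), 2, 2),
]
groups = split_into_groups(second_samples, group_size=2)

# Apply the same processing to every group concurrently.
with ThreadPoolExecutor(max_workers=len(groups)) as pool:
    totals = list(pool.map(group_totals, groups))
print(totals)  # [(10, 3), (8, 1), (2, 2)]
```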
  • The third obtaining unit 413 is configured to express each second sample data included in each sample group as a likelihood expression, and to multiply all the likelihood expressions corresponding to each sample group to obtain the product likelihood expression of that sample group.
  • The second calculating unit 414 is configured to perform one iteration using Newton's iteration method and initial iteration parameters, so as to calculate, for each product likelihood expression, a first value of the unknown parameter at which that product likelihood expression takes its maximum value; to use the sum of the first values as the initial iteration parameter of the next iteration; and to repeat the operation of performing one iteration using Newton's iteration method and the initial iteration parameters until the number of iterations reaches the target number.
  • The fourth determining unit 415 is configured to determine the sum of the second values of the unknown parameter in the product likelihood expressions, calculated after the target number of iterations, as the coefficient of the above calculation formula.
  • Embodiments of the present invention can quickly determine the CTR of an application with low resource consumption.
  • The present invention also proposes a server that includes the device for determining a click-through rate (CTR) according to the embodiments of the present invention described with reference to FIGS. 4, 5 and 6.
  • FIG. 7 is a schematic structural diagram of a server for performing a method for determining a click-through rate (CTR) according to an embodiment of the present invention.
  • The server may include at least one processor 501 (such as a CPU), at least one network interface 504 or other user interface 503, a memory 505, and at least one communication bus 502. The communication bus 502 is used to implement connection and communication between these components.
  • The user interface 503 can optionally include a USB interface, other standard interfaces, and wired interfaces.
  • The network interface 504 can optionally include a Wi-Fi interface as well as other wireless interfaces.
  • The memory 505 may include a high-speed RAM and may also include a non-volatile memory, such as at least one disk memory.
  • The memory 505 can optionally include at least one storage device located remotely from the aforementioned processor 501. As shown in FIG. 7, the memory 505, as a computer storage medium, may include an operating system 5051 and an application 5052.
  • The memory 505 stores the following elements, executable modules or data structures, or a subset or an extended set thereof:
  • The operating system 5051 includes various system programs for implementing various basic services and processing hardware-based tasks;
  • The application 5052 includes various programs, such as a data block parameter setting program for target network data, a partitioning program for a target data block, a comparison program for comparing the target data block with data blocks in a database, and a deletion program for the target data block, for implementing various application services.
  • The processor 501 is configured to call a program stored in the memory 505 and perform the following operations:
  • Receiving client feature information collected by a client, where the client feature information is formed according to related information of an application in the client operating system and related information of the client user, and the related information of the application includes at least one of operation information of the application, behavior information of the application, and context information related to the application.
  • An embodiment of the present invention further provides a computer-readable storage medium having instructions stored therein; when a processor of a mobile terminal executes the instructions, the mobile terminal performs the above method for determining a click-through rate (CTR).
  • An embodiment of the present invention further provides a computer application program which, when executed on a processor of a mobile terminal, performs the above method for determining a click-through rate (CTR).
  • The units in the device of the embodiments of the present invention may be combined, divided, and deleted according to actual needs.
  • The units in the embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
  • A "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.
  • Examples of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or, if necessary, otherwise suitably processing it, and can then be stored in a computer memory.
  • Portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
  • Multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
  • Unless explicitly stated and defined otherwise, the terms "installation", "connected", "fixed" and the like shall be understood broadly: a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be internal communication between two elements or an interaction between two elements.
  • The specific meanings of the above terms in the present invention can be understood according to the specific situation.

Abstract

A determining method and device for a click through rate (CTR). The method comprises: when a display request aiming at applications is detected, determining a characteristic value sequence of each application (S101), wherein the characteristic value sequence consists of an application characteristic value sequence for describing application information, a user characteristic value sequence for describing user information, and a flow characteristic value sequence; respectively using each characteristic value sequence as an input of a preset CTR computational algorithm, and acquiring an output, that corresponds to each characteristic value sequence, of the CTR computational algorithm (S102); and determining the output, that corresponds to each characteristic value sequence, of the CTR computational algorithm as a CTR of an application corresponding to the characteristic value sequence (S103). The method can quickly determine the CTR of an application and has low resource consumption.

Description

Method and device for determining a click-through rate (CTR)
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 201510507737.7, filed on August 18, 2015, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of Internet technologies, and in particular to a method and a device for determining a click-through rate (CTR).
Background
In the field of Internet technologies, a promoter of Internet products can use promotion resources such as an application wall to promote applications on behalf of their developers; that is, the promoter recommends to a user one or more high-quality applications with a relatively high click-through rate (CTR), based on the user's current operating scenario and the determined CTR of each application. Quickly determining the CTR of an application is therefore particularly important for application promotion.
Currently, a common CTR determination method is statistics-based: assuming that an application has the same CTR in identical operating scenarios, the method looks up, among the CTRs of the application in different operating scenarios computed from historical sample data, the CTR for an operating scenario that is the same as or similar to the current one, and uses it as the CTR of the application in the current operating scenario. However, because an operating scenario consists of multiple feature dimensions, this statistics-based method depends on a large amount of historical sample data and must consider multiple feature dimensions when searching for the same or a similar operating scenario. It therefore cannot determine the CTR of an application quickly, and it consumes considerable resources.
Summary of the invention
Embodiments of the present invention disclose a method and a device for determining a click-through rate (CTR), which can quickly determine the CTR of an application with low resource consumption.
An embodiment of a first aspect of the present invention discloses a method for determining a click-through rate (CTR), the method including:
when a display request for applications is detected, determining a feature value sequence of each application, the feature value sequence consisting of an application feature value sequence for describing application information, a user feature value sequence for describing user information, and a traffic feature value sequence;
using each feature value sequence as an input of a preset CTR calculation algorithm, and acquiring the output of the CTR calculation algorithm corresponding to each feature value sequence;
determining the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application corresponding to that feature value sequence.
In at least one embodiment, the CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the algorithm based on the logistic regression model is:
Figure PCTCN2016094448-appb-000001
where y_CTR is the output of the calculation formula,
Figure PCTCN2016094448-appb-000002
is the input of the calculation formula, and
Figure PCTCN2016094448-appb-000003
is the coefficient of the calculation formula, calculated in advance.
In at least one embodiment, before determining the feature value sequence of each application when a display request for applications is detected, the method further includes:
reading a first quantity of first sample data from pre-stored sample data, the sample data consisting of a feature value sequence, a presentation identifier for identifying whether the sample data is presented, and a click identifier for identifying whether the sample data is clicked;
merging the first sample data that have the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, each second sample data consisting of the feature value sequence of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
expressing each second sample data as a likelihood expression, and multiplying all the likelihood expressions to obtain a product likelihood expression;
performing a target number of iterations using Newton's iteration method and initial iteration parameters to calculate the value of the unknown parameter in the product likelihood expression at which the product likelihood expression takes its maximum value, and determining the value of the unknown parameter as the coefficient
Figure PCTCN2016094448-appb-000004
In at least one embodiment, before determining the feature value sequence of each application when a display request for applications is detected, the method further includes:
reading a first quantity of first sample data from pre-stored sample data, the first sample data consisting of a feature value sequence, a presentation identifier for identifying whether the sample data is presented, and a click identifier for identifying whether the sample data is clicked;
merging the first sample data that have the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, each second sample data consisting of the feature value sequence of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
dividing the second quantity of second sample data equally into sample groups, each including a third quantity of second sample data;
expressing each second sample data included in each sample group as a likelihood expression, and multiplying all the likelihood expressions corresponding to each sample group to obtain the product likelihood expression of that sample group;
performing one iteration using Newton's iteration method and initial iteration parameters to calculate, for each product likelihood expression, a first value of the unknown parameter at which that product likelihood expression takes its maximum value, using the sum of the first values as the initial iteration parameter of the next iteration, and repeating the operation of performing one iteration using Newton's iteration method and the initial iteration parameters until the number of iterations reaches a target number;
determining the sum of the second values of the unknown parameter in the product likelihood expressions, calculated after the target number of iterations, as the coefficient
Figure PCTCN2016094448-appb-000005
In at least one embodiment, after merging the first sample data that have the same feature value sequence among the first quantity of first sample data, and before dividing the second quantity of second sample data equally into sample groups each including a third quantity of second sample data, the method further includes:
storing the second quantity of second sample data in a memory space with contiguous addresses.
An embodiment of a second aspect of the present invention discloses a device for determining a click-through rate (CTR), the device including a first determining unit, a first obtaining unit, and a second determining unit, where:
the first determining unit is configured to determine a feature value sequence of each application when a display request for applications is detected, the feature value sequence consisting of an application feature value sequence for describing application information, a user feature value sequence for describing user information, and a traffic feature value sequence;
the first obtaining unit is configured to use each feature value sequence as an input of a preset CTR calculation algorithm and to acquire the output of the CTR calculation algorithm corresponding to each feature value sequence;
the second determining unit is configured to determine the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application corresponding to that feature value sequence.
In at least one embodiment, the CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the algorithm based on the logistic regression model is:
Figure PCTCN2016094448-appb-000006
where y_CTR is the output of the calculation formula,
Figure PCTCN2016094448-appb-000007
is the input of the calculation formula, and
Figure PCTCN2016094448-appb-000008
is the coefficient of the calculation formula, calculated in advance.
In at least one embodiment, the device further includes a first reading unit, a first merging unit, a second obtaining unit, a first calculating unit, and a third determining unit, where:
the first reading unit is configured to read a first quantity of first sample data from pre-stored sample data, the sample data consisting of a feature value sequence, a presentation identifier for identifying whether the sample data is presented, and a click identifier for identifying whether the sample data is clicked;
the first merging unit is configured to merge the first sample data that have the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, each second sample data consisting of the feature value sequence of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
the second obtaining unit is configured to express each second sample data as a likelihood expression and to multiply all the likelihood expressions to obtain a product likelihood expression;
the first calculating unit is configured to perform a target number of iterations using Newton's iteration method and initial iteration parameters, so as to calculate the value of the unknown parameter in the product likelihood expression at which the product likelihood expression takes its maximum value;
the third determining unit is configured to determine the value of the unknown parameter as the coefficient
Figure PCTCN2016094448-appb-000009
In at least one embodiment, the device further includes a second reading unit, a second merging unit, an averaging unit, a third obtaining unit, a second calculating unit, and a fourth determining unit, where:
the second reading unit is configured to read a first quantity of first sample data from pre-stored sample data, the first sample data consisting of a feature value sequence, a presentation identifier for identifying whether the sample data is presented, and a click identifier for identifying whether the sample data is clicked;
the second merging unit is configured to merge the first sample data that have the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, each second sample data consisting of the feature value sequence of the first sample data forming the second sample data, the sum of the presentation identifiers of the first sample data forming the second sample data, and the sum of the click identifiers of the first sample data forming the second sample data;
the averaging unit is configured to divide the second quantity of second sample data equally into sample groups, each including a third quantity of second sample data;
the third obtaining unit is configured to express each second sample data included in each sample group as a likelihood expression, and to multiply all the likelihood expressions corresponding to each sample group to obtain the product likelihood expression of that sample group;
the second calculating unit is configured to perform one iteration using Newton's iteration method and initial iteration parameters, so as to calculate, for each product likelihood expression, a first value of the unknown parameter at which that product likelihood expression takes its maximum value, to use the sum of the first values as the initial iteration parameter of the next iteration, and to repeat the operation of performing one iteration using Newton's iteration method and the initial iteration parameters until the number of iterations reaches a target number;
the fourth determining unit is configured to determine the sum of the second values of the unknown parameter in the product likelihood expressions, calculated after the target number of iterations, as the coefficient
Figure PCTCN2016094448-appb-000010
In at least one embodiment, the device further includes a storage unit, where:
the storage unit is configured to store the second quantity of second sample data in a memory space with contiguous addresses.
An embodiment of a third aspect of the present invention discloses a server, the server including:
a processor; and
a memory for storing instructions executable by the processor,
where the processor is configured to read the executable instructions stored in the memory and run a program corresponding to the instructions, so as to perform the method for determining a click-through rate (CTR) according to the embodiment of the first aspect of the present invention.
An embodiment of a fourth aspect of the present invention discloses a computer-readable storage medium having instructions stored therein; when a processor of a server executes the instructions, the server performs the method for determining a click-through rate (CTR) according to the embodiment of the first aspect of the present invention.
An embodiment of a fifth aspect of the present invention discloses a computer application program which, when executed on a processor of a server, performs the method for determining a click-through rate (CTR) according to the embodiment of the first aspect of the present invention.
In the embodiments of the present invention, when a display request for applications is detected, a feature value sequence of each application is determined, where the feature value sequence consists of an application feature value sequence describing application information, a user feature value sequence describing user information, and a traffic feature value sequence. Each feature value sequence is used as the input of a preset CTR calculation algorithm, the output of the CTR calculation algorithm corresponding to each feature value sequence is obtained, and that output is determined as the CTR of the application corresponding to the feature value sequence. By implementing the embodiments of the present invention, the CTR of each application can be calculated quickly from the determined feature value sequence of each application and the preset CTR calculation algorithm. There is no need to search the CTRs of the application in different operation scenarios, compiled from historical sample data, for the CTR in an operation scenario identical or similar to the current one, so resource consumption is low.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for determining a click-through rate (CTR) according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of another method for determining a click-through rate (CTR) according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of still another method for determining a click-through rate (CTR) according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for determining a click-through rate (CTR) according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of another apparatus for determining a click-through rate (CTR) according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of still another apparatus for determining a click-through rate (CTR) according to an embodiment of the present invention; and
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are merely some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention disclose a method and an apparatus for determining a click-through rate (CTR), which can quickly calculate the CTR of each application from the determined feature value sequence of each application and a preset CTR calculation algorithm, with low resource consumption. Details are described below.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for determining a click-through rate (CTR) according to an embodiment of the present invention. The method shown in FIG. 1 may be applied to a server. As shown in FIG. 1, the method for determining the CTR may include the following steps:
S101. When a display request for applications is detected, determine a feature value sequence of each application.
In this embodiment of the present invention, the display request for applications may be triggered by a user through a terminal device, or may be triggered actively by the terminal device. The feature value sequence of each application consists of an application feature value sequence describing application information (such as the category of the application and related descriptive information about the application), a user feature value sequence describing user information (such as the user's gender and interests), and a traffic feature value sequence describing user behavior information (such as time, location, and language). The feature value sequence is a feature value vector whose components are 0s and 1s.
For example, assume that the application information is the category of the application (photography application or game application), the user information is the gender of the user (male or female), and the behavior information is the location (Shanghai or Beijing). Then, for one application, the feature value sequence is A = [a1, a2, a3, a4, a5, a6]. Here a1 and a2 describe the category of the application: a1 = 1 and a2 = 0 indicates a photography application, and a1 = 0 and a2 = 1 indicates a game application. a3 and a4 describe the gender of the user: a3 = 1 and a4 = 0 indicates a male user, and a3 = 0 and a4 = 1 indicates a female user. a5 and a6 describe the location of the user's behavior: a5 = 1 and a6 = 0 indicates that the user is in Shanghai, and a5 = 0 and a6 = 1 indicates that the user is in Beijing.
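The example above amounts to a one-hot encoding of each attribute followed by concatenation. A minimal Python sketch is given below; the constant lists and the function names `one_hot` and `feature_sequence` are illustrative assumptions, not names from the patent.

```python
# Hypothetical value lists, taken from the worked example in the text.
CATEGORIES = ["photography", "game"]   # a1, a2
GENDERS = ["male", "female"]           # a3, a4
LOCATIONS = ["Shanghai", "Beijing"]    # a5, a6


def one_hot(value, choices):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == choice else 0 for choice in choices]


def feature_sequence(category, gender, location):
    """Concatenate the application, user, and traffic one-hot segments."""
    return (one_hot(category, CATEGORIES)
            + one_hot(gender, GENDERS)
            + one_hot(location, LOCATIONS))


# A game application shown to a female user in Shanghai.
print(feature_sequence("game", "female", "Shanghai"))  # [0, 1, 0, 1, 1, 0]
```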
The terminal device in the embodiments of the present invention may include a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (PAD), a portable multimedia player (PMP), a navigation device, a desktop computer, and the like.
S102. Use each feature value sequence as the input of a preset CTR calculation algorithm, and obtain the output of the CTR calculation algorithm corresponding to each feature value sequence.
In this embodiment of the present invention, the preset CTR calculation algorithm represents the relationship between the feature value sequence of an application (the input) and the CTR of that application (the output). That is, when the feature value sequence of each application is used as the input of the CTR calculation algorithm, the corresponding output is the CTR of the application. As long as the CTR calculation algorithm is set in advance, the CTR of an application can be determined quickly without relying on a large amount of historical sample data, so resource consumption is low.
The CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the logistic-regression-based algorithm is:
Figure PCTCN2016094448-appb-000011
where y_CTR is the output of the calculation formula (the CTR of the application),
Figure PCTCN2016094448-appb-000012
is the input of the calculation formula (the feature value sequence of the application), and
Figure PCTCN2016094448-appb-000013
is the pre-calculated coefficient of the calculation formula. In this embodiment of the present invention, once
Figure PCTCN2016094448-appb-000014
in the calculation formula has been calculated from a certain amount of historical sample data, the CTR of an application can be determined quickly in any subsequent CTR determination as soon as the feature value sequence of the application is obtained.
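The patent's exact formula survives only as a figure, so the following is a sketch under the assumption that it is the standard logistic function applied to the dot product of the coefficient vector and the feature value sequence. The function name `predict_ctr` and the weight values are hypothetical.

```python
import math


def predict_ctr(features, weights):
    """Standard logistic regression prediction: sigmoid of the dot product.

    Assumption: this is the textbook form of a logistic regression model,
    not a transcription of the formula in the patent figure.
    """
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up coefficients for the six-component example sequence.
example_weights = [-1.0, 0.5, 0.2, -0.3, 0.1, -0.4]
example_features = [0, 1, 0, 1, 1, 0]
ctr = predict_ctr(example_features, example_weights)
```

Because the coefficients are computed once in advance, scoring a display request costs only one dot product and one exponential per application.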
S103. Determine the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application corresponding to that feature value sequence.
In this embodiment of the present invention, when a display request for applications is detected, a feature value sequence of each application is determined, where the feature value sequence consists of an application feature value sequence describing application information, a user feature value sequence describing user information, and a traffic feature value sequence. Each feature value sequence is used as the input of the preset CTR calculation algorithm, the output of the CTR calculation algorithm corresponding to each feature value sequence is obtained, and that output is determined as the CTR of the application corresponding to the feature value sequence. This embodiment can quickly calculate the CTR of each application from the determined feature value sequence of each application and the preset CTR calculation algorithm. There is no need to search the CTRs of the application in different operation scenarios, compiled from historical sample data, for the CTR in an operation scenario identical or similar to the current one, so resource consumption is low.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of another method for determining a click-through rate (CTR) according to an embodiment of the present invention. The method shown in FIG. 2 is applied to a server. As shown in FIG. 2, the method for determining the CTR may include the following steps:
S201. Read a first quantity of first sample data from pre-stored sample data.
In this embodiment of the present invention, each sample data consists of a feature value sequence, an impression flag pv identifying whether the sample data was displayed, and a click flag click identifying whether the sample data was clicked. The feature value sequence consists of an application feature value sequence describing application information (such as the category of the application and related descriptive information about the application), a user feature value sequence describing user information (such as the user's gender and interests), and a traffic feature value sequence describing user behavior information (such as time, location, and language). A pv of 1 with a click of 0 indicates that the sample data is impression sample data, and a pv of 0 with a click of 1 indicates that the sample data is click sample data.
S202. Merge the first sample data that have the same feature value sequence among the first quantity of first sample data, to obtain a second quantity of second sample data.
In this embodiment of the present invention, identical feature value sequences are the merging criterion: multiple first sample data with the same feature value sequence are merged into one second sample data, so that the first quantity of first sample data is merged into the second quantity of second sample data, where the second quantity equals the number of distinct feature value sequences among the first quantity of first sample data. Each second sample data consists of the feature value sequence of the first sample data that form it, the sum of the pv values of all the first sample data that form it, and the sum of the click values of all the first sample data that form it.
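The merging step can be sketched as a group-by on the feature value sequence with the pv and click flags summed per group. The tuple layout `(features, pv, click)` and the name `merge_samples` are assumptions made for illustration.

```python
from collections import defaultdict


def merge_samples(first_samples):
    """Merge samples that share a feature value sequence.

    Each input item is (features, pv, click); each output item carries the
    shared feature sequence plus the summed pv and click flags.
    """
    totals = defaultdict(lambda: [0, 0])
    for features, pv, click in first_samples:
        key = tuple(features)          # lists are not hashable, tuples are
        totals[key][0] += pv
        totals[key][1] += click
    return [(list(key), pv, click) for key, (pv, click) in totals.items()]


first = [
    ([1, 0], 1, 0),   # impression sample
    ([1, 0], 0, 1),   # click sample with the same features
    ([0, 1], 1, 0),
    ([1, 0], 1, 0),
]
second = merge_samples(first)
```

Here four first sample data collapse into two second sample data, one per distinct feature value sequence, which is what reduces the work in the later likelihood computation.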
S203. Express each second sample data as a likelihood expression, and multiply all the likelihood expressions to obtain a product likelihood expression.
In this embodiment of the present invention, each second sample data can be expressed as a likelihood expression, where the likelihood expression is:
Figure PCTCN2016094448-appb-000015
where
Figure PCTCN2016094448-appb-000016
n takes every integer greater than or equal to 1 and less than or equal to the second quantity,
Figure PCTCN2016094448-appb-000017
is the feature value sequence of the n-th second sample data, pv_n is the sum of the pv values of all the first sample data forming the n-th second sample data, and click_n is the sum of the click values of all the first sample data forming the n-th second sample data. The product likelihood expression is then:
Figure PCTCN2016094448-appb-000018
S204. Perform the target number of iterations using the Newton iteration method and an initial iteration parameter, calculate the value of the unknown parameter in the product likelihood expression at which the product likelihood expression takes its maximum value, and determine that value as
Figure PCTCN2016094448-appb-000019
In this embodiment of the present invention, the initial iteration parameter
Figure PCTCN2016094448-appb-000020
is a preset iteration parameter. That is, after one iteration of the Newton iteration method, the value of the unknown parameter at which the product likelihood expression takes its maximum value is calculated as
Figure PCTCN2016094448-appb-000021
Then
Figure PCTCN2016094448-appb-000022
is used as the initial iteration parameter of the next iteration to obtain
Figure PCTCN2016094448-appb-000023
and so on, until the number of iterations reaches the target number m and the value of the unknown parameter
Figure PCTCN2016094448-appb-000024
is obtained, and
Figure PCTCN2016094448-appb-000025
is determined as
Figure PCTCN2016094448-appb-000026
The target number m may be a preset number, or may be calculated on the basis that the angle between
Figure PCTCN2016094448-appb-000027
and
Figure PCTCN2016094448-appb-000028
is at a minimum, or that the modulus of
Figure PCTCN2016094448-appb-000029
is at a minimum; this is not limited in the embodiments of the present invention.
Specifically, in this embodiment of the present invention, to calculate the maximum value of the product likelihood expression, the logarithm of the product likelihood expression may first be taken and then multiplied by -1 to obtain
Figure PCTCN2016094448-appb-000030
The Newton iteration method is then run for the target number m of iterations to calculate the value of the unknown parameter at which
Figure PCTCN2016094448-appb-000031
takes its minimum value:
Figure PCTCN2016094448-appb-000032
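The procedure described above (take the negative log of the product likelihood, then minimise it with a fixed number of Newton iterations) can be sketched as follows. Several things here are assumptions because the patent's expressions survive only as figures: the likelihood of the n-th merged sample is taken to be h^click_n * (1 - h)^(pv_n - click_n) with h the logistic function, pv_n is taken to count all impressions including clicked ones (if it counts only unclicked impressions, substitute pv + click for pv), and a tiny `ridge` term is added to the Hessian purely for numerical safety. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def newton_fit(samples, initial, iterations, ridge=1e-9):
    """Minimise the negative log-likelihood with a fixed number of Newton steps.

    samples: list of (features, pv, click) merged samples.
    """
    w = list(initial)
    d = len(w)
    for _ in range(iterations):
        grad = [0.0] * d
        hess = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x, pv, click in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            g = pv * p - click          # d(-log L)/dz for this sample
            h = pv * p * (1.0 - p)      # second derivative with respect to z
            for i in range(d):
                grad[i] += g * x[i]
                for j in range(d):
                    hess[i][j] += h * x[i] * x[j]
        step = solve(hess, grad)        # Newton step: H^-1 * gradient
        w = [wi - si for wi, si in zip(w, step)]
    return w


# Two merged samples: 10 clicks in 100 impressions, 40 clicks in 100 impressions.
data = [([1, 0], 100, 10), ([1, 1], 100, 40)]
fitted = newton_fit(data, [0.0, 0.0], 20)
```

With these two samples the model has as many parameters as distinct feature sequences, so the fitted probabilities should reproduce the observed rates of 0.1 and 0.4; real data would normally not be fitted exactly.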
S205. When a display request for applications is detected, determine a feature value sequence of each application.
In this embodiment of the present invention, the display request for applications may be triggered by a user through a terminal device, or may be triggered actively by the terminal device. The feature value sequence of each application consists of an application feature value sequence describing application information (such as the category of the application and related descriptive information about the application), a user feature value sequence describing user information (such as the user's gender and interests), and a traffic feature value sequence describing user behavior information (such as time, location, and language). The feature value sequence is a feature value vector whose components are 0s and 1s.
S206. Use each feature value sequence as the input of the preset CTR calculation algorithm, and obtain the output of the CTR calculation algorithm corresponding to each feature value sequence.
In this embodiment of the present invention, the CTR calculation algorithm represents the relationship between the feature value sequence of an application (the input) and the CTR of that application (the output). That is, when the feature value sequence of each application is used as the input of the CTR calculation algorithm, the corresponding output is the CTR of the application. The CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the logistic-regression-based algorithm is:
Figure PCTCN2016094448-appb-000033
where y_CTR is the output of the calculation formula (the CTR of the application),
Figure PCTCN2016094448-appb-000034
is the input of the calculation formula (the feature value sequence of the application), and
Figure PCTCN2016094448-appb-000035
is the above
Figure PCTCN2016094448-appb-000036
S207. Determine the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application corresponding to that feature value sequence.
In this embodiment of the present invention, after the CTR of each application is determined, the applications with the highest CTRs may be recommended to the user.
After step S202 is performed and before step S203 is performed, the following operation may also be performed:
Store the second quantity of second sample data in a memory space with contiguous addresses.
In this embodiment of the present invention, the second quantity of second sample data is stored in a memory space with contiguous addresses, and arrays of head and tail pointers may be used to mark the start and end memory positions of each second sample data, which speeds up reading of the second sample data.
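The storage layout described above can be illustrated with a flat typed array plus two offset lists. This is only an analogy: Python has no raw pointers, so the `starts` and `ends` index lists stand in for the head and tail pointer arrays, and the record layout `[features..., pv, click]` and the function names are assumptions.

```python
from array import array


def pack_samples(second_samples):
    """Store every merged sample back to back in one contiguous buffer."""
    buffer = array("q")        # signed 64-bit integers, stored contiguously
    starts, ends = [], []      # stand-ins for the head and tail pointer arrays
    for features, pv, click in second_samples:
        starts.append(len(buffer))
        buffer.extend(features)
        buffer.append(pv)
        buffer.append(click)
        ends.append(len(buffer))
    return buffer, starts, ends


def read_sample(buffer, starts, ends, n):
    """Read the n-th sample using only its start and end offsets."""
    record = buffer[starts[n]:ends[n]]
    return list(record[:-2]), record[-2], record[-1]


packed, heads, tails = pack_samples([([1, 0, 0, 1], 5, 2), ([0, 1], 3, 0)])
```

Keeping records adjacent means that a sequential pass over all samples, as each Newton iteration requires, touches memory in order rather than following scattered references.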
It can be seen that, by implementing this embodiment of the present invention, a calculation formula representing the relationship between the feature value sequence of an application and the CTR of the application can be obtained through a single learning process. When the CTR is subsequently determined, the CTR of the application can be calculated quickly from the determined feature value sequence and the calculation formula, so that suitable applications can be displayed quickly in promotion resources, which improves the user experience. In addition, the sample data is merged when the calculation formula is obtained, so resource consumption is low.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of still another method for determining a click-through rate (CTR) according to an embodiment of the present invention. The method shown in FIG. 3 may be applied to a server. As shown in FIG. 3, the CTR may be determined as follows:
S301. Read a first quantity of first sample data from pre-stored sample data.
In this embodiment of the present invention, each sample data consists of a feature value sequence, an impression flag pv identifying whether the sample data was displayed, and a click flag click identifying whether the sample data was clicked. The feature value sequence consists of an application feature value sequence describing application information (such as the category of the application and related descriptive information about the application), a user feature value sequence describing user information (such as the user's gender and interests), and a traffic feature value sequence describing user behavior information (such as time, location, and language). A pv of 1 with a click of 0 indicates that the sample data is impression sample data, and a pv of 0 with a click of 1 indicates that the sample data is click sample data.
S302. Merge the first sample data that have the same feature value sequence among the first quantity of first sample data, to obtain a second quantity of second sample data.
In this embodiment of the present invention, identical feature value sequences are the merging criterion: multiple first sample data with the same feature value sequence are merged into one second sample data, so that the first quantity of first sample data is merged into the second quantity of second sample data, where the second quantity equals the number of distinct feature value sequences among the first quantity of first sample data. Each second sample data consists of the feature value sequence of the first sample data that form it, the sum of the pv values of all the first sample data that form it, and the sum of the click values of all the first sample data that form it.
S303. Divide the second quantity of second sample data evenly into sample groups, each containing a third quantity of second sample data.
In this embodiment of the present invention, the third quantity may be less than or equal to the number of CPU cores in the server. Dividing the second quantity of second sample data evenly in this way allows every group of the third quantity of second sample data to undergo the same processing simultaneously, which speeds up processing.
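Step S303 can be sketched as fixed-size chunking of the merged samples. The function name `split_into_groups` is hypothetical, and the handling of a remainder (a smaller final group when the second quantity is not a multiple of the third quantity) is an assumption, since the text only describes the even case.

```python
def split_into_groups(second_samples, third_quantity):
    """Split the merged samples into groups of `third_quantity` items each.

    Assumption: if the samples do not divide evenly, the last group is
    simply shorter; the patent text does not cover that case.
    """
    if third_quantity <= 0:
        raise ValueError("third_quantity must be positive")
    return [second_samples[i:i + third_quantity]
            for i in range(0, len(second_samples), third_quantity)]


groups = split_into_groups(list(range(8)), 4)
```

Each resulting group can then be handed to its own worker so that the per-group likelihood computations run at the same time.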
S304. Express each second sample data in each sample group as a likelihood expression, and multiply all the likelihood expressions of each sample group to obtain the product likelihood expression of that sample group.
S305. Perform one iteration using the Newton iteration method and an initial iteration parameter.
S306. Calculate, for each product likelihood expression, a first value of the unknown parameter at which that product likelihood expression takes its maximum value, and calculate the sum of all the first values.
In this embodiment of the present invention, iterating on each product likelihood expression yields the second quantity of unknown parameter values, and these values are then summed to serve as the initial iteration parameter of the next iteration.
S307. Determine whether the number of iterations has reached the target number.
In this embodiment of the present invention, if the result of step S307 is yes, step S309 is performed; if the result of step S307 is no, step S308 is performed.
In this embodiment of the present invention, the target number m may be a preset number, or may be calculated on the basis that the angle between the sum
Figure PCTCN2016094448-appb-000037
of the second quantity of unknown parameter values obtained after the (m-1)-th iteration and the sum
Figure PCTCN2016094448-appb-000038
of the second quantity of unknown parameter values obtained after the m-th iteration is at a minimum, or that the modulus of
Figure PCTCN2016094448-appb-000039
is at a minimum; this is not limited in the embodiments of the present invention.
S308. Use the sum of all the first values as the initial iteration parameter of the next iteration.
In this embodiment of the present invention, step S305 is performed after step S308 is completed.
S309. Determine the sum of the second values of the unknown parameter in each product likelihood expression, calculated after the target number of iterations, as
Figure PCTCN2016094448-appb-000040
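The control flow of steps S305 to S309 can be sketched as the loop below. It follows the text literally: every group performs one iteration from the same starting parameter, the per-group values are summed component-wise, and that sum becomes the next starting parameter. The per-group computation is passed in as a callable named `newton_step`, which is a hypothetical placeholder because the patent's per-group expression survives only as a figure; the stub used in the example exists only to exercise the loop.

```python
def iterate_groups(groups, initial_parameter, target_iterations, newton_step):
    """Run the S305-S309 loop for a fixed number of iterations.

    newton_step(group, parameter) must return that group's parameter value
    after one iteration, as a list with the same length as `parameter`.
    """
    parameter = list(initial_parameter)
    for _ in range(target_iterations):
        # S305/S306: one iteration per group, all from the same parameter.
        values = [newton_step(group, parameter) for group in groups]
        # S306/S308: component-wise sum becomes the next initial parameter.
        parameter = [sum(component) for component in zip(*values)]
    # S309: the sum after the final iteration is the coefficient.
    return parameter


def stub_step(group, parameter):
    """Toy stand-in for a real per-group Newton step."""
    return [parameter[0] + len(group)]


result = iterate_groups([[1, 2], [3]], [0], 2, stub_step)
```

Because the per-group calls within one iteration do not depend on each other, they are the natural place to introduce threads or processes, one per sample group.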
S310. When a display request for applications is detected, determine a feature value sequence of each application.
S311. Use each feature value sequence as the input of the preset CTR calculation algorithm, and obtain the output of the CTR calculation algorithm corresponding to each feature value sequence.
S312. Determine the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application corresponding to that feature value sequence.
After step S302 is performed and before step S303 is performed, the following operation may also be performed:
Store the second quantity of second sample data in a memory space with contiguous addresses.
In this embodiment of the present invention, the second quantity of second sample data is stored in a memory space with contiguous addresses, and arrays of head and tail pointers may be used to mark the start and end memory positions of each second sample data, which speeds up reading of the second sample data.
By implementing this embodiment of the present invention, the CTR of an application can be determined quickly with low resource consumption.
参见图4,图4是本发明实施例公开的一种点击到达率CTR的确定装置的结构示意图。如图4所示,该装置可以安装在服务器中。如图4所示,该装置可以包括第一确定单元401、第一获取单元402以及第二确定单元403,其中:Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a device for determining a click arrival rate CTR according to an embodiment of the present invention. As shown in Figure 4, the device can be installed in a server. As shown in FIG. 4, the apparatus may include a first determining unit 401, a first obtaining unit 402, and a second determining unit 403, where:
第一确定单元401用于在检测到针对应用的显示请求时,确定每个应用的特征值序列。The first determining unit 401 is configured to determine a sequence of feature values for each application when a display request for the application is detected.
本发明实施例中,针对应用的显示请求可以是由用户通过终端设备触发的,也可以是由终端设备主动触发的,且每个应用的特征值序列由用于描述应用信息(如应用的类别以及应用的相关描述信息等)的应用特征值序列、用于描述用户信息(如用户的性别及用户的兴趣等)的用户特征值序列以及用于描述用户行为信息(如时间、地点以及语言等)的流量特征值序列组成,且该特征值序列为由多个0和1作为分量的特征值向量。In the embodiment of the present invention, the display request for the application may be triggered by the user through the terminal device, or may be actively triggered by the terminal device, and the sequence of feature values of each application is used to describe the application information (such as the category of the application). And a sequence of application feature values of the related description information of the application, etc., a sequence of user feature values for describing user information (such as the gender of the user and the interest of the user, etc.) and information describing the user behavior (such as time, place, language, etc.) The sequence of traffic characteristic values is composed, and the sequence of feature values is a feature value vector composed of a plurality of 0s and 1s as components.
The first obtaining unit 402 is configured to feed each feature value sequence into a preset CTR calculation algorithm as its input, and to obtain the output of the CTR calculation algorithm corresponding to each feature value sequence.
In the embodiments of the present invention, the preset CTR calculation algorithm expresses the relationship between the feature value sequence of an application (the input) and the CTR of that application (the output): when the feature value sequence of each application is used as the input of the CTR calculation algorithm, the corresponding output is the CTR of that application. Thus, once the CTR calculation algorithm has been set in advance, the CTR of an application can be determined quickly without further relying on a large amount of historical sample data, and resource consumption is low.
The second determining unit 403 is configured to determine the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application to which that feature value sequence corresponds.
The CTR calculation algorithm is an algorithm based on a logistic regression model, and its calculation formula is:
Figure PCTCN2016094448-appb-000041
where y_CTR is the output of the calculation formula (the CTR of the application),
Figure PCTCN2016094448-appb-000042
is the input of the calculation formula (the feature value sequence of the application), and
Figure PCTCN2016094448-appb-000043
is the pre-computed coefficient of the calculation formula. In the embodiments of the present invention, once the following quantity in the calculation formula has been calculated from a certain amount of historical sample data,
Figure PCTCN2016094448-appb-000044
the CTR of an application can be determined quickly in any subsequent CTR determination as soon as the feature value sequence of that application is obtained.
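The exact formula is given only as an image in the source. As an illustration, the sketch below assumes the standard logistic regression form, in which the CTR is the logistic function of the inner product of the coefficient vector and the feature value sequence. The names `predict_ctr`, `weights`, and `features`, and the coefficient values, are hypothetical.

```python
import math


def predict_ctr(weights, features):
    """Standard logistic model (an assumption): CTR = 1 / (1 + exp(-w . x))."""
    z = sum(w * x for w, x in zip(weights, features))
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical pre-computed coefficients and two candidate applications.
weights = [0.4, -1.2, 0.3, 0.0]
candidates = {"app_a": [1, 0, 1, 0], "app_b": [0, 1, 0, 1]}
ctr_by_app = {name: predict_ctr(weights, x) for name, x in candidates.items()}
```

Once the coefficients are fixed, each display request costs one inner product and one exponential per application, which is consistent with the low resource consumption claimed above.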
On the basis of the device structure shown in FIG. 4, the device may further include a first reading unit 404, a first merging unit 405, a second obtaining unit 406, a first calculating unit 407, and a third determining unit 408. In this case the structure of the device may be as shown in FIG. 5, which is a schematic structural diagram of another device for determining a click-through rate (CTR) according to an embodiment of the present invention, where:
The first reading unit 404 is configured to read a first quantity of first sample data from pre-stored sample data.
In the embodiments of the present invention, each piece of sample data consists of a feature value sequence, a presentation identifier pv indicating whether the sample data was presented, and a click identifier click indicating whether the sample data was clicked. The feature value sequence consists of an application feature value sequence describing application information (such as the category of the application and its descriptive information), a user feature value sequence describing user information (such as the user's gender and interests), and a traffic feature value sequence describing user behavior information (such as time, location, and language). A pv of 1 with a click of 0 indicates that the sample data is presentation sample data, and a pv of 0 with a click of 1 indicates that the sample data is click sample data.
The first merging unit 405 is configured to merge the first sample data that share the same feature value sequence among the first quantity of first sample data, so as to obtain a second quantity of second sample data.
In the embodiments of the present invention, merging is keyed on identical feature value sequences: multiple pieces of first sample data with the same feature value sequence are merged into one piece of second sample data, so that the first quantity of first sample data is merged into a second quantity of second sample data, where the second quantity equals the number of distinct feature value sequences among the first quantity of first sample data. Each piece of second sample data consists of the feature value sequence of the first sample data that form it, the sum of the pv values of all the first sample data that form it, and the sum of the click values of all the first sample data that form it.
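The merging step can be sketched as a group-by on the feature value sequence. The tuple layout `(features, pv, click)` and the function name `merge_samples` are assumptions made for this example.

```python
from collections import defaultdict


def merge_samples(first_samples):
    """Merge records that share a feature value sequence, summing pv and click."""
    totals = defaultdict(lambda: [0, 0])   # feature tuple -> [pv sum, click sum]
    for features, pv, click in first_samples:
        key = tuple(features)              # lists are not hashable, tuples are
        totals[key][0] += pv
        totals[key][1] += click
    return [(list(key), pv, click) for key, (pv, click) in totals.items()]


# Four raw records: pv=1/click=0 is a presentation, pv=0/click=1 is a click.
first_samples = [
    ([1, 0, 1], 1, 0),
    ([1, 0, 1], 0, 1),
    ([0, 1, 1], 1, 0),
    ([1, 0, 1], 1, 0),
]
second_samples = merge_samples(first_samples)
```

Here four pieces of first sample data collapse into two pieces of second sample data, one per distinct feature value sequence, so later steps iterate over fewer records.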
The second obtaining unit 406 is configured to express each piece of second sample data as a likelihood expression and to multiply all the likelihood expressions together to obtain a product likelihood expression.
The first calculating unit 407 is configured to perform a target number of iterations using Newton's iteration method and an initial iteration parameter, and to calculate the value of the unknown parameter in the product likelihood expression at which the product likelihood expression reaches its maximum.
The third determining unit 408 is configured to determine the value of the unknown parameter as the above
Figure PCTCN2016094448-appb-000045
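The source does not spell out the likelihood expression in text, so the following is only a sketch under stated assumptions: each merged record with pv sum n and click sum c contributes a binomial factor p^c (1 - p)^(n - c), where p is the logistic function of the inner product of the coefficient vector and the feature value sequence, and the maximum is sought by running Newton's method for a fixed target number of iterations. All function names are hypothetical, and the small linear solver assumes the Hessian is non-singular (which fails, for example, for redundant one-hot blocks).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def newton_fit(merged, initial_w, target_iterations):
    """Maximize the assumed binomial log-likelihood with Newton steps."""
    w = list(initial_w)
    dim = len(w)
    for _ in range(target_iterations):
        grad = [0.0] * dim                             # gradient of log-likelihood
        hess = [[0.0] * dim for _ in range(dim)]       # negated Hessian
        for x, pv, click in merged:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(dim):
                grad[j] += (click - pv * p) * x[j]
                for k in range(dim):
                    hess[j][k] += pv * p * (1.0 - p) * x[j] * x[k]
        step = solve(hess, grad)
        w = [wj + sj for wj, sj in zip(w, step)]
    return w


# Hypothetical merged data: a bias component plus one binary feature.
merged = [([1, 0], 100, 10), ([1, 1], 100, 50)]
w = newton_fit(merged, [0.0, 0.0], 25)
```

With two parameters and two distinct feature value sequences the fitted model can reproduce the observed rates, so the first group is predicted at about 0.1 and the second at about 0.5.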
On the basis of the device structure shown in FIG. 4, the device may further include a second reading unit 409, a second merging unit 410, a storage unit 411, an equal-division unit 412, a third obtaining unit 413, a second calculating unit 414, and a fourth determining unit 415. In this case the structure of the device may be as shown in FIG. 6, which is a schematic structural diagram of yet another device for determining a click-through rate (CTR) according to an embodiment of the present invention, where:
The second reading unit 409 is configured to read a first quantity of first sample data from pre-stored sample data.
The second merging unit 410 is configured to merge the first sample data that share the same feature value sequence among the first quantity of first sample data, so as to obtain a second quantity of second sample data.
In the embodiments of the present invention, merging is keyed on identical feature value sequences: multiple pieces of first sample data with the same feature value sequence are merged into one piece of second sample data, so that the first quantity of first sample data is merged into a second quantity of second sample data, where the second quantity equals the number of distinct feature value sequences among the first quantity of first sample data. Each piece of second sample data consists of the feature value sequence of the first sample data that form it, the sum of the pv values of all the first sample data that form it, and the sum of the click values of all the first sample data that form it.
The storage unit 411 is configured to store the second quantity of second sample data in a memory space with contiguous addresses.
The equal-division unit 412 is configured to divide the second quantity of second sample data equally into sample groups, each containing a third quantity of second sample data.
In the embodiments of the present invention, the third quantity may be less than or equal to the number of CPU cores in the server. Dividing the second quantity of second sample data equally in this way allows the same processing to be applied to every group of the third quantity of second sample data at the same time, which speeds up processing.
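The equal division and the simultaneous per-group processing can be sketched as below. The per-group work shown here (summing clicks) is only a stand-in for the likelihood computation, and the names `split_into_groups` and `group_clicks` are hypothetical. A thread pool is used for brevity; in CPython a process pool would be needed for real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(second_samples, third_quantity):
    """Cut the list into consecutive groups of `third_quantity` records.

    If the length is not a multiple of `third_quantity`, the last group is shorter.
    """
    return [
        second_samples[i:i + third_quantity]
        for i in range(0, len(second_samples), third_quantity)
    ]


def group_clicks(group):
    """Stand-in for per-group processing: total clicks in one group."""
    return sum(click for _features, _pv, click in group)


second_samples = [([1, 0], 10, 1), ([0, 1], 20, 4), ([1, 1], 30, 6), ([0, 0], 40, 2)]
groups = split_into_groups(second_samples, 2)

# Apply the same function to every group concurrently; map() keeps group order.
with ThreadPoolExecutor(max_workers=len(groups)) as pool:
    per_group = list(pool.map(group_clicks, groups))
```

Each group is handled by its own worker and the per-group results come back in group order, ready to be combined in a later step.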
The third obtaining unit 413 is configured to express each piece of second sample data in each sample group as a likelihood expression, and to multiply all the likelihood expressions of each sample group together to obtain the product likelihood expression of that sample group.
The second calculating unit 414 is configured to perform one iteration using Newton's iteration method and an initial iteration parameter, calculating for each product likelihood expression a first value of the unknown parameter at which that product likelihood expression reaches its maximum; to take the sum of the first values as the initial iteration parameter for the next iteration; and to repeat the operation of performing one iteration using Newton's iteration method and the initial iteration parameter until the number of iterations reaches the target number.
The fourth determining unit 415 is configured to determine the sum of the second values of the unknown parameter in each product likelihood expression, calculated after the target number of iterations, as the above
Figure PCTCN2016094448-appb-000046
The embodiments of the present invention make it possible to determine the CTR of an application quickly and with low resource consumption.
To implement the above embodiments, the present invention further provides a server that includes the device for determining a click-through rate (CTR) of another embodiment of the present invention described with reference to FIG. 4, FIG. 5, and FIG. 6.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention, which is used to perform the method for determining a click-through rate (CTR) disclosed in the embodiments of the present invention. The server may include at least one processor 501 (for example, a CPU), at least one network interface 504 or other user interface 503, a memory 505, and at least one communication bus 502. The communication bus 502 is used to implement connection and communication between these components. The user interface 503 may optionally include a USB interface and other standard or wired interfaces. The network interface 504 may optionally include a Wi-Fi interface and other wireless interfaces. The memory 505 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The memory 505 may optionally include at least one storage device located remotely from the processor 501. As shown in FIG. 7, the memory 505, as a computer storage medium, may include an operating system 5051 and an application program 5052.
In some implementations, the memory 505 stores the following elements, executable modules or data structures, or a subset or an extended set thereof:
the operating system 5051, which contains various system programs for implementing basic services and handling hardware-based tasks;
the application program 5052, which contains various application programs, such as a program for setting data block parameters of target network data, a program for dividing target data blocks, a program for comparing target data blocks with data blocks in a database, and a program for deleting target data blocks, and which is used to implement various application services.
Specifically, the processor 501 is configured to call the program stored in the memory 505 and perform the following operations:
receiving client feature information collected by a client, where the client feature information is formed from information about application programs in the client operating system and information about the client user, and the information about an application program includes at least one of running information of the application program, behavior information of the application program, and context information related to the application program.
For the specific way in which the processor 501 performs the above steps, and for the further steps that the processor 501 performs by running the program, reference may be made to the description of the foregoing method embodiments; details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium having instructions stored therein; when a processor of a mobile terminal executes the instructions, the mobile terminal performs the above method for determining a click-through rate (CTR).
An embodiment of the present invention further provides a computer application program that, when executed on a processor of a mobile terminal, performs the above method for determining a click-through rate (CTR).
It should be noted that each of the above embodiments is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. In addition, those skilled in the art should understand that the embodiments described in the specification are all preferred embodiments, and the actions and units involved are not necessarily required by the present invention.
The method and device for determining a click-through rate (CTR) provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
The steps in the methods of the embodiments of the present invention may be reordered, combined, or deleted according to actual needs.
The units in the devices of the embodiments of the present invention may be combined, divided, or deleted according to actual needs.
The units in the embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include" and "comprise", and any variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in combination with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above implementations, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another implementation, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
In the present invention, unless otherwise expressly specified and limited, terms such as "installed", "coupled", "connected", and "fixed" should be understood broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be internal communication between two elements or an interaction between two elements, unless otherwise expressly limited. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", or the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (13)

  1. A method for determining a click-through rate (CTR), characterized in that the method comprises:
    when a display request for applications is detected, determining a feature value sequence for each application, the feature value sequence consisting of an application feature value sequence describing application information, a user feature value sequence describing user information, and a traffic feature value sequence;
    using each feature value sequence as the input of a preset CTR calculation algorithm, and obtaining the output of the CTR calculation algorithm corresponding to each feature value sequence;
    determining the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application to which that feature value sequence corresponds.
  2. The method according to claim 1, characterized in that the CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the algorithm based on the logistic regression model is:
    Figure PCTCN2016094448-appb-100001
    where y_CTR is the output of the calculation formula,
    Figure PCTCN2016094448-appb-100002
    is the input of the calculation formula, and
    Figure PCTCN2016094448-appb-100003
    is the pre-computed coefficient of the calculation formula.
  3. The method according to claim 1 or 2, characterized in that, before determining the feature value sequence of each application when the display request for applications is detected, the method further comprises:
    reading a first quantity of first sample data from pre-stored sample data, the sample data consisting of a feature value sequence, a presentation identifier indicating whether the sample data was presented, and a click identifier indicating whether the sample data was clicked;
    merging the first sample data that share the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, the second sample data consisting of the feature value sequence of the first sample data that form the second sample data, the sum of the presentation identifiers of the first sample data that form the second sample data, and the sum of the click identifiers of the first sample data that form the second sample data;
    expressing each piece of second sample data as a likelihood expression, and multiplying all the likelihood expressions together to obtain a product likelihood expression;
    performing a target number of iterations using Newton's iteration method and an initial iteration parameter, calculating the value of the unknown parameter in the product likelihood expression at which the product likelihood expression reaches its maximum, and determining the value of the unknown parameter as the
    Figure PCTCN2016094448-appb-100004
  4. The method according to claim 1 or 2, characterized in that, before determining the feature value sequence of each application when the display request for applications is detected, the method further comprises:
    reading a first quantity of first sample data from pre-stored sample data, the first sample data consisting of a feature value sequence, a presentation identifier indicating whether the sample data was presented, and a click identifier indicating whether the sample data was clicked;
    merging the first sample data that share the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, the second sample data consisting of the feature value sequence of the first sample data that form the second sample data, the sum of the presentation identifiers of the first sample data that form the second sample data, and the sum of the click identifiers of the first sample data that form the second sample data;
    dividing the second quantity of second sample data equally into sample groups, each containing a third quantity of second sample data;
    expressing each piece of second sample data in each sample group as a likelihood expression, and multiplying all the likelihood expressions of each sample group together to obtain the product likelihood expression of that sample group;
    performing one iteration using Newton's iteration method and an initial iteration parameter, calculating for each product likelihood expression a first value of the unknown parameter at which that product likelihood expression reaches its maximum, taking the sum of the first values as the initial iteration parameter for the next iteration, and repeating the operation of performing one iteration using Newton's iteration method and the initial iteration parameter until the number of iterations reaches a target number;
    determining the sum of the second values of the unknown parameter in each product likelihood expression, calculated after the target number of iterations, as the
    Figure PCTCN2016094448-appb-100005
  5. The method according to claim 4, characterized in that, after merging the first sample data that share the same feature value sequence among the first quantity of first sample data and before dividing the second quantity of second sample data equally into sample groups each containing a third quantity of second sample data, the method further comprises:
    storing the second quantity of second sample data in a memory space with contiguous addresses.
  6. A device for determining a click-through rate (CTR), characterized in that the device comprises a first determining unit, a first obtaining unit, and a second determining unit, wherein:
    the first determining unit is configured to determine a feature value sequence for each application when a display request for applications is detected, the feature value sequence consisting of an application feature value sequence describing application information, a user feature value sequence describing user information, and a traffic feature value sequence;
    the first obtaining unit is configured to use each feature value sequence as the input of a preset CTR calculation algorithm and to obtain the output of the CTR calculation algorithm corresponding to each feature value sequence;
    the second determining unit is configured to determine the output of the CTR calculation algorithm corresponding to each feature value sequence as the CTR of the application to which that feature value sequence corresponds.
  7. The device according to claim 6, wherein the CTR calculation algorithm is an algorithm based on a logistic regression model, and the calculation formula of the algorithm is:
    Figure PCTCN2016094448-appb-100006
    where yCTR is the output of the calculation formula,
    Figure PCTCN2016094448-appb-100007
    is the input of the calculation formula, and
    Figure PCTCN2016094448-appb-100008
    is a pre-calculated coefficient of the calculation formula.
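The formula of claim 7 survives in this text only as an image reference, so the following Python sketch assumes the standard logistic (sigmoid) form of a logistic regression model. The function name `predict_ctr` and the parameter names `features` and `coefficients` are hypothetical and do not come from the patent; the sketch only illustrates how a feature value sequence and a pre-calculated coefficient vector could be mapped to a value between 0 and 1.

```python
import math


def predict_ctr(features, coefficients):
    """Return an assumed logistic-regression output for one feature value sequence.

    `features` stands in for the formula's input and `coefficients` for its
    pre-calculated coefficients; both names are illustrative assumptions.
    """
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    # Linear score: inner product of coefficients and feature values.
    z = sum(w * x for w, x in zip(coefficients, features))
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Under this assumption, the feature value sequence passed in would be the concatenation of the application, user and traffic feature values described in claim 6, and the application with the largest returned value would be the one with the highest estimated CTR.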
  8. The device according to claim 6 or 7, wherein the device further comprises a first reading unit, a first merging unit, a second obtaining unit, a first calculating unit and a third determining unit, wherein:
    the first reading unit is configured to read a first quantity of first sample data from pre-stored sample data, each piece of sample data consisting of a feature value sequence, an impression identifier indicating whether the sample data was shown, and a click identifier indicating whether the sample data was clicked;
    the first merging unit is configured to merge the first sample data having the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, each piece of second sample data consisting of the feature value sequence of the first sample data from which it is formed, the sum of the impression identifiers of that first sample data, and the sum of the click identifiers of that first sample data;
    the second obtaining unit is configured to express each piece of second sample data as a likelihood expression and to multiply all the likelihood expressions to obtain a product likelihood expression;
    the first calculating unit is configured to perform a target number of iterations using Newton's method and an initial iteration parameter, and to calculate the value of the unknown parameter in the product likelihood expression at which the product likelihood expression takes its maximum value;
    the third determining unit is configured to determine the value of the unknown parameter as the
    Figure PCTCN2016094448-appb-100009
    .
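The merge-then-maximise procedure of claim 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each merged sample is assumed to contribute a binomial likelihood p^k (1 - p)^(n - k) with p given by a logistic model, and all function names (`merge_samples`, `newton_step`, `fit`), the zero starting point and the tiny `ridge` term that keeps the linear solve well defined are hypothetical additions that do not appear in the source.

```python
import math
from collections import defaultdict


def merge_samples(samples):
    """Merge (features, shown, clicked) rows that share a feature sequence."""
    totals = defaultdict(lambda: [0, 0])
    for features, shown, clicked in samples:
        key = tuple(features)
        totals[key][0] += shown    # sum of impression identifiers
        totals[key][1] += clicked  # sum of click identifiers
    return [(key, n, k) for key, (n, k) in totals.items()]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_step(merged, w, ridge=1e-8):
    """One Newton update of w on the log of the product likelihood."""
    d = len(w)
    grad = [0.0] * d
    # Negated Hessian; `ridge` is an illustrative safeguard, not from the source.
    neg_hess = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x, n, k in merged:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(d):
            grad[i] += (k - n * p) * x[i]
            for j in range(d):
                neg_hess[i][j] += n * p * (1.0 - p) * x[i] * x[j]
    step = solve(neg_hess, grad)
    return [wi + si for wi, si in zip(w, step)]


def fit(samples, dim, iterations=10):
    """Merge the samples, then run a fixed (target) number of Newton iterations."""
    merged = merge_samples(samples)
    w = [0.0] * dim  # initial iteration parameter (assumed to be zero here)
    for _ in range(iterations):
        w = newton_step(merged, w)
    return w
```

Merging first means the inner loop of each Newton iteration runs once per distinct feature value sequence rather than once per raw impression, which is the practical benefit the merging step appears to target.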
  9. The device according to claim 6 or 7, wherein the device further comprises a second reading unit, a second merging unit, a dividing unit, a third obtaining unit, a second calculating unit and a fourth determining unit, wherein:
    the second reading unit is configured to read a first quantity of first sample data from pre-stored sample data, each piece of first sample data consisting of a feature value sequence, an impression identifier indicating whether the sample data was shown, and a click identifier indicating whether the sample data was clicked;
    the second merging unit is configured to merge the first sample data having the same feature value sequence among the first quantity of first sample data to obtain a second quantity of second sample data, each piece of second sample data consisting of the feature value sequence of the first sample data from which it is formed, the sum of the impression identifiers of that first sample data, and the sum of the click identifiers of that first sample data;
    the dividing unit is configured to divide the second quantity of second sample data evenly into sample groups each including a third quantity of second sample data;
    the third obtaining unit is configured to express each piece of second sample data in each sample group as a likelihood expression, and to multiply all the likelihood expressions corresponding to each sample group to obtain the product likelihood expression of that sample group;
    the second calculating unit is configured to perform one iteration using Newton's method and an initial iteration parameter to calculate, for each product likelihood expression, a first value of the unknown parameter at which that product likelihood expression takes its maximum value; to take the sum of the first values as the initial iteration parameter of the next iteration; and to repeat the operation of performing one iteration using Newton's method and the initial iteration parameter until the number of iterations reaches a target number;
    the fourth determining unit is configured to determine the sum of the second values of the unknown parameter in each of the product likelihood expressions, calculated after the target number of iterations, as the
    Figure PCTCN2016094448-appb-100010
    .
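The grouped iteration of claim 9 can be sketched in Python under one loudly stated assumption: the claim says that per-group values are summed, and this sketch interprets the summed quantities as each group's gradient and negated Hessian contributions to the log-likelihood, which makes the combined update equal to a Newton step on the full data. That interpretation, the binomial logistic likelihood, the zero starting point and every name below (`split_into_groups`, `group_contribution`, `fit_in_groups`) are assumptions of this illustration and are not stated in the source; a literal reading in which parameter values themselves are summed is not what the code does.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def split_into_groups(merged, group_size):
    """Cut merged samples into consecutive groups of `group_size` items.

    The last group is smaller when the count is not an exact multiple.
    """
    return [merged[i:i + group_size] for i in range(0, len(merged), group_size)]


def group_contribution(group, w):
    """Gradient and negated Hessian of one group's log-likelihood at w."""
    d = len(w)
    grad = [0.0] * d
    neg_hess = [[0.0] * d for _ in range(d)]
    for x, n, k in group:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(d):
            grad[i] += (k - n * p) * x[i]
            for j in range(d):
                neg_hess[i][j] += n * p * (1.0 - p) * x[i] * x[j]
    return grad, neg_hess


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_in_groups(merged, dim, group_size, iterations=10):
    """Run a target number of iterations, summing per-group results each time."""
    groups = split_into_groups(merged, group_size)
    w = [0.0] * dim  # initial iteration parameter (assumed to be zero here)
    for _ in range(iterations):
        grad = [0.0] * dim
        neg_hess = [[0.0] * dim for _ in range(dim)]
        for group in groups:  # each group could be handled by its own worker
            g, h = group_contribution(group, w)
            for i in range(dim):
                grad[i] += g[i]
                for j in range(dim):
                    neg_hess[i][j] += h[i][j]
        step = solve(neg_hess, grad)
        w = [wi + si for wi, si in zip(w, step)]  # start of the next iteration
    return w
```

Because the per-group contributions only depend on the shared parameter and on that group's samples, the loop over groups is the part that could be distributed, with only the small summed gradient and matrix exchanged per iteration.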
  10. The device according to claim 9, wherein the device further comprises a storage unit, wherein:
    the storage unit is configured to store the second quantity of second sample data in a memory space with contiguous addresses.
  11. A server, comprising:
    a processor; and
    a memory for storing instructions executable by the processor,
    wherein the processor is configured to read the executable instructions stored in the memory and run a program corresponding to the instructions, so as to perform the method for determining a click-through rate (CTR) according to any one of claims 1 to 5.
  12. A computer-readable storage medium having instructions stored therein, wherein when a processor of a server executes the instructions, the server performs the method for determining a click-through rate (CTR) according to any one of claims 1 to 5.
  13. A computer application program which, when executed on a processor of a server, performs the method for determining a click-through rate (CTR) according to any one of claims 1 to 5.
PCT/CN2016/094448 2015-08-18 2016-08-10 Determining method and device for click through rate (ctr) WO2017028728A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510507737.7 2015-08-18
CN201510507737.7A CN105205098B (en) 2015-08-18 2015-08-18 Method and device for determining click arrival rate (CTR)

Publications (1)

Publication Number Publication Date
WO2017028728A1 (en) 2017-02-23

Family

ID=54952782

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/094448 WO2017028728A1 (en) 2015-08-18 2016-08-10 Determining method and device for click through rate (ctr)

Country Status (2)

Country Link
CN (1) CN105205098B (en)
WO (1) WO2017028728A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023226353A1 (en) * 2022-05-26 2023-11-30 上海二三四五网络科技有限公司 Method and apparatus for calculating ctr trend content based on click position factor improvement

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205098B (en) * 2015-08-18 2018-11-20 北京金山安全软件有限公司 Method and device for determining click arrival rate (CTR)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120023043A1 (en) * 2010-07-21 2012-01-26 Ozgur Cetin Estimating Probabilities of Events in Sponsored Search Using Adaptive Models
CN102346899A (en) * 2011-10-08 2012-02-08 亿赞普(北京)科技有限公司 Method and device for predicting advertisement click rate based on user behaviors
CN103514178A (en) * 2012-06-18 2014-01-15 阿里巴巴集团控股有限公司 Searching and sorting method and device based on click rate
CN103914468A (en) * 2012-12-31 2014-07-09 阿里巴巴集团控股有限公司 Method and device for searching for released information
CN105205098A (en) * 2015-08-18 2015-12-30 北京金山安全软件有限公司 Method and device for determining click arrival rate (CTR)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110119278A1 (en) * 2009-08-28 2011-05-19 Resonate Networks, Inc. Method and apparatus for delivering targeted content to website visitors to promote products and brands
CN103390032B (en) * 2013-07-04 2017-01-18 上海交通大学 Recommendation system and method based on relationship type cooperative topic regression
CN103745225A (en) * 2013-12-27 2014-04-23 北京集奥聚合网络技术有限公司 Method and system for training distributed CTR (Click To Rate) prediction model
CN103996088A (en) * 2014-06-10 2014-08-20 苏州工业职业技术学院 Advertisement click-through rate prediction method based on multi-dimensional feature combination logical regression

Also Published As

Publication number Publication date
CN105205098A (en) 2015-12-30
CN105205098B (en) 2018-11-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16836592

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 29.05.2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16836592

Country of ref document: EP

Kind code of ref document: A1