WO2016107455A1 - Information matching processing method and apparatus - Google Patents

Information matching processing method and apparatus

Info

Publication number
WO2016107455A1
Authority
WO
WIPO (PCT)
Prior art keywords
product information
search keyword
feature
click rate
matching
Prior art date
Application number
PCT/CN2015/098247
Other languages
French (fr)
Chinese (zh)
Inventor
王涛
黄鹏
林锋
Original Assignee
阿里巴巴集团控股有限公司
王涛
黄鹏
林锋
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 王涛, 黄鹏, 林锋
Publication of WO2016107455A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of data processing technologies, and in particular, to an information matching processing method and apparatus.
  • e-commerce websites have developed rapidly.
  • a large amount of data or products are usually stored.
  • the website server often recommends a product matching the search term to the user according to the search term input by the user.
  • Among the products recommended to the user, those that match the search term closely, are of good quality, and are promoted through advertising are often recommended first.
  • sellers often choose good quality products for advertising promotion.
  • When a seller runs an advertising promotion, the corresponding search keywords need to be purchased for the published product information. The higher the degree of matching between the product information published by the seller and the search keyword, the greater the probability that the product is found by users, and the more likely buyers are to find products that match their search terms, so that they can obtain useful information from the sea of information.
  • Accurately determining the degree of matching between the product information and the search term therefore not only improves the effectiveness of the seller's product promotion, but also reduces the client-server data interaction caused by buyers repeatedly searching for products, improving the user experience and the performance of the server at the same time.
  • The prior-art method for judging the degree of matching between product information and search terms usually calculates the relevance between the search term and the advertisement product, determines the degree of matching between the search term and the published product information from the relevance score, and recommends that the seller purchase search keywords with a high degree of matching.
  • However, this prior-art method considers only the relevance of the search term to the advertisement product and ignores the degree to which the advertisement product is preferred by users, so the matching degree it computes is not accurate. Inaccurate matching results not only prevent sellers from promoting their products effectively, but also cause the website to recommend products that do not fully match buyers' needs and interests. Buyers must search repeatedly to find the products they are really interested in, which increases the data interaction between the client and the server, increases the data processing load of the server, reduces the processing performance of the server, and heavily consumes valuable Internet bandwidth resources.
  • The present invention discloses an information matching processing method and apparatus, which can improve the objectivity and accuracy of information matching, improve the user experience, reduce the data processing load of the server, improve the processing performance of the server, and save valuable Internet bandwidth resources.
  • a product information matching processing method comprising:
  • a product information matching processing apparatus comprising:
  • An obtaining unit configured to obtain each search keyword and product information, and combine the search keywords and product information into a search keyword and a product information feature pair;
  • a correlation gear determining unit configured to calculate a correlation between each of the search keyword and the product information feature pair, and determine a correlation gear position of each of the search keyword and the product information feature pair according to the correlation calculation result;
  • an estimated click rate gear determining unit configured to calculate an estimated click rate of each of the search keyword and product information feature pairs, and to use the quantile to determine the estimated click rate gear corresponding to the estimated click rate of each of the search keyword and product information feature pairs;
  • a matching determining unit configured to determine, according to the correlation gear position and the estimated click rate gear position, a score of each of the search keyword and product information feature pairs, wherein the score is used to represent the degree of matching between the search keyword and the product information.
  • An advantageous effect achievable by one aspect of the embodiments of the present invention is that the method and apparatus consider not only the relevance of the search keyword to the product information but also the degree to which the product is preferred by the user. An estimated click rate factor that objectively reflects the degree to which the product is preferred by the user is introduced to calculate the estimated click rate; the click rate gear corresponding to the probability that the advertisement product is clicked by the user under the search keyword is determined according to a preset proportional rule (for example, a normal distribution rule); and the degree of matching between the search keyword and the product information is determined comprehensively from the correlation gear and the click rate gear, thereby obtaining a more accurate matching result.
  • FIG. 1 is a schematic flowchart of an information matching processing method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a standard normal distribution quantile table according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an estimated click rate gear distribution according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an information matching processing apparatus according to an embodiment of the present invention.
  • The invention discloses an information matching processing method and apparatus that consider not only the correlation between the search keyword and the product information but also the degree to which the product is favored by the user. An estimated click rate factor reflecting how much the product is favored by the user is introduced to calculate the estimated click rate; the click rate gear corresponding to the probability that the advertisement product is clicked by the user under the search keyword is determined according to the normal distribution rule; and the degree of matching between the search keyword and the product information is determined comprehensively from the correlation gear and the click rate gear, giving a more accurate matching result.
  • In a scenario where a seller needs to purchase a search keyword to promote an advertisement product, the method provided by the embodiment of the present invention can be applied to a website server to judge the degree of matching between a search keyword and the product information published by the seller, so as to recommend that the seller purchase search keywords with a high degree of matching. On the one hand, this improves the effectiveness of the seller's product promotion and raises the probability that the seller's product is clicked by buyers; on the other hand, it improves the efficiency with which buyers search for products, reduces the client-server data interaction caused by buyers repeatedly searching for products, improves the user experience, reduces the data processing load of the server, improves the processing performance of the server, and saves valuable Internet bandwidth resources.
  • FIG. 1 is a schematic flowchart diagram of an information matching processing method according to an embodiment of the present invention.
  • The seller's product information can be processed separately to obtain one or more words that describe it, and these are combined pairwise with the search keywords to form search keyword and product information feature pairs.
  • For example, the seller's product information includes MP3 player, iphone6, Note4, headphones, and so on, and the search keyword is "mobile phone".
  • The resulting search keyword and product information feature pairs then include (mobile phone, MP3 player), (mobile phone, iphone6), (mobile phone, Note4), and (mobile phone, headphones).
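  • The pairwise combination in the example above can be sketched as a Cartesian product. This is a minimal illustration only; the function name `build_feature_pairs` and the sample lists are assumptions and do not come from the patent text.

```python
from itertools import product


def build_feature_pairs(search_keywords, product_infos):
    """Pair every search keyword with every piece of product information.

    Hypothetical helper: the patent only states that keywords and product
    information are combined pairwise, not how this is implemented.
    """
    return list(product(search_keywords, product_infos))


# Sample data taken from the example in the text.
keywords = ["mobile phone"]
products = ["MP3 player", "iphone6", "Note4", "headphones"]

pairs = build_feature_pairs(keywords, products)
for keyword, info in pairs:
    print(f"({keyword}, {info})")
```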
  • the product information may specifically be advertisement product information.
  • The search keywords and product information may be preprocessed, and the preprocessing includes extracting the words and semantic features required for matching each feature.
  • the specific processing manner may be various and is not limited herein.
  • There is no required execution order between step S102 and step S103; the two may be executed in parallel or in reverse order.
  • The correlation calculation is mainly derived from the category relevance and the text relevance between the search keyword and the advertisement product.
  • The category relevance refers to the degree of matching between the click category of the search keyword and the category of the advertisement product. The text relevance covers several aspects, mainly the degree of matching between the core word of the search keyword and the core word of the advertisement product title, and the matching between the attribute words in the search keyword and the attributes in the description of the advertisement product. Combining category matching and text matching yields the relevance score.
  • The step S102 may include: performing a matching judgment on each feature of the search keyword and product information feature pair; and determining the correlation gear of the search keyword and product information feature pair according to the matching judgment result of each feature.
  • When performing the correlation calculation, the matching judgment performed on each feature of the search keyword and product information feature pair includes at least one of a category feature matching judgment and a text feature matching judgment.
  • The category feature matching judgment determines whether the search keyword and the product information belong to the same category.
  • The category feature matching judgment generally refers to a category judgment according to the meaning of the text. If the category of the search keyword is the same as the category in which the product information is published, the result of the category feature matching judgment is “Yes”; otherwise, the result is “No”. A special case in which the result is “No” is a search keyword that has no category; such a keyword is generally a pronounced long-tail keyword, that is, a search keyword that users rarely search for.
  • For example, if the search keyword is “mp3” and the published product is “audio player”, the two belong to the same category, and the result of the category feature matching judgment is “Yes”.
  • If the search keyword is “mp3” and the published product is “radio”, the two do not belong to the same category, and the result of the category feature matching judgment is “No”.
  • The text feature matching judgment determines whether the search keyword and the text content of the published product information are associated.
  • The text feature matching judgment of the present invention includes at least one of: an exact matching judgment, a partial matching judgment, a central word matching judgment, a central word complete matching judgment, a hidden word matching judgment, and a reverse preposition matching judgment.
  • the text feature matching judgment may further include a method of extracting the text feature vector and calculating the similarity of the text vector by using the cosine angle formula. The invention is not limited thereto.
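  • The vector-based text similarity mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the patent does not say how the text feature vector is extracted, so a simple bag-of-words count (the hypothetical `text_vector` helper) stands in for it.

```python
import math
from collections import Counter


def text_vector(text):
    """Bag-of-words term counts; an assumed stand-in for a real feature extractor."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-count vectors."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


sim = cosine_similarity(text_vector("mp3 player"),
                        text_vector("portable mp3 audio player"))
print(round(sim, 4))
```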
  • the correlation gears of the search keyword and the product information feature pair may be determined according to the matching judgment result of the features.
  • The correlation gear is divided into three grades: excellent, medium, and poor.
  • the step S103 may include: predetermining a scale factor corresponding to each gear position of the estimated click rate gear; determining a value of the quantile according to the proportional coefficient; and according to the search keyword and the product information feature The estimated click rate of the pair and the value of the quantile determine the gear range in which the estimated click rate is located.
  • the quantile is a normal distribution quantile.
  • the standard normal distribution quantile is first introduced.
  • The normal distribution, also known as the Gaussian distribution, has a bell-shaped density curve that is low at both ends and high in the middle, with a total area under the curve of 1. It is defined as follows: a random variable X obeys a probability distribution with location parameter μ and scale parameter σ, denoted $X \sim N(\mu, \sigma^2)$, with probability density function $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
  • When μ = 0 and σ = 1, f is the density of the standard normal distribution, which has 0 as the mean and 1 as the standard deviation and is denoted N(0,1).
  • The normal distribution quantile is used to characterize the rule governing the area under the normal distribution curve.
  • 68.268949% of the area under the function curve lies within one standard deviation (σ) of the mean.
  • 95.449974% of the area lies within two standard deviations (2σ) of the mean.
  • 99.730020% of the area lies within three standard deviations (3σ) of the mean.
  • 99.993666% of the area lies within four standard deviations (4σ) of the mean.
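  • The area percentages listed above can be checked numerically. A minimal sketch, using the standard identity that the probability mass of a normal distribution within k standard deviations of the mean equals erf(k/√2); the function name `area_within` is an assumption for illustration.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {area_within(k) * 100:.6f}%")
```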
  • the invention divides the gear position of the estimated click rate by applying the normal distribution law.
  • The estimated click rate (eCTR) is obtained by building a probability model from many historical exposures and clicks and using the model to predict whether a future exposure will produce a click. The final value is the probability that, under a given search term, a product is clicked by the user after being exposed; it is therefore a value between 0 and 1, and the larger the value, the more likely the product is to be clicked.
  • The eCTR is estimated using the logistic regression (LR) model that is standard in the industry, which involves feature extraction and model training.
  • Calculating the estimated click rate of each of the search keyword and product information feature pairs comprises: extracting features of the search keyword and product information feature pair, and obtaining the feature weight corresponding to each feature according to the training model; the estimated click rate is then calculated from the features and their corresponding feature weights.
  • The extracted features include one or any combination of the following: text information of the search keyword, category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance of the search keyword to the product information.
  • In this way the estimated click rate eCTR of the advertisement pair (Query, offer) can be estimated, where Query is the search keyword and offer is the product information.
  • The LR model belongs to the family of generalized linear models and is obtained by transforming a linear model through the logistic formula: $$y = \frac{1}{1 + \exp\left(-\sum_i w_i f_i\right)}$$ where $w_i$ is the feature weight, $f_i$ is the feature value, and $y$ is the final calculated estimated click rate. The formula limits the final result to (0, 1), which is consistent with a click probability.
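  • The logistic mapping from weighted features to a click probability can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name `estimate_ctr` and the sample feature values and weights are hypothetical, no intercept term is included because the text does not mention one, and trained weights are assumed to be available already.

```python
import math


def estimate_ctr(feature_values, feature_weights):
    """Logistic regression prediction: sigmoid of the weighted feature sum."""
    if len(feature_values) != len(feature_weights):
        raise ValueError("each feature value needs exactly one weight")
    z = sum(w * f for w, f in zip(feature_weights, feature_values))
    # Numerically stable sigmoid: avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical feature values and weights for one (Query, offer) pair.
features = [1.0, 0.0, 0.8]
weights = [-2.5, 0.7, 1.2]
ectr = estimate_ctr(features, weights)
print(f"estimated click rate: {ectr:.4f}")
```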
  • The estimated eCTR values should follow a Gaussian (normal) distribution.
  • The eCTR of each ad pair is graded using the keyword dimension and the global dimension.
  • The eCTR of each ad pair falls in some interval of the overall eCTR distribution, and that interval determines the estimated click rate gear of the ad pair.
  • The advertising products of most customers are rated at an average level, while the advertising products of a small number of customers are at a better or worse level.
  • For example, the estimated click rate gears are divided into good, medium, and poor, and the scale coefficients corresponding to the gears are 3:4:3; that is, advertising products in the good gear account for 30%, those in the medium gear account for 40%, and those in the poor gear account for 30%. The corresponding scores are 5 stars, 4 stars, and 3 stars.
  • FIG. 3 is a schematic diagram of the estimated click rate gear distribution, in which the abscissa is the estimated click rate value, the ordinate is the frequency, and the area under the curve corresponds to the probability (that is, the proportion value).
  • μ is the average, σ is the standard deviation, and $Z_\alpha$ is the normal distribution quantile, which is a numeric value.
  • a1 and a2 correspond to two quantiles of the standard normal distribution; using the proportion values in FIG. 3 they correspond to $Z_{\alpha_1}$ and $Z_{\alpha_2}$ respectively, and their values can be obtained by the above method.
  • Because the estimated click rate follows a general normal distribution (μ is not equal to 0 and σ is not equal to 1), the corresponding quantiles can be approximated using the rule for normal distribution quantiles. For the ratio 3:4:3, the quantiles of the general normal distribution are approximately $\mu - \sigma/2$ and $\mu + \sigma/2$, where μ is the average and σ is the standard deviation.
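  • The quantiles for a 3:4:3 split can be computed directly rather than read from a table. A minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `gear_quantiles` is an assumption. For the standard normal distribution the two cut points come out close to -0.52 and +0.52, which is consistent with approximating the gear boundaries by half a standard deviation on either side of the mean.

```python
from statistics import NormalDist


def gear_quantiles(proportions, mu=0.0, sigma=1.0):
    """Cut points that split N(mu, sigma) into consecutive parts with the
    given proportions, listed from the lowest gear to the highest."""
    total = sum(proportions)
    dist = NormalDist(mu, sigma)
    cuts = []
    cumulative = 0.0
    for share in proportions[:-1]:  # the last part needs no upper cut point
        cumulative += share / total
        cuts.append(dist.inv_cdf(cumulative))
    return cuts


z_low, z_high = gear_quantiles([3, 4, 3])
print(f"standard normal cut points: {z_low:.4f}, {z_high:.4f}")
```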
  • μ and σ can be calculated from actual data samples. Specifically, after the estimated click rate values are obtained, the average μ of all the estimated click rates and the corresponding standard deviation σ can be computed using existing methods. Then, based on μ and σ, the values of the general normal distribution quantiles are obtained according to formula (4).
  • The gear of an estimated click rate may then be determined from the estimated click rate and the values of the normal distribution quantiles. For example, using the standard normal distribution quantile table, when the estimated click rate falls in (0, μ-σ/2) the corresponding estimated click rate gear is poor; when it falls in (μ-σ/2, μ+σ/2) the gear is medium; and when it falls in [μ+σ/2, 1) the gear is good.
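  • The gear assignment described above can be sketched as follows. This is an illustration under stated assumptions: the function name `ctr_gear` and the sample eCTR values are hypothetical, the population standard deviation is used because the text does not say which estimator applies, and a value exactly on the lower boundary is assigned to the medium gear because the text leaves that boundary unassigned.

```python
from statistics import mean, pstdev


def ctr_gear(ectr, mu, sigma):
    """Map an estimated click rate to a gear using cut points at mu -/+ sigma/2."""
    low, high = mu - sigma / 2, mu + sigma / 2
    if ectr < low:
        return "poor"
    if ectr < high:
        return "medium"  # assumption: the lower boundary itself counts as medium
    return "good"


# Hypothetical estimated click rates for a set of (Query, offer) pairs.
ectrs = [0.010, 0.020, 0.030, 0.040, 0.050]
mu = mean(ectrs)
sigma = pstdev(ectrs)

gears = [ctr_gear(x, mu, sigma) for x in ectrs]
for x, gear in zip(ectrs, gears):
    print(f"eCTR {x:.3f} -> {gear}")
```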
  • S104: Determine, according to the correlation gear position and the estimated click rate gear position, a score of each of the search keyword and product information feature pairs, where the score is used to represent the degree of matching between the search keyword and the product information.
  • the specific calculation method of the score may be various, for example, a weighted average method is used to obtain a score or other implementation manner, which is not limited by the present invention.
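  • One possible reading of the weighted average approach is sketched below. Everything specific here is an assumption made for illustration: the patent does not fix the weights, the gear names used as dictionary keys, or the star values for correlation gears (the 5, 4, and 3 star values are only given for the click rate gears), and the function name `match_score` is hypothetical.

```python
# Assumed star values per gear; only the click rate values appear in the text.
CORRELATION_STARS = {"excellent": 5, "medium": 4, "poor": 3}
CLICK_RATE_STARS = {"good": 5, "medium": 4, "poor": 3}


def match_score(correlation_gear, click_rate_gear, correlation_weight=0.5):
    """Weighted average of the two gear ratings, as a star score."""
    if not 0.0 <= correlation_weight <= 1.0:
        raise ValueError("correlation_weight must be between 0 and 1")
    correlation_stars = CORRELATION_STARS[correlation_gear]
    click_rate_stars = CLICK_RATE_STARS[click_rate_gear]
    return (correlation_weight * correlation_stars
            + (1.0 - correlation_weight) * click_rate_stars)


print(match_score("excellent", "good"))
print(match_score("excellent", "poor"))
print(match_score("medium", "good", correlation_weight=0.75))
```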
  • Combining the correlation calculation with the estimated click rate to calculate the degree of matching between the search keyword and the advertisement product not only informs the seller of the advertisement quality and the degree of matching, but also objectively reflects the efficiency with which buyers search for products on the website; the higher this efficiency, the more effective the seller's product promotion.
  • Advertisers' optimization of their advertisements leads to an improvement in product quality. The direct result is that users' experience on the website becomes better and the data interaction between the client and the server decreases, which reduces the data processing load of the server, improves the processing performance of the server, and saves valuable Internet bandwidth resources.
  • FIG. 4 is a schematic diagram of a product information matching processing apparatus according to an embodiment of the present invention.
  • a product information matching processing device 400 comprising:
  • the obtaining unit 401 is configured to obtain each search keyword and product information, and combine the search keywords and product information into a search keyword and a product information feature pair.
  • the correlation gear determining unit 402 is configured to calculate a correlation between each of the search keyword and the product information feature pair, and determine a correlation gear position of each of the search keyword and the product information feature pair according to the correlation calculation result.
  • The estimated click rate gear determining unit 403 is configured to calculate an estimated click rate of each of the search keyword and product information feature pairs, and to use the quantile to determine the estimated click rate gear corresponding to the estimated click rate of each of the search keyword and product information feature pairs.
  • The matching determining unit 404 is configured to determine a score of each of the search keyword and product information feature pairs according to the correlation gear and the estimated click rate gear, the score being used to represent the degree of matching between the search keyword and the product information.
  • The estimated click rate gear determining unit includes an estimated click rate calculating subunit and a gear determining subunit, wherein the estimated click rate calculating subunit comprises:
  • a model establishing subunit configured to perform feature extraction on the search keyword and product information feature pair and to obtain, according to the training model, the feature weight corresponding to each feature;
  • the calculating subunit is configured to calculate the estimated click rate by using the extracted feature and the feature weight corresponding to the feature.
  • the feature extracted by the model establishing subunit includes one or any combination of the following: text information of the search keyword, category information of the search keyword, title of the product information, The attribute of the product information, the relevance of the search keyword to the product information.
  • The estimated click rate gear determining unit includes an estimated click rate calculating subunit and a gear determining subunit, wherein the gear determining subunit includes:
  • a proportional coefficient determining subunit configured to predetermine a scale factor corresponding to each gear position of the estimated click rate gear position
  • a gear interval determining subunit configured to determine, according to the estimated click rate of each of the search keyword and the product information feature pair and the value of the quantile, the gear range in which the estimated click rate is located.
  • the quantile is a normal distribution quantile.
  • the correlation gear determining unit includes:
  • a feature matching subunit configured to perform a matching judgment on each feature of the search keyword and product information feature pair;
  • a determining subunit configured to determine the correlation gear of the search keyword and product information feature pair according to the matching judgment result of each feature.
  • The matching judgment of each feature performed by the feature matching subunit includes at least one of a category feature matching judgment and a text feature matching judgment;
  • the category feature matching judgment determines whether the search keyword and the product information belong to the same category;
  • the text feature matching judgment determines whether the search keyword and the text content of the product information are associated.
  • the invention may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.

Abstract

Disclosed is an information matching processing method. The method comprises: acquiring various search key words and various pieces of product information, and search key word and product information feature pairs being constituted of the various search key words and the various pieces of product information in pairs (S101); calculating the correlation of each of the search key word and product information feature pairs, and determining a correlation level of each of the search key word and product information feature pairs according to a correlation calculation result (S102); calculating a pre-estimated click rate of each of the search key word and product information feature pairs, and determining a pre-estimated click rate level corresponding to the pre-estimated click rate of each of the search key word and product information feature pairs by using a quantile (S103); and according to the correlation level and the pre-estimated click rate level, determining a score of each of the search key word and product information feature pairs, the score being used for characterizing a matching degree between the search key word and the product information (S104).

Description

Information matching processing method and apparatus
Technical field
The present invention relates to the field of data processing technologies, and in particular to an information matching processing method and apparatus.
Background
With the development of computer and Internet technologies, e-commerce websites have developed rapidly. An e-commerce website usually stores massive amounts of data or products. To improve the efficiency with which users search for products of interest, the website server often recommends products that match the search term entered by the user. Among the recommended products that match the search term, those with a high degree of matching, good quality, and advertising promotion are often recommended to the user first. To increase sales, sellers often choose good-quality products for advertising promotion. When running an advertising promotion, the seller needs to purchase corresponding search keywords for the published product information. The higher the degree of matching between the product information published by the seller and the search keywords, the greater the probability that the product is found by users, and the more likely buyers are to find products that match their search terms, so that they can obtain useful information from the sea of information.
Therefore, accurately determining the degree of matching between the product information and the search term not only improves the effectiveness of the seller's product promotion, but also reduces the client-server data interaction caused by buyers repeatedly searching for products, improving the user experience and the performance of the server at the same time.
The prior-art method for judging the degree of matching between product information and search terms usually calculates the relevance between the search term and the advertisement product, determines the degree of matching between the search term and the published product information from the relevance score, and recommends that the seller purchase search keywords with a high degree of matching.
However, this prior-art method considers only the relevance of the search term to the advertisement product and ignores the degree to which the advertisement product is preferred by users, so the matching degree it computes is not accurate. Inaccurate matching results not only prevent sellers from promoting their products effectively, but also cause the website to recommend products that do not fully match buyers' needs and interests. Buyers must search repeatedly to find the products they are really interested in, which increases the data interaction between the client and the server, increases the data processing load of the server, reduces the processing performance of the server, and heavily consumes valuable Internet bandwidth resources.
Summary of the invention
To solve the above technical problem, the present invention discloses an information matching processing method and apparatus, which can improve the objectivity and accuracy of information matching, improve the user experience, reduce the data processing load of the server, improve the processing performance of the server, and save valuable Internet bandwidth resources.
The technical solutions are as follows:
According to a first aspect of the embodiments of the present invention, a product information matching processing method is disclosed, the method comprising:
acquiring search keywords and pieces of product information, and combining the search keywords and the pieces of product information pairwise into search keyword and product information feature pairs;
calculating the correlation of each of the search keyword and product information feature pairs, and determining the correlation gear of each of the search keyword and product information feature pairs according to the correlation calculation result;
calculating the estimated click rate of each of the search keyword and product information feature pairs, and using the quantile to determine the estimated click rate gear corresponding to the estimated click rate of each of the search keyword and product information feature pairs;
determining, according to the correlation gear and the estimated click rate gear, a score of each of the search keyword and product information feature pairs, the score being used to represent the degree of matching between the search keyword and the product information.
According to a second aspect of the embodiments of the present invention, a product information matching processing apparatus is disclosed, the apparatus comprising:
an obtaining unit, configured to obtain search keywords and product information, and to combine each search keyword with each item of product information to form search keyword and product information feature pairs;
a relevance tier determining unit, configured to calculate the relevance of each search keyword and product information feature pair, and to determine a relevance tier for each pair according to the relevance calculation result;
an estimated click-through rate tier determining unit, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate tier corresponding to the estimated click-through rate of each pair;
a matching determining unit, configured to determine a score for each search keyword and product information feature pair according to the relevance tier and the estimated click-through rate tier, the score representing the degree of matching between the search keyword and the product information.
One aspect of the embodiments of the present invention achieves the following beneficial effects. When determining the degree of matching between a search keyword and product information, the method and apparatus provided by the present invention consider not only the relevance between the search keyword and the product information but also the degree to which users prefer the product. An estimated click-through rate factor, which objectively reflects that preference, is introduced to calculate the estimated click-through rate. The click-through rate tier corresponding to the probability that the advertised product is clicked under the search keyword is then determined according to a preset proportional rule (for example, the normal distribution), and the degree of matching is determined from the relevance tier and the click-through rate tier together, yielding a more accurate matching result. This not only makes sellers' product promotion more effective but also reduces the client-server data interaction caused by buyers searching repeatedly, which improves the user experience, reduces the server's data processing load, improves server processing performance, and saves valuable Internet bandwidth.
DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an information matching processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a standard normal distribution quantile table according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the distribution of estimated click-through rate tiers according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an information matching processing apparatus according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention discloses an information matching processing method and apparatus that consider not only the relevance between a search keyword and product information but also the degree to which users prefer the product. An estimated click-through rate factor that reflects this preference is introduced to calculate the estimated click-through rate, and the click-through rate tier corresponding to the probability that the advertised product is clicked under the search keyword is determined according to the normal distribution. The degree of matching between the search keyword and the product information is then determined from the relevance tier and the click-through rate tier together, yielding a more accurate matching result.
In one application scenario, sellers on an e-commerce website purchase search keywords to promote their advertised products. The method provided by the embodiments of the present invention can be applied on the website server to judge how well a search keyword matches the product information published by a seller, so that well-matched search keywords can be recommended to the seller for purchase. This makes the seller's promotion more effective and raises the probability that buyers click the seller's products. It also makes buyers' product searches more efficient and reduces the client-server data interaction caused by repeated searching, which improves the user experience, reduces the server's data processing load, improves server processing performance, and saves valuable Internet bandwidth.
FIG. 1 is a schematic flowchart of an information matching processing method according to an embodiment of the present invention.
S101. Obtain search keywords and product information, and combine each search keyword with each item of product information to form search keyword and product information feature pairs.
A seller usually offers a variety of products, which may belong to different categories. The seller's product information can therefore be processed item by item to obtain one or more terms that describe each product, and each term is paired with each search keyword to form a search keyword and product information feature pair. For example, suppose the seller's product information includes MP3 player, iphone6, Note4, and headset, and the search keyword is mobile phone. The resulting feature pairs are (mobile phone, MP3 player), (mobile phone, iphone6), (mobile phone, Note4), and (mobile phone, headset). This example is illustrative only and does not limit the present invention. The product information may specifically be advertised product information.
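The pairing in step S101 can be sketched as a Cartesian product of keywords and product terms. This is a minimal illustration only: the function name `build_feature_pairs` and the use of plain strings for keywords and product terms are assumptions, not part of the disclosed method.

```python
from itertools import product


def build_feature_pairs(keywords, product_terms):
    """Pair every search keyword with every product term (hypothetical helper)."""
    return list(product(keywords, product_terms))


# Example data taken from the description above.
pairs = build_feature_pairs(
    ["mobile phone"],
    ["MP3 player", "iphone6", "Note4", "headset"],
)
for keyword, term in pairs:
    print(f"({keyword}, {term})")
```

With several keywords, the same call produces every keyword and product combination, which is what "combining each keyword with each item" implies.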
Note that, before steps S102 and S103 are performed, the search keywords and product information may be preprocessed. The preprocessing includes extracting the semantic features needed for the feature matching. The specific processing can take many forms and is not limited here.
In addition, steps S102 and S103 need not be performed in any particular order; they may be performed in parallel or in reverse order.
S102. Calculate the relevance of each search keyword and product information feature pair, and determine a relevance tier for each pair according to the relevance calculation result.
The relevance is computed mainly from the category relevance and the text relevance between the search keyword and the advertised product. Category relevance is the degree to which the categories clicked for the search keyword match the category of the advertised product. Text relevance has several aspects, chiefly the degree to which the core words of the search keyword match the core words of the advertised product's title, and the degree to which attributes appearing in the search keyword match attributes in the advertised product's description. Combining category matching and text matching yields the relevance score.
In a specific implementation, step S102 may include: performing feature matching judgments on the search keyword and product information feature pair; and determining the relevance tier of the pair according to the results of those judgments.
In a specific implementation, the feature matching judgments performed during the relevance calculation include at least one of a category feature matching judgment and a text feature matching judgment.
Further, the category feature matching judgment determines whether the search keyword and the product information belong to the same category. In one specific implementation of the present invention, the category judgment is generally made according to the meaning of the text. If the category of the search keyword is the same as the category of the published product information, the result of the category feature matching judgment is "yes"; otherwise it is "no". A special case of "no" is a search keyword that has no category. Such keywords are usually long-tail keywords, that is, keywords that users rarely search for. For example, if the search keyword is "mp3" and the published product is "audio player", the two belong to the same category and the result is "yes". If the search keyword is "mp3" and the published product is "radio", the two do not belong to the same category and the result is "no".
Further, the text feature matching judgment determines whether the search keyword is associated with the text content of the published product information. Specifically, the text feature matching judgment of the present invention includes at least one of: an exact matching judgment, a partial matching judgment, a central word matching judgment, a central word exact matching judgment, a hidden word matching judgment, and a reverse preposition matching judgment. The text feature matching judgment may also include extracting text feature vectors and calculating the similarity of the text vectors with the cosine formula. The present invention does not limit this.
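The cosine-based alternative mentioned above can be sketched as follows. The description does not say how the text feature vectors are built, so whitespace tokenization and raw term counts are assumptions made purely for illustration, as is the function name `cosine_similarity`.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors.

    Assumption: features are lowercase whitespace tokens weighted by raw counts.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


similarity = cosine_similarity("mp3 player", "portable mp3 audio player")
```

A value near 1 indicates that the keyword and the product text share most of their terms; a value of 0 indicates no shared terms. A production system would more likely use weighted or learned features.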
After the feature matching judgments have been performed on a search keyword and product information feature pair, the relevance tier of the pair can be determined from the results. In the present invention, relevance is divided into three tiers: excellent, good, and poor.
Table 1 gives one illustrative division of relevance tiers. Other divisions may also be used and are not limited here.
Table 1
Figure PCTCN2015098247-appb-000001
S103. Calculate the estimated click-through rate of each search keyword and product information feature pair, and use quantiles to determine the estimated click-through rate tier corresponding to the estimated click-through rate of each pair.
In a specific implementation, step S103 may include: predetermining the proportion assigned to each estimated click-through rate tier; determining the quantile values from those proportions; and determining the tier interval into which each pair's estimated click-through rate falls from the estimated click-through rate and the quantile values.
Preferably, the quantiles are normal distribution quantiles.
This is described in detail below with an example.
Standard normal distribution quantiles are introduced first. The normal distribution, also called the Gaussian distribution, has a bell-shaped probability density curve that is low at both ends and high in the middle, with a total area of 1 under the curve. The standard normal distribution is the normal distribution with mean 0 and standard deviation 1, denoted N(0,1). In general, a random variable X that follows a normal distribution with location parameter μ and scale parameter σ is written:
X~N(μ,σ²)      (1)
Its probability density function is
f(x) = 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²))      (2)
When μ = 0 and σ = 1, X is said to follow the standard normal distribution, with mean 0 and standard deviation 1.
Normal distribution quantiles characterize the area under the normal curve. The upper α quantile of the standard normal distribution is defined as follows: let X~N(0,1); for any α with 0<α<1, the point Zα that satisfies P(X>Zα)=α is the upper α quantile of the standard normal distribution. For example, in the normal distribution table shown in FIG. 2, Zα=1 gives α=0.158655.
The commonly used quantiles of the normal distribution follow these rules:
68.268949% of the area under the curve lies within one standard deviation (σ) of the mean.
95.449974% of the area lies within two standard deviations (2σ) of the mean.
99.730020% of the area lies within three standard deviations (3σ) of the mean.
99.993666% of the area lies within four standard deviations (4σ) of the mean.
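The coverage figures above, and the upper-tail example for a quantile of 1, can be checked numerically with the `NormalDist` class in Python's standard `statistics` module. This is only a verification sketch; the helper names `coverage_within` and `upper_tail` are introduced here for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


def upper_tail(z):
    """Alpha such that z is the upper-alpha quantile, i.e. P(X > z)."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {coverage_within(k):.6%}")
print(f"P(X > 1) = {upper_tail(1.0):.6f}")
```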
The present invention applies the normal distribution to divide estimated click-through rates into tiers.
The estimated click-through rate (eCTR) is obtained by building a probabilistic model from many historical exposures and clicks and using that model to predict whether a future exposure will produce a click. The resulting value is the probability that a given product, once exposed under a given keyword, is clicked by a user. It is therefore a value between 0 and 1, and a larger value indicates a higher likelihood of a click.
The eCTR is estimated with the industry-standard logistic regression (LR) model, which involves two parts: feature extraction and model training. Calculating the estimated click-through rate of each search keyword and product information feature pair includes: extracting features from the pair; obtaining the feature weight of each feature from the trained model; and calculating the estimated click-through rate from the extracted features and their weights.
The extracted features include one or any combination of the following: the text of the search keyword, the category of the search keyword, the title of the product information, the attributes of the product information, and the relevance between the search keyword and the product information.
Once the feature weights have been obtained through model training, the eCTR of an advertisement pair (Query, offer) can be estimated, where Query is the search keyword and offer is the product information.
The LR model is a generalized linear model, obtained by passing a linear model through the logistic function. Its expression is:
y = 1/(1 + exp(−Σi wi·fi))      (3)
Here wi is a feature weight, fi is a feature value, and y is the final estimated click-through rate. The formula confines the result to the interval (0,1), which matches its interpretation as a click probability.
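The eCTR calculation can be sketched as below, assuming the standard logistic form y = 1/(1 + exp(−Σ wi·fi)) with no separate bias term. The weights and feature values in the example are invented for illustration; in practice the weights would come from model training, and the function name `estimate_ctr` is hypothetical.

```python
import math


def estimate_ctr(weights, features):
    """Logistic regression score for one (Query, offer) pair.

    Assumption: the linear part is a plain weighted sum of feature values.
    """
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    z = sum(w * f for w, f in zip(weights, features))
    # Branch on the sign of z so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Illustrative values only: three binary features with made-up weights.
example_ectr = estimate_ctr([1.2, -0.7, 0.3], [1.0, 1.0, 0.0])
```

The two branches are algebraically identical; the split only avoids floating-point overflow for strongly negative scores.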
In theory, accurately estimated eCTR values should follow a Gaussian (normal) distribution. The eCTR of advertisement pairs is divided into tiers along the keyword dimension and the global dimension. The eCTR of each advertisement pair necessarily falls into some interval of the overall eCTR distribution, and that interval determines the pair's estimated click-through rate tier. The tier division provided by the present invention ensures that the scores of most customers' advertised products sit at an average level, while a small share of customers' advertised products sit at a better or worse level.
In the embodiment of the present invention, based on actual business analysis and experience, the estimated click-through rate is divided into three tiers: good, medium, and poor, in the ratio 3:4:3. That is, 30% of advertised products fall in the good tier, 40% in the medium tier, and 30% in the poor tier, corresponding to scores of 5 stars, 4 stars, and 3 stars respectively. FIG. 3 shows this tier division. The horizontal axis is the estimated click-through rate, the vertical axis is the frequency, and the area under the curve corresponds to the probability (that is, the proportion).
In a specific implementation, when the eCTR distribution along the global or keyword dimension is divided in the ratio 3:4:3, the area under the curve within a certain range around the mean must be 0.4 and, by symmetry, the area on each side must be 0.3. The rules for common normal distribution quantiles then give:
Figure PCTCN2015098247-appb-000004
Here μ is the mean, σ is the standard deviation, and Za is the normal distribution quantile.
In other words, once the proportion for each estimated click-through rate tier has been determined, the values of the normal distribution quantiles can be determined from those proportions.
Assume that FIG. 3 follows the standard normal distribution, that is, X~N(0,1). For any α with 0<α<1, the point Zα that satisfies P(X>Zα)=α is the upper α quantile of the standard normal distribution, and Z(1-α) is the corresponding lower α quantile.
Zα is a number: when X~N(0,1), P(X>Zα)=α. Given α, the corresponding Zα is found in the normal distribution table. For example, to find Z0.025, look up the Z value for 1-0.025=0.975. The normal distribution table in FIG. 2 gives a Z value of 1.96 for 0.9750, so Z0.025=1.96. Conversely, to find the α that corresponds to Zα=1.96, look up 1.96 in the table, which gives 0.975; then α=1-0.975=0.025.
As FIG. 3 shows, a1 and a2 are two quantiles of the standard normal distribution. Using the proportions marked in FIG. 3, they correspond to Zα1 and Zα2 respectively, whose values can be obtained with the method above. Under the standard normal distribution, Zα1 corresponds to the upper α quantile and Zα2 corresponds to the lower α quantile.
In a specific implementation, when the estimated click-through rate tiers are divided in the ratio 3:4:3, the area under the curve within a certain range around the mean is 0.4 and, by symmetry, the area on each of the left and right sides is 0.3. In the standard normal distribution diagram of FIG. 3, the area under the curve to the right of quantile a2 is therefore 0.3, so a2 is Z0.3, which is found by looking up the Z value for 1-0.3=0.7. The normal distribution quantile table in FIG. 2 gives a Z value of 0.52 for 0.7, so Z0.3=0.52, that is, a2=0.52. Similarly, a1=-0.52. a1 and a2 are the two quantiles that correspond to this ratio under the normal distribution. The values of Zα1 and Zα2 can also be calculated from formula (4). Because FIG. 3 follows the standard normal distribution, X~N(0,1), so μ=0 and σ=1, and formula (4) gives the approximate values Za=±0.5, that is, a1=-0.5 and a2=0.5 in FIG. 3.
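The table lookup described above can be reproduced with the inverse cumulative distribution function. The sketch below derives the two tier boundaries from a good:medium:poor ratio; `tier_cutoffs` and its parameters are illustrative names, and the use of `statistics.NormalDist.inv_cdf` in place of a printed table is an assumption about how one might implement the lookup.

```python
from statistics import NormalDist


def tier_cutoffs(ratio=(3, 4, 3), mu=0.0, sigma=1.0):
    """Return (a1, a2), the boundaries separating poor/medium and medium/good.

    `ratio` is given as (good, medium, poor), matching the 3:4:3 example.
    """
    good, medium, poor = ratio
    total = good + medium + poor
    dist = NormalDist(mu, sigma)
    a1 = dist.inv_cdf(poor / total)             # area to the left of a1 is the poor share
    a2 = dist.inv_cdf((poor + medium) / total)  # area to the right of a2 is the good share
    return a1, a2


a1, a2 = tier_cutoffs()
print(f"a1 = {a1:.4f}, a2 = {a2:.4f}")
```

For the 3:4:3 split the exact boundaries are about ±0.5244, which the table lookup rounds to ±0.52 and the simplified rule rounds further to ±0.5.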
The values of the estimated click-through rate follow a general normal distribution. For a general normal distribution (μ not equal to 0, σ not equal to 1), the corresponding quantiles can be approximated from the normal distribution quantile rules, which gives the following formula for the quantiles of the 3:4:3 ratio under a general normal distribution:
Figure PCTCN2015098247-appb-000005
Here μ is the mean and σ is the standard deviation, both of which can be calculated from actual data samples. Specifically, once the estimated click-through rate values have been obtained, the mean μ of all estimated click-through rates and the corresponding standard deviation σ can be computed with existing methods. The values of the general normal distribution quantiles are then obtained from μ and σ according to formula (4).
After the values of the general normal distribution quantiles have been determined, the tier interval into which an estimated click-through rate falls can be determined by comparing it with those values. For example, with quantiles derived from the standard normal distribution quantile table: an estimated click-through rate in (0, μ-σ/2] belongs to the poor tier; one in (μ-σ/2, μ+σ/2) belongs to the medium tier; and one in [μ+σ/2, 1) belongs to the good tier.
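The interval rule above can be sketched as a small classifier. The function name `ctr_tier`, the sample eCTR values, and the choice of the population standard deviation (`pstdev`) are assumptions made for illustration; the description only says that μ and σ are computed from actual data with existing methods.

```python
from statistics import mean, pstdev


def ctr_tier(ectr, mu, sigma):
    """Map an eCTR to 'poor', 'medium', or 'good' using the mu +/- sigma/2 boundaries."""
    low, high = mu - sigma / 2, mu + sigma / 2
    if ectr <= low:
        return "poor"    # (0, mu - sigma/2]
    if ectr < high:
        return "medium"  # (mu - sigma/2, mu + sigma/2)
    return "good"        # [mu + sigma/2, 1)


# Made-up eCTR sample, used only to exercise the function.
ectrs = [0.01, 0.02, 0.03, 0.04, 0.05]
mu, sigma = mean(ectrs), pstdev(ectrs)
tiers = [ctr_tier(x, mu, sigma) for x in ectrs]
```

Note that a five-point sample is far too small to resemble a normal distribution, so the 3:4:3 proportions are not expected to appear in this toy output.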
Note that the description above uses the ratio 3:4:3 as an example. When a different ratio is chosen, the calculation can follow the same approach.
S104. Determine a score for each search keyword and product information feature pair according to the relevance tier and the estimated click-through rate tier, the score representing the degree of matching between the search keyword and the product information.
In a specific implementation, the score can be calculated in many ways, for example as a weighted average, or in other ways. The present invention does not limit this.
Table 2 shows one implementation of a star rating.
Table 2
Figure PCTCN2015098247-appb-000006
Based on actual business analysis, advertisement pairs with excellent relevance may be divided into good, medium, and poor tiers in the ratio 3:4:3, corresponding to 5 stars, 4 stars, and 3 stars respectively. Advertisement pairs with good relevance are divided into two tiers in the ratio 1:1, corresponding to 2 stars and 1 star. The division for pairs with excellent relevance is shown in Table 2. Because pairs with good relevance have only two tiers, the division is simpler and the mean of the distribution can serve as the boundary: a pair above the mean receives 2 stars, and a pair below the mean receives 1 star.
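The star assignment described above can be sketched as a lookup that combines the two tiers. Several details are assumptions: the function name `star_rating`, the string labels, returning `None` for pairs with poor relevance (their score is not stated here), and awarding 1 star when the eCTR exactly equals the mean.

```python
# Stars for pairs with excellent relevance, keyed by eCTR tier (from the text above).
EXCELLENT_STARS = {"good": 5, "medium": 4, "poor": 3}


def star_rating(relevance_tier, ctr_tier, ectr, mean_ectr):
    """Combine the relevance tier and the eCTR information into a star score."""
    if relevance_tier == "excellent":
        return EXCELLENT_STARS[ctr_tier]
    if relevance_tier == "good":
        # Two tiers split 1:1 at the mean; the tie case is an assumption.
        return 2 if ectr > mean_ectr else 1
    return None  # poor relevance: no score is specified in this passage


print(star_rating("excellent", "medium", ectr=0.03, mean_ectr=0.03))
print(star_rating("good", "medium", ectr=0.04, mean_ectr=0.03))
```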
In the embodiment of the present invention, the relevance calculation and the estimated click-through rate are combined to compute the degree of matching between a search keyword and an advertised product. The result not only tells the seller about the quality of the advertisement and how well it matches, but also objectively reflects the probability that a buyer searching the website will click the advertised product. The higher the star rating, the higher the ranking, the more likely a buyer is to click, and the more exposure and feedback the advertisement receives. This increases the advertiser's return on investment and makes the seller's promotion more effective. For buyers, advertisers' optimization of their advertisements improves product quality, so the experience on the website improves and the data interaction between the client and the server decreases, which reduces the server's data processing load, improves server processing performance, and saves valuable Internet bandwidth.
FIG. 4 is a schematic diagram of a product information matching processing apparatus according to an embodiment of the present invention.
A product information matching processing apparatus 400 includes:
an obtaining unit 401, configured to obtain search keywords and product information, and to combine the search keywords and the product information pairwise into search keyword and product information feature pairs;
a relevance tier determining unit 402, configured to calculate the relevance of each search keyword and product information feature pair, and to determine the relevance tier of each pair according to the relevance calculation result;
an estimated click-through rate tier determining unit 403, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate tier corresponding to the estimated click-through rate of each pair;
a matching determining unit 404, configured to determine a score for each search keyword and product information feature pair according to the relevance tier and the estimated click-through rate tier, the score representing the degree of matching between the search keyword and the product information.
Further, the estimated click-through rate tier determining unit includes an estimated click-through rate calculating subunit and a tier determining subunit, where the estimated click-through rate calculating subunit includes:
a model establishing subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature according to a training model;
a calculating subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to the features.
Further, the features extracted by the model establishing subunit include one or any combination of the following: text information of the search keyword, category information of the search keyword, the title of the product information, attributes of the product information, and the relevance between the search keyword and the product information.
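As an illustration of computing an estimated click-through rate from extracted features and their trained weights, the sketch below assumes a logistic (sigmoid) model. The passage above does not name the model type, so the logistic form, the function name `estimate_ctr`, the `bias` term, and the feature names are all assumptions made for the example.

```python
import math


def estimate_ctr(features, weights, bias=0.0):
    """Estimate a click-through rate from feature values and trained weights.

    features -- mapping of feature name to numeric value for one
                search keyword and product information feature pair
    weights  -- mapping of feature name to trained weight; features
                without a weight contribute nothing
    bias     -- intercept of the assumed logistic model
    """
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps the result in (0, 1)


if __name__ == "__main__":
    # Hypothetical feature names and weights, for illustration only.
    pair_features = {"category_match": 1.0, "title_term_overlap": 0.5}
    trained_weights = {"category_match": 0.8, "title_term_overlap": 1.2}
    print(estimate_ctr(pair_features, trained_weights, bias=-3.0))
```

In practice the weights would come from the training model mentioned above; here they are hard-coded purely to make the sketch runnable.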
Further, the estimated click-through rate tier determining unit includes an estimated click-through rate calculating subunit and a tier determining subunit, where the tier determining subunit includes:
a proportion coefficient determining subunit, configured to predetermine the proportion coefficient corresponding to each tier of the estimated click-through rate tiers;
a quantile determining subunit, configured to determine the values of the quantiles according to the proportion coefficients;
a tier interval determining subunit, configured to determine the tier interval in which each estimated click-through rate falls, according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
The quantiles are quantiles of a normal distribution.
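The three steps above (proportion coefficients, quantile values, tier interval) can be sketched with Python's standard library. The sketch assumes that the estimated click-through rates are modelled by a normal distribution whose mean `mu` and standard deviation `sigma` have already been estimated from the data; the function names and the lowest-to-highest ordering of the ratios are illustrative choices, not taken from the source.

```python
from bisect import bisect_right
from itertools import accumulate
from statistics import NormalDist


def quantile_cuts(ratios, mu, sigma):
    """Turn tier proportions into normal-distribution quantile values.

    ratios -- share of each tier from lowest to highest, e.g. (3, 4, 3)
    mu, sigma -- assumed mean and standard deviation of the
                 estimated click-through rates
    """
    total = sum(ratios)
    cumulative = list(accumulate(ratios))[:-1]  # drop the final 100% point
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(c / total) for c in cumulative]


def tier_index(ctr, cuts):
    """Return the tier interval of ctr: 0 is the lowest, len(cuts) the highest."""
    return bisect_right(cuts, ctr)


if __name__ == "__main__":
    cuts = quantile_cuts((3, 4, 3), mu=0.05, sigma=0.01)
    print(cuts)
    print(tier_index(0.05, cuts))
```

With a 3:4:3 split the cut points are the 30% and 70% quantiles, so a value at the mean falls in the middle tier.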
Further, the relevance tier determining unit includes:
a feature matching subunit, configured to perform a matching judgment on each feature of the search keyword and product information feature pair;
a determining subunit, configured to determine the relevance tier of the search keyword and product information feature pair according to the matching judgment results for the features.
Further, the matching judgments performed by the feature matching subunit include at least one of a category feature matching judgment and a text feature matching judgment;
the category feature matching judgment determines whether the search keyword and the product information belong to the same category;
the text feature matching judgment determines whether the text content of the search keyword and that of the product information are associated.
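A rough sketch of the two matching judgments follows. The passage above does not specify how text association is tested or how the judgment results map to relevance tiers, so both are assumptions here: text association is approximated by requiring every keyword term to appear in the product title, and the tier labels "excellent", "good", and "poor" with their mapping rule are hypothetical.

```python
def category_match(keyword_category, product_category):
    """Category judgment: do the keyword and the product share a category?"""
    return keyword_category == product_category


def text_match(keyword, product_title):
    """Text judgment (assumed rule): every keyword term occurs in the title."""
    keyword_terms = set(keyword.lower().split())
    title_terms = set(product_title.lower().split())
    return bool(keyword_terms) and keyword_terms <= title_terms


def relevance_tier(keyword, keyword_category, product_title, product_category):
    """Combine the two judgments into a tier (assumed mapping)."""
    cat_ok = category_match(keyword_category, product_category)
    text_ok = text_match(keyword, product_title)
    if cat_ok and text_ok:
        return "excellent"
    if cat_ok or text_ok:
        return "good"
    return "poor"


if __name__ == "__main__":
    print(relevance_tier("red shoes", "footwear", "Red Leather Shoes", "footwear"))
```

A production system would likely use a richer text similarity measure; whitespace tokenization is used here only to keep the example self-contained.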
The functions of the above units correspond to the processing steps of the method described in detail with reference to FIG. 1, and are not repeated here. It should be noted that, because the method embodiments have been described in detail, the apparatus embodiments are described more briefly; those skilled in the art will understand that the apparatus embodiments of the present invention can be constructed with reference to the method embodiments. Other implementations obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Those skilled in the art will understand that the method and apparatus embodiments above are described by way of example and are not to be regarded as limiting the present invention. Other implementations obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments. The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort. The foregoing is only a specific implementation of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (14)

  1. An information matching processing method, characterized in that the method comprises:
    obtaining search keywords and product information, and combining the search keywords and the product information pairwise into search keyword and product information feature pairs;
    calculating the relevance of each search keyword and product information feature pair, and determining the relevance tier of each search keyword and product information feature pair according to the relevance calculation result;
    calculating the estimated click-through rate of each search keyword and product information feature pair, and using quantiles to determine the estimated click-through rate tier corresponding to the estimated click-through rate of each search keyword and product information feature pair;
    determining a score for each search keyword and product information feature pair according to the relevance tier and the estimated click-through rate tier, the score representing the degree of matching between the search keyword and the product information.
  2. The method according to claim 1, characterized in that calculating the estimated click-through rate of each search keyword and product information feature pair comprises:
    performing feature extraction on the search keyword and product information feature pair, and obtaining the feature weight corresponding to each feature according to a training model;
    calculating the estimated click-through rate using the extracted features and the feature weights corresponding to the features.
  3. The method according to claim 2, characterized in that the extracted features comprise one or any combination of the following: text information of the search keyword, category information of the search keyword, the title of the product information, attributes of the product information, and the relevance between the search keyword and the product information.
  4. The method according to claim 1, characterized in that using quantiles to determine the estimated click-through rate tier corresponding to the estimated click-through rate of each search keyword and product information feature pair comprises:
    predetermining the proportion coefficient corresponding to each tier of the estimated click-through rate tiers;
    determining the values of the quantiles according to the proportion coefficients;
    determining the tier interval in which each estimated click-through rate falls, according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
  5. The method according to claim 4, characterized in that the quantiles are quantiles of a normal distribution.
  6. The method according to claim 1, characterized in that calculating the relevance of each search keyword and product information feature pair and determining the relevance tier of each search keyword and product information feature pair according to the relevance calculation result comprises:
    performing a matching judgment on each feature of the search keyword and product information feature pair;
    determining the relevance tier of the search keyword and product information feature pair according to the matching judgment results for the features.
  7. The method according to claim 6, characterized in that the matching judgments on the features comprise at least one of a category feature matching judgment and a text feature matching judgment;
    the category feature matching judgment determines whether the search keyword and the product information belong to the same category;
    the text feature matching judgment determines whether the text content of the search keyword and that of the product information are associated.
  8. An information matching processing apparatus, characterized in that the apparatus comprises:
    an obtaining unit, configured to obtain search keywords and product information, and to combine the search keywords and the product information pairwise into search keyword and product information feature pairs;
    a relevance tier determining unit, configured to calculate the relevance of each search keyword and product information feature pair, and to determine the relevance tier of each search keyword and product information feature pair according to the relevance calculation result;
    an estimated click-through rate tier determining unit, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate tier corresponding to the estimated click-through rate of each search keyword and product information feature pair;
    a matching determining unit, configured to determine a score for each search keyword and product information feature pair according to the relevance tier and the estimated click-through rate tier, the score representing the degree of matching between the search keyword and the product information.
  9. The apparatus according to claim 8, characterized in that the estimated click-through rate tier determining unit comprises an estimated click-through rate calculating subunit and a tier determining subunit, wherein the estimated click-through rate calculating subunit comprises:
    a model establishing subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature according to a training model;
    a calculating subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to the features.
  10. The apparatus according to claim 9, characterized in that the features extracted by the model establishing subunit comprise one or any combination of the following: text information of the search keyword, category information of the search keyword, the title of the product information, attributes of the product information, and the relevance between the search keyword and the product information.
  11. The apparatus according to claim 8, characterized in that the estimated click-through rate tier determining unit comprises an estimated click-through rate calculating subunit and a tier determining subunit, wherein the tier determining subunit comprises:
    a proportion coefficient determining subunit, configured to predetermine the proportion coefficient corresponding to each tier of the estimated click-through rate tiers;
    a quantile determining subunit, configured to determine the values of the quantiles according to the proportion coefficients;
    a tier interval determining subunit, configured to determine the tier interval in which each estimated click-through rate falls, according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
  12. The apparatus according to claim 11, characterized in that the quantiles are quantiles of a normal distribution.
  13. The apparatus according to claim 8, characterized in that the relevance tier determining unit comprises:
    a feature matching subunit, configured to perform a matching judgment on each feature of the search keyword and product information feature pair;
    a determining subunit, configured to determine the relevance tier of the search keyword and product information feature pair according to the matching judgment results for the features.
  14. The apparatus according to claim 13, characterized in that the matching judgments performed by the feature matching subunit comprise at least one of a category feature matching judgment and a text feature matching judgment;
    the category feature matching judgment determines whether the search keyword and the product information belong to the same category;
    the text feature matching judgment determines whether the text content of the search keyword and that of the product information are associated.
PCT/CN2015/098247 2014-12-29 2015-12-22 Information matching processing method and apparatus WO2016107455A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410838112.4 2014-12-29
CN201410838112.4A CN105808541B (en) 2014-12-29 2014-12-29 A kind of information matches treating method and apparatus

Publications (1)

Publication Number Publication Date
WO2016107455A1 true WO2016107455A1 (en) 2016-07-07

Family

ID=56284233

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/098247 WO2016107455A1 (en) 2014-12-29 2015-12-22 Information matching processing method and apparatus

Country Status (2)

Country Link
CN (1) CN105808541B (en)
WO (1) WO2016107455A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649605B (en) * 2016-11-28 2020-09-29 百度在线网络技术(北京)有限公司 Method and device for triggering promotion keywords
CN107767172A (en) * 2017-10-12 2018-03-06 百度在线网络技术(北京)有限公司 Information-pushing method, device, server and medium
CN110516033A (en) * 2018-05-04 2019-11-29 北京京东尚科信息技术有限公司 A kind of method and apparatus calculating user preference
CN110633398A (en) * 2018-05-31 2019-12-31 阿里巴巴集团控股有限公司 Method for confirming central word, searching method, device and storage medium
CN110909182B (en) * 2019-11-29 2023-05-09 北京达佳互联信息技术有限公司 Multimedia resource searching method, device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514178A (en) * 2012-06-18 2014-01-15 阿里巴巴集团控股有限公司 Searching and sorting method and device based on click rate
CN104077306B (en) * 2013-03-28 2018-05-11 阿里巴巴集团控股有限公司 The result ordering method and system of a kind of search engine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020138481A1 (en) * 2001-03-23 2002-09-26 International Business Machines Corporation Searching product catalogs
CN103678481A (en) * 2003-09-30 2014-03-26 雅虎公司 Method and apparatus for search scoring
CN103729365A (en) * 2012-10-12 2014-04-16 阿里巴巴集团控股有限公司 Searching method and system
CN103778548A (en) * 2012-10-19 2014-05-07 阿里巴巴集团控股有限公司 Goods information and keyword matching method, and goods information releasing method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111047009A (en) * 2019-11-21 2020-04-21 腾讯科技(深圳)有限公司 Event trigger probability pre-estimation model training method and event trigger probability pre-estimation method
CN111047009B (en) * 2019-11-21 2023-05-23 腾讯科技(深圳)有限公司 Event trigger probability prediction model training method and event trigger probability prediction method

Also Published As

Publication number Publication date
CN105808541B (en) 2019-11-08
CN105808541A (en) 2016-07-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15875133

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15875133

Country of ref document: EP

Kind code of ref document: A1