WO2008030670A1 - Detecting and adjudicating click fraud

Info

Publication number: WO2008030670A1
Application number: PCT/US2007/074651
Authority: WO
Inventors: Gideon A. Yuval
Original assignee: Microsoft Corporation
Priority date: 2006-09-08
Filing date: 2007-07-27
Publication date: 2008-03-13

Abstract

The subject disclosure pertains to systems and methods for facilitating detection of click fraud for a pay per click advertising system. A random, unbiased sample of clicks can be generated and used to estimate the occurrence of click fraud within the advertising system. Analysis of the limited sample set reduces expenses in detection of illegitimate clicks. Additionally, instances of click fraud detected within the sample set can be used to estimate the total volume of click fraud for purposes of adjudication and reimbursement of the advertiser, making click fraud detection economically viable. In an aspect, cryptographic techniques, such as public key encryption, can be used in the generation of the random sample set, allowing the advertiser to verify the selection of the sample set.

Description

DETECTING AND ADJUDICATING CLICK FRAUD

BACKGROUND

[0001] As the Internet has become increasingly popular, online advertising has become an important tool for retailers and vendors. Online advertising provides necessary income to website providers and additional marketing opportunities for retailers. One popular system for determining fees in exchange for displaying advertisements is pay per click advertising. In a pay per click system, websites or search engines display advertisements. The website operator receives a predetermined fee from the retailer or advertiser every time a website user selects or clicks on an advertisement to view additional information or access a retailer's website. As online advertising has grown, advertising networks have developed that act as middlemen between the advertisers and website operators, arranging for display of advertisements and payment of fees.

[0002] Unfortunately, pay per click advertising is vulnerable to manipulation. In particular, click fraud occurs when a person, automated script or computer program imitates a valid user of a website and clicks on an advertisement to generate an improper advertising fee or charge. Click fraud can be prompted by various financial considerations. Competitors of the advertiser can click on the advertiser's ads, forcing the advertiser to pay excessive fees. In addition, friends of the website operator or advertising network may improperly click on advertisements intending to assist the website operator or advertising network by generating fees. Competitors of a website operator or advertising network may click on advertisements to make it look as though the website operator or advertising network has been improperly generating clicks, effectively framing the operator or network. Click fraud can also be motivated purely by malicious intent, comparable to vandalism. Individuals can perpetrate click fraud based upon personal or political vendettas in addition to financial gain.

[0003] It is unclear with what frequency click fraud occurs. Additionally, the extent of its impact upon advertisers and advertising systems is unknown. However, simply the perception of rampant click fraud has a detrimental effect upon online advertising and advertising systems. If advertisers feel that they are likely to be defrauded, they will be reluctant to utilize online advertising, or will minimize their use of online advertisements. Discouragement of online advertising impacts not only the advertising networks, but also the website operators that depend upon advertising revenue to offset operating expenses. Eventually, Internet users can be affected as they may be required to pay for services that were previously provided free of charge to replace the shortfall caused by reduced advertising revenue.

SUMMARY

[0004] The following presents a simplified summary in order to provide a basic understanding of some aspects of the claimed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

[0005] Briefly described, the provided subject matter concerns facilitating detection and adjudication of click fraud for a pay per click online system. An inherent problem in identifying and resolving click fraud is the minimal financial value of each individual instance of fraud. For example, a single incident of click fraud may cost an advertiser fifty cents. A simple cost benefit analysis makes it clear that it is not cost effective for advertisers to pay to investigate and dispute each case of click fraud. Advertising systems may be in a better position to monitor click fraud. However, advertisers may be reluctant to disclose trade secrets regarding their customers or customer transactions necessary to fully analyze and detect illegitimate clicks.

[0006] The systems and methods described herein can be utilized to generate a random, unbiased sample of clicks that can be evaluated and used to estimate the occurrence of click fraud over the entire set of clicks. Each detected case of click fraud within the sample is representative of a predetermined number of undetected fraudulent clicks. For example, if one click in a thousand is sampled, each detected instance of click fraud is representative of one thousand instances of click fraud. Therefore, each instance of click fraud within the sample set has an increased value based upon the number of clicks it represents. The increased value of each instance of click fraud in the sample makes it economically viable for advertisers to analyze the sample set and adjudicate detected click fraud. Consequently, it is important that the sample set be both random and unbiased.
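The scaling described above, in which each fraudulent click found in the sample stands in for a fixed number of clicks overall, can be illustrated with a short calculation. This is a minimal sketch only: the function name `estimate_fraud`, its parameters, and the count of three detected clicks are assumptions chosen for illustration, while the one-in-a-thousand rate and fifty-cent cost echo the examples given in the text.

```python
def estimate_fraud(sampled_fraud_count: int, sample_one_in: int,
                   cost_per_click: float) -> tuple[int, float]:
    """Scale fraud found in the sample up to the full set of clicks.

    sample_one_in is the agreed sampling ratio (1000 means one click
    in a thousand is sampled). All names here are hypothetical.
    """
    if sample_one_in < 1:
        raise ValueError("sample_one_in must be at least 1")
    # Each sampled click represents sample_one_in clicks overall.
    estimated_fraud_clicks = sampled_fraud_count * sample_one_in
    estimated_refund = estimated_fraud_clicks * cost_per_click
    return estimated_fraud_clicks, estimated_refund


# Three fraudulent clicks found in a 1-in-1000 sample, at fifty cents each.
clicks, refund = estimate_fraud(3, 1000, 0.50)
print(clicks, refund)
```

Under these assumed figures, a single sampled click carries a weight of five hundred dollars rather than fifty cents, which is the economic point the paragraph makes.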

[0007] In an aspect, the advertiser can create a representation of each click for use in generation of the sample set. The representation can be generated utilizing a hash function and the history of the click, including information from the advertising system as well as information gathered subsequent to the selection of the advertisement (e.g., information regarding a transaction or customer actions). The representation (e.g., the hash value generated from the click history) identifies the click without granting advertising systems access to trade secrets such as customer information. In addition, the hash value can be used at a later time to validate the click history, ensuring that the click history remains unchanged from the time the click representation is generated.

[0008] In an aspect, the advertising system can generate the sample set using cryptographic methods to ensure the randomness of the sample and lack of bias in favor of either the advertiser or the advertising system. To create a random sample, the click representation can be encoded (e.g., using public key cryptography). The sample set can be selected from the encoded click representations, referred to herein as click identifiers, based upon predetermined criteria (e.g., bit patterns or ranges of values). The advertiser can verify that the click identifier was correctly encoded and that the sample set was properly selected based upon the predetermined criteria. The advertiser need only analyze the sample set for fraudulent clicks to estimate the occurrence of click fraud.

[0009] To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] Fig. 1 is a block diagram of a click fraud detection system for pay per click online advertising in accordance with an aspect of the subject matter disclosed herein.

[0011] Fig. 2 is a more detailed block diagram of a click fraud detection system including a hash value generator component in accordance with an aspect of the subject matter disclosed herein.

[0012] Fig. 3 is a more detailed block diagram of a click fraud system including a sample selector component in accordance with an aspect of the subject matter disclosed herein.

[0013] Fig. 4 is a block diagram of a click fraud detection system including components for adjudication of click fraud in accordance with an aspect of the subject matter disclosed herein.

[0014] Fig. 5 is a block diagram of a click fraud detection system including user interfaces in accordance with an aspect of the subject matter disclosed herein.

[0015] Fig. 6 illustrates a methodology for monitoring and detecting click fraud in accordance with an aspect of the subject matter disclosed herein.

[0016] Fig. 7 illustrates a methodology for generating a sample set for use in detection of click fraud in accordance with an aspect of the subject matter disclosed herein.

[0017] Fig. 8 illustrates a methodology for adjudication of click fraud claims in accordance with an aspect of the subject matter disclosed herein.

[0018] Fig. 9 is a schematic block diagram illustrating a suitable operating environment.

[0019] Fig. 10 is a schematic block diagram of a sample-computing environment.

DETAILED DESCRIPTION

[0020] The various aspects of the subject matter disclosed herein are now described with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the claimed subject matter.

[0021] As used herein, the terms "component," "system" and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.

[0022] The word "exemplary" is used herein to mean serving as an example, instance, or illustration. The subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs.

[0023] Furthermore, the disclosed subject matter may be implemented as a system, method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer or processor based device to implement aspects detailed herein. The term "article of manufacture" (or alternatively, "computer program product") as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips...), optical disks (e.g., compact disk (CD), digital versatile disk (DVD)...), smart cards, and flash memory devices (e.g., card, stick). Additionally, it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.

[0024] An inherent difficulty for advertisers in detection and adjudication of click fraud is the generally negligible cost of individual instances of fraudulent clicks. It is simply not economically feasible for an advertiser to pay to investigate and analyze individual clicks to identify a single fraudulent click. In addition, once a possibly fraudulent click is identified, the advertiser may need to pay attorney's fees and other costs if the advertising system disputes the fraudulent nature of the click. As used herein, an advertising system can include a website operator, an advertising network or any other entity with which the advertiser contracts for display of online advertisements. Costs can quickly escalate into the hundreds or thousands of dollars for analysis and adjudication of a transaction that may have an actual cost of fifty cents.

[0025] It may be more feasible for advertising systems to monitor clicks for possible fraud. However, advertisers are loath to entrust such monitoring solely to the advertising systems. Whether valid or not, there is a level of distrust in leaving the policing of clicks to entities that benefit, albeit indirectly, from the fraud.

[0026] In addition, information regarding normal customer behavior is useful in evaluating and identifying fraudulent clicks. However, customer information is typically a valuable trade secret for advertisers. Advertisers are reluctant at best to entrust valuable customer information to advertising systems.

[0027] Referring now to Fig. 1, a system for facilitating detection and adjudication of click fraud between an advertiser 100 and an advertising system 102 in a pay per click system is illustrated. The system can facilitate creation of an unbiased, random sample set of clicks. The advertiser 100 and advertising system 102 can utilize the sample set to estimate the number of illegitimate clicks and negotiate reimbursement of the advertiser 100 for any instances of click fraud.

[0028] A click representation component 104 can generate a representation of a click for use in selecting a sample set. The click representation can be generated based upon any information associated with a click. Click information can include information gathered by the advertiser 100 as well as the advertising system 102. The advertising system 102 can provide click information for each click to any advertisers 100 via the advertiser interface 106. Although a single advertiser 100 is illustrated for brevity, the advertiser interface 106 can provide click information to any number of advertisers 100. Click information specific to the advertiser 100 can be received via an advertising system interface 108 and maintained in a click data store 110. Click information can also be independently maintained by the advertising system 102 in a click information data store 112. Generation of the click representation is discussed in greater detail below with respect to Figure 2.

[0029] One or more click representations can be provided to the advertising system 102 via the advertising system interface 108 and received by the advertiser interface 106. A sample generator component 114 can utilize the received click representations to generate an unbiased, random sample set of clicks. In particular, the sample generator component 114 can utilize cryptographic techniques to encode the click representations and produce a set of click identifiers from which the sample set is selected. The sample generator component 114 can then select a subset of the click identifiers as the sample set based upon a predetermined parameter or parameters. Parameters can be chosen to control the size of the sample set and can be agreed upon by the advertising system 102 and the advertiser 100 prior to creation of the sample set. Click identifier generation and sample set selection are discussed in detail below with respect to Figure 3. The process of selecting the sample set is effectively split between the advertiser 100 and the advertising system 102. As described, the advertiser 100 can control the click representation for each click, while the advertising system 102 is responsible for generating the sample set as a function of the click representations. This division helps ensure that an unbiased sample is utilized for click fraud evaluation.

[0030] Once the sample set is selected, sample set information can be stored in the click information data store 112 of the advertising system 102. Click representations can also be maintained in the click information data store 112 for reference in case of disputes regarding selection of the sample set or the fraudulent nature of a particular click. In addition, sample set information can be provided to the appropriate advertiser 100 through the advertiser interface 106.

[0031] The advertiser 100 can utilize a sample set verification component 116 to verify that the sample set was correctly determined by the advertising system 102. The sample set can be verified based upon the click representations provided to the advertising system 102 and the predetermined parameter(s). The sample set information can be maintained in the click data store 110 for further evaluation and determination of click fraud.

[0032] Turning now to Fig. 2, a more detailed block diagram of a click fraud detection system is illustrated. The click representation component 104 can include a click history generator component 200 that can gather data associated with a click and assemble the gathered data into a click history. The click data can include information obtained from the advertising system 102, such as information regarding the recent actions of the user prior to selection of the advertisement, time at which the selection occurred, information regarding the user or user's account and the like. This click information can be provided through the advertising system interface 108 and maintained in the click data store 110. The information obtained from the advertising system 102 can be combined with advertiser click information collected by the advertiser 100 subsequent to the selection of the advertisement. The advertiser click information can include any actions taken by the user after selection of the advertisement, such as purchases by the user, interactions between advertiser representatives and the user and the like. Information obtained from the advertising system 102 and the advertiser click information can be combined and organized in a standardized format that can be utilized in identification of illegitimate clicks.
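One way to picture the standardized click history described above is as a single record that merges data from both parties and serializes deterministically. The sketch below is illustrative only: the class name `ClickHistory`, every field name, and the choice of sorted-key JSON as the standardized format are assumptions, since the disclosure does not prescribe a particular layout.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ClickHistory:
    # Supplied by the advertising system (hypothetical field names).
    click_time: str
    user_account: str
    prior_actions: list[str] = field(default_factory=list)
    # Collected by the advertiser after the click (hypothetical field names).
    purchases: list[str] = field(default_factory=list)
    representative_contacts: list[str] = field(default_factory=list)

    def to_canonical_bytes(self) -> bytes:
        """Serialize with sorted keys so equal histories yield equal bytes."""
        text = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return text.encode("utf-8")


history = ClickHistory(
    click_time="2006-09-08T12:00:00Z",
    user_account="user-123",
    prior_actions=["searched: garden tools"],
    purchases=["order-987"],
)
print(history.to_canonical_bytes())
```

A deterministic serialization matters here because the same click history must produce the same byte string whenever it is later re-hashed for verification.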

[0033] The click history can be analyzed and used to evaluate the likelihood that a click is fraudulent. For example, repeated selection of an advertisement by a user clearly lacking the wherewithal or desire to purchase the advertised product can signal that the user is receiving a financial benefit for selecting the advertisement. The click histories can contain a great deal of information regarding the business practices of the advertiser 100 as well as an abundance of information regarding customers of the advertiser 100. Although such information can be useful in detecting click fraud, an advertiser may not wish to turn over the information to the advertising system 102.

[0034] To avoid providing the advertising system 102 with trade secret information, a hash value generator component 202 can utilize a one-way hash function to create a hash value from a click history. The hash value can be used as the click representation and provided to the advertising system 102. Hash functions (e.g., Secure Hash Algorithm (SHA), Message Digest 5 (MD5)) simply convert an input string into an output string. In particular, a one-way hash function works in one direction, such that it is easy to compute a hash value from the input string, but difficult to generate the input string based solely upon the hash. The output hash value is not dependent upon the input click history in a discernible way.
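As a minimal sketch of the hash value generator described above, the following uses SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 (one member of the SHA family the text mentions), the function name `click_representation`, and the sample byte strings are assumptions made for illustration.

```python
import hashlib


def click_representation(click_history: bytes) -> str:
    """Return a one-way hash of the click history as a hex string.

    SHA-256 is an assumed choice; the text names SHA and MD5 only as examples.
    """
    return hashlib.sha256(click_history).hexdigest()


rep_a = click_representation(b"user-123|2006-09-08T12:00:00Z|order-987")
rep_b = click_representation(b"user-123|2006-09-08T12:00:00Z|order-988")

# A one-character change in the history gives an unrelated-looking digest.
print(rep_a)
print(rep_b)
```

Only the digest would be sent to the advertising system; the click history itself stays with the advertiser.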

[0035] In addition, it should be difficult to find two valid input values that produce the same hash value. This allows the hash value to be used to authenticate the input used to generate the hash value. Here, the click representation can be used to authenticate the click history.

[0036] The hash value generator component 202 can generate click representations based upon hash values of click histories. Consequently, a click representation can be supplied to the advertising system 102, yet the advertising system 102 will be unable to recreate the click history, including any confidential customer information, based solely upon the click representation. However, in the event that a click is adjudicated based upon the click history, the advertising system 102 can verify that the click history has remained unchanged since creation of the click representation. The click history can be authenticated by hashing the click history and comparing the click representation provided by the advertiser 100 to the newly hashed value.

[0037] Communications between the advertiser 100 and the advertising system 102 can be performed using numerous methods. In particular, communications between the advertiser interface 106 and the advertising system interface 108 can be periodic or asynchronous. Specifically, click representations can be transmitted as they are generated, in batches or groups and/or in response to a request from the advertising system 102. Click representations can be transmitted via networks, such as the Internet, by computer-readable media (e.g., CDs, DVDs) or any other suitable method of communication.

[0038] Referring now to Fig. 3, a more detailed block diagram of a click fraud detection system is illustrated. The sample generator component 114 of the advertising system 102 can include a click identifier component 300 that generates a click identifier based upon the click representations obtained from the advertiser 100. In particular, the click identifier component 300 can utilize cryptographic techniques to encode click representations to create click identifiers. Clicks can be selected for inclusion in the sample set based upon this seemingly random click identifier value. By encoding the click representation, the click identifier component 300 creates a bit pattern or value that the advertiser 100 is incapable of predicting. Consequently, the advertiser 100 will not be able to modify or adjust a specific click representation to ensure its inclusion in the sample set. At the same time, the click identifier is computed based upon a click representation that remains outside of the control of the advertising system 102, ensuring that the advertising system 102 cannot bias generation of click identifiers to control selection of clicks included in the sample set.

[0039] If particular cryptographic techniques, such as public key encryption, are used, the advertiser 100 will be able to verify that the click identifier has been correctly generated by the click identifier component 300, but will not have the capability to generate the click identifier itself. Using public key encryption, the click identifier component 300 can encode the click representation using a private key. To predict the click identifier solely from the click representation, the advertiser 100 would have to break the encryption system, an extremely difficult and economically impractical task. However, the advertiser 100 can be provided with a public key that allows it to decrypt the click identifier and verify that the click representation was correctly encoded during generation of the click identifier. Public key encryption systems such as the RSA encryption algorithm, Digital Signature Algorithm (DSA), ElGamal cryptosystem, elliptic curve cryptography, Diffie-Hellman and the like can be used by the click identifier component 300 to generate the click identifier.
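The private-key encoding and public-key check described above can be sketched with textbook RSA using only Python's built-in `pow`. This is a toy under loud assumptions: the key is built from two small, publicly known Mersenne primes, no padding scheme is applied, and the function names are hypothetical. It is far too weak for real use, where a vetted cryptographic library and key size would be needed.

```python
import hashlib

# Toy key from two known Mersenne primes (assumption for illustration only;
# a key this small offers no real security).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def representation_to_int(click_representation: str) -> int:
    """Map a hex click representation to an integer below the modulus."""
    return int(click_representation, 16) % N


def make_click_identifier(click_representation: str) -> int:
    """Advertising system: encode the representation with the private key."""
    return pow(representation_to_int(click_representation), D, N)


def verify_click_identifier(click_representation: str, click_identifier: int) -> bool:
    """Advertiser: decode with the public key (E, N) and compare."""
    return pow(click_identifier, E, N) == representation_to_int(click_representation)


rep = hashlib.sha256(b"example click history").hexdigest()
identifier = make_click_identifier(rep)
print(verify_click_identifier(rep, identifier))
```

The advertiser holds only `E` and `N`, so it can check an identifier after the fact but cannot compute one in advance, which is the asymmetry the paragraph relies on.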

[0040] A sample selector component 302 can select a subset of the click identifiers for inclusion in the sample set of clicks. In particular, the sample selector component 302 can compare the click identifiers to a predetermined pattern to identify clicks for inclusion in the sample set. For example, clicks can be selected for inclusion in the sample set if the middle eight bits of the click identifier are equal to zero.

[0041] The length of the bit pattern used to select click identifiers will affect the likelihood that a particular click identifier will be selected. For example, if selection of click identifiers is based solely upon the last bit of the click identifier (e.g., a click identifier is selected for the sample set if the final bit is equal to zero), a click identifier will have a one in two chance of being selected, since the final bit will be one of two values, zero or one. Similarly, if selection of click identifiers is based upon the last two bits (e.g., a click identifier is selected for the sample if the final two bits are both equal to one), a click identifier will have a one in four chance of being selected since there are four possible values for the final two bits. The likelihood that a click identifier will be selected for the sample set can be computed as 1/2^k, where k is equal to the bit pattern length. Sample set size can be adjusted by modifying the length of the bit pattern to be matched. The advertiser 100 and advertising system 102 can agree upon a bit pattern prior to creation of the sample set.

[0042] Alternatively, the click identifier can be viewed as a numerical value rather than a bit pattern. The value of the click identifier can be compared to a range of numerical values to determine the sample set. For example, click identifiers with a numerical value less than 1000 or between 30,000 and 50,000 can be included in the sample set. The size of the range of values selected for inclusion in the sample set can be specified to control the size of the sample set.

[0043] Sample set information including click identifiers and/or the click representations associated with the click identifiers can be maintained in the click information data store 112. In addition, sample set information, including click identifiers, can be supplied to the advertiser 100. The provided sample set information allows the advertiser 100 to verify encoding of the click representations and selection of the sample set from the click identifiers.

[0044] Referring now to Fig. 4, a system for facilitating click fraud detection including click analysis is illustrated. The advertiser 100 can include a click history analysis component 400 that selects one or more clicks from the sample set for analysis to detect fraudulent clicks. The click history analysis component 400 can retrieve the click history for the selected click from the click data store 110. Depending upon the size of the sample set, the advertiser can elect to investigate each click within the sample set. In an aspect, the click history analysis component 400 can be implemented as an artificial intelligence (AI) and/or machine learning and reasoning (MLR) component that employs a probabilistic and/or statistical-based analysis to prognose or infer that a click is fraudulent in nature. For example, AI and MLR mechanisms can be employed to review click histories and detect irregularities indicative of illegitimate clicks.

[0045] Once a click has been identified by the click history analysis component 400 as fraudulent, the advertiser 100 can notify the advertising system 102 of its claim for click fraud. Notification can occur via the advertising system interface 108 and the advertiser interface 106, via a letter or any other suitable means for notification. The advertiser 100 can provide any necessary information including, but not limited to, the click history, the click representation and the click identifier for the alleged fraudulent click.
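The bit-pattern and numeric-range selection rules described for the sample selector component 302 can be sketched as follows. The function names, the `shift` parameter for choosing which bits are tested, and the half-open treatment of ranges are assumptions for illustration; the disclosure leaves those details to agreement between the parties.

```python
def matches_bit_pattern(click_identifier: int, pattern: int, k: int,
                        shift: int = 0) -> bool:
    """True if the k bits starting `shift` bits above the low end equal `pattern`."""
    mask = (1 << k) - 1
    return ((click_identifier >> shift) & mask) == pattern


def selection_probability(k: int) -> float:
    """Chance that a uniformly distributed identifier matches a k-bit pattern."""
    return 1 / 2**k


def in_ranges(click_identifier: int, ranges: list[tuple[int, int]]) -> bool:
    """True if the identifier falls in any half-open range [low, high)."""
    return any(low <= click_identifier < high for low, high in ranges)


def select_sample(click_identifiers: list[int], pattern: int, k: int,
                  shift: int = 0) -> list[int]:
    """Keep only identifiers whose selected bits match the agreed pattern."""
    return [c for c in click_identifiers
            if matches_bit_pattern(c, pattern, k, shift)]


# Final two bits both equal to one: a one-in-four chance of selection.
print(select_sample(list(range(16)), pattern=0b11, k=2))
print(selection_probability(2))
# Numeric alternative: below 1000, or from 30,000 up to 50,000.
print(in_ranges(42_000, [(0, 1000), (30_000, 50_001)]))
```

Because both parties know the agreed pattern or range in advance, the advertiser can rerun the same predicate over the click identifiers to confirm that the sample set was selected as agreed.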

[0046] In the event that an advertiser 100 identifies a click as fraudulent, the advertising system 102 can analyze the click to make an independent evaluation as to the fraudulent nature of the click. A fraud analysis component 402 can evaluate the provided click information to determine whether to reimburse the advertiser 100 or dispute the claim of click fraud.

[0047] The fraud analysis component 402 can include a click history verification component 404 that confirms that the click representation that was provided to the advertising system 102 and used in generation of the sample set was correctly derived from the click history maintained by the advertiser 100. As discussed in detail above, the click representation can be computed from the click history using a one-way hash function. Consequently, the click representation can be used to authenticate the click history.
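The check performed by the click history verification component 404 amounts to re-hashing the disclosed click history and comparing the result with the representation received earlier. The sketch below assumes SHA-256 was the agreed hash function and uses a hypothetical function name; `hmac.compare_digest` is used simply as a convenient constant-time comparison from the standard library.

```python
import hashlib
import hmac


def verify_click_history(click_history: bytes, click_representation: str) -> bool:
    """Re-hash the disclosed history and compare with the stored representation.

    Assumes the representation is a SHA-256 hex digest (an illustrative choice).
    """
    recomputed = hashlib.sha256(click_history).hexdigest()
    return hmac.compare_digest(recomputed, click_representation)


original = b"user-123|2006-09-08T12:00:00Z|no purchase"
stored_representation = hashlib.sha256(original).hexdigest()

print(verify_click_history(original, stored_representation))
print(verify_click_history(b"user-123|2006-09-08T12:00:00Z|edited",
                           stored_representation))
```

A mismatch would indicate that the history presented during adjudication differs from the one hashed when the click representation was first supplied.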

[0048] The fraud analysis component 402 can also include a click fraud analysis component 406 that can analyze the click history provided by the advertiser 100 to determine the likelihood of fraud. The click fraud analysis component 406 can be implemented as an artificial intelligence (AI) and/or machine learning and reasoning (MLR) component that employs a probabilistic and/or statistical-based analysis to prognose or infer that a click is fraudulent in nature.

[0049] Referring now to Fig. 5, a click fraud detection and adjudication system including user interfaces is illustrated. In particular, the advertiser 100 can include an advertiser user interface 500 that allows operators or users to monitor the click fraud detection process and/or notifies operators of potential click fraud. In addition, the advertiser user interface 500 can render click histories, particularly for the sample set, and allow operators to view click information. Operators can analyze the click histories to determine those clicks that are likely fraudulent. Alternatively, the click history analysis component 400 can suggest clicks to the operator for further analysis via the advertiser user interface 500. The advertiser 100 can notify the advertising system 102 of possible fraudulent clicks as described above.

[0050] The advertising system 102 can include an advertising system user interface 504 that allows operators to monitor and control generation of samples for one or more advertisers 100. In addition, the advertising system user interface 504 can notify the operator if a claim of click fraud has been received from an advertiser 100 and provide the operator with the relevant information. For example, the operator can be provided with the click history, confirmation that the click was included within the sample set, verification of the click history based upon the click representation and an analysis of the click history. The operator can further analyze the click history to determine the appropriate action by the advertising system (e.g., refund of advertising fees, rejection of the claim). The advertising system 102 can generate a response to the click fraud claim authorizing reimbursement or rejecting the claim and notify the advertiser 100.

[0051] The aforementioned systems have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Additionally, it should be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several sub-components. The components may also interact with one or more other components not specifically described herein but known by those of skill in the art.

[0052] Furthermore, as will be appreciated, various portions of the disclosed systems above and methods below may include or consist of artificial intelligence or knowledge or rule based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers...). For example, the click history analysis component 400 and the click fraud analysis component 406 can utilize artificial intelligence or rule-based components to determine if a click is fraudulent. Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent.

[0053] Referring now to Figs. 6-8, while for purposes of simplicity of explanation the methodologies that can be implemented in accordance with the disclosed subject matter are shown and described as a series of blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methodologies described hereinafter.

[0054] Additionally, it should be further appreciated that the methodologies disclosed throughout this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computers. The term article of manufacture, as used, is intended to encompass a computer program accessible from any computer-readable device, carrier, or media.

[0055] Referring now to Fig. 6, a methodology for monitoring clicks to facilitate detection of click fraud is illustrated. At 602, click information is gathered and compiled into a click history. The click history can include information collected by the advertising system regarding the click as well as information collected by the advertiser subsequent to the selection of the advertisement. The click history can be maintained in a standardized format for use in fraud detection.

[0056] A click representation based upon the click history can be generated at 604. The click representation can be generated by an advertiser using a one-way hash function, thereby ensuring that the click history can be matched to the click representation, if necessary. The click representation can be supplied to an advertising system at 606. By providing the click representation rather than the click history, advertisers can avoid sharing confidential customer information. Click representations can be provided as they are generated, or a set of click representations can be collected and provided as a batch or group. Click representations can be provided via the Internet, in a letter, on computer-readable media or using any other suitable method.
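As an illustration of acts 602 through 606, the following is a minimal sketch of how an advertiser might derive a click representation from a click history. It is not taken from the disclosure: the field names, the JSON serialization used as the "standardized format," and the choice of SHA-256 as the one-way hash function are all assumptions made for the example.

```python
import hashlib
import json


def click_representation(click_history: dict) -> str:
    """Return a hex digest that stands in for the click history.

    The history is serialized with sorted keys so that the same
    history always produces the same digest (an assumed convention).
    """
    canonical = json.dumps(click_history, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical click history; every field name here is illustrative only.
history = {
    "click_id": "c-001",
    "ip": "203.0.113.7",
    "timestamp": "2006-09-08T12:00:00Z",
    "purchased": False,
}

representation = click_representation(history)

# Only the digest would be supplied to the advertising system (act 606);
# the history itself, with any customer information, stays with the advertiser.
print(representation)
```

Because the digest changes if any field of the history changes, the advertising system can later check a disclosed history against the representation it received earlier.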

[0057] At 608, sample set information based upon the click representations provided to the advertising system can be received by the advertiser. The sample set information can identify the clicks selected for inclusion in the sample. In addition, information can be included (e.g., click identifiers) that allows the advertiser to verify the selection of the sample. At 610, the sample set information can be validated by the advertiser. In an aspect, validation can include verifying encoding of the click representation and sample selection parameters. The click histories for the validated sample set can be analyzed at 612. At 614, fraud can be established based at least in part upon the analysis of the click history.

[0058] Referring now to Fig. 7, a methodology for generating a sample set for use in facilitating detection of click fraud is illustrated. At 702, one or more click representations can be received from the advertiser by an advertising system. As discussed above, the click representations can be hash values of the click history. Use of hash values allows the advertising system to verify the click history at a later time if necessary, but protects confidential business information during the sample selection process.

[0059] The click representation or representations can be encoded at 704 utilizing a cryptographic technique, such as public key encryption (e.g., RSA). Use of public key encryption ensures that the advertiser will be unable to predict the resulting encoded click representation, referred to herein as a click identifier. However, the advertiser is able to confirm that the click identifier was correctly encoded from the click representation.

[0060] At 706, a determination can be made as to whether the resulting click identifier should be included in the sample set. For example, the click identifier can be compared to a predetermined bit pattern or range of values. If the click identifier is selected for inclusion in the sample set, the click is added to the sample set at 708 and the process continues at 710. If the click identifier is not selected, the process continues at 710, where the advertiser is provided with sample set information. The sample set information can include information identifying the specific clicks selected for inclusion in the sample set. In addition, the sample set information can include information regarding clicks not selected for inclusion to allow the advertiser to verify sample set selection. At 712, sample set information including click representations can be maintained in a data store. The click representations may be utilized in the event that the advertiser claims click fraud or disputes selection of the sample set.
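The encoding and selection acts 704 through 708 can be sketched as follows. This is a toy illustration under stated assumptions, not the disclosed implementation: it uses unpadded "textbook" RSA with a very small key built from two known Mersenne primes, it assumes the advertising system encodes with its private exponent so that the advertiser can check the result with the public exponent, and the selection parameter `SAMPLE_MODULUS` (a low-order bit pattern selecting roughly one click in sixteen) is hypothetical.

```python
import hashlib

# Toy RSA key from two known Mersenne primes; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                           # public exponent, known to the advertiser
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, held by the advertising system

SAMPLE_MODULUS = 16  # hypothetical parameter: select when the low 4 bits are zero


def to_int(representation_hex: str) -> int:
    """Map a hex click representation into the RSA message space."""
    return int(representation_hex, 16) % N


def click_identifier(representation_hex: str) -> int:
    """Advertising system: encode the representation (act 704)."""
    return pow(to_int(representation_hex), D, N)


def in_sample(identifier: int) -> bool:
    """Compare the identifier to a predetermined bit pattern (act 706)."""
    return identifier % SAMPLE_MODULUS == 0


def verify_identifier(representation_hex: str, identifier: int) -> bool:
    """Advertiser: reverse the encoding with the public exponent and compare."""
    return pow(identifier, E, N) == to_int(representation_hex)


# Stand-in click representations for 200 clicks.
representations = [
    hashlib.sha256(f"click-{i}".encode("utf-8")).hexdigest() for i in range(200)
]
identifiers = {r: click_identifier(r) for r in representations}
sample_set = [r for r in representations if in_sample(identifiers[r])]

print(len(sample_set), "of", len(representations), "clicks selected")
```

The advertiser cannot compute an identifier without the private exponent, so it cannot tell in advance which clicks will be sampled, yet given an identifier it can confirm both the encoding and the selection test. A production system would use a vetted cryptographic library and a full-size key rather than this sketch.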

[0061] Referring now to Fig. 8, a methodology for adjudicating claims of click fraud is illustrated. At 802, a possibly fraudulent click is detected by the advertiser. The click fraud can be detected by an advertiser during its analysis of the sample set of clicks. At 804, the advertising system is notified of a claim of click fraud. The notification or claim of click fraud can include the information necessary for the advertising system to perform an independent analysis of the click, including click history, click representation and the like.

[0062] A determination can be made as to whether the click representation is consistent with the provided click history at 806. Inconsistency between the click representation and the click history would be indicative of a change in the click history after generation of the click representation, invalidating the click history. If the click history is not consistent with the click representation, the advertising system can dispute the claim of click fraud at 808. Otherwise, the process can continue and the click history is analyzed at 810.

[0063] At 812, a determination can be made by the advertising system as to the fraudulent nature of the click, based at least in part upon the click history. If the advertising system concludes that the click is not fraudulent, the advertising system can dispute the claim of click fraud at 808. However, if the advertising system concludes that the click was fraudulent, the advertising system can reimburse the advertiser at 814.
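The adjudication flow of Fig. 8 can be sketched as a single decision function. The sketch below rests on several assumptions: SHA-256 over a sorted-key JSON serialization stands in for the one-way hash, the check that the click belongs to the stored sample set is added for illustration, and the fraud rule passed in as `looks_fraudulent` is a hypothetical placeholder for whatever analysis the advertising system applies at 810 and 812.

```python
import hashlib
import json
from typing import Callable, Set


def representation_of(click_history: dict) -> str:
    """Recompute the click representation from a disclosed click history."""
    canonical = json.dumps(click_history, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def adjudicate(
    click_history: dict,
    claimed_representation: str,
    sampled_representations: Set[str],
    looks_fraudulent: Callable[[dict], bool],
) -> str:
    """Return "reimburse" or "dispute" for one claim of click fraud."""
    # Act 806: the history must match the representation supplied earlier.
    if representation_of(click_history) != claimed_representation:
        return "dispute"  # act 808
    # Assumed extra check: the click must belong to the stored sample set.
    if claimed_representation not in sampled_representations:
        return "dispute"  # act 808
    # Acts 810 and 812: independent analysis of the click history.
    return "reimburse" if looks_fraudulent(click_history) else "dispute"


# Hypothetical claim; field names and the one-second rule are illustrative.
history = {"click_id": "c-042", "ip": "198.51.100.9", "seconds_on_site": 0.2}
claimed = representation_of(history)
stored = {claimed}


def short_visit(h: dict) -> bool:
    return h.get("seconds_on_site", 0) < 1.0


decision = adjudicate(history, claimed, stored, short_visit)
tampered_decision = adjudicate(
    dict(history, seconds_on_site=0.1), claimed, stored, short_visit
)
print(decision, tampered_decision)
```

A history altered after the representation was generated fails the consistency check and is disputed regardless of how suspicious the altered values appear.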

[0064] In order to provide a context for the various aspects of the disclosed subject matter, Figs. 9 and 10 as well as the following discussion are intended to provide a brief, general description of a suitable environment in which the various aspects of the disclosed subject matter may be implemented. While the subject matter has been described above in the general context of computer-executable instructions of a computer program that runs on a computer and/or computers, those skilled in the art will recognize that the system and methods disclosed herein also may be implemented in combination with other program modules. Generally, program modules include routines, programs, components, data structures, etc. that perform particular tasks and/or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the inventive methods may be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, mini-computing devices, mainframe computers, as well as personal computers, hand-held computing devices (e.g., personal digital assistant (PDA), phone, watch...), microprocessor-based or programmable consumer or industrial electronics, and the like. The illustrated aspects may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. However, some, if not all aspects of the systems and methods described herein can be practiced on stand-alone computers. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

[0065] With reference again to Fig. 9, the exemplary environment 900 for implementing various aspects of the embodiments includes a mobile device or computer 902, the computer 902 including a processing unit 904, a system memory 906 and a system bus 908. The system bus 908 couples system components including, but not limited to, the system memory 906 to the processing unit 904. The processing unit 904 can be any of various commercially available processors. Dual microprocessors and other multi-processor architectures may also be employed as the processing unit 904.

[0066] The system memory 906 includes read-only memory (ROM) 910 and random access memory (RAM) 912. A basic input/output system (BIOS) is stored in a non-volatile memory 910 such as ROM, EPROM, EEPROM, which BIOS contains the basic routines that help to transfer information between elements within the computer 902, such as during start-up. The RAM 912 can also include a high-speed RAM such as static RAM for caching data.

[0067] The computer or mobile device 902 further includes an internal hard disk drive (HDD) 914 (e.g., EIDE, SATA), which internal hard disk drive 914 may also be configured for external use in a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 916 (e.g., to read from or write to a removable diskette 918) and an optical disk drive 920 (e.g., to read a CD-ROM disk 922 or to read from or write to other high capacity optical media such as a DVD). The hard disk drive 914, magnetic disk drive 916 and optical disk drive 920 can be connected to the system bus 908 by a hard disk drive interface 924, a magnetic disk drive interface 926 and an optical drive interface 928, respectively. The interface 924 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies. Other external drive connection technologies are within contemplation of the subject systems and methods.

[0068] The drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For the computer 902, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to a HDD, a removable magnetic diskette, and a removable optical media such as a CD or DVD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and further, that any such media may contain computer-executable instructions for performing the methods for the embodiments of the data management system described herein.

[0069] A number of program modules can be stored in the drives and RAM 912, including an operating system 930, one or more application programs 932, other program modules 934 and program data 936. All or portions of the operating system, applications, modules, and/or data can also be cached in the RAM 912. It is appreciated that the systems and methods can be implemented with various commercially available operating systems or combinations of operating systems.

[0070] A user can enter commands and information into the computer 902 through one or more wired/wireless input devices, e.g., a keyboard 938 and a pointing device, such as a mouse 940. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, touch screen, or the like. These and other input devices are often connected to the processing unit 904 through an input device interface 942 that is coupled to the system bus 908, but can be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, etc. A display device 944 can be used to provide a set of group items to a user. The display devices can be connected to the system bus 908 via an interface, such as a video adapter 946.

[0071] The mobile device or computer 902 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers, such as a remote computer(s) 948. The remote computer(s) 948 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 902, although, for purposes of brevity, only a memory/storage device 950 is illustrated. The logical connections depicted include wired/wireless connectivity to a local area network (LAN) 952 and/or larger networks, e.g., a wide area network (WAN) 954. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, e.g., the Internet.

[0072] When used in a LAN networking environment, the computer 902 is connected to the local network 952 through a wired and/or wireless communication network interface or adapter 956. The adapter 956 may facilitate wired or wireless communication to the LAN 952, which may also include a wireless access point disposed thereon for communicating with the wireless adapter 956.

[0073] When used in a WAN networking environment, the computer 902 can include a modem 958, or is connected to a communications server on the WAN 954, or has other means for establishing communications over the WAN 954, such as by way of the Internet. The modem 958, which can be internal or external and a wired or wireless device, is connected to the system bus 908 via the serial port interface 942. In a networked environment, program modules depicted relative to the computer 902, or portions thereof, can be stored in the remote memory/storage device 950. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.

[0074] The computer 902 is operable to communicate with any wireless devices or entities operatively disposed in wireless communication, e.g., a printer, scanner, desktop and/or portable computer, PDA, communications satellite, any piece of equipment or location associated with a wirelessly detectable tag (e.g., a kiosk, news stand, restroom), and telephone. The wireless devices or entities include at least Wi-Fi and Bluetooth™ wireless technologies. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.

[0075] Wi-Fi allows connection to the Internet from a couch at home, a bed in a hotel room, or a conference room at work, without wires. Wi-Fi is a wireless technology similar to that used in a cell phone that enables such devices, e.g., computers, to send and receive data indoors and out, anywhere within the range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks operate in the unlicensed 2.4 and 5 GHz radio bands, at an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, for example, or with products that contain both bands (dual band), so the networks can provide real-world performance similar to the basic 10BaseT wired Ethernet networks used in many offices.

[0076] Fig. 10 is a schematic block diagram of a sample-computing environment 1000 with which the systems and methods described herein can interact. The system 1000 includes one or more client(s) 1002. The client(s) 1002 can be hardware and/or software (e.g., threads, processes, computing devices). The system 1000 also includes one or more server(s) 1004. Thus, system 1000 can correspond to a two-tier client server model or a multi-tier model (e.g., client, middle tier server, data server), amongst other models. The server(s) 1004 can also be hardware and/or software (e.g., threads, processes, computing devices). One possible communication between a client 1002 and a server 1004 may be in the form of a data packet adapted to be transmitted between two or more computer processes. The system 1000 includes a communication framework 1006 that can be employed to facilitate communications between the client(s) 1002 and the server(s) 1004. The client(s) 1002 are operably connected to one or more client data store(s) 1008 that can be employed to store information local to the client(s) 1002. Similarly, the server(s) 1004 are operably connected to one or more server data store(s) 1010 that can be employed to store information local to the servers 1004.

[0077] What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the terms "includes," "has" or "having" are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.

Claims

What is claimed is:

1. A system for facilitating detection of click fraud for a pay per click online advertising system, comprising: an interface component (108) that obtains at least one click representation that corresponds to a click; and a sample generator component (114) that generates a randomized sample set for use in detecting click fraud based at least in part upon the at least one click representation.

2. The system of claim 1, further comprising: a click identifier component (300) that utilizes cryptography to generate a click identifier that corresponds to the click representation; and a sample selection component (302) that selects the sample set as a function of at least one of the click identifier and a predetermined parameter.

3. The system of claim 2, the click identifier component (300) utilizes public key encryption to generate the click identifier.

4. The system of claim 2, the predetermined parameter is at least one of a bit pattern and a range of values.

5. The system of claim 2, the click identifier component (300) utilizes RSA public key encryption.

6. The system of claim 1, the click representation is a hash value of a click history corresponding to the click (104).

7. The system of claim 1, further comprising a click information data store (112) that maintains the at least one click representation and the sample set.

8. The system of claim 1, further comprising a fraud analysis component (402) that analyzes a click history corresponding to the click and infers fraud based at least in part upon the analyzed click history.

9. A method for monitoring a pay per click advertising system, comprising: generating a plurality of click representations, each of the plurality of click representations is associated with a click (604); receiving a randomized sample set associated with a subset of the plurality of click representations (608); and validating the sample set based at least in part upon correspondence of the sample set to the subset of the plurality of click representations and a sample selection parameter (610).

10. The method of claim 9, further comprising: analyzing the click for each of the subset of the plurality of click representations (612); and establishing a fraud based at least in part upon the analysis of the click (614).

11. The method of claim 10, further comprising notifying an advertising system of the fraud (804).

12. The method of claim 10, further comprising notifying an operator of the fraud (804).

13. The method of claim 9, the act of generating the plurality of click representations further comprises generating a hash value for the click (604).

14. The method of claim 13, the hash value is generated using at least one of Secure Hash Algorithm (SHA) and Message Digest 5 (MD5).

15. The method of claim 9, the sample set is generated based at least in part upon encryption of the plurality of click representations and selection from the encrypted plurality of click representations as a function of the sample selection parameter.

16. The method of claim 15, public key encryption is utilized to encrypt the plurality of click representations.

17. The method of claim 16, the act of validating the sample click identifier further comprises: decrypting the sample set (704); and comparing the decrypted sample set to the plurality of click representations (706).

18. The method of claim 9, the click includes customer information.

19. A system for facilitating detection of click fraud, comprising: means for generating a plurality of click representations (104), each of the plurality of click representations is associated with a click history; means for obtaining a randomized sample set associated with a subset of the plurality of click representations (114); means for verifying the sample set based upon correspondence to the subset of the plurality of click representations and a predetermined parameter (116); means for analyzing the click history for each of the subset of the plurality of click representations (400); and means for establishing a fraud based at least in part upon the analysis of the click history (402).

20. The system of claim 19, a one-way hash function is utilized in generation of the plurality of click representations (104).