WO2004084097A1 - Method and apparatus for detecting invalid clicks on the internet search engine - Google Patents

Info

Publication number
WO2004084097A1
Authority
WO
WIPO (PCT)
Prior art keywords
click
search
searcher
clicks
identifier
Prior art date
Application number
PCT/KR2004/000416
Other languages
French (fr)
Inventor
Jung Soo Ha
Seok Ho Kang
Woo Sung Lee
Original Assignee
Nhn Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nhn Corporation
Priority to JP2005518761A (patent JP4358188B2)
Publication of WO2004084097A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Definitions

  • the present invention relates to an Internet search engine server. More particularly, the present invention relates to a method and apparatus for detecting invalid clicks for a search item included in a search result web page that is provided by an Internet search engine server. Furthermore, the present invention relates to a method and apparatus for detecting invalid clicks, which can detect a variety of attempts for unjustly increasing the number of clicks for a search item and can immediately cope with these attempts.
  • searchers access Internet search engine servers such as NAVER, Yahoo and Lycos to request a search.
  • the Internet search service provider generates a search result web page including search items, which contain information associated with a search word input by the searcher, and then provides the searcher with the generated search result web page.
  • An example search result web page, generated when a searcher accesses a NAVER search engine server and inputs the search words "digital camera", is shown in Fig. 2.
  • Each item included in the search result web page is associated with a URL (Uniform Resource Locator).
  • An Internet search service provider determines a list sequence of search items by combining several references.
  • One of the references that have been widely used is the number of clicks for a specific search item by users. For example, if the number of clicks for a search item by users is great, the search item is displayed relatively at an upper portion of a search result web page.
  • a network information provider of a web server wants search items associated with himself or herself to be displayed at the top of a search result web page. For this reason, the network information provider may deliberately access the Internet search server and click the search item for his or her own web page multiple times so that it is displayed at the top of the search result web page. In some cases, a network information provider may continuously click the search item for his or her web page using a special program. Since such unjust clicks for a search item do not reflect the natural search results of users, an Internet search service provider has to detect such invalid clicks.
  • Overture Services, Inc. U.S.A.
  • an Internet search service provider provides services wherein a network information provider pays per click when searchers click on a search item in a search result web page, which is associated with the network information provider.
  • if a searcher intentionally clicks on a specific search item several times, the network information provider associated with the search item has to pay additional costs. Accordingly, even in this case, it is necessary to detect invalid clicks, which are made with the intention of increasing only the number of clicks without actually searching for the search item.
  • a further object of the present invention is to provide a method and apparatus for detecting invalid clicks, wherein the several identifiers provided in order to detect invalid clicks are difficult to counterfeit or forge.
  • the present invention provides a method for detecting invalid clicks in an Internet search engine, comprising the steps of generating a search result web page in response to a search request from a searcher, acquiring a page identifier corresponding to the generated web page, receiving a click for a search item included in the search result web page from the searcher, acquiring a site identifier corresponding to the clicked search item, and if the page identifier and the site identifier are coincident with a page identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
  • a method for detecting invalid clicks in an Internet search engine comprising the steps of generating a search result web page in response to a search request from a searcher, acquiring a session identifier included in a session cookie file stored in a terminal of the searcher, receiving a click for a search item included in the search result web page from the searcher, acquiring a site identifier corresponding to the clicked search item, and if the session identifier and the site identifier are coincident with a session identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
  • a method for detecting invalid clicks in an Internet search engine comprising the steps of receiving a click for a search item included in a search result web page from a searcher, acquiring a client IP address corresponding to a terminal of the searcher, acquiring a site identifier corresponding to the clicked search item, and if the client IP address and the site identifier are coincident with a client IP address and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
  • a method for detecting invalid clicks in an Internet search engine comprising the steps of generating a search result web page in response to a search request from a searcher, acquiring a terminal identifier corresponding to a terminal of the searcher, generating a user cookie file including the terminal identifier and then storing the user cookie file in the terminal of the searcher, receiving a click for a search item included in the search result web page from the searcher, acquiring a site identifier corresponding to the clicked search item, and if the terminal identifier and the site identifier are coincident with a terminal identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
  • an apparatus for detecting invalid clicks wherein if a searcher clicks on a search item included in a search result web page provided by an Internet search engine, at least one of an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click and cookie file information stored in the searcher's terminal, and URL information associated with the search item are received, and it is determined whether the click is invalid based on a predetermined reference according to the received information.
  • an apparatus for detecting invalid clicks comprising (1) a log storage unit that, in response to a click of a searcher for a search item included in a search result web page provided by an Internet search engine, stores a log regarding at least two of the following: an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal and URL information associated with the search item, (2) an invalid click pattern storage unit that stores an invalid click pattern associated with a pair of at least two of the following: the IP address of the searcher's terminal, the network address to which the searcher's terminal belongs, the search word associated with the search result web page, the information on the web browser of the searcher, the click time associated with the click, the cookie file information stored in the searcher's terminal, and URL information associated with the search item, and (3)
  • an apparatus for detecting invalid clicks comprising a click counter means for counting the number of clicks of a searcher per search item for a predetermined time interval for the search item included in a search result web page provided by an Internet search engine, an average click-number calculation means for calculating the average number of clicks, for the predetermined time interval, of search items belonging to a category to which the search item belongs, and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks.
  • an apparatus for detecting invalid clicks comprising a click counter means for counting the number of clicks of a searcher per search item for a predetermined time interval for the search item included in a search result web page provided by an Internet search engine, an average click-number calculation means for calculating the average number of clicks of a predetermined first number of search items located at the upper side of the search item and of a predetermined second number of search items located at the lower side of the search item, in the search result web page for the predetermined time interval, and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks.
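The two average-based comparisons described above can be sketched as follows. This is only an illustrative sketch: the function names, the list-based representation of click counts, and the use of ">=" for "greater by a predetermined difference" are assumptions, not details taken from the specification.

```python
from statistics import mean


def neighbour_clicks(page_clicks: list[int], index: int, above: int, below: int) -> list[int]:
    """Click counts of up to `above` items listed before `index` and `below` items after it."""
    start = max(0, index - above)
    upper = page_clicks[start:index]
    lower = page_clicks[index + 1:index + 1 + below]
    return upper + lower


def exceeds_average(item_clicks: int, peer_clicks: list[int], min_difference: float) -> bool:
    """True when the item's count is above the peer average by at least `min_difference`.

    `peer_clicks` may hold the counts of items in the same category, or the
    counts of neighbouring items on the same search result web page.
    """
    if not peer_clicks:
        return False  # nothing to compare against
    return item_clicks - mean(peer_clicks) >= min_difference


# Hypothetical click counts for five items on one result page, in display order.
page = [12, 10, 95, 9, 11]
peers = neighbour_clicks(page, index=2, above=2, below=2)
suspicious = exceeds_average(page[2], peers, min_difference=50)
```

Here the third item would be flagged because its count is far above the average of its four neighbours; the threshold of 50 is an arbitrary illustrative value.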
  • invalid clicks are difficult to accurately define, and the scope of the invalid clicks should be variously defined depending on embodiments and applications.
  • invalid clicks may refer to clicks that are made with the intention of increasing only the number of clicks without the intention of an actual search.
  • Fig. 1 is a diagram illustrating a network connection of an Internet search server including an apparatus for detecting invalid clicks and a client terminal according to the present invention.
  • Fig. 2 is a diagram illustrating a search result web page generated by an Internet search engine.
  • Fig. 3 is a block diagram illustrating the construction of an apparatus for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 4 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 5 shows an exemplary log file according to an embodiment of the present invention.
  • Figs. 6a and 6b are flowcharts illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 7 shows an exemplary log file according to an embodiment of the present invention.
  • Fig. 8 is a flowchart illustrating a method of generating a session identifier according to an embodiment of the present invention.
  • Fig. 9 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 10 shows an exemplary log file according to an embodiment of the present invention.
  • Fig. 11 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 12 is a block diagram illustrating the construction of a general-purpose computer system that may be adopted in constructing a search engine server and an apparatus for detecting invalid clicks according to the present invention.
  • Fig. 1 is a diagram illustrating a network connection of an Internet search server including an apparatus for detecting invalid clicks and a client terminal according to the present invention.
  • a searcher, or a cheater who attempts unjust clicks, accesses an Internet search server 104 through a client terminal 101 connected to the Internet 103.
  • the cheater attempts to increase the number of clicks for a search item by repeatedly clicking on that search item in a search result web page provided by the Internet search server 104. For example, in Fig.
  • a search item 202 is a search item associated with http://www.invalidclick.com and a cheater clicks on the search item 202 continuously in order for the search item 202 to be displayed at the top of a search result web page.
  • a cookie file 102 is a specific text file that is stored on a hard disk of the client terminal 101 by the search engine server 104 or by another web site when the client terminal 101 is connected to the search engine server 104 or the other web site.
  • each request for the web page is independent of other requests. Therefore, the web server has no information on which page it previously sent to the client terminal 101 or what work it previously performed together with the client terminal 101. Accordingly, a cookie file is provided in order to correlate requests that are processed independently in this way.
  • Such a cookie file serves to allow a web server to store information on a user in the user's computer. In this invention as well, several cookie files are used in order to detect invalid clicks. This will be described in detail later on.
  • a log file 105 is a file for storing several logs related to a user's click pattern.
  • several parameters are used in order to detect invalid clicks. After parameters associated with respective clicks are stored in the log file, it is determined whether the input clicks are invalid based on predetermined rules and patterns.
  • Fig. 3 is a block diagram illustrating the construction of an apparatus for detecting invalid clicks according to an embodiment of the present invention.
  • the apparatus for detecting invalid clicks 301 comprises a parameter input unit 304, a log storage unit 305, an invalid click pattern storage unit 306, an invalid click verification unit 307, an invalid click report unit 308 and an invalid click decision unit 309. If a searcher clicks on a search item included in a search result web page provided by an Internet search engine, several parameters 302 associated with the click are input to the parameter input unit 304.
  • the parameters are basic information for determining invalid clicks and include an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, URL information associated with the search item and the like.
  • a search request packet is transferred from the client terminal 101 to the Internet search engine server 104.
  • the search request packet is formatted according to the HTTP protocol and is carried in an Internet Protocol (IP) packet. Since the IP packet contains a source IP address field, the Internet search engine server 104 extracts the source IP address from the search request packet for which a click is requested, thus obtaining the IP address of the searcher's terminal.
  • IP Internet Protocol
  • the front part of the source IP address is a network address to which the searcher's terminal belongs.
  • the IP address is composed of 4 bytes.
  • the front part of the IP address is a network address for identifying a network to which a searcher's terminal belongs and the remaining parts thereof are addresses for identifying the searcher's terminal within the network. Accordingly, a network address is extracted from the source IP address.
  • the 3 bytes at the front part of the IP address are considered a network address and the network address is obtained from the source IP address. For example, if a source IP address is 123.45.67.89, 123.45.67 is extracted as a network address.
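The extraction of a network address from a source IP address can be illustrated with a short sketch. The function name and the `prefix_bytes` parameter are hypothetical; the 3-byte prefix and the example address 123.45.67.89 come from the text above.

```python
import ipaddress


def network_address(source_ip: str, prefix_bytes: int = 3) -> str:
    """Return the leading bytes of a dotted IPv4 address as the network address."""
    # Validate first; ipaddress raises ValueError for a malformed address.
    ipaddress.IPv4Address(source_ip)
    return ".".join(source_ip.split(".")[:prefix_bytes])


example = network_address("123.45.67.89")  # "123.45.67"
```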
  • a search word associated with a search result web page is a value input to the Internet search server 104 by the searcher.
  • Information on a web browser of the searcher is information on a web browser, which is installed in the client terminal 101 of the searcher and is used to access the Internet search server 104.
  • Information on the web browser includes the type of web browser, the version of the web browser, the product ID of the web browser, etc. In particular, even when a plurality of searchers have web browsers of the same type and the same version, the product IDs of their web browsers may be different. Thus, it becomes useful information for identifying a searcher's terminal.
  • some of the environment parameters of a client are included in the HTTP packet and transferred to a web server.
  • a program (a search engine program) of the web server can receive the environmental parameters and can use the parameters to detect invalid clicks.
  • Such environmental parameters include the following information:
  • REMOTE_HOST: domain name of a connected person
  • REMOTE_ADDR: IP address of a connected client host
  • REMOTE_USER: name of a connected person (displayed in case of a web server whose user authentication is set)
  • REMOTE_IDENT: ID of a connected person (displayed in case of a web server whose user authentication is set)
  • HTTP_USER_AGENT: registration information on a program driven by a connected person, usually the name of a browser
  • HTTP_ACCEPT_LANGUAGE: language used by a connected person
  • HTTP_REFERER: name of the document that calls a corresponding CGI program
  • REQUEST_METHOD: method for transmitting data to a server (GET, POST)
  • QUERY_STRING: parameters wherein transmitted data are stored when the data are sent in a GET mode
  • CONTENT_LENGTH: total length of transmitted data (the number of bytes) when the data are sent in a POST mode
  • CONTENT_TYPE: MIME type of data when the data are transmitted in a POST mode
  • AUTH_TYPE: parameters for confirming a user's authority
  • PATH_INFO: information on a current path of a called CGI program
  • PATH_TRANSLATED: information on a current path of resources in a web server required by web
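As a rough illustration of how a search engine program might collect some of the environmental parameters listed above, the sketch below reads them from a mapping of CGI-style variables. The function name and the particular selection of keys are assumptions made for illustration; in a CGI program the mapping would typically be `os.environ`.

```python
from typing import Mapping

# Assumed subset of variables that is useful for identifying a click.
CLICK_KEYS = ("REMOTE_ADDR", "HTTP_USER_AGENT", "HTTP_REFERER", "QUERY_STRING")


def click_parameters(environ: Mapping[str, str]) -> dict[str, str]:
    """Pick the click-related variables out of a CGI-style environment mapping."""
    # Missing variables are recorded as empty strings rather than raising.
    return {key: environ.get(key, "") for key in CLICK_KEYS}


# Hypothetical environment for one click request.
sample_env = {
    "REMOTE_ADDR": "123.45.67.89",
    "HTTP_USER_AGENT": "ExampleBrowser/1.0",
    "QUERY_STRING": "query=digital+camera",
}
params = click_parameters(sample_env)
```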
  • a click time associated with a click of a searcher is the time when the click input from the searcher is received. According to another embodiment of the present invention, another time associated with the searcher's click may be used. For example, the time when the searcher actually input the click at the client may be used.
  • Information on a cookie file stored in the searcher's terminal is obtained by the Internet search server 104, which accesses the cookie file 102 stored in the client terminal 101.
  • the cookie file 102 may be used in a variety of ways. This will be described in detail with reference to other embodiments.
  • URL information associated with the search item clicked by the searcher is stored in a search database (not shown) associated with the search engine server 104 and can be obtained by referring to that database.
  • the URL information may be a domain name of a web server or information containing a domain name, a directory and a file name. For instance, http://www.naver.com and http://www.naver.com/download are the same in terms of the domain name, since both are www.naver.com, but they are different URLs. In the present invention, an embodiment using the URL up to the domain name is described for convenience of explanation.
  • URL information includes all the embodiments according to this description.
  • logs including only some parameters are shown for convenience of explanation. According to another embodiment of the present invention, however, logs including all or some of the parameters 302 may be stored in the log storage unit 305.
  • the log storage unit 305 stores therein a log regarding at least two of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, and URL information associated with the search item.
  • the log storage unit 305 stores therein a log regarding at least one of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click and cookie file information stored in the searcher's terminal, and URL information associated with the search item.
  • the invalid click pattern storage unit 306 stores therein invalid click patterns or rules associated with a pair of at least two of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, and URL information associated with the search item.
  • a rule or pattern that "both the IP address of the searcher's terminal and URL information associated with a search item, among click inputs for 10 minutes, are coincident to each other" may be stored in the invalid click pattern storage unit 306. As such, the rule, etc.
  • the invalid click pattern storage unit 306 may be stored in the form of a file using a predetermined language according to a predetermined rule. Or, in case of the above rule or pattern, it may be stored in the form of a program so that it is determined to be an invalid click.
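One way to picture a stored rule such as "same IP address and same URL within 10 minutes" is as a small data record that names the parameters to compare and the time window. The sketch below is an assumption about representation only; the `Rule` class, the dictionary-based log entries, and the field names `ip`, `url` and `time` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Rule:
    fields: tuple[str, ...]  # parameters that must coincide, e.g. ("ip", "url")
    window: timedelta        # time interval within which clicks are compared


def matches(rule: Rule, click: dict, earlier_clicks: list[dict]) -> bool:
    """True if some earlier click agrees on every rule field within the window."""
    for other in earlier_clicks:
        same_values = all(click[f] == other[f] for f in rule.fields)
        close_in_time = abs(click["time"] - other["time"]) <= rule.window
        if same_values and close_in_time:
            return True
    return False


# The 10-minute IP/URL rule quoted above, expressed as data.
ip_url_rule = Rule(fields=("ip", "url"), window=timedelta(minutes=10))
```

Keeping rules as data rather than code would let an administrator add or adjust patterns without changing the decision logic, although the text also allows rules to be stored in the form of a program.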
  • the invalid click decision unit 309 determines whether the searcher's clicks are invalid based on the log stored in the log storage unit 305 and the invalid click pattern stored in the invalid click pattern storage unit 306.
  • the invalid click report unit 308 reports clicks pursuant to a predetermined reference among clicks, which are determined to be invalid by the invalid click decision unit 309, to the administrator 303 of the Internet search engine.
  • the invalid click report unit 308 reports all the clicks, which are determined to be invalid by the invalid click decision unit 309, to the administrator of the Internet search engine.
  • the predetermined reference is all clicks that have been determined to be invalid by the invalid click decision unit 309.
  • a field indicating whether to report a case corresponding to the rule or pattern to the administrator 303 is stored in every rule or pattern stored in the invalid click pattern storage unit 306. In this case, when a click corresponds to a rule for which the administrator 303 must be informed, the invalid click report unit 308 reports it to the administrator 303.
  • the invalid click verification unit 307 allows the administrator 303 to change clicks, which have been determined to be invalid by the invalid click decision unit 309, to valid clicks. Since the invalid click verification unit 307 can change clicks that are erroneously determined to be invalid clicks to valid clicks, invalid clicks can be more accurately determined.
  • Fig. 4 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • the Internet search server 104 receives a search request from a searcher (step 401). If the searcher accesses the Internet search server 104 and then inputs a search word, the search word is transferred to the Internet search server 104 as a search request packet.
  • the Internet search server 104 generates a search result web page in response to the search request (step 402). For example, as shown in Fig. 2, a search result web page including a plurality of search items corresponding to an input of a search word by a searcher is provided to the searcher.
  • a page identifier corresponding to the generated search result web page is acquired (step 403).
  • a page identifier is generated whenever a search result web page is generated.
  • the page identifier is an identifier for identifying the search result web page. Accordingly, if the same searcher requests a search by repeatedly inputting the same search word in a search window of the Internet search server 104, a new page identifier is allocated every time. Likewise, if a searcher clicks "reload" in a web browser on which a search result web page is displayed, the Internet search server 104 allocates a new page identifier for the search result web page since the search request packet is transferred from the client terminal 101 to the Internet search server 104.
  • the Internet search server 104 receives a click for a search item included in the search result web page from the searcher. When the click is received, the hyperlink for the search item first directs the request to the Internet search server 104, which performs the necessary processing and then lets the client terminal access the web site corresponding to the search item. For example, where http://www.naver.com/abc/*http://www.invalidclick.com/ is prepared as the hyperlink of the search item corresponding to "http://www.invalidclick.com/", a searcher who clicks the search item first accesses the search server at http://www.naver.com. The search server then lets the client terminal access http://www.invalidclick.com according to the URL located at the rear of the hyperlink.
  • the Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 405).
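The hyperlink in this example carries two parts: the search server address that receives the click, and the destination URL at the rear. A minimal sketch of separating them is shown below; the function name is hypothetical, and treating the first "*" as the separator is an assumption based only on the single example hyperlink given above.

```python
def split_click_hyperlink(hyperlink: str) -> tuple[str, str]:
    """Split a click hyperlink into (search-server part, destination URL)."""
    server_part, separator, destination = hyperlink.partition("*")
    if not separator or not destination:
        raise ValueError("hyperlink does not contain a destination URL")
    return server_part, destination


server, target = split_click_hyperlink(
    "http://www.naver.com/abc/*http://www.invalidclick.com/"
)
# The server would log the click and then redirect the client to `target`.
```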
  • the site identifier is an identifier for identifying a search item and is generated based on URL information corresponding to a search item.
  • the site identifier uses the URL information corresponding to the search item as it is.
  • URL information used as basic information for generating the site identifier may be a domain name of a web server or information containing a domain name, a directory and a file name. For instance, http://www.naver.com and http://www.naver.com/download are the same from the viewpoint of a domain name, since both are www.naver.com, but are different from the viewpoint of a URL.
  • URL information includes all the embodiments according to this description.
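The two granularities of site identifier discussed here, domain name only or the full URL, can be sketched with Python's standard `urllib.parse` module. The function name and the `domain_only` flag are illustrative assumptions.

```python
from urllib.parse import urlsplit


def site_identifier(url: str, domain_only: bool = True) -> str:
    """Derive a site identifier from the URL information of a search item."""
    if domain_only:
        host = urlsplit(url).hostname
        if host is None:
            raise ValueError(f"no host name in URL: {url!r}")
        return host
    return url  # use the URL information as it is


a = site_identifier("http://www.naver.com")
b = site_identifier("http://www.naver.com/download")
# a and b are both "www.naver.com"; with domain_only=False they differ.
```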
  • in step 406, the apparatus for detecting invalid clicks determines that the click is invalid if the page identifier and the site identifier are coincident with a page identifier and a site identifier associated with other clicks within a predetermined time interval.
  • Fig. 5 shows an exemplary log file according to an embodiment of the present invention. The embodiment of Fig. 4 will be described with reference to Fig. 5.
  • a page identifier 509 and a site identifier 510 are stored in a log file 500.
  • Reference numerals 501 to 508 indicate logs stored for respective click inputs.
  • a cheater accesses the Internet search server 104 to request a search.
  • the Internet search server 104 generates a search result web page and generates a page identifier corresponding to a search result web page, "nCe249sisnO".
  • the cheater continuously clicks a specific search item included in the search result web page. Even though a specific search item in a search result web page generated once is continuously clicked, a page identifier is not newly generated. Thus, the page identifier continues to have the same value.
  • the cheater may update the search result web page by clicking "reload" in the web browser.
  • a page identifier is newly allocated and a log regarding the page identifier is the log 505. Thereafter, a case where the cheater clicks on the same search item corresponds to the log 506.
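The page-identifier check walked through above can be sketched as a small detector that remembers the last click time for each (identifier, site identifier) pair. The class name, the use of plain seconds for click times, the 600-second window, and the second page identifier are assumptions for illustration; only "nCe249sisnO" appears in the text.

```python
class DuplicateClickDetector:
    """Flags a click when the same (identifier, site) pair was clicked recently."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self._last_seen: dict[tuple[str, str], float] = {}

    def is_invalid(self, identifier: str, site_id: str, click_time: float) -> bool:
        key = (identifier, site_id)
        previous = self._last_seen.get(key)
        self._last_seen[key] = click_time  # remember this click for later ones
        return previous is not None and click_time - previous <= self.window


detector = DuplicateClickDetector(window_seconds=600)
results = [
    detector.is_invalid("nCe249sisnO", "www.invalidclick.com", 0),   # first click
    detector.is_invalid("nCe249sisnO", "www.invalidclick.com", 5),   # repeat on same page
    detector.is_invalid("hypotheticalId2", "www.invalidclick.com", 10),  # after "reload"
    detector.is_invalid("hypotheticalId2", "www.invalidclick.com", 12),  # repeat again
]
```

Under these assumptions, the first click on a reloaded page is not flagged by the page identifier alone, since reloading yields a new page identifier. The same detector could be keyed on a session identifier, terminal identifier or client IP address instead.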
  • Figs. 6a and 6b are flowcharts illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • the Internet search server 104 receives a search request from a searcher (step 601).
  • the Internet search server 104 generates a search result web page in response to the search request (step 602).
  • An apparatus for determining invalid clicks determines whether a session cookie file is stored in the client terminal 101 that requested the search (step 603).
  • Step 603 to step 611 are processes for obtaining a session identifier.
  • the apparatus for determining invalid clicks generates a new session identifier (step
  • step 604 a session cookie file containing the session identifier is stored in the client terminal 101.
  • An updating time of the session identifier is also stored in the session cookie file.
  • the updating time is stored in the session cookie file (step 609).
  • the apparatus for determining invalid clicks determines whether the last-updated time of the session identifier contained in the session cookie file is within a predetermined time interval (step 606).
  • the apparatus for determining invalid clicks extracts a session identifier contained in the session cookie file (step 607).
  • the apparatus for determining invalid clicks generates a new session identifier (step 608).
  • a session identifier contained in the session cookie file is updated with a newly created session identifier (step 610).
  • An updating time of the session identifier is stored in the session cookie file (step 611).
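The session identifier steps above amount to: create a cookie if none exists, reuse the stored identifier if it was updated recently enough, and otherwise replace it. A minimal sketch follows, assuming the cookie is modelled as a dictionary with hypothetical keys `session_id` and `updated`, times are plain seconds, and identifiers are random hex strings (the text does not say how identifiers are generated).

```python
import secrets
from typing import Optional


def acquire_session_id(cookie: Optional[dict], now: float, max_age: float) -> tuple[str, dict]:
    """Return (session identifier, cookie contents to keep on the client)."""
    if cookie is None:
        # No session cookie yet: create an identifier and record its updating time.
        cookie = {"session_id": secrets.token_hex(8), "updated": now}
    elif now - cookie["updated"] > max_age:
        # Stored identifier is too old: replace it and record the new updating time.
        cookie = {"session_id": secrets.token_hex(8), "updated": now}
    # Otherwise the stored identifier is reused and the cookie is left unchanged.
    return cookie["session_id"], cookie


sid1, cookie1 = acquire_session_id(None, now=0, max_age=5)
sid2, cookie2 = acquire_session_id(cookie1, now=4, max_age=5)   # reused
sid3, cookie3 = acquire_session_id(cookie2, now=10, max_age=5)  # regenerated
```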
  • the Internet search server 104 receives a click for a search item included in the search result web page from the searcher (step 612).
  • the Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 613).
  • the apparatus for detecting invalid clicks determines that the click is an invalid click if the session identifier and the site identifier are coincident with a session identifier and a site identifier associated with other clicks within a predetermined time interval (step 614).
  • Fig. 7 shows an exemplary log file according to an embodiment of the present invention.
  • a click time 710, an updating time 711 of a session identifier, a session identifier 712 and a site identifier 713 are stored in a log file 700.
  • Reference numerals 701 to 708 indicate logs stored corresponding to respective click inputs.
  • a cheater accesses an Internet search server 104 to request a search.
  • the Internet search server 104 generates a search result web page.
  • the Internet search server 104 receives a click for a search item included in the search result web page.
  • the Internet search server 104 determines whether a session cookie file is stored in the client terminal 101. If it is determined that a session cookie file is not stored in the client terminal 101, the Internet search server 104 generates a new session identifier and then stores a session cookie file containing the session identifier and its updating time in the client terminal 101. In this embodiment, a session identifier "xigw9492" and an updating time "10:50:14" are recorded. Moreover, a click time, an updating time, a session identifier and a site identifier corresponding to a search item are stored in the log file 700 as the log 701. In the case where a session cookie file is generated for the first time, the session cookie file is generated upon the click and a session identifier is also generated at that time. Thus, the click time and the session identifier updating time are the same.
  • the Internet search server 104 determines whether a session cookie file is stored in the client terminal 101. Since the session cookie file generated in the above is already stored in the client terminal 101, the Internet search server 104 accesses a session cookie file stored in the client terminal 101.
  • the session cookie file stores a session identifier and the last-updated time of the session identifier therein. In this embodiment, a session identifier "xigw9492" and an updating time "10:50:14" are stored in the session cookie file.
  • the Internet search server 104 determines whether a click time for the search item from the searcher is within a predetermined time interval from the last-updated time associated with the session identifier.
  • a click time of the second click is "10:50:18". If the predetermined time interval is 5 seconds, the click time "10:50:18" is within the predetermined time interval from the last-updated time "10:50:14". As such, in this case, the session identifier stored in the session cookie file is used as a current session identifier and the session identifier of the session cookie file is not updated. As a result, in this case, for example, the log 702 is recorded.
  • the log 702 is an invalid click since it has the same session identifier and site identifier as the log 701.
  • the log 704 corresponds to a case where the cheater requests "reload". Likewise, in the event that the cheater requests "reload", reference is made to the session cookie file stored in the client terminal 101, and the session identifier is not updated since the last-updated time stored in the session cookie file is within the predetermined time interval. Accordingly, for example, the log 704 is recorded. It is determined that the log 704 is an invalid click since it is the same as the log 701. That is, according to this embodiment, it is possible to detect a case where a cheater clicks on the same search item after clicking on "reload" within a short time interval.
  • the log 705 corresponds to a case where a click for the same search item is received from a searcher different from the one associated with the log 701, the log 702 and the log 704. In this case, it is not determined to be an invalid click since a new session identifier is allocated.
  • the log 709 corresponds to a case where the same searcher as in the log 701 clicks on the same search item after a considerable time. In this case, as the click is received after a considerable time, it is not determined to be an invalid click.
  • a cheater clicks on the same search item after a predetermined time interval since a session identifier is generated it is determined to be an invalid click.
  • a case where a click is made within a predetermined time interval from the final click time for the same search item based on an invalid click decision may be an invalid click. This will be described in brief. If a click is received from a searcher, it is determined whether a session cookie file is stored in the terminal. If it is determined that a session cookie file is stored in the terminal, it is determined whether the click time for the search item from the searcher is within a predetermined time interval from the final click time associated with the session identifier.
  • a session identifier contained in the session cookie file is acquired and the final click time is updated with a click time for the search item.
  • the time reference is decided according to an object of detecting invalid clicks.
  • Fig. 8 is a flowchart illustrating a method of generating a session identifier according to an embodiment of the present invention.
  • Source data 801 are basic data for generating a session identifier 805.
  • the source data may be current time information, a search word, a product ID of a web browser of a searcher and the like.
  • the source data may be numbers that are randomly selected.
  • a hashing function 802 is applied to the source data 801 to generate an encoded string 803.
  • a checksum is then added to the encoded string 803 to generate a session identifier 805. The checksum serves to prevent a cheater from counterfeiting a session identifier.
  • the method for generating a session identifier may be applied to generate a page identifier, a site identifier, a terminal identifier to be described later, etc.
  • Fig. 9 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • the Internet search server 104 receives a click for a search item included in a search result web page from a searcher (step 901).
  • the Internet search server 104 acquires a client IP address corresponding to a terminal 101 of the searcher (step 902).
  • the IP address of the client may be extracted from a source IP address field of the received IP packet.
  • the Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 903).
  • an apparatus for searching invalid clicks determines that the click is invalid if the client IP address and the site identifier are coincident with a client IP address and a site identifier associated with other clicks within a predetermined time interval.
  • Fig. 10 shows an exemplary log file according to an embodiment of the present invention.
  • a click time 1010, a client IP address 1011 and a site identifier 1012 are stored in a log file 1000.
  • Reference numerals 1001 to 1009 designate logs stored corresponding to respective click inputs.
  • If a user continuously visits a web site within a short time, it is difficult to regard the visits as normal clicks. Thus, this case is determined to be an invalid click. For example, if the time reference is 5 minutes, the log 1002, the log 1004 and the log 1005, which have the same client IP address and the same site identifier as the log 1001, are determined to be invalid clicks. The click corresponding to the log 1009, which is made about 20 minutes later, is determined to be a valid click. In the event that invalid clicks are determined based on the client IP address, there are some points to be cautious about.
  • this embodiment is constructed in combination with an embodiment using other parameters such as a session identifier.
  • Fig. 11 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • An Internet search server 104 receives a search request from a searcher (step 1101) and generates a search result web page (step 1102).
  • the Internet search server 104 determines whether a user cookie file including a terminal identifier is stored in the terminal (step 1103).
  • As a result of the determination in step 1103, if the user cookie file including the terminal identifier is not stored in the terminal, the Internet search server 104 generates a terminal identifier (step 1104).
  • the Internet search server 104 generates the user cookie file including the terminal identifier and stores it in the terminal of the searcher (step 1105).
  • the Internet search server 104 extracts the terminal identifier from the user cookie file (step 1106).
  • the Internet search server 104 receives a click for a search item included in the search result web page from the searcher (step 1107) and then acquires a site identifier corresponding to the clicked search item (step 1108).
  • an apparatus for determining invalid clicks determines that if the terminal identifier and the site identifier are coincident with a terminal identifier and a site identifier associated with other clicks within a predetermined time interval, the click is invalid.
  • a client terminal uses a proxy server or an IP gateway, it is possible to discriminate the client's terminal using a terminal identifier.
  • a proxy server or an IP gateway it is possible to properly identify clicks from different clients.
  • If the number of clicks of searchers per search item for a predetermined time interval, for search items included in a search result web page provided by an Internet search engine, is greater than the average number of clicks of search items belonging to the category to which the search item belongs, it is considered to be an invalid click and is thus reported to an administrator.
  • An apparatus for detecting invalid clicks includes a click counter means for counting the number of clicks of searchers per search item for a predetermined time interval for search items included in a search result web page provided by an Internet search engine, an average click-number calculation means for calculating the average number of clicks for a predetermined time interval of search items belonging to the category to which the search item belongs, and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks. If the number of clicks per search item is greater by a predetermined difference than the average number of clicks, this fact is reported to the administrator through the invalid click report unit 308.
  • the number of clicks of searchers per a search item for a predetermined time interval for search items included in a search result web page provided by an Internet search engine is compared with the average number of clicks of a predetermined first number of search items located at the upper side of the search items and of a predetermined second number of search items located at the lower side of the search items in the search result web page for the predetermined time interval.
  • the number of clicks for a specific search item is compared with the number of clicks of the two search items located immediately above the specific search item and the two search items located immediately below the specific search item for the same period.
  • the method for determining the invalid click may be used independently or may be used in combination with other methods for determining invalid clicks.
  • a rule wherein a case where a client IP address, a page identifier and a site identifier corresponding to a search item are repeated within 5 minutes from the final click for the search item is invalid, may be stored in the invalid click pattern storage unit 306.
  • the Internet search server and the apparatus for identifying unjust clicks have been described together as a single unit. According to another embodiment of the present invention, however, it is to be noted that they can be separately implemented according to their functions and managed by different administrators.
  • components shown and described as separate components may be physically constructed in a single system or may be physically constructed in separate systems.
  • embodiments of the present invention further relate to computer readable media that include program instructions for performing various computer-implemented operations.
  • the media may also include, alone or in combination with the program instructions, data files, data structures, tables, and the like.
  • the media and program instructions may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts.
  • Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
  • the media may also be a transmission medium such as optical or metallic lines, wave guides, etc. including a carrier wave transmitting signals specifying the program instructions, data structures, etc.
  • Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
  • Fig. 12 is a block diagram illustrating the construction of a general-purpose computer system that may be adopted in constructing a search engine server and an apparatus for detecting invalid clicks according to the present invention.
  • the computer system includes any number of processors 1240 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 1260 (typically a random access memory, or "RAM"), primary storage 1270 (typically a read only memory, or "ROM").
  • primary storage 1270 acts to transfer data and instructions uni-directionally to the CPU and primary storage 1260 is used typically to transfer data and instructions in a bi-directional manner. Both of these primary storage devices may include any suitable type of the computer-readable media described above.
  • a mass storage device 1210 is also coupled bi-directionally to CPU 1240 and provides additional data storage capacity and may include any of the computer-readable media described above.
  • the mass storage device 1210 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk that is slower than primary storage.
  • a specific mass storage device such as a CD-ROM 1220 may also pass data uni-directionally to the CPU.
  • Processor 1240 is also coupled to an interface 1230 that includes one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers.
  • processor 1240 optionally may be coupled to a computer or telecommunications network using a network connection as shown generally at 1250. With such a network connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps.
  • the hardware elements described above may be configured (usually temporarily) to act as one or more software modules for performing the operations of this invention.
  • a method and apparatus for detecting invalid clicks for a search item included in a search result web page provided by an Internet search engine server are provided.
  • a method and apparatus for detecting invalid clicks which can detect a variety of attempts for unduly increasing the number of clicks for a search item and can immediately cope with these attempts, are provided. That is, if an unjust click attempt of a new pattern is found, the pattern or rule is stored in an invalid click pattern storage unit according to the present invention. It is thus possible to immediately cope with this unjust click attempt following a new pattern.
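The identifier generation described for Fig. 8 above (apply a hashing function to the source data, then append a checksum) might be sketched as follows. This is only an illustration under stated assumptions: the text does not name a hash function or a checksum scheme, so SHA-256 and a keyed HMAC checksum are stand-ins, and `SECRET_KEY`, `make_identifier` and `is_well_formed` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical server-side secret; not part of the source text.
SECRET_KEY = b"example-server-secret"


def _checksum(encoded: str, secret: bytes) -> str:
    """Keyed checksum over the encoded string (assumption: HMAC-SHA256, 8 hex digits)."""
    return hmac.new(secret, encoded.encode("utf-8"), hashlib.sha256).hexdigest()[:8]


def make_identifier(source_data: str, secret: bytes = SECRET_KEY) -> str:
    """Hash the source data into an encoded string, then append a checksum."""
    encoded = hashlib.sha256(source_data.encode("utf-8")).hexdigest()[:16]
    return encoded + _checksum(encoded, secret)


def is_well_formed(identifier: str, secret: bytes = SECRET_KEY) -> bool:
    """Recompute the checksum to reject identifiers that were not issued by the server."""
    if len(identifier) != 24:
        return False
    encoded, checksum = identifier[:16], identifier[16:]
    expected = _checksum(encoded, secret)
    return hmac.compare_digest(checksum.encode("utf-8"), expected.encode("utf-8"))
```

For example, `make_identifier("10:50:14|digital camera|browser-id")` combines a time, a search word and a browser identifier as source data; an identifier whose checksum part has been altered fails `is_well_formed`.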

Abstract

The present invention relates to an Internet search engine server. More particularly, the present invention relates to a method and apparatus for detecting invalid clicks for a search item included in a search result web page that is provided by an Internet search engine server. The present invention relates to a method for detecting invalid clicks in an Internet search engine, comprising the steps of generating a search result web page in response to a search request from a searcher; acquiring a page identifier corresponding to the generated web page; receiving a click for a search item included in the search result web page from the searcher; acquiring a site identifier corresponding to the clicked search item; and if the page identifier and the site identifier are coincident with a page identifier and a site identifier associated with other clicks within a predetermined time interval, determining the click to be invalid. According to the present invention, a method and apparatus for detecting invalid clicks that detect a variety of attempts to unduly increase the number of clicks for a search item and immediately cope with these attempts are provided.

Description

METHOD AND APPARATUS FOR DETECTING INVALID CLICKS ON THE INTERNET SEARCH ENGINE
Technical Field
The present invention relates to an Internet search engine server. More particularly, the present invention relates to a method and apparatus for detecting invalid clicks for a search item included in a search result web page that is provided by an Internet search engine server. Furthermore, the present invention relates to a method and apparatus for detecting invalid clicks, which can detect a variety of attempts for unjustly increasing the number of clicks for a search item and can immediately cope with these attempts.
Background Art
As the Internet has become widely used, the number of information sources such as web pages that are accessible via the Internet has grown in an arithmetic progression. Also, in order to find information among a great number of information sources, searchers access Internet search engine servers such as NAVER, Yahoo and Lycos to request a search. The Internet search service provider generates a search result web page including search items, which contain information associated with a search word input by the searcher, and then provides the searcher with the generated search result web page. For example, a search result web page obtained when a searcher accesses a NAVER search engine server and then inputs the search words "digital camera" is shown in Fig. 2. Each item included in the search result web page is associated with a URL (Uniform Resource Locator).
Since the number of search items associated with a single search word is extremely large, however, how such search items can be displayed on a search result web page and in what order is a very important problem for Internet search service providers. An Internet search service provider determines a list sequence of search items by combining several references. One of the references that have been widely used is the number of clicks for a specific search item by users. For example, if the number of clicks for a search item by users is great, the search item is displayed relatively at an upper portion of a search result web page. Even in the case where the Internet search service providers determine the list sequence of the search items by combining a plurality of parameters, if one of the parameters is the number of clicks by users, a search item having a large number of clicks is displayed relatively at an upper side of the search result web page.
Also, the higher a search item is displayed in the search result web page generated by an Internet search server, the higher the possibility that users may click on it and visit the web page. Thus, a network information provider of a web server wants search items associated with himself or herself to be displayed at the top of a search result web page. For this reason, in order for the search item for his or her web page to be displayed at the top of the search result web page, the network information provider may deliberately access the Internet search server to click the search item for his or her own web page multiple times. In some cases, the network information provider may continuously click the search item for his or her web page using a special program. Since such unjust clicks for a search item do not reflect the natural search results of users, an Internet search service provider has to detect such invalid clicks.
Among prior technologies, there are services in which a network information provider associated with a search item is charged based on the number of clicks per search item in a search result web page. Overture Services, Inc. (U.S.A.), an Internet search service provider, provides services wherein a network information provider pays per click when searchers click on a search item in a search result web page, which is associated with the network information provider. In this case, if a searcher intentionally clicks on a specific search item several times, the network information provider associated with the search item has to pay additional costs. Accordingly, even in this case, it is necessary to detect invalid clicks, which are made with the intention of increasing only the number of clicks without actually searching for the search item.
Disclosure of Invention
The present invention is conceived to solve the aforementioned problems in the prior art. An object of the present invention is to provide a method and apparatus for detecting invalid clicks for a search item included in a search result web page provided by an Internet search engine server. Another object of the present invention is to provide a method and apparatus for detecting invalid clicks, which can detect a variety of attempts for increasing the number of clicks for a search item unduly and can immediately cope with these attempts.
A further object of the present invention is to provide a method and apparatus for detecting invalid clicks, wherein several identifiers provided in order to detect invalid clicks are difficult to counterfeit or forge.
In order to accomplish the above objects and to solve the aforementioned problems of the prior art, the present invention provides a method for detecting invalid clicks in an Internet search engine, comprising the steps of generating a search result web page in response to a search request from a searcher, acquiring a page identifier corresponding to the generated web page, receiving a click for a search item included in the search result web page from the searcher, acquiring a site identifier corresponding to the clicked search item, and if the page identifier and the site identifier are coincident with a page identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
According to an aspect of the present invention, there is provided a method for detecting invalid clicks in an Internet search engine, comprising the steps of generating a search result web page in response to a search request from a searcher, acquiring a session identifier included in a session cookie file stored in a terminal of the searcher, receiving a click for a search item included in the search result web page from the searcher, acquiring a site identifier corresponding to the clicked search item, and if the session identifier and the site identifier are coincident with a session identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
According to an aspect of the present invention, there is provided a method for detecting invalid clicks in an Internet search engine, comprising the steps of receiving a click for a search item included in a search result web page from a searcher, acquiring a client IP address corresponding to a terminal of the searcher, acquiring a site identifier corresponding to the clicked search item, and if the client IP address and the site identifier are coincident with a client IP address and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
According to an aspect of the present invention, there is provided a method for detecting invalid clicks in an Internet search engine, comprising the steps of generating a search result web page in response to a search request from a searcher, acquiring a terminal identifier corresponding to a terminal of the searcher, generating a user cookie file including the terminal identifier and then storing the user cookie file in the terminal of the searcher, receiving a click for a search item included in the search result web page from the searcher, acquiring a site identifier corresponding to the clicked search item, and if the terminal identifier and the site identifier are coincident with a terminal identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
According to another aspect of the present invention, there is provided an apparatus for detecting invalid clicks, wherein if a searcher clicks on a search item included in a search result web page provided by an Internet search engine, at least one of an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click and cookie file information stored in the searcher's terminal, and URL information associated with the search item are received, and it is determined whether the click is invalid based on a predetermined reference according to the received information.
According to another aspect of the present invention, there is provided an apparatus for detecting invalid clicks, comprising (1) a log storage unit that, in response to a click of a searcher for a search item included in a search result web page provided by an Internet search engine, stores a log regarding at least two of the following: an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal and URL information associated with the search item, (2) an invalid click pattern storage unit that stores an invalid click pattern associated with a combination of at least two of the following: the IP address of the searcher's terminal, the network address to which the searcher's terminal belongs, the search word associated with the search result web page, the information on the web browser of the searcher, the click time associated with the click, the cookie file information stored in the searcher's terminal, and URL information associated with the search item, and (3) an invalid click decision unit that determines whether the click of the searcher is an invalid click based on the log stored in the log storage unit and the invalid click pattern stored in the invalid click pattern storage unit.
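The method aspects above (page identifier, session identifier, client IP address, terminal identifier) share one test: a click is invalid when the same identifier and the same site identifier recur within a predetermined time interval. A minimal sketch of that shared test follows. The class name `RepeatClickDetector` is hypothetical, and measuring the interval from the most recent matching click is an assumption, since the time reference may be chosen differently depending on the embodiment.

```python
from datetime import datetime, timedelta


class RepeatClickDetector:
    """Flags a click when the same (identifier, site identifier) pair recurs within an interval."""

    def __init__(self, interval: timedelta):
        self.interval = interval
        # (identifier, site_id) -> time of the most recent click for that pair
        self._last_seen = {}

    def is_invalid(self, identifier: str, site_id: str, click_time: datetime) -> bool:
        pair = (identifier, site_id)
        previous = self._last_seen.get(pair)
        # Assumption: the interval is measured from the most recent matching click.
        self._last_seen[pair] = click_time
        return previous is not None and click_time - previous <= self.interval
```

The `identifier` argument can be a page identifier, a session identifier, a client IP address or a terminal identifier; only the way it is acquired differs between the aspects.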
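The interplay of the log storage unit, the invalid click pattern storage unit and the invalid click decision unit might look like the following sketch. The names `InvalidClickPattern` and `is_invalid_click`, the dictionary-based log entries and the field names (`client_ip`, `url`, `click_time`) are illustrative assumptions, not taken from the source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InvalidClickPattern:
    """A stored rule: these log fields all repeat within the given interval."""
    fields: Tuple[str, ...]
    interval: timedelta


def is_invalid_click(click: Dict, log: List[Dict], patterns: List[InvalidClickPattern]) -> bool:
    """Decision unit: compare a new click against stored logs using every stored pattern."""
    for pattern in patterns:
        for entry in log:
            same_fields = all(
                f in entry and f in click and entry[f] == click[f]
                for f in pattern.fields
            )
            elapsed = click["click_time"] - entry["click_time"]
            if same_fields and timedelta(0) <= elapsed <= pattern.interval:
                return True
    return False
```

Because the rules are data rather than code, a newly discovered click pattern can be handled by appending another `InvalidClickPattern` to the stored list.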
According to another aspect of the present invention, there is provided an apparatus for detecting invalid clicks, comprising a click counter means for counting the number of clicks of a searcher per search item for a predetermined time interval for the search item included in a search result web page provided by an Internet search engine, an average click-number calculation means for calculating the average number of clicks, for the predetermined time interval, of search items belonging to a category to which the search item belongs, and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks.
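The category-average comparison can be illustrated with a short sketch. The function name `exceeds_category_average` and the dictionary inputs are hypothetical; whether the item itself is included in its category average is not stated in the text, and here it is included as an assumption.

```python
from statistics import mean
from typing import Dict


def exceeds_category_average(
    item_id: str,
    clicks: Dict[str, int],      # item -> clicks counted in the time interval
    categories: Dict[str, str],  # item -> category
    difference: float,           # the predetermined difference
) -> bool:
    """True if the item's click count exceeds its category average by more than `difference`."""
    category = categories[item_id]
    # Assumption: the average covers every item in the category, including the item itself.
    peers = [clicks.get(i, 0) for i, c in categories.items() if c == category]
    return clicks.get(item_id, 0) - mean(peers) > difference
```

With click counts of 100, 10 and 10 for three items in one category, the average is 40, so the first item exceeds it by 60 and would be reported if the predetermined difference is 50.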
According to another aspect of the present invention, there is provided an apparatus for detecting invalid clicks, comprising a click counter means for counting the number of clicks of a searcher per a search item for a predetermined time interval for the search item included in a search result web page provided by an Internet search engine, an average click-number calculation means for calculating the average number of clicks of a predetermined first number of search items located at the upper side of the search items and of a predetermined second number of search items located at the lower side of the search items, in the search result web page for the predetermined time interval, and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks.
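The neighbour-based comparison can be sketched in the same way. The function name `exceeds_neighbor_average` and the list representation of the search result page (click counts in display order, top first) are assumptions made for illustration.

```python
from typing import List


def exceeds_neighbor_average(
    ranked_clicks: List[int],  # click counts in display order, top of the page first
    index: int,                # position of the search item under test
    n_above: int,              # the predetermined first number
    n_below: int,              # the predetermined second number
    difference: float,         # the predetermined difference
) -> bool:
    """Compare an item's clicks with the average of its neighbours above and below."""
    above = ranked_clicks[max(0, index - n_above):index]
    below = ranked_clicks[index + 1:index + 1 + n_below]
    neighbors = above + below
    if not neighbors:
        # Nothing to compare against; this sketch then reports no anomaly.
        return False
    average = sum(neighbors) / len(neighbors)
    return ranked_clicks[index] - average > difference
```

Items near the top or bottom of the page simply have fewer neighbours on one side; how such edge cases should be treated is not specified in the text.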
Invalid clicks are difficult to accurately define, and the scope of the invalid clicks should be variously defined depending on embodiments and applications. However, invalid clicks may refer to clicks that are made with the intention of increasing only the number of clicks without the intention of an actual search.
Brief Description of Drawings
Fig. 1 is a diagram illustrating a network connection of an Internet search server including an apparatus for detecting invalid clicks and a client terminal according to the present invention.
Fig. 2 is a diagram illustrating a search result web page generated by an Internet search engine.
Fig. 3 is a block diagram illustrating the construction of an apparatus for detecting invalid clicks according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
Fig. 5 shows an exemplary log file according to an embodiment of the present invention.
Figs. 6a and 6b are flowcharts illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
Fig. 7 shows an exemplary log file according to an embodiment of the present invention.
Fig. 8 is a flowchart illustrating a method of generating a session identifier according to an embodiment of the present invention.
Fig. 9 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
Fig. 10 shows an exemplary log file according to an embodiment of the present invention.
Fig. 11 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
Fig. 12 is a block diagram illustrating the construction of a general-purpose computer system that may be adopted in constructing a search engine server and an apparatus for detecting invalid clicks according to the present invention.
Best Mode for Carrying Out the Invention
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a diagram illustrating a network connection of an Internet search server including an apparatus for detecting invalid clicks and a client terminal according to the present invention. A searcher, or a cheater who attempts unjust clicks, accesses an Internet search server 104 through a client terminal 101 connected to the Internet 103. The cheater attempts to increase the number of clicks for a search item by clicking several times on that search item in a search result web page provided by the Internet search server 104. For example, in Fig. 2, it can be assumed that a search item 202 is a search item associated with http://www.invalidclick.com and a cheater clicks on the search item 202 continuously in order for the search item 202 to be displayed at the top of a search result web page.
A cookie file 102 is a specific text file that is stored in a hard disk of the client terminal 101 by the search engine server 104 or by another web site when the client terminal 101 is connected to the search engine server 104 or the other web site. In the HTTP protocol used for connection to a web site, each request for a web page is independent of other requests. Therefore, the web server has no information on which pages it has previously sent to the client terminal 101 or what work it has previously performed together with the client terminal 101. Accordingly, in order to correlate requests that are processed independently in this way, a cookie file is provided. Such a cookie file serves to allow a web server to store information on a user in the user's computer. In the present invention as well, several cookie files are used in order to detect invalid clicks. This will be described in detail later on.
A log file 105 is a file for storing several logs related to a user's click pattern. In the present invention, in order to detect invalid clicks, several parameters are used. After parameters associated with respective clicks are stored in the log file, it is determined whether the input clicks are invalid based on predetermined rules and patterns.
Examples of the log files according to an embodiment of the present invention are shown in Figs. 5, 7, and 10.
Fig. 3 is a block diagram illustrating the construction of an apparatus for detecting invalid clicks according to an embodiment of the present invention.
The apparatus for detecting invalid clicks 301 according to an embodiment of the present invention comprises a parameter input unit 304, a log storage unit 305, an invalid click pattern storage unit 306, an invalid click verification unit 307, an invalid click report unit 308 and an invalid click decision unit 309. If a searcher clicks on a search item included in a search result web page provided by an Internet search engine, several parameters 302 associated with the click are input to the parameter input unit 304. The parameters are basic information for determining invalid clicks and include an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, URL information associated with the search item and the like.
If the searcher requests a search from the Internet search engine server 104, a search request packet is transferred from the client terminal 101 to the Internet search engine server 104. The search request packet is formatted according to the HTTP protocol and is carried in an Internet Protocol (IP) packet. Since the IP packet contains a source IP address field, the Internet search engine server 104 extracts the source IP address from the packet in which the search or click is requested, thus obtaining the IP address of the searcher's terminal.
The front part of the source IP address is a network address to which the searcher's terminal belongs. The IP address is composed of 4 bytes. The front part of the IP address is a network address for identifying a network to which a searcher's terminal belongs and the remaining parts thereof are addresses for identifying the searcher's terminal within the network. Accordingly, a network address is extracted from the source IP address. According to an embodiment of the present invention, the 3 bytes at the front part of the IP address are considered a network address and the network address is obtained from the source IP address. For example, if a source IP address is 123.45.67.89, 123.45.67 is extracted as a network address.
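The extraction described above can be sketched in a few lines. This is a minimal illustration that assumes a dotted-decimal IPv4 address and, as in the embodiment, treats the first 3 bytes as the network address. The function name and the prefix_octets parameter are hypothetical and do not come from the specification.

```python
def network_address(ip: str, prefix_octets: int = 3) -> str:
    """Return the leading octets of a dotted-decimal IPv4 address."""
    octets = ip.split(".")
    # Reject anything that is not four decimal fields in the range 0..255.
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {ip!r}")
    return ".".join(octets[:prefix_octets])


# The example from the description: 123.45.67.89 belongs to network 123.45.67.
print(network_address("123.45.67.89"))
```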
A search word associated with a search result web page is a value input to the Internet search server 104 by the searcher. Information on a web browser of the searcher is information on a web browser, which is installed in the client terminal 101 of the searcher and is used to access the Internet search server 104. Information on the web browser includes the type of web browser, the version of the web browser, product ID of the web browser, etc. In particular, even when a plurality of searchers has web browsers of the same type and the same version, the product IDs of their web browsers may be different. Thus, it becomes useful information for identifying a searcher's terminal.
According to the HTTP protocol used for connection to the web, some environment parameters of a client are included in the HTTP packet and transferred to a web server. A program of the web server (a search engine program) can receive these environment parameters and use them to detect invalid clicks.
Such environment parameters include the following information:
REMOTE_HOST: domain name of a connected person
REMOTE_ADDR: IP address of a connected client host
REMOTE_USER: name of a connected person (displayed in case of a web server whose user authentication is set)
REMOTE_IDENT: ID of a connected person (displayed in case of a web server whose user authentication is set)
HTTP_USER_AGENT: registration information on a program driven by a connected person, usually the name of a browser
HTTP_ACCEPT_LANGUAGE: language used by a connected person
HTTP_REFERER: name of the document that calls a corresponding CGI program
REQUEST_METHOD: method for transmitting data to a server (GET, POST)
QUERY_STRING: parameter in which transmitted data are stored when the data are sent in GET mode
CONTENT_LENGTH: total length of transmitted data (the number of bytes) when the data are sent in POST mode
CONTENT_TYPE: MIME type of the data when the data are transmitted in POST mode
AUTH_TYPE: parameter for confirming a user's authority
SERVER_NAME: domain name of the current server
SERVER_SOFTWARE: name of the web server program currently installed on the server
SERVER_PROTOCOL: name and version of the web protocol currently used by the server
SERVER_PORT: port number currently used by the server (in case of HTTP, usually 80)
PATH_INFO: information on the current path of the called CGI program
PATH_TRANSLATED: information on the current path of the resources in the web server required by the web
SCRIPT_NAME: name of the CGI program that is currently being called
HTTP_ACCEPT: types of resources that can currently be received over HTTP
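As a hedged illustration of how a server-side program might read some of the parameters listed above, the sketch below picks a few of them out of a mapping of environment parameters. In a CGI program written in Python that mapping would typically be os.environ. The function name click_parameters and the keys of the returned dictionary are assumptions made for this example.

```python
def click_parameters(environ: dict) -> dict:
    """Collect a few request fields that are useful for click analysis.

    `environ` is any mapping of environment parameter names to values.
    Missing parameters are returned as empty strings.
    """
    return {
        "ip": environ.get("REMOTE_ADDR", ""),
        "browser": environ.get("HTTP_USER_AGENT", ""),
        "referer": environ.get("HTTP_REFERER", ""),
        "query": environ.get("QUERY_STRING", ""),
    }


# Hypothetical request environment, used only for demonstration.
sample = {
    "REMOTE_ADDR": "123.45.67.89",
    "HTTP_USER_AGENT": "ExampleBrowser/1.0",
    "QUERY_STRING": "q=example",
}
print(click_parameters(sample))
```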
A click time associated with a click of a searcher is the time when a click input from the searcher is received. According to another embodiment of the present invention, another time associated with the click of the searcher may be used. For example, the time when the searcher actually inputs the click at the client may be used.
Information on a cookie file stored in the searcher's terminal is obtained by the Internet search server 104, which accesses the cookie file 102 stored in the client terminal 101. In the present invention, the cookie file 102 may be used for a variety of uses. This will be described in detail with reference to other embodiments.
URL information associated with the search item clicked by the searcher is stored in a search database (not shown) associated with the search engine server 104 and can be obtained by referring to that database. The URL information may be a domain name of a web server, or information containing a domain name, a directory and a file name. For instance, http://www.naver.com and http://www.naver.com/download are the same in view of the domain name, since both are www.naver.com, but they are different URLs. In the present invention, an embodiment using the URL up to the domain name is described for convenience of explanation. However, the present invention also covers embodiments in which the URL information includes a domain name, a directory and a file name, so that URLs having the same domain name but different directories are considered different search items. It is to be understood that URL information in the present invention includes all the embodiments according to this description.
Furthermore, in addition to the aforementioned parameters, other parameters, which are useful for detection of invalid clicks, may be used to detect invalid clicks within the spirit of the present invention.
The various parameters 302 described above are input to the parameter input unit 304 and are then stored in the log storage unit 305. According to the present invention, examples of the logs stored in the log storage unit are shown in Figs. 5, 7 and 10. In these drawings, logs including only some parameters are shown for convenience of explanation. According to another embodiment of the present invention, however, logs including all or some of the parameters 302 may be stored in the log storage unit 305.
According to an embodiment of the present invention, the log storage unit 305 stores therein a log regarding at least two of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, and URL information associated with the search item. According to a preferred embodiment of the present invention, the log storage unit 305 stores therein a log regarding at least one of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click and cookie file information stored in the searcher's terminal, and URL information associated with the search item.
The invalid click pattern storage unit 306 stores therein invalid click patterns or rules associated with a pair of at least two of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, and URL information associated with the search item. For example, a rule or pattern that "among click inputs within 10 minutes, both the IP address of the searcher's terminal and the URL information associated with a search item coincide with each other" may be stored in the invalid click pattern storage unit 306. The rules for determining invalid clicks, which are stored in the invalid click pattern storage unit 306, may be stored in the form of a file using a predetermined language according to a predetermined rule. Alternatively, the above rule or pattern may be stored in the form of a program that determines a matching click to be invalid.
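One way to picture a rule stored as data is the sketch below. It is only an illustration under stated assumptions: the Rule class, its field names and the dictionary layout of a click log are hypothetical, and times are taken to be seconds. The example rule encodes the 10-minute rule on IP address and URL information given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    fields: tuple          # log fields that must all coincide
    window_seconds: int    # time interval within which the rule applies


def matches(rule: Rule, click: dict, earlier: dict) -> bool:
    """True if `click` coincides with `earlier` on every rule field
    and the two click times lie within the rule's time interval."""
    same = all(click[f] == earlier[f] for f in rule.fields)
    close = abs(click["time"] - earlier["time"]) <= rule.window_seconds
    return same and close


# "Same IP address and same URL within 10 minutes" expressed as data.
rule = Rule(fields=("ip", "url"), window_seconds=600)
first = {"ip": "123.45.67.89", "url": "www.invalidclick.com", "time": 0}
second = {"ip": "123.45.67.89", "url": "www.invalidclick.com", "time": 300}
print(matches(rule, second, first))
```

Keeping the rule as data rather than code lets an operator add or adjust rules without changing the decision logic, which is one reading of the file-based storage mentioned in the description.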
The invalid click decision unit 309 determines whether the searcher's clicks are invalid based on the log stored in the log storage unit 305 and the invalid click pattern stored in the invalid click pattern storage unit 306.
The invalid click report unit 308 reports to the administrator 303 of the Internet search engine those clicks that meet a predetermined reference among the clicks determined to be invalid by the invalid click decision unit 309. According to an embodiment of the present invention, the invalid click report unit 308 reports all the clicks determined to be invalid by the invalid click decision unit 309 to the administrator of the Internet search engine. In this case, the predetermined reference covers all clicks that have been determined to be invalid by the invalid click decision unit 309. According to another embodiment of the present invention, each rule or pattern stored in the invalid click pattern storage unit 306 includes a field indicating whether a case corresponding to the rule or pattern is to be reported to the administrator 303. In this case, when a click corresponds to a rule for which the administrator 303 must be informed, the invalid click report unit 308 reports it to the administrator 303.
The invalid click verification unit 307 allows the administrator 303 to change clicks, which have been determined to be invalid by the invalid click decision unit 309, to valid clicks. Since the invalid click verification unit 307 can change clicks that are erroneously determined to be invalid clicks to valid clicks, invalid clicks can be more accurately determined.
Fig. 4 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
The Internet search server 104 receives a search request from a searcher (step 401). If the searcher accesses the Internet search server 104 and then inputs a search word, the search word is transferred to the Internet search server 104 as a search request packet.
The Internet search server 104 generates a search result web page in response to the search request (step 402). For example, as shown in Fig. 2, a search result web page including a plurality of search items corresponding to an input of a search word by a searcher is provided to the searcher.
A page identifier corresponding to the generated search result web page is acquired (step 403). A page identifier is generated whenever a search result web page is generated. The page identifier is an identifier for identifying the search result web page. Accordingly, if the same searcher requests a search by repeatedly inputting the same search word in a search window of the Internet search server 104, a new page identifier is allocated every time. Likewise, if a searcher clicks "reload" in a web browser on which a search result web page is displayed, the Internet search server 104 allocates a new page identifier for the search result web page since the search request packet is transferred from the client terminal 101 to the Internet search server 104. It may be that different page identifiers are allocated to search result web pages that look the same at first sight. However, if a new search request is received from the client terminal 101, a search result web page is newly generated at that time. A search result web page different from a previous search result web page can be thus provided.
In step 404, the Internet search server 104 receives a click for a search item included in the search result web page from the searcher. The hyperlink for the search item is arranged so that a click first reaches the Internet search server 104, which performs the necessary processes and then allows the client terminal to access the web site corresponding to the search item. For example, where http://www.naver.com/abc/*http://www.invalidclick.com/ is prepared as the hyperlink of a search item corresponding to "http://www.invalidclick.com/", a searcher who clicks the search item first accesses the search server at http://www.naver.com. The search server then allows the client terminal to access http://www.invalidclick.com according to the URL located at the rear of the hyperlink.
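The hyperlink in this example can be split into the part handled by the search server and the target URL at its rear. The sketch below assumes that the "*" character seen in the example separates the two parts. The specification does not define the hyperlink syntax beyond this one example, so the marker and the function name split_redirect are assumptions.

```python
def split_redirect(hyperlink: str, marker: str = "*") -> tuple:
    """Split 'search-server-part*target-url' into (server_part, target_url)."""
    # partition() splits on the first occurrence of the marker only.
    server_part, sep, target = hyperlink.partition(marker)
    if not sep or not target:
        raise ValueError("hyperlink has no redirect target")
    return server_part, target


server_part, target = split_redirect(
    "http://www.naver.com/abc/*http://www.invalidclick.com/"
)
print(server_part)  # part handled by the search server, where the click is logged
print(target)       # site the client terminal is finally sent to
```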
The Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 405). The site identifier is an identifier for identifying a search item and is generated based on URL information corresponding to the search item. According to another embodiment of the present invention, the URL information corresponding to the search item is used as the site identifier as it is. The URL information used as basic information for generating the site identifier may be a domain name of a web server, or information containing a domain name, a directory and a file name. For instance, http://www.naver.com and http://www.naver.com/download are the same from the viewpoint of the domain name, since both are www.naver.com, but are different from the viewpoint of the URL. In the present invention, an embodiment using URLs up to the domain name is described for convenience of explanation. However, the present invention also covers embodiments in which the URL information includes not only a domain name but also a directory and a file name, so that URLs having the same domain name but different directories are considered different search items. It is to be understood that URL information in the present invention includes all the embodiments according to this description.
In step 406, the apparatus for detecting invalid clicks determines that the click is invalid if the page identifier and the site identifier are coincident with a page identifier and a site identifier associated with another click within a predetermined time interval.
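The decision in step 406 can be sketched as a lookup keyed on the pair of page identifier and site identifier. This is a minimal illustration, not the apparatus itself. It assumes clicks arrive in time order with times in seconds, and it treats the first click of a coinciding group as valid, flagging only the later ones, which is a choice made for this sketch. The site identifiers "site-A" and "site-B" and the second page identifier are hypothetical.

```python
def flag_invalid(clicks, window_seconds):
    """clicks: list of (time_seconds, page_id, site_id) in arrival order.

    Returns one bool per click: True where the click has the same page
    identifier and site identifier as an earlier click that was received
    no more than `window_seconds` before it.
    """
    last_seen = {}   # (page_id, site_id) -> time of the most recent click
    flags = []
    for t, page_id, site_id in clicks:
        key = (page_id, site_id)
        prev = last_seen.get(key)
        flags.append(prev is not None and t - prev <= window_seconds)
        last_seen[key] = t
    return flags


clicks = [
    (0, "nCe249sisnO", "site-A"),
    (2, "nCe249sisnO", "site-A"),   # repeat on the same page
    (3, "nCe249sisnO", "site-B"),   # different search item
    (4, "nCe249sisnO", "site-A"),   # repeat again
    (5, "page-after-reload", "site-A"),  # new page identifier after a reload
]
print(flag_invalid(clicks, window_seconds=10))
```

The last click is not flagged because a reload yields a new page identifier, which is the limitation of this embodiment that the session identifier is meant to address.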
Fig. 5 shows an exemplary log file according to an embodiment of the present invention. The embodiment of Fig. 4 will be described with reference to Fig. 5.
According to the present invention, whenever a click for a search item is received from a user, a page identifier 509 and a site identifier 510 are stored in a log file 500. Reference numerals 501 to 508 indicate logs stored for respective click inputs.
A cheater accesses the Internet search server 104 to request a search. The Internet search server 104 generates a search result web page and generates a page identifier corresponding to a search result web page, "nCe249sisnO". The cheater continuously clicks a specific search item included in the search result web page. Even though a specific search item in a search result web page generated once is continuously clicked, a page identifier is not newly generated. Thus, the page identifier continues to have the same value.
It is thus determined that the log 501, the log 502 and the log 504 having the same page identifier and the same site identifier, among logs for clicks input for a predetermined time interval, are invalid clicks. According to an embodiment of the present invention, it is determined that one of the coincident logs is a valid click and the remaining logs are invalid clicks.
The cheater may update the search result web page by clicking "reload" in the web browser. In this case, a page identifier is newly allocated and a log regarding the page identifier is the log 505. Thereafter, a case where the cheater clicks on the same search item corresponds to the log 506.
Therefore, according to this embodiment, if the cheater clicks on "reload" and then clicks on the same search item (the case of the log 506), the click is not determined to be an invalid click. A method for determining such a "reload" case to be an invalid click will be described in the following embodiment with reference to Figs. 6a and 6b.
Figs. 6a and 6b are flowcharts illustrating a method for detecting invalid clicks according to an embodiment of the present invention. The Internet search server 104 receives a search request from a searcher (step 601). The Internet search server 104 generates a search result web page in response to the search request (step 602).
An apparatus for determining invalid clicks determines whether a session cookie file is stored in the client terminal 101 that requested the search (step 603). Step 603 to step 611 are processes for obtaining a session identifier.
If it is determined that a session cookie file is not stored in the client terminal 101, the apparatus for determining invalid clicks generates a new session identifier (step 604). In step 605, a session cookie file containing the session identifier is stored in the client terminal 101. The updating time of the session identifier is also stored in the session cookie file (step 609).
If it is determined in step 603 that the session cookie file is stored in the client terminal 101, the apparatus for determining invalid clicks determines whether the last-updated time of the session identifier contained in the session cookie file is within a predetermined time interval (step 606).
As a result of the determination in step 606, if the last-updated time of the session identifier contained in the session cookie file is within the predetermined time interval, the apparatus for determining invalid clicks extracts a session identifier contained in the session cookie file (step 607). As a result of the determination in step 606, if the last-updated time of the session identifier contained in the session cookie file is not within the predetermined time interval, the apparatus for determining invalid clicks generates a new session identifier (step 608). A session identifier contained in the session cookie file is updated with a newly created session identifier (step 610). An updating time of the session identifier is stored in the session cookie file (step 611).
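Steps 603 to 611 can be sketched as a single function. This is an illustration under stated assumptions: the session cookie is modelled as a small dictionary (or None when no cookie is stored), times are in seconds, and a random token stands in for the session identifier. The function name and dictionary keys are hypothetical.

```python
import secrets


def acquire_session(cookie, now, max_age_seconds):
    """cookie: None, or {'session_id': str, 'updated': float}.

    Returns (session_id, cookie_to_store).
    """
    if cookie is not None and now - cookie["updated"] <= max_age_seconds:
        # Steps 606-607: the identifier is still fresh, so it is reused
        # and the cookie is left unchanged.
        return cookie["session_id"], cookie
    # Steps 604-605 and 608-611: no cookie, or the identifier is too old.
    # A new identifier is generated and stored with its updating time.
    session_id = secrets.token_hex(8)
    return session_id, {"session_id": session_id, "updated": now}


sid1, cookie = acquire_session(None, now=100.0, max_age_seconds=5)
sid2, cookie2 = acquire_session(cookie, now=103.0, max_age_seconds=5)
print(sid1 == sid2)  # the identifier is reused within the interval
```

Note that in this flow the updating time is not refreshed when the identifier is reused, so the interval is measured from the moment the identifier was generated.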
The Internet search server 104 receives a click for a search item included in the search result web page from the searcher (step 612).
The Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 613). The apparatus for detecting invalid clicks determines that the click is an invalid click if the session identifier and the site identifier are coincident with a session identifier and a site identifier associated with other clicks within a predetermined time interval (step 614).
Fig. 7 shows an exemplary log file according to an embodiment of the present invention.
In this embodiment, whenever a click for a search item is received from a user, a click time 710, an updating time 711 of a session identifier, a session identifier 712 and a site identifier 713 are stored in a log file 700. Reference numerals 701 to 708 indicate logs stored corresponding to respective click inputs.
A cheater accesses the Internet search server 104 to request a search. The Internet search server 104 generates a search result web page. The Internet search server 104 receives a click for a search item included in the search result web page.
The Internet search server 104 determines whether a session cookie file is stored in the client terminal 101. If it is determined that a session cookie file is not stored in the client terminal 101, the Internet search server 104 generates a new session identifier and then stores a session cookie file containing the session identifier and its updating time in the client terminal 101. In this embodiment, a session identifier "xigw9492" and an updating time "10:50:14" are recorded. Moreover, a click time, an updating time, a session identifier and a site identifier corresponding to a search item are stored in the log file 700 as the log 701. Where a session cookie file is generated for the first time, the session cookie file is generated upon the click and the session identifier is also generated at that time. Thus, the click time and the session identifier updating time are the same.
The cheater then clicks on the same search item in the same search result page. The Internet search server 104 determines whether a session cookie file is stored in the client terminal 101. Since the session cookie file generated above is already stored in the client terminal 101, the Internet search server 104 accesses the session cookie file stored in the client terminal 101. The session cookie file stores a session identifier and the last-updated time of the session identifier therein. In this embodiment, a session identifier "xigw9492" and an updating time "10:50:14" are stored in the session cookie file. The Internet search server 104 determines whether the click time for the search item from the searcher is within a predetermined time interval from the last-updated time associated with the session identifier. In this embodiment, the click time of the second click is "10:50:18". If the predetermined time interval is 5 seconds, the click time "10:50:18" is within the predetermined time interval from the last-updated time "10:50:14". In this case, the session identifier stored in the session cookie file is used as the current session identifier and the session identifier of the session cookie file is not updated. As a result, for example, the log 702 is recorded.
It is thus determined that the log 702 is an invalid click since it has the same session identifier and site identifier as the log 701.
The log 704 corresponds to a case where the cheater requests "reload". In this case as well, the session cookie file stored in the client terminal 101 is referred to, and the session identifier is not updated since the last-updated time stored in the session cookie file is within the predetermined time interval. Accordingly, for example, the log 704 is recorded. It is determined that the log 704 is an invalid click since it has the same session identifier and site identifier as the log 701. That is, according to this embodiment, it is possible to detect a case where a cheater clicks on the same search item after clicking on "reload" within a short time interval.
The log 705 corresponds to a case where a click for the same search item is received from a searcher different from the searcher of the log 701, the log 702 and the log 704. In this case, the click is not determined to be an invalid click since a new session identifier is allocated. The log 709 corresponds to a case where the same searcher as in the log 701 clicks on the same search item after a considerable time. In this case, since the click is received after a considerable time, it is not determined to be an invalid click.
According to this embodiment, a case where a cheater clicks on the same search item after a predetermined time interval since a session identifier is generated is determined to be an invalid click.
Likewise, according to another embodiment of the present invention, the invalid click decision may be based on whether a click is made within a predetermined time interval from the final click time for the same search item. This will be described in brief. If a click is received from a searcher, it is determined whether a session cookie file is stored in the terminal. If it is determined that a session cookie file is stored in the terminal, it is determined whether the click time for the search item from the searcher is within a predetermined time interval from the final click time associated with the session identifier.
If it is determined that the click time for the search item is within the predetermined time interval, a session identifier contained in the session cookie file is acquired and the final click time is updated with a click time for the search item.
If it is determined that the click time for the search item is not within the predetermined time interval, a new session identifier is generated to update the session identifier contained in the session cookie file. Further, the final click time is updated with the click time for the search item. For example, in Fig. 7, where there are a plurality of clicks for the same search item from the same client terminal, if a click made more than 5 seconds after the final click is regarded as valid, the click associated with the log 704 is determined to be valid since it is made at "10:50:31", 13 seconds after the previous final click time "10:50:18". According to a preferred embodiment of the present invention, the time reference is decided according to the purpose of detecting invalid clicks.
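The variant based on the final click time differs from the earlier one only in what is stored and when it is refreshed: the stored time is updated on every click, so the interval is measured from the most recent click. The sketch below is a hedged illustration with hypothetical names. The cookie is modelled as a dictionary, and times are seconds counted from "10:50:14".

```python
import secrets


def on_click(cookie, click_time, window_seconds):
    """cookie: None, or {'session_id': str, 'final_click': float}.

    Returns (session_id, cookie_to_store). The final click time is
    refreshed on every click, whether or not the identifier is reused.
    """
    if cookie is not None and click_time - cookie["final_click"] <= window_seconds:
        session_id = cookie["session_id"]      # still the same session
    else:
        session_id = secrets.token_hex(8)      # too long since the final click
    return session_id, {"session_id": session_id, "final_click": click_time}


# Clicks at 10:50:14, 10:50:18 and 10:50:31 with a 5-second reference.
sid_a, cookie = on_click(None, 0, 5)
sid_b, cookie = on_click(cookie, 4, 5)     # 4 s after the final click: same session
sid_c, cookie = on_click(cookie, 17, 5)    # 13 s after the final click: new session
print(sid_a == sid_b, sid_b == sid_c)
```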
Fig. 8 is a flowchart illustrating a method of generating a session identifier according to an embodiment of the present invention.
The session identifier must be uniquely allocated so that it is discriminated from other session identifiers, and must be difficult to counterfeit or forge. If the session identifier is merely unique, there is a possibility that a cheater may fabricate a session identifier and store it in a session cookie, or may unduly increase the number of clicks using a program that continuously clicks a search item while changing the session identifier. Source data 801 are basic data for generating a session identifier 805. The source data may be current time information, a search word, a product ID of a web browser of a searcher and the like. The source data may also be numbers that are randomly selected. A hashing function 802 is applied to the source data 801 to generate an encoded string 803. A checksum is then added to the encoded string 803 to generate the session identifier 805. The checksum serves to prevent a cheater from counterfeiting a session identifier.
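The hash-then-checksum construction can be sketched as follows. The specification does not name a hashing function or a checksum algorithm, so SHA-256, CRC-32 and the server-side secret mixed into the checksum are all assumptions of this sketch, as are the function names and the identifier length.

```python
import hashlib
import zlib


def make_identifier(source: str, secret: str) -> str:
    """Hash the source data, then append a checksum over hash + secret."""
    encoded = hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]
    checksum = zlib.crc32((encoded + secret).encode("utf-8")) & 0xFFFFFFFF
    return f"{encoded}{checksum:08x}"          # 16 + 8 hexadecimal characters


def is_well_formed(identifier: str, secret: str) -> bool:
    """Recompute the checksum and compare it with the one in the identifier."""
    encoded, tail = identifier[:16], identifier[16:]
    expected = zlib.crc32((encoded + secret).encode("utf-8")) & 0xFFFFFFFF
    return len(identifier) == 24 and tail == f"{expected:08x}"


# Hypothetical source data: time, search word and browser product ID.
ident = make_identifier("10:50:14|search word|browser-product-id", "server-secret")
print(is_well_formed(ident, "server-secret"))
```

CRC-32 is not a cryptographic function, so this sketch only shows the shape of the scheme. A keyed message authentication code such as HMAC would be a stronger choice for resisting forgery.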
The method for generating a session identifier according to this embodiment may be applied to generate a page identifier, a site identifier, a terminal identifier to be described later, etc.
Fig. 9 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention. The Internet search server 104 receives a click for a search item included in a search result web page from a searcher (step 901). The Internet search server 104 acquires a client IP address corresponding to a terminal 101 of the searcher (step 902).
The IP address of the client may be extracted from a source IP address field of the received IP packet. The Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 903).
In step 904, the apparatus for detecting invalid clicks determines that the click is invalid if the client IP address and the site identifier are coincident with a client IP address and a site identifier associated with another click within a predetermined time interval.
Fig. 10 shows an exemplary log file according to an embodiment of the present invention.
In this embodiment, whenever a click for a search item is received from a user, a click time 1010, a client IP address 1011 and a site identifier 1012 are stored in a log file 1000. Reference numerals 1001 to 1009 designate logs stored corresponding to respective click inputs.
If a user of the same client terminal repeatedly clicks on the same search item, there is a high possibility that the clicks are invalid if they are repeated within a predetermined time interval. However, it is often the case that a user of the same client terminal clicks on the same search item after a considerable time. In other words, a user tends to revisit a web site in which the user is very interested.
If a user repeatedly visits a web site within a short time, it is difficult to regard the clicks as normal. Thus, this case is determined to be an invalid click. For example, if the time reference is 5 minutes, the log 1002, the log 1004 and the log 1005 having the same client IP address and the same site identifier as the log 1001 are determined to be invalid clicks. The click related to the log 1009, made about 20 minutes later, is determined to be a valid click. When invalid clicks are determined based on the client IP address, some caution is required. Where client terminals use a proxy server or an IP gateway, there is a danger that a click on the same search item from a different client terminal may be determined to be an invalid click. Accordingly, it is preferred that this embodiment be combined with an embodiment using other parameters such as a session identifier.
Conversely, there is a case where the client IP addresses of client terminals that click on the same search item are different but their network addresses are the same. This corresponds to a case where several persons continuously attempt unjust clicks at one place, or click on the same search item using a program while changing their source IP addresses. In this case, if the network addresses of the client terminals that click on the same search item are the same and other conditions (for example, a condition that the number of clicks is greater than the average number of clicks within a directory to which the search item belongs) are satisfied, the click may be determined to be invalid.
Fig. 11 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
An Internet search server 104 receives a search request from a searcher (step 1101) and generates a search result web page (step 1102).
The Internet search server 104 determines whether a user cookie file including a terminal identifier is stored in the terminal (step 1103).
As a result of the determination in step 1103, if the user cookie file including the terminal identifier is not stored in the terminal, the Internet search server 104 generates a terminal identifier (step 1104).
The Internet search server 104 generates the user cookie file including the terminal identifier and stores it in the terminal of the searcher (step 1105).
As a result of the determination in step 1103, if the user cookie file including the terminal identifier is stored in the terminal, the Internet search server 104 extracts the terminal identifier from the user cookie file (step 1106).
The Internet search server 104 receives a click for a search item included in the search result web page from the searcher (step 1107) and then acquires a site identifier corresponding to the clicked search item (step 1108).
Finally, in step 1109, an apparatus for determining invalid clicks determines that the click is invalid if the terminal identifier and the site identifier are coincident with a terminal identifier and a site identifier associated with other clicks within a predetermined time interval.
According to this embodiment, even though a client terminal uses a proxy server or an IP gateway, it is possible to discriminate the client's terminal using a terminal identifier. Thus, even if different client terminals use a proxy server or an IP gateway, it is possible to properly identify clicks from different clients.
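The flow of steps 1103 to 1109 can be sketched as follows. This is a minimal sketch under stated assumptions: the cookie name `terminal_id`, the use of a random UUID as the terminal identifier, the helper names `resolve_terminal_id` and `TerminalClickChecker`, and times expressed in seconds are all hypothetical and are not specified by the description.

```python
import uuid

COOKIE_NAME = "terminal_id"  # hypothetical cookie name


def resolve_terminal_id(cookies):
    """Steps 1103 to 1106: reuse the identifier from the cookie or issue a new one.

    Returns (terminal_id, cookie_to_set). cookie_to_set is None when the
    terminal already holds a cookie and nothing needs to be stored.
    """
    existing = cookies.get(COOKIE_NAME)
    if existing:
        return existing, None
    new_id = uuid.uuid4().hex  # assumption: any sufficiently unique value would do
    return new_id, {COOKIE_NAME: new_id}


class TerminalClickChecker:
    """Step 1109: flag a click that repeats (terminal_id, site_id) within the window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._last_click = {}  # (terminal_id, site_id) -> time of last click

    def is_invalid(self, terminal_id, site_id, clicked_at):
        key = (terminal_id, site_id)
        previous = self._last_click.get(key)
        self._last_click[key] = clicked_at
        return previous is not None and clicked_at - previous <= self.window_seconds
```

In this sketch two terminals behind the same proxy server receive different identifiers, so a click from one does not cause a click from the other to be flagged.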
In another embodiment of the present invention, for each search item included in a search result web page provided by an Internet search engine, the number of clicks by searchers during a predetermined time interval is compared with the average number of clicks of the search items belonging to the category to which the search item belongs. If the number of clicks is greater than that average, the clicks are considered to be invalid and are reported to an administrator.
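The category comparison can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary inputs, and the `min_difference` parameter are hypothetical, and including the item itself in its category average is an assumption that the description does not settle.

```python
from statistics import mean


def items_exceeding_category_average(click_counts, category_of, min_difference=0):
    """Return the search items whose click count exceeds their category average.

    click_counts: item -> number of clicks in the time interval
    category_of:  item -> category name
    min_difference: how far above the average a count must be (hypothetical knob)
    """
    counts_by_category = {}
    for item, count in click_counts.items():
        counts_by_category.setdefault(category_of[item], []).append(count)
    averages = {cat: mean(counts) for cat, counts in counts_by_category.items()}
    return sorted(
        item
        for item, count in click_counts.items()
        if count - averages[category_of[item]] > min_difference
    )
```

With `min_difference=0` every item above its category average is returned, so a nonzero difference is likely needed in practice to avoid reporting ordinary variation.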
An apparatus for detecting invalid clicks according to this embodiment includes a click counter means for counting the number of clicks of searchers per search item for a predetermined time interval for search items included in a search result web page provided by an Internet search engine, an average click-number calculation means for calculating the average number of clicks, for the predetermined time interval, of search items belonging to the category to which the search item belongs, and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks. If the number of clicks per search item is greater by a predetermined difference than the average number of clicks, this fact is reported to the administrator through the invalid click report unit 308.
According to another embodiment of the present invention, the number of clicks of searchers per search item for a predetermined time interval, for search items included in a search result web page provided by an Internet search engine, is compared with the average number of clicks, for the predetermined time interval, of a predetermined first number of search items located above the search item and a predetermined second number of search items located below the search item in the search result web page. For example, the number of clicks for a specific search item is compared with the number of clicks of the two search items located immediately above the specific search item and the two search items located immediately below it for the same period. As a result of the comparison, if the number of clicks for the specific search item is greater, for example, by 5 times than the number of clicks for the surrounding search items, there is a high possibility that the clicks are invalid, and this is reported to an administrator.
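The comparison with neighbouring search items can be sketched as follows. This is a sketch under assumptions rather than the apparatus itself: the function name, the list input ordered from the top of the page to the bottom, and the handling of items near the top or bottom of the page (using whatever neighbours exist) are hypothetical choices.

```python
def flag_against_neighbours(clicks_by_position, above=2, below=2, ratio=5.0):
    """Return the positions whose click count exceeds `ratio` times the
    average click count of the surrounding search items.

    clicks_by_position: list of click counts, index 0 being the top item.
    above / below: how many items above and below are used for the average.
    """
    flagged = []
    for i, count in enumerate(clicks_by_position):
        neighbours = (
            clicks_by_position[max(0, i - above):i]
            + clicks_by_position[i + 1:i + 1 + below]
        )
        if not neighbours:
            continue  # a single-item page has nothing to compare against
        average = sum(neighbours) / len(neighbours)
        if count > ratio * average:
            flagged.append(i)
    return flagged


print(flag_against_neighbours([30, 25, 400, 20, 15, 10]))  # [2]
```

When the neighbouring counts are very small the ratio test becomes noisy, so a minimum click count (not described in the text) may be worth adding before reporting.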
A variety of methods for determining invalid clicks have been described above. Each method may be used independently or in combination with the other methods for determining invalid clicks. For example, a rule stating that a click is invalid when its client IP address, page identifier and site identifier repeat within 5 minutes from the final click for the search item may be stored in the invalid click pattern storage unit 306.
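One way to picture a combined rule held in the invalid click pattern storage unit is as data: a set of log fields that must all repeat, plus a time window. The sketch below is hypothetical; the class `InvalidClickPattern`, the dictionary layout of a log entry, the field names, and times in seconds are assumptions, and every log entry is assumed to carry all fields named by a pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvalidClickPattern:
    fields: tuple        # log fields that must all match an earlier click
    window_seconds: int  # maximum gap since the final matching click


def is_invalid(click, history, patterns):
    """Return True if `click` matches any stored pattern.

    click and each history entry are dicts holding a "time" key (seconds)
    and the log fields referenced by the patterns.
    """
    for pattern in patterns:
        matching_times = [
            earlier["time"]
            for earlier in history
            if all(earlier[f] == click[f] for f in pattern.fields)
        ]
        if matching_times:
            gap = click["time"] - max(matching_times)  # gap since the final click
            if 0 <= gap <= pattern.window_seconds:
                return True
    return False


# The example rule from the text: IP address, page identifier and site
# identifier repeating within 5 minutes.
example_rule = InvalidClickPattern(("client_ip", "page_id", "site_id"), 300)
```

Keeping rules as data in this way means a newly observed pattern can be added without changing the decision logic.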
In the present invention, the Internet search server and the apparatus for detecting invalid clicks have been described together as a single unit. According to another embodiment of the present invention, however, it is to be noted that they can be implemented separately according to their functions and managed by different administrators.
Furthermore, in the present invention, components shown and described as separate components may be physically constructed in a single system or may be physically constructed in separate systems.
Moreover, although several embodiments have been described in the present invention, it will be evident to those skilled in the art that some of the plurality of embodiments, or the remaining embodiments, also fall within the spirit of the present invention.
In addition, embodiments of the present invention further relate to computer-readable media that include program instructions for performing various computer-implemented operations. The media may also include, alone or in combination with the program instructions, data files, data structures, tables, and the like. The media and program instructions may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The media may also be a transmission medium such as optical or metallic lines, wave guides, etc., including a carrier wave transmitting signals specifying the program instructions, data structures, etc. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
Fig. 12 is a block diagram illustrating the construction of a general-purpose computer system that may be adopted in constructing a search engine server and an apparatus for detecting invalid clicks according to the present invention.
The computer system includes any number of processors 1240 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 1260 (typically a random access memory, or "RAM") and primary storage 1270 (typically a read only memory, or "ROM"). As is well known in the art, primary storage 1270 acts to transfer data and instructions uni-directionally to the CPU, and primary storage 1260 is typically used to transfer data and instructions in a bi-directional manner. Both of these primary storage devices may include any suitable type of the computer-readable media described above. A mass storage device 1210 is also coupled bi-directionally to CPU 1240 and provides additional data storage capacity and may include any of the computer-readable media described above. The mass storage device 1210 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk that is slower than primary storage. A specific mass storage device such as a CD-ROM 1220 may also pass data uni-directionally to the CPU. Processor 1240 is also coupled to an interface 1230 that includes one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers. Finally, processor 1240 optionally may be coupled to a computer or telecommunications network using a network connection as shown generally at 1250. With such a network connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps. The above-described devices and materials will be familiar to those of skill in the computer hardware and software arts.
The hardware elements described above may be configured (usually temporarily) to act as one or more software modules for performing the operations of this invention.
Industrial Applicability
According to the present invention described above, a method and apparatus for detecting invalid clicks for a search item included in a search result web page provided by an Internet search engine server are provided.
According to the present invention, a method and apparatus for detecting invalid clicks, which can detect a variety of attempts for unduly increasing the number of clicks for a search item and can immediately cope with these attempts, are provided. That is, if an unjust click attempt of a new pattern is found, the pattern or rule is stored in an invalid click pattern storage unit according to the present invention. It is thus possible to immediately cope with this unjust click attempt following a new pattern.
Moreover, according to the present invention, a method and apparatus for detecting invalid clicks, which can prevent several identifiers provided in order to detect invalid clicks from being counterfeited or forged, are provided.
Although the present invention has been described in connection with the embodiment of the present invention illustrated in the accompanying drawings, it is not limited thereto since it will be apparent to those skilled in the art that various substitutions, modifications and changes may be made thereto. The scope of the present invention is defined by the appended claims. All changes or modifications or their equivalents made within the meanings and scope of the claims should be construed as falling within the scope of the present invention.

Claims

1. A method for detecting invalid clicks in an Internet search engine, comprising the steps of:
(a) generating a search result web page in response to a search request from a searcher;
(b) acquiring a page identifier corresponding to the generated web page;
(c) receiving a click for a search item included in the search result web page from the searcher;
(d) acquiring a site identifier corresponding to the clicked search item; and
(e) if the page identifier and the site identifier are coincident with a page identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
2. The method as claimed in claim 1, wherein the page identifier and the site identifier comprise a checksum.
3. A method for detecting invalid clicks in an Internet search engine, comprising the steps of: generating a search result web page in response to a search request from a searcher; acquiring a session identifier included in a session cookie file stored in a terminal of the searcher; receiving a click for a search item included in the search result web page from the searcher; acquiring a site identifier corresponding to the clicked search item; and if the session identifier and the site identifier are coincident with a session identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
4. The method as claimed in claim 3, wherein the step of acquiring the session identifier included in the session cookie file stored in the terminal of the searcher comprises the steps of: determining whether the session cookie file is stored in the terminal; and if it is determined that the session cookie file is not stored in the terminal, generating a new session identifier and then storing a session cookie file including the generated session identifier in the terminal.
5. The method as claimed in claim 4, further comprising the steps of: if it is determined that the session cookie file is stored in the terminal, determining whether the last-updated time of a session identifier included in the session cookie file is within a predetermined time interval; and if it is determined that the last-updated time is within the predetermined time interval, acquiring a session identifier included in the session cookie file.
6. The method as claimed in claim 5, further comprising the steps of: if it is determined that the last-updated time is not within the predetermined time interval, by generating a new session identifier, updating the session identifier included in the session cookie file; and storing time of updating the session identifier in the session cookie file.
7. The method as claimed in claim 4, further comprising the steps of: if it is determined that the session cookie file is stored in the terminal, determining whether a click time for the search item from the searcher is within a predetermined time interval after the final click time associated with the session identifier; if it is determined that the click time for the search item is within the predetermined time interval after the final click time, acquiring a session identifier included in the session cookie file; and updating the final click time with the click time for the search item.
8. The method as claimed in claim 7, further comprising the steps of: if it is determined that the click time for the search item is not within the predetermined time interval after the final click time, by generating a new session identifier, updating the session identifier included in the session cookie file; and updating the final click time with the click time for the search item.
9. The method as claimed in any one of claims 3 to 8, wherein the session identifier and the site identifier comprise a checksum.
10. A method for detecting invalid clicks in an Internet search engine, comprising the steps of: receiving a click for a search item included in a search result web page from a searcher; acquiring a client IP address corresponding to a terminal of the searcher; acquiring a site identifier corresponding to the clicked search item; and if the client IP address and the site identifier are coincident with a client IP address and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
11. The method as claimed in claim 10, wherein the site identifier is generated with a checksum included therein.
12. A method for detecting invalid clicks in an Internet search engine, comprising the steps of: generating a search result web page in response to a search request from a searcher; acquiring a terminal identifier corresponding to a terminal of the searcher; generating a user cookie file including the terminal identifier and then storing the user cookie file in the terminal of the searcher; receiving a click for a search item included in the search result web page from the searcher; acquiring a site identifier corresponding to the clicked search item; and if the terminal identifier and the site identifier are coincident with a terminal identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid.
13. The method as claimed in claim 12, further comprising the steps of: determining whether a user cookie file including the terminal identifier is stored in the terminal; and if it is determined that the user cookie file including the terminal identifier is stored in the terminal, receiving the terminal identifier from the user cookie file.
14. The method as claimed in claim 12 or 13, wherein the terminal identifier and the site identifier comprise a checksum.
15. A computer-readable recording medium in which a program for implementing a method according to any one of claims 1 to 8 and 10 to 13 is recorded.
16. An apparatus for detecting invalid clicks, wherein: if a searcher clicks on a search item included in a search result web page provided by an Internet search engine, at least one of an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click and cookie file information stored in the searcher's terminal, and URL information associated with the search item are received, and it is determined whether the click is invalid according to a predetermined reference based on the received information.
17. An apparatus for detecting invalid clicks, comprising: a log storage unit that, in response to a click of a searcher for a search item included in a search result web page provided by an Internet search engine, stores a log regarding at least two of the following: an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal and URL information associated with the search item; an invalid click pattern storage unit that stores an invalid click pattern associated with a pair of at least two of the following: the IP address of the searcher's terminal, the network address to which the searcher's terminal belongs, the search word associated with the search result web page, the information on the web browser of the searcher, the click time associated with the click, the cookie file information stored in the searcher's terminal, and URL information associated with the search item; and an invalid click decision unit that determines whether the click of the searcher is an invalid click based on the log stored in the log storage unit and the invalid click pattern stored in the invalid click pattern storage unit.
18. The apparatus as claimed in claim 17, further comprising an invalid click report unit for reporting a click that satisfies a predetermined reference among clicks that are determined to be invalid, to an administrator of the Internet search engine.
19. The apparatus as claimed in claim 18, further comprising an invalid click verification unit that changes the invalid click to a valid click according to an input of the administrator.
20. An apparatus for detecting invalid clicks, comprising: a click counter means for counting the number of clicks of a searcher per search item for a predetermined time interval for the search item included in a search result web page provided by an Internet search engine; an average click-number calculation means for calculating the average number of clicks, for the predetermined time interval, of search items belonging to a category to which the search item belongs; and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks.
21. An apparatus for detecting invalid clicks, comprising: a click counter means for counting the number of clicks of a searcher per a search item for a predetermined time interval for the search item included in a search result web page provided by an Internet search engine; an average click-number calculation means for calculating the average number of clicks of a predetermined first number of search items located at the upper side of the search items and of a predetermined second number of search items located at the lower side of the search items, in the search result web page for the predetermined time interval; and a decision means for determining whether the number of clicks per search item is greater by a predetermined difference than the average number of clicks.
PCT/KR2004/000416 2003-03-19 2004-02-27 Method and apparatus for detecting invalid clicks on the internet search engine WO2004084097A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2005518761A JP4358188B2 (en) 2003-03-19 2004-02-27 Invalid click detection device in Internet search engine

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020030017233A KR100619178B1 (en) 2003-03-19 2003-03-19 Method and apparatus for detecting invalid clicks on the internet search engine
KR10-2003-0017233 2003-03-19

Publications (1)

Publication Number Publication Date
WO2004084097A1 true WO2004084097A1 (en) 2004-09-30

Family

ID=36707372

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2004/000416 WO2004084097A1 (en) 2003-03-19 2004-02-27 Method and apparatus for detecting invalid clicks on the internet search engine

Country Status (4)

Country Link
JP (1) JP4358188B2 (en)
KR (1) KR100619178B1 (en)
CN (2) CN100533434C (en)
WO (1) WO2004084097A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008030670A1 (en) * 2006-09-08 2008-03-13 Microsoft Corporation Detecting and adjudicating click fraud
CN102663062A (en) * 2012-03-30 2012-09-12 奇智软件(北京)有限公司 Method and device for processing invalid links in search result
CN103368857A (en) * 2012-03-26 2013-10-23 北大方正集团有限公司 Method and system for transmitting data information
US8706551B2 (en) * 2003-09-04 2014-04-22 Google Inc. Systems and methods for determining user actions
WO2015012865A1 (en) 2013-07-26 2015-01-29 Empire Technology Development, Llc Device and session identification
US8996404B2 (en) 2007-04-26 2015-03-31 Nhn Business Platform Corporation Method for processing invalid click and system for executing the method
CN105069061A (en) * 2015-07-28 2015-11-18 安一恒通(北京)科技有限公司 Method and system for loading webpage in historical browsing record, browser and server
US11042886B2 (en) 2003-09-04 2021-06-22 Google Llc Systems and methods for determining user actions

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100786796B1 (en) * 2005-03-25 2007-12-18 주식회사 다음커뮤니케이션 Method and system for billing of internet advertising
JP4648455B2 (en) * 2005-05-06 2011-03-09 エヌエイチエヌ コーポレーション Personalized search method and personalized search system
KR20060028463A (en) * 2006-03-09 2006-03-29 정성욱 Click tracking and management system for online advertisement service
KR100777659B1 (en) * 2006-04-10 2007-11-19 (주)소만사 Device of detecting invalid use of keyword advertisement
KR100777660B1 (en) * 2006-04-10 2007-11-19 (주)소만사 Method of detecting robot-based invalid use of keyword advertisement and computer-readable medium having thereon program performing function embodying the same
CN101075908B (en) * 2006-11-08 2011-04-20 腾讯科技(深圳)有限公司 Method and system for accounting network click numbers
KR100841348B1 (en) * 2007-08-16 2008-06-25 방용정 Non-cost internet advertisement system each time unfairness click of cost-per-click-view and method thereof
KR100902466B1 (en) * 2007-10-30 2009-06-11 엔에이치엔비즈니스플랫폼 주식회사 System and Method for Tracking a Keyword Search Abuser
KR100914600B1 (en) * 2007-11-14 2009-08-31 엔에이치엔(주) System and Method for Determining Invalid Clicks
KR101020949B1 (en) * 2008-11-18 2011-03-09 주식회사 데이타웨이브 시스템 Method and server for detecting unfair click of keyword advertisement
KR20110116562A (en) 2010-04-19 2011-10-26 서울대학교산학협력단 Method and system for detecting bot scum in massive multiplayer online role playing game
CN102289756A (en) * 2010-06-18 2011-12-21 百度在线网络技术(北京)有限公司 Method and system for judging click validation
KR101158464B1 (en) * 2010-11-26 2012-06-20 고려대학교 산학협력단 Method and apparatus for detecting bot process
JP2014026528A (en) * 2012-07-27 2014-02-06 Nippon Telegr & Teleph Corp <Ntt> Effective click counter, method and program
KR101919137B1 (en) * 2012-11-08 2018-11-15 네이버 주식회사 Display advertising rate calculating method and system acording to value index of advertisement slot
CN103475543A (en) * 2013-09-11 2013-12-25 北京思特奇信息技术股份有限公司 Abnormal system service call detection method and system
WO2015184579A1 (en) * 2014-06-03 2015-12-10 Yahoo! Inc Determining traffic quality using event-based traffic scoring
CN104331306B (en) * 2014-10-14 2017-05-10 北京齐尔布莱特科技有限公司 Content updating method, equipment and system
CN104580244B (en) * 2015-01-26 2018-03-13 百度在线网络技术(北京)有限公司 The defence method and device clicked maliciously
KR101639752B1 (en) * 2015-02-13 2016-07-15 네이버 주식회사 System and method for aggregating view of contents using filter logic
CN105677869A (en) * 2016-01-06 2016-06-15 广州神马移动信息科技有限公司 Multidimensional search log anti-cheating method, system and computing equipment
CN107526748B (en) * 2016-06-22 2021-08-03 华为技术有限公司 Method and equipment for identifying user click behavior
CN108255885B (en) * 2016-12-29 2020-11-06 北京酷我科技有限公司 Song recommendation method and system
CN110020206B (en) * 2019-04-12 2021-10-15 北京搜狗科技发展有限公司 Search result ordering method and device
CN110069691B (en) * 2019-04-29 2021-05-28 百度在线网络技术(北京)有限公司 Method and device for processing click behavior data
CN111444408B (en) * 2020-03-26 2021-09-14 腾讯科技(深圳)有限公司 Network search processing method and device and electronic equipment
JP6873343B1 (en) * 2020-09-07 2021-05-19 シエンプレ株式会社 Unauthorized click prevention system, unauthorized click prevention method and program

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269361B1 (en) * 1999-05-28 2001-07-31 Goto.Com System and method for influencing a position on a search result list generated by a computer network search engine
KR20020020584A (en) * 2000-09-09 2002-03-15 맹진기 Internet survey system and method and media for storing program source thereof

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8706551B2 (en) * 2003-09-04 2014-04-22 Google Inc. Systems and methods for determining user actions
US11100518B2 (en) 2003-09-04 2021-08-24 Google Llc Systems and methods for determining user actions
US11042886B2 (en) 2003-09-04 2021-06-22 Google Llc Systems and methods for determining user actions
US10515387B2 (en) 2003-09-04 2019-12-24 Google Llc Systems and methods for determining user actions
WO2008030670A1 (en) * 2006-09-08 2008-03-13 Microsoft Corporation Detecting and adjudicating click fraud
US8996404B2 (en) 2007-04-26 2015-03-31 Nhn Business Platform Corporation Method for processing invalid click and system for executing the method
CN103368857A (en) * 2012-03-26 2013-10-23 北大方正集团有限公司 Method and system for transmitting data information
CN102663062A (en) * 2012-03-30 2012-09-12 奇智软件(北京)有限公司 Method and device for processing invalid links in search result
US9692833B2 (en) 2013-07-26 2017-06-27 Empire Technology Development Llc Device and session identification
EP3025245A4 (en) * 2013-07-26 2017-05-03 Empire Technology Development LLC Device and session identification
EP3025245A1 (en) * 2013-07-26 2016-06-01 Empire Technology Development LLC Device and session identification
WO2015012865A1 (en) 2013-07-26 2015-01-29 Empire Technology Development, Llc Device and session identification
CN105069061B (en) * 2015-07-28 2019-03-12 安一恒通(北京)科技有限公司 Loading method, system, the browser and server of webpage in historical viewings record
CN105069061A (en) * 2015-07-28 2015-11-18 安一恒通(北京)科技有限公司 Method and system for loading webpage in historical browsing record, browser and server

Also Published As

Publication number Publication date
KR20040082633A (en) 2004-09-30
CN1761961A (en) 2006-04-19
JP2006520940A (en) 2006-09-14
CN100533434C (en) 2009-08-26
JP4358188B2 (en) 2009-11-04
KR100619178B1 (en) 2006-09-05
CN101388035A (en) 2009-03-18

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2004807418X

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2005518761

Country of ref document: JP

122 Ep: pct application non-entry in european phase