WO2004084097A1 - Method and apparatus for detecting invalid clicks in an Internet search engine - Google Patents


Info

Publication number
WO2004084097A1
Authority
WO
WIPO (PCT)
Prior art keywords
click
search
searcher
clicks
identifier
Application number
PCT/KR2004/000416
Other languages
English (en)
Inventor
Jung Soo Ha
Seok Ho Kang
Woo Sung Lee
Original Assignee
Nhn Corporation
Application filed by Nhn Corporation
Priority to JP2005518761A (published as JP4358188B2)
Publication of WO2004084097A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Definitions

  • The present invention relates to an Internet search engine server. More particularly, it relates to a method and apparatus for detecting invalid clicks on a search item included in a search result web page provided by an Internet search engine server. The invention further relates to a method and apparatus for detecting invalid clicks that can detect various attempts to unjustly inflate the number of clicks on a search item and respond to them immediately.
  • Searchers access Internet search engine servers such as NAVER, Yahoo and Lycos to request searches.
  • The Internet search service provider generates a search result web page containing search items related to the search word entered by the searcher, and provides that page to the searcher.
  • An example of the search result web page returned when a searcher accesses the NAVER search engine server and enters the search words "digital camera" is shown in Fig. 2.
  • Each item included in the search result web page is associated with a URL (Uniform Resource Locator).
  • An Internet search service provider determines the display order of search items by combining several criteria.
  • One widely used criterion is the number of times users click a specific search item. For example, a search item that users click frequently is displayed relatively high on the search result web page.
  • A network information provider operating a web server wants the search items associated with it to be displayed at the top of the search result web page. To achieve this, the provider may deliberately access the Internet search server and click the search item for its own web page many times. In some cases, the provider may use a special program to click the search item continuously. Because such unjust clicks do not reflect users' natural search behavior, the Internet search service provider must detect these invalid clicks.
  • For example, in the service model of Overture Services, Inc. (U.S.A.), an Internet search service provider offers services in which a network information provider pays per click whenever searchers click a search item associated with that provider in a search result web page.
  • If a searcher intentionally clicks a specific search item several times, the network information provider associated with that item must pay additional costs. In this case as well, it is necessary to detect invalid clicks, that is, clicks made only to increase the click count without any actual intent to search for the item.
  • Yet another object of the present invention is to provide a method and apparatus for detecting invalid clicks in which the identifiers used to detect invalid clicks are difficult to counterfeit or forge.
  • The present invention provides a method for detecting invalid clicks in an Internet search engine, comprising the steps of: generating a search result web page in response to a search request from a searcher; acquiring a page identifier corresponding to the generated web page; receiving from the searcher a click on a search item included in the search result web page; acquiring a site identifier corresponding to the clicked search item; and, if the page identifier and the site identifier match the page identifier and site identifier associated with another click within a predetermined time interval, determining that the click is invalid.
  • The present invention also provides a method for detecting invalid clicks in an Internet search engine, comprising the steps of: generating a search result web page in response to a search request from a searcher; acquiring a session identifier included in a session cookie file stored in the searcher's terminal; receiving from the searcher a click on a search item included in the search result web page; acquiring a site identifier corresponding to the clicked search item; and, if the session identifier and the site identifier match the session identifier and site identifier associated with another click within a predetermined time interval, determining that the click is invalid.
  • The present invention also provides a method for detecting invalid clicks in an Internet search engine, comprising the steps of: receiving from a searcher a click on a search item included in a search result web page; acquiring the client IP address of the searcher's terminal; acquiring a site identifier corresponding to the clicked search item; and, if the client IP address and the site identifier match the client IP address and site identifier associated with another click within a predetermined time interval, determining that the click is invalid.
  • The present invention also provides a method for detecting invalid clicks in an Internet search engine, comprising the steps of: generating a search result web page in response to a search request from a searcher; acquiring a terminal identifier corresponding to the searcher's terminal; generating a user cookie file containing the terminal identifier and storing it in the searcher's terminal; receiving from the searcher a click on a search item included in the search result web page; acquiring a site identifier corresponding to the clicked search item; and, if the terminal identifier and the site identifier match the terminal identifier and site identifier associated with another click within a predetermined time interval, determining that the click is invalid.
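The four methods above share one rule. A click is invalid when its identifier (page, session, client IP address or terminal identifier) and its site identifier match those of another click within a predetermined time interval. The following is a minimal Python sketch of that rule, not the patent's implementation. The class and field names are hypothetical, and clicks are assumed to arrive in time order. The sketch also treats the first click of a matching run as valid and only the later ones as invalid; this is one reading of the text, which does not specify it.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    identifier: str   # page, session, client IP or terminal identifier (hypothetical field)
    site_id: str      # site identifier of the clicked search item
    time: float       # click time, in seconds


class DuplicateClickDetector:
    """Marks a click invalid when an earlier click with the same
    (identifier, site identifier) pair lies within `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._recent = defaultdict(deque)  # (identifier, site_id) -> recent click times

    def is_invalid(self, click: Click) -> bool:
        times = self._recent[(click.identifier, click.site_id)]
        # Discard earlier clicks that are now outside the time window.
        while times and click.time - times[0] > self.window:
            times.popleft()
        invalid = bool(times)
        times.append(click.time)
        return invalid
```

Only the meaning of the `identifier` field changes between the four methods. The same detector could therefore be instantiated once per identifier type, for example one keyed by page identifier and another keyed by client IP address.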
  • The present invention also provides an apparatus for detecting invalid clicks. When a searcher clicks on a search item included in a search result web page provided by an Internet search engine, the apparatus receives URL information associated with the search item together with at least one of the following: the IP address of the searcher's terminal, the network address to which the searcher's terminal belongs, the search word associated with the search result web page, information on the searcher's web browser, the click time associated with the click, and cookie file information stored in the searcher's terminal. The apparatus then determines whether the click is invalid by applying a predetermined criterion to the received information.
  • an apparatus for detecting invalid clicks comprising (1) a log storage unit that, in response to a click of a searcher for a search item included in a search result web page provided by an Internet search engine, stores a log regarding at least two of the following: an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal and URL information associated with the search item, (2) an invalid click pattern storage unit that stores an invalid click pattern associated with a pair of at least two of the following: the IP address of the searcher's terminal, the network address to which the searcher's terminal belongs, the search word associated with the search result web page, the information on the web browser of the searcher, the click time associated with the click, the cookie file information stored in the searcher's terminal, and URL information associated with the search item, and (3)
  • The present invention also provides an apparatus for detecting invalid clicks comprising: a click counter means for counting searchers' clicks per search item over a predetermined time interval, for the search items included in a search result web page provided by an Internet search engine; an average click-number calculation means for calculating the average number of clicks, over the same interval, of the search items belonging to the same category as the search item; and a decision means for determining whether the number of clicks on the search item exceeds that average by a predetermined difference.
  • The present invention also provides an apparatus for detecting invalid clicks comprising: a click counter means for counting searchers' clicks per search item over a predetermined time interval, for the search items included in a search result web page provided by an Internet search engine; an average click-number calculation means for calculating the average number of clicks, over the same interval, of a predetermined first number of search items displayed above the search item and a predetermined second number of search items displayed below it in the search result web page; and a decision means for determining whether the number of clicks on the search item exceeds that average by a predetermined difference.
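The two click-count apparatuses above can be sketched as two comparisons. Each compares a search item's click count over one interval against an average: the average of its category, or the average of its neighbors on the page. The following Python sketch assumes that the per-item counts have already been collected. The function names are hypothetical. Two details are also assumptions rather than statements from the text: the "more than `difference`" comparison, and the inclusion of the item itself in its category average.

```python
def above_category_average(counts, category_of, item, difference):
    """counts: clicks per search item during one predetermined interval.
    category_of: search item -> category.
    True if `item` exceeds the average of its category by more than `difference`."""
    category = category_of[item]
    peers = [counts.get(other, 0) for other, cat in category_of.items() if cat == category]
    average = sum(peers) / len(peers)  # the item itself is part of its category
    return counts.get(item, 0) - average > difference


def above_neighbour_average(counts, index, above, below, difference):
    """counts: click counts in display order (top of the page first).
    Compares counts[index] with the average of up to `above` items shown
    above it and up to `below` items shown below it."""
    neighbours = counts[max(0, index - above):index] + counts[index + 1:index + 1 + below]
    if not neighbours:
        return False
    average = sum(neighbours) / len(neighbours)
    return counts[index] - average > difference
```

The neighbor-based variant does not need a category table. This may make it easier to apply to result pages whose items span several categories, at the cost of depending on the current display order.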
  • Invalid clicks are difficult to define precisely, and their scope may be defined differently depending on the embodiment and application.
  • For example, invalid clicks may refer to clicks made only to increase the click count, without any intention of actually searching.
  • Fig. 1 is a diagram illustrating a network connection of an Internet search server including an apparatus for detecting invalid clicks and a client terminal according to the present invention.
  • Fig. 2 is a diagram illustrating a search result web page generated by an Internet search engine.
  • Fig. 3 is a block diagram illustrating the construction of an apparatus for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 4 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 5 shows an exemplary log file according to an embodiment of the present invention.
  • Figs. 6a and 6b are flowcharts illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 7 shows an exemplary log file according to an embodiment of the present invention.
  • Fig. 8 is a flowchart illustrating a method of generating a session identifier according to an embodiment of the present invention.
  • Fig. 9 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 10 shows an exemplary log file according to an embodiment of the present invention.
  • Fig. 11 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • Fig. 12 is a block diagram illustrating the construction of a general-purpose computer system that may be adopted in constructing a search engine server and an apparatus for detecting invalid clicks according to the present invention.
  • Fig. 1 is a diagram illustrating a network connection of an Internet search server including an apparatus for detecting invalid clicks and a client terminal according to the present invention.
  • A searcher, or a cheater attempting unjust clicks, accesses the Internet search server 104 through a client terminal 101 connected to the Internet 103.
  • The cheater tries to raise the click count of a target search item by clicking it several times in a search result web page provided by the Internet search server 104. For example, in Fig. 2, search item 202 is associated with http://www.invalidclick.com, and a cheater clicks it continuously so that it will be displayed at the top of the search result web page.
  • The cookie file 102 is a text file that the search engine server 104, or another web site, stores on the hard disk of the client terminal 101 when the client terminal 101 connects to it.
  • Because HTTP is stateless, each request for a web page is independent of other requests. The web server therefore has no information about which pages it previously sent to the client terminal 101 or what it previously did with it. Cookie files are used to correlate such independently processed requests.
  • A cookie file allows a web server to store information about a user on the user's computer. The present invention also uses several cookie files to detect invalid clicks, as described in detail later.
  • a log file 105 is a file for storing several logs related to a user's click pattern.
  • several parameters are used in order to detect invalid clicks. After parameters associated with respective clicks are stored in the log file, it is determined whether the input clicks are invalid based on predetermined rules and patterns.
  • Fig. 3 is a block diagram illustrating the construction of an apparatus for detecting invalid clicks according to an embodiment of the present invention.
  • the apparatus for detecting invalid clicks 301 comprises a parameter input unit 304, a log storage unit 305, an invalid click pattern storage unit 306, an invalid click verification unit 307, an invalid click report unit 308 and an invalid click decision unit 309. If a searcher clicks on a search item included in a search result web page provided by an Internet search engine, several parameters 302 associated with the click are input to the parameter input unit 304.
  • the parameters are basic information for determining invalid clicks and include an IP address of the searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, URL information associated with the search item and the like.
  • a search request packet is transferred from the client terminal 101 to the Internet search engine server 104.
  • The search request packet is formatted according to the HTTP protocol and is carried inside an IP (Internet Protocol) packet. Because the IP packet contains a source IP address field, the Internet search engine server 104 can obtain the IP address of the searcher's terminal by extracting the source IP address from the packet carrying the click request.
  • The front part of the source IP address is the network address to which the searcher's terminal belongs.
  • An IP (IPv4) address is composed of 4 bytes.
  • The front part of the IP address identifies the network to which the searcher's terminal belongs, and the remaining part identifies the terminal within that network. Accordingly, the network address can be extracted from the source IP address.
  • In this embodiment, the first 3 bytes of the IP address are treated as the network address. For example, if the source IP address is 123.45.67.89, 123.45.67 is extracted as the network address.
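The 3-byte rule in this embodiment amounts to taking the /24 prefix of an IPv4 address. Below is a small sketch using Python's standard `ipaddress` module. The function name is hypothetical, and IPv6 addresses are deliberately not handled because the text assumes 4-byte addresses.

```python
import ipaddress


def network_address(source_ip: str) -> str:
    """Returns the first 3 bytes of an IPv4 address in dotted form."""
    packed = ipaddress.IPv4Address(source_ip).packed  # 4 bytes; raises ValueError if invalid
    return ".".join(str(octet) for octet in packed[:3])
```

For 123.45.67.89 this returns "123.45.67". Equivalently, `ipaddress.ip_network("123.45.67.89/24", strict=False)` yields the network 123.45.67.0/24.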
  • a search word associated with a search result web page is a value input to the Internet search server 104 by the searcher.
  • Information on a web browser of the searcher is information on a web browser, which is installed in the client terminal 101 of the searcher and is used to access the Internet search server 104.
  • Information on the web browser includes the type of web browser, its version, its product ID, and so on. Even when several searchers use web browsers of the same type and version, the product IDs of their browsers may differ, which makes the product ID useful for identifying a searcher's terminal.
  • Some of the client's environment parameters are transferred to the web server inside the HTTP packet.
  • A program on the web server (the search engine program) can receive these environment parameters and use them to detect invalid clicks.
  • Such environmental parameters include the following information:
  • REMOTE_HOST: domain name of the connecting client host
  • REMOTE_ADDR: IP address of the connecting client host
  • REMOTE_USER: name of the connected user (available on a web server with user authentication enabled)
  • REMOTE_IDENT: ID of the connected user as reported by an identification (ident) service, if the server uses one
  • HTTP_USER_AGENT: identification string of the client program used by the connected user, usually the browser name and version
  • HTTP_ACCEPT_LANGUAGE: languages accepted by the connected user
  • HTTP_REFERER: address of the document that links to (calls) the CGI program
  • REQUEST_METHOD: method used to send data to the server (GET, POST)
  • QUERY_STRING: the transmitted data when the data are sent with GET
  • CONTENT_LENGTH: total length of the transmitted data, in bytes, when the data are sent with POST
  • CONTENT_TYPE: MIME type of the data when the data are sent with POST
  • AUTH_TYPE: authentication method used to verify the user
  • PATH_INFO: extra path information following the name of the called CGI program
  • PATH_TRANSLATED: PATH_INFO translated into the corresponding file-system path on the web server
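A CGI program reads these values from its process environment (RFC 3875). The sketch below collects a subset of them for the click log. Several choices here are assumptions: which variables to select, the function name, and recording unset variables as empty strings. HTTP_COOKIE is a standard CGI variable that does not appear in the list above. It is included only because the text relies on cookie information later.

```python
import os

# Hypothetical choice of CGI variables to record for each click.
# HTTP_COOKIE (the Cookie request header) is a standard CGI variable
# not listed above; it is included here as an assumption.
CLICK_VARIABLES = (
    "REMOTE_ADDR",
    "HTTP_USER_AGENT",
    "HTTP_ACCEPT_LANGUAGE",
    "HTTP_REFERER",
    "QUERY_STRING",
    "HTTP_COOKIE",
)


def click_parameters(environ=None):
    """Returns the selected variables from a CGI environment.
    Variables the server did not set are recorded as empty strings."""
    if environ is None:
        environ = os.environ  # a CGI program receives these as environment variables
    return {name: environ.get(name, "") for name in CLICK_VARIABLES}
```

Passing an explicit mapping, as the tests do, keeps the function usable outside a CGI process, for example with a WSGI `environ` dictionary.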
  • The click time associated with a searcher's click is the time at which the click input from the searcher is received. According to another embodiment of the present invention, another time associated with the click may be used, for example the time at which the searcher actually clicked in the client.
  • Information on the cookie file 102 stored in the searcher's terminal is obtained by the Internet search server 104, which accesses the cookie file 102 stored in the client terminal 101.
  • The cookie file 102 may be used for a variety of purposes, which are described in detail with reference to other embodiments.
  • URL information associated with the search item clicked by the searcher is stored in a search database (not shown) associated with the search engine server 104, so it can be obtained by referring to that database.
  • The URL information may be the domain name of a web server, or information containing a domain name, a directory and a file name. For instance, http://www.naver.com and http://www.naver.com/download are the same from the viewpoint of the domain name, since both are www.naver.com, but they are different URLs. For convenience of explanation, the embodiments described here use URL information only up to the domain name.
  • The term URL information covers all of the variants described here.
  • logs including only some parameters are shown for convenience of explanation. According to another embodiment of the present invention, however, logs including all or some of the parameters 302 may be stored in the log storage unit 305.
  • the log storage unit 305 stores therein a log regarding at least two of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, and URL information associated with the search item.
  • the log storage unit 305 stores therein a log regarding at least one of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click and cookie file information stored in the searcher's terminal, and URL information associated with the search item.
  • the invalid click pattern storage unit 306 stores therein invalid click patterns or rules associated with a pair of at least two of an IP address of a searcher's terminal, a network address to which the searcher's terminal belongs, a search word associated with the search result web page, information on a web browser of the searcher, a click time associated with the click, cookie file information stored in the searcher's terminal, and URL information associated with the search item.
  • For example, the rule or pattern "among the clicks input within 10 minutes, both the IP address of the searcher's terminal and the URL information associated with the search item coincide" may be stored in the invalid click pattern storage unit 306.
  • Such rules may be stored in the invalid click pattern storage unit 306 as a file written in a predetermined language according to a predetermined syntax. Alternatively, a rule or pattern such as the one above may be stored as a program that determines a matching click to be invalid.
  • the invalid click decision unit 309 determines whether the searcher's clicks are invalid based on the log stored in the log storage unit 305 and the invalid click pattern stored in the invalid click pattern storage unit 306.
  • The invalid click report unit 308 reports to the administrator 303 of the Internet search engine those clicks, among the clicks determined to be invalid by the invalid click decision unit 309, that satisfy a predetermined criterion.
  • In one embodiment, the invalid click report unit 308 reports all clicks determined to be invalid by the invalid click decision unit 309 to the administrator of the Internet search engine.
  • In that case, the predetermined criterion covers all clicks that the invalid click decision unit 309 has determined to be invalid.
  • In another embodiment, each rule or pattern stored in the invalid click pattern storage unit 306 includes a field indicating whether cases matching it should be reported to the administrator 303. The invalid click report unit 308 then reports a case to the administrator 303 when it matches a rule that requires the administrator to be informed.
  • The invalid click verification unit 307 allows the administrator 303 to change clicks that the invalid click decision unit 309 has determined to be invalid into valid clicks. Because clicks erroneously determined to be invalid can be corrected in this way, invalid clicks can be determined more accurately.
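The units described above can work together as sketched below: the log storage unit 305, the invalid click pattern storage unit 306, the decision unit 309, the report unit 308 and the verification unit 307. In the sketch, rules are represented as data: the log fields that must coincide, a time window, and the report flag. This loosely follows the 10-minute IP-and-URL example. The class names, field names and dictionary keys are hypothetical, as are the in-memory lists that stand in for the storage units. None of them is the patent's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One invalid-click pattern (hypothetical representation)."""
    name: str
    fields: tuple          # log fields that must all coincide, e.g. ("ip", "url")
    window: float          # predetermined time interval, in seconds
    report: bool = False   # report matching clicks to the administrator?


@dataclass
class InvalidClickChecker:
    rules: list
    log: list = field(default_factory=list)       # stands in for log storage unit 305
    verdicts: list = field(default_factory=list)  # True = invalid, one entry per log entry
    reports: list = field(default_factory=list)   # (rule name, click) sent to the administrator

    def record(self, click: dict) -> bool:
        """Decides whether `click` is invalid, then stores it in the log."""
        invalid = False
        for rule in self.rules:
            key = tuple(click[f] for f in rule.fields)
            matched = any(
                tuple(old[f] for f in rule.fields) == key
                and 0 <= click["time"] - old["time"] <= rule.window
                for old in self.log
            )
            if matched:
                invalid = True
                if rule.report:
                    self.reports.append((rule.name, click))
        self.log.append(click)
        self.verdicts.append(invalid)
        return invalid

    def mark_valid(self, index: int) -> None:
        """Administrator override, in the role of verification unit 307."""
        self.verdicts[index] = False
```

Keeping rules as data means a new pattern can be added without changing the decision code. The text also allows a rule to be stored as a program, which would correspond to replacing the `matched` test with arbitrary code.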
  • Fig. 4 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • the Internet search server 104 receives a search request from a searcher (step 401). If the searcher accesses the Internet search server 104 and then inputs a search word, the search word is transferred to the Internet search server 104 as a search request packet.
  • the Internet search server 104 generates a search result web page in response to the search request (step 402). For example, as shown in Fig. 2, a search result web page including a plurality of search items corresponding to an input of a search word by a searcher is provided to the searcher.
  • a page identifier corresponding to the generated search result web page is acquired (step 403).
  • a page identifier is generated whenever a search result web page is generated.
  • the page identifier is an identifier for identifying the search result web page. Accordingly, if the same searcher requests a search by repeatedly inputting the same search word in a search window of the Internet search server 104, a new page identifier is allocated every time. Likewise, if a searcher clicks "reload" in a web browser on which a search result web page is displayed, the Internet search server 104 allocates a new page identifier for the search result web page since the search request packet is transferred from the client terminal 101 to the Internet search server 104.
  • The Internet search server 104 receives from the searcher a click on a search item included in the search result web page. To receive the click, the hyperlink of each search item points first to the Internet search server 104, which performs the necessary processing and then directs the client terminal to the web site corresponding to the search item. For example, suppose http://www.naver.com/abc/*http://www.invalidclick.com/ is used as the hyperlink of the search item for http://www.invalidclick.com/. When the searcher clicks the search item, the searcher is first sent to the search server at http://www.naver.com. The search server then directs the client terminal to http://www.invalidclick.com according to the URL at the end of the hyperlink.
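The tracking hyperlink in this example can be sketched as two small helpers. One prefixes the destination URL with the search server's address; the other recovers the destination from the part after the "*". The prefix is taken from the example above, but the helper names are hypothetical. A real server would also log the click and answer with an HTTP redirect (for example, status 302 with a Location header) rather than just returning a string.

```python
# Prefix taken from the example hyperlink in the text.
REDIRECT_PREFIX = "http://www.naver.com/abc/*"


def make_click_link(target_url: str, prefix: str = REDIRECT_PREFIX) -> str:
    """Builds the hyperlink that routes a click through the search server."""
    return prefix + target_url


def target_of(click_link: str) -> str:
    """Recovers the destination URL placed after the first '*' separator."""
    _, sep, target = click_link.partition("*")
    if not sep:
        raise ValueError("not a click-tracking link")
    return target
```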
  • The Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 405).
  • the site identifier is an identifier for identifying a search item and is generated based on URL information corresponding to a search item.
  • In one embodiment, the site identifier is the URL information of the search item, used as-is.
  • The URL information used as the basis for generating the site identifier may be the domain name of a web server, or information containing a domain name, a directory and a file name. For instance, http://www.naver.com and http://www.naver.com/download are the same from the viewpoint of the domain name, since both are www.naver.com, but different from the viewpoint of the URL.
  • The term URL information covers all of these variants.
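The two granularities of URL information can be illustrated with the standard `urllib.parse.urlsplit` function. The sketch derives a site identifier either from the host name alone or from the host name plus path. The function name and the `domain_only` switch are hypothetical. Stripping a trailing "/" from the path is also an assumption.

```python
from urllib.parse import urlsplit


def site_identifier(url: str, domain_only: bool = True) -> str:
    """Derives a site identifier from a search item's URL.

    domain_only=True keeps just the host name (the convention used in the
    examples); otherwise the path is kept as well."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lower-cases the host name
    if domain_only:
        return host
    return host + parts.path.rstrip("/")
```

With `domain_only=True`, http://www.naver.com and http://www.naver.com/download both map to www.naver.com. With `domain_only=False`, they map to different identifiers.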
  • In step 406, the apparatus for detecting invalid clicks determines that the click is invalid if the page identifier and the site identifier match the page identifier and site identifier associated with another click within a predetermined time interval.
  • Fig. 5 shows an exemplary log file according to an embodiment of the present invention. The embodiment of Fig. 4 will be described with reference to Fig. 5.
  • a page identifier 509 and a site identifier 510 are stored in a log file 500.
  • Reference numerals 501 to 508 indicate logs stored for respective click inputs.
  • a cheater accesses the Internet search server 104 to request a search.
  • The Internet search server 104 generates a search result web page and a corresponding page identifier, "nCe249sisnO".
  • the cheater continuously clicks a specific search item included in the search result web page. Even though a specific search item in a search result web page generated once is continuously clicked, a page identifier is not newly generated. Thus, the page identifier continues to have the same value.
  • the cheater may update the search result web page by clicking "reload" in the web browser.
  • In that case, a new page identifier is allocated, and the log recorded with it is the log 505. A subsequent click by the cheater on the same search item corresponds to the log 506.
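The page-identifier behavior in this walkthrough can be sketched as follows. A fresh identifier is allocated every time a result page is generated, including on reload, and the check only matches clicks that share that identifier. The helper names are hypothetical, and so is the use of `secrets.token_urlsafe`. With 8 random bytes it yields 11 characters, the same length as "nCe249sisnO", but the text does not say how identifiers are actually generated.

```python
import secrets


def new_page_identifier() -> str:
    """Allocates a fresh random identifier for each generated result page."""
    return secrets.token_urlsafe(8)  # 8 random bytes -> 11 URL-safe characters


def repeated_on_same_page(log, page_id, site_id, now, window):
    """log: list of (page_id, site_id, click_time) tuples already recorded.
    True if the same item was clicked from the same page within `window` seconds."""
    return any(p == page_id and s == site_id and 0 <= now - t <= window
               for p, s, t in log)
```

In this sketch, a click made after a reload carries the new page identifier. Such a click is therefore not matched against clicks made from the earlier page.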
  • Figs. 6a and 6b are flowcharts illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • the Internet search server 104 receives a search request from a searcher (step 601).
  • the Internet search server 104 generates a search result web page in response to the search request (step 602).
  • An apparatus for determining invalid clicks determines whether a session cookie file is stored in the client terminal 101 that requested the search (step 603).
  • Steps 603 to 611 are processes for obtaining a session identifier.
  • the apparatus for determining invalid clicks generates a new session identifier (step
  • step 604 a session cookie file containing the session identifier is stored in the client terminal 101.
  • An updating time of the session identifier is also stored in the session cookie file.
  • the updating time is stored in the session cookie file (step 609).
  • If a session cookie file is stored, the apparatus for determining invalid clicks determines whether the last-updated time of the session identifier contained in the session cookie file is within a predetermined time interval (step 606).
  • If it is, the apparatus for determining invalid clicks extracts the session identifier contained in the session cookie file (step 607).
  • If it is not, the apparatus for determining invalid clicks generates a new session identifier (step 608).
  • The session identifier contained in the session cookie file is then updated with the newly created session identifier (step 610).
  • The updating time of the session identifier is stored in the session cookie file (step 611).
  • the Internet search server 104 receives a click for a search item included in the search result web page from the searcher (step 612).
  • the Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 613).
  • The apparatus for detecting invalid clicks determines that the click is invalid if the session identifier and the site identifier match the session identifier and site identifier associated with another click within a predetermined time interval (step 614).
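The session-identifier logic of Figs. 6a and 6b can be sketched as a single function. It reuses the stored identifier while its last-updated time is within the interval. Otherwise it issues a new identifier and records the current time as the updating time. This is a minimal sketch, not the patent's implementation. The cookie class, the function name, `secrets.token_hex(4)` as the identifier generator, and the inclusive `<=` comparison are all assumptions. Times are plain seconds.

```python
import secrets
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SessionCookie:
    session_id: str
    updated: float  # last-updated time of the session identifier, in seconds


def current_session(cookie: Optional[SessionCookie], now: float,
                    interval: float) -> Tuple[str, SessionCookie]:
    """Returns the session identifier for a request at time `now` and the
    cookie the server should leave on the client terminal."""
    if cookie is not None and now - cookie.updated <= interval:
        # Steps 606-607: identifier still fresh, reuse it without updating.
        return cookie.session_id, cookie
    # No cookie yet, or the identifier is stale: issue a new identifier and
    # record `now` as its updating time (cf. steps 608, 610 and 611).
    fresh = SessionCookie(secrets.token_hex(4), now)
    return fresh.session_id, fresh
```

With the times from Fig. 7 (updating time 10:50:14, second click at 10:50:18, interval 5 seconds), the second call returns the same identifier and leaves the updating time unchanged. The identifier returned here would then feed the (session identifier, site identifier) matching of step 614.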
  • Fig. 7 shows an exemplary log file according to an embodiment of the present invention.
  • a click time 710, an updating time 711 of a session identifier, a session identifier 712 and a site identifier 713 are stored in a log file 700.
  • Reference numerals 701 to 708 indicate logs stored corresponding to respective click inputs.
  • A cheater accesses the Internet search server 104 to request a search.
  • the Internet search server 104 generates a search result web page.
  • the Internet search server 104 receives a click for a search item included in the search result web page.
  • The Internet search server 104 determines whether a session cookie file is stored in the client terminal 101. If no session cookie file is stored there, the Internet search server 104 generates a new session identifier and stores, in the client terminal 101, a session cookie file containing the session identifier and its updating time. In this embodiment, the session identifier "xigw9492" and the updating time "10:50:14" are recorded. In addition, the click time, the updating time, the session identifier and the site identifier corresponding to the search item are stored in the log file 700 as the log 701. When a session cookie file is generated for the first time, it is generated upon the click, and the session identifier is generated at the same moment; the click time and the updating time of the session identifier are therefore the same.
  • When the cheater clicks on the search item a second time, the Internet search server 104 again determines whether a session cookie file is stored in the client terminal 101. Since the session cookie file generated above is already stored there, the Internet search server 104 accesses it.
  • The session cookie file stores the session identifier and its last-updated time. In this embodiment, the session identifier "xigw9492" and the updating time "10:50:14" are stored in the session cookie file.
  • The Internet search server 104 determines whether the searcher's click time for the search item is within a predetermined time interval from the last-updated time of the session identifier.
  • The click time of the second click is "10:50:18", 4 seconds after the last-updated time "10:50:14". If the predetermined time interval is 5 seconds, the click falls within the interval, so the session identifier stored in the session cookie file is used as the current session identifier and is not updated. As a result, the log 702 is recorded, for example.
  • The click recorded as the log 702 is determined to be invalid because it has the same session identifier and site identifier as the log 701.
  • The log 704 corresponds to a case where the cheater requests "reload". In this case too, the session cookie file stored in the client terminal 101 is consulted, and because the last-updated time stored in it is within the predetermined time interval, the session identifier is not updated. Accordingly, the log 704 is recorded, for example, and is determined to be an invalid click because its session identifier and site identifier match those of the log 701. That is, this embodiment can detect a cheater who clicks on "reload" and then clicks on the same search item again within a short time interval.
  • The log 705 corresponds to a click on the same search item from a searcher other than the one of the logs 701, 702 and 704. Since a new session identifier is allocated to this searcher, the click is not determined to be invalid.
  • The log 709 corresponds to a click on the same search item by the same searcher as the log 701 after a considerable time. Because the click arrives after the predetermined time interval has elapsed, a new session identifier is issued and the click is not determined to be invalid.
  • A click on the same search item made by a cheater after the predetermined time interval has elapsed since the session identifier was generated can also be determined to be invalid.
  • In a variant, the invalid click decision uses the final click time for the search item as its time reference: a click made within a predetermined time interval from the final click time for the same search item may be determined to be invalid. Briefly, when a click is received from a searcher, it is determined whether a session cookie file is stored in the terminal. If so, it is determined whether the searcher's click time for the search item is within a predetermined time interval from the final click time associated with the session identifier.
  • If it is, the session identifier contained in the session cookie file is acquired, and the final click time is updated with the click time for the search item.
  • The time reference (the updating time of the session identifier or the final click time) is chosen according to the purpose of invalid click detection.
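  • The difference between the two time references can be sketched in hypothetical Python. The sketch assumes that, in the final-click variant, the stored reference time is overwritten on every click that reuses the identifier, and that a newly issued identifier starts its reference at the current click time; identifiers are simple counters for readability.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional

_counter = itertools.count(1)  # hypothetical identifier source


@dataclass
class Cookie:
    session_id: str
    ref_time: float  # updating time or final click time, depending on the mode


def handle_click(cookie: Optional[Cookie], now: float, interval: float, mode: str) -> Cookie:
    """Return the cookie to store after a click.

    mode == "update":      reference = updating time of the session identifier,
                           left unchanged while the identifier is reused.
    mode == "final_click": reference = final click time, overwritten on each click.
    """
    if mode not in ("update", "final_click"):
        raise ValueError(f"unknown mode: {mode}")
    if cookie is not None and now - cookie.ref_time <= interval:
        if mode == "final_click":
            return Cookie(cookie.session_id, now)  # slide the window forward
        return cookie
    return Cookie(f"s{next(_counter)}", now)


def session_ids(click_times: List[float], interval: float, mode: str) -> List[str]:
    """Session identifier assigned to each click of one searcher, in order."""
    cookie, ids = None, []
    for t in click_times:
        cookie = handle_click(cookie, t, interval, mode)
        ids.append(cookie.session_id)
    return ids
```

  • With clicks at t = 0, 4, 8 and 12 seconds and a 5-second interval, the updating-time reference splits the burst into two identifiers (one for t = 0 and 4, another for t = 8 and 12), so the click at t = 8 does not match the earlier ones; the final-click reference keeps a single identifier for the whole burst, so every repeat after the first click is caught.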
  • Fig. 8 is a flowchart illustrating a method of generating a session identifier according to an embodiment of the present invention.
  • Source data 801 are the base data from which a session identifier 805 is generated.
  • The source data may include current time information, the search word, the product ID of the searcher's web browser, and the like.
  • Alternatively, the source data may be randomly selected numbers.
  • A hashing function 802 is applied to the source data 801 to generate an encoded string 803.
  • A checksum is then appended to the encoded string 803 to generate the session identifier 805. The checksum serves to prevent a cheater from counterfeiting a session identifier.
  • The same method may also be used to generate a page identifier, a site identifier, the terminal identifier described later, and so on.
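  • The Fig. 8 flow can be sketched in Python as below. The text calls the appended value a checksum; a plain checksum such as a CRC could be recomputed by anyone who knows the algorithm, so this sketch swaps in a truncated HMAC-SHA256 tag keyed with a server secret. The choice of SHA-256, the 16 + 8 character layout and `SECRET_KEY` are all assumptions.

```python
import hashlib
import hmac
import os
import time

SECRET_KEY = b"replace-with-a-server-side-secret"  # hypothetical; keep out of source control
BODY_LEN, TAG_LEN = 16, 8  # assumed layout: 16 hex chars of hash + 8 hex chars of tag


def _tag(encoded: str) -> str:
    # Keyed "checksum": a truncated HMAC-SHA256 over the encoded string.
    return hmac.new(SECRET_KEY, encoded.encode(), hashlib.sha256).hexdigest()[:TAG_LEN]


def make_session_id(search_word: str, browser_id: str) -> str:
    # Source data 801: current time, search word, browser product ID, plus random bytes.
    source = f"{time.time_ns()}|{search_word}|{browser_id}".encode() + os.urandom(8)
    # Hashing function 802 -> encoded string 803.
    encoded = hashlib.sha256(source).hexdigest()[:BODY_LEN]
    # Append the checksum to obtain the session identifier 805.
    return encoded + _tag(encoded)


def is_genuine(session_id: str) -> bool:
    """Reject identifiers whose checksum does not match, e.g. counterfeit ones."""
    if len(session_id) != BODY_LEN + TAG_LEN:
        return False
    encoded, tag = session_id[:BODY_LEN], session_id[BODY_LEN:]
    return hmac.compare_digest(tag.encode(), _tag(encoded).encode())
```

  • A server would call `is_genuine` on every session identifier read back from a cookie and treat a failed check like a missing cookie, so a cheater cannot choose an identifier without knowing the secret.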
  • Fig. 9 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • The Internet search server 104 receives, from a searcher, a click on a search item included in a search result web page (step 901).
  • The Internet search server 104 acquires the client IP address of the searcher's terminal 101 (step 902).
  • For example, the client IP address may be extracted from the source IP address field of the received IP packet.
  • The Internet search server 104 acquires a site identifier corresponding to the clicked search item (step 903).
  • The apparatus for detecting invalid clicks determines that the click is invalid if its client IP address and site identifier match the client IP address and site identifier associated with another click within a predetermined time interval.
  • Fig. 10 shows an exemplary log file according to an embodiment of the present invention.
  • The log file 1000 stores a click time 1010, a client IP address 1011 and a site identifier 1012.
  • Reference numerals 1001 to 1009 denote the logs stored for the respective click inputs.
  • If a user repeatedly visits the same web site within a short time, the clicks can hardly be regarded as normal, so such a case is determined to be an invalid click. For example, if the time reference is 5 minutes, the clicks recorded as the logs 1002, 1004 and 1005, which have the same client IP address and site identifier as the log 1001, are determined to be invalid, whereas the click recorded as the log 1009, made about 20 minutes later, is determined to be valid. Some caution is needed, however, when invalid clicks are determined from the client IP address, since several clients behind a proxy server or an IP gateway may share one IP address.
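  • The client-IP check of Fig. 9, replayed on a log shaped like Fig. 10, might look like the hypothetical Python below. The timestamps and addresses are invented for illustration, and each click is compared with the most recent earlier click having the same client IP address and site identifier; measuring from the first such click would be an equally plausible reading of the text.

```python
from typing import Dict, Iterable, List, Tuple

Click = Tuple[float, str, str]  # (click time in seconds, client IP address, site identifier)


def classify_by_ip(clicks: Iterable[Click], window: float = 5 * 60) -> List[bool]:
    """Return one verdict per click (True = invalid), for clicks in time order.

    A click is invalid if the same client IP address clicked the same site
    identifier no more than `window` seconds earlier (5 minutes, as in Fig. 10).
    """
    last_seen: Dict[Tuple[str, str], float] = {}
    verdicts = []
    for t, ip, site in clicks:
        prev = last_seen.get((ip, site))
        verdicts.append(prev is not None and t - prev <= window)
        last_seen[(ip, site)] = t
    return verdicts


log = [
    (0, "10.0.0.7", "site-A"),     # first click
    (60, "10.0.0.7", "site-A"),    # repeat within 5 minutes -> invalid
    (90, "10.0.0.9", "site-A"),    # different client IP -> valid
    (200, "10.0.0.7", "site-A"),   # repeat within 5 minutes -> invalid
    (1200, "10.0.0.7", "site-A"),  # about 20 minutes later, like log 1009 -> valid
]
print(classify_by_ip(log))  # [False, True, False, True, False]
```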
  • This embodiment is therefore preferably combined with an embodiment using other parameters, such as a session identifier.
  • Fig. 11 is a flowchart illustrating a method for detecting invalid clicks according to an embodiment of the present invention.
  • The Internet search server 104 receives a search request from a searcher (step 1101) and generates a search result web page (step 1102).
  • The Internet search server 104 determines whether a user cookie file including a terminal identifier is stored in the searcher's terminal (step 1103).
  • If, as a result of the determination in step 1103, no user cookie file including the terminal identifier is stored in the terminal, the Internet search server 104 generates a terminal identifier (step 1104).
  • The Internet search server 104 generates a user cookie file including the terminal identifier and stores it in the searcher's terminal (step 1105).
  • If the user cookie file is already stored, the Internet search server 104 extracts the terminal identifier from it (step 1106).
  • The Internet search server 104 receives, from the searcher, a click on a search item included in the search result web page (step 1107) and then acquires a site identifier corresponding to the clicked search item (step 1108).
  • The apparatus for determining invalid clicks determines that the click is invalid if its terminal identifier and site identifier match the terminal identifier and site identifier associated with another click within a predetermined time interval.
  • Even if a client terminal uses a proxy server or an IP gateway, the terminal can be distinguished by its terminal identifier.
  • Thus, even when several clients share a proxy server or an IP gateway, clicks from different clients can be properly identified.
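  • A Python sketch of the terminal-identifier approach, in which a dictionary stands in for each terminal's cookie storage; the cookie name `terminal_id`, the use of `uuid4` and the 300-second interval are assumptions. It illustrates the point above: two users behind one proxy would look identical to an IP-based check but receive different terminal identifiers.

```python
import uuid
from typing import Dict, MutableMapping, Tuple


def get_terminal_id(cookie_jar: MutableMapping[str, str]) -> str:
    """Steps 1103-1106: read the terminal identifier from the user cookie,
    creating and storing one first if it is missing."""
    if "terminal_id" not in cookie_jar:
        cookie_jar["terminal_id"] = uuid.uuid4().hex  # hypothetical cookie name
    return cookie_jar["terminal_id"]


class TerminalClickDetector:
    """Flags a click whose (terminal identifier, site identifier) pair repeats
    within `interval` seconds of the previous click with the same pair."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.last_click: Dict[Tuple[str, str], float] = {}

    def is_invalid(self, terminal_id: str, site_id: str, now: float) -> bool:
        key = (terminal_id, site_id)
        prev = self.last_click.get(key)
        self.last_click[key] = now
        return prev is not None and now - prev <= self.interval


# Two users behind the same proxy share one IP address but keep separate cookies.
alice_jar: Dict[str, str] = {}
bob_jar: Dict[str, str] = {}
det = TerminalClickDetector(interval=300)
print(det.is_invalid(get_terminal_id(alice_jar), "site-A", 0))   # False
print(det.is_invalid(get_terminal_id(bob_jar), "site-A", 10))    # False: different terminal
print(det.is_invalid(get_terminal_id(alice_jar), "site-A", 20))  # True: Alice repeats
```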
  • According to another embodiment, if the number of clicks made by searchers on a search item included in a search result web page provided by an Internet search engine during a predetermined time interval is greater than the average number of clicks on the search items belonging to the same category as that search item, the clicks are considered invalid and are reported to an administrator.
  • An apparatus for detecting invalid clicks according to this embodiment includes: a click counter means for counting, for each search item included in a search result web page provided by an Internet search engine, the number of clicks made by searchers during a predetermined time interval; an average click-number calculation means for calculating the average number of clicks during the predetermined time interval on the search items belonging to the category of the search item; and a decision means for determining whether the number of clicks on the search item exceeds the average number of clicks by a predetermined difference. If it does, this fact is reported to the administrator through the invalid click report unit 308.
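  • The decision means can be sketched in hypothetical Python. Assumptions: click counts for the interval are already aggregated per search item, the category average includes the item itself, "greater by a predetermined difference" is read as strictly greater than `margin`, and reporting through the invalid click report unit 308 is reduced to returning the flagged items.

```python
from statistics import mean
from typing import Dict, List


def flag_against_category_average(
    clicks: Dict[str, int], category_of: Dict[str, str], margin: float
) -> List[str]:
    """Return the search items whose click count for the interval exceeds the
    average count of their category by more than `margin`."""
    per_category: Dict[str, List[int]] = {}
    for item, n in clicks.items():
        per_category.setdefault(category_of[item], []).append(n)
    average = {cat: mean(counts) for cat, counts in per_category.items()}
    return sorted(item for item, n in clicks.items() if n - average[category_of[item]] > margin)


clicks = {"item-a": 10, "item-b": 12, "item-c": 80, "item-d": 5}
category_of = {"item-a": "flowers", "item-b": "flowers", "item-c": "flowers", "item-d": "books"}
print(flag_against_category_average(clicks, category_of, margin=30))  # ['item-c']
```

  • Here the "flowers" average is 34 clicks, so `item-c` exceeds it by 46, which is more than the margin of 30.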
  • According to yet another embodiment, the number of clicks made by searchers on a search item included in a search result web page provided by an Internet search engine during a predetermined time interval is compared with the average number of clicks, during the same interval, on a predetermined first number of search items located above that search item and a predetermined second number of search items located below it in the search result web page.
  • For example, the number of clicks on a specific search item is compared with the numbers of clicks, for the same period, on the two search items located immediately above it and the two search items located immediately below it.
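  • The neighbor comparison, with two items above and two below as in the example, in hypothetical Python. Items near the top or bottom of the page simply have fewer neighbors here; the text does not say how such edge positions are handled, so that is an assumption, as are the counts and the margin.

```python
from typing import List


def flag_against_neighbors(counts: List[int], above: int = 2, below: int = 2,
                           margin: float = 0) -> List[int]:
    """counts[i] is the click count of the i-th search item on the result page
    (index 0 is the top). Return the positions whose count exceeds the average of
    up to `above` items directly above and `below` items directly below by more
    than `margin`."""
    flagged = []
    for i, n in enumerate(counts):
        neighbors = counts[max(0, i - above):i] + counts[i + 1:i + 1 + below]
        if neighbors and n - sum(neighbors) / len(neighbors) > margin:
            flagged.append(i)
    return flagged


print(flag_against_neighbors([40, 35, 30, 120, 25, 20, 18], margin=50))  # [3]
```

  • The item at position 3 has 120 clicks against a neighbor average of (35 + 30 + 25 + 20) / 4 = 27.5, so it is the only one flagged.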
  • This method for determining invalid clicks may be used alone or in combination with other methods for determining invalid clicks.
  • For example, a rule stating that a click is invalid if the client IP address, the page identifier and the site identifier corresponding to a search item are repeated within 5 minutes of the final click on that search item may be stored in the invalid click pattern storage unit 306.
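  • Storing rules as data is what lets a newly found pattern take effect immediately. A hypothetical Python sketch: each rule lists the click attributes that must repeat and a window measured from the final matching click, as in the 5-minute rule above; the attribute names, the dictionary click format and the in-memory store are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class InvalidClickRule:
    """One hypothetical entry of the invalid click pattern storage unit 306."""
    attrs: Tuple[str, ...]  # click attributes that must all repeat
    window: float           # seconds, measured from the final matching click


class PatternStore:
    def __init__(self, rules: List[InvalidClickRule]) -> None:
        self.rules = list(rules)
        self._final_click: Dict[Tuple[int, Tuple[Any, ...]], float] = {}

    def add_rule(self, rule: InvalidClickRule) -> None:
        # A newly found pattern applies from the next click onward, with no code change.
        self.rules.append(rule)

    def is_invalid(self, click: Mapping[str, Any]) -> bool:
        invalid = False
        for idx, rule in enumerate(self.rules):
            key = (idx, tuple(click[a] for a in rule.attrs))
            prev = self._final_click.get(key)
            if prev is not None and click["time"] - prev <= rule.window:
                invalid = True
            self._final_click[key] = click["time"]  # update the final click time
        return invalid


# The rule from the text: client IP, page id and site id repeated within 5 minutes.
store = PatternStore([InvalidClickRule(("client_ip", "page_id", "site_id"), 300)])
click = {"time": 0, "client_ip": "10.0.0.7", "page_id": "p1", "site_id": "site-A"}
print(store.is_invalid(click))                   # False
print(store.is_invalid({**click, "time": 200}))  # True
```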
  • In the description above, the Internet search server and the apparatus for detecting invalid clicks have been described together as a single unit. According to another embodiment of the present invention, however, they may be implemented separately according to their functions and managed by different administrators.
  • Similarly, components shown and described as separate components may be physically implemented in a single system or in separate systems.
  • Embodiments of the present invention further relate to computer-readable media that include program instructions for performing various computer-implemented operations.
  • The media may also include, alone or in combination with the program instructions, data files, data structures, tables, and the like.
  • The media and program instructions may be specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts.
  • Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
  • The media may also be a transmission medium, such as optical or metallic lines or waveguides, including a carrier wave transmitting signals that specify the program instructions, data structures, etc.
  • Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
  • Fig. 12 is a block diagram illustrating the construction of a general-purpose computer system that may be adopted in constructing a search engine server and an apparatus for detecting invalid clicks according to the present invention.
  • The computer system includes any number of processors 1240 (also referred to as central processing units, or CPUs) coupled to storage devices including primary storage 1260 (typically a random access memory, or "RAM") and primary storage 1270 (typically a read-only memory, or "ROM").
  • Primary storage 1270 (ROM) acts to transfer data and instructions uni-directionally to the CPU, whereas primary storage 1260 (RAM) is typically used to transfer data and instructions bi-directionally. Both of these primary storage devices may include any suitable type of the computer-readable media described above.
  • A mass storage device 1210 is also coupled bi-directionally to the CPU 1240; it provides additional data storage capacity and may include any of the computer-readable media described above.
  • The mass storage device 1210 may be used to store programs, data and the like, and is typically a secondary storage medium, such as a hard disk, that is slower than primary storage.
  • A specific mass storage device, such as a CD-ROM 1220, may also pass data uni-directionally to the CPU.
  • The processor 1240 is also coupled to an interface 1230 that includes one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers.
  • The processor 1240 may optionally be coupled to a computer or telecommunications network using a network connection, as shown generally at 1250. With such a network connection, it is contemplated that the CPU might receive information from the network, or might output information to the network, in the course of performing the above-described method steps.
  • The hardware elements described above may be configured (usually temporarily) to act as one or more software modules for performing the operations of this invention.
  • As described above, the present invention provides a method and apparatus for detecting invalid clicks on a search item included in a search result web page provided by an Internet search engine server.
  • In particular, the present invention provides a method and apparatus for detecting invalid clicks that can detect a variety of attempts to unduly increase the number of clicks on a search item and can respond to these attempts immediately. That is, when an invalid click attempt following a new pattern is found, the pattern or rule is stored in the invalid click pattern storage unit according to the present invention, so that the attempt can be countered immediately.

Abstract

The invention relates to an Internet search engine server. In particular, the invention relates to a method and apparatus for detecting invalid clicks on a search item included in a search result web page provided by an Internet search engine server. The invention provides a method for detecting invalid clicks in an Internet search engine, the method comprising the steps of: generating a search result web page in response to a search request from a searcher; acquiring a page identifier corresponding to the generated web page; receiving, from the searcher, a click on a search item included in the search result web page; acquiring a site identifier corresponding to the clicked search item; and, if the page identifier and the site identifier coincide with a page identifier and a site identifier associated with other clicks within a predetermined time interval, determining that the click is invalid. The invention thus provides a method and apparatus for detecting invalid clicks that detect a variety of attempts to unduly increase the number of clicks on a search item, making it possible to respond to these attempts immediately.
PCT/KR2004/000416 2003-03-19 2004-02-27 Procede et appareil de detection de clics invalides sur un moteur de recherche internet WO2004084097A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2005518761A JP4358188B2 (ja) 2003-03-19 2004-02-27 インターネット検索エンジンにおける無効クリック検出装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020030017233A KR100619178B1 (ko) 2003-03-19 2003-03-19 인터넷 검색 엔진에 있어서의 무효 클릭 검출 방법 및 장치
KR10-2003-0017233 2003-03-19

Publications (1)

Publication Number Publication Date
WO2004084097A1 (fr) 2004-09-30

Family

ID=36707372

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2004/000416 WO2004084097A1 (fr) 2003-03-19 2004-02-27 Procede et appareil de detection de clics invalides sur un moteur de recherche internet

Country Status (4)

Country Link
JP (1) JP4358188B2 (fr)
KR (1) KR100619178B1 (fr)
CN (2) CN101388035A (fr)
WO (1) WO2004084097A1 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008030670A1 (fr) * 2006-09-08 2008-03-13 Microsoft Corporation Détection et jugement de la fraude au clic
CN102663062A (zh) * 2012-03-30 2012-09-12 奇智软件(北京)有限公司 一种处理搜索结果中无效链接的方法及装置
CN103368857A (zh) * 2012-03-26 2013-10-23 北大方正集团有限公司 一种发送数据信息的方法及系统
US8706551B2 (en) * 2003-09-04 2014-04-22 Google Inc. Systems and methods for determining user actions
WO2015012865A1 (fr) 2013-07-26 2015-01-29 Empire Technology Development, Llc Identification de session et de dispositif
US8996404B2 (en) 2007-04-26 2015-03-31 Nhn Business Platform Corporation Method for processing invalid click and system for executing the method
CN105069061A (zh) * 2015-07-28 2015-11-18 安一恒通(北京)科技有限公司 历史浏览记录中网页的加载方法、系统、浏览器和服务器
US11042886B2 (en) 2003-09-04 2021-06-22 Google Llc Systems and methods for determining user actions

Families Citing this family (27)

Publication number Priority date Publication date Assignee Title
KR100786796B1 (ko) * 2005-03-25 2007-12-18 주식회사 다음커뮤니케이션 인터넷 광고 과금 방법 및 시스템
US7933917B2 (en) * 2005-05-06 2011-04-26 Nhn Corporation Personalized search method and system for enabling the method
KR20060028463A (ko) * 2006-03-09 2006-03-29 정성욱 온라인 광고 시스템에서의 이용자 부정 클릭 추적과 방지시스템 및 그 방법
KR100777659B1 (ko) * 2006-04-10 2007-11-19 (주)소만사 키워드 광고 부정 사용 검출 장치
KR100777660B1 (ko) * 2006-04-10 2007-11-19 (주)소만사 로봇 기반 키워드 광고 부정 사용 방지 방법 및 이를실현시키기 위한 프로그램을 기록한 컴퓨터로 판독 가능한기록 매체
CN101075908B (zh) * 2006-11-08 2011-04-20 腾讯科技(深圳)有限公司 一种网络点击统计系统及方法
KR100841348B1 (ko) * 2007-08-16 2008-06-25 방용정 클릭당 과금되는 광고의 부정클릭시 과금하지 않는 인터넷광고 시스템 및 그 방법
KR100902466B1 (ko) * 2007-10-30 2009-06-11 엔에이치엔비즈니스플랫폼 주식회사 키워드 검색 어뷰저 추적 방법 및 시스템
KR100914600B1 (ko) * 2007-11-14 2009-08-31 엔에이치엔(주) 무효 클릭 판단 방법 및 시스템
KR101020949B1 (ko) * 2008-11-18 2011-03-09 주식회사 데이타웨이브 시스템 키워드 광고의 부정 클릭 검출 방법 및 서버
KR20110116562A (ko) 2010-04-19 2011-10-26 서울대학교산학협력단 대규모 다중 사용자 온라인 롤플레잉 게임에서 봇을 검출하는 방법 및 시스템
CN102289756A (zh) * 2010-06-18 2011-12-21 百度在线网络技术(北京)有限公司 点击有效性的判断方法及其系统
KR101158464B1 (ko) * 2010-11-26 2012-06-20 고려대학교 산학협력단 봇 프로세스 탐지 장치 및 방법
JP2014026528A (ja) * 2012-07-27 2014-02-06 Nippon Telegr & Teleph Corp <Ntt> 有効クリック数算出装置、方法、及びプログラム
KR101919137B1 (ko) * 2012-11-08 2018-11-15 네이버 주식회사 광고 영역의 가치지수에 따른 디스플레이 광고 단가 산출 방법 및 시스템
CN103475543A (zh) * 2013-09-11 2013-12-25 北京思特奇信息技术股份有限公司 一种检测系统业务异常调用的方法及系统
EP3134823A4 (fr) * 2014-06-03 2017-10-25 Excalibur IP, LLC Détermination de qualité de trafic au moyen de notation de trafic à base d'événement
CN104331306B (zh) * 2014-10-14 2017-05-10 北京齐尔布莱特科技有限公司 一种内容更新方法、设备以及系统
CN104580244B (zh) * 2015-01-26 2018-03-13 百度在线网络技术(北京)有限公司 恶意点击的防御方法和装置
KR101639752B1 (ko) * 2015-02-13 2016-07-15 네이버 주식회사 필터로직을 이용하여 컨텐츠 뷰를 집계하는 시스템 및 방법
CN105677869A (zh) * 2016-01-06 2016-06-15 广州神马移动信息科技有限公司 多维度搜索日志反作弊方法、系统及计算设备
CN107526748B (zh) * 2016-06-22 2021-08-03 华为技术有限公司 一种识别用户点击行为的方法和设备
CN108255885B (zh) * 2016-12-29 2020-11-06 北京酷我科技有限公司 一种歌曲的推荐方法及系统
CN110020206B (zh) * 2019-04-12 2021-10-15 北京搜狗科技发展有限公司 一种搜索结果排序方法及装置
CN110069691B (zh) * 2019-04-29 2021-05-28 百度在线网络技术(北京)有限公司 用于处理点击行为数据的方法和装置
CN111444408B (zh) * 2020-03-26 2021-09-14 腾讯科技(深圳)有限公司 网络搜索处理方法、装置、电子设备
JP6873343B1 (ja) * 2020-09-07 2021-05-19 シエンプレ株式会社 不正クリック防止システム、不正クリック防止方法及びプログラム

Citations (2)

Publication number Priority date Publication date Assignee Title
US6269361B1 (en) * 1999-05-28 2001-07-31 Goto.Com System and method for influencing a position on a search result list generated by a computer network search engine
KR20020020584A (ko) * 2000-09-09 2002-03-15 맹진기 인터넷 설문조사 시스템 및 방법과 그 프로그램 소스를저장한 기록매체

Cited By (14)

Publication number Priority date Publication date Assignee Title
US8706551B2 (en) * 2003-09-04 2014-04-22 Google Inc. Systems and methods for determining user actions
US11100518B2 (en) 2003-09-04 2021-08-24 Google Llc Systems and methods for determining user actions
US11042886B2 (en) 2003-09-04 2021-06-22 Google Llc Systems and methods for determining user actions
US10515387B2 (en) 2003-09-04 2019-12-24 Google Llc Systems and methods for determining user actions
WO2008030670A1 (fr) * 2006-09-08 2008-03-13 Microsoft Corporation Détection et jugement de la fraude au clic
US8996404B2 (en) 2007-04-26 2015-03-31 Nhn Business Platform Corporation Method for processing invalid click and system for executing the method
CN103368857A (zh) * 2012-03-26 2013-10-23 北大方正集团有限公司 一种发送数据信息的方法及系统
CN102663062A (zh) * 2012-03-30 2012-09-12 奇智软件(北京)有限公司 一种处理搜索结果中无效链接的方法及装置
US9692833B2 (en) 2013-07-26 2017-06-27 Empire Technology Development Llc Device and session identification
EP3025245A4 (fr) * 2013-07-26 2017-05-03 Empire Technology Development LLC Identification de session et de dispositif
EP3025245A1 (fr) * 2013-07-26 2016-06-01 Empire Technology Development LLC Identification de session et de dispositif
WO2015012865A1 (fr) 2013-07-26 2015-01-29 Empire Technology Development, Llc Identification de session et de dispositif
CN105069061B (zh) * 2015-07-28 2019-03-12 安一恒通(北京)科技有限公司 历史浏览记录中网页的加载方法、系统、浏览器和服务器
CN105069061A (zh) * 2015-07-28 2015-11-18 安一恒通(北京)科技有限公司 历史浏览记录中网页的加载方法、系统、浏览器和服务器

Also Published As

Publication number Publication date
KR20040082633A (ko) 2004-09-30
JP2006520940A (ja) 2006-09-14
CN1761961A (zh) 2006-04-19
JP4358188B2 (ja) 2009-11-04
KR100619178B1 (ko) 2006-09-05
CN101388035A (zh) 2009-03-18
CN100533434C (zh) 2009-08-26

Similar Documents

Publication Publication Date Title
WO2004084097A1 (fr) Procede et appareil de detection de clics invalides sur un moteur de recherche internet
US8751601B2 (en) User interface that provides relevant alternative links
US6862610B2 (en) Method and apparatus for verifying the identity of individuals
CA2734774C (fr) Identification unique de peripheriques reseau distribues en l&#39;absence d&#39;informations d&#39;identification de peripherique ou d&#39;utilisateur explicitement fournies
US6910077B2 (en) System and method for identifying cloaked web servers
CA2294935C (fr) Procede et dispositif servant a rediriger des references de liens hypermedia exterieures au serveur
US7293012B1 (en) Friendly URLs
CN116324766A (zh) 通过浏览简档优化抓取请求
KR100619179B1 (ko) 인터넷 검색 엔진에 있어서의 무효 클릭 검출 방법 및 장치
US8909795B2 (en) Method for determining validity of command and system thereof
WO2003005240A1 (fr) Appareil destine a faire des recherches sur internet
KR20040083340A (ko) 인터넷 검색 엔진에 있어서의 무효 클릭 검출 방법 및 장치
KR19990018591U (ko) 인터넷 유해 사이트 접속 제한 장치
JP4780744B2 (ja) ウェブコンピューティングシステム
KR100368338B1 (ko) 이메일 주소를 이용한 타겟 웹페이지 접속방법
Ghazi et al. Proposing a Mechanism to Improve Web Usage Mining Automatically using Semantic Repository of the Data
Fletcher et al. Analytics Techniques
KR20020003327A (ko) 전자 게시판 시스템

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 2004807418X

Country of ref document: CN

WWE WIPO information: entry into national phase

Ref document number: 2005518761

Country of ref document: JP

122 EP: PCT application non-entry in European phase