WO2000010107A1 - Analyzing internet-based information - Google Patents

Analyzing internet-based information

Info

Publication number
WO2000010107A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain name
web site
entities
web
information
Application number
PCT/US1999/018645
Other languages
French (fr)
Inventor
Jeffrey Dean Black
Jason Harvey Titus
Ira Joseph Woodhead
Original Assignee
Iatlas Corporation
Application filed by Iatlas Corporation
Priority to AU55660/99A
Publication of WO2000010107A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/954Navigation, e.g. using categorised browsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/957Browsing optimisation, e.g. caching or content distillation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching
    • Y10S707/99934Query formulation, input preparation, or translation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10STECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00Data processing: database and file management or data structures
    • Y10S707/99931Database or file accessing
    • Y10S707/99933Query processing, i.e. searching
    • Y10S707/99935Query augmenting and refining, e.g. inexact access

Definitions

  • This application relates to analyzing Internet-based information.
  • World-Wide Web ("Web") protocols call for text-based addressing information, which is highly suitable for human users, to be converted to number-based addressing information, which is highly suitable for computers.
  • Much of the information available on the Web is organized into Web pages that can be retrieved and displayed by Web browser software ("browser") under the direction of a user.
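  • Each of the Web pages is identifiable by a respective Uniform Resource Locator text string ("URL"), such as "http://www.isp321.com/frontpage.html", that the browser can use to select the page.
  • Each URL includes a domain name, such as "isp321.com", that identifies the Web site where the corresponding Web page is stored for retrieval by browser software.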
  • a domain name registry organization maintains the domain name registration information, which may include name, address, and other information that allows the organization to bill the entity for payment for the maintenance. (It is to be understood that the term "registry”, as used herein, also refers to a domain name registrar or any other entity that may provide assistance in registering a domain name.)
  • the entity identifies a domain name server computer system that stores a numeric address (known as an IP address) that corresponds to the domain name, and the domain name registry stores the identity of the domain name server computer system together with the domain name in a file known as a zone file.
  • the domain name registry also reports the domain name together with the identity of the domain name server computer system to a root zone server computer system, which is a high level computer system that is responsible for helping other computer systems properly derive IP addresses from domain names (e.g., as described below).
  • the root zone server computer system receives such reports from effectively all domain name registries as domain names are registered, and therefore the root zone server computer system has a comprehensive list of domain names that are registered on the Web.
  • When the Web browser is directed to retrieve information from a Web site identified by a URL, the browser must determine the IP address of the Web site to which the URL refers.
  • the root zone server computer system determines, based on information previously supplied by the domain name registry, the identity of the domain name server computer system for the domain name and reports the identity to the browser. The browser then refers to the domain name server computer system to find out the IP address, and uses the IP address to attempt to contact the Web site. If the Web site is functioning, the browser receives information including a Web page formatting header ("HTML header") from the Web site. If the Web site is not functioning, the browser receives no response from the Web site, and times out indicating an error.
  • An Internet service provider is an example of an entity that may have a registered domain name for a Web site.
  • an ISP has customers such as individuals or businesses for whom the ISP stores Web pages on the Web site for retrieval by Web browser software.
  • the ISP may have a customer Maple Street Plumbing for which the ISP stores a home Web page having a URL that includes a prefix "http://www.isp321.com/~maplestplumb".
  • a home Web page is typically the only or the primary entry point into a Web site or a set of Web pages that are under the control of an entity.
  • a Web portal site such as "Yahoo.com” that maintains, in pages organized by categories, links to Web sites and home pages that are under the control of other entities.
  • a Web portal site allows another entity to create a link from the Web portal site to the other entity's Web site or home page by submitting information to the Web portal site.
  • a Web search engine site (“search engine”) maintains and updates a search engine database, i.e., a Web page record database, that includes a Web page record for every Web page that has been turned up by Web sweeping software that sweeps the World Wide Web for any and all Web pages.
  • a typical Web page record includes a URL for the respective Web page, an excerpt or other subset of the information provided by the Web page, and a date indicating the most recent update of the Web page record.
  • the Web sweeping software discovers information on the Web, including domain names previously unknown to the search engine, by following links among Web pages.
  • Some information about an entity may not be available on a Web site that is under the control of the entity.
  • public financial information about a company may be stored in a database that is not linked to the company's Web site or is not directly accessible by Web browser software, such as a database under the control of a financial services firm.
  • statistical information regarding Web activity for companies or other entities in a particular sector of human activity, such as an industrial sector, is expressed in broad terms such as the total number of uniquely qualified domain name Web sites ("unique Web sites") that are sponsored by all of the companies in a particular industrial sector. Such statistics may prove misleading for at least some purposes.
  • For example, if ten companies belong to a sector that is known to have ten unique Web sites, the resulting average, i.e., one unique Web site per company, can make the sector appear to be well represented by unique Web sites, even if in actuality all ten unique Web sites belong to only one of the companies and none of the other nine companies has a unique Web site.
  • An Internet analysis system gathers domain names and determines whether the domain names are associated with functioning Web sites.
  • a variation of the Internet analysis system that includes an entity information database and a mapping database is able to generate reports regarding Web activity in sectors of human activity such as industrial sectors.
  • Different aspects of the invention allow one or more of the following.
  • a comprehensive list of tested domain names can be produced. Domain names for Web sites that are difficult or impossible for a search engine to discover can be made available to the search engine to allow the search engine to produce search results that account for the contents of the previously undiscovered Web sites.
  • Before being provided to the search engine, the domain names may be prioritized or sorted according to one or more attributes (such as industry sector or company size) of the respective entities that are registered as having control over the domain names. Highly useful statistics can be produced concerning the number of entities in an industrial sector that are registered as having control over Web sites. Such statistics can be used for highly effective marketing or sales approaches in which Web-oriented products are targeted at potential customers in industrial sectors that are shown by the statistics to have substantial Web activity.
  • Figs. 1, 6, and 7 are block diagrams of computer-based systems.
  • Figs. 2, 3, 4, 5, and 10 are flow diagrams of computer-based procedures.
  • Figs. 8 and 11 are illustrations of output produced by software.
  • Figs. 9A-9B are illustrations of computer data.
  • Fig. 1 illustrates an Internet analysis system 110 in which a domain names analysis application 112 executes a procedure 1000 (Fig. 2) to collect domain names 114 from a domain name source 116 (step 1010), test the domain names to determine which of the domain names are domain names that correspond to functioning Web sites ("live domain names" 115) (step 1020), and deliver live domain names to a search engine 116 for use in searching the Web (step 1030).
  • the domain names analysis application collects domain names as follows.
  • the domain name source may include a domain name registry or a root zone server or both.
  • the domain names analysis application executes a procedure 2000 (Fig. 3) to submit a request to the domain name registry for a zone file (step 2010), download the requested zone file (step 2020), and extract domain names from the requested zone file (step 2030).
  • the zone file is downloaded by use of a binary transfer procedure known as an FTP transfer.
  • Fig. 11 illustrates an example of a portion of a zone file and extracted domain names. The example is divided into sections.
  • the first line includes the domain name (in the first column) and its corresponding domain name server (in the last column), and the next line lists the domain name server (in the first column) and its actual IP address (in the last column). If the domain name has more than one domain name server, the section may include additional lines, each including name and IP address information for another domain name
  • domain names are collected from a root zone server as follows. First, the root zone server (e.g., f.root-servers.net) is selected and data from the root zone server is directed into a file; the following is a sequence in which the F root server is selected and the F root server is directed to unload all data that
  • root zone servers are responsible for different domain name extensions (e.g., "com”, “net”, “edu”, “ca”, “uk")
  • collecting a comprehensive list of domain names requires gathering domain name information from multiple root zone servers.
  • other root zone servers are identified by use of a "whois" command. For example, to identify a root zone server that is responsible for "ca", which is the domain name extension for Canada, the following command line is used: "whois ca-dom". The response generated in this case is "relay.cdnnet.ca". Domain names are gathered from the "relay.cdnnet.ca" server by using a variation of the process described above, which variation uses "relay.cdnnet.ca" in place of "f.root-servers.net".
  • the domain names analysis application executes a procedure 4000 (Fig. 5) to attempt to acquire the IP address associated with the domain name (step 4010) and, if the IP address is acquired, to attempt, by a request known as an HTTP protocol query, to retrieve an HTML header from a server having the IP address (step 4020).
  • a prefix "www.” is added to the domain name to form a URL, and the URL is handled much as a typical URL is handled by Web browser software.
  • a protocol known as telnet is used to attempt to connect to a Web server and retrieve an HTML header.
  • If both attempts succeed, the domain name is determined to be a live domain name (step 4030). If either attempt fails, the domain name is determined not to be a live domain name (step 4040). In the case of the attempt to retrieve the HTML header, failure takes the form of a timing out, because the domain names analysis application fails to receive any response. If the Web site returns an error page, it does not qualify as a failure, because the error page includes the HTML header. In such a case, the Web site is functioning, but its contents may be corrupted or may be blocked by security arrangements.
  • the live domain names are delivered to the search engine as a list that is added to the search engine's list of domain names to be searched for content to be recorded in the search engine's index. In a variation, all or some of the domain names are delivered after being sorted in accordance with a prioritization scheme that takes into account information, gathered from mapping and entity
  • the only domain names delivered may be domain names registered as being under the control of telecommunications companies or companies in a particular city.
  • Fig. 6 illustrates a variation 200 of the Internet analysis system that allows a user to produce research reports regarding Web activity in sectors of human activity.
  • System 200 includes a mapping database 12 (Fig. 7) that maps URLs or domain names 14 to entities 16 such as people, businesses, or government agencies, as described in more detail below.
  • the mapping database may indicate that any URL that begins with "http://www.uspto.gov" is for a Web page controlled by the U.S. Patent and Trademark Office, or that domain names "elmstdogs.com" and "elmstcats.com" are under the control of a company named Elm Street Pets, Inc.
  • System 200 also includes a Web activity analysis application 202 (described below) and an entity information database 28 (described below) that includes information such as geographic information about entities to which URLs or domain names are mapped in the mapping database.
  • the mapping database may use a unique identification number ("unique ID"), such as a 9-digit American Business Information ("ABI") number or a DUNS number, to identify an entity so that other information about the entity can be retrieved from the entity information database or elsewhere by searching under the unique ID.
  • ABI numbers are sponsored by infoUSA.
  • unique IDs from the mapping database may be used to search the entity information database to produce a subset of the mapping database that has records only for entities having a particular characteristic, such as a particular geographic location or between 1000 and 5000 employees.
  • each of the entities may be assigned different unique IDs, and the different unique IDs may be linked in the mapping database to note the relationship among the entities. For example, a company that has offices in different locations may be assigned a unique ID for the company itself and a respective different unique ID for each location. In another example, when two previously unrelated companies merge or one is acquired by the other, each may
  • Information in the mapping database may be derived from information submitted by or on behalf of the entity when a domain name is registered. For example, when the company Elm Street Pets, Inc. registers the domain names
  • the entity may submit information to the mapping database in other ways such as in an on-line questionnaire that feeds the mapping database.
  • Information in the mapping database may be derived from information provided by an intermediary such as an ISP or an Internet portal.
  • an ISP having a domain name "isp321.com" may have a customer Maple Street Plumbing for which the ISP hosts and administers a home page having a home page address "www.isp321.com/~maplestplumb".
  • the ISP may have name, address, and telephone number information for the purpose of billing Maple Street Plumbing for such hosting and administration, and may allow such information along with the home page address to be used to link the home page address to Maple Street Plumbing in the mapping database.
  • an Internet portal may allow an entity such as Maple
  • the Internet portal may allow information in the entry, and perhaps
  • the mapping database, the entity information database, and the Web analysis application allow a report such as report 204 (Fig. 8) to be generated that shows, in absolute numbers and as a percentage, how many entities in an industrial sector are registered as controlling one or more Web sites.
  • the entities included in the report may also or instead be limited by geographical area or by any other attribute stored for entities in the entity information database (Figs. 9A-9B).
  • the Web analysis application executes a procedure 5000 (Fig. 10) to search the entity information database for entities that match an industrial sector code such as an SIC code or a North American Industry Classification System ("NAICS") code ("sector entities") (step 5010), determine from the mapping database which of the sector entities are registered as controlling one or more Web sites ("Web entities") (step 5020), and account for each of the sector entities and Web entities in the report (step 5030), such as by presenting quantities for sector entities and Web entities and indicating
  • Web entities can be tracked over time to demonstrate the growth in the number or
  • mapping database and applications based on the mapping database may take advantage of a hierarchical organization of Web pages, by treating similarly a mapped page and all pages below the mapped page, such as pages sharing a particular prefix with the mapped page. For example, all pages sharing
  • the mapping database may map an entity to Web pages maintained at different Web sites. For example, Maple Street Plumbing may have a first set of
  • the entity information database may include a database such as EDGAR that includes information about companies.
  • One or more of the databases referenced above may be or include a relational database and may have records to which fields may be added readily.
  • any of many different types of computer equipment may be used.
  • one or more Intel-based personal computers may be used that run an SQL database on Linux and programs written in Perl or the C programming language with interfaces to the SQL database.
  • an operating system such as Unix, Linux, Microsoft Windows 95, 98, or NT, or Macintosh OS, that each include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device such as a keyboard, and at least one output
  • Program code is applied to data entered using the input device to perform the technique described above and to generate output information.
  • output information is applied to one or more output devices such as a display screen of the computer.
  • a high level procedural or object-oriented programming language such as Perl, C, C++, or Java to communicate with a computer system.
  • the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language.
  • each such computer program is stored on a storage medium or device, such as ROM or optical or magnetic disc, that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described in this document.
  • the system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so
  • the user may be a human being or a non-human entity such as a computer program or an automated device that may interact with one or more of
  • databases may serve as the entity information database, which may take the form of any mechanism that provides automated access to information, such as a
  • System 110 may also refer to the mapping and entity information databases before reporting the live domain names to the search engine.
  • system 110 can retrieve entity information relating to the live domain names, and can sort the live domain names according to the entity information, such as by listing first the live domain names that pertain to an industry that is indicated as being particularly relevant to the search engine or users of the search engine.
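The sorting described in the last item above can be sketched as a key-based sort. This is a minimal Python sketch, not the patent's implementation: the dictionaries standing in for the mapping and entity information databases, the `sector` and `employees` attribute names, and the tie-breaking rule (larger entities first, then alphabetical) are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Set


def prioritize_live_domains(
    live_domains: Iterable[str],
    domain_to_entity: Dict[str, str],          # stand-in mapping database: domain -> unique ID
    entity_info: Dict[str, Dict[str, Any]],    # stand-in entity database: unique ID -> attributes
    preferred_sectors: Set[str],
) -> List[str]:
    """Order live domain names so that entities in preferred sectors come first.

    Within each group, larger entities (by the assumed "employees" attribute) come
    first, and ties are broken alphabetically so the output is deterministic.
    Domain names with no mapping entry fall into the non-preferred group with
    zero employees.
    """
    def sort_key(domain: str):
        info = entity_info.get(domain_to_entity.get(domain, ""), {})
        preferred = info.get("sector") in preferred_sectors
        employees = int(info.get("employees", 0))
        return (0 if preferred else 1, -employees, domain)

    return sorted(live_domains, key=sort_key)
```

Filtering instead of sorting (for example, delivering only domain names controlled by telecommunications companies) would replace `sorted` with a comprehension over the same lookup.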

Abstract

A domain name is acquired from a domain name registry. It is determined whether the domain name represents any functioning Web site. If the domain name is associated with at least one functioning Web site, it is recorded that the domain name represents at least one functioning Web site. A set of criteria is acquired. A set of entities is identified that meet the criteria. It is determined how many entities in the set of entities are registered as having control over at least one Web site.
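The first step of the abstract, acquiring domain names from a registry, amounts to extracting names from a zone file organized in sections: a line pairing a domain name (first column) with its name server (last column), followed by a line pairing that name server with its IP address. The sketch below assumes common DNS master-file conventions (owner name first, optional TTL and class, record type, data last, relative names completed with a `com` origin, `;` comments); Fig. 11 is not reproduced here, so the exact columns are an assumption, and continuation lines that inherit the previous owner name are not handled. The sample names and documentation-range IP address are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def qualify(name: str, origin: str) -> str:
    """Lower-case a zone-file name; relative names get the zone origin appended."""
    name = name.lower()
    return name[:-1] if name.endswith(".") else f"{name}.{origin}"


def parse_zone_sections(text: str, origin: str = "com") -> Dict[str, List[Tuple[str, Optional[str]]]]:
    """Map each delegated domain name to its (name server, IP address) pairs."""
    servers_by_domain: Dict[str, List[str]] = defaultdict(list)
    ip_by_server: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # ';' starts a comment in master files
        if not line:
            continue
        tokens = line.split()
        # The record type sits between the owner name (first) and the data (last).
        rtype = next((t.upper() for t in tokens[1:-1] if t.upper() in ("NS", "A")), None)
        if rtype == "NS":    # section's first line: domain name -> name server
            servers_by_domain[qualify(tokens[0], origin)].append(qualify(tokens[-1], origin))
        elif rtype == "A":   # next line: name server -> IP address
            ip_by_server[qualify(tokens[0], origin)] = tokens[-1]
    # Name servers without a glue address in this file get None for their IP.
    return {
        domain: [(ns, ip_by_server.get(ns)) for ns in servers]
        for domain, servers in servers_by_domain.items()
    }


sample = """
ISP321 NS NS1.ISP321.COM.
NS1.ISP321.COM. A 192.0.2.10
ELMSTDOGS NS NS1.ISP321.COM.
"""
print(sorted(parse_zone_sections(sample)))  # the extracted domain names
```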

Description

ANALYZING INTERNET-BASED INFORMATION
Cross-Reference to Related Applications
This application claims the benefit of United States Provisional Application Serial No. 60/097029 entitled "Collecting, Combining, Analyzing, and Using Internet and Business Information" filed on August 17, 1998, which is incorporated herein.
Background of the Invention
This application relates to analyzing Internet-based information. World-Wide Web ("Web") protocols call for text-based addressing information, which is highly suitable for human users, to be converted to number-based addressing information, which is highly suitable for computers. Much of the information available on the Web is organized into Web pages that can be retrieved and displayed by Web browser software ("browser") under the direction of a user. Each of the Web pages is identifiable by a respective Uniform Resource Locator text string ("URL"), such as "http://www.isp321.com/frontpage.html", that the browser can use to select the page. Each URL includes a domain name, such as "isp321.com", that identifies the Web site where the corresponding Web page is stored for retrieval by browser software. Each domain name is registered by an entity that controls the corresponding Web site and Web pages.
A domain name registry organization maintains the domain name registration information, which may include name, address, and other information that allows the organization to bill the entity for payment for the maintenance. (It is to be understood that the term "registry", as used herein, also refers to a domain name registrar or any other entity that may provide assistance in registering a domain name.) When the domain name is registered, the entity identifies a domain name server computer system that stores a numeric address (known as an IP address) that corresponds to the domain name, and the domain name registry stores the identity of the domain name server computer system together with the domain name in a file known as a zone file. The domain name registry also reports the domain name together with the identity of the domain name server computer system to a root zone server computer system, which is a high-level computer system that is responsible for helping other computer systems properly derive IP addresses from domain names (e.g., as described below). The root zone server computer system receives such reports from effectively all domain name registries as domain names are registered, and therefore the root zone server computer system has a comprehensive list of domain names that are registered on the Web.
When the Web browser is directed to retrieve information from a Web site identified by a URL, the browser must determine the IP address of the Web site to which the URL refers. If the browser submits the domain name part of the URL to the root zone server computer system, the root zone server computer system determines, based on information previously supplied by the domain name registry, the identity of the domain name server computer system for the domain name and reports the identity to the browser. The browser then refers to the domain name server computer system to find out the IP address, and uses the IP address to attempt to contact the Web site. If the Web site is functioning, the browser receives information including a Web page formatting header ("HTML header") from the Web site. If the Web site is not functioning, the browser receives no response from the Web site and times out, indicating an error.
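The functioning test described here, which procedure 4000 in the definitions list above applies to each domain name, can be approximated with the Python standard library. In this sketch, which is an assumption rather than the patent's code, an HTTP HEAD request stands in for the "HTTP protocol query", any HTTP response (including an error status) counts as a functioning site, and the request is sent to the IP address of the bare domain name with a `www.`-prefixed Host header.

```python
import http.client
import socket
from typing import Optional


def resolve_ip(domain: str) -> Optional[str]:
    """Try to acquire an IPv4 address for the domain name; None if resolution fails."""
    try:
        return socket.gethostbyname(domain)
    except (socket.gaierror, UnicodeError):
        return None


def fetch_header(ip: str, host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Send an HTTP HEAD request and report whether any response header came back."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("HEAD", "/", headers={"Host": host})
        conn.getresponse()  # a 404 or 500 still counts: the server answered
        return True
    except (OSError, http.client.HTTPException):
        return False  # refused, reset, timed out, or malformed reply
    finally:
        conn.close()


def is_live(domain: str, port: int = 80, timeout: float = 5.0) -> bool:
    """A domain name is live only if both the lookup and the header fetch succeed."""
    ip = resolve_ip(domain)
    if ip is None:
        return False
    return fetch_header(ip, "www." + domain, port, timeout)
```

With network access, `is_live("example.com")` resolves the name and issues the request; the `timeout` parameter is where the "times out indicating an error" case surfaces.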
An Internet service provider ("ISP") is an example of an entity that may have a registered domain name for a Web site. Typically, an ISP has customers such as individuals or businesses for whom the ISP stores Web pages on the Web site for retrieval by Web browser software. For example, the ISP may have a customer Maple Street Plumbing for which the ISP stores a home Web page having a URL that includes a prefix "http://www.isp321.com/~maplestplumb". A home Web page is typically the only or the primary entry point into a Web site or a set of Web pages that are under the control of an entity.
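A home-page prefix such as "http://www.isp321.com/~maplestplumb" is the kind of key the mapping database can use when it treats a mapped page and all pages below it alike, as noted in the definitions list above. A minimal sketch, assuming the mapping is an in-memory dictionary from URL prefix to entity name and that the longest matching prefix wins; the entry for the ISP itself is hypothetical.

```python
from typing import Dict, Optional


def _covers(prefix: str, url: str) -> bool:
    """True if url is the prefix itself or a page below it, not merely a longer name."""
    if not url.startswith(prefix):
        return False
    if len(url) == len(prefix) or prefix.endswith("/"):
        return True
    return url[len(prefix)] in "/?#"


def entity_for_url(url: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Return the entity mapped to the longest prefix that covers the URL, if any."""
    best_entity: Optional[str] = None
    best_len = -1
    for prefix, entity in prefix_map.items():
        if _covers(prefix, url) and len(prefix) > best_len:
            best_entity, best_len = entity, len(prefix)
    return best_entity


prefix_map = {
    "http://www.isp321.com/": "isp321.com (hypothetical ISP entry)",
    "http://www.isp321.com/~maplestplumb": "Maple Street Plumbing",
    "http://www.uspto.gov": "U.S. Patent and Trademark Office",
}
print(entity_for_url("http://www.isp321.com/~maplestplumb/services.html", prefix_map))
```

Longest-prefix matching lets a customer's home page override the ISP's own entry while every other page on the ISP's site still maps to the ISP.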
Another example of an entity that may have a registered domain name is a Web portal site such as "Yahoo.com" that maintains, in pages organized by categories, links to Web sites and home pages that are under the control of other entities. Typically, a Web portal site allows another entity to create a link from the Web portal site to the other entity's Web site or home page by submitting information to the Web portal site.
A Web search engine site ("search engine") maintains and updates a search engine database, i.e., a Web page record database, that includes a Web page record for every Web page that has been turned up by Web sweeping software that sweeps the World Wide Web for any and all Web pages. A typical Web page record includes a URL for the respective Web page, an excerpt or other subset of the information provided by the Web page, and a date indicating the most recent update of the Web page record. When a user directs a Web search engine to execute a search, the Web page record database is searched and then search engine results are displayed to the user in the form of a list of Web page records.
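The Web page record and the search over the record database can be modeled as follows. This is an illustrative sketch only: production search engines use inverted indexes and ranking, whereas here, as a simplifying assumption, a record matches when its excerpt contains every query term, and matches are listed with the most recently updated record first.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class WebPageRecord:
    url: str        # URL of the respective Web page
    excerpt: str    # excerpt or other subset of the page's information
    updated: date   # most recent update of this record


def search_records(records: Iterable[WebPageRecord], query: str) -> List[WebPageRecord]:
    """Return records whose excerpt contains every query term, newest first."""
    terms = query.lower().split()
    hits = [r for r in records if all(t in r.excerpt.lower() for t in terms)]
    return sorted(hits, key=lambda r: r.updated, reverse=True)
```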
The Web sweeping software discovers information on the Web, including domain names previously unknown to the search engine, by following links among Web pages. Some information about an entity may not be available on a Web site that is under the control of the entity. For example, public financial information about a company may be stored in a database that is not linked to the company's Web site or is not directly accessible by Web browser software, such as a database under the control of a financial services firm. In general, statistical information regarding Web activity for companies or other entities in a particular sector of human activity, such as an industrial sector, is expressed in broad terms such as the total number of uniquely qualified domain name Web sites ("unique Web sites") that are sponsored by all of the companies in a particular industrial sector. Such statistics may prove misleading for at least some purposes. For example, if ten companies belong to a sector that is known to have ten unique Web sites, the resulting average, i.e., one unique Web site per company, can make the sector appear to be well represented by unique Web sites, even if in actuality all ten unique Web sites belong to only one of the companies and none of the other nine companies has a unique Web site.
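The arithmetic in this example can be made concrete. The following sketch contrasts the broad per-company average with a per-entity measure, the share of companies that control at least one unique Web site; the company names are placeholders.

```python
from typing import Dict


def naive_sites_per_company(sites: Dict[str, int]) -> float:
    """Total unique Web sites divided by the number of companies in the sector."""
    return sum(sites.values()) / len(sites)


def share_with_a_site(sites: Dict[str, int]) -> float:
    """Fraction of companies in the sector that have at least one unique Web site."""
    return sum(1 for count in sites.values() if count > 0) / len(sites)


# The ten-company example from the text: one company holds all ten unique Web sites.
sector = {"Company 1": 10, **{f"Company {i}": 0 for i in range(2, 11)}}
print(naive_sites_per_company(sector))  # 1.0
print(share_with_a_site(sector))        # 0.1
```

The average suggests full coverage, while the per-company share shows that nine of the ten companies have no unique Web site.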
Summary of the Invention
Methods and systems are provided for analyzing Internet-based information. An Internet analysis system is provided that gathers domain names and determines whether the domain names are associated with functioning Web sites. A variation of the Internet analysis system that includes an entity information database and a mapping database is able to generate reports regarding Web activity in sectors of human activity such as industrial sectors. Different aspects of the invention allow one or more of the following. A comprehensive list of tested domain names can be produced. Domain names for Web sites that are difficult or impossible for a search engine to discover can be made available to the search engine to allow the search engine to produce search results that account for the contents of the previously undiscovered Web sites. Before being provided to the search engine, the domain names may be prioritized or sorted according to one or more attributes (such as industry sector or company size) of the respective entities that are registered as having control over the domain names. Highly useful statistics can be produced concerning the number of entities in an industrial sector that are registered as having control over Web sites. Such statistics can be used for highly effective marketing or sales approaches in which Web-oriented products are targeted at potential customers in industrial sectors that are shown by the statistics to have substantial Web activity.
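The sector statistic described here corresponds to procedure 5000 in the definitions list above (steps 5010 to 5030). A minimal sketch, assuming in-memory stand-ins for the entity information and mapping databases; the `Entity` fields and the optional employee-size filter (echoing the 1000 to 5000 employee example) are illustrative, not a schema taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Entity:
    unique_id: str     # e.g. an ABI or DUNS number
    name: str
    sector_code: str   # e.g. an SIC or NAICS code
    employees: int


def sector_report(
    entities: Iterable[Entity],
    web_sites_by_id: Dict[str, Set[str]],   # stand-in mapping database: unique ID -> domain names
    sector_code: str,
    min_employees: int = 0,
    max_employees: Optional[int] = None,
) -> Tuple[int, int, float]:
    """Return (sector entities, Web entities, percentage of sector entities with a Web site)."""
    # Step 5010: select entities matching the sector code and the optional size range.
    sector = [
        e for e in entities
        if e.sector_code == sector_code
        and e.employees >= min_employees
        and (max_employees is None or e.employees <= max_employees)
    ]
    # Step 5020: keep those registered as controlling at least one Web site.
    web = [e for e in sector if web_sites_by_id.get(e.unique_id)]
    # Step 5030: account for both counts, in absolute numbers and as a percentage.
    pct = 100.0 * len(web) / len(sector) if sector else 0.0
    return len(sector), len(web), pct
```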
Other features and advantages will become apparent from the following description, including the drawings, and from the claims.
Brief Description of the Drawings
Figs. 1, 6, and 7 are block diagrams of computer-based systems.
Figs. 2, 3, 4, 5, and 10 are flow diagrams of computer-based procedures.
Figs. 8 and 11 are illustrations of output produced by software.
Figs. 9A-9B are illustrations of computer data.
Detailed Description
Fig. 1 illustrates an Internet analysis system 110 in which a domain names analysis application 112 executes a procedure 1000 (Fig. 2) to collect domain names 114 from a domain name source 116 (step 1010), test the domain names to determine which of the domain names are domain names that correspond to functioning Web sites ("live domain names" 115) (step 1020), and deliver live domain names to a search engine 116 for use in searching the Web (step 1030).
The domain names analysis application collects domain names as follows. The domain name source may include a domain name registry, a root zone server, or both. To collect domain names from the domain name registry, the domain names analysis application executes a procedure 2000 (Fig. 3) to submit a request to the domain name registry for a zone file (step 2010), download the requested zone file (step 2020), and extract domain names from the requested zone file (step 2030). In a specific embodiment, the zone file is downloaded by FTP (File Transfer Protocol) in binary mode. Fig. 11 illustrates an example of a portion of a zone file and extracted domain names. The example is divided into sections. In a typical section, as shown in the example, the first line includes the domain name (in the first column) and its corresponding domain name server (in the last column), and the next line lists the domain name server (in the first column) and its actual IP address (in the last column). If the domain name has more than one domain name server, the section may include additional lines, each including name and IP address information for another domain name server. After the zone file is downloaded, the domain names are extracted and duplicate domain names are removed.
To collect domain names from the root zone server, the domain names analysis application executes a procedure 3000 (Fig. 4) to request domain name information record by record from the root zone server (step 3010) and extract the domain names from the domain name information (step 3020). In a specific embodiment, domain names are collected from a root zone server as follows. First, a root zone server (e.g., f.root-servers.net) is selected and data from the root zone server is directed into a file. In the following sequence, the F root server is selected and directed to list all records that end in "com" into a file called "com.txt".
> nslookup
> server f.root-servers.net
> ls com > com.txt
Next, a Perl program is executed to extract the domain names from the file in accordance with the principles described above in connection with the zone file example. Finally, the process, with appropriate changes, is repeated as necessary to collect other domain names. Since different root zone servers are responsible for different domain name extensions (e.g., "com", "net", "edu", "ca", "uk"), collecting a comprehensive list of domain names requires gathering domain name information from multiple root zone servers. In a specific embodiment, other root zone servers are identified by use of a "whois" command. For example, to identify a root zone server that is responsible for "ca", which is the domain name extension for Canada, the following command line is used:
> whois ca-dom
The response generated in this case is "relay.cdnnet.ca". Domain names are gathered from the "relay.cdnnet.ca" server by a variation of the process described above that uses "relay.cdnnet.ca" in place of "f.root-servers.net".
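The extraction step can be sketched as follows. The specific embodiment uses a Perl program; this Python version is only an illustration under stated assumptions. It expects simplified DNS master-file lines (owner name in the first column, record type in the second-to-last column, indented lines reusing the previous owner), treats names without a trailing dot as relative to an origin passed in by the caller, and keeps only names that own NS records, so that the name-server address lines in each section are not counted as domain names. The function name extract_domain_names and the sample records are hypothetical.

```python
def extract_domain_names(zone_text, origin="com"):
    """Return the sorted, de-duplicated domain names that own NS records."""
    domains = set()
    owner = None
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].rstrip()  # drop comments and trailing space
        if not line or line.lstrip().startswith("$"):
            continue  # skip blank lines and $ORIGIN/$TTL directives (origin is passed in)
        fields = line.split()
        if not line[0].isspace():
            owner = fields[0]  # a new owner name starts in column one
        if owner in (None, "@") or len(fields) < 2 or fields[-2].upper() != "NS":
            continue  # skip zone-apex records, name-server address (A) lines, other types
        # Absolute names end with "."; relative names are completed with the origin.
        name = owner[:-1] if owner.endswith(".") else f"{owner}.{origin}"
        domains.add(name.lower())  # the set removes duplicate domain names
    return sorted(domains)


sample = """\
$ORIGIN COM.
; section for EXAMPLE.COM, which has two name servers
EXAMPLE NS NS1.EXAMPLE
        NS NS2.EXAMPLE
NS1.EXAMPLE A 192.0.2.1
NS2.EXAMPLE A 192.0.2.2
ELMSTDOGS NS NS1.ISP321
ELMSTDOGS NS NS2.ISP321
"""
print(extract_domain_names(sample))  # ['elmstdogs.com', 'example.com']
```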
To test a domain name, the domain names analysis application executes a procedure 4000 (Fig. 5) to attempt to acquire the IP address associated with the domain name (step 4010) and, if the IP address is acquired, to attempt, by an HTTP query, to retrieve an HTML header from a server having the IP address (step 4020). In a specific implementation, the prefix "www." is added to the domain name to form a URL, and the URL is handled much as a typical URL is handled by Web browser software. For example, the telnet protocol is used to attempt to connect to a Web server and retrieve an HTML header. The following command lines illustrate an example for the case of the "uspto.gov" domain name:
> telnet www.uspto.gov 80
> dump
If both attempts 4010, 4020 succeed, the domain name is determined to be a live domain name (step 4030). If either attempt fails, the domain name is determined not to be a live domain name (step 4040). In the case of the attempt to retrieve the HTML header, failure takes the form of a timeout: the domain names analysis application receives no response. If the Web site returns an error page, the attempt does not count as a failure, because the error page includes the HTML header. In such a case, the Web site is functioning, but its contents may be corrupted or may be blocked by security arrangements.
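Procedure 4000 can be sketched in Python as shown below. This is an illustration, not the described implementation: it substitutes an HTTP HEAD request made with the standard http.client module for the telnet session, and the names is_live and _head_status are hypothetical. The resolver and the header fetcher are parameters so that the classification logic can be exercised without network access.

```python
import http.client
import socket


def is_live(domain, resolve=socket.gethostbyname, fetch_status=None, timeout=10.0):
    """Classify a domain name as live (True) or not live (False).

    Step 4010: resolve "www." + domain to an IP address.
    Step 4020: request only the response header from a Web server at that address.
    Any HTTP response, including an error status such as 404, counts as live;
    a resolution failure, connection failure, or timeout does not.
    """
    host = "www." + domain
    try:
        ip_address = resolve(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    fetch = fetch_status or _head_status
    try:
        fetch(host, ip_address, timeout)
    except (OSError, http.client.HTTPException):  # socket timeouts are OSError subclasses
        return False
    return True


def _head_status(host, ip_address, timeout):
    """Send a HEAD request to ip_address on port 80 and return the HTTP status code."""
    conn = http.client.HTTPConnection(ip_address, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/", headers={"Host": host})
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling is_live("uspto.gov") with the defaults performs a real DNS lookup and HEAD request, so its result depends on the network and on the site's current configuration.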
The live domain names are delivered to the search engine as a list that is added to the search engine's list of domain names to be searched for content to be recorded in the search engine's index. In a variation, all or some of the domain names are delivered after being sorted in accordance with a prioritization scheme that takes into account information, gathered from the mapping and entity information databases (described below), pertaining to the respective entities that are registered as having control over the domain names. For example, the only domain names delivered may be domain names registered as being under the control of telecommunications companies or of companies in a particular city.
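A prioritization scheme of this kind might look like the following Python sketch. The dictionaries standing in for the mapping and entity information databases, the attribute names (sector, city, employees), the function prioritize, and the sample domains other than elmstdogs.com are all hypothetical. The source does not specify an ordering rule, so ranking by employee count is only one possible choice.

```python
def prioritize(live_domains, domain_to_entity, entities, keep, rank):
    """Filter and order live domain names by attributes of their controlling entities.

    domain_to_entity: stand-in for the mapping database (domain name -> unique ID)
    entities: stand-in for the entity information database (unique ID -> attributes)
    keep(attrs) -> bool selects which domain names are delivered at all
    rank(attrs) -> sort key; smaller keys are delivered first
    Domain names with no mapped entity are dropped.
    """
    selected = []
    for domain in live_domains:
        attrs = entities.get(domain_to_entity.get(domain))
        if attrs is not None and keep(attrs):
            selected.append((rank(attrs), domain))
    return [domain for _, domain in sorted(selected)]


live = ["elmstdogs.com", "telco-a.com", "telco-b.com", "unmapped.com"]
mapping = {"elmstdogs.com": 1, "telco-a.com": 2, "telco-b.com": 3}
info = {
    1: {"sector": "pet supplies", "city": "Boston", "employees": 12},
    2: {"sector": "telecommunications", "city": "Boston", "employees": 4000},
    3: {"sector": "telecommunications", "city": "Denver", "employees": 900},
}
order = prioritize(
    live, mapping, info,
    keep=lambda a: a["sector"] == "telecommunications",
    rank=lambda a: -a["employees"],  # largest companies first
)
print(order)  # ['telco-a.com', 'telco-b.com']
```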
Fig. 6 illustrates a variation 200 of the Internet analysis system that allows a user to produce research reports regarding Web activity in sectors of human activity. System 200 includes a mapping database 12 (Fig. 7) that maps URLs or domain names 14 to entities 16 such as people, businesses, or government agencies, as described in more detail below. For example, the mapping database may indicate that any URL that begins with "http://www.uspto.gov" is for a Web page controlled by the U.S. Patent and Trademark Office, or that the domain names "elmstdogs.com" and "elmstcats.com" are under the control of a company named Elm Street Pets, Inc. System 200 also includes a Web activity analysis application 202 (described below) and an entity information database 28 (described below) that includes information, such as geographic information, about entities to which URLs or domain names are mapped in the mapping database. The mapping database may use a unique identification number ("unique ID"), such as a 9-digit American Business Information ("ABI") number or a DUNS number, to identify an entity so that other information about the entity can be retrieved from the entity information database or elsewhere by searching under the unique ID. (ABI numbers are sponsored by infoUSA.) For example, unique IDs from the mapping database may be used to search the entity information database to produce a subset of the mapping database that has records only for entities having a particular characteristic, such as a particular geographic location or between 1000 and 5000 employees.
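The subset query described above could be expressed in SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layouts, column names, function name, and 9-digit IDs are hypothetical and are not actual ABI or DUNS numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mapping (domain TEXT PRIMARY KEY, entity_id TEXT NOT NULL);
    CREATE TABLE entity_info (
        entity_id TEXT PRIMARY KEY, name TEXT, city TEXT, employees INTEGER);
""")
conn.executemany("INSERT INTO mapping VALUES (?, ?)", [
    ("elmstdogs.com", "100000001"),
    ("elmstcats.com", "100000001"),
    ("bigco.com", "100000002"),
])
conn.executemany("INSERT INTO entity_info VALUES (?, ?, ?, ?)", [
    ("100000001", "Elm Street Pets, Inc.", "Springfield", 25),
    ("100000002", "BigCo", "Springfield", 2500),
])


def mapping_subset(conn, min_employees, max_employees):
    """Return mapping rows whose entity has an employee count in the inclusive range."""
    return conn.execute(
        """SELECT m.domain, m.entity_id
           FROM mapping AS m JOIN entity_info AS e ON m.entity_id = e.entity_id
           WHERE e.employees BETWEEN ? AND ?
           ORDER BY m.domain""",
        (min_employees, max_employees),
    ).fetchall()


print(mapping_subset(conn, 1000, 5000))  # [('bigco.com', '100000002')]
```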
With respect to the mapping database, where an entity constitutes a portion of another entity, each entity may be assigned its own unique ID, and the different unique IDs may be linked in the mapping database to note the relationship among the entities. For example, a company that has offices in different locations may be assigned a unique ID for the company itself and a different unique ID for each location. In another example, when two previously unrelated companies merge or one is acquired by the other, each may retain its unique ID and a new, different unique ID may be assigned to the combination of the two companies, or both companies may be assigned the same unique ID.
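One way to record such links, assumed here for illustration only, is to store each subordinate unique ID together with the unique ID of the entity it belongs to. The Python sketch below (the IDs and function names are hypothetical) resolves a location or merged company to its top-level entity and lists every unique ID beneath a given entity. It models only the linked-ID alternative, not the case in which merged companies share a single unique ID.

```python
def root_entity(entity_id, parent_of):
    """Follow links upward until reaching an ID that belongs to no other entity."""
    seen = set()
    while entity_id in parent_of:
        if entity_id in seen:
            raise ValueError(f"cycle in entity links at {entity_id}")
        seen.add(entity_id)
        entity_id = parent_of[entity_id]
    return entity_id


def members(top_id, parent_of):
    """Return top_id and every unique ID linked beneath it, sorted."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    found, stack = set(), [top_id]
    while stack:
        current = stack.pop()
        if current in found:
            continue  # guard against revisiting an ID
        found.add(current)
        stack.extend(children.get(current, []))
    return sorted(found)


# Hypothetical IDs: company C1 has two locations; C1 and C2 later merge into M1.
parent_of = {"C1-BOSTON": "C1", "C1-DENVER": "C1", "C1": "M1", "C2": "M1"}
print(root_entity("C1-DENVER", parent_of))  # M1
print(members("C1", parent_of))  # ['C1', 'C1-BOSTON', 'C1-DENVER']
```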
Information in the mapping database may be derived from information submitted by or on behalf of the entity when a domain name is registered. For example, when the company Elm Street Pets, Inc. registers the domain names "elmstdogs.com" and "elmstcats.com" with a domain name registry, the company associates the domain names with at least enough information, such as name, address, and telephone number information, to allow the domain name registry to bill the company for maintenance of the registration. The entity may also submit information to the mapping database in other ways, such as through an on-line questionnaire that feeds the mapping database.
Information in the mapping database may be derived from information provided by an intermediary such as an ISP or an Internet portal. For example, an ISP having a domain name "isp321.com" may have a customer Maple Street Plumbing for which the ISP hosts and administers a home page having a home page address "www.isp321.com/~maplestplumb". In such a case, the ISP may have name, address, and telephone number information for the purpose of billing Maple Street Plumbing for such hosting and administration, and may allow such information along with the home page address to be used to link the home page address to Maple Street Plumbing in the mapping database.
In another example, an Internet portal may allow an entity such as Maple Street Plumbing to create an entry or listing named "Maple Street Plumbing" in a "plumbing" section of an on-line directory maintained by the portal, to allow a user to view the home page "www.isp321.com/~maplestplumb" by selecting the entry. In such a case, the Internet portal may allow information in the entry, and perhaps any address and telephone number information submitted by the entity during creation of the entry, to be used to link the home page to Maple Street Plumbing in the mapping database.
The mapping database, the entity information database, and the Web analysis application allow a report such as report 204 (Fig. 8) to be generated that shows, in absolute numbers and as a percentage, how many entities in an industrial sector are registered as controlling one or more Web sites. The entities included in the report may also or instead be limited by geographical area or by any other attribute stored for entities in the entity information database (Figs. 9A-9B illustrate a list of such attributes). To generate the report, the Web analysis application executes a procedure 5000 (Fig. 10) to search the entity information database for entities that match an industrial sector code, such as a Standard Industrial Classification ("SIC") code or a North American Industry Classification System ("NAICS") code ("sector entities") (step 5010), determine from the mapping database which of the sector entities are registered as controlling one or more Web sites ("Web entities") (step 5020), and account for each of the sector entities and Web entities in the report (step 5030), such as by presenting quantities for sector entities and Web entities and indicating the number of Web entities as a percentage of the sector entities.
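Procedure 5000 can be sketched in Python as follows. The dictionary standing in for the entity information database, the set standing in for the result of the mapping-database lookup, the IDs, and the function name sector_report are assumptions for illustration. Matching sectors by NAICS code prefix (for example, "238" for specialty trade contractors) is also an assumption, chosen because NAICS codes are hierarchical; the optional extra_filter shows how a report could be limited by another attribute such as size.

```python
def sector_report(entities, web_entity_ids, sector_code, extra_filter=None):
    """Count sector entities and Web entities, and express one as a percentage of the other.

    entities: unique ID -> attribute dict containing at least a "naics" code
    web_entity_ids: unique IDs that the mapping database links to one or more Web sites
    """
    # Step 5010: entities whose NAICS code falls under sector_code (and any extra filter).
    sector_ids = {
        entity_id
        for entity_id, attrs in entities.items()
        if attrs["naics"].startswith(sector_code)
        and (extra_filter is None or extra_filter(attrs))
    }
    # Step 5020: sector entities registered as controlling at least one Web site.
    web_ids = sector_ids & web_entity_ids
    # Step 5030: quantities, plus Web entities as a percentage of sector entities.
    percent = 100.0 * len(web_ids) / len(sector_ids) if sector_ids else None
    return {"sector_entities": len(sector_ids), "web_entities": len(web_ids), "percent": percent}


entities = {
    "E1": {"naics": "238220", "employees": 8},
    "E2": {"naics": "238220", "employees": 40},
    "E3": {"naics": "238210", "employees": 300},
    "E4": {"naics": "517311", "employees": 4000},
}
web_entity_ids = {"E1", "E3", "E4"}

report = sector_report(entities, web_entity_ids, "238")
print(f"{report['web_entities']} of {report['sector_entities']} ({report['percent']:.1f}%)")
# prints: 2 of 3 (66.7%)
small = sector_report(entities, web_entity_ids, "238", extra_filter=lambda a: a["employees"] < 100)
print(small["percent"])  # 50.0
```

Running the same report on snapshots of these inputs taken at different dates would give the time-based view of online penetration described next.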
Other reports, such as time-based reports, can also be generated by the Web analysis application. For example, the percentage of sector entities that are Web entities can be tracked over time to demonstrate the growth in the number or percentage of entities that are registered as controlling one or more Web sites ("online penetration"). By limiting the entities in the report by entity size (e.g., number of employees) or by another attribute (e.g., one obtained from the entity information database), the report can demonstrate other aspects of Web activity, such as the difference in online penetration among large, medium, and small companies, or which industrial sectors have the most or least online penetration.
The mapping database and applications based on the mapping database may take advantage of the hierarchical organization of Web pages by treating a mapped page and all pages below it in the same way, such as pages sharing a particular prefix with the mapped page. For example, all pages sharing the prefix "http://www.isp321.com" may be treated as being under the control of an ISP named Global ISP Co.
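Treating every page below a mapped page as belonging to the same entity amounts to prefix matching. The Python sketch below illustrates this under two assumptions not stated in the source: when several mapped prefixes match, the longest (most specific) one wins, so a customer page hosted by an ISP can be mapped to the customer rather than to the ISP; and a prefix matches only at a path boundary, so "http://www.isp321.com" does not cover "http://www.isp321.com.example/". The function names controlling_entity and _covers are hypothetical.

```python
def _covers(prefix, url):
    """True if url is the mapped page itself or lies below it."""
    if not url.startswith(prefix):
        return False
    rest = url[len(prefix):]
    return rest == "" or prefix.endswith("/") or rest[0] in "/?#"


def controlling_entity(url, prefix_map):
    """Return the entity mapped to the longest covering prefix, or None."""
    matches = [prefix for prefix in prefix_map if _covers(prefix, url)]
    if not matches:
        return None
    return prefix_map[max(matches, key=len)]


prefix_map = {
    "http://www.isp321.com": "Global ISP Co.",
    "http://www.isp321.com/~maplestplumb": "Maple Street Plumbing",
}
print(controlling_entity("http://www.isp321.com/~maplestplumb/prices.html", prefix_map))
# prints: Maple Street Plumbing
print(controlling_entity("http://www.isp321.com/news.html", prefix_map))
# prints: Global ISP Co.
```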
The mapping database may map an entity to Web pages maintained at different Web sites. For example, Maple Street Plumbing may have a first set of Web pages at the Global ISP Co. site and a second set of Web pages at another ISP's site.
The entity information database may include a database, such as the U.S. Securities and Exchange Commission's EDGAR database, that includes information about companies.
Information in the mapping database or the entity information database may allow searches to be limited by the relative size of entities, such as size within an industry. One or more of the databases referenced above may be or include a relational database and may have records to which fields may be added readily.
Any of many different types of computer equipment may be used. For example, one or more Intel-based personal computers may be used that run an SQL database on Linux and that run programs, written in Perl or the C programming language, with interfaces to the SQL database.
The technique (i.e., the procedures described above) may be implemented in hardware or software, or a combination of both. In at least some cases, it is advantageous if the technique is implemented in computer programs executing on one or more programmable computers, such as a personal computer running or able to run an operating system such as Unix, Linux, Microsoft Windows 95, 98, or NT, or Macintosh OS, that each include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device such as a keyboard, and at least one output device. Program code is applied to data entered using the input device to perform the technique described above and to generate output information. The output information is applied to one or more output devices such as a display screen of the computer.
In at least some cases, it is advantageous if each program is implemented in a high-level procedural or object-oriented programming language such as Perl, C, C++, or Java to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language.
In at least some cases, it is advantageous if each such computer program is stored on a storage medium or device, such as ROM or optical or magnetic disc, that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described in this document. The system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner.
Other embodiments are within the scope of the following claims. For example, the user may be a human being or a non-human entity such as a computer program or an automated device that may interact with one or more of the databases or one or more of the applications via an application programming interface ("API") or a network message. An on-line information store or multiple databases may serve as the entity information database, which may take the form of any mechanism that provides automated access to information, such as a spreadsheet file or a store of email messages.
System 110 may also refer to the mapping and entity information databases before reporting the live domain names to the search engine. For example, by referring to the mapping and entity information databases, system 110 can retrieve entity information relating to the live domain names, and can sort the live domain names according to the entity information, such as by listing first the live domain names that pertain to an industry that is indicated as being particularly relevant to the search engine or users of the search engine.

Claims

What is claimed is:
1. A method comprising:
acquiring a domain name from a domain name registry;
determining whether the domain name represents any functioning Web site; and
if the domain name is associated with at least one functioning Web site, recording that the domain name represents at least one functioning Web site.
2. The method of claim 1, further comprising:
if it is indicated that the domain name represents at least one functioning Web site, submitting the domain name to a search engine.
3. The method of claim 1, further comprising:
identifying an entity that is registered as having control over the domain name;
determining whether the entity meets a set of prioritization criteria; and
submitting the domain name to a search engine only if the entity meets the set of prioritization criteria.
4. A method comprising:
acquiring a set of criteria;
identifying a set of entities that meet the criteria; and
determining how many entities in the set of entities are registered as having control over at least one Web site.
5. The method of claim 4, comprising:
deriving a statistic from a result of the determination.
6. Computer software, residing on a computer-readable storage medium, comprising a set of instructions for use in a computer system to cause the computer system to:
acquire a domain name from a domain name registry;
determine whether the domain name represents any functioning Web site; and
if the domain name is associated with at least one functioning Web site, record that the domain name represents at least one functioning Web site.
7. Computer software, residing on a computer-readable storage medium, comprising a set of instructions for use in a computer system to cause the computer system to:
acquire a set of criteria;
identify a set of entities that meet the criteria; and
determine how many entities in the set of entities are registered as having control over at least one Web site.
8. A system comprising:
an acquirer that acquires a domain name from a domain name registry;
a determiner that determines whether the domain name represents any functioning Web site; and
a recorder that, if the domain name is associated with at least one functioning Web site, records that the domain name represents at least one functioning Web site.
9. A system comprising:
an acquirer that acquires a set of criteria;
an identifier that identifies a set of entities that meet the criteria; and
a determiner that determines how many entities in the set of entities are registered as having control over at least one Web site.
PCT/US1999/018645 1998-08-17 1999-08-16 Analyzing internet-based information WO2000010107A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU55660/99A AU5566099A (en) 1998-08-17 1999-08-16 Analyzing internet-based information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US9702998P 1998-08-17 1998-08-17
US60/097,029 1998-08-17

Publications (1)

Publication Number Publication Date
WO2000010107A1 true WO2000010107A1 (en) 2000-02-24

Family

ID=22260433

Family Applications (4)

Application Number Title Priority Date Filing Date
PCT/US1999/018644 WO2000010106A1 (en) 1998-08-17 1999-08-16 Mapping information sources
PCT/US1999/018646 WO2000010108A1 (en) 1998-08-17 1999-08-16 Dynamically categorizing entity information
PCT/US1999/018645 WO2000010107A1 (en) 1998-08-17 1999-08-16 Analyzing internet-based information
PCT/US1999/018643 WO2000010105A1 (en) 1998-08-17 1999-08-16 Enhancing computer-based searching

Country Status (5)

Country Link
US (2) US6654813B1 (en)
EP (1) EP1105818A1 (en)
JP (2) JP2002522847A (en)
AU (4) AU5565899A (en)
WO (4) WO2000010106A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002013057A1 (en) * 2000-08-04 2002-02-14 Sharinga Networks Inc. Network address resolution
US6735585B1 (en) 1998-08-17 2004-05-11 Altavista Company Method for search engine generating supplemented search not included in conventional search result identifying entity data related to portion of located web page
US7254573B2 (en) * 2002-10-02 2007-08-07 Burke Thomas R System and method for identifying alternate contact information in a database related to entity, query by identifying contact information of a different type than was in query which is related to the same entity

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6654813B1 (en) * 1998-08-17 2003-11-25 Alta Vista Company Dynamically categorizing entity information
JP4460693B2 (en) * 1999-10-26 2010-05-12 富士通株式会社 Network system with information retrieval function
US8271316B2 (en) 1999-12-17 2012-09-18 Buzzmetrics Ltd Consumer to business data capturing system
CA2298194A1 (en) * 2000-02-07 2001-08-07 Profilium Inc. Method and system for delivering and targeting advertisements over wireless networks
AU2001255806A1 (en) * 2000-03-14 2001-09-24 Sony Electronics Inc. A method and device for forming a semantic description
US6968380B1 (en) 2000-05-30 2005-11-22 International Business Machines Corporation Method and system for increasing ease-of-use and bandwidth utilization in wireless devices
US6985933B1 (en) * 2000-05-30 2006-01-10 International Business Machines Corporation Method and system for increasing ease-of-use and bandwidth utilization in wireless devices
US6983379B1 (en) * 2000-06-30 2006-01-03 Hitwise Pty. Ltd. Method and system for monitoring online behavior at a remote site and creating online behavior profiles
US7747713B1 (en) * 2000-06-30 2010-06-29 Hitwise Pty. Ltd. Method and system for classifying information available on a computer network
WO2002003219A1 (en) 2000-06-30 2002-01-10 Plurimus Corporation Method and system for monitoring online computer network behavior and creating online behavior profiles
WO2002008940A2 (en) * 2000-07-20 2002-01-31 Johnson Rodney D Information archival and retrieval system for internetworked computers
NL1016379C2 (en) * 2000-07-25 2002-01-28 Alphonsus Albertus Schirris Information searching method for e.g. internet, uses synonyms or translations of inputted search terms
JP2002084561A (en) * 2000-09-06 2002-03-22 Nec Corp Connection system, connection method therefor, and recording medium in which connection program is recorded
JP4200645B2 (en) * 2000-09-08 2008-12-24 日本電気株式会社 Information processing apparatus, information processing method, and recording medium
US7197470B1 (en) * 2000-10-11 2007-03-27 Buzzmetrics, Ltd. System and method for collection analysis of electronic discussion methods
US7185065B1 (en) 2000-10-11 2007-02-27 Buzzmetrics Ltd System and method for scoring electronic messages
US7080101B1 (en) * 2000-12-01 2006-07-18 Ncr Corp. Method and apparatus for partitioning data for storage in a database
US20030061232A1 (en) * 2001-09-21 2003-03-27 Dun & Bradstreet Inc. Method and system for processing business data
US6763362B2 (en) 2001-11-30 2004-07-13 Micron Technology, Inc. Method and system for updating a search engine
US7792828B2 (en) 2003-06-25 2010-09-07 Jericho Systems Corporation Method and system for selecting content items to be presented to a viewer
US7756750B2 (en) 2003-09-02 2010-07-13 Vinimaya, Inc. Method and system for providing online procurement between a buyer and suppliers over a network
NO20035563D0 (en) * 2003-10-01 2003-12-12 Telenor Asa Method and system for obtaining improved subscriber information
US7725414B2 (en) 2004-03-16 2010-05-25 Buzzmetrics, Ltd An Israel Corporation Method for developing a classifier for classifying communications
US7536382B2 (en) * 2004-03-31 2009-05-19 Google Inc. Query rewriting with entity detection
US20060015401A1 (en) * 2004-07-15 2006-01-19 Chu Barry H Efficiently spaced and used advertising in network-served multimedia documents
US7523085B2 (en) 2004-09-30 2009-04-21 Buzzmetrics, Ltd An Israel Corporation Topical sentiments in electronically stored communications
US9158855B2 (en) 2005-06-16 2015-10-13 Buzzmetrics, Ltd Extracting structured data from weblogs
US20070100836A1 (en) * 2005-10-28 2007-05-03 Yahoo! Inc. User interface for providing third party content as an RSS feed
US20070100960A1 (en) * 2005-10-28 2007-05-03 Yahoo! Inc. Managing content for RSS alerts over a network
US20090094137A1 (en) * 2005-12-22 2009-04-09 Toppenberg Larry W Web Page Optimization Systems
US7624101B2 (en) 2006-01-31 2009-11-24 Google Inc. Enhanced search results
US7660783B2 (en) 2006-09-27 2010-02-09 Buzzmetrics, Inc. System and method of ad-hoc analysis of data
US20080313142A1 (en) * 2007-06-14 2008-12-18 Microsoft Corporation Categorization of queries
US9392074B2 (en) 2007-07-07 2016-07-12 Qualcomm Incorporated User profile generation architecture for mobile content-message targeting
US9497286B2 (en) 2007-07-07 2016-11-15 Qualcomm Incorporated Method and system for providing targeted information based on a user profile in a mobile environment
US9203911B2 (en) 2007-11-14 2015-12-01 Qualcomm Incorporated Method and system for using a cache miss state match indicator to determine user suitability of targeted content messages in a mobile environment
US20090177530A1 (en) 2007-12-14 2009-07-09 Qualcomm Incorporated Near field communication transactions in a mobile environment
US8347326B2 (en) 2007-12-18 2013-01-01 The Nielsen Company (US) Identifying key media events and modeling causal relationships between key events and reported feelings
KR100930617B1 (en) * 2008-04-08 2009-12-09 한국과학기술정보연구원 Multiple object-oriented integrated search system and method
US20090327223A1 (en) * 2008-06-26 2009-12-31 Microsoft Corporation Query-driven web portals
US10007729B1 (en) 2009-01-23 2018-06-26 Zakta, LLC Collaboratively finding, organizing and/or accessing information
US10191982B1 (en) * 2009-01-23 2019-01-29 Zakata, LLC Topical search portal
US9607324B1 (en) 2009-01-23 2017-03-28 Zakta, LLC Topical trust network
US8874727B2 (en) 2010-05-31 2014-10-28 The Nielsen Company (Us), Llc Methods, apparatus, and articles of manufacture to rank users in an online social network
US8484186B1 (en) 2010-11-12 2013-07-09 Consumerinfo.Com, Inc. Personalized people finder
US10068266B2 (en) 2010-12-02 2018-09-04 Vinimaya Inc. Methods and systems to maintain, check, report, and audit contract and historical pricing in electronic procurement
CA2952419C (en) * 2014-06-20 2020-10-27 Zinc, Inc. Directory generation and messaging
US10643178B1 (en) 2017-06-16 2020-05-05 Coupa Software Incorporated Asynchronous real-time procurement system
CA3145535A1 (en) * 2021-01-12 2022-07-12 Tealbook Inc. System and method for data profiling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997015018A1 (en) * 1995-10-16 1997-04-24 Bell Communications Research, Inc. Method and system for providing uniform access to heterogeneous information
WO1997029414A2 (en) * 1996-02-09 1997-08-14 At & T Corp. Method and apparatus for passively browsing the internet

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5974455A (en) * 1995-12-13 1999-10-26 Digital Equipment Corporation System for adding new entry to web page table upon receiving web page including link to another web page not having corresponding entry in web page table
JPH09311873A (en) * 1996-01-11 1997-12-02 Sony Corp Information providing data structure, information providing method, and information receiving terminal
US5905862A (en) * 1996-09-04 1999-05-18 Intel Corporation Automatic web site registration with multiple search engines
US5933827A (en) * 1996-09-25 1999-08-03 International Business Machines Corporation System for identifying new web pages of interest to a user
US6195657B1 (en) * 1996-09-26 2001-02-27 Imana, Inc. Software, method and apparatus for efficient categorization and recommendation of subjects according to multidimensional semantics
US5974572A (en) * 1996-10-15 1999-10-26 Mercury Interactive Corporation Software system and methods for generating a load test using a server access log
US5958008A (en) * 1996-10-15 1999-09-28 Mercury Interactive Corporation Software system and associated methods for scanning and mapping dynamically-generated web documents
CA2269131A1 (en) * 1996-10-25 1998-05-07 Ipf, Inc. System and method for managing and serving consumer product related information over the internet
US6085229A (en) * 1998-05-14 2000-07-04 Belarc, Inc. System and method for providing client side personalization of content of web pages and the like
US6141759A (en) * 1997-12-10 2000-10-31 Bmc Software, Inc. System and architecture for distributing, monitoring, and managing information requests on a computer network
US6151624A (en) * 1998-02-03 2000-11-21 Realnames Corporation Navigating network resources based on metadata
US6401118B1 (en) * 1998-06-30 2002-06-04 Online Monitoring Services Method and computer program product for an online monitoring search engine
US6654813B1 (en) * 1998-08-17 2003-11-25 Alta Vista Company Dynamically categorizing entity information
US6735585B1 (en) * 1998-08-17 2004-05-11 Altavista Company Method for search engine generating supplemented search not included in conventional search result identifying entity data related to portion of located web page

Also Published As

Publication number Publication date
US7398266B2 (en) 2008-07-08
US6654813B1 (en) 2003-11-25
US20040267727A1 (en) 2004-12-30
AU5566099A (en) 2000-03-06
WO2000010106A1 (en) 2000-02-24
WO2000010108A1 (en) 2000-02-24
WO2000010105A1 (en) 2000-02-24
AU5565899A (en) 2000-03-06
JP5171927B2 (en) 2013-03-27
AU5565999A (en) 2000-03-06
AU5566199A (en) 2000-03-06
JP2011100461A (en) 2011-05-19
JP2002522847A (en) 2002-07-23
EP1105818A1 (en) 2001-06-13

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase