CA2618854C - Ranking search results using biased click distance - Google Patents

Publication number
CA2618854C
Authority
CA
Canada
Prior art keywords
authoritative
click distance
documents
document
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA2618854A
Other languages
French (fr)
Other versions
CA2618854A1 (en)
Inventor
Dmitriy Meyerzon
Hugo Zaragoza
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Corp
Original Assignee
Microsoft Corp
Microsoft Technology Licensing LLC

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access
    • Y10S707/99937 Sorting

Abstract

Methods for ranking search results in response to a search query using biased click distances are provided. Documents or nodes in a network, which can be identified as being either authoritative or non-authoritative, are each ranked according to their relevance score to a given search query using biased click distance values, which are calculated based on the lesser of first and second click distance values. The first click distance value is a function of a number of links that need to be followed to create a path from a first non-authoritative document or node to a first authoritative document or node, and the second click distance value is a function of a number of links that need to be followed to create a path from the first non-authoritative document or node to a second authoritative document or node.

Description

RANKING SEARCH RESULTS USING BIASED CLICK DISTANCE
BACKGROUND
Ranking functions that rank documents according to their relevance to a given search query are known. Efforts continue in the art to develop ranking functions that provide better search results for a given search query compared to search results generated by search engines using known ranking functions.
SUMMARY
According to one aspect of the invention, there is provided a computer readable storage medium having stored thereon computer-executable instructions for ranking a plurality of documents in a network, wherein said computer-executable instructions when executed by the computer perform a method of generating search results in response to a search query, the method comprising: storing document information in memory, the document information identifying the plurality of documents in the network, the plurality of documents including authoritative documents and non-authoritative documents, the authoritative documents including at least a first authoritative document and a second authoritative document, and the non-authoritative documents including at least a first non-authoritative document;
storing link information in the memory, the link information identifying links among the plurality of documents; computing click distance values for each of the non-authoritative documents to the authoritative documents, the click distance values including at least a first click distance value that is a function of a number of links that need to be followed to create a path from the first non-authoritative document to the first authoritative document and a second click distance value that is a function of a number of links that need to be followed to create a path from the first non-authoritative document to the second authoritative document; computing biased click distance values for each of the non-authoritative documents in the network to the authoritative documents, wherein the biased click distance values include at least a first biased click distance value that is a function of a lesser of the first and second click distances; receiving the search query including at least one search term;
executing the search query to generate a list of the plurality of documents that include the at least one search term, the list of the plurality of documents including an identifier of the first non-authoritative document; ranking the list of the plurality of documents that include the at least one search term using a ranking function that comprises one or more query-independent components, wherein at least one query-independent component includes a biased click distance parameter that takes into account the biased click distance values, including the first biased click distance value; and outputting the ranked search results according to the ranking.
According to another aspect of the invention, there is provided a method of determining document relevance scores for documents in a network, said method comprising the steps of: storing document and link information for the documents in the network; generating a representation of the network from the document and link information, wherein the representation of the network includes nodes that represent the documents and edges that represent the links;
assigning a biased click distance value to at least two authoritative nodes in the network, wherein the at least two authoritative nodes include at least a first authoritative node having a first assigned biased click distance and a second authoritative node having a second assigned biased click distance; computing click distances for each non-authoritative node in the representation of the network to at least two of the authoritative nodes, wherein the click distances include a first click distance and a second click distance, the first click distance being a function of a number of the links that need to be followed to create a path from a first non-authoritative node to the first authoritative node, and the second click distance being a function of a number of the links that need to be followed to create a path from the first non-authoritative node to the second authoritative node; computing biased click distance values for each of the non-authoritative documents, wherein the biased click distance values include at least a first biased click distance value that is a function of a lesser of the first and second click distances; and using the biased click distance values to determine document relevance scores for each of the documents in the network.
According to still another aspect of the invention, there is provided a computing system comprising: a processor; and a memory, the memory storing computer-executable instructions which when executed by the processor perform a method of determining document relevance scores for nodes in a network, said method comprising the steps of: assigning biased click distance values to at least two authoritative nodes in a representation of the network, wherein the at least two authoritative nodes include at least a first authoritative node having a first assigned biased click distance and a second authoritative node having a second assigned click distance; computing click distances for each non-authoritative node in the representation of the network to at least two of the authoritative nodes, wherein the click distances include a first click distance and a second click distance, the first click distance being a function of a number of links that need to be followed to create a path from a first non-authoritative node to the first authoritative node, and the second click distance being a function of a number of the links that need to be followed to create a path from the first non-authoritative node to the second authoritative node;
computing biased click distance values for the non-authoritative nodes, wherein the biased click distance values include at least a first biased click distance value that is a function of a lesser of the first and second click distances; and using the biased click distance values to determine document relevance scores for each of the nodes in the network.
Described herein are, among other things, various technologies for determining a document relevance score for a given document on a network. The document relevance score is generated via a ranking function that comprises one or more query-independent components, wherein at least one query-independent component includes a biased click distance parameter that takes into account biased click distance values for multiple documents on the network. The ranking functions may be used by a search engine to rank multiple documents in order (typically, in descending order) based on the document relevance scores of the multiple documents.
This Summary is provided to generally introduce the reader to one or more select concepts described below in the "Detailed Description" section in a simplified form. This Summary is not intended to identify key and/or required features of the claimed subject matter.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 represents an exemplary logic flow diagram showing exemplary steps in a method of producing ranked search results in response to a search query inputted by a user;
FIG. 2 is a block diagram of some of the primary components of an exemplary operating environment for implementation of the methods and processes disclosed herein;
FIG. 3 depicts an exemplary web graph identifying documents in a network space, links between the documents, authoritative nodes having an assigned biased click distance value, and non-authoritative nodes having a calculated biased click distance value;
PCT/US2006/031965
FIGS. 4A-4B represent a logic flow diagram showing exemplary steps in a method of assigning and generating biased click distance values for nodes on a web graph;
FIGS. 5A-5B represent a logic flow diagram showing exemplary steps in a method of generating biased click distance values for non-authoritative nodes on a web graph; and
FIG. 6 represents a logic flow diagram showing exemplary steps in a method of ranking search results generated using a ranking function containing a biased click distance value parameter.
DETAILED DESCRIPTION
To promote an understanding of the principles of the methods and processes disclosed herein, descriptions of specific embodiments follow and specific language is used to describe the specific embodiments. It will nevertheless be understood that no limitation of the scope of the disclosed methods and processes is intended by the use of specific language.
Alterations, further modifications, and further applications of the principles of the disclosed methods and processes are contemplated as would normally occur to one ordinarily skilled in the art to which the disclosed methods and processes pertain.
Methods of determining a document relevance score for documents on a network are disclosed. Each document relevance score is calculated using a ranking function that contains one or more query-dependent components (e.g., a function component that depends on the specifics of a given search query or search query term), as well as one or more query-independent components (e.g., a function component that does not depend on a given search query or search query term). The document relevance scores determined by the ranking function may be used to rank documents within a network space (e.g., a corporate intranet space) according to each document relevance score.
An exemplary search process in which the disclosed methods may be used is shown as exemplary process 10 in FIG. 1.
FIG. 1 depicts exemplary search process 10, which starts with process step 80, wherein a user inputs a search query. From step 80, exemplary search process 10 proceeds to step 200, wherein a search engine searches all documents within a network space for one or more terms of the search query.
From step 200, exemplary search process 10 proceeds to step 300, wherein a ranking function of the search engine sorts the documents within the network space based on the relevance score of each document, the document relevance score being based on one or more query-dependent components and one or more query-independent components. From step 300, exemplary search process 10 proceeds to step 400, wherein sorted search results are presented to the user, typically in decreasing order of relevance, identifying documents within the network space that are most relevant to the search query.
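The flow of steps 200 through 400 can be illustrated with a minimal sketch. This is not the disclosed ranking function: the function name `search`, the use of a raw term count as the query-dependent component, the precomputed `static_scores` table standing in for the query-independent component, and the additive combination of the two are all assumptions made only for illustration.

```python
from typing import Dict, List, Tuple


def search(
    query: str,
    documents: Dict[str, str],
    static_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Return (document id, relevance score) pairs, most relevant first."""
    terms = query.lower().split()
    results: List[Tuple[str, float]] = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        # Step 200: look for one or more terms of the search query.
        term_hits = sum(words.count(term) for term in terms)
        if term_hits == 0:
            continue
        # Step 300: combine a query-dependent component (hypothetical: the
        # term count) with a query-independent component (hypothetical: a
        # precomputed per-document value).
        score = float(term_hits) + static_scores.get(doc_id, 0.0)
        results.append((doc_id, score))
    # Step 400: present results in decreasing order of relevance.
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

In this sketch, a document that matches the query but has a small query-independent value can still be outranked by a matching document with a larger one, which is the role the biased click distance plays in the methods described below.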
As discussed in more detail below, in some exemplary methods of determining a document relevance score, at least one query-independent component of a ranking function used to determine a document relevance score takes into account a "biased click distance" of each document within a network space. Certain documents, referred to herein as "authoritative documents" within a network or "authoritative nodes" on a web graph, may be assigned an initial biased click distance value in order to identify these documents as having different degrees of importance relative to each other, and possibly a higher degree of importance relative to the rest of the documents on the network. The remaining documents, referred to herein as "non-authoritative documents" within a network or "non-authoritative nodes" on a web graph, have a biased click distance value that is calculated based on their location relative to the closest authoritative document within a network space (or closest authoritative node on a web graph), resulting in click distance values biased towards the authoritative nodes.
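As a hedged illustration of measuring distance to the closest authoritative node, the following sketch runs a breadth-first search that starts from all authoritative nodes at once. It assumes that every authoritative node starts at a value of 0, that every link counts as one click, and that links are followed outward from the authoritative nodes; the function name and the adjacency-list input are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List


def closest_authority_distance(
    links: Dict[str, List[str]],
    authoritative: Iterable[str],
) -> Dict[str, int]:
    """Number of links from the nearest authoritative node to each reachable node."""
    # Assumption: every authoritative node starts at distance 0.
    distance: Dict[str, int] = {node: 0 for node in authoritative}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        for target in links.get(node, []):
            # The first visit is the shortest one in a breadth-first search,
            # so each node keeps the lesser of its distances to the authorities.
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    return distance
```

A node that is two links away from one authoritative node and one link away from another receives the value 1, i.e., the lesser of the two click distances. Nodes that cannot be reached from any authoritative node are simply absent from the result in this sketch.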
In one exemplary embodiment, a biased click distance value may be assigned to m authoritative documents on a network comprising N total documents, wherein m is greater than or equal to 2 and less than N. In this exemplary embodiment, a system administrator manually selects or application code within a search system automatically identifies m authoritative documents within a given network space that have some degree of importance within the network space. For example, one of the m authoritative documents may be a homepage of a website or another page linked directly to the homepage of a website.
In another exemplary embodiment, at least two of the biased click distance values assigned to the m authoritative documents differ from one another. In this embodiment, different numerical values may be assigned to two or more of the m authoritative documents in order to further quantify the importance of one authoritative document to another authoritative document. For example, the importance of a given authoritative document may be indicated by a low biased click distance value. In this example, authoritative documents having a biased click distance value equal to 0 will be considered more important than authoritative documents having a biased click distance value greater than 0.
The disclosed methods of determining a document relevance score may further utilize a ranking function that comprises at least one query-independent component that includes an edge value parameter that takes into account edge values assigned to each edge on the network, wherein each edge connects one document to another document within the hyperlinked structure of the network (or one node to another node on a web graph). Assigning edge values to one or more edges connecting documents to one another on a network provides a further method of affecting the document relevance score of documents on the network. For example, in the example described above wherein a lower biased click distance value indicates the importance of a given document, increasing an edge value between two documents, such as a first document and a second document linked to the first document, further reduces the importance of the second document (i.e., the linked document) relative to the first document. Conversely, by assigning a lower edge value to the edge between the first document and second document, the importance of the second document becomes greater relative to the first document.
In an exemplary embodiment, two or more edges linking documents within a network space may be assigned edge values that differ from one another. In this exemplary embodiment, different numerical values may be assigned to two or more edges in order to further quantify the importance of one document to another within a network space. In other exemplary embodiments, all of the edges linking documents within a network space are assigned the same edge value, wherein the assigned edge value is 1 or some other positive number.
In yet another embodiment, the edge values are equal to one another and are equal to or greater than the highest biased click distance value initially assigned to one or more authoritative documents.
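The interplay of assigned biased click distance values and edge values can be sketched as a shortest-path computation. The sketch below swaps in Dijkstra's algorithm, which is not named in the text, and assumes that edge values are positive, that authoritative nodes keep their assigned values unchanged, and that links are followed outward from the authoritative nodes. The function name and input layout are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def biased_click_distances(
    edges: Dict[str, List[Tuple[str, float]]],
    assigned: Dict[str, float],
) -> Dict[str, float]:
    """Lowest (assigned value + sum of edge values) for each reachable node."""
    best: Dict[str, float] = dict(assigned)
    heap = [(value, node) for node, value in assigned.items()]
    heapq.heapify(heap)
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # Stale heap entry; a lower value was already found.
        for target, edge_value in edges.get(node, []):
            if target in assigned:
                continue  # Assumption: authoritative values stay as assigned.
            candidate = dist + edge_value
            if candidate < best.get(target, float("inf")):
                best[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return best
```

With this formulation, raising the edge value on a link raises the computed value of the linked document (reducing its importance), and lowering the edge value has the opposite effect, consistent with the behavior described above.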
In yet a further exemplary embodiment, the disclosed methods of determining a document relevance score utilize a ranking function that comprises at least one query-independent component, which includes both the above-described biased click distance parameter and the above-described edge value parameter.
The document relevance score may be used to rank documents within a network space. For example, a method of ranking documents on a network may comprise the steps of determining a document relevance score for each document on the network using the above-described method; and ranking the documents in a desired order (typically, in descending order) based on the document relevance scores of each document.
The document relevance score may also be used to rank search results of a search query. For example, a method of ranking search results of a search query may comprise the steps of determining a document relevance score for each document in the search results of a search query using the above-described method; and ranking the documents in a desired order (typically, in descending order) based on the document relevance scores of each document.
Application programs using the methods disclosed herein may be loaded and executed on a variety of computer systems comprising a variety of hardware components. An exemplary computer system and exemplary operating environment for practicing the methods disclosed herein are described below.
Exemplary Operating Environment
FIG. 2 illustrates an example of a suitable computing system environment 100 on which the methods disclosed herein may be implemented.
The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the methods disclosed herein. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
The methods disclosed herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the methods disclosed herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The methods and processes disclosed herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The methods and processes disclosed herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to FIG. 2, an exemplary system for implementing the methods and processes disclosed herein includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including, but not limited to, system memory 130 to processing unit 120. System bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium, which can be used to store the desired information and which can be accessed by computer 110.
Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media as used herein.
System memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS) containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131.
RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 2 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
Computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 2 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. Hard disk drive 141 is typically connected to system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to system bus 121 by a removable memory interface, such as interface 150.
The drives and their associated computer storage media discussed above and illustrated in FIG. 2 provide storage of computer readable instructions, data structures, program modules and other data for computer 110. In FIG. 2, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147.
Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
A user may enter commands and information into computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to processing unit 120 through a user input interface 160 that is coupled to system bus 121, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to system bus 121 via an interface, such as a video interface 190. In addition to monitor 191, computer 110 may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
Computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. Remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computer 110, although only a memory storage device 181 has been illustrated in FIG. 2. The logical connections depicted in FIG. 2 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a LAN networking environment, computer 110 is connected to LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, computer 110 typically includes a modem 172 or other means for establishing communications over WAN 173, such as the Internet. Modem 172, which may be internal or external, may be connected to system bus 121 via user input interface 160, or other appropriate mechanism.
In a networked environment, program modules depicted relative to computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 2 illustrates remote application programs as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Methods and processes disclosed herein may be implemented using one or more application programs including, but not limited to, a search ranking application, which could be one of numerous application programs designated as application programs 135, application programs 145 and remote application programs 185 in exemplary system 100.
As mentioned above, those skilled in the art will appreciate that the disclosed methods of generating a document relevance score for a given document may be implemented in other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, networked personal computers, minicomputers, mainframe computers, and the like. The disclosed methods of generating a document relevance score for a given document may also be practiced in distributed computing environments, where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Implementation of Exemplary Embodiments
As discussed above, methods of determining a document relevance score for a document on a network are provided. The disclosed methods may rank a document on a network utilizing (i) a ranking function that takes into account a biased click distance value of each document on the network, (ii) a ranking function that takes into account one or more edge values assigned to edges (or links) between documents on the network, or (iii) both (i) and (ii).
The disclosed methods of determining a document relevance score for a document on a network may comprise a number of steps. In one exemplary embodiment, the method of determining a document relevance score for a document on a network comprises the steps of storing document and link information for documents on a network; generating a representation of the network from the document and link information, wherein the representation of the network includes nodes that represent the documents and edges that represent the links; assigning a biased click distance value (CDA) to at least two nodes on the network, wherein the nodes that are assigned a biased click distance value are authoritative nodes; computing a biased click distance for each of the non-authoritative nodes in the representation of the network, wherein a biased click distance for a given non-authoritative node is measured from the given non-authoritative node to an authoritative node closest to the given non-authoritative node, wherein the computing step results in a computed biased click distance value (CDc) for each non-authoritative document; and using the biased click distance value (i.e., CDA or CDc) for each document to determine the document relevance score of a given document on the network.
The step of storing document and link information for documents on a network may be performed by indexing application code commonly found on computing systems. The indexing application code generates a representation of the network from the document and link information, wherein the representation of the network includes nodes that represent the documents and edges that represent the links. Such a representation of the network is commonly referred to as a "web graph." One exemplary method of generating a web graph comprises using data gathered by a process where link and anchor text information is gathered and attributed to specific target documents of the anchor.
This process and the concept of anchor text are described more fully in U.S. Patent Application Serial Number 10/955,462 entitled "SYSTEM AND METHOD FOR INCORPORATING ANCHOR TEXT INTO RANKING SEARCH RESULTS" filed on August 30, 2004.
FIG. 3 depicts an exemplary web graph identifying documents in a network space and links between the documents. As shown in FIG. 3, exemplary web graph 30 comprises nodes 31, which represent each document within a given network space (e.g., a corporate Intranet), and edges 32, which represent links between documents within a given network space. It should be understood that exemplary web graph 30 is an overly simplified representation of a given network space. Typically, a given network space may comprise hundreds, thousands or millions of documents and hundreds, thousands or millions of links connecting documents to one another. Further, although exemplary web graph 30 depicts up to eight links connected to a given node (e.g., central node 33), it should be understood that in an actual network setting, a given node may have hundreds of links connecting the node (e.g., document) to hundreds of other documents within the network (e.g., the home page of a network may be linked to every page within the network).
In addition, exemplary web graph 30 shows very few cycles (e.g., a first node linking to a second node, which may link to additional nodes, wherein the second node or one of the additional nodes links back to the first node). One such cycle is represented by nodes 41 and 42 in FIG. 3. Other cycles could be represented if any of end nodes 40 linked back to any other node shown in FIG. 3, such as central node 33. Regardless of the simplicity or complexity of a given web graph, the disclosed methods of generating a document relevance
score for a given document may be used on any web graph, including those containing cycles.
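The web graph described above can be illustrated with a minimal Python sketch. The function name `build_web_graph`, the adjacency-map layout, and the sample link records are assumptions made for illustration only and are not taken from the disclosure; links are treated as directed, from the linking document to the linked document.

```python
from collections import defaultdict

def build_web_graph(links):
    """Build an adjacency map {source: set(targets)} from (source, target) link pairs."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph[target]  # touching the key registers documents that have no outgoing links
    return dict(graph)

# Hypothetical link records: 'home' links to two pages, and 'a' and 'b'
# link to each other, which forms a cycle.
links = [("home", "a"), ("home", "b"), ("a", "b"), ("b", "a"), ("b", "c")]
web_graph = build_web_graph(links)
print(sorted(web_graph))
```

Every document appears as a key, including end nodes such as `c` that have no outgoing links, so later processing can visit all nodes of the graph.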
Once a web graph has been generated, one or more techniques may be used to affect the relative importance of one or more documents within the network space, represented by the nodes of the web graph. As discussed above and below, these techniques include, but are not limited to, (i) designating two or more nodes as authoritative nodes; (ii) assigning each of the authoritative nodes a biased click distance value (CDA); (iii) optionally, assigning two or more biased click distance values (CDA) that differ from one another; (iv) assigning an edge value to each edge of the web graph; (v) optionally, assigning a minimum edge value to each edge of the web graph, wherein the minimum edge value is greater than the maximum (highest) assigned biased click distance value (CDAmax); (vi) optionally, assigning two or more edge values that differ from one another; (vii) calculating a biased click distance value (CDc) for each non-authoritative node; and (viii) optionally, downgrading any of the biased click distance values (CDA or CDc), when necessary, if test queries using the biased click distance values generate irrelevant search results. Some of the above-described exemplary techniques for affecting the biased click distance value of one or more documents within the network represented by exemplary web graph 30 are shown in FIG. 3.
In exemplary web graph 30, nodes 31 having a square shape are used to identify authoritative nodes within the network, while nodes 31 having a circular shape are used to identify non-authoritative nodes. It should be understood that any number of nodes within a given web graph may be designated as authoritative nodes depending on a number of factors including, but not limited to, the total number of documents within the network space, and the number of "important" documents that are within the network space. In exemplary web graph 30, 9 of the 104 nodes are designated as authoritative nodes (i.e., represent 9 out of 104 documents as being of particular importance).
Further, although not shown on exemplary web graph 30, each edge 32 between each pair of nodes 31 has an edge weight associated therewith.

Typically, each edge 32 has a default edge weight of 1; however, as discussed above, an edge weight other than 1 can be assigned to each edge 32. Further, in some embodiments, two or more different edge weights may be assigned to edges within the same web graph. In FIG. 3, letters p, q, r, s and t shown on exemplary web graph 30 are used to indicate edge values of some of edges 32. As discussed above, edge values p, q, r, s and t may have a value of 1, a value other than 1,
and/or values that differ from one another in order to further affect biased click distance values of nodes 31 within exemplary web graph 30. Typically, edge values for p, q, r, s and t, as well as the other edges in exemplary web graph 30, are the same number, and are typically equal to or greater than 1. In some embodiments, edge values for p, q, r, s and t, as well as the other edges in exemplary web graph 30, are the same number, and are equal to or greater than the highest biased click distance value assigned to an authoritative node.
The one or more techniques used to modify a web graph in order to affect biased click distance values of documents on a network may be manually initiated and performed by a system administrator. The system administrator can view a given web graph, and edit the web graph as desired to increase and/or decrease the relative importance of one or more documents within a network space as described above. Application code, such as application code in a computing system capable of conducting a search query, may automatically produce a bias in a web graph using one or more of the above-described techniques (e.g., calculating a biased click distance value (CDc) for each non-authoritative node).
FIGS. 4A-4B represent a logic flow diagram showing exemplary steps in an exemplary method of assigning and generating biased click distance values for nodes on a web graph followed by an optional downgrading procedure by a system administrator. As shown in FIG. 4A, exemplary method 401 starts at block 402 and proceeds to step 403. In step 403, a number of authoritative nodes (or URLs) are selected out of N total nodes (or URLs) within a network space.
In exemplary method 401, m authoritative nodes (or URLs) are selected, wherein m is greater than or equal to 2. Once the authoritative nodes (or URLs) are selected, exemplary method 401 proceeds to decision block 404.
At decision block 404, a determination is made by a system administrator whether to assign at least two different biased click distance values (CDA) to two or more of the m authoritative nodes (or URLs). If a decision is made to assign at least two different biased click distance values (CDA) to two or more of the m authoritative nodes (or URLs), exemplary method 401 proceeds to step 405, wherein at least two different biased click distance values (CDA) are assigned to two or more of the m authoritative nodes (or URLs). For example, referring to exemplary web graph 30 shown in FIG. 3, authoritative nodes 33 and 34 may be assigned a biased click distance value of 0, authoritative nodes 35 and 36 may be assigned a biased click distance value of +3, and another authoritative node may be assigned a biased click distance value of +2. From step 405, exemplary method 401 proceeds to decision block 407.
Returning to decision block 404, if a decision is made not to assign at least two different biased click distance values (CDA) to two or more of the m authoritative nodes (or URLs), exemplary method 401 proceeds to step 406, wherein the same biased click distance value (CDA) is assigned to each of the m authoritative nodes (or URLs). For example, referring to exemplary web graph 30 of FIG. 3 again, each of the authoritative nodes may be assigned the same biased click distance value, such as 0, +2, or +5. From step 406, exemplary method 401 proceeds to decision block 407.
At decision block 407, a determination is made by a system administrator or application code whether to assign an edge weight other than 1 to one or more edges of a web graph. If a decision is made to assign an edge weight other than 1 to one or more edges of a web graph, exemplary method 401 proceeds to decision block 408. At decision block 408, a determination is made by a system administrator whether to assign a minimum edge value to the edges of a web graph, wherein the minimum edge value is greater than the largest assigned biased click distance value (CDAmax). If a decision is made to assign a minimum edge value to the edges of a web graph, wherein the minimum edge value is greater than the largest assigned biased click distance value (CDAmax), exemplary method 401 proceeds to step 409, wherein a minimum edge value greater than the largest assigned biased click distance value (CDAmax) is assigned to each edge of a web graph. For example, referring to exemplary web graph 30 shown in FIG. 3, if authoritative node 33 is assigned the largest biased click distance value (CDAmax) and CDAmax equals +3, a minimum edge value of greater than +3 is assigned to each edge 32 shown in FIG. 3.
In some embodiments, applying a minimum edge value that is greater than the largest assigned biased click distance value (CDAmax) to each edge of a web graph may have some advantages. In this embodiment, such a technique guarantees that the assigned biased click distance value (CDA) of each authoritative node (or document or URL) is less than the calculated biased click distance value (CDc) of every non-authoritative node (or document or URL) in a web graph. When importance of a document is based on a lower biased click distance value, such a technique enables all of the authoritative nodes (or documents or URLs) to be considered more important than the non-authoritative nodes (or documents or URLs) within a web graph.
From step 409, exemplary method 401 proceeds to decision block 410 shown in FIG. 4B and described below. Returning to decision block 408, if a decision is made not to assign a minimum edge value to each edge, wherein the minimum edge value is greater than the largest assigned biased click distance value (CDAmax), exemplary method 401 proceeds directly to decision block 410 shown in FIG. 4B and described below. In this embodiment, it is possible for a non-authoritative node to have a biased click distance value less than an authoritative node (i.e., be considered more important than the authoritative node wherein importance of a document is based on a lower biased click distance value). For example, referring to exemplary web graph 30 of FIG. 3, if authoritative node 34 is assigned a biased click distance value of +3, authoritative node 48 is assigned a biased click distance value of 0, and edge value s is +1, non-authoritative nodes 39 have a calculated biased click distance value of +1 (i.e., the sum of the assigned biased click distance value of the closest authoritative node 48, 0, and edge value s, +1), which is less than the biased click distance value of +3 assigned to authoritative node 34.
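The effect of the minimum edge value can be checked with simple arithmetic. The following Python sketch is an illustration only; the function name `authoritative_always_rank_first` and its worst-case test are assumptions, not part of the disclosed method. It assumes that the smallest biased click distance any non-authoritative node can receive is the smallest assigned CDA plus one edge of minimum value.

```python
def authoritative_always_rank_first(assigned_cda, min_edge_value):
    """Return True when every computed CDc must exceed every assigned CDA.

    Worst case considered: a non-authoritative node linked directly from the
    authoritative node with the smallest CDA over an edge of minimum value.
    """
    smallest_possible_cdc = min(assigned_cda) + min_edge_value
    return smallest_possible_cdc > max(assigned_cda)

# Values from the example above: CDA values of +3 and 0, edge value s = +1.
# A CDc of 0 + 1 = 1 is possible, which is below the assigned value of +3.
print(authoritative_always_rank_first([3, 0], 1))  # False

# With a minimum edge value above CDAmax (4 > 3) the guarantee holds.
print(authoritative_always_rank_first([3, 0], 4))  # True
```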
At decision block 410 shown in FIG. 4B, a determination is made by a system administrator whether to assign at least two different edge values to two or more edges of a web graph. If a decision is made to assign at least two different edge values to two or more edges of a web graph, exemplary method 401 proceeds to step 411, wherein at least two different edge values are assigned to two or more edges of a web graph. For example, referring to exemplary web graph 30 shown in FIG. 3, any two of edge values p, q, r, s and t may be assigned at least two different numbers. From step 411, exemplary method 401 proceeds to step 414 described below.
Returning to decision block 410, if a decision is made not to assign at least two different edge values to two or more edges of a web graph, exemplary method 401 proceeds to step 412, wherein the same edge value is assigned to each edge of a web graph, and the edge value is a value other than 1.
For example, referring to exemplary web graph 30 shown in FIG. 3, each of edge values p, q, r, s and t is assigned the same number, and that number is other than 1.
From step 412, exemplary method 401 proceeds to step 414 described below.
Returning to decision block 407 shown in FIG. 4A, if a decision is made not to assign an edge weight other than 1 to one or more edges of a web graph, exemplary method 401 proceeds to step 413, wherein a default edge value (e.g., +1) is used for each edge of a web graph so that the edges of the web graph have a minimal effect on calculated biased click distance values. In this embodiment, factors such as the number and location of authoritative nodes have a greater effect on calculated biased click distance values than the default edge values.
From step 413, exemplary method 401 proceeds to step 414 shown in FIG. 4B.
In step 414, biased click distance values (CDc) for non-authoritative nodes (or documents or URLs) are calculated. As described in more detail below, the biased click distance value for a given target node (i.e., non-authoritative node) (CDCtarget) linked directly to an authoritative node may be calculated using the formula:
$$\mathrm{CD}_{C\,target} = \min\left(\mathrm{CD}_{A\,closest} + \mathrm{EdgeWeight}\right),$$
wherein CDAclosest represents the assigned biased click distance value for the authoritative node closest to the target node; and EdgeWeight (also referred to herein as EdgeValue) represents the edge value or edge weight assigned to the edge linking the closest authoritative node to the target node. The min(x) function is used to indicate that a minimal calculated biased click distance value is used for a given node, for example, if the node is linked directly to two authoritative nodes. The biased click distance value for a given target node (i.e., non-authoritative node) (CDCtarget) other than those linked directly to an authoritative node may be calculated using the formula:
$$\mathrm{CD}_{C\,target} = \min\left(\mathrm{CD}_{C\,min} + \mathrm{EdgeWeight}\right),$$
wherein CDCmin represents the calculated biased click distance value of an adjacent node having the lowest calculated biased click distance value; and EdgeWeight represents the edge value or edge weight assigned to the edge linking the adjacent node having the lowest calculated biased click distance value and the target node. From step 414, exemplary method 401 proceeds to step 415.
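The two formulas above share one form: the target node takes the minimum, over the nodes linking to it, of that node's biased click distance plus the weight of the connecting edge. A minimal Python sketch follows; the function name `computed_cd` and the list-of-pairs input are assumptions for illustration.

```python
import math

def computed_cd(incoming):
    """Return the computed biased click distance (CDc) of a target node.

    `incoming` holds one (neighbor_cd, edge_weight) pair per node linking to
    the target, where neighbor_cd is that node's CDA or CDc. A node with no
    incoming links keeps the initial value of infinity.
    """
    return min((cd + weight for cd, weight in incoming), default=math.inf)

# A node linked directly from two authoritative nodes (CDA 0 and +3),
# each over an edge of default weight 1: min(0 + 1, 3 + 1).
print(computed_cd([(0, 1), (3, 1)]))  # 1

# A node one link further away, reached from a node with CDc 1 over an
# edge of weight 2: 1 + 2.
print(computed_cd([(1, 2)]))  # 3
```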

In step 415, the resulting biased click distance values, assigned (CDA) and calculated (CDc), are tested by a system administrator. Typically, the system administrator tests the system by executing one or more search queries using the resulting biased click distance values (assigned (CDA) and calculated (CDc)). If the system administrator notices obviously irrelevant content coming back, the system administrator can use the above-described biasing tools/techniques to downgrade one or more sites, for example, archive folders or web sites, generating the irrelevant content. The above-described test enables a system administrator to evaluate the biased click distance values for possible inconsistencies between (i) the actual importance of a given document within a network space and (ii) the importance of the document as indicated by its biased click distance value. From step 415, exemplary method 401 proceeds to decision block 416.
At decision block 416, a determination is made by a system administrator whether to downgrade any biased click distance values in order to more closely represent the importance of a given document within a network space. If a decision is made to downgrade one or more biased click distance values in order to more closely represent the importance of one or more documents within a network space, exemplary method 401 proceeds to step 417, wherein the biased click distance values of one or more documents (or URLs) are adjusted either negatively or positively. From step 417, exemplary method 401 proceeds to step 418.
Returning to decision block 416, if a decision is made not to downgrade one or more biased click distance values in order to more closely represent the importance of one or more documents within a network space, exemplary method 401 proceeds directly to step 418. In step 418, the biased click distance values assigned to authoritative nodes and calculated for non-authoritative nodes are utilized in a ranking function to determine an overall document relevance score for each document within a network space. From step 418, exemplary method 401 proceeds to end block 419.
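The optional adjustment of step 417 can be sketched as a set of per-document offsets applied to the stored values. The function name `apply_adjustments`, the offset representation, and the sample documents are assumptions for illustration; a positive offset raises the biased click distance value and so downgrades the document when a lower value indicates greater importance.

```python
def apply_adjustments(cd_values, adjustments):
    """Return a copy of cd_values with per-document offsets added."""
    adjusted = dict(cd_values)
    for doc, delta in adjustments.items():
        if doc in adjusted:  # ignore offsets for documents not in the graph
            adjusted[doc] += delta
    return adjusted

# Hypothetical values: an archive page surfaced irrelevant results during
# testing, so its biased click distance is raised by 3.
cd_values = {"home": 0, "products": 1, "archive/2001": 1}
adjusted = apply_adjustments(cd_values, {"archive/2001": 3})
print(adjusted["archive/2001"])  # 4
```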
As discussed above, biased click distance values (CDc) for non-authoritative nodes (or URLs) on a web graph are calculated based on the shortest distance between a given non-authoritative node (or URL), also referred to as a "target node," and the closest authoritative node (or URL). One exemplary process for calculating the biased click distance values (CDc) for non-authoritative URLs within a network space is depicted in FIGS. 5A-5B.
FIGS. 5A-5B illustrate a logical flow diagram of an exemplary process 40 for calculating the biased click distance (CDc) for non-authoritative nodes (or URLs) within a network space. Exemplary process 40 starts at block 4140 and proceeds to step 4141, where a web graph comprising (i) authoritative nodes with their assigned biased click distance values (CDA), (ii) non-authoritative nodes, (iii) links between nodes, and (iv) edge values for each link is loaded from a database into memory. (See, for example, exemplary web graph 30 in FIG. 3). The web graph may have been previously generated using an indexing procedure as described above. From step 4141, exemplary process 40 proceeds to step 4142.

In step 4142, biased click distance values (CDc) for non-authoritative nodes are initialized to a maximum biased click distance value, such as infinity. Assigning a maximum biased click distance value, such as infinity, to the non-authoritative nodes identifies nodes for which a biased click distance value (CDc) needs to be calculated. Once initialization of maximum biased click distance values is complete, exemplary process 40 proceeds to step 4143.
In step 4143, the m authoritative nodes are inserted into a queue.
The m authoritative nodes inserted into the queue correspond to the m most authoritative nodes of the network space as pre-determined by a system administrator or some other system determinator. Once the m authoritative nodes are added to the queue, exemplary process 40 proceeds to decision block 4144.
At decision block 4144, a determination is made by the application code as to whether the queue is empty. An empty queue signifies that all nodes of the web graph have either (i) obtained an assigned biased click distance value (CDA) or (ii) had their biased click distance value calculated (CDc). If the queue is empty, exemplary process 40 proceeds to end block 4145 where exemplary process 40 ends. However, if the queue is not empty, exemplary process 40 continues to step 4146.
In step 4146, the node having the smallest biased click distance value (i.e., CDA or CDc) is removed from the queue. This node is referred to herein as "the current node." During the first iteration through exemplary process 40, the authoritative node having the smallest assigned biased click distance value (i.e., CDAmin) is the current node. During subsequent iterations through exemplary process 40, the node having the smallest biased click distance value may be an authoritative node or a non-authoritative node. During the last iteration through exemplary process 40, the node having the smallest biased click distance value will typically be a non-authoritative node. Once the node having the smallest biased click distance value (i.e., CDA or CDc) is removed from the queue, exemplary process 40 proceeds to decision block 4147.
At decision block 4147, a determination is made by the application code as to whether the current node has any target nodes. As used herein, the term "target node" or "target nodes" refers to one or more nodes linked to the current node. If the current node does not have any target nodes, exemplary process 40 returns to decision block 4144 to again determine whether the queue is empty, and then proceeds as discussed above. However, if the current node has one or more target nodes, exemplary process 40 proceeds to step 4148.

In step 4148, a target node associated with the current node is retrieved from the web graph and evaluated. For example, referring to exemplary web graph 30 of FIG. 3, if authoritative node 48 is the current node (i.e., the node having the smallest biased click distance value), any one of non-authoritative nodes 39 could be the target node (i.e., a node linked to authoritative node 48 and having an initial biased click distance value set to infinity). Once a current node and a target node are selected, exemplary process 40 proceeds to decision block 4149.
At decision block 4149, a determination is made by the application code whether the biased click distance value of the target node is greater than the biased click distance value of the current node plus an edge weight value for the edge connecting the current node to the target node. If a determination is made that the target node biased click distance value is greater than the biased click distance value of the current node plus an edge weight value for the edge connecting the current node to the target node, exemplary process 40 proceeds to step 4150 (shown in FIG. 5B), wherein the target node biased click distance value is updated to equal the biased click distance value of the current node plus the edge weight value of the edge connecting the current node to the target node.
During the first iteration through exemplary process 40, all target nodes will have an initial target node biased click distance value set to infinity.
Consequently, exemplary process 40 will proceed to step 4150, wherein the biased click distance value of the target node is updated as described above.
However, in subsequent iterations through exemplary process 40, the selected target node may, for example, have an initial target node biased click distance value set to infinity (exemplary process 40 will proceed to step 4150) or may have a biased click distance value previously configured by the system administrator (e.g., the target node is an authoritative node). From step 4150, exemplary process 40 proceeds to step 4151.
In step 4151, the current node and the target node with an updated target node biased click distance value are both added to the queue. From step 4151, exemplary process 40 returns to step 4146 (shown in FIG. 5A) and continues as described above.
Returning to decision block 4149 (shown in FIG. 5A), if a determination is made that the target node biased click distance value is not greater than the biased click distance value of the current node plus an edge weight value for the edge connecting the current node to the target node, (i) the target node keeps its calculated target node biased click distance value, (ii) the target node remains out of the queue, and (iii) exemplary process 40 returns to decision block 4147 (shown in FIG. 5A), where a determination is made whether the current node has any other target nodes. If a determination is made that the current node does not have another target node, exemplary process 40 returns to decision block 4144 and continues as described above. If a determination is made that the current node has another target node, exemplary process 40 proceeds to step 4148 and continues as described above.
When exemplary process 40 returns to step 4148, another target node associated with the current node is selected and evaluated as described above. If the selected target node has not been selected before, the target node will have an initial biased click distance value set to infinity, and exemplary process 40 will proceed to step 4150 as described above.
The above-described exemplary method of providing a biased click distance value to all nodes on a web graph prevents a biased click distance value of a given target node from being changed if the biased click distance value is lower than the sum of a biased click distance value of a current node plus an edge value of the edge linking the target node to the current node.
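Exemplary process 40 resembles a shortest-path computation seeded with several sources. The Python sketch below is a simplified illustration under stated assumptions, not the disclosed implementation: it uses the standard `heapq` module as the queue, examines all target nodes of the current node in one pass, and re-queues only the updated target node rather than both nodes. The function name, the dictionary layouts, and the sample graph are hypothetical, and node identifiers are assumed to be strings. Because the comparison of decision block 4149 is applied to every target node, an authoritative node's assigned value would also be lowered if a shorter path reached it.

```python
import heapq
import math

def biased_click_distances(graph, edge_weights, authoritative, default_weight=1):
    """Compute a biased click distance for every node of a web graph.

    graph:         {node: iterable of nodes it links to}
    edge_weights:  {(source, target): weight}; other edges use default_weight
    authoritative: {node: assigned CDA}
    """
    # Non-authoritative nodes start at infinity (step 4142).
    cd = {node: math.inf for node in graph}
    cd.update(authoritative)
    # The authoritative nodes seed the queue (step 4143).
    queue = [(value, node) for node, value in authoritative.items()]
    heapq.heapify(queue)
    while queue:                                      # decision block 4144
        current_cd, current = heapq.heappop(queue)    # step 4146
        if current_cd > cd[current]:
            continue  # stale entry: a smaller value was recorded after this push
        for target in graph.get(current, ()):         # blocks 4147 and 4148
            weight = edge_weights.get((current, target), default_weight)
            candidate = current_cd + weight
            if cd[target] > candidate:                # decision block 4149
                cd[target] = candidate                # step 4150
                heapq.heappush(queue, (candidate, target))  # step 4151
    return cd

# Hypothetical graph: 'home' and 'news' are authoritative.
graph = {"home": ["a", "b"], "a": ["c"], "b": ["c"], "c": [], "news": ["b"]}
weights = {("home", "b"): 4, ("news", "b"): 1}
result = biased_click_distances(graph, weights, {"home": 0, "news": 2})
print(result)  # {'home': 0, 'a': 1, 'b': 3, 'c': 2, 'news': 2}
```

In this example node `b` is reached more cheaply from `news` (2 + 1) than from `home` (0 + 4), so the closest authoritative node in the biased sense is not the one with the smallest assigned value.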
Once biased click distance values for all nodes of a given web graph have been determined and optionally downgraded (or optionally upgraded), if so desired, the biased click distance values for each document may be used as a parameter in a ranking function to provide a document relevance score for each document. Such a document relevance score may be used to rank search results of a search query.

An exemplary method of ranking search results generated using a ranking function containing a biased click distance value parameter is shown in FIG. 6.
FIG. 6 provides a logic flow diagram showing exemplary steps in exemplary method 20, wherein exemplary method 20 comprises a method of ranking search results generated using a ranking function containing a biased click distance value parameter. As shown in FIG. 6, exemplary method 20 starts at block 201 and proceeds to step 202. In step 202, a user requests a search by inputting a search query. Prior to step 202, biased click distance values for each of the documents on the network have previously been calculated. From step 202, exemplary method 20 proceeds to step 203.
In step 203, the biased click distance value for each document on a network is merged with any other document statistics (e.g., query-independent statistics) for each document stored in the index. Merging the biased click distance values with other document statistics allows for a faster query response time since all the information related to ranking is clustered together.
Accordingly, each document listed in the index has an associated biased click distance value after the merge. Once the merge is complete, exemplary method 20 proceeds to step 204.
In step 204, query-independent document statistics for a given document, including a biased click distance value, are provided as a component of a ranking function. Query-dependent data is also provided for the given document, typically as a separate component of the ranking function. The query-dependent data or content-related portion of the ranking function depends on the actual search terms and the content of the given document.
In one embodiment, the ranking function comprises a sum of at least one query-dependent (QD) component and at least one query-independent (QID) component, such as
$$\mathrm{Score} = QD(doc, query) + QID(doc).$$
The QD component can be any document scoring function. In one embodiment, the QD component corresponds to a field weighted scoring function described in U.S. Patent Application Serial No. 10/804,326 entitled "FIELD WEIGHTING IN TEXT DOCUMENT SEARCHING," filed on March 18, 2004. As provided in U.S. Patent Application Serial No. 10/804,326, one equation that may be used as a representation of the field weighted scoring function is as follows:
$$QD(doc, query) = \sum \frac{wtf\,(k_1 + 1)}{k_1 + wtf} \times \log\left(\frac{N}{n}\right)$$
wherein:
wtf represents a weighted term frequency or sum of term frequencies of given terms in the search query multiplied by weights across all fields (e.g., the title, the body, etc. of the document) and normalized according to the length of each field and the corresponding average length,
N represents a number of documents on the network,
n represents a number of documents containing a query term, and
k1 is a tunable constant.
The above terms and equation are further described in detail in U.S. Patent Application Serial No. 10/804,326.
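As an illustration, the query-dependent component can be sketched in Python by reading the field weighted scoring function as a sum, over the query terms, of wtf(k1 + 1)/(k1 + wtf) multiplied by log(N/n). The function name `qd_score`, the dictionary inputs, the natural logarithm, and the default value of k1 are assumptions and are not specified in the text above; the weighted term frequencies are assumed to be already computed.

```python
import math

def qd_score(weighted_tfs, doc_freqs, total_docs, k1=1.2):
    """Sum over query terms of wtf * (k1 + 1) / (k1 + wtf) * log(N / n).

    weighted_tfs: {term: wtf} for the query terms found in the document
    doc_freqs:    {term: n}, the number of documents containing each term
    total_docs:   N, the number of documents on the network
    """
    score = 0.0
    for term, wtf in weighted_tfs.items():
        n = doc_freqs[term]
        score += wtf * (k1 + 1) / (k1 + wtf) * math.log(total_docs / n)
    return score

# Hypothetical statistics for a two-term query against one document.
print(qd_score({"click": 2.0, "distance": 0.5}, {"click": 50, "distance": 10}, 1000))
```

The first factor saturates as wtf grows, so repeated occurrences of a term add progressively less, while the logarithmic factor favors terms that appear in few documents.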

The QID component can be any transformation of a biased click distance value and other document statistic (such as a URL depth) for a given document. In one embodiment, the QID component comprises a function as follows:
$$QID(doc) = w_{cd}\,\frac{k_{cd}}{k_{cd} + \dfrac{b_{cd}\,\dfrac{CD}{k_{ew}} + b_{ud}\,UD}{b_{cd} + b_{ud}}}$$
wherein:
wcd represents a weight of a query-independent component such as a component containing a biased click distance parameter,
bcd represents a weight of a biased click distance relative to the URL depth,
bud represents a weight of a URL depth,
CD represents a computed click distance or assigned biased click distance for a document,
kew represents a tuning constant that is determined by optimizing the precision of the ranking function, similar to other tuning parameters (i.e., kew may represent the edge weight value when all edges have the same edge weight value, or kew may represent the average or mean edge value when edge weight values differ from one another),
UD represents a URL depth, and
kcd is the biased click distance saturation constant.
The weighted terms (wcd, bcd, and bud) assist in defining the importance of each of their related terms (i.e., the component containing a biased click distance parameter, the biased click distance value for a given document, and the URL depth of the given document respectively) and ultimately the outcome of the scoring functions.
The URL depth (UD) is an optional addition to the above-referenced query-independent component to smooth the effect that the biased click distance value may have on the scoring function. For example, in some cases, a document that is not very important (i.e., has a large URL depth) may have a short biased click distance value. The URL depth is represented by the number of slashes in a document's URL.
For example, www.example.com\d1\d2\d3\d4.htm includes four slashes and would therefore have a URL depth of 4. This document however, may have a link directly from the main page www.example.com giving it a relatively low biased click distance value. Including the URL depth term in the above-referenced function and weighting the URL depth term against the biased click distance value compensates for a relatively high biased click distance value to more accurately reflect the document's importance within the network. Depending on the network, a URL depth of 3 or more may be considered a deep link.
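The query-independent component and the URL depth can be sketched in Python as follows. The sketch reads the QID function as wcd·kcd / (kcd + (bcd·CD/kew + bud·UD) / (bcd + bud)); the function names, the parameter values in the example, and the choice to count both slash characters are assumptions for illustration. URLs are assumed to be given without a scheme prefix such as "http://".

```python
def url_depth(url):
    # Count forward slashes and backslashes alike.
    return url.count("/") + url.count("\\")

def qid_score(cd, ud, w_cd, b_cd, b_ud, k_cd, k_ew):
    """Query-independent score from a biased click distance and a URL depth."""
    blended = (b_cd * cd / k_ew + b_ud * ud) / (b_cd + b_ud)
    return w_cd * k_cd / (k_cd + blended)

# The document from the example above: four slashes, one click from the main page.
deep = url_depth(r"www.example.com\d1\d2\d3\d4.htm")
print(deep)  # 4

# Hypothetical tuning values, all set to 1 for readability.
params = dict(w_cd=1.0, b_cd=1.0, b_ud=1.0, k_cd=1.0, k_ew=1.0)
shallow_page = qid_score(cd=1, ud=0, **params)
deep_page = qid_score(cd=1, ud=deep, **params)
print(shallow_page > deep_page)  # True
```

With equal biased click distance values, the page with the larger URL depth receives the smaller query-independent score, which is the smoothing effect described above.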
In one embodiment, the ranking function used to determine a document relevance score for a given document comprises a function as follows:
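$$\text{Score} = \sum \frac{wtf'\,(k_1 + 1)}{k_1 + wtf'} \times \log\!\left(\frac{N}{n}\right) + w_{cd}\,\frac{k_{cd}}{k_{cd} + \dfrac{b_{cd}\,\dfrac{CD}{k_{ew}} + b_{ud}\,UD}{b_{cd} + b_{ud}}}$$
wherein the terms are as described above.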
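As an illustration, the ranking function can be sketched in Python. The sketch assumes one reading of the formula, namely Score = Σ wtf'(k1 + 1)/(k1 + wtf') × log(N/n) + wcd · kcd / (kcd + (bcd · CD/kew + bud · UD)/(bcd + bud)), which follows the term definitions given above. All function and parameter names are hypothetical, and the example values are arbitrary rather than tuned.

```python
import math
from typing import Sequence


def query_independent(cd: float, ud: float, *, w_cd: float, k_cd: float,
                      k_ew: float, b_cd: float, b_ud: float) -> float:
    """Query-independent component built from click distance and URL depth."""
    # Weighted blend of the edge-normalized click distance and the URL depth.
    blended = (b_cd * cd / k_ew + b_ud * ud) / (b_cd + b_ud)
    # Saturating form: the contribution shrinks as the blended distance grows.
    return w_cd * k_cd / (k_cd + blended)


def score(wtfs: Sequence[float], doc_freqs: Sequence[int], n_docs: int,
          k1: float, cd: float, ud: float, **weights: float) -> float:
    """Sum the per-term query-dependent parts, then add the static part.

    wtfs[i] is the weighted term frequency of query term i in the document;
    doc_freqs[i] is the number of documents that contain query term i.
    """
    query_dependent = sum(
        wtf * (k1 + 1) / (k1 + wtf) * math.log(n_docs / n)
        for wtf, n in zip(wtfs, doc_freqs)
    )
    return query_dependent + query_independent(cd, ud, **weights)


if __name__ == "__main__":
    weights = dict(w_cd=2.0, k_cd=5.0, k_ew=1.0, b_cd=1.0, b_ud=1.0)
    near = score([1.0], [10], 100, k1=1.0, cd=1, ud=1, **weights)
    far = score([1.0], [10], 100, k1=1.0, cd=6, ud=4, **weights)
    print(near > far)  # a shorter click distance and URL depth score higher
```

With a click distance and URL depth of zero, the query-independent component reduces to wcd, its maximum value.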
In other embodiments, the URL depth may be removed from the ranking function or other components may be added to the ranking function to improve the accuracy of the query-dependent component, the query-independent component, or both. Furthermore, the above-described query-independent component containing a biased click distance parameter may be incorporated into other ranking functions (not shown) to improve ranking of search results.
Once document statistics for a given document are provided to a ranking function in step 204, exemplary method 20 proceeds to step 205. In step 205, a document relevance score is determined for a given document, stored in memory, and associated with the given document. From step 205, exemplary method 20 proceeds to decision block 206.
At decision block 206, a determination is made by application code whether a document relevance score has been calculated for each document within a network. If a determination is made that a document relevance score has not been calculated for each document within a network, exemplary method 20 returns to step 204 and continues as described above. If a determination is made that a document relevance score has been calculated for each document within a network, exemplary method 20 proceeds to step 207.
In step 207, the search results of the query comprising numerous documents are ranked according to their associated document relevance scores.
The resulting document relevance scores take into account the biased click distance value of each of the documents within the network. Once the search results are ranked, exemplary method 20 proceeds to step 208 where ranked results are displayed to a user. From step 208, exemplary method 20 proceeds to step 209 where highest ranked results are selected and viewed by the user.
From step 209, exemplary method 20 proceeds to step 210 where exemplary method 20 ends.
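The scoring loop and ranking of steps 204 through 207 can be sketched as below. This is only an illustrative outline: `rank_results` and `score_fn` are hypothetical names, and sorting in descending order assumes that a higher document relevance score means a more relevant document.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def rank_results(doc_ids: Iterable[Hashable],
                 score_fn: Callable[[Hashable], float]
                 ) -> Tuple[List[Hashable], Dict[Hashable, float]]:
    """Score every document, store the scores, and return the ranked list."""
    docs = list(doc_ids)
    scores: Dict[Hashable, float] = {}
    # Steps 204-206: compute and store a score for each document in turn
    # until every document has one.
    for doc_id in docs:
        scores[doc_id] = score_fn(doc_id)
    # Step 207: order the results by their associated scores
    # (assumption: highest score first).
    ranked = sorted(docs, key=lambda d: scores[d], reverse=True)
    return ranked, scores


if __name__ == "__main__":
    precomputed = {"a.htm": 0.2, "b.htm": 0.9, "c.htm": 0.5}
    ranked, _ = rank_results(precomputed, precomputed.get)
    print(ranked)
```

Here the scores are looked up from a precomputed dictionary; in practice `score_fn` would apply the ranking function to the stored document statistics.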
In addition to the above-described methods of generating a document relevance score for documents within a network and using document relevance scores to rank search results of a search query, computer readable media having stored thereon computer-executable instructions for performing the above-described methods are also disclosed herein.
Computing systems are also disclosed herein. An exemplary computing system contains at least one application module usable on the computing system, wherein the at least one application module comprises application code loaded thereon, wherein the application code performs a method of generating a document relevance score for documents within a network. The application code may be loaded onto the computing system using any of the above-described computer readable media having thereon computer-executable instructions for generating a document relevance score for documents within a network and using document relevance scores to rank search results of a search query as described above.
While the specification has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily conceive of alterations to, variations of, and equivalents to these embodiments.
Accordingly, the scope of the disclosed methods, computer readable medium, and computing systems should be assessed as that of the appended claims and any equivalents thereto.

Claims (14)

CLAIMS:
1. A computer readable storage medium having stored thereon computer-executable instructions for ranking a plurality of documents in a network, wherein said computer-executable instructions when executed by the computer perform a method of generating search results in response to a search query, the method comprising:
storing document information in memory, the document information identifying the plurality of documents in the network, the plurality of documents including authoritative documents and non-authoritative documents, the authoritative documents including at least a first authoritative document and a second authoritative document, and the non-authoritative documents including at least a first non-authoritative document;
storing link information in the memory, the link information identifying links among the plurality of documents;
computing click distance values for each of the non-authoritative documents to the authoritative documents, the click distance values including at least a first click distance value that is a function of a number of links that need to be followed to create a path from the first non-authoritative document to the first authoritative document and a second click distance value that is a function of a number of links that need to be followed to create a path from the first non-authoritative document to the second authoritative document;
computing biased click distance values for each of the non-authoritative documents in the network to the authoritative documents, wherein the biased click distance values include at least a first biased click distance value that is a function of a lesser of the first and second click distances;
receiving the search query including at least one search term;

executing the search query to generate a list of the plurality of documents that include the at least one search term, the list of the plurality of documents including an identifier of the first non-authoritative document;
ranking the list of the plurality of documents that include the at least one search term using a ranking function that comprises one or more query-independent components, wherein at least one query-independent component includes a biased click distance parameter that takes into account the biased click distance values, including the first biased click distance value; and outputting the ranked search results according to the ranking.
2. The computer readable storage medium of claim 1, wherein the method further comprises assigning assigned biased click distance values to the authoritative documents.
3. The computer readable storage medium of claim 2, wherein at least two of the assigned biased click distance values differ from one another.
4. The computer readable storage medium of claim 1, wherein the ranking function further comprises at least a second query-independent component that includes an edge value parameter that takes into account edge values of each edge in the network, wherein one or more edge values are a number other than 1.
5. The computer readable storage medium of claim 4, wherein the edge values are equal to one another and are equal to a number other than 1.
6. The computer readable storage medium of claim 4, wherein the edge values are equal to one another and are equal to or greater than a highest biased click distance value initially assigned to one or more of the authoritative documents.
7. The computer readable storage medium of claim 1, further comprising computer-executable instructions for assigning a score generated by the ranking function to each document in the network, said score being used to rank documents in ascending or descending order.
8. The computer readable storage medium of claim 7, wherein the score for each document is generated using a formula:
wherein:
wtf' represents a weighted term frequency, N represents a number of the documents in the network, n represents a number of the documents containing the search term, wcd represents a weight of the at least one query-independent component, bcd represents a weight of one of the click distance values, bud represents a weight of a URL depth, CD represents one of the computed click distance values or an assigned biased click distance for a document, kew represents a tuning constant related to edge weights, UD represents the URL depth, and kcd and k1 are constants.
9. A method of determining document relevance scores for documents in a network, said method comprising the steps of:

storing document and link information for the documents in the network;
generating a representation of the network from the document and link information, wherein the representation of the network includes nodes that represent the documents and edges that represent the links;
assigning a biased click distance value to at least two authoritative nodes in the network, wherein the at least two authoritative nodes include at least a first authoritative node having a first assigned biased click distance and a second authoritative node having a second assigned biased click distance;
computing click distances for each non-authoritative node in the representation of the network to at least two of the authoritative nodes, wherein the click distances include a first click distance and a second click distance, the first click distance being a function of a number of the links that need to be followed to create a path from a first non-authoritative node to the first authoritative node, and the second click distance being a function of a number of the links that need to be followed to create a path from the first non-authoritative node to the second authoritative node;
computing biased click distance values for each of the non-authoritative documents, wherein the biased click distance values include at least a first biased click distance value that is a function of a lesser of the first and second click distances; and using the biased click distance values to determine document relevance scores for each of the documents in the network.
10. The method of claim 9, wherein the first assigned biased click distance and the second assigned biased click distance differ from one another.
11. The method of claim 9, further comprising the step of: assigning to each edge in the representation of the network an edge value, wherein the edge values are equal to or greater than 1.
12. The method of claim 11, wherein each edge value is greater than a highest biased click distance value assigned to any of the authoritative nodes.
13. The method of claim 9, wherein the document relevance score for each document on the network is generated using a formula:
wherein:
wtf' represents a weighted term frequency, N represents a number of the documents in the network, n represents a number of the documents containing a query term, wcd represents a weight of a query-independent component, bcd represents a weight of one of the click distances, bud represents a weight of a URL depth, CD represents one of the computed click distances or one of the assigned biased click distances, kew represents a tuning constant related to edge weights, UD represents the URL depth, and kcd and k1 are constants.
14. A computing system comprising:
a processor; and a memory, the memory storing computer-executable instructions which when executed by the processor perform a method of determining document relevance scores for nodes in a network, said method comprising the steps of:
assigning biased click distance values to at least two authoritative nodes in a representation of the network, wherein the at least two authoritative nodes include at least a first authoritative node having a first assigned biased click distance and a second authoritative node having a second assigned click distance;
computing click distances for each non-authoritative node in the representation of the network to at least two of the authoritative nodes, wherein the click distances include a first click distance and a second click distance, the first click distance being a function of a number of links that need to be followed to create a path from a first non-authoritative node to the first authoritative node, and the second click distance being a function of a number of the links that need to be followed to create a path from the first non-authoritative node to the second authoritative node;
computing biased click distance values for the non-authoritative nodes, wherein the biased click distance values include at least a first biased click distance value that is a function of a lesser of the first and second click distances;
and using the biased click distance values to determine document relevance scores for each of the nodes in the network.
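For illustration only, the biased click distance computation recited in the claims can be sketched in Python as a shortest-path search seeded from the authoritative nodes. The claims do not name a particular algorithm; Dijkstra's algorithm is used here as one common choice. The function name, the dictionary-based graph, the single shared edge value, the direction in which links are traversed, and the decision to keep assigned values fixed for authoritative nodes are all assumptions of this sketch.

```python
import heapq
from typing import Dict, Iterable


def biased_click_distances(links: Dict[str, Iterable[str]],
                           assigned: Dict[str, float],
                           edge_weight: float = 1.0) -> Dict[str, float]:
    """Return the biased click distance of every reachable node.

    links maps a node to the nodes it links to; assigned maps each
    authoritative node to its assigned biased click distance.
    """
    dist: Dict[str, float] = dict(assigned)
    heap = [(d, node) for node, d in assigned.items()]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for nxt in links.get(node, ()):
            if nxt in assigned:
                continue  # assumption: authoritative nodes keep their values
            candidate = d + edge_weight
            # Keep the lesser of the distances reached via different
            # authoritative nodes.
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist


if __name__ == "__main__":
    links = {"home": ["a"], "a": ["c"], "hub": ["c", "e"], "c": ["d"]}
    print(biased_click_distances(links, {"home": 0, "hub": 3}))
```

In the usage example, node `c` is two links from `home` (assigned 0) and one link from `hub` (assigned 3), so it takes the lesser value of 2 rather than 4. Nodes that cannot be reached from any authoritative node are absent from the result.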
CA2618854A 2005-08-15 2006-08-15 Ranking search results using biased click distance Expired - Fee Related CA2618854C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11/206,286 2005-08-15
US11/206,286 US7599917B2 (en) 2005-08-15 2005-08-15 Ranking search results using biased click distance
PCT/US2006/031965 WO2007022252A1 (en) 2005-08-15 2006-08-15 Ranking functions using a biased click distance of a document on a network

Publications (2)

Publication Number Publication Date
CA2618854A1 CA2618854A1 (en) 2007-02-22
CA2618854C true CA2618854C (en) 2014-04-22

Family

ID=37743763

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2618854A Expired - Fee Related CA2618854C (en) 2005-08-15 2006-08-15 Ranking search results using biased click distance

Country Status (17)

Country Link
US (1) US7599917B2 (en)
EP (1) EP1915703A4 (en)
JP (1) JP2009505292A (en)
KR (1) KR101301380B1 (en)
CN (1) CN101243435A (en)
AU (1) AU2006279520B2 (en)
BR (1) BRPI0614274A2 (en)
CA (1) CA2618854C (en)
IL (1) IL188902A (en)
MX (1) MX2008002173A (en)
MY (1) MY147720A (en)
NO (1) NO20080376L (en)
NZ (1) NZ565640A (en)
RU (1) RU2421802C2 (en)
TW (1) TWI396984B (en)
WO (1) WO2007022252A1 (en)
ZA (1) ZA200801435B (en)

CA2395905A1 (en) * 2002-07-26 2004-01-26 Teraxion Inc. Multi-grating tunable chromatic dispersion compensator
US7599911B2 (en) * 2002-08-05 2009-10-06 Yahoo! Inc. Method and apparatus for search ranking using human input and automated ranking
US7152059B2 (en) 2002-08-30 2006-12-19 Emergency24, Inc. System and method for predicting additional search results of a computerized database search user based on an initial search query
US7013458B2 (en) * 2002-09-09 2006-03-14 Sun Microsystems, Inc. Method and apparatus for associating metadata attributes with program elements
US6886010B2 (en) 2002-09-30 2005-04-26 The United States Of America As Represented By The Secretary Of The Navy Method for data and text mining and literature-based discovery
US7020648B2 (en) * 2002-12-14 2006-03-28 International Business Machines Corporation System and method for identifying and utilizing a secondary index to access a database using a management system without an internal catalogue of online metadata
US20040125606A1 (en) * 2002-12-26 2004-07-01 Kang-Ling Hwang Box type sensor lamp
US20040148278A1 (en) * 2003-01-22 2004-07-29 Amir Milo System and method for providing content warehouse
US6947930B2 (en) 2003-03-21 2005-09-20 Overture Services, Inc. Systems and methods for interactive search query refinement
US7028029B2 (en) 2003-03-28 2006-04-11 Google Inc. Adaptive computation of ranking
US7216123B2 (en) * 2003-03-28 2007-05-08 Board Of Trustees Of The Leland Stanford Junior University Methods for ranking nodes in large directed graphs
US7051023B2 (en) 2003-04-04 2006-05-23 Yahoo! Inc. Systems and methods for generating concept units from search queries
US7197497B2 (en) 2003-04-25 2007-03-27 Overture Services, Inc. Method and apparatus for machine learning a document relevance function
US7228301B2 (en) * 2003-06-27 2007-06-05 Microsoft Corporation Method for normalizing document metadata to improve search results using an alias relationship directory service
US7308643B1 (en) 2003-07-03 2007-12-11 Google Inc. Anchor tag indexing in a web crawler system
US7505964B2 (en) * 2003-09-12 2009-03-17 Google Inc. Methods and systems for improving a search ranking using related queries
US7346839B2 (en) * 2003-09-30 2008-03-18 Google Inc. Information retrieval based on historical data
US20050071328A1 (en) * 2003-09-30 2005-03-31 Lawrence Stephen R. Personalization of web search
US7552109B2 (en) * 2003-10-15 2009-06-23 International Business Machines Corporation System, method, and service for collaborative focused crawling of documents on a network
US20050086192A1 (en) * 2003-10-16 2005-04-21 Hitachi, Ltd. Method and apparatus for improving the integration between a search engine and one or more file servers
US20050144162A1 (en) * 2003-12-29 2005-06-30 Ping Liang Advanced search, file system, and intelligent assistant agent
US20060047649A1 (en) * 2003-12-29 2006-03-02 Ping Liang Internet and computer information retrieval and mining with intelligent conceptual filtering, visualization and automation
US7483891B2 (en) * 2004-01-09 2009-01-27 Yahoo, Inc. Content presentation and management system associating base content and relevant additional content
US7392278B2 (en) * 2004-01-23 2008-06-24 Microsoft Corporation Building and using subwebs for focused search
US7499913B2 (en) * 2004-01-26 2009-03-03 International Business Machines Corporation Method for handling anchor text
US7343374B2 (en) * 2004-03-29 2008-03-11 Yahoo! Inc. Computation of page authority weights using personalized bookmarks
US7257577B2 (en) * 2004-05-07 2007-08-14 International Business Machines Corporation System, method and service for ranking search results using a modular scoring system
US7260573B1 (en) 2004-05-17 2007-08-21 Google Inc. Personalizing anchor text scores in a search engine
US7363296B1 (en) 2004-07-01 2008-04-22 Microsoft Corporation Generating a subindex with relevant attributes to improve querying
US20060036598A1 (en) * 2004-08-09 2006-02-16 Jie Wu Computerized method for ranking linked information items in distributed sources
US7739277B2 (en) * 2004-09-30 2010-06-15 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US7827181B2 (en) * 2004-09-30 2010-11-02 Microsoft Corporation Click distance determination
US7761448B2 (en) * 2004-09-30 2010-07-20 Microsoft Corporation System and method for ranking search results using click distance
US7716198B2 (en) * 2004-12-21 2010-05-11 Microsoft Corporation Ranking search results using feature extraction
CA2601768C (en) * 2005-03-18 2016-08-23 Wink Technologies, Inc. Search engine that applies feedback from users to improve search results
US20060282455A1 (en) * 2005-06-13 2006-12-14 It Interactive Services Inc. System and method for ranking web content
US7716226B2 (en) * 2005-09-27 2010-05-11 Patentratings, Llc Method and system for probabilistically quantifying and visualizing relevance between two or more citationally or contextually related data objects
US20070150473A1 (en) * 2005-12-22 2007-06-28 Microsoft Corporation Search By Document Type And Relevance

Also Published As

Publication number Publication date
AU2006279520A1 (en) 2007-02-22
IL188902A0 (en) 2008-04-13
JP2009505292A (en) 2009-02-05
EP1915703A4 (en) 2011-11-16
MY147720A (en) 2013-01-15
RU2421802C2 (en) 2011-06-20
IL188902A (en) 2012-05-31
KR101301380B1 (en) 2013-08-29
KR20080043305A (en) 2008-05-16
TWI396984B (en) 2013-05-21
US7599917B2 (en) 2009-10-06
WO2007022252A1 (en) 2007-02-22
US20070038622A1 (en) 2007-02-15
NO20080376L (en) 2008-03-05
MX2008002173A (en) 2008-04-22
NZ565640A (en) 2010-02-26
EP1915703A1 (en) 2008-04-30
BRPI0614274A2 (en) 2011-03-22
AU2006279520B2 (en) 2011-03-17
RU2008105758A (en) 2009-08-20
TW200719183A (en) 2007-05-16
CN101243435A (en) 2008-08-13
ZA200801435B (en) 2009-05-27
CA2618854A1 (en) 2007-02-22

Similar Documents

Publication Publication Date Title
CA2618854C (en) Ranking search results using biased click distance
JP4698737B2 (en) Ranking function using document usage statistics
US6871202B2 (en) Method and apparatus for ranking web page search results
JP4950444B2 (en) System and method for ranking search results using click distance
RU2387005C2 (en) Method and system for ranking objects based on intra-type and inter-type relationships
JP5620913B2 (en) Document length as a static relevance feature for ranking search results
US20090299978A1 (en) Systems and methods for keyword and dynamic url search engine optimization
JP2006164246A (en) Entity-specific tunable search
Choi et al. Ranking web pages relevant to search keywords

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed Effective date: 20220301
MKLA Lapsed Effective date: 20200831