WO2001046828A2 - A navigation engine for assessing the quality of a trail between pages in a network - Google Patents
- Publication number
- WO2001046828A2 (PCT/GB2000/004765)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- trail
- query
- navigation engine
- navigation
- trails
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/954—Navigation, e.g. using categorised browsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/07—Guided tours
Definitions
- the present invention relates to the field of browsing and navigation for the purpose of finding preferred trails between pages in a network, and more particularly to a navigation engine and associated method which assesses trails between pages in the World-Wide-Web or in any other hypertext system to assist in finding relevant information.
- the environment in which the present invention operates is the World-Wide-Web.
- the Web can be viewed as a hypertext database containing nodes, which are the Web pages, and links between these nodes defining its topology. Each Web page has a unique identifier describing where the page resides and how to retrieve it. The mechanism used is that of a Uniform Resource Locator, or simply URL, which specifies the unique path for locating the Web page. Every link connects two nodes. The node we start at is called the anchor node and the node we finish at is called the destination node.
- a navigation session results in the user visiting a sequence of Web pages, which is called a trail.
- a trail is represented by the sequence of URLs associated with its pages. For example, a user's trail may be the sequence of URLs:
- the present invention is aimed at providing this facility, and more particularly to the provision of a system which will assess possible Web pages in a trail to help produce a trail of relevant Web pages which can be followed by a user.
- the present invention provides a navigation engine which uses a query defining a subject of interest to a user to select links between relevant pages in a network of linked textual or multi-media information, the navigation engine being able to assess the suitability of a plurality of links forming a trail based on the relevance of the pages in the trail, wherein the navigation engine provides an output related to the suitability to a user of various trails assessed.
- the output preferably includes a list of suitable trails available to be accessed by a user.
- the list of suitable trails may be in order of suitability, with the most suitable trail listed first.
- the relevance of a page in a network is preferably assessed based on the relevance of the page with respect to a query.
- a score is allocated to indicate the relevance of a page with respect to a query.
- the suitability of a trail may be calculated based on a chosen scoring function.
- the scoring function preferably involves at least one of the following: (a) the average score for a page in the trail with respect to a query, taking into account each step in the trail;
- the trails are preferably ordered based on the result of the scoring function.
- the trails are ranked by score, the highest score reflecting the best trail.
- when assessing a trail, the trail may end with a page having no out-links.
- Assessment of a plurality of trails preferably comprises an exploration stage and a convergence stage.
- the exploration stage preferably includes extending trail lengths and scoring the trails that are induced.
- the convergence stage preferably assesses which induced trails are more suitable and gives these trails more weight at each iteration based on their ranking. Preferably an assessment of trails is conducted over sufficient iterations to produce a useful output. Any number of iterations may be applied, as appropriate.
- the network is preferably a hypertext system, such as the World-Wide-Web.
- a navigation engine according to the present invention is ideally suited for use when loaded into a computer system for connection to a network.
- the navigation engine may be accessible through the Web.
- it may be supplied to a user in the form of computer software stored on a carrier, such as a compact disk or floppy disk.
- the present invention further provides a system for facilitating exploration by a user of a network of linked textual or multi-media information, the system comprising: a user interface for receiving a query which defines a subject of interest to the user; and a navigation engine as described or claimed herein.
- Figure 1 is a flow chart depicting an embodiment of the algorithm for producing a best trail according to the present invention.
- Figure 2 is a schematic representation of pages in a network (or Web); and Figure 3 shows an example of a navigation tree, which could result from the
- Figure 4 shows a user interface workstation for using a navigation engine according to the present invention to surf a network such as the World-Wide Web.
- the context of a navigation session is a query, which normally would be a set of keywords.
- the query can be viewed as the goal of the navigation session in the sense that the user would like to follow a trail which maximises the suitability of the trail to the query.
- the suitability of a trail to the query is realised by its score, which is a function of the scores of the individual Web pages of the trail with respect to the query. (We assume that scores of trails are numeric and trails having higher scores are more relevant to the query.)
- the score of a page with respect to a query indicates how closely the page contents match the query, i.e. how relevant the Web page is to the query.
- the scoring of individual Web pages with respect to a query is realised by information retrieval techniques.
- An example of a scoring method for a trail with respect to a query is that of taking the average score of its pages with respect to the query; other alternative trail scoring methods are described later.
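The average-score method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: page scores for the current query are assumed to be already available through a function from URL to a positive number, and the names `average_trail_score` and `page_score` are hypothetical.

```python
from typing import Callable, Sequence

def average_trail_score(trail: Sequence[str], page_score: Callable[[str], float]) -> float:
    """Score a trail as the mean of its pages' scores; a URL visited twice counts twice."""
    if not trail:
        raise ValueError("a trail must contain at least one URL")
    return sum(page_score(url) for url in trail) / len(trail)

# Hypothetical page scores with respect to some fixed query.
scores = {"u1": 2.0, "u2": 4.0, "u3": 6.0}
print(average_trail_score(["u1", "u2", "u3"], scores.__getitem__))  # 4.0
```

Because every step of the trail contributes one term, a long trail through weakly relevant pages is penalised relative to a short trail through highly relevant ones.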
- a navigation engine which uses a best trail algorithm automates the process of navigation by suggesting to the user the best trail to follow with respect to a query, given the Web page the user is currently browsing. More specifically, given the URL of a starting Web page and a query, it presents the user with the trail having maximal suitability to the query, given the parameters input to the algorithm; we call this trail the best trail.
- the algorithm can be easily refined so that the n most suitable trails (n > 1) are returned instead of just the best trail; in addition, the algorithm can compute the best trails for multiple starting points.
- the user-interface i.e. how the trails are presented to the user, can take many forms of a type known to those skilled in the art and need not be described in detail herein.
- a navigation engine according to the present invention will ideally be supplied as a support tool for browsing accessible through a Web browser or as a plug-in to a search engine.
- the present invention is applicable in any hypertext system, such as an electronic book, for the purpose of navigation assistance.
- In Section 2 we describe the working of the algorithm used by the navigation engine, the data structures used in the algorithm and the operations on these data structures.
- In Section 3 we give the pseudo-code and flowchart of the best trail algorithm.
- In Section 4 we give an illustrative example of the working of the best trail algorithm.
- In Section 5 we describe a specific embodiment of the algorithm in a hypertext system.
- Each Web page (or simply page) has an associated URL which acts as a unique identifier or address for the purpose of locating the page and retrieving it.
- we use the term URL in the generic sense, since a hypertext system other than the Web will also have some form of unique identification of its pages.
- a link is an ordered pair of nodes from an anchor node to a destination node. Since nodes represent Web pages and these are uniquely identified by URLs, we consider a link to be an ordered pair of URLs.
- the out-links from a Web page are the links embedded within this page.
- the collections of links embedded in the pages of a hypertext system form a directed graph which determines its topology.
- a trail is a sequence of URLs which is consistent with the topology of the Web. That is, any two adjacent URLs in the sequence form a link, which is embedded in the Web page identified by the anchor URL.
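The consistency condition on a trail amounts to checking each adjacent pair of URLs against the link structure. A minimal sketch, assuming the topology is held as a mapping from each URL to the set of URLs its page links to (the function name `is_trail` and the example topology are hypothetical):

```python
from typing import Mapping, Sequence, Set

def is_trail(urls: Sequence[str], out_links: Mapping[str, Set[str]]) -> bool:
    """True if every adjacent pair of URLs is a link embedded in the anchor page."""
    return all(dst in out_links.get(src, set()) for src, dst in zip(urls, urls[1:]))

# A hypothetical four-page topology: each URL maps to the URLs its page links to.
web = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
print(is_trail(["A", "B", "D"], web))  # True
print(is_trail(["A", "D"], web))       # False: page A has no out-link to D
```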
- a query provides the goal of a user's navigation session. It is normally specified by the user as a set of keywords, for instance in the manner a query is specified to a search engine.
- the score of a URL with respect to a query is the relevance or weight of the page associated with the URL with respect to the query. (At times we refer to the score of a URL as the score of its associated page.) That is, the score of a URL with respect to a query indicates how closely the page associated with the URL matches the query.
- we assume that the scoring function returns a numeric value, and that URLs with higher scores are more relevant to the query. In this embodiment we also assume that all URLs have a positive (i.e. greater than zero) score, which in the case of a non-relevant URL will be small.
- the scoring function must be consistent in the sense that, within a navigation session, all URLs are scored in the same manner.
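The source leaves the choice of information retrieval technique open. As a stand-in only, the sketch below scores a page by counting occurrences of the query keywords, plus a small positive constant so that a non-relevant page still receives a small positive score. A real engine would more likely use a weighting scheme such as tf-idf; `page_score` and `EPSILON` are hypothetical names.

```python
import re
from typing import Iterable

EPSILON = 0.01  # assumed floor: non-relevant pages still get a small positive score

def page_score(text: str, query: Iterable[str]) -> float:
    """Count query-keyword occurrences in the page text, plus a small positive floor."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    keywords = {k.lower() for k in query}
    return EPSILON + sum(1 for w in words if w in keywords)

print(page_score("Trail finding on the Web: trails and links", ["trail", "web"]))
```

Using one fixed function like this for every page in a session is what keeps the scores consistent in the sense required above.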
- the score of a trail with respect to a query is a function of the scores of the individual Web pages of the trail with respect to the query.
- scores of trails are numeric and positive, and trails with higher scores are more suitable to the query.
- An ordering of trails with respect to a query $Q$ is defined as follows. Given two trails, $T_1$ and $T_2$, we say that $T_1$ is better than $T_2$ with respect to $Q$ if the score of $T_1$ with respect to $Q$ is greater than or equal to the score of $T_2$ with respect to $Q$.
- Let $\{T_1, T_2, \ldots, T_n\}$, $n \geq 1$, be a set of trails and $Q$ be a query. The rank of a trail $T_i$ in the set of trails, with respect to $Q$, is determined as follows. The trail with the highest score within the set is given the highest rank, i.e. 1, the trail with the second highest score within the set is ranked as 2, ..., and the trail with the lowest score within the set is given the lowest rank. Two trails with the same score are given the same rank, and all trail scores are with respect to $Q$.
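The ranking rule can be illustrated as follows. Since the text gives rank 2 to "the second highest score", this sketch assumes dense ranking, in which distinct scores receive consecutive ranks and tied trails share a rank; `rank_trails` and the trail labels are hypothetical.

```python
from typing import Dict

def rank_trails(scores: Dict[str, float]) -> Dict[str, int]:
    """Dense ranking: the highest score gets rank 1 and equal scores share a rank."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {s: i + 1 for i, s in enumerate(distinct)}
    return {trail: rank_of[s] for trail, s in scores.items()}

print(rank_trails({"T1": 3.5, "T2": 5.0, "T3": 3.5, "T4": 1.0}))
# {'T1': 2, 'T2': 1, 'T3': 2, 'T4': 3}
```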
- Browsing is the general activity of exploring Web pages and inspecting their contents, and navigation is the activity of following links (colloquially known as "surfing").
- the algorithm follows links from anchor to destination according to the topology of the Web or the hypertext under consideration (i.e. when an out-link exists from the Web page identified by its URL, then it may be traversed by the algorithm).
- the algorithm builds a navigation tree (see Figure 3) whose root node $U_1$ is labelled by the URL of the starting point. Each time a destination URL is chosen, a new node ($U_2$, $U_3$, ...) is added to the navigation tree and is labelled by the destination URL. Nodes that may be added to the navigation tree as a result of traversing a link that has not yet been followed from an existing node are called tip nodes. We also consider the special case when a link has been traversed to a destination URL, and the page associated with this URL has no out-links. Nodes in the navigation tree which are labelled by such URLs are called leaf nodes, and are also considered to be tip nodes.
- each tip node of the current state of the navigation tree is considered to be a destination node of an anchor of a link to be followed; in the case when the tip node is a leaf node, we can consider the destination node to be the leaf itself.
- the algorithm uses a random device to choose a tip node to be added to the navigation tree; in the special case when the tip node is a leaf node, the navigation tree remains unchanged.
- the weight that is attached to a tip node for the purpose of the probabilistic choice is proportional to the score of the trail induced by the tip node, which is the unique sequence of URLs labelling the nodes in the navigation tree forming a path from the root node of the tree to the tip node under consideration.
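The probabilistic choice of a tip node can be sketched with the standard library's weighted sampling. Here each tip is assumed to be identified by a label and mapped to the score of the trail it induces; `choose_tip` and `tip_probabilities` are hypothetical helpers, and the scores in the test are made up for illustration.

```python
import random
from typing import Dict, Hashable, Optional

def tip_probabilities(tip_scores: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Probability of each tip, proportional to the score of its induced trail."""
    total = sum(tip_scores.values())
    return {t: s / total for t, s in tip_scores.items()}

def choose_tip(tip_scores: Dict[Hashable, float], rng: Optional[random.Random] = None) -> Hashable:
    """Draw one tip at random, weighted by induced-trail score (scores assumed > 0)."""
    rng = rng or random.Random()
    tips = list(tip_scores)
    return rng.choices(tips, weights=[tip_scores[t] for t in tips], k=1)[0]
```

Passing a seeded `random.Random` makes runs reproducible, which is convenient when comparing trail scoring functions on the same topology.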
- the algorithm has two separate stages, the first being the exploration stage and the second being the convergence stage. Each stage comprises a preset number of iterations.
- during the exploration stage, a tip node is chosen with probability purely proportional to the score of the trail that it induces.
- a parameter called the discrimination factor (df) which is a real number strictly between zero and one, determines the convergence rate.
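The exact convergence-stage formula is not reproduced in this extract. The sketch below is therefore an assumption: it weights each tip by $\mathrm{power}(df, j \cdot (\mathrm{rank} - 1)) \cdot \mathrm{score}$, where $j$ counts convergence iterations. At $j = 0$ it reduces to the purely proportional exploration rule; as $j$ grows, lower-ranked trails are discounted more sharply, and a smaller $df$ makes this happen faster. The function name `convergence_probabilities` is hypothetical.

```python
from typing import Dict

def convergence_probabilities(tip_scores: Dict[str, float], df: float, j: int) -> Dict[str, float]:
    """ASSUMED weighting: score * df**(j * (rank - 1)), normalised to sum to 1."""
    if not 0.0 < df < 1.0:
        raise ValueError("df must lie strictly between 0 and 1")
    distinct = sorted(set(tip_scores.values()), reverse=True)
    rank = {s: i + 1 for i, s in enumerate(distinct)}  # dense ranking, 1 = best
    weights = {t: s * df ** (j * (rank[s] - 1)) for t, s in tip_scores.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

scores = {"a": 4.0, "b": 2.0, "c": 2.0}
for j in (0, 1, 5):
    print(j, convergence_probabilities(scores, df=0.5, j=j))
```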
- a navigation tree is a tree whose root is the starting URL of a navigation session.
- the nodes in the navigation tree are labelled by URLs, where it is possible for two different nodes in the tree to be labelled by the same URL.
- Each arc in a navigation tree from one node (the anchor node) to another node (the destination node) corresponds to an existing link in the Web from the URL labelling the anchor node to the URL labelling the destination node.
- a frontier node in a navigation tree is a node in this navigation tree that is either (a) a leaf node, when the page associated with its URL has no out-links, or
- a tip node in a navigation tree is either
- the trail induced by a tip node in a navigation tree is the unique sequence of URLs labelling the nodes in the navigation tree which form a path from the root node of the tree to the tip node under consideration.
- the score of a tip node in a navigation tree with respect to a query is the score of the trail induced by the tip node with respect to the query.
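The navigation-tree definitions can be made concrete with a small data structure. In this sketch (all names hypothetical), each node stores its URL, its parent and its children; a tip is represented as a pair (parent node, destination URL) for a link not yet followed from that node, or as (leaf node, None) when the page has no out-links. Two nodes may carry the same URL, as the definitions allow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass(eq=False)  # identity comparison: distinct nodes may share a URL
class Node:
    url: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def extend(parent: Node, url: str) -> Node:
    """Add a new node and arc from parent; the new node joins the frontier."""
    child = Node(url, parent)
    parent.children.append(child)
    return child

def induced_trail(node: Node) -> List[str]:
    """URLs on the unique path from the root down to this node."""
    trail: List[str] = []
    cur: Optional[Node] = node
    while cur is not None:
        trail.append(cur.url)
        cur = cur.parent
    return trail[::-1]

def tips(root: Node, out_links: Dict[str, Set[str]]) -> List[Tuple[Node, Optional[str]]]:
    """All tips: unfollowed out-links of each node, plus leaf nodes themselves."""
    result: List[Tuple[Node, Optional[str]]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        links = out_links.get(node.url, set())
        if not links:
            result.append((node, None))  # leaf node: it is its own tip
        followed = {c.url for c in node.children}
        result.extend((node, dst) for dst in sorted(links - followed))
        stack.extend(node.children)
    return result
```

The score of a non-leaf tip is then the score of `induced_trail(parent) + [dst]`, and that of a leaf tip is the score of `induced_trail(leaf)`.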
- (b) add a new node and arc to the navigation tree such that the anchor node of this arc is the parent frontier node of this tip node and the destination node is the tip node itself.
- the new node becomes a frontier node of the extended navigation tree.
- An indexed set $\{B_1, B_2, \ldots, B_N\}$ consisting of $N$ trails, one for each input URL.
- this set of trails may be ranked such that $B_1$ is better than $B_2$ with respect to $Q$, $B_2$ is better than $B_3$ with respect to $Q$, ..., and $B_{N-1}$ is better than $B_N$ with respect to $Q$.
- $I_{explore} > 0$ is the number of iterations during the exploration stage of the algorithm.
- $I_{converge} \geq 1$ is the number of iterations during the convergence stage of the algorithm.
- $\{D_1, D_2, \ldots, D_M\}$ is an indexed set of $M$ navigation trees for each input URL $U_k$, $1 \leq k \leq N$.
- $D_i = \{U_k\}$, i.e. $D_i$ is a navigation tree having a single node $U_k$, which is also its root, where $1 \leq k \leq N$.
- $\{t_1, t_2, \ldots, t_n\}$ is the set of tip nodes of $D_i$.
- $\mathrm{power}(x, y)$ is shorthand for $x$ raised to the power of $y$, and
- $x \cdot y$ is shorthand for the multiplication of $x$ and $y$.
- $\mathrm{prob}(Q, D_i, t_j)$ is the probability of a tip $t_j$ in the navigation tree $D_i$ with respect to the query $Q$.
- the flowchart of Algorithm 1 is given in Figure 1.
- the algorithm has a main outer for loop starting at line 2 and ending at line 16, which computes the best trail for each one of the N input URLs.
- the first inner for loop starting at line 3 and ending at line 14 recomputes the best trail M times, given the starting URL U k .
- the overall best trail out of the M iterations with the same starting URL, is chosen at line 15 of the algorithm.
- the algorithm has two further inner for loops, the first one starting at line 5 and ending at line 8 comprises the exploration stage of the algorithm, and the second one starting at line 9 and ending at line 12 comprises the convergence stage of the algorithm. Finally, the set of N best trails for the set of N input URLs is returned at line 17 of the algorithm.
- Algorithm 1 $\mathrm{Best\_Trail}(Q, \{U_1, U_2, \ldots, U_N\}, M)$
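The pseudo-code body of Algorithm 1 is not reproduced in this extract, so the following is only a compact sketch of the loop structure described above: an outer loop over the $N$ starting URLs, $M$ navigation trees per URL, an exploration stage followed by a convergence stage, and the best of the $M$ results kept per URL. Several details are assumptions: the query is represented by precomputed page scores, trails are scored by average page score, each tree reports the best induced trail it encountered, and the convergence weighting $\mathrm{power}(df, j \cdot (\mathrm{rank}-1)) \cdot \mathrm{score}$ is assumed rather than quoted. All names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Trail = Tuple[str, ...]

def trail_score(trail: Trail, page_scores: Dict[str, float]) -> float:
    """Average page score of the trail (the example scoring method; assumed here)."""
    return sum(page_scores[u] for u in trail) / len(trail)

def best_trail(page_scores: Dict[str, float], out_links: Dict[str, Set[str]],
               start_urls: Sequence[str], m: int, i_explore: int, i_converge: int,
               df: float, seed: int = 0) -> List[Trail]:
    rng = random.Random(seed)
    result: List[Trail] = []
    for u_k in start_urls:                       # one best trail per input URL
        candidates: List[Trail] = []
        for _ in range(m):                       # M independent navigation trees D_i
            # Each node: (induced trail, set of destination URLs already followed).
            nodes: List[Tuple[Trail, Set[str]]] = [((u_k,), set())]
            best: Trail = (u_k,)
            for it in range(i_explore + i_converge):
                # Tips: unfollowed out-links of every node, or a leaf node itself.
                tips: List[Tuple[int, Optional[str], Trail]] = []
                for idx, (trail, followed) in enumerate(nodes):
                    links = out_links.get(trail[-1], set())
                    if not links:
                        tips.append((idx, None, trail))
                    for dst in sorted(links - followed):
                        tips.append((idx, dst, trail + (dst,)))
                scores = [trail_score(t, page_scores) for _, _, t in tips]
                if it < i_explore:               # exploration: purely proportional
                    weights = scores
                else:                            # convergence: ASSUMED rank discount
                    j = it - i_explore + 1
                    distinct = sorted(set(scores), reverse=True)
                    weights = [s * df ** (j * distinct.index(s)) for s in scores]
                idx, dst, trail = rng.choices(tips, weights=weights, k=1)[0]
                if dst is not None:              # choosing a leaf leaves the tree unchanged
                    nodes[idx][1].add(dst)
                    nodes.append((trail, set()))
                if trail_score(trail, page_scores) > trail_score(best, page_scores):
                    best = trail
            candidates.append(best)
        result.append(max(candidates, key=lambda t: trail_score(t, page_scores)))
    return result
```

On a small topology such as `{"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": set()}` with page scores `{"A": 1.0, "B": 5.0, "C": 0.5, "D": 5.0}`, a call like `best_trail(scores, web, ["A"], m=3, i_explore=10, i_converge=10, df=0.5)` should, with overwhelming probability, return the trail A, B, D, whose average score of 11/3 is the highest available.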
- In Figure 2 we show an example Web topology, where each node is annotated with its URL, and the score of this URL with respect to a given query is given in parentheses.
- $U_1$ is the starting URL of the best trail algorithm.
- a possible navigation tree after seven node extensions is given in Figure 3.
- Each node in the navigation tree is annotated with a unique number and with its URL; the tip nodes of the navigation tree are shaded.
- the root of the navigation tree is node 0, which is labelled by the starting URL $U_1$, and the nodes that were added to the navigation tree as a result of the seven node extensions are numbered from 1 to 7.
- Dashed nodes and arcs indicate URLs and links that were, respectively, previously visited and traversed.
- the frontier nodes of the navigation tree are 1, 5, 6 and 7.
- Node 1 is also a tip node of the navigation tree since it is a leaf node.
- Node 5 is the parent of two tip nodes, numbered 8 and 9.
- Node 6 is the parent of one tip node, numbered 10.
- Node 7 is the parent of two tip nodes, numbered 11 and 12.
- Table 1 shows the tips, their induced trails and the score of these trails according to the first three trail scoring functions suggested in Section 1 term (6). (As will be noted, in this example, the trail to tip 11 is considered the best trail (i.e. highest score) irrespective of the scoring function (a,b,c) used.) Using this table the probability of the next tip node to add to the navigation tree can be computed. As can be seen these probabilities are, in general, different for different trail scoring functions.
- Table 1: The trails induced by the tips and their scores.
5 Industrial Application
- the best trail algorithm is applicable in any hypertext system, such as an electronic book, for the purpose of navigation assistance.
- the algorithm could be used for the purpose of calculating and displaying to the user the best trail for each of the top URLs that match the input query.
- the user would be asked to input a query, using for example a keyboard 20 or mouse 22; then, taking the destination URLs of the links embedded in the currently browsed Web page as the starting points for navigation, the browser would display the best trail for each one of these URLs to the user on a screen 24.
- the algorithm can be easily refined so that for each starting URL the n most relevant trails (n > 1) are returned rather than just the best trail. This process can be repeated after the user follows a link.
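One simple way to realise the n-best refinement is to collect the candidate trails produced for a starting URL and keep the n highest-scoring distinct ones. The sketch below assumes average page scoring, and the name `top_n_trails` is hypothetical; the source does not specify how candidates are gathered.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

def top_n_trails(trails: Sequence[Tuple[str, ...]], page_scores: Dict[str, float],
                 n: int) -> List[Tuple[str, ...]]:
    """Return the n highest-scoring distinct trails, best first (average page score)."""
    def score(t: Tuple[str, ...]) -> float:
        return sum(page_scores[u] for u in t) / len(t)
    return heapq.nlargest(n, set(trails), key=score)

scores = {"A": 1.0, "B": 5.0, "C": 0.5, "D": 5.0}
candidates = [("A",), ("A", "B"), ("A", "C"), ("A", "B", "D")]
print(top_n_trails(candidates, scores, 2))  # [('A', 'B', 'D'), ('A', 'B')]
```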
- a navigation engine and system according to the present invention provide very useful tools for use when navigating within a hypertext system, such as the World-Wide-Web.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU18714/01A AU1871401A (en) | 1999-12-20 | 2000-12-12 | A navigation engine for assessing the quality of a trail between pages in a network |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9930070A GB2357596A (en) | 1999-12-20 | 1999-12-20 | A navigation engine for assessing the quality of a trail between linked pages |
GB9930070.9 | 1999-12-20 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001046828A2 (en) | 2001-06-28 |
WO2001046828A3 WO2001046828A3 (en) | 2002-07-11 |
Family
ID=10866656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2000/004765 WO2001046828A2 (en) | 1999-12-20 | 2000-12-12 | A navigation engine for assessing the quality of a trail between pages in a network |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU1871401A (en) |
GB (1) | GB2357596A (en) |
WO (1) | WO2001046828A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7720842B2 (en) * | 2001-07-16 | 2010-05-18 | Informatica Corporation | Value-chained queries in analytic applications |
US6820077B2 (en) | 2002-02-22 | 2004-11-16 | Informatica Corporation | Method and system for navigating a large amount of data |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0950960A2 (en) * | 1998-04-17 | 1999-10-20 | Xerox Corporation | Usage based methods of traversing and displaying generalized graph structures |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6304872B1 (en) * | 1998-08-13 | 2001-10-16 | Tornado Technology Co. Ltd. | Search system for providing fulltext search over web pages of world wide web servers |
-
1999
- 1999-12-20 GB GB9930070A patent/GB2357596A/en not_active Withdrawn
-
2000
- 2000-12-12 AU AU18714/01A patent/AU1871401A/en not_active Abandoned
- 2000-12-12 WO PCT/GB2000/004765 patent/WO2001046828A2/en not_active Application Discontinuation
Non-Patent Citations (2)
Title |
---|
DATABASE INSPEC [Online] THE INSTITUTION OF ELECTRICAL ENGINEERS, STEVENAGE, GB; Inspec No. AN6690421, XP002187285 -& CHEN M. ET AL.: "Cha-Cha: a system for organizing intranet search results" PROC. USENIX '99: 2ND. SYMPOSIUM ON INTERNET TECHNOLOGIES AND SYSTEMS, [Online] 11 - 14 October 1999, pages 47-58, XP002187295 Boulder CO, USA Retrieved from the Internet: <URL:http://citeseer.nj.nec.com/cachedpage/209101/1> [retrieved on 2002-01-11] * |
WANTZ L.J. ET AL: "Toward User-Centric Navigation of the Web: COOL Links using SPI" WWW, [Online] XP002187284 Retrieved from the Internet: <URL:http://www.scope.gmd.de/info/www6/posters/774/ucn.htm> [retrieved on 2002-01-09] * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8150869B2 (en) | 2008-03-17 | 2012-04-03 | Microsoft Corporation | Combined web browsing and searching |
US20110264673A1 (en) * | 2010-04-27 | 2011-10-27 | Microsoft Corporation | Establishing search results and deeplinks using trails |
US10289735B2 (en) * | 2010-04-27 | 2019-05-14 | Microsoft Technology Licensing, Llc | Establishing search results and deeplinks using trails |
Also Published As
Publication number | Publication date |
---|---|
AU1871401A (en) | 2001-07-03 |
GB9930070D0 (en) | 2000-02-09 |
WO2001046828A3 (en) | 2002-07-11 |
GB2357596A (en) | 2001-06-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6101503A (en) | Active markup--a system and method for navigating through text collections | |
US6067552A (en) | User interface system and method for browsing a hypertext database | |
USRE42527E1 (en) | Virtual directory | |
US6321228B1 (en) | Internet search system for retrieving selected results from a previous search | |
US6704722B2 (en) | Systems and methods for performing crawl searches and index searches | |
AU2002301600C1 (en) | System and Method Allowing Advertisers to Manage Search Listings in a Pay for Placement Search System Using Grouping | |
US8090740B2 (en) | Search-centric hierarchichal browser history | |
US7062488B1 (en) | Task/domain segmentation in applying feedback to command control | |
US7136845B2 (en) | System and method for query refinement to enable improved searching based on identifying and utilizing popular concepts related to users' queries | |
EP1596314B1 (en) | Method and system for determining similarity between queries and between web pages based on their relationships | |
US6775674B1 (en) | Auto completion of relationships between objects in a data model | |
US6647381B1 (en) | Method of defining and utilizing logical domains to partition and to reorganize physical domains | |
US20090006358A1 (en) | Search results | |
US20040162820A1 (en) | Search cart for search results | |
US20120221568A1 (en) | Variable Personalization of Search Results in a Search Engine | |
JP2001522097A (en) | Retrieval of information from hierarchically structured documents | |
WO2001016807A1 (en) | An internet search system for tracking and ranking selected records from a previous search | |
US20070067268A1 (en) | Navigation of structured data | |
WO2008157088A1 (en) | Method for enhancing search results | |
US6301583B1 (en) | Method and apparatus for generating data files for an applet-based content menu using an open hierarchical data structure | |
JP5320835B2 (en) | Search result display method, program for realizing search result display function, and search result display system | |
WO2001046828A2 (en) | A navigation engine for assessing the quality of a trail between pages in a network | |
US7490082B2 (en) | System and method for searching internet domains | |
WO2000007133A1 (en) | Method and system for applying user specified hyperlinks | |
Levene et al. | Navigating the world wide web |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
| | AK | Designated states | Kind code of ref document: A3. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A3. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
| | WWE | WIPO information: entry into national phase | Ref document number: 2000981479. Country of ref document: EP |
| | REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
| | WWW | WIPO information: withdrawn in national office | Ref document number: 2000981479. Country of ref document: EP |
| | NENP | Non-entry into the national phase in: | Ref country code: JP |