WO2005022401A1 - Method, device and software for querying and presenting search results - Google Patents

Method, device and software for querying and presenting search results

Info

Publication number
WO2005022401A1
Authority
WO
WIPO (PCT)
Prior art keywords
index
keyword
query
search results
match
Prior art date
Application number
PCT/CA2003/001283
Other languages
French (fr)
Inventor
Tym Feindel
David Gosse
Original Assignee
Vortaloptics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vortaloptics, Inc.
Priority to PCT/CA2003/001283 (WO2005022401A1)
Priority to AU2003258430A (AU2003258430B2)
Priority to CA2537269A (CA2537269C)
Publication of WO2005022401A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results

Definitions

  • the present invention relates to search engines and indexes, and more particularly to a method, device and software for querying and presenting search results obtained from a plurality of indexes.
  • HTTP Hyper Text Transfer Protocol
  • HTML Hyper Text Markup Language
  • HTML may be used to access files provided in many different formats.
  • these web pages are accessible using an addressing scheme allowing pages on the web to be accessed by a Uniform Resource Locator ("URL").
  • URL Uniform Resource Locator
  • While certain search engines increasingly attempt to broadly index significant portions of the entire web, other search engines focus on a more specific target and are designed, for example, to exhaustively index a particular web site of an institution. Such a search of a specific target is commonly referred to as a "vertical" search. As a single web site may include hundreds or thousands of web pages, such a "vertical" search engine may be very useful.
  • a search engine indexes web pages by keywords. URLs are indexed against keywords contained in the associated web page. End-users can thus search web pages using the keywords. If there are one or more index entries that match the keyword(s), records corresponding to those index entries may be retrieved, and relevant fields of those records may be displayed to the end-user as matching results.
  • an institution may sometimes offer end-users the capability to search a public index as well.
  • the institution may be taking a risk that some of the search results may not be appropriate for presentation.
  • a query made by the end-user that results in a match for a website operated by a main competitor of the institution.
  • the institution may wish to avoid presenting such search results to the end-user.
  • the search results are combined from results from a first index and results from a second index.
  • the first index comprises a plurality of index entries modifiable by an administrator
  • the second index comprises a plurality of index entries that are not modifiable by the administrator.
  • any search result from the second index for which an associated key field is identical to the associated key field of a matching search result in the first set of search results is discarded in favor of the matching search result in the first set of search results.
  • a method of presenting search results in response to an end-user query, the search results being combined from results from a first index and results from a second index, the first index comprising a plurality of index entries modifiable by an administrator, the second index comprising a plurality of index entries that are not modifiable by the administrator, the index entries of the first index and the second index each having an associated key field, the method comprising:
  • each of the index entries comprises at least one keyword
  • the querying index entries of the first index comprises matching keywords in a query to the at least one keyword for each of the index entries in the first index.
  • each of the at least one keyword is associated with a weight
  • the quality of match is calculated by summing weights for each of the at least one keyword that matches a keyword in the query.
  • the associated key field of each search result obtained from the first index identifies a uniform resource locator (URL), and the associated key field of each search result obtained from the second index identifies a URL.
  • each of the index entries comprises at least one keyword
  • the querying index entries of the first index comprises matching keywords in a query to the at least one keyword for each of the index entries in the first index.
  • each of the at least one keyword is associated with a weight
  • the quality of match is calculated by summing weights for each of the at least one keyword that matches a keyword in the query.
  • the combining in (iii) comprises utilizing the quality of match to present an ordered listing of URLs identified in each of the search results in the list of matching search results.
  • the method further comprises excluding search results having a predetermined quality of match from being presented as part of the ordered listing.
  • the predetermined quality of match is a null value.
  • a computing device comprising a processor and computer readable memory, the memory storing a first index and a second index, the first index comprising a plurality of index entries modifiable by an administrator, the second index comprising a plurality of index entries that are not modifiable by the administrator, the index entries of the first index and the second index each having an associated key field, search engine software adapting the device to
  • each of the index entries comprises at least one keyword
  • the search engine software further adapts the device to query index entries of the first index by matching keywords in a query to the at least one keyword for each of the index entries in the first index.
  • each of the at least one keyword is associated with a weight
  • the search engine software further adapts the computing device to calculate the quality of match by summing weights for each of the at least one keyword that matches a keyword in the query.
  • the associated key field of each search result obtained from the first index identifies a uniform resource locator (URL), and the associated key field of each search result obtained from the second index identifies a URL.
  • each of the index entries comprises at least one keyword
  • the search engine software further adapts the device to query index entries of the first index to match keywords in a query to the at least one keyword for each of the index entries in the first index.
  • each of the at least one keyword is associated with a weight
  • the search engine software adapts the computing device to calculate the quality of match by summing weights for each of the at least one keyword that matches a keyword in the query.
  • the search engine software further adapts the computing device to combine the first and second sets of search results, using each of the qualities of match, to present an ordered listing of URLs identified in each of the search results in the list of matching search results.
  • the search engine software further adapts the computing device to exclude search results having a predetermined quality of match from being presented as part of the ordered listing.
  • the predetermined quality of match is a null value.
  • a computer readable medium storing computer executable instructions that when loaded at a computing device comprising a processor and processor readable memory storing a first index and a second index, the first index comprising a plurality of index entries modifiable by an administrator, the second index comprising a plurality of index entries that are not modifiable by the administrator, the index entries of the first index and the second index each having an associated key field, adapt the computing device to:
  • each of the index entries comprises at least one keyword
  • the computer executable instructions further adapt the computing device to query index entries of the first index to match keywords in a query to the at least one keyword for each of the index entries in the first index.
  • each of the at least one keyword is associated with a weight
  • the computer executable instructions further adapt the computing device to calculate the quality of match by summing weights for each of the at least one keyword that matches a keyword in the query.
  • the associated key field of each search result obtained from the first index identifies a uniform resource locator (URL), and the associated key field of each search result obtained from the second index identifies a URL.
  • each of the index entries comprises at least one keyword
  • the computer executable instructions further adapt the computing device to query index entries of the first index by matching keywords in a query to the at least one keyword for each of the index entries in the first index.
  • each of the at least one keyword is associated with a weight
  • the computer executable instructions further adapt the computing device to calculate the quality of match by summing weights for each of the at least one keyword that matches a keyword in the query.
  • the computer executable instructions further adapt the computing device to utilize the quality of match to present an ordered listing of URLs identified in each of the search results in the list of matching search results.
  • the computer executable instructions further adapt the computing device to exclude search results having a predetermined quality of match from presentation to the end-user.
  • the predetermined quality of match is a null value.
  • FIG. 1A is a simplified schematic diagram of an exemplary data communications network interconnected with an indexing server exemplary of an embodiment of the present invention, in communication with a plurality of computing devices;
  • FIG. 1B is a simplified schematic block diagram of a hardware architecture of the indexing server of FIG. 1A;
  • FIG. 2A is a logical block diagram of software and data components at the indexing server of FIGS. 1A and 1B;
  • FIG. 2B is a schematic block diagram of an exemplary database schema for an index illustrated in FIG. 2A;
  • FIG. 2C is an illustrative example of a keyword/weight database table corresponding to the database schema of FIG. 2B;
  • FIG. 2D is an illustrative example of a URL database table corresponding to the database schema of FIG. 2B;
  • FIG. 3A is a schematic flow chart of exemplary steps for associating keywords and assigning weightings to URLs in order to create records in the databases of FIGS. 2C and 2D;
  • FIG. 3B is a schematic flow chart of exemplary steps performed by the indexing server to query indexes in response to a query request;
  • FIG. 4A schematically illustrates search results obtained for an example query; and
  • FIG. 4B schematically illustrates search results obtained for another example query.
  • FIG. 1A illustrates an exemplary data communications network 100, interconnected with an indexing server 110 exemplary of an embodiment of the present invention, in communication with a plurality of computing devices 120a, 120b and 120c (individually and collectively devices 120).
  • Computing devices 120 and indexing server 110 are all conventional computing devices, each including a processor and computer readable memory storing an operating system and software applications and components for execution.
  • Data communications network 100 may, for example, be a conventional local area network that adheres to a suitable network protocol such as Ethernet, token ring or similar protocols. Alternatively, the network protocol may be compliant with higher level protocols such as the Internet protocol (IP), AppleTalk, or IPX protocols. Similarly, network 100 may be a wide area network, or the public internet.
  • Client computing devices 120 are network aware computing devices, providing an end-user interface that allows an end-user to view information stored at indexing server 110. Computing devices 120 may, for example, be conventional Windows based computing devices storing and executing an HTML compliant browser, such as Microsoft Internet Explorer, Netscape Navigator or a similar browser.
  • indexing server 110 stores web indexing information, and may store software allowing devices 120 to search the stored indexing information.
  • indexing server 110 is a conventional network capable server.
  • Indexing server 110 could, for example, be an Intel x86 based computer acting as a Microsoft Windows NT, Apple, or Unix based server, workstation, personal computer or the like.
  • Example indexing server 110 includes a processor 112, in communication with computer storage memory 114; network interface 116; input output interface 118; and video adapter 122.
  • indexing server 110 may optionally include a display 124 interconnected with adapter 122; input/output devices, such as a keyboard 126, disk drive 128, and a mouse 130 or the like.
  • Processor 112 is typically a conventional central processing unit, and may for example be a microprocessor in the INTEL x86 family. Of course, processor 112 could be any other suitable processor known to those skilled in the art.
  • Computer storage memory 114 includes a suitable combination of random access memory, read-only memory, and disk storage memory used by processor 112 to store and execute software programs adapting processor 112 to function in manners exemplary of the present invention.
  • Disk drive 128 is capable of reading and writing data to or from a computer readable medium 132 used to store software and data, exemplary of embodiments of the present invention, to be loaded into memory 114.
  • Computer readable medium 132 may be a CD-ROM, diskette, tape, ROM-Cartridge or the like.
  • Network interface 116 is any interface suitable to physically link server 110 to network 100. Interface 116 may, for example, be an Ethernet, ATM, ISDN interface or modem that may be used to pass data from and to network 100 or another suitable communications network.
  • FIG. 2A is a logical block diagram of software and data components at server 110.
  • indexing server 110 hosts two indexes, including a private index 212 and a public index 213, as well as search engine software 214, end-user interface 215, and an administrator interface 216.
  • an end-user at devices 120 may access search engine software 214 through network 100, to communicate with end-user interface 215.
  • the search engine software 214 may itself be embodied as one or more software modules stored in memory and executable on a processor in the indexing server 110.
  • Index 212 contains index entries to be searched by end-users.
  • An administrator of a particular institution may modify only records of index 212 associated with that institution.
  • End-users are provided with index entries corresponding to one institution.
  • end-users are provided with access based only on the web address used for end-user interface 215.
  • access to index 212 is provided by an institution through that institution's web site. End-users are classified as being associated with the institution whose web site they have accessed, and are only provided data from index entries controlled by that institution.
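  • As a minimal sketch of this classification step, the snippet below maps the host name of the requested web address to an institution code. The host names, the mapping table and the function name `institution_for` are hypothetical; the source only states that end-users are associated with the institution whose web site they accessed.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical mapping from the host serving end-user interface 215
# to the client code of the institution that controls the entries.
INSTITUTION_BY_HOST = {
    "search.client1.example": "CF",
    "search.client2.example": "XY",
}


def institution_for(request_url: str) -> Optional[str]:
    """Return the client code for the web address used, or None if unknown."""
    host = urlsplit(request_url).hostname
    return INSTITUTION_BY_HOST.get(host)


# An end-user arriving through Client 1's site only sees Client 1's entries.
client_code = institution_for("https://search.client1.example/?q=KW1")
```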
  • Search engine software 214 accesses two search algorithms 222 and 223, one associated with each index 212, 213, which may define how a search is to be performed on the associated index 212, 213.
  • These search algorithms 222, 223 may be "modular" in the sense that the search algorithms 222, 223 may be modified or replaced individually.
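  • One way to picture this modularity is a registry that holds one interchangeable function per index. The sketch below is an assumption for illustration only: the registry keys, the in-memory index shape (URL mapped to keyword/weight pairs) and both scoring functions are hypothetical and are not taken from the source.

```python
from typing import Callable, Dict, List

# Assumed in-memory shape of an index: URL -> {keyword: weight}.
Index = Dict[str, Dict[str, float]]
SearchAlgorithm = Callable[[Index, List[str]], Dict[str, float]]


def sum_of_weights(index: Index, query: List[str]) -> Dict[str, float]:
    """Score each matching entry by summing the weights of matched keywords."""
    results = {}
    for url, keywords in index.items():
        matched = [keywords[k] for k in query if k in keywords]
        if matched:
            results[url] = sum(matched)
    return results


def best_single_weight(index: Index, query: List[str]) -> Dict[str, float]:
    """Alternative scoring: keep only the highest matched weight."""
    results = {}
    for url, keywords in index.items():
        matched = [keywords[k] for k in query if k in keywords]
        if matched:
            results[url] = max(matched)
    return results


# One algorithm per index (cf. algorithms 222 and 223).
ALGORITHMS: Dict[str, SearchAlgorithm] = {
    "private_212": sum_of_weights,
    "public_213": sum_of_weights,
}


def run_query(index_name: str, index: Index, query: List[str]) -> Dict[str, float]:
    """Dispatch to whichever algorithm is currently registered for the index."""
    return ALGORITHMS[index_name](index, query)


# Replace the public algorithm without touching the private one.
ALGORITHMS["public_213"] = best_single_weight
```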
  • indexes 212 and 213 are stored as one or more relational databases.
  • FIG. 2B is a schematic block diagram of an exemplary database schema 230 for private index 212 of FIG. 2A.
  • index 212 indexes web pages to be searched. The web pages are indexed by their URL and associated key words that an end-user may use to locate the URL (and thus the web page).
  • each web page URL in the private index 212 is associated with at least one keyword, and each associated keyword is assigned a weighting.
  • Multiple records of table 240 define the multiple key words (KEYWORD_HASH) and weights (KEYWORD_WEIGHT) associated with a single URL (HASH_URL).
  • the UID in the KEYWORD/WEIGHT TABLE (232) is a physical primary key which serves to uniquely identify each record.
  • database storing index 212 may include a keyword/weight table 232 and a corresponding URL table 234.
  • the schema requires each institution to have its own pair of tables 232, 234. For example, an institution with a client code 'CF' would have tables 'CF_KEYWORD_WEIGHT_TABLE' and 'CF_URL_TABLE'.
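  • The naming convention in this example can be sketched as a small helper. The function name and the alphanumeric check are assumptions; only the two table-name patterns come from the example above.

```python
from typing import Tuple


def institution_tables(client_code: str) -> Tuple[str, str]:
    """Return (keyword/weight table name, URL table name) for a client code."""
    # Assumed safeguard: restrict codes to letters and digits so the result
    # is always usable as a table identifier.
    if not client_code.isalnum():
        raise ValueError("client code must be alphanumeric")
    return (
        f"{client_code}_KEYWORD_WEIGHT_TABLE",
        f"{client_code}_URL_TABLE",
    )
```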
  • FIG. 2C illustrates a keyword/weight database table 240 corresponding to schema 230 (FIG. 2B). More specifically, table 240 includes a plurality of records 240a - 240c, each containing a plurality of fields 232a - 232e for a particular institution. Using arbitrarily chosen values for illustration, records 240a - 240c contain, respectively: keyword hash values "72", "73", "74"; weightings 100, 70, 90; literal keywords "KW1", "KW2", "KW3"; and hash URL values "12", "12", "12". Thus, in this particular example, an index entry for a URL having a hash value "12" includes three keywords "KW1", "KW2", "KW3" having relative weightings of 100, 70 and 90.
  • FIG. 2D is an illustrative example of a database table 250 of index 212 corresponding to URL table 234 of FIG. 2B. More specifically, database table 250 includes a plurality of records 250a - 250c each having a plurality of fields 234a - 234e. Each record of table 250 provides detailed information about an indexed URL.
  • the records 250a - 250c of table 250 contain, respectively: hash URLs "12", "13", "14"; corresponding URL addresses www.1.com, www.2.com, www.3.com; titles "One", "Two", "Three"; descriptions "Home page for One", "Home page for Two", "Home page for Three"; and corresponding date/time stamps.
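  • The two example tables can be reproduced in an in-memory SQLite database to show how the hash URL links them. This is a sketch under assumptions: only UID, KEYWORD_HASH, KEYWORD_WEIGHT and HASH_URL are named in the text; the remaining column names (KEYWORD, URL, TITLE, DESCRIPTION, STAMP) and the use of SQLite are illustrative, and the date/time stamps are left NULL because the text gives no values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CF_KEYWORD_WEIGHT_TABLE (
        UID            INTEGER PRIMARY KEY,  -- physical primary key
        KEYWORD_HASH   TEXT,
        KEYWORD_WEIGHT INTEGER,
        KEYWORD        TEXT,                 -- assumed column name
        HASH_URL       TEXT
    );
    CREATE TABLE CF_URL_TABLE (
        HASH_URL    TEXT PRIMARY KEY,
        URL         TEXT,                    -- assumed column name
        TITLE       TEXT,                    -- assumed column name
        DESCRIPTION TEXT,                    -- assumed column name
        STAMP       TEXT                     -- assumed column name
    );
    """
)

# Records 240a - 240c: three keywords indexed against hash URL "12".
conn.executemany(
    "INSERT INTO CF_KEYWORD_WEIGHT_TABLE "
    "(KEYWORD_HASH, KEYWORD_WEIGHT, KEYWORD, HASH_URL) VALUES (?, ?, ?, ?)",
    [("72", 100, "KW1", "12"), ("73", 70, "KW2", "12"), ("74", 90, "KW3", "12")],
)

# Records 250a - 250c: one row per indexed URL.
conn.executemany(
    "INSERT INTO CF_URL_TABLE VALUES (?, ?, ?, ?, ?)",
    [
        ("12", "www.1.com", "One", "Home page for One", None),
        ("13", "www.2.com", "Two", "Home page for Two", None),
        ("14", "www.3.com", "Three", "Home page for Three", None),
    ],
)

# Join keyword rows to their URL through the shared HASH_URL value.
rows = conn.execute(
    """
    SELECT u.URL, k.KEYWORD, k.KEYWORD_WEIGHT
    FROM CF_KEYWORD_WEIGHT_TABLE AS k
    JOIN CF_URL_TABLE AS u ON u.HASH_URL = k.HASH_URL
    ORDER BY k.UID
    """
).fetchall()
```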
  • FIG. 3A is a flow chart showing exemplary steps S300A for associating keywords and assigning weightings to URLs in order to create records in tables 240 and 250 of private index 212 (FIG. 2A).
  • Steps S300A may be performed by server 110 under control of software exemplary of embodiments of the present invention.
  • a URL of a web page to be indexed is obtained from an administrator in step S302.
  • in step S304, the web page is obtained.
  • in step S306, the contents of the web page are parsed and analyzed in order to identify possible keywords that might be used to index the page. For example, keywords may be identified by their frequency in the web page, in meta-tags or in any other way understood by those of ordinary skill.
  • up to 20 of the most relevant keywords are each assigned a numerical weight, corresponding to their perceived relevance.
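  • A frequency-based version of this suggestion step might look like the sketch below. The tokenizer, the stop-word list and the weight formula (frequency scaled so the most frequent word gets 100) are assumptions; the source only says that keywords may be identified by frequency and that up to 20 are weighted by perceived relevance.

```python
import re
from collections import Counter
from typing import List, Tuple

# Small illustrative stop-word list (an assumption, not from the source).
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "for", "is"})


def suggest_keywords(page_text: str, limit: int = 20) -> List[Tuple[str, int]]:
    """Return up to `limit` (keyword, weight) pairs, most frequent first."""
    words = [
        word
        for word in re.findall(r"[a-z0-9]+", page_text.lower())
        if word not in STOP_WORDS
    ]
    counts = Counter(words)
    if not counts:
        return []
    top = counts.most_common(limit)
    max_count = top[0][1]
    # Scale so that the most frequent keyword receives a weight of 100.
    return [(word, round(100 * count / max_count)) for word, count in top]
```

  • The suggested list is only a starting point: in the steps that follow, the administrator may still alter the keywords and weightings before the entry is committed.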
  • the list of keywords and weights is presented by way of an administrator interface (e.g. administrator interface 216 of FIG. 2A) to an administrator in step S307.
  • the administrator may alter the presented keywords and/or weightings by way of the administrator interface 216, for reasons that will become apparent.
  • an administrator may commit the index entry, including the list of keywords and URL, for storage as records in table 240 and table 250 of index 212, in step S308.
  • Each keyword is used to populate one row of table 240.
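  • The one-row-per-keyword rule can be sketched as follows. The record type, the helper names and the choice of a truncated SHA-1 digest as the hash are assumptions made for illustration; the source does not say how the hash values are computed.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KeywordRow:
    """One record of the keyword/weight table (cf. table 240)."""
    keyword_hash: str
    keyword_weight: int
    keyword: str
    hash_url: str


def short_hash(text: str) -> str:
    """Hypothetical hash: first 8 hex digits of the SHA-1 digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def build_keyword_rows(url: str, keywords: Dict[str, int]) -> List[KeywordRow]:
    """Create one row per keyword, all sharing the same URL hash."""
    hash_url = short_hash(url)
    return [
        KeywordRow(short_hash(keyword), weight, keyword, hash_url)
        for keyword, weight in keywords.items()
    ]


committed = build_keyword_rows("www.1.com", {"KW1": 100, "KW2": 70, "KW3": 90})
```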
  • steps S300A allow an administrator acting for an institution of indexing server 110 to build a collection of indexed sites, each containing an index entry within private index 212.
  • the administrator can effectively shape obtained search results for any search performed by search algorithm 222.
  • public index 213 contains index information not assembled by an administrator of an institution, and may instead be made available by a third party index provider.
  • index 213 may contain index information found in the open directory database DMOZ - Open Directory Project available at the URL "http://www.dmoz.org".
  • the indexing information in public index 213 may be used by multiple institutions on indexing server 110.
  • index 213 is stored in a database having much the same format as the database storing index 212.
  • Index 213 may alternatively have a data structure entirely different from index 212. As index 213 is shared by multiple institutions on indexing server 110, an administrator of index 212 for a particular institution typically has no ability to alter entries of index 213.
  • an administrator for an institution may index a web site already indexed within public index 213, in private index 212.
  • indexing a site already indexed in public index 213 in private index 212 allows the administrator to control how, if at all, a site indexed in public index 213 is presented to end-users.
  • FIG. 3B shows exemplary steps S300B performed by indexing server 110 to query both private index 212 and public index 213, in response to a query request including one or more keywords input by an end-user.
  • steps S300B may be embodied in computer software, exemplary of embodiments of the present invention, including readable code written in a suitable computer language.
  • a query request including one or more keywords input at indexing server 110 by an end-user is received in step S314.
  • in step S316, the keyword supplied in step S314 is used to query both private index 212 and the public index 213 to retrieve matching records in indexes 212 and 213.
  • search algorithm 222 is used to query private index 212
  • search algorithm 223 is used to query public index 213.
  • Steps S300B receive and combine the matching records in step S318.
  • a quality of match indicator is calculated in step S319.
  • the quality of match indicator is calculated by summing the weighting (e.g. as contained in field 232c of table 240) of each keyword matching the search request. (As will become apparent, in an embodiment, a predetermined value for a quality of match calculated from summing the weight of keywords matching the search request may be used to determine how the corresponding record is dealt with.)
  • a quality of match indicator may similarly be calculated for matching entries of index 213.
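  • As a worked instance of the summation, the sketch below uses the three example keyword records of FIG. 2C (weights 100, 70 and 90 for hash URL "12"). The tuple layout and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (keyword, weight, hash URL) taken from the example records of table 240.
KEYWORD_ROWS: List[Tuple[str, int, str]] = [
    ("KW1", 100, "12"),
    ("KW2", 70, "12"),
    ("KW3", 90, "12"),
]


def quality_of_match(
    rows: Iterable[Tuple[str, int, str]], query_keywords: Iterable[str]
) -> Dict[str, int]:
    """Sum, per hash URL, the weights of keywords that appear in the query."""
    wanted = set(query_keywords)
    totals: Dict[str, int] = defaultdict(int)
    for keyword, weight, hash_url in rows:
        if keyword in wanted:
            totals[hash_url] += weight
    return dict(totals)


# A query for KW1 and KW3 matches two keywords of hash URL "12": 100 + 90.
score = quality_of_match(KEYWORD_ROWS, ["KW1", "KW3"])
```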
  • the two search algorithms 222 and 223 may individually calculate different quality of match indicators for matches of private index 212 and public index 213.
  • Index entries from the public index 213 and private index 212 may be combined in step S320.
  • the results may be combined in any number of ways. For example, index entries from private index 212 and public index 213 may be collectively ordered based on the quality of match calculated for each index entry. Index entries with higher quality of matches may be presented in advance of index entries having lower quality of matches. Alternatively, all matching entries from private index 212 may be presented in advance of entries from public index 213.
  • the index entry from the private index may pre-empt the index entry from public index 213. That is, instead of including both index entries from private index 212 and public index 213, only the index entry from private index 212 is possibly presented.
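  • A minimal sketch of one such combination follows: private matches first, each group ordered by descending quality of match, and any public match whose URL also appears among the private matches dropped. The function name and the dictionary shape (URL mapped to quality of match) are assumptions; the URL stands in for the key field.

```python
from typing import Dict, List


def combine(private_hits: Dict[str, float], public_hits: Dict[str, float]) -> List[str]:
    """Return URLs with private results first and pre-empted public results removed."""
    private_sorted = sorted(private_hits.items(), key=lambda hit: hit[1], reverse=True)
    # A public entry sharing its key (here the URL) with a private entry is discarded.
    remaining_public = (
        (url, quality)
        for url, quality in public_hits.items()
        if url not in private_hits
    )
    public_sorted = sorted(remaining_public, key=lambda hit: hit[1], reverse=True)
    return [url for url, _ in private_sorted] + [url for url, _ in public_sorted]


ordered = combine(
    {"URL_A": 80, "URL_B": 70},
    {"URL_B": 0.9, "URL_F": 0.5, "URL_G": 0.7},
)
```

  • Keeping the two groups separate avoids comparing weights on different scales, which matters when the private and public indexes use unrelated weighting ranges.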
  • The records combined at step S320 are ordered at step S322, and relevant fields are displayed to the end-user at step S324.
  • the URL field 234b of FIG. 2D may be displayed to the end-user in the order determined at step S322. Additional fields such as the title field 234c, the description field 234d, and the stamp field 234e may also be displayed. Steps S300B then end.
  • FIG. 4A schematically illustrates results of an example query performed at server 110. More specifically, example private index 412 (having the structure of private index 212) indexes URL listings in block 414a. Tables 240 and 250 are suitably populated.
  • Arbitrary example URLs in block 414a are labeled "URL_A", "URL_B", "URL_C", and "URL_D".
  • URL_E shown at block 414b will be explained in further detail below.
  • a list of associated keywords used to index the URL is depicted in block 424a.
  • Each of the keywords in block 424a is assigned a weight, shown schematically in parentheses.
  • the URLs in block 414a are controlled, for example, by an administrator for institution "Client 1".
  • keywords and weightings at 424a may be readily modified by the administrator for institution "Client 1".
  • the administrator may use exemplary method S300A of FIG. 3A to associate the keywords and assign the keyword weightings for the various URLs "URL_A" to "URL_D".
  • a first keyword "KW1" with a weighting of "80" and a second keyword "KW2" with a weighting of "100" are both associated with "URL_A".
  • the same first keyword "KW1" having a different weighting of "70" and the same second keyword "KW2" having a different weighting of "90" may both be associated with "URL_B".
  • the weighting range of 0 - 100 is arbitrarily chosen for illustration.
  • FIG. 4A further schematically illustrates entries in a public index 432, of the form of public index 213, representing a number of indexed URLs in block 434, namely, "URL_E" to "URL_H". For each URL in block 434, there are one or more associated keywords, with weightings shown in parentheses, as shown at block 442.
  • Example public index 432 is generated by a third party. It may for example be generated automatically by software that follows linked pages in order to generate an index. For each page, the software identifies a list of significant associated keywords. In addition to automatically generating the keywords, a weighting may be assigned to each keyword associated with a given web page. For example, the weighting may be derived from how frequently a given keyword appears in the web page, or whether the keyword appears in a special area of the web page, such as the title or description. The range of fixed weightings 0.00 - 0.99 shown here is arbitrarily chosen for the purposes of illustration.
  • institutional user "Client 1" has initially no effective ability to edit index entries in index 432 (i.e. "URL_E” to "URL_H” in block 434 or any of the keywords weightings shown in block 442).
  • relative weightings of a given keyword associated with a given URL in private index 412 may be readily changed by the administrator for institution "Client 1". For example, for "KW1" associated with record “URL_A”, the current weighting of "80” may be raised or lowered at will by assigning a new weighting.
  • the quality of match of "URL_A", when a query includes the keyword "KW1", may be directly controlled such that "URL_A" yields a higher or lower quality of match for keyword "KW1" and thus appears higher or lower in a list of search results.
  • any keyword may be associated with a given URL, even if that keyword is not automatically generated, and even if that keyword does not appear in the subject web page.
  • a keyword may be arbitrarily assigned to a URL for the purposes of causing that URL to appear or not appear in the search results when that keyword is used in a query. For example, if it is desirable to present "URL_D" whenever a keyword "KW9" (not shown) is entered in a query by an end-user, the keyword "KW9" is simply associated with "URL_D", and a suitable weighting may be assigned to "KW9" for "URL_D" in order to ensure that "URL_D" appears whenever the keyword "KW9" is used.
  • an administrator may shape the order of search results for any keyword simply by adjusting the relative weights of indexed URLs for that keyword.
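  • The effect of these adjustments can be sketched with the weights given for "URL_A" and "URL_B". The helper names `ranked` and `set_weight`, the new weighting of 95 and the weighting of 100 chosen for "KW9" are assumptions for illustration.

```python
from typing import Dict, List

# Weights for URL_A and URL_B as given in the example of FIG. 4A.
private_index: Dict[str, Dict[str, int]] = {
    "URL_A": {"KW1": 80, "KW2": 100},
    "URL_B": {"KW1": 70, "KW2": 90},
}


def ranked(index: Dict[str, Dict[str, int]], keyword: str) -> List[str]:
    """List URLs indexed under `keyword`, highest weight first."""
    hits = [(url, kws[keyword]) for url, kws in index.items() if keyword in kws]
    return [url for url, _ in sorted(hits, key=lambda hit: hit[1], reverse=True)]


def set_weight(index: Dict[str, Dict[str, int]], url: str, keyword: str, weight: int) -> None:
    """Associate `keyword` with `url`, or change its weighting if already present."""
    index.setdefault(url, {})[keyword] = weight


before = ranked(private_index, "KW1")          # URL_A ahead of URL_B
set_weight(private_index, "URL_B", "KW1", 95)  # hypothetical new weighting
after = ranked(private_index, "KW1")           # URL_B now ahead of URL_A

# Arbitrarily associate KW9 with URL_D so that URL_D appears for that keyword.
set_weight(private_index, "URL_D", "KW9", 100)
kw9_results = ranked(private_index, "KW9")
```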
  • one or more of the URLs in public index 432 may be selectively indexed by an administrator in private index 412. Specifically, in this illustrative example, "URL_E" has been indexed in private index 412.
  • the indexing of "URL_E" in private index 412 allows an administrator to affect presentation of "URL_E” in a search result.
  • the level of control over "URL_E” becomes the same as that over the other URLs in the block 414a.
  • keywords may be arbitrarily associated with "URL_E”
  • weightings may be arbitrarily assigned to those keywords by the administrator of the institution.
  • "URL_E" has been associated with keywords "KW1" and "KW2", with each of "KW1" and "KW2" being assigned a weighting of "0" or a "null" weighting.
  • a null weighting may be assigned if, for example, it is undesirable to include that record in combined search results when either of those two keywords "KW1" or "KW2" is entered.
  • "URL_E” may point to the web site of the main competitor of an institution.
  • block 450 depicts search results in response to a search for keyword KW1 combining URLs obtained from both private index 412 and public index 432 (as indicated at block 452).
  • the keyword "KW1" has been entered by an end-user, as indicated at block 454a.
  • results for any URLs in public index 432 matching the keyword "KW1" may be pre-empted by corresponding URLs in private index 412 (e.g. index results corresponding to "URL_E" in block 434 of public index 432 may be pre-empted by corresponding index entry "URL_E" in block 414b of private index 412).
  • "URL_E” with a "null” weighting is shown in boldface in block 460a.
  • a list of URLs from private index 412 matching "KW1" are ordered based on keyword weighting.
  • a list of URLs from public index 432 matching "KW1" then follows, again in order of keyword weighting.
  • URLs from private index 412 are presented in advance of URLs from public index 432. This reflects an institution wanting to give priority to index entries located in its own private index 412 (i.e. corresponding to index 212) ahead of index entries found in public index 432 (i.e. corresponding to index 213).
  • a predetermined value for a quality of match calculated from summing the weight of keywords matching a search request may cause a corresponding record to be dealt with in a particular manner.
  • a null weighting for the summed weight of keywords may be used to indicate that the associated URL (URL_E in the present example) should be excluded from presentation to the end-user.
  • the pre-emption or discarding of an index entry from public index 432 is triggered by a common value in a key field in both the private index and the public index.
  • the key field is linked to a URL field 234b (FIG. 2B) via a linking mechanism typically found in a relational database, such as by the HASH_URL fields 232e/234a of each of table 240 and table 250, as shown in the present illustration (FIGS. 2C and 2D).
  • the pre-emption or discarding is then triggered when the identical URL is retrieved from both public index 432 and private index 412.
  • another suitable field may be used.
  • FIG. 4B is a schematic block diagram of another illustrative example using an alternative query.
  • the indexed URLs of private index 412 and index URLs of public index 432 are the same, but as shown at block 454b, the search keyword has been changed to "KW2".
  • the combined list of ordered URLs is shown having a different constitution. For example, "URL_H" now appears in the list in block 460b.
  • "URL_D” is not included, as it is only associated with “KW1" and not "KW2". Again, URLs from private index 412 are displayed in advance of URLs from public index 432.
  • by associating any keyword or keywords with URLs in private index 412, and by assigning any selected weighting to the keywords in private index 412, substantially full control over presentation of these URLs in the combined search results 460a, 460b may be achieved.
  • selected URLs from public index 432 over which full control is desired may be indexed in private index 412, such that keywords may be associated, and keyword weightings may be assigned by the administrator for an institution. This level of control may allow selective presentation of search results such that undesirable URLs from public index 432 are excluded.
  • the administrator can also assign a suitably high weighting to keywords associated with "URL_E” so that "URL_E” is prominently displayed in the combined search results.
  • results obtained from private index 212/412 are used in place of results obtained from public index 213/432.
  • Results from private index 212/412 are treated in priority over like results from public index 213/432.
  • Embodiments of the invention could similarly include more than two indexes, each assigned a relative priority. In the event index entries sharing a like key field are retrieved in response to a search, results from the lower priority indexes are pre-empted by results from any higher priority index. Thus, only the matching result from the highest priority index would be included in any list of presented results.
  • each index may be searched by a search algorithm (like algorithm 222 or 223) associated with only that index. As indexes are added, modular search algorithms may be added to search engine 214.
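The extension to more than two indexes can be illustrated with a minimal sketch. It assumes each index has already been queried and has returned a mapping from key field value (here a URL label) to quality of match. The function name `merge_by_priority`, the "partner" index, and the sample values are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple

# Each index result maps a key field value (here, a URL label) to a quality of match.
IndexResults = Dict[str, float]


def merge_by_priority(results_by_priority: List[IndexResults]) -> List[Tuple[str, float, int]]:
    """Keep, for each key field value, only the result from the highest priority index.

    results_by_priority is ordered from highest priority (position 0) to lowest.
    Returns (key, quality, priority) tuples.
    """
    merged: Dict[str, Tuple[float, int]] = {}
    for priority, results in enumerate(results_by_priority):
        for key, quality in results.items():
            # setdefault leaves an entry from a higher priority index untouched,
            # so a lower priority result with the same key is pre-empted.
            merged.setdefault(key, (quality, priority))
    # Present higher priority indexes first, then higher quality of match.
    return sorted(
        ((key, quality, priority) for key, (quality, priority) in merged.items()),
        key=lambda item: (item[2], -item[1]),
    )


# Hypothetical results from three indexes, highest priority first.
private = {"URL_A": 80, "URL_E": 10}
partner = {"URL_E": 50, "URL_F": 40}
public = {"URL_E": 90, "URL_F": 70, "URL_G": 60}
presented = merge_by_priority([private, partner, public])
```

In this sketch "URL_E" is reported only once, with the quality of match from the highest priority index that returned it.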

Abstract

A method, device, and software for presenting search results obtained from a plurality of databases, based on an end-user specified query, is disclosed. In an embodiment, the search results are combined from results from a first index and results from a second index. The first index comprises a plurality of index entries modifiable by an administrator, and the second index comprises a plurality of index entries that are not modifiable by the administrator. In the combined search results, any search result from the second index for which an associated key field is identical to the associated key field of a matching search result in the first set of search results is discarded in favor of the matching search result in the first set of search results.

Description

METHOD, DEVICE AND SOFTWARE FOR QUERYING AND PRESENTING SEARCH RESULTS
FIELD OF THE INVENTION
[0001] The present invention relates to search engines and indexes, and more particularly to a method, device and software for querying and presenting search results obtained from a plurality of indexes.
BACKGROUND OF THE INVENTION
[0002] The rapid growth of the Internet and the World Wide Web ("web") has resulted in a proliferation of web search engines for indexing some of the billions of web pages available. As is well known, the web is a hypertext information and communication system which operates according to a client-server model, known as Hyper Text Transfer Protocol ("HTTP"). HTTP allows users to access these web pages using a standard page description language known as Hyper Text Markup Language ("HTML"). HTML may be used to access files provided in many different formats. Typically, these web pages are accessible using an addressing scheme allowing pages on the web to be accessed by a Uniform Resource Locator ("URL").
[0003] By specifying a URL, an end-user is able to access virtually any accessible web page from a web server connected to the web. However, without knowledge of a URL, an end-user must typically rely on a web search engine that can search a web index or directory to locate URLs for relevant web sites.
[0004] While certain search engines ambitiously attempt to broadly index significant portions of the entire web, other search engines focus on a more specific target and are designed, for example, to exhaustively index a particular web site of an institution. Such a search of a specific target is commonly referred to as a "vertical" search. As a single web site may include hundreds or thousands of web pages, such a "vertical" search engine may be very useful.
[0005] Typically, a search engine indexes web pages by keywords. URLs are indexed against keywords contained in the associated web page. End-users can thus search web pages using the keywords. If there are one or more index entries that match the keyword(s), records corresponding to those index entries may be retrieved, and relevant fields of those records may be displayed to the end-user as matching results.
[0006] From an end-user's perspective, when visiting the web site of an institution, it is often desirable to have a flexible and robust keyword query capability. However, if a search engine is being used to search a web index for a relatively small web site, the likelihood of receiving a match may be low. Correspondingly, end-user satisfaction with the web site may be negatively affected.
[0007] In order to provide the end-user with a better chance of obtaining relevant search results, an institution may sometimes offer end-users the capability to search a public index as well. However, in offering access to a public index, the institution may be taking a risk that some of the search results may not be appropriate for presentation. As a specific example, consider a query made by the end-user that results in a match for a website operated by a main competitor of the institution. The institution may wish to avoid presenting such search results to the end-user.
[0008] However, previous efforts to query and selectively present search results from private and public indexes have been hampered by the lack of effective control over the public indexes.
[0009] A more flexible and effective approach is desirable.
SUMMARY OF THE INVENTION
[0010] In accordance with the invention, there is provided a method, device, and software for presenting search results obtained from a plurality of indexes, based on an end-user specified query. In an embodiment, the search results are combined from results from a first index and results from a second index. The first index comprises a plurality of index entries modifiable by an administrator, and the second index comprises a plurality of index entries that are not modifiable by the administrator. In the combined search results, any search result from the second index for which an associated key field is identical to the associated key field of a matching search result in the first set of search results is discarded in favor of the matching search result in the first set of search results.
[0011] In an aspect of the invention, there is provided a method of presenting search results in response to an end-user query, the search results being combined from results from a first index and results from a second index, the first index comprising a plurality of index entries modifiable by an administrator, the second index comprising a plurality of index entries that are not modifiable by the administrator, the index entries of the first index and the second index each having an associated key field, the method comprising:
(i) querying index entries of the first index using the query to extract a set of first search results, each including a value of the associated key field, each of the first search results associated with a quality of match;
(ii) querying index entries of the second index using the query to extract a set of second search results, each including a value of the associated key field, each of the second search results associated with a quality of match;
(iii) combining the first and second set of search results to generate a list of matching search results, in which any search result from the second index for which an associated key field is identical to the associated key field of a matching search result in the first set of search results is discarded, in favor of the matching search result in the first set of search results.
[0012] In an embodiment, each of the index entries comprises at least one keyword, and the querying index entries of the first index comprises matching keywords in a query to the at least one keyword for each of the index entries in the first index.
[0013] In an embodiment, each of the at least one keyword is associated with a weight, and the quality of match is calculated by summing weights for each of the one keyword that matches a keyword in the query.
[0014] In an embodiment, the associated key field of each search result obtained from the first index identifies a uniform resource locator (URL), and the associated key field of each search result obtained from the second index identifies a URL.
[0015] In an embodiment, each of the index entries comprises at least one keyword, and the querying index entries of the first index comprises matching keywords in a query to the at least one keyword for each of the index entries in the first index.
[0016] In an embodiment, each of the at least one keyword is associated with a weight, and the quality of match is calculated by summing weights for each of the one keyword that matches a keyword in the query.
[0017] In an embodiment, the combining in (iii) comprises utilizing the quality of match to present an ordered listing of URLs identified in each of the search results in the list of matching search results.
[0018] In an embodiment, the method further comprises excluding search results having a predetermined quality of match from being presented as part of the ordered listing.
[0019] In an embodiment, the predetermined quality of match is a null value.
[0020] In a second aspect of the invention, there is provided a computing device comprising a processor and computer readable memory, the memory storing a first index and a second index, the first index comprising a plurality of index entries modifiable by an administrator, the second index comprising a plurality of index entries that are not modifiable by the administrator, the index entries of the first index and the second index each having an associated key field, search engine software adapting the device to
(i) query index entries of the first index using a query to extract a set of first search results, each including a value of the associated key field, each of the first search results associated with a quality of match;
(ii) query index entries of the second index using the query to extract a set of second search results, each including a value of the associated key field, each of the second search results associated with a quality of match;
(iii) combine the first and second set of search results to generate a list of matching search results, in which any search result from the second index for which an associated key field is identical to the associated key field of a matching search result in the first set of search results is discarded, in favor of the matching search result in the first set of search results.
[0021] In an embodiment, each of the index entries comprises at least one keyword, and the search engine software further adapts the device to query index entries of the first index by matching keywords in a query to the at least one keyword for each of the index entries in the first index.
[0022] In an embodiment, each of the at least one keyword is associated with a weight, and the search engine software further adapts the computing device to calculate the quality of match by summing weights for each of the one keyword that matches a keyword in the query.
[0023] In an embodiment, the associated key field of each search result obtained from the first index identifies a uniform resource locator (URL), and the associated key field of each search result obtained from the second index identifies a URL.
[0024] In an embodiment, each of the index entries comprises at least one keyword, and the search engine software further adapts the device to query index entries of the first index to match keywords in a query to the at least one keyword for each of the index entries in the first index.
[0025] In an embodiment, each of the at least one keyword is associated with a weight, and the search engine software adapts the computing device to calculate the quality of match by summing weights for each of the one keyword that matches a keyword in the query.
[0026] In an embodiment, the search engine software further adapts the computing device to combine the first and second set of search results using each of the qualities of match to present an ordered listing of URLs identified in each of the search results in the list of matching search results.
[0027] In an embodiment, the search engine software further adapts the computing device to exclude search results having a predetermined quality of match from being presented as part of the ordered listing.
[0028] In an embodiment, the predetermined quality of match is a null value.
[0029] In another aspect of the invention, there is provided a computer readable medium, storing computer executable instructions that when loaded at a computing device comprising a processor and processor readable memory storing a first index and a second index, the first index comprising a plurality of index entries modifiable by an administrator, the second index comprising a plurality of index entries that are not modifiable by the administrator, the index entries of the first index and the second index each having an associated key field, adapt the computing device to:
(i) query index entries of the first index using a query to extract a set of first search results, each including a value of the associated key field, each of the first search results associated with a quality of match;
(ii) query index entries of the second index using the query to extract a set of second search results, each including a value of the associated key field, each of the second search results associated with a quality of match;
(iii) combine the first and second set of search results to generate a list of matching search results, in which any search result from the second index for which an associated key field is identical to the associated key field of a matching search result in the first set of search results is discarded, in favor of the matching search result in the first set of search results.
[0030] In an embodiment, each of the index entries comprises at least one keyword, and the computer executable instructions further adapt the computing device to query index entries of the first index to match keywords in a query to the at least one keyword for each of the index entries in the first index.
[0031] In an embodiment, each of the at least one keyword is associated with a weight, and the computer executable instructions further adapt the computing device to calculate the quality of match by summing weights for each of the one keyword that matches a keyword in the query.
[0032] In an embodiment, the associated key field of each search result obtained from the first index identifies a uniform resource locator (URL), and the associated key field of each search result obtained from the second index identifies a URL.
[0033] In an embodiment, each of the index entries comprises at least one keyword, and the computer executable instructions further adapt the computing device to query index entries of the first index by matching keywords in a query to the at least one keyword for each of the index entries in the first index.
[0034] In an embodiment, each of the at least one keyword is associated with a weight, and the computer executable instructions further adapt the computing device to calculate the quality of match by summing weights for each of the one keyword that matches a keyword in the query.
[0035] In an embodiment, the computer executable instructions further adapt the computing device to utilize the quality of match to present an ordered listing of URLs identified in each of the search results in the list of matching search results.
[0036] In an embodiment, the computer executable instructions further adapt the computing device to exclude search results having a predetermined quality of match from presentation to the end-user.
[0037] In an embodiment, the predetermined quality of match is a null value.
[0038] Other aspects and features of the present invention will become apparent to those of ordinary skill in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0039] In the figures which illustrate exemplary embodiments of the invention:
[0040] FIG. 1A is a simplified schematic diagram of an exemplary data communications network interconnected with an indexing server exemplary of an embodiment of the present invention, in communication with a plurality of computing devices;
[0041] FIG. 1B is a simplified schematic block diagram of a hardware architecture of the indexing server of FIG. 1A;
[0042] FIG. 2A is a logical block diagram of software and data components at the indexing server of FIGS. 1A and 1B;
[0043] FIG. 2B is a schematic block diagram of an exemplary database schema for an index illustrated in FIG. 2A;
[0044] FIG. 2C is an illustrative example of a keyword/weight database table corresponding to the database schema of FIG. 2B;
[0045] FIG. 2D is an illustrative example of a URL database table corresponding to the database schema of FIG. 2B;
[0046] FIG. 3A is a schematic flow chart of exemplary steps for associating keywords and assigning weightings to URLs in order to create records in the databases of FIGS. 2C and 2D;
[0047] FIG. 3B is a schematic flow chart of exemplary steps performed by the indexing server to query indexes in response to a query request;
[0048] FIG. 4A schematically illustrates search results obtained for an example query; and
[0049] FIG. 4B schematically illustrates search results obtained for another example query.
DETAILED DESCRIPTION
[0050] FIG. 1A illustrates an exemplary data communications network 100, interconnected with an indexing server 110 exemplary of an embodiment of the present invention, in communication with a plurality of computing devices 120a, 120b and 120c (individually and collectively devices 120).
[0051] Computing devices 120 and indexing server 110 are all conventional computing devices, each including a processor and computer readable memory storing an operating system and software applications and components for execution.
[0052] Data communications network 100 may, for example, be a conventional local area network that adheres to a suitable network protocol such as Ethernet, token ring or similar protocols. Alternatively, the network protocol may be compliant with higher level protocols such as the Internet protocol (IP), Appletalk, or IPX protocols. Similarly, network 100 may be a wide area network, or the public internet.
[0053] Client computing devices 120 are network aware computing devices, providing an end-user interface that allows an end-user to view information stored at indexing server 110. Computing devices 120 may, for example, be conventional Windows based computing devices storing and executing an HTML compliant browser, such as Microsoft Internet Explorer, Netscape Navigator or a similar browser.
[0054] As will become apparent, indexing server 110 stores web indexing information, and may store software allowing devices 120 to search the stored indexing information.
[0055] A simplified preferred hardware architecture of an example indexing server 110 is schematically illustrated in FIG. 1B. In the illustrated embodiment, indexing server 110 is a conventional network capable server. Indexing server 110 could, for example, be an Intel x86 based computer acting as a Microsoft Windows NT, Apple, or Unix based server, workstation, personal computer or the like. Example indexing server 110 includes a processor 112, in communication with computer storage memory 114; network interface 116; input/output interface 118; and video adapter 122. As well, indexing server 110 may optionally include a display 124 interconnected with adapter 122; input/output devices, such as a keyboard 126, disk drive 128, and a mouse 130 or the like. Processor 112 is typically a conventional central processing unit, and may for example be a microprocessor in the INTEL x86 family. Of course, processor 112 could be any other suitable processor known to those skilled in the art. Computer storage memory 114 includes a suitable combination of random access memory, read-only memory, and disk storage memory used by processor 112 to store and execute software programs adapting processor 112 to function in manners exemplary of the present invention. Disk drive 128 is capable of reading and writing data to or from a computer readable medium 132 used to store software and data, exemplary of embodiments of the present invention, to be loaded into memory 114. Computer readable medium 132 may be a CD-ROM, diskette, tape, ROM-Cartridge or the like. Network interface 116 is any interface suitable to physically link server 110 to network 100. Interface 116 may, for example, be an Ethernet, ATM, ISDN interface or modem that may be used to pass data from and to network 100 or another suitable communications network.
[0056] The hardware architectures of computing devices 120 are materially similar to that of indexing server 110, and will therefore not be further detailed.
[0057] FIG. 2A is a logical block diagram of software and data components at server 110. As illustrated, indexing server 110 hosts two indexes, including a private index 212 and a public index 213, as well as search engine software 214, end-user interface 215, and an administrator interface 216.
[0058] As will become apparent, an end-user at devices 120 may access search engine software 214 through network 100, to communicate with end-user interface 215. End-user interface 215 may, for example, accept search requests provided as "name = value" pairs embedded within an HTTP GET/POST request. Administrators, acting on behalf of a particular institution may modify private index 212, by way of administrator interface 216, as detailed below. (As will be appreciated, the search engine software 214 may itself be embodied as one or more software modules stored in memory and executable on a processor in the indexing server 110.)
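As an illustration of how search keywords might be recovered from "name = value" pairs in an HTTP GET request, the following minimal sketch uses the Python standard library. The parameter name `q`, the helper name `keywords_from_request`, and the example host are assumptions made for illustration; the disclosure does not specify them.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit


def keywords_from_request(url: str, param: str = "q") -> List[str]:
    """Extract search keywords from the query string of a GET request URL.

    The parameter name "q" is a hypothetical choice; any name could carry
    the keywords in a "name = value" pair.
    """
    query_string = urlsplit(url).query
    # parse_qs decodes percent-escapes and "+" and returns a list per name.
    values = parse_qs(query_string).get(param, [])
    keywords: List[str] = []
    for value in values:
        keywords.extend(value.split())
    return keywords


# Hypothetical request carrying two keywords and a client code.
request_url = "http://www.example.com/search?q=KW1+KW2&client=CF"
keywords = keywords_from_request(request_url)
```

The same parsing applies to the body of a form-encoded POST request, which uses the same "name = value" encoding.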
[0059] Index 212 contains index entries to be searched by end-users. An administrator of a particular institution may modify only records of index 212 associated with that institution. End-users, in turn, are provided with index entries corresponding to one institution. Typically, end-users are provided access based only on the web address used for end-user interface 215. Ideally, access to index 212 is provided by an institution through that institution's web site. End-users are classified as being associated with the institution whose web site they have accessed, and are only provided data from index entries controlled by that institution.
[0060] Search engine software 214 accesses two search algorithms 222 and 223, one associated with each index 212, 213, which may define how a search is to be performed on the associated index 212, 213. These search algorithms 222, 223 may be "modular" in the sense that the search algorithms 222, 223 may be modified or replaced individually.
[0061] In the disclosed embodiments, indexes 212 and 213 are stored as one or more relational databases. FIG. 2B is a schematic block diagram of an exemplary database schema 230 for private index 212 of FIG. 2A. As noted, index 212 indexes web pages to be searched. The web pages are indexed by their URL and associated keywords that an end-user may use to locate the URL (and thus the web page). In the disclosed embodiment, each web page URL in the private index 212 is associated with at least one keyword, and each associated keyword is assigned a weighting. Multiple records of table 240 define the multiple keywords (KEYWORD_HASH) and weights (KEYWORD_WEIGHT) associated with a single URL (HASH_URL). The UID in the KEYWORD/WEIGHT TABLE (232) is a physical primary key which serves to uniquely identify each record.
[0062] As shown by schema 230, database storing index 212 may include a keyword/weight table 232 and a corresponding URL table 234. In an embodiment, the schema requires each institution to have its own pair of tables 232, 234. For example, an institution with a client code 'CF' would have tables 'CF_KEYWORD_WEIGHT_TABLE' and 'CF_URL_TABLE'.
[0063] FIG. 2C illustrates a keyword/weight database table 240 corresponding to schema 230 (FIG. 2B). More specifically, table 240 includes a plurality of records 240a - 240c, each containing a plurality of fields 232a - 232e for a particular institution. Using arbitrarily chosen values for illustration, records 240a - 240c contain, respectively: keyword hash values "72", "73", "74"; weightings 100, 70, 90; literal keywords "KW1", "KW2", "KW3"; and hash URL values "12", "12", "12". Thus, in this particular example, an index entry for a URL having a hash value "12" includes three keywords "KW1", "KW2", "KW3" having relative weightings of 100, 70 and 90.
[0064] FIG. 2D is an illustrative example of a database table 250 of index 212 corresponding to URL table 234 of FIG. 2B. More specifically, database table 250 includes a plurality of records 250a - 250c each having a plurality of fields 234a - 234e. Each record of table 250 provides detailed information about an indexed URL. Using arbitrarily chosen values for illustration, the records 250a - 250c of database 250 contain, respectively: hash URLs "12", "13", "14"; corresponding URL addresses www.1.com, www.2.com, www.3.com; titles "One", "Two", "Three"; descriptions "Home page for One", "Home page for Two", "Home page for Three"; and corresponding date/time stamps.
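The two tables and their linking field can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the column names KEYWORD, URL, TITLE, DESCRIPTION and STAMP, the column types, and the helper `lookup` are hypothetical, since the disclosure names only UID, KEYWORD_HASH, KEYWORD_WEIGHT and HASH_URL. The date/time stamps are left empty because no values are given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CF_URL_TABLE (
    HASH_URL    TEXT PRIMARY KEY,   -- key field linking the two tables
    URL         TEXT NOT NULL,
    TITLE       TEXT,
    DESCRIPTION TEXT,
    STAMP       TEXT                -- date/time stamp, unspecified here
);
CREATE TABLE CF_KEYWORD_WEIGHT_TABLE (
    UID            INTEGER PRIMARY KEY,  -- physical primary key
    KEYWORD_HASH   TEXT NOT NULL,
    KEYWORD_WEIGHT INTEGER,              -- may be NULL for a null weighting
    KEYWORD        TEXT NOT NULL,
    HASH_URL       TEXT NOT NULL REFERENCES CF_URL_TABLE(HASH_URL)
);
""")

# Illustrative records corresponding to tables 250 and 240.
conn.executemany(
    "INSERT INTO CF_URL_TABLE VALUES (?, ?, ?, ?, ?)",
    [
        ("12", "www.1.com", "One", "Home page for One", None),
        ("13", "www.2.com", "Two", "Home page for Two", None),
        ("14", "www.3.com", "Three", "Home page for Three", None),
    ],
)
conn.executemany(
    "INSERT INTO CF_KEYWORD_WEIGHT_TABLE "
    "(KEYWORD_HASH, KEYWORD_WEIGHT, KEYWORD, HASH_URL) VALUES (?, ?, ?, ?)",
    [("72", 100, "KW1", "12"), ("73", 70, "KW2", "12"), ("74", 90, "KW3", "12")],
)


def lookup(connection: sqlite3.Connection, keyword: str):
    """Return (URL, TITLE, KEYWORD_WEIGHT) rows for URLs indexed under a keyword."""
    return connection.execute(
        "SELECT u.URL, u.TITLE, k.KEYWORD_WEIGHT "
        "FROM CF_KEYWORD_WEIGHT_TABLE AS k "
        "JOIN CF_URL_TABLE AS u ON u.HASH_URL = k.HASH_URL "
        "WHERE k.KEYWORD = ?",
        (keyword,),
    ).fetchall()
```

Joining on HASH_URL is what allows one URL record to carry several keyword/weight records.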
[0065] FIG. 3A is a flow chart showing exemplary steps S300A for associating keywords and assigning weightings to URLs in order to create records in tables 240 and 250 of private index 212 (FIG. 2A). Steps S300A may be performed by server 110 under control of software exemplary of embodiments of the present invention. As illustrated, a URL of a web page to be indexed is obtained from an administrator in step S302. In step S304 the web page is obtained. In step S306, the contents of the web page are parsed and analyzed in order to identify possible keywords that might be used to index the page. For example, keywords may be identified by their frequency in the web page, in meta-tags or in any other way understood by those of ordinary skill. In an embodiment, up to 20 of the most relevant keywords (as identified in step S306) are each assigned a numerical weight, corresponding to their perceived relevance. The list of keywords and weights is presented by way of an administrator interface (e.g. administrator interface 216 of FIG. 2A) to an administrator in step S307. Optionally, the administrator may alter the presented keywords and/or weightings by way of the administrator interface 216, for reasons that will become apparent. Once edited, an administrator may commit the index entry, including the list of keywords and URL, for storage as records in table 240 and table 250 of index 212, in step S308. Each keyword is used to populate one row of table 240.
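One possible way to propose keywords and weights in step S306 is sketched below. It is a simplified assumption, not the disclosed algorithm: keywords are ranked purely by frequency in plain text (HTML parsing, meta-tags and stop-word filtering are omitted), and weights are scaled to the range 0 - 100 relative to the most frequent word. The function name `propose_keywords` is hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

MAX_KEYWORDS = 20  # the embodiment assigns weights to up to 20 keywords


def propose_keywords(page_text: str, max_keywords: int = MAX_KEYWORDS) -> List[Tuple[str, int]]:
    """Propose (keyword, weight) pairs for review by an administrator.

    Weight scaling relative to the most frequent word is an assumed scheme.
    """
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    counts = Counter(words)
    if not counts:
        return []
    top = counts.most_common(max_keywords)
    highest = top[0][1]  # frequency of the most common word
    return [(word, round(100 * count / highest)) for word, count in top]


# Hypothetical page text; the administrator may still edit the proposals.
proposals = propose_keywords("loans loans loans mortgage mortgage rates")
```

The proposals would then be shown through the administrator interface, where keywords and weightings can be altered before the index entry is committed.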
[0066] Repeated use of steps S300A allows an administrator acting for an institution of indexing server 110 to build a collection of indexed sites, each containing an index entry within private index 212. As will become apparent, by assigning desired keywords and weighting to indexed URLs, the administrator can effectively shape obtained search results for any search performed by search algorithm 222.
[0067] By contrast, public index 213 contains index information not assembled by an administrator of an institution, and may instead be made available by a third party index provider. For example, index 213 may contain index information found in the open directory database DMOZ - Open Directory Project available at the URL "http://www.dmoz.org". Advantageously, the indexing information in public index 213 may be used by multiple institutions on indexing server 110. In the disclosed embodiment, index 213 is stored in a database having much the same format as the database storing index 212.
[0068] Index 213 may alternatively have a data structure entirely different from index 212. As index 213 is shared by multiple institutions on indexing server 110, an administrator of index 212 for a particular institution typically has no ability to alter entries of index 213.
[0069] In manners exemplary of embodiments of the present invention, an administrator for an institution may index a web site already indexed within public index 213, in private index 212. As will become apparent, indexing a site already indexed in public index 213 in private index 212 allows the administrator to control how, if at all, a site indexed in public index 213 is presented to end-users.
[0070] FIG. 3B shows exemplary steps S300B performed by indexing server 110 to query both private index 212 and public index 213, in response to a query request including one or more keywords input by an end-user. Again, it will be appreciated by those skilled in the art that steps S300B may be embodied in computer software, exemplary of embodiments of the present invention, including readable code written in a suitable computer language.
[0071] As illustrated, a query request including one or more keywords input at indexing server 110 by an end-user is received in step S314. In step S316 the keyword supplied in step S314 is used to query both private index 212 and the public index 213 to retrieve matching records in indexes 212 and 213. In the disclosed embodiment search algorithm 222 is used to query private index 212, and search algorithm 223 is used to query public index 213. Steps S300B receive and combine the matching records in step S318. For each matching site in private index 212, a quality of match indicator is calculated in step S319. In the preferred embodiment, the quality of match indicator is calculated by summing the weighting (e.g. as contained in field 232c of table 240) of each keyword matching the search request. (As will become apparent, in an embodiment, a predetermined value for a quality of match calculated from summing the weight of keywords matching the search request may be used to determine how the corresponding record is dealt with.)
[0072] A quality of match indicator may similarly be calculated for matching entries of index 213. Conveniently the two search algorithms 222 and 223 (FIG. 2A) may individually calculate different quality of match indicators for matches of private index 212 and public index 213.
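The summation of keyword weights can be sketched as follows, using the illustrative values of table 240 (keywords "KW1", "KW2", "KW3" weighted 100, 70 and 90 for the URL with hash value "12"). The function name `quality_of_match` and the dictionary representation of an index entry are assumptions for illustration.

```python
from typing import Dict, Iterable


def quality_of_match(entry_weights: Dict[str, int], query_keywords: Iterable[str]) -> int:
    """Sum the weights of the entry's keywords that also appear in the query."""
    wanted = set(query_keywords)
    return sum(weight for keyword, weight in entry_weights.items() if keyword in wanted)


# Keywords and weightings for the URL with hash value "12" in table 240.
entry_12 = {"KW1": 100, "KW2": 70, "KW3": 90}

single = quality_of_match(entry_12, ["KW2"])          # one matching keyword
double = quality_of_match(entry_12, ["KW1", "KW3"])   # two matching keywords
```

A separate function of the same shape could be supplied for each index, in keeping with the modular search algorithms 222 and 223.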
[0073] Index entries from the public index 213 and private index 212 may be combined in step S320. The results may be combined in any number of ways. For example, index entries from private and public indexes 212, 213 may be collectively ordered based on the quality of match calculated for each index entry. Index entries with higher quality of matches may be presented in advance of index entries having lower quality of matches. Alternatively, all matching entries from private index 212 may be presented in advance of entries from public index 213.
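The two orderings described above can be compared in a short sketch. The `Result` record, the function names, and the sample quality of match values (in particular those for the public entries "URL_F" and "URL_G") are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    url: str
    quality: int
    source: str  # "private" or "public"


def order_collectively(results: List[Result]) -> List[Result]:
    """Order all entries together, highest quality of match first."""
    return sorted(results, key=lambda r: r.quality, reverse=True)


def order_private_first(results: List[Result]) -> List[Result]:
    """Present all private entries ahead of public ones, each by quality of match."""
    return sorted(results, key=lambda r: (r.source != "private", -r.quality))


results = [
    Result("URL_A", 80, "private"),
    Result("URL_B", 70, "private"),
    Result("URL_F", 90, "public"),
    Result("URL_G", 60, "public"),
]
collective = [r.url for r in order_collectively(results)]
private_first = [r.url for r in order_private_first(results)]
```

With these values the collective ordering places the public entry "URL_F" first, while the private-first ordering keeps both private entries at the top regardless of their quality of match.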
[0074] However, in the event a site is indexed in both private index 212 and public index 213, and index entries for the same site are retrieved from public index 213 and private index 212 in step S318, the index entry from private index 212 may pre-empt the index entry from public index 213. That is, instead of including both index entries from private index 212 and public index 213, only the index entry from private index 212 is eligible for presentation.
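The combining and pre-emption behaviour described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the disclosed implementation: the function name `combine_results` and the representation of a result as a `(url, quality)` tuple are hypothetical, the URL is assumed to serve as the shared key, and the public weights in the usage lines are invented for illustration. The sketch uses the second ordering option, in which all private entries precede public entries.

```python
def combine_results(private_results, public_results):
    """Combine two result lists, letting private entries pre-empt public ones.

    Each result is a (url, quality) tuple; the URL acts as the key field.
    Private results are listed first, each group ordered by descending quality.
    """
    private_keys = {url for url, _ in private_results}
    # Discard any public result whose key also appears in the private results.
    surviving_public = [(url, quality) for url, quality in public_results
                        if url not in private_keys]

    def by_quality(result):
        return result[1]

    return (sorted(private_results, key=by_quality, reverse=True)
            + sorted(surviving_public, key=by_quality, reverse=True))


private_hits = [("URL_E", 0), ("URL_A", 80)]
public_hits = [("URL_F", 0.40), ("URL_E", 0.90)]  # hypothetical public weights
combined = combine_results(private_hits, public_hits)
print(combined)
```

In this sketch the public entry for "URL_E" is dropped, so only the private entry for that URL remains in the combined list.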
[0075] Relevant fields in the records combined at step S320 are ordered at step S322 and relevant fields are displayed to the end-user at step S324. In an embodiment, the URL field 234b of FIG. 2D may be displayed to the end-user in the order determined at step S322. Additional fields such as the title field 234c, the description field 234d, and the stamp field 234e may also be displayed. Steps S300B then end.
EXAMPLES
[0076] FIG. 4A schematically illustrates results of an example query performed at server 110. More specifically, example private index 412 (having the structure of private index 212) indexes URL listings in block 414a. Tables 240 and 250 are suitably populated.
[0077] In FIG. 4A, arbitrary example URLs in block 414a are labeled "URL_A", "URL_B", "URL_C", and "URL_D". "URL_E" shown at block 414b will be explained in further detail below.
[0078] For each URL in block 414a, a list of associated keywords used to index the URL is depicted in block 424a. Each of the keywords in block 424a is assigned a weight shown schematically in parentheses. The URLs in block 414a are controlled, for example, by an administrator for institution "Client 1". Thus, keywords and weightings at 424a may be readily modified by the administrator for institution "Client 1". For example, the administrator may use exemplary method S300A of FIG. 3A to associate the keywords and assign the keyword weightings for the various URLs "URL_A" to "URL_D".
[0079] In this illustrative example, a first keyword "KW1" with a weighting of "80" and a second keyword "KW2" with a weighting of "100" are both associated with "URL_A". As another example, the same first keyword "KW1" having a different weighting of "70" and the same second keyword "KW2" having a different weighting of "90" may both be associated with "URL_B". The weighting range of 0 - 100 is arbitrarily chosen for illustration.
[0080] FIG. 4A further schematically illustrates entries in a public index 432, of the form of public index 213, representing a number of indexed URLs in block 434, namely, "URL_E" to "URL_H". For each URL in block 434, there are one or more associated keywords, with weightings shown in parentheses, as shown at block 442.
[0081] Example public index 432 is generated by a third party. It may for example be generated automatically by software that follows linked pages in order to generate an index. For each page, the software identifies a list of significant associated keywords. In addition to automatically generating the keywords, a weighting may be assigned to each keyword associated with a given web page. For example, the weighting may be derived from how frequently a given keyword appears in the web page, or whether the keyword appears in a special area of the web page, such as the title or description. The range of fixed weightings 0.00 - 0.99 shown here is arbitrarily chosen for the purposes of illustration.
[0082] In any event, institutional user "Client 1" has initially no effective ability to edit index entries in index 432 (i.e. "URL_E" to "URL_H" in block 434 or any of the keyword weightings shown in block 442).
[0083] However, as previously shown and described with reference to FIG. 3A, relative weightings of a given keyword associated with a given URL in private index 412 may be readily changed by the administrator for institution "Client 1". For example, for "KW1" associated with record "URL_A", the current weighting of "80" may be raised or lowered at will by assigning a new weighting. In this case, the quality of match of "URL_A", when a query includes the keyword "KW1", may be directly controlled such that "URL_A" has a higher or lower quality of match for keyword "KW1" and thus appears higher or lower in a list of search results.
[0084] Advantageously, for URLs in private index 412, any keyword may be associated with a given URL, even if that keyword is not automatically generated, and even if that keyword does not appear in the subject web page. In other words, a keyword may be arbitrarily assigned to a URL for the purposes of causing that URL to appear or not appear in the search results when that keyword is used in a query. For example, if it is desirable to present "URL_D" whenever a keyword "KW9" (not shown) is entered in a query by an end-user, the keyword "KW9" is simply associated with "URL_D", and a suitable weighting may be assigned to "KW9" for "URL_D" in order to ensure that "URL_D" appears whenever the keyword "KW9" is used.
[0085] Conveniently, an administrator may shape the order of search results for any keyword simply by adjusting the relative weights of indexed URLs for that keyword.
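The way an administrator might shape results by editing keyword weights can be sketched as follows. This is a hedged Python illustration: the nested-dictionary layout of `private_index` and the helper names `set_weight` and `ranked_urls` are assumptions made for this sketch and do not reflect tables 240 and 250. The weights for "URL_A" and "URL_B" are those of the example; the weight shown for "URL_D" and the new weights assigned below are hypothetical.

```python
# Private index modelled as url -> {keyword: weight} (an assumed layout).
private_index = {
    "URL_A": {"KW1": 80, "KW2": 100},
    "URL_B": {"KW1": 70, "KW2": 90},
    "URL_D": {"KW1": 50},  # hypothetical weight, not given in the example
}


def set_weight(index, url, keyword, weight):
    """Associate a keyword with a URL, or change its existing weighting."""
    index.setdefault(url, {})[keyword] = weight


def ranked_urls(index, keyword):
    """Return URLs associated with the keyword, highest weighting first."""
    matches = [(url, keywords[keyword])
               for url, keywords in index.items() if keyword in keywords]
    return [url for url, _ in sorted(matches, key=lambda m: m[1], reverse=True)]


before = ranked_urls(private_index, "KW1")      # URL_A, URL_B, URL_D
set_weight(private_index, "URL_B", "KW1", 95)   # raise URL_B above URL_A
after = ranked_urls(private_index, "KW1")       # URL_B, URL_A, URL_D

# Arbitrarily associate a new keyword "KW9" with "URL_D".
set_weight(private_index, "URL_D", "KW9", 100)
kw9_results = ranked_urls(private_index, "KW9")
print(before, after, kw9_results)
```

Changing one weighting is enough to reorder the list for that keyword, and adding a keyword makes a URL retrievable for a term that need not appear in the page itself.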
[0086] In order to effectively allow the end-user to include URLs in public index 432 in any shaped search, as shown in FIG. 4A, one or more of the URLs in public index 432 may be selectively indexed by an administrator in private index 412. Specifically, in this illustrative example, "URL_E" has been indexed in private index 412.
[0087] As will be apparent, the indexing of "URL_E" in private index 412 allows an administrator to affect presentation of "URL_E" in a search result. In an embodiment, the level of control over "URL_E" becomes the same as that over the other URLs in the block 414a. In other words, keywords may be arbitrarily associated with "URL_E", and weightings may be arbitrarily assigned to those keywords by the administrator of the institution.
[0088] In the present example, "URL_E" has been associated with keywords "KW1" and "KW2", with each of "KW1" and "KW2" being assigned a weighting of "0" or a "null" weighting. In an embodiment, such a null weighting may be assigned if, for example, it is undesirable to include that record in combined search results when either of those two keywords "KW1" or "KW2" is entered. For example, "URL_E" may point to the web site of the main competitor of an institution.
[0089] To further illustrate this, block 450 depicts search results in response to a search for keyword "KW1" combining URLs obtained from both private index 412 and public index 432 (as indicated at block 452). Here, the keyword "KW1" has been entered by an end-user, as indicated at block 454a. As indicated at 456, results for any URLs in public index 432 matching the keyword "KW1" may be pre-empted by corresponding URLs in private index 412 (e.g. index results corresponding to "URL_E" in block 434 of public index 432 may be pre-empted by corresponding index entry "URL_E" in block 414b of private index 412). "URL_E" with a "null" weighting is shown in boldface in block 460a.
[0090] Thus, as illustrated in the ordered list at 460a, a list of URLs from private index 412 matching "KW1" is ordered based on keyword weighting. A list of URLs from public index 432 matching "KW1" then follows, again in order of keyword weighting. In this illustrative example, URLs from private index 412 are presented in advance of URLs from public index 432. This reflects an institution wanting to give priority to index entries located in its own private index 412 (i.e. corresponding to index 212) ahead of index entries found in public index 432 (i.e. corresponding to index 213).
[0091] Given the ordering of URLs from private index 412 and public index 432 as described above, relevant fields from corresponding records may be presented to the end-user, in the same order. For example, the URL field 234b (FIG. 2D) and other relevant fields may be presented to the end-user. However, in this illustrative example, even though "URL_E" is in the ordered list, as "KW1" for "URL_E" has been given a "null" weighting, "URL_E" is not displayed to the end-user. Thus, an undesirable URL obtained from the public index 213 may be effectively excluded from the combined list of search results presented to the end-user. For example, a predetermined value for a quality of match calculated from summing the weight of keywords matching a search request may cause a corresponding record to be dealt with in a particular manner. For example, a null weighting for the summed weight of keywords may be used to indicate that the associated URL ("URL_E" in the present example) should be excluded from presentation to the end-user.
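The example of FIG. 4A, including exclusion of a null-weighted entry, can be sketched end to end as follows. This Python sketch rests on several assumptions: the dictionary layout, the names `search`, `present` and `NULL_WEIGHT`, and every weight in `public_index` are hypothetical and chosen only for illustration. The private weights for "URL_A", "URL_B" and "URL_E" are those stated in the example; "URL_C" and "URL_D" are omitted because their weights are not stated.

```python
NULL_WEIGHT = 0  # predetermined quality of match that suppresses presentation

private_index = {
    "URL_A": {"KW1": 80, "KW2": 100},
    "URL_B": {"KW1": 70, "KW2": 90},
    "URL_E": {"KW1": 0, "KW2": 0},   # null weightings assigned by the administrator
}
public_index = {                      # all public weights are hypothetical
    "URL_E": {"KW1": 0.90},
    "URL_F": {"KW1": 0.40},
    "URL_H": {"KW2": 0.60},
}


def search(index, query_keywords):
    """Return url -> quality of match for entries matching any query keyword."""
    results = {}
    for url, keywords in index.items():
        matched = [w for kw, w in keywords.items() if kw in query_keywords]
        if matched:
            results[url] = sum(matched)
    return results


def present(query_keywords):
    """Order private hits ahead of public hits, pre-empt, then drop null entries."""
    private_hits = search(private_index, query_keywords)
    public_hits = {url: q for url, q in search(public_index, query_keywords).items()
                   if url not in private_hits}  # pre-emption on identical URL
    ordered = (sorted(private_hits.items(), key=lambda h: h[1], reverse=True)
               + sorted(public_hits.items(), key=lambda h: h[1], reverse=True))
    return [url for url, quality in ordered if quality != NULL_WEIGHT]


print(present({"KW1"}))  # ['URL_A', 'URL_B', 'URL_F']
print(present({"KW2"}))  # ['URL_A', 'URL_B', 'URL_H']
```

Note that "URL_E" still takes part in pre-emption before it is filtered out, so its public entry cannot reappear in the presented list.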
[0092] As will be appreciated, the pre-emption or discarding of an index entry from public index 432 is triggered by a common value in a key field in both the private index and the public index. In an embodiment, the key field is linked to a URL field 234b (FIG. 2D) via a linking mechanism typically found in a relational database, such as by the HASH_URL fields 232e/234a of each of table 240 and table 250, as shown in the present illustration (FIGS. 2C and 2D). The pre-emption or discarding is then triggered when the identical URL is retrieved from both public index 432 and private index 412. Of course, it will be appreciated that another suitable field may be used.
[0093] FIG. 4B is a schematic block diagram of another illustrative example using an alternative query. The indexed URLs of private index 412 and indexed URLs of public index 432 are the same, but as shown at block 454b, the search keyword has been changed to "KW2". Thus, the combined list of ordered URLs is shown having a different constitution. For example, "URL_H" now appears in the list in block 460b. On the other hand, "URL_D" is not included, as it is only associated with "KW1" and not "KW2". Again, URLs from private index 412 are displayed in advance of URLs from public index 432.
[0094] In this example in FIG. 4B, "URL_E" of block 414b again pre-empts use of the index entry for "URL_E" of block 434. "URL_E" with its "null" weighting for "KW2" is shown in boldface in block 460b. Again, as "KW2" for "URL_E" has also been given a "null" weighting, "URL_E" is not displayed to the end-user.
[0095] As will be appreciated, by associating any keyword or keywords with URLs in private index 412, and by assigning any selected weighting to the keywords in private index 412, substantially full control over presentation of these URLs in the combined search results 460a, 460b may be achieved. Advantageously, selected URLs from public index 432 over which full control is desired may be indexed in private index 412, such that keywords may be associated, and keyword weightings may be assigned by the administrator for an institution. This level of control may allow selective presentation of search results such that undesirable URLs from public index 432 are excluded.
[0096] Alternatively, if it is desired to promote a particular URL for more prominent display (e.g. "URL_E") from public index 432, the administrator can also assign a suitably high weighting to keywords associated with "URL_E" so that "URL_E" is prominently displayed in the combined search results.
[0097] As should now be appreciated, in order to allow institutions great flexibility, results obtained from private index 212/412 are used in place of results obtained from public index 213/432. Results from private index 212/412 are treated in priority over like results from public index 213/432. Embodiments of the invention could similarly include more than two indexes, each assigned a relative priority. In the event index entries sharing a like key field are retrieved in response to a search, results from the lower priority indexes are pre-empted by results from any higher priority index. Thus, only the matching result from the highest priority index would be included in any list of presented results. Advantageously, each index may be searched by a search algorithm (like algorithm 222 or 223) associated with only that index. As indexes are added, modular search algorithms may be added to search engine 214.
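The generalization to more than two indexes, each with a relative priority, can be sketched as follows. This Python sketch is illustrative only: the function name `combine_by_priority`, the representation of each result set as a dictionary from key field to quality of match, and all of the values in the usage lines are assumptions made for this sketch.

```python
def combine_by_priority(result_sets):
    """Keep, for each key, only the result from the highest priority index.

    result_sets: list of dicts mapping key field -> quality of match,
                 ordered from highest priority index to lowest.
    Returns a dict mapping key field -> (priority position, quality).
    """
    chosen = {}
    for priority, results in enumerate(result_sets):
        for key, quality in results.items():
            # setdefault leaves an existing (higher priority) entry untouched.
            chosen.setdefault(key, (priority, quality))
    return chosen


# Hypothetical results from three indexes, highest priority first.
highest = {"URL_E": 0}
middle = {"URL_E": 55, "URL_G": 30}
lowest = {"URL_E": 0.90, "URL_G": 0.20, "URL_H": 0.60}

merged = combine_by_priority([highest, middle, lowest])
print(merged)
```

Because each index contributes an independent result set, a further index could be supported in this sketch by adding one more dictionary to the list, which loosely mirrors the modular search algorithms mentioned above.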
[0098] Of course, the above described embodiments are intended to be illustrative only and in no way limiting. The described embodiments of carrying out the invention are susceptible to many modifications of form, arrangement of parts, details and order of operation. The invention, rather, is intended to encompass all such modifications within its scope, as defined by the claims.

Claims

WHAT IS CLAIMED IS:
1. A method of presenting search results in a response to an end-user query, said search results being combined from results from a first index and results from a second index, said first index comprising a plurality of index entries modifiable by an administrator, said second index comprising a plurality of index entries that are not modifiable by said administrator, said index entries of said first index and said second index each having an associated key field, said method comprising:
(i) querying index entries of said first index using said query to extract a set of first search results, each including a value of said associated key field, each of said first search result associated with a quality of match;
(ii) querying index entries of said second index using said query to extract a set of second search results, each including a value of said associated key field, each of said second search result associated with a quality of match;
(iii) combining said first and second set of search results to generate a list of matching search results, in which any search result from said second index for which an associated key field is identical to the associated key field of a matching search result in said first set of search results is discarded, in favor of said matching search result in said first set of search results.
2. The method of claim 1, wherein each of said index entries comprises at least one keyword, and said querying index entries of said first index comprises matching keywords in a query to said at least one keyword for each of said index entries in said first index.
3. The method of claim 2, wherein each of said at least one keyword is associated with a weight, and said quality of match is calculated by summing weights for each of said one keyword that matches a keyword in said query.
4. The method of claim 1, wherein said associated key field of each said search result obtained from said first index identifies a uniform resource locator (URL), and said associated key field of each said search result obtained from said second index identifies a URL.
5. The method of claim 4, wherein each of said index entries comprises at least one keyword, and said querying index entries of said first index comprises matching keywords in a query to said at least one keyword for each of said index entries in said first index.
6. The method of claim 5, wherein each of said at least one keyword is associated with a weight, and said quality of match is calculated by summing weights for each of said one keyword that matches a keyword in said query.
7. The method of claim 6, wherein said combining in (iii) comprises utilizing said quality of match to present an ordered listing of URLs identified in each of said search results in said list of matching search results.
8. The method of claim 7, further comprising excluding search results having a predetermined quality of match from being presented as part of said ordered listing.
9. The method of claim 8, wherein said predetermined quality of match is a null value.
10. A computing device comprising a processor and computer readable memory, said memory storing a first index and a second index, said first index comprising a plurality of index entries modifiable by an administrator, said second index comprising a plurality of index entries that are not modifiable by said administrator, said index entries of said first index and said second index each having an associated key field, search engine software adapting said device to (i) query index entries of said first index using a query to extract a set of first search results, each including a value of said associated key field, each of said first search result associated with a quality of match;
(ii) query index entries of said second index using said query to extract a set of second search results, each including a value of said associated key field, each of said second search result associated with a quality of match;
(iii) combine said first and second set of search results to generate a list of matching search results, in which any search result from said second index for which an associated key field is identical to the associated key field of a matching search result in said first set of search results is discarded, in favor of said matching search result in said first set of search results.
11. The device of claim 10, wherein each of said index entries comprises at least one keyword, and said search engine software further adapts said device to query index entries of said first index by matching keywords in a query to said at least one keyword for each of said index entries in said first index.
12. The device of claim 11, wherein each of said at least one keyword is associated with a weight, and said search engine software further adapts said computing device to calculate said quality of match by summing weights for each of said one keyword that matches a keyword in said query.
13. The device of claim 10, wherein said associated key field of each said search result obtained from said first index identifies a uniform resource locator (URL), and said associated key field of each said search result obtained from said second index identifies a URL.
14. The device of claim 13, wherein each of said index entries comprises at least one keyword, and said search engine software further adapts said device to query index entries of said first index to match keywords in a query to said at least one keyword for each of said index entries in said first index.
15. The device of claim 14, wherein each of said at least one keyword is associated with a weight, and said search engine software adapts said computing device to calculate said quality of match by summing weights for each of said one keyword that matches a keyword in said query.
16. The device of claim 15, wherein said search engine software further adapts said computing device to combine said first and second set of search results from said each of said qualities of match to present an ordered listing of URLs identified in each of said search results in said list of matching search results.
17. The device of claim 16, wherein said search engine software further adapts said computing device to exclude search results having a predetermined quality of match from being presented as part of said ordered listing.
18. The device of claim 17, wherein said predetermined quality of match is a null value.
19. A computer readable medium, storing computer executable instructions that when loaded at a computing device comprising a processor and processor readable memory storing a first index and a second index, said first index comprising a plurality of index entries modifiable by an administrator, said second index comprising a plurality of index entries that are not modifiable by said administrator, said index entries of said first index and said second index each having an associated key field, adapt said computing device to:
(i) query index entries of said first index using a query to extract a set of first search results, each including a value of said associated key field, each of said first search result associated with a quality of match;
(ii) query index entries of said second index using said query to extract a set of second search results, each including a value of said associated key field, each of said second search result associated with a quality of match;
(iii) combine said first and second set of search results to generate a list of matching search results, in which any search result from said second index for which an associated key field is identical to the associated key field of a matching search result in said first set of search results is discarded, in favor of said matching search result in said first set of search results.
20. The computer readable medium of claim 19, wherein each of said index entries comprises at least one keyword, and said computer executable instructions further adapt said computing device to query index entries of said first index to match keywords in a query to said at least one keyword for each of said index entries in said first index.
21. The computer readable medium of claim 20, wherein each of said at least one keyword is associated with a weight, and said computer executable instructions further adapt said computing device to calculate said quality of match by summing weights for each of said one keyword that matches a keyword in said query.
22. The computer readable medium of claim 19, wherein said associated key field of each said search result obtained from said first index identifies a uniform resource locator (URL), and said associated key field of each said search result obtained from said second index identifies a URL.
23. The computer readable medium of claim 22, wherein each of said index entries comprises at least one keyword, and said computer executable instructions further adapt said computing device to query index entries of said first index comprises code for matching keywords in a query to said at least one keyword for each of said index entries in said first index.
24. The computer readable medium of claim 23, wherein each of said at least one keyword is associated with a weight, and said computer executable instructions further adapt said computing device to calculate said quality of match by summing weights for each of said one keyword that matches a keyword in said query.
25. The computer readable medium of claim 24, wherein said computer executable instructions further adapt said computing device to utilize said quality of match to present an ordered listing of URLs identified in each of said search results in said list of matching search results.
26. The computer readable medium of claim 25, wherein said computer executable instructions further adapt said computing device to exclude search results having a predetermined quality of match from presentation to said end-user.
27. The computer software program product of claim 26, wherein said predetermined quality of match is a null value.
PCT/CA2003/001283 2003-08-29 2003-08-29 Method, device and software for querying and presenting search results WO2005022401A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CA2003/001283 WO2005022401A1 (en) 2003-08-29 2003-08-29 Method, device and software for querying and presenting search results
AU2003258430A AU2003258430B2 (en) 2003-08-29 2003-08-29 Method, device and software for querying and presenting search results
CA2537269A CA2537269C (en) 2003-08-29 2003-08-29 Method, device and software for querying and presenting search results

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CA2003/001283 WO2005022401A1 (en) 2003-08-29 2003-08-29 Method, device and software for querying and presenting search results

Publications (1)

Publication Number Publication Date
WO2005022401A1 true WO2005022401A1 (en) 2005-03-10

Family

ID=34230626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2003/001283 WO2005022401A1 (en) 2003-08-29 2003-08-29 Method, device and software for querying and presenting search results

Country Status (3)

Country Link
AU (1) AU2003258430B2 (en)
CA (1) CA2537269C (en)
WO (1) WO2005022401A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6018733A (en) * 1997-09-12 2000-01-25 Infoseek Corporation Methods for iteratively and interactively performing collection selection in full text searches
US6094649A (en) * 1997-12-22 2000-07-25 Partnet, Inc. Keyword searches of structured databases
US20020194162A1 (en) * 2001-05-16 2002-12-19 Vincent Rios Method and system for expanding search criteria for retrieving information items
US20030018624A1 (en) * 2001-07-20 2003-01-23 International Business Machines Corporation Scalable eContent management system and method of using the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
W.MENG, C.YU, KING-LUPO LIU: "Building Efficient and Effective Metasearch Engines", ACM COMPUTER SURVEYS, vol. 34, no. 1, March 2002 (2002-03-01), pages 48 - 89, XP002284747, Retrieved from the Internet <URL:http://delivery.acm.org/10.1145/510000/505284/p48-meng.pdf?key1=505284&key2=5648837801&coll=GUIDE&dl=ACM&CFID=22838426&CFTOKEN=90642750> [retrieved on 20040616] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740405A (en) * 2016-01-29 2016-07-06 华为技术有限公司 Data storage method and device
CN105740405B (en) * 2016-01-29 2020-06-26 华为技术有限公司 Method and device for storing data
US11030178B2 (en) 2016-01-29 2021-06-08 Huawei Technologies Co., Ltd. Data storage method and apparatus
CN111400253A (en) * 2020-03-17 2020-07-10 北京华通人商用信息有限公司 Statistical data query method and device, electronic equipment and storage medium
CN111400253B (en) * 2020-03-17 2023-04-21 北京华通人商用信息有限公司 Statistical data query method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
AU2003258430B2 (en) 2010-12-09
AU2003258430A1 (en) 2005-03-16
CA2537269A1 (en) 2005-03-10
CA2537269C (en) 2016-08-02

Similar Documents

Publication Publication Date Title
US7440964B2 (en) Method, device and software for querying and presenting search results
US6763362B2 (en) Method and system for updating a search engine
US7171409B2 (en) Computerized information search and indexing method, software and device
US7487145B1 (en) Method and system for autocompletion using ranked results
US11163802B1 (en) Local search using restriction specification
US7653623B2 (en) Information searching apparatus and method with mechanism of refining search results
US6907423B2 (en) Search engine interface and method of controlling client searches
US7499940B1 (en) Method and system for URL autocompletion using ranked results
US7539669B2 (en) Methods and systems for providing guided navigation
US7185088B1 (en) Systems and methods for removing duplicate search engine results
US7478049B2 (en) Text generation and searching method and system
US7383299B1 (en) System and method for providing service for searching web site addresses
US8583808B1 (en) Automatic generation of rewrite rules for URLs
US20020123988A1 (en) Methods and apparatus for employing usage statistics in document retrieval
JPH1091638A (en) Retrieval system
US20090222454A1 (en) Method and data processing system for restructuring web content
US20100049762A1 (en) Electronic document retrieval system
EP2181400A1 (en) Method and apparatus for generating search keys based on profile information
WO2004031896A2 (en) System and method for accessing medical records
CA2537269C (en) Method, device and software for querying and presenting search results
AU2004269436B2 (en) Method, device and software for querying and presenting search results
US20030163458A1 (en) Method and apparatus for storing and retrieving data

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA MX US

WWE Wipo information: entry into national phase

Ref document number: 2537269

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2003258430

Country of ref document: AU