WO2011018453A1 - Method and apparatus for searching documents - Google Patents

Info

Publication number
WO2011018453A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
electronic documents
search
document
results
Application number
PCT/EP2010/061604
Other languages
French (fr)
Inventor
Robert Harper
Philip Tee
Original Assignee
Nuvoti Ltd.
Application filed by Nuvoti Ltd.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G06F16/3326 Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • the field of the present invention relates to a system and a method for searching for content in the form of an electronic document stored on the worldwide web.
  • the internet, often referred to as the worldwide web, is an extremely large interconnected network of electronic information.
  • the electronic information is stored on the worldwide web in the form of electronic documents.
  • the internet is a network that comprises millions of private and public networks that are linked together.
  • the internet carries various contents in the form of a plurality of electronic documents that can be searched by a user of the internet.
  • the user can search for an electronic document by well-known search engines, such as Google or Yahoo.
  • a user conducting the search on the internet typically enters into a search engine a search string.
  • the search engine will then check the search string against a plurality of further websites to see whether the search string matches a portion of one or more of the electronic documents.
  • Such a search procedure makes the process of searching for a particular one of the electronic documents over the internet laborious and time-consuming until the user finds the electronic documents that are truly relevant to the information sought.
  • Another search strategy implemented by the search engine will search an embedded tag in a webpage or in the electronic document, which may be relevant to the search string entered by the user.
  • the relevance of the results of the search may not be applicable to the information sought by the user.
  • embedded tags are inserted to direct the user to one of the websites or one of the electronic documents that are of marginal interest. Therefore the user may have to use a plurality of search engines offered by a plurality of websites to find the relevant information sought.
  • Another problem of the existing search methods provided by search engines is the inability of the search engine to track the feedback of a user relevant to the search.
  • the ability to track user feedback enables a third user to rely upon a "good" or a "bad" feedback score for the electronic document searched.
  • the reliance on a feedback score allows a third user to be confident of the contents of the electronic document before reviewing the electronic document.
  • Internet cookies are a means to track the user of the internet content.
  • the internet cookies fail to provide a means whereby the feedback relevant to a search for specific internet content can be analyzed and "voted" upon by the user.
  • the internet cookies rely on an IP address of the user and recognize when the user returns to a specific page of the internet content.
  • United States patent application publication number US 2008/0114739 (Paul Hayes, NJ) is titled "System and method for searching for internet-accessible content".
  • the '739 document discloses a system and method for searching for internet accessible content.
  • the '739 document discloses one or more meta servers or sites which store information, in the form of link map data, about the structure and location of content on one or more internet host servers (e.g. website host servers) or one or more peer-to-peer networks.
  • the meta servers also store processed content provided by the one or more internet host servers, such that content on the one or more internet host servers is indexed and stored at the meta servers.
  • the content on the internet host servers that is not in HTML format is converted by the system of the Hayes '739 patent application into HTML content.
  • the content is indexed and stored at the one or more meta servers.
  • the indexed and stored content, as well as the link map data, of each meta site allows for a plurality of different types of content, including static content, dynamically-generated content and content on peer-to-peer networks, to be searched in real time.
  • the content also stored on the one or more internet host servers is linked to the indexed and stored content on the meta servers and is accessible to the users.
  • the '739 document discloses a plug-in which includes a toolbar. The plug-in provides an otherwise standard web browser with the ability to secure or authenticate the identity of voters, to securely collect user votes, and to securely transmit user information and votes to a central repository for secure storage.
  • the toolbar allows a vote based reordering of search results from other popular search engines in real time.
  • the '739 document also discloses a central repository which tracks user votes and query progressions.
  • the '739 document also provides a method for improving search results generated by a search engine and reducing unwanted advertising associated with search results.
  • the '739 disclosure discloses a method whereby a search query is sent to a search engine and a single search result representing the best result is displayed to the user while the remaining ones of the search results are suppressed. The user then votes on the single search result, either rejecting or accepting the result. If the result is accepted the remaining search results are displayed to the user, along with paid advertisements.
  • United States patent application publication number US 2006/0294040 (Ning Zhu, Florida) is titled "Method for using, adding and removing category in directory by documents".
  • the '040 document discloses a directory service for web pages.
  • the '040 document allows for web pages to add, vote to remove and support categories in a directory.
  • Each of the web pages can have one category.
  • the '040 document teaches that the category information is put into the web page itself. A crawler will gather the category information from the web pages periodically.
  • the '040 document discloses a method to score the categories so that the categories can be listed in order of their scores.
  • Google provides an AdSense service that provides the user of the search engine with advertisements relevant to the search conducted. If the user clicks through the advertisement, revenue is paid by the advertiser to the operator of the search engine.
  • US patent application publication No. 2004/0059708, which is owned by Google, is titled "Methods and apparatus for serving relevant advertisements".
  • the '708 document teaches a system and method whereby the relevance of advertisements to a user's interests is improved.
  • the content of a web page is analyzed to determine a list of one or more topics associated with that web page.
  • An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics.
  • One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages.
  • the present invention teaches a system and a method for searching for an electronic document that is located on the Internet.
  • the system comprises a database that is connected to a search engine wherein the database stores information relating to pages that have been searched by a user.
  • the information identifies one of a plurality of electronic documents and allows a user and/or one of a plurality of third users to vote upon the relevance of the electronic document with respect to a search string.
  • the present invention enables the generation of a database of expert users.
  • the selection of the expert users is based upon multiple criteria that include, but are not limited to, the voting on the relevance of the electronic document and the activity of the users.
  • the voting of the electronic document provides a voting result with the advantage that it enables rapid access to the electronic document as it provides a user with a score as to the relevance of the electronic document.
  • a means is provided for the user to receive specific information, i.e. advertisements, that is relevant to the electronic documents searched by the user and the voting result of the electronic document.
  • the user or one of a plurality of third users is allowed to connect further electronic documents to the results of the search, enabling the user to reliably locate information that is connected to and/or relevant to the search conducted.
  • the user is able to specify a manner and a type of connection between the further electronic documents and the electronic documents.
  • the further electronic documents have an arbitrary user-defined context, as will be explained below.
  • the present invention also enables the user in one aspect to continually refine search queries.
  • in a further aspect, the invention provides the user with the ability to receive advertising that is relevant to the subject matter of the search in which the user is interested.
  • the advertising received by the user can also be influenced by the user's interests and the user's expertise as well as by the search query.
  • a further aspect of the invention allows for the maintenance of a dynamic search procedure for a user by providing a map of related internet information based upon the categorization by different ones of the users of the same (or similar) pages.
  • Figure 1 shows a schematic of an apparatus for searching and grouping a subset of a plurality of electronic documents according to the present invention.
  • Figure 2 shows a schematic of a device for the selection and grouping of a subset of a plurality of electronic documents according to the present invention.
  • Figure 3 shows a schematic of a method for the selection and grouping of a subset of a plurality of electronic documents according to the present invention.
  • Figure 4 shows a schematic for an exemplary method for retrieving and voting on a selection and grouping of a subset of a plurality of documents according to the present invention.
  • Figure 5 shows a schematic of a method for evaluating users according to the present invention.
  • Figure 6 shows a schematic of a method for generating advertising for display to a user on a terminal according to the present invention.
  • Figure 7 shows a schematic of a virtual map relating to a search conducted by a user.
  • Figure 8 shows an overview of the information stored on a database according to the present invention.
  • Figure 9 shows a schematic representation of a system according to another aspect of the invention.
  • Figure 1 shows an apparatus 10 for searching and grouping of a subset of a plurality of electronic documents 70 that are searched for by a user 20 over a web 60.
  • the web 60 refers to the internet, but could also be an internal intranet, documents stored on a user computer or a combination of these.
  • the apparatus 10 shows a plurality of users 20 who are at a terminal for accessing the web 60.
  • the users 20 include the user 20 who is performing the search for the relevant electronic document 70 as well as further users 20 who are not performing the search (i.e. not entering a search string).
  • the terminal includes any means for the user 20 to access the web 60 and includes, but is not limited to, a personal computer, PDA, mobile phone etc.
  • the user 20 can access the web 60 by a network connection 25.
  • the network connection 25 includes any known method in the art for connecting to the web 60, such as fixed line communication or a mobile communication.
  • the users 20 are connected to a search engine 30 and, in some aspects of the invention, to an advertising engine 80.
  • the search engine 30 is connected to the web 60 by an internet connection 50.
  • the search engine 30 is connected to a database 40.
  • the database 40 is used to store details of the searches performed by the users 20.
  • the search engine 30 comprises a processing device 35.
  • the processing device 35 enables the search and allocates a selected one or more of the plurality of electronic documents 70 to one or more of a plurality of groups 45, as will be explained later.
  • the groups 45 can be envisaged as virtual "bookshelves" storing, for example, at least some of the electronic documents 70 generated as results of the search, as will become clearer later.
  • the groups 45 can also store further ones of the electronic documents 70 that are not generated by the search engine 30. Such further electronic documents 70 could include other electronic documents 70 that the user 20 discovers through general browsing or following links in other electronic documents 70. It will be noted at this stage that the electronic documents 70 may be available and stored on the world wide web or on a private intranet (or indeed on the user's own personal computer).
  • the search engine 30 further comprises a selection device.
  • the selection device is responsive to an input of the user 20 for selecting one or more of the plurality of electronic documents 70.
  • the database 40 stores a document identifier 47 wherein the document identifier 47 is used to identify one of a plurality of the electronic documents 70.
  • the identifiers 47 could be in one aspect of the disclosure the URL (Uniform Resource Locator) of the electronic documents 70 or could be a Digital Object Identifier, such as that known from the ANSI/NISO standard Z39.84-2005 Syntax for the Digital Object Identifier.
  • the advertising engine 80 has a plurality of modes of operation.
  • the function of the advertising engine 80 includes, but is not limited to, matching a selection of advertisements held in the advertising engine 80 (or an associated database) with interests of the users 20.
  • the selection is based, for example, on the groups 45 established by the user 20 and/or by tags given by the user 20.
  • the advertising engine 80 can use the search terms used by the user 20 in the search and map these search terms with the advertising to generate relevant advertising.
  • Figure 2 shows a schematic of a screenshot of a main screen 200 when the user 20 is using the present invention for searching the web 60 for one of the electronic documents 70.
  • the user 20 will access the main screen 200 using the terminal.
  • the user 20 will enter the search string.
  • the search string is a keyword for which the user 20 wishes to find one or more electronic documents 70 relating to the keyword.
  • the user 20 will input the search string using, for example, a keyboard connected to the terminal.
  • the user 20 inputs the search string into a search bar 210.
  • the search string is "photography".
  • the user 20 wishes to search the web 60 for electronic documents 70 that pertain to photography.
  • the user 20 will then click a search button 215 to begin the search of the web 60 for the electronic document 70 including the search term "photography".
  • the results of the search are displayed in a result window 220.
  • the result window 220 shows a title of the relevant electronic documents 70 and a summary of the relevant ones of the electronic documents 70.
  • Figure 2 shows six examples of the relevant documents 70 that pertain to the search string "photography", namely search results 220a-220f. It will be noted that the example shown in Fig. 2 is merely exemplary. Other arrangements of the search results 220a-220f in the results window 220 as well as other clickable buttons can be envisaged.
  • the result display 260 contains a number of clickable buttons that allow the user 20 to manipulate information pertaining to the selected one of the plurality of search results 220a-220f.
  • the clickable buttons in the example of Fig. 2 are a bookmark button 225, a vote button 275, a connection button 265, and a journey button 270.
  • the main screen 200 in the example of Fig. 2 also comprises a library section 250.
  • the library section 250 contains a searches and journeys category 245, a bookshelf category 230, a connections category 235 and an account category 240.
  • By selecting the searches and journeys category 245, the user 20 is presented with a list of recent search activity conducted by the user 20.
  • the search activity includes the search activity of the user 20 during the last month, the last week or the current day.
  • the search activity could also show a tree representation of the user's 20 search activity.
  • By clicking the journey button 270 the user 20 is presented with a journey map 261 that is relevant to the search conducted.
  • the journey map 261 allows the user 20 to navigate to electronic documents 70 on the web 60 which are connected to the search conducted.
  • the journey feature will be discussed later in more detail with reference to figure 7.
  • the bookshelf category 230 is used to categorize electronic documents 70 when the user 20 clicks the bookmark button 225.
  • the bookshelf category 230 can contain a plurality of sub-directories. The sub-directories are more specific with respect to the search conducted by the user 20. In the example with respect to photography shown in figure 2, the bookshelf category 230 contains two sub-directories 230a and 230b, for sports and Java respectively.
  • the connection button 265 in the result display 260 can be used to create a connection between the search result of the electronic document 70 and another one of the plurality of electronic documents 70 to which the searched electronic document 70 may have a connection.
  • the result of the electronic document 70 shown in the result display 260 could have a connection 235a to another one of the electronic documents 70 in the area of photography or a connection 235b to a further electronic document 70 in the category of photo lenses.
  • the feature of creating connections 235 between electronic documents 70 enables the user 20 to rapidly locate electronic documents 70 that are relevant to the electronic document 70 initially searched.
  • the connections 235 allow the user 20 to make connections between electronic documents 70 in a manner desired by the user 20.
  • the account category 240 allows the user 20 to manage a user account and select preferences.
  • the account category 240 includes details of the user 20, user preferences and provides a means for managing the bookshelves 230a, 230b.
  • a method for the selection and grouping of a subset of the plurality of electronic documents 70 to form the bookshelf will now be described with reference to figure 3.
  • a first step 305 is the start.
  • in step 310 the user 20 inputs the search string, i.e. the search term that is relevant to the search being conducted, into the search bar 210.
  • the user 20 will then click the search button 215 to commence the search of the web 60.
  • in step 315 the web 60 is searched for relevant content with respect to the search string entered by the user 20.
  • the search is carried out using one or more search engines, which can include but are not limited to the Google search engine or the Yahoo search engine, as well as using the system described in this application.
  • the relevant content will be one or more electronic documents 70 which include the search string.
  • the relevant content can be hierarchically ordered using different document source indices provided by the different ones of the search engines. It will be explained later that the relevant result of the search for the electronic document 70 can be further refined using past history and voting results.
  • a list of the search results is presented to the user 20 in the results window 220.
  • the results window 220 displays at least one of the search results.
  • the user 20 can select one search result from the list of the search results displayed in the results window 220.
  • a summary of the selected search result 220c is displayed in the result display 260.
  • the summary of the one selected search result 220c could be part of the electronic document 70 or an abstract of the electronic document 70 related to the one search result 220c.
  • the user 20 can then bookmark the one selected search result 220c in step 330 by clicking the bookmark button 225.
  • the user 20 categorizes the one selected search result 220c into one of the bookshelves 230a, 230b and the one selected search result 220c is saved in the selected bookshelf category 230.
  • the user 20 can access the contents of the saved bookshelf at any time. It will be noted at this stage that the bookshelves 230a, 230b can also be accessed by other ones of the users 20, including other users 20 who did not create or previously store any electronic documents 70 into one of the bookshelves 230a, 230b. It will be further noted that the user 20 may add tags to the electronic document 70. The user-generated tag can also be used to categorize the electronic documents 70.
  • the bookmark that is displayed in the bookshelf category 230 is stored in the database 40 in step 335.
  • the database 40 has a number of tables 42a, 42b to store information relating to the one selected search result 220c, as will be explained later.
  • the user 20 can continue with a further selection of one of the plurality of search results 220a, 220b, 220d, 220e and 220f from the result window 220 from the initial search conducted. In this case the user 20 returns to step 330. The user 20 can select another one from the list of the search results 220a, 220b, 220d, 220e and 220f displayed in the result window 220.
  • the user 20 can save the other search result (one of 220a, 220b, 220d, 220e and 220f) as a bookmark by clicking the bookmark button 225.
  • the user 20 can then commence a new search as in step 345 whereby the process begins as in step 305.
  • the user 20 might carry out the new search for a related search term "photographers" instead of the search term "photography".
  • the related search term "photographers" will produce a new list of the search results, which can be selected and saved by the user 20.
  • the user 20 does not have to create a further bookshelf category 230. Instead the user 20 can still use the existing bookshelf category 230. This allows the user 20 to group all of the relevant electronic documents 70 into bookshelf categories 230 with names with which the user 20 is familiar.
  • a document identifier 47 that is relevant to the searched electronic document 70 is saved.
  • the document identifier 47 is typically the uniform resource locator (URL) of the electronic document 70 but could also be another identifier, such as a digital object identifier.
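
The bookmarking flow of figure 3 can be illustrated with a minimal Python sketch. It assumes that a bookshelf is simply a named list of document identifiers (URLs here); the class names `Bookshelf` and `Library` and the example URLs are hypothetical and do not come from the application.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bookshelf:
    """A named group of bookmarked document identifiers (assumed structure)."""
    name: str
    document_ids: list[str] = field(default_factory=list)

    def bookmark(self, document_id: str) -> None:
        # Store each identifier (e.g. a URL or a DOI) only once.
        if document_id not in self.document_ids:
            self.document_ids.append(document_id)


@dataclass
class Library:
    """All bookshelves of one user, keyed by bookshelf name."""
    shelves: dict[str, Bookshelf] = field(default_factory=dict)

    def bookmark(self, shelf_name: str, document_id: str) -> Bookshelf:
        # Reuse an existing bookshelf or create it on first use.
        shelf = self.shelves.setdefault(shelf_name, Bookshelf(shelf_name))
        shelf.bookmark(document_id)
        return shelf


library = Library()
# Result bookmarked after a search for "photography".
library.bookmark("photography", "https://example.org/lenses")
# Result of a later search for "photographers", saved to the same bookshelf.
library.bookmark("photography", "https://example.org/portraits")
# Bookmarking the same document twice does not duplicate it.
library.bookmark("photography", "https://example.org/lenses")
```

In this sketch a new search does not require a new bookshelf: results of related searches accumulate under a name the user already knows.
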
  • Figure 4 shows a schematic of a method used by the user 20 for retrieving a previous search and reviewing the previous search and voting upon the search results of the previous search.
  • the ability to vote on the electronic document 70 allows a second user 20 to search more accurately as votes are associated with the electronic document 70 in the form of a voting result 49.
  • the system of the invention can also be used to categorize the quality of the electronic document 70 by virtue of the voting result 49.
  • the voting result 49 allows the second user 20 to rapidly identify qualitatively good ones of the electronic documents 70, which are relevant to the search requested.
  • the voting on the electronic documents 70 can also be carried out at other points during the search procedure. This is done by clicking an appropriate voting button on the main screen 200 or on another screen. The user 20 is given an opportunity to vote on the electronic document 70 being displayed.
  • the start 405 is followed by the user 20 retrieving a previously stored search result from the bookshelf category 230 in step 410.
  • the user 20 could be either the original user 20 or more preferably is the second user 20.
  • the user 20 can then select one of a plurality of electronic documents 70 that are displayed in the results window 220.
  • the selected electronic document 70 will then be displayed in the results window 220.
  • the summary of the selected electronic document 70 is displayed in more detail in the result display 260, as discussed above.
  • the user 20 can then review the electronic document 70 in step 420 as s/he wishes.
  • the user 20 can vote on the "quality" of the electronic document 70 in step 425.
  • the user 20 can vote on the electronic document 70 by making a selection on the vote button 275.
  • the user 20 can vote on the electronic document 70 with a positive result, a negative result, or a result whereby the user 20 is undecided as to the relevance and accuracy (or other factors affecting the "quality") of the electronic document 70 with respect to the search entered.
  • the vote is stored in the database 40 in step 430 as the voting result 49. Tables are used for storing the voting result 49, as will be discussed below.
  • the voting result 49 of the electronic document 70 is used in one aspect of the invention to control the order of display of the electronic documents 70 retrieved in a search as explained above.
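
The use of the voting result 49 to control display order can be sketched as follows. This is only an illustration under stated assumptions: the numeric values given to positive, undecided and negative votes, the class name `VoteStore` and the document identifiers are all hypothetical, and the application does not specify how votes are aggregated.

```python
from collections import defaultdict

# Assumed numeric value of each vote choice (not specified in the source).
VOTE_VALUES = {"positive": 1, "undecided": 0, "negative": -1}


class VoteStore:
    """Keeps a running voting result per (search string, document) pair."""

    def __init__(self):
        self._scores = defaultdict(int)

    def vote(self, search_string, document_id, choice):
        # Add the value of this vote to the stored voting result.
        self._scores[(search_string, document_id)] += VOTE_VALUES[choice]

    def score(self, search_string, document_id):
        # Documents without votes have a neutral score of 0.
        return self._scores.get((search_string, document_id), 0)

    def reorder(self, search_string, document_ids):
        # sorted() is stable, so documents with equal scores keep the
        # order in which the search engine returned them.
        return sorted(document_ids, key=lambda d: -self.score(search_string, d))


store = VoteStore()
store.vote("photography", "doc-b", "positive")
store.vote("photography", "doc-b", "positive")
store.vote("photography", "doc-c", "negative")
ranked = store.reorder("photography", ["doc-a", "doc-b", "doc-c"])
```

Here `ranked` places the twice-approved document first, the unvoted document second and the rejected document last.
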
  • the system provides a method for evaluating the users 20.
  • the advantage of evaluating the users 20 is that it provides a means for determining who are "expert" users 20, i.e. which ones of the users 20 could be considered experts in their field. Details of the expert users 20 can be observed by other users 20. This feature allows other users 20 to trust the voting results that the electronic documents 70 have received from the expert user, thus determining the quality of the user 20 with respect to the score the electronic document 70 has received. This aspect of the invention is described in greater detail with reference to figure 5.
  • the method for evaluating a user 20 begins in step 510.
  • in step 520 the system selects the user 20 to be evaluated.
  • the system selects in step 530 the electronic documents 70 that the user 20 has bookmarked and divides these bookmarked electronic documents 70 by category.
  • the system reviews the voting result 49 for each one of the categories that the bookmarked electronic documents 70 have received from the other users 20.
  • the system then tabulates the voting results 49 received for the user 20 to be evaluated with the subject matter of the bookmarked electronic documents 70. The tabulation of the voting results 49 is carried out for each one of the categories to allow an evaluation of the users 20 for each one of the categories.
  • the voting results 49 are weighted by the degree of expertise of the other users 20.
  • the votes given by other users 20 who are considered to be experts in that category to which the electronic document 70 is assigned will be assigned a higher weight.
  • Natural language processing methods or ontologies can be used to relate different ones of the categories to each other. So, for example, some categories may be synonyms of other categories. In this case it would make sense to use all of the voting results 49 for all of the categories to assess the level of expertise of the user 20. Other categories may be "supercategories" of a plurality of subcategories. In this case it is useful to define experts for both the supercategory and the subcategory.
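
The per-category evaluation of figure 5 can be sketched in Python as a weighted tally. The sketch assumes votes are encoded as +1, 0 or -1 and that a vote from a user already regarded as an expert in the category counts double; the function name `expertise_by_category`, the weight of 2.0 and the sample data are hypothetical, since the application does not give concrete weights.

```python
from collections import defaultdict


def expertise_by_category(bookmarks, votes, experts=None, expert_weight=2.0):
    """Tabulate weighted voting results per category for one evaluated user.

    bookmarks: {document_id: category} for documents the user bookmarked
    votes:     iterable of (voter, document_id, value), value in {-1, 0, 1}
    experts:   {category: set of voters regarded as experts in it}
    """
    experts = experts or {}
    totals = defaultdict(float)
    for voter, document_id, value in votes:
        category = bookmarks.get(document_id)
        if category is None:
            continue  # vote on a document this user did not bookmark
        # Votes from experts in the document's category weigh more.
        is_expert = voter in experts.get(category, set())
        totals[category] += (expert_weight if is_expert else 1.0) * value
    return dict(totals)


bookmarks = {"doc-1": "photography", "doc-2": "photography", "doc-3": "java"}
votes = [
    ("alice", "doc-1", 1),   # alice is a photography expert: counts 2.0
    ("bob", "doc-2", 1),     # ordinary vote: counts 1.0
    ("carol", "doc-3", -1),  # ordinary negative vote on a Java document
    ("alice", "doc-9", 1),   # ignored: not bookmarked by the evaluated user
]
scores = expertise_by_category(bookmarks, votes, experts={"photography": {"alice"}})
```

A threshold on these per-category totals could then decide who is listed as an expert; the choice of threshold is left open here, as it is in the text.
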
  • the invention allows the user 20 to receive selected advertisements that are relevant to the search conducted by the user 20 such that the advertisements display content that might be of interest to the user. So, for example, if the user selects the search term "photography", advertisements relating to photography will be displayed. This is similar to the AdSense system used by Google to produce relevant advertisements to users of the Google search engine.
  • the advertisements that are displayed to the user 20 can be dynamically modified in accordance with one or more of the following criteria: for example with the voting result 49 the electronic documents 70 have received during the voting process, with the selection of the search results, by reviewing the account of the user 20, by reviewing previous ones of the searches carried out by the user 20, the categories of interest to the user 20, the bookmarked electronic documents 70 of the user 20 and the categories for which the user 20 is regarded as an expert.
  • This aspect of the invention is described with reference to figure 6.
  • Figure 6 shows a method for selective advertising according to the present invention.
  • the user 20 will select an electronic document 70 which the user 20 may have searched for, or which is stored in the bookshelf category 230.
  • the user 20 is presented with advertising that is relevant to the search result selected by the user 20.
  • the user 20 may then be presented with one or more advertisements.
  • the user 20 can then input a further search term to narrow down the search in step 650.
  • the next step 660 is that the user 20 will then receive more selective advertisements that are based on the previous search and the most recent search conducted by the user 20.
  • the user 20 is then presented in step 670 with one or more adverts, which are highly relevant to the search for the electronic document 70 on the internet.
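
One simple way to picture the narrowing of advertisements in figure 6 is keyword overlap between the user's search terms and keywords attached to each advertisement. The sketch below assumes exactly that; the function `select_adverts`, the advertisement names and the keyword sets are hypothetical, and the application does not state which matching technique the advertising engine 80 uses.

```python
def select_adverts(adverts, search_terms, limit=3):
    """Rank adverts by how many of the user's search terms they match.

    adverts:      {advert_name: set of keywords}
    search_terms: the previous and the most recent search terms of the user
    """
    terms = {term.lower() for term in search_terms}
    scored = []
    for name, keywords in adverts.items():
        overlap = len(terms & {keyword.lower() for keyword in keywords})
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


adverts = {
    "camera shop": {"photography", "camera"},
    "lens outlet": {"photography", "lenses"},
    "java course": {"java"},
}
first = select_adverts(adverts, ["photography"])
refined = select_adverts(adverts, ["photography", "lenses"])
```

After the further search term "lenses" is entered, the advertisement matching both terms moves ahead of the one matching only "photography".
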
  • the user 20 is presented with a map of information that is relevant to the search of the electronic document 70.
  • the map allows the user 20 to rapidly navigate to related topics with respect to the search. This aspect of the invention is described with reference to figure 7.
  • the list of search results is displayed in the results window 220 as discussed above.
  • the user 20 can select one of the search results, which is displayed in the result display 260.
  • the user 20 can click on the journey button 270.
  • the result display 260 shows a journey map 261 of the result of the search of the electronic document 70.
  • the selected electronic document 70 is designated as a central home icon 262.
  • virtual routes 263 which are akin to roads of a road map.
  • links 264 are to electronic documents 70 which are relevant to the search conducted by the user 20.
  • the user 20 clicks one of these links 264 to rapidly navigate to further electronic document 70 on the web 60.
  • the central home icon 262 will relate to photography.
  • the links 264 can be generated in one aspect of the invention automatically. This is done by looking for other bookshelves 45 which contain the same electronic documents 70.
  • One or more links 264 are generated to the other bookshelves 45 which share one or more of the electronic documents 70.
  • the user 20 can create additional links 264 and therefore additional virtual routes 263 by clicking on the connection button 265.
  • By clicking the connection button 265 the user 20 can create the link 264 to the electronic document 70 on the web 60.
  • the link 264 can be selected from a connection 235a, 235b that is stored in the connections category 235.
  • by clicking the connection button 265 (not shown) the user 20 can create a link 264 from the central home icon 262 (in this example photography) to the further electronic document 70, which would be in the area of photo lenses 235b.
  • FIG. 8 shows a schematic of a layout of the information that is held on the database 40.
  • the database 40 in this example comprises two tables. It will be noted that in practice there will be many different tables and that tables may be generated automatically during update operations or may be defined by a programmer. The number of tables is not intended to limit the scope of the invention.
  • the tables contain information that is relevant to the users 20 and the electronic documents 70 searched by the users 20.
  • the database 40 in this example of the invention comprises a bookmark table 42a and a user subject metrics table 42b.
  • the bookmark table 42a consists of information that includes user identification information 48.
  • the user identification 48 comprises an identification number of the user 20 who selected the electronic document 70 identified by the identifier information 47.
  • the bookmark table 42a further comprises identifier information 47 that identifies content of electronic documents 70 searched by the user 20.
  • the identifier information 47 includes the URL of the electronic documents 70 which have been searched, tags that have been searched and further includes information on the so-called PIP index of the electronic document 70.
  • the PIP index is a measure of how good a page is in relation to the category into which the electronic document 70 has been categorized.
  • One simple measure of the PIP index is the number of weighted votes that the electronic document 70 has received. The weighting of the votes is determined, as noted above, by the degree of expertise of the user 20. It is possible that the electronic document 70 has more than one PIP index. For example, there may be different PIP indices for the different categories to which the electronic document 70 is assigned.
  • the user subject metrics table 42b includes user identification information 48 which is a reference or identification number pertaining to the user 20 who has searched for one of the electronic documents 70 over the web 60.
  • the user subject metrics table 42b further includes subject information 45 relating to the subjects 45 searched by the user 20.
  • the subject information 45 is the same information presented to the user 20 in the bookshelf category 230 according to figure 2.
  • the user subject metrics table 42b also contains information relating to the voting result 49 relevant to the electronic document 70 as well as the users 20 who have voted on the electronic document 70.
  • the user subject metrics table 42b also has information relating to the number of times an electronic document has been viewed in search results 220a-220f.
  • the information contained in the user subject metrics table 42b enables a variety of functions. For example, the information will enable the ordering of the electronic documents 70 in the search results.
  • the database 40 will also store details of the links 264 between the various electronic documents 70.
  • Figure 9 is a schematic representation of a system according to another aspect of the invention, showing an expertise engine 910 and associated votes 49, bookmarks 960, user data 970, and users 20, 920.
  • the expertise engine 910 is joined in communication to the web 60 in any suitable manner.
  • the system also includes a bookshelf 930 including bookshelf categories, and a database 940 including document identifiers and votes on the electronic documents.
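
The PIP index described above, a count of votes weighted by the expertise of the voting user 20 and kept separately for each category, might be sketched as follows. This is a minimal illustration only: the function name `pip_indices`, the signed vote values (+1, 0, -1), the default weight of 1.0 and the sample data are assumptions and are not taken from the specification.

```python
from collections import defaultdict


def pip_indices(votes, expertise):
    """Return one weighted vote total per category for a single document.

    votes: iterable of (user_id, category, value), value in {+1, 0, -1}
           (signed votes are an assumption made for this sketch).
    expertise: dict mapping (user_id, category) to a weight; users with
               no recorded expertise get a hypothetical default of 1.0.
    """
    totals = defaultdict(float)
    for user_id, category, value in votes:
        weight = expertise.get((user_id, category), 1.0)
        totals[category] += weight * value
    return dict(totals)


# Hypothetical votes on one electronic document assigned to two categories.
votes = [
    ("u1", "photography", +1),
    ("u2", "photography", +1),
    ("u3", "photography", -1),
    ("u1", "sports", +1),
]
expertise = {("u1", "photography"): 3.0, ("u3", "photography"): 0.5}

print(pip_indices(votes, expertise))
```

Keeping one total per category reflects the statement that the electronic document 70 may have more than one PIP index.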

Abstract

The present invention teaches a system and a method for a user (20) to rapidly and accurately search the web (60) for an electronic document (70) and to provide links (264) between different ones of the electronic documents (70). The invention provides a means for the user to vote upon the relevance of the electronic document (70) with respect to the search. The invention allows the user (20) to receive targeted advertising that is relevant to the search for the electronic document (70). The present invention allows the user (20) to navigate rapidly to related electronic documents (70) on the web (60) that are relevant to the search via a virtual map.

Description

Specification
Cross-Reference to Related Applications
[0001] This application is related to and claims the benefit of US Patent Application No. US 61/232,700 filed on 10 August 2009.
Field of the Invention
The field of the present invention relates to a system and a method for searching for content in the form of an electronic document stored on a worldwide web.
Background of the Invention
[0002] The internet, often referred to as the worldwide web, is an extremely large interconnected network of electronic information. The electronic information is stored on the worldwide web in the form of electronic documents. The internet is a network that comprises millions of private and public networks that are linked together. The internet carries various contents in the form of a plurality of electronic documents that can be searched by a user of the internet. The user can search for an electronic document by well-known search engines, such as Google or Yahoo.
[0003] It has been estimated that during the last decade the internet has grown by 100% per year (see the document titled "The size and growth rate of the internet" from AT&T Labs by Coffman, K. G. and Odlyzko, A. M. on the website http://netlib.bell-labs.com/netlib/att/math/people/amo/doc/internet.size.ps). As a consequence of the growth of the internet, the content of electronic documents on the internet has grown too.
[0004] There is a need for a user to quickly locate electronic documents that have a degree of relevance and accuracy to a search that has been performed by the user.
[0005] A user conducting the search on the internet typically enters a search string into a search engine. The search engine will then query the search string against a plurality of further websites to see whether the search string matches a portion of one or more of the electronic documents. Such a search procedure makes the searching process for a particular one of the electronic documents over the internet laborious and time-consuming until the user finds the electronic documents that are truly relevant to the information sought.
[0006] Another search strategy implemented by the search engine will search an embedded tag in a webpage or in the electronic document, which may be relevant to the search string entered by the user. However, the results of the search may not be relevant to the information sought by the user. In particular it is possible that embedded tags are inserted to direct the user to one of the websites or one of the electronic documents that are of marginal interest. Therefore the user may have to use a plurality of search engines offered by a plurality of websites to find the relevant information sought.
[0007] Another problem of the existing search methods provided by search engines is the inability of the search engine to track the feedback of a user relevant to the search. The ability to track user feedback enables a third user to rely upon a "good" or a "bad" feedback score for the electronic document searched. The reliance on a feedback score allows a third user to be confident of the contents of the electronic document before reviewing the electronic document.
[0008] Internet cookies are a means to track the user of the internet content. However, the internet cookies fail to provide a means whereby the feedback relevant to a search for specific internet content can be analyzed and "voted" upon by the user. The internet cookies rely on an IP address of the user and recognize when the user returns to a specific page of the internet content.
[0009] Therefore to find internet content in the form of the electronic document the user will have to use a number of internet search engines and web pages to locate the electronic document that is relevant and specific to the search of the user.
[0010] Systems and methods for searching for internet based content and voting on the relevance of the electronic document are known in the art.
[0011] United States patent application publication number US 2008/0114739 (Paul Hayes, NJ) is titled "System and method for searching for internet-accessible content". The '739 document discloses a system and method for searching for internet accessible content. The '739 document discloses one or more meta servers or sites which store information, in the form of link map data, about the structure and location of content on one or more internet host servers (e.g. website host servers) or one or more peer-to-peer networks. The meta servers also store processed content provided by the one or more internet host servers, such that content on the one or more internet host servers is indexed and stored at the meta servers. The content on the internet host servers that is not in HTML format is converted by the system of the Hayes '739 patent application into HTML content. The content is indexed and stored at the one or more meta servers. The indexed and stored content, as well as the link map data, of each meta site allows for a plurality of different types of content including static content, dynamically-generated content and content on peer-to-peer networks to be searched in real time. The content also stored on the one or more internet host servers is linked to the indexed and stored content on the meta servers and is accessible to the users. The '739 document discloses a plug-in which includes a toolbar. The plug-in provides an otherwise standard web browser with the ability to secure or authenticate the identity of voters, to securely collect user votes, and to securely transmit user information and votes to a central repository for secure storage. The toolbar allows a vote based reordering of search results from other popular search engines in real time. The '739 document also discloses a central repository which tracks user votes and query progressions.
The '739 document also provides a method for improving search results generated by a search engine and reducing unwanted advertising associated with search results.
[0012] The '739 disclosure discloses a method whereby a search query is sent to a search engine and a single search result representing the best result is displayed to the user while the remaining ones of the search results are suppressed. The user then votes on the single search result, either rejecting or accepting the result. If the result is accepted the remaining search results are displayed to the user, along with paid advertisements.
[0013] International Patent Application No WO 2009/030990 (Wong) teaches a method of generating search results for a search engine that allows users to provide feedback to the search engine. The search engine retains historical data for all of the searches performed. When the results are displayed to the user, the user is able to interact with the system and thus provide feedback about the quality of the results that modifies the historical data. Such feedback can then be used in future searches to generate results that are better ranked and clustered according to relevance to the search request. The ranking is carried out by individual users without any corroboration of their degree of knowledge on the subject.
[0014] International Patent Application Publication No WO 2006/089137 (Infomato) teaches a system and method for organizing and retrieving information. The system has a so-called crosslink database storing data structures that comprise at least a connecting node and further elements connected to the connecting node by a link. A browser on a user's computer allows the information to be retrieved from the database. The elements are related or cross-linked to each other through the connecting node which allows efficient retrieval of the information.
[0015] United States patent application publication number US 2006/0294040 (Ning Zhu, Florida) is titled "Method for using, adding and removing category in directory by documents". The '040 document discloses a directory service for web pages. The '040 allows for web pages to add, vote to remove and support categories in a directory. Each of the web pages can have one category. The '040 document teaches that the category information is put into the web page itself. A crawler will gather the category information from the web pages periodically. The '040 document discloses a method to score the categories so that the categories can be listed in order of their scores.
[0016] There is also a desire to earn revenue from the search engine. For example, Google provides an AdSense service that provides the user of the search engine with advertisements relevant to the search conducted. If the user clicks through the advertisement then revenue is paid by the advertiser to the operator of the search engine.
One example of a system for serving specific advertising is US patent application publication No. 2004/0059708 which is owned by Google and is titled "Methods and apparatus for serving relevant advertisements". The '708 document teaches a system and method whereby the relevance of advertisements to a user's interests is improved. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages.
Summary of the Invention
[0017] The present invention teaches a system and a method for searching for an electronic document that is located on the Internet. The system comprises a database that is connected to a search engine wherein the database stores information relating to pages that have been searched by a user. The information identifies one of a plurality of electronic documents and allows a user and/or one of a plurality of third users to vote upon the relevance of the electronic document with respect to a search string.
[0018] The present invention enables the generation of a database of expert users. The selection of the expert users is based upon multiple criteria that include but are not limited to the voting on the relevance of the electronic document and the activity of the users.
[0019] The voting on the electronic document provides a voting result with the advantage that it enables rapid access to the electronic document as it provides a user with a score as to the relevance of the electronic document.
[0020] In a further aspect a means is provided for the user to receive specific information i.e. advertisements that are relevant to the electronic documents searched by the user and the voting result of the electronic document.
[0021] In a further aspect the user or one of a plurality of third users is allowed to connect further electronic documents to the results of the search enabling the user to reliably locate information that is connected to and/or relevant to the search conducted. The user is able to specify a manner and a type of connection between the further electronic documents and the electronic documents. The further electronic documents have an arbitrary user-defined context, as will be explained below.
[0022] The advantages of the present invention include allowing the user to rapidly access internet content accurately and to avoid the need to visit a plurality of search engines and/or web pages to locate the electronic document. The present invention also enables the user in one aspect to continually refine search queries.
[0023] In a further aspect the invention provides the user with the ability to receive advertising that is relevant to the subject matter of the search in which the user is interested. The advertising received by the user can also be influenced by the user's interests and the user's expertise as well as by the search query.
[0024] A further aspect of the invention allows for the maintenance of a dynamic search procedure for a user by providing a map of related internet information based upon the categorization by different ones of the users of the same (or similar) pages.
Description of Figures
[0025] The foregoing features of the present invention will be apparent from the following detailed description of the invention, taken in conjunction with the accompanying drawings in which:
Figure 1 shows a schematic of an apparatus for searching and grouping a subset of a plurality of electronic documents according to the present invention.
Figure 2 shows a schematic of a device for the selection and grouping of a subset of a plurality of electronic documents according to the present invention.
Figure 3 shows a schematic of a method for the selection and grouping of a subset of a plurality of electronic documents according to the present invention.
Figure 4 shows a schematic for an exemplary method for retrieving and voting on a selection and grouping of a subset of a plurality of documents according to the present invention.
Figure 5 shows a schematic of a method for evaluating users according to the present invention.
Figure 6 shows a schematic of a method for generating advertising for display to a user on a terminal according to the present invention.
Figure 7 shows a schematic of a virtual map relating to a search conducted by a user.
Figure 8 shows an overview of the information stored on a database according to the present invention.
Figure 9 shows a schematic representation of a system according to another aspect of the invention.
Detailed Description of the Invention
[0026] For a complete understanding of the present invention and the advantages thereof, reference is made to the following detailed description taken in conjunction with the accompanying figures.
[0027] It should be appreciated that various aspects of the present invention discussed herein are merely illustrative of the specific ways to make and use the invention and do not limit the scope of the invention when taken into consideration with the claims and the following detailed description and the accompanying figures.
[0028] It should be observed that features from one aspect of the invention can be combined with features from other aspects of the invention.
[0029] The teachings of the cited documents should be incorporated by reference into the description.
[0030] Figure 1 shows an apparatus 10 for searching and grouping of a subset of a plurality of electronic documents 70 that are searched for by a user 20 over a web 60. The web 60 refers to the internet, but could also be an internal intranet, documents stored on a user computer or a combination of the two. The apparatus 10 shows a plurality of users 20 who are at a terminal for accessing the web 60. The users 20 include the user 20 who is performing the search for the relevant electronic document 70 and further users 20 who are not the user 20 performing the search (i.e. not entering a search string).
[0031] The terminal includes any means for the user 20 to access the web 60 and includes, but is not limited to, a personal computer, PDA, mobile phone etc.
[0032] The user 20 can access the web 60 by a network connection 25. The network connection 25 includes any known method in the art for connecting to the web 60, such as fixed line communication or a mobile communication.
[0033] The users 20 are connected to a search engine 30 and, in some aspects of the invention, to an advertising engine 80. The search engine 30 is connected to the web 60 by an internet connection 50. The search engine 30 is connected to a database 40. The database 40 is used to store details of the searches performed by the users 20. The search engine 30 comprises a processing device 35. The processing device 35 enables the search and allocates a selected one or more of the plurality of electronic documents 70 to one or more of a plurality of groups 45, as will be explained later. The groups 45 can be envisaged to be virtual "bookshelves" storing, for example, at least some of the electronic documents 70 generated as results of the search, as will become clearer later. It will be noted, however, that the groups 45 can also store further ones of the electronic documents 70 that are not generated by the search engine 30. Such further electronic documents 70 could include other electronic documents 70 that the user 20 discovers through general browsing or following links in other electronic documents 70. It will be noted at this stage that the electronic documents 70 may be available and stored on the world wide web or on a private intranet (or indeed on the user's own personal computer).
[0034] The search engine 30 further comprises a selection device. The selection device is responsive to an input of the user 20 for selecting one or more of the plurality of electronic documents 70. The database 40 stores a document identifier 47 wherein the document identifier 47 is used to identify one of a plurality of the electronic documents 70. The identifiers 47 could be in one aspect of the disclosure the URL (Uniform Resource Locator) of the electronic documents 70 or could be a Digital Object Identifier, such as that known from the ANSI/NISO standard Z39.84-2005 Syntax for the Digital Object Identifier.
[0035] The advertising engine 80 has a plurality of modes of operation. The function of the advertising engine 80 includes but is not limited to matching the interests of the user 20 with a selection of advertisements held in the advertising engine 80 (or an associated database). The selection is based, for example, on the groups 45 established by the user 20 and/or on tags given by the user 20. Furthermore the advertising engine 80 can use the search terms used by the user 20 in the search and map these search terms to the advertising to generate relevant advertising.
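
The matching performed by the advertising engine 80 could, for instance, be sketched as a keyword overlap between the advertisements and the user's search terms, groups 45 and tags. The following is a minimal sketch under stated assumptions: the function `select_adverts`, the keyword sets attached to each advertisement, the overlap score and the sample data are all hypothetical and are not part of the disclosure.

```python
def select_adverts(adverts, search_terms, groups=(), tags=(), limit=3):
    """Rank advertisements by keyword overlap with the user's interests.

    adverts: dict mapping an advert id to a set of keywords (assumed layout).
    search_terms, groups, tags: strings describing the user's interests.
    """
    interests = {t.lower() for t in (*search_terms, *groups, *tags)}
    scored = []
    for advert_id, keywords in adverts.items():
        overlap = interests & {k.lower() for k in keywords}
        if overlap:
            scored.append((len(overlap), advert_id))
    # Highest overlap first; ties broken by advert id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [advert_id for _, advert_id in scored[:limit]]


# Hypothetical advertisements held in the advertising engine.
adverts = {
    "ad-lens": {"photography", "lenses"},
    "ad-tripod": {"photography"},
    "ad-boots": {"hiking"},
}

print(select_adverts(adverts, ["photography"], groups=["sports"], tags=["lenses"]))
```

Advertisements that share no keyword with the user's interests are simply omitted rather than ranked last.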
[0036] An aspect of the present invention will now be described in reference to figure 2. Figure 2 shows a schematic of a screenshot of a main screen 200 when the user 20 is using the present invention for searching the web 60 for one of the electronic documents 70.
[0037] The user 20 will access the main screen 200 using the terminal. The user 20 will enter the search string. The search string is a keyword for which the user 20 wishes to find one or more electronic documents 70 relating to the keyword. The user 20 will input the search string using, for example, a keyboard connected to the terminal. The user 20 inputs the search string into a search bar 210. In the example shown in figure 2, the search string is "photography". The user 20 wishes to search the web 60 for electronic documents 70 that pertain to photography. The user 20 will then click a search button 215 to begin the search of the web 60 for the electronic document 70 including the search term "photography". The results of the search are displayed in a result window 220. The result window 220 shows a title of the relevant electronic documents 70 and a summary of the relevant ones of the electronic documents 70. In the example of figure 2 there are shown six examples of the relevant documents 70 that pertain to the search string photography, search results 220a-220f. It will be noted that the example shown in Fig. 2 is merely exemplary. Other arrangements of the search results 220a-220f in the results window 220 as well as other clickable buttons can be envisaged.
[0038] When the user 20 is presented with the search results 220a-220f of the search in the result window 220 the user 20 can click on any one of the search results 220a-220f to select the electronic document 70 and obtain more detail about the selected one of the plurality of search results 220a-220f in a result display 260. The result display 260 contains a number of clickable buttons that allow the user 20 to manipulate information pertaining to the selected one of the plurality of search results 220a-220f. The clickable buttons in the example of Fig. 2 are a bookmark button 225, a vote button 275, a connection button 265, and a journey button 270. The function and operation of the clickable buttons will be described in greater detail below.
[0039] The main screen 200 in the example of Fig. 2 also comprises a library section 250. The library section 250 contains a searches and journeys category 245, a bookshelf category 230, a connections category 235 and an account category 240.
[0040] By selecting the searches and journeys category 245, the user 20 is presented with a list of recent search activity conducted by the user 20. The search activity includes the search activity of the user 20 during the last month, the last week or the current day. The search activity could also show a tree representation of the user's 20 search activity.
[0041] By clicking the journey button 270 the user 20 is presented with a journey map 261 that is relevant to the search conducted. The journey map 261 allows the user 20 to navigate to electronic documents 70 on the web 60 which are connected to the search conducted. The journey feature will be discussed later in more detail with reference to figure 7.
[0042] The bookshelf category 230 is used to categorize electronic documents 70 when the user 20 clicks the bookmark button 225. The bookshelf category 230 can contain a plurality of sub-directories. The sub-directories are more specific with respect to the search conducted by the user 20. In the example with respect to photography shown in figure 2, the bookshelf category 230 contains two sub-directories 230a and 230b, for sports and Java respectively.
[0043] The connection button 265 in the result display window 260 can be used to create a connection between the search result of the electronic document 70 and another one of the plurality of electronic documents 70 to which the searched electronic document 70 may have a connection. In the example shown in figure 2 the result of the electronic document 70 shown in the result display 260 could have a connection 235a to another one of the electronic documents 70 in the area of photography or a connection 235b to a further electronic document 70 in the category of photo lenses. The feature of creating connections 235 between electronic documents 70 enables the user 20 to rapidly locate electronic documents 70 that are relevant to the electronic document 70 initially searched. The connections 235 allow the user 20 to make connections between electronic documents 70 in a manner desired by the user 20. One example could be the selection by the user 20 of a degree of difficulty of the documents. The user 20 could make connections to indicate an increasing or a decreasing degree of complexity between connected ones of the electronic documents 70. The method in which the connections 235 are created will be described later with respect to figure 7.
[0044] The account category 240 of the library section 250 allows the user 20 to manage a user account and select preferences. The account category 240 includes details of the user 20 and user preferences, and provides a means for managing the bookshelves 230a, 230b.
[0045] A method for the selection and grouping of a subset of the plurality of electronic documents 70 to form the bookshelf will now be described with reference to figure 3. In the method 300 a first step 305 is the start. One of the users 20 wishes to search for an electronic document 70 via the web 60.
[0046] In step 310 the user 20 inputs the search string, i.e. the search term that is relevant to the search being conducted, into the search bar 210. The user 20 will then click the search button 215 to commence the search of the web 60. In the next step 315 the web 60 is searched for relevant content with respect to the search string entered by the user 20. The search is carried out using one or more search engines which can include but are not limited to the Google search engine or the Yahoo search engine as well as using the system described in this application. The relevant content will be one or more electronic documents 70 which include the search string. The relevant content can be hierarchically ordered using different document source indices provided by the different ones of the search engines. It will be explained later that the relevant result of the search for the electronic document 70 can be further refined using past history and voting results.
[0047] In the next step 320 a list of the search results is presented to the user 20 in the results window 220. The results window 220 displays at least one of the search results. In the next step 325 the user 20 can select one search result from the list of the search results displayed in the results window 220. When the user selects one, e.g. 220c, of the plurality of search results 220a-220f from the result window 220 a summary of the selected search result 220c is displayed in the result display 260. The summary of the selected search result 220c could be part of the electronic document 70 or an abstract of the electronic document 70 related to the search result 220c.
[0048] The user 20 can then bookmark the selected search result 220c in step 330 by clicking the bookmark button 225. The user 20 categorizes the selected search result 220c into one of the bookshelves 230a, 230b and the selected search result 220c is saved in the selected bookshelf category 230. The user 20 can access the result of the saved bookshelf at any time. It will be noted at this stage that the bookshelves 230a, 230b can also be accessed by other ones of the users 20 including other users 20 who did not create or previously store any electronic documents 70 into one of the bookshelves 230a, 230b. It will be further noted that the user 20 may add tags to the electronic document 70. The user-generated tag can also be used to categorize the electronic documents 70.
[0049] The bookmark that is displayed in the bookshelf category 230 is stored in the database 40 in step 335. The database 40 has a number of tables 42a, 42b to store information relating to the selected search result 220c, as will be explained later.
[0050] When the user 20 has saved (bookmarked) the selected search result 220c, the user 20 can continue with a further selection of one of the plurality of search results 220a, 220b, 220d, 220e and 220f from the result window 220 from the initial search conducted. In this case the user 20 returns to step 330. The user 20 can select another one from the list of the search results 220a, 220b, 220d, 220e and 220f displayed in the result window 220. Again the user 20 can save the other search result (one of 220a, 220b, 220d, 220e and 220f) as a bookmark by clicking the bookmark button 225. Should the user 20 not wish to select a further one of the search results (one of 220a, 220b, 220d, 220e and 220f), then the user 20 can commence a new search as in step 345 whereby the process begins as in step 305. To take the example above, the user 20 might carry out the new search for a related search term "photographers" instead of the search term "photography". The related search term "photographers" will produce a new list of the search results, which can be selected and saved by the user 20. The user 20 does not have to create a further bookshelf category 230. Instead the user 20 can still use the existing bookshelf category 230. This allows the user 20 to group all of the relevant electronic documents 70 into bookshelf categories 230 with names with which the user 20 is familiar.
[0051] When the user 20 saves the selected search result 220c to the database 40, a document identifier 47 that identifies the selected electronic document 70 is saved. The document identifier 47 is typically the uniform resource locator (URL) of the electronic document 70 but could also be another identifier, such as a digital object identifier.
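By way of illustration only, the bookmarking of steps 330 and 335 might be sketched as follows. This is a minimal sketch and not the disclosed implementation: the class names `Bookmark` and `BookshelfStore`, the in-memory dictionary standing in for the database 40, and the example URLs are all assumptions introduced for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One saved search result: who saved which document (hypothetical type)."""
    user_id: int                     # stands in for a user identification
    document_id: str                 # document identifier 47, typically a URL
    tags: frozenset = frozenset()    # optional user-generated tags


class BookshelfStore:
    """In-memory stand-in for the database 40 (assumption, not the real store)."""

    def __init__(self):
        # bookshelf name -> bookmarks in the order they were saved
        self._shelves = defaultdict(list)

    def bookmark(self, shelf, user_id, document_id, tags=()):
        """Save a selected search result into the named bookshelf."""
        entry = Bookmark(user_id, document_id, frozenset(tags))
        if entry not in self._shelves[shelf]:   # ignore exact repeats
            self._shelves[shelf].append(entry)
        return entry

    def documents(self, shelf):
        """Return the document identifiers on a shelf; any user may read it."""
        return [b.document_id for b in self._shelves.get(shelf, [])]


# Two results from different searches saved to the same existing bookshelf.
store = BookshelfStore()
store.bookmark("photography", 1, "http://example.com/lenses", tags=["lens"])
store.bookmark("photography", 1, "http://example.com/portraits")
print(store.documents("photography"))
```

The sketch keys the store by bookshelf name so that results from a later, related search can be added to an existing bookshelf without creating a new one, as described in paragraph [0050].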
[0052] Figure 4 shows a schematic of a method used by the user 20 for retrieving a previous search and reviewing the previous search and voting upon the search results of the previous search. The ability to vote on the electronic document 70 allows a second user 20 to search more accurately as votes are associated with the electronic document 70 in the form of a voting result 49. The system of the invention can also be used to categorize the quality of the electronic document 70 by virtue of the voting result 49. The voting result 49 allows the second user 20 to rapidly identify qualitatively good ones of the electronic documents 70, which are relevant to the search requested.
[0053] The voting on the electronic documents 70 can also be carried out at other points during the search procedure. This is done by clicking an appropriate voting button on the main screen 200 or on another screen. The user 20 is given an opportunity to vote on the electronic document 70 being displayed.
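As a rough illustration of how a voting result 49 could be accumulated and used to order documents, consider the sketch below. It assumes the three vote values described in paragraph [0054] (positive, negative, undecided). The names `Vote` and `VoteLog`, the rule of keeping only the latest vote per user and document, and the net score used for ranking are assumptions made for the example and are not taken from the description.

```python
from collections import Counter
from enum import Enum


class Vote(Enum):
    POSITIVE = 1
    UNDECIDED = 0
    NEGATIVE = -1


class VoteLog:
    """Keeps the latest vote of each user for each document (hypothetical)."""

    def __init__(self):
        # (document_id, user_id) -> Vote
        self._votes = {}

    def cast(self, document_id, user_id, vote):
        """Record a vote; a later vote by the same user replaces the earlier one."""
        self._votes[(document_id, user_id)] = vote

    def voting_result(self, document_id):
        """Return the count of each vote value and a net score for one document."""
        counts = Counter(
            vote for (doc, _), vote in self._votes.items() if doc == document_id
        )
        net = sum(vote.value * n for vote, n in counts.items())
        return {
            "positive": counts[Vote.POSITIVE],
            "negative": counts[Vote.NEGATIVE],
            "undecided": counts[Vote.UNDECIDED],
            "net": net,
        }

    def rank(self, document_ids):
        """Order documents by net score, highest first; ties keep input order."""
        return sorted(
            document_ids,
            key=lambda doc: self.voting_result(doc)["net"],
            reverse=True,
        )


log = VoteLog()
log.cast("doc-a", 1, Vote.POSITIVE)
log.cast("doc-a", 2, Vote.POSITIVE)
log.cast("doc-a", 3, Vote.NEGATIVE)
log.cast("doc-b", 1, Vote.UNDECIDED)
log.cast("doc-c", 1, Vote.NEGATIVE)
print(log.rank(["doc-c", "doc-b", "doc-a"]))
```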
[0054] After the start 405, the user 20 retrieves a previously stored search result from the bookshelf category 230 in step 410. It will be noted that the user 20 could be either the original user 20 or, more preferably, the second user 20. In step 415 the user 20 can then select one of a plurality of electronic documents 70 that are displayed in the results window 220. The selected electronic document 70 will then be displayed in the results window 220. Upon selection of the one electronic document 70 from the results window 220, the summary of the selected electronic document 70 is displayed in more detail in the result display 260, as discussed above. When the summary of the electronic document 70 is displayed in the result display 260, the user 20 can then review the electronic document 70 in step 420 as s/he wishes. The user 20 can vote on the "quality" of the electronic document 70 in step 425. The user 20 can vote on the electronic document 70 by making a selection on the vote button 275. The user 20 can vote on the electronic document 70 with a positive result, a negative result, or a result whereby the user 20 is undecided as to the relevance and accuracy (or other factors affecting the "quality") of the electronic document 70 with respect to the search entered. When the user 20 has voted on the electronic document 70, the vote is stored in the database 40 in step 430 as the voting result 49. Tables are used for storing the voting result 49, as will be discussed below. The voting result 49 of the electronic document 70 is used in one aspect of the invention to control the order of display of the electronic documents 70 retrieved in a search, as explained above.
[0055] In a further aspect of the present invention, the system provides a method for evaluating the users 20. The advantage of evaluating the users 20 is that it provides a means for determining who are "expert" users 20, i.e. which ones of the users 20 could be considered experts in their field. Details of the expert users 20 can be observed by other users 20. This feature allows other users 20 to trust the voting results that the electronic documents 70 have received from the expert user, thus determining the quality of the user 20 with respect to the score the electronic document 70 has received. This aspect of the invention is described in greater detail with reference to figure 5.
[0056] The method for evaluating a user 20 begins in step 510. In step 520, the system selects the user 20 to be evaluated. The system then selects in step 530 the electronic documents 70 that the user 20 has bookmarked and divides these bookmarked electronic documents 70 by category. In step 540 the system then reviews the voting result 49 for each one of the categories that the bookmarked electronic documents 70 have received from the other users 20. In step 550 the system then tabulates the voting results 49 received for the user 20 to be evaluated with the subject matter of the bookmarked electronic documents 70. The tabulation of the voting results 49 is carried out for each one of the categories to allow an evaluation of the users 20 for each one of the categories.
[0057] In a further aspect of the invention, the voting results 49 are weighted by the degree of expertise of the other users 20. So, for example, the votes given by other users 20 who are considered to be experts in the category to which the electronic document 70 is assigned will be assigned a higher weight.
[0058] Natural language processing methods or ontologies can be used to relate different ones of the categories to each other. So, for example, some categories may be synonyms of other categories. In this case it would make sense to use all of the voting results 49 for all of the categories to assess the level of expertise of the user 20. Other categories may be "supercategories" of a plurality of subcategories. In this case it is useful to define experts for both the supercategory and the subcategory. For example, an expert in sheepdogs may not necessarily be an expert in lapdog breeding (both of which would be subcategories of the "dog" supercategory). However, both of the users 20 would be considered to be experts in the dog supercategory.
[0059] In a further aspect of the present invention, the invention allows the user 20 to receive selected advertisements that are relevant to the search conducted by the user 20, such that the advertisements display content that might be of interest to the user 20. So, for example, if the user 20 selects the search term "photography", advertisements relating to photography will be displayed. This is similar to the AdSense system used by Google to produce relevant advertisements to users of the Google search engine.
[0060] The advertisements that are displayed to the user 20 can be dynamically modified in accordance with one or more of the following criteria: the voting result 49 that the electronic documents 70 have received during the voting process, the selection of the search results, the account of the user 20, previous ones of the searches carried out by the user 20, the categories of interest to the user 20, the bookmarked electronic documents 70 of the user 20 and the categories for which the user 20 is regarded as an expert. This aspect of the invention is described with reference to figure 6.
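One way to picture how such criteria could drive the choice of advertisements is the keyword-overlap sketch below. It is a simplified illustration only: paragraph [0061] states that the selection depends on keywords associated with the advertisements, on tags and on the entered search term, but the scoring by overlap count, the `Advertisement` type, the `select_advertisements` function and the sample advertisements are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    """An advertisement with its associated keywords (hypothetical type)."""
    title: str
    keywords: frozenset


def select_advertisements(ads, search_terms, tags=(), categories=(), limit=3):
    """Rank advertisements by how many keywords match the user's context.

    The context is the union of the entered search terms, the tags of the
    selected document and the user's categories of interest.
    """
    context = {term.lower() for term in (*search_terms, *tags, *categories)}
    scored = []
    for ad in ads:
        overlap = len(context & {keyword.lower() for keyword in ad.keywords})
        if overlap:                       # drop advertisements with no match
            scored.append((overlap, ad))
    # Stable sort: advertisements with equal overlap keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad for _, ad in scored[:limit]]


ads = [
    Advertisement("Camera shop", frozenset({"photography", "camera"})),
    Advertisement("Lens rental", frozenset({"photography", "lenses"})),
    Advertisement("Dog food", frozenset({"dog"})),
]

# Initial search, then a narrowed search that adds a further term.
first = select_advertisements(ads, ["photography"])
second = select_advertisements(ads, ["photography", "lenses"])
print([ad.title for ad in first])
print([ad.title for ad in second])
```

Calling the function again with the previous and the most recent search terms together mirrors the refinement of the advertisements after a further search term is entered.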
[0061] Figure 6 shows a method for selective advertising according to the present invention. In the first step 610 the user 20 will select an electronic document 70, which the user 20 may have searched for or which is stored in the bookshelf category 230. At this stage the user 20 is presented with advertising that is relevant to the search result selected by the user 20. In step 640 the user 20 may then be presented with one or more advertisements. The user 20 can then input a further search term to narrow down the search in step 650. In the next step 660 the user 20 will then receive more selective advertisements that are based on the previous search and the most recent search conducted by the user 20. The user 20 is then presented in step 670 with one or more advertisements, which are highly relevant to the search for the electronic document 70. The selection of the advertisements depends on keywords associated with the advertisements and any tags associated with a selected one of the electronic documents 70, as well as on the entered search term.
[0062] In a further aspect of the present invention, the user 20 is presented with a map of information that is relevant to the search of the electronic document 70. The map allows the user 20 to rapidly navigate to related topics with respect to the search. This aspect of the invention is described with reference to figure 7.
[0063] When the user 20 is presented with the results of a search for an electronic document 70, the list of search results is displayed in the results window 220 as discussed above. The user 20 can select one of the search results, which is displayed in the result display 260. The user 20 can click on the journey button 270. The result display 260 shows a journey map 261 of the result of the search of the electronic document 70. The selected electronic document 70 is designated as a central home icon 262. Connected to the central home icon 262 are virtual routes 263 which are akin to roads of a road map. At the end of the virtual routes 263 are links 264 to further ones of the electronic documents 70 on the web 60. The links 264 are to electronic documents 70 which are relevant to the search conducted by the user 20. The user 20 clicks one of these links 264 to rapidly navigate to a further electronic document 70 on the web 60. In the example of figure 7, the central home icon 262 will relate to photography. Should the user 20 click on a link 264 of the virtual route 263 titled prior knowledge, then the user 20 will be directed to a group of the electronic documents 70 on the web 60 that contains prior knowledge on photography. In a further example, should the user 20 click on a link 264 titled fashion photography, the user 20 is taken to a group of the electronic documents 70 on the web 60 that contains information relating to fashion photography. Should the user 20 click on any link 264 of the journey map 261, the user 20 has the ability to vote upon the content of the electronic document 70 as previously described.
[0064] The links 264 can be generated automatically in one aspect of the invention. This is done by looking for other bookshelves 45 which contain the same electronic documents 70. One or more links 264 are generated to the other bookshelves 45 which share one or more of the electronic documents 70.
[0065] The user 20 can create additional links 264 and therefore additional virtual routes 263 by clicking on the connection button 265. By clicking the connection button 265 the user 20 can create the link 264 to the electronic document 70 on the web 60. The link 264 can be selected from a connection 235a, 235b that is stored in the connections category 235. In the example of figure 7, by clicking the connection button 265 (not shown) the user 20 can create a link 264 from the central home icon 262 (in this example photography) to the further electronic document 70, which would be in the area of photo lenses 235b. When the user 20 creates the new connection to photo lenses, a new virtual route 263 with a link 264 to photo lenses will be created from the central home icon 262 (photography).
[0066] The features of the database 40 shall now be described in greater detail with reference to figure 8. Figure 8 shows a schematic of a layout of the information that is held on the database 40. The database 40 in this example comprises two tables. It will be noted that in practice there will be many different tables and that tables may be generated automatically during update operations or may be defined by a programmer. It is not intended that the number of tables is limiting of the scope of the invention. The tables contain information that is relevant to the users 20 and the electronic documents 70 searched by the users 20. The database 40 in this example of the invention comprises a bookmark table 42a and a user subject metrics table 42b.
[0067] The bookmark table 42a consists of information that includes user identification information 48. The user identification 48 comprises an identification number of the user 20 who selected the electronic document 70 identified by the identifier information 47. The bookmark table 42a further comprises identifier information 47 that identifies content of electronic documents 70 searched by the user 20. The identifier information 47 includes the URL of the electronic documents 70 which have been searched, tags that have been searched and further includes information on the so-called PIP index of the electronic document 70.
[0068] The PIP index is a measure of how good a page is in relation to the category into which the electronic document 70 has been categorized. One simple measure of the PIP index is the number of weighted votes that the electronic document 70 has received. The weighting of the votes is determined, as noted above, by the degree of expertise of the user 20. It is possible that the electronic document 70 has more than one PIP index. For example, there may be different PIP indices for the different categories to which the electronic document 70 is assigned.
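The weighted-vote measure, together with the per-category tabulation of steps 530 to 550, might be sketched as follows. This is an illustrative sketch under stated assumptions: votes are encoded as +1, 0 and -1, a user's expertise in a category is taken to be the sum of the votes that other users gave to that user's bookmarked documents, and a voter's weight is a base weight plus any positive expertise. None of these formulas is specified in the description, and the function names are hypothetical.

```python
from collections import defaultdict


def tabulate_expertise(bookmarks, votes):
    """Tabulate, per (user, category), votes received on that user's bookmarks.

    bookmarks: iterable of (user_id, category, document_id)
    votes:     iterable of (voter_id, document_id, value), value in {+1, 0, -1}
    Votes that a user casts on his or her own bookmarks are ignored, since
    only votes from the other users are reviewed.
    """
    votes_by_document = defaultdict(list)
    for voter_id, document_id, value in votes:
        votes_by_document[document_id].append((voter_id, value))

    expertise = defaultdict(int)
    for user_id, category, document_id in bookmarks:
        expertise[(user_id, category)] += sum(
            value
            for voter_id, value in votes_by_document[document_id]
            if voter_id != user_id
        )
    return dict(expertise)


def pip_index(document_id, category, votes, expertise, base_weight=1.0):
    """Weighted vote total for one document within one category."""
    total = 0.0
    for voter_id, voted_document, value in votes:
        if voted_document != document_id:
            continue
        # Assumed weighting: a positive tabulated score raises the weight.
        weight = base_weight + max(0, expertise.get((voter_id, category), 0))
        total += weight * value
    return total


bookmarks = [(1, "photography", "doc-a"), (2, "photography", "doc-b")]
votes = [
    (2, "doc-a", 1), (3, "doc-a", 1), (1, "doc-a", 1),
    (1, "doc-b", 1), (3, "doc-b", -1),
]
expertise = tabulate_expertise(bookmarks, votes)
print(expertise)
print(pip_index("doc-b", "photography", votes, expertise))
```

Because the expertise table is keyed by category, the same document can receive a different index for each category to which it is assigned.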
[0069] The user subject metrics table 42b includes user identification information 48 which is a reference or identification number pertaining to the user 20 who has searched for one of the electronic documents 70 over the web 60. The user subject metrics table 42b further includes subject information 45 relating to the subjects 45 searched by the user 20. The subject information 45 is the same information presented to the user 20 in the bookshelf category 230 according to figure 2. The user subject metrics table 42b also contains information relating to the voting result 49 relevant to the electronic document 70 as well as the users 20 who have voted on the electronic document 70. The user subject metrics table 42b also has information relating to the number of times an electronic document has been viewed in search results 220a-220f.
[0070] The information contained in the user subject metrics table 42b enables a variety of functions. For example, the information will enable the ordering of the electronic documents 70 in the search results.
[0071] The database 40 will also store details of the links 264 between the various electronic documents 70.
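The automatic generation of links 264 between bookshelves that share documents, as described in paragraph [0064], could be sketched with a small relational layout. The sketch uses Python's standard `sqlite3` module; the table names `bookmark` and `link`, their columns, and the sample rows are assumptions for illustration and do not reproduce the tables 42a, 42b of figure 8.

```python
import sqlite3

SCHEMA = """
CREATE TABLE bookmark (
    user_id      INTEGER NOT NULL,   -- user identification 48
    bookshelf    TEXT    NOT NULL,   -- bookshelf / subject 45
    document_url TEXT    NOT NULL,   -- document identifier 47
    PRIMARY KEY (user_id, bookshelf, document_url)
);
CREATE TABLE link (
    from_shelf TEXT    NOT NULL,
    to_shelf   TEXT    NOT NULL,
    shared     INTEGER NOT NULL,     -- number of documents in common
    PRIMARY KEY (from_shelf, to_shelf)
);
"""


def generate_links(conn):
    """Link every pair of bookshelves that share at least one document."""
    conn.execute("DELETE FROM link")
    # Self-join on the document; a.bookshelf < b.bookshelf keeps one row per pair.
    conn.execute(
        """
        INSERT INTO link (from_shelf, to_shelf, shared)
        SELECT a.bookshelf, b.bookshelf, COUNT(DISTINCT a.document_url)
        FROM bookmark AS a
        JOIN bookmark AS b
          ON a.document_url = b.document_url
         AND a.bookshelf < b.bookshelf
        GROUP BY a.bookshelf, b.bookshelf
        """
    )
    return conn.execute(
        "SELECT from_shelf, to_shelf, shared FROM link "
        "ORDER BY from_shelf, to_shelf"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO bookmark VALUES (?, ?, ?)",
    [
        (1, "photography", "http://example.com/lenses"),
        (2, "photo lenses", "http://example.com/lenses"),
        (2, "photo lenses", "http://example.com/zoom"),
        (3, "dogs", "http://example.com/sheepdogs"),
    ],
)
links = generate_links(conn)
print(links)
```

Storing the number of shared documents alongside each link is a design choice of the sketch; it would allow the virtual routes of a journey map to be ordered by how strongly two bookshelves overlap.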
[0072] Figure 9 is a schematic representation of a system according to another aspect of the invention, showing an expertise engine 910 and associated votes 49, bookmarks 960, user data 970, and users 20, 920. The expertise engine 910 is joined in communication to the web 60 in any suitable manner. The system also includes a bookshelf 930 including bookshelf categories, and a database 940 including document identifiers and votes on the electronic documents.
[0073] Having thus described the present invention in detail, it is to be understood that the foregoing detailed description is not intended to limit the scope of the invention. What is desired to be protected by letters patent is set forth in the following claims.
Reference Numerals
10 System
20 User
25 Network connections
30 Engine
35 Processor
40 Database
42 Tables
42a Bookmark table
42b User subject metrics table
45 Bookshelf
47 Identifiers
49 Voting result
50 Internet connection
60 Web
70 Electronic document
80 Advertising engine
200 Main screen
210 Search bar
215 Search button
220 Results window
220a-220f Search results
225 Bookmark button
230 Bookshelf category
235 Connections
240 My account
245 Searcher and Journeys
250 Library
260 Result display
261 Journey map
262 Central home icon
263 Virtual route
264 Link
265 Connection button
270 Journey Button
275 Vote Button

Claims
1. An apparatus (10) for searching and grouping a subset of a plurality of electronic documents (70) comprising:
- a database (40) connected to a search engine (30), wherein the database (40) stores
- a plurality of document identifiers (47), wherein a one of the identifiers (47) identifies a one of the plurality of electronic documents (70), wherein one or more of the identifiers (47) is arranged into one of a plurality of groups (45),
- and a plurality of links between one or more of the document identifiers (47);
- the search engine (30) connectable to a corpus (60) comprising the plurality of electronic documents (70) and to at least one user (20);
- a selection device responsive to an input for selecting the one or more of the plurality of electronic documents (70); and
- a processing device (35) for allocating the selected one or more of the plurality of electronic documents (70) to one or more of the plurality of groups (45), and for generating one or more links (264) between several of the plurality of electronic documents (70).
2. The apparatus (10) of claim 1, wherein the database (40) further includes user identifiers (48) associated with the document identifiers (47).
3. The apparatus (10) of any one of the above claims, wherein the database further includes one or more voting results (49) associated with the document identifiers (47).
4. The apparatus (10) of any one of the above claims, further comprising an advertising engine (80) in connection with the processing device (35) and the user (20).
5. The apparatus (10) of any one of the above claims wherein the one or more links (264) comprise at least one of a prior knowledge link, related subject link, supercategory link and synonym link.
6. The apparatus (10) of any one of the above claims, further comprising a result display (260) for displaying one or more links (264) between a selected one of the plurality of electronic documents (70).
7. A method for the selection and grouping of a subset of a plurality of electronic documents (70) comprising:
- retrieving (310, 315, 320) one or more of the plurality of electronic documents (70);
- selecting (325) one of the retrieved plurality of electronic documents (70);
- grouping (330) the selected one of the plurality of the electronic documents (70) into an associated one or more of a plurality of groups (45);
- saving (335) a document identifier (47) to the selected one of the plurality of electronic documents (70) and the associated one or more of the plurality of groups (45); and
- creating one or more links (264) between the selected one of the plurality of electronic documents (70) and another one of the electronic documents (70).
8. The method of claim 7, further comprising saving (335) a user identification (48) associated with the document identifier (47).
9. The method of any one of claims 7 or 8, further comprising:
retrieving (410) one of the plurality of groups (45);
reviewing (415, 420) one of the plurality of electronic documents (70) associated with the retrieved one of the plurality of groups (45); and
voting (425) on the reviewed one of the plurality of electronic documents (70).
10. The method of any one of claims 7 to 9, further comprising defining a category of the link (264) between two of the plurality of electronic documents (70).
11. A method of navigating between the plurality of electronic documents (70) comprising:
- displaying a first one of the plurality of electronic documents (70) and at least one link (264) associated with the first one of the plurality of electronic documents (70);
- selecting the at least one link (264);
- displaying second ones of the plurality of electronic documents (70) associated with the selected at least one link (264).
12. The method of claim 11 further comprising:
- displaying a plurality of links (264) having different categories.
13. The method of claim 11 or 12, further comprising:
- selecting one of the second ones of the plurality of electronic documents (70).
14. A device for the selection and grouping of a subset of a plurality of electronic documents (70) comprising:
- a search bar (210) for entering a search string;
- a results window (220) for presenting one or more search results (220a-f) generated from the search string;
- a selection device for selecting one (220c) of the one or more search results (220a-f);
- a bookmark button (225) for grouping the selected one (220c) of the one or more search results (220a-f);
- a voting button (275) for activating a voting result (49) associated with the selected one (220c) of the one or more search results (220a-f); and
- a connection button (265) for establishing a link (264) between different ones of the one or more search results (220a-f).
15. A database (40) comprising:
a plurality of document identifiers (47) for a plurality of electronic documents (70) grouped into groups (45);
one or more voting results (49) associated with at least one of the plurality of document identifiers (47);
one or more links (264) between different ones of the plurality of document identifiers (47); and
- a user identifier (48) associated with the document identifier (47).
16. A method for evaluating bookmarking users (20) comprising:
selecting (520) a bookmarking user (20);
selecting (530) document identifiers (47) for a subset of electronic documents (70) previously reviewed by the bookmarking user (20);
ranking (550) the user voting results (49) associated with the document identifiers (47).
17. The method of claim 16, wherein the voting results are tabulated from votes given by one or more voting users after reviewing one or more of the subset of electronic documents.
18. A method for generating advertising for display to a user (20) on a user terminal (20) comprising:
- collecting subject information for groups (45) created by the user (20);
- retrieving (630) an advertising list of one or more advertisements relevant to the subject information;
- presenting (640) the retrieved list of one or more advertisements to the user (20);
- on entry (650) of a search term, retrieving (660) a modified list of one or more advertisements relevant to the subject information and to the search term;
- presenting (670) the modified list to the user (20).
19. The method according to claim 17, wherein the advertising displayed to a user is dependent upon a voting result (49) of the subject information of an electronic document (70).
PCT/EP2010/061604 2009-08-10 2010-08-10 Method and apparatus for searching documents WO2011018453A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US23270009P 2009-08-10 2009-08-10
US61/232,700 2009-08-10

Publications (1)

Publication Number Publication Date
WO2011018453A1 true WO2011018453A1 (en) 2011-02-17

Family

ID=42710700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2010/061604 WO2011018453A1 (en) 2009-08-10 2010-08-10 Method and apparatus for searching documents

Country Status (1)

Country Link
WO (1) WO2011018453A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040059708A1 (en) 2002-09-24 2004-03-25 Google, Inc. Methods and apparatus for serving relevant advertisements
WO2006002180A2 (en) * 2004-06-18 2006-01-05 Pictothink Corporation Network content organization tool
WO2006050278A2 (en) * 2004-10-28 2006-05-11 Yahoo!, Inc. Search system and methods with integration of user judgments including trust networks
WO2006089137A1 (en) 2005-02-15 2006-08-24 Infomato Crosslink data structure, crosslink database, and system and method of organizing and retrieving information
US20060294040A1 (en) 2005-05-16 2006-12-28 Ning Zhu Method for using, adding and removing category in directory by documents
US20080059453A1 (en) * 2006-08-29 2008-03-06 Raphael Laderman System and method for enhancing the result of a query
US20080114739A1 (en) 2006-11-14 2008-05-15 Hayes Paul V System and Method for Searching for Internet-Accessible Content
WO2009030990A1 (en) 2007-09-06 2009-03-12 Chin San Sathya Wong Method and system of interacting with a server, and method and system for generating and presenting search results

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AT&T LABS; COFFMAN, K. G.; ODLYZKO, A. M., THE SIZE AND GROWTH RATE OF THE INTERNET, Retrieved from the Internet <URL:netlib.bell-labs.com/netlib/att/math/people/amo/doc/internet.size.ps>

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10740239

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10740239

Country of ref document: EP

Kind code of ref document: A1