CA2210581C - Methods and/or systems for accessing information - Google Patents
- Publication number
- CA2210581C
- Authority
- CA
- Canada
- Prior art keywords
- information
- user
- keywords
- keyword
- meta
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99937—Sorting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
Abstract
A system for accessing information stored in a distributed information database provides a community of intelligent software agents (105). Each agent (105) can be built as an extension of a known viewer (400) for a distributed information system such as the Internet WorldWide Web (W3). The agent (105) is effectively integrated with the viewer (400) and can extract pages by means of the viewer (400) for storage in an intelligent page store. The text from the information system is abstracted and is stored with additional information, optionally selected by the user. The agent-based access system uses keyword sets to locate information of interest to a user, together with user profiles such that pages being stored by one user can be notified to another whose profile indicates potential interest. The keyword sets can be extended by use of a thesaurus.
Description
METHODS AND/OR SYSTEMS FOR ACCESSING INFORMATION
The present invention relates to methods and/or systems for accessing information by means of a communications system.
The Internet WorldWide Web is a known communications system based on a plurality of separate communications networks connected together. It provides a rich source of information from many different providers but this very richness creates a problem in accessing specific information as there is no central monitoring and control.
In 1982, the volume of scientific, corporate and technical information was doubling every 5 years. By 1988, it was doubling every 2.2 years and by 1992 every 1.6 years. With the expansion of the Internet and other networks the rate of increase will continue to increase. Key to the viability of such networks will be the ability to manage the information and provide users with the information they want, when they want it.
In "SIGIR '94. Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval", 3-6 July 1994, Dublin, Ireland; pages 272-281; M. Morita et al.: "Information Filtering Based on User Behavior Analysis and Best Match Text Retrieval", features of information filtering systems are discussed, including one holding user profiles defining, in terms of a list of keywords, a user's preferences for receiving information. The system filters incoming information on the basis of information contained in the user's profile, forwarding items of received information to the user in accord with that profile.
The present invention is not concerned with providing another tool for searching systems such as W3: there are already many of these. They are being added to frequently with ever increasing coverage of the Web and sophistication of search engines. Instead, embodiments of the present invention relate to the following problem: having found useful information on W3, how can it be stored for easy retrieval and how can other users likely to be interested in the information be identified and informed?
According to a first aspect of the present invention, there is provided an information access system, for accessing sets of information stored in a distributed manner and accessible by means of a communications network, the access system having:
i) an input for receiving a set of information;
ii) data storage, or means to access data storage, for storing at least one set of predetermined keywords;
iii) generation means, triggerable to generate at least one set of meta-information from the set of information received at the input, the meta-information including at least a pointer for the set of information when stored in said distributed manner, and to store said set of meta-information in the data storage;
iv) comparison means for comparing at least one of said at least one set of keywords with said at least one set of meta-information; and
v) means for transmitting an alert message in dependence upon the result of the comparison.
In a useful configuration, at least one set of predetermined keywords may be associated with a specified user.
An agent might then be triggered to apply keyword sets to pages of information in (or being added to) the page store by different circumstances for different users. For instance, an agent might apply a first set of keywords in the course of a storage request from a first user. However, the agent might then apply one or more additional sets of keywords in order to notify one or more other users of the entry.
Preferably, a group of agents will share an intelligent page store, although there may be multiple intelligent page stores in or available to the access system as a whole. This sharing of a page store provides a way of enabling an agent to monitor new entries to the page store for notification to potentially interested users.
Embodiments of the present invention provide a distributed system of intelligent software agents which can be used to perform information tasks, for instance over the Internet WorldWide Web, on behalf of a user or community of users. That is, software agents are used to store, retrieve, summarise and inform other agents about information found on W3.
According to a second aspect of the present invention, there is provided a method of monitoring information sets, stored in a distributed manner and accessible by means of a communications network, for the purpose of alerting a first user in accordance with alert criteria determined at least in part by said first user to an information set identified by a second user, the method comprising:
i) storing a user profile for each user, which profile comprises at least one set of keywords and an identifier for the user;
ii) detecting a request by the second user to store, in a data store, information relating to said identified information set;
iii) in response to the request, generating a set of meta-information, dependent on said identified information set, comprising at least a pointer to said identified information set when stored in said distributed manner;
iv) comparing the generated set of meta-information with a keyword set from the user profile for the first user; and
v) in dependence upon the result from the comparison, transmitting an alert message addressed to the first user.
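Steps i) to v) of the method can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the class names `UserProfile` and `MetaInfo`, the function `handle_store_request`, and the `min_matches` criterion are all hypothetical, and the comparison is reduced to a count of shared keywords.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str                                   # identifier for the user
    keywords: set = field(default_factory=set)     # one keyword set per profile


@dataclass
class MetaInfo:
    url: str                                       # pointer to the information set
    keywords: set = field(default_factory=set)


def handle_store_request(storer_id, url, page_keywords, profiles, min_matches=1):
    """Build meta-information for a page being stored (steps ii and iii),
    compare it with every other user's keyword set (step iv) and return
    one alert tuple per matching user (step v)."""
    meta = MetaInfo(url=url, keywords={k.lower() for k in page_keywords})
    alerts = []
    for profile in profiles:
        if profile.user_id == storer_id:
            continue  # the storing user already knows about the page
        matched = meta.keywords & {k.lower() for k in profile.keywords}
        # min_matches is an assumed alert criterion, not taken from the text
        if len(matched) >= min_matches:
            alerts.append((profile.user_id, meta.url, sorted(matched)))
    return meta, alerts
```

In this sketch the alert is only a tuple of recipient, pointer and matched keywords; how it is transmitted is left open.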
Network systems such as W3 are known and are built according to known architectures such as the client/server type of architecture and further detail is not therefore given herein.
Software agents provide a known approach to dealing with distributed rather than centralised computer-based systems. Each agent generally comprises functionality to perform a task or tasks on behalf of an entity (human or machine-based) in an autonomous manner, together with local data, or means to access data, to support the task or tasks. In the present specification, agents for use in storing or retrieving information in embodiments of the present invention are referred to for simplicity as "Jasper agents", this stemming from the acronym "Joint Access to Stored Pages with Easy Retrieval".
Given the vast amount of information available on W3, it is preferable to avoid the copying of information from its original location to a local server.
Indeed, it could be argued that such an approach is contrary to the whole ethos of the Web.
Rather than copying information, therefore, Jasper agents store only relevant "meta-information". As will be seen below, this meta-information can be thought of as being at a level above information itself, being about it rather than being actual information. It can include for instance keywords, a summary, document title, universal resource locator (URL) and date and time of access. This meta-information is then used to provide a pointer to, or to "index on", the actual information when a retrieval request is made.
Most known W3 clients (Mosaic, Netscape, and so on) provide some means of storing pages of interest to the user. Typically, this is done by allowing the user to create a (possibly hierarchical) menu of names associated with particular URLs.
While this menu facility is useful, it quickly becomes unwieldy when a reasonably large number of W3 pages are involved. Essentially, the representation provided is not rich enough to allow capture of all that might be required about the information stored: the user can only provide a string naming the page. As well as the fact that useful meta-information such as the date of access of the page is lost, a single phrase (the name) may not be enough to accurately index a page in all contexts.
Consider as a simple example information about the use of knowledge-based systems (KBS) in information retrieval of pharmacological data: in different contexts, it may be any of KBS, information retrieval or pharmacology which is of interest.
Unless a name is carefully chosen to mention all three aspects, the information will be missed in one or more of its useful contexts. This problem is analogous to the problem of finding files containing desired information in a Unix (or other) file system as described in the paper by Jones, W. P.; "On the applied use of human memory models: the memory extender personal filing system" published in Int J. Man-Machine Studies, 25, 191-228, 1986. In most filing systems however there is at least the facility of sorting files by creation date.
The solution to this problem adopted in embodiments of the present invention is to allow the user to access information by a much richer set of meta information. How Jasper agents achieve this and how the resulting meta-information is exploited is explained below.
An information access system according to an embodiment of the present invention will now be described, by way of example only, with reference to the accompanying Figures in which:
Figure 1 shows an information access system incorporating a Jasper agent system;
Figure 2 shows in schematic format a storage process offered by the access system;
Figure 3 shows the structure of an intelligent page store for use in the storage process of Figure 2;
Figure 4 shows in schematic format retrieval processes offered by the access system;
Figure 5 shows a flow diagram for the storage process of Figure 2;
Figures 6, 7 and 8 show flow diagrams for three information retrieval processes using a Jasper access system; and
Figure 9 shows a keyword network generated using a clustering technique, for use in extending and/or applying user profiles in a Jasper system.
Referring to Figure 1, an information access system according to an embodiment of the present invention may be built into a known form of information retrieval architecture, such as a client-server type architecture connected to the Internet.
In more detail, a customer, such as an international company, may have multiple users equipped with personal computers or workstations 405. These may be connected via a World Wide Web (WWW) viewer 400 in the customer's client context to the customer's WWW file server 410. The Jasper agent 105, effectively an extension of the viewer 400, may be actually resident on the WWW file server 410.
The customer's WWW file server 410 is connected to the Internet in known manner, for instance via the customer's own network 415 and a router 420.
Service providers' file servers 425 can then be accessed via the Internet, again via routers.
Also resident on, or accessible by, the customer's file server 410 are a text summarising tool 120 and two data stores, one holding user profiles (the profile store 430) and the other (the intelligent page store 100) holding principally meta-information for a document collection.
In a Jasper agent based system, the agent 105 itself can be built as an extension of a known viewer such as Netscape. The agent 105 is effectively integrated with the viewer 400, which might be provided by Netscape or by Mosaic etc, and can extract W3 pages from the viewer 400.
As described above, in the client-server architecture, the text summariser 120 and the user profile both sit on file in the customer file server 410 where the Jasper agent is resident. However, the Jasper agent 105 could alternatively appear in the customer's client context.
A Jasper agent, being a software agent, can generally be described as a software entity, incorporating functionality for performing a task or tasks on behalf of a user, together with local data, or access to local data, to support that task or tasks. The tasks relevant in a Jasper system, one or more of which may be carried out by a Jasper agent, are described below. The local data will usually include data from the intelligent page store 100 and the profile store 430, and the functionality to be provided by a Jasper agent will generally include means to apply a text summarising tool and store the results, means to access or read, and update, at least one user profile, means to compare keyword sets with other keyword sets, or with meta-information, and means to trigger alert messages to users.
In preferred embodiments, a Jasper agent will also be provided with means to monitor user inputs for the purpose of selecting a keyword set to be compared.
In further preferred embodiments, a Jasper agent is provided with means to apply an algorithm in relation to first and second keyword sets to generate a measure of similarity therebetween. According to the measure of similarity, either the first or second keyword sets may then be proactively updated by the Jasper agent, or the result of comparing the first or second keyword sets with a third keyword set, or with meta-information, may be modified.
Embodiments of the present invention might be built according to different software systems. It might be convenient for instance that object-oriented techniques are applied. However, in embodiments as described below, the server will be Unix based and able to run ConText, a known natural language processing system offered by Oracle Corporation, and a W3 viewer. The system might generally be implemented in "C" although the client might potentially be any machine which can support a W3 viewer.
In the following section, the facilities which Jasper agents offer the user in managing information are discussed. These can be grouped in two categories, storage and retrieval.
Storage
Figures 2 and 5 show the actions taken when a Jasper agent 105 stores information in an intelligent page store (IPS) 100. The user 110 first finds a page of sufficient interest to be stored by the Jasper system in an IPS 100 associated with that user (STEP 501). The user 110 then transmits a 'store' request to the Jasper agent 105, resident on the customer's WWW file server 410, via a menu option on the user's selected W3 client 115 (Mosaic and Netscape versions are currently available on all platforms) (STEP 502). The Jasper agent 105 then invites the user 110 to supply an associated annotation, also to be stored (STEP 503). Typically, this might be the reason the user is interested in the page and can be very useful for other users in deciding which pages retrieved from the IPS 100 to visit. (Information sharing is further discussed below.)
The Jasper agent 105 next extracts the source text from the page in question, again via the W3 client 115 on W3 (STEP 504). Source text is provided in a "HyperText" format and the Jasper agent 105 first strips out HyperText Markup Language (HTML) tags (STEP 505). The Jasper agent 105 then sends the text to a text summariser such as "ConText" 120 (STEP 506).
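The tag-stripping step (STEP 505) might look something like the following sketch, which uses the `html.parser` module from the Python standard library. The names `TagStripper` and `strip_html` are hypothetical, and skipping the contents of `script` and `style` elements is an added assumption rather than something the description requires.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text of a page while discarding the HTML tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0   # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the text fragments and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(source):
    parser = TagStripper()
    parser.feed(source)
    parser.close()
    return parser.text()
```

The plain text returned by `strip_html` is what would then be handed to the summariser.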
ConText 120 first parses a document to determine the syntactic structure of each sentence (STEP 507). The ConText parser is robust and able to deal with a wide range of the syntactic phenomena occurring in English sentences.
Following sentence level parsing, ConText 120 enters its 'concept processing' phase (STEP 508). Among the facilities offered are:
- Information Extraction: a master index of a document's contents is computed, indexing over concepts, facts and definitions in the text.
- Content Reduction: several levels of summarisation are available, ranging from a list of the document's main themes to a precis of the entire document.
- Discourse Tracking: by tracking the discourse of a document, ConText can extract all the parts of a document which are particularly relevant to a certain concept.
ConText 120 is used by the Jasper agent 105 in a client-server architecture: after parsing the documents, the server generates application-independent marked-up versions (STEP 509). Calls from the Jasper agent 105 using an Applications Programming Interface (API) can then interpret the mark-ups. Using these API calls, meta-information is obtained from the source text (STEP 510). The Jasper agent 105 first extracts a summary of the text of the page. The size of the summary can be controlled by the parameters passed to ConText 120 and the Jasper agent 105 ensures that a summary of 100-150 words is obtained. Using a further call to ConText 120, the Jasper agent 105 then derives a set of keywords from the source text. Following this, the user may optionally be presented with the opportunity to add further keywords via an HTML form 125 (STEP 511). In this way, keywords of particular relevance to the user can be provided, while the Jasper agent 105 supplies a set of keywords which may be of greater relevance to a wider community of users.
At the end of this process, the Jasper agent 105 has generated the following meta-information about the W3 page of interest:
- the ConText-supplied general keywords;
- user-specific keywords;
- the user's annotations;
- a summary of the page's content;
- the document title;
- universal resource locator (URL); and
- date and time of storage.
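The items listed above can be pictured as a single record per stored page. The following is a sketch only: the class name `PageMetaInfo` and its field names are hypothetical, and the description does not say how the files of the IPS are actually laid out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PageMetaInfo:
    url: str                                              # universal resource locator
    title: str                                            # the document title
    summary: str                                          # summary of the page's content
    general_keywords: set = field(default_factory=set)    # supplied by the summariser
    user_keywords: set = field(default_factory=set)       # added by the storing user
    annotation: str = ""                                  # the user's annotation
    stored_at: datetime = field(                          # date and time of storage
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def all_keywords(self):
        """Both kinds of keyword, as used for indexing."""
        return self.general_keywords | self.user_keywords
```

Keeping the two keyword sets separate preserves the distinction between user-supplied and automatically extracted keywords, so that they can be treated differently later on.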
Referring additionally to Figure 3, the Jasper agent 105 then adds this meta-information for the page to files 130 of the IPS 100 (STEP 512). In the IPS 100, the keywords (of both types) are then used to index on files containing meta-information for other pages.
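One simple way to index meta-information by keyword is an inverted index from each keyword to the pages that carry it. The sketch below assumes this structure; the class `PageStoreIndex` and its methods are hypothetical and are not taken from the description of the IPS.

```python
from collections import defaultdict


class PageStoreIndex:
    """Keyword index over stored meta-information (illustrative only)."""

    def __init__(self):
        self._by_keyword = defaultdict(set)   # keyword -> set of URLs
        self._records = {}                    # URL -> meta-information

    def add(self, url, meta, keywords):
        """Store the meta-information and index it under every keyword."""
        self._records[url] = meta
        for kw in keywords:
            self._by_keyword[kw.lower()].add(url)

    def lookup(self, keyword):
        """Return the meta-information of all pages indexed under keyword."""
        urls = self._by_keyword.get(keyword.lower(), set())
        return [self._records[u] for u in sorted(urls)]
```

Because only meta-information is stored, each record still has to carry the URL so that the original page can be fetched on request.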
Retrieval
There are three modes in which information can be retrieved from the IPS 100 using a Jasper agent 105. One is a standard keyword retrieval facility, while the other two are concerned with information sharing between a community of agents and their users. Each will be described in the sections below.
When a Jasper agent 105 is installed on a user's machine, the user provides a personal profile: a set of keywords which describe information the user is interested in obtaining via W3. This profile is held, or at least maintained, by the agent 105 in order to determine which pages are potentially of interest to a user.
Keyword Retrieval
As shown in Figures 4, 6, 7 and 8, for straightforward keyword retrieval, the user supplies a set of keywords to the Jasper agent 105 via an HTML form 300 provided by the Jasper agent 105 (STEP 601). The Jasper agent 105 then retrieves the ten most closely matching pages held in IPS 100 (STEP 602), using a simple keyword matching and scoring algorithm. Keywords supplied by the user when the page was stored (as opposed to those extracted automatically by ConText) can be given extra weight in the matching process. The user can specify in advance a retrieval threshold below which pages will not be displayed. The agent 105 then dynamically constructs an HTML form 305 with a ranked list of links to the pages retrieved and their summaries (STEP 603). Any annotation made by the original user is also shown, along with the scores of each retrieved page. This page is then presented to the user on their W3 client (STEP 604).
"What's New?" Facility
Any user can ask a Jasper agent "What's new?" (STEP 701). The agent 105 then interrogates the IPS 100 and retrieves the most recently stored pages (STEP 702). It then determines which of these pages best match the user's profile, again based on a simple keyword matching and scoring algorithm (STEP 703). An HTML page is then presented to the user showing a ranked list of links to the recently stored pages which best match the user's profile, and also to other pages most recently stored in IPS (STEP 704), with annotations where provided. Thus the user is provided with a view both of the pages recently stored and likely to be of most interest to the user, and a more general selection of recently stored pages (STEP 705).
A user can update the profile which his Jasper agent 105 holds at any time via an HTML form which allows him to add and/or delete keywords from the profile.
In this way, the user can effectively select different "contexts" in which to work.
A context is defined by a set of keywords (those making up the profile, or those specified in a retrieval query) and can be thought of as those types of information which a user is interested in at a given time.
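The "What's new?" facility uses the profile as the current context: the most recently stored pages are taken first, and those best matching the profile are then picked out. A minimal sketch follows, assuming each page is a dictionary with hypothetical keys `keywords` and `stored_at` (any sortable timestamp); the cut-offs `recent` and `best` are likewise assumptions.

```python
def whats_new(pages, profile, recent=20, best=5):
    """Return (pages best matching the profile, other recent pages)."""
    # Take the most recently stored pages first.
    recent_pages = sorted(pages, key=lambda p: p["stored_at"], reverse=True)[:recent]
    prof = {k.lower() for k in profile}
    # Score each recent page by the number of keywords shared with the profile.
    scored = [(len(prof & {k.lower() for k in p["keywords"]}), p) for p in recent_pages]
    ranked = sorted(scored, key=lambda sp: -sp[0])
    matching = [p for s, p in ranked if s > 0][:best]
    # Everything else that is recent is shown as the more general selection.
    others = [p for p in recent_pages if p not in matching]
    return matching, others
```

Changing the profile changes which recent pages land in the first list, which is the sense in which the user selects a context.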
The idea of applying human memory models to the filing of information was explored by Jones in the paper referenced above, in the context of computer filing systems. As he pointed out in the context of a conventional filing system, there is an analogy between a directory in a file system and a set of pages retrieved by a Jasper agent 105. The set of pages can be thought of as a dynamically-constructed directory, defined by the context in which it was retrieved. This is a highly flexible notion of 'directory' in two senses: first, pages which occur in this retrieval can of course occur in others, depending on the context; and, second, there is no sharp boundary to the directory: pages are 'in' the directory to a greater or lesser extent depending on their match to the current context. In the present approach, the number of ways of partitioning the information on the pages is thus only limited by the diversity and richness of the information itself.
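The degree to which a page is 'in' a retrieved directory is its match score. The keyword matching and scoring algorithm is described only as simple, so the following is one possible reading rather than the algorithm of the embodiments: shared keywords are counted, user-supplied keywords count for more, pages under the threshold are dropped and at most ten are returned. The function names, dictionary keys and the weight of 2.0 are assumptions.

```python
def score_page(query, general_keywords, user_keywords, user_weight=2.0):
    """Weighted count of query keywords found among a page's keywords."""
    q = {k.lower() for k in query}
    general = {k.lower() for k in general_keywords}
    user = {k.lower() for k in user_keywords}
    # A keyword present in both sets is counted once, at the higher weight.
    return user_weight * len(q & user) + 1.0 * len(q & (general - user))


def retrieve(query, pages, threshold=1.0, limit=10):
    """Return (score, url) pairs at or above threshold, best first."""
    scored = []
    for page in pages:
        s = score_page(query, page["general_keywords"], page["user_keywords"])
        if s >= threshold:
            scored.append((s, page["url"]))
    # Highest score first; ties are broken by URL for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]
```

The same page can appear in many such result sets with different scores, depending on the query or profile that defines the context.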
Communication With Other Interested Agents
Referring to Figure 8, when a page is stored in IPS 100 by a Jasper agent 105 (STEP 801), the agent 105 checks the profiles of other agents' users in its 'local community' (STEP 802). This local community could be any predetermined community. If the page matches a user's profile with a score above a certain threshold (STEP 803), a message, for instance an "email" message, can be automatically generated by the agent 105 and sent to the user concerned (STEP 804), informing him of the discovery of the page.
The email header might be for instance in the format:
JASPER KW: (keywords)
This allows the user, before reading the body of the message, to identify it as being one from the Jasper system. Preferably, a list of keywords is provided and the user can assess the relative importance of the information to which the message refers. The keywords in the message header vary from user to user depending on the keywords from the page which match the keywords in their user profile, thus personalising the message to each user's interests. The message body itself can give further information such as the page title and URL, who stored the page and any annotation on the page which the storer provided.
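A personalised alert of this kind could be assembled as sketched below using `email.message.EmailMessage` from the Python standard library. Placing the "JASPER KW:" text in the Subject header is an assumption, as are the function name `build_alert` and the dictionary keys used for the page.

```python
from email.message import EmailMessage


def build_alert(recipient, profile_keywords, page):
    """Build an alert whose header lists only this user's matching keywords."""
    matched = sorted(
        {k.lower() for k in profile_keywords} & {k.lower() for k in page["keywords"]}
    )
    msg = EmailMessage()
    msg["To"] = recipient
    # Assumption: the "JASPER KW:" header text is carried in the Subject line.
    msg["Subject"] = "JASPER KW: " + ", ".join(matched)
    lines = [
        f"Title: {page['title']}",
        f"URL: {page['url']}",
        f"Stored by: {page['stored_by']}",
    ]
    if page.get("annotation"):
        lines.append(f"Annotation: {page['annotation']}")
    msg.set_content("\n".join(lines))
    return msg
```

Two users alerted about the same page would receive different header keywords if their profiles overlap with the page in different ways.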
The Jasper agent 105 and system described above provide the basis for an extremely useful way of accessing relevant information in a distributed arrangement such as W3. Variations and extensions may be made in a system without departing from the scope of the present invention. For instance, at a relatively simple level, improved retrieval techniques might be employed. As examples, vector space or probabilistic models might be used, as described by G Salton in "Automatic Text Processing", published in 1989 by Addison-Wesley in Reading, Massachusetts, USA.
Alternatively, indexing might be made more versatile by providing indexing on meta-information other than keywords. For instance, extra meta-information might be the date of storage of a page and the originating site of the page (which Jasper can extract from the URL). These extra indices allow users (via an HTML form) to frame commands of the type:
Show me all pages I stored in 1994 from Cambridge University about artificial intelligence and information retrieval.
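A command of this type amounts to filtering on several pieces of meta-information at once. The sketch below shows one way to do that, using `urllib.parse.urlparse` to obtain the originating site from the URL. The function names, the dictionary keys, and the convention of identifying a site by a host-name suffix are assumptions.

```python
from datetime import datetime
from urllib.parse import urlparse


def site_of(url):
    """Originating site of a page, taken from the host part of its URL."""
    return (urlparse(url).hostname or "").lower()


def filter_pages(pages, stored_by, year, site_suffix, keywords):
    """URLs of pages stored by one user in a given year, from a given
    site, which carry all of the requested keywords."""
    wanted = {k.lower() for k in keywords}
    result = []
    for p in pages:
        host = site_of(p["url"])
        on_site = host == site_suffix or host.endswith("." + site_suffix)
        if (p["stored_by"] == stored_by
                and p["stored_at"].year == year
                and on_site
                and wanted <= {k.lower() for k in p["keywords"]}):
            result.append(p["url"])
    return result
```

For the example command, the call might be `filter_pages(pages, "me", 1994, "cam.ac.uk", ["artificial intelligence", "information retrieval"])`, where `"cam.ac.uk"` stands in for Cambridge University.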
In another alternative version, a thesaurus might be used by Jasper agents 105 to exploit keyword synonyms. This reduces the importance of entering precisely the same keywords as were used when a page was stored. Indeed, it is possible to exploit the use of a thesaurus in several other areas, including the personal profiles which an agent 105 holds for its user.
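Thesaurus use can be sketched as a query-expansion step applied before matching. The thesaurus here is a hypothetical dictionary mapping a keyword to a list of synonyms; no particular thesaurus or lookup interface is named in the description.

```python
def expand_keywords(keywords, thesaurus):
    """Return the keywords together with any synonyms the thesaurus lists."""
    expanded = set()
    for kw in keywords:
        kw = kw.lower()
        expanded.add(kw)
        # Keywords without an entry are kept unchanged.
        expanded.update(s.lower() for s in thesaurus.get(kw, ()))
    return expanded
```

The same expansion could be applied to a user profile as well as to a retrieval query, so that a page stored under "automobile" still matches a profile containing "car".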
Adaptive Agents
The use of user profiles by Jasper agents 105 to determine information relevant to their users, though powerful, can be improved. When the user wants to change context (perhaps refocussing from one task to another, or from work to leisure), the user profile must be respecified by adding and/or deleting keywords. A better approach is for the agent to change the user's profile as the interests of the user change over time. This change of context can occur in two ways: there can be a short-term switch of context from, for example, work to leisure. The agent can identify this from a list of current contexts it holds for a user and change into the new context. This change could be triggered, for example, when a new page of different information type is visited by the user. There can also be longer term changes in the contexts the agent holds based on evolving interests of the user.
These changes can be inferred from observation of the user by the agent. For instance, known techniques which might be employed in an adaptive agent include genetic algorithms, learning from feedback and memory-based reasoning. Such techniques are disclosed in an internal report of the MIT made available in 1993, by Sheth, B. & Maes, P., called "Evolving Agents for Personalised Information Filtering".
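The short-term switch of context can be illustrated with a deliberately simple sketch, which does not use any of the learning techniques mentioned above. It assumes the agent holds a hypothetical dictionary of named contexts, each a keyword set, and switches when a visited page overlaps more with another context than with the current one.

```python
def select_context(contexts, page_keywords, current):
    """Return the name of the context to use after the user visits a page."""
    page = {k.lower() for k in page_keywords}

    def overlap(name):
        return len(page & {k.lower() for k in contexts[name]})

    best = max(contexts, key=overlap)
    # Stay in the current context unless another one matches strictly better.
    return best if overlap(best) > overlap(current) else current
```

Longer-term changes, in which the keyword sets themselves evolve, are not covered by this sketch.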
Integration of Remote and Local Information
Another possible variation of a Jasper system would be to integrate the user's own computer filing system with the IPS 100, so that information found on W3 and on the local machine would appear homogeneous to the user at the top level.
Files could then be accessed similarly to the way in which Jasper agents 105 access W3 pages, freeing the user from the constraints of name-oriented filing systems and providing a contents-addressable interface to both local and remote information of all kinds.
Clustering in Jasper Systems
The Jasper IPS 100 and the related documents can essentially be called a collection; it is a set of documents indexed by keywords. It differs from a 'traditional' collection in that the documents are typically located remotely from the index; the index (the IPS 100) actually points to a URL which specifies the location of the document on the Internet. Furthermore, various additional pieces of meta-information are attached to documents in a Jasper system, such as the user who stored the page, when it was stored, any annotation the user may have provided and so forth.
One important area where a Jasper system differs from most document collections is that each document has been entered in the IPS 100 by a user who made a conscious decision to mark it as a piece of information which he and his peers would be likely to find useful in the future. This, along with the meta-information held, makes a Jasper IPS 100 a very rich source of information.
It has also been examined whether known Information Retrieval (IR) techniques can beneficially be applied to the Jasper IPS 100. In particular, the use of clustering has been under investigation.
Clustering Documents
Using known IR techniques, Jasper's term-document matrix can be used to calculate a similarity matrix for the documents identified in the Jasper IPS 100. The similarity matrix gives a measure of the similarity of documents identified in the store. For each pair of documents the Dice coefficient is calculated. For two documents $D_i$ and $D_j$, it is given by:
\[ \frac{2\,|D_i \cap D_j|}{|D_i| + |D_j|} \]
where $|X|$ is the number of terms in $X$ and $|X \cap Y|$ is the number of terms co-occurring in $X$ and $Y$. This coefficient yields a number between 0 and 1. A coefficient of zero implies two documents have no terms in common, while a coefficient of 1 implies that the sets of terms occurring in each document are identical. The similarity matrix, Sim say, represents the similarity of each pair of documents in the store, so that for each pair of documents $i$ and $j$:
\[ \mathrm{Sim}(i,j) = \frac{2\,|D_i \cap D_j|}{|D_i| + |D_j|} \]
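As a minimal sketch of the calculation just described, the following Python builds a Dice similarity matrix from per-document term sets. The document names, the term sets and the helper names `dice` and `similarity_matrix` are illustrative assumptions, not part of the Jasper system.

```python
from itertools import combinations


def dice(a, b):
    """Dice coefficient 2|A n B| / (|A| + |B|); 0.0 if both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def similarity_matrix(docs):
    """Map every ordered pair of document names to its Dice coefficient."""
    sim = {}
    for i, j in combinations(sorted(docs), 2):
        sim[(i, j)] = sim[(j, i)] = dice(docs[i], docs[j])
    return sim


# Hypothetical term sets, one per stored document.
docs = {
    "d1": {"vrml", "virtual", "reality"},
    "d2": {"virtual", "reality", "internet"},
    "d3": {"football", "manchester"},
}
sim = similarity_matrix(docs)
```

Here `d1` and `d2` share two of their three terms each, so their coefficient is 4/6, while `d1` and `d3` share none and score zero.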
This matrix can be used to create clusters of related documents automatically, using the hierarchical agglomerative clustering process described in "Hierarchic Agglomerative Clustering Methods for Automatic Document Classification" by Griffiths A et al in the Journal of Documentation, 40:3, September 1984, pp 175-205. In such a process, each document is initially placed in a cluster by itself and the two most similar such clusters are then combined into a larger cluster, for which similarities with each of the other clusters must then be computed.
This combination process is continued until only a single cluster of documents remains at the highest level.
The way in which similarity between clusters (as opposed to individual documents) is calculated can be varied. For a Jasper store, "complete-link clustering" can be employed. In complete-link clustering, the similarity between the least similar pair of documents from the two clusters is used as the cluster similarity.
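The agglomerative process with complete-link similarity might be sketched as follows. This is a naive illustration under stated assumptions (hypothetical document names and term sets, and a brute-force search for the best pair at each step), not the procedure of the cited Griffiths paper.

```python
def dice(a, b):
    """Dice coefficient between two term sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def complete_link(c1, c2, docs):
    """Cluster similarity = similarity of the least similar document pair."""
    return min(dice(docs[x], docs[y]) for x in c1 for y in c2)


def agglomerate(docs):
    """Merge the two most similar clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [(name,) for name in sorted(docs)]  # one cluster per document
    merges = []
    while len(clusters) > 1:
        score, i, j = max(
            ((complete_link(a, b, docs), i, j)
             for i, a in enumerate(clusters)
             for j, b in enumerate(clusters) if i < j),
            key=lambda t: t[0],
        )
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical term sets for four stored documents.
docs = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b", "d"},
    "d3": {"x", "y"},
    "d4": {"x", "z"},
}
merges = agglomerate(docs)
```

The merge history records the hierarchy: `d1` and `d2` are joined first, then `d3` and `d4`, and finally the two resulting clusters, whose complete-link similarity is zero.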
The resulting cluster structures of the Jasper store can then be used to create a three-dimensional (3D) front end onto the Jasper system using the VRML (Virtual Reality Modelling Language). (VRML is a known language for 3D graphical spaces or virtual worlds networked via the global Internet and hyperlinked within the World Wide Web).
Clustering Keywords
Keywords (terms) occurring in relation to a particular JASPER document collection can also be clustered in a way which mirrors exactly the document cluster technique described above: a similarity matrix for the keywords in the Jasper store can be constructed which gives a measure of the 'similarity' of keywords in the store. For each pair of keywords, the Dice coefficient is calculated. For two keywords $K_i$ and $K_j$, the Dice coefficient is given by:
\[ \frac{2\,|K_i \cap K_j|}{|K_i| + |K_j|} \]
where $|X|$ is the number of documents in which $X$ occurs and $|X \cap Y|$ is the number of documents in which $X$ and $Y$ co-occur.
Once the similarity matrix for a Jasper store is calculated, however, it is not necessary to cluster the keywords as the documents were clustered. Instead it is possible to exploit the matrix itself in two ways, described below.
The first way is profile enhancement. Here, the user profile can be enhanced by using those keywords most similar to the keywords in the user's profile.
Thus for example, if the words virtual, reality and Internet are part of a user's profile but VRML is not, an enhanced profile might add VRML to the original profile (assuming VRML is clustered close to virtual, reality and Internet). In this way, documents containing VRML but not virtual, reality and Internet may be retrieved whereas they would not have been with the unenhanced profile.
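Profile enhancement could be sketched as below, assuming each keyword is mapped to the set of documents in which it occurs. The document identifiers, the `min_sim` cut-off and the `per_keyword` limit are hypothetical parameters chosen for illustration; the specification does not fix them.

```python
def dice(a, b):
    """Dice coefficient between two sets of document identifiers."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def enhance_profile(profile, keyword_docs, per_keyword=1, min_sim=0.5):
    """Add, for each profile keyword, its most similar non-profile keywords."""
    added = set()
    for kw in profile:
        if kw not in keyword_docs:
            continue
        candidates = sorted(
            ((dice(keyword_docs[kw], docs), other)
             for other, docs in keyword_docs.items()
             if other not in profile),
            reverse=True,
        )
        added.update(o for s, o in candidates[:per_keyword] if s >= min_sim)
    return set(profile) | added


# Hypothetical keyword -> documents mapping.
keyword_docs = {
    "virtual": {1, 2, 3},
    "reality": {1, 2, 3},
    "internet": {2, 3, 4},
    "vrml": {1, 2},
    "football": {5},
}
enhanced = enhance_profile({"virtual", "reality", "internet"}, keyword_docs)
```

With these values VRML has a coefficient of 0.8 against both virtual and reality, so it is added, while football never co-occurs with a profile keyword and is left out.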
Figure 9 shows an example network of keywords 900 which has been built from the keyword similarity matrix extracted from a current Jasper store. The algorithm is straightforward: given an initial starting keyword, find the four words most similar to it from the similarity matrix. Link these four to the original word and repeat the process for each of the four new words. This can be repeated a number of times (in Figure 9, three times). Double lines 901 between two words indicate that both words occur in the other's four most similar keywords. One could of course attach the particular similarity coefficients to each link for finer-grained information concerning the degree of similarity between words.
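The network-building procedure can be sketched as follows. The nested dictionary `sim` and its values are invented for illustration, and the example uses a fan-out of two and two rounds to stay small; the text above uses four and three.

```python
def top_similar(word, sim, n=4):
    """Return the n keywords most similar to word (ties broken by name)."""
    ranked = sorted(sim.get(word, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


def build_network(start, sim, n=4, rounds=3):
    """Link each word to its n nearest keywords, expanding newly found words."""
    links = set()  # directed links (word, one of its n most similar words)
    seen = {start}
    frontier = [start]
    for _ in range(rounds):
        next_frontier = []
        for word in frontier:
            for other in top_similar(word, sim, n):
                links.add((word, other))
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return links


def mutual_links(links):
    """Pairs where each word is in the other's top n (the 'double lines')."""
    return {frozenset(l) for l in links if (l[1], l[0]) in links}


# Hypothetical similarity coefficients between keywords.
sim = {
    "vrml": {"virtual": 0.8, "reality": 0.7, "html": 0.2},
    "virtual": {"vrml": 0.8, "reality": 0.9},
    "reality": {"virtual": 0.9, "vrml": 0.7},
    "html": {"vrml": 0.2},
}
links = build_network("vrml", sim, n=2, rounds=2)
double = mutual_links(links)
```

Note that a link is only recorded for words that have been expanded, so words first reached in the final round cannot yet show a double line in this sketch.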
The second way is proactive searching. The keywords comprising a user's profile can be used to search for new WWW pages relevant to their interest proactively by Jasper, which can then present a list of new pages which the user may be interested in without the user having to carry out a search explicitly.
These proactive searches can be carried out by a Jasper system at some given interval, such as weekly. Clustering is useful here because a profile may reflect more than one interest. Consider, for example, the following user profile: Internet, WWW, html, football, Manchester, united, linguistics, parsing, pragmatics. Clearly, three separate interests are represented in the above profile, and searching on each separately is likely to yield far better results than merely entering the whole profile as a query for the given user. Clustering keywords from the document collection can automate the process of query generation for proactive searching by a user's Jasper agent.
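One way a profile might be split into separate queries is sketched below. It groups keywords whose pairwise similarity reaches a threshold, which is a simple single-link style grouping chosen here as an assumption; the specification does not say which clustering method is applied to profiles. The similarity values and the threshold are invented for illustration.

```python
def split_profile(profile, sim, threshold=0.3):
    """Group profile keywords that are linked by similarity >= threshold."""
    remaining = list(profile)
    groups = []
    while remaining:
        group = [remaining.pop(0)]
        changed = True
        while changed:  # keep absorbing keywords linked to the growing group
            changed = False
            for kw in list(remaining):
                if any(sim.get(frozenset((kw, g)), 0.0) >= threshold
                       for g in group):
                    group.append(kw)
                    remaining.remove(kw)
                    changed = True
        groups.append(group)
    return groups


profile = ["Internet", "WWW", "html", "football", "Manchester", "united",
           "linguistics", "parsing", "pragmatics"]

# Hypothetical coefficients taken from a keyword similarity matrix.
sim = {
    frozenset(("Internet", "WWW")): 0.8,
    frozenset(("WWW", "html")): 0.7,
    frozenset(("football", "Manchester")): 0.6,
    frozenset(("Manchester", "united")): 0.9,
    frozenset(("linguistics", "parsing")): 0.5,
    frozenset(("parsing", "pragmatics")): 0.4,
    frozenset(("Internet", "football")): 0.1,
}
groups = split_profile(profile, sim)
queries = [" ".join(g) for g in groups]
```

Each resulting group can then be submitted as its own search query, rather than entering all nine keywords at once.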
When the search results are obtained by Jasper, they can be summarised and matched against the user's profile in the usual way to give a prioritised list of new URLs along with locally held summaries.
Other text summarisers may be used in place of ConText. For instance, NetSumm is a summarising tool made available by British Telecommunications plc on the Internet, at http://www.labs.bt.com/innovate/informat/netsumm/index.htm.
Although described in relation to locating information via the Internet, embodiments of the present invention might be found useful for locating information on other systems, such as documents on a user's internal systems which are in HyperText.
Further to the inventive aspects of the present system set out in the introduction to this specification, the following should also be viewed as expressions of novel and advantageous features of the system:
A method of monitoring information inputs to a data store, the inputs being requested by any of a plurality of users, for the purpose of alerting a first user to an input by a second user in accordance with alert criteria determined at least in part by said first user, the method comprising:
i) storing a user profile for each user, which profile comprises at least one set of keywords and an identifier for the user;
ii) detecting a request by the second user for an information input to the data store;
iii) processing the request to generate the information input;
iv) comparing the information input with a keyword set from the user profile for the first user; and
v) in the event of a positive result from the comparison, transmitting an alert message addressed to the first user.
A method as above which further comprises monitoring information input requests by respective users and, on detection of a significant change in the information input requests made by a particular user, changing the keyword set used in step iv) for that particular user in the event of an information input request by a different user.
A method as above wherein each information input includes at least one set of keywords associated with a respective document, and wherein the method further comprises the steps of generating a similarity matrix for at least two of said sets of keywords, and using said similarity matrix to extend the scope of a keyword set from a user profile in step iv) so as to obtain an increase in the number of positive results for the associated user.
A method as above which further comprises the step of applying a clustering algorithm to a keyword set from a user profile so as to divide the keyword set into sub-keyword sets and applying at least one of the sub-keyword sets in place of the full keyword set in step iv).
The present invention relates to methods and/or systems for accessing information by means of a communications system.
The Internet World Wide Web is a known communications system based on a plurality of separate communications networks connected together. It provides a rich source of information from many different providers but this very richness creates a problem in accessing specific information as there is no central monitoring and control.
In 1982, the volume of scientific, corporate and technical information was doubling every 5 years. By 1988, it was doubling every 2.2 years and by 1992 every 1.6 years. With the expansion of the Internet and other networks the rate of increase will continue to increase. Key to the viability of such networks will be the ability to manage the information and provide users with the information they want, when they want it.
In "SIGIR '94. Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval", 3-6 July 1994, Dublin, Ireland; pages 272-281; M. Morita et al.: "Information Filtering Based on User Behavior Analysis and Best Match Text Retrieval", features of information filtering systems are discussed, including one holding user profiles defining, in terms of a list of keywords, a user's preferences for receiving information. The system filters incoming information on the basis of information contained in the user's profile, forwarding items of received information to the user in accord with that profile.
The present invention is not concerned with providing another tool for searching systems such as W3: there are already many of these. They are being added to frequently with ever increasing coverage of the Web and sophistication of search engines. Instead, embodiments of the present invention relate to the following problem: having found useful information on W3, how can it be stored for easy retrieval and how can other users likely to be interested in the information be identified and informed?
According to a first aspect of the present invention, there is provided an information access system, for accessing sets of information stored in a distributed manner and accessible by means of a communications network, the access system having:
i) an input for receiving a set of information;
ii) data storage, or means to access data storage, for storing at least one set of predetermined keywords;
iii) generation means, triggerable to generate at least one set of meta-information from the set of information received at the input, the meta-information including at least a pointer for the set of information when stored in said distributed manner, and to store said set of meta-information in the data storage;
iv) comparison means for comparing at least one of said at least one set of keywords with said at least one set of meta-information; and
v) means for transmitting an alert message in dependence upon the result of the comparison.
In a useful configuration, at least one set of predetermined keywords may be associated with a specified user.
An agent might then be triggered to apply keyword sets to pages of information in (or being added to) the page store by different circumstances for different users. For instance, an agent might apply a first set of keywords in the course of a storage request from a first user. However, the agent might then apply one or more additional sets of keywords in order to notify one or more other users of the entry.
Preferably, a group of agents will share an intelligent page store, although there may be multiple intelligent page stores in or available to the access system as a whole. This sharing of a page store provides a way of enabling an agent to monitor new entries to the page store for notification to potentially interested users.
Embodiments of the present invention provide a distributed system of intelligent software agents which can be used to perform information tasks, for instance over the Internet World Wide Web, on behalf of a user or community of users. That is, software agents are used to store, retrieve, summarise and inform other agents about information found on W3.
According to a second aspect of the present invention, there is provided a method of monitoring information sets, stored in a distributed manner and accessible by means of a communications network, for the purpose of alerting a first user in accordance with alert criteria determined at least in part by said first user to an information set identified by a second user, the method comprising:
i) storing a user profile for each user, which profile comprises at least one set of keywords and an identifier for the user;
ii) detecting a request by the second user to store, in a data store, information relating to said identified information set;
iii) in response to the request, generating a set of meta-information, dependent on said identified information set, comprising at least a pointer to said identified information set when stored in said distributed manner;
iv) comparing the generated set of meta-information with a keyword set from the user profile for the first user; and
v) in dependence upon the result from the comparison, transmitting an alert message addressed to the first user.
Network systems such as W3 are known and are built according to known architectures such as the client/server type of architecture and further detail is not therefore given herein.
Software agents provide a known approach to dealing with distributed rather than centralised computer-based systems. Each agent generally comprises functionality to perform a task or tasks on behalf of an entity (human or machine-based) in an autonomous manner, together with local data, or means to access data, to support the task or tasks. In the present specification, agents for use in storing or retrieving information in embodiments of the present invention are referred to for simplicity as "Jasper agents", this stemming from the acronym "Joint Access to Stored Pages with Easy Retrieval".
Given the vast amount of information available on W3, it is preferable to avoid the copying of information from its original location to a local server.
Indeed, it could be argued that such an approach is contrary to the whole ethos of the Web.
Rather than copying information, therefore, Jasper agents store only relevant "meta-information". As will be seen below, this meta-information can be thought of as being at a level above information itself, being about it rather than being actual information. It can include for instance keywords, a summary, document title, universal resource locator (URL) and date and time of access. This meta-information is then used to provide a pointer to, or to "index on", the actual information when a retrieval request is made.
Most known W3 clients (Mosaic, Netscape, and so on) provide some means of storing pages of interest to the user. Typically, this is done by allowing the user to create a (possibly hierarchical) menu of names associated with particular URLs.
While this menu facility is useful, it quickly becomes unwieldy when a reasonably large number of W3 pages are involved. Essentially, the representation provided is not rich enough to allow capture of all that might be required about the information stored: the user can only provide a string naming the page. As well as the fact that useful meta-information such as the date of access of the page is lost, a single phrase (the name) may not be enough to accurately index a page in all contexts.
Consider as a simple example information about the use of knowledge-based systems (KBS) in information retrieval of pharmacological data: in different contexts, it may be any of KBS, information retrieval or pharmacology which is of interest.
Unless a name is carefully chosen to mention all three aspects, the information will be missed in one or more of its useful contexts. This problem is analogous to the problem of finding files containing desired information in a Unix (or other) file system as described in the paper by Jones, W. P.; "On the applied use of human memory models: the memory extender personal filing system" published in Int J. Man-Machine Studies, 25, 191-228, 1986. In most filing systems however there is at least the facility of sorting files by creation date.
The solution to this problem adopted in embodiments of the present invention is to allow the user to access information by a much richer set of meta-information. How Jasper agents achieve this and how the resulting meta-information is exploited is explained below.
An information access system according to an embodiment of the present invention will now be described, by way of example only, with reference to the accompanying Figures in which:
Figure 1 shows an information access system incorporating a Jasper agent system;
Figure 2 shows in schematic format a storage process offered by the access system;
Figure 3 shows the structure of an intelligent page store for use in the storage process of Figure 1;
Figure 4 shows in schematic format retrieval processes offered by the access system;
Figure 5 shows a flow diagram for the storage process of Figure 2;
Figures 6, 7 and 8 show flow diagrams for three information retrieval processes using a Jasper access system; and
Figure 9 shows a keyword network generated using a clustering technique, for use in extending and/or applying user profiles in a Jasper system.
Referring to Figure 1, an information access system according to an embodiment of the present invention may be built into a known form of information retrieval architecture, such as a client-server type architecture connected to the Internet.
In more detail, a customer, such as an international company, may have multiple users equipped with personal computers or workstations 405. These may be connected via a World Wide Web (WWW) viewer 400 in the customer's client context to the customer's WWW file server 410. The Jasper agent 105, effectively an extension of the viewer 400, may be actually resident on the WWW file server 410.
The customer's WWW file server 410 is connected to the Internet in known manner, for instance via the customer's own network 415 and a router 420.
Service providers' file servers 425 can then be accessed via the Internet, again via routers.
Also resident on, or accessible by, the customer's file server 410 are a text summarising tool 120 and two data stores, one holding user profiles (the profile store 430) and the other (the intelligent page store 100) holding principally meta-information for a document collection.
In a Jasper agent based system, the agent 105 itself can be built as an extension of a known viewer such as Netscape. The agent 105 is effectively integrated with the viewer 400, which might be provided by Netscape or by Mosaic etc, and can extract W3 pages from the viewer 400.
As described above, in the client-server architecture, the text summariser 120 and the user profile both sit on file in the customer file server 410 where the Jasper agent is resident. However, the Jasper agent 105 could alternatively appear in the customer's client context.
A Jasper agent, being a software agent, can generally be described as a software entity, incorporating functionality for performing a task or tasks on behalf of a user, together with local data, or access to local data, to support that task or tasks. The tasks relevant in a Jasper system, one or more of which may be carried out by a Jasper agent, are described below. The local data will usually include data from the intelligent page store 100 and the profile store 430, and the functionality to be provided by a Jasper agent will generally include means to apply a text summarising tool and store the results, access or read, and update, at least one user profile, means to compare keyword sets with other keyword sets, or metainformation, and means to trigger alert messages to users.
In preferred embodiments, a Jasper agent will also be provided with means to monitor user inputs for the purpose of selecting a keyword set to be compared.
In further preferred embodiments, a Jasper agent is provided with means to apply an algorithm in relation to first and second keyword sets to generate a measure of similarity therebetween. According to the measure of similarity, either the first or second keyword sets may then be proactively updated by the Jasper agent, or the result of comparing the first or second keyword sets with a third keyword set, or with metainformation, may be modified.
Embodiments of the present invention might be built according to different software systems. It might be convenient for instance that object-oriented techniques are applied. However, in embodiments as described below, the server will be Unix based and able to run ConText, a known natural language processing system offered by Oracle Corporation, and a W3 viewer. The system might generally be implemented in "C" although the client might potentially be any machine which can support a W3 viewer.
In the following section, the facilities which Jasper agents offer the user in managing information are discussed. These can be grouped in two categories, storage and retrieval.
Storage
Figures 2 and 5 show the actions taken when a Jasper agent 105 stores information in an intelligent page store (IPS) 100. The user 110 first finds a page of sufficient interest to be stored by the Jasper system in an IPS 100 associated with that user (STEP 501). The user 110 then transmits a 'store' request to the Jasper agent 105, resident on the customer's WWW file server 410, via a menu option on the user's selected W3 client 115 (Mosaic and Netscape versions are currently available on all platforms) (STEP 502). The Jasper agent 105 then invites the user 110 to supply an associated annotation, also to be stored (STEP 503). Typically, this might be the reason the user is interested in the page and can be very useful for other users in deciding which pages retrieved from the IPS 100 to visit. (Information sharing is further discussed below.)
The Jasper agent 105 next extracts the source text from the page in question, again via the W3 client 115 on W3 (STEP 504). Source text is provided in a "HyperText" format and the Jasper agent 105 first strips out HyperText Markup Language (HTML) tags (STEP 505). The Jasper agent 105 then sends the text to a text summariser such as "ConText" 120 (STEP 506).
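The tag-stripping step (STEP 505) might be sketched with Python's standard `html.parser` module, as below. The class and function names are hypothetical, and skipping `script` and `style` content is an added assumption rather than something the specification describes.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of a page, ignoring the mark-up itself."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(html):
    """Return the page text with tags removed and whitespace normalised."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


page = "<html><body><h1>Jasper</h1><p>Stores <b>meta-information</b> only.</p></body></html>"
text = strip_tags(page)
```

The resulting plain text is what would then be passed on to the text summariser.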
ConText 120 first parses a document to determine the syntactic structure of each sentence (STEP 507). The ConText parser is robust and able to deal with a wide range of the syntactic phenomena occurring in English sentences.
Following sentence level parsing, ConText 120 enters its 'concept processing' phase (STEP 508). Among the facilities offered are:
• Information Extraction: a master index of a document's contents is computed, indexing over concepts, facts and definitions in the text.
• Content Reduction: several levels of summarisation are available, ranging from a list of the document's main themes to a precis of the entire document.
• Discourse Tracking: by tracking the discourse of a document, ConText can extract all the parts of a document which are particularly relevant to a certain concept.
ConText 120 is used by the Jasper agent 105 in a client-server architecture: after parsing the documents, the server generates application-independent marked-up versions (STEP 509). Calls from the Jasper agent 105 using an Applications Programming Interface (API) can then interpret the mark-ups. Using these API calls, meta-information is obtained from the source text (STEP 510). The Jasper agent 105 first extracts a summary of the text of the page. The size of the summary can be controlled by the parameters passed to ConText 120 and the Jasper agent 105 ensures that a summary of 100-150 words is obtained. Using a further call to ConText 120, the Jasper agent 105 then derives a set of keywords from the source text. Following this, the user may optionally be presented with the opportunity to add further keywords via an HTML form 125 (STEP 511). In this way, keywords of particular relevance to the user can be provided, while the Jasper agent 105 supplies a set of keywords which may be of greater relevance to a wider community of users.
At the end of this process, the Jasper agent 105 has generated the following meta-information about the W3 page of interest:
• the ConText-supplied general keywords;
• user-specific keywords;
• the user's annotations;
• a summary of the page's content;
• the document title;
• the universal resource locator (URL); and
• the date and time of storage.
Referring additionally to Figure 3, the Jasper agent 105 then adds this meta-information for the page to files 130 of the IPS 100 (STEP 512). In the IPS 100, the keywords (of both types) are then used to index on files containing meta-information for other pages.
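A minimal sketch of such a store is given below, assuming one record per page holding the meta-information listed above and a keyword index over both keyword types. The class names, field names and example URL are hypothetical; the actual file layout of the IPS 100 is not described at this level.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PageRecord:
    """Meta-information held for one stored page (not the page itself)."""
    url: str
    title: str
    summary: str
    general_keywords: set
    user_keywords: set = field(default_factory=set)
    annotation: str = ""
    stored_at: datetime = field(default_factory=datetime.now)


class PageStore:
    """Keeps records by URL and indexes them by both kinds of keyword."""

    def __init__(self):
        self.records = {}
        self.index = defaultdict(set)  # keyword -> URLs of matching records

    def add(self, record):
        self.records[record.url] = record
        for kw in record.general_keywords | record.user_keywords:
            self.index[kw.lower()].add(record.url)

    def lookup(self, keyword):
        return set(self.index.get(keyword.lower(), set()))


store = PageStore()
store.add(PageRecord(
    url="http://www.example.org/vrml.html",  # hypothetical page
    title="VRML overview",
    summary="An introduction to VRML.",
    general_keywords={"vrml", "virtual", "reality"},
    user_keywords={"3D"},
    annotation="Useful for the front-end work.",
))
```

Only the pointer and descriptive fields are held locally; retrieving the page itself still means following the URL.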
Retrieval
There are three modes in which information can be retrieved from the IPS 100 using a Jasper agent 105. One is a standard keyword retrieval facility, while the other two are concerned with information sharing between a community of agents and their users. Each will be described in the sections below.
When a Jasper agent 105 is installed on a user's machine, the user provides a personal profile: a set of keywords which describe information the user is interested in obtaining via W3. This profile is held, or at least maintained, by the agent 105 in order to determine which pages are potentially of interest to a user.
Keyword Retrieval
As shown in Figures 4, 6, 7 and 8, for straightforward keyword retrieval, the user supplies a set of keywords to the Jasper agent 105 via an HTML form 300 provided by the Jasper agent 105 (STEP 601). The Jasper agent 105 then retrieves the ten most closely matching pages held in IPS 100 (STEP 602), using a simple keyword matching and scoring algorithm. Keywords supplied by the user when the page was stored (as opposed to those extracted automatically by ConText) can be given extra weight in the matching process. The user can specify in advance a retrieval threshold below which pages will not be displayed. The agent 105 then dynamically constructs an HTML form 305 with a ranked list of links to the pages retrieved and their summaries (STEP 603). Any annotation made by the original user is also shown, along with the scores of each retrieved page. This page is then presented to the user on their W3 client (STEP 604).
"What's New?" Facility
Any user can ask a Jasper agent "What's new?" (STEP 701). The agent 105 then interrogates the IPS 100 and retrieves the most recently stored pages (STEP 702). It then determines which of these pages best match the user's profile, again based on a simple keyword matching and scoring algorithm (STEP 703). An HTML page is then presented to the user showing a ranked list of links to the recently stored pages which best match the user's profile, and also to other pages most recently stored in IPS (STEP 704), with annotations where provided. Thus the user is provided with a view both of the pages recently stored and likely to be of most interest to the user, and a more general selection of recently stored pages (STEP 705).
A user can update the profile which his Jasper agent 105 holds at any time via an HTML form which allows him to add and/or delete keywords from the profile.
In this way, the user can effectively select different "contexts" in which to work.
A context is defined by a set of keywords (those making up the profile, or those specified in a retrieval query) and can be thought of as those types of information which a user is interested in at a given time.
The idea of applying human memory models to the filing of information was explored by Jones in the paper referenced above, in the context of computer filing systems. As he pointed out in the context of a conventional filing system, there is an analogy between a directory in a file system and a set of pages retrieved by a Jasper agent 105. The set of pages can be thought of as a dynamically-constructed directory, defined by the context in which it was retrieved. This is a highly flexible notion of 'directory' in two senses: first, pages which occur in this retrieval can of course occur in others, depending on the context; and, second, there is no sharp boundary to the directory: pages are 'in' the directory to a greater or lesser extent depending on their match to the current context. In the present approach, the number of ways of partitioning the information on the pages is thus only limited by the diversity and richness of the information itself.
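The keyword retrieval and "What's new?" facilities described above could be sketched as follows. The scoring rule (one point per matching general keyword, a heavier weight per matching user-supplied keyword), the default threshold and the cut-off sizes are assumptions for illustration; the specification only calls the algorithm a simple keyword matching and scoring one. The page data is invented.

```python
from datetime import datetime


def match_score(wanted, page, user_weight=2.0):
    """Count matching keywords, weighting user-supplied ones more heavily."""
    general = len(wanted & page["general_keywords"])
    user = len(wanted & page["user_keywords"])
    return general + user_weight * user


def keyword_retrieval(query, pages, threshold=1.0, limit=10):
    """Return up to `limit` (score, url) pairs scoring at least `threshold`."""
    wanted = {k.lower() for k in query}
    scored = [(match_score(wanted, p), p["url"]) for p in pages]
    kept = [s for s in scored if s[0] >= threshold]
    return sorted(kept, key=lambda s: (-s[0], s[1]))[:limit]


def whats_new(profile, pages, recent=20, best=5):
    """Split the most recent pages into profile matches and other recent pages."""
    wanted = {k.lower() for k in profile}
    latest = sorted(pages, key=lambda p: p["stored_at"], reverse=True)[:recent]
    scored = sorted(((match_score(wanted, p), p["url"]) for p in latest),
                    key=lambda s: -s[0])
    matches = [url for s, url in scored if s > 0][:best]
    others = [p["url"] for p in latest if p["url"] not in matches]
    return matches, others


# Hypothetical stored pages (keywords assumed to be lower case).
pages = [
    {"url": "http://a.example/vrml", "general_keywords": {"vrml", "graphics"},
     "user_keywords": {"virtual"}, "stored_at": datetime(1995, 1, 3)},
    {"url": "http://b.example/football", "general_keywords": {"football"},
     "user_keywords": set(), "stored_at": datetime(1995, 1, 5)},
    {"url": "http://c.example/vr", "general_keywords": {"virtual", "reality"},
     "user_keywords": set(), "stored_at": datetime(1995, 1, 4)},
]
ranked = keyword_retrieval(["virtual", "VRML"], pages)
matches, others = whats_new(["virtual", "vrml"], pages)
```

Because every page carries a score rather than a yes/no membership, the same store yields a different ranked "directory" for each query or profile, which is the flexibility the paragraph above describes.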
Communication With Other Interested Agents
Referring to Figure 8, when a page is stored in IPS 100 by a Jasper agent 105 (STEP 801), the agent 105 checks the profiles of other agents' users in its 'local community' (STEP 802). This local community could be any predetermined community. If the page matches a user's profile with a score above a certain threshold (STEP 803), a message, for instance an "email" message, can be automatically generated by the agent 105 and sent to the user concerned (STEP 804), informing him of the discovery of the page.
The email header might be for instance in the format:
JASPER KW: (keywords)
This allows the user, before reading the body of the message, to identify it as being one from the Jasper system. Preferably, a list of keywords is provided and the user can assess the relative importance of the information to which the message refers. The keywords in the message header vary from user to user depending on the keywords from the page which match the keywords in their user profile, thus personalising the message to each user's interests. The message body itself can give further information such as the page title and URL, who stored the page and any annotation on the page which the storer provided.
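The alerting step might be sketched as below. The score used here is simply the number of page keywords found in a user's profile, and the comma-separated layout after `JASPER KW:` is an assumption; the user names, profiles and threshold are hypothetical.

```python
def build_alerts(page_keywords, profiles, storer, threshold=1):
    """Return {user: header} for users whose profile match exceeds threshold."""
    alerts = {}
    for user, profile in profiles.items():
        if user == storer:
            continue  # the user who stored the page needs no alert
        matched = sorted(page_keywords & profile)
        if len(matched) > threshold:
            # The header lists only the keywords matching this user's profile.
            alerts[user] = "JASPER KW: " + ", ".join(matched)
    return alerts


page_keywords = {"vrml", "virtual", "reality", "internet"}
profiles = {
    "alice": {"vrml", "internet", "football"},
    "bob": {"football", "parsing"},
    "carol": {"virtual", "reality", "vrml"},
}
alerts = build_alerts(page_keywords, profiles, storer="carol")
```

Two users alerted to the same page would therefore see different headers, each reflecting their own interests.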
The Jasper agent 105 and system described above provide the basis for an extremely useful way of accessing relevant information in a distributed arrangement such as W3. Variations and extensions may be made in a system without departing from the scope of the present invention. For instance, at a relatively simple level, improved retrieval techniques might be employed. As examples, vector space or probabilistic models might be used, as described by G Salton in "Automatic Text Processing", published in 1989 by Addison-Wesley in Reading, Massachusetts, USA.
Alternatively, indexing might be made more versatile by providing indexing on meta-information other than keywords. For instance, extra meta-information might be the date of storage of a page and the originating site of the page (which Jasper can extract from the URL). These extra indices allow users (via an HTML form) to frame commands of the type:
Show me all pages I stored in 1994 from Cambridge University about artificial intelligence and information retrieval.
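A query of this type could be answered along the following lines, using the standard `urllib.parse` module to extract the originating site from the URL. The record layout, the user name and the example URLs are hypothetical; `cam.ac.uk` stands in for the Cambridge University site.

```python
from datetime import datetime
from urllib.parse import urlparse


def from_site(url, domain):
    """True if the URL's host is the given domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def find_pages(pages, stored_by=None, year=None, domain=None, keywords=()):
    """Filter records by storer, storage year, originating site and keywords."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for page in pages:
        if stored_by is not None and page["stored_by"] != stored_by:
            continue
        if year is not None and page["stored_at"].year != year:
            continue
        if domain is not None and not from_site(page["url"], domain):
            continue
        if not wanted <= page["keywords"]:
            continue
        hits.append(page["url"])
    return hits


topics = {"artificial intelligence", "information retrieval"}
pages = [
    {"url": "http://www.cl.cam.ac.uk/ai-ir.html", "stored_by": "alice",
     "stored_at": datetime(1994, 5, 2), "keywords": set(topics)},
    {"url": "http://www.cl.cam.ac.uk/vrml.html", "stored_by": "alice",
     "stored_at": datetime(1995, 2, 1), "keywords": {"vrml"}},
    {"url": "http://www.example.org/ai.html", "stored_by": "alice",
     "stored_at": datetime(1994, 7, 9), "keywords": set(topics)},
]
hits = find_pages(pages, stored_by="alice", year=1994, domain="cam.ac.uk",
                  keywords=topics)
```

Each optional argument corresponds to one of the extra indices, so any combination of them can be left out of a query.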
In another alternative version, a thesaurus might be used by Jasper agents 105 to exploit keyword synonyms. This reduces the importance of entering precisely the same keywords as were used when a page was stored. Indeed, it is possible to exploit the use of a thesaurus in several other areas, including the personal profiles which an agent 105 holds for its user.
Adaptive Agents
The use of user profiles by Jasper agents 105 to determine information relevant to their users, though powerful, can be improved. When the user wants to change context (perhaps refocussing from one task to another, or from work to leisure), the user profile must be respecified by adding and/or deleting keywords. A
better approach is for the agent to change the user's profile as the interests of the user change over time. This change of context can occur in two ways: there can be a short-term switch of context from, for example, work to leisure. The agent can identify this from a list of current contexts it holds for a user and change into the new context. This change could be triggered, for example, when a new page of different information type is visited by the user. There can also be longer term changes in the contexts the agent holds based on evolving interests of the user.
These changes can be inferred from observation of the user by the agent. For instance, known techniques which might be employed in an adaptive agent include genetic algorithms, learning from feedback and memory-based reasoning. Such techniques are disclosed in an internal report of the MIT made available in 1993, by Sheth B. & Maes. P., called "Evolving Agents for Personalised Information Filtering".
Integration of Remote and Local Information
Another possible variation of a Jasper system would be to integrate the user's own computer filing system with the IPS 100, so that information found on W3 and on the local machine would appear homogeneous to the user at the top level.
Files could then be accessed similarly to the way in which Jasper agents 105 access W3 pages, freeing the user from the constraints of name-oriented filing systems and providing a contents-addressable interface to both local and remote information of all kinds.
Clustering in Jasper Systems
The Jasper IPS 100 and the related documents can essentially be called a collection; it is a set of documents indexed by keywords. It differs from a 'traditional' collection in that the documents are typically located remotely from the index; the index (the IPS 100) actually points to a URL which specifies the location of the document on the Internet. Furthermore, various additional pieces of meta-information are attached to documents in a Jasper system, such as the user who stored the page, when it was stored, any annotation the user may have provided and so forth.
One important area where a Jasper system differs from most document collections is that each document has been entered in the IPS 100 by a user who made a conscious decision to mark it as a piece of information which he and his peers would be likely to find useful in the future. This, along with the meta-information held, makes a Jasper IPS 100 a very rich source of information.
It has also been examined whether known Information Retrieval (IR) techniques can be beneficially applied to the Jasper IPS 100. In particular, the use of clustering has been under investigation.
Clustering Documents
Using known IR techniques, Jasper's term-document matrix can be used to calculate a similarity matrix for the documents identified in the Jasper IPS 100. The similarity matrix gives a measure of the similarity of documents identified in the store. For each pair of documents the Dice coefficient is calculated. For two documents Di and Dj, the Dice coefficient is given by:
$$\frac{2\,[D_i \cap D_j]}{[D_i] + [D_j]}$$
where [X] is the number of terms in X and [X ∩ Y] is the number of terms co-occurring in X and Y. This coefficient yields a number between 0 and 1. A coefficient of zero implies two documents have no terms in common, while a coefficient of 1 implies that the sets of terms occurring in each document are identical. The similarity matrix, Sim say, represents the similarity of each pair of documents in the store, so that for each pair of documents i and j:
$$\mathrm{Sim}(i,j) = \frac{2\,[D_i \cap D_j]}{[D_i] + [D_j]}$$
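The calculation of the document similarity matrix can be sketched as follows. This is an illustration only: each document is represented simply as a set of its terms, the function names and the three example documents are invented, and the value 0.0 for two empty documents is an assumption, since the text does not cover that case.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient: twice the shared terms over the total number of terms.

    Returns 0.0 when both sets are empty (an assumption; the text is silent here).
    """
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def similarity_matrix(term_sets):
    """Sim[i][j] is the Dice coefficient of documents i and j."""
    n = len(term_sets)
    return [[dice(term_sets[i], term_sets[j]) for j in range(n)] for i in range(n)]


# Invented example documents, each given as its set of terms.
docs = [
    {"virtual", "reality", "vrml"},
    {"virtual", "reality", "internet"},
    {"football", "manchester"},
]
sim = similarity_matrix(docs)
```

Here the first two documents share two of their three terms each, giving a coefficient of 2/3, while the third document shares no terms with the others and scores zero against both.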
This matrix can be used to create clusters of related documents automatically, using the hierarchical agglomerative clustering process described in "Hierarchic Agglomerative Clustering Methods for Automatic Document Classification" by Griffiths A et al in the Journal of Documentation, 40:3, September 1984, pp 175-205. In such a process, each document is initially placed in a cluster by itself and the two most similar such clusters are then combined into a larger cluster, for which similarities with each of the other clusters must then be computed.
This combination process is continued until only a single cluster of documents remains at the highest level.
The way in which similarity between clusters (as opposed to individual documents) is calculated can be varied. For a Jasper store, "complete-link clustering" can be employed. In complete-link clustering, the similarity between the least similar pair of documents from the two clusters is used as the cluster similarity.
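A minimal sketch of hierarchical agglomerative clustering with the complete-link criterion is given below. It is a naive illustration rather than the procedure of the cited paper: the function names and the 4 x 4 similarity matrix are invented, and ties between equally similar cluster pairs are resolved in favour of the first pair found.

```python
def complete_link(sim, a, b):
    """Cluster similarity: that of the least similar cross-cluster document pair."""
    return min(sim[i][j] for i in a for j in b)


def agglomerate(sim):
    """Repeatedly merge the two most similar clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples,
    where each cluster is a tuple of document indices.
    """
    clusters = [(i,) for i in range(len(sim))]   # each document starts alone
    history = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = complete_link(sim, clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        a, b = clusters[x], clusters[y]
        history.append((a, b, s))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a + b)
    return history


# Invented symmetric similarity matrix for four documents.
sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
]
history = agglomerate(sim)
```

In this example documents 0 and 1 merge first, then documents 2 and 3, and the final merge joins the two pairs at similarity 0.0 because documents 0 and 3 have nothing in common.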
The resulting cluster structures of the Jasper store can then be used to create a three-dimensional (3D) front end onto the Jasper system using the VRML
(Virtual Reality Modelling Language). (VRML is a known language for 3D
graphical spaces or virtual worlds networked via the global Internet and hyperlinked within the World Wide Web).
Clustering Keywords
Keywords (terms) occurring in relation to a particular Jasper document collection can also be clustered in a way which mirrors exactly the document clustering technique described above: a similarity matrix for the keywords in the Jasper store can be constructed which gives a measure of the 'similarity' of keywords in the store. For each pair of keywords, the Dice coefficient is calculated. For two keywords Ki and Kj, the Dice coefficient is given by:
$$\frac{2\,[K_i \cap K_j]}{[K_i] + [K_j]}$$
where [X] is the number of documents in which X occurs and [X ∩ Y] is the number of documents in which X and Y co-occur.
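The keyword similarity can be sketched by first inverting the store, so that each keyword maps to the set of documents in which it occurs, and then applying the Dice coefficient to those sets. The store contents and the function name below are invented for illustration.

```python
from collections import defaultdict


def keyword_similarity(doc_keywords):
    """Map each ordered keyword pair to the Dice coefficient of their document sets."""
    occurs_in = defaultdict(set)          # keyword -> ids of documents containing it
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            occurs_in[kw].add(doc_id)
    sim = {}
    for a in occurs_in:
        for b in occurs_in:
            if a != b:
                shared = len(occurs_in[a] & occurs_in[b])
                sim[a, b] = 2 * shared / (len(occurs_in[a]) + len(occurs_in[b]))
    return sim


# Invented store: document id -> keywords held for that document.
store = {
    "d1": {"virtual", "reality", "vrml"},
    "d2": {"virtual", "reality"},
    "d3": {"vrml", "internet"},
    "d4": {"football"},
}
sim = keyword_similarity(store)
```

In this store "virtual" and "reality" always occur together and so have similarity 1, whereas "virtual" and "vrml" co-occur in one document out of two each, giving 0.5.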
Once the similarity matrix for a Jasper store is calculated, however, it is not necessary to cluster the keywords as the documents were clustered. Instead it is possible to exploit the matrix itself in two ways, described below.
The first way is profile enhancement. Here, the user profile can be enhanced by using those keywords most similar to the keywords in the user's profile.
Thus for example, if the words virtual, reality and Internet are part of a user's profile but VRML is not, an enhanced profile might add VRML to the original profile (assuming VRML is clustered close to virtual, reality and Internet). In this way, documents containing VRML but not virtual, reality and Internet may be retrieved whereas they would not have been with the unenhanced profile.
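One possible reading of profile enhancement is sketched below. The ranking rule (mean similarity to the profile keywords), the `top_n` and `threshold` parameters and the similarity values are all assumptions made for the example; the text only states that the keywords most similar to the profile are used.

```python
def enhance_profile(profile, sim, top_n=1, threshold=0.3):
    """Append up to top_n non-profile keywords, ranked by mean similarity to the profile."""
    if not profile:
        return []
    candidates = {b for (_, b) in sim} - set(profile)

    def score(kw):
        return sum(sim.get((p, kw), 0.0) for p in profile) / len(profile)

    ranked = sorted(candidates, key=lambda kw: (-score(kw), kw))
    extra = [kw for kw in ranked if score(kw) >= threshold][:top_n]
    return list(profile) + extra


# Invented symmetric keyword similarities.
sim = {}
for a, b, s in [("virtual", "vrml", 0.6), ("reality", "vrml", 0.5),
                ("internet", "vrml", 0.4), ("internet", "football", 0.1)]:
    sim[a, b] = sim[b, a] = s

profile = ["virtual", "reality", "internet"]
enhanced = enhance_profile(profile, sim)
```

With these values VRML is added to the profile, while "football" falls below the threshold and is left out.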
Figure 9 shows an example network of keywords 900 which has been built from the keyword similarity matrix extracted from a current Jasper store. The algorithm is straightforward: given an initial starting keyword, find the four words
most similar to it from the similarity matrix. Link these four to the original word and repeat the process for each of the four new words. This can be repeated a number of times (in Figure 9, three times). Double lines 901 between two words indicate that both words occur in the other's four most similar keywords. One could of course attach the particular similarity coefficients to each link for finer-grained information concerning the degree of similarity between words.
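The network-building procedure can be sketched as a breadth-first expansion. The similarity values and function names are invented, the example uses two neighbours and two rounds rather than four and three to stay small, and ignoring keywords with zero similarity is an assumption.

```python
def most_similar(word, sim, k):
    """The k keywords most similar to word, ties broken alphabetically.

    Keywords with zero similarity are ignored (an assumption).
    """
    others = [(s, b) for (a, b), s in sim.items() if a == word and s > 0]
    return [b for s, b in sorted(others, key=lambda t: (-t[0], t[1]))[:k]]


def build_network(start, sim, k=4, rounds=3):
    """Return (links, double_links), each a set of frozenset({word, neighbour})."""
    directed = set()
    frontier, seen = [start], {start}
    for _ in range(rounds):
        next_frontier = []
        for word in frontier:
            for nb in most_similar(word, sim, k):
                directed.add((word, nb))
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    links = {frozenset(edge) for edge in directed}
    # A double link: each word is among the other's k most similar keywords.
    double = {frozenset((a, b)) for (a, b) in directed
              if a in most_similar(b, sim, k)}
    return links, double


# Invented symmetric keyword similarities.
sim = {}
for a, b, s in [("virtual", "reality", 0.9), ("virtual", "vrml", 0.5),
                ("virtual", "internet", 0.3), ("vrml", "internet", 0.6),
                ("internet", "www", 0.7), ("reality", "vrml", 0.4)]:
    sim[a, b] = sim[b, a] = s

links, double = build_network("virtual", sim, k=2, rounds=2)
```

In the result the link between "reality" and "vrml" is single, because "reality" is not among the two keywords most similar to "vrml"; the other three links are double.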
The second way is proactive searching. Jasper can use the keywords comprising a user's profile to search proactively for new WWW pages relevant to the user's interests, and can then present a list of new pages in which the user may be interested without the user having to carry out a search explicitly.
These proactive searches can be carried out by a Jasper system at some given interval, such as weekly. Clustering is useful here because a profile may reflect more than one interest. Consider, for example, the following user profile: Internet, WWW, html, football, Manchester, united, linguistics, parsing, pragmatics. Clearly, three separate interests are represented in the above profile, and searching on each separately is likely to yield far better results than merely entering the whole profile as a query for the given user. Clustering keywords from the document collection can automate the process of query generation for proactive searching by a user's Jasper agent.
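Splitting the example profile into separate queries might be sketched as follows. This is a deliberate simplification and not necessarily the clustering used in a Jasper system: keywords are grouped as connected components of a graph that links any two keywords whose similarity reaches a threshold. The similarity values, the threshold and the function names are invented.

```python
def split_profile(profile, sim, threshold=0.3):
    """Group profile keywords into sub-queries.

    Two keywords fall in the same group when a chain of pairwise similarities,
    each at least `threshold`, connects them.
    """
    groups = []
    unassigned = list(profile)
    while unassigned:
        group = [unassigned.pop(0)]
        grew = True
        while grew:
            grew = False
            for kw in unassigned[:]:
                if any(sim.get((kw, g), 0.0) >= threshold for g in group):
                    group.append(kw)
                    unassigned.remove(kw)
                    grew = True
        groups.append(group)
    return groups


# Invented symmetric keyword similarities for the profile in the text.
sim = {}
for a, b, s in [("Internet", "WWW", 0.8), ("WWW", "html", 0.7),
                ("football", "Manchester", 0.5), ("Manchester", "united", 0.9),
                ("linguistics", "parsing", 0.6), ("linguistics", "pragmatics", 0.5)]:
    sim[a, b] = sim[b, a] = s

profile = ["Internet", "WWW", "html", "football", "Manchester", "united",
           "linguistics", "parsing", "pragmatics"]
queries = split_profile(profile, sim)
```

Each resulting group can then be submitted as its own query, giving one search per interest instead of a single search on the whole profile.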
When the search results are obtained by Jasper, they can be summarised and matched against the user's profile in the usual way to give a prioritised list of new URLs along with locally held summaries.
Other text summarisers may be used in place of ConText. For instance, NetSumm is a summarising tool made available by British Telecommunications plc on the Internet, at http://www.labs.bt.com/innovate/informat/netsumm/index.htm.
Although described in relation to locating information via the Internet, embodiments of the present invention might be found useful for locating information on other systems, such as documents on a user's internal systems which are in HyperText.
Further to the inventive aspects of the present system set out in the introduction to this specification, the following should also be viewed as expressions of novel and advantageous features of the system:
A method of monitoring information inputs to a data store, the inputs being requested by any of a plurality of users, for the purpose of alerting a first user to an input by a second user in accordance with alert criteria determined at least in part by said first user, the method comprising:
i) storing a user profile for each user, which profile comprises at least one set of keywords and an identifier for the user;
ii) detecting a request by the second user for an information input to the data store;
iii) processing the request to generate the information input;
iv) comparing the information input with a keyword set from the user profile for the first user; and
v) in the event of a positive result from the comparison, transmitting an alert message addressed to the first user.
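Steps i) to v) can be illustrated with the following sketch. The class and method names, the in-memory "outbox" standing in for message transmission, and the rule that any shared keyword counts as a positive result are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str       # identifier for the user (step i)
    keywords: set      # one keyword set for the user (step i)


@dataclass
class DataStore:
    profiles: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)
    outbox: list = field(default_factory=list)   # stands in for transmitted alerts

    def add_profile(self, profile):
        self.profiles[profile.user_id] = profile

    def handle_request(self, requesting_user, url, page_keywords):
        """Handle one detected storage request (step ii) by requesting_user."""
        # Step iii: generate the information input from the request.
        info = {"url": url, "keywords": set(page_keywords), "by": requesting_user}
        self.inputs.append(info)
        for profile in self.profiles.values():
            if profile.user_id == requesting_user:
                continue
            # Step iv: compare the input with the other user's keyword set.
            matched = profile.keywords & info["keywords"]
            # Step v: on a positive result, address an alert to that user.
            if matched:
                self.outbox.append((profile.user_id, url, sorted(matched)))
        return info


# Invented users, keywords and URL.
store = DataStore()
store.add_profile(UserProfile("alice", {"vrml", "internet"}))
store.add_profile(UserProfile("bob", {"football"}))
store.handle_request("bob", "http://example.org/vrml-intro", ["vrml", "graphics"])
```

After the request, the outbox holds a single alert addressed to alice, since her keyword set shares "vrml" with the page that bob stored.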
A method as above which further comprises monitoring information input requests by respective users and, on detection of a significant change in the information input requests made by a particular user, changing the keyword set used in step iv) for that particular user in the event of an information input request by a different user.
A method as above wherein each information input includes at least one set of keywords associated with a respective document, and wherein the method further comprises the steps of generating a similarity matrix for at least two of said sets of keywords, and using said similarity matrix to extend the scope of a keyword set from a user profile in step iv) so as to obtain an increase in the number of positive results for the associated user.
A method as above which further comprises the step of applying a clustering algorithm to a keyword set from a user profile so as to divide the keyword set into sub-keyword sets and applying at least one of the sub-keyword sets in place of the full keyword set in step iv).
Claims (14)
1. An information access system, for accessing sets of information stored in a distributed manner and accessible by means of a communications network, the access system having:
i) an input for receiving a set of information;
ii) data storage, or means to access data storage, for storing at least one set of predetermined keywords;
iii) generation means, triggerable to generate at least one set of meta-information from the set of information received at the input, the meta-information including at least a pointer for the set of information when stored in said distributed manner, and to store said set of meta-information in the data storage;
iv) comparison means for comparing at least one of said at least one set of keywords with said at least one set of meta-information; and
v) means for transmitting an alert message in dependence upon the result of the comparison.
2. A system according to claim 1, wherein said at least one set of predetermined keywords is associated with a specified user and the system includes means to address the alert message to that user.
3. A system according to claim 1 or claim 2, for use by a plurality of users, each of the plurality of users having at least one associated set of keywords stored in said data storage, wherein the system is triggerable, on activation of said generation means to generate a set of meta-information by a first user, to compare said at least one set of meta-information with at least one set of predetermined keywords associated with a second user and to address an alert message to said second user, in dependence upon the result of the comparison, alerting said second user to the received set of information.
4. A system according to any one of claims 1 to 3, wherein the system is provided with a thesaurus of synonyms for said sets of keywords so as to increase the number of positive matches with the sets of keywords.
5. A system according to any one of claims 1 to 4, wherein the system is provided with monitoring means arranged to monitor information sets selected for input by a user, to detect a change in the information sets so selected and to modify or substitute a keyword set associated with that user on detection of the change.
6. A system according to any one of claims 1 to 4, wherein the system is provided with means to change a keyword set associated with a user in response to a request by that user.
7. A system according to any one of claims 1 to 6, wherein the system is further provided with at least one data clustering means arranged to operate according to at least one data clustering algorithm and wherein said system is further arranged to apply the data clustering means to one or more keyword sets so as to modify the keyword set or sets prior to comparison with a set of said meta-information.
8. A system according to any one of claims 1 to 7, comprising a plurality of software agents, each agent comprising elements i) to v) inclusive of a system according to claim 1 and each agent being allocated to a different respective user of the system.
9. A system according to any one of claims 1 to 8, wherein said pointer comprises at least an address for accessing said information set by means of said communications network.
10. A system according to any one of claims 1 to 9, wherein said generation means comprises summary means for generating a summary of said information set.
11. A method of monitoring information sets stored in a distributed manner and accessible by means of a communications network, for the purpose of alerting a first user in accordance with alert criteria determined at least in part by said first user to an information set identified by a second user, the method comprising:
i) storing a user profile for each user, which profile comprises at least one set of keywords and an identifier for the user;
ii) detecting a request by the second user to store, in a data store, information relating to said identified information set;
iii) in response to the request, generating a set of meta-information, dependent on said identified information set, comprising at least a pointer to said identified information set when stored in said distributed manner;
iv) comparing the generated set of meta-information with a keyword set from the user profile for the first user; and
v) in dependence upon the result from the comparison, transmitting an alert message addressed to the first user.
12. A method according to claim 11, which further comprises monitoring information input requests by respective users and, on detection of a significant change in the information input requests made by a particular user, changing the keyword set used in step iv) for that particular user in the event of an information input request by a different user.
13. A method according to claim 11 or claim 12, wherein each information input includes at least one set of keywords associated with a respective document, and wherein the method further comprises the steps of generating a similarity matrix for at least two of said sets of keywords, and using said similarity matrix to extend the scope of a keyword set from a user profile in step iv) so as to obtain an increase in the number of positive results for an associated user.
14. A method according to claim 11 or claim 12, which further comprises the step of applying a clustering algorithm to a keyword set from a user profile so as to divide the keyword set into sub-keyword sets and applying at least one of the sub-keyword sets in place of the full keyword set in step iv).
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP95300420 | 1995-01-23 | ||
EP95300420.7 | 1995-01-23 | ||
PCT/GB1996/000132 WO1996023265A1 (en) | 1995-01-23 | 1996-01-23 | Methods and/or systems for accessing information |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2210581A1 CA2210581A1 (en) | 1996-08-01 |
CA2210581C true CA2210581C (en) | 2002-03-26 |
Family
ID=8221064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002210581A Expired - Fee Related CA2210581C (en) | 1995-01-23 | 1996-01-23 | Methods and/or systems for accessing information |
Country Status (14)
Country | Link |
---|---|
US (1) | US6289337B1 (en) |
EP (2) | EP0807291B1 (en) |
JP (1) | JPH10513587A (en) |
KR (1) | KR19980701598A (en) |
CN (1) | CN1169195A (en) |
AU (1) | AU707050B2 (en) |
BR (1) | BR9606931A (en) |
CA (1) | CA2210581C (en) |
DE (1) | DE69606021T2 (en) |
FI (1) | FI973080A (en) |
HK (1) | HK1004832A1 (en) |
NO (1) | NO973372L (en) |
NZ (1) | NZ298861A (en) |
WO (1) | WO1996023265A1 (en) |
Families Citing this family (133)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6112186A (en) * | 1995-06-30 | 2000-08-29 | Microsoft Corporation | Distributed system for facilitating exchange of user information and opinion using automated collaborative filtering |
US6049777A (en) * | 1995-06-30 | 2000-04-11 | Microsoft Corporation | Computer-implemented collaborative filtering based method for recommending an item to a user |
US6041311A (en) * | 1995-06-30 | 2000-03-21 | Microsoft Corporation | Method and apparatus for item recommendation using automated collaborative filtering |
US6092049A (en) * | 1995-06-30 | 2000-07-18 | Microsoft Corporation | Method and apparatus for efficiently recommending items using automated collaborative filtering and feature-guided automated collaborative filtering |
US7035914B1 (en) | 1996-01-26 | 2006-04-25 | Simpleair Holdings, Inc. | System and method for transmission of data |
US6076109A (en) | 1996-04-10 | 2000-06-13 | Lextron, Systems, Inc. | Simplified-file hyper text protocol |
CA2184518A1 (en) * | 1996-08-30 | 1998-03-01 | Jim Reed | Real time structured summary search engine |
GB2317302A (en) * | 1996-09-12 | 1998-03-18 | Sharp Kk | A distributed information system |
US6370563B2 (en) * | 1996-09-30 | 2002-04-09 | Fujitsu Limited | Chat system terminal device therefor display method of chat system and recording medium |
EP0848337A1 (en) * | 1996-12-12 | 1998-06-17 | SONY DEUTSCHLAND GmbH | Server with automatic document assembly |
JP3579204B2 (en) | 1997-01-17 | 2004-10-20 | 富士通株式会社 | Document summarizing apparatus and method |
US6480600B1 (en) | 1997-02-10 | 2002-11-12 | Genesys Telecommunications Laboratories, Inc. | Call and data correspondence in a call-in center employing virtual restructuring for computer telephony integrated functionality |
US6104802A (en) | 1997-02-10 | 2000-08-15 | Genesys Telecommunications Laboratories, Inc. | In-band signaling for routing |
US7031442B1 (en) | 1997-02-10 | 2006-04-18 | Genesys Telecommunications Laboratories, Inc. | Methods and apparatus for personal routing in computer-simulated telephony |
GB2324627A (en) * | 1997-03-04 | 1998-10-28 | Talkway Inc | Interface for computer discussion technologies |
AU6555798A (en) * | 1997-03-14 | 1998-09-29 | Firefly Network, Inc. | Method and apparatus for efficiently recommending items using automated collaborative filtering and feature-guided automated collaborative filtering |
JPH10283240A (en) * | 1997-04-09 | 1998-10-23 | Canon Electron Inc | Information filing device, information file recording method, and storage medium with information file recording procedure stored |
US5966711A (en) * | 1997-04-15 | 1999-10-12 | Alpha Gene, Inc. | Autonomous intelligent agents for the annotation of genomic databases |
SE510438C2 (en) * | 1997-07-02 | 1999-05-25 | Telia Ab | Method and system for collecting and distributing information over the Internet |
JPH1125125A (en) * | 1997-07-08 | 1999-01-29 | Canon Inc | Network information retrieving device, its method and storage medium |
WO1999005621A1 (en) * | 1997-07-22 | 1999-02-04 | Microsoft Corporation | System for processing textual inputs using natural language processing techniques |
US5933822A (en) * | 1997-07-22 | 1999-08-03 | Microsoft Corporation | Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision |
US6353827B1 (en) * | 1997-09-04 | 2002-03-05 | British Telecommunications Public Limited Company | Methods and/or systems for selecting data sets |
US6985943B2 (en) | 1998-09-11 | 2006-01-10 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for extended management of state and interaction of a remote knowledge worker from a contact center |
US6711611B2 (en) | 1998-09-11 | 2004-03-23 | Genesis Telecommunications Laboratories, Inc. | Method and apparatus for data-linking a mobile knowledge worker to home communication-center infrastructure |
DE69805437T2 (en) * | 1997-10-21 | 2002-12-12 | British Telecomm Public Ltd Co | INFORMATION MANAGEMENT SYSTEM |
USRE46528E1 (en) | 1997-11-14 | 2017-08-29 | Genesys Telecommunications Laboratories, Inc. | Implementation of call-center outbound dialing capability at a telephony network level |
SE511584C2 (en) * | 1998-01-15 | 1999-10-25 | Ericsson Telefon Ab L M | information Routing |
IL125432A (en) * | 1998-01-30 | 2010-11-30 | Easynet Access Inc | Personalized internet interaction |
IL123129A (en) | 1998-01-30 | 2010-12-30 | Aviv Refuah | Www addressing |
US6078924A (en) * | 1998-01-30 | 2000-06-20 | Aeneid Corporation | Method and apparatus for performing data collection, interpretation and analysis, in an information platform |
US7907598B2 (en) | 1998-02-17 | 2011-03-15 | Genesys Telecommunication Laboratories, Inc. | Method for implementing and executing communication center routing strategies represented in extensible markup language |
US6535492B2 (en) * | 1999-12-01 | 2003-03-18 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for assigning agent-led chat sessions hosted by a communication center to available agents based on message load and agent skill-set |
US6332154B2 (en) | 1998-09-11 | 2001-12-18 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for providing media-independent self-help modules within a multimedia communication-center customer interface |
SE512106C2 (en) * | 1998-03-10 | 2000-01-24 | Telia Ab | Improvement of, or with regard to, telecommunications transmission systems |
SE512107C2 (en) * | 1998-03-10 | 2000-01-24 | Telia Ab | Improvement of, or with regard to, telecommunications transmission systems |
US6421675B1 (en) | 1998-03-16 | 2002-07-16 | S. L. I. Systems, Inc. | Search engine |
DE19811352C2 (en) * | 1998-03-16 | 2000-01-13 | Siemens Ag | System and method for searching on networked computers with information stocks using software agents |
JP4081175B2 (en) * | 1998-03-19 | 2008-04-23 | 富士通株式会社 | Search processing apparatus and storage medium |
US6658453B1 (en) | 1998-05-28 | 2003-12-02 | America Online, Incorporated | Server agent system |
EP0967545A1 (en) * | 1998-06-23 | 1999-12-29 | BRITISH TELECOMMUNICATIONS public limited company | A system and method for the co-ordination and control of information supply using a distributed multi-agent platform |
US6694357B1 (en) * | 1998-07-02 | 2004-02-17 | Copernican Technologies, Inc. | Accessing, viewing and manipulation of references to non-modifiable data objects |
JP4522583B2 (en) * | 1998-07-08 | 2010-08-11 | ブリティッシュ・テレコミュニケーションズ・パブリック・リミテッド・カンパニー | Requirements matching server, requirements matching system, electronic purchasing apparatus using them, electronic transaction system and method |
EP0971298A1 (en) * | 1998-07-08 | 2000-01-12 | BRITISH TELECOMMUNICATIONS public limited company | Requirements matching |
US6484155B1 (en) * | 1998-07-21 | 2002-11-19 | Sentar, Inc. | Knowledge management system for performing dynamic distributed problem solving |
WO2000008539A1 (en) * | 1998-08-03 | 2000-02-17 | Fish Robert D | Self-evolving database and method of using same |
US6266668B1 (en) * | 1998-08-04 | 2001-07-24 | Dryken Technologies, Inc. | System and method for dynamic data-mining and on-line communication of customized information |
WO2000015847A2 (en) * | 1998-09-11 | 2000-03-23 | Gene Logic, Inc. | Genomic knowledge discovery |
USRE46153E1 (en) | 1998-09-11 | 2016-09-20 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus enabling voice-based management of state and interaction of a remote knowledge worker in a contact center environment |
US6115709A (en) | 1998-09-18 | 2000-09-05 | Tacit Knowledge Systems, Inc. | Method and system for constructing a knowledge profile of a user having unrestricted and restricted access portions according to respective levels of confidence of content of the portions |
AU5822899A (en) | 1998-09-18 | 2000-04-10 | Tacit Knowledge Systems | Method and apparatus for querying a user knowledge profile |
US8380875B1 (en) | 1998-09-18 | 2013-02-19 | Oracle International Corporation | Method and system for addressing a communication document for transmission over a network based on the content thereof |
US6154783A (en) * | 1998-09-18 | 2000-11-28 | Tacit Knowledge Systems | Method and apparatus for addressing an electronic document for transmission over a network |
US6598046B1 (en) * | 1998-09-29 | 2003-07-22 | Qwest Communications International Inc. | System and method for retrieving documents responsive to a given user's role and scenario |
US6768996B1 (en) * | 1998-10-08 | 2004-07-27 | Hewlett-Packard Development Company, L.P. | System and method for retrieving an abstracted portion of a file without regard to the operating system of the current host computer |
US8121891B2 (en) | 1998-11-12 | 2012-02-21 | Accenture Global Services Gmbh | Personalized product report |
US7076504B1 (en) | 1998-11-19 | 2006-07-11 | Accenture Llp | Sharing a centralized profile |
US7062707B1 (en) * | 1998-12-08 | 2006-06-13 | Inceptor, Inc. | System and method of providing multiple items of index information for a single data object |
FR2787902B1 (en) * | 1998-12-23 | 2004-07-30 | France Telecom | MODEL AND METHOD FOR IMPLEMENTING A RATIONAL DIALOGUE AGENT, SERVER AND MULTI-AGENT SYSTEM FOR IMPLEMENTATION |
AU4004500A (en) * | 1999-02-26 | 2000-09-14 | Webivore Knowledge Systems,LLC. | Network information collection tool |
US6449632B1 (en) * | 1999-04-01 | 2002-09-10 | Bar Ilan University Nds Limited | Apparatus and method for agent-based feedback collection in a data broadcasting network |
US6304864B1 (en) * | 1999-04-20 | 2001-10-16 | Textwise Llc | System for retrieving multimedia information from the internet using multiple evolving intelligent agents |
KR19990068686A (en) * | 1999-06-11 | 1999-09-06 | 이판정 | Method for searching WWW site according to real name and providing information |
US6480858B1 (en) * | 1999-06-30 | 2002-11-12 | Microsoft Corporation | Method and apparatus for finding nearest logical record in a hash table |
KR100359233B1 (en) * | 1999-07-15 | 2002-11-01 | 학교법인 한국정보통신학원 | Method for extracing web information and the apparatus therefor |
US7013300B1 (en) * | 1999-08-03 | 2006-03-14 | Taylor David C | Locating, filtering, matching macro-context from indexed database for searching context where micro-context relevant to textual input by user |
US7219073B1 (en) | 1999-08-03 | 2007-05-15 | Brandnamestores.Com | Method for extracting information utilizing a user-context-based search engine |
US6513036B2 (en) * | 1999-08-13 | 2003-01-28 | Mindpass A/S | Method and apparatus for searching and presenting search result from one or more information sources based on context representations selected from the group of other users |
EP1081606A3 (en) * | 1999-08-31 | 2001-05-02 | comMouse AG | Method and mouse with display for navigation in a computer network |
US6321228B1 (en) * | 1999-08-31 | 2001-11-20 | Powercast Media, Inc. | Internet search system for retrieving selected results from a previous search |
WO2001033419A2 (en) * | 1999-10-26 | 2001-05-10 | Jean Poncet | Access by content based computer system |
US7929978B2 (en) | 1999-12-01 | 2011-04-19 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for providing enhanced communication capability for mobile devices on a virtual private network |
NL1013997C2 (en) * | 1999-12-30 | 2001-07-03 | Cons Health Entrepreneurs Bv | Method for collecting and supplying information. |
GB2358717A (en) * | 2000-01-25 | 2001-08-01 | Gordon Ross | Methods for enhanced information exchange and transactions within multi-device environments |
US7720833B1 (en) | 2000-02-02 | 2010-05-18 | Ebay Inc. | Method and system for automatically updating search results on an online auction site |
AU2000235513A1 (en) * | 2000-03-31 | 2001-10-15 | Kapow Aps | Method of retrieving attributes from at least two data sources |
EP1264265A2 (en) * | 2000-04-18 | 2002-12-11 | Hewlett-Packard Company | Activity report generation |
FI111879B (en) * | 2000-05-08 | 2003-09-30 | Sonera Oyj | Management of user profile information in a telecommunications network |
EP1158419A1 (en) * | 2000-05-15 | 2001-11-28 | Gabriele Huss | Method and apparatus for observing user orientated information from data networks |
CA2311857A1 (en) | 2000-05-16 | 2001-11-16 | Wilson Grad Conn | System and method to facilitate sharing of information |
DE10024368A1 (en) * | 2000-05-17 | 2001-11-22 | Michael Fahrmair | Locating selection of information products involves accessing information product database containing data about information products with at least location, category information per product |
GB2362972A (en) * | 2000-06-02 | 2001-12-05 | Res Summary Com | An internet based searchable database for up to date financial executive summaries with links to full documents |
KR100378642B1 (en) * | 2000-07-06 | 2003-03-31 | 김시환 | Information searching system and method thereof |
AU2001276920A1 (en) * | 2000-07-17 | 2002-01-30 | Blue Ripple, Inc. | Content distribution |
US7054900B1 (en) * | 2000-08-18 | 2006-05-30 | Netzero, Inc. | Automatic, profile-free web page recommendation |
KR20020017622A (en) * | 2000-08-31 | 2002-03-07 | 김종민 | Community service system in internet environment and method thereof |
AUPR033800A0 (en) * | 2000-09-25 | 2000-10-19 | Telstra R & D Management Pty Ltd | A document categorisation system |
FR2814829B1 (en) * | 2000-09-29 | 2003-08-15 | Vivendi Net | METHOD AND SYSTEM FOR OPTIMIZING CONSULTATIONS OF DATA SETS BY A PLURALITY OF CLIENTS |
DE10053738A1 (en) * | 2000-10-30 | 2002-05-02 | Starzone Gmbh | Process for linking different target groups as well as a suitable system for this |
US6910045B2 (en) | 2000-11-01 | 2005-06-21 | Collegenet, Inc. | Automatic data transmission in response to content of electronic forms satisfying criteria |
WO2002041178A1 (en) * | 2000-11-20 | 2002-05-23 | British Telecommunications Public Limited Company | Information provider |
JP2002163546A (en) * | 2000-11-27 | 2002-06-07 | Matsushita Electric Ind Co Ltd | Information delivery system and information delivery method |
US20020161757A1 (en) * | 2001-03-16 | 2002-10-31 | Jeffrey Mock | Simultaneous searching across multiple data sets |
US7200556B2 (en) * | 2001-05-22 | 2007-04-03 | Siemens Communications, Inc. | Methods and apparatus for accessing and processing multimedia messages stored in a unified multimedia mailbox |
US20030028603A1 (en) * | 2001-08-02 | 2003-02-06 | Siemens Information And Communication Networks, Inc. | Methods and apparatus for automatically summarizing messages stored in a unified multimedia mailboxes |
US7260607B2 (en) * | 2001-08-02 | 2007-08-21 | Siemens Communications, Inc. | Methods and apparatus for performing media/device sensitive processing of messages stored in unified multimedia and plain text mailboxes |
AUPR710801A0 (en) * | 2001-08-17 | 2001-09-06 | Gunrock Knowledge Concepts Pty Ltd | Knowledge management system |
US8046343B2 (en) * | 2001-09-29 | 2011-10-25 | Siebel Systems, Inc. | Computing system and method for automatic completion of pick field |
US7814043B2 (en) * | 2001-11-26 | 2010-10-12 | Fujitsu Limited | Content information analyzing method and apparatus |
AU2006203729B2 (en) * | 2001-11-26 | 2008-07-31 | Fujitsu Limited | Information analyzing method and apparatus |
US7333966B2 (en) | 2001-12-21 | 2008-02-19 | Thomson Global Resources | Systems, methods, and software for hyperlinking names |
DE10208959B4 (en) * | 2002-02-28 | 2006-10-12 | Equero Future Net Technologies Ag | Method and device for detecting and evaluating information stored in a computer network |
US20040024756A1 (en) * | 2002-08-05 | 2004-02-05 | John Terrell Rickard | Search engine for non-textual data |
US9805373B1 (en) | 2002-11-19 | 2017-10-31 | Oracle International Corporation | Expertise services platform |
JP2006512693A (en) | 2002-12-30 | 2006-04-13 | トムソン コーポレイション | A knowledge management system for law firms. |
US8055669B1 (en) * | 2003-03-03 | 2011-11-08 | Google Inc. | Search queries improved based on query semantic information |
US7925984B2 (en) * | 2003-03-31 | 2011-04-12 | International Business Machines Corporation | Remote configuration of intelligent software agents |
WO2004097643A2 (en) | 2003-04-29 | 2004-11-11 | University Of Strathclyde | Monitoring software |
US20040230564A1 (en) * | 2003-05-16 | 2004-11-18 | Horatiu Simon | Filtering algorithm for information retrieval systems |
US8010484B2 (en) * | 2003-06-16 | 2011-08-30 | Sap Aktiengesellschaft | Generating data subscriptions based on application data |
US7966260B2 (en) * | 2003-06-16 | 2011-06-21 | Sap Aktiengesellschaft | Generating data subscriptions based on application data |
US7009369B2 (en) * | 2003-07-14 | 2006-03-07 | Texas Instruments Incorporated | Advanced monitoring algorithm for regulated power systems with single output flag |
US7647327B2 (en) | 2003-09-24 | 2010-01-12 | Hewlett-Packard Development Company, L.P. | Method and system for implementing storage strategies of a file autonomously of a user |
WO2005055090A1 (en) * | 2003-12-01 | 2005-06-16 | Metanav Corporation | Dynamic keyword processing system and method for user oriented internet navigation |
JP4200933B2 (en) * | 2004-04-27 | 2008-12-24 | コニカミノルタホールディングス株式会社 | Information retrieval device |
US7716219B2 (en) * | 2004-07-08 | 2010-05-11 | Yahoo! Inc. | Database search system and method of determining a value of a keyword in a search |
JP4524640B2 (en) * | 2005-03-31 | 2010-08-18 | ソニー株式会社 | Information processing apparatus and method, and program |
US8429167B2 (en) | 2005-08-08 | 2013-04-23 | Google Inc. | User-context-based search engine |
US8027876B2 (en) * | 2005-08-08 | 2011-09-27 | Yoogli, Inc. | Online advertising valuation apparatus and method |
US9008075B2 (en) | 2005-12-22 | 2015-04-14 | Genesys Telecommunications Laboratories, Inc. | System and methods for improving interaction routing performance |
US20070238789A1 (en) * | 2006-03-31 | 2007-10-11 | Chin-Ming Chang | Prednisolone acetate compositions |
US7735010B2 (en) | 2006-04-05 | 2010-06-08 | Lexisnexis, A Division Of Reed Elsevier Inc. | Citation network viewer and method |
WO2008107895A2 (en) * | 2007-03-08 | 2008-09-12 | Technion Research And Development Foundation Ltd | Method for delivering query responses |
WO2009036796A1 (en) * | 2007-09-12 | 2009-03-26 | Admar Informatik Marti | Method for creation of a profile of a user of a data processing system |
US8577930B2 (en) | 2008-08-20 | 2013-11-05 | Yahoo! Inc. | Measuring topical coherence of keyword sets |
CN102725739A (en) * | 2009-05-18 | 2012-10-10 | 西山修平 | Distributed database system by sharing or replicating the meta information on memory caches |
US9235563B2 (en) * | 2009-07-02 | 2016-01-12 | Battelle Memorial Institute | Systems and processes for identifying features and determining feature associations in groups of documents |
US8543381B2 (en) | 2010-01-25 | 2013-09-24 | Holovisions LLC | Morphing text by splicing end-compatible segments |
US9183308B1 (en) * | 2010-05-28 | 2015-11-10 | Sri International | Method and apparatus for searching the internet |
US8832655B2 (en) | 2011-09-29 | 2014-09-09 | Accenture Global Services Limited | Systems and methods for finding project-related information by clustering applications into related concept categories |
FR2988192B1 (en) * | 2012-03-19 | 2016-01-01 | Syneria | METHOD AND SYSTEM FOR DEVELOPING CONSULTATION APPLICATIONS OF CONTENT AND SERVICES ON A TELECOMMUNICATION, DISTRIBUTION AND EXECUTION NETWORK OF SUCH APPLICATIONS ON MULTIPLE APPARATUSES. |
EP2973045A4 (en) | 2013-03-15 | 2017-03-08 | Robert Haddock | Intelligent internet system with adaptive user interface providing one-step access to knowledge |
WO2019027259A1 (en) * | 2017-08-01 | 2019-02-07 | Samsung Electronics Co., Ltd. | Apparatus and method for providing summarized information using an artificial intelligence model |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5384701A (en) * | 1986-10-03 | 1995-01-24 | British Telecommunications Public Limited Company | Language translation system |
JPH021057A (en) * | 1988-01-20 | 1990-01-05 | Ricoh Co Ltd | Document retrieving device |
JP2783558B2 (en) * | 1988-09-30 | 1998-08-06 | 株式会社東芝 | Summary generation method and summary generation device |
US5408655A (en) | 1989-02-27 | 1995-04-18 | Apple Computer, Inc. | User interface system and method for traversing a database |
US5794001A (en) | 1989-06-30 | 1998-08-11 | Massachusetts Institute Of Technology | Object-oriented computer user interface |
JPH03122770A (en) * | 1989-10-05 | 1991-05-24 | Ricoh Co Ltd | Method for retrieving keyword associative document |
US5448727A (en) * | 1991-04-30 | 1995-09-05 | Hewlett-Packard Company | Domain based partitioning and reclustering of relations in object-oriented relational database management systems |
JP2804403B2 (en) * | 1991-05-16 | 1998-09-24 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Question answering system |
US5428778A (en) | 1992-02-13 | 1995-06-27 | Office Express Pty. Ltd. | Selective dissemination of information |
US5446891A (en) | 1992-02-26 | 1995-08-29 | International Business Machines Corporation | System for adjusting hypertext links with weighed user goals and activities |
US5537586A (en) * | 1992-04-30 | 1996-07-16 | Individual, Inc. | Enhanced apparatus and methods for retrieving and selecting profiled textural information records from a database of defined category structures |
DE69432503T2 (en) | 1993-10-08 | 2003-12-24 | Ibm | Information archiving system with object-dependent functionality |
JP2682811B2 (en) * | 1994-03-22 | 1997-11-26 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Data storage management system and method |
US5619615A (en) | 1994-07-22 | 1997-04-08 | Bay Networks, Inc. | Method and apparatus for identifying an agent running on a device in a computer network |
US5623652A (en) | 1994-07-25 | 1997-04-22 | Apple Computer, Inc. | Method and apparatus for searching for information in a network and for controlling the display of searchable information on display devices in the network |
US5680530A (en) * | 1994-09-19 | 1997-10-21 | Lucent Technologies Inc. | Graphical environment for interactively specifying a target system |
US5717923A (en) | 1994-11-03 | 1998-02-10 | Intel Corporation | Method and apparatus for dynamically customizing electronic information to individual end users |
US5694594A (en) | 1994-11-14 | 1997-12-02 | Chang; Daniel | System for linking hypermedia data objects in accordance with associations of source and destination data objects and similarity threshold without using keywords or link-defining terms |
US5758257A (en) | 1994-11-29 | 1998-05-26 | Herz; Frederick | System and method for scheduling broadcast of and access to video programs and other data using customer profiles |
JPH0926970A (en) | 1994-12-20 | 1997-01-28 | Sun Microsyst Inc | Method and apparatus for execution by computer for retrieval of information |
US5530852A (en) | 1994-12-20 | 1996-06-25 | Sun Microsystems, Inc. | Method for extracting profiles and topics from a first file written in a first markup language and generating files in different markup languages containing the profiles and topics for use in accessing data described by the profiles and topics |
JPH08297669A (en) | 1994-12-27 | 1996-11-12 | Internatl Business Mach Corp <Ibm> | System and method for automatic link of plurality of parts at inside of composite document |
US5649186A (en) | 1995-08-07 | 1997-07-15 | Silicon Graphics Incorporated | System and method for a computer-based dynamic information clipping service |
US5745938A (en) | 1996-08-30 | 1998-05-05 | Westvaco Corporation | Rescue board |

Date | Country | Application number | Publication | Status |
---|---|---|---|---|
1996-01-23 | BR | BR9606931A | BR9606931A | Unknown |
1996-01-23 | DE | DE69606021T | DE69606021T2 | Expired - Lifetime |
1996-01-23 | NZ | NZ298861A | NZ298861A | IP Right Cessation |
1996-01-23 | EP | EP96900645A | EP0807291B1 | Expired - Lifetime |
1996-01-23 | CN | CN96191566A | CN1169195A | Pending |
1996-01-23 | EP | EP99113304A | EP0953920A3 | Withdrawn |
1996-01-23 | KR | KR1019970704990A | KR19980701598A | Application Discontinuation |
1996-01-23 | AU | AU44549/96A | AU707050B2 | Ceased |
1996-01-23 | WO | PCT/GB1996/000132 | WO1996023265A1 | Search and Examination |
1996-01-23 | CA | CA002210581A | CA2210581C | Expired - Fee Related |
1996-01-23 | JP | JP8522713A | JPH10513587A | Pending |
1997-07-22 | NO | NO973372A | NO973372L | Application Discontinuation |
1997-07-22 | FI | FI973080A | FI973080A | Unknown |
1998-05-08 | HK | HK98104009A | HK1004832A1 | IP Right Cessation |
1999-07-12 | US | US09/351,633 | US6289337B1 | Expired - Lifetime |
Also Published As
Publication number | Publication date |
---|---|
JPH10513587A (en) | 1998-12-22 |
FI973080A0 (en) | 1997-07-22 |
NO973372L (en) | 1997-09-22 |
DE69606021D1 (en) | 2000-02-10 |
AU4454996A (en) | 1996-08-14 |
EP0953920A2 (en) | 1999-11-03 |
KR19980701598A (en) | 1998-05-15 |
EP0807291A1 (en) | 1997-11-19 |
FI973080A (en) | 1997-07-22 |
AU707050B2 (en) | 1999-07-01 |
WO1996023265A1 (en) | 1996-08-01 |
BR9606931A (en) | 1997-11-11 |
CA2210581A1 (en) | 1996-08-01 |
CN1169195A (en) | 1997-12-31 |
DE69606021T2 (en) | 2000-08-03 |
NO973372D0 (en) | 1997-07-22 |
US6289337B1 (en) | 2001-09-11 |
EP0953920A3 (en) | 2005-06-29 |
EP0807291B1 (en) | 2000-01-05 |
NZ298861A (en) | 1999-01-28 |
HK1004832A1 (en) | 1998-12-11 |
MX9705582A (en) | 1997-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2210581C (en) | | Methods and/or systems for accessing information |
US5931907A (en) | | Software agent for comparing locally accessible keywords with meta-information and having pointers associated with distributed information |
CA2302264C (en) | | Methods and/or systems for selecting data sets |
CA2281645C (en) | | System and method for semiotically processing text |
EP0958541B1 (en) | | Intelligent network browser using incremental conceptual indexer |
US5920859A (en) | | Hypertext document retrieval system and method |
WO2000067159A2 (en) | | System and method for searching and recommending documents in a collection using shared bookmarks |
Davies et al. | | Jasper: communicating information agents for WWW |
US20040015485A1 (en) | | Method and apparatus for improved internet searching |
Davies et al. | | Information agents for the world wide web |
O’Riordan et al. | | Information filtering and retrieval: An overview |
Eichmann | | Advances in network information discovery and retrieval |
MXPA97005582A (en) | | Methods and/or systems to access information |
Davies et al. | | Networked information management |
Davies et al. | | Networked information management |
Wondergem | | INdex navigator for searching and exploring the WWW |
Salerno et al. | | Buddy: fusing multiple search results together |
Davies et al. | | Networked Information Management |
Menon | | Web crawler indexing: An approach by clustering |
Sheldon et al. | | Content Routing for Distributed Information Servers |
Eskicioğlu | | A Search Engine for Turkish with Stemming |
Yuwono | | Search and Ranking Algorithms for Locating Resources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | EEER | Examination request | |
| | MKLA | Lapsed | Effective date: 20160125 |