WO2001080070A2

WO2001080070A2 - Search engine with search task model and interactive search task-refinement process

Info

Publication number: WO2001080070A2
Application number: PCT/EP2001/003945
Authority: WO
Inventors: James D. Schaffer; Kwok P. Lee; Paul Rutter
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2000-04-13
Filing date: 2001-04-06
Publication date: 2001-10-25
Also published as: JP2004515829A; WO2001080070A3; KR20020019079A

Abstract

A search engine provides an intelligent assistant that accepts a task model definition for each search. The task model can be compared to the result of the search and used to refine the search, either by requesting additional input or automatically. The general approach of a task model is to determine, either directly or passively, whether the search results are a good match for what the user is seeking. This way the system has something to compare with the search results the user retrieves. As is well known, much frustration can attend research into complex data sets. The above provides a mechanism that allows a user to specify information, or the system to infer information, that can be compared to a search result or used to refine a search before it is done. As shown, this can serve as the basis for an intelligent assistant that gives the user opportunities to refine his/her search by offering alternative search terms.

Description

Search engine with search task model and interactive search task-refinement process

BACKGROUND OF THE INVENTION

Field of the Invention

The invention relates to search engines such as those used to search the Internet, electronic program guides, indexed document databases, etc. More particularly, it relates to searches that employ queries that can contain any of a large number of alternative search terms, such as those used to search keyword and full-text searchable databases. Even more particularly, it relates to such search engines in which a task model is specified to improve the relevance of the results returned by a search query.

Background

Search engines, as part of their process, generate queries and use them to retrieve records from a database. A variety of such engines are well known. Many allow a user to enter a search definition (e.g., a search string) consisting of one or more terms, each drawn from a very large set of possible terms. For example, keyword and full-text search queries can be formed from every word in the dictionary. Many of these alternatives also overlap or are redundant. Even when the number of possible terms is smaller, the number of possible combinations can be large. The number of alternatives, and the redundancy sometimes inherent in the list of possible query terms (usually the user's entire native language), can lead to serious problems in using a search engine. A typical search begins with the entry of a search string, which may be a free-form sentence, a question, a set of words and/or phrases, or a single word. The search engine may parse the string so that words and phrases are employed as separate possible terms. For example, if the word "dog" appears after "red," the engine may search for the phrase "red dog" as well as for the words "red" and "dog" appearing separately in the same document. The engine may also contain grammatical models that allow it to attach greater importance to some words and less to others. Still other variations are used, such as Boolean operators that connect symbols (e.g., words) through proximity ranges, disjunctive and conjunctive connectors, etc. From these inputs the engine generates a definite query to select results. It then retrieves the results and sorts them by relevance.

Relevance may be based on criteria other than the relative importance of search terms and words, although that is one of the most common criteria. For example, the engine may assign a value to a candidate record returned by the query based on the number of links on other sites that point to that candidate, as done by Google®. Relevance may be inferred from the query or even explicitly specified in some fashion by the user.

One problem with the existing technology is that it produces search results with little regard for the number of records the user is attempting to retrieve. If a user wants a single record, he/she may receive a tremendous number in response to his/her query and have to refine his/her query string to reduce that number before finding the particular record sought. One frequently used approach is to sort the results in order of priority. However, this assumes the intrinsic merit of pre-specified ranking criteria that, apart from the search terms themselves and any operators the user applies, are not accessible to the user. Another problem that often arises is that a search returns far fewer results than expected. This can happen because, although good terms were used, they did not cover all possible relevant terms. For example, if the user searched on a person's name but not his/her nickname, the search might miss relevant records. A solution is desirable that allows the user to contribute more intelligence (that is, information the user knows before he/she begins the research) to the search process.

SUMMARY OF THE INVENTION

An intelligent search process can be implemented on a computer or computer network, for example a network workstation or a television set-top box with an electronic program guide (EPG). The process accepts not only traditional search terms but also a task model definition. The task model definition may, for example, allow the user to specify the number of records sought. In the environment of EPG searching, the user may seek a specific program rather than a large set. When the user wants a large set, he/she may wish, for instance, to browse through a whole genre or time range. When the user wants a particular record, he/she may know in advance that only one specific program is desired but may not be sure of the title, channel, or other variables that would narrow the search effectively. To support these kinds of tasks, the user interface (UI) allows the specification of a task model indicating whether the user is searching for one, a small number, or a large number of "hits." Alternatively, the task model may be inferred from existing conditions, or passively from user behavior such as browsing a program guide or other search result.

The search engine uses the additional intelligence defined by the task model to modify the search results. The information provided by the user is supplemented by the specification of a task model, not replaced by it. Thus, a set of retrieved records may be obtained in the normal fashion employing any of numerous query generation and sorting models. Once the search results are obtained, however, the task model specification may be used to modify the result. In one embodiment, in response to a task model requesting a number of records that is much smaller (by an order of magnitude or more) than the number of records returned by the search, the engine may look for discriminants in the set of records returned and, instead of simply listing the results returned, offer the user a list of discriminants from which to select. The discriminant may be, for example, an important term that appears in a small percentage of the retrieved results, but is conspicuously absent from the others. It may identify a number of such discriminants and offer all of them to the user to select from.

The identification of discriminants is a well-developed technology in itself. A very simple approach is to generate a histogram of the terms that appear most often in the returned records and to let the user select from among the highest-frequency terms. Another approach is to look for frequent occurrences of words that were not specified in the query but that appear in association with words that were, on the assumption that the former modify the latter when the two appear in mutual proximity. These former terms would be presented as options from which to select. Generating the statistics needed to identify these discriminants is straightforward, because search engines generate or use index files from which such statistics are readily derived.
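The simple histogram approach can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes that records are plain strings, that terms are lowercase alphabetic tokens, and that each record counts a term at most once. The names `term_histogram` and `top_terms` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

def term_histogram(records: Iterable[str], query_terms: Iterable[str],
                   stopwords: Iterable[str] = ()) -> Counter:
    """Count, for each non-query term, how many returned records contain it."""
    excluded = {t.lower() for t in query_terms} | {s.lower() for s in stopwords}
    hist: Counter = Counter()
    for text in records:
        terms = set(re.findall(r"[a-z]+", text.lower()))
        hist.update(terms - excluded)  # each record counts a term at most once
    return hist

def top_terms(records: List[str], query_terms: List[str], k: int = 5,
              stopwords: Iterable[str] = ()) -> List[str]:
    """The k highest-frequency terms, offered to the user as candidate discriminants."""
    return [t for t, _ in term_histogram(records, query_terms, stopwords).most_common(k)]

records = [
    "small dog with curly fur",
    "large dog with curly hair",
    "small dog with wavy fur",
]
query = ["dog", "fur", "hair", "curly", "wavy"]
print(top_terms(records, query, k=2))                       # ['with', 'small']
print(top_terms(records, query, k=2, stopwords=["with"]))   # ['small', 'large']
```

Without a stopword list, the function word "with" tops the histogram because it occurs in every record. This is the "tag along" weakness that the detailed description discusses below.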

The task model selection could receive two inputs: the number of hits expected and the number sought. The selection of a task model could be done using a selection list, a number, or some other code or symbol to indicate the number of records sought or expected.
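One possible data structure for such a two-input task model is sketched below. It assumes three size categories (one, small, large), mirroring the choices named in the summary. The 1% cut-off used to turn a raw count into "small" or "large" relative to the database size is an illustrative assumption. `Size`, `TaskModel` and `size_of` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Size(Enum):
    ONE = "one"
    SMALL = "small"
    LARGE = "large"

def size_of(count: int, db_size: int, small_fraction: float = 0.01) -> Size:
    """Classify a record count relative to the database size.
    The 1% threshold is an illustrative assumption."""
    if count <= 1:
        return Size.ONE
    if count <= max(2.0, small_fraction * db_size):
        return Size.SMALL
    return Size.LARGE

@dataclass
class TaskModel:
    sought: Size                      # records the user ultimately wants
    expected: Optional[Size] = None   # records the query should return; None = "don't know"

# "I want one program, but I expect this query to return a lot."
model = TaskModel(sought=Size.ONE, expected=Size.LARGE)
print(size_of(350, db_size=10_000))   # Size.LARGE
```

Leaving `expected` as `None` corresponds to the "not known" option mentioned for the task model dialogs.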

Another task model that may be employed in connection with the invention may be keyed to a context definition that is specified or inferred automatically. For example, if the user typically looks for a single program of a certain variety at a certain time of day, the system may make suitable inferences whenever those conditions are present. It may infer, for example, that he/she is looking for a specific program to watch, or that he/she wants a particular kind of program (based on previous selections). In response, the system may offer him/her a selection list based on these inferences.

The invention will be described in connection with certain preferred embodiments, with reference to the following illustrative figures so that it may be more fully understood. With reference to the figures, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is an illustration of a hardware system in which the current invention may be employed.

Fig. 2 illustrates a display portion of a user interface that may form a component of an electronic program guide application of the current invention.

Figs. 3A-B illustrate user interface dialogs used to input task models according to an embodiment of the invention.

Fig. 3C illustrates a user interface dialog for input of a query according to an embodiment of the invention.

Figs. 3D-F illustrate user interface dialogs used to modify the search query based on results and/or a task model according to an embodiment of the invention.

Fig. 4 is a block diagram illustrating data flow between processes and data stores in an embodiment of the invention.

Fig. 5 is a flow chart illustrating steps followed in searching a database using the invention.

Fig. 6 is a flow chart illustrating steps followed in searching a database according to an alternative embodiment of the invention.

Fig. 7 is a flow chart illustrating steps followed in searching a database according to still another embodiment of the invention.

Fig. 8 is a flow chart illustrating steps followed in searching a database according to still another embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to Fig. 1, the invention will be described in the environment of an electronic program guide (EPG) for a home television system. In an embodiment, a computer (or "set-top box") 240 displays program information on a television or monitor 230. The computer 240 may be equipped to receive data, video, and other signals 260 from a server or servers 276 or another video source via network channels 274. It may also control a channel-changing function and accept search queries through user input devices such as a keyboard 211 or a handheld remote control 210. The EPG can be browsed using simple criteria, such as a default filter (e.g., the current time of day), and can also be queried using a search engine process. The computer 240 may also be programmed to let a user select channels through a tuner (not shown) inside the computer 240 rather than through the television's tuner (not shown). The user can then select a program to be viewed by using the remote control 210 to highlight the desired selection in a displayed program schedule. The computer 240 may also output through other modalities, for example a speaker 231. The computer 240 has a link for data, video, etc. 260 through which it can receive updated program schedule data. This could be a telephone line connectable to an Internet service provider or some other suitable data connection. The computer 240 has a mass storage device 235, for example a hard disk, to store program schedule information, program applications and upgrades, and other information. Information about the user's preferences and other data can be uploaded into the computer 240 via removable media such as a memory card or disk 220. EPG data may include titles and various descriptive information, such as a narrative summary and keywords categorizing the content. These may be searchable as full text through a suitable user interface (UI).

Referring now also to Fig. 2, the program information can be shown to and browsed by the user. The display may take the form of a time-grid 170 similar to the format commonly used for existing cable television channel guides. In the time-grid display 170, programs are shown as bars, such as the bar at 130. The length of each bar indicates the respective program's duration, and the start and end points of each bar indicate the program's start and end times. A description window 165 provides detailed information about the currently selected program.

Note that many substitutions are possible in the above example hardware environment, and all can be used in connection with the invention. The mass storage can be replaced by volatile or non-volatile memory. The data can be stored locally or remotely. In fact, the entire computer 240 could be replaced with a server or servers 276 operating offsite through a link. Rather than using a remote control to send commands to the computer 240 through an infrared port 215, the controller could send commands through the link for data, video, etc. 260, which could be separate from, or the same as, the physical channel carrying the video. The video or other content can be carried by a cable, RF, or any other broadband physical channel originating from a cable source 100'. It can also be obtained from a mass storage or removable storage medium, for example the data store 235 or the memory card or disk 220. It could be carried by a switched physical channel such as a phone line, or by a virtually switched channel such as ATM or another network suitable for synchronous data communication. Content could be asynchronous and tolerant of dropouts, so that present-day IP networks could be used. Further, the content received over the line carrying programming could be audio, chat conversation data, web sites, or any other kind of content for which a variety of selections are possible.
The program guide data can be received through channels other than the separate link for data, video, etc. 260. For example, program guide information can be received through the same physical channel as the video or other content. It could even be provided through removable data storage media such as the memory card or disk 220. The remote control 210 can be replaced by a keyboard, voice command interface, 3D mouse, joystick, or any other suitable input device. Selections can be made by moving a highlighting indicator, by identifying a selection symbolically (e.g., by a name or number), or in batch form through a data transmission or via removable media.

Referring to Fig. 5, a user chooses or enters a query in step S10. The task model is an estimate of the number of records expected and/or the number of records ultimately desired. It could be specified as an approximate number, or as a term such as "small," "medium," or "large" interpreted relative to some standard, such as the size of the database or of the general topic area. The query is submitted and the search results are retrieved in step S20. Next, in step S25, if the number of records called for by the task model is smaller than the number of records retrieved, useful discriminants are culled from the retrieved record set using a suitable statistic. The discriminants can be derived by various means, as discussed below. They are displayed, and the user selects one or more of them in step S30. The selected discriminants are applied to the returned results in step S35. This process may be repeated until the number of records produced matches the number ultimately desired. Finally, the results are displayed to the user in step S40, and an appropriate action may be taken, for example selecting a particular result for retrieval or viewing.
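The loop of Fig. 5 (steps S20 to S40) might be sketched as follows. This sketch makes several assumptions:

- Each retrieved record is modeled as a set of index terms.
- A candidate discriminant is any term present in some, but not all, of the current results, ranked by count.
- The user's choice in step S30 is simulated by a callback.

`refine` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Set

Record = FrozenSet[str]  # a record modeled as the set of its index terms

def refine(results: List[Record], query_terms: Set[str], desired: int,
           choose: Callable[[List[str]], List[str]], k: int = 5,
           max_rounds: int = 10) -> List[Record]:
    """Narrow retrieved results toward the number of records the task model asks for."""
    used = set(query_terms)
    for _ in range(max_rounds):
        if len(results) <= desired:   # task model satisfied: go to display (S40)
            break
        # S25: candidate discriminants are terms present in some, but not all, results
        counts = Counter(t for r in results for t in r if t not in used)
        candidates = sorted((t for t, c in counts.items() if c < len(results)),
                            key=lambda t: (-counts[t], t))[:k]
        picked = choose(candidates) if candidates else []  # S30: user selection
        if not picked:
            break
        used.update(picked)
        # S35: keep only results that contain every selected discriminant
        results = [r for r in results if all(t in r for t in picked)]
    return results

retrieved = [frozenset({"dog", "curly", "small", "poodle"}),
             frozenset({"dog", "curly", "large"}),
             frozenset({"dog", "curly", "small", "bichon"})]
# Stand-in for the user: always pick the first suggestion offered.
narrowed = refine(retrieved, {"dog", "curly"}, desired=1, choose=lambda c: c[:1])
print(narrowed)
```

In this run, the first round offers "small", which cuts three records to two. The second round offers "bichon" ahead of "poodle" (ties are broken alphabetically), leaving the single record the task model called for.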

Note that a task model does not have to specify both the number of records expected and the number of records ultimately desired, although specifying both can be helpful. Consider the situation where the user wants one particular record and begins by casting what he/she believes is a wide net, hoping the smart search process will help find the right record. The user can then tell the system how large a net he/she believes the first search string defines. The search process may recognize that, although one record is ultimately sought, the user expected the string to produce a large number of records. In that case it can help the user first expand the search and then use discriminants to pare the set down.

Note also that the task model need not specify an expected number of records. If the quantity is unknown, the system might begin by trying to expand the search (using techniques such as those illustrated in Figs. 3D-3F and the accompanying text) and then focus it by selecting discriminants.

The discriminants identified in step S25 can be derived by various means. For example, a histogram indicating the frequency of each term in the returned set of records can be generated from that set. The terms with the highest discriminating power may be displayed, and the user permitted to select one or more of them. Suppose, for example, that the user wants information about a particular breed whose name he/she does not know, and enters the Boolean query "dog" AND ("fur" OR "hair") AND ("curly" OR "wavy"). The only information the user has about the breed is that it has curly fur. The user may enter a task model specifying a small number of returns. This indicates that he/she is looking for something specific and expects the query to produce many undesired matches.
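The example query can be represented as a conjunction of OR-groups and evaluated against plain-text records, as in the minimal sketch below. The representation is an illustrative choice, not one prescribed by the text. The sketch also assumes lowercase word matching with no stemming, so "dogs" would not match "dog". `Query`, `words` and `matches` are hypothetical names.

```python
import re
from typing import List, Set

# Conjunction of OR-groups: a record matches if every group shares a word with it.
Query = List[Set[str]]

def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(query: Query, text: str) -> bool:
    w = words(text)
    return all(group & w for group in query)

curly_dog = [{"dog"}, {"fur", "hair"}, {"curly", "wavy"}]
docs = ["The poodle is a dog with curly hair.",
        "A dog with short, straight fur.",
        "Curly kale recipes."]
print([d for d in docs if matches(curly_dog, d)])
```

Only the first document satisfies all three groups. The second lacks "curly"/"wavy", and the third lacks "dog" and "fur"/"hair".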

In the example, the records returned by the search include information about various breeds, most of them focusing on a particular breed. The terms with the highest frequency of hits may give the user information with which to tell the search engine which classes of records are desired and which are not. For example, common descriptors such as "small," "large," "thin," and "heavy" may be returned. The user can select from among these to reduce the selected records to a number that can be conveniently browsed. To augment this process, the user interface may display the number of hits in the original set. It may also display the number that would result from combining each proposed discriminant with the original query, and the effect of combinations as a new query is built from the discriminant terms. As an example of the latter, suppose the user starts entering the query "thin and small"; the display could show the effect as each term is added. This is similar to the way Folio Bound Views® by Folio Corporation works, where the number of returned results is continuously updated as a search query is entered.

A problem with such a simple discriminant is that the terms may simply tag along with the terms in the original search query. In other words, they may be common to most of the returned results and therefore discriminate poorly among them. More desirable are discriminants that are likely to split off a large proportion of the returned records. One way to identify better discriminants is to look for frequent occurrences of words that are not in the original query but that appear in association with the query terms, on the inference that the association is meaningful. The association may be inferred from mutual proximity of the terms, for example, or from grammatical parsing (e.g., identifying adjectives that modify a search query term), etc.
Those candidate discriminants that appear with the highest discriminating power could then be presented to the user and the user permitted to select from among them.
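The mutual-proximity variant can be sketched as follows. The sketch counts how often each non-query word falls within a fixed window of a query term and ranks the words by that count. The co-occurrence count is used here as a simple stand-in for "discriminating power," and the window size is an assumption. Grammatical parsing is not shown. `proximity_candidates` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable, List

def proximity_candidates(records: Iterable[str], query_terms: Iterable[str],
                         window: int = 2, k: int = 5) -> List[str]:
    """Rank non-query words by how often they occur within `window` words of a query term."""
    q = {t.lower() for t in query_terms}
    counts: Counter = Counter()
    for text in records:
        toks = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(toks):
            if tok in q:
                neighbors = toks[max(0, i - window): i + window + 1]
                counts.update(n for n in neighbors if n not in q)
    return [w for w, _ in counts.most_common(k)]

records = ["curly poodle", "poodle curly coat", "straight coat"]
print(proximity_candidates(records, ["curly"], window=1, k=1))  # ['poodle']
```

"straight" is never counted, because it never appears next to a query term. The same is true of the second "coat", which occurs in a record without any query term.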

A refinement of the two previous approaches is to select discriminants according to how well each divides the returned set into a small number of subsets. One way to do this is to start from a set of candidate discriminants with high hit counts, such as those derived by the histogram procedure. From these, the engine identifies the "important" terms (importance being inferred, for example, from frequency of occurrence in a record, use in a title, etc.) that appear in a small percentage of the retrieved results but are conspicuously absent from the others. That is, the term is important in some records but does not appear in all of them. In the above example of the curly-haired dog breed, the name of the breed to which a record relates would be important in records about that breed and absent from records unrelated to it. The search engine could then show a list of such discriminants, many of which might be breed names.
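A sketch of this refinement follows. It assumes that a term is "important" in a record if it appears in the title or at least twice in the body. It also assumes that a good splitting discriminant must be important somewhere while being present in at most half of the retrieved records. Both thresholds are illustrative assumptions. `Record` and `splitting_discriminants` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Record:
    title: str
    body: str

def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def splitting_discriminants(records: List[Record], min_count: int = 2,
                            max_fraction: float = 0.5) -> List[str]:
    """Terms that are important in at least one record (in its title, or used
    min_count+ times in its body) but present in at most max_fraction of records."""
    n = len(records)
    present: Counter = Counter()    # records containing the term at all
    important: Counter = Counter()  # records in which the term is important
    for r in records:
        title, body = set(tokens(r.title)), Counter(tokens(r.body))
        terms = title | set(body)
        present.update(terms)
        important.update(t for t in terms if t in title or body[t] >= min_count)
    return sorted(t for t in present
                  if important[t] > 0 and present[t] / n <= max_fraction)

records = [
    Record("Poodle", "The poodle has curly fur. Poodle owners love it."),
    Record("Bichon Frise", "A small dog with curly fur."),
    Record("Dog grooming", "Brush curly fur daily. Brush gently."),
    Record("Curly coats", "Many dog breeds have curly fur."),
]
print(splitting_discriminants(records))
```

The breed names surface as intended, while "curly" and "dog" are dropped because they occur in most records. Topic words such as "brush" also pass this crude importance test, so a practical system would weight importance more carefully.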

Referring now to Figs. 3C and 4, when a user searches a database, a query window is typically generated in a suitable user interface 58 (Fig. 4). This window could have an appearance 305 on the display 230 such as illustrated in Fig. 3C. The user enters or selects symbols, such as keywords and operators, to define a search string 310. The search string 310 is used to generate a specific query. A search engine process 59 uses that query to generate filtering criteria, which may be passed to a display/browse process 40 to filter and view data from a database 30. The end result is that the retrieved data are displayed.

Referring now to Figs. 3A and 3B, consider an improvement to the above process whereby the user also indicates a task model for the current search. In the search UI process 58, the user may be prompted for how many results he/she seeks and how many the search string just entered is expected to return. This could be done by making selections using radio buttons, as illustrated in Fig. 3A, or by entering numerical data, as illustrated in Fig. 3B. An option to indicate that this information is not known can be provided.

An example scenario follows. A user enters a task model stating either that a large number of results is desired, or that a small number (or one) is desired but the query entered is expected to produce a large number of hits. For example, the user enters the name of an author known to have written hundreds of documents, but is looking for one in particular. If the search returns only a small number of hits, the system may give the user some choices for widening the search. For example, if the user entered the author's name in a diminutive form, the system could recognize this and offer to expand the search by suggesting that the formal name be used as well. If the user enters "Becky Wagner," the system could propose adding "Rebecca" to the search. Another example involves synonyms. Someone searching for articles on celestial phenomena might be offered terms like "cosmology," "black-hole," "big-bang," etc. if the returned hits were fewer than expected.

An alternative way to view data from the database is directly through the display/browse process 40. As a user browses the data in the database, for example by turning on the system and seeing currently scheduled programs for each channel, his/her browsing behavior may be monitored continuously. Both a task model inference engine process 56 and a profile engine process 53 may perform this monitoring, and both use the interaction data to infer the user's desires. By making inferences about the user's preferences from the choices for viewing, the selections for further detail, the channels whose offerings are repeatedly browsed, etc., the system may generate profile data accordingly (for storage in a profile database 50).
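The name-widening suggestion in the example scenario (proposing "Rebecca" for "Becky") can be sketched with a lookup table of diminutives. The table and both functions are hypothetical. A real system would need a much larger curated list and would let the user accept or reject each suggestion before the query is rewritten.

```python
from typing import Dict, List

# Hypothetical diminutive-to-formal name table (illustrative entries only).
NICKNAMES: Dict[str, List[str]] = {
    "becky": ["Rebecca"],
    "bob": ["Robert"],
    "bill": ["William"],
}

def suggest_name_expansions(query: str) -> Dict[str, List[str]]:
    """Map each query word that looks like a diminutive to formal alternatives."""
    out: Dict[str, List[str]] = {}
    for word in query.split():
        formal = NICKNAMES.get(word.lower())
        if formal:
            out[word] = formal
    return out

def expand(query: str, accepted: Dict[str, List[str]]) -> str:
    """Rewrite the query so that each accepted word becomes an OR group."""
    parts = []
    for word in query.split():
        alts = accepted.get(word)
        parts.append("(" + " OR ".join([word] + alts) + ")" if alts else word)
    return " AND ".join(parts)

suggestions = suggest_name_expansions("Becky Wagner")
print(expand("Becky Wagner", suggestions))  # (Becky OR Rebecca) AND Wagner
```

Keeping the original word inside the OR group widens the search rather than replacing the user's term, which matches the "add" (rather than "change") option described for Fig. 3F.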

In other known systems, the same profile data may be used to filter and sort the raw database data to enhance browsing. This feature could be combined with the present invention, for example, so that the profile data are also used by the display/browse UI process 40 to filter data for browsing.

The profile engine process 53 stores the data for use in deriving relevant filtering criteria so that the user does not always need to enter explicit search terms such as through the user interface of Fig. 3C. The task model inference engine process 56 uses the interaction behavior data to infer whether the user is currently in need of help and to infer the most appropriate task model.

The inference process can take many forms. A simple example is associating a significance value with a record when a particular action takes place in connection with it. For example, a user browsing an EPG may select a particular record to obtain further detailed information. A significance value may be associated with the act of requesting further information and correlated in a database with that record, along with other information such as the time of day, weekend or weekday, etc. The time the user spends looking at a record (rather than quickly skipping past it) may also be associated with a value. This information may be used directly to infer the task in which the user is engaged (browsing an EPG) and what is of interest. For example, a user may choose to display the details of a particular set of programs while browsing. As browsing continues, the set may become large enough to reveal features common to the programs selected for detail viewing. In response, the UI process can generate a picture-in-picture window that asks the user to confirm what he/she seeks, for example a sports program. The UI element can ask how many records are sought, or suggest a particularly important sports broadcast that might correspond to the one sought. It could also infer one or more task models that fit the pattern of behavior.

Referring to Fig. 6, another example of a task model is one derived from environmental data or user behavior, in addition to or instead of one explicitly defined. In step S50, the user enters a query or browses data (for example, the EPG screen displayed in Fig. 2). In step S55, the system infers a task model from one or more of the following:

- the user's behavior;
- environmental variables such as the time of day, the day of the week, holidays, and important dates;
- data stored in the user profile database 50.
For example, if the user is searching for a sports program to record and the Super Bowl or World Cup games are coming up, these may be offered as suggestions. Alternatively, the system may ask whether the user is searching for a particular event before offering such suggestions. In step S60, a query is derived from the task model and any other query data, and the results are displayed in step S80.
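The significance-value idea above can be sketched as an accumulator over interaction events, plus a check for a feature shared by most detail requests. The event kinds and weights are illustrative assumptions: the text says only that values are associated with such actions, not what they are. Genre is used as the "common feature" purely for illustration. All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical significance values; the weights are illustrative, not from the source.
EVENT_WEIGHTS = {"detail_request": 3.0, "dwell_per_second": 0.1, "skip": -0.5}

@dataclass
class Event:
    program_id: str
    genre: str
    kind: str            # "detail_request", "dwell", or "skip"
    seconds: float = 0.0

def significance(events: List[Event]) -> Dict[str, float]:
    """Accumulate a significance value per record from interaction events."""
    score: Dict[str, float] = defaultdict(float)
    for e in events:
        if e.kind == "dwell":
            score[e.program_id] += EVENT_WEIGHTS["dwell_per_second"] * e.seconds
        else:
            score[e.program_id] += EVENT_WEIGHTS[e.kind]
    return dict(score)

def common_feature(events: List[Event], min_requests: int = 3,
                   min_share: float = 0.75) -> Optional[str]:
    """Return a genre shared by most detail requests, once enough have accumulated."""
    genres = [e.genre for e in events if e.kind == "detail_request"]
    if len(genres) < min_requests:
        return None
    top = max(set(genres), key=genres.count)
    return top if genres.count(top) / len(genres) >= min_share else None

events = [Event("p1", "sports", "detail_request"),
          Event("p2", "sports", "detail_request"),
          Event("p3", "sports", "detail_request"),
          Event("p4", "news", "skip"),
          Event("p1", "sports", "dwell", seconds=20)]
print(common_feature(events))  # sports
```

A non-`None` result is the point at which the UI could open the confirmation window ("Are you looking for a sports program?") rather than acting on the inference silently.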

For example, the user may be browsing sports channels. The system may detect that the user is looking at times that are not current. The system can infer a task model from this and/or ask the user to define or refine the task model by asking questions. For example, the system can infer that the user is searching for a sports channel at a future time by the fact that the user requests detailed information on broadcasts that are in the future and only for sports events. The system can offer to present all the sporting events that are available in the near future. Alternatively, the system may ask the user what time range he/she is interested in and display the results accordingly. The system could ask if the user is looking for a particular event and offer some options based on network recommendations (e.g., network banner events may be available in the EPG database) or by prompting for a query term to improve the search.
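The "future sports" inference in this example might look like the rule below. It assumes the system logs each detail request with the program's genre and start time. The task model is inferred only when every request concerns a single genre and programs that have not started yet. The returned time range could then seed the offer to show all such events. The names and the three-request minimum are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

@dataclass
class DetailRequest:
    genre: str
    start: datetime  # start time of the program whose details were requested

def infer_future_genre_task(requests: List[DetailRequest], now: datetime,
                            min_requests: int = 3
                            ) -> Optional[Tuple[str, datetime, datetime]]:
    """Infer "find <genre> programs between t0 and t1" when all recent detail
    requests concern one genre and programs that have not started yet."""
    if len(requests) < min_requests:
        return None
    genres = {r.genre for r in requests}
    if len(genres) != 1 or any(r.start <= now for r in requests):
        return None
    starts = [r.start for r in requests]
    return next(iter(genres)), min(starts), max(starts)

now = datetime(2001, 4, 6, 18, 0)
reqs = [DetailRequest("sports", now + timedelta(hours=h)) for h in (2, 5, 26)]
print(infer_future_genre_task(reqs, now))
```

When the rule returns `None`, the system can fall back to asking the user for a time range or a query term, as described above.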

The general approach is to determine, either directly or passively, a task model describing what the user is seeking. This way the system has something to compare with the search results the user retrieves. As is well known, much frustration can attend research into complex data sets. The above provides a mechanism that allows a user to specify information, or the system to infer information, that can be compared to a search result or used to refine a search before it is done. As shown, this can serve as the basis for an intelligent assistant that gives the user opportunities to refine his/her search by offering alternative search terms.

Referring now to Figs. 4 and 7, in an alternative embodiment, the user chooses or enters a task model in step S5. The task model may be selected for the user automatically, by inference, or be specified directly by the user. For example, when a viewer browses a database or surfs channels or an EPG, the system might infer that the task is to find a show to watch immediately. During such activities, both the task model inference engine 56 and the profile engine 53 monitor the interaction behavior. If the user turns on an EPG display and indicates that he/she wishes to search, then a different task might be inferred, or the user could be asked to select a task model from a list. Referring to Fig. 7, assuming the user has indicated a desire to search, the user selects a task model in step S5 using a UI element such as illustrated in Figs. 3A and 3B. The user then selects or enters a query in step S10 using a UI element such as illustrated in Fig. 3C. Next, in step S20, a query is generated from the query string submitted by the user, the query is submitted, and the search results are retrieved. The results are compared with the task model in step S25. If the results match the task model in step S45, the results are displayed in step S40. If, in step S45, the results do not match expectations, a query modification strategy is selected in step S22 and verified in step S15. As discussed above, the query modification strategy can be used to identify a discriminant if the result set is too large or to propose expansion of the criteria if the results are too few.

If the results are too few, the search strategy needs to be expanded. Referring to Fig. 3D, this can be done via a selection interface such as the one shown. A general selection requesting that the search be expanded would leave it to the system to determine how to expand the search. For example, the system could look terms up in a thesaurus and add alternate words. Referring to Fig. 3E, the system could display alternative expansion devices, such as thesaurus expansion, alternate-spelling expansion, removal of a term, etc., and allow the user to select which device(s) to use to expand the search. Referring to Fig. 3F, alternative terms could be suggested by the system on a word-by-word basis, and the user then permitted either to change the term (Δ) to the suggested term or to add the alternate (+) to the search. In Fig. 3F, the term "Becky" was expanded to include "Rebecca" and the term "legal" to include "law." The system also suggested another name ("Winger") with a similar spelling as an alternate or substitute for "Wagner."
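As a rough illustration of the Fig. 7 loop (comparison in steps S25/S45, strategy selection in step S22), the following Python sketch assumes that the task model has been reduced to an expected record count with a hypothetical relative tolerance. `TaskModel`, `compare_with_task`, `find_discriminants` and `choose_strategy` are illustrative names only, and ranking candidate discriminant terms by how evenly they split the result set is one assumed measure of their relative importance, not the method prescribed by the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TaskModel:
    expected_count: int      # records the user expects (e.g. 1 for "a show to watch now")
    tolerance: float = 0.5   # hypothetical relative slack before refinement is triggered


def compare_with_task(result_count, task):
    """Steps S25/S45 sketch: classify the result size against the task model."""
    low = task.expected_count * (1 - task.tolerance)
    high = task.expected_count * (1 + task.tolerance)
    if result_count < low:
        return "too_few"
    if result_count > high:
        return "too_many"
    return "match"


def find_discriminants(records, query_terms, top_n=3):
    """Terms present in some, but not all, result records (candidate discriminants).

    Ranking by how close a term comes to splitting the set in half is an
    assumption made for illustration; ties are broken alphabetically.
    """
    n = len(records)
    doc_freq = Counter()
    for text in records:
        doc_freq.update(set(text.lower().split()) - set(query_terms))
    candidates = [(term, df) for term, df in doc_freq.items() if 0 < df < n]
    candidates.sort(key=lambda item: (abs(item[1] - n / 2), item[0]))
    return [term for term, _ in candidates[:top_n]]


def choose_strategy(records, query_terms, task):
    """Step S22 sketch: propose a query modification strategy for the user to verify."""
    verdict = compare_with_task(len(records), task)
    if verdict == "too_many":
        return "contract", find_discriminants(records, query_terms)
    if verdict == "too_few":
        return "expand", []   # e.g. offer expansion devices such as those of Figs. 3D-3F
    return "display", []      # step S40


records = ["sitcom comedy thursday", "sitcom drama thursday", "news thursday", "comedy thursday"]
print(choose_strategy(records, {"thursday"}, TaskModel(expected_count=1)))
# ('contract', ['comedy', 'sitcom', 'drama'])
```

The proposal returned by `choose_strategy` would still be shown to the user for verification (step S15) rather than applied silently.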

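The word-by-word suggestions of Fig. 3F can be approximated as below. This sketch assumes a small hypothetical `THESAURUS` dictionary for alternates with a similar meaning and uses Python's standard `difflib.get_close_matches` for alternates with a similar spelling. It also assumes the query is held as a list of OR-groups, one per original term, so that "+" adds the alternate to a group while the change (Δ) button replaces the group's contents.

```python
import difflib

# Hypothetical stand-in for a real thesaurus.
THESAURUS = {"becky": ["rebecca"], "legal": ["law"]}


def suggest_alternates(query_terms, vocabulary, cutoff=0.6):
    """Fig. 3F sketch: per-term synonyms plus similarly spelled vocabulary words."""
    vocab = sorted({word.lower() for word in vocabulary})
    suggestions = {}
    for term in query_terms:
        key = term.lower()
        alternates = list(THESAURUS.get(key, []))
        # get_close_matches ranks candidates by SequenceMatcher ratio >= cutoff.
        for candidate in difflib.get_close_matches(key, vocab, n=3, cutoff=cutoff):
            if candidate != key and candidate not in alternates:
                alternates.append(candidate)
        suggestions[term] = alternates
    return suggestions


def apply_choice(query, index, alternate, mode):
    """Apply the user's choice for one term of a query held as OR-groups.

    mode "+" adds the alternate to the term's group; mode "change" (the Δ
    button) replaces the term with it. The input query is not modified.
    """
    groups = [list(group) for group in query]
    if mode == "+":
        groups[index].append(alternate)
    elif mode == "change":
        groups[index] = [alternate]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return groups


vocab = ["Winger", "Wagner", "Rebecca", "law", "drama"]
print(suggest_alternates(["Becky", "legal", "Wagner"], vocab))
# {'Becky': ['rebecca'], 'legal': ['law'], 'Wagner': ['winger']}
print(apply_choice([["Becky"], ["legal"], ["Wagner"]], 0, "Rebecca", "+"))
# [['Becky', 'Rebecca'], ['legal'], ['Wagner']]
```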
Referring to Fig. 4, the interaction behavior 44 of the user interacting with the browser/display process 40 may be continually monitored by a profile engine process 53, which generates results 42 that feed the profile database 50, and by a task model inference engine 56, which attempts to anticipate what the user needs. The task model inference engine 56 can, as appropriate, invite the user to switch 46 to a search UI process 58, which allows the entry of a search string. It may also generate a query 48 automatically from preference data and submit it to a search engine to generate filtering criteria to be used by the results display/browse process 40. This would happen if the user were browsing and the task model inference engine 56 identified a pattern in the browsing behavior (combined with the preference data supplied via a prediction engine process 45) that suggested something specific, such as that the user was looking for a sports program to watch that day (because the user's browsing was restricted to sports programs for the same day). In that case, there would be enough data to generate a query and automatically display a result that matched it. The processes in Fig. 4 would preferably allow the user to interrupt at various stages. For example, if the user did not want the system to filter the display down to sports programs, the system might offer to help first, or allow the user to backtrack or override the automatic function before the filtering took place. The task model inference engine 56 could also request a task model from the user in an offer to help if it did not have enough data to infer what the user wanted to see. In that case, it might offer to switch the user to the search UI process 58. The latter could also be invoked directly 46 at the user's request from the results display/browse UI process 40.

Referring now to Fig. 8, a user begins using a television and, in step S110, chooses either to watch live television or to use an EPG. If the user chooses to use the EPG, the user could either browse it or search it (step S115). If the user chose to search the EPG, a task model would be selected by the user in step S120, a query entered in step S152, and a search submitted and results retrieved in step S155. Next, the results would be compared with the task model in step S160. If the results fit the task model (S165), they would be displayed in step S170. If the results did not fit the task model (S165), a query modification strategy would be selected in step S122, and the strategy verified or modified by the user in step S116 (see, for example, Figs. 3D-3F and the accompanying discussion). If, at step S110, the user chose to watch live television, the system would infer a task model of one record (i.e., a single program to watch) at step S130 and observe the user's interaction behavior in step S147 to determine whether it warranted intervention to help the user. The criteria could include the amount of time the user spent searching, the number of items selected for detailed information, the randomness of the browsing, and markers of inefficiency, such as displaying details of items from a single category without filtering out records from other categories. If the criteria are satisfied, the system could offer help in one of two forms. The first option would be to take the user to the regular search system. The second would be to try to assist the user without a full search by relying on preference data in the profile database 50. As a third option, the user could simply refuse help. If the first option is selected at step S147, the user would be taken to the regular search steps outlined above, beginning at step S120. If the second option is chosen, the profile database would be consulted (S140) to see whether enough information was present to generate a useful query to restrict the options.
For example, suppose it is Thursday night and the user always watches various sitcoms on Thursday nights. If the system found relevant data in the profile database (step S145), it would then generate a query at step S132 and enter the previous flow at step S155. If the system were unable to find sufficient data to generate a query, it would instead offer the user the option of performing a search at step S148. If the user indicated a desire to do so, the system would proceed to step S120; otherwise, the user would be permitted to continue browsing or channel-surfing without further interaction. Alternatively, the entire process could restart from the beginning.
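The intervention test of step S147 and the profile lookup of steps S140/S145 might be sketched as follows. The criteria come from the text, but every threshold (`MAX_SECONDS`, `MAX_DETAIL_VIEWS`, `MAX_ENTROPY_BITS`, `min_share`) is a hypothetical value, Shannon entropy of the viewed categories is an assumed measure of browsing "randomness", and the profile is assumed to be a list of (weekday, category) viewing records. A similar concentration test could plausibly serve the Fig. 4 case, in which browsing restricted to same-day sports programs lets the inference engine 56 build a query on its own.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical thresholds: the text names the criteria but not their values.
MAX_SECONDS = 300        # time spent browsing before help is offered
MAX_DETAIL_VIEWS = 5     # items opened for detailed information
MAX_ENTROPY_BITS = 2.0   # "randomness" of the categories browsed


@dataclass
class BrowseSession:
    seconds_browsing: float
    detail_views: list = field(default_factory=list)  # category of each item opened
    filtered: bool = False  # True if the user restricted the display to one category


def category_entropy(categories):
    """Shannon entropy (bits) of the viewed categories; higher means more random browsing."""
    n = len(categories)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(categories).values())


def should_offer_help(s):
    """Step S147 sketch: True if any intervention criterion is met."""
    views = s.detail_views
    # Inefficiency marker: details from a single category viewed without filtering.
    single_category_unfiltered = len(views) >= 3 and len(set(views)) == 1 and not s.filtered
    return (
        s.seconds_browsing > MAX_SECONDS
        or len(views) > MAX_DETAIL_VIEWS
        or category_entropy(views) >= MAX_ENTROPY_BITS
        or single_category_unfiltered
    )


def query_from_profile(history, weekday, min_share=0.6):
    """Steps S140/S145 sketch: build a query if one category dominates this weekday.

    Returns None when the profile holds too little evidence, in which case
    the system would offer a full search instead (step S148).
    """
    counts = Counter(cat for day, cat in history if day == weekday)
    total = sum(counts.values())
    if total == 0:
        return None
    category, n = counts.most_common(1)[0]
    if n / total < min_share:
        return None
    return {"weekday": weekday, "category": category}


print(should_offer_help(BrowseSession(60, ["sports", "sports", "sports"])))  # True
history = [("Thu", "sitcom")] * 4 + [("Thu", "news")]
print(query_from_profile(history, "Thu"))  # {'weekday': 'Thu', 'category': 'sitcom'}
```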

Note that, while the above discussion assumed that the user would indicate the expected number of records as a number and that this number would be compared to the size of the set returned by the search, an actual process need not be so concrete. A result set may include many records that match the query precisely but are nevertheless poor matches because the search terms are not important terms in those records (as determined by, for example, the frequency of hits in the record, whether the terms are "front page" terms, etc.). Records that are weak hits might therefore be discounted when the results are compared to the expected results: if the number of records returned is close to the number expected but the results contain many weak hits, the search might be expanded rather than contracted. Nor need the comparison be strict; it can be a fuzzy set comparison.
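One way to make the comparison less concrete is to weight each matching record by a hit strength and compare the weighted count with the expectation through a fuzzy membership function. In this Python sketch, the scoring weights, the use of a title hit as a stand-in for a "front page" term, the triangular membership function and the 0.5 acceptance threshold are all assumptions chosen for illustration.

```python
def hit_strength(title, body, terms):
    """Score how strongly one matching record matches, in [0, 1].

    A hit in the title stands in for a "front page" term and term density in
    the body for the frequency of hits; both weights are hypothetical.
    Terms are assumed to be lower-case single words.
    """
    title_words = title.lower().split()
    body_words = body.lower().split()
    in_title = any(term in title_words for term in terms)
    density = sum(body_words.count(term) for term in terms) / max(len(body_words), 1)
    return min(1.0, (0.6 if in_title else 0.0) + 4.0 * density)


def about_expected(count, expected, spread=0.5):
    """Triangular fuzzy membership of `count` in the set 'about `expected` records'."""
    width = max(spread * expected, 1.0)
    return max(0.0, 1.0 - abs(count - expected) / width)


def refine_decision(records, terms, expected, threshold=0.5):
    """Return ('accept' | 'expand' | 'contract', effective_count).

    records is a list of (title, body) pairs that all matched the query;
    weak hits contribute only part of a record to the effective count.
    """
    effective = sum(hit_strength(title, body, terms) for title, body in records)
    if about_expected(effective, expected) >= threshold:
        return "accept", effective
    return ("expand" if effective < expected else "contract"), effective


weak = [("Evening news", "a long report that mentions football once among many other words here")] * 4
print(refine_decision(weak, ["football"], expected=4)[0])  # expand
```

With four weak hits against an expectation of four records, the effective count falls well below the expectation, so the sketch expands the search rather than accepting or contracting it.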

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. A method of searching a database, comprising the steps of: receiving a first search query from a user; receiving a specification of a desired result sought to which a result of a filtering of said database based on said search query may be compared; comparing said specification with a result of a filtering of said database based on said search query; responsively to a result of said step of comparing, receiving further data from said user; and responsively to said step of receiving further data, generating a second search query based on said first search query and said further data.

2. A method as in claim 1, said search further comprising, after said step of comparing, identifying at least one discriminant term responsively to said result of a filtering, said discriminant term being a term in said result of a filtering that has the property of being present in some records in said result and not in other records of said result.

3. A method as in claim 2, wherein said discriminant term is selected responsively to a determination of relative importance compared to other terms in said some records.

4. A method of searching a database, comprising the steps of: receiving a search task definition containing information other than search terms; receiving search terms; searching a database based on said search terms; analyzing a result of said searching and identifying terms appearing in said result that are capable of discriminating some records of said result from other records of said result responsively to said task definition.

5. A method as in claim 4, wherein said task definition is indicative of a number of records sought.

6. A method as in claim 4, wherein said task definition is derived by the steps of: inputting commands to browse a database; and deriving a statistic based on said step of inputting.

7. A method of searching a database, comprising the steps of: receiving a first datum indicating a number of records expected to match a search definition; comparing said first datum to a second datum indicating a number of records actually matching said search definition; expanding said search definition to retrieve a larger number of records when said first datum is larger than said second datum.

8. A method as in claim 7, further comprising reducing said search to retrieve a smaller number of records when said first datum is smaller than said second datum.

9. A method of searching a database, comprising the steps of: receiving a first datum indicating a number of records expected to match a search definition; comparing said first datum to a second datum indicating a number of records actually matching said search definition; reducing said search definition to retrieve a smaller number of records when said first datum is smaller than said second datum.

10. A method as in claim 7, further comprising expanding said search to retrieve a larger number of records when said first datum is larger than said second datum.

11. A device for searching databases, comprising: a programmable controller (240) with a data store (235), an output device (230) and an input device (210, 211); said controller being programmed to receive a search definition through said input device; said controller being programmed to receive a task definition through said input device, said task definition indicating a number of records in said database predicted to match said search definition; said controller being programmed to determine a number of said records that actually match said search definition; said controller being programmed to at least one of expand and contract said search definition responsively to a difference between said number indicated by said task definition and said number actually matching said search definition.

12. A method of searching a database, comprising the steps of: receiving a search definition; identifying ways to expand said search definition and receiving input from a user selecting at least one of said ways; expanding said search definition responsively to a result of said step of receiving input from a user; retrieving records from a database responsively to a result of said step of expanding.

13. A method as in claim 12, wherein said step of identifying includes at least one of identifying terms with a meaning similar to that of terms in said search definition and identifying terms with a spelling similar to that of terms in said search definition.

14. A method of searching a database, comprising the steps of: receiving a search definition; identifying ways to reduce said search definition and receiving input from a user selecting at least one of said ways; reducing said search definition responsively to a result of said step of receiving input from a user; retrieving records from a database responsively to a result of said step of reducing.

15. A method as in claim 14, wherein said step of identifying includes identifying terms that appear in records matching said search definition and that discriminate among said matching records.

16. A device for searching databases, comprising: a programmable controller (240) with a data store (235), an output device (230) and an input device (210, 211); said controller being programmed to receive a search definition through said input device; said controller being programmed to receive a task definition through said input device, said task definition indicating a number of records in said database predicted to match said search definition; said controller being programmed to determine a number of said records that actually substantially match said search definition; said controller being programmed to at least one of expand and contract said search definition responsively to whether said number indicated by said task definition and said number actually matching said search definition are substantially different or substantially comparable.