CA2640035A1 - Formulating data search queries - Google Patents

Formulating data search queries

Info

Publication number
CA2640035A1
Authority
CA
Canada
Prior art keywords
tokens
terms
search query
document
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002640035A
Other languages
French (fr)
Other versions
CA2640035C (en)
Inventor
Eric M. Robinson
Edward L. Walter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FTI Technology LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of CA2640035A1
Application granted
Publication of CA2640035C
Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3322: Query formulation using system suggestions

Abstract

A system (10) and method (80) for formulating data search queries (142) are presented. A user interface (50) operable to specify unstructured search criteria for a search query (142) on one or more documents (40) is provided. An input portal (23) is exported to receive a data excerpt (51) selected to be searched against the documents (40). A selectable inclusiveness control (52) is exported to specify a granularity of inclusion (141) of matching tokens (142) within each document (40). A selectable proximity control (53) is exported to specify a degree of nearness (140) of the tokens (142) within each document (40). Tokens (142) derived from the data excerpt (51) and parameters corresponding to the granularity of inclusion (141) and the degree of nearness (140) are compiled into the search query (142).
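
As an illustration only, the compilation step described in the abstract can be sketched in Python. The names `SearchQuery`, `extract_tokens`, and `compile_query` are hypothetical and do not appear in the patent; the regex tokenizer and the de-duplication of tokens are assumptions, and the value ranges of the two controls are taken from Claims 5 and 7.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchQuery:
    """Hypothetical container for a compiled query (not defined in the patent)."""
    tokens: Tuple[str, ...]   # tokens derived from the data excerpt
    inclusiveness: float      # granularity of inclusion, 0.0 <= p < 1.0
    proximity: float          # degree of nearness, 0.0 < p <= 1.0


def extract_tokens(excerpt: str) -> Tuple[str, ...]:
    """Assumed tokenizer: lowercase alphanumeric runs, de-duplicated in order."""
    words = re.findall(r"[A-Za-z0-9]+", excerpt.lower())
    return tuple(dict.fromkeys(words))


def compile_query(excerpt: str, inclusiveness: float, proximity: float) -> SearchQuery:
    """Combine excerpt tokens and the two control settings into one query."""
    if not 0.0 <= inclusiveness < 1.0:
        raise ValueError("inclusiveness must satisfy 0.0 <= p < 1.0")
    if not 0.0 < proximity <= 1.0:
        raise ValueError("proximity must satisfy 0.0 < p <= 1.0")
    tokens = extract_tokens(excerpt)
    if not tokens:
        raise ValueError("the data excerpt yielded no tokens")
    return SearchQuery(tokens, inclusiveness, proximity)
```

A call such as `compile_query("Search the search index", 0.5, 1.0)` yields a query holding the three distinct tokens together with both control values; how a search engine interprets those values is outside this sketch.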

Claims (31)

1. A system (10) for formulating data search queries (142), comprising:
a user interface (50) operable to specify an unstructured search criteria for a search query (142) on one or more documents (40), comprising:
an input portal (23) to receive a data excerpt (51) selected to be searched against the documents (40);
a selectable inclusiveness control (52) to specify a granularity of inclusion (141) of matching tokens (142) within each document (40);
a selectable proximity control (53) to specify a degree of nearness (140) of the tokens (142) within each document (40); and
a document searcher (35) to compile tokens (142) derived from the data excerpt (51) and parameters corresponding to the granularity of inclusion (141) and the degree of nearness (140) into the search query (142).
2. A system (10) according to Claim 1, further comprising:
a storage (136) to maintain the target corpus (137) comprising the documents (40) indexed to facilitate searching; and
a search engine (135) to execute the search query (142) against the documents (40) maintained in the target corpus (137), wherein search results (56) identified by the search query (142) execution are presented (90).
3. A system (10) according to Claim 1, further comprising:
a parser to extract the tokens (142) from the data excerpt (51).
4. A system (10) according to Claim 1, wherein the granularity of inclusiveness (141) varies on a continuum between a Boolean OR operation of all tokens (142) and a Boolean AND operation of all tokens (142).
5. A system (10) according to Claim 1, wherein a number of tokens h (142) that must be matched by one or more words (41-46) in each target document (40) are determined in accordance with the equation:

h = int(N * p + 1)
where N is a total number of the tokens (142) and 0.0 <= p < 1.0 is a value representing the granularity of inclusiveness (141) specified through the selectable inclusiveness control (52).
6. A system (10) according to Claim 1, wherein the degree of nearness (140) varies on a continuum between a span equal to a number of the tokens (142) and a number of terms (41-46) in each document (40).
7. A system (10) according to Claim 1, wherein a span s to be applied and a number of tokens (142) to combine c during searching of each document (40) are determined in accordance with the equations:

s = N / p
c = MaxInt(2, N * p^2)
where N is a number of the tokens (142) and 0.0 < p <= 1.0 is a value representing the degree of nearness (140) specified through the selectable proximity control (53).
8. A system (10) according to Claim 1, further comprising:
a document analyzer to assign weights to terms (41-46) based on structural location within each document (40), wherein the search query terms (142) are modified to favor the terms (41-46) having higher weights over the terms (41-46) having lower weights.
9. A system (10) according to Claim 8, wherein the higher weights are assigned to the terms (41-46) occurring in a structural location selected from the group comprising titles, headings, tables of content, and indexes.
10. A system (10) according to Claim 1, further comprising:
a query processor to broaden the tokens (142), comprising:
a word analyzer to derive a normalized root stem for each token (142) and to identify one or more synonyms for the normalized root stem, wherein the synonyms are conjunctively included with the token (142) in the search query (142).
11. A system (10) according to Claim 1, further comprising:
a selection control operable to specify at least one of one or more required terms (41-46) and one or more optional terms (41-46) in the data excerpt (51), wherein the search query terms (142) are modified to always include the required terms (41-46) and to permissively include the optional terms (41-46).
12. A system (10) according to Claim 1, further comprising:
an ordering control operable to specify precedence of the tokens (142), wherein the search query terms (142) are modified to favor the terms (41-46) having higher precedence.
13. A system (10) according to Claim 1, further comprising:
a search scope control operable to specify documents (40) to be searched, wherein the search query (142) is modified to search the specified documents (40).
14. A system (10) according to Claim 1, wherein the selectable inclusiveness control (52) and the selectable proximity control (53) are provided as one of single selectable controls or combined controls selected from the group comprising rotary or gimbal knobs, slider bars, radio buttons, and user input mechanisms that allow continuous or discrete selection over a fixed range of rotation, movement, or selection.
15. A system (10) according to Claim 1, wherein the data excerpt (51) comprises at least one of textual data, binary data, and an encapsulated search query (142).
16. A method (80) for formulating data search queries (142), comprising:
providing (82) a user interface (50) operable to specify an unstructured search criteria for a search query (142) on one or more documents (40), comprising:
exporting an input portal (23) to receive a data excerpt (51) selected to be searched against the documents (40);
exporting a selectable inclusiveness control (52) to specify a granularity of inclusion (141) of matching tokens (142) within each document (40);
exporting a selectable proximity control (53) to specify a degree of nearness (140) of the tokens (142) within each document (40); and
compiling tokens (142) derived from the data excerpt (51) and parameters corresponding to the granularity of inclusion (141) and the degree of nearness (140) into the search query (142).
17. A method (80) according to Claim 16, further comprising:
maintaining the target corpus (137) comprising the documents (40) indexed to facilitate searching;
executing the search query (142) against the documents (40) maintained in the target corpus (137); and
presenting (90) search results (56) identified by the search query (142) execution.
18. A method (80) according to Claim 16, further comprising:
extracting the tokens (142) from the data excerpt (51).
19. A method (80) according to Claim 16, further comprising:
varying the granularity of inclusiveness (141) on a continuum between a Boolean OR operation of all tokens (142) and a Boolean AND operation of all tokens (142).
20. A method (80) according to Claim 16, further comprising:
determining a number of tokens h (142) that must be matched by one or more words (41-46) in each target document (40) in accordance with the equation:

h = int(N * p + 1)
where N is a total number of the tokens (142) and 0.0 <= p < 1.0 is a value representing the granularity of inclusiveness (141) specified through the selectable inclusiveness control (52).
21. A method (80) according to Claim 16, further comprising:
varying the degree of nearness (140) on a continuum between a span equal to a number of the tokens (142) and a number of terms (41-46) in each document (40).
22. A method (80) according to Claim 16, further comprising:
determining a span s to be applied and a number of tokens (142) to combine c during searching of each document (40) in accordance with the equations:

s = N / p
c = MaxInt(2, N * p^2)
where N is a number of the tokens (142) and 0.0 < p <= 1.0 is a value representing the degree of nearness (140) specified through the selectable proximity control (53).
23. A method (80) according to Claim 16, further comprising:
assigning weights to terms (41-46) based on structural location within each document (40); and
modifying the search query terms (142) to favor the terms (41-46) having higher weights over the terms (41-46) having lower weights.
24. A method (80) according to Claim 23, wherein the higher weights are assigned to the terms (41-46) occurring in a structural location selected from the group comprising titles, headings, tables of content, and indexes.
25. A method (80) according to Claim 16, further comprising:
broadening the tokens (142), comprising:
deriving a normalized root stem for each token (142);
identifying one or more synonyms for the normalized root stem; and
conjunctively including the synonyms with the token (142) in the search query (142).
26. A method (80) according to Claim 16, further comprising:
exporting a selection control operable to specify at least one of one or more required terms (41-46) and one or more optional terms (41-46) in the data excerpt (51); and
modifying the search query terms (142) to always include the required terms (41-46) and to permissively include the optional terms (41-46).
27. A method (80) according to Claim 16, further comprising:
exporting an ordering control operable to specify precedence of the tokens (142); and
modifying the search query terms (142) to favor the terms (41-46) having higher precedence.
28. A method (80) according to Claim 16, further comprising:
exporting a search scope control operable to specify documents (40) to be searched; and
limiting the search query (142) to search the specified documents (40).
29. A method (80) according to Claim 16, further comprising:
providing the selectable inclusiveness control (52) and the selectable proximity control (53) as one of single selectable controls or combined controls selected from the group comprising rotary or gimbal knobs, slider bars, radio buttons, and user input mechanisms that allow continuous or discrete selection over a fixed range of rotation, movement, or selection.
30. A method (80) according to Claim 16, wherein the data excerpt (51) comprises at least one of textual data, binary data, and an encapsulated search query (142).
31. A computer-readable storage medium holding code for performing the method (80) according to Claim 16.
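
For illustration only, the parameter equations recited in Claims 5, 7, 20, and 22 can be sketched in Python. The function names are hypothetical. `MaxInt` is read here as the larger of 2 and the truncated integer value of N * p squared. The span is taken as s = N / p, a reading chosen because it equals N at p = 1 and grows toward the document length as p approaches 0, which matches the continuum in Claims 6 and 21; treat it as an assumption. The cap at document length and the rounding up are also assumptions.

```python
import math
from typing import Iterable, Optional


def min_matches(n_tokens: int, p: float) -> int:
    """h = int(N * p + 1): number of tokens a document must match."""
    if n_tokens < 1 or not 0.0 <= p < 1.0:
        raise ValueError("need N >= 1 and 0.0 <= p < 1.0")
    return int(n_tokens * p + 1)


def span(n_tokens: int, p: float, doc_terms: Optional[int] = None) -> int:
    """s = N / p (assumed reading), rounded up and optionally capped."""
    if n_tokens < 1 or not 0.0 < p <= 1.0:
        raise ValueError("need N >= 1 and 0.0 < p <= 1.0")
    s = math.ceil(n_tokens / p)
    # Assumption: a span cannot usefully exceed the number of terms in the document.
    return s if doc_terms is None else min(s, doc_terms)


def combine_count(n_tokens: int, p: float) -> int:
    """c = MaxInt(2, N * p^2), read as max(2, int(N * p * p))."""
    if n_tokens < 1 or not 0.0 < p <= 1.0:
        raise ValueError("need N >= 1 and 0.0 < p <= 1.0")
    return max(2, int(n_tokens * p * p))


def satisfies_inclusiveness(doc_terms: Iterable[str], tokens: Iterable[str], p: float) -> bool:
    """True if the document contains at least h of the distinct query tokens."""
    unique = set(tokens)
    matched = unique & set(doc_terms)
    return len(matched) >= min_matches(len(unique), p)
```

With N = 4 tokens, p = 0.0 requires a single matching token, which behaves like a Boolean OR, while values of p close to 1.0 require all four, which behaves like a Boolean AND. Likewise, a proximity value of 1.0 gives a span equal to the number of tokens, and smaller values widen the span.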
CA2640035A 2006-01-27 2007-01-26 Formulating data search queries Expired - Fee Related CA2640035C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11/341,128 US20070179940A1 (en) 2006-01-27 2006-01-27 System and method for formulating data search queries
PCT/US2007/002329 WO2007089672A1 (en) 2006-01-27 2007-01-26 Formulating data search queries

Publications (2)

Publication Number Publication Date
CA2640035A1 (en) 2007-08-09
CA2640035C (en) 2014-10-14

Family

ID=38015415

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2640035A Expired - Fee Related CA2640035C (en) 2006-01-27 2007-01-26 Formulating data search queries

Country Status (4)

Country Link
US (1) US20070179940A1 (en)
EP (1) EP1977350A1 (en)
CA (1) CA2640035C (en)
WO (1) WO2007089672A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131747B2 (en) * 2006-03-15 2012-03-06 The Invention Science Fund I, Llc Live search with use restriction
US7848956B1 (en) 2006-03-30 2010-12-07 Creative Byline, LLC Creative media marketplace system and method
US8555182B2 (en) * 2006-06-07 2013-10-08 Microsoft Corporation Interface for managing search term importance relationships
US20100036813A1 (en) * 2006-07-12 2010-02-11 Coolrock Software Pty Ltd Apparatus and method for securely processing electronic mail
US9070172B2 (en) * 2007-08-27 2015-06-30 Schlumberger Technology Corporation Method and system for data context service
US20100145923A1 (en) * 2008-12-04 2010-06-10 Microsoft Corporation Relaxed filter set
US9256265B2 (en) 2009-12-30 2016-02-09 Nvidia Corporation Method and system for artificially and dynamically limiting the framerate of a graphics processing unit
US9830889B2 (en) * 2009-12-31 2017-11-28 Nvidia Corporation Methods and system for artifically and dynamically limiting the display resolution of an application
US9171350B2 (en) 2010-10-28 2015-10-27 Nvidia Corporation Adaptive resolution DGPU rendering to provide constant framerate with free IGPU scale up
US10678870B2 (en) * 2013-01-15 2020-06-09 Open Text Sa Ulc System and method for search discovery
US9122681B2 (en) 2013-03-15 2015-09-01 Gordon Villy Cormack Systems and methods for classifying electronic information using advanced active learning techniques
US10324965B2 (en) 2014-12-30 2019-06-18 International Business Machines Corporation Techniques for suggesting patterns in unstructured documents
US10229117B2 (en) 2015-06-19 2019-03-12 Gordon V. Cormack Systems and methods for conducting a highly autonomous technology-assisted review classification
US11281687B2 (en) * 2020-01-17 2022-03-22 Sigma Computing, Inc. Compiling a database query

Family Cites Families (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6283787A (en) * 1985-10-09 1987-04-17 株式会社日立製作所 Output control system for display screen
US5056021A (en) * 1989-06-08 1991-10-08 Carolyn Ausborn Method and apparatus for abstracting concepts from natural language
US5278980A (en) * 1991-08-16 1994-01-11 Xerox Corporation Iterative technique for phrase query formation and an information retrieval system employing same
US5488725A (en) * 1991-10-08 1996-01-30 West Publishing Company System of document representation retrieval by successive iterated probability sampling
JPH0756933A (en) * 1993-06-24 1995-03-03 Xerox Corp Method for retrieval of document
US6173275B1 (en) * 1993-09-20 2001-01-09 Hnc Software, Inc. Representation and retrieval of images using context vectors derived from image information elements
US5724571A (en) * 1995-07-07 1998-03-03 Sun Microsystems, Inc. Method and apparatus for generating query responses in a computer-based document retrieval system
US5737734A (en) * 1995-09-15 1998-04-07 Infonautics Corporation Query word relevance adjustment in a search of an information retrieval system
US5842203A (en) * 1995-12-01 1998-11-24 International Business Machines Corporation Method and system for performing non-boolean search queries in a graphical user interface
US5920854A (en) * 1996-08-14 1999-07-06 Infoseek Corporation Real-time document collection search engine with phrase indexing
US5870740A (en) * 1996-09-30 1999-02-09 Apple Computer, Inc. System and method for improving the ranking of information retrieval results for short queries
US5966126A (en) * 1996-12-23 1999-10-12 Szabo; Andrew J. Graphic user interface for database system
GB9713019D0 (en) * 1997-06-20 1997-08-27 Xerox Corp Linguistic search system
US6012053A (en) * 1997-06-23 2000-01-04 Lycos, Inc. Computer system with user-controlled relevance ranking of search results
US6094649A (en) * 1997-12-22 2000-07-25 Partnet, Inc. Keyword searches of structured databases
US6216123B1 (en) * 1998-06-24 2001-04-10 Novell, Inc. Method and system for rapid retrieval in a full text indexing system
US6446061B1 (en) * 1998-07-31 2002-09-03 International Business Machines Corporation Taxonomy generation for document collections
US6243713B1 (en) * 1998-08-24 2001-06-05 Excalibur Technologies Corp. Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types
US6480843B2 (en) * 1998-11-03 2002-11-12 Nec Usa, Inc. Supporting web-query expansion efficiently using multi-granularity indexing and query processing
US6363374B1 (en) * 1998-12-31 2002-03-26 Microsoft Corporation Text proximity filtering in search systems using same sentence restrictions
US6510406B1 (en) * 1999-03-23 2003-01-21 Mathsoft, Inc. Inverse inference engine for high performance web search
US6408294B1 (en) * 1999-03-31 2002-06-18 Verizon Laboratories Inc. Common term optimization
US6629097B1 (en) * 1999-04-28 2003-09-30 Douglas K. Keith Displaying implicit associations among items in loosely-structured data sets
US6493703B1 (en) * 1999-05-11 2002-12-10 Prophet Financial Systems System and method for implementing intelligent online community message board
US6701305B1 (en) * 1999-06-09 2004-03-02 The Boeing Company Methods, apparatus and computer program products for information retrieval and document classification utilizing a multidimensional subspace
US6711585B1 (en) * 1999-06-15 2004-03-23 Kanisa Inc. System and method for implementing a knowledge management system
US6438537B1 (en) * 1999-06-22 2002-08-20 Microsoft Corporation Usage based aggregation optimization
US6542889B1 (en) * 2000-01-28 2003-04-01 International Business Machines Corporation Methods and apparatus for similarity text search based on conceptual indexing
US6560597B1 (en) * 2000-03-21 2003-05-06 International Business Machines Corporation Concept decomposition using clustering
US6915308B1 (en) * 2000-04-06 2005-07-05 Claritech Corporation Method and apparatus for information mining and filtering
US6738759B1 (en) * 2000-07-07 2004-05-18 Infoglide Corporation, Inc. System and method for performing similarity searching using pointer optimization
US6675159B1 (en) * 2000-07-27 2004-01-06 Science Applic Int Corp Concept-based search and retrieval system
US20020032735A1 (en) * 2000-08-25 2002-03-14 Daniel Burnstein Apparatus, means and methods for automatic community formation for phones and computer networks
WO2002063493A1 (en) * 2001-02-08 2002-08-15 2028, Inc. Methods and systems for automated semantic knowledge leveraging graph theoretic analysis and the inherent structure of communication
US6823333B2 (en) * 2001-03-02 2004-11-23 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration System, method and apparatus for conducting a keyterm search
US6714929B1 (en) * 2001-04-13 2004-03-30 Auguri Corporation Weighted preference data search system and method
US20030172048A1 (en) * 2002-03-06 2003-09-11 Business Machines Corporation Text search system for complex queries
US7188107B2 (en) * 2002-03-06 2007-03-06 Infoglide Software Corporation System and method for classification of documents
US6886010B2 (en) * 2002-09-30 2005-04-26 The United States Of America As Represented By The Secretary Of The Navy Method for data and text mining and literature-based discovery
US7246113B2 (en) * 2002-10-02 2007-07-17 General Electric Company Systems and methods for selecting a material that best matches a desired set of properties
US20040215608A1 (en) * 2003-04-25 2004-10-28 Alastair Gourlay Search engine supplemented with URL's that provide access to the search results from predefined search queries
US20040243556A1 (en) * 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS)
US7146361B2 (en) * 2003-05-30 2006-12-05 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND)
US7433893B2 (en) * 2004-03-08 2008-10-07 Marpex Inc. Method and system for compression indexing and efficient proximity search of text data
US7584221B2 (en) * 2004-03-18 2009-09-01 Microsoft Corporation Field weighting in text searching
US7716223B2 (en) * 2004-03-29 2010-05-11 Google Inc. Variable personalization of search results in a search engine
US7761447B2 (en) * 2004-04-08 2010-07-20 Microsoft Corporation Systems and methods that rank search results
US20050283473A1 (en) * 2004-06-17 2005-12-22 Armand Rousso Apparatus, method and system of artificial intelligence for data searching applications
US7562069B1 (en) * 2004-07-01 2009-07-14 Aol Llc Query disambiguation
US20060053382A1 (en) * 2004-09-03 2006-03-09 Biowisdom Limited System and method for facilitating user interaction with multi-relational ontologies
US20060122997A1 (en) * 2004-12-02 2006-06-08 Dah-Chih Lin System and method for text searching using weighted keywords
US20070112758A1 (en) * 2005-11-14 2007-05-17 Aol Llc Displaying User Feedback for Search Results From People Related to a User
US8442972B2 (en) * 2006-10-11 2013-05-14 Collarity, Inc. Negative associations for search results ranking and refinement
US20080228675A1 (en) * 2006-10-13 2008-09-18 Move, Inc. Multi-tiered cascading crawling system
US20090228811A1 (en) * 2008-03-10 2009-09-10 Randy Adams Systems and methods for processing a plurality of documents

Also Published As

Publication number Publication date
CA2640035C (en) 2014-10-14
US20070179940A1 (en) 2007-08-02
WO2007089672A1 (en) 2007-08-09
EP1977350A1 (en) 2008-10-08

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed (effective date: 2021-08-31)
MKLA Lapsed (effective date: 2020-01-27)