CA2534273A1 - Performing efficient document scoring and clustering - Google Patents
- Publication number
- CA2534273A1
- Authority
- CA
- Canada
- Prior art keywords
- concept
- score
- weight
- documents
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99937—Sorting
Abstract
A system (10) and method (80) for grouping clusters (58) of semantically scored documents (14) is presented. A score (52) assigned to at least one concept (49) extracted from a plurality of documents (14) is determined based on at least one of a frequency of occurrence (53) of the at least one concept (49) within at least one such document (14), a concept weight (54), a structural weight (55), and a corpus weight (56). Clusters (58) of the documents (14) are formed by applying the score (52) for the at least one concept (49) to a best fit criterion for each such document (14).
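The scoring described in the abstract can be illustrated with a minimal sketch. The abstract only says the score depends on at least one of the four factors; the product-of-factors form, the `log(x + 1)` compression, and every name below (`Occurrence`, `concept_score`) are assumptions chosen for illustration, not the formula the patent specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class Occurrence:
    """Hypothetical record for a concept found at one location in a document."""
    frequency: int            # times the concept occurs at this location
    concept_weight: float     # assumed to depend on the number of terms in the concept
    structural_weight: float  # assumed to depend on where the concept appears
    corpus_weight: float      # assumed to depend on the concept's reference count


def concept_score(occurrences):
    """Sum the weighted frequencies, then compress the sum logarithmically.

    Multiplying the four factors per occurrence is an assumption; the
    source only states that the score is a function of a summation.
    """
    raw = sum(
        o.frequency * o.concept_weight * o.structural_weight * o.corpus_weight
        for o in occurrences
    )
    # log(raw + 1) keeps the score at zero when the concept never occurs.
    return math.log(raw + 1.0)


if __name__ == "__main__":
    title_hit = Occurrence(frequency=1, concept_weight=0.5,
                           structural_weight=1.0, corpus_weight=0.8)
    body_hits = Occurrence(frequency=4, concept_weight=0.5,
                           structural_weight=0.3, corpus_weight=0.8)
    print(concept_score([title_hit, body_hits]))
```

Compressing the summed value damps the influence of concepts that occur very often, so a single dominant concept is less likely to swamp the rest of a document's score vector.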
Claims (17)
1. A system (10) for grouping clusters (58) of semantically scored documents (14), comprising:
a scoring module (42) determining a score (52) assigned to at least one concept (49) extracted from a plurality of documents (14) based on at least one of a frequency of occurrence (53) of the at least one concept (49) within at least one such document (14), a concept weight (54), a structural weight (55), and a corpus weight (56); and
a clustering module (43) forming clusters (58) of the documents (14) by applying the score (52) for the at least one concept (49) to a best fit criterion for each such document (14).
2. A system (10) according to Claim 1, further comprising:
the scoring module (42) calculating the score (52) as a function of a summation of at least one of the frequency of occurrence (53), the concept weight (54), the structural weight (55), and the corpus weight (56) of the at least one concept (49).
3. A system (10) according to Claim 2, further comprising:
a compression module (31) compressing the score (52) through logarithmic compression.
4. A system (10) according to Claim 1, further comprising:
the scoring module (42) calculating the concept weight (54) as a function of a number of terms (50) comprising the at least one concept (49).
5. A system (10) according to Claim 1, further comprising:
the scoring module (42) calculating the structural weight (55) as a function of a location (125) of the at least one concept (49) within the at least one such document (14).
6. A system (10) according to Claim 1, further comprising:
the scoring module (42) calculating the corpus weight (56) as a function of a reference count (152) of the at least one concept (49) over the plurality of documents (14).
7. A system (10) according to Claim 1, further comprising:
the scoring module (42) forming the score (52) assigned to the at least one concept (49) to a normalized score vector (57) for each such document (14), determining a similarity between the normalized score vector (57) for each such document (14) as an inner product of each normalized score vector (57), and applying the similarity to the best fit criterion.
8. A system (10) according to Claim 1, further comprising:
the clustering module (43) evaluating a set of candidate seed documents (14) selected from the plurality of documents (14), identifying a set of seed documents (14) by applying the score (52) for the at least one concept (49) to a best fit criterion for each such candidate seed document (14), and basing the best fit criterion on the score (52) of each such seed document (14).
9. A method (80) for grouping clusters (58) of semantically scored documents (14), comprising:
determining (140) a score (52) assigned to at least one concept (49) extracted from a plurality of documents (14) based on at least one of a frequency of occurrence (53) of the at least one concept (49) within at least one such document (14), a concept weight (54), a structural weight (55), and a corpus weight (56); and
forming (160) clusters (58) of the documents (14) by applying the score (52) for the at least one concept (49) to a best fit criterion for each such document (14).
10. A method (80) according to Claim 9, further comprising:
calculating (145) the score (52) as a function of a summation of at least one of the frequency of occurrence (53), the concept weight (54), the structural weight (55), and the corpus weight (56) of the at least one concept (49).
11. A method (80) according to Claim 10, further comprising:
compressing (146) the score (52) through logarithmic compression.
12. A method (80) according to Claim 9, further comprising:
calculating (142) the concept weight (54) as a function of a number of terms (50) comprising the at least one concept (49).
13. A method (80) according to Claim 9, further comprising:
calculating (143) the structural weight (55) as a function of a location (125) of the at least one concept (49) within the at least one such document (14).
14. A method (80) according to Claim 9, further comprising:
calculating (144) the corpus weight (56) as a function of a reference count (152) of the at least one concept (49) over the plurality of documents (14).
15. A method (80) according to Claim 9, further comprising:
forming the score (52) assigned to the at least one concept (49) to a normalized score vector (57) for each such document (14);
determining (165) a similarity between the normalized score vector (57) for each such document (14) as an inner product of each normalized score vector (57); and applying (177) the similarity to the best fit criterion.
16. A method (80) according to Claim 9, further comprising:
evaluating (161) a set of candidate seed documents (14) selected from the plurality of documents (14);
identifying (166) a set of seed documents (14) by applying the score (52) for the at least one concept (49) to a best fit criterion for each such candidate seed document (14); and basing (177) the best fit criterion on the score (52) of each such seed document (14).
17. A computer-readable storage medium (11) holding code for performing the method (80) of Claim 9.
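The similarity and seed-based clustering steps recited in Claims 7, 8, 15 and 16 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the claims do not define the best fit criterion, so a fixed similarity `threshold` stands in for it here, and the names `normalize`, `similarity` and `cluster` are hypothetical.

```python
import math


def normalize(scores):
    """Scale a {concept: score} mapping to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {concept: v / norm for concept, v in scores.items()}


def similarity(a, b):
    """Inner product of two normalized score vectors (cosine similarity)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b.get(concept, 0.0) for concept, v in a.items())


def cluster(documents, candidate_seed_ids, threshold=0.5):
    """Pick seeds from the candidates, then attach each document to its best seed.

    `threshold` is an assumed stand-in for the best fit criterion: a
    candidate becomes a seed only if it is dissimilar to every seed
    chosen so far.
    """
    vectors = {doc_id: normalize(s) for doc_id, s in documents.items()}
    seeds = []
    for doc_id in candidate_seed_ids:
        if all(similarity(vectors[doc_id], vectors[s]) < threshold for s in seeds):
            seeds.append(doc_id)
    if not seeds:
        return {}
    clusters = {s: [s] for s in seeds}
    for doc_id, vec in vectors.items():
        if doc_id in clusters:
            continue  # seeds already anchor their own cluster
        best = max(seeds, key=lambda s: similarity(vec, vectors[s]))
        clusters[best].append(doc_id)
    return clusters


if __name__ == "__main__":
    docs = {
        "a": {"x": 1.0},
        "b": {"x": 2.0, "y": 0.1},
        "c": {"z": 3.0},
        "d": {"z": 1.0, "y": 0.2},
    }
    print(cluster(docs, ["a", "b", "c"]))
```

Because the vectors are normalized first, the inner product depends only on the direction of each score vector, so a long document and a short one with the same concept mix compare as similar.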
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/626,984 | 2003-07-25 | ||
US10/626,984 US7610313B2 (en) | 2003-07-25 | 2003-07-25 | System and method for performing efficient document scoring and clustering |
PCT/US2004/023955 WO2005013152A1 (en) | 2003-07-25 | 2004-07-23 | Performing efficient document scoring and clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2534273A1 (en) | 2005-02-10 |
CA2534273C (en) | 2013-08-20 |
Family
ID=34080524
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2534273A Active CA2534273C (en) | 2003-07-25 | 2004-07-23 | Performing efficient document scoring and clustering |
Country Status (4)
Country | Link |
---|---|
US (3) | US7610313B2 (en) |
EP (1) | EP1652119A1 (en) |
CA (1) | CA2534273C (en) |
WO (1) | WO2005013152A1 (en) |
Family Cites Families (215)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3416150A (en) * | 1965-08-02 | 1968-12-10 | Data Disc Inc | Magnetic recording disc cartridge |
US3426210A (en) * | 1965-12-22 | 1969-02-04 | Rca Corp | Control circuit for automatically quantizing signals at desired levels |
BE757866A (en) * | 1969-12-22 | 1971-04-01 | Ibm | MAGNETIC DISC MEMORY ELEMENT AND RECORDING APPARATUS IN USE |
JPS6283787A (en) * | 1985-10-09 | 1987-04-17 | 株式会社日立製作所 | Output control system for display screen |
US4930077A (en) * | 1987-04-06 | 1990-05-29 | Fan David P | Information processing expert system for text analysis and predicting public opinion based information available to the public |
US5121338A (en) * | 1988-03-10 | 1992-06-09 | Indiana University Foundation | Method for detecting subpopulations in spectral analysis |
US4893253A (en) * | 1988-03-10 | 1990-01-09 | Indiana University Foundation | Method for analyzing intact capsules and tablets by near-infrared reflectance spectrometry |
US5056021A (en) | 1989-06-08 | 1991-10-08 | Carolyn Ausborn | Method and apparatus for abstracting concepts from natural language |
US5860136A (en) * | 1989-06-16 | 1999-01-12 | Fenner; Peter R. | Method and apparatus for use of associated memory with large key spaces |
US6089742A (en) * | 1989-11-01 | 2000-07-18 | Warmerdam; Thomas P. H. | Method and apparatus for controlling robots and the like using a bubble data hierarchy placed along a medial axis |
US5477451A (en) | 1991-07-25 | 1995-12-19 | International Business Machines Corp. | Method and system for natural language translation |
US5278980A (en) * | 1991-08-16 | 1994-01-11 | Xerox Corporation | Iterative technique for phrase query formation and an information retrieval system employing same |
US5488725A (en) | 1991-10-08 | 1996-01-30 | West Publishing Company | System of document representation retrieval by successive iterated probability sampling |
US5442778A (en) * | 1991-11-12 | 1995-08-15 | Xerox Corporation | Scatter-gather: a cluster-based method and apparatus for browsing large document collections |
JP3566720B2 (en) * | 1992-04-30 | 2004-09-15 | アプル・コンピュータ・インコーポレーテッド | Method and apparatus for organizing information in a computer system |
JP3364242B2 (en) * | 1992-07-03 | 2003-01-08 | 株式会社東芝 | Link learning device for artificial neural networks |
DE69426541T2 (en) * | 1993-03-12 | 2001-06-13 | Toshiba Kawasaki Kk | Document detection system with presentation of the detection result to facilitate understanding of the user |
US5528735A (en) * | 1993-03-23 | 1996-06-18 | Silicon Graphics Inc. | Method and apparatus for displaying data within a three-dimensional information landscape |
JPH0756933A (en) * | 1993-06-24 | 1995-03-03 | Xerox Corp | Method for retrieval of document |
US7251637B1 (en) * | 1993-09-20 | 2007-07-31 | Fair Isaac Corporation | Context vector generation and retrieval |
US5619709A (en) * | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
US6173275B1 (en) | 1993-09-20 | 2001-01-09 | Hnc Software, Inc. | Representation and retrieval of images using context vectors derived from image information elements |
US5675819A (en) * | 1994-06-16 | 1997-10-07 | Xerox Corporation | Document information retrieval using global word co-occurrence patterns |
US5959699A (en) * | 1994-06-28 | 1999-09-28 | Samsung Electronics Co., Ltd. | Reception mode control in radio receivers for receiving both VSB and QAM digital television signals |
US5619632A (en) * | 1994-09-14 | 1997-04-08 | Xerox Corporation | Displaying node-link structure with region of greater spacings and peripheral branches |
US5758257A (en) * | 1994-11-29 | 1998-05-26 | Herz; Frederick | System and method for scheduling broadcast of and access to video programs and other data using customer profiles |
US5635929A (en) * | 1995-02-13 | 1997-06-03 | Hughes Aircraft Company | Low bit rate video encoder and decoder |
US5724571A (en) | 1995-07-07 | 1998-03-03 | Sun Microsystems, Inc. | Method and apparatus for generating query responses in a computer-based document retrieval system |
US5844991A (en) * | 1995-08-07 | 1998-12-01 | The Regents Of The University Of California | Script identification from images using cluster-based templates |
US6006221A (en) * | 1995-08-16 | 1999-12-21 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US5737734A (en) * | 1995-09-15 | 1998-04-07 | Infonautics Corporation | Query word relevance adjustment in a search of an information retrieval system |
US5799276A (en) * | 1995-11-07 | 1998-08-25 | Accent Incorporated | Knowledge-based speech recognition system and methods having frame length computed based upon estimated pitch period of vocalic intervals |
US5842203A (en) * | 1995-12-01 | 1998-11-24 | International Business Machines Corporation | Method and system for performing non-boolean search queries in a graphical user interface |
US5862325A (en) * | 1996-02-29 | 1999-01-19 | Intermind Corporation | Computer-based communication system and method using metadata defining a control structure |
US5867799A (en) * | 1996-04-04 | 1999-02-02 | Lang; Andrew K. | Information system and method for filtering a massive flow of information entities to meet user information classification needs |
US6026397A (en) | 1996-05-22 | 2000-02-15 | Electronic Data Systems Corporation | Data analysis system and method |
US5794236A (en) * | 1996-05-29 | 1998-08-11 | Lexis-Nexis | Computer-based system for classifying documents into a hierarchy and linking the classifications to the hierarchy |
US5864871A (en) * | 1996-06-04 | 1999-01-26 | Multex Systems | Information delivery system and method including on-line entitlements |
US6453327B1 (en) | 1996-06-10 | 2002-09-17 | Sun Microsystems, Inc. | Method and apparatus for identifying and discarding junk electronic mail |
JP3540511B2 (en) * | 1996-06-18 | 2004-07-07 | 株式会社東芝 | Electronic signature verification device |
US5909677A (en) * | 1996-06-18 | 1999-06-01 | Digital Equipment Corporation | Method for determining the resemblance of documents |
US5864846A (en) * | 1996-06-28 | 1999-01-26 | Siemens Corporate Research, Inc. | Method for facilitating world wide web searches utilizing a document distribution fusion strategy |
US5920854A (en) * | 1996-08-14 | 1999-07-06 | Infoseek Corporation | Real-time document collection search engine with phrase indexing |
US5857179A (en) * | 1996-09-09 | 1999-01-05 | Digital Equipment Corporation | Computer method and apparatus for clustering documents and automatic generation of cluster keywords |
US6484168B1 (en) | 1996-09-13 | 2002-11-19 | Battelle Memorial Institute | System for information discovery |
US5870740A (en) * | 1996-09-30 | 1999-02-09 | Apple Computer, Inc. | System and method for improving the ranking of information retrieval results for short queries |
US5950146A (en) * | 1996-10-04 | 1999-09-07 | At & T Corp. | Support vector method for function estimation |
US5987446A (en) * | 1996-11-12 | 1999-11-16 | U.S. West, Inc. | Searching large collections of text using multiple search engines concurrently |
JP3598742B2 (en) * | 1996-11-25 | 2004-12-08 | 富士ゼロックス株式会社 | Document search device and document search method |
US5966126A (en) * | 1996-12-23 | 1999-10-12 | Szabo; Andrew J. | Graphic user interface for database system |
US5950189A (en) * | 1997-01-02 | 1999-09-07 | At&T Corp | Retrieval system and method |
JP3550929B2 (en) * | 1997-01-28 | 2004-08-04 | 富士通株式会社 | Reference frequency counting apparatus and method in interactive hypertext information reference system |
US5819258A (en) * | 1997-03-07 | 1998-10-06 | Digital Equipment Corporation | Method and apparatus for automatically generating hierarchical categories from large document collections |
US6137499A (en) * | 1997-03-07 | 2000-10-24 | Silicon Graphics, Inc. | Method, system, and computer program product for visualizing data using partial hierarchies |
US5835905A (en) * | 1997-04-09 | 1998-11-10 | Xerox Corporation | System for predicting documents relevant to focus documents by spreading activation through network representations of a linked collection of documents |
US5895470A (en) * | 1997-04-09 | 1999-04-20 | Xerox Corporation | System for categorizing documents in a linked collection of documents |
US6460034B1 (en) | 1997-05-21 | 2002-10-01 | Oracle Corporation | Document knowledge base research and retrieval system |
US5940821A (en) * | 1997-05-21 | 1999-08-17 | Oracle Corporation | Information presentation in a knowledge base search and retrieval system |
US6148102A (en) | 1997-05-29 | 2000-11-14 | Adobe Systems Incorporated | Recognizing text in a multicolor image |
JPH1115759A (en) | 1997-06-16 | 1999-01-22 | Digital Equip Corp <Dec> | Full text index type mail preserving device |
US6137911A (en) | 1997-06-16 | 2000-10-24 | The Dialog Corporation Plc | Test classification system and method |
GB9713019D0 (en) * | 1997-06-20 | 1997-08-27 | Xerox Corp | Linguistic search system |
US6012053A (en) * | 1997-06-23 | 2000-01-04 | Lycos, Inc. | Computer system with user-controlled relevance ranking of search results |
US6470307B1 (en) * | 1997-06-23 | 2002-10-22 | National Research Council Of Canada | Method and apparatus for automatically identifying keywords within a document |
US6070133A (en) * | 1997-07-21 | 2000-05-30 | Battelle Memorial Institute | Information retrieval system utilizing wavelet transform |
US6122628A (en) * | 1997-10-31 | 2000-09-19 | International Business Machines Corporation | Multidimensional data clustering and dimension reduction for indexing and searching |
US6154219A (en) * | 1997-12-01 | 2000-11-28 | Microsoft Corporation | System and method for optimally placing labels on a map |
US6389436B1 (en) * | 1997-12-15 | 2002-05-14 | International Business Machines Corporation | Enhanced hypertext categorization using hyperlinks |
US6094649A (en) * | 1997-12-22 | 2000-07-25 | Partnet, Inc. | Keyword searches of structured databases |
AU1907899A (en) | 1997-12-22 | 1999-07-12 | Accepted Marketing, Inc. | E-mail filter and method thereof |
US6449612B1 (en) | 1998-03-17 | 2002-09-10 | Microsoft Corporation | Varying cluster number in a scalable clustering system for use with large databases |
US6038574A (en) * | 1998-03-18 | 2000-03-14 | Xerox Corporation | Method and apparatus for clustering a collection of linked documents using co-citation analysis |
US6484196B1 (en) | 1998-03-20 | 2002-11-19 | Advanced Web Solutions | Internet messaging system and method for use in computer networks |
US6119124A (en) * | 1998-03-26 | 2000-09-12 | Digital Equipment Corporation | Method for clustering closely resembling data objects |
US6418431B1 (en) | 1998-03-30 | 2002-07-09 | Microsoft Corporation | Information retrieval and speech recognition based on language models |
JPH11316779A (en) | 1998-04-30 | 1999-11-16 | Fujitsu Ltd | Witness system |
US6345243B1 (en) * | 1998-05-27 | 2002-02-05 | Lionbridge Technologies, Inc. | System, method, and product for dynamically propagating translations in a translation-memory system |
US7266365B2 (en) | 1998-05-29 | 2007-09-04 | Research In Motion Limited | System and method for delayed transmission of bundled command messages |
US7209949B2 (en) | 1998-05-29 | 2007-04-24 | Research In Motion Limited | System and method for synchronizing information between a host system and a mobile data communication device |
US6438564B1 (en) | 1998-06-17 | 2002-08-20 | Microsoft Corporation | Method for associating a discussion with a document |
US6100901A (en) * | 1998-06-22 | 2000-08-08 | International Business Machines Corporation | Method and apparatus for cluster exploration and visualization |
US6216123B1 (en) * | 1998-06-24 | 2001-04-10 | Novell, Inc. | Method and system for rapid retrieval in a full text indexing system |
JP2000019556A (en) | 1998-06-29 | 2000-01-21 | Hitachi Ltd | Liquid crystal display device |
US6446061B1 (en) | 1998-07-31 | 2002-09-03 | International Business Machines Corporation | Taxonomy generation for document collections |
US6167368A (en) * | 1998-08-14 | 2000-12-26 | The Trustees Of Columbia University In The City Of New York | Method and system for indentifying significant topics of a document |
JP4032649B2 (en) | 1998-08-24 | 2008-01-16 | 株式会社日立製作所 | How to display multimedia information |
US6243713B1 (en) * | 1998-08-24 | 2001-06-05 | Excalibur Technologies Corp. | Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types |
US6414677B1 (en) * | 1998-09-14 | 2002-07-02 | Microsoft Corporation | Methods, apparatus and data structures for providing a user interface, which exploits spatial memory in three-dimensions, to objects and which visually groups proximally located objects |
WO2000016209A1 (en) | 1998-09-15 | 2000-03-23 | Local2Me.Com, Inc. | Dynamic matching™ of users for group communication |
JP3903610B2 (en) * | 1998-09-28 | 2007-04-11 | 富士ゼロックス株式会社 | Search device, search method, and computer-readable recording medium storing search program |
US6415283B1 (en) * | 1998-10-13 | 2002-07-02 | Oracle Corporation | Methods and apparatus for determining focal points of clusters in a tree structure |
US6480843B2 (en) | 1998-11-03 | 2002-11-12 | Nec Usa, Inc. | Supporting web-query expansion efficiently using multi-granularity indexing and query processing |
US6678705B1 (en) | 1998-11-16 | 2004-01-13 | At&T Corp. | System for archiving electronic documents using messaging groupware |
US6628304B2 (en) | 1998-12-09 | 2003-09-30 | Cisco Technology, Inc. | Method and apparatus providing a graphical user interface for representing and navigating hierarchical networks |
US6442592B1 (en) | 1998-12-11 | 2002-08-27 | Micro Computer Systems, Inc. | Message center system |
US6816175B1 (en) | 1998-12-19 | 2004-11-09 | International Business Machines Corporation | Orthogonal browsing in object hierarchies |
JP2000187668A (en) * | 1998-12-22 | 2000-07-04 | Hitachi Ltd | Grouping method and overlap excluding method |
US6549957B1 (en) | 1998-12-22 | 2003-04-15 | International Business Machines Corporation | Apparatus for preventing automatic generation of a chain reaction of messages if a prior extracted message is similar to current processed message |
JP2000285140A (en) * | 1998-12-24 | 2000-10-13 | Ricoh Co Ltd | Device and method for processing document, device and method for classifying document, and computer readable recording medium recorded with program for allowing computer to execute these methods |
US6349307B1 (en) * | 1998-12-28 | 2002-02-19 | U.S. Philips Corporation | Cooperative topical servers with automatic prefiltering and routing |
US6363374B1 (en) * | 1998-12-31 | 2002-03-26 | Microsoft Corporation | Text proximity filtering in search systems using same sentence restrictions |
EP2178006A3 (en) | 1999-01-26 | 2011-04-13 | Xerox Corporation | Multi-modal information access |
US6922699B2 (en) * | 1999-01-26 | 2005-07-26 | Xerox Corporation | System and method for quantitatively representing data objects in vector space |
US6598054B2 (en) | 1999-01-26 | 2003-07-22 | Xerox Corporation | System and method for clustering data objects in a collection |
US6360227B1 (en) * | 1999-01-29 | 2002-03-19 | International Business Machines Corporation | System and method for generating taxonomies with applications to content-based recommendations |
US6941325B1 (en) | 1999-02-01 | 2005-09-06 | The Trustees Of Columbia University | Multimedia archive description scheme |
WO2000046701A1 (en) | 1999-02-08 | 2000-08-10 | Huntsman Ici Chemicals Llc | Method for retrieving semantically distant analogies |
US7277919B1 (en) * | 1999-03-19 | 2007-10-02 | Bigfix, Inc. | Relevance clause for computed relevance messaging |
US6510406B1 (en) | 1999-03-23 | 2003-01-21 | Mathsoft, Inc. | Inverse inference engine for high performance web search |
US6862710B1 (en) * | 1999-03-23 | 2005-03-01 | Insightful Corporation | Internet navigation using soft hyperlinks |
US6408294B1 (en) * | 1999-03-31 | 2002-06-18 | Verizon Laboratories Inc. | Common term optimization |
US6496822B2 (en) | 1999-04-12 | 2002-12-17 | Micron Technology, Inc. | Methods of providing computer systems with bundled access to restricted-access databases |
US6377287B1 (en) * | 1999-04-19 | 2002-04-23 | Hewlett-Packard Company | Technique for visualizing large web-based hierarchical hyperbolic space with multi-paths |
EP1049030A1 (en) | 1999-04-28 | 2000-11-02 | SER Systeme AG Produkte und Anwendungen der Datenverarbeitung | Classification method and apparatus |
US6629097B1 (en) | 1999-04-28 | 2003-09-30 | Douglas K. Keith | Displaying implicit associations among items in loosely-structured data sets |
US6493703B1 (en) | 1999-05-11 | 2002-12-10 | Prophet Financial Systems | System and method for implementing intelligent online community message board |
US6606625B1 (en) * | 1999-06-03 | 2003-08-12 | University Of Southern California | Wrapper induction by hierarchical data analysis |
US6611825B1 (en) * | 1999-06-09 | 2003-08-26 | The Boeing Company | Method and system for text mining using multidimensional subspaces |
US6701305B1 (en) | 1999-06-09 | 2004-03-02 | The Boeing Company | Methods, apparatus and computer program products for information retrieval and document classification utilizing a multidimensional subspace |
US6711585B1 (en) | 1999-06-15 | 2004-03-23 | Kanisa Inc. | System and method for implementing a knowledge management system |
US6438537B1 (en) | 1999-06-22 | 2002-08-20 | Microsoft Corporation | Usage based aggregation optimization |
US6415171B1 (en) * | 1999-07-16 | 2002-07-02 | International Business Machines Corporation | System and method for fusing three-dimensional shape data on distorted images without correcting for distortion |
US6389433B1 (en) * | 1999-07-16 | 2002-05-14 | Microsoft Corporation | Method and system for automatically merging files into a single instance store |
US7240199B2 (en) | 2000-12-06 | 2007-07-03 | Rpost International Limited | System and method for verifying delivery and integrity of electronic messages |
EP1236175A4 (en) * | 1999-08-06 | 2006-07-12 | Lexis Nexis | System and method for classifying legal concepts using legal topic scheme |
US6523063B1 (en) | 1999-08-30 | 2003-02-18 | Zaplet, Inc. | Method system and program product for accessing a file using values from a redirect message string for each change of the link identifier |
US6651057B1 (en) * | 1999-09-03 | 2003-11-18 | Bbnt Solutions Llc | Method and apparatus for score normalization for information retrieval applications |
US6260038B1 (en) * | 1999-09-13 | 2001-07-10 | International Business Machines Corporation | Clustering mixed attribute patterns |
US6990238B1 (en) * | 1999-09-30 | 2006-01-24 | Battelle Memorial Institute | Data processing, analysis, and visualization system for use with disparate data types |
US6544123B1 (en) | 1999-10-29 | 2003-04-08 | Square Co., Ltd. | Game apparatus, command input method for video game and computer-readable recording medium recording programs for realizing the same |
US7130807B1 (en) | 1999-11-22 | 2006-10-31 | Accenture Llp | Technology sharing during demand and supply planning in a network-based supply chain environment |
US6507847B1 (en) | 1999-12-17 | 2003-01-14 | Openwave Systems Inc. | History database structure for Usenet |
US6751621B1 (en) | 2000-01-27 | 2004-06-15 | Manning & Napier Information Services, Llc. | Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors |
US6542889B1 (en) | 2000-01-28 | 2003-04-01 | International Business Machines Corporation | Methods and apparatus for similarity text search based on conceptual indexing |
US6654739B1 (en) | 2000-01-31 | 2003-11-25 | International Business Machines Corporation | Lightweight document clustering |
US6571225B1 (en) * | 2000-02-11 | 2003-05-27 | International Business Machines Corporation | Text categorizers based on regularizing adaptations of the problem of computing linear separators |
US7412462B2 (en) | 2000-02-18 | 2008-08-12 | Burnside Acquisition, Llc | Data repository and method for promoting network storage of data |
US7117246B2 (en) | 2000-02-22 | 2006-10-03 | Sendmail, Inc. | Electronic mail system with methodology providing distributed message store |
US6560597B1 (en) * | 2000-03-21 | 2003-05-06 | International Business Machines Corporation | Concept decomposition using clustering |
US6757646B2 (en) * | 2000-03-22 | 2004-06-29 | Insightful Corporation | Extended functionality for an inverse inference engine based web search |
US6785679B1 (en) | 2000-03-29 | 2004-08-31 | Brassring, Llc | Method and apparatus for sending and tracking resume data sent via URL |
US6915308B1 (en) | 2000-04-06 | 2005-07-05 | Claritech Corporation | Method and apparatus for information mining and filtering |
US6584564B2 (en) | 2000-04-25 | 2003-06-24 | Sigaba Corporation | Secure e-mail system |
US7325127B2 (en) | 2000-04-25 | 2008-01-29 | Secure Data In Motion, Inc. | Security server system |
US7698167B2 (en) * | 2000-04-28 | 2010-04-13 | Computer Pundits, Inc. | Catalog building method and system |
US6879332B2 (en) | 2000-05-16 | 2005-04-12 | Groxis, Inc. | User interface for displaying and exploring hierarchical information |
US6883001B2 (en) * | 2000-05-26 | 2005-04-19 | Fujitsu Limited | Document information search apparatus and method and recording medium storing document information search program therein |
US6519580B1 (en) * | 2000-06-08 | 2003-02-11 | International Business Machines Corporation | Decision-tree-based symbolic rule induction system for text categorization |
US6697998B1 (en) * | 2000-06-12 | 2004-02-24 | International Business Machines Corporation | Automatic labeling of unlabeled text data |
US20020078090A1 (en) | 2000-06-30 | 2002-06-20 | Hwang Chung Hee | Ontological concept-based, user-centric text summarization |
US7490092B2 (en) * | 2000-07-06 | 2009-02-10 | Streamsage, Inc. | Method and system for indexing and searching timed media information based upon relevance intervals |
US6738759B1 (en) | 2000-07-07 | 2004-05-18 | Infoglide Corporation, Inc. | System and method for performing similarity searching using pointer optimization |
JP2002041544A (en) | 2000-07-25 | 2002-02-08 | Toshiba Corp | Text information analyzing device |
US6675159B1 (en) | 2000-07-27 | 2004-01-06 | Science Applic Int Corp | Concept-based search and retrieval system |
US20020032735A1 (en) * | 2000-08-25 | 2002-03-14 | Daniel Burnstein | Apparatus, means and methods for automatic community formation for phones and computer networks |
AUPR033800A0 (en) * | 2000-09-25 | 2000-10-19 | Telstra R & D Management Pty Ltd | A document categorisation system |
US7457948B1 (en) | 2000-09-29 | 2008-11-25 | Lucent Technologies Inc. | Automated authentication handling system |
US7197470B1 (en) | 2000-10-11 | 2007-03-27 | Buzzmetrics, Ltd. | System and method for collection analysis of electronic discussion methods |
US6684205B1 (en) | 2000-10-18 | 2004-01-27 | International Business Machines Corporation | Clustering hypertext with applications to web searching |
CA2323883C (en) * | 2000-10-19 | 2016-02-16 | Patrick Ryan Morin | Method and device for classifying internet objects and objects stored on computer-readable media |
US7233940B2 (en) * | 2000-11-06 | 2007-06-19 | Answers Corporation | System for processing at least partially structured data |
US6978419B1 (en) | 2000-11-15 | 2005-12-20 | Justsystem Corporation | Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments |
WO2002041190A2 (en) | 2000-11-15 | 2002-05-23 | Holbrook David M | Apparatus and method for organizing and/or presenting data |
WO2002042982A2 (en) * | 2000-11-27 | 2002-05-30 | Nextworth, Inc. | Anonymous transaction system |
AU2002235147A1 (en) * | 2000-11-30 | 2002-06-11 | Webtone Technologies, Inc. | Web session collaboration |
US7003551B2 (en) | 2000-11-30 | 2006-02-21 | Bellsouth Intellectual Property Corp. | Method and apparatus for minimizing storage of common attachment files in an e-mail communications server |
US7607083B2 (en) * | 2000-12-12 | 2009-10-20 | Nec Corporation | Test summarization using relevance measures and latent semantic analysis |
WO2002056181A2 (en) * | 2001-01-11 | 2002-07-18 | Force Communications Inc Z | File switch and switched file system |
US6751628B2 (en) | 2001-01-11 | 2004-06-15 | Dolphin Search | Process and system for sparse vector and matrix representation of document indexing and retrieval |
US6658423B1 (en) | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
JP4022374B2 (en) * | 2001-01-26 | 2007-12-19 | 株式会社ルネサステクノロジ | Semiconductor device manufacturing method and system |
JP3768105B2 (en) | 2001-01-29 | 2006-04-19 | 株式会社東芝 | Translation apparatus, translation method, and translation program |
WO2002063493A1 (en) | 2001-02-08 | 2002-08-15 | 2028, Inc. | Methods and systems for automated semantic knowledge leveraging graph theoretic analysis and the inherent structure of communication |
US20020122543A1 (en) | 2001-02-12 | 2002-09-05 | Rowen Chris E. | System and method of indexing unique electronic mail messages and uses for the same |
US7366759B2 (en) * | 2001-02-22 | 2008-04-29 | Parity Communications, Inc. | Method and system for characterizing relationships in social networks |
US6823333B2 (en) | 2001-03-02 | 2004-11-23 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | System, method and apparatus for conducting a keyterm search |
US6819344B2 (en) | 2001-03-12 | 2004-11-16 | Microsoft Corporation | Visualization of multi-dimensional data having an unbounded dimension |
US20030130991A1 (en) | 2001-03-28 | 2003-07-10 | Fidel Reijerse | Knowledge discovery from data sets |
US7353204B2 (en) | 2001-04-03 | 2008-04-01 | Zix Corporation | Certified transmission system |
US6714929B1 (en) | 2001-04-13 | 2004-03-30 | Auguri Corporation | Weighted preference data search system and method |
US6804665B2 (en) | 2001-04-18 | 2004-10-12 | International Business Machines Corporation | Method and apparatus for discovering knowledge gaps between problems and solutions in text databases |
US7020645B2 (en) | 2001-04-19 | 2006-03-28 | Eoriginal, Inc. | Systems and methods for state-less authentication |
US7155668B2 (en) | 2001-04-19 | 2006-12-26 | International Business Machines Corporation | Method and system for identifying relationships between text documents and structured variables pertaining to the text documents |
US7194483B1 (en) * | 2001-05-07 | 2007-03-20 | Intelligenxia, Inc. | Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information |
US6970881B1 (en) * | 2001-05-07 | 2005-11-29 | Intelligenxia, Inc. | Concept-based method and system for dynamically analyzing unstructured information |
US6735578B2 (en) | 2001-05-10 | 2004-05-11 | Honeywell International Inc. | Indexing of knowledge base in multilayer self-organizing maps with hessian and perturbation induced fast learning |
US7308451B1 (en) * | 2001-09-04 | 2007-12-11 | Stratify, Inc. | Method and system for guided cluster based processing on prototypes |
US20020184193A1 (en) | 2001-05-30 | 2002-12-05 | Meir Cohen | Method and system for performing a similarity search using a dissimilarity based indexing structure |
US6675164B2 (en) | 2001-06-08 | 2004-01-06 | The Regents Of The University Of California | Parallel object-oriented data mining system |
US7266545B2 (en) | 2001-08-07 | 2007-09-04 | International Business Machines Corporation | Methods and apparatus for indexing in a database and for retrieving data from a database in accordance with queries using example sets |
JP4701564B2 (en) | 2001-08-31 | 2011-06-15 | ソニー株式会社 | Menu display device and menu display method |
US6978274B1 (en) * | 2001-08-31 | 2005-12-20 | Attenex Corporation | System and method for dynamically evaluating latent concepts in unstructured documents |
AUPR958901A0 (en) | 2001-12-18 | 2002-01-24 | Telstra New Wave Pty Ltd | Information resource taxonomy |
AU2003201799A1 (en) | 2002-01-16 | 2003-07-30 | Elucidon Ab | Information data retrieval, where the data is organized in terms, documents and document corpora |
US20030172048A1 (en) | 2002-03-06 | 2003-09-11 | Business Machines Corporation | Text search system for complex queries |
US7188107B2 (en) | 2002-03-06 | 2007-03-06 | Infoglide Software Corporation | System and method for classification of documents |
US6847966B1 (en) * | 2002-04-24 | 2005-01-25 | Engenium Corporation | Method and system for optimally searching a document database using a representative semantic space |
US20040205578A1 (en) * | 2002-04-25 | 2004-10-14 | Wolff Alan S. | System and method for converting document to reusable learning object |
US7188117B2 (en) * | 2002-05-17 | 2007-03-06 | Xerox Corporation | Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections |
US6996575B2 (en) | 2002-05-31 | 2006-02-07 | Sas Institute Inc. | Computer-implemented system and method for text-based document processing |
JP3870862B2 (en) | 2002-07-12 | 2007-01-24 | ソニー株式会社 | Liquid crystal display device, control method thereof, and portable terminal |
US20040024755A1 (en) * | 2002-08-05 | 2004-02-05 | Rickard John Terrell | System and method for indexing non-textual data |
US20040034633A1 (en) * | 2002-08-05 | 2004-02-19 | Rickard John Terrell | Data search system and method using mutual subsethood measures |
US7158983B2 (en) * | 2002-09-23 | 2007-01-02 | Battelle Memorial Institute | Text analysis technique |
US6886010B2 (en) | 2002-09-30 | 2005-04-26 | The United States Of America As Represented By The Secretary Of The Navy | Method for data and text mining and literature-based discovery |
US7246113B2 (en) | 2002-10-02 | 2007-07-17 | General Electric Company | Systems and methods for selecting a material that best matches a desired set of properties |
US7158957B2 (en) * | 2002-11-21 | 2007-01-02 | Honeywell International Inc. | Supervised self organizing maps with fuzzy error correction |
US7472110B2 (en) | 2003-01-29 | 2008-12-30 | Microsoft Corporation | System and method for employing social networks for information discovery |
US7197497B2 (en) * | 2003-04-25 | 2007-03-27 | Overture Services, Inc. | Method and apparatus for machine learning a document relevance function |
US20040215608A1 (en) | 2003-04-25 | 2004-10-28 | Alastair Gourlay | Search engine supplemented with URL's that provide access to the search results from predefined search queries |
US20040243556A1 (en) | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS) |
US7146361B2 (en) | 2003-05-30 | 2006-12-05 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) |
GB2403558A (en) | 2003-07-02 | 2005-01-05 | Sony Uk Ltd | Document searching and method for presenting the results |
US20070020642A1 (en) | 2003-07-03 | 2007-01-25 | Zhan Deng | Structural interaction fingerprint |
US7523349B2 (en) * | 2006-08-25 | 2009-04-21 | Accenture Global Services Gmbh | Data visualization for diagnosing computing systems |
Date | Country | Application | Publication | Status |
---|---|---|---|---|
2003-07-25 | US | US10/626,984 | US7610313B2 | Active |
2004-07-23 | WO | PCT/US2004/023955 | WO2005013152A1 | Application Filing |
2004-07-23 | EP | EP04779158A | EP1652119A1 | Withdrawn |
2004-07-23 | CA | CA2534273A | CA2534273C | Active |
2009-10-26 | US | US12/606,171 | US8626761B2 | Active |
2014-01-06 | US | US14/148,686 | US20140122495A1 | Abandoned |
Also Published As
Publication number | Publication date |
---|---|
EP1652119A1 (en) | 2006-05-03 |
CA2534273C (en) | 2013-08-20 |
US8626761B2 (en) | 2014-01-07 |
US20050022106A1 (en) | 2005-01-27 |
US20140122495A1 (en) | 2014-05-01 |
US7610313B2 (en) | 2009-10-27 |
WO2005013152A1 (en) | 2005-02-10 |
US20100049708A1 (en) | 2010-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2534273A1 (en) | | Performing efficient document scoring and clustering |
Hull | | Improving text retrieval for the routing problem using latent semantic indexing |
CA2236623A1 (en) | | Method and apparatus for automatically identifying key words within a document |
US8402369B2 (en) | | Multiple-document summarization using document clustering |
CA2470299A1 (en) | | Systems, methods, and software for classifying documents |
CN110491392A (en) | | A kind of audio data cleaning method, device and equipment based on speaker's identity |
Biemann | | Unsupervised part-of-speech tagging employing efficient graph clustering |
EP1387303A3 (en) | | Image classification method and image feature space displaying method |
CN109299480A (en) | | Terminology Translation method and device based on context of co-text |
WO2006083861A3 (en) | | Using personal background data to improve the organization of documents retrieved in response to a search query |
CA2392893A1 (en) | | Similar document retrieving method and system |
EP1400901A3 (en) | | Method and system for retrieving confirming sentences |
CN103049548B (en) | | FAQ in electronic channel application identifies system and method |
CA2508946A1 (en) | | Method and apparatus for natural language call routing using confidence scores |
CA2764243A1 (en) | | Co-selected image classification |
RU2005111000A (en) | | PROPOSAL OF RELATED TERMS FOR A MANY SENSE REQUEST |
CN1270361A (en) | | Method and device for audio information searching by content and loudspeaker information |
EP0969403A3 (en) | | Two-dimensional code recognition processing method and apparatus, and storage medium |
CN102915729B (en) | | Speech keyword spotting system and system and method of creating dictionary for the speech keyword spotting system |
CN103902619B (en) | | A kind of network public-opinion monitoring method and system |
CN105893414A (en) | | Method and apparatus for screening valid term of a pronunciation lexicon |
WO2005010819A1 (en) | | Method for partitioning a pattern into optimized sub-patterns |
KR101052592B1 (en) | | Optimal Cluster Partitioning Method and System in Hierarchical Clustering |
EP1796009A3 (en) | | System for and method of extracting and clustering information |
KR102376489B1 (en) | | Text document cluster and topic generation apparatus and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | EEER | Examination request | |