WO2001090921A3 - System and method for automatically classifying text - Google Patents

System and method for automatically classifying text

Info

Publication number
WO2001090921A3
Authority
WO
WIPO (PCT)
Prior art keywords
category
document
feature
weight
automatically classifying
Prior art date
Application number
PCT/US2001/016872
Other languages
French (fr)
Other versions
WO2001090921A2 (en)
Inventor
Igor Ukrainczyk
Max Copperman
Scott B Huffman
Original Assignee
Kanisa Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kanisa Inc
Priority to AU2001264928A
Publication of WO2001090921A2
Publication of WO2001090921A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Abstract

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.
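
The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the category names, feature weights, tokenizer, and threshold value are all hypothetical, and the grouping of scores by perspective is omitted. Each category score is the sum, over the category's features, of the feature's count in the document times its weight; categories are then ranked by score and kept only if the score exceeds the threshold.

```python
import re
from collections import Counter

# Hypothetical categories: feature -> weight (illustrative values, not from the patent).
CATEGORY_WEIGHTS = {
    "printing": {"printer": 2.0, "toner": 1.5, "paper": 0.5},
    "networking": {"router": 2.0, "ethernet": 1.5, "cable": 0.5},
}


def tokenize(text):
    """Parse a document into unique lowercase tokens with occurrence counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def category_scores(text, category_weights):
    """Score each category as the sum of (feature count * feature weight)."""
    counts = tokenize(text)
    return {
        category: sum(counts[feature] * weight for feature, weight in weights.items())
        for category, weights in category_weights.items()
    }


def classify(text, category_weights, threshold):
    """Return (category, score) pairs whose score exceeds the threshold, best first."""
    scores = category_scores(text, category_weights)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(category, score) for category, score in ranked if score > threshold]


doc = "The printer is out of toner and the printer paper tray is jammed."
result = classify(doc, CATEGORY_WEIGHTS, threshold=3.0)
print(result)  # [('printing', 6.0)]
```

In this example "printer" occurs twice and "toner" and "paper" once each, so the "printing" score is 2 × 2.0 + 1 × 1.5 + 1 × 0.5 = 6.0, which exceeds the assumed threshold of 3.0; "networking" scores 0 and is rejected.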

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001264928A AU2001264928A1 (en) 2000-05-25 2001-05-25 System and method for automatically classifying text

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20697500P 2000-05-25 2000-05-25
US60/206,975 2000-05-25

Publications (2)

Publication Number Publication Date
WO2001090921A2 WO2001090921A2 (en) 2001-11-29
WO2001090921A3 WO2001090921A3 (en) 2003-12-24

Family

ID=22768713

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/016872 WO2001090921A2 (en) 2000-05-25 2001-05-25 System and method for automatically classifying text

Country Status (3)

Country Link
US (2) US7028250B2 (en)
AU (1) AU2001264928A1 (en)
WO (1) WO2001090921A2 (en)

Families Citing this family (339)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1049030A1 (en) * 1999-04-28 2000-11-02 SER Systeme AG Produkte und Anwendungen der Datenverarbeitung Classification method and apparatus
US7966234B1 (en) 1999-05-17 2011-06-21 Jpmorgan Chase Bank, N.A. Structured finance performance analytics system
US7925610B2 (en) * 1999-09-22 2011-04-12 Google Inc. Determining a meaning of a knowledge item using document-based information
EP1128278B1 (en) * 2000-02-23 2003-09-17 SER Solutions, Inc Method and apparatus for processing electronic documents
US7249095B2 (en) 2000-06-07 2007-07-24 The Chase Manhattan Bank, N.A. System and method for executing deposit transactions over the internet
DE10036712A1 (en) * 2000-07-27 2002-02-28 Active Mining Ag Computer-assisted method for evaluating computer processes having characteristic features by weighting individual values using feature tree
US9177828B2 (en) 2011-02-10 2015-11-03 Micron Technology, Inc. External gettering method and device
EP1182577A1 (en) * 2000-08-18 2002-02-27 SER Systeme AG Produkte und Anwendungen der Datenverarbeitung Associative memory
US7392212B2 (en) * 2000-09-28 2008-06-24 Jpmorgan Chase Bank, N.A. User-interactive financial vehicle performance prediction, trading and training system and methods
US7313541B2 (en) * 2000-11-03 2007-12-25 Jpmorgan Chase Bank, N.A. System and method for estimating conduit liquidity requirements in asset backed commercial paper
US7225467B2 (en) * 2000-11-15 2007-05-29 Lockheed Martin Corporation Active intrusion resistant environment of layered object and compartment keys (airelock)
US7213265B2 (en) * 2000-11-15 2007-05-01 Lockheed Martin Corporation Real time active network compartmentalization
JP3842577B2 (en) * 2001-03-30 2006-11-08 株式会社東芝 Structured document search method, structured document search apparatus and program
US7596526B2 (en) * 2001-04-16 2009-09-29 Jpmorgan Chase Bank, N.A. System and method for managing a series of overnight financing trades
US7269546B2 (en) * 2001-05-09 2007-09-11 International Business Machines Corporation System and method of finding documents related to other documents and of finding related words in response to a query to refine a search
US20030130993A1 (en) * 2001-08-08 2003-07-10 Quiver, Inc. Document categorization engine
US7096179B2 (en) * 2001-08-15 2006-08-22 Siemens Corporate Research, Inc. Text-based automatic content classification and grouping
ATE537507T1 (en) 2001-08-27 2011-12-15 Bdgb Entpr Software Sarl METHOD FOR AUTOMATICALLY INDEXING DOCUMENTS
US7013261B2 (en) * 2001-10-16 2006-03-14 Xerox Corporation Method and system for accelerated morphological analysis
US20030115191A1 (en) * 2001-12-17 2003-06-19 Max Copperman Efficient and cost-effective content provider for customer relationship management (CRM) or other applications
AU2003201799A1 (en) * 2002-01-16 2003-07-30 Elucidon Ab Information data retrieval, where the data is organized in terms, documents and document corpora
US7133860B2 (en) * 2002-01-23 2006-11-07 Matsushita Electric Industrial Co., Ltd. Device and method for automatically classifying documents using vector analysis
US7188107B2 (en) * 2002-03-06 2007-03-06 Infoglide Software Corporation System and method for classification of documents
US7051009B2 (en) * 2002-03-29 2006-05-23 Hewlett-Packard Development Company, L.P. Automatic hierarchical classification of temporal ordered case log documents for detection of changes
US20030220917A1 (en) * 2002-04-03 2003-11-27 Max Copperman Contextual search
US20040205668A1 (en) * 2002-04-30 2004-10-14 Donald Eastlake Native markup language code size reduction
US7370033B1 (en) * 2002-05-17 2008-05-06 Oracle International Corporation Method for extracting association rules from transactions in a database
JP3744464B2 (en) * 2002-05-20 2006-02-08 ソニー株式会社 Signal recording / reproducing apparatus and method, signal reproducing apparatus and method, program, and recording medium
US7328146B1 (en) * 2002-05-31 2008-02-05 At&T Corp. Spoken language understanding that incorporates prior knowledge into boosting
US8224723B2 (en) 2002-05-31 2012-07-17 Jpmorgan Chase Bank, N.A. Account opening system, method and computer program product
US7818764B2 (en) * 2002-06-20 2010-10-19 At&T Intellectual Property I, L.P. System and method for monitoring blocked content
DE60335472D1 (en) * 2002-07-23 2011-02-03 Quigo Technologies Inc SYSTEM AND METHOD FOR AUTOMATED IMAGING OF KEYWORDS AND KEYPHRASES ON DOCUMENTS
US20040044961A1 (en) * 2002-08-28 2004-03-04 Leonid Pesenson Method and system for transformation of an extensible markup language document
US7249312B2 (en) * 2002-09-11 2007-07-24 Intelligent Results Attribute scoring for unstructured content
JP4233836B2 (en) * 2002-10-16 2009-03-04 インターナショナル・ビジネス・マシーンズ・コーポレーション Automatic document classification system, unnecessary word determination method, automatic document classification method, and program
US7080094B2 (en) * 2002-10-29 2006-07-18 Lockheed Martin Corporation Hardware accelerated validating parser
US20040083466A1 (en) * 2002-10-29 2004-04-29 Dapp Michael C. Hardware parser accelerator
US20070061884A1 (en) * 2002-10-29 2007-03-15 Dapp Michael C Intrusion detection accelerator
US20040093200A1 (en) * 2002-11-07 2004-05-13 Island Data Corporation Method of and system for recognizing concepts
TW200407736A (en) * 2002-11-08 2004-05-16 Hon Hai Prec Ind Co Ltd System and method for classifying patents and displaying patent classification
US20040186704A1 (en) * 2002-12-11 2004-09-23 Jiping Sun Fuzzy based natural speech concept system
US20050044033A1 (en) * 2003-01-10 2005-02-24 Gelson Andrew F. Like-kind exchange method
US20040148170A1 (en) * 2003-01-23 2004-07-29 Alejandro Acero Statistical classifiers for spoken language understanding and command/control scenarios
US8335683B2 (en) * 2003-01-23 2012-12-18 Microsoft Corporation System for using statistical classifiers for spoken language understanding
US20040172234A1 (en) * 2003-02-28 2004-09-02 Dapp Michael C. Hardware accelerator personality compiler
US6980949B2 (en) * 2003-03-14 2005-12-27 Sonum Technologies, Inc. Natural language processor
US7941009B2 (en) * 2003-04-08 2011-05-10 The Penn State Research Foundation Real-time computerized annotation of pictures
US20040243545A1 (en) * 2003-05-29 2004-12-02 Dictaphone Corporation Systems and methods utilizing natural language medical records
US7634435B2 (en) * 2003-05-13 2009-12-15 Jp Morgan Chase Bank Diversified fixed income product and method for creating and marketing same
US20040230898A1 (en) * 2003-05-13 2004-11-18 International Business Machines Corporation Identifying topics in structured documents for machine translation
JP4014160B2 (en) * 2003-05-30 2007-11-28 インターナショナル・ビジネス・マシーンズ・コーポレーション Information processing apparatus, program, and recording medium
US7770184B2 (en) * 2003-06-06 2010-08-03 Jp Morgan Chase Bank Integrated trading platform architecture
US20050015324A1 (en) * 2003-07-15 2005-01-20 Jacob Mathews Systems and methods for trading financial instruments across different types of trading platforms
US7383241B2 (en) * 2003-07-25 2008-06-03 Enkata Technologies, Inc. System and method for estimating performance of a classifier
US7970688B2 (en) * 2003-07-29 2011-06-28 Jp Morgan Chase Bank Method for pricing a trade
US20050060256A1 (en) * 2003-09-12 2005-03-17 Andrew Peterson Foreign exchange trading interface
US20050120300A1 (en) * 2003-09-25 2005-06-02 Dictaphone Corporation Method, system, and apparatus for assembly, transport and display of clinical data
US7346839B2 (en) * 2003-09-30 2008-03-18 Google Inc. Information retrieval based on historical data
US7593876B2 (en) * 2003-10-15 2009-09-22 Jp Morgan Chase Bank System and method for processing partially unstructured data
US8200477B2 (en) * 2003-10-22 2012-06-12 International Business Machines Corporation Method and system for extracting opinions from text documents
US7516492B1 (en) * 2003-10-28 2009-04-07 Rsa Security Inc. Inferring document and content sensitivity from public account accessibility
EP1684507A4 (en) * 2003-11-13 2008-11-26 Panasonic Corp Program recommendation device, program recommendation method of program recommendation device, and computer program
US7689536B1 (en) * 2003-12-18 2010-03-30 Google Inc. Methods and systems for detecting and extracting information
CA2498728A1 (en) * 2004-02-27 2005-08-27 Dictaphone Corporation A system and method for normalization of a string of words
EP1577791B1 (en) * 2004-03-16 2011-11-02 Microdasys Inc. XML content monitoring
US20050222937A1 (en) * 2004-03-31 2005-10-06 Coad Edward J Automated customer exchange
US8423447B2 (en) * 2004-03-31 2013-04-16 Jp Morgan Chase Bank System and method for allocating nominal and cash amounts to trades in a netted trade
US20050251478A1 (en) * 2004-05-04 2005-11-10 Aura Yanavi Investment and method for hedging operational risk associated with business events of another
JP4254623B2 (en) * 2004-06-09 2009-04-15 日本電気株式会社 Topic analysis method, apparatus thereof, and program
US20050283470A1 (en) * 2004-06-17 2005-12-22 Or Kuntzman Content categorization
US7860314B2 (en) * 2004-07-21 2010-12-28 Microsoft Corporation Adaptation of exponential models
US20060020448A1 (en) * 2004-07-21 2006-01-26 Microsoft Corporation Method and apparatus for capitalizing text using maximum entropy
US7693770B2 (en) 2004-08-06 2010-04-06 Jp Morgan Chase & Co. Method and system for creating and marketing employee stock option mirror image warrants
US7698339B2 (en) * 2004-08-13 2010-04-13 Microsoft Corporation Method and system for summarizing a document
US20060095900A1 (en) * 2004-08-26 2006-05-04 Calpont Corporation Semantic processor for a hardware database management system
US7756871B2 (en) * 2004-10-13 2010-07-13 Hewlett-Packard Development Company, L.P. Article extraction
US20060095473A1 (en) * 2004-10-23 2006-05-04 Data Management Associates, Inc. System and method of orchestrating electronic workflow automation processes
US7827180B2 (en) * 2004-11-05 2010-11-02 International Business Machines Corporation Methods and apparatus for assigning content identifiers to content portions
US20090132428A1 (en) * 2004-11-15 2009-05-21 Stephen Jeffrey Wolf Method for creating and marketing a modifiable debt product
US7853544B2 (en) * 2004-11-24 2010-12-14 Overtone, Inc. Systems and methods for automatically categorizing unstructured text
US7937263B2 (en) * 2004-12-01 2011-05-03 Dictaphone Corporation System and method for tokenization of text using classifier models
US7769579B2 (en) 2005-05-31 2010-08-03 Google Inc. Learning facts from semi-structured text
US8706475B2 (en) * 2005-01-10 2014-04-22 Xerox Corporation Method and apparatus for detecting a table of contents and reference determination
US20090164384A1 (en) * 2005-02-09 2009-06-25 Hellen Patrick J Investment structure and method for reducing risk associated with withdrawals from an investment
WO2006088914A1 (en) * 2005-02-14 2006-08-24 Inboxer, Inc. Statistical categorization of electronic messages based on an analysis of accompanying images
US7788087B2 (en) * 2005-03-01 2010-08-31 Microsoft Corporation System for processing sentiment-bearing text
US7788086B2 (en) * 2005-03-01 2010-08-31 Microsoft Corporation Method and apparatus for processing sentiment-bearing text
US8904463B2 (en) * 2005-03-09 2014-12-02 Vudu, Inc. Live video broadcasting on distributed networks
US20080022343A1 (en) 2006-07-24 2008-01-24 Vvond, Inc. Multiple audio streams
US7191215B2 (en) * 2005-03-09 2007-03-13 Marquee, Inc. Method and system for providing instantaneous media-on-demand services by transmitting contents in pieces from client machines
US7698451B2 (en) * 2005-03-09 2010-04-13 Vudu, Inc. Method and apparatus for instant playback of a movie title
US20090025046A1 (en) * 2005-03-09 2009-01-22 Wond, Llc Hybrid architecture for media services
US7937379B2 (en) * 2005-03-09 2011-05-03 Vudu, Inc. Fragmentation of a file for instant access
US9176955B2 (en) * 2005-03-09 2015-11-03 Vvond, Inc. Method and apparatus for sharing media files among network nodes
US20060206531A1 (en) * 2005-03-10 2006-09-14 Kabushiki Kaisha Toshiba Document managing apparatus
JP4260128B2 (en) * 2005-03-17 2009-04-30 富士通株式会社 Business skill estimation program
US8688569B1 (en) 2005-03-23 2014-04-01 Jpmorgan Chase Bank, N.A. System and method for post closing and custody services
US8468445B2 (en) * 2005-03-30 2013-06-18 The Trustees Of Columbia University In The City Of New York Systems and methods for content extraction
US8682913B1 (en) 2005-03-31 2014-03-25 Google Inc. Corroborating facts extracted from multiple sources
US7587387B2 (en) 2005-03-31 2009-09-08 Google Inc. User interface for facts query engine with snippets from information sources that include query terms and answer terms
US9208229B2 (en) * 2005-03-31 2015-12-08 Google Inc. Anchor text summarization for corroboration
US8996470B1 (en) 2005-05-31 2015-03-31 Google Inc. System for ensuring the internal consistency of a fact repository
US20090187512A1 (en) * 2005-05-31 2009-07-23 Jp Morgan Chase Bank Asset-backed investment instrument and related methods
US7822682B2 (en) 2005-06-08 2010-10-26 Jpmorgan Chase Bank, N.A. System and method for enhancing supply chain transactions
US8099511B1 (en) 2005-06-11 2012-01-17 Vudu, Inc. Instantaneous media-on-demand
US8572018B2 (en) * 2005-06-20 2013-10-29 New York University Method, system and software arrangement for reconstructing formal descriptive models of processes from functional/modal data using suitable ontology
US20110035306A1 (en) * 2005-06-20 2011-02-10 Jpmorgan Chase Bank, N.A. System and method for buying and selling securities
US8019758B2 (en) * 2005-06-21 2011-09-13 Microsoft Corporation Generation of a blended classification model
US8543906B2 (en) * 2005-06-29 2013-09-24 Xerox Corporation Probabilistic learning method for XML annotation of documents
US20070016863A1 (en) * 2005-07-08 2007-01-18 Yan Qu Method and apparatus for extracting and structuring domain terms
CA2928051C (en) * 2005-07-15 2018-07-24 Indxit Systems, Inc. Systems and methods for data indexing and processing
US8433558B2 (en) 2005-07-25 2013-04-30 At&T Intellectual Property Ii, L.P. Methods and systems for natural language understanding using human knowledge and collected data
US7567928B1 (en) 2005-09-12 2009-07-28 Jpmorgan Chase Bank, N.A. Total fair value swap
US20070067155A1 (en) * 2005-09-20 2007-03-22 Sonum Technologies, Inc. Surface structure generation
US9177050B2 (en) 2005-10-04 2015-11-03 Thomson Reuters Global Resources Systems, methods, and interfaces for extending legal search results
US7451155B2 (en) * 2005-10-05 2008-11-11 At&T Intellectual Property I, L.P. Statistical methods and apparatus for records management
US7818238B1 (en) 2005-10-11 2010-10-19 Jpmorgan Chase Bank, N.A. Upside forward with early funding provision
GB0521552D0 (en) * 2005-10-22 2005-11-30 Ibm Method and system for constructing a classifier
US7917519B2 (en) * 2005-10-26 2011-03-29 Sizatola, Llc Categorized document bases
US20070106644A1 (en) * 2005-11-08 2007-05-10 International Business Machines Corporation Methods and apparatus for extracting and correlating text information derived from comment and product databases for use in identifying product improvements based on comment and product database commonalities
WO2007059287A1 (en) 2005-11-16 2007-05-24 Evri Inc. Extending keyword searching to syntactically and semantically annotated data
US20070124316A1 (en) * 2005-11-29 2007-05-31 Chan John Y M Attribute selection for collaborative groupware documents using a multi-dimensional matrix
US7873584B2 (en) * 2005-12-22 2011-01-18 Oren Asher Method and system for classifying users of a computer network
US8260785B2 (en) 2006-02-17 2012-09-04 Google Inc. Automatic object reference identification and linking in a browseable fact repository
US8280794B1 (en) 2006-02-03 2012-10-02 Jpmorgan Chase Bank, National Association Price earnings derivative financial product
US8379841B2 (en) 2006-03-23 2013-02-19 Exegy Incorporated Method and system for high throughput blockwise independent encryption/decryption
US8364467B1 (en) 2006-03-31 2013-01-29 Google Inc. Content-based classification
US7620578B1 (en) 2006-05-01 2009-11-17 Jpmorgan Chase Bank, N.A. Volatility derivative financial product
US7647268B1 (en) 2006-05-04 2010-01-12 Jpmorgan Chase Bank, N.A. System and method for implementing a recurrent bidding process
US20070294223A1 (en) * 2006-06-16 2007-12-20 Technion Research And Development Foundation Ltd. Text Categorization Using External Knowledge
US8108204B2 (en) * 2006-06-16 2012-01-31 Evgeniy Gabrilovich Text categorization using external knowledge
US7664740B2 (en) * 2006-06-26 2010-02-16 Microsoft Corporation Automatically displaying keywords and other supplemental information
US7873641B2 (en) * 2006-07-14 2011-01-18 Bea Systems, Inc. Using tags in an enterprise search system
US20080016052A1 (en) * 2006-07-14 2008-01-17 Bea Systems, Inc. Using Connections Between Users and Documents to Rank Documents in an Enterprise Search System
US20080016061A1 (en) * 2006-07-14 2008-01-17 Bea Systems, Inc. Using a Core Data Structure to Calculate Document Ranks
US7730062B2 (en) * 2006-08-01 2010-06-01 Topix Llc Cap-sensitive text search for documents
US9811868B1 (en) 2006-08-29 2017-11-07 Jpmorgan Chase Bank, N.A. Systems and methods for integrating a deal process
US8340957B2 (en) * 2006-08-31 2012-12-25 Waggener Edstrom Worldwide, Inc. Media content assessment and control systems
US8296812B1 (en) 2006-09-01 2012-10-23 Vudu, Inc. Streaming video using erasure encoding
WO2008029150A1 (en) * 2006-09-07 2008-03-13 Xploite Plc Categorisation of data using a model
WO2008029154A1 (en) * 2006-09-07 2008-03-13 Xploite Plc Processing a database
US8996993B2 (en) * 2006-09-15 2015-03-31 Battelle Memorial Institute Text analysis devices, articles of manufacture, and text analysis methods
US8452767B2 (en) * 2006-09-15 2013-05-28 Battelle Memorial Institute Text analysis devices, articles of manufacture, and text analysis methods
US7917492B2 (en) * 2007-09-21 2011-03-29 Limelight Networks, Inc. Method and subsystem for information acquisition and aggregation to facilitate ontology and language-model generation within a content-search-service system
US9015172B2 (en) 2006-09-22 2015-04-21 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search service system
US8396878B2 (en) 2006-09-22 2013-03-12 Limelight Networks, Inc. Methods and systems for generating automated tags for video files
US8966389B2 (en) * 2006-09-22 2015-02-24 Limelight Networks, Inc. Visual interface for identifying positions of interest within a sequentially ordered information encoding
US8204891B2 (en) * 2007-09-21 2012-06-19 Limelight Networks, Inc. Method and subsystem for searching media content within a content-search-service system
WO2008043082A2 (en) 2006-10-05 2008-04-10 Splunk Inc. Time series search engine
US9495358B2 (en) * 2006-10-10 2016-11-15 Abbyy Infopoisk Llc Cross-language text clustering
US9588958B2 (en) * 2006-10-10 2017-03-07 Abbyy Infopoisk Llc Cross-language text classification
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US8122026B1 (en) 2006-10-20 2012-02-21 Google Inc. Finding and disambiguating references to entities on web pages
US20080103849A1 (en) * 2006-10-31 2008-05-01 Forman George H Calculating an aggregate of attribute values associated with plural cases
US7827096B1 (en) 2006-11-03 2010-11-02 Jp Morgan Chase Bank, N.A. Special maturity ASR recalculated timing
US7660793B2 (en) 2006-11-13 2010-02-09 Exegy Incorporated Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors
US8326819B2 (en) * 2006-11-13 2012-12-04 Exegy Incorporated Method and system for high performance data metatagging and data indexing using coprocessors
US8645397B1 (en) * 2006-11-30 2014-02-04 At&T Intellectual Property Ii, L.P. Method and apparatus for propagating updates in databases
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8108413B2 (en) 2007-02-15 2012-01-31 International Business Machines Corporation Method and apparatus for automatically discovering features in free form heterogeneous data
US8996587B2 (en) 2007-02-15 2015-03-31 International Business Machines Corporation Method and apparatus for automatically structuring free form hetergeneous data
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
US8954469B2 (en) 2007-03-14 2015-02-10 Vcvciii Llc Query templates and labeled search tip system, methods, and techniques
US8347202B1 (en) 2007-03-14 2013-01-01 Google Inc. Determining geographic locations for place names in a fact repository
US8180713B1 (en) 2007-04-13 2012-05-15 Standard & Poor's Financial Services Llc System and method for searching and identifying potential financial risks disclosed within a document
US8122032B2 (en) 2007-07-20 2012-02-21 Google Inc. Identifying and linking similar passages in a digital text corpus
US9323827B2 (en) * 2007-07-20 2016-04-26 Google Inc. Identifying key terms related to similar passages
US9396254B1 (en) * 2007-07-20 2016-07-19 Hewlett-Packard Development Company, L.P. Generation of representative document components
US7970766B1 (en) 2007-07-23 2011-06-28 Google Inc. Entity type assignment
EP2019361A1 (en) * 2007-07-26 2009-01-28 Siemens Aktiengesellschaft A method and apparatus for extraction of textual content from hypertext web documents
EP2186250B1 (en) 2007-08-31 2019-03-27 IP Reservoir, LLC Method and apparatus for hardware-accelerated encryption/decryption
US8560950B2 (en) * 2007-09-04 2013-10-15 Apple Inc. Advanced playlist creation
US8290272B2 (en) * 2007-09-14 2012-10-16 Abbyy Software Ltd. Creating a document template for capturing data from a document image and capturing data from a document image
US9081852B2 (en) * 2007-10-05 2015-07-14 Fujitsu Limited Recommending terms to specify ontology space
US8280892B2 (en) 2007-10-05 2012-10-02 Fujitsu Limited Selecting tags for a document by analyzing paragraphs of the document
US8594996B2 (en) 2007-10-17 2013-11-26 Evri Inc. NLP-based entity recognition and disambiguation
US8700604B2 (en) 2007-10-17 2014-04-15 Evri, Inc. NLP-based content recommender
US7937389B2 (en) * 2007-11-01 2011-05-03 Ut-Battelle, Llc Dynamic reduction of dimensions of a document vector in a document search and retrieval system
US20090116756A1 (en) * 2007-11-06 2009-05-07 Copanion, Inc. Systems and methods for training a document classification system using documents from a plurality of users
US8812435B1 (en) 2007-11-16 2014-08-19 Google Inc. Learning objects and facts from documents
US8392816B2 (en) * 2007-12-03 2013-03-05 Microsoft Corporation Page classifier engine
US8250469B2 (en) * 2007-12-03 2012-08-21 Microsoft Corporation Document layout extraction
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
JP5290591B2 (en) * 2008-02-12 2013-09-18 キヤノン株式会社 Document management apparatus, method, program, and document management system
US9082080B2 (en) * 2008-03-05 2015-07-14 Kofax, Inc. Systems and methods for organizing data sets
US8046361B2 (en) * 2008-04-18 2011-10-25 Yahoo! Inc. System and method for classifying tags of content using a hyperlinked corpus of classified web pages
JP4875024B2 (en) * 2008-05-09 2012-02-15 株式会社東芝 Image information transmission device
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US8972410B2 (en) * 2008-07-30 2015-03-03 Hewlett-Packard Development Company, L.P. Identifying related objects in a computer database
US8547589B2 (en) 2008-09-08 2013-10-01 Abbyy Software Ltd. Data capture from multi-page documents
US9390321B2 (en) 2008-09-08 2016-07-12 Abbyy Development Llc Flexible structure descriptions for multi-page documents
US8386489B2 (en) 2008-11-07 2013-02-26 Raytheon Company Applying formal concept analysis to validate expanded concept types
US8463808B2 (en) 2008-11-07 2013-06-11 Raytheon Company Expanding concept types in conceptual graphs
US20100131569A1 (en) * 2008-11-21 2010-05-27 Robert Marc Jamison Method & apparatus for identifying a secondary concept in a collection of documents
US20100153092A1 (en) * 2008-12-15 2010-06-17 Raytheon Company Expanding Base Attributes for Terms
US8577924B2 (en) * 2008-12-15 2013-11-05 Raytheon Company Determining base attributes for terms
US9087293B2 (en) 2008-12-23 2015-07-21 Raytheon Company Categorizing concept types of a conceptual graph
US8301619B2 (en) * 2009-02-18 2012-10-30 Avaya Inc. System and method for generating queries
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US20120185501A1 (en) * 2011-01-18 2012-07-19 Ilya Geller Systems and methods for searching data
US8516013B2 (en) 2009-03-03 2013-08-20 Ilya Geller Systems and methods for subtext searching data using synonym-enriched predicative phrases and substituted pronouns
US8447789B2 (en) * 2009-09-15 2013-05-21 Ilya Geller Systems and methods for creating structured data
US8145636B1 (en) * 2009-03-13 2012-03-27 Google Inc. Classifying text into hierarchical categories
US20100268528A1 (en) * 2009-04-16 2010-10-21 Olga Raskina Method & Apparatus for Identifying Contract Characteristics
US20100274750A1 (en) * 2009-04-22 2010-10-28 Microsoft Corporation Data Classification Pipeline Including Automatic Classification Rules
US8122043B2 (en) * 2009-06-30 2012-02-21 Ebsco Industries, Inc System and method for using an exemplar document to retrieve relevant documents from an inverted index of a large corpus
US8352386B2 (en) * 2009-07-02 2013-01-08 International Business Machines Corporation Identifying training documents for a content classifier
US8423554B2 (en) * 2009-07-07 2013-04-16 Sosvia, Inc. Content category scoring for nodes in a linked database
US8959079B2 (en) * 2009-09-29 2015-02-17 International Business Machines Corporation Method and system for providing relationships in search results
US9152883B2 (en) * 2009-11-02 2015-10-06 Harry Urbschat System and method for increasing the accuracy of optical character recognition (OCR)
US9158833B2 (en) * 2009-11-02 2015-10-13 Harry Urbschat System and method for obtaining document information
US9213756B2 (en) * 2009-11-02 2015-12-15 Harry Urbschat System and method of using dynamic variance networks
US8321357B2 (en) * 2009-09-30 2012-11-27 Lapir Gennady Method and system for extraction
US8972436B2 (en) * 2009-10-28 2015-03-03 Yahoo! Inc. Translation model and method for matching reviews to objects
US8954893B2 (en) * 2009-11-06 2015-02-10 Hewlett-Packard Development Company, L.P. Visually representing a hierarchy of category nodes
US8356045B2 (en) * 2009-12-09 2013-01-15 International Business Machines Corporation Method to identify common structures in formatted text documents
KR20110071635A (en) * 2009-12-21 2011-06-29 한국전자통신연구원 System and method for keyword extraction based on rss
US8868402B2 (en) 2009-12-30 2014-10-21 Google Inc. Construction of text classifiers
US8738514B2 (en) 2010-02-18 2014-05-27 Jpmorgan Chase Bank, N.A. System and method for providing borrow coverage services to short sell securities
US20110208670A1 (en) * 2010-02-19 2011-08-25 Jpmorgan Chase Bank, N.A. Execution Optimizer
US8352354B2 (en) * 2010-02-23 2013-01-08 Jpmorgan Chase Bank, N.A. System and method for optimizing order execution
US9710556B2 (en) 2010-03-01 2017-07-18 Vcvc Iii Llc Content recommendation based on collections of entities
EP2369505A1 (en) * 2010-03-26 2011-09-28 British Telecommunications public limited company Text classifier system
US8572013B1 (en) * 2010-03-30 2013-10-29 Amazon Technologies, Inc. Classification of items with manual confirmation
US8645125B2 (en) 2010-03-30 2014-02-04 Evri, Inc. NLP-based systems and methods for providing quotations
US8140567B2 (en) 2010-04-13 2012-03-20 Microsoft Corporation Measuring entity extraction complexity
TWI396983B (en) * 2010-04-14 2013-05-21 Inst Information Industry Named entity marking apparatus, named entity marking method, and computer program product thereof
US8195458B2 (en) * 2010-08-17 2012-06-05 Xerox Corporation Open class noun classification
US9405848B2 (en) 2010-09-15 2016-08-02 Vcvc Iii Llc Recommending mobile device activities
US8725739B2 (en) 2010-11-01 2014-05-13 Evri, Inc. Category-based content recommendation
US10387564B2 (en) * 2010-11-12 2019-08-20 International Business Machines Corporation Automatically assessing document quality for domain-specific documentation
US9342590B2 (en) * 2010-12-23 2016-05-17 Microsoft Technology Licensing, Llc Keywords extraction and enrichment via categorization systems
US9542479B2 (en) * 2011-02-15 2017-01-10 Telenav, Inc. Navigation system with rule based point of interest classification mechanism and method of operation thereof
EP2506157A1 (en) * 2011-03-30 2012-10-03 British Telecommunications Public Limited Company Textual analysis system
US9116995B2 (en) * 2011-03-30 2015-08-25 Vcvc Iii Llc Cluster-based identification of news stories
US8606575B1 (en) * 2011-09-06 2013-12-10 West Corporation Method and apparatus of providing semi-automated classifier adaptation for natural language processing
US9507801B2 (en) * 2011-10-04 2016-11-29 Google Inc. Enforcing category diversity
CN103136247B (en) * 2011-11-29 2015-12-02 阿里巴巴集团控股有限公司 Attribute data interval division method and device
US8751424B1 (en) * 2011-12-15 2014-06-10 The Boeing Company Secure information classification
US8874615B2 (en) * 2012-01-13 2014-10-28 Quova, Inc. Method and apparatus for implementing a learning model for facilitating answering a query on a database
US9256862B2 (en) * 2012-02-10 2016-02-09 International Business Machines Corporation Multi-tiered approach to E-mail prioritization
US9152953B2 (en) * 2012-02-10 2015-10-06 International Business Machines Corporation Multi-tiered approach to E-mail prioritization
US8971630B2 (en) 2012-04-27 2015-03-03 Abbyy Development Llc Fast CJK character recognition
US8989485B2 (en) 2012-04-27 2015-03-24 Abbyy Development Llc Detecting a junction in a text line of CJK characters
US20130304739A1 (en) * 2012-05-10 2013-11-14 Samsung Electronics Co., Ltd. Computing system with domain independence orientation mechanism and method of operation thereof
US9519685B1 (en) * 2012-08-30 2016-12-13 deviantArt, Inc. Tag selection, clustering, and recommendation for content hosting services
US20150170160A1 (en) * 2012-10-23 2015-06-18 Google Inc. Business category classification
US9348899B2 (en) 2012-10-31 2016-05-24 Open Text Corporation Auto-classification system and method with dynamic user feedback
US10366360B2 (en) 2012-11-16 2019-07-30 SPF, Inc. System and method for identifying potential future interaction risks between a client and a provider
US20140143010A1 (en) * 2012-11-16 2014-05-22 SPF, Inc. System and Method for Assessing Interaction Risks Potentially Associated with Transactions Between a Client and a Provider
US9165258B2 (en) 2012-12-10 2015-10-20 Hewlett-Packard Development Company, L.P. Generating training documents
US9047368B1 (en) * 2013-02-19 2015-06-02 Symantec Corporation Self-organizing user-centric document vault
US9535899B2 (en) 2013-02-20 2017-01-03 International Business Machines Corporation Automatic semantic rating and abstraction of literature
CN105264518B (en) * 2013-02-28 2017-12-01 株式会社东芝 Data processing equipment and story model building method
US9355091B2 (en) * 2013-03-13 2016-05-31 Crimson Hexagon, Inc. Systems and methods for language classification
US11928606B2 (en) 2013-03-15 2024-03-12 TSG Technologies, LLC Systems and methods for classifying electronic documents
US9201864B2 (en) * 2013-03-15 2015-12-01 Luminoso Technologies, Inc. Method and system for converting document sets to term-association vector spaces on demand
US9298814B2 (en) 2013-03-15 2016-03-29 Maritz Holdings Inc. Systems and methods for classifying electronic documents
US10614132B2 (en) 2013-04-30 2020-04-07 Splunk Inc. GUI-triggered processing of performance data and log data from an information technology environment
US10019496B2 (en) 2013-04-30 2018-07-10 Splunk Inc. Processing of performance data and log data from an information technology environment by using diverse data stores
US10346357B2 (en) 2013-04-30 2019-07-09 Splunk Inc. Processing of performance data and structure data from an information technology environment
US10318541B2 (en) 2013-04-30 2019-06-11 Splunk Inc. Correlating log data with performance measurements having a specified relationship to a threshold value
US10225136B2 (en) 2013-04-30 2019-03-05 Splunk Inc. Processing of log data and performance data obtained via an application programming interface (API)
US10353957B2 (en) 2013-04-30 2019-07-16 Splunk Inc. Processing of performance data and raw log data from an information technology environment
US10997191B2 (en) 2013-04-30 2021-05-04 Splunk Inc. Query-triggered processing of performance data and log data from an information technology environment
US9262510B2 (en) 2013-05-10 2016-02-16 International Business Machines Corporation Document tagging and retrieval using per-subject dictionaries including subject-determining-power scores for entries
US10331976B2 (en) * 2013-06-21 2019-06-25 Xerox Corporation Label-embedding view of attribute-based recognition
US9348815B1 (en) 2013-06-28 2016-05-24 Digital Reasoning Systems, Inc. Systems and methods for construction, maintenance, and improvement of knowledge representations
US20160210426A1 (en) * 2013-08-30 2016-07-21 3M Innovative Properties Company Method of classifying medical documents
US9424345B1 (en) * 2013-09-25 2016-08-23 Google Inc. Contextual content distribution
US9251136B2 (en) 2013-10-16 2016-02-02 International Business Machines Corporation Document tagging and retrieval using entity specifiers
US9235638B2 (en) 2013-11-12 2016-01-12 International Business Machines Corporation Document retrieval using internal dictionary-hierarchies to adjust per-subject match results
US9552344B2 (en) 2013-12-03 2017-01-24 International Business Machines Corporation Producing visualizations of elements in works of literature
US9298802B2 (en) 2013-12-03 2016-03-29 International Business Machines Corporation Recommendation engine using inferred deep similarities for works of literature
US10073835B2 (en) * 2013-12-03 2018-09-11 International Business Machines Corporation Detecting literary elements in literature and their importance through semantic analysis and literary correlation
US10013655B1 (en) * 2014-03-11 2018-07-03 Applied Underwriters, Inc. Artificial intelligence expert system for anomaly detection
US11803561B1 (en) * 2014-03-31 2023-10-31 Amazon Technologies, Inc. Approximation query
US9311301B1 (en) * 2014-06-27 2016-04-12 Digital Reasoning Systems, Inc. Systems and methods for large scale global entity resolution
US10503761B2 (en) * 2014-07-14 2019-12-10 International Business Machines Corporation System for searching, recommending, and exploring documents through conceptual associations
US10437869B2 (en) 2014-07-14 2019-10-08 International Business Machines Corporation Automatic new concept definition
US9710570B2 (en) * 2014-07-14 2017-07-18 International Business Machines Corporation Computing the relevance of a document to concepts not specified in the document
US9703858B2 (en) * 2014-07-14 2017-07-11 International Business Machines Corporation Inverted table for storing and querying conceptual indices
US10162882B2 (en) 2014-07-14 2018-12-25 International Business Machines Corporation Automatically linking text to concepts in a knowledge base
US10417301B2 (en) 2014-09-10 2019-09-17 Adobe Inc. Analytics based on scalable hierarchical categorization of web content
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
EP3195145A4 (en) 2014-09-16 2018-01-24 VoiceBox Technologies Corporation Voice commerce
CN104268268B (en) * 2014-10-13 2018-05-22 宁波公众信息产业有限公司 A kind of webpage information correlating method and system
CN107003999B (en) 2014-10-15 2020-08-21 声钰科技 System and method for subsequent response to a user's prior natural language input
US11100557B2 (en) 2014-11-04 2021-08-24 International Business Machines Corporation Travel itinerary recommendation engine using inferred interests and sentiments
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US10431214B2 (en) * 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US20160283564A1 (en) * 2015-03-26 2016-09-29 Dejavuto Corp. Predictive visual search enginge
GB2540534A (en) * 2015-06-15 2017-01-25 Erevalue Ltd A method and system for processing data using an augmented natural language processing engine
US11803918B2 (en) 2015-07-07 2023-10-31 Oracle International Corporation System and method for identifying experts on arbitrary topics in an enterprise social network
US11164223B2 (en) 2015-09-04 2021-11-02 Walmart Apollo, Llc System and method for annotating reviews
US10140646B2 (en) * 2015-09-04 2018-11-27 Walmart Apollo, Llc System and method for analyzing features in product reviews and displaying the results
US10643291B2 (en) 2015-09-28 2020-05-05 Smartvid.io, Inc. Media management system
EP3369002A4 (en) * 2015-10-26 2019-06-12 24/7 Customer, Inc. Method and apparatus for facilitating customer intent prediction
US10311087B1 (en) * 2016-03-17 2019-06-04 Veritas Technologies Llc Systems and methods for determining topics of data artifacts
CN107292186B (en) * 2016-03-31 2021-01-12 阿里巴巴集团控股有限公司 Model training method and device based on random forest
US10817519B2 (en) * 2016-06-06 2020-10-27 Baidu Usa Llc Automatic conversion stage discovery
US11238115B1 (en) 2016-07-11 2022-02-01 Wells Fargo Bank, N.A. Semantic and context search using knowledge graphs
JP6235082B1 (en) * 2016-07-13 2017-11-22 ヤフー株式会社 Data classification apparatus, data classification method, and program
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
US10430450B2 (en) * 2016-08-22 2019-10-01 International Business Machines Corporation Creation of a summary for a plurality of texts
US11681942B2 (en) 2016-10-27 2023-06-20 Dropbox, Inc. Providing intelligent file name suggestions
US9852377B1 (en) * 2016-11-10 2017-12-26 Dropbox, Inc. Providing intelligent storage location suggestions
WO2018098009A1 (en) * 2016-11-22 2018-05-31 President And Fellows Of Harvard College Improved automated nonparametric content analysis for information management and retrieval
US11238084B1 (en) 2016-12-30 2022-02-01 Wells Fargo Bank, N.A. Semantic translation of data sets
CN106909537B (en) * 2017-02-07 2020-04-07 中山大学 One-word polysemous analysis method based on topic model and vector space
US11275794B1 (en) * 2017-02-14 2022-03-15 Casepoint LLC CaseAssist story designer
US11158012B1 (en) 2017-02-14 2021-10-26 Casepoint LLC Customizing a data discovery user interface based on artificial intelligence
US10740557B1 (en) 2017-02-14 2020-08-11 Casepoint LLC Technology platform for data discovery
JP6930180B2 (en) * 2017-03-30 2021-09-01 富士通株式会社 Learning equipment, learning methods and learning programs
US10528329B1 (en) 2017-04-27 2020-01-07 Intuit Inc. Methods, systems, and computer program product for automatic generation of software application code
US10467122B1 (en) 2017-04-27 2019-11-05 Intuit Inc. Methods, systems, and computer program product for capturing and classification of real-time data and performing post-classification tasks
US10705796B1 (en) * 2017-04-27 2020-07-07 Intuit Inc. Methods, systems, and computer program product for implementing real-time or near real-time classification of digital data
US10467261B1 (en) 2017-04-27 2019-11-05 Intuit Inc. Methods, systems, and computer program product for implementing real-time classification and recommendations
US10339423B1 (en) * 2017-06-13 2019-07-02 Symantec Corporation Systems and methods for generating training documents used by classification algorithms
CN107436922B (en) * 2017-07-05 2021-06-08 北京百度网讯科技有限公司 Text label generation method and device
US10896385B2 (en) 2017-07-27 2021-01-19 Logmein, Inc. Real time learning of text classification models for fast and efficient labeling of training data and customization
CN107506434A (en) * 2017-08-23 2017-12-22 北京百度网讯科技有限公司 Method and apparatus based on artificial intelligence classification phonetic entry text
US10698936B2 (en) * 2017-12-19 2020-06-30 Hireteammate, Inc. Generating and using multiple representations of data objects in computing systems and environments
US10789460B1 (en) * 2018-01-24 2020-09-29 The Boston Consulting Group, Inc. Methods and systems for screening documents
CN110390094B (en) * 2018-04-20 2023-05-23 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for classifying documents
US10788952B2 (en) * 2018-05-29 2020-09-29 The Boeing Company System and method for obtaining resource materials based on attribute association
RU2712101C2 (en) * 2018-06-27 2020-01-24 Общество с ограниченной ответственностью "Аби Продакшн" Prediction of probability of occurrence of line using sequence of vectors
US11295083B1 (en) * 2018-09-26 2022-04-05 Amazon Technologies, Inc. Neural models for named-entity recognition
US11138425B2 (en) * 2018-09-26 2021-10-05 Leverton Holding Llc Named entity recognition with convolutional networks
US11520835B2 (en) * 2018-09-28 2022-12-06 Rakuten Group, Inc. Learning system, learning method, and program
US11874882B2 (en) * 2019-07-02 2024-01-16 Microsoft Technology Licensing, Llc Extracting key phrase candidates from documents and producing topical authority ranking
US11250214B2 (en) 2019-07-02 2022-02-15 Microsoft Technology Licensing, Llc Keyphrase extraction beyond language modeling
US20210027896A1 (en) * 2019-07-24 2021-01-28 Janssen Pharmaceuticals, Inc. Learning platform for patient journey mapping
CN111539438B (en) * 2020-04-28 2024-01-12 北京百度网讯科技有限公司 Text content identification method and device and electronic equipment
CN112016097B (en) * 2020-08-28 2024-02-27 深圳泓越信息科技有限公司 Method for predicting network security vulnerability time to be utilized
US11321527B1 (en) 2021-01-21 2022-05-03 International Business Machines Corporation Effective classification of data based on curated features
WO2022177806A1 (en) * 2021-02-19 2022-08-25 President And Fellows Of Harvard College Hardware-accelerated topic modeling
US20230297596A1 (en) * 2022-03-15 2023-09-21 International Business Machines Corporation Mutual Exclusion Data Class Analysis in Data Governance
US11694030B1 (en) * 2022-04-06 2023-07-04 Fulcrum Management Solutions Ltd. System and method for automatic theming of a plurality of thought objects

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5371807A (en) * 1992-03-20 1994-12-06 Digital Equipment Corporation Method and apparatus for text classification
WO1999067728A1 (en) * 1998-06-23 1999-12-29 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier

Family Cites Families (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4817623A (en) * 1983-10-14 1989-04-04 Somanetics Corporation Method and apparatus for interpreting optical response data
US5553226A (en) 1985-03-27 1996-09-03 Hitachi, Ltd. System for displaying concept networks
US5404506A (en) 1985-03-27 1995-04-04 Hitachi, Ltd. Knowledge based information retrieval system
US4918621A (en) 1986-08-13 1990-04-17 Intellicorp, Inc. Method for representing a directed acyclic graph of worlds using an assumption-based truth maintenance system
US4833610A (en) * 1986-12-16 1989-05-23 International Business Machines Corporation Morphological/phonetic method for ranking word similarities
US4931951A (en) * 1987-05-08 1990-06-05 Mitsubishi Denki Kabushiki Kaisha Method for generating rules for an expert system for use in controlling a plant
US5257394A (en) * 1988-10-18 1993-10-26 Japan Atomic Energy Research Institute Logical expression processing pipeline using pushdown stacks for a vector computer
US5278911A (en) * 1989-05-18 1994-01-11 Smiths Industries Public Limited Company Speech recognition using a neural net
US5073867A (en) * 1989-06-12 1991-12-17 Westinghouse Electric Corp. Digital neural network processing elements
US5309359A (en) 1990-08-16 1994-05-03 Boris Katz Method and apparatus for generating and utlizing annotations to facilitate computer text retrieval
US5404295A (en) 1990-08-16 1995-04-04 Katz; Boris Method and apparatus for utilizing annotations to facilitate computer retrieval of database material
US5621857A (en) * 1991-12-20 1997-04-15 Oregon Graduate Institute Of Science And Technology Method and system for identifying and recognizing speech
JPH06176081A (en) 1992-12-02 1994-06-24 Hitachi Ltd Hierarchical structure browsing method and device
US5787417A (en) 1993-01-28 1998-07-28 Microsoft Corporation Method and system for selection of hierarchically related information using a content-variable list
US5594837A (en) 1993-01-29 1997-01-14 Noyes; Dallas B. Method for representation of knowledge in a computer as a network database system
JP3170095B2 (en) 1993-04-14 2001-05-28 富士通株式会社 Information retrieval system
US5325533A (en) * 1993-06-28 1994-06-28 Taligent, Inc. Engineering system for modeling computer programs
JP3053153B2 (en) 1993-09-20 2000-06-19 株式会社日立製作所 How to start application of document management system
JPH07160658A (en) * 1993-12-07 1995-06-23 Togami Electric Mfg Co Ltd Method for classifying data
US5655116A (en) 1994-02-28 1997-08-05 Lucent Technologies Inc. Apparatus and methods for retrieving information
US5600831A (en) 1994-02-28 1997-02-04 Lucent Technologies Inc. Apparatus and methods for retrieving information by modifying query plan based on description of information sources
US5671333A (en) 1994-04-07 1997-09-23 Lucent Technologies Inc. Training apparatus and method
US5625748A (en) 1994-04-18 1997-04-29 Bbn Corporation Topic discriminator using posterior probability or confidence scores
US5630125A (en) 1994-05-23 1997-05-13 Zellweger; Paul Method and apparatus for information management using an open hierarchical data structure
US5659725A (en) 1994-06-06 1997-08-19 Lucent Technologies Inc. Query optimization by predicate move-around
US5758322A (en) * 1994-12-09 1998-05-26 International Voice Register, Inc. Method and apparatus for conducting point-of-sale transactions using voice recognition
US5794050A (en) 1995-01-04 1998-08-11 Intelligent Text Processing, Inc. Natural language understanding system
US5625767A (en) 1995-03-13 1997-04-29 Bartell; Brian Method and system for two-dimensional visualization of an information taxonomy and of text documents based on topical content of the documents
US6151592A (en) * 1995-06-07 2000-11-21 Seiko Epson Corporation Recognition apparatus using neural network, and learning method therefor
US5724571A (en) 1995-07-07 1998-03-03 Sun Microsystems, Inc. Method and apparatus for generating query responses in a computer-based document retrieval system
US5761385A (en) * 1995-09-05 1998-06-02 Loral Defense Systems Product and method for extracting image data
US5809499A (en) 1995-10-20 1998-09-15 Pattern Discovery Software Systems, Ltd. Computational method for discovering patterns in data sets
US5845270A (en) 1996-01-02 1998-12-01 Datafusion, Inc. Multidimensional input-output modeling for organizing information
US5924108A (en) * 1996-03-29 1999-07-13 Microsoft Corporation Document summarizer for word processors
US6314420B1 (en) 1996-04-04 2001-11-06 Lycos, Inc. Collaborative/adaptive search engine
EP0976062A1 (en) 1996-04-10 2000-02-02 AT&T Corp. Method of organizing information retrieved from the internet using knowledge based representation
US5835918A (en) * 1996-07-01 1998-11-10 Sun Microsystems, Inc. Method-management system and process based on a single master message file
US5819258A (en) 1997-03-07 1998-10-06 Digital Equipment Corporation Method and apparatus for automatically generating hierarchical categories from large document collections
US5878423A (en) 1997-04-21 1999-03-02 Bellsouth Corporation Dynamically processing an index to create an ordered set of questions
US6233575B1 (en) 1997-06-24 2001-05-15 International Business Machines Corporation Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values
US6256627B1 (en) 1997-07-08 2001-07-03 At&T Corp. System and method for maintaining a knowledge base and evidence set
US5930748A (en) * 1997-07-11 1999-07-27 Motorola, Inc. Speaker identification system and method
US6237011B1 (en) 1997-10-08 2001-05-22 Caere Corporation Computer-based document management system
US5991756A (en) 1997-11-03 1999-11-23 Yahoo, Inc. Information retrieval from hierarchical compound documents
US6297824B1 (en) 1997-11-26 2001-10-02 Xerox Corporation Interactive interface for viewing retrieval results
US6309359B1 (en) * 1998-06-01 2001-10-30 Michael D. Whitt Method and apparatus for noninvasive determination of peripheral arterial lumenal area
US6006225A (en) 1998-06-15 1999-12-21 Amazon.Com Refining search queries by the suggestion of correlated terms from prior searches
US6317722B1 (en) * 1998-09-18 2001-11-13 Amazon.Com, Inc. Use of electronic shopping carts to generate personal recommendations
US6301579B1 (en) 1998-10-20 2001-10-09 Silicon Graphics, Inc. Method, system, and computer program product for visualizing a data structure
US6745238B1 (en) * 2000-03-31 2004-06-01 Oracle International Corporation Self service system for web site publishing
US6434550B1 (en) 2000-04-14 2002-08-13 Rightnow Technologies, Inc. Temporal updates of relevancy rating of retrieved information in an information search system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JOACHIMS T: "Text categorization with support vector machines: learning with many relevant features", MACHINE LEARNING. EUROPEAN CONFERENCE ON MACHINE LEARNING. PROCEEDINGS, 21 April 1998 (1998-04-21), pages 137-142, XP002119808 *

Also Published As

Publication number Publication date
US20060143175A1 (en) 2006-06-29
US7028250B2 (en) 2006-04-11
AU2001264928A1 (en) 2001-12-03
US20020022956A1 (en) 2002-02-21
WO2001090921A2 (en) 2001-11-29

Similar Documents

Publication Publication Date Title
WO2001090921A3 (en) System and method for automatically classifying text
WO2002008933A3 (en) System and method for automated classification of text by time slicing
WO2002006997A3 (en) Method of and system for screening electronic mail items
CA2236623A1 (en) Method and apparatus for automatically identifying key words within a document
WO2004075029A3 (en) Using distinguishing properties to classify messages
EP2450809A3 (en) Method for extracting information from a database
WO2002095534A3 (en) Methods for feature selection in a learning machine
WO2007033300A3 (en) Systems and methods for martingale boosting in machine learning
WO2006088830A3 (en) System and method for automatically categorizing objects using an empirically based goodness of fit technique
EP2261866A3 (en) Coin sorting apparatus and coin receiving system
WO2003013057A3 (en) Method and apparatus of detecting network activity
EP1154358A3 (en) Automatic text classification system
AU4954200A (en) Document sorting method, document sorter, and recorded medium on which document sorting program is recorded
EP0895398A3 (en) Method and system for identifying call records
CA2463098A1 (en) Character identification
EP0796670A3 (en) Apparatus for sorting sheets or the like
CN106874255A (en) Method and device for rule matching
WO2005034081A3 (en) A method for grouping short windows in audio encoding
WO2000067168A3 (en) Account fraud scoring
CN102722526A (en) Part-of-speech classification statistics-based duplicate webpage and approximate webpage identification method
EP1321906A3 (en) Rental item return method and apparatus
GB0205267D0 (en) Method of making a helmet
DK138189D0 (en) WASTE HOLE COLLECTOR
EP1103929A3 (en) Device for sorting coins
EP1133160A3 (en) Method of and apparatus for distinguishing type of pixel

Legal Events

Date Code Title Description
AK Designated states. Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW
AL Designated countries for regional patents. Kind code of ref document: A2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase. Ref country code: JP