CA2423033A1 - A document categorisation system - Google Patents

A document categorisation system

Info

Publication number
CA2423033A1
Authority
CA
Canada
Prior art keywords
data
module
categorisation
document
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002423033A
Other languages
French (fr)
Other versions
CA2423033C (en)
Inventor
Bhavani Raskutti
Adam Kowalczyk
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telstra Corp Ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Abstract

A document categorisation system, including a clusterer for generating clusters of related electronic documents based on features extracted from said documents, and a filter module for generating a filter on the basis of said clusters to categorise further documents received by said system. The system may include an editor for manually browsing and modifying the clusters. The categorisation of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application, providing a convenient means for one-off or ongoing analysis of text entries in a worksheet.
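
As an illustration of the n-gram step described in the abstract, the following is a minimal sketch, not the patented method. It assumes word-level n-grams (the text here does not say whether n-grams are taken over words or characters) and uses a simple document-frequency cutoff as a stand-in for "significant features". The function names `word_ngrams` and `significant_features`, and the parameters `n_max`, `min_df` and `max_df_ratio`, are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return the set of word n-grams (1 to n_max words) found in text."""
    tokens = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def significant_features(docs, min_df=2, max_df_ratio=0.8):
    """Keep n-grams seen in at least min_df documents but not in nearly all.

    Assumption: document frequency is used as the significance test; the
    source does not state which criterion the system applies.
    """
    df = Counter()
    for doc in docs:
        df.update(word_ngrams(doc))  # each n-gram counted once per document
    limit = max_df_ratio * len(docs)
    return sorted(g for g, c in df.items() if min_df <= c <= limit)


docs = [
    "modem will not connect",
    "modem keeps dropping out",
    "bill is too high",
    "bill shows wrong amount",
]
features = significant_features(docs)
print(features)  # -> ['bill', 'modem']
```

Only "modem" and "bill" survive the cutoff because every other n-gram occurs in a single entry; with real data the thresholds would need tuning.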

Claims (39)

1. A document categorisation system including:
a clusterer for generating clusters of related electronic documents based on features extracted from said documents; and a filter module for generating a filter on the basis of said clusters to categorise further documents received by said system.
2. A document categorisation system including:
a clusterer for generating clusters of related electronic documents based on features extracted from said documents; and an editor for browsing and modifying said clusters.
3. A document categorisation system as claimed in any one of claims 1 and 2, wherein said clusterer is adapted to extract features from electronic documents, determine significant features from the extracted features, and generate clusters of said documents based on said significant features.
4. A document categorisation system as claimed in any one of the preceding claims, wherein the clusterer further includes a cluster describer module for generating text describing each cluster.
5. A document categorisation system including:
an editor for browsing and modifying clusters of documents; and a filter module for generating a filter on the basis of features of said clusters to categorise further documents received by said system.
6. A document categorisation system as claimed in any one of the preceding claims, wherein said features include at least one of n-grams, words and phrases.
7. A document categorisation system including:
a clusterer for generating clusters of documents by executing unsupervised learning on said documents; and a filter module for generating a filter to categorise received documents by executing supervised learning on said clusters.
8. A document categorisation system as claimed in claim 1 or claim 7, further including an editor for adjusting said clusters.
9. A document categorisation system as claimed in any one of the preceding claims, further including a trend analyzer for determining trends of document categories over time.
10. A method for categorising documents, including creating categories for said documents based on feature extraction, where said features include at least one of n-grams, words and phrases.
11. A method for categorising documents, including:
creating categories for said documents, based on feature extraction; and manually modifying said categories with a category editor.
12. A method for categorising documents, as claimed in claim 10 or claim 11, including selecting features of said documents based on respective discriminating abilities of the features.
13. A method for categorising documents, as claimed in claim 12, wherein each said discriminating ability is based on similarities for said documents with and without said feature.
14. A method for categorising a document, including:
creating a document filter for a pre-existing document category by analysing pre-existing documents in said category; and applying said filter to said document in order to determine whether said document belongs in said category.
15. A method as claimed in claim 14, including generating descriptive labels for large document sets.
16. A method as claimed in claim 14, wherein said filter is also used to produce descriptive labels for large document sets.
17. A method as claimed in claim 15, wherein said descriptive labels include at least one of phrases and sentences.
18. A method as claimed in any one of claims 10 to 13, wherein said features are described by an n-gram extraction process.
19. A method as claimed in any one of claims 14 to 17, wherein said filters are generated using features which are selected using an n-gram extraction process.
20. A method as claimed in any one of claims 10 to 19, including determining a trend of a document category over time.
21. A data categorisation module for use with a spreadsheet application, said module including:
a cluster module for generating clusters of related data from data in a document of the spreadsheet application, based on extracted features of said data; and a training module for generating a filter on the basis of said clusters to categorise further data.
22. A data categorisation module as claimed in claim 21, including a filtering module for categorising further data on the basis of said filter.
23. A data categorisation module as claimed in claim 21, wherein said data includes a plurality of entries in the document.
24. A data categorisation module as claimed in claim 23, wherein said entries include text data to be used for categorising said plurality of entries, and structured data.
25. A data categorisation module as claimed in claim 21, wherein said cluster module is adapted to generate a cluster identifier for identifying a cluster to which an entry of said data belongs, and a cluster size value for identifying the size of said cluster.
26. A data categorisation module as claimed in claim 25, wherein said cluster module is adapted to generate at least one category descriptor for said entry.
27. A data categorisation module as claimed in claim 21, wherein said cluster module generates a worksheet column for identifying a category of each entry of said data.
28. A data categorisation module as claimed in claim 27, wherein said cluster module generates a formatted version of data of an entry for indicating a category descriptor of said entry.
29. A data categorisation module as claimed in claim 21, wherein said data categorisation module includes a module for testing said filters by categorising training data on the basis of filters generated using said training data.
30. A data categorisation module as claimed in claim 21, wherein said data categorisation module includes labelling functions for generating a category identifier for an entry of said data.
31. A data categorisation module as claimed in claim 30, wherein said document is a worksheet and said functions include a labelling function for generating a plurality of category columns of said worksheet for identifying at least one category of an entry.
32. A data categorisation module as claimed in claim 22, wherein said filtering module generates a respective score for each category of an entry.
33. A data categorisation module as claimed in claim 32, wherein said filtering module generates an error for an entry if any one of said scores is inconsistent with a respective category identifier.
34. A data categorisation module as claimed in claim 32, wherein a score indicates that the corresponding entry belongs to a respective category if said score exceeds a pre-determined value.
35. A data categorisation module as claimed in claim 34, wherein the default value of said pre-determined value is zero.
36. A data categorisation module as claimed in claim 34, wherein scores exceeding said pre-determined value are formatted differently than scores less than said value.
37. A data categorisation module as claimed in claim 32, wherein a score may be used to calculate a probability that said entry belongs to the corresponding category.
38. A data categorisation module for use with a spreadsheet application, said module including a cluster module for generating clusters of related data from data in a document of the spreadsheet application, based on extracted features of said data.
39. A method of data categorisation in a spreadsheet application, including the steps of:
generating clusters of related data from data in a document of the spreadsheet application, based on extracted features of said data; and generating a filter on the basis of said clusters to categorise further data.
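
The cluster-then-filter flow recited in claims 7, 32, 34, 35 and 37 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: leader clustering on Jaccard similarity stands in for the unsupervised step, a one-vs-rest perceptron stands in for the supervised filter, and a logistic function stands in for the score-to-probability mapping, none of which is specified in the claims. All function names and the `threshold`, `epochs` and `cutoff` parameters are hypothetical; only the default cutoff of zero comes from claim 35.

```python
import math


def features(text):
    """Bag of lower-cased words; a stand-in for the n-gram features in the claims."""
    return set(text.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs, threshold=0.1):
    """Unsupervised step (assumed): join the first cluster whose leader is similar enough."""
    leaders, labels = [], []
    for doc in docs:
        f = features(doc)
        for cid, leader in enumerate(leaders):
            if jaccard(f, leader) >= threshold:
                labels.append(cid)
                break
        else:  # no leader matched, so this document starts a new cluster
            leaders.append(f)
            labels.append(len(leaders) - 1)
    return labels


def train_filter(docs, labels, target, epochs=20):
    """Supervised step (assumed): perceptron for cluster `target` against the rest."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            y = 1 if label == target else -1
            f = features(doc)
            if y * (sum(weights.get(w, 0.0) for w in f) + bias) <= 0:
                for w in f:  # misclassified: move weights toward the correct side
                    weights[w] = weights.get(w, 0.0) + y
                bias += y
    return weights, bias


def score(filt, text):
    weights, bias = filt
    return sum(weights.get(w, 0.0) for w in features(text)) + bias


def belongs(filt, text, cutoff=0.0):
    """An entry belongs to the category if its score exceeds the cutoff (default zero)."""
    return score(filt, text) > cutoff


def probability(filt, text):
    """Assumed mapping from score to probability via the logistic function."""
    return 1.0 / (1.0 + math.exp(-score(filt, text)))


docs = [
    "modem will not connect",
    "modem keeps dropping out",
    "bill is too high",
    "bill shows wrong amount",
]
labels = cluster(docs)
print(labels)  # -> [0, 0, 1, 1]
modem_filter = train_filter(docs, labels, target=0)
print(belongs(modem_filter, "my modem stopped working"))  # -> True
print(belongs(modem_filter, "wrong bill again"))          # -> False
```

Training one filter per cluster yields one score per category for each new entry, which matches the per-category scores of claim 32; an editor step as in claim 8 would adjust `labels` before training.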
CA2423033A 2000-09-25 2001-09-25 A document categorisation system Expired - Fee Related CA2423033C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AUPR0338A AUPR033800A0 (en) 2000-09-25 2000-09-25 A document categorisation system
AUPR0338 2000-09-25
PCT/AU2001/001198 WO2002025479A1 (en) 2000-09-25 2001-09-25 A document categorisation system

Publications (2)

Publication Number Publication Date
CA2423033A1 (en) 2002-03-28
CA2423033C (en) 2012-12-04

Family

ID=3824404

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2423033A Expired - Fee Related CA2423033C (en) 2000-09-25 2001-09-25 A document categorisation system

Country Status (6)

Country Link
US (2) US20040100022A1 (en)
EP (1) EP1323078A4 (en)
AU (1) AUPR033800A0 (en)
CA (1) CA2423033C (en)
NZ (1) NZ524988A (en)
WO (1) WO2002025479A1 (en)

Families Citing this family (210)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPR958901A0 (en) 2001-12-18 2002-01-24 Telstra New Wave Pty Ltd Information resource taxonomy
US7356461B1 (en) * 2002-01-14 2008-04-08 Nstein Technologies Inc. Text categorization method and apparatus
NL1020670C2 (en) * 2002-05-24 2003-11-25 Oce Tech Bv Determining a semantic image.
US7426509B2 (en) * 2002-11-15 2008-09-16 Justsystems Evans Research, Inc. Method and apparatus for document filtering using ensemble filters
US7725544B2 (en) * 2003-01-24 2010-05-25 Aol Inc. Group based spam classification
US7089241B1 (en) * 2003-01-24 2006-08-08 America Online, Inc. Classifier tuning based on data similarities
US7590695B2 (en) 2003-05-09 2009-09-15 Aol Llc Managing electronic messages
US7409336B2 (en) * 2003-06-19 2008-08-05 Siebel Systems, Inc. Method and system for searching data based on identified subset of categories and relevance-scored text representation-category combinations
RU2635259C1 (en) * 2016-06-22 2017-11-09 Общество с ограниченной ответственностью "Аби Девелопмент" Method and device for determining type of digital document
US7610313B2 (en) * 2003-07-25 2009-10-27 Attenex Corporation System and method for performing efficient document scoring and clustering
US7209908B2 (en) * 2003-09-18 2007-04-24 Microsoft Corporation Data classification using stochastic key feature generation
US7346839B2 (en) * 2003-09-30 2008-03-18 Google Inc. Information retrieval based on historical data
US7191175B2 (en) 2004-02-13 2007-03-13 Attenex Corporation System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space
US20050262039A1 (en) * 2004-05-20 2005-11-24 International Business Machines Corporation Method and system for analyzing unstructured text in data warehouse
FR2872601B1 (en) 2004-07-02 2007-01-19 Radiotelephone Sfr METHOD FOR DETECTING REDUNDANT MESSAGES IN A MESSAGE FLOW
US7698339B2 (en) * 2004-08-13 2010-04-13 Microsoft Corporation Method and system for summarizing a document
TWI254880B (en) * 2004-10-18 2006-05-11 Avectec Com Inc Method for classifying electronic document analysis
US20060212142A1 (en) * 2005-03-16 2006-09-21 Omid Madani System and method for providing interactive feature selection for training a document classification system
US10127130B2 (en) 2005-03-18 2018-11-13 Salesforce.Com Identifying contributors that explain differences between a data set and a subset of the data set
US8782087B2 (en) 2005-03-18 2014-07-15 Beyondcore, Inc. Analyzing large data sets to find deviation patterns
US9792359B2 (en) * 2005-04-29 2017-10-17 Entit Software Llc Providing training information for training a categorizer
US9047290B1 (en) 2005-04-29 2015-06-02 Hewlett-Packard Development Company, L.P. Computing a quantification measure associated with cases in a category
US20070004309A1 (en) * 2005-07-01 2007-01-04 John Hinnen Aerodynamic throwing toy
GB2430073A (en) * 2005-09-08 2007-03-14 Univ East Anglia Analysis and transcription of music
US8341112B2 (en) * 2006-05-19 2012-12-25 Microsoft Corporation Annotation by search
US8386232B2 (en) * 2006-06-01 2013-02-26 Yahoo! Inc. Predicting results for input data based on a model generated from clusters
US20080005137A1 (en) * 2006-06-29 2008-01-03 Microsoft Corporation Incrementally building aspect models
WO2008030510A2 (en) * 2006-09-06 2008-03-13 Nexplore Corporation System and method for weighted search and advertisement placement
US20080189171A1 (en) * 2007-02-01 2008-08-07 Nice Systems Ltd. Method and apparatus for call categorization
US8930331B2 (en) 2007-02-21 2015-01-06 Palantir Technologies Providing unique views of data based on changes or rules
US8239460B2 (en) * 2007-06-29 2012-08-07 Microsoft Corporation Content-based tagging of RSS feeds and E-mail
US8005782B2 (en) * 2007-08-10 2011-08-23 Microsoft Corporation Domain name statistical classification using character-based N-grams
US8041662B2 (en) * 2007-08-10 2011-10-18 Microsoft Corporation Domain name geometrical classification using character-based n-grams
US20090063470A1 (en) * 2007-08-28 2009-03-05 Nogacom Ltd. Document management using business objects
US9081852B2 (en) * 2007-10-05 2015-07-14 Fujitsu Limited Recommending terms to specify ontology space
US20090119281A1 (en) * 2007-11-03 2009-05-07 Andrew Chien-Chung Wang Granular knowledge based search engine
GB2463515A (en) * 2008-04-23 2010-03-24 British Telecomm Classification of online posts using keyword clusters derived from existing posts
GB2459476A (en) * 2008-04-23 2009-10-28 British Telecomm Classification of posts for prioritizing or grouping comments.
JP5347334B2 (en) * 2008-05-29 2013-11-20 富士通株式会社 Summary work support processing method, apparatus and program
US7921584B2 (en) * 2008-06-04 2011-04-12 Michael Kestner Kinetic sculptural system and assembly of interconnected modules
US20100042623A1 (en) * 2008-08-14 2010-02-18 Junlan Feng System and method for mining and tracking business documents
JP5098914B2 (en) * 2008-09-11 2012-12-12 富士通株式会社 Message pattern generation program, method and apparatus
US10747952B2 (en) 2008-09-15 2020-08-18 Palantir Technologies, Inc. Automatic creation and server push of multiple distinct drafts
US8170966B1 (en) * 2008-11-04 2012-05-01 Bitdefender IPR Management Ltd. Dynamic streaming message clustering for rapid spam-wave detection
US8543569B2 (en) * 2009-01-13 2013-09-24 Infotrieve, Inc. System and method for the centralized management of a document ordering and delivery program
US8484200B2 (en) * 2009-01-13 2013-07-09 Infotrieve, Inc. System and method for the centralized management of a document ordering and delivery program
EP2216947A1 (en) * 2009-02-10 2010-08-11 Alcatel Lucent Method of identifying spam messages
US8396850B2 (en) * 2009-02-27 2013-03-12 Red Hat, Inc. Discriminating search results by phrase analysis
US8527500B2 (en) * 2009-02-27 2013-09-03 Red Hat, Inc. Preprocessing text to enhance statistical features
JP5647602B2 (en) * 2009-04-27 2015-01-07 Panasonic Intellectual Property Corporation of America Data processing apparatus, data processing method, program, and integrated circuit
US8296309B2 (en) * 2009-05-29 2012-10-23 H5 System and method for high precision and high recall relevancy searching
US10891659B2 (en) 2009-05-29 2021-01-12 Red Hat, Inc. Placing resources in displayed web pages via context modeling
US9430566B2 (en) * 2009-07-11 2016-08-30 International Business Machines Corporation Control of web content tagging
US8713018B2 (en) 2009-07-28 2014-04-29 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via inclusion
US9092411B2 (en) * 2009-08-18 2015-07-28 Miosoft Corporation Understanding data in data sets
EP2471009A1 (en) 2009-08-24 2012-07-04 FTI Technology LLC Generating a reference set for use during document review
US20110072047A1 (en) * 2009-09-21 2011-03-24 Microsoft Corporation Interest Learning from an Image Collection for Advertising
US20120197910A1 (en) * 2009-10-11 2012-08-02 Patrick Sander Walsh Method and system for performing classified document research
WO2011149608A1 (en) * 2010-05-25 2011-12-01 Beyondcore, Inc. Identifying and using critical fields in quality management
US8359279B2 (en) 2010-05-26 2013-01-22 Microsoft Corporation Assisted clustering
US9703782B2 (en) 2010-05-28 2017-07-11 Microsoft Technology Licensing, Llc Associating media with metadata of near-duplicates
US8903798B2 (en) 2010-05-28 2014-12-02 Microsoft Corporation Real-time annotation and enrichment of captured video
US9268878B2 (en) 2010-06-22 2016-02-23 Microsoft Technology Licensing, Llc Entity category extraction for an entity that is the subject of pre-labeled data
US8671040B2 (en) * 2010-07-23 2014-03-11 Thomson Reuters Global Resources Credit risk mining
CN102457250B (en) * 2010-10-20 2015-04-15 Tcl集团股份有限公司 Collected data filter processing method and device
US8645298B2 (en) 2010-10-26 2014-02-04 Microsoft Corporation Topic models
US8559682B2 (en) 2010-11-09 2013-10-15 Microsoft Corporation Building a person profile database
CN101984435B (en) * 2010-11-17 2012-10-10 百度在线网络技术(北京)有限公司 Method and device for distributing texts
US9342590B2 (en) * 2010-12-23 2016-05-17 Microsoft Technology Licensing, Llc Keywords extraction and enrichment via categorization systems
US9542479B2 (en) 2011-02-15 2017-01-10 Telenav, Inc. Navigation system with rule based point of interest classification mechanism and method of operation thereof
US9678992B2 (en) 2011-05-18 2017-06-13 Microsoft Technology Licensing, Llc Text to image translation
US9547693B1 (en) 2011-06-23 2017-01-17 Palantir Technologies Inc. Periodic database search manager for multiple data sources
US9092482B2 (en) 2013-03-14 2015-07-28 Palantir Technologies, Inc. Fair scheduling for mixed-query loads
US8799240B2 (en) 2011-06-23 2014-08-05 Palantir Technologies, Inc. System and method for investigating large amounts of data
US8732574B2 (en) 2011-08-25 2014-05-20 Palantir Technologies, Inc. System and method for parameterizing documents for automatic workflow generation
US8504542B2 (en) 2011-09-02 2013-08-06 Palantir Technologies, Inc. Multi-row transactions
US8990224B1 (en) * 2011-11-14 2015-03-24 Google Inc. Detecting document text that is hard to read
US8713028B2 (en) * 2011-11-17 2014-04-29 Yahoo! Inc. Related news articles
US10796232B2 (en) 2011-12-04 2020-10-06 Salesforce.Com, Inc. Explaining differences between predicted outcomes and actual outcomes of a process
US10802687B2 (en) 2011-12-04 2020-10-13 Salesforce.Com, Inc. Displaying differences between different data sets of a process
US8977620B1 (en) * 2011-12-27 2015-03-10 Google Inc. Method and system for document classification
US8954519B2 (en) 2012-01-25 2015-02-10 Bitdefender IPR Management Ltd. Systems and methods for spam detection using character histograms
US9130778B2 (en) 2012-01-25 2015-09-08 Bitdefender IPR Management Ltd. Systems and methods for spam detection using frequency spectra of character strings
US9239848B2 (en) 2012-02-06 2016-01-19 Microsoft Technology Licensing, Llc System and method for semantically annotating images
CA2865187C (en) * 2012-05-15 2015-09-22 Whyz Technologies Limited Method and system relating to salient content extraction for electronic content
US9569327B2 (en) * 2012-10-03 2017-02-14 Xerox Corporation System and method for labeling alert messages from devices for automated management
US9348677B2 (en) 2012-10-22 2016-05-24 Palantir Technologies Inc. System and method for batch evaluation programs
US9123086B1 (en) 2013-01-31 2015-09-01 Palantir Technologies, Inc. Automatically generating event objects from images
US9251292B2 (en) 2013-03-11 2016-02-02 Wal-Mart Stores, Inc. Search result ranking using query clustering
US10037314B2 (en) 2013-03-14 2018-07-31 Palantir Technologies, Inc. Mobile reports
US8868486B2 (en) 2013-03-15 2014-10-21 Palantir Technologies Inc. Time-sensitive cube
US8917274B2 (en) 2013-03-15 2014-12-23 Palantir Technologies Inc. Event matrix based on integrated data
US8909656B2 (en) 2013-03-15 2014-12-09 Palantir Technologies Inc. Filter chains with associated multipath views for exploring large data sets
US8937619B2 (en) 2013-03-15 2015-01-20 Palantir Technologies Inc. Generating an object time series from data objects
US10275778B1 (en) 2013-03-15 2019-04-30 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation based on automatic malfeasance clustering of related data in various data structures
US9965937B2 (en) 2013-03-15 2018-05-08 Palantir Technologies Inc. External malware data item clustering and analysis
US8788405B1 (en) 2013-03-15 2014-07-22 Palantir Technologies, Inc. Generating data clusters with customizable analysis strategies
CN103258000B (en) * 2013-03-29 2017-02-08 北界无限(北京)软件有限公司 Method and device for clustering high-frequency keywords in webpages
US8799799B1 (en) 2013-05-07 2014-08-05 Palantir Technologies Inc. Interactive geospatial map
US9275291B2 (en) 2013-06-17 2016-03-01 Texifter, LLC System and method of classifier ranking for incorporation into enhanced machine learning
US9223773B2 (en) 2013-08-08 2015-12-29 Palatir Technologies Inc. Template system for custom document generation
US9335897B2 (en) 2013-08-08 2016-05-10 Palantir Technologies Inc. Long click display of a context menu
US8713467B1 (en) 2013-08-09 2014-04-29 Palantir Technologies, Inc. Context-sensitive views
JP5669904B1 (en) * 2013-09-06 2015-02-18 株式会社Ubic Document search system, document search method, and document search program for providing prior information
US9785317B2 (en) 2013-09-24 2017-10-10 Palantir Technologies Inc. Presentation and analysis of user interaction data
US8938686B1 (en) 2013-10-03 2015-01-20 Palantir Technologies Inc. Systems and methods for analyzing performance of an entity
US8812960B1 (en) 2013-10-07 2014-08-19 Palantir Technologies Inc. Cohort-based presentation of user interaction data
US9990422B2 (en) * 2013-10-15 2018-06-05 Adobe Systems Incorporated Contextual analysis engine
US10430806B2 (en) 2013-10-15 2019-10-01 Adobe Inc. Input/output interface for contextual analysis engine
US10235681B2 (en) 2013-10-15 2019-03-19 Adobe Inc. Text extraction module for contextual analysis engine
US9116975B2 (en) 2013-10-18 2015-08-25 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive simultaneous querying of multiple data stores
US8924872B1 (en) 2013-10-18 2014-12-30 Palantir Technologies Inc. Overview user interface of emergency call data of a law enforcement agency
US9021384B1 (en) 2013-11-04 2015-04-28 Palantir Technologies Inc. Interactive vehicle information map
US8868537B1 (en) 2013-11-11 2014-10-21 Palantir Technologies, Inc. Simple web search
US9105000B1 (en) 2013-12-10 2015-08-11 Palantir Technologies Inc. Aggregating data from a plurality of data sources
US9734217B2 (en) 2013-12-16 2017-08-15 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US9552615B2 (en) 2013-12-20 2017-01-24 Palantir Technologies Inc. Automated database analysis to detect malfeasance
US10356032B2 (en) 2013-12-26 2019-07-16 Palantir Technologies Inc. System and method for detecting confidential information emails
US9043696B1 (en) 2014-01-03 2015-05-26 Palantir Technologies Inc. Systems and methods for visual definition of data associations
US8832832B1 (en) 2014-01-03 2014-09-09 Palantir Technologies Inc. IP reputation
US9009827B1 (en) 2014-02-20 2015-04-14 Palantir Technologies Inc. Security sharing system
US9483162B2 (en) 2014-02-20 2016-11-01 Palantir Technologies Inc. Relationship visualizations
US9727376B1 (en) 2014-03-04 2017-08-08 Palantir Technologies, Inc. Mobile tasks
US8935201B1 (en) 2014-03-18 2015-01-13 Palantir Technologies Inc. Determining and extracting changed data from a data source
US10127229B2 (en) 2014-04-23 2018-11-13 Elsevier B.V. Methods and computer-program products for organizing electronic documents
US9857958B2 (en) 2014-04-28 2018-01-02 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive access of, investigation of, and analysis of data objects stored in one or more databases
US9483457B2 (en) * 2014-04-28 2016-11-01 International Business Machines Corporation Method for logical organization of worksheets
US9009171B1 (en) 2014-05-02 2015-04-14 Palantir Technologies Inc. Systems and methods for active column filtering
US9535974B1 (en) 2014-06-30 2017-01-03 Palantir Technologies Inc. Systems and methods for identifying key phrase clusters within documents
US9619557B2 (en) 2014-06-30 2017-04-11 Palantir Technologies, Inc. Systems and methods for key phrase characterization of documents
US9202249B1 (en) 2014-07-03 2015-12-01 Palantir Technologies Inc. Data item clustering and analysis
US9785773B2 (en) 2014-07-03 2017-10-10 Palantir Technologies Inc. Malware data item analysis
US10572496B1 (en) 2014-07-03 2020-02-25 Palantir Technologies Inc. Distributed workflow system and database with access controls for city resiliency
US9256664B2 (en) * 2014-07-03 2016-02-09 Palantir Technologies Inc. System and method for news events detection and visualization
US9842586B2 (en) 2014-07-09 2017-12-12 Genesys Telecommunications Laboratories, Inc. System and method for semantically exploring concepts
US20190332619A1 (en) * 2014-08-07 2019-10-31 Cortical.Io Ag Methods and systems for mapping data items to sparse distributed representations
US9454281B2 (en) 2014-09-03 2016-09-27 Palantir Technologies Inc. System for providing dynamic linked panels in user interface
US9767172B2 (en) 2014-10-03 2017-09-19 Palantir Technologies Inc. Data aggregation and analysis system
US9501851B2 (en) 2014-10-03 2016-11-22 Palantir Technologies Inc. Time-series analysis system
US9785328B2 (en) 2014-10-06 2017-10-10 Palantir Technologies Inc. Presentation of multivariate data on a graphical user interface of a computing system
US9984133B2 (en) 2014-10-16 2018-05-29 Palantir Technologies Inc. Schematic and database linking system
US9229952B1 (en) 2014-11-05 2016-01-05 Palantir Technologies, Inc. History preserving data pipeline system and method
US9043894B1 (en) 2014-11-06 2015-05-26 Palantir Technologies Inc. Malicious software detection in a computing system
US10362133B1 (en) 2014-12-22 2019-07-23 Palantir Technologies Inc. Communication data processing architecture
US9348920B1 (en) 2014-12-22 2016-05-24 Palantir Technologies Inc. Concept indexing among database of documents using machine learning techniques
US9367872B1 (en) 2014-12-22 2016-06-14 Palantir Technologies Inc. Systems and user interfaces for dynamic and interactive investigation of bad actor behavior based on automatic clustering of related data in various data structures
US10552994B2 (en) 2014-12-22 2020-02-04 Palantir Technologies Inc. Systems and interactive user interfaces for dynamic retrieval, analysis, and triage of data items
US9335911B1 (en) 2014-12-29 2016-05-10 Palantir Technologies Inc. Interactive user interface for dynamic data analysis exploration and query processing
US9870205B1 (en) 2014-12-29 2018-01-16 Palantir Technologies Inc. Storing logical units of program code generated using a dynamic programming notebook user interface
US9817563B1 (en) 2014-12-29 2017-11-14 Palantir Technologies Inc. System and method of generating data points from one or more data stores of data items for chart creation and manipulation
US10372879B2 (en) 2014-12-31 2019-08-06 Palantir Technologies Inc. Medical claims lead summary report generation
US10387834B2 (en) 2015-01-21 2019-08-20 Palantir Technologies Inc. Systems and methods for accessing and storing snapshots of a remote application in a document
US10176253B2 (en) 2015-01-28 2019-01-08 International Business Machines Corporation Fusion of cluster labeling algorithms by analyzing sub-clusters
US9727560B2 (en) 2015-02-25 2017-08-08 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
EP3611632A1 (en) 2015-03-16 2020-02-19 Palantir Technologies Inc. Displaying attribute and event data along paths
US9886467B2 (en) 2015-03-19 2018-02-06 Plantir Technologies Inc. System and method for comparing and visualizing data entities and data entity series
US9392008B1 (en) 2015-07-23 2016-07-12 Palantir Technologies Inc. Systems and methods for identifying information related to payment card breaches
US9454785B1 (en) 2015-07-30 2016-09-27 Palantir Technologies Inc. Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data
US9996595B2 (en) 2015-08-03 2018-06-12 Palantir Technologies, Inc. Providing full data provenance visualization for versioned datasets
US9456000B1 (en) 2015-08-06 2016-09-27 Palantir Technologies Inc. Systems, methods, user interfaces, and computer-readable media for investigating potential malicious communications
US9600146B2 (en) 2015-08-17 2017-03-21 Palantir Technologies Inc. Interactive geospatial map
US10489391B1 (en) 2015-08-17 2019-11-26 Palantir Technologies Inc. Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface
US10102369B2 (en) 2015-08-19 2018-10-16 Palantir Technologies Inc. Checkout system executable code monitoring, and user account compromise determination system
US10853378B1 (en) 2015-08-25 2020-12-01 Palantir Technologies Inc. Electronic note management via a connected entity graph
US11150917B2 (en) 2015-08-26 2021-10-19 Palantir Technologies Inc. System for data aggregation and analysis of data from a plurality of data sources
US9485265B1 (en) 2015-08-28 2016-11-01 Palantir Technologies Inc. Malicious activity detection system capable of efficiently processing data accessed from databases and generating alerts for display in interactive user interfaces
US10706434B1 (en) 2015-09-01 2020-07-07 Palantir Technologies Inc. Methods and systems for determining location information
US9576015B1 (en) 2015-09-09 2017-02-21 Palantir Technologies, Inc. Domain-specific language for dataset transformations
US10296617B1 (en) 2015-10-05 2019-05-21 Palantir Technologies Inc. Searches of highly structured data
US9424669B1 (en) 2015-10-21 2016-08-23 Palantir Technologies Inc. Generating graphical representations of event participation flow
US10613722B1 (en) 2015-10-27 2020-04-07 Palantir Technologies Inc. Distorting a graph on a computer display to improve the computer's ability to display the graph to, and interact with, a user
US9542446B1 (en) 2015-12-17 2017-01-10 Palantir Technologies, Inc. Automatic generation of composite datasets based on hierarchical fields
US9823818B1 (en) 2015-12-29 2017-11-21 Palantir Technologies Inc. Systems and interactive user interfaces for automatic generation of temporal representation of data objects
US10268735B1 (en) 2015-12-29 2019-04-23 Palantir Technologies Inc. Graph based resolution of matching items in data sources
US10089289B2 (en) 2015-12-29 2018-10-02 Palantir Technologies Inc. Real-time document annotation
US9612723B1 (en) 2015-12-30 2017-04-04 Palantir Technologies Inc. Composite graphical interface with shareable data-objects
US10698938B2 (en) 2016-03-18 2020-06-30 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
CN105868781A (en) * 2016-03-29 2016-08-17 国云科技股份有限公司 Method for classifying computer files based on Naive Bayes Classifier algorithm
US10650558B2 (en) 2016-04-04 2020-05-12 Palantir Technologies Inc. Techniques for displaying stack graphs
AU2017274558B2 (en) 2016-06-02 2021-11-11 Nuix North America Inc. Analyzing clusters of coded documents
US10318568B2 (en) 2016-06-07 2019-06-11 International Business Machines Corporation Generation of classification data used for classifying documents
US10007674B2 (en) 2016-06-13 2018-06-26 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
RU2634180C1 (en) 2016-06-24 2017-10-24 Акционерное общество "Лаборатория Касперского" System and method for determining spam-containing message by topic of message sent via e-mail
US10719188B2 (en) 2016-07-21 2020-07-21 Palantir Technologies Inc. Cached database and synchronization system for providing dynamic linked panels in user interface
US10324609B2 (en) 2016-07-21 2019-06-18 Palantir Technologies Inc. System for providing dynamic linked panels in user interface
US10437840B1 (en) 2016-08-19 2019-10-08 Palantir Technologies Inc. Focused probabilistic entity resolution from multiple data sources
US9881066B1 (en) 2016-08-31 2018-01-30 Palantir Technologies, Inc. Systems, methods, user interfaces and algorithms for performing database analysis and search of information involving structured and/or semi-structured data
US10572221B2 (en) 2016-10-20 2020-02-25 Cortical.Io Ag Methods and systems for identifying a level of similarity between a plurality of data representations
US10318630B1 (en) 2016-11-21 2019-06-11 Palantir Technologies Inc. Analysis of large bodies of textual data
US10552436B2 (en) 2016-12-28 2020-02-04 Palantir Technologies Inc. Systems and methods for retrieving and processing data for display
US10460602B1 (en) 2016-12-28 2019-10-29 Palantir Technologies Inc. Interactive vehicle information mapping system
US10628278B2 (en) * 2017-01-26 2020-04-21 International Business Machines Corporation Generation of end-user sessions from end-user events identified from computer system logs
US10776693B2 (en) * 2017-01-31 2020-09-15 Xerox Corporation Method and system for learning transferable feature representations from a source domain for a target domain
US10475219B1 (en) 2017-03-30 2019-11-12 Palantir Technologies Inc. Multidimensional arc chart for visual comparison
US10956406B2 (en) 2017-06-12 2021-03-23 Palantir Technologies Inc. Propagated deletion of database records and derived data
US10403011B1 (en) 2017-07-18 2019-09-03 Palantir Technologies Inc. Passing system with an interactive user interface
US11100425B2 (en) * 2017-10-31 2021-08-24 International Business Machines Corporation Facilitating data-driven mapping discovery
US10929476B2 (en) 2017-12-14 2021-02-23 Palantir Technologies Inc. Systems and methods for visualizing and analyzing multi-dimensional data
US11599369B1 (en) 2018-03-08 2023-03-07 Palantir Technologies Inc. Graphical user interface configuration system
US10754822B1 (en) 2018-04-18 2020-08-25 Palantir Technologies Inc. Systems and methods for ontology migration
US10885021B1 (en) 2018-05-02 2021-01-05 Palantir Technologies Inc. Interactive interpreter and graphical user interface
US11119630B1 (en) 2018-06-19 2021-09-14 Palantir Technologies Inc. Artificial intelligence assisted evaluations and user interface for same
US20210026874A1 (en) * 2018-07-24 2021-01-28 Ntt Docomo, Inc. Document classification device and trained model
WO2020129031A1 (en) * 2018-12-21 2020-06-25 Element Ai Inc. Method and system for generating investigation cases in the context of cybersecurity
US11481578B2 (en) * 2019-02-22 2022-10-25 Neuropace, Inc. Systems and methods for labeling large datasets of physiological records based on unsupervised machine learning
FR3094508A1 (en) * 2019-03-29 2020-10-02 Orange Data enrichment system and method
US10540381B1 (en) 2019-08-09 2020-01-21 Capital One Services, Llc Techniques and components to find new instances of text documents and identify known response templates
KR20210083706A (en) * 2019-12-27 2021-07-07 삼성전자주식회사 Apparatus and method for classifying a category of data
US11657071B1 (en) * 2020-03-13 2023-05-23 Wells Fargo Bank N.A. Mapping disparate datasets
US11734332B2 (en) 2020-11-19 2023-08-22 Cortical.Io Ag Methods and systems for reuse of data item fingerprints in generation of semantic maps

Family Cites Families (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4706956A (en) * 1982-07-02 1987-11-17 Abu Shumays Ibrahim K Regular polyhedron puzzles
US4513970A (en) * 1983-01-24 1985-04-30 Ovidiu Opresco Polymorphic twist puzzle
CS277266B6 (en) * 1990-11-08 1992-12-16 Hrsel Karel Three-dimensioned jig-saw puzzle
JP2774397B2 (en) * 1991-08-23 1998-07-09 富士写真フイルム株式会社 Document recording / retrieval method and apparatus
GB9220404D0 (en) * 1992-08-20 1992-11-11 Nat Security Agency Method of identifying, retrieving and sorting documents
JP3669016B2 (en) * 1994-09-30 2005-07-06 株式会社日立製作所 Document information classification device
US5548697A (en) 1994-12-30 1996-08-20 Panasonic Technologies, Inc. Non-linear color corrector having a neural network and using fuzzy membership values to correct color and a method thereof
WO1996023265A1 (en) * 1995-01-23 1996-08-01 British Telecommunications Public Limited Company Methods and/or systems for accessing information
EP0738526A3 (en) * 1995-04-20 1997-08-27 Dario Cabrera Irregular polyhedron puzzle game with pieces of asymmetric shapes
US5675710A (en) * 1995-06-07 1997-10-07 Lucent Technologies, Inc. Method and apparatus for training a text classifier
US6006221A (en) * 1995-08-16 1999-12-21 Syracuse University Multilingual document retrieval system and method using semantic vector matching
US5948058A (en) * 1995-10-30 1999-09-07 Nec Corporation Method and apparatus for cataloging and displaying e-mail using a classification rule preparing means and providing cataloging a piece of e-mail into multiple categories or classification types based on e-mail object information
US5745893A (en) * 1995-11-30 1998-04-28 Electronic Data Systems Corporation Process and system for arrangement of documents
US5864855A (en) * 1996-02-26 1999-01-26 The United States Of America As Represented By The Secretary Of The Army Parallel document clustering process
US5857179A (en) * 1996-09-09 1999-01-05 Digital Equipment Corporation Computer method and apparatus for clustering documents and automatic generation of cluster keywords
US5819258A (en) * 1997-03-07 1998-10-06 Digital Equipment Corporation Method and apparatus for automatically generating hierarchical categories from large document collections
US6298351B1 (en) * 1997-04-11 2001-10-02 International Business Machines Corporation Modifying an unreliable training set for supervised classification
US6137911A (en) * 1997-06-16 2000-10-24 The Dialog Corporation Plc Test classification system and method
US6233575B1 (en) * 1997-06-24 2001-05-15 International Business Machines Corporation Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values
US6128613A (en) * 1997-06-26 2000-10-03 The Chinese University Of Hong Kong Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words
US6167397A (en) * 1997-09-23 2000-12-26 At&T Corporation Method of clustering electronic documents in response to a search query
US6032146A (en) * 1997-10-21 2000-02-29 International Business Machines Corporation Dimension reduction for data mining application
US6122628A (en) * 1997-10-31 2000-09-19 International Business Machines Corporation Multidimensional data clustering and dimension reduction for indexing and searching
US5953718A (en) * 1997-11-12 1999-09-14 Oracle Corporation Research mode for a knowledge base search and retrieval system
US6012058A (en) * 1998-03-17 2000-01-04 Microsoft Corporation Scalable system for K-means clustering of large databases
US7194471B1 (en) * 1998-04-10 2007-03-20 Ricoh Company, Ltd. Document classification system and method for classifying a document according to contents of the document
US6446061B1 (en) * 1998-07-31 2002-09-03 International Business Machines Corporation Taxonomy generation for document collections
US6360215B1 (en) * 1998-11-03 2002-03-19 Inktomi Corporation Method and apparatus for retrieving documents based on information other than document content
US6480843B2 (en) * 1998-11-03 2002-11-12 Nec Usa, Inc. Supporting web-query expansion efficiently using multi-granularity indexing and query processing
EP1024437B1 (en) * 1999-01-26 2010-04-21 Xerox Corporation Multi-modal information access
JP3347088B2 (en) * 1999-02-12 2002-11-20 インターナショナル・ビジネス・マシーンズ・コーポレーション Related information search method and system
US6665681B1 (en) * 1999-04-09 2003-12-16 Entrieva, Inc. System and method for generating a taxonomy from a plurality of documents
US6560594B2 (en) * 1999-05-13 2003-05-06 International Business Machines Corporation Cube indices for relational database management systems
US6442545B1 (en) * 1999-06-01 2002-08-27 Clearforest Ltd. Term-level text with mining with taxonomies
JP3793085B2 (en) * 1999-08-06 2006-07-05 レキシス ネクシス System and method for categorizing legal concepts using legal topic systems
US6430547B1 (en) * 1999-09-22 2002-08-06 International Business Machines Corporation Method and system for integrating spatial analysis and data mining analysis to ascertain relationships between collected samples and geology with remotely sensed data
FR2799023B1 (en) * 1999-09-24 2003-04-18 France Telecom METHOD FOR THEMATIC CLASSIFICATION OF DOCUMENTS, MODULE FOR THEMATIC CLASSIFICATION AND SEARCH ENGINE INCORPORATING SUCH A MODULE
US6424971B1 (en) * 1999-10-29 2002-07-23 International Business Machines Corporation System and method for interactive classification and analysis of data
US6701314B1 (en) * 2000-01-21 2004-03-02 Science Applications International Corporation System and method for cataloguing digital information for searching and retrieval
US6922706B1 (en) * 2000-04-27 2005-07-26 International Business Machines Corporation Data mining techniques for enhancing shelf-space management
US6766035B1 (en) * 2000-05-03 2004-07-20 Koninklijke Philips Electronics N.V. Method and apparatus for adaptive position determination video conferencing and other applications

Also Published As

Publication number Publication date
US20060089924A1 (en) 2006-04-27
US7971150B2 (en) 2011-06-28
US20040100022A1 (en) 2004-05-27
WO2002025479B1 (en) 2002-05-10
NZ524988A (en) 2005-02-25
WO2002025479A1 (en) 2002-03-28
CA2423033C (en) 2012-12-04
EP1323078A1 (en) 2003-07-02
AUPR033800A0 (en) 2000-10-19
EP1323078A4 (en) 2005-03-02

Similar Documents

Publication Publication Date Title
CA2423033A1 (en) A document categorisation system
Gamon Linguistic correlates of style: authorship classification with deep linguistic analysis features
CN109255113A (en) Intelligent critique system
CN107943786B (en) Chinese named entity recognition method and system
Yatsko et al. Automatic genre recognition and adaptive text summarization
KR20140080089A (en) A speech recognition device and speech recognition method, database for the speech recognition device, and constructing method of the database for the speech recognition device
Alshutayri et al. Arabic language WEKA-based dialect classifier for Arabic automatic speech recognition transcripts
CN101963972A (en) Method and system for extracting emotional keywords
Bigot et al. Person name recognition in ASR outputs using continuous context models
Oroumchian et al. Creating a feasible corpus for Persian POS tagging
Singh et al. Writing Style Change Detection on Multi-Author Documents.
Wu et al. Using a knowledge base to automatically annotate speech corpora and to identify sociolinguistic variation
Atwell et al. Pattern recognition applied to the acquisition of a grammatical classification system from unrestricted English text
Wieling et al. Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology
Origlia et al. Automatic classification of emotions via global and local prosodic features on a multilingual emotional database
CN111680493B (en) English text analysis method and device, readable storage medium and computer equipment
Zbancioc et al. Statistical characteristics of the formants of the Romanian vowels in emotional states
CN114880496A (en) Multimedia information topic analysis method, device, equipment and storage medium
Silipo et al. Prosodic stress and topic detection in spoken sentences
KR102372629B1 (en) Triple Extraction method using Pointer Network and the extraction apparatus
Jouvet et al. About vocabulary adaptation for automatic speech recognition of video data
Sun et al. Using maximum entropy model to extract protein-protein interaction information from biomedical literature
KR101318674B1 (en) Word recognition apparatus by using n-gram
Asker et al. Applying machine learning to Amharic text classification
Garg et al. Identification and Classification of Reduplication Words in Punjabi Language

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed Effective date: 20170925