CA2411227A1 - System and method of creating and using compact linguistic data - Google Patents

Info

Publication number
CA2411227A1
Authority
CA
Canada
Prior art keywords
words
creating
linguistic data
mapped
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002411227A
Other languages
French (fr)
Other versions
CA2411227C (en)
Inventor
Vadim Fux
Michael G. Elizarov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BlackBerry Ltd
Original Assignee
2012244 Ontario Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 2012244 Ontario Inc
Priority to DE60336856T (DE60336856D1)
Priority to PCT/CA2003/001023 (WO2004006122A2)
Priority to AT03762372T (ATE506651T1)
Priority to AU2003249793A (AU2003249793A1)
Priority to JP2004518331A (JP4382663B2)
Priority to EP03762372A (EP1631920B1)
Publication of CA2411227A1
Priority to HK06108040.7A (HK1091668A1)
Application granted
Publication of CA2411227C
Priority to JP2009145681A (JP2009266244A)
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S: TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00: Data processing: database and file management or data structures
    • Y10S707/99931: Database or file accessing
    • Y10S707/99937: Sorting

Abstract

A system and method of creating and using compact linguistic data are provided. Frequencies of words appearing in a corpus are calculated. Each unique character in the words is mapped to a character index, and characters in the words are replaced with the character indexes. Sequences of characters are mapped to substitution indexes, and the sequences of characters in the words are replaced with the substitution indexes. The words are grouped by common prefixes, and each prefix is mapped to location information for the group of words which start with the prefix.
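
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_compact_data`, `decode`), the fixed prefix length, the restriction of "sequences of characters" to adjacent character pairs, and the use of a flat word table with (offset, count) pairs as the "location information" are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_compact_data(corpus_words, prefix_len=2, max_subs=4):
    """Illustrative sketch only; all names and parameters are hypothetical."""
    # Step 1: frequencies of words appearing in the corpus.
    freqs = Counter(corpus_words)
    words = sorted(freqs)

    # Step 2: map each unique character to a character index and
    # replace the characters in every word with those indexes.
    alphabet = sorted({ch for w in words for ch in w})
    char_index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = {w: [char_index[ch] for ch in w] for w in words}

    # Step 3: map frequent character sequences (assumed here: adjacent
    # pairs, weighted by word frequency) to substitution indexes that
    # are numbered after the character indexes.
    pair_counts = Counter()
    for w in words:
        seq = encoded[w]
        for pair in zip(seq, seq[1:]):
            pair_counts[pair] += freqs[w]
    top_pairs = [pair for pair, _ in pair_counts.most_common(max_subs)]
    sub_index = {pair: len(alphabet) + i for i, pair in enumerate(top_pairs)}

    def substitute(seq):
        # Greedy left-to-right replacement of known pairs.
        out, i = [], 0
        while i < len(seq):
            pair = tuple(seq[i:i + 2])
            if pair in sub_index:
                out.append(sub_index[pair])
                i += 2
            else:
                out.append(seq[i])
                i += 1
        return out

    # Step 4: group words by common prefix and map each prefix to the
    # location (offset, count) of its group in a flat word table.
    groups = defaultdict(list)
    for w in words:
        groups[w[:prefix_len]].append(w)
    table, prefix_location = [], {}
    for prefix in sorted(groups):
        prefix_location[prefix] = (len(table), len(groups[prefix]))
        for w in groups[prefix]:
            table.append((substitute(encoded[w]), freqs[w]))
    return alphabet, sub_index, prefix_location, table


def decode(codes, alphabet, sub_index):
    """Expand substitution indexes, then map character indexes to text."""
    inverse = {v: k for k, v in sub_index.items()}
    chars = []
    for code in codes:
        for idx in inverse.get(code, (code,)):
            chars.append(alphabet[idx])
    return "".join(chars)
```

In this sketch, looking up the words for a prefix means reading `prefix_location[prefix]` and decoding the corresponding slice of `table`, so only one group has to be expanded at a time. Each table entry here still encodes the whole word, prefix included; that is a simplification chosen to keep the example short.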
CA002411227A 2002-07-03 2002-11-07 System and method of creating and using compact linguistic data Expired - Lifetime CA2411227C (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
PCT/CA2003/001023 WO2004006122A2 (en) 2002-07-03 2003-07-03 System and method of creating and using compact linguistic data
AT03762372T ATE506651T1 (en) 2002-07-03 2003-07-03 SYSTEM AND METHOD FOR GENERATING AND USING COMPACT LINGUISTIC DATA
AU2003249793A AU2003249793A1 (en) 2002-07-03 2003-07-03 System and method of creating and using compact linguistic data
JP2004518331A JP4382663B2 (en) 2002-07-03 2003-07-03 System and method for generating and using concise linguistic data
DE60336856T DE60336856D1 (en) 2002-07-03 2003-07-03 SYSTEM AND METHOD FOR THE PRODUCTION AND USE OF COMPACT LINGUISTIC DATA
EP03762372A EP1631920B1 (en) 2002-07-03 2003-07-03 System and method of creating and using compact linguistic data
HK06108040.7A HK1091668A1 (en) 2002-07-03 2006-07-18 System and method of creating and using compact linguistic data
JP2009145681A JP2009266244A (en) 2002-07-03 2009-06-18 System and method of creating and using compact linguistic data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US39390302P 2002-07-03 2002-07-03
US60/393,903 2002-07-03

Publications (2)

Publication Number Publication Date
CA2411227A1 (en) 2004-01-03
CA2411227C (en) 2007-01-09

Family

ID=30770900

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002411227A Expired - Lifetime CA2411227C (en) 2002-07-03 2002-11-07 System and method of creating and using compact linguistic data

Country Status (6)

Country Link
US (3) US7269548B2 (en)
JP (1) JP2009266244A (en)
CN (1) CN1703692A (en)
AT (1) ATE506651T1 (en)
CA (1) CA2411227C (en)
HK (1) HK1091668A1 (en)

Families Citing this family (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USRE43082E1 (en) 1998-12-10 2012-01-10 Eatoni Ergonomics, Inc. Touch-typable devices based on ambiguous codes and methods to design such devices
US7312726B2 (en) 2004-06-02 2007-12-25 Research In Motion Limited Handheld electronic device with text disambiguation
US7091885B2 (en) 2004-06-02 2006-08-15 2012244 Ontario Inc. Handheld electronic device with text disambiguation
US7711542B2 (en) * 2004-08-31 2010-05-04 Research In Motion Limited System and method for multilanguage text input in a handheld electronic device
US7895218B2 (en) 2004-11-09 2011-02-22 Veveo, Inc. Method and system for performing searches for television content using reduced text input
FR2878344B1 (en) * 2004-11-22 2012-12-21 Sionnest Laurent Guyot DATA CONTROLLER AND INPUT DEVICE
CA2589942A1 (en) * 2004-12-01 2006-08-17 Whitesmoke, Inc. System and method for automatic enrichment of documents
US7779011B2 (en) 2005-08-26 2010-08-17 Veveo, Inc. Method and system for dynamically processing ambiguous, reduced text search queries and highlighting results thereof
US7788266B2 (en) 2005-08-26 2010-08-31 Veveo, Inc. Method and system for processing ambiguous, multi-term search queries
US7644054B2 (en) 2005-11-23 2010-01-05 Veveo, Inc. System and method for finding desired results by incremental search using an ambiguous keypad with the input containing orthographic and typographic errors
US7835998B2 (en) 2006-03-06 2010-11-16 Veveo, Inc. Methods and systems for selecting and presenting content on a first system based on user preferences learned on a second system
US8073860B2 (en) 2006-03-30 2011-12-06 Veveo, Inc. Method and system for incrementally selecting and providing relevant search engines in response to a user query
WO2007124429A2 (en) 2006-04-20 2007-11-01 Veveo, Inc. User interface methods and systems for selecting and presenting content based on user navigation and selection actions associated with the content
US7646868B2 (en) * 2006-08-29 2010-01-12 Intel Corporation Method for steganographic cryptography
US8423908B2 (en) * 2006-09-08 2013-04-16 Research In Motion Limited Method for identifying language of text in a handheld electronic device and a handheld electronic device incorporating the same
US7752193B2 (en) * 2006-09-08 2010-07-06 Guidance Software, Inc. System and method for building and retrieving a full text index
CA3163292A1 (en) 2006-09-14 2008-03-20 Veveo, Inc. Methods and systems for dynamically rearranging search results into hierarchically organized concept clusters
WO2008045690A2 (en) 2006-10-06 2008-04-17 Veveo, Inc. Linear character selection display interface for ambiguous text input
US20080091427A1 (en) * 2006-10-11 2008-04-17 Nokia Corporation Hierarchical word indexes used for efficient N-gram storage
US8078884B2 (en) 2006-11-13 2011-12-13 Veveo, Inc. Method of and system for selecting and presenting content based on user identification
AU2007323859A1 (en) * 2006-11-19 2008-05-29 Rmax, Llc Internet-based computer for mobile and thin client users
US8048363B2 (en) * 2006-11-20 2011-11-01 Kimberly Clark Worldwide, Inc. Container with an in-mold label
US8103499B2 (en) * 2007-03-22 2012-01-24 Tegic Communications, Inc. Disambiguation of telephone style key presses to yield Chinese text using segmentation and selective shifting
WO2008148012A1 (en) 2007-05-25 2008-12-04 Veveo, Inc. System and method for text disambiguation and context designation in incremental search
US8176419B2 (en) * 2007-12-19 2012-05-08 Microsoft Corporation Self learning contextual spell corrector
JP2009245308A (en) * 2008-03-31 2009-10-22 Fujitsu Ltd Document proofreading support program, document proofreading support method, and document proofreading support apparatus
US7663511B2 (en) * 2008-06-18 2010-02-16 Microsoft Corporation Dynamic character encoding
US7730061B2 (en) * 2008-09-12 2010-06-01 International Business Machines Corporation Fast-approximate TFIDF
CN101533403B (en) * 2008-11-07 2010-12-01 广东国笔科技股份有限公司 Derivative generating method and system
US20100332215A1 (en) * 2009-06-26 2010-12-30 Nokia Corporation Method and apparatus for converting text input
US20110191330A1 (en) 2010-02-04 2011-08-04 Veveo, Inc. Method of and System for Enhanced Content Discovery Based on Network and Device Access Behavior
EP2602724A4 (en) * 2010-08-06 2016-08-17 International Business Machines Corp Method of character string generation, program and system
JP5392227B2 (en) * 2010-10-14 2014-01-22 株式会社Jvcケンウッド Filtering apparatus and filtering method
JP5392228B2 (en) * 2010-10-14 2014-01-22 株式会社Jvcケンウッド Program search device and program search method
JP5605288B2 (en) * 2011-03-31 2014-10-15 富士通株式会社 Appearance map generation method, file extraction method, appearance map generation program, file extraction program, appearance map generation device, and file extraction device
JPWO2012150637A1 (en) * 2011-05-02 2014-07-28 富士通株式会社 Extraction method, information processing method, extraction program, information processing program, extraction device, and information processing device
US8924446B2 (en) 2011-12-29 2014-12-30 Verisign, Inc. Compression of small strings
CN102831224B (en) * 2012-08-24 2018-09-04 北京百度网讯科技有限公司 Method for building a data index library, and method and device for generating search suggestions
US9329778B2 (en) * 2012-09-07 2016-05-03 International Business Machines Corporation Supplementing a virtual input keyboard
US10304465B2 (en) 2012-10-30 2019-05-28 Google Technology Holdings LLC Voice control user interface for low power mode
US9584642B2 (en) 2013-03-12 2017-02-28 Google Technology Holdings LLC Apparatus with adaptive acoustic echo control for speakerphone mode
US10381002B2 (en) 2012-10-30 2019-08-13 Google Technology Holdings LLC Voice control user interface during low-power mode
US10373615B2 (en) 2012-10-30 2019-08-06 Google Technology Holdings LLC Voice control user interface during low power mode
USD788115S1 (en) 2013-03-15 2017-05-30 H2 & Wf3 Research, Llc. Display screen with graphical user interface for a document management system
USD772898S1 (en) 2013-03-15 2016-11-29 H2 & Wf3 Research, Llc Display screen with graphical user interface for a document management system
US8788263B1 (en) * 2013-03-15 2014-07-22 Steven E. Richfield Natural language processing for analyzing internet content and finding solutions to needs expressed in text
US9805018B1 (en) 2013-03-15 2017-10-31 Steven E. Richfield Natural language processing for analyzing internet content and finding solutions to needs expressed in text
WO2015073349A1 (en) * 2013-11-14 2015-05-21 3M Innovative Properties Company Systems and methods for obfuscating data using dictionary
US8768712B1 (en) 2013-12-04 2014-07-01 Google Inc. Initiating actions based on partial hotwords
US20160170971A1 (en) * 2014-12-15 2016-06-16 Nuance Communications, Inc. Optimizing a language model based on a topic of correspondence messages
US9799049B2 (en) * 2014-12-15 2017-10-24 Nuance Communications, Inc. Enhancing a message by providing supplemental content in the message
KR20180031291A (en) * 2016-09-19 2018-03-28 삼성전자주식회사 Multilingual Prediction and Translation Keyboard
US10120860B2 (en) * 2016-12-21 2018-11-06 Intel Corporation Methods and apparatus to identify a count of n-grams appearing in a corpus
US10877998B2 (en) * 2017-07-06 2020-12-29 Durga Turaga Highly atomized segmented and interrogatable data systems (HASIDS)
US10740381B2 (en) * 2018-07-18 2020-08-11 International Business Machines Corporation Dictionary editing system integrated with text mining
CN110673836B (en) * 2019-08-22 2023-05-23 创新先进技术有限公司 Code completion method, device, computing equipment and storage medium

Family Cites Families (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4403303A (en) * 1981-05-15 1983-09-06 Beehive International Terminal configuration manager
US4500955A (en) 1981-12-31 1985-02-19 International Business Machines Corporation Full word coding for information processing
US4814746A (en) * 1983-06-01 1989-03-21 International Business Machines Corporation Data compression method
US4843389A (en) * 1986-12-04 1989-06-27 International Business Machines Corp. Text compression and expansion method and apparatus
US4864503A (en) * 1987-02-05 1989-09-05 Toltran, Ltd. Method of using a created international language as an intermediate pathway in translation between two national languages
US5126739A (en) * 1989-01-13 1992-06-30 Stac Electronics Data compression apparatus and method
US5146221A (en) * 1989-01-13 1992-09-08 Stac, Inc. Data compression apparatus and method
DE69118250T2 (en) * 1990-01-19 1996-10-17 Hewlett Packard Ltd ACCESS FOR COMPRESSED DATA
US5254990A (en) * 1990-02-26 1993-10-19 Fujitsu Limited Method and apparatus for compression and decompression of data
EP0688104A2 (en) * 1990-08-13 1995-12-20 Fujitsu Limited Data compression method and apparatus
GB2266822B (en) * 1990-12-21 1995-05-10 British Telecomm Speech coding
US5325091A (en) * 1992-08-13 1994-06-28 Xerox Corporation Text-compression technique using frequency-ordered array of word-number mappers
US5657423A (en) * 1993-02-22 1997-08-12 Texas Instruments Incorporated Hardware filter circuit and address circuitry for MPEG encoded data
US5509088A (en) * 1993-12-06 1996-04-16 Xerox Corporation Method for converting CCITT compressed data using a balanced tree
JPH07192095A (en) 1993-12-27 1995-07-28 Nec Corp Character string input device
US5798721A (en) * 1994-03-14 1998-08-25 Mita Industrial Co., Ltd. Method and apparatus for compressing text data
US5684478A (en) * 1994-12-06 1997-11-04 Cennoid Technologies, Inc. Method and apparatus for adaptive data compression
US5847697A (en) * 1995-01-31 1998-12-08 Fujitsu Limited Single-handed keyboard having keys with multiple characters and character ambiguity resolution logic
US5818437A (en) * 1995-07-26 1998-10-06 Tegic Communications, Inc. Reduced keyboard disambiguating computer
GB2305746B (en) 1995-09-27 2000-03-29 Canon Res Ct Europe Ltd Data compression apparatus
US5778361A (en) * 1995-09-29 1998-07-07 Microsoft Corporation Method and system for fast indexing and searching of text in compound-word languages
JP3566441B2 (en) * 1996-01-30 2004-09-15 シャープ株式会社 Dictionary creation device for text compression
US6169672B1 (en) * 1996-07-03 2001-01-02 Hitachi, Ltd. Power converter with clamping circuit
US5951623A (en) * 1996-08-06 1999-09-14 Reynar; Jeffrey C. Lempel-Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases
US6023670A (en) * 1996-08-19 2000-02-08 International Business Machines Corporation Natural language determination using correlation between common words
AU6313298A (en) * 1997-02-24 1998-09-22 Rodney John Smith Improvements relating to data compression
US6618506B1 (en) * 1997-09-23 2003-09-09 International Business Machines Corporation Method and apparatus for improved compression and decompression
JPH11143877A (en) * 1997-10-22 1999-05-28 Internatl Business Mach Corp <Ibm> Compression method, method for compressing entry index data and machine translation system
US5896321A (en) * 1997-11-14 1999-04-20 Microsoft Corporation Text completion system for a miniature computer
US6075470A (en) * 1998-02-26 2000-06-13 Research In Motion Limited Block-wise adaptive statistical data compressor
US6646573B1 (en) * 1998-12-04 2003-11-11 America Online, Inc. Reduced keyboard text input system for the Japanese language
US6219731B1 (en) * 1998-12-10 2001-04-17 Eatoni Ergonomics, Inc. Method and apparatus for improved multi-tap text input
GB2347240A (en) * 1999-02-22 2000-08-30 Nokia Mobile Phones Ltd Communication terminal having a predictive editor application
US6668092B1 (en) * 1999-07-30 2003-12-23 Sun Microsystems, Inc. Memory efficient variable-length encoding/decoding system
US6904402B1 (en) * 1999-11-05 2005-06-07 Microsoft Corporation System and iterative method for lexicon, segmentation and language model joint optimization
US6516305B1 (en) * 2000-01-14 2003-02-04 Microsoft Corporation Automatic inference of models for statistical code compression
EP1213643A1 (en) * 2000-12-05 2002-06-12 Inventec Appliances Corp. Intelligent dictionary input method
US7103534B2 (en) * 2001-03-31 2006-09-05 Microsoft Corporation Machine learning contextual approach to word determination for text input via reduced keypad keys
US6400286B1 (en) * 2001-06-20 2002-06-04 Unisys Corporation Data compression method and apparatus implemented with limited length character tables
US6587057B2 (en) * 2001-07-25 2003-07-01 Quicksilver Technology, Inc. High performance memory efficient variable-length coding decoder
US6653954B2 (en) * 2001-11-07 2003-11-25 International Business Machines Corporation System and method for efficient data compression
US20030182279A1 (en) * 2002-03-19 2003-09-25 Willows Kevin John Progressive prefix input method for data entry
US6657565B2 (en) * 2002-03-21 2003-12-02 International Business Machines Corporation Method and system for improving lossless compression efficiency

Also Published As

Publication number Publication date
JP2009266244A (en) 2009-11-12
HK1091668A1 (en) 2007-01-26
CN1703692A (en) 2005-11-30
CA2411227C (en) 2007-01-09
US20040006455A1 (en) 2004-01-08
US7269548B2 (en) 2007-09-11
ATE506651T1 (en) 2011-05-15
US7809553B2 (en) 2010-10-05
US20080015844A1 (en) 2008-01-17
US20100211381A1 (en) 2010-08-19

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry (effective date: 2022-11-07)