CA2411227A1 - System and method of creating and using compact linguistic data - Google Patents
- Publication number
- CA2411227A1
- Authority
- CA
- Canada
- Prior art keywords
- words
- creating
- linguistic data
- mapped
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99937—Sorting
Abstract
A system and method of creating and using compact linguistic data are provided. Frequencies of words appearing in a corpus are calculated. Each unique character in the words is mapped to a character index, and characters in the words are replaced with the character indexes. Sequences of characters are mapped to substitution indexes, and the sequences of characters in the words are replaced with the substitution indexes. The words are grouped by common prefixes, and each prefix is mapped to location information for the group of words which start with the prefix.
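The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_compact_data`, `decode`), the fixed prefix length, the use of two-character sequences, the greedy left-to-right substitution, and the `(offset, count)` form of the location information are all assumptions made for illustration only.

```python
from collections import Counter


def build_compact_data(corpus_words, prefix_len=2, seq_len=2, max_subs=2):
    """Illustrative sketch only; all parameters are hypothetical."""
    # Step 1: calculate the frequency of each word in the corpus.
    freqs = Counter(corpus_words)
    words = sorted(freqs)

    # Step 2: map each unique character to a character index and
    # replace the characters in every word with those indexes.
    chars = sorted({c for w in words for c in w})
    char_index = {c: i for i, c in enumerate(chars)}
    encoded = {w: [char_index[c] for c in w] for w in words}

    # Step 3: map the most frequent sequences to substitution indexes,
    # numbered after the character indexes (an assumed convention).
    seq_counts = Counter()
    for w, codes in encoded.items():
        for i in range(len(codes) - seq_len + 1):
            seq_counts[tuple(codes[i:i + seq_len])] += freqs[w]
    top = sorted(seq_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_subs]
    sub_index = {seq: len(chars) + n for n, (seq, _) in enumerate(top)}

    def substitute(codes):
        # Greedy left-to-right replacement of known sequences.
        out, i = [], 0
        while i < len(codes):
            seq = tuple(codes[i:i + seq_len])
            if len(seq) == seq_len and seq in sub_index:
                out.append(sub_index[seq])
                i += seq_len
            else:
                out.append(codes[i])
                i += 1
        return out

    # Step 4: group words by common prefix and map each prefix to the
    # location (offset, count) of its group in the sorted word list.
    entries, prefix_table = [], {}
    for w in words:
        prefix = w[:prefix_len]
        if prefix not in prefix_table:
            prefix_table[prefix] = [len(entries), 0]
        prefix_table[prefix][1] += 1
        entries.append((substitute(encoded[w]), freqs[w]))
    return char_index, sub_index, prefix_table, entries


def decode(codes, char_index, sub_index):
    """Expand substitution indexes and character indexes back to text."""
    inv_char = {i: c for c, i in char_index.items()}
    inv_sub = {i: seq for seq, i in sub_index.items()}
    out = []
    for code in codes:
        for c in inv_sub.get(code, (code,)):
            out.append(inv_char[c])
    return "".join(out)
```

Looking up every word that starts with a given prefix then reduces to reading `count` entries starting at `offset`, which is the role the abstract assigns to the per-prefix location information.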
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CA2003/001023 WO2004006122A2 (en) | 2002-07-03 | 2003-07-03 | System and method of creating and using compact linguistic data |
AT03762372T ATE506651T1 (en) | 2002-07-03 | 2003-07-03 | SYSTEM AND METHOD FOR GENERATING AND USING COMPACT LINGUISTIC DATA |
AU2003249793A AU2003249793A1 (en) | 2002-07-03 | 2003-07-03 | System and method of creating and using compact linguistic data |
JP2004518331A JP4382663B2 (en) | 2002-07-03 | 2003-07-03 | System and method for generating and using concise linguistic data |
DE60336856T DE60336856D1 (en) | 2002-07-03 | 2003-07-03 | SYSTEM AND METHOD FOR THE PRODUCTION AND USE OF COMPACT LINGUISTIC DATA |
EP03762372A EP1631920B1 (en) | 2002-07-03 | 2003-07-03 | System and method of creating and using compact linguistic data |
HK06108040.7A HK1091668A1 (en) | 2002-07-03 | 2006-07-18 | System and method of creating and using compact linguistic data |
JP2009145681A JP2009266244A (en) | 2002-07-03 | 2009-06-18 | System and method of creating and using compact linguistic data |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US39390302P | 2002-07-03 | 2002-07-03 | |
US60/393,903 | 2002-07-03 |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2411227A1 (en) | 2004-01-03 |
CA2411227C (en) | 2007-01-09 |
Family
ID=30770900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002411227A Expired - Lifetime CA2411227C (en) | 2002-07-03 | 2002-11-07 | System and method of creating and using compact linguistic data |
Country Status (6)
Country | Link |
---|---|
US (3) | US7269548B2 (en) |
JP (1) | JP2009266244A (en) |
CN (1) | CN1703692A (en) |
AT (1) | ATE506651T1 (en) |
CA (1) | CA2411227C (en) |
HK (1) | HK1091668A1 (en) |
Families Citing this family (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE43082E1 (en) | 1998-12-10 | 2012-01-10 | Eatoni Ergonomics, Inc. | Touch-typable devices based on ambiguous codes and methods to design such devices |
US7312726B2 (en) | 2004-06-02 | 2007-12-25 | Research In Motion Limited | Handheld electronic device with text disambiguation |
US7091885B2 (en) | 2004-06-02 | 2006-08-15 | 2012244 Ontario Inc. | Handheld electronic device with text disambiguation |
US7711542B2 (en) * | 2004-08-31 | 2010-05-04 | Research In Motion Limited | System and method for multilanguage text input in a handheld electronic device |
US7895218B2 (en) | 2004-11-09 | 2011-02-22 | Veveo, Inc. | Method and system for performing searches for television content using reduced text input |
FR2878344B1 (en) * | 2004-11-22 | 2012-12-21 | Sionnest Laurent Guyot | DATA CONTROLLER AND INPUT DEVICE |
CA2589942A1 (en) * | 2004-12-01 | 2006-08-17 | Whitesmoke, Inc. | System and method for automatic enrichment of documents |
US7779011B2 (en) | 2005-08-26 | 2010-08-17 | Veveo, Inc. | Method and system for dynamically processing ambiguous, reduced text search queries and highlighting results thereof |
US7788266B2 (en) | 2005-08-26 | 2010-08-31 | Veveo, Inc. | Method and system for processing ambiguous, multi-term search queries |
US7644054B2 (en) | 2005-11-23 | 2010-01-05 | Veveo, Inc. | System and method for finding desired results by incremental search using an ambiguous keypad with the input containing orthographic and typographic errors |
US7835998B2 (en) | 2006-03-06 | 2010-11-16 | Veveo, Inc. | Methods and systems for selecting and presenting content on a first system based on user preferences learned on a second system |
US8073860B2 (en) | 2006-03-30 | 2011-12-06 | Veveo, Inc. | Method and system for incrementally selecting and providing relevant search engines in response to a user query |
WO2007124429A2 (en) | 2006-04-20 | 2007-11-01 | Veveo, Inc. | User interface methods and systems for selecting and presenting content based on user navigation and selection actions associated with the content |
US7646868B2 (en) * | 2006-08-29 | 2010-01-12 | Intel Corporation | Method for steganographic cryptography |
US8423908B2 (en) * | 2006-09-08 | 2013-04-16 | Research In Motion Limited | Method for identifying language of text in a handheld electronic device and a handheld electronic device incorporating the same |
US7752193B2 (en) * | 2006-09-08 | 2010-07-06 | Guidance Software, Inc. | System and method for building and retrieving a full text index |
CA3163292A1 (en) | 2006-09-14 | 2008-03-20 | Veveo, Inc. | Methods and systems for dynamically rearranging search results into hierarchically organized concept clusters |
WO2008045690A2 (en) | 2006-10-06 | 2008-04-17 | Veveo, Inc. | Linear character selection display interface for ambiguous text input |
US20080091427A1 (en) * | 2006-10-11 | 2008-04-17 | Nokia Corporation | Hierarchical word indexes used for efficient N-gram storage |
US8078884B2 (en) | 2006-11-13 | 2011-12-13 | Veveo, Inc. | Method of and system for selecting and presenting content based on user identification |
AU2007323859A1 (en) * | 2006-11-19 | 2008-05-29 | Rmax, Llc | Internet-based computer for mobile and thin client users |
US8048363B2 (en) * | 2006-11-20 | 2011-11-01 | Kimberly Clark Worldwide, Inc. | Container with an in-mold label |
US8103499B2 (en) * | 2007-03-22 | 2012-01-24 | Tegic Communications, Inc. | Disambiguation of telephone style key presses to yield Chinese text using segmentation and selective shifting |
WO2008148012A1 (en) | 2007-05-25 | 2008-12-04 | Veveo, Inc. | System and method for text disambiguation and context designation in incremental search |
US8176419B2 (en) * | 2007-12-19 | 2012-05-08 | Microsoft Corporation | Self learning contextual spell corrector |
JP2009245308A (en) * | 2008-03-31 | 2009-10-22 | Fujitsu Ltd | Document proofreading support program, document proofreading support method, and document proofreading support apparatus |
US7663511B2 (en) * | 2008-06-18 | 2010-02-16 | Microsoft Corporation | Dynamic character encoding |
US7730061B2 (en) * | 2008-09-12 | 2010-06-01 | International Business Machines Corporation | Fast-approximate TFIDF |
CN101533403B (en) * | 2008-11-07 | 2010-12-01 | 广东国笔科技股份有限公司 | Derivative generating method and system |
US20100332215A1 (en) * | 2009-06-26 | 2010-12-30 | Nokia Corporation | Method and apparatus for converting text input |
US20110191330A1 (en) | 2010-02-04 | 2011-08-04 | Veveo, Inc. | Method of and System for Enhanced Content Discovery Based on Network and Device Access Behavior |
EP2602724A4 (en) * | 2010-08-06 | 2016-08-17 | Intellectual Business Machines Corp | Method of character string generation, program and system |
JP5392227B2 (en) * | 2010-10-14 | 2014-01-22 | 株式会社Jvcケンウッド | Filtering apparatus and filtering method |
JP5392228B2 (en) * | 2010-10-14 | 2014-01-22 | 株式会社Jvcケンウッド | Program search device and program search method |
JP5605288B2 (en) * | 2011-03-31 | 2014-10-15 | 富士通株式会社 | Appearance map generation method, file extraction method, appearance map generation program, file extraction program, appearance map generation device, and file extraction device |
JPWO2012150637A1 (en) * | 2011-05-02 | 2014-07-28 | 富士通株式会社 | Extraction method, information processing method, extraction program, information processing program, extraction device, and information processing device |
US8924446B2 (en) | 2011-12-29 | 2014-12-30 | Verisign, Inc. | Compression of small strings |
CN102831224B (en) * | 2012-08-24 | 2018-09-04 | 北京百度网讯科技有限公司 | Generation method and device are suggested in a kind of method for building up in data directory library, search |
US9329778B2 (en) * | 2012-09-07 | 2016-05-03 | International Business Machines Corporation | Supplementing a virtual input keyboard |
US10304465B2 (en) | 2012-10-30 | 2019-05-28 | Google Technology Holdings LLC | Voice control user interface for low power mode |
US9584642B2 (en) | 2013-03-12 | 2017-02-28 | Google Technology Holdings LLC | Apparatus with adaptive acoustic echo control for speakerphone mode |
US10381002B2 (en) | 2012-10-30 | 2019-08-13 | Google Technology Holdings LLC | Voice control user interface during low-power mode |
US10373615B2 (en) | 2012-10-30 | 2019-08-06 | Google Technology Holdings LLC | Voice control user interface during low power mode |
USD788115S1 (en) | 2013-03-15 | 2017-05-30 | H2 & Wf3 Research, Llc. | Display screen with graphical user interface for a document management system |
USD772898S1 (en) | 2013-03-15 | 2016-11-29 | H2 & Wf3 Research, Llc | Display screen with graphical user interface for a document management system |
US8788263B1 (en) * | 2013-03-15 | 2014-07-22 | Steven E. Richfield | Natural language processing for analyzing internet content and finding solutions to needs expressed in text |
US9805018B1 (en) | 2013-03-15 | 2017-10-31 | Steven E. Richfield | Natural language processing for analyzing internet content and finding solutions to needs expressed in text |
WO2015073349A1 (en) * | 2013-11-14 | 2015-05-21 | 3M Innovative Properties Company | Systems and methods for obfuscating data using dictionary |
US8768712B1 (en) | 2013-12-04 | 2014-07-01 | Google Inc. | Initiating actions based on partial hotwords |
US20160170971A1 (en) * | 2014-12-15 | 2016-06-16 | Nuance Communications, Inc. | Optimizing a language model based on a topic of correspondence messages |
US9799049B2 (en) * | 2014-12-15 | 2017-10-24 | Nuance Communications, Inc. | Enhancing a message by providing supplemental content in the message |
KR20180031291A (en) * | 2016-09-19 | 2018-03-28 | 삼성전자주식회사 | Multilingual Prediction and Translation Keyboard |
US10120860B2 (en) * | 2016-12-21 | 2018-11-06 | Intel Corporation | Methods and apparatus to identify a count of n-grams appearing in a corpus |
US10877998B2 (en) * | 2017-07-06 | 2020-12-29 | Durga Turaga | Highly atomized segmented and interrogatable data systems (HASIDS) |
US10740381B2 (en) * | 2018-07-18 | 2020-08-11 | International Business Machines Corporation | Dictionary editing system integrated with text mining |
CN110673836B (en) * | 2019-08-22 | 2023-05-23 | 创新先进技术有限公司 | Code complement method, device, computing equipment and storage medium |
Family Cites Families (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4403303A (en) * | 1981-05-15 | 1983-09-06 | Beehive International | Terminal configuration manager |
US4500955A (en) | 1981-12-31 | 1985-02-19 | International Business Machines Corporation | Full word coding for information processing |
US4814746A (en) * | 1983-06-01 | 1989-03-21 | International Business Machines Corporation | Data compression method |
US4843389A (en) * | 1986-12-04 | 1989-06-27 | International Business Machines Corp. | Text compression and expansion method and apparatus |
US4864503A (en) * | 1987-02-05 | 1989-09-05 | Toltran, Ltd. | Method of using a created international language as an intermediate pathway in translation between two national languages |
US5126739A (en) * | 1989-01-13 | 1992-06-30 | Stac Electronics | Data compression apparatus and method |
US5146221A (en) * | 1989-01-13 | 1992-09-08 | Stac, Inc. | Data compression apparatus and method |
DE69118250T2 (en) * | 1990-01-19 | 1996-10-17 | Hewlett Packard Ltd | ACCESS FOR COMPRESSED DATA |
US5254990A (en) * | 1990-02-26 | 1993-10-19 | Fujitsu Limited | Method and apparatus for compression and decompression of data |
EP0688104A2 (en) * | 1990-08-13 | 1995-12-20 | Fujitsu Limited | Data compression method and apparatus |
GB2266822B (en) * | 1990-12-21 | 1995-05-10 | British Telecomm | Speech coding |
US5325091A (en) * | 1992-08-13 | 1994-06-28 | Xerox Corporation | Text-compression technique using frequency-ordered array of word-number mappers |
US5657423A (en) * | 1993-02-22 | 1997-08-12 | Texas Instruments Incorporated | Hardware filter circuit and address circuitry for MPEG encoded data |
US5509088A (en) * | 1993-12-06 | 1996-04-16 | Xerox Corporation | Method for converting CCITT compressed data using a balanced tree |
JPH07192095A (en) | 1993-12-27 | 1995-07-28 | Nec Corp | Character string input device |
US5798721A (en) * | 1994-03-14 | 1998-08-25 | Mita Industrial Co., Ltd. | Method and apparatus for compressing text data |
US5684478A (en) * | 1994-12-06 | 1997-11-04 | Cennoid Technologies, Inc. | Method and apparatus for adaptive data compression |
US5847697A (en) * | 1995-01-31 | 1998-12-08 | Fujitsu Limited | Single-handed keyboard having keys with multiple characters and character ambiguity resolution logic |
US5818437A (en) * | 1995-07-26 | 1998-10-06 | Tegic Communications, Inc. | Reduced keyboard disambiguating computer |
GB2305746B (en) | 1995-09-27 | 2000-03-29 | Canon Res Ct Europe Ltd | Data compression apparatus |
US5778361A (en) * | 1995-09-29 | 1998-07-07 | Microsoft Corporation | Method and system for fast indexing and searching of text in compound-word languages |
JP3566441B2 (en) * | 1996-01-30 | 2004-09-15 | シャープ株式会社 | Dictionary creation device for text compression |
US6169672B1 (en) * | 1996-07-03 | 2001-01-02 | Hitachi, Ltd. | Power converter with clamping circuit |
US5951623A (en) * | 1996-08-06 | 1999-09-14 | Reynar; Jeffrey C. | Lempel- Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases |
US6023670A (en) * | 1996-08-19 | 2000-02-08 | International Business Machines Corporation | Natural language determination using correlation between common words |
AU6313298A (en) * | 1997-02-24 | 1998-09-22 | Rodney John Smith | Improvements relating to data compression |
US6618506B1 (en) * | 1997-09-23 | 2003-09-09 | International Business Machines Corporation | Method and apparatus for improved compression and decompression |
JPH11143877A (en) * | 1997-10-22 | 1999-05-28 | Internatl Business Mach Corp <Ibm> | Compression method, method for compressing entry index data and machine translation system |
US5896321A (en) * | 1997-11-14 | 1999-04-20 | Microsoft Corporation | Text completion system for a miniature computer |
US6075470A (en) * | 1998-02-26 | 2000-06-13 | Research In Motion Limited | Block-wise adaptive statistical data compressor |
US6646573B1 (en) * | 1998-12-04 | 2003-11-11 | America Online, Inc. | Reduced keyboard text input system for the Japanese language |
US6219731B1 (en) * | 1998-12-10 | 2001-04-17 | Eatoni Ergonomics, Inc. | Method and apparatus for improved multi-tap text input |
GB2347240A (en) * | 1999-02-22 | 2000-08-30 | Nokia Mobile Phones Ltd | Communication terminal having a predictive editor application |
US6668092B1 (en) * | 1999-07-30 | 2003-12-23 | Sun Microsystems, Inc. | Memory efficient variable-length encoding/decoding system |
US6904402B1 (en) * | 1999-11-05 | 2005-06-07 | Microsoft Corporation | System and iterative method for lexicon, segmentation and language model joint optimization |
US6516305B1 (en) * | 2000-01-14 | 2003-02-04 | Microsoft Corporation | Automatic inference of models for statistical code compression |
EP1213643A1 (en) * | 2000-12-05 | 2002-06-12 | Inventec Appliances Corp. | Intelligent dictionary input method |
US7103534B2 (en) * | 2001-03-31 | 2006-09-05 | Microsoft Corporation | Machine learning contextual approach to word determination for text input via reduced keypad keys |
US6400286B1 (en) * | 2001-06-20 | 2002-06-04 | Unisys Corporation | Data compression method and apparatus implemented with limited length character tables |
US6587057B2 (en) * | 2001-07-25 | 2003-07-01 | Quicksilver Technology, Inc. | High performance memory efficient variable-length coding decoder |
US6653954B2 (en) * | 2001-11-07 | 2003-11-25 | International Business Machines Corporation | System and method for efficient data compression |
US20030182279A1 (en) * | 2002-03-19 | 2003-09-25 | Willows Kevin John | Progressive prefix input method for data entry |
US6657565B2 (en) * | 2002-03-21 | 2003-12-02 | International Business Machines Corporation | Method and system for improving lossless compression efficiency |
-
2002
- 2002-11-07 CA CA002411227A patent/CA2411227C/en not_active Expired - Lifetime
- 2002-11-07 US US10/289,656 patent/US7269548B2/en active Active
-
2003
- 2003-07-03 AT AT03762372T patent/ATE506651T1/en not_active IP Right Cessation
- 2003-07-03 CN CNA038157594A patent/CN1703692A/en active Pending
-
2006
- 2006-07-18 HK HK06108040.7A patent/HK1091668A1/en not_active IP Right Cessation
-
2007
- 2007-07-17 US US11/778,982 patent/US7809553B2/en not_active Expired - Lifetime
-
2009
- 2009-06-18 JP JP2009145681A patent/JP2009266244A/en active Pending
-
2010
- 2010-04-27 US US12/767,969 patent/US20100211381A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
JP2009266244A (en) | 2009-11-12 |
HK1091668A1 (en) | 2007-01-26 |
CN1703692A (en) | 2005-11-30 |
CA2411227C (en) | 2007-01-09 |
US20040006455A1 (en) | 2004-01-08 |
US7269548B2 (en) | 2007-09-11 |
ATE506651T1 (en) | 2011-05-15 |
US7809553B2 (en) | 2010-10-05 |
US20080015844A1 (en) | 2008-01-17 |
US20100211381A1 (en) | 2010-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2411227A1 (en) | System and method of creating and using compact linguistic data | |
WO2003005288A3 (en) | Method and system for performing a pattern match search for text strings | |
BR9612258B1 (en) | interleukin-1beta converting enzyme inhibitors as well as pharmaceutical composition. | |
SG142159A1 (en) | Index structure of metadata, method for providing indices of metadata, and metadata searching method and apparatus using the indices of metadata | |
SG142156A1 (en) | Index structure of metadata, method for providing indices of metadata, and metadata searching method and apparatus using the indices of metadata | |
ATE297992T1 (en) | RECOMBINANT PRODUCTION OF AROMATIC POLYKETIDES | |
WO2002019176A8 (en) | Data list transmutation and input mapping | |
AU3031897A (en) | Method of coding and decoding stereo audio spectral values | |
ES2186690T3 (en) | ENZYMATIC NUCLEIC ACID CONTAINING NON-NUCLEOTIDES. | |
AU1444697A (en) | Sialyl-lewisa and sialyl-lewisx epitope analogues | |
WO2002037690A3 (en) | A method of generating huffman code length information | |
SE0004319D0 (en) | System and procedure | |
ATE195192T1 (en) | DICTIONARY OF THE ALPHABETICAL FOREIGN LANGUAGE | |
WO2005038584A3 (en) | Matching job candidate information | |
AU2000251210A1 (en) | An alphabet character input device | |
Payne | Jusepe de Ribera: The Rawness of Nature | |
Johnson | The Black Scholar books received--Beyond Ontological Blackness: An Essay on African-American Religious and Cultural Criticism by Victor Anderson | |
TW348235B (en) | Method of spelling check using Pinyin and universal characters | |
Kieffer et al. | A class of noiseless data compression algorithms based on Lempel-Ziv parsing trees | |
De Voogd | The Letters of Laurence Sterne | |
Plentinger et al. | CAMASE: register of agro-ecosystems models, version 2, March, 1996 | |
Kupreeva | St Anselm of Canterbury. Works | |
EP0797360A3 (en) | Method for calculating bit length of code word and variable length code table applied to the method therefor | |
Bertoletti | Deborah Parker. Commentary and Ideology: Dante in the Renaissance. | |
TW200516425A (en) | Character searching method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
MKEX | Expiry | Effective date: 20221107 |