CA2364999A1 - Method and apparatus for terminology translation - Google Patents

Method and apparatus for terminology translation

Info

Publication number
CA2364999A1
Authority
CA
Canada
Prior art keywords
category
language
term
languages
translation
Prior art date
2000-12-18
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002364999A
Other languages
French (fr)
Other versions
CA2364999C (en)
Inventor
David Hull
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xerox Corp
Original Assignee
Xerox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-12-18
Filing date
2001-12-10
Publication date
2002-06-18
Application filed by Xerox Corp
Publication of CA2364999A1
Application granted
Publication of CA2364999C
Anticipated expiration
Current status: Expired - Fee Related


Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data
        • G06F40/20 Natural language analysis > G06F40/237 Lexical tools > G06F40/247 Thesauruses; Synonyms
        • G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/44 Statistical methods, e.g. probability models
        • G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/47 Machine-assisted translation, e.g. using translation memory
        • G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/49 Data-driven translation using very large corpora, e.g. the web

Abstract

The invention relates to a method and apparatus for generating translations of natural language terms from a first language to a second language. A plurality of terms are extracted from unaligned comparable corpora of the first and second languages. Comparable corpora are sets of documents in different languages that come from the same domain and have similar genre and content. Unaligned documents are not translations of one another and are not linked in any other way.
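The abstract does not say how terms are extracted. As a minimal sketch, assuming a "term" is simply a phrase listed in that language's monolingual thesaurus, the hypothetical `extract_terms` function below counts thesaurus terms in one corpus by greedy longest match over lowercase word tokens. Each corpus is scanned on its own, which is what makes unaligned comparable corpora usable: no document in one language ever has to be paired with a document in the other.

```python
import re
from collections import Counter


def extract_terms(documents, vocabulary, max_len=3):
    """Count known thesaurus terms in one monolingual corpus (hypothetical helper).

    Greedy longest match over lowercase word tokens, trying phrases of up to
    max_len words first. Documents are scanned independently, so the corpus
    in the other language never needs to be aligned with this one.
    """
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"\w+", doc.lower())
        i = 0
        while i < len(tokens):
            # Try the longest candidate phrase starting at position i first.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                candidate = " ".join(tokens[i:i + n])
                if candidate in vocabulary:
                    counts[candidate] += 1
                    i += n
                    break
            else:
                # No vocabulary term starts here; move to the next token.
                i += 1
    return counts


# Usage: each language's corpus is processed separately.
english_docs = ["The interest rate rose.", "Inflation and the interest rate."]
english_terms = extract_terms(english_docs, {"interest rate", "inflation"})
print(english_terms)  # Counter({'interest rate': 2, 'inflation': 1})
```

The same call would be run, with a French thesaurus vocabulary, over the French half of the comparable corpus; the two resulting frequency tables are independent of each other.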
Monolingual thesauri of the first and second languages are accessed to assign a category to each extracted term. Category-to-category translation probabilities are then estimated, and these are used to estimate term-to-term translation probabilities. The invention preferably exploits class-based normalization of probability estimates, bi-directionality, and relative frequency normalization. The most important applications are cross-language text retrieval, semi-automatic bilingual thesaurus enhancement, and machine-aided human translation.
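The abstract does not give the estimation formulas, so what follows is only an illustrative sketch of the general idea (category-level probabilities feeding term-level probabilities), not the patented method. It assumes (1) category-to-category probabilities are estimated by counting category pairs over a small, hypothetical seed list of known translations, and (2) a target term t is scored as P(cat(t) | cat(s)) multiplied by t's relative frequency among target-corpus terms in the same category. All function names, category labels, and data are hypothetical.

```python
from collections import Counter, defaultdict


def category_probs(seed_pairs, src_cat, tgt_cat):
    """Estimate P(target category | source category).

    Assumption for illustration: counts category pairs over a small seed list
    of known (source term, target term) translations.
    """
    counts = defaultdict(Counter)
    for s, t in seed_pairs:
        counts[src_cat[s]][tgt_cat[t]] += 1
    return {cs: {ct: n / sum(row.values()) for ct, n in row.items()}
            for cs, row in counts.items()}


def term_probs(term, src_cat, tgt_cat, tgt_freq, cat_probs):
    """Score each target term t as P(cat(t) | cat(term)) * P(t | cat(t)).

    P(t | cat(t)) is taken (assumption) as t's relative frequency among the
    target-corpus terms that share its thesaurus category.
    """
    cat_totals = Counter()
    for t, f in tgt_freq.items():
        cat_totals[tgt_cat[t]] += f
    row = cat_probs.get(src_cat[term], {})
    return {t: row[tgt_cat[t]] * f / cat_totals[tgt_cat[t]]
            for t, f in tgt_freq.items() if tgt_cat[t] in row}


# Hypothetical thesaurus categories for English (source) and French (target).
src_cat = {"loan": "FINANCE", "bank": "FINANCE", "credit": "FINANCE",
           "river": "NATURE"}
tgt_cat = {"prêt": "FINANCE_FR", "banque": "FINANCE_FR",
           "fleuve": "NATURE_FR", "rive": "NATURE_FR"}
seed = [("loan", "prêt"), ("bank", "banque"), ("bank", "rive"),
        ("river", "fleuve")]
# Term frequencies extracted from the (unaligned) French corpus.
fr_freq = {"banque": 30, "prêt": 10, "fleuve": 5, "rive": 5}

probs = term_probs("credit", src_cat, tgt_cat, fr_freq,
                   category_probs(seed, src_cat, tgt_cat))
print(max(probs, key=probs.get))  # banque
```

Although "credit" has no seed translation, it still receives a ranked list of French candidates through its thesaurus category, with "banque" first at probability 0.5. Bi-directionality, which the abstract also mentions, could plausibly be approximated by scoring in the French-to-English direction as well and combining the two, but this sketch does not model it.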

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/737,964 2000-12-18
US09/737,964 US6885985B2 (en) 2000-12-18 2000-12-18 Terminology translation for unaligned comparable corpora using category based translation probabilities

Publications (2)

Publication Number Publication Date
CA2364999A1 (en) 2002-06-18
CA2364999C (en) 2005-05-03

Family

ID=24966003

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002364999A Expired - Fee Related CA2364999C (en) 2000-12-18 2001-12-10 Method and apparatus for terminology translation

Country Status (4)

Country Link
US (1) US6885985B2 (en)
EP (1) EP1217534A3 (en)
JP (1) JP2002222188A (en)
CA (1) CA2364999C (en)

Families Citing this family (91)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4066600B2 (en) * 2000-12-20 2008-03-26 富士ゼロックス株式会社 Multilingual document search system
US6990439B2 (en) * 2001-01-10 2006-01-24 Microsoft Corporation Method and apparatus for performing machine translation using a unified language model and translation model
JP4876329B2 (en) * 2001-05-15 2012-02-15 日本電気株式会社 Parallel translation probability assigning device, parallel translation probability assigning method, and program thereof
FR2825496B1 (en) * 2001-06-01 2003-08-15 Synomia METHOD AND SYSTEM FOR BROAD SYNTAXIC ANALYSIS OF CORPUSES, ESPECIALLY SPECIALIZED CORPUSES
US7191115B2 (en) 2001-06-20 2007-03-13 Microsoft Corporation Statistical method and apparatus for learning translation relationships among words
US8214196B2 (en) 2001-07-03 2012-07-03 University Of Southern California Syntax-based statistical translation model
EP1306775A1 (en) * 2001-10-29 2003-05-02 BRITISH TELECOMMUNICATIONS public limited company Machine translation
WO2003065252A1 (en) * 2002-02-01 2003-08-07 John Fairweather System and method for managing memory
WO2004001623A2 (en) 2002-03-26 2003-12-31 University Of Southern California Constructing a translation lexicon from comparable, non-parallel corpora
US7542908B2 (en) * 2002-10-18 2009-06-02 Xerox Corporation System for learning a language
US7356457B2 (en) * 2003-02-28 2008-04-08 Microsoft Corporation Machine translation using learned word associations without referring to a multi-lingual human authored dictionary of content words
US7283949B2 (en) * 2003-04-04 2007-10-16 International Business Machines Corporation System, method and program product for bidirectional text translation
US8548794B2 (en) 2003-07-02 2013-10-01 University Of Southern California Statistical noun phrase translation
US7711545B2 (en) * 2003-07-02 2010-05-04 Language Weaver, Inc. Empirical methods for splitting compound words with application to machine translation
US20050027566A1 (en) * 2003-07-09 2005-02-03 Haskell Robert Emmons Terminology management system
US7346487B2 (en) * 2003-07-23 2008-03-18 Microsoft Corporation Method and apparatus for identifying translations
US7689412B2 (en) * 2003-12-05 2010-03-30 Microsoft Corporation Synonymous collocation extraction using translation information
KR100559473B1 (en) * 2003-12-24 2006-03-10 한국전자통신연구원 System for stylistic Translation and Method thereof
US8296127B2 (en) * 2004-03-23 2012-10-23 University Of Southern California Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US8666725B2 (en) 2004-04-16 2014-03-04 University Of Southern California Selection and use of nonstatistical translation components in a statistical machine translation framework
US7620539B2 (en) * 2004-07-12 2009-11-17 Xerox Corporation Methods and apparatuses for identifying bilingual lexicons in comparable corpora using geometric processing
JP5452868B2 (en) 2004-10-12 2014-03-26 ユニヴァーシティー オブ サザン カリフォルニア Training for text-to-text applications that use string-to-tree conversion for training and decoding
US20060174170A1 (en) * 2005-01-28 2006-08-03 Peter Garland Integrated reporting of data
US20060282255A1 (en) * 2005-06-14 2006-12-14 Microsoft Corporation Collocation translation from monolingual and available bilingual corpora
US8886517B2 (en) 2005-06-17 2014-11-11 Language Weaver, Inc. Trust scoring for language translation systems
US8676563B2 (en) 2009-10-01 2014-03-18 Language Weaver, Inc. Providing human-generated and machine-generated trusted translations
US20070016397A1 (en) * 2005-07-18 2007-01-18 Microsoft Corporation Collocation translation using monolingual corpora
US7664629B2 (en) * 2005-07-19 2010-02-16 Xerox Corporation Second language writing advisor
US10319252B2 (en) 2005-11-09 2019-06-11 Sdl Inc. Language capability assessment and training apparatus and techniques
US7536295B2 (en) * 2005-12-22 2009-05-19 Xerox Corporation Machine translation using non-contiguous fragments of text
CN101042692B (en) * 2006-03-24 2010-09-22 富士通株式会社 translation obtaining method and apparatus based on semantic forecast
US8943080B2 (en) 2006-04-07 2015-01-27 University Of Southern California Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
US7542893B2 (en) * 2006-05-10 2009-06-02 Xerox Corporation Machine translation using elastic chunks
US9020804B2 (en) * 2006-05-10 2015-04-28 Xerox Corporation Method for aligning sentences at the word level enforcing selective contiguity constraints
US7725306B2 (en) * 2006-06-28 2010-05-25 Microsoft Corporation Efficient phrase pair extraction from bilingual word alignments
US8886518B1 (en) 2006-08-07 2014-11-11 Language Weaver, Inc. System and method for capitalizing machine translated text
US7801721B2 (en) * 2006-10-02 2010-09-21 Google Inc. Displaying original text in a user interface with translated text
US8433556B2 (en) 2006-11-02 2013-04-30 University Of Southern California Semi-supervised training for statistical word alignment
GB2444084A (en) * 2006-11-23 2008-05-28 Sharp Kk Selecting examples in an example based machine translation system
US9122674B1 (en) 2006-12-15 2015-09-01 Language Weaver, Inc. Use of annotations in statistical machine translation
US7783473B2 (en) * 2006-12-28 2010-08-24 At&T Intellectual Property Ii, L.P. Sequence classification for machine translation
US8468149B1 (en) 2007-01-26 2013-06-18 Language Weaver, Inc. Multi-lingual online community
US8788258B1 (en) * 2007-03-15 2014-07-22 At&T Intellectual Property Ii, L.P. Machine translation using global lexical selection and sentence reconstruction
US8615389B1 (en) 2007-03-16 2013-12-24 Language Weaver, Inc. Generation and exploitation of an approximate language model
US8831928B2 (en) * 2007-04-04 2014-09-09 Language Weaver, Inc. Customizable machine translation service
US7983898B2 (en) * 2007-06-08 2011-07-19 Microsoft Corporation Generating a phrase translation model by iteratively estimating phrase translation probabilities
US8825466B1 (en) 2007-06-08 2014-09-02 Language Weaver, Inc. Modification of annotated bilingual segment pairs in syntax-based machine translation
JP5235344B2 (en) * 2007-07-03 2013-07-10 株式会社東芝 Apparatus, method and program for machine translation
CN101802812B (en) * 2007-08-01 2015-07-01 金格软件有限公司 Automatic context sensitive language correction and enhancement using an internet corpus
US8548791B2 (en) * 2007-08-29 2013-10-01 Microsoft Corporation Validation of the consistency of automatic terminology translation
US7983903B2 (en) * 2007-09-07 2011-07-19 Microsoft Corporation Mining bilingual dictionaries from monolingual web pages
US8209164B2 (en) * 2007-11-21 2012-06-26 University Of Washington Use of lexical translations for facilitating searches
US8849665B2 (en) * 2008-01-30 2014-09-30 At&T Intellectual Property I, L.P. System and method of providing machine translation from a source language to a target language
JP5100445B2 (en) * 2008-02-28 2012-12-19 株式会社東芝 Machine translation apparatus and method
US8615388B2 (en) 2008-03-28 2013-12-24 Microsoft Corporation Intra-language statistical machine translation
US8620936B2 (en) * 2008-05-05 2013-12-31 The Boeing Company System and method for a data dictionary
US8504354B2 (en) * 2008-06-02 2013-08-06 Microsoft Corporation Parallel fragment extraction from noisy parallel corpora
US20100049496A1 (en) * 2008-08-22 2010-02-25 Inventec Corporation Word translation enquiry system across multiple thesauri and the method thereof
US9798720B2 (en) 2008-10-24 2017-10-24 Ebay Inc. Hybrid machine translation
US8990064B2 (en) 2009-07-28 2015-03-24 Language Weaver, Inc. Translating documents based on content
US8380486B2 (en) 2009-10-01 2013-02-19 Language Weaver, Inc. Providing machine-generated translations and corresponding trust levels
US10417646B2 (en) 2010-03-09 2019-09-17 Sdl Inc. Predicting the cost associated with translating textual content
US9063931B2 (en) * 2011-02-16 2015-06-23 Ming-Yuan Wu Multiple language translation system
US11003838B2 (en) 2011-04-18 2021-05-11 Sdl Inc. Systems and methods for monitoring post translation editing
US8694303B2 (en) 2011-06-15 2014-04-08 Language Weaver, Inc. Systems and methods for tuning parameters in statistical machine translation
CN102360372B (en) * 2011-10-09 2013-01-30 北京航空航天大学 Cross-language document similarity detection method
US8886515B2 (en) 2011-10-19 2014-11-11 Language Weaver, Inc. Systems and methods for enhancing machine translation post edit review processes
US8942973B2 (en) 2012-03-09 2015-01-27 Language Weaver, Inc. Content page URL translation
US10261994B2 (en) 2012-05-25 2019-04-16 Sdl Inc. Method and system for automatic management of reputation of translators
US9152622B2 (en) 2012-11-26 2015-10-06 Language Weaver, Inc. Personalized machine translation via online adaptation
US8914395B2 (en) 2013-01-03 2014-12-16 Uptodate, Inc. Database query translation system
US9547641B2 (en) * 2013-09-26 2017-01-17 International Business Machines Corporation Domain specific salient point translation
US9213694B2 (en) 2013-10-10 2015-12-15 Language Weaver, Inc. Efficient online domain adaptation
US9940658B2 (en) 2014-02-28 2018-04-10 Paypal, Inc. Cross border transaction machine translation
US9881006B2 (en) 2014-02-28 2018-01-30 Paypal, Inc. Methods for automatic generation of parallel corpora
US9569526B2 (en) 2014-02-28 2017-02-14 Ebay Inc. Automatic machine translation using user feedback
US9530161B2 (en) 2014-02-28 2016-12-27 Ebay Inc. Automatic extraction of multilingual dictionary items from non-parallel, multilingual, semi-structured data
RU2639684C2 (en) 2014-08-29 2017-12-21 Общество С Ограниченной Ответственностью "Яндекс" Text processing method (versions) and constant machine-readable medium (versions)
CN106294507B (en) * 2015-06-10 2020-07-24 华中师范大学 Cross-language viewpoint data classification method and device
US9558182B1 (en) * 2016-01-08 2017-01-31 International Business Machines Corporation Smart terminology marker system for a language translation system
US9400781B1 (en) * 2016-02-08 2016-07-26 International Business Machines Corporation Automatic cognate detection in a computer-assisted language learning system
US10417350B1 (en) 2017-08-28 2019-09-17 Amazon Technologies, Inc. Artificial intelligence system for automated adaptation of text-based classification models for multiple languages
US10740558B2 (en) * 2017-08-31 2020-08-11 Smartcat Llc Translating a current document using a planned workflow associated with a profile of a translator automatically selected by comparing terms in previously translated documents with terms in the current document
US10275462B2 (en) * 2017-09-18 2019-04-30 Sap Se Automatic translation of string collections
US10769386B2 (en) 2017-12-05 2020-09-08 Sap Se Terminology proposal engine for determining target language equivalents
US10599782B2 (en) * 2018-05-21 2020-03-24 International Business Machines Corporation Analytical optimization of translation and post editing
US11222176B2 (en) 2019-05-24 2022-01-11 International Business Machines Corporation Method and system for language and domain acceleration with embedding evaluation
US11386276B2 (en) 2019-05-24 2022-07-12 International Business Machines Corporation Method and system for language and domain acceleration with embedding alignment
US11217227B1 (en) 2019-11-08 2022-01-04 Suki AI, Inc. Systems and methods for generating disambiguated terms in automatically generated transcriptions including instructions within a particular knowledge domain
US11538465B1 (en) 2019-11-08 2022-12-27 Suki AI, Inc. Systems and methods to facilitate intent determination of a command by grouping terms based on context
US20230351172A1 (en) * 2022-04-29 2023-11-02 Intuit Inc. Supervised machine learning method for matching unsupervised data

Family Cites Families (18)

Publication number Priority date Publication date Assignee Title
US5418717A (en) * 1990-08-27 1995-05-23 Su; Keh-Yih Multiple score language processing system
GB9103080D0 (en) * 1991-02-14 1991-04-03 British And Foreign Bible The Analysing textual documents
US5477451A (en) * 1991-07-25 1995-12-19 International Business Machines Corp. Method and system for natural language translation
US5523946A (en) * 1992-02-11 1996-06-04 Xerox Corporation Compact encoding of multi-lingual translation dictionaries
US5510981A (en) * 1993-10-28 1996-04-23 International Business Machines Corporation Language translation apparatus and method using context-based translation models
JPH08329105A (en) * 1995-05-31 1996-12-13 Canon Inc Method and device for processing document
US6061675A (en) * 1995-05-31 2000-05-09 Oracle Corporation Methods and apparatus for classifying terminology utilizing a knowledge catalog
US5680511A (en) * 1995-06-07 1997-10-21 Dragon Systems, Inc. Systems and methods for word recognition
JPH09128396A (en) * 1995-11-06 1997-05-16 Hitachi Ltd Preparation method for bilingual dictionary
JP2987099B2 (en) * 1996-03-27 1999-12-06 株式会社日立国際ビジネス Document creation support system and term dictionary
US5832499A (en) * 1996-07-10 1998-11-03 Survivors Of The Shoah Visual History Foundation Digital library system
US6085162A (en) * 1996-10-18 2000-07-04 Gedanken Corporation Translation system and method in which words are translated by a specialized dictionary and then a general dictionary
US5956711A (en) * 1997-01-16 1999-09-21 Walter J. Sullivan, III Database system with restricted keyword list and bi-directional keyword translation
DE69837979T2 (en) * 1997-06-27 2008-03-06 International Business Machines Corp. System for extracting multilingual terminology
KR980004126A (en) * 1997-12-16 1998-03-30 양승택 Query Language Conversion Apparatus and Method for Searching Multilingual Web Documents
US6092034A (en) * 1998-07-27 2000-07-18 International Business Machines Corporation Statistical translation system and method for fast sense disambiguation and translation of large corpora using fertility models and sense models
US6349276B1 (en) * 1998-10-29 2002-02-19 International Business Machines Corporation Multilingual information retrieval with a transfer corpus
US6330530B1 (en) * 1999-10-18 2001-12-11 Sony Corporation Method and system for transforming a source language linguistic structure into a target language linguistic structure based on example linguistic feature structures

Also Published As

Publication number Publication date
EP1217534A2 (en) 2002-06-26
JP2002222188A (en) 2002-08-09
US6885985B2 (en) 2005-04-26
CA2364999C (en) 2005-05-03
EP1217534A3 (en) 2006-06-07
US20020111789A1 (en) 2002-08-15


Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed, effective date: 2017-12-11