WO2014190220A3 - Language model trained using predicted queries from statistical machine translation - Google Patents

Language model trained using predicted queries from statistical machine translation

Info

Publication number
WO2014190220A3
Authority
WO
WIPO (PCT)
Prior art keywords
language model
smt
model
content
predicted queries
Prior art date
Application number
PCT/US2014/039258
Other languages
French (fr)
Other versions
WO2014190220A2 (en)
Inventor
Michael Levit
Dilek Hakkani-Tur
Gokhan Tur
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation
Priority to EP14733810.7A (published as EP2941719A2)
Publication of WO2014190220A2
Publication of WO2014190220A3

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/9032: Query formulation
    • G06F16/90332: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/42: Data-driven translation
    • G06F40/44: Statistical methods, e.g. probability models
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/183: Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19: Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197: Probabilistic grammars, e.g. word n-grams

Abstract

A Statistical Machine Translation (SMT) model (165) is trained using pairs of sentences that include content obtained from one or more content sources (e.g., feeds) together with the corresponding queries that have been used to access the content. A query click graph (130) may be used to assist in determining candidate pairs for the SMT training data. All or a portion of the candidate pairs may be used to train the SMT model. After training, the SMT model is applied to content to determine predicted queries (154) that may be used to search for that content. The predicted queries are used to train a language model, such as a query language model. The query language model may be interpolated with other language models, such as a background language model and a feed language model trained using the content used in determining the predicted queries.
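
The interpolation step at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say which interpolation scheme or model order is used, so everything here is an assumption made for illustration: linear interpolation, add-alpha smoothed unigram models, the toy sentences, the weights, and the function names `train_unigram_lm` and `interpolate` are all hypothetical. The predicted queries are simply given as strings rather than produced by an SMT model.

```python
from collections import Counter


def train_unigram_lm(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model: returns {word: probability} over vocab."""
    counts = Counter(w for s in sentences for w in s.lower().split() if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(models, weights):
    """Linear interpolation (an assumed scheme): P(w) = sum_i weights[i] * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    vocab = models[0].keys()
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in vocab}


# Toy, made-up data standing in for the three text sources named in the abstract.
predicted_queries = ["weather seattle", "seattle weather forecast"]
feed_content = ["rain is expected in seattle this weekend"]
background_text = ["the forecast for the weekend is rain"]

# Shared vocabulary so that all three models assign a probability to every word.
vocab = {w for s in predicted_queries + feed_content + background_text for w in s.split()}

query_lm = train_unigram_lm(predicted_queries, vocab)
feed_lm = train_unigram_lm(feed_content, vocab)
background_lm = train_unigram_lm(background_text, vocab)

# Hypothetical weights; in practice they would be tuned on held-out data.
combined_lm = interpolate([query_lm, feed_lm, background_lm], [0.5, 0.3, 0.2])
```

Because each component model is a proper distribution over the same vocabulary and the weights sum to 1, the combined model is also a proper distribution. Words that are frequent in the predicted queries gain probability relative to a background-only model, which is the effect the abstract is after.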

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP14733810.7A EP2941719A2 (en) 2013-05-24 2014-05-23 Language model trained using predicted queries from statistical machine translation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/902,470 2013-05-24
US13/902,470 US20140350931A1 (en) 2013-05-24 2013-05-24 Language model trained using predicted queries from statistical machine translation

Publications (2)

Publication Number Publication Date
WO2014190220A2 WO2014190220A2 (en) 2014-11-27
WO2014190220A3 WO2014190220A3 (en) 2015-05-14

Family

ID=51023074

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/039258 WO2014190220A2 (en) 2013-05-24 2014-05-23 Language model trained using predicted queries from statistical machine translation

Country Status (3)

Country Link
US (1) US20140350931A1 (en)
EP (1) EP2941719A2 (en)
WO (1) WO2014190220A2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10319252B2 (en) 2005-11-09 2019-06-11 Sdl Inc. Language capability assessment and training apparatus and techniques
US10417646B2 (en) 2010-03-09 2019-09-17 Sdl Inc. Predicting the cost associated with translating textual content
US11003838B2 (en) 2011-04-18 2021-05-11 Sdl Inc. Systems and methods for monitoring post translation editing
US10261994B2 (en) 2012-05-25 2019-04-16 Sdl Inc. Method and system for automatic management of reputation of translators
US9213694B2 (en) * 2013-10-10 2015-12-15 Language Weaver, Inc. Efficient online domain adaptation
US10452786B2 (en) * 2014-12-29 2019-10-22 Paypal, Inc. Use of statistical flow data for machine translations between different languages
KR102325724B1 (en) 2015-02-28 2021-11-15 삼성전자주식회사 Synchronization of Text Data among a plurality of Devices
CN111971679A (en) * 2018-01-26 2020-11-20 威盖特技术美国有限合伙人公司 Generating natural language recommendations based on an industry language model

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2009120449A1 (en) * 2008-03-28 2009-10-01 Microsoft Corporation Intra-language statistical machine translation
US20110289063A1 (en) * 2010-05-21 2011-11-24 Microsoft Corporation Query Intent in Information Retrieval

Family Cites Families (25)

Publication number Priority date Publication date Assignee Title
US7194455B2 (en) * 2002-09-19 2007-03-20 Microsoft Corporation Method and system for retrieving confirming sentences
US8122030B1 (en) * 2005-01-14 2012-02-21 Wal-Mart Stores, Inc. Dual web graph
WO2006133571A1 (en) * 2005-06-17 2006-12-21 National Research Council Of Canada Means and method for adapted language translation
WO2007076529A2 (en) * 2005-12-28 2007-07-05 The Trustees Of Columbia University In The City Of New York A system and method for accessing images with a novel user interface and natural language processing
US8898052B2 (en) * 2006-05-22 2014-11-25 Facebook, Inc. Systems and methods for training statistical speech translation systems from speech utilizing a universal speech recognizer
US8032356B2 (en) * 2006-05-25 2011-10-04 University Of Southern California Spoken translation system using meta information strings
US9002869B2 (en) * 2007-06-22 2015-04-07 Google Inc. Machine translation for query expansion
US8073803B2 (en) * 2007-07-16 2011-12-06 Yahoo! Inc. Method for matching electronic advertisements to surrounding context based on their advertisement content
US20090182547A1 (en) * 2008-01-16 2009-07-16 Microsoft Corporation Adaptive Web Mining of Bilingual Lexicon for Query Translation
US20090265290A1 (en) * 2008-04-18 2009-10-22 Yahoo! Inc. Optimizing ranking functions using click data
US8918328B2 (en) * 2008-04-18 2014-12-23 Yahoo! Inc. Ranking using word overlap and correlation features
US20100082324A1 (en) * 2008-09-30 2010-04-01 Microsoft Corporation Replacing terms in machine translation
US8306806B2 (en) * 2008-12-02 2012-11-06 Microsoft Corporation Adaptive web mining of bilingual lexicon
US20100191746A1 (en) * 2009-01-26 2010-07-29 Microsoft Corporation Competitor Analysis to Facilitate Keyword Bidding
US20100299132A1 (en) * 2009-05-22 2010-11-25 Microsoft Corporation Mining phrase pairs from an unstructured resource
US8781231B1 (en) * 2009-08-25 2014-07-15 Google Inc. Content-based image ranking
US20120047172A1 (en) * 2010-08-23 2012-02-23 Google Inc. Parallel document mining
US9081760B2 (en) * 2011-03-08 2015-07-14 At&T Intellectual Property I, L.P. System and method for building diverse language models
US8732151B2 (en) * 2011-04-01 2014-05-20 Microsoft Corporation Enhanced query rewriting through statistical machine translation
US9507861B2 (en) * 2011-04-01 2016-11-29 Microsoft Technology Licensing, LLC Enhanced query rewriting through click log analysis
US9064006B2 (en) * 2012-08-23 2015-06-23 Microsoft Technology Licensing, Llc Translating natural language utterances to keyword search queries
US9471565B2 (en) * 2011-07-29 2016-10-18 At&T Intellectual Property I, L.P. System and method for locating bilingual web sites
US20130103695A1 (en) * 2011-10-21 2013-04-25 Microsoft Corporation Machine translation detection in web-scraped parallel corpora
US8533148B1 (en) * 2012-10-01 2013-09-10 Recommind, Inc. Document relevancy analysis within machine learning systems including determining closest cosine distances of training examples
US9235567B2 (en) * 2013-01-14 2016-01-12 Xerox Corporation Multi-domain machine translation model adaptation

Non-Patent Citations (1)

Title
STEFAN RIEZLER ET AL: "Statistical Machine Translation for Query Expansion in Answer Retrieval", 23 June 2007 (2007-06-23), XP008126878, Retrieved from the Internet <URL:http://www.stefanriezler.com> [retrieved on 20150220] *

Also Published As

Publication number Publication date
WO2014190220A2 (en) 2014-11-27
EP2941719A2 (en) 2015-11-11
US20140350931A1 (en) 2014-11-27

Similar Documents

Publication Publication Date Title
WO2014190220A3 (en) Language model trained using predicted queries from statistical machine translation
WO2018203147A3 (en) Multi-lingual semantic parser based on transferred learning
AU2017408798A1 (en) Method and device of analysis based on model, and computer readable storage medium
GB2543429A (en) Machine learning for visual processing
BR112017009666A2 (en) method and device for social platform-based data mining
WO2014074925A3 (en) Providing content recommendation to users on a site
MY186909A (en) Methods for understanding incomplete natural language query
MX2015008723A (en) Data base query translation system.
MX2016004667A (en) Template construction method and apparatus, and information recognition method and apparatus.
WO2016029018A3 (en) Executing constant time relational queries against structured and semi-structured data
BR112016028797A2 (en) session context modeling for conversation understanding systems
WO2015170191A3 (en) Method and apparatus for screening promotion keywords
MX2016014071A (en) Method and apparatus for analyzing media content.
WO2014210184A3 (en) Real-time and adaptive data mining
CR20150552A (en) LANGUAGE LEARNING ENVIRONMENT
WO2013188504A3 (en) Multilingual mixed search method and system
WO2014183956A3 (en) Social media content analysis and output
MX2016012272A (en) Client intent in integrated search environment.
WO2012122212A3 (en) Processing medical records
MY194297A (en) A method and device for providing search engine label
SG11201811808VA (en) Database data modification request processing method and apparatus
PH12021550937A1 (en) Information providing system, information providing method, and data structure of knowledge data
TW201612841A (en) Online learning system, skill evaluation method thereof, and storage media storing the method
GB201217354D0 (en) "At least" operator for combining audio search hits
EP2851809A3 (en) Machine translation apparatus and method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14733810

Country of ref document: EP

Kind code of ref document: A2

REEP Request for entry into the european phase

Ref document number: 2014733810

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 2014733810

Country of ref document: EP
