WO2014190220A3 - Language model trained using predicted queries from statistical machine translation - Google Patents
Language model trained using predicted queries from statistical machine translation
- Publication number
- WO2014190220A3 (application PCT/US2014/039258)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- language model
- smt
- model
- content
- predicted queries
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90332—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/44—Statistical methods, e.g. probability models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
Abstract
A Statistical Machine Translation (SMT) model (165) is trained using sentence pairs that match content obtained from one or more content sources (e.g., feeds) with the corresponding queries that have been used to access that content. A query click graph (130) may be used to help determine candidate pairs for the SMT training data. All or a portion of the candidate pairs may be used to train the SMT model. After training, the SMT model is applied to content to determine predicted queries (154) that may be used to search for the content. The predicted queries are used to train a language model, such as a query language model. The query language model may be interpolated with other language models, such as a background language model and a feed language model trained using the content used in determining the predicted queries.
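The last two stages of the abstract can be illustrated with a minimal sketch. It assumes unigram models and linear interpolation, which the abstract does not specify. The function names (`candidate_pairs`, `unigram_lm`, `interpolate`), the click threshold, the sample data, and the interpolation weights are all hypothetical. The SMT step itself is not implemented: `predicted_queries` is a hand-written stand-in for its output.

```python
from collections import Counter


def candidate_pairs(click_graph, min_clicks=2):
    """Select (content, query) pairs whose click count meets a threshold.

    click_graph maps (query, content) edges to click counts. Both this
    layout and the min_clicks threshold are illustrative assumptions.
    """
    return [
        (content, query)
        for (query, content), clicks in click_graph.items()
        if clicks >= min_clicks
    ]


def unigram_lm(sentences):
    """Maximum-likelihood unigram model: lowercased word -> relative frequency."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weights[i] * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = set().union(*models)
    return {
        w: sum(wt * m.get(w, 0.0) for m, wt in zip(models, weights))
        for w in vocab
    }


# Hypothetical click graph: (query, content) -> number of clicks.
click_graph = {
    ("lakers score", "Lakers beat Celtics in overtime"): 5,
    ("weather", "Lakers beat Celtics in overtime"): 1,
}
pairs = candidate_pairs(click_graph)  # would feed SMT training

# Stand-in for queries predicted by the trained SMT model.
predicted_queries = ["lakers score", "lakers celtics result"]
feed_content = ["Lakers beat Celtics in overtime"]
background_text = ["what is the score", "show me the news"]

query_lm = unigram_lm(predicted_queries)
feed_lm = unigram_lm(feed_content)
background_lm = unigram_lm(background_text)

# Weights are arbitrary here; in practice they would be tuned on held-out data.
combined = interpolate([query_lm, feed_lm, background_lm], [0.5, 0.2, 0.3])
```

Because each component model sums to 1 and the weights sum to 1, the interpolated model is also a valid probability distribution over the combined vocabulary.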
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14733810.7A EP2941719A2 (en) | 2013-05-24 | 2014-05-23 | Language model trained using predicted queries from statistical machine translation |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/902,470 | 2013-05-24 | ||
US13/902,470 US20140350931A1 (en) | 2013-05-24 | 2013-05-24 | Language model trained using predicted queries from statistical machine translation |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2014190220A2 (en) | 2014-11-27 |
WO2014190220A3 (en) | 2015-05-14 |
Family
ID=51023074
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2014/039258 WO2014190220A2 (en) | 2013-05-24 | 2014-05-23 | Language model trained using predicted queries from statistical machine translation |
Country Status (3)
Country | Link |
---|---|
US (1) | US20140350931A1 (en) |
EP (1) | EP2941719A2 (en) |
WO (1) | WO2014190220A2 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10319252B2 (en) | 2005-11-09 | 2019-06-11 | Sdl Inc. | Language capability assessment and training apparatus and techniques |
US10417646B2 (en) | 2010-03-09 | 2019-09-17 | Sdl Inc. | Predicting the cost associated with translating textual content |
US11003838B2 (en) | 2011-04-18 | 2021-05-11 | Sdl Inc. | Systems and methods for monitoring post translation editing |
US10261994B2 (en) | 2012-05-25 | 2019-04-16 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US9213694B2 (en) * | 2013-10-10 | 2015-12-15 | Language Weaver, Inc. | Efficient online domain adaptation |
US10452786B2 (en) * | 2014-12-29 | 2019-10-22 | Paypal, Inc. | Use of statistical flow data for machine translations between different languages |
KR102325724B1 (en) | 2015-02-28 | 2021-11-15 | 삼성전자주식회사 | Synchronization of Text Data among a plurality of Devices |
CN111971679A (en) * | 2018-01-26 | 2020-11-20 | 威盖特技术美国有限合伙人公司 | Generating natural language recommendations based on an industry language model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009120449A1 (en) * | 2008-03-28 | 2009-10-01 | Microsoft Corporation | Intra-language statistical machine translation |
US20110289063A1 (en) * | 2010-05-21 | 2011-11-24 | Microsoft Corporation | Query Intent in Information Retrieval |
Family Cites Families (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7194455B2 (en) * | 2002-09-19 | 2007-03-20 | Microsoft Corporation | Method and system for retrieving confirming sentences |
US8122030B1 (en) * | 2005-01-14 | 2012-02-21 | Wal-Mart Stores, Inc. | Dual web graph |
WO2006133571A1 (en) * | 2005-06-17 | 2006-12-21 | National Research Council Of Canada | Means and method for adapted language translation |
WO2007076529A2 (en) * | 2005-12-28 | 2007-07-05 | The Trustees Of Columbia University In The City Of New York | A system and method for accessing images with a novel user interface and natural language processing |
US8898052B2 (en) * | 2006-05-22 | 2014-11-25 | Facebook, Inc. | Systems and methods for training statistical speech translation systems from speech utilizing a universal speech recognizer |
US8032356B2 (en) * | 2006-05-25 | 2011-10-04 | University Of Southern California | Spoken translation system using meta information strings |
US9002869B2 (en) * | 2007-06-22 | 2015-04-07 | Google Inc. | Machine translation for query expansion |
US8073803B2 (en) * | 2007-07-16 | 2011-12-06 | Yahoo! Inc. | Method for matching electronic advertisements to surrounding context based on their advertisement content |
US20090182547A1 (en) * | 2008-01-16 | 2009-07-16 | Microsoft Corporation | Adaptive Web Mining of Bilingual Lexicon for Query Translation |
US20090265290A1 (en) * | 2008-04-18 | 2009-10-22 | Yahoo! Inc. | Optimizing ranking functions using click data |
US8918328B2 (en) * | 2008-04-18 | 2014-12-23 | Yahoo! Inc. | Ranking using word overlap and correlation features |
US20100082324A1 (en) * | 2008-09-30 | 2010-04-01 | Microsoft Corporation | Replacing terms in machine translation |
US8306806B2 (en) * | 2008-12-02 | 2012-11-06 | Microsoft Corporation | Adaptive web mining of bilingual lexicon |
US20100191746A1 (en) * | 2009-01-26 | 2010-07-29 | Microsoft Corporation | Competitor Analysis to Facilitate Keyword Bidding |
US20100299132A1 (en) * | 2009-05-22 | 2010-11-25 | Microsoft Corporation | Mining phrase pairs from an unstructured resource |
US8781231B1 (en) * | 2009-08-25 | 2014-07-15 | Google Inc. | Content-based image ranking |
US20120047172A1 (en) * | 2010-08-23 | 2012-02-23 | Google Inc. | Parallel document mining |
US9081760B2 (en) * | 2011-03-08 | 2015-07-14 | At&T Intellectual Property I, L.P. | System and method for building diverse language models |
US8732151B2 (en) * | 2011-04-01 | 2014-05-20 | Microsoft Corporation | Enhanced query rewriting through statistical machine translation |
US9507861B2 (en) * | 2011-04-01 | 2016-11-29 | Microsoft Technology Licensing, LLC | Enhanced query rewriting through click log analysis |
US9064006B2 (en) * | 2012-08-23 | 2015-06-23 | Microsoft Technology Licensing, Llc | Translating natural language utterances to keyword search queries |
US9471565B2 (en) * | 2011-07-29 | 2016-10-18 | At&T Intellectual Property I, L.P. | System and method for locating bilingual web sites |
US20130103695A1 (en) * | 2011-10-21 | 2013-04-25 | Microsoft Corporation | Machine translation detection in web-scraped parallel corpora |
US8533148B1 (en) * | 2012-10-01 | 2013-09-10 | Recommind, Inc. | Document relevancy analysis within machine learning systems including determining closest cosine distances of training examples |
US9235567B2 (en) * | 2013-01-14 | 2016-01-12 | Xerox Corporation | Multi-domain machine translation model adaptation |
- 2013-05-24: US application US13/902,470, published as US20140350931A1, status: Abandoned (not active)
- 2014-05-23: WO application PCT/US2014/039258, published as WO2014190220A2, status: Application Filing (active)
- 2014-05-23: EP application EP14733810.7A, published as EP2941719A2, status: Ceased (not active)
Non-Patent Citations (1)
Title |
---|
STEFAN RIEZLER ET AL: "Statistical Machine Translation for Query Expansion in Answer Retrieval", 23 June 2007 (2007-06-23), XP008126878, Retrieved from the Internet <URL:http://www.stefanriezler.com> [retrieved on 20150220] * |
Also Published As
Publication number | Publication date |
---|---|
WO2014190220A2 (en) | 2014-11-27 |
EP2941719A2 (en) | 2015-11-11 |
US20140350931A1 (en) | 2014-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2014190220A3 (en) | Language model trained using predicted queries from statistical machine translation | |
WO2018203147A3 (en) | Multi-lingual semantic parser based on transferred learning | |
AU2017408798A1 (en) | Method and device of analysis based on model, and computer readable storage medium | |
GB2543429A (en) | Machine learning for visual processing | |
BR112017009666A2 (en) | method and device for social platform-based data mining | |
WO2014074925A3 (en) | Providing content recommendation to users on a site | |
MY186909A (en) | Methods for understanding incomplete natural language query | |
MX2015008723A (en) | Data base query translation system. | |
MX2016004667A (en) | Template construction method and apparatus, and information recognition method and apparatus. | |
WO2016029018A3 (en) | Executing constant time relational queries against structured and semi-structured data | |
BR112016028797A2 (en) | session context modeling for conversation understanding systems | |
WO2015170191A3 (en) | Method and apparatus for screening promotion keywords | |
MX2016014071A (en) | Method and apparatus for analyzing media content. | |
WO2014210184A3 (en) | Real-time and adaptive data mining | |
CR20150552A (en) | LANGUAGE LEARNING ENVIRONMENT | |
WO2013188504A3 (en) | Multilingual mixed search method and system | |
WO2014183956A3 (en) | Social media content analysis and output | |
MX2016012272A (en) | Client intent in integrated search environment. | |
WO2012122212A3 (en) | Processing medical records | |
MY194297A (en) | A method and device for providing search engine label | |
SG11201811808VA (en) | Database data modification request processing method and apparatus | |
PH12021550937A1 (en) | Information providing system, information providing method, and data structure of knowledge data | |
TW201612841A (en) | Online learning system, skill evaluation method thereof, and storage media storing the method | |
GB201217354D0 (en) | "At least" operator for combining audio search hits | |
EP2851809A3 (en) | Machine translation apparatus and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14733810; Country of ref document: EP; Kind code of ref document: A2 |
 | REEP | Request for entry into the european phase | Ref document number: 2014733810; Country of ref document: EP |
 | WWE | Wipo information: entry into national phase | Ref document number: 2014733810; Country of ref document: EP |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14733810; Country of ref document: EP; Kind code of ref document: A2 |