WO1999054869A1 - Adaptation of a speech recognizer for dialectal and linguistic domain variations - Google Patents
- Publication number
- WO1999054869A1 (PCT/EP1999/002673)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- recognizer
- data
- smoothing
- additional
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
Definitions
- the present invention relates to speech recognition systems. More particularly, the invention relates to a generator for generating an adapted speech recognizer. Furthermore, the invention relates to a method of generating such an adapted speech recognizer, said method being executed by said generator.
- Speech recognition systems use Hidden Markov Models to capture the statistical properties of acoustic subword units, e.g. context dependent phones or subphones.
- An overview on this topic may be found for instance in L. Rabiner, A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77(2), pp. 257-285, 1989 or in X. Huang and Y. Ariki and M. Jack, Hidden Markov Models for Speech Recognition, Information Technology Series, Edinburgh University Press, Edinburgh, 1990.
- $$p(x \mid s_k) = \sum_{i=1}^{N_k} \omega_{ik}\, \mathcal{N}(x \mid \mu_{ik}, \Sigma_{ik}) \qquad (5)$$
- the mixture component weights $\omega$, the means $\mu$, and the covariance matrices $\Sigma$ are estimated from a large amount of transcribed speech data during the training of the recognizer.
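As a minimal sketch of the mixture output density in equation (5), the following Python evaluates a Gaussian mixture for a single feature vector. Diagonal covariance matrices and the names `gaussian_diag` and `mixture_density` are assumptions made for illustration; the source does not prescribe them.

```python
import math

def gaussian_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance (an assumed simplification)."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)

def mixture_density(x, weights, means, variances):
    """Weighted sum of component densities for one HMM state."""
    return sum(w * gaussian_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture over 2-dimensional feature vectors.
weights = [0.4, 0.6]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
p = mixture_density([0.0, 0.0], weights, means, variances)
```

In a real recognizer the computation would normally stay in the log domain to avoid underflow for high-dimensional features.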
- a well known procedure to solve that problem is the EM-algorithm (illustrated for instance by A. Dempster and N. Laird and D. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 1977, Vol. 39(1), pp. 1-38), and the Markov model parameters $\pi$, $A$, $B$ are usually estimated by the use of the forward-backward algorithm (illustrated for instance by L. Rabiner, A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77(2), pp. 257-285, 1989).
- the labelled training data is passed through a binary decision network that separates the contexts into equivalence classes depending on the variations observed in the feature vectors .
- a multi-dimensional Gaussian mixture model is used to model the feature vectors that belong to each class represented by the terminal nodes (leaves) of the decision network. These models are used as initial observation densities in a set of context-dependent, continuous parameter HMM, and are further refined by running the forward-backward algorithm, which converges to a local optimum after a few iterations.
- the total number of both context dependent HMMs and Gaussians is limited by the specification of an upper bound and depends on the amount and contents of the training data
- speaker adaptation techniques like the maximum a posteriori estimation of Gaussian mixture observations (MAP adaptation) - refer for instance to J. Gauvain and C. Lee, Maximum a Posteriori Estimation of Multivariate Gaussian Mixture Observations of Markov Chains, IEEE Trans. on Speech and Audio Processing, Vol. 2(2), pp. 291-298, 1994 - or the maximum likelihood linear regression (MLLR adaptation) - refer for instance to C. Leggetter and P. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech and Language, Vol. 9, pp. 171-185, 1995 - are exploited during the training of the recognizer.
- MAP adaptation: maximum a posteriori estimation of Gaussian mixture observations
- MLLR adaptation: maximum likelihood linear regression
- the invention is based on the objective of a reduction of training effort for individual end users and an improved speaker independent recognition accuracy.
- the objective of the invention is solved by claim 1.
- the generator of an adapted speech recognizer according to the teaching of the current application is based upon a base speech recognizer 201 for a definite but arbitrary base language.
- the generator also comprises an additional speech data corpus 202 used for generation of said adapted speech recognizer.
- Said additional speech data corpus comprises a collection of domain specific speech data and/or dialect specific speech data.
- said generator comprises reestimation means 203 for reestimating acoustic model parameters of the base speech recognizer by a speaker adaption technique. Said additional speech data corpus is exploited by said reestimation means for generating the adapted speech recognizer.
- the technique proposed by the current invention thus achieves a significant reduction of training effort for individual end users, an improved speaker independent recognition accuracy for specific domains and dialect speakers, and the rapid development of new data files for speech recognizers in specific environments. Moreover, the recognition rate of non-dialect speakers is also improved.
- speaker adaptation techniques were usually applied to an individual end user's speech data and therefore yield a speaker dependent speech recognizer
- in the current invention, they are applied to a dialect and/or domain specific collection of training data from several speakers. This allows for an improved speaker independent recognition especially (but not solely) for a given dialect and domain and minimizes the individual end user's investment to customize the recognizer to their needs.
- Another important aspect of this invention is the reduced effort for the generation of a specific speech recognizer: whereas other commercially available toolkits start from the definition of subword units and/or HMM topologies, and thus require a considerably large amount of training data, the current approach starts from an already trained general purpose speech recognizer.
- the approach of the current teaching offers a scalable recognition accuracy, if dialects and/or specific domains are handled in an integrated speech recognizer. As the current invention is completely independent from the specific dialect and/or specific domain they may be combined in any possible combination.
- Only a small amount of additional domain specific or dialect data is required, and it is inexpensive and easy to collect.
- the current invention allows the time for the upfront training of the recognizer to be reduced significantly. Therefore it allows for rapid development of new data files for recognizers in specific environments or combinations of environments.
- said additional speech data corpus can be collected unsupervised or supervised.
- said acoustic model is a Hidden-Markov-Model (HMM) .
- the current teaching may be applied to the HMM technology. Therefore the HMM approach, one of the most successful techniques in the area of speech recognition, can be further improved with the current teachings.
- said speaker adaption technique is the Maximum-A-Posteriori-adaption (MAP) or the Maximum-Likelihood-Linear-Regression-adaption (MLLR).
- Claim 5 achieves additional benefits.
- smoothing means 204 are introduced for optionally smoothing the reestimated acoustic model parameters.
- said smoothing means perform a Bayesian smoothing.
- a smoothing factor K from the range 1 to 500 is suggested.
- Especially the subrange for smoothing factor K of 20 to 60 is proposed.
- Bayesian smoothing has been shown to produce good results in terms of recognition accuracy and performance. Intensive experimentation revealed that a smoothing factor K from the range 1 to 500 accomplishes good results. Especially the subrange for smoothing factor K of 20 to 60 turned out to achieve the best results.
- iteration means 205 for optionally iterating the operation of said reestimation means and for optionally iterating the operation of said smoothing means are suggested.
- the iteration may be based on said reestimated dialect or domain specific acoustic model parameters or based on said base language acoustic model parameters.
- This teaching allows for a stepwise approach to the generation of an optimally adapted speech recognizer.
- said iteration means use a modified additional speech data corpus and/or said iteration means use a new smoothing factor value K.
- the iteration process may be based on an enlarged or modified additional speech data corpus. For instance, a changed smoothing factor allows the generation process to be adjusted to how limited the amount of training data is.
- said adapted speech recognizer is speaker independent.
- a method for generating an adapted speech recognizer using a base speech recognizer 201 for a definite but arbitrary base language is suggested.
- Said method comprises a first step 202 of providing an additional speech data corpus.
- Said additional speech data corpus comprises a collection of domain specific speech data and/or dialect specific speech data.
- said method comprises a second step 203 of reestimating acoustic model parameters of said base speech recognizer by a speaker adaption technique using said additional speech data corpus.
- said method comprises an optional third step 204 for smoothing the reestimated acoustic model parameters .
- said method comprises an optional fourth step 205 for iterating said first step by providing a modified additional speech data corpus and for iterating said second and third step based on said reestimated acoustic model parameters or based on said base acoustic model parameters .
- said acoustic model is a Hidden Markov Model (HMM) .
- HMM: Hidden Markov Model
- said speaker adaption technique is the Maximum-A-Posteriori-adaption (MAP) or the Maximum-Likelihood-Linear-Regression-adaption (MLLR).
- MAP: Maximum-A-Posteriori-adaption
- MLLR: Maximum-Likelihood-Linear-Regression-adaption
- said adapted speech recognizer is speaker independent.
- Figure 1 is a diagram reflecting the overall structure of the state-of-the-art adaptation process, visualizing the generation of a speaker dependent speech recognizer from a speaker independent speech recognizer of the base language.
- Figure 2 is a diagram reflecting the overall structure of the adaptation process according to the current invention, visualizing the generation of an improved speaker independent speech recognizer from a speaker independent speech recognizer of the base language.
- Said improved speaker independent speech recognizer may be the basis for further customization generating an improved speaker dependent speech recognizer.
- Figure 3 gives a comparison of the error rates of the baseline recognizer (VV), the standard training procedure (VV-S), and the scalable fastboot method (VV-G), normalized to the error rate of the baseline recognizer (VV) for a German test speaker.
- the starting point is a speech recognizer 101 for a base language which is speaker independent and without specialization to any domain.
- the individual user has to read a predefined enrollment script 103 which is a further input to the reestimation process 102.
- the parameters of the underlying acoustic model are adapted by available speaker adaptation techniques according to the state of the art.
- the result of this generation process is the output of a speaker dependent speech recognizer.
- the current invention teaches a fast bootstrap (i.e. upfront) procedure for the training of a speech recognizer with improved recognition accuracy; i.e. the current invention proposes a generation process for an additionally adapted speaker independent speech recognizer based upon a general speech recognizer for the base language.
- both accuracy and speed of the recognition system can be significantly improved by explicit modelling of language dialects and orthogonally by the integration of domain specific training data in the modelling process.
- the architecture of the invention allows the recognition system to be improved along both of these directions.
- the current invention utilizes the fact that for certain dialects, e.g. Austrian German or Canadian French, the phonetic contexts are similar to those of the base language (German or French, respectively), whereas acoustic model parameters differ significantly due to different pronunciations.
- Similarly, acoustic models may not be well trained for specific domains (e.g. base domain: office correspondence; specific domain: radiology).
- the current invention achieves the reduction of training efforts for individual end users, an improved speaker independent recognition accuracy for specific domains and dialect speakers, and the rapid development of new data files for speech recognizers in specific environments.
- the current invention (called fastboot in the remainder) utilizes the observation that speaker adaptation techniques, e.g. the maximum a posteriori estimation of Gaussian mixture observations (MAP adaptation) or maximum likelihood linear regression (MLLR adaptation), yield a significantly larger improvement in recognition accuracy for dialect speakers than for speakers that use pronunciations observed during the training of the recognizer. According to the current teaching this approach results in improved speaker independent recognition accuracy not only for dialect speakers. These techniques move the output probabilities B of the HMMs to a speaker's particular acoustic space, and thus it is achieved that
  - the main differences between dialect and base language are captured by the output probabilities of the HMMs,
  - the trained parameters for the base language already provide good initial values for a dialect specific reestimation by the forward-backward algorithm, and
  - the reestimation of significant contexts from dialect data can be omitted to achieve a fast training procedure.
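How an adaptation technique can move Gaussian means toward a new acoustic space may be illustrated with a deliberately simplified, MLLR-style sketch: one shared per-dimension scale and bias is fitted by weighted least squares and then applied to all means. Full MLLR uses a full regression matrix and the model covariances; the diagonal transform, the equal-variance assumption, and the names `fit_diagonal_transform` and `apply_transform` are assumptions for illustration only.

```python
def fit_diagonal_transform(means, frames, posteriors):
    """Fit x_d ~ a_d * mu_d + b_d per dimension by weighted least squares.

    means[i]         : mean vector of Gaussian i (base language model)
    frames[t]        : adaptation feature vector at time t
    posteriors[t][i] : posterior probability of Gaussian i at time t
    """
    dim = len(means[0])
    scale, bias = [], []
    for d in range(dim):
        s_w = s_m = s_x = s_mm = s_xm = 0.0
        for x, post in zip(frames, posteriors):
            for mu, c in zip(means, post):
                s_w += c
                s_m += c * mu[d]
                s_x += c * x[d]
                s_mm += c * mu[d] * mu[d]
                s_xm += c * x[d] * mu[d]
        denom = s_w * s_mm - s_m * s_m
        # Fall back to a pure bias shift if the scale is not identifiable.
        a = (s_w * s_xm - s_x * s_m) / denom if abs(denom) > 1e-12 else 1.0
        b = (s_x - a * s_m) / s_w
        scale.append(a)
        bias.append(b)
    return scale, bias

def apply_transform(means, scale, bias):
    """Apply the shared transform to every Gaussian mean."""
    return [[a * m + b for m, a, b in zip(mu, scale, bias)] for mu in means]

# Hypothetical 1-dimensional example with hard (0/1) posteriors.
base_means = [[0.0], [1.0]]
frames = [[0.5], [2.5]]
posteriors = [[1.0, 0.0], [0.0, 1.0]]
scale, bias = fit_diagonal_transform(base_means, frames, posteriors)
adapted = apply_transform(base_means, scale, bias)
```

Because the transform is shared, Gaussians that received little or no adaptation data are moved as well, which is the property that makes this family of techniques usable with small corpora.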
- Fig. 2 teaches the application of additional speaker adaptation techniques for the upfront training of a speech recognizer for a dialect within a base language or for a special domain, i.e. for the training before the speech recognizer is personalized to a specific user.
- the current invention suggests starting with a base speech recognizer 201 for a base language.
- an additional speech data corpus 202 is provided; the current invention suggests the usage of actual speech data, which is not comparable with a dictionary.
- This additional speech data corpus may comprise any collection of domain specific speech data and/or dialect specific speech data.
- the speech recognizer for the base language may be already used for an unsupervised collection of the additional speech data.
- the generation process comprises reestimating 203 the acoustic model parameters of said base speech recognizer by one of the available speaker adaption techniques using the additional speech data corpus, thus generating an improved adapted speech recognizer reducing the potential training effort for individual end users and at the same time improving the speaker independent recognition accuracy for specific domains and/or dialect speakers .
- the invention teaches the application of a further smoothing 204 of the reestimated acoustic model parameters.
- Bayesian smoothing is an efficient smoothing technology for that purpose. With respect to Bayesian smoothing, good results have been achieved with a smoothing factor K from the range 1 to 500 (see below for more details with respect to the smoothing approach). In particular, the range of 20 to 60 for the smoothing factor K yielded excellent results.
- the current teaching suggests iterating 205 the above mentioned generation process of reestimating the acoustic model parameters and the smoothing.
- the iteration can be based on the reestimated acoustic model parameters of the previous run or on the base acoustic model parameters.
- the iteration can be based on the decision whether the generated adapted speech recognizer shows sufficient recognition improvement.
- the iteration step may be based for example on a modified additional speech data corpus and/or on the usage of a new smoothing factor value K.
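The control flow of reestimation 203, smoothing 204 and iteration 205 can be sketched as follows. This is only an outline under stated assumptions: the function `adapt`, its parameters, and the toy stand-ins are hypothetical, and the choice to keep the candidate with the lowest error is one possible reading of "sufficient recognition improvement".

```python
def adapt(base_params, corpus, reestimate, smooth, evaluate,
          smoothing_factors, restart_from_base=False):
    """Run one reestimation (203) and smoothing (204) pass per smoothing
    factor (iteration 205) and keep the parameters with the lowest error.

    reestimate, smooth and evaluate are caller-supplied callables with
    hypothetical interfaces; evaluate returns an error rate (lower is better).
    """
    best_params, best_error = base_params, evaluate(base_params)
    current = base_params
    for k in smoothing_factors:
        # Iterate from the previous result or restart from the base model.
        start = base_params if restart_from_base else current
        candidate = smooth(start, reestimate(start, corpus), k)
        error = evaluate(candidate)
        if error < best_error:
            best_params, best_error = candidate, error
        current = candidate
    return best_params, best_error


# Toy stand-ins: a "model" is a single number, the corpus a list of numbers.
def toy_reestimate(params, corpus):
    return sum(corpus) / len(corpus)

def toy_smooth(prior, estimate, k):
    return (k * prior + estimate) / (k + 1.0)

def toy_evaluate(params):
    return abs(params - 2.0)

best, err = adapt(0.0, [2.0, 2.0], toy_reestimate, toy_smooth, toy_evaluate,
                  smoothing_factors=[1.0, 0.0])
```

Passing `restart_from_base=True` corresponds to iterating on the base acoustic model parameters instead of the parameters of the previous run.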
- speaker adaptation techniques were usually applied to an individual end user's speech data and therefore yield a speaker dependent speech recognizer
- in the current invention, they are applied to a dialect and/or domain specific collection of training data from several speakers. This allows for an improved speaker independent recognition especially (but not solely) for a given dialect and domain and minimizes the individual end user's investment to customize the recognizer to their needs.
- Another important aspect of this invention is the reduced effort for the generation of a specific speech recognizer: whereas other commercially available toolkits start from the definition of subword units and/or HMM topologies, and thus require a considerably large amount of training data, the current approach starts from an already trained general purpose speech recognizer.
- this invention suggests optionally applying Bayesian smoothing to the reestimated parameters.
- this invention suggests using the means $\mu_i$
- $\sum_t c_i(t)$ is the sum of all posterior probabilities $c_i(t)$ of the $i$-th Gaussian at time $t$, computed from all observed dialect data $x_t$; $N$ denotes the total number of mixture components
- the constant $K$ is referred to as a smoothing factor; it allows for an optimization of the recognition accuracy and depends on the relative amount of dialect training data.
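The effect of the smoothing factor can be illustrated with a MAP-style interpolation of a single Gaussian mean. The update form used here, (K · base mean + Σ c(t) x_t) / (K + Σ c(t)), is the common MAP mean update and is an assumption; the source's own smoothing equations are not legible in this extract. The function name `smooth_mean` and the data are hypothetical.

```python
def smooth_mean(base_mean, frames, posteriors, k):
    """Interpolate a base-language mean with dialect statistics.

    Assumed update: (k * base_mean + sum_t c(t) * x_t) / (k + sum_t c(t)),
    where c(t) is the posterior of this Gaussian at time t.
    """
    occupancy = sum(posteriors)
    dim = len(base_mean)
    weighted = [0.0] * dim
    for x, c in zip(frames, posteriors):
        for d in range(dim):
            weighted[d] += c * x[d]
    return [(k * base_mean[d] + weighted[d]) / (k + occupancy)
            for d in range(dim)]

# Hypothetical data: base mean at 0, four dialect frames at 2.
base = [0.0]
frames = [[2.0], [2.0], [2.0], [2.0]]
post = [1.0, 1.0, 1.0, 1.0]
small_k = smooth_mean(base, frames, post, k=1.0)    # dominated by dialect data
large_k = smooth_mean(base, frames, post, k=500.0)  # stays near the base mean
```

With little dialect data, a larger K keeps the estimate close to the base-language value, which is consistent with the statement that the best K depends on the relative amount of dialect training data.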
- Figure 3 compares the relative speaker independent error rates achieved with the baseline recognizer.
- Figure 3 shows a comparison of the error rates of the baseline recognizer (VV) , the standard training procedure (VV-S), and the scalable fastboot method (VV-G) normalized to the error rate of the baseline recognizer (VV) for the German test speakers.
- the error rate for the Austrian speakers increases by more than 50 percent, showing the need to improve the recognition accuracy for dialect speakers. Therefore, for the follow up product, ViaVoice Gold (VV-G), less than 50 hours of speech from approx. one hundred native Austrian speakers (50 % female, 50 % male) have been collected and applied with the fastboot approach for the upfront training of the recognizer according to the current invention.
- Figure 3 compares the results achieved with the fastboot method (VV-G) to the standard training procedure (VV-S), that can be applied if both training corpora are pooled together. It becomes evident that the fastboot method is superior to the standard procedure and yields a 30 percent improvement for the dialect speakers.
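The normalization used for such comparisons is simple arithmetic and can be sketched as below. The error rates are hypothetical placeholders chosen only to make the calculation visible; they are not taken from the source or from Figure 3, and the function names are assumptions.

```python
def normalize_error_rates(error_rates, baseline_key):
    """Express every error rate relative to the baseline recognizer."""
    baseline = error_rates[baseline_key]
    return {name: rate / baseline for name, rate in error_rates.items()}

def relative_improvement(reference, improved):
    """Fraction by which `improved` reduces the error rate of `reference`."""
    return (reference - improved) / reference

# Hypothetical error rates, NOT values from the source.
rates = {"VV": 0.20, "VV-S": 0.25, "VV-G": 0.15}
normalized = normalize_error_rates(rates, "VV")
gain = relative_improvement(rates["VV-S"], rates["VV-G"])
```

Here `normalized["VV"]` is 1 by construction, and `gain` measures VV-G against VV-S rather than against the baseline.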
- the results for different values of the smoothing factor show that recognition accuracy is scalable, which is an important feature, if an integrated recognizer for base language and dialect (or - orthogonal to this direction - base domain and specific domain) is needed.
- the pooled training corpus of the common recognizer (VV-S) is approx.
- the fastboot approach offers a scalable recognition accuracy, if dialects and/or specific domains are handled in an integrated speech recognizer.
- the fastboot approach uses only a small amount of additional domain specific or dialect data, which is inexpensive and easy to collect.
- The fastboot approach reduces the time for the upfront training of the recognizer, and therefore allows for the rapid development of new data files for recognizers in specific environments.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AT99924814T ATE231642T1 (en) | 1998-04-22 | 1999-04-21 | ADAPTATION OF A LANGUAGE RECOGNIZER TO DIALECTIC AND LINGUISTIC FIELD VARIANTS |
DE69905030T DE69905030T2 (en) | 1998-04-22 | 1999-04-21 | ADAPTING A SPEAKER TO DIALECTIC AND LINGUISTIC AREAS |
EP99924814A EP1074019B1 (en) | 1998-04-22 | 1999-04-21 | Adaptation of a speech recognizer for dialectal and linguistic domain variations |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US8265698P | 1998-04-22 | 1998-04-22 | |
US60/082,656 | 1998-04-22 | ||
US6611398A | 1998-04-23 | 1998-04-23 | |
US09/066,113 | 1998-04-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999054869A1 (en) | 1999-10-28 |
Family
ID=26746379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP1999/002673 WO1999054869A1 (en) | 1998-04-22 | 1999-04-21 | Adaptation of a speech recognizer for dialectal and linguistic domain variations |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1074019B1 (en) |
CN (1) | CN1157711C (en) |
AT (1) | ATE231642T1 (en) |
DE (1) | DE69905030T2 (en) |
TW (1) | TW477964B (en) |
WO (1) | WO1999054869A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1136982A2 (en) * | 2000-03-24 | 2001-09-26 | Philips Corporate Intellectual Property GmbH | Generation of a language model and an acoustic model for a speech recognition system |
EP1215653A1 (en) * | 2000-12-18 | 2002-06-19 | Siemens Aktiengesellschaft | Method and system for speech recognition for a small size implement |
US6999925B2 (en) | 2000-11-14 | 2006-02-14 | International Business Machines Corporation | Method and apparatus for phonetic context adaptation for improved speech recognition |
CN102543071A (en) * | 2011-12-16 | 2012-07-04 | 安徽科大讯飞信息科技股份有限公司 | Voice recognition system and method used for mobile equipment |
CN104751844A (en) * | 2015-03-12 | 2015-07-01 | 深圳市富途网络科技有限公司 | Voice identification method and system used for security information interaction |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE466361T1 (en) * | 2006-08-11 | 2010-05-15 | Harman Becker Automotive Sys | LANGUAGE RECOGNITION USING A STATISTICAL LANGUAGE MODEL USING SQUARE ROOT SMOOTHING |
CN103839546A (en) * | 2014-03-26 | 2014-06-04 | 合肥新涛信息科技有限公司 | Voice recognition system based on Yangze river and Huai river language family |
CN104766607A (en) * | 2015-03-05 | 2015-07-08 | 广州视源电子科技股份有限公司 | Television program recommendation method and system |
CN106384587B (en) * | 2015-07-24 | 2019-11-15 | 科大讯飞股份有限公司 | A kind of audio recognition method and system |
CN107452403B (en) * | 2017-09-12 | 2020-07-07 | 清华大学 | Speaker marking method |
CN112133290A (en) * | 2019-06-25 | 2020-12-25 | 南京航空航天大学 | Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field |
CN112767961B (en) * | 2021-02-07 | 2022-06-03 | 哈尔滨琦音科技有限公司 | Accent correction method based on cloud computing |
1999
- 1999-03-12 TW TW088103857A patent/TW477964B/en not_active IP Right Cessation
- 1999-04-21 DE DE69905030T patent/DE69905030T2/en not_active Expired - Lifetime
- 1999-04-21 CN CNB99805299XA patent/CN1157711C/en not_active Expired - Fee Related
- 1999-04-21 WO PCT/EP1999/002673 patent/WO1999054869A1/en active IP Right Grant
- 1999-04-21 EP EP99924814A patent/EP1074019B1/en not_active Expired - Lifetime
- 1999-04-21 AT AT99924814T patent/ATE231642T1/en not_active IP Right Cessation
Non-Patent Citations (3)
Title |
---|
"BUILDING BASEFORMS FOR A NEW APPLICATION DOMAIN", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 36, no. 4, 1 April 1993 (1993-04-01), pages 93 - 94, XP000364452, ISSN: 0018-8689 * |
DIAKOLOUKAS V ET AL: "Development of dialect-specific speech recognizers using adaptation methods", 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (CAT. NO.97CB36052), 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, MUNICH, GERMANY, 21-24 APRIL 1997, 1997, Los Alamitos, CA, USA, IEEE Comput. Soc. Press, USA, pages 1455 - 1458 vol.2, XP002111686, ISBN: 0-8186-7919-0 * |
HSIAO-WUEN HON ET AL: "TOWARDS SPEECH RECOGNITION WITHOUT VOCABULARY-SPECIFIC TRAINING", PROCEEDINGS OF THE EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY (EUROSPEECH), PARIS, SEPT. 26 - 28, 1989, vol. 1, no. CONF. 1, 26 September 1989 (1989-09-26), TUBACH J P;MARIANI J J, pages 481 - 484, XP000209672 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1136982A2 (en) * | 2000-03-24 | 2001-09-26 | Philips Corporate Intellectual Property GmbH | Generation of a language model and an acoustic model for a speech recognition system |
EP1136982A3 (en) * | 2000-03-24 | 2004-03-03 | Philips Intellectual Property & Standards GmbH | Generation of a language model and an acoustic model for a speech recognition system |
US6999925B2 (en) | 2000-11-14 | 2006-02-14 | International Business Machines Corporation | Method and apparatus for phonetic context adaptation for improved speech recognition |
EP1215653A1 (en) * | 2000-12-18 | 2002-06-19 | Siemens Aktiengesellschaft | Method and system for speech recognition for a small size implement |
CN102543071A (en) * | 2011-12-16 | 2012-07-04 | 安徽科大讯飞信息科技股份有限公司 | Voice recognition system and method used for mobile equipment |
CN104751844A (en) * | 2015-03-12 | 2015-07-01 | 深圳市富途网络科技有限公司 | Voice identification method and system used for security information interaction |
Also Published As
Publication number | Publication date |
---|---|
TW477964B (en) | 2002-03-01 |
DE69905030D1 (en) | 2003-02-27 |
CN1157711C (en) | 2004-07-14 |
EP1074019A1 (en) | 2001-02-07 |
EP1074019B1 (en) | 2003-01-22 |
ATE231642T1 (en) | 2003-02-15 |
DE69905030T2 (en) | 2003-11-27 |
CN1298533A (en) | 2001-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6999925B2 (en) | Method and apparatus for phonetic context adaptation for improved speech recognition | |
US5995928A (en) | Method and apparatus for continuous spelling speech recognition with early identification | |
US5680510A (en) | System and method for generating and using context dependent sub-syllable models to recognize a tonal language | |
US5862519A (en) | Blind clustering of data with application to speech processing systems | |
US8386254B2 (en) | Multi-class constrained maximum likelihood linear regression | |
JP2002500779A (en) | Speech recognition system using discriminatively trained model | |
EP1022725B1 (en) | Selection of acoustic models using speaker verification | |
JP5660441B2 (en) | Speech recognition apparatus, speech recognition method, and program | |
JPH09152886A (en) | Unspecified speaker mode generating device and voice recognition device | |
EP1074019B1 (en) | Adaptation of a speech recognizer for dialectal and linguistic domain variations | |
US5706397A (en) | Speech recognition system with multi-level pruning for acoustic matching | |
Chen et al. | Automatic transcription of broadcast news | |
Ranjan et al. | Isolated word recognition using HMM for Maithili dialect | |
Williams | Knowing what you don't know: roles for confidence measures in automatic speech recognition | |
Sawant et al. | Isolated spoken Marathi words recognition using HMM | |
EP0562138A1 (en) | Method and apparatus for the automatic generation of Markov models of new words to be added to a speech recognition vocabulary | |
Schlüter et al. | Comparison of optimization methods for discriminative training criteria. | |
CN114360514A (en) | Speech recognition method, apparatus, device, medium, and product | |
Justo et al. | Improving dialogue systems in a home automation environment | |
JP3776391B2 (en) | Multilingual speech recognition method, apparatus, and program | |
Shen et al. | Automatic selection of phonetically distributed sentence sets for speaker adaptation with application to large vocabulary Mandarin speech recognition | |
JP3589044B2 (en) | Speaker adaptation device | |
Mohanty et al. | Isolated Odia digit recognition using HTK: an implementation view | |
EP1205907B1 (en) | Phonetic context adaptation for improved speech recognition | |
Breslin | Generation and combination of complementary systems for automatic speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 99805299.X; Country of ref document: CN |
AK | Designated states | Kind code of ref document: A1; Designated state(s): CN DE HU IN JP KR PL |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: IN/PCT/2000/00211/DE; Country of ref document: IN |
NENP | Non-entry into the national phase | Ref country code: KR |
WWE | Wipo information: entry into national phase | Ref document number: 1999924814; Country of ref document: EP |
WWP | Wipo information: published in national office | Ref document number: 1999924814; Country of ref document: EP |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWG | Wipo information: grant in national office | Ref document number: 1999924814; Country of ref document: EP |