USRE45289E1 - Selective noise/channel/coding models and recognizers for automatic speech recognition - Google Patents


Info

Publication number
USRE45289E1
Authority
US
United States
Prior art keywords
noise
speech recognition
models
yield
background noise
Prior art date
Legal status
Expired - Lifetime
Application number
US09/978,250
Inventor
Randy G. Goldberg
Kenneth H. Rosen
Richard M. Sachs
Joel A. Winthrop
Current Assignee
Nuance Communications Inc
AT&T Properties LLC
Original Assignee
AT&T Intellectual Property II LP
Priority date
Filing date
Publication date
Application filed by AT&T Intellectual Property II LP
Priority to US09/978,250
Assigned to AT&T Corp. (assignment of assignors' interest). Assignors: Rosen, Kenneth H.; Winthrop, Joel A., III; Goldberg, Randy G.; Sachs, Richard M.
Application granted
Publication of USRE45289E1
Assigned to AT&T Properties, LLC (assignment of assignors' interest). Assignors: AT&T Corp.
Assigned to AT&T Intellectual Property II, L.P. (assignment of assignors' interest). Assignors: AT&T Properties, LLC
Assigned to Nuance Communications, Inc. (assignment of assignors' interest). Assignors: AT&T Intellectual Property II, L.P.
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Definitions

  • Correctness of the speech recognition in step 180 may be determined in several ways. If more than one speech recognizer means 30 is being used, the correct recognition of the speech utterance may be determined by a voter scheme. That is, each speech recognizer unit 30, using a set of models with different background noise characteristics, analyzes the speech utterance, and a vote determines which analysis is correct. For example, if fifty recognizers determine that “Boston” has been said by the caller, and twenty recognizers determine that “Baltimore” has been said, then the system determines in step 180 that “Boston” must be the correct speech utterance. Alternatively, or in conjunction with the above method, the system can ask the caller to validate the determined speech utterance, for example by prompting “Is this correct?”. A determination of correctness in step 180 can thus be made on the basis of the most correct validations by the user and/or the lowest number of rejections (rejections could be set high).
  • If speech recognition is not proceeding at an acceptable level, then in step 185 the system returns to step 160 to load a new model, perhaps the model that was determined in step 140 to be next in order.
  • the minimal criteria of correctness may be set at any level deemed appropriate and most often will be experimentally determined on the basis of each individual system and its own separate characteristics.
  • If the determination in step 180 is that speech recognition is proceeding at an acceptable level, then the system can proceed to carry out the caller's desired functions, as shown in step 190.
  • the present invention has many advantageous uses.
  • the system is able to provide robust speech recognition in a variety of noisy environments.
  • the present invention works well over a gamut of different noisy environments and is thus easy to implement.
  • the speech recognition system is capable of a higher performance and a lower error rate than current systems. Even when the error rate begins to approach an unacceptable level, the present system automatically corrects itself by switching to a different model(s).
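The voter scheme described for step 180 amounts to a majority vote over the outputs of several recognizers, each using a different set of background noise models. The following is a minimal sketch using Python's `collections.Counter`; the function name `vote` is hypothetical, and breaking ties in favour of the hypothesis seen first is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def vote(hypotheses: Iterable[str]) -> Tuple[Optional[str], int]:
    """Majority vote over the outputs of several speech recognizers.

    Returns (winning hypothesis, number of votes). Ties go to the hypothesis
    encountered first, because Counter.most_common keeps first-seen order
    for equal counts.
    """
    counts = Counter(hypotheses)
    if not counts:
        return None, 0
    winner, n_votes = counts.most_common(1)[0]
    return winner, n_votes


# The example from step 180: fifty recognizers hear "Boston", twenty hear "Baltimore".
print(vote(["Boston"] * 50 + ["Baltimore"] * 20))  # ('Boston', 50)
```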

Abstract

An apparatus and method for the robust recognition of speech during a call in a noisy environment is presented. Specific background noise models are created to model various background noises which may interfere in the error free recognition of speech. These background noise models are then used to determine which noise characteristics a particular call has. Once a determination has been made of the background noise in any given call, speech recognition is carried out using the appropriate background noise model.

Description

FIELD OF THE INVENTION
The present invention relates to the robust recognition of speech in noisy environments using specific noise environment models and recognizers, and more particularly, to selective noise/channel/coding models and recognizers for automatic speech recognition.
BACKGROUND INFORMATION
Many speech recognition applications in use today have difficulty recognizing speech correctly against a noisy background. Others work well in one noisy background environment but not in another. That is, when a speaker talks into a pick-up microphone or telephone against a background filled with extraneous noise, the speech recognition application may misrecognize the speech and is thus prone to error. As a result, the speaker's time and effort are wasted, and the goals of the speech recognition application are often not achieved. In telephone applications, a human operator must often have the speaker repeat what was said or attempt to decipher what was recorded.
Thus, there has been a need for speech recognition applications to be able to correctly assess what has been spoken in a noisy background environment. U.S. Pat. No. 5,148,489, issued Sep. 15, 1992 to Erell et al., relates to the preprocessing of noisy speech to minimize the likelihood of errors. The speech is preprocessed by calculating, for each vector of speech in the presence of noise, an estimate of the clean speech. The calculations use what is called minimum-mean-log-spectral-distance estimation, based on mixture models and Markov models. However, the preprocessing calculations rely on the basic assumption that the speech and noise are uncorrelated, which is what allows the clean speech to be modeled. As this assumption may not hold in all cases, errors may still occur.
U.S. Pat. No. 4,933,973, issued Jun. 12, 1990 to Porter, relates to the recognition of incoming speech signals in noise. Pre-stored templates of noise-free speech are modified to have the estimated spectral values of noise and the same signal-to-noise ratio as the incoming signal. Once modified, the templates are compared within a processor by a recognition algorithm. Thus recognition is dependent upon proper modification of the noise-free templates. If modification is incorrectly carried out, errors may still be present in the speech recognition.
U.S. Pat. No. 4,720,802, issued Jan. 19, 1988 to Damoulakis et al., relates to a noise compensation arrangement. Speech recognition is carried out by extracting an estimate of the background noise during unknown speech input. The noise estimate is then used to modify pre-stored noiseless speech reference signals for comparison with the unknown speech input. The comparison is accomplished by averaging values and generating sets of probability density signals. Correct recognition of the unknown speech thus relies upon the proper estimation of the background noise and proper selection of the speech reference signals. Improper estimation and selection may cause errors to occur in the speech recognition.
Thus, as can be seen, the industry has not yet provided a system of robust speech recognition which can function effectively in various noisy backgrounds.
SUMMARY OF THE INVENTION
In response to the above-noted and other deficiencies, the present invention provides a method and an apparatus for robust speech recognition in various noisy environments. Thus the speech recognition system of the present invention is capable of higher performance than currently known methods, in noisy and other environments alike. Additionally, the present invention provides noise models, created to handle specific background noises, which can quickly be matched to the background noise of a specific call.
To achieve the foregoing, and in accordance with the purposes of the present invention, as embodied and broadly described herein, the present invention is directed to the robust recognition of speech in noisy environments using specific noise environment models and recognizers. Thus models of various noise environments are created to handle specific background noises. A real-time system then analyzes the background noise of an incoming call, loads the appropriate noise model and performs the speech recognition task with the model.
The background noise models themselves are created for each set of background noise which may be used. Examples of the background noises to be sampled as models would be: city noise, motor vehicle noise, truck noise, airport noise, subway train noise, cellular interference noise, etc. Obviously, the models need not be limited to simple background noise. For instance, various models may model different channel conditions, different telephone microphone characteristics, various cellular coding techniques, Internet connections, and other noises associated with the placement of a call wherein speech recognition is to be used. Further, a complete set of sub-word models can be created for each characteristic by mixing different background noise characteristics.
Actual creation and collection of the models can be accomplished in any known manner, or in any manner developed hereafter, as long as the noise sampled can be loaded into a speech recognizer. For instance, models can be created by recording background noise and clean speech separately and later combining the two. Or, models can be created by recording speech with the various background noise environments present. Or, as a further example, the models can be created using signal processing of recorded speech to alter it as if it had been recorded in the noisy background.
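The first creation method above, recording background noise and clean speech separately and later combining the two, is commonly done by adding the noise to the speech at a chosen signal-to-noise ratio (SNR). The sketch below assumes plain additive mixing of sampled signals held as lists of floats; the function name `mix_at_snr` and the SNR-based scaling are illustrative assumptions, not a procedure stated in this patent.

```python
import math
from typing import List, Sequence


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Combine separately recorded clean speech and background noise at a target SNR (dB)."""
    if not clean or not noise:
        raise ValueError("both recordings must be non-empty")
    # Repeat the noise recording so that it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in tiled) / len(tiled)
    if p_noise == 0.0:
        raise ValueError("noise recording is silent")
    # Gain g chosen so that 10*log10(p_clean / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, tiled)]
```

Calling such a function with one noise recording (say, city noise) at several SNR values would yield a family of training utterances for that single noise condition.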
The speech recognition apparatus determines which model to use. At the beginning of a call, a sample of the background environment from where the call is being placed is recorded. While introductory prompts or other such messages are being played to the caller, the system analyzes the recorded background noise. Different methods of analysis may be used. Once the appropriate noise model has been chosen on the basis of the analysis, speech recognition is performed with the model. The system can also constantly monitor the speech recognition function, and if it is determined that speech recognition is not at an acceptable level, the system can replace the chosen model with another.
The present invention and its features and advantages will become more apparent from the following detailed description with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a speech recognition apparatus for the creation, storage and use of various background noise models, according to an embodiment of the present invention.
FIG. 2 illustrates a flow chart for determination of the proper noise model to use, according to an embodiment of the present invention.
FIG. 3 illustrates a flow chart for robust speech recognition and, if necessary, model replacement, according to an embodiment of the present invention.
FIG. 4 illustrates a chart of an example of the selection of an appropriate background noise model to be used in the speech recognition application, according to an embodiment of the present invention.
DETAILED DESCRIPTION
FIGS. 1 to 4 show a speech recognition apparatus and method for robust speech recognition in noisy environments according to an embodiment of the present invention. A hidden Markov model is created to model a specific background noise. When a call is placed, background noise is recorded and analyzed to determine which Markov model is most appropriate to use. Speech recognition is then carried out using the appropriately determined model. If speech recognition is not being performed at an acceptable level, the model may be replaced by another.
Referring to FIG. 1, various background noises 1, . . . , n, n+1 are recorded using known sound collection devices, such as pick-up microphones 1, . . . , n, n+1. It is to be understood, of course, that any collection technique, whether now known or developed hereafter, may be used. The various background noises which can be recorded are sounds such as: city noise, traffic noise, airport noise, subway train noise, cellular interference noise, noise from different channel characteristics, noise from various cellular coding techniques, Internet connection noise, etc. Of course, the various individual background characteristics may also be mixed in countless combinations. For example, cellular channel characteristics noise may be mixed with background traffic noise. It is to be understood, of course, that other kinds of background noise may also be recorded, that what is recorded is not limited, and that any means sufficient for the recording and/or storage of sound may be used.
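One way to mix individual characteristics, such as a cellular channel with background traffic noise, is sketched below. It assumes the channel can be approximated by a short linear FIR filter and that the background noises add linearly; the function names and this filter-plus-additive-noise model are hypothetical simplifications rather than the patent's method.

```python
from typing import List, Sequence


def apply_channel(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """FIR-filter `signal` with `taps`, a crude stand-in for a channel response.

    The output is truncated to the input length.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def composite_condition(signal: Sequence[float], channel_taps: Sequence[float],
                        noises: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Pass `signal` through a channel, then add several weighted background noises."""
    mixed = apply_channel(signal, channel_taps)
    for noise, w in zip(noises, weights):
        # Each noise recording is repeated as needed to cover the signal.
        for i in range(len(mixed)):
            mixed[i] += w * noise[i % len(noise)]
    return mixed
```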
The recorded background noise is then modeled to create hidden Markov models for use in speech recognizers. Modeling is performed in the modeling device 10 using known modeling techniques. In this embodiment, the recorded background noise and pre-labeled speech data are put through algorithms which pick out phonemes, in essence creating statistical background noise models. In this embodiment, then, the models are created by recording background noise and clean speech separately and later combining the two.
Of course, it is to be recognized that any method capable of creating noise models which can be loaded into a speech recognizer can be used in the present invention. For instance, models can be created by recording speech with the various background noise environments present. Or, for example, the models can be created using signal processing of the recorded speech to alter it as if it had been recorded in the noisy background.
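The text does not specify how a recorded noise sample is scored against the hidden Markov noise models. A standard way to do so is the forward algorithm, which computes the likelihood of an observation sequence under an HMM. This sketch assumes the noise frames have already been vector-quantized into discrete symbols and that each model's parameters (`start`, `trans`, `emit`) are available; the function name, the symbol meanings and the example models are hypothetical.

```python
import math
from typing import Sequence


def forward_loglik(obs: Sequence[int], start: Sequence[float],
                   trans: Sequence[Sequence[float]],
                   emit: Sequence[Sequence[float]]) -> float:
    """Return log P(obs | HMM) for a discrete-observation HMM (scaled forward algorithm).

    start[i]    : probability of starting in state i
    trans[j][i] : probability of moving from state j to state i
    emit[i][k]  : probability that state i emits symbol k
    """
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission probability.
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return loglik


# Hypothetical one-state models over two noise symbols (0 = steady hum, 1 = burst).
models = {
    "motor_vehicle": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "city_noise": ([1.0], [[1.0]], [[0.3, 0.7]]),
}
observed = [0, 0, 1, 0, 0]
best = max(models, key=lambda name: forward_loglik(observed, *models[name]))
```

Here `best` names the model under which the observed noise is most likely; real noise models would have several states and far richer observations.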
The modeled background noise is then stored in an appropriate storage device 20. The storage device 20 itself may be located at a central network hub, or it may be reproduced and distributed locally. The various stored background noise models 1, . . . , n, n+1 are then appropriately accessed from the storage device 20 by a speech recognition unit 30 when a call is placed by the telephone user 40. There may, of course, be more than one speech recognition unit 30 used for any given call. Further, the present invention will work equally well with any technique of speech recognition using the background noise models.
Referring to FIG. 2, a call is placed by a user and received by the telephone company in steps 100 and 110, respectively. It is to be recognized, of course, that although the preferred embodiment described herein is in the context of the receipt of a simple telephone call, the present invention will work equally well with any speech transmission technique used and thus is not to be limited to the one embodiment. Once the connection has been made, in step 120, approximately 2 seconds' worth of background noise at the caller's location is recorded and/or monitored. Of course, various lengths of time may be used based upon adequate reception and other factors. Introductory messages, instructions or the like are then played in step 125. While these messages are being played, the background noise recorded in step 120 is analyzed by the system in step 130. Even while the messages are being played to the caller, the known technique of echo cancellation may be used to record and/or monitor further background noise. That is, the system effectively cancels the messages being played out of the recorded and/or monitored background noise.
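Echo cancellation during step 125 can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter, a common echo-cancellation technique that the patent itself does not name. In this sketch the prompt being played is the reference signal, the microphone signal is assumed to contain the prompt's echo plus the caller's background noise, and the filter's residual approximates that background noise. The function name, tap count and step size are arbitrary assumptions.

```python
from typing import List, Sequence


def nlms_echo_cancel(prompt: Sequence[float], mic: Sequence[float],
                     taps: int = 16, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Remove the adaptively estimated prompt echo from the microphone signal.

    The returned residual approximates the background noise at the caller's location.
    """
    w = [0.0] * taps    # adaptive estimate of the echo path
    buf = [0.0] * taps  # most recent prompt samples, newest first
    residual = []
    for x, d in zip(prompt, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # predicted echo of the prompt
        e = d - y                                   # microphone minus predicted echo
        norm = sum(b * b for b in buf) + eps        # input power, for step normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual
```

Because the background noise is uncorrelated with the prompt, the filter learns only the echo path, and the noise passes through to the residual where it can be analyzed in step 130.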
Analysis of the background noise may be accomplished in one or more ways. Signal information, such as the type of signals (ANI, DNIS, SS7 signals, etc.), channel port number, or trunk line number, may be used to help restrict what the background noise is, and thus what background noise model would be most suitable. For example, the system may determine that a call received over a particular trunk line number is more likely than not from India, as that trunk line number is the designated trunk for receiving calls from India. Further, the location of the call may be recognized by the caller's account number, the time the call is placed or other known information about the caller and/or the call. Such information could be used as a preliminary indicator of the existence and type of background noise.
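Signal information of this kind can be sketched as a prior that reorders, rather than strictly restricts, the candidate noise models, so that models suggested by the trunk line are tried first and the rest remain available as fallbacks. The trunk-to-model table, the `call_info` key and the model names below are hypothetical; a deployed system would presumably derive them from network configuration and the call's signalling data.

```python
from typing import List, Mapping, Sequence


def candidate_models(call_info: Mapping[str, str],
                     trunk_priors: Mapping[str, Sequence[str]],
                     all_models: Sequence[str]) -> List[str]:
    """Order noise models so that those suggested by the call's trunk are tried first.

    Models not suggested by the trunk are kept, in their original order, as fallbacks.
    """
    preferred = trunk_priors.get(call_info.get("trunk", ""), [])
    # dict.fromkeys removes duplicates while preserving order.
    ordered = list(dict.fromkeys(m for m in preferred if m in all_models))
    ordered += [m for m in all_models if m not in ordered]
    return ordered
```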
Alternatively, or in conjunction with the preceding method, a series of questions or instructions to be posed to the caller, with corresponding answers to be made by the caller, may be used. These answers may then be analyzed using each model (or a pre-determined maximum number of models) to determine which models have a higher correct match percentage. For example, the system may carry on a dialog with the caller and instruct the caller to say “NS437W”, “Boston”, and “July 1st”. The system will then analyze each response using the various background noise models. The model(s) with the correct match for each response by the caller can then be used in the speech recognition application. An illustration of the above analysis method is found in FIG. 4. As can be seen, the first response, “NS437W”, is correctly matched by models 2, 4 and n. However, only models 2 and n correctly match the second response, and only model n matches all three responses correctly. Thus model n would be chosen for the subsequent speech recognition application.
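The FIG. 4 selection rule, choosing the model that correctly matches the most prompted responses, can be sketched as follows. The recognizer outputs are invented for illustration and only reproduce the match pattern described above (first response matched by models 2, 4 and n; second by models 2 and n; all three only by model n); the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def select_by_prompts(expected: Sequence[str],
                      hypotheses: Dict[str, Sequence[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Count, per model, how many prompted responses were recognized correctly.

    Returns the best-scoring model(s) and the full score table.
    """
    scores = {model: sum(h == e for h, e in zip(hyps, expected))
              for model, hyps in hypotheses.items()}
    best = max(scores.values())
    winners = [m for m, s in scores.items() if s == best]
    return winners, scores


# Prompts from the example dialog; recognizer outputs are invented for illustration.
expected = ["NS437W", "Boston", "July 1st"]
hypotheses = {
    "model_1": ["MS437W", "Austin", "July 1st"],
    "model_2": ["NS437W", "Boston", "June 1st"],
    "model_4": ["NS437W", "Bolton", "July 1st"],
    "model_n": ["NS437W", "Boston", "July 1st"],
}
winners, scores = select_by_prompts(expected, hypotheses)
# winners == ["model_n"]; scores == {"model_1": 1, "model_2": 2, "model_4": 2, "model_n": 3}
```

Returning every tied model, rather than a single one, leaves room for the options discussed next: guessing, or running several recognizers in parallel.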
Also, if the system is unable to definitively decide which model and/or models yield the best performance in the speech recognition application, the system may either guess, use more than one model by using more than one speech recognizer, or compare parameters of the call's recorded background noise to parameters contained in each background noise model.
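For the last option, comparing parameters of the recorded noise with parameters stored in each model, one simple possibility is to give each model a diagonal-covariance Gaussian over some per-frame noise feature (band energies, for instance) and rank the models by average log-likelihood. This single-Gaussian comparison is an assumed simplification of the hidden Markov models described here; the feature choice, type alias and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (per-dimension means, per-dimension variances) stored with each noise model
NoiseStats = Tuple[Sequence[float], Sequence[float]]


def diag_gauss_loglik(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of feature vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def rank_by_parameters(frames: Sequence[Sequence[float]],
                       models: Dict[str, NoiseStats]) -> List[str]:
    """Rank models by average log-likelihood of the recorded noise frames, best first."""
    scores = {name: sum(diag_gauss_loglik(f, mu, var) for f in frames) / len(frames)
              for name, (mu, var) in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```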
Once a call from a particular location has been matched to a background noise model, the system can store that information in a database. Thus, in step 135, a database recording which background noise models are most successful in properly analyzing the call's background noise can be created and stored. This database can later be accessed when another call is received from the same location. For example, suppose it has previously been determined, and stored in the database, that calls from a particular location should use the city noise background noise model in the speech recognition application, because that model yields the highest percentage of correct speech recognitions. The most appropriate model is thus used for the new call. Of course, the system can update itself dynamically by continually re-analyzing the call's recorded background noise to detect changes in the background noise environment.
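A per-location success database of the kind described in step 135 might look like the class below. This is a sketch under stated assumptions: the location key (here a calling number), the class name `NoiseModelDatabase`, and the choice of a simple correct/attempt ratio as the success measure are all hypothetical. Because every new outcome is recorded, the preferred model for a location changes as its noise environment changes.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class NoiseModelDatabase:
    """Per-location record of how often each noise model recognized correctly."""

    def __init__(self) -> None:
        # (location, model) -> [correct recognitions, attempts]
        self._stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, location: str, model: str, correct: bool) -> None:
        """Store the outcome of one recognition attempt."""
        stats = self._stats[(location, model)]
        stats[0] += int(correct)
        stats[1] += 1

    def best_model(self, location: str) -> Optional[str]:
        """Model with the highest success rate for this location, if any."""
        rates = {
            model: correct / attempts
            for (loc, model), (correct, attempts) in self._stats.items()
            if loc == location and attempts > 0
        }
        if not rates:
            return None
        return max(rates, key=lambda model: rates[model])


db = NoiseModelDatabase()
for _ in range(3):
    db.record("212-555-0100", "city", True)
db.record("212-555-0100", "car", True)
db.record("212-555-0100", "car", False)
```

After these calls `db.best_model("212-555-0100")` favors the city model (3 of 3 correct versus 1 of 2); an unseen location returns `None`, signalling that the full analysis should be run.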
Once the call's recorded background noise has been analyzed, or the database has been consulted to determine where the call is coming from and which model is most appropriate, the most appropriate background noise model is selected in step 140 and recalled from the storage means 20. Alternative background noise models may also be ordered on a standby basis in case speech recognition fails with the selected model. With the most appropriate model selected and the other models ordered on standby, the system proceeds in step 150 to the speech recognition application using the selected model.
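Step 140's selection and standby ordering can be sketched as a sort over whatever suitability scores the analysis produced. The scores below are hypothetical, and `select_with_standby` is an illustrative name; the standby models are kept in a `deque` so that the next one can be taken cheaply from the front if recognition fails.

```python
from collections import deque
from typing import Deque, Dict, Tuple


def select_with_standby(scores: Dict[str, float]) -> Tuple[str, Deque[str]]:
    """Return the best-scoring model plus the rest, best first, on standby."""
    if not scores:
        raise ValueError("no candidate models")
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[0], deque(ranked[1:])


# Hypothetical suitability scores from the background noise analysis.
primary, standby = select_with_standby({"city": 0.92, "car": 0.71, "airport": 0.40})
```

Python's sort is stable even with `reverse=True`, so models with equal scores stay in the order the analysis listed them.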
Referring to FIG. 3, in step 160 the selected background noise model is loaded into the speech recognition unit 30. Here speech recognition is performed using the chosen model. There is more than one method by which the speech recognition can be performed using the background noise model. The speech utterance by the caller can be routed to a preset recognizer with the specific model(s) needed, or the necessary model(s) may be loaded into the speech recognition means 30. In step 180 the correctness of the speech recognition is determined. In this manner then, constant monitoring and adjustment can take place while the call is in progress if necessary.
Correctness of the speech recognition in step 180 may be determined in several ways. If more than one speech recognizer means 30 is in use, the correct recognition of the speech utterance may be determined by a voter scheme. That is, each speech recognizer unit 30, using a set of models with different background noise characteristics, analyzes the speech utterance, and a vote determines which analysis is correct. For example, if fifty recognizers determine that the caller said “Boston” and twenty recognizers determine that the caller said “Baltimore”, then the system determines in step 180 that “Boston” is the correct speech utterance. Alternatively, or in conjunction with the above method, the system can ask the caller to validate the recognized utterance, for example by prompting “Is this correct?”. A determination of correctness in step 180 can thus be made on the basis of the most correct validations by the user and/or the fewest rejections (rejections could be set high).
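The voter scheme reduces to a majority count over the recognizers' outputs. This minimal sketch uses `collections.Counter`; the function name `vote` is hypothetical, and the winner's share of the vote is returned as well, on the assumption that a system might also want to know how decisive the vote was.

```python
from collections import Counter
from typing import Iterable, Tuple


def vote(hypotheses: Iterable[str]) -> Tuple[str, float]:
    """Majority vote over recognizer outputs; returns the winner and its share."""
    counts = Counter(hypotheses)
    if not counts:
        raise ValueError("no recognizer output to vote on")
    winner, n = counts.most_common(1)[0]
    return winner, n / sum(counts.values())


# The example from the text: 50 recognizers hear "Boston", 20 hear "Baltimore".
winner, share = vote(["Boston"] * 50 + ["Baltimore"] * 20)
```

Here `winner` is "Boston" with a share of 50/70. On a tie, `most_common` returns the hypothesis encountered first, so a real system would likely want an explicit tie-breaking rule, such as asking the caller to confirm.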
If the minimum criterion of correctness is not met, and the background noise model loaded in step 160 is therefore determined to be an unsuitable choice, a new model can be loaded. Thus, in step 185, the system returns to step 160 to load a new model, perhaps the model that was ranked next in order in step 140. The minimum criterion of correctness may be set at any level deemed appropriate and will most often be determined experimentally for each individual system and its particular characteristics.
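The loop through steps 160, 180 and 185 can be sketched as trying the selected model and then each standby model in turn. The recognizer interface here is an assumption: a hypothetical callable returning a hypothesis and a confidence value, with `min_confidence` standing in for the experimentally set minimum criterion of correctness. The text equally allows that criterion to come from votes or caller validations.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

# Hypothetical recognizer interface: (utterance, model name) -> (text, confidence).
Recognizer = Callable[[bytes, str], Tuple[str, float]]


def recognize_with_fallback(utterance: bytes, primary: str, standby: Deque[str],
                            recognize: Recognizer,
                            min_confidence: float) -> Optional[Tuple[str, str]]:
    """Try the primary model, then standby models in order, until a result
    meets the minimum criterion. Returns (model, text), or None if all fail.

    Models taken from standby are consumed (popped from the deque).
    """
    model = primary
    while True:
        text, confidence = recognize(utterance, model)
        if confidence >= min_confidence:
            return model, text
        if not standby:
            return None
        model = standby.popleft()
```

Consuming the standby queue means a later failure in the same call continues from where the last one stopped instead of retrying models that already failed.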
If the determination in step 180 is that speech recognition is proceeding at an acceptable level, then the system can proceed to carry out the caller's desired functions, as shown in step 190.
The present invention therefore has many advantageous uses. For instance, the system provides robust speech recognition in a variety of noisy environments. Because it works well across a wide range of different noisy environments, it is also easy to implement. Moreover, the speech recognition system is capable of higher performance and a lower error rate than current systems. Even when the error rate begins to approach an unacceptable level, the system automatically corrects itself by switching to a different model or models.
It is to be understood and expected that variations in the principles of construction and methodology herein disclosed in an embodiment may be made by one skilled in the art and it is intended that such modifications, changes, and substitutions are to be included within the scope of the present invention.

Claims (42)

What is claimed is:
1. A method for the robust recognition of speech in a noisy environment, comprising the steps of:
receiving the speech;
recording an amount of data related to a noisy environment, to yield recorded data;
analyzing the recorded data;
selecting a background noise model based on the recorded data, to yield a selected background noise model; and
performing speech recognition with the selected background noise model.
2. The method of claim 1, further comprising the step of:
modeling a background noise in the noisy environment to create the background noise model.
3. The method of claim 1, further comprising the step of:
determining a correctness of the selected background noise model, wherein when the selected background noise model is determined to be incorrect, the method comprises loading another background noise model for use in the step of performing speech recognition.
4. The method of claim 1, further comprising the step of:
constructing a background noise database for use in analyzing the recorded data on the noisy environment.
5. The method of claim 4, wherein the background noise database is dynamically updated for each location from which data is recorded.
6. The method of claim 1, wherein the step of analyzing the recorded data is accomplished by using at least one of a plurality of signal information.
7. The method of claim 1, wherein the step of analyzing the recorded data is accomplished by using a correct match percentage for a plurality of background noise models determined by an input response.
8. The method of claim 1, wherein the step of performing speech recognition is accomplished by a recognizer.
9. A method for improving recognition of speech subjected to noise, the method comprising the steps of:
sampling a connection noise to yield sampled connection noise;
searching a database for a noise model that matches the sampled connection noise to yield a matching noise model; and
applying the matching noise model to a speech recognition process.
10. The method of claim 9, wherein the connection noise comprises one of city noise, motor vehicle noise, truck noise, traffic noise, airport noise, subway train noise, cellular interference noise, channel condition noise, telephone microphone characteristics noise, cellular coding noise, and Internet network connection noise.
11. The method of claim 9, wherein the noise model is constructed by modeling the connection noise.
12. The method of claim 9, wherein when a speech recognition error rate is determined to be above a predetermined level, the method further comprises substituting the matching noise model by applying a second noise model.
13. The method of claim 9, wherein a speech recognition unit is used when applying the matching noise model.
14. A speech recognition apparatus comprising:
a speech recognizer;
a database having stored thereon templates of a plurality of background noises; and
an identifier that identifies, via a processor, a background noise template from the plurality of background noise templates, the background noise template matching a background noise from an input signal, to yield a matching background noise template, wherein the speech recognizer recognizes speech from the input signal with reference to the matching background noise template.
15. The speech recognition apparatus of claim 14, wherein the identifier compares hidden Markov models of the plurality of background noise templates to a hidden Markov model of the background noise from the input signal.
16. The speech recognition apparatus of claim 14, wherein the identifier identifies a portion of the input signal that is unlikely to contain speech, to yield an identified portion, wherein the identified portion is used as the background noise.
17. The speech recognition apparatus of claim 14, wherein the identifier, when a plurality of background noise templates match the background noise, selects a template selected in a prior iteration as the matching background noise template.
18. The speech recognition apparatus of claim 14, further comprising:
a restrictor that restricts a number of candidate templates based on geographic information associated with the input signal;
a comparer that compares the background noise to the restricted candidate templates to yield a comparison; and
a selector that selects the matching background noise template based on the comparison.
19. The speech recognition apparatus of claim 14, further comprising:
a restrictor that restricts a number of candidate templates based on time of day information associated with the input signal to yield restricted candidate templates;
a comparer that compares the background noise to the restricted candidate templates to yield a comparison; and
a selector that selects the matching background noise template based on the comparison.
20. The speech recognition apparatus of claim 14, further comprising:
a restrictor that restricts a number of candidate templates based on an identifier of a user at a location from which the input signal is captured to yield restricted candidate templates;
a comparer that compares the background noise to the restricted candidate templates to yield a comparison; and
a selector that selects the matching background noise template based on the comparison.
21. The speech recognition apparatus of claim 14, further comprising a microphone to capture the input signal.
22. The speech recognition apparatus of claim 14, further comprising a telephone to capture the input signal.
23. A speech recognition apparatus comprising:
a database having stored thereon templates of a plurality of background noises; and
a controller that identifies a background noise template, from the templates of the plurality of background noise templates, that matches background noise from a received input signal, to yield a matching background noise template, and supplies the matching background noise template to a speech recognizer.
24. The speech recognition apparatus of claim 23, further comprising the speech recognizer.
25. The speech recognition apparatus of claim 23, further comprising a microphone to capture the input signal.
26. The speech recognition apparatus of claim 23, further comprising a telephone to capture the input signal.
27. A method comprising:
sampling a noise signal to yield a sampled noise signal;
searching a database for a noise model matching the sampled noise signal to yield a matching noise model; and
applying the matching noise model to a speech recognition process.
28. The method of claim 27, wherein the searching comprises comparing hidden Markov models in the database to a hidden Markov model of the sampled noise signal.
29. The method of claim 27, further comprising, prior to the sampling, isolating the noise signal from an input signal.
30. The method of claim 27, further comprising, when a plurality of stored noise models match the sampled noise signal, selecting one of the plurality of stored noise models as the matching noise model according to a selection made in a prior iteration.
31. The method of claim 27, wherein the searching comprises:
restricting a set of candidate noise models based on geographic information associated with the sampled noise signal, to yield a restricted set of candidate noise models;
comparing the sampled noise signal to the restricted set of candidate noise models, to yield a comparison; and
selecting the matching noise model based on the comparison.
32. The method of claim 27, wherein the searching comprises:
restricting a set of candidate noise models based on time of day information associated with the sampled noise signal, to yield a restricted set of candidate noise models;
comparing the sampled noise signal to the restricted set of candidate noise models, to yield a comparison; and
selecting the matching noise model based on the comparison.
33. The method of claim 27, wherein the searching comprises:
restricting a set of candidate noise models based on an identifier of a user at a location from which the sampled noise signal is captured, to yield a restricted set of candidate noise models;
comparing the sampled noise signal to the restricted set of candidate noise models, to yield a comparison; and
selecting the matching noise model based on the comparison.
34. A speech recognition method, comprising:
identifying a background noise component from an input signal;
comparing the background noise component to a plurality of previously-stored noise models, to yield a comparison;
selecting a noise model from the plurality of previously-stored noise models based on the comparison, to yield a selected noise model; and
performing speech recognition on the input signal with reference to the selected noise model.
35. The speech recognition method of claim 34, further comprising:
identifying a subsequent background noise component from the input signal;
comparing the subsequent background noise component to the plurality of previously-stored noise models, to yield a second comparison;
selecting a second noise model from the plurality of previously-stored noise models based on the second comparison, to yield a second selected noise model; and
performing speech recognition on the input signal with reference to the second selected noise model.
36. The speech recognition method of claim 34, further comprising:
when speech recognition fails, selecting a second noise model from the plurality of previously-stored noise models based on the second comparison, to yield a second selected noise model; and
performing speech recognition on the input signal with reference to the second selected noise model.
37. The speech recognition method of claim 34, further comprising, wherein the identifying occurs while prompting a user with an introductory message.
38. The speech recognition method of claim 34, wherein the comparing uses hidden Markov models of the plurality of previously-stored noise models and a hidden Markov model of the background noise component.
39. The speech recognition method of claim 34, further comprising, when a plurality of noise models from the plurality of previously-stored noise models match the background noise component, selecting one of the plurality of previously-stored noise models as a most closely matching noise model according to a selection made in a prior iteration.
40. The speech recognition method of claim 34, wherein the comparing and selecting comprise:
restricting a set of candidate noise models based on geographic information associated with the background noise component, to yield a restricted set of candidate noise models;
comparing the background noise component to the restricted set of candidate noise models, to yield a second comparison; and
selecting the matching noise model based on the second comparison.
41. The speech recognition method of claim 34, wherein the comparing and selecting comprise:
restricting a set of candidate noise models based on time of day information associated with the background noise component, to yield a restricted set of candidate noise models;
comparing the background noise component to the restricted set of candidate noise models, to yield a second comparison; and
selecting the matching noise model based on the second comparison.
42. The speech recognition method of claim 34, wherein the comparing and selecting comprise:
restricting a set of candidate noise models based on an identifier of a user at a location from which the input signal is captured, to yield a restricted set of candidate noise models;
comparing the background noise component to the restricted set of candidate noise models, to yield a second comparison; and
selecting a closely matching noise model based on the second comparison.
US09/978,250 1997-11-25 2001-10-17 Selective noise/channel/coding models and recognizers for automatic speech recognition Expired - Lifetime USRE45289E1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/978,250 USRE45289E1 (en) 1997-11-25 2001-10-17 Selective noise/channel/coding models and recognizers for automatic speech recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/978,527 US5970446A (en) 1997-11-25 1997-11-25 Selective noise/channel/coding models and recognizers for automatic speech recognition
US09/978,250 USRE45289E1 (en) 1997-11-25 2001-10-17 Selective noise/channel/coding models and recognizers for automatic speech recognition

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US08/978,527 Reissue US5970446A (en) 1997-11-25 1997-11-25 Selective noise/channel/coding models and recognizers for automatic speech recognition

Publications (1)

Publication Number Publication Date
USRE45289E1 true USRE45289E1 (en) 2014-12-09

Family

ID=25526176

Family Applications (2)

Application Number Title Priority Date Filing Date
US08/978,527 Ceased US5970446A (en) 1997-11-25 1997-11-25 Selective noise/channel/coding models and recognizers for automatic speech recognition
US09/978,250 Expired - Lifetime USRE45289E1 (en) 1997-11-25 2001-10-17 Selective noise/channel/coding models and recognizers for automatic speech recognition

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US08/978,527 Ceased US5970446A (en) 1997-11-25 1997-11-25 Selective noise/channel/coding models and recognizers for automatic speech recognition

Country Status (1)

Country Link
US (2) US5970446A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140324428A1 (en) * 2013-04-30 2014-10-30 Ebay Inc. System and method of improving speech recognition using context
US20160336025A1 (en) * 2014-05-16 2016-11-17 Alphonso Inc. Efficient apparatus and method for audio signature generation using recognition history
US20170213549A1 (en) * 2016-01-21 2017-07-27 Ford Global Technologies, Llc Dynamic Acoustic Model Switching to Improve Noisy Speech Recognition

Families Citing this family (191)

Publication number Priority date Publication date Assignee Title
US6233556B1 (en) * 1998-12-16 2001-05-15 Nuance Communications Voice processing and verification system
US6377922B2 (en) * 1998-12-29 2002-04-23 At&T Corp. Distributed recognition system having multiple prompt-specific and response-specific speech recognizers
US6275800B1 (en) * 1999-02-23 2001-08-14 Motorola, Inc. Voice recognition system and method
US6324499B1 (en) * 1999-03-08 2001-11-27 International Business Machines Corp. Noise recognizer for speech recognition systems
SE521465C2 (en) * 1999-06-07 2003-11-04 Ericsson Telefon Ab L M Mobile phone with speech recognition system containing a spectral distance calculator.
DE60018696T2 (en) * 1999-07-01 2006-04-06 Koninklijke Philips Electronics N.V. ROBUST LANGUAGE PROCESSING OF CHARACTERED LANGUAGE MODELS
JP3969908B2 (en) * 1999-09-14 2007-09-05 キヤノン株式会社 Voice input terminal, voice recognition device, voice communication system, and voice communication method
US6721701B1 (en) * 1999-09-20 2004-04-13 Lucent Technologies Inc. Method and apparatus for sound discrimination
DE10006240A1 (en) 2000-02-11 2001-08-16 Bsh Bosch Siemens Hausgeraete Electric cooking appliance controlled by voice commands has noise correction provided automatically by speech processing device when noise source is switched on
US8645137B2 (en) 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
US6728671B1 (en) * 2000-03-29 2004-04-27 Lucent Technologies Inc. Automatic speech recognition caller input rate control
DE10041456A1 (en) * 2000-08-23 2002-03-07 Philips Corp Intellectual Pty Method for controlling devices using voice signals, in particular in motor vehicles
US7212969B1 (en) * 2000-09-29 2007-05-01 Intel Corporation Dynamic generation of voice interface structure and voice content based upon either or both user-specific contextual information and environmental information
US7047197B1 (en) * 2000-09-29 2006-05-16 Intel Corporation Changing characteristics of a voice user interface
US7451085B2 (en) * 2000-10-13 2008-11-11 At&T Intellectual Property Ii, L.P. System and method for providing a compensated speech recognition model for speech recognition
US7457750B2 (en) * 2000-10-13 2008-11-25 At&T Corp. Systems and methods for dynamic re-configurable speech recognition
JP4244514B2 (en) * 2000-10-23 2009-03-25 セイコーエプソン株式会社 Speech recognition method and speech recognition apparatus
US8135589B1 (en) 2000-11-30 2012-03-13 Google Inc. Performing speech recognition over a network and using speech recognition results
US6915262B2 (en) 2000-11-30 2005-07-05 Telesector Resources Group, Inc. Methods and apparatus for performing speech recognition and using speech recognition results
US6823306B2 (en) 2000-11-30 2004-11-23 Telesector Resources Group, Inc. Methods and apparatus for generating, updating and distributing speech recognition models
JP3912003B2 (en) * 2000-12-12 2007-05-09 株式会社日立製作所 Communication device
US6876968B2 (en) * 2001-03-08 2005-04-05 Matsushita Electric Industrial Co., Ltd. Run time synthesizer adaptation to improve intelligibility of synthesized speech
US7209880B1 (en) 2001-03-20 2007-04-24 At&T Corp. Systems and methods for dynamic re-configurable speech recognition
DE10124762B4 (en) * 2001-05-21 2004-07-15 Siemens Ag Method for training and operating a speech recognizer and speech recognizer with noise identification
US6996525B2 (en) * 2001-06-15 2006-02-07 Intel Corporation Selecting one of multiple speech recognizers in a system based on performance predections resulting from experience
US6950796B2 (en) * 2001-11-05 2005-09-27 Motorola, Inc. Speech recognition by dynamical noise model adaptation
US7165028B2 (en) * 2001-12-12 2007-01-16 Texas Instruments Incorporated Method of speech recognition resistant to convolutive distortion and additive distortion
US6772118B2 (en) * 2002-01-04 2004-08-03 General Motors Corporation Automated speech recognition filter
US6934364B1 (en) * 2002-02-28 2005-08-23 Hewlett-Packard Development Company, L.P. Handset identifier using support vector machines
AU2003209821B2 (en) * 2002-03-13 2006-11-16 Hear Ip Pty Ltd A method and system for controlling potentially harmful signals in a signal arranged to convey speech
AUPS102902A0 (en) * 2002-03-13 2002-04-11 Hearworks Pty Ltd A method and system for reducing potentially harmful noise in a signal arranged to convey speech
US7224981B2 (en) * 2002-06-20 2007-05-29 Intel Corporation Speech recognition of mobile devices
US7181392B2 (en) * 2002-07-16 2007-02-20 International Business Machines Corporation Determining speech recognition accuracy
JP4352790B2 (en) * 2002-10-31 2009-10-28 セイコーエプソン株式会社 Acoustic model creation method, speech recognition device, and vehicle having speech recognition device
DE10251113A1 (en) * 2002-11-02 2004-05-19 Philips Intellectual Property & Standards Gmbh Voice recognition method, involves changing over to noise-insensitive mode and/or outputting warning signal if reception quality value falls below threshold or noise value exceeds threshold
DE10305369B4 (en) * 2003-02-10 2005-05-19 Siemens Ag User-adaptive method for noise modeling
US9106526B2 (en) 2003-03-21 2015-08-11 Hewlett-Packard Development Company, L.P. Traversing firewalls
JP3836815B2 (en) * 2003-05-21 2006-10-25 インターナショナル・ビジネス・マシーンズ・コーポレーション Speech recognition apparatus, speech recognition method, computer-executable program and storage medium for causing computer to execute speech recognition method
KR101058003B1 (en) * 2004-02-11 2011-08-19 삼성전자주식회사 Noise-adaptive mobile communication terminal device and call sound synthesis method using the device
FR2871978B1 (en) * 2004-06-16 2006-09-22 Alcatel Sa METHOD FOR PROCESSING SOUND SIGNALS FOR A COMMUNICATION TERMINAL AND COMMUNICATION TERMINAL USING THE SAME
KR100745976B1 (en) * 2005-01-12 2007-08-06 삼성전자주식회사 Method and apparatus for classifying voice and non-voice using sound model
US8175877B2 (en) * 2005-02-02 2012-05-08 At&T Intellectual Property Ii, L.P. Method and apparatus for predicting word accuracy in automatic speech recognition systems
US20070041589A1 (en) * 2005-08-17 2007-02-22 Gennum Corporation System and method for providing environmental specific noise reduction algorithms
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US7729911B2 (en) * 2005-09-27 2010-06-01 General Motors Llc Speech recognition method and system
US7872574B2 (en) * 2006-02-01 2011-01-18 Innovation Specialists, Llc Sensory enhancement systems and methods in personal electronic devices
JP4245617B2 (en) * 2006-04-06 2009-03-25 株式会社東芝 Feature amount correction apparatus, feature amount correction method, and feature amount correction program
JP4316583B2 (en) * 2006-04-07 2009-08-19 株式会社東芝 Feature amount correction apparatus, feature amount correction method, and feature amount correction program
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
ATE403928T1 (en) * 2006-12-14 2008-08-15 Harman Becker Automotive Sys VOICE DIALOGUE CONTROL BASED ON SIGNAL PREPROCESSING
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US7983916B2 (en) * 2007-07-03 2011-07-19 General Motors Llc Sampling rate independent speech recognition
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8468019B2 (en) * 2008-01-31 2013-06-18 Qnx Software Systems Limited Adaptive noise modeling speech recognition system
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US8504365B2 (en) * 2008-04-11 2013-08-06 At&T Intellectual Property I, L.P. System and method for detecting synthetic speaker verification
US8121837B2 (en) 2008-04-24 2012-02-21 Nuance Communications, Inc. Adjusting a speech engine for a mobile computing device based on background noise
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
JP2010183289A (en) * 2009-02-04 2010-08-19 Seiko Epson Corp Mobile terminal and management system
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US20120311585A1 (en) 2011-06-03 2012-12-06 Apple Inc. Organizing task items that represent tasks to perform
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US9858925B2 (en) * 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US8731475B1 (en) * 2009-12-30 2014-05-20 Sprint Spectrum L.P. Method and system for determining environmental characteristics of a called communication device
US8600743B2 (en) * 2010-01-06 2013-12-03 Apple Inc. Noise profile determination for voice-related feature
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
EP2362620A1 (en) * 2010-02-23 2011-08-31 Vodafone Holding GmbH Method of editing a noise-database and computer device
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US8639516B2 (en) * 2010-06-04 2014-01-28 Apple Inc. User-specific noise suppression for voice quality improvements
US8234111B2 (en) * 2010-06-14 2012-07-31 Google Inc. Speech and noise models for speech recognition
US8725506B2 (en) * 2010-06-30 2014-05-13 Intel Corporation Speech audio processing
US8812310B2 (en) * 2010-08-22 2014-08-19 King Saud University Environment recognition of audio input
US9443511B2 (en) 2011-03-04 2016-09-13 Qualcomm Incorporated System and method for recognizing environmental sound
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9760566B2 (en) 2011-03-31 2017-09-12 Microsoft Technology Licensing, Llc Augmented conversational understanding agent to identify conversation context between two humans and taking an agent action thereof
US9842168B2 (en) 2011-03-31 2017-12-12 Microsoft Technology Licensing, Llc Task driven user intents
US20150149167A1 (en) * 2011-03-31 2015-05-28 Google Inc. Dynamic selection among acoustic transforms
US10642934B2 (en) 2011-03-31 2020-05-05 Microsoft Technology Licensing, Llc Augmented conversational understanding architecture
US9244984B2 (en) 2011-03-31 2016-01-26 Microsoft Technology Licensing, Llc Location based conversational understanding
US9298287B2 (en) 2011-03-31 2016-03-29 Microsoft Technology Licensing, Llc Combined activation for natural user interface systems
EP2691877A4 (en) * 2011-03-31 2015-06-24 Microsoft Technology Licensing Llc Conversational dialog learning and correction
US9858343B2 (en) 2011-03-31 2018-01-02 Microsoft Technology Licensing Llc Personalization of queries, conversations, and searches
US9064006B2 (en) 2012-08-23 2015-06-23 Microsoft Technology Licensing, Llc Translating natural language utterances to keyword search queries
US9454962B2 (en) 2011-05-12 2016-09-27 Microsoft Technology Licensing, Llc Sentence simplification for spoken language understanding
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US8438023B1 (en) 2011-09-30 2013-05-07 Google Inc. Warning a user when voice input to a device is likely to fail because of background or other noise
US8972256B2 (en) * 2011-10-17 2015-03-03 Nuance Communications, Inc. System and method for dynamic noise adaptation for robust automatic speech recognition
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
EP2867890B1 (en) * 2012-06-28 2018-04-25 Nuance Communications, Inc. Meta-data inputs to front end processing for automatic speech recognition
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
RU2642353C2 (en) * 2012-09-03 2018-01-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for providing informed multichannel speech presence probability estimation
US8484017B1 (en) * 2012-09-10 2013-07-09 Google Inc. Identifying media content
US20140074466A1 (en) 2012-09-10 2014-03-13 Google Inc. Answering questions using environmental context
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US9691377B2 (en) 2013-07-23 2017-06-27 Google Technology Holdings LLC Method and device for voice recognition training
US9098467B1 (en) * 2012-12-19 2015-08-04 Rawles Llc Accepting voice commands based on user identity
CN103065631B (en) * 2013-01-24 2015-07-29 Huawei Device Co., Ltd. Speech recognition method and device
CN103971680B (en) * 2013-01-24 2018-06-05 Huawei Device (Dongguan) Co., Ltd. Speech recognition method and apparatus
US9275638B2 (en) 2013-03-12 2016-03-01 Google Technology Holdings LLC Method and apparatus for training a voice recognition model database
US9489965B2 (en) * 2013-03-15 2016-11-08 Sri International Method and apparatus for acoustic signal characterization
US9208781B2 (en) * 2013-04-05 2015-12-08 International Business Machines Corporation Adapting speech recognition acoustic models with environmental and social cues
US9437208B2 (en) * 2013-06-03 2016-09-06 Adobe Systems Incorporated General sound decomposition models
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
CN105264524B (en) 2013-06-09 2019-08-02 Apple Inc. Device, method, and graphical user interface for enabling session continuity across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US20150032238A1 (en) 2013-07-23 2015-01-29 Motorola Mobility Llc Method and Device for Audio Input Routing
US9548047B2 (en) 2013-07-31 2017-01-17 Google Technology Holdings LLC Method and apparatus for evaluating trigger phrase enrollment
US9704478B1 (en) * 2013-12-02 2017-07-11 Amazon Technologies, Inc. Audio output masking for improved automatic speech recognition
US9466310B2 (en) * 2013-12-20 2016-10-11 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Compensating for identifiable background content in a speech recognition device
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
EP3149728B1 (en) 2014-05-30 2019-01-16 Apple Inc. Multi-command single utterance input method
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9904851B2 (en) 2014-06-11 2018-02-27 At&T Intellectual Property I, L.P. Exploiting visual information for enhancing audio signals via source separation and beamforming
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
JP6118838B2 (en) * 2014-08-21 2017-04-19 Honda Motor Co., Ltd. Information processing apparatus, information processing system, information processing method, and information processing program
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9530408B2 (en) 2014-10-31 2016-12-27 At&T Intellectual Property I, L.P. Acoustic environment recognizer for optimal speech processing
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US9672821B2 (en) * 2015-06-05 2017-06-06 Apple Inc. Robust speech recognition in the presence of echo and noise using multiple signals for discrimination
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
DK179588B1 (en) 2016-06-09 2019-02-22 Apple Inc. Intelligent automated assistant in a home environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US10951720B2 (en) 2016-10-24 2021-03-16 Bank Of America Corporation Multi-channel cognitive resource platform
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10720165B2 (en) * 2017-01-23 2020-07-21 Qualcomm Incorporated Keyword voice authentication
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US11620985B2 (en) * 2018-05-15 2023-04-04 Nec Corporation Pattern recognition robust to influence of a transfer path
US10762905B2 (en) * 2018-07-31 2020-09-01 Cirrus Logic, Inc. Speaker verification
CN109087659A (en) * 2018-08-03 2018-12-25 三星电子(中国)研发中心 Audio optimization method and apparatus
US11114089B2 (en) * 2018-11-19 2021-09-07 International Business Machines Corporation Customizing a voice-based interface using surrounding factors
US20210104237A1 (en) * 2019-10-08 2021-04-08 Zebra Technologies Corporation Method and Apparatus for Providing Modular Speech Input to Client Applications
US11489794B2 (en) 2019-11-04 2022-11-01 Bank Of America Corporation System for configuration and intelligent transmission of electronic communications and integrated resource processing
FR3104797B1 (en) * 2019-12-17 2022-01-07 Renault Sas METHOD FOR IDENTIFYING AT LEAST ONE PERSON ON BOARD A MOTOR VEHICLE BY VOICE ANALYSIS
US11411950B2 (en) 2020-04-28 2022-08-09 Bank Of America Corporation Electronic system for integration of communication channels and active cross-channel communication transmission

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4610023A (en) 1982-06-04 1986-09-02 Nissan Motor Company, Limited Speech recognition system and method for variable noise environment
US4720802A (en) 1983-07-26 1988-01-19 Lear Siegler Noise compensation arrangement
US4933973A (en) 1988-02-29 1990-06-12 Itt Corporation Apparatus and methods for the selective addition of noise to templates employed in automatic speech recognition systems
US5148489A (en) 1990-02-28 1992-09-15 Sri International Method for spectral estimation to improve noise robustness for speech recognition
US5222190A (en) 1991-06-11 1993-06-22 Texas Instruments Incorporated Apparatus and method for identifying a speech pattern
US5386492A (en) 1992-06-29 1995-01-31 Kurzweil Applied Intelligence, Inc. Speech recognition system utilizing vocabulary model preselection
US5509104A (en) 1989-05-17 1996-04-16 At&T Corp. Speech recognition employing key word modeling and non-key word modeling
US5617509A (en) 1995-03-29 1997-04-01 Motorola, Inc. Method, apparatus, and radio optimizing Hidden Markov Model speech recognition
US5649057A (en) 1989-05-17 1997-07-15 Lucent Technologies Inc. Speech recognition employing key word modeling and non-key word modeling
US5649055A (en) 1993-03-26 1997-07-15 Hughes Electronics Voice activity detector for speech signals in variable background noise
US5721808A (en) 1995-03-06 1998-02-24 Nippon Telegraph And Telephone Corporation Method for the composition of noise-resistant hidden markov models for speech recognition and speech recognizer using the same
US5749068A (en) 1996-03-25 1998-05-05 Mitsubishi Denki Kabushiki Kaisha Speech recognition apparatus and method in noisy circumstances
US5749067A (en) 1993-09-14 1998-05-05 British Telecommunications Public Limited Company Voice activity detector
US5761639A (en) 1989-03-13 1998-06-02 Kabushiki Kaisha Toshiba Method and apparatus for time series signal recognition with signal variation proof learning
US5778342A (en) * 1996-02-01 1998-07-07 Dspc Israel Ltd. Pattern recognition system and method
US5854999A (en) 1995-06-23 1998-12-29 Nec Corporation Method and system for speech recognition with compensation for variations in the speech environment
US5860062A (en) 1996-06-21 1999-01-12 Matsushita Electric Industrial Co., Ltd. Speech recognition apparatus and speech recognition method
US6078884A (en) * 1995-08-24 2000-06-20 British Telecommunications Public Limited Company Pattern recognition

Non-Patent Citations (2)

Title
ICASSP-94. 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994. Kobayashi et al., "Markov model based noise modelling and its application to noisy speech recognition using dynamical features of speech" pp. II/57-II/60, Apr. 1994.
Proceedings, IEEE International Joint Symposia on Intelligence and Systems. Khn et al., "Robust speech recognition using noise rejection approach," pp. 325-335, May 1998.

Cited By (12)

Publication number Priority date Publication date Assignee Title
US20140324428A1 (en) * 2013-04-30 2014-10-30 Ebay Inc. System and method of improving speech recognition using context
US9626963B2 (en) * 2013-04-30 2017-04-18 Paypal, Inc. System and method of improving speech recognition using context
US20170221477A1 (en) * 2013-04-30 2017-08-03 Paypal, Inc. System and method of improving speech recognition using context
US10176801B2 (en) * 2013-04-30 2019-01-08 Paypal, Inc. System and method of improving speech recognition using context
US20160336025A1 (en) * 2014-05-16 2016-11-17 Alphonso Inc. Efficient apparatus and method for audio signature generation using recognition history
US9641980B2 (en) 2014-05-16 2017-05-02 Alphonso Inc. Apparatus and method for determining co-location of services using a device that generates an audio signal
US9698924B2 (en) * 2014-05-16 2017-07-04 Alphonso Inc. Efficient apparatus and method for audio signature generation using recognition history
US9942711B2 (en) 2014-05-16 2018-04-10 Alphonso Inc. Apparatus and method for determining co-location of services using a device that generates an audio signal
US10278017B2 (en) 2014-05-16 2019-04-30 Alphonso, Inc Efficient apparatus and method for audio signature generation using recognition history
US10575126B2 (en) 2014-05-16 2020-02-25 Alphonso Inc. Apparatus and method for determining audio and/or visual time shift
US20170213549A1 (en) * 2016-01-21 2017-07-27 Ford Global Technologies, Llc Dynamic Acoustic Model Switching to Improve Noisy Speech Recognition
US10297251B2 (en) * 2016-01-21 2019-05-21 Ford Global Technologies, Llc Vehicle having dynamic acoustic model switching to improve noisy speech recognition

Also Published As

Publication number Publication date
US5970446A (en) 1999-10-19

Similar Documents

Publication Publication Date Title
USRE45289E1 (en) Selective noise/channel/coding models and recognizers for automatic speech recognition
US10854205B2 (en) Channel-compensated low-level features for speaker recognition
US8175874B2 (en) Personalized voice activity detection
US6374221B1 (en) Automatic retraining of a speech recognizer while using reliable transcripts
US7392188B2 (en) System and method enabling acoustic barge-in
JP4546512B2 (en) Speech recognition system using technology that implicitly adapts to the speaker
US6487530B1 (en) Method for recognizing non-standard and standard speech by speaker independent and speaker dependent word models
US5488652A (en) Method and apparatus for training speech recognition algorithms for directory assistance applications
US5812972A (en) Adaptive decision directed speech recognition bias equalization method and apparatus
US7930179B1 (en) Unsupervised speaker segmentation of multi-speaker speech data
US5414755A (en) System and method for passive voice verification in a telephone network
EP1159737B9 (en) Speaker recognition
US8000962B2 (en) Method and system for using input signal quality in speech recognition
JP2768274B2 (en) Voice recognition device
US20030191636A1 (en) Adapting to adverse acoustic environment in speech processing using playback training data
US20050080627A1 (en) Speech recognition device
EP2148325B1 (en) Method for determining the presence of a wanted signal component
US6246980B1 (en) Method of speech recognition
KR19990043998A (en) Pattern recognition system
JPH096388A (en) Voice recognition equipment
EP3516652B1 (en) Channel-compensated low-level features for speaker recognition
US6138097A (en) Method of learning in a speech recognition system
JP2001520764A (en) Speech analysis system
US20020069064A1 (en) Method and apparatus for testing user interface integrity of speech-enabled devices
US20080228477A1 (en) Method and Device For Processing a Voice Signal For Robust Speech Recognition

Legal Events

Date Code Title Description
AS Assignment

Owner name: AT&T CORP., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GOLDBERG, RANDY G.;ROSEN, KENNETH H.;SACHS, RICHARD M.;AND OTHERS;SIGNING DATES FROM 19971118 TO 19971124;REEL/FRAME:033526/0246

AS Assignment

Owner name: AT&T PROPERTIES, LLC, NEVADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T CORP.;REEL/FRAME:038274/0841

Effective date: 20160204

Owner name: AT&T INTELLECTUAL PROPERTY II, L.P., GEORGIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T PROPERTIES, LLC;REEL/FRAME:038274/0917

Effective date: 20160204

AS Assignment

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T INTELLECTUAL PROPERTY II, L.P.;REEL/FRAME:041498/0316

Effective date: 20161214