US20060253272A1 - Voice prompts for use in speech-to-speech translation system - Google Patents

Voice prompts for use in speech-to-speech translation system

Info

Publication number
US20060253272A1
Authority
US
United States
Prior art keywords
speech
speaker
translation system
language
uttered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/123,287
Inventor
Yuqing Gao
Liang Gu
Fu-Hua Liu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US11/123,287
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GAO, YUQING; GU, LIANG; LIU, FU-HUA
Publication of US20060253272A1
Priority to US12/115,205 (US8560326B2)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Definitions

  • In step 110, a text-form message of the played voice prompt is displayed on the system user interface in both the native language of the system and the foreign language, as visual feedback for the system user and the foreign language speaker.
  • An illustrative system user interface will be described below in the context of FIG. 2.
  • The foreign speaker will then speak into a microphone of the system.
  • The speech is recognized (step 112) via an automatic speech recognition (ASR) engine 114.
  • In step 116, a language identification algorithm detects the language the foreign speaker is speaking. It is to be understood that speech may be automatically recognized and the foreign language detected in manners well known in the art of speech processing. Thus, the well-known automated ASR techniques and automated language detection techniques are not described herein in detail, and the invention is not limited to any particular ASR or language detection techniques.
  • As also shown in step 116, the language detection algorithm generates an identifier that identifies the detected language. This identifier is provided back to step 106, where it replaces the default language identifier. Accordingly, before the dialogue turn switches back again to the foreign speaker, a voice prompt is played to the foreign speaker using the foreign language detected in the previous dialogue turn.
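The feedback loop of steps 106 and 116 can be sketched as follows. This is an illustrative sketch only; the class name, method names, and language-ID values are hypothetical, not taken from the patent.

```python
# Hypothetical sketch of the FIG. 1 feedback loop: the language identifier
# produced by detection (step 116) replaces the default identifier used to
# select the next voice prompt (step 106).

DEFAULT_LANGUAGE_ID = "zh"  # step 104: default foreign language (e.g., Chinese)

class PromptLanguageState:
    """Tracks which foreign-language prompt to play next (step 106)."""

    def __init__(self, default_id=DEFAULT_LANGUAGE_ID):
        self.language_id = default_id

    def update_from_detection(self, detected_id):
        # Step 116: a successfully detected language replaces the current
        # identifier, so the next prompt is played in the language the
        # foreign speaker actually used in the previous turn.
        if detected_id:
            self.language_id = detected_id

state = PromptLanguageState()
assert state.language_id == "zh"     # first turn: default language
state.update_from_detection("es")    # detection result of the previous turn
assert state.language_id == "es"     # next prompt is played in Spanish
```

The point of the design, per the passage above, is that the system user never selects a language manually; the state simply follows the detector.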
  • Referring now to FIG. 2, an illustrative system user interface is described. It is to be understood that the buttons referred to below are displayed icons on the screen associated with the system, and that the various buttons, bars and textboxes described below are predefined functional fields within the screen area of the system user interface.
  • Any TTS or ASR operations described below are respectively performed by TTS engine 108 and ASR engine 114, described above in the context of FIG. 1.
  • The system user presses (clicks on) button 202 of the system interface (also referred to as the graphical user interface or GUI) to turn on the system microphone.
  • Voice volume bar 204 appears in the upper-middle part of the GUI page.
  • An audio prompt (such as a “beep”) is played to indicate that the microphone is now on.
  • The system user speaks in the native language (e.g., English) into the microphone.
  • The recognized speech (recognized via ASR engine 114) is shown in the first textbox 206.
  • Button 202 is pressed again to turn off the microphone.
  • Voice volume bar 204 indicates that the microphone is off.
  • The recognized message is then translated into the foreign language (e.g., Chinese) using a foreign language translation engine (not shown) and displayed in second textbox 208.
  • The recognized message may be automatically translated into another language in a manner well known in the art of language translation.
  • The system user presses button 210 to turn on the microphone (which may be the same microphone used by the system user or a different microphone) to let the foreign speaker speak.
  • A language-dependent voice prompt is played to indicate (in the foreign language speech) that the microphone is now on and ready for speech-to-speech translation.
  • Such a voice prompt may be generated and presented as explained above in the context of FIG. 1.
  • The language of the voice prompt is determined based on the language detection algorithm, as also described above.
  • An audio prompt (such as a beep sound) may also be played to further notify the foreign speaker that the microphone is on and he or she can start to talk.
  • The voice prompt solution of the invention can be combined with the conventional audio prompt solution to achieve even higher user satisfaction.
  • The foreign speaker then speaks into the microphone. His or her speech is recognized and displayed in textbox 208. After the foreign speaker finishes his or her speech and all the speech has been recognized, button 210 is pressed again to turn off the microphone.
  • The recognized message is then translated back into the native language (e.g., English) and displayed in textbox 206.
  • The translated sentence is further played back in the native language speech using TTS techniques.
  • Button 214 serves as a short-cut button to play back a voice prompt that says “please repeat” in the detected foreign language. This may be used if the system user or the system itself does not understand what the foreign language speaker has said. Also, a pull-down menu 214 enables the system user to manually select the languages to be used in translation operations (e.g., English-to-Chinese, as shown). Further, button 216 functions as an “instruction” button. When pressed, an instructional voice message is played in the detected foreign language to familiarize the foreign speaker with the system functions and thereby enable a smooth system-mediated speech-to-speech translation.
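The button 202/210 turn-taking sequence just described can be summarized as a small controller. The `recognize`, `translate`, and `speak` callables below are hypothetical stand-ins for ASR engine 114, the translation engine, and TTS engine 108; none of the identifiers are real APIs from the patent.

```python
# Illustrative sketch of the turn-taking flow behind the FIG. 2 interface.
# All engine interfaces are hypothetical placeholders.

class DialogueTurnController:
    def __init__(self, recognize, translate, speak):
        self.recognize = recognize      # stand-in for ASR engine 114
        self.translate = translate      # stand-in for the translation engine
        self.speak = speak              # stand-in for TTS engine 108
        self.mic_on = False
        self.prompts_played = []

    def user_turn(self, audio, native="en", foreign="zh"):
        # Button 202: mic on, recognize the system user's native speech
        # (textbox 206), then translate it for display (textbox 208).
        self.mic_on = True
        text = self.recognize(audio, native)
        self.mic_on = False
        return self.translate(text, native, foreign)

    def foreign_turn(self, audio, native="en", foreign="zh"):
        # Button 210: play the language-dependent voice prompt first, then
        # recognize the foreign speech and translate it back (textbox 206).
        self.speak("your turn please", foreign)
        self.prompts_played.append(foreign)
        self.mic_on = True
        text = self.recognize(audio, foreign)
        self.mic_on = False
        return self.translate(text, foreign, native)

# Toy stand-ins so the flow can be exercised without real engines.
ctl = DialogueTurnController(
    recognize=lambda audio, lang: f"[{lang} text of {audio}]",
    translate=lambda text, src, dst: f"[{src}->{dst}] {text}",
    speak=lambda msg, lang: None,
)
out = ctl.foreign_turn("utt1")
```

Note the asymmetry the passage describes: only the foreign speaker's turn begins with a spoken prompt, because the system user already knows how the interface works.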
  • Referring now to FIG. 3, a computing system is shown in accordance with which one or more components/steps of a speech-to-speech translation system (e.g., components and methodologies described in the context of FIGS. 1 and 2) may be implemented, according to an embodiment of the present invention. It is to be understood that the individual components/steps may be implemented on one such computer system or on more than one such computer system. In the case of an implementation on a distributed computing system, the individual computer systems and/or devices may be connected via a suitable network, e.g., the Internet or World Wide Web. However, the system may be realized via private or local networks. The invention is not limited to any particular network.
  • The computing system shown in FIG. 3 represents an illustrative computing system architecture for, among other things, a TTS engine, an ASR engine, a language detector, a language translator, and/or combinations thereof, within which one or more of the steps of the voice prompt-based speech-to-speech translation techniques of the invention may be executed.
  • As shown, the computer system 300 implementing a speech-to-speech translation system may comprise a processor 302, a memory 304, I/O devices 306, and a communication interface 308, coupled via a computer bus 310 or alternate connection arrangement.
  • The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices.
  • The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc.
  • The term “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, microphone, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., speaker, display, etc.) for presenting results associated with the processing unit.
  • I/O devices 306 collectively represent, among other things, the one or more microphones, output speaker, and screen display referred to above.
  • The system interface (GUI) in FIG. 2 is displayable in accordance with such a screen display.
  • The term “communication interface” as used herein is intended to include, for example, one or more transceivers to permit the computer system to communicate with another computer system via an appropriate communications protocol. That is, if the translation system is distributed (one or more components of the system remotely located from one or more other components), communication interface 308 permits all the components to communicate.
  • Software components including instructions or code for performing the methodologies described herein may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.

Abstract

Techniques for employing improved prompts in a speech-to-speech translation system are disclosed. By way of example, a technique for use in indicating a dialogue turn in an automated speech-to-speech translation system comprises the following steps/operations. One or more text-based scripts are obtained. The one or more text-based scripts are synthesizable into one or more voice prompts. At least one of the one or more voice prompts is synthesized for playback from at least one of the one or more text-based scripts, the at least one synthesized voice prompt comprising an audible message in a language understandable to a speaker interacting with the speech-to-speech translation system, the audible message indicating a dialogue turn in the automated speech-to-speech translation system.

Description

  • This invention was made with Government support under Contract No.: N66001-99-2-8916 awarded by DARPA BABYLON. The Government has certain rights in this invention.
  • FIELD OF THE INVENTION
  • The present invention generally relates to speech processing techniques and, more particularly, to techniques for employing voice prompts in a speech-to-speech translation system.
  • BACKGROUND OF THE INVENTION
  • Multilingual speech-to-speech language translation systems have been developed to facilitate communication between people that do not share a common language. One example of such a system is the speech-to-speech translation system developed by Carnegie Mellon University (Pittsburgh, Pa.).
  • A speech-to-speech translation system allows a user who has been trained with the system (hereinafter “system user”) to communicate with another person who speaks another language and is most often not familiar with the system (hereinafter “foreign language speaker” or just “foreign speaker”), by providing a speech-to-speech translation service between the two parties.
  • Since conventional speech-to-speech translation systems can handle only one speaker at a time, the two speakers need to take turns during the communication. Therefore, the indication (or prompt) of the switch of turns becomes a very important issue in order to ensure a smooth speech translation multilingual conversation.
  • Various prompts to indicate the switch of turns exist in conventional speech-to-speech translation systems. The most widely adopted prompt uses audio sound effects such as a beep sound. The sound effects can be language dependent, so that a specific sound represents a specific language. The drawback of this approach is that both the system user and the foreign language speaker need to be trained to be familiar with the meaning of these sound effects. For a frequent system user, this brings additional inconvenience, as he or she must remember the meaning of the sound effects for each language supported by the system. For a foreign speaker who is not familiar with or has never used this kind of system before, the function is not easily usable, since the system user cannot explain it to the foreign speaker because of the language barrier. The foreign speaker needs to guess the meanings of these sounds, often with great frustration and, consequently, great dissatisfaction.
  • Another solution is to use visual prompts. The system user can point a microphone associated with the system to himself or herself when he or she starts to talk, and point the microphone to the foreign speaker to indicate that the foreign speaker should start to talk. Other visual indications or gestures may be used to indicate the switch of the turn. However, visual prompts are only helpful in face-to-face speech translation conversations and are useless in other scenarios, such as automatic speech translation through call centers. Additionally, in some situations, such as emergency medical care, patients speaking another language may keep their eyes closed due to their medical conditions, so that the above-described visual prompts may be completely useless. Furthermore, these visual indications may still be confusing without verbal explanations.
  • SUMMARY OF THE INVENTION
  • Principles of the present invention provide techniques for employing improved prompts in a speech-to-speech translation system.
  • By way of example, in a first aspect of the invention, a technique for use in indicating a dialogue turn in an automated speech-to-speech translation system comprises the following steps/operations. One or more text-based scripts are obtained. The one or more text-based scripts are synthesizable into one or more voice prompts. At least one of the one or more voice prompts is synthesized for playback from at least one of the one or more text-based scripts, the at least one synthesized voice prompt comprising an audible message in a language understandable to a speaker interacting with the speech-to-speech translation system, the audible message indicating a dialogue turn in the automated speech-to-speech translation system.
  • The technique may also comprise detecting a language spoken by a speaker interacting with the speech-to-speech translation system such that a voice prompt in the detected language is synthesized for playback to the speaker. An initial voice prompt may be synthesized for playback in a default language until the actual language of the speaker is detected.
  • The technique may also comprise one or more of displaying the at least one voice prompt synthesized for playback, recognizing speech uttered by the speaker interacting with the speech-to-speech translation system, and recognizing speech uttered by a system user of the speech-to-speech translation system. At least a portion of the speech uttered by the speaker or the system user may be translated from one language to another language. At least a portion of the translated speech may be displayed.
  • In a second aspect of the invention, a technique for providing an interface for use in an automated speech-to-speech translation system, the translation system being operated by a system user and interacted with by a speaker, comprises the following steps/operations. The system user enables a microphone of the translation system via the interface. At least one previously-generated voice prompt is output to the speaker, the at least one voice prompt comprising an audible message in a language understandable to the speaker, the audible message indicating a turn in a dialogue between the system user and the speaker. The speaker, once prompted, utters speech into the microphone, the uttered speech being translated by the translation system.
  • In a third aspect of the invention, an interface for use in an automated speech-to-speech translation system, the translation system being operated by a system user and interacted with by a speaker, comprises a first field for use by the system user to enable a microphone of the translation system, a second field for use by the system user for at least one of displaying speech uttered by the system user and displaying translated speech uttered by the speaker, and a third field for use by the speaker for at least one of displaying speech uttered by the speaker and displaying translated speech uttered by the system user, wherein the translation system outputs at least one previously-generated voice prompt to the speaker, the at least one voice prompt comprising an audible message in a language understandable to the speaker, the audible message indicating a turn in a dialogue between the system user and the speaker, and the speaker, once prompted, uttering speech into the microphone, the uttered speech being translated by the translation system. The interface may comprise a fourth field for use by the system user to enable a microphone of the translation system such that speech uttered by the system user is captured by the translation system.
  • In a fourth aspect of the invention, an article of manufacture for use in indicating a dialogue turn in an automated speech-to-speech translation system, comprises a machine readable medium containing one or more programs which when executed implement the steps of obtaining one or more text-based scripts, the one or more text-based scripts being synthesizable into one or more voice prompts, and synthesizing for playback at least one of the one or more voice prompts from at least one of the one or more text-based scripts, the at least one synthesized voice prompt comprising an audible message in a language understandable to a speaker interacting with the speech-to-speech translation system, the audible message indicating a dialogue turn in the automated speech-to-speech translation system.
  • Accordingly, principles of the invention provide a prompt solution for use in a speech-to-speech translation system that can sufficiently indicate both the switch of dialogue turns and the specific source language for the next turn.
  • These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block/flow diagram illustrating a speech-to-speech translation system employing language detection-based multilingual voice prompts, according to an embodiment of the invention;
  • FIG. 2 is a diagram illustrating a speech-to-speech translation system user interface, according to an embodiment of the invention; and
  • FIG. 3 is a diagram illustrating a computing system in accordance with which one or more components/steps of a speech-to-speech translation system may be implemented, according to an embodiment of the present invention.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • As will be illustratively explained herein, principles of the invention introduce language-dependent voice prompts during machine-mediated automatic speech-to-speech translation.
  • It is to be understood that while principles of the invention are described as translating from one language to another language, the term “language” can also broadly include a dialect or derivation of a language. That is, language translation may also include translation from one dialect to another dialect.
  • It is also to be understood that the “system user” is one trained on (or at least operationally familiar with) the speech-to-speech translation system. The system user may also be considered a system operator. The “foreign language speaker” or just “foreign speaker” is one not familiar with or trained on the system. In one example application, the speech-to-speech translation system may be used to allow a customer service representative (i.e., system user) of some business to communicate with a customer (i.e., foreign language speaker), when the two individuals speak different languages.
  • A voice prompt solution is provided in accordance with principles of the invention that can verbally indicate the switch of the dialogue turns in the language of the foreign speaker by using an automatic language detection algorithm. Such a voice prompt solution is provided with a highly friendly user interface. The voice prompts comprise concise, natural and configurable voice instructions in the foreign language generated by text-to-speech synthesis (TTS) techniques. In a multilingual speech-to-speech translation system with more than two languages involved, the foreign language is determined based on the language detection result of the foreign speaker's speech during one or more previous turns, and with a default foreign language for the first dialogue turn. Therefore, no language selection is required and the system user only needs to click one button to activate the voice prompt for all the foreign language speakers. The user interface of the speech-to-speech translation system is hence very simple and highly convenient.
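The prompt-language policy just described, a default foreign language for the first dialogue turn and the detected language of one or more previous turns thereafter, might be sketched as follows; the function name and language codes are illustrative assumptions.

```python
# Minimal sketch of the prompt-language policy described above.
# Language codes and the helper name are hypothetical.

def prompt_language(detection_history, default="zh"):
    """Return the language to use for the next voice prompt.

    `detection_history` holds the language IDs detected in previous
    dialogue turns, oldest first; it is empty before the first turn.
    """
    # No foreign speech observed yet: fall back to the default language,
    # e.g., a language prevalent in the geographic area of system use.
    if not detection_history:
        return default
    # Otherwise use the most recently detected language of the foreign speaker.
    return detection_history[-1]

assert prompt_language([]) == "zh"            # first turn: default language
assert prompt_language(["zh", "es"]) == "es"  # later turns: detected language
```

This is what lets the system user activate the prompt with a single button press for all foreign speakers: the language argument is supplied by the detector, never by a menu.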
  • An illustrative embodiment of a methodology and system for implementing multilingual voice prompts in a speech-to-speech translation system will now be described.
  • Referring initially to FIG. 1, a block/flow diagram depicts a speech-to-speech translation system 100 employing language detection-based multilingual voice prompts, according to an embodiment of the invention.
  • As shown, in step 102, voice prompts are generated in each desired foreign language and stored as respective script or text files (i.e., text-based scripts). A script file storage unit 103 may be used to store the generated prompts. For example, for a Chinese-to-English speech-to-speech translation, the voice prompt “your turn please” is generated in Chinese text as a script file and stored in script file storage unit 103. Any number of voice prompts with various audible messages can be generated and stored in such a manner. Such voice prompts are easily generated and reconfigured since a system user can design preferred prompts by modifying existing prompt script files.
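The script-file store of step 102 can be sketched as one editable text file per (language, prompt) pair; a system user reconfigures a prompt simply by editing its file. The file-naming scheme and function names below are our illustrative assumptions, standing in for script file storage unit 103.

```python
# Hedged sketch of the prompt script store (step 102 / storage unit 103).
import os
import tempfile

store = tempfile.mkdtemp()  # stands in for script file storage unit 103

def save_prompt(language, name, text):
    # One plain-text script file per language/prompt pair.
    path = os.path.join(store, f"{language}_{name}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

def load_prompt(language, name):
    path = os.path.join(store, f"{language}_{name}.txt")
    with open(path, encoding="utf-8") as f:
        return f.read()

save_prompt("zh", "your_turn", "请您讲话")            # "your turn please" in Chinese
save_prompt("es", "your_turn", "Su turno, por favor")  # and in Spanish
assert load_prompt("zh", "your_turn") == "请您讲话"
```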
  • In step 104, an initial or default foreign language is set. This initial language could be a foreign language prevalent in the geographic area of system use, e.g., Chinese or Spanish. The voice prompts in this default language are used at the beginning of the speech-translation system-mediated dialogues.
  • In step 106, a voice prompt is generated (synthesized) via a text-to-speech synthesis (TTS) engine 108 from a prompt script file and audibly presented (played back) to the foreign speaker via an output speaker associated with the system. The synthesized speech associated with the voice prompt may be generated from the text of the corresponding script file in a manner well known in the art of speech processing. Thus, the well-known automated TTS techniques are not described herein in detail, and the invention is not limited to any particular TTS technique. It is also to be understood that an initial foreign language identifier (ID) can be used to instruct the system as to which foreign language voice prompt to initially select for playback.
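Step 106 reduces to: given a language ID (or the initial default), look up the corresponding prompt script and hand its text to a TTS engine. Since the patent deliberately does not fix any particular TTS technique, the engine below is a stub, and all names are illustrative.

```python
# Hedged sketch of step 106: select the prompt script by language ID and
# synthesize it. StubTTS stands in for TTS engine 108.

def play_prompt(language_id, scripts, tts_engine, default_id="zh"):
    # Fall back to the initial/default language ID when nothing is detected yet.
    lang = language_id or default_id
    text = scripts[lang]              # contents of that language's script file
    return tts_engine.synthesize(text, lang)

class StubTTS:
    def synthesize(self, text, lang):
        return f"<audio:{lang}:{text}>"  # placeholder for synthesized speech

scripts = {"zh": "请您讲话", "es": "Su turno, por favor"}
assert play_prompt(None, scripts, StubTTS()) == "<audio:zh:请您讲话>"
assert play_prompt("es", scripts, StubTTS()) == "<audio:es:Su turno, por favor>"
```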
  • In step 110, a text-form message of the played voice prompt is displayed on the system user interface in both the native language of the system and the foreign language as a visual feedback for the system user and the foreign language speaker. An illustrative system user interface will be described below in the context of FIG. 2.
  • Once prompted (e.g., “your turn please”) that it is his or her turn, the foreign speaker will then speak into a microphone of the system. During each turn of the foreign speaker, the speech is recognized (step 112) via an automatic speech recognition (ASR) system 114. Based on the actual speech and/or the recognized speech, in step 116, a language identification algorithm detects the language the foreign speaker is speaking in. It is to be understood that speech may be automatically recognized and the foreign language detected in manners well known in the art of speech processing. Thus, the well-known automated ASR techniques and automated language detection techniques are not described herein in detail, and the invention is not limited to any particular ASR or language detection techniques.
  • As also shown in step 116, the language detection algorithm generates an identifier for the language it has detected. This identifier is fed back to step 106, where it replaces the default language identifier. Accordingly, before the dialogue turn switches back again to the foreign speaker, a voice prompt is played to the foreign speaker in the foreign language detected during the previous dialogue turn.
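The feedback loop of steps 106, 112, and 116 can be sketched as follows: each foreign-speaker turn runs language detection, and the resulting ID becomes the prompt language for the next turn. The detector here is a stub (real detection per the patent may use the audio and/or the ASR output); naming is ours.

```python
# Sketch of the detection feedback loop (steps 106 -> 112 -> 116 -> 106).

def run_turns(utterances, detect, default_id="zh"):
    """Return the prompt-language ID used at the start of each turn."""
    prompt_ids, current = [], default_id
    for utterance in utterances:
        prompt_ids.append(current)   # step 106: prompt in the current language
        current = detect(utterance)  # step 116: detected ID replaces it
    return prompt_ids

# Stub detector: pretend detection keys off a tag prefixed to the "audio".
detect = lambda u: u.split(":")[0]

ids = run_turns(["es:hola", "es:gracias", "fr:merci"], detect)
assert ids == ["zh", "es", "es"]  # default first, then the detected languages
```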
  • Referring now to FIG. 2, an illustrative speech-to-speech translation system user interface 200, according to an embodiment of the invention, is shown. It is to be understood that control of the various buttons (displayed icons on the screen associated with the system) is exercised by the system user, i.e., the person trained to use the system. It is also to be understood that the various buttons, bars and textboxes described below are predefined functional fields within the screen area of the system user interface. Also, any TTS or ASR operations described below are respectively performed by TTS engine 108 and ASR engine 114 described above in the context of FIG. 1.
  • The system user presses (clicks on) button 202 of the system interface (also referred to as the graphical user interface or GUI) to turn on the system microphone. Voice volume bar 204 appears in the upper-middle part of the GUI page.
  • An audio prompt (such as a “beep”) is played to indicate that the microphone is now on. The system user speaks the native language (e.g., English) into the microphone. The recognized speech (recognized via ASR engine 114) is shown in the first textbox 206.
  • After the user finishes his/her speech and all the speech has been recognized, button 202 is pressed again to turn off the microphone. Voice volume bar 204 indicates that the microphone is off. The recognized message is then translated into the foreign language (e.g., Chinese) using a foreign language translation engine (not shown) and displayed in second textbox 208. It is to be understood that the recognized message may be automatically translated into another language in a manner well known in the art of language translation. Thus, the well-known automated translation techniques are not described herein in detail, and the invention is not limited to any particular language translation techniques. The translated sentence is further played back to the foreign language speaker using TTS techniques (TTS engine 108).
  • The system user presses button 210 to turn on the microphone (which may be the same microphone used by the system user or a different microphone) and let the foreign speaker speak. A language-dependent voice prompt is played to indicate (in the foreign language speech) that the microphone is now on and ready for speech-to-speech translation. Such a voice prompt may be generated and presented as explained above in the context of FIG. 1. The language of the voice prompt is determined based on the language detection algorithm, as also described above.
  • In one embodiment, after the language-ID-based voice prompt is played, an audio prompt (such as a beep sound) may also be played to further notify the foreign speaker that the microphone is on and he or she can start to talk. In other words, the voice prompt solution of the invention can be combined with the conventional audio prompt solution to achieve even higher user satisfaction.
  • The foreign speaker then speaks into the microphone. His or her speech is recognized and displayed in textbox 208. After the foreign speaker finishes his or her speech and all the speech has been recognized, button 210 is pressed again to turn off the microphone. The recognized message is then translated back into the native language (e.g., English) and displayed in textbox 206. The translated sentence is further played back in the native language speech using TTS techniques.
  • The above steps are considered one turn of the speech-translation system-mediated dialogue. The native language user (system user) and the foreign language speaker repeat these steps to communicate with each other until all information has been successfully exchanged.
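One such dialogue turn, end to end, can be sketched as below. Recognition, translation, and detection are stubbed lambdas; only the turn structure (recognize the user, translate and play to the foreign speaker, prompt, recognize and detect the foreign speaker, translate back) comes from the description, and all function names are our assumptions.

```python
# End-to-end sketch of one dialogue turn as described for FIG. 2.

def dialogue_turn(user_speech, foreign_speech, recognize, translate, detect):
    user_text = recognize(user_speech)                  # shown in textbox 206
    to_foreign = translate(user_text, "en", "zh")       # textbox 208, played via TTS
    # ... language-dependent voice prompt (and optional beep) plays here ...
    foreign_text = recognize(foreign_speech)            # shown in textbox 208
    detected = detect(foreign_speech)                   # sets next turn's prompt language
    to_native = translate(foreign_text, detected, "en") # textbox 206, played via TTS
    return to_foreign, to_native, detected

out = dialogue_turn(
    "hello", "ni hao",
    recognize=lambda s: s.upper(),
    translate=lambda t, src, dst: f"[{src}->{dst}] {t}",
    detect=lambda s: "zh",
)
assert out == ("[en->zh] HELLO", "[zh->en] NI HAO", "zh")
```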
  • Also shown in system interface 200 is a button 212 which serves as a short-cut button to play back a voice prompt that says “please repeat” in the detected foreign language. This may be used if the system user or the system itself does not understand what the foreign language speaker has said. Also, a pull-down menu 214 enables the system user to manually select the languages to be used in translation operations (e.g., English-to-Chinese, as shown). Further, button 216 functions as an “instruction” button. When pressed, an instructional voice message is played in the detected foreign language to enable the foreign speaker to become familiar with the system functions and thereby enable a smooth system-mediated speech-to-speech translation.
  • Referring finally to FIG. 3, a computing system in accordance with which one or more components/steps of a speech-to-speech translation system (e.g., components and methodologies described in the context of FIGS. 1 and 2) may be implemented, according to an embodiment of the present invention, is shown. It is to be understood that the individual components/steps may be implemented on one such computer system or on more than one such computer system. In the case of an implementation on a distributed computing system, the individual computer systems and/or devices may be connected via a suitable network, e.g., the Internet or World Wide Web. However, the system may be realized via private or local networks. The invention is not limited to any particular network.
  • Thus, the computing system shown in FIG. 3 represents an illustrative computing system architecture for, among other things, a TTS engine, an ASR engine, a language detector, a language translator, and/or combinations thereof, within which one or more of the steps of the voice prompt-based speech-to-speech translation techniques of the invention may be executed.
  • As shown, the computer system 300 implementing a speech-to-speech translation system may comprise a processor 302, a memory 304, I/O devices 306, and a communication interface 308, coupled via a computer bus 310 or alternate connection arrangement.
  • It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices.
  • The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc.
  • In addition, the phrase “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, microphone, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., speaker, display, etc.) for presenting results associated with the processing unit. Thus, I/O devices 306 collectively represent, among other things, the one or more microphones, output speaker, and screen display referred to above. The system interface (GUI) in FIG. 2 is displayable in accordance with such a screen display.
  • Still further, the phrase “communication interface” as used herein is intended to include, for example, one or more transceivers to permit the computer system to communicate with another computer system via an appropriate communications protocol. That is, if the translation system is distributed (one or more components of the system remotely located from one or more other components), communication interface 308 permits all the components to communicate.
  • Accordingly, software components including instructions or code for performing the methodologies described herein may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.
  • Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.

Claims (22)

1. A method for use in indicating a dialogue turn in an automated speech-to-speech translation system, comprising the steps of:
obtaining one or more text-based scripts, the one or more text-based scripts being synthesizable into one or more voice prompts; and
synthesizing for playback at least one of the one or more voice prompts from at least one of the one or more text-based scripts, the at least one synthesized voice prompt comprising an audible message in a language understandable to a speaker interacting with the speech-to-speech translation system, the audible message indicating a dialogue turn in the automated speech-to-speech translation system.
2. The method of claim 1, further comprising the step of detecting a language spoken by a speaker interacting with the speech-to-speech translation system such that a voice prompt in the detected language is synthesized for playback to the speaker.
3. The method of claim 2, wherein an initial voice prompt is synthesized for playback in a default language until the actual language of the speaker is detected.
4. The method of claim 1, further comprising the step of displaying the at least one voice prompt synthesized for playback.
5. The method of claim 1, further comprising the step of recognizing speech uttered by the speaker interacting with the speech-to-speech translation system.
6. The method of claim 5, further comprising the step of recognizing speech uttered by a system user of the speech-to-speech translation system.
7. The method of claim 6, wherein at least a portion of the speech uttered by the speaker or the system user is translated from one language to another language.
8. The method of claim 7, wherein at least a portion of the translated speech is displayed.
9. A method of providing an interface for use in an automated speech-to-speech translation system, the translation system being operated by a system user and interacted with by a speaker, the method comprising the steps of:
the system user enabling a microphone of the translation system via the interface;
outputting at least one previously-generated voice prompt to the speaker, the at least one voice prompt comprising an audible message in a language understandable to the speaker, the audible message indicating a turn in a dialogue between the system user and the speaker; and
the speaker, once prompted, uttering speech into the microphone, the uttered speech being translated by the translation system.
10. The method of claim 9, further comprising the step of displaying text in a first field of the interface representing speech uttered by the system user.
11. The method of claim 10, further comprising the step of displaying text in a second field of the interface representing speech uttered by the speaker.
12. Apparatus for use in indicating a dialogue turn in an automated speech-to-speech translation system, comprising:
a memory; and
at least one processor coupled to the memory and operative to: (i) obtain one or more text-based scripts, the one or more text-based scripts being synthesizable into one or more voice prompts, and (ii) synthesize for playback at least one of the one or more voice prompts from at least one of the one or more text-based scripts, the at least one synthesized voice prompt comprising an audible message in a language understandable to a speaker interacting with the speech-to-speech translation system, the audible message indicating a dialogue turn in the automated speech-to-speech translation system.
13. The apparatus of claim 12, wherein the at least one processor is further operative to detect a language spoken by a speaker interacting with the speech-to-speech translation system such that a voice prompt in the detected language is synthesized for playback to the speaker.
14. The apparatus of claim 13, wherein an initial voice prompt is synthesized for playback in a default language until the actual language of the speaker is detected.
15. The apparatus of claim 12, wherein the at least one processor is further operative to display the at least one voice prompt synthesized for playback.
16. The apparatus of claim 12, wherein the at least one processor is further operative to recognize speech uttered by the speaker interacting with the speech-to-speech translation system.
17. The apparatus of claim 16, wherein the at least one processor is further operative to recognize speech uttered by a system user of the speech-to-speech translation system.
18. The apparatus of claim 17, wherein at least a portion of the speech uttered by the speaker or the system user is translated from one language to another language.
19. The apparatus of claim 18, wherein at least a portion of the translated speech is displayed.
20. An interface for use in an automated speech-to-speech translation system, the translation system being operated by a system user and interacted with by a speaker, the interface comprising:
a first field for use by the system user to enable a microphone of the translation system;
a second field for use by the system user for at least one of displaying speech uttered by the system user and displaying translated speech uttered by the speaker; and
a third field for use by the speaker for at least one of displaying speech uttered by the speaker and displaying translated speech uttered by the system user;
wherein the translation system outputs at least one previously-generated voice prompt to the speaker, the at least one voice prompt comprising an audible message in a language understandable to the speaker, the audible message indicating a turn in a dialogue between the system user and the speaker, and the speaker, once prompted, uttering speech into the microphone, the uttered speech being translated by the translation system.
21. The interface of claim 20, further comprising a fourth field for use by the system user to enable a microphone of the translation system such that speech uttered by the system user is captured by the translation system.
22. An article of manufacture for use in indicating a dialogue turn in an automated speech-to-speech translation system, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
obtaining one or more text-based scripts, the one or more text-based scripts being synthesizable into one or more voice prompts; and
synthesizing for playback at least one of the one or more voice prompts from at least one of the one or more text-based scripts, the at least one synthesized voice prompt comprising an audible message in a language understandable to a speaker interacting with the speech-to-speech translation system, the audible message indicating a dialogue turn in the automated speech-to-speech translation system.
US11/123,287 2005-05-06 2005-05-06 Voice prompts for use in speech-to-speech translation system Abandoned US20060253272A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/123,287 US20060253272A1 (en) 2005-05-06 2005-05-06 Voice prompts for use in speech-to-speech translation system
US12/115,205 US8560326B2 (en) 2005-05-06 2008-05-05 Voice prompts for use in speech-to-speech translation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/123,287 US20060253272A1 (en) 2005-05-06 2005-05-06 Voice prompts for use in speech-to-speech translation system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/115,205 Continuation US8560326B2 (en) 2005-05-06 2008-05-05 Voice prompts for use in speech-to-speech translation system

Publications (1)

Publication Number Publication Date
US20060253272A1 true US20060253272A1 (en) 2006-11-09

Family

ID=37395081

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/123,287 Abandoned US20060253272A1 (en) 2005-05-06 2005-05-06 Voice prompts for use in speech-to-speech translation system
US12/115,205 Expired - Fee Related US8560326B2 (en) 2005-05-06 2008-05-05 Voice prompts for use in speech-to-speech translation system

Family Applications After (1)

Application Number Title Priority Date Filing Date
US12/115,205 Expired - Fee Related US8560326B2 (en) 2005-05-06 2008-05-05 Voice prompts for use in speech-to-speech translation system

Country Status (1)

Country Link
US (2) US20060253272A1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070061152A1 (en) * 2005-09-15 2007-03-15 Kabushiki Kaisha Toshiba Apparatus and method for translating speech and performing speech synthesis of translation result
US20080243476A1 (en) * 2005-05-06 2008-10-02 International Business Machines Corporation Voice Prompts for Use in Speech-to-Speech Translation System
US20090234633A1 (en) * 2008-03-17 2009-09-17 Virginia Chao-Suren Systems and methods for enabling inter-language communications
US20140180671A1 (en) * 2012-12-24 2014-06-26 Maria Osipova Transferring Language of Communication Information
US20140288919A1 (en) * 2010-08-05 2014-09-25 Google Inc. Translating languages
US9195656B2 (en) 2013-12-30 2015-11-24 Google Inc. Multilingual prosody generation
US20150379981A1 (en) * 2014-06-26 2015-12-31 Nuance Communications, Inc. Automatically presenting different user experiences, such as customized voices in automated communication systems
US20170186338A1 (en) * 2015-12-28 2017-06-29 Amazon Technologies, Inc. System for assisting in foreign language learning
US9922641B1 (en) * 2012-10-01 2018-03-20 Google Llc Cross-lingual speaker adaptation for multi-lingual speech synthesis
US9953631B1 (en) * 2015-05-07 2018-04-24 Google Llc Automatic speech recognition techniques for multiple languages
US10403291B2 (en) 2016-07-15 2019-09-03 Google Llc Improving speaker verification across locations, languages, and/or dialects
US20200193965A1 (en) * 2018-12-13 2020-06-18 Language Line Services, Inc. Consistent audio generation configuration for a multi-modal language interpretation system
US11049493B2 (en) * 2016-07-28 2021-06-29 National Institute Of Information And Communications Technology Spoken dialog device, spoken dialog method, and recording medium
US20220084523A1 (en) * 2020-09-11 2022-03-17 Avaya Management L.P. Multilingual transcription at customer endpoint for optimizing interaction results in a contact center
US11361780B2 (en) * 2021-12-24 2022-06-14 Sandeep Dhawan Real-time speech-to-speech generation (RSSG) apparatus, method and a system therefore
US20220276829A1 (en) * 2019-03-04 2022-09-01 Giide Audio, Inc. Interactive podcast platform with integrated additional audio/visual content
US11443737B2 (en) * 2020-01-14 2022-09-13 Sony Corporation Audio video translation into multiple languages for respective listeners

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9685190B1 (en) * 2006-06-15 2017-06-20 Google Inc. Content sharing
KR20130071958A (en) * 2011-12-21 2013-07-01 엔에이치엔(주) System and method for providing interpretation or translation of user message by instant messaging application
US20140163948A1 (en) * 2012-12-10 2014-06-12 At&T Intellectual Property I, L.P. Message language conversion
US9953630B1 (en) * 2013-05-31 2018-04-24 Amazon Technologies, Inc. Language recognition for device settings

Citations (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4974191A (en) * 1987-07-31 1990-11-27 Syntellect Software Inc. Adaptive natural language computer interface system
US5748974A (en) * 1994-12-13 1998-05-05 International Business Machines Corporation Multimodal natural language interface for cross-application tasks
US5870701A (en) * 1992-08-21 1999-02-09 Canon Kabushiki Kaisha Control signal processing method and apparatus having natural language interfacing capabilities
US5943643A (en) * 1995-04-13 1999-08-24 Canon Kabushiki Kaisha Language processing method and apparatus
US5956668A (en) * 1997-07-18 1999-09-21 At&T Corp. Method and apparatus for speech translation with unrecognized segments
US6223150B1 (en) * 1999-01-29 2001-04-24 Sony Corporation Method and apparatus for parsing in a spoken language translation system
US6266642B1 (en) * 1999-01-29 2001-07-24 Sony Corporation Method and portable apparatus for performing spoken language translation
US20010011217A1 (en) * 1998-12-31 2001-08-02 Egbert Ammicht User barge-in enablement in large vocabulary speech recognition systems
US20010013051A1 (en) * 1997-06-10 2001-08-09 Akifumi Nakada Message handling method, message handling apparatus, and memory media for storing a message handling apparatus controlling program
US6327343B1 (en) * 1998-01-16 2001-12-04 International Business Machines Corporation System and methods for automatic call and data transfer processing
US20020010742A1 (en) * 1999-01-04 2002-01-24 Fujitsu Limited Communication assistance method and device
US20020094067A1 (en) * 2001-01-18 2002-07-18 Lucent Technologies Inc. Network provided information using text-to-speech and speech recognition and text or speech activated network control sequences for complimentary feature access
US20020156688A1 (en) * 2001-02-21 2002-10-24 Michel Horn Global electronic commerce system
US20020161579A1 (en) * 2001-04-26 2002-10-31 Speche Communications Systems and methods for automated audio transcription, translation, and transfer
US20020169592A1 (en) * 2001-05-11 2002-11-14 Aityan Sergey Khachatur Open environment for real-time multilingual communication
US20030033312A1 (en) * 2001-08-09 2003-02-13 Atsuko Koizumi Method of interpretation service for voice on the phone
US20030046062A1 (en) * 2001-08-31 2003-03-06 Cartus John R. Productivity tool for language translators
US6553345B1 (en) * 1999-08-26 2003-04-22 Matsushita Electric Industrial Co., Ltd. Universal remote control allowing natural language modality for television and multimedia searches and requests
US20030105634A1 (en) * 2001-10-15 2003-06-05 Alicia Abella Method for dialog management
US20030110023A1 (en) * 2001-12-07 2003-06-12 Srinivas Bangalore Systems and methods for translating languages
US20030120478A1 (en) * 2001-12-21 2003-06-26 Robert Palmquist Network-based translation system
US20030187641A1 (en) * 2002-04-02 2003-10-02 Worldcom, Inc. Media translator
US20040019487A1 (en) * 2002-03-11 2004-01-29 International Business Machines Corporation Multi-modal messaging
US6701294B1 (en) * 2000-01-19 2004-03-02 Lucent Technologies, Inc. User interface for translating natural language inquiries into database queries and data presentations
US20040122678A1 (en) * 2002-12-10 2004-06-24 Leslie Rousseau Device and method for translating language
US20040183749A1 (en) * 2003-03-21 2004-09-23 Roel Vertegaal Method and apparatus for communication between humans and devices
US20040199375A1 (en) * 1999-05-28 2004-10-07 Farzad Ehsani Phrase-based dialogue modeling with particular application to creating a recognition grammar for a voice-controlled user interface
US6816578B1 (en) * 2001-11-27 2004-11-09 Nortel Networks Limited Efficient instant messaging using a telephony interface
US6839670B1 (en) * 1995-09-11 2005-01-04 Harman Becker Automotive Systems Gmbh Process for automatic control of one or more devices by voice commands or by real-time voice dialog and apparatus for carrying out this process
US20050049874A1 (en) * 2003-09-03 2005-03-03 International Business Machines Corporation Method and apparatus for dynamic modification of command weights in a natural language understanding system
US20050131684A1 (en) * 2003-12-12 2005-06-16 International Business Machines Corporation Computer generated prompting
US7343290B2 (en) * 2001-09-26 2008-03-11 Nuance Communications, Inc. System and method of switching between dialog systems with separate dedicated communication units

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5694558A (en) * 1994-04-22 1997-12-02 U S West Technologies, Inc. Method and system for interactive object-oriented dialogue management
US5493606A (en) * 1994-05-31 1996-02-20 Unisys Corporation Multi-lingual prompt management system for a network applications platform
JP3034773B2 (en) * 1994-12-27 2000-04-17 シャープ株式会社 Electronic interpreter
US5794218A (en) * 1996-01-16 1998-08-11 Citibank, N.A. Automated multilingual interactive system and method to perform financial transactions
US6233545B1 (en) * 1997-05-01 2001-05-15 William E. Datig Universal machine translator of arbitrary languages utilizing epistemic moments
US7003463B1 (en) * 1998-10-02 2006-02-21 International Business Machines Corporation System and method for providing network coordinated conversational services
US6470317B1 (en) * 1998-10-02 2002-10-22 Motorola, Inc. Markup language to allow for billing of interactive services and methods thereof
US6546366B1 (en) * 1999-02-26 2003-04-08 Mitel, Inc. Text-to-speech converter
DE69942663D1 (en) * 1999-04-13 2010-09-23 Sony Deutschland Gmbh Merging of speech interfaces for the simultaneous use of devices and applications
US6526382B1 (en) * 1999-12-07 2003-02-25 Comverse, Inc. Language-oriented user interfaces for voice activated services
US6757362B1 (en) * 2000-03-06 2004-06-29 Avaya Technology Corp. Personal virtual assistant
US6963841B2 (en) * 2000-04-21 2005-11-08 Lessac Technology, Inc. Speech training method with alternative proper pronunciation database
US6920425B1 (en) * 2000-05-16 2005-07-19 Nortel Networks Limited Visual interactive response system and method translated from interactive voice response for telephone utility
US6934756B2 (en) * 2000-11-01 2005-08-23 International Business Machines Corporation Conversational networking via transport, coding and control conversational protocols
US6559866B2 (en) * 2001-05-23 2003-05-06 Digeo, Inc. System and method for providing foreign language support for a remote control device
GB0113583D0 (en) * 2001-06-04 2001-07-25 Hewlett Packard Co Speech system barge-in control
US20050234727A1 (en) * 2001-07-03 2005-10-20 Leo Chiu Method and apparatus for adapting a voice extensible markup language-enabled voice system for natural speech recognition and system response
US7609829B2 (en) * 2001-07-03 2009-10-27 Apptera, Inc. Multi-platform capable inference engine and universal grammar language adapter for intelligent voice application execution
US6985865B1 (en) * 2001-09-26 2006-01-10 Sprint Spectrum L.P. Method and system for enhanced response to voice commands in a voice command platform
US7711570B2 (en) * 2001-10-21 2010-05-04 Microsoft Corporation Application abstraction with dialog purpose
US6807529B2 (en) * 2002-02-27 2004-10-19 Motorola, Inc. System and method for concurrent multimodal communication
US7177816B2 (en) * 2002-07-05 2007-02-13 At&T Corp. System and method of handling problematic input during context-sensitive help for multi-modal dialog systems
US20040044517A1 (en) * 2002-08-30 2004-03-04 Robert Palmquist Translation system
GB2395029A (en) * 2002-11-06 2004-05-12 Alan Wilkinson Translation of electronically transmitted messages
US7373300B1 (en) * 2002-12-18 2008-05-13 At&T Corp. System and method of providing a spoken dialog interface to a website
US7003464B2 (en) * 2003-01-09 2006-02-21 Motorola, Inc. Dialog recognition and control in a voice browser
US7275032B2 (en) * 2003-04-25 2007-09-25 Bvoice Corporation Telephone call handling center where operators utilize synthesized voices generated or modified to exhibit or omit prescribed speech characteristics
US7260535B2 (en) * 2003-04-28 2007-08-21 Microsoft Corporation Web server controls for web enabled recognition and/or audible prompting for call controls
US8301436B2 (en) * 2003-05-29 2012-10-30 Microsoft Corporation Semantic object synchronous understanding for highly interactive interface
US20050010418A1 (en) * 2003-07-10 2005-01-13 Vocollect, Inc. Method and system for intelligent prompt control in a multimodal software application
US7539619B1 (en) * 2003-09-05 2009-05-26 Spoken Translation Ind. Speech-enabled language translation system and method enabling interactive user supervision of translation and speech recognition accuracy
US7555533B2 (en) * 2003-10-15 2009-06-30 Harman Becker Automotive Systems Gmbh System for communicating information from a server via a mobile communication device
US8296126B2 (en) * 2004-02-25 2012-10-23 Research In Motion Limited System and method for multi-lingual translation
US7412393B1 (en) * 2004-03-01 2008-08-12 At&T Corp. Method for developing a dialog manager using modular spoken-dialog components
US7461000B2 (en) * 2004-10-19 2008-12-02 International Business Machines Corporation System and methods for conducting an interactive dialog via a speech-based user interface
US20060253272A1 (en) * 2005-05-06 2006-11-09 International Business Machines Corporation Voice prompts for use in speech-to-speech translation system
US20070015121A1 (en) * 2005-06-02 2007-01-18 University Of Southern California Interactive Foreign Language Teaching
US7552053B2 (en) * 2005-08-22 2009-06-23 International Business Machines Corporation Techniques for aiding speech-to-speech translation

Patent Citations (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4974191A (en) * 1987-07-31 1990-11-27 Syntellect Software Inc. Adaptive natural language computer interface system
US5870701A (en) * 1992-08-21 1999-02-09 Canon Kabushiki Kaisha Control signal processing method and apparatus having natural language interfacing capabilities
US5748974A (en) * 1994-12-13 1998-05-05 International Business Machines Corporation Multimodal natural language interface for cross-application tasks
US5943643A (en) * 1995-04-13 1999-08-24 Canon Kabushiki Kaisha Language processing method and apparatus
US6839670B1 (en) * 1995-09-11 2005-01-04 Harman Becker Automotive Systems Gmbh Process for automatic control of one or more devices by voice commands or by real-time voice dialog and apparatus for carrying out this process
US20010013051A1 (en) * 1997-06-10 2001-08-09 Akifumi Nakada Message handling method, message handling apparatus, and memory media for storing a message handling apparatus controlling program
US5956668A (en) * 1997-07-18 1999-09-21 At&T Corp. Method and apparatus for speech translation with unrecognized segments
US6327343B1 (en) * 1998-01-16 2001-12-04 International Business Machines Corporation System and methods for automatic call and data transfer processing
US20010011217A1 (en) * 1998-12-31 2001-08-02 Egbert Ammicht User barge-in enablement in large vocabulary speech recognition systems
US20020010742A1 (en) * 1999-01-04 2002-01-24 Fujitsu Limited Communication assistance method and device
US6223150B1 (en) * 1999-01-29 2001-04-24 Sony Corporation Method and apparatus for parsing in a spoken language translation system
US6266642B1 (en) * 1999-01-29 2001-07-24 Sony Corporation Method and portable apparatus for performing spoken language translation
US20040199375A1 (en) * 1999-05-28 2004-10-07 Farzad Ehsani Phrase-based dialogue modeling with particular application to creating a recognition grammar for a voice-controlled user interface
US6553345B1 (en) * 1999-08-26 2003-04-22 Matsushita Electric Industrial Co., Ltd. Universal remote control allowing natural language modality for television and multimedia searches and requests
US6701294B1 (en) * 2000-01-19 2004-03-02 Lucent Technologies, Inc. User interface for translating natural language inquiries into database queries and data presentations
US20020094067A1 (en) * 2001-01-18 2002-07-18 Lucent Technologies Inc. Network provided information using text-to-speech and speech recognition and text or speech activated network control sequences for complimentary feature access
US20020156688A1 (en) * 2001-02-21 2002-10-24 Michel Horn Global electronic commerce system
US20020161579A1 (en) * 2001-04-26 2002-10-31 Speche Communications Systems and methods for automated audio transcription, translation, and transfer
US20020169592A1 (en) * 2001-05-11 2002-11-14 Aityan Sergey Khachatur Open environment for real-time multilingual communication
US20030033312A1 (en) * 2001-08-09 2003-02-13 Atsuko Koizumi Method of interpretation service for voice on the phone
US20030046062A1 (en) * 2001-08-31 2003-03-06 Cartus John R. Productivity tool for language translators
US7343290B2 (en) * 2001-09-26 2008-03-11 Nuance Communications, Inc. System and method of switching between dialog systems with separate dedicated communication units
US20030105634A1 (en) * 2001-10-15 2003-06-05 Alicia Abella Method for dialog management
US6816578B1 (en) * 2001-11-27 2004-11-09 Nortel Networks Limited Efficient instant messaging using a telephony interface
US20030110023A1 (en) * 2001-12-07 2003-06-12 Srinivas Bangalore Systems and methods for translating languages
US20030120478A1 (en) * 2001-12-21 2003-06-26 Robert Palmquist Network-based translation system
US20040019487A1 (en) * 2002-03-11 2004-01-29 International Business Machines Corporation Multi-modal messaging
US20030187641A1 (en) * 2002-04-02 2003-10-02 Worldcom, Inc. Media translator
US20040122678A1 (en) * 2002-12-10 2004-06-24 Leslie Rousseau Device and method for translating language
US20040183749A1 (en) * 2003-03-21 2004-09-23 Roel Vertegaal Method and apparatus for communication between humans and devices
US20050049874A1 (en) * 2003-09-03 2005-03-03 International Business Machines Corporation Method and apparatus for dynamic modification of command weights in a natural language understanding system
US20050131684A1 (en) * 2003-12-12 2005-06-16 International Business Machines Corporation Computer generated prompting

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243476A1 (en) * 2005-05-06 2008-10-02 International Business Machines Corporation Voice Prompts for Use in Speech-to-Speech Translation System
US8560326B2 (en) * 2005-05-06 2013-10-15 International Business Machines Corporation Voice prompts for use in speech-to-speech translation system
US20070061152A1 (en) * 2005-09-15 2007-03-15 Kabushiki Kaisha Toshiba Apparatus and method for translating speech and performing speech synthesis of translation result
US20090234633A1 (en) * 2008-03-17 2009-09-17 Virginia Chao-Suren Systems and methods for enabling inter-language communications
US20140288919A1 (en) * 2010-08-05 2014-09-25 Google Inc. Translating languages
US10817673B2 (en) 2010-08-05 2020-10-27 Google Llc Translating languages
US10025781B2 (en) * 2010-08-05 2018-07-17 Google Llc Network based speech to speech translation
US9922641B1 (en) * 2012-10-01 2018-03-20 Google Llc Cross-lingual speaker adaptation for multi-lingual speech synthesis
US20140180671A1 (en) * 2012-12-24 2014-06-26 Maria Osipova Transferring Language of Communication Information
US9905220B2 (en) 2013-12-30 2018-02-27 Google Llc Multilingual prosody generation
US9195656B2 (en) 2013-12-30 2015-11-24 Google Inc. Multilingual prosody generation
US9639854B2 (en) 2014-06-26 2017-05-02 Nuance Communications, Inc. Voice-controlled information exchange platform, such as for providing information to supplement advertising
US11055739B2 (en) 2014-06-26 2021-07-06 Nuance Communications, Inc. Using environment and user data to deliver advertisements targeted to user interests, e.g. based on a single command
US10643235B2 (en) 2014-06-26 2020-05-05 Nuance Communications, Inc. Using environment and user data to deliver advertisements targeted to user interests, e.g. based on a single command
US9639855B2 (en) 2014-06-26 2017-05-02 Nuance Communications, Inc. Dynamic embedded recognizer and preloading on client devices grammars for recognizing user inquiries and responses
US9626695B2 (en) * 2014-06-26 2017-04-18 Nuance Communications, Inc. Automatically presenting different user experiences, such as customized voices in automated communication systems
US20150379981A1 (en) * 2014-06-26 2015-12-31 Nuance Communications, Inc. Automatically presenting different user experiences, such as customized voices in automated communication systems
US9953631B1 (en) * 2015-05-07 2018-04-24 Google Llc Automatic speech recognition techniques for multiple languages
US10777096B2 (en) * 2015-12-28 2020-09-15 Amazon Technologies, Inc. System for assisting in foreign language learning
US20170186338A1 (en) * 2015-12-28 2017-06-29 Amazon Technologies, Inc. System for assisting in foreign language learning
US10403291B2 (en) 2016-07-15 2019-09-03 Google Llc Improving speaker verification across locations, languages, and/or dialects
US11017784B2 (en) 2016-07-15 2021-05-25 Google Llc Speaker verification across locations, languages, and/or dialects
US11594230B2 (en) 2016-07-15 2023-02-28 Google Llc Speaker verification
US11049493B2 (en) * 2016-07-28 2021-06-29 National Institute Of Information And Communications Technology Spoken dialog device, spoken dialog method, and recording medium
US20200193965A1 (en) * 2018-12-13 2020-06-18 Language Line Services, Inc. Consistent audio generation configuration for a multi-modal language interpretation system
US20220276829A1 (en) * 2019-03-04 2022-09-01 Giide Audio, Inc. Interactive podcast platform with integrated additional audio/visual content
US11443737B2 (en) * 2020-01-14 2022-09-13 Sony Corporation Audio video translation into multiple languages for respective listeners
US20220084523A1 (en) * 2020-09-11 2022-03-17 Avaya Management L.P. Multilingual transcription at customer endpoint for optimizing interaction results in a contact center
US11862169B2 (en) * 2020-09-11 2024-01-02 Avaya Management L.P. Multilingual transcription at customer endpoint for optimizing interaction results in a contact center
US11361780B2 (en) * 2021-12-24 2022-06-14 Sandeep Dhawan Real-time speech-to-speech generation (RSSG) apparatus, method and a system therefore

Also Published As

Publication number Publication date
US8560326B2 (en) 2013-10-15
US20080243476A1 (en) 2008-10-02

Similar Documents

Publication Publication Date Title
US8560326B2 (en) Voice prompts for use in speech-to-speech translation system
US20200410174A1 (en) Translating Languages
JP5967569B2 (en) Speech processing system
US20080140398A1 (en) System and a Method For Representing Unrecognized Words in Speech to Text Conversions as Syllables
JP2017058673A (en) Dialog processing apparatus and method, and intelligent dialog processing system
CN109543021B (en) Intelligent robot-oriented story data processing method and system
JP2019090942A (en) Information processing unit, information processing system, information processing method and information processing program
KR20160081244A (en) Automatic interpretation system and method
JP6832503B2 (en) Information presentation method, information presentation program and information presentation system
JP6290479B1 (en) Speech translation device, speech translation method, and speech translation program
JP2020113150A (en) Voice translation interactive system
JP6353860B2 (en) Speech translation device, speech translation method, and speech translation program
JP6310950B2 (en) Speech translation device, speech translation method, and speech translation program
WO2017122657A1 (en) Speech translation device, speech translation method, and speech translation program
JP6383748B2 (en) Speech translation device, speech translation method, and speech translation program
JP2018163581A (en) Voice translation device, voice translation method, and voice translation program
Kodirov et al. Implementation of web application based on Augmentative and Alternative Communication (AAC) method for People with Hearing and Speech Impairment
Roy et al. Voice E-Mail Synced with Gmail for Visually Impaired
JP2002132291A (en) Natural language interaction processor and method for the same as well as memory medium for the same
JP6110539B1 (en) Speech translation device, speech translation method, and speech translation program
JP6856277B1 (en) Automatic voice translation system that sets the translation language by voice input, automatic voice translation method and its program
US11902466B2 (en) Captioned telephone service system having text-to-speech and answer assistance functions
JP2020119043A (en) Voice translation system and voice translation method
WO2023026544A1 (en) Information processing device, information processing method, and program
JP2015036826A (en) Communication processor, communication processing method and communication processing program

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GAO, YUQING;GU, LIANG;LIU, FU-HUA;REEL/FRAME:016544/0847;SIGNING DATES FROM 20050618 TO 20050701

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION