WO2001042875A2 - Language translation voice telephony - Google Patents

Language translation voice telephony

Info

Publication number
WO2001042875A2
WO2001042875A2 · PCT/US2000/042472
Authority
WO
WIPO (PCT)
Prior art keywords
language
text
stream
standard reference
words
Prior art date
Application number
PCT/US2000/042472
Other languages
French (fr)
Other versions
WO2001042875A3 (en)
Inventor
Ralph Samuel Hoefelmeyer
James Patrick Brechtel
Original Assignee
Mci Worldcom, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mci Worldcom, Inc. filed Critical Mci Worldcom, Inc.
Priority to AU45126/01A priority Critical patent/AU4512601A/en
Publication of WO2001042875A2 publication Critical patent/WO2001042875A2/en
Publication of WO2001042875A3 publication Critical patent/WO2001042875A3/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/55 Rule-based translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/42 Systems providing special services or facilities to subscribers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M 2203/20 Aspects of automatic or semi-automatic exchanges related to features of supplementary services
    • H04M 2203/2061 Language aspects

Abstract

To allow speakers of different languages to communicate with each other, either verbally or in written words, the present invention takes the incoming voice stream, which could be either analog or digital, and breaks it into appropriate segments. The segments are then converted from a voice stream into a text stream and stored in a text buffer. The texts, and particularly the words in the text, are classified according to their vocabulary, grammar, and semantics classes to generate a source language object (Figure 1). Once that is done and the text stream is parsed into a tree format, the texts are mapped onto a standard reference language to generate a standard reference language object.

Description

Title: Language Translation Voice Telephony
Field of the Invention
The present invention relates to telecommunications and more particularly to the conversion of a voice message of one language to a corresponding voice or text message of a second language in a grammatically correct manner.
Background of the Invention
In the present global economy, people who speak different languages often have to communicate with each other. This communication is often carried out either by voice or by text, such as email. To communicate effectively, persons who speak different languages often have to settle on a common language. Yet because of the idiomatic and grammatical usage of a language, such communication may become confusing and misunderstood. Moreover, the translation of one language into another is often done manually by a human translator, necessitating a time delay as well as the cost of the translator.
In a telephony environment, where the conversation between two persons is held in real time and the exchange of text such as email between the parties occurs in substantially real time, the use of translators would at best be awkward and time consuming, and at worst not work at all. Thus, if two parties who speak different languages are to communicate effectively without requiring the services of translators, a method of readily converting a voice stream in one language into either spoken or written text in another language in substantially real time is needed.
Summary of the Invention
The translation system of the present invention provides an abstracted, or generalized, capability for effecting language translation among different human languages, both written and spoken. The method is context-centric and relies specifically on an object-oriented technique.
In particular, in a telephony environment, the incoming speech signal, either analog or digital, is first stored in a voice buffer. The speech stream that is stored in the buffer is broken down into phoneme sets. For an analog input signal, this could be done by sensing the distinct changes in the state of the carrier signal. In the case of a digital signal such as a voice packet stream, the silence symbol of the digitized voice stream can be used for separating the voice stream into the phoneme sets.
Once given the various phoneme sets, the speech is converted into texts. The texts, most likely in ASCII format, are stored in a text buffer.
The individual text words in the buffer are then tokenized so as to generate a token that is unique to each word. The tokenized words are used as keys for effecting a fast retrieval or look-up in a tokenized dictionary of a given standard reference language. A pattern matching mechanism is then used to validate the tokenized texts against the tokenized dictionary of the standard reference language. Thereafter, the tokenized texts in the text buffer are translated by means of a translation mechanism that is specific to the targeted language. This is done by applying grammatical rules using a grammar engine that parses the tokenized words in the buffer and applies the grammatical rules of the target language thereto. Once the text words have been translated into the target language, they are compressed and packetized. Thereafter, the packets could be transmitted.
In the case of a communication in a textual format, the packetized text words could be transmitted directly, for example in the form of an email or fax. On the other hand, if it is a voice communication, then the packetized words are fed to a voice synthesizer for conversion, so that the voice output is in the target language. If the tokenized texts were compressed, they are decompressed accordingly.
It is therefore an objective of the present invention to provide a method of translating in substantially real time an incoming voice message of one language into a communicative output of a different language.
It is another objective of the present invention to translate an input voice stream of one language into either an output voice stream or an output text stream of another language. It is yet another objective of the present invention to allow persons who speak or understand different languages to communicate directly without the need for translators.
Brief Description of the Figures
The above-mentioned objectives and advantages of the present invention will become apparent and the invention itself will be best understood by reference to the following description of an embodiment of the invention taken in conjunction with the accompanying drawings, wherein:
Fig. 1 is a high level block diagram of the architecture of the system of the present invention; and
Fig. 2 is a flow chart illustrating the method in which an input voice stream is translated into a target language data stream.
Detailed Description of the Invention
With reference to Fig. 1, a general translation system (GTS) architecture of the present invention for converting an input voice stream of one language into an output voice or text stream of another language is shown.
In particular, an input voice stream 2, from an input transmission medium such as, for example, either a landline or wireless phone, is received by a conventional voice signal receiver 4. The voice stream is then routed to a speech to text converter 6 that converts the voice stream into text. Converter 6 could be a hardwired converter or a processor that runs any one of a number of conventional speech to text conversion programs, such as Dragon NaturallySpeaking by Dragon Systems, Inc. of Newton, Massachusetts, that enable the conversion of spoken words into text. Moreover, the technology for converting speech to text is well known, and two such systems are described in U.S. patent 5,031,113, assigned to the U.S. Philips Corporation, and U.S. patent 5,754,978, assigned to the Speech Systems of Colorado Company. The disclosures of the '113 and '978 patents are incorporated by reference herein.
The incoming voice stream could be either an analog voice stream or a stream of digitized voice packets. In any event, the voice stream is chunked into phoneme sets for storage in a text buffer, or a series of text buffers 8. The partition of the voice stream may be done by the pauses that are inherent in the spoken words uttered by a speaker. And there are a number of conventional systems available for identifying the phoneme sound types that are contained in an audio speech stream. One such system for recognizing speech and dividing the speech into phoneme sets is described in U.S. patent 5,646,490, assigned to the Fonix Corporation. The disclosure of the '490 patent is incorporated by reference herein.
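The pause-based segmentation described above can be sketched in a few lines. This is a minimal illustration only, assuming a digitized stream in which a silence symbol (here, an invented `SILENCE` byte value of 0) separates the phoneme sets; the patent itself defers to the cited '490 patent for the actual phoneme recognition.

```python
# Hypothetical sketch: segmenting a digitized voice stream on runs of a
# silence symbol, as the description suggests. SILENCE and the sample
# values are illustrative assumptions, not part of the patent.

SILENCE = 0  # assumed marker value for silence in the digitized stream

def chunk_on_silence(samples):
    """Split a stream of digitized voice samples into phoneme sets,
    using runs of the silence symbol as separators."""
    segments, current = [], []
    for sample in samples:
        if sample == SILENCE:
            if current:               # a silence run closes the current set
                segments.append(current)
                current = []
        else:
            current.append(sample)
    if current:                       # flush the final set
        segments.append(current)
    return segments

stream = [12, 5, SILENCE, SILENCE, 9, 3, 7, SILENCE, 4]
print(chunk_on_silence(stream))  # [[12, 5], [9, 3, 7], [4]]
```

For an analog input, the equivalent step would instead watch for distinct changes in the carrier signal state, as noted in the Summary.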
The texts in the text buffer 8, which are in the language of the input voice stream, i.e., the source language, are then parsed. Each word of the text stream is then tokenized by a tokenizing algorithm so as to build a tree of tokenized words, which are tied to a standard reference language. This could be done by utilizing the NeoData program by the NeoCore Corporation of Colorado Springs. In essence, the NeoData program generates a grammar object based on the grammar of the source language, and then expresses the source grammar object in terms of the grammar object of a standard reference language. The NeoData program therefore provides a transform generator that, upon receipt of input transforms and data strings, converts them into new transform outputs. The technology behind the NeoData program is disclosed in U.S. patent 5,942,002 assigned to the NeoCore Corporation. The disclosure of the '002 patent is incorporated by reference herein.
The source language, in addition to having a particular grammar, also has a given vocabulary. By means of the tokenizing technique disclosed in the '002 patent, the grammar and the vocabulary of the source language are combined to generate a semantics object that is expressed in terms of the standard reference language. The standard reference language can be any language, such as English, that has a sufficiently rich vocabulary and grammar to act as a standard against which other languages can be compared and from which they can be translated.
The vocabulary class is basically an object that points to a given place in a dictionary of the particular language for its definition. So, too, any language has its own grammar object classes, such as nouns, subjects, objects, predicates, possessives, interrogatives, etc., that are common in a language such as English. In other words, the grammar object class encapsulates state models and language meaning modifiers for the language. Once the actual state models and language meaning modifiers are defined for a language such as the standard reference language, specific grammar classes for the language can be derived.
The standard reference language further has a semantics object that in essence provides a study of the meanings, significance and changes in the various words and phrases of the language, and the linguistic development of the meaning and relationship of the various words of the language. In essence, with the vocabulary object and the grammar object being combined to generate the semantics object, the source language, and also the standard reference language, would each have objects that comprise the vocabulary class 10, the grammar class 12 and the semantics class 14, as shown in the source language module 16 of Fig. 1.
Source language module 16 is further shown to be connected to a standard reference language module 18. The interconnection between source language module 16 and standard reference language module 18 allows the matching of the various classes between the source language and the standard reference language. This may be referred to as pattern matching, for looking up text, words, or phrases in the standard reference language that correspond to the transformed texts, words, or phrases in the source language. By thus patterning or correlating the source language with the standard reference language, a semantics object is created with the standard reference language that is a combination of the dictionary, grammar and semantics classes of the source language. Therefore, a meaningful translation of the voice stream into standard reference language text can be generated. The mapping of the source language texts to the standard reference language texts could be done using the method and architecture described in U.S. patent 5,677,835, assigned to Caterpillar Inc. In brief, the '835 patent discloses that source texts may be converted into target texts by using a constrained source language analyzer and a machine translation generator. The disclosure of the '835 patent is incorporated by reference herein.
The thus translated or derived source language text stream is stored in a source reference language mapped text store 20. Depending on the language to be targeted, a selector 22 selects from among a number of target language modules 24a-24n for mapping therewith the translated voice stream texts based on the standard reference language. Another mapping process, such as that taught in the '835 patent, whereby the translated voice stream based on the standard reference language is mapped to the target language, is effected so that a text stream now based on the targeted language is sent to and stored in a target language text buffer 26. From there, the translated target language text stream could be output in a number of ways via a transmission output medium such as, for example, landline or wireless telephony.
For example, if it is a voice-to-voice communication between two speakers, the translated voice stream now based on the target language is output to a voice synthesizer 28 so that a voice stream based on the target language is output to the listener, who presumably is a speaker of the target language. Such translated speech may be output as voice packets in a telephony environment with insignificant lag time. Of course, the reverse process takes place in a two-way voice communication, as the listener then becomes the speaker and the same process as described above, but in the reverse order, will take place and will continue until the conversation is terminated.
In the event that the input voice stream is to be output as texts such as for example an email or fax to the receiving party, or a braille message if the receiving party happens to be blind, then the translated texts stored in the target language text buffer 26 are output as a text stream.
Note that the various modules as noted in Fig. 1 could be program applications that reside and run in a computer or processor means such as for example a Pentium based personal computer. Furthermore, the various buffers may be high speed and high capacity IC memory chips built onto a board inserted into any one of the available slots of the personal computer.
Fig. 2 provides a flow chart for illustrating the method of how the voice stream in the source language is translated to a standard reference language, and the subsequent translation of the standard reference language text stream to a target language text stream.
In particular, when a voice stream is received at the processor, it is converted into text and chunked into segments using natural divisions or pauses in the voice stream. The chunked voice text segments are then stored into a text buffer such as buffer 8. This is illustrated in step 30. Thereafter, in step 32, the stored texts are processed and mapped onto a source language object. This is done by matching each word to the vocabulary, placing each word in its grammatical context, and then matching the word with the semantics object of the language so as to place the word into its proper context as used in the input voice stream. Having done that, the word can then be mapped to the standard reference language, as illustrated in process step 34. With the vocabulary, grammar, and semantics classes or objects established for each word, the mapped standard reference language word is next related to the target language by means of the different objects of the target language, per step 36. Once that is done and a target language object is generated, a target language stream is created and provided to a buffer of the target language, per step 38. Thereafter, the converted stream, in the form of a text stream, is output from the buffer per step 40. As mentioned previously, the target language text could be output in either speech or text format.
Inasmuch as the present invention is subject to many variations, modifications and changes in detail, it is intended that all matter described throughout this specification and shown in the accompanying drawings be interpreted as illustrative only and not in a limiting sense. For example, even though the present invention as discussed so far relates to the conversion of an incoming voice message, in practice an incoming text message in one language could be converted just as well. Such translation of an input text message may occur, for example, in the guise of a received email in one language that requires translation to another language, or, for that matter, to an output voice message. In such an input text scenario, there is no need for any speech to text conversion process. Accordingly, it is intended that the invention be limited only by the spirit and scope of the appended claims.

Claims

1. A method of translating a source language into a target language, comprising the steps of: breaking an incoming voice stream of said source language into phoneme sets; converting said phoneme sets of said incoming voice stream into a text stream of a standard reference language; storing said text stream into a text buffer; translating said text words in said text buffer into text words of said standard reference language; and converting said translated text words of said standard reference language into words of said target language.
2. Method of claim 1 , wherein said translating step comprises the steps of: parsing each word of said text stream; correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object; defining a grammar object of said each word based on the grammar of said source language; and deriving from a combination of said vocabulary and grammar objects a semantics object for said source language.
3. Method of claim 2, further comprising the steps of:
expressing said semantics object for said source language in said standard reference language;
mapping said semantics object of said standard reference language with said target language; and
outputting a target language object patterned from said standard reference language.
4. Method of claim 2, further comprising the step of: tokenizing the words of said text stream in said text buffer before said correlating step.
5. Method of claim 1, wherein said words converted from said source language into said target language can be either spoken or written texts.
6. Method of claim 1, wherein said translating step comprises the steps of:
defining a grammar object class that encapsulates state models and language meaning modifiers for said standard reference language;
deriving language specific grammar classes wherein the actual state models and language meaning modifiers are defined for said standard reference language; and
using said derived language specific grammar classes to generate a stream of words in said target language that corresponds semantically to said source language.
7. Method of claim 1, wherein said incoming voice stream comprises voice packets.
8. Method of claim 1, further comprising the step of: outputting said converted words of said target language as voice packets.
9. A method of converting an input voice stream of one language into an output of an other language, comprising the steps of:
receiving from an input transmission medium said input voice stream;
breaking said input voice stream into phoneme sets;
converting said phoneme sets of said input voice stream into a text stream of a standard reference language;
storing said text stream into a text buffer;
translating the words of said text stream in said text buffer into text words of said standard reference language;
converting said translated text words of said standard reference language into words of said other language;
combining the words of said other language into an output stream; and
outputting said output stream onto an output transmission medium.
10. Method of claim 9, wherein said translating step comprises the steps of:
parsing each word of said text stream;
correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object;
defining a grammar object of said each word based on the grammar of said one language; and
deriving from a combination of said vocabulary and grammar objects a semantics object for said one language.
11. Method of claim 10, further comprising the steps of:
expressing said semantics object for said one language in said standard reference language;
mapping said semantics object of said standard reference language with said other language; and
outputting an object of said other language patterned from said standard reference language.
12. Method of claim 9, further comprising the steps of:
tokenizing the text words of said text stream in said text buffer so that said words are readily retrievable after said breaking step; and
packetizing said output stream into voice packets before outputting said voice packets onto said output transmission medium.
13. Method of claim 9, wherein said words converted from said one language into said other language can be either spoken or written texts.
14. Apparatus for translating a source language into a target language, comprising:
means for breaking an incoming voice stream of said source language into phoneme sets;
means for converting said phoneme sets of said incoming voice stream into a text stream of a standard reference language;
means for storing said text stream into a text buffer;
means for translating said text words in said text buffer into text words of said standard reference language; and
means for converting said translated text words of said standard reference language into words of said target language.
15. Apparatus of claim 14, further comprising:
means for parsing each word of said text stream;
means for correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object;
means for defining a grammar object of said each word based on the grammar of said source language; and
means for deriving from a combination of said vocabulary and grammar objects a semantics object for said source language.
16. Apparatus of claim 15, further comprising:
means for expressing said semantics object for said source language in said standard reference language;
means for mapping said semantics object of said standard reference language with said target language; and
means for outputting said target language mapped from said standard reference language.
17. Apparatus of claim 14, further comprising:
means for tokenizing the text words of said text stream in said text buffer;
means for packetizing said tokenized text words into output packets for said target language; and
means for outputting said target language packets onto an output transmission medium.
18. Apparatus of claim 14, wherein said words converted from said source language into said target language can be either spoken or written texts.
19. A system for converting an input voice stream of one language into an output of an other language, comprising:
processor means;
receiver means workingly connected to said processor means for receiving from an input transmission medium said input voice stream, said input voice stream being routed to said processor means;
said processor means including module means for breaking said input voice stream into phoneme sets;
module means for converting said phoneme sets of said input voice stream into a text stream of a standard reference language;
store means electrically connected to said processor means for storing said text stream into a text buffer;
said processor means further including module means for translating the words of said text stream in said text buffer into text words of said standard reference language;
module means for converting said translated text words of said standard reference language into words of said other language;
module means for combining the words of said other language into an output stream; and
transmitting means electrically connected to an output medium for outputting said output stream onto said output transmission medium.
20. System of claim 19, wherein said translating module means further performs the operations of:
parsing each word of said text stream;
correlating said each word with the vocabulary of said standard reference language to obtain a vocabulary object;
defining a grammar object of said each word based on the grammar of said one language; and
deriving from a combination of said vocabulary and grammar objects a semantics object for said one language.
21. System of claim 20, wherein said translating module means further performs the operations of:
expressing said semantics object for said one language in said standard reference language;
mapping said semantics object of said standard reference language with said other language; and
outputting an object of said other language mapped from said standard reference language.
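The grammar object class hierarchy recited in claim 6 — a base class encapsulating state models and language meaning modifiers, with language-specific subclasses supplying the actual definitions — can be sketched as follows. All class names, attributes and vocabulary entries here are hypothetical illustrations assumed for this example, not identifiers from the disclosed system:

```python
# Illustrative sketch of claim 6's hierarchy: a base grammar object class
# with placeholder state models and meaning modifiers, and derived
# language-specific grammar classes that define the actual values.

class Grammar:
    """Base grammar object class: encapsulates state models and
    language meaning modifiers (empty placeholders in the base)."""
    state_models = {}        # e.g. permitted word-order states
    meaning_modifiers = {}   # e.g. mapping of reference tokens to words

    def generate(self, semantic_tokens):
        # Emit a word stream in this grammar's language for a stream of
        # reference-language semantic tokens (unknowns pass through).
        return [self.meaning_modifiers.get(t, t) for t in semantic_tokens]

class EnglishGrammar(Grammar):
    """Derived class with the actual definitions for English."""
    state_models = {"order": ["subject", "verb", "object"]}
    meaning_modifiers = {"CAT": "cat", "PLURAL(CAT)": "cats"}

class SpanishGrammar(Grammar):
    """Derived class with the actual definitions for Spanish."""
    state_models = {"order": ["subject", "verb", "object"]}
    meaning_modifiers = {"CAT": "gato", "PLURAL(CAT)": "gatos"}

# The same semantic token stream yields corresponding words per language.
tokens = ["PLURAL(CAT)"]
print(EnglishGrammar().generate(tokens))  # -> ['cats']
print(SpanishGrammar().generate(tokens))  # -> ['gatos']
```

The design choice mirrors the claim: behavior common to all languages lives in the base class, while each target language overrides only its own state models and meaning modifiers.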
PCT/US2000/042472 1999-12-02 2000-12-01 Language translation voice telephony WO2001042875A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU45126/01A AU4512601A (en) 1999-12-02 2000-12-01 Language translation voice telephony

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US45295999A 1999-12-02 1999-12-02
US09/452,959 1999-12-02

Publications (2)

Publication Number Publication Date
WO2001042875A2 true WO2001042875A2 (en) 2001-06-14
WO2001042875A3 WO2001042875A3 (en) 2001-11-01

Family

ID=23798664

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/042472 WO2001042875A2 (en) 1999-12-02 2000-12-01 Language translation voice telephony

Country Status (2)

Country Link
AU (1) AU4512601A (en)
WO (1) WO2001042875A2 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4994966A (en) * 1988-03-31 1991-02-19 Emerson & Stern Associates, Inc. System and method for natural language parsing by initiating processing prior to entry of complete sentences
US5477451A (en) * 1991-07-25 1995-12-19 International Business Machines Corp. Method and system for natural language translation
US5768603A (en) * 1991-07-25 1998-06-16 International Business Machines Corporation Method and system for natural language translation
US5642519A (en) * 1994-04-29 1997-06-24 Sun Microsystems, Inc. Speech interpreter with a unified grammer compiler

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FITZPATRICK E.: 'Parsing for prosody: what a text-to-speech system needs from syntax' IEEE CATALOGUE NUMBER: CH2715-1/89/0000/0188 1989, XP002939199 *
KITANI T.: 'A japanese preprocessor for syntactic and semantic parsing' IEEE 1993, pages 86 - 92, ISSN: 1043-0989, XP002939200 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1280320A1 (en) * 2001-07-26 2003-01-29 Siemens Aktiengesellschaft Mobile radio telephone comprising text entry and dictionary function
GB2450186A (en) * 2007-06-11 2008-12-17 Avaya Gmbh & Co Kg Operating a voice mail system
US8300774B2 (en) 2007-06-11 2012-10-30 Avaya Gmbh & Co. Kg Method for operating a voice mail system
CN111144138A (en) * 2019-12-17 2020-05-12 Oppo广东移动通信有限公司 Simultaneous interpretation method and device and storage medium

Also Published As

Publication number Publication date
AU4512601A (en) 2001-06-18
WO2001042875A3 (en) 2001-11-01

Similar Documents

Publication Publication Date Title
US7593842B2 (en) Device and method for translating language
JP3672800B2 (en) Voice input communication system
US20030115059A1 (en) Real time translator and method of performing real time translation of a plurality of spoken languages
US20040073423A1 (en) Phonetic speech-to-text-to-speech system and method
WO2003052624A1 (en) A real time translator and method of performing real time translation of a plurality of spoken word languages
WO2008084476A2 (en) Vowel recognition system and method in speech to text applications
US20090037170A1 (en) Method and apparatus for voice communication using abbreviated text messages
JP2011504624A (en) Automatic simultaneous interpretation system
US20080300855A1 (en) Method for realtime spoken natural language translation and apparatus therefor
RU2419142C2 (en) Method to organise synchronous interpretation of oral speech from one language to another by means of electronic transceiving system
JPH0965424A (en) Automatic translation system using radio portable terminal equipment
JPH07129594A (en) Automatic interpretation system
CN102196100A (en) Instant call translation system and method
WO2001042875A2 (en) Language translation voice telephony
KR20050080671A (en) Emoticon processing method for text to speech system
JPH0561637A (en) Voice synthesizing mail system
JPH03132797A (en) Voice recognition device
RU80603U1 (en) ELECTRONIC TRANSMISSION SYSTEM WITH THE FUNCTION OF SYNCHRONOUS TRANSLATION OF ORAL SPEECH FROM ONE LANGUAGE TO ANOTHER
JPH10224520A (en) Multi-media public telephone system
Wang et al. Real-Time Voice-Call Language Translation
Bharthi et al. Unit selection based speech synthesis for converting short text message into voice message in mobile phones
KR20010057258A (en) Method and Apparatus for intelligent dialog based on voice recognition using expert system
KR100363876B1 (en) A text to speech system using the characteristic vector of voice and the method thereof
WO2012091608A1 (en) Electronic receiving and transmitting system with the function of synchronous translation of verbal speech from one language into another
KR20040015638A (en) Apparatus for automatic interpreting of foreign language in a telephone

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP