EP1498872A1 - Method and system for audio rendering of a text with emotional information - Google Patents
- Publication number
- EP1498872A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- text
- codes
- sentences
- expressions
- tts
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Abstract
The present invention concerns a method and a system for
rendering in an audio form a text using a given Text To Speech (TTS)
system, said text including symbolic signs corresponding to the mental,
emotional and/or physical state or state of mind of the author and/or sender
of said text.
Method characterised in that it consists in:
- subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system (1),
- subjecting said pretreated text to a second treatment by said TTS system (1) wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
- generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
Description
The present invention is related to the field of the conversion of
text data into voice or speech data, more particularly in connection with so-called
Text To Speech systems (TTS).
The present invention concerns a method and a system for
rendering a text in an audio form with a better expressiveness, based on the
state of mind of the author at the time of producing said text.
It is nowadays quite a common practice, with the increased use
of the Internet, e-mail and SMS (Short Message Service), to have text
messages converted or translated into voice messages by means of a TTS
system.
Nevertheless, the receiver or listener of the message has most
of the time no indication of the state of mind, mood or general state of the
sender of the message, as the generated audio signal is "flat" and the output
voice generally monotone. This can obliterate a great part of the meaning
and/or of the strength of the text or message.
It has been proposed, in EP-A-1 102 242, to personalise the
features of the outputting voice. But said personalisation is performed upon
the initiative and according to the desires of the receiver or listener, and
does not reflect the feelings, the mood or the state of mind of the writer or
sender.
Furthermore, the writer of the text does not know that his text
will be rendered in an audio form and is in any case not aware of the
features which would allow said rendering to be personalised.
It is an aim of the present invention to overcome the
aforementioned drawbacks and restrictions.
Therefore, the main object of the present invention is a method
for rendering in an audio form a text using a given Text To Speech system,
program or engine, said text including symbolic signs corresponding to the
mental, emotional and/or physical state or state of mind of the author and/or
sender of said text, method characterised in that it consists in:
- subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system,
- subjecting said pretreated text to a second treatment by said TTS system wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
- generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
The present invention will be better understood thanks to the
following description of additional features and advantages, and will now
be described in more detail, by way of example, in relation to a
non-limiting embodiment shown in the enclosed drawings.
The inventive method mainly comprises the following steps:
- subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system 1,
- subjecting said pretreated text to a second treatment by said TTS system 1 wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
- generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
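The two treatments listed above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `<esc:name=value>` code syntax, the table `LIBRARY` and both function names are assumptions made for illustration, and the second function only stands in for a TTS engine by separating the words to be spoken from the parameter adjustments.

```python
import re

# Hypothetical table: smileys from this description mapped to made-up
# escape codes. Real engines define their own code syntax.
LIBRARY = {
    ":-)": "<esc:pitch=+10>",
    ">:-(": "<esc:volume=+20>",
    ":'-(": "<esc:rate=-15>",
}

CODE = re.compile(r"<esc:(\w+)=([+-]\d+)>")


def first_treatment(text: str) -> str:
    """Transcribe known smileys into codes the (imaginary) engine understands."""
    # Longest signs first, so a short sign never shadows a longer one.
    for sign in sorted(LIBRARY, key=len, reverse=True):
        text = text.replace(sign, LIBRARY[sign])
    return text


def second_treatment(pretreated: str):
    """Stand-in for the TTS stage: split the words to speak from the codes."""
    adjustments = [(name, int(value)) for name, value in CODE.findall(pretreated)]
    words = CODE.sub(" ", pretreated).split()
    return words, adjustments


pretreated = first_treatment("See you tonight :-)")
words, adjustments = second_treatment(pretreated)
```

In a real system the third step, generating the acoustic signal, would be carried out by the engine itself using the adjusted parameters.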
Text To Speech systems as such are well known and exist in
several different versions, with no specific common standardised base. In
particular, the output features configuring data, known as escape codes, are
particular to each of them.
According to a preferred embodiment of the invention, the text
consists of an e-mail or SMS message, the symbolic signs consist of
smileys and the processable codes are output configuring escape codes
belonging to the used TTS system 1.
In line with the most common use of smileys, each
symbolic sign inserted in the text to be rendered in audio form affects only
the output voice for the word, sentence or expression which immediately
precedes it, the corresponding escape code being put immediately in front
of the corresponding translated audio word, sentence or expression.
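As a rough sketch of this placement rule, the Python snippet below moves the code for each smiley to the front of the sentence that immediately precedes it. The code syntax, the table `CODES` and the sentence boundary rule (a full stop, exclamation mark or question mark followed by whitespace) are assumptions; the description does not say how the end of the affected span is signalled to the engine, so no reset code is emitted here.

```python
import re

CODES = {":-)": "<esc:pitch=+10>", ":-@": "<esc:volume=+30>"}  # hypothetical codes
SIGN = re.compile("|".join(re.escape(s) for s in sorted(CODES, key=len, reverse=True)))
# Simplifying assumption: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"[.!?]\s+")


def place_codes(text: str) -> str:
    """Put each sign's code in front of the sentence immediately preceding it."""
    out = []
    pos = 0
    for match in SIGN.finditer(text):
        before = text[pos:match.start()]
        # Locate the start of the last sentence in the text before the sign.
        ends = [m.end() for m in SENTENCE_END.finditer(before.rstrip())]
        cut = ends[-1] if ends else 0
        out.append(before[:cut])
        out.append(CODES[match.group()] + before[cut:].rstrip())
        pos = match.end()
    out.append(text[pos:])
    return "".join(out)


result = place_codes("I passed. We won! :-) Call me.")
```

Here only "We won!" is preceded by the pitch code; "I passed." and "Call me." are left untouched.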
As an alternative, it can also be
provided that each symbolic sign inserted in the text to be rendered in
audio form affects the output voice for all the words, sentences and/or
expressions which precede it up to the respective preceding symbolic sign,
the beginning of the text or a predetermined text cutting sign.
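This wider scope can be sketched as follows, again in Python and under stated assumptions: the codes in `CODES` are invented, and a line break is chosen arbitrarily as the predetermined text cutting sign `CUT`. The function pairs each code with the whole span it would govern.

```python
import re

CODES = {":-)": "<esc:pitch=+10>", ":-c": "<esc:pitch=-10>"}  # hypothetical codes
SIGN = re.compile("|".join(re.escape(s) for s in sorted(CODES, key=len, reverse=True)))
CUT = "\n"  # assumed text cutting sign: a line break


def scoped_spans(text: str):
    """Pair each sign's code with all the text it governs under the wider scope."""
    spans = []
    pos = 0  # end of the previous sign, or the beginning of the text
    for match in SIGN.finditer(text):
        before = text[pos:match.start()]
        # Restart the span after the last cutting sign, if there is one.
        span = before.rsplit(CUT, 1)[-1].strip()
        spans.append((CODES[match.group()], span))
        pos = match.end()
    return spans


spans = scoped_spans("Hi Bob\nGot the job. Party tonight :-) Rain though :-c")
```

The first span stops at the line break, so the greeting "Hi Bob" is not affected; the second span starts right after the first smiley.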
In order to allow a fast pretreatment of the text and to confer a
certain flexibility to said first treatment step, the method according to the
invention can also comprise a previous step of building up a [symbolic
signs / TTS processable codes ] translation library, specifically adapted to
the possibilities of the used TTS system, i.e. the plurality of codes it is able
to process.
Said library can be integrated into or be separate from the
pretreatment means, and its content can be specifically adapted to the
escape codes of the used TTS system and evolve with the appearance of
new symbolic signs and the disappearance of older ones, which have
become obsolete.
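One way to picture such a library is as a lookup table built per engine, as in the Python sketch below. The two engines, their code syntaxes and the function `build_library` are hypothetical; only the smiley meanings are taken from this description.

```python
# Smiley meanings taken from the list given in this description.
SIGN_MEANINGS = {":-)": "Smile", ">:-(": "Angry", ":-@": "Screaming", ":-6": "Exhausted"}

# Hypothetical capabilities of two imaginary engines: meaning -> escape code.
ENGINE_A_CODES = {"Smile": "<esc:pitch=+10>", "Angry": "<esc:volume=+20>"}
ENGINE_B_CODES = {"Smile": "\\p+10\\", "Screaming": "\\v+30\\", "Exhausted": "\\r-20\\"}


def build_library(sign_meanings, engine_codes):
    """Keep only the signs for which the target engine has a usable code."""
    return {
        sign: engine_codes[meaning]
        for sign, meaning in sign_meanings.items()
        if meaning in engine_codes
    }


library_a = build_library(SIGN_MEANINGS, ENGINE_A_CODES)
library_b = build_library(SIGN_MEANINGS, ENGINE_B_CODES)

# The library can evolve: add a newly popular sign, retire an obsolete one.
library_a[";-)"] = "<esc:pitch=+5>"
library_b.pop(":-6", None)
```

Building the table once, ahead of the first treatment, keeps the per-message work down to simple lookups.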
A list of examples of signs in the form of smileys, which could
be translated into codes, is given herein after:
°:-) | Angelic |
>:-( | Angry |
|- I | Asleep |
(:: () : : ) | Bandaid |
:-{} | Blowing a Kiss |
\-o | Bored |
:-c | Bummed Out |
: ( ) | Cannot Stop Talking |
:~ / | Confused |
:' | Crying |
:'-) | Crying with Joy |
:'-( | Crying Sadly |
:-9 | Delicious |
:P | Disgusted |
:-6 | Exhausted |
: (- | Frown |
^5 | High five |
:-# | Sealed Lips |
@>- -, -- | Rose FOR YOU |
:-@ | Screaming |
: O | Shocked |
:-) | Smile |
:-O | Surprised |
Λ | Thumbs Up |
:-& | Tongue Tied |
:-\ | Undecided |
;-) | Wink |
Nevertheless, the text to be treated can also comprise symbolic
signs which are not in said library.
To avoid any misinterpretation of such unknown signs, the
method can further comprise the step of deleting or inhibiting, during the
first treatment, each symbolic sign which cannot be transcribed into a
configuration code processable by the TTS system 1.
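A minimal Python sketch of this filtering step is given below. The pattern `SIGN_LIKE` used to recognise an unknown smiley is a crude heuristic chosen for illustration, not something the description specifies, and the single library entry and its code are likewise hypothetical.

```python
import re

LIBRARY = {":-)": "<esc:pitch=+10>"}  # hypothetical and deliberately tiny
# Heuristic (an assumption): a whitespace-delimited token that starts like a
# western smiley, for example ":-&", ";-)" or ">:-(".
SIGN_LIKE = re.compile(r"[>°]?[:;]\S{1,3}")


def first_treatment(text: str) -> str:
    """Transcribe known signs and delete sign-like tokens that are unknown."""
    kept = []
    for token in text.split():
        if token in LIBRARY:
            kept.append(LIBRARY[token])   # known sign: transcribe it
        elif SIGN_LIKE.fullmatch(token):
            continue                      # unknown sign: delete it
        else:
            kept.append(token)            # ordinary word: keep it
    return " ".join(kept)


cleaned = first_treatment("Well done :-) but I am lost :-& now")
```

The unknown sign ":-&" never reaches the engine, so it cannot be read out as literal punctuation.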
Preferably, the adjustable or tunable acoustic parameters
comprise parameters selected from the group consisting of volume, rate,
pitch, tone or analogous utterance or voice intonation characteristics.
As schematically shown on the enclosed figure, the present
invention also concerns a system 2 for rendering in an audio form a text,
said text including symbolic signs corresponding to the mental, emotional
and/or physical state or state of mind of the author or sender of said text.
Said system 2 is characterised in that it comprises:
- a first treatment software module 3 able to transcribe or translate at least some of the symbolic signs present in said text into corresponding predetermined codes, said first module 3 possibly incorporating or being associated with a library;
- a second treatment software module 1 in the form of a Text To Speech (TTS) program or engine able to translate the text as pretreated by the first module 3, namely the textual words, sentences or expressions, into audio sentences or expressions made of associated phonetic tokens and able to interpret the codes present in the pretreated text in order to adjust at least some acoustic parameters of at least one of said words, sentences or expressions;
- voice or speech generating means 4 able to provide an output signal, whereby the features or properties of the outputting voice are set by the adjusted parameters.
Of course, said system 2 will further comprise adapted means 9
to implement the method as described hereinbefore.
The first software module may or may not be integrated into the TTS
program.
Figures 2 to 4 of the enclosed drawings show, in the form
of flow charts, possible structures of software modules which could be used
to perform the inventive method.
It should be noted that the escape codes can be specific codes
proposed by a given TTS (such as "Realspeak" for example) or
generic codes belonging to the hardware provider.
Depending on the possibilities and properties of the used TTS,
the text to be processed is either passed directly to the TTS with
escape codes it understands, or an adapted software module
analyses the generic escape codes and calls the configuring functions (for
example in the C language) provided by the target TTS.
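The two integration paths can be sketched as a small dispatcher. The sketch is in Python rather than C, and everything in it is assumed for illustration: the generic code syntax, the class `ApiOnlyEngine` and its methods `set_parameter` and `speak` do not correspond to any real engine's interface.

```python
import re

GENERIC_CODE = re.compile(r"<esc:(\w+)=([+-]?\d+)>")  # hypothetical generic syntax


class ApiOnlyEngine:
    """Stand-in for an engine offering configuring functions but no escape codes."""

    def __init__(self):
        self.calls = []  # records what the engine was asked to do

    def set_parameter(self, name, value):
        self.calls.append(("set", name, value))

    def speak(self, text):
        self.calls.append(("speak", text))


def render(pretreated, engine, handles_escape_codes):
    """Pass the text through, or translate generic codes into function calls."""
    if handles_escape_codes:
        engine.speak(pretreated)  # the engine interprets the codes itself
        return
    pos = 0
    for match in GENERIC_CODE.finditer(pretreated):
        chunk = pretreated[pos:match.start()].strip()
        if chunk:
            engine.speak(chunk)
        engine.set_parameter(match.group(1), int(match.group(2)))
        pos = match.end()
    tail = pretreated[pos:].strip()
    if tail:
        engine.speak(tail)


engine = ApiOnlyEngine()
render("Hello. <esc:pitch=+10>We won!", engine, handles_escape_codes=False)
```

After the call, `engine.calls` holds the speak and set operations in the order in which the text and codes appeared.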
As an example of a TTS which does not manage escape codes
and only offers an API, one can cite the TTS known as "Babel".
As an example of a TTS which does manage escape codes, but
also offers an API, one can cite the TTS known as "Scansoft".
The present invention is, of course, not limited to the preferred
embodiment described and represented herein; changes can be made or
equivalents used without departing from the scope of the invention.
Claims (10)
- Method for rendering in an audio form a text using a given Text To Speech (TTS) system, program or engine, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author and/or sender of said text, method characterised in that it consists in:
  - subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system (1),
  - subjecting said pretreated text to a second treatment by said TTS system (1) wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
  - generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
- Method according to claim 1, characterised in that the text consists of an e-mail or SMS message and in that the symbolic signs consist of smileys.
- Method according to any one of claims 1 or 2, characterised in that the processable codes are output configuring escape codes.
- Method according to any one of claims 1 to 3, characterised in that each symbolic sign inserted in the text to be rendered in audio form affects only the output voice for the word, sentence or expression which immediately precedes it, the corresponding escape code being put immediately in front of the corresponding translated audio word, sentence or expression.
- Method according to any one of claims 1 to 3, characterised in that each symbolic sign inserted in the text to be rendered in audio form affects the output voice for all the words, sentences and/or expressions which precede it up to a respective preceding symbolic sign, the beginning of the text or a predetermined text cutting sign.
- Method according to any one of claims 1 to 5, characterised in that it also comprises a previous step of building up a [symbolic signs / TTS processable codes] translation library, specifically adapted to the possibilities of the used TTS system, i.e. the plurality of codes it is able to process.
- Method according to any one of claims 1 to 6, characterised in that it further comprises the step of deleting or inhibiting, during the first treatment, each symbolic sign which cannot be transcribed into a configuration code processable by the TTS system (1).
- Method according to any one of claims 1 to 7, characterised in that the acoustic parameters comprise parameters selected from the group consisting of volume, rate, pitch, tone or analogous utterance or voice intonation characteristics.
- System for rendering in an audio form a text, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author or sender of said text, system (2) characterised in that it comprises:
  - a first treatment software module (3) able to transcribe or translate at least some of the symbolic signs present in said text into corresponding predetermined codes, said first module (3) possibly incorporating or being associated with a library;
  - a second treatment software module (1) in the form of a Text To Speech (TTS) program or engine able to translate the text as pretreated by the first module (3), namely the textual words, sentences or expressions, into audio sentences or expressions made of associated phonetic tokens and able to interpret the codes present in the pretreated text in order to adjust at least some acoustic parameters of at least one of said words, sentences or expressions;
  - voice or speech generating means (4) able to provide an output signal, whereby the features or properties of the outputting voice are set by the adjusted parameters.
- System according to claim 9, characterised in that it further comprises adapted means to implement the method according to claims 2 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03291765A EP1498872A1 (en) | 2003-07-16 | 2003-07-16 | Method and system for audio rendering of a text with emotional information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03291765A EP1498872A1 (en) | 2003-07-16 | 2003-07-16 | Method and system for audio rendering of a text with emotional information |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1498872A1 (en) | 2005-01-19 |
Family
ID=33462248
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP03291765A Withdrawn EP1498872A1 (en) | 2003-07-16 | 2003-07-16 | Method and system for audio rendering of a text with emotional information |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP1498872A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7983910B2 (en) | 2006-03-03 | 2011-07-19 | International Business Machines Corporation | Communicating across voice and text channels with emotion preservation |
CN102244788A (en) * | 2010-05-10 | 2011-11-16 | 索尼公司 | Information processing method, information processing device, scene metadata extraction device, loss recovery information generation device, and programs |
CN106294296A (en) * | 2016-08-16 | 2017-01-04 | 唐哲敏 | A kind of Word message conversation managing method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020191757A1 (en) * | 2001-06-04 | 2002-12-19 | Hewlett-Packard Company | Audio-form presentation of text messages |
US20030028380A1 (en) * | 2000-02-02 | 2003-02-06 | Freeland Warwick Peter | Speech system |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030028380A1 (en) * | 2000-02-02 | 2003-02-06 | Freeland Warwick Peter | Speech system |
US20020191757A1 (en) * | 2001-06-04 | 2002-12-19 | Hewlett-Packard Company | Audio-form presentation of text messages |
Non-Patent Citations (1)
Title |
---|
MARC SCHRÖDER: "Emotional Speech Synthesis: A Review", PROCEEDINGS EUROSPEECH 2001, vol. 1, Aalborg, pages 561 - 564, XP007005064 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7983910B2 (en) | 2006-03-03 | 2011-07-19 | International Business Machines Corporation | Communicating across voice and text channels with emotion preservation |
US8386265B2 (en) | 2006-03-03 | 2013-02-26 | International Business Machines Corporation | Language translation with emotion metadata |
CN102244788A (en) * | 2010-05-10 | 2011-11-16 | 索尼公司 | Information processing method, information processing device, scene metadata extraction device, loss recovery information generation device, and programs |
CN102244788B (en) * | 2010-05-10 | 2015-11-25 | 索尼公司 | Information processing method, information processor and loss recovery information generation device |
CN106294296A (en) * | 2016-08-16 | 2017-01-04 | 唐哲敏 | A kind of Word message conversation managing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
| | AX | Request for extension of the european patent | Extension state: AL LT LV MK |
| | AKX | Designation fees paid | |
| | REG | Reference to a national code | Ref country code: DE Ref legal event code: 8566 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 20050720 |