EP1498872A1 - Method and system for audio rendering of a text with emotional information - Google Patents

Method and system for audio rendering of a text with emotional information

Info

Publication number
EP1498872A1
Authority
EP
European Patent Office
Prior art keywords
text
codes
sentences
expressions
tts
Legal status
Withdrawn
Application number
EP03291765A
Other languages
German (de)
French (fr)
Inventor
Jean Luc Guevel
Current Assignee
Alcatel CIT SA
Alcatel Lucent SAS
Original Assignee
Alcatel CIT SA
Alcatel SA
Priority date
2003-07-16
Filing date
2003-07-16
Publication date
2005-01-19
Application filed by Alcatel CIT SA and Alcatel SA
Status: withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Abstract

The present invention concerns a method and a system for rendering in an audio form a text using a given Text To Speech (TTS) system, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author and/or sender of said text.
The method is characterised in that it consists in:
  • subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system (1),
  • subjecting said pretreated text to a second treatment by said TTS system (1) wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
  • generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.

Description

SPECIFICATION
The present invention is related to the field of the conversion of text data into voice or speech data, more particularly in connection with so-called Text To Speech systems (TTS).
The present invention concerns a method and a system for rendering a text in an audio form with a better expressiveness, based on the state of mind of the author at the time of producing said text.
It is nowadays quite common practice, with the increasing use of the Internet, e-mail and SMS (Short Message Service), to have text messages converted into voice messages by means of a TTS system.
Nevertheless, the receiver or listener of the message usually has no indication of the state of mind, mood or general condition of the sender, as the generated audio signal is "flat" and the output voice generally monotonous. This can obliterate a large part of the meaning and/or strength of the text or message.
It has been proposed, in EP-A-1 102 242, to personalise the features of the output voice. However, this personalisation is performed on the initiative and according to the wishes of the receiver or listener, and does not reflect the feelings, mood or state of mind of the writer or sender.
Furthermore, the writer of the text does not know that the text will be rendered in audio form, and is in any case not aware of the features that would allow him to personalise that rendering.
It is an aim of the present invention to overcome the aforementioned drawbacks and restrictions.
Therefore, the main object of the present invention is a method for rendering in an audio form a text using a given Text To Speech system, program or engine, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author and/or sender of said text, method characterised in that it consists in:
  • subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system,
  • subjecting said pretreated text to a second treatment by said TTS system wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
  • generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
The present invention will be better understood from the following description of additional features and advantages, and will now be described in more detail, by way of example, with reference to a non-limiting embodiment shown in the enclosed drawings, wherein:
  • Figure 1 is a schematic drawing of a system able to perform the method according to the invention, connected through a network 5 to a text message sender 6;
  • Figures 2 and 3 are self-explanatory schematic drawings of the first and second software modules forming part of the system for performing the inventive method, and
  • Figure 4 is a more detailed self-explanatory drawing of a possible embodiment of the text analysing and treating module forming part of the first software module of Figure 2.
  The inventive method mainly comprises the following steps:
    • subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system 1,
    • subjecting said pretreated text to a second treatment by said TTS system 1 wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
    • generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
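The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the smiley library, the `\emo=NAME\` escape-code syntax and the pitch and rate values are all hypothetical, and `generate` only describes the segments instead of producing audio.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical library (smiley -> emotion) and hypothetical acoustic settings
# per emotion; a real TTS engine defines its own escape codes and parameters.
LIBRARY = {":-)": "happy", ":-(": "sad"}
SETTINGS = {
    "neutral": {"pitch": 0, "rate": 0},
    "happy": {"pitch": 20, "rate": 10},
    "sad": {"pitch": -15, "rate": -20},
}
SIGN = re.compile("|".join(re.escape(s) for s in LIBRARY))
CODE = re.compile(r"\\emo=(\w+)\\")  # hypothetical escape-code syntax: \emo=NAME\


@dataclass
class AudioSegment:
    text: str   # stands in for the phonetic tokens of the segment
    pitch: int  # relative pitch change, in percent
    rate: int   # relative speaking-rate change, in percent


def first_treatment(text: str) -> str:
    """Step 1: replace each smiley by an escape code placed in front of the
    text it governs (for simplicity, the text since the previous smiley)."""
    parts, pos = [], 0
    for m in SIGN.finditer(text):
        segment = text[pos:m.start()].strip()
        parts.append(f"\\emo={LIBRARY[m.group()]}\\{segment}")
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(f"\\emo=neutral\\{tail}")
    return " ".join(parts)


def second_treatment(pretreated: str) -> List[AudioSegment]:
    """Step 2: interpret the codes; each code sets the parameters of the text
    that follows it, up to the next code."""
    pieces = CODE.split(pretreated)  # [before, name1, text1, name2, text2, ...]
    segments = []
    if pieces[0].strip():
        segments.append(AudioSegment(pieces[0].strip(), 0, 0))
    for name, chunk in zip(pieces[1::2], pieces[2::2]):
        if chunk.strip():
            s = SETTINGS.get(name, SETTINGS["neutral"])
            segments.append(AudioSegment(chunk.strip(), s["pitch"], s["rate"]))
    return segments


def generate(segments: List[AudioSegment]) -> List[str]:
    """Step 3: stand-in for the speech generator; describes what would be said."""
    return [f"[pitch {s.pitch:+d}%, rate {s.rate:+d}%] {s.text}" for s in segments]
```

For example, chaining `generate(second_treatment(first_treatment("I passed the exam :-) but I lost my keys :-(")))` yields a raised-pitch segment for the first clause and a lowered-pitch, slower segment for the second.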
    Text To Speech systems as such are well known and exist in several different versions, with no common standardised basis. In particular, the data configuring the output features, known as escape codes, are specific to each of them.
    According to a preferred embodiment of the invention, the text consists of an e-mail or SMS message, the symbolic signs consist of smileys and the processable codes are escape codes configuring the output of the TTS system 1 used.
    In line with the most common use of smileys, each symbolic sign inserted in the text to be rendered in audio form affects only the output voice for the word, sentence or expression that immediately precedes it, the corresponding escape code being placed immediately in front of the corresponding translated audio word, sentence or expression.
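The rule of the preferred embodiment, in which a smiley affects only what immediately precedes it, could look like the sketch below. It assumes that the governed unit is a sentence ending in `.`, `!` or `?`, and it uses a hypothetical library and escape-code syntax. Earlier sentences in the same stretch of text stay neutral.

```python
import re

# Hypothetical library: smiley -> escape code of an imaginary TTS engine.
LIBRARY = {":-)": "\\emo=happy\\ ", ";-)": "\\emo=wink\\ ", ":-(": "\\emo=sad\\ "}
SIGN = re.compile("(" + "|".join(re.escape(s) for s in LIBRARY) + ")")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def place_codes(text: str) -> str:
    """Each smiley affects only the sentence immediately before it: its code
    is moved in front of that sentence and the smiley itself is removed."""
    pieces = SIGN.split(text)  # text, sign, text, sign, ..., text
    out = []
    for i in range(0, len(pieces), 2):
        sentences = [s for s in SENTENCE_END.split(pieces[i].strip()) if s]
        sign = pieces[i + 1] if i + 1 < len(pieces) else None
        if sign and sentences:
            sentences[-1] = LIBRARY[sign] + sentences[-1]
        out.extend(sentences)
    return " ".join(out)
```

With this rule, in "I got the job. Starting Monday! :-)" only "Starting Monday!" receives the happy code.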
    As an alternative, it can also be provided that each symbolic sign inserted in the text to be rendered in audio form affects the output voice for all the words, sentences and/or expressions that precede it, back to the respective preceding symbolic sign, the beginning of the text or a predetermined text-cutting sign.
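The alternative rule can be sketched as follows, assuming (hypothetically) that a blank line serves as the predetermined text-cutting sign. Unlike a sentence-level rule, a smiley here colours every sentence back to the nearest boundary.

```python
import re
from typing import List, Tuple

LIBRARY = {":-)": "happy", ":-(": "sad"}  # hypothetical smiley -> emotion
CUT = "\n\n"  # hypothetical text-cutting sign: a blank line
TOKEN = re.compile("(" + "|".join(re.escape(s) for s in [*LIBRARY, CUT]) + ")")


def spans(text: str) -> List[Tuple[str, str]]:
    """Return (emotion, span) pairs. A smiley governs all the text back to the
    previous smiley, the start of the text or the previous cutting sign; text
    closed by a cutting sign (or the end) without a smiley stays neutral."""
    result, pending = [], ""
    for tok in TOKEN.split(text):  # alternates text / separator
        if tok in LIBRARY or tok == CUT:
            if pending:
                result.append((LIBRARY.get(tok, "neutral"), pending))
            pending = ""
        else:
            pending = tok.strip()
    if pending:
        result.append(("neutral", pending))
    return result
```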
    In order to allow a fast pretreatment of the text and to give the first treatment step a certain flexibility, the method according to the invention can also comprise a prior step of building up a [symbolic signs / TTS processable codes] translation library, specifically adapted to the capabilities of the TTS system used, i.e. the set of codes it is able to process.
    Said library can be integrated into or be separate from the pretreatment means; its content can be specifically adapted to the escape codes of the TTS system used and can evolve with the appearance of new symbolic signs and the disappearance of older ones that have become obsolete.
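A translation library tailored to one engine might be organised as below. The capability table `ENGINE_A`, its escape codes and the `TranslationLibrary` class are hypothetical. The point illustrated is that a sign is kept only when the engine can render its meaning, and that entries can be added or retired as usage evolves.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Engine-independent meaning of some signs from the list of smileys.
SIGN_MEANINGS = {":-)": "smile", ";-)": "wink", ":-(": "sad", ":-@": "screaming", ":-O": "surprised"}

# Hypothetical capabilities of one engine: meaning -> escape code it accepts.
ENGINE_A = {"smile": "\\pitch=+20\\", "sad": "\\pitch=-15\\", "screaming": "\\vol=+40\\"}


@dataclass
class TranslationLibrary:
    """[symbolic sign / TTS code] library tailored to one engine."""
    engine_codes: Dict[str, str]
    entries: Dict[str, str] = field(default_factory=dict)

    def build(self, meanings: Dict[str, str]) -> None:
        """Keep only the signs whose meaning this engine can render."""
        self.entries = {sign: self.engine_codes[m]
                        for sign, m in meanings.items() if m in self.engine_codes}

    def add_sign(self, sign: str, meaning: str) -> bool:
        """Register a newly appeared sign; False if the engine has no code for it."""
        if meaning not in self.engine_codes:
            return False
        self.entries[sign] = self.engine_codes[meaning]
        return True

    def retire_sign(self, sign: str) -> None:
        """Remove a sign that has become obsolete."""
        self.entries.pop(sign, None)

    def lookup(self, sign: str) -> Optional[str]:
        return self.entries.get(sign)
```

Keeping the engine's capabilities in a separate table means the same sign meanings can be reused when building a library for another engine.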
    A list of examples of signs in the form of smileys that could be translated into codes is given hereinafter:
    °:-) Angelic
    >:-( Angry
    |-I Asleep
    (::()::) Bandaid
    :-{} Blowing a Kiss
    \-o Bored
    :-c Bummed Out
    :() Cannot Stop Talking
    :~/ Confused
    :' Crying
    :'-) Crying with Joy
    :'-( Crying Sadly
    :-9 Delicious
    :P Disgusted
    :-6 Exhausted
    :(- Frown
    ^5 High five
    :-# Sealed Lips
    @>--,-- Rose FOR YOU
    :-@ Screaming
    :O Shocked
    :-) Smile
    :-O Surprised
    Λ Thumbs Up
    :-& Tongue Tied
    :-\ Undecided
    ;-) Wink
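When detecting these signs in a message, overlapping forms matter: `:'` (Crying) is a prefix of `:'-)` (Crying with Joy). A minimal sketch, using a subset of the list above, tries the longer signs first, because Python's `re` module takes the first alternative that matches at a given position rather than the longest one.

```python
import re
from typing import List, Tuple

# A subset of the smileys listed above, with their stated meanings.
SMILEYS = {
    ":-)": "Smile", ";-)": "Wink", ":-O": "Surprised", ">:-(": "Angry",
    ":'": "Crying", ":'-)": "Crying with Joy", ":'-(": "Crying Sadly",
    ":-@": "Screaming", ":-#": "Sealed Lips", ":-9": "Delicious",
    "\\-o": "Bored", ":-\\": "Undecided",
}

# Alternatives are tried left to right, so list longer signs first:
# otherwise ":'-)" would be read as ":'" (Crying) followed by "-)".
PATTERN = re.compile("|".join(re.escape(s) for s in sorted(SMILEYS, key=len, reverse=True)))


def find_signs(text: str) -> List[Tuple[int, str, str]]:
    """Return (position, sign, meaning) for each listed smiley found in text."""
    return [(m.start(), m.group(), SMILEYS[m.group()]) for m in PATTERN.finditer(text)]
```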
    Nevertheless, the text to be treated can also comprise symbolic signs that are not in said library.
    To avoid any misinterpretation of such unknown signs, the method can further comprise the step of deleting or inhibiting, during the first treatment, each symbolic sign which cannot be transcribed into a configuration code processable by the TTS system 1.
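A sketch of the deletion step follows. The source does not specify how a symbolic sign is recognised, so the heuristic used here (a token starting with `:`, `;`, `=` or `>` and containing no two consecutive letters) is an assumption, as are the library entries. For brevity the code is left where the smiley stood; moving it in front of the governed text is a separate concern.

```python
import re
from typing import List, Tuple

# Hypothetical library: sign -> escape code of an imaginary engine.
LIBRARY = {":-)": "\\emo=happy\\", ":-(": "\\emo=sad\\"}
# Heuristic (an assumption, not taken from the source): a whitespace-separated
# token that starts with ':', ';', '=' or '>' and contains no two letters in a
# row is treated as a symbolic sign.
LOOKS_LIKE_SIGN = re.compile(r"[:;=>]\S+")
TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def first_treatment(text: str) -> Tuple[str, List[str]]:
    """Translate known signs in place and delete unknown ones, returning the
    pretreated text and the list of deleted signs."""
    kept, dropped = [], []
    for token in text.split():
        if token in LIBRARY:
            kept.append(LIBRARY[token])
        elif LOOKS_LIKE_SIGN.fullmatch(token) and not TWO_LETTERS.search(token):
            dropped.append(token)  # unknown sign: delete instead of reading it out
        else:
            kept.append(token)
    return " ".join(kept), dropped
```

Here "10:30" and ":hello" survive as ordinary text, while an unknown ";-P" is removed rather than spelled out by the TTS.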
    Preferably, the adjustable or tunable acoustic parameters comprise parameters selected from the group consisting of volume, rate, pitch, tone or analogous utterance or voice intonation characteristics.
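As an illustration of parameter adjustment, the sketch below scales volume, rate and pitch by emotion-specific factors and clamps the result to the range the engine accepts. All factors and limits are hypothetical, and tone is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    volume: float  # 0.0 .. 1.0
    rate: float    # words per minute
    pitch: float   # average fundamental frequency, in Hz


# Hypothetical multiplicative adjustments per emotion (not from the source).
ADJUST = {
    "happy": {"volume": 1.1, "rate": 1.10, "pitch": 1.15},
    "sad": {"volume": 0.8, "rate": 0.85, "pitch": 0.90},
    "angry": {"volume": 1.3, "rate": 1.05, "pitch": 1.05},
}
# Hypothetical range each parameter is allowed to take in the engine.
LIMITS = {"volume": (0.0, 1.0), "rate": (80.0, 300.0), "pitch": (60.0, 400.0)}


def adjust(base: Voice, emotion: str) -> Voice:
    """Scale each parameter for the emotion and clamp it to the engine range;
    an unknown emotion leaves the voice unchanged (apart from clamping)."""
    factors = ADJUST.get(emotion, {})
    values = {}
    for name, (low, high) in LIMITS.items():
        value = getattr(base, name) * factors.get(name, 1.0)
        values[name] = min(max(value, low), high)
    return Voice(**values)
```

Relative factors keep each sender's emotions tied to the listener's chosen base voice; clamping prevents a strong emotion from pushing a parameter outside what the engine supports.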
    As schematically shown in the enclosed Figure 1, the present invention also concerns a system 1 for rendering a text in audio form, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author or sender of said text (system 2).
    Said system 1 is characterised in that it comprises:
    • a first treatment software module 3 able to transcribe or translate at least some of the symbolic signs present in said text into corresponding predetermined codes, said first module 3 possibly incorporating or being associated with a library;
    • a second treatment software module 1 in the form of a Text To Speech (TTS) program or engine able to translate the text as pretreated by the first module 3, namely the textual words, sentences or expressions, into audio sentences or expressions made of associated phonetic tokens and able to interpret the codes present in the pretreated text in order to adjust at least some acoustic parameters of at least one of said words, sentences or expressions;
    • voice or speech generating means 4 able to provide an output signal, whereby the features or properties of the outputting voice are set by the adjusted parameters.
    Of course, said system 1 will further comprise adapted means 9 to implement the method as described hereinbefore.
    The first software module may or may not be integrated into the TTS program.
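The division into modules 3, 1 and 4, with the first module either separate from or integrated into the TTS program, can be mirrored in code as below. The class names, the `[mood]` code syntax and the single pretreatment rule are hypothetical simplifications.

```python
import re
from typing import Callable, List, Optional, Tuple

Pretreat = Callable[[str], str]
Segment = Tuple[str, str]  # (mood, text); text stands in for phonetic tokens
CODE = re.compile(r"\[(\w+)\]")  # hypothetical code: "[mood]" sets what follows


def smiley_pretreatment(text: str) -> str:
    """First treatment module (3), reduced to one rule: a trailing smiley
    becomes a code in front of the whole message."""
    stripped = text.rstrip()
    for sign, mood in {":-)": "happy", ":-(": "sad"}.items():
        if stripped.endswith(sign):
            return f"[{mood}]" + stripped[: -len(sign)].rstrip()
    return text


class TTSModule:
    """Second treatment module (1). Passing `builtin` integrates the first
    module into the engine; otherwise the caller pretreats the text."""

    def __init__(self, builtin: Optional[Pretreat] = None) -> None:
        self.builtin = builtin

    def treat(self, text: str) -> List[Segment]:
        if self.builtin is not None:
            text = self.builtin(text)
        parts = CODE.split(text)  # [text0, mood1, text1, mood2, text2, ...]
        segments = [("neutral", parts[0].strip())] if parts[0].strip() else []
        segments += [(m, t.strip()) for m, t in zip(parts[1::2], parts[2::2]) if t.strip()]
        return segments


class SpeechGenerator:
    """Voice generating means (4); here it only describes what would be spoken."""

    def output(self, segments: List[Segment]) -> List[str]:
        return [f"({mood}) {text}" for mood, text in segments]


class RenderingSystem:
    """Wires the modules together; `pretreatment` is set only when module 3
    is separate from the TTS program."""

    def __init__(self, tts: TTSModule, generator: SpeechGenerator,
                 pretreatment: Optional[Pretreat] = None) -> None:
        self.tts, self.generator, self.pretreatment = tts, generator, pretreatment

    def render(self, text: str) -> List[str]:
        if self.pretreatment is not None:
            text = self.pretreatment(text)
        return self.generator.output(self.tts.treat(text))
```

Both wirings give the same result: `RenderingSystem(TTSModule(), SpeechGenerator(), smiley_pretreatment)` and `RenderingSystem(TTSModule(builtin=smiley_pretreatment), SpeechGenerator())` each render "See you tomorrow :-)" as `["(happy) See you tomorrow"]`.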
    Figures 2 to 4 of the enclosed drawings show, in the form of flow charts, possible structures of software modules that could be used to perform the inventive method.
    It should be noted that the escape codes can be specific codes proposed by a given TTS (such as "Realspeak"), or generic codes belonging to the hardware provider.
    Depending on the capabilities and properties of the TTS used, the text to be processed is either passed directly to the TTS with escape codes it understands, or a dedicated software module analyses the generic escape codes and calls the configuration functions (for example, C-language functions) provided by the target TTS.
    As an example of a TTS that does not handle escape codes and only offers an API, one can cite the TTS known as "Babel".
    As an example of a TTS that handles escape codes but also offers an API, one can cite the TTS known as "Scansoft".
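The two integration paths described above can be sketched as follows. `EscapeCodeEngine` and `ApiOnlyEngine` are stand-ins, not the interfaces of the Realspeak, Babel or Scansoft products, and both the generic `{name=value}` code and the engine-specific `\name=value\` form are hypothetical.

```python
import re
from typing import List

# Hypothetical generic escape code, independent of any engine: {pitch=+20}
GENERIC = re.compile(r"\{(\w+)=([+-]?\d+)\}")


class EscapeCodeEngine:
    """Stand-in for an engine that reads escape codes embedded in the text."""
    accepts_escape_codes = True

    def __init__(self) -> None:
        self.received: List[str] = []

    def speak(self, text: str) -> None:
        self.received.append(text)


class ApiOnlyEngine:
    """Stand-in for an engine that is configured only through API calls."""
    accepts_escape_codes = False

    def __init__(self) -> None:
        self.calls: List[str] = []

    def set_param(self, name: str, value: int) -> None:
        self.calls.append(f"set_param({name}, {value})")

    def speak(self, text: str) -> None:
        self.calls.append(f"speak({text!r})")


def to_engine_syntax(text: str) -> str:
    """Rewrite generic codes into the engine's own (hypothetical) \\name=value\\ form."""
    return GENERIC.sub(lambda m: f"\\{m.group(1)}={m.group(2)}\\", text)


def dispatch(text: str, engine) -> None:
    if engine.accepts_escape_codes:
        engine.speak(to_engine_syntax(text))  # pass the text straight through
        return
    pos = 0  # API path: analyse the generic codes and call the setters
    for m in GENERIC.finditer(text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            engine.speak(chunk)
        engine.set_param(m.group(1), int(m.group(2)))
        pos = m.end()
    if text[pos:].strip():
        engine.speak(text[pos:].strip())
```

Keeping the codes generic until dispatch lets the same pretreated text drive either kind of engine.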
    The present invention is, of course, not limited to the preferred embodiment described and represented herein; changes can be made or equivalents used without departing from the scope of the invention.

    Claims (10)

    1. Method for rendering in an audio form a text using a given Text To Speech (TTS) system, program or engine, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author and/or sender of said text, method characterised in that it consists in:
      subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system (1),
      subjecting said pretreated text to a second treatment by said TTS system (1) wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
      generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
    2. Method according to claim 1, characterised in that the text consists of an e-mail or SMS message and in that the symbolic signs consist of smileys.
    3. Method according to claim 1 or 2, characterised in that the processable codes are output configuring escape codes.
    4. Method according to any one of claims 1 to 3, characterised in that each symbolic sign inserted in the text to be rendered in audio form affects only the output voice for the word, sentence or expression which immediately precedes it, the corresponding escape code being put immediately in front of the corresponding translated audio word, sentence or expression.
    5. Method according to any one of claims 1 to 3, characterised in that each symbolic sign inserted in the text to be rendered in audio form affects the output voice for all the words, sentences and/or expressions which precede it up to a respective preceding symbolic sign, the beginning of the text or a predetermined text cutting sign.
    6. Method according to any one of claims 1 to 5, characterised in that it also comprises a previous step of building up a [symbolic signs / TTS processable codes] translation library, specifically adapted to the possibilities of the used TTS system, i.e. the plurality of codes it is able to process.
    7. Method according to any one of claims 1 to 6, characterised in that it further comprises the step of deleting or inhibiting, during the first treatment, each symbolic sign which cannot be transcribed into a configuration code processable by the TTS system (1).
    8. Method according to any one of claims 1 to 7, characterised in that the acoustic parameters comprise parameters selected from the group consisting of volume, rate, pitch, tone or analogous utterance or voice intonation characteristics.
    9. System for rendering in an audio form a text, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author or sender of said text, system (2) characterised in that it comprises:
      a first treatment software module (3) able to transcribe or translate at least some of the symbolic signs present in said text into corresponding predetermined codes, said first module (3) possibly incorporating or being associated with a library;
      a second treatment software module (1) in the form of a Text To Speech (TTS) program or engine able to translate the text as pretreated by the first module (3), namely the textual words, sentences or expressions, into audio sentences or expressions made of associated phonetic tokens and able to interpret the codes present in the pretreated text in order to adjust at least some acoustic parameters of at least one of said words, sentences or expressions;
      voice or speech generating means (4) able to provide an output signal, whereby the features or properties of the outputting voice are set by the adjusted parameters.
    10. System according to claim 9, characterised in that it further comprises adapted means to implement the method according to claims 2 to 8.

    Priority Applications (1)

    Application Number Priority Date Filing Date Title
    EP03291765A EP1498872A1 (en) 2003-07-16 2003-07-16 Method and system for audio rendering of a text with emotional information

    Applications Claiming Priority (1)

    Application Number Priority Date Filing Date Title
    EP03291765A EP1498872A1 (en) 2003-07-16 2003-07-16 Method and system for audio rendering of a text with emotional information

    Publications (1)

    Publication Number Publication Date
    EP1498872A1 2005-01-19

    Family

    ID=33462248

    Family Applications (1)

    Application Number Title Priority Date Filing Date
    EP03291765A Withdrawn EP1498872A1 (en) 2003-07-16 2003-07-16 Method and system for audio rendering of a text with emotional information

    Country Status (1)

    Country Link
    EP (1) EP1498872A1 (en)

    Citations (2)

    * Cited by examiner, † Cited by third party
    Publication number Priority date Publication date Assignee Title
    US20020191757A1 (en) * 2001-06-04 2002-12-19 Hewlett-Packard Company Audio-form presentation of text messages
    US20030028380A1 (en) * 2000-02-02 2003-02-06 Freeland Warwick Peter Speech system

    Non-Patent Citations (1)

    * Cited by examiner, † Cited by third party
    Title
    MARC SCHRÖDER: "Emotional Speech Synthesis: A Review", PROCEEDINGS EUROSPEECH 2001, vol. 1, Aalborg, pages 561 - 564, XP007005064 *

    Cited By (5)

    * Cited by examiner, † Cited by third party
    Publication number Priority date Publication date Assignee Title
    US7983910B2 (en) 2006-03-03 2011-07-19 International Business Machines Corporation Communicating across voice and text channels with emotion preservation
    US8386265B2 (en) 2006-03-03 2013-02-26 International Business Machines Corporation Language translation with emotion metadata
    CN102244788A (en) * 2010-05-10 2011-11-16 索尼公司 Information processing method, information processing device, scene metadata extraction device, loss recovery information generation device, and programs
    CN102244788B (en) * 2010-05-10 2015-11-25 索尼公司 Information processing method, information processor and loss recovery information generation device
    CN106294296A (en) * 2016-08-16 2017-01-04 唐哲敏 A kind of Word message conversation managing method

    Legal Events

    Date Code Title Description
    PUAI Public reference made under article 153(3) EPC to a published international application that has entered the European phase

    Free format text: ORIGINAL CODE: 0009012

    AK Designated contracting states

    Kind code of ref document: A1

    Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

    AX Request for extension of the european patent

    Extension state: AL LT LV MK

    AKX Designation fees paid
    REG Reference to a national code

    Ref country code: DE

    Ref legal event code: 8566

    STAA Information on the status of an ep patent application or granted ep patent

    Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

    18D Application deemed to be withdrawn

    Effective date: 20050720