EP1498872A1 - Method and system for audio rendering of a text with emotional information

Info

Publication number: EP1498872A1
Application number: EP03291765A
Authority: EP
Inventors: Jean Luc Guevel
Original assignee: Alcatel CIT SA; Alcatel SA
Current assignee: Alcatel CIT SA; Alcatel Lucent SAS
Priority date: 2003-07-16
Filing date: 2003-07-16
Publication date: 2005-01-19

Abstract

The present invention concerns a method and a system for rendering in an audio form a text using a given Text To Speech (TTS) system, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author and/or sender of said text.

Method characterised in that it consists in:

subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system (1),
subjecting said pretreated text to a second treatment by said TTS system (1) wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.

Description

SPECIFICATION

The present invention is related to the field of the conversion of text data into voice or speech data, more particularly in connection with so-called Text To Speech systems (TTS).

The present invention concerns a method and a system for rendering a text in an audio form with a better expressiveness, based on the state of mind of the author at the time of producing said text.

It is nowadays quite common practice, with the increased use of the Internet, e-mails and SMS (Short Message Service), to have text messages converted or translated into voice messages by means of a TTS system.

Nevertheless, the receiver or listener of the message usually has no indication of the state of mind, mood or general state of the sender of the message, as the generated audio signal is "flat" and the output voice generally monotone. This can obliterate a great part of the meaning and/or of the strength of the text or message.

It has been proposed, in EP-A-1 102 242, to personalise the features of the outputting voice. But said personalisation is performed upon the initiative and according to the desires of the receiver or listener, and does not reflect the feelings, the mood or the state of mind of the writer or sender.

Furthermore, the writer of the text does not know that his text will be rendered in an audio form, and he is in any case not aware of the features which would allow said rendering to be personalised.

It is an aim of the present invention to overcome the aforementioned drawbacks and restrictions.

Therefore, the main object of the present invention is a method for rendering in an audio form a text using a given Text To Speech system, program or engine, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author and/or sender of said text, method characterised in that it consists in:

subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system,
subjecting said pretreated text to a second treatment by said TTS system wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.

The present invention will be better understood thanks to the following description of additional features and advantages, and will now be described in more detail, by way of example, in relation to a non-limiting embodiment shown in the enclosed drawings, wherein:

Figure 1 is a schematic drawing of a system able to perform the method according to the invention, and connected through a network 5 to a text message sender 6;

Figures 2 and 3 are self-explanatory schematic drawings of the first and second software modules being part of the system for performing the inventive method, and

Figure 4 is a more detailed self-explanatory drawing of a possible embodiment of the text analysing and treating module being part of the first software module of figure 2.

The inventive method mainly comprises the following steps:

subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system 1,
subjecting said pretreated text to a second treatment by said TTS system 1 wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and
generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.
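By way of non-limiting illustration, the three steps listed above can be sketched as follows. This is only a minimal sketch under stated assumptions: the codes `<happy>` and `<sad>`, the numeric factors and the function names are hypothetical and do not belong to any actual TTS system, and whitespace-separated words stand in for phonetic tokens.

```python
import re

# Hypothetical tables: real escape codes are specific to each TTS engine.
SMILEY_TO_CODE = {":-)": "<happy>", ":-(": "<sad>"}
CODE_TO_PARAMS = {
    "<happy>": {"pitch": 1.2},
    "<sad>": {"pitch": 0.8, "rate": 0.9},
}
DEFAULT_PARAMS = {"volume": 1.0, "rate": 1.0, "pitch": 1.0}


def first_treatment(text):
    """Step 1: transcribe known symbolic signs into TTS-processable codes."""
    for smiley, code in SMILEY_TO_CODE.items():
        text = text.replace(smiley, code)
    return text


def second_treatment(pretreated):
    """Step 2: split the text into segments and let each code adjust the
    acoustic parameters of the segment that precedes it."""
    pattern = "(" + "|".join(re.escape(c) for c in CODE_TO_PARAMS) + ")"
    segments = []
    for part in re.split(pattern, pretreated):
        if part in CODE_TO_PARAMS:
            if segments:
                segments[-1][1].update(CODE_TO_PARAMS[part])
        elif part.strip():
            # Whitespace-split words stand in for phonetic tokens.
            segments.append((part.split(), dict(DEFAULT_PARAMS)))
    return segments


def generate_output(segments):
    """Step 3: stand-in for audio synthesis; returns a textual description."""
    return [f"{' '.join(words)} @ {params}" for words, params in segments]
```

In this sketch, `generate_output(second_treatment(first_treatment("Great news :-) See you")))` describes two segments, only the first of which has a raised pitch.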

Text To Speech systems as such are well known and exist in several different versions, with no common standardised base. In particular, the data used to configure the output features, known as escape codes, are specific to each system.

According to a preferred embodiment of the invention, the text consists of an e-mail or SMS message, the symbolic signs consist of smileys and the processable codes are output configuring escape codes belonging to the used TTS system 1.

In line with the most common usage of smileys, each symbolic sign inserted in the text to be rendered in audio form affects only the output voice for the word, sentence or expression which immediately precedes it, the corresponding escape code being put immediately in front of the corresponding translated audio word, sentence or expression.
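The placement rule described in the preceding paragraph can be illustrated by the following sketch, which removes each smiley and puts its code in front of the sentence immediately preceding it. The codes, the sentence-boundary rule (a `.`, `!` or `?` followed by whitespace) and the function name are assumptions made for illustration only.

```python
import re

# Hypothetical smiley-to-code table; real codes depend on the TTS used.
SMILEY_TO_CODE = {":-)": "<happy>", ":-(": "<sad>"}

# Longer smileys first so that a longer sign is never matched as a shorter one.
SMILEY_RE = re.compile(
    "|".join(re.escape(s) for s in sorted(SMILEY_TO_CODE, key=len, reverse=True))
)
# Assumed sentence boundary: terminal punctuation followed by whitespace.
SENTENCE_END_RE = re.compile(r"[.!?]\s+")


def place_codes_before_sentence(text):
    """Move each code in front of the sentence that precedes its smiley."""
    out = []
    pos = 0
    for match in SMILEY_RE.finditer(text):
        chunk = text[pos:match.start()]
        # Boundaries are searched in the chunk without trailing whitespace,
        # so the punctuation closing the target sentence is not a boundary.
        ends = [m.end() for m in SENTENCE_END_RE.finditer(chunk.rstrip())]
        split = ends[-1] if ends else 0
        while split < len(chunk) and chunk[split].isspace():
            split += 1
        out.append(chunk[:split] + SMILEY_TO_CODE[match.group()] + chunk[split:])
        pos = match.end()
    out.append(text[pos:])
    # Collapse the whitespace left behind by the removed smileys.
    return " ".join("".join(out).split())
```

For example, the input `Hello there. I passed my exam :-) Bye.` yields `Hello there. <happy>I passed my exam Bye.` in this sketch.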

But, as an example of an alternative solution, it can also be provided that each symbolic sign inserted in the text to be rendered in audio form affects the output voice for all the words, sentences and/or expressions which precede it, up to a respective preceding symbolic sign, the beginning of the text or a predetermined text cutting sign.
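This alternative scope rule can be sketched as follows: the text is cut at every smiley and at every cutting sign, and each span receives the code of the smiley that closes it, if any. The newline used as cutting sign, the codes and the function name are hypothetical choices made for this sketch.

```python
import re

# Hypothetical table and cutting sign, chosen only for illustration.
SMILEY_TO_CODE = {":-)": "<happy>", ":-(": "<sad>"}
CUT_SIGN = "\n"

_DELIMITERS = re.compile(
    "(" + "|".join(re.escape(d) for d in list(SMILEY_TO_CODE) + [CUT_SIGN]) + ")"
)


def scope_spans(text):
    """Return (span, code) pairs; code is None for spans not closed by a smiley."""
    spans = []
    pending = ""
    for part in _DELIMITERS.split(text):
        if part in SMILEY_TO_CODE:
            # The smiley affects everything accumulated since the last delimiter.
            if pending.strip():
                spans.append((pending.strip(), SMILEY_TO_CODE[part]))
            pending = ""
        elif part == CUT_SIGN:
            # A cutting sign closes the span without attaching any code.
            if pending.strip():
                spans.append((pending.strip(), None))
            pending = ""
        else:
            pending += part
    if pending.strip():
        spans.append((pending.strip(), None))
    return spans
```

With the input `"Intro line.\nI won. It was close :-) Later."`, both sentences of the second line fall within the scope of the smiley, whereas the first line and the trailing text remain neutral.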

In order to allow a fast pretreatment of the text and to confer a certain flexibility to said first treatment step, the method according to the invention can also comprise a previous step of building up a [symbolic signs / TTS processable codes ] translation library, specifically adapted to the possibilities of the used TTS system, i.e. the plurality of codes it is able to process.

Said library can be integrated into or be separate from the pretreatment means, and its content can be specifically adapted to the escape codes of the used TTS system and evolve with the appearance of new symbolic signs and the disappearance of older ones which have become obsolete.
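A possible way of building such a library is sketched below: a general table of signs and their meanings is filtered against the set of codes that a given TTS system is able to process. All names, meanings and code strings are hypothetical and serve only to illustrate the principle.

```python
# Hypothetical general table: symbolic sign -> meaning label.
SIGN_MEANINGS = {
    ":-)": "smile",
    ">:-(": "angry",
    ":'-(": "crying_sadly",
    ";-)": "wink",
}


def build_library(sign_meanings, tts_codes):
    """Keep only the signs whose meaning the target TTS can express."""
    return {
        sign: tts_codes[meaning]
        for sign, meaning in sign_meanings.items()
        if meaning in tts_codes
    }


# Hypothetical capabilities of one TTS engine: meaning label -> escape code.
ENGINE_CODES = {"smile": "<happy>", "angry": "<angry>", "crying_sadly": "<sad>"}

library = build_library(SIGN_MEANINGS, ENGINE_CODES)

# The library can evolve: a new sign is added, an obsolete one removed.
library[":-D"] = "<happy>"
library.pop(":'-(", None)
```

Because the library is a plain mapping in this sketch, it can be kept inside the pretreatment module or stored separately and loaded at start-up.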

A list of examples of signs in the form of smileys, which could be translated into codes, is given herein after:

°:-)	Angelic
>:-(	Angry
|-I	Asleep
(:: () : : )	Bandaid
:-{}	Blowing a Kiss
\-o	Bored
:-c	Bummed Out
: ( )	Cannot Stop Talking
:~ /	Confused
:'	Crying
:'-)	Crying with Joy
:'-(	Crying Sadly
:-9	Delicious
:P	Disgusted
:-6	Exhausted
: (-	Frown
^5	High five
:-#	Sealed Lips
@>- -, --	Rose FOR YOU
:-@	Screaming
: O	Shocked
:-)	Smile
:-O	Surprised
Λ	Thumbs Up
:-&	Tongue Tied
:-\	Undecided
;-)	Wink

Nevertheless, the text to be treated can also comprise symbolic signs which are not in said library.

To avoid any misinterpretation of such unknown signs, the method can further comprise the step of deleting or inhibiting, during the first treatment, each symbolic sign which cannot be transcribed into a configuration code processable by the TTS system 1.
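The deletion of unknown signs can be sketched as follows. The sketch assumes that signs are separated from words by whitespace and uses a deliberately simple, hypothetical pattern to recognise smiley-like tokens; a real implementation would need a more complete detection rule.

```python
import re

# Hypothetical library of signs that the TTS can process.
LIBRARY = {":-)": "<happy>", ":-(": "<sad>"}

# Assumed shape of a smiley-like token: optional hat, eyes, optional nose, mouth.
SMILEY_LIKE = re.compile(r"[>°]?[:;=]['\-~]{0,2}[()\[\]{}DPpOo@#&\\/|]")


def pretreat(text):
    """Translate known signs and delete smiley-like tokens absent from the library."""
    kept = []
    for token in text.split():
        if token in LIBRARY:
            kept.append(LIBRARY[token])
        elif SMILEY_LIKE.fullmatch(token):
            continue  # unknown symbolic sign: deleted so the TTS never reads it
        else:
            kept.append(token)
    return " ".join(kept)
```

Here `pretreat("Nice :-) but odd :-@ ok")` gives `Nice <happy> but odd ok`: the sign `:-@` is not in the library and is therefore dropped instead of being spelled out by the TTS.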

Preferably, the adjustable or tunable acoustic parameters may comprise parameters selected from the group consisting of volume, rate, pitch, tone or analogous utterance or voice intonation characteristics.
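As a purely illustrative sketch, the adjustable parameters can be modelled as a small record that each code scales by a factor. The parameter set shown (volume, rate, pitch), the codes and the numeric factors are assumptions and are not taken from any actual TTS system.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceParams:
    """Relative acoustic parameters; 1.0 means the engine default."""
    volume: float = 1.0
    rate: float = 1.0
    pitch: float = 1.0


# Hypothetical multiplicative adjustments attached to each code.
ADJUSTMENTS = {
    "<angry>": {"volume": 1.3, "rate": 1.1},
    "<sad>": {"pitch": 0.85, "rate": 0.9},
}


def adjust(params, code):
    """Return a new parameter set scaled by the factors of the given code."""
    factors = ADJUSTMENTS.get(code, {})
    scaled = {name: getattr(params, name) * f for name, f in factors.items()}
    return replace(params, **scaled)
```

An unknown code leaves the parameters unchanged in this sketch, which keeps the output voice neutral rather than failing.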

As schematically shown in the enclosed figure 1, the present invention also concerns a system 2 for rendering in an audio form a text, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author or sender of said text.

Said system 2 is characterised in that it comprises:

a first treatment software module 3 able to transcribe or translate at least some of the symbolic signs present in said text into corresponding predetermined codes, said first module 3 possibly incorporating or being associated with a library;
a second treatment software module 1 in the form of a Text To Speech TTS program or engine able to translate the text as pretreated by the first module 3, namely the textual words, sentences or expressions, into audio sentences or expressions made of associated phonetic tokens and able to interpret the codes present in the pretreated text in order to adjust at least some acoustic parameters of at least one of said words, sentences or expressions;
voice or speech generating means 4 able to provide an output signal, whereby the features or properties of the outputting voice are set by the adjusted parameters.

Of course, said system 2 will further comprise adapted means to implement the method as described hereinbefore.

The first software module may or may not be integrated into the TTS program.

Figures 2 to 4 of the enclosed drawings show, in the form of flow charts, possible structures of software modules which could be used to perform the inventive method.

It should be noted that the escape codes can be specific codes proposed by a given TTS (such as "Realspeak" for example), or can be generic codes belonging to the hardware provider.

According to the possibilities and properties of the used TTS, the text to be processed is either passed over directly to the TTS with understandable escape codes, or an adapted particular software module analyses the generic escape codes and calls the configuring functions (for example in the C language) provided by the target TTS.
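The two paths described in the preceding paragraph can be sketched as follows. Both engine classes are hypothetical stand-ins that merely record the calls they receive, and the generic code syntax `<pitch=1.2>` is an assumption made for this sketch; none of this reflects the actual interface of any named TTS product.

```python
import re

# Assumed syntax of a generic escape code, for illustration only.
GENERIC_CODE_RE = re.compile(r"<(pitch|rate)=([0-9.]+)>")


class EscapeCodeTTS:
    """Stand-in for an engine that interprets inline escape codes itself."""
    accepts_escape_codes = True

    def __init__(self):
        self.calls = []

    def speak(self, text):
        self.calls.append(("speak", text))


class ApiOnlyTTS:
    """Stand-in for an engine that only exposes configuring functions."""
    accepts_escape_codes = False

    def __init__(self):
        self.calls = []

    def set_pitch(self, value):
        self.calls.append(("set_pitch", value))

    def set_rate(self, value):
        self.calls.append(("set_rate", value))

    def speak(self, text):
        self.calls.append(("speak", text))


def render(tts, pretreated):
    """Pass the text through unchanged, or translate codes into API calls."""
    if tts.accepts_escape_codes:
        tts.speak(pretreated)
        return
    pos = 0
    for match in GENERIC_CODE_RE.finditer(pretreated):
        chunk = pretreated[pos:match.start()].strip()
        if chunk:
            tts.speak(chunk)  # text spoken with the settings in force so far
        getattr(tts, "set_" + match.group(1))(float(match.group(2)))
        pos = match.end()
    tail = pretreated[pos:].strip()
    if tail:
        tts.speak(tail)
```

With an `ApiOnlyTTS` instance, rendering `<pitch=1.2>Great news <rate=0.9>Sad news` produces the call sequence set_pitch, speak, set_rate, speak, whereas an `EscapeCodeTTS` instance receives the pretreated text in a single call.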

As an example of a TTS which does not manage escape codes and only offers an API, one can cite the TTS known as "Babel".

As an example of a TTS which does manage escape codes, but also offers an API, one can cite the TTS known as "Scansoft".

The present invention is, of course, not limited to the preferred embodiment described and represented herein; changes can be made or equivalents used without departing from the scope of the invention.

Claims

1. Method for rendering in an audio form a text using a given Text To Speech (TTS) system, program or engine, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author and/or sender of said text, method characterised in that it consists in:

subjecting said text to a first treatment wherein at least some of the symbolic signs present in said text are transcribed or translated into corresponding codes processable by the concerned TTS system (1),

subjecting said pretreated text to a second treatment by said TTS system (1) wherein the textual words, sentences or expressions are translated into audio words, sentences or expressions made of associated phonetic tokens and wherein said codes are interpreted in order to adjust at least one acoustic parameter of at least one of said words, sentences or expressions, and

generating, immediately or with a delay, an output acoustic signal, with the features and properties of the outputting voice being set by the adjusted parameters.

2. Method according to claim 1, characterised in that the text consists of an e-mail or SMS message and in that the symbolic signs consist of smileys.

3. Method according to any one of claims 1 or 2, characterised in that the processable codes are output configuring escape codes.

4. Method according to any one of claims 1 to 3, characterised in that each symbolic sign inserted in the text to be rendered in audio form affects only the output voice for the word, sentence or expression which immediately precedes it, the corresponding escape code being put immediately in front of the corresponding translated audio word, sentence or expression.

5. Method according to any one of claims 1 to 3, characterised in that each symbolic sign inserted in the text to be rendered in audio form affects the output voice for all the words, sentences and/or expressions which precede it up to a respective preceding symbolic sign, the beginning of the text or a predetermined text cutting sign.

6. Method according to any one of claims 1 to 5, characterised in that it also comprises a previous step of building up a [symbolic signs / TTS processable codes] translation library, specifically adapted to the possibilities of the used TTS system, i.e. the plurality of codes it is able to process.

7. Method according to any one of claims 1 to 6, characterised in that it further comprises the step of deleting or inhibiting, during the first treatment, each symbolic sign which cannot be transcribed into a configuration code processable by the TTS system (1).

8. Method according to any one of claims 1 to 7, characterised in that the acoustic parameters comprise parameters selected from the group consisting of volume, rate, pitch, tone or analogous utterance or voice intonation characteristics.

9. System for rendering in an audio form a text, said text including symbolic signs corresponding to the mental, emotional and/or physical state or state of mind of the author or sender of said text, system (2) characterised in that it comprises:

a first treatment software module (3) able to transcribe or translate at least some of the symbolic signs present in said text into corresponding predetermined codes, said first module (3) possibly incorporating or being associated with a library;

a second treatment software module (1) in the form of a Text To Speech (TTS) program or engine able to translate the text as pretreated by the first module (3), namely the textual words, sentences or expressions, into audio sentences or expressions made of associated phonetic tokens and able to interpret the codes present in the pretreated text in order to adjust at least some acoustic parameters of at least one of said words, sentences or expressions;

voice or speech generating means (4) able to provide an output signal, whereby the features or properties of the outputting voice are set by the adjusted parameters.

10. System according to claim 9, characterised in that it further comprises adapted means to implement the method according to claims 2 to 8.