WO2008053204A1 - Speech communication method and apparatus - Google Patents

Info

Publication number
WO2008053204A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
user
style
speech
messages
Prior art date
Application number
PCT/GB2007/004146
Other languages
French (fr)
Inventor
Hugh Brogan
Original Assignee
Stars2U Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0621547A (GB2443468A)
Priority claimed from GB0706505A (GB0706505D0)
Application filed by Stars2U Limited
Publication of WO2008053204A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06 Message adaptation to terminal or network requirements
    • H04L51/066 Format adaptation, e.g. format conversion or compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/58 Message adaptation for wireless communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/18 Information format or content conversion, e.g. adaptation by the network of the transmitted or received information for the purpose of wireless delivery to users or terminals

Definitions

  • MMSC Multimedia Message Switching Centre
  • the invention may be implemented as an email terminal which may take the form of a pc or handheld device including a telephone terminal.
  • the invention can equally well be implemented in any terminal for receiving messages so as to deliver these in a selected style whether audio or video or a combination of these.
  • the terminal incorporates a speech generation system, which may be in the form of an applet, to convert incoming messages into the selected style.
  • the terminal may be used in an SMS/MMS system or instant messaging system or email system.

Abstract

A user selects a style of message to be delivered and conversion apparatus (1) converts the message to a modified speech message with reference to the style selected by the user. In a text messaging system, text messages (for example, SMS messages) are converted to speech messages in a style of speech selected by the user for onward delivery to the addressee. In an email system, email text is converted to a speech message for delivery with the email to the addressee. Speech styles include personalities and characters and may involve the addition of content to the original text message to emulate the persona of a character. An application server (1) to provide this service may be provided in a standards-based communications system. Users can subscribe to the service so that text messages they send or receive are directed to the application server and converted to a modified message in a speech/character style selected by them. Games terminals may also incorporate a speech generation system to allow a user to select the style of voice of a game character.

Description

Speech Communication Method and Apparatus
Technical Field
This invention relates to a communication method and apparatus for delivering a speech message.
Disclosure of the Invention
According to one aspect, the invention consists in delivering a message to an addressee characterised in that a user selects a style for the message to be delivered, conversion apparatus converts the message to a modified speech message with reference to the style selected by the user, and the modified message is delivered to the addressee.
The message may originate as a verbal message (text or voice) and the modified message be delivered in a vocal style selected by the user. For example, the message may be delivered in the vocal style of a personality, such as a well known person or fictional character or in a generic or personal vocal style or may even be sung in a melody selected by the user. The modified message may involve language translation or translation into a vernacular style of speech. The modified message may also include an additional audio component such as music or an audio background e.g. office, public house, or countryside background noises. The modified message may also include graphics or animation or video and this may be synchronised with the speech or other audio component of the message.
The invention may be implemented in a communications network such as a computer network or a telecommunications network with a messaging service. A computer network might support email or an instant messaging service. A telecommunications network might support the Multimedia Messaging Service MMS. Messages may be created and delivered in MMS or may be created as text messages in the Short Messaging Service SMS and delivered as MMS messages.
The user who selects the style may be the sender or receiver of messages, and if required, a receiver's selection may take priority over a sender's selection, or vice versa. In one embodiment, the invention is applied to a mobile telephone system supporting the Short Message Service (SMS) and Multimedia Message Service (MMS). An SMS/MMS text message is sent in the usual way. No new skills or behaviours need to be learnt in order to send a text message that will be delivered as a voice message in the voice of a selected personality. The user sends a message to the friend and when the friend receives that message, they get a notification on their mobile handset display in the usual way, informing them they have a new message to read. They push the read button and at that point an MMS is delivered to them, possibly with a graphics or video display on the handset display with the text of the message, and with a vocal delivery of that message through the handset speaker or an audio device connected to the handset. This vocal message is delivered in the vocal style of the personality selected by the sender or receiver of the message. Optionally, the system may also send the message back to the user so that the user can also experience the modified message. This feature may be switched on and off by the user adding a character to the initial service activation sequence.
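The precedence rule between a sender's and a receiver's selection can be illustrated with a minimal sketch. The function name `resolve_style` and the `receiver_has_priority` flag are hypothetical names chosen for illustration; the specification states only that either party's selection may take priority.

```python
from typing import Optional


def resolve_style(sender_style: Optional[str],
                  receiver_style: Optional[str],
                  receiver_has_priority: bool = True) -> Optional[str]:
    """Pick the delivery style when sender and receiver may both have chosen one.

    Returns None when neither party has selected a style, in which case the
    message would be delivered unmodified.
    """
    if receiver_has_priority:
        first, second = receiver_style, sender_style
    else:
        first, second = sender_style, receiver_style
    # Fall back to the other party's choice when the preferred one is absent.
    return first if first is not None else second
```

With both selections present, the flag decides the outcome; with only one present, that selection is used regardless of the flag.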
Implementation of the invention requires a speech generation system which can generate speech with the required styles or personalities for the selections made by users in delivering their messages to others or receiving their own messages. Such a system takes a person's speech and learns how someone pronounces words and sounds so that it can then reproduce this style of speech when given a new text input. The learning ability of the system is preferably such that it can use pre-recorded voices of people, for example, Tony Blair, whom users might select to deliver a message. It would be possible to learn Tony Blair's speech personality by transcribing several hours of his speeches, effectively telling the system what each sound represents in terms of data pattern. This allows a very accurate speech model to be created. The speech personality creation system converts the voice to a series of parameters which are then loaded into a computer running a text-to-speech algorithm. In summary, the system can emulate a person's speech by training a computer with speech samples. This creates a series of mathematical parameters which are then used by another computer running a text-to-speech programme that uses these parameters to convert text to speech with the required personality.
A mobile telephone system according to the invention has the ability to send a verbal message and deliver it in the style of, or be spoken in the accent or manner of a well known personality. This service brings two service areas of mobile data together in one new innovative service, that is messaging on one side (which has achieved 80% penetration globally) and entertainment on the other (the most popular example being ringtones, which have 20% global penetration). The combination of these two services creates a new and fun service for consumers which should become very popular.
As well as allowing the delivery of personalized messages with the voice of a celebrity, the invention allows a user to record their own voice to create a model so that their text messages are delivered in their own vocal style. There is also the ability to use a generic voice with clear enunciation to deliver messages. An example of a generic voice application would be for truck drivers that are travelling internationally who would not necessarily be connected to a trunked radio data network, as trunked radio data networks are normally national. Telephoning and talking to a driver when they are driving is dangerous, but with this system, any texts sent could be spoken and would not cause the driver to have to manage a text call at the same time.
The invention also includes the option to add a persona to a selected style. For example, particular mannerisms or words that a character or person uses in their speech can be generated and automatically inserted into the text by the system, so that the text has more of the "personality" of the person being emulated. For example, one could send the text "would you like a coffee", and if one sent that in the persona of Homer Simpson, then Homer Simpson's phrasing would be added automatically into the message, before the text-to-speech conversion takes place. The system would then perform the text-to-speech conversion and send the message to the recipient, and when the recipient plays the message they would hear the message with the added phrases, and could also be presented with a text message including the added phrases.
Other embodiments of the invention may be implemented in email systems or instant messaging systems, or in an advertising system or in a computer games terminal, and the delivered message is modified in the same manner as the text in the example of a mobile telephone system described above.
Description of the Drawings
The invention is now described by way of example with reference to the accompanying drawings:
Figure 1 is a schematic drawing of a mobile telephone system incorporating a text-to-speech delivery service according to the invention, and
Figure 2 is a schematic drawing of an email system incorporating a text-to-speech delivery service according to the invention.
Embodiments of the Invention
The technical architecture of the mobile telephone system in Figure 1 is implemented in three core blocks, all of which are interconnected using existing standards and technologies. This enables the service to be widely distributed to consumers. Standards and infrastructure for communicating with networks are already in place, and the SMS/MMS feature is already available in mobile handsets. The architecture will use standards-based protocols, which allows the interconnection of an application server 1 to a Short Message Service Centre (SMSC) 2 over a standard IP connection. The SMSC 2 enables messages to be handled and passed through a mobile telephone network and takes care of the routing and any other matters that may need to be performed on the message in order that it reaches the correct recipient. The mobile network has at least one SMSC (typically a minimum of one in each country). The system uses standard SMSC capabilities, requiring minimal incremental infrastructure investment from operators.
In certain networks (modern IP networks) the SMSC will use an adjunct server to make routing decisions. Since an adjunct server is used, a different message routing can be enabled when the text-to-speech delivery service is activated. It should be noted that although there are different network types, the service activation sequence is only done once by the user, whichever network type (e.g. SS7 or IP based) is available. The service is activated by the user sending a short code and then the name of the character whose style of speech they wish to use to deliver messages; for example, **2U BART would be Bart Simpson. The **2U is a code that identifies to the SMSC that the user (subscriber) wants to activate the voice personality service. The characters following this describe the personality/character the user wishes to use. These personality or speech style codes will be advertised by marketers of the service. The SMSC, if it has an adjunct server, will then know that all messages following from this user are to be sent to an IP address, which will be the IP address of the application server 1, where both the persona and the text-to-speech conversion software reside. The adjunct server will send the user's number and information about the selected personality to the application server (IP address) so that the voice personality system knows that all the messages it receives from that user's telephone number are to be converted in the style of the character code given in the activation message, until further notice. The personality can be changed at any time by the user sending a new short code and character code. The system can be turned off by sending a short code followed by an "off" message.
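The activation sequence described above can be sketched as a small registry on the application server side. This is a minimal sketch under stated assumptions: the exact message syntax (the short code, whitespace, then one word), the case-insensitive matching, and the names `ACTIVATION` and `StyleRegistry` are illustrative and not defined by the specification.

```python
import re
from typing import Dict, Optional

# Assumed syntax: the short code "**2U", whitespace, then a character code or "OFF".
ACTIVATION = re.compile(r"^\*\*2U\s+(\w+)$", re.IGNORECASE)


class StyleRegistry:
    """Maps a subscriber's number to the style code they activated."""

    def __init__(self) -> None:
        self._styles: Dict[str, str] = {}

    def handle(self, sender: str, text: str) -> bool:
        """Process a control message; return True if `text` was one."""
        match = ACTIVATION.match(text.strip())
        if match is None:
            return False  # ordinary message, not a control sequence
        code = match.group(1).upper()
        if code == "OFF":
            self._styles.pop(sender, None)  # turn the service off
        else:
            self._styles[sender] = code     # activate or change the personality
        return True

    def style_for(self, sender: str) -> Optional[str]:
        """Return the active style for this sender, or None if inactive."""
        return self._styles.get(sender)
```

Because the registry is keyed by the sender's number, a later activation message simply overwrites the earlier style, which matches the statement that the personality can be changed at any time.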
In SS7 or non-IP networks (which in general have less routing intelligence than IP networks) an additional activation step is needed in order that the user of the service does not need to add a prefix to each message to let the SMSC know that it is to be delivered with voice personality. The additional step requires the SS7 SMSC to be programmed to send an over the air message to the user's phone so that upon receiving the activation short code (e.g. **2U), the SS7 SMSC responds by sending a new short message switching centre SMSC number to the user's phone. This causes all messages sent by the user from then on to go to a dedicated SMSC. This dedicated SMSC is set up to send all messages it receives to the application server 1. The system then functions in the same way as the IP network system, except that when the subscriber wishes to turn off the text-to-speech service, they send the OFF code, and the dedicated SS7 SMSC sends a message to the user's terminal to change the dedicated SMSC number back to one of several general purpose SMSCs in the operator's network.
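The SS7 variant differs in that the handset's own SMSC setting is rewritten over the air. The following minimal sketch simulates only that state change; the two SMSC numbers, the `Handset` class and the `on_control_message` function are hypothetical, and the actual over-the-air signalling is not modelled.

```python
from dataclasses import dataclass

GENERAL_SMSC = "+447700900100"    # hypothetical general-purpose SMSC number
DEDICATED_SMSC = "+447700900200"  # hypothetical dedicated SMSC number


@dataclass
class Handset:
    number: str
    smsc_number: str = GENERAL_SMSC


def on_control_message(handset: Handset, text: str) -> None:
    """Simulate the SMSC number update triggered by an activation or OFF code."""
    parts = text.strip().upper().split()
    if len(parts) != 2 or parts[0] != "**2U":
        return  # not a control message; the handset setting is unchanged
    if parts[1] == "OFF":
        handset.smsc_number = GENERAL_SMSC    # revert to a general-purpose SMSC
    else:
        handset.smsc_number = DEDICATED_SMSC  # route all later messages via the dedicated SMSC
```

Once the setting points at the dedicated SMSC, ordinary messages need no prefix, since every message from that handset already reaches the application server.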
The output messages from the SMSC 2 will in all cases be directed towards the application server 1 via a standard IP internet link. The system will use IP protocol to interconnect between the SMSC 2 and the application server 1. The application server can be, and most probably will be, on a leased server farm.
The text-to-speech conversion itself will be supported to run many simultaneous threads, as there will be a significant volume of messaging traffic to deal with (over 130 billion SMS messages were sent globally every month as of January 2006). The application software 1 takes the text input sent by the SMSC 2 and performs a conversion to speech upon it. It may also perform additional functions such as the "persona feature". These additional functions will be performed upon the message and it will then be converted into speech. It can then be packaged up as an MMS. This will most likely be done in the application software itself, or may be done at the Multimedia Message Switching Centre (MMSC) 3, if this is possible. Whether it is done pre-sending or at the reception point of the MMSC 3 makes no material difference to the operation of the service. The message will be converted to an MMS and it will then be sent by the MMSC through to the correct recipient. Throughout the whole process, the destination phone number (addressee's number) needs to be retained as well as the transmission phone number (user's number). This information is needed to correctly repackage and reconstitute the message so that the MMSC 3 is able to interpret the instructions contained in the data part of the message correctly. The SMSC 2 must pass the user's number and the addressee's number, plus the message content, to the application server 1. As mentioned, the system is fully standards based; therefore standard SMS and MMS message packaging is used to put the message together and send it through the network, as the message will be sent and received by standard mobile handsets 4.
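The two requirements in this paragraph, concurrent conversion and retention of both phone numbers, can be sketched together. This is an illustrative sketch only: `synthesize` is a placeholder standing in for the real text-to-speech engine, and the record types and function names are assumptions rather than anything the specification defines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextMessage:
    sender: str      # user's number, as passed on by the SMSC
    addressee: str   # destination number, which must survive the conversion
    text: str


@dataclass(frozen=True)
class SpeechMessage:
    sender: str
    addressee: str
    text: str
    audio: bytes


def synthesize(text: str, style: str) -> bytes:
    """Placeholder for the text-to-speech engine (hypothetical)."""
    return f"[{style}] {text}".encode("utf-8")


def convert(msg: TextMessage, style: str) -> SpeechMessage:
    # Both numbers are carried through so the result can be addressed as an MMS.
    return SpeechMessage(msg.sender, msg.addressee, msg.text,
                         synthesize(msg.text, style))


def convert_batch(messages: List[TextMessage], style: str,
                  workers: int = 8) -> List[SpeechMessage]:
    """Convert many messages in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda m: convert(m, style), messages))
```

A thread pool is used here only to mirror the "many simultaneous threads" wording; a production engine might equally use separate processes or a queue of worker servers.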
As outlined above, one of the functions of the speech delivery service is to convert text to speech according to the parameters stored for the personality of a particular character/individual. The personality file will reside in an application server 1 and messages will be processed using a text-to-speech programme, which will create the spoken message. The persona feature allows the individual characteristics of a person to be automatically entered into the text stream before the text-to-speech conversion takes place. Phrases and/or pauses may be built into the personality's speech. These parameters will be held in a file called the "normalization file". The contents of this file will be used to selectively process the message. The persona feature can be turned on and off. In addition to adding particular phrases, the service may add a preamble to a message such that it appears that the personality is putting together a complete message to the recipient; for example, the phrase "My name is David Beckham" could be inserted at the beginning of every message sent, if this option/feature is selected. The system will have contextual capability to enable it to insert the correct preamble/postamble depending on the circumstance or context of the user text message.
Text messages commonly use abbreviations such as "L8r" for later, "Txt" for text and "Y" for yes, and the text-to-speech processor is preferably preceded by a translation function which replaces the abbreviation with the full text for conversion to speech. A translation function could also be used to create a different vernacular of speech in a particular language, or to translate between different languages. The sound that a particular word makes is stored for each word that the system is presented with, for example, the word "one" could be translated to "un(e)" in French. If a person were to write a text that contained the word "one", the system would use "un(e)", if a user selected the speech to be delivered in French. There are various characteristics, grammar, pronunciation etc of different languages that will determine the performance of this feature.
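The abbreviation-expansion step that precedes text-to-speech conversion can be sketched as a simple table lookup. Only the three table entries come from the text above; the function name, the tokenising rule and the case-insensitive matching are assumptions made for illustration.

```python
import re

# Illustrative table; only these three entries come from the text above.
ABBREVIATIONS = {"l8r": "later", "txt": "text", "y": "yes"}

# A token is a run of letters and digits, so punctuation is left untouched.
_TOKEN = re.compile(r"[A-Za-z0-9]+")


def expand_abbreviations(message: str) -> str:
    """Replace known abbreviations with full words, leaving the rest as-is."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)
    return _TOKEN.sub(replace, message)
```

A plain lookup like this cannot tell an abbreviation from an ordinary word with the same spelling, so a real translation function would presumably need some context sensitivity.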
The system may have various quality checking processes built into it, some of which may be done by third parties. These tools and processes will compare the output speech from the text-to-speech software against the original reference model of the person being impersonated or transcribed as a known reference point, as well as subjective listening.
A billing system 5 charges users for the speech delivery service. This is typically done by the mobile phone billing system by charging a premium SMS rate for each message sent, but charging could also be done by other means. The business model of this service is one targeted at managing churn. Therefore, once the customer has been acquired, the pricing model is designed so that the customer does not switch off the service. As a consequence, low subscription costs for the service may be offered. This will be market dependent, and in some markets it may be free to subscribe, but the service will have a usage model instead. In other markets where a subscription would be acceptable to customers and pricing sensitivity is not high, a subscription may be added. In addition to a subscription, there may be a per-text charge. This charge may be kept low enough that it does not encourage churn, circa 10-15% of the additional cost of a standard SMS message.
The personality creation is performed by adapting a text-to-speech algorithm that is resident on the application server 1 to alter its acoustic speech playback model depending on parameters that are entered into it from data collected during a prior personality training session. The training session uses "known speech", by which is meant spoken words with corresponding known digital speech/signal patterns. The personality creation system creates parameters for use by the text-to-speech system. It does this by inputting a fixed series of speeches read with the accent of a particular personality. This "impersonation" is typically done by a hired external impersonator. It is also possible in the case of public figures to take samples of their speech (where there are enough public/available examples of their speech) and create a set of parameters which can be used by the text-to-speech software. This can be done by manually transcribing spoken words contained within the recorded material. The personality creation is performed when recorded speech is run through an algorithm and produces a series of parameters, which can then be used by the text-to-speech software in order to adjust the text-to-speech playback model.
There is also a normalisation file created to determine the persona of the character/personality. If the normalisation file is not populated or enabled in the application server 1 then a text sent by a user will be spoken verbatim with the accent of the person being emulated, and consequently any unique phrases or ways of speaking are not automatically inserted into the text. If the normalisation file has data inserted in it then it enables the persona of the character to be inserted into the message. This allows various phrases, pauses in speech etc. to be added to the message. These words etc. are inserted into the text to modify the original before any text-to-voice conversion takes place; therefore when the text-to-speech software subsequently processes the text, the modified message closely resembles the manner in which the personality being emulated would deliver the message that was originally sent. The tool 6 to create personalities comprises a computer, an audio microphone, amplifier and various A-to-D and D-to-A converter elements, and software running on the computer that creates the parameters that are used by the text-to-speech software running on the application server 1.
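The role of the normalisation file can be illustrated as a text transformation that runs before text-to-speech conversion. This is a hedged sketch: the specification does not define the file's structure, so the `NormalisationFile` fields (a preamble, a postamble and a word substitution table) and the `apply_persona` function are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NormalisationFile:
    """Hypothetical in-memory form of a personality's normalisation file."""
    preamble: str = ""
    postamble: str = ""
    # Maps a word in the user's text to the phrase the persona would use.
    substitutions: Dict[str, str] = field(default_factory=dict)


def apply_persona(text: str, norm: Optional[NormalisationFile],
                  enabled: bool = True) -> str:
    """Return the text that is handed to the text-to-speech stage."""
    if norm is None or not enabled:
        return text  # spoken verbatim, in the selected voice only
    words = [norm.substitutions.get(w.lower(), w) for w in text.split()]
    parts = [norm.preamble, " ".join(words), norm.postamble]
    # Skip empty preamble/postamble so no stray spaces are introduced.
    return " ".join(p for p in parts if p)
```

With no file, or with the feature switched off, the input passes through unchanged, which corresponds to the verbatim case described above.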
In an alternative embodiment of the invention, a message can be delivered as a sung text, and the melody used may also be selected by the user from a choice of well-known melodies, or the user may be able to edit the words of a well known song such as "Happy Birthday". For example, the user may just add the name of the birthday celebrant in the song "Happy Birthday", or may make other word changes to personalise the song. The user may even select the personality who is to sing the song. Rap songs may be a popular style of message delivery.
In another embodiment of the invention, a message may be delivered as a voice message with an added audio component including an audio background such as music or ambient noises of an office or public house or an audience or crowd or countryside or seaside, etc. as selected by the user. In another embodiment of the invention, a message may be delivered as a voice message together with an audio clip of music or other sound as selected by the user, which is added to the voice message as a prefix or preamble or postamble or combination of these.
In another embodiment of the invention, a user's message may be delivered with graphics or video material which may relate to the selected personality. For example, a video of a personality may comprise a talking head with lip sync to match the voice message. The personality may, for example, comprise the Queen, or George Bush or another celebrity such as a film or pop star who reads the voice message in the voice of that personality. In addition, music might be included in the message, such as the national anthem of the United Kingdom or the USA. In another example, a graphics or video or audio clip may be included with the voice message which interprets certain words or symbols in a predetermined manner within the animation. For example, a "☺" may be interpreted in a happy manner or an "x" at the end of a message may be interpreted as a kiss sound "Mwah". This interpretation may be selectable or editable by the user. This embodiment of the invention can be readily implemented on a system such as a telephone system where terminals or handsets are capable of displaying graphics content.
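The interpretation of trailing symbols can be sketched as a lookup applied to the end of the message. The "x" to "Mwah" mapping comes from the text above; the table name, the function name and the rule that only whitespace-separated trailing symbols count are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative default table; the text says the user may select or edit it.
DEFAULT_CUES: Dict[str, str] = {"x": "Mwah"}


def extract_trailing_cues(message: str,
                          cues: Optional[Dict[str, str]] = None
                          ) -> Tuple[str, List[str]]:
    """Split trailing cue symbols off a message.

    Returns the text to be spoken and the list of sound effects to append,
    in the order the symbols appeared.
    """
    table = DEFAULT_CUES if cues is None else cues
    words = message.split()
    effects: List[str] = []
    while words and words[-1] in table:
        effects.insert(0, table[words.pop()])
    return " ".join(words), effects
```

Restricting the lookup to the end of the message keeps an "x" that appears mid-sentence from being read as a kiss.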
In other embodiments of the invention, the sent message may be an MMS message which contains text and animation data, in which case this does not need to be converted to an MMS message (as is the case when sending an SMS message), and the MMS message can therefore be routed in the normal manner once it has undergone conversion in the application server 1.
The invention is applicable on an international level, although it may be that the user has to enter a short code in front of every message when roaming, as internationally the SMSC may not be able to route as previously outlined. This may be the default condition due to the SMSC functionality. However, it may be the case that within the Home Location Register (HLR) a setting could be enabled such that when the phone registers on a new (roaming) network, the Visitor Location Register (VLR) is updated with this setting from the HLR regarding supplementary services, so as to enable the VLR to revector the message to the application server 1.
In another embodiment of the invention, voice messages in communications systems are modified according to a style selected by a user before being delivered. The style may be selected by the sender or receiver. The voice message may be sent to a mobile telephone terminal or a land line or to voicemail. The communications system may for example comprise an IP telephone system, and the implementation of the invention would be similar to that of the IP telecommunications system already described, in that an adjunct server is used to route messages for voice processing to the speech generation system. Once converted to the required style of voice, the message is then routed onwards to the specified addressee. A code selected by the user is picked up by the adjunct server and used to route all subsequent incoming voice messages to the speech generation system, and the code identifies both the voice conversion service and the style of voice to be used.
In another embodiment of the system, an email system is adapted to incorporate the speech generation system. The speech generation system is hosted on an application server within the email system and emails are directed to the application server for conversion in accordance with whether or not users have elected to use the speech delivery service. Furthermore, the style of speech delivery may be selected by a user. The speech delivery service may be billed to users on the basis of volume of use or period of use or on a fixed charge basis. Charges may be varied according to the number of style options that a user selects for delivering emails.
Figure 2 illustrates an email system comprising an SMTP server 10 which interacts with an email client on the sender's terminal 11 and determines the identity of the sender and of the addressee, and from the latter and the Domain Name Server database identifies the addressee's SMTP server 12. The two SMTP servers 10,12 then communicate to transfer the email from one to the other, while the second SMTP server hands the email to its associated POP3 or IMAP server 13 to handle the delivery to the addressee's terminal 14. An application server 15 for the speech delivery service is provided to interwork with the SMTP server 10 to modify email messages in accordance with the identity of the sender and their subscription to this service. If the sender has not subscribed to the service then the email is passed on without any enhancement in the standard manner. However, if the sender has subscribed to the service then the email is referred to the application server where the text is converted to a speech message and attached to the original email and sent on via the SMTP server 10 or directly to the SMTP server 12. The application server 15 is comparable in its email conversion abilities to the application server 1 in the text messaging system of Figure 1. It will be appreciated that a similar application server 15 may be associated with the SMTP server 12 to support a similar speech conversion service for the email service provider operating the SMTP server 12.
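The subscription check and the attachment step performed by the application server 15 can be sketched with Python's standard `email` package. This is a minimal sketch under stated assumptions: `synthesize` is a placeholder for the speech generation system, the subscriber table keyed by sender address is hypothetical, and the email is assumed to be a simple text/plain message.

```python
from email.message import EmailMessage
from typing import Dict


def synthesize(text: str, style: str) -> bytes:
    """Placeholder for the speech generation system (hypothetical).

    A real engine would return encoded audio; this returns tagged bytes.
    """
    return f"[{style}] {text}".encode("utf-8")


def enhance_email(msg: EmailMessage, subscribers: Dict[str, str]) -> EmailMessage:
    """Attach a speech rendering if the sender has subscribed.

    `subscribers` maps a sender address to the style they selected.
    Emails from non-subscribers are returned unchanged.
    """
    style = subscribers.get(msg["From"])
    if style is None:
        return msg  # passed on without enhancement
    body = msg.get_content()  # assumes a simple text/plain email
    audio = synthesize(body, style)
    # Adding an attachment converts the message to multipart/mixed,
    # keeping the original text as the first part.
    msg.add_attachment(audio, maintype="audio", subtype="wav",
                       filename="message.wav")
    return msg
```

The original text stays as the first part of the message, so a recipient whose client cannot play the audio still receives a normal email.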
In other email embodiments of the invention, the speech generation system may be installed on a user's email terminal to process emails either before they are sent or once they have been received, so that they are delivered as a speech message. In the case where the speech generation system processes emails on the sender's terminal before they are sent, the system processes the email to generate a speech message, which is saved as an audio file and attached to the original email before it is sent to the addressee. The recipient will then receive the email text together with the corresponding audio file, which preferably opens automatically to deliver the speech message. In the case where the speech generation system processes emails on the recipient's terminal, the system processes received emails to generate corresponding speech messages. In both cases the speech generation system takes the form of an application installed on the user terminal 11 or 14 in Figure 2 instead of residing in an application server 15.
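For the recipient-side case, the first step is to recover the text of a received email so that it can be passed to the speech generation system. The following Python sketch shows one way this might be done with the standard library `email` package; the function name `text_for_speech` is hypothetical and the speech generation step itself is not shown.

```python
from email import message_from_bytes, policy


def text_for_speech(raw_email: bytes) -> str:
    """Return the plain-text body of a received email, or '' if none exists."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return ""
    return body.get_content().strip()
```

The returned string would then be handed to whatever speech generation application is installed on the terminal.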
In another embodiment of the invention, an email may be sent incorporating an applet to run on the recipient's terminal to generate the modified speech message.
In all of the above email embodiments of the invention, the email, as well as being converted into speech, may additionally be subject to any of the other audio or graphics/video enhancements described above in relation to the mobile telephone embodiment, including personality and persona enhancements, musical or background enhancements, and animations with or without lip sync, selection of all of which may be under the control of the user. A sender of emails may select to send emails in the style of their own voice and accompanied by a video avatar. Any audio/video enhancements are saved in the file attached to the original email.
Emails enhanced in accordance with the invention may also automatically incorporate an advertisement for the speech delivery service, which may take the form of an audio/video advertisement and/or a web link.
As described above, the invention may be implemented as an email terminal, which may take the form of a PC or a handheld device, including a telephone terminal. However, the invention can equally well be implemented in any terminal for receiving messages so as to deliver these in a selected style, whether audio or video or a combination of these. The terminal incorporates a speech generation system, which may be in the form of an applet, to convert incoming messages into the selected style. The terminal may be used in an SMS/MMS system, an instant messaging system or an email system.
In another embodiment of the invention, a communications system such as a call centre or an outbound telephone-based or Internet-based advertising system incorporates a speech generation system to enable a choice of voice styles, e.g. celebrity voices, to be used with outgoing messages. The outgoing messages may be generated as text or voice messages and then converted into a celebrity voice message that is delivered to the customer. In an interactive system in which responses to the customer are converted to a celebrity voice, the customer therefore has the impression of talking to a celebrity, which enhances the "stickiness" of the delivered message. The communications system may deliver purely audio messages, but in an alternative embodiment of the invention, the voice message generated may be incorporated in a multimedia message so that it is associated with a graphical or video representation, which may take the form of a character or avatar that delivers the voice message.
Another embodiment of the invention comprises an advertising system or advertisement generation system which incorporates a speech generation system that can generate voice messages in different styles such as different celebrity voices to enhance the effectiveness of advertising messages, whether delivered as telephone or Internet advertisements or otherwise, and whether audio or multimedia advertisements.
It will be appreciated that the message created or delivered according to the invention may also contain advertising material including advertising or other information relating to the message service itself whether contained in the message or contained in a web link.
In another embodiment of the invention, a computer game is adapted so that the user can select the style of voice to be used in a game. For example, a computer game may involve characters who play out roles in a graphic representation which includes speech delivered by the characters. Characters in such games are often well known, for example, Lara Croft, and have a well developed personality including style of speech. Generally, this personality is fixed, and is what makes a game attractive to players. However, in some games the user is intended to identify with a particular character, or to adopt a personal representation of themself, known as an avatar. According to one embodiment of the invention, the avatar may adopt a style of voice selected by the user, which may be an impersonation of their own voice or that of a personality. The computer game may be played on a stand-alone computer, or on a networked system, or over the Internet via a user terminal, such as a computer, telephone, PDA or the like, and may involve multiple players.

Claims

1. A method of delivering a message to an addressee, characterised in that a user selects a style for the message to be delivered, and conversion apparatus converts the message to a modified speech message with reference to the style selected by the user, and the modified message is delivered to the addressee.
2. A method as claimed in claim 1 in which the user is the sender of said message or the addressee.
3. A method as claimed in claim 1 or 2 in which the user selects a style by sending a message containing a style code to the conversion apparatus.
4. A method as claimed in any one of the preceding claims in which the conversion apparatus stores the style selection made by the user and refers to this subsequently in converting messages into modified messages for delivery to addressees.
5. A method as claimed in any of the preceding claims in which the user can cancel a previously selected style or replace one selection with a new style selection.
6. A method as claimed in any of the preceding claims in which the message takes the form of a message in a network supporting a messaging service and having a messaging service centre, the messaging service centre responding to a message to or from a user who has selected a style by forwarding the message to the conversion apparatus for conversion of the message to a modified message according to the selected style.
7. A method as claimed in claim 6 in which the messaging service centre refers to a user database to determine whether a message to or from a user needs to be forwarded to the conversion apparatus for conversion to a modified message.
8. A method as claimed in claim 7 in which the user database stores user selections of styles as received by the messaging service centre and forwarded to the user database.
9. A method as claimed in claim 8 in which user identification and style selections are forwarded from the user database to the conversion apparatus for reference by the conversion apparatus in converting messages to modified messages according to users' selections.
10. A method as claimed in any one of claims 1 to 5 in which the message takes the form of a message in a network supporting a messaging service and having a messaging service centre, the messaging service centre responding to a style selection from a user in respect of their sent messages by sending a re-direction message to the user's terminal so that messages thereafter are sent to a dedicated messaging service centre which in turn forwards the messages to the conversion apparatus for conversion to modified messages according to the selected style.
11. A method as claimed in claim 6 or 10 in which the network is a mobile telephone network or an IP telephone network.
12. A method as claimed in any one of claims 6 to 10 in which the messaging service is a short message service (SMS) or a multimedia message service (MMS).
13. A method as claimed in any one of claims 6 to 11 in which the conversion apparatus forwards the messages to a Multi-media Message Switching Centre MMSC for forwarding to addressees.
14. A method as claimed in any one of claims 1 to 10 in which the message takes the form of an email which is delivered to an addressee as a modified message.
15. A method as claimed in any of the preceding claims in which the user is the sender of a message and the modified message is delivered to the user as well as to the addressee.
16. A method as claimed in claim 15 in which the user can control whether or not they receive a copy of the modified message corresponding to the message they send.
17. A method as claimed in any one of the preceding claims in which the style that is selected by the user is an audio style and the modified message comprises an audio message which incorporates the selected audio style.
18. A method as claimed in claim 17 in which the message sent is a verbal message and the modified message comprises an audio message.
19. A method as claimed in claim 18 in which the message is sent as a text message.
20. A method as claimed in claim 18 or 19 in which the audio message comprises a voice message.
21. A method as claimed in claim 20 in which the voice message is created in the vocal style of the user, or a well known person or fictional character or a generic vocal style, as selected by the user.
22. A method as claimed in any one of the preceding claims in which the conversion apparatus is such as to enhance a message so that the modified message has added content in accordance with the style selected by the user.
23. A method as claimed in claim 22 in which the conversion apparatus adds content to the message in accordance with a selected vocal style before converting the message to a voice message.
24. A method as claimed in claim 22 or 23 in which the added content comprises mannerisms of speech.
25. A method as claimed in claim 20 in which the voice message comprises a song in which the message is sung.
26. A method as claimed in claim 25 in which the song is sung in a rap style.
27. A method as claimed in claim 25 in which the song comprises a melody selected by the user.
28. A method as claimed in any one of claims 20 to 24 in which the audio message comprises a voice message in accordance with the verbal message with an audio background added as selected by the user.
29. A method as claimed in claim 28 in which the added background comprises music.
30. A method as claimed in any one of claims 1 to 17 in which an audio clip in accordance with the selected style is added to said message to form the modified message.
31. A method as claimed in claim 17 in which the modified message comprises said message with the audio message added as a prefix, or a preamble or a postamble or a combination of these.
32. A method as claimed in claim 31 in which the audio message is created in the vocal style of the user, or a well known person or fictional character, or a generic vocal style as selected by the user.
33. A method as claimed in any of the preceding claims in which graphic or video data is added to the modified message for delivery to the addressee so that the addressee receives a graphic display as selected by the user.
34. A method as claimed in claim 33 in which the graphic or video display comprises a character with lip synch in relation to the audio message so as to seemingly deliver the message.
35. Apparatus in which a message is delivered in accordance with a style selected by a user and in which the system converts the message to a modified speech message and the modified message is delivered to the addressee.
36. Apparatus as claimed in claim 35 which includes text-to-speech conversion apparatus in order to create an audio message to form the modified message.
37. Apparatus as claimed in claim 36 in which the text-to-speech conversion apparatus is such as to create a voice message in a vocal style as selected by the user, the vocal style comprising the voice of the user or a well known person or fictional character or a generic voice as selected by the user.
38. A communications network including apparatus as claimed in any of claims 35 to 37 to convert a message into a modified message for delivery to an addressee, wherein the apparatus converts a text message to a modified message by reference to a style selected by a user so that the modified message is created in the style selected by the user.
39. A communications system as claimed in claim 38 which is configured as a mobile telephone system, or an IP telephone system, or an instant messaging system or an email system.
40. A computer game in which the user selects the style of voice to be used in a computer game.
41. A computer game as claimed in claim 40 in which an avatar adopts a style of voice selected by the user, which may be an impersonation of their own voice or that of a personality.
42. A terminal for receiving messages incorporating a speech generation system that is capable of converting incoming messages to voice messages in different styles as selected by a user.
43. A terminal as claimed in claim 42 comprising a handheld device.
44. A terminal as claimed in claim 42 comprising a personal computer or other networked device supporting messaging within the network, the incoming messages being converted to voice messages in the selected style.
45. A terminal as claimed in claim 42 that enables emails to be delivered as a voice message in a selected style, as selected by the sender or recipient.
46. A communications system such as a call centre or an outbound telephone-based or Internet-based advertising system in which a speech generation system enables a choice of voice styles to be used with outgoing messages.
47. An advertising system or advertisement generation system which incorporates a speech generation system that can generate voice messages in different styles such as different celebrity voices to enhance the effectiveness of advertising messages.
PCT/GB2007/004146 2006-10-30 2007-10-30 Speech communication method and apparatus WO2008053204A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0621547A GB2443468A (en) 2006-10-30 2006-10-30 Message delivery service and converting text to a user chosen style of speech
GB0621547.9 2006-10-30
GB0706505.5 2007-04-03
GB0706505A GB0706505D0 (en) 2007-04-03 2007-04-03 Speech communication systems

Publications (1)

Publication Number Publication Date
WO2008053204A1 (en) 2008-05-08

Family

ID=39064362

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2007/004146 WO2008053204A1 (en) 2006-10-30 2007-10-30 Speech communication method and apparatus

Country Status (1)

Country Link
WO (1) WO2008053204A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5747715A (en) * 1995-08-04 1998-05-05 Yamaha Corporation Electronic musical apparatus using vocalized sounds to sing a song automatically
US6072467A (en) * 1996-05-03 2000-06-06 Mitsubishi Electric Information Technology Center America, Inc. (Ita) Continuously variable control of animated on-screen characters
DE10056762A1 (en) * 2000-11-14 2002-05-23 Stefan Schleifer Electronic information transmission method has numerical characters representing information text converted into synthetic speech and combined with video signals for monitor display
US20020191757A1 (en) * 2001-06-04 2002-12-19 Hewlett-Packard Company Audio-form presentation of text messages

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012079409A1 (en) * 2010-12-15 2012-06-21 中兴通讯股份有限公司 Processing method and system for media message
CN106230686A (en) * 2015-12-30 2016-12-14 深圳超多维科技有限公司 Main broadcaster's class interaction platform word rendering method and device, client
CN106230686B (en) * 2015-12-30 2019-10-25 深圳超多维科技有限公司 Main broadcaster's class interaction platform text rendering method and its device, client

Similar Documents

Publication Publication Date Title
AU2010257228B2 (en) A method of providing voicemails to a wireless information device
US7116976B2 (en) Adaptable communication techniques for electronic devices
US20070054678A1 (en) Method of generating a sms or mms text message for receipt by a wireless information device
US7099457B2 (en) Personal ring tone message indicator
ZA200700775B (en) A method of providing voicemails to a wireless information device
EP1411736A1 (en) System and method for converting text messages prepared with a mobile equipment into voice messages
WO2008053204A1 (en) Speech communication method and apparatus
GB2443468A (en) Message delivery service and converting text to a user chosen style of speech
JP2007515082A (en) Method and system for transmission of voice content by MMS
BE1017454A6 (en) Short text messaging method, sends text code with message in order to activate e.g. sounds or graphic images in destination phone when message is opened
CN201499306U (en) System for realizing power-off show of mobile phone

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07824390

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07824390

Country of ref document: EP

Kind code of ref document: A1