US6405169B1 - Speech synthesis apparatus - Google Patents


Info

Publication number
US6405169B1
Authority
US
United States
Prior art keywords
information
modification
phonological
section
prosodic
Prior art date
Legal status
Expired - Fee Related
Application number
US09/325,544
Inventor
Reishi Kondo
Yukio Mitome
Current Assignee
NEC Corp
Original Assignee
NEC Corp
Priority date
Filing date
Publication date
Application filed by NEC Corp
Assigned to NEC Corporation (assignment of assignors' interest; assignors: Kondo, Reishi; Mitome, Yukio)
Application granted
Publication of US6405169B1
Anticipated expiration
Expired - Fee Related (current legal status)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management

Definitions

  • the present invention relates to a speech synthesis apparatus, and more particularly to an apparatus which performs speech synthesis by rule.
  • control parameters of synthetic speech are produced, and a speech waveform is produced based on the control parameters using an LSP (line spectrum pair) synthesis filter system, a formant synthesis system or a waveform editing system.
  • LSP: line spectrum pair
  • Control parameters of synthetic speech are roughly divided into phonological unit information and prosodic information.
  • the phonological unit information is information regarding the list of phonological units used in synthesis.
  • the prosodic information is information regarding a pitch pattern representative of intonation and accent and duration lengths representative of rhythm.
  • for production of phonological unit information and prosodic information, a conventional method is known, disclosed for example in Furui, “Digital Speech Processing”, p. 146, FIGS. 7 and 6 (document 1), wherein the phonological unit information and the prosodic information are produced separately from each other.
  • in this method, meta information regarding the phonological units, such as phonemic representations or devocalization, is used to produce the prosodic information, but information on the phonological units actually used for synthesis is not used.
  • however, the time length or the pitch frequency of the original speech from which a phonological unit was collected generally differs from the prosodic information produced, so the unit must be transformed, which degrades the synthetic speech.
  • in the present invention, therefore, the prosodic information is modified using the phonological unit information. Specifically, the duration length information, the pitch pattern information and the phonological unit information are modified with reference to one another.
  • a speech synthesis apparatus comprising prosodic pattern production means for producing a prosodic pattern, phonological unit selection means for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production means, and means for modifying the prosodic pattern based on the selected phonological units.
  • the speech synthesis apparatus is advantageous in that the prosodic information can be modified based on the phonological unit information, and consequently, synthetic speech with reduced distortion can be obtained by taking into consideration the conditions under which the phonological units were collected.
  • a speech synthesis apparatus comprising prosodic pattern production means for producing a prosodic pattern, phonological unit selection means for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production means, and means for feeding back the phonological units selected by the phonological unit selection means to the prosodic pattern production means so that the prosodic pattern and the selected phonological units are modified repetitively.
  • the speech synthesis apparatus is advantageous in that, since the phonological unit information is fed back so that modification is performed repeatedly, synthetic speech with further reduced distortion can be obtained.
  • a speech synthesis apparatus comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern based on the duration lengths produced by the duration length production means, and means for feeding back the pitch pattern to the duration length production means so that the phonological unit duration lengths are modified.
  • the speech synthesis apparatus is advantageous in that duration lengths of phonological units can be modified based on a pitch pattern and synthetic speech of a high quality can be produced.
  • a speech synthesis apparatus comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern, phonological unit selection means for selecting phonological units, first means for supplying the duration lengths produced by the duration length production means to the pitch pattern production means and the phonological unit selection means, second means for supplying the pitch pattern produced by the pitch pattern production means to the duration length production means and the phonological unit selection means, and third means for supplying the phonological units selected by the phonological unit selection means to the pitch pattern production means and the duration length production means, the duration lengths, the pitch pattern and the phonological units being modified by cooperative operations of the duration length production means, the pitch pattern production means and the phonological unit selection means.
  • the speech synthesis apparatus is advantageous in that the duration lengths and the pitch pattern of the phonological units and the phonological unit information can be modified with reference to one another, so that synthetic speech of high quality can be produced.
  • a speech synthesis apparatus comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern, phonological unit selection means for selecting phonological units, and control means for activating the duration length production means, the pitch pattern production means and the phonological unit selection means in this order and controlling the duration length production means, the pitch pattern production means and the phonological unit selection means so that at least one of the duration lengths produced by the duration length production means, the pitch pattern produced by the pitch pattern production means and the phonological units selected by the phonological unit selection means is modified by a corresponding one of the duration length production means, the pitch pattern production means and the phonological unit selection means.
  • the speech synthesis apparatus is advantageous in that, since modification to the duration lengths and the pitch pattern of the phonological units and to the phonological unit information is determined not independently but collectively by the single control means, synthetic speech of high quality can be produced and the amount of calculation can be reduced.
  • the speech synthesis apparatus may be constructed such that it further comprises a shared information storage section; the duration length production means produces duration lengths based on information stored in the shared information storage section and writes the duration lengths into it, the pitch pattern production means produces a pitch pattern based on the information stored in the shared information storage section and writes the pitch pattern into it, and the phonological unit selection means selects phonological units based on the information stored in the shared information storage section and writes the phonological units into it.
  • the speech synthesis apparatus is advantageous in that, since the information on which the respective means depend is shared among them, the calculation time can be reduced.
  • FIG. 1 is a block diagram showing a speech synthesis apparatus to which the present invention is applied;
  • FIG. 2 is a table illustrating an example of phonological unit information to be selected in the speech synthesis apparatus of FIG. 1;
  • FIG. 3 is a table schematically illustrating contents of a phonological unit condition database used in the speech synthesis apparatus of FIG. 1;
  • FIG. 4 is a diagrammatic view illustrating operation of a prosody modification control section of the speech synthesis apparatus of FIG. 1;
  • FIG. 5 is a table illustrating an example of prosody modification rules used in the speech synthesis apparatus of FIG. 1;
  • FIG. 6 is a block diagram of a modification to the speech synthesis apparatus of FIG. 1;
  • FIG. 7 is a block diagram of another modification to the speech synthesis apparatus of FIG. 1;
  • FIG. 8 is a diagrammatic view illustrating operation of a duration length modification control section of the modified speech synthesis apparatus of FIG. 7;
  • FIGS. 9 to 11 are block diagrams of different modifications to the speech synthesis apparatus of FIG. 1.
  • a speech synthesis apparatus includes a prosodic pattern production section (21 of FIG. 1) for receiving as input utterance contents, such as a text and a phonetic symbol train to be uttered, index information representing a particular utterance text and so forth, and producing a prosodic pattern that includes one or more (or all) of an accent position, a pause position, a pitch pattern and a duration length; a phonological unit selection section (22 of FIG. 1) for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production section; a prosody modification control section (23 of FIG. 1) …
  • a speech synthesis apparatus includes a prosodic pattern production section (21 of FIG. 1) for producing a prosodic pattern and a phonological unit selection section for selecting phonological units based on that prosodic pattern; a prosody modification control section (23 of FIG. 1) feeds back the contents of any location requiring modification in the phonological units selected by the phonological unit selection section to the prosodic pattern production section, so that the prosodic pattern and the selected phonological units are modified repeatedly.
  • the prosodic pattern production section, which receives utterance contents as input and produces a prosodic pattern based on them, includes a duration length production section (26 of FIG. 6) for producing duration lengths of phonological units and a pitch pattern production section (27 of FIG. 6) for producing a pitch pattern based on the duration lengths produced by the duration length production section.
  • the phonological unit selection section (22 of FIG. 6) selects phonological units based on the prosodic pattern produced by the pitch pattern production section.
  • the prosody modification control section searches the phonological unit information selected by the phonological unit selection section for a location at which the prosodic pattern produced by the pitch pattern production section requires modification and, when modification is required, feeds back the contents of the modification to the duration length production section and/or the pitch pattern production section, so that the duration lengths and the pitch pattern are modified by the duration length production section and the pitch pattern production section, respectively.
  • the prosodic pattern and the selected phonological units are modified repetitively.
  • a speech synthesis apparatus includes a duration length production section (26 of FIG. 7) for producing duration lengths of phonological units, a pitch pattern production section (27 of FIG. 7) for producing a pitch pattern based on the duration lengths produced by the duration length production section, and a duration length modification control section (29 of FIG. 7) for feeding back the pitch pattern to the duration length production section so that the phonological unit duration lengths are modified.
  • more specifically, the duration length modification control section (29 of FIG. 7) determines the contents of modification to the duration length information produced by the duration length production section (26 of FIG. 7), and a duration length modification section (30 of FIG. 7) modifies the duration length information in accordance with the modification contents output from the duration length modification control section (29 of FIG. 7).
  • a speech synthesis apparatus includes a duration length production section (26 of FIG. 9) for producing duration lengths of phonological units, a pitch pattern production section (27 of FIG. 9) for producing a pitch pattern, a phonological unit selection section (22 of FIG. 9) for selecting phonological units, a means (29 of FIG. 9) for supplying the duration lengths produced by the duration length production section (26 of FIG. 9) to the pitch pattern production section and the phonological unit selection section, another means (31 of FIG. 9) for supplying the pitch pattern produced by the pitch pattern production section to the duration length production section and the phonological unit selection section, and a further means (32 of FIG. 9) …
  • a duration length modification control section (29 of FIG. 9) determines modification contents to the duration lengths based on the utterance contents, the pitch pattern information from the pitch pattern production section (27 of FIG. 9) and the phonological unit information from the phonological unit selection section (22 of FIG. 9), and the duration length production section (26 of FIG. 9) produces duration length information in accordance with the thus determined modification contents.
  • a pitch pattern modification control section (31 of FIG. 9) determines modification contents to the pitch pattern based on the utterance contents, the duration length information from the duration length production section (26 of FIG. 9) and the phonological unit information from the phonological unit selection section (22 of FIG. 9), and the pitch pattern production section (27 of FIG. 9) produces pitch pattern information in accordance with the thus determined modification contents.
  • a phonological unit modification control section (32 of FIG. 9) determines modification contents to the phonological units based on the utterance contents, the duration length information from the duration length production section (26 of FIG. 9) and the pitch pattern information from the pitch pattern production section (27 of FIG. 9), and the phonological unit selection section (22 of FIG. 9) produces phonological unit information in accordance with the thus determined modification contents.
  • the speech synthesis apparatus may further include a shared information storage section (52 of FIG. 11).
  • the duration length production section (26 of FIG. 11) produces duration lengths based on information stored in the shared information storage section and writes the duration lengths into the shared information storage section.
  • the pitch pattern production section (27 of FIG. 11) produces a pitch pattern based on the information stored in the shared information storage section and writes the pitch pattern into the shared information storage section.
  • the phonological unit selection section selects phonological units based on the information stored in the shared information storage section and writes the phonological units into the shared information storage section.
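The shared information storage arrangement of FIG. 11 resembles a blackboard design: each section reads what it needs from one common store and writes its result back, instead of passing data directly to the next section. The Python sketch below illustrates only that data flow. The class `SharedStore`, the placeholder duration and pitch rules and the nearest-match unit selection cost are hypothetical assumptions, not the patent's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

VOWELS = {"a", "i", "u", "e", "o"}


@dataclass
class SharedStore:
    """Hypothetical stand-in for the shared information storage section (52 of FIG. 11)."""
    utterance: List[str]
    durations_ms: Optional[List[float]] = None
    pitch_hz: Optional[List[Optional[float]]] = None
    unit_indices: Optional[List[int]] = None


def duration_section(store: SharedStore) -> None:
    # Placeholder rule (assumption): 80 ms for vowels, 60 ms for other units.
    store.durations_ms = [80.0 if p in VOWELS else 60.0 for p in store.utterance]


def pitch_section(store: SharedStore) -> None:
    # Reads the durations from the store; placeholder contour (assumption):
    # 190 Hz on the first vowel, falling 15 Hz per vowel, no pitch on other units.
    if store.durations_ms is None:
        raise ValueError("duration lengths have not been written yet")
    pitch: List[Optional[float]] = []
    f0 = 190.0
    for p in store.utterance:
        if p in VOWELS:
            pitch.append(f0)
            f0 -= 15.0
        else:
            pitch.append(None)
    store.pitch_hz = pitch


def unit_selection_section(store: SharedStore,
                           database: Dict[int, Tuple[str, Optional[float], float]]) -> None:
    # For each position, choose the entry with the same symbol whose collected pitch and
    # duration are closest to the stored targets (the cost function is an assumption).
    if store.durations_ms is None or store.pitch_hz is None:
        raise ValueError("durations and pitch must be written first")
    chosen = []
    for sym, dur, f0 in zip(store.utterance, store.durations_ms, store.pitch_hz):
        def cost(index: int) -> float:
            _, c_f0, c_dur = database[index]
            pitch_cost = abs(f0 - c_f0) if f0 is not None and c_f0 is not None else 0.0
            return pitch_cost + abs(dur - c_dur)
        candidates = [k for k, (s, _, _) in database.items() if s == sym]
        chosen.append(min(candidates, key=cost))
    store.unit_indices = chosen


# Hypothetical inventory: index -> (symbol, collected pitch in Hz, collected duration in ms)
db = {1: ("a", 190.0, 80.0), 2: ("a", 176.0, 80.0), 3: ("s", None, 90.0)}
store = SharedStore(["a", "s", "a"])
duration_section(store)
pitch_section(store)
unit_selection_section(store, db)
print(store.unit_indices)  # [1, 3, 2]
```

Because every section communicates only through the store, the order of activation can be changed or a section re-run without rewiring the others.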
  • the speech synthesis apparatus shown in FIG. 1 includes a prosody production section 21, a phonological unit selection section 22, a prosody modification control section 23, a prosody modification section 24, a waveform production section 25, a phonological unit condition database 41 and a phonological unit database 42.
  • the prosody production section 21 receives utterance contents 11 as input and produces prosodic information 12.
  • the utterance contents 11 include a text and a phonetic symbol train to be uttered, index information representing a particular utterance text, and so forth.
  • the prosodic information 12 includes one or more (or all) of an accent position, a pause position, a pitch pattern and a duration length.
  • the phonological unit selection section 22 receives the utterance contents 11 and the prosodic information 12 produced by the prosody production section 21 as inputs, selects a suitable phonological unit sequence from the phonological units recorded in the phonological unit condition database 41, and outputs the selected sequence as phonological unit information 13.
  • the form of the phonological unit information 13 may differ significantly depending on the method employed by the waveform production section 25; here, a train of indices identifying the phonological units actually used, as seen in FIG. 2, is used as the phonological unit information 13.
  • FIG. 2 illustrates an example of an index train of phonological units selected by the phonological unit selection section 22 when the utterance contents are “aisatsu”.
  • FIG. 3 illustrates the contents of the phonological unit condition database 41 of the speech synthesis apparatus of FIG. 1.
  • in the phonological unit condition database 41, a symbol representing the phonological unit, the pitch frequency of the speech as collected, its duration length and its accent position are recorded in advance for each phonological unit provided in the speech synthesis apparatus.
  • the prosody modification control section 23 searches the phonological unit information 13 selected by the phonological unit selection section 22 for portions whose prosody requires modification. It then sends the location and contents of each modification to the prosody modification section 24, which modifies the prosodic information 12 from the prosody production section 21 based on the received information.
  • the prosody modification control section 23 determines whether modification to the prosodic information 12 is required in accordance with predetermined rules.
  • FIG. 4 illustrates the operation of the prosody modification control section 23 of the speech synthesis apparatus of FIG. 1, which is described below with reference to FIG. 4.
  • in the example of FIG. 4, the utterance contents are “aisatsu”. For the first phonological unit “a”, the pitch frequency produced by the prosody production section 21 is 190 Hz, the duration length is 80 msec, and the phonological unit index selected by the phonological unit selection section 22 is 1. Referring to index 1 of the phonological unit condition database 41 shows that the pitch frequency of the sound as collected is 190 Hz and its duration length is 80 msec. Since the conditions under which the speech was collected coincide with the conditions to be produced, no modification is performed.
  • for another phonological unit, the pitch frequency produced by the prosody production section 21 is 160 Hz and the duration length is 85 msec. The phonological unit index selected by the phonological unit selection section 22 is 81, for which the pitch frequency of the sound as collected was 163 Hz and the duration length was 85 msec. Since the duration lengths are equal, no modification of the duration length is required, but the pitch frequencies differ (160 Hz versus 163 Hz).
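The lookup in this example can be sketched as follows. The record fields mirror the items the text says the condition database stores (symbol, collected pitch, duration length, accent position), and indices 1 and 81 carry the collected values quoted above. The symbol “i” for index 81, the accent values and the helper `mismatches` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class UnitCondition:
    """One row of the phonological unit condition database 41 (cf. FIG. 3)."""
    symbol: str
    pitch_hz: Optional[float]  # None where no pitch frequency is defined
    duration_ms: float
    accent: int  # accent position; the values used below are placeholders


# Hypothetical excerpt of database 41. Collected pitch/duration for indices 1 and 81
# follow the text; the symbol of index 81 and all accent values are assumptions.
CONDITION_DB: Dict[int, UnitCondition] = {
    1: UnitCondition("a", 190.0, 80.0, 0),
    81: UnitCondition("i", 163.0, 85.0, 0),
}


def mismatches(target_pitch: Optional[float], target_duration: float, index: int,
               db: Dict[int, UnitCondition] = CONDITION_DB) -> List[str]:
    """List the prosodic parameters in which the target differs from the collected unit."""
    cond = db[index]
    diffs = []
    if (target_pitch is not None and cond.pitch_hz is not None
            and target_pitch != cond.pitch_hz):
        diffs.append("pitch")
    if target_duration != cond.duration_ms:
        diffs.append("duration")
    return diffs


print(mismatches(190.0, 80.0, 1))   # first unit "a": []
print(mismatches(160.0, 85.0, 81))  # second case: ['pitch']
```

Detecting a mismatch only identifies a candidate; whether it is actually modified is decided by the rules.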
  • FIG. 5 illustrates an example of the rules used by the prosody modification section 24 of the speech synthesis apparatus of FIG. 1.
  • Each rule consists of a rule number, a condition part and an action (in “if <condition> then <action>” format); if the condition is determined to be satisfied, the corresponding action is performed.
  • the pitch frequency mentioned above satisfies the condition part of rule 1 (the difference between the pitch to be produced for a voiced short vowel (a, i, u, e, o) and the pitch of the sound as collected is within 5 Hz) and therefore becomes an object of modification. The action of rule 1 is to change the pitch frequency to that of the collected sound, so the pitch frequency is modified to 163 Hz. Since the pitch of the phonological unit then need not be transformed unnecessarily, the quality of the synthetic sound is improved.
  • for a further phonological unit, the pitch frequency is not defined, and the duration length produced by the prosody production section 21 is 100 msec.
  • the duration length of the sound as collected is 90 msec. This duration length satisfies rule 2 of FIG. 5 and therefore becomes an object of modification, so the duration length is modified to 90 msec. Since the duration length then need not be transformed unnecessarily, the quality of the synthetic sound is improved.
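A rule table in the “if <condition> then <action>” format might be sketched as below. Rule 1 follows the condition stated in the text (voiced short vowel, pitch within 5 Hz of the collected pitch). The text does not state the condition of rule 2, so the 10 ms tolerance used here is purely an assumption chosen to reproduce the 100 msec to 90 msec example; the unit symbols are placeholders as well.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

SHORT_VOWELS = {"a", "i", "u", "e", "o"}


@dataclass(frozen=True)
class Target:
    symbol: str
    pitch_hz: Optional[float]  # None where no pitch frequency is defined
    duration_ms: float


@dataclass(frozen=True)
class Collected:
    pitch_hz: Optional[float]
    duration_ms: float


@dataclass(frozen=True)
class Rule:
    number: int
    condition: Callable[[Target, Collected], bool]
    action: Callable[[Target, Collected], Target]


def rule1_condition(t: Target, c: Collected) -> bool:
    # From the text: voiced short vowel whose target pitch is within 5 Hz of the collected pitch.
    return (t.symbol in SHORT_VOWELS and t.pitch_hz is not None
            and c.pitch_hz is not None and abs(t.pitch_hz - c.pitch_hz) <= 5.0)


def rule2_condition(t: Target, c: Collected) -> bool:
    # Assumption: rule 2's condition is not given in the text; a 10 ms tolerance is used here.
    return abs(t.duration_ms - c.duration_ms) <= 10.0


RULES: List[Rule] = [
    Rule(1, rule1_condition, lambda t, c: replace(t, pitch_hz=c.pitch_hz)),
    Rule(2, rule2_condition, lambda t, c: replace(t, duration_ms=c.duration_ms)),
]


def apply_rules(target: Target, collected: Collected, rules: List[Rule] = RULES) -> Target:
    """Apply every rule whose condition holds, in rule-number order."""
    for rule in rules:
        if rule.condition(target, collected):
            target = rule.action(target, collected)
    return target


# Placeholder symbols: "i" for the 160 Hz unit, "s" for the unit without a defined pitch.
print(apply_rules(Target("i", 160.0, 85.0), Collected(163.0, 85.0)))
# Target(symbol='i', pitch_hz=163.0, duration_ms=85.0)
print(apply_rules(Target("s", None, 100.0), Collected(None, 90.0)))
# Target(symbol='s', pitch_hz=None, duration_ms=90.0)
```

Keeping conditions and actions as separate callables lets new rules be appended to the table without touching `apply_rules`.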
  • the waveform production section 25 produces synthetic speech using the phonological unit database 42, based on the phonological unit information 13 and the prosodic information 12 as modified by the prosody modification section 24.
  • in the phonological unit database 42, speech element pieces for producing synthetic speech are registered, corresponding to the entries of the phonological unit condition database 41.
  • the modified speech synthesis apparatus of FIG. 6 differs from the speech synthesis apparatus of FIG. 1 in that, in place of the prosody production section 21, it includes a duration length production section 26 and a pitch pattern production section 27, which successively produce duration length information 15 and pitch pattern information 16, respectively, to form the prosodic information 12.
  • the duration length production section 26 produces duration lengths for the utterance contents 11 input to it. However, if a duration length has been designated for some phonological unit, the duration length production section 26 uses that designated duration length when producing the duration lengths for the entire utterance contents 11.
  • the pitch pattern production section 27 produces a pitch pattern for the utterance contents 11 input to it. However, if a pitch frequency has been designated for some phonological unit, the pitch pattern production section 27 uses that designated pitch frequency when producing the pitch pattern for the entire utterance contents 11.
  • the prosody modification control section 23 determines modification contents from the phonological unit information in the same manner as in the speech synthesis apparatus of FIG. 1, but sends them, when necessary, not to the prosody modification section 24 but to the duration length production section 26 and the pitch pattern production section 27.
  • when modification contents are sent to it from the prosody modification control section 23, the duration length production section 26 regenerates the duration length information in accordance with them. Thereafter, the operations of the pitch pattern production section 27, the phonological unit selection section 22 and the prosody modification control section 23 described above are repeated.
  • similarly, when modification contents are sent to it from the prosody modification control section 23, the pitch pattern production section 27 regenerates the pitch pattern information in accordance with them. Thereafter, the operations of the phonological unit selection section 22 and the prosody modification control section 23 are repeated. Once no further modification is necessary, the prosody modification control section 23 sends the prosodic information 12 received from the pitch pattern production section 27 to the waveform production section 25.
  • unlike the speech synthesis apparatus of FIG. 1, the present modified apparatus performs feedback control, so the prosody modification control section 23 also judges convergence. More particularly, it counts the number of modifications, and if that number exceeds a predetermined limit, it determines that no portion remains to be modified and sends the prosodic information 12 at that point to the waveform production section 25.
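The feedback loop of FIG. 6, including the convergence check by counting modifications, can be sketched as follows. All component models are toy assumptions: durations default to 80 ms, the pitch falls 15 Hz per unit, units are chosen by nearest collected pitch, and only a rule-1-style pitch correction is fed back. Duration feedback would pass through the `designated` argument of the hypothetical `produce_durations` in the same way.

```python
from typing import Dict, List, Tuple

# Hypothetical unit inventory: index -> (symbol, collected pitch in Hz, collected duration in ms)
DB: Dict[int, Tuple[str, float, float]] = {
    1: ("a", 190.0, 80.0),
    2: ("a", 172.0, 80.0),
    3: ("i", 163.0, 80.0),
}


def produce_durations(utterance: List[str], designated: Dict[int, float]) -> List[float]:
    # Duration length production section 26 (toy model): 80 ms unless a value is designated.
    return [designated.get(i, 80.0) for i in range(len(utterance))]


def produce_pitch(utterance: List[str], durations: List[float],
                  designated: Dict[int, float]) -> List[float]:
    # Pitch pattern production section 27 (toy model): falls 15 Hz per unit unless designated.
    # The durations mirror the data flow but are unused by this toy contour.
    return [designated.get(i, 190.0 - 15.0 * i) for i in range(len(utterance))]


def select_units(utterance: List[str], pitch: List[float]) -> List[int]:
    # Phonological unit selection section 22: nearest collected pitch with the same symbol.
    chosen = []
    for sym, f0 in zip(utterance, pitch):
        candidates = [k for k, (s, _, _) in DB.items() if s == sym]
        chosen.append(min(candidates, key=lambda k: abs(DB[k][1] - f0)))
    return chosen


def find_modifications(pitch: List[float], units: List[int]) -> Dict[int, float]:
    # Prosody modification control section 23: rule-1-style check, designate the collected
    # pitch when it is within 5 Hz of the target but not equal to it.
    mods = {}
    for i, (f0, k) in enumerate(zip(pitch, units)):
        collected = DB[k][1]
        if f0 != collected and abs(f0 - collected) <= 5.0:
            mods[i] = collected
    return mods


def feedback_synthesis(utterance: List[str], max_modifications: int = 3):
    designated_dur: Dict[int, float] = {}
    designated_pitch: Dict[int, float] = {}
    count = 0
    while True:
        durations = produce_durations(utterance, designated_dur)
        pitch = produce_pitch(utterance, durations, designated_pitch)
        units = select_units(utterance, pitch)
        mods = find_modifications(pitch, units)
        # Stop when converged or when the prescribed number of modifications is reached.
        if not mods or count >= max_modifications:
            return durations, pitch, units, count
        designated_pitch.update(mods)
        count += 1


print(feedback_synthesis(["a", "a"]))  # ([80.0, 80.0], [190.0, 172.0], [1, 2], 1)
```

The modification counter guarantees termination even if the components never agree, mirroring the prescribed-number check described in the text.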
  • the modified speech synthesis apparatus of FIG. 7 differs from the speech synthesis apparatus of FIG. 1 in that, like the modified apparatus of FIG. 6, it includes a duration length production section 26 and a pitch pattern production section 27 in place of the prosody production section 21, and further includes a duration length modification control section 29 for determining the contents of modification to the duration length information 15 produced by the duration length production section 26, and a duration length modification section 30 for modifying the duration length information 15 in accordance with the modification contents output from the duration length modification control section 29.
  • the operation of the duration length modification control section 29 of the present modified apparatus is described with reference to FIG. 8.
  • in the example of FIG. 8, the pitch frequency produced by the pitch pattern production section 27 for the phonological unit “a” is 190 Hz.
  • the duration length modification control section 29 holds predetermined duration length modification rules in if-then format, and the pitch frequency of 190 Hz mentioned above satisfies rule 1. Therefore, the duration length of the phonological unit “a” is modified to 85 msec.
  • for a phonological unit to which none of the duration length modification rules of the duration length modification control section 29 applies, no modification is performed. All the phonological units of the utterance contents 11 are checked in this manner to determine the modification contents to the duration length information 15.
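The check performed by the duration length modification control section 29, followed by its application in the duration length modification section 30, might look like this. The text gives only the outcome of rule 1 (190 Hz on “a” leads to 85 msec); the condition "pitch of at least 180 Hz on “a”" and the first-match policy are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (rule number, condition on (symbol, pitch in Hz), new duration in ms)
DurationRule = Tuple[int, Callable[[str, Optional[float]], bool], float]

# Hypothetical rule table; only rule 1's outcome is taken from the FIG. 8 example.
DURATION_RULES: List[DurationRule] = [
    (1, lambda sym, f0: sym == "a" and f0 is not None and f0 >= 180.0, 85.0),
]


def duration_modifications(symbols: List[str], pitch: List[Optional[float]],
                           rules: List[DurationRule] = DURATION_RULES) -> Dict[int, float]:
    """Section 29: check every phonological unit and return {position: new duration}."""
    mods: Dict[int, float] = {}
    for i, (sym, f0) in enumerate(zip(symbols, pitch)):
        for _number, condition, new_ms in rules:
            if condition(sym, f0):
                mods[i] = new_ms
                break  # first matching rule wins (assumption)
    return mods


def apply_duration_modifications(durations: List[float], mods: Dict[int, float]) -> List[float]:
    """Section 30: replace only the durations named in the modification contents."""
    return [mods.get(i, d) for i, d in enumerate(durations)]


symbols = ["a", "i", "s"]
pitch = [190.0, 170.0, None]
mods = duration_modifications(symbols, pitch)
print(mods)                                                    # {0: 85.0}
print(apply_duration_modifications([80.0, 80.0, 60.0], mods))  # [85.0, 80.0, 60.0]
```

Units with no applicable rule simply do not appear in `mods`, which matches the text's statement that they are not subject to modification.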
  • the modified speech synthesis apparatus of FIG. 9 differs from the speech synthesis apparatus of FIG. 1 in that, like the apparatus of FIG. 6, it includes a duration length production section 26 and a pitch pattern production section 27 in place of the prosody production section 21, and further includes a duration length modification control section 29, a pitch pattern modification control section 31 and a phonological unit modification control section 32.
  • the duration length modification control section 29 determines modification contents to the duration lengths based on the utterance contents 11, the pitch pattern information 16 and the phonological unit information 13, and the duration length production section 26 produces duration length information 15 in accordance with the thus determined modification contents.
  • the pitch pattern modification control section 31 determines modification contents to the pitch pattern based on the utterance contents 11, the duration length information 15 and the phonological unit information 13, and the pitch pattern production section 27 produces pitch pattern information 16 in accordance with the thus determined modification contents.
  • the phonological unit modification control section 32 determines modification contents to the phonological units based on the utterance contents 11, the duration length information 15 and the pitch pattern information 16, and the phonological unit selection section 22 produces phonological unit information 13 in accordance with the thus determined modification contents.
  • initially, the duration length modification control section 29 determines that no modification should be performed, and the duration length production section 26 produces duration lengths in accordance with the utterance contents 11.
  • since the phonological unit information 13 has not been produced yet, the pitch pattern modification control section 31 determines modification contents based on the duration length information 15 and the utterance contents 11, and the pitch pattern production section 27 produces pitch pattern information 16 in accordance with the thus determined modification contents.
  • the phonological unit modification control section 32 determines modification contents based on the utterance contents 11, the duration length information 15 and the pitch pattern information 16, and the phonological unit selection section 22 produces phonological unit information 13 in accordance with the thus determined modification contents, using the phonological unit condition database 41.
  • thereafter, whenever the duration length information 15, the pitch pattern information 16 or the phonological unit information 13 is updated, those of the duration length modification control section 29, the pitch pattern modification control section 31 and the phonological unit modification control section 32 that receive the updated information as input are activated to perform their respective operations.
  • when an end condition is satisfied, the waveform production section 25 produces a speech waveform 14.
  • the end condition may be, for example, that the total number of updates exceeds a predetermined value.
  • the present modified speech synthesis apparatus is different from the modified speech synthesis apparatus of FIG. 6 in that it does not include the prosody modification control section 23 but includes a control section 51 instead.
  • the control section 51 receives utterance contents 11 as an input thereto and sends the utterance contents 11 to the duration length production section 26 .
  • the duration length production section 26 produces duration length information 15 based on the utterance contents 11 and sends the duration length information 15 to the control section 51 .
  • the control section 51 then sends the utterance contents 11 and the duration length information 15 to the pitch pattern production section 27.
  • the pitch pattern production section 27 produces pitch pattern information 16 based on the utterance contents 11 and the duration length information 15 and sends the pitch pattern information 16 to the control section 51 .
  • the control section 51 then sends the utterance contents 11, duration length information 15 and pitch pattern information 16 to the phonological unit selection section 22, and the phonological unit selection section 22 produces phonological unit information 13 based on the utterance contents 11, duration length information 15 and pitch pattern information 16 and sends the phonological unit information 13 to the control section 51.
  • if any of the duration length information 15, pitch pattern information 16 and phonological unit information 13 changes, the control section 51 determines which information must be modified as a result of the change and sends the modification contents to the corresponding one of the duration length production section 26, pitch pattern production section 27 and phonological unit selection section 22 so that suitable modification may be performed for that information.
  • the criteria for the modification are similar to those in the speech synthesis apparatus described hereinabove.
  • if the control section 51 determines that no modification is necessary, it sends the duration length information 15, pitch pattern information 16 and phonological unit information 13 to the waveform production section 25, and the waveform production section 25 produces a speech waveform 14 based on the received duration length information 15, pitch pattern information 16 and phonological unit information 13.
  • the present modified speech synthesis apparatus is different from the speech synthesis apparatus of FIG. 10 in that it additionally includes a shared information storage section 52 .
  • the control section 51 instructs the duration length production section 26 , pitch pattern production section 27 and phonological unit selection section 22 to produce duration length information 15 , pitch pattern information 16 and phonological unit information 13 , respectively.
  • the thus produced duration length information 15 , pitch pattern information 16 and phonological unit information 13 are stored into the shared information storage section 52 by the duration length production section 26 , pitch pattern production section 27 and phonological unit selection section 22 , respectively.
  • when the control section 51 determines that no further modification is necessary, the waveform production section 25 reads out the duration length information 15, pitch pattern information 16 and phonological unit information 13 from the shared information storage section 52 and produces a speech waveform 14 based on them.
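The centralized control flow of FIGS. 10 and 11 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: `SharedStore`, `control_section`, the per-phone default tables and the 5 Hz pitch check are hypothetical stand-ins (the toy values echo the FIG. 4 walk-through for "a", "i", "s"), and the `max_rounds` cap is an assumed end condition that the text does not give for this variant.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Mods = Dict[str, Dict[int, float]]  # stage name -> {position: value to impose}


@dataclass
class SharedStore:
    """Hypothetical stand-in for the shared information storage section 52."""
    phones: List[str]                              # utterance contents 11
    durations: Optional[List[float]] = None        # duration length information 15 (ms)
    pitch: Optional[List[Optional[float]]] = None  # pitch pattern information 16 (Hz)
    units: Optional[List[int]] = None              # phonological unit information 13
    pins: Mods = field(default_factory=dict)       # accumulated modification contents


Stage = Callable[[SharedStore], None]


def control_section(store: SharedStore, stages: Dict[str, Stage],
                    find_mods: Callable[[SharedStore], Mods],
                    max_rounds: int = 10) -> int:
    """Sketch of control section 51: activate the stages in order, then route
    each required modification only to the stage that owns that information."""
    for name in ("duration", "pitch", "units"):
        stages[name](store)
    rounds = 0
    while rounds < max_rounds:  # assumed end condition
        mods = find_mods(store)
        if not mods:
            break  # nothing left to modify; the waveform stage reads the store
        for name, changes in mods.items():
            store.pins.setdefault(name, {}).update(changes)
            stages[name](store)  # each stage reads and writes the shared store
        rounds += 1
    return rounds


# Toy stages (assumed), using the FIG. 4 values.
RULE_DUR = {"a": 80.0, "i": 85.0, "s": 100.0}
RULE_F0 = {"a": 190.0, "i": 160.0, "s": None}
COLLECTED_F0 = {1: 190.0, 81: 163.0, 56: None}  # unit index -> pitch as collected
PICK = {"a": 1, "i": 81, "s": 56}


def make_durations(st: SharedStore) -> None:
    pins = st.pins.get("duration", {})
    st.durations = [pins.get(k, RULE_DUR[p]) for k, p in enumerate(st.phones)]


def make_pitch(st: SharedStore) -> None:
    pins = st.pins.get("pitch", {})
    st.pitch = [pins.get(k, RULE_F0[p]) for k, p in enumerate(st.phones)]


def make_units(st: SharedStore) -> None:
    st.units = [PICK[p] for p in st.phones]


def find_mods(st: SharedStore) -> Mods:
    # Snap a voiced pitch to the collected pitch when they differ by <= 5 Hz.
    mods: Mods = {}
    for k, (f0, u) in enumerate(zip(st.pitch, st.units)):
        ref = COLLECTED_F0[u]
        if f0 is not None and ref is not None and f0 != ref and abs(f0 - ref) <= 5.0:
            mods.setdefault("pitch", {})[k] = ref
    return mods


store = SharedStore(phones=["a", "i", "s"])
stages = {"duration": make_durations, "pitch": make_pitch, "units": make_units}
print(control_section(store, stages, find_mods), store.pitch)  # 1 [190.0, 163.0, None]
```

Routing each modification only to the stage that owns the information mirrors the role of the control section 51, and keeping all three kinds of information in one store mirrors the shared information storage section 52, from which the waveform production section 25 reads the final results.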

Abstract

The invention provides a speech synthesis apparatus which can produce synthetic speech of a high quality with reduced distortion. To this end, when synthetic speech is produced based on prosodic information and phonological unit information, the prosodic information is modified using the phonological unit information, and the duration length information and pitch pattern information of the phonological units in the prosodic information, and the phonological unit information, are modified with reference to one another. The speech synthesis apparatus includes a prosodic pattern production section for receiving utterance contents as an input thereto and producing a prosodic pattern, a phonological unit selection section for selecting phonological units based on the prosodic pattern, a prosody modification control section for searching the phonological unit information selected by the phonological unit selection section for a location for which modification to the prosodic pattern is required and outputting information of the location for the modification and contents of the modification, a prosody modification section for modifying the prosodic pattern based on the information of the location for the modification and the contents of the modification outputted from the prosody modification control section, and a waveform production section for producing synthetic speech based on the phonological unit information and the prosodic information modified by the prosody modification section using a phonological unit database.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech synthesis apparatus, and more particularly to an apparatus which performs speech synthesis by rule.
2. Description of the Related Art
Conventionally, in order to perform speech synthesis by rule, control parameters of synthetic speech are produced, and a speech waveform is produced based on the control parameters using an LSP (line spectrum pair) synthesis filter system, a formant synthesis system or a waveform editing system.
Control parameters of synthetic speech are roughly divided into phonological unit information and prosodic information. The phonological unit information is information regarding a list of phonological units used, and the prosodic information is information regarding a pitch pattern representative of intonation and accent and duration lengths representative of rhythm.
For the production of phonological unit information and prosodic information, a conventional method disclosed, for example, in Furui, “Digital Speech Processing”, p.146, FIGS. 7 and 6 (document 1), produces the phonological unit information and the prosodic information separately from each other.
Another known method, disclosed in Takahashi et al., “Speech Synthesis Software for a Personal Computer”, Collection of Papers of the 47th National Meeting of the Information Processing Society of Japan, pages 2-377 to 2-378 (document 2), produces the prosodic information first and then produces the phonological unit information based on the prosodic information. In that method, when the prosodic information is produced, the duration lengths are produced first and the pitch pattern afterward. An alternative method is also known in which the duration lengths and the pitch pattern information are produced independently of each other.
Further, as a method of improving the quality of synthetic speech after prosodic information and phonological unit information are produced, a method is proposed, for example, in Japanese Patent Laid-Open Application No. Hei 4-053998 wherein a signal for improving the quality of speech is generated based on phonological unit parameters.
Conventionally, when control parameters for speech synthesis by rule are produced, meta-information about the phonological units, such as phonemic representations or devocalization, is used to produce the prosodic information, but information about the phonological units actually used for synthesis is not.
For example, in a speech synthesis apparatus which produces a speech waveform by a waveform concatenation method, the duration or pitch frequency of the original speech differs for each phonological unit actually selected.
Consequently, a phonological unit actually used for synthesis is sometimes altered unnecessarily from its form as collected, which can give rise to audible distortion.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a speech synthesis apparatus which reduces distortion of synthetic speech.
It is another object of the present invention to provide a speech synthesis apparatus which can produce synthetic speech of a high quality.
In order to attain the objects described above, according to the present invention, when synthetic speech is produced based on prosodic information and phonological unit information, the prosodic information is modified using the phonological unit information. Specifically, the duration length information, the pitch pattern information and the phonological unit information are modified with reference to one another.
In particular, according to an aspect of the present invention, there is provided a speech synthesis apparatus, comprising prosodic pattern production means for producing a prosodic pattern, phonological unit selection means for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production means, and means for modifying the prosodic pattern based on the selected phonological units.
The speech synthesis apparatus is advantageous in that prosodic information can be modified based on phonological unit information, and consequently, synthetic speech with reduced distortion can be obtained by taking into consideration the conditions under which the phonological units were collected.
According to another aspect of the present invention, there is provided a speech synthesis apparatus, comprising prosodic pattern production means for producing a prosodic pattern, phonological unit selection means for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production means, and means for feeding back the phonological units selected by the phonological unit selection means to the prosodic pattern production means so that the prosodic pattern and the selected phonological units are modified repetitively.
The speech synthesis apparatus is advantageous in that, since the phonological unit information is fed back so that modification is performed repeatedly, synthetic speech with further reduced distortion can be obtained.
According to a further aspect of the present invention, there is provided a speech synthesis apparatus, comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern based on the duration lengths produced by the duration length production means, and means for feeding back the pitch pattern to the duration length production means so that the phonological unit duration lengths are modified.
The speech synthesis apparatus is advantageous in that duration lengths of phonological units can be modified based on a pitch pattern and synthetic speech of a high quality can be produced.
According to a still further aspect of the present invention, there is provided a speech synthesis apparatus, comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern, phonological unit selection means for selecting phonological units, first means for supplying the duration lengths produced by the duration length production means to the pitch pattern production means and the phonological unit selection means, second means for supplying the pitch pattern produced by the pitch pattern production means to the duration length production means and the phonological unit selection means, and third means for supplying the phonological units selected by the phonological unit selection means to the pitch pattern production means and the duration length production means, the duration lengths, the pitch pattern and the phonological units being modified by cooperative operations of the duration length production means, the pitch pattern production means and the phonological unit selection means.
The speech synthesis apparatus is advantageous in that the duration lengths and pitch pattern of the phonological units and the phonological unit information can be modified with reference to one another, so that synthetic speech of a high quality can be produced.
According to a yet further aspect of the present invention, there is provided a speech synthesis apparatus, comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern, phonological unit selection means for selecting phonological units, and control means for activating the duration length production means, the pitch pattern production means and the phonological unit selection means in this order and controlling the duration length production means, the pitch pattern production means and the phonological unit selection means so that at least one of the duration lengths produced by the duration length production means, the pitch pattern produced by the pitch pattern production means and the phonological units selected by the phonological unit selection means is modified by a corresponding one of the duration length production means, the pitch pattern production means and the phonological unit selection means.
The speech synthesis apparatus is advantageous in that, since modifications to the duration lengths and pitch pattern of the phonological units and to the phonological unit information are determined collectively by the single control means rather than independently of each other, synthetic speech of a high quality can be produced and the amount of calculation can be reduced.
The speech synthesis apparatus may be constructed such that it further comprises a shared information storage section, and the duration length production means produces duration lengths based on information stored in the shared information storage section and writes the duration lengths into the shared information storage section, the pitch pattern production means produces a pitch pattern based on the information stored in the shared information storage section and writes the pitch pattern into the shared information storage section, and the phonological unit selection means selects phonological units based on the information stored in the shared information storage section and writes the phonological units into the shared information storage section.
The speech synthesis apparatus is advantageous in that, since the information used by the respective means is shared among them, the calculation time can be reduced.
The above and other objects, features and advantages of the present invention will become apparent from the following description and the appended claims, taken in conjunction with the accompanying drawings in which like parts or elements are denoted by like reference symbols.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram showing a speech synthesis apparatus to which the present invention is applied;
FIG. 2 is a table illustrating an example of phonological unit information to be selected in the speech synthesis apparatus of FIG. 1;
FIG. 3 is a table schematically illustrating contents of a phonological unit condition database used in the speech synthesis apparatus of FIG. 1;
FIG. 4 is a diagrammatic view illustrating operation of a phonological unit modification section of the speech synthesis apparatus of FIG. 1;
FIG. 5 is a table illustrating an example of phonological unit modification rules used in the speech synthesis apparatus of FIG. 1;
FIG. 6 is a block diagram of a modification to the speech synthesis apparatus of FIG. 1;
FIG. 7 is a block diagram of another modification to the speech synthesis apparatus of FIG. 1;
FIG. 8 is a diagrammatic view illustrating operation of a duration length modification control section of the modified speech synthesis apparatus of FIG. 7; and
FIGS. 9 to 11 are block diagrams of different modifications to the speech synthesis apparatus of FIG. 1.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Before a preferred embodiment of the present invention is described, speech synthesis apparatus according to different aspects of the present invention are described in connection with elements of the preferred embodiment of the present invention described below.
A speech synthesis apparatus according to an aspect of the present invention includes a prosodic pattern production section (21 in FIG. 1) for receiving utterance contents such as a text and a phonetic symbol train to be uttered, index information representative of a particular utterance text and so forth as an input thereto and producing a prosodic pattern which includes one or more or all of an accent position, a pause position, a pitch pattern and a duration length, a phonological unit selection section (22 of FIG. 1) for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production section, a prosody modification control section (23 of FIG. 1) for searching the phonological unit information selected by the phonological unit selection section for a location for which modification to the prosodic pattern is required and outputting information of the location for the modification and contents of the modification, a prosody modification section (24 of FIG. 1) for modifying the prosodic pattern based on the information of the location for the modification and the contents of the modification outputted from the prosody modification control section, and a waveform production section (25 of FIG. 1) for producing synthetic speech based on the phonological unit information and the prosodic information modified by the prosody modification section using a phonological unit database (42 of FIG. 1).
A speech synthesis apparatus according to another aspect of the present invention includes a prosodic pattern production section for producing a prosodic pattern, and a phonological unit selection section for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production section (21 of FIG. 1), and feeds back contents of a location for modification regarding phonological units selected by the phonological unit selection section from a prosody modification control section (23 of FIG. 1) to the prosodic pattern production section so that the prosodic pattern and the selected phonological units are modified repetitively.
In the speech synthesis apparatus, the prosodic pattern production section for receiving utterance contents as an input thereto and producing a prosodic pattern based on the utterance contents includes a duration length production section (26 of FIG. 6) for producing duration lengths of phonological units and a pitch pattern production section (27 of FIG. 6) for producing a pitch pattern based on the duration lengths produced by the duration length production section. Further, the phonological unit selection section (22 of FIG. 6) selects phonological units based on the prosodic pattern produced by the pitch pattern production section. The prosody modification control section (23 of FIG. 6) searches the phonological unit information selected by the phonological unit selection section for a location for which modification to the prosodic pattern produced by the pitch pattern production section is required and, when modification is required, feeds back information of the contents of the modification to the duration length production section and/or the pitch pattern production section so that the duration lengths and the pitch pattern are modified by the duration length production section and the pitch pattern production section, respectively. Thus, the prosodic pattern and the selected phonological units are modified repetitively.
A speech synthesis apparatus according to a further aspect of the present invention includes a duration length production section (26 of FIG. 7) for producing duration lengths of phonological units, a pitch pattern production section (27 of FIG. 7) for producing a pitch pattern based on the duration lengths produced by the duration length production section, and a duration length modification control section (29 of FIG. 7) for feeding back the pitch pattern to the duration length production section so that the duration lengths of the phonological units are modified. More specifically, the duration length modification control section (29 of FIG. 7) determines modification contents for the duration length information produced by the duration length production section (26 of FIG. 7), and a duration length modification section (30 of FIG. 7) modifies the duration length information in accordance with the modification contents outputted from the duration length modification control section (29 of FIG. 7).
A speech synthesis apparatus according to a still further aspect of the present invention includes a duration length production section (26 of FIG. 9) for producing duration lengths of phonological units, a pitch pattern production section (27 of FIG. 9) for producing a pitch pattern, a phonological unit selection section (22 of FIG. 9) for selecting phonological units, a means (29 of FIG. 9) for supplying the duration lengths produced by the duration length production section (26 of FIG. 9) to the pitch pattern production section and the phonological unit selection section, another means (31 of FIG. 9) for supplying the pitch pattern produced by the pitch pattern production section to the duration length production section and the phonological unit selection section, and a further means (32 of FIG. 9) for supplying the phonological units selected by the phonological unit selection section to the pitch pattern production section and the duration length production section, the duration lengths, the pitch pattern and the phonological units being modified by cooperative operations of the duration length production section, the pitch pattern production section and the phonological unit selection section. More particularly, a duration length modification control section (29 of FIG. 9) determines modification contents for the duration lengths based on the utterance contents, the pitch pattern information from the pitch pattern production section (27 of FIG. 9) and the phonological unit information from the phonological unit selection section (22 of FIG. 9), and the duration length production section (26 of FIG. 9) produces duration length information in accordance with the thus determined modification contents. A pitch pattern modification control section (31 of FIG. 9) determines modification contents for the pitch pattern based on the utterance contents, the duration length information from the duration length production section (26 of FIG. 9) and the phonological unit information from the phonological unit selection section (22 of FIG. 9), and the pitch pattern production section (27 of FIG. 9) produces pitch pattern information in accordance with the thus determined modification contents. Further, a phonological unit modification control section (32 of FIG. 9) determines modification contents for the phonological units based on the utterance contents, the duration length information from the duration length production section (26 of FIG. 9) and the pitch pattern information from the pitch pattern production section (27 of FIG. 9), and the phonological unit selection section (22 of FIG. 9) produces phonological unit information in accordance with the thus determined modification contents.
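The FIG. 9 arrangement has no central controller: each modification control section reacts whenever information it reads is updated. The Python sketch below models that as a work queue. The `READS` table follows the inputs stated above; the toy rule tables (values from the FIG. 4 walk-through), the 5 Hz and 10 ms tolerances and the `max_updates` cap, standing in for an end condition such as a predetermined limit on the total number of updates, are assumptions for illustration only.

```python
from collections import deque

# Information each modification control section reads besides the utterance
# contents (FIG. 9): 29 -> duration, 31 -> pitch, 32 -> units.
READS = {
    "duration": ("pitch", "units"),
    "pitch": ("duration", "units"),
    "units": ("duration", "pitch"),
}

# Toy data (assumed): unit index -> (pitch Hz, duration ms) as collected.
DB = {1: (190.0, 80.0), 81: (163.0, 85.0), 56: (None, 90.0)}
PICK = {"a": 1, "i": 81, "s": 56}
RULE_DUR = {"a": 80.0, "i": 85.0, "s": 100.0}
RULE_F0 = {"a": 190.0, "i": 160.0, "s": None}


def duration_section(st):
    out = []
    for k, p in enumerate(st["phones"]):
        d = RULE_DUR[p]
        if st["units"] is not None:
            f0, ref = DB[st["units"][k]]
            if f0 is None and abs(d - ref) <= 10.0:  # assumed voiceless tolerance
                d = ref
        out.append(d)
    return out


def pitch_section(st):
    out = []
    for k, p in enumerate(st["phones"]):
        f = RULE_F0[p]
        if f is not None and st["units"] is not None:
            ref = DB[st["units"][k]][0]
            if ref is not None and abs(f - ref) <= 5.0:  # snap to collected pitch
                f = ref
        out.append(f)
    return out


def unit_section(st):
    return [PICK[p] for p in st["phones"]]


SECTIONS = {"duration": duration_section, "pitch": pitch_section, "units": unit_section}


def run_cooperative(phones, max_updates=20):
    st = {"phones": phones, "duration": None, "pitch": None, "units": None}
    pending = deque()
    for name in ("duration", "pitch", "units"):  # initial production order
        st[name] = SECTIONS[name](st)
        pending.append(name)
    updates = 0
    while pending and updates < max_updates:  # assumed end condition
        changed = pending.popleft()
        for reader, inputs in READS.items():
            if changed in inputs:  # activate sections that read the update
                new = SECTIONS[reader](st)
                if new != st[reader]:
                    st[reader] = new
                    pending.append(reader)
                    updates += 1
    return st, updates


state, n_updates = run_cooperative(["a", "i", "s"])
print(state["duration"], state["pitch"], n_updates)  # [80.0, 85.0, 90.0] [190.0, 163.0, None] 2
```

With these toy values the pitch of “i” settles at 163 Hz and the duration of “s” at 90 ms after two updates; after that no section changes its output and the queue drains.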
The speech synthesis apparatus may further include a shared information storage section (52 of FIG. 11). In this instance, the duration length production section (26 of FIG. 11) produces duration lengths based on information stored in the shared information storage section and writes the duration lengths into the shared information storage section. The pitch pattern production section (27 of FIG. 11) produces a pitch pattern based on the information stored in the shared information storage section and writes the pitch pattern into the shared information storage section. Further, the phonological unit selection section (22 of FIG. 11) selects phonological units based on the information stored in the shared information storage section and writes the phonological units into the shared information storage section.
Referring now to FIG. 1, there is shown a speech synthesis apparatus to which the present invention is applied. The speech synthesis apparatus shown includes a prosody production section 21, a phonological unit selection section 22, a prosody modification control section 23, a prosody modification section 24, a waveform production section 25, a phonological unit condition database 41 and a phonological unit database 42.
The prosody production section 21 receives utterance contents 11 as an input thereto and produces prosodic information 12. The utterance contents 11 include a text and a phonetic symbol train to be uttered, index information representative of a particular utterance text and so forth. The prosodic information 12 includes one or more or all of an accent position, a pause position, a pitch pattern and a duration length.
The phonological unit selection section 22 receives the utterance contents 11 and the prosodic information 12 produced by the prosody production section 21 as inputs thereto, selects a suitable phonological unit sequence from the phonological units recorded in the phonological unit condition database 41 and outputs the selected sequence as phonological unit information 13.
The phonological unit information 13 may differ significantly depending on the method employed by the waveform production section 25. Here, however, a train of indices representing the phonological units actually used, as seen in FIG. 2, is used as the phonological unit information 13. FIG. 2 illustrates an example of an index train of phonological units selected by the phonological unit selection section 22 when the utterance contents are “aisatsu”.
FIG. 3 illustrates contents of the phonological unit condition database 41 of the speech synthesis apparatus of FIG. 1. Referring to FIG. 3, in the phonological unit condition database 41, information regarding a symbol representative of a phonological unit, a pitch frequency of a speech as collected, a duration length and an accent position is recorded in advance for each phonological unit provided in the speech synthesis apparatus.
Referring back to FIG. 1, the prosody modification control section 23 searches the phonological unit information 13 selected by the phonological unit selection section 22 for a portion for which modification in prosody is required. Then, the prosody modification control section 23 sends information of the location for modification and contents of the modification to the prosody modification section 24, and the prosody modification section 24 modifies the prosodic information 12 from the prosody production section 21 based on the received information.
The prosody modification control section 23 determines whether modification to the prosodic information 12 is required in accordance with predetermined rules. FIG. 4 illustrates the operation of the prosody modification control section 23 of the speech synthesis apparatus of FIG. 1, and this operation is described below with reference to FIG. 4.
From FIG. 4, it can be seen that the utterance contents are “aisatsu”, and with regard to the first phonological unit “a” of the utterance contents, the pitch frequency produced by the prosody production section 21 is 190 Hz and the duration length is 80 msec. Further, with regard to the same first phonological unit “a”, the phonological unit index selected by the phonological unit selection section 22 is 1. Thus, by referring to the index 1 of the phonological unit condition database 41, it can be seen that the pitch frequency of the sound as collected is 190 Hz, and the duration length of the sound as collected is 80 msec. In this instance, since the conditions when the speech was collected and the conditions to be produced actually coincide with each other, no modification is performed.
With regard to the next phonological unit “i”, the pitch frequency produced by the prosody production section 21 is 160 Hz, and the duration length is 85 msec. Since the phonological unit index selected by the phonological unit selection section 22 is 81, the pitch frequency of the sound as collected is 163 Hz and its duration length is 85 msec. In this instance, since the duration lengths are equal, no modification of the duration is required, but the pitch frequencies differ.
FIG. 5 illustrates an example of the rules used by the prosody modification section 24 of the speech synthesis apparatus of FIG. 1. Each rule consists of a rule number, a condition part and an action (in an if <condition> then <action> format), and when a condition is satisfied, the corresponding action is performed. Referring to FIG. 5, the pitch frequency mentioned above satisfies the condition part of rule 1 (the difference between the pitch to be produced for a voiced short vowel (a, i, u, e, o) and the pitch of the sound as collected is within 5 Hz) and is therefore subject to modification (the action is to change the pitch frequency to that of the collected sound); consequently, the pitch frequency is modified to 163 Hz. Since the pitch frequency is thus not transformed unnecessarily, the quality of the synthetic sound is improved.
Referring back to FIG. 4, the next phonological unit “s” is a voiceless sound, so its pitch frequency is not defined, and the duration length produced by the prosody production section 21 is 100 msec. Since the phonological unit index selected by the phonological unit selection section 22 is 56, the duration length of the sound as collected is 90 msec. This duration length satisfies rule 2 of FIG. 5 and is therefore subject to modification; consequently, the duration length is modified to 90 msec. Since the duration length is thus not transformed unnecessarily, the quality of the synthetic sound is improved.
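The if-then rules of FIG. 5 can be expressed as a small rule table. The Python sketch below assumes the FIG. 4 values for the first three units of “aisatsu”. Rule 1 follows the condition stated above; the text does not give the condition of rule 2, so the 10 ms tolerance on voiceless units used here is purely an assumption chosen to reproduce the “s” example. `Slot`, `Rule` and `apply_rules` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SHORT_VOWELS = {"a", "i", "u", "e", "o"}


@dataclass
class Slot:
    """One phonological unit: target prosody vs. the selected unit as collected."""
    phone: str
    target_f0: Optional[float]     # Hz from prosody production section 21 (None = voiceless)
    target_dur: float              # ms from prosody production section 21
    collected_f0: Optional[float]  # Hz recorded in phonological unit condition database 41
    collected_dur: float           # ms recorded in phonological unit condition database 41


@dataclass
class Rule:
    number: int
    condition: Callable[[Slot], bool]
    action: Callable[[Slot], None]


def snap_f0(s: Slot) -> None:
    s.target_f0 = s.collected_f0


def snap_dur(s: Slot) -> None:
    s.target_dur = s.collected_dur


RULES = [
    # Rule 1 (from the text): voiced short vowel, pitch within 5 Hz of the collected
    # pitch. The "0 <" part skips units that already match, so they are not reported.
    Rule(1, lambda s: s.phone in SHORT_VOWELS
         and s.target_f0 is not None and s.collected_f0 is not None
         and 0 < abs(s.target_f0 - s.collected_f0) <= 5.0, snap_f0),
    # Rule 2: condition NOT given in the text; a 10 ms tolerance on voiceless
    # units is an ASSUMPTION.
    Rule(2, lambda s: s.target_f0 is None
         and 0 < abs(s.target_dur - s.collected_dur) <= 10.0, snap_dur),
]


def apply_rules(slots: List[Slot]) -> List[Tuple[int, int]]:
    """Apply every matching rule; return (position, rule number) pairs that fired."""
    fired = []
    for pos, slot in enumerate(slots):
        for rule in RULES:
            if rule.condition(slot):
                rule.action(slot)
                fired.append((pos, rule.number))
    return fired


# FIG. 4 values for units 1 ("a"), 81 ("i") and 56 ("s").
slots = [
    Slot("a", 190.0, 80.0, 190.0, 80.0),
    Slot("i", 160.0, 85.0, 163.0, 85.0),
    Slot("s", None, 100.0, None, 90.0),
]
print(apply_rules(slots))  # [(1, 1), (2, 2)]
```

Keeping the rules as data (number, condition, action) rather than nested conditionals makes the rule set easy to extend, which matches the layout described for FIG. 5.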
Referring back to FIG. 1, the waveform production section 25 produces synthetic speech based on the phonological unit information 13 and the prosodic information 12 modified by the prosody modification section 24 using the phonological unit database 42.
In the phonological unit database 42, speech element pieces for production of synthetic speech corresponding to the phonological unit condition database 41 are registered.
Referring now to FIG. 6, there is shown a modification to the speech synthesis apparatus described hereinabove with reference to FIG. 1. The modified speech synthesis apparatus differs from that of FIG. 1 in that, in place of the prosody production section 21, it includes a duration length production section 26 and a pitch pattern production section 27, which successively produce duration length information 15 and pitch pattern information 16, respectively, to form the prosodic information 12.
The duration length production section 26 produces duration lengths for the utterance contents 11 inputted thereto. If a duration length has been designated for some phonological unit, however, the duration length production section 26 uses that designated duration length when producing the duration lengths for the entire utterance contents 11.
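A minimal sketch of duration production that honours designated values, assuming a simple per-unit lookup table. The table `DEFAULT_DURATION_MS` and the function `produce_durations` are hypothetical; the text does not say how undesignated durations are computed, and a practical system would use a trained duration model.

```python
from typing import Dict, List, Optional

# Hypothetical default durations in msec per phonological unit; a real
# duration length production section would derive these from a model.
DEFAULT_DURATION_MS: Dict[str, float] = {
    "a": 90.0, "i": 85.0, "u": 80.0, "e": 85.0, "o": 90.0,
    "s": 100.0, "ts": 110.0,
}


def produce_durations(units: List[str],
                      designated: Optional[Dict[int, float]] = None) -> List[float]:
    """Return one duration per unit; designated[pos] overrides the default.

    Positions (not labels) are used as keys because the same unit, such as
    "a" in "a i s a ts u", can occur more than once.
    """
    designated = designated or {}
    return [designated.get(pos, DEFAULT_DURATION_MS[unit])
            for pos, unit in enumerate(units)]
```

Keying designations by position lets a modification target the first “a” without affecting the second.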
The pitch pattern production section 27 produces a pitch pattern for the utterance contents 11 inputted thereto. If a pitch frequency has been designated for some phonological unit, however, the pitch pattern production section 27 uses that designated pitch frequency when producing the pitch pattern for the entire utterance contents 11.
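One way a designated pitch could shape the whole pattern is sketched below. The linear declination from `start_hz` to `end_hz`, the mean-offset shift and the `VOICELESS` set are all assumptions made for illustration; the text only states that the designated pitch frequency is used when producing the pattern for the entire utterance.

```python
from typing import Dict, List, Optional

VOICELESS = {"s", "ts"}  # hypothetical: units treated as having no pitch


def produce_pitch_pattern(units: List[str], durations_ms: List[float],
                          designated: Optional[Dict[int, float]] = None,
                          start_hz: float = 200.0,
                          end_hz: float = 150.0) -> List[Optional[float]]:
    """Return one pitch per unit (None for voiceless units).

    Baseline: a straight declination from start_hz to end_hz, sampled at the
    midpoint of each unit. Designated pitches are kept exactly, and the rest of
    the contour is shifted by their mean offset from the baseline.
    """
    designated = designated or {}
    total = sum(durations_ms)
    baseline: List[float] = []
    elapsed = 0.0
    for d in durations_ms:
        mid = (elapsed + d / 2) / total          # relative position 0..1
        baseline.append(start_hz + (end_hz - start_hz) * mid)
        elapsed += d
    offsets = [designated[p] - baseline[p] for p in designated]
    shift = sum(offsets) / len(offsets) if offsets else 0.0
    pattern: List[Optional[float]] = []
    for pos, unit in enumerate(units):
        if unit in VOICELESS:
            pattern.append(None)
        elif pos in designated:
            pattern.append(designated[pos])
        else:
            pattern.append(baseline[pos] + shift)
    return pattern
```

For two 100 msec vowels the baseline is 187.5 Hz and 162.5 Hz; designating 197.5 Hz for the first raises the second to 172.5 Hz.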
When necessary, the prosody modification control section 23 sends the modification contents, which it determines from the phonological unit information in the same manner as in the speech synthesis apparatus of FIG. 1, not to the prosody modification section 24 but to the duration length production section 26 and the pitch pattern production section 27.
When modification contents are sent to it from the prosody modification control section 23, the duration length production section 26 re-produces the duration length information in accordance with them. Thereafter, the operations of the pitch pattern production section 27, phonological unit selection section 22 and prosody modification control section 23 described above are repeated.
When modification contents are sent to it from the prosody modification control section 23, the pitch pattern production section 27 re-produces the pitch pattern information in accordance with them. Thereafter, the operations of the phonological unit selection section 22 and the prosody modification control section 23 are repeated. Once no further modification is necessary, the prosody modification control section 23 sends the prosodic information 12 received from the pitch pattern production section 27 to the waveform production section 25.
Unlike the speech synthesis apparatus of FIG. 1, the present modified speech synthesis apparatus performs feedback control, so the prosody modification control section 23 must judge convergence. More particularly, it counts the number of modifications, and if this count exceeds a prescribed number determined in advance, it determines that no portion remains to be modified and sends the prosodic information 12 at that point to the waveform production section 25.
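The feedback control of FIG. 6, including the modification counter used to judge convergence, can be sketched as a loop over four pluggable stages. The function name `feedback_synthesis`, its parameters and the default limit `max_modifications=5` are hypothetical; the stages are passed in as callables so the sketch stays independent of any particular duration, pitch or unit selection method.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Mods = Dict[int, float]  # position -> designated value


def feedback_synthesis(
    units: List[str],
    produce_durations: Callable[[List[str], Mods], List[float]],
    produce_pitch: Callable[[List[str], List[float], Mods], List[Optional[float]]],
    select_units: Callable[[List[str], List[float], List[Optional[float]]], Any],
    find_modifications: Callable[[List[str], List[float], List[Optional[float]], Any],
                                 Tuple[Mods, Mods]],
    max_modifications: int = 5,
) -> Tuple[List[float], List[Optional[float]], Any, int]:
    duration_designated: Mods = {}
    pitch_designated: Mods = {}
    count = 0
    while True:
        durations = produce_durations(units, duration_designated)
        pitches = produce_pitch(units, durations, pitch_designated)
        selection = select_units(units, durations, pitches)
        duration_mods, pitch_mods = find_modifications(units, durations, pitches, selection)
        if not duration_mods and not pitch_mods:
            break                      # converged: nothing left to modify
        count += 1
        if count > max_modifications:
            break                      # limit exceeded: use the current prosody
        duration_designated.update(duration_mods)
        pitch_designated.update(pitch_mods)
    return durations, pitches, selection, count
```

The counter guarantees termination even when the modification checks keep requesting changes, which is the role the prescribed number plays in the text.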
Referring now to FIG. 7, there is shown another modification to the speech synthesis apparatus described hereinabove with reference to FIG. 1. The present modified speech synthesis apparatus differs from that of FIG. 1 in that, in place of the prosody production section 21, it includes a duration length production section 26 and a pitch pattern production section 27, as in the modified speech synthesis apparatus of FIG. 6. It further includes a duration length modification control section 29, which determines the contents of modification to the duration length information produced by the duration length production section 26, and a duration length modification section 30, which modifies the duration length information 15 in accordance with the modification contents outputted from the duration length modification control section 29.
Operation of the duration length modification control section 29 of the present modified speech synthesis apparatus is described with reference to FIG. 8. With regard to the first phonological unit “a” of the utterance contents “a i s a ts u”, the pitch frequency produced by the pitch pattern production section 27 is 190 Hz.
The duration length modification control section 29 holds predetermined duration length modification rules in if-then format, and the pitch frequency of 190 Hz mentioned above satisfies rule 1. The duration length of the phonological unit “a” is therefore modified to 85 msec.
For the next phonological unit “i”, the duration length modification control section 29 has no applicable duration length modification rule, so this unit is not subject to modification. In this manner, every phonological unit of the utterance contents 11 is checked to determine whether modification is required, and the modification contents to the duration length information 15 are thereby determined.
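The rule check of FIG. 8 can be sketched as a scan over every unit against a rule list. The text gives only the outcome of rule 1 (190 Hz on “a” leads to 85 msec), so its condition here (“a” with a pitch of at least 180 Hz) is a hypothetical stand-in, as are the names `DURATION_RULES` and `duration_modifications`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (rule number, condition on (unit, pitch_hz), new duration in msec)
DurationRule = Tuple[int, Callable[[str, Optional[float]], bool], float]

DURATION_RULES: List[DurationRule] = [
    # Rule 1 (condition ASSUMED): a high-pitched "a" (>= 180 Hz) becomes 85 msec
    (1, lambda unit, pitch: unit == "a" and pitch is not None and pitch >= 180.0, 85.0),
]


def duration_modifications(units: List[str],
                           pitches: List[Optional[float]]) -> Dict[int, float]:
    """Check every unit; return {position: new duration} for matched units."""
    mods: Dict[int, float] = {}
    for pos, (unit, pitch) in enumerate(zip(units, pitches)):
        for _number, condition, new_duration in DURATION_RULES:
            if condition(unit, pitch):
                mods[pos] = new_duration
                break  # first matching rule wins
    return mods
```

Units with no matching rule, such as “i” in the example, simply do not appear in the returned modification contents.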
Referring now to FIG. 9, there is shown a further modification to the speech synthesis apparatus described hereinabove with reference to FIG. 1. The present modified speech synthesis apparatus differs from that of FIG. 1 in that, in place of the prosody production section 21, it includes a duration length production section 26 and a pitch pattern production section 27, as in the modified speech synthesis apparatus of FIG. 6, and further includes a duration length modification control section 29, a pitch pattern modification control section 31 and a phonological unit modification control section 32. The duration length modification control section 29 determines modification contents to duration lengths based on the utterance contents 11, pitch pattern information 16 and phonological unit information 13, and the duration length production section 26 produces duration length information 15 in accordance with those modification contents.
The pitch pattern modification control section 31 determines modification contents to a pitch pattern based on the utterance contents 11, duration length information 15 and phonological unit information 13, and the pitch pattern production section 27 produces pitch pattern information 16 in accordance with the thus determined modification contents.
The phonological unit modification control section 32 determines modification contents to phonological units based on the utterance contents 11, duration length information 15 and pitch pattern information 16, and the phonological unit selection section 22 produces phonological unit information 13 in accordance with the thus determined modification contents.
When the utterance contents 11 are first provided to the modified speech synthesis apparatus of FIG. 9, since the duration length information 15, pitch pattern information 16 and phonological unit information 13 are not produced as yet, the duration length modification control section 29 determines that no modification should be performed, and the duration length production section 26 produces duration lengths in accordance with the utterance contents 11.
Then, since the phonological unit information 13 has not yet been produced, the pitch pattern modification control section 31 determines modification contents based on the duration length information 15 and the utterance contents 11, and the pitch pattern production section 27 produces pitch pattern information 16 in accordance with the thus determined modification contents.
Thereafter, the phonological unit modification control section 32 determines modification contents based on the utterance contents 11, duration length information 15 and pitch pattern information 16, and the phonological unit selection section 22 uses the phonological unit condition database 41 to produce phonological unit information 13 in accordance with the thus determined modification contents.
Thereafter, each time a modification is performed, the corresponding duration length information 15, pitch pattern information 16 or phonological unit information 13 is updated, and whichever of the duration length modification control section 29, pitch pattern modification control section 31 and phonological unit modification control section 32 receives the updated information as an input is activated to perform its operation.
Then, when the duration length information 15, pitch pattern information 16 and phonological unit information 13 are no longer updated, or when an end condition defined in advance is satisfied, the waveform production section 25 produces a speech waveform 14.
The end condition may be, for example, that the total number of updating times exceeds a value determined in advance.
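The mutual updating of FIG. 9 and its end condition can be sketched as repeated passes over three producers that share a state. In this hypothetical sketch each producer combines a modification control section with its production section, `None` stands for information not produced as yet, and `max_updates=20` is an arbitrary stand-in for the value determined in advance.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Optional[Any]]
Producer = Callable[[Any, State], Any]


def run_mutual_updates(utterance: Any, producers: Dict[str, Producer],
                       max_updates: int = 20) -> State:
    """Re-run each producer against the shared state until nothing changes.

    `producers` is ordered (duration, pitch, units in FIG. 9); each one sees
    None for information that has not been produced yet.
    """
    state: State = {name: None for name in producers}
    updates = 0
    while True:
        changed = False
        for name, produce in producers.items():
            new_value = produce(utterance, state)
            if new_value != state[name]:
                state[name] = new_value
                updates += 1
                changed = True
                if updates > max_updates:
                    return state       # end condition: too many updates
        if not changed:
            return state               # no information was updated in this pass
```

The two exits mirror the two stopping cases in the text: no further updating, or the total number of updates exceeding the preset value.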
Referring now to FIG. 10, there is shown a modification to the modified speech synthesis apparatus described hereinabove with reference to FIG. 6. The present modified speech synthesis apparatus differs from the modified speech synthesis apparatus of FIG. 6 in that it includes a control section 51 in place of the prosody modification control section 23. The control section 51 receives utterance contents 11 as an input and sends them to the duration length production section 26, which produces duration length information 15 based on the utterance contents 11 and sends the duration length information 15 to the control section 51.
Then, the control section 51 sends the utterance contents 11 and the duration length information 15 to the pitch pattern production section 27. The pitch pattern production section 27 produces pitch pattern information 16 based on the utterance contents 11 and the duration length information 15 and sends the pitch pattern information 16 to the control section 51.
Then, the control section 51 sends the utterance contents 11, duration length information 15 and pitch pattern information 16 to the phonological unit selection section 22, and the phonological unit selection section 22 produces phonological unit information 13 based on the utterance contents 11, duration length information 15 and pitch pattern information 16 and sends the phonological unit information 13 to the control section 51.
Whenever any of the duration length information 15, pitch pattern information 16 and phonological unit information 13 changes, the control section 51 identifies the information that must be modified as a result of the change and sends modification contents to the corresponding one of the duration length production section 26, pitch pattern production section 27 and phonological unit selection section 22 so that the information is modified appropriately. The criteria for the modification are similar to those of the speech synthesis apparatuses described hereinabove.
If the control section 51 determines that no modification is necessary, it sends the duration length information 15, pitch pattern information 16 and phonological unit information 13 to the waveform production section 25, and the waveform production section 25 produces a speech waveform 14 based on the thus received duration length information 15, pitch pattern information 16 and phonological unit information 13.
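The centralized control of FIG. 10 can be sketched as a function that runs the three producers in order and then dispatches modification requests. Two points are assumptions rather than statements from the text: when a piece of information is modified, everything produced after it in the duration, pitch, units order is regenerated, and `max_rounds` is a safeguard added here. The names `control_section`, `check` and `ORDER` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

ORDER = ["duration", "pitch", "units"]  # production order used in FIG. 10

# (information so far, modification contents or None) -> new information
Producer = Callable[[Dict[str, Any], Optional[Any]], Any]


def control_section(producers: Dict[str, Producer],
                    check: Callable[[Dict[str, Any]], Dict[str, Any]],
                    max_rounds: int = 10) -> Dict[str, Any]:
    info: Dict[str, Any] = {}
    for name in ORDER:                                 # initial production pass
        info[name] = producers[name](info, None)
    for _ in range(max_rounds):
        requests = check(info)                         # {name: modification contents}
        if not requests:
            break                                      # ready for waveform production
        first = min(ORDER.index(name) for name in requests)
        for name in ORDER[first:]:                     # redo it and everything downstream
            info[name] = producers[name](info, requests.get(name))
    return info
```

Unlike the FIG. 9 sketch, the producers here never call each other or inspect changes themselves; the single `check` callable plays the role of the control section's judgement.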
Referring now to FIG. 11, there is shown a modification to the modified speech synthesis apparatus described hereinabove with reference to FIG. 10. The present modified speech synthesis apparatus is different from the speech synthesis apparatus of FIG. 10 in that it additionally includes a shared information storage section 52.
The control section 51 instructs the duration length production section 26, pitch pattern production section 27 and phonological unit selection section 22 to produce duration length information 15, pitch pattern information 16 and phonological unit information 13, respectively. The duration length production section 26, pitch pattern production section 27 and phonological unit selection section 22 store the information they produce into the shared information storage section 52. When the control section 51 determines that no further modification is necessary, the waveform production section 25 reads out the duration length information 15, pitch pattern information 16 and phonological unit information 13 from the shared information storage section 52 and produces a speech waveform 14 based on them.
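The shared information storage section 52 behaves like a simple blackboard that producers write to and the waveform stage reads from. The sketch below is hypothetical: `SharedInformationStorage` and `toy_waveform` are illustrative names, and the waveform stage is a toy (a sine wave at each unit's pitch, silence for voiceless units) that ignores the phonological unit information, whereas the apparatus described above builds speech from the element pieces in the phonological unit database 42.

```python
import math
from typing import Any, Dict, List, Optional


class SharedInformationStorage:
    """Minimal key-value blackboard shared by all sections (hypothetical API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self._data[key] = value

    def read(self, key: str) -> Any:
        return self._data[key]


def toy_waveform(store: SharedInformationStorage,
                 sample_rate: int = 16000) -> List[float]:
    """Toy stand-in for waveform production: reads prosody from the store."""
    durations: List[float] = store.read("duration")
    pitches: List[Optional[float]] = store.read("pitch")
    samples: List[float] = []
    for dur_ms, pitch in zip(durations, pitches):
        n = round(dur_ms * sample_rate / 1000)
        if pitch is None:
            samples.extend([0.0] * n)      # voiceless: silence in this toy
        else:
            samples.extend(math.sin(2 * math.pi * pitch * k / sample_rate)
                           for k in range(n))
    return samples
```

Because every section reads and writes through the store, the control section only has to coordinate when each section runs, not pass the information between them.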
While a preferred embodiment of the present invention has been described using specific terms, such description is for illustrative purposes only, and it is to be understood that changes and variations may be made without departing from the spirit or scope of the following claims.

Claims (1)

What is claimed is:
1. A speech synthesis apparatus, comprising:
prosodic pattern production means for receiving utterance contents as an input thereto and producing a prosodic pattern based on the inputted utterance contents;
phonological unit selection means for selecting phonological units based on the prosodic pattern produced by said prosodic pattern production means;
prosody modification control means for searching the phonological unit information selected by said phonological unit selection means for a location for which modification to the prosodic pattern produced by said prosodic pattern production means is required and outputting, when modification is required, information of the location for the modification and contents of the modification;
prosody modification means for modifying the prosodic pattern produced by said prosodic pattern production means based on the information of the location for the modification and the contents of the modification outputted from said prosody modification control means; and
waveform production means for producing synthetic speech based on the phonological unit information and the prosodic information modified by said prosody modification means.
US09/325,544 1998-06-05 1999-06-04 Speech synthesis apparatus Expired - Fee Related US6405169B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP15702198A JP3180764B2 (en) 1998-06-05 1998-06-05 Speech synthesizer
JP10-157021 1998-06-05

Publications (1)

Publication Number Publication Date
US6405169B1 2002-06-11

Family

ID=15640458

Country Status (2)

Country Link
US (1) US6405169B1 (en)
JP (1) JP3180764B2 (en)

Cited By (134)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010047259A1 (en) * 2000-03-31 2001-11-29 Yasuo Okutani Speech synthesis apparatus and method, and storage medium
US20030158721A1 (en) * 2001-03-08 2003-08-21 Yumiko Kato Prosody generating device, prosody generating method, and program
US6625575B2 (en) * 2000-03-03 2003-09-23 Oki Electric Industry Co., Ltd. Intonation control method for text-to-speech conversion
US20040024600A1 (en) * 2002-07-30 2004-02-05 International Business Machines Corporation Techniques for enhancing the performance of concatenative speech synthesis
US6778962B1 (en) * 1999-07-23 2004-08-17 Konami Corporation Speech synthesis with prosodic model data and accent type
US20040260551A1 (en) * 2003-06-19 2004-12-23 International Business Machines Corporation System and method for configuring voice readers using semantic analysis
US20050027532A1 (en) * 2000-03-31 2005-02-03 Canon Kabushiki Kaisha Speech synthesis apparatus and method, and storage medium
US20060136214A1 (en) * 2003-06-05 2006-06-22 Kabushiki Kaisha Kenwood Speech synthesis device, speech synthesis method, and program
US20060136213A1 (en) * 2004-10-13 2006-06-22 Yoshifumi Hirose Speech synthesis apparatus and speech synthesis method
US20070100627A1 (en) * 2003-06-04 2007-05-03 Kabushiki Kaisha Kenwood Device, method, and program for selecting voice data
US20070174056A1 (en) * 2001-08-31 2007-07-26 Kabushiki Kaisha Kenwood Apparatus and method for creating pitch wave signals and apparatus and method compressing, expanding and synthesizing speech signals using these pitch wave signals
US20070233492A1 (en) * 2006-03-31 2007-10-04 Fujitsu Limited Speech synthesizer
US20080235025A1 (en) * 2007-03-20 2008-09-25 Fujitsu Limited Prosody modification device, prosody modification method, and recording medium storing prosody modification program
US20090258333A1 (en) * 2008-03-17 2009-10-15 Kai Yu Spoken language learning systems
US8103505B1 (en) * 2003-11-19 2012-01-24 Apple Inc. Method and apparatus for speech synthesis using paralinguistic variation
US8321225B1 (en) 2008-11-14 2012-11-27 Google Inc. Generating prosodic contours for synthesized speech
US8614833B2 (en) * 2005-07-21 2013-12-24 Fuji Xerox Co., Ltd. Printer, printer driver, printing system, and print controlling method
US8761581B2 (en) * 2010-10-13 2014-06-24 Sony Corporation Editing device, editing method, and editing program
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9606986B2 (en) 2014-09-29 2017-03-28 Apple Inc. Integrated word N-gram and class M-gram language models
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US9997154B2 (en) 2014-05-12 2018-06-12 At&T Intellectual Property I, L.P. System and method for prosodically modified unit selection databases
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10607140B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3828132A (en) * 1970-10-30 1974-08-06 Bell Telephone Labor Inc Speech synthesis by concatenation of formant encoded words
JPS6315297A (en) 1986-07-08 1988-01-22 株式会社東芝 Voice synthesizer
US4833718A (en) * 1986-11-18 1989-05-23 First Byte Compression of stored waveforms for artificial speech
JPH0453998A (en) 1990-06-22 1992-02-21 Sony Corp Voice synthesizer
JPH04298794A (en) 1991-01-28 1992-10-22 Matsushita Electric Works Ltd Voice data correction system
JPH06161490A (en) 1992-11-19 1994-06-07 Meidensha Corp Rhythm processing system of speech synthesizing device
JPH07140996A (en) 1993-11-16 1995-06-02 Fujitsu Ltd Speech rule synthesizer
US5832434A (en) * 1995-05-26 1998-11-03 Apple Computer, Inc. Method and apparatus for automatic assignment of duration values for synthetic speech
US5940797A (en) * 1996-09-24 1999-08-17 Nippon Telegraph And Telephone Corporation Speech synthesis method utilizing auxiliary information, medium recorded thereon the method and apparatus utilizing the method
US6035272A (en) * 1996-07-25 2000-03-07 Matsushita Electric Industrial Co., Ltd. Method and apparatus for synthesizing speech
US6101470A (en) * 1998-05-26 2000-08-08 International Business Machines Corporation Methods for generating pitch and duration contours in a text to speech system
US6109923A (en) * 1995-05-24 2000-08-29 Syracuase Language Systems Method and apparatus for teaching prosodic features of speech

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2878483B2 (en) 1991-06-19 1999-04-05 株式会社エイ・ティ・アール自動翻訳電話研究所 Voice rule synthesizer

Non-Patent Citations (2)

Title
"Speech Synthesis Software for a Personal Computer", Collection of Papers of the 47th National Meeting of the Information Processing Society of Japan, 1993.
Furui, "Digital Speech Processing", Sep. 25, 1985.

Also Published As

Publication number Publication date
JPH11352980A (en) 1999-12-24
JP3180764B2 (en) 2001-06-25

Legal Events

Code Title
    Description

AS Assignment
    Owner name: NEC CORPORATION, JAPAN
    Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: KONDO, REISHI; MITOME, YUKIO; REEL/FRAME: 010015/0717
    Effective date: 1999-06-01

FEPP Fee payment procedure
    Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

REMI Maintenance fee reminder mailed

LAPS Lapse for failure to pay maintenance fees

STCH Information on status: patent discontinuation
    Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee
    Effective date: 2006-06-11