CA2165546A1 - Method of encoding a signal containing speech - Google Patents
Method of encoding a signal containing speech
Info
- Publication number
- CA2165546A1
- Authority
- CA
- Canada
- Prior art keywords
- frame
- mode
- pitch
- thq
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/012—Comfort noise or silence coding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0002—Codebook adaptations
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0003—Backward prediction of gain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/09—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being zero crossing rates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
Abstract
A method of encoding a signal containing speech is employed in a bit rate Codebook Excited Linear Predictor (CELP) communication system. The system includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration, and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.
Description
WO 95/28824 PCT/US95/04577
METHOD OF ENCODING A SIGNAL CONTAINING SPEECH
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention generally relates to a method of encoding a signal containing speech and more particularly to a method employing a linear predictor to encode a signal.
Description of the Related Art
A modern communication technique employs a Codebook Excited Linear Prediction (CELP) coder. The codebook is essentially a table containing excitation vectors for processing by a linear predictive filter. The technique involves partitioning an input signal into multiple portions and, for each portion, searching the codebook for the vector that produces a filter output signal that is closest to the input signal.
The typical CELP technique may distort portions of the input signal dominated by noise because the codebook and the linear predictive filter that may be optimum for speech may be inappropriate for noise.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a method of encoding a signal containing both speech and noise while avoiding some of the distortions introduced by typical CELP encoding techniques.
Additional objectives and advantages of the invention will be set forth in the description that follows and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
To achieve the objects and in accordance with the purpose of the invention, as embodied and broadly described herein, a method of processing a signal having a speech component, the signal being organized as a plurality of frames, is used. The method comprises the steps, performed for each frame, of determining whether the frame corresponds to a first mode, depending on whether the speech component is substantially absent from the frame; generating an encoded frame in accordance with one of a first coding scheme, when the frame corresponds to the first mode, and a second coding scheme when the frame does not correspond to the first mode; and decoding the encoded frame in accordance with one of the first
coding scheme, when the frame corresponds to the first mode, and the second coding scheme when the frame does not correspond to the first mode.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
FIG. 1 is a block diagram of a transmitter in a wireless communication system according to a preferred embodiment of the invention;
FIG. 2 is a block diagram of a receiver in a wireless communication system according to the preferred embodiment of the invention;
FIG. 3 is a block diagram of the encoder in the transmitter shown in FIG. 1;
FIG. 4 is a block diagram of the decoder in the receiver shown in FIG. 2;
FIG. 5A is a timing diagram showing the alignment of linear prediction analysis windows in the encoder shown in FIG. 3;
FIG. 5B is a timing diagram showing the alignment of pitch prediction analysis windows for open loop pitch prediction in the encoder shown in FIG. 3;
FIGS. 6A and 6B are a flowchart illustrating the 26-bit line spectral frequency vector quantization process performed by the encoder of FIG. 3;
FIG. 7 is a flowchart illustrating the operation of a pitch tracking algorithm;
FIG. 8 is a block diagram showing in more detail the open loop pitch estimation of the encoder shown in FIG. 3;
FIG. 9 is a flowchart illustrating the operation of the modified pitch tracking algorithm implemented by the open loop pitch estimation shown in FIG. 8;
FIG. 10 is a flowchart showing the processing performed by the mode determination module shown in FIG. 3;
FIG. 11 is a dataflow diagram showing a part of the processing of a step of determining spectral stationarity values shown in FIG. 10;
FIG. 12 is a dataflow diagram showing another part of the processing of the step of determining spectral stationarity values;
FIG. 13 is a dataflow diagram showing another part of the processing of the step of determining spectral stationarity values;
FIG. 14 is a dataflow diagram showing the processing of the step of determining pitch stationarity values shown in FIG. 10;
FIG. 15 is a dataflow diagram showing the processing of the step of generating zero crossing rate values shown in FIG. 10;
FIG. 16 is a dataflow diagram showing the processing of the step of determining level gradient values in FIG. 10;
FIG. 17 is a dataflow diagram showing the processing of the step of determining short-term energy values shown in FIG. 10;
FIGS. 18A, 18B and 18C are a flowchart of determining the mode based on the generated values as shown in FIG. 10;
FIG. 19 is a block diagram showing in more detail the implementation of the excitation modeling circuitry of the encoder shown in FIG. 3;
FIG. 20 is a diagram illustrating a processing of the encoder shown in FIG. 3;
FIGS. 21A and 21B are a chart of speech coder parameters for mode A;
FIG. 22 is a chart of speech coder parameters for mode A;
FIG. 23 is a chart of speech coder parameters for mode A;
FIG. 24 is a block diagram illustrating a processing of the speech decoder shown in FIG. 4; and
FIG. 25 is a timing diagram showing an alternative alignment of linear prediction analysis windows.
DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
FIG. 1 shows the transmitter of the preferred communication system. Analog-to-digital (A/D) converter 11 samples analog speech from a telephone handset at an 8 kHz rate, converts to digital values and supplies the digital values to the speech encoder 12. Channel encoder 13 further encodes the signal, as may be required in a digital cellular communications system, and supplies a resulting encoded bit stream to a modulator 14. Digital-to-analog (D/A) converter 15 converts the output of the modulator 14 to Phase Shift Keying (PSK) signals. Radio frequency (RF) up converter 16 amplifies and frequency multiplies the PSK signals and supplies the amplified signals to antenna 17.
A low-pass, antialiasing, filter (not shown) filters the analog speech signal input to A/D converter 11. A high-pass, second order biquad, filter (not shown) filters the digitized samples from A/D converter 11. The transfer function is
H_HP(z) = (1 - 2z^{-1} + z^{-2}) / (1 - 1.8891z^{-1} + 0.89503z^{-2})
The high pass filter attenuates D.C. or hum contamination that may occur in the incoming speech signal.
FIG. 2 shows the receiver of the preferred communication system. RF down converter 22 receives a signal from antenna 21 and heterodynes the signal to an intermediate frequency (IF). A/D converter 23 converts the IF signal to a digital bit stream, and demodulator 24 demodulates the resulting bit stream. At this point the reverse of the encoding process in the transmitter takes place. Channel decoder 25 and speech decoder 26 perform decoding. D/A converter 27 synthesizes analog speech from the output of the speech decoder.
Much of the processing described in this specification is performed by a general purpose signal processor executing program statements. To facilitate a description of the preferred communication system, however, the preferred communication system is illustrated in terms of block and circuit diagrams. One of ordinary skill in the art could readily transcribe these diagrams into program statements for a processor.
FIG. 3 shows the encoder 12 of FIG. 1 in more detail, including an audio preprocessor 31, linear predictive (LP) analysis and quantization module 32, and open loop pitch estimation module 33. Module 34 analyzes each frame of the signal to determine whether the frame is mode A, mode B, or mode C, as described in more detail below. Module 35 performs excitation modeling depending on the mode determined by module 34. Processor 36 packs compressed speech bits.
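The second-order high-pass pre-filter described above for the digitized samples can be illustrated with a minimal sketch. The denominator coefficients are taken from the text; the numerator 1 - 2z^{-1} + z^{-2} is an assumption reconstructed from the damaged formula (a double zero at z = 1, which is what a filter that removes D.C. needs). The function name `highpass_biquad` is hypothetical.

```python
def highpass_biquad(samples):
    """Direct-form I second-order high-pass filter (illustrative sketch)."""
    b = (1.0, -2.0, 1.0)          # assumed numerator: double zero at z = 1 (D.C.)
    a = (1.0, -1.8891, 0.89503)   # denominator coefficients quoted in the text
    x1 = x2 = y1 = y2 = 0.0       # filter state: previous inputs and outputs
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

Feeding a constant input through this sketch drives the output towards zero, which matches the stated purpose of attenuating D.C. contamination.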
FIG. 4 shows the decoder 26 of FIG. 2, including a processor 41 for unpacking of compressed speech bits, module 42 for excitation signal reconstruction, filter 43, speech synthesis filter 44, and global post filter 45.
FIG. 5A shows linear prediction analysis windows. The preferred communication system employs 40 ms. speech frames. For each frame, module 32 performs LP (linear prediction) analysis on two 30 ms. windows that are spaced apart by 20 ms. The first LP window is centered at the middle, and the second LP window is centered at the leading edge of the speech frame such that the second LP window extends 15 ms. into the next frame. In other words, module 32 analyzes a first part of the frame (LP window 1) to generate a first set of filter coefficients and analyzes a second part of the frame and a part of a next frame (LP window 2) to generate a second set of filter coefficients.
FIG. 5B shows pitch analysis windows. For each frame, module 32 performs pitch analysis on two 37.625 ms. windows. The first pitch analysis window is centered at the middle, and the second pitch analysis window is centered at the leading edge of the speech frame such that the second pitch analysis window extends 18.8125 ms. into the next frame. In other words, module 32 analyzes a third part of the frame (pitch analysis window 1) to generate a first pitch estimate and analyzes a fourth part of the frame and a part of the next frame (pitch analysis window 2) to generate a second pitch estimate.
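The window timing of FIGS. 5A and 5B can be checked with a little arithmetic. The sketch below assumes that "centered at the leading edge" means centered on the boundary between the current frame and the next one (40 ms from the frame start), which is the reading that reproduces the 15 ms and 18.8125 ms overhangs stated in the text. The names `window_span`, `lp_windows` and `pitch_windows` are hypothetical.

```python
SAMPLES_PER_MS = 8     # 8 kHz sampling rate
FRAME_MS = 40.0        # speech frame duration

def window_span(center_ms, length_ms):
    """Return (start, end) of a window in ms, relative to the frame start."""
    half = length_ms / 2.0
    return (center_ms - half, center_ms + half)

# LP windows: 30 ms long, centered at mid-frame and at the frame boundary.
lp_windows = [window_span(FRAME_MS / 2, 30.0), window_span(FRAME_MS, 30.0)]

# Pitch windows: 37.625 ms long (301 samples), same two centers.
pitch_windows = [window_span(FRAME_MS / 2, 37.625), window_span(FRAME_MS, 37.625)]

lp_overhang_ms = lp_windows[1][1] - FRAME_MS        # extension into next frame
pitch_overhang_ms = pitch_windows[1][1] - FRAME_MS
pitch_window_samples = 37.625 * SAMPLES_PER_MS
```

Under these assumptions the two window centers are 20 ms apart, and the second window of each pair reaches into the following frame, which is why the encoder needs some look-ahead beyond the current frame.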
Module 32 employs multiplication by a Hamming window followed by a tenth order autocorrelation method of LP analysis. With this method of LP analysis, module 32 obtains optimal filter coefficients and optimal reflection coefficients. In addition, the residual energy after LP analysis is also readily obtained and, when expressed as a fraction of the speech energy of the windowed LP analysis buffer, is denoted as α1 for the first LP window and α2 for the second LP window. These outputs of the LP analysis are used subsequently in the mode selection algorithm as measures of spectral stationarity, as described in more detail below.
After LP analysis, module 32 bandwidth broadens the filter coefficients for the first LP window, and for the second LP window, by 25 Hz, converts the coefficients to ten line spectral frequencies (LSF), and quantizes these ten line spectral frequencies with a 26-bit LSF vector quantization (VQ), as described below.
~cro~ a wide range of h~nd-et- ~nd D~ r~ S-partte VQ
co~ are ~ ~' for IRS filt-red tnd ~fltt unfilt.?red (~non-IRs-filtere?d ) speech ~-t~r~Al Tl~e ~nT~-nt1~i LSP vf~ctor 1~ qu-ne~ by th~ S flltered VQ ttble- as well t~ th- fltt _ g _ WO 95/28824 ` 2 1 ~ ~ 5 4 6 PCT/US95/04577 unfLlterQd~ VQ table- The optimum clas~iflcation i~ selected on th~ ba~ls of the cepstral dl~tortlon mea~ure Withln each cla~Lflcatlon, the vector quantlzation i~ carrled out ~lultiple candltates for each split vector are chosen on the basil~ of energy welghtet mean ~quare error, and an overall optimal selectlon i~
mado within each cla~-iflcatlon on th~ ba-l~ of tho cep~tral dlstortlon mea~ure among all comblnation- of cantLdate~ After the optimum c1A~1fi~ation is cho~Qn, thQ q -nt1 ~ llne spectral L,e~l,.s~cles ar~ ~o.~ ~ to filter coeff1~i~nt~
More specifically, module 32 quantizes the ten line spectral frequencies for both sets with a 26-bit multi-codebook split vector quantizer that classifies the unquantized line spectral frequency vector as a "voiced IRS-filtered," "unvoiced IRS-filtered," "voiced non-IRS-filtered," and "unvoiced non-IRS-filtered" vector, where "IRS" refers to intermediate reference system filter as specified by CCITT, Blue Book, Rec. P.48.
FIG. 6 shows an outline of the LSF vector quantization process. Module 32 employs a split vector quantizer for each classification, including a 3-4-3 split vector quantizer for the "voiced IRS-filtered" and the "voiced non-IRS-filtered" categories 51 and 53. The first three LSFs use an 8-bit codebook in function modules 55 and 57, the next four LSFs use a 10-bit codebook in function modules 59 and 61, and the last three LSFs use a 6-bit codebook in function modules 63 and 65. For the "unvoiced IRS-filtered" and the "unvoiced non-IRS-filtered" categories 52 and 54, a 3-3-4 split vector quantizer is used. The first three LSFs use a 7-bit codebook in function modules 56 and 58, the next three LSFs use an 8-bit vector codebook in function modules 60 and 62, and the last four LSFs use a 9-bit codebook in function modules 64 and 66.
From each split vector codebook, the three best candidates are selected in function modules 67, 68, 69, and 70 using the energy weighted mean square error criteria. The energy weighting reflects the power level of the spectral envelope at each line spectral frequency. The three best candidates for each of the three split vectors result in a total of twenty-seven combinations for each category. The search is constrained so that at least one combination would result in an ordered set of LSFs. This is usually a very mild constraint imposed on the search. The optimum combination of these twenty-seven combinations is selected in function module 71 depending on the cepstral distortion measure. Finally, the optimal category or classification is determined also on the basis of the cepstral distortion measure. The quantized LSFs are converted to filter coefficients and then to autocorrelation lags for interpolation purposes.
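The candidate search described above can be sketched in a few lines. This is a hedged illustration under several assumptions: the codebooks are tiny placeholders supplied by the caller, the final selection uses a caller-supplied `distortion` function standing in for the cepstral distortion measure (which is not defined in this passage), and the 2 classification bits are inferred from the four categories so that 24 + 2 = 26. The names `SPLITS`, `total_bits`, `top_candidates` and `search` are hypothetical.

```python
from itertools import product

# Split sizes and codebook bits as stated in the text.
SPLITS = {
    "voiced":   {"sizes": (3, 4, 3), "bits": (8, 10, 6)},
    "unvoiced": {"sizes": (3, 3, 4), "bits": (7, 8, 9)},
}
CLASS_BITS = 2  # assumption: four categories need 2 bits

def total_bits(kind):
    """Bits per LSF vector for one category, including the class index."""
    return sum(SPLITS[kind]["bits"]) + CLASS_BITS

def top_candidates(target, weights, codebook, n=3):
    """Pick the n codebook entries with the lowest weighted squared error."""
    def wmse(code):
        return sum(w * (t - c) ** 2 for t, c, w in zip(target, code, weights))
    return sorted(codebook, key=wmse)[:n]

def search(lsf, weights, codebooks, sizes, distortion):
    """Try all 3 x 3 x 3 candidate combinations and keep the best ordered one.

    Returns (distortion_value, quantized_lsf), or None if no combination
    yields strictly increasing LSFs.
    """
    pieces, start = [], 0
    for size, codebook in zip(sizes, codebooks):
        part = slice(start, start + size)
        pieces.append(top_candidates(lsf[part], weights[part], codebook))
        start += size
    best = None
    for combo in product(*pieces):
        cand = [v for piece in combo for v in piece]
        if any(b <= a for a, b in zip(cand, cand[1:])):
            continue  # reject combinations that are not an ordered set of LSFs
        d = distortion(lsf, cand)
        if best is None or d < best[0]:
            best = (d, cand)
    return best
```

Keeping three candidates per split rather than only the single best lets the final, more expensive distortion measure choose among 27 combinations instead of committing early on the cheaper weighted error.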
The resulting LSF vector quantizer scheme is not only effective across speakers but also across varying degrees of IRS filtering which models the influence of the handset transducer. The codebooks of the vector quantizers are trained from a sixty talker speech database using flat as well as IRS frequency shaping. This is designed to provide consistent and good performance across several speakers and across various handsets. The average log spectral distortion across the entire TIA half rate database is approximately 1.2 dB for IRS filtered speech data and approximately 1.3 dB for non-IRS filtered speech data.
Two estimates of the pitch are determined per frame at intervals of 20 msec. These open loop pitch estimates are used in mode selection and to encode the closed loop pitch analysis if the selected mode is a predominantly voiced mode. Module 33 determines the two pitch estimates from the two pitch analysis windows described above in connection with FIG. 5B, using a modified form of the pitch tracking algorithm shown in FIG. 7. This pitch estimation algorithm makes an initial pitch estimate in function module 73 using an error function calculated for all values in the set {22.0, 22.5, ..., 114.5}, followed by pitch tracking to yield an overall optimum pitch value. Function module 74 employs look-back pitch tracking using the error functions and pitch estimates of the previous two pitch analysis windows. Function module 75 employs look-ahead pitch tracking using the error functions of the two future pitch analysis windows. Decision module 76 compares pitch estimates depending on look-back and look-ahead pitch tracking to yield an overall optimum pitch value at output 77. The pitch estimation algorithm shown in FIG. 7 requires the error functions of two future pitch analysis windows for its look-ahead pitch tracking and thus introduces a delay of 40 ms. In order to avoid this penalty, the preferred communication system employs a modification of the pitch estimation algorithm of FIG. 7.
FIG. 8 shows the open loop pitch estimation 33 of FIG. 3 in more detail. Pitch analysis windows one and two are input to respective compute error functions 331 and 332. The outputs of these error function computations are input to a refinement of past pitch estimates 333, and the refined pitch estimates are sent to both look back and look ahead pitch tracking 334 and 335 for pitch window one. The outputs of the pitch tracking circuits are input to selector 336 which selects the open loop pitch one as the first output. The selected open loop pitch one is also input to a look back pitch tracking circuit for pitch window two which outputs the open loop pitch two.
FIG. 9 shows the modified pitch tracking algorithm implemented by the pitch estimation circuitry of FIG. 8. The modified pitch estimation algorithm employs the same error function as in the FIG. 7 algorithm in each pitch analysis window, but the pitch tracking scheme is altered. Prior to pitch tracking for either the first or second pitch analysis window, the previous two pitch estimates of the two previous pitch analysis windows are refined in function modules 81 and 82, respectively, with both look-back pitch tracking and look-ahead pitch tracking using the error functions of the current two pitch analysis windows. This is followed by look-back pitch tracking in function module 83 for the first pitch analysis window using the refined pitch estimates and error functions of the two previous pitch analysis windows. Look-ahead pitch tracking for the first pitch analysis window in function module 84 is limited to using the error function of the second pitch analysis window. The two estimates are compared in decision module 85 to yield an overall best pitch estimate for the first pitch analysis window. For the second pitch analysis window, look-back pitch tracking is carried out in function module 86 as well as the pitch estimate of the first pitch analysis window and its error function. No look-ahead pitch tracking is used for this second pitch analysis window with the result that the look-back pitch estimate is taken to be the overall best pitch estimate at output 87.
FIG. 10 shows the mode determination processing performed by mode selector 34. Depending on spectral stationarity, pitch stationarity, short term energy, short term level gradient, and zero crossing rate of each 40 ms. frame, mode selector 34 classifies each frame into one of three modes: voiced and stationary mode (Mode A), unvoiced or transient mode (Mode B), and background noise mode (Mode C). More specifically, mode selector 34 generates two logical values, each indicating spectral stationarity or similarity of spectral content between the currently processed frame and the previous frame (Step 1010). Mode selector 34 generates two logical values indicating pitch stationarity, similarity of fundamental frequencies, between the currently processed frame and the previous frame (Step 1020). Mode selector 34 generates two logical values indicating the zero crossing rate of the currently processed frame (Step 1030), a rate influenced by the higher frequency components of the frame relative to the lower frequency components of the frame. Mode selector 34 generates two logical values indicating level gradients within the currently processed frame (Step 1040). Mode selector 34 generates five logical values indicating short-term energy of the currently processed frame (Step 1050). Subsequently, mode selector 34 determines the mode of the frame to be mode A, mode B, or mode C, depending on the values generated in Steps 1010-1050 (Step 1060).
FIG. 11 is a block diagram showing a processing of Step 1010 of FIG. 10 in more detail. The processing of FIG. 11 determines a cepstral distortion in dB. Module 1110 converts the quantized filter coefficients of window 2 of the current frame into the lag domain, and module 1120 converts the quantized filter coefficients of window 2 of the previous frame into the lag domain. Module 1130 interpolates the outputs of modules 1110 and 1120, and module 1140 converts the output of module 1130 back into filter coefficients. Module 1150 converts the output from module 1140 into the cepstral domain, and module 1160 converts the unquantized filter coefficients from window 1 of the current frame into the cepstral domain. Module 1170 generates the cepstral distortion dc from the outputs of 1150 and 1160.
FIG. 12 shows generation of spectral stationarity value LPCFLAG1, which is a relatively strong indicator of spectral stationarity for the frame. Mode selector 34 generates LPCFLAG1 using a combination of two techniques for measuring spectral stationarity. The first technique compares the cepstral distortion dc using comparators 1210 and 1220. In FIG. 12, the dt1 threshold input to comparator 1210 is -8.0 and the dt2 threshold input to comparator 1220 is -6.0. The second technique is based on the residual energy after LPC analysis, expressed as a fraction of the LPC analysis speech buffer spectral energy. This energy is a byproduct of LPC analysis, as described above. The α1 input to comparator 1230 is the residual energy for the filter coefficients of window 1 and the α2 input to comparator 1240 is the residual energy of the filter coefficients of window 2. The αt1 input to comparators 1230 and 1240 is a threshold equal to 0.25.
FIG. 13 shows dataflow within mode selector 34 for a generation of spectral stationarity value flag LPCFLAG2, which is a relatively weak indicator of spectral stationarity. The processing shown in FIG. 13 is similar to that shown in FIG. 12, except that LPCFLAG2 is based on a relatively relaxed set of thresholds. The dt2 input to comparator 1310 is -6.0, the dt3 input to comparator 1320 is -4.0, the dt4 input to comparator 1350 is -2.0, the αt1 input to comparators 1330 and 1340 is a threshold 0.25, and the αt2 to comparators 1360 and 1370 is 0.15.
Mode selector 34 measures pitch stationarity using both the open loop pitch values of the current frame, denoted as P_1 for pitch window 1 and P_2 for pitch window 2, and the open loop pitch value of window 2 of the previous frame denoted by P_{-1}. A lower range of pitch values (P_L1, P_U1) and an upper range of pitch values (P_L2, P_U2) are:
P_L1 = MIN(P_{-1}, P_2) - P_t
P_U1 = MIN(P_{-1}, P_2) + P_t
P_L2 = MAX(P_{-1}, P_2) - P_t
P_U2 = MAX(P_{-1}, P_2) + P_t,
where P_t is 8.0. If the two ranges are non-overlapping, i.e., P_L2 > P_U1, then only a weak indicator of pitch stationarity, denoted by PITCHFLAG2, is possible and PITCHFLAG2 is set if P_1 lies within either the lower range (P_L1, P_U1) or upper range (P_L2, P_U2). If the two ranges are overlapping, i.e., P_L2 ≤ P_U1, a strong indicator of pitch stationarity, denoted by PITCHFLAG1, is possible and is set if P_1 lies within the range (P_L, P_U), where
P_L = (P_{-1} + P_2)/2 - 2P_t
P_U = (P_{-1} + P_2)/2 + 2P_t
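The pitch stationarity test above maps directly onto a short function. The sketch below is illustrative and makes a few assumptions that the passage leaves open: range membership is treated as inclusive, the minus and plus signs in the range formulas are reconstructed by symmetry, and PITCHFLAG2 is simply left false in the overlapping case because the text does not say how it is set there. The function name `pitch_flags` is hypothetical.

```python
def pitch_flags(p_prev, p1, p2, pt=8.0):
    """Return (PITCHFLAG1, PITCHFLAG2) for one frame.

    p_prev: open loop pitch of window 2 of the previous frame
    p1, p2: open loop pitch of windows 1 and 2 of the current frame
    pt:     pitch tolerance (8.0 in the text)
    """
    low, high = min(p_prev, p2), max(p_prev, p2)
    pl1, pu1 = low - pt, low + pt      # lower range
    pl2, pu2 = high - pt, high + pt    # upper range
    if pl2 > pu1:
        # Non-overlapping ranges: only the weak indicator is possible.
        weak = (pl1 <= p1 <= pu1) or (pl2 <= p1 <= pu2)
        return False, weak
    # Overlapping ranges: the strong indicator is possible.
    mid = (p_prev + p2) / 2.0
    strong = (mid - 2 * pt) <= p1 <= (mid + 2 * pt)
    return strong, False  # weak flag in this case is unspecified here
```

For example, with a previous pitch of 50, a second-window pitch of 54 and a first-window pitch of 52, the two ranges overlap and the first-window pitch falls inside the combined range, so the strong flag is set.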
FIG. 14 shows a dataflow for generating PITCHFLAG1 and PITCHFLAG2 within mode selector 34. Module 14005 generates an output equal to the input having the largest value, and module 14010 generates an output equal to the input having the smallest value. Module 1420 generates an output that is an average of the values of the two inputs. Modules 14030, 14035, 14040, 14045, 14050 and 14055 are adders. Modules 14080, 14025 and 14090 are AND gates. Module 1408? is an inverter. Modules 14065, 14070, and 14075 are each logic blocks generating a true output when (C ≥ B) & (C ≤ A). The circuit of FIG. 14 also processes reliability values V_{-1}, V_1, and V_2, each indicating whether the values P_{-1}, P_1, and P_2, respectively, are reliable. Typically, these reliability values are a byproduct of the pitch calculation algorithm. The circuit shown in FIG. 14 generates false values for PITCHFLAG1 and PITCHFLAG2 if any of these flags V_{-1}, V_1, V_2 are false. Processing of these reliability values is optional.
FIG. 15 shows dataflow within mode selector 34 for generating two logical values indicating a zero crossing rate for the frame. Modules 15002, 15004, 15006, 15008, 15010, 15012, 15014 and 15016 each count the number of zero crossings in a respective 5 millisecond subframe of the frame currently being processed. For example, module 15006 counts the number of zero crossings of the signal occurring from the time 10 milliseconds from the beginning of the frame to the time 15 ms from the beginning of the frame. Comparators 15018, 15020, 15022, 15024, 15026, 15028, 15030, and 15032, in combination with adder 15035, generate a value indicating the number of 5 millisecond (ms) subframes having zero crossings of $\ge 15$. Comparator 15040 sets the flag ZC_LOW when the number of such subframes is less than 2, and the comparator 15037 sets the flag ZC_HIGH when the number of such subframes is greater than 5. The value $ZC_t$ input to comparators 15018-15032 is 15, the value $Z_{t1}$ input to comparator 15040 is 2, and the value $Z_{t2}$ input to comparator 15037 is 5.
FIGS. 16A, 16B, and 16C show a data flow for generating two logical values indicative of short term level gradient. Mode selector 34 measures short term level gradient, an indication of transients within a frame, using a low-pass filtered version of the companded input signal amplitude. Module 16005 generates the absolute value of the input signal $s(n)$, module 16010 compands its input signal, and low-pass filter 16015 generates a signal $A_L(n)$ that, at time instant $n$, is expressed by
$$A_L(n) = (63/64)\,A_L(n-1) + (1/64)\,C(|s(n)|),$$
where the companding function $C(\cdot)$ is the $\mu$-law function described in CCITT G.711. Delay 16025 generates an output that is a 10 ms-delayed version of its input, and subtractor 16027 generates a difference between $A_L(n)$ and $A_L(n-80)$. Module 16030 generates a signal that is an absolute value of its input. Every 5 ms, mode selector 34 compares $A_L(n)$ with that of 10 ms ago and, if the difference $|A_L(n) - A_L(n-80)|$ exceeds a fixed relaxed threshold, increments a counter. (In the preceding expression, 80 corresponds to 8 samples per ms times 10 ms.) As shown in Fig. 16C, if this difference does not exceed a relatively stringent threshold ($L_{t2} = 32$) for any subframe, mode selector 34 sets LVLFLAG2, weakly indicating an absence of transients. As shown in Fig. 16B, if this difference exceeds a more relaxed threshold ($L_{t1} = 10$) for no more than one subframe ($L_{t3} = 2$), mode selector 34 sets LVLFLAG1, strongly indicating an absence of transients.
More specifically, Fig. 16B shows delay circuits 16032-16046 that each generate a 5 ms delayed version of its input. Each of latches 16048-16062 saves a signal on its input. Latches 16048-16062 are strobed at a common time, near the end of each 40 ms speech frame, so that each latch saves a portion of the frame separated by 5 ms from the portion saved by an adjacent latch. Comparators 16064-16078 each compare the output of a respective latch to the threshold $L_{t1}$, and adder 16080 sums the comparator outputs and sends the sum to comparator 16082 for comparison to the threshold $L_{t3}$.
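As a rough software analogue of the comparator and adder networks of FIGS. 15 and 16, the flag logic can be sketched as below. The function names are hypothetical, a zero crossing is assumed to mean a sign change between adjacent samples inside one 5 ms subframe (40 samples at 8 kHz), and the eight latched level differences $|A_L(n) - A_L(n-80)|$ are assumed to have been computed elsewhere, since the G.711 compander is not reproduced here.

```python
import numpy as np

def zero_crossing_flags(frame, zct=15, zt1=2, zt2=5, sub_len=40):
    """Return (ZC_LOW, ZC_HIGH) for one 40 ms frame of 8 kHz samples."""
    frame = np.asarray(frame, dtype=float)
    busy = 0  # number of 5 ms subframes with at least zct zero crossings
    for start in range(0, len(frame), sub_len):
        negative = np.signbit(frame[start:start + sub_len])
        crossings = int(np.count_nonzero(negative[1:] != negative[:-1]))
        if crossings >= zct:
            busy += 1
    return busy < zt1, busy > zt2

def level_flags(latched_diffs, lt1=10.0, lt2=32.0, lt3=2):
    """Return (LVLFLAG1, LVLFLAG2) from the latched level differences."""
    d = np.asarray(latched_diffs, dtype=float)
    # Strong flag: the relaxed threshold is exceeded in fewer than lt3 subframes.
    lvlflag1 = int(np.count_nonzero(d > lt1)) < lt3
    # Weak flag: the stringent threshold is never exceeded (OR gate, then inverter).
    lvlflag2 = not bool(np.any(d > lt2))
    return lvlflag1, lvlflag2
```

Passing the latched differences in directly keeps the sketch independent of how the smoothed level $A_L(n)$ is produced.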
Fig. 16C shows a circuit for generating LVLFLAG2. In Fig. 16C, delays 16132-16146 are similar to the delays shown in Fig. 16B, and latches 16148-16162 are similar to the latches shown in Fig. 16B. Comparators 16164-16178 each compare an output of a respective latch to the threshold $L_{t2}$. Thus, OR gate 16180 generates a true output if any of the latched signals originating from module 16030 exceeds the threshold $L_{t2}$. Inverter 16182 inverts the output of OR gate 16180.
Fig. 17 shows a data flow for generating parameters indicative of short term energy. Short term energy is measured as the mean square energy (average energy per sample) on a frame basis as well as on a 5 ms basis. The short term energy is determined relative to a background energy $E_{bn}$. $E_{bn}$ is initially set to a constant $E_0 = (100 \times (12)^{1/2})^2$. Subsequently, when a frame is determined to be mode C, $E_{bn}$ is set equal to $(7/8)E_{bn} + (1/8)E_0$. Thus, some of the thresholds employed in the circuit of FIG. 17 are adaptive. In Fig. 17, $E_{t0} = 0.707\,E_{bn}$, $E_{t1} = 5$, $E_{t2} = 2.5\,E_{bn}$, $E_{t3} = 1.8\,E_{bn}$, $E_{t4} = E_{bn}$, $E_{t5} = 0.707\,E_{bn}$, and $E_{t6} = 16.0$.
The short term energy on a 5 ms basis provides an indication of presence of speech throughout the frame using a single flag EFLAG1, which is generated by testing the short term energy on a 5 ms basis against a threshold, incrementing a counter whenever the threshold is exceeded, and testing the counter's final value against a fixed threshold. Comparing the short term energy on a frame basis to various thresholds provides indication of absence of speech throughout the frame in the form of several flags with varying degrees of confidence. These flags are denoted as EFLAG2, EFLAG3, EFLAG4, and EFLAG5.
FIG. 17 shows dataflow within mode selector 34 for generating these flags. Modules 17002, 17004, 17006, 17008, 17010, 17015, 17020, and 17022 each count the energy in a respective 5 ms subframe of the frame currently being processed. Comparators 17030, 17032, 17034, 17036, 17038, 17040, 17042, and 17044, in combination with adder 17050, count the number of subframes having an energy exceeding $E_{t0} = 0.707\,E_{bn}$.
FIGS. 18A, 18B, and 18C show the processing of step 1060. Mode selector 34 first classifies the frame as background noise (mode C) or speech (modes A or B). Mode C tends to be characterized by low energy, relatively high spectral stationarity between the current frame and the previous frame, a relative absence of pitch stationarity between the current frame and the previous frame, and a high zero crossing rate. Background noise (mode C) is declared either on the basis of the strongest short term energy flag EFLAG5 alone or by combining weaker short term energy flags EFLAG4, EFLAG3, and EFLAG2 with other flags indicating high zero crossing rate, absence of pitch, absence of transients, etc.
More specifically, if the mode of the previous frame was A or if EFLAG2 is not true, processing proceeds to step 18045 (step 18005). Step 18005 ensures that the current frame will not be mode C if the previous frame was mode A. The current frame is mode C if (LPCFLAG1 and EFLAG3) is true or (LPCFLAG2 and EFLAG4) is true or EFLAG5 is true (steps 18010, 18015, and 18020). The current frame is mode C if ((not PITCHFLAG1) and LPCFLAG1 and ZC_HIGH) is true (step 18025) or ((not PITCHFLAG1) and (not PITCHFLAG2) and LPCFLAG2 and ZC_HIGH) is true (step 18030). Thus, the processing shown in Fig. 18A determines whether the frame corresponds to a first mode (mode C), depending on whether a speech component is substantially absent from the frame.
In step 18045, a score is calculated depending on the mode of the previous frame. If the mode of the previous frame was mode A, the score is 1 + LVLFLAG1 + EFLAG1 + ZC_LOW. If the previous mode was mode B, the score is 0 + LVLFLAG1 + EFLAG1 + ZC_LOW. If the mode of the previous frame was mode C, the score is 2 + LVLFLAG1 + EFLAG1 + ZC_LOW.
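The mode C test of steps 18005-18030 and the score of step 18045 reduce to plain Boolean logic, sketched below. The function names and the dictionary of flags are illustrative assumptions; the sketch also assumes that a frame failing every mode C test simply falls through to the scoring step.

```python
def is_mode_c(prev_mode, f):
    """Background-noise (mode C) test; f maps flag names to booleans."""
    # Step 18005: never declare mode C right after mode A or without EFLAG2.
    if prev_mode == "A" or not f["EFLAG2"]:
        return False
    # Steps 18010-18020: energy flags, alone or with spectral stationarity.
    if (f["LPCFLAG1"] and f["EFLAG3"]) or (f["LPCFLAG2"] and f["EFLAG4"]) or f["EFLAG5"]:
        return True
    # Step 18025: no strong pitch stationarity, high zero crossing rate.
    if (not f["PITCHFLAG1"]) and f["LPCFLAG1"] and f["ZC_HIGH"]:
        return True
    # Step 18030: no pitch stationarity at all, weaker spectral flag.
    if (not f["PITCHFLAG1"]) and (not f["PITCHFLAG2"]) and f["LPCFLAG2"] and f["ZC_HIGH"]:
        return True
    return False

def mode_score(prev_mode, f):
    """Score of step 18045: a bias set by the previous mode plus three flags."""
    bias = {"A": 1, "B": 0, "C": 2}[prev_mode]
    return bias + int(f["LVLFLAG1"]) + int(f["EFLAG1"]) + int(f["ZC_LOW"])
```

Keeping the flags in one dictionary mirrors the way the figures feed the same logical signals into several decision steps.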
If the mode of the previous frame was mode C or not LVLFLAG2, the mode of the current frame is mode B (step 18050). The current frame is mode A if (LPCFLAG1 and PITCHFLAG1) is true, provided the score is not less than 2 (steps 18060 and 18055). The current frame is mode A if (LPCFLAG1 and PITCHFLAG2) is true or (LPCFLAG2 and PITCHFLAG1) is true, provided the score is not less than 3 (steps 18070, 18075, and 18080).
Subsequently, speech encoder 12 generates an encoded frame in accordance with one of a first coding scheme (a coding scheme for mode C), when the frame corresponds to the first mode, and an alternative coding scheme (a coding scheme for mode A or B), when the frame does not correspond to the first mode, as described in more detail below.
For mode A, only the second set of line spectral frequency vector quantization indices need to be transmitted because the first set can be inferred at the receiver due to the slowly varying nature of the vocal tract shape. In addition, the first and second open loop pitch estimates are quantized and transmitted because they are used to encode the closed loop pitch estimates in each subframe. The quantization of the second open loop pitch estimate is accomplished using a non-uniform 4-bit quantizer, while the quantization of the first open loop pitch estimate is accomplished using a differential non-uniform 3-bit quantizer. Since the vector quantization indices of the LSFs for the first linear prediction analysis window are neither transmitted nor used in mode selection, they need not be calculated in mode A. This reduces the complexity of the short term predictor section of the encoder in this mode. This reduced complexity as well as the lower bit rate of the short term predictor parameters in mode A is offset by faster update of all the excitation model parameters.
For mode B, both sets of line spectral frequency vector quantization indices must be transmitted because of potential spectral nonstationarity. However, for the first set of line spectral frequencies we need search only 2 of the 4 classifications or categories. This is because the IRS vs. non-IRS selection varies very slowly with time. If the second set of line spectral frequencies were chosen from the "voiced IRS-filtered" category, then the first set can be expected to be from either the "voiced IRS-filtered" or "unvoiced IRS-filtered" categories. If the second set of line spectral frequencies were chosen from the "unvoiced IRS-filtered" category, then again the first set can be expected to be from either the "voiced IRS-filtered" or "unvoiced IRS-filtered" categories. If the second set of line spectral frequencies were chosen from the "voiced non-IRS-filtered" category, then the first set can be expected to be from either the "voiced non-IRS-filtered" or "unvoiced non-IRS-filtered" categories. Finally, if the second set of line spectral frequencies were chosen from the "unvoiced non-IRS-filtered" category, then again the first set can be expected to be from either the "voiced non-IRS-filtered" or "unvoiced non-IRS-filtered" categories. As a result, only two categories of LSF codebooks need be searched for the quantization of the first set of line spectral frequencies. Furthermore, only 25 bits are needed to encode these quantization indices instead of the 26 needed for the second set of LSFs, since the optimal category for the first set can be coded using just 1 bit. For mode B, neither of the two open loop pitch estimates is transmitted since they are not used in guiding the closed loop pitch estimates. The higher complexity involved in encoding as well as the higher bit rate of the short term predictor parameters in mode B is compensated by a slower update of all the excitation model parameters.
For mode C, only the second set of line spectral frequency vector quantization indices need to be transmitted because the human ear is not as sensitive to rapid changes in spectral shape variations for noisy inputs. Further, such rapid spectral shape variations are atypical for many kinds of background noise sources. For mode C, neither of the two open loop pitch estimates is transmitted since they are not used in guiding the closed loop pitch estimation. The lower complexity involved as well as the lower bit rate of the short term predictor parameters in mode C is compensated by a faster update of the fixed codebook gain portion of the excitation model parameters.
The gain quantization tables are tailored to each of the modes. Also in each mode, the closed loop parameters are refined using a delayed decision approach. This delayed decision is employed in such a way that the overall codec delay is not increased. Such a delayed decision approach is very effective in transition regions.
In mode A, the quantization indices corresponding to the second set of short term predictor coefficients as well as the open loop pitch estimates are transmitted. Only these quantized parameters are used in the excitation modeling. The 40-msec speech frame is divided into seven subframes. The first six are 5.75 msec in length and the seventh is 5.5 msec in length. In each subframe, an interpolated set of short term predictor coefficients is used. The interpolation is done in the autocorrelation lag domain. Using this interpolated set of coefficients, a closed loop analysis by synthesis approach is used to derive the optimum pitch index, pitch gain index, fixed codebook index, and fixed codebook gain index for each subframe. The closed loop pitch index search range is centered around an interpolated trajectory of the open loop pitch estimates. The trade-off between the search range and the pitch resolution is done in a dynamic fashion depending on the closeness of the open loop pitch estimates. The fixed codebook employs zinc pulse shapes which are obtained using a weighted combination of the sinc pulse and a phase shifted version of its Hilbert transform. The fixed codebook gain is quantized in a differential manner.
The analysis by synthesis technique that is used to derive the excitation model parameters employs an interpolated set of short term predictor coefficients in each subframe. The optimal set of excitation model parameters for each subframe is determined only at the end of each 40 ms frame because of delayed decision. In deriving the excitation model parameters, all seven subframes are assumed to be of length 5.75 ms or forty-six samples. However, for the last or seventh subframe, the end of subframe updates such as the adaptive codebook update and the update of the local short term predictor state variables are carried out only for a subframe length of 5.5 ms or forty-four samples.
The short term predictor parameters or linear prediction filter parameters are interpolated from subframe to subframe. The interpolation is carried out in the autocorrelation domain. The normalized autocorrelation coefficients derived from the quantized filter coefficients for the second linear prediction analysis window are denoted as $\{\rho_{-1}(i)\}$ for the previous 40 ms frame and by $\{\rho_2(i)\}$ for the current 40 ms frame for $0 \le i \le 10$, with $\rho_{-1}(0) = \rho_2(0) = 1.0$. The interpolated autocorrelation coefficients $\{\rho'_m(i)\}$ are then given by
$$\rho'_m(i) = \nu_m\,\rho_2(i) + [1 - \nu_m]\,\rho_{-1}(i), \qquad 1 \le m \le 7,\; 0 \le i \le 10,$$
or in vector notation
$$\rho'_m = \nu_m\,\rho_2 + [1 - \nu_m]\,\rho_{-1}, \qquad 1 \le m \le 7.$$
Here, $\nu_m$ is the interpolating weight for subframe $m$. The interpolated lags $\{\rho'_m(i)\}$ are subsequently converted to the short term predictor filter coefficients $\{a'_m(i)\}$.
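A minimal sketch of this interpolation and of the conversion from autocorrelation lags back to predictor coefficients follows. The text does not say which conversion algorithm is used; the Levinson-Durbin recursion below is a standard choice and is an assumption here, as are the function names and the sign convention $A(z) = 1 + \sum_i a(i) z^{-i}$.

```python
import numpy as np

def interpolate_autocorr(rho_prev, rho_cur, nu):
    """rho'_m = nu * rho_2 + (1 - nu) * rho_-1 for one subframe weight nu."""
    rho_prev = np.asarray(rho_prev, dtype=float)
    rho_cur = np.asarray(rho_cur, dtype=float)
    return nu * rho_cur + (1.0 - nu) * rho_prev

def levinson_durbin(rho):
    """Convert lags rho[0..p] to prediction-error filter taps a[0..p], a[0] = 1."""
    rho = np.asarray(rho, dtype=float)
    p = len(rho) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = rho[0]  # prediction error energy of the order-0 predictor
    for i in range(1, p + 1):
        # Correlation of the current residual with lag i.
        acc = rho[i] + np.dot(a[1:i], rho[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a
```

In use, a subframe's coefficients would be obtained as `levinson_durbin(interpolate_autocorr(rho_prev, rho_cur, nu_m))`, with `nu_m` taken from a table that is not given in this text.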
The choice of interpolating weights affects voice quality in this mode significantly. For this reason, they must be determined carefully. These interpolating weights $\nu_m$ have been determined for subframe $m$ by minimizing the mean square error between the actual short term power spectral envelope $S_{m,J}(\omega)$ and the interpolated short term power spectral envelope $S'_{m,J}(\omega)$ over all speech frames $J$ of a very large speech database. In other words, $\nu_m$ is determined by minimizing
$$E_m = \sum_J \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| S_{m,J}(\omega) - S'_{m,J}(\omega) \right|^2 d\omega.$$
If the actual autocorrelation coefficients for subframe $m$ in frame $J$ are denoted by $\{\rho_{m,J}(k)\}$, then by definition
$$S_{m,J}(\omega) = \sum_k \rho_{m,J}(k)\, e^{-j\omega k}, \qquad S'_{m,J}(\omega) = \sum_k \rho'_{m,J}(k)\, e^{-j\omega k}.$$
Substituting the above equations into the preceding equation, it can be shown that minimizing $E_m$ is equivalent to minimizing $E'_m$, where $E'_m$ is given by
$$E'_m = \sum_J \sum_k \left[ \rho_{m,J}(k) - \rho'_{m,J}(k) \right]^2,$$
or in vector notation
$$E'_m = \sum_J \left\| \rho_{m,J} - \rho'_{m,J} \right\|^2,$$
where $\|\cdot\|$ represents the vector norm. Substituting $\rho'_{m,J} = \nu_m\,\rho_{2,J} + [1 - \nu_m]\,\rho_{-1,J}$ into the above equation, differentiating with respect to $\nu_m$ and setting it to zero results in
$$\nu_m = \frac{\sum_J \langle x_J, y_{m,J} \rangle}{\sum_J \| x_J \|^2},$$
where $x_J = \rho_{2,J} - \rho_{-1,J}$, $y_{m,J} = \rho_{m,J} - \rho_{-1,J}$, and $\langle x_J, y_{m,J} \rangle$ is the dot product between vectors $x_J$ and $y_{m,J}$. The values of $\nu_m$ calculated by the above method using a very large speech database are further fine tuned by listening tests.
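The closed-form weight is an ordinary one-parameter least squares fit, which the following sketch makes concrete. The function name and the array layout (one row of autocorrelation lags per training frame) are assumptions made for illustration; the listening-test fine tuning is of course not modeled.

```python
import numpy as np

def fit_interp_weight(rho_prev, rho_cur, rho_actual):
    """Least squares weight nu_m for one subframe position.

    Each argument has shape (num_frames, num_lags): lags from the previous
    frame, lags from the current frame, and the actual subframe lags.
    """
    rho_prev = np.asarray(rho_prev, dtype=float)
    x = np.asarray(rho_cur, dtype=float) - rho_prev
    y = np.asarray(rho_actual, dtype=float) - rho_prev
    # nu = sum_J <x_J, y_J> / sum_J ||x_J||^2
    return float(np.sum(x * y) / np.sum(x * x))
```

If the actual lags were exactly a fixed blend of the two frame-level sets, the fit would return that blend weight, which is a convenient sanity check.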
The target vector $t_{ac}$ for the adaptive codebook search is related to the speech vector $s$ in each subframe by
$$s = H\,t_{ac} + z.$$
Here, $H$ is the square lower triangular Toeplitz matrix whose first column contains the impulse response of the interpolated short term predictor $\{a'_m(i)\}$ for the subframe $m$, and $z$ is the vector containing its zero input response. The target vector $t_{ac}$ is most easily calculated by subtracting the zero input response $z$ from the speech vector $s$ and filtering the difference by the inverse short term predictor with zero initial states.
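The relation $s = H\,t_{ac} + z$ can be checked numerically with a small sketch. Here the lower triangular Toeplitz matrix is built explicitly and the system is solved directly, which is mathematically equivalent to the inverse filtering described above but is not claimed to be how the encoder implements it; the function name and argument layout are hypothetical.

```python
import numpy as np

def target_vector(s, h, z):
    """Recover t_ac from s = H t_ac + z.

    s: speech vector for the subframe
    h: impulse response of the synthesis filter (at least len(s) samples, h[0] != 0)
    z: zero input response of the same filter
    """
    s = np.asarray(s, dtype=float)
    h = np.asarray(h, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(s)
    H = np.zeros((n, n))
    for col in range(n):
        # Each column is the impulse response shifted down by `col` samples.
        H[col:, col] = h[:n - col]
    return np.linalg.solve(H, s - z)
```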
The adaptive codebook search in adaptive codebooks 3506 and 3507 employs a spectrally weighted mean square error $\xi_i$ to measure the distance between a candidate vector $r_i$ and the target vector $t_{ac}$, as given by
$$\xi_i = (t_{ac} - \mu_i r_i)^T\, W\, (t_{ac} - \mu_i r_i).$$
Here, $\mu_i$ is the associated gain and $W$ is the spectral weighting matrix. $W$ is a positive definite symmetric Toeplitz matrix that is derived from the truncated impulse response of the weighted short term predictor with filter coefficients $\{a'_m(i)\,\gamma^i\}$. The weighting factor $\gamma$ is 0.8. Substituting for the optimum $\mu_i$ in the above expression, the distortion term can be rewritten as
$$\xi_i = t_{ac}^T W\, t_{ac} - \frac{[\rho_i]^2}{e_i},$$
where $\rho_i$ is the correlation term $t_{ac}^T W r_i$ and $e_i$ is the energy term $r_i^T W r_i$. Only those candidates are considered that have a positive correlation. The best candidate vectors are the ones that have positive correlations and the highest values of $[\rho_i]^2 / e_i$.
The candidate vectors $r_i$ correspond to different pitch delays. These pitch delays in samples lie in the range [20, 146]. Fractional pitch delays are possible, but the fractional part $f$ is restricted to be either 0.00, 0.25, 0.50 or 0.75. The candidate vector corresponding to an integer delay $L$ is simply read from the adaptive codebook, which is a collection of the past excitation samples. For a mixed (integer plus fraction) delay $L + f$, the portion of the adaptive codebook centered around the section corresponding to the integer delay $L$ is filtered by a polyphase filter corresponding to fraction $f$. Incomplete candidate vectors corresponding to low delay values less than a subframe length are completed in the same manner as suggested by J. Campbell et al., supra. The polyphase filter coefficients are derived from a prototype low pass filter designed to have good passband as well as good stopband characteristics. Each polyphase filter has 8 taps.
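The selection rule (positive correlation, largest squared correlation over energy) can be sketched as a brute-force loop. This is illustrative only: the function name is hypothetical, candidates are assumed to be supplied as already-constructed vectors, and the optimum gain is taken as the unquantized least squares value, correlation divided by energy.

```python
import numpy as np

def search_adaptive_codebook(t_ac, candidates, W):
    """Return (best_index, best_gain); index is None if no positive correlation."""
    t_ac = np.asarray(t_ac, dtype=float)
    W = np.asarray(W, dtype=float)
    Wt = W @ t_ac  # reused for every candidate; W is symmetric
    best_index, best_gain, best_metric = None, 0.0, 0.0
    for i, r in enumerate(candidates):
        r = np.asarray(r, dtype=float)
        corr = float(r @ Wt)          # t_ac^T W r_i
        if corr <= 0.0:
            continue                  # only positive correlations qualify
        energy = float(r @ W @ r)     # r_i^T W r_i
        metric = corr * corr / energy
        if metric > best_metric:
            best_index, best_gain, best_metric = i, corr / energy, metric
    return best_index, best_gain
```

Maximizing the metric is equivalent to minimizing the distortion, because the first term of the rewritten distortion does not depend on the candidate.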
The adaptive codebook search does not search all candidate vectors. For the first 3 subframes, a 5-bit search range is determined by the second quantized open loop pitch estimate $P'_{-1}$ of the previous 40 ms frame and the first quantized open loop pitch estimate $P'_1$ of the current 40 ms frame. If the previous mode were B, then the value of $P'_{-1}$ is taken to be the last subframe pitch delay in the previous frame. For the last 4 subframes, this 5-bit search range is determined by the second quantized open loop pitch estimate $P'_2$ of the current 40 ms frame and the first quantized open loop pitch estimate $P'_1$ of the current 40 ms frame. For the first 3 subframes, this 5-bit search range is split into two 4-bit ranges, with each range centered around $P'_{-1}$ and $P'_1$. If these two 4-bit ranges overlap, then a single 5-bit range is used which is centered around $\{P'_{-1} + P'_1\}/2$. Similarly, for the last 4 subframes, this 5-bit search range is split into two 4-bit ranges, with each range centered around $P'_1$ and $P'_2$. If these two 4-bit ranges overlap, then a single 5-bit range is used which is centered around $\{P'_1 + P'_2\}/2$.
The search range selection also determines what fractional resolution is needed for the closed loop pitch. This desired fractional resolution is determined directly from the quantized open loop pitch estimates $P'_{-1}$ and $P'_1$ for the first 3 subframes and from $P'_1$ and $P'_2$ for the last 4 subframes. If the two determining open loop pitch estimates are within 4 integer delays of each other, resulting in a single 5-bit search range, only 8 integer delays centered around the mid-point are searched, but the fractional pitch portion $f$ can assume values of 0.00, 0.25, 0.50, or 0.75, and these are therefore also searched. Thus 3 bits are used to encode the integer portion while 2 bits are used to encode the fractional portion of the closed loop pitch. If the two determining open loop pitch estimates are within 8 integer delays of each other, resulting in a single 5-bit search range, only 16 integer delays centered around the mid-point are searched, but the fractional pitch portion $f$ can assume values of 0.0 or 0.5, and these are therefore also searched. Thus 4 bits are used to encode the integer portion while 1 bit is used to encode the fractional portion of the closed loop pitch. If the two determining open loop pitch estimates are more than 8 integer delays apart, only integer delays, i.e., $f = 0.0$ only, are searched in either the single 5-bit search range or the two 4-bit search ranges determined. Thus all 5 bits are spent in encoding the integer portion of the closed loop pitch.
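The trade-off between search range and resolution always spends the same 5 bits, as the following sketch shows. The function name and the returned dictionary are hypothetical, and "within 4" and "within 8" are assumed to be inclusive comparisons, which the text does not state explicitly.

```python
def pitch_resolution(p_a, p_b):
    """Bit split for the closed loop pitch, given the two determining
    quantized open loop pitch estimates (in integer sample delays)."""
    distance = abs(p_a - p_b)
    if distance <= 4:
        # Close estimates: narrow integer range, quarter-sample resolution.
        return {"integer_delays": 8, "fractions": (0.0, 0.25, 0.5, 0.75),
                "integer_bits": 3, "fraction_bits": 2}
    if distance <= 8:
        # Moderately close: wider range, half-sample resolution.
        return {"integer_delays": 16, "fractions": (0.0, 0.5),
                "integer_bits": 4, "fraction_bits": 1}
    # Far apart: integer delays only over the full 5-bit range(s).
    return {"integer_delays": 32, "fractions": (0.0,),
            "integer_bits": 5, "fraction_bits": 0}
```

In every branch the number of integer delays times the number of fractions is 32, which is what lets a single 5-bit field carry the closed loop pitch.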
The search complexity may be reduced in the case of fractional pitch delays by first searching for the optimum integer delay and searching for the optimum fractional pitch delay only in its neighborhood. One of the 5-bit indices, the all zero index, is reserved for the all zero adaptive codebook vector. This is accommodated by trimming the 5-bit or 32 pitch delay search range to a 31 pitch delay search range. As indicated before, the search is restricted to only positive correlations, and the all zero index is chosen if no such positive correlation is found. The adaptive codebook gain is determined after the search by quantizing the ratio of the optimum correlation to the optimum energy using a non-uniform 3-bit quantizer. This 3-bit quantizer only has positive gain values in it since only positive gains are possible.
Since delayed decision is employed, the adaptive codebook search produces the two best pitch delay or lag candidates in all subframes. Furthermore, for subframes two to six, this has to be repeated for the two best target vectors produced by the two best sets of excitation model parameters derived for the previous subframes in the current frame. This results in two best lag candidates and the associated two adaptive codebook gains for subframe one, and in four best lag candidates and the associated four adaptive codebook gains for subframes two to six at the end of the search process. In each case, the target vector for the fixed codebook is derived by subtracting the scaled adaptive codebook vector from the target for the adaptive codebook search, i.e., $t_{sc} = t_{ac} - \mu_{opt}\, r_{opt}$, where $r_{opt}$ is the selected adaptive codebook vector and $\mu_{opt}$ is the associated adaptive codebook gain.
In mode A, the fixed codebook consists of general excitation pulse shapes constructed from the discrete sinc and cosc functions. The sinc function is defined as
$$\mathrm{sinc}(n) = \frac{\sin(\pi n)}{\pi n}, \quad n \ne 0; \qquad \mathrm{sinc}(0) = 1, \quad n = 0,$$
and the cosc function is defined as
$$\mathrm{cosc}(n) = \frac{1 - \cos(\pi n)}{\pi n}, \quad n \ne 0; \qquad \mathrm{cosc}(0) = 0, \quad n = 0.$$
With these definitions in mind, the generalized excitation pulse shapes are constructed as follows:
$$z_1(n) = A\,\mathrm{sinc}(n) + B\,\mathrm{cosc}(n+1),$$
$$z_{-1}(n) = A\,\mathrm{sinc}(n) - B\,\mathrm{cosc}(n-1).$$
The weights $A$ and $B$ are chosen to be 0.866 and 0.5 respectively. With the sinc and cosc functions time aligned, they correspond to what is known as zinc basis functions $z_0(n)$. Informal listening tests show that time-shifted pulse shapes improve voice quality of the synthesized speech.
The fixed codebook for mode A consists of 2 parts, each having 45 vectors. The first part consists of the pulse shape $z_{-1}(n-45)$ and is 90 samples long. The ith vector is simply the vector that starts from the ith codebook entry. The second part consists of the pulse shape $z_1(n-45)$ and is 90 samples long. Here again, the ith vector is simply the vector that starts from the ith codebook entry. Both codebooks are further trimmed to reduce all small values, especially near the beginning and end of both codebooks, to zero. In addition, we note that every even sample in either codebook is identical to zero by definition. All this contributes to making the codebooks very sparse. In addition, we note that both codebooks are overlapping, with adjacent vectors having all but one entry in common.
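The pulse shapes and the overlapping structure of the two codebook parts can be reproduced with a short sketch. It assumes that each code vector is 46 samples long (the mode A subframe length) and omits the trimming of small values, whose threshold is not given; the helper names are hypothetical.

```python
import numpy as np

A, B = 0.866, 0.5
VECTOR_LEN = 46  # assumed: one mode A subframe of 46 samples

def sinc(n):
    n = np.asarray(n, dtype=float)
    safe = np.where(n == 0, 1.0, n)  # avoid division by zero at n = 0
    return np.where(n == 0, 1.0, np.sin(np.pi * safe) / (np.pi * safe))

def cosc(n):
    n = np.asarray(n, dtype=float)
    safe = np.where(n == 0, 1.0, n)
    return np.where(n == 0, 0.0, (1.0 - np.cos(np.pi * safe)) / (np.pi * safe))

def z_plus(n):
    """z_1(n) = A sinc(n) + B cosc(n + 1)."""
    n = np.asarray(n)
    return A * sinc(n) + B * cosc(n + 1)

def z_minus(n):
    """z_-1(n) = A sinc(n) - B cosc(n - 1)."""
    n = np.asarray(n)
    return A * sinc(n) - B * cosc(n - 1)

def build_part(pulse, length=90, num_vectors=45, vec_len=VECTOR_LEN):
    """Return the 90-sample shape pulse(n - 45) and its 45 overlapping vectors."""
    samples = pulse(np.arange(length) - 45)
    vectors = [samples[i:i + vec_len] for i in range(num_vectors)]
    return samples, vectors
```

Because vector i is just a window starting at entry i, neighbouring vectors share all but one sample, and every even-indexed entry of the 90-sample shape evaluates to (numerically) zero, which is the sparsity the text refers to.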
The overlapping nature and the sparsity of the codebooks are exploited in the codebook search, which uses the same distortion measure as in the adaptive codebook search. This measure calculates the distance between the fixed codebook target vector $t_{sc}$ and every candidate fixed codebook vector $c_i$ as
$$\xi_i = (t_{sc} - \mu_i c_i)^T\, W\, (t_{sc} - \mu_i c_i),$$
where $W$ is the same spectral weighting matrix used in the adaptive codebook search and $\mu_i$ is the optimum value of the gain for that ith codebook vector. Once the optimum vector has been selected for each codebook, the codebook gain magnitude is quantized outside the search loop by quantizing the ratio of the optimum correlation to the optimum energy by a non-uniform 4-bit quantizer in odd subframes and a 3-bit differential non-uniform quantizer in even subframes. Both quantizers have zero gain as one of their entries. The optimal distortion for each codebook is then calculated and the optimal codebook is selected.
The fixed codebook index for each subframe is in the range 0-44 if the optimal codebook is from $z_{-1}(n-45)$ but is mapped to the range 45-89 if the optimal codebook is from $z_1(n-45)$. By combining the fixed codebook indices of two consecutive subframes, I and J, as 90I+J, we can encode the resulting index using 13 bits. This is done for subframes 1 and 2, 3 and 4, 5 and 6. For subframe 7, the fixed codebook index is simply encoded using 7 bits. The fixed codebook gain sign is encoded using 1 bit in all 7 subframes. The fixed codebook gain magnitude is encoded using 4 bits in subframes 1, 3, 5, 7 and using 3 bits in subframes 2, 4, 6.
Due to delayed decision, there are two target vectors $t_{sc}$ for the fixed codebook search in the first subframe, corresponding to the two best lag candidates and their corresponding gains provided by the closed loop adaptive codebook search. For subframes two to seven, there are four target vectors corresponding to the two best sets of excitation model parameters determined for the previous subframes so far and to the two best lag candidates and their gains provided by the adaptive codebook search in the current subframe. The fixed codebook search is therefore carried out two times in subframe one and four times in subframes two to six. But the complexity does not increase in a proportionate manner because in each subframe, the energy terms $c_i^T W c_i$ are the same. It is only the correlation terms $t_{sc}^T W c_i$ that are different in each of the two searches for subframe one and in each of the four searches in subframes two to seven.
Delayed decision search helps to smooth the pitch and gain contours in a CELP coder. Delayed decision is employed in this invention in such a way that the overall codec delay is not increased. Thus, in every subframe, the closed loop pitch search produces the M best estimates. For each of these M best estimates and N best previous subframe parameters, MN optimum pitch gain indices, fixed codebook indices, fixed codebook gain indices, and fixed codebook gain signs are derived. At the end of the subframe, these MN solutions are pruned to the L best using cumulative SNR for the current 40 ms frame as the criterion. For the first subframe, M=2, N=1 and L=2 are used. For the last subframe, M=2, N=2 and L=1 are used. For all other subframes, M=2, N=2 and L=2 are used. The delayed decision approach is particularly effective in the transition of voiced to unvoiced and unvoiced to voiced regions. This delayed decision approach results in N times the complexity of the closed loop pitch search but much less than MN times the complexity of the fixed codebook search in each subframe. This is because only the correlation terms need to be calculated MN times for the fixed codebook in each subframe, but the energy terms need to be calculated only once.
The optimal parameters for each subframe are determined only at the end of the 40 ms frame using traceback. The pruning of MN solutions to L solutions is stored for each subframe to enable the traceback. An example of how traceback is accomplished is shown in FIG. 20. The dark, thick line indicates the optimal path obtained by traceback after the last subframe.
In mode B, the quantization indices of both sets of short term predictor parameters are transmitted but not the open loop pitch estimates. The 40-msec speech frame is divided into five subframes, each 8 msec long. As in mode A, an interpolated set of filter coefficients is used to derive the pitch index, pitch gain index, fixed codebook index, and fixed codebook gain index in a closed loop analysis by synthesis fashion. The closed loop pitch search is unrestricted in its range, and only integer pitch delays are searched. The fixed codebook is a multi-innovation codebook with zinc pulse sections as well as Hadamard sections. The zinc pulse sections are well suited for transient segments while the Hadamard sections are better suited for unvoiced segments. The fixed codebook search procedure is modified to take advantage of this.
The higher complexity involved as well as the higher bit rate of the short term predictor parameters in mode B is compensated by a slower update of the excitation model parameters.
For mode B, the 40 ms speech frame is divided into five subframes. Each subframe is of length 8 ms or sixty-four samples. The excitation model parameters in each subframe are the adaptive codebook index, the adaptive codebook gain, the fixed codebook index, and the fixed codebook gain. There is no fixed codebook gain sign since it is always positive. Best estimates of these parameters are determined using an analysis by synthesis method in each subframe. The overall best estimate is determined at the end of the 40 ms frame using a delayed decision approach similar to mode A.
The short term predictor parameters or linear prediction filter parameters are interpolated from subframe to subframe in the autocorrelation lag domain. The normalized autocorrelation lags derived from the quantized filter coefficients for the second linear prediction analysis window are denoted as $\{\rho_{-1}(i)\}$ for the previous 40 ms frame. The corresponding lags for the first and second linear prediction analysis windows for the current 40 ms frame are denoted by $\{\rho_1(i)\}$ and $\{\rho_2(i)\}$ respectively. The normalization ensures that $\rho_{-1}(0) = \rho_1(0) = \rho_2(0) = 1.0$. The interpolated autocorrelation lags $\{\rho'_m(i)\}$ are given by
$$\rho'_m(i) = \alpha_m\,\rho_{-1}(i) + \beta_m\,\rho_1(i) + [1 - \alpha_m - \beta_m]\,\rho_2(i), \qquad 1 \le m \le 5,\; 0 \le i \le 10,$$
or in vector notation
$$\rho'_m = \alpha_m\,\rho_{-1} + \beta_m\,\rho_1 + [1 - \alpha_m - \beta_m]\,\rho_2, \qquad 1 \le m \le 5.$$
Here $\alpha_m$ and $\beta_m$ are the interpolating weights for subframe $m$. The interpolated lags $\{\rho'_m(i)\}$ are subsequently converted to the short term predictor filter coefficients $\{a'_m(i)\}$.
The choice of interpolating weights is not as critical in this mode as it is in mode A. Nevertheless, they have been determined using the same objective criteria as in mode A and fine tuning them by listening tests. The values of $\alpha_m$ and $\beta_m$ which minimize the objective criterion $E_m$ can be shown to be
$$\alpha_m = \frac{\gamma_m C - \delta_m B}{C^2 - AB}, \qquad \beta_m = \frac{\delta_m C - \gamma_m A}{C^2 - AB},$$
where
$$A = \sum_J \left\| \rho_{-1,J} - \rho_{2,J} \right\|^2,$$
$$B = \sum_J \left\| \rho_{1,J} - \rho_{2,J} \right\|^2,$$
$$C = \sum_J \langle \rho_{-1,J} - \rho_{2,J},\; \rho_{1,J} - \rho_{2,J} \rangle,$$
$$\delta_m = \sum_J \langle \rho_{-1,J} - \rho_{2,J},\; \rho_{m,J} - \rho_{2,J} \rangle,$$
$$\gamma_m = \sum_J \langle \rho_{m,J} - \rho_{2,J},\; \rho_{1,J} - \rho_{2,J} \rangle.$$
As before, $\rho_{-1,J}$ denotes the autocorrelation lag vector derived from the quantized filter coefficients of the second linear prediction analysis window of frame $J-1$, $\rho_{1,J}$ denotes the autocorrelation lag vector derived from the quantized filter coefficients of the first linear prediction analysis window of frame $J$, $\rho_{2,J}$ denotes the autocorrelation lag vector derived from the quantized filter coefficients of the second linear prediction analysis window of frame $J$, and $\rho_{m,J}$ denotes the actual autocorrelation lag vector derived from the speech samples in subframe $m$ of frame $J$.
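The two-weight solution is the closed form of a 2x2 least squares problem, which can be verified with a short sketch. The function name and the array layout (one row of lags per training frame) are assumptions; writing $u = \rho_{-1} - \rho_2$, $v = \rho_1 - \rho_2$ and $y = \rho_m - \rho_2$, the sums $A$, $B$, $C$, $\delta_m$ and $\gamma_m$ are the entries of the normal equations.

```python
import numpy as np

def fit_mode_b_weights(rho_prev, rho_1, rho_2, rho_actual):
    """Return (alpha_m, beta_m) for one subframe position.

    Each argument has shape (num_frames, num_lags).
    """
    rho_2 = np.asarray(rho_2, dtype=float)
    u = np.asarray(rho_prev, dtype=float) - rho_2
    v = np.asarray(rho_1, dtype=float) - rho_2
    y = np.asarray(rho_actual, dtype=float) - rho_2
    A = np.sum(u * u)
    B = np.sum(v * v)
    C = np.sum(u * v)
    delta = np.sum(u * y)
    gamma = np.sum(y * v)
    den = C * C - A * B
    alpha = (gamma * C - delta * B) / den
    beta = (delta * C - gamma * A) / den
    return float(alpha), float(beta)
```

When the actual lags are an exact blend of the three frame-level sets, the function recovers the blend weights, which confirms the signs in the closed form.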
The adaptive codebook search in mode B is similar to that in mode A in that the target vector for the search is derived in the same manner and the distortion measure used in the search is the same. However, there are some differences. Only integer pitch delays in the range [20, 146] are searched, and no fractional pitch delays are searched. As in mode A, only positive correlations are considered in the search, and the all zero index corresponding to an all zero vector is assigned if no positive correlations are found. The optimal adaptive codebook index is encoded using 7 bits. The adaptive codebook gain, which is guaranteed to be positive, is quantized outside the search loop using a 3-bit non-uniform quantizer. This quantizer is different from that used in mode A.
As in mode A, delayed decision is employed so that the adaptive codebook search produces the two best pitch delay candidates in all subframes. In addition, in subframes two to five, this has to be repeated for the two best target vectors produced by the two best sets of excitation model parameters derived for the previous subframes, resulting in 4 sets of adaptive codebook indices and associated gains at the end of the subframe. In each case, the target vector for the fixed codebook search is derived by subtracting the scaled adaptive codebook vector from the target of the adaptive codebook vector.
The fixed codebook in mode B is a 9-bit multi-innovation codebook with three sections. The first is a Hadamard vector sum section and the second and third sections are related to the generalized pulse shapes z_{-1}(n) and z_1(n) respectively. These pulse shapes have been defined earlier. The first section of this codebook and the associated search procedure are based on the publication by D. Lin, "Ultra-Fast CELP Coding Using Multi-Codebook Innovations", ICASSP92. We note that in this section, there are 256 innovation vectors and the search procedure guarantees a positive gain. The second and third sections have 64 innovation vectors each and their search procedure can produce both positive as well as negative gains.
One component of the multi-innovation codebook is the deterministic vector-sum code constructed from the Hadamard matrix H_m. The code vector of the vector-sum code as used in this invention is expressed as
$$u_i(n) = \sum_{m=1}^{4} \theta_{im}\, v_m(n), \qquad 0 \le i \le 15,$$
where the basis vectors v_m(n) are obtained from the rows of the Hadamard-Sylvester matrix and θ_{im} = ±1. The basis vectors are selected based on a sequency partition of the Hadamard matrix. The code vectors of the Hadamard vector-sum codebook are binary valued code sequences. Compared to previously considered algebraic codes, the Hadamard vector-sum codes are constructed to possess more ideal frequency and phase characteristics. This is due to the basis vector partition scheme used in this invention for the Hadamard matrix, which can be interpreted as uniform sampling of the sequency ordered Hadamard matrix row vectors. In contrast, non-uniform sampling methods have produced inferior results.
The second section of the multi-innovation codebook consists of the pulse shape z_{-1}(n-63) and is 127 samples long. The ith vector of this section is simply the vector that starts from the ith entry of this section. The third section consists of the pulse shape z_1(n-63) and is 127 samples long. Here again, the ith vector of this section is simply the vector that starts from the ith entry of this section. Both the second and third sections enjoy the advantages of an overlapping nature and sparsity that can be exploited by the search procedure just as in the fixed codebook in mode A. As indicated earlier, the search procedure is not restricted to positive correlations and therefore both positive as well as negative gains can result in the second and third sections.
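The overlapping structure means that a 127-sample section holds 64 codebook vectors of 64 samples each (64 + 63 = 127), with vector i starting at entry i. The Python sketch below illustrates only that indexing together with a signed-gain selection rule based on the ratio of squared correlation to energy. It is a simplified illustration under stated assumptions: the real search operates on filtered codebook vectors and exploits overlap and sparsity for speed, neither of which is modeled here, and all names are hypothetical.

```python
SUBFRAME_LEN = 64  # samples per subframe (8 ms at 8 kHz)


def search_overlapping_section(section, target):
    """Return (index, gain) of the best vector in an overlapping section.

    section: 127-sample pulse-shape section; vector i is section[i:i+64].
    target:  64-sample target vector.
    The gain is correlation / energy and may be negative.
    """
    if len(section) != 2 * SUBFRAME_LEN - 1 or len(target) != SUBFRAME_LEN:
        raise ValueError("expected a 127-sample section and a 64-sample target")
    best_index, best_gain, best_score = 0, 0.0, 0.0
    for i in range(SUBFRAME_LEN):
        vec = section[i:i + SUBFRAME_LEN]
        energy = sum(x * x for x in vec)
        if energy == 0.0:
            continue
        corr = sum(x * t for x, t in zip(vec, target))
        score = corr * corr / energy  # sign of corr is not restricted
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain
```

Because the score uses the squared correlation, a vector that matches the target with inverted polarity is selected with a negative gain, which is the behavior described for the second and third sections.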
Once the optimum vector has been selected for each section, the codebook gain magnitude is quantized outside the search loop by quantizing the ratio of the optimum correlation to the optimum energy by a non-uniform 4-bit quantizer in all subframes. This quantizer is different for the first section while the second and third sections use a common quantizer. All quantizers have zero gain as one of their entries. The optimal distortion for each section is then calculated and the optimal section is finally selected.
The fixed codebook index for each subframe is in the range 0-255 if the optimal codebook vector is from the Hadamard section. If it is from the z_{-1}(n-63) section and the gain sign is positive, it is mapped to the range 256-319. If it is from the z_{-1}(n-63) section and the gain sign is negative, it is mapped to the range 320-383. If it is from the z_1(n-63) section and the gain sign is positive, it is mapped to the range 384-447. If it is from the z_1(n-63) section and the gain sign is negative, it is mapped to the range 448-511. The resulting index can be encoded using 9 bits. The fixed codebook gain magnitude is encoded using 4 bits in all subframes.
For mode C, the 40 ms frame is divided into five subframes as in mode B. Each subframe is of length 8 ms or 64 samples. The excitation model parameters in each subframe are the adaptive codebook index, the adaptive codebook gain, the fixed codebook index, and 2 fixed codebook gains, one fixed codebook gain being associated with each half of the subframe. Both are guaranteed to be positive and therefore there is no sign information associated with them. As in both modes A and B, best estimates of these parameters are determined using an analysis by synthesis method in each subframe. The overall best estimate is determined at the end of the 40 ms frame using a delayed decision method identical to that used in modes A and B.
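The mode B fixed codebook index described above folds the section and the gain sign into a single 9-bit value. The Python sketch below shows one way to express that mapping and its inverse. The range boundaries come from the text. The section labels and the assumption that the position within each 64-entry range equals the vector index inside the section are illustrative and not stated in the specification.

```python
# Base offsets of the two pulse-shape sections within the 9-bit index space.
SECTION_BASE = {"z_minus1": 256, "z_plus1": 384}


def encode_fixed_index(section, local_index, gain_negative=False):
    """Map (section, vector index, gain sign) to a mode B index in 0..511."""
    if section == "hadamard":
        if not 0 <= local_index < 256:
            raise ValueError("Hadamard section index must be in 0..255")
        return local_index  # gain is always positive in this section
    if not 0 <= local_index < 64:
        raise ValueError("pulse-shape section index must be in 0..63")
    return SECTION_BASE[section] + (64 if gain_negative else 0) + local_index


def decode_fixed_index(index):
    """Inverse mapping: return (section, vector index, gain_negative)."""
    if not 0 <= index < 512:
        raise ValueError("index must fit in 9 bits")
    if index < 256:
        return "hadamard", index, False
    section = "z_minus1" if index < 384 else "z_plus1"
    offset = index - SECTION_BASE[section]
    return section, offset % 64, offset >= 64
```

For example, `encode_fixed_index("z_minus1", 5, gain_negative=True)` falls in the 320-383 range, and `decode_fixed_index` recovers the same three fields.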
The short term predictor parameters or linear prediction filter parameters are interpolated from subframe to subframe in the autocorrelation lag domain in exactly the same manner as in mode B. However, the interpolating weights α_m and β_m are different from those used in mode B. They are obtained by using the procedure described for mode B but using various background noise sources as training material.
The adaptive codebook search in mode C is identical to that in mode B except that both positive as well as negative correlations are allowed in the search. The optimal adaptive codebook index is encoded using 7 bits. The adaptive codebook gain, which could be either positive or negative, is quantized outside the search loop using a 3-bit non-uniform quantizer. This quantizer is different from that used in either mode A or mode B in that it has a more restricted range and may have negative values as well. By allowing both positive as well as negative correlations in the search loop and by having a quantizer with a restricted dynamic range, periodic artifacts in the synthesized background noise due to the adaptive codebook are reduced considerably. In fact, the adaptive codebook now behaves more like another fixed codebook.
As in mode A and mode B, delayed decision is employed and the adaptive codebook search produces the two best candidates in all subframes. In addition, in subframes two to five, this has to be repeated for the two target vectors produced by the two best sets of excitation model parameters derived for the previous subframes, resulting in 4 sets of adaptive codebook indices and associated gains at the end of the subframe. In each case, the target vector for the fixed codebook search is derived by subtracting the scaled adaptive codebook vector from the target of the adaptive codebook vector.
The fixed codebook in mode C is an 8-bit multi-innovation codebook and is identical to the vector sum section in the mode B fixed multi-innovation codebook. The same search procedure described in the publication by D. Lin, "Ultra-Fast CELP Coding Using Multi-Codebook Innovations", ICASSP92, is used here. There are 256 codebook vectors and the search procedure guarantees a positive gain. The fixed codebook index is encoded using 8 bits.
Once the optimum codebook vector has been selected, the optimum correlation and optimum energy are calculated for the first half of the subframe as well as the second half of the subframe separately. The ratios of the correlation to the energy in both halves are quantized independently using a 5-bit non-uniform quantizer that has zero gain as one of its entries. The use of 2 gains per subframe ensures a smoother reproduction of the background noise. Due to the delayed decision, there are two sets of optimum fixed codebook indices and gains in subframe one and four sets in subframes two to five. The delayed decision approach in mode C is identical to that used in the other modes A and B. The optimal parameters for each subframe are determined at the end of the 40 ms frame using an identical procedure.
The bit allocation among various parameters is summarized in Figures 21A and 21B for mode A, Figure 22 for mode B, and Figure 23 for mode C. These parameters are packed by the packing circuitry 36 of Figure 3. These parameters are packed in the same sequence as they are tabulated in these Figures. Thus for mode A, using the same notation as in Figures 21A and 21B, they are packed into a 168 bit size packet every 40 ms in the following sequence: MODE1, LSP2, ACG1, ACG3, ACG4, ACG5, ACG7, ACG2, ACG6, PITCH1, PITCH2, ACI1, SIGN1, FCG1, ACI2, SIGN2, FCG2, ACI3, SIGN3, FCG3, ACI4, SIGN4, FCG4, ACI5, SIGN5, FCG5, ACI6, SIGN6, FCG6, ACI7, SIGN7, FCG7, FCI12, FCI34, FCI56, and FCI7. For mode B, using the same notation as in Figures 21A and 21B, the parameters are packed into a 168 bit size packet every 40 ms in the following sequence: MODE1, LSP2, ACG1, ACG2, ACG3, ACG4, ACG5, ACI1, FCG1, FCI1, ACI2, FCG2, FCI2, ACI3, FCG3, FCI3, ACI4, FCG4, FCI4, ACI5, FCG5, FCI5, LSP1, and MODE2. For mode C, using the same notation as in Figures 21A and 21B, they are packed into a 168 bit size packet every 40 ms in the following sequence: MODE1, LSP2, ACG1, ACG2, ACG3, ACG4, ACG5, ACI1, FCG2_1, FCI1, ACI2, FCG2_2, FCI2, ACI3, FCG2_3, FCI3, ACI4, FCG2_4, FCI4, ACI5, FCG2_5, FCI5, FCG1_1, FCG1_2, FCG1_3, FCG1_4, FCG1_5, and MODE2. The packing sequence in all three modes is designed to reduce the sensitivity of an error in the mode bits MODE1 and MODE2.
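As a consistency check on the mode C packing sequence, the field widths stated in the text can be summed against the 168-bit packet size. The Python sketch below does this arithmetic. The widths for the gains, the fixed codebook index, and the 26-bit LSF set are taken from the description. The 7-bit adaptive codebook index is an inference from the integer delay range [20,146] plus the all zero index (128 values), and the one-bit width of each mode flag follows from the single bit positions described for MODE1 and MODE2. The list name is hypothetical.

```python
# (field name, width in bits) in the mode C packing order given in the text.
MODE_C_FIELDS = [("MODE1", 1), ("LSP2", 26)]
MODE_C_FIELDS += [(f"ACG{k}", 3) for k in range(1, 6)]        # adaptive codebook gains
for k in range(1, 6):
    MODE_C_FIELDS += [
        (f"ACI{k}", 7),     # adaptive codebook index (7 bits is inferred)
        (f"FCG2_{k}", 5),   # fixed codebook gain, one half of subframe k
        (f"FCI{k}", 8),     # fixed codebook index
    ]
MODE_C_FIELDS += [(f"FCG1_{k}", 5) for k in range(1, 6)]      # gain for the other half
MODE_C_FIELDS.append(("MODE2", 1))

total_bits = sum(width for _, width in MODE_C_FIELDS)
total_bytes = total_bits // 8
bit_rate = total_bits * 1000 // 40   # bits per second for one packet every 40 ms
```

Under these assumptions the widths add up to exactly 168 bits, or 21 bytes, which corresponds to 4200 bits per second.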
The packing is done from the MSB or bit 7 to the LSB or bit 0, from byte 1 to byte 21. MODE1 occupies the MSB or bit 7 of byte 1. By testing this bit, we can determine whether the compressed speech belongs to mode A or not. If it is not mode A, we test the MODE2 bit that occupies the LSB or bit 0 of byte 21 to decide between mode B and mode C.
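The two-step mode test can be sketched in Python as follows. The bit positions (bit 7 of byte 1 and bit 0 of byte 21) come from the text. The text does not say which bit value denotes which mode, so the polarity is exposed as the hypothetical parameters `mode_a_value` and `mode_b_value`, and their defaults are arbitrary placeholders rather than values from the specification.

```python
PACKET_BYTES = 21  # 168 bits per 40 ms frame


def read_mode_bits(packet):
    """Return (MODE1, MODE2) from a 21-byte compressed speech packet."""
    if len(packet) != PACKET_BYTES:
        raise ValueError("expected a 21-byte (168-bit) packet")
    mode1 = (packet[0] >> 7) & 1   # MSB (bit 7) of byte 1
    mode2 = packet[20] & 1         # LSB (bit 0) of byte 21
    return mode1, mode2


def decode_mode(packet, mode_a_value=1, mode_b_value=1):
    """Classify a packet as mode 'A', 'B', or 'C'.

    mode_a_value and mode_b_value are assumed polarities, not taken
    from the specification.
    """
    mode1, mode2 = read_mode_bits(packet)
    if mode1 == mode_a_value:
        return "A"
    return "B" if mode2 == mode_b_value else "C"
```

Placing the two flags at opposite ends of the packet means that MODE2 only needs to be inspected when MODE1 has already ruled out mode A.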
The speech decoder 46 (FIG. 4) is shown in FIG. 24 and receives the compressed speech bitstream in the same form as put out by the speech encoder of FIG. 3. The parameters are unpacked after determining whether the received mode bits indicate a first mode (Mode C), a second mode (Mode B), or a third mode (Mode A). These are then used to synthesize the speech. Speech decoder 46 synthesizes the part of the signal corresponding to the frame, depending on the second set of filter coefficients, independently of the first set of filter coefficients and the first and second pitch estimates, when the frame is determined to be the first mode (Mode C); synthesizes the part of the signal corresponding to the frame, depending on the first and second sets of filter coefficients, independently of the first and second pitch estimates, when the frame is determined to be the second mode (Mode B); and synthesizes a part of the signal corresponding to the frame, depending on the second set of filter coefficients and the first and second pitch estimates, independently of the first set of filter coefficients, when the frame is determined to be the third mode (Mode A). In addition, the speech decoder receives a cyclic redundancy check (CRC) based bad frame indicator from the channel decoder 45 (FIG. 1). This bad frame indicator flag is used to trigger the bad frame error masking and error recovery sections (not shown) of the decoder. These can also be triggered by some built-in error detection schemes.
Speech decoder 46 tests the MSB or bit 7 of byte 1 to see if the compressed speech packet corresponds to mode A. Otherwise, the LSB or bit 0 of byte 21 is tested to see if the packet corresponds to mode B or mode C. Once the correct mode of the received compressed speech packet is determined, the parameters of the received speech frame are unpacked and used to synthesize the speech. In addition, the speech decoder receives a cyclic redundancy check (CRC) based bad frame indicator from the channel decoder 25 in Figure 1. This bad frame indicator flag is used to trigger the bad frame masking and error recovery portions of the speech decoder. These can also be triggered by some built-in error detection schemes.
In mode A, the received second set of line spectral frequency indices is used to reconstruct the quantized filter coefficients, which then are converted to autocorrelation lags. In each subframe, the autocorrelation lags are interpolated using the same weights as used in the encoder for mode A and then converted to short term predictor filter coefficients. The open loop pitch indices are converted to quantized open loop pitch values. In each subframe, these open loop values are used along with each received 5-bit adaptive codebook index to determine the pitch delay candidate. The adaptive codebook vector corresponding to this delay is determined from the adaptive codebook 103 in Figure 24. The adaptive codebook gain index for each subframe is used to obtain the adaptive codebook gain, which then is applied to the multiplier 104 to scale the adaptive codebook vector. The fixed codebook vector for each subframe is inferred from the fixed codebook 101 from the received fixed codebook index associated with that subframe, and this is scaled by the fixed codebook gain, obtained from the received fixed codebook gain index and the sign index for that subframe, by multiplier 102. Both the scaled adaptive codebook vector and the scaled fixed codebook vector are summed by summer 105 to produce an excitation signal which is enhanced by a pitch prefilter 106 as in I. A. Gerson and M. A. Jasiuk, supra. This excitation signal is used to drive the short term predictor 107, and the synthesized speech is subsequently further enhanced by a global pole-zero filter 108 with built in spectral tilt correction and energy normalization. At the end of each subframe, the adaptive codebook is updated by the excitation signal as indicated by the dotted line in Figure 24.
In mode B, both sets of line spectral frequency indices are used to reconstruct both the first and second sets of quantized filter coefficients, which subsequently are converted to autocorrelation lags. In each subframe, these autocorrelation lags are interpolated using exactly the same weights as used in the encoder in mode B and then converted to short term predictor coefficients. In each subframe, the received adaptive codebook index is used to derive the adaptive codebook vector from the adaptive codebook 103 and the received fixed codebook index is used to derive the fixed codebook vector from the fixed codebook 101. The adaptive codebook gain index and the fixed codebook gain index are used in each subframe to retrieve the adaptive codebook gain and the fixed codebook gain. The excitation vector is reconstructed by scaling the adaptive codebook vector by the adaptive codebook gain using multiplier 104, scaling the fixed codebook vector by the fixed codebook gain using multiplier 102, and summing them using summer 105. As in mode A, this is enhanced by the pitch prefilter 106 prior to synthesis by the short term predictor 107. The synthesized speech is further enhanced by the global pole-zero postfilter 108. At the end of each subframe, the adaptive codebook is updated by the excitation signal as indicated by the dotted line in Figure 24.
In mode C, the received second set of line spectral frequency indices is used to reconstruct the quantized filter coefficients, which then are converted to autocorrelation lags. In each subframe, the autocorrelation lags are interpolated using the same weights as used in the encoder for mode C and then converted to short term predictor filter coefficients. In each subframe, the received adaptive codebook index is used to derive the adaptive codebook vector from the adaptive codebook 103 and the received fixed codebook index is used to derive the fixed codebook vector from the fixed codebook 101. The adaptive codebook gain index and the fixed codebook gain indices are used in each subframe to retrieve the adaptive codebook gain and the fixed codebook gains for both halves of the subframe. The excitation vector is reconstructed by scaling the adaptive codebook vector by the adaptive codebook gain using multiplier 104, scaling the first half of the fixed codebook vector by the first fixed codebook gain using multiplier 102 and the second half of the fixed codebook vector by the second fixed codebook gain using multiplier 102, and summing the scaled adaptive and fixed codebook vectors using summer 105. As in modes A and B, this is enhanced by the pitch prefilter 106 prior to the synthesis by the short term predictor 107. The synthesized speech is further enhanced by the global pole-zero postfilter 108. The parameters of the pitch prefilter and global postfilter used in each mode are different and are tailored to each mode. At the end of each subframe, the adaptive codebook is updated by the excitation signal as indicated by the dotted line in Figure 24.
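The mode C excitation reconstruction differs from modes A and B only in that the fixed codebook vector is scaled by two gains, one per half of the subframe. The Python sketch below illustrates that step in isolation. It is a minimal illustration with hypothetical names and omits the codebook lookups, the pitch prefilter, the synthesis filter, the postfilter, and the adaptive codebook update.

```python
def mode_c_excitation(adaptive_vec, adaptive_gain,
                      fixed_vec, fixed_gain_first, fixed_gain_second):
    """Combine scaled adaptive and fixed codebook vectors for one subframe.

    The first half of the fixed codebook vector uses fixed_gain_first and
    the second half uses fixed_gain_second; the adaptive gain may be
    negative in mode C.
    """
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("codebook vectors must have the same length")
    half = len(fixed_vec) // 2
    excitation = []
    for n, (a, f) in enumerate(zip(adaptive_vec, fixed_vec)):
        fixed_gain = fixed_gain_first if n < half else fixed_gain_second
        excitation.append(adaptive_gain * a + fixed_gain * f)
    return excitation
```

In the decoder described above, a subframe would hold 64 samples, so each fixed codebook gain would cover 32 samples.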
As an alternative to the illustrated embodiment, the invention may be practiced with a shorter frame, such as a 22.5 ms frame, as shown in FIG. 25. With such a frame, it might be desirable to process only one LP analysis window per frame instead of the two LP analysis windows illustrated. The analysis window might begin after a duration Tb relative to the beginning of the current frame and extend into the next frame, where the window would end after a duration Te relative to the beginning of the next frame, where Te > Tb. In other words, the total duration of an analysis window could be longer than the duration of a frame, and two consecutive windows could, therefore, encompass a particular frame. Thus, a current frame could be analyzed by processing the analysis window for the current frame together with the analysis window for the previous frame.
Thus, the preferred communication system detects when noise is the predominant component of a signal frame and encodes a noise-predominated frame differently than a speech-predominated frame. This special encoding for noise avoids some of the typical artifacts produced when noise is encoded with a scheme optimized for speech. This special encoding allows improved voice quality in a low bit-rate codec system.
Additional advantages and modifications will readily occur to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific details, representative apparatus, and illustrative examples shown and described. Various modifications and variations can be made to the present invention without departing from the scope or spirit of the invention, and it is intended that the present invention cover the modifications and variations provided they come within the scope of the appended claims and their equivalents.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
FIG. 1 is a block diagram of a transmitter in a wireless communication system according to a preferred embodiment of the invention;
FIG. 2 is a block diagram of a receiver in a wireless communication system according to the preferred embodiment of the invention;
FIG. 3 is a block diagram of the encoder in the transmitter shown in FIG. 1;
FIG. 4 is a block diagram of the decoder in the receiver shown in FIG. 2;
FIG. 5A is a timing diagram showing the alignment of linear prediction analysis windows in the encoder shown in FIG. 3;
FIG. 5B is a timing diagram showing the alignment of pitch prediction analysis windows for open loop pitch prediction in the encoder shown in FIG. 3;
FIGS. 6A and 6B are a flowchart illustrating the 26-bit line spectral frequency vector quantization process performed by the encoder of FIG. 3;
FIG. 7 is a flowchart illustrating the operation of a pitch tracking algorithm;
FIG. 8 is a block diagram showing in more detail the open loop pitch estimation of the encoder shown in FIG. 3;
FIG. 9 is a flowchart illustrating the operation of the modified pitch tracking algorithm implemented by the open loop pitch estimation shown in FIG. 8;
FIG. 10 is a flowchart showing the processing performed by the mode determination module shown in FIG. 3;
FIG. 11 is a dataflow diagram showing a part of the processing of a step of determining spectral stationarity values shown in FIG. 10;
FIG. 12 is a dataflow diagram showing another part of the processing of the step of determining spectral stationarity values;
FIG. 13 is a dataflow diagram showing another part of the processing of the step of determining spectral stationarity values;
FIG. 14 is a dataflow diagram showing the processing of the step of determining pitch stationarity values shown in FIG. 10;
FIG. 15 is a dataflow diagram showing the processing of the step of generating zero crossing rate values shown in FIG. 10;
FIG. 16 is a dataflow diagram showing the processing of the step of determining level gradient values in FIG. 10;
FIG. 17 is a dataflow diagram showing the processing of the step of determining short-term energy values shown in FIG. 10;
FIGS. 18A, 18B and 18C are a flowchart of determining the mode based on the generated values as shown in FIG. 10;
FIG. 19 is a block diagram showing in more detail the implementation of the excitation modeling circuitry of the encoder shown in FIG. 3;
FIG. 20 is a diagram illustrating a processing of the encoder shown in FIG. 3;
FIGS. 21A and 21B are a chart of speech coder parameters for mode A;
FIG. 22 is a chart of speech coder parameters for mode B;
FIG. 23 is a chart of speech coder parameters for mode C;
FIG. 24 is a block diagram illustrating a processing of the speech decoder shown in FIG. 4; and
FIG. 25 is a timing diagram showing an alternative alignment of linear prediction analysis windows.
DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
FIG. 1 shows the transmitter of the preferred communication system. Analog-to-digital (A/D) converter 11 samples analog speech from a telephone handset at an 8 kHz rate, converts it to digital values, and supplies the digital values to the speech encoder 12. Channel encoder 13 further encodes the signal, as may be required in a digital cellular communications system, and supplies a resulting encoded bit stream to a modulator 14. Digital-to-analog (D/A) converter 15 converts the output of the modulator 14 to Phase Shift Keying (PSK) signals. Radio frequency (RF) up converter 16 amplifies and frequency multiplies the PSK signals and supplies the amplified signals to antenna 17.
A low-pass, antialiasing, filter (not shown) filters the analog speech signal input to A/D converter 11. A high-pass, second order biquad, filter (not shown) filters the digitized samples from A/D converter 11. The transfer function is
$$H_{HP}(z) = \frac{1 - 2z^{-1} + z^{-2}}{1 - 1.8891z^{-1} + 0.89503z^{-2}}$$
The high pass filter attenuates D.C. or hum contamination that may occur in the incoming speech signal.
FIG. 2 shows the receiver of the preferred communication system. RF down converter 22 receives a signal from antenna 21 and heterodynes the signal to an intermediate frequency (IF). A/D converter 23 converts the IF signal to a digital bit stream, and demodulator 24 demodulates the resulting bit stream. At this point the reverse of the encoding process in the transmitter takes place. Channel decoder 25 and speech decoder 26 perform decoding. D/A converter 27 synthesizes analog speech from the output of the speech decoder.
Much of the processing described in this specification is performed by a general purpose signal processor executing program statements. To facilitate a description of the preferred communication system, however, the preferred communication system is illustrated in terms of block and circuit diagrams. One of ordinary skill in the art could readily transcribe these diagrams into program statements for a processor.
FIG. 3 shows the encoder 12 of FIG. 1 in more detail, including an audio preprocessor 31, linear predictive (LP) analysis and quantization module 32, and open loop pitch estimation module 33. Module 34 analyzes each frame of the signal to determine whether the frame is mode A, mode B, or mode C, as described in more detail below. Module 35 performs excitation modelling depending on the mode determined by module 34. Processor 36 packs the compressed speech bits.
FIG. 4 shows the decoder 26 of FIG. 2, including a processor 41 for unpacking of compressed speech bits, module 42 for excitation signal reconstruction, filter 43, speech synthesis filter 44, and global post filter 45.
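The second order high-pass filter applied to the digitized samples in the transmitter of FIG. 1 can be illustrated with a direct-form difference equation. The Python sketch below uses the coefficients of the stated transfer function. The function name and the sample-by-sample structure are illustrative assumptions, since the specification does not prescribe an implementation.

```python
# Coefficients of H_HP(z) = (1 - 2 z^-1 + z^-2) / (1 - 1.8891 z^-1 + 0.89503 z^-2)
B = (1.0, -2.0, 1.0)
A = (1.0, -1.8891, 0.89503)


def highpass(samples):
    """Filter a sequence of samples with the second order high-pass biquad."""
    x1 = x2 = y1 = y2 = 0.0          # previous inputs and outputs
    out = []
    for x in samples:
        y = B[0] * x + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

The numerator coefficients sum to zero, so the gain at D.C. is zero and a constant offset in the input decays away at the output, which is the stated purpose of the filter.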
FIG. 5A shows linear prediction analysis windows. The preferred communication system employs 40 ms speech frames. For each frame, module 32 performs LP (linear prediction) analysis on two 30 ms windows that are spaced apart by 20 ms. The first LP window is centered at the middle, and the second LP window is centered at the leading edge of the speech frame such that the second LP window extends 15 ms into the next frame. In other words, module 32 analyzes a first part of the frame (LP window 1) to generate a first set of filter coefficients and analyzes a second part of the frame and a part of a next frame (LP window 2) to generate a second set of filter coefficients.
FIG. 5B shows pitch analysis windows. For each frame, module 32 performs pitch analysis on two 37.625 ms windows. The first pitch analysis window is centered at the middle, and the second pitch analysis window is centered at the leading edge of the speech frame such that the second pitch analysis window extends 18.8125 ms into the next frame. In other words, module 32 analyzes a third part of the frame (pitch analysis window 1) to generate a first pitch estimate and analyzes a fourth part of the frame and a part of the next frame (pitch analysis window 2) to generate a second pitch estimate.
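The window alignment figures quoted for FIGS. 5A and 5B follow from simple arithmetic on the window lengths and centers. The Python sketch below reproduces that arithmetic, assuming that "centered at the leading edge" means centered on the boundary between the current frame and the next frame (40 ms after the frame start), which is the reading consistent with the stated overhangs. The helper names are hypothetical, and the 8 kHz rate is the sampling rate given for the A/D converter.

```python
FS_HZ = 8000       # sampling rate of the A/D converter
FRAME_MS = 40.0    # speech frame length


def window_bounds(length_ms, center_ms):
    """Return (start, end) of a window in ms relative to the frame start."""
    half = length_ms / 2.0
    return center_ms - half, center_ms + half


def describe(length_ms, center_ms):
    """Summarize a window: bounds, overhang into the next frame, sample count."""
    start, end = window_bounds(length_ms, center_ms)
    return {
        "start_ms": start,
        "end_ms": end,
        "overhang_ms": max(0.0, end - FRAME_MS),
        "samples": round(length_ms * FS_HZ / 1000.0),
    }


lp_window_1 = describe(30.0, FRAME_MS / 2)     # centered mid-frame
lp_window_2 = describe(30.0, FRAME_MS)         # centered on the frame boundary
pitch_window_2 = describe(37.625, FRAME_MS)    # second pitch analysis window
```

With these assumptions the second LP window overhangs the next frame by 15 ms and the second pitch window by 18.8125 ms, matching the text, and the two window lengths correspond to 240 and 301 samples respectively.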
Module 32 employs multiplication by a Hamming window followed by a tenth order autocorrelation method of LP analysis. With this method of LP analysis, module 32 obtains optimal filter coefficients and optimal reflection coefficients. In addition, the residual energy after LP analysis is also readily obtained and, when expressed as a fraction of the speech energy of the windowed LP analysis buffer, is denoted as α1 for the first LP window and α2 for the second LP window. These outputs of the LP analysis are used subsequently in the mode selection algorithm as measures of spectral stationarity, as described in more detail below.
After LP analysis, module 32 bandwidth broadens the filter coefficients for the first LP window, and for the second LP window, by 25 Hz, converts the coefficients to ten line spectral frequencies (LSF), and quantizes these ten line spectral frequencies with a 26-bit LSF vector quantization (VQ), as described below.
Module 32 employs a 26-bit vector quantization (VQ) for each set of ten LSFs. This VQ provides good and robust performance across a wide range of handsets and speakers. Separate VQ codebooks are designed for "IRS filtered" and "flat unfiltered" ("non-IRS-filtered") speech material. The unquantized LSF vector is quantized by the "IRS filtered" VQ tables as well as the "flat unfiltered" VQ tables. The optimum classification is selected on the basis of the cepstral distortion measure. Within each classification, the vector quantization is carried out. Multiple candidates for each split vector are chosen on the basis of energy weighted mean square error, and an overall optimal selection is made within each classification on the basis of the cepstral distortion measure among all combinations of candidates. After the optimum classification is chosen, the quantized line spectral frequencies are converted to filter coefficients.
More specifically, module 32 quantizes the ten line spectral frequencies for both sets with a 26-bit multi-codebook split vector quantizer that classifies the unquantized line spectral frequency vector as a "voiced IRS-filtered," "unvoiced IRS-filtered," "voiced non-IRS-filtered," or "unvoiced non-IRS-filtered" vector, where "IRS" refers to the intermediate reference system filter as specified by CCITT, Blue Book, Rec. P.48.
FIG. 6 shows an outline of the LSF vector quantization process. Module 32 employs a split vector quantizer for each classification, including a 3-4-3 split vector quantizer for the "voiced IRS-filtered" and the "voiced non-IRS-filtered" categories 51 and 53. The first three LSFs use an 8-bit codebook in function modules 55 and 57, the next four LSFs use a 10-bit codebook in function modules 59 and 61, and the last three LSFs use a 6-bit codebook in function modules 63 and 65. For the "unvoiced IRS-filtered" and the "unvoiced non-IRS-filtered" categories 52 and 54, a 3-3-4 split vector quantizer is used. The first three LSFs use a 7-bit codebook in function modules 56 and 58, the next three LSFs use an 8-bit vector codebook in function modules 60 and 62, and the last four LSFs use a 9-bit codebook in function modules 64 and 66. From each split vector codebook, the three best candidates are selected in function modules 67, 68, 69, and 70 using the energy weighted mean square error criteria. The energy weighting reflects the power level of the spectral envelope at each line spectral frequency. The three best candidates for each of the three split vectors result in a total of twenty-seven combinations for each category. The search is constrained so that at least one combination would result in an ordered set of LSFs. This is usually a very mild constraint imposed on the search. The optimum combination of these twenty-seven combinations is selected in function module 71 depending on the cepstral distortion measure. Finally, the optimal category or classification is determined also on the basis of the cepstral distortion measure. The quantized LSFs are converted to filter coefficients and then to autocorrelation lags for interpolation purposes.
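The candidate combination step can be sketched in Python as an exhaustive pass over the 3 x 3 x 3 = 27 combinations that discards any combination whose LSFs are not in increasing order. In this sketch a plain squared error stands in for the cepstral distortion measure, and the reading that the 26 bits consist of 24 split codebook bits plus 2 bits for the four-way classification is an inference from the stated codebook sizes. Both are assumptions, and all names are hypothetical.

```python
from itertools import product

VOICED_SPLIT_BITS = (8, 10, 6)     # 3-4-3 split
UNVOICED_SPLIT_BITS = (7, 8, 9)    # 3-3-4 split
CLASS_BITS = 2                     # assumed: 4 categories need 2 bits


def squared_error(lsf, target):
    """Stand-in distortion; the text uses a cepstral distortion measure."""
    return sum((a - b) ** 2 for a, b in zip(lsf, target))


def best_ordered_combination(candidates, target, distortion=squared_error):
    """Pick the best ordered LSF vector among all candidate combinations.

    candidates: three lists (one per split), each holding three candidate
    sub-vectors, giving 27 combinations in total.
    """
    best_lsf, best_d = None, float("inf")
    for combo in product(*candidates):
        lsf = [f for part in combo for f in part]
        if any(b <= a for a, b in zip(lsf, lsf[1:])):
            continue                       # skip combinations that are not ordered
        d = distortion(lsf, target)
        if d < best_d:
            best_lsf, best_d = lsf, d
    return best_lsf
```

The function returns `None` only if no combination is ordered, which the constraint described in the text is meant to rule out.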
The resulting LSF vector quantizer scheme is not only effective across speakers but also across varying degrees of IRS filtering, which models the influence of the handset transducer. The codebooks of the vector quantizers are trained from a sixty talker speech database using flat as well as IRS frequency shaping. This is designed to provide consistent and good performance across several speakers and across various handsets. The average log spectral distortion across the entire TIA half rate database is approximately 1.2 dB for IRS filtered speech data and approximately 1.3 dB for non-IRS filtered speech data.
Two estimates of the pitch are determined per frame at intervals of 20 msec. These open loop pitch estimates are used in mode selection and to encode the closed loop pitch analysis if the selected mode is a predominantly voiced mode. Module 33 determines the two pitch estimates from the two pitch analysis windows described above in connection with FIG. 5B
using a modified form of the pitch tracking algorithm shown in FIG. 7. This pitch estimation algorithm makes an initial pitch estimate in function module 73 using an error function calculated for all values in the set {22.0, 22.5, ..., 114.5}, followed by pitch tracking to yield an overall optimum pitch value. Function module 74 employs look-back pitch tracking using the error functions and pitch estimates of the previous two pitch analysis windows. Function module 75 employs look-ahead pitch tracking using the error functions of the two future pitch analysis windows. Decision module 76 compares pitch estimates depending on look-back and look-ahead pitch tracking to yield an overall optimum pitch value at its output. The pitch estimation algorithm shown in FIG. 7 requires
the error functions of two future pitch analysis windows for its look-ahead pitch tracking and thus introduces a delay of 40 ms. In order to avoid this penalty, the communication system employs a modification of the pitch estimation algorithm of FIG. 7.
FIG. 8 shows the open loop pitch estimation 33 of FIG. 3 in more detail. Pitch analysis windows one and two are input to respective compute error functions 331 and 332. The outputs of those error function computations are input to a refinement of past pitch estimates 333, and the refined pitch estimates are sent to both look-back and look-ahead pitch tracking 334 and 335 for pitch window one. The outputs of the pitch tracking circuits are input to selector 336, which selects the open loop pitch one as the first output. The selected open loop pitch one is also input to a look-back pitch tracking circuit for pitch window two, which outputs the open loop pitch two.
FIG. 9 shows the modified pitch tracking algorithm implemented by the pitch estimation circuitry of FIG. 8. The modified
pitch estimation algorithm employs the same error function as in the FIG. 7 algorithm in each pitch analysis window, but the pitch tracking scheme is altered. Prior to pitch tracking for either the first or second pitch analysis window, the previous two pitch estimates of the two previous pitch analysis windows are refined in function modules 81 and 82, respectively, with both look-back pitch tracking and look-ahead pitch tracking using the error functions of the current two pitch analysis windows. This is followed by look-back pitch tracking in function module 83 for the first pitch analysis window using the refined pitch estimates and error functions of the two previous pitch analysis windows. Look-ahead pitch tracking for the first pitch analysis window in function module 84 is limited to using the error function of the second pitch analysis window. The two estimates are compared in decision module 85 to yield an overall best pitch estimate for the first pitch analysis window. For the second pitch analysis window, look-back pitch tracking is carried out in function module 86, drawing as well on the pitch estimate of the first pitch analysis window and its error function. No look-ahead pitch tracking is used for this
second pitch analysis window, with the result that the look-back pitch estimate is taken to be the overall best pitch estimate at output 87.
FIG. 10 shows the mode determination processing performed by mode selector 34. Depending on spectral stationarity, pitch stationarity, short term energy, short term level gradient, and zero crossing rate of each 40 ms frame, mode selector 34 classifies each frame into one of three modes: voiced and stationary mode (Mode A), unvoiced or transient mode (Mode B), and background noise mode (Mode C). More specifically, mode selector 34 generates two logical values, each indicating spectral stationarity, or similarity of spectral content, between the currently processed frame and the previous frame (Step 1010). Mode selector 34 generates two logical values indicating pitch stationarity, or similarity of fundamental frequencies, between the currently processed frame and the previous frame (Step 1020). Mode selector 34 generates two logical values indicating the zero crossing rate of the currently processed frame (Step 1030), a rate influenced by the higher frequency components of the frame relative to the lower frequency components of the frame. Mode selector 34 generates two logical values indicating level gradients within the currently processed frame (Step 1040). Mode selector 34 generates five logical values indicating short-term energy of the currently processed frame (Step 1050). Subsequently, mode selector 34 determines the mode of the frame to be mode A, mode B, or mode C, depending on the values generated in Steps 1010-1050 (Step 1060).
FIG. 11 is a block diagram showing the processing of Step 1010 of FIG. 10 in more detail. The processing of FIG. 11 determines a cepstral distortion in dB. Module 1110 converts the quantized filter coefficients of window 2 of the current frame into the lag domain, and module 1120 converts the quantized filter coefficients of window 2 of the previous frame into the lag domain. Module 1130 interpolates the outputs of modules 1110 and 1120, and module 1140 converts the output of module 1130 back into filter coefficients. Module 1150 converts the output from module 1140 into the cepstral domain, and module 1160 converts the unquantized filter coefficients from window 1 of the current frame into the cepstral domain. Module 1170 generates the cepstral distortion dc from the outputs of 1150 and 1160.
FIG. 12 shows generation of spectral stationarity value LPCFLAG1, which is a relatively strong indicator of spectral stationarity for the frame. Mode selector 34 generates LPCFLAG1 using a combination of two techniques for measuring spectral stationarity. The first technique compares the cepstral distortion dc using comparators 1210 and 1220. In FIG. 12, the dt1 threshold input to comparator 1210 is -8.0 and the dt2 threshold input to comparator 1220 is -6.0. The second technique is based on the residual energy after LPC analysis, expressed as a fraction of the LPC analysis speech buffer spectral energy. This energy is a byproduct of LPC analysis, as described above. One input to comparator 1230 is the residual energy for the filter coefficients of window 1, and the corresponding input to comparator 1240 is the residual energy of the filter coefficients of window 2. The threshold input to comparators 1230 and 1240 is equal to 0.25.
FIG. 13 shows dataflow within mode selector 34 for generation of spectral stationarity flag LPCFLAG2, which is a relatively weak indicator of spectral stationarity. The processing shown in FIG. 13 is similar to that shown in FIG. 12, except that LPCFLAG2 is based on a relatively relaxed set of thresholds.
The dt2 input to comparator 1310 is -6.0, the dt3 input to comparator 1320 is -4.0, the dt4 input to comparator 1350 is -2.0, the first residual energy threshold input to comparators 1330 and 1340 is 0.25, and the second residual energy threshold input to comparators 1360 and 1370 is 0.15.
Mode selector 34 measures pitch stationarity using both the open loop pitch values of the current frame, denoted as $P_1$ for pitch window 1 and $P_2$ for pitch window 2, and the open loop pitch value of window 2 of the previous frame, denoted by $P_{-1}$. A lower range of pitch values $(P_{L1}, P_{U1})$ and an upper range of pitch values $(P_{L2}, P_{U2})$ are defined as
$$P_{L1} = \min(P_{-1}, P_2) - P_t, \qquad P_{U1} = \min(P_{-1}, P_2) + P_t,$$
$$P_{L2} = \max(P_{-1}, P_2) - P_t, \qquad P_{U2} = \max(P_{-1}, P_2) + P_t,$$
where $P_t$ is 8.0. If the two ranges are non-overlapping, i.e., $P_{L2} > P_{U1}$, then only a weak indicator of pitch stationarity, denoted by PITCHFLAG2, is possible, and PITCHFLAG2 is set if $P_1$ lies within either the lower range $(P_{L1}, P_{U1})$ or the upper range $(P_{L2}, P_{U2})$. If the two ranges are overlapping, i.e., $P_{L2} \le P_{U1}$, a strong indicator of pitch stationarity, denoted by PITCHFLAG1, is possible and is set if $P_1$ lies within the range $(P_L, P_U)$, where
$$P_L = (P_{-1} + P_2)/2 - 2P_t, \qquad P_U = (P_{-1} + P_2)/2 + 2P_t.$$
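The pitch stationarity test above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of the specification: the function name, the inclusive range boundaries, and the choice to leave the flag that the text does not mention for a given case at false are all assumptions.

```python
def pitch_stationarity_flags(p_prev, p1, p2, pt=8.0):
    """Return (PITCHFLAG1, PITCHFLAG2) for one frame.

    p_prev: open loop pitch of window 2 of the previous frame (P_-1)
    p1, p2: open loop pitches of windows 1 and 2 of the current frame
    pt:     range half-width P_t (8.0 in the text)
    """
    lo, hi = min(p_prev, p2), max(p_prev, p2)
    pl1, pu1 = lo - pt, lo + pt  # lower range (PL1, PU1)
    pl2, pu2 = hi - pt, hi + pt  # upper range (PL2, PU2)

    if pl2 > pu1:
        # Ranges do not overlap: only the weak flag can be set.
        # ASSUMPTION: range boundaries are treated as inclusive.
        weak = (pl1 <= p1 <= pu1) or (pl2 <= p1 <= pu2)
        return False, weak

    # Ranges overlap: the strong flag is tested against (PL, PU).
    # ASSUMPTION: the weak flag is left false in this case.
    mid = (p_prev + p2) / 2.0
    strong = (mid - 2.0 * pt) <= p1 <= (mid + 2.0 * pt)
    return strong, False
```

For example, `pitch_stationarity_flags(40, 42, 44)` falls in the overlapping case and sets only the strong flag, while `pitch_stationarity_flags(30, 32, 80)` falls in the non-overlapping case and sets only the weak flag.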
FIG. 14 shows a dataflow for generating PITCHFLAG1 and PITCHFLAG2 within mode selector 34. Module 14005 generates an output equal to the input having the largest value, and module 14010 generates an output equal to the input having the smallest value. Module 1420 generates an output that is an average of the values of the two inputs. Modules 14030, 14035, 14040, 14045, 14050 and 14055 are adders. Modules 14080, 14025 and 14090 are AND gates. Module 14087 is an inverter. Modules 14065, 14070, and 14075 are each logic blocks generating a true output when input C lies within the range bounded by inputs A and B.
The circuit of FIG. 14 also processes reliability values $V_{-1}$, $V_1$, and $V_2$, each indicating whether the values $P_{-1}$, $P_1$, and $P_2$, respectively, are reliable. Typically, these reliability values are a byproduct of the pitch calculation algorithm. The circuit shown in FIG. 14 generates false values for PITCHFLAG1 and PITCHFLAG2 if any of these flags $V_{-1}$, $V_1$, $V_2$ are false. Processing of these reliability values is optional.
FIG. 15 shows dataflow within mode selector 34 for generating two logical values indicating a zero crossing rate for the frame. Modules 15002, 15004, 15006, 15008, 15010, 15012, 15014 and 15016 each count the number of zero crossings in a respective 5 millisecond subframe of the frame currently being processed. For example, module 15006 counts the number of zero crossings of the signal occurring from the time 10 milliseconds from the beginning of the frame to the time 15 ms from the beginning of the frame. Comparators 15018, 15020, 15022, 15024, 15026, 15028, 15030, and 15032, in combination with adder 15035, generate a value indicating the number of 5 millisecond (ms) subframes having at least 15 zero crossings. Comparator 15040 sets the flag ZC_LOW when the number of such subframes is less than 2, and comparator 15037 sets the flag ZC_HIGH when the number of such subframes is greater than 5. The value ZCt input to comparators 15018-15032 is 15, the value Zt1 input to comparator 15040 is 2, and the value Zt2 input to comparator 15037 is 5.
FIGS. 16A, 16B, and 16C show a data flow for generating two logical values indicative of short term level gradient. Mode selector 34 measures short term level gradient, an indication of transients within a frame, using a low-pass filtered version of the companded input signal amplitude. Module 16005 generates the absolute value of the input signal S(n), module 16010 compands its input signal, and low-pass filter 16015 generates a signal $A_L(n)$ that, at time instant n, is expressed by
$$A_L(n) = (63/64)\,A_L(n-1) + (1/64)\,C(|S(n)|),$$
where the companding function C( ) is the μ-law function described in CCITT G.711. Delay 16025 generates an output that is a 10 ms delayed version of its input, and subtractor 16027 generates a difference between $A_L(n)$ and $A_L(n-80)$. Module 16030 generates a signal that is an absolute value of its input.
Every 5 ms, mode selector 34 compares $A_L(n)$ with that of 10 ms ago and, if the difference $|A_L(n) - A_L(n-80)|$ exceeds a fixed relaxed threshold, increments a counter. (In the preceding expression, 80 corresponds to 8 samples per ms times 10 ms.) As shown in FIG. 16C, if this difference does not exceed a relatively stringent threshold (Lt2 = 32) for any subframe, mode selector 34 sets LVLFLAG2, weakly indicating an absence of transients. As shown in FIG. 16B, if this difference exceeds a more relaxed threshold (Lt1 = 10) for no more than one subframe (Lt3 = 2), mode selector 34 sets LVLFLAG1, strongly indicating an absence of transients.
More specifically, FIG. 16B shows delay circuits 16032-16046 that each generate a 5 ms delayed version of its input. Each of latches 16048-16062 saves a signal on its input. Latches 16048-16062 are strobed at a common time, near the end of each 40 ms speech frame, so that each latch saves a portion of the frame separated by 5 ms from the portion saved by an adjacent latch. Comparators 16064-16078 each compare the output of a respective latch to the threshold Lt1, and adder 16080 sums the comparator outputs and sends the sum to comparator 16082 for comparison to the threshold Lt3.
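The zero crossing and level gradient flags of FIGS. 15 and 16 reduce to counting threshold exceedances over the eight 5 ms subframes. The sketch below illustrates this under stated assumptions: the function names are hypothetical, a zero crossing is taken to be a sign change between consecutive samples (with zero treated as non-negative), crossings spanning a subframe boundary are not counted, and the μ-law companding function is passed in as a parameter rather than implemented.

```python
def count_zero_crossings(samples):
    # ASSUMPTION: a crossing is a sign change between neighbouring samples.
    return sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))


def zc_flags(frame, sub_len=40, zct=15, zt1=2, zt2=5):
    """Return (ZC_LOW, ZC_HIGH) for a 40 ms frame (8 samples/ms, so 40 per 5 ms)."""
    subs = [frame[i:i + sub_len] for i in range(0, len(frame), sub_len)]
    busy = sum(1 for s in subs if count_zero_crossings(s) >= zct)
    return busy < zt1, busy > zt2


def smooth_level(samples, compand, a_prev=0.0):
    """A_L(n) = (63/64) A_L(n-1) + (1/64) C(|S(n)|); 'compand' stands in for C."""
    out, a = [], a_prev
    for s in samples:
        a = (63.0 / 64.0) * a + (1.0 / 64.0) * compand(abs(s))
        out.append(a)
    return out


def level_flags(diffs, lt1=10, lt2=32, lt3=2):
    """Return (LVLFLAG1, LVLFLAG2) from the per-subframe |A_L(n) - A_L(n-80)| values."""
    lvlflag1 = sum(1 for d in diffs if d > lt1) < lt3  # at most one exceedance of Lt1
    lvlflag2 = not any(d > lt2 for d in diffs)         # no exceedance of Lt2
    return lvlflag1, lvlflag2
```

A frame that alternates in sign on every sample drives all eight subframes over the ZCt threshold and so sets ZC_HIGH, whereas a constant frame sets ZC_LOW.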
FIG. 16C shows a circuit for generating LVLFLAG2. In FIG. 16C, delays 16132-16146 are similar to the delays shown in FIG. 16B, and latches 16148-16162 are similar to the latches shown in FIG. 16B. Comparators 16164-16178 each compare an output of a respective latch to the threshold Lt2 = 32. Thus, OR gate 16180 generates a true output if any of the latched signals originating from module 16030 exceeds the threshold Lt2. Inverter 16182 inverts the output of OR gate 16180.
FIG. 17 shows a data flow for generating parameters indicative of short term energy. Short term energy is measured as the mean square energy (average energy per sample) on a frame basis as well as on a 5 ms basis. The short term energy is determined relative to a background energy $E_{bn}$. $E_{bn}$ is initially set to a constant $E_0 = (100 \times (12)^{1/2})^2$. Subsequently, when a frame is determined to be mode C, $E_{bn}$ is set equal to $(7/8)E_{bn} + (1/8)E_0$. Thus, some of the thresholds employed in the circuit of FIG. 17 are adaptive. In FIG. 17, $E_{t0} = 0.707\,E_{bn}$, $E_{t1} = 5$, $E_{t2} = 2.5\,E_{bn}$, $E_{t3} = 1.8\,E_{bn}$, $E_{t4} = E_{bn}$, $E_{t5} = 0.707\,E_{bn}$, and $E_{t6} = 16.0$.
The short term energy on a 5 ms basis provides an indication of presence of speech throughout the frame using a single flag EFLAG1, which is generated by testing the short term energy on a 5 ms basis against a threshold, incrementing a counter whenever the threshold is exceeded, and testing the counter's final value against a fixed threshold. Comparing the short term energy on a frame basis to various thresholds provides indication of absence of speech throughout the frame in the form of several flags with varying degrees of confidence. These flags are denoted as EFLAG2, EFLAG3, EFLAG4, and EFLAG5.
FIG. 17 shows dataflow within mode selector 34 for generating these flags. Modules 17002, 17004, 17006, 17008, 17010, 17015, 17020, and 17022 each count the energy in a respective 5 ms
subframe of the frame currently being processed. Comparators 17030, 17032, 17034, 17036, 17038, 17040, 17042, and 17044, in combination with adder 17050, count the number of subframes having an energy exceeding $E_{t0} = 0.707\,E_{bn}$.
FIGS. 18A, 18B, and 18C show the processing of Step 1060. Mode selector 34 first classifies the frame as background noise (mode C) or speech (modes A or B). Mode C tends to be characterized by low energy, relatively high spectral stationarity between the current frame and the previous frame, a relative absence of pitch stationarity between the current frame and the previous frame, and a high zero crossing rate. Background noise (mode C) is declared either on the basis of the strongest short term energy flag EFLAG5 alone or by combining weaker short term energy flags EFLAG4, EFLAG3, and EFLAG2 with other flags indicating high zero crossing rate, absence of pitch, absence of transients, etc.
More specifically, if the mode of the previous frame was A or if EFLAG2 is not true, processing proceeds to step 18045 (step 18005). Step 18005 ensures that the current frame will not be mode C if the previous frame was mode A. The current frame is mode C if (LPCFLAG1 and EFLAG3) is true or (LPCFLAG2 and EFLAG4) is true or EFLAG5 is true (steps 18010, 18015, and 18020). The current frame is mode C if ((not PITCHFLAG1) and LPCFLAG1 and ZC_HIGH) is true (step 18025) or ((not PITCHFLAG1) and (not PITCHFLAG2) and LPCFLAG2 and ZC_HIGH) is true (step 18030). Thus, the processing shown in FIG. 18A determines whether the frame corresponds to a first mode (Mode C), depending on whether a speech component is substantially absent from the frame.
In step 18045, a score is calculated depending on the mode of the previous frame. If the mode of the previous frame was mode A, the score is 1 + LVLFLAG1 + EFLAG1 + ZC_LOW. If the previous mode was mode B, the score is 0 + LVLFLAG1 + EFLAG1 + ZC_LOW. If the mode of the previous frame was mode C, the score is 2 + LVLFLAG1 + EFLAG1 + ZC_LOW.
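The background noise test of FIG. 18A and the score of step 18045 can be written directly from the flag combinations given above. The following is a sketch only: the function names and the dictionary of flags are illustrative assumptions, and the fall-through to "not mode C" when no test fires is inferred from the flow described in the text.

```python
def is_mode_c(prev_mode, f):
    """Return True if the frame is classified as background noise (mode C).

    prev_mode: 'A', 'B' or 'C'; f: dict of boolean flags keyed by name.
    """
    # Step 18005: never mode C right after mode A, or when EFLAG2 is not set.
    if prev_mode == "A" or not f["EFLAG2"]:
        return False
    # Steps 18010-18020: energy flags, alone or with spectral stationarity.
    if (f["LPCFLAG1"] and f["EFLAG3"]) or (f["LPCFLAG2"] and f["EFLAG4"]) or f["EFLAG5"]:
        return True
    # Step 18025: no strong pitch stationarity, strong spectral stationarity.
    if (not f["PITCHFLAG1"]) and f["LPCFLAG1"] and f["ZC_HIGH"]:
        return True
    # Step 18030: no pitch stationarity at all, weak spectral stationarity.
    if (not f["PITCHFLAG1"]) and (not f["PITCHFLAG2"]) and f["LPCFLAG2"] and f["ZC_HIGH"]:
        return True
    return False


def mode_score(prev_mode, f):
    """Step 18045: base value by previous mode plus three boolean flags."""
    base = {"A": 1, "B": 0, "C": 2}[prev_mode]
    return base + int(f["LVLFLAG1"]) + int(f["EFLAG1"]) + int(f["ZC_LOW"])
```

With every flag false except EFLAG2 and EFLAG5, a frame following a mode B frame is declared mode C, while the same flags following a mode A frame are not.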
If the mode of the previous frame was mode C or LVLFLAG2 is not true, the mode of the current frame is mode B (step 18050). The current frame is mode A if (LPCFLAG1 and PITCHFLAG1) is true, provided the score is not less than 2 (steps 18060 and 18055). The current frame is mode A if (LPCFLAG1 and PITCHFLAG2) is true or (LPCFLAG2 and PITCHFLAG1) is true, provided the score is not less than 3 (steps 18070, 18075, and 18080).
Subsequently, speech encoder 12 generates an encoded frame in accordance with one of a first coding scheme (a coding scheme for mode C), when the frame corresponds to the first mode, and an alternative coding scheme (a coding scheme for mode A or B), when the frame does not correspond to the first mode, as described in more detail below.
For mode A, only the second set of line spectral frequency vector quantization indices need to be transmitted because the first set can be inferred at the receiver due to the slowly varying nature of the vocal tract shape. In addition, the first and second open loop pitch estimates are quantized and transmitted because they are used to encode the closed loop pitch estimates in each subframe. The quantization of the second open loop pitch estimate is accomplished using a non-uniform 4-bit quantizer, while the quantization of the first open loop pitch estimate is accomplished using a differential non-uniform 3-bit quantizer. Since the vector quantization indices of the LSFs for the first linear prediction analysis window are neither transmitted nor used in mode selection, they need not be calculated in mode A. This reduces the complexity of the short term predictor section of the encoder in this mode. This reduced complexity, as well as the lower bit rate of the short term predictor parameters in mode A, is offset by faster update of all the excitation model parameters.
For mode B, both sets of line spectral frequency vector quantization indices must be transmitted because of potential spectral nonstationarity. However, for the first set of line spectral frequencies we need search only 2 of the 4 classifications or categories. This is because the IRS versus non-IRS selection varies very slowly with time. If the second set of line spectral frequencies were chosen from the "voiced IRS-filtered" category, then the first set can be expected to be from either the "voiced IRS-filtered" or "unvoiced IRS-filtered" categories. If the second set of line spectral frequencies were chosen from the "unvoiced IRS-filtered" category, then again the first set can be expected to be from either the "voiced IRS-filtered" or "unvoiced IRS-filtered" categories. If the second set of line spectral frequencies were chosen from the "voiced non-IRS-filtered" category, then the first set can be expected to be from either the "voiced non-IRS-filtered" or "unvoiced non-IRS-filtered" categories. Finally, if the second set of line spectral frequencies were chosen from the "unvoiced non-IRS-filtered" category, then again the first set can be expected to be from either the "voiced non-IRS-filtered" or "unvoiced non-IRS-filtered" categories. As a result, only two categories of LSF codebooks need be searched for the quantization of the first set of line spectral frequencies. Furthermore, only 25 bits are needed to encode these quantization indices instead of the 26 needed for the second set of LSFs, since the optimal category for the first set can be coded using just 1 bit. For mode B, neither of the two open loop pitch estimates are transmitted since they are not used in guiding the closed loop pitch estimates. The higher complexity involved in encoding, as well as the higher bit rate of the short term predictor parameters in mode B, is compensated by a slower update of all the excitation model parameters.
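The category restriction for the first LSF set in mode B amounts to a lookup keyed on the IRS or non-IRS half of the second set's category. The sketch below is illustrative: the tuple representation and function names are assumptions, and only the 26-bit and 25-bit totals and the 1-bit category code come from the text.

```python
CATEGORIES = [
    ("voiced", "IRS"),
    ("unvoiced", "IRS"),
    ("voiced", "non-IRS"),
    ("unvoiced", "non-IRS"),
]


def first_set_candidates(second_set_category):
    """Categories to search for the first LSF set, given the second set's category."""
    _, filtering = second_set_category
    # The IRS / non-IRS choice is assumed unchanged within the frame,
    # so only the voiced / unvoiced alternative remains open.
    return [c for c in CATEGORIES if c[1] == filtering]


def category_bits(n_categories):
    """Bits needed to index one of n_categories alternatives."""
    return (n_categories - 1).bit_length()


SECOND_SET_BITS = 26
# One category bit is saved on the first set: 2 bits for four categories, 1 for two.
FIRST_SET_BITS = SECOND_SET_BITS - category_bits(4) + category_bits(2)
```

Evaluating `FIRST_SET_BITS` gives 25, matching the figure stated above.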
For mode C, only the second set of line spectral frequency vector quantization indices need to be transmitted because the human ear is not as sensitive to rapid changes in spectral shape variations for noisy inputs. Further, such rapid spectral shape variations are atypical for many kinds of background noise sources. For mode C, neither of the two open loop pitch estimates are transmitted since they are not used in guiding the closed loop pitch estimation. The lower complexity involved, as well as the lower bit rate of the short term predictor parameters in mode C, is compensated by a faster update of the fixed codebook gain portion of the excitation model parameters.
The gain quantization tables are tailored to each of the modes. Also in each mode, the closed loop parameters are refined using a delayed decision approach. This delayed decision is employed in such a way that the overall codec delay is not increased. Such a delayed decision approach is very effective in transition regions.
In mode A, the quantization indices corresponding to the second set of short term predictor coefficients as well as the open loop pitch estimates are transmitted. Only these quantized parameters are used in the excitation modeling. The 40 msec speech frame is divided into seven subframes. The first six are 5.75 msec in length and the seventh is 5.5 msec in length. In each subframe, an interpolated set of short term predictor coefficients is used. The interpolation is done in the autocorrelation lag domain. Using this interpolated set of coefficients, a closed loop analysis by synthesis approach is used to derive the optimum pitch index, pitch gain index, fixed codebook index, and fixed codebook gain index for each subframe. The closed loop pitch index search range is centered around an interpolated trajectory of the open loop pitch estimates. The trade-off between the search range and the pitch resolution is done in a dynamic fashion depending on the closeness of the open loop pitch estimates. The fixed codebook employs zinc pulse shapes which are obtained using a weighted combination of the sinc pulse and a phase shifted version of its Hilbert transform. The fixed codebook gain is quantized in a differential manner.
The analysis by synthesis technique that is used to derive the excitation model parameters employs an interpolated set of short term predictor coefficients in each subframe. The optimal set of excitation model parameters for each subframe is determined only at the end of each 40 ms frame because of delayed decision. In deriving the excitation model parameters, all seven subframes are assumed to be of length 5.75 ms or forty-six samples. However, for the last or seventh subframe, the end of subframe updates, such as the adaptive codebook update and the update of the local short term predictor state variables, are carried out only for a subframe length of 5.5 ms or forty-four samples.
The short term predictor parameters or linear prediction filter parameters are interpolated from subframe to subframe. The interpolation is carried out in the autocorrelation domain. The normalized autocorrelation coefficients derived from the filter coefficients for the second linear prediction analysis window are denoted as $\{\rho_{-1}(i)\}$ for the previous 40 ms frame and by $\{\rho_2(i)\}$ for the current 40 ms frame for $0 \le i \le 10$, with $\rho_{-1}(0) = \rho_2(0) = 1.0$. The interpolated autocorrelation coefficients $\{\rho'_m(i)\}$ are then given by
$$\rho'_m(i) = \nu_m\,\rho_2(i) + [1 - \nu_m]\,\rho_{-1}(i), \qquad 1 \le m \le 7,\ 0 \le i \le 10,$$
or in vector notation
$$\rho'_m = \nu_m\,\rho_2 + [1 - \nu_m]\,\rho_{-1}, \qquad 1 \le m \le 7.$$
Here, $\nu_m$ is the interpolating weight for subframe m. The interpolated lags $\{\rho'_m(i)\}$ are subsequently converted to the short term predictor filter coefficients $\{a'_m(i)\}$.
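The interpolation above is a per-lag linear blend of the two autocorrelation vectors. The sketch below shows the blend and one possible conversion back to predictor coefficients. The use of the Levinson-Durbin recursion for that conversion, the sign convention $A(z) = 1 + \sum_i a(i) z^{-i}$, the function names, and the example weight are assumptions made for illustration; the specification does not state how the conversion is performed or list the weights.

```python
def interpolate_lags(rho_prev, rho_cur, v):
    """rho'_m(i) = v * rho_2(i) + (1 - v) * rho_-1(i) for every lag i."""
    if len(rho_prev) != len(rho_cur):
        raise ValueError("lag vectors must have the same length")
    return [v * c + (1.0 - v) * p for p, c in zip(rho_prev, rho_cur)]


def levinson_durbin(r, order):
    """Predictor coefficients [1, a(1), ..., a(order)] from lags r[0..order].

    ASSUMPTION: standard Levinson-Durbin recursion with A(z) = 1 + sum a(i) z^-i.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # remaining prediction error
    return a
```

As a usage example, `interpolate_lags(rho_prev, rho_cur, 0.25)` followed by `levinson_durbin(lags, 10)` would yield a tenth-order coefficient set for one subframe, with 0.25 standing in for a hypothetical weight $\nu_m$.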
The choice of interpolating weights affects voice quality in this mode significantly. For this reason, they must be determined carefully. These interpolating weights $\nu_m$ have been determined for subframe m by minimizing the mean square error between the actual short term power spectral envelope $S_{m,J}(\omega)$ and the interpolated short term power spectral envelope $S'_{m,J}(\omega)$ over all speech frames J of a very large speech database. In other words, $\nu_m$ is determined by minimizing
$$E_m = \sum_J \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| S_{m,J}(\omega) - S'_{m,J}(\omega) \right|^2 d\omega.$$
If the actual autocorrelation coefficients for subframe m in frame J are denoted by $\{\rho_{m,J}(k)\}$, then by definition
$$S_{m,J}(\omega) = \sum_k \rho_{m,J}(k)\, e^{-j\omega k}, \qquad S'_{m,J}(\omega) = \sum_k \rho'_{m,J}(k)\, e^{-j\omega k}.$$
Substituting the above equations into the preceding equation, it can be shown that minimizing $E_m$ is equivalent to minimizing $E'_m$, where $E'_m$ is given by
$$E'_m = \sum_J \sum_k \left[ \rho_{m,J}(k) - \rho'_{m,J}(k) \right]^2,$$
or in vector notation
$$E'_m = \sum_J \left\| \rho_{m,J} - \rho'_{m,J} \right\|^2,$$
where $\|\cdot\|$ represents the vector norm. Substituting $\rho'_{m,J} = \nu_m \rho_{2,J} + [1 - \nu_m] \rho_{-1,J}$ into the above equation, differentiating with respect to $\nu_m$ and setting the result to zero gives
$$\nu_m = \frac{\sum_J \langle x_J, y_{m,J} \rangle}{\sum_J \langle x_J, x_J \rangle},$$
where $x_J = \rho_{2,J} - \rho_{-1,J}$ and $y_{m,J} = \rho_{m,J} - \rho_{-1,J}$, and $\langle x_J, y_{m,J} \rangle$ is the dot product between vectors $x_J$ and $y_{m,J}$. The values of $\nu_m$ calculated by the above method using a very large speech database are further fine tuned by listening tests.
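The closed-form weight is an ordinary one-parameter least-squares fit accumulated over frames. A minimal sketch follows; the function name and the triple-per-frame input layout are assumptions, and no database or listening-test tuning is modelled.

```python
def optimal_weight(frames):
    """Least-squares interpolation weight for one subframe position m.

    frames: iterable of (rho_prev, rho_cur, rho_actual) lag vectors, one triple
            per training frame J, where rho_actual holds the lags measured in
            subframe m of that frame.
    """
    num = 0.0
    den = 0.0
    for rho_prev, rho_cur, rho_actual in frames:
        x = [c - p for c, p in zip(rho_cur, rho_prev)]     # x_J
        y = [a - p for a, p in zip(rho_actual, rho_prev)]  # y_{m,J}
        num += sum(xi * yi for xi, yi in zip(x, y))
        den += sum(xi * xi for xi in x)
    # Raises ZeroDivisionError if every frame has identical endpoint lags.
    return num / den
```

If the measured lags in every training frame lie exactly 30 percent of the way from the previous-frame lags to the current-frame lags, the function returns 0.3 up to rounding.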
The target vector $t_{ac}$ for the adaptive codebook search is related to the speech vector $s$ in each subframe by
$$s = H\,t_{ac} + z.$$
Here, $H$ is the square lower triangular Toeplitz matrix whose first column contains the impulse response of the interpolated short term predictor $\{a'_m(i)\}$ for the subframe m, and $z$ is the vector containing its zero input response. The target vector $t_{ac}$ is most easily calculated by subtracting the zero input response $z$ from the speech vector $s$ and filtering the difference by the inverse short term predictor with zero initial states.
The adaptive codebook search in adaptive codebooks 3506 and 3507 employs a spectrally weighted mean square error $\varepsilon_i$ to measure the distance between a candidate vector $r_i$ and the target vector $t_{ac}$, as given by
$$\varepsilon_i = (t_{ac} - \mu_i r_i)^T\, W\, (t_{ac} - \mu_i r_i).$$
Here, $\mu_i$ is the associated gain and $W$ is the spectral weighting matrix. $W$ is a positive definite symmetric Toeplitz matrix that is derived from the truncated impulse response of the weighted short term predictor with filter coefficients $\{a'_m(i)\,\gamma^i\}$. The weighting factor $\gamma$ is 0.8. Substituting for the optimum $\mu_i$ in the above expression, the distortion term can be rewritten as
$$\varepsilon_i = t_{ac}^T W t_{ac} - \frac{\rho_i^2}{e_i},$$
where $\rho_i$ is the correlation term $t_{ac}^T W r_i$ and $e_i$ is the energy term $r_i^T W r_i$. Only those candidates are considered that have a positive correlation. The best candidate vectors are the ones that have positive correlations and the highest values of $\rho_i^2 / e_i$.
The candidate vectors $r_i$ correspond to different pitch delays. These pitch delays in samples lie in the range [20, 146]. Fractional pitch delays are possible, but the fractional part f is restricted to be either 0.00, 0.25, 0.50 or 0.75. The candidate vector corresponding to an integer delay L is simply read from the adaptive codebook, which is a collection of the past excitation samples. For a mixed (integer plus fraction) delay L+f, the portion of the adaptive codebook centered around the section corresponding to the integer delay L is filtered by a polyphase filter corresponding to fraction f. Incomplete candidate vectors corresponding to low delay values less than a subframe length are completed in the same manner as suggested by J. Campbell et al., supra. The polyphase filter coefficients are derived from a prototype low pass filter designed to have good passband as well as good stopband characteristics. Each polyphase filter has 8 taps.
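The selection rule described above, which keeps only candidates with positive correlation and ranks them by the squared correlation divided by the energy, can be sketched as follows. The function names and the plain-list matrix representation are assumptions; candidate generation (delays, polyphase filtering) and gain quantization are outside this sketch.

```python
def quad_form(u, w, v):
    """Compute u^T W v for plain Python lists (w is a list of rows)."""
    return sum(u[i] * w[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def best_candidate(target, candidates, w):
    """Return (index, rho^2 / e, unquantized gain rho / e), or None.

    None means no candidate had a positive correlation, the case in which
    the all-zero adaptive codebook vector would be chosen.
    """
    best = None
    for idx, r in enumerate(candidates):
        rho = quad_form(target, w, r)   # correlation term t^T W r
        e = quad_form(r, w, r)          # energy term r^T W r
        if rho <= 0.0 or e <= 0.0:
            continue                    # only positive correlations are kept
        metric = rho * rho / e
        if best is None or metric > best[1]:
            best = (idx, metric, rho / e)
    return best
```

Because the first term of the rewritten distortion does not depend on the candidate, maximizing `metric` is equivalent to minimizing the weighted error.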
The adaptive codebook search does not search all candidate vectors. For the first 3 subframes, a 5-bit search range is determined by the second quantized open loop pitch estimate $P'_{-1}$ of the previous 40 ms frame and the first quantized open loop pitch estimate $P'_1$ of the current 40 ms frame. If the previous mode was B, then the value of $P'_{-1}$ is taken to be the last subframe pitch delay in the previous frame. For the last 4 subframes, this 5-bit search range is determined by the second quantized open loop pitch estimate $P'_2$ of the current 40 ms frame and the first quantized open loop pitch estimate $P'_1$ of the current 40 ms frame.
For the first 3 subframes, this 5-bit search range is split into two 4-bit ranges, with each range centered around $P'_{-1}$ and $P'_1$ respectively. If these two 4-bit ranges overlap, then a single 5-bit range is used which is centered around $(P'_{-1} + P'_1)/2$. Similarly, for the last 4 subframes, this 5-bit search range is split into two 4-bit ranges, with each range centered around $P'_1$ and $P'_2$ respectively. If these two 4-bit ranges overlap, then a single 5-bit range is used which is centered around $(P'_1 + P'_2)/2$.
The search range selection also determines what fractional resolution is needed for the closed loop pitch. This desired fractional resolution is determined directly from the quantized open loop pitch estimates $P'_{-1}$ and $P'_1$ for the first 3 subframes and from $P'_1$ and $P'_2$ for the last 4 subframes. If the two determining open loop pitch estimates are within 4 integer delays of each other, resulting in a single 5-bit search range, only 8 integer delays centered around the mid-point are searched, but the fractional pitch portion f can assume values of 0.00, 0.25, 0.50, or 0.75, which are therefore also searched. Thus 3 bits are used to encode the integer portion while 2 bits are used to encode the fractional portion of the closed loop pitch. If the two determining open loop pitch estimates are within 8 integer delays of each other, resulting in a single 5-bit search range, only 16 integer delays centered around the mid-point are searched, but the fractional pitch portion f can assume values of 0.0 or 0.5, which are therefore also searched. Thus 4 bits are used to encode the integer portion while 1 bit is used to encode the fractional portion of the closed loop pitch. If the two determining open loop pitch estimates are more than 8 integer delays apart, only integer delays, i.e., f = 0.0 only, are searched in either the single 5-bit search range or the two 4-bit search ranges determined. Thus all 5 bits are spent in encoding the integer portion of the closed loop pitch.
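The trade-off between search range and fractional resolution can be summarized as a small decision function. The sketch below makes several assumptions that the text leaves open: "within 4" and "within 8" are read as inclusive, two 16-delay ranges are taken to overlap when their centers are fewer than 16 delays apart, and the names and returned structure are illustrative. The reserved all-zero index is not modelled.

```python
def pitch_search_plan(p_a, p_b):
    """Describe the closed loop pitch search for one group of subframes.

    p_a, p_b: the two determining quantized open loop pitch estimates.
    Returns a dict with the search ranges as (center, width) pairs, the
    allowed fractions, and the (integer bits, fraction bits) split.
    """
    gap = abs(p_a - p_b)
    mid = (p_a + p_b) / 2.0
    if gap <= 4:
        return {"ranges": [(mid, 8)], "fractions": (0.0, 0.25, 0.5, 0.75), "bits": (3, 2)}
    if gap <= 8:
        return {"ranges": [(mid, 16)], "fractions": (0.0, 0.5), "bits": (4, 1)}
    if gap < 16:
        # ASSUMPTION: the two 4-bit ranges overlap, so one 5-bit range is used.
        return {"ranges": [(mid, 32)], "fractions": (0.0,), "bits": (5, 0)}
    return {"ranges": [(p_a, 16), (p_b, 16)], "fractions": (0.0,), "bits": (5, 0)}


def candidate_count(plan):
    """Number of (integer delay, fraction) pairs covered by a plan."""
    return sum(width for _, width in plan["ranges"]) * len(plan["fractions"])
```

In every branch the plan covers 32 delay candidates, which is what a 5-bit index can address.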
The search complexity may be reduced in the case of fractional pitch delays by first searching for the optimum integer delay and searching for the optimum fractional pitch delay only in its neighborhood. One of the 5-bit indices, the all zero index, is reserved for the all zero adaptive codebook vector. This is accommodated by trimming the 5-bit or 32 pitch delay search range to a 31 pitch delay search range. As indicated before, the search is restricted to only positive correlations, and the all zero index is chosen if no such positive correlation is found. The adaptive codebook gain is determined after search by quantizing the ratio of the optimum correlation to the optimum energy using a non-uniform 3-bit quantizer. This 3-bit quantizer only has positive gain values in it since only positive gains are possible.
Since delayed decision is employed, the adaptive codebook search produces the two best pitch delay or lag candidates in all subframes. Furthermore, for subframes two to six, this has to be repeated for the two best target vectors produced by the two best sets of excitation model parameters derived for the previous subframes in the current frame. This results in two best lag candidates and the associated two adaptive codebook gains for subframe one, and in four best lag candidates and the associated four adaptive codebook gains for subframes two to six, at the end of the search process. In each case, the target vector for the fixed codebook is derived by subtracting the scaled adaptive codebook vector from the target for the adaptive codebook search, i.e., $t_{sc} = t_{ac} - \mu_{opt}\, r_{opt}$, where $r_{opt}$ is the selected adaptive codebook vector and $\mu_{opt}$ is the associated adaptive codebook gain.
In mode A, the fixed codebook consists of general excitation pulse shapes constructed from the discrete sinc and cosc functions. The sinc function is defined as
$$\mathrm{sinc}(n) = \frac{\sin(\pi n)}{\pi n}, \quad n \ne 0; \qquad \mathrm{sinc}(0) = 1,$$
and the cosc function is defined as
$$\mathrm{cosc}(n) = \frac{1 - \cos(\pi n)}{\pi n}, \quad n \ne 0; \qquad \mathrm{cosc}(0) = 0.$$
With these definitions in mind, the generalized excitation pulse shapes are constructed as follows:
$$z_1(n) = A\,\mathrm{sinc}(n) + B\,\mathrm{cosc}(n+1),$$
$$z_{-1}(n) = A\,\mathrm{sinc}(n) - B\,\mathrm{cosc}(n-1).$$
The weights A and B are chosen to be 0.866 and 0.5 respectively. With the sinc and cosc functions time aligned, they correspond to what is known as zinc basis functions $z_0(n)$. Informal listening tests show that time-shifted pulse shapes improve voice quality of the synthesized speech.
The fixed codebook for mode A consists of 2 parts, each having 45 vectors. The first part consists of the pulse shape $z_{-1}(n-45)$ and is 90 samples long. The ith vector is simply the vector that starts from the ith codebook entry. The second part consists of the pulse shape $z_1(n-45)$ and is 90 samples long. Here again, the ith vector is simply the vector that starts from the ith codebook entry. Both codebooks are further trimmed to reduce all small values, especially near the beginning and end of both codebooks, to zero. In addition, we note that every even sample in either codebook is identical to zero by definition. All this contributes to making the codebooks very sparse. In addition, we note that both codebooks are overlapping, with adjacent vectors having all but one entry in common.
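The pulse shapes and the two overlapping codebooks can be generated directly from the sinc and cosc definitions, which also makes the sparsity claim easy to check numerically. In the sketch below the function names are hypothetical, and the vector length of 46 samples is an assumption inferred from the forty-six sample subframe length, under which a 90-sample codebook yields exactly 45 start positions. The trimming of small values is not modelled because no threshold is given.

```python
import math

A, B = 0.866, 0.5  # weights given in the text


def sinc(n):
    return 1.0 if n == 0 else math.sin(math.pi * n) / (math.pi * n)


def cosc(n):
    return 0.0 if n == 0 else (1.0 - math.cos(math.pi * n)) / (math.pi * n)


def z_plus(n):
    """Pulse shape z_1(n) = A sinc(n) + B cosc(n + 1)."""
    return A * sinc(n) + B * cosc(n + 1)


def z_minus(n):
    """Pulse shape z_-1(n) = A sinc(n) - B cosc(n - 1)."""
    return A * sinc(n) - B * cosc(n - 1)


def build_codebook(pulse, length=90, shift=45):
    """Sample pulse(n - shift) for n = 0 .. length - 1."""
    return [pulse(n - shift) for n in range(length)]


def codebook_vector(book, i, vec_len=46):
    """The ith vector starts at the ith entry (vec_len is an ASSUMPTION)."""
    return book[i:i + vec_len]
```

Adjacent vectors such as `codebook_vector(book, 3)` and `codebook_vector(book, 4)` share all but one entry, and the even-indexed entries of both codebooks evaluate to zero up to floating point rounding.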
The overlapping nature and the sparsity of the codebooks are exploited in the codebook search, which uses the same distortion measure as in the adaptive codebook search. This measure calculates the distance between the fixed codebook target vector $t_{sc}$ and every candidate fixed codebook vector $c_i$ as
$$\varepsilon_i = (t_{sc} - \mu_i c_i)^T\, W\, (t_{sc} - \mu_i c_i),$$
where $W$ is the same spectral weighting matrix used in the adaptive codebook search and $\mu_i$ is the optimum value of the gain for that ith codebook vector. Once the optimum vector has been selected for each codebook, the codebook gain magnitude is quantized outside the search loop by quantizing the ratio of the optimum correlation to the optimum energy by a non-uniform 4-bit quantizer in odd subframes and a 3-bit differential non-uniform quantizer in even subframes. Both quantizers have zero gain as one of their entries. The optimal distortion for each codebook is then calculated and the optimal codebook selected.
The fixed codebook index for each subframe is in the range 0-44 if the optimal codebook is from $z_{-1}(n-45)$ but is mapped to the range 45-89 if the optimal codebook is from $z_1(n-45)$. By combining the fixed codebook indices I and J of two consecutive subframes as 90I+J, we can encode the resulting index using 13 bits, since the largest combined value, 90 × 89 + 89 = 8099, is below 8192. This is done for subframes 1 and 2, 3 and 4, 5 and 6. For subframe 7, the fixed codebook index is simply encoded using 7 bits. The fixed codebook gain sign is encoded using 1 bit in all 7 subframes. The fixed codebook gain magnitude is encoded using 4 bits in subframes 1, 3, 5, 7 and using 3 bits in subframes 2, 4, 6.
Due to delayed decision, there are two target vectors $t_{sc}$ for the fixed codebook search in the first subframe, corresponding to the two best lag candidates and their corresponding gains provided by the closed loop adaptive codebook search. For subframes two to seven, there are four target vectors, corresponding to the two best sets of excitation model parameters determined for the previous subframes so far and to the two best lag candidates and their gains provided by the adaptive codebook search in the current subframe. The fixed codebook search is therefore carried out two times in subframe one and four times in subframes two to six. But the complexity does not increase in a proportionate manner because in each subframe, the energy terms $c_i^T W c_i$ are the same. It is only the correlation terms $t_{sc}^T W c_i$ that are different in each of the two searches for subframe one and in each of the four searches in subframes two to seven.
Delayed decision search helps to smooth the pitch and gain contours in a CELP coder. Delayed decision is employed in this
invention in such a way that the overall codec delay is not increased. Thus, in every subframe, the closed loop pitch search produces the M best estimates. For each of these M best estimates and N best previous subframe parameters, MN optimum pitch gain indices, fixed codebook indices, fixed codebook gain indices, and fixed codebook gain signs are derived. At the end of the subframe, these MN solutions are pruned to the L best using cumulative SNR for the current 40 ms frame as the criterion. For the first subframe, M=2, N=1 and L=2 are used. For the last subframe, M=2, N=2 and L=1 are used. For all other subframes, M=2, N=2 and L=2 are used. The delayed decision approach is particularly effective in the transition of voiced to unvoiced and unvoiced to voiced regions. This delayed decision approach results in N times the complexity of the closed loop pitch search but much less than MN times the complexity of the fixed codebook search in each subframe. This is because only the correlation terms need to be calculated MN times for the fixed codebook in each subframe but the energy terms need to be calculated only once. The optimal parameters for each subframe are determined only at the end of the 40 ms frame using traceback. The pruning of MN solutions to L solutions is stored for each subframe to enable the trace back. An example of how traceback is accomplished is shown in FIG. 20. The dark, thick line indicates the optimal path obtained by traceback after the last subframe.

In mode B, the quantization indices of both sets of short term predictor parameters are transmitted but not the open loop pitch estimates. The 40-msec speech frame is divided into five subframes, each 8 msec long. As in mode A, an interpolated set of
filter coefficients is used to derive the pitch index, pitch gain index, fixed codebook index, and fixed codebook gain index in a closed loop analysis by synthesis fashion. The closed loop pitch search is unrestricted in its range, and only integer pitch delays are searched. The fixed codebook is a multi-innovation codebook with zinc pulse sections as well as Hadamard sections. The zinc pulse sections are well suited for transient segments while the Hadamard sections are better suited for unvoiced segments. The fixed codebook search procedure is modified to take advantage of this.
The higher complexity involved as well as the higher bit rate of the short term predictor parameters in mode B is compensated by a slower update of the excitation model parameters.
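The joint coding of two mode A fixed codebook indices as 90I+J, described earlier for subframe pairs 1 and 2, 3 and 4, 5 and 6, can be checked with a short sketch. The function names are hypothetical; only the formula and the 13-bit budget come from the text.

```python
def pack_pair(i, j):
    """Combine two fixed codebook indices (each 0-89) into one joint index."""
    if not (0 <= i < 90 and 0 <= j < 90):
        raise ValueError("each index must lie in the range 0-89")
    return 90 * i + j


def unpack_pair(code):
    """Recover the two indices (I, J) from the joint index 90*I + J."""
    return divmod(code, 90)


largest = pack_pair(89, 89)  # 8099, which is below 2**13 = 8192
```

Coding the pair jointly needs 13 bits, whereas coding each index separately would need 7 bits each, so one bit is saved per subframe pair.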
For mode B, the 40 ms speech frame is divided into five subframes. Each subframe is of length 8 ms or sixty-four samples. The excitation model parameters in each subframe are the adaptive codebook index, the adaptive codebook gain, the fixed codebook index, and the fixed codebook gain. There is no fixed codebook gain sign since it is always positive. Best estimates of these parameters are determined using an analysis by synthesis method in each subframe. The overall best estimate is determined at the end of the 40 ms frame using a delayed decision approach similar to mode A.
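The delayed decision procedure (M candidates per survivor, pruning to L survivors, and a final trace back) can be sketched as a small beam search. This is a simplified illustration, not the codec's search: the `expand` callback and its scores are hypothetical stand-ins for the closed loop searches and the cumulative SNR, and a survivor's state is reduced to its last choice only.

```python
def delayed_decision(num_subframes, expand, keep=2):
    """Return the best choice per subframe using pruning and trace back.

    expand(s, parent_choice) yields (choice, score_increment) candidates for
    subframe s, given the choice a survivor made in subframe s - 1.
    """
    survivors = [(0.0, None, None)]  # (cumulative score, choice, back-pointer)
    history = []
    for s in range(num_subframes):
        hypotheses = []
        for parent, (total, choice, _) in enumerate(survivors):
            for cand, delta in expand(s, choice):
                hypotheses.append((total + delta, cand, parent))
        hypotheses.sort(key=lambda h: h[0], reverse=True)
        last = s == num_subframes - 1
        survivors = hypotheses[:1 if last else keep]  # prune to L survivors
        history.append(survivors)
    # Trace back from the single final survivor through stored back-pointers.
    path, pointer = [], 0
    for stage in reversed(history):
        _, choice, parent = stage[pointer]
        path.append(choice)
        pointer = parent
    return path[::-1]


def toy_expand(s, parent_choice):
    """Invented scores in which the locally best first choice loses overall."""
    if s == 0:
        return [("a", 1.0), ("b", 0.9)]
    if s == 1:
        return [("x", 0.1), ("y", 0.0)] if parent_choice == "a" else [("x", 1.0), ("y", 0.5)]
    return [("z", 0.0), ("w", -1.0)]


best_path = delayed_decision(3, toy_expand)
```

In the toy data a greedy coder would commit to "a" in the first subframe, while the delayed decision keeps "b" alive long enough for its better continuation to win.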
The short term predictor parameters or linear prediction filter parameters are interpolated from subframe to subframe in the autocorrelation lag domain. The normalized autocorrelation lags derived from the quantized filter coefficients for the second linear prediction analysis window are denoted as $\{\rho_{-1}(i)\}$ for the previous 40 ms frame. The corresponding lags for the first and second linear prediction analysis windows for the current 40 ms frame are denoted by $\{\rho_{1}(i)\}$ and $\{\rho_{2}(i)\}$ respectively. The normalization ensures that $\rho_{-1}(0) = \rho_{1}(0) = \rho_{2}(0) = 1.0$. The interpolated autocorrelation lags $\{\rho'_m(i)\}$ are given by

$$\rho'_m(i) = \alpha_m\,\rho_{-1}(i) + \beta_m\,\rho_{1}(i) + [1 - \alpha_m - \beta_m]\,\rho_{2}(i), \quad 1 \le m \le 5,\ 0 \le i \le 10$$

or in vector notation

$$\rho'_m = \alpha_m\,\rho_{-1} + \beta_m\,\rho_{1} + [1 - \alpha_m - \beta_m]\,\rho_{2}, \quad 1 \le m \le 5.$$

Here $\alpha_m$ and $\beta_m$ are the interpolating weights for subframe m. The interpolated lags $\{\rho'_m(i)\}$ are subsequently converted to the short term predictor filter coefficients $\{a'_m(i)\}$.

The choice of interpolating weights is not as critical in this mode as it is in mode A. Nevertheless, they have been determined using the same objective criteria as in mode A and fine tuning them by listening tests. The values of $\alpha_m$ and $\beta_m$ which minimize the objective criteria $E_m$ can be shown to be

$$\alpha_m = \frac{Y_m C - X_m B}{C^2 - AB}, \qquad \beta_m = \frac{X_m C - Y_m A}{C^2 - AB}$$

where

$$A = \sum_J \lVert \rho_{-1,J} - \rho_{2,J} \rVert^2$$
$$B = \sum_J \lVert \rho_{1,J} - \rho_{2,J} \rVert^2$$
$$C = \sum_J \langle \rho_{-1,J} - \rho_{2,J},\ \rho_{1,J} - \rho_{2,J} \rangle$$
$$X_m = \sum_J \langle \rho_{-1,J} - \rho_{2,J},\ \rho_{m,J} - \rho_{2,J} \rangle$$
$$Y_m = \sum_J \langle \rho_{m,J} - \rho_{2,J},\ \rho_{1,J} - \rho_{2,J} \rangle$$

As before, $\rho_{-1,J}$ denotes the autocorrelation lag vector derived from the quantized filter coefficients of the second linear prediction analysis window of frame J-1, $\rho_{1,J}$ denotes the autocorrelation lag vector derived from the quantized filter coefficients of the first linear prediction analysis window of frame J, $\rho_{2,J}$ denotes the autocorrelation lag vector derived from the quantized filter coefficients of the second linear prediction analysis window of frame J, and $\rho_{m,J}$ denotes the actual autocorrelation lag vector derived from the speech samples in subframe m of frame J.
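The weight computation above amounts to a two-parameter least squares fit, which the following sketch illustrates. It assumes the objective is the summed squared error between the actual and interpolated lag vectors; the symbol names (`A`, `B`, `C`, `X`, `Y`) and function names are chosen for this example, and the training data here is synthetic.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sub(x, y):
    return [a - b for a, b in zip(x, y)]


def interpolation_weights(frames):
    """Least squares weights for one subframe position.

    frames: list of (rho_prev, rho_1, rho_2, rho_actual) lag vectors,
    one tuple per training frame.
    """
    A = B = C = X = Y = 0.0
    for rho_prev, rho_1, rho_2, rho_actual in frames:
        u = sub(rho_prev, rho_2)
        v = sub(rho_1, rho_2)
        w = sub(rho_actual, rho_2)
        A += dot(u, u)
        B += dot(v, v)
        C += dot(u, v)
        X += dot(u, w)
        Y += dot(w, v)
    det = C * C - A * B
    return (Y * C - X * B) / det, (X * C - Y * A) / det


def interpolate(rho_prev, rho_1, rho_2, alpha, beta):
    """Blend the three lag vectors; the three weights sum to one."""
    return [alpha * p + beta * q + (1.0 - alpha - beta) * r
            for p, q, r in zip(rho_prev, rho_1, rho_2)]


# Synthetic check: build the "actual" lags with known weights 0.3 and 0.5.
prev, first, second = [1.0, 0.5, 0.2], [1.0, 0.3, 0.1], [1.0, 0.1, 0.4]
actual = interpolate(prev, first, second, 0.3, 0.5)
alpha, beta = interpolation_weights([(prev, first, second, actual)])
```

Because the three weights sum to one and all three inputs have a zero-lag value of 1.0, the interpolated lags keep a zero-lag value of 1.0 as well.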
The adaptive codebook search in mode B is similar to that in mode A in that the target vector for the search is derived in the same manner and the distortion measure used in the search is the same. However, there are some differences. Only integer pitch delays in the range [20,146] are searched and no fractional pitch delays are searched. As in mode A, only positive correlations are considered in the search and the all zero index corresponding to an all zero vector is assigned if no positive correlations are found. The optimal adaptive codebook index is encoded using 7 bits. The adaptive codebook gain, which is guaranteed to be positive, is quantized outside the search loop using a 3-bit non-uniform quantizer. This quantizer is different from that used in mode A.
As in mode A, delayed decision is employed so that the adaptive codebook search produces the two best pitch delay candidates in all subframes. In addition, in subframes two to five, this has to be repeated for the two best target vectors produced by the two best sets of excitation model parameters derived for the previous subframes, resulting in 4 sets of adaptive codebook indices and associated gains at the end of the subframe. In each case, the target vector for the fixed codebook search is derived by subtracting the scaled adaptive codebook vector from the target of the adaptive codebook vector.

The fixed codebook in mode B is a 9-bit multi-innovation codebook with three sections. The first is a Hadamard vector sum section and the second and third sections are related to generalized excitation pulse shapes $z_{-1}(n)$ and $z_{1}(n)$ respectively. These pulse shapes have been defined earlier. The first section of this codebook and the associated search procedure are based on the publication by D. Lin, "Ultra-Fast CELP Coding Using Multi-Codebook Innovations", ICASSP92. We note that in this section, there are 256 innovation vectors and the search procedure guarantees a positive gain. The second and third sections have 64 innovation vectors each and their search procedure can produce both positive as well as negative gains.

One component of the multi-innovation codebook is the deterministic vector-sum code constructed from the Hadamard matrix $H_m$. The code vector of the vector-sum code as used in this invention is expressed as
$$u_i(n) = \sum_{m} \theta_{im}\, v_m(n), \quad 0 \le i \le 255$$

where the basis vectors $v_m(n)$ are obtained from the rows of the Hadamard-Sylvester matrix and $\theta_{im} = \pm 1$. The basis vectors are selected based on a sequency partition of the Hadamard matrix. The code vectors of the Hadamard vector-sum codebook are binary valued code sequences. Compared to previously considered algebraic codes, the Hadamard vector-sum codes are constructed to possess more ideal frequency and phase characteristics. This is due to the basis vector partition scheme used in this invention for the Hadamard matrix, which can be interpreted as uniform sampling of the sequency ordered Hadamard matrix row vectors. In contrast, non-uniform sampling methods have produced inferior results.
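The vector-sum construction described above can be sketched as follows. This is an illustration under several assumptions that the text does not state: 8 basis vectors of 64 samples (chosen so that sign combinations give 256 code vectors), sign bits taken from the binary digits of the index, and uniform sampling that starts at the lowest sequency row.

```python
def sylvester_hadamard(order):
    """Return the order x order Sylvester Hadamard matrix (order a power of 2)."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def sequency(row):
    """Count the sign changes along a row."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def vector_sum_codebook(length=64, num_basis=8):
    """Build a vector-sum codebook from uniformly sampled sequency-ordered rows."""
    rows = sorted(sylvester_hadamard(length), key=sequency)
    basis = rows[::length // num_basis]  # assumed sampling offset of zero
    codebook = []
    for i in range(2 ** num_basis):
        # Bit m of the index selects the sign of basis vector m (an assumption).
        signs = [1 if (i >> m) & 1 else -1 for m in range(num_basis)]
        codebook.append([sum(s * v[n] for s, v in zip(signs, basis))
                         for n in range(length)])
    return basis, codebook


basis, codebook = vector_sum_codebook()
```

In this sketch every code vector has its negative in the codebook as well (indices i and 255 - i), which is one way a search over such a code could always report a positive gain.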
The second section of the multi-innovation codebook consists of the pulse shape $z_{-1}(n-63)$ and is 127 samples long. The ith vector of this section is simply the vector that starts from the ith entry of this section. The third section consists of the pulse shape $z_{1}(n-63)$ and is 127 samples long. Here again, the ith vector of this section is simply the vector that starts from the ith entry of this section. Both the second and third sections enjoy the advantages of an overlapping nature and sparsity that can be exploited by the search procedure just as in the fixed codebook in mode A. As indicated earlier, the search procedure is not restricted to positive correlations and therefore both positive as well as negative gains can result in the second and third sections.
Once the optimum vector has been selected for each section, the codebook gain magnitude is quantized outside the search loop by quantizing the ratio of the optimum correlation to the optimum energy by a non-uniform 4-bit quantizer in all subframes. This quantizer is different for the first section while the second and third sections use a common quantizer. All quantizers have zero gain as one of their entries. The optimal distortion for each section is then calculated and the optimal section is finally selected.
The fixed codebook index for each subframe is in the range 0-255 if the optimal codebook vector is from the Hadamard section. If it is from the $z_{-1}(n-63)$ section and the gain sign is positive, it is mapped to the range 256-319. If it is from the $z_{-1}(n-63)$ section and the gain sign is negative, it is mapped to the range 320-383. If it is from the $z_{1}(n-63)$ section and the gain sign is positive, it is mapped to the range 384-447. If it is from the $z_{1}(n-63)$ section and the gain sign is negative, it is mapped to the range 448-511. The resulting index can be encoded using 9 bits. The fixed codebook gain magnitude is encoded using 4 bits in all subframes.

For mode C, the 40 ms frame is divided into five subframes as in mode B. Each subframe is of length 8 ms or 64 samples. The excitation model parameters in each subframe are the adaptive codebook index, the adaptive codebook gain, the fixed codebook index, and 2 fixed codebook gains, one fixed codebook gain being associated with each half of the subframe. Both are guaranteed to be positive and therefore there is no sign information associated with them. As in both modes A and B, best estimates of these parameters are determined using an analysis by synthesis method in each subframe. The overall best estimate is determined at the end of the 40 ms frame using a delayed decision method identical to that used in modes A and B.
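The 9-bit mode B fixed codebook index mapping given above (0-255, 256-319, 320-383, 384-447, 448-511) can be sketched as a pair of helper functions. The section names and function names are hypothetical labels for this example; only the index ranges come from the text.

```python
SECTION_BASE = {"hadamard": 0, "z_minus1": 256, "z_plus1": 384}


def encode_index(section, local_index, gain_negative=False):
    """Map (section, index within section, gain sign) to the 9-bit index."""
    if section == "hadamard":
        if gain_negative or not 0 <= local_index < 256:
            raise ValueError("Hadamard section: index 0-255, positive gain")
        return local_index
    if not 0 <= local_index < 64:
        raise ValueError("pulse sections hold 64 vectors each")
    return SECTION_BASE[section] + (64 if gain_negative else 0) + local_index


def decode_index(index):
    """Invert encode_index: return (section, local index, gain_negative)."""
    if not 0 <= index < 512:
        raise ValueError("index must fit in 9 bits")
    if index < 256:
        return ("hadamard", index, False)
    section = "z_minus1" if index < 384 else "z_plus1"
    offset = index - SECTION_BASE[section]
    return (section, offset % 64, offset >= 64)
```

Folding the gain sign of the pulse sections into the index is why no separate sign bit is needed in mode B.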
The short term predictor parameters or linear prediction filter parameters are interpolated from subframe to subframe in the autocorrelation lag domain in exactly the same manner as in mode B. However, the interpolating weights $\alpha_m$ and $\beta_m$ are different from those used in mode B. They are obtained by using the procedure described for mode B but using various background noise sources as training material.
The adaptive codebook search in mode C is identical to that in mode B except that both positive as well as negative correlations are allowed in the search. The optimal adaptive codebook index is encoded using 7 bits. The adaptive codebook gain, which could be either positive or negative, is quantized outside the search loop using a 3-bit non-uniform quantizer. This quantizer is different from that used in either mode A or mode B in that it has a more restricted range and may have negative values as well. By allowing both positive as well as negative correlations in the search loop and by having a quantizer with a restricted dynamic range, periodic artifacts in the synthesized background noise due to the adaptive codebook are reduced considerably. In fact, the adaptive codebook now behaves more like another fixed codebook.
As in mode A and mode B, delayed decision is employed and the adaptive codebook search produces the two best candidates in all subframes. In addition, in subframes two to five, this has to be repeated for the two target vectors produced by the two best sets of excitation model parameters derived for the previous subframes, resulting in 4 sets of adaptive codebook indices and associated gains at the end of the subframe. In each case, the target vector for the fixed codebook search is derived by subtracting the scaled adaptive codebook vector from the target of the adaptive codebook vector.
The fixed codebook in mode C is an 8-bit multi-innovation codebook and is identical to the vector sum section in the mode B fixed multi-innovation codebook. The same search procedure described in the publication by D. Lin, "Ultra-Fast CELP Coding Using Multi-Codebook Innovations", ICASSP92, is used here. There are 256 codebook vectors and the search procedure guarantees a positive gain. The fixed codebook index is encoded using 8 bits.
Once the optimum codebook vector has been selected, the optimum correlation and optimum energy are calculated for the first half of the subframe as well as the second half of the subframe separately. The ratios of the correlation to the energy in both halves are quantized independently using a 5-bit non-uniform quantizer that has zero gain as one of its entries. The use of 2 gains per subframe ensures a smoother reproduction of the background noise. Due to the delayed decision, there are two sets of optimum fixed codebook indices and gains in subframe one and four sets in subframes two to five. The delayed decision approach in mode C is identical to that used in other modes A and B. The optimal parameters for each subframe are determined at the end of the 40 ms frame using an identical traceback procedure.

The bit allocation among various parameters is summarized in Figures 21A and 21B for mode A, Figure 22 for mode B, and Figure 23 for mode C. These parameters are packed by the packing circuitry 36 of Figure 3. These parameters are packed in the same sequence as they are tabulated in these Figures.

Thus for mode A, using the same notation as in Figures 21A and 21B, they are packed into a 168 bit size packet every 40 ms in the following sequence: MODE1, LSP2, ACG1, ACG3, ACG4, ACG5, ACG7, ACG2, ACG6, PITCH1, PITCH2, ACI1, SIGN1, FCG1, ACI2, SIGN2, FCG2, ACI3, SIGN3, FCG3, ACI4, SIGN4, FCG4, ACI5, SIGN5, FCG5, ACI6, SIGN6, FCG6, ACI7, SIGN7, FCG7, FCI12, FCI34, FCI56, and FCI7.

For mode B, using the same notation as in Figures 21A and 21B, the parameters are packed into a 168 bit size packet every 40 ms in the following sequence: MODE1, LSP2, ACG1, ACG2, ACG3, ACG4, ACG5, ACI1, FCG1, FCI1, ACI2, FCG2, FCI2, ACI3, FCG3, FCI3, ACI4, FCG4, FCI4, ACI5, FCG5, FCI5, LSP1, and MODE2.

For mode C, using the same notation as in Figures 21A and 21B, they are packed into a 168 bit size packet every 40 ms in the following sequence: MODE1, LSP2, ACG1, ACG2, ACG3, ACG4, ACG5, ACI1, FCG2_1, FCI1, ACI2, FCG2_2, FCI2, ACI3, FCG2_3, FCI3, ACI4, FCG2_4, FCI4, ACI5, FCG2_5, FCI5, FCG1_1, FCG1_2, FCG1_3, FCG1_4, FCG1_5, and MODE2.

The packing sequence in all three modes is designed to reduce the sensitivity of an error in the mode bits MODE1 and MODE2.
The packing is done from the MSB or bit 7 to the LSB or bit 0, from byte 1 to byte 21. MODE1 occupies the MSB or bit 7 of byte 1. By testing this bit, we can determine whether the compressed speech belongs to mode A or not. If it is not mode A, we test the MODE2 that occupies the LSB or bit 0 of byte 21 to decide between mode B and mode C.
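The two-step mode test described above can be sketched on a 21-byte (168-bit) packet. The bit positions come from the text; the bit polarities do not, so the values that select mode A and mode B are exposed as parameters and their defaults are assumptions made only for this example.

```python
def read_mode_bits(packet):
    """Return (MODE1, MODE2) from a 21-byte packet."""
    if len(packet) != 21:
        raise ValueError("a 168-bit packet is exactly 21 bytes")
    mode1 = (packet[0] >> 7) & 1   # MSB (bit 7) of byte 1
    mode2 = packet[20] & 1         # LSB (bit 0) of byte 21
    return mode1, mode2


def classify(packet, mode_a_bit=1, mode_b_bit=1):
    """Pick the mode; which bit value means which mode is an assumption."""
    mode1, mode2 = read_mode_bits(packet)
    if mode1 == mode_a_bit:
        return "A"                 # MODE2 is not consulted for mode A
    return "B" if mode2 == mode_b_bit else "C"


packet_a = bytes([0x80] + [0] * 20)
packet_b = bytes([0] * 20 + [0x01])
packet_c = bytes(21)
```

Placing the two mode bits at opposite ends of the packet keeps them far apart in the bitstream, which fits the stated goal of reducing sensitivity to errors in the mode bits.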
The speech decoder 46 (FIG. 4) is shown in FIG. 24 and receives the compressed speech bitstream in the same form as put out by the speech encoder of FIG. 3. The parameters are unpacked after determining whether the received mode bits indicate a first mode (Mode C), a second mode (Mode B), or a third mode (Mode A). These are then used to synthesize the speech. Speech decoder 46 synthesizes the part of the signal corresponding to the frame, depending on the second set of filter coefficients, independently of the first set of filter coefficients and the first and second pitch estimates, when the frame is determined to be the first mode (Mode C); synthesizes the part of the signal corresponding to the frame, depending on the first and second sets of filter coefficients, independently of the first and second pitch estimates, when the frame is determined to be the second mode (Mode B); and synthesizes a part of the signal corresponding to the frame, depending on the second set of filter coefficients and the first and second pitch estimates, independently of the first set of filter coefficients, when the frame is determined to be the third mode (Mode A).

In addition, the speech decoder receives a cyclic redundancy check (CRC) based bad frame indicator from the channel decoder 45 (FIG. 1). This bad frame indicator flag is used to trigger the bad frame error masking and error recovery sections (not shown) of the decoder. These can also be triggered by some built-in error detection scheme.
Speech decoder 46 tests the MSB or bit 7 of byte 1 to see if the compressed speech packet corresponds to mode A. Otherwise, the LSB or bit 0 of byte 21 is tested to see if the packet corresponds to mode B or mode C. Once the correct mode of the received compressed speech packet is determined, the parameters of the received speech frame are unpacked and used to synthesize the speech. In addition, the speech decoder receives a cyclic redundancy check (CRC) based bad frame indicator from the channel decoder 25 in Figure 1. This bad frame indicator flag is used to trigger the bad frame masking and error recovery portions of the speech decoder. These can also be triggered by some built-in error detection schemes.
In mode A, the received second set of line spectral frequency indices are used to reconstruct the quantized filter coefficients, which then are converted to autocorrelation lags. In each subframe, the autocorrelation lags are interpolated using the same weights as used in the encoder for mode A and then converted to short term predictor filter coefficients. The open loop pitch indices are converted to quantized open loop pitch values. In each subframe, these open loop values are used along with each received 5-bit adaptive codebook index to determine the pitch delay candidate. The adaptive codebook vector corresponding to this delay is determined from the adaptive codebook 103 in Figure 24. The adaptive codebook gain index for each subframe is used to obtain the adaptive codebook gain, which then is applied to the multiplier 104 to scale the adaptive codebook vector. The fixed codebook vector for each subframe is inferred from the fixed codebook 101 from the received fixed codebook index associated with that subframe and this is scaled by the fixed codebook gain, obtained from the received fixed codebook gain index and the sign index for that subframe, by multiplier 102. Both the scaled adaptive codebook vector and the scaled fixed codebook vector are summed by summer 105 to produce an excitation signal which is enhanced by a pitch prefilter 106 as in I. A. Gerson and M. A. Jasiuk, supra. This enhanced excitation signal is used to drive the short term predictor 107 and the synthesized speech is subsequently further enhanced by a global pole-zero filter 108 with built in spectral tilt correction and energy normalization. At the end of each subframe, the adaptive codebook is updated by the excitation signal as indicated by the dotted line in Figure 24.
In mode B, both sets of line spectral frequency indices are used to reconstruct both the first and second sets of quantized filter coefficients, which subsequently are converted to autocorrelation lags. In each subframe, these autocorrelation lags are interpolated using exactly the same weights as used in the encoder in mode B and then converted to short term predictor coefficients. In each subframe, the received adaptive codebook index is used to derive the adaptive codebook vector from the adaptive codebook 103 and the received fixed codebook index is used to derive the fixed codebook vector from the fixed codebook 101. The adaptive codebook gain index and the fixed codebook gain index are used in each subframe to retrieve the adaptive codebook gain and the fixed codebook gain. The excitation vector is reconstructed by scaling the adaptive codebook vector by the adaptive codebook gain using multiplier 104, scaling the fixed codebook vector by the fixed codebook gain using multiplier 102, and summing them using summer 105. As in mode A, this is enhanced by the pitch prefilter 106 prior to synthesis by the short term predictor 107. The synthesized speech is further enhanced by the global pole-zero postfilter 108. At the end of each subframe, the adaptive codebook is updated by the excitation signal as indicated by the dotted line in Figure 24.
In mode C, the received second set of line spectral frequency indices are used to reconstruct the quantized filter coefficients, which then are converted to autocorrelation lags. In each subframe, the autocorrelation lags are interpolated using the same weights as used in the encoder for mode C and then converted to short term predictor filter coefficients. In each subframe, the received adaptive codebook index is used to derive the adaptive codebook vector from the adaptive codebook 103 and the received fixed codebook index is used to derive the fixed codebook vector from the fixed codebook 101. The adaptive codebook gain index and the fixed codebook gain indices are used in each subframe to retrieve the adaptive codebook gain and the fixed codebook gains for both halves of the subframe. The excitation vector is reconstructed by scaling the adaptive codebook vector by the adaptive codebook gain using multiplier 104, scaling the first half of the fixed codebook vector by the first fixed codebook gain using multiplier 102 and the second half of the fixed codebook vector by the second fixed codebook gain using multiplier 102, and summing the scaled adaptive and fixed codebook vectors using summer 105. As in modes A and B, this is enhanced by the pitch prefilter 106 prior to synthesis by the short term predictor 107. The synthesized speech is further enhanced by the global pole-zero postfilter 108. The parameters of the pitch prefilter and global postfilter used in each mode are different and are tailored to each mode. At the end of each subframe, the adaptive codebook is updated by the excitation signal as indicated by the dotted line in Figure 24.
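The excitation reconstruction described for the decoder can be sketched as follows. This is a minimal illustration with hypothetical function and parameter names; it covers only the scaling and summing step (multipliers 102 and 104 and summer 105 in the text), not the prefilter, the short term predictor, or the postfilter.

```python
def reconstruct_excitation(adaptive_vec, adaptive_gain, fixed_vec, fixed_gains):
    """Scale and sum the codebook vectors for one subframe.

    fixed_gains is (gain for first half, gain for second half). Mode C uses
    two different gains; modes A and B can pass the same gain twice.
    """
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("codebook vectors must have the same length")
    half = len(fixed_vec) // 2
    first_gain, second_gain = fixed_gains
    excitation = []
    for n, (a, f) in enumerate(zip(adaptive_vec, fixed_vec)):
        fixed_gain = first_gain if n < half else second_gain
        excitation.append(adaptive_gain * a + fixed_gain * f)
    return excitation


excitation = reconstruct_excitation([1.0, 1.0, 1.0, 1.0], 0.5,
                                    [1.0, -1.0, 1.0, -1.0], (2.0, 4.0))
```

After each subframe the resulting excitation would also be written back into the adaptive codebook, which is the update the text attributes to the dotted line in the decoder figure.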
As an alternative to the illustrated embodiment, the invention may be practiced with a shorter frame, such as a 22.5 ms frame, as shown in Fig. 25. With such a frame, it might be desirable to process only one LP analysis window per frame instead of the two LP analysis windows illustrated. The analysis window might begin after a duration Tb relative to the beginning of the current frame and extend into the next frame, where the window would end after a duration Te relative to the beginning of the next frame, where Te > Tb. In other words, the total duration of an analysis window could be longer than the duration of a frame, and two consecutive windows could, therefore, encompass a particular frame. Thus, a current frame could be analyzed by processing the analysis window for the current frame together with the analysis window for the previous frame.
Thus, the preferred communication system detects when noise is the predominant component of a signal frame and encodes a noise-predominated frame differently than a speech-predominated frame. This special encoding for noise avoids some of the typical artifacts produced when noise is encoded with a scheme optimized for speech. This special encoding allows improved voice quality in a low bit-rate codec system.
Additional advantages and modifications will readily occur to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific details, representative apparatus, and illustrative examples shown and described. Various modifications and variations can be made to the present invention without departing from the scope or spirit of the invention, and it is intended that the present invention cover the modifications and variations provided they come within the scope of the appended claims and their equivalents.
Claims (12)
1. A method of processing a signal having a speech component, the signal being organized as a plurality of frames, the method comprising the steps, performed for each frame, of:
determining whether the frame corresponds to a first mode, depending on whether the speech component is substantially absent from the frame;
generating an encoded frame in accordance with one of a first coding scheme, when the frame corresponds to the first mode, and an alternative coding scheme, when the frame does not correspond to the first mode; and decoding the encoded frame in accordance with one of the first coding scheme, when the frame corresponds to the first mode, and the alternative coding scheme when the frame does not correspond to the first mode.
2. The method of claim 1 wherein the step of determining includes the substep of:
comparing an energy content of the frame to one or more thresholds.
3. The method of claim 1 wherein the step of determining includes the substeps of:
comparing an energy content of the frame to one or more thresholds; and subsequently updating one of the thresholds, using the energy content, when the frame corresponds to the first mode.
4. The method of claim 1, wherein the determining step includes the substep of:
comparing a spectral content of the frame to a spectral content of a previous frame.
5. The method of claim 4 wherein the comparing step includes the substeps of:
determining a set of filter coefficients corresponding to the frame; and determining another set of filter coefficients corresponding to a previous frame.
6. The method of claim 1 wherein the determining step includes the substep of:
comparing a fundamental frequency of the frame to a fundamental frequency of a previous frame.
7. The method of claim 1 wherein the step of determining includes the substep of:
comparing a number of zero crossings of the frame to one or more thresholds.
8. The method of claim 1 wherein the step of determining includes the substep of:
measuring transitions in amplitude within the frame.
9. A method of processing a signal having a speech component, the signal being organized as a plurality of frames, the method comprising the steps, performed for each frame, of:
analyzing a first part of the frame to generate a first set of filter coefficients;
analyzing a second part of the frame and a part of a next frame to generate a second set of filter coefficients;
analyzing a third part of the frame to generate a first pitch estimate;
analyzing a fourth part of the frame and a part of the next frame to generate a second pitch estimate;
determining whether the frame is one of a first mode, a second mode, and a third mode, depending on measures of energy content of the frame and spectral content of the frame;
synthesizing a part of the signal corresponding to the frame, depending on the second set of filter coefficients and the first and second pitch estimates, independently of the first set of filter coefficients, when the frame is determined to be the third mode;
synthesizing the part of the signal corresponding to the frame, depending on the first and second sets of filter coefficients, independently of the first and second pitch estimates, when the frame is determined to be the second mode; and synthesizing the part of the signal corresponding to the frame, depending on the second set of filter coefficients, independently of the first set of filter coefficients and the first and second pitch estimates when the frame is determined to be the first mode.
10. The method of claim 9, wherein the determining step includes the substep of:
determining a mode depending on a determined mode of a previous frame.
11. The method of claim 9 wherein the determining step includes the substep of:
determining the mode to be the first mode only when the determined mode of a previous frame is either the first mode or the second mode.
12. The method of claim 9, wherein the determining step includes the substep of:
determining the mode to be the third mode only when the determined mode of a previous frame is either the third mode or the second mode.
determining the mode to be the third mode only when the determined mode of a previous frame is either the third mode or the second mode.
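The previous-frame constraints of claims 11 and 12 can be sketched as a small transition check. This is a hedged illustration, not the patented classifier. It assumes both constraints are applied together, although each claim depends on claim 9 separately. The fallback to the second mode in `constrain_mode` is an assumption made here, since the claims do not state what happens to a disallowed candidate. All names are hypothetical.

```python
# For each constrained mode, the previous-frame modes that permit it.
# The second mode carries no such restriction in claims 11 and 12.
ALLOWED_PREVIOUS = {
    "first": {"first", "second"},   # claim 11
    "third": {"third", "second"},   # claim 12
}


def transition_allowed(previous_mode, candidate_mode):
    """Return True if candidate_mode may follow previous_mode."""
    allowed = ALLOWED_PREVIOUS.get(candidate_mode)
    return allowed is None or previous_mode in allowed


def constrain_mode(previous_mode, candidate_mode):
    """Apply the constraints to a candidate mode.

    Assumption: a disallowed candidate falls back to the second mode,
    which claims 11 and 12 never forbid.
    """
    if transition_allowed(previous_mode, candidate_mode):
        return candidate_mode
    return "second"
```

Under these assumptions a frame cannot move directly between the first and third modes; it must pass through the second mode for at least one frame.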
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US22788194A | 1994-04-15 | 1994-04-15 | |
US227,881 | 1994-04-15 | | |
US229,271 | 1994-04-18 | | |
US08/229,271 US5734789A (en) | 1992-06-01 | 1994-04-18 | Voiced, unvoiced or noise modes in a CELP vocoder |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2165546A1 (en) | 1995-11-02 |
Family
ID=26921843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002165546A Abandoned CA2165546A1 (en) | 1994-04-15 | 1995-04-17 | Method of encoding a signal containing speech |
Country Status (7)
Country | Link |
---|---|
US (2) | US5734789A (en) |
EP (1) | EP0704088B1 (en) |
AT (1) | ATE202232T1 (en) |
CA (1) | CA2165546A1 (en) |
DE (1) | DE69521254D1 (en) |
FI (1) | FI956107A (en) |
WO (1) | WO1995028824A2 (en) |
Families Citing this family (309)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2166355T3 (en) * | 1991-06-11 | 2002-04-16 | Qualcomm Inc | Variable rate vocoder |
TW271524B (en) | 1994-08-05 | 1996-03-01 | Qualcomm Inc | |
US5774856A (en) * | 1995-10-02 | 1998-06-30 | Motorola, Inc. | User-Customized, low bit-rate speech vocoding method and communication unit for use therewith |
CA2188369C (en) * | 1995-10-19 | 2005-01-11 | Joachim Stegmann | Method and an arrangement for classifying speech signals |
DE69629485T2 (en) * | 1995-10-20 | 2004-06-09 | America Online, Inc. | COMPRESSION SYSTEM FOR REPEATING TONES |
JP4005154B2 (en) * | 1995-10-26 | 2007-11-07 | ソニー株式会社 | Speech decoding method and apparatus |
FR2741743B1 (en) * | 1995-11-23 | 1998-01-02 | Thomson Csf | Method and device for improving speech intelligibility in low bit-rate vocoders |
US5956674A (en) * | 1995-12-01 | 1999-09-21 | Digital Theater Systems, Inc. | Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels |
US5689615A (en) * | 1996-01-22 | 1997-11-18 | Rockwell International Corporation | Usage of voice activity detection for efficient coding of speech |
US5774849A (en) * | 1996-01-22 | 1998-06-30 | Rockwell International Corporation | Method and apparatus for generating frame voicing decisions of an incoming speech signal |
JP3157116B2 (en) * | 1996-03-29 | 2001-04-16 | 三菱電機株式会社 | Audio coding transmission system |
GB2312360B (en) * | 1996-04-12 | 2001-01-24 | Olympus Optical Co | Voice signal coding apparatus |
US5937374A (en) * | 1996-05-15 | 1999-08-10 | Advanced Micro Devices, Inc. | System and method for improved pitch estimation which performs first formant energy removal for a frame using coefficients from a prior frame |
US6047254A (en) * | 1996-05-15 | 2000-04-04 | Advanced Micro Devices, Inc. | System and method for determining a first formant analysis filter and prefiltering a speech signal for improved pitch estimation |
US5809459A (en) * | 1996-05-21 | 1998-09-15 | Motorola, Inc. | Method and apparatus for speech excitation waveform coding using multiple error waveforms |
US5751901A (en) | 1996-07-31 | 1998-05-12 | Qualcomm Incorporated | Method for searching an excitation codebook in a code excited linear prediction (CELP) coder |
US7788092B2 (en) * | 1996-09-25 | 2010-08-31 | Qualcomm Incorporated | Method and apparatus for detecting bad data packets received by a mobile telephone using decoded speech parameters |
JP2001501790A (en) * | 1996-09-25 | 2001-02-06 | クゥアルコム・インコーポレイテッド | Method and apparatus for detecting bad data packets received by a mobile telephone using decoded speech parameters |
US6014622A (en) | 1996-09-26 | 2000-01-11 | Rockwell Semiconductor Systems, Inc. | Low bit rate speech coder using adaptive open-loop subframe pitch lag estimation and vector quantization |
US5794182A (en) * | 1996-09-30 | 1998-08-11 | Apple Computer, Inc. | Linear predictive speech encoding systems with efficient combination pitch coefficients computation |
US6192336B1 (en) | 1996-09-30 | 2001-02-20 | Apple Computer, Inc. | Method and system for searching for an optimal codevector |
GB2318029B (en) * | 1996-10-01 | 2000-11-08 | Nokia Mobile Phones Ltd | Audio coding method and apparatus |
FI964975A (en) * | 1996-12-12 | 1998-06-13 | Nokia Mobile Phones Ltd | Speech coding method and apparatus |
US6148282A (en) * | 1997-01-02 | 2000-11-14 | Texas Instruments Incorporated | Multimodal code-excited linear prediction (CELP) coder and method using peakiness measure |
EP0976216B1 (en) * | 1997-02-27 | 2002-11-27 | Siemens Aktiengesellschaft | Frame-error detection method and device for error masking, specially in gsm transmissions |
JP3444131B2 (en) * | 1997-02-27 | 2003-09-08 | ヤマハ株式会社 | Audio encoding and decoding device |
US6167375A (en) * | 1997-03-17 | 2000-12-26 | Kabushiki Kaisha Toshiba | Method for encoding and decoding a speech signal including background noise |
US6064954A (en) * | 1997-04-03 | 2000-05-16 | International Business Machines Corp. | Digital audio signal coding |
KR100198476B1 (en) * | 1997-04-23 | 1999-06-15 | 윤종용 | Quantizer and the method of spectrum without noise |
IL120788A (en) * | 1997-05-06 | 2000-07-16 | Audiocodes Ltd | Systems and methods for encoding and decoding speech for lossy transmission networks |
JP3206497B2 (en) * | 1997-06-16 | 2001-09-10 | 日本電気株式会社 | Signal Generation Adaptive Codebook Using Index |
DE19729494C2 (en) | 1997-07-10 | 1999-11-04 | Grundig Ag | Method and arrangement for coding and / or decoding voice signals, in particular for digital dictation machines |
JP2001500285A (en) * | 1997-07-11 | 2001-01-09 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Transmitter and decoder with improved speech encoder |
WO1999003095A1 (en) * | 1997-07-11 | 1999-01-21 | Koninklijke Philips Electronics N.V. | Transmitter with an improved harmonic speech encoder |
US6058359A (en) * | 1998-03-04 | 2000-05-02 | Telefonaktiebolaget L M Ericsson | Speech coding including soft adaptability feature |
US6253173B1 (en) * | 1997-10-20 | 2001-06-26 | Nortel Networks Corporation | Split-vector quantization for speech signal involving out-of-sequence regrouping of sub-vectors |
US6006179A (en) * | 1997-10-28 | 1999-12-21 | America Online, Inc. | Audio codec using adaptive sparse vector quantization with subband vector classification |
US5966688A (en) * | 1997-10-28 | 1999-10-12 | Hughes Electronics Corporation | Speech mode based multi-stage vector quantizer |
US5999897A (en) * | 1997-11-14 | 1999-12-07 | Comsat Corporation | Method and apparatus for pitch estimation using perception based analysis by synthesis |
JP3357829B2 (en) * | 1997-12-24 | 2002-12-16 | 株式会社東芝 | Audio encoding / decoding method |
US6470309B1 (en) * | 1998-05-08 | 2002-10-22 | Texas Instruments Incorporated | Subframe-based correlation |
JP3180762B2 (en) * | 1998-05-11 | 2001-06-25 | 日本電気株式会社 | Audio encoding device and audio decoding device |
US6415252B1 (en) * | 1998-05-28 | 2002-07-02 | Motorola, Inc. | Method and apparatus for coding and decoding speech |
US6141638A (en) * | 1998-05-28 | 2000-10-31 | Motorola, Inc. | Method and apparatus for coding an information signal |
US6141639A (en) * | 1998-06-05 | 2000-10-31 | Conexant Systems, Inc. | Method and apparatus for coding of signals containing speech and background noise |
US6249758B1 (en) * | 1998-06-30 | 2001-06-19 | Nortel Networks Limited | Apparatus and method for coding speech signals by making use of voice/unvoiced characteristics of the speech signals |
US6453289B1 (en) | 1998-07-24 | 2002-09-17 | Hughes Electronics Corporation | Method of noise reduction for speech codecs |
JP4308345B2 (en) * | 1998-08-21 | 2009-08-05 | パナソニック株式会社 | Multi-mode speech encoding apparatus and decoding apparatus |
US6330533B2 (en) | 1998-08-24 | 2001-12-11 | Conexant Systems, Inc. | Speech encoder adaptively applying pitch preprocessing with warping of target signal |
US6493665B1 (en) * | 1998-08-24 | 2002-12-10 | Conexant Systems, Inc. | Speech classification and parameter weighting used in codebook search |
US6823303B1 (en) * | 1998-08-24 | 2004-11-23 | Conexant Systems, Inc. | Speech encoder using voice activity detection in coding noise |
US6449590B1 (en) * | 1998-08-24 | 2002-09-10 | Conexant Systems, Inc. | Speech encoder using warping in long term preprocessing |
US7117146B2 (en) * | 1998-08-24 | 2006-10-03 | Mindspeed Technologies, Inc. | System for improved use of pitch enhancement with subcodebooks |
US6104992A (en) * | 1998-08-24 | 2000-08-15 | Conexant Systems, Inc. | Adaptive gain reduction to produce fixed codebook target signal |
US6480822B2 (en) * | 1998-08-24 | 2002-11-12 | Conexant Systems, Inc. | Low complexity random codebook structure |
WO2000011649A1 (en) * | 1998-08-24 | 2000-03-02 | Conexant Systems, Inc. | Speech encoder using a classifier for smoothing noise coding |
US6507814B1 (en) * | 1998-08-24 | 2003-01-14 | Conexant Systems, Inc. | Pitch determination using speech classification and prior pitch estimation |
US6240386B1 (en) * | 1998-08-24 | 2001-05-29 | Conexant Systems, Inc. | Speech codec employing noise classification for noise compensation |
US7072832B1 (en) * | 1998-08-24 | 2006-07-04 | Mindspeed Technologies, Inc. | System for speech encoding having an adaptive encoding arrangement |
US6493666B2 (en) * | 1998-09-29 | 2002-12-10 | William M. Wiese, Jr. | System and method for processing data from and for multiple channels |
DE19845888A1 (en) * | 1998-10-06 | 2000-05-11 | Bosch Gmbh Robert | Method for coding or decoding speech signal samples as well as encoders or decoders |
US6463407B2 (en) | 1998-11-13 | 2002-10-08 | Qualcomm Inc. | Low bit-rate coding of unvoiced segments of speech |
JP3180786B2 (en) * | 1998-11-27 | 2001-06-25 | 日本電気株式会社 | Audio encoding method and audio encoding device |
US6691084B2 (en) * | 1998-12-21 | 2004-02-10 | Qualcomm Incorporated | Multiple mode variable rate speech coding |
US6456964B2 (en) * | 1998-12-21 | 2002-09-24 | Qualcomm, Incorporated | Encoding of periodic speech using prototype waveforms |
US6311154B1 (en) | 1998-12-30 | 2001-10-30 | Nokia Mobile Phones Limited | Adaptive windows for analysis-by-synthesis CELP-type speech coding |
US6754265B1 (en) * | 1999-02-05 | 2004-06-22 | Honeywell International Inc. | VOCODER capable modulator/demodulator |
US6681203B1 (en) * | 1999-02-26 | 2004-01-20 | Lucent Technologies Inc. | Coupled error code protection for multi-mode vocoders |
EP1088304A1 (en) * | 1999-04-05 | 2001-04-04 | Hughes Electronics Corporation | A frequency domain interpolative speech codec system |
JP4218134B2 (en) * | 1999-06-17 | 2009-02-04 | ソニー株式会社 | Decoding apparatus and method, and program providing medium |
US6487531B1 (en) | 1999-07-06 | 2002-11-26 | Carol A. Tosaya | Signal injection coupling into the human vocal tract for robust audible and inaudible voice recognition |
US7092881B1 (en) * | 1999-07-26 | 2006-08-15 | Lucent Technologies Inc. | Parametric speech codec for representing synthetic speech in the presence of background noise |
DE69943185D1 (en) * | 1999-08-10 | 2011-03-24 | Telogy Networks Inc | Background energy estimate |
US6535843B1 (en) * | 1999-08-18 | 2003-03-18 | At&T Corp. | Automatic detection of non-stationarity in speech signals |
DE60043601D1 (en) * | 1999-08-23 | 2010-02-04 | Panasonic Corp | Speech encoder |
DE69932460T2 (en) * | 1999-09-14 | 2007-02-08 | Fujitsu Ltd., Kawasaki | Speech coder / decoder |
US6782360B1 (en) * | 1999-09-22 | 2004-08-24 | Mindspeed Technologies, Inc. | Gain quantization for a CELP speech coder |
US6581032B1 (en) * | 1999-09-22 | 2003-06-17 | Conexant Systems, Inc. | Bitstream protocol for transmission of encoded voice signals |
US6604070B1 (en) * | 1999-09-22 | 2003-08-05 | Conexant Systems, Inc. | System of encoding and decoding speech signals |
US7315815B1 (en) | 1999-09-22 | 2008-01-01 | Microsoft Corporation | LPC-harmonic vocoder with superframe structure |
US6959274B1 (en) * | 1999-09-22 | 2005-10-25 | Mindspeed Technologies, Inc. | Fixed rate speech compression system and method |
US6438518B1 (en) * | 1999-10-28 | 2002-08-20 | Qualcomm Incorporated | Method and apparatus for using coding scheme selection patterns in a predictive speech coder to reduce sensitivity to frame error conditions |
GB2357683A (en) * | 1999-12-24 | 2001-06-27 | Nokia Mobile Phones Ltd | Voiced/unvoiced determination for speech coding |
WO2001052241A1 (en) * | 2000-01-11 | 2001-07-19 | Matsushita Electric Industrial Co., Ltd. | Multi-mode voice encoding device and decoding device |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
WO2001078061A1 (en) * | 2000-04-06 | 2001-10-18 | Telefonaktiebolaget Lm Ericsson (Publ) | Pitch estimation in a speech signal |
EP1143414A1 (en) * | 2000-04-06 | 2001-10-10 | TELEFONAKTIEBOLAGET L M ERICSSON (publ) | Estimating the pitch of a speech signal using previous estimates |
WO2001084536A1 (en) * | 2000-04-28 | 2001-11-08 | Deutsche Telekom Ag | Method for detecting a voice activity decision (voice activity detector) |
US6564182B1 (en) * | 2000-05-12 | 2003-05-13 | Conexant Systems, Inc. | Look-ahead pitch determination |
US20020116186A1 (en) * | 2000-09-09 | 2002-08-22 | Adam Strauss | Voice activity detector for integrated telecommunications processing |
US6850884B2 (en) * | 2000-09-15 | 2005-02-01 | Mindspeed Technologies, Inc. | Selection of coding parameters based on spectral content of a speech signal |
US6842733B1 (en) | 2000-09-15 | 2005-01-11 | Mindspeed Technologies, Inc. | Signal processing system for filtering spectral content of a signal for speech coding |
US7457750B2 (en) * | 2000-10-13 | 2008-11-25 | At&T Corp. | Systems and methods for dynamic re-configurable speech recognition |
US6947888B1 (en) | 2000-10-17 | 2005-09-20 | Qualcomm Incorporated | Method and apparatus for high performance low bit-rate coding of unvoiced speech |
US7171355B1 (en) | 2000-10-25 | 2007-01-30 | Broadcom Corporation | Method and apparatus for one-stage and two-stage noise feedback coding of speech and audio signals |
CN1200403C (en) * | 2000-11-30 | 2005-05-04 | 松下电器产业株式会社 | Vector quantizing device for LPC parameters |
US7472059B2 (en) * | 2000-12-08 | 2008-12-30 | Qualcomm Incorporated | Method and apparatus for robust speech classification |
US6633839B2 (en) * | 2001-02-02 | 2003-10-14 | Motorola, Inc. | Method and apparatus for speech reconstruction in a distributed speech recognition system |
DE60233283D1 (en) * | 2001-02-27 | 2009-09-24 | Texas Instruments Inc | Concealment method for lost speech frames and decoder therefor |
US6658383B2 (en) | 2001-06-26 | 2003-12-02 | Microsoft Corporation | Method for coding speech and music signals |
US7526431B2 (en) * | 2001-09-05 | 2009-04-28 | Voice Signal Technologies, Inc. | Speech recognition using ambiguous or phone key spelling and/or filtering |
US7505911B2 (en) * | 2001-09-05 | 2009-03-17 | Roth Daniel L | Combined speech recognition and sound recording |
US7225130B2 (en) * | 2001-09-05 | 2007-05-29 | Voice Signal Technologies, Inc. | Methods, systems, and programming for performing speech recognition |
US7467089B2 (en) * | 2001-09-05 | 2008-12-16 | Roth Daniel L | Combined speech and handwriting recognition |
US7444286B2 (en) * | 2001-09-05 | 2008-10-28 | Roth Daniel L | Speech recognition using re-utterance recognition |
US7313526B2 (en) | 2001-09-05 | 2007-12-25 | Voice Signal Technologies, Inc. | Speech recognition using selectable recognition modes |
US7809574B2 (en) | 2001-09-05 | 2010-10-05 | Voice Signal Technologies Inc. | Word recognition using choice lists |
ITFI20010199A1 (en) | 2001-10-22 | 2003-04-22 | Riccardo Vieri | SYSTEM AND METHOD TO TRANSFORM TEXTUAL COMMUNICATIONS INTO VOICE AND SEND THEM WITH AN INTERNET CONNECTION TO ANY TELEPHONE SYSTEM |
US6785645B2 (en) | 2001-11-29 | 2004-08-31 | Microsoft Corporation | Real-time speech and music classifier |
TW589618B (en) * | 2001-12-14 | 2004-06-01 | Ind Tech Res Inst | Method for determining the pitch mark of speech |
US6647366B2 (en) * | 2001-12-28 | 2003-11-11 | Microsoft Corporation | Rate control strategies for speech and music coding |
US7206740B2 (en) * | 2002-01-04 | 2007-04-17 | Broadcom Corporation | Efficient excitation quantization in noise feedback coding with general noise shaping |
US7302387B2 (en) * | 2002-06-04 | 2007-11-27 | Texas Instruments Incorporated | Modification of fixed codebook search in G.729 Annex E audio coding |
JP4433668B2 (en) * | 2002-10-31 | 2010-03-17 | 日本電気株式会社 | Bandwidth expansion apparatus and method |
WO2004084467A2 (en) * | 2003-03-15 | 2004-09-30 | Mindspeed Technologies, Inc. | Recovering an erased voice frame with time warping |
KR20050008356A (en) * | 2003-07-15 | 2005-01-21 | 한국전자통신연구원 | Apparatus and method for converting pitch delay using linear prediction in voice transcoding |
US7596488B2 (en) * | 2003-09-15 | 2009-09-29 | Microsoft Corporation | System and method for real-time jitter control and packet-loss concealment in an audio signal |
US7412376B2 (en) * | 2003-09-10 | 2008-08-12 | Microsoft Corporation | System and method for real-time detection and preservation of speech onset in a signal |
US20050065787A1 (en) * | 2003-09-23 | 2005-03-24 | Jacek Stachurski | Hybrid speech coding and system |
US7426462B2 (en) * | 2003-09-29 | 2008-09-16 | Sony Corporation | Fast codebook selection method in audio encoding |
US7349842B2 (en) * | 2003-09-29 | 2008-03-25 | Sony Corporation | Rate-distortion control scheme in audio encoding |
US7325023B2 (en) * | 2003-09-29 | 2008-01-29 | Sony Corporation | Method of making a window type decision based on MDCT data in audio encoding |
US7283968B2 (en) | 2003-09-29 | 2007-10-16 | Sony Corporation | Method for grouping short windows in audio encoding |
FR2867649A1 (en) * | 2003-12-10 | 2005-09-16 | France Telecom | OPTIMIZED MULTIPLE CODING METHOD |
US8473286B2 (en) * | 2004-02-26 | 2013-06-25 | Broadcom Corporation | Noise feedback coding system and method for providing generalized noise shaping within a simple filter structure |
US7668712B2 (en) * | 2004-03-31 | 2010-02-23 | Microsoft Corporation | Audio encoding and decoding with intra frames and adaptive forward error correction |
US8712768B2 (en) * | 2004-05-25 | 2014-04-29 | Nokia Corporation | System and method for enhanced artificial bandwidth expansion |
US8788265B2 (en) * | 2004-05-25 | 2014-07-22 | Nokia Solutions And Networks Oy | System and method for babble noise detection |
JP5010823B2 (en) | 2004-10-14 | 2012-08-29 | 三星エスディアイ株式会社 | POLYMER ELECTROLYTE MEMBRANE FOR DIRECT OXIDATION FUEL CELL, ITS MANUFACTURING METHOD, AND DIRECT OXIDATION FUEL CELL SYSTEM INCLUDING THE SAME |
US7707034B2 (en) * | 2005-05-31 | 2010-04-27 | Microsoft Corporation | Audio codec post-filter |
US7177804B2 (en) * | 2005-05-31 | 2007-02-13 | Microsoft Corporation | Sub-band voice codec with multi-stage codebooks and redundant coding |
US7831421B2 (en) * | 2005-05-31 | 2010-11-09 | Microsoft Corporation | Robust decoder |
KR101223559B1 (en) * | 2005-06-24 | 2013-01-22 | 삼성에스디아이 주식회사 | Method of preparing polymer membrane for fuel cell |
US20100131276A1 (en) * | 2005-07-14 | 2010-05-27 | Koninklijke Philips Electronics, N.V. | Audio signal synthesis |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US7633076B2 (en) * | 2005-09-30 | 2009-12-15 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US8630849B2 (en) * | 2005-11-15 | 2014-01-14 | Samsung Electronics Co., Ltd. | Coefficient splitting structure for vector quantization bit allocation and dequantization |
KR100766896B1 (en) * | 2005-11-29 | 2007-10-15 | 삼성에스디아이 주식회사 | Polymer electrolyte for fuel cell and fuel cell system comprising same |
MX2008009088A (en) * | 2006-01-18 | 2009-01-27 | Lg Electronics Inc | Apparatus and method for encoding and decoding signal. |
JP3981399B1 (en) * | 2006-03-10 | 2007-09-26 | 松下電器産業株式会社 | Fixed codebook search apparatus and fixed codebook search method |
US20070188841A1 (en) * | 2006-02-10 | 2007-08-16 | Ntera, Inc. | Method and system for lowering the drive potential of an electrochromic device |
AU2011247874B2 (en) * | 2006-03-10 | 2012-03-15 | Iii Holdings 12, Llc | Fixed codebook searching apparatus and fixed codebook searching method |
ES2347825T3 (en) * | 2006-03-20 | 2010-11-04 | Mindspeed Technologies, Inc. | Open-loop pitch track smoothing |
KR100900438B1 (en) * | 2006-04-25 | 2009-06-01 | 삼성전자주식회사 | Apparatus and method for voice packet recovery |
US8712766B2 (en) * | 2006-05-16 | 2014-04-29 | Motorola Mobility Llc | Method and system for coding an information signal using closed loop adaptive bit allocation |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
KR100788706B1 (en) * | 2006-11-28 | 2007-12-26 | 삼성전자주식회사 | Method for encoding and decoding of broadband voice signal |
US20080129520A1 (en) * | 2006-12-01 | 2008-06-05 | Apple Computer, Inc. | Electronic device with enhanced audio feedback |
US7805308B2 (en) * | 2007-01-19 | 2010-09-28 | Microsoft Corporation | Hidden trajectory modeling with differential cepstra for speech recognition |
DE602008001787D1 (en) * | 2007-02-12 | 2010-08-26 | Dolby Lab Licensing Corp | Improved ratio of speech to non-speech audio such as for elderly or hearing-impaired listeners |
JP5530720B2 (en) | 2007-02-26 | 2014-06-25 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Speech enhancement method, apparatus, and computer-readable recording medium for entertainment audio |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
CN101308651B (en) * | 2007-05-17 | 2011-05-04 | 展讯通信(上海)有限公司 | Detection method of audio transient signal |
US9053089B2 (en) * | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
KR101449431B1 (en) * | 2007-10-09 | 2014-10-14 | 삼성전자주식회사 | Method and apparatus for encoding scalable wideband audio signal |
US8620662B2 (en) | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US10002189B2 (en) * | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) * | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US20090252913A1 (en) * | 2008-01-14 | 2009-10-08 | Military Wraps Research And Development, Inc. | Quick-change visual deception systems and methods |
US8065143B2 (en) | 2008-02-22 | 2011-11-22 | Apple Inc. | Providing text input using speech data and non-speech data |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
CN101261836B (en) * | 2008-04-25 | 2011-03-30 | 清华大学 | Method for enhancing excitation signal naturalism based on judgment and processing of transition frames |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US8464150B2 (en) | 2008-06-07 | 2013-06-11 | Apple Inc. | Automatic language identification for dynamic text processing |
KR20100006492A (en) | 2008-07-09 | 2010-01-19 | 삼성전자주식회사 | Method and apparatus for deciding encoding mode |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8898568B2 (en) * | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US8583418B2 (en) | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8712776B2 (en) * | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US8380507B2 (en) | 2009-03-09 | 2013-02-19 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10540976B2 (en) * | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US8682649B2 (en) * | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
US8781822B2 (en) * | 2009-12-22 | 2014-07-15 | Qualcomm Incorporated | Audio and speech processing with optimal bit-allocation for constant bit rate applications |
US8600743B2 (en) * | 2010-01-06 | 2013-12-03 | Apple Inc. | Noise profile determination for voice-related feature |
CN102687199B (en) | 2010-01-08 | 2015-11-25 | 日本电信电话株式会社 | Coding method, coding/decoding method, code device, decoding device |
US8381107B2 (en) | 2010-01-13 | 2013-02-19 | Apple Inc. | Adaptive audio feedback system and method |
US8311838B2 (en) | 2010-01-13 | 2012-11-13 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
WO2011089450A2 (en) | 2010-01-25 | 2011-07-28 | Andrew Peter Nelson Jerram | Apparatuses, methods and systems for a digital conversation management platform |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US8990074B2 (en) | 2011-05-24 | 2015-03-24 | Qualcomm Incorporated | Noise-robust speech coding mode classification |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
JP5752324B2 (en) * | 2011-07-07 | 2015-07-22 | ニュアンス コミュニケーションズ, インコーポレイテッド | Single channel suppression of impulsive interference in noisy speech signals. |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
WO2013185109A2 (en) | 2012-06-08 | 2013-12-12 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
EP2947650A1 (en) | 2013-01-18 | 2015-11-25 | Kabushiki Kaisha Toshiba | Speech synthesizer, electronic watermark information detection device, speech synthesis method, electronic watermark information detection method, speech synthesis program, and electronic watermark information detection program |
EP2954514B1 (en) | 2013-02-07 | 2021-03-31 | Apple Inc. | Voice trigger for a digital assistant |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
AU2014233517B2 (en) | 2013-03-15 | 2017-05-25 | Apple Inc. | Training an at least partial voice command system |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
CN105144133B (en) | 2013-03-15 | 2020-11-20 | 苹果公司 | Context-sensitive handling of interrupts |
WO2014144395A2 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | User training by intelligent digital assistant |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
DE112014002747T5 (en) | 2013-06-09 | 2016-03-03 | Apple Inc. | Apparatus, method and graphical user interface for enabling conversation persistence over two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
KR101809808B1 (en) | 2013-06-13 | 2017-12-15 | 애플 인크. | System and method for emergency calls initiated by voice command |
AU2014306221B2 (en) | 2013-08-06 | 2017-04-06 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
AU2015266863B2 (en) | 2014-05-30 | 2018-03-15 | Apple Inc. | Multi-command single utterance input method |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9467569B2 (en) * | 2015-03-05 | 2016-10-11 | Raytheon Company | Methods and apparatus for reducing audio conference noise using voice quality measures |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US20170069306A1 (en) * | 2015-09-04 | 2017-03-09 | Foundation of the Idiap Research Institute (IDIAP) | Signal processing method and apparatus based on structured sparsity of phonological features |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179309B1 (en) | 2016-06-09 | 2018-04-23 | Apple Inc | Intelligent automated assistant in a home environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
WO2018133951A1 (en) * | 2017-01-23 | 2018-07-26 | Huawei Technologies Co., Ltd. | An apparatus and method for enhancing a wanted component in a signal |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | Synchronization and task delegation of a digital assistant |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
CN110782906B (en) * | 2018-07-30 | 2022-08-05 | 南京中感微电子有限公司 | Audio data recovery method and device and Bluetooth equipment |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4058676A (en) * | 1975-07-07 | 1977-11-15 | International Communication Sciences | Speech analysis and synthesis system |
US4771465A (en) * | 1986-09-11 | 1988-09-13 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech sinusoidal vocoder with transmission of only subset of harmonics |
US5125030A (en) * | 1987-04-13 | 1992-06-23 | Kokusai Denshin Denwa Co., Ltd. | Speech signal coding/decoding system based on the type of speech signal |
JP2609752B2 (en) * | 1990-10-09 | 1997-05-14 | 三菱電機株式会社 | Voice / in-band data identification device |
US5293449A (en) * | 1990-11-23 | 1994-03-08 | Comsat Corporation | Analysis-by-synthesis 2,4 kbps linear predictive speech codec |
US5495555A (en) * | 1992-06-01 | 1996-02-27 | Hughes Aircraft Company | High quality low bit rate celp-based speech codec |
US5341456A (en) * | 1992-12-02 | 1994-08-23 | Qualcomm Incorporated | Method for determining speech encoding rate in a variable rate vocoder |
US5459814A (en) * | 1993-03-26 | 1995-10-17 | Hughes Aircraft Company | Voice activity detector for speech signals in variable background noise |

1994
- 1994-04-18 US application US08/229,271, patent US5734789A, not active (Expired - Lifetime)

1995
- 1995-04-17 AT application AT95916376T, patent ATE202232T1, not active (IP Right Cessation)
- 1995-04-17 WO application PCT/US1995/004577, patent WO1995028824A2, active (IP Right Grant)
- 1995-04-17 EP application EP95916376A, patent EP0704088B1, not active (Expired - Lifetime)
- 1995-04-17 CA application CA002165546A, patent CA2165546A1, not active (Abandoned)
- 1995-04-17 DE application DE69521254T, patent DE69521254D1, not active (Expired - Lifetime)
- 1995-10-11 US application US08/540,637, patent US5596676A, not active (Expired - Lifetime)
- 1995-12-19 FI application FI956107A, patent FI956107A, not active (Application Discontinuation)
Also Published As
Publication number | Publication date |
---|---|
WO1995028824A2 (en) | 1995-11-02 |
FI956107A (en) | 1996-01-08 |
ATE202232T1 (en) | 2001-06-15 |
WO1995028824A3 (en) | 1995-11-16 |
EP0704088A1 (en) | 1996-04-03 |
DE69521254D1 (en) | 2001-07-19 |
EP0704088B1 (en) | 2001-06-13 |
US5596676A (en) | 1997-01-21 |
FI956107A0 (en) | 1995-12-19 |
US5734789A (en) | 1998-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2165546A1 (en) | | Method of encoding a signal containing speech |
EP2154679B1 (en) | | Method and apparatus for speech coding |
US5732389A (en) | | Voiced/unvoiced classification of speech for excitation codebook selection in celp speech decoding during frame erasures |
Spanias | | Speech coding: A tutorial review |
US6480822B2 (en) | | Low complexity random codebook structure |
CN1112671C (en) | | Method of adapting noise masking level in analysis-by-synthesis speech coder employing short-term perceptual weighting filter |
US5699485A (en) | | Pitch delay modification during frame erasures |
US20020016711A1 (en) | | Encoding of periodic speech using prototype waveforms |
KR19990006262A (en) | | Speech coding method based on digital speech compression algorithm |
EP1420391B1 (en) | | Generalized analysis-by-synthesis speech coding method, and coder implementing such method |
JPH10207498A (en) | | Input voice coding method by multi-mode code exciting linear prediction and its coder |
US20030195746A1 (en) | | Speech coding/decoding method and apparatus |
EP1204092A2 (en) | | Speech decoder capable of decoding background noise signal with high quality |
CN1113586A (en) | | Removal of swirl artifacts from CELP based speech coders |
Yong et al. | | Efficient encoding of the long-term predictor in vector excitation coders |
Yeldener et al. | | A mixed sinusoidally excited linear prediction coder at 4 kb/s and below |
Miki et al. | | A pitch synchronous innovation CELP (PSI-CELP) coder for 2-4 kbit/s |
Burnett et al. | | A mixed prototype waveform/CELP coder for sub 3 kbit/s |
Juan et al. | | An 8-kb/s conjugate-structure algebraic CELP (CS-ACELP) speech coding |
Ma et al. | | 400bps High-Quality Speech Coding Algorithm |
JPH06130996A (en) | | Code excitation linear predictive encoding and decoding device |
KR100269357B1 (en) | | Speech recognition method |
Taniguchi et al. | | Principal axis extracting vector excitation coding: high quality speech at 8 kb/s |
Jung et al. | | A cascaded algebraic codebook structure to improve the performance of speech coder |
KR100346732B1 (en) | | Noise code book preparation and linear prediction coding/decoding method using noise code book and apparatus therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | EEER | Examination request | |
| | FZDE | Discontinued | |