CN103474074A - Voice pitch period estimation method and device

Info

Publication number
CN103474074A
CN103474074A CN2013104094338A CN201310409433A CN103474074A CN 103474074 A CN103474074 A CN 103474074A CN 2013104094338 A CN2013104094338 A CN 2013104094338A CN 201310409433 A CN201310409433 A CN 201310409433A CN 103474074 A CN103474074 A CN 103474074A
Authority
CN
China
Prior art keywords
pitch period
value
voice signal
normalized autocorrelation
autocorrelation functions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013104094338A
Other languages
Chinese (zh)
Other versions
CN103474074B (en)
Inventor
闫建新
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Guangsheng Research And Development Institute Co ltd
Original Assignee
Shenzhen Rising Source Technology Co ltd
Application filed by Shenzhen Rising Source Technology Co ltd filed Critical Shenzhen Rising Source Technology Co ltd
Priority to CN201310409433.8A priority Critical patent/CN103474074B/en
Publication of CN103474074A publication Critical patent/CN103474074A/en
Application granted granted Critical
Publication of CN103474074B publication Critical patent/CN103474074B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a method and a device for estimating the pitch period of speech. The device comprises a signal preprocessing unit, a normalized autocorrelation function calculating unit, and a pitch period post-processing unit. The method comprises the following steps: S1, preprocessing the speech signal by removing the DC component, applying perceptual weighting, and down-sampling the signal; S2, calculating normalized autocorrelation function values of the preprocessed speech signal; S3, determining the maximum of the normalized autocorrelation function values within the pitch period search range, and taking the pitch period candidate corresponding to that maximum as the pitch period estimate of the speech signal. The invention better overcomes frequency-doubling and frequency-halving errors in pitch period estimation, improves the noise robustness of the estimation method, reduces the computational complexity of the algorithm, and improves the efficiency of the corresponding digital audio/speech coding. The invention is suitable for pitch search in a variety of speech coding and decoding algorithms and is widely applicable.

Description

Pitch period estimation method and apparatus
Technical field
The present invention relates to speech coding technology and, more particularly, to a pitch period estimation method and apparatus.
Background art
The pitch period is the period of vocal cord vibration during voiced speech. Pitch period estimation is an important problem in speech coding: its accuracy directly affects the coding quality and efficiency of a speech coder. Accurate pitch period analysis effectively removes redundancy from speech, reduces the number of coded bits, and enables high-quality speech coding at low bit rates. However, because of the particular nature of speech, an accurate pitch period search faces the following difficulties:
(1) The speech signal varies in a very complex way; the glottal excitation waveform is not a perfectly periodic pulse train, and the period of the speech waveform is time-varying.
(2) The onset and offset of speech lack the periodicity of vocal cord vibration, and for some transitional sounds, such as transitions between unvoiced and voiced speech, it is hard to decide whether the signal is periodic or aperiodic, so the pitch period cannot be estimated.
(3) It is difficult to remove the influence of the vocal tract from the speech signal and extract only the information related to vocal cord vibration.
(4) It is difficult to define precisely where each pitch period begins and ends within a voiced segment, which limits reliable pitch measurement. This is not only because the speech signal itself is quasi-periodic (that is, the pitch varies), but also because the waveform is affected by formants, noise, and similar factors.
(5) In practical applications, background noise degrades pitch detection performance. This matters particularly in mobile communication environments, where the waveform often carries a high level of noise.
(6) The wide range over which the pitch period varies also makes accurate pitch detection harder.
At present, no general method can extract the pitch period of speech accurately and reliably under all conditions. Traditional pitch detection methods can be divided into time-domain methods and frequency-domain methods. In the time domain, traditional pitch period algorithms include pitch estimation based on the average magnitude difference function (AMDF) and pitch detection based on the short-time autocorrelation function (ACF). For an introduction to these two algorithms, see the following reference:
Chu, Wai C. Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. John Wiley & Sons, Inc., 2003, pp. 33-45.
In the frequency domain, Griffin and Lim proposed a pitch period estimation scheme (D. W. Griffin, J. S. Lim. Multiband Excitation Vocoder. IEEE Trans. ASSP, 1988, 36(8)) for the multi-band excitation (MBE) speech coding algorithm. This pitch period estimation algorithm uses closed-loop analysis-by-synthesis and matches the frequency-domain waveform of the signal to obtain the optimum pitch period estimate.
In practical applications, time-domain pitch search algorithms are widely used because they are simple and perform well. For example, the current speech coding standards G.729 and AMR-WB both adopt an improved time-domain short-time autocorrelation function (ACF) pitch detection algorithm (Bao Changchun. Fundamentals of Low Bit-Rate Digital Speech Coding. Beijing: Beijing University of Technology Press, 2001.2). However, the time-domain ACF method is prone to "frequency-doubling" and "frequency-halving" errors, and the AMDF method cannot effectively track rapid changes in speech frequency. Frequency-domain methods generally use the cepstrum, which introduces logarithm operations that greatly increase the amount of computation, and which is susceptible to noise.
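The ambiguity behind these errors can be illustrated with a small sketch. The code below is not taken from the patent: it uses the textbook forms of the ACF and AMDF, averaged over the available samples, and a hypothetical sine test tone with a period of 20 samples. The names `acf`, `amdf`, `tone`, and `PERIOD` are illustrative assumptions.

```python
import math

def acf(s, tau):
    """Mean of s(n) * s(n - tau) over the samples where both terms exist."""
    terms = [s[n] * s[n - tau] for n in range(tau, len(s))]
    return sum(terms) / len(terms)

def amdf(s, tau):
    """Mean of |s(n) - s(n - tau)| over the samples where both terms exist."""
    terms = [abs(s[n] - s[n - tau]) for n in range(tau, len(s))]
    return sum(terms) / len(terms)

# Hypothetical test tone: a pure sine with a period of 20 samples.
PERIOD = 20
tone = [math.sin(2.0 * math.pi * n / PERIOD) for n in range(200)]

# Compare half the period, the period, and twice the period.
for tau in (10, 20, 40):
    print(tau, round(acf(tone, tau), 3), round(amdf(tone, tau), 3))
```

For this tone the ACF is equally high at lags 20 and 40, and the AMDF is equally close to zero at both, so a plain maximum (or minimum) search cannot tell the true period from its double without some additional rule.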
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above defects of the prior art, to provide a low-complexity, efficient pitch period estimation method and apparatus that better overcome frequency-doubling and frequency-halving errors in pitch period estimation and improve noise robustness.
The technical solution adopted by the present invention to solve this technical problem is a pitch period estimation method comprising the following steps:
S1, preprocessing the speech signal by removing the DC component, applying perceptual weighting, and down-sampling the signal;
S2, calculating normalized autocorrelation function values of the preprocessed speech signal using the following formula:
$$\rho(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sqrt{\sum_{n=0}^{N-1} s^{2}(n) \sum_{n=0}^{N-1} s^{2}(n-\tau)}},$$
where ρ(τ) is the normalized autocorrelation function value, s(n) is the perceptually weighted speech signal, τ is the pitch period candidate being searched, and N is the length of one frame of the signal after down-sampling;
S3, determining the maximum of the normalized autocorrelation function values within the pitch period search range, and taking the pitch period candidate corresponding to that maximum as the pitch period estimate of the speech signal.
In one embodiment, step S1 further comprises:
S11, resampling the speech signal to an internal sampling rate;
S12, high-pass filtering the resampled speech signal to remove the DC component;
S13, applying perceptual weighting to the high-pass-filtered speech signal;
S14, low-pass filtering the perceptually weighted speech signal and down-sampling it by a factor of 2.
In one embodiment, the internal sampling rate is 12.8 kHz and the cutoff frequency of the high-pass filter is 50 Hz.
In one embodiment, step S3 further comprises:
S31, dividing the pitch period search range, according to the sampling rate of the speech signal, into a first interval, a second interval, and a third interval, and obtaining the normalized autocorrelation function maximum of each interval and the corresponding pitch period candidate;
S32, selecting, according to a given weight parameter, the normalized autocorrelation function maximum of the pitch period search range from the maxima of the three intervals, and taking the pitch period candidate corresponding to the selected maximum as the pitch period estimate of the speech signal.
In one embodiment, step S32 further comprises: judging whether the normalized autocorrelation function maximum of the second interval is greater than or equal to the product of the normalized autocorrelation function maximum of the first interval and the weight parameter; if so, taking the pitch period candidate corresponding to the maximum of the second interval as the pitch period estimate of the speech signal; otherwise, further judging whether the normalized autocorrelation function maximum of the third interval is greater than or equal to the product of the maximum of the first interval and the weight parameter; if so, taking the pitch period candidate corresponding to the maximum of the third interval as the pitch period estimate of the speech signal; otherwise, taking the pitch period candidate corresponding to the maximum of the first interval as the pitch period estimate of the speech signal.
In one embodiment, the first, second, and third intervals are [L_min, 39], [40, 79], and [80, L_max] respectively, where L_min is the start value of the pitch period search range and L_max is its end value.
To solve the same technical problem, the present invention also provides a pitch period estimation device, comprising:
a signal preprocessing unit, which preprocesses the speech signal by removing the DC component, applying perceptual weighting, and down-sampling the signal;
a normalized autocorrelation function calculating unit, which calculates normalized autocorrelation function values of the preprocessed speech signal using the following formula:
$$\rho(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sqrt{\sum_{n=0}^{N-1} s^{2}(n) \sum_{n=0}^{N-1} s^{2}(n-\tau)}},$$
where ρ(τ) is the normalized autocorrelation function value, s(n) is the perceptually weighted speech signal, τ is the pitch period candidate being searched, and N is the length of one frame of the signal after down-sampling;
a pitch period post-processing unit, which determines the maximum of the normalized autocorrelation function values within the pitch period search range and takes the pitch period candidate corresponding to that maximum as the pitch period estimate of the speech signal.
In one embodiment, the signal preprocessing unit resamples the speech signal to an internal sampling rate, then high-pass filters the resampled speech signal to remove the DC component, then applies perceptual weighting to the high-pass-filtered speech signal, and finally low-pass filters the perceptually weighted speech signal and down-samples it by a factor of 2.
In one embodiment, the pitch period post-processing unit divides the pitch period search range, according to the sampling rate of the speech signal, into a first interval, a second interval, and a third interval, obtains the normalized autocorrelation function maximum of each interval and the corresponding pitch period candidate, selects, according to a given weight parameter, the normalized autocorrelation function maximum of the pitch period search range from the maxima of the three intervals, and takes the pitch period candidate corresponding to the selected maximum as the pitch period estimate of the speech signal.
In one embodiment, the pitch period post-processing unit selects the normalized autocorrelation function maximum of the pitch period search range from the maxima of the three intervals according to the weight parameter as follows: it judges whether the normalized autocorrelation function maximum of the second interval is greater than or equal to the product of the normalized autocorrelation function maximum of the first interval and the weight parameter; if so, it takes the pitch period candidate corresponding to the maximum of the second interval as the pitch period estimate of the speech signal; otherwise, it further judges whether the normalized autocorrelation function maximum of the third interval is greater than or equal to the product of the maximum of the first interval and the weight parameter; if so, it takes the pitch period candidate corresponding to the maximum of the third interval as the pitch period estimate of the speech signal; otherwise, it takes the pitch period candidate corresponding to the maximum of the first interval as the pitch period estimate of the speech signal.
The pitch period estimation method and apparatus of the present invention are based on normalized autocorrelation function pitch detection and introduce preprocessing and post-processing into pitch period estimation. They better overcome frequency-doubling and frequency-halving errors in pitch period estimation, improve the noise robustness of the estimation method, reduce the computational complexity of the algorithm, and improve the efficiency of the corresponding digital audio/speech coding. The present invention is applicable to pitch search in a variety of speech coding and decoding algorithms and is widely applicable.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments. In the drawings:
Fig. 1 is a flowchart of the pitch period estimation method of one embodiment of the invention;
Fig. 2 is a flowchart of a specific embodiment of step 110 in Fig. 1;
Fig. 3 is a flowchart of a specific embodiment of step 130 in Fig. 1;
Fig. 4 is a block diagram of the pitch period estimation device of one embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
Fig. 1 shows a flowchart of the pitch period estimation method 100 of one embodiment of the invention. As shown in Fig. 1, the pitch period estimation method 100 comprises:
In step 110, the speech signal is preprocessed by removing the DC component, applying perceptual weighting, and down-sampling the signal.
In step 120, the normalized autocorrelation function values of the preprocessed speech signal are calculated. The present invention uses the following normalized autocorrelation function:
$$\rho(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sqrt{\sum_{n=0}^{N-1} s^{2}(n) \sum_{n=0}^{N-1} s^{2}(n-\tau)}},$$
where ρ(τ) is the normalized autocorrelation function value, s(n) is the perceptually weighted speech signal, τ is the pitch period candidate being searched, and N is the length of one frame of the signal after down-sampling.
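As a rough illustration of step 120, the following sketch evaluates ρ(τ) for one frame in plain Python. It assumes the denominator is the square root of the product of the two energies (the usual normalized form), and that the frame sits inside a longer buffer so that the past samples s(n − τ) exist. The function name `nacf` and the parameters `buf`, `start`, and `n_frame` are hypothetical and do not come from the patent.

```python
import math

def nacf(buf, start, n_frame, tau):
    """Normalized autocorrelation rho(tau) of the frame buf[start:start + n_frame].

    buf must hold at least tau samples of history before `start`.
    """
    if tau > start:
        raise ValueError("not enough history for this lag")
    num = e_cur = e_lag = 0.0
    for n in range(start, start + n_frame):
        cur, lag = buf[n], buf[n - tau]
        num += cur * lag        # cross term s(n) * s(n - tau)
        e_cur += cur * cur      # energy of the current frame
        e_lag += lag * lag      # energy of the delayed frame
    den = math.sqrt(e_cur * e_lag)
    return num / den if den > 0.0 else 0.0
```

Because of the normalization, the value stays within [-1, 1] regardless of signal level, which is what makes maxima from different lag intervals directly comparable.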
In step 130, the maximum of the normalized autocorrelation function values within the pitch period search range is determined, and the pitch period candidate corresponding to that maximum is taken as the pitch period estimate of the speech signal.
The present invention introduces signal preprocessing into pitch period estimation. Fig. 2 shows a flowchart of a specific embodiment of the signal preprocessing step 110 shown in Fig. 1. As shown in Fig. 2, the signal preprocessing step 110 further comprises:
In step 111, the speech signal is resampled to the internal sampling rate (Fs = 12.8 kHz).
In the following step 112, the resampled speech signal is high-pass filtered. The cutoff frequency of the high-pass filter may be 50 Hz; its purpose is to remove the DC component.
Then, in step 113, perceptual weighting is applied to the high-pass-filtered speech signal.
In the final step 114, the perceptually weighted speech signal is low-pass filtered and down-sampled by a factor of 2, which limits the signal bandwidth to 3.2 kHz.
In a further preferred embodiment, numerical filtering may also be added to the signal preprocessing step 110 to remove formants and high-frequency noise, so that the pitch period is estimated more accurately.
Before performing the pitch period search, the present invention preprocesses the input speech signal. This filters out the high-frequency part, which contributes nothing to pitch period estimation, and at the same time reduces the computational complexity of the algorithm.
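The preprocessing chain can be sketched as follows, under several loudly stated assumptions. The input is taken to be already resampled to 12.8 kHz (step 111). The 50 Hz high-pass is approximated by a first-order DC blocker, the perceptual weighting is left as a pluggable callable that defaults to the identity, and the low-pass is a simple 3-tap smoother. None of these filter designs come from the patent; they only show the order of operations and the 12.8 kHz to 6.4 kHz rate change, whose Nyquist limit is the 3.2 kHz bandwidth mentioned above.

```python
import math

def dc_block(x, fs=12800.0, fc=50.0):
    """First-order DC blocker: y[n] = x[n] - x[n-1] + a * y[n-1]."""
    a = math.exp(-2.0 * math.pi * fc / fs)  # pole radius, roughly a 50 Hz cutoff (assumption)
    y, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = v - prev_x + a * prev_y
        prev_x = v
        y.append(prev_y)
    return y

def lowpass_decimate2(x):
    """Smooth with the FIR taps [1/4, 1/2, 1/4], then keep every second sample."""
    padded = [x[0]] + list(x) + [x[-1]] if x else []
    smooth = [0.25 * padded[i] + 0.5 * padded[i + 1] + 0.25 * padded[i + 2]
              for i in range(len(x))]
    return smooth[::2]

def preprocess(x_12k8, weight=lambda v: v):
    """Steps 112 to 114 applied to a signal already at 12.8 kHz.

    `weight` stands in for the perceptual weighting filter (identity placeholder).
    """
    return lowpass_decimate2(weight(dc_block(x_12k8)))
```

Halving the sampling rate also halves the number of samples per frame and the number of lags to test, which is where the reduction in search cost comes from.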
The present invention also introduces pitch period post-processing into pitch period estimation. Fig. 3 shows a flowchart of a specific embodiment of the pitch period post-processing step 130 shown in Fig. 1. As shown in Fig. 3, the pitch period post-processing step 130 further comprises:
In step 131, according to the sampling rate of the speech signal, the pitch period search range is divided into a first interval, a second interval, and a third interval, and the normalized autocorrelation function maximum of each interval and the corresponding pitch period candidate are obtained.
In one embodiment, the pitch period search range is [L_min, L_max], where L_min is the start value of the pitch period search range and L_max is its end value. According to the sampling frequency of the speech signal described above, this search range can be divided into three intervals, namely the first interval [L_min, 39], the second interval [40, 79], and the third interval [80, L_max], so that the correct pitch period estimate can be determined among these three intervals. In a specific embodiment, L_min and L_max may be 0 and 256 respectively. Based on these three intervals, the maximum ρ(τ) value in each interval and the corresponding pitch period candidate τ are obtained, denoted ρ_max1, ρ_max2, and ρ_max3, and τ_1, τ_2, and τ_3.
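A minimal sketch of step 131 follows, assuming the ρ(τ) values have already been computed and stored so that `rho[tau]` returns the value at lag τ. The function name `interval_maxima` and the `edges` parameter are hypothetical; the interval boundaries 40 and 80 are those given in the text.

```python
def interval_maxima(rho, l_min, l_max, edges=(40, 80)):
    """Return [(rho_max1, tau_1), (rho_max2, tau_2), (rho_max3, tau_3)].

    The three intervals are [l_min, 39], [40, 79], and [80, l_max].
    """
    bounds = [(l_min, edges[0] - 1), (edges[0], edges[1] - 1), (edges[1], l_max)]
    result = []
    for lo, hi in bounds:
        # On ties, max() keeps the first (shortest) lag in the interval.
        best_tau = max(range(lo, hi + 1), key=lambda t: rho[t])
        result.append((rho[best_tau], best_tau))
    return result
```

For example, with a hypothetical table in which `rho[30] = 0.5`, `rho[60] = 0.9`, `rho[120] = 0.88`, and all other lags are 0, calling `interval_maxima(rho, 20, 256)` yields one (value, lag) pair per interval.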
In step 132, according to a given weight parameter, the normalized autocorrelation function maximum of the pitch period search range is selected from the maxima of the three intervals, and the pitch period candidate corresponding to the selected maximum is taken as the pitch period estimate of the speech signal.
In one embodiment, a weight parameter c is chosen (it may be a value close to 1.0, for example 0.97), and the optimum pitch period candidate τ_opt is determined as follows:
First, it is judged whether the normalized autocorrelation function maximum ρ_max2 of the second interval is greater than or equal to the product of the normalized autocorrelation function maximum ρ_max1 of the first interval and the weight parameter c. If so, the pitch period candidate τ_2 corresponding to ρ_max2 is taken as the pitch period estimate of the speech signal. Otherwise, it is further judged whether the normalized autocorrelation function maximum ρ_max3 of the third interval is greater than or equal to the product of ρ_max1 and the weight parameter c. If so, the pitch period candidate τ_3 corresponding to ρ_max3 is taken as the pitch period estimate of the speech signal; otherwise, the pitch period candidate τ_1 corresponding to ρ_max1 is taken as the pitch period estimate of the speech signal.
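The decision just described can be written as a short cascade. This is a sketch that follows the wording of the paragraph above; the function name `select_pitch_cascade` and the input layout (three `(rho_max, tau)` pairs, one per interval) are illustrative assumptions.

```python
def select_pitch_cascade(maxima, c=0.97):
    """Pick the pitch period from the three per-interval maxima.

    maxima = [(rho_max1, tau_1), (rho_max2, tau_2), (rho_max3, tau_3)]
    """
    (r1, t1), (r2, t2), (r3, t3) = maxima
    if r2 >= c * r1:      # second interval is within the weighted threshold
        return t2
    if r3 >= c * r1:      # otherwise try the third interval, same threshold
        return t3
    return t1             # fall back to the first interval
```

With c slightly below 1.0, a candidate from the second or third interval is accepted even when its correlation is marginally lower than that of the first interval.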
The corresponding mathematical expression is as follows:
Let τ_opt = τ_1, ρ_max = ρ_max1;
If ρ_max2 ≥ c·ρ_max, then ρ_max = ρ_max2, τ_opt = τ_2;
If ρ_max3 ≥ c·ρ_max, then ρ_max = ρ_max3, τ_opt = τ_3.
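Read literally, the three lines above describe a running update in which ρ_max may change before the third comparison. The sketch below implements that literal reading; the function name `select_pitch_sequential` is hypothetical. Under this reading the third interval is compared against c times the current ρ_max, so the outcome can differ from a cascade that always compares against ρ_max1 when both conditions hold.

```python
def select_pitch_sequential(maxima, c=0.97):
    """Running-maximum form of the selection rule.

    maxima = [(rho_max1, tau_1), (rho_max2, tau_2), (rho_max3, tau_3)]
    Returns (tau_opt, rho_max).
    """
    rho_max, tau_opt = maxima[0]
    for rho_k, tau_k in maxima[1:]:
        if rho_k >= c * rho_max:
            # Accept the later interval and raise or lower the reference value.
            rho_max, tau_opt = rho_k, tau_k
    return tau_opt, rho_max
```

For instance, with the hypothetical maxima (0.9, 30), (0.88, 60), (0.87, 120) and c = 0.97, both comparisons succeed and the function returns the lag 120.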
In a further preferred embodiment, the pitch period post-processing step 130 may also use the normalized autocorrelation function to judge the unvoiced/voiced character of the speech signal, in order to improve the accuracy of pitch period estimation.
Based on the pitch period estimation method described above, the present invention also provides a pitch period estimation device. Fig. 4 shows a block diagram of the pitch period estimation device 400 of one embodiment of the invention. As shown in Fig. 4, the pitch period estimation device 400 comprises a signal preprocessing unit 410, a normalized autocorrelation function calculating unit 420, and a pitch period post-processing unit 430. The signal preprocessing unit 410 preprocesses the input speech signal by removing the DC component, applying perceptual weighting, and down-sampling the signal. The normalized autocorrelation function calculating unit 420 uses the following formula to calculate the normalized autocorrelation function values of the speech signal preprocessed by the signal preprocessing unit 410:
$$\rho(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sqrt{\sum_{n=0}^{N-1} s^{2}(n) \sum_{n=0}^{N-1} s^{2}(n-\tau)}},$$
where ρ(τ) is the normalized autocorrelation function value, s(n) is the perceptually weighted speech signal, τ is the pitch period candidate being searched, and N is the length of one frame of the signal after down-sampling. The pitch period post-processing unit 430 determines the maximum of the normalized autocorrelation function values within the pitch period search range and takes the pitch period candidate corresponding to that maximum as the pitch period estimate of the speech signal.
In a specific embodiment, the signal preprocessing unit 410 first resamples the input speech signal to the internal sampling rate (Fs = 12.8 kHz), then high-pass filters the resampled speech signal (the cutoff frequency of the filter may be 50 Hz; its purpose is to remove the DC component), then applies perceptual weighting to the high-pass-filtered speech signal, and finally low-pass filters the perceptually weighted speech signal and down-samples it by a factor of 2, which limits the signal bandwidth to 3.2 kHz. This filters out the high-frequency part, which contributes nothing to pitch period estimation, and at the same time reduces the computational complexity of the algorithm.
In a specific embodiment, the pitch period post-processing unit 430 divides the pitch period search range, according to the sampling rate of the speech signal, into a first interval, a second interval, and a third interval, for example the first interval [L_min, 39], the second interval [40, 79], and the third interval [80, L_max], where L_min is the start value of the pitch period search range and L_max is its end value. It then obtains the maximum ρ(τ) value in each interval and the corresponding pitch period candidate τ, denoted ρ_max1, ρ_max2, and ρ_max3, and τ_1, τ_2, and τ_3. The pitch period post-processing unit 430 may also, according to a given weight parameter c (which may be a value close to 1.0, for example 0.97), determine the optimum pitch period candidate τ_opt as follows:
Let τ_opt = τ_1, ρ_max = ρ_max1;
If ρ_max2 ≥ c·ρ_max, then ρ_max = ρ_max2, τ_opt = τ_2;
If ρ_max3 ≥ c·ρ_max, then ρ_max = ρ_max3, τ_opt = τ_3.
The pitch period estimation method and apparatus of the present invention are based on normalized autocorrelation function pitch detection and introduce preprocessing and post-processing into pitch period estimation. They better overcome frequency-doubling and frequency-halving errors in pitch period estimation, improve the noise robustness of the estimation method, and improve the efficiency of the corresponding digital audio/speech coding. A performance comparison between the pitch search algorithm of the present invention and that of AMR-WB+ is given below:
1. Performance test method: the sequence-average signal-to-noise ratio (SNR) is computed, defined as follows:
[Equation image BDA0000379417320000091 not reproduced: definition of the per-frame value segSNR_i]
$$\overline{\mathrm{segSNR}} = \frac{1}{N_{SF}} \sum_{i=0}^{N_{SF}-1} \mathrm{segSNR}_i,$$
where N (N = 256) is the length of one frame of the speech signal, N_SF is the total number of frames in a speech sequence, x_w(n) is the original signal after perceptual weighting, and the symbol shown as an image in the original (BDA0000379417320000094) denotes the coded and decoded speech signal after perceptual weighting.
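The averaging step can be sketched as below. The per-frame definition is not recoverable from this text, so the sketch assumes the common form 10·log10(signal energy / error energy) for each frame; that form, the small constant `eps`, and the names `seg_snr_db` and `mean_seg_snr` are assumptions rather than the patent's definition.

```python
import math

def seg_snr_db(x_w, x_w_hat, eps=1e-12):
    """Per-frame SNR in dB (assumed form: 10 * log10(signal energy / error energy))."""
    sig = sum(v * v for v in x_w)
    err = sum((a - b) ** 2 for a, b in zip(x_w, x_w_hat))
    return 10.0 * math.log10((sig + eps) / (err + eps))  # eps avoids division by zero

def mean_seg_snr(frames_ref, frames_dec):
    """Average of the per-frame values over all N_SF frames of a sequence."""
    values = [seg_snr_db(r, d) for r, d in zip(frames_ref, frames_dec)]
    return sum(values) / len(values)
```

Averaging per-frame values in dB, rather than computing one SNR over the whole sequence, keeps quiet frames from being swamped by loud ones.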
2. Test results
Table 1: Sequence-average SNR of the two algorithms (mono)
[Table images BDA0000379417320000093 and BDA0000379417320000101 not reproduced]
Table 2: Sequence-average SNR of the two algorithms (stereo)
[Table image BDA0000379417320000102 not reproduced]
3. Analysis of the test results
(1) The test results show that the algorithm proposed by the present invention performs slightly better than the pitch period search algorithm of AMR-WB+, and that its computational complexity is comparable to that of the AMR-WB+ algorithm (in fact slightly lower).
(2) The results in Table 1 and Table 2 show that the two sequences es02 and s_cl_mt_2_org are coded relatively poorly, while s_cl_ft_3_org is coded best. Analysis of the sequences shows that es02 and s_cl_mt_2_org are middle-aged male voices, while s_cl_ft_3_org is a young female voice. Analysis of the algorithm shows that this is related to the choice of the parameter set in the algorithm of the present invention to prevent detection of a doubled period. This parameter is an empirical value, and the algorithm currently targets mainly female and children's voices, whose pitch period varies over a wide range and changes rapidly. By comparison, the pitch period of a middle-aged male voice changes very gently and over a relatively small range.
(3) The test also included some typical noisy speech sequences (s_no_ft_9_org, s_no_2t_1_org, s_no_2t_2_org, s_no_2t_3_org, s_no_ft_1_org), for example speech containing heavy background noise such as at an airport. The test results show that the noise robustness of the algorithm of the present invention is better than that of the AMR-WB+ algorithm.

Claims (10)

1. A pitch period estimation method, characterized by comprising the following steps:
S1, preprocessing the speech signal by removing the DC component, applying perceptual weighting, and down-sampling the signal;
S2, calculating normalized autocorrelation function values of the preprocessed speech signal using the following formula:
$$\rho(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sqrt{\sum_{n=0}^{N-1} s^{2}(n) \sum_{n=0}^{N-1} s^{2}(n-\tau)}},$$
where ρ(τ) is the normalized autocorrelation function value, s(n) is the perceptually weighted speech signal, τ is the pitch period candidate being searched, and N is the length of one frame of the signal after down-sampling;
S3, determining the maximum of the normalized autocorrelation function values within the pitch period search range, and taking the pitch period candidate corresponding to that maximum as the pitch period estimate of the speech signal.
2. The method according to claim 1, characterized in that step S1 further comprises:
S11, resampling the speech signal to an internal sampling rate;
S12, high-pass filtering the resampled speech signal to remove the DC component;
S13, applying perceptual weighting to the high-pass-filtered speech signal;
S14, low-pass filtering the perceptually weighted speech signal and down-sampling it by a factor of 2.
3. The method according to claim 2, characterized in that the internal sampling rate is 12.8 kHz and the cutoff frequency of the high-pass filtering is 50 Hz.
4. The method according to claim 1, characterized in that step S3 further comprises:
S31, dividing the pitch period search range, according to the sampling rate of the speech signal, into a first interval, a second interval, and a third interval, and obtaining the normalized autocorrelation function maximum of each interval and the corresponding pitch period candidate;
S32, selecting, according to a given weight parameter, the normalized autocorrelation function maximum of the pitch period search range from the maxima of the three intervals, and taking the pitch period candidate corresponding to the selected maximum as the pitch period estimate of the speech signal.
5. The method according to claim 4, characterized in that step S32 further comprises: judging whether the normalized autocorrelation function maximum of the second interval is greater than or equal to the product of the normalized autocorrelation function maximum of the first interval and the weight parameter; if so, taking the pitch period candidate corresponding to the maximum of the second interval as the pitch period estimate of the speech signal; otherwise, further judging whether the normalized autocorrelation function maximum of the third interval is greater than or equal to the product of the maximum of the first interval and the weight parameter; if so, taking the pitch period candidate corresponding to the maximum of the third interval as the pitch period estimate of the speech signal; otherwise, taking the pitch period candidate corresponding to the maximum of the first interval as the pitch period estimate of the speech signal.
6. The method according to claim 5, characterized in that the first, second, and third intervals are [L_min, 39], [40, 79], and [80, L_max] respectively, where L_min is the start value of the pitch period search range and L_max is its end value.
7. A pitch period estimation device, characterized by comprising:
a signal preprocessing unit, configured to preprocess the voice signal by removing the DC component, perceptual weighting and signal down-sampling;
a normalized autocorrelation function calculating unit, configured to calculate the normalized autocorrelation function values of the preprocessed voice signal using the following formula:
$$\rho(\tau) = \frac{\sum_{n=0}^{N-1} s(n)\, s(n-\tau)}{\sqrt{\sum_{n=0}^{N-1} s^{2}(n)\, \sum_{n=0}^{N-1} s^{2}(n-\tau)}},$$
where ρ(τ) denotes the normalized autocorrelation function value, s(n) is the voice signal after perceptual weighting, τ denotes the pitch period candidate value being searched, and N is the length of one frame of the signal after down-sampling;
a pitch period post-processing unit, configured to determine the maximum of the normalized autocorrelation function values within the pitch period search range, and to determine the pitch period candidate value corresponding to that maximum as the pitch period estimate of the voice signal.
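As a rough sketch of the calculating and post-processing units of claim 7, the normalized autocorrelation can be evaluated for each candidate lag and the lag with the largest value taken as the estimate. The square root in the denominator is the usual normalization and is assumed here. The buffer layout (a list holding past samples before the current frame), the function names and the zero-energy guard are also assumptions made for illustration.

```python
import math


def normalized_autocorrelation(buf, start, frame_len, tau):
    """rho(tau) for the frame buf[start : start + frame_len].

    buf must hold at least `tau` samples of history before `start`,
    so that s(n - tau) is defined for every n in the frame.
    """
    if start < tau:
        raise ValueError("not enough history before the frame for this lag")
    cross = energy = energy_lagged = 0.0
    for n in range(start, start + frame_len):
        cross += buf[n] * buf[n - tau]
        energy += buf[n] * buf[n]
        energy_lagged += buf[n - tau] * buf[n - tau]
    denom = math.sqrt(energy * energy_lagged)
    # Guard against silent frames (an assumption; the claim does not cover this case).
    return cross / denom if denom > 0.0 else 0.0


def estimate_pitch(buf, start, frame_len, l_min, l_max):
    """Return the lag in [l_min, l_max] with the largest rho(tau)."""
    return max(
        range(l_min, l_max + 1),
        key=lambda tau: normalized_autocorrelation(buf, start, frame_len, tau),
    )
```

For a signal that repeats every 50 samples, ρ(50) is close to 1, so a search range that contains 50 but not its multiples returns 50.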
8. The device according to claim 7, characterized in that the signal preprocessing unit further resamples the voice signal to an internal sampling rate, then high-pass filters the resampled voice signal to remove the DC component, subsequently applies perceptual weighting to the high-pass-filtered voice signal, and finally applies low-pass filtering and 1/2 down-sampling to the perceptually weighted voice signal.
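The ordering of the preprocessing stages in claim 8 can be illustrated with deliberately simple filters. Everything concrete below is an assumption: the claim does not specify the filters, so a first-order DC blocker and a 3-tap FIR low-pass stand in for them, the input is assumed to be already at the internal sampling rate, and perceptual weighting is left as a caller-supplied hook that defaults to the identity.

```python
def remove_dc(x, pole=0.98):
    """First-order DC-blocking high-pass: y[n] = x[n] - x[n-1] + pole * y[n-1]."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for sample in x:
        prev_y = sample - prev_x + pole * prev_y
        prev_x = sample
        y.append(prev_y)
    return y


def lowpass_and_halve(x, taps=(0.25, 0.5, 0.25)):
    """Apply a short symmetric FIR low-pass, then keep every second sample."""
    half = len(taps) // 2
    padded = [0.0] * half + list(x) + [0.0] * half   # zero-pad the frame edges
    filtered = [
        sum(t * padded[i + k] for k, t in enumerate(taps))
        for i in range(len(x))
    ]
    return filtered[::2]


def preprocess(x, perceptual_weighting=lambda s: s):
    """DC removal, then perceptual weighting, then low-pass and 1/2 down-sampling."""
    return lowpass_and_halve(perceptual_weighting(remove_dc(x)))
```

Halving the sampling rate before the correlation search also halves both the frame length N and the number of candidate lags, which is one plausible source of the reduced computational complexity mentioned in the abstract.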
9. The device according to claim 7, characterized in that the pitch period post-processing unit further divides the pitch period search range, according to the sampling rate of the voice signal, into a first interval, a second interval and a third interval, obtains for each interval the maximum normalized autocorrelation function value and the corresponding pitch period candidate value, selects, according to a given weight parameter, the maximum normalized autocorrelation function value of the pitch period search range from the maximum normalized autocorrelation function values of the three intervals, and determines the pitch period candidate value corresponding to that maximum as the pitch period estimate of the voice signal.
10. The device according to claim 9, characterized in that the pitch period post-processing unit selects the maximum normalized autocorrelation function value of the pitch period search range from the maximum normalized autocorrelation function values of the three intervals according to the weight parameter specifically as follows: judging whether the maximum normalized autocorrelation function value of the second interval is greater than or equal to the product of the maximum normalized autocorrelation function value of the first interval and the weight parameter; if so, determining the pitch period candidate value corresponding to the maximum of the second interval as the pitch period estimate of the voice signal; otherwise, further judging whether the maximum normalized autocorrelation function value of the third interval is greater than or equal to the product of the maximum normalized autocorrelation function value of the first interval and the weight parameter; if so, determining the pitch period candidate value corresponding to the maximum of the third interval as the pitch period estimate of the voice signal; otherwise, determining the pitch period candidate value corresponding to the maximum of the first interval as the pitch period estimate of the voice signal.
CN201310409433.8A 2013-09-09 2013-09-09 Pitch estimation method and apparatus Active CN103474074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310409433.8A CN103474074B (en) 2013-09-09 2013-09-09 Pitch estimation method and apparatus

Publications (2)

Publication Number Publication Date
CN103474074A true CN103474074A (en) 2013-12-25
CN103474074B CN103474074B (en) 2016-05-11

Family

ID=49798895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310409433.8A Active CN103474074B (en) 2013-09-09 2013-09-09 Pitch estimation method and apparatus

Country Status (1)

Country Link
CN (1) CN103474074B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831504B (en) * 2018-06-13 2020-12-04 西安蜂语信息科技有限公司 Method and device for determining pitch period, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4486900A (en) * 1982-03-30 1984-12-04 At&T Bell Laboratories Real time pitch detection by stream processing
US5127053A (en) * 1990-12-24 1992-06-30 General Electric Company Low-complexity method for improving the performance of autocorrelation-based pitch detectors
CN101149924A (en) * 2006-09-18 2008-03-26 华为技术有限公司 Method and device for implementing open-loop pitch search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
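赵丹明 (Zhao Danming): "Research on an open-loop pitch analysis algorithm based on the normalized autocorrelation function", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *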

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105185385A (en) * 2015-08-11 2015-12-23 东莞市凡豆信息科技有限公司 Voice fundamental tone frequency estimation method based on gender anticipation and multi-frequency-band parameter mapping
CN107039051A (en) * 2016-02-03 2017-08-11 重庆工商职业学院 Fundamental frequency detection method based on ant group optimization
CN106205638A (en) * 2016-06-16 2016-12-07 清华大学 A kind of double-deck fundamental tone feature extracting method towards audio event detection
CN106205638B (en) * 2016-06-16 2019-11-08 清华大学 A kind of double-deck fundamental tone feature extracting method towards audio event detection
CN110168641A (en) * 2016-10-04 2019-08-23 弗劳恩霍夫应用研究促进协会 Device and method for determining pitch information
CN110168641B (en) * 2016-10-04 2023-09-22 弗劳恩霍夫应用研究促进协会 Apparatus and method for determining pitch information
CN108830232A (en) * 2018-06-21 2018-11-16 浙江中点人工智能科技有限公司 A kind of voice signal period divisions method based on multiple dimensioned nonlinear energy operator
CN108830232B (en) * 2018-06-21 2021-06-15 浙江中点人工智能科技有限公司 Voice signal period segmentation method based on multi-scale nonlinear energy operator
CN109119097A (en) * 2018-10-30 2019-01-01 Oppo广东移动通信有限公司 Fundamental tone detecting method, device, storage medium and mobile terminal
CN110390953A (en) * 2019-07-25 2019-10-29 腾讯科技(深圳)有限公司 It utters long and high-pitched sounds detection method, device, terminal and the storage medium of voice signal
CN110390953B (en) * 2019-07-25 2023-11-17 腾讯科技(深圳)有限公司 Method, device, terminal and storage medium for detecting howling voice signal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220513

Address after: 510530 No. 10, Nanxiang 2nd Road, Science City, Luogang District, Guangzhou, Guangdong

Patentee after: Guangdong Guangsheng research and Development Institute Co.,Ltd.

Address before: 518057 6th floor, software building, No. 9, Gaoxin Zhongyi Road, high tech Zone, Nanshan District, Shenzhen, Guangdong Province

Patentee before: SHENZHEN RISING SOURCE TECHNOLOGY Co.,Ltd.
