Pitch detection method, device and medium based on discrete logarithmic Fourier transform
A pitch detection technique based on the discrete logarithmic Fourier transform, applicable to speech analysis, instruments, and similar fields, which addresses problems that degrade pitch accuracy and the correlation value.
Examples
Example 1
[0035] As described in the Background of the Invention section, in the logarithmic frequency space the distance from the pitch to a given harmonic (such as the third harmonic) is constant. To avoid increasing the amount of computation, the frequency search window can be fixed to that distance. However, because the pitch varies greatly, a fixed window does not work, or does not work well, when the pitch is too low or too high: either the actual pitch falls outside the frequency window, or some harmonics fall outside the window and affect the correlation value, as shown in Figure 4.
[0036] In theory, therefore, if the fixed-size frequency search window is moved up or down appropriately, it can always cover the same number of harmonics (together with the fundamental).
[0037] Based on such an idea, in the first embodiment, the inventor proposes to move the window according to the latest valid pitch of the previously detected wave segment.
[0038] In this example, if ...
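The idea of moving a fixed-size window according to the latest valid pitch can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and does not come from the patent text: the function name `move_window`, the use of octaves (base-2 logarithm) as the log-frequency unit, the `margin_octaves` padding, and the rule of placing the window just below the last pitch. The only property taken from the description is that the window width in log-frequency stays constant while its position follows the pitch.

```python
import math


def move_window(last_pitch_hz, n_harmonics=3, margin_octaves=0.5):
    """Place a fixed-width log-frequency search window around a pitch.

    Hypothetical helper: the window spans from `margin_octaves` below the
    last valid pitch to `margin_octaves` above its n-th harmonic. Its width
    in octaves, log2(n_harmonics) + 2 * margin_octaves, does not depend on
    the pitch, so only the position of the window changes.
    """
    if last_pitch_hz <= 0:
        raise ValueError("last_pitch_hz must be positive")
    log_f0 = math.log2(last_pitch_hz)
    low_log = log_f0 - margin_octaves
    high_log = log_f0 + math.log2(n_harmonics) + margin_octaves
    # Convert the window edges back from octaves to Hz.
    return 2.0 ** low_log, 2.0 ** high_log


if __name__ == "__main__":
    for pitch in (100.0, 400.0):
        low, high = move_window(pitch)
        width = math.log2(high / low)
        print(f"pitch={pitch:.0f} Hz  window=({low:.1f}, {high:.1f}) Hz  "
              f"width={width:.3f} octaves")
```

Under these assumptions, a low voice and a high voice get windows of the same width in octaves, each covering the fundamental and the same harmonics, so the size of the search does not grow with the pitch range.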
Example 2
[0045] In the second embodiment, the method may further include a step of calculating a score (S500), in which a score is calculated from the correlation result obtained in the step of correlating the spectrum with the template (S200) during pitch detection of the last wave segment. Correspondingly, the frequency window modifier may further include a score calculator 310 for performing step S500.
[0046] In this case, the move operation is performed only when the score is within a certain range, thereby avoiding unnecessary move operations that increase the amount of computation. For example, when the present invention is applied to human speech recognition, a score value that is too low likely means that the current wave is not human speech, and therefore no action is required. That is, for example, the move operation is performed only if the score is above a first threshold.
[0047] The score reflects the confidence value of the detected pi...
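The score-gated move of the second embodiment might look like the following sketch. The score calculation itself is not reproduced, because the excerpt does not give it; the score is simply passed in as a number. The names `shift_window` and `update_window`, the default `first_threshold` of 0.5, and the `margin_octaves` placement rule are all hypothetical and chosen only for illustration.

```python
def shift_window(window, new_low_hz):
    """Slide a (low_hz, high_hz) window so that it starts at new_low_hz.

    The ratio high/low is preserved, so the width in log-frequency is
    unchanged by the move.
    """
    low, high = window
    return (new_low_hz, new_low_hz * (high / low))


def update_window(window, last_pitch_hz, score,
                  first_threshold=0.5, margin_octaves=0.5):
    """Move the window only when the last detection looks trustworthy.

    Assumption: `score` is a confidence-like value derived from the
    correlation result of the previous wave segment. A score at or below
    `first_threshold` (for example, a non-speech segment) leaves the
    window where it is, which skips the move operation entirely.
    """
    if last_pitch_hz is None or score <= first_threshold:
        return window
    return shift_window(window, last_pitch_hz / 2.0 ** margin_octaves)


if __name__ == "__main__":
    window = (70.0, 420.0)
    print(update_window(window, 200.0, score=0.2))  # low score: unchanged
    print(update_window(window, 200.0, score=0.9))  # high score: moved
```

Keeping the gate in a separate function from the move keeps the threshold policy easy to change, which matters because the appropriate threshold depends on how the score is computed.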
Example 3
[0054] In the third embodiment, the moving means 314 (correspondingly, the step of moving the frequency window) in the second embodiment may be replaced by the expanding means 316 (correspondingly, the step of expanding the frequency window).
[0055] In this case, the expansion operation is performed only when the score is within a certain range, thereby avoiding unnecessary expansion operations that increase the amount of computation. In particular, if the score is high enough, there is no need to expand the frequency window, and a standard-sized frequency window can be used. For example, if the score is lower than the second threshold, indicating that the confidence value of the latest detected pitch is relatively low, the frequency window may be expanded to cover more possible pitch values.
[0056] As a criterion for performing the step of expanding the frequency window (S600), the second threshold of the score depends on a specific method of calculating the score. For example, ...
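The expansion rule of the third embodiment can be sketched in the same style. This is an illustration under stated assumptions, not the patented procedure: the names `expand_window` and `select_window`, the default `second_threshold` of 0.6, and the choice to widen the window symmetrically by `extra_octaves` on each side are hypothetical. As the description notes, the actual threshold depends on how the score is calculated.

```python
def expand_window(window, extra_octaves=0.5):
    """Widen a (low_hz, high_hz) window by extra_octaves on each side."""
    low, high = window
    factor = 2.0 ** extra_octaves
    return (low / factor, high * factor)


def select_window(standard_window, score,
                  second_threshold=0.6, extra_octaves=0.5):
    """Choose the search window for the next wave segment.

    Assumption: a score below `second_threshold` means the latest detected
    pitch has low confidence, so a wider window is searched. Otherwise the
    standard-sized window is reused and no extra computation is spent.
    """
    if score < second_threshold:
        return expand_window(standard_window, extra_octaves)
    return standard_window


if __name__ == "__main__":
    standard = (100.0, 600.0)
    print(select_window(standard, score=0.9))  # confident: standard window
    print(select_window(standard, score=0.3))  # uncertain: expanded window
```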