Voice activity detection method and device, and equipment
A voice activity detection (VAD) method in the field of voice technology, applicable to speech analysis, instruments, and related areas. It addresses the poor versatility of existing VAD tools, whose design and debugging are complex, time-consuming, and labor-intensive, and thereby achieves good versatility and improved recognition efficiency.
Examples
Embodiment 1
[0038] Figure 1 is a flow chart of the voice activity detection method provided by Embodiment 1 of the present invention. This embodiment provides a voice activity detection method that addresses the problems that current VAD tools have poor versatility, require complex design and debugging for each product line, and consume considerable time and effort. As shown in Figure 1, the specific steps of the method are as follows:
[0039] Step S101, extracting the acoustic features of the audio frame to be detected.
[0040] The acoustic feature of an audio frame may be any information that characterizes the audio signal.
[0041] In this embodiment, the acoustic feature can be Mel Frequency Cepstral Coefficients (MFCC for short), Mel-scale Filter Bank features (FBank for short), Linear Predictive Cepstral Coefficients (LPCC for short), or the magnitude of the Fast Fourier Transform (FFT for short), etc. ...
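Of the features listed above, the FFT magnitude is the simplest to illustrate. The following is a minimal sketch, not the method of the invention: it computes the magnitude spectrum of one frame with a direct discrete Fourier transform so that only the Python standard library is needed. The function name `fft_magnitude` and the 32-sample test sinusoid are assumptions made for illustration; a practical system would use an optimized FFT routine.

```python
import cmath
import math


def fft_magnitude(frame):
    """Return the magnitude spectrum of one real-valued audio frame.

    Uses a direct O(n^2) DFT to stay within the standard library. Only the
    first n // 2 + 1 bins are returned, since the spectrum of a real signal
    is symmetric.
    """
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, sample in enumerate(frame):
            acc += sample * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(acc))
    return magnitudes


# Hypothetical test frame: a sinusoid with exactly 4 cycles across 32 samples.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
features = fft_magnitude(frame)

# Index of the strongest frequency bin in the frame.
peak_bin = max(range(len(features)), key=features.__getitem__)
```

For this test frame the energy concentrates in the bin matching the number of cycles in the frame, which is the kind of per-frame information a downstream classifier can use.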
Embodiment 2
[0054] Figure 2 is a flow chart of the voice activity detection method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1 above, this embodiment further includes, before extracting the acoustic features of the audio frame to be detected: acquiring the audio to be detected and performing frame processing on it to obtain at least one audio frame to be detected. As shown in Figure 2, the specific steps of the method are as follows:
[0055] Step S201. Acquire the audio to be detected, perform frame processing on the audio to be detected, and obtain at least one audio frame to be detected.
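The frame processing in Step S201 can be sketched as follows. This is an illustrative sketch only: the source does not state the frame length, the hop size, or how a trailing partial frame is handled, so the function name `split_into_frames`, its parameters, and the zero-padding of the last frame are all assumptions.

```python
def split_into_frames(samples, frame_length, hop_length):
    """Split a sequence of samples into fixed-length, possibly overlapping frames.

    A trailing partial frame is zero-padded (an assumption of this sketch)
    so that every returned frame has exactly frame_length samples.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    frames = []
    start = 0
    while start < len(samples):
        frame = list(samples[start:start + frame_length])
        frame.extend([0.0] * (frame_length - len(frame)))
        frames.append(frame)
        if start + frame_length >= len(samples):
            break  # this frame already reaches the end of the audio
        start += hop_length
    return frames


# Hypothetical input: 10 samples, frames of 4 samples, hop of 2 samples.
audio = [float(i) for i in range(10)]
frames = split_into_frames(audio, frame_length=4, hop_length=2)
```

With speech sampled at 16 kHz, a commonly used choice is a frame of 400 samples (25 ms) with a hop of 160 samples (10 ms); these values are conventions from speech processing practice, not values taken from this document.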
[0056] The audio to be detected in this embodiment may include silence and/or noise segments. For example, it may be a piece of audio input by the user. Depending on the user's speaking habits and the surrounding environment, most user-input audio includes long periods of silence and noise.
[0057] Du...
Embodiment 3
[0097] Figure 3 is a schematic structural diagram of the voice activity detection device provided in Embodiment 3 of the present invention. The device can execute the processing flow provided in the voice activity detection method embodiments. As shown in Figure 3, the device 30 includes: a feature extraction module 301, a detection module 302, and a determination module 303.
[0098] Specifically, the feature extraction module 301 is used to extract the acoustic features of the audio frame to be detected.
[0099] The detection module 302 is used to input the acoustic features of the audio frame to be detected into a preset deep neural network model and to calculate the value of the output node corresponding to the audio frame to be detected. The deep neural network model is formed by using the acoustic features of each audio frame in the training data and the labeled data to train the deep neur...
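The computation performed by the detection module 302, feeding a frame's acoustic features through a network and reading the output node values, can be sketched as a plain forward pass. Everything specific here is an assumption made for illustration: the source does not give the network topology, activation functions, or number of output nodes, so the ReLU hidden layer, the softmax output, the two output nodes, and the toy weights below are hypothetical, as are all function names.

```python
import math


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax; the outputs sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer with one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def output_node_values(features, layers):
    """Forward pass: ReLU on hidden layers, softmax on the final layer."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


# Hypothetical toy parameters: 3 features -> 2 hidden units -> 2 output nodes.
# A real model would use weights learned from labeled training frames.
toy_layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
]
scores = output_node_values([1.0, 2.0, 3.0], toy_layers)
```

In this sketch `scores` holds one value per output node for the frame; how those values are turned into a decision about the frame is left to the determination module 303 and is not modeled here.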