Voice activity detection method and device, and equipment

A voice activity detection technology, applied in voice analysis, instruments, etc. It addresses the poor versatility of VAD tools and the time-consuming, labor-intensive design and debugging they require, achieving good versatility and improved recognition efficiency.

Inactive Publication Date: 2018-10-12
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
8 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a voice activity detection method, device and equipment to solve the problem that current VAD tools are poor in versatility and require complex design and debugging for different product lines, which is time-consuming and labor-intensive.

Examples

Embodiment 1

[0038] Figure 1 is a flow chart of the voice activity detection method provided by Embodiment 1 of the present invention. This embodiment provides a voice activity detection method that addresses the problems that current VAD tools are poor in versatility, require complex design and debugging for different product lines, and consume time and effort. As shown in Figure 1, the specific steps of the method are as follows:

[0039] Step S101, extracting the acoustic features of the audio frame to be detected.

[0040] Here, the acoustic feature of the audio frame may be any information that characterizes the audio signal.

[0041] In this embodiment, the acoustic feature can be Mel Frequency Cepstral Coefficients (MFCC for short), Mel-scale Filter Bank features (FBank for short), Linear Predictive Cepstral Coefficients (LPCC for short), or the magnitude of the Fast Fourier Transform (FFT for short), etc. ...
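As an illustration of the last feature type listed above, the following minimal sketch computes the magnitude spectrum of a single frame. It is not taken from the patent: the function names, the Hamming window, and the 16-sample test frame are assumptions chosen for illustration, and a naive discrete Fourier transform stands in for the FFT (it yields the same magnitudes, only more slowly) so that the example needs nothing beyond the Python standard library.

```python
import cmath
import math


def hamming_window(n):
    """Return an n-point Hamming window (an assumed, commonly used choice)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Return |X[k]| for k = 0..N/2 using a naive O(N^2) DFT.

    An FFT computes the same values faster; the naive form keeps the
    sketch dependency-free.
    """
    n = len(frame)
    window = hamming_window(n)
    windowed = [s * w for s, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(windowed)
        )
        spectrum.append(abs(acc))
    return spectrum


# A 16-sample frame holding a pure tone that completes 4 cycles.
frame = [math.sin(2 * math.pi * 4 * t / 16) for t in range(16)]
features = magnitude_spectrum(frame)
peak_bin = max(range(len(features)), key=features.__getitem__)
```

For a real-valued frame of N samples only bins 0 to N/2 carry independent information, which is why the feature vector here has N/2 + 1 entries.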

Embodiment 2

[0054] Figure 2 is a flow chart of the voice activity detection method provided by Embodiment 2 of the present invention. Building on Embodiment 1, this embodiment further includes, before extracting the acoustic features of the audio frame to be detected: acquiring the audio to be detected and performing frame processing on it to obtain at least one audio frame to be detected. As shown in Figure 2, the specific steps of the method are as follows:

[0055] Step S201. Acquire the audio to be detected, perform frame processing on the audio to be detected, and obtain at least one audio frame to be detected.

[0056] The audio to be detected in this embodiment may include silence and / or noise segments. For example, it may be a piece of audio input by the user. Based on the user's speaking habits and surrounding environment, most user-input audio includes long periods of silence and noise.
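The frame processing in step S201 can be sketched roughly as follows. The patent text shown here does not state a frame length, a hop size, or how a trailing partial frame is handled, so the function name `split_into_frames`, its parameters, and the zero-padding policy are all assumptions made for illustration.

```python
def split_into_frames(samples, frame_length, hop_length):
    """Split a sample sequence into fixed-length, possibly overlapping frames.

    A trailing partial frame is zero-padded so that no audio is dropped
    (an assumed policy, not one stated in the source).
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    frames = []
    start = 0
    while start < len(samples):
        frame = list(samples[start:start + frame_length])
        frame.extend([0.0] * (frame_length - len(frame)))
        frames.append(frame)
        if start + frame_length >= len(samples):
            break
        start += hop_length
    return frames


# Toy input: 11 samples, frames of 4 samples, advancing 2 samples at a time.
audio = [float(i) for i in range(11)]
frames = split_into_frames(audio, frame_length=4, hop_length=2)
```

In speech processing, frames of a few tens of milliseconds with partial overlap are a common choice, but the right values depend on the sampling rate and on the features extracted afterwards.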

[0057] Du...

Embodiment 3

[0097] Figure 3 is a schematic structural diagram of a voice activity detection device provided in Embodiment 3 of the present invention. The voice activity detection device provided in this embodiment can execute the processing flow provided in the voice activity detection method embodiments. As shown in Figure 3, the device 30 includes: a feature extraction module 301, a detection module 302 and a determination module 303.

[0098] Specifically, the feature extraction module 301 is used to extract the acoustic features of the audio frame to be detected.

[0099] The detection module 302 is used to input the acoustic features of the audio frame to be detected into a preset deep neural network model and to calculate the value of the output node corresponding to the audio frame to be detected. The deep neural network model is obtained by using the acoustic features of each audio frame in the training data, together with the labeled data, to train the deep neur...
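One way to picture the three-module structure of device 30 is as a small object that chains the modules together. The sketch below is an assumption-laden illustration, not the patented implementation: the class name, the callable interfaces, the two-node output layout, and the stand-in functions are all hypothetical, and the "model" is a toy energy rule rather than a trained deep neural network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class VoiceActivityDetector:
    """Hypothetical composition of modules 301, 302 and 303."""

    extract_features: Callable[[Sequence[float]], List[float]]  # module 301
    score_frame: Callable[[List[float]], List[float]]           # module 302
    decide: Callable[[List[float]], bool]                       # module 303

    def is_speech(self, frame):
        features = self.extract_features(frame)
        outputs = self.score_frame(features)
        return self.decide(outputs)


# Stand-ins so the sketch runs; a real detector would plug in an acoustic
# feature extractor and a trained deep neural network here.
def mean_energy(frame):
    return [sum(s * s for s in frame) / len(frame)]


def toy_model(features):
    # Assumed layout of two output nodes: [non-speech score, speech score].
    energy = min(features[0], 1.0)
    return [1.0 - energy, energy]


def speech_node_wins(outputs):
    return outputs[1] > outputs[0]


detector = VoiceActivityDetector(mean_energy, toy_model, speech_node_wins)
silent = detector.is_speech([0.0] * 8)
loud = detector.is_speech([1.0, -1.0] * 4)
```

Keeping the modules as separate callables mirrors the separation in the device description: the feature extractor or the trained model can be swapped without touching the other two.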

Abstract

The invention provides a voice activity detection method, device and equipment. In the method, a deep neural network model is obtained in advance by training a deep neural network with the acoustic features of all audio frames in the training data and the corresponding labeled data. The acoustic features extracted from the audio frames to be detected are input directly into the preset deep neural network model, and the values of the output nodes corresponding to those frames are calculated. Based on these output node values, it is determined whether each audio frame to be detected is effective voice or not. For a given application scene or product line, the deep neural network only needs to be trained with the training data corresponding to that scene or product line, and the resulting model is applicable to it, so the method can be applied to a variety of different scenes and product lines. Universality is good, no complex feature design or manual debugging of the acoustic features is required, and the efficiency of recognizing audio frames is improved.
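The inference path summarized in the abstract (features in, output node values out, then a speech or non-speech decision) can be sketched as below. The abstract does not specify the network architecture, the activation functions, the number of output nodes, or the decision rule, so the ReLU hidden layer, the softmax output, the convention that node 1 is the speech node, and the 0.5 threshold are all assumptions. The weights are hand-picked toy values, not a trained model.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: weights holds one row per output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def output_node_values(features, layers):
    """Forward pass: ReLU on hidden layers, softmax on the output layer."""
    activations = list(features)
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


def is_effective_speech(outputs, threshold=0.5):
    """Treat node 1 as the speech node (an assumed convention)."""
    return outputs[1] >= threshold


# Hand-picked toy weights for a 2-2-2 network, not a trained model.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),     # hidden layer
    ([[-2.0, -2.0], [2.0, 2.0]], [1.0, -1.0]),  # output layer
]
loud = output_node_values([1.5, 1.2], layers)
quiet = output_node_values([0.0, 0.1], layers)
```

Adapting such a detector to a new scene or product line would, per the abstract, mean retraining the weights on data from that scene while leaving this inference code unchanged.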

Description

Technical field

[0001] The invention relates to the field of voice recognition, in particular to a voice activity detection method, device and equipment.

Background art

[0002] With the continued spread of voice search services, more and more people use their own voice as a means of interaction. The user uploads the input audio to the server through a mobile terminal, and the server performs voice recognition and search based on the audio.

[0003] Because of users' speaking habits, the audio input by most users includes long periods of silence. If all of the input audio is transmitted to the server, the long periods of silence consume a lot of traffic and also put heavy pressure on the server's voice recognition engine. At present, voice activity detection (Voice Activity Detector, VAD for short) tools are mostly used to identify and eliminate long silent segments from the audio signal stre...

Application Information

IPC(8): G10L25/78, G10L25/30
CPC: G10L25/30, G10L25/78
Inventor: 李超, 朱唯鑫, 文铭
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD