Voice activity detection method and system

A voice activity detection and voice segmentation technology, applied in voice analysis, instruments, and related fields. It addresses the performance degradation caused by noise affecting detection results, and achieves good detection performance and good self-adaptation.

Active Publication Date: 2016-07-27
SPREADTRUM COMM (SHANGHAI) CO LTD

AI Technical Summary

Problems solved by technology

In a background environment with a low signal-to-noise ratio or unstable noise, the detection result is affected by the noise feature quantity, resulting in performance degradation.

Example Embodiment

[0034] In order to make the above-mentioned objects, features and advantages of the present invention more obvious and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0035] Referring to Figure 1, a voice activity detection method 100 according to an embodiment of the present invention is described. The method includes the following steps.

[0036] S101: Calculate the spectral density of the current frame of the audio signal.

[0037] In the method 100, voice activity detection is performed frame by frame. Specifically, the audio signal is divided into multiple frames, and each frame is then examined separately to determine the speech segments and non-speech segments of the audio signal. The length of each frame can be set between 10 ms and 30 ms. The current frame of the audio signal is therefore the frame on which voice activity detection is currently being performed...
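The framing in step S101 can be illustrated with a minimal sketch. It assumes non-overlapping frames, drops any trailing partial frame, and uses a plain periodogram (squared DFT magnitude divided by the frame length) as the spectral density estimate. The source does not specify the estimator, windowing, or overlap, and the function names `split_into_frames` and `spectral_density` are hypothetical.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=20):
    """Split samples into consecutive, non-overlapping frames of frame_ms milliseconds."""
    if not 10 <= frame_ms <= 30:
        raise ValueError("frame length should be between 10 ms and 30 ms")
    frame_len = sample_rate * frame_ms // 1000
    # Assumption: a trailing partial frame is dropped rather than zero-padded.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def spectral_density(frame):
    """Periodogram estimate |DFT|^2 / N for bins 0..N/2 (assumed estimator)."""
    n = len(frame)
    density = []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; adequate for short frames in a sketch.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        density.append(abs(acc) ** 2 / n)
    return density
```

With an 8 kHz sample rate and 20 ms frames, each frame holds 160 samples, so a 400 Hz tone falls in bin 8 of the returned density.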

Abstract

The invention provides a voice activity detection method and system. The method comprises the steps of: calculating the spectral density of a current frame of an audio signal; estimating an expected value of the spectral density of the noise; calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the spectral density of the noise; and generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold. The voice activity detection result is thus tied to the statistical distribution of the noise, which reduces the impact of noise on the detection result. In addition, the preset threshold is a dynamic threshold that follows changes in the noise, so that the detection result adapts to the noise environment of the current frame.
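The four steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the noise expectation is approximated by exponential smoothing over frames judged non-speech, the SNR is taken over total frame power, and the dynamic threshold is a fixed margin plus a smoothed noise fluctuation term. The name `detect_voice_activity` and the parameters `alpha`, `margin_db`, and `init_frames` are hypothetical.

```python
import math


def detect_voice_activity(frame_densities, alpha=0.9, margin_db=6.0, init_frames=3):
    """Return one boolean per frame (True = speech).

    frame_densities: per-frame spectral densities, all lists of equal length.
    All parameter defaults are illustrative assumptions.
    """
    eps = 1e-12
    # Assumption: the first init_frames frames contain noise only.
    init = frame_densities[:init_frames]
    noise = [sum(col) / len(init) for col in zip(*init)]
    fluct_db = 0.0  # smoothed noise fluctuation, drives the dynamic threshold
    decisions = []
    for density in frame_densities:
        # SNR of the current frame from frame and noise spectral densities.
        snr_db = 10.0 * math.log10((sum(density) + eps) / (sum(noise) + eps))
        # Dynamic threshold: rises when the noise level has been fluctuating.
        threshold_db = margin_db + fluct_db
        is_speech = snr_db > threshold_db
        decisions.append(is_speech)
        if not is_speech:
            # Update the noise expectation only on frames judged non-speech.
            noise = [alpha * n + (1.0 - alpha) * d for n, d in zip(noise, density)]
            fluct_db = alpha * fluct_db + (1.0 - alpha) * abs(snr_db)
    return decisions
```

Freezing the noise estimate during frames judged as speech is a common design choice that keeps speech energy from leaking into the noise model; the source does not state whether the invention does this.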

Description

Technical field

[0001] The invention relates to voice recognition technology, and in particular to a voice activity detection method and system.

Background

[0002] Voice activity detection (VAD), also known as voice detection, is used in voice processing to detect the presence or absence of voice, thereby separating speech segments from non-speech segments in a signal. VAD can be used in echo cancellation, noise suppression, speaker recognition, speech recognition, and similar applications.

[0003] Traditional VAD algorithms often base the decision on features of the audio signal such as short-term energy, spectral energy, and zero-crossing rate. They therefore perform well in clean-speech and high signal-to-noise ratio (SNR) environments. However, in a background environment with a low SNR or unstable noise, the detection result is affected by the noise feature quantity, resulting in performance degradation. [00...
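For reference, two of the traditional features named in paragraph [0003], short-term energy and zero-crossing rate, can be computed as in the sketch below. The decision rule in `energy_zcr_vad` and its thresholds are hypothetical and only illustrate the kind of fixed-threshold judgment that degrades in noisy conditions.

```python
def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_zcr_vad(frame, energy_threshold=0.01, zcr_threshold=0.3):
    """Hypothetical fixed-threshold rule: loud frames with few zero crossings count as speech."""
    return (short_term_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)
```

Because both thresholds are fixed, a rise in background noise energy can push non-speech frames over the energy threshold, which is the weakness the background section describes.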

Application Information

IPC(8): G10L21/0208, G10L25/84
Inventor: 孙廷玮, 林福辉
Owner: SPREADTRUM COMM (SHANGHAI) CO LTD