
Voice activity detection method and system

A voice activity detection and voice segment technology, applied in voice analysis, instruments, etc., that addresses problems such as performance degradation and noise-affected detection results, and achieves good detection performance and good self-adaptation.

Active Publication Date: 2016-07-27
SPREADTRUM COMM (SHANGHAI) CO LTD
8 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

In a background environment with a low signal-to-noise ratio or unstable noise, the detection result is affected by the noise feature quantity, which degrades performance.



Embodiment Construction

[0034] In order to make the above objects, features and advantages of the present invention more comprehensible, specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0035] Refer to Figure 1, which illustrates the voice activity detection method 100 according to an embodiment of the present invention. The method includes the following steps.

[0036] S101. Calculate the spectral density of the current frame of the audio signal.

[0037] In method 100, voice activity detection is performed frame by frame. Specifically, the audio signal is divided into multiple frames, and each frame is then examined separately to determine the speech segments and the non-speech segments of the audio signal. The length of each frame may be set between 10 ms and 30 ms. Therefore, the current frame of the audio signal is the frame for which voice activity detection ne...
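The framing in [0037] and the spectral density calculation of step S101 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `split_into_frames` and `spectral_density`, the non-overlapping framing, and the use of a plain periodogram as the spectral density estimate are all assumptions, since the visible text does not specify them.

```python
import cmath
import math
from typing import List


def split_into_frames(samples: List[float], sample_rate: int,
                      frame_ms: int = 20) -> List[List[float]]:
    """Split a signal into non-overlapping frames (hypothetical helper)."""
    if not 10 <= frame_ms <= 30:
        raise ValueError("frame length is expected to be between 10 ms and 30 ms")
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def spectral_density(frame: List[float]) -> List[float]:
    """Periodogram estimate |DFT|^2 / N for bins 0..N/2 (assumed estimator)."""
    n = len(frame)
    density = []
    for k in range(n // 2 + 1):
        # Naive DFT bin; adequate for short frames in a sketch.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        density.append(abs(acc) ** 2 / n)
    return density


# Usage: one second of a 500 Hz tone sampled at 8 kHz, split into 20 ms frames.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(sample_rate)]
frames = split_into_frames(tone, sample_rate, frame_ms=20)
current_frame_psd = spectral_density(frames[0])
```

With 20 ms frames at 8 kHz each frame holds 160 samples, so the 500 Hz tone falls in bin 10 of the periodogram.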


Abstract

The invention provides a voice activity detection method and system. The method comprises the steps of: calculating the spectral density of the current frame of an audio signal; estimating the expected value of the spectral density of the noise; calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the spectral density of the noise; and generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold. The voice activity detection result is thus tied to the statistical distribution of the noise, which reduces the impact of noise on the detection result. In addition, the preset threshold is a dynamic threshold that follows changes in the noise, so the detection result adapts to the noise environment of the current frame.
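The four steps of the abstract can be sketched end to end as below. The abstract does not say how the expected noise spectral density is estimated or how the dynamic threshold is derived, so this sketch makes loud assumptions: the noise estimate is an exponential moving average updated only on non-speech frames, the first frame is treated as noise only, and the threshold is raised in proportion to how much the noise estimate last changed. The names `frame_snr_db`, `detect_voice`, `alpha`, and `sensitivity` are hypothetical.

```python
import math
from typing import List, Sequence

FLOOR = 1e-12  # avoids log(0) and division by zero


def frame_snr_db(frame_psd: Sequence[float], noise_psd: Sequence[float]) -> float:
    """Ratio of total frame power to total estimated noise power, in dB."""
    signal_power = max(sum(frame_psd), FLOOR)
    noise_power = max(sum(noise_psd), FLOOR)
    return 10.0 * math.log10(signal_power / noise_power)


def detect_voice(frames_psd: Sequence[Sequence[float]],
                 base_threshold_db: float = 6.0,
                 alpha: float = 0.9,
                 sensitivity: float = 0.5) -> List[bool]:
    """Return one speech/non-speech decision per frame (illustrative only)."""
    if not frames_psd:
        return []
    noise_psd = list(frames_psd[0])  # assumption: the first frame is noise only
    noise_change_db = 0.0
    decisions = []
    for psd in frames_psd:
        # Assumed dynamic threshold: stricter while the noise estimate is moving.
        threshold_db = base_threshold_db + sensitivity * noise_change_db
        is_speech = frame_snr_db(psd, noise_psd) > threshold_db
        decisions.append(is_speech)
        if not is_speech:
            # Assumed estimator: exponential moving average of the noise PSD.
            updated = [alpha * n + (1.0 - alpha) * p
                       for n, p in zip(noise_psd, psd)]
            old_power = max(sum(noise_psd), FLOOR)
            new_power = max(sum(updated), FLOOR)
            noise_change_db = abs(10.0 * math.log10(new_power / old_power))
            noise_psd = updated
    return decisions


# Usage: three quiet frames, two loud frames, one quiet frame.
quiet, loud = [1.0] * 4, [100.0] * 4
result = detect_voice([quiet, quiet, quiet, loud, loud, quiet])
```

Freezing the noise estimate during frames classified as speech is a common design choice that keeps speech energy from leaking into the noise estimate; whether the patent does this is not visible in the text.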

Description

Technical field

[0001] The invention relates to voice recognition technology, in particular to a voice activity detection method and system.

Background

[0002] Voice activity detection (VAD), also known as voice detection, is used in voice processing to detect the presence or absence of voice, thereby separating the speech segments and non-speech segments of a signal. VAD can be used for echo cancellation, noise suppression, speaker recognition, speech recognition, and so on.

[0003] Traditional VAD algorithms often base the decision on features of the audio signal such as short-term energy, spectral energy, and zero-crossing rate. They therefore perform well in clean-speech and high-SNR environments. However, in a background environment with a low signal-to-noise ratio or unstable noise, the detection result is affected by the noise feature quantity, which degrades performance.

[00...
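For reference, two of the traditional features named in the background, short-term energy and zero-crossing rate, can be computed per frame as in the sketch below. The function names and the exact normalisations (mean of squared samples, crossings divided by the number of adjacent sample pairs) are assumptions chosen for illustration; the source does not define them.

```python
from typing import Sequence


def short_term_energy(frame: Sequence[float]) -> float:
    """Mean of squared samples in the frame (assumed normalisation)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Usage: an alternating frame crosses zero at every step; a ramp never does.
alternating = [1.0, -1.0, 1.0, -1.0]
ramp = [1.0, 2.0, 3.0, 4.0]
energy = short_term_energy(alternating)
zcr_alternating = zero_crossing_rate(alternating)
zcr_ramp = zero_crossing_rate(ramp)
```

Both features depend only on the waveform of the frame, which is why additive noise with high energy or many zero crossings can mislead a detector built on them.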


Application Information

IPC(8): G10L21/0208, G10L25/84
Inventor: 孙廷玮, 林福辉
Owner: SPREADTRUM COMM (SHANGHAI) CO LTD