Method for extracting short-time energy frequency value in voice endpoint detection

An extraction method for speech endpoint detection, applied in speech analysis, speech recognition, instruments, and related fields, that addresses problems such as poor performance, poor discrimination, and failure of existing detection methods.

Inactive Publication Date: 2010-01-13
CHINA DIGITAL VIDEO BEIJING
Cites: 0; Cited by: 16

AI Technical Summary

Problems solved by technology

This method can accurately distinguish speech from noises such as car engines and door-closing sounds, but it is less effective at distinguishing speech from music.
[0008] Whichever audio parameters are used, traditional speech endpoint detection methods have serious shortcomings in specific noise environments.
For example, energy-based methods perform poorly in low-SNR environments, and information-entropy-based algorithms fail against music backgrounds.

Embodiment Construction

[0049] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0050] (1) Extraction of three audio characteristic parameters: short-term energy, short-term zero-crossing rate, and short-term information entropy

[0051] 1. Short-term energy

[0052] Energy is one of the most frequently used audio feature parameters and the most intuitive representation of a speech signal. Energy analysis of speech signals rests on the fact that speech amplitude varies considerably over time. Energy can be used to distinguish the unvoiced and voiced segments of an utterance: larger energy values correspond to voiced segments, and smaller energy values correspond to unvoiced segments. For signals with a high signal-to-noise ratio, energy can be used to judge whether speech is present. The energy of noise without a speech signal is small, but the energy will increase significantly when th...
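The energy test described in this paragraph can be sketched as follows. This is a minimal illustration, not the patent's own formulation: it assumes that short-term energy is the sum of squared samples in a frame, that frames do not overlap, and that a fixed `threshold` separates speech from background. The names `short_time_energy`, `energy_flags`, and `threshold` are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared sample values in one frame (assumed definition)."""
    return sum(x * x for x in frame)


def energy_flags(samples, frame_len, threshold):
    """Mark each fixed-length frame as speech-like when its energy exceeds threshold.

    Frames are non-overlapping; a trailing partial frame is dropped.
    The threshold is a hypothetical tuning parameter, not a value from the patent.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(short_time_energy(frame) > threshold)
    return flags


if __name__ == "__main__":
    # A quiet frame followed by a loud frame.
    audio = [0.01, -0.01, 0.02, 0.0, 0.5, -0.6, 0.7, -0.4]
    print(energy_flags(audio, frame_len=4, threshold=0.1))
```

A fixed threshold of this kind only works when the signal-to-noise ratio is high, which is the limitation the text attributes to energy-based methods.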

Abstract

The invention relates to voice detection technology in an automatic caption generating system, and in particular to a method for extracting a short-time energy frequency value in voice endpoint detection. The method comprises the following steps: dividing an audio sampling sequence into fixed-length frames to form a frame sequence; extracting three audio characteristic parameters from the data of each frame, namely short-time energy, short-time zero-crossing rate, and short-time information entropy; and calculating the short-time energy frequency value of each frame from these parameters to form a short-time energy frequency value sequence. By combining time-domain and frequency-domain audio characteristic parameters, the method exploits the advantages of each parameter while offsetting their individual disadvantages to a certain extent, so that it can effectively handle many different types of background noise.
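The first two steps of the abstract (framing, then extracting three parameters per frame) can be sketched as below. The definitions are assumptions, since this text does not state them: energy is the sum of squared samples, the zero-crossing rate is a count of sign changes between adjacent samples, and information entropy is the Shannon entropy of the normalized power spectrum obtained with a naive DFT. All function names are hypothetical. The formula that combines the three parameters into the short-time energy frequency value is not given in this text, so the sketch stops at the per-frame features.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Non-overlapping fixed-length frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def short_time_energy(frame):
    """Sum of squared samples (assumed definition)."""
    return sum(x * x for x in frame)


def zero_crossing_count(frame):
    """Number of sign changes between adjacent samples; zero counts as non-negative."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized power spectrum (assumed definition)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):  # naive DFT over the non-negative frequency bins
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0  # a silent frame has no spectral distribution
    return -sum((p / total) * math.log2(p / total) for p in power if p > 0)


def frame_features(samples, frame_len):
    """One (energy, zero-crossing count, entropy) tuple per frame."""
    return [(short_time_energy(f), zero_crossing_count(f), spectral_entropy(f))
            for f in split_frames(samples, frame_len)]


if __name__ == "__main__":
    for features in frame_features([1, -1, 1, -1, 0, 0, 0, 0], frame_len=4):
        print(features)
```

The naive DFT keeps the sketch dependency-free; for real audio frames an FFT routine would be the usual choice.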

Description

Technical field

[0001] The invention relates to speech detection technology in an automatic subtitle generation system, and in particular to a method for extracting short-time energy frequency values in speech endpoint detection.

Background

[0002] Speech endpoint detection is a new field of speech technology research that is applied in automatic subtitle generation systems. The current subtitle production method first requires a subtitle manuscript: a text file written before the TV program is made, which records the title of the program, what the host intends to say, what the interviewees said, and other content. When making TV programs, editors add audio and video materials to the storyboard of non-linear editing software and then edit them according to the gist of the program. Editing operations generally include modifying the position of the material, adding some special effects, adding ...

Application Information

IPC(8): G10L11/00, G10L11/02, G10L15/04
Inventors: 李祺, 马华东, 郑侃彦, 韩忠涛, 张婷
Owner: CHINA DIGITAL VIDEO BEIJING