End-point detecting method, apparatus and speech recognition system based on sliding window

An endpoint detection technique, applicable to speech recognition, speech analysis, instruments, and related fields, that addresses problems such as deterioration of the system recognition rate.

Active Publication Date: 2006-04-26
INST OF ACOUSTICS CHINESE ACAD OF SCI +2

AI Technical Summary

Problems solved by technology

A good endpoint detection algorithm makes the system robust; conversely, a poor endpoint detection algorithm leads to a sharp deterioration of the system recognition rate.



Embodiment Construction

[0039] Figure 1 is a schematic diagram of a sliding window in the time domain. The present invention adopts the idea of a sliding window: a certain number of frames is taken as the window size, and the start and end of speech are judged according to whether the sum of the energy of all frames in the window is greater than or less than a certain parameter, which improves robustness. As shown in Figure 1, the horizontal axis represents the time of each frame of the input speech signal, and the vertical axis represents the signal (level) amplitude of each frame of the input speech signal. The white strip is the sliding window.
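The windowed energy sum described in [0039] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of non-overlapping frames, and the definition of frame energy as a sum of squared samples are all assumptions made for the example.

```python
def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's energy.

    Assumption: frame energy is the sum of squared sample values, and any
    trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def window_energies(energies, window_size):
    """Return the energy sum of every window of `window_size` consecutive frames.

    The sum is updated incrementally: when the window slides by one frame,
    the newest frame's energy is added and the oldest frame's is subtracted.
    """
    if window_size <= 0 or len(energies) < window_size:
        return []
    total = sum(energies[:window_size])
    sums = [total]
    for i in range(window_size, len(energies)):
        total += energies[i] - energies[i - window_size]
        sums.append(total)
    return sums
```

Summing over several frames rather than testing a single frame means that one isolated loud frame, such as a click, contributes only a fraction of the window total, which is one plausible reading of why the windowed decision is more robust.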

[0040] Figure 2 is another schematic diagram of a sliding window. In Figure 2, the horizontal axis is still the time axis, which represents the time of each frame of the input speech signal. However, the vertical axis represents the frequency spectrum transformed in the frequency domai...


Abstract

The invention discloses an endpoint detection method and device. The method comprises the following steps: applying a window to the input speech signal; selecting a certain number of frames as the window size; determining the starting point of the background noise in the input speech signal; calculating the background noise energy; calculating the speech energy of the current frame and the window energy; comparing whether the total speech energy of the window is greater than the product of the background noise energy and the speech start-point signal-to-noise ratio; if not, sliding the window to the next frame and returning to the step of calculating the speech energy of the current frame; if so, judging the current frame to be the speech starting point. The invention improves the detection accuracy, the robustness, and the overall recognition rate of the speech recognition system in which it is used.
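The start-point decision loop in the abstract might be sketched as below. Several details are not stated in the text and are assumptions here: the background noise energy is estimated as the mean energy of the first `noise_frames` frames, that estimate is scaled by the window size so both sides of the comparison cover the same number of frames, and the "current frame" is taken to be the newest frame in the window. The names `detect_speech_start`, `noise_frames`, and `start_snr` are hypothetical.

```python
def detect_speech_start(energies, window_size, noise_frames, start_snr):
    """Return the index of the frame judged to be the speech start, or None.

    energies     -- per-frame energy values of the input signal
    window_size  -- number of frames in the sliding window
    noise_frames -- leading frames assumed to hold only background noise
    start_snr    -- start-point signal-to-noise ratio (a linear factor)
    """
    if window_size <= 0 or noise_frames <= 0:
        raise ValueError("window_size and noise_frames must be positive")
    if len(energies) < noise_frames + window_size:
        return None

    # Assumption: the background noise energy is the mean energy of the
    # leading noise-only frames.
    noise_per_frame = sum(energies[:noise_frames]) / noise_frames
    # Assumption: scale the noise estimate to the window length before
    # multiplying by the start-point SNR.
    threshold = noise_per_frame * window_size * start_snr

    # The first window sits immediately after the noise-only frames.
    current = noise_frames + window_size - 1
    window_total = sum(energies[noise_frames:current + 1])

    while True:
        if window_total > threshold:
            # Window energy exceeds noise energy times SNR: speech starts.
            return current
        # Otherwise slide the window forward by one frame.
        current += 1
        if current >= len(energies):
            return None
        window_total += energies[current] - energies[current - window_size]
```

For example, with `energies = [1, 1, 1, 1, 1, 1, 1, 20, 20, 20]`, a window of 3 frames, 2 noise frames, and `start_snr = 4`, the threshold is 12 and the first window to exceed it is the one whose newest frame has index 7.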

Description

Technical field

[0001] The present invention relates to a method of endpoint detection (VAD). More specifically, the present invention relates to a method and device for detecting speech endpoints in a speech recognition system, and to a speech recognition system using the detection method.

Background technique

[0002] In a speech recognition application system, the input signal includes the speech signal of the user speaking, the background noise signal, and so on. The process of extracting the speech signal of the user's utterance from the input signal is called endpoint detection.

[0003] The difficulty in commercializing speech recognition systems lies in improving robustness. The robustness of a speech recognition system is affected by many uncertain factors, such as the speaker and the speech channel used in the environment. A speech recognition system may have a high recognition rate in a normal test, but when used in an actual e...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/20, G10L21/0232
Inventor: 余洪涌, 赵庆卫
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI