
Method and system for detecting speech endpoints

A voice endpoint detection technology, applied in the computer field, that addresses the poor performance of existing detection technology.

Active Publication Date: 2012-06-27
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a method and system for detecting voice endpoints, to solve the problem of poor performance in existing voice endpoint detection technology.



Examples


Embodiment 1

[0081] Embodiment 1. This embodiment provides a method for detecting voice endpoints. As shown in Figure 1, it includes the following steps:

[0082] S11. Perform preprocessing on each frame of the audio signal of the input audio segment.

[0083] Audio signal preprocessing includes, but is not limited to, one or any combination of the following: pre-emphasis of each frame of the audio signal (i.e. boosting the power of the high-frequency part), fast Fourier transform (FFT), and sub-band division.
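The three preprocessing operations named above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the pre-emphasis coefficient `alpha=0.97`, the number of sub-bands, and the choice of equal-width sub-bands are all assumptions, and the hand-written radix-2 FFT only accepts frame lengths that are powers of two.

```python
import cmath


def pre_emphasize(frame, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def subband_energies(frame, num_bands=4):
    """Pre-emphasize, transform, and sum the power spectrum into equal-width sub-bands.

    Only the first half of the spectrum is used, since the second half mirrors it
    for real-valued input. The equal-width split is an assumption.
    """
    spectrum = fft(pre_emphasize(frame))
    half = len(frame) // 2
    power = [abs(c) ** 2 for c in spectrum[:half]]
    width = half // num_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]
```

Calling `subband_energies` on one frame returns one energy value per sub-band, which is the kind of per-frame input a feature such as sub-band spectrum entropy would need.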

[0084] S12. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.

[0085] The purpose of feature extraction is to extract one or several features from each frame of the audio signal in order to distinguish speech frames from non-speech frames. The extracted feature values include, but are not limited to, one or any combination of the following: sub-band spectrum entropy, energy, zero-crossing rate, and correlation. In this embodime...
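As a rough illustration of three of the features listed above, the sketch below computes frame energy, zero-crossing rate, and a spectral entropy over sub-band energies. The function names and exact definitions are assumptions (for example, the natural logarithm and the treatment of zero as non-negative); the excerpt does not show which definitions, or which orientation of the entropy feature relative to the threshold, the patent uses.

```python
import math


def frame_energy(frame):
    """Short-time energy: sum of squared samples."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def subband_spectral_entropy(band_energies):
    """Shannon entropy of the normalized sub-band energy distribution.

    A flat spectrum gives the maximum value log(number of bands);
    energy concentrated in a single band gives 0.
    """
    total = sum(band_energies)
    if total <= 0:
        return 0.0
    probs = [e / total for e in band_energies]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Any one of these values, or a combination, could serve as the per-frame feature value that is later compared against the threshold.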

Embodiment 2

[0119] Embodiment 2. This embodiment provides a method for detecting voice endpoints. As shown in Figure 2, it includes the following steps:

[0120] S21. Perform preprocessing on each frame of the audio signal of the input audio segment.

[0121] The specific description is consistent with S11 and will not be repeated here.

[0122] S22. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.

[0123] The specific description is consistent with that of S12 and will not be repeated here.

[0124] S23. Search the frames of the input audio segment in forward order. If the feature value of the current frame is greater than or equal to the current threshold value, update the current threshold value using the feature value of the current frame.

[0125] In this embodiment, it is assumed that in the forward search process of the previous audio segment, the last frame of the previous audio segment is searched, and...

Embodiment 3

[0150] Embodiment 3. This embodiment provides a system for detecting voice endpoints. As shown in Figure 3, it includes: an audio signal preprocessing unit 31, a feature extraction unit 32, a first direction search and threshold adaptive unit 33, a second end point position detection unit 34, and a second direction search and first end point position detection unit 35.

[0151] The audio signal preprocessing unit 31 is configured to preprocess each frame of the audio signal of the audio segment. The preprocessing performed by the audio signal preprocessing unit 31 includes, but is not limited to, one or any combination of the following: pre-emphasis of each frame of the audio signal, fast Fourier transform (FFT), and sub-band division.

[0152] The feature extraction unit 32 is configured to extract the feature values of each frame of the audio signal from the preprocessed frames. Specifically, the purpose of feature ext...



Abstract

The invention provides a method and a system for detecting speech endpoints, which relate to the technical field of computers and address the poor performance of existing speech endpoint detection technology. The method includes: A1, searching the frames of an input audio segment in a first direction and, if the feature value of the current frame is greater than or equal to the current threshold value, updating the current threshold value using the feature value of the current frame; A2, determining that a second speech endpoint of the audio segment has been detected if the feature values of N consecutive frames are smaller than the current threshold value, and stopping the first-direction search; and A3, searching from the second speech endpoint in a second direction and using the feature values of N consecutive frames to detect a first speech endpoint of the audio segment. N is a preset number of frames. The system comprises a forward search and threshold adaptation unit, an end point position detection unit, and a reverse search and start point position detection unit. The method and system are applicable to all audio search environments.
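Steps A1 to A3 can be sketched as the two-pass search below. It is a minimal sketch under stated assumptions, not the claimed method: the function name `detect_endpoints`, the initial threshold, the update rule `max(threshold, ratio * value)`, the requirement that at least one frame reach the threshold before an end point can be declared, and the exact placement of both endpoints relative to the run of N low frames are all hypothetical choices made for illustration. The first direction is taken to be forward and the second direction reverse.

```python
def detect_endpoints(features, n=3, init_threshold=1.0, ratio=0.5):
    """Return (start_index, end_index) of the speech region, or None if none is found.

    features: one feature value per frame, oriented so that larger means more speech-like.
    n: preset number of consecutive low frames that marks an endpoint.
    """
    threshold = init_threshold
    below = 0
    seen_speech = False
    end = None

    # A1 + A2: forward search with threshold adaptation.
    for i, value in enumerate(features):
        if value >= threshold:
            # Assumed update rule: raise the threshold toward a fraction of the feature.
            threshold = max(threshold, ratio * value)
            seen_speech = True
            below = 0
        elif seen_speech:
            below += 1
            if below == n:
                end = i - n  # last frame before the run of n low frames
                break

    if not seen_speech:
        return None
    if end is None:
        # The segment ended before n low frames were seen; drop any trailing low frames.
        end = len(features) - 1 - below

    # A3: reverse search from the end point using the adapted threshold.
    start = 0
    below = 0
    for i in range(end, -1, -1):
        if features[i] < threshold:
            below += 1
            if below == n:
                start = i + n  # first frame after the run of n low frames
                break
        else:
            below = 0
    return start, end
```

For the feature sequence `[0.2, 0.1, 0.3, 4.0, 5.0, 0.5, 6.0, 4.0, 0.2, 0.1, 0.3, 0.2]` with the defaults, the sketch returns `(3, 7)`: the single low frame at index 5 is shorter than N and does not split the region. Because the threshold is adapted during the forward pass rather than trained on an initial noise segment, a segment that begins with speech, such as `[4.0, 5.0, 4.0, 0.1, 0.1, 0.1]`, yields `(0, 2)`.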

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and system for detecting voice endpoints.

Background

[0002] Existing speech endpoint detection technology rests on the premise that the initial segment of the audio is noise, and trains the threshold value on that initial noise segment. This assumption does not always hold in mobile phone voice search applications: the user sometimes starts talking immediately after pressing the search button, in which case detection is inaccurate.

[0003] Based on this assumption, the existing speech endpoint detection method treats the initial segment as a non-speech segment, divides the speech signal into frames, extracts feature values frame by frame, and compares each feature value with the preset threshold value. If it is greater than or equal to the threshold value, it is judged as sp...


Application Information

IPC(8): G10L11/02, G10L25/06
Inventor 宋辉
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD