
A method and system for detecting voice endpoints

A voice endpoint detection technology, applied in the computer field, that addresses the poor performance of existing detection techniques.

Active Publication Date: 2015-08-05
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 4 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a method and system for detecting voice endpoints, to solve the poor performance of existing voice endpoint detection technology.



Examples


Embodiment 1

[0081] Embodiment 1. This embodiment provides a method for detecting voice endpoints which, as shown in Figure 1, includes the following steps:

[0082] S11. Preprocess the audio signal of each frame of the input audio segment.

[0083] The preprocessing includes, but is not limited to, one or any combination of the following: pre-emphasis of each frame of the audio signal (i.e., boosting the power of the high-frequency components), fast Fourier transform (FFT), sub-band division, and the like.
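The preprocessing chain in [0083] can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the pre-emphasis coefficient `alpha = 0.97`, the power-of-two frame length, the hand-written radix-2 FFT, and equal-width sub-bands (`num_bands`) are all choices made here for illustration, and the helper names are hypothetical.

```python
import cmath
import math
from typing import List


def pre_emphasis(frame: List[float], alpha: float = 0.97) -> List[float]:
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def subband_energies(frame: List[float], num_bands: int = 4) -> List[float]:
    """Pre-emphasize a frame, take its FFT, and sum the power spectrum over equal-width sub-bands."""
    spectrum = fft([complex(s) for s in pre_emphasis(frame)])
    half = len(spectrum) // 2   # bins 0 .. N/2 - 1 (non-negative frequencies)
    power = [abs(c) ** 2 for c in spectrum[:half]]
    width = half // num_bands   # hypothetical: equal-width bands, remainder bins dropped
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]
```

A production system would more likely use a library FFT and a perceptually spaced band layout; the hand-rolled version only keeps the sketch dependency-free.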

[0084] S12. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.

[0085] The purpose of feature extraction is to extract one or several features from each frame of the audio signal in order to distinguish speech frames from non-speech frames. The extracted feature values include, but are not limited to, one or any combination of the following: subband spectral entropy, energy, zero-crossing rate, correlation, and the like. In this embo...
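Three of the features listed in [0085] can be computed per frame as follows. The source is truncated before it says which feature, or combination, this embodiment actually uses, so these hypothetical helpers are only a sketch under assumptions: energy is the sum of squared samples, the zero-crossing rate is normalized by the number of adjacent sample pairs, and the sub-band spectral entropy treats the normalized sub-band energies as a probability distribution.

```python
import math
from typing import List


def short_time_energy(frame: List[float]) -> float:
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counted as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def subband_spectral_entropy(band_energies: List[float], eps: float = 1e-12) -> float:
    """Shannon entropy (nats) of sub-band energies normalized to a probability distribution."""
    total = sum(band_energies) + eps   # eps avoids division by zero for silent frames
    probs = [e / total for e in band_energies]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Spectral entropy tends to be lower for speech than for flat-spectrum noise, so if it were fed into a "greater than or equal to the threshold" test it would have to be negated or inverted first; energy-like features can be compared directly.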

Embodiment 2

[0119] Embodiment 2. This embodiment provides a method for detecting voice endpoints which, as shown in Figure 2, includes the following steps:

[0120] S21. Preprocess the audio signal of each frame of the input audio segment.

[0121] The specific description is consistent with that of S11 and will not be repeated here.

[0122] S22. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.

[0123] The specific description is consistent with that of S12, and will not be repeated here.

[0124] S23. Search the frames of the input audio segment in forward order; if the feature value of the current frame is greater than or equal to the current threshold, update the current threshold using the feature value of the current frame.
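Step S23 can be sketched as below. The excerpt does not say how the feature value is used to update the threshold, and the passage on the starting threshold ([0125]) is truncated, so both are assumptions here: `initial_threshold` is simply a parameter, and the update is a hypothetical exponential smoothing `T <- beta * T + (1 - beta) * f`, applied only when `f >= T`.

```python
from typing import List, Tuple


def forward_threshold_search(
    features: List[float], initial_threshold: float, beta: float = 0.9
) -> Tuple[float, List[float]]:
    """Scan frames in forward order, raising the threshold on frames at or above it.

    Returns the final threshold and the threshold each frame was compared against.
    The smoothing rule is a hypothetical stand-in for the patent's update.
    """
    threshold = initial_threshold
    history: List[float] = []
    for f in features:
        history.append(threshold)
        if f >= threshold:
            # Hypothetical update: move the threshold part of the way toward f.
            threshold = beta * threshold + (1.0 - beta) * f
    return threshold, history
```

Because the update fires only when `f >= T`, the threshold in this sketch can only rise during a segment, tracking the level of the loudest frames seen so far.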

[0125] In this embodiment, it is assumed that in the forward search process of the previous audio segment, the last frame of the previous audio segment is searched, and there is no s...

Embodiment 3

[0150] Embodiment 3. This embodiment provides a system for detecting voice endpoints which, as shown in Figure 3, includes: an audio signal preprocessing unit 31, a feature extraction unit 32, a first-direction search and threshold adaptation unit 33, a second endpoint position detection unit 34, and a second-direction search and first endpoint position detection unit 35.

[0151] The audio signal preprocessing unit 31 is configured to preprocess the audio signal of each frame of the audio segment. Specifically, the preprocessing performed by the audio signal preprocessing unit 31 includes, but is not limited to, one or any combination of the following: pre-emphasis, fast Fourier transform (FFT), and sub-band division of each frame of the audio signal.

[0152] The feature extraction unit 32 is configured to extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame. Specifically, the purpose of feature extraction performed by...



Abstract

The invention provides a method and system for detecting speech endpoints, which relate to the technical field of computers and are used to solve the poor performance of existing speech endpoint detection technology. The method includes: A1, searching the frames of an input audio segment in a first-direction sequence and, if the feature value of the current frame is greater than or equal to the current threshold, updating the current threshold using that feature value; A2, judging that a second speech endpoint of the audio segment has been detected if the feature values of N consecutive frames are smaller than the current threshold, and stopping the first-direction search; and A3, starting a search from the second speech endpoint in a second-direction sequence, and using the feature values of N consecutive frames to detect a first speech endpoint of the audio segment. N is a preset number of frames. The system comprises a forward search and threshold adaptation unit, an end point position detection unit, and a reverse search and start point position detection unit. The method and system are applicable to all audio search environments.
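Steps A1 to A3 can be combined into one sketch. It reuses a hypothetical smoothing update for the threshold, takes the first direction to be forward and the second to be reverse (as the system's unit names suggest), and further assumes that the end point is the last frame before the run of N below-threshold frames, that at least one frame must reach the threshold before an end point can be declared, and that the reverse search compares frames against the final threshold. The function name and return convention (a `(start, end)` frame-index pair, or `None` when no end point is found in the segment) are likewise illustrative.

```python
from typing import List, Optional, Tuple


def detect_endpoints(
    features: List[float], initial_threshold: float, n: int, beta: float = 0.9
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the speech in a segment, or None."""
    threshold = initial_threshold
    below = 0             # consecutive frames below the current threshold
    seen_speech = False   # assumption: an end point needs at least one frame at/above threshold
    end: Optional[int] = None

    # A1 + A2: forward (first-direction) search with threshold adaptation.
    for i, f in enumerate(features):
        if f >= threshold:
            threshold = beta * threshold + (1.0 - beta) * f  # hypothetical update rule
            below = 0
            seen_speech = True
        else:
            below += 1
            if seen_speech and below == n:
                end = i - n   # last frame before the run of N quiet frames
                break
    if end is None:
        return None       # no end point found in this segment

    # A3: reverse (second-direction) search from the end point.
    start = end
    below = 0
    for i in range(end, -1, -1):
        if features[i] >= threshold:
            start = i
            below = 0
        else:
            below += 1
            if below == n:
                break
    return start, end
```

For example, `detect_endpoints([5, 6, 5, 6, 0, 0, 0, 0], 2.0, 3)` returns `(0, 3)`: in this sketch, speech that begins at the very first frame is still bounded, because no threshold is trained on a presumed-noise lead-in.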

Description

Technical field

[0001] The present invention relates to the field of computer technology, and in particular to a method and system for detecting voice endpoints.

Background art

[0002] Existing voice endpoint detection technology rests on the premise that the initial segment of the audio is noise, and it trains the threshold on this initial noise segment. In mobile phone voice search applications this assumption does not always hold: users sometimes start speaking immediately after pressing the search button. The threshold training of the existing endpoint detection technology then goes wrong, and the start and end points are detected inaccurately.

[0003] Based on the above assumption, the existing voice endpoint detection method assumes that the initial segment is a non-voice segment, divides the voice signal into frames, extract...
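To make the failure mode in [0002] concrete, here is a hedged sketch of a conventional detector that trains a fixed threshold on the first few frames. The training rule (mean of the first `init_frames` feature values times a `margin`) and the sample feature values are hypothetical; [0003] is truncated before it describes the actual training.

```python
from statistics import mean
from typing import List


def conventional_threshold(features: List[float], init_frames: int = 5, margin: float = 2.0) -> float:
    """Hypothetical conventional rule: assume the first frames are noise and scale their mean."""
    return margin * mean(features[:init_frames])


def speech_frames(features: List[float], threshold: float) -> List[int]:
    """Indices of frames whose feature value reaches the fixed threshold."""
    return [i for i, f in enumerate(features) if f >= threshold]


if __name__ == "__main__":
    quiet_start = [1, 1, 1, 1, 1, 6, 7, 6, 1, 1]
    print(speech_frames(quiet_start, conventional_threshold(quiet_start)))  # [5, 6, 7]
    immediate = [6, 7, 6, 7, 6, 1, 1]
    print(speech_frames(immediate, conventional_threshold(immediate)))      # []
```

With a quiet lead-in the trained threshold is 2.0 and frames 5 to 7 are flagged; when the user speaks from frame 0, the "noise" estimate is really speech, the threshold rises to 12.8, and no frame is flagged at all.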


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G10L25/54; G10L25/06
Inventor: 宋辉 (Song Hui)
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD