Method and system for detecting speech endpoints

A speech endpoint detection technique, applied in the computer field, that addresses problems such as the poor performance of existing detection technology.

Active Publication Date: 2012-06-27

BEIJING BAIDU NETCOM SCI & TECH CO LTD



Problems solved by technology

[0004] The present invention provides a method and system for detecting voice endpoints, to solve the problem of poor performance of existing voice endpoint detection technology.


Examples


Embodiment 1

[0081] Embodiment 1. This embodiment provides a method for detecting voice endpoints which, as shown in Figure 1, includes the following steps:

[0082] S11. Perform preprocessing on each frame of the audio signal of the input audio segment.

[0083] Specific audio signal preprocessing includes, but is not limited to, one or any combination of the following: pre-emphasis of each frame of the audio signal (i.e. boosting the power of the high-frequency part), fast Fourier transform (FFT), and sub-band division.
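As a rough illustration of the preprocessing options listed for S11, the following Python sketch applies pre-emphasis to one frame, computes its power spectrum, and groups the spectral bins into sub-bands. The pre-emphasis coefficient 0.97, the function names, and the equal-width band split are assumptions made for illustration and are not taken from the patent. A naive DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
from typing import List


def pre_emphasis(frame: List[float], alpha: float = 0.97) -> List[float]:
    """Boost high frequencies of a non-empty frame: y[n] = x[n] - alpha * x[n-1].

    The value of alpha is an assumption, not a figure from the patent.
    """
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def power_spectrum(frame: List[float]) -> List[float]:
    """Power of DFT bins 0..N/2, computed with a naive O(N^2) DFT (FFT stand-in)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_energies(spectrum: List[float], n_bands: int) -> List[float]:
    """Sum bin power over n_bands contiguous bands; leftover bins join the last band."""
    size = len(spectrum) // n_bands
    bands = [sum(spectrum[b * size:(b + 1) * size]) for b in range(n_bands - 1)]
    bands.append(sum(spectrum[(n_bands - 1) * size:]))
    return bands
```

In practice a library FFT would replace the naive DFT; the per-band energies returned by the hypothetical `subband_energies` are the kind of input a sub-band feature would consume.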

[0084] S12. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.

[0085] The purpose of feature extraction is to extract one or several features from each frame of the audio signal in order to distinguish speech frames from non-speech frames. The extracted feature values include, but are not limited to, one or any combination of the following: sub-band spectral entropy, energy, zero-crossing rate, and correlation. In this embodime...
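The features named for S12 can be sketched per frame as follows. This is a minimal illustration using common textbook definitions, which the patent excerpt does not spell out; the function names are hypothetical, and the entropy function assumes the sub-band energies have already been computed.

```python
import math
from typing import List


def short_time_energy(frame: List[float]) -> float:
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def subband_spectral_entropy(band_energies: List[float]) -> float:
    """Shannon entropy (in nats) of the normalized sub-band energy distribution."""
    total = sum(band_energies)
    if total <= 0:
        return 0.0
    probs = [e / total for e in band_energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

Under this definition a flat spectrum gives the maximum entropy for the number of bands, while a spectrum concentrated in one band gives an entropy of zero.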

Embodiment 2

[0119] Embodiment 2. This embodiment provides a method for detecting voice endpoints which, as shown in Figure 2, includes the following steps:

[0120] S21. Perform preprocessing on each frame of the audio signal of the input audio segment.

[0121] The specific description is consistent with S11 and will not be repeated here.

[0122] S22. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.

[0123] The specific description is consistent with that of S12 and will not be repeated here.

[0124] S23. Search each frame in the input audio segment in forward order, and if the feature value of the current frame is greater than or equal to the current threshold value, update the current threshold value using the feature value of the current frame.
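The threshold update in S23 can be pictured with a short sketch. The excerpt does not give the update formula, so the exponential smoothing rule below, the smoothing factor `beta`, and the function name are assumptions chosen only to show the control flow: the threshold changes only when a frame's feature value reaches it.

```python
from typing import List


def forward_thresholds(features: List[float], init_threshold: float,
                       beta: float = 0.1) -> List[float]:
    """Return the threshold in effect after each frame of a forward search."""
    threshold = init_threshold
    trace = []
    for value in features:
        if value >= threshold:
            # Assumed update rule: move the threshold a fraction beta toward the feature.
            threshold = (1.0 - beta) * threshold + beta * value
        trace.append(threshold)
    return trace
```

With this assumed rule, frames below the threshold leave it unchanged, so the threshold only ever rises during the forward pass.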

[0125] In this embodiment, it is assumed that in the forward search process of the previous audio segment, the last frame of the previous audio segment is searched, and...

Embodiment 3

[0150] Embodiment 3. This embodiment provides a system for detecting voice endpoints which, as shown in Figure 3, includes: an audio signal preprocessing unit 31, a feature extraction unit 32, a first direction search and threshold adaptive unit 33, a second end point position detection unit 34, and a second direction search and first end point position detection unit 35.

[0151] The audio signal preprocessing unit 31 is configured to preprocess each frame of the audio signal of the audio segment. Specifically, the preprocessing performed by the audio signal preprocessing unit 31 includes, but is not limited to, one or any combination of the following: pre-emphasis of each frame of the audio signal, fast Fourier transform (FFT), and sub-band division.

[0152] The feature extraction unit 32 is configured to extract the feature value of each frame of the audio signal from each preprocessed frame. Specifically, the purpose of feature ext...


Abstract

The invention provides a method and a system for detecting speech endpoints, which relate to the technical field of computers and address the poor performance of existing speech endpoint detection technology. The method includes: A1, searching the frames of an input audio segment in a first direction and, if the feature value of the current frame is greater than or equal to the current threshold value, updating the current threshold value using the feature value of the current frame; A2, determining that a second speech endpoint of the audio segment has been detected if the feature values of N consecutive frames are smaller than the current threshold value, and stopping the first-direction search; and A3, starting a search from the second speech endpoint in a second direction, and using the feature values of N consecutive frames to detect a first speech endpoint of the audio segment, where N is a preset number of frames. The system comprises a forward search and threshold self-adaptive unit, an end point position detection unit, and a reverse search and start point position detection unit. The method and system are applicable to all audio search environments.
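Steps A1 to A3 can be sketched end to end as below. This is an illustrative reading of the abstract, not the patented implementation. The following points are assumptions: the smoothing update rule and `beta`, ignoring sub-threshold frames until at least one frame has reached the threshold, placing the end point on the last frame before the run of N low frames, reusing the adapted threshold in the reverse pass, and defaulting the start point to frame 0 when the reverse pass finds no run of N low frames.

```python
from typing import List, Optional, Tuple


def detect_endpoints(features: List[float], init_threshold: float,
                     n_frames: int, beta: float = 0.1) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, or None if no end point is found."""
    threshold = init_threshold
    seen_speech = False
    below = 0
    end = None

    # A1/A2: forward search with threshold adaptation.
    for i, value in enumerate(features):
        if value >= threshold:
            threshold = (1.0 - beta) * threshold + beta * value  # assumed rule
            seen_speech = True
            below = 0
        elif seen_speech:
            below += 1
            if below == n_frames:
                end = i - n_frames  # last frame before the low run (assumed)
                break
    if end is None:
        return None

    # A3: reverse search from the end point using the adapted threshold.
    start = 0  # assumed default: speech began with the first frame
    below = 0
    for j in range(end, -1, -1):
        if features[j] < threshold:
            below += 1
            if below == n_frames:
                start = j + n_frames  # first frame after the low run
                break
        else:
            below = 0
    return start, end
```

A short usage example with made-up feature values: the first call has leading noise, while the second models a user who starts talking immediately, so the start point falls on frame 0.

```python
print(detect_endpoints([0.2, 0.3, 0.2, 0.3, 5, 6, 7, 6, 5, 0.4, 0.3, 0.4, 0.3], 1.0, 3))  # (4, 8)
print(detect_endpoints([5, 6, 7, 6, 5, 0.4, 0.3, 0.4], 1.0, 3))  # (0, 4)
```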

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and system for detecting voice endpoints.

Background

[0002] Existing speech endpoint detection technology rests on the premise that the initial segment of the audio is noise, and the threshold value is trained on that initial noise segment. This assumption does not always hold in mobile phone voice search applications: sometimes the user starts talking immediately after pressing the search button, and detection is then inaccurate.

[0003] Based on the above assumption, the existing speech endpoint detection method assumes that the initial segment is a non-speech segment, divides the speech signal into frames, extracts feature values frame by frame, and compares each feature value with the threshold value set in advance. If it is greater than or equal to the threshold value, it is judged as sp...
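The limitation described in the background can be made concrete with a small sketch of a fixed-threshold detector. The way the threshold is derived here (a multiple of the mean feature value over the first few frames), the parameter values, and the function name are assumptions for illustration; the excerpt only says that the threshold is trained on the initial segment.

```python
from typing import List


def fixed_threshold_vad(features: List[float], n_init: int = 3,
                        factor: float = 1.5) -> List[bool]:
    """Label each frame as speech (True) using a threshold from the first n_init frames."""
    # Assumed training rule: the initial frames are treated as noise.
    noise_level = sum(features[:n_init]) / n_init
    threshold = factor * noise_level
    return [value >= threshold for value in features]
```

When the segment really starts with noise, as in `[0.2, 0.3, 0.4, 5.0, 6.0, 0.3]`, the two loud frames are labeled as speech. When the user speaks from the first frame, as in `[5, 6, 7, 6, 5, 0.4, 0.3]`, the threshold is trained on speech and no frame reaches it, which is the failure mode the background describes.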


Application Information


IPC(8): G10L11/02, G10L25/06

Inventor: 宋辉

Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD
