Method and system for detecting speech endpoints
A speech endpoint detection technology, applied in the computer field, that addresses problems such as the poor performance of detection technology.
Examples
Embodiment 1
[0081] Embodiment 1. This embodiment provides a method for detecting voice endpoints which, as shown in Figure 1, includes the following steps:
[0082] S11. Perform preprocessing on each frame of the audio signal of the input audio segment.
[0083] The audio signal preprocessing includes, but is not limited to, one or any combination of the following: pre-emphasis of each frame of audio signal (i.e. boosting the power of the high-frequency part), fast Fourier transform (FFT), sub-band division, etc.
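The three preprocessing operations named in [0083] can be illustrated with a minimal sketch. It is not the patent's implementation: the function names, the pre-emphasis coefficient alpha = 0.97, the number of sub-bands, and the equal-width band split are all assumptions chosen for illustration, and a naive O(N^2) discrete Fourier transform stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math


def pre_emphasis(frame, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not frame:
        return []
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def power_spectrum(frame):
    """Power of each non-negative frequency bin, via a naive DFT (stand-in for FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_energies(spectrum, num_bands=4):
    """Split the bins into num_bands contiguous groups and sum each group."""
    size = len(spectrum) // num_bands
    bands = [sum(spectrum[b * size:(b + 1) * size]) for b in range(num_bands - 1)]
    bands.append(sum(spectrum[(num_bands - 1) * size:]))  # last band takes the remainder
    return bands


def preprocess(frame, alpha=0.97, num_bands=4):
    """Pre-emphasis, then spectrum, then sub-band division, for one frame."""
    return subband_energies(power_spectrum(pre_emphasis(frame, alpha)), num_bands)
```

Calling preprocess on one frame of samples returns one energy value per sub-band, which is the kind of input a per-frame feature extractor can consume.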
[0084] S12. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.
[0085] The purpose of feature extraction is to extract one or several features from each frame of audio signal in order to distinguish speech frames from non-speech frames. The extracted feature values include, but are not limited to, one or any combination of the following: sub-band spectral entropy, energy, zero-crossing rate, and correlation. In this embodime...
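Three of the features listed in [0085] have widely used textbook definitions, sketched below for a single frame. The function names are hypothetical, and the exact formulas used in the patent are not visible in this excerpt, so these definitions are assumptions rather than the patented computation. The entropy function expects per-band energies such as those produced by a sub-band division step.

```python
import math


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def subband_spectral_entropy(band_energies):
    """Shannon entropy of the normalized sub-band energy distribution."""
    total = sum(band_energies)
    if total <= 0:
        return 0.0
    probs = [e / total for e in band_energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

With this definition, energy spread evenly across the sub-bands gives the maximum entropy (the natural log of the number of bands), while energy concentrated in a single band gives zero.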
Embodiment 2
[0119] Embodiment 2. This embodiment provides a method for detecting voice endpoints which, as shown in Figure 2, includes the following steps:
[0120] S21. Perform preprocessing on each frame of the audio signal of the input audio segment.
[0121] The specific description is consistent with S11 and will not be repeated here.
[0122] S22. Extract the feature value of each frame of audio signal from the preprocessed audio signal of each frame.
[0123] The specific description is consistent with that of S12 and will not be repeated here.
[0124] S23. Search the frames of the input audio segment in forward order; if the feature value of the current frame is greater than or equal to the current threshold value, update the current threshold value using the feature value of the current frame.
[0125] In this embodiment, it is assumed that in the forward search process of the previous audio segment, the last frame of the previous audio segment is searched, and...
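The threshold update in step S23 can be sketched as a single forward pass over the per-frame feature values. The excerpt does not state how the feature value is used to update the threshold, so the exponential smoothing rule and its weight beta below are assumptions made for illustration only, as is the function name; frames whose feature value falls below the threshold simply leave it unchanged in this sketch.

```python
def forward_search(features, initial_threshold, beta=0.9):
    """Scan frames in forward order and return the threshold after each frame.

    features: one feature value per frame, in time order.
    beta: assumed smoothing weight; not taken from the patent text.
    """
    threshold = initial_threshold
    trace = []
    for value in features:
        if value >= threshold:
            # Assumed update rule: blend the old threshold with the frame's feature value.
            threshold = beta * threshold + (1.0 - beta) * value
        trace.append(threshold)
    return trace
```

For example, forward_search([1.0, 3.0, 2.0], 2.0, beta=0.5) leaves the threshold at 2.0 for the first frame, raises it to 2.5 on the second, and keeps it at 2.5 on the third.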
Embodiment 3
[0150] Embodiment 3. This embodiment provides a system for detecting voice endpoints which, as shown in Figure 3, includes: an audio signal preprocessing unit 31, a feature extraction unit 32, a first direction search and threshold adaptive unit 33, a second end point position detection unit 34, and a second direction search and first end point position detection unit 35.
[0151] The audio signal preprocessing unit 31 is configured to preprocess each frame of the audio signal of the audio segment. Specifically, the preprocessing performed by the audio signal preprocessing unit 31 includes, but is not limited to, one or any combination of the following: pre-emphasis of each frame of audio signal, fast Fourier transform (FFT), sub-band division, etc.
[0152] The feature extraction unit 32 is configured to extract the feature value of each frame of audio signal from each preprocessed frame of audio signal. Specifically, the purpose of feature ext...