Voice activity detection method and device
An endpoint detection and voice technology, applied in voice analysis, speech recognition, instruments, etc. It addresses the low accuracy of speech endpoint detection and helps ensure detection accuracy.
Examples
example 1
[0064] Example 1. The voice signal contains overlapping speech. For example, the text information obtained for the voice signal is as follows:
[0065] User A: I don't know if that's really true;
[0066] User B: Really.
[0067] Analysis: The utterances of user A and user B both contain the same word, "really", so the speech overlaps. Overlapping speech is non-linguistic information, but it is none of the three kinds that matter here: dragging, hesitation, or delay information. Therefore, the detection duration used when detecting the speech end point is handled according to the second duration in step 203.
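The selection rule applied in this analysis (and again for the low-pitched speech in Example 2) can be sketched as a small lookup. This is a minimal illustration under stated assumptions, not the patent's implementation: the `NonLinguisticInfo` labels and `select_detection_duration` are hypothetical names, and the patent excerpt gives no concrete values for the first and second durations, so callers must supply them.

```python
from enum import Enum, auto


class NonLinguisticInfo(Enum):
    # Hypothetical labels for the kinds of non-linguistic information named in the examples.
    DRAGGING = auto()
    HESITATION = auto()
    DELAY = auto()
    OVERLAP = auto()
    LOW_PITCH = auto()


# Per the examples, only these three kinds switch detection to the first duration.
FIRST_DURATION_TRIGGERS = frozenset(
    {NonLinguisticInfo.DRAGGING, NonLinguisticInfo.HESITATION, NonLinguisticInfo.DELAY}
)


def select_detection_duration(detected, first_duration_ms, second_duration_ms):
    """Return the end-point detection duration for the detected non-linguistic info.

    Any dragging, hesitation, or delay information selects the first duration;
    anything else (e.g. overlap or low pitch), or nothing at all, selects the second.
    """
    if any(info in FIRST_DURATION_TRIGGERS for info in detected):
        return first_duration_ms
    return second_duration_ms


if __name__ == "__main__":
    # 1000 and 500 are arbitrary placeholder values, not taken from the patent.
    print(select_detection_duration({NonLinguisticInfo.OVERLAP}, 1000, 500))
```

Using a frozen set of trigger categories keeps the rule in one place, so adding or removing a category does not require touching the decision function.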
example 2
[0068] Example 2. The voice signal contains low, deep speech. For example, the text information obtained for the voice signal is as follows:
[0069] User A: Let's wait;
[0070] User B: OK.
[0071] Analysis: The acquired voice signal shows that both user A and user B speak in a low-pitched voice. This describes the speaker's state, which is non-linguistic information, but it is none of dragging, hesitation, or delay information. Therefore, the detection duration used when detecting the speech end point is handled according to the second duration in step 203.
example 3
[0072] Example 3. The speech signal contains speech pauses. For example, the text information obtained for the speech signal is as follows:
[0073] User A: He drives (200 ms pause) up the hill; (1.3 s turn-taking gap)
[0074] User B: Really? (150 ms pause) How far?
[0075] Analysis: The acquired voice signal shows that user A pauses between "drives" and "up the hill", and user B pauses between "Really?" and "How far?". These pauses are delay information within the non-linguistic feature information, so the detection duration used when detecting the speech end point is handled according to the first duration in step 203.
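The excerpt does not say how the pauses in Example 3 are detected. One plausible approach, assumed here purely for illustration, is to compare timestamps of consecutive units within a single speaker's utterance and flag any gap at or above a threshold as delay information. The `Word` type, both function names, the 100 ms threshold, and all timestamps below are hypothetical; only the 200 ms and 150 ms gaps mirror the example. The 1.3 s turn-taking gap falls between speakers, so this within-utterance check does not consider it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    # Hypothetical timed unit (a word or short phrase) from a single speaker.
    text: str
    start_ms: int
    end_ms: int


def find_pauses(words, min_pause_ms):
    """Return (previous_text, next_text, gap_ms) for each within-utterance gap >= min_pause_ms."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start_ms - prev.end_ms
        if gap >= min_pause_ms:
            pauses.append((prev.text, nxt.text, gap))
    return pauses


def has_delay_info(words, min_pause_ms=100):
    """Treat any sufficiently long internal pause as delay information (assumed rule)."""
    return bool(find_pauses(words, min_pause_ms))


if __name__ == "__main__":
    # Timestamps are invented so that the gaps match Example 3 (200 ms and 150 ms).
    user_a = [Word("He drives", 0, 600), Word("up the hill", 800, 1500)]
    user_b = [Word("Really?", 0, 400), Word("How far?", 550, 1000)]
    print(find_pauses(user_a, 100))  # [('He drives', 'up the hill', 200)]
    print(has_delay_info(user_b))    # True
```

A detection result of `True` would then feed the duration choice in step 203, selecting the first duration as the analysis above describes.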