Voice end detection method, device, terminal and storage medium

A speech endpoint detection technology, applied in the computer field, that addresses problems such as false truncation, slow speech rates, and inaccurate speech recognition, thereby improving recognition accuracy.

Active Publication Date: 2019-05-17
APOLLO INTELLIGENT CONNECTIVITY (BEIJING) TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, everyone speaks at a different rate: some people speak quickly and others slowly. If the same VAD parameter threshold is used for everyone during speech endpoint detection, speech recognition will work well for some people, ...



Examples


Embodiment 1

[0023] Figure 1 is a flowchart of the voice endpoint detection method provided by Embodiment 1 of the present invention. This embodiment applies to detecting the user's voice endpoint during voice-based human-computer interaction, and the method can be executed by a voice endpoint detection device. The device can be implemented in software and/or hardware and can be integrated into a terminal with a voice recognition function, such as a smart mobile terminal or a vehicle-mounted device.

[0024] As shown in Figure 1, the voice endpoint detection method provided in this embodiment may include:

[0025] S110. Determine whether the difference between the user's current speech rate and the historical average speech rate is within a preset difference range.

[0026] During the voice interaction process between the user and the terminal, the terminal can call a voice collection device, such as a microphone, to obtain the...
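
The check in S110 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `is_rate_in_range`, the unit (for example syllables per second), and the default range of -1.0 to 1.0 are hypothetical and do not come from the patent text, which only says that the difference is compared against a preset difference range.

```python
def is_rate_in_range(current_rate, historical_avg, diff_range=(-1.0, 1.0)):
    """Return True when (current_rate - historical_avg) lies inside diff_range.

    diff_range is a hypothetical (low, high) pair standing in for the
    "preset difference range"; the patent text does not give its values.
    """
    low, high = diff_range
    return low <= current_rate - historical_avg <= high


# A speaker close to their usual rate needs no threshold adjustment;
# a speaker far from it triggers the adjustment step.
needs_adjustment = not is_rate_in_range(6.0, 4.0)
```

Modelling the range as a signed interval rather than a single absolute tolerance leaves room for treating faster-than-usual and slower-than-usual speech differently, which is a design choice of this sketch rather than something the source specifies.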

Embodiment 2

[0038] Figure 2 is a flowchart of the speech endpoint detection method provided by Embodiment 2 of the present invention. This embodiment is further optimized on the basis of the foregoing embodiments. As shown in Figure 2, the method may include:

[0039] S210. Determine whether the difference between the user's current speech rate and the historical average speech rate is within a preset difference range.

[0040] S220. If the difference between the current speech rate and the historical average speech rate is not within the preset difference range, then in the next speech endpoint detection process adjacent to the current one, extend by the target duration from the moment when the user's speech energy starts to decrease, and use the speech energy at the end of that extension as the speech energy threshold, wherein the target duration is a preset length of time determined according to the current speech rate.
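
One possible reading of S220 is sketched below in Python. Several details are assumptions made for illustration and are not stated in the patent text: speech energy is represented as a list of per-frame values, the frame at which energy starts to decrease is passed in as `decline_start`, and the hypothetical lookup `target_duration_ms` gives slower speakers a longer extension. The source says only that the target duration is preset according to the current speech rate.

```python
def target_duration_ms(speech_rate):
    """Hypothetical mapping from speech rate to extension length in ms.

    The breakpoints and durations are illustrative placeholders.
    """
    if speech_rate < 3.0:
        return 300
    if speech_rate < 5.0:
        return 200
    return 100


def adjusted_energy_threshold(energies, decline_start, speech_rate, frame_ms=10):
    """Return the energy found target-duration after the decline starts.

    energies      : per-frame speech energy values (assumed representation)
    decline_start : index of the frame where energy begins to fall
    frame_ms      : frame length in milliseconds (assumed value)
    """
    offset = target_duration_ms(speech_rate) // frame_ms
    # Clamp so the lookup never runs past the last available frame.
    index = min(decline_start + offset, len(energies) - 1)
    return energies[index]
```

With a decaying energy tail, a longer extension lands on a lower energy value, so under these assumptions a slow speaker gets a lower threshold and the end of speech is declared later.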

[00...

Embodiment 3

[0047] Figure 3 is a schematic structural diagram of a speech endpoint detection device provided in Embodiment 3 of the present invention. This embodiment applies to detecting a user's speech endpoint during speech-based human-computer interaction. The device can be implemented in software and/or hardware and can be integrated into a terminal with a voice recognition function, such as a smart mobile terminal or a vehicle-mounted device.

[0048] As shown in Figure 3, the speech endpoint detection device provided in this embodiment may include a speech rate determination module 310 and a speech energy threshold adjustment module 320, wherein:

[0049] Speech rate determination module 310, configured to determine whether the difference between the user's current speech rate and the historical average speech rate is within a preset difference range;

[0050] Speech energy threshold adjustment module 320, configured to, if the dif...


Abstract

The embodiments of the invention disclose a voice endpoint detection method, device, terminal and storage medium. The method comprises: determining whether the difference between a user's current speech rate and historical average speech rate is within a preset difference range; if the difference is not within the preset difference range, adjusting the voice energy threshold of voice endpoint detection according to the current speech rate, and determining the user's voice end point according to the adjusted voice energy threshold in the next voice endpoint detection process adjacent to the current one. The method solves the problem that sharing parameter thresholds across users in conventional voice endpoint detection lowers the accuracy of voice recognition results; it enables personalized detection of different users' voice endpoints and improves voice recognition accuracy.

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computer technology, and in particular to a voice endpoint detection method, device, terminal and storage medium.

Background

[0002] The traditional voice endpoint detection (Voice Activity Detection, VAD) algorithm mainly relies on indicators such as zero-crossing rate and sound level to detect whether a sentence has ended. If the voice energy of a preset number M0 of consecutive frames in the acquired voice stream is below the predefined energy threshold Elow, and the energy of the following M0 consecutive frames is greater than Elow, then the point where the voice energy increases is the start endpoint of the speech. Similarly, if the voice energy of several consecutive frames is high and the energy of the subsequent frames becomes smaller, that is, less than the specified energy threshold Ehigh, and l...
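
The start-endpoint rule described in the background can be illustrated with a minimal Python sketch. It covers only the start endpoint, because the description of the end endpoint is cut off in this text. The function name `find_start_endpoint` and the list-of-energies representation are assumptions for illustration; `e_low` and `m0` correspond to Elow and M0 above.

```python
def find_start_endpoint(energies, e_low, m0):
    """Return the first frame index where speech is judged to start.

    A frame i is the start endpoint when the m0 frames before it all have
    energy below e_low and the m0 frames from i onward all exceed e_low.
    Returns None when no such frame exists.
    """
    for i in range(m0, len(energies) - m0 + 1):
        before = energies[i - m0:i]
        after = energies[i:i + m0]
        if all(e < e_low for e in before) and all(e > e_low for e in after):
            return i
    return None


# Three quiet frames followed by three loud ones: speech starts at frame 3.
start = find_start_endpoint([1, 1, 1, 5, 6, 7, 1], e_low=3, m0=3)
```

Because `e_low` and `m0` are fixed for every speaker in this scheme, the sketch also shows where a per-user adjustment would have to intervene.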


Application Information

IPC(8): G10L25/87, G10L25/78, G10L25/21
Inventor: 欧阳能钧, 贺学焱, 彭汉迎
Owner: APOLLO INTELLIGENT CONNECTIVITY (BEIJING) TECH CO LTD