Speech real-time variable-speed play method and device
A playback method and device, applied in speech analysis, transmission systems, electrical components, etc., that can solve problems such as being unable to hear the other party's speech clearly and the other party speaking at a high speed.
Examples
Embodiment 1
[0126] As shown in Figure 2, the device capable of realizing real-time variable-speed voice playback mainly includes the following modules: a Ring Buffer module 201, a Voice Activity Detection (VAD) module 202, a Waveform Similarity Overlap-Add (WSOLA) module 203, and a rate adjustment module 204.
[0127] Wherein, the ring buffer module 201 is mainly responsible for controlling the inflow and outflow of voice data stored in the ring buffer.
[0128] The VAD module 202 is mainly responsible for detecting speech segments and non-speech segments. Specifically, the detection may be performed once every preset time interval, for example every 20 milliseconds (ms).
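The 20 ms detection interval can be illustrated with a minimal frame-based sketch. It uses a simple energy threshold as a stand-in for the decision logic, which is an assumption made for illustration and not the VAD algorithm of the patent. The function name `frame_energy_vad`, the 8000 Hz sample rate, and the threshold value are all hypothetical.

```python
import math


def frame_energy_vad(samples, sample_rate=8000, frame_ms=20, threshold=1e-3):
    """Label each frame_ms-long frame as speech (True) or non-speech (False).

    Assumption: a mean-energy threshold stands in for a real VAD decision.
    """
    frame_len = sample_rate * frame_ms // 1000  # 160 samples at 8 kHz
    decisions = []
    # Step through the signal one full frame at a time; a trailing
    # partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        decisions.append(energy > threshold)
    return decisions


# Usage: one silent frame followed by one frame containing a 440 Hz tone.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
decisions = frame_energy_vad(silence + tone)
```

A production VAD would normally add smoothing across frames so that short pauses inside a word are not labeled as non-speech.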
[0129] The WSOLA module 203 is mainly responsible for adjusting the duration of the voice data to be played according to the play mode control command issued by the rate adjustment module 204, thereby controlling the normal play, fast play or slow play of the ...
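How a WSOLA-style module can change the duration of voice data may be sketched as follows. This is a simplified, unoptimized illustration of the general waveform-similarity overlap-add idea, not the implementation described in the patent. The function name `wsola_stretch`, the frame length, the search tolerance, and the `speed` parameter (greater than 1 for fast play, less than 1 for slow play) are assumptions.

```python
import math


def wsola_stretch(x, speed, frame=160, tol=40):
    """Time-scale x by `speed` using a basic WSOLA-style overlap-add."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    hs = frame // 2                 # synthesis hop (50% overlap)
    ha = max(1, round(hs * speed))  # analysis hop in the input signal
    # Periodic Hann window: copies shifted by frame/2 sum to 1.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    out = []
    prev = 0  # input position of the previously selected frame
    k = 0
    while True:
        target = k * ha        # nominal input position of frame k
        template = prev + hs   # natural continuation of the last frame
        if target + tol + frame > len(x) or template + frame > len(x):
            break
        start = 0
        if k > 0:
            # Search around the nominal position for the segment most
            # similar (highest cross-correlation) to the template.
            best = -math.inf
            for d in range(-tol, tol + 1):
                c = target + d
                if c < 0:
                    continue
                score = sum(x[c + i] * x[template + i] for i in range(frame))
                if score > best:
                    best, start = score, c
        # Overlap-add the windowed frame at the output position.
        pos = k * hs
        if len(out) < pos + frame:
            out.extend([0.0] * (pos + frame - len(out)))
        for i in range(frame):
            out[pos + i] += win[i] * x[start + i]
        prev = start
        k += 1
    return out


# Usage: a 0.5 s, 200 Hz tone at 8 kHz played fast and slow.
x = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(4000)]
fast = wsola_stretch(x, 2.0)   # roughly half as many samples
slow = wsola_stretch(x, 0.5)   # roughly twice as many samples
```

The similarity search is what distinguishes this approach from plain overlap-add: picking the best-aligned segment avoids phase discontinuities at frame boundaries, so the pitch is preserved while the duration changes.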
Embodiment 2
[0138] Based on the device provided by the first specific embodiment, the Ring Buffer module 201 in the device is mainly responsible for managing the voice data flow. It judges the current state of the ring buffer according to the available data volume of the ring buffer and the configured upper or lower limit, and notifies the rate adjustment module of that state. The rate adjustment module then issues appropriate control instructions according to the data storage status of the ring buffer, so that the ring buffer neither overflows because playback is too slow nor underflows because playback is too fast.
[0139] The Ring Buffer module 201 includes at least three member variables: a first read pointer (reader), a write pointer (writer), and the available data volume. Figure 4 is a schematic diagram of the Ring Buffer module 201 reading and writing data packets in the ring buffer.
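The state judgment described above, comparing the available data volume against a lower and an upper limit, might look like the following sketch. The state names, the function name `classify_buffer`, and the inclusive comparisons are assumptions made for illustration; the source does not specify them.

```python
from enum import Enum


class BufferState(Enum):
    LOW = "low"        # close to underflow
    NORMAL = "normal"
    HIGH = "high"      # close to overflow


def classify_buffer(available, lower_limit, upper_limit):
    """Map the available data volume to a coarse buffer state."""
    if lower_limit > upper_limit:
        raise ValueError("lower_limit must not exceed upper_limit")
    if available <= lower_limit:
        return BufferState.LOW
    if available >= upper_limit:
        return BufferState.HIGH
    return BufferState.NORMAL


# Usage: hypothetical limits of 320 and 3200 samples.
state = classify_buffer(available=1600, lower_limit=320, upper_limit=3200)
```

The resulting state is what would be reported to the rate adjustment module, which can then pick a playback mode that moves the buffer back toward the normal range.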
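A minimal sketch of a ring buffer built around the three member variables named above (reader, writer, and available data volume) is shown here. The class layout, the method names, and the choice to reject a write or read that does not fit are assumptions, not details taken from the patent.

```python
class RingBuffer:
    """Fixed-capacity ring buffer tracking reader, writer, and available."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [0] * capacity
        self.reader = 0      # index of the next sample to read
        self.writer = 0      # index of the next slot to write
        self.available = 0   # number of unread samples

    def write(self, data):
        """Store data; return False (and store nothing) if it would overflow."""
        if len(data) > self.capacity - self.available:
            return False
        for sample in data:
            self.buf[self.writer] = sample
            self.writer = (self.writer + 1) % self.capacity  # wrap around
        self.available += len(data)
        return True

    def read(self, n):
        """Return n samples, or None if fewer than n are available."""
        if n > self.available:
            return None
        out = []
        for _ in range(n):
            out.append(self.buf[self.reader])
            self.reader = (self.reader + 1) % self.capacity  # wrap around
        self.available -= n
        return out


# Usage: the second write wraps past the end of the underlying list.
rb = RingBuffer(4)
rb.write([1, 2, 3])
first = rb.read(2)
rb.write([4, 5, 6])
rest = rb.read(4)
```

Tracking `available` explicitly removes the usual ambiguity between a full and an empty buffer when `reader == writer`.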
[0140] When voice ...
Embodiment 3
[0143] Based on the device provided by the first specific embodiment, the VAD module 202 in the device is mainly responsible for reading voice data from the ring buffer to perform the VAD decision and sending the decision result to the rate adjustment module. According to that result, the rate adjustment module determines the playback mode control command that controls the WSOLA module to adjust the voice duration, realizing normal, fast, or slow playback by the device, so that the limited buffer resources of the ring buffer are used as fully as possible to slow down speech segments.
[0144] Various VAD algorithms can be used for the VAD decision. In this specific embodiment, the VAD algorithm of an Adaptive Multi-Rate (AMR) encoder is taken as an example for illustration.
[0145] Figure 5 is a schematic diagram of the process of the VAD reading voice data from the ring buffer. The second read pointer is set in the Ring Buffer module 201, and the VAD m...
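One way a rate adjustment module could combine the VAD decision with the buffer status is sketched below. The specific policy (slow down speech while the buffer has room, speed through non-speech to free space, and let overflow protection take priority) is an assumption chosen to match the stated goal; the source does not give the exact rules. The names `PlayMode` and `choose_play_mode` and the string buffer states are hypothetical.

```python
from enum import Enum


class PlayMode(Enum):
    NORMAL = "normal"
    FAST = "fast"
    SLOW = "slow"


def choose_play_mode(is_speech, buffer_state):
    """Pick a playback mode from the VAD result and the buffer state.

    buffer_state is one of "low", "normal", "high" (hypothetical labels).
    """
    if buffer_state not in ("low", "normal", "high"):
        raise ValueError("unknown buffer state: %r" % (buffer_state,))
    if buffer_state == "high":
        # Slow play would risk overflow, so drain the buffer.
        return PlayMode.FAST
    if is_speech:
        # Buffer has room: spend it on slowing down speech.
        return PlayMode.SLOW
    if buffer_state == "low":
        # Little data left: fast play would risk underflow.
        return PlayMode.NORMAL
    # Non-speech with a comfortable buffer: play fast to free space.
    return PlayMode.FAST


# Usage: a speech frame arrives while the buffer is in its normal range.
mode = choose_play_mode(is_speech=True, buffer_state="normal")
```

Under this assumed policy, pauses between utterances pay for the extra time spent on speech, which keeps the overall playback delay bounded.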