Speech real-time variable-speed play method and device

A playback method and device, applied in speech analysis, transmission systems, electrical components, etc., that addresses the problem of a listener being unable to hear the other party's speech clearly when the other party speaks at a high speed.

Active Publication Date: 2018-02-23
CHINA ACAD OF TELECOMM TECH
7 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0004] First, when people with hearing problems, such as the hearing-impaired or the elderly, answer the phone, they often cannot make out the other party's speech because the other party speaks too fast.



Examples


Embodiment 1

[0126] As shown in Figure 2, the device capable of real-time variable-speed voice playback mainly includes the following modules: a Ring Buffer module 201, a Voice Activity Detection (VAD) module 202, a Waveform Similarity Overlap-Add (WSOLA) module 203, and a rate adjustment module 204.

[0127] Wherein, the ring buffer module 201 is mainly responsible for controlling the inflow and outflow of voice data stored in the ring buffer.

[0128] The VAD module 202 is mainly responsible for detecting speech segments and non-speech segments. Specifically, detection may be performed once per preset time interval, for example every 20 milliseconds (ms).
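
The per-interval detection described in [0128] can be illustrated with a minimal sketch. It uses a simple frame-energy threshold, which is not the VAD algorithm of the patent (the third embodiment names the AMR encoder's VAD as its example). The 8 kHz sample rate, the threshold value, and the names `frame_energy` and `detect_speech` are assumptions made only for illustration; the 20 ms interval comes from the text.

```python
from typing import List, Sequence

FRAME_MS = 20        # detection interval given as an example in the text
SAMPLE_RATE = 8000   # assumption: narrowband telephony sampling rate
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per 20 ms frame


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_speech(samples: Sequence[int], threshold: float = 1.0e5) -> List[bool]:
    """Return one flag per complete 20 ms frame: True = speech, False = non-speech.

    The energy threshold is a hypothetical tuning value, not from the source.
    A trailing partial frame is ignored.
    """
    flags = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        flags.append(frame_energy(frame) > threshold)
    return flags
```

A production detector would normally add smoothing (hangover) so that short pauses inside a word are not flagged as non-speech.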

[0129] The WSOLA module 203 is mainly responsible for adjusting the duration of the voice data to be played according to the play mode control command issued by the rate adjustment module 204, thereby controlling the normal play, fast play or slow play of the ...
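
As a rough illustration of how a WSOLA-style module can change the duration of voice data, the sketch below implements a basic waveform-similarity overlap-add in pure Python. It is a generic textbook-style version under stated assumptions, not the implementation of module 203: the `rate` parameter (greater than 1 for fast play, less than 1 for slow play), the frame length, the search tolerance, and the function name `wsola` are all hypothetical.

```python
import math
from typing import List, Sequence


def wsola(x: Sequence[float], rate: float,
          frame_len: int = 320, tolerance: int = 80) -> List[float]:
    """Time-scale x by `rate` (>1 shortens/fast play, <1 lengthens/slow play).

    frame_len must be even. All names and defaults here are assumptions.
    """
    if rate <= 0 or frame_len < 2 or frame_len % 2 or tolerance < 0:
        raise ValueError("rate > 0, even frame_len >= 2, tolerance >= 0 required")
    hop_out = frame_len // 2                  # synthesis hop (50% overlap)
    hop_in = max(1, round(hop_out * rate))    # analysis hop
    # Periodic Hann window: overlapping halves sum to 1 at 50% overlap.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    # Zero-pad so every candidate segment stays inside the signal.
    padded = ([0.0] * tolerance + [float(v) for v in x]
              + [0.0] * (frame_len + tolerance + hop_out))
    n_frames = max(1, (len(x) - frame_len) // hop_in + 1)
    out = [0.0] * ((n_frames - 1) * hop_out + frame_len)
    prev = tolerance  # start (in padded) of the previously chosen segment
    for k in range(n_frames):
        nominal = tolerance + k * hop_in
        best = nominal
        if k > 0:
            # Natural continuation of the previous segment is the template.
            start = prev + hop_out
            target = padded[start:start + frame_len]
            best_score = -math.inf
            # Search around the nominal position for the most similar segment.
            for cand in range(nominal - tolerance, nominal + tolerance + 1):
                segment = padded[cand:cand + frame_len]
                score = sum(a * b for a, b in zip(segment, target))
                if score > best_score:
                    best, best_score = cand, score
        base = k * hop_out
        for n in range(frame_len):  # windowed overlap-add into the output
            out[base + n] += window[n] * padded[best + n]
        prev = best
    return out
```

For example, `wsola(samples, 0.5)` returns roughly twice as many samples as the input (slow play), and `wsola(samples, 2.0)` roughly half (fast play). The first and last half-frames of the output are tapered by the window, an edge effect this sketch does not correct.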

Embodiment 2

[0138] Based on the device provided by the first embodiment, the Ring Buffer module 201 is mainly responsible for managing the voice data flow. It determines the current state of the ring buffer from the amount of available data in the buffer and the configured upper and lower limits, and notifies the rate adjustment module of that state. The rate adjustment module then issues appropriate control instructions according to the reported data storage state, to ensure that the ring buffer does not overflow or underflow as a result of slow or fast playback.
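
The state determination in [0138] can be sketched as a comparison of the available data volume against the two limits. The state names, the function name `classify_buffer`, and the choice of inclusive comparisons are assumptions; the source only says that the state is derived from the available data volume and the configured upper or lower limit.

```python
from enum import Enum


class BufferState(Enum):
    """Hypothetical state labels reported to the rate adjustment module."""
    NEAR_UNDERFLOW = "near_underflow"
    NORMAL = "normal"
    NEAR_OVERFLOW = "near_overflow"


def classify_buffer(available: int, lower_limit: int, upper_limit: int) -> BufferState:
    """Classify the buffer from its available data volume and two limits."""
    if lower_limit > upper_limit:
        raise ValueError("lower_limit must not exceed upper_limit")
    if available <= lower_limit:
        return BufferState.NEAR_UNDERFLOW   # too little data buffered
    if available >= upper_limit:
        return BufferState.NEAR_OVERFLOW    # too much data buffered
    return BufferState.NORMAL
```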

[0139] The Ring Buffer module 201 includes at least three member variables: a first read pointer (reader), a write pointer (writer), and the available data volume. Figure 4 is a schematic diagram of the Ring Buffer module 201 reading and writing data packets in the ring buffer.
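
A ring buffer built around the three member variables named in [0139] might look like the following minimal sketch. The fixed capacity, the policy of dropping samples that do not fit, and the method names `write` and `read` are assumptions; the source does not specify them.

```python
from typing import Iterable, List


class RingBuffer:
    """Fixed-capacity ring buffer with reader, writer and available count."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._buf: List[int] = [0] * capacity
        self.reader = 0      # index of the next sample to read
        self.writer = 0      # index of the next free slot
        self.available = 0   # number of samples currently stored

    def write(self, samples: Iterable[int]) -> int:
        """Store samples; returns how many were accepted (excess is dropped)."""
        free = self.capacity - self.available
        accepted = list(samples)[:free]
        for s in accepted:
            self._buf[self.writer] = s
            self.writer = (self.writer + 1) % self.capacity  # wrap around
        self.available += len(accepted)
        return len(accepted)

    def read(self, count: int) -> List[int]:
        """Remove and return up to `count` samples in arrival order."""
        count = max(0, min(count, self.available))
        out = []
        for _ in range(count):
            out.append(self._buf[self.reader])
            self.reader = (self.reader + 1) % self.capacity  # wrap around
        self.available -= count
        return out
```

Tracking `available` explicitly removes the ambiguity between a full and an empty buffer, which both have `reader == writer`.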

[0140] When voice ...

Embodiment 3

[0143] Based on the device provided by the first embodiment, the VAD module 202 is mainly responsible for reading voice data from the ring buffer, making the VAD decision, and sending the decision result to the rate adjustment module. Based on that result, the rate adjustment module determines the playback mode control command, which directs the WSOLA module to adjust the voice duration so that the device plays normally, fast, or slowly. This makes maximum use of the limited ring buffer resources for slowing down speech segments.
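
One way the rate adjustment module could combine its inputs is sketched below. The source states only that the decision depends on the control instruction, the buffer storage state, and the VAD result; the specific policy here (slow down speech, speed up non-speech, and override the request near the buffer limits) is a hypothetical illustration, as are the names `PlayMode`, `BufferState`, and `choose_play_mode`.

```python
from enum import Enum


class PlayMode(Enum):
    NORMAL = "normal"
    FAST = "fast"
    SLOW = "slow"


class BufferState(Enum):
    NEAR_UNDERFLOW = "near_underflow"
    NORMAL = "normal"
    NEAR_OVERFLOW = "near_overflow"


def choose_play_mode(requested: PlayMode, state: BufferState,
                     is_speech: bool) -> PlayMode:
    """Pick the play mode for the next frame (hypothetical policy)."""
    if state is BufferState.NEAR_OVERFLOW:
        # Buffer almost full: drain it regardless of the request.
        return PlayMode.FAST
    if state is BufferState.NEAR_UNDERFLOW:
        # Buffer almost empty: never play fast; slow down only real speech.
        if requested is PlayMode.SLOW and is_speech:
            return PlayMode.SLOW
        return PlayMode.NORMAL
    if requested is PlayMode.SLOW:
        # Spend buffer space on speech, recover it during non-speech.
        return PlayMode.SLOW if is_speech else PlayMode.FAST
    return requested
```

Under this policy, a listener who requests slow play hears speech segments stretched while pauses are compressed, which keeps the accumulated delay within the buffer's limits.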

[0144] Various VAD algorithms can be used for the VAD decision. In this embodiment, the VAD algorithm of the Adaptive Multi-Rate (AMR) encoder is taken as an example for illustration.

[0145] Figure 5 is a schematic diagram of the process by which the VAD module reads voice data from the ring buffer. A second read pointer is set in the Ring Buffer module 201, and the VAD m...


Abstract

The invention discloses a method and device for real-time variable-speed speech playback, providing a solution for variable-speed speech playback during real-time speech communication. The method comprises the following steps: a device receives speech data and a control instruction specifying the speech playback rate; the device saves the speech data to a buffer and determines the data storage state of the buffer; the device reads the speech data to be played from the buffer and performs voice activity detection on it to obtain a detection result; and the device adjusts the duration of the speech data to be played according to the control instruction, the data storage state of the buffer, and the detection result.

Description

Technical field

[0001] The invention relates to the technical field of audio signal processing, and in particular to a method and device for real-time variable-speed voice playback.

Background

[0002] In real-time voice communication, the receiving end plays the voice at the same speed at which the sending end sends it.

[0003] However, in practical applications, the following scenarios often arise:

[0004] First, when people with hearing problems, such as the hearing-impaired or the elderly, answer the phone, they often cannot make out the other party's speech because the other party speaks too fast;

[0005] Second, when a person with normal hearing answers a call from a foreigner, they often cannot follow the other party's speech clearly because they cannot react to some key information.

[0006] In view of this, it is necessary to realize variable-speed voice playback during real-time voice communication.

Conten...


Application Information

IPC(8): G10L21/043, G10L21/057, H04L29/06
CPC: G10L21/043, G10L21/057, H04L65/613, H04L65/762
Inventors: 邹莹, 梁民
Owner: CHINA ACAD OF TELECOMM TECH