Deep learning-based adaptive voice speed playback system, method therefor, and computer program therefor
The deep learning-based system dynamically adjusts phoneme speeds per sentence to enhance speech playback quality and naturalness, addressing the limitations of uniform speed ratios in existing systems.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: INDUSTRY UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY
- Filing Date: 2025-10-24
- Publication Date: 2026-06-18
AI Technical Summary
Existing speech speed playback systems apply uniform speed ratios to all phonemes without considering sentence context, leading to signal distortion and poor performance, especially for synthesized speech.
The proposed deep learning-based system dynamically adjusts phoneme-level pronunciation speeds using a speech speed predictor and a generative model. It incorporates a variational autoencoder and a flow model to adapt the speech speed to the sentence context.
The system provides natural and flexible speech playback by adjusting phoneme speeds per sentence. It improves sound quality while preserving acoustic features such as pitch and timbre, and it is reported to outperform conventional methods on multiple evaluation metrics.
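The difference between a uniform speed ratio and per-phoneme speed adjustment can be illustrated with a minimal sketch. Everything named here is hypothetical: the function names, the `relative_rates` values (which stand in for the output of a speech speed predictor), and the normalization step that forces the sentence to reach the requested overall speed. The summary does not say how the predictor output is combined with the target speed, and the variational autoencoder and flow model are not modeled at all.

```python
def uniform_durations(durations, target_ratio):
    """Baseline: scale every phoneme duration by the same speed ratio."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    return [d / target_ratio for d in durations]


def adaptive_durations(durations, relative_rates, target_ratio):
    """Scale each phoneme by its own rate while matching the overall ratio.

    relative_rates are assumed predictor outputs (> 0); a larger value means
    the phoneme is compressed more than its neighbours. This normalization
    is an illustrative assumption, not the patented method.
    """
    if len(durations) != len(relative_rates):
        raise ValueError("durations and relative_rates must have equal length")
    if target_ratio <= 0 or any(r <= 0 for r in relative_rates):
        raise ValueError("target_ratio and relative_rates must be positive")

    # Per-phoneme durations before the sentence-level constraint is applied.
    raw = [d / r for d, r in zip(durations, relative_rates)]

    # Rescale so the sentence length equals original_length / target_ratio.
    target_total = sum(durations) / target_ratio
    scale = target_total / sum(raw)
    return [x * scale for x in raw]


# Hypothetical sentence of three phonemes, durations in frames.
durations = [10.0, 20.0, 10.0]
# Assumed predictor output: the middle phoneme tolerates more compression.
relative_rates = [1.0, 2.0, 1.0]

uniform = uniform_durations(durations, target_ratio=2.0)
adaptive = adaptive_durations(durations, relative_rates, target_ratio=2.0)
print("uniform: ", uniform)
print("adaptive:", adaptive)
```

Both variants produce a sentence of the same total length, but the adaptive one distributes the compression unevenly across phonemes. When all relative rates are equal, the adaptive function reduces to the uniform baseline.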