Deep learning-based adaptive voice speed playback system, method therefor, and computer program therefor

The deep learning-based system dynamically adjusts phoneme speeds per sentence to enhance speech playback quality and naturalness, addressing the limitations of uniform speed ratios in existing systems.

WO2026127353A1 · Publication Date: 2026-06-18 · INDUSTRY UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY


Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: INDUSTRY UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY
Filing Date: 2025-10-24
Publication Date: 2026-06-18

Application Information


IPC: G10L21/043; G10L13/10; G10L15/00; G10L21/10; G10L25/18; G06N3/084; G06N3/0475; G06N3/0455

AI Tagging

Application Domain

Biological models; Speech recognition


AI Technical Summary

Technical Problem

Existing speech speed playback systems apply uniform speed ratios to all phonemes without considering sentence context, leading to signal distortion and poor performance, especially for synthesized speech.

Method Used

A deep learning-based system that dynamically adjusts phoneme-level pronunciation speeds using a speech speed predictor and generative model, incorporating a variational autoencoder and flow model to adaptively modify speech speed based on sentence context.
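The two generative building blocks named above can be illustrated generically. The following is a minimal NumPy sketch, not the patented model: it shows the standard VAE reparameterization step and one invertible affine coupling layer of the kind commonly used in flow models. The names `reparameterize` and `AffineCoupling`, the fixed random weights, and the choice of affine coupling are all assumptions made for illustration; the source does not specify which flow architecture is used.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """VAE sampling step: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


class AffineCoupling:
    """One invertible affine coupling layer (hypothetical, untrained weights).

    The first half of each vector passes through unchanged and
    parameterizes a scale and shift applied to the second half.
    """

    def __init__(self, dim, rng):
        self.half = dim // 2
        rest = dim - self.half
        # Illustrative linear "networks"; a real model would learn these.
        self.w_s = 0.1 * rng.standard_normal((self.half, rest))
        self.w_t = 0.1 * rng.standard_normal((self.half, rest))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s = np.tanh(x1 @ self.w_s)   # bounded log-scale
        t = x1 @ self.w_t            # shift
        y2 = x2 * np.exp(s) + t
        log_det = s.sum(axis=1)      # log |det Jacobian| per sample
        return np.concatenate([x1, y2], axis=1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s = np.tanh(y1 @ self.w_s)
        t = y1 @ self.w_t
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2], axis=1)
```

Because the layer is exactly invertible and its log-determinant is cheap to compute, stacks of such layers can map between a simple latent distribution and a more complex one, which is the general role the summary assigns to the flow model.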

Benefits of Technology

The system provides natural and flexible speech playback by adjusting phoneme speeds per sentence, improving sound quality and maintaining acoustic features like pitch and timbre, outperforming conventional methods in various evaluation metrics.

✦ Generated by Eureka AI based on patent content.


Abstract

Disclosed are a deep learning-based adaptive voice speed playback system and a method therefor. The deep learning-based adaptive voice speed playback system according to one disclosed embodiment comprises: an encoding module extracting language features on the basis of an input text sequence, extracting acoustic features on the basis of original voice data, predicting a phoneme-level pronunciation playback rate for each sentence on the basis of the language features, and combining Gaussian upsampled language features and the acoustic features using the predicted phoneme-level pronunciation playback rate to output adaptive acoustic feature data in which the pronunciation speed of each phoneme is dynamically adjusted for each sentence; and a decoding module providing a speed-adjusted voice signal in the form of an original voice waveform on the basis of the adaptive acoustic feature data using a deep learning-based generative model, wherein the deep learning-based generative model is generated by integrating a variational autoencoder (VAE) for modeling a probability distribution of the adaptive acoustic feature data and a flow model for converting the probability distribution predicted by the VAE into a mel-spectrogram.
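The Gaussian upsampling step mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the disclosed implementation: it assumes each phoneme has a known duration in frames, that a playback rate above 1 shortens a phoneme by dividing its duration by the rate, and that a single fixed `sigma` controls the width of every Gaussian. The function name `gaussian_upsample` and all parameter names are hypothetical.

```python
import numpy as np


def gaussian_upsample(features, durations, rates, sigma=1.0):
    """Expand phoneme-level features to frame level with Gaussian weights.

    features:  (N, D) one feature vector per phoneme
    durations: (N,)   original phoneme durations in frames
    rates:     (N,)   per-phoneme playback rate (assumed: >1 means faster)
    Returns (frames, weights) with shapes (T, D) and (T, N).
    """
    features = np.asarray(features, dtype=float)
    # Assumption: speeding a phoneme up by rate r shortens it to d / r.
    scaled = np.asarray(durations, dtype=float) / np.asarray(rates, dtype=float)
    ends = np.cumsum(scaled)
    centers = ends - scaled / 2.0              # midpoint of each phoneme
    num_frames = max(1, int(round(ends[-1])))
    t = np.arange(num_frames) + 0.5            # midpoint of each output frame
    # Unnormalized Gaussian log-weights of every frame against every phoneme.
    logits = -((t[:, None] - centers[None, :]) ** 2) / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features, weights
```

With three phonemes of 4 frames each and rates of 1, 2, and 1, the middle phoneme shrinks to 2 frames and the output has 10 frames instead of 12, while the outer phonemes keep their original length. A uniform speed ratio would instead shorten all three equally.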
