An audio synthesis method based on an improved VITS model, and a storage medium

The method improves the loss function of the VITS model and optimizes its hyperparameters to address the limited flexibility and naturalness of traditional speech synthesis in the telecommunications field. The result is efficient, natural-sounding speech synthesis and a better user experience for intelligent customer service in telecommunications.

CN120895023B · Active · Publication Date: 2026-06-23 · JIANGSU ZHIHENG INFORMATION TECH SERVICES CO LTD

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: JIANGSU ZHIHENG INFORMATION TECH SERVICES CO LTD
Filing Date: 2025-09-11
Publication Date: 2026-06-23

Application Information

Patent Timeline

11 Sep 2025: Application
23 Jun 2026: Publication (CN120895023B)

IPC: G10L13/10; G10L13/027; G10L13/04; G10L21/02; G10L25/87; G06N3/0475; G06N3/0455

AI Tagging

Application Domain

Biological models; Speech synthesis

Technology Topics

Audio synthesis; Text entry

Figure CN120895023B_ABST

Abstract

The application discloses an audio synthesis method based on an improved VITS model, and a storage medium, and belongs to the technical field of speech synthesis. The method comprises the following steps: obtaining the text of the audio data to be synthesized and preprocessing the text; inputting the preprocessed text into a pre-trained adaptive speech synthesis model, AdaVITS, to perform audio synthesis; and obtaining the generated audio data from the output of AdaVITS. AdaVITS is built on the speech synthesis model VITS: the loss function of VITS is improved and extended to obtain the joint loss function of AdaVITS, and this joint loss function is then optimized so that speech quality and training efficiency improve together.
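
The three steps in the abstract, together with the idea of a joint loss, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not disclose the preprocessing rules, the AdaVITS architecture, or which terms and weights make up the joint loss. The function names `preprocess`, `synthesize`, and `joint_loss`, the stand-in `dummy_model`, and the term names `recon`, `kl`, and `extra` are all hypothetical. The joint loss is shown simply as a weighted sum of named terms, which is one common way to combine losses and not necessarily the patent's formulation.

```python
import re
from typing import Callable, Dict, List


def preprocess(text: str) -> str:
    """Hypothetical preprocessing: collapse whitespace, trim, lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def synthesize(text: str, model: Callable[[str], List[float]]) -> List[float]:
    """Follow the abstract's flow: take text, preprocess it, run the model,
    and return the model output as the generated audio samples."""
    return model(preprocess(text))


def joint_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of named loss terms (assumed form of a joint loss).
    Terms without an explicit weight default to a weight of 1.0."""
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


def dummy_model(clean_text: str) -> List[float]:
    """Stand-in for a trained model: emits one silent sample per character."""
    return [0.0] * len(clean_text)


# Inference: text in, audio samples out.
audio = synthesize("  Hello   World ", dummy_model)

# Training-side illustration: combine per-term losses into one scalar.
# "recon" and "kl" stand for typical VITS terms; "extra" is a hypothetical
# added term. The weights are arbitrary example hyperparameters.
loss = joint_loss(
    {"recon": 2.0, "kl": 0.5, "extra": 1.0},
    {"recon": 2.0, "extra": 0.5},
)
```

In a real system the stand-in model would be replaced by the trained network, and the loss weights would be among the hyperparameters tuned during training.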