An audio synthesis method based on an improved VITS model, and a storage medium

By improving the loss function and tuning the hyperparameters of the VITS model, the method addresses the limited flexibility and naturalness of traditional speech synthesis in the telecommunications field, achieving efficient, natural-sounding speech synthesis and improving the user experience of intelligent customer service in telecommunications.

CN120895023B · Active · Publication Date: 2026-06-23 · JIANGSU ZHIHENG INFORMATION TECH SERVICES CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
JIANGSU ZHIHENG INFORMATION TECH SERVICES CO LTD
Filing Date
2025-09-11
Publication Date
2026-06-23

Abstract

The application discloses an audio synthesis method based on an improved VITS model, and a storage medium, and belongs to the technical field of speech synthesis. The method comprises the following steps: obtaining the text of the audio data to be synthesized and preprocessing the text; inputting the preprocessed text into a pre-trained adaptive speech synthesis model, AdaVITS, to perform audio synthesis; and obtaining the generated audio data from the output of AdaVITS. AdaVITS is built on the speech synthesis model VITS: the loss function of VITS is modified and extended with additional terms to obtain the joint loss function of AdaVITS, and this joint loss function is solved by optimization, so that speech quality and training efficiency are optimized together.
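
As a rough illustration of what a joint loss of this kind might look like, the sketch below combines the generator-side loss terms of the original VITS training objective (reconstruction, KL divergence, duration, adversarial, feature matching) with optional additional terms as a weighted sum. The abstract does not state which terms AdaVITS adds or how they are weighted, so the function name `joint_loss`, the `extra_terms` argument, and every weight shown are assumptions made for illustration, not the patented formulation.

```python
# Illustrative sketch only. Term names follow the original VITS objective;
# the extra terms and all weights are hypothetical placeholders.
VITS_TERMS = ("recon", "kl", "duration", "adversarial", "feature_matching")


def joint_loss(terms, weights=None, extra_terms=None):
    """Return a weighted sum of VITS loss terms plus optional extra terms.

    terms:       dict mapping each name in VITS_TERMS to a scalar loss value
    weights:     dict mapping term names to weights (default weight is 1.0)
    extra_terms: dict of additional scalar loss values (hypothetical)
    """
    weights = weights or {}
    missing = [name for name in VITS_TERMS if name not in terms]
    if missing:
        raise KeyError(f"missing loss terms: {missing}")

    # Base VITS objective: weighted sum over the five standard terms.
    total = sum(weights.get(name, 1.0) * terms[name] for name in VITS_TERMS)

    # Additional terms that turn the base objective into a joint loss.
    for name, value in (extra_terms or {}).items():
        total += weights.get(name, 1.0) * value
    return total


if __name__ == "__main__":
    step_losses = {
        "recon": 2.0,
        "kl": 0.5,
        "duration": 0.25,
        "adversarial": 1.0,
        "feature_matching": 0.5,
    }
    print(joint_loss(step_losses, {"recon": 2.0, "extra": 0.5}, {"extra": 1.0}))
```

In a real training loop each term would be a tensor computed from the model outputs and the weights would be tuned hyperparameters; plain floats are used here only to keep the sketch self-contained.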