An audio synthesis method based on an improved VITS model, and a storage medium
By improving the loss function and optimizing the hyperparameters of the VITS model, the method addresses the limited flexibility and naturalness of traditional speech synthesis technology in the telecommunications field. It achieves efficient, natural-sounding speech synthesis and improves the user experience of intelligent customer service in telecommunications.
CN120895023B · Active · Publication Date: 2026-06-23 · JIANGSU ZHIHENG INFORMATION TECH SERVICES CO LTD
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: JIANGSU ZHIHENG INFORMATION TECH SERVICES CO LTD
- Filing Date: 2025-09-11
- Publication Date: 2026-06-23
Abstract
The application discloses an audio synthesis method based on an improved VITS model, and a storage medium, and belongs to the technical field of speech synthesis. The method comprises the following steps: obtaining the text of the audio data to be synthesized and preprocessing the text; inputting the preprocessed text into a pre-trained adaptive speech synthesis model, AdaVITS, to perform audio synthesis; and obtaining the generated audio data from the output of AdaVITS. AdaVITS is built on the speech synthesis model VITS: the loss function of VITS is modified and extended with additional terms to obtain the joint loss function of AdaVITS, and this joint loss function is then solved by optimization, so that speech quality and training efficiency are optimized together.
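The joint loss described in the abstract can be pictured as a weighted sum of individual loss terms. The following is only a minimal sketch under stated assumptions: the abstract does not list the terms or their weights, so the names `recon`, `kl`, and `extra`, the function `joint_loss`, and all numeric values are hypothetical placeholders rather than the patented formulation.

```python
from typing import Mapping


def joint_loss(terms: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Combine per-term loss values into one scalar as a weighted sum.

    Hypothetical illustration: the term names and weights are assumptions,
    not values taken from the patent.
    """
    # Every weighted term must have a computed loss value.
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"no loss value for terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Placeholder per-batch loss values and weights (assumed, for illustration).
terms = {"recon": 2.0, "kl": 0.5, "extra": 1.0}
weights = {"recon": 1.0, "kl": 2.0, "extra": 0.5}

print(joint_loss(terms, weights))  # 1.0*2.0 + 2.0*0.5 + 0.5*1.0 = 3.5
```

In a sketch like this, the weights are hyperparameters: tuning them is one way the balance between speech quality and training cost could be adjusted, although the abstract does not say how the applicant performs that tuning.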