Speech enhancement method and device, training method and device, computer device, and storage medium

By combining a decomposition codec with diffusion-model processing, the method decouples the semantic and acoustic attributes of the speech signal, addressing the loss of semantic information and the distortion of acoustic features found in traditional methods. This enables efficient speech enhancement in complex noise environments and improves the accuracy and security of voice interaction.

CN121034331B · Active · Publication Date: 2026-06-23 · PING AN TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: PING AN TECH (SHENZHEN) CO LTD
Filing Date: 2025-09-05
Publication Date: 2026-06-23

Figure CN121034331B_ABST

Abstract

The present application relates to the technical field of speech processing and can be applied in fields such as finance and medicine. It discloses a speech enhancement method, a training method, corresponding devices, computer equipment and a storage medium. The speech enhancement method comprises: receiving noisy speech as input and encoding it with a pre-trained decomposition codec to obtain a hidden representation; performing step-by-step denoising on the hidden representation with a pre-trained semantic diffusion model to obtain a semantic token sequence corresponding to clean speech; using the semantic token sequence together with the hidden representation of the noisy speech as the condition, and performing step-by-step denoising with a pre-trained acoustic diffusion model to obtain an acoustic token sequence corresponding to clean speech; and feeding the semantic token sequence and the acoustic token sequence into the decoder of the decomposition codec to reconstruct a clean speech signal. The present application significantly improves the robustness of speech enhancement in complex noise environments and reduces noise interference with the key attributes of speech.
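The four inference stages in the abstract (encode, semantic denoising, conditioned acoustic denoising, decode) can be illustrated with a minimal toy sketch. It is not the patented implementation. Every name here (`encode`, `decode`, `reverse_diffusion`, `semantic_denoiser`, `acoustic_denoiser`, `enhance`, `FRAME`, and the two codebooks) is hypothetical. The trained networks are replaced by hand-written stand-ins, and "step-by-step denoising" is simplified to a deterministic update that moves a noise sample toward the denoiser's clean estimate at each step. The sketch only shows how the data flows: semantic tokens are produced first, and then they condition the acoustic stage together with the noisy hidden representation.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Denoiser = Callable[[List[Vector], List[Vector]], List[Vector]]

FRAME = 4                                     # hypothetical samples per token
SEMANTIC_CODEBOOK = [[-1.0], [0.0], [1.0]]    # hypothetical coarse "content" levels
ACOUSTIC_CODEBOOK = [[-0.25], [0.0], [0.25]]  # hypothetical fine "detail" offsets


def nearest_token(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Quantize a continuous vector to the index of the closest codebook entry."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))


def encode(wave: Sequence[float]) -> List[Vector]:
    """Toy codec encoder: one hidden vector (the frame mean) per FRAME samples."""
    assert len(wave) % FRAME == 0, "toy encoder expects whole frames"
    return [[sum(wave[i:i + FRAME]) / FRAME] for i in range(0, len(wave), FRAME)]


def decode(semantic: Sequence[int], acoustic: Sequence[int]) -> List[float]:
    """Toy codec decoder: semantic level plus acoustic offset, held for one frame."""
    out: List[float] = []
    for s, a in zip(semantic, acoustic):
        out.extend([SEMANTIC_CODEBOOK[s][0] + ACOUSTIC_CODEBOOK[a][0]] * FRAME)
    return out


def reverse_diffusion(condition: List[Vector], predict_clean: Denoiser,
                      num_steps: int, dim: int, rng: random.Random) -> List[Vector]:
    """Simplified step-by-step denoising: start from Gaussian noise and blend the
    sample toward the clean estimate with weight 1/step (weight 1.0 on the last step)."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in condition]
    for step in range(num_steps, 0, -1):
        x0_hat = predict_clean(x, condition)
        w = 1.0 / step
        x = [[(1.0 - w) * xi + w * ei for xi, ei in zip(xv, ev)]
             for xv, ev in zip(x, x0_hat)]
    return x


def semantic_denoiser(x: List[Vector], cond: List[Vector]) -> List[Vector]:
    # Stand-in for a trained network: estimates the clean semantic latent from the
    # noisy hidden representation (it ignores x to keep the toy simple).
    return [SEMANTIC_CODEBOOK[nearest_token(h, SEMANTIC_CODEBOOK)] for h in cond]


def acoustic_denoiser(x: List[Vector], cond: List[Vector]) -> List[Vector]:
    # cond[i] = [semantic value, hidden value]; the estimate is the acoustic entry
    # closest to the residual that the semantic level does not explain.
    return [ACOUSTIC_CODEBOOK[nearest_token([c[1] - c[0]], ACOUSTIC_CODEBOOK)]
            for c in cond]


def enhance(noisy: Sequence[float], num_steps: int = 8, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    hidden = encode(noisy)                                        # 1. hidden representation
    sem_latent = reverse_diffusion(hidden, semantic_denoiser, num_steps, 1, rng)
    semantic = [nearest_token(v, SEMANTIC_CODEBOOK) for v in sem_latent]  # 2. semantic tokens
    cond = [SEMANTIC_CODEBOOK[s] + h for s, h in zip(semantic, hidden)]   # 3. joint condition
    ac_latent = reverse_diffusion(cond, acoustic_denoiser, num_steps, 1, rng)
    acoustic = [nearest_token(v, ACOUSTIC_CODEBOOK) for v in ac_latent]
    return decode(semantic, acoustic)                             # 4. reconstruct speech


if __name__ == "__main__":
    clean = [1.25] * 4 + [-1.0] * 4 + [0.25] * 4
    noisy = [c + n for c, n in zip(clean, [0.1, -0.05, 0.08, -0.03] * 3)]
    print(enhance(noisy) == clean)  # True
```

In this toy, the split between the stages is easy to see. The semantic stage commits to the coarse content level first. The acoustic stage is conditioned on both that decision and the noisy hidden representation, so it only has to recover the remaining fine detail. A real system would use learned encoders, decoders, and denoising networks operating on far richer representations.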