Synthesized speech identification method, apparatus and system, storage medium, and device
A multi-dimensional feature extraction and clustering identification model addresses the low accuracy of existing technologies in identifying highly realistic AI-synthesized speech. It recognizes synthesized speech from different speakers effectively and improves the security of identity authentication.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CHINA TELECOM ARTIFICIAL INTELLIGENCE TECHNOLOGY (BEIJING) CO LTD
- Filing Date
- 2025-09-05
- Publication Date
- 2026-06-18
AI Technical Summary
Existing synthetic speech identification technologies have low accuracy when faced with highly realistic AI-forged synthetic audio. In particular, they struggle to identify synthetic speech across speakers with different voice characteristics, which creates security risks for identity authentication.
A target dataset is constructed from real and synthetic speech data from multiple speakers. A discrimination model consisting of a feature extraction module, a classification module, and a judgment module extracts feature vectors of multiple dimensions and clusters them to generate speaker categories. The authenticity of the speech data is then judged based on the category centroids and a similarity threshold.
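The clustering and judgment steps described above can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the summary does not name the clustering algorithm, the similarity measure, or the threshold value, so plain k-means, cosine similarity, and a threshold of 0.9 are assumptions, as are the function names cluster_speakers, cosine, and judge. It also assumes the centroids are built from feature vectors of real speech and that a similarity at or above the threshold is treated as genuine.

```python
import numpy as np

def cluster_speakers(features, k, iters=20):
    """Group feature vectors into k speaker categories.

    Plain k-means is an assumed stand-in for the unspecified clustering step.
    """
    # Deterministic farthest-point initialisation, starting from the first vector.
    centroids = [features[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(iters):
        # Assign each vector to its nearest centroid, then recompute centroids.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids, labels

def cosine(a, b):
    """Cosine similarity (assumed similarity measure)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def judge(vector, centroids, threshold=0.9):
    """Return (nearest category, similarity, judged genuine?) for one vector."""
    sims = [cosine(vector, c) for c in centroids]
    best = int(np.argmax(sims))
    return best, sims[best], sims[best] >= threshold

# Toy 3-dimensional "feature vectors" for real speech from two speakers (made-up data).
real = np.array([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
centroids, labels = cluster_speakers(real, k=2)

close_to_speaker = np.array([0.95, 0.05, 0.05])  # near one centroid
far_from_all = np.array([0.5, 0.5, 0.7])         # near neither centroid
print(judge(close_to_speaker, centroids))
print(judge(far_from_all, centroids))
```

In this toy setup the first vector lands close to one speaker centroid and passes the threshold, while the second is far from both centroids and is flagged. A real system would take its vectors from the feature extraction module and would tune the threshold on the target dataset.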
The method improves the accuracy and reliability of identifying synthesized speech from different speakers, effectively recognizes highly realistic AI-synthesized speech, and enhances the security of identity authentication.