Voiceprint recognition method and system based on fusion of multiple voice features
A voiceprint recognition technology based on voice features, applied in voice analysis, instruments, etc. It addresses problems such as low recognition accuracy and the vulnerability of voice to channel and environment changes, and achieves the effect of improving the accuracy of the algorithm.
Specific Embodiments
[0105] Acquire the original audio;
[0106] Extract the original spectral features from the original audio, aggregate them, and output the first feature vector;
[0107] Extract the MFCC features from the original audio, aggregate them, and output the second feature vector;
[0108] Input the first feature vector and the second feature vector into the deep neural network (DNN) for feature fusion, and output the third feature vector;
[0109] Speaker classification is performed according to the third feature vector.
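The two feature streams of steps [0106] and [0107] can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: it assumes 16 kHz audio, 25 ms Hamming-windowed frames with a 10 ms hop, a 512-point FFT, 40 mel filters and 13 cepstral coefficients, none of which are specified in the text above. The log power spectrogram stands in for the "original spectral features", and all helper names (`power_spectrum`, `log_spectrogram`, `mel_filterbank`, `dct_matrix`, `mfcc`) are hypothetical.

```python
import numpy as np


def power_spectrum(x, frame_len=400, hop=160, n_fft=512):
    """Hamming-windowed short-time power spectrum, shape (n_frames, n_fft // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:  # pad very short clips to a single frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2


def log_spectrogram(x, **kw):
    """Assumed 'original spectral feature': log power spectrogram, one row per frame."""
    return np.log(power_spectrum(x, **kw) + 1e-10)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II matrix used to decorrelate log mel energies."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    d = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    d[0] /= np.sqrt(2.0)
    return d


def mfcc(x, sr=16000, n_mels=40, n_mfcc=13, n_fft=512, **kw):
    """MFCC matrix of shape (n_frames, n_mfcc)."""
    power = power_spectrum(x, n_fft=n_fft, **kw)
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    return log_mel @ dct_matrix(n_mfcc, n_mels).T
```

For one second of 16 kHz audio these defaults yield 98 frames, i.e. a 98 × 257 spectrogram and a 98 × 13 MFCC matrix; both are variable-length in time, which is why each branch needs an aggregation step.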
[0110] In the above voiceprint recognition method based on the fusion of multiple speech features, the aggregation and output of the original spectral features comprises the following steps:
[0111] Pass the original spectral features through a two-dimensional convolutional neural network (2D-CNN) to obtain the original spectral feature aggregation layer;
[0112] Extract and output the first feature vector ...
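A fixed-length first feature vector can be obtained from a 2D-CNN aggregation layer roughly as below. This is a sketch under stated assumptions: a single convolutional layer with ReLU, averaging over the frequency axis, and mean/standard-deviation pooling over time. These choices and the function names are illustrative, not the layer configuration given in the patent, and the kernels here are random whereas a real network would learn them.

```python
import numpy as np


def conv2d_relu(spec, kernels, bias):
    """Valid 2-D cross-correlation (what CNN layers compute) followed by ReLU.

    spec: (T, F) spectrogram; kernels: (C, kh, kw); bias: (C,)
    returns feature maps of shape (C, T - kh + 1, F - kw + 1)
    """
    _, kh, kw = kernels.shape
    windows = np.lib.stride_tricks.sliding_window_view(spec, (kh, kw))  # (T', F', kh, kw)
    maps = np.einsum("tfij,cij->ctf", windows, kernels) + bias[:, None, None]
    return np.maximum(maps, 0.0)


def first_feature_vector(spec, kernels, bias):
    """Collapse the (assumed) 2D-CNN aggregation layer into a vector of length 2*C."""
    maps = conv2d_relu(spec, kernels, bias)  # (C, T', F')
    per_frame = maps.mean(axis=2)            # average over frequency -> (C, T')
    # Statistics pooling over time removes the dependence on utterance length.
    return np.concatenate([per_frame.mean(axis=1), per_frame.std(axis=1)])
```

Because pooling collapses the time axis, a 50-frame and a 300-frame spectrogram produce vectors of the same length (twice the number of channels), which is what lets utterances of different durations be fused and classified downstream.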
Embodiment 1
[0123] 1) Audio input: acquire the original audio;
[0124] 2) Extract the original spectral features from the original audio, process them through a two-dimensional convolutional neural network (2D-CNN) to obtain the original spectral feature aggregation layer, and extract and output a fixed-length first feature vector from that aggregation layer;
[0125] 3) Extract the MFCC features from the original audio, process them through a one-dimensional convolutional neural network (1D-CNN) to obtain the MFCC feature aggregation layer, and extract and output a fixed-length second feature vector from that aggregation layer;
[0126] 4) Input the first feature vector and the second feature vector into the deep neural network (DNN) for feature fusion, and output the third feature vector;
[0127] 5) To classify speakers according to the third feature vector, any one of batch gradient descent (BGD), stochas...
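Steps 3) to 5) of Embodiment 1 can be sketched as a 1D-CNN branch over the MFCC frames, concatenation with the first feature vector, a DNN fusion layer whose ReLU output serves as the third feature vector, and a softmax speaker classifier trained by gradient descent. In this sketch `batch_size=None` gives batch gradient descent (BGD), `1` gives single-sample stochastic updates and any other value gives mini-batch updates; this parameterisation is an assumption. Hidden size, learning rate, input standardisation, mean/std pooling and all function names are likewise hypothetical, and for brevity the convolution kernels stay fixed while only the fusion and classification layers are trained, a simplification of an end-to-end network.

```python
import numpy as np


def second_feature_vector(mfcc, kernels, bias):
    """1D-CNN over MFCC frames (time axis) + ReLU, then mean/std pooling over time.

    mfcc: (T, n_mfcc); kernels: (C, k, n_mfcc); bias: (C,) -> vector of length 2*C
    """
    _, k, d = kernels.shape
    windows = np.lib.stride_tricks.sliding_window_view(mfcc, (k, d))[:, 0]  # (T-k+1, k, d)
    h = np.maximum(np.einsum("tkd,ckd->tc", windows, kernels) + bias, 0.0)  # (T-k+1, C)
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])


def train_fusion(V1, V2, y, n_speakers, hidden=32, lr=0.05, epochs=300, batch_size=None, seed=0):
    """Train concat -> ReLU layer (third feature vector) -> softmax with cross-entropy."""
    rng = np.random.default_rng(seed)
    Z = np.hstack([V1, V2])                         # concatenate first and second vectors
    mu, sd = Z.mean(axis=0), Z.std(axis=0) + 1e-8
    Z = (Z - mu) / sd                               # standardise the fusion input
    n, d = Z.shape
    bs = n if batch_size is None else batch_size    # n: BGD, 1: SGD, else mini-batch
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, n_speakers)); b2 = np.zeros(n_speakers)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, bs):
            idx = order[start:start + bs]
            zb, yb = Z[idx], y[idx]
            pre = zb @ W1 + b1
            h = np.maximum(pre, 0.0)                # fused (third) feature vector
            logits = h @ W2 + b2
            p = np.exp(logits - logits.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)       # softmax over speakers
            g = p.copy()
            g[np.arange(len(idx)), yb] -= 1.0
            g /= len(idx)                           # d(mean cross-entropy)/d(logits)
            gh = (g @ W2.T) * (pre > 0)             # backprop through the ReLU
            W2 -= lr * (h.T @ g); b2 -= lr * g.sum(axis=0)
            W1 -= lr * (zb.T @ gh); b1 -= lr * gh.sum(axis=0)
    return mu, sd, W1, b1, W2, b2


def fuse(params, V1, V2):
    """Third feature vectors for a batch of (first, second) vector pairs."""
    mu, sd, W1, b1, _, _ = params
    return np.maximum(((np.hstack([V1, V2]) - mu) / sd) @ W1 + b1, 0.0)


def predict(params, V1, V2):
    """Speaker index with the highest softmax score."""
    *_, W2, b2 = params
    return (fuse(params, V1, V2) @ W2 + b2).argmax(axis=1)
```

Standardising the concatenated input keeps the two feature streams on a comparable scale before fusion; without it, whichever branch has larger magnitudes would dominate the first dense layer.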
Embodiment 2
[0137] 1) The audio acquisition module 101 is used to acquire original audio;
[0138] 2) The original spectral feature acquisition module 102 is used to extract the original spectral feature data from the original audio and transmit the original spectral feature data to the aggregation layer feature acquisition module 104;
[0139] 3) The MFCC feature acquisition module 103 is used to extract the MFCC feature data from the original audio and transmit the MFCC feature data to the aggregation layer feature acquisition module 104;
[0140] 4) The aggregation layer feature acquisition module 104 is configured to receive the original spectral feature data and the MFCC feature data, extract the first feature vector and the second feature vector respectively, and transmit the first feature vector and the second feature vector to the fusion module 105;
[0141] 5) The fusion module 105 is configured to receive the first feature vector and the second feature vector and input the first fea...
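The data flow between modules 101 to 105 of Embodiment 2 can be expressed as a small pipeline object in which each module is an injected callable. This is a hypothetical structure for illustration only: the class and field names are not from the patent, and the `classify` stage is an assumption because the description of what follows module 105 is truncated.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np

Array = np.ndarray


@dataclass
class VoiceprintSystem:
    """Hypothetical wiring of modules 101-105; each stage is supplied as a callable."""
    acquire_audio: Callable[[], Array]                        # module 101: original audio
    extract_spectral: Callable[[Array], Array]                # module 102: original spectral features
    extract_mfcc: Callable[[Array], Array]                    # module 103: MFCC features
    aggregate: Callable[[Array, Array], Tuple[Array, Array]]  # module 104: first and second vectors
    fuse: Callable[[Array, Array], Array]                     # module 105: fusion -> third vector
    classify: Callable[[Array], int]                          # assumed speaker-classification stage

    def run(self) -> int:
        audio = self.acquire_audio()
        spectral = self.extract_spectral(audio)  # 102 -> 104
        mfcc = self.extract_mfcc(audio)          # 103 -> 104
        v1, v2 = self.aggregate(spectral, mfcc)  # 104 -> 105
        v3 = self.fuse(v1, v2)                   # 105: fused third feature vector
        return self.classify(v3)


if __name__ == "__main__":
    # Toy stand-ins for each module, only to show the data flow.
    toy = VoiceprintSystem(
        acquire_audio=lambda: np.ones(16000),
        extract_spectral=lambda a: a.reshape(100, 160),
        extract_mfcc=lambda a: a.reshape(160, 100)[:, :13],
        aggregate=lambda s, m: (s.mean(axis=0), m.mean(axis=0)),
        fuse=lambda v1, v2: np.concatenate([v1, v2]),
        classify=lambda v3: int(v3.size),
    )
    print(toy.run())  # prints 173 (160 + 13 fused dimensions)
```

Keeping each module behind a callable boundary means a feature extractor or network can be replaced without touching `run`, which mirrors the separation of modules described above.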