An end-to-end speaker clustering method and system
An end-to-end clustering method for speaker technology, applied in the field of speaker recognition, that addresses the heavy workload of manual labeling.
Examples
Embodiment 1
[0048] The present invention proposes an end-to-end speaker clustering method, which specifically includes:
[0049] 1) Collect voice data from at least two speakers
[0050] The voice data to be clustered must contain two or more label categories.
[0051] 2) Extract the acoustic features of the speech data
[0052] Extract Mel-frequency cepstral coefficient (MFCC) features. Assuming a frame length of 25 ms and a step size of 10 ms, each frame yields a 39-dimensional feature vector. For an utterance with N frames, this gives a two-dimensional 39×N MFCC feature matrix.
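As a rough illustration of the framing arithmetic in step 2, the following sketch computes the shape of the MFCC matrix for a given number of audio samples. The 16 kHz sampling rate, the no-padding framing rule, and the function name `mfcc_matrix_shape` are assumptions for illustration; the text specifies only the 25 ms frame length, the 10 ms step, and the 39 dimensions. (A 39-dimensional vector is commonly built from 13 static coefficients plus their first and second derivatives, but the text does not say how the 39 dimensions are composed.)

```python
def mfcc_matrix_shape(num_samples, sample_rate=16000, frame_ms=25,
                      step_ms=10, n_coeffs=39):
    """Return (n_coeffs, N) for a signal framed without padding.

    sample_rate and the no-padding rule are illustrative assumptions.
    """
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    step = sample_rate * step_ms // 1000        # samples between frame starts
    if num_samples < frame_len:
        return (n_coeffs, 0)                    # too short for a single frame
    n_frames = 1 + (num_samples - frame_len) // step
    return (n_coeffs, n_frames)


# One second of audio at the assumed 16 kHz rate.
print(mfcc_matrix_shape(16000))  # (39, 98)
```

Under these assumptions a one-second utterance gives N = 98 frames, so the network input would be a 39×98 matrix.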
[0053] 3) Design a speaker clustering neural network model as a clustering and classification model
[0054] Design a convolutional neural network with two output branches: a classification branch and a clustering branch.
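The two-branch layout in step 3 can be sketched as a shared trunk feeding two separate output heads. This is a minimal, dependency-free sketch and not the network of the invention: the convolutional layers are replaced here by mean pooling over time plus one dense layer, and the class name `TwoBranchModel`, the layer sizes, and the numbers of classes and clusters are all hypothetical. How the clustering branch is trained is not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoBranchModel:
    """Shared trunk with a classification head and a clustering head."""

    def __init__(self, n_features=39, n_hidden=16, n_classes=4,
                 n_clusters=3, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(n_hidden, n_features, rng)
        self.cls_head = init_layer(n_classes, n_hidden, rng)
        self.clu_head = init_layer(n_clusters, n_hidden, rng)

    def forward(self, mfcc):
        # mfcc: 39 rows of N frame values; average each row over time
        # (a stand-in for the convolutional feature extractor).
        pooled = [sum(row) / len(row) for row in mfcc]
        hidden = [max(0.0, h) for h in linear(*self.trunk, pooled)]  # ReLU
        class_probs = softmax(linear(*self.cls_head, hidden))
        cluster_probs = softmax(linear(*self.clu_head, hidden))
        return class_probs, cluster_probs


# Random 39x98 matrix standing in for real MFCC features.
rng = random.Random(1)
mfcc = [[rng.gauss(0, 1) for _ in range(98)] for _ in range(39)]
class_probs, cluster_probs = TwoBranchModel().forward(mfcc)
print(len(class_probs), len(cluster_probs))  # 4 3
```

Both heads read the same hidden representation, so any parameters learned for the trunk are shared between the classification and clustering outputs.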
[0055] 4) Design a speaker recognition neural network model as a pre-training model
[0056] The network structure is consistent w...
Specific Embodiment
[0075] The present invention proposes an end-to-end speaker clustering system, the specific examples of which are as follows:
[0076] The overall structure of the end-to-end speaker clustering system is shown in Figure 5. Specifically, it includes the following modules:
[0077] 1) Voice collection and storage module, as shown in Figure 1
[0078] Acquire voice data and store it locally.
[0079] 2) Acoustic feature extraction module, as shown in Figure 2
[0080] Extract the acoustic features of the speaker's voice as the input of the neural network.
[0081] 3) Neural network model pre-training module, as shown in Figure 3
[0082] Used to obtain the initial parameters of the clustering neural network:
[0083] 3-1) The data set with known speaker labels has M categories;
[0084] 3-2) The number of output categories of the neural network is set to M;
[0085] 3-3) Train the neural network until convergence;
[0086] 3-4) Obtain the pre-training model ...
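Steps 3-1) to 3-3) of the pre-training module can be sketched as supervised training of an M-way classifier until the loss stops improving. This is only a sketch under stated assumptions: a single softmax layer stands in for the neural network, the toy 2-D data stands in for acoustic features, and the function name `pretrain`, the learning rate, and the convergence tolerance are hypothetical choices not taken from the text.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pretrain(features, labels, n_classes, lr=0.2, tol=1e-5, max_epochs=2000):
    """Train an M-way softmax classifier by full-batch gradient descent.

    Stops when the mean cross-entropy changes by less than tol
    (an assumed convergence criterion) or after max_epochs.
    """
    n, n_dim = len(features), len(features[0])
    weights = [[0.0] * n_dim for _ in range(n_classes)]
    bias = [0.0] * n_classes
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_epochs):
        grad_w = [[0.0] * n_dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for x, y in zip(features, labels):
            scores = [sum(w * xi for w, xi in zip(row, x)) + b
                      for row, b in zip(weights, bias)]
            probs = softmax(scores)
            loss -= math.log(max(probs[y], 1e-12))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(n_dim):
                    grad_w[k][j] += err * x[j]
        loss /= n
        for k in range(n_classes):
            bias[k] -= lr * grad_b[k] / n
            for j in range(n_dim):
                weights[k][j] -= lr * grad_w[k][j] / n
        if abs(prev_loss - loss) < tol:
            break  # treated as converged
        prev_loss = loss
    return weights, bias, loss


# Toy labelled set with M = 3 known "speakers" (synthetic 2-D points).
rng = random.Random(0)
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
features, labels = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(20):
        features.append([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)])
        labels.append(label)

weights, bias, final_loss = pretrain(features, labels, n_classes=len(centers))
print(len(weights), final_loss < math.log(len(centers)))
```

The returned `weights` and `bias` play the role of pre-trained parameters in this sketch; how they are carried over to the clustering network is not covered here.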


