An end-to-end speaker clustering method and system

A speaker clustering method and system, applied in the field of speaker recognition learning, that addresses the heavy workload of manual data labeling (calibration).

Active Publication Date: 2021-04-27

SICHUAN CHANGHONG ELECTRIC CO LTD

Cites 4 documents; cited by 0.


AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to overcome the huge workload of manual data labeling (calibration) in the prior art by providing an end-to-end speaker clustering method and system.


Examples


Embodiment 1

[0048] The present invention proposes an end-to-end speaker clustering method, which specifically includes:

[0049] 1) Collect voice data from at least two speakers

[0050] The number of label categories of the voice data to be clustered must be two or more.

[0051] 2) Extract the acoustic features of the speech data

[0052] Extract Mel-frequency cepstral coefficient (MFCC) features. Assuming a frame length of 25 ms and a frame step of 10 ms, a 39-dimensional feature vector is obtained for each frame; for an utterance of N frames, this yields a two-dimensional 39 × N MFCC feature matrix.
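The feature extraction in [0052] can be sketched in NumPy as follows. The source fixes only the 25 ms frame length, the 10 ms step and the 39 × N output; the 16 kHz sample rate, the 26-filter mel filterbank, the 512-point FFT, and the split of the 39 dimensions into 13 static coefficients plus first- and second-order deltas are assumptions (a common MFCC configuration), not details taken from the patent. The function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (N, frame_len)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def delta(feat, width=2):
    """Regression deltas along time for feat of shape (n_coeffs, N)."""
    n = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (width, width)), mode="edge")
    denom = 2.0 * sum(i * i for i in range(1, width + 1))
    out = np.zeros_like(feat)
    for i in range(1, width + 1):
        out += i * (padded[:, width + i:width + i + n] - padded[:, width - i:width - i + n])
    return out / denom

def mfcc_39xN(signal, sr=16000, frame_ms=25, hop_ms=10, n_mfcc=13, n_filters=26, n_fft=512):
    """Return a (3 * n_mfcc, N) matrix: static MFCCs, deltas, delta-deltas (assumed layout)."""
    frame_len = int(sr * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)           # 160 samples at 16 kHz
    frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft             # (N, n_fft//2 + 1)
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)  # (N, n_filters)
    # Unnormalised DCT-II of the log mel energies, keeping the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    j = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * j + 1) / (2 * n_filters))              # (n_mfcc, n_filters)
    static = (log_mel @ dct.T).T                                          # (n_mfcc, N)
    d1 = delta(static)
    return np.vstack([static, d1, delta(d1)])                             # (39, N) by default
```

For one second of 16 kHz audio, this framing gives N = 1 + (16000 - 400) // 160 = 98 frames, so `mfcc_39xN` returns a 39 × 98 matrix.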

[0053] 3) Design a speaker clustering neural network model as a clustering and classification model

[0054] Design a convolutional neural network with two output branches: a classification branch and a clustering branch.
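A forward-pass sketch of the two-branch idea in [0054] follows, in plain NumPy. It assumes a single shared 1-D convolution over time with ReLU and mean pooling, a classification branch that outputs softmax probabilities over an assumed number of clusters, and a clustering branch that outputs an L2-normalised embedding. The class name `TwoBranchCNN`, the layer sizes and the form of the clustering branch are hypothetical, since the excerpt does not specify them; training is omitted.

```python
import numpy as np

class TwoBranchCNN:
    """Hypothetical two-branch speaker clustering network (forward pass only).

    Shared trunk: 1-D convolution over time + ReLU + mean pooling.
    Classification branch: softmax over n_classes (assumed = number of clusters).
    Clustering branch: L2-normalised embedding (assumed form).
    """

    def __init__(self, n_feats=39, n_channels=64, kernel=5, emb_dim=32, n_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = kernel
        self.w_conv = rng.standard_normal((n_channels, n_feats, kernel)) * 0.1
        self.b_conv = np.zeros(n_channels)
        self.w_cls = rng.standard_normal((n_classes, n_channels)) * 0.1
        self.w_emb = rng.standard_normal((emb_dim, n_channels)) * 0.1

    def trunk(self, feats):
        """feats: (n_feats, N) MFCC matrix -> pooled hidden vector (n_channels,)."""
        n_frames = feats.shape[1]
        # Sliding windows over time: (T, n_feats, kernel), with T = N - kernel + 1
        windows = np.stack([feats[:, t:t + self.kernel]
                            for t in range(n_frames - self.kernel + 1)])
        conv = np.einsum("tfk,cfk->tc", windows, self.w_conv) + self.b_conv
        return np.maximum(conv, 0.0).mean(axis=0)

    def forward(self, feats):
        """Return (class probabilities, unit-length embedding) for one utterance."""
        h = self.trunk(feats)
        logits = self.w_cls @ h
        probs = np.exp(logits - logits.max())   # numerically stable softmax
        probs /= probs.sum()
        emb = self.w_emb @ h
        emb /= np.linalg.norm(emb) + 1e-12
        return probs, emb
```

For a 39 × 98 MFCC matrix, `TwoBranchCNN(n_classes=4).forward(feats)` returns a probability vector of length 4 and a 32-dimensional embedding of unit length.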

[0055] 4) Design a speaker recognition neural network model as a pre-training model

[0056] The network structure is consistent w...

Specific embodiment

[0075] The present invention proposes an end-to-end speaker clustering system, the specific examples of which are as follows:

[0076] The overall structure of the end-to-end speaker clustering system is shown in Figure 5. It includes the following modules:

[0077] 1) Voice collection and storage module, as shown in Figure 1

[0078] Acquire voice data and store it locally.

[0079] 2) Acoustic feature extraction module, as shown in Figure 2

[0080] Extract the acoustic features of the speaker's voice as the input of the neural network.

[0081] 3) Neural network model pre-training module, as shown in Figure 3

[0082] Used to obtain the initial parameters of the clustering neural network:

[0083] 3-1) The dataset with known speaker labels contains M categories;

[0084] 3-2) The number of output categories of the neural network is set to M;

[0085] 3-3) Train the neural network until convergence;

[0086] 3-4) Obtain the pre-training model ...
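Steps 3-1 to 3-4, together with the abstract's step of initializing the speaker clustering network with the parameters of the speaker recognition model, can be sketched as a parameter transfer. The sketch assumes both networks share the same trunk layers, stores parameters in plain dictionaries with hypothetical names such as `"conv.w"` and `"cls.w"`, and re-initialises only the classification head, whose size changes from the M pre-training speakers to an assumed number of clusters. Training to convergence (step 3-3) is not shown.

```python
import numpy as np

def init_params(n_classes, n_feats=39, n_channels=64, kernel=5, emb_dim=32, seed=0):
    """Hypothetical parameter layout shared by both networks; only the head size differs."""
    rng = np.random.default_rng(seed)
    return {
        "conv.w": rng.standard_normal((n_channels, n_feats, kernel)) * 0.1,
        "conv.b": np.zeros(n_channels),
        "emb.w": rng.standard_normal((emb_dim, n_channels)) * 0.1,
        "cls.w": rng.standard_normal((n_classes, n_channels)) * 0.1,
    }

def init_from_pretrained(pretrained, n_clusters, seed=1):
    """Build clustering-network parameters from an M-class pre-trained model.

    Every shared (non-head) tensor is copied; the classification head "cls.w"
    keeps its fresh random initialisation because its output size changes
    from M to n_clusters.
    """
    target = init_params(n_classes=n_clusters, seed=seed)
    for name, value in pretrained.items():
        if name.startswith("cls."):
            continue  # head is re-initialised, not transferred
        assert target[name].shape == value.shape, f"shape mismatch for {name}"
        target[name] = value.copy()
    return target
```

Here `pretrained` would be the dictionary produced by `init_params(n_classes=M)` after training on the labelled dataset; copying (rather than sharing) the arrays lets the clustering network be fine-tuned without altering the pre-trained model.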


Abstract

The present invention discloses an end-to-end speaker clustering method, which includes the following steps: S001: collecting voice data from at least two speakers; S002: extracting the acoustic features of the voice data; S003: designing a speaker clustering neural network model for clustering and classification; S004: designing a speaker recognition neural network model as the pre-training model; S005: training the speaker recognition neural network model with speaker voice data whose labels are known; S006: initializing the speaker clustering neural network model with the parameters of the speaker recognition model; S007: training the speaker clustering neural network model with speaker voice data whose labels are unknown; S008: once the speaker clustering neural network model converges, outputting the label information of the unlabeled speaker voice data. The invention can greatly reduce the workload of manual data labeling and also helps to improve the accuracy of speaker recognition models.
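The excerpt does not describe how the clustering network is trained on data with unknown labels or how the label information is read out at convergence. One common approach, used for example in DeepCluster-style self-supervised training, alternates between clustering the network's embeddings with k-means and training the classification branch on the resulting pseudo-labels. The sketch below shows only that clustering step, as an assumed stand-in rather than the patented procedure. The number of speakers is assumed to be known, farthest-first initialisation is used so the result is deterministic, and the function names are hypothetical.

```python
import numpy as np

def kmeans(x, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation.

    x: (n_samples, dim). Returns integer cluster labels of shape (n_samples,).
    """
    centroids = [x[0]]
    for _ in range(1, k):
        # Next centroid: the point farthest from all centroids chosen so far
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def pseudo_labels(embeddings, n_speakers):
    """Cluster L2-normalised utterance embeddings into n_speakers groups."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return kmeans(emb, n_speakers)
```

In such a loop, the labels returned once training has converged would serve as the output label information. Cluster indices are arbitrary, so they indicate which utterances share a speaker, not who that speaker is.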

Description

Technical field

[0001] The invention relates to a speaker recognition learning method and system, in particular to an end-to-end speaker clustering method and system.

Background art

[0002] In recent years, with the rapid development of artificial intelligence technology, more and more AI-enabled products have appeared in people's daily lives, especially smart speakers, which have emerged rapidly in recent years. Voiceprint recognition is a standard feature of almost all smart speakers, allowing users to log in to accounts, make payments, and so on with their own voice.

[0003] Deep learning has made remarkable progress in many fields, including speaker recognition. However, changes to the network structure now yield ever smaller gains in classification accuracy, so attention has shifted to expanding the datasets, and the scale of existing common datasets is somewhat insufficient com...


Application Information


Patent type & authority: Patents (China)

IPC (8): G10L17/02, G10L17/04, G10L17/14, G10L17/18

CPC: G10L17/02, G10L17/04, G10L17/14, G10L17/18

Inventor: 伍强 (Wu Qiang)

Owner: SICHUAN CHANGHONG ELECTRIC CO LTD
