Text-independent speaker recognition method and system based on a three-dimensional convolutional neural network

A neural-network-based speaker recognition technique, applied to text-independent speaker recognition, that addresses degraded user experience, cumbersome steps, and heavy workload, and that improves discriminability and user experience.

Active Publication Date: 2017-12-12

SICHUAN CHANGHONG ELECTRIC CO LTD

12 Cites, 31 Cited by


AI Technical Summary

Problems solved by technology

[0004] Problems with existing speaker recognition technology: (1) speaker recognition algorithms are mostly text-dependent, that is, the sentences used for registration and recognition must be the same, which greatly degrades the user experience; (2) some current speaker recognition algorithms rely entirely on manually designed features, so the steps are cumbersome and the workload is heavy; (3) in the user registration stage, a user's multiple voiceprint features are averaged to form the registration model, which ignores the fact that the same word can sound very different from one utterance to the next, even when spoken by the same user.


Examples


Embodiment 1

[0068] Take training a model containing 1000 speakers as an example to illustrate the speaker recognition model training process.

[0069] (1) Collect samples from each speaker, with a target of 3000 samples per person;

[0070] (2) The voice preprocessing module processes all voice data to obtain three-dimensional training data;

[0071] (3) Randomly split all training data 4:1 into a training set and a validation set;

[0072] (4) Train the model with a residual network, and stop training when the model's recognition accuracy on the validation set remains essentially unchanged; the result is the offline speaker recognition model.
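
Steps (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the `patience` and `min_delta` parameters, and the exact plateau test are hypothetical, since the text only says that training stops when validation accuracy "remains basically unchanged".

```python
import random


def split_train_validation(samples, train_parts=4, val_parts=1, seed=0):
    """Shuffle the samples and split them train:validation = 4:1 by default."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:cut], shuffled[cut:]


def accuracy_has_plateaued(val_accuracies, patience=3, min_delta=0.001):
    """Return True when the last `patience` epochs improved the best
    validation accuracy by less than `min_delta` (assumed stopping rule)."""
    if len(val_accuracies) <= patience:
        return False
    best_before = max(val_accuracies[:-patience])
    best_recent = max(val_accuracies[-patience:])
    return best_recent - best_before < min_delta


# Usage with toy data: 5 speakers x 20 utterance ids = 100 samples.
samples = [(f"speaker{s}", u) for s in range(5) for u in range(20)]
train, val = split_train_validation(samples)
print(len(train), len(val))  # 80 20
```

In a real training loop, `accuracy_has_plateaued` would be called once per epoch with the history of validation accuracies, and training would end the first time it returns `True`.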

[0073] Registration mode

[0074] (1) Voice sample collection

[0075] Collect training samples by recording;

[0076] (2) Voice preprocessing

[0077] Use the voice preprocessing module to preprocess the voice to generate registration data;

[0078] (3) Feature extraction

[0079] The offline model generated in...

Embodiment 2

[0081] Take the registration of a data set containing 10 speakers as an example to illustrate the speaker registration process.

[0082] (1) Collect the voice data of 10 speakers, with 20 voice data samples per person;

[0083] (2) The voice preprocessing module processes all voice data to obtain three-dimensional data of each speaker;

[0084] (3) Use the offline model generated in the training phase to extract features, and save each person's features in the database as speaker0, speaker1, ..., speaker9;
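
The registration steps above might look roughly like this. The sketch assumes that each speaker's utterances are stacked along the depth axis into one three-dimensional input and that the offline model returns one feature vector per speaker; the text does not spell out either detail. `toy_extract_features` is a hypothetical placeholder for the trained network and is there only so the example runs.

```python
def toy_extract_features(cube):
    """Hypothetical stand-in for the trained offline model: takes a 3-D
    input (utterances x frames x coefficients) and returns one vector,
    here simply the per-coefficient maximum over all frames."""
    frames = [frame for utterance in cube for frame in utterance]
    n_coeffs = len(frames[0])
    return [max(f[i] for f in frames) for i in range(n_coeffs)]


def register_speakers(maps_by_speaker, extract_features):
    """Build {speaker_id: feature_vector} from per-speaker enrollment data."""
    database = {}
    for speaker_id, feature_maps in maps_by_speaker.items():
        cube = list(feature_maps)  # depth axis: one slice per utterance
        database[speaker_id] = extract_features(cube)
    return database


# Usage: 10 speakers, 20 utterances each; every toy utterance has
# 2 frames of 3 coefficients.
enrollment = {
    f"speaker{s}": [[[float(s), 1.0, 2.0], [float(s), 3.0, 4.0]]
                    for _ in range(20)]
    for s in range(10)
}
database = register_speakers(enrollment, toy_extract_features)
print(len(database))  # 10
```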

[0085] Recognition mode

[0086] (1) Voice sample collection

[0087] Collect training samples by recording.

[0088] (2) Voice preprocessing

[0089] The offline model generated in the training phase is used to extract features from the preprocessed speech to generate test data.

[0090] (3) Extract features

[0091] The offline model generated in the training phase is used to extract features of the preprocessed speech.

[0092] (4) Feature comparison

[0093] Find the cosine distance ...

Embodiment 3

[0095] Take identifying a speaker as an example to illustrate the process of speaker identification.

[0096] (1) Collect one piece of voice data of the speaker;

[0097] (2) The voice preprocessing module processes the voice data and replicates the test sample to match the depth of the three-dimensional training data, which yields the three-dimensional data for this sample;

[0098] (3) The offline model generated in the training phase is used to extract features;

[0099] (4) Compute the cosine similarity between this feature and each feature registered in the database to obtain sim0, sim1, ..., sim9. Find the maximum sim_max of these 10 similarities and the corresponding speaker speaker_x. If sim_max is greater than the threshold, the sample is accepted as speaker_x; otherwise, it is recognized as an unregistered speaker.
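
The replication in step (2) and the decision rule in step (4) can be sketched as follows. The function names, the toy two-dimensional features, and the threshold value 0.8 are illustrative assumptions; the text names no concrete threshold, and the feature extraction of step (3) is omitted here.

```python
import math


def duplicate_to_depth(feature_map, depth):
    """Repeat one 2-D feature map `depth` times to form a 3-D input."""
    return [[row[:] for row in feature_map] for _ in range(depth)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(test_feature, database, threshold):
    """Return (speaker_id, sim_max); speaker_id is None when sim_max does
    not exceed the threshold, meaning an unregistered speaker."""
    best_id, best_sim = None, -1.0
    for speaker_id, feature in database.items():
        sim = cosine_similarity(test_feature, feature)
        if sim > best_sim:
            best_id, best_sim = speaker_id, sim
    if best_sim > threshold:
        return best_id, best_sim
    return None, best_sim


# Usage with toy 2-D features for two registered speakers.
registered = {"speaker0": [1.0, 0.0], "speaker1": [0.0, 1.0]}
cube = duplicate_to_depth([[1.0, 2.0], [3.0, 4.0]], depth=3)
print(len(cube))                                   # 3
print(identify([0.9, 0.1], registered, 0.8)[0])    # speaker0
print(identify([1.0, 1.0], registered, 0.8)[0])    # None
```

Returning `sim_max` alongside the decision makes it easier to tune the threshold on held-out data.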

[0100] In summary, the present invention implements a text-independent speaker recognition method and system based on a three-dimen...


Abstract

The invention discloses a text-independent speaker recognition system based on a three-dimensional convolutional neural network. The system comprises four modules: module I, a voice acquisition module; module II, a voice preprocessing module; module III, a speaker recognition model training module; and module IV, a speaker recognition module. The voice acquisition module acquires voice data. The voice preprocessing module extracts mel-frequency cepstral coefficient (MFCC) features from the original voice data and removes the non-voice data from those features to produce the final training data. The speaker recognition model training module trains the offline speaker recognition model. The speaker recognition module recognizes the identity of a speaker in real time. The invention further discloses a text-independent speaker recognition method based on a three-dimensional convolutional neural network. With this method and system, user registration does not depend on the text used for recognition, which improves the user experience.

Description

Technical field

[0001] The invention relates to a speaker recognition method and system, in particular to a text-independent speaker recognition method and system based on a three-dimensional convolutional neural network, and belongs to the technical field of intelligent recognition.

Background

[0002] With the development of artificial intelligence, the prospect of voice control systems for smart homes has begun to emerge. However, even though current voice recognition technology has basically reached the standard that people need, smart home voice control systems still have shortcomings, such as how to accurately identify the user who issued a command. Speaker recognition (that is, voiceprint recognition) is one effective solution. Once the smart home system recognizes the user's identity, it can push relevant content according to the user's personal preferences. In this way, the use of speaker recognition can further improve ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L17/02, G10L17/04, G10L17/18

CPC: G10L17/02, G10L17/04, G10L17/18

Inventor: 伍强

Owner: SICHUAN CHANGHONG ELECTRIC CO LTD
