Text-independent speaker recognition method and system based on three-dimensional convolutional neural network

A speaker recognition and neural network technology, applied in the field of text-independent speaker recognition, that addresses the problems of reduced user experience, cumbersome steps, and heavy workload, and achieves improved discrimination and an improved user experience.

Active Publication Date: 2017-12-12
SICHUAN CHANGHONG ELECTRIC CO LTD


Problems solved by technology

[0004] Problems with existing speaker recognition technology: (1) speaker recognition algorithms are largely text-dependent, that is, the enrollment and recognition utterances must match, which greatly reduces the user experience; (2) current speaker recognition algorithms rely on manually designed features, so the steps are cumbersome and the workload is heavy; (3) in the user registration stage, the user's multiple voiceprint features are averaged to form the registration model, which ignores the fact that even the same user saying the same words can differ considerably between utterances.



Examples


Embodiment 1

[0068] Take training a model containing 1000 speakers as an example to illustrate the speaker recognition model training process.

[0069] (1) Collect samples of each speaker; target: 3000 samples per person;

[0070] (2) The voice preprocessing module processes all voice data to obtain three-dimensional training data;

[0071] (3) Randomly split all training data 4:1 into a training set and a validation set;

[0072] (4) Train the model with a residual network, terminating training when the model's recognition accuracy on the validation set stops improving, which yields the offline speaker recognition model.
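The four training-phase steps above can be sketched as follows. All shapes, sizes, and the early-stopping rule are illustrative assumptions, since the patent excerpt does not specify them; the residual network itself is elided and only the data split and the stopping criterion are shown.

```python
# Sketch of the Embodiment 1 training phase (all names/shapes hypothetical).
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the preprocessed corpus: one 3-D "cube" per utterance,
# e.g. (depth, frames, coefficients). Real data would come from the voice
# preprocessing module; here we fabricate a tiny toy set.
num_speakers = 10          # the patent's example uses 1000
samples_per_speaker = 30   # the patent's example uses 3000
X = rng.normal(size=(num_speakers * samples_per_speaker, 4, 16, 8))
y = np.repeat(np.arange(num_speakers), samples_per_speaker)

# Step (3): random 4:1 split into training and validation sets.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
X_train, y_train = X[idx[:cut]], y[idx[:cut]]
X_val, y_val = X[idx[cut:]], y[idx[cut:]]
assert len(X_train) == 4 * len(X_val)

# Step (4): stop when validation accuracy plateaus. This plateau test is
# an assumed interpretation of "remains basically unchanged".
def early_stop(accuracies, patience=3, tol=1e-2):
    """Stop once the last `patience` epochs improved by less than `tol`."""
    if len(accuracies) <= patience:
        return False
    recent = accuracies[-(patience + 1):]
    return max(recent) - recent[0] < tol

history = [0.50, 0.70, 0.80, 0.801, 0.8005, 0.801]
print(early_stop(history))  # -> True: accuracy has plateaued
```

A real implementation would call the residual-network trainer once per epoch and feed the resulting validation accuracies into the stopping check.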

[0073] Registration mode

[0074] (1) Voice sample collection

Collect registration samples by recording;

[0076] (2) Voice preprocessing

[0077] Use the voice preprocessing module to preprocess the voice to generate registration data;

[0078] (3) Feature extraction

[0079] The offline model generated in...

Embodiment 2

[0081] Take the registration of a data set containing 10 speakers as an example to illustrate the speaker registration process.

[0082] (1) Collect the voice data of 10 speakers, with 20 voice data samples per person;

[0083] (2) The voice preprocessing module processes all voice data to obtain three-dimensional data of each speaker;

[0084] (3) Use the offline model generated in the training phase to extract features, and save the features of each person in the database, speaker0, speaker1,..., speaker9;
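A minimal sketch of this registration flow, with a stand-in pooling function in place of the trained 3-D CNN. The model, shapes, and database layout are assumptions for illustration, not taken from the patent; note that each sample's features are kept individually rather than averaged, as step (3) of the problems section motivates.

```python
# Sketch of the Embodiment 2 registration flow (hypothetical model/DB).
import numpy as np

rng = np.random.default_rng(1)

def extract_features(voice_cube):
    """Stand-in for the offline 3-D CNN: maps a 3-D input to an embedding.
    A real system would run the trained network; here we just pool."""
    return voice_cube.mean(axis=(1, 2))

# Steps (1)-(2): 10 speakers, 20 preprocessed 3-D samples each.
database = {}
for i in range(10):
    cubes = rng.normal(size=(20, 4, 16, 8))   # toy 3-D data
    # Step (3): store every registered sample's features per speaker.
    database[f"speaker{i}"] = np.stack([extract_features(c) for c in cubes])

print(sorted(database)[:3], database["speaker0"].shape)
# -> ['speaker0', 'speaker1', 'speaker2'] (20, 4)
```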

[0085] Recognition mode

[0086] (1) Voice sample collection

[0087] Collect test samples by recording.

[0088] (2) Voice preprocessing

[0089] The voice preprocessing module preprocesses the collected speech to generate test data.

[0090] (3) Extract features

[0091] The offline model generated in the training phase is used to extract features of the preprocessed speech.

[0092] (4) Feature comparison

[0093] Find the cosine distance ...
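The feature-comparison step scores the test feature against each registered feature by cosine similarity. A minimal helper (an illustrative implementation, not the patent's code) looks like:

```python
# Cosine similarity between two feature vectors.
import numpy as np

def cosine_sim(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_sim([1, 0], [1, 0]))   # identical direction -> 1.0
print(cosine_sim([1, 0], [0, 1]))   # orthogonal -> 0.0
```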

Embodiment 3

[0095] Take identifying a speaker as an example to illustrate the process of speaker identification.

[0096] (1) Collect one piece of voice data of the speaker;

[0097] (2) The voice preprocessing module processes the voice data and replicates the test sample along the depth axis of the three-dimensional training data to obtain three-dimensional data for this sample;

[0098] (3) The offline model generated in the training phase is used to extract features;

[0099] (4) Compute the cosine distance between this feature and each feature registered in the database to obtain sim0, sim1, ..., sim9; find the maximum sim_max of these 10 similarities and the corresponding speaker speaker_x. If the maximum exceeds the threshold sim, the sample is accepted as speaker_x; otherwise it is recognized as an unregistered speaker.
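The identification steps above can be sketched end to end. The pooling "model", the depth, the registered features, and the 0.8 threshold are placeholders, since the patent leaves these values open:

```python
# Sketch of the Embodiment 3 identification decision (names hypothetical).
import numpy as np

rng = np.random.default_rng(2)

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step (2): replicate the single test utterance along the depth axis so it
# matches the 3-D input shape used in training.
depth = 4
frame_features = rng.normal(size=(16, 8))           # one 2-D feature map
test_cube = np.tile(frame_features, (depth, 1, 1))  # -> (depth, 16, 8)
assert test_cube.shape == (4, 16, 8)

# Steps (3)-(4): extract a feature (pooling stands in for the offline CNN),
# score against the registered speakers, and apply the acceptance threshold.
test_feat = test_cube.mean(axis=(1, 2))
registered = {f"speaker{i}": rng.normal(size=depth) for i in range(10)}

sims = {name: cosine_sim(test_feat, feat) for name, feat in registered.items()}
best = max(sims, key=sims.get)
sim_threshold = 0.8  # hypothetical value; the patent does not fix it
decision = best if sims[best] > sim_threshold else "unregistered speaker"
print(best, round(sims[best], 3), decision)
```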

[0100] In summary, the present invention implements a text-independent speaker recognition method and system based on a three-dimen...



Abstract

The invention discloses a text-independent speaker recognition system based on a three-dimensional convolutional neural network. The system comprises module I, a voice acquisition module; module II, a voice preprocessing module; module III, a speaker recognition model training module; and module IV, a speaker recognition module. The voice acquisition module acquires voice data; the voice preprocessing module extracts mel-frequency cepstrum coefficient features from the original voice data and removes non-voice data from those features to obtain the final training data; the speaker recognition model training module is used to train the offline speaker recognition model; and the speaker recognition module recognizes the speaker's identity in real time. The invention further discloses a text-independent speaker recognition method based on a three-dimensional convolutional neural network. By adopting this method and system, user registration is made independent of the recognized text, so the user experience can be improved.
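The preprocessing described above (MFCC-style feature extraction followed by removal of non-voice data) can be sketched compactly with numpy. Every size, rate, and threshold here is an illustrative assumption, and the simple energy gate merely stands in for whatever non-voice filtering the patent's module performs:

```python
# MFCC-like features: frame -> power spectrum -> mel filterbank -> log -> DCT,
# then drop low-energy (non-voice) frames. All parameters are illustrative.
import numpy as np

def mfcc_like(signal, sr=8000, frame=256, hop=128, n_mel=20, n_coef=12):
    # Frame the signal with a Hann window and take the power spectrum.
    n = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop : i * hop + frame] for i in range(n)])
    power = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)) ** 2

    # Triangular mel filterbank.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv(np.linspace(0, mel(sr / 2), n_mel + 2))
    bins = np.floor((frame + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mel, power.shape[1]))
    for i in range(n_mel):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = np.linspace(0, 1, c - l, endpoint=False)
        if r > c:
            fb[i, c:r] = np.linspace(1, 0, r - c, endpoint=False)
    logmel = np.log(power @ fb.T + 1e-10)

    # DCT-II over the mel bands gives cepstral coefficients.
    k = np.arange(n_mel)
    basis = np.cos(np.pi * np.outer(np.arange(n_coef), (2 * k + 1) / (2 * n_mel)))
    coefs = logmel @ basis.T

    # Energy gate standing in for removing non-voice frames.
    energy = power.sum(axis=1)
    return coefs[energy > 0.1 * energy.mean()]

t = np.arange(8000) / 8000
feats = mfcc_like(np.sin(2 * np.pi * 440 * t))  # one second of a 440 Hz tone
print(feats.shape[1])  # -> 12 coefficients per kept frame
```

Stacking such frame-level features over time would produce the three-dimensional training data the patent's modules consume.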

Description

Technical field [0001] The invention relates to a speaker recognition method and system, in particular to a text-independent speaker recognition method and system based on a three-dimensional convolutional neural network, and belongs to the technical field of intelligent recognition. Background technique [0002] With the development of artificial intelligence, the prospects of voice control systems for smart homes have begun to emerge. However, even though current voice recognition technology has basically reached the standard people need, smart home voice control systems still have flaws, such as accurately identifying the user who issued a command; speaker recognition (that is, voiceprint recognition) is one of the effective solutions. By recognizing the user's identity, the smart home system can push relevant content according to the user's personal preferences. In this way, the use of speaker recognition can further improve ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L17/02, G10L17/04, G10L17/18
CPC: G10L17/02, G10L17/04, G10L17/18
Inventor: 伍强
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD