Voiceprint recognition method and system based on fusion of multiple voice features

A voiceprint recognition technology based on fused voice features, applied in speech analysis, instruments, and related fields. It addresses the loss of accuracy caused by the sensitivity of voice to channel and environment changes, and improves the accuracy of the recognition algorithm.

Active Publication Date: 2020-08-11

SHANGHAI YITU NETWORK SCI & TECH

4 Cites, 2 Cited by



Problems solved by technology

[0002] With the development of information technology and the spread of the Internet, more and more applications require personal identification. Traditional needs include online accounts, online payment, and access control. With the application and promotion of the Internet and artificial intelligence, different identification methods and systems are needed to suit the habits and characteristics of different people, such as fingerprint, face, and voiceprint recognition. Voiceprint recognition is a biometric technology: it generates an identity vector that represents the identity information of the speaker, and determines whether two utterances come from the same user by calculating the similarity between their identity vectors. However, voice is susceptible to channel variability and environmental change, so the accuracy of voiceprint recognition based on a single voice feature drops considerably. A method and system that can fuse multiple voice features for voiceprint recognition is therefore needed.
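The comparison of two identity vectors described above can be illustrated with a small sketch. The text only says that a similarity is calculated; cosine similarity, the function names, and the threshold value of 0.7 are assumptions chosen for illustration, not details taken from the patent.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("identity vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("identity vectors must be non-zero")
    return dot / (norm_a * norm_b)

def same_speaker(vec_a, vec_b, threshold=0.7):
    # The threshold is a hypothetical value; in practice it would be
    # tuned on held-out verification data.
    return cosine_similarity(vec_a, vec_b) >= threshold
```

Two vectors pointing in the same direction score 1.0 and are accepted as the same speaker, while orthogonal vectors score 0.0 and are rejected.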


Examples


Specific embodiment

[0105] Acquire the original audio;

[0106] Extract the original spectral features from the original audio, aggregate them, and output the first feature vector;

[0107] Extract the MFCC features from the original audio, aggregate them, and output the second feature vector;

[0108] Input the first feature vector and the second feature vector into a deep neural network (DNN) for feature fusion, and output the third feature vector;

[0109] Classify the speaker according to the third feature vector.
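As a rough illustration of the two kinds of features named in the steps above, the following sketch computes a magnitude spectrum and MFCCs per frame. The patent excerpt does not give frame sizes, window types, or filter counts, so the Hamming window, the 64-sample frame, the 32-sample hop, the 10 mel filters, the 5 coefficients, and all function names are assumptions. A naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math

def frames(signal, size, hop):
    """Split a signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def magnitude_spectrum(frame):
    """Hamming-windowed magnitude spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((k - lo) / (mid - lo), (hi - k) / (hi - mid)))
                     for k in range(n_bins)])
    return bank

def mfcc(spectrum, bank, n_coeffs):
    """Log mel-band energies followed by a DCT-II."""
    power = [m * m for m in spectrum]
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    n = len(log_e)
    return [sum(e * math.cos(math.pi * c * (i + 0.5) / n) for i, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Hypothetical input: a 1 kHz tone sampled at 8 kHz stands in for real audio.
sample_rate, n_fft = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
bank = mel_filterbank(10, n_fft, sample_rate)
spectral = [magnitude_spectrum(f) for f in frames(tone, n_fft, 32)]  # spectral features
cepstral = [mfcc(s, bank, 5) for s in spectral]                      # MFCC features
```

Both outputs are sequences with one entry per frame, so their length grows with the audio duration. That is why the method aggregates each of them into a fixed-length vector before fusion.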

[0110] In the above voiceprint recognition method based on the fusion of multiple speech features, aggregating the original spectral features and outputting the result comprises the following steps:

[0111] Pass the original spectral features through a two-dimensional convolutional neural network (2D-CNN) to obtain the original spectral feature aggregation layer;

[0112] Extract and output the first feature vector ...
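One way a 2D convolution can turn a spectrogram of any length into a fixed-length vector is sketched below. The excerpt does not say how the vector is extracted from the aggregation layer; the single convolution layer, the ReLU, the global average pooling, and the untrained example kernels are all assumptions made for illustration.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]

def global_average_pool(feature_map):
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def aggregate(spectrogram, kernels):
    """Conv + ReLU + global average pooling: one output value per kernel."""
    return [global_average_pool(relu(conv2d_valid(spectrogram, k))) for k in kernels]

# Hypothetical, untrained kernels; a real network would learn these.
kernels = [
    [[1.0, 0.0], [0.0, -1.0]],
    [[0.25, 0.25], [0.25, 0.25]],
    [[-1.0, 1.0], [-1.0, 1.0]],
]
# Two toy spectrograms with different numbers of frames (rows).
short = [[float((r * c) % 5) for c in range(6)] for r in range(4)]
long = [[float((r + c) % 3) for c in range(6)] for r in range(9)]
first_vector_short = aggregate(short, kernels)
first_vector_long = aggregate(long, kernels)
```

Both results have three entries, one per kernel, even though the inputs have four and nine frames. The pooling step is what removes the dependence on utterance length.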

Embodiment 1

[0123] 1) Acquire the original audio from the audio input;

[0124] 2) Extract the original spectral features from the original audio, process them through a two-dimensional convolutional neural network (2D-CNN) to obtain the original spectral feature aggregation layer, and extract and output the fixed-length first feature vector from that layer;

[0125] 3) Extract the MFCC features from the original audio, process them through a one-dimensional convolutional neural network (1D-CNN) to obtain the MFCC feature aggregation layer, and extract and output the fixed-length second feature vector from that layer;

[0126] 4) Input the first feature vector and the second feature vector into a deep neural network (DNN) for feature fusion, and output the third feature vector;

[0127] 5) To classify speakers according to the third feature vector, any one of batch gradient descent (BGD), stochas...
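Steps 4) and 5) can be sketched as a concatenation followed by a small network and a softmax classifier. The layer sizes, the fixed example weights, the learning rate, and the choice to train only the classifier layer are assumptions for illustration. Of the optimizers listed in step 5), only batch gradient descent is shown.

```python
import math

def dense(x, weights, bias):
    """Affine layer: `weights` has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(first, second, weights, bias):
    """Concatenate both feature vectors and pass them through one ReLU layer."""
    return [max(0.0, h) for h in dense(first + second, weights, bias)]

def mean_loss(batch, labels, w_out, b_out):
    """Mean cross-entropy of the softmax classifier over the batch."""
    return -sum(math.log(softmax(dense(x, w_out, b_out))[y])
                for x, y in zip(batch, labels)) / len(batch)

def bgd_step(batch, labels, w_out, b_out, lr):
    """One batch gradient descent step on the classifier, updating it in place."""
    grad_w = [[0.0] * len(row) for row in w_out]
    grad_b = [0.0] * len(b_out)
    for x, y in zip(batch, labels):
        probs = softmax(dense(x, w_out, b_out))
        for k, p in enumerate(probs):
            delta = p - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
            grad_b[k] += delta
            for j, v in enumerate(x):
                grad_w[k][j] += delta * v
    n = len(batch)
    for k in range(len(w_out)):
        b_out[k] -= lr * grad_b[k] / n
        for j in range(len(w_out[k])):
            w_out[k][j] -= lr * grad_w[k][j] / n

# Hypothetical fusion weights (4 units, 3 + 2 inputs); a real DNN would learn them.
w_fuse = [
    [0.5, -0.3, 0.2, 0.1, 0.4],
    [-0.2, 0.6, 0.1, 0.3, -0.1],
    [0.3, 0.3, -0.4, 0.2, 0.2],
    [-0.1, 0.2, 0.5, -0.3, 0.1],
]
b_fuse = [0.1] * 4
# Toy (first vector, second vector) pairs for two speakers.
pairs = [([1.0, 0.0, 0.5], [0.2, 0.9]), ([0.0, 1.0, 0.1], [0.8, 0.1])]
labels = [0, 1]
thirds = [fuse(a, b, w_fuse, b_fuse) for a, b in pairs]  # third feature vectors

w_out = [[0.0] * 4 for _ in range(2)]
b_out = [0.0, 0.0]
loss_before = mean_loss(thirds, labels, w_out, b_out)
for _ in range(50):
    bgd_step(thirds, labels, w_out, b_out, lr=0.5)
loss_after = mean_loss(thirds, labels, w_out, b_out)
```

With zero initial classifier weights the loss starts at ln 2 for two classes and falls as the steps proceed. In a full system the gradient would also flow back through the fusion network and both CNN branches.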

Embodiment 2

[0137] 1) The audio acquisition module 101 is used to acquire original audio;

[0138] 2) The original spectral feature acquisition module 102 is used to extract the original spectral feature data in the original audio and transmit the original spectral feature data to the aggregation layer feature acquisition module 104;

[0139] 3) The MFCC feature acquisition module 103 is used to extract the MFCC feature data in the original audio and transmit the MFCC feature data to the aggregation layer feature acquisition module 104;

[0140] 4) The aggregation layer feature acquisition module 104 is configured to receive the original spectral feature data and the MFCC feature data, extract the first feature vector and the second feature vector respectively, and transmit the first feature vector and the second feature vector to the fusion module 105;

[0141] 5) The fusion module 105 is configured to receive the first feature vector and the second feature vector and input the first fea...
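The data flow between the modules listed above can be sketched as a container of callables. The class name, the field names, and the lambda stand-ins are hypothetical. The speaker classification module is named in the abstract, but its reference number does not appear in this excerpt, so it is left unnumbered.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

@dataclass
class VoiceprintSystem:
    acquire_audio: Callable[[str], Sequence[float]]             # module 101
    extract_spectrum: Callable[[Sequence[float]], list]         # module 102
    extract_mfcc: Callable[[Sequence[float]], list]             # module 103
    aggregate: Callable[[list, list], Tuple[Vector, Vector]]    # module 104
    fuse: Callable[[Vector, Vector], Vector]                    # module 105
    classify: Callable[[Vector], int]                           # speaker classification module

    def identify(self, source: str) -> int:
        audio = self.acquire_audio(source)
        spectrum = self.extract_spectrum(audio)          # 102 sends its data to 104
        mfcc = self.extract_mfcc(audio)                  # 103 sends its data to 104
        first, second = self.aggregate(spectrum, mfcc)   # 104 sends both vectors to 105
        third = self.fuse(first, second)
        return self.classify(third)

# Trivial stand-ins that only demonstrate the wiring, not real signal processing.
system = VoiceprintSystem(
    acquire_audio=lambda source: [0.1, 0.4, -0.2, 0.3],
    extract_spectrum=lambda audio: [abs(x) for x in audio],
    extract_mfcc=lambda audio: [x * x for x in audio],
    aggregate=lambda spec, mfcc: ([sum(spec) / len(spec)], [sum(mfcc) / len(mfcc)]),
    fuse=lambda first, second: first + second,
    classify=lambda third: max(range(len(third)), key=third.__getitem__),
)
speaker_id = system.identify("hypothetical.wav")
```

Keeping each module as a separate callable mirrors the module boundaries in the embodiment, so any single stage can be swapped without touching the others.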


Abstract

The invention discloses a voiceprint recognition method and system based on fusion of multiple voice features. The method comprises the following steps: acquiring an original audio; extracting original spectral features from the original audio, aggregating them, and outputting a first feature vector; extracting MFCC features from the original audio, aggregating them, and outputting a second feature vector; inputting the first feature vector and the second feature vector into a deep neural network for feature fusion, and outputting a third feature vector; and carrying out speaker classification according to the third feature vector. The system comprises an audio acquisition module, an original spectral feature acquisition module, an MFCC feature acquisition module, an aggregation layer feature acquisition module, a fusion module, and a speaker classification module.

Description

Technical field

[0001] The invention relates to the technical field of voiceprint recognition, in particular to a voiceprint recognition method and system based on fusion of multiple voice features.

Background technique

[0002] With the development of information technology and the spread of the Internet, more and more applications require personal identification. Traditional needs include online accounts, online payment, and access control. The application and promotion of artificial intelligence calls for different identification methods and systems suited to the habits and characteristics of different people, such as fingerprint, face, and voiceprint recognition. Voiceprint recognition is a biometric technology: it generates an identity vector that represents the identity information of the speaker, and by calculating the similarity between the identity vectors of two utterances it can be determined whether the i...


Application Information


IPC(8): G10L17/02, G10L17/04, G10L17/18, G10L25/18, G10L25/24, G10L25/30

CPC: G10L17/02, G10L17/18, G10L17/04, G10L25/18, G10L25/24, G10L25/30

Inventor: 陈华官, 张志齐

Owner: SHANGHAI YITU NETWORK SCI & TECH
