Multi-attention feature fusion speaker recognition method

A speaker recognition and feature fusion technology, applied in character and pattern recognition, speech analysis, instruments, and related fields. It addresses the inability of existing methods to make full use of multi-branch features, and it suppresses noise while enhancing effective information.

Pending Publication Date: 2021-12-07

JIANGSU UNIV



Problems solved by technology

Researchers have found that multi-branch features help a model learn more discriminative speaker representations. This approach maps features to different branches using different convolution kernel parameters, processes each branch separately, and finally fuses the features of all branches. Traditional methods fuse multi-branch features by addition or concatenation, which cannot fully exploit the characteristics of the multi-branch features.
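The two traditional fusion strategies mentioned above can be sketched as follows. This is a minimal illustration, assuming each branch output is a flat list of floats; the function names `fuse_by_addition` and `fuse_by_concatenation` are hypothetical and do not come from the patent.

```python
from typing import List

Vector = List[float]


def fuse_by_addition(branches: List[Vector]) -> Vector:
    """Element-wise sum: every branch contributes with the same fixed weight."""
    length = len(branches[0])
    if any(len(b) != length for b in branches):
        raise ValueError("all branches must have the same length")
    return [sum(values) for values in zip(*branches)]


def fuse_by_concatenation(branches: List[Vector]) -> Vector:
    """Splicing: branch outputs are placed side by side, unweighted."""
    return [v for b in branches for v in b]


# Toy branch outputs (made-up numbers, for illustration only).
branch_a = [1.0, 2.0, 3.0]
branch_b = [0.5, 0.5, 0.5]
added = fuse_by_addition([branch_a, branch_b])
spliced = fuse_by_concatenation([branch_a, branch_b])
```

In both cases the contribution of each branch is fixed in advance and does not depend on the input, which is the limitation the text describes.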


Embodiment Construction

[0023] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, but the protection scope of the present invention is not limited thereto.

[0024] As shown in Figure 1, in the multi-attention feature fusion speaker recognition method of the present invention, a short-time Fourier transform is applied to the speech signal to obtain a spectrogram, and the spectrogram is passed through a Mel filter bank to obtain Fbank features, which serve as the input features of the deep speaker representation model. The deep speaker representation model includes a feature extractor and a speaker classifier. The feature extractor extracts a speaker representation (speaker embedding) from the Fbank features; the speaker representation captures the voiceprint information in the speech signal. In the training phase of the deep speaker representation model, the speaker classifier is used to map the speaker representation to ...
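The front end described in this paragraph (short-time Fourier transform, Mel filtering, Fbank features) might look roughly like the sketch below. It uses only the Python standard library and a naive DFT for clarity, not speed. The sample rate, frame length, hop size, Hamming window, and number of filters are all assumptions chosen for illustration; the excerpt does not state the values used in the patent.

```python
import cmath
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT power spectrum of one windowed frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft  # centre frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def fbank(signal: List[float], sample_rate: int = 16000, frame_len: int = 400,
          hop: int = 160, n_filters: int = 40) -> List[List[float]]:
    """Log mel filter-bank energies, one row per frame (all defaults are assumptions)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        # The small constant avoids log(0) for empty or silent filters.
        features.append([math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                         for row in bank])
    return features


# Small synthetic example: a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
feats = fbank(tone, sample_rate=8000, frame_len=64, hop=32, n_filters=8)
```

Each row of `feats` is the Fbank vector for one frame; a real system would typically use an FFT library instead of the naive DFT shown here.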


Abstract

The invention provides a multi-attention feature fusion speaker recognition method comprising the following steps: constructing a deep speaker representation model that comprises a feature extractor and a speaker classifier; taking Fbank features as the input of the deep speaker representation model and extracting a speaker representation from the Fbank features through the feature extractor; in the training stage, using the speaker classifier to map the speaker representation to a speaker label and constructing a loss function to optimize the deep speaker representation model; and in the test stage, comparing the similarity between speaker representations using the cosine distance and judging, according to a threshold, whether two utterances come from the same speaker. The method performs weighted fusion of the features of different branches through multi-attention feature fusion, which comprises a spatial attention mechanism and a channel attention mechanism, so that the effective information in each branch is enhanced and more robust speaker recognition performance is obtained.
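A rough sketch of two ideas from the abstract follows: weighted fusion of branch features using channel and spatial attention, and cosine scoring against a threshold. The attention weights here are computed by parameter-free pooling followed by a softmax across branches, which is a simplifying assumption; the excerpt does not describe the patent's actual attention modules, and a trained model would learn them. The function names and the threshold value of 0.5 are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # indexed as [channel][time]


def softmax(values: List[float]) -> List[float]:
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branches: List[FeatureMap]) -> FeatureMap:
    """Weighted sum of branches using channel and spatial weights.

    Assumption: weights come from average pooling plus a softmax across
    branches; this stands in for learned attention modules.
    """
    n_ch, n_t = len(branches[0]), len(branches[0][0])
    # Channel attention: one weight per (channel, branch) from the time average.
    chan_w = [softmax([sum(b[c]) / n_t for b in branches]) for c in range(n_ch)]
    # Spatial attention: one weight per (position, branch) from the channel average.
    spat_w = [softmax([sum(b[c][t] for c in range(n_ch)) / n_ch for b in branches])
              for t in range(n_t)]
    fused = []
    for c in range(n_ch):
        row = []
        for t in range(n_t):
            # Averaging the two weights keeps them summing to 1 over branches.
            row.append(sum(0.5 * (chan_w[c][i] + spat_w[t][i]) * b[c][t]
                           for i, b in enumerate(branches)))
        fused.append(row)
    return fused


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_speaker(emb_a: List[float], emb_b: List[float], threshold: float = 0.5) -> bool:
    """Accept the trial if similarity reaches the threshold (0.5 is a placeholder)."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy inputs: two branches with 2 channels and 2 time steps each.
branch_1 = [[1.0, 2.0], [3.0, 4.0]]
branch_2 = [[4.0, 3.0], [2.0, 1.0]]
fused = attention_fuse([branch_1, branch_2])
decision = same_speaker([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Unlike plain addition or concatenation, the weight each branch receives here depends on the input, so a branch with a stronger response at a given channel or position contributes more to the fused feature. In practice the decision threshold would be tuned on held-out verification trials.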

Description

Technical Field

[0001] The invention belongs to the technical field of artificial intelligence, and in particular relates to a speaker recognition method for fusion of multiple attention features.

Background Technique

[0002] With the development of voice technology, more and more devices support voice control, such as smartphones, smart speakers and smart cars. In order to improve the security of voice control, speaker recognition technology is often added to these smart devices as a front-end service to ensure that only specific speakers use these voice services. Speaker recognition is a very hot topic, and many methods have been proposed to solve this problem.

[0003] The core step of speaker recognition is to extract speaker representations from speech signals. In the early days, the probability density function of the speech signal was used to describe the identity information of the speaker, and the Gaussian Mixture Model-Universal Background Model (GMM-UBM) was on...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L17/04, G10L17/18, G06K9/62

CPC: G10L17/04, G10L17/18, G06F18/253

Inventor: 毛启容, 秦友才, 万子楷, 任庆桦

Owner: JIANGSU UNIV
