Multi-attention feature fusion speaker recognition method

A speaker recognition and feature fusion technology, applied in character and pattern recognition, speech analysis, and instruments, which addresses the problem that traditional fusion cannot fully exploit multiple feature branches, achieving the effects of suppressing noise and enhancing effective information.

Pending Publication Date: 2021-12-07
JIANGSU UNIV

AI Technical Summary

Problems solved by technology

Researchers have found that multi-branch features help a model learn more discriminative speaker representations. In this approach, features are mapped to different branches through convolution kernels with different parameters, each branch is processed separately, and the branch features are finally fused. Traditional methods fuse the branches by element-wise addition or concatenation, which cannot fully exploit the characteristics of the multi-branch features.
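The contrast above can be made concrete with a minimal numpy sketch. This is not the patent's implementation: the per-branch scoring matrix `w`, the global average pooling, and the branch-axis softmax are illustrative stand-ins for the learned attention parameters described only abstractly in the source.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(branches, w):
    """Fuse multi-branch feature maps with per-branch, per-channel weights.

    branches: (B, C, T) array -- B branches, C channels, T frames.
    w:        (B, C) hypothetical scoring parameters (stand-in for the
              learned attention weights; not from the patent).
    Traditional fusion would be branches.sum(axis=0) (addition) or
    concatenation along the channel axis; here each branch is weighted
    per channel before summation, so informative branches dominate.
    """
    pooled = branches.mean(axis=2)           # (B, C) global average pool
    scores = pooled * w                      # (B, C) illustrative scoring
    alpha = softmax(scores, axis=0)          # weights sum to 1 across branches
    return (alpha[:, :, None] * branches).sum(axis=0)   # fused map, (C, T)
```

With identical branches and uniform `w`, the weights collapse to 1/B each and the fusion reduces to plain averaging, which makes the weighting easy to sanity-check.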




Embodiment Construction

[0023] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, but the protection scope of the present invention is not limited thereto.

[0024] As shown in Figure 1, in the multi-attention feature fusion speaker recognition method of the present invention, a short-time Fourier transform is applied to the speech signal to obtain a spectrogram, the spectrogram is passed through a Mel filter bank to obtain Fbank features, and the Fbank features serve as the input of the deep speaker representation model. The deep speaker representation model includes a feature extractor and a speaker classifier. The feature extractor maps the Fbank features to a speaker embedding, which represents the voiceprint information of the speaker in a speech signal. In the training phase of the deep speaker representation model, the speaker classifier is used to map the speaker representation to ...
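The front-end pipeline described above (short-time Fourier transform, Mel filter bank, log energies) can be sketched in plain numpy. The frame length, hop size, and filter count below are common defaults, not values taken from the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                 # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                 # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def fbank(signal, sr=16000, n_fft=512, hop=160, n_filters=40):
    """STFT power spectrogram -> Mel filter bank -> log energies (Fbank)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2    # spectrogram
    fb = mel_filterbank(n_filters, n_fft, sr)
    return np.log(power @ fb.T + 1e-10)                # (n_frames, n_filters)
```

The resulting (frames x filters) matrix is what the patent feeds to the feature extractor as the input feature map.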


Abstract

The invention provides a multi-attention feature fusion speaker recognition method comprising the following steps: constructing a deep speaker representation model that comprises a feature extractor and a speaker classifier; taking Fbank features as the input of the model and extracting them into a speaker representation through the feature extractor; in the training stage, using the speaker classifier to map the speaker representation to a speaker label and constructing a loss function to optimize the model; and in the test stage, comparing the similarity between speaker representations using the cosine distance and judging whether they come from the same speaker according to a threshold. The method performs weighted fusion on the features of different branches through multi-attention feature fusion, which comprises a spatial attention mechanism and a channel attention mechanism, so that the effective information in each branch is enhanced and more robust speaker recognition performance is obtained.
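The test-stage decision rule in the abstract (cosine distance plus a threshold) is standard and can be sketched directly; the threshold value below is a placeholder, since the patent does not state one.

```python
import numpy as np

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embeddings (1.0 = identical direction)."""
    return float(np.dot(emb1, emb2) /
                 (np.linalg.norm(emb1) * np.linalg.norm(emb2)))

def same_speaker(emb1, emb2, threshold=0.5):
    """Accept the trial as same-speaker when similarity exceeds the threshold.

    The 0.5 threshold is illustrative; in practice it is tuned on a
    development set (e.g. at the equal error rate operating point).
    """
    return cosine_score(emb1, emb2) >= threshold
```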

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence, and in particular relates to a speaker recognition method based on multi-attention feature fusion.

Background technique

[0002] With the development of voice technology, more and more devices support voice control, such as smartphones, smart speakers, and smart cars. To improve the security of voice control, speaker recognition is often added to these smart devices as a front-end service to ensure that only specific speakers can use the voice services. Speaker recognition is a very active topic, and many methods have been proposed to solve this problem.

[0003] The core step of speaker recognition is extracting speaker representations from speech signals. Early approaches described a speaker's identity information with the probability density function of the speech signal, and the Gaussian Mixture Model-Universal Background Model (GMM-UBM) was on...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L17/04; G10L17/18; G06K9/62
CPC: G10L17/04; G10L17/18; G06F18/253
Inventors: 毛启容, 秦友才, 万子楷, 任庆桦
Owner: JIANGSU UNIV