A Speaker Recognition Method Based on Multi-Stream Hierarchical Fusion Transform Features and Long Short-Term Memory Networks

A speaker identification technology based on long short-term memory networks, applied in the field of speaker identification, which addresses the problems of poor identification results, failure to obtain better results as the speaker set grows, and the inability of shallow features to effectively describe the deep-seated characteristic differences between speakers.

Active Publication Date: 2020-05-22
SOUTH CHINA UNIV OF TECH
20 Cites · 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] 1. Shallow features such as MFCC cannot effectively describe the deep-seated characteristic differences between speakers, and a single stream of bottleneck features cannot represent those differences from multiple aspects.
[0005] 2. With current speaker modeling methods such as GMM, the recognition result gradually deteriorates as the number of speakers increases, and better results cannot be achieved.

Method used



Examples


Embodiment

[0079] Figure 1 shows the flow chart of the embodiment of the present invention; the concrete steps are as follows:

[0080] S1. Acoustic feature extraction: Extracting Filterbank features and MFCC features from speech samples, specifically including the following steps:

[0081] S1.1. Pre-emphasis: filter the input speech with a filter whose transfer function is f(z) = 1 − αz⁻¹, where α takes a value in the range [0.9, 1];
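Step S1.1 can be sketched in numpy as follows. This is a minimal illustration, not the patent's exact implementation; the handling of the first sample is a common convention the text does not specify, and α = 0.97 is only a typical default within the stated range [0.9, 1].

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Apply the pre-emphasis filter f(z) = 1 - alpha * z^-1.

    Each output sample is y[n] = x[n] - alpha * x[n-1]; the first
    sample is passed through unchanged (a common convention).
    """
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))
```

Pre-emphasis boosts high frequencies, compensating for the spectral tilt of voiced speech before feature extraction.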

[0082] S1.2. Framing: after pre-emphasis, the speech is divided into frames of a specific length, with frame length L and frame shift S; the speech of the r-th frame is expressed as x_r(n), where 1 ≤ r ≤ R and 0 ≤ n ≤ N−1, and R and N denote the number of frames and the number of sampling points per frame, respectively;
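A sketch of the framing step under the assumption (not stated in the text) that trailing samples shorter than one frame are discarded rather than padded:

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int, frame_shift: int) -> np.ndarray:
    """Split x into overlapping frames: length L = frame_len, shift S = frame_shift.

    Returns an array of shape (R, L), where R is the number of frames
    that fit entirely inside the signal (no padding).
    """
    num_frames = 1 + (len(x) - frame_len) // frame_shift
    # Index matrix: row r picks samples r*S .. r*S + L - 1
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(num_frames)[:, None]
    return x[idx]
```

With, say, a 25 ms frame length and 10 ms shift at 16 kHz, L = 400 and S = 160; the specific values are training choices the text leaves open.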

[0083] S1.3. Windowing: multiply each frame of speech by the window function w(n). The window function is a Hamming window, recorded as:

[0084] w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1

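The windowing step, using the standard Hamming window definition:

```python
import numpy as np

def hamming(N: int) -> np.ndarray:
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def apply_window(frames: np.ndarray) -> np.ndarray:
    """Multiply each row (one frame) of `frames` by the Hamming window."""
    return frames * hamming(frames.shape[1])
```

The window tapers the frame edges toward a small value (0.08 at n = 0 and n = N−1), reducing spectral leakage in the FFT of the next step.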
[0085] S1.4. Extracting Filterbank features and MFCC features, the specif...
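The text of S1.4 is truncated here. A typical computation of log mel filterbank (Filterbank) features and MFCCs from the windowed frames looks as follows; this is a standard sketch, not necessarily the exact parameters or procedure of the patent (the sampling rate, FFT size, number of filters, and number of cepstral coefficients are all assumptions):

```python
import numpy as np

def mel_filterbank(num_filters: int, nfft: int, sr: int) -> np.ndarray:
    """Triangular mel-spaced filterbank, shape (num_filters, nfft//2 + 1)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edge frequencies, equally spaced on the mel scale
    pts = imel(np.linspace(0.0, mel(sr / 2), num_filters + 2))
    bins = np.floor((nfft + 1) * pts / sr).astype(int)
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(num_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def fbank_and_mfcc(frames, sr=16000, nfft=512, num_filters=40, num_ceps=13):
    """Log mel filterbank features and MFCCs from windowed frames (R, N)."""
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft        # (R, nfft//2+1)
    fbank = np.log(power @ mel_filterbank(num_filters, nfft, sr).T + 1e-10)
    # MFCC = DCT-II of the log filterbank energies, keeping num_ceps coefficients
    n = np.arange(num_filters)
    dct = np.cos(np.pi * np.outer(np.arange(num_ceps), n + 0.5) / num_filters)
    return fbank, fbank @ dct.T
```

The Filterbank features keep the full log mel spectrum; the MFCCs decorrelate it with a DCT, which is why the patent treats them as two distinct feature streams.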



Abstract

The invention discloses a speaker recognition method based on multi-stream hierarchical fusion transform features and a long short-term memory network. The method comprises: extracting a Filterbank feature and a Mel-frequency cepstral coefficient (MFCC) feature from a speech sample as two feature streams; inputting the two feature streams into two deep belief networks with bottleneck layers respectively to carry out feature transformation, thereby obtaining two bottleneck feature streams; splicing the two bottleneck feature streams and inputting the spliced stream into a third deep belief network with a bottleneck layer to carry out feature transformation, so that a fused, transformed feature is obtained; and then determining the speaker to which the voice sample belongs by using a long short-term memory network. According to the invention, fusion transformation is carried out on the input acoustic features by using multiple deep belief networks; compared with a single acoustic feature, or a single feature transformed by one neural network, the fused, transformed feature better depicts the characteristic differences between speakers, so that an excellent effect is obtained in speaker identification.
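The data flow described in the abstract can be sketched with numpy. All dimensions here are toy choices, and each trained DBN bottleneck transform is replaced by a single random sigmoid projection purely to show the shapes; the actual networks would be pretrained deep belief networks whose weights the abstract does not give:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy dimensions (assumptions, not fixed by the method)
T, d_fbank, d_mfcc, d_bn = 100, 40, 13, 30   # frames, feature dims, bottleneck dim

fbank = rng.standard_normal((T, d_fbank))    # Filterbank feature stream
mfcc  = rng.standard_normal((T, d_mfcc))     # MFCC feature stream

# Stand-ins for the bottleneck layers of the three trained DBNs
W1 = rng.standard_normal((d_fbank, d_bn))
W2 = rng.standard_normal((d_mfcc, d_bn))
W3 = rng.standard_normal((2 * d_bn, d_bn))

bn1 = sigmoid(fbank @ W1)                          # bottleneck features, stream 1
bn2 = sigmoid(mfcc @ W2)                           # bottleneck features, stream 2
spliced = np.concatenate([bn1, bn2], axis=1)       # splice the two streams
fused = sigmoid(spliced @ W3)                      # hierarchically fused feature
# 'fused' (shape (T, d_bn)) would then be fed frame by frame to an LSTM
# classifier that outputs the speaker identity.
```

The hierarchy is the point: the third network sees both bottleneck streams at once, so the fused feature can encode cross-stream speaker cues that neither stream carries alone.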

Description

Technical field

[0001] The invention relates to the technical fields of speech processing and deep learning, in particular to a speaker identification method based on multi-stream hierarchical fusion transform features and a long short-term memory network.

Background technique

[0002] Pattern recognition is a hot spot in current research, and speaker recognition is one of its subfields. Speaker recognition refers to distinguishing a speaker's identity from an existing speaker set through a piece of speech. At present, the Mel-Frequency Cepstral Coefficient (MFCC) feature, the Filterbank feature, and the i-vector feature are the most commonly used audio features for describing differences in speaker characteristics, and they have achieved good speaker recognition results. However, these are all shallow features, which cannot deeply represent the characteristic differences between speakers, and so have certain limitations. In recent years, with the development of deep learning t...

Claims


Application Information

Patent Timeline
Patent Type & Authority Patents(China)
IPC(8): G10L17/20, G10L17/18, G10L17/02, G10L17/04
CPC: G10L17/02, G10L17/04, G10L17/18, G10L17/20
Inventor: 李鹏乾, 李艳雄
Owner SOUTH CHINA UNIV OF TECH