A Speaker Recognition Method Based on Multi-Stream Hierarchical Fusion Transform Features and Long Short-Term Memory Networks

A speaker identification technology based on long short-term memory networks, applied in the field of speaker identification, which addresses the problems of poor identification results, failure to obtain better results as the speaker set grows, and the inability of shallow features to effectively describe the deep-seated characteristic differences between speakers.

Active Publication Date: 2020-05-22
SOUTH CHINA UNIV OF TECH
20 Cites · 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] 1. Shallow features such as MFCC cannot effectively describe the deep-seated characteristic differences between speakers, and a single stream of bottleneck features cannot represent those differences from multiple aspects.
[0005] 2. With current speaker modeling methods such as GMM, the recognition result gradually deteriorates as the number of speakers increases, and better results cannot be achieved.

Method used



Examples


Embodiment

[0079] Figure 1 shows the flow chart of the embodiment of the present invention; the concrete steps are as follows:

[0080] S1. Acoustic feature extraction: Extracting Filterbank features and MFCC features from speech samples, specifically including the following steps:

[0081] S1.1. Pre-emphasis: filter the input speech with a filter whose transfer function is f(z) = 1 − αz⁻¹, where α takes a value in the range [0.9, 1];
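Step S1.1 can be sketched in numpy as follows. This is a minimal illustration, not the patent's exact implementation; the handling of the first sample is a common convention the text does not specify, and α = 0.97 is only a typical default within the stated range [0.9, 1].

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Apply the pre-emphasis filter f(z) = 1 - alpha * z^-1.

    Each output sample is y[n] = x[n] - alpha * x[n-1]; the first
    sample is passed through unchanged (a common convention).
    """
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))
```

Pre-emphasis boosts high frequencies, compensating for the spectral tilt of voiced speech before feature extraction.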

[0082] S1.2. Framing: after pre-emphasis, the speech is divided into frames of a specific length, with frame length L and frame shift S; the speech of the r-th frame is expressed as x_r(n), where 1 ≤ r ≤ R and 0 ≤ n ≤ N−1, and R and N denote the number of frames and the number of sampling points per frame, respectively;
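A sketch of the framing step under the assumption (not stated in the text) that trailing samples shorter than one frame are discarded rather than padded:

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int, frame_shift: int) -> np.ndarray:
    """Split x into overlapping frames: length L = frame_len, shift S = frame_shift.

    Returns an array of shape (R, L), where R is the number of frames
    that fit entirely inside the signal (no padding).
    """
    num_frames = 1 + (len(x) - frame_len) // frame_shift
    # Index matrix: row r picks samples r*S .. r*S + L - 1
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(num_frames)[:, None]
    return x[idx]
```

With, say, a 25 ms frame length and 10 ms shift at 16 kHz, L = 400 and S = 160; the specific values are training choices the text leaves open.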

[0083] S1.3. Windowing: multiply each frame of speech by the window function w(n). The window function is a Hamming window, recorded as:

[0084] w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1

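The windowing step, using the standard Hamming window definition:

```python
import numpy as np

def hamming(N: int) -> np.ndarray:
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def apply_window(frames: np.ndarray) -> np.ndarray:
    """Multiply each row (one frame) of `frames` by the Hamming window."""
    return frames * hamming(frames.shape[1])
```

The window tapers the frame edges toward a small value (0.08 at n = 0 and n = N−1), reducing spectral leakage in the FFT of the next step.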
[0085] S1.4. Extracting Filterbank features and MFCC features, the specif...
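The text of S1.4 is truncated here. A typical computation of log mel filterbank (Filterbank) features and MFCCs from the windowed frames looks as follows; this is a standard sketch, not necessarily the exact parameters or procedure of the patent (the sampling rate, FFT size, number of filters, and number of cepstral coefficients are all assumptions):

```python
import numpy as np

def mel_filterbank(num_filters: int, nfft: int, sr: int) -> np.ndarray:
    """Triangular mel-spaced filterbank, shape (num_filters, nfft//2 + 1)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edge frequencies, equally spaced on the mel scale
    pts = imel(np.linspace(0.0, mel(sr / 2), num_filters + 2))
    bins = np.floor((nfft + 1) * pts / sr).astype(int)
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(num_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def fbank_and_mfcc(frames, sr=16000, nfft=512, num_filters=40, num_ceps=13):
    """Log mel filterbank features and MFCCs from windowed frames (R, N)."""
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft        # (R, nfft//2+1)
    fbank = np.log(power @ mel_filterbank(num_filters, nfft, sr).T + 1e-10)
    # MFCC = DCT-II of the log filterbank energies, keeping num_ceps coefficients
    n = np.arange(num_filters)
    dct = np.cos(np.pi * np.outer(np.arange(num_ceps), n + 0.5) / num_filters)
    return fbank, fbank @ dct.T
```

The Filterbank features keep the full log mel spectrum; the MFCCs decorrelate it with a DCT, which is why the patent treats them as two distinct feature streams.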



Abstract

The invention discloses a speaker recognition method based on multi-stream hierarchical fusion transform features and a long short-term memory network. The method comprises: extracting a Filterbank feature and a Mel-frequency cepstral coefficient (MFCC) feature from a speech sample as two feature streams; inputting the two feature streams into two deep belief networks with bottleneck layers respectively to carry out feature transformation, thereby obtaining two bottleneck feature streams; splicing the two bottleneck feature streams and inputting the spliced stream into a third deep belief network with a bottleneck layer to carry out feature transformation, so that a fused, transformed feature is obtained; and then determining the speaker to which the voice sample belongs by using a long short-term memory network. According to the invention, fusion transformation is carried out on the input acoustic features by using multiple deep belief networks; compared with a single acoustic feature, or a single feature transformed by one neural network, the fused, transformed feature better depicts the characteristic differences between speakers, so that an excellent effect is obtained in speaker identification.
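The data flow described in the abstract can be sketched with numpy. All dimensions here are toy choices, and each trained DBN bottleneck transform is replaced by a single random sigmoid projection purely to show the shapes; the actual networks would be pretrained deep belief networks whose weights the abstract does not give:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy dimensions (assumptions, not fixed by the method)
T, d_fbank, d_mfcc, d_bn = 100, 40, 13, 30   # frames, feature dims, bottleneck dim

fbank = rng.standard_normal((T, d_fbank))    # Filterbank feature stream
mfcc  = rng.standard_normal((T, d_mfcc))     # MFCC feature stream

# Stand-ins for the bottleneck layers of the three trained DBNs
W1 = rng.standard_normal((d_fbank, d_bn))
W2 = rng.standard_normal((d_mfcc, d_bn))
W3 = rng.standard_normal((2 * d_bn, d_bn))

bn1 = sigmoid(fbank @ W1)                          # bottleneck features, stream 1
bn2 = sigmoid(mfcc @ W2)                           # bottleneck features, stream 2
spliced = np.concatenate([bn1, bn2], axis=1)       # splice the two streams
fused = sigmoid(spliced @ W3)                      # hierarchically fused feature
# 'fused' (shape (T, d_bn)) would then be fed frame by frame to an LSTM
# classifier that outputs the speaker identity.
```

The hierarchy is the point: the third network sees both bottleneck streams at once, so the fused feature can encode cross-stream speaker cues that neither stream carries alone.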

Description

Technical field

[0001] The invention relates to the technical fields of speech processing and deep learning, in particular to a speaker identification method based on multi-stream hierarchical fusion transform features and a long short-term memory network.

Background technique

[0002] Pattern recognition is a hot spot in current research, and speaker recognition is one of its subfields. Speaker recognition refers to distinguishing a speaker's identity from an existing speaker set through a piece of speech. At present, the Mel-Frequency Cepstral Coefficient (MFCC) feature, the Filterbank feature, and the i-vector feature are the most commonly used audio features for describing differences in speaker characteristics, and they have achieved good speaker recognition results. However, these are all shallow features, which cannot deeply represent the characteristic differences between speakers, and so have certain limitations. In recent years, with the development of deep learning t...

Claims


Application Information

Patent Timeline
Patent Type & Authority Patents(China)
IPC(8): G10L17/20, G10L17/18, G10L17/02, G10L17/04
CPC: G10L17/02, G10L17/04, G10L17/18, G10L17/20
Inventor: 李鹏乾, 李艳雄
Owner SOUTH CHINA UNIV OF TECH