Multi-target learning far field speech recognition method based on amplitude and phase information

A technology combining phase information and speech recognition, applied in speech recognition, speech analysis, instruments, etc., that addresses problems such as target speech interference, reduced speech recognition accuracy, and performance degradation.

Inactive Publication Date: 2019-05-17

TIANJIN UNIV

6 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

[0003] After years of research, near-field speech recognition technology has made major breakthroughs and greatly improved in performance, but far-field speech recognition still has many problems. Interference with the target speech reduces the accuracy of speech recognition and causes a sharp drop in performance.

Embodiment Construction

[0044] The present invention is further described below through specific embodiments and the accompanying drawings. The embodiments are provided so that those skilled in the art can better understand the present invention, and they do not limit it in any way.

[0045] As shown in Figure 3, a far-field speech recognition method based on multi-target learning of amplitude and phase information includes the following steps:

[0046] Step 1, input data preparation: the data set is the data provided by the REVERB 2014 challenge, and the data in the training set, development set, and verification set are each prepared separately;

[0047] Step 2, feature extraction:

[0048] 1) Feature extraction based on amplitude information: the signal is framed and windowed, and each short-time analysis window is converted from the time domain to the frequency domain by a fast Fourier transform to obtain the corresponding spectrum, and t...
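The framing, windowing, and FFT stage described in step 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `magnitude_spectrogram`, the Hamming window, and the frame length, hop size, and FFT size (25 ms / 10 ms / 512 points at an assumed 16 kHz sampling rate) are all assumptions chosen for the example, and the processing that follows the spectrum in the truncated text is not reproduced.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Frame, window, and FFT a 1-D signal; return the magnitude per frame.

    All default sizes are illustrative assumptions, not values from the patent.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)  # assumed window type
    # Framing and windowing: one row per short-time analysis window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Time domain -> frequency domain; rfft keeps the non-negative frequencies.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.abs(spectrum)

# Usage on a synthetic 1 kHz tone, one second at an assumed 16 kHz rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
mag = magnitude_spectrogram(tone)
peak_bin = int(np.argmax(mag.mean(axis=0)))  # bin spacing is sr / n_fft Hz
```

With these assumed settings each FFT bin spans 31.25 Hz, so the tone's energy concentrates around bin 32, which is a quick sanity check that the framing and transform behave as expected.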

Abstract

The invention discloses a multi-target learning far-field speech recognition method based on amplitude and phase information, comprising the following steps: S1, preparing input data; S2, extracting an amplitude feature and several phase features; and S3, constructing a multi-task deep neural network, feeding the extracted amplitude and phase features into the network for training, and outputting enhanced speech and enhanced features. The enhanced speech is used for SRMR evaluation, and the enhanced features are used for speech recognition. By using multi-target learning, the disclosed method enhances the speech and the features simultaneously. Compared with existing methods, and given that the group delay feature (MGDCC) performs poorly on reverberant speech, it adds a second phase feature, the channel information obtained by a phase-domain source separation method (PBSFVT), to compensate for the weakness of MGDCC and thereby improve speech recognition accuracy.
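The two-output idea in S3 can be illustrated with a small sketch: amplitude and phase features are concatenated, passed through a shared hidden layer, and fed to two output heads, one for enhanced speech and one for enhanced features, whose errors are combined into one loss. Everything concrete here is an assumption made for illustration, since the abstract does not give the network details: the layer sizes, the ReLU activation, the mean-squared-error losses, the `weight` parameter, and the names `init_layer`, `forward`, and `multitask_loss` are hypothetical. Only the forward pass and the loss are shown, with no training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def forward(params, amp, phase):
    """Shared trunk on concatenated features, then two task-specific heads."""
    x = np.concatenate([amp, phase], axis=1)
    w, b = params["shared"]
    h = np.maximum(0.0, x @ w + b)  # shared hidden layer with ReLU (assumed)
    w_s, b_s = params["speech"]
    w_f, b_f = params["feat"]
    return h @ w_s + b_s, h @ w_f + b_f

def multitask_loss(params, amp, phase, clean_spec, clean_feat, weight=0.5):
    """Weighted sum of the two task losses (MSE is an assumed choice)."""
    spec_hat, feat_hat = forward(params, amp, phase)
    loss_speech = np.mean((spec_hat - clean_spec) ** 2)
    loss_feat = np.mean((feat_hat - clean_feat) ** 2)
    return weight * loss_speech + (1.0 - weight) * loss_feat

# Hypothetical dimensions: 257 amplitude bins, 40 phase coefficients,
# 64 hidden units, and a batch of 8 frames of random placeholder data.
n_amp, n_phase, n_hidden, n_feat, batch = 257, 40, 64, 40, 8
params = {
    "shared": init_layer(n_amp + n_phase, n_hidden),
    "speech": init_layer(n_hidden, n_amp),
    "feat": init_layer(n_hidden, n_feat),
}
amp = rng.normal(size=(batch, n_amp))
phase = rng.normal(size=(batch, n_phase))
clean_spec = rng.normal(size=(batch, n_amp))
clean_feat = rng.normal(size=(batch, n_feat))

spec_hat, feat_hat = forward(params, amp, phase)
loss = multitask_loss(params, amp, phase, clean_spec, clean_feat)
```

Sharing the trunk means both targets shape the same hidden representation, which is the usual motivation for multi-target training; in this sketch, `weight` controls how much each target contributes to the combined loss.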

Description

Technical field

[0001] The invention belongs to the technical field of far-field speech recognition, and in particular relates to a far-field speech recognition method based on multi-target learning of amplitude and phase information.

Background art

[0002] Voice interaction is the most direct and natural way of communication in human society. As one of its key technologies, speech recognition converts speech signals into text. Speech recognition is an interdisciplinary subject involving a wide range of fields, and its ultimate goal is to enable humans to interact with computers by voice.

[0003] After years of research, near-field speech recognition technology has made major breakthroughs and greatly improved in performance, but far-field speech recognition still has many problems. Interference with the target speech reduces the accuracy of speech recognition and leads to a sharp drop in performance. Therefore, it is necessary to perform speech...

Application Information

IPC(8): G10L15/16, G10L15/01, G10L21/0232, G10L21/0264

Inventors: 党建武, 崔凌赫, 王龙标, 李东播

Owner: TIANJIN UNIV
