Multi-target learning far field speech recognition method based on amplitude and phase information

This technology concerns phase information and speech recognition, and applies to speech recognition, speech analysis, instruments, and related fields. It addresses problems such as interference with the target speech, reduced speech recognition accuracy, and degraded performance.

Status: Inactive
Publication Date: 2019-05-17
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

[0003] After years of research, near-field speech recognition technology has made major breakthroughs and its performance has improved greatly, but far-field speech recognition still faces many problems. The target speech is subject to interference, which reduces the accuracy of speech recognition and causes a sharp drop in performance.

Examples

Embodiment Construction

[0044] The present invention will be further described below through specific embodiments and accompanying drawings. The embodiments of the present invention are for better understanding of the present invention by those skilled in the art, and do not limit the present invention in any way.

[0045] As shown in Figure 3, a far-field speech recognition method based on multi-target learning of amplitude and phase information comprises the following steps:

[0046] Step 1, input data preparation: the data set is the data provided by the REVERB 2014 challenge, and data preparation is carried out separately for the training set, the development set and the verification set;

[0047] Step 2, feature extraction:

[0048] 1) Feature extraction based on amplitude information: the signal is divided into frames and windowed; for each short-time analysis window, a fast Fourier transform converts the signal from the time domain to the frequency domain to obtain the corresponding spectrum, and t...
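
The framing, windowing and fast Fourier transform steps named in [0048] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `magnitude_spectrogram`, the Hamming window, and the frame length, hop size and FFT size (25 ms frames with a 10 ms hop at an assumed 16 kHz sampling rate) are all assumptions, since the source text is truncated before it gives any parameters.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Frame, window and FFT a 1-D signal; return per-frame magnitude spectra.

    All default parameters are illustrative assumptions, not values from
    the patent text.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad short inputs so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix: row i selects the samples that belong to frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # Windowing: taper each frame (Hamming window assumed here).
    frames = signal[idx] * np.hamming(frame_len)
    # Time domain -> frequency domain, one FFT per short-time frame.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    # The amplitude information is the magnitude of the complex spectrum.
    return np.abs(spectrum)

# Usage: one second of a 1 kHz tone sampled at 16 kHz (assumed rate).
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
mag = magnitude_spectrogram(tone)
```

With these assumed settings the result has one row per frame and `n_fft // 2 + 1` frequency bins per row. The phase of the same complex spectrum, which this sketch discards, is where phase-based features would start from.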

Abstract

The invention discloses a multi-target learning far-field speech recognition method based on amplitude and phase information, comprising the following steps: S1, preparing the input data; S2, extracting an amplitude feature and several phase features; and S3, constructing a multi-task deep neural network, feeding the extracted amplitude and phase features into the network for training, and outputting enhanced speech and enhanced features. The enhanced speech is used for SRMR evaluation, and the enhanced features are used for speech recognition. By using multi-target learning, the disclosed method enhances the speech and the features simultaneously. Compared with existing methods, and because the group delay feature (MGDCC) performs poorly on reverberant speech, the method adds another phase feature, the channel information of a phase-domain source separation method (PBSFVT), to compensate for the weakness of MGDCC and thereby improve speech recognition accuracy.
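
The multi-task arrangement in step S3 can be pictured with a small forward-pass sketch: amplitude and phase features are concatenated, passed through a shared layer, and split into two output heads, one for enhanced speech and one for enhanced features. Everything concrete here is an assumption made for illustration, including the single shared layer, the ReLU activation, the mean-squared-error losses, the weighting factor `alpha`, and all function names and layer sizes; the abstract specifies none of them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def make_network(n_amp, n_phase, n_hidden, n_speech_out, n_feat_out):
    """Hypothetical layout: one shared layer feeding two task heads."""
    return {
        "shared": init_layer(n_amp + n_phase, n_hidden),
        "speech_head": init_layer(n_hidden, n_speech_out),
        "feat_head": init_layer(n_hidden, n_feat_out),
    }

def forward(net, amp, phase):
    """Return (enhanced_speech, enhanced_features) for a batch of frames."""
    # Amplitude and phase features are joined into one input vector.
    x = np.concatenate([amp, phase], axis=1)
    w, b = net["shared"]
    h = np.maximum(0.0, x @ w + b)  # shared representation (ReLU assumed)
    ws, bs = net["speech_head"]
    wf, bf = net["feat_head"]
    return h @ ws + bs, h @ wf + bf

def multitask_loss(speech_pred, speech_ref, feat_pred, feat_ref, alpha=0.5):
    """Weighted sum of the two task losses (MSE and alpha are assumptions)."""
    mse_speech = np.mean((speech_pred - speech_ref) ** 2)
    mse_feat = np.mean((feat_pred - feat_ref) ** 2)
    return alpha * mse_speech + (1.0 - alpha) * mse_feat

# Usage with made-up sizes: 257 amplitude bins, 26 phase coefficients.
net = make_network(n_amp=257, n_phase=26, n_hidden=64,
                   n_speech_out=257, n_feat_out=40)
amp = rng.normal(size=(5, 257))
phase = rng.normal(size=(5, 26))
speech_out, feat_out = forward(net, amp, phase)
```

Sharing the hidden layer lets both objectives shape one representation, which is the usual motivation for multi-task training. The sketch covers only the forward pass and the combined loss, with no training loop.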

Description

Technical field

[0001] The invention belongs to the technical field of far-field speech recognition, and in particular relates to a far-field speech recognition method based on multi-target learning of amplitude and phase information.

Background technique

[0002] Voice interaction is the most direct and natural way of communication in human society. As one of the key technologies, speech recognition can convert speech signals into text by recognizing speech signals. Speech recognition is an interdisciplinary subject involving a wide range of fields, and its ultimate goal is to enable humans to interact with computers by voice.

[0003] After years of research, near-field speech recognition technology has made major breakthroughs and greatly improved performance, but there are still many problems in far-field speech recognition technology. This reduces the accuracy of speech recognition and leads to a sharp drop in performance. Therefore, it is necessary to perform speech...

Application Information

IPC(8): G10L15/16, G10L15/01, G10L21/0232, G10L21/0264
Inventor: 党建武, 崔凌赫, 王龙标, 李东播
Owner: TIANJIN UNIV