End-to-end bone and air conduction voice combined recognition method

A recognition method and speech technology, applied in speech recognition, speech analysis, instruments, etc., to improve the recognition performance and reduce the error rate

Pending Publication Date: 2022-05-13
NORTHWESTERN POLYTECHNICAL UNIV
Cites: 0 · Cited by: 1

AI Technical Summary

Problems solved by technology

Since there is no publicly available large-scale bone-air-conduction speech database that can be used for deep learning sp...



Examples


Specific embodiment

[0063] S1: Acquire synchronized bone-conduction and air-conduction voice data (x_a, x_b) to build a data set, where x_a is clean air-conduction speech recorded in an anechoic laboratory or other relatively quiet environment, and x_b is the bone-conduction speech recorded simultaneously. Downsample all speech to 16 kHz with 16-bit quantization. The model's input is noisy air-conduction speech together with bone-conduction speech, and its output is the text y corresponding to the speech. Because bone-conduction speech does not itself pick up environmental noise, noise is added only to the air-conduction speech within a chosen signal-to-noise-ratio range, i.e. x̃_a = x_a + n_a, where x̃_a is the noisy air-conducted speech and n_a is ambient noise. The final dataset is D = {(x̃_a, x_b, y)}. Then 84% of the dataset is set aside as a training set, 8% as a validation set, and the remaining 8% as a test set.
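The noise-mixing and splitting in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names (`mix_at_snr`, `split_dataset`) are hypothetical, and the patent does not specify its SNR range or scaling convention.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the clean/noise power ratio equals `snr_db` dB,
    then add it to `clean` (the x~_a = x_a + n_a step)."""
    # Tile or truncate the noise to match the speech length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Gain g such that 10*log10(p_clean / (g^2 * p_noise)) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

def split_dataset(items, seed=0):
    """84% / 8% / 8% train/validation/test split, as in step S1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    n_train, n_val = int(0.84 * len(items)), int(0.08 * len(items))
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

In practice the SNR is drawn at random per utterance so the model sees a range of noise levels during training.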

[0064] S2: Data Augmentation and Feature Extraction

[0065] S21: Change the speech rate of the speech signal to perform p...
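The speed perturbation named in S21 is commonly done by resampling the waveform and playing it back at the original rate, which changes tempo and pitch together. A minimal numpy sketch, assuming Kaldi-style factors such as 0.9/1.0/1.1 (the patent text is truncated before naming its exact values):

```python
import numpy as np

def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Change the speech rate by linear-interpolation resampling.

    factor > 1 speeds the utterance up (fewer output samples);
    factor < 1 slows it down. The output is meant to be played at
    the original sampling rate.
    """
    n_out = int(round(len(signal) / factor))
    # Positions in the original signal that each output sample maps to.
    src_pos = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(src_pos, np.arange(len(signal)), signal)
```

Production systems typically use a proper polyphase resampler (e.g. sox-style) rather than linear interpolation, but the augmentation logic is the same.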



Abstract

The invention discloses an end-to-end joint bone- and air-conduction voice recognition method comprising the following steps: first, synchronized air-conduction and bone-conduction voice data are acquired to construct a data set, with the corresponding text as output; data augmentation and acoustic feature extraction are performed on the air-conduction and bone-conduction voice signals; next, a Conformer-based end-to-end deep neural network model is built, composed of three parts: two branch networks that process air-conduction and bone-conduction voice respectively, and a fusion network based on a multimodal Transducer; the neural network is then trained, and the recognition result is finally obtained from the trained network. Compared with traditional methods that recognize speech from the air-conduction signal alone, the joint recognition method significantly reduces the speech recognition error rate and improves the overall recognition performance of the system.
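The three-part structure described above (two branch encoders plus a fusion network) can be illustrated at the level of tensor shapes. This numpy sketch is only a data-flow stand-in: each Conformer branch is replaced by a single linear projection, and the multimodal-Transducer fusion is replaced by concatenation plus a linear scoring layer; none of this reflects the patent's actual layer configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearEncoder:
    """Stand-in for one Conformer branch encoder
    (air-conduction or bone-conduction)."""
    def __init__(self, d_in: int, d_model: int):
        self.w = rng.standard_normal((d_in, d_model)) * 0.01

    def __call__(self, feats):          # (T, d_in) -> (T, d_model)
        return feats @ self.w

class FusionDecoder:
    """Stand-in for the multimodal fusion network: merge the two
    branch outputs and score each vocabulary token per frame."""
    def __init__(self, d_model: int, vocab: int):
        self.w = rng.standard_normal((2 * d_model, vocab)) * 0.01

    def __call__(self, h_air, h_bone):  # both (T, d_model)
        fused = np.concatenate([h_air, h_bone], axis=-1)
        return fused @ self.w           # (T, vocab) token scores

# Toy forward pass: 50 frames of 80-dim features per branch.
air_enc, bone_enc = LinearEncoder(80, 64), LinearEncoder(80, 64)
decoder = FusionDecoder(64, vocab=30)
logits = decoder(air_enc(rng.standard_normal((50, 80))),
                 bone_enc(rng.standard_normal((50, 80))))
print(logits.shape)  # (50, 30)
```

The point of the two-branch design is that each modality gets its own encoder before fusion, so the clean but band-limited bone-conduction signal and the noisy but full-band air-conduction signal are modeled separately.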

Description

technical field [0001] The invention belongs to the technical field of speech recognition, and in particular relates to a joint bone- and air-conduction speech recognition method. Background technique [0002] Over the past decade, thanks to the rise and progress of deep learning, robust automatic speech recognition has developed remarkably and has been applied in fields such as smartphones, smart home appliances, and automobiles. Deep-learning-based robust speech recognition algorithms fall mainly into two types: one removes noise at the front end of the system, including speech enhancement and extraction of noise-robust features; the other designs recognition models that are robust to different noisy scenarios. However, these deep-learning-based speech recognition methods have so far relied on air-conduction speech. Due to the way speech is conducted through air, it is easily disturbed by environmental noi...

Claims


Application Information

IPC(8): G10L15/02, G10L15/06, G10L15/16, G10L15/20, G10L15/26
CPC: G10L15/02, G10L15/063, G10L15/26, G10L15/16, G10L15/20
Inventors: 王谋, 陈俊淇, 张晓雷, 王逸平
Owner NORTHWESTERN POLYTECHNICAL UNIV