End-to-end bone and air conduction voice combined recognition method

A recognition method and speech technology, applied in speech recognition, speech analysis, instruments, etc., to improve the recognition performance and reduce the error rate

Pending Publication Date: 2022-05-13
NORTHWESTERN POLYTECHNICAL UNIV
Cites: 0 · Cited by: 1

AI Technical Summary

Problems solved by technology

Since there is no publicly available large-scale bone-air-conduction speech database that can be used for deep learning sp...



Examples


Specific embodiment

[0063] S1: Acquire synchronized bone-conduction and air-conduction voice data (x_a, x_b) to build a data set, where x_a is clean air-conduction speech recorded in an anechoic laboratory or other relatively quiet environment, and x_b is the bone-conduction speech recorded simultaneously. Downsample all speech to 16 kHz with 16-bit quantization. The model's input is noisy air-conduction speech together with bone-conduction speech, and its output is the text y corresponding to the speech. Because bone-conduction speech does not itself pick up environmental noise, noise is added only to the air-conduction speech within a chosen signal-to-noise-ratio range, i.e. x̃_a = x_a + n_a, where x̃_a is the noisy air-conducted speech and n_a is ambient noise. The final dataset is D = {(x̃_a, x_b, y)}. Then 84% of the dataset is set aside as a training set, 8% as a validation set, and the remaining 8% as a test set.
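The noise-mixing and splitting in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names (`mix_at_snr`, `split_dataset`) are hypothetical, and the patent does not specify its SNR range or scaling convention.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the clean/noise power ratio equals `snr_db` dB,
    then add it to `clean` (the x~_a = x_a + n_a step)."""
    # Tile or truncate the noise to match the speech length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Gain g such that 10*log10(p_clean / (g^2 * p_noise)) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

def split_dataset(items, seed=0):
    """84% / 8% / 8% train/validation/test split, as in step S1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    n_train, n_val = int(0.84 * len(items)), int(0.08 * len(items))
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

In practice the SNR is drawn at random per utterance so the model sees a range of noise levels during training.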

[0064] S2: Data Augmentation and Feature Extraction

[0065] S21: Change the speech rate of the speech signal to perform p...
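The speed perturbation named in S21 is commonly done by resampling the waveform and playing it back at the original rate, which changes tempo and pitch together. A minimal numpy sketch, assuming Kaldi-style factors such as 0.9/1.0/1.1 (the patent text is truncated before naming its exact values):

```python
import numpy as np

def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Change the speech rate by linear-interpolation resampling.

    factor > 1 speeds the utterance up (fewer output samples);
    factor < 1 slows it down. The output is meant to be played at
    the original sampling rate.
    """
    n_out = int(round(len(signal) / factor))
    # Positions in the original signal that each output sample maps to.
    src_pos = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(src_pos, np.arange(len(signal)), signal)
```

Production systems typically use a proper polyphase resampler (e.g. sox-style) rather than linear interpolation, but the augmentation logic is the same.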



Abstract

The invention discloses an end-to-end joint bone- and air-conduction voice recognition method comprising the following steps: first, synchronized air-conduction and bone-conduction voice data are acquired to construct a data set, with the corresponding text as output; data augmentation and acoustic feature extraction are performed on the air-conduction and bone-conduction voice signals; next, a Conformer-based end-to-end deep neural network model is built, composed of three parts: two branch networks that process air-conduction and bone-conduction voice respectively, and a fusion network based on a multimodal Transducer; the neural network is then trained, and the recognition result is finally obtained from the trained network. Compared with traditional methods that recognize speech from the air-conduction signal alone, the joint recognition method significantly reduces the speech recognition error rate and improves the overall recognition performance of the system.
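The three-part structure described above (two branch encoders plus a fusion network) can be illustrated at the level of tensor shapes. This numpy sketch is only a data-flow stand-in: each Conformer branch is replaced by a single linear projection, and the multimodal-Transducer fusion is replaced by concatenation plus a linear scoring layer; none of this reflects the patent's actual layer configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearEncoder:
    """Stand-in for one Conformer branch encoder
    (air-conduction or bone-conduction)."""
    def __init__(self, d_in: int, d_model: int):
        self.w = rng.standard_normal((d_in, d_model)) * 0.01

    def __call__(self, feats):          # (T, d_in) -> (T, d_model)
        return feats @ self.w

class FusionDecoder:
    """Stand-in for the multimodal fusion network: merge the two
    branch outputs and score each vocabulary token per frame."""
    def __init__(self, d_model: int, vocab: int):
        self.w = rng.standard_normal((2 * d_model, vocab)) * 0.01

    def __call__(self, h_air, h_bone):  # both (T, d_model)
        fused = np.concatenate([h_air, h_bone], axis=-1)
        return fused @ self.w           # (T, vocab) token scores

# Toy forward pass: 50 frames of 80-dim features per branch.
air_enc, bone_enc = LinearEncoder(80, 64), LinearEncoder(80, 64)
decoder = FusionDecoder(64, vocab=30)
logits = decoder(air_enc(rng.standard_normal((50, 80))),
                 bone_enc(rng.standard_normal((50, 80))))
print(logits.shape)  # (50, 30)
```

The point of the two-branch design is that each modality gets its own encoder before fusion, so the clean but band-limited bone-conduction signal and the noisy but full-band air-conduction signal are modeled separately.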

Description

technical field [0001] The invention belongs to the technical field of speech recognition, and in particular relates to a joint bone- and air-conduction speech recognition method. Background technique [0002] Over the past decade, thanks to the rise and progress of deep learning, robust automatic speech recognition has developed remarkably and has been applied in fields such as smartphones, smart home appliances, and automobiles. Deep-learning-based robust speech recognition algorithms fall mainly into two types: one removes noise at the front end of the system, including speech enhancement and extraction of noise-robust features; the other designs recognition models that are robust to different noisy scenarios. However, these deep-learning-based speech recognition methods have so far relied on air-conduction speech. Due to the way speech is conducted through air, it is easily disturbed by environmental noi...

Claims


Application Information

IPC(8): G10L15/02, G10L15/06, G10L15/16, G10L15/20, G10L15/26
CPC: G10L15/02, G10L15/063, G10L15/26, G10L15/16, G10L15/20
Inventors: 王谋, 陈俊淇, 张晓雷, 王逸平
Owner NORTHWESTERN POLYTECHNICAL UNIV