Noisy speech gender identification method and system based on lightweight neural network

A technology in the fields of gender recognition and neural networks, applied to methods and systems for gender recognition of noisy speech. It addresses problems such as the difficulty of extracting audio features, the low accuracy of male and female voice recognition, and model size, and has the effects of avoiding cumulative time delay, improving accuracy, and simplifying the algorithm.

Active Publication Date: 2021-02-19

北京快鱼电子股份公司 (Beijing Kuaiyu Electronics Co., Ltd.)

Cites: 5; Cited by: 2


AI Technical Summary

Problems solved by technology

[0003] The first step in fine-tuning male and female voices is real-time speech gender recognition, a classic binary classification problem. The prior art offers two approaches to speech gender recognition.

The first is traditional machine learning, which performs feature extraction to reduce the data dimensionality and feeds the features into a Gaussian mixture model (GMM) or SVM to train the model parameters. Traditional machine learning yields a small model, but it relies heavily on the accuracy of audio feature extraction. Under unknown noise, existing feature extraction methods struggle to extract the desired audio features, such as the pitch, so in noisy environments traditional machine learning identifies male and female voices with low accuracy. A noise reduction algorithm could be placed in front of the classifier, but noise reduction is usually a trade-off between audio quality and the amount of noise removed, and some noise still remains afterwards. Noise reduction also introduces delay: when a noise reduction algorithm and a gender classification algorithm are connected in series, the total delay is at least the sum of the two.

The second approach is based on deep learning neural networks: speech acoustic features are extracted, a neural network model is built, and softmax is used for classification. Compared with traditional machine learning, a neural network avoids depending on the accuracy of the audio features fed into the model; the acoustic features can be high-dimensional low-level features, which gives higher recognition accuracy under a certain level of noise. However, the audio segment fed into such a network is usually 1 s to 4 s long, so the high accuracy comes at the cost of real-time performance. In addition, the network input is usually a high-dimensional feature such as an STFT spectrogram or MFCCs, so the network has many trainable parameters and a large model, which makes it difficult to deploy on embedded devices.
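The delay argument above can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration, not taken from the patent: it treats each stage's delay as the length of the audio window it must buffer and ignores compute time. The 30 ms figure is the time range quoted in the abstract and 1 s is the low end of the 1 s to 4 s range above; the pairing of the two stages is an assumption.

```python
def cascade_delay_ms(stage_delays_ms):
    """Lower bound on algorithmic delay for stages run in series.

    Each stage must buffer its own input window before it can output
    anything, so the buffering delays add (compute time is ignored).
    """
    return sum(stage_delays_ms)


def joint_delay_ms(window_ms):
    """One model that denoises and classifies in a single pass buffers one window."""
    return window_ms


# Hypothetical stage figures: a denoiser with a 30 ms frame in front of a
# classifier that needs 1 s of audio (the low end of the 1 s to 4 s range).
cascaded = cascade_delay_ms([30, 1000])
# A joint model that decides on a single 30 ms frame.
joint = joint_delay_ms(30)
print(cascaded, joint)  # 1030 30
```

The point is structural: a cascade can never beat the sum of its stages' windows, while a single model that carries its own noise-reduction branch pays for one window only.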


Examples


Embodiment 1

[0056] As shown in Figure 1 and Figure 2, the present invention provides a noisy speech gender recognition method based on a lightweight neural network, comprising the following steps:

[0057] S100: Mix pure male and female speech audio and pure noise audio to synthesize noisy speech.
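The excerpt does not say how speech and noise are combined in S100. One common approach, assumed here, is to scale the noise so the mixture reaches a chosen signal-to-noise ratio (SNR) before adding it. The function names, the tone standing in for speech and the 5 dB target are illustrative only; this is a minimal sketch, not the patent's procedure.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to clean speech, scaled so the mixture has the given SNR (dB).

    The noise is repeated and truncated to the speech length first.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * reps)[: len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise is silent")
    # Choose gain g so that p_speech / (g^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of a mixture measured against its clean reference."""
    p_s = sum(s * s for s in speech)
    p_r = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_s / p_r)


# Usage: a 220 Hz tone stands in for 1 s of 16 kHz speech; white noise at 5 dB SNR.
fs = 16000
speech = [0.5 * math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs // 2)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
print(round(measured_snr_db(speech, noisy), 2))
```

Drawing the SNR at random over a range for each clean/noise pair is a typical way to cover many noise levels, but the range used in the patent is not given here.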

[0058] S110: Collect pure male and female speech audio. Use the pure male and female speech in the TIMIT open-source corpus and the LibriVox free audiobook corpus, with male and female samples in a 1:1 ratio. The sampling rate can be chosen freely, as long as the audio to be predicted has the same sampling rate as the training samples; for example, the samples may be sampled at 16 kHz (but are not limited to this);
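The two constraints in S110 (a 1:1 male/female ratio and one shared sampling rate) can be enforced when the training list is built. In this minimal sketch the `Clip` record, the file names and the choice to reject rather than resample mismatched clips are all hypothetical; only the 1:1 ratio and the 16 kHz example rate come from the text.

```python
import random
from dataclasses import dataclass


@dataclass
class Clip:
    path: str         # hypothetical file path, for illustration only
    gender: str       # "male" or "female"
    sample_rate: int  # Hz


def build_balanced_set(clips, target_rate=16000, seed=0):
    """Require one sampling rate, then subsample the larger class to a 1:1 ratio."""
    mismatched = [c.path for c in clips if c.sample_rate != target_rate]
    if mismatched:
        # Training and prediction audio must share one sampling rate.
        raise ValueError(f"resample to {target_rate} Hz first: {mismatched}")
    male = [c for c in clips if c.gender == "male"]
    female = [c for c in clips if c.gender == "female"]
    n = min(len(male), len(female))
    rng = random.Random(seed)
    # A random subset of each class so both contribute n clips.
    return rng.sample(male, n) + rng.sample(female, n)


# Usage: three male clips and two female clips at 16 kHz.
clips = [Clip(f"male_{i}.wav", "male", 16000) for i in range(3)]
clips += [Clip(f"female_{i}.wav", "female", 16000) for i in range(2)]
train = build_balanced_set(clips)
print(len(train))  # 4
```

Failing loudly on a rate mismatch keeps the requirement explicit; a real pipeline might resample instead.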

[0059] S120: Label voice activity and the male/female class for the pure speech; since the speech is clean, a data window length of 30 m...
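Because the reference speech is clean, its frame labels can be derived before any noise is added. The window length in [0059] is cut off in this excerpt, so this sketch assumes 30 ms frames, matching the 30 ms time range stated in the abstract. The energy threshold (-40 dB relative to the loudest frame) and the rule that silent frames carry no gender label are further assumptions, not the patent's method.

```python
import math


def frame_labels(clean, fs, gender, frame_ms=30, threshold_db=-40.0):
    """Per-frame (vad, gender) labels for one clean utterance.

    A frame counts as speech when its energy is within threshold_db of the
    loudest frame; non-speech frames get gender None.
    """
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(clean) // frame_len  # drop the incomplete tail frame
    energies = []
    for i in range(n_frames):
        frame = clean[i * frame_len:(i + 1) * frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    peak = max(energies, default=0.0)
    labels = []
    for e in energies:
        active = peak > 0.0 and e > 0.0 and 10 * math.log10(e / peak) > threshold_db
        labels.append((1 if active else 0, gender if active else None))
    return labels


# Usage: 60 ms of silence, 90 ms of a 200 Hz tone, 30 ms of silence, at 16 kHz.
fs = 16000
n = 480  # samples in 30 ms
tone = [0.3 * math.sin(2 * math.pi * 200 * t / fs) for t in range(3 * n)]
sig = [0.0] * (2 * n) + tone + [0.0] * n
print(frame_labels(sig, fs, "female"))
```

The same labels carry over to the noisy mixture built from this utterance, which is what makes synthesizing noisy speech from clean sources convenient for supervision.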

Embodiment 2

[0107] As shown in Figure 10, the present invention provides a noisy speech gender recognition system based on a lightweight neural network, comprising a noisy speech synthesis module, an audio feature extraction module, a lightweight neural network model construction and training module, and a gender prediction module;

[0108] The noisy speech synthesis module is used to mix pure male and female voice audio and pure noise audio to synthesize noisy speech;

[0109] The audio feature extraction module is used to extract the audio features of the noisy speech; the audio features comprise only: multiple BFCC features, the first-order and second-order time derivatives of some of the BFCC features, the pitch gain value, the pitch (fundamental frequency) period value, and the short-time zero-crossing rate of the speech;
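A minimal sketch of the BFCC part of this feature set, assuming the usual reading of BFCC as Bark-frequency cepstral coefficients: window the frame, take the power spectrum, sum it into bands equally spaced on the Bark scale, take logs, and apply a DCT-II. The band count (18), Traunmüller's Bark formula, rectangular bands, a direct DFT (slow but dependency-free), and taking derivatives of the first 6 coefficients are all assumptions; the excerpt only says "some" BFCCs get derivatives.

```python
import cmath
import math


def bark(f_hz):
    """Traunmüller's approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bfcc(frame, fs, n_bands=18):
    """Bark-frequency cepstral coefficients of one frame (n_bands values)."""
    n = len(frame)
    # Hann window to reduce spectral leakage.
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    z0, z_max = bark(0.0), bark(fs / 2)
    energies = [0.0] * n_bands
    for k in range(n // 2 + 1):
        # Direct DFT bin k; its power is added to the Bark band it falls in.
        X = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        z = bark(k * fs / n)
        band = min(int((z - z0) / (z_max - z0) * n_bands), n_bands - 1)
        energies[band] += abs(X) ** 2
    log_e = [math.log(e + 1e-10) for e in energies]
    # DCT-II of the log band energies yields the cepstral coefficients.
    return [
        sum(v * math.cos(math.pi * c * (b + 0.5) / n_bands) for b, v in enumerate(log_e))
        for c in range(n_bands)
    ]


def deltas(frames):
    """Central-difference time derivative across frames, repeating the edge frames."""
    last = len(frames) - 1
    return [
        [(b - a) / 2.0 for a, b in zip(frames[max(t - 1, 0)], frames[min(t + 1, last)])]
        for t in range(len(frames))
    ]


def feature_vectors(frames, fs, n_bands=18, n_deriv=6):
    """Per frame: n_bands BFCCs, then 1st- and 2nd-order deltas of the first n_deriv."""
    ceps = [bfcc(f, fs, n_bands) for f in frames]
    d1 = deltas([c[:n_deriv] for c in ceps])
    d2 = deltas(d1)
    return [c + a + b for c, a, b in zip(ceps, d1, d2)]


# Usage: three consecutive 30 ms frames of a 300 Hz tone at 16 kHz.
fs = 16000
signal = [math.sin(2 * math.pi * 300 * t / fs) for t in range(3 * 480)]
frames = [signal[i * 480:(i + 1) * 480] for i in range(3)]
feats = feature_vectors(frames, fs)
print(len(feats), len(feats[0]))  # 3 30
```

Summarizing the spectrum in a few Bark bands instead of a full STFT or MFCC input is what keeps the input small, which fits the lightweight-model goal stated in [0003]. A production version would use an FFT.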

[0110] The lightweight neural network model construction and training module is used to construct and train a lightweight neural network model based on audio feat...
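The module description is cut off here, but the abstract states that the model has a voice activity branch, a noise reduction branch and a gender classification branch. The sketch below shows one hypothetical way to wire such a model and the combined loss a training loop would minimize. Assumptions, none from the patent: plain dense layers (the real layer types are not given in this excerpt), a 33-value input (18 BFCCs, 6 first-order and 6 second-order derivatives, pitch gain, pitch period, zero-crossing rate), per-band gain targets for the noise-reduction branch, equal loss weights, and index 0 for male. Gradients and the optimizer are omitted.

```python
import math
import random


def dense(x, w, b, act):
    """Fully connected layer: act(W @ x + b) for plain Python lists."""
    return [act(sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(w, b)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(u - m) for u in v]
    s = sum(e)
    return [u / s for u in e]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


class ThreeBranchNet:
    """Shared trunk feeding a VAD head, a noise-reduction head and a gender head."""

    def __init__(self, n_in=33, n_hidden=24, n_bands=18, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(n_hidden, n_in, rng)
        self.vad_head = init_layer(1, n_hidden, rng)
        self.nr_head = init_layer(n_bands, n_hidden, rng)
        self.gender_head = init_layer(2, n_hidden, rng)

    def forward(self, x):
        h = dense(x, *self.trunk, math.tanh)
        vad = dense(h, *self.vad_head, sigmoid)[0]      # P(speech in this frame)
        gains = dense(h, *self.nr_head, sigmoid)        # per-band gains in (0, 1)
        logits = dense(h, *self.gender_head, lambda u: u)
        return vad, gains, softmax(logits)              # [P(male), P(female)]


def multitask_loss(out, vad_t, gains_t, gender_t, weights=(1.0, 1.0, 1.0)):
    """Weighted VAD BCE + band-gain MSE + gender cross-entropy for one frame.

    The gender term is counted only when the frame is labelled as speech.
    """
    eps = 1e-7
    vad, gains, probs = out
    l_vad = -(vad_t * math.log(vad + eps) + (1 - vad_t) * math.log(1 - vad + eps))
    l_nr = sum((g - t) ** 2 for g, t in zip(gains, gains_t)) / len(gains)
    l_gender = -math.log(probs[gender_t] + eps) if vad_t == 1 else 0.0
    w_vad, w_nr, w_gender = weights
    return w_vad * l_vad + w_nr * l_nr + w_gender * l_gender


# Usage: one feature vector, labelled as a speech frame from a male speaker.
net = ThreeBranchNet()
out = net.forward([0.1] * 33)
print(multitask_loss(out, vad_t=1, gains_t=[0.5] * 18, gender_t=0))
```

Sharing one trunk means the denoising and voice-activity signals shape the same hidden features the gender head reads, so no separate denoiser has to run in series in front of it.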


Abstract

The invention discloses a noisy speech gender identification method and system based on a lightweight neural network. The method comprises the following steps: synthesizing noisy speech from pure male and female speech audio and pure noise audio; extracting audio features of the noisy speech, wherein the audio features comprise only a plurality of BFCC features, first-order and second-order time derivatives of some of the BFCC features, pitch gain values, pitch (fundamental frequency) period values, and the short-time zero-crossing rate of the speech; constructing and training a lightweight neural network model based on the audio features, wherein the model comprises a voice activity branch, a noise reduction branch and a gender classification branch; and predicting the gender of noisy speech with the lightweight neural network model. The lightweight neural network model, which contains both the noise reduction branch and the male/female voice classification branch, operates within a 30 ms time range, achieves high accuracy, and is suitable for real-world application scenarios containing unknown noise.
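Alongside the BFCCs, the feature list names a pitch gain, a pitch period and a short-time zero-crossing rate. A minimal sketch of the latter three for one frame, assuming pitch is found by normalized autocorrelation over a 60 Hz to 500 Hz search range and that the "pitch gain" is the normalized correlation at the chosen lag; the patent's exact definitions are not given in this excerpt.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def pitch_period_and_gain(frame, fs, f0_min=60.0, f0_max=500.0):
    """Estimate (pitch period in samples, pitch gain) by normalized autocorrelation.

    The gain is the normalized correlation at the chosen lag, in [-1, 1].
    Returns (0, 0.0) when no convincing periodicity is found.
    """
    min_lag = int(fs / f0_max)
    max_lag = min(int(fs / f0_min), len(frame) - 2)
    corr = {}
    for k in range(min_lag, max_lag + 1):
        a, b = frame[:-k], frame[k:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        corr[k] = num / den if den > 0 else 0.0
    best = max(corr.values(), default=0.0)
    if best <= 0.0:
        return 0, 0.0
    # Take the smallest lag that is a local peak near the best correlation,
    # so a multiple of the true period (an octave error) is not chosen.
    for k in range(min_lag + 1, max_lag):
        if corr[k] >= max(corr[k - 1], corr[k + 1]) and corr[k] >= 0.9 * best:
            return k, corr[k]
    return 0, 0.0


# Usage: one 30 ms frame of a 200 Hz tone at 16 kHz (period = 80 samples).
fs = 16000
frame = [math.sin(2 * math.pi * 200 * t / fs + 0.3) for t in range(480)]
period, gain = pitch_period_and_gain(frame, fs)
print(period, fs / period, round(gain, 3), round(zero_crossing_rate(frame), 4))
```

Note that 60 Hz corresponds to a lag of 266 samples, so a 30 ms frame only just covers the lowest pitches; real voiced speech also gives a gain well below 1, and noise pushes it lower, which is part of why a noise-robust model matters.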

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, and in particular to a method and system for gender recognition of noisy speech based on a lightweight neural network.

Background

[0002] In daily communication, a good voice holds an indefinable appeal for the listener. Gladstone, who served four times as British Prime Minister, said: "Voice is the most powerful instrument in communication." Certain occupations and occasions place higher demands on the voice, such as hosting, broadcasting, live streaming and in-game voice chat, but not everyone has a pleasant voice, so male and female voices need to be finely adjusted.


Application Information


IPC (8): G10L17/02, G10L17/04, G10L17/06, G10L17/18, G06N3/04, G06N3/08

CPC: G10L17/02, G10L17/04, G10L17/06, G10L17/18, G06N3/08, G06N3/045

Inventors: 张瑜 (Zhang Yu), 袁斌 (Yuan Bin)

Owner: 北京快鱼电子股份公司 (Beijing Kuaiyu Electronics Co., Ltd.)
