Combined model training method and system

A joint model training technology applied in speech analysis and in systems and instruments for determining direction or deviation. It addresses two shortcomings of prior methods: localization performance cannot be guaranteed, and extra computation is required. It achieves accurate and robust DOA estimation, better voice interaction, and improved accuracy.

Active Publication Date: 2019-05-03
AISPEECH CO LTD
Cites: 2; Cited by: 24

AI Technical Summary

Problems solved by technology

[0005] Keyword-based target speaker localization method: because the mask network is trained separately, the estimated time-frequency mask and the localization task are independent of each other, so the best localization performance cannot be guaranteed. In addition, its input features are pre-extracted sine and cosine inter-channel phase difference features, which adds extra computation.
Joint training of a time-frequency mask and a DOA estimation network based on an acoustic vector sensor: an acoustic vector sensor is more complex and costly than an ordinary microphone array. The estimated time-frequency mask lies in the complex domain, which is more complex and computationally heavier than a real-valued mask. Its input features (sub-band inter-channel data ratios, power spectrum, coherence vectors, etc.) must be explicitly extracted in advance, which again adds extra computation.




Embodiment Construction

[0023] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0024] Figure 1 shows a flowchart of a joint model training method according to an embodiment of the present invention, which includes the following steps:

[0025] S11: implicitly extracting the phase spectrum and the logarithmic magnitude spectrum of the noisy speech training set;
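As a rough illustration of step S11, the sketch below computes the raw phase and log-magnitude spectra of a multichannel recording with a short-time Fourier transform, leaving any further feature learning to the networks rather than pre-computing hand-crafted phase-difference features. The frame length, hop size, Hann window, and the helper names `stft` and `phase_and_log_magnitude` are assumptions for illustration, not details taken from the patent.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Complex STFT of a multichannel signal (Hann window, no padding; assumed).

    x: array of shape (channels, samples) with samples >= frame_len.
    Returns an array of shape (channels, frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[-1] - frame_len) // hop
    # Row t of idx holds the sample indices of frame t.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[..., idx] * window            # (channels, frames, frame_len)
    return np.fft.rfft(frames, axis=-1)

def phase_and_log_magnitude(x, eps=1e-8):
    """Raw phase and log-magnitude spectra, with no hand-crafted IPD features."""
    spec = stft(x)
    phase = np.angle(spec)                   # radians in [-pi, pi]
    log_mag = np.log(np.abs(spec) + eps)     # eps avoids log(0)
    return phase, log_mag

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16000))          # hypothetical 4-mic, 1 s at 16 kHz
phase, log_mag = phase_and_log_magnitude(x)
print(phase.shape)                           # (4, 61, 257)
```

Both spectra share one STFT, so the only per-bin work beyond the transform is an `angle` and a `log`, which is in the spirit of the patent's goal of avoiding extra explicit feature extraction.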

[0026] S12: Using the amplitude spectrum segment expanded b...
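Step S12 is truncated here, but the abstract states that expanded segments of the log-magnitude spectrum serve as the input features of the time-frequency masking network. One common way to expand a frame is to splice it with its neighbouring frames. The sketch below assumes that reading; the context width `context`, the edge-replication padding, and the name `splice_frames` are hypothetical choices.

```python
import numpy as np

def splice_frames(feats, context=3):
    """Expand each frame with its `context` left and right neighbours.

    feats: (frames, bins). Boundary frames are replicated (an assumption).
    Returns (frames, (2 * context + 1) * bins), left neighbours first.
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    n = feats.shape[0]
    # Window i is the sequence shifted by (i - context) frames.
    windows = [padded[i:i + n] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

feats = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames, 3 bins
out = splice_frames(feats, context=1)
print(out.shape)   # (4, 9)
print(out[0])      # [0. 1. 2. 0. 1. 2. 3. 4. 5.]
```

Splicing gives a frame-level mask estimator some temporal context without recurrent layers; whether the patent uses splicing or another form of expansion is not stated in this excerpt.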



Abstract

The embodiments of the invention provide a joint model training method, comprising: implicitly extracting the phase spectrum and the log-magnitude spectrum of a noisy speech training set; using the expanded magnitude-spectrum segments of the log-magnitude spectrum as the input features of a time-frequency masking network; determining, from the noisy speech training set and a clean speech training set, a target mask label for training the time-frequency masking network; training the time-frequency masking network on the input features and the target mask label, and estimating a soft threshold mask; enhancing the phase spectrum of the noisy speech training set with the soft threshold mask; and training a DOA (direction of arrival) estimation network with the enhanced phase spectrum as its input features. The embodiments further provide a joint model training system. By setting the target mask label, extracting the input features implicitly, and training the time-frequency masking network jointly with the DOA estimation network, the embodiments make the training better suited to the DOA estimation task.
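The abstract does not say how the target mask label is computed or how the mask is applied to the phase spectrum. The sketch below assumes one plausible choice for each: the label is the clean-to-noisy magnitude ratio clipped to [0, 1] (computed from paired noisy and clean spectra), and enhancement is an element-wise weighting of each channel's phase spectrum by a single real-valued mask shared across microphones. The helper names `target_mask` and `enhance_phase` are hypothetical, and in training the network's estimated soft mask would take the place of the label.

```python
import numpy as np

def target_mask(noisy_spec, clean_spec, eps=1e-8):
    """Hypothetical mask label: clean/noisy magnitude ratio clipped to [0, 1]."""
    ratio = np.abs(clean_spec) / (np.abs(noisy_spec) + eps)
    return np.clip(ratio, 0.0, 1.0)

def enhance_phase(phase, mask):
    """Weight each T-F bin of every channel's phase spectrum by a soft mask.

    phase: (channels, frames, bins); mask: (frames, bins) with values in [0, 1].
    The same mask is broadcast to every microphone channel (assumed).
    """
    return phase * mask[None, :, :]

rng = np.random.default_rng(1)
shape = (10, 257)                                  # frames x frequency bins
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise
mask = target_mask(noisy, clean)                   # label; a trained net would estimate this
phase = np.angle(np.stack([noisy, noisy]))         # stand-in 2-channel phase spectrum
doa_input = enhance_phase(phase, mask)
print(doa_input.shape)                             # (2, 10, 257)
```

A real-valued mask keeps this step cheap compared with the complex-domain mask criticized in [0005], and bins dominated by noise (mask near 0) contribute little phase information to the DOA network.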

Description

Technical field

[0001] The invention relates to the field of sound source localization, and in particular to a joint model training method and system.

Background

[0002] Sound source localization is the task of estimating the direction of arrival (DOA) of a speaker from the received speech signal. DOA estimation is essential for applications such as human-computer interaction and teleconferencing, and is also widely used in beamforming for speech enhancement. For example, if sound source localization is added to video chat, the user at the other end can perceive changes in the speaker's position as the speaker moves, which improves the user experience.

[0003] To determine the direction of arrival, a keyword-based target speaker localization method can be used: a neural network first estimates the time-frequency mask separately, and the estimated mask is then used to enhance the input features of the direction of arrival...
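To see why a phase spectrum carries direction information at all, consider the classical far-field model for two microphones spaced `d` metres apart: a source at angle θ reaches them with a delay τ = d·cos θ / c, which appears as an inter-channel phase difference of 2π·f·τ at frequency f. The sketch below is not the patent's neural method; it is a plain grid search over this model, with the speed of sound, microphone spacing, frequency grid, and helper names all assumed for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed)

def expected_ipd(theta_deg, freqs, mic_distance):
    """Far-field inter-channel phase difference for a two-microphone pair."""
    tau = mic_distance * np.cos(np.deg2rad(theta_deg)) / SPEED_OF_SOUND  # delay, s
    return 2 * np.pi * freqs * tau

def grid_search_doa(observed_ipd, freqs, mic_distance, step=1.0):
    """Pick the angle whose predicted phase difference best matches the observation."""
    candidates = np.arange(0.0, 180.0 + step, step)
    # Compare on the unit circle so 2*pi wrap-around does not matter.
    scores = [np.mean(np.cos(observed_ipd - expected_ipd(t, freqs, mic_distance)))
              for t in candidates]
    return candidates[int(np.argmax(scores))]

freqs = np.linspace(100, 4000, 40)                               # Hz
d = 0.1                                                          # hypothetical 10 cm spacing
observed = np.angle(np.exp(1j * expected_ipd(60.0, freqs, d)))   # wrapped, noise-free IPD
print(grid_search_doa(observed, freqs, d))                       # 60.0
```

With real noisy speech, many time-frequency bins are dominated by noise and corrupt this match, which is the motivation for weighting the phase features with a time-frequency mask before DOA estimation.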


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/06, G10L15/16, G01S3/808
Inventors: 钱彦旻 (Qian Yanmin), 张王优 (Zhang Wangyou), 周瑛 (Zhou Ying)
Owner: AISPEECH CO LTD