Single channel-based non-supervision target speaker speech extraction method

An unsupervised speech extraction technology applied in the fields of speech analysis, speech recognition, and character and pattern recognition. It addresses problems such as clustering accuracy that cannot be guaranteed, the difficulty of selecting features for human-ear models, and the large amount of computation, and achieves the effect of improving adaptability and intelligence.

Active Publication Date: 2018-12-07
SHANTOU UNIV
Cites: 10 | Cited by: 11

AI Technical Summary

Problems solved by technology

[0015] MFCC coefficients are clustered, and the MFCCs are extracted frame by frame from the speech. For longer recordings, such as a 40-minute classroom recording, the amount of computation is large and the clustering accuracy cannot be guaranteed (see the frame-count sketch after these items).
[0016] 4. The cited article performs speech separation based on CASA (computational auditory scene analysis), simulating the human ear, but the features of the human-ear model are difficult to select.
[0018] 6. The influence of noise still remains in the results of single-channel speech separation, and the above speech separation methods seldom further denoise the separated results to purify the separated speech signals.
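As a rough illustration of the computational load mentioned in [0015], the following sketch extracts frame-level MFCCs and estimates how many frames a 40-minute recording produces. The 16 kHz rate, 25 ms window, 10 ms hop, and 13 coefficients are common defaults assumed for illustration, not values taken from the patent.

```python
# Illustrative sketch only: frame-level MFCC extraction and the frame count
# implied by a 40-minute recording. Parameter choices are assumptions.
import numpy as np
import librosa

sr = 16000                      # sampling rate (Hz), assumed
win = int(0.025 * sr)           # 25 ms analysis window
hop = int(0.010 * sr)           # 10 ms hop between frames

# A short synthetic clip stands in for real audio.
t = np.linspace(0, 5.0, 5 * sr, endpoint=False)
y = 0.1 * np.sin(2 * np.pi * 220 * t).astype(np.float32)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=win, hop_length=hop)
print("MFCC matrix for 5 s of audio:", mfcc.shape)        # (13, ~501)

# Frame count for a 40-minute classroom recording at the same hop:
frames_40min = int(40 * 60 * sr / hop)
print("Frames in a 40-minute recording:", frames_40min)   # ~240,000 frames to cluster
```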



Examples


Embodiment Construction

[0072] In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with the accompanying drawings.

[0073] As shown in Figure 1, the single-channel, unsupervised target speaker speech extraction method of the present invention includes a teacher speech detection step and a teacher speech GGMM model training step.

[0074] As shown in Figure 2, teacher speech detection includes the following steps (a code sketch follows the list):

[0075] S110: recording;

[0076] S120: speech signal preprocessing;

[0077] S130: speech segmentation and modeling;

[0078] S140: teacher speech detection.
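A minimal sketch of S130 and S140, assuming per-segment GMMs fitted on MFCC frames and a Monte Carlo approximation of KL divergence as the distance between each segment GMM and a previously trained teacher GGMM. The function names, the sklearn-based models, and the choice of distance are illustrative assumptions, not the patent's prescribed implementation.

```python
# Sketch of segment modeling (S130) and teacher speech detection (S140).
# Assumptions: MFCC frames per segment, sklearn GaussianMixture models,
# and an approximate KL divergence (Monte Carlo) as the GMM-to-GGMM distance.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_segment_gmm(mfcc_frames, n_components=4):
    """Fit a small GMM to the MFCC frames (rows) of one equal-length segment."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0)
    gmm.fit(mfcc_frames)
    return gmm

def gmm_distance(segment_gmm, ggmm, n_samples=2000):
    """Approximate KL(segment || GGMM) by sampling from the segment GMM."""
    samples, _ = segment_gmm.sample(n_samples)
    return float(np.mean(segment_gmm.score_samples(samples)
                         - ggmm.score_samples(samples)))

def detect_teacher_segments(segment_mfccs, ggmm, threshold):
    """S140: tag segments whose distance to the teacher GGMM is below threshold."""
    labels = []
    for frames in segment_mfccs:
        d = gmm_distance(fit_segment_gmm(frames), ggmm)
        labels.append(d < threshold)       # True -> tagged as teacher speech
    return labels
```

Treating the measure as a distance, segments below the set threshold are tagged as the teacher speech class, matching the abstract's description of comparing each segment GMM with the GGMM against a threshold.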

[0079] As shown in Figure 3, the teacher speech GGMM model training unit includes the following steps (a code sketch follows the list):

[0080] S110: recording;

[0081] S120: speech signal preprocessing;

[0082] S130: speech segmentation and modeling;

[0083] S240: clustering.
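A minimal sketch of the clustering step (S240) and the subsequent GGMM extraction, assuming k-means over segment-level mean MFCC vectors and assuming that the largest cluster corresponds to the teacher, who typically speaks most in a classroom recording. These choices are illustrative and not stated in the patent excerpt.

```python
# Sketch of S240 (clustering) and initial teacher GGMM training.
# Assumptions: each segment is summarized by its mean MFCC vector, segments
# are clustered with k-means, and the largest cluster is taken as the initial
# teacher speech class whose pooled frames train the GGMM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_initial_ggmm(segment_mfccs, n_clusters=2, n_components=16):
    # One summary vector per segment (mean over its MFCC frames).
    summaries = np.vstack([frames.mean(axis=0) for frames in segment_mfccs])

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    cluster_ids = km.fit_predict(summaries)

    # Assumed heuristic: the most frequent cluster is the initial teacher class.
    teacher_id = np.bincount(cluster_ids).argmax()
    teacher_frames = np.vstack([frames for frames, c in zip(segment_mfccs, cluster_ids)
                                if c == teacher_id])

    # GGMM: a global GMM trained on all frames of the initial teacher class.
    ggmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=0)
    ggmm.fit(teacher_frames)
    return ggmm, cluster_ids
```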

[008...



Abstract

The embodiment of the invention discloses a single-channel, unsupervised target speaker speech extraction method comprising a teacher speech detection step and a teacher speech model training step. The teacher speech detection step comprises: obtaining speech data from a classroom recording; preprocessing the speech signal; speech segmentation and modeling, in which the classroom speech is segmented into equal-length segments, corresponding MFCC features are extracted for each segment, and a GMM model is built for each segment according to its MFCC features; and teacher speech detection, in which the similarity between the GMM model of each segment (except those already tagged as teacher speech) and a GGMM is calculated, and GMM models below a set threshold are tagged as teacher speech, yielding the final teacher speech class. The teacher speech GGMM model training step comprises: clustering the speech data obtained in S3, obtaining an initial teacher speech class, and extracting the GGMM model according to the initial teacher speech class. The method can effectively improve the adaptability and intelligence of the system in real applications, laying a foundation for subsequent applications and research.
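The equal-length segmentation named in the abstract can be sketched as follows; the 16 kHz sampling rate and 3-second segment length are illustrative assumptions rather than values given in the abstract.

```python
# Sketch of equal-length segmentation of a classroom recording.
# The sampling rate and segment length are illustrative assumptions.
import numpy as np

def split_equal_segments(waveform, sr=16000, seg_seconds=3.0):
    """Split a 1-D waveform into equal-length segments, dropping the remainder."""
    seg_len = int(seg_seconds * sr)
    n_segs = len(waveform) // seg_len
    return [waveform[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]

# Example: a 10-second synthetic recording yields three 3-second segments.
audio = np.random.randn(10 * 16000).astype(np.float32)
print(len(split_equal_segments(audio)))  # 3
```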

Description

Technical field

[0001] The invention relates to a speech extraction method, in particular to a single-channel, unsupervised target speaker speech extraction method for complex multi-speaker situations.

Background technique

[0002] Ensuring the quality of education is key at all levels of education. In improving the quality of education, improving the quality of teaching, especially the quality of classroom teaching, should be the top priority. However, the current traditional method is based on manual (peer) on-site observation and evaluation. Although this type of method can play a certain role, it lacks universal operability and universal objectivity. The reasons are: it is difficult for a teaching authority to inspect classrooms, make evaluations, and give suggestions all the time, which would inevitably place a heavy burden on teaching management and is unnecessary. Furthermore, traditional on-site observation and evaluation cannot objectively ev...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L15/06; G10L15/00; G06K9/62
CPC: G10L15/005; G10L15/063; G10L2015/0636; G10L2015/0638; G10L25/24; G10L2015/0631; G06F18/23213; G06F18/214
Inventor: 姜大志, 陈逸飞
Owner: SHANTOU UNIV