Single channel-based non-supervision target speaker speech extraction method

An unsupervised speech extraction technology applied in the fields of speech analysis, speech recognition, and character and pattern recognition. It addresses problems such as clustering accuracy that cannot be guaranteed, the difficulty of selecting features for human-ear models, and the large amount of computation, and achieves the effect of improving adaptability and intelligence.

Active Publication Date: 2018-12-07
SHANTOU UNIV
Cites: 10 | Cited by: 11

AI Technical Summary

Problems solved by technology

[0015] MFCC coefficients are clustered, and the MFCCs are extracted frame by frame from the speech. For longer recordings, such as a 40-minute classroom recording, the amount of computation is large and the clustering accuracy cannot be guaranteed (see the frame-count sketch after these items).
[0016] 4. The cited article performs speech separation based on CASA (computational auditory scene analysis), simulating the human ear, but the features of the human-ear model are difficult to select.
[0018] 6. The influence of noise still remains in the results of single-channel speech separation, and the above speech separation methods seldom further denoise the separated results to purify the separated speech signals.
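As a rough illustration of the computational load mentioned in [0015], the following sketch extracts frame-level MFCCs and estimates how many frames a 40-minute recording produces. The 16 kHz rate, 25 ms window, 10 ms hop, and 13 coefficients are common defaults assumed for illustration, not values taken from the patent.

```python
# Illustrative sketch only: frame-level MFCC extraction and the frame count
# implied by a 40-minute recording. Parameter choices are assumptions.
import numpy as np
import librosa

sr = 16000                      # sampling rate (Hz), assumed
win = int(0.025 * sr)           # 25 ms analysis window
hop = int(0.010 * sr)           # 10 ms hop between frames

# A short synthetic clip stands in for real audio.
t = np.linspace(0, 5.0, 5 * sr, endpoint=False)
y = 0.1 * np.sin(2 * np.pi * 220 * t).astype(np.float32)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=win, hop_length=hop)
print("MFCC matrix for 5 s of audio:", mfcc.shape)        # (13, ~501)

# Frame count for a 40-minute classroom recording at the same hop:
frames_40min = int(40 * 60 * sr / hop)
print("Frames in a 40-minute recording:", frames_40min)   # ~240,000 frames to cluster
```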



Examples


Embodiment Construction

[0072] In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with the accompanying drawings.

[0073] As shown in Figure 1, the single-channel, unsupervised target speaker speech extraction method of the present invention includes a teacher speech detection step and a teacher speech GGMM model training step.

[0074] As shown in Figure 2, teacher speech detection includes the following steps (a code sketch follows the list):

[0075] S110: recording;

[0076] S120: speech signal preprocessing;

[0077] S130: speech segmentation and modeling;

[0078] S140: teacher speech detection.
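A minimal sketch of S130 and S140, assuming per-segment GMMs fitted on MFCC frames and a Monte Carlo approximation of KL divergence as the distance between each segment GMM and a previously trained teacher GGMM. The function names, the sklearn-based models, and the choice of distance are illustrative assumptions, not the patent's prescribed implementation.

```python
# Sketch of segment modeling (S130) and teacher speech detection (S140).
# Assumptions: MFCC frames per segment, sklearn GaussianMixture models,
# and an approximate KL divergence (Monte Carlo) as the GMM-to-GGMM distance.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_segment_gmm(mfcc_frames, n_components=4):
    """Fit a small GMM to the MFCC frames (rows) of one equal-length segment."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0)
    gmm.fit(mfcc_frames)
    return gmm

def gmm_distance(segment_gmm, ggmm, n_samples=2000):
    """Approximate KL(segment || GGMM) by sampling from the segment GMM."""
    samples, _ = segment_gmm.sample(n_samples)
    return float(np.mean(segment_gmm.score_samples(samples)
                         - ggmm.score_samples(samples)))

def detect_teacher_segments(segment_mfccs, ggmm, threshold):
    """S140: tag segments whose distance to the teacher GGMM is below threshold."""
    labels = []
    for frames in segment_mfccs:
        d = gmm_distance(fit_segment_gmm(frames), ggmm)
        labels.append(d < threshold)       # True -> tagged as teacher speech
    return labels
```

Treating the measure as a distance, segments below the set threshold are tagged as the teacher speech class, matching the abstract's description of comparing each segment GMM with the GGMM against a threshold.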

[0079] As shown in Figure 3, the teacher speech GGMM model training unit includes the following steps (a code sketch follows the list):

[0080] S110: recording;

[0081] S120: speech signal preprocessing;

[0082] S130: speech segmentation and modeling;

[0083] S240: clustering.
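A minimal sketch of the clustering step (S240) and the subsequent GGMM extraction, assuming k-means over segment-level mean MFCC vectors and assuming that the largest cluster corresponds to the teacher, who typically speaks most in a classroom recording. These choices are illustrative and not stated in the patent excerpt.

```python
# Sketch of S240 (clustering) and initial teacher GGMM training.
# Assumptions: each segment is summarized by its mean MFCC vector, segments
# are clustered with k-means, and the largest cluster is taken as the initial
# teacher speech class whose pooled frames train the GGMM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_initial_ggmm(segment_mfccs, n_clusters=2, n_components=16):
    # One summary vector per segment (mean over its MFCC frames).
    summaries = np.vstack([frames.mean(axis=0) for frames in segment_mfccs])

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    cluster_ids = km.fit_predict(summaries)

    # Assumed heuristic: the most frequent cluster is the initial teacher class.
    teacher_id = np.bincount(cluster_ids).argmax()
    teacher_frames = np.vstack([frames for frames, c in zip(segment_mfccs, cluster_ids)
                                if c == teacher_id])

    # GGMM: a global GMM trained on all frames of the initial teacher class.
    ggmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=0)
    ggmm.fit(teacher_frames)
    return ggmm, cluster_ids
```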

[008...



Abstract

The embodiment of the invention discloses a single-channel, unsupervised target speaker speech extraction method comprising a teacher speech detection step and a teacher speech model training step. The teacher speech detection step comprises: obtaining speech data from a classroom recording; preprocessing the speech signal; speech segmentation and modeling, in which the classroom speech is segmented into equal-length segments, corresponding MFCC features are extracted for each segment, and a GMM model is built for each segment according to its MFCC features; and teacher speech detection, in which the similarity between the GMM model of each segment (except those already tagged as teacher speech) and a GGMM is calculated, and GMM models below a set threshold are tagged as teacher speech, yielding the final teacher speech class. The teacher speech GGMM model training step comprises: clustering the speech data obtained in S3, obtaining an initial teacher speech class, and extracting the GGMM model according to the initial teacher speech class. The method can effectively improve the adaptability and intelligence of the system in real applications, laying a foundation for subsequent applications and research.
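The equal-length segmentation named in the abstract can be sketched as follows; the 16 kHz sampling rate and 3-second segment length are illustrative assumptions rather than values given in the abstract.

```python
# Sketch of equal-length segmentation of a classroom recording.
# The sampling rate and segment length are illustrative assumptions.
import numpy as np

def split_equal_segments(waveform, sr=16000, seg_seconds=3.0):
    """Split a 1-D waveform into equal-length segments, dropping the remainder."""
    seg_len = int(seg_seconds * sr)
    n_segs = len(waveform) // seg_len
    return [waveform[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]

# Example: a 10-second synthetic recording yields three 3-second segments.
audio = np.random.randn(10 * 16000).astype(np.float32)
print(len(split_equal_segments(audio)))  # 3
```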

Description

Technical field

[0001] The invention relates to a speech extraction method, in particular to a single-channel, unsupervised target speaker speech extraction method for complex multi-speaker situations.

Background technique

[0002] Ensuring the quality of education is key at all levels of education. In improving the quality of education, improving the quality of teaching, especially the quality of classroom teaching, should be the top priority. However, the current traditional method is based on manual (peer) on-site observation and evaluation. Although this type of method can play a certain role, it lacks universal operability and universal objectivity. The reasons are: it is difficult for a teaching authority to inspect classrooms, make evaluations, and give suggestions all the time, which would inevitably place a heavy burden on teaching management and is unnecessary. Furthermore, traditional on-site observation and evaluation cannot objectively ev...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L15/06; G10L15/00; G06K9/62
CPC: G10L15/005; G10L15/063; G10L2015/0636; G10L2015/0638; G10L25/24; G10L2015/0631; G06F18/23213; G06F18/214
Inventor: 姜大志, 陈逸飞
Owner: SHANTOU UNIV