Speaker segmentation method based on long short-term memory deep neural network

Applying long short-term memory deep neural network technology to speech analysis addresses the problems of reduced segmentation and recording accuracy, inability to adapt to the speaker feature space, and prolonged model training time, and achieves fast and convenient construction, high accuracy, and improved labeling accuracy.

Active Publication Date: 2022-02-22
FUDAN UNIV
10 cites, 0 cited by

AI Technical Summary

Problems solved by technology

However, on the one hand, collecting a large amount of training data is extremely difficult, and a larger amount of data also lengthens model training; on the other hand, the clustering methods used by existing approaches cannot adapt to the generated speaker feature space, which degrades segmentation and recording accuracy.


Examples


Embodiment Construction

[0016] In order to make the technical means, creative features, goals and effects achieved by the present invention easy to understand, the speaker segmentation method based on the long short-term memory deep neural network of the present invention will be described in detail below in conjunction with the embodiments and accompanying drawings.

[0017]

[0018] In this embodiment, the data set used is TIMIT. TIMIT is a speech data benchmark for acoustic speech research and for the development and evaluation of automatic speech recognition systems. The dataset contains about 5 hours of audio collected with wideband microphones in eight major dialect regions of the United States. Recordings are 16-bit with a sampling rate of 16,000 Hz. The TIMIT dataset contains audio from 630 people; 6300 single-speaker speech samples were obtained by asking each volunteer to read 10 polysyllabic sentences. Over 396,000 speaker transition points were obtained by randomly concatena...
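The idea of building multi-speaker audio by joining single-speaker samples can be illustrated with a minimal sketch. The paragraph above is truncated, so the exact concatenation procedure is not known; the function name `concatenate_utterances`, the toy data, and the rule that a boundary counts as a transition only when the speaker changes are all assumptions made for illustration.

```python
import random

SAMPLE_RATE = 16000  # TIMIT sampling rate in Hz, as stated above


def concatenate_utterances(utterances, rng):
    """Join single-speaker utterances in random order.

    `utterances` is a list of (speaker_id, samples) pairs. Returns the
    mixed sample list plus the transition points, in seconds, at which
    the speaker identity changes.
    """
    order = list(utterances)
    rng.shuffle(order)

    mixed, transitions = [], []
    previous_speaker = None
    for speaker_id, samples in order:
        # Assumption: a boundary is a transition only if the speaker differs.
        if previous_speaker is not None and speaker_id != previous_speaker:
            transitions.append(len(mixed) / SAMPLE_RATE)
        mixed.extend(samples)
        previous_speaker = speaker_id
    return mixed, transitions


# Toy stand-ins for real recordings: silence lasting 1 s, 2 s, and 1.5 s.
toy = [("spk_a", [0] * 16000), ("spk_b", [0] * 32000), ("spk_c", [0] * 24000)]
mixed, transitions = concatenate_utterances(toy, random.Random(0))
```

With real data, `samples` would hold the decoded TIMIT waveform for each utterance, and the returned transition times would serve as ground-truth labels for switch detection.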


Abstract

The present invention provides a speaker segmentation and labeling method and device based on a long short-term memory (LSTM) neural network. A speaker recognition sample labeling model based on an LSTM deep neural network is used to detect when each speaker's speech occurs in the audio under test and how long it lasts. The method comprises: step S1, preprocessing the audio under test to obtain audio frame-level features f1 and audio frame-level features f2; step S2, building the speaker recognition sample labeling model based on the LSTM deep neural network, which includes a speaker switching detection sub-model and a speaker feature modeling sub-model; step S3, training the speaker switching detection sub-model and the speaker feature modeling sub-model separately; step S4, inputting the audio frame-level features f1 and f2 into the speaker recognition sample labeling model to classify and record the speaking time periods of each speaker in the audio under test.
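The final output described in step S4, a record of each speaker's speaking time periods, can be sketched as follows. This is not the patented model: the per-frame switch flags and per-segment speaker labels are hand-written stand-ins for what the two sub-models would produce, and the 10 ms frame hop, the function names, and the merging of adjacent same-speaker segments are assumptions not stated in the source.

```python
FRAME_HOP_MS = 10  # assumed frame hop in milliseconds; not given in the source


def split_at_switches(switch_flags):
    """Turn per-frame switch flags into (start_frame, end_frame) segments.

    A flag of 1 at frame i means a new speaker starts at frame i.
    end_frame is exclusive.
    """
    boundaries = [0] + [i for i, flag in enumerate(switch_flags) if flag and i > 0]
    boundaries.append(len(switch_flags))
    return list(zip(boundaries[:-1], boundaries[1:]))


def label_segments(switch_flags, segment_speakers):
    """Pair each segment with its speaker label and merge adjacent repeats.

    Returns a list of (speaker, start_ms, end_ms) records.
    """
    segments = split_at_switches(switch_flags)
    if len(segments) != len(segment_speakers):
        raise ValueError("need exactly one speaker label per segment")

    records = []
    for (start, end), speaker in zip(segments, segment_speakers):
        if records and records[-1][0] == speaker:
            # Same speaker as the previous segment: extend that record.
            records[-1] = (speaker, records[-1][1], end * FRAME_HOP_MS)
        else:
            records.append((speaker, start * FRAME_HOP_MS, end * FRAME_HOP_MS))
    return records


# Hypothetical outputs: switches detected at frames 3 and 6 of a 10-frame clip.
flags = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
records = label_segments(flags, ["A", "B", "A"])
# records == [("A", 0, 30), ("B", 30, 60), ("A", 60, 100)]
```

Keeping the segmentation step separate from the labeling step mirrors the two-sub-model split in the abstract, so either stand-in could be replaced by a trained model without changing the other.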

Description

Technical field

[0001] The invention belongs to the technical field of computer hearing and artificial intelligence, and relates to a method for speaker switching detection, speech feature space modeling, and segmentation labeling in an auditory scene, in particular to a speaker switching detection and voice segment labeling method based on a bidirectional long short-term memory network model.

Background technique

[0002] With the rapid improvement of machine learning technology and computer hardware performance, breakthroughs have been made in recent years in application fields such as computer vision, natural language processing, and speech detection. Speaker segmentation is a basic task in computer speech processing, and its recording accuracy has been greatly improved.

[0003] The task of speaker segmentation annotation can be divided into two key subtasks: speaker switch detection and speech feature space modeling.

[0004] Among them, the speaker switching detectio...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L17/18, G10L17/02, G10L17/04, G10L25/24
CPC: G10L17/18, G10L17/02, G10L17/04, G10L25/24
Inventor: 宓仕达, 杜姗姗, 冯瑞
Owner: FUDAN UNIV