Speaker segmentation method based on long short-term memory deep neural network
The method applies a long short-term memory (LSTM) deep neural network to speech analysis. It addresses poor segmentation and labeling accuracy, the inability to adapt to the speaker feature space, and prolonged model training time. The resulting model is fast and convenient to build, achieves high accuracy, and improves labeling accuracy.
Embodiment Construction
[0016] To make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the speaker segmentation method based on the long short-term memory deep neural network is described in detail below with reference to the embodiments and the accompanying drawings.
[0018] In this embodiment, the data set used is TIMIT. TIMIT is a speech corpus used as a benchmark for acoustic-phonetic research and for the development and evaluation of automatic speech recognition systems. The dataset contains about 5 hours of audio recorded with wideband microphones from speakers of eight major dialect regions of the United States. Recordings are 16-bit at a 16,000 Hz sampling rate. The TIMIT dataset contains audio from 630 speakers, and 6,300 single-speaker speech samples were obtained by having each speaker read 10 polysyllabic sentences. Over 396,000 speaker transition points were obtained by randomly concatena...
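The paragraph above is truncated, so the exact concatenation procedure is not shown. As a minimal sketch of the general idea only, the following Python joins single-speaker utterances in random order and records the sample index at each boundary where the speaker changes. The function name `concatenate_utterances`, the speaker labels, and the silent placeholder waveforms are hypothetical and are not taken from the patent.

```python
import random

SAMPLE_RATE = 16000  # Hz, the sampling rate stated for the TIMIT recordings


def concatenate_utterances(utterances, rng):
    """Join (speaker_id, samples) pairs in random order.

    Returns the joined sample list and the sample indices at which the
    speaker changes between adjacent utterances.
    Illustrative assumption: the patent's own procedure is not shown.
    """
    order = list(utterances)
    rng.shuffle(order)

    audio = []
    change_points = []
    previous_speaker = None
    for speaker_id, samples in order:
        # A transition point is a boundary between two different speakers.
        if previous_speaker is not None and speaker_id != previous_speaker:
            change_points.append(len(audio))
        audio.extend(samples)
        previous_speaker = speaker_id
    return audio, change_points


# Hypothetical toy data: three speakers with silent placeholder waveforms.
toy = [
    ("spk_a", [0.0] * SAMPLE_RATE),
    ("spk_b", [0.0] * (2 * SAMPLE_RATE)),
    ("spk_c", [0.0] * SAMPLE_RATE),
]
audio, change_points = concatenate_utterances(toy, random.Random(0))

# Convert sample indices to seconds for use as segmentation labels.
change_times = [index / SAMPLE_RATE for index in change_points]
```

Keeping the transition points as sample indices makes it straightforward to map them onto whatever frame rate a downstream feature extractor uses. Adjacent utterances from the same speaker produce no transition point, which is why the speaker comparison is made before each boundary is recorded.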