
Complex audio segmentation clustering method based on bottleneck feature

A segmentation and clustering technology based on bottleneck features, applied in speech analysis, speech recognition, special data processing applications, etc., which addresses the problems of high manual-labeling cost, strong subjectivity, and low efficiency.

Publication Date: 2017-07-14 (Inactive)
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

Although manual labeling can identify the audio types present in an audio stream, it is costly, highly subjective, and inefficient, while supervised audio classification methods require the audio types in the stream to be known in advance and a classifier to be trained for each specific audio type.



Examples


Embodiment

[0165] Figure 4 is a flowchart of an embodiment of the complex audio segmentation clustering method based on bottleneck features, which mainly includes the following processes:

[0166] 1. Construction of a deep neural network with a bottleneck layer: read in the training data and extract MFCC features, then train a DNN feature extractor with a bottleneck layer in two steps, unsupervised pre-training followed by supervised fine-tuning; the specific steps include:

[0167] S1.1. Read in the training data and extract Mel-frequency cepstral coefficient (MFCC) features. The specific steps are as follows (a minimal code sketch of the pre-emphasis and framing steps is given after these sub-steps):

[0168] S1.1.1. Pre-emphasis: set the transfer function of the digital filter to H(z) = 1 - αz^(-1), where α is a coefficient with 0.9 ≤ α ≤ 1; the read-in audio stream is pre-emphasized by passing it through this digital filter;

[0169] S1.1.2. Framing: set the frame length of each audio frame to 25 milliseconds, the frame shift to 10 milliseconds, and the number ...
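The pre-emphasis and framing steps described in S1.1.1 and S1.1.2 can be illustrated with the following minimal Python sketch; the 16 kHz sample rate and α = 0.97 are assumptions for illustration (the patent only requires 0.9 ≤ α ≤ 1), and the subsequent MFCC computation and DNN training are not shown.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # S1.1.1: H(z) = 1 - alpha * z^(-1); alpha = 0.97 is assumed, the patent allows 0.9 <= alpha <= 1
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    # S1.1.2: split into 25 ms frames with a 10 ms shift; the sample rate is an assumption
    frame_len = int(sample_rate * frame_ms / 1000)
    frame_shift = int(sample_rate * shift_ms / 1000)
    num_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    return np.stack([signal[i * frame_shift : i * frame_shift + frame_len]
                     for i in range(num_frames)])

# illustrative usage on a dummy one-second signal
x = np.random.randn(16000).astype(np.float32)
frames = frame_signal(pre_emphasis(x))
print(frames.shape)  # (98, 400) with the assumed 16 kHz sample rate
```

Each row of `frames` would then be windowed and passed through the Mel filter bank and DCT to obtain the MFCC vector fed to the DNN feature extractor.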


Abstract

The invention discloses a complex audio segmentation clustering method based on a bottleneck feature. The method comprises the following steps: a deep neural network with a bottleneck layer is constructed; a complex audio stream is read in, and endpoint detection is carried out on it; the audio features of the non-silent segments are extracted and input into the deep neural network; the bottleneck feature is extracted from the bottleneck layer of the deep neural network; with the bottleneck feature as input, an audio segmentation method based on the Bayesian information criterion is used, so that each audio segment contains only one audio type and adjacent audio segments have different audio types; a spectral clustering algorithm is used to cluster the segmented audio segments and obtain the number of audio types in the complex audio; and the audio segments of the same audio type are merged together. The bottleneck feature used by the invention is a deep transformation feature that describes the differences between complex audio types more effectively than traditional audio features, and the method achieves excellent results in complex audio segmentation and clustering.
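As a concrete illustration of the segmentation stage mentioned in the abstract, the following sketch implements the standard ΔBIC change-point test commonly used in BIC-based audio segmentation (the textbook formulation, not necessarily the exact variant claimed in the patent); the 13-dimensional feature, the window length, and the penalty weight lam are assumptions.

```python
import numpy as np

def delta_bic(features, split, lam=1.0):
    # Standard ΔBIC test on a window of feature vectors (num_frames x dim).
    # A positive value indicates that two separate Gaussians fit the window
    # better than a single one, i.e. an audio-type change at `split`.
    n, d = features.shape
    x1, x2 = features[:split], features[split:]

    def logdet_cov(x):
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)  # small ridge for numerical stability
        return np.linalg.slogdet(cov)[1]

    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(features)
            - 0.5 * split * logdet_cov(x1)
            - 0.5 * (n - split) * logdet_cov(x2)
            - penalty)

# illustrative usage: scan candidate change points inside one analysis window
window = np.random.randn(300, 13)  # 300 frames of a 13-dim bottleneck feature (dimensions assumed)
scores = [delta_bic(window, s) for s in range(30, 270)]
if max(scores) > 0:
    print("change point at frame", 30 + int(np.argmax(scores)))
else:
    print("no change point in this window")
```

In the pipeline described by the abstract, each resulting segment would then be represented (for example by its mean bottleneck-feature vector) and grouped with a spectral clustering algorithm, after which segments assigned to the same cluster are merged.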

Description

Technical Field

[0001] The invention relates to audio signal processing and pattern recognition technology, and in particular to a complex audio segmentation and clustering method based on bottleneck features.

Background Technique

[0002] With the development and popularization of multimedia acquisition equipment, the Internet, and cloud storage platforms, the demand for analysis and retrieval of massive and complex audio content is becoming increasingly urgent. As an unsupervised method, complex audio segmentation and clustering is one of the important means of audio content analysis. Although manual labeling can identify the audio types present in an audio stream, it is costly, highly subjective, and inefficient, while supervised audio classification methods require the audio types in the stream to be known in advance and a classifier to be trained for each specific audio type. Therefore, unsupervised complex audio segmentatio...


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G10L15/04G10L15/26G10L25/24G10L25/30G10L25/51G06F17/30
Inventor 李艳雄王琴李先苦张雪张聿晗
Owner SOUTH CHINA UNIV OF TECH