
Sound scene recognition method based on label amplification and multi-spectrum fusion

A sound scene recognition method, in the field of scene recognition, that uses spectrogram processing. It addresses the problem that prior approaches do not consider clustering and extracting super-category labels, and achieves fast training convergence, improved recognition performance, and greater system robustness.

Active Publication Date: 2018-12-04
SOUTH CHINA NORMAL UNIVERSITY


Problems solved by technology

Similarly, Document 4 assumes that hierarchical labels already exist and does not consider how to cluster and extract super-category labels.
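One way to derive super-category labels without assuming a pre-existing hierarchy, in the spirit of the label amplification idea, is to cluster per-class feature summaries. The sketch below is a hypothetical illustration with toy data and a plain k-means; the patent does not specify the clustering algorithm, feature dimensions, or number of super-categories used here.

```python
# Hypothetical sketch: derive super-category labels by clustering
# class-mean feature vectors. Toy data and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy per-class mean feature vectors for 6 fine-grained scene classes.
class_means = rng.normal(size=(6, 8))

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means; returns a cluster index per row of x."""
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every center, then nearest center.
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return assign

# Each fine-grained class inherits the super-category of its cluster.
super_labels = kmeans(class_means, k=2)
print(super_labels.shape)  # one super-category id per fine class
```

In a real pipeline the class means would come from deep features of the basic classification model rather than random data.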



Examples


Embodiment

[0031] As shown in Figure 1, in this embodiment, a sound scene recognition method based on label amplification and multi-spectrum fusion includes the following steps:

[0032] Step S1: The data set used in this embodiment comprises the Development and Evaluation file sets of the DCASE2017 sound scene recognition task. 90% of the Development set is used as the training part Tr, the remaining 10% as the validation part V1, and the Evaluation set as the test part Te. Every audio file in these sets is 10 seconds long. Without loss of generality, this embodiment uses only two spectrogram formats to describe the implementation steps: the STFT spectrogram and the CQT spectrogram.
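The 90/10 split in Step S1 can be sketched as follows. The file names here are placeholders, not the actual DCASE2017 file list, and the random seed is an illustrative choice.

```python
# Minimal sketch of the Step S1 split. File names are hypothetical;
# DCASE2017 ships its own fold lists, which could be used instead.
import random

development_files = [f"dev_{i:04d}.wav" for i in range(100)]  # placeholders
random.seed(42)
files = development_files[:]
random.shuffle(files)

cut = int(0.9 * len(files))
Tr = files[:cut]   # training part (90%)
V1 = files[cut:]   # validation part (10%)
# Te would be the separate Evaluation file set.
print(len(Tr), len(V1))
```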

[0033] Step S2: Take the audio files one by one from Tr; after framing, windowing, short-time Fourier transform, and related operations, obtain the STFT time-frequency values, and organize the time-...
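The framing, windowing, and short-time Fourier transform of Step S2 can be sketched with numpy alone. The frame length, hop size, and sample rate below are illustrative assumptions; the patent does not state its STFT parameters.

```python
# Numpy-only sketch of Step S2: frame, window, and STFT an audio signal
# into a magnitude spectrogram. Frame/hop/sample-rate values are assumed.
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=512):
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft yields frame_len // 2 + 1 frequency bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)

# 10 s of audio at an assumed 44.1 kHz sample rate, as a toy 440 Hz tone.
sr = 44100
t = np.arange(10 * sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)
spec = stft_magnitude(audio)
print(spec.shape)
```

A CQT spectrogram (the second format used in this embodiment) would replace the fixed-resolution FFT with constant-Q filters, e.g. via a dedicated audio library.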



Abstract

The invention discloses a sound scene recognition method based on label amplification and multi-spectrum fusion. The method comprises the following steps: generating multiple spectrograms from the sound scene data using different signal processing techniques; for each spectrogram type, training a deep convolutional neural network as a basic classification model; using the label amplification technique to assign a super-category label to each sample, upgrading the original network into a multi-task learning model with the constructed hierarchical labels, and thereby optimizing the performance of the basic classification model; and extracting sample features with the improved basic model, concatenating the multiple deep features of each sound scene file, and reducing their dimensionality to obtain global features. The global features corresponding to the different spectrograms are fused, and an SVM classifier is trained as the final classification model. By adopting multi-spectrum feature fusion, the method effectively improves recognition performance; the proposed label amplification and model improvement scheme effectively optimizes the basic classifiers and can be extended to other applications.
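The fusion stage described in the abstract can be sketched as follows: concatenate the per-spectrogram deep features of a file and reduce the dimensionality of the result before classification. The feature sizes and random "deep features" are toy assumptions, and PCA-by-SVD stands in for the patent's unspecified dimensionality reduction; the final SVM training is omitted.

```python
# Hedged sketch of multi-spectrum feature fusion. Feature dimensions,
# the random features, and the PCA reduction are all assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_files = 50

# Hypothetical deep features from the STFT-based and CQT-based CNNs.
stft_feats = rng.normal(size=(n_files, 128))
cqt_feats = rng.normal(size=(n_files, 128))

# Splice the per-spectrogram features into one vector per file.
fused = np.concatenate([stft_feats, cqt_feats], axis=1)  # (50, 256)

def pca_reduce(x, k):
    """Project rows of x onto the top-k principal components."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

global_feats = pca_reduce(fused, k=32)  # inputs to the final classifier
print(global_feats.shape)
```

The resulting global features would then be fed to an SVM (e.g. `sklearn.svm.SVC`) as the final classification model.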

Description

Technical field

[0001] The invention belongs to the technical field of scene recognition, and in particular relates to a sound scene recognition method based on label amplification and multi-spectrum fusion.

Background technique

[0002] Sound scene recognition analyzes audio data to determine the attributes, functions, and uses of the environment in which a machine is located. Sound scene recognition based on convolutional neural networks has become one of the most effective methods in this field. Since sound scene data sets are labeled according to the function of the place, inter-class similarity is a prominent problem; for example, libraries and self-study classrooms are easily mistaken for each other. On the other hand, data that are inherently similar in acoustic features are indiscriminately treated as different categories during network training because of their different functional purposes, which hinders the network mode...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L25/30, G10L25/51, G10L25/18, G06K9/62, G06N3/04
CPC: G10L25/18, G10L25/30, G10L25/51, G06N3/045, G06F18/2411, G06F18/214
Inventors: 郑伟平, 刑晓涛, 莫振尧
Owner: SOUTH CHINA NORMAL UNIVERSITY