A Sound Scene Recognition Method Based on Label Augmentation and Multispectrum Fusion

A sound scene recognition method applying multi-spectrogram technology in the field of scene recognition. It addresses the problem that prior approaches do not consider clustering to extract super-category labels, and achieves fast training convergence, system robustness, and optimized performance.

Active Publication Date: 2021-07-09
SOUTH CHINA NORMAL UNIVERSITY


Problems solved by technology

Similarly, Document 4 assumes that hierarchical labels already exist, and does not consider how to cluster samples to extract super-category labels.

Method used



Examples


Embodiment

[0030] As shown in Figure 1, in this embodiment, a sound scene recognition method based on label augmentation and multi-spectrogram fusion includes the following steps:

[0031] Step S1: The data set used in this embodiment comprises the Development and Evaluation file sets of the DCASE2017 sound scene recognition task. 90% of the Development set is used as the training part Tr, the remaining 10% as the validation part V1, and the Evaluation set as the test part Te. The audio files in each set are 10 seconds long. Without loss of generality, this embodiment uses only two spectrogram formats to describe the implementation steps: the STFT spectrogram and the CQT spectrogram.
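The split in Step S1 can be sketched as follows. This is a minimal illustration, not code from the patent; the file names and the shuffling seed are assumptions:

```python
import random

def split_dataset(development_files, seed=0):
    """Split the Development set 90/10 into training (Tr) and
    validation (V1), as described in Step S1. The Evaluation set
    is used unchanged as the test part (Te)."""
    files = list(development_files)
    random.Random(seed).shuffle(files)
    cut = int(0.9 * len(files))
    return files[:cut], files[cut:]   # Tr, V1

# Example: 100 dummy 10-second audio file names
dev = [f"audio_{i:03d}.wav" for i in range(100)]
tr, v1 = split_dataset(dev)
print(len(tr), len(v1))  # 90 10
```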

[0032] Step S2: Take out the audio files one by one from Tr, obtain the STFT time-frequency values through framing, windowing, and the short-time Fourier transform, and organize the time-fre...
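The framing, windowing, and short-time Fourier transform of Step S2 can be sketched with NumPy alone. The frame length, hop size, and Hann window are illustrative assumptions, not values stated in the patent:

```python
import numpy as np

def stft_spectrogram(signal, frame_len=1024, hop=512):
    """Frame the signal, apply a Hann window to each frame, and take
    the FFT of every frame; returns a (frames x frequency-bins)
    magnitude spectrogram."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-redundant half of the spectrum
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: 1 second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = stft_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (30, 513): 30 frames, frame_len//2 + 1 bins
```

With a 1024-point frame at 16 kHz the frequency resolution is 15.625 Hz, so the 440 Hz tone peaks near bin 28.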



Abstract

The invention discloses a sound scene recognition method based on label augmentation and multi-spectrogram fusion, which includes: using different signal processing techniques to generate multiple spectrograms from the sound scene data; using a convolutional neural network model as the basic classification model; applying label augmentation to expand super-category labels for the samples, and using the constructed hierarchical labels to improve the original network into a multi-task learning model, optimizing the performance of the basic classification model; using the improved basic classification model to extract sample features, concatenating the multiple deep features of each sound scene file, and reducing their dimensionality to obtain global features; and fusing the multiple global features corresponding to the different spectrograms to train an SVM classifier, which serves as the final classification model. The multi-spectrogram feature fusion effectively improves recognition performance, and the proposed label augmentation and model improvement method effectively optimizes the basic classifier and can be extended to other applications.
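The fusion stage described in the abstract — concatenating per-spectrogram deep features and reducing dimensionality to a compact global feature — can be sketched with NumPy. The feature dimensions, the random placeholder features, and the SVD-based (PCA-style) reduction are illustrative assumptions; the patent does not specify the reduction technique used:

```python
import numpy as np

def fuse_features(feature_maps, out_dim=8):
    """Concatenate the deep features extracted from each spectrogram
    of a file, then project onto the top singular directions of the
    centered batch (a PCA-style reduction) to obtain global features."""
    # feature_maps: list of (n_files x feat_dim_i) arrays, one per spectrogram
    concat = np.concatenate(feature_maps, axis=1)   # (n_files, sum of dims)
    centered = concat - concat.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T                # (n_files, out_dim)

# Example: 20 files with hypothetical STFT (64-d) and CQT (48-d) features
rng = np.random.default_rng(0)
stft_feats = rng.normal(size=(20, 64))
cqt_feats = rng.normal(size=(20, 48))
global_feats = fuse_features([stft_feats, cqt_feats])
print(global_feats.shape)  # (20, 8)
```

The resulting low-dimensional global features would then be fed to an SVM classifier, as the abstract describes.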

Description

Technical field

[0001] The invention belongs to the technical field of scene recognition, and in particular relates to a sound scene recognition method based on label augmentation and multi-spectrogram fusion.

Background technique

[0002] Sound scene recognition technology analyzes audio data to determine the attributes, functions, and uses of the spatial environment in which a machine is located. Sound scene recognition based on convolutional neural networks has become one of the most effective methods in this field. Since sound scene datasets are labeled according to the function of a place, inter-class similarity is prominent; for example, libraries and self-study classrooms are easily misjudged for each other. On the other hand, data that are inherently similar in acoustic features are indiscriminately treated as different categories when training the network model, because of their different functional purposes, which hinders the network mode...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G10L25/30; G10L25/51; G10L25/18; G06K9/62; G06N3/04
CPC: G10L25/18; G10L25/30; G10L25/51; G06N3/045; G06F18/2411; G06F18/214
Inventor: 郑伟平, 刑晓涛, 莫振尧
Owner: SOUTH CHINA NORMAL UNIVERSITY