
Audio multi-label classification method based on deep learning

Deep learning and multi-label techniques, applied in the field of multi-label classification

Status: Inactive. Publication Date: 2021-03-26
HUNAN UNIV
0 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0005] The invention discloses an audio multi-label classification method based on deep learning, which solves the problem of automatically classifying complex environmental sounds under noise interference.




Embodiment Construction

[0035] The hardware environment of the present invention is mainly a server with a GeForce RTX 2080 Ti GPU. The software implementation uses Ubuntu 16.04 as the platform, is written in the Python programming language, and is built on the deep learning framework TensorFlow. The experimental data come from the FSDKaggle2019 data set on the Kaggle platform. The data set consists of two parts, namely the Freesound Dataset (FSD) and the Yahoo Flickr Creative Commons 100M dataset (YFCC). FSD is based on AudioSet, and YFCC is a set of audio tracks from Flickr videos. The entire data set contains 80 class labels, such as applause, cows, and rain. The implementation is divided into five parts: data preprocessing, audio feature extraction, model construction and training, model evaluation, and audio label classification. The details are as follows:

[0036] 1. Data preprocessing

[0037] Since the original audio data set contains noise interference, this patent ...
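Paragraph [0035] describes a data set with 80 class labels in which one clip may carry several labels at once. A minimal sketch of how such labels could be turned into training targets is given below. It assumes the labels for a clip arrive as a comma-separated string; the function names and the three-class vocabulary are hypothetical and chosen only for illustration, and nothing here is taken from the patent's own code.

```python
from typing import Dict, List, Sequence


def build_label_index(class_names: Sequence[str]) -> Dict[str, int]:
    """Map each class name to a fixed column index."""
    return {name: i for i, name in enumerate(class_names)}


def multi_hot(label_string: str, label_index: Dict[str, int]) -> List[int]:
    """Turn a comma-separated label string into a 0/1 target vector."""
    vector = [0] * len(label_index)
    for name in label_string.split(","):
        name = name.strip()
        if name:  # skip empty fragments, e.g. from an empty string
            vector[label_index[name]] = 1
    return vector


# Hypothetical three-class vocabulary; the data set in the text has 80 classes.
CLASSES = ["applause", "cows", "rain"]
INDEX = build_label_index(CLASSES)

print(multi_hot("applause,rain", INDEX))  # [1, 0, 1]
```

With the full vocabulary, each clip would map to an 80-dimensional vector with a 1 for every label present.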


Abstract

The invention relates to the field of audio tagging for environmental sound recognition, in particular to a deep-learning-based multi-label classification method for noisy audio. Data preprocessing comprises performing noise reduction on the data set with the RNNoise algorithm. Audio feature extraction comprises applying a short-time Fourier transform to the audio, converting the result into MFCC feature data, and then feeding the MFCC features into a VGGish network to obtain a 128-dimensional high-level feature embedding. Model construction comprises choosing a CNN and an RNN: the CNN can exploit the two-dimensional structure of the input data to process voice data, and the RNN can exploit the correlation between labels to predict the labels in order. Model training comprises tracking the loss function value and the classification error, and updating the model parameters until a model with relatively high accuracy is obtained. Model evaluation comprises defining evaluation metrics and calculating average precision. Audio multi-label classification comprises loading the trained model and outputting the predicted label probabilities. The process is shown in Figure 1.
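The evaluation step in the abstract mentions calculating average precision but does not say which variant. The sketch below assumes per-class average precision (the mean of precision at each rank that holds a positive clip), macro-averaged over classes that have at least one positive. The function names and the toy scores are hypothetical.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AP for one class: mean of precision@k over ranks k holding a positive."""
    # Rank clips from highest to lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    Y_true: Sequence[Sequence[int]], Y_score: Sequence[Sequence[float]]
) -> float:
    """Macro average of per-class AP, skipping classes with no positives."""
    aps: List[float] = []
    for c in range(len(Y_true[0])):
        col_true = [row[c] for row in Y_true]
        if sum(col_true) == 0:
            continue
        col_score = [row[c] for row in Y_score]
        aps.append(average_precision(col_true, col_score))
    return sum(aps) / len(aps) if aps else 0.0


# Toy example: 3 clips, 2 classes (rows are clips, columns are classes).
Y_TRUE = [[1, 0], [0, 1], [1, 1]]
Y_SCORE = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
print(round(mean_average_precision(Y_TRUE, Y_SCORE), 4))  # 0.9167
```

In the toy example the first class scores 5/6 (one negative clip is ranked above a positive one) and the second scores 1.0, giving a mean of about 0.9167.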

Description

Technical field

[0001] The invention relates to the field of audio tagging for environmental sound recognition, in particular to a deep-learning-based multi-label classification method for noisy audio. Specifically, the extracted audio features are used as the input to a neural network, which is trained to obtain a model with high accuracy for label classification.

Background technique

[0002] In recent years, deep learning has been widely used in speech recognition, image classification, autonomous driving and other fields, and environmental sound classification is a problem with wide application in real life. Research on this problem has gradually become a hotspot.

[0003] Traditional single-label classification mainly addresses the case where an example belongs to only one category. However, in real life, due to the complexity and polysemy of the objective object itself, there is often no absolute single-label c...
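The contrast drawn in paragraph [0003] between single-label and multi-label classification can be illustrated with a small sketch. It assumes a model that emits one raw score (logit) per class. The single-label decision takes the arg-max of a softmax, while the multi-label decision here uses an independent sigmoid per class with a threshold. That thresholding rule is a common baseline chosen for illustration, not the ordered CNN-RNN label prediction described in the abstract; the class names, logits, and threshold are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one score vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_single_label(logits: Sequence[float], classes: Sequence[str]) -> str:
    """Single-label: exactly one class, the one with the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best]


def predict_multi_label(
    logits: Sequence[float], classes: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Multi-label: every class whose own probability reaches the threshold."""
    return [c for c, z in zip(classes, logits) if sigmoid(z) >= threshold]


CLASSES = ["applause", "cows", "rain"]
LOGITS = [2.0, -1.5, 1.2]  # hypothetical model output for one clip

print(predict_single_label(LOGITS, CLASSES))  # applause
print(predict_multi_label(LOGITS, CLASSES))   # ['applause', 'rain']
```

For a clip that contains both applause and rain, the single-label rule is forced to drop one of the two, whereas the multi-label rule can report both.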


Application Information

IPC(8): G06F16/65, G06F16/683, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/65, G06F16/683, G06N3/08, G06N3/047, G06N3/045, G06F18/241, G06F18/2415
Inventors: 陈浩, 马文, 钟雄虎
Owner: HUNAN UNIV