Audio scene recognition method based on feature pyramid network

A feature-pyramid-based scene recognition technique, applicable to speech recognition, speech analysis, instruments, and related fields, that addresses the ineffective use of low-level features and achieves fast prediction and improved model performance.

Inactive Publication Date: 2019-08-02

TIANJIN UNIV

4 Cites, 16 Cited by


Problems solved by technology

The bottom-up feature extraction process of traditional CNNs cannot effectively utilize the detailed information carried by low-level features.
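For context, a feature pyramid typically addresses this by adding a top-down pathway that merges coarse, high-level maps back into finer, low-level ones. The sketch below shows only that generic merging idea in NumPy. It is not the ASCFPN architecture, which this excerpt does not specify; the function names are hypothetical, all levels are assumed to have the same channel count, and the learned 1x1 and 3x3 convolutions used in practice are omitted.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) map by 2 along H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(bottom_up):
    """Merge bottom-up feature maps along a top-down pathway.

    bottom_up: list of (C, H, W) arrays ordered finest first, where each
    level is assumed to be half the spatial size of the previous one.
    Returns the merged maps in the same order.
    """
    merged = [bottom_up[-1]]                      # start from the coarsest map
    for lateral in reversed(bottom_up[:-1]):
        # Add the upsampled coarser map to the finer lateral map.
        merged.append(lateral + upsample2x(merged[-1]))
    return merged[::-1]

# Toy bottom-up maps: three levels with constant values 1, 2 and 3.
c2 = np.full((4, 8, 8), 1.0)
c3 = np.full((4, 4, 4), 2.0)
c4 = np.full((4, 2, 2), 3.0)
p2, p3, p4 = top_down_merge([c2, c3, c4])
print(p2.shape, p2[0, 0, 0], p3[0, 0, 0], p4[0, 0, 0])  # (4, 8, 8) 6.0 5.0 3.0
```

In this toy run the finest output map carries contributions from every level, which is the property a plain bottom-up CNN lacks.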


Examples


Specific example

[0040] 1. Read the audio signal and truncate it into segments with a fixed duration of 10 s each;

[0041] 2. Frame and window each fixed-duration segment, using 2048 sampling points per frame and a 2048-point Hamming window;

[0042] 3. Pass the framed signal through a Mel filter bank and take the logarithm; the number of filters is 134, the filter window length is 1704 points, and adjacent frames overlap by 852 points;

[0043] 4. Normalize the resulting Mel spectrogram;

[0044] 5. Input the normalized Mel spectrogram into the ASCFPN network for forward propagation;

[0045] 6. Use voting to tally the prediction results of each frame, and output the most frequently predicted scene category as the prediction for the entire audio.
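Steps 1 to 4 above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The frame length (2048), the Hamming window and the filter count (134) come from the text. The 44100 Hz sample rate, the 852-sample hop, the HTK-style Mel formula, the use of the power spectrum, and zero-mean unit-variance normalization are assumptions, and all function names are hypothetical.

```python
import numpy as np

SEGMENT_SECONDS = 10   # step 1: fixed segment duration
FRAME_LENGTH = 2048    # step 2: samples per frame and Hamming window length
N_MELS = 134           # step 3: number of Mel filters
HOP_LENGTH = 852       # assumption: hop between successive frames

def split_segments(signal, sample_rate, seconds=SEGMENT_SECONDS):
    """Cut a 1-D signal into non-overlapping fixed-length segments (tail dropped)."""
    seg_len = seconds * sample_rate
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)

def frame_and_window(segment, frame_length=FRAME_LENGTH, hop=HOP_LENGTH):
    """Slice a segment into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(segment) - frame_length) // hop
    idx = np.arange(frame_length)[None, :] + hop * np.arange(n_frames)[:, None]
    return segment[idx] * np.hamming(frame_length)

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(sample_rate, n_fft=FRAME_LENGTH, n_mels=N_MELS):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m : m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(segment, sample_rate):
    """Return a log-Mel spectrogram of shape (n_mels, n_frames)."""
    frames = frame_and_window(segment)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filter_bank(sample_rate).T
    return np.log(mel + 1e-10).T   # small offset avoids log(0)

def normalize(spec):
    """Assumed normalization: zero mean and unit variance per spectrogram."""
    return (spec - spec.mean()) / (spec.std() + 1e-8)

# Usage on 25 s of synthetic noise standing in for a real recording.
sample_rate = 44100    # assumption: not stated in the source
audio = np.random.default_rng(0).standard_normal(25 * sample_rate)
segments = split_segments(audio, sample_rate)
features = np.stack([normalize(log_mel_spectrogram(s, sample_rate)) for s in segments])
print(features.shape)  # (2, 134, 516)
```

The text gives both a 2048-point frame and a 1704-point filter window with 852 overlapping points; the sketch does not try to reconcile the two and simply frames at 2048 points with an 852-sample hop.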

[0046] Table 1 Comparison of various audio scene recognition algorithms

[0047]

[0048] As shown in the above table, ASCFPN is an algor...


Abstract

The invention discloses an audio scene recognition method based on a feature pyramid network. The method includes the steps of: establishing a feature pyramid network model for audio scene recognition; training the model on a training set containing audio files of different scene categories and their corresponding scene categories; reading an audio file to be identified and truncating it; extracting Mel features to obtain a two-dimensional Mel spectrogram of each audio frame; normalizing the two-dimensional Mel spectrogram; feeding the normalized spectrogram into the trained model for forward propagation to obtain prediction probabilities for the different audio scene categories; taking the scene category with the maximum prediction probability as the prediction for the audio frame corresponding to that spectrogram; and predicting the scene of the whole audio file to be identified. The method makes full use of low-level feature information and improves model performance. It can also exploit the growing volume of data available under the current big-data trend, and its prediction speed is high.
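The last two steps of the abstract (choosing the most probable category per frame, then predicting the whole file) can be sketched as a majority vote. This is a minimal illustration: the function name `predict_file`, the class list and the probability values are hypothetical, and ties are broken in favour of the lowest class index, which the source does not specify.

```python
import numpy as np

def predict_file(frame_probs, class_names):
    """Majority vote over per-frame predictions.

    frame_probs: array of shape (n_frames, n_classes) holding the model's
    predicted probabilities, one row per audio frame.
    Returns the winning class name and the vote count per class.
    """
    frame_labels = np.argmax(frame_probs, axis=1)   # per-frame prediction
    votes = np.bincount(frame_labels, minlength=len(class_names))
    return class_names[int(np.argmax(votes))], votes

classes = ["park", "street", "restaurant"]
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])
label, votes = predict_file(probs, classes)
print(label, votes.tolist())  # street [1, 2, 1]
```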

Description

Technical field

[0001] The invention relates to an audio scene recognition method, and in particular to an audio scene recognition method based on a feature pyramid network.

Background art

[0002] Audio scene recognition lets a machine process a recorded audio file or an uploaded data stream so that it can, as a human would, identify the specific background setting (such as a park, street or restaurant) behind the audio.

[0003] In the field of machine learning, many different models and audio feature representation methods have been proposed to solve the scene recognition problem. Research on using neural networks for scene audio problems appeared as early as 1997. In 1998, Liu et al. used Recurrent Neural Networks (RNNs) and nearest neighbor classifiers to distinguish five different types of environmental sounds. However, due to the introduction of too many parameters in the ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L15/06, G10L25/30, G10L25/45, G10L25/03

CPC: G10L15/06, G10L15/063, G10L25/03, G10L25/30, G10L25/45

Inventor: 张涛, 梁晋华

Owner: TIANJIN UNIV
