Audio scene recognition method based on feature pyramid network

A feature pyramid and scene recognition technology, applied in speech recognition, speech analysis, instruments, etc. It addresses the ineffective use of low-level (underlying) features, giving fast prediction and improved model performance.

Status: Inactive
Publication Date: 2019-08-02
TIANJIN UNIV
Cites: 4; Cited by: 16
AI Technical Summary

Problems solved by technology

The bottom-up feature extraction process of traditional CNNs cannot effectively use the detailed information carried by low-level (underlying) features.
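To illustrate the general idea of reusing low-level detail, here is a minimal sketch of the top-down merge used in generic feature pyramid networks. It is not the patent's ASCFPN architecture, whose layers are not given in this summary. The function names are hypothetical, each level is assumed to be exactly half the size of the previous one, and the lateral 1x1 convolutions and smoothing convolutions of a real FPN are omitted.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two equally shaped 2-D maps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(bottom_up):
    """Merge bottom-up maps (ordered fine -> coarse) along a top-down path.

    The coarsest map is kept as is; every finer map is summed with the
    upsampled result of the level above it, so coarse semantic information
    reaches the detailed low-level maps.
    """
    merged = [bottom_up[-1]]
    for lateral in reversed(bottom_up[:-1]):
        merged.append(add_maps(lateral, upsample2x(merged[-1])))
    merged.reverse()  # return in the same fine -> coarse order
    return merged
```

In this sketch every output level has the same shape as its input level, so a classifier can read predictions from the fine maps as well as the coarse one.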

Examples

Specific example

[0040] 1. Read the audio signal and truncate it into audio segments with a fixed duration of 10 s each;

[0041] 2. Split each fixed-duration segment into frames and apply a window, using 2048 sampling points per frame and a 2048-point Hamming window;

[0042] 3. Pass the framed signal through a Mel filter bank and take the logarithm; the filter bank has 134 filters, a window length of 1704 points, and an overlap of 852 points between frames;

[0043] 4. Normalize the resulting Mel spectrogram;

[0044] 5. Input the normalized Mel spectrogram into the ASCFPN network for forward propagation;

[0045] 6. Use a voting method to count the per-frame predictions; the scene category predicted most often is output as the prediction for the entire audio file.
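Steps 1, 2 and 4 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a trailing remainder shorter than one segment is dropped, the hop between frames is left as a required parameter because the text does not state it unambiguously, and zero-mean, unit-variance scaling is an assumed form of the normalization, which the source does not specify. The Mel filter bank of step 3 is not implemented here.

```python
import math


def split_segments(samples, sample_rate, seconds=10.0):
    """Step 1: cut a signal into consecutive fixed-duration segments.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    seg_len = int(round(sample_rate * seconds))
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def hamming(n):
    """n-point symmetric Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))
            for k in range(n)]


def frame_and_window(segment, hop, frame_len=2048):
    """Step 2: split a segment into frames and apply a Hamming window.

    frame_len defaults to the 2048 points given in the text; hop (the
    shift between frame starts) is an assumed, caller-supplied value.
    """
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(segment) - frame_len + 1, hop):
        frames.append([segment[start + k] * win[k] for k in range(frame_len)])
    return frames


def normalize(matrix):
    """Step 4 (assumed form): zero mean, unit variance over all entries."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on constant input
    return [[(v - mean) / std for v in row] for row in matrix]
```

In practice a signal-processing library would normally supply the framing, FFT and Mel filter bank; the pure-Python version only makes the individual operations visible.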

[0046] Table 1 Comparison of various audio scene recognition algorithms

[0047]

[0048] As shown in the above table, ASCFPN is an algor...

Abstract

The invention discloses an audio scene recognition method based on a feature pyramid network. The method includes the steps of: establishing a feature pyramid network model for audio scene recognition; training the model with a training set containing audio files of different scene categories and their corresponding category labels; reading an audio file to be identified and truncating it; extracting Mel features to obtain a two-dimensional Mel spectrogram for each audio frame; normalizing the two-dimensional Mel spectrogram; feeding the normalized spectrogram into the trained model for forward propagation to obtain prediction probabilities for the different audio scene categories, and taking the scene category with the maximum prediction probability as the prediction output of the audio frame corresponding to that spectrogram; and predicting the scene of the whole audio file to be identified. The method makes full use of underlying (low-level) feature information and improves model performance. It can also exploit the growing volume of data available under the current big data trend, and its prediction speed is high.
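The last two steps, choosing the most probable category per frame and then deciding for the whole file, can be sketched as below. This assumes the file-level decision is the majority vote described in step 6 of the specific example. The function names and label list are hypothetical, and the tie-breaking behaviour (the category that appears first among the frames wins) is an assumption, since the source does not say how ties are resolved.

```python
from collections import Counter


def frame_prediction(probabilities):
    """Index of the category with the maximum probability for one frame."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def file_prediction(frame_probabilities, labels):
    """Majority vote over the per-frame predictions of one audio file.

    frame_probabilities: one probability vector per frame, as produced by
    the model's forward pass. Ties go to the category seen first
    (an assumption, not stated in the source).
    """
    votes = Counter(frame_prediction(p) for p in frame_probabilities)
    winner, _ = votes.most_common(1)[0]
    return labels[winner]
```

For example, with labels `["park", "street", "restaurant"]` and three frames whose largest probabilities fall on park, street and street, the file is labeled street.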

Description

Technical field

[0001] The invention relates to audio scene recognition methods, and in particular to an audio scene recognition method based on a feature pyramid network.

Background

[0002] Audio scene recognition lets a machine process a recorded audio file or an uploaded data stream and, imitating a human listener, identify the background scene behind the audio (such as a park, street, or restaurant).

[0003] In machine learning, many different models and audio feature representations have been proposed for scene recognition. Research on using neural networks for audio scene problems appeared as early as 1997. In 1998, Liu et al. used Recurrent Neural Networks (RNNs) and nearest neighbor classifiers to distinguish five different types of environmental sounds. However, due to the introduction of too many parameters in the ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/06, G10L25/30, G10L25/45, G10L25/03
CPC: G10L15/06, G10L15/063, G10L25/03, G10L25/30, G10L25/45
Inventors: 张涛, 梁晋华
Owner: TIANJIN UNIV