Audio scene recognition method based on feature pyramid network

A feature pyramid network technique for audio scene recognition, applicable to speech recognition, speech analysis, instruments, etc. It addresses the ineffective use of low-level features in existing models and achieves fast prediction and improved model performance.
CN110085218A · Inactive · Publication Date: 2019-08-02 · TIANJIN UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: TIANJIN UNIV
Publication Date: 2019-08-02
Estimated Expiration: Not applicable · inactive patent

Abstract

The invention discloses an audio scene recognition method based on a feature pyramid network. The method includes the steps of: establishing a feature pyramid network model for audio scene recognition; training the model with a training set that contains audio files of different scene categories together with their scene category labels; reading an audio file to be identified and segmenting it into audio frames; extracting Mel features to obtain a two-dimensional Mel spectrogram of each audio frame; normalizing the two-dimensional Mel spectrogram; feeding the normalized two-dimensional Mel spectrogram into the trained model for forward propagation to obtain prediction probabilities for the different audio scene categories, and taking the scene category with the maximum prediction probability as the prediction output of the audio frame corresponding to that spectrogram; and predicting the scene category of the whole audio file to be identified. The method makes full use of low-level feature information and thereby improves model performance. It can also exploit the growing volume of data available under the current big data trend, and its prediction speed is high.
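The inference steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the frame length, FFT size, hop, and number of Mel bands are hypothetical values; `placeholder_model` is a made-up stand-in for the trained feature pyramid network; and averaging frame probabilities to label the whole file is an assumed aggregation rule, since the abstract does not say how the file-level prediction is formed.

```python
import numpy as np

def split_into_frames(signal, frame_len):
    """Cut a 1-D signal into non-overlapping audio frames (the tail is dropped)."""
    n = len(signal) // frame_len
    return signal[: n * frame_len].reshape(n, frame_len)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (HTK formula)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(frame, sr, n_fft=512, hop=256, n_mels=40):
    """Two-dimensional log-mel spectrogram (n_mels x time steps) of one audio frame."""
    window = np.hanning(n_fft)
    n_steps = 1 + (len(frame) - n_fft) // hop
    chunks = np.stack([frame[t * hop : t * hop + n_fft] * window for t in range(n_steps)])
    power = np.abs(np.fft.rfft(chunks, axis=1)) ** 2       # (time, n_fft // 2 + 1)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power.T      # (n_mels, time)
    return np.log(mel + 1e-10)

def normalize(spec):
    """Zero-mean, unit-variance normalization of one spectrogram."""
    return (spec - spec.mean()) / (spec.std() + 1e-8)

def placeholder_model(spec, n_classes=3):
    """Hypothetical stand-in for the trained network's forward pass.

    Pools the spectrogram into n_classes mel bands and applies a softmax,
    only so that the pipeline below has probabilities to work with.
    """
    logits = np.array([band.mean() for band in np.array_split(spec, n_classes, axis=0)])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def predict_file(signal, sr, frame_len, model=placeholder_model):
    """Return per-frame labels and a file-level label."""
    probs = np.stack([
        model(normalize(log_mel_spectrogram(frame, sr)))
        for frame in split_into_frames(signal, frame_len)
    ])
    frame_labels = probs.argmax(axis=1)           # max-probability class per frame
    file_label = int(probs.mean(axis=0).argmax())  # assumed rule: average, then argmax
    return frame_labels, file_label

# Usage with synthetic noise standing in for a real recording.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000 * 3)            # 3 s at an assumed 16 kHz
frame_labels, file_label = predict_file(audio, sr=16000, frame_len=16000)
```

In practice the Mel features would more likely come from an audio library, and `placeholder_model` would be replaced by the trained network; the frame-level argmax step is the only part taken directly from the abstract.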

Description

Technical Field

[0001] The invention relates to an audio scene recognition method, and in particular to an audio scene recognition method based on a feature pyramid network.

Background Art

[0002] Audio scene recognition is a technique in which a machine processes a recorded audio file or an uploaded data stream and, imitating a human listener, identifies the specific background setting behind the audio (such as a park, a street, or a restaurant).

[0003] In machine learning, many different models and audio feature representations have been proposed for scene recognition. Research on using neural networks for audio scene recognition appeared as early as 1997. In 1998, Liu et al. used recurrent neural networks (RNNs) and nearest neighbor classifiers to distinguish five different types of environmental sounds. However, due to the introduction of too many parameters in the ...

Claims
