Robust acoustic scene recognition method based on local learning

An acoustic scene recognition technology based on local learning, applied in the field of acoustic scene recognition. It addresses low recognition accuracy caused by mismatched audio channels and unbalanced numbers of samples across channels, resolves the imbalance in the number of samples per device category, and offers fast computation and easy implementation.

Active Publication Date: 2019-08-27

HARBIN INST OF TECH



AI Technical Summary

Problems solved by technology

[0004] The present invention provides a robust acoustic scene recognition method based on local learning to address the low accuracy of acoustic scene recognition when audio channels are mismatched and the number of samples per channel is unbalanced.

Examples

Specific Embodiment 1

[0017] Specific Embodiment 1: This embodiment provides a robust acoustic scene recognition method based on local learning, which includes the following steps:

[0018] Step 1: Collect sound signals of different acoustic scenes at a sampling frequency of 44.1 kHz and perform frequency-domain feature extraction. The collected audio is divided into a sequence of frames with a frame length of 40 ms, and a 40-dimensional FBank (filter bank) feature is extracted from each frame to establish a training sample set;
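The framing and FBank extraction in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the non-overlapping frames, the Hamming window, the 2048-point FFT, the mel-scale triangular filters, and the log compression are all assumptions, since the text only states the sampling rate, the 40 ms frame length, and the 40 feature dimensions. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping (assumed; the source does not name a scale).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right.
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def extract_fbank(signal, sample_rate=44100, frame_ms=40, n_filters=40, n_fft=2048):
    """Return log filter-bank energies, shape (n_frames, n_filters)."""
    frame_len = int(round(sample_rate * frame_ms / 1000))  # 1764 samples at 44.1 kHz
    n_frames = len(signal) // frame_len
    # Assumption: non-overlapping frames; trailing samples are dropped.
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    frames = frames * np.hamming(frame_len)
    # rfft zero-pads each 1764-sample frame up to n_fft points.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    return np.log(power @ bank.T + 1e-10)
```

With these assumptions, one second of audio at 44.1 kHz yields 25 frames of 40 features each.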

[0019] Step 2: Preprocess the feature data extracted in Step 1:

[0020] Calculate the mean and standard deviation in each dimension for the features extracted in Step 1. As shown in Figure 1, calculate the mean μ over all samples along the time axis, and calculate the standard deviation σ in the same way; use the obtained mean and standard deviation to normalize all features;

[0021] Step 3: Channel adaptation and data...

Specific Embodiment 2

[0025] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that the normalization of all features in Step 2, using the obtained mean and standard deviation, is specifically as follows:

[0026] Use the obtained mean and standard deviation to normalize the feature data according to the following formula:

[0027] $$x_{norm} = \frac{x - \mu}{\sigma}$$

[0028] where x_norm denotes the normalized data, μ is the mean, σ is the standard deviation, and x is the feature data.
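A minimal sketch of this preprocessing, assuming the normalization subtracts the per-dimension mean and divides by the per-dimension standard deviation, with both statistics computed over every frame of every training sample. The function names and the small `floor` term that guards against a zero standard deviation are illustrative additions, not part of the source.

```python
import numpy as np

def fit_normalizer(feature_list):
    """Compute per-dimension mean and standard deviation.

    feature_list: list of arrays shaped (n_frames, n_dims), one per audio clip.
    Statistics are taken along the time axis over all clips.
    """
    stacked = np.concatenate(feature_list, axis=0)
    mu = stacked.mean(axis=0)
    sigma = stacked.std(axis=0)
    return mu, sigma

def normalize(x, mu, sigma, floor=1e-8):
    # floor is an assumed safeguard against division by zero.
    return (x - mu) / (sigma + floor)
```

The same `mu` and `sigma` estimated on the training set would then be reused for the samples to be recognized, so that training and test features share one scale.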

[0029] The other steps and parameters are the same as in the first embodiment.

Specific Embodiment 3

[0030] Specific Embodiment 3: This embodiment differs from Specific Embodiment 2 in that the mean shift in Step 3 is specifically as follows:

[0031] Add the difference ε to the normalized data with probability p:

[0032]

[0033] Here, μ_most denotes the data mean vector of the device with the largest number of samples; N denotes the number of devices other than the device with the largest number of samples; μ_i denotes the data mean vector of the i-th such device, i = 1, ..., N. To increase the robustness of the system, the difference is not added to all data; instead it is added with probability p, p ∈ [0, 1].
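The probabilistic shift can be sketched as below. Because the formula for ε is not reproduced in this text, the sketch takes ε as a precomputed input vector rather than guessing its form. It also assumes that the decision to shift is made independently for each sample; the source only says the difference is added with probability p. The function name is hypothetical.

```python
import numpy as np

def apply_mean_shift(samples, epsilon, p, rng):
    """Add epsilon to a random subset of samples.

    samples: array shaped (n_samples, n_frames, n_dims), already normalized.
    epsilon: array shaped (n_dims,), the difference vector (supplied by the caller).
    p:       probability in [0, 1] that a given sample is shifted (assumed per sample).
    rng:     numpy.random.Generator used for the random draw.
    Returns the shifted copy and the boolean mask of shifted samples.
    """
    mask = rng.random(samples.shape[0]) < p
    shifted = samples.astype(float)  # astype returns a copy, so the input is untouched
    shifted[mask] += epsilon         # broadcasts over the frame axis
    return shifted, mask
```

Setting p = 0 leaves the data unchanged and p = 1 shifts every sample, so p controls how much of the training set is moved.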

[0034] The other steps and parameters are the same as in the second embodiment.

Abstract

The invention provides a robust acoustic scene recognition method based on local learning and belongs to the technical field of sound signal processing. The method comprises the following steps: first, sound signals of different acoustic scenes are collected and frequency-domain features are extracted; the extracted feature data are preprocessed; the normalized data are then subjected to mean shift, and data augmentation is performed with the mixup method; a convolutional neural network model is then built according to the idea of local learning, and the augmented training sample set is input into the model for training to obtain the trained model; finally, a sample to be recognized undergoes frequency-domain feature extraction and data preprocessing in turn and is input into the trained model to obtain the acoustic scene recognition result. The invention addresses the low accuracy of acoustic scene recognition when audio channels are mismatched and the number of samples per channel is unbalanced, and it is applicable to acoustic scene recognition with multiple channels and unbalanced numbers of samples across channels.
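The abstract names mixup as the augmentation method without giving details. As a rough sketch of the general mixup technique (not necessarily the exact variant used in the patent), each training example is replaced by a convex combination of itself and another randomly chosen example, with the labels mixed by the same weight. The Beta-distributed weight, the single weight per batch, and the value of `alpha` are assumptions; the function name is hypothetical.

```python
import numpy as np

def mixup_batch(features, labels_onehot, alpha, rng):
    """Mix each example with a randomly permuted partner.

    features:      array shaped (batch, n_frames, n_dims).
    labels_onehot: array shaped (batch, n_classes).
    alpha:         Beta distribution parameter (assumed hyperparameter).
    rng:           numpy.random.Generator.
    """
    lam = rng.beta(alpha, alpha)              # one mixing weight for the whole batch
    perm = rng.permutation(features.shape[0])
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_x, mixed_y, lam
```

Because the features and labels share the same weight, each mixed label row still sums to one and can be used directly with a cross-entropy loss.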

Description

Technical field

[0001] The invention relates to an acoustic scene recognition method and belongs to the technical field of sound signal processing.

Background

[0002] Acoustic scene recognition can be widely used in fields such as robotics and unmanned vehicles that need to perceive the surrounding sound environment effectively. However, in the real world there is often more than one sound collection device, and different devices have different channel characteristics, so the signals they collect are usually not identical. How to automatically and accurately classify the scenes of sounds input from different channels, and thereby achieve robust acoustic scene recognition, has become an urgent and challenging research topic.

[0003] To achieve robust acoustic scene recognition, it is necessary to make full use of prior knowledge of the data. At present, most methods are acoustic scene recognition methods under pure speech or the same channel; s...

Application Information

IPC(8): G10L25/51, G10L25/30, G10L25/18

CPC: G10L25/18, G10L25/30, G10L25/51

Inventors: 韩纪庆, 杨皓, 郑贵滨, 郑铁然

Owner: HARBIN INST OF TECH
