Voice sample equalization method combining mixed sampling and random forest

A voice sample equalization technique based on random forest, applied in the field of data processing, that addresses noise intrusion into the majority class sample space, failure to consider the distribution of nearby majority class samples, and loss of classification information in data sets.

Active Publication Date: 2022-05-27
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is therefore to overcome two defects of the prior art: SMOTE oversampling generates new samples without considering the distribution of nearby majority class samples, so it synthesizes a large amount of noise that intrudes into the majority class sample space; and ENN undersampling causes loss of classification information in the data set.

Examples

Embodiment Construction

[0054] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand the present invention and implement it, but the embodiments are not intended to limit the present invention.

[0055] Referring to Figure 1, which is a flow chart of the first specific embodiment of the voice sample equalization method combining mixed sampling and random forest provided by the present invention, the specific operation steps are as follows:

[0056] Step S101: Collect an initial voice data set, perform feature extraction on the initial voice data set, and obtain an extracted voice data feature set;

[0057] Step S102: Use SMOTE oversampling to analyze the minority class samples of the voice data feature set and generate new target minority class samples from them, and use ENN undersampling to analyze the nearest neighbor samples and all th...
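The text of step S102 is cut off here, so the patent's exact sampling rules are not available. As a rough illustration only, the textbook forms of the two techniques it names can be sketched in plain Python: SMOTE interpolates between a minority sample and one of its minority-class nearest neighbors, and ENN drops any sample whose label disagrees with the majority vote of its nearest neighbors. The function names `nearest`, `smote` and `enn`, the parameter `k`, and the toy 2-D points are all assumptions made for this sketch, not names or data from the patent.

```python
import math
import random
from collections import Counter

def nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]

def smote(minority, n_new, k=3, rng=None):
    """Textbook SMOTE: interpolate between a minority sample and a minority neighbor."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        u = rng.random()  # position along the segment from sample i to sample j
        synthetic.append([a + u * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic

def enn(X, y, k=3):
    """Textbook ENN: keep a sample only if its k nearest neighbors agree with its label."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (illustrative only): label 0 is the majority class, label 1 the minority.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2]]
noise = [[2.1, 2.1]]  # a majority-labelled point sitting inside the minority cluster
minority = [[2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]

new = smote(minority, n_new=3, k=2)
X = majority + noise + minority + new
y = [0] * 7 + [1] * 6
X_bal, y_bal = enn(X, y)
print(len(X), "->", len(X_bal), "samples after ENN")
```

In this toy setup the only sample ENN removes is the majority-labelled point inside the minority cluster. Note that this sketch applies ENN to both classes, which is one common convention; whether the patented method does the same cannot be determined from the truncated text.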

Abstract

The invention relates to a voice sample equalization method combining mixed sampling and random forest. The method comprises the following steps. First, feature extraction is performed on an initial voice data set. Second, the extracted voice data feature set is equalized using SMOTE-ENN hybrid sampling to obtain the current balanced voice data set. Third, the current balanced voice data set is input into a two-factor random forest model, which outputs a classification evaluation index and an out-of-bag misclassification rate. Finally, the method checks whether the classification evaluation index has converged: if so, the current balanced voice data set is output; otherwise, the mixed sampling rate of the SMOTE-ENN hybrid sampling is updated according to the out-of-bag misclassification rate and the equalization step is repeated on the extracted voice data feature set until the classification evaluation index converges, at which point the current balanced voice data set is output. By combining SMOTE-ENN hybrid sampling with the two-factor random forest model to balance the data set, the method retains sample data with high information value to the greatest extent.
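The control flow described in the abstract can be sketched as a short loop. This is only an outline under stated assumptions: the abstract does not give the rule for updating the sampling rate, the convergence test, or the internals of the two-factor random forest, so all three are passed in as caller-supplied functions. The names `equalize`, `balance`, `evaluate`, `update_rate`, `tol` and `max_iter` are hypothetical, and the stand-in functions in the usage part are toy formulas chosen so the loop terminates, not measured results.

```python
def equalize(features, balance, evaluate, update_rate,
             rate=1.0, tol=1e-3, max_iter=50):
    """Repeat balance -> evaluate -> update until the evaluation index stops changing.

    balance(features, rate)      -> balanced data set (e.g. via SMOTE-ENN)
    evaluate(balanced)           -> (evaluation_index, out_of_bag_error)
    update_rate(rate, oob_error) -> new sampling rate
    Convergence is assumed here to mean a change smaller than `tol`
    between consecutive evaluation indices.
    """
    prev_index = None
    balanced = None
    for _ in range(max_iter):
        balanced = balance(features, rate)
        index, oob_error = evaluate(balanced)
        if prev_index is not None and abs(index - prev_index) < tol:
            return balanced, rate  # converged: output the current balanced set
        prev_index = index
        rate = update_rate(rate, oob_error)
    return balanced, rate  # iteration budget exhausted without convergence

# Toy stand-ins so the loop can run end to end (purely illustrative).
def toy_balance(features, rate):
    return {"features": features, "rate": rate}

def toy_evaluate(balanced):
    oob_error = 0.2 / balanced["rate"]
    return 1.0 - oob_error, oob_error

def toy_update(rate, oob_error):
    return rate * (1.0 + oob_error)

balanced, rate = equalize([[0.1, 0.2]], toy_balance, toy_evaluate, toy_update, tol=1e-2)
print(round(rate, 3))
```

Passing the three steps in as functions keeps the loop independent of any particular sampler or classifier, which seems a reasonable choice given how little of the update rule the abstract reveals.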

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a voice sample equalization method, apparatus, device and computer-readable storage medium based on joint mixed sampling and random forest.

Background

[0002] In recent years, artificial intelligence technology has made breakthroughs in speech recognition. However, data imbalance has always been a challenging problem in machine learning. Data with an uneven class distribution biases the classifier's recognition ability significantly toward the majority class, so the classifier cannot achieve satisfactory classification performance on the minority class.

[0003] Currently, traditional imbalanced learning techniques for solving imbalanced data classification problems can be divided into two categories: internal methods and external methods. The internal approach is to improve existing classification algorithms t...
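The bias toward the majority class described in the background can be made concrete with a small calculation. The sketch below assumes a hypothetical 95:5 class split and a degenerate classifier that always predicts the majority class; the split, the labels and the helper names `accuracy` and `recall` are illustrative assumptions, not figures from the patent.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that were predicted as `positive`."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return hits / sum(t == positive for t in y_true)

# Hypothetical imbalanced set: 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always answers "majority"

print(accuracy(y_true, y_pred))   # 0.95
print(recall(y_true, y_pred, 1))  # 0.0
```

Overall accuracy looks high even though no minority sample is recognized, which is why imbalanced data sets are usually judged with class-sensitive metrics rather than accuracy alone.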


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/02, G06K9/62
CPC: G10L15/02, G06F18/214, G06F18/24323
Inventor: 张晓俊, 周长伟, 朱欣程, 陶智, 赵鹤鸣
Owner: SUZHOU UNIV