Subway fault data classification method based on unbalanced data set

A fault data classification technology, applied in data processing applications, instruments, calculations, etc. It addresses problems such as synthetic samples intruding into the distribution space of negative samples, deletion of informative samples, and insufficient consideration of spatial distribution, so as to achieve good model generalization ability, an improved recognition rate, and a good classification effect.

Pending Publication Date: 2020-09-04

NANJING UNIV OF SCI & TECH

4 Cites, 11 Cited by

Problems solved by technology

The disadvantage of undersampling is that, in removing samples, it can easily delete samples that contain important information.

[0006] The traditional SMOTE algorithm does not sufficiently consider the spatial distribution of samples and lacks judgment rules for synthetic samples. As a result, synthetic positive samples can intrude into the distribution space of negative samples, which degrades the data classification effect.
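To make the weakness in [0006] concrete, the following is a minimal sketch of the classic SMOTE interpolation rule, not the improved algorithm of this patent. The function name `smote_sample`, the tuple representation of samples, and the default `k=3` are assumptions chosen for illustration.

```python
import math
import random


def smote_sample(minority, k=3, rng=None):
    """Create one synthetic point with the classic SMOTE interpolation rule.

    `minority` is a list of equal-length numeric tuples (an assumed layout).
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # k nearest minority neighbours of the base point, excluding the base itself.
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # uniform in [0, 1)
    # The new point lies on the segment between base and neighbour.
    # Negative (majority) samples are never consulted, so nothing stops the
    # segment from crossing a region occupied by negative samples.
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
```

Because the rule only looks at minority neighbours, a synthetic point between two distant minority samples can land among negative samples, which is the intrusion described above.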


Examples

Embodiment

[0114] Step 1. Obtain the unbalanced data set D required for the experiment from the Guangzhou Metro operation data;

[0115] Step 2. Divide the data set D into a training data set D_Train and a test data set D_Test. The specific steps are as follows:

[0116] 2.1) Randomly divide the unbalanced data set into 5 parts with the same number of samples;

[0117] 2.2) Randomly select one of the 5 parts as the test data set, and use the other 4 parts as the training data set.
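Steps 2.1 and 2.2 can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function name `split_five_parts` and the `seed` parameter are hypothetical, and dropping leftover samples when the data set size is not a multiple of 5 is an assumption made so that all parts have the same number of samples.

```python
import random


def split_five_parts(samples, seed=None):
    """Shuffle, cut into 5 equal parts, and hold one part out as the test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Assumption: leftover samples (when len is not a multiple of 5) are dropped
    # so that every part has exactly the same size.
    part_size = len(shuffled) // 5
    parts = [shuffled[i * part_size:(i + 1) * part_size] for i in range(5)]
    test_index = rng.randrange(5)  # step 2.2: pick the test part at random
    test = parts[test_index]
    train = [s for i, part in enumerate(parts) if i != test_index for s in part]
    return train, test
```

With 100 input samples this yields 80 training samples and 20 test samples, with no overlap between the two.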

[0118] Step 3. Divide the data samples in D_Train into a positive class sample set N_min (minority class samples) and a negative class sample set N_maj (majority class samples), and calculate the number of samples to be generated by sampling: T = N_maj - N_min;
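A small sketch of Step 3 is given below. It assumes each training sample is a `(features, label)` pair and that the positive (fault) class is labeled `1`; both are illustrative choices, as is the function name `split_by_class`. In the formula, N_maj and N_min are read as the sizes of the two sets.

```python
def split_by_class(train, positive_label=1):
    """Split training samples by class and compute T = N_maj - N_min."""
    # Assumed layout: each sample is a (features, label) pair.
    positives = [(x, y) for x, y in train if y == positive_label]
    negatives = [(x, y) for x, y in train if y != positive_label]
    # T is the number of positive samples still needed to balance the classes.
    t = len(negatives) - len(positives)
    return positives, negatives, t
```

For example, a training set with 3 positive and 7 negative samples gives T = 4.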

[0119] Step 4. Use the K-Means clustering algorithm to cluster the positive class sample set N_min into k clusters C_i, i = 1, 2, ..., k. The specific steps of the K-Means clustering algorithm are as follows:

[0120] 4.1) The input data is pos...
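Since the patent's own K-Means steps are cut off here, the following sketch shows the standard K-Means (Lloyd) iteration rather than the patent's exact procedure. The function name `k_means`, the tuple representation of samples, the random choice of initial centroids, and the `max_iter` and `seed` parameters are all assumptions.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=None):
    """Standard K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k samples drawn at random.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return clusters, centroids
```

Applied to the positive class sample set with a chosen k, the returned `clusters` play the role of C_1, ..., C_k in Step 4.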

Abstract

The invention discloses a subway fault data classification method based on an unbalanced data set. The method comprises the following steps: inputting an original unbalanced data set, and dividing the unbalanced data set into a training data set and a test data set; dividing the training data set into a positive class sample set and a negative class sample set, wherein the positive class sample set contains the minority class samples and the negative class sample set contains the majority class samples; dividing the positive class sample set into K different clusters by using a K-Means clustering algorithm; for each cluster, sampling the data set by using an improved SMOTE algorithm to finally obtain a balanced data set; taking the SVM as a weak classifier, and constructing an integrated classifier by using an AdaBoost algorithm; and evaluating the performance of the integrated classifier by using the test data set. The method can effectively improve the recognition rate of minority class samples in the unbalanced data set while guaranteeing the overall accuracy, and has a better effect in the classification prediction of unbalanced data sets.
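The two quantities named at the end of the abstract can be computed as in the sketch below. It assumes that "recognition rate of minority class samples" means recall on the positive class, which is an interpretation rather than a definition given in this text; the function name and the label value `1` for the positive class are likewise illustrative.

```python
def minority_recall_and_accuracy(y_true, y_pred, positive_label=1):
    """Return (recall on the positive class, overall accuracy)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: minority samples correctly recognised.
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    # False negatives: minority samples predicted as the majority class.
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return recall, accuracy
```

On an unbalanced test set, a classifier that always predicts the majority class scores high on accuracy but zero on minority recall, which is why both numbers are reported together.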

Description

Technical field

[0001] The invention belongs to the technical field of data mining, and in particular relates to a subway fault data classification method based on an unbalanced data set.

Background art

[0002] During the long-term operation of a subway, the probability of equipment failure is very high. If failures cannot be dealt with in time, they will cause great losses. Therefore, timely and effective fault diagnosis for the subway is becoming increasingly important. In fault diagnosis, fault data classification is the key technology. Classification methods are widely used in the field of prediction, and most classification methods require the data to be relatively evenly distributed. If the distribution of the data is seriously unbalanced, the minority data is likely to be treated as noise. Real-world data is often unbalanced, that is, the number of samples in different categories of the data set varies greatly. A large number...

Application Information

IPC(8): G06K9/62, G06Q50/26

CPC: G06Q50/26, G06F18/23213, G06F18/2411, G06F18/214

Inventors: 张永, 左婷婷, 谢志鸿, 方立超, 单梁, 徐志良

Owner: NANJING UNIV OF SCI & TECH
