Imbalance sample weighting method suitable for training of support vector machine

A technology relating to support vector machines and training samples, applied in fields such as computing, instruments, and character and pattern recognition. It addresses problems such as the small contribution of most training samples to the trained model and the resulting waste of space and time.

Inactive Publication Date: 2015-03-25
JIANGSU KING INTELLIGENT SYST

AI Technical Summary

Problems solved by technology

[0003] Although the above-mentioned algorithms greatly improve on the traditional support vector machine method and show their respective advantages in handling large data sets and unbalanced data, they ignore the fact that only the peripheral samples near the final classification decision surface of the training data set can become support vectors; most of the remaining samples play little or no role in training the support vector machine. Weighting this redundant data, whose practical significance is negligible, is a waste of space and time.


Examples


Embodiment Construction

[0034] Following the specific steps above and with reference to Figures 1-3, a specific implementation example of the present invention is given below.

[0035] Figure 1 shows the result after step 1, in which K-means clustering is performed with K=6, yielding 6 subclasses, T={T_i | i=1,...,6}. The sample data belong to two classes: squares represent one class and circles the other. The data of each subclass are enclosed by a dotted ellipse. According to the class distribution of the data objects they contain, the 6 subclasses are divided into pure subclasses, which contain only a single class, recorded as UT={T_2, T_4, T_5, T_6}, and mixed subclasses, which contain two or more classes, recorded as MT={T_1, T_3}. Each mixed subclass in MT is further divided into multiple pure subclasses, recorded as UMT={T_1A, T_1B, T_3A, T_3B}, get 8 sets of pur...
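The pure/mixed partition described in [0035] can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the K-means cluster assignments are already available, and it assumes each mixed subclass is split by class label, which the visible text does not confirm. The function name `split_subclusters` and the toy data are hypothetical.

```python
from collections import defaultdict


def split_subclusters(assignments, labels):
    """Separate clusters into pure (UT) and mixed (MT), then split MT into UMT.

    assignments[i] is the cluster id of sample i (e.g. from K-means with K=6).
    labels[i] is the class label of sample i.
    Assumption: a mixed cluster is split by class label.
    """
    # cluster id -> class label -> list of sample indices
    by_cluster = defaultdict(lambda: defaultdict(list))
    for idx, (cluster, label) in enumerate(zip(assignments, labels)):
        by_cluster[cluster][label].append(idx)

    pure, mixed, split_mixed = {}, {}, {}
    for cluster, per_label in by_cluster.items():
        if len(per_label) == 1:
            # UT: the cluster contains a single class
            pure[cluster] = next(iter(per_label.values()))
        else:
            # MT: the cluster contains two or more classes
            mixed[cluster] = sorted(i for m in per_label.values() for i in m)
            # UMT: one pure subset per class inside the mixed cluster
            for label, members in per_label.items():
                split_mixed[(cluster, label)] = members
    return pure, mixed, split_mixed


# Toy data shaped like the example: clusters 1 and 3 are mixed.
assignments = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
labels = ["square", "circle", "square", "square", "square", "circle",
          "square", "circle", "circle", "circle", "square", "circle"]

pure, mixed, split_mixed = split_subclusters(assignments, labels)
total_pure_sets = len(pure) + len(split_mixed)
```

With this toy input, clusters 2, 4, 5 and 6 land in `pure`, clusters 1 and 3 land in `mixed`, and the four subsets in `split_mixed` bring the total number of pure sets to 8.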


Abstract

The invention is mainly used in the field of artificial intelligence and relates to an imbalanced sample weighting method suitable for training support vector machines. The method uses clustering and the Fisher discriminant ratio criterion to reduce redundant data, then calculates the distance between each reduced data sample and an approximate (fuzzy) classification surface, assigns each sample a weight according to that distance, and uses the weighted samples to train the support vector machine. Traditional support vector machines still need improvement in processing large data sets and imbalanced data samples; to address this, a novel algorithm is provided that weights the reduced large-sample data before training the support vector machine. This improves both the training speed and the classification accuracy of the support vector machine and greatly benefits the classification of large-sample data sets.
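The distance-based weighting step can be illustrated with a small sketch. The abstract does not give the formula, so everything specific here is an assumption: the approximate classification surface is taken to be the perpendicular bisector of the two class centroids, and the weight is taken to decay as exp(-distance / sigma). The names `distance_weights` and `sigma` are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def distance_weights(samples, labels, sigma=1.0):
    """Weight each sample by its distance to a rough separating hyperplane.

    Assumptions (not from the source): the hyperplane is the perpendicular
    bisector of the two class centroids, and weight = exp(-distance / sigma),
    so samples near the surface receive larger weights.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    c0 = centroid([s for s, y in zip(samples, labels) if y == classes[0]])
    c1 = centroid([s for s, y in zip(samples, labels) if y == classes[1]])

    # Hyperplane normal points from one centroid to the other.
    normal = [b - a for a, b in zip(c0, c1)]
    norm = math.sqrt(sum(w * w for w in normal))
    if norm == 0.0:
        raise ValueError("class centroids coincide")
    midpoint = [(a + b) / 2 for a, b in zip(c0, c1)]

    weights = []
    for s in samples:
        # Perpendicular distance from the sample to the hyperplane.
        signed = sum(w * (x - m) for w, x, m in zip(normal, s, midpoint)) / norm
        weights.append(math.exp(-abs(signed) / sigma))
    return weights


samples = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]
weights = distance_weights(samples, labels)
```

The resulting weights could then be handed to any trainer that accepts per-sample weights, for example the `sample_weight` argument of scikit-learn's `SVC.fit`.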

Description

Technical field

[0001] The present invention is mainly used in the field of artificial intelligence, especially pattern recognition technology. It relates to redundant data reduction based on clustering and the Fisher discriminant ratio and to a method for weighting imbalanced samples, in particular an imbalanced sample weighting method suitable for support vector machine training.

Background technique

[0002] Data classification has always been an important application branch of artificial intelligence fields such as pattern recognition, and is widely used in character recognition, face detection and recognition, and so on. A variety of classification techniques are currently available, including decision tree methods, neural network methods, and support vector machine methods. The support vector machine method has gradually become the most widely used classification method, with the most prominent classification performance, because of its scientific statistical learning theoretical ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06V30/194, G06F18/2411
Inventor: 彭长生, 沈项军, 蔡炜
Owner: JIANGSU KING INTELLIGENT SYST