Sample label missing data classifier training method

A classifier training method for data with missing sample labels, applied in the field of data analysis. It addresses the poor generality of existing methods across data sets and the poor generalization performance of the resulting classifiers or classification methods.

Inactive Publication Date: 2016-11-23
CHINA UNIV OF PETROLEUM (EAST CHINA)

AI Technical Summary

Problems solved by technology

[0009] To overcome the poor versatility of existing techniques across different data sets and the poor generalization performance of the resulting classifiers or classification methods, the present invention provides an optimization-based solution technique. It regards all unlabeled samples as positive samples and takes the reliability of their labels as the decision variables to be solved. An optimization model is established based on the principle of structural risk minimization, and an effective algorithm is provided to solve it.

Embodiment Construction

[0084] The present invention is further described below in conjunction with the accompanying drawings and embodiments. Three peptide identification datasets (yeast, ups1 and tal08) were selected to test the effectiveness of the disclosed method. Table 1 lists, for each dataset, the total number of samples, the number of known negative samples, and the number of unlabeled samples. Each dataset is randomly divided at a ratio of 1:1 into two subsets, a training set and a test set. The data processing method provided by the present invention is run on the training set to obtain the classification function, and the performance of that function is then tested on the independent test set.
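The 1:1 random split described in [0084] can be sketched as follows. This is a minimal illustration, not the patent's own procedure: the function name `split_half`, the seed, the label encoding (-1 for known negatives, 0 for unlabeled) and the toy array are all assumptions made for the example.

```python
import numpy as np

def split_half(X, y, seed=0):
    """Randomly split samples 1:1 into a training set and a test set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))       # random ordering of sample indices
    half = len(X) // 2
    train_idx, test_idx = perm[:half], perm[half:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy data standing in for a peptide identification dataset (hypothetical values).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([-1, -1, -1, 0, 0, 0, 0, 0, 0, 0])  # -1: known negative, 0: unlabeled
X_train, y_train, X_test, y_test = split_half(X, y)
```

The classifier would be fitted only on `X_train` and `y_train`; `X_test` is held back for the independent evaluation.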

[0085] A unified parameter setting is adopted for the three tested data sets. In the adaptive semi-supervised learning model (8), μ = 5.0 and c_1 = c_2 = 1.0; the squared loss function is adopted, with the related quantity determined by formula (6). In the termination criterion of Algorithm 1, r_1 = 0.5.

[...


Abstract

The invention discloses a classifier training method for data with missing sample labels. It is suited to classification data consisting of two groups of samples in which all labels of one group are missing. An optimization-based solution technique is provided: the label reliability of the unlabeled samples is taken as the decision variable to be solved, and an optimization model is established on the basis of the structural risk minimization principle. On small and medium-sized data sets, the model can be solved by directly calling a nonlinear programming toolkit. On large-scale data sets, an alternating search algorithm solves two convex programming sub-problems in turn, so that the two groups of model variables are obtained iteratively. The method is highly general across different data sets and achieves good generalization performance on independent test sets.
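The alternating search idea can be illustrated with a small sketch. It is not the patent's model (8) or Algorithm 1, which are not reproduced on this page. It assumes a hypothetical objective, 0.5*||w||^2 + c_neg * (sum of squared losses on known negatives) + sum_j theta_j * (c_unl * loss_j - threshold), with theta_j in [0, 1] playing the role of label reliability. For fixed theta the problem in w is a weighted ridge regression; for fixed w the problem in theta is linear with box constraints. The names `fit_weighted_ridge`, `alternate_search`, `c_neg`, `c_unl` and `threshold` are assumptions.

```python
import numpy as np

def fit_weighted_ridge(X, y, weights):
    """Sub-problem 1 (convex in w): min 0.5*||w||^2 + sum_i weights_i*(x_i.w - y_i)^2."""
    d = X.shape[1]
    A = np.eye(d) + 2.0 * (X.T * weights) @ X
    b = 2.0 * X.T @ (weights * y)
    return np.linalg.solve(A, b)

def alternate_search(X_neg, X_unl, c_neg=1.0, c_unl=1.0, threshold=0.5, max_iter=50):
    """Alternate between the classifier w and the reliabilities theta (hypothetical model)."""
    X = np.vstack([X_neg, X_unl])
    X = np.hstack([X, np.ones((len(X), 1))])   # bias column (regularized too, for simplicity)
    n_neg = len(X_neg)
    # Known negatives get target -1; every unlabeled sample is treated as positive (+1).
    y = np.concatenate([-np.ones(n_neg), np.ones(len(X_unl))])
    theta = np.ones(len(X_unl))                # start by trusting every assumed label
    for _ in range(max_iter):
        weights = np.concatenate([c_neg * np.ones(n_neg), c_unl * theta])
        w = fit_weighted_ridge(X, y, weights)
        # Sub-problem 2 (linear in theta): keep a label only if its loss is small enough.
        loss_unl = (X[n_neg:] @ w - 1.0) ** 2
        new_theta = (c_unl * loss_unl < threshold).astype(float)
        if np.array_equal(new_theta, theta):   # no reliability changed: stop
            break
        theta = new_theta
    return w, theta

# Toy data (hypothetical): the last two unlabeled points resemble the negatives.
X_neg = np.array([[-2.0, -2.0], [-2.2, -1.8], [-1.8, -2.1], [-2.1, -2.2]])
X_unl = np.array([[2.0, 2.0], [2.1, 1.9], [1.9, 2.2], [-2.0, -1.9], [-2.1, -2.0]])
w, theta = alternate_search(X_neg, X_unl)
```

Each step cannot increase the assumed objective, because each sub-problem is solved exactly with the other block of variables held fixed. The stopping rule here (no change in `theta`) is also an assumption and is unrelated to the termination criterion of the patent's Algorithm 1.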

Description

Technical field

[0001] The invention relates to a data analysis method, in particular to a support vector machine-based classifier training method for data with missing sample labels.

Background technique

[0002] For classification data in which the labels of both classes of samples are known, the support vector machine is one of the more successful data classification methods. This type of method is guided by the principle of structural risk minimization: it seeks the minimum empirical loss within a relatively simple function set. For nonlinear classification problems, kernel functions are usually introduced, which overcome the curse of dimensionality to a considerable extent. Support vector machines have become an important class of data processing and analysis methods for supervised learning problems. However, the data collected in many practical applications is often incomplete; in particular, the label information of a large part of the data is missing. The method of finding its inherent laws from this type of data usual...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/2411, G06F18/214
Inventor: 梁锡军, 夏重杭
Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)