Sample label missing data classifier training method

A classifier training method for data with missing sample labels, in the field of data analysis. It addresses the poor generality of existing classifiers and classification methods across datasets and their poor generalization performance.

Status: Inactive. Publication date: 2016-11-23

CHINA UNIV OF PETROLEUM (EAST CHINA)

Citations: 3; cited by: 15


AI Technical Summary

Problems solved by technology

[0009] Existing techniques generalize poorly across different datasets, and the resulting classifiers or classification methods perform poorly on new data. To overcome these defects, the present invention provides an optimization-based solution. It treats all unlabeled samples as positive samples and takes the reliability of their labels as the decision variables to be solved. An optimization model is then established on the principle of structural risk minimization, and an effective algorithm is provided to solve it.
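The block below shows one illustrative way to write a model of this kind. It is an assumption for exposition only. It is not the patent's model (8), whose exact form this excerpt does not reproduce.

The symbols are as follows:
- $N$ indexes the known negative samples.
- $U$ indexes the unlabeled samples, all given the label $+1$.
- $s_j \in [0,1]$ is the label reliability of unlabeled sample $x_j$.
- $f(x) = w^\top \phi(x) + b$ is the classification function, with an assumed feature map $\phi$.
- $\ell$ is a loss function.
- $c_1, c_2, \mu > 0$ are weights.

The last term is an assumed penalty that keeps the trivial choice $s = 0$ from being optimal.

$$
\min_{w,\,b,\;s \in [0,1]^{|U|}} \; \frac{1}{2}\lVert w\rVert^2 + c_1 \sum_{i \in N} \ell\bigl(-1, f(x_i)\bigr) + c_2 \sum_{j \in U} s_j\, \ell\bigl(+1, f(x_j)\bigr) + \frac{\mu}{2} \sum_{j \in U} (1 - s_j)^2
$$

For fixed $s$, this is a weighted SVM-type problem in $(w, b)$, which is convex whenever $\ell$ is convex in $f$. For fixed $(w, b)$, it separates into independent convex quadratics in each $s_j$, with minimizer $s_j = \min\{1, \max\{0,\, 1 - c_2\, \ell(+1, f(x_j))/\mu\}\}$. The joint problem is in general not convex because of the products $s_j\, \ell(\cdot)$, which is why alternating between the two blocks of variables is a natural strategy.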



Embodiment Construction

[0084] The present invention is further described below with reference to the accompanying drawings and an embodiment. Three polypeptide identification datasets, yeast, ups1 and tal08, are selected to test the effectiveness of the disclosed method. Table 1 lists, for each dataset, the total number of samples, the number of known negative samples, and the number of unlabeled samples. Each dataset is randomly divided at a 1:1 ratio into two subsets, a training set and a test set. The disclosed data processing method is run on the training set to obtain the classification function. The performance of this function is then tested on the independent test set.
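The random 1:1 split described in [0084] can be sketched as follows. This is a minimal illustration, assuming that samples can be held in a Python sequence. The function name `split_half`, the fixed `seed`, and the rule that an odd extra sample goes to the training set are all hypothetical choices; the source does not specify them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_half(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split samples 1:1 into (training set, test set).

    Hypothetical convention: with an odd count, the training set gets the extra sample.
    """
    rng = random.Random(seed)  # seeded generator so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)      # in-place random permutation
    cut = (len(shuffled) + 1) // 2
    return shuffled[:cut], shuffled[cut:]


train, test = split_half(range(10), seed=42)
print(len(train), len(test))  # 5 5
```

Fixing the seed makes each split repeatable, which helps when comparing methods on the same partition.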

[0085] A unified parameter setting is adopted for all three datasets:
- In the adaptive semi-supervised learning model (8), $\mu = 5.0$ and $c_1 = c_2 = 1.0$.
- The square loss function, determined by formula (6), is adopted.
- In the termination criterion of Algorithm 1, $r_1 = 0.5$.
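The parameter values reported in [0085] can be collected in a small configuration object, shown below together with a square loss. This is a sketch under assumptions:
- The field names are hypothetical.
- The loss $(1 - y f)^2$ is a common definition of the square loss for margin classifiers, not necessarily the patent's exact formula.
- The field `r1` only stores the value; its role in the termination criterion is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSetting:
    """Unified parameter values reported in [0085]; field names are hypothetical."""
    mu: float = 5.0  # mu in model (8)
    c1: float = 1.0  # c1 in model (8)
    c2: float = 1.0  # c2 in model (8)
    r1: float = 0.5  # r1 in the termination criterion of Algorithm 1


def square_loss(y: float, f: float) -> float:
    """Square loss (1 - y*f)**2 for a label y in {-1, +1} and a score f (assumed form)."""
    return (1.0 - y * f) ** 2


params = ParamSetting()
print(square_loss(+1, 0.5), square_loss(-1, 0.5))  # 0.25 2.25
```

Because $y^2 = 1$ for $y \in \{-1, +1\}$, this loss equals $(y - f)^2$. It therefore penalizes scores that overshoot the label as well as scores that fall short.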

[...


Abstract

The invention discloses a classifier training method for data with missing sample labels. It is suitable for two-class classification data in which all labels of one of the two classes are missing. The method formulates training as an optimization problem: the label reliability of the unlabeled samples is taken as the decision variable to be solved, and an optimization model is established on the basis of the structural risk minimization principle. The model is solved in one of two ways:
- On small and medium-sized datasets, it can be solved directly by calling a nonlinear programming toolkit.
- On large-scale datasets, an alternating search algorithm solves two convex programming subproblems in turn, iterating over the two groups of variables of the model.

The method is highly general across different datasets and achieves good generalization performance on independent test sets.
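The alternating search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's Algorithm 1:
- The classification function is linear, $f(x) = w^\top x + b$, with no kernel.
- The loss is the square loss.
- The objective is assumed to be $\tfrac12\lVert w\rVert^2 + c_1\sum_{\text{neg}}(1 + f(x_i))^2 + c_2\sum_{\text{unl}} s_j (1 - f(x_j))^2 + \tfrac{\mu}{2}\sum_{\text{unl}}(1 - s_j)^2$, where $s_j \in [0,1]$ is the reliability of the $+1$ label given to unlabeled sample $x_j$.
- The helper names (`train_alternating`, `_fit_weighted`, `_solve`, `_score`) and the stopping rule are hypothetical.

Each half step solves one convex subproblem exactly: a weighted regularized least-squares fit in $(w, b)$, then a closed-form clipped update of each $s_j$.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _solve(A: List[Vector], rhs: Vector) -> Vector:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def _fit_weighted(X: Sequence[Sequence[float]], y: Vector, a: Vector) -> Tuple[Vector, float]:
    """Subproblem 1 (convex): minimize 0.5*||w||^2 + sum_i a_i*(y_i - w.x_i - b)^2."""
    d = len(X[0])
    A = [[0.0] * (d + 1) for _ in range(d + 1)]
    rhs = [0.0] * (d + 1)
    for k in range(d):
        A[k][k] = 1.0  # regularize w only; the bias b is left free
    for xi, yi, ai in zip(X, y, a):
        z = list(xi) + [1.0]  # append 1 so that b is the last coefficient
        for p in range(d + 1):
            rhs[p] += 2.0 * ai * yi * z[p]
            for q in range(d + 1):
                A[p][q] += 2.0 * ai * z[p] * z[q]
    theta = _solve(A, rhs)  # normal equations of the objective above
    return theta[:d], theta[d]


def _score(w: Vector, b: float, x: Sequence[float]) -> float:
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def train_alternating(neg, unl, c1=1.0, c2=1.0, mu=5.0, tol=1e-6, max_iter=100):
    """Alternate the (w, b) step and the reliability step until s stops changing.

    neg: known negative samples (label -1); must be non-empty.
    unl: unlabeled samples, all treated as positive (label +1) with reliability s_j.
    """
    X = [list(x) for x in neg] + [list(x) for x in unl]
    y = [-1.0] * len(neg) + [1.0] * len(unl)
    s = [1.0] * len(unl)  # start by fully trusting every unlabeled label
    for _ in range(max_iter):
        a = [c1] * len(neg) + [c2 * sj for sj in s]  # per-sample loss weights
        w, b = _fit_weighted(X, y, a)
        # Subproblem 2 (convex, separable): for each j, the minimizer over
        # s_j in [0, 1] of c2*s_j*L_j + (mu/2)*(1 - s_j)^2 is 1 - c2*L_j/mu clipped to [0, 1].
        new_s = []
        for xj in unl:
            loss = (1.0 - _score(w, b, xj)) ** 2  # square loss with label +1
            new_s.append(min(1.0, max(0.0, 1.0 - c2 * loss / mu)))
        change = max((abs(p - q) for p, q in zip(new_s, s)), default=0.0)
        s = new_s
        if change < tol:  # hypothetical stopping rule, not Algorithm 1's criterion
            break
    return w, b, s
```

Consider, for example, `neg = [[-2.0], [-1.5], [-1.0]]` and `unl = [[1.0], [1.5], [2.0], [-1.2]]`, where the last unlabeled point actually lies among the negatives. That point is scored below zero, so its square loss against the $+1$ label is large, and it ends with the lowest reliability. Because each half step minimizes the assumed objective exactly over its own block of variables, the objective never increases from one step to the next.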

Description

Technical field
[0001] The invention relates to a data analysis method, in particular to a support vector machine-based classifier training method for data with missing sample labels.
Background
[0002] For classification data in which the labels of both classes of samples are known, a relatively successful classification method is the support vector machine. Guided by the principle of structural risk minimization, this type of method minimizes the empirical loss within a relatively simple function set. For nonlinear classification problems, kernel functions are usually introduced, which overcome the curse of dimensionality to a considerable extent. Support vector machines have become an important class of data processing and analysis methods for supervised learning problems. However, the data collected in many practical applications are often incomplete; in particular, the label information of a large part of the data is missing. The method of finding the inherent laws in this type of data usual...
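As an illustration of the kernel functions mentioned in [0002], the Gaussian (RBF) kernel $k(x, z) = \exp(-\gamma \lVert x - z\rVert^2)$ is one common choice. It is shown below with a helper that builds the kernel (Gram) matrix. The source does not say which kernel the invention uses. The parameter `gamma` and the function names are assumptions for this sketch.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian kernel exp(-gamma * ||x - z||^2); gamma is an assumed width parameter."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(X: Sequence[Sequence[float]], gamma: float = 0.5) -> List[List[float]]:
    """Pairwise kernel values K[i][j] = k(X[i], X[j]); symmetric with ones on the diagonal."""
    return [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]
```

A kernel method works only with such pairwise values, so the nonlinear feature map behind the kernel never has to be computed explicitly.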


Application Information


Patent type and authority: Application (China)

IPC(8): G06K9/62

CPC: G06F18/2411, G06F18/214

Inventors: 梁锡军, 夏重杭

Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)
