Method for semi-supervised learning of structured data

A semi-supervised learning technique for structured data, applied in the computer field, that improves classifier performance.

Active Publication Date: 2019-07-05
CENT SOUTH UNIV
Cites: 6 | Cited by: 12
AI Technical Summary

Problems solved by technology

Raw data in many real-world applications is structured and unlabeled, yet a supervised learning task requires a large manually labeled training set, and high-quality manual labels cost more manpower, domain knowledge, and time.

Examples

Embodiment 1

[0030] The present invention proposes a new model, Embedding GAN (EmGAN), which adapts the semi-supervised GAN to structured data. It is introduced below in three parts: the model structure, the generator, and the objective function of the discriminator.

[0031] 1. Model structure

[0032] The structure of the whole model is shown in Figure 1. The dashed boxes divide the model into three parts:

[0033] A) The upper left corner is the preprocessing part for the original data x (structured data with K class labels), which comprises the labeled samples x_l with their labels y_l, the unlabeled samples x_u, and the test set {x_test, y_test}. As shown in the figure, the feature set of the original data x is divided into two parts: a categorical feature subset x_CT and a numerical feature subset x_NL.

[0034] B) Inside the dashed box on the right is a six-layer fully connected network D(x;...
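The feature split described in part A) can be sketched in plain Python. This is a minimal illustration rather than the patent's implementation: the column names (`protocol`, `service`, `duration`, `bytes`), the sample rows, and the helpers `split_features`, `build_vocab`, and `encode` are all hypothetical. The sketch separates each row into x_CT and x_NL, then maps categorical values to integer ids, which is the form an embedding lookup usually expects.

```python
from typing import Dict, List, Tuple


def split_features(
    rows: List[dict], categorical_cols: List[str], numerical_cols: List[str]
) -> Tuple[List[list], List[List[float]]]:
    """Split every row into a categorical part (x_CT) and a numerical part (x_NL)."""
    x_ct = [[row[c] for c in categorical_cols] for row in rows]
    x_nl = [[float(row[c]) for c in numerical_cols] for row in rows]
    return x_ct, x_nl


def build_vocab(x_ct: List[list]) -> List[Dict[object, int]]:
    """One value-to-id mapping per categorical column, ids in order of first appearance."""
    vocabs: List[Dict[object, int]] = [dict() for _ in x_ct[0]]
    for row in x_ct:
        for vocab, value in zip(vocabs, row):
            vocab.setdefault(value, len(vocab))
    return vocabs


def encode(x_ct: List[list], vocabs: List[Dict[object, int]]) -> List[List[int]]:
    """Replace each categorical value by its integer id."""
    return [[vocab[value] for vocab, value in zip(vocabs, row)] for row in x_ct]


# Hypothetical structured records (illustrative only).
rows = [
    {"protocol": "tcp", "service": "http", "duration": 0.5, "bytes": 120},
    {"protocol": "udp", "service": "dns", "duration": 1.5, "bytes": 80},
    {"protocol": "tcp", "service": "dns", "duration": 0.2, "bytes": 300},
]

x_ct, x_nl = split_features(rows, ["protocol", "service"], ["duration", "bytes"])
vocabs = build_vocab(x_ct)
cat_ids = encode(x_ct, vocabs)
```

The same split would be applied identically to x_l, x_u, and the test samples, so that all of them share one feature layout.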

Abstract

The invention discloses a method for semi-supervised learning of structured data. The method builds a semi-supervised adversarial neural network model for structured data and preprocesses the original structured data x, dividing its features into a categorical feature subset x_CT and a numerical feature subset x_NL. The raw input of the model's discriminator is {x_l, x_u, x_g}, where x_l and x_u are the labeled and unlabeled samples respectively and x_g is a sample produced by the generator; x_l and x_u contain the same feature set. The categorical feature subset x_CT of a sample is fed into an Embedding layer to obtain the corresponding dense embedding vector E(x_CT), which is combined with the numerical feature subset x_NL to give a sample E(x_CT) + x_NL with a new feature set. Batch normalization (BN) is applied to obtain a normalized sample with the new feature set, and this sample is fed into the discriminator for training, whereas the generated sample x_g is used directly as discriminator input. The generator consists of three fully connected layers, with BN applied to the output of each layer to prevent vanishing gradients; it takes noise as input and produces a generated sample x_g with the features E(x_CT) + x_NL.
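The data flow in the abstract can be sketched with the standard library alone. This is a hedged illustration under several assumptions not stated in the source: the "+" in E(x_CT) + x_NL is read as concatenation, BN is shown without learnable scale and shift, the hidden layers use ReLU, and every size, value, and function name below is hypothetical. Weights are random and nothing is trained; the sketch only shows how the shapes line up.

```python
import math
import random


def make_table(num_categories, dim, rng):
    """Randomly initialised embedding table: one dense vector per category id."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_categories)]


def embed_and_concat(cat_ids, numeric, tables):
    """Look up E(x_CT) column by column, then append x_NL (concatenation is assumed)."""
    vec = []
    for table, idx in zip(tables, cat_ids):
        vec.extend(table[idx])
    vec.extend(numeric)
    return vec


def batch_norm(batch, eps=1e-5):
    """Normalise each feature over the batch (no learnable scale or shift here)."""
    n, d = len(batch), len(batch[0])
    mean = [sum(row[j] for row in batch) / n for j in range(d)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(d)]
    return [[(row[j] - mean[j]) / math.sqrt(var[j] + eps) for j in range(d)] for row in batch]


def dense(batch, weights, bias):
    """Fully connected layer; weights[k] is the weight vector of output unit k."""
    return [[sum(x * w for x, w in zip(row, unit)) + b for unit, b in zip(weights, bias)]
            for row in batch]


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def generator(noise, layers):
    """Dense layers with BN on every layer's output; ReLU on hidden layers is assumed."""
    h = noise
    for i, (w, b) in enumerate(layers):
        h = batch_norm(dense(h, w, b))
        if i < len(layers) - 1:
            h = [[max(0.0, v) for v in row] for row in h]
    return h


rng = random.Random(0)
tables = [make_table(3, 2, rng), make_table(4, 3, rng)]  # two categorical columns
cat_batch = [[0, 1], [2, 3], [1, 0], [0, 2]]             # integer-encoded x_CT
num_batch = [[0.5, 120.0], [1.5, 80.0], [0.2, 300.0], [0.9, 40.0]]  # x_NL

# Real samples: embed, concatenate, then normalise before the discriminator.
real = batch_norm([embed_and_concat(c, x, tables) for c, x in zip(cat_batch, num_batch)])
feature_dim = len(real[0])  # 2 + 3 embedding dims + 2 numerical features

# Generated samples: noise in, a vector in the same feature space out.
layers = [make_layer(5, 8, rng), make_layer(8, 8, rng), make_layer(8, feature_dim, rng)]
noise = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]
fake = generator(noise, layers)
```

Because the generator emits vectors directly in the embedded feature space, `fake` has the same width as `real` and can be passed to a discriminator without going through the embedding lookup.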

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method for semi-supervised learning of structured data.

Background technique

[0002] Semi-supervised learning problems have attracted a lot of attention in many domains, such as anomaly detection and email archiving. Raw data in many real-world applications is structured and unlabeled, yet a supervised learning task requires a large manually labeled training set, and high-quality manual labels cost more manpower, domain knowledge, and time. Semi-supervised learning is a paradigm proposed to resolve this tension. It uses a large amount of easily obtained unlabeled data together with a small amount of manually labeled data to improve the final performance of the classifier. Essentially, it uses the large pool of unlabeled data to correct the assumptions learned from the labeled data.

[0003] At present, many differen...

Application Information

IPC(8): G06F16/20, G06K9/62, G06N3/04
CPC: G06N3/045, G06F18/214
Inventors: 邓晓衡, 黄戎, 沈海澜
Owner: CENT SOUTH UNIV