Ensemble self-training algorithm based on nearest neighbor density and semi-supervised k nearest neighbor (SSKNN)

A self-training and semi-supervised learning technique in the field of computer information technology. It addresses local overfitting in self-training caused by ignoring the original spatial structure and distribution of the samples, and the failure to consider how unlabeled samples influence the sample under test.

Status: Inactive. Publication date: 2017-09-22
CHONGQING NORMAL UNIVERSITY
Cites: 0; Cited by: 7

AI Technical Summary

Problems solved by technology

[0004] However, when the WKNN classifier in ESTAALCV classifies test samples, it does not take into account the influence of the unlabeled sample set on those test samples.
Moreover, self-training randomly selects a small number of labeled samples as the initial labeled set. This ignores the original spatial structure and distribution of the samples and may cause self-training to overfit locally.

Method used



Examples


Embodiment Construction

[0091] The present invention is further described below with reference to the accompanying drawings. This embodiment is implemented on the basis of the technical solution and gives a detailed implementation and a specific operating process, but the protection scope of the present invention is not limited to this embodiment.

[0092] As shown in Figure 1, the ensemble self-training method based on nearest neighbor density and semi-supervised KNN comprises selecting, with the nearest neighbor density method, the training set used to initialize the classifier, and then performing ensemble self-training:

[0093] Step S1, selecting the training set for initializing the classifier with the nearest neighbor density method, proceeds as follows:

[0094] Input:

[0095] Raw dataset D;

[0096] Parameters: number of labeled samples N_L, number of nearest neighbors K1, number of neighbors K2 used for the average cosine similarity;

[0097] Output: labeled dataset L, unlabeled dataset...
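The step-by-step selection procedure is cut off in this excerpt. Going by the abstract (choose the highest-density candidates while keeping the k nearest neighbors of an already chosen sample out of the candidate set), a minimal Python sketch could look like the one below. The density definition (mean cosine similarity to the K2 most similar samples), the use of K1 as the exclusion neighborhood, and the function names `cosine` and `select_initial_labeled` are assumptions for illustration, not formulas stated in the patent text.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_initial_labeled(D: List[Sequence[float]], n_l: int, k1: int, k2: int) -> List[int]:
    """Return up to n_l indices of spread-out, high-density samples in D.

    ASSUMPTIONS (hypothetical, not taken from the patent text):
      * density(x) = mean cosine similarity of x to its k2 most similar samples;
      * once a sample is chosen, it and its k1 most similar samples
        are removed from the candidate set.
    """
    n = len(D)
    sim = [[cosine(D[i], D[j]) for j in range(n)] for i in range(n)]
    # For each sample, the indices of all other samples, most similar first.
    ranked = [sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j]) for i in range(n)]
    density = [sum(sim[i][j] for j in ranked[i][:k2]) / max(1, min(k2, n - 1)) for i in range(n)]

    candidates = set(range(n))
    chosen: List[int] = []
    while candidates and len(chosen) < n_l:
        # Highest density wins; ties go to the lowest index.
        best = max(candidates, key=lambda i: (density[i], -i))
        chosen.append(best)
        candidates.discard(best)
        # Exclude the K1 nearest neighbors so later picks stay spread out.
        candidates.difference_update(ranked[best][:k1])
    return chosen
```

With two small clusters, one high-density sample is picked from each cluster; the exclusion step stops the second pick from landing next to the first, which is the "as decentralized as possible" behavior the abstract describes. If the exclusions exhaust the candidate set, fewer than N_L indices are returned.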



Abstract

The invention discloses an ensemble self-training algorithm based on nearest neighbor density and semi-supervised k nearest neighbor (SSKNN). A nearest neighbor density algorithm is used to select the initial labeled samples. It avoids adding the k nearest neighbors of a labeled sample to the labeled candidate set, so the initial labeled samples are spread out as much as possible and better reflect the original spatial structure of the samples; at the same time, the samples with the highest density in the labeled candidate set are selected as the labeled samples. To improve data editing, SSKNN replaces WKNN (weighted k nearest neighbor). This compensates for a weakness of WKNN-based data editing, which considers only the influence of the labeled samples on the category of the sample under test and ignores the unlabeled samples around it. Comparison experiments on UCI datasets verify the effectiveness of the proposed algorithm.
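For contrast with SSKNN, the WKNN-based data editing that the abstract criticizes can be sketched as follows: a newly pseudo-labeled sample is kept only if a distance-weighted k-NN vote over the labeled samples agrees with its pseudo-label. Inverse-distance weighting and the names `wknn_predict` and `wknn_edit` are assumptions; the excerpt gives neither the exact WKNN weights nor the SSKNN formula, so this shows only the baseline, not the patented SSKNN.

```python
import math
from collections import defaultdict


def wknn_predict(x, labeled, k):
    """Distance-weighted k-NN vote that uses labeled samples only.

    labeled: list of (feature_tuple, label). Weight = 1 / distance (assumed).
    """
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        votes[label] += 1.0 / (math.dist(x, features) + 1e-9)  # closer -> heavier vote
    return max(votes, key=votes.get)


def wknn_edit(pseudo_labeled, labeled, k):
    """Keep only pseudo-labeled samples whose label agrees with the WKNN vote."""
    return [(x, y) for x, y in pseudo_labeled if wknn_predict(x, labeled, k) == y]
```

Only `labeled` enters the vote here; per the abstract, SSKNN would additionally draw on the unlabeled samples around the sample under test.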

Description

Technical field

[0001] The invention relates to the field of computer information technology, and in particular to an ensemble self-training method based on nearest neighbor density and semi-supervised KNN.

Background

[0002] The ensemble self-training algorithm [1] is a semi-supervised learning [5] framework that combines ensemble learning [2] with self-training [3,4]. Compared with other semi-supervised learning methods, it does not rely on strict assumptions, which has made it popular with researchers. However, how to select reliable samples to add to the training set in ensemble self-training has long been an open problem in semi-supervised learning.

[0003] Some researchers select reliable samples by confidence. M. F. A. Hady [6] proposed the Co-Training by Committee ensemble self-training framework, which combines multiple classifiers for self-training, where the confidence is the average posterior probability of multiple classifie...
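The confidence criterion attributed to Co-Training by Committee in [0003], the average posterior probability over the committee, can be sketched as below. The data layout (`posteriors[m][i][c]`), the fixed `threshold`, and the function names are assumptions for illustration; the rest of that method's selection rule is not shown in this excerpt.

```python
from typing import List, Sequence, Tuple


def committee_confidence(posteriors: Sequence[Sequence[Sequence[float]]]) -> List[Tuple[int, float]]:
    """Average class posteriors over the committee.

    posteriors[m][i][c]: probability that classifier m assigns class c to
    unlabeled sample i (hypothetical layout). Returns, per sample, the class
    with the highest averaged posterior and that averaged value (the confidence).
    """
    n_models = len(posteriors)
    n_samples = len(posteriors[0])
    n_classes = len(posteriors[0][0])
    result = []
    for i in range(n_samples):
        avg = [sum(posteriors[m][i][c] for m in range(n_models)) / n_models
               for c in range(n_classes)]
        best = max(range(n_classes), key=lambda c: avg[c])
        result.append((best, avg[best]))
    return result


def select_confident(posteriors, threshold: float) -> List[Tuple[int, int]]:
    """(sample index, predicted class) for samples whose confidence >= threshold."""
    return [(i, label)
            for i, (label, conf) in enumerate(committee_confidence(posteriors))
            if conf >= threshold]
```

A sample on which the committee members disagree gets an averaged posterior near 1/number-of-classes and is therefore not added to the training set.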

Claims


Application Information

Patent Type & Authority: Application (China)
IPC: IPC(8) G06K9/62
CPC: G06F18/24147, G06F18/214
Inventors: 吕佳 (Lü Jia), 黎隽男 (Li Junnan)
Owner: CHONGQING NORMAL UNIVERSITY