Feature selection method and system for iterative integration of multi-class imbalanced genomics data

A technology in the fields of genomics data and feature selection, applied in genomics, proteomics, electronic digital data processing, etc. It addresses problems such as low classification accuracy when predicting minority-class samples and the neglect of features that are highly correlated with minority-class labels, and achieves improved classification accuracy and improved classification and recognition ability.

Status: Inactive. Publication date: 2018-03-02
SHENZHEN UNIV

AI Technical Summary

Problems solved by technology

[0009] In view of the above deficiencies of the prior art, the purpose of the present invention is to provide a feature selection method and system for iterative integration of multi-class imbalanced genomics data, aiming to solve problems of existing feature selection methods such as low classification accuracy when predicting minority-class samples and the neglect of features that are highly correlated with minority-class labels.

Examples

Detailed Description of Embodiments

[0039] The present invention provides a feature selection method and system for iterative integration of multi-class imbalanced genomics data. To make the purpose, technical solution and effects of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it.

[0040] Referring to Figure 1, which is a flow chart of the iterative integration feature selection method for multi-class imbalanced genomics data provided by the present invention, the method comprises the following steps:

[0041] S1. Divide the multi-class imbalanced genomics data into K sub-datasets, each containing samples of two classes;

[0042] S2. For each sub-dataset, balance the numbers of samples of the two classes through an iterative process using oversampling and undersampling, and perform feature selection in each iterati...
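Steps S1 and S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: S1 is assumed to be a one-versus-rest split (one binary sub-dataset per class, following the "one-to-many" wording of the abstract), oversampling is assumed to be random duplication of minority samples, undersampling is assumed to be random removal of majority samples, and the feature score (absolute difference of class means divided by the overall standard deviation) is a hypothetical stand-in, because the selection criterion of S2 is truncated in this excerpt. The names `one_vs_rest_split`, `feature_score`, `iterative_balance_select`, `n_iter` and `drop_frac` are all hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Sample = List[float]


def one_vs_rest_split(X: Sequence[Sample], y: Sequence[int]) -> Dict[int, Tuple[List[Sample], List[int]]]:
    """S1 (assumed one-vs-rest): one binary sub-dataset per class; label 1 marks that class."""
    return {k: ([list(x) for x in X], [1 if t == k else 0 for t in y]) for k in sorted(set(y))}


def feature_score(X: List[Sample], y: List[int], j: int) -> float:
    """Hypothetical relevance score: |difference of class means| / overall std of feature j."""
    pos = [x[j] for x, t in zip(X, y) if t == 1]
    neg = [x[j] for x, t in zip(X, y) if t == 0]
    spread = statistics.pstdev(pos + neg) or 1e-12  # guard against constant features
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread


def iterative_balance_select(X: List[Sample], y: List[int], n_iter: int = 5,
                             drop_frac: float = 0.2, seed: int = 0):
    """S2 sketch: alternate a resampling step toward balance with a feature-elimination step."""
    rng = random.Random(seed)
    minority_label = min((0, 1), key=y.count)
    minority = [x for x, t in zip(X, y) if t == minority_label]
    majority = [x for x, t in zip(X, y) if t != minority_label]
    n_min, n_maj = len(minority), len(majority)
    target = (n_min + n_maj) // 2  # both classes end at half the total, rounded down
    cur_min, cur_maj = list(minority), list(majority)
    features = list(range(len(X[0])))
    for r in range(1, n_iter + 1):
        # Undersample the majority class: random removal, shrinking one step per round.
        cur_maj = rng.sample(cur_maj, round(n_maj - (n_maj - target) * r / n_iter))
        # Oversample the minority class: random duplication, growing one step per round.
        grow = round(n_min + (target - n_min) * r / n_iter) - len(cur_min)
        cur_min = cur_min + [rng.choice(minority) for _ in range(grow)]
        Xr = cur_min + cur_maj
        yr = [1] * len(cur_min) + [0] * len(cur_maj)  # 1 = minority, used for scoring only
        # Feature selection: keep the best-scoring (1 - drop_frac) share, at least one feature.
        scores = {j: feature_score(Xr, yr, j) for j in features}
        keep = max(1, round(len(features) * (1 - drop_frac)))
        features = sorted(sorted(features, key=lambda j: -scores[j])[:keep])
    X_out = [[x[j] for j in features] for x in cur_min + cur_maj]
    y_out = [minority_label] * len(cur_min) + [1 - minority_label] * len(cur_maj)
    return features, X_out, y_out
```

Each round moves both classes one fixed step toward half of the total sample count and then drops roughly `drop_frac` of the surviving features, so the sub-dataset becomes balanced exactly when the feature set reaches its smallest size, mirroring the behaviour the abstract describes.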

Abstract

The present invention discloses an iterative integration feature selection method and system for multi-class imbalanced genomics data. Targeting the imbalanced data distribution typical of multi-class genomics data, the invention provides an iterative feature selection method: on the basis of integrating classifiers in a one-to-many (one-versus-rest) manner, undersampling or oversampling and feature selection are performed iteratively, so that the samples of the data set gradually reach a balanced state as the number of features gradually decreases. Using the classifier obtained by integration during this process markedly improves the ability to classify and identify minority-class samples. Weak classifiers trained on balanced sub-datasets are integrated into a strong classifier by ensemble learning, which markedly improves classification accuracy.
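The one-versus-rest integration of weak classifiers into a strong classifier can be illustrated with a small ensemble. Everything below is an assumption made for illustration: the weak learner is a nearest-centroid rule, each weak learner is trained on a balanced subsample obtained by undersampling the larger side of its class-versus-rest split, and the strong classifier averages the weak scores for each class and predicts the class with the highest average. `CentroidLearner`, `train_ovr_ensemble`, `predict` and `n_learners` are hypothetical names; the weak classifier and combination rule actually used by the invention are not given in this excerpt.

```python
import math
import random
from typing import Dict, List, Sequence

Sample = Sequence[float]


class CentroidLearner:
    """Hypothetical weak learner: a positive score means x is closer to the positive centroid."""

    def __init__(self, pos: List[Sample], neg: List[Sample]):
        self.c_pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.c_neg = [sum(col) / len(neg) for col in zip(*neg)]

    def score(self, x: Sample) -> float:
        return math.dist(x, self.c_neg) - math.dist(x, self.c_pos)


def train_ovr_ensemble(X: List[Sample], y: List[int], n_learners: int = 5,
                       seed: int = 0) -> Dict[int, List[CentroidLearner]]:
    """For each class k, train n_learners weak learners on balanced k-vs-rest subsamples."""
    rng = random.Random(seed)
    ensemble: Dict[int, List[CentroidLearner]] = {}
    for k in sorted(set(y)):
        pos = [x for x, t in zip(X, y) if t == k]
        neg = [x for x, t in zip(X, y) if t != k]
        m = min(len(pos), len(neg))  # undersample the larger side to the smaller one
        ensemble[k] = [CentroidLearner(rng.sample(pos, m), rng.sample(neg, m))
                       for _ in range(n_learners)]
    return ensemble


def predict(ensemble: Dict[int, List[CentroidLearner]], x: Sample) -> int:
    """Strong classifier: average each class's weak-learner scores, then take the argmax."""
    avg = {k: sum(l.score(x) for l in learners) / len(learners)
           for k, learners in ensemble.items()}
    return max(avg, key=avg.get)
```

Averaging continuous scores, rather than counting yes/no votes, gives each of the K one-versus-rest ensembles a graded margin, so the final argmax still picks a single class when several ensembles claim the same sample.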

Description

Technical field

[0001] The invention relates to the field of feature selection and recognition, and in particular to a feature selection method and system for iterative integration of multi-class imbalanced genomics data.

Background art

[0002] Genomic microarray technology has been widely used in cancer diagnosis, but identifying and confirming cancer-related genes remains a major challenge. Genomic microarray data usually contain tens of thousands of genes, and discovering potential markers or gene sets related to cancer among them is a very important task. From a machine-learning perspective, this kind of gene selection problem can be regarded as a feature selection problem, whose goal is to identify features that are highly correlated with the class labels.

[0003] According to whether a learning method is used to evaluate feature subsets, feature selection methods can be divided into three main categories: (1) ...
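Paragraph [0002] frames gene selection as finding features that are highly correlated with the class labels. One simple, filter-style way to quantify this for a two-class problem is to rank features by the absolute Pearson correlation between each feature and a 0/1 label. This sketch is illustrative only and is not claimed to match any of the categories enumerated in [0003], which are truncated here; `pearson` and `rank_features` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa and sb else 0.0


def rank_features(X: Sequence[Sequence[float]], y: Sequence[float], top_k: int) -> List[int]:
    """Rank feature indices by |corr(feature, label)|, highest first; ties keep index order."""
    cols = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in cols]
    order = sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)
    return order[:top_k]
```

For example, with X = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]] and y = [0, 0, 1, 1], the first feature rises with the label (r = 2/sqrt(5), about 0.894), while the constant second feature and the third feature both score 0, so `rank_features(X, y, 1)` returns [0].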

Application Information

Patent type & authority: Patents (China)
IPC(8): G06F19/18
CPC: G16B20/00, G16B40/00
Inventors: 杨峻山 (Yang Junshan), 纪震 (Ji Zhen), 朱泽轩 (Zhu Zexuan), 周家锐 (Zhou Jiarui), 殷夫 (Yin Fu)
Owner SHENZHEN UNIV