Ensemble classification method based on randomized greedy feature selection

A feature selection and classification technology, applied in the fields of bioinformatics and data mining, that addresses the problem of poor diversity.

Active Publication Date: 2017-07-28
DALIAN UNIV OF TECH
Cites: 2; Cited by: 13

AI Technical Summary

Problems solved by technology

[0006] In the case of a large number of base classifiers, there wil...

Embodiment Construction

[0077] As shown in Figure 1, the overall design idea of the present invention is as follows. Because gene expression data are high-dimensional, have small sample sizes and are highly redundant, important genes must be selected before classification. First, a randomized greedy algorithm selects gene subsets, using the weighted local modularity function as heuristic information. Running the randomized feature selection multiple times yields multiple feature subsets, which form multiple different training sets for the ensemble classification model. The randomized feature selection not only screens out important genes for the classification model but also expands the model's search range in the feature space. To further improve the classification performance and efficiency of the ensemble classification model, a method based on affinity propagation clustering is used to select ...
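The randomized greedy selection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the text does not define the weighted local modularity function or the exact randomization rule, so the heuristic is an abstract `gain(selected, f)` callback, and randomness is introduced GRASP-style by picking uniformly among the `rcl_size` best-scoring candidates at each step. All names (`randomized_greedy_select`, `generate_feature_subsets`, `toy_gain`) are hypothetical, and `toy_gain` is a toy relevance-minus-redundancy stand-in, not the modularity heuristic.

```python
import random
from typing import Callable, List, Optional, Sequence

# gain(selected, f): heuristic value of adding feature f to the current subset.
# Hypothetical interface: in the method above this role is played by the
# weighted local modularity function, which is not reproduced here.
Gain = Callable[[Sequence[int], int], float]


def randomized_greedy_select(n_features: int, k: int, gain: Gain,
                             rcl_size: int = 3,
                             rng: Optional[random.Random] = None) -> List[int]:
    """Greedy forward selection of up to k features, randomized by choosing
    uniformly among the rcl_size best candidates at each step (assumed
    GRASP-style rule). rcl_size=1 gives plain deterministic greedy selection."""
    rng = rng if rng is not None else random.Random()
    selected: List[int] = []
    remaining = set(range(n_features))
    while len(selected) < k and remaining:
        # Rank the remaining features by their gain relative to the current subset.
        ranked = sorted(remaining, key=lambda f: gain(selected, f), reverse=True)
        choice = rng.choice(ranked[:rcl_size])  # randomized step
        selected.append(choice)
        remaining.remove(choice)
    return selected


def generate_feature_subsets(n_features: int, k: int, gain: Gain,
                             n_subsets: int, rcl_size: int = 3,
                             seed: int = 0) -> List[List[int]]:
    """Repeat the randomized selection; each subset would define the training
    set of one base classifier in the ensemble."""
    rng = random.Random(seed)
    return [randomized_greedy_select(n_features, k, gain, rcl_size, rng)
            for _ in range(n_subsets)]


# Toy stand-in heuristic (NOT the weighted local modularity function):
# a fixed relevance score per feature minus a penalty for redundant pairs.
relevance = [0.9, 0.85, 0.8, 0.45, 0.3, 0.1]
redundant_pairs = {frozenset({0, 1})}


def toy_gain(selected: Sequence[int], f: int) -> float:
    penalty = sum(0.5 for s in selected if frozenset({s, f}) in redundant_pairs)
    return relevance[f] - penalty


subsets = generate_feature_subsets(len(relevance), 3, toy_gain, n_subsets=5, seed=1)
print(subsets)
```

With `rcl_size=1` the procedure reduces to ordinary greedy selection; larger values trade some heuristic quality for variety among the resulting subsets, which is what allows each subset to train a different base classifier.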

Abstract

The invention discloses an ensemble classification method based on randomized greedy feature selection, belonging to the fields of bioinformatics and data mining. The method classifies gene expression data related to plant stress response. It comprises the following steps: (1) introduce randomness into a traditional greedy algorithm to perform feature selection; (2) use the weighted local modularity function, a community-detection evaluation index for complex networks, as the heuristic information of the randomized greedy algorithm; (3) train a base classifier on each feature subset with the support vector machine (SVM) algorithm; (4) partition the base classifiers into clusters with the affinity propagation clustering algorithm; (5) integrate the base classifiers that serve as the representative points of the clusters, forming an ensemble classification model by simple majority voting. With this method, whether a plant sample is under stress can be recognized from its gene expression data, and microarray classification accuracy is greatly improved; in addition, the algorithm has strong generalization ability and high stability.
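Steps (4) and (5) above, clustering the base classifiers with affinity propagation and voting with the cluster representatives, can be sketched in plain Python as follows. Assumptions, none of which the text specifies: each base classifier is represented only by its predicted labels on a validation set, similarity is minus the number of disagreeing predictions, the AP preference defaults to the median off-diagonal similarity, and vote ties go to the smaller label. The SVM training of step (3) is omitted, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def disagreement_similarity(preds: Sequence[Sequence[int]]) -> List[List[float]]:
    """s(i, k) = -(number of validation samples on which classifiers i and k
    disagree). An assumed similarity; the source does not specify one."""
    n = len(preds)
    return [[-float(sum(a != b for a, b in zip(preds[i], preds[k])))
             for k in range(n)] for i in range(n)]


def _median(values: List[float]) -> float:
    v = sorted(values)
    m = len(v)
    return v[m // 2] if m % 2 else (v[m // 2 - 1] + v[m // 2]) / 2


def affinity_propagation(S: Sequence[Sequence[float]],
                         preference: Optional[float] = None,
                         damping: float = 0.5, max_iter: int = 200) -> List[int]:
    """Return the indices of the exemplars found by damped affinity propagation."""
    n = len(S)
    if n == 1:
        return [0]
    S = [list(row) for row in S]  # copy: the diagonal is overwritten below
    if preference is None:  # common default: median off-diagonal similarity
        preference = _median([S[i][k] for i in range(n) for k in range(n) if i != k])
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            top = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != top)
            for k in range(n):
                # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
                new = S[i][k] - (second if k == top else vals[top])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            support = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = support  # a(k,k) = sum_{i' != k} max(0, r(i',k))
                else:
                    # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
                    new = min(0.0, R[k][k] + support - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Point k is an exemplar when a(k,k) + r(k,k) > 0.
    return [k for k in range(n) if A[k][k] + R[k][k] > 0]


def majority_vote(preds: Sequence[Sequence[int]]) -> List[int]:
    """Simple majority vote per test sample; ties go to the smallest label."""
    out = []
    for column in zip(*preds):
        counts = Counter(column)
        best = max(counts.values())
        out.append(min(label for label, c in counts.items() if c == best))
    return out


def selective_ensemble_predict(val_preds, test_preds,
                               preference: Optional[float] = None
                               ) -> Tuple[List[int], List[int]]:
    """Cluster base classifiers by their validation predictions, keep the
    exemplars, and combine the exemplars' test predictions by majority vote."""
    exemplars = affinity_propagation(disagreement_similarity(val_preds), preference)
    if not exemplars:  # no exemplar emerged (e.g. no convergence): use all members
        exemplars = list(range(len(val_preds)))
    return exemplars, majority_vote([test_preds[e] for e in exemplars])


if __name__ == "__main__":
    # Hypothetical predictions of six base classifiers on eight validation samples.
    val_preds = [[0, 0, 0, 0, 1, 1, 1, 1], [1, 0, 0, 0, 1, 1, 1, 1],
                 [0, 0, 0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 0, 0, 0, 0],
                 [0, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 1]]
    test_preds = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
    exemplars, prediction = selective_ensemble_predict(val_preds, test_preds)
    print("exemplars:", exemplars, "ensemble prediction:", prediction)
```

The update rules are the standard responsibility and availability messages of affinity propagation (Frey and Dueck). This sketch runs a fixed number of damped iterations without a convergence test, and if no exemplar emerges it falls back to voting with all base classifiers; a production version would more likely use a library implementation with a convergence check.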

Description

Technical field
[0001] The invention belongs to the fields of bioinformatics and data mining, and in particular relates to the selection of important genes from gene expression data and the construction of a selective ensemble classification model.
Background art
[0002] The development of high-throughput sequencing technology provides researchers with massive amounts of gene expression data, and extracting valuable information from these data has become a research hotspot in bioinformatics. During growth, plants are often affected by diseases, insect pests and environmental factors. Predicting these effects and taking preventive and control measures plays a very important role in forestry, agriculture, animal husbandry, environmental protection and other fields. Because gene expression data are high-dimensional, have small sample sizes and are highly redundant, traditional single classification algorithms suffer from problems such as poor classi...

Application Information

IPC(8): G06F19/24
CPC: G16B40/00
Inventors: 孟军 (Meng Jun), 张晶 (Zhang Jing)
Owner: DALIAN UNIV OF TECH