Cancer gene classification method, device and storage medium based on two-stage deep feature selection

A deep feature selection and classification technique, applied in neural learning methods, instruments, biological neural network models, etc., that addresses the problems of a large number of selected features and low classification accuracy. It avoids important genes being overlooked during selection, improves classification accuracy, and yields highly discriminative features.

Active Publication Date: 2021-06-08
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0013] Research on deep feature selection algorithms has solved many problems so far, but the number of selected features remains large and classification accuracy remains low. To address these problems, the present invention provides a cancer gene classification method based on two-stage deep feature selection, which improves the final classification accuracy. The main problems to be solved by the present invention are as follows:

Examples

Embodiment 1

[0072] A cancer gene classification method based on two-stage deep feature selection, which uses the two selection stages to improve the accuracy of cancer classification. As shown in Figure 1, it includes the following steps:

[0073] A. Training cancer gene classification model

[0074] (1) Obtain training data

[0075] The first stage: integrate three feature selection algorithms for comprehensive feature selection to obtain a feature subset; this ensures that the final selected feature subset is small and precise;

[0076] The second stage: use an unsupervised neural network to obtain the best representation of the feature subset; this improves the final classification accuracy.

[0077] (2) Divide the best representation of the feature subset into a training set and a test set, and input them into the cancer gene classification model for training;

[0078] B. Cancer gene classification

[0079] Preprocess the cancer gene data to be detected and input it into the trai...
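The second stage and step (2) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source only says "unsupervised neural network", so a one-hidden-layer autoencoder (scikit-learn's `MLPRegressor` trained to reconstruct its own input) is assumed here; the layer sizes, the synthetic data, and the helper name `encode` are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.preprocessing import StandardScaler


def encode(autoencoder, X):
    """Hidden-layer activations of a one-hidden-layer ReLU autoencoder."""
    return np.maximum(0.0, X @ autoencoder.coefs_[0] + autoencoder.intercepts_[0])


# Synthetic stand-in for the stage-1 output: 120 samples, 30 selected genes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 30))
X[y == 1, :5] += 2.0                      # five genes carry the class signal

X = StandardScaler().fit_transform(X)

# Stage 2 (assumed form): train the network to reconstruct its input.
# No labels are used, so the learned representation is unsupervised.
autoencoder = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                           max_iter=2000, random_state=0)
autoencoder.fit(X, X)
Z = encode(autoencoder, X)                # compressed representation

# Step (2): split the representation, then train the classification model.
Z_train, Z_test, y_train, y_test = train_test_split(
    Z, y, test_size=0.25, stratify=y, random_state=0)
classifier = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                           random_state=0)
classifier.fit(Z_train, y_train)
print("test accuracy:", classifier.score(Z_test, y_test))
```

The sketch follows the order given in the text (representation first, split second). A stricter evaluation would fit the scaler and the autoencoder on the training portion only, so that no information from the test samples reaches the representation.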

Embodiment 2

[0081] The cancer gene classification method based on two-stage deep feature selection described in Embodiment 1, with the following difference:

[0082] In step B, the cancer gene data to be detected is preprocessed as follows: after null values and non-numeric data are removed from the data, the first-stage and second-stage processing is applied to obtain the best representation of the feature subset, and this representation is fed into the trained cancer gene classification model.

[0083] By using an integrated feature selection method, the present invention selects features with several aspects taken into account; by using an unsupervised neural network, it extracts the best representation of the features, yielding cleaner gene features and improving classification accuracy.
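The cleaning step in [0082] might look like the sketch below. It assumes that a gene (column) is dropped whenever any sample holds a null or non-numeric value for it; the source does not say whether genes or samples are dropped, and the function names and toy data are hypothetical.

```python
import math


def to_float(value):
    """Return value as a finite float, or None if it is null or non-numeric."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def clean_expression_table(gene_names, rows):
    """Keep only genes whose value is numeric and non-null in every sample."""
    parsed = [[to_float(v) for v in row] for row in rows]
    keep = [j for j in range(len(gene_names))
            if all(row[j] is not None for row in parsed)]
    kept_names = [gene_names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in parsed]
    return kept_names, kept_rows


# Toy table: rows are samples, columns are genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
samples = [
    ["1.5", "n/a", "0.2", "3.0"],   # GENE_B is non-numeric here
    ["2.5", "4.0", "", "1.0"],      # GENE_C is empty (null) here
    ["0.5", "2.0", "0.9", "2.0"],
]
names, matrix = clean_expression_table(genes, samples)
print(names)   # ['GENE_A', 'GENE_D']
print(matrix)  # [[1.5, 3.0], [2.5, 1.0], [0.5, 2.0]]
```

The cleaned matrix would then pass through the same first-stage and second-stage processing that was fitted during training before reaching the classifier.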

Embodiment 3

[0085] The cancer gene classification method based on two-stage deep feature selection described in Embodiment 1, as shown in Figure 1, with the following difference:

[0086] Using a boosting-based integrated feature selection method, three feature selection algorithms are combined to achieve comprehensive feature selection: analysis of variance (ANOVA), the RReliefF algorithm, and the random forest (RF) algorithm. Comprehensive feature selection proceeds as follows:

[0087] (1) Perform feature selection on the original data using ANOVA and the RReliefF algorithm, and obtain a candidate feature subset according to two internal operations;

[0088] (2) Use the random forest algorithm to rank the candidate feature subset by feature importance, and select the required feature subset.

[0089] These three methods take into account the characteristics of gene features themselves, the...
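A rough sketch of the first-stage procedure in [0087] and [0088] follows, under several loudly stated assumptions. A basic binary Relief score is swapped in for RReliefF, which scikit-learn does not provide. The two filter results are merged by set union, because the text available here does not spell out the "two internal operations". The cut-offs `k_filter` and `k_final`, the function names, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif


def relief_scores(X, y):
    """Basic Relief weights: reward features that differ on the nearest
    other-class sample (miss) and agree on the nearest same-class one (hit)."""
    low = X.min(axis=0)
    span = X.max(axis=0) - low
    span[span == 0] = 1.0                      # avoid division by zero
    Xn = (X - low) / span                      # scale every feature to [0, 1]
    weights = np.zeros(X.shape[1])
    for i in range(len(Xn)):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to sample i
        dist[i] = np.inf                       # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        weights += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return weights / len(Xn)


def top_k(scores, k):
    """Indices of the k highest scores, as a set."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def ensemble_select(X, y, k_filter=20, k_final=5, seed=0):
    """Filter with ANOVA and Relief, then rank candidates with a random forest."""
    f_scores, _ = f_classif(X, y)              # one-way ANOVA F statistic
    # Assumption: the two filter results are merged by set union.
    candidates = sorted(top_k(f_scores, k_filter)
                        | top_k(relief_scores(X, y), k_filter))
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X[:, candidates], y)
    order = np.argsort(forest.feature_importances_)[::-1][:k_final]
    return [candidates[j] for j in order]


# Synthetic stand-in for a microarray: 60 samples, 200 genes, 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
y = np.repeat([0, 1], 30)
X[y == 1, :3] += 3.0                           # genes 0..2 separate the classes
selected = ensemble_select(X, y)
print(selected)
```

The two filters act as a cheap first pass over all genes, so the random forest, which is costlier to fit, only has to rank a short candidate list.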

Abstract

The invention relates to a cancer gene classification method and device based on two-stage depth feature selection and a storage medium. The method comprises the steps: A, training a cancer gene classification model: (1) obtaining training data: in the first stage, integrating three feature selection algorithms to carry out comprehensive feature selection, and obtaining a feature subset; in the second stage, the optimal representation of a feature subset is obtained by using an unsupervised neural network; (2) dividing the optimal representation of the feature subset into a training set and a test set, and inputting the training set and the test set into a neural network for training; and B, cancer gene classification: preprocessing to-be-detected cancer gene data, and inputting the preprocessed to-be-detected cancer gene data into the trained cancer gene classification model to realize cancer gene classification. According to the invention, by using the integrated feature selection method, feature selection is carried out in consideration of all aspects; and the optimal representation of the features is extracted by using the unsupervised neural network, so that cleaner gene features are obtained, and the classification precision is improved.

Description

Technical field

[0001] The invention relates to a cancer gene classification method, equipment and storage medium based on two-stage deep feature selection, and belongs to the technical field of gene expression.

Background

[0002] Cancer is one of the deadliest diseases in the world. The time of cancer discovery directly determines the treatment effect and life safety of the patient.

[0003] The use of machine learning to process gene microarray data sets plays an important role in assisting the early diagnosis of cancer, but the number of gene features in microarray data sets is much larger than the number of samples, resulting in sample imbalance and affecting the efficiency and accuracy of classification. Feature selection is particularly important for gene array data. Existing deep feature selection algorithms are all committed to selecting important features from high dimensions, but they do not consider the large number of retained features and the poor p...

Application Information

IPC(8): G06K9/62, G06N3/08
CPC: G06N3/088, G06F18/24
Inventor: 董祥军, 胡艳羽
Owner QILU UNIV OF TECH