Gene classification method and system based on clustering and random forest algorithms

A technique combining a random forest algorithm with a classification method, applied in the field of gene classification. It addresses problems such as inaccurate classification, with the stated effects of improved clustering accuracy, reduced complexity, and accurate division.

Active Publication Date: 2018-11-20
HENAN NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a gene classification method and system based on clustering and random forest algorithms, solving the problem that current gene classification methods classify inaccurately because sample data are scarce.



Embodiment Construction

[0059] The specific embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0060] An embodiment of the gene classification method based on the clustering and random forest algorithm of the present invention.

[0061] The random forest algorithm depends heavily on its training set. When the training sample set is large and the samples are varied, the model is trained more thoroughly. Small-sample gene data, however, give the training set few elements, so the accuracy of the random forest model is not guaranteed. The gene classification method of the present invention therefore combines a clustering algorithm with the random forest algorithm. As shown in Figure 2, the cluster centers obtained by the clustering algorithm are added to the training set to expand it, so that the random forest model is fully trained and yields better classification results. The flow ...
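The training-set expansion described in [0061] can be sketched as follows. This is a minimal illustration and not the patented procedure. The excerpt does not say which clustering algorithm is used or how a cluster center receives a class label, so the sketch assumes plain k-means run separately inside each class, with each center inheriting that class's label. The names `kmeans`, `augment_with_centers`, and `k_per_class` are hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns k centers."""
    rng = random.Random(seed)
    # Initialise centers from k distinct sample positions.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an emptied cluster keeps its previous center
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def augment_with_centers(X, y, k_per_class=2, seed=0):
    """Return copies of X and y extended with per-class cluster centers.

    Assumption (not stated in the source): clustering is done within each
    class, so every center can reuse the label of the class it came from.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    X_aug, y_aug = list(X), list(y)
    for label, rows in by_class.items():
        k = min(k_per_class, len(rows))  # never ask for more centers than samples
        for center in kmeans(rows, k, seed=seed):
            X_aug.append(center)  # artificial sample
            y_aug.append(label)
    return X_aug, y_aug


if __name__ == "__main__":
    X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [1.2, 0.9],
         [5.0, 5.1], [5.2, 4.9], [6.0, 6.1], [6.2, 5.9]]
    y = ["A", "A", "A", "A", "B", "B", "B", "B"]
    X_aug, y_aug = augment_with_centers(X, y, k_per_class=2)
    print(len(X), "original samples ->", len(X_aug), "after augmentation")
```

The original samples are left untouched and the centers are appended at the end, so the expanded set can be passed to any classifier in place of the original one.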


Abstract

The invention relates to a gene classification method and system based on clustering and random forest algorithms, and belongs to the technical field of biological information. The method comprises three steps. First, gene sample data are acquired and clustered with a clustering algorithm to obtain cluster centers, and the resulting set of cluster centers is added to the training sample set. Second, the number of random description attributes per decision tree, which is fixed in the traditional random forest algorithm, is changed to a random value; this keeps the strong decision trees in the decision tree set while lowering the average number of random description attributes across the set, which further reduces the correlation between decision trees. Third, each decision tree in the random forest model predicts the gene data to be classified. Because the cluster centers obtained by the clustering algorithm serve as artificial data that expand the training set, the random forest model is fully trained, the resulting classification model has high precision, and the accuracy of gene data classification is improved.
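The second and third steps of the abstract can be illustrated with a small pure-Python forest. This is a sketch under stated assumptions, not the patented algorithm. The excerpt does not give the range from which the attribute count is drawn, so the sketch assumes each tree draws its count `m` uniformly from 1 to `m_max`, with `m_max` defaulting to the square root of the attribute count. It also assumes `m` is the number of candidate attributes examined at each split. All function and parameter names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, m, rng, depth=0, max_depth=5):
    """Grow a tree that examines m randomly chosen attributes at each split."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None
    for f in rng.sample(range(len(X[0])), m):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold between adjacent values
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # none of the sampled attributes separates the samples
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       m, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       m, rng, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are 4-tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, m_max=None, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    m_max = min(d, m_max or max(1, int(d ** 0.5)))
    forest = []
    for _ in range(n_trees):
        m = rng.randint(1, m_max)  # assumed: per-tree attribute count, uniform in [1, m_max]
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```

Under the uniform-draw assumption the expected attribute count is (1 + `m_max`) / 2, which is below a fixed count of `m_max` whenever `m_max` exceeds 1. That is consistent with the abstract's statement that the average number of random description attributes is reduced, though the source may use a different distribution.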

Description

Technical field

[0001] The invention relates to a gene classification method and system based on clustering and random forest algorithms, belonging to the technical field of biological information.

Background

[0002] Gene data classification is currently a research focus in the field of information and decision-making, because gene data are high-dimensional, small in sample size, noisy, and rich in redundant attributes. Traditional data analysis methods applied to such data can suffer from long running times and low classification accuracy. To cope with these characteristics, existing pattern recognition and machine learning algorithms, such as linear discriminant analysis, support vector machines, artificial neural networks, and k-nearest neighbor algorithms, have been widely studied and applied.

[0003] The random forest (RF) algorithm is an ensemble learning algorithm, which has the characteri...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F19/22, G06F19/24
Inventor: 孙林, 李源, 邹宁, 徐天贺, 韩庆阳, 孔祥琳, 张霄雨, 赵婧, 秦小营, 殷腾宇, 朱帅浩, 王学敏
Owner: HENAN NORMAL UNIV