The invention relates to a gene classification method and system based on clustering and random forest algorithms, and belongs to the technical field of bioinformatics. The method comprises: a step of acquiring gene sample data, clustering the acquired gene sample data with a clustering algorithm to obtain cluster centers, and supplementing the training sample set with the resulting set of cluster centers; a step of replacing the fixed number of random description attributes per decision tree used in the traditional random forest algorithm with a random value, so that, on the one hand, strong decision trees in the decision tree set are retained and, on the other hand, the average number of random description attributes across the decision tree set is reduced, which further reduces the correlation between the decision trees; and a step of predicting the class of the genetic data to be classified by using each decision tree in a
random forest model. According to the method and the system, the cluster centers obtained through the clustering algorithm are used as artificial data to expand the training set of the random forest model, so that the random forest model is fully trained, the resulting classification model has high precision, and the accuracy of genetic data classification is improved.
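
The three steps described above can be illustrated with a minimal sketch in standard-library Python. It is not the claimed implementation, and several details are assumptions the abstract does not specify: k-means is used as the clustering algorithm and is run separately per class so that each cluster center can inherit a class label; the number of random description attributes for each tree is drawn uniformly from 1 up to the traditional value floor(sqrt(d)); and the trees vote by simple majority. All function names (kmeans, augment_with_centers, fit_forest, make_toy_data, and so on) and the toy data are hypothetical.

```python
import random
from collections import Counter


def kmeans(points, k, iters=20, rng=None):
    """Plain k-means; returns k cluster centers (lists of floats)."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        # Each center becomes the mean of its group; an empty group keeps its center.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def augment_with_centers(X, y, k_per_class=2, seed=0):
    """Step 1: append cluster centers to the training set as artificial samples.
    Assumption: clustering is done per class, so each center inherits that label."""
    rng = random.Random(seed)
    X_aug, y_aug = list(X), list(y)
    for label in sorted(set(y)):
        pts = [x for x, lab in zip(X, y) if lab == label]
        for center in kmeans(pts, min(k_per_class, len(pts)), rng=rng):
            X_aug.append(center)
            y_aug.append(label)
    return X_aug, y_aug


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, m, rng, depth=0, max_depth=5):
    """Grow a tree that examines m randomly chosen attributes at every node."""
    if depth >= max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None
    for f in rng.sample(range(len(X[0])), m):
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled attribute was constant
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], m, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], m, rng, depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    """Step 2: each tree gets its own randomly drawn attribute count m.
    Assumption: m is uniform on 1..floor(sqrt(d)), so the average m falls below
    the traditional fixed value while some trees still use the full value."""
    rng = random.Random(seed)
    m_max = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        m = rng.randint(1, m_max)
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, x):
    """Step 3: every tree predicts; the majority label wins (assumed voting rule)."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


def make_toy_data(n_per_class=20, n_features=4, seed=1):
    """Hypothetical stand-in for gene sample data: two well-separated classes."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 1) for _ in range(n_features)] for _ in range(n_per_class)]
    X += [[rng.uniform(4, 5) for _ in range(n_features)] for _ in range(n_per_class)]
    return X, ["A"] * n_per_class + ["B"] * n_per_class


X, y = make_toy_data()
X_aug, y_aug = augment_with_centers(X, y)
forest = fit_forest(X_aug, y_aug)
print(predict_forest(forest, [0.0] * 4), predict_forest(forest, [5.0] * 4))
```

In this sketch the attribute count is fixed per tree and applied at every node of that tree; drawing a fresh count at each node would be another reading of the same sentence, and the abstract does not settle which is intended.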