Biological information recognition method based on dynamic sample selection integration

A biological information identification technology, applied in special data processing applications, instruments, and electrical digital data processing. It addresses three problems: the relationship between the overall recognition rate and the recognition rate of small-class samples cannot be adjusted as needed, training data are scarce, and results are affected by related factors. It handles biometric information identification problems effectively while reducing time overhead.

Status: Inactive
Publication date: 2010-06-30
XIDIAN UNIV
Cites: 0 | Cited by: 14

AI Technical Summary

Problems solved by technology

This method requires less training data and executes quickly. However, because it starts from a balanced data set, its results depend on how that initial balanced data set is chosen.
[0009] In short, previous methods have difficulty identifying minority-class (small-class) samples, and they cannot adjust the relationship between the overall recognition rate and the recognition rate of minority-class samples as needed.




Detailed Description of the Embodiments

[0042] Referring to Figure 1, the present invention is implemented as follows:

[0043] Step 1. Determine the biological information data used for training and testing.

[0044] This method addresses a bioinformatics data recognition problem, so it first requires a set of labeled training samples. In the experiment, 40% of the labeled data is randomly selected as the training set X, and the remaining 60% is used as the test set.
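The random 40%/60% split in [0044] can be sketched in Python as follows. The function name `split_labeled_data` and the `seed` parameter are hypothetical. The text does not say whether the split is stratified by class, so this sketch assumes a plain, unstratified random split.

```python
import random


def split_labeled_data(samples, labels, train_fraction=0.4, seed=None):
    """Randomly split labeled data into a training set and a test set.

    train_fraction=0.4 mirrors the 40% training / 60% test split in the text.
    `seed` is a hypothetical addition for reproducibility.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_train = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


if __name__ == "__main__":
    data = [[float(i), float(i % 3)] for i in range(10)]
    labels = [1 if i < 2 else 0 for i in range(10)]
    X, y, X_test, y_test = split_labeled_data(data, labels, seed=0)
    print(len(X), len(X_test))  # 4 6
```

On strongly imbalanced data, an unstratified split can leave very few minority samples in the training set. Splitting each class separately would be a reasonable alternative, though the text does not specify either approach.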

[0045] Step 2. Normalize the training set data.

[0046] The training set data are normalized with the following formula. This removes the influence of differing data magnitudes and yields the normalized features of the training data:

[0047] $$f'_i = \frac{f_i - \min(v)}{\max(v) - \min(v)}, \quad i = 1, 2, \ldots, n$$

[0048] where $v = (f_1, f_2, \ldots, f_n)$ represents the training data, $\min(v)$ denotes the minimum value among $(f_1, f_2, \ldots, f_n)$, and $\max(v)$ denotes the maximum value among $(f_1, f_2, \ldots, f_n)$. Thus $v' = (f'_1, f'_2, \ldots, f'_n)$ is the...
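Below is a sketch of the normalization in [0046] to [0048]. It assumes the standard min-max form implied by the definitions of min(v) and max(v), because the original formula image is missing from this copy. The name `min_max_normalize` is hypothetical. The formula divides by zero when max(v) = min(v), so returning zeros in that case is also an assumption.

```python
def min_max_normalize(v):
    """Rescale v = (f_1, ..., f_n) so that f'_i = (f_i - min(v)) / (max(v) - min(v))."""
    lo, hi = min(v), max(v)
    span = hi - lo
    if span == 0:
        # All components are equal, so the formula is undefined.
        # Returning zeros here is an assumption, not part of the source text.
        return [0.0 for _ in v]
    return [(f - lo) / span for f in v]


print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

The text applies min(v) and max(v) to a single vector v. Whether v is one sample or one feature column across samples cannot be determined from this excerpt, and the function works the same way in either case.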


Abstract

The invention discloses a biological information recognition method based on dynamic sample selection integration. It mainly addresses the low correct recognition rate for minority-class samples caused by data imbalance. The method comprises the following steps: (1) the training set is divided into a series of balanced sub-datasets using a training set division method; (2) each balanced sub-dataset is assigned to its own base classifier as that classifier's initial training set; (3) the base classifiers are trained cyclically using a dynamic sample selection method; (4) the decision function obtained in each round of training is applied to the test set, yielding decision results; (5) the weights of the decision results are calculated using a cost-sensitive approach; and (6) the decision results of all rounds are weighted and integrated to obtain the final recognition result. Compared with the prior art, the method offers high accuracy and low computational complexity, and it allows the relationship between the correct rate and the recall rate to be adjusted as required. It can be used for biological information recognition, network intrusion detection, financial fraud detection, and spam detection.
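The abstract lists six steps but gives neither the partitioning rule nor the weighting formula. The sketch below illustrates only steps (1) and (6), under stated assumptions. For step (1), it pairs every minority sample with a disjoint chunk of the majority class, which is a common undersampling-ensemble scheme assumed here, not confirmed as the patent's method. For step (6), it fuses the per-round decisions by taking the sign of a weighted sum. The function names, the +1/-1 label encoding, and the tie-breaking rule are hypothetical. The dynamic sample selection (step 3) and the cost-sensitive weight calculation (step 5) are not reproduced; the weights are simply taken as inputs.

```python
import random


def split_into_balanced_subsets(labels, minority_label, seed=None):
    """Return index lists, each holding every minority sample plus one chunk of
    the majority class (hypothetical scheme, not confirmed by the patent).

    The chunk count k = |majority| // |minority| (at least 1) keeps each
    subset roughly balanced, and every majority sample is used exactly once.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("need samples from both classes")
    rng = random.Random(seed)
    rng.shuffle(majority)
    k = max(1, len(majority) // len(minority))
    # Deal majority indices round-robin so chunk sizes differ by at most one.
    chunks = [majority[j::k] for j in range(k)]
    return [sorted(minority + chunk) for chunk in chunks]


def weighted_integration(decisions, weights):
    """Fuse per-round decisions (+1 = minority, -1 = majority, or real-valued
    scores) using one weight per round; the sign of the weighted sum is the label.

    Ties (sum exactly 0) go to +1, which favors the minority class (an assumption).
    """
    if len(decisions) != len(weights):
        raise ValueError("one weight per round of decisions is required")
    if not decisions:
        return []
    totals = [0.0] * len(decisions[0])
    for round_decisions, w in zip(decisions, weights):
        for j, d in enumerate(round_decisions):
            totals[j] += w * d
    return [1 if t >= 0 else -1 for t in totals]


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0]  # 2 minority (label 1), 7 majority
    for subset in split_into_balanced_subsets(labels, minority_label=1, seed=0):
        print(subset)
    print(weighted_integration([[1, -1, 1], [-1, -1, 1], [1, 1, -1]], [0.5, 0.2, 0.3]))  # [1, -1, 1]
```

Dealing the majority samples round-robin means every training sample is used by some base classifier, so no majority data is discarded. Shifting the weights toward rounds that recognize minority samples well is one way the correct-rate/recall balance could be tuned, but the patent's actual cost-sensitive rule is not given in this excerpt.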

Description

Technical Field
[0001] The invention belongs to the field of information processing and relates to biological information identification. It applies to snoRNA identification, microRNA precursor identification, and authenticity identification of SNP sites in bioinformatics. It can also be used for network intrusion detection, financial fraud detection, and spam detection.
Background Art
[0002] Bioinformatics research involves many class-imbalanced classification problems, such as non-coding RNA gene mining and, in particular, microRNA mining. Such problems also arise frequently in SNP site discrimination, snoRNA identification, and microarray data analysis. In most of these problems, the positive examples come from experimental verification, whereas the negative examples usually do not require it. Negative examples are therefore cheap to obtain and positive examples are expensive, so there are usually far more negative ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8) G06F19/00
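Inventors: Gou Shuiping (缑水平), Jiao Licheng (焦李成), Yang Hui (杨辉), Zhu Huming (朱虎明), Wu Jianshe (吴建设), Yang Shuyuan (杨淑媛), Hou Biao (侯彪), Zhang Jia (张佳)
Owner: XIDIAN UNIV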