Marker screening method based on single variable and pair variables

A marker screening method based on single variables and pair variables, applied in the field of biological data analysis. It addresses the high false-positive rate of individual molecular markers, provides an effective data processing method, and expands the search space.

Inactive Publication Date: 2018-09-14
DALIAN UNIV OF TECH
3 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

However, individual molecular markers often suffer from high false-positive rates.



Examples


Embodiment Construction

[0028] The specific implementation of the present invention is described below using a technical solution and a set of simulated data. The simulated data serves only to make the invention easier to understand and does not limit it.

[0029] Table 1 gives the simulated data of the present invention. The data comprise two classes (c1 and c2), each containing 5 samples, with 4 variables in total: f1, f2, f3 and f4.

[0030] Table 1: Values of variables f1, f2, f3 and f4 on the 10 samples

[0031]

[0032] (1) Taking variable f1 as an example, calculate the optimal split point of a variable. Sort the values of f1 in ascending order, which gives {-11, -10, -6, -3, -2, -2, 1, 6, 8, 10}. The midpoint of each pair of adjacent values is used as a candidate split point, so the candidate split points are {-10.5, -8, -4.5, -2.5, -2, -0.5, 3.5, 7, 9}. Use formula (1) to calculate the information gain of the ...
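The split-point search in step (1) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: formula (1) is not reproduced on this page, so the sketch assumes the standard Shannon-entropy information gain, and the function names (`candidate_splits`, `entropy`, `information_gain`, `best_split`) are hypothetical. The class labels of Table 1 are not available here, so only the candidate split points of f1 are computed from the source values.

```python
import math

def candidate_splits(values):
    """Midpoints of adjacent values after sorting in ascending order."""
    s = sorted(values)
    return [(a + b) / 2 for a, b in zip(s, s[1:])]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(values, labels, split):
    """Assumed form of formula (1): entropy reduction from splitting at `split`."""
    left = [y for x, y in zip(values, labels) if x <= split]
    right = [y for x, y in zip(values, labels) if x > split]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

def best_split(values, labels):
    """Candidate split point with the largest information gain."""
    return max(candidate_splits(values),
               key=lambda s: information_gain(values, labels, s))

# Values of f1 as listed in paragraph [0032].
f1 = [-11, -10, -6, -3, -2, -2, 1, 6, 8, 10]
f1_splits = candidate_splits(f1)
print(f1_splits)
```

Given the class label of each sample, `best_split(values, labels)` would return the candidate with the highest gain under this assumed criterion.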


Abstract

The invention belongs to the technical field of biological data analysis and relates to a marker screening method based on single variables and pair variables. Because biological data is typically high-dimensional with a small sample size, evaluating and selecting variables with simple and accurate decision rules for classification and prediction is an important task of biological data analysis. To evaluate the variables comprehensively, the best split point of each single variable is calculated using information gain. A new variable is constructed from the best split point, and a pair variable is formed from the new variable and the original variable it corresponds to. Meanwhile, the variables of the original space are combined two by two to generate variable pairs. All pair variables are scored according to two scoring criteria and sorted in descending order of score, and the k highest-scoring, non-overlapping pairs are selected to build a fusion classifier. Because a new variable is constructed from each single variable, the classification performance of single variables and of pair variables can be evaluated by the same rules, and an effective data processing method is provided.
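The final selection step in the abstract (sort by score, keep the k best non-overlapping pairs) might look like the sketch below. It assumes that "not overlapped" means no two selected pairs share a variable and that selection is greedy in score order; the abstract does not spell out either point. The function name `select_top_k_pairs` and the scores are hypothetical, and the two scoring criteria themselves are not modeled.

```python
def select_top_k_pairs(scored_pairs, k):
    """Greedily pick up to k highest-scoring pairs that share no variable.

    scored_pairs: list of ((var_a, var_b), score) tuples.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    used = set()    # variables already covered by a selected pair
    chosen = []
    for pair, _score in ranked:
        if len(chosen) >= k:
            break
        if used.isdisjoint(pair):
            chosen.append(pair)
            used.update(pair)
    return chosen

# Hypothetical scores for illustration only.
scores = [
    (("f1", "f2"), 0.9),
    (("f1", "f3"), 0.8),   # skipped: f1 is already used
    (("f3", "f4"), 0.7),
    (("f2", "f4"), 0.6),
]
top_pairs = select_top_k_pairs(scores, 2)
print(top_pairs)  # [('f1', 'f2'), ('f3', 'f4')]
```

The selected pairs would then feed the fusion classifier, whose construction is not described in the text available here.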

Description

Technical field

[0001] The invention belongs to the technical field of biological data analysis and relates to a marker screening method based on single variables and pair variables. It is a feature selection and classification method that evaluates single variables and pair variables simultaneously.

Background

[0002] Biological data is usually high-dimensional with a small sample size, so using simple and accurate decision rules to evaluate and select variables for classification and prediction is an important task of biological data analysis, and is of great significance for research on disease diagnosis, drug efficacy, prognosis, and so on.

[0003] Single-molecule markers are often used as important indicators for clinical diagnosis and prognosis. For example, alpha-fetoprotein (AFP) has been considered the preferred serum tumor marker for the diagnosis of liver cancer. However, molecular individual markers often suffer from high false positi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/12, G06F19/24
CPC: G16B5/00, G16B40/00
Inventor: 林晓惠, 宋欢欢, 张艳慧
Owner: DALIAN UNIV OF TECH