Detection methods for microarray mismarked samples

A method for detecting mislabeled samples, applied in the field of computational biology, that addresses problems such as a low recall rate

Inactive Publication Date: 2009-08-26
JILIN UNIV

AI Technical Summary

Problems solved by technology

However, both algorithms were applied to only a single microarray data set and have not been extensively validated on other data sets.
Malossini (2006) proposed two classification perturbation methods to detect mislabeled samples. Of these, the stability method achieves good recognition results but still suffers from a low recall rate.

Examples

Embodiment Construction

[0050] In the following, the invention is described in detail using two-class breast cancer gene chip data as an example. The breast cancer gene expression profile data set of West et al. is a commonly used data set containing 49 breast cancer samples, of which 25 are estrogen receptor positive (ER+) and 24 are estrogen receptor negative (ER-); the gene chip contains 7129 genes. From this data set, suspicious samples 11, 14, 16, 31, 33, 45, 46, 40, 43 were removed, and the labels of samples 1, 2, 3, 47, 48, 49 were then manually flipped to make them mislabeled samples. The resulting data set is the instance data used below.
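The data preparation in [0050] can be sketched as follows. This is a minimal illustration, not the inventors' code: the function name `prepare_instance_data`, the +1/-1 label encoding, and the reading of the sample numbers as 1-based indices into the original 49 samples are all assumptions. The placeholder arrays stand in for the West et al. data, which is not reproduced here.

```python
import numpy as np

def prepare_instance_data(X, y, removed_ids, flipped_ids):
    """Drop suspicious samples and flip the labels of chosen samples.

    X: (n_samples, n_genes) expression matrix.
    y: class labels encoded as +1 / -1 (assumed encoding).
    removed_ids, flipped_ids: 1-based sample numbers in the original data
    (assumed interpretation of the numbering in the text).
    """
    y = np.asarray(y).copy()
    flipped = np.asarray(flipped_ids) - 1
    y[flipped] = -y[flipped]                      # manually introduce mislabels
    is_flipped = np.zeros(len(y), dtype=bool)
    is_flipped[flipped] = True
    keep = np.setdiff1d(np.arange(len(y)), np.asarray(removed_ids) - 1)
    # Return the reduced data, the original sample numbers, and the flip mask.
    return X[keep], y[keep], keep + 1, is_flipped[keep]

# Placeholder data with the shape described in the text (49 samples).
# The ordering "25 ER+ first, then 24 ER-" is an assumption for illustration.
X_demo = np.zeros((49, 5))
y_demo = np.array([1] * 25 + [-1] * 24)
removed = [11, 14, 16, 31, 33, 45, 46, 40, 43]
flipped = [1, 2, 3, 47, 48, 49]
X_inst, y_inst, ids_inst, flip_mask = prepare_instance_data(
    X_demo, y_demo, removed, flipped
)
```

Keeping the original sample numbers and the flip mask alongside the data makes it possible to check later which flagged samples were the deliberately mislabeled ones.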

[0051] 1. Overall perturbation influence value identification method

[0052] Step 1: Take a sample x_i from the data set whose label has not yet been flipped, and flip its class label: y_i = -y_i;

[0053] Step 2: For each sample x_j in the data set, use x_j as the test sample and the remaining samples as the training set; for the sample x_j, regression ...
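Steps 1 and 2 describe a label flip followed by a leave-one-out regression pass. The excerpt is truncated before it names the regression model, so the sketch below substitutes a ridge regression in dual form as an assumption; the function names `loo_regression` and `perturbation_matrix` and the parameter `lam` are hypothetical as well.

```python
import numpy as np

def loo_regression(X, y, lam=1.0):
    """Leave-one-out regression outputs, one per sample.

    Ridge regression in dual form is an ASSUMED stand-in; the source does
    not specify the regression model in this excerpt.
    """
    n = len(y)
    K = X @ X.T                                   # linear kernel between samples
    preds = np.empty(n)
    for j in range(n):
        train = np.arange(n) != j                 # every sample except x_j
        A = K[np.ix_(train, train)] + lam * np.eye(n - 1)
        alpha = np.linalg.solve(A, y[train])
        preds[j] = K[j, train] @ alpha            # regression output for x_j
    return preds

def perturbation_matrix(X, y, lam=1.0):
    """Row i holds the leave-one-out outputs after flipping label y_i."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    P = np.empty((n, n))
    for i in range(n):
        y_pert = y.copy()
        y_pert[i] = -y_pert[i]                    # Step 1: flip one label
        P[i] = loo_regression(X, y_pert, lam)     # Step 2: test every x_j
    return P
```

Under this sketch, flipping y_i cannot change the output for x_i itself, because x_i is held out of the training set when it is the test sample. The diagonal of the matrix therefore equals the unperturbed leave-one-out outputs.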

Abstract

The invention relates to methods for detecting mislabeled samples in microarray data and belongs to the field of computational biology. It uses the influence of data perturbation on a regression model to identify suspected mislabeled samples in a microarray; applying the perturbation method during the preprocessing of disease gene expression data can reduce the impact and loss caused by mislabeling. A regression model describing the relationship between a sample's class label and its gene expression vector is first established. The class labels of all samples are then perturbed in turn to build a perturbation regression matrix that captures the influence of each perturbation on the regression model. Three perturbation influence indexes are defined: the perturbation influence value, the overall perturbation influence value, and the comprehensive perturbation influence value. Based on these indexes, three methods for detecting mislabeled samples in microarray data are given: the overall perturbation influence value identification method, the comprehensive perturbation influence value identification method, and the gradual correction method.
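The abstract names three perturbation influence indexes but this excerpt does not give their formulas. The sketch below is therefore only an illustrative stand-in, not the patent's definitions: it measures how far each row of a perturbation regression matrix moves away from the unperturbed regression outputs. The function name `illustrative_influence` and the choice of mean absolute change are assumptions.

```python
import numpy as np

def illustrative_influence(P, baseline):
    """One score per perturbed sample (ASSUMED summary, not the patent's index).

    P[i, j]: regression output for sample j after the label of sample i
             has been flipped.
    baseline[j]: regression output for sample j with no label flipped.
    """
    P = np.asarray(P, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    delta = np.abs(P - baseline[None, :])   # change caused by each perturbation
    return delta.mean(axis=1)               # average change per flipped label

# Tiny hand-made example: flipping sample 0 changes nothing,
# flipping sample 1 shifts the second output by 2.
P_demo = np.array([[1.0, 2.0],
                   [1.0, 4.0]])
baseline_demo = np.array([1.0, 2.0])
scores_demo = illustrative_influence(P_demo, baseline_demo)
```

Whether a large or a small score points to a suspected mislabel depends on the index definitions in the full patent text, so the sketch deliberately stops at computing the scores.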

Description

Technical field

[0001] The invention relates to a set of computational methods for detecting mislabeled samples in microarray data, and belongs to the field of computational biology.

Background

[0002] During the collection of gene expression data, a great deal of noise is often introduced, both by objective limitations of the experimental methods and by the subjective negligence of experimental operators; mislabeling of samples is one of the more common forms of such noise. Mislabeling means that a sample that originally belonged to one category is wrongly labeled as another category, so that it becomes an erroneous sample. This situation is often seen in experiments on diseases, and the causes are mostly subjective, such as misoperation by the experimenter or a wrong judgment by the doctor. Since classification methods are widely used in medical cancer diagnosis and other fields...

Application Information

IPC(8): G06F19/00, G06K9/62
Inventor: 梁艳春, 张琛, 吴春国, 周柚, 王岩, 杜伟
Owner: JILIN UNIV