Differentially expressed gene identification method based on combined constraint non-negative matrix factorization

This technology combines differentially expressed gene identification with non-negative matrix factorization and is applied in the field of pattern recognition. It addresses problems such as the lack of sparsity in non-negative matrix factorization, which leaves the method with no advantage in selecting differentially expressed features, and it improves robustness.

Active Publication Date: 2017-08-04
HANGZHOU HANGENE BIOTECH CO LTD
Cites: 8; Cited by: 25

AI Technical Summary

Problems solved by technology

[0004] However, the non-negative matrix factorization method still leaves room for improvement. For example, because it lacks sparsity when processing gene expression data, non-negative matrix factorization has no advantage in selecting differentially expressed features. In addition, human cancer gene expression data usually contain outliers and noise, and the traditional non-negative matrix factorization method cannot effectively handle their influence.

Examples

Embodiment Construction

[0025] With the rapid development of deep sequencing and gene chip technology, a large amount of gene expression profile data has emerged, and finding a suitable analysis method to process these data has become a research hotspot in bioinformatics. Because of limited experimental conditions, there are usually only a few dozen experimental samples, whereas gene sequencing technology can monitor tens of thousands of genes at the same time. The analysis of gene expression profile data is therefore a typical high-dimensional, small-sample problem in statistics. Dimensionality reduction methods are usually used to reduce the complexity of the data and improve the accuracy of the analysis results. Many dimensionality reduction techniques, such as principal component analysis (PCA) and singular value decomposition (SVD), have been widely used. But they still have some shortcomings, the principa...
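The dimensionality reduction mentioned in the paragraph above can be illustrated with a minimal sketch of PCA computed through an SVD. This is not taken from the patent: the function name `pca_by_svd`, the samples-by-genes layout, and the choice of NumPy are assumptions made only for illustration.

```python
import numpy as np

def pca_by_svd(X, n_components):
    """PCA via SVD (illustrative helper, not part of the patent).

    X is assumed to be a samples x genes matrix, e.g. a few dozen rows
    and tens of thousands of columns.
    """
    # Center each gene (column) so the SVD describes variance around the mean.
    Xc = X - X.mean(axis=0, keepdims=True)
    # The economy-size SVD keeps the cost bounded by the small sample count.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Coordinates of every sample in the reduced space.
    scores = U[:, :n_components] * s[:n_components]
    # Fraction of total variance captured by each retained component.
    variance = s ** 2
    ratio = variance[:n_components] / variance.sum()
    return scores, Vt[:n_components], ratio
```

With a few dozen samples the centered matrix has rank at most the number of samples minus one, so only that many components can carry variance. The loadings returned here are signed, unlike the factors of a non-negative factorization.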

Abstract

The invention discloses a method for identifying differentially expressed genes based on combined constraint non-negative matrix factorization. The method comprises the following steps: 1, representing a cancer gene expression data set as a non-negative matrix X; 2, constructing a diagonal matrix Q and an element-full matrix E; 3, introducing manifold learning into the classical non-negative matrix factorization method and imposing orthogonality and sparsity constraints on the coefficient matrix G, to obtain the combined constraint non-negative matrix factorization objective function; 4, solving the objective function to obtain iterative update formulas for the basis matrix F and the coefficient matrix G; 5, performing semi-supervised non-negative matrix factorization on the non-negative data set X and obtaining the basis matrix F and the coefficient matrix G after the iteration converges; 6, obtaining an evaluation vector (the formula is shown in the description) from the basis matrix F, sorting its elements in descending order, and obtaining the differentially expressed genes; 7, testing and analyzing the identified differentially expressed genes with a GO tool. The identification method can effectively extract differentially expressed genes from cancer data sets and can be applied to discovering differential features in human disease gene databases. It has important clinical significance for the early diagnosis and targeted treatment of diseases.
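The general shape of steps 3 to 6 can be sketched as follows. This is a hedged illustration, not the patented objective: it implements ordinary graph-regularized NMF with multiplicative updates as a stand-in for the manifold-learning term, and it omits the orthogonality, sparsity, and semi-supervised parts as well as the matrices Q and E, whose definitions are not given in this excerpt. The evaluation vector is assumed to be the row sum of F because the actual formula is not shown here. The names `knn_graph`, `graph_nmf`, and `rank_genes`, and the parameters `lam` and `k`, are hypothetical.

```python
import numpy as np

def knn_graph(X, k=3):
    """Symmetric 0/1 k-nearest-neighbour graph over the columns (samples) of X."""
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-matches
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)

def graph_nmf(X, rank, lam=0.1, k=3, n_iter=200, seed=0, eps=1e-10):
    """Approximate X (genes x samples) by F @ G.T with F, G >= 0.

    Minimises ||X - F G^T||^2 + lam * tr(G^T L G) by multiplicative updates,
    where L = D - W is the Laplacian of the sample graph (assumed stand-in
    for the patent's manifold term).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, rank)) + eps
    G = rng.random((n, rank)) + eps
    W = knn_graph(X, k)
    D = np.diag(W.sum(axis=1))
    for _ in range(n_iter):
        F *= (X @ G) / (F @ (G.T @ G) + eps)
        G *= (X.T @ F + lam * (W @ G)) / (G @ (F.T @ F) + lam * (D @ G) + eps)
    # Rescale so each column of G sums to 1; F @ G.T stays the same and
    # the rows of F become comparable across components.
    s = G.sum(axis=0) + eps
    return F * s, G / s

def rank_genes(F):
    """Assumed evaluation vector: row sums of F, sorted from large to small."""
    scores = F.sum(axis=1)
    return np.argsort(scores)[::-1], scores
```

Calling `F, G = graph_nmf(X, rank=2)` followed by `order, scores = rank_genes(F)` returns gene indices sorted by the assumed score; the leading indices would then be the candidates passed to a GO analysis.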

Description

Technical field

[0001] The invention discloses a method for identifying differentially expressed genes based on combined constraint non-negative matrix factorization. It belongs to the technical field of pattern recognition and can be used to identify differentially expressed genes in cancer, providing a basis for the early diagnosis and treatment of cancer.

Background

[0002] In recent years, the incidence of cancer has increased year by year. The early diagnosis rate is low, the mortality rate is high, and the pathogenesis is very complicated. Mining the relevant information contained in cancer gene expression data will help people gain an in-depth understanding of disease-related expressed genes and their regulatory networks.

[0003] With the rapid development of deep sequencing and gene chip technology, large-scale gene expression profile data have been generated, yet only a few genes are related to cell canceration. In many case...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F19/20
Inventor: 代凌云, 刘金星, 郑春厚
Owner: HANGZHOU HANGENE BIOTECH CO LTD