Mass spectrum data missing value filling method and system based on non-negative matrix factorization

This technology combines non-negative matrix factorization with mass spectrometry data processing and is applied in the field of missing-data processing. It addresses poor stability of filled values, degraded filling performance, and biased results, and achieves stable performance and good filling accuracy.

Active Publication Date: 2020-10-30
XIAMEN UNIV
Cites: 4; Cited by: 2

AI Technical Summary

Problems solved by technology

The presence of extreme values directly affects filling performance, making …


Examples


Embodiment 1

[0092] In this embodiment, the mass spectrometry data sets are placed under three missing mechanisms: MCAR (missing completely at random), MNAR (missing not at random), and MM. Under each mechanism, different proportions of missing values are simulated, and the data are filled separately with three published missing value filling methods and with the method of the present invention. Filling accuracy is evaluated with the improved NRMSE, which verifies the performance of the method of the present invention.
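The evaluation protocol above (simulate missingness, fill, score) can be sketched as follows. This is a minimal sketch under stated assumptions: the patent's "improved NRMSE" is not defined in this excerpt, so the standard NRMSE (RMSE over the removed entries divided by their standard deviation) is used instead; MNAR is modelled as left-censoring of the lowest intensities; MM is not reproduced because its construction is not given here. The function names `mask_mcar`, `mask_mnar`, and `nrmse` are hypothetical.

```python
import numpy as np

def mask_mcar(X, rate, rng):
    """MCAR: every entry is equally likely to be removed, independent of its value."""
    Xm = X.astype(float)  # astype returns a copy, so X is left untouched
    n_missing = int(round(rate * X.size))
    idx = rng.choice(X.size, size=n_missing, replace=False)
    Xm.flat[idx] = np.nan
    return Xm

def mask_mnar(X, rate):
    """MNAR (assumed left-censoring): remove the lowest-intensity entries,
    mimicking signals that fall below a detection limit."""
    Xm = X.astype(float)
    n_missing = int(round(rate * X.size))
    idx = np.argsort(X, axis=None)[:n_missing]  # flat indices of the smallest values
    Xm.flat[idx] = np.nan
    return Xm

def nrmse(X_true, X_filled, mask):
    """Standard NRMSE over the artificially removed entries (not the patent's
    improved variant): RMSE divided by the std of the true values there."""
    diff = X_true[mask] - X_filled[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(X_true[mask])
```

A typical loop removes values with one of the masks, fills them with each competing method, and compares `nrmse(X, X_filled, np.isnan(X_masked))` across methods and missing rates.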

[0093] 1. Research object

[0094] Collection of mass spectrometry data sets: ① Data set 1 comes from the experiment in "Identification of Altered Metabolic Pathways in Plasma and CSF in Mild Cognitive Impairment and Alzheimer's Disease Using Metabolomics", published by Trushina et al. (Trushina, Dutta, Persson, Mielke, & Petersen, 2013). It is a plasma data set from a liquid chromatography/mass spectrometry (LC-MS) non-targeted metabolomics study of Alzheimer's disease (AD); the data set consists of...

Embodiment 2

[0112] In this embodiment, a mass spectrometry data set is used to generate a data matrix simulating the MM missing pattern, and extreme values are added in different proportions. The matrix is then filled with three published missing value filling methods and with the method of the present invention, and filling accuracy is evaluated with the improved NRMSE, thereby verifying the performance of the method of the present invention.
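How the extreme values are generated is not described in this excerpt. The sketch below assumes one simple scheme, purely for illustration: a chosen proportion of the observed (non-missing) entries is multiplied by a constant factor. The function `add_extreme_values` and its `factor` parameter are hypothetical.

```python
import numpy as np

def add_extreme_values(X, proportion, factor=10.0, rng=None):
    """Hypothetical outlier injection: scale a random `proportion` of the
    observed (non-NaN) entries by `factor`. Missing entries stay NaN.
    Returns the new matrix and a boolean mask marking the outliers."""
    rng = np.random.default_rng() if rng is None else rng
    Xo = X.astype(float)  # copy, so the input matrix is not modified
    observed = np.flatnonzero(~np.isnan(Xo))  # flat (C-order) indices of observed entries
    n_out = int(round(proportion * observed.size))
    idx = rng.choice(observed, size=n_out, replace=False)
    Xo.flat[idx] = Xo.flat[idx] * factor
    outliers = np.zeros(Xo.shape, dtype=bool)
    outliers.flat[idx] = True
    return Xo, outliers
```

Sweeping `proportion` over several values and re-running each filling method gives a robustness curve in the spirit of this embodiment.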

[0113] 1. Research object

[0114] Mass spectrometry data set collection: Mass spectrometry data set 1 comes from the experiments published by Trushina et al. in "Identification of Altered Metabolic Pathways in Plasma and CSF in Mild Cognitive Impairment and Alzheimer's Disease Using Metabolomics". It is the plasma data set of a liquid chromatography/mass spectrometry (LC-MS) non-targeted metabolomics study of Alzheimer's disease (AD). The data set contains 557 assigned metabolites and 45 samples, and its missing value ratio is 11.7%;
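For scale, and assuming the 11.7% ratio is taken over all metabolite-by-sample entries, the data set holds roughly 2,900 missing values (the ratio is rounded, so the count is approximate):

```latex
r_{\text{miss}} = \frac{N_{\text{missing}}}{n_{\text{metabolites}} \times n_{\text{samples}}},
\qquad 557 \times 45 = 25\,065,
\qquad N_{\text{missing}} \approx 0.117 \times 25\,065 \approx 2\,933
```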

[01...



Abstract

The invention relates to a method and system for filling missing values in mass spectrometry data based on non-negative matrix factorization. The method comprises: pre-filling the missing values of a data set matrix to obtain a missing-free initial data matrix; applying a logarithmic transformation to all elements of the missing-free initial data matrix; taking a set of dimension (rank) parameters for non-negative matrix factorization and performing non-negative matrix factorization with each of them to obtain a corresponding set of reconstruction matrices; applying an exponential transformation to the element values of the reconstruction matrices; calculating the reconstruction error between each exponentially transformed reconstruction matrix and the missing-free initial data matrix; calculating a weight for each reconstruction matrix from its reconstruction error; taking the weighted average of the reconstruction matrices to obtain a weighted reconstruction matrix; filling the missing positions of the data set matrix with the element values at the corresponding positions of the weighted reconstruction matrix; and performing characteristic metabolite identification and pathway analysis on the resulting missing-free final data matrix. The method improves data filling accuracy.
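The steps in the abstract can be sketched end to end as below. This is a minimal sketch, not the patented implementation, and several details are assumptions because the abstract does not fix them: rows are samples and columns are metabolites; pre-filling uses half of each column's minimum observed value; the log transform is `log1p` (so the NMF input stays non-negative) and its inverse is `expm1`; NMF is solved with plain Lee-Seung multiplicative updates; the error is the Frobenius norm; and weights are proportional to the inverse error. The names `nmf_mu`, `nmf_weighted_impute`, and `ranks` are hypothetical.

```python
import numpy as np

def nmf_mu(V, k, n_iter=500, eps=1e-10, seed=0):
    """Lee-Seung multiplicative-update NMF minimising ||V - WH||_F.
    A stand-in for whatever NMF solver the patent actually uses."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_weighted_impute(X, ranks=(2, 3, 4, 5), n_iter=500):
    """Assumes non-negative intensities and at least one observed value per column."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Pre-fill (assumed rule): half of each column's minimum observed value.
    X0 = np.where(missing, 0.5 * np.nanmin(X, axis=0), X)
    # Log transform; log1p keeps the matrix non-negative for NMF (assumption).
    L = np.log1p(X0)
    recons, errors = [], []
    for k in ranks:
        # One NMF per rank k gives one reconstruction matrix.
        W, H = nmf_mu(L, k, n_iter=n_iter)
        R = np.expm1(W @ H)  # exponential back-transform (inverse of log1p)
        recons.append(R)
        # Reconstruction error against the missing-free initial matrix.
        errors.append(np.linalg.norm(R - X0))
    # Assumed weighting: proportional to 1 / error, normalised to sum to 1.
    inv = 1.0 / (np.array(errors) + 1e-12)
    weights = inv / inv.sum()
    # Weighted average of the reconstruction matrices.
    R_bar = np.tensordot(weights, np.stack(recons), axes=1)
    # Only the originally missing positions take values from R_bar.
    return np.where(missing, R_bar, X), weights
```

Usage: `filled, weights = nmf_weighted_impute(X_with_nans)` returns the missing-free final matrix for downstream metabolite identification and pathway analysis. Averaging over several ranks avoids committing to a single factorization dimension, which matches the abstract's motivation for using a set of dimension parameters.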

Description

Technical field

[0001] The invention relates to the field of missing data processing, and in particular to a method and system for filling missing values in mass spectrometry data based on non-negative matrix factorization.

Background art

[0002] Mass spectrometry is a spectrometric method that stands alongside optical spectroscopy. It is a high-resolution analytical technique that identifies compounds by producing, separating, and detecting gas-phase ions. Because mass spectrometry provides rich molecular structure information with high specificity and high sensitivity, it is widely used in chemical engineering, environment and energy, medicine, life sciences, and materials science. Metabolomics research based on mass spectrometry refers to the use of gas chromatography (GC) or liquid chromatography (LC) coupled with mass spectrometry (MS) to study ...


Application Information

IPC(8): G06F17/16
CPC: G06F17/16
Inventors: Xu Jingjing (许晶晶), Wang Yuanshan (王远山), Dong Jiyang (董继扬)
Owner: XIAMEN UNIV