Mass spectrum data compression method

A mass spectrometry data compression method, applied in electronic digital data processing, special-purpose data processing applications, and digital data information retrieval. It addresses the low compression rates and poor software adaptability of existing formats, and achieves fast reading speed, a small storage footprint, and efficient data exchange and reading.

Pending Publication Date: 2021-11-16
碳硅杭州生物科技有限责任公司

AI Technical Summary

Problems solved by technology

However, existing formats do not exploit the inherent biological characteristics of mass spectrometry data in their compression algorithms, so their compression rates have not improved much.
The Toffee format uses hardware features of TOF mass spectrometers for compression, but it is effective only for TOF instruments and is not a universal data format.
[0005] In mass-spectrometry-based proteomics, files obtained with data-independent acquisition (DIA) often exceed 10 GB. Taking plasma samples as an example, a 90-minute gradient DIA acquisition on a Sciex 6600 instrument produces a vendor raw file of 4 GB, which grows to about 25 GB when converted to mzML format. A conventional proteomics project generally contains hundreds of such files, so the raw files alone incur TB-level storage costs, and the bandwidth cost of computing in a distributed environment is also very high. Other mass spectrometry data compression formats on the market suffer from low compression rates and poor software adaptability. Although the computing problem of a single project can be solved temporarily by purchasing higher-specification equipment in the short term...



Examples


Embodiment Construction

[0030] As shown in Figure 1, a mass spectrometry data compression method comprises the following steps:

[0031] S1. Split the original mass spectrum data file into mass spectrum data and basic metadata, wherein the mass spectrum data includes a mass-to-charge ratio (m/z) array and an intensity array, and the two arrays have the same length and are in one-to-one correspondence;
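Step S1 can be illustrated with a minimal Python sketch. The `Spectrum` class, the `split_spectrum` helper, and the dictionary layout of the input record are hypothetical names chosen for illustration; the source does not specify any of them. The sketch only shows the separation of the two paired arrays from the remaining metadata, plus the equal-length check implied by the one-to-one correspondence.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Spectrum:
    """Hypothetical container: paired peak arrays plus free-form metadata."""
    mz: List[float]
    intensity: List[float]
    metadata: Dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # S1 requires the two arrays to correspond one to one.
        if len(self.mz) != len(self.intensity):
            raise ValueError("m/z and intensity arrays must have the same length")


def split_spectrum(record: dict) -> Spectrum:
    """Separate the peak arrays from every other field of a raw record."""
    meta = {k: v for k, v in record.items() if k not in ("mz", "intensity")}
    return Spectrum(list(record["mz"]), list(record["intensity"]), meta)


# Assumed input layout, for illustration only.
record = {"mz": [100.05, 200.10], "intensity": [10.0, 0.0], "rt": 12.5, "ms_level": 1}
spectrum = split_spectrum(record)
```

Keeping the arrays apart from the metadata is what lets the two parts be stored differently in the later steps.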

[0032] S2. The m/z array and the intensity array are compressed in the ZDPD compression kernel and converted into binary data; at the same time, the basic metadata of the mass spectrum is saved in JSON format. After step S1 ends, the method also includes deleting the points of the mass spectrum data whose intensity is 0.
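The zero-intensity filtering and the JSON metadata storage can be sketched as follows. The function names and the metadata keys (`rt`, `ms_level`) are assumptions made for illustration and are not taken from the patent text. The ZDPD kernel itself is not shown here.

```python
import json
from typing import List, Tuple


def drop_zero_intensity(
    mz: List[float], intensity: List[float]
) -> Tuple[List[float], List[float]]:
    """Remove every point whose intensity is 0, keeping the arrays paired."""
    if len(mz) != len(intensity):
        raise ValueError("m/z and intensity arrays must have the same length")
    kept = [(m, i) for m, i in zip(mz, intensity) if i != 0]
    return [m for m, _ in kept], [i for _, i in kept]


def metadata_to_json(metadata: dict) -> str:
    """Serialize the basic metadata as a JSON string."""
    return json.dumps(metadata, sort_keys=True)


mz, intensity = drop_zero_intensity([100.05, 200.10, 300.15], [10.0, 0.0, 5.0])
meta_json = metadata_to_json({"rt": 12.5, "ms_level": 1})
```

Filtering both arrays through the same list of pairs guarantees that a removed intensity always takes its m/z value with it.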

[0033] As shown in Figure 2, the algorithm principle and compression steps of ZDPD are as follows:

[0034] S21. Perform integer conversion according to the required target p...
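The source text is cut off here, so the remaining ZDPD steps are not available. The following Python sketch only illustrates the general idea of an integer conversion, under the assumption that the truncated word is "precision" and that the conversion multiplies each m/z value by a fixed scale and rounds. The delta encoding and zlib stages are generic stand-ins added to make the example complete; they are not confirmed by the source as the patent's kernel. All function names and the default `scale` are hypothetical.

```python
import struct
import zlib
from typing import List


def to_integers(values: List[float], scale: int) -> List[int]:
    """Fixed-point conversion: keep log10(scale) decimal places."""
    return [round(v * scale) for v in values]


def delta_encode(ints: List[int]) -> List[int]:
    """Store the first value, then each difference to the previous value."""
    if not ints:
        return []
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def compress_mz(mz: List[float], scale: int = 100_000) -> bytes:
    deltas = delta_encode(to_integers(mz, scale))
    raw = struct.pack(f"<{len(deltas)}q", *deltas)  # little-endian int64
    return zlib.compress(raw)


def decompress_mz(blob: bytes, scale: int = 100_000) -> List[float]:
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    return [v / scale for v in delta_decode(list(deltas))]


mz = [400.12345, 400.22346, 400.32347, 401.00001]
restored = decompress_mz(compress_mz(mz))
```

The conversion is lossy beyond the chosen precision: with `scale=100_000`, values are recovered to five decimal places. Because m/z arrays are sorted, the deltas are small non-negative integers, which is why schemes of this kind tend to compress better than raw floating-point values.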


Abstract

A mass spectrometry data compression method comprises the following steps: S1, splitting an original data file into mass spectrum data and basic metadata, wherein the mass spectrum data comprises a mass-to-charge ratio (m/z) array and an intensity array that have the same length and are in one-to-one correspondence; S2, compressing the m/z array and the intensity array into binary data in a ZDPD compression kernel, wherein the basic metadata of the mass spectrum is stored in JSON format; S3, directly outputting the binary array generated by the ZDPD compression kernel in step S2 as a mass spectrum data Air format file, and, during data compression, merging the basic index data generated by the mass spectrometer under a multi-strategy index (data-dependent acquisition mode / data-independent acquisition mode / PRM mode / conventional mode) into the JSON metadata of step S2 to form a complete JSON metadata file. According to the invention, a large-scale proteomics data center can be established, and a self-developed, computation-oriented, high-performance data format is realized.
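The pairing of a binary data file with a JSON file that carries index data, as described in step S3, can be sketched as follows. The index layout (`id`, `offset`, `length`), the `acquisition` key, and the helper names are assumptions for illustration; the source does not describe the actual fields. An in-memory `io.BytesIO` stream stands in for the binary file, and zlib stands in for the compression kernel.

```python
import io
import json
import zlib
from typing import BinaryIO, Dict, List


def write_blocks(blocks: Dict[str, bytes], stream: BinaryIO) -> List[dict]:
    """Append each compressed block and record where it landed."""
    index = []
    for name, payload in blocks.items():
        start = stream.tell()
        stream.write(payload)
        index.append({"id": name, "offset": start, "length": len(payload)})
    return index


def read_block(stream: BinaryIO, entry: dict) -> bytes:
    """Random access: seek straight to one block using its index entry."""
    stream.seek(entry["offset"])
    return stream.read(entry["length"])


binary = io.BytesIO()
blocks = {
    "scan=1": zlib.compress(b"first spectrum"),
    "scan=2": zlib.compress(b"second spectrum"),
}
index = write_blocks(blocks, binary)

# The index is merged into the JSON metadata alongside other basic fields.
index_json = json.dumps({"acquisition": "DIA", "index": index})

entry = json.loads(index_json)["index"][1]
payload = zlib.decompress(read_block(binary, entry))
```

With offsets stored in the JSON file, a reader can fetch a single spectrum without scanning or decompressing the rest of the binary file.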

Description

Technical field

[0001] The invention belongs to the technical field of file compression, and in particular relates to a mass spectrometry data compression method.

Background

[0002] A mass spectrometer is an instrument for detecting the mass-to-charge ratio (m/z) of charged ions. It has a wide range of applications in fields such as scientific research, medicine, and environmental science. With the development of high-resolution mass spectrometers, the raw data files they generate have grown significantly: files that used to be 10 MB are now 10 GB or larger. Currently the most widely used open data format is mzML, published in 2011. Because mass spectrometry data files were not large at that time, mzML focused on the extensibility and standardization of the format. For data compression, it directly used the zlib compression method. There is no...
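For context, the zlib-based approach attributed to mzML above can be sketched in Python. mzML stores binary arrays as base64 text, optionally zlib-compressed; the sketch assumes 64-bit little-endian floats, and the function names are hypothetical. It is a simplified illustration of the encoding chain, not a full mzML writer.

```python
import base64
import struct
import zlib
from typing import List


def encode_mzml_style(values: List[float]) -> str:
    """Pack as little-endian float64, zlib-compress, then base64-encode."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_mzml_style(text: str) -> List[float]:
    """Reverse the chain: base64-decode, decompress, unpack."""
    raw = zlib.decompress(base64.b64decode(text))
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


mz = [400.12345, 400.22346, 400.32347]
encoded = encode_mzml_style(mz)
decoded = decode_mzml_style(encoded)
```

This chain is lossless, but zlib sees raw floating-point bytes whose mantissas look close to random, and base64 then adds roughly a third to the size, which is consistent with the large mzML files described in the background.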


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B50/50, G06F16/174
CPC: G16B50/50, G06F16/1744
Inventor: 陆妙善, 王瑞敏, 安绍维
Owner 碳硅杭州生物科技有限责任公司