Construction method and device of plant metabolite database, medium and terminal

A method for constructing a plant metabolite database, applicable to database management systems, cheminformatics data warehousing, chemical information database systems, etc. The resulting database is information-rich and easy to use.

Pending Publication Date: 2021-11-12
上海鹿明生物科技有限公司
0 Cites 3 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0005] In view of the above-mentioned shortcomings of the prior art, the object of the present invention is to provide a method, device, medium and terminal for constructing a plant metabolite database,...

Method used

In summary, the present invention provides a construction method, device, medium and terminal for a plant metabolite database. Using dozens of different plant tissue samples as data sources to ensure rich metabolite coverage, it builds a self-built database with wide coverage and high accuracy by combining public databases, literature, standards, manual proofreading and biological information, so as to meet the retrieval needs of plant and plant-extract samples. Compared with the public database, the redundant part of the public database is removed at the initial stage of construction, ensuring that all included compounds are plant natural products, and biological source information is annotated at the same time; this greatly reduces the false positives generated during annotation and improves annotation accuracy. The constructed database also contains chromatographic information: comparison in the retention time dimension distinguishes some isomers well, which further improves annotation accuracy. In addition, the metabolite data in the database is rich in information, such as spectrum information, compound adduct form, InChIKey, classification and tissue sample source, and each compound has detailed biological information notes, making the database easy to use, check and reference. Therefore, the present invention effectively overcomes various shortcoming...

Abstract

The invention provides a construction method and device of a plant metabolite database, a medium and a terminal. The method comprises the following steps: exporting all plant metabolite data in a public database; screening the exported data based on a preset screening condition to obtain a plant metabolite data set; collecting and processing plant tissue samples to extract mass spectrogram data matched with the plant metabolite data set; acquiring chromatography data matched with the plant metabolite data set based on the retention time of a standard substance and the plant tissue samples; and constructing a plant metabolite database based on the plant metabolite data set, the mass spectrogram data and the chromatography data. According to the method, a database with wide coverage and high accuracy is constructed in combination with a public database, literature, standard substances, manual proofreading, biological information and other multi-dimensional modes, redundant parts in the public database are removed, biological source information is labeled, and false positive in the annotation process is reduced; and the annotation accuracy is improved through retention time.

Application Domain

Database management systems; Cheminformatics data warehousing (+4 more)

Technology Topic

Mass spectrometry; Engineering (+8 more)

Examples

Example Embodiment

[0033] Example one
[0034] As shown in Figure 1, this embodiment of the present invention provides a method for constructing a plant metabolite database, illustrated as a flow chart, including:
[0035] Step S11. Export all plant metabolite data from a public database. Optionally, the public database can be METLIN, HMDB, MassBank, etc. Taking HMDB as an example, use Python to export the information in every layer for all metabolites on the HMDB website; then, for each metabolite, locate the Disposition layer, find the Biological column within it, and confirm from the plant information recorded there whether the compound is of plant origin.
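The plant check in Step S11 can be sketched as follows. This is only an illustration: it assumes the exported records have already been parsed into dictionaries, and the field names (`disposition`, `biological`) and the sample records are hypothetical rather than the actual HMDB export schema.

```python
from typing import Dict, Iterable, List


def is_plant_metabolite(record: Dict) -> bool:
    """Return True if any Disposition > Biological entry mentions a plant source."""
    disposition = record.get("disposition", {})
    sources = disposition.get("biological", [])
    return any("plant" in source.lower() for source in sources)


def export_plant_metabolites(records: Iterable[Dict]) -> List[Dict]:
    """Keep only the records flagged as plant-derived."""
    return [record for record in records if is_plant_metabolite(record)]


# Hypothetical records standing in for the exported layer information.
records = [
    {"name": "compound_a", "disposition": {"biological": ["Plant"]}},
    {"name": "compound_b", "disposition": {"biological": ["Animal"]}},
    {"name": "compound_c", "disposition": {}},
]
plant_records = export_plant_metabolites(records)
```

Records without a Disposition layer are treated as non-plant here; that default is a design assumption, not something the text specifies.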
[0036] Step S12. Screen the exported data based on preset screening conditions to obtain a plant metabolite data set. Specifically, the exported data is screened based on elemental composition, molecular weight, status, the number of nitrogen atoms, the number of sulfur atoms, and/or the number of phosphorus atoms.
[0037] In preferred implementations of this embodiment, the preset screening conditions include one or more of the following:
First screening condition: the compound must not be an elemental substance; all single-element substances are removed.
Second screening condition: the molecular weight of the compound should be less than 1500.
Third screening condition: the record of the compound in the Status layer information should be detected, quantified, or detected and quantified.
Fourth screening condition: the number of nitrogen atoms in the compound should be less than or equal to 7.
Fifth screening condition: the number of sulfur atoms in the compound should be less than or equal to 2.
Sixth screening condition: the number of phosphorus atoms in the compound should be less than or equal to 3.
Seventh screening condition: when the compound contains 1 phosphorus atom, the number of oxygen atoms should be greater than or equal to 4.
Eighth screening condition: when the compound contains 2 phosphorus atoms, the number of oxygen atoms should be greater than or equal to 7.
Ninth screening condition: when the compound contains 3 phosphorus atoms, the number of oxygen atoms should be greater than or equal to 9.
Tenth screening condition: when the compound contains no phosphorus atoms, the sum of the numbers of nitrogen atoms and oxygen atoms should be less than or equal to the number of carbon atoms.
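The ten screening conditions can be expressed as a single predicate. The sketch below is a minimal illustration, assuming that each record supplies a simple molecular formula (no parentheses, charges or isotope labels), a molecular weight and a status string; the function names and the exact status strings are assumptions, not part of the original method.

```python
import re
from collections import Counter

FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
ALLOWED_STATUS = {"detected", "quantified", "detected and quantified"}
# Conditions 7-9: minimum oxygen count for 1, 2 or 3 phosphorus atoms.
MIN_OXYGEN_FOR_PHOSPHORUS = {1: 4, 2: 7, 3: 9}


def count_atoms(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in FORMULA_TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def passes_screening(formula: str, molecular_weight: float, status: str) -> bool:
    atoms = count_atoms(formula)
    c, n, o, s, p = (atoms[e] for e in ("C", "N", "O", "S", "P"))
    if len(atoms) < 2:                       # condition 1: no elemental substances
        return False
    if molecular_weight >= 1500:             # condition 2
        return False
    if status.strip().lower() not in ALLOWED_STATUS:  # condition 3
        return False
    if n > 7 or s > 2 or p > 3:              # conditions 4-6
        return False
    if p == 0:                               # condition 10
        return n + o <= c
    return o >= MIN_OXYGEN_FOR_PHOSPHORUS[p]  # conditions 7-9
```

For example, `passes_screening("C6H12O6", 180.16, "Detected and Quantified")` returns `True`, while `passes_screening("O2", 31.99, "detected")` returns `False` because the formula contains a single element.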
[0038] Further, after the screening is completed, the information list of the target plant compounds and the SDF structure file of each compound are obtained, and all SDF files are combined to form a data set comprising 6000+ plant metabolites.
[0039] Step S13. Collect and process plant tissue samples to extract mass spectrum data that matches the plant metabolite data set. Specifically, tissue samples of the roots, stems, leaves, flowers and/or fruits of preselected plants are collected and pre-treated; the pre-treatment methods include solid phase extraction, restricted access media, infusion solid extraction, ultrafiltration or immunoaffinity extraction. Liquid chromatography-mass spectrometry is then applied to the pre-treated plant tissue samples to obtain sample mass spectrum data and sample chromatographic data. The sample mass spectrum data is matched against the plant metabolite data set; the matching is performed by setting the positive-ion candidate adduct forms, the negative-ion candidate adduct forms, the precursor ion mass deviation range, the fragment ion mass deviation range, the total matching score range and/or the fragment matching score range, and the mass spectrum data that matches the plant metabolite data set is finally extracted.
[0040] In some examples, the stems, leaves, flowers or fruits of 23 common plants such as wheat, sand, sunflower, rape and blueberry are collected, and the pre-treatment is summarized as follows:
[0041] A. Weigh 80 mg of sample; add 20 μL of internal standard (L-2-chlorophenylalanine, 0.3 mg/mL; Lyso PC17:0, 0.01 mg/mL) and 600 μL of methanol-water (V1:V2 = 7:3).
[0042] B. Add two small steel beads, pre-cool at -20 °C for 2 min, and grind in a grinder (60 Hz, 2 min).
[0043] C. Extract in an ice-water bath for 30 min, then hold at -20 °C for 20 min.
[0044] D. Centrifuge for 10 min (13000 rpm, 4 °C), and transfer all of the supernatant into a 1.5 mL EP tube.
[0045] E. Add another 400 μL of methanol-water (V3:V4 = 7:3) to the residue.
[0046] F. Extract ultrasonically in an ice-water bath for 20 min, then hold at -20 °C for 20 min.
[0047] G. Centrifuge for 10 min (13000 rpm, 4 °C), and combine all of the supernatant with the supernatant from step D, giving 1 mL of supernatant in total.
[0048] H. Take 300 μL, filter it through a membrane filter, and transfer it to a vial.
[0049] I. Take 300 μL of the supernatant, evaporate it to dryness, redissolve it in 300 μL of pure water, centrifuge, filter the supernatant, and transfer it to a vial.
[0050] J. Store the remaining 400 μL of supernatant in a -80 °C freezer.
[0051] Further, data is acquired from the above tissue samples on liquid chromatography-mass spectrometry instruments (such as the AB 6600PLUS and Thermo QE) to obtain the mass spectrum data and chromatographic data of the plant tissue samples. The mass spectrum data of the plant tissue samples is then analyzed (for example, using Waters' Progenesis QI analysis software) with the following settings: positive-ion candidate adduct forms [M+H]+, M+, [2M+H]+, [M+K]+, [M+Na]+, [M+NH4]+ and [M-H2O+H]+; negative-ion candidate adduct forms [2M-H]-, [M-H2O-H]-, [M+FA-H]-, [M+Cl]- and [M-H]-; precursor ion mass deviation ≤ 5 ppm; fragment ion mass deviation ≤ 10 ppm; total matching score ≥ 40; fragment matching score ≥ 10. With these settings, the data is matched against the data set of 6000+ plant metabolites.
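The adduct and mass-deviation settings can be illustrated with a small calculation. The sketch below is not the Progenesis QI algorithm; it only shows how a theoretical adduct m/z and a ppm deviation are typically computed. The mass shifts are commonly used approximate values, only a subset of the listed adducts is included, and the function names are assumptions.

```python
# (multiplier on the neutral mass M, mass shift in Da) for singly charged adducts.
# Approximate values; the electron mass is already accounted for in the shifts.
ADDUCTS = {
    "[M+H]+": (1, 1.007276),
    "[M+Na]+": (1, 22.989218),
    "[M+K]+": (1, 38.963158),
    "[M+NH4]+": (1, 18.033823),
    "[2M+H]+": (2, 1.007276),
    "[M-H]-": (1, -1.007276),
    "[2M-H]-": (2, -1.007276),
    "[M+Cl]-": (1, 34.969402),
    "[M+FA-H]-": (1, 44.998201),
}


def adduct_mz(neutral_mass: float, adduct: str) -> float:
    """Theoretical m/z of a singly charged adduct of a neutral monoisotopic mass."""
    multiplier, shift = ADDUCTS[adduct]
    return multiplier * neutral_mass + shift


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_matches(observed_mz: float, neutral_mass: float, adduct: str,
                      tolerance_ppm: float = 5.0) -> bool:
    """Check the precursor ion against the <= 5 ppm setting described above."""
    theoretical = adduct_mz(neutral_mass, adduct)
    return abs(ppm_error(observed_mz, theoretical)) <= tolerance_ppm
```

With a hypothetical neutral mass of 200.0, an observed m/z of 201.0080 lies about 3.6 ppm from the [M+H]+ value and is accepted, whereas 201.0100 lies about 13.6 ppm away and is rejected.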
[0052] In a preferred implementation of this embodiment, the biological source information of the plant metabolite data set is matched against that of the plant tissue sample, and a successful match raises the total matching score. For example, the biological source information of a candidate compound is compared with the tissue of the sample that produced the spectrum data; if the candidate compound comes from the same source as the tissue sample, 5 points are added to the total matching score.
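The source-based bonus and the score thresholds can be sketched together. This is a hedged illustration only: the text does not say whether the bonus is applied before the total score is compared with the threshold, so applying it first is an assumption, as are the function names and the case-insensitive string comparison.

```python
from typing import Iterable


def adjusted_total_score(total_score: float, candidate_sources: Iterable[str],
                         sample_source: str, bonus: float = 5.0) -> float:
    """Add the bonus when the candidate's recorded sources include the sample source."""
    known_sources = {source.lower() for source in candidate_sources}
    if sample_source.lower() in known_sources:
        return total_score + bonus
    return total_score


def accept_annotation(total_score: float, fragment_score: float,
                      min_total: float = 40.0, min_fragment: float = 10.0) -> bool:
    """Apply the total score >= 40 and fragment score >= 10 settings."""
    return total_score >= min_total and fragment_score >= min_fragment


# Hypothetical candidate: raw total score 37, fragment score 12.
score = adjusted_total_score(37.0, ["blueberry leaf"], "Blueberry leaf")
accepted = accept_annotation(score, 12.0)
```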
[0053] Further, the spectral information corresponding to the successfully matched metabolites is exported in the form of a data matrix, then collected, summarized and saved as an MSP file that stores the mass spectrum information of the 6000+ plant metabolites.
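As a rough illustration of what one entry in such an MSP file might look like, the sketch below formats a single record as text. MSP is a loosely standardized text format and field names vary between tools, so the fields chosen here (`Name`, `PrecursorMZ`, `Precursor_type`, `Formula`, `InChIKey`, `Num Peaks`) are an assumption rather than the layout used by the authors; all values are placeholders.

```python
from typing import List, Tuple


def format_msp_record(name: str, precursor_mz: float, adduct: str, formula: str,
                      inchikey: str, peaks: List[Tuple[float, float]]) -> str:
    """Format one spectrum as an MSP-style text block ending with a blank line."""
    lines = [
        f"Name: {name}",
        f"PrecursorMZ: {precursor_mz:.4f}",
        f"Precursor_type: {adduct}",
        f"Formula: {formula}",
        f"InChIKey: {inchikey}",
        f"Num Peaks: {len(peaks)}",
    ]
    # One "m/z intensity" pair per line after the header fields.
    lines += [f"{mz:.4f} {intensity:g}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n\n"


# Placeholder values for illustration only.
record = format_msp_record(
    name="compound_a",
    precursor_mz=201.0073,
    adduct="[M+H]+",
    formula="PLACEHOLDER",
    inchikey="PLACEHOLDER-KEY",
    peaks=[(55.0178, 120.0), (183.0021, 999.0)],
)
```

Concatenating such blocks for every matched metabolite yields one file per library.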
[0054] Step S14. Obtain chromatographic data that matches the plant metabolite data set based on the retention times of standards and of the plant tissue samples. Specifically, the original data matrix of the plant tissue samples corresponding to the successfully matched metabolites is analyzed (optionally with Waters' Progenesis QI analysis software), and the identifiers of the successfully matched metabolites and their corresponding retention times are exported as a CSV retention time list. Likewise, the original data matrix of the standards is analyzed to form a CSV retention time list of the standard metabolites and their retention times. The two retention time lists are then integrated to obtain the complete chromatographic data for the 6000+ plant metabolites.
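Integrating the two retention time lists amounts to merging two tables keyed by compound. The sketch below assumes CSV columns named `compound_id` and `rt_min` and assumes that a standard's retention time takes precedence when a compound appears in both lists; neither the column names nor the precedence rule is stated in the text.

```python
import csv
import io
from typing import Dict


def read_rt_table(csv_text: str) -> Dict[str, float]:
    """Parse a CSV with 'compound_id' and 'rt_min' columns into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["compound_id"]: float(row["rt_min"]) for row in reader}


def merge_rt_tables(sample_rt: Dict[str, float],
                    standard_rt: Dict[str, float]) -> Dict[str, float]:
    """Merge both lists; standards override sample-derived values (assumption)."""
    merged = dict(sample_rt)
    merged.update(standard_rt)
    return merged


# Hypothetical lists for illustration.
sample_csv = "compound_id,rt_min\nc1,5.21\nc2,7.80\n"
standard_csv = "compound_id,rt_min\nc2,7.75\nc3,3.10\n"
retention_times = merge_rt_tables(read_rt_table(sample_csv),
                                  read_rt_table(standard_csv))
```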
[0055] Step S15. Construct the plant metabolite database based on the plant metabolite data set, the mass spectrum data and the chromatographic data. In some examples, Waters' Progenesis QI analysis software is used to load the data set of 6000+ plant metabolites, the MSP file of the spectral information and the CSV file of the chromatographic information, which are integrated to form the complete plant metabolite database. Preferably, the biological source information for each compound in the database is compiled into a separate Excel sheet for convenient lookup. When using the database, load it in the QI analysis software and set the retention time deviation to ≤ 0.1 min, the precursor ion mass deviation to ≤ 100 ppm, and the fragment ion mass deviation to ≤ 10 ppm; the database can then be used normally.
[0056] To further illustrate the advantages of the plant metabolite database constructed by the present invention (the self-built library), raw data for blueberry seedlings was acquired following the plant tissue sample procedure described above, and this raw data was searched in the QI software against both the public library and the self-built library.
[0057] Figures 2A and 2B are schematic diagrams of the traceability of the blueberry seedling annotations in the public library and the self-built library, respectively. For the public library (Figure 2A), some of the metabolites marked in black are predicted compounds that have never been reported in plants; some are derived from animal-specific metabolic pathways; and some are non-natural products originating from environmental pollutants or pharmaceuticals rather than being synthesized by plants. For the self-built library (Figure 2B), the difference from the public library is clear: the annotated compounds are all natural plant metabolites that can be traced back to an HMDB web page record or to the literature.
[0058] Figures 3A and 3B are schematic diagrams of the spectrum matching of blueberry seedlings in the public library and the self-built library, respectively. The comparison shows that the spectrum match in the self-built library (Figure 3B) is higher. The resulting spectral and chromatographic information is richer, with completely matched fragment ion information, precursor ion values given to decimal-place accuracy, and a retention time error of no more than 0.1 min.
[0059] Figures 4A and 4B show the annotation results for an isomer in one example in the public library and the self-built library, respectively. In the public library (Figure 4A), the annotation results contain candidate compounds that are essentially identical and difficult to tell apart. In the self-built library (Figure 4B), comparison in the retention time dimension makes it easy to pick out the target compound of the annotation.
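The way retention time separates isomers can be sketched as a simple filter over candidates that share the same mass. The candidate records and field names below are hypothetical; the 0.1 min tolerance follows the retention time deviation setting given for database use.

```python
from typing import Dict, List


def filter_by_retention_time(observed_rt: float, candidates: List[Dict],
                             tolerance_min: float = 0.1) -> List[Dict]:
    """Keep candidates whose library retention time is within the tolerance."""
    return [c for c in candidates
            if abs(c["rt_min"] - observed_rt) <= tolerance_min]


# Two hypothetical isomers: identical formula and precursor m/z, different RT.
isomers = [
    {"name": "isomer_a", "rt_min": 5.20},
    {"name": "isomer_b", "rt_min": 6.05},
]
matches = filter_by_retention_time(6.10, isomers)
```

Mass alone cannot separate the two entries, but with an observed retention time of 6.10 min only `isomer_b` remains.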
[0060] Figures 5A and 5B are schematic diagrams of the file contents of the public library and the self-built library, respectively. The public library in Figure 5A, opened with Notepad, is relatively sparse: apart from the spectrum matrix information, it contains only basic information such as compound molecular formula, InChIKey and classification. The self-built library in Figure 5B, when opened, contains not only the spectrum information, compound molecular formula, compound adduct form, InChIKey, classification, etc., but also hyperlinks, which makes lookup convenient.
[0061] Table 1 compares the annotation results of different plant tissue samples in the public library and the self-built library. Analysis of the search results for different plant tissue samples shows that, although the self-built library annotates fewer compounds than the public library, only about 35% of the compounds annotated with the public library are plant metabolites, so its false positive rate is too high. In contrast, the self-built library avoids this problem, ensuring that the results are plant-derived metabolites.
[0062] Table 1 Comparison of annotation results of different plant tissue samples in the public library and the self-built library
[0063]
[0064] In some embodiments, the method can be applied to a controller, such as an ARM (Advanced RISC Machines) controller, FPGA (Field Programmable Gate Array) controller, SoC (System on Chip) controller, DSP (Digital Signal Processing) controller, or MCU (Microcontroller Unit) controller. In some embodiments, the method can also be applied to a computer including components such as a memory, a memory controller, one or more processing units (CPUs), peripheral interfaces, RF circuits, audio circuitry, speakers, microphones, input/output (I/O) subsystems, displays, other output or control devices, and external ports; such computers include, but are not limited to, desktops, laptops, tablets, smartphones, smart TVs and personal digital assistants (PDAs). In other embodiments, the method can also be applied to a server, which can be deployed on one or more physical servers depending on function, load and other factors, or can be composed of a distributed or centralized server cluster.

Example Embodiment

[0065] Example two
[0066] As shown in Figure 6, this embodiment of the present invention provides a construction device for a plant metabolite database. The construction device provided in this embodiment includes: an export module 61 for exporting all plant metabolite data from a public database; a screening module 62 for screening the exported data based on preset screening conditions to obtain a plant metabolite data set; a mass spectrum data acquisition module 63 for collecting and processing plant tissue samples to extract mass spectrum data that matches the plant metabolite data set; a chromatographic data acquisition module 64 for acquiring chromatographic data that matches the plant metabolite data set based on the retention times of standards and of the plant tissue samples; and a construction module 65 for constructing the plant metabolite database based on the plant metabolite data set, the mass spectrum data and the chromatographic data.
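One possible software arrangement of modules 61 to 65 is sketched below, with each module supplied as a callable and the construction module combining their outputs. The class name, the field names and the stand-in lambdas are all hypothetical; the sketch only shows the data flow between the modules, not their real implementations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DatabaseBuilder:
    export: Callable[[], List[dict]]                                   # module 61
    screen: Callable[[List[dict]], List[dict]]                         # module 62
    acquire_spectra: Callable[[List[dict]], Dict[str, list]]           # module 63
    acquire_retention_times: Callable[[List[dict]], Dict[str, float]]  # module 64

    def build(self) -> dict:
        """Module 65: combine the data set, spectra and retention times."""
        dataset = self.screen(self.export())
        return {
            "dataset": dataset,
            "spectra": self.acquire_spectra(dataset),
            "retention_times": self.acquire_retention_times(dataset),
        }


# Stand-in callables with made-up data, only to show the wiring.
builder = DatabaseBuilder(
    export=lambda: [{"id": "c1", "plant": True}, {"id": "c2", "plant": False}],
    screen=lambda records: [r for r in records if r["plant"]],
    acquire_spectra=lambda dataset: {r["id"]: [] for r in dataset},
    acquire_retention_times=lambda dataset: {r["id"]: 0.0 for r in dataset},
)
database = builder.build()
```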
[0067] It should be noted that the modules provided in this embodiment correspond to the method steps described above, so they are not described again. It should also be noted that the division of the above device into modules is merely a division of logical functions. In an actual implementation, the modules can be fully or partially integrated into one physical entity, or be physically separate. These modules can all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules can be implemented as software invoked by a processing element and others in hardware. For example, the screening module may be a separately provided processing element, or be integrated into a chip of the device; it may also be stored in the memory of the device in the form of program code, with one of the processing elements of the device calling and executing the function of the screening module. The other modules are implemented similarly. In addition, these modules can be integrated together in whole or in part, or be implemented independently. The processing element described here can be an integrated circuit with signal processing capability. During implementation, each step of the above method, or each of the above modules, can be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
[0068] For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more microprocessors (digital signal processors, DSPs), or one or more field programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element can be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call program code. As a further example, these modules can be integrated together and implemented as a system-on-a-chip (SoC).

Example Embodiment

[0069] Example three
[0070] This embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the construction method of the plant metabolite database.
[0071] One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be accomplished by hardware related to a computer program. The computer program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments; the storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.