A small molecule unknown substance identification method, system, storage medium and electronic device
By analyzing the secondary mass spectra of known substances, bond breaking reactions and neutral loss substructures can be determined. By combining these substructures and matching them against the secondary mass spectrum of an unknown substance, the problem of insufficient coverage of mass spectrometry databases can be addressed, and small molecule unknowns can be identified more accurately.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- WUHAN METWARE BIOTECHNOLOGY CO LTD
- Filing Date
- 2025-08-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing mass spectrometry databases have limited coverage, resulting in poor accuracy in identifying unknown small molecules and an inability to effectively identify substances not included in the database.
The target secondary mass spectrum of a known substance is obtained and analyzed to determine the target bond breaking reactions and neutral loss substructures. The neutral loss substructures are arranged and combined into substructure combinations, which are matched against the actual secondary mass spectrum of the unknown substance to be identified. Suitable fragment combinations and candidate structural formulas are then determined, and reasonable structural formulas are screened using verification scores.
By deducing the structural formula of the unknown substance in reverse, the method improves the accuracy of identifying small molecule unknowns, including substances not covered by existing databases.
Description
Technical Field
[0001] This application relates to the field of mass spectrometry analysis technology, specifically to a method, system, storage medium, and electronic device for identifying unknown small molecules.
Background Technology
[0002] Proteomics and metabolomics are emerging omics disciplines in systems biology, following genomics and transcriptomics. Proteomics aims to comprehensively understand the composition, structure, function, and interactions of proteins, while metabolomics aims to provide a panoramic study of small molecule metabolites in biological samples, revealing comprehensive information about metabolism and health in organisms. Both omics have been widely applied in important fields such as disease diagnosis, mechanism research, drug discovery, and food safety. The main research objects of metabolomics and proteomics are small molecules, which are extremely complex in composition, very diverse in type, and have varied physicochemical properties. Many of these small molecules have not been effectively identified. Therefore, there is an urgent need for an effective method for identifying unknown small molecules and uncovering their omics information.
[0003] Currently, the common method for identifying small molecule unknowns is to compare the secondary mass spectrum of the small molecule unknown with a mass spectrometry database. However, since current mass spectrometry databases cover only a very small number of substances, many substances not included in the database cannot be effectively identified, resulting in poor accuracy in identifying small molecule unknowns.
Summary of the Invention
[0004] To improve the accuracy of small molecule unknown substance identification, this application provides a method, system, storage medium, and electronic device for small molecule unknown substance identification.
[0005] The first aspect of this application provides a method for identifying small molecule unknowns, specifically including:
[0006] Obtain the target secondary mass spectrum of at least one known substance, wherein the known substance is a compound with a known chemical structure;
[0007] The target secondary mass spectrum is analyzed to obtain the target bond breaking reaction corresponding to each first mass spectrometry fragment in the target secondary mass spectrum, and the corresponding neutral loss substructure is determined according to the target bond breaking reaction. The target bond breaking reaction is the bond breaking reaction required to obtain the first mass spectrometry fragment.
[0008] The neutral loss substructures are arranged and combined to obtain at least one substructure combination, and the theoretical mass spectrometry fragment molecular weight corresponding to each substructure combination is determined.
[0009] Obtain the actual secondary mass spectrum of the unknown substance to be identified, match the actual secondary mass spectrum with the theoretical mass spectrometry fragment molecular weights corresponding to each of the substructure combinations, and determine the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weights as the target combination;
[0010] Based on each of the target combinations, at least one suitable fragment combination constituting the unknown substance to be identified is determined, and at least one undetermined structural formula corresponding to the unknown substance to be identified is determined according to the target bond breaking reaction corresponding to the suitable fragment combination. The suitable fragment combination is a combination of fragments that together constitute the experimental parent ion of the unknown substance to be identified.
[0011] Each of the undetermined structural formulas is analyzed and verified to obtain a corresponding verification score. Based on each verification score, the identification result of the unknown substance to be identified is determined.
[0012] By employing the above technical solution, after obtaining the target secondary mass spectrum, the spectrum is analyzed to determine the target bond breaking reaction and neutral loss substructures that occur during the formation of the first mass spectrometry fragment. Then, these neutral loss substructures are arranged and combined to obtain substructure combinations. These combinations are then considered as possible lost parts during the secondary mass spectrometry of the unknown substance to be identified and matched with the mass spectrometry fragments in the actual secondary mass spectrum to ultimately determine the target combination. Further, based on each target combination, suitable fragment combinations corresponding to the experimental precursor ion of the unknown substance to be identified are determined, thereby determining the possible structure of the experimental precursor ion. Then, through the target bond breaking reaction corresponding to the suitable fragment combination, the possible structural formula of the unknown substance to be identified is deduced in reverse, i.e., the undetermined structural formula. Finally, based on the verification score, the most reasonable undetermined structural formula is selected from among the undetermined structural formulas. Based on this, the actual secondary mass spectrum of the unknown substance to be identified is interpreted, thus enabling more accurate identification of the unknown substance and improving the accuracy of small molecule unknown substance identification.
[0013] In one embodiment, the step of analyzing the target secondary mass spectrum to obtain the target bond breaking reaction corresponding to each first mass spectral fragment in the target secondary mass spectrum specifically includes:
[0014] The target chemical bond in the chemical structure of the known substance is subjected to a simulated breaking operation to obtain at least one broken first parent nucleus structure, wherein the target chemical bond is a chemical bond that does not contain H;
[0015] Calculate the first theoretical molecular weight corresponding to each first parent nucleus structure, calculate the difference between each first theoretical molecular weight and the experimental molecular weight of a single first mass spectrometry fragment in the target secondary mass spectrum, and take the absolute value to obtain the corresponding first molecular weight deviation value.
[0016] If the first molecular weight deviation value is less than the corresponding deviation threshold, then the first parent nucleus structure to which the corresponding first theoretical molecular weight belongs is determined as the final parent nucleus structure of the corresponding first mass spectrometry fragment.
[0017] Based on the final parent nucleus structure, the target bond breaking reaction of the corresponding first mass spectrometry fragment is determined.
[0018] In one embodiment, the step of matching the actual secondary mass spectrum with the theoretical mass spectrometry fragment molecular weights corresponding to each of the substructure combinations, and determining the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weights as the target combination, specifically includes:
[0019] The actual secondary mass spectrum is preprocessed to obtain the processed result;
[0020] The actual molecular weight of a single second mass spectrometry fragment in the processed result is determined, and the difference between the actual molecular weight and the molecular weight of each theoretical mass spectrometry fragment is calculated and the absolute value is taken to obtain the second molecular weight deviation value of the corresponding theoretical mass spectrometry fragment molecular weight.
[0021] If the second molecular weight deviation value is less than the corresponding deviation threshold, then the corresponding theoretical mass spectrometry fragment molecular weight is determined as the successfully matched theoretical mass spectrometry fragment molecular weight, and the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weight is determined as the target combination.
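As a minimal illustrative sketch of the matching step in [0020] and [0021], the following Python function compares each second mass spectrometry fragment with each theoretical mass spectrometry fragment molecular weight and keeps the combinations whose deviation falls below the threshold. The function name, the combination labels, the mass values, and the 0.01 threshold are assumptions chosen for illustration and are not specified in this application.

```python
from typing import Dict, List, Tuple


def match_target_combinations(
    fragment_masses: List[float],
    theoretical_masses: Dict[str, float],
    threshold: float = 0.01,  # assumed deviation threshold
) -> List[Tuple[str, float, float]]:
    """Return (combination label, fragment mass, deviation) for every match."""
    matches = []
    for combo_label, theoretical in theoretical_masses.items():
        for fragment in fragment_masses:
            # Second molecular weight deviation value: absolute difference.
            deviation = abs(fragment - theoretical)
            if deviation < threshold:
                matches.append((combo_label, fragment, deviation))
    return matches


# Illustrative values only (hypothetical combinations and peaks).
theoretical = {"H2O": 18.0106, "H2O+CO2": 62.0004, "NH3": 17.0265}
fragments = [18.0110, 62.0300, 120.0500]
targets = match_target_combinations(fragments, theoretical)
```

In this example only the "H2O" combination is retained as a target combination, because its deviation (about 0.0004) is the only one below the assumed threshold.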
[0022] In one implementation, determining, based on each of the target combinations, at least one suitable fragment combination constituting the unknown substance to be identified specifically includes:
[0023] The neutral loss substructure in each target combination is determined as the reference substructure. The difference between the theoretical molecular weight of each reference substructure and the actual molecular weight of the corresponding second mass spectrometry fragment is calculated and the absolute value is taken to obtain the third molecular weight deviation value.
[0024] The third molecular weight deviation values within each target combination are summed to obtain the final molecular weight deviation value of that target combination.
[0025] According to the order of the final molecular weight deviation value from smallest to largest, a preset number of target combinations are selected from each of the target combinations to determine the candidate combinations;
[0026] The candidate combinations are arranged and combined with the first parent nucleus structure to obtain at least one candidate fragment combination. The difference between the molecular weight of each candidate fragment combination and the molecular weight of the experimental parent ion is calculated and the absolute value is taken to obtain the corresponding fourth molecular weight deviation value. The experimental parent ion is the parent ion corresponding to the unknown substance to be identified.
[0027] If the fourth molecular weight deviation value is less than the corresponding deviation threshold, then the corresponding candidate fragment combination is determined as a suitable fragment combination.
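The selection in [0023] to [0027] can be sketched as follows. This is an illustrative sketch only: it assumes that the molecular weight of a candidate fragment combination is the simple sum of a first parent nucleus structure mass and a candidate combination mass, and all names, masses, and thresholds are hypothetical.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Tuple


class TargetCombination(NamedTuple):
    name: str
    mass: float              # theoretical mass of the whole combination
    deviations: List[float]  # third deviation values, one per reference substructure


def select_candidates(targets: List[TargetCombination], top_n: int) -> List[TargetCombination]:
    """Rank by final deviation (sum of third deviations), smallest first."""
    return sorted(targets, key=lambda t: sum(t.deviations))[:top_n]


def suitable_fragment_combinations(
    candidates: List[TargetCombination],
    core_masses: Dict[str, float],
    parent_ion_mass: float,
    threshold: float,
) -> List[Tuple[str, str]]:
    """Pair each candidate with each parent nucleus structure and keep close fits."""
    suitable = []
    for cand, (core_name, core_mass) in product(candidates, core_masses.items()):
        # Fourth molecular weight deviation value (assumes additive masses).
        deviation = abs(cand.mass + core_mass - parent_ion_mass)
        if deviation < threshold:
            suitable.append((core_name, cand.name))
    return suitable


targets = [
    TargetCombination("A", 18.0106, [0.0004]),
    TargetCombination("B", 62.0004, [0.0010, 0.0020]),
    TargetCombination("C", 44.0262, [0.0300]),
]
candidates = select_candidates(targets, top_n=2)
cores = {"core1": 102.0000, "core2": 146.0100}
suitable = suitable_fragment_combinations(candidates, cores, 164.0205, 0.005)
```

With these illustrative numbers, combinations "A" and "B" are kept as candidates, and only the pairing of "core2" with "A" reproduces the experimental parent ion mass within the assumed threshold.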
[0028] In one embodiment, determining at least one undetermined structural formula corresponding to the unknown substance to be identified based on the target bond breaking reaction corresponding to the suitable fragment combination specifically includes:
[0029] Obtain the target bond breaking reaction corresponding to the neutral loss substructure in the suitable fragment combination and identify it as the key bond breaking reaction;
[0030] Based on the key bond breaking reactions described above, retrosynthetic operations are performed sequentially to obtain the corresponding reaction products;
[0031] Based on the reaction products described, at least one undetermined structural formula of the unknown substance to be identified is obtained.
[0032] In one implementation, the step of performing analytical verification on each of the undetermined structural formulas to obtain the corresponding verification score specifically includes:
[0033] For each of the undetermined structural formulas, a simulated breaking operation is performed to obtain the corresponding broken second parent nucleus structure.
[0034] The second theoretical molecular weight of the second parent nucleus structure is determined, and the difference between the experimental molecular weight of each second mass spectrometry fragment and the second theoretical molecular weight is calculated and the absolute value is taken to obtain the corresponding fifth molecular weight deviation value.
[0035] If the fifth molecular weight deviation value is less than the corresponding deviation threshold, then the corresponding second mass spectrometry fragment is determined to be a successfully matched fragment;
[0036] The number of successfully matched fragments is counted, and the ratio of this number to the total number of second mass spectrometry fragments is determined as the verification score of the corresponding undetermined structural formula.
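The verification score of [0033] to [0036] can be illustrated with the short Python sketch below. The masses, the 0.01 threshold, and the formula labels are hypothetical. Choosing the highest-scoring formula as the result is an assumption, since this application states only that the identification result is determined based on the verification scores.

```python
from typing import Dict, List


def verification_score(
    fragment_masses: List[float],
    simulated_masses: List[float],
    threshold: float = 0.01,  # assumed deviation threshold
) -> float:
    """Fraction of observed fragments explained by simulated fragment masses."""
    if not fragment_masses:
        return 0.0
    matched = sum(
        1
        for fragment in fragment_masses
        # Fifth molecular weight deviation value below the threshold.
        if any(abs(fragment - theo) < threshold for theo in simulated_masses)
    )
    return matched / len(fragment_masses)


# Observed second mass spectrometry fragments (illustrative values).
observed = [77.0386, 105.0335, 122.0600, 150.0000]
# Simulated second parent nucleus structure masses per undetermined formula.
simulated: Dict[str, List[float]] = {
    "formula_1": [77.0391, 105.0340, 122.0605],
    "formula_2": [77.0391, 91.0548],
}
scores = {name: verification_score(observed, masses) for name, masses in simulated.items()}
best = max(scores, key=scores.get)  # assumption: highest score is selected
```

Here "formula_1" explains three of the four observed fragments and "formula_2" explains one, so the scores are 0.75 and 0.25 respectively.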
[0037] In one embodiment, the step of arranging and combining each of the neutral loss substructures to obtain at least one substructure combination specifically includes:
[0038] By using a pre-defined knapsack algorithm, the neutral loss substructures are arranged and combined to obtain at least one substructure combination.
[0039] A second aspect of this application provides a system for identifying small molecule unknowns, specifically comprising:
[0040] The data acquisition module is used to acquire the target secondary mass spectrum of at least one known substance, wherein the known substance is a compound with a known chemical structure;
[0041] The reaction determination module is used to analyze the target secondary mass spectrum to obtain the target bond breaking reaction corresponding to each first mass spectrometry fragment in the target secondary mass spectrum, and to determine the corresponding neutral loss substructure based on the target bond breaking reaction. The target bond breaking reaction is the bond breaking reaction required to obtain the first mass spectrometry fragment.
[0042] The structure combination module is used to arrange and combine each of the neutral loss substructures to obtain at least one substructure combination, and to determine the theoretical mass spectrometry fragment molecular weight corresponding to each substructure combination.
[0043] The fragment matching module is used to obtain the actual secondary mass spectrum of the unknown substance to be identified, match the actual secondary mass spectrum with the theoretical mass spectrometry fragment molecular weights corresponding to each of the substructure combinations, and determine the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weights as the target combination.
[0044] The structure determination module is used to determine, based on each of the target combinations, at least one suitable fragment combination constituting the unknown substance to be identified, and to determine at least one undetermined structural formula corresponding to the unknown substance to be identified according to the target bond breaking reaction corresponding to the suitable fragment combination, wherein the suitable fragment combination is a combination of fragments that together constitute the experimental parent ion of the unknown substance to be identified.
[0045] The substance identification module is used to analyze and verify each of the undetermined structural formulas, obtain the corresponding verification scores, and determine the identification result of the unknown substance to be identified based on the verification scores.
[0046] By adopting the above technical solution, the data acquisition module acquires the target secondary mass spectrum. The reaction determination module analyzes the target secondary mass spectrum to determine the target bond breaking reaction and neutral loss substructure corresponding to each first mass spectrometry fragment. The structure combination module then determines at least one substructure combination and the corresponding theoretical mass spectrometry fragment molecular weight, and the fragment matching module determines the substructure combination corresponding to each successfully matched theoretical mass spectrometry fragment molecular weight as a target combination. The structure determination module determines the suitable fragment combination and the corresponding target bond breaking reaction based on each target combination. Finally, the substance identification module analyzes and verifies each undetermined structural formula to obtain the corresponding verification score, and determines the identification result of the unknown substance to be identified based on each verification score.
[0047] A third aspect of this application provides a computer-readable storage medium storing a computer program that, when loaded and executed by a processor, performs the steps of the method described in any one of the first aspects.
[0048] A fourth aspect of this application provides an electronic device, specifically comprising:
[0049] A processor, a memory, and a computer program stored in the memory and capable of running on the processor, the processor being configured to load and execute the computer program stored in the memory to cause the electronic device to perform the method as described in any one of the first aspects.
[0050] In summary, this application includes at least one of the following beneficial technical effects: The target secondary mass spectrum is analyzed to determine the target bond breaking reaction and neutral loss substructures that occur during the formation of the first mass spectrometry fragment. The neutral loss substructures are then arranged and combined to obtain substructure combinations, and each substructure combination is treated as a possible lost part during the secondary mass spectrometry of the unknown substance to be identified and matched with the mass spectrometry fragments in the actual secondary mass spectrum to determine the target combination. Furthermore, based on each target combination, the suitable fragment combination corresponding to the experimental precursor ion of the unknown substance to be identified is determined, thereby determining the possible structure of the experimental precursor ion. Then, through the target bond breaking reaction corresponding to the suitable fragment combination, the possible structural formula of the unknown substance to be identified, i.e., the undetermined structural formula, is derived in reverse. Finally, based on the verification score, the more reasonable undetermined structural formula is selected from the undetermined structural formulas, and the actual secondary mass spectrum of the unknown substance to be identified is interpreted accordingly, thereby identifying the unknown substance more accurately and improving the accuracy of small molecule unknown identification.
Attached Figure Description
[0051] Figure 1 is a schematic flowchart of a method for identifying unknown small molecules provided in an embodiment of this application;
[0052] Figure 2 is a schematic diagram of the structure of a small molecule unknown substance identification system provided in an embodiment of this application.
[0053] Explanation of reference numerals in the attached figures: 11. Data acquisition module; 12. Reaction determination module; 13. Structure combination module; 14. Fragment matching module; 15. Structure determination module; 16. Substance identification module.
Detailed Implementation
[0054] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.
[0055] In the description of the embodiments of this application, words such as "exemplarily," "for example," or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplarily," "for example," or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of words such as "exemplarily," "for example," or "for instance" is intended to present the relevant concepts in a specific manner.
[0056] In the description of the embodiments of this application, the term "and / or" is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, B existing alone, or A and B existing simultaneously. Furthermore, unless otherwise stated, the term "multiple" means two or more. For example, multiple systems refer to two or more systems, and multiple screen terminals refer to two or more screen terminals. In addition, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and their variations all mean "including but not limited to," unless otherwise specifically emphasized.
[0057] Referring to Figure 1, this application discloses a method for identifying small molecule unknowns, which can be implemented using a computer program or run on a small molecule unknown identification system based on the von Neumann architecture. The computer program can be integrated into an application or run as a standalone tool application. The method specifically includes:
[0058] S101: Obtain the target secondary mass spectrum of at least one known substance, where the known substance is a compound with a known chemical structure.
[0059] Specifically, in the embodiments of this application, the known substance is a compound with a known chemical structure. The known substance can be a biological sample, such as a tissue section. In other embodiments, the known substance can also be a macromolecular compound. The target secondary mass spectrum is a high-resolution mass spectrum, for example one acquired by time-of-flight mass spectrometry or Orbitrap mass spectrometry. The secondary mass spectrum, also known as the MS2 spectrum, is generated through tandem mass spectrometry experiments and is used to analyze the fragmentation behavior of specific precursor ions (parent ions), providing information on molecular structure or chemical composition. It is an important tool in mass spectrometry analysis and is widely used in small molecule identification, proteomics, metabolomics, and other fields.
[0060] In the method for identifying small molecule unknowns disclosed in this application, the execution entity is a server. The server is wirelessly connected to a terminal, which is a personal computer or tablet computer. The terminal has an application related to the identification of small molecule unknowns installed on it, and the server is the backend server corresponding to the application. Specifically, the server can be a standalone physical server or a cluster of multiple physical servers. Further, a feasible method for obtaining the target secondary mass spectrum of a known substance is as follows: personnel send a start command for the identification of the small molecule unknown substance through the terminal. Based on the start command, the server accesses a public database such as MassBank or NIST while connected to the network, and obtains the target secondary mass spectrum of at least one known substance. In other embodiments, the target secondary mass spectrum can also be sent directly to the server through the terminal, in which case the target secondary mass spectrum is a secondary mass spectrum obtained in the laboratory using standards. The process of obtaining the secondary mass spectrum is briefly described as follows: a full scan of the standards is performed using a MALDI mass spectrometer to obtain a primary mass spectrum. Then, based on the primary mass spectrum, a specific ion (such as a precursor ion or parent ion) is selected for further fragmentation or analysis to generate a secondary mass spectrum. This is prior art and will not be elaborated further here. It should be noted that the target secondary mass spectrum includes multiple mass spectrometry fragments or fragment ions, which are generated during the fragmentation of the precursor ion (parent ion).
[0061] S102: Analyze the target secondary mass spectrum to obtain the target bond breaking reaction corresponding to each first mass spectrometry fragment in the target secondary mass spectrum, and determine the corresponding neutral loss substructure based on the target bond breaking reaction. The target bond breaking reaction is the bond breaking reaction required to obtain the first mass spectrometry fragment.
[0062] Specifically, after obtaining the target secondary mass spectrum, it needs to be analyzed. One feasible analysis method is to use a pre-defined simulated fragmentation tool to simulate the breaking of the target chemical bonds in the known chemical structure, obtaining at least one broken first parent nucleus structure. Each simulated breaking operation corresponds to one broken first parent nucleus structure. The target chemical bond is a chemical bond in the known chemical structure that does not contain hydrogen (H). The simulated fragmentation tool can be the GAMESS tool; in other embodiments, it can also be the MSFragment tool. Furthermore, the parent nucleus structure refers to the most stable core framework of the molecule, usually composed of strong chemical bonds (such as carbon-carbon bonds), which is not easily broken apart completely during mass spectrometry fragmentation.
[0063] Furthermore, the first theoretical molecular weight corresponding to each first parent nucleus structure is calculated using the pre-defined SIRIUS tool. Then, the deviation between the experimental molecular weight of a single first mass spectrometry fragment in the target secondary mass spectrum and each first theoretical molecular weight is determined. Specifically, the difference between each first theoretical molecular weight and the experimental molecular weight of the single first mass spectrometry fragment is calculated, and the absolute value is taken to obtain the first molecular weight deviation value between the corresponding first parent nucleus structure and the single first mass spectrometry fragment. This first molecular weight deviation value is compared with a corresponding deviation threshold. If the first molecular weight deviation value is less than the corresponding deviation threshold, the molecular weight deviation between the corresponding first parent nucleus structure and this first mass spectrometry fragment is small, so the corresponding first parent nucleus structure is determined as the final parent nucleus structure of that first mass spectrometry fragment. It should be noted that a single first mass spectrometry fragment may have multiple final parent nucleus structures with small molecular weight deviations.
[0064] Furthermore, based on the determined final parent nucleus structure, the target bond-breaking reaction for the corresponding first mass spectrometry fragment is determined. One feasible implementation method is as follows: Since each final parent nucleus structure corresponds to one bond-breaking reaction, a single first mass spectrometry fragment may correspond to multiple bond-breaking reactions. Therefore, the target bond-breaking reaction for the corresponding first mass spectrometry fragment is selected from the bond-breaking reactions corresponding to each final parent nucleus structure. The specific selection process is as follows: calculate the bond-breaking energy of each bond-breaking reaction, select the minimum bond-breaking energy from all bond-breaking energies, and determine the bond-breaking reaction corresponding to the minimum bond-breaking energy as the target bond-breaking reaction for that first mass spectrometry fragment. Similarly, the target bond-breaking reactions corresponding to each first mass spectrometry fragment in the target secondary mass spectrum can be determined. The target bond-breaking reaction is the bond-breaking reaction required to obtain the first mass spectrometry fragment. A bond-breaking reaction refers to a chemical reaction in which chemical bonds in a compound molecule break, leading to the decomposition or recombination of the molecular structure.
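The minimum-energy selection described in [0064] can be sketched as below. This is an illustrative sketch: the bond labels and bond-breaking energies are placeholder values rather than computed quantities, and the data layout (a mapping from fragment mass to candidate reactions) is an assumption.

```python
from typing import Dict, List, NamedTuple


class BondBreak(NamedTuple):
    bond: str             # hypothetical bond label
    energy_kj_mol: float  # placeholder bond-breaking energy


def target_reaction(candidates: List[BondBreak]) -> BondBreak:
    """Pick the candidate reaction with the minimum bond-breaking energy."""
    return min(candidates, key=lambda reaction: reaction.energy_kj_mol)


# Candidate reactions per first mass spectrometry fragment (illustrative).
per_fragment: Dict[float, List[BondBreak]] = {
    105.0335: [BondBreak("C7-O8", 360.0), BondBreak("C1-C7", 410.0)],
    77.0386: [BondBreak("C1-C7", 410.0)],
}
targets = {mass: target_reaction(reactions) for mass, reactions in per_fragment.items()}
```

For the fragment with two candidate reactions, the lower-energy "C7-O8" break is chosen as the target bond-breaking reaction; a fragment with a single candidate keeps that candidate.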
[0065] Furthermore, since each bond-breaking reaction yields an electrically neutral loss substructure and a charged parent nucleus structure, the corresponding neutral loss substructure can be determined based on the target bond-breaking reaction of each first mass spectrometry fragment. It should be noted that the neutral loss substructure refers to the electrically neutral portion lost from the molecule during the chemical bond breaking process. This portion of the structure carries no charge and typically leaves in the form of a small molecule or group.
[0066] S103: Arrange and combine each neutral loss substructure to obtain at least one substructure combination, and determine the theoretical mass spectrometry fragment molecular weight corresponding to each substructure combination.
[0067] Specifically, a preset combination algorithm is used to arrange and combine each neutral loss substructure to obtain at least one substructure combination. In this embodiment, the combination algorithm can be a knapsack algorithm; in other embodiments, the combination algorithm can also be a dynamic programming algorithm. Further, a single substructure combination is considered as a virtual mass spectrometry fragment, and the molecular weight of the theoretical mass spectrometry fragment corresponding to the substructure combination is determined. The specific determination process is as follows: based on the molecular formula of at least one neutral loss substructure in the substructure combination, the precise molecular formula of the entire substructure combination can be determined. Finally, based on the precise molecular formula, the molecular weight of the theoretical mass spectrometry fragment corresponding to the substructure combination can be determined.
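As an illustration of [0067], the sketch below merges the molecular formulas of neutral loss substructures into a combined formula and computes the corresponding theoretical molecular weight. A bounded subset enumeration with a mass cap stands in for the knapsack algorithm here; it is a simplification, not the algorithm of this application. The mass cap, the maximum combination size, and the example substructures are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Tuple

# Monoisotopic masses of the elements used in the example.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_mass(formula: Counter) -> float:
    """Molecular weight from an element-count formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def substructure_combinations(
    losses: Dict[str, Counter],
    max_mass: float,    # assumed cap, e.g. the parent ion mass
    max_size: int = 3,  # assumed limit on substructures per combination
) -> Dict[Tuple[str, ...], float]:
    """Enumerate combinations whose merged formula stays under the mass cap."""
    results = {}
    names = sorted(losses)
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            merged = Counter()
            for name in combo:
                merged.update(losses[name])  # add element counts
            mass = formula_mass(merged)
            if mass <= max_mass:
                results[combo] = mass
    return results


losses = {
    "H2O": Counter(H=2, O=1),
    "CO2": Counter(C=1, O=2),
    "NH3": Counter(N=1, H=3),
}
combos = substructure_combinations(losses, max_mass=65.0)
```

With the assumed cap of 65.0, the three single substructures and the three pairs are kept, while the combination of all three (about 79.03) is excluded.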
[0068] S104: Obtain the actual secondary mass spectrum of the unknown substance to be identified, match the actual secondary mass spectrum with the theoretical mass spectrometry fragment molecular weights corresponding to each substructure combination, and determine the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weights as the target combination.
[0069] Specifically, after determining the theoretical mass spectrometry fragment molecular weights corresponding to each substructure combination, the actual secondary mass spectrum of the unknown to be identified is obtained using a MALDI mass spectrometer. The specific acquisition logic is detailed in step S101 and will not be repeated here. The actual secondary mass spectrum includes multiple mass spectrometry fragments. Further, the mass spectrometry fragments in the actual secondary mass spectrum are matched with the molecular weights of the theoretical mass spectrometry fragments corresponding to each substructure combination. One feasible matching method is as follows: First, the actual secondary mass spectrum is preprocessed. Specifically, noise with signal intensity below the corresponding threshold is removed, and signals with molecular weights higher than the corresponding parent ion are removed. Then, the molecular weight deviation between different mass spectrometry fragments in the actual secondary mass spectrum is calculated. If the molecular weight deviation is less than the corresponding deviation threshold, the corresponding mass spectrometry fragments are merged. This completes the preprocessing of the actual secondary mass spectrum, yielding the processed result. The unknown to be identified is a small molecule unknown.
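The preprocessing described in [0069] can be sketched as follows. The application states only that close fragments are merged; representing a merged fragment by an intensity-weighted mean mass and a summed intensity is an assumption made for this sketch, as are the threshold values and peak data.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (molecular weight, signal intensity)


def preprocess(
    peaks: List[Peak],
    parent_mass: float,
    min_intensity: float,
    merge_tolerance: float,
) -> List[Peak]:
    """Drop noise and above-parent signals, then merge near-identical peaks."""
    kept = sorted(
        (mass, intensity)
        for mass, intensity in peaks
        if intensity >= min_intensity and mass <= parent_mass
    )
    merged: List[Peak] = []
    for mass, intensity in kept:
        if merged and abs(mass - merged[-1][0]) < merge_tolerance:
            prev_mass, prev_intensity = merged[-1]
            total = prev_intensity + intensity
            # Assumed merge rule: intensity-weighted mean mass, summed intensity.
            merged[-1] = ((prev_mass * prev_intensity + mass * intensity) / total, total)
        else:
            merged.append((mass, intensity))
    return merged


raw = [
    (77.0386, 500.0),
    (77.0390, 500.0),
    (105.0335, 1200.0),
    (60.0000, 20.0),    # below the intensity threshold
    (200.0000, 900.0),  # heavier than the parent ion
]
processed = preprocess(raw, parent_mass=122.06, min_intensity=50.0, merge_tolerance=0.005)
```

In this example the low-intensity peak and the peak heavier than the parent ion are removed, and the two peaks near 77.04 are merged into one, leaving two fragments in the processed result.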
[0070] In other embodiments, the number of noise signals in the actual secondary mass spectrum whose signal intensity is below the corresponding threshold is counted. If the number exceeds a preset threshold, historical records of abnormal (high) detector temperatures in the mass spectrometer are obtained. These historical records include, but are not limited to, the historical operating parameters that were abnormal when the detector temperature was abnormal, and the wind speed range of the cooling fan. Based on these historical records, the historical operating parameters that were abnormal when the detector temperature was abnormal are obtained. The first frequency, that is, how often a single historical operating parameter recurs among all historical operating parameters, is counted. If the first frequency exceeds a corresponding frequency threshold, the corresponding historical operating parameter is determined as a target operating parameter, i.e., an operating parameter that is likely to cause detector temperature abnormalities. Further, based on the above historical records, the wind speed range of the cooling fan at the time the detector temperature was abnormal, given that a single target operating parameter was abnormal, is obtained. The second frequency, that is, how often a single wind speed range recurs among all wind speed ranges, is counted. If the second frequency exceeds a corresponding frequency threshold, the corresponding wind speed range is determined as the target wind speed range corresponding to the target operating parameter, i.e., a wind speed range that is likely to cause detector temperature abnormalities.
[0071] Furthermore, a first weight is determined for each target operating parameter, and a second weight is determined for the target wind speed range corresponding to each target operating parameter. The first weight is the ratio of the first frequency of each target operating parameter to the sum of the first frequencies of all target operating parameters, and the second weight is the ratio of the second frequency of a single target wind speed range corresponding to a target operating parameter to the sum of the second frequencies of all corresponding target wind speed ranges. It should be noted that the cooling fan can be a cooling fan installed in the room where the mass spectrometer is located, or a pre-installed cooling fan within the mass spectrometer.
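The two weights are plain frequency ratios, as the following sketch shows. The parameter names, wind speed range labels, and frequency counts are hypothetical.

```python
def normalized_weights(frequencies):
    """Divide each frequency by the sum of all frequencies."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return {key: value / total for key, value in frequencies.items()}


# Hypothetical first frequencies of the target operating parameters.
first_freq = {"detector_voltage": 6, "vacuum_pressure": 4}
# Hypothetical second frequencies of each parameter's target wind speed ranges.
second_freq = {
    "detector_voltage": {"0-2 m/s": 3, "2-4 m/s": 1},
    "vacuum_pressure": {"0-2 m/s": 2, "2-4 m/s": 2},
}

first_weights = normalized_weights(first_freq)
second_weights = {param: normalized_weights(ranges)
                  for param, ranges in second_freq.items()}
print(first_weights)
print(second_weights)
```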
[0072] The actual data of each operating parameter of the detector is acquired. If the actual data is not within the corresponding normal data range, the corresponding operating parameter is identified as an abnormal operating parameter, and the actual wind speed of the cooling fan is acquired at the same time. If the abnormal operating parameter is a target operating parameter, and the actual wind speed falls within the target wind speed range corresponding to the abnormal operating parameter, that target wind speed range is identified as the key wind speed range. The first product of the first weight of the abnormal operating parameter and the second weight of the corresponding key wind speed range is calculated. If the first product exceeds a preset first threshold, it indicates that the current temperature of the detector is likely abnormal and that the background noise level of the detector is likely to increase, making low mass-to-charge ratio fragment signals more likely to be regarded as noise, resulting in a large amount of noise. Therefore, the noise in the actual secondary mass spectrum is removed again. At the same time, the actual temperature of the detector is acquired through a temperature sensor. If the actual temperature exceeds a preset temperature threshold, it indicates that the current temperature of the detector is indeed abnormal, thus verifying that the temperature threshold is reasonable.
[0073] If no abnormal operating parameters exist, the second product of the first weight of each target operating parameter and the second weight of the corresponding target wind speed range is calculated. The target wind speed range in which the actual wind speed falls is determined as the important wind speed range. The second products corresponding to the important wind speed range are summed to obtain the first summation result. If the first summation result exceeds the preset second threshold, it indicates that using the actual wind speed for heat dissipation is prone to causing an abnormal detector temperature. Then, the second products corresponding to the same target wind speed range are summed to obtain a second summation result for each target wind speed range. The larger the second summation result, the greater the possibility of an abnormal detector temperature. The smallest second summation result is selected, the maximum value of the target wind speed range corresponding to it is determined as the suitable wind speed, and the wind speed of the cooling fan is adjusted to the suitable wind speed.
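The wind speed adjustment for the case with no abnormal operating parameters can be sketched as below. The function name, the representation of a wind speed range as a `(low, high)` tuple treated as a half-open interval, and all numeric values are assumptions. The sketch also assumes that the same ranges are shared across parameters, so that the first summation result equals the second summation result of the range containing the actual wind speed.

```python
def choose_wind_speed(first_weights, second_weights, actual_speed, second_threshold):
    """Return the wind speed the cooling fan should use.

    second_weights maps parameter -> {(low, high): second weight}.
    """
    # Second summation result per range: sum of first weight * second weight.
    risk = {}
    for param, ranges in second_weights.items():
        for speed_range, weight in ranges.items():
            risk[speed_range] = risk.get(speed_range, 0.0) + first_weights[param] * weight

    # First summation result: risk of the range containing the actual speed.
    current = sum(score for (low, high), score in risk.items()
                  if low <= actual_speed < high)
    if current <= second_threshold:
        return actual_speed  # current setting is acceptable

    # Otherwise move to the upper bound of the least risky range.
    safest = min(risk, key=risk.get)
    return safest[1]


fw = {"detector_voltage": 0.6, "vacuum_pressure": 0.4}
sw = {
    "detector_voltage": {(0.0, 2.0): 0.75, (2.0, 4.0): 0.25},
    "vacuum_pressure": {(0.0, 2.0): 0.5, (2.0, 4.0): 0.5},
}
adjusted = choose_wind_speed(fw, sw, actual_speed=1.0, second_threshold=0.5)
unchanged = choose_wind_speed(fw, sw, actual_speed=1.0, second_threshold=0.7)
print(adjusted, unchanged)
```

Here the range 0 to 2 m/s accumulates a risk of about 0.65 and the range 2 to 4 m/s about 0.35, so with a threshold of 0.5 the fan is moved to 4.0 m/s, while with a threshold of 0.7 the current speed is kept.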
[0074] Further, the actual molecular weight of each individual second mass spectrometry fragment in the processed result is determined. Then, the difference between the actual molecular weight and the molecular weight of each theoretical mass spectrometry fragment is calculated, and the absolute value is taken to obtain the second molecular weight deviation value of the corresponding theoretical mass spectrometry fragment molecular weight. The second molecular weight deviation value is compared with the corresponding deviation threshold. If the second molecular weight deviation value is less than the deviation threshold, it indicates that the molecular weight deviation between the second mass spectrometry fragment and the corresponding substructure combination is small. In this case, the corresponding theoretical mass spectrometry fragment molecular weight is determined as the successfully matched theoretical mass spectrometry fragment molecular weight, and the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weight is determined as the target combination. It should be noted that there may be multiple target combinations with small molecular weight deviations from individual second mass spectrometry fragments.
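The matching rule in this step reduces to an absolute-difference test against a deviation threshold. The following sketch follows the text literally; the function name, the dictionary layout, and the example masses are assumptions.

```python
def match_fragments(fragment_masses, theoretical_masses, deviation_threshold):
    """Map each second mass spectrometry fragment to its target combinations.

    theoretical_masses maps a substructure combination label to its
    theoretical mass spectrometry fragment molecular weight. A combination
    is a target combination for a fragment when |actual - theoretical| is
    below deviation_threshold. One fragment may match several combinations.
    """
    targets = {}
    for actual in fragment_masses:
        hits = [label for label, theoretical in theoretical_masses.items()
                if abs(actual - theoretical) < deviation_threshold]
        if hits:
            targets[actual] = hits
    return targets


theoretical = {"H2O+H2O": 36.02113, "CO2": 43.98983, "H2O+NH3": 35.03711}
fragments = [36.021, 43.990, 60.0]
target_combinations = match_fragments(fragments, theoretical, deviation_threshold=0.005)
print(target_combinations)
```

An absolute threshold in Da is used here for simplicity; a relative (ppm) tolerance would be an equally plausible choice that the text does not specify.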
[0075] S105: Based on each target combination, determine a suitable fragment combination that constitutes at least one unknown substance to be identified, and determine at least one undetermined structural formula corresponding to the unknown substance to be identified according to the target bond breaking reaction corresponding to the suitable fragment combination.
[0076] Specifically, after determining at least one target combination for each second mass spectrometry fragment, the neutral loss substructure contained in each target combination is designated as a reference substructure. The difference between the theoretical molecular weight of each reference substructure and the actual molecular weight of the corresponding second mass spectrometry fragment is calculated, and the absolute value is taken to obtain the corresponding third molecular weight deviation value. Then, the third molecular weight deviation values are summed to obtain the final molecular weight deviation value corresponding to the target combination, thereby characterizing the molecular weight deviation between the target combination and the corresponding second mass spectrometry fragment. The target combinations are sorted by final molecular weight deviation value in ascending order, and a predetermined number of them are selected as candidate combinations. The candidate combinations can be understood as target combinations with a high degree of matching with the corresponding second mass spectrometry fragment, that is, the electrically neutral portion that is easily lost during the fragmentation of the parent ion to form the second mass spectrometry fragment.
[0077] Further, each candidate combination is permuted and combined with each first parent nucleus structure to obtain at least one corresponding candidate fragment combination. This permutation and combination is performed using a dynamic programming algorithm or a knapsack algorithm. Next, the difference between the molecular weight of each candidate fragment combination (the molecular weight of the theoretical parent ion corresponding to the candidate fragment combination) and the molecular weight of the experimental parent ion is calculated, and the absolute value is taken to obtain the corresponding fourth molecular weight deviation value. Here, the experimental parent ion is the parent ion in the primary (first-level) mass spectrum of the unknown substance to be identified, and its molecular weight can be determined from that primary mass spectrum. Finally, each fourth molecular weight deviation value is compared with the corresponding deviation threshold. If the fourth molecular weight deviation value is less than the deviation threshold, then the corresponding candidate fragment combination is determined as a suitable fragment combination, that is, a combination constituting the experimental parent ion generated by the unknown substance to be identified.
[0078] Once a suitable fragment combination constituting at least one unknown substance to be identified is determined, at least one undetermined structural formula corresponding to the unknown substance to be identified needs to be determined based on the target bond breaking reaction corresponding to the suitable fragment combination. The undetermined structural formula is a possible structural formula of the unknown substance to be identified. In an embodiment of this application, a feasible determination method is as follows: Since the suitable fragment combination contains a combination of neutral loss substructures (the target combination) and a parent nucleus structure, the corresponding target bond breaking reaction can be determined based on the neutral loss substructure in a single suitable fragment combination, and the target bond breaking reaction corresponding to that suitable fragment combination is identified as the key bond breaking reaction. Further, based on the key bond breaking reactions corresponding to each suitable fragment combination, retrosynthetic operations are performed sequentially to obtain the corresponding reaction products, thereby determining at least one undetermined structural formula of the unknown substance to be identified.
[0079] S106: Perform analytical verification on each undetermined structural formula to obtain the corresponding verification score, and determine the identification result of the unknown substance to be identified based on each verification score.
[0080] Specifically, after at least one undetermined structural formula corresponding to the unknown substance to be identified is determined, a simulated fracture operation is performed on each undetermined structural formula using a preset simulated fracture tool to obtain the corresponding fractured second parent nucleus structure. Further, the second theoretical molecular weight of the second parent nucleus structure is determined, and then the difference between the experimental molecular weight of each second mass spectrometry fragment and the second theoretical molecular weight is calculated and its absolute value is taken to obtain the corresponding fifth molecular weight deviation value. See step S101 for details, which will not be elaborated here.
[0081] The fifth molecular weight deviation value is compared with the corresponding deviation threshold. If the fifth molecular weight deviation value is less than the corresponding deviation threshold, it indicates that the molecular weight deviation between the second parent nucleus structure and the corresponding second mass spectrometry fragment is small. Therefore, the corresponding second mass spectrometry fragment is identified as a successfully matched fragment. Next, the number of successfully matched fragments is counted. The more matched fragments there are, the more mass spectrometry fragments in the actual secondary mass spectrum of the unknown substance can be explained, and the higher the degree of matching between the corresponding undetermined structural formula and the unknown substance to be identified. The ratio of this number of matched fragments to the total number of second mass spectrometry fragments is then determined as the verification score of the corresponding undetermined structural formula. The higher the verification score, the greater the probability that the corresponding undetermined structural formula is the structure of the unknown substance to be identified.
[0082] After the verification score corresponding to each undetermined structural formula is determined, the identification result of the unknown substance to be identified is determined based on each verification score. One feasible implementation method is as follows: the verification score is compared with a preset score threshold. If the verification score exceeds the score threshold, the corresponding undetermined structural formula is determined as a suitable structural formula. Finally, based on the suitable structural formula, the parent nucleus structure corresponding to each mass spectrometry fragment in the actual secondary mass spectrum of the unknown substance to be identified is interpreted, thereby identifying the mass spectrometry peaks in the actual secondary mass spectrum and accurately determining the identification result of the unknown substance to be identified.
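The comparison against the experimental parent ion can be sketched as follows. The sketch assumes that the molecular weight of a candidate fragment combination is the sum of the candidate combination's mass and the parent nucleus structure's mass, and it ignores charge carriers such as an added proton; neither point is stated in the text. It uses a plain Cartesian product rather than a dynamic programming or knapsack algorithm, and all names and masses are hypothetical.

```python
from itertools import product


def suitable_fragment_combinations(candidate_masses, nucleus_masses,
                                   parent_ion_mass, deviation_threshold):
    """Pair every candidate combination with every first parent nucleus
    structure and keep the pairs whose summed mass is within
    deviation_threshold of the experimental parent ion mass.

    Returns (nucleus, candidate, fourth_deviation) sorted by deviation.
    """
    suitable = []
    pairs = product(candidate_masses.items(), nucleus_masses.items())
    for (candidate, candidate_mass), (nucleus, nucleus_mass) in pairs:
        fourth_deviation = abs(candidate_mass + nucleus_mass - parent_ion_mass)
        if fourth_deviation < deviation_threshold:
            suitable.append((nucleus, candidate, fourth_deviation))
    return sorted(suitable, key=lambda item: item[2])


candidates = {"H2O": 18.010565, "CO2": 43.989829}
nuclei = {"core_A": 120.0575, "core_B": 94.0419}
suitable = suitable_fragment_combinations(candidates, nuclei,
                                          parent_ion_mass=138.0681,
                                          deviation_threshold=0.005)
print(suitable)
```

Only the pairing of `core_A` with `H2O` reproduces the parent ion mass within the threshold in this example.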
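The verification score is the fraction of second mass spectrometry fragments explained by the simulated fragmentation of an undetermined structural formula. A minimal sketch, with a hypothetical function name and hypothetical masses:

```python
def verification_score(fragment_masses, simulated_masses, deviation_threshold):
    """Fraction of experimental fragments matched by at least one simulated
    second parent nucleus structure (fifth deviation below the threshold)."""
    if not fragment_masses:
        return 0.0
    matched = sum(
        1 for experimental in fragment_masses
        if any(abs(experimental - theoretical) < deviation_threshold
               for theoretical in simulated_masses)
    )
    return matched / len(fragment_masses)


experimental = [120.0575, 94.0419, 77.0391, 65.0386]
simulated = [120.0577, 94.0415, 51.0229]
score = verification_score(experimental, simulated, deviation_threshold=0.005)
print(score)
```

Two of the four experimental fragments are explained here, so the score is 0.5.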
[0083] The implementation principle of the small molecule unknown substance identification method in this application is as follows: The target secondary mass spectrum is analyzed to determine the target bond breaking reaction and neutral loss substructure that occurs during the formation of the first mass spectrum fragment. Then, the neutral loss substructures are arranged and combined to obtain substructure combinations. These substructure combinations are then considered as the parts that may be lost during the secondary mass spectrometry of the unknown substance to be identified, and matched with the mass spectrum fragments in the actual secondary mass spectrum to finally determine the target combination. Further, based on each target combination, suitable fragment combinations corresponding to the experimental precursor ion of the unknown substance to be identified are determined, thereby determining the possible structure of the experimental precursor ion. Then, through the target bond breaking reaction corresponding to the suitable fragment combination, the possible structural formula of the unknown substance to be identified is deduced in reverse, i.e., the undetermined structural formula. Finally, based on the verification score, a more reasonable undetermined structural formula is selected from the various undetermined structural formulas. Based on this, the actual secondary mass spectrum of the unknown substance to be identified is interpreted, thereby more accurately identifying the unknown substance and improving the accuracy of small molecule unknown substance identification.
[0084] The following are system embodiments of this application, which can be used to execute the method embodiments of this application. For details not disclosed in the system embodiments of this application, please refer to the method embodiments of this application.
[0085] Please see Figure 2, which is a schematic diagram of the structure of the small molecule unknown substance identification system provided in this application embodiment. This small molecule unknown substance identification system can be implemented as all or part of a system through software, hardware, or a combination of both. The system includes a data acquisition module 11, a reaction determination module 12, a structure combination module 13, a fragment matching module 14, a structure determination module 15, and a substance identification module 16.
[0086] Data acquisition module 11 is used to acquire the target secondary mass spectrum of at least one known substance, wherein the known substance is a compound with a known chemical structure;
[0087] The reaction determination module 12 is used to analyze the target secondary mass spectrum, obtain the target bond breaking reaction corresponding to each first mass fragment in the target secondary mass spectrum, and determine the corresponding neutral loss substructure based on the target bond breaking reaction. The target bond breaking reaction is the bond breaking reaction required to obtain the first mass fragment.
[0088] The structure combination module 13 is used to arrange and combine each neutral loss substructure to obtain at least one substructure combination, and to determine the theoretical mass spectrometry fragment molecular weight corresponding to each substructure combination.
[0089] The fragment matching module 14 is used to obtain the actual secondary mass spectrum of the unknown substance to be identified, match the actual secondary mass spectrum with the theoretical mass spectrum fragment molecular weights corresponding to each substructure combination, and determine the substructure combination corresponding to the successfully matched theoretical mass spectrum fragment molecular weights as the target combination.
[0090] The structure determination module 15 is used to determine, based on each target combination, a suitable fragment combination constituting at least one unknown substance to be identified, and to determine at least one undetermined structural formula corresponding to the unknown substance to be identified according to the target bond breaking reaction corresponding to the suitable fragment combination. The suitable fragment combination is a combination of experimental parent ions constituting the unknown substance to be identified.
[0091] The substance identification module 16 is used to analyze and verify each undetermined structural formula, obtain the corresponding verification score, and determine the identification result of the unknown substance to be identified based on each verification score.
[0092] Optionally, the reaction determination module 12 is specifically used for:
[0093] Perform a simulated breaking operation on the target chemical bond in the chemical structure of the known substance to obtain at least one broken first parent nucleus structure. The target chemical bond is a chemical bond that does not contain H.
[0094] Calculate the first theoretical molecular weight corresponding to each first parent nucleus structure, subtract each first theoretical molecular weight from the experimental molecular weight of a single first mass spectrometry fragment in the target secondary mass spectrum, and take the absolute value to obtain the corresponding first molecular weight deviation value.
[0095] If the first molecular weight deviation value is less than the corresponding deviation threshold, then the first parent nucleus structure to which the corresponding first theoretical molecular weight belongs is determined as the final parent nucleus structure of the corresponding first mass spectrometry fragment.
[0096] Based on the final parent nucleus structure, the target bond breaking reaction of the corresponding first mass spectrometry fragment is determined.
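The annotation of a first mass spectrometry fragment can be sketched as below. The deviation test follows the steps above, and the tie-break by smallest bond energy follows the wording of claim 1. The function name, the dictionary fields, and the masses are hypothetical, and the bond energies are rough textbook-style values in kJ/mol used only for illustration.

```python
def select_target_reaction(fragment_mass, candidates, deviation_threshold):
    """Pick the target bond breaking reaction for one first fragment.

    candidates is a list of dicts with keys 'reaction', 'mass' (first
    theoretical molecular weight of the resulting first parent nucleus
    structure) and 'bond_energy'. Candidates within deviation_threshold
    are final parent nucleus structures; among them, the reaction with
    the smallest bond energy is returned, or None if nothing matches.
    """
    finals = [c for c in candidates
              if abs(c["mass"] - fragment_mass) < deviation_threshold]
    if not finals:
        return None
    return min(finals, key=lambda c: c["bond_energy"])["reaction"]


simulated_breaks = [
    {"reaction": "break C-O", "mass": 120.0575, "bond_energy": 358},
    {"reaction": "break C-C", "mass": 120.0577, "bond_energy": 347},
    {"reaction": "break C-N", "mass": 94.0419, "bond_energy": 305},
]
chosen = select_target_reaction(120.0576, simulated_breaks, deviation_threshold=0.005)
print(chosen)
```

The C-N break has the lowest bond energy overall but does not match the fragment mass, so the choice is made between the two matching candidates only.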
[0097] Optionally, the fragment matching module 14 is specifically used for:
[0098] Preprocess the actual secondary mass spectrum to obtain the processed result;
[0099] Determine the actual molecular weight of a single second mass spectrometry fragment in the processed result, calculate the difference between the actual molecular weight and the molecular weight of each theoretical mass spectrometry fragment, and take the absolute value to obtain the second molecular weight deviation value of the corresponding theoretical mass spectrometry fragment molecular weight.
[0100] If the second molecular weight deviation value is less than the corresponding deviation threshold, the corresponding theoretical mass spectrometry fragment molecular weight is determined as the successfully matched theoretical mass spectrometry fragment molecular weight, and the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weight is determined as the target combination.
[0101] Optionally, the structure determination module 15 is specifically used for:
[0102] The neutral loss substructure in each target combination is determined as the reference substructure. The difference between the theoretical molecular weight of each reference substructure and the actual molecular weight of the corresponding second mass spectrometry fragment is calculated and the absolute value is taken to obtain the third molecular weight deviation value.
[0103] The final molecular weight deviation value of the target combination is obtained by summing the third molecular weight deviation values.
[0104] Based on the final molecular weight deviation value in ascending order, a preset number of target combinations are selected from each target combination to determine the candidate combinations;
[0105] Each candidate combination is arranged and combined with each first parent nucleus structure to obtain at least one candidate fragment combination. The difference between the molecular weight of each candidate fragment combination and the molecular weight of the experimental parent ion is calculated and the absolute value is taken to obtain the corresponding fourth molecular weight deviation value. The experimental parent ion is the parent ion corresponding to the unknown substance to be identified.
[0106] If the fourth molecular weight deviation value is less than the corresponding deviation threshold, then the corresponding candidate fragment combination is determined as the suitable fragment combination.
[0107] Optionally, the structure determination module 15 is specifically used for:
[0108] Obtain the target bond breaking reaction corresponding to the neutral loss substructure in a suitable fragment combination and identify it as the key bond breaking reaction;
[0109] Based on the key bond breaking reactions, retrosynthetic operations are performed sequentially to obtain the corresponding reaction products;
[0110] Based on the reaction products, at least one structural formula to be determined for the unknown substance to be identified is obtained.
[0111] Optionally, the substance identification module 16 is specifically used for:
[0112] For each undetermined structural formula, a simulated fracture operation is performed to obtain the corresponding fractured second parent nucleus structure.
[0113] The second theoretical molecular weight of the second parent nucleus structure is determined, and the difference between the experimental molecular weight of each second mass spectrometry fragment and the second theoretical molecular weight is calculated and the absolute value is taken to obtain the corresponding fifth molecular weight deviation value.
[0114] If the fifth molecular weight deviation value is less than the corresponding deviation threshold, then the corresponding second mass spectrometry fragment is determined to be a successfully matched fragment.
[0115] The number of successfully matched fragments is counted, and the ratio of this number to the total number of second mass spectrometry fragments is determined as the verification score of the corresponding undetermined structural formula.
[0116] Optionally, the structure combination module 13 is specifically used for:
[0117] By using a preset knapsack algorithm, each neutral loss substructure is arranged and combined to obtain at least one substructure combination.
[0118] It should be noted that the above embodiments of the small molecule unknown substance identification system are only illustrated by the division of the functional modules described above when executing the small molecule unknown substance identification method. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the small molecule unknown substance identification system and the small molecule unknown substance identification method embodiment provided above belong to the same concept, and the implementation process is detailed in the method embodiment, which will not be repeated here.
[0119] This application also discloses a computer-readable storage medium, which stores a computer program, wherein when the computer program is executed by a processor, it implements a method for identifying small molecule unknowns as described in the above embodiments.
[0120] The computer program can be stored in a computer-readable medium. The computer program includes computer program code, which can be in the form of source code, object code, executable file, or certain middleware. The computer-readable medium includes any entity or device capable of carrying computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the computer-readable medium includes, but is not limited to, the above-mentioned components.
[0121] The method for identifying small molecule unknowns according to the above embodiments is stored in the computer-readable storage medium and loaded and executed on the processor to facilitate the storage and application of the above method.
[0122] This application also discloses an electronic device in which a computer program is stored in a computer-readable storage medium. When the computer program is loaded and executed by a processor, it implements the above-mentioned method for identifying small molecule unknowns.
[0123] The electronic device can be a desktop computer, a laptop computer, or a cloud server, and includes, but is not limited to, a processor and a memory. For example, the electronic device may also include input / output devices, network access devices, and buses.
[0124] The processor can be a central processing unit (CPU). Depending on the actual use, it can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, etc. The general-purpose processor can be a microprocessor or any conventional processor, and this application does not limit it.
[0125] The memory can be an internal storage unit of an electronic device, such as a hard disk or RAM, or an external storage device, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or flash card equipped on the electronic device. Furthermore, the memory can be a combination of an internal storage unit and an external storage device. The memory is used to store computer programs and other programs and data required by the electronic device. The memory can also be used to temporarily store data that has been output or will be output. This application does not limit this.
[0126] In this electronic device, the method for identifying small molecule unknowns according to the above embodiments is stored in the memory of the electronic device and loaded and executed on the processor of the electronic device for convenient use.
[0127] The foregoing description is merely an exemplary embodiment of this disclosure and should not be construed as limiting the scope of this disclosure. Any equivalent changes and modifications made in accordance with the teachings of this disclosure shall still fall within the scope of this disclosure. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not described in this disclosure. The specification and embodiments are considered exemplary only, and the scope and spirit of this disclosure are defined by the claims.
Claims
1. A method for identifying unknown small molecule substances, characterized in that, The method includes: Obtain the target secondary mass spectrum of at least one known substance, wherein the known substance is a compound with a known chemical structure; The target secondary mass spectrum is analyzed to obtain the target bond breaking reaction corresponding to each first mass spectral fragment in the target secondary mass spectrum, including: performing a simulated breaking operation on the target chemical bond in the chemical structure of the known substance to obtain at least one broken first parent nucleus structure, wherein the target chemical bond is a chemical bond without H; Calculate the first theoretical molecular weight corresponding to each first parent nucleus structure, and subtract the experimental molecular weight of each first theoretical molecular weight from the experimental molecular weight of a single first mass spectrometer fragment in the target secondary mass spectrum and take the absolute value to obtain the corresponding first molecular weight deviation value. If the first molecular weight deviation value is less than the corresponding deviation threshold, then the first parent nucleus structure to which the corresponding first theoretical molecular weight belongs is determined as the final parent nucleus structure of the corresponding first mass spectrometry fragment. The bond cleavage reaction with the smallest cleavage bond energy is selected from the bond cleavage reactions corresponding to each final parent nucleus structure and determined as the target bond cleavage reaction for the first mass spectrometry fragment. The corresponding neutral loss substructure is determined based on the target bond cleavage reaction. The target bond cleavage reaction is the bond cleavage reaction required to obtain the first mass spectrometry fragment. 
The neutral loss substructures are arranged and combined to obtain at least one substructure combination, and the theoretical mass spectrometry fragment molecular weight corresponding to each substructure combination is determined. Obtain the actual secondary mass spectrum of the unknown substance to be identified, match the actual secondary mass spectrum with the theoretical mass spectrometry fragment molecular weights corresponding to each of the substructure combinations, and determine the substructure combination corresponding to the successfully matched theoretical mass spectrometry fragment molecular weights as the target combination; Based on each of the target combinations, a suitable fragment combination constituting at least one of the unknown substances to be identified is determined, and at least one undetermined structural formula corresponding to the unknown substance to be identified is determined according to the target bond breaking reaction corresponding to the suitable fragment combination. The suitable fragment combination is a combination of experimental parent ions constituting the unknown substance to be identified. Each of the undetermined structural formulas is analyzed and verified to obtain the corresponding verification score, including: performing a simulated fracture operation on each of the undetermined structural formulas to obtain the corresponding fractured second mother core substructure; The second theoretical molecular weight of the second parent nucleus structure is determined, and the difference between the experimental molecular weight of each second mass spectrometry fragment and the second theoretical molecular weight is calculated and the absolute value is taken to obtain the corresponding fifth molecular weight deviation value. The second mass spectrometry fragment is the mass spectrometry fragment in the processed result after the actual second-order mass spectrum preprocessing. 
if the fifth molecular weight deviation value is less than the corresponding deviation threshold, determining the corresponding second mass spectrometry fragment as a successfully matched fragment;
counting the successfully matched fragments, and determining the ratio of their number to the total number of second mass spectrometry fragments as the verification score of the corresponding undetermined structural formula; and
determining the identification result of the unknown substance to be identified based on each verification score.
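As an illustrative, non-limiting sketch of two steps recited in claim 1, the following Python fragment shows (a) choosing the target bond breaking reaction as the lowest-energy simulated break whose theoretical molecular weight lies within the deviation threshold, and (b) computing the verification score as the ratio of matched fragments to all fragments. The names `CandidateCleavage`, `select_target_reaction` and `verification_score`, the default threshold of 0.01, and the representation of a reaction as a text label are assumptions made for illustration and do not appear in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateCleavage:
    """One simulated bond break of a known structure (hypothetical record)."""
    reaction: str            # label of the bond breaking reaction
    theoretical_mass: float  # first theoretical molecular weight
    bond_energy: float       # energy needed to break the bond (any unit)


def select_target_reaction(fragment_mass, cleavages, threshold=0.01):
    """Return the lowest-energy cleavage whose mass is within the threshold.

    Returns None when no simulated cleavage explains the fragment.
    """
    matching = [c for c in cleavages
                if abs(c.theoretical_mass - fragment_mass) < threshold]
    if not matching:
        return None
    return min(matching, key=lambda c: c.bond_energy)


def verification_score(fragment_masses, theoretical_masses, threshold=0.01):
    """Fraction of observed fragments matched by any theoretical mass."""
    if not fragment_masses:
        return 0.0
    matched = sum(
        1 for observed in fragment_masses
        if any(abs(observed - t) < threshold for t in theoretical_masses)
    )
    return matched / len(fragment_masses)
```

In this sketch a candidate structural formula that explains every observed fragment scores 1.0, and one that explains none scores 0.0; how the scores are turned into a final identification result is left open, as in the claim.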
2. The method for identifying small molecule unknowns according to claim 1, characterized in that the step of matching the actual secondary mass spectrum against the theoretical mass spectrometry fragment molecular weights corresponding to each of the substructure combinations, and determining the substructure combination corresponding to each successfully matched theoretical mass spectrometry fragment molecular weight as a target combination, specifically includes:
preprocessing the actual secondary mass spectrum to obtain the processed result;
determining the actual molecular weight of a single second mass spectrometry fragment in the processed result, and taking the absolute value of the difference between the actual molecular weight and each theoretical mass spectrometry fragment molecular weight to obtain the second molecular weight deviation value of the corresponding theoretical mass spectrometry fragment molecular weight; and
if the second molecular weight deviation value is less than the corresponding deviation threshold, determining the corresponding theoretical mass spectrometry fragment molecular weight as a successfully matched theoretical mass spectrometry fragment molecular weight, and determining the substructure combination corresponding to it as a target combination.
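The matching step of claim 2 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the claim does not specify what the preprocessing consists of, so the relative-intensity filter in `preprocess` is purely an assumed placeholder, and the function names, the dictionary layout and the 0.01 threshold are likewise hypothetical.

```python
def preprocess(peaks, min_relative_intensity=0.01):
    """Drop peaks below a fraction of the base peak.

    ASSUMPTION: the claim only says the spectrum is preprocessed; this
    intensity filter is an illustrative stand-in for that step.
    peaks is a list of (mass, intensity) pairs.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    return [mass for mass, intensity in peaks
            if intensity >= min_relative_intensity * base]


def match_target_combinations(fragment_masses, combination_masses, threshold=0.01):
    """Return the combinations whose theoretical mass matches some fragment.

    combination_masses maps a substructure combination (a tuple of names)
    to its theoretical fragment molecular weight. The result maps each
    target combination to its smallest second molecular weight deviation.
    """
    targets = {}
    for combo, theoretical in combination_masses.items():
        deviations = [abs(observed - theoretical) for observed in fragment_masses]
        best = min(deviations, default=None)
        if best is not None and best < threshold:
            targets[combo] = best
    return targets
```

Keeping the smallest deviation alongside each target combination is a design convenience of the sketch, since later steps rank combinations by deviation.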
3. The method for identifying small molecule unknowns according to claim 2, characterized in that the step of determining, based on each of the target combinations, at least one suitable fragment combination constituting the unknown substance to be identified specifically includes:
determining the neutral loss substructures in each target combination as reference substructures;
taking the absolute value of the difference between the theoretical molecular weight of each reference substructure and the actual molecular weight of the corresponding second mass spectrometry fragment to obtain the third molecular weight deviation values;
summing the third molecular weight deviation values of each target combination to obtain the final molecular weight deviation value of that target combination;
selecting, in order of final molecular weight deviation value from smallest to largest, a preset number of target combinations from the target combinations as candidate combinations;
arranging and combining the candidate combinations with the first parent nucleus structures to obtain at least one candidate fragment combination;
taking the absolute value of the difference between the molecular weight of each candidate fragment combination and the molecular weight of the experimental parent ion to obtain the corresponding fourth molecular weight deviation value, wherein the experimental parent ion is the parent ion corresponding to the unknown substance to be identified; and
if the fourth molecular weight deviation value is less than the corresponding deviation threshold, determining the corresponding candidate fragment combination as a suitable fragment combination.
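The ranking and parent-ion check of claim 3 might look like the sketch below. It assumes, for illustration only, that the molecular weight of a candidate fragment combination is the sum of the parent nucleus mass and the mass of the candidate combination; the claim does not state how that weight is computed. The function names, dictionary inputs and default threshold are hypothetical.

```python
from itertools import product


def rank_target_combinations(target_deviations, preset_number):
    """Keep the preset number of target combinations with the smallest
    final (summed) molecular weight deviation.

    target_deviations maps a combination name to the list of its third
    molecular weight deviation values, one per reference substructure.
    """
    totals = {name: sum(devs) for name, devs in target_deviations.items()}
    ranked = sorted(totals, key=totals.get)
    return ranked[:preset_number]


def suitable_fragment_combinations(candidates, parent_nuclei,
                                   parent_ion_mass, threshold=0.01):
    """Pair every candidate combination with every parent nucleus and keep
    the pairs whose mass is within the threshold of the parent ion.

    ASSUMPTION: the mass of a pair is nucleus mass + combination mass.
    candidates and parent_nuclei map names to masses.
    """
    suitable = []
    for (combo, combo_mass), (nucleus, nucleus_mass) in product(
            candidates.items(), parent_nuclei.items()):
        fourth_deviation = abs(combo_mass + nucleus_mass - parent_ion_mass)
        if fourth_deviation < threshold:
            suitable.append((nucleus, combo))
    return suitable
```

A typical use would call `rank_target_combinations` first, build `candidates` from the names it returns, and then pass them to `suitable_fragment_combinations` together with the experimental parent ion mass.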
4. The method for identifying small molecule unknowns according to claim 1, characterized in that the step of determining at least one undetermined structural formula corresponding to the unknown substance to be identified according to the target bond breaking reaction corresponding to the suitable fragment combination specifically includes:
obtaining the target bond breaking reaction corresponding to each neutral loss substructure in the suitable fragment combination and determining it as a key bond breaking reaction;
performing retrosynthetic operations sequentially based on the key bond breaking reactions to obtain the corresponding reaction products; and
obtaining at least one undetermined structural formula of the unknown substance to be identified based on the reaction products.
5. The method for identifying small molecule unknowns according to claim 1, characterized in that the step of arranging and combining the neutral loss substructures to obtain at least one substructure combination specifically includes:
arranging and combining the neutral loss substructures by means of a preset knapsack algorithm to obtain at least one substructure combination.
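Claim 5 names a knapsack algorithm without specifying its variant. One plausible reading, offered only as a sketch, is an unbounded-knapsack-style enumeration: every multiset of neutral loss substructures whose summed mass fits within a mass capacity is listed. The function name `enumerate_combinations`, the `capacity` and `max_items` parameters, and the decision to allow a substructure to repeat are all assumptions.

```python
def enumerate_combinations(substructure_masses, capacity, max_items=3):
    """Enumerate multisets of neutral loss substructures whose summed mass
    does not exceed capacity, with at most max_items members.

    substructure_masses maps a substructure name to its mass.
    Returns a list of (combination, total_mass) tuples.
    """
    names = sorted(substructure_masses)
    results = []

    def extend(start, chosen, total):
        if chosen:
            results.append((tuple(chosen), total))
        if len(chosen) == max_items:
            return
        for i in range(start, len(names)):
            mass = substructure_masses[names[i]]
            if total + mass <= capacity:  # prune branches that overflow
                chosen.append(names[i])
                # Recurse from i (not i + 1) so a substructure may repeat.
                extend(i, chosen, total + mass)
                chosen.pop()

    extend(0, [], 0.0)
    return results
```

Starting each recursion at index `i` keeps combinations in sorted order, so the same multiset is never emitted twice in a different arrangement. The capacity would naturally be bounded by the mass of the parent ion, although the claim does not say so.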
6. A system for identifying small molecule unknowns, used to implement the method for identifying small molecule unknowns according to any one of claims 1 to 5, characterized in that the system comprises:
a data acquisition module (11), used to acquire the target secondary mass spectrum of at least one known substance, wherein the known substance is a compound with a known chemical structure;
a reaction determination module (12), used to analyze the target secondary mass spectrum to obtain the target bond breaking reaction corresponding to each first mass spectrometry fragment in the target secondary mass spectrum, and to determine the corresponding neutral loss substructure based on the target bond breaking reaction, wherein the target bond breaking reaction is the bond breaking reaction required to obtain the first mass spectrometry fragment;
a structure combination module (13), used to arrange and combine the neutral loss substructures to obtain at least one substructure combination, and to determine the theoretical mass spectrometry fragment molecular weight corresponding to each substructure combination;
a fragment matching module (14), used to obtain the actual secondary mass spectrum of the unknown substance to be identified, match the actual secondary mass spectrum against the theoretical mass spectrometry fragment molecular weights corresponding to each of the substructure combinations, and determine the substructure combination corresponding to each successfully matched theoretical mass spectrometry fragment molecular weight as a target combination;
a structure determination module (15), used to determine, based on each of the target combinations, at least one suitable fragment combination constituting the unknown substance to be identified, and to determine at least one undetermined structural formula corresponding to the unknown substance to be identified according to the target bond breaking reaction corresponding to the suitable fragment combination, wherein the suitable fragment combination is a combination of fragments that makes up the experimental parent ion of the unknown substance to be identified; and
a substance identification module (16), used to analyze and verify each of the undetermined structural formulas to obtain the corresponding verification score, and to determine the identification result of the unknown substance to be identified based on each of the verification scores.
7. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is loaded and executed by a processor, it implements the method according to any one of claims 1 to 5.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor loads and executes the computer program, it implements the method according to any one of claims 1 to 5.