Database construction method, substance identification apparatus, system and method, and computer device, program product and medium

By comparing the protein fingerprint of the microorganism to be tested with a reference object in two stages, the protein difference information is used for accurate identification, which solves the identification error problem caused by the high similarity of multiple microorganisms or the presence of multiple microorganisms in the sample, and improves the accuracy and efficiency of microbial identification.

WO2026124684A1 · PCT designated stage · Publication Date: 2026-06-18 · ZYBIO INC

Patent Information

Authority / Receiving Office: WO
Patent Type: Applications
Current Assignee / Owner: ZYBIO INC
Filing Date: 2025-12-26
Publication Date: 2026-06-18

AI Technical Summary

Technical Problem

In existing technologies, when the results of microbial mass spectrometry detection are highly similar to the results of mass spectrometry detection of multiple known categories of microorganisms, or when the sample contains multiple microorganisms, the identification results are prone to large errors and low efficiency.

Method used

The protein fingerprint of the test object is compared with the standard protein fingerprint of each reference object in a two-stage process: dissimilar reference objects are first excluded to obtain candidate reference objects with high similarity, and the protein difference information is then used for a second comparison to determine the identification result.

Benefits of technology

It improves the accuracy and efficiency of microbial identification, can identify mixed reference results, and ensures the precision of identification results.

✦ Generated by Eureka AI based on patent content.


Abstract

Disclosed in the present application are a database construction method, a substance identification apparatus, system and method, and a computer device, a program product and a medium. The substance identification method comprises: performing first similarity comparison on a protein fingerprint spectrum of an object under test and a standard protein fingerprint spectrum of each reference object, so as to obtain a plurality of candidate reference objects similar to the object under test; and on the basis of protein difference information of the candidate reference objects, performing second similarity comparison on the object under test and the candidate reference objects, and on the basis of comparison results of the candidate reference objects, determining an identification result of the object under test. On the basis of protein difference information corresponding to each candidate reference object determined during primary comparison, secondary similarity comparison is performed on an object under test, so as to determine, on the basis of a secondary comparison result, an identification result of the object under test. Therefore, by means of the use of protein difference information, the problem of it not being possible to determine an identification result of an object under test due to the relatively high similarity between protein fingerprint spectra of a plurality of microorganisms of known categories or the existence of a plurality of microorganisms in the object under test is solved, thereby improving the accuracy of microorganism identification.

Description

Database construction methods, substance identification devices, systems, methods, computer equipment, program products and media

Technical Field

[0001] This disclosure generally relates to the field of microbial detection technology, and specifically to a substance identification device, method, computer equipment, program product, and medium.

Background Technology

[0002] In existing technologies, the identification results of the microorganisms to be tested can be determined by applying mass spectrometry ionization technology. Specifically, matrix-assisted laser desorption / ionization technology can be used to determine the mass spectrometry detection results of the microorganisms to be tested, and the species of the microorganisms to be tested can be determined by comparing the mass spectrometry detection results with the standard mass spectrometry detection results of known categories of microorganisms.

[0003] However, when the mass spectrometry results of the microorganism to be tested are highly similar to the standard mass spectrometry results of multiple known categories of microorganisms, or when the microorganism sample to be tested contains multiple microorganisms, it is easy to be unable to determine the species of the microorganism to be tested, thus leading to a certain error in the identification results of the microorganism to be tested.

[0004] On the other hand, the mass spectrometry results of the microorganism to be tested are compared one by one with the standard mass spectrometry results of multiple known categories of microorganisms. When there are multiple matching standard mass spectrometry results of the microorganism to be tested, or when there are multiple microorganisms in the microorganism sample to be tested, it is necessary to further compare the multiple matching standard mass spectrometry results. In this case, in addition to the accuracy of the identification results being a pain point in the industry, the time spent in the above process also reduces the efficiency of microorganism identification.

[0005] Therefore, the poor accuracy of microbial identification results in existing technologies is a problem that urgently needs to be solved.

Summary of the Invention

[0006] In view of the above-mentioned defects or deficiencies in the prior art, it is desirable to provide a substance identification device, method, computer equipment, program product and medium that compares the test object with reference objects based on protein difference information, so as to more accurately determine the degree of similarity between the test object and each reference object, thereby obtaining more accurate identification results of the test object and improving the accuracy of current microbial identification.

[0007] In a first aspect, the present invention provides a substance identification device, comprising a detection module for detecting a test object and obtaining its protein fingerprint, a storage module for storing microbial identification-related data, and a data processing module for retrieving data from the storage module for data processing. The data processing module is configured as follows:

[0008] The protein fingerprint of the test object is compared with the standard protein fingerprint of each reference object to obtain multiple candidate reference objects that are similar to the test object. The candidate reference objects include at least two types, and the two types include different species or different subspecies.

[0009] Determine the protein difference information for each candidate reference object; the protein difference information is the protein feature expression of the candidate reference object, and the protein difference information is used to distinguish the candidate reference object from one or more other candidate reference objects;

[0010] Based on the protein difference information of the candidate reference objects, a second similarity comparison is performed between the test object and the candidate reference objects, and the identification result of the test object is determined based on the comparison results of each candidate reference object.

[0011] In a second aspect, a method for identifying substances is provided, comprising: performing a first similarity comparison between the protein fingerprint spectrum of the test object and the standard protein fingerprint spectrum of each reference object to obtain multiple candidate reference objects similar to the test object, wherein the candidate reference objects include at least two types, and the two types include different species or different subspecies;

[0012] Determine the protein difference information for each candidate reference object; the protein difference information is the protein feature expression of the candidate reference object, and the protein difference information is used to distinguish the candidate reference object from one or more other candidate reference objects;

[0013] Based on the protein difference information of the candidate reference objects, a second similarity comparison is performed between the test object and the candidate reference objects, and the identification result of the test object is determined based on the comparison results of each candidate reference object.

[0014] In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, it performs the steps performed by the substance identification device as described in the first aspect.

[0015] In a fourth aspect, a computer-readable storage medium is provided having a computer program stored thereon, characterized in that, when executed by a processor, the program performs the steps performed by the substance identification device as described in the first aspect.

[0016] In a fifth aspect, a computer program product is provided, which includes instructions that, when executed, cause the steps performed by the substance identification device in the first aspect to be performed.

[0017] In a sixth aspect, a method for creating a database is provided, comprising: determining multiple reference object combinations, each reference object combination including multiple reference objects with similar protein feature expression; for each reference object combination, performing protein expression difference analysis on each reference object in the reference object combination to obtain protein difference information of each reference object in the reference object combination, wherein the protein difference information is the protein feature expression of the reference object and is used to distinguish the reference object from the other reference objects in the reference object combination; and creating a difference information database based on the correspondence between each reference object combination and the protein difference information of each reference object in the reference object combination.
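The database creation method above can be illustrated with a minimal Python sketch. The source does not specify how the protein expression difference analysis is carried out; this sketch assumes, purely for illustration, that the difference information of a reference object is the set of its characteristic peaks (m/z values) that no other member of the same combination has within a tolerance. The names `build_difference_db`, `ref_peaks` and `tol`, and all numeric values, are hypothetical.

```python
from typing import Dict, List, Tuple


def build_difference_db(
    combinations: List[List[str]],
    ref_peaks: Dict[str, List[float]],
    tol: float = 2.0,
) -> Dict[Tuple[str, ...], Dict[str, List[float]]]:
    """Map each reference object combination to per-reference difference peaks.

    Assumption: a peak is 'difference information' for a reference object if
    no other member of the same combination has a peak within `tol` of it.
    """
    db: Dict[Tuple[str, ...], Dict[str, List[float]]] = {}
    for combo in combinations:
        diff: Dict[str, List[float]] = {}
        for ref in combo:
            # Collect every peak of the other members of this combination.
            others = [mz for other in combo if other != ref for mz in ref_peaks[other]]
            # Keep only the peaks that distinguish `ref` from the others.
            diff[ref] = [
                mz for mz in ref_peaks[ref]
                if all(abs(mz - o) > tol for o in others)
            ]
        db[tuple(combo)] = diff
    return db


# Hypothetical characteristic peaks (m/z) of two similar reference objects.
ref_peaks = {
    "A": [4365.0, 5097.0, 7274.0],
    "B": [4365.0, 5097.0, 7158.0],
}
db = build_difference_db([["A", "B"]], ref_peaks)
print(db[("A", "B")])
```

Keying the database by the combination itself mirrors the stated correspondence between each reference object combination and the difference information of its members.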

[0018] In a seventh aspect, a method for identifying microorganisms is provided, comprising: acquiring a protein fingerprint of a sample to be tested; matching the protein fingerprint of the sample to be tested with a standard protein fingerprint to obtain a first matching result; determining whether the first matching result belongs to a reference object combination; if so, re-matching the protein fingerprint of the sample to be tested with a difference information database created based on the above-mentioned database creation method to obtain a second matching result, and confirming the second matching result as the identification result; if not, confirming the first matching result as the identification result.

[0019] In an eighth aspect, a method for identifying microorganisms is provided, comprising: acquiring a protein fingerprint of a sample to be tested; matching the protein fingerprint of the sample to be tested with a standard protein fingerprint to obtain a first matching result; determining whether the first matching result belongs to a reference object combination; if so, re-matching the protein fingerprint of the sample to be tested with a difference information database created based on the aforementioned database creation method to obtain a second matching result, and confirming the second matching result as the identification result; if not, determining whether there are at least two types of strains in the first matching result; if there are, performing protein expression difference analysis on all types of strains in the first matching result to obtain protein difference information of each type of strain, re-matching the protein fingerprint of the sample to be tested with the protein difference information of each type of strain to obtain a third matching result, and using the third matching result as the identification result; if there are not, using the first matching result as the identification result.
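The branching in the eighth aspect can be sketched as a small Python function. This is only an illustration of the decision order, not the applicant's implementation: the three callables (`find_combination`, `rematch_with_database`, `rematch_with_fresh_analysis`) are hypothetical placeholders for steps the source describes in words, and the first matching result is assumed to be a list of strain names with the best match first.

```python
from typing import Callable, List, Optional, Tuple


def identify(
    first_match: List[str],
    find_combination: Callable[[List[str]], Optional[Tuple[str, ...]]],
    rematch_with_database: Callable[[Tuple[str, ...]], str],
    rematch_with_fresh_analysis: Callable[[List[str]], str],
) -> str:
    """Return the identification result following the three-way branch."""
    if not first_match:
        raise ValueError("first matching result is empty")
    # Branch 1: the first matching result belongs to a known combination,
    # so re-match against the pre-built difference information database.
    combination = find_combination(first_match)
    if combination is not None:
        return rematch_with_database(combination)  # second matching result
    # Branch 2: no known combination, but at least two types of strains,
    # so run the difference analysis on the fly and re-match.
    if len(set(first_match)) >= 2:
        return rematch_with_fresh_analysis(first_match)  # third matching result
    # Branch 3: a single unambiguous strain; keep the first matching result.
    return first_match[0]


# Stub callables standing in for the real matching steps (hypothetical).
result_db = identify(["A", "B"], lambda m: ("A", "B"), lambda c: "B", lambda m: "unused")
result_fresh = identify(["C", "D"], lambda m: None, lambda c: "unused", lambda m: "D")
result_single = identify(["C"], lambda m: None, lambda c: "unused", lambda m: "unused")
print(result_db, result_fresh, result_single)
```

The seventh aspect corresponds to the same function with the second branch removed.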

[0020] In a ninth aspect, a microbial identification system is provided, including a mass analysis device, a memory, and a processor. The memory stores instructions, and the processor executes the instructions to perform the steps performed by the aforementioned substance identification device or the aforementioned database creation method, or the processor executes the instructions to perform the aforementioned microbial identification method.

[0021] In a tenth aspect, a substance identification device includes a detection module for detecting a test object and obtaining its protein fingerprint spectrum, a storage module for storing substance identification-related data, and a data processing module for calling data from the storage module for data processing. The data processing module is configured to: perform a first similarity comparison between the protein fingerprint spectrum of the test object and the standard protein fingerprint spectra of each reference object to obtain multiple candidate reference objects similar to the test object, wherein the candidate reference objects include at least two types, and the two types include different species or different subspecies; determine mixed information based on the candidate reference objects, and output a normal identification result or a mixed reference result for the test object based on the mixed information.

[0022] Existing technologies directly compare the test object with known categories of microorganisms to determine its identification result, which introduces a certain degree of error. By contrast, the substance identification device, method, computer equipment, program product, and medium provided in this application perform a two-stage comparison of the test object, greatly improving identification accuracy. First, a first similarity comparison is performed between the standard protein fingerprint spectrum of each reference object and the protein fingerprint spectrum of the test object, eliminating some reference objects and obtaining candidate reference objects that are relatively similar to the test object. Second, based on the protein difference information corresponding to each candidate reference object determined in the first comparison, a second similarity comparison is performed on the test object to obtain its identification result. Because protein difference information can distinguish different candidate reference objects, comparing the test object with the candidate reference objects again based on this information further clarifies the similarity between the test object and each candidate reference object. This allows a more accurate identification result of the test object to be obtained from its similarity with each candidate reference object, thus solving the problem of uncertain identification results caused by the high similarity of protein fingerprints from multiple known categories of microorganisms or by the presence of multiple microorganisms in the test object, thereby improving the accuracy of microbial identification. Furthermore, mixed reference results can also be identified in microbial identification, further improving the accuracy of identification.

Attached Figure Description

[0023] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

[0024] Figure 1 is a structural diagram of the substance identification device provided in an embodiment of this application;

[0025] Figure 2 is a schematic flowchart of a substance identification method provided in an embodiment of this application;

[0026] Figure 3 is a flowchart illustrating another substance identification method provided in an embodiment of this application;

[0027] Figure 4 is a schematic diagram of the identification result of the test object provided in an embodiment of this application;

[0028] Figure 5 is a schematic diagram of the structure of the computer device provided in an embodiment of this application;

[0029] Figure 6 is an implementation environment architecture diagram of the database creation method provided in the embodiments of this application.

Detailed Implementation

[0030] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0031] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. The present application will now be described in detail with reference to the accompanying drawings and embodiments. Furthermore, the term "and / or" in this document is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The terms "first" and "second," etc., in the specification and claims of the embodiments of this application are used to distinguish different objects, not to describe a specific order of objects.

[0032] First, the terminology used in this application will be explained as follows:

[0033] (1) Mass Spectrometry: This is an analytical technique that identifies and quantifies the substances present in an analyte by measuring the mass-to-charge ratio of each ion in the analyte. The mass-to-charge ratio is the ratio between the mass and the charge of an ion. For example, it can be written m / z, where m is the mass of the ion, measured in u or Da, and z is the charge number of the ion. Its working principle is as follows: First, the analyte is ionized through electron bombardment, chemical ionization, or matrix-assisted laser desorption / ionization. Then, an electric field is used to accelerate the charged ions, giving them kinetic energy. The ion beam is then passed through a magnetic field to deflect the ions according to their mass-to-charge ratio. Finally, a detector is used to measure the number of ions.

[0034] (2) Protein fingerprinting: Also known as protein fingerprint sequence mapping, it is a spectrum drawn based on the mass-to-charge ratio of each ion after the separation and identification of proteins in a biological sample. It can display information such as the molecular weight and content of various proteins in the sample. Protein fingerprinting can characterize the relationship between the mass-to-charge ratio (m / z) of ions and the ion abundance (i.e., intensity). The peaks with higher ion abundance (i.e., higher intensity peaks) in the protein fingerprint spectrum can be called characteristic peaks in the protein fingerprint spectrum.
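As a rough illustration of the definition above, a protein fingerprint spectrum can be held as a list of (m/z, intensity) pairs, with the characteristic peaks taken to be the higher-intensity ones. The source does not say how "higher" is decided; the relative threshold `rel_threshold` used here, the function name, and the sample values are all assumptions made for this sketch.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, ion abundance / intensity)


def characteristic_peaks(spectrum: List[Peak], rel_threshold: float = 0.2) -> List[Peak]:
    """Keep peaks whose intensity is at least `rel_threshold` times the strongest peak.

    Assumption: 'higher intensity' is judged relative to the base peak.
    """
    if not spectrum:
        return []
    max_intensity = max(intensity for _, intensity in spectrum)
    return [
        (mz, intensity)
        for mz, intensity in spectrum
        if intensity >= rel_threshold * max_intensity
    ]


# Hypothetical spectrum: four peaks with made-up m/z and intensity values.
spectrum = [(4365.2, 120.0), (5096.8, 900.0), (6255.4, 40.0), (9742.1, 300.0)]
print(characteristic_peaks(spectrum))
```

With the default threshold the cut-off is 180.0, so only the peaks at 5096.8 and 9742.1 are retained.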

[0035] Figure 1 is an architectural diagram of a substance identification device provided in an embodiment of this application. As shown in Figure 1, the substance identification device includes: a detection module 101, a storage module 102, and a data processing module 103.

[0036] For example, the detection module 101 can be used to perform mass spectrometry analysis on the analyte and obtain the protein fingerprint spectrum of the analyte. The analyte can be a microbial sample, and the detection module 101 can include a mass spectrometry identification instrument using matrix-assisted laser desorption / ionization technology (i.e., MALDI-TOF-MS). Based on this, the substance placed into the mass spectrometry identification instrument can be a substance produced after processing an in vitro sample.

[0037] The storage module 102 can be used to store data related to the identification of substances in this application. Such data can include protein fingerprints obtained by the detection module 101, protein fingerprints and / or genetic information of various known categories of reference objects (e.g., known categories of microorganisms), and other relevant information.

[0038] The data processing module 103 can be used to call relevant data in the storage module 102 and use the relevant data to perform data processing to execute the substance identification method described in the embodiments of this application. The data processing module 103 may be, for example, a computer device.

[0039] In a specific implementation, the detection module 101 can perform mass spectrometry detection on the test object to obtain the protein fingerprint spectrum of the test object. Further, the data processing module 103 compares the protein fingerprint spectrum of the test object with the protein fingerprint spectrum of other known categories of microorganisms stored in the storage module 102 to determine the identification result of the test object.

[0040] However, when the protein fingerprint of the microorganism to be tested is highly similar to the standard mass spectrometry detection results of multiple known categories of microorganisms, or when the sample of the microorganism to be tested contains multiple microorganisms, the species of the microorganism to be tested often cannot be determined. In addition, when the sample contains multiple microorganisms, mutual interference between them can easily introduce errors into the identification results.

[0041] Based on this, this application provides a method for substance identification, which can be executed by the substance identification device shown in Figure 1, specifically by the data processing module 103. On one hand, the method or device can perform a similarity comparison between the standard protein fingerprint spectrum of a reference object and the protein fingerprint spectrum of the object to be tested, eliminating some reference objects and obtaining candidate reference objects that are relatively similar to the object to be tested. On the other hand, based on the protein difference information corresponding to each candidate reference object determined in the first comparison, a second similarity comparison is performed on the object to be tested to obtain the identification result of the object to be tested. Since protein difference information can distinguish different candidate reference objects, comparing the object to be tested with the candidate reference objects again based on the protein difference information can further clarify the degree of similarity between the object to be tested and each candidate reference object, thereby obtaining a more accurate identification result of the object to be tested based on the degree of similarity with each candidate reference object. This solves the problem that the identification result of the object to be tested cannot be determined due to the high similarity of protein fingerprint spectra of multiple known categories of microorganisms or the presence of multiple microorganisms in the object to be tested, thus improving the accuracy of microbial identification.
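The second similarity comparison described above can be sketched in Python as follows. The source does not define how the second comparison is scored; this sketch assumes, for illustration only, that each candidate's protein difference information is a list of distinguishing peak positions and that the score is the fraction of those peaks found in the test spectrum within a tolerance. `second_comparison`, `diff_info`, `tol` and the numeric values are hypothetical.

```python
from typing import Dict, List


def second_comparison(
    test_peaks: List[float],
    diff_info: Dict[str, List[float]],
    tol: float = 2.0,
) -> Dict[str, float]:
    """Score each candidate by the share of its difference peaks seen in the test spectrum."""
    results: Dict[str, float] = {}
    for ref, diff_peaks in diff_info.items():
        if not diff_peaks:
            # No distinguishing peaks available for this candidate.
            results[ref] = 0.0
            continue
        hits = sum(
            1 for d in diff_peaks
            if any(abs(d - mz) <= tol for mz in test_peaks)
        )
        results[ref] = hits / len(diff_peaks)
    return results


# Hypothetical difference peaks of two candidates left after the first comparison.
diff_info = {"A": [7274.0, 8369.0], "B": [7158.0, 8350.0]}
test_peaks = [4365.2, 5096.8, 7273.5, 8368.1]

results = second_comparison(test_peaks, diff_info)
best = max(results, key=results.get)
print(results, best)
```

Because only the distinguishing peaks are scored, peaks shared by all candidates (here 4365.2 and 5096.8) no longer mask the difference between A and B.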

[0042] Protein difference information can be acquired in real time or pre-stored in storage module 102.

[0043] For example, Figure 2 is a schematic flowchart of a substance identification method provided in an embodiment of this application. As shown in Figure 2, the data processing module 103 is configured to perform the following steps:

[0044] Step S201: Perform a first similarity comparison between the protein fingerprint of the object to be tested and the standard protein fingerprint of each reference object to obtain multiple candidate reference objects that are similar to the object to be tested.

[0045] Optionally, the candidate reference objects may include at least two types, which may include different species or different subspecies.

[0046] In this embodiment, the protein fingerprint of the test object is first compared with the standard protein fingerprints of various known categories of reference objects to screen out candidate reference objects similar to the test object. This first similarity comparison eliminates some reference objects, significantly narrowing the comparison range for the subsequent second similarity comparison, thereby improving identification efficiency to some extent.

[0047] In one possible implementation, the analyte may include at least one microorganism. During mass spectrometry analysis, microorganisms of the same category exhibit essentially consistent relationships between mass-to-charge ratio and ion abundance among the ions produced by the breakdown of their proteins; that is, the protein fingerprint spectra of microorganisms of the same category are essentially consistent. Therefore, the microorganism to be tested can be identified by obtaining its protein fingerprint spectrum through mass spectrometry analysis and comparing that spectrum with the protein fingerprint spectra of various known categories of microorganisms. Accordingly, in this embodiment, the protein fingerprint spectrum of the analyte is first compared with the standard protein fingerprint spectrum of each reference object to identify microorganisms similar to the microorganism to be tested, i.e., the candidate reference objects described in this embodiment.

[0048] In one possible implementation, standard protein fingerprints of each reference object can be obtained from the standard fingerprint library stored in the storage module 102. Each reference object can be a known category of microorganism.

[0049] For example, a large number of protein fingerprints of known microorganisms can be pre-acquired and stored in storage module 102 to construct a standard fingerprint library composed of standard protein fingerprints of various microorganisms.

[0050] In one possible implementation, N reference objects with the highest similarity to the test object can be determined based on the similarity results between the test object and each reference object. Here, N is an integer greater than 1. Furthermore, candidate reference objects for the test object can be determined based on these N reference objects.

[0051] For example, the similarity results between the test object and each reference object can characterize the degree of similarity between the test object and each reference object. Specifically, the similarity results can include similarity scores between the test object and each reference object. The magnitude of the similarity score is positively correlated with the degree of similarity; that is, the higher the similarity score between the test object and a certain reference object, the higher the similarity between the test object and that reference object.

[0052] For example, the similarity results between the test object and the reference object can be obtained by comparing the similarity between the protein fingerprint of the test object and the standard protein fingerprint of each reference object.

[0053] Specifically, the similarity score between the test object and each reference object can be determined by comparing the protein fingerprint of the test object with the standard protein fingerprint of each reference object. If the similarity score between the test object and each reference object is positively correlated with the degree of similarity between the test object and each reference object, the similarity scores can be sorted in descending order, and the N reference objects corresponding to the top N similarity scores in the descending order can be identified as candidate reference objects.
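The top-N selection described above amounts to a descending sort followed by a slice. A minimal Python sketch, assuming the similarity scores are already available as a mapping from reference object name to score (the function name and the score values are hypothetical):

```python
from typing import Dict, List


def top_n_references(scores: Dict[str, float], n: int) -> List[str]:
    """Return the N reference objects with the highest similarity scores."""
    if n <= 1:
        raise ValueError("N must be an integer greater than 1")
    # Sort by score in descending order, then keep the first N names.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical similarity scores between the test object and five references.
scores = {"A": 2.31, "B": 2.18, "C": 2.05, "D": 1.40, "E": 0.92}
print(top_n_references(scores, 3))
```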

[0054] In one possible implementation, a scoring method can be used to compare the similarity between the protein fingerprint of the test object and the standard protein fingerprint of each reference object.

[0055] Specifically, the bar-shaped peaks of the protein fingerprint spectrum of the test object can be compared with those of the standard protein fingerprint spectrum of each reference object. When the protein fingerprint spectrum of the test object contains the same characteristic peak as the protein fingerprint spectrum of a certain reference object, a score is added to that reference object; otherwise, a score is deducted. The similarity result between the test object and each reference object is then obtained from the resulting score.
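The add-or-deduct scoring above can be sketched as follows. The source gives no numeric details, so the matching tolerance `tol`, the `reward` and `penalty` amounts, and the example peak positions are all assumptions chosen for illustration.

```python
from typing import List


def match_score(
    test_peaks: List[float],
    ref_peaks: List[float],
    tol: float = 2.0,
    reward: float = 1.0,
    penalty: float = 0.5,
) -> float:
    """Add `reward` for each reference peak found in the test spectrum, else deduct `penalty`."""
    score = 0.0
    for ref_mz in ref_peaks:
        # Two peaks are treated as 'the same' if their m/z values differ by at most `tol`.
        if any(abs(ref_mz - mz) <= tol for mz in test_peaks):
            score += reward
        else:
            score -= penalty
    return score


# Hypothetical characteristic peak positions (m/z).
test_peaks = [4365.0, 5097.0, 9742.0]
ref_a = [4365.5, 5096.8, 9741.0]  # all three peaks match the test object
ref_b = [4365.5, 6255.0, 7100.0]  # only the first peak matches

score_a = match_score(test_peaks, ref_a)
score_b = match_score(test_peaks, ref_b)
print(score_a, score_b)
```

Here reference A scores 3.0 and reference B scores 0.0, so A ranks higher in the first similarity comparison.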

[0056] In this embodiment of the application, candidate reference objects for the object to be tested can be determined based on the above N reference objects in the following three ways.

[0057] Method 1: N reference objects can be directly identified as candidate reference objects.

[0058] For example, when reference objects A, B, and C with the highest similarity to the test object (i.e., the highest score) are determined based on the similarity scores between the test object and each reference object, reference objects A, B, and C can be identified as candidate reference objects.

[0059] Method 2: Reference objects of the same type as the N reference objects can be used as candidate reference objects.

[0060] For example, the reference objects of the same type can be other subspecies belonging to the same bacterial species as the N reference objects. Specifically, when A1 and A2 are determined, based on the similarity scores between the test object and each reference object, to have the highest similarity to the test object, and A1 and A2 are subspecies of the same bacterial species, subspecies A1 and A2, as well as subspecies A3 of that bacterial species, can be determined as candidate reference objects.
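Method 2 can be illustrated with a short Python sketch. It assumes a lookup table from each reference object to its bacterial species is available; the table `species_of` and its contents are hypothetical.

```python
from typing import Dict, List


def expand_to_same_species(top_refs: List[str], species_of: Dict[str, str]) -> List[str]:
    """Return every reference object that shares a species with any of the top-N references."""
    species = {species_of[ref] for ref in top_refs}
    return sorted(ref for ref, sp in species_of.items() if sp in species)


# Hypothetical taxonomy: A1, A2 and A3 are subspecies of the same species.
species_of = {
    "A1": "species A",
    "A2": "species A",
    "A3": "species A",
    "B1": "species B",
}
print(expand_to_same_species(["A1", "A2"], species_of))
```

Although only A1 and A2 were among the top-N references, A3 is added because it belongs to the same species.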

[0061] Method 3: When there is a target reference object belonging to the reference object combination among the N reference objects, candidate reference objects can be determined based on the reference object combination and the N reference objects.

[0062] The reference object combination can include multiple reference objects with similar protein feature expression. Protein feature expression can be the expression of ions from microbial protein degradation in a protein fingerprint, such as the relationship between ion mass-to-charge ratio and ion abundance, i.e., characteristic peaks in the protein fingerprint. Therefore, multiple reference objects with similar protein feature expression can be multiple reference objects with similar protein fingerprints.

[0063] For example, the reference object combination can be a list of similar bacteria, or a combination of individual bacteria in the list of similar bacteria.

[0064] Specifically, when reference objects A, B, and C with the highest similarity to the test object are determined based on the similarity scores between the test object and each reference object, and reference objects A and B belong to the same reference object combination, reference objects A, B, D, and E in the reference object combination to which reference objects A and B belong, together with reference object C, can all be determined as candidate reference objects.

[0065] Taking the reference object combination as a list of similar bacteria as an example, when the N reference objects determined based on the similarity scores between the test object and each reference object include a target reference object A that belongs to the list of similar bacteria 1, all reference objects included in the list of similar bacteria 1, together with the N reference objects other than reference object A, can be determined as candidate reference objects.

[0066] Optionally, when reference objects A, B, and C with the highest similarity to the test object are determined based on their similarity scores with each reference object, and reference object A belongs to reference object combination 1 and reference object B belongs to reference object combination 2, then reference objects A, D, and E from reference object combination 1 (to which reference object A belongs), B, F, and G from reference object combination 2 (to which reference object B belongs), and reference object C can all be identified as candidate reference objects. Note that there is no necessary connection between reference object combination 1 and reference object combination 2.

[0067] Taking the reference object combination as a list of similar bacteria as an example, when the N reference objects determined based on the similarity scores between the test object and each reference object include a target reference object A that belongs to list of similar bacteria 1 and a target reference object B that belongs to list of similar bacteria 2, all reference objects included in lists of similar bacteria 1 and 2, together with the N reference objects other than reference objects A and B, can be determined as candidate reference objects.

[0068] It should be noted that the information of the reference object combination to which each of the above reference objects belongs, as well as the information of other reference objects in the reference object combination, can be obtained from the storage module 102 (that is, the storage module 102 pre-stores reference object combination information).
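The expansion described for Method 3 can be illustrated with a minimal Python sketch. It assumes that the reference object combination information is available as a simple mapping from a combination name to its members; the function name `expand_candidates` and the data layout are hypothetical and are not taken from this application.

```python
from typing import Dict, List, Set


def expand_candidates(top_n: List[str], combinations: Dict[str, Set[str]]) -> List[str]:
    """Return the top-N reference objects plus all members of any
    reference object combination that one of them belongs to.

    `combinations` is a hypothetical stand-in for the reference object
    combination information pre-stored in the storage module.
    """
    candidates: List[str] = []
    seen: Set[str] = set()

    def add(name: str) -> None:
        if name not in seen:
            seen.add(name)
            candidates.append(name)

    for ref in top_n:
        add(ref)
        for members in combinations.values():
            if ref in members:
                # Pull in every member of the combination the hit belongs to.
                for member in sorted(members):
                    add(member)
    return candidates


# Example mirroring paragraph [0066]: A is in combination 1, B in combination 2.
combos = {
    "combination_1": {"A", "D", "E"},
    "combination_2": {"B", "F", "G"},
}
print(expand_candidates(["A", "B", "C"], combos))
# ['A', 'D', 'E', 'B', 'F', 'G', 'C']
```

Reference object C belongs to no combination, so it is kept as a candidate on its own.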

[0069] [Creating and updating reference object combinations]

[0070] For example, reference object combination information is used to characterize multiple reference object combinations and the protein fingerprint map corresponding to each reference object combination, wherein the reference object combination includes multiple reference objects with similar protein feature expression.

[0071] In one possible implementation, based on the first similarity comparison, a new reference object combination can be created or an existing reference object combination can be updated based on the N reference objects with the highest similarity to the object to be tested.

[0072] For example, update information is generated based on the new or updated reference object combination and the protein fingerprint of the reference objects recorded in the combination. The update information can characterize the update status of an existing reference object combination or a newly created reference object combination, or the update information can be added to the reference object combination information.

[0073] For example, based on the N reference objects / candidate reference objects obtained after the first similarity comparison and the corresponding gene information or kinship information of the reference objects / candidate reference objects, the existing reference object combination can be updated or a new reference object combination can be created.

[0074] Specifically, when creating a new reference object combination or updating an existing reference object combination based on the N reference objects with the highest similarity to the test object obtained through the first similarity comparison, it can be first determined whether each of the N reference objects belongs to the existing reference object combination. Then, the genetic information or kinship information of each reference object that does not belong to the existing reference object combination is compared with the reference objects in the existing reference object combination. If the comparison result indicates that the reference object that does not belong to the existing reference object combination is similar to the reference object in the existing reference object combination, then the reference object is added to the reference object combination to which the similar reference object belongs, and the reference object combination is updated.

[0075] Alternatively, if the comparison results indicate that a reference object not belonging to the existing reference object combination is not similar to the reference objects in the existing reference object combination, then the genetic information or kinship information of the reference object is compared with the genetic information or kinship information of the other reference objects not belonging to the existing reference object combination. If the comparison results indicate that the reference object is similar to the other reference objects not belonging to the existing reference object combination, then a new reference object combination is created based on the reference object and the reference objects similar to the reference object.

[0076] Optionally, in the above comparison, the specific methods for determining whether there is similarity between reference objects through genetic information or kinship information include: when the similarity of the genetic information of a reference object with other reference objects reaches a preset threshold or the kinship information is consistent, it can be determined that the reference object is similar to other reference objects. It is understood that the average nucleotide identity (ANI) in the genetic information can be used to determine whether there is similarity between reference objects.

[0077] For example, given that reference objects with an average genomic genetic similarity of 94% or higher essentially belong to the same species, reference objects with an average genomic genetic similarity of 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, etc., can be defined as similar reference objects, depending on the accuracy of mass spectrometry identification in actual use. It should be noted that various methods can be used to determine the similarity between reference objects using genetic information, and this application does not impose any restrictions on this.

[0078] Correspondingly, if a certain mass spectrometer has high identification accuracy and is only prone to confusing bacterial species with an average genomic genetic similarity of 93% or more, then the above-mentioned preset threshold can be set to 93%. Alternatively, if research findings show that the average genomic genetic similarity between microorganisms listed as the same reference group is 80%, the above-mentioned preset threshold can be set to 80%.
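The create-or-update logic of paragraphs [0074] to [0076] can be sketched in Python as follows. This is only an illustration under stated assumptions: genetic similarity is supplied as a pairwise lookup function (for example, ANI expressed as a fraction), the greedy grouping of unassigned reference objects is one of several possible strategies, and all names (`make_ani`, `update_combinations`, `new_combination_1`) are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


def make_ani(table: Dict[Tuple[str, str], float], default: float = 0.0) -> Callable[[str, str], float]:
    """Build a symmetric pairwise similarity lookup from a (hypothetical) table."""
    lookup = {frozenset(pair): value for pair, value in table.items()}

    def ani(a: str, b: str) -> float:
        return lookup.get(frozenset((a, b)), default)

    return ani


def update_combinations(
    top_n: List[str],
    combinations: Dict[str, Set[str]],
    ani: Callable[[str, str], float],
    threshold: float = 0.80,  # preset threshold, e.g. 0.80 or 0.93
) -> Dict[str, Set[str]]:
    updated = {name: set(members) for name, members in combinations.items()}
    assigned = set().union(*updated.values())
    leftovers: List[str] = []

    # Step 1: try to add each unassigned reference object to an existing combination.
    for ref in (r for r in top_n if r not in assigned):
        home = next(
            (name for name, members in updated.items()
             if any(ani(ref, m) >= threshold for m in members)),
            None,
        )
        if home is not None:
            updated[home].add(ref)
        else:
            leftovers.append(ref)

    # Step 2: group the remaining objects that are similar to each other.
    remaining = list(leftovers)
    counter = 1
    while remaining:
        seed = remaining.pop(0)
        group = {seed} | {r for r in remaining if ani(seed, r) >= threshold}
        if len(group) > 1:
            while f"new_combination_{counter}" in updated:
                counter += 1
            updated[f"new_combination_{counter}"] = group
            remaining = [r for r in remaining if r not in group]
    return updated


# Illustrative similarity values only; they are not measured data.
ani = make_ani({("A", "B"): 0.90, ("A", "C"): 0.85, ("X", "Y"): 0.88}, default=0.50)
result = update_combinations(["A", "B", "C", "X", "Y"], {"combination_1": {"A"}}, ani)
print(result)
```

With these illustrative values, B and C join the existing combination 1, while X and Y, which are similar only to each other, form a new combination.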

[0079] In this embodiment of the application, when determining candidate reference objects similar to the test object, the reference objects with the highest similarity to the test object can be grouped into a reference object combination, and the reference objects recorded in the reference object combination, together with their protein fingerprints, can be stored in the storage module 102.

[0080] For example, the N reference objects with the highest similarity to the object to be tested can be created as a new reference object combination, or the existing reference object combination can be updated based on the N reference objects with the highest similarity to the object to be tested, so as to form updated information based on the newly created reference object combination or the updated reference object combination and the protein fingerprint of each reference object in the reference object combination, thereby adding the updated information to the reference object combination information.

[0081] For example, when reference objects A, B, and C with the highest similarity to the object under test are determined based on the similarity results between the object under test and each reference object, reference objects A, B, and C can be created as reference object combination 1.

[0082] Specifically, the protein fingerprints of the aforementioned reference object combination 1 and reference objects A, B, and C can be added to the reference object combination information. Here, reference object combination 1 can be understood as a list of similar objects (e.g., a list of similar bacteria).

[0083] Preferably, if the similarity of the gene information of reference objects A, B, and C reaches a preset threshold or the kinship information of reference objects A, B, and C is consistent, then reference objects A, B, and C are used to create or update reference object combination 1; otherwise, no creation or update is performed.

[0084] For example, when reference object A belongs to the existing reference object combination 1 while reference objects B and C do not belong to reference object combination 1, if the similarity of the genetic information of reference objects A, B, and C reaches a preset threshold or the kinship information of reference objects A, B, and C is consistent, then reference objects B and C are added to reference object combination 1 to update reference object combination 1; otherwise, no update is performed.

[0085] For example, when reference objects A, B, and C do not belong to the existing reference object combination, if the comparison of gene information or kinship information between each reference object by the data processing module 103 can determine that the similarity between the gene information of reference objects A, B, and C and the gene information of one or more reference objects in the existing reference object combination 1 reaches a preset threshold, then reference objects A, B, and C are added to the reference object combination 1 to update the reference object combination 1.

[0086] For example, when reference objects A1, A2, and B1 with the highest similarity to the test object are determined based on the similarity results between the test object and each reference object, and reference objects A1 and A2 belong to the same reference object combination 2, reference object B1 can be added to reference object combination 2 to update reference object combination 2.

[0087] Step S202: Determine the protein difference information of each candidate reference object. The protein difference information refers to the protein feature expression of the candidate reference object that is used to distinguish the candidate reference object from one or more other candidate reference objects. It is understood that determining the protein difference information of each candidate reference object means obtaining the protein difference information from the storage module 102 for subsequent comparison; the act of "determining" itself is not emphasized, because using the protein difference information for subsequent comparison necessarily implies that it has been determined. The protein difference information can be obtained through real-time comparison of data in the storage module 102, or it can be obtained in advance by comparison as a set of protein difference information and pre-stored in the storage module 102.

[0088] In this embodiment of the application, after performing the first similarity comparison in step S201, multiple candidate reference objects similar to the test object can be identified. Since the candidate reference objects are all highly similar to the test object and there is a certain degree of similarity between them, in order to perform a more accurate second similarity comparison, the protein difference information of each candidate reference object can be obtained in step S202 so that the test object and each candidate reference object can be compared based on the protein difference information to obtain a more accurate identification result.

[0089] For example, protein difference information of a candidate reference object can be used to distinguish that candidate reference object from one or more other candidate reference objects. For instance, protein difference information can be the differential characteristic peaks in the protein fingerprint of a candidate reference object, the genome of a candidate reference object that is different from one or more other candidate reference objects, or the protein gene of a candidate reference object that is different from one or more other candidate reference objects.

[0090] It should be noted that, among the various methods listed in this application for determining the protein difference information of each candidate reference object and performing a second similarity comparison based on the protein difference information, defining the protein difference information as any of the above methods will not affect the understanding and implementation of the technical solution of this application.

[0091] The following text focuses on the example of protein difference characteristics, specifically the differential peaks.

[0092] For example, the differential characteristic peaks of candidate reference objects, or combinations of differential characteristic peaks composed of differential characteristic peaks of candidate reference objects, can be used to uniquely characterize candidate reference objects.

[0093] Specifically, taking candidate reference objects A, B, and C as an example, when the protein fingerprint spectrum of candidate reference object A contains a characteristic peak 1, while the protein fingerprint spectra of candidate reference objects B and C do not contain a characteristic peak 1, then characteristic peak 1 can be determined as the unique differential characteristic peak representing candidate reference object A. That is, characteristic peak 1 is used to distinguish candidate reference object A from the other two candidate reference objects, and characteristic peak 1 is the protein differential information of candidate reference object A.

[0094] Optionally, if, based on gene alignment, protein alignment, or protein fingerprint comparison, it is determined that the protein fingerprint of candidate reference object A includes characteristic peaks 1, 2, and 3, the protein fingerprint of candidate reference object B includes characteristic peaks 2 and 3, and the protein fingerprint of candidate reference object C includes characteristic peaks 1 and 3, then it can be concluded that the difference characteristic peak between candidate reference object A and candidate reference object B is characteristic peak 1 in the protein fingerprint, and the difference characteristic peak between candidate reference object A and candidate reference object C is characteristic peak 2 in the protein fingerprint. In this case, characteristic peak 1 or characteristic peak 2 alone cannot distinguish candidate reference object A from the other two candidate reference objects. Instead, a combination of characteristic peaks 1, 2, and 3 or a combination of characteristic peaks 1 and 2 (i.e., the difference characteristic peak combination) is needed to uniquely characterize candidate reference object A. In other words, the difference characteristic peak combination is used to distinguish candidate reference object A from the other two candidate reference objects.

[0095] It should be noted that, in the above example, although a single characteristic peak cannot uniquely characterize a candidate reference object, if a combination of difference characteristic peaks can uniquely characterize a candidate reference object, then characteristic peak 1, characteristic peak 2, or characteristic peak 3, as characteristic peaks constituting the combination of difference characteristic peaks, can all be identified as the difference characteristic peaks described in this application.

[0096] Secondly, although characteristic peak 1 or the combination of characteristic peaks 1 and 3 can only distinguish candidate reference object A from candidate reference object B, but not candidate reference object A from candidate reference object C (i.e., characteristic peak 1 can only distinguish the candidate reference object from one other candidate reference object), characteristic peak 1 or the combination of characteristic peaks 1 and 3 also belongs to the protein difference information between a candidate reference object and other candidate reference objects described in this application.

[0097] Optionally, when the protein fingerprint spectra of all reference objects in reference object combination 1 contain characteristic peak 1, while the protein fingerprint spectra of all reference objects in reference object combination 2 do not contain characteristic peak 1, then characteristic peak 1 can be identified as the difference characteristic peak between reference object combination 1 and reference object combination 2. That is, characteristic peak 1 can be used to distinguish candidate reference objects of different categories. For example, if candidate reference object A belongs to reference object combination 1 and candidate reference object B belongs to reference object combination 2, then characteristic peak 1 can be the difference characteristic peak between candidate reference objects A and B.
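The example with candidate reference objects A, B, and C can be reproduced with a short Python sketch. It assumes that each protein fingerprint is reduced to a set of peak labels and that only the presence of peaks in the target (not their absence) is used; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Set, Tuple


def pairwise_difference_peaks(fingerprints: Dict[str, Set[int]], target: str) -> Dict[str, Set[int]]:
    """For each other candidate, the peaks present in `target` but absent from it."""
    return {
        other: fingerprints[target] - peaks
        for other, peaks in fingerprints.items()
        if other != target
    }


def smallest_distinguishing_combination(
    fingerprints: Dict[str, Set[int]], target: str
) -> Optional[Tuple[int, ...]]:
    """Smallest set of target peaks that no other candidate contains in full."""
    others = [peaks for name, peaks in fingerprints.items() if name != target]
    target_peaks = sorted(fingerprints[target])
    for size in range(1, len(target_peaks) + 1):
        for combo in combinations(target_peaks, size):
            if all(not set(combo) <= other for other in others):
                return combo
    return None  # the target cannot be separated by peak presence alone


# Peak sets from paragraph [0094].
fps = {"A": {1, 2, 3}, "B": {2, 3}, "C": {1, 3}}
print(pairwise_difference_peaks(fps, "A"))            # {'B': {1}, 'C': {2}}
print(smallest_distinguishing_combination(fps, "A"))  # (1, 2)
```

No single peak separates A from both B and C, so the sketch returns the combination of peaks 1 and 2, in line with the text.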

[0098] Optionally, the differential characteristic peaks among candidate references can also be characteristic peaks corresponding to mutually exclusive gene fragments among the candidate references. It should be noted that the same gene fragment exists only once in a single organism; therefore, the protein characteristic expression of this gene fragment in a single organism has only one molecular weight. Based on this, the protein characteristic expression (i.e., characteristic peaks) corresponding to mutually exclusive gene fragments can be identified as differential characteristic peaks.

[0099] For example, taking "single eyelid" and "double eyelid" as examples, the gene expression fragments of "single eyelid" and "double eyelid" are mutually exclusive. When the translation result of gene fragment 1 of candidate reference object A is double eyelid and the translation result of gene fragment 2 of candidate reference object B is single eyelid, the characteristic peak 1 of gene fragment 1 in the protein fingerprint can be used as the differential characteristic peak of candidate reference object A, and the characteristic peak 2 of gene fragment 2 in the protein fingerprint can be used as the differential characteristic peak of candidate reference object B.

[0100] In one possible implementation, for each candidate reference object, protein expression differential analysis can be performed on each candidate reference object to obtain protein differential information of the candidate reference objects.

[0101] Specifically, after identifying candidate reference objects, the data processing module 103 can use forward compilation to perform protein expression differential analysis on the candidate reference objects to determine protein difference information in real time. Forward compilation can be based on characteristic proteins obtained from the genome encoding of the candidate reference objects, or it can be based on protein analysis results from a protein database.

[0102] Taking the protein difference information as the differential characteristic peaks in the protein fingerprint spectrum corresponding to a certain candidate reference as an example, the determination of differential characteristic peaks is explained:

[0103] First, differential characteristic peaks of candidate reference objects are determined based on their genomes:

[0104] For example, characteristic proteins of candidate reference objects can be determined first based on the genome of the candidate reference objects, and differential proteins of each candidate reference object can be determined based on the characteristic proteins of each candidate reference object. In this way, the characteristic peaks corresponding to the differential proteins in the corresponding protein fingerprints can be determined as the differential characteristic peaks of each candidate reference object.

[0105] Optionally, differentially expressed genomes among the candidate reference objects can be identified first based on their genomes, and differentially expressed proteins of each candidate reference object can be identified based on their differentially expressed genomes. The characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints can then be identified as the differential characteristic peaks of each candidate reference object.

[0106] Optionally, the characteristic proteins of the candidate reference objects can be determined first based on their genomes, and then the characteristic peaks of each candidate reference object in the corresponding protein fingerprint can be determined based on the characteristic proteins of each candidate reference object, thereby identifying the characteristic peaks of each candidate reference object in the corresponding protein fingerprint as the differential characteristic peaks of each candidate reference object.
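The first genome-based variant (paragraph [0104]) can be sketched in Python under several assumptions: the characteristic proteins of each candidate have already been derived from its genome and are given as predicted m/z values, a protein counts as differential when no other candidate has a protein at the same predicted m/z within a tolerance, and peaks are matched to proteins within the same tolerance. All names and numeric values are hypothetical and purely illustrative.

```python
from typing import Dict, List


def differential_peaks_from_proteins(
    proteins: Dict[str, Dict[str, float]],   # candidate -> {protein name: predicted m/z}
    fingerprints: Dict[str, List[float]],    # candidate -> observed peak m/z values
    tolerance: float = 1.0,                  # matching window, an assumed value
) -> Dict[str, List[float]]:
    result: Dict[str, List[float]] = {}
    for name, own in proteins.items():
        # Predicted m/z values of the characteristic proteins of all other candidates.
        other_mz = [mz for other, ps in proteins.items() if other != name for mz in ps.values()]
        # Differential proteins: not matched by any protein of another candidate.
        diff_mz = [mz for mz in own.values()
                   if not any(abs(mz - o) <= tolerance for o in other_mz)]
        # Differential characteristic peaks: observed peaks that correspond to them.
        result[name] = [peak for peak in fingerprints[name]
                        if any(abs(peak - mz) <= tolerance for mz in diff_mz)]
    return result


# Made-up values: both candidates share protein_1 but differ in protein_2.
proteins = {
    "A": {"protein_1": 7274.0, "protein_2": 6255.0},
    "B": {"protein_1": 7274.0, "protein_2": 6241.0},
}
fingerprints = {"A": [6255.3, 7274.2], "B": [6241.1, 7273.9]}
diff_peaks = differential_peaks_from_proteins(proteins, fingerprints)
print(diff_peaks)
```

In this illustration only the peak that corresponds to protein_2 is kept for each candidate, since protein_1 is predicted at the same m/z in both.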

[0107] Second, differential characteristic peaks of candidate reference objects are determined based on protein databases:

[0108] For example, characteristic proteins of candidate reference objects can be obtained from a protein database, and the characteristic peaks corresponding to the characteristic proteins of each candidate reference object in the corresponding protein fingerprint spectrum can be identified as the differential characteristic peaks of each candidate reference object.

[0109] Optionally, characteristic proteins of candidate reference objects can be obtained from a protein database to determine the differential proteins of each candidate reference object based on the characteristic proteins of each candidate reference object, thereby determining the characteristic peaks corresponding to the differential proteins of each candidate reference object in the corresponding protein fingerprint as the differential characteristic peaks of each candidate reference object.

[0110] Third, the differential characteristic peaks of candidate reference objects are determined based on their protein fingerprint profiles:

[0111] For example, the protein fingerprints of each candidate reference object can be compared to obtain the differential characteristic peaks of each candidate reference object.

[0112] Of course, the method of determining protein difference information also includes obtaining protein difference information of each candidate reference object from the difference information database; wherein, the difference information database stores at least the protein difference information of the candidate reference objects, and the protein difference information at least contains difference characteristic peaks; the method of establishing the difference information database can be selected from the above three methods.

[0113] In one possible implementation, before determining the protein difference information of each candidate reference object, interfering features in the protein fingerprint profiles of each candidate reference object can be removed to eliminate interference when determining the protein difference information and improve the accuracy of the determination. The interfering features can be common features in the protein fingerprint profiles of each candidate reference object.

[0114] Specifically, the expression of the same protein features in the protein fingerprints of each candidate reference object can be ignored, and the differential characteristic peaks of each candidate reference object can be determined based on the protein fingerprints after ignoring the expression of the same protein features.
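A minimal Python sketch of this filtering step is given here. It assumes that each protein fingerprint is a list of peak m/z values and that two peaks are "the same" when they lie within a fixed tolerance; the tolerance value and the function name are assumptions made for illustration.

```python
from typing import Dict, List


def remove_common_peaks(
    fingerprints: Dict[str, List[float]], tolerance: float = 0.5
) -> Dict[str, List[float]]:
    """Drop peaks that appear (within `tolerance`) in every candidate's fingerprint."""

    def present(mz: float, peaks: List[float]) -> bool:
        return any(abs(mz - p) <= tolerance for p in peaks)

    filtered: Dict[str, List[float]] = {}
    for name, peaks in fingerprints.items():
        others = [p for other, p in fingerprints.items() if other != name]
        if not others:
            # A single candidate has nothing to be compared with; keep all peaks.
            filtered[name] = list(peaks)
            continue
        filtered[name] = [mz for mz in peaks
                          if not all(present(mz, other) for other in others)]
    return filtered


# Illustrative m/z values only.
spectra = {
    "A": [3000.0, 4500.2, 6200.0],
    "B": [3000.1, 6200.3],
    "C": [3000.2, 4500.0, 6199.9],
}
print(remove_common_peaks(spectra))
```

In this example the peaks near m/z 3000 and 6200 occur in all three fingerprints and are ignored, leaving only the peaks near 4500 for A and C as input for the determination of differential characteristic peaks.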

[0115] In one possible implementation, during the process of determining the characteristic proteins of candidate reference objects based on their genomes, the expression deficiencies of theoretically differentially expressed peaks can be corrected. These theoretically differentially expressed peaks can be the differentially expressed peaks corresponding to the genomes of the candidate reference objects.

[0116] Specifically, it is possible to determine the protein expression loss at the differential characteristic peaks in the genome of candidate reference objects, and to correct the differential characteristic peaks in the protein fingerprint based on the protein expression loss.

[0117] Step S203: Based on the protein difference information of the candidate reference objects, a second similarity comparison is performed between the test object and the candidate reference objects, and the identification result of the test object is determined based on the comparison results of each candidate reference object.

[0118] In this embodiment of the application, since the protein feature expression characterized by the protein difference information can distinguish different candidate reference objects, when the test object and the candidate reference objects are compared again based on the protein difference information of the candidate reference objects determined in step S202, the degree of similarity between the test object and each candidate reference object can be further clarified, so as to obtain a more accurate identification result of the test object based on the degree of similarity with each candidate reference object.

[0119] In one possible implementation, a second similarity comparison is performed between the test object and each candidate reference object based on the protein difference information of the candidate reference objects to obtain the identification result of the test object. The second similarity comparison can be implemented using a similarity scoring method.

[0120] For example, the mass-to-charge ratio (m/z value), intensity, and weight information of the difference characteristic peaks corresponding to protein difference information in the protein fingerprint spectrum can be adjusted to increase the importance of the difference characteristic peaks in determining the similarity comparison results between the test object and the candidate reference object, so as to obtain a more accurate comparison result between the test object and the candidate reference object based on the difference characteristic peaks of the candidate reference object.

[0121] Specifically, for each candidate reference object, the scoring weight of the differential characteristic peaks of the candidate reference object can be increased. Based on the weight and amplitude of each characteristic peak in the protein fingerprint spectrum of the candidate reference object, a scoring operation is performed to obtain the similarity score between the candidate reference object and the test object. Based on the similarity score of the candidate reference object, the matching result between the test object and the candidate reference object is output.

[0122] For example, when comparing the bar-shaped peak information corresponding to the protein fingerprint spectrum of the test object with the protein fingerprint spectrum of the candidate reference object, a scoring operation can be performed based on the weight and amplitude of the characteristic peak when the two have the same characteristic peak, so as to obtain the similarity score between the candidate reference object and the test object; when the protein fingerprint spectrum of the test object shows the difference characteristic peak of the candidate reference object, a scoring operation can be performed based on the increased scoring weight (e.g., product coefficient).
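The weighted scoring can be sketched in Python as follows. The application does not fix a scoring formula, so this sketch assumes one possible form: each matched reference peak contributes weight times amplitude, the weight of a differential characteristic peak is multiplied by a coefficient `boost`, and the sum is normalized by the maximum achievable value. The class `Peak`, the function name, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Peak:
    mz: float
    amplitude: float      # relative intensity, assumed to lie in [0, 1]
    weight: float = 1.0   # base scoring weight


def similarity_score(
    test_mz: List[float],
    reference: List[Peak],
    differential_mz: List[float],
    boost: float = 3.0,       # product coefficient for differential peaks (assumed)
    tolerance: float = 0.5,   # m/z matching window (assumed)
) -> float:
    matched = 0.0
    total = 0.0
    for peak in reference:
        is_differential = any(abs(peak.mz - d) <= tolerance for d in differential_mz)
        weight = peak.weight * (boost if is_differential else 1.0)
        contribution = weight * peak.amplitude
        total += contribution
        # The peak only scores if the test object shows it as well.
        if any(abs(peak.mz - t) <= tolerance for t in test_mz):
            matched += contribution
    return matched / total if total else 0.0


reference_a = [Peak(3000.0, 1.0), Peak(4500.0, 0.5), Peak(6200.0, 0.5)]
with_diff_peak = similarity_score([3000.1, 4500.1], reference_a, [4500.0])
without_diff_peak = similarity_score([3000.1, 6200.1], reference_a, [4500.0])
print(with_diff_peak, without_diff_peak)
```

Both test spectra share two of the three reference peaks, but the spectrum that shows the differential characteristic peak near m/z 4500 scores higher, which is the intended effect of the increased weight.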

[0123] Optionally, for each candidate reference object, the candidate reference object and the test object can be compared based on the differential characteristic peaks of the candidate reference object, so as to output the matching result between the test object and the candidate reference object based on whether the protein fingerprint spectrum of the test object contains the same characteristic peaks as the differential characteristic peaks.

[0124] For example, when comparing the bar-shaped peak information corresponding to the protein fingerprint spectrum of the test object with the protein fingerprint spectrum of the candidate reference object, if the protein fingerprint spectrum of the test object contains the same characteristic peak as the difference characteristic peak, it can be determined that the test object matches the candidate reference object; otherwise, it is determined that the test object does not match the candidate reference object.
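The presence-based comparison can be sketched in Python as follows. The sketch assumes that a candidate matches only when all of its differential characteristic peaks are found in the test spectrum within a tolerance; whether one or all peaks are required is a design choice that the text leaves open. Function names and values are hypothetical.

```python
from typing import Dict, List


def matches_candidate(test_mz: List[float], differential_mz: List[float], tolerance: float = 0.5) -> bool:
    """True if every differential peak of the candidate is found in the test spectrum."""
    if not differential_mz:
        return False  # nothing to compare against
    return all(
        any(abs(d - t) <= tolerance for t in test_mz)
        for d in differential_mz
    )


def matching_candidates(
    test_mz: List[float], differential_peaks: Dict[str, List[float]], tolerance: float = 0.5
) -> List[str]:
    """All candidates whose differential peaks appear in the test spectrum."""
    return [
        name for name, peaks in differential_peaks.items()
        if matches_candidate(test_mz, peaks, tolerance)
    ]


# Illustrative values: the test spectrum carries the differential peaks of A and B.
diff_peaks = {"A": [4500.0], "B": [5100.0], "C": [7300.0]}
test_spectrum = [3000.1, 4500.2, 5099.8]
print(matching_candidates(test_spectrum, diff_peaks))  # ['A', 'B']
```

Because each candidate is checked independently, more than one candidate can match, which is relevant when the test object contains several microorganisms.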

[0125] For example, the identification result of the test object can be determined based on the matching result between the test object and the candidate reference object.

[0126] Specifically, when the similarity score between the test object and the candidate reference object is higher than the preset threshold, it can be determined that the test object and the candidate reference object are a match. At this time, the candidate reference object that matches the test object can be determined as the identification result of the test object.

[0127] Optionally, when the similarity score between the test object and a candidate reference object is the highest among all candidate reference objects, it can be determined that the test object matches that candidate reference object, and the candidate reference object can be determined as the identification result of the test object.
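The two decision rules above (preset threshold and highest score) can be sketched in Python as follows. The function name `identify` and the example scores are hypothetical.

```python
from typing import Dict, List, Optional


def identify(scores: Dict[str, float], threshold: Optional[float] = None) -> List[str]:
    """Pick identification results from second-comparison similarity scores.

    With `threshold` set, every candidate scoring above it is returned;
    otherwise the candidate(s) with the highest score are returned.
    """
    if not scores:
        return []
    if threshold is not None:
        return [name for name, score in scores.items() if score > threshold]
    best = max(scores.values())
    return [name for name, score in scores.items() if score == best]


scores = {"A": 0.83, "B": 0.50, "C": 0.90}  # illustrative values
print(identify(scores, threshold=0.8))  # ['A', 'C']
print(identify(scores))                 # ['C']
```

The threshold rule can return several candidates, whereas the highest-score rule returns a single candidate unless scores are tied.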

[0128] In one possible implementation, different types of candidate reference objects can correspond to different types of identification results for the objects to be tested.

[0129] For example, when multiple candidate reference objects include at least several subspecies belonging to the same bacterial species, the identification result of the test object corresponds to the subspecies.

[0130] Optionally, when multiple candidate reference objects include at least multiple strains belonging to the same genus, the identification result of the test object corresponds to the bacterial species.

[0131] Optionally, when multiple candidate reference objects include at least multiple species belonging to the same complex microbial community, the identification result of the test object corresponds to the microbial species.

[0132] Optionally, when multiple candidate reference objects include at least multiple bacteria belonging to the same list of similar bacteria, the identification result of the test object corresponds to the bacteria.

[0133] Compared to existing technologies that directly determine the identification result of the test object by comparing it with known categories of microorganisms, the substance identification device provided in this application can perform a second similarity comparison of the test object based on the protein difference information corresponding to each candidate reference object determined in the first comparison, after using the standard protein fingerprint spectrum of the reference object to perform a first similarity comparison of the protein fingerprint spectrum of the test object. Since the protein feature expression characterized by the protein difference information can distinguish different candidate reference objects, the application of the protein difference information can further clarify the degree of similarity between the test object and each candidate reference object, so as to obtain a more accurate identification result of the test object based on the degree of similarity with each candidate reference object. This solves the problem that the identification result of the test object cannot be determined due to the high similarity of the protein fingerprint spectra of multiple known categories of microorganisms, and improves the accuracy of microbial identification.

[0134] [Creating a database of protein differential information]

[0135] The following is a further explanation of "pre-compare and obtain a set of protein difference information, and then pre-store it in storage module 102".

[0136] To address the time-consuming process of comparing the mass spectrometry results of the microorganism to be tested with the standard mass spectrometry results of multiple known categories of microorganisms in existing technologies, a method can be adopted to screen multiple known categories of microorganisms whose mass spectrometry results are similar to those of the microorganism to be tested, and use them as candidate reference objects. Then, the protein difference information of each candidate reference object is determined one by one, and the microorganism to be tested is further compared with the candidate reference objects based on the protein difference information of the candidate reference objects. Based on the further comparison results, the identification result of the microorganism to be tested is determined.

[0137] However, determining the protein difference information of each of the aforementioned candidate reference objects is itself time-consuming, which lowers the efficiency of identifying the microorganism to be tested. To address this, this application proposes a database creation method. The method analyzes in advance the protein expression differences among reference objects with similar protein characteristics in multiple reference object combinations, to obtain protein difference information for each reference object. A difference information database is then constructed from each reference object combination and the protein difference information of its constituent reference objects. Once at least one object similar to the microorganism to be identified has been initially determined, the identification result can be determined from the protein difference information in the difference information database, thereby improving the efficiency of microorganism identification.

[0138] In one embodiment of this application, an implementation scenario of this application is provided. Figure 6 is an implementation environment architecture diagram of the database creation method provided in the embodiment of this application. As shown in Figure 6, the database creation device 10 includes a grouping module 11, a processing module 12, and a creation module 13. The database creation device 10 is, for example, a computer device.

[0139] For example, the grouping module 11 is used to determine multiple reference objects with similar protein feature expressions as a reference object combination based on the protein feature expression of each reference object; the processing module 12 is used to perform protein expression difference analysis on each reference object in each reference object combination to obtain protein difference information of each reference object; the creation module 13 is used to construct a difference information database based on the multiple reference object combinations determined by the grouping module 11 and the protein difference information of each reference object in each reference object combination obtained by the processing module 12.

[0140] In a specific implementation, the grouping module 11 can perform mass spectrometry analysis on the reference objects to obtain the protein fingerprint spectrum of each reference object, and then determine multiple reference object combinations based on those protein fingerprint spectra. Here, the reference object can be the microbial sample to be tested, and the grouping module 11 can include a mass spectrometry identification instrument using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS); on this basis, the substance placed into the mass spectrometry identification instrument can be the substance produced after processing the in vitro sample.

[0141] In another embodiment of this application, a method for creating a database is also provided. Exemplarily, the method specifically includes the following steps:

[0142] Step S401: Determine multiple reference object combinations, which include multiple reference objects with similar protein feature expression.

[0143] In one possible implementation, standard protein fingerprints of multiple reference objects can be obtained. Based on the similarity of the standard protein fingerprints of each reference object, reference objects with similar standard protein fingerprints can be identified as a reference object combination. Taking the example that each reference object is a known species of microorganism, the reference object combination can also be called a list of similar bacteria.
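By way of non-limiting illustration, the grouping in step S401 can be sketched as follows. The sketch assumes that each standard protein fingerprint is reduced to a set of peak labels, that similarity is measured with a Jaccard index, and that similar reference objects are merged transitively; none of these choices is prescribed by this application, and the names `jaccard`, `group_similar`, and the sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two peak sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(fingerprints: dict, threshold: float) -> list:
    """Merge reference objects whose similarity reaches the threshold."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if jaccard(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    # Only groups with at least two members form a reference object combination.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical standard fingerprints, peaks given as integer labels.
standard = {"A": {1, 2, 3}, "B": {2, 3}, "C": {1, 3}, "D": {7, 8, 9}}
combos = group_similar(standard, threshold=0.5)
```

With this sample data, A, B, and C end up in one list of similar bacteria, while D, which shares no peaks with the others, is left out of any combination.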

[0144] For example, for each reference object, a large number of protein fingerprints of the reference object can be obtained by mass spectrometry analysis, and a standard protein fingerprint of the reference object can be obtained based on the processing results of the large number of protein fingerprints.

[0145] Correspondingly, a protein fingerprint library can be established based on the correspondence between each reference object and its determined standard protein fingerprint, so that the standard protein fingerprint corresponding to the reference object can be directly obtained from the protein fingerprint library.

[0146] Another possible implementation involves using the genetic or kinship information of multiple reference objects to identify a group of reference objects whose genetic similarity reaches a preset threshold or whose kinship is consistent. It is understandable that the average nucleotide identity (ANI) in the genetic information can be used to determine whether similarities exist between the reference objects.

[0147] For example, given that reference objects with an average genomic genetic similarity of 94% or higher essentially belong to the same species, reference objects with an average genomic genetic similarity of 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, etc., can be defined as similar reference objects, depending on the accuracy of mass spectrometry identification in actual use. It should be noted that various methods can be used to determine the similarity between reference objects using genetic information, and this application does not impose any restrictions on this.

[0148] Correspondingly, if a certain mass spectrometer has high identification accuracy and is only prone to confusing bacterial species with an average genomic genetic similarity of 93% or more, then the above-mentioned preset threshold can be set to 93%; while based on research findings, the average genomic genetic similarity between microorganisms listed as the same reference group is 80%, so the above-mentioned preset threshold can be set to 80%.
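As a minimal sketch of how the preset threshold changes which reference objects count as similar, the following assumes that pairwise ANI values (in percent) are already available in a dictionary. The function name `similar_pairs` and the ANI values are hypothetical and serve only as an illustration.

```python
def similar_pairs(ani: dict, threshold: float) -> set:
    """Return the pairs whose ANI (in percent) reaches the preset threshold."""
    return {frozenset(pair) for pair, value in ani.items() if value >= threshold}


# Hypothetical pairwise ANI values for three reference objects.
ani = {("X", "Y"): 93.5, ("X", "Z"): 85.0, ("Y", "Z"): 79.0}

strict = similar_pairs(ani, 93.0)  # instrument that only confuses very close species
loose = similar_pairs(ani, 80.0)   # broader threshold of 80%
```

With the 93% threshold only X and Y are treated as similar; lowering the threshold to 80% additionally pairs X with Z.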

[0149] In this embodiment of the application, by determining multiple reference objects with similar protein fingerprint patterns as a reference object combination, other similar reference objects of the test object can be intuitively determined based on the reference object combination after initially obtaining similar reference objects of the test object, thereby shortening the comparison time between the test object and other reference objects.

[0150] Step S402: For each reference object combination, perform protein expression differential analysis on each reference object in the reference object combination to obtain protein difference information of each reference object in the reference object combination; the protein difference information is the protein feature expression of the reference object and is used to distinguish the reference object from the other reference objects in the reference object combination.

[0151] For example, based on the characteristic peaks in a protein fingerprint, protein difference information can be the difference characteristic peaks in a standard protein fingerprint of a reference object.

[0152] In one possible implementation, protein expression differential analysis of each reference object can be performed through forward compilation to determine the protein difference information of each reference object.

[0153] For example, forward compilation can be based on the characteristic proteins obtained from the genome encoding of each reference object, or it can be based on the protein analysis results of the reference object in a protein database.

[0154] For example, protein difference information of a reference object can be used to distinguish that reference object from one or more other reference objects. For instance, protein difference information can be the differential characteristic peaks in the protein fingerprint of a reference object, the genome of a reference object that is different from one or more other reference objects, or the protein gene of a reference object that is different from one or more other reference objects.

[0155] It should be noted that defining protein difference information in this application according to any of the above-mentioned methods will not affect the understanding and implementation of the technical solution of this application.

[0156] Step S403: Create a difference information database based on each reference object combination and the correspondence between the protein difference information of each reference object in the reference object combination.
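Steps S402 and S403 can be illustrated, under simplifying assumptions, as follows. The sketch treats protein difference information as the peaks of a reference object that appear in no other member of its combination, and stores the result in a plain dictionary keyed by combination. The function `build_difference_db` and the sample peaks are hypothetical; an actual difference information database may be organized differently.

```python
def build_difference_db(combos: list, fingerprints: dict) -> dict:
    """Map each combination to the difference peaks of each of its members."""
    db = {}
    for combo in combos:
        entry = {}
        for ref in combo:
            # Peaks seen in any other member of the same combination.
            others = set().union(*(fingerprints[o] for o in combo if o != ref))
            # Difference peaks: present in this member, absent from all others.
            entry[ref] = fingerprints[ref] - others
        db[frozenset(combo)] = entry
    return db


# Hypothetical standard fingerprints of two similar reference objects.
fingerprints = {"E": {10, 20, 30}, "F": {10, 20, 40}}
db = build_difference_db([{"E", "F"}], fingerprints)
```

Here peak 30 is recorded as the difference information of E and peak 40 as that of F. When no single peak is unique to a member, the entry is empty and a combination of pairwise difference peaks would be needed instead.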

[0157] For example, after the creation of the above-mentioned difference information database is completed, the difference information database can be used to identify the species of the test object (e.g., the test microorganism).

[0158] For example, an application process for a difference information database provided in this application embodiment includes the following steps:

[0159] Step S501: Perform a first similarity comparison between the protein fingerprint of the test object and the standard protein fingerprint of each reference object to obtain multiple candidate reference objects similar to the test object.

[0160] Optionally, the candidate reference objects may include at least two types, which may include different species or different subspecies.

[0161] In this embodiment, the protein fingerprint of the test object is first compared with the standard protein fingerprints of various known categories of reference objects to screen out candidate reference objects similar to the test object. This first similarity comparison eliminates some reference objects, significantly narrowing the comparison range for the subsequent second similarity comparison, thereby improving identification efficiency to some extent.

[0162] Step S502: Based on the difference information database, obtain the protein difference information of each candidate reference object.

[0163] In this embodiment of the application, the first similarity comparison in step S501 identifies multiple candidate reference objects similar to the test object. Since the candidate reference objects are all highly similar to the test object and also similar to one another to some degree, a more accurate second similarity comparison is needed. To that end, in step S502 the protein difference information of each candidate reference object is obtained from the difference information database, so that the second similarity comparison between the test object and each candidate reference object can be performed based on that protein difference information to obtain a more accurate identification result.

[0164] Step S503: Based on the protein difference information of each candidate reference object, perform a second similarity comparison between the test object and the candidate reference objects, and determine the identification result of the test object based on the comparison results of each candidate reference object.

[0165] In this embodiment of the application, since the protein feature expression characterized by the protein difference information can distinguish different candidate reference objects, when the test object and the candidate reference objects are compared again based on the protein difference information of the candidate reference objects determined in step S502, the degree of similarity between the test object and each candidate reference object can be further clarified, so as to obtain a more accurate identification result of the test object based on the degree of similarity with each candidate reference object.
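The two-stage flow of steps S501 to S503 can be sketched as follows. This is an illustrative simplification: fingerprints are sets of peak labels, the first comparison uses a Jaccard index with an assumed threshold, and the second comparison scores the fraction of a candidate's difference peaks found in the test object. The names `identify`, `first_threshold`, and the sample data are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two peak sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def identify(test_peaks: set, standard: dict, diff_info: dict,
             first_threshold: float = 0.5):
    # S501: first comparison against every standard fingerprint.
    candidates = [name for name, peaks in standard.items()
                  if jaccard(test_peaks, peaks) >= first_threshold]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    # S502 and S503: look up difference peaks and compare on those peaks only.
    def second_score(name: str) -> float:
        diff = diff_info.get(name, set())
        return len(test_peaks & diff) / len(diff) if diff else 0.0

    return max(candidates, key=second_score)


# Hypothetical data: E and F are similar; G is unrelated.
standard = {"E": {10, 20, 30}, "F": {10, 20, 40}, "G": {50, 60}}
diff_info = {"E": {30}, "F": {40}}
result = identify({10, 20, 40}, standard, diff_info)
```

In this example both E and F pass the first comparison, and the second comparison on the difference peaks selects F because the test object contains peak 40 but not peak 30.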

[0166] It should be noted that when there is no protein difference information corresponding to the candidate reference object in the above difference information database, the reference object combination to which the candidate reference object belongs can be determined first. Then, by performing protein expression difference analysis on each reference object and the candidate reference object in the reference object combination, the protein difference information obtained after the protein expression difference analysis can be stored in the difference information database to complete the update of the difference information database.

[0167] In another embodiment of this application, an example of protein differential information is also provided.

[0168] For example, taking protein difference information as difference characteristic peaks as an example, the difference characteristic peaks of the reference object or the combination of difference characteristic peaks composed of the difference characteristic peaks of the reference object can be used to uniquely characterize the reference object.

[0169] Specifically, taking reference objects A, B, and C as an example, when the protein fingerprint spectrum of reference object A contains a characteristic peak 1, and the protein fingerprint spectra of reference objects B and C do not contain a characteristic peak 1, then characteristic peak 1 can be identified as the unique differential characteristic peak representing reference object A. That is, characteristic peak 1 is used to distinguish reference object A from the other two reference objects, and characteristic peak 1 is the protein differential information of reference object A.

[0170] Optionally, if, based on gene alignment, protein alignment, or protein fingerprint comparison, it is determined that the protein fingerprint of reference object A includes characteristic peaks 1, 2, and 3, the protein fingerprint of reference object B includes characteristic peaks 2 and 3, and the protein fingerprint of reference object C includes characteristic peaks 1 and 3, then it can be concluded that the difference characteristic peak between reference object A and reference object B is characteristic peak 1 in the protein fingerprint, and the difference characteristic peak between reference object A and reference object C is characteristic peak 2 in the protein fingerprint. In this case, characteristic peak 1 or characteristic peak 2 alone cannot distinguish reference object A from the other two reference objects. Instead, a combination of characteristic peaks 1, 2, and 3 or a combination of characteristic peaks 1 and 2 (i.e., the difference characteristic peak combination) is needed to uniquely characterize reference object A. That is, the difference characteristic peak combination is used to distinguish reference object A from the other two reference objects.

[0171] It should be noted that, in the above example, although a single characteristic peak cannot uniquely characterize the reference object, when a combination of different characteristic peaks can uniquely characterize the reference object, characteristic peak 1, characteristic peak 2, or characteristic peak 3, as characteristic peaks constituting the combination of different characteristic peaks, can all be identified as the different characteristic peaks described in this application.

[0172] Secondly, although characteristic peak 1, or the combination of characteristic peaks 1 and 3, can only distinguish reference object A from reference object B, but not reference object A from reference object C (i.e., characteristic peak 1 can only distinguish the reference object from one of the other reference objects), characteristic peak 1 or the combination of characteristic peaks 1 and 3 also belongs to the protein difference information between the reference object and the other reference objects described in this application.
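The example of reference objects A, B, and C can be checked with a short sketch. It assumes fingerprints are sets of peak labels and defines the pairwise difference as the peaks of one reference object that are absent from another; the function names are hypothetical.

```python
# Peak sets from the example: A has peaks 1, 2, 3; B has 2, 3; C has 1, 3.
fingerprints = {"A": {1, 2, 3}, "B": {2, 3}, "C": {1, 3}}


def pairwise_difference(ref: str, other: str, fps: dict) -> set:
    """Peaks present in `ref` but absent from `other`."""
    return fps[ref] - fps[other]


def distinguishing_combination(ref: str, fps: dict) -> set:
    """Union of the pairwise difference peaks of `ref` against every other object."""
    combo = set()
    for other in fps:
        if other != ref:
            combo |= pairwise_difference(ref, other, fps)
    return combo


a_vs_b = pairwise_difference("A", "B", fingerprints)
a_vs_c = pairwise_difference("A", "C", fingerprints)
a_combo = distinguishing_combination("A", fingerprints)
```

Peak 1 separates A from B, peak 2 separates A from C, and only the combination of peaks 1 and 2 is fully present in A while being incomplete in both B and C.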

[0173] Optionally, when the protein fingerprint spectra of all reference objects in reference object combination 1 contain characteristic peak 1, while the protein fingerprint spectra of all reference objects in reference object combination 2 do not contain characteristic peak 1, then characteristic peak 1 can be identified as the difference characteristic peak between reference object combination 1 and reference object combination 2. That is, characteristic peak 1 can serve as a unique feature for characterizing reference objects of different categories. For example, if reference object A belongs to reference object combination 1 and reference object B belongs to reference object combination 2, then characteristic peak 1 can be the difference characteristic peak between reference objects A and B.
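A difference characteristic peak between two reference object combinations can be sketched as a peak shared by every member of one combination and absent from every member of the other. The function `combination_difference_peaks` and the sample fingerprints are hypothetical, and both combinations are assumed to be non-empty.

```python
def combination_difference_peaks(combo1: list, combo2: list, fps: dict) -> set:
    """Peaks in all members of combo1 and in no member of combo2."""
    shared_by_1 = set.intersection(*(fps[r] for r in combo1))
    present_in_2 = set().union(*(fps[r] for r in combo2))
    return shared_by_1 - present_in_2


# Hypothetical fingerprints: peak 1 occurs throughout combination 1 only.
fps = {"A": {1, 2}, "A2": {1, 3}, "B": {2, 3}, "B2": {3, 4}}
diff = combination_difference_peaks(["A", "A2"], ["B", "B2"], fps)
```

In this sample, peak 1 is the only peak that every member of combination 1 has and that no member of combination 2 has, so it can separate A from B.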

[0174] Optionally, the differential characteristic peaks between reference objects can also be characteristic peaks corresponding to mutually exclusive gene fragments between the reference objects. It should be noted that the same gene fragment exists only once in a single organism; therefore, the protein characteristic expression of this gene fragment in a single organism has only one molecular weight. Based on this, the protein characteristic expression (i.e., characteristic peaks) corresponding to mutually exclusive gene fragments can be identified as differential characteristic peaks.

[0175] For example, taking "single eyelid" and "double eyelid" as examples, the gene expression fragments of "single eyelid" and "double eyelid" are mutually exclusive. When the translation result of gene fragment 1 of reference object A is double eyelid and the translation result of gene fragment 2 of reference object B is single eyelid, the characteristic peak 1 of gene fragment 1 in the protein fingerprint can be used as the differential characteristic peak of reference object A, and the characteristic peak 2 of gene fragment 2 in the protein fingerprint can be used as the differential characteristic peak of reference object B.

[0176] In another embodiment of this application, based on the protein difference information in this application being difference characteristic peaks, an example is also provided for determining the protein difference information (i.e., difference characteristic peaks) of each reference object.

[0177] For example, taking the protein difference information as the difference characteristic peaks in the protein fingerprint spectrum corresponding to a certain reference object as an example, the determination of difference characteristic peaks is explained:

[0178] First, the differential characteristic peaks of the reference object are determined based on the reference object's genome:

[0179] For example, characteristic proteins of a reference object can be determined first based on the genome of the reference object, and differential proteins of each reference object can be determined based on the characteristic proteins of each reference object. In this way, the characteristic peaks corresponding to the differential proteins in the corresponding protein fingerprints can be identified as the differential characteristic peaks of each reference object.

[0180] Optionally, differentially expressed genomes in reference objects can be identified first based on the genome of the reference object, and differentially expressed proteins in each reference object can be identified based on the differentially expressed genomes of each reference object. In this way, the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints can be identified as the differential characteristic peaks of each reference object.

[0181] Optionally, characteristic proteins of a reference object can be determined first based on the genome of the reference object, and characteristic peaks of each reference object in the corresponding protein fingerprint can be determined based on the characteristic proteins of each reference object, thereby identifying the characteristic peaks of each reference object in the corresponding protein fingerprint as the differential characteristic peaks of each reference object.

[0182] Secondly, differential characteristic peaks of reference objects are determined based on protein databases:

[0183] For example, characteristic proteins of reference objects can be obtained from a protein database, and the characteristic peaks corresponding to the characteristic proteins of each reference object in the corresponding protein fingerprint spectrum can be identified as the differential characteristic peaks of each reference object.

[0184] Optionally, characteristic proteins of reference objects can be obtained from a protein database, and the differential proteins of each reference object can be determined based on the characteristic proteins of each reference object, so that the characteristic peaks corresponding to the differential proteins of each reference object in the corresponding protein fingerprint are identified as the differential characteristic peaks of each reference object.

[0185] Third, the differential characteristic peaks of the reference object are determined based on the protein fingerprint of the reference object:

[0186] For example, the protein fingerprints of each reference object can be compared to obtain the differential characteristic peaks of each reference object.

[0187] In one possible implementation, before determining the protein difference information of each reference object, interfering features in the protein fingerprint profiles of each reference object can be removed to eliminate interference when determining the protein difference information and improve the accuracy of the determination. The interfering features can be common features in the protein fingerprint profiles of each reference object.

[0188] Specifically, the expression of the same protein features in the protein fingerprints of each reference object can be ignored, and the differential characteristic peaks of each reference object can be determined based on the protein fingerprints after ignoring the expression of the same protein features.
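Removing interfering features can be sketched as discarding the peaks common to every reference object before the differential analysis. The function `strip_common_peaks` is a hypothetical name, and fingerprints are again assumed to be sets of peak labels.

```python
def strip_common_peaks(fps: dict) -> dict:
    """Drop the peaks that appear in every fingerprint."""
    common = set.intersection(*fps.values())
    return {name: peaks - common for name, peaks in fps.items()}


# Peak 3 is common to A, B, and C and therefore carries no distinguishing power.
fingerprints = {"A": {1, 2, 3}, "B": {2, 3}, "C": {1, 3}}
reduced = strip_common_peaks(fingerprints)
```

After peak 3 is ignored, the remaining peaks are the only ones that need to be examined when determining differential characteristic peaks.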

[0189] This application also provides a method for microbial identification. In some possible embodiments, the method includes: creating a difference information database according to the creation method described above; obtaining a protein fingerprint spectrum of the sample to be tested; matching the protein fingerprint spectrum of the sample to be tested with a standard protein fingerprint spectrum to obtain a first matching result; determining whether the first matching result belongs to a reference object combination; if so, re-matching the protein fingerprint spectrum of the sample to be tested based on the difference information database to obtain a second matching result; and confirming the second matching result as the identification result; if not, confirming the first matching result as the identification result. This method can quickly and accurately obtain identification results.

[0190] In some possible implementations, the microbial identification method includes: creating a differential information database according to the creation method of the above-described embodiments; obtaining the protein fingerprint spectrum of the sample to be tested; matching the protein fingerprint spectrum of the sample to be tested with a standard protein fingerprint spectrum to obtain a first matching result; determining whether the first matching result belongs to a reference object combination; if so, re-matching the protein fingerprint spectrum of the sample to be tested based on the differential information database to obtain a second matching result, and confirming the second matching result as the identification result; if not, determining whether there are at least two types of strains in the first matching result; if so, performing protein expression differential analysis on all types of strains in the first matching result to obtain protein difference information of each type of strain in the first matching result; re-matching the protein fingerprint spectrum of the sample to be tested based on the protein difference information of each type of strain to obtain a third matching result, and using the third matching result as the identification result; if not, using the first matching result as the identification result. This method can quickly and accurately obtain identification results, and when the first matching result requires further analysis but there is no corresponding information in the differential information database, protein difference information can be obtained by re-performing protein expression differential analysis, thereby enabling further analysis.
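The branching described in the preceding paragraph can be sketched as a small decision function. It assumes that the first matching result is a list of strain names and that it "belongs to a reference object combination" when all matched strains fall within one stored combination; this reading, the callables `rematch_db` and `rematch_fresh`, and the returned branch labels are hypothetical.

```python
from typing import Callable, Iterable


def identify_with_fallback(
    first_match: list,
    combos: Iterable,
    rematch_db: Callable,
    rematch_fresh: Callable,
) -> tuple:
    """Return (identification result, branch that produced it)."""
    matched = set(first_match)
    for combo in combos:
        if matched <= combo:
            # Difference information exists: re-match using the database.
            return rematch_db(combo), "second"
    if len(matched) >= 2:
        # No stored combination: analyze the matched strains on the fly.
        return rematch_fresh(first_match), "third"
    return first_match[0], "first"


combos = [frozenset({"E", "F"})]
# Stub re-matching functions standing in for the real comparisons.
r1 = identify_with_fallback(["E", "F"], combos, lambda c: "F", lambda m: "H")
r2 = identify_with_fallback(["G", "H"], combos, lambda c: "F", lambda m: "H")
r3 = identify_with_fallback(["G"], combos, lambda c: "F", lambda m: "H")
```

The three calls exercise the second, third, and first matching results respectively.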

[0191] This application also provides a microbial identification system, which includes a mass analysis device, a memory, and a processor. The memory stores instructions, and the processor is used to execute the instructions to implement the database creation method of the above embodiments, or the processor is used to execute the instructions to implement the microbial identification method of the above embodiments.

[0192] [Mixed Reference Results]

[0193] In another embodiment of this application, the identification result of the test object may also include mixed reference objects, such as mixed bacteria.

[0194] For example, the substance identification device provided in this application has a detection module for detecting the test object and obtaining its protein fingerprint spectrum, a storage module for storing substance identification-related data, and a data processing module for calling data from the storage module for data processing, wherein the data processing module is configured as follows:

[0195] The protein fingerprint of the test object is compared with the standard protein fingerprint of each reference object to obtain multiple candidate reference objects that are similar to the test object. The candidate reference objects include at least two types, and the two types include different species or different subspecies.

[0196] For example, the protein fingerprint of the test object is compared with each reference object to determine the N reference objects with the highest similarity as candidate reference objects, where N is an integer greater than 1. Based on the N reference objects, mixed information is judged, and the ordinary identification result or mixed reference result of the test object is output based on the mixed information.
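Selecting the N most similar reference objects can be sketched with the standard library function `heapq.nlargest`. The Jaccard-based score, the function name `top_n_candidates`, and the sample data are assumptions made for illustration only.

```python
import heapq


def top_n_candidates(test_peaks: set, standard: dict, n: int) -> list:
    """Return the n reference objects most similar to the test object."""

    def score(name: str) -> float:
        ref = standard[name]
        union = test_peaks | ref
        return len(test_peaks & ref) / len(union) if union else 0.0

    return heapq.nlargest(n, standard, key=score)


# Hypothetical standard fingerprints and a test object with peaks from F and G.
standard = {"E": {10, 20, 30}, "F": {10, 20, 40}, "G": {50, 60}}
candidates = top_n_candidates({10, 20, 40, 50}, standard, n=2)
```

The resulting list is ordered from most to least similar and is the input to the mixed information judgment.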

[0197] In one possible implementation, when the output is a mixed reference result, the test object can be identified as mixed bacteria. It is understood that the method for outputting ordinary identification results is basically the same as the methods in other embodiments described above. For example, the identification result is obtained by performing a second similarity comparison between the test object and N reference objects (i.e., candidate reference objects) based on protein difference information, as shown in step S203. The following mainly explains the implementation method for judging mixed information and the output of mixed reference results.

[0198] Optionally, the method for determining mixed information based on N reference objects and outputting a normal identification result or a mixed reference result is as follows: determine whether there are multiple reference objects among the N reference objects that do not belong to the same combination of reference objects. If so, output a mixed reference result; otherwise, output a normal identification result. It should be noted that during this process, the determination result of mixed information (i.e., "yes" or "no") can be output, or the determination result of mixed information can be omitted, and only the mixed reference result or normal identification result needs to be output according to this logic.
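The judgment in the preceding paragraph can be sketched as a check on how many distinct reference object combinations the N candidates span. The mapping `combo_of`, the function `is_mixed`, and the rule that a candidate without any combination counts as its own group are hypothetical assumptions.

```python
def is_mixed(candidates: list, combo_of: dict) -> bool:
    """True if the candidates do not all belong to one reference object combination."""
    # A candidate with no stored combination is treated as a group of its own.
    groups = {combo_of.get(c, ("solo", c)) for c in candidates}
    return len(groups) > 1


# Hypothetical membership: E and F share a combination, G belongs to another.
combo_of = {"E": "list1", "F": "list1", "G": "list2"}
ordinary_case = is_mixed(["E", "F"], combo_of)
mixed_case = is_mixed(["E", "G"], combo_of)
```

A `False` value corresponds to outputting an ordinary identification result, and `True` to outputting a mixed reference result.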

[0199] Optionally, the method for determining mixed information based on N reference objects and outputting a normal identification result or a mixed reference result is as follows: determine whether there are multiple reference objects among the N reference objects that do not belong to the same group of similar reference objects. If so, output a mixed reference result; otherwise, output a normal identification result. It should be noted that during this process, the determination result of mixed information (i.e., "yes" or "no") can be output, or the determination of mixed information can be omitted, and only the mixed reference result or normal identification result needs to be output according to this logic.

[0200] Here, the same group of similar reference objects refers to those reference objects among the N reference objects that meet the similarity judgment conditions. Meeting the similarity judgment conditions means meeting one or more of the following: (a) multiple reference objects belong to the same reference object combination; (b) the genetic information similarity of multiple reference objects reaches a certain threshold (called the similarity threshold); or (c) multiple reference objects belong to different subspecies within the same species.

[0201] For example, when reference objects with an average genomic genetic similarity of 94% or more are basically of the same species, reference objects with an average genomic genetic similarity of 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, etc. can be defined as similar reference objects, depending on the accuracy of mass spectrometry identification.

[0202] Correspondingly, if a mass spectrometer is configured to treat reference objects as similar when they meet any one of the three similarity conditions (a), (b), or (c) above, and the reference object combinations in the mass spectrometer are relatively complete, then the above threshold can be set to a value close to 94%, such as 93%, 92%, 91%, etc.; if a mass spectrometer essentially contains no reference object combinations, then the above threshold is determined according to the identification accuracy of the mass spectrometer.

[0203] Optionally, based on N reference objects, the mixed information is judged and the mixed reference result is output. In the mixed reference result, at least one result is obtained by performing a second similarity comparison based on the protein difference information of multiple candidate reference objects.

[0204] Furthermore, if all candidate reference objects are mixed reference objects, the first similarity comparison result is output.

[0205] Furthermore, if not all candidate reference objects are mixed reference objects, a mixed reference result is output, which includes the result of the second similarity comparison.

[0206] For example, the method for judging mixed information based on N reference objects and outputting a normal identification result or a mixed reference result is:

[0207] If Q out of the N reference objects meet the similarity determination criteria, then the Q reference objects are a group of similar reference objects, where 1≤Q≤N;

[0208] If N = Q, the test object is compared with the N reference objects (i.e., the candidate reference objects) to obtain the identification result of the test object. Since the N reference objects belong to the same group of similar reference objects, it is only necessary to distinguish among the similar reference objects, without considering the case of multiple substances or multiple microorganisms; that is, the ordinary identification result is output.

[0209] If N ≠ Q, a mixed reference result is output, which contains at least two categories of reference objects, at least two of which do not belong to the same group of similar reference objects.

[0210] It is understandable that the Q reference objects do not all need to meet the same one of the three conditions (a), (b), and (c) above. For example, for three reference objects A, B, and C, if A and B belong to the same reference object combination, and B and C belong to different subspecies of the same species or the genetic information similarity between B and C reaches a certain threshold, then A, B, and C belong to the same group of similar reference objects.
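One way to read the rule on N and Q is to link any two candidates that satisfy at least one of conditions (a), (b), or (c), and to treat all transitively linked candidates as one group of similar reference objects. This transitive linking, the function `judge`, the input mappings, and the default threshold of 93% are assumptions made for this sketch.

```python
from itertools import combinations


def judge(candidates: list, combo_of: dict, species_of: dict,
          ani: dict, ani_threshold: float = 93.0) -> str:
    """Return 'ordinary' if all N candidates form one similar group, else 'mixed'."""
    parent = {c: c for c in candidates}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(candidates, 2):
        # (a) same reference object combination
        same_combo = combo_of.get(a) is not None and combo_of.get(a) == combo_of.get(b)
        # (b) genetic similarity reaches the similarity threshold
        close_ani = ani.get(frozenset((a, b)), 0.0) >= ani_threshold
        # (c) different subspecies within the same species
        same_species = species_of.get(a) is not None and species_of.get(a) == species_of.get(b)
        if same_combo or close_ani or same_species:
            parent[find(a)] = find(b)

    groups = {}
    for c in candidates:
        groups.setdefault(find(c), []).append(c)
    q = max(len(g) for g in groups.values())  # size of the largest similar group
    return "ordinary" if q == len(candidates) else "mixed"


# Hypothetical data: A and B share a combination; B and C share a species.
combo_of = {"A": "list1", "B": "list1"}
species_of = {"A": "sp0", "B": "sp1", "C": "sp1", "D": "sp9"}
three = judge(["A", "B", "C"], combo_of, species_of, ani={})
four = judge(["A", "B", "C", "D"], combo_of, species_of, ani={})
```

With A, B, and C the largest group covers all candidates (N = Q), so an ordinary identification result is indicated; adding the unrelated D gives N ≠ Q and a mixed reference result.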

[0211] Optionally, in the steps of creating or updating the combined information of the reference objects mentioned above, the update is performed based on the judgment result of the mixed information.

[0212] Specifically, the remaining steps for creating or updating the reference object combination information are the same, namely, adding update information to the reference object combination information. The update information is used to characterize the reference object combination created or updated based on N reference objects, as well as the protein fingerprint of the N reference objects.

[0213] The update based on the judgment result of mixed information refers to determining whether there are multiple reference objects that do not belong to the same group of similar reference objects among the N reference objects. If so, select any group of similar reference objects to update the reference object combination. If there are reference objects that do not belong to the same reference object combination in the same group of similar reference objects, then update each reference object in this group of similar reference objects to the same reference object combination.

[0214] In this embodiment of the application, by judging the mixed information of N reference objects, the identification result of the test object can be clearly determined when the test object is mixed with multiple microorganisms, so as to avoid the omission of the identification result of the test object.

[0215] [Examples of various output mixed reference results]

[0216] As mentioned above, mixed information is judged according to whether multiple reference objects belong to the same reference object combination or to the same group of similar reference objects. Reference objects belong to the same group of similar reference objects if they belong to the same reference object combination, if their genetic information similarity reaches a certain threshold, or if they are different subspecies of the same species. Multiple conditions can also be combined. For example, two candidate reference objects may be determined not to belong to the same group of similar reference objects and to have a genetic information similarity below a certain threshold, in which case a mixed reference result containing the two candidate reference objects is output. It will be understood that other combinations of conditions can also be used to determine whether multiple reference objects belong to the same group of similar reference objects.

[0217] The following examples illustrate the method for determining mixed information. However, the embodiments of this application do not limit the method for determining mixed information.

[0218] First, the specific implementation of "judging mixed information based on whether candidate reference objects belong to the same combination of reference objects" is as follows:

[0219] For example, if the first similarity comparison result between the test object and each reference object includes candidate reference objects A and B, and neither A nor B belongs to any combination of reference objects, then the second similarity comparison can be skipped, and the identification result can be directly output. The identification result must contain at least A and B.

[0220] When the first similarity comparison result between the test object and each reference object includes candidate reference objects A1, A2, and B, and A1 and A2 belong to a reference object combination, while B does not belong to any reference object combination, a second similarity comparison can be performed between the test object and the candidate reference objects A1, A2, or the reference object combination corresponding to A1 and A2. The second similarity comparison result and B constitute the identification result of the test object, and the identification result must contain at least B.

[0221] Optionally, when the first similarity comparison result between the test object and each reference object includes candidate reference objects A1, A2, and B, where A1 and A2 belong to one reference object combination a and B belongs to another reference object combination b, a second similarity comparison can be performed between the test object and the candidate reference objects in reference object combination a, and another between the test object and the candidate reference objects in reference object combination b. Based on the second similarity comparisons, a mixed reference result is output, which includes at least one reference object in reference object combination a and at least one reference object in reference object combination b.

[0222] Optionally, when the first similarity comparison result between the test object and each reference object includes candidate reference objects A1, A2, and A3, and A1, A2, and A3 all belong to the same reference object combination, the identification result of the test object can be determined by performing a second similarity comparison between the test object and each of the candidate reference objects A1, A2, and A3.

[0223] Optionally, when the first similarity comparison results between the test object and each reference object include candidate reference objects A1, A2, B1, and B2, where A1 and A2 belong to the same reference object combination and B1 and B2 belong to another reference object combination, the identification result can be obtained by comparing the test object with the candidate reference objects A1, A2, B1, and B2 respectively. The identification result contains at least one of A1 and A2 and at least one of B1 and B2.

[0224] Optionally, when the first similarity comparison result between the test object and each reference object includes candidate reference objects A1, A2, B1, B2, and C, where A1 and A2 belong to the same reference object combination, B1 and B2 belong to another reference object combination, and C does not belong to any reference object combination, the identification result can be obtained by comparing the test object with the candidate reference objects A1, A2, B1, and B2 respectively. The identification result must contain at least one of A1 and A2, at least one of B1 and B2, and C.

[0225] Optionally, when the first similarity comparison results between the test object and each reference object include candidate reference objects A, B, and C, and A, B, and C do not belong to the same reference object combination, the candidate reference objects A, B, and C can be directly used as the identification results of the test object.

[0226] If the first similarity comparison result between the test object and each reference object only contains A, and A does not belong to any combination of reference objects, then a second similarity comparison is not required, and the identification result is directly output.
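The combination-based cases in paragraphs [0219] to [0226] follow one pattern, which can be sketched as below. This is an illustrative sketch under stated assumptions: `assemble_result`, `combination_of`, `second_compare`, and the scores are hypothetical names and values, and the second similarity comparison is replaced by a stand-in that keeps the best-scoring member of each combination.

```python
from collections import defaultdict


def assemble_result(candidates, combination_of, second_compare):
    """Build an identification result from first-comparison candidates.

    candidates      -- candidate reference object names from the first comparison
    combination_of  -- dict mapping a name to its reference object combination;
                       names absent from the dict belong to no combination
    second_compare  -- callable taking the members of one combination and
                       returning the members retained by the second comparison
    """
    by_combination = defaultdict(list)
    standalone = []
    for name in candidates:
        combo = combination_of.get(name)
        if combo is None:
            standalone.append(name)  # output directly, no second comparison
        else:
            by_combination[combo].append(name)

    result = list(standalone)
    for members in by_combination.values():
        result.extend(second_compare(members))

    # Assumption: the result counts as mixed when more than one combination
    # or stand-alone candidate is represented.
    is_mixed = len(by_combination) + len(standalone) > 1
    return result, is_mixed


# Hypothetical second-comparison scores; the stand-in keeps the top scorer.
scores = {"A1": 0.91, "A2": 0.97, "B1": 0.88, "B2": 0.80}


def pick_best(members):
    return [max(members, key=lambda name: scores[name])]


combos = {"A1": "a", "A2": "a", "B1": "b", "B2": "b"}
result, is_mixed = assemble_result(["A1", "A2", "B1", "B2", "C"], combos, pick_best)
single, single_mixed = assemble_result(["A"], combos, pick_best)
print(result, is_mixed)
print(single, single_mixed)
```

The first call mirrors the A1, A2, B1, B2, C case: the result keeps C plus one member from each combination. The second call mirrors the case where only A is found and no second comparison is needed.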

[0227] Second, the specific implementation of "judging mixed information based on the similarity of the gene information of candidate reference objects" is as follows:

[0228] For example, when the first similarity comparison results between the test object and each reference object include candidate reference objects A and B, the gene information of A and B is obtained and the gene similarity is compared. If the gene similarity is equal to or lower than a certain threshold, a mixed reference result is output, which must contain at least A and B.

[0229] When the first similarity comparison results between the test object and each reference object include candidate reference objects A1, A2, and B, if the gene similarity between A1 and A2 is higher than a certain threshold, and the gene similarity between B and A1, and between B and A2 is equal to or lower than a certain threshold, then the output mixed reference result will contain at least B, and also contain the second similarity comparison results between the test object and A1 and A2.

[0230] Other possible scenarios are not listed here.

[0231] Third, the specific implementation logic of "judging mixed information based on whether candidate reference objects belong to different subspecies of the same species" is consistent with the judgment logic of the first case, and will not be elaborated here.

[0232] Fourth, the specific implementation of "using multiple conditions to judge mixed information" includes:

[0233] Optionally, when the first similarity comparison results between the test object and each reference object include candidate reference objects A and B, if the similarity between the genes of A and B is equal to or lower than a certain threshold, and A and B do not belong to the same reference object combination or different subspecies of the same species, then a mixed reference result is output, which must contain at least A and B.

[0234] The above only illustrates some implementation methods for judging mixed information. Other methods can also be used to judge the mixed information of multiple reference objects. In other judgment methods, only one principle needs to be met, that is, if there are objects that do not belong to the same combination of reference objects or do not belong to the same group of similar reference objects, then the mixed reference result is output. Each combination of reference objects or each group of similar reference objects needs to have a corresponding result in the mixed reference result.

[0235] [Validation of Mixed Reference Results]

[0236] In another embodiment of this application, the mixed reference results are verified based on the gene information, protein information, or protein fingerprint of the candidate reference object to eliminate the influence of mixing multiple substances on mass spectrometry detection. It should be noted that the verification of the mixed reference results generally occurs before outputting the mixed reference results.

[0237] The verification method can be one or more of the following methods:

[0238] (1) Determine whether the similarity of gene information of each reference object in the mixed reference result is lower than a certain threshold; if so, the verification is successful.

[0239] (2) Compare the gene information of a given reference object in the mixed reference result with the gene information of the other reference objects in the mixed reference result. Specifically, infer the interfering protein feature expressions from the gene information, remove them from the standard protein fingerprint of the reference object (removal includes setting their weight to zero), and then perform a third similarity comparison between the resulting protein fingerprint and the protein fingerprint of the test object.

[0240] For example, if the mixed reference result contains two bacteria, A and B, the gene information of A and B is compared to identify and remove the interfering protein features (or the genes corresponding to the expression of those proteins). This yields a set of distinguishing characteristic peaks for bacterium A from which the features interfering with bacterium B have been excluded. A third similarity comparison is then performed using this set to verify bacterium A. Bacterium B can be verified in the same way.

[0241] (3) Compare the protein information of a given reference object in the mixed reference result (the set of characteristic proteins obtained from the protein data) with the protein information of the other reference objects in the mixed reference result. Specifically, remove the protein feature expressions that interfere with each other (removal includes setting their weight to zero), and then perform a third similarity comparison between the resulting protein fingerprint and the protein fingerprint of the test object.

[0242] (4) Compare the protein fingerprint of a given reference object in the mixed reference result with the protein fingerprints of the other reference objects in the mixed reference result, remove the protein features that interfere with each other (removal includes setting their weight to zero), and then perform a third similarity comparison between the resulting protein fingerprint and the protein fingerprint of the test object.

[0243] Optionally, the identification result is determined based on the result of the first similarity comparison and the verification result. Optionally, in verification method (1), if the similarity of the gene information of the corresponding species of each reference object in the mixed reference result is lower than a certain threshold, then the mixed reference result is correct. Optionally, in verification method (2) or (3) or (4), the result of the first similarity comparison and the result of the third similarity comparison are directly compared. If the two are consistent or the result of the third similarity comparison is not lower than a preset threshold, then the mixed reference result is correct. It should be noted that the consistency of the two here does not mean that the scores or similarity are the same, but only that the categories of the results are the same. For example, if the result of the first similarity comparison is A and B, and the result of the third similarity comparison is still A and B, then the result is correct.

[0244] Alternatively, this can be understood as follows: protein feature expressions that are identical to those of other categories in the mixed reference result are removed from the protein fingerprint corresponding to a given category, yielding a mixed verification fingerprint; a third similarity comparison is then performed between the test object and the mixed verification fingerprint, and the identification result is determined based on the results of the first and third similarity comparisons. Verifying the mixed reference result before it is output is particularly necessary when mixed information is judged without considering the similarity of the gene information of the candidate reference objects.

[0245] Optionally, the method for removing protein feature expressions that are identical to other categories in the protein fingerprint of a certain category in the mixed reference result to obtain the mixed verification fingerprint is as follows:

[0246] Based on the comparison of gene information of the species corresponding to the mixed reference results with the gene information of the other reference objects for the same species, and based on gene deduction, the protein feature expressions that interfere with each other between the mixed reference object and the other reference objects are removed from the standard protein fingerprint of the mixed reference object; or,

[0247] Based on the protein data information in the protein database corresponding to the mixed reference results, the protein data information of each mixed reference object is compared, and interfering protein feature expressions are removed; or,

[0248] The standard protein fingerprints corresponding to each category in the mixed reference results are compared, and interfering characteristic peaks are removed. For example, protein feature expression identical to other references in the protein fingerprint of the mixed reference object can be removed. That is, mutual interference can be understood as identical peaks. Of course, mutual interference can also be understood as peaks that result in the same or similar detected characteristic peaks.
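The fingerprint-based variant of this verification (comparing standard protein fingerprints and removing interfering peaks) can be sketched as follows. This is a minimal illustration, assuming that a fingerprint is a list of `(m/z, intensity)` peaks, that two peaks interfere when their m/z values lie within an absolute tolerance `tol`, and that the third similarity comparison is a simple matched-peak fraction. The names `remove_shared_peaks` and `match_fraction`, the tolerance, and all peak values are hypothetical.

```python
def remove_shared_peaks(target, others, tol=2.0):
    """Drop peaks of `target` that lie within `tol` of any peak in `others`.

    target -- list of (mz, intensity) peaks of one reference object
    others -- list of peak lists for the other reference objects
    """
    other_mz = [mz for peaks in others for mz, _ in peaks]
    return [
        (mz, intensity)
        for mz, intensity in target
        if all(abs(mz - o) > tol for o in other_mz)
    ]


def match_fraction(test, reference, tol=2.0):
    """Fraction of reference peaks that have a test peak within `tol`."""
    if not reference:
        return 0.0
    hits = sum(
        any(abs(mz - t) <= tol for t, _ in test) for mz, _ in reference
    )
    return hits / len(reference)


# Hypothetical standard fingerprints for two references in a mixed result.
ref_a = [(4000.0, 1.0), (5000.0, 0.8), (6200.0, 0.5)]
ref_b = [(4000.0, 0.9), (5600.0, 0.7), (7100.0, 0.6)]
# Hypothetical fingerprint of a test object containing both.
test = [(4000.5, 1.0), (5000.4, 0.7), (5600.2, 0.6), (6199.0, 0.4), (7100.8, 0.5)]

verify_a = remove_shared_peaks(ref_a, [ref_b])  # mixed verification fingerprint for A
verify_b = remove_shared_peaks(ref_b, [ref_a])  # mixed verification fingerprint for B
third_a = match_fraction(test, verify_a)
third_b = match_fraction(test, verify_b)
print(verify_a, third_a)
print(verify_b, third_b)
```

Here the shared peak at 4000 is removed from both references, and each reference is then verified only on the peaks that distinguish it. Setting the weight of a shared peak to zero instead of deleting it would have the same effect on a weighted score.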

[0249] [Display of Mixed Reference Results]

[0250] In another embodiment of this application, the substance identification device provided by this application has a detection module for detecting the test object and obtaining its protein fingerprint spectrum, a storage module for storing substance identification-related data, a data processing module for calling data in the storage module for data processing, and a display module for outputting identification results.

[0251] The data processing module is configured to: perform a first similarity comparison between the protein fingerprint of the object to be tested and the standard protein fingerprint of each reference object to obtain the first similarity comparison result;

[0252] The display module is configured to display either an ordinary identification result or a mixed reference result based on the first similarity comparison result from the data processing module.

[0253] Optionally, the display module may display ordinary identification results or mixed reference results based on the first similarity comparison results of the data processing module. This may further include: the data processing module obtaining multiple candidate reference objects similar to the object to be tested based on the first similarity comparison results. The candidate reference objects include at least two types, and the two types may include different species or different subspecies.

[0254] The system determines mixed information based on the multiple candidate reference objects, and outputs either an ordinary identification result or a mixed reference result for the test object based on this mixed information. The specific method for determining mixed information is the same as described above and will not be repeated here.

[0255] In another embodiment of this application, the output mixed reference results can also be marked and displayed.

[0256] For example, when outputting a mixed reference result, the data processing module 103 can generate a prompt message to indicate that the output result is a mixed reference result.

[0257] For example, the data processing module 103 can mark and display the mixed reference results; wherein, it can mark and display each category of reference objects in the mixed reference results, or mark and display similar reference objects in the mixed reference results, or mark reference objects in the mixed reference results that do not belong to the same combination of reference objects or the same group of similar reference objects.

[0258] For example, when the mixed reference result includes candidate reference objects A1, A2, and B, and candidate reference objects A1 and A2 are similar reference objects, while candidate reference object B is not similar to either candidate reference objects A1 or A2, then candidate reference objects A1, A2, and B can all be marked and displayed, or candidate reference objects A1 and B can be marked, or candidate reference objects A2 and B can be marked; or candidate reference objects A1 and A2 can be marked. Specifically, candidate reference objects A1 and A2 can be indicated to be similar reference objects through at least one of the marking methods of text marking, image marking, or color marking.

[0259] Preferably, in the two methods of labeling candidate reference object A1 and candidate reference object B, and labeling candidate reference object A2 and candidate reference object B, if the similarity between candidate reference object A1 and candidate reference object B is higher than the similarity between candidate reference object A2 and candidate reference object B, then candidate reference object A1 and candidate reference object B are selected to be labeled.

[0260] In this embodiment of the application, by marking the mixed reference results, it is not only clear that the output result is a mixed reference result, but also clear which combination the mixed reference result belongs to, so as to facilitate clinical judgment.

[0261] In another embodiment of this application, another method for substance identification is also provided. For example, Figure 3 is a flowchart illustrating another method for substance identification provided in an embodiment of this application. As shown in Figure 3, the method includes the following steps:

[0262] Step S301: Obtain the protein fingerprint of the object to be tested.

[0263] Step S302: Perform a first similarity comparison between the protein fingerprint of the object to be tested and the standard protein fingerprint of the reference object in the protein fingerprint library to obtain a primary target set similar to the object to be tested.

[0264] For example, the primary target set may include multiple candidate reference objects similar to the object to be tested, wherein each candidate reference object may be a secondary target in the primary target set.

[0265] Specifically, the candidate reference objects in the primary target set include at least two types, wherein the two types include different species or different subspecies.
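Step S302 can be sketched as follows. This is an illustration only: the application does not specify the similarity measure, so a Dice coefficient over matched peaks is used here as an assumed stand-in, and the names `dice_similarity` and `primary_target_set`, the thresholds `min_score` and `margin`, the tolerance `tol`, and all peak values are hypothetical.

```python
def count_matches(a, b, tol):
    """Number of m/z values in `a` that have a partner in `b` within `tol`."""
    return sum(any(abs(x - y) <= tol for y in b) for x in a)


def dice_similarity(test_mz, ref_mz, tol=2.0):
    """Dice coefficient over matched peaks, in the range 0.0 to 1.0."""
    if not test_mz or not ref_mz:
        return 0.0
    # Take the smaller directional count so one peak cannot be counted twice.
    matched = min(count_matches(test_mz, ref_mz, tol),
                  count_matches(ref_mz, test_mz, tol))
    return 2 * matched / (len(test_mz) + len(ref_mz))


def primary_target_set(test_mz, library, min_score=0.6, margin=0.3, tol=2.0):
    """Names of references that score well and close to the best match."""
    scores = {name: dice_similarity(test_mz, peaks, tol)
              for name, peaks in library.items()}
    best = max(scores.values(), default=0.0)
    return sorted(name for name, s in scores.items()
                  if s >= min_score and best - s <= margin)


# Hypothetical fingerprint library: ref_X and ref_Y differ in a single peak.
library = {
    "ref_X": [3000.0, 4500.0, 6000.0, 7200.0],
    "ref_Y": [3000.0, 4500.0, 6000.0, 7215.0],
    "ref_Z": [2500.0, 5200.0, 8800.0, 9100.0],
}
test_mz = [3000.5, 4499.2, 6001.0, 7215.4]

candidates = primary_target_set(test_mz, library)
print(candidates)
```

Both `ref_X` and `ref_Y` enter the primary target set because their first-stage scores are close, which is the situation the second similarity comparison is meant to resolve.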

[0266] Step S303: Based on the protein difference information of each secondary target in the primary target set, perform a second similarity comparison between the test object and the secondary targets.

[0267] For example, protein expression differential analysis can be performed on each secondary target to obtain protein difference information for each secondary target. This protein difference information can be characteristic peaks in the standard protein fingerprint of a reference object.

[0268] For example, Vibrio cholerae and Vibrio mimicus have similar protein expression. When the sample to be tested is Vibrio cholerae or Vibrio mimicus, the protein fingerprint of the sample to be tested, after similarity comparison using existing methods, has a very close similarity score to Vibrio cholerae and Vibrio mimicus, making accurate differentiation impossible. Table 1 is a protein difference information table provided in the embodiments of this application. As shown in Table 1, the primary target set can include Vibrio cholerae and Vibrio mimicus, that is, the primary target set can be a combination of similar bacteria of Vibrio cholerae and Vibrio mimicus.

[0269] Table 1. Protein Differential Information

[0270] Specifically, when the secondary target is the L31 protein, the differential characteristic peaks of Vibrio cholerae and Vibrio mimicus can be the characteristic peaks corresponding to mass-to-charge ratios of 7956 Da and 7970 Da. The protein fingerprints of Vibrio cholerae and Vibrio mimicus can be obtained using the ESX2600 mass spectrometer from Zhongyuan Huiji.

[0271] For example, the mass-to-charge ratio (m / z value), intensity, and weight information of the differential characteristic peaks corresponding to protein difference information in the protein fingerprint spectrum can be adjusted to increase the proportion of the differential characteristic peaks in determining the similarity comparison results between the test object and the candidate reference object through the relevant data of the adjusted differential characteristic peaks.

[0272] Specifically, for each candidate reference object, the scoring weight of its differential characteristic peaks can be increased. A similarity score between the candidate reference object and the test object is then calculated based on the weight and amplitude of each characteristic peak in the protein fingerprint spectrum of the candidate reference object. Based on this similarity score, the matching result between the test object and the candidate reference object is output.

[0273] For example, when comparing the bar-shaped peak information corresponding to the protein fingerprint of the test object with the protein fingerprint of the secondary target, a scoring operation can be performed based on the weight and amplitude of the characteristic peak when the two have the same characteristic peak, so as to obtain the similarity score between the secondary target and the test object; when the protein fingerprint of the test object shows the difference characteristic peak of the secondary target, a scoring operation can be performed based on the increased scoring weight (e.g., product coefficient).

[0274] Optionally, for each secondary target, the secondary target and the test object can be compared based on the differential characteristic peaks of the secondary target. The matching result between the test object and the secondary target can be output based on whether the protein fingerprint spectrum of the test object contains the same characteristic peaks as the differential characteristic peaks.
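The weighted second similarity comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the scoring formula of this application: the score is taken as the matched share of weight times amplitude, the multiplier `boost` stands in for the "product coefficient", and `weighted_score`, `tol`, and the peak lists are hypothetical. The m/z values 7956 and 7970 are taken from the text, but which value belongs to which reference is an arbitrary choice made here for illustration.

```python
def weighted_score(test_mz, ref_peaks, diff_mz, boost=3.0, tol=2.0):
    """Share of weighted reference signal that is matched by the test object.

    test_mz   -- m/z values of the test object's peaks
    ref_peaks -- list of (mz, amplitude, weight) for one candidate reference
    diff_mz   -- m/z values of that candidate's differential characteristic peaks
    boost     -- multiplier applied to the weight of differential peaks
    """
    total = 0.0
    matched = 0.0
    for mz, amplitude, weight in ref_peaks:
        is_differential = any(abs(mz - d) <= tol for d in diff_mz)
        contribution = weight * amplitude * (boost if is_differential else 1.0)
        total += contribution
        if any(abs(mz - t) <= tol for t in test_mz):
            matched += contribution
    return matched / total if total else 0.0


# Hypothetical candidates that differ only in one differential peak.
common = [(3000.0, 1.0, 1.0), (4500.0, 0.8, 1.0), (6000.0, 0.6, 1.0)]
ref_x = common + [(7956.0, 0.5, 1.0)]
ref_y = common + [(7970.0, 0.5, 1.0)]
test_mz = [3000.5, 4499.2, 6001.0, 7970.4]

score_x = weighted_score(test_mz, ref_x, diff_mz=[7956.0])
score_y = weighted_score(test_mz, ref_y, diff_mz=[7970.0])
plain_x = weighted_score(test_mz, ref_x, diff_mz=[7956.0], boost=1.0)

# Step S304 stand-in: the candidate with the highest score is the identification result.
identified = max({"ref_x": score_x, "ref_y": score_y}.items(), key=lambda kv: kv[1])[0]
print(score_x, plain_x, score_y, identified)
```

Raising the weight of the differential peak lowers the score of the candidate whose differential peak is absent from the test spectrum, so the gap between two otherwise similar candidates widens.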

[0275] Step S304: Based on the second similarity comparison results between the object to be tested and the secondary target, determine the identification result of the object to be tested.

[0276] For example, Figure 4 is a schematic diagram of the identification result of the test object provided in the embodiment of this application. As shown in Figure 4, when the primary target set is a combination of the similar bacteria Vibrio cholerae and Vibrio mimicus, the similarity score between each candidate reference object and the test object can be obtained by increasing the scoring weight of each differential characteristic peak. Referring to the similarity scores shown in Figure 4, the identification result of the test object can be Vibrio mimicus.

[0277] It should be noted that, as shown in Table 1 above, the protein expression of Vibrio cholerae and Vibrio mimicus is very similar (i.e., the differences between their protein fingerprints are small), which makes it difficult for existing identification methods to distinguish the two. The method described in this application, however, determines the similarity score between each candidate reference object and the test object based on protein difference information, which makes the differences between the test object and the candidate reference objects more apparent.

[0278] Referring now to Figure 5, which is a schematic diagram of a computer device suitable for implementing embodiments of the present application, the computer device 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for system operation. The CPU 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

[0279] The following components are connected to the input / output (I / O) interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the input / output (I / O) interface 505 as needed. A removable medium 511, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 510 as needed so that computer programs read from it can be installed into the storage section 508 as needed.

[0280] Specifically, according to embodiments of this application, the processes described above with reference to any of the flowcharts in Figures 2-3 can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 509, and/or installed from removable medium 511. When the computer program is executed by central processing unit (CPU) 501, it performs the functions defined in the system of this application.

[0281] It should be noted that the computer-readable medium described in this application can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0282] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than that indicated in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified functions or operations, or using a combination of dedicated hardware and computer instructions.

[0283] The units or modules described in the embodiments of this application can be implemented in software or hardware. The described units or modules can also be housed in a processor; for example, a processor may be described as including a semantic extraction unit, a weight allocation unit, and a determination unit. The names of these units or modules do not necessarily constitute a limitation on the unit or module itself.

[0284] In another aspect, this application also provides a computer-readable storage medium, which may be included in the computer device described in the above embodiments, or may exist independently without being assembled into the computer device. The computer-readable storage medium stores one or more programs that, when executed by one or more processors, perform the methods described in this application. For example, the steps of any of the methods shown in Figures 2-3 may be executed.

[0285] This application provides a computer program product including instructions that, when executed, cause the method described in this application to be performed. For example, the steps of any of the methods shown in Figures 2-3 can be executed.

[0286] The above description is merely a preferred embodiment of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure in this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but also covers other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the foregoing disclosed concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in this application.

Claims

1. A substance identification apparatus, characterized by comprising: a detection module for detecting a test object and obtaining its protein fingerprint; a storage module for storing data related to substance identification; and a data processing module for retrieving data from the storage module and processing it, wherein the data processing module is configured to: perform a first similarity comparison between the protein fingerprint of the test object and the standard protein fingerprint of each reference object to obtain multiple candidate reference objects similar to the test object, wherein the candidate reference objects include at least two types, and the two types are different species or different subspecies; determine protein difference information of each of the candidate reference objects, wherein the protein difference information is a protein feature expression of the candidate reference object and is used to distinguish the candidate reference object from one or more other candidate reference objects; and, based on the protein difference information of the candidate reference objects, perform a second similarity comparison between the test object and the candidate reference objects, and determine the identification result of the test object based on the comparison results of the candidate reference objects.

2. The apparatus of claim 1, wherein performing the first similarity comparison between the protein fingerprint of the test object and the standard protein fingerprint of each reference object to obtain multiple candidate reference objects similar to the test object includes: comparing the protein fingerprint of the test object with the standard protein fingerprint of each reference object to obtain a similarity result between each reference object and the test object, the similarity result corresponding to a similarity level; and, based on the similarity results, obtaining the N reference objects with the highest similarity and determining the candidate reference objects based on the N reference objects, where N is an integer greater than 1.
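By way of non-limiting illustration only, the first similarity comparison and top-N selection recited in claim 2 might be sketched as follows. The peak-list representation, the m/z tolerance `tol`, the matched-peak-fraction score, and all names and sample values are assumptions made for this sketch; the application does not specify a particular similarity measure.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical representation


def similarity(test: List[Peak], ref: List[Peak], tol: float = 2.0) -> float:
    """Fraction of reference peaks that have a test peak within +/- tol m/z."""
    if not ref:
        return 0.0
    matched = sum(
        1 for ref_mz, _ in ref
        if any(abs(ref_mz - mz) <= tol for mz, _ in test)
    )
    return matched / len(ref)


def top_n_candidates(
    test: List[Peak], library: Dict[str, List[Peak]], n: int = 2
) -> List[Tuple[str, float]]:
    """Return the n reference objects with the highest similarity (n > 1)."""
    if n <= 1:
        raise ValueError("N must be an integer greater than 1")
    scored = [(name, similarity(test, peaks)) for name, peaks in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Made-up standard fingerprints and test spectrum, for illustration only.
library = {
    "species_A": [(3000.0, 1.0), (5000.0, 0.8), (7000.0, 0.5)],
    "species_B": [(3000.0, 1.0), (5000.0, 0.7), (7200.0, 0.6)],
    "species_C": [(4100.0, 1.0), (6100.0, 0.9), (8100.0, 0.4)],
}
test_object = [(3000.5, 0.9), (5001.0, 0.8), (7000.4, 0.4)]
candidates = top_n_candidates(test_object, library, n=2)
```

In this toy data, two of the three peaks of `species_B` also match the test spectrum, which is the kind of near-tie that the second comparison is intended to resolve.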

3. The apparatus of claim 2, wherein determining the candidate reference objects based on the N reference objects includes: selecting the N reference objects as the candidate reference objects; or selecting reference objects of the same type from the N reference objects as the candidate reference objects; or, if a target reference object belonging to a reference object combination exists among the N reference objects, determining the candidate reference objects based on the reference object combination and the N reference objects, wherein the reference object combination includes multiple reference objects with similar protein feature expression.

4. The apparatus of claim 2, wherein the data processing module is also configured to: add update information to reference object combination information, wherein the update information characterizes a reference object combination created or updated based on the N reference objects, and the protein fingerprints of the N reference objects; the reference object combination information characterizes multiple reference object combinations and the protein fingerprint corresponding to each reference object combination; and each reference object combination includes multiple reference objects with similar protein feature expression.

5. The apparatus of claim 1, wherein determining the protein difference information of each of the candidate reference objects includes: performing protein expression differential analysis on each of the candidate reference objects to obtain the protein difference information of each candidate reference object.

6. The apparatus of any one of claims 1-5, wherein determining the protein difference information of each of the candidate reference objects, performing the second similarity comparison, and determining the identification result of the test object include: determining differential characteristic peaks of each candidate reference object in one of the ways set out below; performing the second similarity comparison between the test object and the candidate reference objects based on the differential characteristic peaks of the candidate reference objects; and determining the identification result of the test object based on the comparison results of the candidate reference objects. The differential characteristic peaks are determined by:
obtaining the genome of each candidate reference object, determining characteristic proteins of the candidate reference objects based on their genomes, determining differentially expressed proteins of each candidate reference object based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints; or
obtaining the genome of each candidate reference object, identifying differentially expressed genomes among the candidate reference objects based on their genomes, identifying differentially expressed proteins among the candidate reference objects based on the differentially expressed genomes, and determining the differential characteristic peaks based on the characteristic peaks corresponding to the differentially expressed proteins in the respective protein fingerprints; or
obtaining the genome of each candidate reference object, determining characteristic proteins of the candidate reference objects based on their genomes, determining the characteristic peaks of each candidate reference object in the corresponding protein fingerprint based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on those characteristic peaks; or
obtaining characteristic proteins of the candidate reference objects from a protein database, determining differentially expressed proteins of each candidate reference object based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints; or
obtaining characteristic proteins of the candidate reference objects from a protein database, determining the characteristic peaks of each candidate reference object in the corresponding protein fingerprint based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on those characteristic peaks; or
comparing the protein fingerprints of the candidate reference objects with one another to obtain the differential characteristic peaks of each candidate reference object.
The differential characteristic peaks of a candidate reference object, or a combination formed from those differential characteristic peaks, uniquely characterize that candidate reference object.

7. The apparatus of claim 6, wherein the differential characteristic peaks of the candidate reference object are the characteristic peaks corresponding to gene segments of the candidate reference object that are mutually exclusive with those of one or more other candidate reference objects.

8. The apparatus of claim 6, wherein determining the protein difference information of each of the candidate reference objects includes: ignoring the protein feature expression that is identical across the protein fingerprints of the candidate reference objects, and determining the differential characteristic peaks of each candidate reference object based on the protein fingerprints that remain after the identical protein feature expression is ignored.
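As a minimal sketch of the idea in claim 8, and not a definitive implementation, shared peaks can be discarded so that only peaks without a counterpart in every other candidate remain. Representing each fingerprint as a list of m/z values, the tolerance `tol`, and the rule that a peak is "identical" when every other candidate has a peak within that tolerance are all assumptions made for illustration.

```python
from typing import Dict, List


def differential_peaks(
    fingerprints: Dict[str, List[float]], tol: float = 2.0
) -> Dict[str, List[float]]:
    """Keep, per candidate, the peaks that are NOT present in all other candidates.

    Assumes at least two candidates, as the claims require; with a single
    candidate every peak is treated as shared and nothing is kept.
    """
    result: Dict[str, List[float]] = {}
    for name, peaks in fingerprints.items():
        others = [p for other, p in fingerprints.items() if other != name]
        result[name] = [
            mz for mz in peaks
            # A peak is "shared" if every other candidate has a peak within tol.
            if not all(any(abs(mz - o) <= tol for o in other) for other in others)
        ]
    return result


# Made-up candidate fingerprints (m/z values only), for illustration.
fingerprints = {
    "species_A": [3000.0, 5000.0, 7000.0],
    "species_B": [3000.5, 5000.2, 7200.0],
}
diff = differential_peaks(fingerprints)
```

With these toy inputs, the peaks near 3000 and 5000 are ignored as shared, leaving one distinguishing peak per candidate.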

9. The apparatus of claim 6, wherein the steps of obtaining the genome of each of the candidate reference objects, determining the characteristic proteins of the candidate reference objects based on their genomes, determining the differentially expressed proteins of each of the candidate reference objects based on its characteristic proteins, and determining the differential characteristic peaks of each of the candidate reference objects based on the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints further include: identifying that the genome has a protein expression loss at a differential characteristic peak, and correcting the differential characteristic peaks in the protein fingerprint based on the protein expression loss.

10. The apparatus of claim 6, wherein performing the second similarity comparison between the test object and the candidate reference objects based on the protein difference information of the candidate reference objects, and determining the identification result of the test object based on the comparison results of the candidate reference objects, includes: for each candidate reference object, increasing the scoring weight of the differential characteristic peaks of the candidate reference object; and, based on the weight and amplitude of each characteristic peak in the protein fingerprint of the candidate reference object, computing a similarity score between the candidate reference object and the test object to obtain the similarity result between the test object and the candidate reference object.
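The weighting described in claim 10 can be illustrated with the following sketch, offered under stated assumptions rather than as the applicant's scoring formula. The weight multiplier `boost`, the tolerance `tol`, and the score definition (weighted reference amplitude matched by the test spectrum, divided by total weighted reference amplitude) are hypothetical choices.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, amplitude); hypothetical representation


def weighted_score(
    test_peaks: List[Peak],
    ref_peaks: List[Peak],
    diff_mzs: List[float],
    tol: float = 2.0,
    boost: float = 3.0,
) -> float:
    """Similarity score in which differential peaks carry a larger weight."""
    total = 0.0
    matched = 0.0
    for mz, amp in ref_peaks:
        # Differential characteristic peaks get the increased weight.
        weight = boost if any(abs(mz - d) <= tol for d in diff_mzs) else 1.0
        total += weight * amp
        if any(abs(mz - t_mz) <= tol for t_mz, _ in test_peaks):
            matched += weight * amp
    return matched / total if total else 0.0


# Made-up data: the two references differ only in their third peak.
ref_a = [(3000.0, 1.0), (5000.0, 1.0), (7000.0, 1.0)]
ref_b = [(3000.0, 1.0), (5000.0, 1.0), (7200.0, 1.0)]
test_object = [(3000.5, 0.9), (5001.0, 0.8), (7000.4, 0.4)]

score_a = weighted_score(test_object, ref_a, diff_mzs=[7000.0])
score_b = weighted_score(test_object, ref_b, diff_mzs=[7200.0])
```

Without the boost, `ref_b` would score 2/3 against this test spectrum; with a boost of 3 its score falls to 0.4 while `ref_a` stays at 1.0, so the gap between the two near-identical candidates widens.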

11. The apparatus of claim 6, wherein performing the second similarity comparison between the test object and the candidate reference objects based on the protein difference information of the candidate reference objects, and determining the identification result of the test object based on the comparison results of the candidate reference objects, includes: for each candidate reference object, comparing the candidate reference object with the test object based on the differential characteristic peaks of the candidate reference object; and outputting the matching result between the test object and the candidate reference object based on whether the protein fingerprint of the test object contains a characteristic peak identical to the differential characteristic peak.
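A presence-based comparison of the kind recited in claim 11 might be sketched as below. The rule that a candidate matches only when all of its differential peaks are found in the test spectrum, the tolerance `tol`, and the sample values are assumptions for this sketch; the claim itself only requires that the result depend on whether such a peak is present.

```python
from typing import Dict, List


def has_peak(test_mzs: List[float], mz: float, tol: float = 2.0) -> bool:
    """True if the test spectrum has a peak within +/- tol of mz."""
    return any(abs(mz - t) <= tol for t in test_mzs)


def match_candidates(
    test_mzs: List[float], diff_peaks: Dict[str, List[float]], tol: float = 2.0
) -> Dict[str, bool]:
    """Per candidate: are all of its differential peaks present in the test spectrum?"""
    return {
        name: bool(peaks) and all(has_peak(test_mzs, p, tol) for p in peaks)
        for name, peaks in diff_peaks.items()
    }


# Made-up differential peaks for two candidates.
diff_peaks = {"species_A": [7000.0], "species_B": [7200.0]}

single = match_candidates([3000.5, 5001.0, 7000.4], diff_peaks)
mixed = match_candidates([3000.5, 5001.0, 7000.4, 7199.6], diff_peaks)
```

The second call shows a spectrum that contains the distinguishing peaks of both candidates, a pattern that could indicate a mixed sample.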

12. The apparatus according to any one of claims 1-11, characterized in that the plurality of candidate reference objects includes at least multiple subspecies belonging to the same bacterial species, and the identification result of the test object corresponds to the subspecies; or the plurality of candidate reference objects includes at least multiple strains belonging to the same genus, and the identification result of the test object corresponds to the bacterial species; or the plurality of candidate reference objects includes at least multiple species belonging to the same complex microbial community, and the identification result of the test object corresponds to the microbial species; or the plurality of candidate reference objects includes at least multiple bacteria belonging to the same list of similar bacteria, and the identification result of the test object corresponds to the bacteria.

13. The apparatus of any one of claims 1-5, 7-11, wherein determining the protein difference information of each of the candidate reference objects, performing the second similarity comparison, and determining the identification result of the test object include: determining differential characteristic peaks of each candidate reference object in one of the ways set out below; performing the second similarity comparison between the test object and the candidate reference objects based on the differential characteristic peaks of the candidate reference objects; and determining the identification result of the test object based on the comparison results of the candidate reference objects. The differential characteristic peaks are determined by:
obtaining the genome of each candidate reference object, determining characteristic proteins of the candidate reference objects in real time based on their genomes, determining differentially expressed proteins of each candidate reference object based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints; or
obtaining the genome of each candidate reference object, identifying differentially expressed genomes among the candidate reference objects in real time based on their genomes, identifying differentially expressed proteins among the candidate reference objects based on the differentially expressed genomes, and determining the differential characteristic peaks based on the characteristic peaks corresponding to the differentially expressed proteins in the respective protein fingerprints; or
obtaining the genome of each candidate reference object, determining characteristic proteins of the candidate reference objects in real time based on their genomes, determining the characteristic peaks of each candidate reference object in the corresponding protein fingerprint based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on those characteristic peaks; or
obtaining characteristic proteins of the candidate reference objects from a protein database, determining differentially expressed proteins of each candidate reference object in real time based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints; or
obtaining characteristic proteins of the candidate reference objects from a protein database, determining in real time the characteristic peaks of each candidate reference object in the corresponding protein fingerprint based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on those characteristic peaks; or
comparing the protein fingerprints of the candidate reference objects with one another in real time to obtain the differential characteristic peaks of each candidate reference object.

14. The apparatus of any one of claims 1-5, 7-11, wherein determining the protein difference information of each of the candidate reference objects, performing the second similarity comparison, and determining the identification result of the test object include: acquiring the protein difference information of each candidate reference object from a difference information database; performing the second similarity comparison between the test object and the candidate reference objects based on the protein difference information of the candidate reference objects; and determining the identification result of the test object based on the comparison results of the candidate reference objects; wherein the difference information database stores at least the protein difference information of the candidate reference objects, the protein difference information contains at least differential characteristic peaks, and the difference information database is established by at least one of the following methods:
obtaining the genome of each candidate reference object, determining characteristic proteins of the candidate reference objects based on their genomes, determining differentially expressed proteins of each candidate reference object based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints; or
obtaining the genome of each candidate reference object, identifying differentially expressed genomes among the candidate reference objects based on their genomes, identifying differentially expressed proteins among the candidate reference objects based on the differentially expressed genomes, and determining the differential characteristic peaks of each candidate reference object based on the characteristic peaks corresponding to the differentially expressed proteins in the respective protein fingerprints; or
obtaining the genome of each candidate reference object, determining characteristic proteins of the candidate reference objects based on their genomes, determining the characteristic peaks of each candidate reference object in the corresponding protein fingerprint based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on those characteristic peaks; or
obtaining characteristic proteins of the candidate reference objects from a protein database, determining differentially expressed proteins of each candidate reference object based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints; or
obtaining characteristic proteins of the candidate reference objects from a protein database, determining the characteristic peaks of each candidate reference object in the corresponding protein fingerprint based on its characteristic proteins, and determining the differential characteristic peaks of each candidate reference object based on those characteristic peaks; or
comparing the protein fingerprints of the candidate reference objects with one another to obtain the differential characteristic peaks of each candidate reference object.

15. A method of identifying a substance, characterized by comprising: performing a first similarity comparison between the protein fingerprint of a test object and the standard protein fingerprint of each reference object to obtain multiple candidate reference objects similar to the test object, wherein the candidate reference objects include at least two types, and the two types are different species or different subspecies; determining protein difference information of each of the candidate reference objects, wherein the protein difference information is a protein feature expression of the candidate reference object and is used to distinguish the candidate reference object from one or more other candidate reference objects; and, based on the protein difference information of the candidate reference objects, performing a second similarity comparison between the test object and the candidate reference objects, and determining the identification result of the test object based on the comparison results of the candidate reference objects.

16. A method of creating a database, characterized by comprising: identifying multiple reference object combinations, each reference object combination including multiple reference objects with similar protein feature expression; for each reference object combination, performing protein expression differential analysis on each reference object in the reference object combination to obtain protein difference information of each reference object in the reference object combination, wherein the protein difference information is a protein feature expression of the reference object and is used to distinguish the reference object from the other reference objects in the reference object combination; and creating a difference information database based on the correspondence between each reference object combination and the protein difference information of each reference object in that reference object combination.

17. The method of claim 16, wherein performing protein expression differential analysis on each reference object in the reference object combination to obtain the protein difference information of each reference object in the reference object combination includes at least one of the following: obtaining the genome of each reference object in the reference object combination, and determining the protein difference information of each reference object based on the genome; obtaining, from a protein database, characteristic proteins of each reference object in the reference object combination, and determining the protein difference information of each reference object based on the characteristic proteins; and determining the protein difference information of each reference object based on the protein fingerprints of the reference objects in the reference object combination.

18. The method of claim 17, wherein determining the protein difference information of each reference object based on the genome includes: determining characteristic proteins of the reference object based on its genome, determining differentially expressed proteins of each reference object based on its characteristic proteins, and determining the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints as the differential characteristic peaks of each reference object; or identifying differentially expressed genomes among the reference objects based on their genomes, identifying differentially expressed proteins among the reference objects based on the differentially expressed genomes, and determining the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints as the differential characteristic peaks of each reference object; or determining characteristic proteins of the reference object based on its genome, determining the characteristic peaks of each reference object in the corresponding protein fingerprint based on its characteristic proteins, and determining those characteristic peaks as the differential characteristic peaks of each reference object. The differential characteristic peaks of a reference object, or a combination formed from those differential characteristic peaks, uniquely characterize that reference object.

19. The method of claim 17, wherein determining the protein difference information of each of the reference objects based on the characteristic proteins includes: determining the characteristic peaks corresponding to the characteristic proteins of each reference object in the corresponding protein fingerprint as the differential characteristic peaks of each reference object; or determining differentially expressed proteins of each reference object based on its characteristic proteins, and determining the characteristic peaks corresponding to the differentially expressed proteins in the corresponding protein fingerprints as the differential characteristic peaks of each reference object.

20. The method of any one of claims 18-19, wherein the differential characteristic peaks of the reference object are the characteristic peaks corresponding to gene fragments that are mutually exclusive between the reference object and one or more other reference objects.

21. The method according to any one of claims 16-19, characterized in that the multiple reference object combinations are determined by at least one of the following two methods: obtaining standard protein fingerprints of multiple reference objects, and identifying reference objects with similar standard protein fingerprints as a reference object combination; and, based on genetic information or kinship information of the multiple reference objects, identifying reference objects whose genetic information similarity reaches a preset threshold, or whose kinship information is consistent, as a reference object combination.
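For illustration only, the first grouping method of claim 21 (grouping reference objects with similar standard fingerprints) could be sketched as follows. Binning each fingerprint into a set of integer m/z values, using Jaccard similarity, the `threshold` value, and merging similar pairs into connected groups are all assumptions of this sketch; the application does not prescribe a grouping algorithm.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two sets of binned m/z values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_similar(
    fingerprints: Dict[str, Set[int]], threshold: float = 0.5
) -> List[Set[str]]:
    """Group reference objects linked by pairwise similarity >= threshold."""
    parent = {name: name for name in fingerprints}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if jaccard(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    # Only groups with several members count as reference object combinations.
    return [members for members in groups.values() if len(members) > 1]


# Made-up binned standard fingerprints.
standard = {
    "species_A": {3000, 5000, 7000},
    "species_B": {3000, 5000, 7200},
    "species_C": {4100, 6100, 8100},
}
combos = group_similar(standard, threshold=0.5)
```

Here `species_A` and `species_B` share two of four distinct bins (Jaccard 0.5) and form one combination, while `species_C` stays ungrouped.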

22. A method of identifying a microorganism, characterized by comprising: obtaining the protein fingerprint of a sample to be tested, and matching the protein fingerprint of the sample to be tested against standard protein fingerprints to obtain a first matching result; and determining whether the first matching result belongs to a reference object combination; if so, re-matching the protein fingerprint of the sample to be tested based on the difference information database created by the database creation method of any one of claims 16-21 to obtain a second matching result, and taking the second matching result as the identification result; if not, taking the first matching result as the identification result.

23. A method of identifying a microorganism, characterized by comprising: obtaining the protein fingerprint of a sample to be tested, and matching the protein fingerprint of the sample to be tested against standard protein fingerprints to obtain a first matching result; determining whether the first matching result belongs to a reference object combination; if so, re-matching the protein fingerprint of the sample to be tested based on the difference information database created by the database creation method of any one of claims 16-21 to obtain a second matching result, and taking the second matching result as the identification result; if not, determining whether the first matching result contains at least two types of strains; if it does, performing protein expression differential analysis on all types of strains in the first matching result to obtain the protein difference information of each type of strain in the first matching result, re-matching the protein fingerprint of the sample to be tested based on the protein difference information of each type of strain to obtain a third matching result, and taking the third matching result as the identification result; and if it does not, taking the first matching result as the identification result.
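The branching logic of claim 23 can be summarized in a short sketch, given here under stated assumptions. The dictionary layout of the difference information database, checking only the top-ranked hit for combination membership, the coverage-based `rematch` rule, and the `analyse` callback standing in for on-the-fly differential analysis are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

DiffInfo = Dict[str, List[float]]          # strain name -> differential peak m/z
DiffDB = Dict[FrozenSet[str], DiffInfo]    # combination -> per-member DiffInfo


def find_combination(db: DiffDB, name: str) -> Optional[FrozenSet[str]]:
    """Return the reference object combination containing name, if any."""
    for members in db:
        if name in members:
            return members
    return None


def has_peak(test_mzs: List[float], mz: float, tol: float = 2.0) -> bool:
    return any(abs(mz - t) <= tol for t in test_mzs)


def rematch(test_mzs: List[float], diff_info: DiffInfo) -> str:
    """Pick the strain whose differential peaks are best covered by the test spectrum."""
    def coverage(peaks: List[float]) -> float:
        return sum(has_peak(test_mzs, p) for p in peaks) / len(peaks) if peaks else 0.0
    return max(diff_info, key=lambda name: coverage(diff_info[name]))


def identify(
    test_mzs: List[float],
    first_match: List[str],
    db: DiffDB,
    analyse: Callable[[List[str]], DiffInfo],
) -> str:
    combo = find_combination(db, first_match[0])
    if combo is not None:
        return rematch(test_mzs, db[combo])              # second matching result
    if len(set(first_match)) >= 2:
        return rematch(test_mzs, analyse(first_match))   # third matching result
    return first_match[0]                                # first matching result


# Made-up database and spectrum.
db = {frozenset({"species_A", "species_B"}): {"species_A": [7000.0], "species_B": [7200.0]}}
test_mzs = [3000.5, 5001.0, 7200.3]

via_database = identify(test_mzs, ["species_A", "species_B"], db, lambda names: {})
via_analysis = identify(
    test_mzs, ["species_D", "species_E"], db,
    lambda names: {"species_D": [7200.0], "species_E": [9000.0]},
)
direct = identify(test_mzs, ["species_C"], db, lambda names: {})
```

The three calls exercise the three branches in turn: a lookup in the prebuilt database, a differential analysis performed at identification time, and a direct return of the first matching result.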

24. A microbial identification system, comprising a mass analysis device, a memory, and a processor, wherein the memory stores instructions, and the processor executes the instructions to implement the steps performed by the substance identification device according to any one of claims 1-14, or the database creation method according to any one of claims 16-21, or the processor executes the instructions to implement the microbial identification method according to claim 15, 22, or 23.

25. A computer device comprising a memory and a processor, the memory storing instructions, the processor executing the instructions to implement the steps performed by the substance identification apparatus of any one of claims 1-14, or the database creation method of any one of claims 16-21, or the processor executing the instructions to implement the microbial identification method of claim 15, 22, or 23.

26. A computer program product, characterized in that the computer program product includes instructions that, when executed, implement the steps performed by the substance identification apparatus of any one of claims 1-14 and the method of any one of claims 15-23.

27. A computer-readable storage medium having stored thereon a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps performed by the substance identification device of any one of claims 1-14 and the method of any one of claims 15-23.

28. A substance identification device, characterized by comprising: a detection module for detecting the test object and obtaining its protein fingerprint, a storage module for storing data related to substance identification, and a data processing module for retrieving data from the storage module for data processing; wherein the data processing module is configured to: compare the protein fingerprint of the test object with the standard protein fingerprint of each reference object to obtain multiple candidate reference objects similar to the test object, the candidate reference objects including at least two types, which include different species or different subspecies; and determine mixed information based on the candidate reference objects, and output an ordinary identification result or a mixed reference result for the test object based on the mixed information.

29. The substance identification device of claim 28, wherein the step of determining mixed information based on the candidate reference objects and outputting an ordinary identification result or a mixed reference result for the test object based on the mixed information includes: determining whether the candidate reference objects include multiple reference objects that do not belong to the same group of similar reference objects; if so, outputting the mixed reference result; otherwise, outputting the ordinary identification result; wherein the same group of similar reference objects refers to multiple reference objects that meet the similarity determination conditions, and meeting the similarity determination conditions means meeting one or more of the following: (a) the multiple reference objects belong to the same reference object combination; (b) the genetic information similarity of the multiple reference objects reaches the similarity threshold; (c) the multiple reference objects belong to different subspecies within the same species.
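
The mixed-information judgment in claim 29 reduces to a pairwise check, sketched below. The sketch assumes the groups of similar reference objects have already been computed under conditions (a) to (c) and are supplied as sets of names; the function name `judge_mixture` and the string labels are hypothetical.

```python
def judge_mixture(candidates, similar_groups):
    """Return "mixed" if any two candidates fall outside a shared similar group.

    candidates: iterable of candidate reference object names.
    similar_groups: list of sets; each set is one group of similar references
    (assumed to be precomputed from the similarity determination conditions).
    """
    candidates = list(candidates)

    def same_group(a, b):
        return any(a in group and b in group for group in similar_groups)

    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            if not same_group(a, b):
                # Two candidates that are not mere look-alikes of each other
                # suggest that the sample contains more than one organism.
                return "mixed"
    return "ordinary"
```

For example, with `similar_groups = [{"E. coli", "Shigella"}]`, the candidates `["E. coli", "Shigella"]` yield an ordinary result, while `["E. coli", "S. aureus"]` yield a mixed one.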

30. The substance identification device of claim 28, wherein, when the mixed reference result is output after the mixed information is determined based on the candidate reference objects, the following further applies: if none of the candidate reference objects belong to the same group of similar reference objects, the mixed reference result is output based on the first similarity comparison; if at least two of the candidate reference objects belong to the same group of similar reference objects, the output mixed reference result includes at least a result based on the second similarity comparison; wherein the first similarity comparison is a comparison against the standard protein fingerprint of each reference object, and the second similarity comparison is a comparison based on the protein difference information of the candidate reference objects; the protein difference information is a protein feature expression of a candidate reference object that distinguishes it from one or more other candidate reference objects.

31. The substance identification device of claim 28, wherein the reference object combination is created or updated based on the similarity determination conditions.

32. The substance identification device of claim 28, wherein the mixed reference result is verified based on the gene information, protein information, or protein fingerprints of the candidate reference objects, using one or more of the following methods: (a) determining whether the similarity of the gene information of the reference objects in the mixed reference result is below a given threshold, and if so, the result is correct; (b) comparing the gene corresponding to one reference object in the mixed reference result with the information of the other reference objects in the mixed reference result, deducing the mutually interfering protein expression from the gene, removing that expression from the standard protein fingerprint of the reference object, and performing a third similarity comparison between the resulting protein fingerprint and the protein fingerprint of the object to be tested, wherein if the result of the third similarity comparison is consistent with the mixed reference result, the result is correct; (c) comparing the protein information of one reference object in the mixed reference result with the protein information of the other reference objects in the mixed reference result, removing the interfering protein feature expressions, and performing a third similarity comparison between the resulting protein fingerprint and the protein fingerprint of the object to be tested, wherein if the result of the third similarity comparison is consistent with the mixed reference result, the result is correct; (d) comparing the protein fingerprint of one reference object in the mixed reference result with the protein fingerprints of the other reference objects in the mixed reference result, removing the interfering protein features, and performing a third similarity comparison between the resulting protein fingerprint and the protein fingerprint of the object to be tested, wherein if the result of the third similarity comparison is consistent with the mixed reference result, the result is correct.
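
The fingerprint-based verification in claim 32 (removing interfering features, then running a third similarity comparison) might look like the sketch below. It assumes fingerprints are sets of peak positions, treats any peak shared with another reference in the mixed result as interfering, and reads "consistent with the mixed reference result" as "every reference in the result still matches after cleaning". The function name `verify_mixed_result` and the threshold are hypothetical.

```python
def verify_mixed_result(sample, standards, mixed_result, threshold=0.7):
    """Check a mixed reference result with a third similarity comparison.

    sample: set of peaks in the fingerprint of the object to be tested.
    standards: dict name -> set of peaks (standard protein fingerprints).
    mixed_result: list of reference names reported as present in the sample.
    """
    sample = set(sample)
    for name in mixed_result:
        # Peaks shared with any other reported reference count as interfering.
        others = set().union(*(standards[o] for o in mixed_result if o != name))
        cleaned = set(standards[name]) - others
        if not cleaned:
            # Nothing distinctive remains, so this reference cannot be confirmed.
            return False
        score = len(cleaned & sample) / len(cleaned)
        if score < threshold:
            return False
    return True
```

Requiring every reference to pass is a conservative reading; a system could instead report which references failed so the operator can review them individually.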

33. The substance identification device of claim 28, further comprising a display module for outputting identification results, wherein the display module marks and displays the mixed reference results.

34. The substance identification device of claim 33, wherein the display module's marking and displaying of the mixed reference results includes: marking and displaying each category of reference object in the mixed reference results; or marking and displaying similar reference objects in the mixed reference results; or marking and displaying reference objects in the mixed reference results that do not belong to the same reference object combination or the same group of similar reference objects.