Method and apparatus for identifying difficult-to-differentiate bacterial species, device, and medium
A micro-difference analysis model and a dynamic matching algorithm are constructed to address the accuracy and efficiency problems of mass spectrometers in identifying similar microorganisms. The approach identifies easily confused bacteria efficiently and is suitable for accurately differentiating and identifying a variety of bacteria.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- AUTOBIO DIAGNOSTICS CO LTD
- Filing Date
- 2025-12-24
- Publication Date
- 2026-07-02
AI Technical Summary
Existing matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometers struggle to identify similar microorganisms accurately and efficiently, especially when the microorganisms are closely related or produce similar spectra. Several bacteria then often receive scores in the same confidence interval, which makes accurate identification difficult.
Easily confused bacteria are identified with micro-difference analysis models and a dynamic matching algorithm, which are constructed in advance and applied separately or in combination. The micro-difference analysis models include chain analysis models, which are constructed to improve the accuracy and efficiency of identification.
It improves the accuracy and efficiency of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry in identifying similar microorganisms, reduces the complexity of identifying easily confused bacteria, and is suitable for the identification needs of different easily confused bacteria.
Figure CN2025145040_02072026_PF_FP_ABST
Abstract
Description
A method, apparatus, equipment and medium for identifying easily confused bacteria
[0001] This application claims priority to Chinese Patent Application No. 202411914152.2, filed on December 24, 2024, entitled "A Method, Apparatus, Device and Medium for Identifying Easily Confounded Bacteria"; Chinese Patent Application No. 202411914161.1, filed on December 24, 2024, entitled "A Dynamic Calculation Identification Method, Apparatus, Device and Medium for Easily Confounded Bacteria"; and Chinese Patent Application No. 202411914166.4, filed on December 24, 2024, entitled "A Method, Apparatus, Device and Medium for Identifying Easily Confounded Bacteria Based on a Chain Analysis Model", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This invention relates to the field of microbial identification technology, and in particular to a method, apparatus, equipment and medium for identifying easily confused bacteria.
[0003] This invention relates to the field of microbial identification technology, and in particular to a dynamic computational identification method, apparatus, equipment, and medium for easily confused bacteria.
[0004] This invention relates to the field of microbial identification technology, and in particular to a method, apparatus, equipment and medium for identifying easily confused bacteria based on a chain analysis model.
Background Technology
[0005] Currently, the identification of microorganisms using MALDI-TOF MS (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry) mainly relies on fingerprint spectra. A matching algorithm searches the database, and the identification result is given after similarity scoring. Several types of algorithms are widely used: (1) an expert spectrum identification system built on a species-level database, which adopts a weight matrix algorithm and scores similarity based on the weighted similarity to give a similarity percentage; (2) a super spectrum comparison algorithm, which compares the spectrum of the bacteria to be tested with all reference spectra one by one and gives a reference identification result; (3) an unsupervised pattern matching algorithm, which, based on big data calculation, defines results within certain score ranges as species-reliable, genus-reliable, or unreliable. However, regardless of whether a weight matrix, a super spectrum, or unsupervised pattern matching is used, scoring intervals (similarity percentages) are applied when judging the spectral results, so that the instrument can still provide relatively accurate results despite the influence of factors such as current industrial manufacturing, experimental conditions, and sample conditions. One interval is defined as reliable at the species level, another as reliable at the genus level, and another as unreliable. Under this premise, a situation inevitably arises in which two or more microorganisms are closely related or their collected spectra are similar (i.e., the positions of the high-intensity peaks in the spectra are basically the same). Evaluating such spectra with the current algorithms often places the scores of several similar bacteria within the same confidence interval, making accurate identification difficult.
[0006] As can be seen from the above, how to improve the accuracy and efficiency of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry in identifying similar microorganisms is a problem that needs to be solved in this field.
Summary of the Invention
[0007] In view of this, the purpose of this invention is to provide a method, apparatus, equipment, and medium for identifying easily confused bacteria, which can improve the accuracy and efficiency of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) in identifying similar microorganisms. The specific solution is as follows:
[0008] Firstly, this application discloses a method for identifying easily confused bacteria, including:
[0009] Obtain the mass spectrometry identification results sent by the original mass spectrometer, determine whether there are easily confused bacteria in the mass spectrometry identification results, and if there are easily confused bacteria in the mass spectrometry identification results, determine whether there is a micro-difference analysis model with the same type as the easily confused bacteria in the preset model library.
[0010] If a micro-difference analysis model of the same type as the easily confused bacteria exists in the preset model library, then the micro-difference analysis model is retrieved to analyze and identify the easily confused bacteria in order to obtain the model analysis and identification results.
[0011] If there is no micro-difference analysis model in the preset model library that is the same type as the easily confused bacteria, then a dynamic matching algorithm is used to perform dynamic calculation identification on the easily confused bacteria, or a dynamic matching algorithm is used and the micro-difference analysis model is retrieved to perform dynamic calculation identification on the easily confused bacteria, so as to obtain the dynamic calculation identification result.
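The decision flow of paragraphs [0009] to [0011] can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `confusable_group`, `identify`, `model_library` and `dynamic_match`, the cutoff of 90.0, and the rule that several candidates reaching the species-level interval count as "easily confused" are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Candidate = Tuple[str, float]          # (species name, similarity score)
Spectrum = List[Tuple[float, float]]   # (m/z, intensity) pairs
Model = Callable[[Spectrum], str]


def confusable_group(candidates: List[Candidate], species_cutoff: float) -> FrozenSet[str]:
    """Return the species that all reach the species-level interval.

    Assumption: more than one species at or above the cutoff means the
    instrument result is ambiguous.
    """
    top = frozenset(name for name, score in candidates if score >= species_cutoff)
    return top if len(top) > 1 else frozenset()


def identify(
    candidates: List[Candidate],
    spectrum: Spectrum,
    model_library: Dict[FrozenSet[str], Model],
    dynamic_match: Callable[[Spectrum, FrozenSet[str]], str],
    species_cutoff: float = 90.0,  # hypothetical cutoff, not from the source
) -> Tuple[str, str]:
    """Return (species, route), where route names the branch that decided."""
    group = confusable_group(candidates, species_cutoff)
    if not group:
        # No ambiguity: keep the instrument's best-scoring candidate.
        return max(candidates, key=lambda c: c[1])[0], "instrument"
    model = model_library.get(group)
    if model is not None:
        # A micro-difference analysis model exists for this group.
        return model(spectrum), "model"
    # No model of the same type: fall back to dynamic matching.
    return dynamic_match(spectrum, group), "dynamic"
```

Keying the library by the set of confused species is one simple way to express "a model of the same type as the easily confused bacteria"; the source does not say how the library is indexed.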
[0012] Optionally, before determining whether a micro-difference analysis model of the same type as the easily confused bacteria exists in the preset model library, the method further includes:
[0013] Each micro-difference analysis model is constructed using the micro-difference analysis method;
[0014] Each micro-difference analysis model is saved to the model library.
[0015] Optionally, the construction of each micro-difference analysis model using the micro-difference analysis method includes:
[0016] Historical data is acquired, and the historical data is analyzed using the eigenvalue analysis method and a preset eigenvalue analysis system to obtain fingerprint profiles of each bacterial species. The fingerprint profiles of each bacterial species are then processed and subjected to big data analysis to construct micro-difference analysis models.
[0017] Alternatively, a spectral trend analysis method and an artificial intelligence learning method can be used to autonomously learn the bacterial species types in the historical data to obtain the spectral trend differences between different groups. A reference spectral trend differentiation model can be constructed based on the spectral trend differences. The reference spectral trend differentiation model can be verified and corrected to obtain each of the micro-difference analysis models.
[0018] Optionally, the micro-difference analysis model includes a chain analysis model; the construction process of the chain analysis model includes:
[0019] Obtain protein fingerprints of known strains of the bacteria that the mass spectrometer easily confuses, and perform noise reduction on the protein fingerprints to obtain processed protein fingerprints.
[0020] Peak multiple analysis and comparison are performed on different species in the processed protein fingerprint to obtain each unique peak corresponding to different species. Each unique peak is classified using a preset binary chain analysis model to construct a chain analysis model. The chain analysis model is then trained to obtain the target chain analysis model.
[0021] Optionally, the step of performing peak multiple analysis and comparison on different species in the processed protein fingerprint to obtain the unique peaks corresponding to each species includes:
[0022] The processed protein fingerprint is analyzed using preset typing software, and peak multiple analysis and comparison are performed on different species in the processed protein fingerprint to obtain the unique peaks corresponding to each species.
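One way to read "peak multiple analysis and comparison" is as a fold-ratio test on peak intensities between species. The sketch below takes that reading as an assumption; the source does not define the comparison or name the typing software. The function name `unique_peaks`, the binned-spectrum representation and the `min_fold` threshold are hypothetical, and the m/z values in the usage note are synthetic.

```python
from typing import Dict, List

# Binned m/z value -> mean intensity after noise reduction (assumed layout).
Spectrum = Dict[int, float]


def unique_peaks(spectra: Dict[str, Spectrum], min_fold: float = 3.0) -> Dict[str, List[int]]:
    """For each species, list the peaks at least min_fold times stronger
    than the same peak in every other species."""
    result: Dict[str, List[int]] = {}
    for species, spectrum in spectra.items():
        others = [s for name, s in spectra.items() if name != species]
        peaks = []
        for mz, intensity in spectrum.items():
            strongest_other = max((o.get(mz, 0.0) for o in others), default=0.0)
            # A peak absent from all other species always qualifies.
            if intensity > 0 and intensity >= min_fold * strongest_other:
                peaks.append(mz)
        result[species] = sorted(peaks)
    return result
```

With two synthetic spectra `{"A": {1000: 10.0, 2000: 8.0, 3000: 9.0}, "B": {1000: 9.0, 2000: 2.0, 4000: 6.0}}`, the shared peak at 1000 is discarded, while 2000 and 3000 are kept for A and 4000 for B.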
[0023] Optionally, training the chain analysis model to obtain the target chain analysis model includes:
[0024] Obtain historical species protein fingerprint peaks, and use the historical species protein fingerprint peaks to train the chain analysis model to obtain training results;
[0025] Based on the training results, the parameters in the chain analysis model are modified and adjusted to obtain the target chain analysis model.
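The construction and training steps in [0020] and [0023] to [0025] can be illustrated with a toy binary chain: each link asks a yes/no question about one unique peak, and the first "yes" decides the species. The source does not specify the structure of the preset binary chain analysis model or how its parameters are adjusted, so the `Link` layout, the greedy threshold search in `train`, and the `fallback` label are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Spectrum = Dict[int, float]        # binned m/z -> intensity
Sample = Tuple[Spectrum, str]      # (spectrum, known species)


@dataclass
class Link:
    mz: int            # unique peak tested by this link
    threshold: float   # minimum intensity for a "yes" answer
    species: str       # species reported when the answer is "yes"


def classify(chain: Sequence[Link], spectrum: Spectrum, fallback: str) -> str:
    """Walk the chain; the first link whose peak is present decides."""
    for link in chain:
        if spectrum.get(link.mz, 0.0) >= link.threshold:
            return link.species
    return fallback


def accuracy(chain: Sequence[Link], samples: Sequence[Sample], fallback: str) -> float:
    if not samples:
        raise ValueError("at least one labelled sample is required")
    hits = sum(classify(chain, s, fallback) == label for s, label in samples)
    return hits / len(samples)


def train(chain: Sequence[Link], samples: Sequence[Sample], fallback: str,
          grid: Sequence[float]) -> List[Link]:
    """Greedy per-link threshold search on historical labelled peaks."""
    tuned = [Link(l.mz, l.threshold, l.species) for l in chain]
    for link in tuned:
        best_t, best_acc = link.threshold, accuracy(tuned, samples, fallback)
        for t in grid:
            link.threshold = t
            acc = accuracy(tuned, samples, fallback)
            if acc > best_acc:
                best_t, best_acc = t, acc
        link.threshold = best_t  # keep the best threshold found for this link
    return tuned
```

The greedy search stands in for "modifying and adjusting the parameters based on the training results"; any other parameter-fitting procedure could take its place.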
[0026] Optionally, the process of using a dynamic matching algorithm to dynamically identify the easily confused bacteria is as follows:
[0027] Set an initial tolerance value for mass spectrum matching, and match the mass spectra of easily confused bacteria to be matched with the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value;
[0028] Using the initial matching result and the initial peak matching situation, it is determined whether the initial tolerance value meets the preset convergence condition. If the initial tolerance value meets the preset convergence condition, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria.
[0029] If the initial tolerance value does not meet the preset convergence condition, the initial matching result and the initial peak matching situation are calculated to obtain the optimized tolerance value.
[0030] The database spectrum is matched based on the optimized tolerance value to obtain the current matching result and the current peak matching status. The process of judging the convergence condition is repeated until the optimized tolerance value meets the convergence condition. The current matching result is then used as the dynamic calculation and identification result of the easily confused bacteria.
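The iteration in [0027] to [0030] has the shape of a refine-until-converged loop. The sketch below shows only that control flow and takes the matching, convergence and adjustment steps as caller-supplied functions, because the source leaves their internals open. The names `dynamic_identify`, `match`, `converged` and `adjust` are hypothetical, and the `max_rounds` guard is an added safeguard that the source does not mention.

```python
from typing import Callable, Tuple, TypeVar

R = TypeVar("R")  # matching result
B = TypeVar("B")  # peak matching status


def dynamic_identify(
    match: Callable[[float], Tuple[R, B]],
    converged: Callable[[R, B, float], bool],
    adjust: Callable[[R, B], float],
    initial_tolerance: float,
    max_rounds: int = 20,  # assumption: bound the loop to avoid running forever
) -> Tuple[R, float]:
    """Return the matching result and the tolerance at which it converged."""
    tolerance = initial_tolerance
    result, peaks = match(tolerance)           # initial matching
    for _ in range(max_rounds):
        if converged(result, peaks, tolerance):
            return result, tolerance
        tolerance = adjust(result, peaks)      # optimized tolerance value
        result, peaks = match(tolerance)       # current matching result
    raise RuntimeError("tolerance did not converge within max_rounds")
```

Passing the three steps as functions keeps the loop independent of any particular scoring scheme, which fits the claim that the method works against the existing mass spectrometry database without an extra one.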
[0031] Optionally, the step of setting an initial tolerance value for mass spectrum matching and matching the mass spectra of easily confused bacteria to be matched with a mass spectrometry database to obtain an initial matching result, initial peak matching status, and database spectra corresponding to the initial tolerance value includes:
[0032] Adaptively set the initial tolerance value for mass spectrum matching;
[0033] The mass spectrum of the easily confused bacteria to be matched is matched with all spectra in the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectra corresponding to the initial tolerance value; wherein, the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the mass spectrum peak to be matched in the mass spectrum of the bacteria to be matched.
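The "initial peak matching status" of [0033] records an error value for every matched peak pair. A minimal sketch of such a matching step is given below. It assumes that peaks are compared by m/z only, that the error is expressed in parts per million, and that each query peak pairs with its nearest reference peak; none of these details are stated in the source, and the names `match_peaks` and `match_database` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Match = Tuple[float, float, float]  # (query m/z, reference m/z, error in ppm)


def match_peaks(query: List[float], reference: List[float], tolerance_ppm: float) -> List[Match]:
    """Pair each query peak with its nearest reference peak within tolerance."""
    ref = sorted(reference)
    matches: List[Match] = []
    for mz in query:
        i = bisect_left(ref, mz)
        neighbours = ref[max(i - 1, 0): i + 1]  # the peaks just below and above mz
        if not neighbours:
            continue
        nearest = min(neighbours, key=lambda r: abs(r - mz))
        error = abs(nearest - mz) / nearest * 1e6
        if error <= tolerance_ppm:
            matches.append((mz, nearest, error))
    return matches


def match_database(query: List[float], database: Dict[str, List[float]],
                   tolerance_ppm: float) -> Dict[str, List[Match]]:
    """Match the query spectrum against every spectrum in the database."""
    return {name: match_peaks(query, peaks, tolerance_ppm) for name, peaks in database.items()}
```

The per-entry lists returned by `match_database` carry both the matched peaks and their error values, which is the information the later convergence and tolerance-adjustment steps consume.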
[0034] Optionally, determining whether the initial tolerance value satisfies a preset convergence condition using the initial matching result and the initial peak matching status includes:
[0035] A termination equation is defined, and the initial matching result and the initial peak matching condition are input into the termination equation. It is then determined whether the initial tolerance value satisfies a preset convergence condition. The termination equation is:
[0036] $\mathrm{end} = f(R_0, B_0)$;
[0037] where $R_0$ is the initial matching result and $B_0$ is the initial peak matching status.
[0038] Optionally, determining that the initial tolerance value meets the preset convergence condition includes:
[0039] Determine whether all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value;
[0040] If all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value, a uniqueness check is performed on the initial peak matching status. If the uniqueness check succeeds, the initial tolerance value is determined to meet the preset convergence condition.
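The two-stage test of [0039] and [0040] can be sketched as a single predicate. The first stage follows the text directly. The second stage is an assumption: the source does not define the uniqueness check, so the example treats it as "exactly one candidate has the largest number of matched peaks". The name `meets_convergence` and the input layout are hypothetical.

```python
from typing import Dict, List


def meets_convergence(peak_errors: Dict[str, List[float]], tolerance: float) -> bool:
    """peak_errors maps each candidate species to the error values of its matched peaks."""
    # Stage 1: every recorded error value must be within the tolerance.
    if any(e > tolerance for errs in peak_errors.values() for e in errs):
        return False
    # Stage 2 (assumed uniqueness rule): one candidate must lead on matched peaks.
    counts = sorted((len(errs) for errs in peak_errors.values()), reverse=True)
    if not counts or counts[0] == 0:
        return False
    return len(counts) == 1 or counts[0] > counts[1]
```

A tie between two candidates fails the second stage, which is the situation that sends the method on to compute an optimized tolerance value.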
[0041] Optionally, before calculating the initial matching result and the initial peak matching situation, the method further includes:
[0042] Set the intermediate variable matching results and intermediate variable peak matching status;
[0043] The values of the initial matching result and the initial peak matching condition are assigned to the intermediate variable matching result and the intermediate variable peak matching condition.
[0044] Optionally, the step of calculating the optimized tolerance value based on the initial matching result and the initial peak matching situation includes:
[0045] A tolerance adjustment equation is defined, and the intermediate variable matching results and intermediate variable peak matching are input into the tolerance adjustment equation for calculation to obtain the optimized tolerance value; the tolerance adjustment equation is:
[0046] $t_n = g(R_n, B_n)$;
[0047] where $t_n$ is the optimized tolerance value, $R_n$ is the intermediate variable matching result, and $B_n$ is the intermediate variable peak matching status.
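The source gives the tolerance adjustment equation only in the abstract form of a function g of the intermediate matching result and peak matching status. Purely as an illustration, the sketch below picks one possible g: shrink the tolerance to a fixed fraction of the largest error among the candidates still in play. The name `adjust_tolerance`, the `shrink` factor of 0.5 and the input layout are assumptions, not the applicant's formula.

```python
from typing import Dict, List


def adjust_tolerance(result: List[str], peak_status: Dict[str, List[float]],
                     shrink: float = 0.5) -> float:
    """One hypothetical g(R_n, B_n).

    result      -- candidate species in the intermediate matching result R_n
    peak_status -- error values of matched peaks per species, standing in for B_n
    """
    errors = [e for name in result for e in peak_status[name]]
    if not errors:
        raise ValueError("no matched peaks to derive a tolerance from")
    # Tighten the window so the worst-matching peak drops out next round.
    return shrink * max(errors)
```

For example, with candidates A and B, where A has errors 120.0 and 310.0 and B has error 650.0, the next tolerance is 325.0; errors recorded for species outside the intermediate result are ignored.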
[0048] Optionally, the step of using a dynamic matching algorithm to dynamically identify the easily confused bacteria includes:
[0049] Determine whether there is a target dynamic calculation startup rule corresponding to the dynamic calculation startup rules in the dynamic calculation startup rule library;
[0050] If such a target dynamic calculation startup rule exists, the target dynamic calculation startup rule is started, and a dynamic matching algorithm is used to perform dynamic calculation identification on the easily confused bacteria.
[0051] Optionally, before determining whether there is a target dynamic calculation startup rule corresponding to the dynamic calculation startup rules in the dynamic calculation startup rule library, the method further includes:
[0052] Based on business needs and according to the preset rule construction method, each dynamic calculation startup rule is constructed;
[0053] Each dynamic calculation startup rule is saved to the preset dynamic calculation startup rule library.
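The rule library of [0049] to [0053] can be pictured as a list of named predicates that decide whether dynamic calculation should start. The source does not describe what a startup rule contains, so the `StartupRule` layout, the `find_target_rule` lookup, and the two example rules in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]  # facts about the current identification run


@dataclass(frozen=True)
class StartupRule:
    name: str
    applies: Callable[[Context], bool]


def find_target_rule(rule_library: List[StartupRule], context: Context) -> Optional[StartupRule]:
    """Return the first rule in the library that applies, or None."""
    for rule in rule_library:
        if rule.applies(context):
            return rule
    return None
```

A library built "based on business needs" might hold a rule such as `StartupRule("no_model_for_group", lambda c: not c["has_model"])` and another such as `StartupRule("scores_within_margin", lambda c: c["score_gap"] <= 2.0)`; dynamic calculation starts only when `find_target_rule` returns a rule.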
[0054] Optionally, the step of employing a dynamic matching algorithm and retrieving the micro-difference analysis model to perform dynamic calculation identification on the easily confused bacteria to obtain dynamic calculation identification results includes:
[0055] Set an initial tolerance value for mass spectrum matching, perform the first round of dynamic calculation, and match the mass spectra of easily confused bacteria to be matched with the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value.
[0056] Using the initial matching result and the initial peak matching situation, it is determined whether the initial tolerance value meets the preset convergence condition. If the initial tolerance value meets the preset convergence condition, the dynamic calculation ends, and the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria.
[0057] If the initial tolerance value does not meet the preset convergence condition, that is, the first round of dynamic calculation cannot produce an identification result, then it is determined whether there is a micro-difference analysis model in the preset model library that is the same type as the current matching result. If there is a micro-difference analysis model in the preset model library that is the same type as the current matching result, then the micro-difference analysis model is called to analyze and identify the current matching result in order to obtain the model analysis and identification result.
[0058] If there is no micro-difference analysis model in the preset model library that is the same type as the current matching result, then the initial matching result and the initial peak matching situation are calculated to obtain the optimized tolerance value;
[0059] Based on the optimized tolerance value, a new round of dynamic calculation and model matching process is repeated until the identification result is obtained.
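The combined procedure of [0055] to [0059] interleaves a model lookup with each round of dynamic calculation. The sketch below shows that control flow only, under the same caveats as any sketch of this method: the steps are passed in as functions, the names `combined_identify` and `find_model` are hypothetical, and the `max_rounds` bound is an added safeguard not found in the source.

```python
from typing import Callable, Optional, Tuple, TypeVar

R = TypeVar("R")  # matching result
B = TypeVar("B")  # peak matching status


def combined_identify(
    match: Callable[[float], Tuple[R, B]],
    converged: Callable[[R, B, float], bool],
    adjust: Callable[[R, B], float],
    find_model: Callable[[R], Optional[Callable[[R], R]]],
    initial_tolerance: float,
    max_rounds: int = 20,  # assumption: bound the number of rounds
) -> Tuple[R, str]:
    """Return (identification result, route), route being 'dynamic' or 'model'."""
    tolerance = initial_tolerance
    for _ in range(max_rounds):
        result, peaks = match(tolerance)        # one round of dynamic calculation
        if converged(result, peaks, tolerance):
            return result, "dynamic"
        model = find_model(result)              # model of the same type as the result?
        if model is not None:
            return model(result), "model"
        tolerance = adjust(result, peaks)       # otherwise tighten and repeat
    raise RuntimeError("no identification result within max_rounds")
```

Each failed round narrows the candidate set, so a model that did not fit the first, broad result can still be found for a later, narrower one.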
[0060] Secondly, this application discloses a device for identifying easily confused bacteria, comprising:
[0061] The judgment module is used to obtain the mass spectrometry identification results sent by the original mass spectrometer, and to determine whether there are easily confused bacteria in the mass spectrometry identification results. If there are easily confused bacteria in the mass spectrometry identification results, it is then determined whether there is a micro-difference analysis model with the same type as the easily confused bacteria in the preset model library.
[0062] The model analysis module is used to retrieve a micro-difference analysis model that is the same type as the easily confused bacteria in the preset model library to analyze and identify the easily confused bacteria, so as to obtain the model analysis and identification results.
[0063] The dynamic calculation identification module is used to perform dynamic calculation identification on the easily confused bacteria if there is no micro-difference analysis model of the same type as the easily confused bacteria in the preset model library, or to perform dynamic calculation identification on the easily confused bacteria by using a dynamic matching algorithm and calling the micro-difference analysis model, so as to obtain the dynamic calculation identification result.
[0064] Thirdly, this application discloses an electronic device, including:
[0065] Memory, used to store computer programs;
[0066] A processor is used to execute the computer program to implement the aforementioned method for identifying easily confused bacteria.
[0067] Fourthly, this application discloses a computer storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the steps of the aforementioned disclosed method for identifying easily confused bacteria.
[0068] As can be seen, this application provides a method for identifying easily confused bacteria. The method first obtains the mass spectrometry identification results sent by the original mass spectrometer and determines whether easily confused bacteria exist in those results. If they do, it determines whether a micro-difference analysis model of the same type as the easily confused bacteria exists in the preset model library. If such a model exists, it is retrieved to analyze and identify the easily confused bacteria, so as to obtain the model analysis and identification results. If no such model exists, a dynamic matching algorithm is used to perform dynamic calculation identification on the easily confused bacteria, or a dynamic matching algorithm is used and the micro-difference analysis model is retrieved to perform dynamic calculation identification on the easily confused bacteria, so as to obtain the dynamic calculation identification results. This application thus realizes a multi-faceted identification process for easily confused bacteria through the micro-difference analysis model and the dynamic matching algorithm, thereby improving the accuracy and efficiency of matrix-assisted laser desorption/ionization time-of-flight mass spectrometers in identifying similar microorganisms.
[0069] In view of this, the purpose of this invention is to provide a method, apparatus, device, and medium for dynamic computational identification of easily confused bacteria, which can reduce the complexity of dynamic computational identification of easily confused bacteria and improve the efficiency and accuracy of dynamic computational identification of easily confused bacteria. The specific solution is as follows:
[0070] Fifthly, this application discloses a dynamic computational identification method for easily confused bacteria, comprising:
[0071] Set an initial tolerance value for mass spectrum matching, and match the mass spectra of easily confused bacteria to be matched with the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value;
[0072] Using the initial matching result and the initial peak matching situation, it is determined whether the initial tolerance value meets the preset convergence condition. If the initial tolerance value meets the preset convergence condition, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria.
[0073] If the initial tolerance value does not meet the preset convergence condition, the initial matching result and the initial peak matching situation are calculated to obtain the optimized tolerance value.
[0074] The database spectrum is matched based on the optimized tolerance value to obtain the current matching result and the current peak matching status. The process of judging the convergence condition is repeated until the optimized tolerance value meets the convergence condition. The current matching result is then used as the dynamic calculation and identification result of the easily confused bacteria.
[0075] Optionally, the step of setting an initial tolerance value for mass spectrum matching and matching the mass spectra of easily confused bacteria to be matched with a mass spectrometry database to obtain an initial matching result, initial peak matching status, and database spectra corresponding to the initial tolerance value includes:
[0076] Adaptively set the initial tolerance value for mass spectrum matching;
[0077] The mass spectrum of the easily confused bacteria to be matched is matched with all spectra in the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectra corresponding to the initial tolerance value; wherein, the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the mass spectrum peak to be matched in the mass spectrum of the bacteria to be matched.
[0078] Optionally, determining whether the initial tolerance value satisfies a preset convergence condition using the initial matching result and the initial peak matching status includes:
[0079] A termination equation is defined, and the initial matching result and the initial peak matching condition are input into the termination equation. It is then determined whether the initial tolerance value satisfies a preset convergence condition. The termination equation is:
[0080] $\mathrm{end} = f(R_0, B_0)$;
[0081] where $R_0$ is the initial matching result and $B_0$ is the initial peak matching status.
[0082] Optionally, determining that the initial tolerance value meets the preset convergence condition includes:
[0083] Determine whether all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value;
[0084] If all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value, a uniqueness check is performed on the initial peak matching status. If the uniqueness check succeeds, the initial tolerance value is determined to meet the preset convergence condition.
[0085] Optionally, before calculating the initial matching result and the initial peak matching situation, the method further includes:
[0086] Set the intermediate variable matching results and intermediate variable peak matching status;
[0087] The values of the initial matching result and the initial peak matching condition are assigned to the intermediate variable matching result and the intermediate variable peak matching condition.
[0088] Optionally, the step of calculating the optimized tolerance value based on the initial matching result and the initial peak matching situation includes:
[0089] A tolerance adjustment equation is defined, and the intermediate variable matching results and intermediate variable peak matching are input into the tolerance adjustment equation for calculation to obtain the optimized tolerance value; the tolerance adjustment equation is:
[0090] $t_n = g(R_n, B_n)$;
[0091] where $t_n$ is the optimized tolerance value, $R_n$ is the intermediate variable matching result, and $B_n$ is the intermediate variable peak matching status.
[0092] Optionally, before setting the initial tolerance value for mass spectrum matching, the method further includes:
[0093] Based on business needs and according to the preset rule construction method, each dynamic calculation startup rule is constructed;
[0094] Each dynamic calculation startup rule is saved to the preset dynamic calculation startup rule library;
[0095] Determine whether there is, locally, a target dynamic calculation startup rule corresponding to the dynamic calculation startup rules in the dynamic calculation startup rule library;
[0096] If such a target dynamic calculation startup rule exists locally, the target dynamic calculation startup rule is started, and the process of dynamic calculation identification of the easily confused bacteria is executed.
[0097] Sixthly, this application discloses a dynamic computational identification device for easily confused bacteria, comprising:
[0098] The matching module is used to set the initial tolerance value for mass spectrum matching and match the mass spectrum of easily confused bacteria to be matched with the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value.
[0099] The first calculation identification result determination module is used to determine whether the initial tolerance value meets the preset convergence condition by using the initial matching result and the initial peak matching situation. If the initial tolerance value meets the preset convergence condition, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria.
[0100] The calculation module is used to calculate the initial matching result and the initial peak matching situation to obtain an optimized tolerance value if the initial tolerance value does not meet the preset convergence condition.
[0101] The second calculation identification result determination module is used to match the database spectrum based on the optimized tolerance value to obtain the current matching result and the current peak matching status, repeat the process of judging the convergence condition until the optimized tolerance value meets the convergence condition, and use the current matching result as the dynamic calculation identification result of the easily confused bacteria.
[0102] Optionally, the matching module includes:
[0103] The initial tolerance setting module is used to adaptively set the initial tolerance value for mass spectrum matching;
[0104] The spectrum matching module is used to match the mass spectrum of easily confused bacteria to be matched with all spectra in the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value; wherein, the initial peak matching status includes the error value between all matching peaks in the mass spectrometry database and the mass spectrum peak to be matched in the mass spectrum to be matched.
[0105] In a seventh aspect, this application discloses an electronic device, comprising:
[0106] Memory, used to store computer programs;
[0107] A processor is used to execute the computer program to implement the aforementioned dynamic computational identification method for easily confused bacteria.
[0108] Eighthly, this application discloses a computer storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the steps of the aforementioned disclosed dynamic computational identification method for easily confused bacteria.
[0109] As can be seen, this application provides a dynamic computational identification method for easily confused bacteria. An initial tolerance value is set for mass spectrum matching, and the mass spectrum of the easily confused bacteria to be matched is matched with a mass spectrometry database to obtain an initial matching result, an initial peak matching status, and the database spectra corresponding to the initial tolerance value. The initial matching result and the initial peak matching status are used to determine whether the initial tolerance value meets a preset convergence condition. If it does, the initial matching result is used as the dynamic computational identification result of the easily confused bacteria. If it does not, an optimized tolerance value is calculated from the initial matching result and the initial peak matching status, the database spectra are matched based on the optimized tolerance value to obtain a current matching result and a current peak matching status, and the convergence judgment is repeated until the optimized tolerance value meets the convergence condition; the current matching result is then used as the dynamic computational identification result of the easily confused bacteria. By dynamically matching the mass spectrum to be matched against the mass spectrometry database and judging the convergence of the tolerance value, this application identifies easily confused bacteria without adding an extra database. This reduces the workload of dynamic computational identification of the mass spectra of easily confused bacteria, is applicable to the identification of various easily confused bacteria, reduces the complexity of dynamic computational identification, and improves its efficiency and accuracy.
[0110] In view of this, the purpose of this invention is to provide a method, apparatus, device, and medium for identifying easily confused bacteria based on a chain analysis model, which can reduce the dependence on easily confused bacterial profiles and improve the accuracy and efficiency of easily confused bacterial identification. The specific solution is as follows:
[0111] Ninthly, this application discloses a method for identifying easily confused bacteria based on a chain analysis model, comprising:
[0112] Obtain the protein fingerprints of known strains that are easily confused by mass spectrometry, and perform noise reduction on the protein fingerprints to obtain the processed protein fingerprints.
[0113] Peak multiple analysis and comparison are performed on different species in the processed protein fingerprint to obtain each unique peak corresponding to different species. Each unique peak is classified using a preset binary chain analysis model to construct a chain analysis model. The chain analysis model is trained to obtain the target chain analysis model.
[0114] The protein fingerprint of the easily confused bacteria to be identified is input into the target chain analysis model for identification, so as to output the identification results of the easily confused bacteria to be identified.
[0115] Optionally, obtaining the protein fingerprints of known strains that are easily confused by mass spectrometry, and performing noise processing on the protein fingerprints, includes:
[0116] Collect microbial protein fingerprints, and from all the microbial protein fingerprints screen out those of known strains that meet the preset conditions for being easily confused by mass spectrometry.
[0117] The protein fingerprint spectrum is noise-processed using preset typing software.
[0118] Optionally, the step of using preset typing software to process noise in the protein fingerprint includes:
[0119] The protein fingerprint is input into a preset typing software, and noise is processed on the protein fingerprint based on wavelet transform; the noise processing includes exponential smoothing elimination.
[0120] Optionally, the step of performing peak multiple analysis and comparison on different species in the processed protein fingerprint to obtain the unique peaks corresponding to each species includes:
[0121] Multiple analysis is performed on the processed protein fingerprint using preset typing software, and peak multiple comparison is performed on different species in the processed protein fingerprint to obtain the unique peaks corresponding to each species.
[0122] Optionally, training the chain analysis model to obtain the target chain analysis model includes:
[0123] Obtain historical species protein fingerprint peaks, and use the historical species protein fingerprint peaks to train the chain analysis model to obtain training results;
[0124] Based on the training results, the parameters in the chain analysis model are modified and adjusted to obtain the target chain analysis model.
[0125] Optionally, before inputting the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification, the method further includes:
[0126] The target chain analysis model is stored in a modular manner, and model startup rules are set.
[0127] Optionally, the step of inputting the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification includes:
[0128] Obtain the protein fingerprint profile of the easily confused bacteria to be identified, and determine whether the protein fingerprint profile meets the model activation rules;
[0129] If the protein fingerprint meets the model activation rules, the target chain analysis model is activated, and the protein fingerprint is input into the target chain analysis model for identification.
[0130] In a tenth aspect, this application discloses a device for identifying easily confused bacteria based on a chain analysis model, comprising:
[0131] The spectrum acquisition and processing module is used to acquire protein fingerprint spectra of known strains that are easily confused by mass spectrometry, and to perform noise processing on the protein fingerprint spectra to obtain the processed protein fingerprint spectra.
[0132] The model building module is used to perform peak multiple analysis and comparison on different species in the processed protein fingerprint to obtain each special unique peak corresponding to different species. The module uses a preset binary chain analysis model to classify each special unique peak to construct a chain analysis model and trains the chain analysis model to obtain the target chain analysis model.
[0133] The easily confused bacteria identification module is used to input the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification, so as to output the identification results of the easily confused bacteria to be identified.
[0134] In an eleventh aspect, this application discloses an electronic device, comprising:
[0135] Memory, used to store computer programs;
[0136] A processor is used to execute the computer program to implement the aforementioned method for identifying easily confused bacteria based on a chain analysis model.
[0137] In a twelfth aspect, this application discloses a computer storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the steps of the aforementioned disclosed method for identifying easily confused bacteria based on a chain analysis model.
[0138] Therefore, this application provides a method for identifying easily confused bacteria based on a chain analysis model, including obtaining protein fingerprints of known strains that are easily confused by mass spectrometry; performing noise processing on the protein fingerprints to obtain processed protein fingerprints; performing peak multiple analysis and comparison on different species in the processed protein fingerprints to obtain unique peaks corresponding to different species; classifying each unique peak using a preset binary chain analysis model to construct a chain analysis model; training the chain analysis model to obtain a target chain analysis model; inputting the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification; and outputting the identification result of the easily confused bacteria to be identified. This application performs noise processing on the protein fingerprints of known strains that are easily confused with each other in mass spectrometry. Then, it performs peak multiple analysis and comparison on different species within the processed protein fingerprints. A binary chain analysis model is used to classify the unique peaks obtained from this comparison, thereby constructing a chain analysis model. This reduces the reliance on the fingerprints of easily confused bacteria. Furthermore, the trained target chain analysis model is used to identify the protein fingerprints and output the identification results. This improves the accuracy and efficiency of identifying easily confused bacteria and is applicable not only to the accurate differentiation of pairwise confused bacteria but also to the accurate identification of confused bacteria of multiple species.
Brief Description of the Drawings
[0139] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0140] Figure 1 is a flowchart of a method for identifying easily confused bacteria disclosed in this application;
[0141] Figure 2 is a flowchart of a specific implementation method of the first multiple analysis and comparison method disclosed in this application;
[0142] Figure 3 is a flowchart of the specific implementation method of the second multiple analysis and comparison disclosed in this application;
[0143] Figure 4 is a schematic diagram of the structure of an easily confused bacteria identification device disclosed in this application;
[0144] Figure 5 is a structural diagram of an electronic device provided in this application;
[0145] Figure 6 is a flowchart of a dynamic calculation identification method for easily confused bacteria disclosed in this application;
[0146] Figure 7 is a flowchart of a dynamic calculation identification method disclosed in this application;
[0147] Figure 8 is a schematic diagram of the structure of a dynamic calculation identification device for easily confused bacteria disclosed in this application;
[0148] Figure 9 is a flowchart of a method for identifying easily confused bacteria based on a chain analysis model disclosed in this application;
[0149] Figure 10 is a schematic diagram of the structure of a device for identifying easily confused bacteria based on a chain analysis model disclosed in this application.
Detailed Description
[0150] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0151] Currently, the identification of microorganisms using MALDI-TOF MS mainly relies on fingerprint images. A certain matching algorithm is used to search the database, and the identification result is given after scoring the similarity. There are several types of widely used algorithms: (1) Expert spectrum identification system based on species-level database construction, which uses a weight matrix algorithm and finally scores the similarity based on the weight similarity to give the similarity percentage; (2) Super spectrum comparison algorithm, which compares the spectrum of the bacteria to be tested with all reference spectra one by one and gives the reference identification result; (3) Unsupervised pattern matching algorithm, which defines the result of a certain score range as species reliable, genus reliable and unreliable results based on big data calculation. However, whether using a weight matrix, super spectrum or unsupervised pattern matching, in order to ensure that the instrument can still give relatively accurate results under the influence of various factors such as current industrial manufacturing, experimental conditions and sample conditions, a scoring range (similar percentage) is used when judging the spectrum results, and it is stipulated that the result within a certain range is species reliable, a certain range is genus reliable and a certain range is unreliable. Under these circumstances, a situation inevitably arises where, when two or more microorganisms are closely related or their collected spectra are similar (i.e., the high-intensity peaks in the spectra are essentially the same), the current algorithm often results in several bacteria having scores within the same confidence interval, making accurate identification difficult. 
Therefore, improving the accuracy and efficiency of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) in identifying closely related microorganisms is a problem that needs to be solved in this field.
[0152] Referring to Figure 1, this embodiment of the invention discloses a method for identifying easily confused bacteria, which may specifically include:
[0153] Step S11: Obtain the mass spectrometry identification results sent by the original mass spectrometer, determine whether there are easily confused bacteria in the mass spectrometry identification results, and if there are easily confused bacteria in the mass spectrometry identification results, determine whether there is a micro-difference analysis model of the same type as the easily confused bacteria in the preset model library.
[0154] In this application, easily confused bacteria refer to two or more species that are closely related, belong to the same complex group, or have similar protein fingerprint profiles. Their protein fingerprint profiles are highly similar, so that at least two species have identification scores within the same confidence interval. In this case, the different bacteria (i.e., different species or different genera) within the same confidence interval are called bacteria that are easily confused or difficult to distinguish by mass spectrometry, or simply easily confused bacteria. Confidence intervals are generally divided into multiple segments based on identification scores, such as species-level reliable, genus-level reliable, and unreliable.
[0155] In this embodiment, before determining whether a micro-difference analysis model of the same type as the easily confused bacteria exists in the preset model library, the method further includes: constructing each micro-difference analysis model using the micro-difference analysis method; and saving each micro-difference analysis model to the model library. The construction of each micro-difference analysis model using the micro-difference analysis method includes: acquiring historical data; analyzing the historical data using eigenvalue analysis and a preset eigenvalue analysis system to obtain the distribution and intensity of micro-difference peaks in the fingerprint spectrum of each bacterial species; establishing a primary micro-difference model based on the micro-difference peaks; and correcting the accuracy and precision of the micro-difference model using big data analysis combined with a rolling adjustment method of model parameters to obtain a finalized micro-difference analysis model; or, using a spectral trend analysis method and an artificial intelligence learning method to autonomously learn the bacterial species types in the historical data to obtain spectral trend differences between different groups; constructing a reference spectral trend differentiation model based on the spectral trend differences; and verifying and correcting the reference spectral trend differentiation model to obtain each micro-difference analysis model.
[0156] Micro-difference analysis methods include eigenvalue analysis and spectral trend analysis. The implementation process is as follows: (1) Eigenvalue analysis: Using a self-developed eigenvalue analysis system, the data is analyzed, the fingerprint spectra of easily confused bacteria are compared, the same protein fingerprint peaks are blocked, the unstable protein fingerprint peaks are blocked, and among the remaining protein fingerprint peaks, the protein fingerprint peaks with high abundance, strong stability, high peak intensity, and large relative abundance of peaks are selected as reference difference points. Then, big data analysis is performed for verification. During this period, the number of reference difference points and the model parameters that affect accuracy and precision are adjusted in a rolling manner to determine the optimal difference point for differentiation and identification, and to determine the detection tolerance, relative abundance, relative abundance tolerance, and other parameters of the characteristic peak of the optimal difference point. Based on the above big data analysis results, a differentiation model for easily confused bacteria is established and the model is included in the model library.
[0157] (2) Spectral Trend Analysis Method: Spectra are categorized into different groups based on bacterial species. Using an artificial intelligence self-learning algorithm, the model autonomously learns across multiple groups to distinguish differences in spectra trends. Based on these trend differences, a reference spectra trend differentiation model for easily confused bacteria is automatically encoded. After big data validation and continuous model refinement, the optimal spectra trend analysis model is finally determined and incorporated into a model library.
[0158] In this embodiment, if easily confused bacteria are found in the mass spectrometry identification results, it is determined whether a micro-difference analysis model of the same type as the easily confused bacteria exists in the preset model library; the micro-difference analysis model includes at least a chain analysis model; the construction process of the chain analysis model includes: obtaining the protein fingerprint spectrum of known strains that are easily confused by mass spectrometry, performing noise reduction processing on the protein fingerprint spectrum to obtain the processed protein fingerprint spectrum; performing peak multiple alignment analysis on different species in the processed protein fingerprint spectrum to obtain each unique peak corresponding to different species, classifying each unique peak using a preset binary chain analysis model, and constructing a chain analysis model for all strains involved in the analysis according to the rule of progressively reducing the population size in binary classification, and training the constructed chain analysis model using the protein fingerprint spectrum of the training set strains to obtain the target chain analysis model.
[0159] Specifically, the process involves collecting microbial protein fingerprint profiles and screening known strains that meet preset confounding criteria and are likely to become confounded by mass spectrometry from all the microbial protein fingerprint profiles; then, noise reduction is performed on the protein fingerprint profiles using preset typing software to obtain the processed protein fingerprint profiles.
[0160] The specific noise processing procedure is as follows: the protein fingerprint spectrum is input into the preset typing software, and noise processing is performed on the protein fingerprint spectrum based on the wavelet transform method; the noise processing includes exponential smoothing elimination processing.
[0161] This application removes noise from the spectrum through wavelet transform and eliminates spikes caused by systematic errors through exponential smoothing, thereby reducing unstable interference factors included in the model group.
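As a rough illustration of the smoothing stage only, simple exponential smoothing of an intensity trace could be sketched as below. The wavelet stage is deliberately omitted, and the function name `exponential_smoothing` and the smoothing factor `alpha` are assumptions made for this sketch; the application does not specify them or the behavior of the typing software.

```python
def exponential_smoothing(intensities, alpha=0.3):
    """Simple exponential smoothing of a sequence of intensities.

    s[0] = x[0]; s[i] = alpha * x[i] + (1 - alpha) * s[i - 1].
    `alpha` is a hypothetical parameter; smaller values damp isolated
    spikes more strongly.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in intensities:
        if not smoothed:
            # The first point is taken as-is.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed
```

A single-point spike is damped rather than removed outright, so in practice the factor would need to be tuned against real spectra.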
[0162] The process of peak multiple analysis and comparison is as follows: multiple analysis is performed on all processed protein fingerprints using preset typing software, and peak multiple comparison is performed on different species in the processed protein fingerprints to obtain the unique peaks corresponding to different species.
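The idea of comparing peaks across species to find unique peaks can be sketched as follows. This is a minimal illustration, assuming that each species is represented by a list of peak positions and that two peaks count as the same when they lie within a fixed `tolerance`; the function name, the data layout, and the tolerance value are hypothetical and are not taken from the typing software.

```python
def unique_peaks(peaks_by_species, tolerance=2.0):
    """Return, for each species, the peaks that no other species shares.

    A peak is treated as shared if any peak of another species lies
    within `tolerance` of it (assumed absolute tolerance on peak position).
    """
    result = {}
    for species, peaks in peaks_by_species.items():
        # Pool the peaks of every other species.
        others = [
            p
            for name, other_peaks in peaks_by_species.items()
            if name != species
            for p in other_peaks
        ]
        result[species] = [
            p for p in peaks if all(abs(p - q) > tolerance for q in others)
        ]
    return result
```

For example, with species A at 3000, 4500, and 6200 and species B at 3001, 4500.5, and 7100, only 6200 and 7100 remain as candidate unique peaks.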
[0163] The process of training the chain analysis model is as follows: obtain the protein fingerprint peaks of historical species, use the protein fingerprint peaks of historical species to train the chain analysis model to obtain the training results; modify and adjust the parameters in the chain analysis model based on the training results to improve the accuracy and specificity of the model, so as to obtain the target chain analysis model.
[0164] The specific process of constructing the chain analysis model in this application is as follows: Based on each unique peak, a single binary analysis model is established in the typing software. When establishing the binary analysis model, multiple parameters such as the number of peaks, peak matching tolerance, peak abundance, and peak abundance tolerance should be initially set according to the analysis results of the medium-sized data volume of the protein fingerprint. Then, the established binary analysis model is cross-correlated to form a chain analysis structure, and a chain analysis model is constructed for the joint analysis of multiple easily confused bacteria.
[0165] The specific implementation methods for multiple analyses and comparisons in this application include the following two scenarios:
[0166] (1) The specific implementation process of the first multiple analysis and comparison is shown in Figure 2. An identification result is set as the genus-level identification, which includes the common characteristic peaks. First, a species with a unique peak is taken together with the genus-level peak pattern to form the first binary judgment. Two results are possible: either the spectrum contains the unique peak and is identified as the first species, or it does not contain the unique peak and can only be identified by the genus name. Then, a second species with a unique peak is taken together with the genus-level identification to form the second binary judgment. This continues until the analysis tree can no longer be divided, at which point all of the easily confused bacteria have been distinguished.
[0167] (2) The specific implementation process of the second type of multiple analysis and comparison is shown in Figure 3. First, population peak analysis is performed on the several indistinguishable bacteria. Based on the distribution of characteristic peaks, the bacteria are divided into two populations, and the combination of difference peaks between the two populations is used as the first binary branching judgment. Peak analysis is then performed again within each of the two populations. Based on the specific peak combinations, the bacteria in each population are divided into two smaller populations to form the second binary branching judgment. This process is repeated until the populations can no longer be divided.
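The first scenario, a sequence of binary judgments that falls back to the genus name, can be sketched as below. This is only an illustration under stated assumptions: the chain is an ordered list of (species, unique peaks) pairs, a unique peak counts as present when a sample peak lies within a fixed `tolerance`, and all names and values are hypothetical.

```python
def chain_identify(sample_peaks, chain, genus_name, tolerance=2.0):
    """Walk an ordered chain of binary judgments.

    Each link asks: does the sample contain all unique peaks of this
    species? If yes, that species is returned; if no, the next link is
    tried. When no link matches, only the genus name can be reported.
    """
    def has_peak(target):
        return any(abs(target - p) <= tolerance for p in sample_peaks)

    for species, marker_peaks in chain:
        # An empty marker list cannot support a species-level call.
        if marker_peaks and all(has_peak(m) for m in marker_peaks):
            return species
    return genus_name
```

The second scenario (Figure 3) would replace the linear chain with a binary tree of population splits, but the per-node judgment has the same form.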
[0168] In addition, this application uses known historical species protein fingerprint peaks for model training, and modifies the peak judgment parameters in the model based on the training results until the model accuracy is stable, thereby obtaining the target chain analysis model.
[0169] In this embodiment, the target chain analysis model is modularly stored, and model startup rules are set. The protein fingerprint spectrum of the easily confused bacteria to be identified is obtained. It is determined whether the protein fingerprint spectrum meets the model startup rules. If the protein fingerprint spectrum meets the model startup rules, the target chain analysis model is started, and the protein fingerprint spectrum is input into the target chain analysis model for identification, so as to output the identification result of the easily confused bacteria to be identified.
[0170] In other words, the target chain analysis model is stored in modular form and the model activation rules are defined. For example, when the identification results contain a species-level confused identification, and the confused species happen to be species that the model can distinguish, the chain model is retrieved from the stored model library to resolve the result accurately.
[0171] The advantages of the micro-difference analysis model are: (1) by establishing a model for differentiation, it reduces the dependence on spectra; (2) it can accurately identify the easily confused strains included in the model; and (3) it is applicable not only to the accurate differentiation of pairwise confused bacteria, but also to the accurate identification of multiple confused bacteria. This application uses the difference points between strains and between populations to link easily confused bacterial species by chaining the difference points together, and accurately identifies each strain in the population of easily confused strains.
[0172] Step S12: If a micro-difference analysis model of the same type as the easily confused bacteria exists in the preset model library, then the micro-difference analysis model is retrieved to analyze and identify the easily confused bacteria to obtain the model analysis and identification results.
[0173] In this embodiment, easily confused bacteria are matched with a model library, and the micro-difference analysis model is retrieved to analyze and identify the easily confused bacteria, outputting the model analysis and identification results at the species level with a single bacterial species name. When easily confused bacteria are present in the mass spectrometry identification results obtained using a mass spectrometer, and the species of the easily confused bacteria is the same as a certain micro-difference analysis model in the model library, the corresponding micro-difference analysis model in the model library is called for analysis and identification, that is, further refined analysis of the easily confused bacteria in the original report to obtain the model analysis and identification results at the species level with a single bacterial species name. In other words, only one bacterial species has a score greater than the minimum tolerance value for species-level identification.
[0174] For example, small models can be constructed as units to identify easily confused bacteria during MALDI-TOF MS identification. Because the data volume within each model unit is relatively small, subtle differences in the protein fingerprints of a limited number of specific bacterial species can be used to distinguish them individually through micro-difference analysis. For instance, model 1 can be constructed for easily confused bacterial species A and B, model 2 for easily confused bacterial species A and C, model 3 for easily confused bacterial species A, D, and E, and so on. These models can be used individually or in combination. For example, model unit 1 and model unit 2 can be combined to distinguish bacterial species A, B, C, and so on. Finally, all these unitized or linked unit combinations are incorporated into a cluster to form a model library. The activation rules of the model library are as follows: when the mass spectrometry identification results contain only species A and B at the species level, the identification of model 1 in the model library is triggered, and model 1 is called to accurately distinguish between species A and B. If the model determines that the result of species A is the final result, then in the final report, the result of species A is retained at the species level, that is, the score of species A is above the minimum tolerance value for species level identification, and the result of species B is reduced to the genus level, that is, the score of strain B drops below the minimum tolerance value for species level identification. If the original report results contain only species A, B, and C at the species level, then the chain model composed of model 1 and model 2 in the model library is triggered, and this chain model is called to further analyze species A, B, and C and generate model analysis identification results.
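The activation rules in the example above can be sketched as a lookup over the model library. The sketch assumes the library is a mapping from a model name to the set of species that model distinguishes; the function name `select_models` and the rule that models are combined only when together they cover exactly the species-level hits are assumptions made for illustration.

```python
def select_models(species_level_hits, model_library):
    """Pick the model(s) to trigger for a set of species-level hits.

    1. If one model covers exactly the hit species, trigger it alone.
    2. Otherwise combine every model whose species all appear among the
       hits, provided the combination covers every hit species.
    3. Otherwise return an empty list (no applicable model).
    """
    hits = frozenset(species_level_hits)
    for name, species in model_library.items():
        if frozenset(species) == hits:
            return [name]
    usable = [
        name for name, species in model_library.items()
        if frozenset(species) <= hits
    ]
    covered = frozenset().union(*(model_library[n] for n in usable))
    return usable if covered == hits else []
```

With models for {A, B}, {A, C}, and {A, D, E}, hits of A and B trigger the first model alone, while hits of A, B, and C trigger the first two together. An empty result corresponds to the case handled in Step S13.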
[0175] Step S13: If there is no micro-difference analysis model in the preset model library that is the same type as the easily confused bacteria, then the dynamic matching algorithm is used to perform dynamic calculation identification on the easily confused bacteria, or the dynamic matching algorithm is used and the micro-difference analysis model is retrieved to perform dynamic calculation identification on the easily confused bacteria, so as to obtain the dynamic calculation identification result.
[0176] In this embodiment, the process of using a dynamic matching algorithm to dynamically identify the easily confused bacteria is as follows:
[0177] Set an initial tolerance value for mass spectrum matching, and match the mass spectra of easily confused bacteria to be matched with the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value;
[0178] Using the initial matching result and the initial peak matching situation, it is determined whether the initial tolerance value meets the preset convergence condition. If the initial tolerance value meets the preset convergence condition, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria.
[0179] If the initial tolerance value does not meet the preset convergence condition, the initial matching result and the initial peak matching situation are calculated to obtain the optimized tolerance value.
[0180] The database spectrum is matched based on the optimized tolerance value to obtain the current matching result and the current peak matching status. The process of judging the convergence condition is repeated until the optimized tolerance value meets the convergence condition. The current matching result is then used as the dynamic calculation and identification result of the easily confused bacteria.
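The overall control flow of paragraphs [0177] to [0180] can be sketched as a loop. The matching step, the convergence test, and the tolerance update are passed in as callables because this passage does not fix their exact form; the names `match`, `converged`, `optimize`, and the `max_iter` safeguard are assumptions of this sketch.

```python
def dynamic_identify(match, converged, optimize, t0, max_iter=50):
    """Iterate matching until the tolerance meets the convergence condition.

    match(tol)                      -> (matching_result, peak_status)
    converged(result, status, tol)  -> bool
    optimize(result, status, tol)   -> new tolerance value
    Returns the final matching result and the converged tolerance.
    """
    tol = t0
    for _ in range(max_iter):
        result, peak_status = match(tol)
        if converged(result, peak_status, tol):
            return result, tol
        # Not converged: derive an optimized tolerance and match again.
        tol = optimize(result, peak_status, tol)
    raise RuntimeError("tolerance did not converge within max_iter rounds")
```

The iteration cap is not part of the described method; it is added only so that the sketch cannot loop forever on data that never converges.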
[0181] The process of setting an initial tolerance value for mass spectrum matching and matching the mass spectrum of easily confused bacteria with the mass spectrometry database is as follows: adaptively setting the initial tolerance value for mass spectrum matching; matching the mass spectrum of easily confused bacteria with all spectra in the mass spectrometry database to obtain the initial matching result, initial peak matching status, and database spectra corresponding to the initial tolerance value; wherein, the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the mass spectrum peak of the mass spectrum to be matched.
[0182] In the process of microbial mass spectrometry matching and identification, this application introduces a tolerance value 't' to correct for mass number errors generated during instrument detection. The tolerance value is the range used when matching peak values: the larger the tolerance, the wider the range within which peaks are matched. Its purpose is to ensure that, within a certain limit, small deviations in the mass number of a peak do not interfere with the accuracy of the overall matching results. Typically, 't' is a suitably chosen constant, so that a mass spectrometry peak is neither misjudged because 't' is too large nor missed because 't' is too small. However, because of inter-instrument differences and the varying degrees of difference in fingerprint spectra between different microorganisms or different strains, a preset tolerance value 't' may affect the correctness of the actual matching results. This application proposes an adaptive method for determining 't', which allows the tolerance value to be optimized automatically during the matching process so that a suitable value is ultimately used for the matching determination.
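The effect of the tolerance on peak matching can be illustrated with a small sketch. It assumes that 't' is an absolute window on peak position and that each reference peak is paired with its nearest sample peak; the function name `match_peaks` and these conventions are hypothetical, since the passage does not state the unit or the pairing rule.

```python
def match_peaks(sample_peaks, reference_peaks, t):
    """Match reference peaks to sample peaks within tolerance t.

    Returns a list of (reference_peak, sample_peak, error) tuples for
    every reference peak whose nearest sample peak is within t.
    """
    if not sample_peaks:
        return []
    matches = []
    for ref in reference_peaks:
        nearest = min(sample_peaks, key=lambda p: abs(p - ref))
        error = abs(nearest - ref)
        if error <= t:
            matches.append((ref, nearest, error))
    return matches
```

With a sample at 3000.2 and 6200.3 and a reference at 3000.0 and 6203.0, a wide tolerance of 5.0 accepts both reference peaks, whereas a tolerance of 1.0 accepts only the first, which is the kind of misjudgment an overly large 't' can cause.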
[0183] In this embodiment, an initial tolerance value for mass spectrum matching is adaptively set; the mass spectrum of easily confused bacteria to be matched is matched with all spectra in the mass spectrometry database to obtain the initial matching result, initial peak matching status, and database spectra corresponding to the initial tolerance value; wherein, the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the mass spectrum peak to be matched in the mass spectrum to be matched.
[0184] Specifically, a large tolerance value is used as the initial tolerance value t0. The mass spectrum s0 to be matched is matched against the spectra $L = \{l_i,\ i \in N^*\}$ in the mass spectrometry database to obtain the initially sorted matching result $R0 = \{r_i,\ i \in N^*\}$, the database spectra $S0 = \{s_i,\ i \in N^*\}$ corresponding to R0, and the initial peak matching status $B0 = \{b_i,\ i \in N^*\}$, where B0 includes the error values between all matching peaks in the mass spectrometry database and the peaks of the mass spectrum to be matched.
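The initial matching step above can be sketched as follows. This is a minimal illustration only: it assumes that a spectrum is a list of m/z values, that the tolerance is an absolute m/z window, and that the score is the fraction of query peaks matched. The scoring rule and the names `match_peaks` and `initial_match` are hypothetical and are not taken from this application.

```python
from bisect import bisect_left


def match_peaks(query, reference, t):
    """Match each query peak to its nearest reference peak within tolerance t.

    Returns (query_mz, reference_mz, error) records, i.e. the kind of
    per-peak error information that B0 is described as holding.
    """
    ref = sorted(reference)
    records = []
    for mz in query:
        pos = bisect_left(ref, mz)
        # Only the neighbours around the insertion point can be nearest.
        candidates = ref[max(pos - 1, 0):pos + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda r: abs(r - mz))
        error = abs(nearest - mz)
        if error <= t:
            records.append((mz, nearest, error))
    return records


def initial_match(s0, database, t0):
    """Match spectrum s0 against every database spectrum at tolerance t0.

    database maps a species name to its reference m/z values (assumed layout).
    Returns (R0, B0): R0 is a list of (name, score) sorted by descending
    score, B0 maps each name to its peak-matching records.
    """
    B0 = {name: match_peaks(s0, peaks, t0) for name, peaks in database.items()}
    R0 = sorted(((name, len(b) / len(s0)) for name, b in B0.items()),
                key=lambda item: item[1], reverse=True)
    return R0, B0
```

Starting from a deliberately large t0 means few true peaks are missed in the first pass; the recorded errors are what later allow the tolerance to be tightened.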
[0185] The process for determining whether the initial tolerance value meets the preset convergence condition is as follows: A termination equation is set, and the initial matching result and the initial peak matching condition are input into the termination equation; the termination equation is:
[0186] end = f(R0, B0);
[0187] Where R0 is the initial matching result and B0 is the initial peak matching situation.
[0188] The convergence condition determination process is as follows: It is determined whether the error values of all mass spectrometry peaks in the initial matching results are not greater than the initial tolerance value. If the error values of all mass spectrometry peaks in the initial matching results are not greater than the initial tolerance value, then a uniqueness assessment is performed on the initial peak matching, meaning that only one single bacterial species name exists in the matching results above the minimum tolerance value for species-level identification. If the uniqueness assessment passes, then the initial tolerance value is determined to meet the preset convergence condition.
[0189] Specifically, a termination equation is defined to determine whether the initial tolerance value t0 has converged (i.e., whether t0 is the optimal tolerance). The initial matching result R0 and the initial peak matching status B0 are substituted into the termination equation to evaluate the convergence condition. If, at a tolerance of t0, all mass spectrometry peak error values in B0 are less than or equal to t0, and the matching result R0 is unique (i.e., uniqueness is verified), then the tolerance has converged, meaning the initial tolerance value meets the preset convergence condition. At this point, t0 is assigned to the converged tolerance variable $t_m$, i.e. $t_m = t0$, and R0 is used as the dynamic calculation identification result $R_m$ of the easily confused bacteria, i.e. $R_m = \{r_i,\ i \in N^*\}$.
[0190] Then, it is determined whether the error values of all mass spectrometry peaks in the initial matching results are not greater than the initial tolerance value. If the error values of all mass spectrometry peaks in the initial matching results are not greater than the initial tolerance value, then the uniqueness of the initial peak matching is determined, that is, only one single bacterial species name exists in the matching results above the minimum tolerance value for species-level identification. If the uniqueness determination is successful, then the initial tolerance value is determined to meet the preset convergence condition.
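The convergence check described above can be sketched as follows. The data layout is an assumption made for illustration: `R` is a list of `(species_name, score)` pairs, `B` maps each species name to `(query_mz, reference_mz, error)` records, and `species_threshold` stands for the minimum value for species-level identification. The function name `is_converged` is hypothetical, and the application does not give the closed form of the termination equation f.

```python
def is_converged(R, B, t, species_threshold):
    """Sketch of end = f(R, B) for a given tolerance t.

    Condition 1: no matched peak has an error greater than t.
    Condition 2 (uniqueness): exactly one species name reaches the
    species-level threshold (treated here as inclusive, an assumption).
    """
    errors_ok = all(err <= t
                    for records in B.values()
                    for (_, _, err) in records)
    species_hits = {name for name, score in R if score >= species_threshold}
    return errors_ok and len(species_hits) == 1
```

Both conditions must hold: a unique result obtained with peak errors larger than the tolerance is not accepted, and neither is an error-bounded result that still lists two species.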
[0191] If the initial tolerance value does not meet the preset convergence condition, then intermediate variable matching results and intermediate variable peak matching conditions are set; the values of the initial matching results and the initial peak matching conditions are assigned to the intermediate variable matching results and the intermediate variable peak matching conditions, a tolerance adjustment equation is set, and the intermediate variable matching results and the intermediate variable peak matching conditions are input into the tolerance adjustment equation for calculation to obtain the optimized tolerance value; the tolerance adjustment equation is:
[0192] $t_n = g(R_n, B_n)$;
[0193] where $t_n$ is the optimized tolerance value, $R_n$ is the intermediate variable matching result, and $B_n$ is the intermediate variable peak matching status.
[0194] Specifically, if the initial tolerance value does not meet the preset convergence condition, intermediate variables are set: the intermediate variable matching result $R_n$ and the intermediate variable peak matching status $B_n$ receive the initial matching result R0 and the initial peak matching status B0, i.e. $R_n = R0$, $B_n = B0$. The tolerance adjustment equation is then set, and $R_n$ and $B_n$ are substituted into it to obtain the optimized tolerance value $t_n$.
[0195] The database spectra are matched based on the optimized tolerance value to obtain the current matching result and the current peak matching status. In this embodiment, the optimized tolerance value $t_n$ is used to match the database spectra S0 again, yielding the current matching result $R_n = \{r_i,\ i \in N^*\}$, the current database spectra $S_n = \{s_i,\ i \in N^*\}$ corresponding to $R_n$, and the current peak matching status $B_n = \{b_i,\ i \in N^*\}$. The termination equation is then applied again: $(R_n, B_n)$ is substituted into it to determine whether $t_n$ has converged. If it has converged, the corresponding current matching result $R_n$ is used as the dynamic calculation identification result of the easily confused bacteria.
[0196] In this embodiment, an initial tolerance value for mass spectrum matching is set, and the mass spectrum of easily confused bacteria to be matched is matched with a mass spectrometry database to obtain an initial matching result, an initial peak matching status, and a database spectrum corresponding to the initial tolerance value. Using the initial matching result and the initial peak matching status, it is determined whether the initial tolerance value meets a preset convergence condition. If the initial tolerance value meets the preset convergence condition, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria. If the initial tolerance value does not meet the preset convergence condition, the initial matching result and the initial peak matching status are calculated to obtain an optimized tolerance value. Based on the optimized tolerance value, the database spectrum is matched to obtain a current matching result and a current peak matching status. The convergence condition judgment process is repeated until the optimized tolerance value meets the convergence condition, and the current matching result is used as the dynamic calculation identification result of the easily confused bacteria. This application utilizes a predefined initial tolerance value to match the mass spectrum to be matched with a mass spectrometry database. It then determines whether the initial tolerance value meets a preset convergence condition. If it does, the initial matching result is used as the dynamic identification result for easily confused bacteria. If not, an optimized tolerance value is calculated, and the matching is performed again until the optimized tolerance value meets the convergence condition. The current matching result is then used as the dynamic identification result for easily confused bacteria. 
This application enables dynamic identification of easily confused bacteria without adding an extra database, through dynamic matching between the mass spectrum to be matched and the mass spectrometry database, and by determining the convergence of the tolerance value. This reduces the workload of dynamic identification of mixed bacteria mass spectra and is applicable to the identification of various easily confused bacteria, reducing the complexity of dynamic identification of easily confused bacteria and improving its efficiency and accuracy.
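The overall loop summarized above (match, test convergence, adjust the tolerance, match again) can be sketched as follows. The matching routine, the termination equation f and the tolerance adjustment equation g are passed in as callables because the application does not give their closed forms. The name `dynamic_identify` and the `max_rounds` cap are assumptions added for illustration.

```python
def dynamic_identify(match, converged, adjust, t0, max_rounds=5):
    """Iterate the tolerance until the convergence condition is met.

    match(t)           -> (R, B) matching result and peak matching status
    converged(R, B, t) -> bool, plays the role of end = f(R, B)
    adjust(R, B)       -> new tolerance, plays the role of t_n = g(R_n, B_n)

    Returns (R_m, t_m) on convergence, or (None, last_t) if the round
    limit is reached first.
    """
    t = t0
    R, B = match(t)                      # initial matching at t0
    for _ in range(max_rounds):
        if converged(R, B, t):
            return R, t
        t = adjust(R, B)                 # optimized tolerance value
        R, B = match(t)                  # re-match at the new tolerance
    return (R, t) if converged(R, B, t) else (None, t)
```

No additional database is consulted at any point in this loop; only the tolerance changes between rounds, which is the property the paragraph above emphasizes.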
[0197] In this embodiment, if the preset model library contains no micro-difference analysis model of the same type as the easily confused bacteria, it is determined whether the dynamic operation start rule library contains a target dynamic operation start rule corresponding to the easily confused bacteria; if such a target rule exists, it is started, and a dynamic matching algorithm is used to perform dynamic operation identification on the easily confused bacteria.
[0198] Before determining whether there is a target dynamic operation startup rule corresponding to each dynamic operation startup rule in the dynamic operation startup rule library, the method further includes: constructing each dynamic operation startup rule based on business requirements and according to a preset rule construction method; and saving each of the dynamic operation startup rules to the preset dynamic operation startup rule library.
[0199] Therefore, the prerequisite for the dynamic computation identification process of easily confused bacteria is as follows: first, determine whether the dynamic computation startup rule library on the local machine contains a target dynamic computation startup rule corresponding to the easily confused bacteria; if it does, start the target dynamic computation startup rule and execute the dynamic computation identification process of easily confused bacteria proposed in this application.
[0200] In a specific embodiment of the present invention, the process of using a dynamic matching algorithm and calling the micro-difference analysis model to perform dynamic calculation and identification of the easily confused bacteria is as follows: An initial tolerance value for mass spectrum matching is set; a first round of dynamic calculation is performed, matching the mass spectrum of the easily confused bacteria to be matched with a mass spectrometry database to obtain the initial matching result, initial peak matching status, and database spectrum corresponding to the initial tolerance value; using the initial matching result and the initial peak matching status, it is determined whether the initial tolerance value meets a preset convergence condition; if the initial tolerance value meets the preset convergence condition, the dynamic calculation ends, and the initial matching result is used as the dynamic calculation and identification result of the easily confused bacteria. If the initial tolerance value does not meet the preset convergence condition, i.e., the first round of dynamic calculation cannot yield an identification result, then it is determined whether there is a micro-difference analysis model in the preset model library that is the same type as the current matching result. If there is such a model, it is retrieved to analyze and identify the current matching result to obtain a model analysis and identification result. If there is no such model, the optimized tolerance value is calculated from the initial matching result and the initial peak matching status. 
Based on the optimized tolerance value, a new round of dynamic calculation and model matching process is repeated until an identification result is obtained.
[0201] This application, based on the identification results of MALDI-TOF MS, combines a dynamic matching algorithm with a micro-difference analysis model from a model library. Through multi-dimensional calculations, erroneous matching results are eliminated, ultimately improving the accuracy of MALDI-TOF MS in identifying microorganisms. Specifically, this can be divided into the following five cases:
[0202] (1) Directly utilize micro-difference analysis models to analyze and identify easily confused bacteria. That is, when easily confused bacteria are present in the mass spectrometry identification results, and a micro-difference analysis model of the same type as the easily confused bacteria exists in the model library, the micro-difference analysis model is directly used to analyze and identify the easily confused bacteria. The corresponding micro-difference analysis model is retrieved from the model library, and the easily confused bacterial species in the mass spectrometry identification results are further analyzed to provide a precise species-level model analysis and identification result, i.e., a single bacterial species name. (If the model library contains model data for the corresponding easily confused bacteria, the accuracy of identifying mixed bacteria is relatively high in this case).
[0203] (2) Micro-difference analysis model + dynamic operation identification. That is, when there are easily confused bacteria in the mass spectrometry identification results and there is a micro-difference analysis model of the same type as the easily confused bacteria in the model library, the operation method is the same as (1); when there are easily confused bacteria in the mass spectrometry identification results and there is no micro-difference analysis model of the same type as the easily confused bacteria in the model library, dynamic operation identification is started, and a round of dynamic analysis is performed on the easily confused bacteria. If the easily confused bacteria species with errors in the original matching results are removed after dynamic operation (that is, after the first round of dynamic operation, only one species has a score greater than the minimum tolerance value of species-level identification), the final result is reported directly; if at least two species still exist in the species-level identification results after a round of dynamic operation, but at this time, the type of easily confused bacteria in the species-level identification results happens to be the same as a certain micro-difference analysis model in the model library, then the micro-difference analysis model analysis and identification is started, and the corresponding micro-difference analysis model is called for further analysis. The execution process is the same as in (1), and will not be repeated here. If, after one round of dynamic calculation, at least two bacterial species still exist in the species-level identification results, and no matching micro-difference analysis model is found, a second round of dynamic calculation is performed on the easily confused bacteria, followed by attempts at model matching and analysis. 
This process is repeated multiple times until a corresponding micro-difference analysis model is successfully matched, or after multiple rounds of dynamic calculation, only one species is identified at the species level (i.e., only one species has a score higher than the minimum tolerance value for species-level identification). In this case, the final dynamic calculation identification result is output. If, after multiple rounds of dynamic calculation, no matching rule for the micro-difference analysis model in the model library can be found, and after multiple rounds of dynamic calculation, no species exists at the species level (i.e., all species have scores lower than the minimum tolerance value for species-level identification), the original matching result is returned directly, along with prompts and explanations for the easily confused species in the original result.
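The control flow of case (2) can be sketched as follows. The sketch assumes that the model library is keyed by the set of confused species names and that one round of dynamic calculation is available as a callable returning the species still at species level. The names `identify_with_models` and `run_round`, the placeholder species names in the usage note, and the `max_rounds` cap are all hypothetical.

```python
def identify_with_models(candidates, model_library, run_round, max_rounds=3):
    """Alternate model lookup and dynamic calculation rounds.

    candidates    : species names at species level in the original result
    model_library : dict mapping frozenset(species names) -> callable that
                    returns a single species name (assumed layout)
    run_round     : callable(list) -> list of species still at species
                    level after one more round of dynamic calculation

    Returns (result, source) with source in {"model", "dynamic",
    "unresolved"}; for "unresolved" the original candidates are returned
    so that the caller can attach a prompt about the confusion.
    """
    original = list(candidates)
    current = list(candidates)
    for round_no in range(max_rounds + 1):
        if len(current) == 1:
            return current[0], "dynamic"      # only one species remains
        model = model_library.get(frozenset(current))
        if model is not None:
            return model(current), "model"    # matching model found
        if not current or round_no == max_rounds:
            break                             # nothing left, or limit hit
        current = run_round(current)
    return original, "unresolved"
```

For example, with a library containing a model for `{"sp_a", "sp_b"}` and an original result listing `sp_a`, `sp_b` and `sp_c`, one round that removes `sp_c` is enough for the model lookup to succeed on the next pass.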
[0204] (3) Micro-difference analysis model + judgment on whether it meets the dynamic operation start rule + dynamic operation identification. That is, when the type of easily confused bacteria is the same as a certain micro-difference analysis model in the model library, the operation method is the same as (1); when the type of easily confused bacteria is different from the micro-difference analysis model in the model library, the easily confused bacteria are evaluated, that is, it is judged whether the easily confused bacteria meet the dynamic operation start rule in the preset dynamic operation start rule library; if it does not meet the dynamic operation start rule, the original matching result is returned directly, and prompts and explanations are given for the easily confused bacteria in the original result; if it meets the dynamic operation start rule, the dynamic matching algorithm is used to perform dynamic operation identification on the easily confused bacteria to obtain the dynamic operation identification result.
[0205] The process of establishing the dynamic computation startup rule base is as follows: For easily confused bacteria generated during MALDI-TOF MS identification, the easily confused strains are activated and evaluated multiple times using the MALDI-TOF MS method. The identification process must follow these rules: Orthogonal experiments are conducted using different treatment methods, different experienced personnel, different reagent batches, and different instrument conditions. The evaluation results are then incorporated into the dynamic computation startup rule base. The evaluation results fall into the following categories:
1. If the identification results of all test strains of the same species are consistent in the orthogonal experiment, the highest-scoring result is the correct one, and the difference in identification scores between the correct and incorrect results in the same report is greater than 3%, then the dynamic algorithm alone can make the final distinction between the easily confused strains used in the test; such easily confused bacteria can be included in the dynamic computation startup rule base.
2. If the identification results of all test strains of the same species are inconsistent in the orthogonal experiment, but the incorrectly identified strains are eliminated after a limited number of dynamic computations and the remaining identification results can be matched with the model startup rules, such cases can be finally distinguished using the dynamic calculation + model method, and these easily confused bacteria can be included in the dynamic computation startup rule base.
3. If the identification results of all test strains of the same species are inconsistent in the orthogonal experiment and the highest-scoring result is an incorrect species identification, such easily confused bacteria cannot be subjected to dynamic calculation and should be excluded from the dynamic computation startup rule base.
4. If the correct identification results of all test strains of the same species are excluded after one dynamic calculation in the orthogonal experiment, such easily confused bacteria cannot be subjected to dynamic calculation and should be excluded from the dynamic computation startup rule base.
5. Easily confused bacteria that have not been evaluated should be excluded from the dynamic computation startup rule base.
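The five evaluation categories above can be encoded as a small decision function. This is an illustrative sketch only: the field names of `Evaluation` are hypothetical, the order in which overlapping categories are tested is an assumption (exclusions are checked before inclusions), and any evaluation that fits no listed category is conservatively excluded.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Summary of one orthogonal-experiment evaluation (hypothetical fields)."""
    evaluated: bool = True
    consistent: bool = False                 # all test results agree
    top_is_correct: bool = False             # highest score is the true species
    score_gap_pct: float = 0.0               # correct vs incorrect score gap, %
    correct_removed_after_one_round: bool = False
    wrong_removed_by_dynamic: bool = False   # after a limited number of rounds
    remainder_matches_model_rule: bool = False


def rule_base_decision(ev):
    """Return 'dynamic_only', 'dynamic_plus_model' or 'exclude'."""
    if not ev.evaluated:                                   # category 5
        return "exclude"
    if ev.correct_removed_after_one_round:                 # category 4
        return "exclude"
    if ev.consistent and ev.top_is_correct and ev.score_gap_pct > 3.0:
        return "dynamic_only"                              # category 1
    if not ev.consistent and not ev.top_is_correct:        # category 3
        return "exclude"
    if (not ev.consistent and ev.wrong_removed_by_dynamic
            and ev.remainder_matches_model_rule):          # category 2
        return "dynamic_plus_model"
    return "exclude"                                       # unlisted: assumption
```

Entries that return `dynamic_only` or `dynamic_plus_model` would be stored in the rule base; everything else is left out, so an unevaluated group never triggers dynamic calculation.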
[0206] (4) Directly perform multi-round dynamic calculation identification. When there are easily confused bacteria in the species identification of the original matching results (i.e. mass spectrometry identification results), start the first round of easily confused bacteria identification. If the easily confused bacteria in the species identification are removed after the first round of easily confused bacteria identification (i.e., after the first round of easily confused bacteria identification, only one species has a score greater than the minimum tolerance value of species level identification), then the final species identification result is given and the identification report is output. If the first round of easily confused bacteria identification cannot completely remove the easily confused bacteria that were incorrectly identified in species identification (i.e., after the first round of easily confused bacteria identification, there are at least two species whose scores are greater than the minimum tolerance value for species-level identification), then dynamic calculation continues until only one species name remains in species identification (i.e., after multiple rounds of dynamic calculation, only one species identification score remains that is greater than the minimum tolerance value for species-level identification), and then the final identification result is reported; or after these multiple rounds of dynamic calculation, all easily confused species in the original matching results are reduced to genus identification (i.e., after multiple rounds of dynamic calculation, the scores of all species-identified bacteria are reduced to below the minimum tolerance value for species identification), indicating that the dynamic calculation cannot make a final distinction between easily confused bacteria. In this case, the original matching results are returned, and prompts and explanations are given for the easily confused species in the original results.
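The three possible outcomes of case (4) can be sketched as follows. The sketch assumes that scores are held in a dictionary keyed by species name and that one round of dynamic calculation is a callable returning updated scores. The function name `multi_round_identify`, the default threshold of 9.0 and the `max_rounds` cap are assumptions for illustration.

```python
def multi_round_identify(scores, next_round, species_threshold=9.0, max_rounds=5):
    """Repeat dynamic calculation until one species remains at species level.

    scores     : dict mapping species name -> score from the original match
    next_round : callable(dict) -> dict, scores after one more round

    Returns {"status": "identified", "species": name} when exactly one
    species stays at or above the threshold. Returns {"status":
    "unresolved", "result": original scores} when every species drops
    below the threshold or the round limit is reached, so that the
    original result can be reported with a prompt.
    """
    original = dict(scores)
    current = dict(scores)
    for round_no in range(max_rounds + 1):
        at_species = [name for name, s in current.items()
                      if s >= species_threshold]
        if len(at_species) == 1:
            return {"status": "identified", "species": at_species[0]}
        if not at_species or round_no == max_rounds:
            break
        current = next_round(current)
    return {"status": "unresolved", "result": original}
```

The round limit reflects the point made elsewhere in this description that scores fall with every additional round, so the number of rounds has to be bounded.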
[0207] (5) Dynamic judgment + dynamic calculation identification. When there are easily confused bacteria in the species identification of the original matching results, it is assessed whether the strains can be distinguished using dynamic calculation alone. If the easily confused species match a rule in the dynamic calculation startup rule base indicating that the final identification can be completed using dynamic calculation alone, dynamic calculation is performed on the original matching results (the calculation method is the same as in (4) and is not repeated here) and the final identification result is given. If the evaluation shows that dynamic calculation cannot be started, the original matching results are returned, with prompts and explanations for the easily confused species in the original results.
[0208] The dynamic matching algorithm, based on the original database matching results, performs a small-scale dynamic comparison of the spectra of a given strain in the results, removing results with large deviations and improving the accuracy of the results. However, while dynamic matching improves the accuracy of the results, it also sacrifices some precision: during dynamic calculation, the spectrum score decreases in proportion to the number of calculations. The number of dynamic calculations must therefore be reasonably controlled, so that accuracy improves while the correctness of the results is preserved. The model library performs micro-difference analysis on each type of easily confused strain to determine whether the strains can be distinguished by micro-difference analysis methods (including eigenvalue analysis and spectrum trend analysis), and the distinguishing methods are established in the model library. When the model library contains a micro-difference analysis model matching the easily confused bacteria in the original database matching results, a secondary analysis is performed with that model to further analyze the easily confused strains and provide a definitive result.
[0209] In other embodiments of the present invention, for mixed identification results of bacteria with the highest identification score at the genus level and different genera at the genus level, the above method can also be used to accurately distinguish different genera and obtain a unique genus identification result at the genus level.
[0210] The innovations of this application are: (1) by improving the microbial identification algorithm of MALDI-TOF MS, the system's ability to accurately identify easily confused bacteria is enhanced; (2) the dynamic matching algorithm and micro-difference analysis model used in this application can be applied directly to the accurate differentiation of easily confused bacteria by MALDI-TOF MS, and the accuracy does not vary from operator to operator, which improves the generalizability of the method.
[0211] Referring to Figure 4, an embodiment of the present invention discloses an identification device for easily confused bacteria, which may specifically include:
[0212] The judgment module 11 is used to obtain the mass spectrometry identification results sent by the original mass spectrometer, and to determine whether there are easily confused bacteria in the mass spectrometry identification results. If there are easily confused bacteria in the mass spectrometry identification results, it is determined whether there is a micro-difference analysis model with the same type as the easily confused bacteria in the preset model library.
[0213] The model analysis module 12 is used to retrieve the micro-difference analysis model to analyze and identify the easily confused bacteria if there is a micro-difference analysis model in the preset model library that is the same type as the easily confused bacteria, so as to obtain the model analysis and identification results.
[0214] The dynamic calculation identification module 13 is used to perform dynamic calculation identification on the easily confused bacteria by using a dynamic matching algorithm if there is no micro-difference analysis model of the same type as the easily confused bacteria in the preset model library, or to perform dynamic calculation identification on the easily confused bacteria by using a dynamic matching algorithm and calling the micro-difference analysis model, so as to obtain the dynamic calculation identification result.
[0215] Figure 5 is a schematic diagram of an electronic device provided in an embodiment of this application. The electronic device 20 specifically includes: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input / output interface 25, and a communication bus 26. The memory 22 stores a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the method for identifying easily confused bacteria performed by the electronic device disclosed in any of the foregoing embodiments.
[0216] In this embodiment, the power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, and is not specifically limited here; the input / output interface 25 is used to acquire external input data or output data to the outside world, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.
[0217] In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, random access memory, disk or optical disk, etc. The resources stored on it include operating system 221, computer program 222 and data 223, etc., and the storage method can be temporary storage or permanent storage.
[0218] The operating system 221 manages and controls the various hardware devices and computer programs 222 on the electronic device 20 to enable the processor 21 to perform calculations and processing on the data 223 in the memory 22. The operating system 221 can be Windows, Unix, Linux, etc. The computer program 222, in addition to including a computer program capable of performing the easily confused bacteria identification method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, may further include computer programs capable of performing other specific tasks. The data 223 may include data received by the easily confused bacteria identification device from external devices, as well as data collected by its own input / output interface 25.
[0219] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
[0220] Furthermore, embodiments of this application also disclose a computer-readable storage medium storing a computer program. When the computer program is loaded and executed by a processor, it implements the steps of the method for identifying easily confused bacteria disclosed in any of the foregoing embodiments.
[0221] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0222] The present invention provides a detailed description of a method, apparatus, device, and storage medium for identifying easily confused bacteria. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
[0223] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0224] Currently, mass spectrometry identifies microorganisms by comparing acquired spectra with those in a database and outputting similarity scores. Generally, scores above 9.0 represent species-level identification, scores between 6.0 and 9.0 represent genus-level identification, and scores below 6.0 indicate unreliable results. During mass spectrometry identification, due to limitations of the spectrometer, some microorganisms with high similarity or close phylogenetic relationships may yield multiple species-level results (9.0 or higher) that cannot be clearly distinguished; these are called easily confused bacteria. Currently, instruments using database algorithms based on super-spectrum data handle this as follows: when confusion occurs after the initial super-spectrum matching, a secondary matching is performed against a small spectrum database containing the easily confused bacteria. This can solve the identification problem for individual easily confused bacteria, but the method is limited in its ability to distinguish species. The small, subdivided databases must be constructed separately, and each class of easily confused bacteria requires its own subdivided database, resulting in a large workload, a large sample size requirement, and difficulty in collecting most uncommon easily confused bacteria, which limits the completeness and effectiveness of the database. As can be seen from the above, how to reduce the complexity of dynamic computational identification of easily confused bacteria and improve its efficiency and accuracy is a problem to be solved in this field.
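The score bands above, and the condition under which a result counts as easily confused, can be sketched as follows. The treatment of the boundary values 9.0 and 6.0 as inclusive is an assumption (the text says both "above 9.0" and "9.0 or higher"), and the function names are hypothetical.

```python
def identification_level(score):
    """Map a similarity score to the identification level it supports."""
    if score >= 9.0:
        return "species"
    if score >= 6.0:
        return "genus"
    return "unreliable"


def easily_confused(results):
    """Return the confused species names, or [] if the result is unambiguous.

    results is a list of (species_name, score) pairs. A result is treated
    as easily confused when two or more distinct species reach species level.
    """
    names = sorted({name for name, score in results
                    if identification_level(score) == "species"})
    return names if len(names) >= 2 else []
```

A non-empty return value is the trigger for the further analysis described in this application; an empty list means the ordinary identification report can be issued as it stands.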
[0225] Referring to Figure 6, this embodiment of the invention discloses a dynamic computational identification method for easily confused bacteria, which may specifically include:
[0226] Step S31: Set the initial tolerance value for mass spectrum matching, and match the mass spectrum of easily confused bacteria to be matched with the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value.
[0227] In the process of microbial mass spectrometry matching and identification, this application introduces a tolerance value 't' to correct for mass number errors generated during instrument detection. The tolerance value is the range within which peaks are considered to match: the larger the tolerance, the wider the range selected for peak matching. The purpose of the tolerance is to ensure that, within a certain limit, small variations in the mass number of a peak do not interfere with the accuracy of the overall matching result. Typically, 't' is a suitably chosen constant, so that a mass spectrometry peak is neither misjudged because 't' is too large nor missed because 't' is too small. However, because of inter-instrument differences and the varying degrees of difference between the fingerprint spectra of different microorganisms or different strains, a preset tolerance value 't' may affect the correctness of the actual matching result. This application proposes an adaptive quantitative method for 't', which allows the tolerance value to be automatically optimized during the matching process so that a suitable value is ultimately used for the matching determination.
[0228] In this embodiment, an initial tolerance value for mass spectrum matching is adaptively set; the mass spectrum of easily confused bacteria to be matched is matched with all spectra in the mass spectrometry database to obtain the initial matching result, initial peak matching status, and database spectra corresponding to the initial tolerance value; wherein, the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the mass spectrum peak to be matched in the mass spectrum to be matched.
[0229] Specifically, a large tolerance value is used as the initial tolerance value t0. The mass spectrum s0 to be matched is matched against the spectra L = {l_i, i ∈ N*} in the mass spectrometry database to obtain the initially sorted matching result R0 = {r_i, i ∈ N*}, the database spectra S0 = {s_i, i ∈ N*} corresponding to R0, and the initial peak matching situation B0 = {b_i, i ∈ N*} (where B0 includes the error values between all mass spectrometry database matching peaks and the mass spectrum peaks to be matched).
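A minimal sketch of tolerance-based matching that yields a sorted result list together with per-peak error values might look like the following. The application does not specify the scoring formula or the unit of the tolerance; the fraction-of-matched-peaks score, the absolute m/z tolerance, and all function names here are assumptions made for illustration only.

```python
def match_peaks(query: list[float], reference: list[float],
                t: float) -> list[tuple[float, float, float]]:
    """For each reference peak, find the nearest query peak within +/- t.

    Returns (reference_mz, query_mz, abs_error) for every matched peak.
    """
    matches = []
    for ref in reference:
        nearest = min(query, key=lambda q: abs(q - ref), default=None)
        if nearest is not None and abs(nearest - ref) <= t:
            matches.append((ref, nearest, abs(nearest - ref)))
    return matches


def rank_database(query: list[float],
                  database: dict[str, list[float]],
                  t: float) -> list[tuple[str, float, list[float]]]:
    """Match the query against every database spectrum.

    Returns (name, score, errors) sorted by score, where score is the
    fraction of reference peaks matched (an assumed scoring rule).
    """
    results = []
    for name, ref in database.items():
        matched = match_peaks(query, ref, t)
        score = len(matched) / len(ref) if ref else 0.0
        results.append((name, score, [err for _, _, err in matched]))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

In this sketch the sorted list plays the role of R0, and the collected error lists play the role of B0.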
[0230] Step S32: Using the initial matching result and the initial peak matching situation, determine whether the initial tolerance value meets the preset convergence condition. If the initial tolerance value meets the preset convergence condition, then the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria.
[0231] In this embodiment, a termination equation is set, and the initial matching result and the initial peak matching situation are input into the termination equation. It is then determined whether the initial tolerance value meets a preset convergence condition. If the initial tolerance value meets the preset convergence condition, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria. The termination equation is:
[0232] end = f(R0, B0);
[0233] Where R0 is the initial matching result and B0 is the initial peak matching situation.
[0234] The convergence condition judgment process is as follows: it is determined whether all mass spectrometry peak error values in the initial matching result are not greater than the initial tolerance value; if all mass spectrometry peak error values in the initial matching result are not greater than the initial tolerance value, then the uniqueness of the initial peak matching is determined; if the uniqueness determination is successful, then the initial tolerance value is determined to meet the preset convergence condition.
[0235] Specifically, a termination equation is defined to determine whether the initial tolerance value t0 has converged (i.e., whether t0 is the optimal tolerance). The initial matching result R0 and the initial peak matching situation B0 are substituted into the termination equation to evaluate the convergence condition. If, at a tolerance of t0, all mass spectrometry peak error values in B0 are less than or equal to t0 and the matching result R0 is unique (i.e., uniqueness is verified), then the tolerance has converged, meaning the initial tolerance value meets the preset convergence condition. At this point, t0 is assigned to the converged tolerance variable t_m, i.e. t_m = t0, and R0 is used as the dynamic calculation identification result R_m = {r_i, i ∈ N*} of the easily confused bacteria.
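The two-part convergence test described above can be sketched as a small predicate. The application does not define how uniqueness is verified; this sketch assumes that uniqueness means exactly one candidate reaching a species-level score of 9.0, and the function name and argument layout are hypothetical.

```python
def has_converged(scores: list[tuple[str, float]],
                  errors: list[float],
                  t: float,
                  species_cutoff: float = 9.0) -> bool:
    """Sketch of the termination test end = f(R, B).

    scores: (candidate name, similarity score) pairs, standing in for R.
    errors: matched-peak error values, standing in for B.
    """
    # Condition 1: every matched-peak error is within the tolerance.
    all_within = all(e <= t for e in errors)
    # Condition 2 (assumed meaning of uniqueness): a single
    # species-level candidate remains.
    species_hits = sum(1 for _, s in scores if s >= species_cutoff)
    return all_within and species_hits == 1
```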
[0236] Step S33: If the initial tolerance value does not meet the preset convergence condition, the initial matching result and the initial peak matching situation are calculated to obtain the optimized tolerance value.
[0237] In this embodiment, if the initial tolerance value does not meet the preset convergence condition, intermediate variable matching results and intermediate variable peak matching conditions are set. The values of the initial matching results and the initial peak matching conditions are assigned to the intermediate variable matching results and the intermediate variable peak matching conditions. A tolerance adjustment equation is set, and the intermediate variable matching results and the intermediate variable peak matching conditions are input into the tolerance adjustment equation for calculation to obtain the optimized tolerance value. The tolerance adjustment equation is:
[0238] t_n = g(R_n, B_n);
[0239] Where t_n is the optimized tolerance value, R_n is the intermediate variable matching result, and B_n is the intermediate variable peak matching situation.
[0240] Specifically, if the initial tolerance value does not meet the preset convergence condition, intermediate variables are set, namely the intermediate variable matching result R_n and the intermediate variable peak matching situation B_n. These receive the initial matching result R0 and the initial peak matching situation B0, i.e. R_n = R0 and B_n = B0. The tolerance adjustment equation is then set, and R_n and B_n are substituted into it to obtain the optimized tolerance value t_n.
[0241] Step S34: Match the database spectrum based on the optimized tolerance value to obtain the current matching result and the current peak matching status. Repeat the process of judging the convergence condition until the optimized tolerance value meets the convergence condition. Use the current matching result as the dynamic calculation identification result of the easily confused bacteria.
[0242] In this embodiment, the optimized tolerance value t_n is used to match the database spectra S0 again, obtaining the current matching result R_n = {r_i, i ∈ N*}, the current database spectra S_n = {s_i, i ∈ N*} corresponding to R_n, and the current peak matching situation B_n = {b_i, i ∈ N*}. The termination equation is then applied again: R_n and B_n are substituted into it to determine whether t_n has converged. If it has, the corresponding current matching result R_n is used as the dynamic calculation identification result of the easily confused bacteria.
[0243] The specific process of dynamic operation identification in this application is shown in Figure 7. (1) Collect the mass spectrum of the easily confused bacteria to be matched. (2) Set the initial tolerance value for mass spectrum matching, and match the mass spectrum of the easily confused bacteria to be matched with the mass spectrometry database to obtain the initial matching result, the initial peak matching status, and the database spectrum corresponding to the initial tolerance value. (3) Determine whether the initial tolerance value meets the preset convergence condition. If it does, the initial matching result is used as the dynamic operation identification result of the easily confused bacteria. (4) If the initial tolerance value does not meet the preset convergence condition, calculate the optimized tolerance value from the initial matching result and the initial peak matching situation. Then match the database spectrum based on the optimized tolerance value to obtain the current matching result and the current peak matching situation, and repeat the convergence judgment in a loop: whenever the optimized tolerance value does not meet the preset convergence condition, the optimized tolerance value is calculated again, until the preset convergence condition is met. For example, if the optimized tolerance value meets the preset convergence condition in the 12th loop, the corresponding 12th matching result is used as the dynamic calculation identification result of the easily confused bacteria.
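The loop of steps (1) to (4) can be sketched end to end as follows. The application leaves the termination equation f and the tolerance adjustment equation g unspecified, so this sketch substitutes simple stand-ins that are purely assumptions: a 0 to 10 score equal to ten times the fraction of matched reference peaks, uniqueness meaning exactly one score of 9.0 or higher, and g returning 0.8 times the largest current error so that each pass discards the loosest match. Query and reference spectra are assumed to be non-empty.

```python
def match_errors(query: list[float], reference: list[float],
                 t: float) -> list[float]:
    """Errors of reference peaks that have a query peak within +/- t."""
    errors = []
    for ref in reference:
        err = min(abs(q - ref) for q in query)
        if err <= t:
            errors.append(err)
    return errors


def match_all(query, database, t):
    """Return (R, B): assumed 0-10 scores and all matched-peak errors."""
    scores, all_errors = {}, []
    for name, ref in database.items():
        errs = match_errors(query, ref, t)
        scores[name] = 10.0 * len(errs) / len(ref)
        all_errors.extend(errs)
    return scores, all_errors


def converged(scores, errors, t, cutoff=9.0):
    """Stand-in for f(R, B): errors within t and one species-level hit."""
    unique = sum(1 for s in scores.values() if s >= cutoff) == 1
    return all(e <= t for e in errors) and unique


def adjust(scores, errors, shrink=0.8):
    """Stand-in for g(R, B): tighten below the largest current error."""
    return shrink * max(errors)


def identify(query, database, t0=2.0, max_iter=50):
    """Iterate match -> test -> adjust until convergence or give up."""
    t = t0
    for _ in range(max_iter):
        scores, errors = match_all(query, database, t)
        if converged(scores, errors, t):
            return max(scores, key=scores.get), t
        if not errors:
            break  # nothing left to match; cannot tighten further
        t = adjust(scores, errors)
    return None, t
```

With two hypothetical reference spectra that differ only in one peak position, the first pass at a large tolerance scores both at species level, and the tightened second pass separates them. When the references are identical the sketch returns no identification rather than guessing.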
[0244] In addition, before setting the initial tolerance value for mass spectrum matching, the method further includes: constructing each dynamic operation startup rule based on business needs and according to a preset rule construction method; and saving each of the dynamic operation startup rules to a preset dynamic operation startup rule library.
[0245] Therefore, the prerequisite for the dynamic computation identification process of easily confused bacteria is as follows: first, determine whether a target dynamic computation startup rule corresponding to the dynamic computation startup rules in the dynamic computation startup rule library exists on the local machine; if such a target dynamic computation startup rule exists, start it and execute the dynamic computation identification process of easily confused bacteria proposed in this application.
[0246] In this embodiment, an initial tolerance value for mass spectrum matching is set, and the mass spectrum of the easily confused bacteria to be matched is matched with the mass spectrometry database to obtain the initial matching result, the initial peak matching status, and the database spectrum corresponding to the initial tolerance value. The initial matching result and the initial peak matching status are used to determine whether the initial tolerance value meets the preset convergence condition. If it does, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria. If it does not, an optimized tolerance value is calculated from the initial matching result and the initial peak matching status, the database spectrum is matched again based on the optimized tolerance value to obtain the current matching result and the current peak matching status, and the convergence judgment is repeated until the optimized tolerance value meets the convergence condition, at which point the current matching result is used as the dynamic calculation identification result of the easily confused bacteria.
This application thereby enables dynamic identification of easily confused bacteria without adding an extra database, through dynamic matching between the mass spectrum to be matched and the mass spectrometry database and by determining the convergence of the tolerance value. This reduces the workload of identifying easily confused bacteria from mass spectra, is applicable to the identification of various easily confused bacteria, reduces the complexity of dynamic identification of easily confused bacteria, and improves its efficiency and accuracy.
[0247] Referring to Figure 8, this embodiment of the invention discloses a dynamic computational identification device for easily confused bacteria, which may specifically include:
[0248] The matching module 110 is used to set the initial tolerance value for mass spectrum matching and to match the mass spectrum of easily confused bacteria with the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value.
[0249] The first calculation identification result determination module 111 is used to determine whether the initial tolerance value meets the preset convergence condition by using the initial matching result and the initial peak matching situation. If the initial tolerance value meets the preset convergence condition, the initial matching result is used as the dynamic calculation identification result of the easily confused bacteria.
[0250] The calculation module 112 is used to calculate the initial matching result and the initial peak matching situation to obtain an optimized tolerance value if the initial tolerance value does not meet the preset convergence condition.
[0251] The second operation identification result determination module 113 is used to match the database spectrum based on the optimized tolerance value to obtain the current matching result and the current peak matching status, and repeat the process of judging the convergence condition until the optimized tolerance value meets the convergence condition, and use the current matching result as the dynamic operation identification result of the easily confused bacteria.
[0252] The device of this embodiment carries out the same process as the dynamic computational identification method described above and achieves the same effects: easily confused bacteria are identified without adding an extra database, through dynamic matching between the mass spectrum to be matched and the mass spectrometry database and by determining the convergence of the tolerance value, which reduces the complexity of dynamic identification of easily confused bacteria and improves its efficiency and accuracy.
[0253] In some specific embodiments, the matching module 110 may specifically include:
[0254] The initial tolerance setting module is used to adaptively set the initial tolerance value for mass spectrum matching;
[0255] The spectrum matching module is used to match the mass spectrum of easily confused bacteria to be matched with all spectra in the mass spectrometry database to obtain the initial matching result, initial peak matching status and database spectrum corresponding to the initial tolerance value; wherein, the initial peak matching status includes the error value between all matching peaks in the mass spectrometry database and the mass spectrum peak to be matched in the mass spectrum to be matched.
[0256] In some specific embodiments, the first calculation identification result determination module 111 may specifically include:
[0257] The convergence determination module is used to set a termination equation, input the initial matching result and the initial peak matching status into the termination equation, and determine whether the initial tolerance value meets the preset convergence condition; the termination equation is:
[0258] end = f(R0, B0);
[0259] Where R0 is the initial matching result and B0 is the initial peak matching situation.
[0260] In some specific embodiments, the first calculation identification result determination module 111 may specifically include:
[0261] The error value judgment module is used to determine whether the error values of all mass spectrometry peaks in the initial matching result are not greater than the initial tolerance value;
[0262] The uniqueness identification module is used to identify the uniqueness of the initial peak matching if all mass spectrometry peak error values in the initial matching result are not greater than the initial tolerance value. If the uniqueness identification is successful, the initial tolerance value is determined to meet the preset convergence condition.
[0263] In some specific embodiments, the computing module 112 may specifically include:
[0264] The intermediate variable setting module is used to set the intermediate variable matching results and the intermediate variable peak matching status;
[0265] The assignment module is used to assign the values of the initial matching result and the initial peak matching condition to the intermediate variable matching result and the intermediate variable peak matching condition.
[0266] In some specific embodiments, the computing module 112 may specifically include:
[0267] The tolerance value calculation module is used to set the tolerance adjustment equation, input the intermediate variable matching result and the intermediate variable peak matching situation into the tolerance adjustment equation for calculation, so as to obtain the optimized tolerance value; the tolerance adjustment equation is:
[0268] t_n = g(R_n, B_n);
[0269] Where t_n is the optimized tolerance value, R_n is the intermediate variable matching result, and B_n is the intermediate variable peak matching situation.
[0270] In some specific embodiments, the matching module 110 may further include:
[0271] The rule building module is used to build various dynamic operation startup rules based on business needs and according to preset rule building methods;
[0272] The rule saving module is used to save each of the dynamic operation startup rules to the preset dynamic operation startup rule library.
[0273] In some specific embodiments, the dynamic calculation and identification device for easily confused bacteria may further include:
[0274] The judgment module is used to determine whether a target dynamic operation startup rule corresponding to the dynamic operation startup rules in the dynamic operation startup rule library exists locally;
[0275] The rule initiation module is used to start the target dynamic operation startup rule and execute the dynamic operation identification process for easily confused bacteria if such a target dynamic operation startup rule exists locally.
[0276] Furthermore, this embodiment also provides another electronic device, including: a memory for storing a computer program; and a processor for executing the computer program to implement the aforementioned dynamic computational identification method for easily confused bacteria.
[0277] Furthermore, this embodiment also provides another computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the aforementioned dynamic calculation identification method for easily confused bacteria.
[0279] Currently, mass spectrometry identifies microorganisms by comparing acquired spectra with those in a database and outputting the similarity as a score. Generally, depending on the scoring scale, scores above 9.0 (or 2.0) represent species-level identification, scores between 6.0 and 9.0 (or between 1.7 and 2.0) represent genus-level identification, and scores below 6.0 (or 1.7) generally indicate unreliable results. During mass spectrometry identification, due to limitations of the spectrometer, some microorganisms with high similarity or close phylogenetic relationships may show multiple species-level results with scores above 9.0 (or 2.0). This situation is called mixed identification of mass spectrometry results, and such bacteria are called easily confused bacteria. Current methods for identifying easily confused bacteria by mass spectrometry rely on database algorithms such as super-spectrum databases. One approach performs initial super-spectrum matching followed by secondary matching against a smaller spectral database containing the easily confused bacteria. However, this method is limited in its ability to distinguish species, and its accuracy depends on the number of strains in the secondary database. The secondary database must contain a large number of the strains to be identified, which makes the initial collection of uncommon strains difficult, and even with a secondary database accurate identification is not guaranteed. Therefore, reducing reliance on spectra of easily confused bacteria while improving the accuracy and efficiency of their identification remains a challenge in this field.
[0280] Mass spectrometry identifies microorganisms by comparing their protein fingerprints with those in a database. During identification, high-abundance, high-concentration protein fingerprints are used, while medium-abundance or relatively low-concentration fingerprints are often ignored. High-abundance protein fingerprints are largely identical among similar strains, making it difficult to distinguish easily confused bacteria. By effectively utilizing fingerprint peaks that are ignored during database searches, accurate identification of easily confused bacteria can be achieved.
[0281] Referring to Figure 9, this embodiment of the invention discloses a method for identifying easily confused bacteria based on a chain analysis model, which specifically includes:
[0282] Step S41: Obtain the protein fingerprints of known strains that are easily confused in mass spectrometry identification, and perform noise processing on the protein fingerprints to obtain the processed protein fingerprints.
[0283] In this embodiment, microbial protein fingerprints are collected, and protein fingerprints of known strains that meet preset confounding criteria and are likely to be confounded by mass spectrometry are selected from all the microbial protein fingerprints. The protein fingerprints are then processed using preset typing software to obtain the processed protein fingerprints.
[0284] The specific noise processing procedure is as follows: the protein fingerprint spectrum is input into the preset typing software, and noise processing is performed on the protein fingerprint spectrum based on the wavelet transform method; the noise processing includes exponential smoothing elimination processing.
[0285] This application removes noise from the spectrum through wavelet transform and eliminates spikes caused by systematic errors through exponential smoothing, thereby reducing unstable interference factors included in the model group.
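As a hedged illustration of the spike-suppression stage only, simple exponential smoothing of an intensity trace can be sketched as below. The wavelet-transform denoising stage is omitted here because it would normally rely on a third-party library such as PyWavelets; the smoothing factor `alpha` and the function name are assumptions, and the application does not state which smoothing variant the typing software uses.

```python
def exponential_smooth(intensities: list[float],
                       alpha: float = 0.3) -> list[float]:
    """Simple exponential smoothing: s[i] = alpha*x[i] + (1-alpha)*s[i-1].

    The first smoothed value is taken to be the first observation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not intensities:
        return []
    smoothed = [intensities[0]]
    for x in intensities[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed
```

A single-point spike of height 10 in an otherwise flat trace is reduced to 5 with `alpha=0.5`, at the cost of a decaying tail after the spike; smaller values of `alpha` suppress spikes more strongly but also broaden genuine peaks.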
[0286] Step S42: Perform multiple peak analysis and comparison on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species. Use a preset binary chain analysis model to classify the unique peaks and construct a chain analysis model. Train the chain analysis model to obtain the target chain analysis model.
[0287] In this embodiment, a preset typing software is used to perform multiple analysis on the processed protein fingerprint spectrum, and multiple peak comparisons are performed on different species in the processed protein fingerprint spectrum to obtain the unique peaks corresponding to different species. A preset binary chain analysis model is used to classify the unique peaks to construct a chain analysis model, obtain historical species protein fingerprint peaks, and use the historical species protein fingerprint peaks to train the chain analysis model to obtain training results. Based on the training results, the parameters in the chain analysis model are modified and adjusted to obtain the target chain analysis model.
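One way to picture the extraction of unique peaks is as a set difference under a mass tolerance: a peak is unique to a species if no other species has a peak within that tolerance. The sketch below assumes each species is summarized by a list of representative peak positions; the function name, the data layout, and the tolerance value are illustrative assumptions rather than the behavior of the typing software.

```python
def unique_peaks(species_peaks: dict[str, list[float]],
                 tol: float = 1.0) -> dict[str, list[float]]:
    """Return, per species, the peaks with no counterpart in any other species."""
    result = {}
    for name, peaks in species_peaks.items():
        # Pool the peaks of every other species.
        others = [p for other, ps in species_peaks.items()
                  if other != name for p in ps]
        # Keep peaks farther than tol from every pooled peak.
        result[name] = [p for p in peaks
                        if all(abs(p - o) > tol for o in others)]
    return result
```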
[0288] The specific process of constructing the chain analysis model in this application is as follows: Based on each unique peak, a single binary analysis model is established in the typing software. When establishing the binary analysis model, multiple parameters such as the number of peaks, peak matching tolerance, peak abundance, and peak abundance tolerance should be initially set according to the analysis results of the medium-sized data volume of the protein fingerprint. Then, the established binary analysis model is cross-correlated to form a chain analysis structure, and a chain analysis model is constructed for the joint analysis of multiple easily confused bacteria.
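A single binary analysis model with the parameters named above (number of peaks, peak matching tolerance, peak abundance, and peak abundance tolerance) could be represented roughly as follows. The class name, field names, and the use of one shared expected abundance for all marker peaks are simplifying assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass
class BinaryModel:
    label: str            # species reported when the model matches
    peaks: list[float]    # expected positions of the unique peaks
    min_peaks: int        # number of peaks that must be found
    mz_tol: float         # peak matching tolerance
    abundance: float      # expected relative abundance (shared, assumed)
    abundance_tol: float  # allowed deviation in abundance

    def matches(self, spectrum: list[tuple[float, float]]) -> bool:
        """spectrum is a list of (position, relative abundance) pairs."""
        found = 0
        for target in self.peaks:
            if any(abs(mz - target) <= self.mz_tol
                   and abs(ab - self.abundance) <= self.abundance_tol
                   for mz, ab in spectrum):
                found += 1
        return found >= self.min_peaks
```

Each model answers a single yes or no question about a spectrum, which is what allows several of them to be linked into a chain.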
[0289] The specific implementation methods for multiple analyses and comparisons in this application include the following two scenarios:
[0290] (1) The first multiple analysis and comparison process is shown in Figure 2. A genus-level identification result, which contains the common characteristic peaks, is set as the baseline. First, a species with a unique peak is compared against the genus-level peak pattern to form the first binary judgment, which has two outcomes: if the spectrum contains the unique peak, it is identified as the first species; if it does not, it can only be identified at the genus level. Then a second species with a unique peak is compared against the genus-level identification to form the second binary judgment. This continues until the analysis tree can be divided no further, at which point all of the easily confused bacteria have been distinguished.
[0291] (2) The second multiple analysis and comparison process is shown in Figure 3. First, population peak analysis is performed on the indistinguishable bacteria, and the bacteria are divided into two populations based on the distribution of characteristic peaks. The combination of difference peaks between the two populations serves as the first binary branching judgment. Peak analysis is then performed again within each of the two populations, and the bacteria in each population are divided into two smaller populations based on specific peak combinations, forming the second binary branching judgment. This process is repeated until the populations can be divided no further.
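The two scenarios above can be sketched as a linear chain of unique-peak tests that falls back to the genus, and as a binary tree that splits populations until a single species remains. The data structures, names, and the rule that all marker peaks of a judgment must be present are assumptions made for illustration.

```python
from dataclasses import dataclass


def has_peak(spectrum: list[float], target: float, tol: float) -> bool:
    return any(abs(mz - target) <= tol for mz in spectrum)


def chain_identify(spectrum, chain, genus, tol=1.0):
    """Scenario (1): test each species' unique peaks in turn.

    chain is an ordered list of (species, marker_peaks); the genus name
    is returned when no species-level judgment succeeds.
    """
    for species, marker_peaks in chain:
        if all(has_peak(spectrum, p, tol) for p in marker_peaks):
            return species
    return genus


@dataclass
class Node:
    marker_peaks: list[float]
    if_present: "Node | str"  # sub-population (Node) or species name
    if_absent: "Node | str"


def tree_identify(spectrum, node, tol=1.0):
    """Scenario (2): follow binary branches until a species name is reached."""
    while not isinstance(node, str):
        present = all(has_peak(spectrum, p, tol) for p in node.marker_peaks)
        node = node.if_present if present else node.if_absent
    return node
```

In the chain form, a spectrum containing none of the unique peaks is reported only at genus level, which mirrors the second outcome of each binary judgment in scenario (1).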
[0292] In addition, this application uses known historical species protein fingerprint peaks for model training, and modifies the peak judgment parameters in the model based on the training results until the model accuracy is stable, thereby obtaining the target chain analysis model.
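The idea of adjusting a peak judgment parameter until accuracy stops improving can be sketched with a single parameter, the peak matching tolerance. The application does not describe its training procedure in detail, so the candidate list, the stopping rule (stop at the first candidate that fails to improve accuracy), and the labeled-sample format below are all assumptions.

```python
def accuracy(samples, marker: float, tol: float) -> float:
    """samples: list of (peak_list, label), where label is True when the
    strain is known to carry the marker peak."""
    correct = 0
    for peaks, label in samples:
        predicted = any(abs(mz - marker) <= tol for mz in peaks)
        correct += predicted == label
    return correct / len(samples)


def tune_tolerance(samples, marker: float,
                   candidates=(0.2, 0.5, 1.0, 2.0)):
    """Walk through candidate tolerances; stop once accuracy is stable."""
    best_tol = candidates[0]
    best_acc = accuracy(samples, marker, best_tol)
    for tol in candidates[1:]:
        acc = accuracy(samples, marker, tol)
        if acc <= best_acc:
            break  # no further improvement; treat accuracy as stable
        best_tol, best_acc = tol, acc
    return best_tol, best_acc
```

Stopping at the first non-improving candidate is a deliberately simple rule; a fuller procedure would typically evaluate several parameters jointly on held-out spectra.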
[0293] Step S43: Input the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification, so as to output the identification results of the easily confused bacteria to be identified.
[0294] In this embodiment, the target chain analysis model is modularly stored, and model startup rules are set. The protein fingerprint spectrum of the easily confused bacteria to be identified is obtained. It is determined whether the protein fingerprint spectrum meets the model startup rules. If the protein fingerprint spectrum meets the model startup rules, the target chain analysis model is started, and the protein fingerprint spectrum is input into the target chain analysis model for identification, so as to output the identification result of the easily confused bacteria to be identified.
[0295] In other words, the target chain analysis model is modularly stored, and the model activation rules are defined. For example, when species-level mixed identification occurs in the identification results, and the species mixed in the identification happen to be species that the model can distinguish, the chain model is retrieved from the stored model library and the results are accurately interpreted.
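The activation rule in this example (a species-level mixed identification in which every tied species is one the model can distinguish) can be sketched as a lookup over a stored model library. The library layout, the model names, and the 9.0 cutoff are assumptions made for illustration.

```python
def should_activate(scores: dict[str, float],
                    model_species: set[str],
                    cutoff: float = 9.0) -> bool:
    """True when two or more species tie at species level and the model
    covers all of them."""
    top = {name for name, score in scores.items() if score >= cutoff}
    return len(top) >= 2 and top <= model_species


def select_model(scores: dict[str, float],
                 library: dict[str, set[str]],
                 cutoff: float = 9.0):
    """library maps a stored model name to the species it can distinguish.

    Returns the first applicable model name, or None when no stored
    model covers the mixed identification.
    """
    for model_name, species in library.items():
        if should_activate(scores, species, cutoff):
            return model_name
    return None
```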
[0296] The innovations and advantages of this application are: (1) it reduces the dependence on spectrograms by establishing a model for differentiation; (2) it can accurately identify easily confused strains included in the model; (3) it is applicable not only to the accurate differentiation of pairwise mixed identification bacteria, but also to the accurate identification of mixed identification of multiple bacteria. This application uses the difference points between strains and populations to connect easily confused bacterial species through the difference point concatenation method, so as to accurately identify each strain in the easily confused identification strain population.
[0297] In this embodiment, the protein fingerprints of known strains that are easily confused in mass spectrometry identification are obtained and noise-processed to obtain processed protein fingerprints. Multiple peak analysis and comparison are performed on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species. A preset binary chain analysis model is used to classify these unique peaks and construct a chain analysis model, which is then trained to obtain the target chain analysis model. The protein fingerprint of the easily confused bacteria to be identified is input into the target chain analysis model, and the identification result is output. Because the chain analysis model is built from the unique peaks obtained through multiple peak analysis and comparison, reliance on spectra of easily confused bacteria is reduced. Using the trained target chain analysis model to identify protein fingerprints improves the accuracy and efficiency of identifying easily confused bacteria, and the approach is applicable not only to the accurate differentiation of pairwise mixed identifications but also to the accurate identification of mixed identifications involving multiple species.
[0298] Referring to Figure 10, this embodiment of the invention discloses a device for identifying easily confused bacteria based on a chain analysis model, which may specifically include:
[0299] The spectrum acquisition and processing module 1100 is used to acquire protein fingerprints of known strains that are easily confused in mass spectrometry identification, and to perform noise processing on the protein fingerprints to obtain processed protein fingerprints.
[0300] The model building module 1101 is used to perform multiple peak analysis and comparison on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species, to classify the unique peaks using a preset binary chain analysis model to construct a chain analysis model, and to train the chain analysis model to obtain the target chain analysis model.
[0301] The easily confused bacteria identification module 1102 is used to input the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification, so as to output the identification result of the easily confused bacteria to be identified.
[0302] The device of this embodiment performs the same steps and achieves the same effects as the method embodiment described in paragraph [0297]: constructing the chain analysis model from the unique peaks of each species reduces reliance on the fingerprints of the easily confused bacteria, and identification with the trained target chain analysis model improves accuracy and efficiency both for pairs of easily confused bacteria and for groups of several easily confused species.
[0303] In some specific embodiments, the spectrum acquisition and processing module 1100 may specifically include:
[0304] The spectrum acquisition and screening module is used to acquire microbial protein fingerprints and to screen, from all the microbial protein fingerprints, the protein fingerprints of known strains that meet the preset conditions for being easily confused in mass spectrometry identification.
[0305] The noise processing module is used to perform noise processing on the protein fingerprints using preset typing software.
[0306] In some specific embodiments, the spectrum acquisition and processing module 1100 may specifically include:
[0307] The protein fingerprint input module is used to input the protein fingerprint into a preset typing software and to perform noise processing on the protein fingerprint based on wavelet transform; the noise processing includes exponential smoothing elimination processing.
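The application does not specify the wavelet, the thresholding scheme or how exponential smoothing is combined with the wavelet transform. The following sketch therefore makes its own assumptions: a single-level Haar transform with soft thresholding of the detail coefficients, followed by simple exponential smoothing. The function names, the threshold and the smoothing factor alpha are hypothetical.

```python
import math
from typing import List, Sequence


def haar_denoise(signal: Sequence[float], threshold: float) -> List[float]:
    """Single-level Haar transform with soft thresholding of the details."""
    if len(signal) % 2:
        raise ValueError("signal length must be even for this sketch")
    root2 = math.sqrt(2.0)
    out: List[float] = []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        approx = (a + b) / root2
        detail = (a - b) / root2
        # Soft threshold: shrink small detail coefficients toward zero.
        shrunk = math.copysign(max(abs(detail) - threshold, 0.0), detail)
        # Inverse Haar step rebuilds the two samples of this pair.
        out.append((approx + shrunk) / root2)
        out.append((approx - shrunk) / root2)
    return out


def exp_smooth(signal: Sequence[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not signal:
        return []
    smoothed = [signal[0]]
    for x in signal[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Illustrative intensities only; the middle pair stands in for a real peak.
intensities = [10.0, 10.4, 9.8, 10.1, 50.0, 52.0, 10.2, 9.9]
denoised = exp_smooth(haar_denoise(intensities, threshold=0.5), alpha=0.6)
print(denoised)
```

A production pipeline would more likely use a multi-level transform from an established wavelet library; the single-level version is used here only to keep the example dependency-free.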
[0308] In some specific embodiments, the model building module 1101 may specifically include:
[0309] The multiple comparison module is used to perform multiple analysis on the processed protein fingerprint using preset typing software, and to perform multiple peak comparisons on different species in the processed protein fingerprint to obtain the unique peaks corresponding to different species.
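As a minimal sketch of the comparison step, the following treats a peak as unique to a species when no other species has a peak within a given m/z tolerance. This criterion, the function name unique_peaks, the tolerance of 2.0 and the peak lists are all assumptions made for illustration; the application leaves the comparison to the preset typing software.

```python
from typing import Dict, List


def unique_peaks(species_peaks: Dict[str, List[float]],
                 tol: float = 2.0) -> Dict[str, List[float]]:
    """For each species, keep peaks that no other species has within +/- tol."""
    result: Dict[str, List[float]] = {}
    for name, peaks in species_peaks.items():
        # Pool the peaks of every other species in the group.
        others = [p for other, ps in species_peaks.items()
                  if other != name for p in ps]
        result[name] = [p for p in peaks
                        if all(abs(p - q) > tol for q in others)]
    return result


# Illustrative peak lists only, not taken from the application.
group = {
    "species_A": [3000.0, 4365.0, 6254.0],
    "species_B": [3000.5, 5096.0, 6254.8],
}
markers = unique_peaks(group)
print(markers)
```

Peaks shared by both species (near 3000 and 6254 in this example) are discarded, leaving one candidate marker per species.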
[0310] In some specific embodiments, the model building module 1101 may specifically include:
[0311] The model training module is used to obtain historical species protein fingerprint peaks and use the historical species protein fingerprint peaks to train the chain analysis model to obtain training results.
[0312] The parameter modification and adjustment module is used to modify and adjust the parameters in the chain analysis model based on the training results to obtain the target chain analysis model.
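The application does not state which parameters are adjusted or how. The sketch below assumes, purely for illustration, that the adjustable parameter is the m/z matching tolerance of a chain of unique peaks and that it is chosen to maximize accuracy on labelled historical peak lists. The names predict, accuracy and tune_tolerance and all values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Chain = List[Tuple[str, float]]               # (species, unique m/z), in order
History = List[Tuple[Sequence[float], str]]   # (peak list, known species)


def predict(peaks: Sequence[float], chain: Chain, tol: float) -> Optional[str]:
    """Return the first species in the chain whose unique peak is present."""
    for species, mz in chain:
        if any(abs(p - mz) <= tol for p in peaks):
            return species
    return None


def accuracy(history: History, chain: Chain, tol: float) -> float:
    """Fraction of historical samples identified correctly at this tolerance."""
    correct = sum(predict(peaks, chain, tol) == label
                  for peaks, label in history)
    return correct / len(history)


def tune_tolerance(history: History, chain: Chain,
                   candidates: Sequence[float]) -> float:
    """Pick the candidate tolerance with the best historical accuracy."""
    return max(candidates, key=lambda t: accuracy(history, chain, t))


# Illustrative data only, not taken from the application.
chain = [("species_A", 4365.0), ("species_B", 5096.0)]
history = [
    ([4366.5, 6254.0], "species_A"),
    ([5094.8, 6255.0], "species_B"),
    ([4362.0, 5096.2], "species_B"),
]
best_tol = tune_tolerance(history, chain, [0.5, 2.0, 5.0])
print(best_tol)
```

In this example a tolerance that is too tight misses genuine marker peaks, while one that is too loose lets a neighbouring peak trigger the wrong node, so the intermediate value scores best.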
[0313] In some specific embodiments, the easily confused bacteria identification module 1102 may specifically include:
[0314] The modular storage module is used to modularly store the target chain analysis model and set the model startup rules.
[0315] In some specific embodiments, the easily confused bacteria identification module 1102 may specifically include:
[0316] The judgment module is used to obtain the protein fingerprint spectrum of the easily confused bacteria to be identified and to determine whether the protein fingerprint spectrum meets the model startup rules.
[0317] The identification module is used to start the target chain analysis model and input the protein fingerprint into the target chain analysis model for identification if the protein fingerprint meets the model startup rules.
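The gating behaviour of the judgment module and the identification module can be sketched as follows. The application does not say what the model startup rules check, so the two rules shown (a minimum peak count and a mass range) are hypothetical, as are the names identify_if_started and toy_model.

```python
from typing import Callable, List, Optional, Sequence

Spectrum = Sequence[float]                   # peak positions (m/z)
Rule = Callable[[Spectrum], bool]            # hypothetical startup rule
Model = Callable[[Spectrum], Optional[str]]  # stands in for the target model


def identify_if_started(peaks: Spectrum, rules: List[Rule],
                        model: Model) -> Optional[str]:
    """Start the model only when every startup rule accepts the spectrum."""
    if all(rule(peaks) for rule in rules):
        return model(peaks)
    return None  # rules not met: the model is not started


# Hypothetical rules; the application does not define their content.
def enough_peaks(peaks: Spectrum) -> bool:
    return len(peaks) >= 3


def in_mass_range(peaks: Spectrum) -> bool:
    return all(2000.0 <= p <= 20000.0 for p in peaks)


def toy_model(peaks: Spectrum) -> Optional[str]:
    """Placeholder for the target chain analysis model."""
    return "species_A" if any(abs(p - 4365.0) <= 2.0 for p in peaks) else None


rules = [enough_peaks, in_mass_range]
started = identify_if_started([3000.0, 4365.5, 6254.0], rules, toy_model)
skipped = identify_if_started([4365.5], rules, toy_model)
print(started, skipped)
```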
[0318] Furthermore, this embodiment also provides another electronic device, including: a memory for storing a computer program; and a processor for executing the computer program to implement the aforementioned method for identifying easily confused bacteria based on a chain analysis model.
[0319] Furthermore, this embodiment also provides another computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the aforementioned method for identifying easily confused bacteria based on a chain analysis model.
[0320] The embodiments of the method for identifying easily confused bacteria, the dynamic calculation identification method for easily confused bacteria, and the method for identifying easily confused bacteria based on a chain analysis model may be referred to one another.
[0321] Furthermore, Figure 5 shows the electronic device structure for both the dynamic calculation identification method for easily confused bacteria and the method for identifying easily confused bacteria based on a chain analysis model.
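How the three embodiments fit together, as recited in claim 1, can be sketched as a simple dispatch. The sketch assumes that "easily confused" means the original mass spectrometer reported more than one candidate species, and that the model library is keyed by the set of candidate species. Both assumptions, and the names identify, model_library and dynamic_match, are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Peaks = List[float]
Model = Callable[[Peaks], Optional[str]]


def identify(candidates: List[str], peaks: Peaks,
             model_library: Dict[FrozenSet[str], Model],
             dynamic_match: Model) -> Optional[str]:
    """Route a result to a library model or to dynamic matching."""
    if len(candidates) == 1:
        return candidates[0]  # unambiguous result, nothing to resolve
    # Assumption: a model "of the same type" is one built for this exact group.
    model = model_library.get(frozenset(candidates))
    if model is not None:
        return model(peaks)   # micro-difference analysis model path
    return dynamic_match(peaks)  # no model available: dynamic calculation path


# Placeholder models for illustration only.
library = {frozenset({"species_A", "species_B"}): lambda peaks: "species_A"}


def fallback(peaks: Peaks) -> Optional[str]:
    return "species_C"


via_model = identify(["species_B", "species_A"], [4365.0], library, fallback)
via_dynamic = identify(["species_C", "species_D"], [7274.0], library, fallback)
print(via_model, via_dynamic)
```

The combined variant of claim 15, in which the model library is consulted again between rounds of dynamic calculation, is not shown here.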
Claims
1. A method for identifying easily confused bacteria, characterized in that it comprises: obtaining the mass spectrometry identification result sent by the original mass spectrometer, and determining whether the mass spectrometry identification result contains easily confused bacteria; if it does, determining whether a preset model library contains a micro-difference analysis model of the same type as the easily confused bacteria; if the preset model library contains a micro-difference analysis model of the same type as the easily confused bacteria, retrieving the micro-difference analysis model to analyze and identify the easily confused bacteria, so as to obtain a model analysis and identification result; if the preset model library contains no micro-difference analysis model of the same type as the easily confused bacteria, performing dynamic calculation identification on the easily confused bacteria either by using a dynamic matching algorithm, or by using a dynamic matching algorithm and retrieving the micro-difference analysis model, so as to obtain a dynamic calculation identification result.
2. The method for identifying easily confused bacteria according to claim 1, characterized in that, before determining whether a micro-difference analysis model of the same type as the easily confused bacteria exists in the preset model library, the method further includes: constructing each micro-difference analysis model using a micro-difference analysis method; and saving each micro-difference analysis model to the model library.
3. The method for identifying easily confused bacteria according to claim 2, characterized in that, The method of constructing various micro-difference analysis models using micro-difference analysis includes: Historical data is acquired, and the historical data is analyzed using the eigenvalue analysis method and a preset eigenvalue analysis system to obtain fingerprint profiles of each bacterial species. The fingerprint profiles of each bacterial species are then processed and subjected to big data analysis to construct micro-difference analysis models. Alternatively, a spectral trend analysis method and an artificial intelligence learning method can be used to autonomously learn the bacterial species types in the historical data to obtain the spectral trend differences between different groups. A reference spectral trend differentiation model can be constructed based on the spectral trend differences. The reference spectral trend differentiation model can be verified and corrected to obtain each of the micro-difference analysis models.
4. The method for identifying easily confused bacteria according to claim 1, characterized in that the micro-difference analysis model includes a chain analysis model, and the construction process of the chain analysis model includes: obtaining protein fingerprints of known strains that are easily confused in mass spectrometry identification, and performing noise processing on the protein fingerprints to obtain processed protein fingerprints; performing multiple peak analysis and comparison on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species; classifying the unique peaks using a preset binary chain analysis model to construct a chain analysis model; and training the chain analysis model to obtain the target chain analysis model.
5. The method for identifying easily confused bacteria according to claim 4, characterized in that performing multiple peak analysis and comparison on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species includes: performing multiple analysis on the processed protein fingerprints using preset typing software, and performing multiple peak comparisons on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species.
6. The method for identifying easily confused bacteria according to claim 4, characterized in that training the chain analysis model to obtain the target chain analysis model includes: obtaining historical species protein fingerprint peaks, and using the historical species protein fingerprint peaks to train the chain analysis model to obtain training results; and modifying and adjusting the parameters in the chain analysis model based on the training results to obtain the target chain analysis model.
7. The method for identifying easily confused bacteria according to claim 1, characterized in that using a dynamic matching algorithm to perform dynamic calculation identification on the easily confused bacteria includes: setting an initial tolerance value for mass spectrum matching, and matching the mass spectrum of the easily confused bacteria to be matched against the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value; using the initial matching result and the initial peak matching status to determine whether the initial tolerance value meets a preset convergence condition; if the initial tolerance value meets the preset convergence condition, using the initial matching result as the dynamic calculation identification result of the easily confused bacteria; if the initial tolerance value does not meet the preset convergence condition, performing a calculation on the initial matching result and the initial peak matching status to obtain an optimized tolerance value; and matching the database spectra based on the optimized tolerance value to obtain the current matching result and the current peak matching status, and repeating the convergence determination until the optimized tolerance value meets the convergence condition, whereupon the current matching result is used as the dynamic calculation identification result of the easily confused bacteria.
8. The method for identifying easily confused bacteria according to claim 7, characterized in that setting an initial tolerance value for mass spectrum matching, and matching the mass spectrum of the easily confused bacteria to be matched against the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value, includes: adaptively setting the initial tolerance value for mass spectrum matching; and matching the mass spectrum of the easily confused bacteria to be matched against all spectra in the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value; wherein the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the corresponding peaks in the mass spectrum to be matched.
9. The method for identifying easily confused bacteria according to claim 7, characterized in that using the initial matching result and the initial peak matching status to determine whether the initial tolerance value meets the preset convergence condition includes: defining a termination equation, inputting the initial matching result and the initial peak matching status into the termination equation, and determining whether the initial tolerance value meets the preset convergence condition; the termination equation is: $\mathrm{end} = f(R_0, B_0)$; where $R_0$ is the initial matching result and $B_0$ is the initial peak matching status.
10. The method for identifying easily confused bacteria according to claim 7, characterized in that determining whether the initial tolerance value meets the preset convergence condition includes: determining whether all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value; if all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value, performing a uniqueness check on the initial peak matching status; and if the uniqueness check succeeds, determining that the initial tolerance value meets the preset convergence condition.
11. The method for identifying easily confused bacteria according to any one of claims 7 to 10, characterized in that, before performing the calculation on the initial matching result and the initial peak matching status, the method further includes: setting an intermediate variable matching result and an intermediate variable peak matching status; and assigning the values of the initial matching result and the initial peak matching status to the intermediate variable matching result and the intermediate variable peak matching status, respectively.
12. The method for identifying easily confused bacteria according to claim 11, characterized in that performing the calculation on the initial matching result and the initial peak matching status to obtain the optimized tolerance value includes: defining a tolerance adjustment equation, and inputting the intermediate variable matching result and the intermediate variable peak matching status into the tolerance adjustment equation to calculate the optimized tolerance value; the tolerance adjustment equation is: $t_n = g(R_n, B_n)$; where $t_n$ is the optimized tolerance value, $R_n$ is the intermediate variable matching result, and $B_n$ is the intermediate variable peak matching status.
13. The method for identifying easily confused bacteria according to claim 1, characterized in that using a dynamic matching algorithm to perform dynamic calculation identification on the easily confused bacteria includes: determining whether there is a target dynamic calculation startup rule corresponding to each dynamic calculation startup rule in the dynamic calculation startup rule library; and if such a target dynamic calculation startup rule exists, starting the target dynamic calculation startup rule and using the dynamic matching algorithm to perform dynamic calculation identification on the easily confused bacteria.
14. The method for identifying easily confused bacteria according to claim 13, characterized in that, before determining whether there is a target dynamic calculation startup rule corresponding to each dynamic calculation startup rule in the dynamic calculation startup rule library, the method further includes: constructing each dynamic calculation startup rule based on business needs and according to a preset rule construction method; and saving each dynamic calculation startup rule to the preset dynamic calculation startup rule library.
15. The method for identifying easily confused bacteria according to claim 1, characterized in that using a dynamic matching algorithm and retrieving the micro-difference analysis model to perform dynamic calculation identification on the easily confused bacteria, so as to obtain the dynamic calculation identification result, includes: setting an initial tolerance value for mass spectrum matching, performing a first round of dynamic calculation, and matching the mass spectrum of the easily confused bacteria to be matched against the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value; using the initial matching result and the initial peak matching status to determine whether the initial tolerance value meets a preset convergence condition; if the initial tolerance value meets the preset convergence condition, ending the dynamic calculation and using the initial matching result as the dynamic calculation identification result of the easily confused bacteria; if the initial tolerance value does not meet the preset convergence condition, that is, the first round of dynamic calculation cannot produce an identification result, determining whether the preset model library contains a micro-difference analysis model of the same type as the current matching result; if the preset model library contains a micro-difference analysis model of the same type as the current matching result, retrieving the micro-difference analysis model to analyze and identify the current matching result, so as to obtain the model analysis and identification result; if the preset model library contains no micro-difference analysis model of the same type as the current matching result, performing a calculation on the initial matching result and the initial peak matching status to obtain an optimized tolerance value; and, based on the optimized tolerance value, repeating a new round of dynamic calculation and model matching until the identification result is obtained.
16. A device for identifying easily confused bacteria, characterized in that it comprises: a judgment module, used to obtain the mass spectrometry identification result sent by the original mass spectrometer, to determine whether the mass spectrometry identification result contains easily confused bacteria and, if it does, to determine whether the preset model library contains a micro-difference analysis model of the same type as the easily confused bacteria; a model analysis module, used to retrieve the micro-difference analysis model in the preset model library that is of the same type as the easily confused bacteria to analyze and identify the easily confused bacteria, so as to obtain the model analysis and identification result; and a dynamic calculation identification module, used, if the preset model library contains no micro-difference analysis model of the same type as the easily confused bacteria, to perform dynamic calculation identification on the easily confused bacteria by using a dynamic matching algorithm, or by using a dynamic matching algorithm and retrieving the micro-difference analysis model, so as to obtain the dynamic calculation identification result.
17. An electronic device, characterized in that it comprises: a memory for storing a computer program; and a processor for executing the computer program to implement the method for identifying easily confused bacteria according to any one of claims 1 to 15.
18. A computer-readable storage medium, characterized in that it is used to store a computer program; wherein, when the computer program is executed by a processor, it implements the method for identifying easily confused bacteria according to any one of claims 1 to 15.
19. A dynamic calculation identification method for easily confused bacteria, characterized in that it comprises: setting an initial tolerance value for mass spectrum matching, and matching the mass spectrum of the easily confused bacteria to be matched against the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value; using the initial matching result and the initial peak matching status to determine whether the initial tolerance value meets a preset convergence condition; if the initial tolerance value meets the preset convergence condition, using the initial matching result as the dynamic calculation identification result of the easily confused bacteria; if the initial tolerance value does not meet the preset convergence condition, performing a calculation on the initial matching result and the initial peak matching status to obtain an optimized tolerance value; and matching the database spectra based on the optimized tolerance value to obtain the current matching result and the current peak matching status, and repeating the convergence determination until the optimized tolerance value meets the convergence condition, whereupon the current matching result is used as the dynamic calculation identification result of the easily confused bacteria.
20. The dynamic calculation identification method for easily confused bacteria according to claim 19, characterized in that setting an initial tolerance value for mass spectrum matching, and matching the mass spectrum of the easily confused bacteria to be matched against the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value, includes: adaptively setting the initial tolerance value for mass spectrum matching; and matching the mass spectrum of the easily confused bacteria to be matched against all spectra in the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value; wherein the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the corresponding peaks in the mass spectrum to be matched.
21. The dynamic calculation identification method for easily confused bacteria according to claim 19, characterized in that using the initial matching result and the initial peak matching status to determine whether the initial tolerance value meets the preset convergence condition includes: defining a termination equation, inputting the initial matching result and the initial peak matching status into the termination equation, and determining whether the initial tolerance value meets the preset convergence condition; the termination equation is: $\mathrm{end} = f(R_0, B_0)$; where $R_0$ is the initial matching result and $B_0$ is the initial peak matching status.
22. The dynamic calculation identification method for easily confused bacteria according to claim 19, characterized in that determining whether the initial tolerance value meets the preset convergence condition includes: determining whether all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value; if all mass spectrum peak error values in the initial matching result are not greater than the initial tolerance value, performing a uniqueness check on the initial peak matching status; and if the uniqueness check succeeds, determining that the initial tolerance value meets the preset convergence condition.
23. The dynamic calculation identification method for easily confused bacteria according to any one of claims 19 to 22, characterized in that, before performing the calculation on the initial matching result and the initial peak matching status, the method further includes: setting an intermediate variable matching result and an intermediate variable peak matching status; and assigning the values of the initial matching result and the initial peak matching status to the intermediate variable matching result and the intermediate variable peak matching status, respectively.
24. The dynamic calculation identification method for easily confused bacteria according to claim 23, characterized in that performing the calculation on the initial matching result and the initial peak matching status to obtain the optimized tolerance value includes: defining a tolerance adjustment equation, and inputting the intermediate variable matching result and the intermediate variable peak matching status into the tolerance adjustment equation to calculate the optimized tolerance value; the tolerance adjustment equation is: $t_n = g(R_n, B_n)$; where $t_n$ is the optimized tolerance value, $R_n$ is the intermediate variable matching result, and $B_n$ is the intermediate variable peak matching status.
25. The dynamic calculation identification method for easily confused bacteria according to claim 19, characterized in that, before setting the initial tolerance value for mass spectrum matching, the method further includes: constructing each dynamic calculation startup rule based on business needs and according to a preset rule construction method; saving each dynamic calculation startup rule to a preset dynamic calculation startup rule library; determining whether a target dynamic calculation startup rule corresponding to each dynamic calculation startup rule in the dynamic calculation startup rule library exists locally; and if such a target dynamic calculation startup rule exists locally, starting the target dynamic calculation startup rule and executing the dynamic calculation identification of the easily confused bacteria.
26. A dynamic calculation identification device for easily confused bacteria, characterized in that it comprises: a matching module, used to set an initial tolerance value for mass spectrum matching and to match the mass spectrum of the easily confused bacteria to be matched against the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value; a first calculation identification result determination module, used to determine, using the initial matching result and the initial peak matching status, whether the initial tolerance value meets a preset convergence condition and, if it does, to use the initial matching result as the dynamic calculation identification result of the easily confused bacteria; a calculation module, used to perform a calculation on the initial matching result and the initial peak matching status to obtain an optimized tolerance value if the initial tolerance value does not meet the preset convergence condition; and a second calculation identification result determination module, used to match the database spectra based on the optimized tolerance value to obtain the current matching result and the current peak matching status, to repeat the convergence determination until the optimized tolerance value meets the convergence condition, and to use the current matching result as the dynamic calculation identification result of the easily confused bacteria.
27. The dynamic calculation identification device for easily confused bacteria according to claim 26, characterized in that the matching module includes: an initial tolerance setting module, used to adaptively set the initial tolerance value for mass spectrum matching; and a spectrum matching module, used to match the mass spectrum of the easily confused bacteria to be matched against all spectra in the mass spectrometry database to obtain the initial matching result, the initial peak matching status and the database spectra corresponding to the initial tolerance value; wherein the initial peak matching status includes the error values between all matching peaks in the mass spectrometry database and the corresponding peaks in the mass spectrum to be matched.
28. An electronic device, characterized in that it comprises: a memory for storing a computer program; and a processor for executing the computer program to implement the dynamic calculation identification method for easily confused bacteria according to any one of claims 19 to 25.
29. A computer-readable storage medium, characterized in that it is used to store a computer program; wherein, when the computer program is executed by a processor, it implements the dynamic calculation identification method for easily confused bacteria according to any one of claims 19 to 25.
30. A method for identifying easily confused bacteria based on a chain analysis model, characterized in that, include: Obtain the protein fingerprint of a known strain that is easily mixed with mass spectrometry, and perform noise reduction on the protein fingerprint to obtain the processed protein fingerprint. Peak multiple analysis and comparison are performed on different species in the processed protein fingerprint to obtain each unique peak corresponding to different species. Each unique peak is classified using a preset binary chain analysis model to construct a chain analysis model. The chain analysis model is trained to obtain the target chain analysis model. The protein fingerprint of the easily confused bacteria to be identified is input into the target chain analysis model for identification, so as to output the identification results of the easily confused bacteria to be identified.
31. The method for identifying easily confused bacteria based on a chain analysis model according to claim 30, characterized in that, The process of obtaining protein fingerprints of known strains that are easily mixed with mass spectrometry, and performing noise reduction on the protein fingerprints, includes: Collect microbial protein fingerprints and screen the protein fingerprints of known strains that meet the preset conditions for easy conjugation of mass spectrometry from all the microbial protein fingerprints. The protein fingerprint spectrum is noise-processed using preset typing software.
32. The method for identifying easily confused bacteria based on a chain analysis model according to claim 31, characterized in that performing noise processing on the protein fingerprints using the preset typing software comprises: inputting the protein fingerprints into the preset typing software, and performing noise processing on the protein fingerprints based on a wavelet transform, wherein the noise processing includes exponential smoothing elimination.
33. The method for identifying easily confused bacteria based on a chain analysis model according to claim 30, characterized in that performing multiple peak analysis and comparison on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species comprises: analyzing the processed protein fingerprints using preset typing software, and performing multiple peak comparison on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species.
34. The method for identifying easily confused bacteria based on a chain analysis model according to claim 30, characterized in that training the chain analysis model to obtain the target chain analysis model comprises: obtaining historical species protein fingerprint peaks, and training the chain analysis model with the historical species protein fingerprint peaks to obtain a training result; and adjusting the parameters of the chain analysis model based on the training result to obtain the target chain analysis model.
35. The method for identifying easily confused bacteria based on a chain analysis model according to any one of claims 30 to 34, characterized in that, before inputting the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification, the method further comprises: storing the target chain analysis model in a modular manner, and setting model activation rules.
36. The method for identifying easily confused bacteria based on a chain analysis model according to claim 35, characterized in that inputting the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification comprises: obtaining the protein fingerprint of the easily confused bacteria to be identified, and determining whether the protein fingerprint meets the model activation rules; and, if the protein fingerprint meets the model activation rules, activating the target chain analysis model and inputting the protein fingerprint into the target chain analysis model for identification.
37. A device for identifying easily confused bacteria based on a chain analysis model, characterized by comprising: a spectrum acquisition and processing module, configured to obtain protein fingerprints of known strains that are easily confused in mass spectrometry, and to perform noise processing on the protein fingerprints to obtain processed protein fingerprints; a model building module, configured to perform multiple peak analysis and comparison on the different species in the processed protein fingerprints to obtain the unique peaks corresponding to each species, to classify the unique peaks using a preset binary chain analysis model to construct a chain analysis model, and to train the chain analysis model to obtain a target chain analysis model; and an easily confused bacteria identification module, configured to input the protein fingerprint of the easily confused bacteria to be identified into the target chain analysis model for identification, so as to output an identification result for the easily confused bacteria to be identified.
38. An electronic device, characterized by comprising: a memory configured to store a computer program; and a processor configured to execute the computer program to implement the method for identifying easily confused bacteria based on a chain analysis model according to any one of claims 30 to 36.
39. A computer-readable storage medium, characterized in that it is configured to store a computer program, wherein the computer program, when executed by a processor, implements the method for identifying easily confused bacteria based on a chain analysis model according to any one of claims 30 to 36.
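For illustration only, the unique-peak comparison and binary chain classification recited in claims 30 to 36 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `ChainNode`, `has_peak`, `unique_peaks`, `build_chain` and `classify`, the 2.0 m/z tolerance, the intensity threshold, and all species names and peak values are hypothetical placeholders. Noise processing, model training and the model activation rules are omitted because the claims do not specify them in enough detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, relative intensity); hypothetical representation


@dataclass
class ChainNode:
    """One binary link of the chain: is a marker peak present near `mz`?"""
    mz: float                                # marker peak position (placeholder value)
    species: str                             # species reported when the peak is present
    next_node: Optional["ChainNode"] = None  # link tried next when the peak is absent


def has_peak(peaks: List[Peak], mz: float, tol: float = 2.0,
             min_intensity: float = 0.05) -> bool:
    """Return True if any peak lies within `tol` of `mz` and is intense enough."""
    return any(abs(p_mz - mz) <= tol and inten >= min_intensity
               for p_mz, inten in peaks)


def unique_peaks(reference: Dict[str, List[float]],
                 tol: float = 2.0) -> Dict[str, List[float]]:
    """For each species, keep the m/z values that no other species shares."""
    result = {}
    for name, mzs in reference.items():
        others = [m for other, ms in reference.items() if other != name for m in ms]
        result[name] = [m for m in mzs if all(abs(m - o) > tol for o in others)]
    return result


def build_chain(unique: Dict[str, List[float]]) -> Optional[ChainNode]:
    """Link one binary decision per species, using its first unique peak."""
    head = None
    for name in reversed(list(unique)):
        if unique[name]:
            head = ChainNode(mz=unique[name][0], species=name, next_node=head)
    return head


def classify(chain: Optional[ChainNode], peaks: List[Peak]) -> Optional[str]:
    """Walk the chain; return the first species whose marker peak is present."""
    node = chain
    while node is not None:
        if has_peak(peaks, node.mz):
            return node.species
        node = node.next_node
    return None  # no marker peak matched


# Invented placeholder peak lists; these are not measured spectra.
reference = {
    "species_A": [3000.0, 4500.0, 6200.0],
    "species_B": [3000.0, 4500.0, 7100.0],
    "species_C": [3000.0, 5300.0],
}
uniq = unique_peaks(reference)
chain = build_chain(uniq)
sample = [(3000.4, 0.8), (4499.2, 0.5), (7100.9, 1.0)]
result = classify(chain, sample)
print(result)
```

In this sketch each link answers a single yes/no question, so adding another easily confused species means appending one more link rather than rebuilding the whole classifier. A real model would presumably weigh several unique peaks per species and tune the tolerance and thresholds during training.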