Methods and applications for constructing a target proteome atlas

By performing multiple mass spectrometry scans in a data-dependent acquisition mode without dynamic exclusion, a target proteome spectral library was constructed, which solved the problems of insufficient accuracy and sensitivity in trace proteome mass spectrometry analysis and achieved efficient identification of trace proteomes.

CN116844640B · Active · Publication Date: 2026-06-26 · REPRODUCTIVE & GENETIC HOSPITAL OF CITIC XIANGYA CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
REPRODUCTIVE & GENETIC HOSPITAL OF CITIC XIANGYA CO LTD
Filing Date
2023-07-03
Publication Date
2026-06-26


Abstract

The application relates to a method for constructing a target proteome spectral library and its application in trace proteome analysis. The method comprises the following steps: performing multiple mass spectrometry scans on a target proteome in a data-dependent acquisition mode without dynamic exclusion, each over a different scanning range, so as to obtain mass spectrometry data of the target proteome, wherein the mass spectrometry data of the target proteome comprises primary spectra of multiple scanning ranges of the target proteome and secondary spectra of the parent ions in each scanning range that meet a preset threshold; and performing a library search based on the mass spectrometry data of the target proteome to construct the target proteome spectral library. The application constructs a spectral library from mass spectrometry data acquired over multiple scanning ranges in a data-dependent acquisition mode without dynamic exclusion and uses it for library search analysis of DIA data of a trace proteome. This improves the success rate of DIA data matching, which in turn improves the accuracy and sensitivity of DIA data analysis and of trace protein identification.

Description

Technical Field

[0001] This application relates to the field of protein detection technology, and more specifically, to methods and applications for constructing target proteome spectral libraries.

Background Technology

[0002] Data-dependent acquisition (DDA) and data-independent acquisition (DIA) are two common data acquisition modes in non-targeted proteomics based on high-resolution mass spectrometry. In DDA, the mass spectrometer performs a primary scan, followed by a secondary scan of a list of precursor ions (intensity-dependent) selected from the primary scan mass spectrum. Theoretically, DIA can continuously and unbiasedly acquire MS2 information of all primary precursor ions, but due to the complexity of its MS2 mass spectrometry information and the weak correlation between precursor and daughter ions, data processing is difficult.

[0003] Currently, mass spectrometry analysis of trace proteomes mainly adopts the data-dependent acquisition mode. To increase the depth of proteome identification and scan more proteins of low and medium abundance, the dynamic exclusion function is generally used when scanning in DDA mode: once a given peptide precursor ion has been scanned and fragmented, it will not be fragmented or subjected to a secondary mass spectrometry scan again within a specified period of time, even if it is detected again in a primary scan.

[0004] Compared with the randomness of DDA mode during data acquisition, the data-independent acquisition mode can record all information of the sample. Without reducing the number of identifications, it improves the reproducibility and quantitative accuracy of proteomics analysis, and it has been widely used in differential proteomics analysis, especially of large-scale clinical samples.

[0005] Currently, there are few technologies that apply DIA mode to trace proteomics analysis, and most studies on trace sample proteomics use the traditional DDA mode. This is partly due to the limited sensitivity of most current DIA data processing software: DIA spectra are inherently complex, and the signal-to-noise ratio of mass spectra acquired from trace proteomics samples is also very low. For most DIA data analysis software, it may be difficult to extract effective information from extremely complex spectra with low ion intensities. In addition, most current DIA data analysis software relies on chromatographic retention time information when performing ion matching between spectral libraries and DIA data. During ion matching and candidate peptide scoring, it only considers fragment ions within a certain co-elution time window. This not only places high demands on chromatographic reproducibility but also makes short-gradient DIA analysis difficult to implement, because long gradients are generally required during library construction to improve protein coverage depth.

[0006] Therefore, improving the accuracy and sensitivity of mass spectrometry analysis is a challenge for the identification of trace proteomes.

Summary of the Invention

[0007] To address the aforementioned issues and improve the accuracy and sensitivity of trace proteome mass spectrometry analysis, the primary objective of this application is to provide a method for constructing a target proteome spectral library, the method comprising:

[0008] Mass spectrometry data of the target proteome is obtained by performing multiple mass spectrometry scans of the target proteome, each over a different scanning range, in a data-dependent acquisition mode without dynamic exclusion. The mass spectrometry data of the target proteome includes the primary spectra of the target proteome in multiple scanning ranges, and the secondary spectra of the precursor ions in each scanning range whose intensity meets the preset threshold.

[0009] A target proteome spectral library is constructed by performing a library search on the mass spectrometry data of the target proteome.

[0010] This application constructs a spectral library using mass spectrometry data acquired in a data-dependent acquisition mode without dynamic exclusion, thereby increasing the number of proteins, peptides, and spectra in the library. By combining this library with DIA data from trace proteomics mass spectrometry for search analysis, the success rate of DIA data matching is improved, which in turn improves the accuracy and sensitivity of DIA data analysis, and consequently, the accuracy and sensitivity of trace protein identification.

[0011] In one embodiment, the mass spectrometry data of the target proteome satisfy at least one of the following conditions (1) to (4):

[0012] (1) Multiple scanning ranges include several consecutive scanning ranges within the range of 400 m/z to 1250 m/z;

[0013] (2) The acquisition time for the secondary spectrum is 30 ms to 45 ms;

[0014] (3) The number of scans for the secondary spectrum is 60 to 100;

[0015] (4) The preset threshold for the parent ion intensity of the secondary spectrum scan is 10 cps to 29 cps;

[0016] Optionally, multiple scanning ranges include 400 m/z to 500 m/z, 500 m/z to 600 m/z, 600 m/z to 700 m/z, 700 m/z to 800 m/z, 900 m/z to 1000 m/z, 1000 m/z to 1100 m/z, and 1100 m/z to 1250 m/z.

[0017] In one embodiment, the mass spectrometry data of the target proteome satisfy at least one of the following conditions (1) to (3):

[0018] (1) The target proteome is the proteome of the target organism;

[0019] (2) The loading amount of the target proteome is 10 ng to 2000 ng;

[0020] (3) The chromatographic gradient duration for the target proteome is 90–120 minutes;

[0021] Optionally, the target organism may be selected from any one of animals, plants, and microorganisms;

[0022] Optionally, the target organism is selected from cattle, horses, pigs, sheep, goats, rats, mice, dogs, cats, rabbits, camels, donkeys, deer, minks, chickens, ducks, geese, or humans.

[0023] In one embodiment, the parameters for the library search are: using Trypsin digestion, allowing a maximum of two missed cleavage sites, setting methionine oxidation and protein N-terminal acetylation as variable modifications, and setting cysteine alkylation as a fixed modification.

[0024] The second objective of this application is to provide a target proteome spectral library constructed according to the above method.

[0025] The third objective of this application is to provide the application of the aforementioned target proteome spectral library in trace proteome detection.

[0026] In one embodiment, the trace proteome satisfies at least one of the following conditions (1) to (3):

[0027] (1) The mass of the trace proteome is 0.2 ng to 100 ng;

[0028] (2) The trace proteome and the target proteome are derived from the same organism;

[0029] (3) The trace proteome is a single-cell proteome.

[0030] In one embodiment, the application includes the following steps:

[0031] Mass spectrometry data of the trace proteome is acquired in a data-independent acquisition mode, and the mass spectrometry data of the trace proteome is searched against the target proteome spectral library.

[0032] In one embodiment, the data-independent acquisition mode satisfies at least one of the following conditions (1) to (4):

[0033] (1) The data-independent acquisition mode is either the fixed-window data-independent acquisition mode or the variable-window data-independent acquisition mode;

[0034] (2) The scanning range of the data-independent acquisition mode includes: 400 m/z to 1250 m/z;

[0035] (3) The acquisition time for the primary spectrum of the trace proteome is 200 ms to 300 ms;

[0036] (4) The acquisition time for the secondary spectrum of the trace proteome is 65 ms to 100 ms.

[0037] In one embodiment, the chromatographic gradient duration for the trace proteome is 1–240 minutes.

[0038] In one embodiment, the parameters for searching the mass spectrometry data of the trace proteome are: a parent ion mass tolerance of ±25 Da and a secondary fragment ion mass tolerance of ±50 ppm.

Attached Figure Description

[0039] To more clearly illustrate the technical solutions in the specific embodiments of this application or the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0040] Figure 1 is a flowchart of the method for constructing a target proteome spectral library provided in an embodiment of this application;

[0041] Figure 2 shows the results of analyzing the same DIA data against libraries built in the data-dependent acquisition mode with dynamic exclusion and in the data-dependent acquisition mode without dynamic exclusion in Embodiment 2 of this application. In Figure 2, A shows the DIA data analysis results of a 10 ng sample (n=3), and B shows the DIA data analysis results of a 100 ng sample (n=3);

[0042] Figure 3 shows the protein identification results of 100 ng and 10 ng trace samples under the data-dependent acquisition mode and the data-independent acquisition mode provided in Embodiment 3 of this application. In Figure 3, A shows the number of proteins identified in a 10 ng sample (n=3), and B shows the number of proteins identified in a 100 ng sample (n=3);

[0043] Figure 4 shows the protein identification results of 10 ng trace samples under different chromatographic gradient durations in the data-dependent acquisition mode and the data-independent acquisition mode provided in Embodiment 4 of this application.

Detailed Implementation

[0044] Reference will now be made to detailed embodiments of this application, one or more of which are described below. Each example is provided for explanation and not for limitation of this application. In fact, it will be apparent to those skilled in the art that various modifications and variations can be made to this application without departing from its scope or spirit. For example, features described or illustrated as part of one embodiment may be used in another embodiment to produce further embodiments.

[0045] Therefore, this application is intended to cover such modifications and variations falling within the scope of the appended claims and their equivalents. Other objects, features, and aspects of this application are disclosed in or will be apparent from the following detailed description. It will be understood by those skilled in the art that this discussion is merely a description of exemplary embodiments and is not intended to limit the broader aspects of this application.

[0046] As mentioned above, improving the accuracy and sensitivity of mass spectrometry analysis is a challenge for the identification of trace proteomes. Furthermore, the inventors discovered that when the mass spectrometry sample amount is reduced to the single-cell level, the composition and intensity of ions in the spectrum change significantly compared to conventional sample amounts. Whether a spectral library is generated in DDA mode or predicted by machine learning, it uses or learns from spectra generated from conventional sample amounts. When protein identification and quantification results are extracted from single-cell DIA analysis based on such spectral libraries, even for the same peptide, the inconsistency in the composition and intensity of fragment ions between the spectral library and the single-cell DIA spectrum can cause the identification and quantification information to be missed or extracted incorrectly.

[0047] To address at least one of the aforementioned technical problems, the first aspect of this application provides a method for constructing a target proteome spectral library, the flowchart of which is shown in Figure 1. The method includes:

[0048] S10: Mass spectrometry scans of the target proteome are performed multiple times in a data-dependent acquisition mode without dynamic exclusion, each over a different scan range, to obtain mass spectrometry data of the target proteome. The mass spectrometry data of the target proteome includes primary spectra of multiple scan ranges of the target proteome, and secondary spectra of the precursor ions in each scan range whose intensity meets a preset threshold.

[0049] S20: Perform a library search based on the mass spectrometry data of the target proteome to construct a target proteome spectral library.

[0050] In this application, the term "target proteome" refers to the proteome derived from a target organism, including all proteins that the target organism can express. It can be the proteome of the target organism at a specific point in time (such as in embryonic and mature organisms), or the proteome of a specific cell type or tissue within the target organism. Depending on the type of target organism, the target proteome can be an animal proteome, a plant proteome, or a microbial proteome.

[0051] For the animal proteome, depending on the animal species, the target organism can be selected from cattle, horses, pigs, sheep, goats, rats, mice, dogs, cats, rabbits, camels, donkeys, deer, minks, chickens, ducks, geese, or humans.

[0052] Specifically, when collecting mass spectrometry data, the enzymatically digested peptides of the target proteome are used as samples for mass spectrometry detection to obtain mass spectrometry data of the target proteome. For example, the enzymatically digested peptides can be Trypsin-digested peptides or Lys-C-digested peptides. There are no restrictions on how the enzymatically digested peptides of the target proteome are obtained; they can be prepared in-house or purchased commercially.

[0053] It should be noted that in order to increase the number of proteins, peptides and spectra in the spectral library, traditional strategies use chromatographic methods such as high pH reverse phase fractionation to pre-fractionate samples. However, commercially available chromatographic pre-fractionation strategies require sample amounts in the μg to mg range, and are therefore not suitable for pre-fractionation of ng-level samples.

[0054] Specifically, in the spectral library construction strategy of this application, the target proteome is fractionated by mass spectrometry gas phase fractionation, that is, the target proteome is loaded multiple times, with different scanning ranges for each loading, and data is acquired in a data-dependent acquisition mode without dynamic exclusion each time, so as to acquire as many secondary spectra as possible for each peptide.

[0055] Among them, the data-dependent acquisition mode without dynamic exclusion refers to the process of not dynamically excluding the precursor ions in the primary scanning range during mass spectrometry scanning, allowing the precursor ions that fall within the scanning range and whose intensity meets the preset threshold to undergo secondary fragmentation, so as to acquire as many secondary spectra as possible for each peptide segment.

[0056] It should be noted that in traditional data-dependent acquisition modes, the mass spectrometer performs a primary scan, followed by a secondary scan of the precursor ion (intensity-dependent) list selected from the primary scan mass spectrum. In contrast, the data-dependent acquisition mode without dynamic exclusion does not dynamically exclude precursor ions from the secondary fragmentation process, thus acquiring as many secondary spectra as possible for each peptide segment.
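The difference between the two selection rules can be illustrated with a toy simulation. The following is a minimal sketch, not the instrument's actual acquisition logic: the function name `select_precursors`, the `top_n` cap, the exact-match m/z keys, the 15-second exclusion duration, and the example intensities are all assumptions introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

# One primary (MS1) scan: (time in seconds, {precursor m/z: intensity in cps}).
Survey = Tuple[float, Dict[float, float]]


def select_precursors(
    surveys: List[Survey],
    top_n: int,
    threshold: float,
    exclusion_s: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Return the (time, m/z) pairs that would be sent for a secondary scan.

    exclusion_s=None models acquisition without dynamic exclusion.
    A number models excluding a fragmented precursor for that many seconds.
    Simplification: precursors are matched by identical m/z key, whereas a
    real instrument would match within an m/z tolerance.
    """
    last_fragmented: Dict[float, float] = {}
    selected: List[Tuple[float, float]] = []
    for time_s, peaks in surveys:
        # Precursors at or above the intensity threshold, strongest first.
        candidates = sorted(
            (mz for mz, intensity in peaks.items() if intensity >= threshold),
            key=lambda mz: peaks[mz],
            reverse=True,
        )
        picked = 0
        for mz in candidates:
            if picked == top_n:
                break
            if exclusion_s is not None:
                previous = last_fragmented.get(mz)
                if previous is not None and time_s - previous < exclusion_s:
                    continue  # still on the exclusion list, skip it
            selected.append((time_s, mz))
            last_fragmented[mz] = time_s
            picked += 1
    return selected


# Three consecutive primary scans that see the same peaks (hypothetical values).
peaks = {450.2: 500.0, 480.7: 50.0, 610.3: 5.0}
surveys = [(0.0, peaks), (1.0, peaks), (2.0, peaks)]

without_de = select_precursors(surveys, top_n=2, threshold=10.0)
with_de = select_precursors(surveys, top_n=2, threshold=10.0, exclusion_s=15.0)
```

In this toy input, the run without dynamic exclusion fragments each qualifying precursor in every cycle (six secondary scans), while the run with exclusion fragments each one only once (two secondary scans), which is the behavior the paragraphs above describe.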

[0057] In some implementations, when mass spectrometry gas-phase fractionation is used to fractionate trace protein samples, the target proteome is scanned by mass spectrometry in several consecutive scan ranges within the range of 400 m/z to 1250 m/z. Each scan uses a data-dependent acquisition mode without dynamic exclusion to obtain the primary spectrum of one of the consecutive scan ranges within the range of 400 m/z to 1250 m/z, as well as the secondary spectra of the precursor ions in that scan range whose intensity meets a preset threshold.

[0058] In some implementations, the multiple scanning ranges include 400 m/z to 500 m/z, 500 m/z to 600 m/z, 600 m/z to 700 m/z, 700 m/z to 800 m/z, 900 m/z to 1000 m/z, 1000 m/z to 1100 m/z, and 1100 m/z to 1250 m/z. The scanning range for each scan is selected from any one of these multiple scanning ranges.

[0059] In some specific implementations, the scanning range for the first mass spectrometry gas-phase fractionation is 400 m/z to 500 m/z; the scanning range for the second is 500 m/z to 600 m/z; the scanning range for the third is 600 m/z to 700 m/z; the scanning range for the fourth is 700 m/z to 800 m/z; the scanning range for the fifth is 800 m/z to 900 m/z; the scanning range for the sixth is 900 m/z to 1000 m/z; the scanning range for the seventh is 1000 m/z to 1100 m/z; and the scanning range for the eighth is 1100 m/z to 1250 m/z.
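The eight-fraction scheme of paragraph [0059] can be written down as a small lookup. This is an illustrative sketch only: the names `GPF_RANGES`, `fraction_for`, and `is_contiguous` are hypothetical, and the boundary convention (lower bound inclusive, upper bound exclusive except for the last range) is an assumption that the source does not specify.

```python
from typing import List, Optional, Tuple

# The eight gas-phase fractionation scan ranges of paragraph [0059], in m/z.
GPF_RANGES: List[Tuple[float, float]] = [
    (400, 500), (500, 600), (600, 700), (700, 800),
    (800, 900), (900, 1000), (1000, 1100), (1100, 1250),
]


def fraction_for(mz: float) -> Optional[int]:
    """Return the 1-based fraction whose scan range contains mz, else None.

    Assumed convention: lower bound inclusive, upper bound exclusive,
    except that the last range also includes its upper bound.
    """
    last = len(GPF_RANGES)
    for index, (low, high) in enumerate(GPF_RANGES, start=1):
        if low <= mz < high or (index == last and mz == high):
            return index
    return None


def is_contiguous(ranges: List[Tuple[float, float]]) -> bool:
    """Check that each range starts where the previous one ends."""
    return all(a[1] == b[0] for a, b in zip(ranges, ranges[1:]))
```

For example, `fraction_for(455.3)` returns 1 and `fraction_for(1250)` returns 8, while a precursor at 399.9 m/z falls outside every range and yields `None`.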

[0060] In some implementations, to perform mass spectrometry scanning in the data-dependent acquisition mode without dynamic exclusion, and to increase the number of proteins, peptides, and spectra in the spectral library and the matching efficiency of subsequent data-independent acquisition, the following settings are used for each scanning range: the acquisition time of each secondary spectrum is 30 ms to 45 ms; the number of secondary spectrum scans is 60 to 100; and the preset threshold for the parent ion intensity of the secondary spectrum scan is 10 cps to 29 cps. These settings acquire as many secondary spectra as possible for each peptide, thereby improving the matching rate between subsequent trace proteome data-independent acquisition data and the spectral library.

[0061] In some specific implementations, the secondary spectrum acquisition time for each scanning range is set to 30 ms, 35 ms, 40 ms or 45 ms, the number of scans is set to 60, 70, 80, 90 or 100, and the threshold for the intensity of the acquired precursor ion is set to 10 cps, 15 cps, 20 cps, 25 cps or 29 cps, so as to acquire as many secondary spectra as possible for each peptide.

[0062] This application constructs a spectral library using mass spectrometry data in a data-dependent acquisition mode without dynamic exclusion. It acquires as many secondary spectra as possible for each peptide segment, thereby improving the matching rate between the target proteome spectral library and subsequent trace proteome mass spectrometry data, and thus improving the accuracy of trace proteome identification.

[0063] In some implementations, the mass of the target proteome for each scan during library construction is 10 ng to 2000 ng. To achieve consistency with the peak intensity of the trace proteome mass spectrometry data and improve the success rate of matching the trace proteome with the spectral library, the sample loading amount of the target proteome for each scan can be further narrowed to 10 ng to 200 ng, and even further to 10 ng to 50 ng. Specifically, the trace sample is fractionated by mass spectrometry gas-phase fractionation, in which the target proteome is divided into several aliquots that are scanned over different ranges to obtain mass spectrometry data for multiple scan ranges of the proteome.

[0064] In some implementations, to improve the coverage of the spectral library, the chromatographic gradient duration for the target proteome is 90–120 minutes. In some specific implementations, the chromatographic gradient duration for the target proteome is 90 minutes, 95 minutes, 100 minutes, 105 minutes, 110 minutes, 115 minutes, or 120 minutes per scan.

[0065] In some implementations, the library search in the above construction method builds the target proteome spectral library from the mass spectrometry data of the target proteome and the search results against known libraries, with the search parameters corresponding to the enzymatic digestion method used for the target proteome.

[0066] In some specific implementations, when loading the target proteome, the peptides are digested with Trypsin. The parameters for searching the library are: use Trypsin digestion, allow a maximum of two missed cleavage sites, set methionine oxidation and protein N-terminal acetylation as variable modifications, and set cysteine alkylation as a fixed modification.
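To make the "maximum of two missed cleavage sites" setting concrete, the following sketch enumerates the peptides that a search engine would consider for one protein. It is an illustration under stated assumptions, not the search software's implementation: the function name `tryptic_peptides` and the example sequence are hypothetical, the cleavage rule used (after K or R, but not before P) is the commonly used Trypsin rule rather than something stated in the source, and modifications are not modeled.

```python
from typing import List


def tryptic_peptides(protein: str, max_missed: int = 2) -> List[str]:
    """Enumerate tryptic peptides with up to max_missed missed cleavages.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    # Step 1: cut the sequence into fully cleaved pieces.
    pieces: List[str] = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece without K/R

    # Step 2: join 1 to (max_missed + 1) adjacent pieces.
    peptides: List[str] = []
    for first in range(len(pieces)):
        for missed in range(max_missed + 1):
            end = first + missed + 1
            if end > len(pieces):
                break
            peptides.append("".join(pieces[first:end]))
    return peptides


# Hypothetical sequence: the R before P is not a cleavage site under this rule.
example = tryptic_peptides("MKAARPGKTLR", max_missed=2)
```

With this input the fully cleaved pieces are `MK`, `AARPGK`, and `TLR`, and allowing two missed cleavages adds the joined peptides `MKAARPGK`, `MKAARPGKTLR`, and `AARPGKTLR`.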

[0067] Accordingly, the second aspect of this application provides a target proteome spectral library constructed according to the above method to improve the matching rate of subsequent trace proteome mass spectrometry data.

[0068] Accordingly, a third aspect of this application provides the application of the aforementioned target proteome spectral library in the detection of trace proteomes.

[0069] In this application, the term trace proteome refers to a proteome with a mass at or below the ng level. Specifically, the trace proteome can be derived from cells or tissues of the target organism; for example, cells can be cancer cells or immune cells, and tissues can be liver tissue or blood tissue, etc.

[0070] In some specific implementations, the trace proteome can be a single-cell proteome.

[0071] Specifically, when collecting mass spectrometry data, enzymatically digested peptides of trace proteome are used as samples for mass spectrometry detection to obtain mass spectrometry data of trace proteome. There are no restrictions on the acquisition of enzymatically digested peptides of trace proteome; they can be self-made enzymatically digested peptides of trace proteome or obtained through commercial means.

[0072] In some implementation schemes, in order to achieve the detection of trace proteomes, the trace proteome and the target proteome are derived from the same organism.

[0073] In some implementation schemes, the mass of the trace proteome is 0.2 ng to 100 ng, specifically, it can be 0.2 ng, 10 ng, 25 ng, 50 ng, 80 ng or 100 ng.

[0074] The target proteome spectral library constructed in this application remains consistent with the composition and intensity of fragment ions in the low-intensity mass spectrometry data of trace proteomes, thereby improving the matching rate of trace proteome data-independent acquisition data, and thus improving the accuracy and sensitivity of trace proteome identification.

[0075] In some implementation schemes, the application includes the following steps:

[0076] Mass spectrometry data of trace proteomes acquired through a data-independent acquisition mode were obtained, and the mass spectrometry data of trace proteomes were searched using a target proteome spectral library.

[0077] In some implementations, the data-independent acquisition mode can be either a fixed-window or variable-window mode. A fixed-window mode refers to an acquisition mode with a fixed window length, while a variable-window mode refers to an acquisition mode where the window length can be freely set. In some specific implementations, the acquisition parameters for the data-independent acquisition mode include: a scan range of 400 m/z to 1250 m/z; an acquisition time for the primary spectrum ranging from 200 ms to 300 ms, specifically 200 ms, 220 ms, 250 ms, 280 ms, or 300 ms; and an acquisition time for the secondary spectrum ranging from 65 ms to 100 ms, specifically 65 ms, 75 ms, 85 ms, 95 ms, or 100 ms, to acquire trace proteome data in a data-independent manner.

[0078] This application constructs a spectral library using mass spectrometry data acquired without dynamic exclusion, increasing the number of proteins, peptides, and spectra in the library. By using this library for library search analysis of the data-independent acquisition data of trace proteome samples, the application improves the success rate of matching the data-independent acquisition data, thereby improving the accuracy of data analysis and the accuracy of trace proteome identification.
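As a rough illustration of how the fixed-window mode and the acquisition times interact, the sketch below tiles the 400 m/z to 1250 m/z range into equal windows and estimates the duration of one acquisition cycle. The 25 m/z window width is purely an assumption chosen for the example (the source does not state a window width), the function names are hypothetical, and instrument overhead between scans is ignored.

```python
from typing import List, Tuple


def fixed_windows(start: float, stop: float, width: float) -> List[Tuple[float, float]]:
    """Tile [start, stop] with windows of equal width (last one may be shorter)."""
    windows: List[Tuple[float, float]] = []
    low = start
    while low < stop:
        high = min(low + width, stop)
        windows.append((low, high))
        low = high
    return windows


def cycle_time_ms(n_windows: int, ms1_ms: float, ms2_ms: float) -> float:
    """One primary scan followed by one secondary scan per window."""
    return ms1_ms + n_windows * ms2_ms


# Assumed 25 m/z windows; 250 ms and 65 ms are values taken from the ranges above.
windows = fixed_windows(400, 1250, 25)
cycle = cycle_time_ms(len(windows), ms1_ms=250, ms2_ms=65)
```

Under these assumptions the range is covered by 34 windows and one cycle takes 250 + 34 × 65 = 2460 ms. A variable-window scheme would replace `fixed_windows` with a user-supplied list of window boundaries.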

[0079] Understandably, in order to improve the coverage of the spectral library, the construction of the target proteome spectral library generally uses pre-fractionation and a longer chromatographic duration. If the chromatographic duration of the data-independent acquisition data needs to be consistent with that in the spectral library, it will limit the acquisition efficiency of the data-independent acquisition data.

[0080] In some specific implementations, this application avoids the above-mentioned problem by using MSPLIT-DIA software, which allows the data-independent acquisition data to be acquired over a shorter chromatographic duration than that used for library construction, thereby increasing the number of samples analyzed in the same amount of time.

[0081] In some implementations, the chromatographic gradient duration for the trace proteome ranges from 1 to 240 minutes.

[0082] In some specific implementations, the chromatographic gradient duration for the trace proteome is 1 minute, 50 minutes, 100 minutes, 150 minutes, 200 minutes, or 240 minutes.

[0083] The implementation scheme of this application will now be described in detail with reference to the embodiments.

[0084] Example 1

[0085] In this embodiment, spectral libraries were constructed from samples fractionated by mass spectrometry gas-phase fractionation, using both a data-dependent acquisition mode without dynamic exclusion and a data-dependent acquisition mode with dynamic exclusion.

[0086] The library construction process using the data-dependent acquisition mode without dynamic exclusion includes:

[0087] Trace samples were fractionated by mass spectrometry gas-phase fractionation: 200 ng of K562 cell enzymatically digested peptides (purchased from Promega) was divided into 8 aliquots of 25 ng each, and each aliquot was scanned over a different range. The scanning range was 400 m/z to 500 m/z for the first fraction; 500 m/z to 600 m/z for the second; 600 m/z to 700 m/z for the third; 700 m/z to 800 m/z for the fourth; 800 m/z to 900 m/z for the fifth; 900 m/z to 1000 m/z for the sixth; 1000 m/z to 1100 m/z for the seventh; and 1100 m/z to 1250 m/z for the eighth. Mass spectrometry data were acquired in the data-dependent acquisition mode without dynamic exclusion. The acquisition parameters for this mode included: a secondary spectrum acquisition time of 30 ms, 100 secondary spectrum scans, a precursor ion intensity threshold of 10 cps, and a chromatographic gradient duration of 90 minutes per loading. Mobile phase A consisted of 99.9% water and 0.1% formic acid; mobile phase B consisted of 80% acetonitrile, 19.9% water, and 0.1% formic acid. The chromatographic gradient was as follows: 0–5 minutes, 0% B to 5% B; 5.1–60 minutes, 5% B to 28% B; 60.1–70 minutes, 28% B to 80% B; and 70.1–90 minutes, 80% B. After collecting the mass spectrometry data, MS-GF+ software was used to merge and search all the mass spectrometry data to construct a spectral library. The search parameters were as follows: Trypsin digestion was used, a maximum of two missed cleavage sites were allowed, methionine oxidation and protein N-terminal acetylation were set as variable modifications, and cysteine alkylation was set as a fixed modification.
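The gradient program in paragraph [0087] can be read as a piecewise-linear function of time. The sketch below encodes it that way for illustration; the linear interpolation within each segment, the rounding of the 5.1, 60.1, and 70.1 minute start times to the preceding breakpoints, and the names `GRADIENT` and `percent_b` are assumptions rather than details given in the source.

```python
from bisect import bisect_right
from typing import List, Tuple

# (time in minutes, % mobile phase B) breakpoints of the 90-minute gradient.
GRADIENT: List[Tuple[float, float]] = [
    (0.0, 0.0), (5.0, 5.0), (60.0, 28.0), (70.0, 80.0), (90.0, 80.0),
]


def percent_b(t_min: float) -> float:
    """Return % B at time t_min, interpolating linearly between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient program")
    times = [t for t, _ in GRADIENT]
    # Index of the breakpoint that closes the segment containing t_min.
    i = min(bisect_right(times, t_min), len(GRADIENT) - 1)
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

For instance, `percent_b(32.5)` lies halfway through the 5% to 28% segment and evaluates to 16.5, and any time between 70 and 90 minutes returns the 80% hold.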

[0088] The library construction process using the data-dependent acquisition mode with dynamic exclusion includes:

[0089] The acquisition parameters for library construction in the data-dependent acquisition mode with dynamic exclusion include: the secondary spectrum accumulation time is set to 100 ms, the number of secondary spectrum scans is set to 10, the parent ion intensity threshold for secondary spectrum acquisition is set to 100 cps, and the remaining parameters are the same as those for the data-dependent acquisition mode without dynamic exclusion.

[0090] The numbers of proteins, peptides and spectra in the libraries constructed using the data-dependent acquisition mode without dynamic exclusion and the data-dependent acquisition mode with dynamic exclusion are compared in Table 1.

[0091] Table 1

[0092]

[0093] Experimental results show that using a data-dependent acquisition mode without dynamic exclusion, combined with mass spectrometry gas-phase fractionation, more than 120,000 spectra, corresponding to nearly 25,000 peptides, can be identified from a total of 200 ng of trace proteome sample. Compared with the traditional method of library construction using a data-dependent acquisition mode with dynamic exclusion, under the same gas-phase fractionation conditions, the number of spectra increased by 310%, the number of peptides increased by 105%, and the number of proteins increased by 69%; the average number of spectra per peptide increased from 2.4 to 4.8, which is beneficial for subsequent DIA data extraction.
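The figures in paragraph [0093] are mutually consistent, which a short calculation makes explicit. The sketch below uses only the rounded numbers quoted in the text (so the results are approximate), and the helper name `growth_factor` is hypothetical.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplicative factor."""
    return 1.0 + percent_increase / 100.0


# Spectra grew by 310% (factor 4.10) and peptides by 105% (factor 2.05),
# so spectra per peptide should grow by the ratio of the two factors.
ratio = growth_factor(310) / growth_factor(105)

# Starting from 2.4 spectra per peptide with dynamic exclusion.
spectra_per_peptide = 2.4 * ratio

# Cross-check against the quoted library size (about 120,000 / 25,000).
cross_check = 120_000 / 25_000
```

Both routes give about 4.8 spectra per peptide: the ratio of the growth factors is 2.0, and 120,000 spectra over 25,000 peptides is also 4.8.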

[0094] Example 2

[0095] In this embodiment, the data library was constructed using both the data-dependent acquisition mode without dynamic exclusion and the data-dependent acquisition mode with dynamic exclusion from Example 1. Then, trace proteomic data of 10 ng and 100 ng were acquired using the DIA mode. The DIA scan range was 400 m / z to 1250 m / z; the acquisition time for the primary spectrum was 250 ms, and the acquisition time for the secondary spectrum was 65 ms; the chromatographic gradient duration was 90 minutes; the chromatographic phase A consisted of 99.9% water + 0.1% formic acid; the chromatographic phase B consisted of 80% acetonitrile + 19.9% ​​water + 0.1% formic acid; the chromatographic gradient was as follows: 0–5 minutes for 0% B to 5% B; 5.1–60 minutes for 5% B to 28% B; 60.1–70 minutes for 28% B to 80% B; and 70.1–90 minutes for 80% B. After acquiring the DIA data, the MSPLIT-DIA software was used to import the previously established spectral library results and the acquired DIA data for library search analysis. The search parameters for MSPLIT-DIA were: parent ion mass tolerance of ±25 Da and secondary fragment ion mass tolerance of ±50 ppm. The same DIA data was searched using different models of libraries constructed within the same dataset using MSPLIT-DIA software. The results are as follows. Figure 2 As shown.

[0096] Experimental results show that, compared with library construction using the data-dependent acquisition mode with dynamic exclusion, library construction using the data-dependent acquisition mode without dynamic exclusion increased the number of proteins identified in the DIA data from ~1163 to ~1402 for the 10 ng sample and from ~2208 to ~2630 for the 100 ng sample. In other words, the number of proteins identified in the 10 ng and 100 ng trace samples increased by 21% (1402±50 vs. 1163±5) and 19% (2630±22 vs. 2208±8), respectively. This indicates that mass spectrometry analysis combining the data-dependent acquisition mode without dynamic exclusion with the data-independent acquisition mode has an advantage in the depth of identification of trace target proteomes over traditional analysis combining the data-dependent acquisition mode with dynamic exclusion with the data-independent acquisition mode.

[0097] Example 3

[0098] In this embodiment, the library was built using the data-dependent acquisition mode without dynamic exclusion from Example 1. Mass spectrometry data of 10 ng and 100 ng trace proteome samples were then acquired in both the data-dependent acquisition mode and the data-independent acquisition mode. The DIA scan range was 400 m/z to 1250 m/z; the acquisition time was 250 ms for the primary spectrum and 65 ms for the secondary spectrum; the chromatographic gradient duration was 90 minutes. Chromatographic phase A was 99.9% water + 0.1% formic acid; chromatographic phase B was 80% acetonitrile + 19.9% water + 0.1% formic acid; the chromatographic gradient was: 0–5 minutes, 0% B to 5% B; 5.1–60 minutes, 5% B to 28% B; 60.1–70 minutes, 28% B to 80% B; 70.1–90 minutes, 80% B. After the DIA data were acquired, MSPLIT-DIA software was used to import the spectral library established above and the acquired DIA data for DIA data search analysis. The search parameters for MSPLIT-DIA were: parent ion mass tolerance of ±25 Da and secondary fragment ion mass tolerance of ±50 ppm.
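The two search tolerances are expressed in different units: the parent ion tolerance is an absolute window in Da, while the fragment ion tolerance is relative (ppm) and therefore scales with m/z. The Python sketch below only illustrates how such tolerances are commonly evaluated; it does not reproduce the internals of MSPLIT-DIA, and the function names are assumptions.

```python
def within_da(observed_mz, reference_mz, tol_da=25.0):
    """Absolute tolerance check, as used here for the parent ion (±25 Da)."""
    return abs(observed_mz - reference_mz) <= tol_da


def within_ppm(observed_mz, reference_mz, tol_ppm=50.0):
    """Relative tolerance check, as used here for fragment ions (±50 ppm)."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6 <= tol_ppm


def ppm_window(reference_mz, tol_ppm=50.0):
    """Return the (low, high) m/z bounds implied by a ppm tolerance."""
    delta = reference_mz * tol_ppm * 1e-6
    return reference_mz - delta, reference_mz + delta


# A 50 ppm window is narrower at low m/z than at high m/z.
for mz in (400.0, 1000.0, 1250.0):
    low, high = ppm_window(mz)
    print(f"m/z {mz}: {low:.4f} to {high:.4f}")
```

At m/z 1000, for example, ±50 ppm corresponds to ±0.05 m/z, whereas the ±25 Da parent ion tolerance is wide enough to span a typical DIA isolation window.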

[0099] The parameters for acquiring mass spectrometry data in the data-dependent acquisition mode were: a secondary spectrum acquisition time of 100 ms, 10 secondary spectrum scans, and a precursor ion intensity threshold of 100 cps; a chromatographic gradient duration of 90 minutes; chromatographic phase A consisting of 99.9% water and 0.1% formic acid; chromatographic phase B consisting of 80% acetonitrile, 19.9% water, and 0.1% formic acid; and a chromatographic gradient of: 0–5 minutes, 0% B to 5% B; 5.1–60 minutes, 5% B to 28% B; 60.1–70 minutes, 28% B to 80% B; and 70.1–90 minutes, 80% B.

[0100] MSPLIT-DIA software was used to perform library search analysis on the mass spectrometry data acquired in the data-independent acquisition mode, using the spectral library built in the data-dependent acquisition mode without dynamic exclusion. In parallel, Mascot and Comet software were used to perform database search analysis on the mass spectrometry data acquired in the data-dependent acquisition mode. The database used was the human protein database downloaded from the UniProt website, containing 72,481 protein sequences. After the search was completed, Trans-Proteomic Pipeline software was used to integrate the search results from Mascot and Comet. The final comparison of the search results is shown in Figure 3.

[0101] Experimental results show that, when data-independent acquisition mass spectrometry data of trace proteomes were analyzed with the spectral library constructed in the data-dependent acquisition mode without dynamic exclusion, more than 2,600 proteins were identified from a 100 ng trace sample and ~1,400 proteins from a 10 ng trace sample, whereas analysis of the same sample amounts using traditional data-dependent acquisition mass spectrometry data identified only ~1,600 and ~840 proteins, respectively.

[0102] In other words, for trace proteome samples of 10 ng and 100 ng, analysis of data-independent acquisition mass spectrometry data with the spectral library constructed in the data-dependent acquisition mode without dynamic exclusion increased the number of proteins identified by 68% (1402±50 vs. 836±4) and 65% (2630±22 vs. 1597±6), respectively, compared with traditional data-dependent acquisition mass spectrometry methods. This indicates that the mass spectrometry analysis method combining the data-dependent acquisition mode without dynamic exclusion with the data-independent acquisition mode has an advantage in the depth of identification of trace target proteomes over traditional data-dependent acquisition mass spectrometry methods.
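The percentage increases stated in [0096] and [0102] follow directly from the reported mean protein counts. The Python sketch below is an illustrative recomputation only; the dictionary layout and the function name `percent_increase` are assumptions, and the reported standard deviations are not propagated.

```python
def percent_increase(new, old):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100


# Mean protein counts as reported: (new method, comparison method).
comparisons = {
    "[0096] 10 ng, library without vs. with dynamic exclusion": (1402, 1163),
    "[0096] 100 ng, library without vs. with dynamic exclusion": (2630, 2208),
    "[0102] 10 ng, DIA with library vs. traditional DDA": (1402, 836),
    "[0102] 100 ng, DIA with library vs. traditional DDA": (2630, 1597),
}

results = {name: round(percent_increase(new, old))
           for name, (new, old) in comparisons.items()}

for name, pct in results.items():
    print(f"{name}: +{pct}%")
```

Rounded to whole percentages, the four comparisons give 21%, 19%, 68%, and 65%, matching the values stated in the text.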

[0103] Example 4

[0104] In this embodiment, the library was built using the data-dependent acquisition mode without dynamic exclusion as described in Example 1. Mass spectrometry data of a 10 ng trace proteome sample were then acquired in the data-independent acquisition mode at chromatographic gradient durations of 30 min and 90 min, respectively. The DIA scan range was 400 m/z to 1250 m/z; the acquisition time was 250 ms for the primary spectrum and 65 ms for the secondary spectrum; chromatographic phase A was 99.9% water + 0.1% formic acid; chromatographic phase B was 80% acetonitrile + 19.9% water + 0.1% formic acid; the 90-minute chromatographic gradient was: 0–5 minutes, 0% B to 5% B; 5.1–60 minutes, 5% B to 28% B; 60.1–70 minutes, 28% B to 80% B; 70.1–90 minutes, 80% B; the 30-minute chromatographic gradient was: 0–2 minutes, 0% B to 5% B; 2.1–20 minutes, 5% B to 28% B; 20.1–25 minutes, 28% B to 80% B; 25.1–30 minutes, 80% B. After the DIA data were acquired, MSPLIT-DIA software was used to import the spectral library established above and the acquired DIA data for DIA data search analysis. The search parameters for MSPLIT-DIA were: parent ion mass tolerance of ±25 Da and secondary fragment ion mass tolerance of ±50 ppm.

[0105] Simultaneously, mass spectrometry data of a 10 ng trace proteome sample were acquired using data-dependent acquisition mode at a 90-min chromatographic gradient. The secondary spectrum acquisition time in data-dependent acquisition mode was set to 100 ms, the number of secondary spectrum scans was set to 10, and the precursor ion intensity threshold for secondary spectrum acquisition was set to 100 cps. Chromatographic phase A consisted of 99.9% water + 0.1% formic acid; chromatographic phase B consisted of 80% acetonitrile + 19.9% water + 0.1% formic acid. The 90-min chromatographic gradient was as follows: 0–5 min, 0% B to 5% B; 5.1–60 min, 5% B to 28% B; 60.1–70 min, 28% B to 80% B; 70.1–90 min, 80% B.

[0106] The mass spectrometry data of the 10 ng trace proteome samples acquired in the data-independent acquisition mode were analyzed using MSPLIT-DIA software. In parallel, the mass spectrometry data of the 10 ng trace proteome samples acquired in the data-dependent acquisition mode were analyzed using Mascot and Comet software. The database used was the human protein database downloaded from the UniProt website, containing 72,481 protein sequences. After the search was completed, the results from Mascot and Comet were integrated using Trans-Proteomic Pipeline software. The results are shown in Figure 4.

[0107] Experimental results show that, with the spectral library constructed in the data-dependent acquisition mode without dynamic exclusion, data-independent acquisition mass spectrometry identified 832±19 proteins in a 10 ng trace proteome sample even at a gradient duration of 30 minutes. This is comparable to the number of proteins identified (836±4) with data-dependent acquisition mass spectrometry at a gradient duration of 90 minutes. This indicates that the mass spectrometry analysis method combining a library constructed in the data-dependent acquisition mode without dynamic exclusion with the data-independent acquisition mode improves the analytical throughput of trace target proteomes compared with traditional data-dependent acquisition mass spectrometry methods.
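The throughput gain implied by [0107] can be expressed per minute of gradient time. The Python sketch below is a rough illustration under a stated simplification: only the gradient duration is counted, and sample loading, column equilibration, and other per-run overhead (not given in the text) are ignored. The function names are assumptions.

```python
MINUTES_PER_DAY = 24 * 60

# (mean proteins identified, gradient duration in minutes), as reported in [0107].
runs = {
    "DIA with library, 30-min gradient": (832, 30),
    "traditional DDA, 90-min gradient": (836, 90),
}


def proteins_per_minute(proteins, gradient_min):
    """Identifications per minute of gradient time."""
    return proteins / gradient_min


def runs_per_day(gradient_min):
    """Upper bound on runs per day, ignoring all per-run overhead (assumption)."""
    return MINUTES_PER_DAY // gradient_min


for name, (proteins, gradient_min) in runs.items():
    rate = proteins_per_minute(proteins, gradient_min)
    print(f"{name}: {rate:.1f} proteins/min, up to {runs_per_day(gradient_min)} runs/day")
```

Under this simplification, the 30-minute DIA run yields about three times as many identifications per minute of gradient time as the 90-minute data-dependent run, at a comparable number of identified proteins.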

[0108] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0109] The above embodiments merely illustrate several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.

Claims

1. The application of a target proteome spectral library in trace proteome detection, characterized in that the method for constructing the target proteome spectral library includes: obtaining mass spectrometry data of the target proteome by performing multiple mass spectrometry scans of the target proteome in a data-dependent acquisition mode without dynamic exclusion over multiple scan ranges, wherein the mass spectrometry data of the target proteome include primary spectra of the target proteome in the multiple scan ranges and secondary spectra of precursor ions whose intensity in each scan range meets a preset threshold; and constructing the target proteome spectral library by performing a library search on the mass spectrometry data of the target proteome; the multiple scan ranges include several consecutive scan ranges within the range of 400 m/z to 1250 m/z; the acquisition time for the secondary spectrum is 30 ms to 45 ms; the number of scans for the secondary spectrum is 60 to 100; the preset threshold for the parent ion intensity of the secondary spectrum scan is 10 cps to 29 cps; the application includes the following steps: obtaining mass spectrometry data of a trace proteome acquired in a data-independent acquisition mode, and using the target proteome spectral library to perform a library search analysis on the mass spectrometry data of the trace proteome; the data-independent acquisition mode can be a fixed-window data-independent acquisition mode or a variable-window data-independent acquisition mode; the scan range of the data-independent acquisition mode includes 400 m/z to 1250 m/z; the acquisition time for the primary spectrum of the trace proteome is 200 ms to 300 ms; the acquisition time for the secondary spectrum of the trace proteome is 65 ms to 100 ms; and the mass of the trace proteome is 0.2 ng to 100 ng.

2. The application according to claim 1, characterized in that the multiple scan ranges include 400 m/z to 500 m/z, 500 m/z to 600 m/z, 600 m/z to 700 m/z, 700 m/z to 800 m/z, 900 m/z to 1000 m/z, 1000 m/z to 1100 m/z, and 1100 m/z to 1250 m/z.

3. The application according to claim 1, characterized in that the mass spectrometry data of the target proteome satisfy at least one of the following conditions (1) to (2): (1) the target proteome is the proteome of a target organism; (2) the loading amount of the target proteome is 10 ng to 2000 ng.

4. The application according to claim 3, characterized in that the target organism is selected from any one of animals, plants, and microorganisms.

5. The application according to claim 4, characterized in that the target organism is selected from cattle, horses, pigs, sheep, goats, rats, mice, dogs, cats, rabbits, camels, donkeys, deer, minks, chickens, ducks, geese, or humans.

6. The application according to any one of claims 1 to 5, characterized in that the parameters for searching the mass spectrometry data of the target proteome are as follows: trypsin digestion, a maximum of two missed cleavage sites allowed, methionine oxidation and protein N-terminal acetylation set as variable modifications, and cysteine alkylation set as a fixed modification.

7. The application according to any one of claims 1 to 5, characterized in that the trace proteome satisfies at least one of the following conditions (1) to (2): (1) the trace proteome and the target proteome are derived from the same organism; (2) the trace proteome is a single-cell proteome.

8. The application according to any one of claims 1 to 5, characterized in that the parameters for searching the mass spectrometry data of the trace proteome are: a parent ion mass tolerance of ±25 Da and a secondary fragment ion mass tolerance of ±50 ppm.
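For illustration only, the digestion setting named in claim 6 (trypsin, at most two missed cleavage sites) can be sketched as an in-silico digestion. The Python sketch below is not the search software's implementation: the function name `tryptic_peptides` and the toy sequence are hypothetical, and the rule that trypsin does not cleave before proline is a common convention assumed here, since the claims do not state it. Modifications are not modeled.

```python
import re


def tryptic_peptides(sequence, max_missed=2):
    """Enumerate tryptic peptides with up to max_missed missed cleavages.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    # Zero-width split after K/R that is not followed by P; drop empty pieces.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end <= len(pieces):
                peptides.append("".join(pieces[start:end]))
    return peptides


toy_sequence = "AAKGGRPCCKDDR"  # hypothetical sequence, not from the source
print(tryptic_peptides(toy_sequence, max_missed=0))
print(tryptic_peptides(toy_sequence, max_missed=2))
```

With no missed cleavages the toy sequence yields three peptides; allowing up to two missed cleavages yields six, including the undigested full-length sequence.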