circRNA cross-BSJ peptide segment verification method fusing multiple public databases

By integrating multiple public databases, we can partition theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set, perform signal decoupling and local mass correction, resolve the signal interference and instrument mass offset problems in cross-BSJ peptide validation, and achieve higher detection accuracy and robustness.

CN122201439A · Pending · Publication Date: 2026-06-12 · SHENZHEN RONKEDIT TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN RONKEDIT TECHNOLOGY CO LTD
Filing Date
2026-04-25
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies suffer from linear homologous ion signal interference, instrument mass axis offset, and robustness issues in cross-BSJ peptide validation, resulting in insufficient accuracy and robustness in detecting cross-joint fragmentation sites.

Method used

By integrating multiple public databases, we divide the theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set, perform signal decoupling and local mass correction, verify each candidate independently per database against a multi-database co-occurrence threshold, and construct a targeted fragment feature library.

Benefits of technology

It significantly improves the signal-to-noise ratio and detection sensitivity of BSJ-specific ions, enhances the detection accuracy and robustness of cross-joint fragmentation sites, and reduces the risk of false positives and computational overhead.

Figure CN122201439A_ABST

Abstract

The application discloses a circRNA cross-BSJ peptide segment verification method fusing multiple public databases and relates to the technical field of proteomics mass spectrometry data analysis. The method comprises the following steps: acquiring candidate cross-BSJ peptide sequences and their precursor ion masses, and dividing the theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set; extracting matching secondary (MS/MS) spectra from multiple public mass spectrometry databases; performing signal decoupling with the linear homologous ion set to obtain a residual sub-spectrum; constructing a cross-seam fragmentation site pairing state sequence in the residual sub-spectrum through two rounds of detection and local mass offset correction; performing continuity determination and forming a multi-database verification vector; determining that the peptide passes verification when the number of databases passing the determination meets a multi-database co-occurrence threshold; and extracting cross-seam-specific fragment ion features for the passed peptide and writing them into a local targeted fragment feature library. The application improves the signal-to-noise ratio, accuracy, and cross-database robustness of cross-BSJ peptide verification.

Description

Technical Field

[0001] This invention relates to the field of proteomics mass spectrometry data analysis technology, and more specifically, to a method for validating circRNA cross-BSJ peptides by integrating multiple public databases.

Background Technology

[0002] Circular RNA (circRNA) is a covalently closed circular non-coding RNA molecule formed by back-splicing of precursor mRNA. In recent years, studies have found that some circRNAs can be translated by ribosomes to produce biologically functional peptides that span the back-splice junction (BSJ), referred to here as cross-BSJ peptides. Because the amino acid sequence of a cross-BSJ peptide spans the seam point formed by back-splicing, its sequence arrangement is fundamentally different from that of peptides encoded linearly in the genome. Mass spectrometry identification and verification of cross-BSJ peptides at the proteomics level has therefore become a key technical step in the functional study of circRNA translation products.

Currently, mass spectrometry identification of cross-BSJ peptides generally follows a conventional proteomics workflow based on database search. The workflow first constructs a customized protein sequence database containing cross-seam sequences from the open reading frames predicted for the circRNA. The raw tandem mass spectrometry data are then submitted to a mass spectrometry database search engine, which matches the experimentally obtained secondary (MS/MS) fragment ion spectra against theoretical fragmentation spectra to obtain candidate peptide-spectrum matches. To control the false positive rate, the search results are usually evaluated with a target-decoy database strategy that estimates the false discovery rate, and only high-confidence matches below a preset threshold are retained. In the validation stage, existing technologies mostly rely on data mining from a single public proteomics mass spectrometry database, or verify candidate peptides manually by re-acquiring targeted mass spectrometry data. For the attribution of fragment ion signals, existing methods mostly apply the fragmentation model of linear peptides directly and evaluate the reliability of a peptide identification from the sequence coverage of its b-ions and y-ions.

However, these existing technologies have several inherent limitations when applied to the verification of cross-BSJ peptides. Firstly, a cross-BSJ peptide generates two types of fragment ions: linear homologous ions, whose sequence completely overlaps with that of the linear homologous peptide, and BSJ-specific ions, which arise because the sequence crosses the seam. Routine database searches and subsequent manual verification do not systematically distinguish and isolate these two types, so the background noise level around the cross-BSJ-specific signal rises significantly, and independent evidence for cross-seam fragmentation sites is obscured or misjudged.

Secondly, mass spectrometers exhibit mass axis shift during actual measurements, especially at the low-mass end and in the high-intensity signal range of secondary fragment ions. This systematic bias can push the measured mass of BSJ-specific fragment ions outside the theoretical tolerance window, leading to missed detections or mismatches in the pairing detection of cross-seam sites. Existing methods lack an adaptive correction mechanism for mass shifts in the local region around cross-seam sites, making it difficult to recover, at the level of a single spectrum, pairing information lost to instrument deviation.

Thirdly, data in public proteomics mass spectrometry databases come from different laboratories, instrument platforms, and acquisition batches, resulting in significant heterogeneity in spectral quality and signal response characteristics. Existing validation strategies usually rest on the conclusion from a single database or a single spectrum and do not establish a consistency evaluation framework for independent reproducibility across databases. The robustness of a validation conclusion therefore depends heavily on which data source happens to be selected, and there is no systematic quantitative basis for determining whether the fragmentation features of a cross-BSJ peptide can be reproduced in multiple independent data sources. To address the above problems, this invention proposes a solution.

Summary of the Invention

[0003] To overcome the aforementioned deficiencies of the prior art, embodiments of the present invention provide a method for validating circRNA cross-BSJ peptides by integrating multiple public databases, thereby addressing the problems described in the background art.

[0004] To achieve the above objectives, the present invention provides the following technical solution: a method for validating circRNA cross-BSJ peptides by integrating multiple public databases, including the following steps: obtaining candidate cross-BSJ peptide sequences and their precursor ion masses, calculating theoretical fragment ion masses based on standard peptide fragmentation theory, dividing the theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set, and extracting raw secondary mass spectrometry data that match the precursor ion masses from multiple independent public proteomics mass spectrometry databases; performing signal decoupling on each original MS/MS spectrum using the linear homologous ion set to obtain a residual sub-spectrum; in the residual sub-spectrum, performing a first round of detection on the theoretical b-ion and theoretical complementary y-ion at each cross-seam fragmentation site, and marking the cross-seam fragmentation sites where both the b-ion and the complementary y-ion are detected as initial double-end sites; obtaining the local mass correction amount of the spectrum from the pairing mass deviation of the initial double-end sites, applying an offset correction to the theoretical ion masses of the remaining sites based on that amount, and then performing a second round of detection; constructing a cross-seam fragmentation site pairing state sequence from the combined results of the two rounds of detection; performing continuity determination on the pairing state sequence and forming a verification vector from the determination results of each database; determining that the verification is passed when the number of databases passing the determination is not less than the preset multi-database co-occurrence threshold; and, for the peptides that pass the verification, extracting cross-seam-specific fragment ion features from the spectra that pass the determination and writing them into the local targeted fragment feature library.

[0005] In a preferred embodiment, dividing the theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set includes: defining the seam point as located between the S-th residue and the (S + 1)-th residue of the peptide sequence, and letting the original fragmentation site r represent the cleavage position of the peptide backbone between the r-th residue and the (r + 1)-th residue, where L is the total number of residues in the peptide. For any original fragmentation site r, the corresponding theoretical b-ion is $b_r$, which contains the 1st to the r-th residues. When r ≤ S, this b-ion is classified into the linear homologous ion set; when r > S, it is classified into the BSJ-specific ion set. For the same site r, the corresponding complementary theoretical y-ion is $y_{L-r}$, which contains the (r + 1)-th to the L-th residues. When r ≥ S, the complementary theoretical y-ion is classified into the linear homologous ion set; when r < S, it is classified into the BSJ-specific ion set. When at least one of the theoretical b-ion or the complementary theoretical y-ion corresponding to the original fragmentation site r is classified into the BSJ-specific ion set, the original fragmentation site r is marked as a cross-seam fragmentation site.
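As an illustration of the partition rule in [0005], the following minimal Python sketch classifies each theoretical ion from the peptide length L and the seam index S alone. The function name `partition_fragment_ions` and the layout of the returned dictionary are hypothetical and are not part of the application.

```python
from typing import Dict, List


def partition_fragment_ions(L: int, S: int) -> Dict[str, List]:
    """Classify b_r / y_(L-r) ions for a peptide of L residues with the seam
    between residue S and residue S + 1 (hypothetical helper)."""
    linear: List[str] = []        # linear homologous ion set
    bsj_specific: List[str] = []  # BSJ-specific ion set
    cross_seam_sites: List[int] = []

    for r in range(1, L):                 # original fragmentation sites 1 .. L-1
        b_is_specific = r > S             # b_r holds residues 1..r
        y_is_specific = r < S             # y_(L-r) holds residues r+1..L
        (bsj_specific if b_is_specific else linear).append(f"b{r}")
        (bsj_specific if y_is_specific else linear).append(f"y{L - r}")
        if b_is_specific or y_is_specific:
            cross_seam_sites.append(r)    # at least one ion crosses the seam

    return {
        "linear": linear,
        "bsj_specific": bsj_specific,
        "cross_seam_sites": cross_seam_sites,
    }
```

For a 7-residue peptide with the seam after residue 5, `partition_fragment_ions(7, 5)["cross_seam_sites"]` evaluates to `[1, 2, 3, 4, 6]`; site 5 is excluded because b5 and y2 lie on opposite sides of the seam.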

[0006] In a preferred embodiment, there are at least three independent public proteomics mass spectrometry databases. For each database, all of its MS/MS spectra are searched for spectra whose precursor ion mass differs from the precursor ion mass of the candidate cross-BSJ peptide by no more than the precursor ion mass tolerance in absolute value. The spectra that pass this screening are sorted in descending order of total ion current intensity, and only the top-ranked spectra are retained, up to the preset per-database spectrum retention limit.
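The screening and ranking in [0006] can be sketched as follows. This is a minimal illustration, assuming that the tolerance is given in ppm relative to the candidate mass and that total ion current is the sum of peak intensities; the `Spectrum` class, `select_spectra`, and the defaults of 10 ppm and 5 spectra (the initial values given in the detailed embodiment) are illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    precursor_mass: float
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs

    @property
    def tic(self) -> float:
        # Total ion current, taken here as the sum of all peak intensities.
        return sum(intensity for _, intensity in self.peaks)


def select_spectra(spectra: List[Spectrum], target_mass: float,
                   tol_ppm: float = 10.0, max_keep: int = 5) -> List[Spectrum]:
    """Keep spectra whose precursor mass is within tolerance of the candidate,
    ranked by total ion current, up to the per-database retention limit."""
    tol_da = target_mass * tol_ppm * 1e-6   # assumption: ppm relative to target
    matched = [s for s in spectra
               if abs(s.precursor_mass - target_mass) <= tol_da]
    matched.sort(key=lambda s: s.tic, reverse=True)
    return matched[:max_keep]               # fewer than max_keep: keep them all
```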

[0007] In a preferred embodiment, signal decoupling is performed on each original secondary mass spectrum using a linear homologous ion set to obtain a residual sub-spectrum. This includes: traversing each theoretical mass value in the linear homologous ion set, searching for a peak in the peak list of the spectrum whose mass deviation does not exceed the fragment ion mass tolerance, and setting the intensity of the matched peak to zero; after the operation is completed, the remaining peaks in the spectrum constitute the residual sub-spectrum.
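A minimal sketch of the signal decoupling in [0007] is given here. It assumes a ppm tolerance relative to the theoretical mass and removes every peak inside a tolerance window (the text does not say whether one or all peaks in a window are erased); the function name `decouple` and the 20 ppm default are illustrative.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def decouple(peaks: List[Peak], linear_masses: List[float],
             tol_ppm: float = 20.0) -> List[Peak]:
    """Erase peaks that match any linear homologous theoretical mass and
    return the remaining peaks as the residual sub-spectrum."""
    residual: List[Peak] = []
    for mz, intensity in peaks:
        matched = any(abs(mz - m) <= m * tol_ppm * 1e-6 for m in linear_masses)
        if not matched:
            residual.append((mz, intensity))
        # Matched peaks are treated as zero intensity and therefore dropped.
    return residual
```

Dropping a matched peak is equivalent to zeroing it for the later searches, since a zero-intensity peak carries no signal.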

[0008] In a preferred embodiment, performing the first round of detection on each cross-seam fragmentation site in the residual sub-spectrum includes: traversing all cross-seam fragmentation sites and, for each site, searching the residual sub-spectrum for a peak whose mass deviation from the theoretical b-ion mass of the site does not exceed the fragment ion mass tolerance, and for a peak whose mass deviation from the theoretical complementary y-ion mass of the site does not exceed the fragment ion mass tolerance. When both peaks are found, the site is marked as an initial double-end site; when only one peak is found, it is marked as an initial single-end site; when neither is found, it is marked as an initial missing site. When multiple candidate peaks exist within the same mass tolerance window, the candidate peak with the smallest absolute mass deviation from the theoretical ion mass is selected as the matching peak; when multiple candidate peaks have the same or approximately the same absolute mass deviation, the candidate peak with the largest intensity is selected as the matching peak.
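The first-round search and its tie-breaking rule can be sketched as below. The names `best_peak` and `first_round`, the ppm-based window, and the `tie_eps` value used to decide when two deviations count as "approximately the same" are assumptions made for illustration; the return values 2, 1, and 0 stand for double-end, single-end, and missing.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def best_peak(peaks: List[Peak], theo: float, tol_ppm: float = 20.0,
              tie_eps: float = 1e-6) -> Optional[Peak]:
    """Pick the peak closest to the theoretical mass; among (near-)ties in
    deviation, prefer the most intense peak."""
    tol_da = theo * tol_ppm * 1e-6
    candidates = [p for p in peaks if p[1] > 0 and abs(p[0] - theo) <= tol_da]
    if not candidates:
        return None
    min_dev = min(abs(mz - theo) for mz, _ in candidates)
    near_ties = [p for p in candidates if abs(p[0] - theo) - min_dev <= tie_eps]
    return max(near_ties, key=lambda p: p[1])


def first_round(peaks: List[Peak], sites: List[Tuple[float, float]],
                tol_ppm: float = 20.0) -> List[int]:
    """sites holds (theoretical b mass, theoretical y mass) per cross-seam site."""
    states = []
    for theo_b, theo_y in sites:
        found_b = best_peak(peaks, theo_b, tol_ppm) is not None
        found_y = best_peak(peaks, theo_y, tol_ppm) is not None
        states.append(int(found_b) + int(found_y))
    return states
```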

[0009] In a preferred embodiment, obtaining the local mass correction amount of the spectrum based on the pairing mass deviation of the initial double-end sites includes: calculating the pairing mass deviation of every initial double-end site from the mass conservation relationship, and summarizing these deviations with a robust statistic, namely the median of the pairing mass deviations of all initial double-end sites. When the number of initial double-end sites is not less than a preset minimum paired-site threshold, the median is taken as the local mass correction amount of the spectrum; when the number is less than that threshold, the local mass correction amount of the spectrum is set to zero.
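A minimal sketch of the median-based correction in [0009] is shown here. It assumes that the pairing mass deviation of a site is the b-ion deviation plus the complementary y-ion deviation (observed minus theoretical, in Da), and it uses a hypothetical default of 3 for the minimum paired-site threshold, which the text leaves as a preset value.

```python
from statistics import median
from typing import List, Tuple

# (theoretical b, observed b, theoretical y, observed y) for one double-end site
PairedSite = Tuple[float, float, float, float]


def local_mass_correction(paired_sites: List[PairedSite],
                          min_sites: int = 3) -> float:
    """Median pairing mass deviation over the initial double-end sites, or
    zero when too few sites are available for a stable estimate."""
    deviations = [(obs_b - theo_b) + (obs_y - theo_y)
                  for theo_b, obs_b, theo_y, obs_y in paired_sites]
    if len(deviations) < min_sites:
        return 0.0
    return median(deviations)
```

The median is used rather than the mean so that a single mismatched pair does not dominate the correction.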

[0010] In a preferred embodiment, performing the second round of detection after offset correction of the theoretical ion masses of the remaining sites includes: re-searching only the cross-seam fragmentation sites marked as initial single-end sites or initial missing sites in the first round. The local mass correction amount of the spectrum is the sum of the deviations at the b-ion end and the complementary y-ion end. In this embodiment, the theoretical b-ions and theoretical complementary y-ions generated in step one are singly charged, so the local mass correction amount is distributed equally between the b-ion end and the complementary y-ion end. For each cross-seam fragmentation site to be re-examined, its theoretical b-ion mass and its theoretical complementary y-ion mass are each increased by half of the local mass correction amount of the spectrum to form corrected theoretical mass values. With each corrected theoretical mass value as the center and the fragment ion mass tolerance as the half-width, matching peaks are searched again in the residual sub-spectrum, and the pairing state value of the cross-seam fragmentation site is updated according to the search results. The pairing state values 2, 1, and 0 correspond to double-end detected, single-end detected, and double-end missing, respectively.
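The second-round re-search can be sketched as follows. This is an illustration under stated assumptions: the correction is in Da, the window is a ppm tolerance around the corrected mass, and an updated state never falls below the first-round state (the text says only that the state is updated from the search results). The names `has_peak` and `second_round` are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def has_peak(peaks: List[Peak], center: float, tol_ppm: float) -> bool:
    """True when some non-zero peak lies within the tolerance window."""
    tol_da = center * tol_ppm * 1e-6
    return any(i > 0 and abs(mz - center) <= tol_da for mz, i in peaks)


def second_round(peaks: List[Peak], sites: List[Tuple[float, float]],
                 states: List[int], correction: float,
                 tol_ppm: float = 20.0) -> List[int]:
    """Re-search single-end (1) and missing (0) sites with each theoretical
    mass shifted by half of the local mass correction amount."""
    half = correction / 2.0
    updated = list(states)
    for idx, (theo_b, theo_y) in enumerate(sites):
        if states[idx] == 2:
            continue                      # double-end sites are left untouched
        hits = int(has_peak(peaks, theo_b + half, tol_ppm)) \
            + int(has_peak(peaks, theo_y + half, tol_ppm))
        updated[idx] = max(states[idx], hits)   # assumption: never downgrade
    return updated
```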

[0011] In a preferred embodiment, a continuity determination is performed on the cross-seam fragmentation site pairing state sequence, and the determination results from each database are combined into a verification vector. The continuity determination includes a double-end channel determination rule and a single-end channel determination rule. After both rules have been applied to all spectra from the same database, if at least one spectrum in the database has a determination status of "pass", the verification conclusion of that database for the candidate cross-BSJ peptide is "pass"; otherwise, the verification conclusion of that database is "fail". The verification conclusions of all databases are combined into a verification vector.
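A minimal sketch of how per-spectrum outcomes roll up into the verification vector and the final multi-database decision is given below. The function name `verify_across_databases` and the default co-occurrence threshold of 2 are hypothetical; the text describes the threshold only as a preset value.

```python
from typing import Dict, List, Tuple


def verify_across_databases(per_db_results: Dict[str, List[bool]],
                            co_occurrence_threshold: int = 2
                            ) -> Tuple[Dict[str, bool], bool]:
    """per_db_results maps a database name to the pass/fail status of each of
    its retained spectra. A database passes when any of its spectra passes;
    the peptide passes when enough databases pass."""
    vector = {db: any(results) for db, results in per_db_results.items()}
    passed = sum(vector.values()) >= co_occurrence_threshold
    return vector, passed
```

A database with no retained spectra yields `False` here, since `any([])` is false.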

[0012] In a preferred embodiment, the determination condition of the double-end channel determination rule is: there exists at least one position index j in the cross-seam fragmentation site pairing state sequence such that the pairing state values of both the j-th and the (j + 1)-th cross-seam fragmentation sites are 2. The determination condition of the single-end channel determination rule is: there exists at least one starting index j in the cross-seam fragmentation site pairing state sequence such that the pairing state values of all consecutive cross-seam fragmentation sites within a window of the preset single-end continuity window length, starting from the j-th site, are not less than 1. Here j is the site index in the cross-seam fragmentation site pairing state sequence and ranges over the renumbered cross-seam fragmentation sites.
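The two continuity rules in [0012] reduce to short scans over the pairing state sequence. In the sketch below, the function names, the default window length of 3, and the choice to pass a spectrum when either rule is satisfied are assumptions; the text gives the window length only as a preset value.

```python
from typing import List


def double_end_pass(states: List[int]) -> bool:
    """Two adjacent cross-seam sites both have pairing state 2."""
    return any(a == 2 and b == 2 for a, b in zip(states, states[1:]))


def single_end_pass(states: List[int], window: int = 3) -> bool:
    """Some run of `window` consecutive sites all have pairing state >= 1."""
    return any(all(s >= 1 for s in states[j:j + window])
               for j in range(len(states) - window + 1))


def spectrum_passes(states: List[int], window: int = 3) -> bool:
    # Assumption: satisfying either channel is sufficient for "pass".
    return double_end_pass(states) or single_end_pass(states, window)
```

For example, the sequence `[2, 2, 0, 0, 1]` passes through the double-end channel, while `[0, 1, 2, 1, 0]` passes only through the single-end channel with a window of 3.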

[0013] In a preferred embodiment, extracting cross-seam-specific fragment ion features from the spectra that pass the determination and writing them into a local targeted fragment feature library includes: calculating a signal-to-noise ratio (SNR) index for every spectrum that passes, where the SNR index is the summed intensity of the signal peaks in the residual sub-spectrum that match cross-seam fragmentation sites divided by the summed intensity of the unmatched noise peaks, and selecting the spectrum with the highest SNR index as the feature source spectrum. The cross-seam fragmentation sites in the feature source spectrum whose pairing state value is not less than 1 are traversed, the measured mass and measured intensity of the detected fragment ions are extracted, and the measured intensity is normalized to obtain the relative intensity. The BSJ peptide sequence, the precursor ion mass, the total number of cross-seam fragmentation sites, the feature fragment ion list, and the local mass correction amount of the spectrum are assembled into a targeted fragment feature entry. When a new database is added later, the peak list of each new spectrum is matched against the feature fragment ion list in the targeted fragment feature entry; when the number of matched feature fragment ions is not less than the preset minimum number of matched ions and the cosine similarity is not less than the preset spectrum similarity threshold, the new spectrum is determined to be a valid reproduction of the BSJ peptide feature pattern.
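The SNR index and the targeted matching test in [0013] can be sketched as follows. The function names, the ppm window, the rule of taking the most intense peak inside a window, and the default thresholds (`min_matched=3`, `min_cosine=0.7`) are all hypothetical; the text describes both thresholds only as preset values.

```python
import math
from typing import List, Set, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def snr_index(residual_peaks: List[Peak], matched_mzs: Set[float]) -> float:
    """Summed intensity of matched cross-seam peaks over summed intensity of
    the remaining (noise) peaks in the residual sub-spectrum."""
    signal = sum(i for mz, i in residual_peaks if mz in matched_mzs)
    noise = sum(i for mz, i in residual_peaks if mz not in matched_mzs)
    return signal / noise if noise > 0 else math.inf


def targeted_match(feature_ions: List[Peak], new_peaks: List[Peak],
                   tol_ppm: float = 20.0, min_matched: int = 3,
                   min_cosine: float = 0.7) -> Tuple[bool, int, float]:
    """feature_ions holds (measured mass, relative intensity) library entries.
    Returns (is_valid_reproduction, matched_count, cosine_similarity)."""
    lib, obs, matched = [], [], 0
    for mz, rel in feature_ions:
        tol_da = mz * tol_ppm * 1e-6
        hits = [i for m, i in new_peaks if abs(m - mz) <= tol_da]
        matched += 1 if hits else 0
        lib.append(rel)
        obs.append(max(hits) if hits else 0.0)
    norm = math.sqrt(sum(a * a for a in lib)) * math.sqrt(sum(b * b for b in obs))
    cosine = sum(a * b for a, b in zip(lib, obs)) / norm if norm > 0 else 0.0
    return matched >= min_matched and cosine >= min_cosine, matched, cosine
```

Cosine similarity is insensitive to overall intensity scale, so a new spectrum acquired at a different signal level can still match the stored relative intensities.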

[0014] The technical effects and advantages of the circRNA cross-BSJ peptide verification method integrating multiple public databases in this invention are as follows. This application divides the theoretical fragment ions in advance into a linear homologous ion set and a BSJ-specific ion set and performs signal decoupling on the original secondary mass spectrum, which removes the interference of linear homologous fragment signals with the retrieval of cross-seam sites and improves the signal-to-noise ratio and detection sensitivity of BSJ-specific ions. On this basis, the local mass shift of each spectrum is corrected adaptively using the pairing mass deviation of the initial double-end sites, so that BSJ fragment ions missed because of instrument mass axis shift can be paired and confirmed in the second round of detection, improving the completeness and accuracy of double-end detection at cross-seam fragmentation sites. In addition, this application validates each candidate independently in multiple independent public mass spectrometry databases and combines the conclusions through a multi-database co-occurrence threshold, which reduces the risk of false positives caused by accidental matching in a single data source and enhances the robustness and cross-platform reliability of the validation results. Furthermore, cross-seam-specific fragmentation features are extracted from validated peptides to build a local targeted fragment feature library. New data can then be checked quickly, through targeted matching, for valid reproduction of the fragmentation pattern, reducing the computational overhead of repeated validation while maintaining accuracy.

Attached Figure Description

[0015] Figure 1 is a flowchart of the method for validating circRNA cross-BSJ peptides by integrating multiple public databases according to the present invention.
Figure 2 is a schematic diagram showing an example of the seam-crossing classification and the renumbering of cross-seam fragmentation sites according to the present invention.
Figure 3 is a schematic diagram of the signal decoupling and the two-round detection and correction according to the present invention.
Figure 4 is a schematic diagram of the continuity determination and multi-database joint verification according to the present invention.
Figure 5 is a schematic diagram of the construction of the targeted fragment feature library according to the present invention.

Detailed Implementation

[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0017] Example Please see Figure 1 As shown, this invention discloses a method for validating circRNA across BSJ peptides by integrating multiple public databases, comprising the following steps: Step 1: Obtain candidate cross-BSJ peptide sequences and their precursor ion masses. Calculate the theoretical fragment ion masses based on the standard peptide fragmentation theory. Divide the theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set. Extract raw secondary mass spectrometry data that match the precursor ion masses from multiple independent public proteomics mass spectrometry databases. The purpose of this step is to provide a complete and well-defined input dataset for subsequent signal decoupling, pairing bias correction, and continuity determination, including a classified list of theoretical fragment ion masses and a list of raw mass spectrum peaks from multiple sources; specifically, it includes: The system receives a candidate trans-BSJ peptide sequence output by the mass spectrometry library search engine. This candidate trans-BSJ peptide sequence is generated by translating a circular transcript formed by backsplicing of circRNA. Its amino acid sequence spans a backsplicing site, meaning there is a seam point in the BSJ peptide sequence. The residues upstream of this point originate from the downstream exon of the linear precursor RNA, and the residues downstream of this point originate from the upstream exon of the linear precursor RNA. Simultaneously, the system receives the precursor ion mass corresponding to this candidate trans-BSJ peptide, denoted as [mass value missing]. 
precursor ion mass Provided by the mass spectrometry search engine during the first-level mass spectrometry matching stage; It should be noted that the mass spectrometry search engine in this embodiment is used to receive the mass spectrometry data to be tested and the corresponding search database information, perform database matching search on the mass spectrometry data to be tested, and output candidate peptide sequences, the precursor ion mass corresponding to the candidate peptide, and the matching score result. The present invention preferably uses the candidate cross-BSJ peptide sequences and their precursor ion masses output by the mass spectrometry search engine as the input basis for subsequent theoretical fragment ion calculation and multi-library original spectrum extraction, rather than using the specific search algorithm of the mass spectrometry search engine itself as the limiting content of the present invention. According to the standard peptide fragmentation theory, calculate the monoisotopic masses of all theoretical b and y ions in this candidate trans-BSJ peptide; for a peptide of length... A peptide segment containing 10 amino acid residues, where L represents the total number of residues in the entire candidate trans-BSJ peptide segment. There are 10 possible fragmentation sites, each fragmentation site r, where r represents the original position index of all possible fragmentation sites of the peptide and This produces a pair of complementary fragment ions: The ion represents residues from the N-terminus of the sequence from the 1st to the rth residue. Ions represent sequences starting from the C-terminus 1 to 2. One residue; theory Ion mass equal to the previous The sum of the residue masses of each residue plus the mass of a proton. 
;theory After the ion mass equals The sum of the residue masses of each residue plus the mass of a water molecule plus the mass of a proton; The above calculations were all performed based on the standard monoisotope residue mass table for each amino acid residue; among which, the proton mass... In this embodiment, the value of is set to 1.007276 Da, and the value of the water molecule mass is set to 18.010565 Da; It should be noted that the standard peptide fragmentation theory described in this embodiment refers to the conventional fragmentation mode in which, under tandem mass spectrometry fragmentation conditions, the peptide bonds in the main chain of a peptide break, forming a set of b ions composed of continuous N-terminal residues and a set of y ions composed of continuous C-terminal residues. Here, the b ion corresponds to the theoretical mass of the continuous prefix fragment of the peptide starting from the N-terminus, and the y ion corresponds to the theoretical mass of the continuous suffix fragment of the peptide starting from the C-terminus. The mass of each theoretical fragment ion is calculated based on a standard single isotope residue mass table, and corrected by combining the proton mass and the water molecule mass corresponding to the y ion. In this embodiment, step one preferably generates a basic theoretical fragment ion set based on single-charged b ions and single-charged y ions, temporarily excluding neutral loss ions, internal fragment ions, and other unconventional fragmentation ions from the basic theoretical mass list. This ensures that the theoretical mass set used for subsequent linear homologous ion erasure, cross-seam fragmentation signal retrieval, and pairing deviation correction has a clear, unified, and repeatable calculation basis. For the same original fragmentation site, the b ion and the complementary y ion are used together to characterize the complementary fragmentation results on both sides of the site. 
The pairing detection and pairing quality deviation correction in the subsequent step two are based on this complementary relationship. After the calculation is completed, all theoretical fragment ions are physically isolated and classified according to whether the b-ions and complementary y-ions generated at the corresponding original fragmentation sites cross the seam point; define the seam position index as S, indicating that the seam is located between the S-th residue and the (S + 1)-th residue of the peptide sequence; the original fragmentation site index r represents the cleavage position of the peptide backbone between the r-th residue and the (r + 1)-th residue, where r = 1, 2, …, L - 1; since both r and S represent the boundary position indices in the same peptide sequence coordinate system, the two can be directly compared in size; For any original fragmentation site r, its corresponding theoretical b-ion is , which contains the 1st to the r-th residues; its corresponding complementary theoretical y-ion is , which contains the (r + 1)-th to the L-th residues; If r ≤ S, then the ions do not cross the seam and are classified into the linear homologous ion set; if r > S, then the ions cross the seam and are classified into the BSJ-specific ion set; correspondingly, if r ≥ S, then the complementary ions do not cross the seam and are classified into the linear homologous ion set; if r < S, then the complementary ions cross the seam and are classified into the BSJ-specific ion set; Only when at least one of the ions or the complementary ions corresponding to the original fragmentation site r is classified into the BSJ-specific ion set, will the original fragmentation site r be marked as a cross-seam fragmentation site; when r = S, the ions and the complementary ions are located on both sides of the seam and do not cross the seam, so the original fragmentation site r is not marked as a cross-seam fragmentation site; Suppose a total of cross-seam fragmentation 
sites are generated after this classification, and they are renumbered in order from the N-terminus to the C-terminus of the peptide sequence as $j$, where $j$ is the renumbering index of the cross-joint fragmentation sites after screening and runs from 1 to the total number of such sites.

Subsequently, multi-database original spectrum extraction is performed. In this embodiment, at least three public proteomics mass spectrometry databases are pre-specified as data sources, including PRIDE, MassIVE, and iProX. These databases come from different operating agencies and collection platforms and therefore provide an independent verification basis; they are denoted database A, database B, and database C, respectively. The selection principle is that each database is collected independently from different laboratories, different instrument platforms, or different time periods, which guarantees the independence required by the subsequent multi-database joint verification. For each database, all of its secondary mass spectra are searched for spectra whose precursor ion mass differs from the precursor ion mass of the candidate peptide segment by an absolute value not exceeding the precursor ion mass tolerance $\varepsilon_{\mathrm{pre}}$. For each spectrum that passes the precursor ion matching, its complete peak list is extracted, including the mass-to-charge ratios of all fragment ions and their corresponding signal intensities. The initial value of $\varepsilon_{\mathrm{pre}}$ is 10 ppm, which is determined by the typical mass accuracy of a high-resolution Orbitrap mass spectrometer in the first-level (MS1) scanning mode.

Multiple precursor-matched spectra may exist in the same database. To control the subsequent computational load while retaining the most informative spectra, the matched spectra in each database are sorted in descending order of total ion current intensity, and the top $K$ spectra are retained for the next step. Here $K$ is the upper limit on the number of spectra retained per database, with an initial preset value of 5. This setting reflects the fact that the same precursor ion often has multiple duplicate or near-duplicate secondary mass spectra in public databases. Retaining the top 5 spectra by total ion current intensity covers the main strong-signal spectra while keeping the computational load of the subsequent two-round detection and multi-database joint determination under control. If fewer than 5 spectra in a database meet the precursor ion matching condition, all of them are retained.

It should be noted that the physical isolation produced by this classification gives the signal decoupling operation in step two a clear elimination target: the theoretical mass values of the linear homologous ion set are used to identify and erase homologous interference signals in the original spectrum, while the theoretical mass values of the BSJ-specific ion set are used to retrieve cross-joint fragmentation signals in the residual spectrum after interference elimination. The clear division of the two sets is a prerequisite for the correct execution of all subsequent processing steps.

Referring to Figure 2, for ease of explanation consider a 7-residue cross-BSJ peptide segment whose junction point lies between the 5th and 6th residues. The original fragmentation site r ranges from 1 to 6. For r = 1, 2, 3, and 4, the corresponding complementary y-ion sequences all contain residues on both sides of the junction point, so these ions are BSJ-specific. For r = 6, the corresponding b-ion sequence contains residues on both sides of the junction point, so this ion is BSJ-specific. For r = 5, the b5 ion and its complementary y2 ion lie on opposite sides of the junction point and neither spans it, so this site is not marked as a cross-joint fragmentation site. After classification, a total of 5 cross-joint fragmentation sites are identified, corresponding to the original fragmentation sites r = 1, 2, 3, 4, and 6, and they are renumbered as j = 1, 2, 3, 4, and 5 in order from the N-terminus to the C-terminus. Eight precursor-matched spectra are retrieved from database A, and the top 5 by total ion current are retained for step two. Three spectra are retrieved from database B, and all are retained. Six spectra are retrieved from database C, and the top 5 are retained.

This step outputs three results, which are passed to step two: the first is the list of theoretical masses of the linear homologous ion set, used for the subsequent signal decoupling; the second is the list of theoretical b-ion masses and complementary y-ion masses of all cross-joint fragmentation sites in the BSJ-specific ion set; the third is the list of original secondary mass spectrum peak lists from each database that passed the screening, each spectrum being accompanied by the identifier of its source database and its rank within that database.
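The classification and the per-database spectrum selection of step one can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the dictionary keys `precursor_mass` and `tic`, and the conversion of the ppm tolerance into Daltons relative to the target mass are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def classify_sites(length: int, junction: int) -> Tuple[List[int], Dict[int, int]]:
    """Return the cross-joint original sites r and their renumbering r -> j.

    length   : total number of residues L of the candidate peptide
    junction : junction index S (junction lies between residue S and S+1)
    """
    cross = []
    for r in range(1, length):          # original fragmentation sites 1..L-1
        b_specific = r > junction       # b_r covers residues 1..r
        y_specific = r < junction       # complementary y ion covers r+1..L
        if b_specific or y_specific:
            cross.append(r)
    renumber = {r: j for j, r in enumerate(cross, start=1)}
    return cross, renumber


def select_spectra(spectra: List[dict], target_mass: float,
                   tol_ppm: float = 10.0, top_k: int = 5) -> List[dict]:
    """Keep precursor-matched spectra, sorted by total ion current, top K only."""
    window = target_mass * tol_ppm * 1e-6   # assumed ppm -> Da conversion
    matched = [s for s in spectra
               if abs(s["precursor_mass"] - target_mass) <= window]
    matched.sort(key=lambda s: s["tic"], reverse=True)
    return matched[:top_k]


# Worked example from the text: 7 residues, junction between residues 5 and 6.
cross_sites, renumbering = classify_sites(length=7, junction=5)
print(cross_sites, renumbering)
```

For the 7-residue example, site r = 5 is the only original site excluded, because neither b5 nor y2 spans the junction point.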

[0018] Referring to Figure 3, step two: signal decoupling is performed on each original secondary mass spectrum using the linear homologous ion set to obtain a residual sub-spectrum. In the residual sub-spectrum, a first round of detection searches for the theoretical b ion and the theoretical complementary y ion of each cross-joint fragmentation site, and sites at which both the b ion and the complementary y ion are detected are marked as initial double-ended sites. A local mass correction for the spectrum is obtained from the pairing mass deviations of the initial double-ended sites. The theoretical ion masses of the remaining sites are offset by the local mass correction, and a second round of detection is performed. The results of the two rounds are combined to construct the pairing state sequence of the cross-joint fragmentation sites. Specifically, this includes the following. For any spectrum output by step one, namely the $k$-th spectrum from database $d$, denoted $P_{d,k}$, the signal decoupling operation is performed: each theoretical mass value in the linear homologous ion set output by step one is traversed, and the peak list of the spectrum is searched for a peak whose mass deviation from that value does not exceed the fragment ion mass tolerance $\varepsilon_{\mathrm{frag}}$; the intensity of each matched peak is set to zero. The initial value of $\varepsilon_{\mathrm{frag}}$ is 20 ppm, which is determined by the typical fragment ion mass accuracy of a high-resolution mass spectrometer in the secondary (MS2) fragmentation scan mode and is usually 1.5 to 2 times the precursor ion mass tolerance. After the signal decoupling operation is completed, the remaining peaks in the spectrum constitute the residual sub-spectrum, denoted $R_{d,k}$.
The residual sub-spectrum no longer contains interference signals from linear homologous fragment ions and retains only BSJ-specific ion signals and background noise. Next, the first round of pairing detection is performed on the residual sub-spectrum. It should be noted that the theoretical b ions and theoretical complementary y ions used in both the first and second rounds of detection in this step are the singly charged theoretical fragment ions generated in step one. Each cross-joint fragmentation site $j$ output by step one is traversed. For each site, the residual sub-spectrum is searched for a peak whose mass deviation from the theoretical b-ion mass of the site does not exceed the fragment ion mass tolerance, and for a peak whose mass deviation from the theoretical complementary y-ion mass of the site does not exceed the same tolerance. If both peaks are found, the site is marked as an initial double-ended site; if only one peak is found, it is marked as an initial single-ended site; if neither peak is found, it is marked as an initial missing site. For each cross-joint fragmentation site $j$ determined to be an initial double-ended site by the first round of pairing detection, the measured mass of its b ion, $m^{\mathrm{obs}}_{b,j}$, and the measured mass of its complementary y ion, $m^{\mathrm{obs}}_{y,j}$, are recorded; both values are taken from the actual mass-to-charge ratio readings of the matching peaks in the residual sub-spectrum. It should be noted that if several candidate peaks fall within the mass tolerance window of a theoretical ion mass, the candidate with the smallest absolute mass deviation from the theoretical ion mass is preferably selected as the matching peak. When two or more candidates have the same or approximately the same absolute mass deviation, the candidate with the largest signal intensity is selected as the matching peak.
The first-round pairing detection and the second-round post-correction detection in step two, as well as the feature fragment extraction in step four, all follow this rule to ensure the uniqueness and consistency of the matching results.

Next, the pairing mass deviation is calculated. The physical basis of this calculation is the conservation of mass in mass spectrometric fragmentation: at any cross-joint fragmentation site, the sum of the masses of the b ion and its complementary y ion is theoretically equal to the precursor ion mass plus one proton mass. Therefore, for each cross-joint fragmentation site $j$ identified as an initial double-ended site, the difference between the measured pair sum and the theoretical value directly reflects the instrument measurement offset of the spectrum in that local mass range. For each such site, the measured mass of the b ion and the measured mass of the complementary y ion are added, and the sum of the precursor ion mass and the proton mass is subtracted; the resulting difference is the pairing mass deviation of the initial double-ended site, which quantifies the instrument offset at a single paired site. The pairing mass deviation is calculated as:

$\Delta_j = m^{\mathrm{obs}}_{b,j} + m^{\mathrm{obs}}_{y,j} - \left(M_{\mathrm{pre}} + m_p\right)$;

where $m^{\mathrm{obs}}_{b,j}$ is the measured mass of the b-ion matching peak at the $j$-th cross-joint fragmentation site, $m^{\mathrm{obs}}_{y,j}$ is the measured mass of the complementary y-ion matching peak at that site, $M_{\mathrm{pre}}$ is the precursor ion mass of the spectrum, and $m_p$ is the exact mass of the proton.

Furthermore, a robust statistical method is used to summarize the pairing mass deviations of all initial double-ended sites in the same spectrum into the local mass correction. First, outliers that deviate from the mean by more than three standard deviations are removed. Then, when the number $N_{\mathrm{pair}}$ of cross-joint fragmentation sites determined to be initial double-ended sites is not less than the preset minimum number of paired sites $N_{\min}$, the median of the pairing mass deviations of all initial double-ended sites is taken as the local mass correction of the spectrum. The initial value of $N_{\min}$ is 1; in some embodiments it can be set to 2, so that the local mass correction is determined by no fewer than two initial double-ended sites, which improves the stability of the local offset estimate. The local mass correction of the spectrum is calculated as:

$\delta = \operatorname{median}\{\Delta_j \mid j \in \mathcal{P}\}$;

where $\operatorname{median}$ denotes the median operation, $\mathcal{P}$ is the set of cross-joint fragmentation sites identified as initial double-ended sites in the current spectrum, and $N_{\mathrm{pair}} = |\mathcal{P}|$ is their total number. The median is used instead of the mean because it resists interference from a single abnormal ion pairing, which avoids the correction being skewed by an extreme value when a noise peak is mismatched. When $N_{\mathrm{pair}}$ is less than $N_{\min}$, the high-confidence pairing information in the spectrum is insufficient to support a reliable correction. In this case the local mass correction of the spectrum is set to 0, that is, no correction is applied and the center of the tolerance window in the subsequent second round of detection does not shift.

The second round of post-correction detection is then performed. Only the cross-joint fragmentation sites marked as initial single-ended sites or initial missing sites in the first round are searched again. Because the local mass correction of the spectrum is the sum of the deviations at the b-ion end and the complementary y-ion end, and because the theoretical b ions and theoretical complementary y ions generated in step one of this embodiment are singly charged theoretical ions, the local mass correction is split equally between the b-ion end and the complementary y-ion end. For each cross-joint fragmentation site to be re-examined, half of the local mass correction, $\delta/2$, is added to both the theoretical b-ion mass and the theoretical complementary y-ion mass to form the corrected theoretical mass values specific to that spectrum. A matching peak is then searched for again in the residual sub-spectrum, using a window centered on the corrected theoretical mass value with the fragment ion mass tolerance as its half-width. The pairing status of the cross-joint fragmentation site is updated according to the search result: if both complementary ions are detected, the state value is 2 (double-ended detection); if only one is detected, it is 1 (single-ended detection); if neither is detected, it is 0 (double-ended absence). Sites already identified as initial double-ended sites in the first round are directly assigned the state value 2 and do not take part in the second round. Finally, the state values of all cross-joint fragmentation sites of the spectrum are arranged in renumbering order to form the pairing state sequence of the cross-joint fragmentation sites, $(q_1, q_2, \dots, q_J)$, where $J$ is the total number of cross-joint fragmentation sites and each element $q_j$ takes the value 0, 1, or 2.

For example, continuing the example of step one with $J = 5$ cross-joint fragmentation sites, consider the first retained spectrum of one database. The first round of detection gives the following results: cross-joint fragmentation sites 1 and 2 are both double-ended, with pairing mass deviations $\Delta_1$ and $\Delta_2$ expressed in ppm; only the b ion is detected at site 3; sites 4 and 5 are both missing. The number of initial double-ended sites is $N_{\mathrm{pair}} = 2$, which satisfies $N_{\mathrm{pair}} \geq N_{\min}$, and the median of the pairing mass deviations gives the local mass correction $\delta$. In the second round, sites 3, 4, and 5 are re-examined. After correction of the complementary y ion at site 3, the shift of the theoretical value center brings a peak that originally fell just outside the window edge back into the window, and site 3 is upgraded to double-ended. At site 4, the y ion is detected after correction. At site 5, no signal is detected. The final state sequence of the spectrum is $(2, 2, 2, 1, 0)$, output together with the correction $\delta$.

This step outputs three results, which are passed to step three: the first is the residual sub-spectrum of each spectrum after the homologous signals have been erased; the second is the local mass correction of each spectrum; the third is the final site pairing state sequence of each spectrum after the two rounds of detection.
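The flow of step two can be sketched as below. This is a simplified illustration under stated assumptions rather than the patented implementation: peaks are `(m/z, intensity)` tuples, the precursor ion mass is taken as the singly protonated mass, the correction is kept in Daltons, the three-standard-deviation outlier removal is omitted for brevity, the "approximately equal deviation" tie-break is reduced to exact ordering, and the results of the two rounds are combined by a logical OR per ion. All function names and the numerical data are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple

Peak = Tuple[float, float]      # (m/z, intensity)
PROTON_MASS = 1.007276          # Da


def within(mz: float, mass: float, tol_ppm: float) -> bool:
    return abs(mz - mass) <= mass * tol_ppm * 1e-6


def decouple(peaks: List[Peak], linear_masses: List[float],
             tol_ppm: float = 20.0) -> List[Peak]:
    """Zero the intensity of every peak matching a linear homologous mass."""
    return [(mz, 0.0 if any(within(mz, m, tol_ppm) for m in linear_masses) else it)
            for mz, it in peaks]


def find_peak(peaks: List[Peak], mass: float, tol_ppm: float) -> Optional[Peak]:
    """Closest non-zero peak in the window; larger intensity wins exact ties."""
    hits = [p for p in peaks if p[1] > 0 and within(p[0], mass, tol_ppm)]
    return min(hits, key=lambda p: (abs(p[0] - mass), -p[1])) if hits else None


def two_round_detection(residual: List[Peak], sites: List[Tuple[float, float]],
                        precursor_mass: float, tol_ppm: float = 20.0,
                        n_min: int = 1) -> Tuple[List[int], float]:
    """sites: (theoretical b mass, theoretical y mass) in renumbering order."""
    found, deviations = [], []
    for b_mass, y_mass in sites:                       # first round
        b = find_peak(residual, b_mass, tol_ppm)
        y = find_peak(residual, y_mass, tol_ppm)
        found.append([b is not None, y is not None])
        if b is not None and y is not None:
            deviations.append(b[0] + y[0] - (precursor_mass + PROTON_MASS))
    enough = len(deviations) >= max(n_min, 1)
    correction = median(deviations) if enough else 0.0
    for flags, (b_mass, y_mass) in zip(found, sites):  # second round
        if all(flags):
            continue                                   # already double-ended
        half = correction / 2
        flags[0] = flags[0] or find_peak(residual, b_mass + half, tol_ppm) is not None
        flags[1] = flags[1] or find_peak(residual, y_mass + half, tol_ppm) is not None
    return [int(b) + int(y) for b, y in found], correction


# Hypothetical data: three cross-joint sites, one linear homologous peak at 500.
precursor = 1000.0
sites = [(200.0, 801.007276), (300.0, 701.007276), (400.0, 601.007276)]
raw = [(200.003, 50.0), (801.020276, 40.0), (300.005, 30.0),
       (701.025276, 20.0), (500.002, 90.0)]
residual = decouple(raw, linear_masses=[500.0])
states, correction = two_round_detection(residual, sites, precursor)
print(states, round(correction, 6))
```

In this hypothetical data the y ion of the second site falls outside the uncorrected 20 ppm window and is recovered only after the window center is shifted by half of the local mass correction, which mirrors the upgrade of site 3 in the worked example.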

[0019] Referring to Figure 4, step three: continuity determination is performed on the pairing state sequence of the cross-joint fragmentation sites, and the determination results of the databases are combined into a verification vector. Verification is considered successful when the number of databases that pass the determination is not less than a preset multi-database co-occurrence threshold. Specifically, this includes the following. For the $k$-th spectrum from database $d$, the pairing state sequence of cross-joint fragmentation sites $(q_1, q_2, \dots, q_J)$ output by step two and the corresponding correction are read. Each element $q_j$ of the sequence takes the value 0, 1, or 2, corresponding to the three final post-correction detection states of the cross-joint fragmentation site: double-ended absence, single-ended detection, and double-ended detection, respectively. At the single-database level, the following continuity determination is performed on each spectrum. This embodiment provides two mutually exclusive continuity determination channels, which are evaluated in descending order of priority. The first continuity determination rule is called the double-ended channel determination rule: the pairing state sequence of the cross-joint fragmentation sites is checked for at least one pair of adjacent sites $j$ and $j+1$, where $1 \leq j \leq J-1$, such that $q_j = 2$ and $q_{j+1} = 2$ hold simultaneously. The double-ended channel determination condition is expressed as: $\exists\, j \in \{1, \dots, J-1\}: q_j = 2 \wedge q_{j+1} = 2$; where $q_j$ and $q_{j+1}$ are the pairing state values of the $j$-th and $(j+1)$-th cross-joint fragmentation sites in the pairing state sequence. If the double-ended channel determination condition is met, the spectrum is directly determined to pass, its single-database determination status is recorded as passed, and the subsequent rule is not evaluated.
The physical basis is that when two adjacent cross-joint fragmentation sites are both detected at both ends, fragmentation signals were generated independently from the N-terminal and C-terminal directions within this local sequence interval, and the signals in the two directions exist simultaneously at adjacent sites, which constitutes a bidirectional complementary verification of the sequence spanning the junction point. Adjacency ensures the spatial continuity of the fragmentation chain, and double-ended detection ensures that the signal at each cross-joint fragmentation site is independently confirmed in both directions. The second continuity determination rule is called the single-ended channel determination rule. If the double-ended channel determination condition is not met, the pairing state sequence of the cross-joint fragmentation sites is further checked for at least one continuous subsequence of sites whose length is not less than the single-ended continuity window length $W$ and in which the state value of every site is not less than 1. The initial value of $W$ is 3, which is greater than the 2 sites required by the double-ended channel rule and reflects the design principle of compensating for weaker single-ended evidence with a longer continuous chain. The selection criterion is as follows: under a random noise model, the probability $p$ that a noise peak is mismatched at a single site has a typical value of approximately 0.05 to 0.10, so the probability of three consecutive mismatches is on the order of $p^3$, that is, between $1.25 \times 10^{-4}$ and $1 \times 10^{-3}$, which meets the requirement of controlling the false positive rate at the level of one in a thousand. The single-ended channel determination condition is expressed as: $\exists\, j \in \{1, \dots, J-W+1\}: q_i \geq 1 \ \text{for all}\ i \in \{j, \dots, j+W-1\}$; where $q_i$ is the pairing state value of the $i$-th cross-joint fragmentation site in the pairing state sequence and $J$ is the total number of cross-joint fragmentation sites. If the single-ended channel determination condition is met, the spectrum is determined to pass and its single-database determination status is recorded as passed; if neither the double-ended nor the single-ended channel determination condition is met, the spectrum is determined to fail and its single-database determination status is recorded as failed. After the single-database level determination is completed, the spectra of the same database $d$ that pass the determination are tallied. If database $d$ contains at least one passing spectrum, the validation conclusion of this database for the BSJ peptide is recorded as $V_d = 1$; otherwise $V_d = 0$. At the multi-database level, the validation conclusions of all databases are combined into a verification vector, which for the three databases of this embodiment is $(V_A, V_B, V_C)$. When the number of elements with value 1 in the vector is not less than the preset multi-database co-occurrence threshold $T$, the final verification result for the cross-BSJ peptide segment is determined to be passed. The initial value of $T$ is 2, which avoids forming a final conclusion from an accidental match in a single database. When at least two independent data sources reproduce the same cross-joint fragmentation pattern, the robustness of the final verification conclusion is improved without significantly reducing recall.
That is, the cross-joint fragmentation pattern of the BSJ peptide is confirmed to be reproducible in multiple independent data sources only when the corrected pairing state sequences of spectra from at least two independent databases pass the above continuity determination. The following example shows the direct impact of the correction process in step two on the conclusion of this step. Continuing the example of step two, the first spectrum has the corrected state sequence $(2, 2, 2, 1, 0)$: cross-joint fragmentation sites 1 and 2 are adjacent and both have state value 2, so the double-ended channel determination condition is met and the spectrum passes. For another spectrum, assume that in the first-round result site 2 has state value 1 and that correction upgrades it from 1 to 2; the double-ended channel determination condition is then met at cross-joint fragmentation sites 1 and 2, and the spectrum passes. Without correction, the state sequence contains no adjacent double-ended pair and the double-ended channel determination condition is not met; the segment from cross-joint fragmentation site 1 to site 4 nevertheless forms a continuous run that is not shorter than the single-ended continuity window length, so the single-ended channel determination condition is met and the spectrum still passes, but through the single-ended channel. It can be seen that the correction process in step two changes which determination channel is triggered in step three and therefore affects the confidence level of the verification. Two databases each reach the verification conclusion 1, so the number of elements with value 1 in the verification vector is 2, which is not less than the multi-database co-occurrence threshold of 2, and the BSJ peptide is finally determined to pass verification.
This step outputs two results: the first is the final verification conclusion for the BSJ peptide, namely whether it passes or fails; the second is the spectrum numbers of the passing spectra in each database, together with their corresponding corrections and state sequences, which are passed to step four for feature source spectrum selection.
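The two-channel continuity determination and the multi-database vote of step three can be sketched as follows. The function names, the string labels for the channels, and the second and third state sequences in the example are hypothetical; only the sequence `[2, 2, 2, 1, 0]` is taken from the worked example.

```python
from typing import Dict, List, Tuple


def judge_spectrum(states: List[int], window: int = 3) -> str:
    """Return 'double', 'single' or 'fail' for one pairing state sequence."""
    # Double-ended channel: any two adjacent sites both in state 2.
    for left, right in zip(states, states[1:]):
        if left == 2 and right == 2:
            return "double"
    # Single-ended channel: a run of at least `window` sites with state >= 1.
    run = 0
    for s in states:
        run = run + 1 if s >= 1 else 0
        if run >= window:
            return "single"
    return "fail"


def verify_peptide(per_db: Dict[str, List[List[int]]],
                   co_threshold: int = 2,
                   window: int = 3) -> Tuple[bool, Dict[str, int]]:
    """Combine per-database conclusions into a verification vector."""
    vector = {
        db: int(any(judge_spectrum(seq, window) != "fail" for seq in seqs))
        for db, seqs in per_db.items()
    }
    return sum(vector.values()) >= co_threshold, vector


passed, vector = verify_peptide({
    "A": [[2, 2, 2, 1, 0]],   # sequence from the worked example
    "B": [[2, 1, 1, 1, 0]],   # hypothetical: passes via the single-ended channel
    "C": [[0, 0, 1, 0, 0]],   # hypothetical: fails both channels
})
print(passed, vector)
```

Evaluating the double-ended rule first and returning immediately reproduces the priority order described above, so the single-ended rule is only reached when no adjacent double-ended pair exists.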

[0020] Referring to Figure 5, step four: for peptide segments that pass verification, cross-joint-specific fragment ion features are extracted from the passing spectra and written into the local targeted fragment feature library. Specifically, this includes the following. For the BSJ peptides that passed verification in step three, all spectra from all databases whose single-database determination status is passed are collected. For each passing spectrum, the signal-to-noise ratio (SNR) index of its residual sub-spectrum is calculated as follows: in the residual sub-spectrum, the peaks that were successfully matched to cross-joint fragmentation sites during the first and second rounds of detection in step two are collectively called signal peaks, and the remaining unmatched peaks are collectively called noise peaks; the SNR index is defined as the sum of the intensities of all signal peaks divided by the sum of the intensities of all noise peaks. When no noise peaks are present, the SNR index is assigned the sum of the peak intensities of the spectrum. All passing spectra are sorted in descending order of SNR index, and the spectrum with the highest SNR index is selected as the feature source spectrum; its source database, its local mass correction $\delta$, and its pairing state sequence of cross-joint fragmentation sites $(q_1, q_2, \dots, q_J)$ are recorded with it.

Subsequently, the characteristic information of cross-joint-specific fragment ions is extracted from the residual sub-spectrum of the feature source spectrum. Each cross-joint fragmentation site is traversed, and for sites whose state value is not less than 1, the measured mass and measured intensity of the detected fragment ions are extracted: if the b ion is detected at the site, its measured mass and measured intensity are recorded; if the complementary y ion is detected at the site, its measured mass and measured intensity are recorded. The intensity values of all detected fragment ions are then normalized. In this embodiment, the normalization can be based on the maximum intensity among all extracted fragment ions: the intensity of each ion is divided by the maximum intensity, multiplied by 1000, and rounded to the nearest integer to obtain the relative intensity value. The normalized feature information is assembled into a targeted fragment feature entry whose data structure includes the following fields: the BSJ peptide sequence; the precursor ion mass; the total number of cross-joint fragmentation sites; the list of characteristic fragment ions, which records for each ion an ion type identifier (b or y), a site number, the corrected theoretical mass (the original theoretical mass plus half of the local mass correction, consistent with step two), and the relative intensity value; the source database number of the feature source spectrum; and the spectrum-level correction. This entry is written into the local targeted fragment feature library.

The local targeted fragment feature library is stored in an indexed structure with the precursor ion mass as the primary index key. All entries are arranged in ascending order of precursor ion mass, which supports binary search for entries within the precursor ion mass tolerance of a given precursor ion mass; the fragment ion list of each entry is sorted in ascending order of corrected theoretical mass, which supports one-by-one matching within the fragment ion mass tolerance of a given fragment ion mass. The feature library is used in subsequent verification tasks as follows: when a new public database is added, or when a new batch of experimental mass spectrometry data is used to validate the same batch of BSJ peptides, precursor-matched entries are looked up directly in the local feature library.
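A sorted feature library with binary search on the precursor ion mass can be sketched with Python's standard `bisect` module. The entry layout (a dictionary with a `precursor_mass` key), the function name, and the ppm-to-Dalton conversion relative to the query mass are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right
from typing import List


def lookup_entries(entries: List[dict], query_mass: float,
                   tol_ppm: float = 10.0) -> List[dict]:
    """Return all entries whose precursor mass lies within the tolerance.

    `entries` must already be sorted in ascending order of 'precursor_mass'.
    """
    keys = [e["precursor_mass"] for e in entries]
    window = query_mass * tol_ppm * 1e-6          # assumed ppm -> Da conversion
    lo = bisect_left(keys, query_mass - window)   # first key >= lower bound
    hi = bisect_right(keys, query_mass + window)  # first key > upper bound
    return entries[lo:hi]


# Hypothetical library content, sorted by precursor ion mass.
library = [
    {"precursor_mass": 900.0, "peptide": "PEPTIDE_1"},
    {"precursor_mass": 1000.004, "peptide": "PEPTIDE_2"},
    {"precursor_mass": 1000.02, "peptide": "PEPTIDE_3"},
    {"precursor_mass": 1100.0, "peptide": "PEPTIDE_4"},
]
print([e["peptide"] for e in lookup_entries(library, 1000.0)])
```

In a long-lived library the key list would normally be built once and kept alongside the entries instead of being rebuilt on every lookup; it is rebuilt here only to keep the sketch self-contained.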
The peak list of the new spectrum is then matched against the characteristic fragment ion list of the entry, and the number of matched fragment ions and the cosine similarity of their relative intensities are calculated. Assuming that the feature entry contains $n$ fragment ions, of which $m$ are matched within the tolerance by peaks of the new spectrum, the relative intensities of these $m$ ions in the feature entry form the vector $\mathbf{u}$, and the normalized intensities of the corresponding matching peaks in the new spectrum form the vector $\mathbf{v}$. The cosine similarity is the inner product of the two vectors divided by the product of their magnitudes: $\cos(\mathbf{u}, \mathbf{v}) = \dfrac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}$; where $\mathbf{u}$ is the relative intensity vector of the fragment ions in the feature entry and $\mathbf{v}$ is the normalized intensity vector of the corresponding matching peaks in the new spectrum. When the cosine similarity is not lower than the preset spectrum similarity threshold, the new spectrum is considered a reproduction of the characteristic pattern of the BSJ peptide. The preset spectrum similarity threshold measures the consistency between the new spectrum and the local feature library entry in terms of fragment ion intensity distribution. Its initial value is set to 0.7, an initial decision threshold that balances tolerance of instrument fluctuation against discrimination of the characteristic pattern. In some embodiments, this threshold can also be calibrated from the cosine similarity distributions of historically verified BSJ peptide spectra and non-target spectra, and adjusted for different instrument platforms or acquisition batches.
It should be emphasized that, to avoid a high cosine similarity arising from the accidental matching of only a few characteristic ions, the number of characteristic fragment ions matched between the new spectrum and the targeted fragment feature entry must be not less than the preset minimum number of matching ions before the new spectrum can be determined to be a valid reproduction of the BSJ peptide characteristic pattern. In this embodiment, the preset minimum number of matching ions can be set to 3. When the number of matched characteristic fragment ions is not less than the preset minimum number of matching ions and the cosine similarity is not lower than the preset spectrum similarity threshold, the new spectrum is considered a valid reproduction of the BSJ peptide characteristic pattern. Furthermore, the prerequisite for using the targeted matching mode is that the local feature library already contains an entry for the BSJ peptide that has passed the full verification process of steps one to three. For a BSJ peptide that appears for the first time and has no entry in the feature library, the complete process of steps one to three must still be performed; after successful verification, the feature entry is written back to the feature library, forming a closed-loop accumulation. Continuing the previous example, the BSJ peptide passed verification, and a total of 3 spectra passed the determination: the first and second spectra of one database and the first spectrum of another. After the SNR index is calculated, the first spectrum of one of these databases ranks highest and is selected as the feature source spectrum. Its state sequence is $(2, 2, 2, 1, 0)$, so a total of 7 fragment ions were detected at cross-joint fragmentation sites 1 to 4; the measured masses and normalized relative intensities of these 7 ions constitute the characteristic fragment ion list, which is written into the feature library together with the local mass correction of the spectrum in ppm. If a new database is added later, targeted matching is performed by directly retrieving the entry for this BSJ peptide from the feature library.
Five characteristic fragment ions are matched in a new spectrum with a cosine similarity of 0.82, which exceeds the preset spectrum similarity threshold, so the spectrum is considered a valid reproduction. This step outputs two results: the first is the targeted fragment feature library entry written to or updated in local storage; the second is the targeted matching procedure itself, which serves as an accelerated retrieval channel when new databases are subsequently accessed and replaces the full sequence search process.
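The SNR index, the 0 to 1000 intensity normalization, and the targeted matching test of step four can be sketched as follows. This is an illustrative sketch with hypothetical function names and data; the new-spectrum intensities are used without further normalization because cosine similarity is unaffected by a common scale factor, and Python's built-in `round` is used for the rounding step.

```python
import math
from typing import List, Sequence, Tuple


def snr_index(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Sum of signal intensities over sum of noise intensities."""
    total_noise = sum(noise)
    if total_noise == 0:
        # No noise peaks: fall back to the summed peak intensity.
        return float(sum(signal))
    return sum(signal) / total_noise


def relative_intensities(intensities: Sequence[float]) -> List[int]:
    """Scale to the strongest ion = 1000 and round to integers."""
    top = max(intensities)
    return [round(x / top * 1000) for x in intensities]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_valid_reproduction(entry_ions: List[Tuple[float, float]],
                          new_peaks: List[Tuple[float, float]],
                          tol_ppm: float = 20.0,
                          sim_threshold: float = 0.7,
                          min_matches: int = 3) -> Tuple[bool, int, float]:
    """entry_ions: (corrected theoretical mass, relative intensity) pairs."""
    u, v = [], []
    for mass, rel in entry_ions:
        window = mass * tol_ppm * 1e-6
        hits = [p for p in new_peaks if abs(p[0] - mass) <= window]
        if hits:
            best = min(hits, key=lambda p: (abs(p[0] - mass), -p[1]))
            u.append(rel)
            v.append(best[1])
    if len(u) < min_matches:          # too few ions to trust the similarity
        return False, len(u), 0.0
    sim = cosine_similarity(u, v)
    return sim >= sim_threshold, len(u), sim


# Hypothetical feature entry and new spectrum.
entry = [(200.0, 1000), (300.0, 500), (400.0, 250)]
new_spectrum = [(200.001, 80.0), (300.001, 40.0), (400.002, 20.0), (555.5, 10.0)]
print(is_valid_reproduction(entry, new_spectrum))
```

Checking the minimum number of matched ions before computing the similarity reflects the safeguard described above: a spectrum that matches only one or two characteristic ions is rejected regardless of how similar their intensities look.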

[0021] The above formulas are all calculated on dimensionless numerical values. Each formula was obtained by software simulation on a large amount of collected data so as to approximate the real situation as closely as possible, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.

[0022] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, in the form of a computer program product.

[0023] Those skilled in the art will recognize that the modules and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

[0024] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.

[0025] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0026] In conclusion, the above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A circRNA cross-BSJ peptide segment verification method fusing multiple public databases, characterized in that it comprises the following steps: obtaining a candidate cross-BSJ peptide segment sequence and its precursor ion mass, calculating theoretical fragment ion masses according to standard peptide fragmentation theory, and dividing the theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set; extracting original secondary mass spectra matching the precursor ion mass from multiple independent public proteomics mass spectrometry databases; performing signal decoupling on each original secondary mass spectrum using the linear homologous ion set to obtain a residual sub-spectrum; in the residual sub-spectrum, performing a first round of detection of the theoretical b ion and the theoretical complementary y ion of each cross-joint fragmentation site, and marking cross-joint fragmentation sites at which both the b ion and the complementary y ion are detected as initial double-ended sites; obtaining a local mass correction from the pairing mass deviations of the initial double-ended sites; offsetting the theoretical ion masses of the remaining sites by the local mass correction and then performing a second round of detection; combining the results of the two rounds of detection to construct a pairing state sequence of cross-joint fragmentation sites; performing continuity determination on the pairing state sequence of cross-joint fragmentation sites, and combining the determination results of the databases into a verification vector; determining that the candidate cross-BSJ peptide segment passes verification when the number of databases that pass the determination is not less than a preset multi-database co-occurrence threshold; and, for a peptide segment that passes verification, extracting cross-joint-specific fragment ion features from the passing spectra and writing them into a local targeted fragment feature library.

2. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 1, characterized in that dividing the theoretical fragment ions into a linear homologous ion set and a BSJ-specific ion set includes: defining a junction position index S, indicating that the junction is located between the S-th residue and the (S+1)-th residue of the peptide sequence, and an original fragmentation site r, indicating a break of the peptide backbone between the r-th residue and the (r+1)-th residue; for any original fragmentation site r, the corresponding theoretical b ion is b_r, which contains the 1st to the r-th residues; when r ≤ S, b_r is classified into the linear homologous ion set, and when r > S, b_r is classified into the BSJ-specific ion set; for the original fragmentation site r, the corresponding complementary theoretical y ion is y_(L-r), which contains the (r+1)-th to the L-th residues; when r ≥ S, the complementary theoretical y ion is classified into the linear homologous ion set, and when r < S, it is classified into the BSJ-specific ion set; where L is the total number of residues of the candidate cross-BSJ peptide; when at least one of the theoretical b ion and the complementary theoretical y ion corresponding to the original fragmentation site r belongs to the BSJ-specific ion set, the original fragmentation site r is marked as a cross-junction fragmentation site.
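The partition rule of claim 2 can be sketched as follows. This is a minimal illustration, assuming 1-based residue indices, backbone sites r = 1..L-1, and the conventional ion labels b_r and y_(L-r); the function name `partition_ions` is hypothetical.

```python
from typing import List, Tuple


def partition_ions(L: int, S: int) -> Tuple[List[str], List[str], List[int]]:
    """Split theoretical b/y ions into linear homologous and BSJ-specific sets
    and list the cross-junction fragmentation sites (sketch of claim 2)."""
    linear: List[str] = []
    specific: List[str] = []
    cross_sites: List[int] = []
    for r in range(1, L):              # backbone break between residue r and r+1
        b_is_specific = r > S          # b ion rule: r <= S linear, r > S specific
        y_is_specific = r < S          # y ion rule: r >= S linear, r < S specific
        (specific if b_is_specific else linear).append(f"b{r}")
        (specific if y_is_specific else linear).append(f"y{L - r}")
        if b_is_specific or y_is_specific:
            cross_sites.append(r)
    return linear, specific, cross_sites


linear, specific, cross_sites = partition_ions(L=6, S=3)
print(linear)       # ions assigned to the linear homologous set
print(specific)     # ions assigned to the BSJ-specific set
print(cross_sites)  # sites marked as cross-junction fragmentation sites
```

Under the rule as stated, the site r = S has neither ion in the BSJ-specific set, so it is not marked as a cross-junction fragmentation site.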

3. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 2, characterized in that the number of independent public proteomics mass spectrometry databases is at least three; for each public proteomics mass spectrometry database, those secondary mass spectra are selected, from among all of its secondary mass spectra, for which the absolute value of the difference between the spectrum's precursor ion mass and the precursor ion mass of the candidate cross-BSJ peptide does not exceed the precursor ion mass tolerance; the spectra that pass this matching screening are sorted in descending order of total ion current intensity, and the top-ranked spectra are retained, up to the maximum number of spectra to be retained for a single database.
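A minimal sketch of the per-database screening in claim 3 follows. The `Spectrum` class, the use of an absolute tolerance in Da, and the total ion current computed as the sum of peak intensities are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    precursor_mass: float
    peaks: List[Tuple[float, float]]  # (mass, intensity) pairs

    @property
    def tic(self) -> float:
        # Total ion current, taken here as the sum of peak intensities.
        return sum(intensity for _, intensity in self.peaks)


def select_spectra(spectra: List[Spectrum], candidate_mass: float,
                   precursor_tol: float, max_keep: int) -> List[Spectrum]:
    """Keep spectra within the precursor tolerance, sorted by descending TIC,
    truncated to the per-database maximum (sketch of claim 3)."""
    matched = [s for s in spectra
               if abs(s.precursor_mass - candidate_mass) <= precursor_tol]
    matched.sort(key=lambda s: s.tic, reverse=True)
    return matched[:max_keep]
```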

4. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 1, characterized in that performing signal decoupling on each original secondary mass spectrum using the linear homologous ion set to obtain the residual sub-spectrum includes: traversing each theoretical mass value in the linear homologous ion set, searching the peak list of the spectrum for a peak whose mass deviation from that theoretical mass value does not exceed the fragment ion mass tolerance, and setting the intensity of the matched peak to zero; after the traversal is complete, the remaining peaks in the spectrum constitute the residual sub-spectrum.
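The decoupling step of claim 4 could look like the sketch below. Two points are assumptions, since the claim does not settle them: every peak inside a tolerance window is zeroed (not just one), and zeroed peaks are dropped from the returned residual list.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (mass, intensity)


def decouple(peaks: List[Peak], linear_masses: List[float],
             frag_tol: float) -> List[Peak]:
    """Zero the peaks explained by linear homologous ions and return the
    remaining peaks as the residual sub-spectrum (sketch of claim 4)."""
    working = [[mass, intensity] for mass, intensity in peaks]
    for theoretical in linear_masses:
        for peak in working:
            if abs(peak[0] - theoretical) <= frag_tol:
                peak[1] = 0.0  # matched peak: intensity set to zero
    return [(mass, intensity) for mass, intensity in working if intensity > 0.0]
```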

5. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 4, characterized in that performing the first round of detection on each cross-junction fragmentation site in the residual sub-spectrum includes: traversing all cross-junction fragmentation sites, and for each cross-junction fragmentation site, searching the residual sub-spectrum for a peak whose mass deviation from the theoretical b-ion mass of the site does not exceed the fragment ion mass tolerance, and for a peak whose mass deviation from the theoretical complementary y-ion mass of the site does not exceed the fragment ion mass tolerance; when both peaks are found, the site is marked as an initial paired site; when only one peak is found, the site is marked as an initial single-end site; when neither is found, the site is marked as an initial missing site; when multiple candidate peaks exist within the same mass tolerance window, the candidate peak with the smallest absolute mass deviation from the theoretical ion mass is selected as the matching peak; when multiple candidate peaks have the same or approximately the same absolute mass deviation from the theoretical ion mass, the candidate peak with the highest intensity is selected as the matching peak.
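The first-round detection and the peak tie-breaking rule of claim 5 can be sketched as below. The parameter `tie_eps`, which decides when two deviations count as "approximately the same", is an assumption, as are the function names and the dictionary layout of `sites`.

```python
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (mass, intensity)


def best_peak(peaks: List[Peak], target: float, tol: float,
              tie_eps: float = 1e-6) -> Optional[Peak]:
    """Pick the peak closest to the target mass within tolerance; among
    near-ties (within tie_eps, an assumed value) pick the most intense."""
    candidates = [p for p in peaks if abs(p[0] - target) <= tol]
    if not candidates:
        return None
    min_dev = min(abs(mass - target) for mass, _ in candidates)
    near_ties = [p for p in candidates if abs(p[0] - target) - min_dev <= tie_eps]
    return max(near_ties, key=lambda p: p[1])


def first_round(peaks: List[Peak], sites: Dict[int, Tuple[float, float]],
                tol: float) -> Dict[int, int]:
    """Map each site r to 2 (initial paired), 1 (initial single-end) or
    0 (initial missing); sites maps r to (theoretical b mass, theoretical y mass)."""
    states: Dict[int, int] = {}
    for r, (b_mass, y_mass) in sites.items():
        found_b = best_peak(peaks, b_mass, tol) is not None
        found_y = best_peak(peaks, y_mass, tol) is not None
        states[r] = int(found_b) + int(found_y)
    return states
```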

6. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 1, characterized in that obtaining the local mass correction of the spectrum from the pairing mass deviations of the initial paired sites includes: calculating the pairing mass deviation of every initial paired site based on the mass conservation relationship, and performing a robust statistical summarization of the pairing mass deviations of all initial paired sites, wherein the robust statistical summarization is the median of the pairing mass deviations of all initial paired sites; when the number of initial paired sites is not less than a preset minimum paired-site threshold, the median is taken as the local mass correction of the spectrum; when the number of initial paired sites is less than the preset minimum paired-site threshold, the local mass correction of the spectrum is set to zero.
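A minimal sketch of the median-based correction in claim 6 follows. It assumes that the pairing mass deviation of a site is the sum of the b-end and y-end deviations (observed minus theoretical), which is one reading of the mass conservation relationship consistent with claim 7; the tuple layout is hypothetical.

```python
from statistics import median
from typing import List, Tuple

# (observed b mass, theoretical b mass, observed y mass, theoretical y mass)
PairedSite = Tuple[float, float, float, float]


def local_mass_correction(paired: List[PairedSite], min_pairs: int) -> float:
    """Median of the pairing mass deviations, or zero when too few
    initial paired sites are available (sketch of claim 6)."""
    if len(paired) < min_pairs:
        return 0.0
    deviations = [(ob - tb) + (oy - ty) for ob, tb, oy, ty in paired]
    return median(deviations)
```

The median is used so that a single mismatched pair does not drag the correction, and the fallback to zero avoids estimating an offset from too little evidence.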

7. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 6, characterized in that performing the second round of detection after shifting the theoretical ion masses of the remaining sites by the local mass correction of the spectrum includes: searching again for the cross-junction fragmentation sites marked as initial single-end sites or initial missing sites in the first round of detection; because the local mass correction of the spectrum is the sum of the deviations at the b-ion end and the complementary y-ion end, it is distributed equally between the two ends; for each cross-junction fragmentation site to be re-examined, half of the local mass correction of the spectrum is added to its theoretical b-ion mass and to its theoretical complementary y-ion mass to form corrected theoretical mass values; with each corrected theoretical mass value as the center and the fragment ion mass tolerance as the half-width, matching peaks are searched for again in the residual sub-spectrum, and the pairing state value of the cross-junction fragmentation site is updated according to the search results; the pairing state values 2, 1, and 0 correspond to both ends detected, a single end detected, and both ends missing, respectively.
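The second-round search of claim 7 can be illustrated as below. How the two rounds are combined is not fully specified in the claim; this sketch assumes the updated state is the larger of the first-round and second-round values, and the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (mass, intensity)


def second_round(peaks: List[Peak], sites: Dict[int, Tuple[float, float]],
                 states: Dict[int, int], delta: float, tol: float) -> Dict[int, int]:
    """Re-search single-end (1) and missing (0) sites after adding half of the
    local mass correction delta to each theoretical mass (sketch of claim 7)."""
    updated = dict(states)
    for r, (b_mass, y_mass) in sites.items():
        if states[r] == 2:
            continue  # already paired in the first round
        corrected = (b_mass + delta / 2.0, y_mass + delta / 2.0)
        hits = sum(
            any(abs(mass - target) <= tol for mass, _ in peaks)
            for target in corrected
        )
        # Assumption: keep the better of the two rounds.
        updated[r] = max(states[r], hits)
    return updated
```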

8. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 1, characterized in that performing the continuity determination on the pairing state sequence of the cross-junction fragmentation sites and combining the determination results from the databases into a verification vector includes: the continuity determination comprises a dual-end channel determination rule and a single-end channel determination rule; after the dual-end channel determination rule and the single-end channel determination rule have been applied to all spectra in the same database, if at least one spectrum in the database has a single-database determination status of "pass", the database's verification conclusion for the candidate cross-BSJ peptide is "pass"; otherwise, the database's verification conclusion is "fail"; the verification conclusions of all databases are combined into the verification vector.

9. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 8, characterized in that the determination condition of the dual-end channel determination rule is: there exists at least one position index j in the pairing state sequence of the cross-junction fragmentation sites such that the pairing state values of the j-th and the (j+1)-th cross-junction fragmentation sites are both 2; the determination condition of the single-end channel determination rule is: there exists at least one starting index j in the pairing state sequence of the cross-junction fragmentation sites such that the pairing state value of each of the consecutive cross-junction fragmentation sites within a window of a preset single-end continuity window length starting from the j-th cross-junction fragmentation site is not less than 1; where j is the site index in the pairing state sequence of the cross-junction fragmentation sites, and .
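The two continuity rules of claims 8 and 9 can be sketched as follows. The claims do not state how the two rules combine for a single spectrum; this sketch assumes a spectrum passes when either rule is satisfied. All function names are hypothetical.

```python
from typing import List


def dual_end_pass(states: List[int]) -> bool:
    """Dual-end rule: two adjacent sites both have pairing state 2."""
    return any(a == 2 and b == 2 for a, b in zip(states, states[1:]))


def single_end_pass(states: List[int], window: int) -> bool:
    """Single-end rule: some run of `window` consecutive sites has every
    pairing state not less than 1."""
    return any(
        all(s >= 1 for s in states[j:j + window])
        for j in range(len(states) - window + 1)
    )


def database_verdict(per_spectrum_states: List[List[int]], window: int) -> bool:
    """A database passes when at least one of its spectra passes.
    Assumption: a spectrum passes if either rule holds."""
    return any(
        dual_end_pass(states) or single_end_pass(states, window)
        for states in per_spectrum_states
    )
```

When the window is longer than the pairing state sequence, the single-end rule has no valid starting index and returns False.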

10. The method for verifying circRNA cross-BSJ peptides by integrating multiple public databases according to claim 1, characterized in that extracting the cross-junction-specific fragment ion features from the qualified spectra and writing them into the local targeted fragment feature library includes: calculating a signal-to-noise ratio (SNR) index for every qualified spectrum, wherein the SNR index is the sum of the intensities of the signal peaks in the residual sub-spectrum that are matched to cross-junction fragmentation sites divided by the sum of the intensities of the unmatched noise peaks, and selecting the spectrum with the highest SNR index as the feature source spectrum; traversing the cross-junction fragmentation sites in the feature source spectrum whose pairing state value is not less than 1, extracting the measured mass and measured intensity of each detected fragment ion, and normalizing the measured intensities to obtain relative intensities; assembling the BSJ peptide sequence, the precursor ion mass, the total number of cross-junction fragmentation sites, the feature fragment ion list, and the local mass correction of the spectrum into a targeted fragment feature entry; when a new database is subsequently added, performing targeted matching between the peak list of each new spectrum and the feature fragment ion list in the targeted fragment feature entry; when the number of matched feature fragment ions is not less than a preset minimum number of matched ions and the cosine similarity is not less than a preset spectrum similarity threshold, the new spectrum is determined to be an effective reproduction of the BSJ peptide feature pattern.
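The SNR index and the targeted matching of claim 10 can be sketched as below. The claim does not define the vectors used for the cosine similarity; this sketch assumes the reference vector holds the stored relative intensities and the observed vector holds the most intense new-spectrum peak within tolerance of each feature ion (zero when none is found). Function names and data layouts are hypothetical.

```python
import math
from typing import List, Set, Tuple

Peak = Tuple[float, float]  # (mass, intensity)


def snr_index(residual_peaks: List[Peak], matched_masses: Set[float]) -> float:
    """Sum of matched signal intensities divided by the sum of unmatched
    noise intensities in the residual sub-spectrum."""
    signal = sum(i for m, i in residual_peaks if m in matched_masses)
    noise = sum(i for m, i in residual_peaks if m not in matched_masses)
    return signal / noise if noise > 0 else math.inf


def is_reproduction(new_peaks: List[Peak], feature_ions: List[Peak], tol: float,
                    min_matched: int, min_cosine: float) -> bool:
    """Targeted matching of a new spectrum against a feature entry whose
    feature_ions are (measured mass, relative intensity) pairs."""
    reference, observed, matched = [], [], 0
    for mass, rel_intensity in feature_ions:
        hits = [i for m, i in new_peaks if abs(m - mass) <= tol]
        if hits:
            matched += 1
        reference.append(rel_intensity)
        observed.append(max(hits) if hits else 0.0)
    dot = sum(a * b for a, b in zip(reference, observed))
    norm = (math.sqrt(sum(a * a for a in reference))
            * math.sqrt(sum(b * b for b in observed)))
    cosine = dot / norm if norm > 0 else 0.0
    return matched >= min_matched and cosine >= min_cosine
```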