Use of polypeptide fragment as barcode in nucleic acid sequencing
Using peptide fragments as barcodes in nucleic acid sequencing addresses the scarcity of distinctive signal features in conventional nanopore sequencing, yielding more accurate sample differentiation and a simpler signal analysis workflow.
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Applications
- Current Assignee / Owner
- SHENZHEN HUADA GENE INST
- Filing Date
- 2024-12-25
- Publication Date
- 2026-07-02
Description
The use of peptide fragments as barcodes in nucleic acid sequencing
Technical Field
[0001] This invention belongs to the field of biotechnology. Specifically, it relates to the use of polypeptide fragments as barcodes in nucleic acid sequencing and, more specifically, to a method for labeling nucleic acid molecules, a method for preparing a nucleic acid sequencing library, a sequencing library, and a method for distinguishing different nucleic acids to be tested in a sample.
Background Art
[0002] Using barcodes to label and identify specific molecules or sequences is a common technique in molecular biology and genomics. It allows for the tracking and identification of these molecules in experiments by introducing specific marker sequences into the molecules being tested.
[0003] Barcode labeling arose from the demands of high-throughput sequencing. Conventional barcodes consist of natural nucleotide sequences (A, T, G, C), as in the barcode labeling methods used for single-molecule sequencing on nanopore platforms. However, such nucleic acid barcodes must be identified by base calling, which carries a relatively high error rate. Furthermore, the limited length of the barcode sequence restricts the design space. To improve the accuracy of barcode detection and broaden the range of feasible barcode designs, there is an urgent need for a new type of barcode that is identified directly from nanopore sequencing current signals.
Summary of the Invention
[0004] This invention aims to address at least one of the technical problems existing in the prior art. To this end, this invention provides the use of polypeptide fragments as barcodes in nucleic acid sequencing.
[0005] This invention is based on the following discoveries of the inventors:
[0006] In nanopore sequencing, conventional sample differentiation relies on specific tag sequences. However, when used to distinguish different nucleic acids, these tags produce current signals with few distinctive features, so neural networks must subsequently be trained on large amounts of data. Furthermore, because the tag sequence and the nucleic acid being tested are both nucleotide sequences, identification of the tag is easily disturbed by the nucleic acid being tested, which reduces identification accuracy. To overcome this problem, this invention proposes using peptide fragments as barcodes in nucleic acid sequencing. Because peptide fragments affect the current signal in a characteristic way, they enable more accurate sample differentiation (determining the sample origin of a nucleic acid being tested) and molecular differentiation (distinguishing different nucleic acid fragments within the same sample). Compared with conventional current signal analysis based on base sequences, this approach not only enriches the information carried by the signal but also allows samples to be differentiated at an early stage of data processing, streamlining the subsequent analysis workflow. At the same time, the peptide fragments markedly strengthen the features of the current signal, improving the accuracy of sample differentiation and making real-time sequencing analysis possible.
[0007] In a first aspect, the present invention proposes the use of polypeptide fragments as barcodes in nucleic acid sequencing.
[0008] In a second aspect, the present invention provides a method for labeling nucleic acid molecules. According to an embodiment of the invention, the method includes: providing a barcode and a nucleic acid to be tested; and linking the barcode to the nucleic acid to be tested, wherein the barcode includes a polypeptide fragment. Using polypeptides as sample barcodes or molecular barcodes to label nucleic acid molecules, particularly those subsequently used for nanopore sequencing, can significantly enhance sequencing signal features and improve sample differentiation accuracy.
[0009] In a third aspect, the present invention provides a method for preparing a nucleic acid sequencing library. According to an embodiment of the invention, the method includes labeling the nucleic acid to be tested using the method described in the second aspect of the invention to obtain a nucleic acid sequencing library. According to the method of the embodiment of the invention, peptides are used as sample barcodes or molecular barcodes to prepare the nucleic acid sequencing library, which can significantly enhance sequencing signal characteristics and improve sample discrimination accuracy during nanopore sequencing.
[0010] In a fourth aspect, the present invention provides a sequencing library. According to an embodiment of the invention, the library is prepared by the method described in the third aspect of the invention. When the sequencing library passes through a nanopore, the barcode peptide produces a characteristic current signal that is distinct from that of nucleic acid, thereby improving sample differentiation accuracy and simplifying subsequent signal analysis.
[0011] In a fifth aspect, the present invention provides a method for distinguishing different analyte nucleic acids in a sample. According to an embodiment of the present invention, the method includes: preparing a nucleic acid sequencing library using the method described in the third aspect of the present invention; sequencing the nucleic acid sequencing library to obtain a current signal; and distinguishing different analyte nucleic acids in the sample based on differences in the current signals. By introducing barcode peptides into the nucleic acid sequencing library, the method significantly enhances the features of the sequencing signal, improves the accuracy of sample differentiation, simplifies the subsequent signal analysis process, and enables real-time sequencing analysis.
[0012] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Brief Description of the Drawings
[0013] The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:
[0014] Figure 1 is a schematic diagram of the structure of a nucleic acid-peptide fragment-nucleic acid (DPD) linker synthesized by a linking reaction according to an embodiment of the present invention.
[0015] Figure 2 is a schematic diagram of the sequencing library structure obtained by annealing the backbone nucleic acid and the nucleic acid-peptide fragment-nucleic acid, and linking the motor protein-sequencing adapter complex according to an embodiment of the present invention.
[0016] Figure 3 is a schematic diagram of the structure formed when the sequencing library binds to an anchor sequence according to an embodiment of the present invention.
[0017] Figure 4 shows four types of barcode library nanopore sequencing signals used for image extraction model training according to an embodiment of the present invention, wherein (a) corresponds to nucleic acid 5-peptide fragment 1-nucleic acid 1; (b) corresponds to nucleic acid 5-peptide fragment 2-nucleic acid 2; (c) corresponds to nucleic acid 5-peptide fragment 3-nucleic acid 3; and (d) corresponds to nucleic acid 5-peptide fragment 4-nucleic acid 4.
[0018] Figure 5 shows nanopore sequencing signals from four barcode libraries used to verify the discrimination of the nucleic acids according to an embodiment of the present invention, wherein (a) corresponds to nucleic acid 5-peptide fragment 1-nucleic acid 1; (b) corresponds to nucleic acid 5-peptide fragment 2-nucleic acid 2; (c) corresponds to nucleic acid 5-peptide fragment 3-nucleic acid 3; and (d) corresponds to nucleic acid 5-peptide fragment 4-nucleic acid 4.
Detailed Description of the Embodiments
[0019] The embodiments of the present invention are described in detail below. The embodiments described below are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.
[0020] It should be noted that the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. Furthermore, in the description of this invention, unless otherwise stated, "a plurality of" means two or more.
[0021] To facilitate understanding of this invention, certain technical and scientific terms are specifically defined below. Unless otherwise expressly defined elsewhere in this invention, all other technical and scientific terms used herein have the meanings commonly understood by one of ordinary skill in the art to which this invention pertains.
[0022] In this invention, the terms "comprising" or "including" are open-ended expressions, meaning they include the contents specified in this invention but do not exclude other aspects.
[0023] In this invention, the terms "optional" and "optionally" refer to an event or circumstance described subsequently that may or may not occur; the description includes both cases in which the event or circumstance occurs and cases in which it does not.
[0024] In this invention, the term "different polypeptide fragments" refers to polypeptide fragments that differ in amino acid sequence and/or length, and hence in properties such as the charge, polarity, and hydrophobicity of the polypeptide.
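As an illustration of how candidate barcode peptides might be compared on the properties named above, the following Python sketch computes sequence length, a crude net charge, and mean hydropathy (the Kyte-Doolittle GRAVY score). The choice of descriptors, the charge approximation (Lys/Arg counted +1, Asp/Glu counted -1, His and termini ignored), and the function names are assumptions for illustration only; the description does not specify how peptide differences are quantified, and unnatural amino acids are not handled.

```python
# Kyte-Doolittle hydropathy values for the 20 natural amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_descriptors(peptide: str) -> dict:
    """Hypothetical helper: simple physicochemical descriptors of a peptide."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must contain at least one residue")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    # Crude net charge near neutral pH: Lys/Arg count +1, Asp/Glu count -1.
    net_charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    # GRAVY: mean hydropathy over all residues.
    gravy = sum(KYTE_DOOLITTLE[a] for a in seq) / len(seq)
    return {"length": len(seq), "net_charge": net_charge, "gravy": round(gravy, 3)}


def differing_properties(p1: str, p2: str) -> list:
    """Names of the descriptors in which two candidate barcode peptides differ."""
    d1, d2 = peptide_descriptors(p1), peptide_descriptors(p2)
    return sorted(k for k in d1 if d1[k] != d2[k])


print(peptide_descriptors("KKKK"))
print(differing_properties("KKKK", "DDDD"))
```

Two peptides of equal length can still differ in charge and hydrophobicity, which is why length alone would be a weak criterion for choosing a barcode set.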
[0025] This invention proposes the use of polypeptide fragments as barcodes in nucleic acid sequencing, methods for labeling nucleic acid molecules, methods for preparing nucleic acid sequencing libraries, sequencing libraries, and methods for distinguishing different nucleic acids to be tested in a sample. These will be described in detail below.
[0026] The use of peptide fragments as barcodes in nucleic acid sequencing
[0027] In a first aspect, the present invention proposes the use of polypeptide fragments as barcodes in nucleic acid sequencing.
[0028] According to embodiments of the present invention, the barcode includes a sample barcode or a molecular barcode.
[0029] According to an embodiment of the present invention, in the nucleic acid sequencing, the sequencing library includes a first strand of the nucleic acid to be tested, the polypeptide fragment, and a first strand of the sequencing adapter nucleic acid.
[0030] According to embodiments of the present invention, the first strand of the nucleic acid to be tested, the polypeptide fragment, and the first strand of the sequencing adapter nucleic acid are linked directly or indirectly.
[0031] According to an embodiment of the present invention, either end of the polypeptide fragment is connected to either end of the first strand of the nucleic acid to be tested.
[0032] According to an embodiment of the present invention, the first strand of the sequencing adapter nucleic acid includes a 5' segment and a 3' segment. One end of the polypeptide fragment is connected to the 3' end of the 5' segment of the adapter first strand, and the other end of the polypeptide fragment is connected to the 5' end of the 3' segment of the adapter first strand. The 3' end of the 3' segment is connected to the 5' end of the first strand of the nucleic acid to be tested.
[0033] According to an embodiment of the present invention, one end of the polypeptide fragment is connected to the 5' end of the first strand of the nucleic acid to be tested, and the other end of the polypeptide fragment is connected to the 3' end of the first strand of the sequencing adapter nucleic acid.
[0034] According to an embodiment of the present invention, the sequencing library further includes an adapter-linking nucleic acid whose 5' end is connected to the 3' end of the first strand of the sequencing adapter nucleic acid; one end of the polypeptide fragment is connected to the 5' end of the first strand of the nucleic acid to be tested, and the other end of the polypeptide fragment is connected to the 3' end of the adapter-linking nucleic acid.
[0035] According to an embodiment of the present invention, the sequencing library further includes a second strand of nucleic acid to be tested and a second strand of sequencing adapter nucleic acid, wherein the second strand of nucleic acid to be tested is at least partially complementary to the first strand of nucleic acid to be tested, the second strand of sequencing adapter nucleic acid is at least partially complementary to the first strand of sequencing adapter nucleic acid, and the 3' end of the second strand of nucleic acid to be tested is connected to the 5' end of the second strand of sequencing adapter nucleic acid.
[0036] According to an embodiment of the present invention, the sequencing library further includes an adapter-linking complementary nucleic acid, whose 3' end is connected to the 5' end of the second strand of the sequencing adapter nucleic acid and whose 5' end is connected to the 3' end of the second strand of the nucleic acid to be tested.
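The three first-strand architectures described in paragraphs [0032] to [0034] can be summarized as ordered component lists. The Python sketch below is a hypothetical data model, not part of the disclosure: each list reads from the 5' end to the 3' end of the composite first strand, the labels fig2a, fig2b, and fig2c follow the figure references given later in this description, and the component names are invented for readability.

```python
# Hypothetical data model of the composite first strand (5' -> 3'), per [0032]-[0034].
FIRST_STRAND_LAYOUTS = {
    # [0033]: adapter strand 3' end - peptide - 5' end of the strand to be tested
    "fig2b": ["adapter_strand", "peptide", "test_strand"],
    # [0034]: adapter 3' end - adapter-linking NA (5'->3') - peptide - test strand 5' end
    "fig2a": ["adapter_strand", "adapter_linking_na", "peptide", "test_strand"],
    # [0032]: peptide inserted between the 5' and 3' segments of the adapter strand
    "fig2c": ["adapter_5p_segment", "peptide", "adapter_3p_segment", "test_strand"],
}


def junctions(layout: str) -> list:
    """List each junction in the layout as (upstream end, downstream end)."""
    parts = FIRST_STRAND_LAYOUTS[layout]
    result = []
    for upstream, downstream in zip(parts, parts[1:]):
        # A peptide has N/C termini rather than 3'/5' ends; either terminus may be used.
        up = "peptide terminus" if upstream == "peptide" else f"3' end of {upstream}"
        down = "peptide terminus" if downstream == "peptide" else f"5' end of {downstream}"
        result.append((up, down))
    return result


for name in FIRST_STRAND_LAYOUTS:
    print(name, junctions(name))
```

In this model the peptide's index in each list records where the barcode sits relative to the adapter, which is the variable that position-based discrimination would depend on.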
[0037] According to an embodiment of the present invention, the polypeptide fragment is at least 1 amino acid in length.
[0038] According to an embodiment of the present invention, the length of the polypeptide fragment is 2 to 100 amino acids.
[0039] According to embodiments of the present invention, the polypeptide fragment includes, but is not limited to, polypeptide sequences composed of one or more of any of the 20 natural amino acids, unnatural amino acids, and non-canonical amino acids.
[0040] According to embodiments of the present invention, the polypeptide fragment may include various N-terminal and C-terminal non-amino acid linkers in addition to amino acids, and the linkers may be any one or more of the following: 5'C3 spacer, 5'C6 spacer, 5'spacer 9, 5'spacer 18, 3'C3 spacer, 3'C6 spacer, 3'spacer 9, 3'spacer 18, int spacer C3, or int spacer 18.
[0041] According to an embodiment of the present invention, the length of the nucleic acid to be tested can be any length from 1 to 10,000 bases.
[0042] According to embodiments of the present invention, the nucleic acid to be tested includes, but is not limited to: deoxyribonucleosides (A/T/G/C); ribonucleosides (A/U/G/C); polymononucleotides; DNA/RNA composed of random base sequences; abasic spacers (dSpacers); nucleic acids containing organic linkers (e.g., 5'C3 spacer, 5'C6 spacer, 5'spacer 9, 5'spacer 18, 3'C3 spacer, 3'C6 spacer, 3'spacer 9, 3'spacer 18, int spacer C3, int spacer C4, int spacer 18, etc.); special nucleotides (e.g., 2-aminopurine, 5-bromodeoxyuridine, dideoxynucleosides (ddA, ddT, ddC, ddG), 5-methylcytosine deoxynucleoside, 5-hydroxymethylcytosine deoxynucleoside, N6-methyladenine nucleoside, deoxyinosine, 5-nitro-2-deoxycytosine nucleoside, etc.); inverted dT/dG; and G-quadruplexes.
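The numeric ranges and linker options in paragraphs [0037] to [0041] lend themselves to a simple design check. The sketch below is illustrative only: the class and function names are hypothetical, the 2 to 100 residue range is treated as a soft recommendation because [0037] only requires at least one residue, and the linker set is copied from [0040].

```python
from dataclasses import dataclass, field

# Non-amino-acid terminal linkers listed in [0040].
LISTED_LINKERS = {
    "5'C3 spacer", "5'C6 spacer", "5'spacer 9", "5'spacer 18",
    "3'C3 spacer", "3'C6 spacer", "3'spacer 9", "3'spacer 18",
    "int spacer C3", "int spacer 18",
}


@dataclass
class BarcodeDesign:
    """Hypothetical description of one peptide-barcoded construct."""
    peptide_length: int            # amino acid residues in the barcode peptide
    target_length: int             # bases in the nucleic acid to be tested
    linkers: list = field(default_factory=list)


def check_design(design: BarcodeDesign) -> list:
    """Return human-readable notes; an empty list means every stated range is met."""
    notes = []
    if design.peptide_length < 1:
        notes.append("peptide must contain at least 1 residue ([0037])")
    elif not 2 <= design.peptide_length <= 100:
        notes.append("peptide length outside the 2-100 residue embodiment ([0038])")
    if not 1 <= design.target_length <= 10_000:
        notes.append("nucleic acid outside the 1-10,000 base range ([0041])")
    for linker in design.linkers:
        if linker not in LISTED_LINKERS:
            notes.append(f"linker not listed in [0040]: {linker}")
    return notes


print(check_design(BarcodeDesign(12, 80, ["int spacer C3"])))
```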
[0043] According to an embodiment of the present invention, the sequencing library further includes a motor protein that binds to the first strand of the sequencing adapter nucleic acid.
[0044] Methods for labeling nucleic acid molecules
[0045] In a second aspect, the present invention provides a method for labeling nucleic acid molecules. According to an embodiment of the present invention, the method includes: providing a barcode and a nucleic acid to be tested, and linking the barcode to the nucleic acid to be tested, wherein the barcode includes a polypeptide fragment.
[0046] According to embodiments of the present invention, the barcode includes a sample barcode or a molecular barcode. The method according to embodiments of the present invention uses peptides as sample barcodes or molecular barcodes to label nucleic acid molecules, particularly nucleic acid molecules subsequently used for nanopore sequencing, which can significantly enhance sequencing signal characteristics and improve sample differentiation accuracy.
[0047] According to an embodiment of the present invention, the nucleic acid to be tested includes a first strand, and the method includes: performing a first ligation treatment between either end of the polypeptide fragment and either end of the first strand of the nucleic acid to be tested, thereby labeling that strand.
[0048] According to an embodiment of the present invention, the first ligation treatment connects either end of the polypeptide fragment to the 3' or 5' end of the first strand of the nucleic acid to be tested.
[0049] According to an embodiment of the present invention, the method further includes: providing a sequencing adapter nucleic acid that includes a sequencing adapter nucleic acid strand; performing a first ligation treatment that connects either end of the polypeptide fragment to the 5' end of the strand of the nucleic acid to be tested; and performing a second ligation treatment that connects the other end of the polypeptide fragment to the 3' end of the sequencing adapter nucleic acid strand. When both the sequencing adapter nucleic acid and the nucleic acid to be tested are double-stranded, the resulting nucleic acid sequencing library can have the structure shown in Figure 2b.
[0050] According to an embodiment of the present invention, the first ligation treatment and the second ligation treatment are performed simultaneously or sequentially.
[0051] According to an embodiment of the present invention, the first ligation treatment and the second ligation treatment are covalent ligations.
[0052] According to embodiments of the present invention, the first covalent ligation and the second covalent ligation may be the same or different.
[0053] According to an embodiment of the present invention, the method further includes: providing a sequencing adapter nucleic acid and an adapter-linking nucleic acid, wherein the sequencing adapter nucleic acid includes a sequencing adapter nucleic acid strand; performing a first ligation treatment that connects either end of the polypeptide fragment to the 5' end of the strand of the nucleic acid to be tested; performing a third ligation treatment that connects the other end of the polypeptide fragment to the 3' end of the adapter-linking nucleic acid strand; and performing a fourth ligation treatment that connects the 5' end of the adapter-linking nucleic acid strand to the 3' end of the sequencing adapter nucleic acid strand. When the sequencing adapter nucleic acid, the adapter-linking nucleic acid, and the nucleic acid to be tested are all double-stranded, the resulting nucleic acid sequencing library can have the structure shown in Figure 2a.
[0054] According to an embodiment of the present invention, the first, third, and fourth ligation treatments are performed simultaneously or sequentially.
[0055] According to an embodiment of the present invention, the first, third, and fourth ligation treatments are covalent ligations.
[0056] According to embodiments of the present invention, the first, third, and fourth covalent ligations may be the same or different.
[0057] According to an embodiment of the present invention, the method further includes: providing a sequencing adapter nucleic acid that includes a sequencing adapter nucleic acid strand, the strand having a 5' segment and a 3' segment; performing a fifth ligation treatment that connects either end of the polypeptide fragment to the 5' end of the 3' segment of the sequencing adapter nucleic acid strand; performing a sixth ligation treatment that connects the other end of the polypeptide fragment to the 3' end of the 5' segment of the sequencing adapter nucleic acid strand; and performing a seventh ligation treatment that connects the 3' end of the 3' segment of the sequencing adapter nucleic acid strand to the 5' end of the strand of the nucleic acid to be tested. When both the sequencing adapter nucleic acid and the nucleic acid to be tested are double-stranded, the resulting nucleic acid sequencing library can have the structure shown in Figure 2c.
[0058] According to an embodiment of the present invention, the fifth, sixth, and seventh ligation treatments are performed simultaneously or sequentially.
[0059] According to an embodiment of the present invention, the fifth, sixth, and seventh ligation treatments are covalent ligations.
[0060] According to embodiments of the present invention, the fifth, sixth, and seventh covalent ligations may be the same or different.
[0061] According to an embodiment of the present invention, the nucleic acid to be tested is single-stranded, and the method further includes: providing a backbone nucleic acid comprising a plurality of random bases, and hybridizing the ligation product with the backbone nucleic acid so that at least a portion of the backbone nucleic acid hybridizes to the nucleic acid to be tested.
[0062] According to an embodiment of the present invention, the ligation and the hybridization are performed simultaneously or sequentially.
[0063] According to an embodiment of the present invention, the nucleic acid to be tested is double-stranded, and the method includes linking the barcode to one strand of the nucleic acid to be tested.
[0064] According to an embodiment of the present invention, the nucleic acid to be tested includes a first strand and a second strand that are at least partially complementary; the sequencing adapter nucleic acid includes a first strand and a second strand that are at least partially complementary; and the method further includes performing an eighth ligation treatment between the second strand of the nucleic acid to be tested and the second strand of the sequencing adapter nucleic acid.
[0065] According to an embodiment of the present invention, the sequencing adapter nucleic acid further includes a motor protein bound to one strand of the sequencing adapter nucleic acid.
[0066] According to embodiments of the present invention, the first, second, third, fifth, sixth, and eighth covalent ligations are each independently selected from at least one of the following: thiol-ene ligation, thiol-maleimide ligation, amino-N-hydroxysuccinimide ligation, oxime ligation between a carbonyl and a hydroxylamine, hydrazone ligation between a carbonyl and a hydrazine, tetrazole-alkyne ligation, tetrazole-alkene ligation, ligation between a carbonyl and a urea-containing compound, nucleophilic substitution of a halogen by a nucleophile, 1,3-dipolar cycloaddition, copper-catalyzed azide-alkyne cycloaddition, ruthenium-catalyzed azide-alkyne cycloaddition, Staudinger ligation between an azide and a phosphorus compound, azide-alkyne click chemistry ligation, or native chemical ligation.
[0067] According to embodiments of the present invention, the azide-alkyne click chemistry ligation includes any one or more of the following: azide-DBCO, azide-OCT, azide-DIBO, azide-BARAC, azide-ALO, azide-DIFO, azide-MOFO, azide-DIBAC, azide-DIMAC, or azide-cyclooctene click chemistry ligation.
[0068] According to an embodiment of the present invention, the covalent linking is achieved through a chemical reaction between the modifying group at the end of the polypeptide fragment and the modifying group at the end of the linked nucleic acid.
[0069] According to embodiments of the present invention, the modifying groups at the ends of the polypeptide fragment and the modifying groups at the ends of the linked nucleic acids include any one of the following groups: (a) thiol-olefin group; (b) thiol-maleimide group; (c) amino-N-hydroxysuccinimide group; (d) carbonyl-hydroxylamine group; (e) carbonyl-hydrazine group; (f) carbonyl-urea group; (g) azido-phosphorus group; (h) azido-alkynyl group; (i) tetrazolyl-alkynyl group; (j) tetrazolyl-olefin group; (k) halogen-hydroxyl group; (l) halogen-cyano group; (m) halogen-amino group.
[0070] According to embodiments of the present invention, when the modifying group at the end of the polypeptide fragment is a thiol group, the modifying group at the end of the nucleic acid is an olefinic group or a maleimide group; when it is an amino group, the nucleic acid end carries an N-hydroxysuccinimide group; when it is a carbonyl group, the nucleic acid end carries a hydroxylamine group, a hydrazine group, or a urea group; when it is an azide group, the nucleic acid end carries a phosphorus group or an alkynyl group; when it is a tetrazole group, the nucleic acid end carries an alkynyl group or an olefinic group; or when it is a halogen group, the nucleic acid end carries a hydroxyl group, a cyano group, or an amino group.
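Paragraph [0070] amounts to a lookup table of compatible end-group pairs. The Python sketch below transcribes it so that a proposed conjugation can be checked; the dictionary and function names are hypothetical, and the group names are simplified labels (for example, "tetrazole" stands for the tetrazolyl group of [0069]).

```python
# Peptide end group -> nucleic-acid end groups it can react with, per [0070].
COMPATIBLE_END_GROUPS = {
    "thiol": {"olefin", "maleimide"},
    "amino": {"N-hydroxysuccinimide"},
    "carbonyl": {"hydroxylamine", "hydrazine", "urea"},
    "azide": {"phosphorus", "alkynyl"},
    "tetrazole": {"alkynyl", "olefin"},
    "halogen": {"hydroxyl", "cyano", "amino"},
}


def can_conjugate(peptide_group: str, nucleic_acid_group: str) -> bool:
    """True if [0070] pairs this peptide end group with this nucleic-acid end group."""
    return nucleic_acid_group in COMPATIBLE_END_GROUPS.get(peptide_group, set())


def peptide_partners(nucleic_acid_group: str) -> list:
    """Peptide end groups that can react with a given nucleic-acid end group."""
    return sorted(p for p, partners in COMPATIBLE_END_GROUPS.items()
                  if nucleic_acid_group in partners)


print(can_conjugate("thiol", "maleimide"))
print(peptide_partners("alkynyl"))
```

For instance, a peptide carrying an azide on a Lys side chain, such as the LYS(N3) residue defined in the sequence notes, could be paired with an alkynyl-modified nucleic acid according to this table.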
[0071] According to embodiments of the present invention, the fourth and seventh covalent ligations are each independently achieved via phosphodiester bonds. In some embodiments of the present invention, the phosphodiester bonds are formed by a ligase-catalyzed ligation reaction.
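The numbered ligation treatments in paragraphs [0049] to [0071] are easy to lose track of. The sketch below tabulates them as a hypothetical Python data structure: which ends each step joins, whether it uses one of the chemical couplings of [0066] or a ligase-formed phosphodiester bond per [0071], and which steps each construct route uses. The ends joined by the fourth ligation are taken from [0034]; all names are illustrative.

```python
from typing import NamedTuple


class Ligation(NamedTuple):
    joins: tuple     # (end A, end B) connected by this step
    chemistry: str   # "chemical" ([0066]) or "phosphodiester" (ligase, [0071])


LIGATIONS = {
    1: Ligation(("peptide terminus", "5' end of test strand"), "chemical"),
    2: Ligation(("other peptide terminus", "3' end of adapter strand"), "chemical"),
    3: Ligation(("other peptide terminus", "3' end of adapter-linking strand"), "chemical"),
    4: Ligation(("5' end of adapter-linking strand", "3' end of adapter strand"), "phosphodiester"),
    5: Ligation(("peptide terminus", "5' end of adapter 3' segment"), "chemical"),
    6: Ligation(("other peptide terminus", "3' end of adapter 5' segment"), "chemical"),
    7: Ligation(("3' end of adapter 3' segment", "5' end of test strand"), "phosphodiester"),
    8: Ligation(("3' end of test second strand", "5' end of adapter second strand"), "chemical"),
}

# Steps used to build each first-strand architecture (Figures 2a to 2c).
ROUTES = {"fig2b": [1, 2], "fig2a": [1, 3, 4], "fig2c": [5, 6, 7]}


def needs_ligase(route: str) -> bool:
    """True if any step in the route forms a phosphodiester bond enzymatically."""
    return any(LIGATIONS[step].chemistry == "phosphodiester" for step in ROUTES[route])


for route in ROUTES:
    print(route, "ligase needed:", needs_ligase(route))
```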
[0072] Methods for preparing nucleic acid sequencing libraries
[0073] In a third aspect, the present invention provides a method for preparing a nucleic acid sequencing library. According to an embodiment of the invention, the method includes labeling the nucleic acid to be tested using the method described in the second aspect of the invention to obtain a nucleic acid sequencing library. According to the method of the embodiment of the invention, peptides are used as sample barcodes or molecular barcodes to prepare the nucleic acid sequencing library, which can significantly enhance sequencing signal characteristics and improve sample discrimination accuracy during nanopore sequencing.
[0074] According to an embodiment of the present invention, the adapter in the nucleic acid sequencing library is a Y-type adapter.
Sequencing libraries
[0076] In a fourth aspect, the present invention provides a sequencing library. According to an embodiment of the invention, the library is prepared by the method described in the third aspect of the invention. When the sequencing library passes through a nanopore, the barcode peptide produces a characteristic current signal that is distinct from that of nucleic acid, thereby improving sample differentiation accuracy and simplifying subsequent signal analysis.
[0077] Methods for distinguishing different nucleic acids in a sample
[0078] In a fifth aspect, the present invention provides a method for distinguishing different analyte nucleic acids in a sample. According to an embodiment of the present invention, the method includes: preparing a nucleic acid sequencing library using the method described in the third aspect of the present invention; sequencing the nucleic acid sequencing library to obtain a current signal; and distinguishing different analyte nucleic acids in the sample based on differences in the current signals. By introducing barcode peptides into the nucleic acid sequencing library, the method significantly enhances the features of the sequencing signal, improves the accuracy of sample differentiation, simplifies the subsequent signal analysis process, and enables real-time sequencing analysis.
[0079] According to an embodiment of the present invention, different nucleic acids to be tested in a sample are distinguished based on differences in current signals in the following manner:
[0080] Different nucleic acids to be tested are linked to polypeptide fragments with different sequences, attached at the same or different positions on the respective nucleic acids; the different current signals generated by the different polypeptide fragments allow the nucleic acids to be tested in the sample to be distinguished; or
[0081] Different nucleic acids to be tested are linked to polypeptide fragments with the same sequence, attached at different positions on the respective nucleic acids; the current signals generated by the same polypeptide fragment at different positions allow the nucleic acids to be tested in the sample to be distinguished.
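The two alternatives in [0080] and [0081] differ only in which property of the detected barcode event serves as the lookup key. The Python sketch below illustrates this under stated assumptions: an upstream step (not shown) has already detected the peptide event in a read, assigned it a peptide class, and measured its relative position as the event's start index divided by the read length. The window boundaries and all names are hypothetical.

```python
from typing import Optional


def assign_by_identity(peptide_class: str, class_to_target: dict) -> Optional[str]:
    """[0080]: different peptide sequences map to different nucleic acids to be tested."""
    return class_to_target.get(peptide_class)


def assign_by_position(relative_position: float, windows: list) -> Optional[str]:
    """[0081]: one peptide sequence, placed at a different position in each construct.

    windows: list of (start, end, target) intervals on a 0-1 relative-position scale.
    """
    for start, end, target in windows:
        if start <= relative_position < end:
            return target
    return None  # event falls outside every expected window


identity_key = {"P1": "D1", "P2": "D2"}
position_windows = [(0.0, 0.25, "D1"), (0.25, 0.5, "D2")]
print(assign_by_identity("P2", identity_key))
print(assign_by_position(0.3, position_windows))
```

Normalizing by read length is one way to reduce sensitivity to variable translocation speed; whether it is adequate would depend on the construct and the sequencing platform.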
[0082] According to an embodiment of the present invention, the sequencing is nanopore sequencing.
[0083] According to embodiments of the present invention, the sample is a mixed sample, and the polypeptide fragment is a sample barcode (Barcode / Index) to distinguish the origins of different nucleic acids to be tested in the mixed sample; or, the sample is a single sample, and the polypeptide fragment is a molecular barcode (Unique Molecular Identifier, UMI) to distinguish different nucleic acid chains to be tested in the single sample.
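Downstream, the two uses in [0083] lead to different bookkeeping. In the hypothetical Python sketch below, each read has already been given a barcode label from its peptide signal. With sample barcodes, reads are grouped by sample of origin; with molecular barcodes (UMIs), reads sharing a label are assumed to derive from the same original strand, an assumption that only matters if the workflow can yield more than one read per molecule.

```python
from collections import defaultdict


def demultiplex(reads):
    """Mixed sample: group read IDs by sample barcode label.

    reads: iterable of (read_id, barcode_label) pairs.
    """
    bins = defaultdict(list)
    for read_id, label in reads:
        bins[label].append(read_id)
    return dict(bins)


def count_molecules(reads) -> int:
    """Single sample: number of distinct UMI labels, i.e. distinct labeled strands."""
    return len({label for _, label in reads})


reads = [("r1", "S1"), ("r2", "S2"), ("r3", "S1")]
print(demultiplex(reads))
print(count_molecules(reads))
```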
[0084] According to an embodiment of the present invention, the difference in the current signal includes at least one of the following: the position of the current signal peak, the height of the current signal peak, the shape of the current signal peak, and the number of current signal peaks.
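Paragraph [0084] names four signal properties. As a rough illustration only, the Python sketch below extracts three of them (peak count, relative position of the first peak, and maximum peak height) from a barcode-region current trace, then assigns the trace to the nearest class centroid learned from labeled training traces, mirroring the split between training signals (Figure 4) and verification signals (Figure 5). Peak shape is not modeled. The threshold, the feature choice, the nearest-centroid classifier, and all names are assumptions, not the method of the disclosure; if the barcode appears as a current drop rather than a rise, the trace would need to be negated first.

```python
import math


def feature_vector(signal, threshold):
    """(peak count, relative position of first peak, max peak height) for one trace."""
    # A peak is a local maximum above the threshold; on a flat top, the last sample is kept.
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i] > threshold
             and signal[i] >= signal[i - 1]
             and signal[i] > signal[i + 1]]
    if not peaks:
        return (0.0, 0.0, 0.0)
    return (float(len(peaks)), peaks[0] / len(signal), float(max(signal[i] for i in peaks)))


def fit_centroids(examples):
    """Mean feature vector per label; examples is a list of (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(vec, centroids):
    """Label of the centroid with the smallest Euclidean distance to vec."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


train = [(feature_vector([0, 1, 5, 1, 0, 0, 3, 0], 2), "P1"),
         (feature_vector([0, 4, 0, 0, 0, 0, 0, 0], 2), "P2")]
centroids = fit_centroids(train)
print(classify(feature_vector([0, 1, 6, 1, 0, 0, 3, 0], 2), centroids))
```

Because these features live on different scales (counts, fractions, current units), a practical version would usually standardize them before computing distances.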
[0085] The sequences used in the present invention are as follows:
[0086] Sequence Listing
[0087] The structure of the 5'-maleimide modification is as follows:
[0088] LYS(N3) is a lysine residue whose side-chain amino group has been replaced by an azide group; its structural formula is as follows:
[0089] X is an abasic deoxynucleoside (a deoxyribose residue lacking a nucleobase); its structural formula is:
[0090] The iSpC3 spacer has the following structure:
[0091] The iSp18 spacer has the following structure:
[0092] Chol-TEG is a cholesterol modification attached through a triethylene glycol (TEG) linker; its structural formula is as follows:
[0093] The present invention will be explained below with reference to embodiments. Those skilled in the art will understand that the following embodiments are for illustrative purposes only and should not be considered as limiting the scope of the invention. Where specific techniques or conditions are not specified in the embodiments, they are performed according to the techniques or conditions described in the literature in the field or according to the product instructions. Reagents or instruments whose manufacturers are not specified are all conventional products that can be obtained commercially.
[0094] Example 1: Construction of barcode complex
[0095] Experimental methods:
[0096] (1) Four polypeptide fragment-test nucleic acid (PD) ligation products were prepared: (a) polypeptide fragment 1 (specific amino acid sequence as shown in SEQ ID NO: 1)-test nucleic acid 1 (specific nucleotide sequence as shown in SEQ ID NO: 5) (P1D1), (b) polypeptide fragment 2 (specific amino acid sequence as shown in SEQ ID NO: 2)-test nucleic acid 2 (specific nucleotide sequence as shown in SEQ ID NO: 6) (P2D2), (c) polypeptide fragment 3 (specific amino acid sequence as shown in SEQ ID NO: 3)-test nucleic acid 3 (specific nucleotide sequence as shown in SEQ ID NO: 7) (P3D3), and (d) polypeptide fragment 4 (specific amino acid sequence as shown in SEQ ID NO: 4)-test nucleic acid 4 (specific nucleotide sequence as shown in SEQ ID NO: 8) (P4D4). Of these, polypeptide fragments 1, 2, 3, and 4 provide the barcode feature signals, and test nucleic acids 1, 2, 3, and 4 are four test nucleic acids with similar sequences that are to be distinguished by the barcodes. Each PD ligation product was prepared from its polypeptide fragment and nucleic acid to be tested as follows:
[0097] The nucleic acid to be tested was fully dissolved in pure water, and its concentration was quantified using the Qubit SS Nucleic Acid Detection Kit (ThermoFisher). The peptide powder was fully dissolved in pure water to a concentration of 10 mg/mL. To an EP tube were added 1 μL of 10× ligation reaction buffer (1 M HEPES, pH 7.2; 50 mM EDTA), 80 nmol of the peptide fragment, and 200 mM TCEP solution (peptide fragment:TCEP molar ratio 1:1), and the mixture was diluted with pure water. After vortex mixing, the mixture was incubated at 25°C in a metal bath for 10 minutes. Next, 5 nmol of the nucleic acid to be tested was added, bringing the final reaction volume to 10 μL; after vortex mixing, the reaction was incubated at 25°C in a metal bath for 4 hours. The product was then purified on an Agilent 1260 Infinity II high-performance liquid chromatograph, and the PD ligation product fraction was collected and lyophilized overnight for the subsequent DPD ligation reaction.
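The amounts in step (1) can be checked with simple unit arithmetic: because 1 mM equals 1 nmol/μL, dividing an amount in nmol by a stock concentration in mM gives a volume in μL. The sketch below (hypothetical helper names) reproduces the TCEP volume needed for an equimolar 80 nmol, the 16:1 peptide-to-nucleic-acid molar ratio, and the nominal concentrations in the 10 μL reaction. It does not estimate the peptide volume, since the peptide molecular weights are not given here.

```python
def volume_ul(amount_nmol: float, stock_mM: float) -> float:
    """Volume of stock (uL) that contains amount_nmol; uses 1 mM == 1 nmol/uL."""
    return amount_nmol / stock_mM

def conc_mM(amount_nmol: float, total_volume_ul: float) -> float:
    """Concentration (mM) of amount_nmol in total_volume_ul microlitres."""
    return amount_nmol / total_volume_ul

PEPTIDE_NMOL = 80.0       # peptide fragment per PD reaction (Example 1)
NUCLEIC_ACID_NMOL = 5.0   # nucleic acid to be tested per PD reaction
TCEP_STOCK_MM = 200.0     # TCEP stock concentration
REACTION_UL = 10.0        # final PD reaction volume

tcep_ul = volume_ul(PEPTIDE_NMOL, TCEP_STOCK_MM)  # 1:1 peptide:TCEP
ratio = PEPTIDE_NMOL / NUCLEIC_ACID_NMOL
print(f"TCEP stock: {tcep_ul:.2f} uL; peptide:nucleic acid = {ratio:.0f}:1")
print(f"peptide {conc_mM(PEPTIDE_NMOL, REACTION_UL):.1f} mM, "
      f"nucleic acid {conc_mM(NUCLEIC_ACID_NMOL, REACTION_UL):.2f} mM")
```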
[0098] (2) Four nucleic acid-peptide fragment-nucleic acid (DPD) ligation products were prepared: (a) nucleic acid 5-peptide fragment 1-nucleic acid 1 (D5P1D1), (b) nucleic acid 5-peptide fragment 2-nucleic acid 2 (D5P2D2), (c) nucleic acid 5-peptide fragment 3-nucleic acid 3 (D5P3D3), and (d) nucleic acid 5-peptide fragment 4-nucleic acid 4 (D5P4D4). Nucleic acid 5 (specific nucleotide sequence shown in SEQ ID NO: 9) is a fixed 5'-phosphorylated oligonucleotide used for ligation to the sequencing adapter. Each DPD was prepared as follows:
[0099] Nucleic acid 5 powder was fully dissolved in pure water, and its concentration was quantified using the Qubit SS Nucleic Acid Detection Kit (ThermoFisher). The nucleic acid 5 solution was added to the lyophilized PD ligation product powder at a molar ratio of 1:7.5, together with 10 μL of 10× ligation reaction buffer (1 M HEPES, pH 7.2; 50 mM EDTA). The final reaction volume was brought to 100 μL with pure water, and the mixture was vortexed and incubated at 25°C in a metal bath for 16 hours. The product was then purified on an Agilent 1260 Infinity II high-performance liquid chromatograph, and the ligation product (DPD) fraction was collected and quantified using the Qubit SS Nucleic Acid Detection Kit (ThermoFisher). The sample was aliquoted, lyophilized overnight, and stored at -80°C. The resulting product is the barcode complex; its composition is shown in Table 1.
[0100] Table 1: "Oligonucleotide-Label Barcode Peptide-Test Nucleic Acid" Barcode Complex
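The working buffer concentrations in both ligation steps of Example 1 follow from C1V1 = C2V2: 1 μL of 10× buffer in a 10 μL PD reaction and 10 μL in a 100 μL DPD reaction each give a 1× mix of 100 mM HEPES and 5 mM EDTA. A minimal sketch of that calculation, with a hypothetical helper name:

```python
from typing import Dict

# 10x ligation reaction buffer from Example 1: 1 M HEPES (pH 7.2), 50 mM EDTA.
TEN_X_BUFFER_MM: Dict[str, float] = {"HEPES": 1000.0, "EDTA": 50.0}

def diluted(stock_mM: Dict[str, float], v_stock_ul: float, v_final_ul: float) -> Dict[str, float]:
    """Apply C2 = C1 * V1 / V2 to every component of a stock solution."""
    if not 0 < v_stock_ul <= v_final_ul:
        raise ValueError("stock volume must be positive and not exceed the final volume")
    factor = v_stock_ul / v_final_ul
    return {name: c * factor for name, c in stock_mM.items()}

pd_step = diluted(TEN_X_BUFFER_MM, 1.0, 10.0)     # PD ligation, 10 uL
dpd_step = diluted(TEN_X_BUFFER_MM, 10.0, 100.0)  # DPD ligation, 100 uL
print(pd_step, dpd_step)
```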
[0101] Example 2: Construction of barcode sequencing libraries
[0102] Experimental methods:
[0103] Nucleic acid 6 (specific nucleotide sequence as shown in SEQ ID NO: 10), 7 (specific nucleotide sequence as shown in SEQ ID NO: 11), 8 (specific nucleotide sequence as shown in SEQ ID NO: 12), and 9 (specific nucleotide sequence as shown in SEQ ID NO: 13) powders were fully dissolved in pure water, and their concentrations were quantified using the Qubit SS Nucleic Acid Detection Kit (ThermoFisher). Aqueous solutions of nucleic acids 6, 7, 8, and 9 were added at a 1:1 molar ratio to the EP tubes containing lyophilized D5P1D1, D5P2D2, D5P3D3, and D5P4D4, respectively. Each mixture was heated to 95°C for 5 minutes, slowly cooled to 25°C, and held at 25°C for 30 minutes to complete annealing. Each annealed complex was mixed with a pre-prepared adapter complex, T4 ligase (NEB), and T4 ligase buffer (NEB) to a DPD concentration of 0.4 μM and incubated at room temperature for 30 minutes, yielding four different barcode sequencing libraries. The adapter complex was formed by hybridizing nucleic acid 10 (specific nucleotide sequence as shown in SEQ ID NO: 14) with nucleic acid 11 (specific nucleotide sequence as shown in SEQ ID NO: 15) and cross-linking the resulting duplex with a motor protein.
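The annealing profile in Example 2 can be written as a stepwise thermal program. The embodiment only says the mixture was "slowly cooled", so the 1°C decrement and 10 s hold per step in this sketch are hypothetical values chosen for illustration, as is the function name; many thermocyclers would instead use a continuous ramp rate.

```python
from typing import List, Tuple

Step = Tuple[float, int]  # (temperature in degC, hold time in seconds)

def annealing_program(denature_c: float = 95.0, denature_s: int = 300,
                      end_c: float = 25.0, step_c: float = 1.0,
                      step_hold_s: int = 10, final_hold_s: int = 1800) -> List[Step]:
    """95 degC for 5 min, stepwise cooling, then 30 min at 25 degC.

    step_c and step_hold_s are hypothetical; the text only says 'slowly cooled'.
    """
    program: List[Step] = [(denature_c, denature_s)]
    temp = denature_c - step_c
    while temp > end_c:
        program.append((temp, step_hold_s))
        temp -= step_c
    program.append((end_c, final_hold_s))
    return program

program = annealing_program()
total_min = sum(hold for _temp, hold in program) / 60
print(len(program), "steps,", f"{total_min:.1f} min")
```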
[0104] Example 3: Sequencing of barcode-tagged libraries
[0105] Experimental methods:
[0106] Take 2 μL of the sequencing library solution obtained in Example 2 and incubate it with target nucleic acid 12 (specific nucleotide sequence as shown in SEQ ID NO: 16) in sequencing buffer (0.5 M KCl, 10 mM HEPES, 0.5 mM ATP, 1 mM MgCl2, pH 8) to prepare the sequencing sample, with a final DPD concentration of 2.67 nM. The current signal is acquired with a patch-clamp amplifier (or another electrical signal amplifier). A single-channel nanopore detection system based on a patch clamp and signal amplifier was constructed according to the method disclosed in the literature (Ji Z, Guo P. Channel from bacterial virus T7 DNA packaging motor for the differentiation of peptides composed of a mixture of acidic and basic amino acids. Biomaterials. 2019 May 21; 214:119222). The electrolytic cell was divided into a cis chamber and a trans chamber by a planar 1,2-diphytanoyl-sn-glycero-3-phosphocholine (DPhPC, Avanti Polar Lipids) lipid bilayer membrane, and each chamber was fitted with a pair of Ag/AgCl electrodes. Nanopore proteins were added to the lipid bilayer, and a voltage of 180 mV was applied to promote their insertion into the membrane, forming individual nanopore channels. After a single nanopore protein had inserted into the membrane, sequencing buffer (0.5 M KCl, 10 mM HEPES, 0.5 mM ATP, 1 mM MgCl2, pH 8) was introduced to remove excess nanopore protein. The co-incubation mixture described above was then added to the cis chamber and incubated at 25 °C for 10 min. Finally, 180 mV was applied, and nanopore current data were recorded at a sampling frequency of 5 kHz.
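As a consistency check on the numbers in Example 3, and assuming the Example 2 library still contains DPD at the 0.4 μM set during ligation, diluting 2 μL to a final DPD concentration of 2.67 nM implies a sequencing sample of roughly 300 μL; the 5 kHz recording rate corresponds to one current sample every 0.2 ms. The helper names are hypothetical.

```python
def final_volume_ul(stock_nM: float, stock_ul: float, target_nM: float) -> float:
    """Total volume that yields target_nM when stock_ul of a stock_nM solution is diluted."""
    return stock_nM * stock_ul / target_nM

def sample_period_ms(rate_hz: float) -> float:
    """Time between consecutive current samples at the given acquisition rate."""
    return 1000.0 / rate_hz

LIBRARY_DPD_NM = 400.0  # assumption: 0.4 uM DPD carried over unchanged from Example 2
v_total = final_volume_ul(LIBRARY_DPD_NM, 2.0, 2.67)
print(f"sequencing sample volume ~ {v_total:.0f} uL")
print(f"sample period at 5 kHz: {sample_period_ms(5000):.1f} ms")
```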
[0107] The sequencing was repeated 6 times. Data from 3 of the runs were used to train a machine-learning model on the current feature signals of the peptide barcode region (training set), and data from the other 3 runs were used to evaluate signal splitting (test set).
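Splitting by sequencing run, rather than by individual read, keeps every read from a given run on one side of the split, so the test set measures how well the model generalizes to runs it has not seen. The embodiment does not say which three runs formed the training set; the run numbering, the toy read identifiers, and the helper name in this sketch are hypothetical.

```python
from typing import List, Sequence, Tuple

Read = Tuple[int, str]  # (run number, read identifier)

def split_by_run(reads: Sequence[Read], train_runs: Sequence[int]) -> Tuple[List[Read], List[Read]]:
    """Return (training reads, test reads); a run never contributes to both sets."""
    train_set = set(train_runs)
    train = [r for r in reads if r[0] in train_set]
    test = [r for r in reads if r[0] not in train_set]
    return train, test

reads = [(run, f"run{run}_read{i}") for run in range(1, 7) for i in range(3)]
train, test = split_by_run(reads, train_runs=[1, 2, 3])
print(len(train), len(test))
```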
[0108] Example 4: Verification of the accuracy of nanopore sequencing signal processing and single nucleic acid library signal recognition
[0109] Experimental methods:
[0110] First, peptide electrical signal regions with distinct features were marked in the current-versus-time traces of each single barcode library in the training set. These regions define the current feature signals of the peptide barcode regions associated with nucleic acids 1, 2, 3, and 4. Here, a single barcode library refers to one individual barcode sequencing library.
[0111] Then, image feature extraction and machine learning were applied to the current feature signals of these four polypeptide barcode regions (i.e., the polypeptide fragment signal intervals) to obtain a classification model for splitting the signals of the nucleic acids to be tested.
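The application does not disclose the image-feature extractor or the learning algorithm behind the classification model. Purely to make the idea concrete, the sketch below substitutes a nearest-centroid classifier operating on fixed-length feature vectors (for example, peak counts, heights, and widths measured in the polypeptide fragment signal interval); it is not the model used in the embodiments. The optional distance cutoff lets a read remain unassigned when it resembles no known barcode. All names and the toy feature values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

def fit_centroids(features: Sequence[Sequence[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Training step: average the feature vectors of each barcode class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}

def predict(centroids: Dict[str, List[float]], x: Sequence[float],
            max_distance: Optional[float] = None) -> Optional[str]:
    """Nearest barcode class by Euclidean distance, or None if none is within max_distance."""
    best, best_d = None, math.inf
    for y, c in centroids.items():
        d = math.dist(x, c)
        if d < best_d:
            best, best_d = y, d
    if max_distance is not None and best_d > max_distance:
        return None  # unassigned: too far from every known barcode
    return best

train_x = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
train_y = ["P1", "P1", "P2", "P2"]
model = fit_centroids(train_x, train_y)
print(predict(model, [1.0, 1.0]), predict(model, [50.0, 50.0], max_distance=5.0))
```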
[0112] Finally, all nanopore sequencing signals in the test set were divided into two groups at a 4:1 ratio (80% and 20% of the data), and the peptide electrical signal regions were labeled in each group. The 80% group of each barcode library was then fed into the trained image extraction model for classification, and the classification results were compared with the actual signal labels to calculate the signal recognition accuracy for each single nucleic acid library.
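The per-library 80/20 partition and the single-library accuracy described above might be implemented along the following lines. The fixed random seed, the rounding of the 80% cut point, the function names, and the toy read lists are assumptions made for this illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def split_80_20(reads_by_library: Dict[str, Sequence[str]],
                seed: int = 0) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Shuffle each library's reads and cut them 4:1 into an 80% and a 20% group."""
    rng = random.Random(seed)
    part80: Dict[str, List[str]] = {}
    part20: Dict[str, List[str]] = {}
    for library, reads in reads_by_library.items():
        shuffled = list(reads)
        rng.shuffle(shuffled)
        cut = round(0.8 * len(shuffled))
        part80[library], part20[library] = shuffled[:cut], shuffled[cut:]
    return part80, part20

def per_library_accuracy(true_labels: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Fraction of each library's reads that the classifier labelled correctly."""
    total: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for truth, guess in zip(true_labels, predicted):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    return {library: correct[library] / total[library] for library in total}

libs = {"D5P1D1": [f"p1_{i}" for i in range(10)], "D5P2D2": [f"p2_{i}" for i in range(5)]}
p80, p20 = split_80_20(libs)
print({k: len(v) for k, v in p80.items()}, {k: len(v) for k, v in p20.items()})
```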
[0113] Experimental results:
[0114] Figure 4 shows representative nanopore electrical signals generated by the four different peptide barcode libraries in the training set. Figures 4a, 4b, 4c, and 4d correspond to the sequencing signals of barcode libraries D5P1D1, D5P2D2, D5P3D3, and D5P4D4, respectively, with the peptide barcode sequencing signals within the dashed boxes. A comparison of the electrical signal regions of nucleic acids 1, 2, 3, and 4 showed that, because the nucleic acid sequences were similar, their nanopore sequencing signals were also similar and showed no significant distinguishing current features. By contrast, peptide fragments 1, 2, 3, and 4, which have different sequences, produced markedly different nanopore electrical signal profiles. This invention therefore exploits these pronounced differences by introducing peptides into nucleic acid libraries and using them as barcodes to identify similar nucleic acid sequences, thereby identifying and splitting the nucleic acids.
[0115] Comparing the peptide sequences further shows that peptide fragments 1, 2, and 3 are all electrically neutral and of similar length, yet they still produce markedly different nanopore electrical signals and can therefore serve as library barcodes. Peptide fragment 4, which is longer and negatively charged, differs even more clearly from peptide fragments 1, 2, and 3. These observations further support the feasibility of using peptides in this invention to separate the signals of similar nucleic acids to be tested.
[0116] Figure 5 shows representative nanopore electrophysiological signals generated by the four different peptide barcode libraries in the split test set. Figures 5a, 5b, 5c, and 5d correspond to the sequencing signals of barcode libraries D5P1D1, D5P2D2, D5P3D3, and D5P4D4, respectively, with the peptide barcode sequencing signals within the dashed boxes. A comparison of Figures 4 and 5 shows that the nanopore electrophysiological signals obtained from separate sequencing runs of the same library are highly reproducible, confirming the stability of the method of this invention.
[0117] The trained image extraction model was used to identify the single peptide barcode libraries in the test set. Because every library input to the classifier carried a known peptide barcode, accuracy was calculated directly from the classifier's output. As shown in Table 2, the accuracy for every barcode library reached 100%, indicating that the method of this invention can effectively identify which peptide barcode is present in a test sample and thus label the sample.
[0118] Table 2: Accuracy of Signal Identification from a Single Nucleic Acid Library
[0119] Example 5: Validation of the accuracy of signal recognition from mixed nucleic acid libraries
[0120] Experimental methods:
[0121] The remaining 20% of the data from each barcode library in Example 4 were pooled to form a sequencing signal containing four different barcode libraries. This pooled signal simulates the signal obtained by nanopore sequencing of several mixed barcode-tagged libraries and is therefore called a mixed nucleic acid library. The mixed library was then fed into the trained image extraction and classification model. Because each read retains its library label (i.e., the barcode library it came from before mixing is known), the classification results can be compared with the true labels to calculate the accuracy of signal splitting for the mixed nucleic acid library.
[0122] Experimental results:
[0123] The trained image extraction model was used to classify the mixed peptide barcode library formed from the test set. The results are shown in Table 3: both the split rate (resolution) and the accuracy reached 100% for all barcode libraries. The split rate is the proportion of reads carrying different barcodes that were successfully assigned to a sample (some sequencing signals may be unrecognizable, i.e., the classifier does not assign them to any DNA category). It is calculated as follows:
$$\text{Split rate} = \frac{N_{\text{assigned}}}{N_{\text{total}}} \times 100\%$$
where $N_{\text{assigned}}$ is the number of reads assigned to any sample and $N_{\text{total}}$ is the total number of reads.
[0124] The split rate reflects the proportion of successfully assigned reads in the overall sequencing data; a higher split rate means that more reads are assigned to a sample.
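[0125] Accuracy is the proportion of reads assigned to the correct sample among all successfully assigned reads. It is calculated as follows:
$$\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{assigned}}} \times 100\%$$
where $N_{\text{correct}}$ is the number of reads assigned to their true sample and $N_{\text{assigned}}$ is the number of reads assigned to any sample.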
[0126] Accuracy reflects the reliability of the assignments: the higher the accuracy, the more trustworthy the assigned reads. The 100% accuracy obtained here indicates that the method of the present invention can effectively identify which nucleic acid to be tested gives rise to a particular sequencing signal in a mixed sample.
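The split rate and accuracy differ only in their denominators: all reads for the split rate, assigned reads for the accuracy. A minimal sketch, assuming each classifier output is either a library label or `None` for an unassigned read; the function name and the toy labels are hypothetical and do not reproduce the data behind Table 3.

```python
import math
from typing import Optional, Sequence, Tuple

def split_rate_and_accuracy(true_labels: Sequence[str],
                            predicted: Sequence[Optional[str]]) -> Tuple[float, float]:
    """Split rate = assigned / total; accuracy = correct / assigned (NaN if nothing assigned)."""
    if not true_labels or len(true_labels) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")
    assigned = [(t, p) for t, p in zip(true_labels, predicted) if p is not None]
    split_rate = len(assigned) / len(true_labels)
    if not assigned:
        return split_rate, math.nan
    accuracy = sum(t == p for t, p in assigned) / len(assigned)
    return split_rate, accuracy

truth = ["D5P1D1", "D5P1D1", "D5P2D2", "D5P2D2", "D5P3D3"]
calls = ["D5P1D1", None, "D5P2D2", "D5P1D1", "D5P3D3"]
print(split_rate_and_accuracy(truth, calls))  # (0.8, 0.75)
```

Keeping unassigned reads as `None` rather than dropping them is what allows the split rate to be computed at all, since it needs the total read count as its denominator.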
[0127] Table 3: Accuracy of signal recognition in mixed DNA libraries
[0128] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0129] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.
Claims
1. The use of peptide fragments as barcodes in nucleic acid sequencing.
2. The use according to claim 1, characterized in that, The barcode includes sample barcodes or molecular barcodes.
3. The use according to claim 1 or 2, characterized in that, In the nucleic acid sequencing, the sequencing library includes the nucleic acid strand to be tested, the polypeptide fragment, and the sequencing adapter nucleic acid strand.
4. The use according to claim 3, characterized in that, The nucleic acid chain to be tested, the polypeptide fragment, and the sequencing adapter nucleic acid chain are directly or indirectly linked.
5. The use according to claim 3, characterized in that, The polypeptide fragment is attached to either end of the first strand of the nucleic acid to be tested.
6. The use according to claim 3, characterized in that, The sequencing adapter nucleic acid first strand includes a sequencing adapter nucleic acid 5' segment and a sequencing adapter nucleic acid 3' segment. One end of the polypeptide fragment is connected to the 3' end of the sequencing adapter nucleic acid first strand 5' segment, and the other end of the polypeptide fragment is connected to the 5' end of the sequencing adapter nucleic acid first strand 3' segment. The 3' end of the sequencing adapter nucleic acid first strand 3' segment is connected to the 5' end of the nucleic acid first strand to be tested.
7. The use according to claim 5, characterized in that, One end of the polypeptide fragment is connected to the 5' end of the nucleic acid strand to be tested, and the other end of the polypeptide fragment is connected to the 3' end of the nucleic acid strand of the sequencing adapter.
8. The use according to claim 5, characterized in that, The sequencing library further includes an adapter-linking nucleic acid, wherein the 5' end of the adapter-linking nucleic acid is connected to the 3' end of the first strand of the sequencing adapter nucleic acid, one end of the polypeptide fragment is connected to the 5' end of the first strand of the nucleic acid to be tested, and the other end of the polypeptide fragment is connected to the 3' end of the adapter-linking nucleic acid.
9. The use according to claim 3, characterized in that, The sequencing library further includes a second strand of nucleic acid to be tested and a second strand of sequencing adapter nucleic acid, wherein the second strand of nucleic acid to be tested is at least partially complementary to the first strand of nucleic acid to be tested, and the second strand of sequencing adapter nucleic acid is at least partially complementary to the first strand of sequencing adapter nucleic acid, and the 3' end of the second strand of nucleic acid to be tested is connected to the 5' end of the second strand of sequencing adapter nucleic acid.
10. The use according to claim 8, characterized in that, The sequencing library further includes an adapter-linking complementary nucleic acid, wherein the 3' end of the adapter-linking complementary nucleic acid is connected to the 5' end of the second strand of the sequencing adapter nucleic acid, and the 5' end of the adapter-linking complementary nucleic acid is connected to the 3' end of the second strand of the nucleic acid to be tested.
11. The use according to claim 3, characterized in that, The sequencing library further includes a motor protein that binds to the first strand of the sequencing adapter nucleic acid.
12. A method for labeling nucleic acid molecules, characterized in that it comprises: providing a barcode and a nucleic acid to be tested, and linking the barcode to the nucleic acid to be tested, wherein the barcode includes a polypeptide fragment.
13. The method according to claim 12, characterized in that, The barcode includes sample barcodes or molecular barcodes.
14. The method according to claim 12, characterized in that, The nucleic acid to be tested includes a chain of nucleic acid to be tested, and the method includes: performing a first ligation process on any end of the polypeptide fragment and any end of the chain of nucleic acid to be tested to label the chain of nucleic acid to be tested.
15. The method according to claim 12, characterized in that, The method includes performing a first ligation process by connecting any end of the polypeptide fragment to the 3' or 5' end of the first strand of the nucleic acid to be tested.
16. The method according to any one of claims 12 to 15, characterized in that, The method further includes: providing a sequencing adapter nucleic acid, the sequencing adapter nucleic acid comprising a sequencing adapter nucleic acid strand; performing a first ligation process by connecting either end of the polypeptide fragment to the 5' end of the nucleic acid strand to be tested; and performing a second ligation process by connecting the other end of the polypeptide fragment to the 3' end of the sequencing adapter nucleic acid strand.
17. The method according to claim 16, characterized in that, The first ligation process and the second ligation process are performed simultaneously or in stages.
18. The method according to claim 16, characterized in that, The first ligation process and the second ligation process are covalent linking processes.
19. The method according to claim 16, characterized in that, The first covalent linking process and the second covalent linking process may be the same or different.
20. The method according to any one of claims 12 to 15, characterized in that, The method further includes: providing sequencing adapter nucleic acid and adapter-linking nucleic acid, wherein the sequencing adapter nucleic acid includes a sequencing adapter nucleic acid strand; performing a first ligation process by connecting either end of the polypeptide fragment to the 5' end of the nucleic acid strand to be tested; performing a third ligation process by connecting the other end of the polypeptide fragment to the 3' end of the adapter-linking nucleic acid strand; and performing a fourth ligation process by connecting the 5' end of the adapter-linking nucleic acid to the 3' end of the sequencing adapter nucleic acid strand.
21. The method according to claim 20, characterized in that, The first ligation process, the third ligation process, and the fourth ligation process are performed simultaneously or in stages.
22. The method according to claim 20, characterized in that, The first ligation process, the third ligation process, and the fourth ligation process are covalent linking processes.
23. The method according to claim 20, characterized in that, The first covalent linking process, the third covalent linking process, and the fourth covalent linking process may be the same or different.
24. The method according to claim 12, characterized in that, The method further includes: providing a sequencing adapter nucleic acid, the sequencing adapter nucleic acid comprising a sequencing adapter nucleic acid strand, the sequencing adapter nucleic acid strand comprising a 5' segment and a 3' segment of the sequencing adapter nucleic acid strand; performing a fifth ligation process by connecting either end of the polypeptide fragment to the 5' end of the 3' segment of the sequencing adapter nucleic acid strand; performing a sixth ligation process by connecting the other end of the polypeptide fragment to the 3' end of the 5' segment of the sequencing adapter nucleic acid strand; and performing a seventh ligation process by connecting the 3' end of the 3' segment of the sequencing adapter nucleic acid strand to the 5' end of the nucleic acid strand to be tested.
25. The method according to claim 24, characterized in that, The fifth ligation process, the sixth ligation process, and the seventh ligation process are performed simultaneously or in stages.
26. The method according to claim 24, characterized in that, The fifth, sixth, and seventh ligation processes are covalent linking processes.
27. The method according to claim 24, characterized in that, The fifth, sixth, and seventh covalent linking processes may be the same or different.
28. The method according to claim 12, characterized in that, The nucleic acid to be tested is single-stranded, and the method further includes: providing a backbone nucleic acid, the backbone nucleic acid comprising a plurality of random bases, and hybridizing the ligation product with the backbone nucleic acid so that at least a portion of the backbone nucleic acid hybridizes with the nucleic acid to be tested.
29. The method according to claim 28, characterized in that, The ligation process is performed simultaneously with or in steps with the hybridization process.
30. The method according to claim 12, characterized in that, The nucleic acid to be tested is double-stranded, and the method includes linking the barcode to one strand of the nucleic acid to be tested.
31. The method according to any one of claims 16 to 24, characterized in that, The nucleic acid to be tested includes a first strand and a second strand, wherein the first strand and the second strand are at least partially complementary. The sequencing adapter nucleic acid includes a first strand and a second strand, wherein the first strand and the second strand are at least partially complementary. The method further includes: performing an eighth ligation process on the second strand of the nucleic acid to be tested and the second strand of the sequencing adapter nucleic acid.
32. The method according to claim 31, characterized in that, The sequencing adapter nucleic acid further includes a motor protein that binds to the first strand of the sequencing adapter nucleic acid.
33. The method according to any one of claims 16 to 24, characterized in that, The first covalent link, the second covalent link, the third covalent link, the fifth covalent link, the sixth covalent link, and the eighth covalent link are each independently selected from at least one of the following linkages: thiol-olefin linkage, thiol-maleimide linkage, amino-N-hydroxysuccinimide linkage, carbonyl-hydroxylamine oxime linkage, carbonyl-hydrazine hydrazone linkage, tetrazolium-alkyne linkage, tetrazolium-olefin linkage, carbonyl-urea linkage, halogen nucleophilic substitution linkage, 1,3-dipolar cycloaddition linkage, copper-catalyzed azide-alkyne cycloaddition linkage, ruthenium-catalyzed azide-alkyne cycloaddition linkage, Staudinger linkage of azide and phosphorus compounds, azide-alkyne click chemistry linkage, and native chemical ligation. Preferably, the azide-alkyne click chemistry linkage includes any one or more of the following: azide-DBCO click chemistry link, azide-OCT click chemistry link, azide-DIBO click chemistry link, azide-BARAC click chemistry link, azide-ALO click chemistry link, azide-DIFO click chemistry link, azide-MOFO click chemistry link, azide-DIBAC click chemistry link, azide-DIMAC click chemistry link, or azide-cyclooctene click chemistry link.
34. The method according to claim 33, characterized in that, The covalent linking is achieved through a chemical reaction between the modifying groups at the ends of the polypeptide fragment and the modifying groups at the ends of the linked nucleic acid.
35. The method according to claim 34, characterized in that, The modifying groups at the ends of the polypeptide fragment and the modifying groups at the ends of the linked nucleic acid include any one of the following pairs: (a) thiol-olefin group; (b) thiol-maleimide group; (c) amino-N-hydroxysuccinimide group; (d) carbonyl-hydroxylamine group; (e) carbonyl-hydrazine group; (f) carbonyl-urea group; (g) azide-phosphorus group; (h) azide-alkynyl group; (i) tetrazolyl-alkynyl group; (j) tetrazolyl-olefin group; (k) halogen-hydroxyl group; (l) halogen-cyano group; (m) halogen-amino group.
36. The method according to claim 35, characterized in that, When the modifying group at the end of the polypeptide fragment is a thiol group, the modifying group at the end of the nucleic acid linker is an olefin group or a maleimide group; when the modifying group at the end of the polypeptide fragment is an amino group, the modifying group at the end of the nucleic acid linker is an N-hydroxysuccinimide group; when the modifying group at the end of the polypeptide fragment is a carbonyl group, the modifying group at the end of the nucleic acid linker is a hydroxylamine group, a hydrazine group, or a urea group; when the modifying group at the end of the polypeptide fragment is an azide group, the modifying group at the end of the nucleic acid linker is a phosphorus group or an alkynyl group; when the modifying group at the end of the polypeptide fragment is a tetrazolium group, the modifying group at the end of the nucleic acid linker is an alkynyl or olefin group; or when the modifying group at the end of the polypeptide fragment is a halogen, the modifying group at the end of the nucleic acid linker is a hydroxyl, cyano, or amino group.
37. The method according to any one of claims 20 to 27, characterized in that, The fourth and seventh covalent connections are achieved independently via phosphodiester bonds.
38. A method for preparing a nucleic acid sequencing library, characterized in that it comprises: labeling the nucleic acid to be tested by the method according to any one of claims 12 to 37 to obtain a nucleic acid sequencing library; preferably, the adapter in the nucleic acid sequencing library is a Y-type adapter.
39. A sequencing library, characterized in that, It is prepared by the method according to claim 38.
40. A method for distinguishing different nucleic acids to be tested in a sample, characterized in that it comprises: preparing a nucleic acid sequencing library by the method according to claim 38; sequencing the nucleic acid sequencing library to obtain a current signal; and distinguishing the different nucleic acids to be tested in the sample based on differences in the current signals.
41. The method according to claim 40, characterized in that, The different nucleic acids to be tested in the sample are distinguished based on the differences in current signals in the following way: the different nucleic acids to be tested are linked to polypeptide fragments with different sequences, which are attached at the same or different positions on the respective nucleic acids to be tested, and the different nucleic acids to be tested in the sample are distinguished by the different current signals generated by the polypeptide fragments of different sequences; or the different nucleic acids to be tested are linked to polypeptide fragments with the same sequence, which are attached at different positions on the respective nucleic acids to be tested, and the different nucleic acids to be tested in the sample are distinguished by the current signals generated by the same polypeptide fragment at different positions.
42. The method according to claim 40, characterized in that, The sequencing is nanopore sequencing.
43. The method according to claim 40, characterized in that, The sample is a mixed sample, and the polypeptide fragment serves as a sample barcode to distinguish the origins of different nucleic acids to be tested in the mixed sample; or, The sample is a single sample, and the polypeptide fragment is a molecular barcode to distinguish different nucleic acid chains to be tested in the single sample.
44. The method according to claim 40, characterized in that, The differences in the current signal include at least one of the following: the position of the current signal peak, the height of the current signal peak, the shape of the current signal peak, and the number of current signal peaks.