Chemical compositions and methods for utilizing them
Enzyme-free and amplification-free sequencing probes, each comprising a target-binding domain and a barcode domain, enable rapid and accurate nucleic acid sequencing. They address the limitations of current methods by providing long readout lengths and low error rates.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- BRUKER SPATIAL BIOLOGY INC
- Filing Date
- 2024-07-31
- Publication Date
- 2026-06-10
AI Technical Summary
Current nucleic acid sequencing methods require amplification and enzymes, which are time-consuming and limit the speed and accuracy of sequencing.
Sequencing probes are provided that combine a target-binding domain containing modified nucleotides with a barcode domain built on a synthetic skeleton with multiple attachment sites, allowing enzyme-free and amplification-free sequencing with long readout lengths and low error rates.
Enables rapid and accurate nucleic acid sequencing without amplification or enzymes, suitable for clinical applications.
Smart Images

Figure 0007872815000019 
Figure 0007872815000020 
Figure 0007872815000021
Abstract
Description
Technical Field
[0001] Cross-Reference to Related Applications: This application claims priority to and the benefit of U.S. Provisional Application No. 62/424,887, filed November 21, 2016; U.S. Provisional Application No. 62/457,237, filed February 10, 2017; and U.S. Provisional Application No. 62/536,147, filed July 24, 2017. The entire contents of each of these applications are incorporated herein by reference.
[0002] Sequence Listing: This application contains a Sequence Listing submitted in ASCII format via EFS-Web, the entire contents of which are incorporated herein by reference. The ASCII copy, created on November 20, 2017, is named NATE033-001WO_SeqList_ST25.txt and is 19,351 bytes in size.
Background Art
[0003] Various methods currently exist for nucleic acid sequencing, that is, for determining the exact order of nucleotides in a nucleic acid molecule. These methods require amplification of the nucleic acid by enzymes (e.g., PCR) and/or by cloning, and further enzymatic polymerization is needed to generate a signal that can be detected optically. There is therefore a need in the art for nucleic acid sequencing methods that are rapid and require neither amplification nor enzymes. The present disclosure addresses this need.
Summary of the Invention
[0004] This disclosure provides sequencing probes, methods, kits, and apparatus that offer long readout lengths, low error rates, and rapid nucleic acid sequencing that is enzyme-free, amplification-free, and library-free. In the sequencing probes described herein, each position within the barcode domain corresponds to at least two nucleotides in the target-binding domain. The methods, kits, and apparatus can also return results from a sample rapidly, which makes them particularly useful for sequencing in clinical settings. This disclosure improves on the disclosure of U.S. Patent Application Publication No. 2016/0194701, which is incorporated herein by reference in its entirety.
[0005] This disclosure provides a complex comprising: (a) a composition comprising a target-binding domain and a barcode domain, wherein the target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid, of which at least six are capable of identifying the corresponding nucleotide in the target nucleic acid molecule and at least two are not; at least two of the at least six identifying nucleotides are modified nucleotides or nucleotide analogs; the barcode domain comprises a synthetic skeleton and at least three attachment sites, each attachment site comprising at least one attachment region comprising at least one nucleic acid sequence to which a complementary nucleic acid molecule can bind; the nucleic acid sequences of the at least three attachment sites determine the position and identity of the at least six nucleotides in the target nucleic acid to which the target-binding domain binds; and each of the at least three attachment sites has a different nucleic acid sequence; and (b) a first complementary primary nucleic acid molecule that hybridizes to the first of the at least three attachment sites, wherein the first complementary primary nucleic acid molecule comprises at least two domains and a linker modification, the first domain hybridizing to the first attachment site of the barcode domain and the second domain being capable of hybridizing to at least one complementary secondary nucleic acid molecule, and wherein the linker modification is located between the first and second domains and comprises one of the following: [Chemical Formula]
[0006] This disclosure also provides a complex comprising: (a) a composition comprising a target-binding domain and a barcode domain, wherein the target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid, of which at least six are capable of identifying a corresponding nucleotide in the target nucleic acid molecule and at least two are not; at least two of the at least six identifying nucleotides are modified nucleotides or nucleotide analogs; the barcode domain comprises a synthetic skeleton and at least three attachment sites, each attachment site comprising at least one attachment region comprising at least one nucleic acid sequence to which a complementary nucleic acid molecule can bind; the nucleic acid sequences of the at least three attachment sites determine the positions and identities of the at least six nucleotides in the target nucleic acid; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the nucleic acid sequence of each attachment site determines the positions and identities of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain binds; and (b) a first complementary primary nucleic acid molecule that hybridizes to the first of the at least three attachment sites, wherein the first complementary primary nucleic acid molecule comprises at least two domains and a linker modification, the first domain hybridizing to the first attachment site of the barcode domain and the second domain being capable of hybridizing to at least one complementary secondary nucleic acid molecule, and wherein the linker modification is located between the first and second domains and comprises one of the following: [Chemical Formula]
[0007] This disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain, wherein the target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid, of which at least six are capable of identifying corresponding nucleotides in the target nucleic acid molecule and at least two are not; at least two of the at least six identifying nucleotides are modified nucleotides or nucleotide analogs; the barcode domain comprises a synthetic skeleton and at least three attachment sites, each attachment site comprising at least one attachment region comprising at least one nucleic acid sequence to which a complementary nucleic acid molecule can bind; the nucleic acid sequences of the at least three attachment sites determine the positions and identities of the at least six nucleotides in the target nucleic acid to which the target-binding domain binds; and each of the at least three attachment sites has a different nucleic acid sequence.
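The minimum counts in [0007] (and the looser variant in [0009]) can be expressed as a simple design-time check. The following is a minimal Python sketch, assuming a probe is summarized only by its nucleotide counts and a list of attachment-site sequences; `ProbeSpec` and `violations` are hypothetical names, and the site strings in the usage example are placeholders rather than real sequences.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeSpec:
    """Hypothetical summary of a sequencing probe's composition (not a real API)."""
    n_identifying: int           # target-binding nucleotides that identify the target base
    n_non_identifying: int       # e.g., universal or degenerate bases
    n_modified_identifying: int  # identifying positions that are, e.g., LNA
    attachment_sites: list[str] = field(default_factory=list)  # one sequence per site

    @property
    def tbd_length(self) -> int:
        # Total length of the target-binding domain.
        return self.n_identifying + self.n_non_identifying


def violations(p: ProbeSpec, min_tbd: int = 8, min_identifying: int = 6,
               min_non_identifying: int = 2, min_modified: int = 2,
               min_sites: int = 3) -> list[str]:
    """List the minimum requirements that `p` fails. Defaults follow [0007];
    min_tbd=10, min_non_identifying=4, min_modified=0 follow [0009]."""
    problems = []
    if p.tbd_length < min_tbd:
        problems.append(f"target-binding domain shorter than {min_tbd} nt")
    if p.n_identifying < min_identifying:
        problems.append(f"fewer than {min_identifying} identifying nucleotides")
    if p.n_non_identifying < min_non_identifying:
        problems.append(f"fewer than {min_non_identifying} non-identifying nucleotides")
    # Modified nucleotides are counted among the identifying positions only.
    if not min_modified <= p.n_modified_identifying <= p.n_identifying:
        problems.append("modified-nucleotide count out of range")
    if len(p.attachment_sites) < min_sites:
        problems.append(f"fewer than {min_sites} attachment sites")
    if len(set(p.attachment_sites)) != len(p.attachment_sites):
        problems.append("attachment sites are not all different")
    return problems
```

For example, `violations(ProbeSpec(6, 2, 2, ["SITE_1", "SITE_2", "SITE_3"]))` returns an empty list, while the same probe checked against the [0009] minimums reports that its 8-nt domain and 2 non-identifying positions fall short.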
[0008] This disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain, wherein the target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid, of which at least six are capable of identifying corresponding nucleotides in the target nucleic acid molecule and at least two are not; at least two of the at least six identifying nucleotides are modified nucleotides or nucleotide analogs; the barcode domain comprises a synthetic skeleton and at least three attachment sites, each attachment site comprising at least one attachment region comprising at least one nucleic acid sequence to which a complementary nucleic acid molecule can bind; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the nucleic acid sequence of each attachment site determines the positions and identities of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain binds.
[0009] This disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain, wherein the target-binding domain comprises at least 10 nucleotides capable of binding to a target nucleic acid, of which at least 6 are capable of identifying corresponding nucleotides in the target nucleic acid molecule and at least 4 are not; the barcode domain comprises a synthetic skeleton and at least 3 attachment sites, each attachment site comprising at least 1 attachment region comprising at least 1 nucleic acid sequence to which a complementary nucleic acid molecule can bind; the nucleic acid sequences of the at least 3 attachment sites determine the positions and identities of the at least 6 nucleotides in the target nucleic acid to which the target-binding domain binds; and each of the at least 3 attachment sites has a different nucleic acid sequence.
[0010] This disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain, wherein the target-binding domain comprises at least 10 nucleotides capable of binding to a target nucleic acid, of which at least 6 are capable of identifying corresponding nucleotides in the target nucleic acid molecule and at least 4 are not; the barcode domain comprises a synthetic skeleton and at least 3 attachment sites, each attachment site comprising at least 1 attachment region comprising at least 1 nucleic acid sequence to which a complementary nucleic acid molecule can bind; each of the at least 3 attachment sites corresponds to 2 of the at least 6 nucleotides in the target-binding domain and has a different nucleic acid sequence; and the nucleic acid sequence of each attachment site determines the positions and identities of the corresponding 2 of the at least 6 nucleotides in the target nucleic acid to which the target-binding domain binds.
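Read literally, [0008] and [0010] describe a positional code: barcode position *i* reports the ordered pair formed by identifying nucleotides 2*i* and 2*i*+1. The Python sketch below shows that bookkeeping under an assumption the text does not state, namely that each of the 16 possible dinucleotides at each position is assigned its own attachment-site sequence; under that assumption a three-site barcode would draw on 3 × 16 = 48 distinct site sequences across a probe pool. Site names such as `S0_AC` are hypothetical placeholders for real attachment-site sequences.

```python
from itertools import product

BASES = "ACGT"
# All 16 ordered dinucleotides (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def site_id(position: int, dinucleotide: str) -> str:
    """Hypothetical label for the attachment-site sequence that reports
    `dinucleotide` at barcode `position` (0-based)."""
    return f"S{position}_{dinucleotide}"


def encode(core: str) -> list[str]:
    """Split an identifying sequence into dinucleotides, one per attachment site."""
    if len(core) % 2 or set(core) - set(BASES):
        raise ValueError("expected an even-length A/C/G/T string")
    return [site_id(i, core[2 * i:2 * i + 2]) for i in range(len(core) // 2)]


def decode(site_ids: list[str]) -> str:
    """Invert encode(): recover the identifying sequence from the detected site IDs."""
    parts = []
    for expected, sid in enumerate(site_ids):
        pos, dinuc = sid[1:].split("_")
        if int(pos) != expected or dinuc not in DINUCLEOTIDES:
            raise ValueError(f"unexpected site ID {sid!r}")
        parts.append(dinuc)
    return "".join(parts)
```

With this scheme, `encode("ACGTTG")` yields `["S0_AC", "S1_GT", "S2_TG"]`, and decoding those three readouts in barcode order recovers the 6-nt sequence.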
[0011] The synthetic skeleton can comprise a polysaccharide, a polynucleotide, a peptide, a peptide nucleic acid, or a polypeptide. The synthetic skeleton can comprise DNA, for example single-stranded DNA. The sequencing probe can comprise a single-stranded DNA synthetic skeleton and a double-stranded DNA spacer between the target-binding domain and the barcode domain. The double-stranded DNA spacer can be about 1 to 100 nucleotides long, about 2 to 50 nucleotides long, or about 20 to 40 nucleotides long, and can be about 36 nucleotides long. The sequencing probe can also comprise a polymer-based spacer between the single-stranded DNA synthetic skeleton and the target-binding domain and barcode domain, the polymer-based spacer having mechanical properties similar to those of the double-stranded DNA spacer.
[0012] The number of nucleotides in the target-binding domain can be equal to, less than, or greater than the number of attachment sites in the barcode domain; preferably it is greater. The target-binding domain can contain at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 more nucleotides than the barcode domain has attachment sites. Preferably, the target-binding domain contains 8 nucleotides and the barcode domain contains 3 attachment sites.
[0013] At least three, at least four, at least five, or at least six of the nucleotides in the target-binding domain that can identify the corresponding nucleotide in the target nucleic acid molecule can be modified nucleotides or nucleotide analogs. Suitable modified nucleotides or nucleotide analogs include locked nucleic acids (LNA), bridged nucleic acids (BNA), propyne-modified nucleic acids, zip nucleic acids (ZNA®), isoguanine, isocytosine, or any combination thereof. The modified nucleotide or nucleotide analog is preferably a locked nucleic acid (LNA).
[0014] The at least two nucleotides in the target-binding domain that do not identify a corresponding nucleotide can be located before the at least six identifying nucleotides, after them, or split so that at least one is located before and at least one after them. In each arrangement, the non-identifying nucleotides are adjacent to the at least six identifying nucleotides in the target-binding domain.
[0015] The at least two non-identifying nucleotides in the target-binding domain can be universal bases, degenerate bases, or a combination thereof. Where the target-binding domain contains at least four non-identifying nucleotides, at least two of them can be universal bases, degenerate bases, or a combination thereof.
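The placement rules in [0014] and [0015] can be made concrete with a small Python sketch. It writes a target-binding domain as a string in which `N` stands for a non-identifying (universal or degenerate) position; the helper names and the `N` notation are illustrative assumptions, and base pairing is simplified to position-by-position identity in a single orientation.

```python
def layouts(n_flank: int = 2) -> list[tuple[int, int]]:
    """All ways to place `n_flank` non-identifying bases as (before, after)
    the identifying block: all before, split, or all after."""
    return [(before, n_flank - before) for before in range(n_flank + 1)]


def build_domain(core: str, before: int, after: int) -> str:
    """Write a target-binding domain, using 'N' for each non-identifying position."""
    return "N" * before + core + "N" * after


def reads_window(domain: str, window: str) -> bool:
    """True if `domain` matches `window` position by position, with 'N'
    accepting any base. Complementarity is abstracted away: both strings
    are written in the same orientation."""
    return len(domain) == len(window) and all(
        d == "N" or d == w for d, w in zip(domain, window)
    )
```

For a 6-nt identifying core with two non-identifying bases, `layouts()` gives `[(0, 2), (1, 1), (2, 0)]`; the split layout `build_domain("ACGTAC", 1, 1)` is `"NACGTACN"`, which matches any 8-nt window whose six inner bases are `ACGTAC`, whatever its two outer bases.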
[0016] Each attachment site within a barcode domain can contain one attachment region, or two or more attachment regions.
[0017] The attachment sites within a barcode domain can each contain the same number of attachment regions, or they can contain different numbers; for example, one of the at least three attachment sites can contain a different number of attachment regions than the other two. When an attachment site contains two or more attachment regions, those regions can be identical (comprising the same nucleic acid sequence) or different (comprising different nucleic acid sequences).
[0018] The nucleic acid sequence of each attachment region within the barcode domain can be about 8 to 20 nucleotides long, for example about 12 or about 14 nucleotides long.
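The barcode rules in [0016] to [0018] lend themselves to a design-time check. This is a minimal Python sketch, assuming a barcode is represented as a list of attachment sites, each a list of attachment-region sequences. The 8 to 20 nt window is treated as a hard bound here even though the text says "about", and `check_barcode` is a hypothetical name; the example sequences are arbitrary placeholders.

```python
def check_barcode(sites: list[list[str]], min_len: int = 8, max_len: int = 20,
                  min_sites: int = 3) -> list[str]:
    """Return a list of problems found in a barcode described as
    attachment sites, each holding one or more attachment-region sequences."""
    problems = []
    if len(sites) < min_sites:
        problems.append(f"fewer than {min_sites} attachment sites")
    for i, regions in enumerate(sites):
        if not regions:
            problems.append(f"site {i} has no attachment region")
        for seq in regions:
            if not min_len <= len(seq) <= max_len:
                problems.append(f"site {i}: region of {len(seq)} nt outside {min_len}-{max_len} nt")
    # Regions may repeat within a site, but the sites themselves must differ.
    signatures = [tuple(regions) for regions in sites]
    if len(set(signatures)) != len(signatures):
        problems.append("two or more attachment sites have the same sequence")
    return problems
```

A barcode whose second site carries two identical 12-nt regions passes, since repeated regions within one site are permitted; three sites that share one 4-nt region fail on length three times and once on duplication.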
[0019] At least one, at least two, or at least three of the at least three attachment sites within the barcode domain can be adjacent to at least one flanking single-stranded polynucleotide. Each of the at least three attachment sites within the barcode domain can be adjacent to at least one flanking single-stranded polynucleotide.
[0020] At least one, at least two, or at least three attachment regions within at least one attachment site can be integrated into the synthetic skeleton. Each of the at least three attachment regions can be integrated into the synthetic skeleton. At least one, at least two, or at least three attachment regions within at least one attachment site can branch off from the synthetic skeleton. Each of the at least three attachment regions can branch off from the synthetic skeleton.
[0021] The complementary nucleic acid molecule that can directly or indirectly bind to at least one nucleic acid sequence within at least one attachment region at each attachment site can be RNA, DNA, or PNA. DNA is preferred as the complementary nucleic acid molecule.
[0022] A primary nucleic acid molecule can serve as a complementary nucleic acid molecule. The primary nucleic acid molecule directly binds to at least one attachment region within at least one attachment site of the barcode domain.
[0023] The primary nucleic acid molecule may contain at least two domains, the first of which can bind to at least one attachment region within at least one attachment site of the barcode domain, and the second domain may bind to at least one complementary secondary nucleic acid molecule. The primary nucleic acid molecule may contain at least two domains, the first of which can bind to at least one attachment region within at least one attachment site of the barcode domain, and the second domain may contain a first detectable label and at least a second detectable label.
[0024] The primary nucleic acid molecule can hybridize to at least one attachment region within at least one attachment site of the barcode domain and to at least one, two, three, four, five, or more secondary nucleic acid molecules. Preferably, the primary nucleic acid molecule hybridizes to four secondary nucleic acid molecules. The primary nucleic acid molecule can also hybridize to at least one attachment region within at least one attachment site of the barcode domain and comprise a first detectable label and at least a second detectable label. The first detectable label and the at least second detectable label can have the same emission spectrum or different emission spectra.
[0025] The primary nucleic acid molecule may contain a cleavable linker. The cleavable linker is preferably located between the first and second domains. The cleavable linker can be a light-cleavable linker (e.g., a UV-cleavable linker), a reducing agent-cleavable linker, or an enzyme-cleavable linker. The linker is preferably a light-cleavable linker.
[0026] The secondary nucleic acid molecule may contain at least two domains, the first of which may bind to a complementary sequence in at least one primary nucleic acid molecule, and the second domain may bind to (a) a first detectable label and at least a second detectable label, or (b) at least one complementary tertiary nucleic acid molecule, or (c) a combination thereof.
[0027] A secondary nucleic acid molecule can hybridize to at least one primary nucleic acid molecule and to at least one, two, three, four, five, six, seven, or more tertiary nucleic acid molecules. Preferably, the secondary nucleic acid molecule hybridizes to one tertiary nucleic acid molecule. The secondary nucleic acid molecule can hybridize to at least one primary nucleic acid molecule and comprise a first detectable label and at least a second detectable label. The secondary nucleic acid molecule can also hybridize to at least one primary nucleic acid molecule and at least one tertiary nucleic acid molecule while comprising a first detectable label and at least a second detectable label. The first detectable label and the at least second detectable label can have the same emission spectrum or different emission spectra. When a secondary nucleic acid molecule that carries a first and at least a second detectable label hybridizes to at least one primary nucleic acid molecule and to at least one tertiary nucleic acid molecule that also carries a first and at least a second detectable label, the labels on the secondary nucleic acid molecule can share one emission spectrum, the labels on the tertiary nucleic acid molecule can share one emission spectrum, and the emission spectrum of the labels on the secondary nucleic acid molecule can differ from that of the labels on the tertiary nucleic acid molecule.
[0028] The secondary nucleic acid molecule may contain a cleavable linker. The cleavable linker is preferably located between the first and second domains. The cleavable linker can be a light-cleavable linker (e.g., a UV-cleavable linker), a reducing agent-cleavable linker, or an enzyme-cleavable linker. The linker is preferably a light-cleavable linker.
[0029] The tertiary nucleic acid molecule may contain at least two domains, the first of which can bind to a complementary sequence in at least one secondary nucleic acid molecule, and the second domain can bind to a first detectable label and at least a second detectable label.
[0030] A tertiary nucleic acid molecule may hybridize to at least one secondary nucleic acid molecule and may include a first detectable label and at least a second detectable label. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra.
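The primary, secondary, and tertiary hierarchy in [0023] to [0030] determines how many labels a single attachment site can carry. The Python sketch below counts them by emission spectrum for the preferred branching (four secondaries per primary, one tertiary per secondary). Two labels per labeled molecule (the stated minimum), an unlabeled primary, and the spectrum names `dye_A` and `dye_B` are assumptions for illustration; placing one spectrum on the secondaries and another on the tertiaries follows the option described in [0027].

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """One nucleic acid molecule in a reporter complex (illustrative model)."""
    labels: Counter = field(default_factory=Counter)  # emission spectrum -> label count
    bound: list = field(default_factory=list)         # Molecules hybridized to this one


def total_labels(m: Molecule) -> Counter:
    """Sum labels over a molecule and everything hybridized to it, recursively."""
    total = Counter(m.labels)
    for child in m.bound:
        total += total_labels(child)
    return total


def build_complex(n_secondary: int = 4, n_tertiary: int = 1,
                  secondary=("dye_A", 2), tertiary=("dye_B", 2)) -> Molecule:
    """Unlabeled primary -> n_secondary labeled secondaries,
    each carrying n_tertiary labeled tertiaries."""
    def labeled(spec, bound=()):
        name, count = spec
        return Molecule(Counter({name: count}), list(bound))

    secondaries = [labeled(secondary, [labeled(tertiary) for _ in range(n_tertiary)])
                   for _ in range(n_secondary)]
    return Molecule(Counter(), secondaries)
```

With the defaults, `total_labels(build_complex())` gives 8 `dye_A` labels and 8 `dye_B` labels per attachment site; raising `n_tertiary` multiplies only the tertiary-spectrum signal.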
[0031] A tertiary nucleic acid molecule may contain a cleavable linker. The cleavable linker is preferably located between the first and second domains. The cleavable linker can be any of the following: a light-cleavable linker (e.g., a UV-cleavable linker), a reducing agent-cleavable linker, or an enzyme-cleavable linker. The linker is preferably a light-cleavable linker.
[0032] This disclosure also provides a group of sequencing probes comprising multiple sequencing probes disclosed herein. Preferably, each sequencing probe among the multiple sequencing probes contains a different target-binding domain and binds to a different region in the target nucleic acid.
[0033] The present invention also provides a method for sequencing nucleic acids, comprising: (1) hybridizing a first sequencing probe described herein to a target nucleic acid immobilized at one or more positions on a substrate; (2) binding a first complementary nucleic acid molecule comprising a first detectable label and at least a second detectable label to the first of the at least three attachment sites of the barcode domain; (3) detecting the first detectable label and the at least second detectable label on the bound first complementary nucleic acid molecule; (4) optionally identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid; (5) releasing the first complementary nucleic acid molecule comprising the detectable labels by binding a first hybridizing nucleic acid molecule lacking a detectable label to the first attachment site, or by contacting the first complementary nucleic acid molecule with a force sufficient to release the first detectable label and the at least second detectable label; (6) binding a second complementary nucleic acid molecule comprising a third detectable label and at least a fourth detectable label to the second of the at least three attachment sites of the barcode domain; (7) detecting the third detectable label and the at least fourth detectable label on the bound second complementary nucleic acid molecule; (8) optionally identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid; (9) repeating steps (5) to (8) until a complementary nucleic acid molecule comprising two detectable labels has been bound to each of the at least three attachment sites of the barcode domain and its two detectable labels have been detected, thereby identifying a linear sequence of at least six nucleotides for at least a first region of the immobilized target nucleic acid hybridized to the target-binding domain of the first sequencing probe; and (10) optionally removing the first sequencing probe from the immobilized target nucleic acid.
[0034] This method can further comprise: (11) hybridizing a second sequencing probe to the target nucleic acid immobilized at one or more positions on the substrate, wherein the target-binding domains of the first sequencing probe and the second sequencing probe are different; (12) binding a first complementary nucleic acid molecule comprising a first detectable label and at least a second detectable label to the first of the at least three attachment sites of the barcode domain; (13) detecting the first detectable label and the at least second detectable label of the bound first complementary nucleic acid molecule; (14) optionally identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid; (15) releasing the first complementary nucleic acid molecule or complex comprising the detectable labels by binding a first hybridizing nucleic acid molecule lacking a detectable label to the first attachment site, or by contacting the first complementary nucleic acid molecule or complex with a force sufficient to release the first detectable label and the at least second detectable label; (16) binding a second complementary nucleic acid molecule comprising a third detectable label and at least a fourth detectable label to the second of the at least three attachment sites of the barcode domain; (17) detecting the third detectable label and the at least fourth detectable label on the bound second complementary nucleic acid molecule; (18) optionally identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid; (19) repeating steps (15) to (18) until a complementary nucleic acid molecule comprising two detectable labels has been bound to each of the at least three attachment sites of the barcode domain and its two detectable labels have been detected, thereby identifying a linear sequence of at least six nucleotides for at least a second region of the immobilized target nucleic acid hybridized to the target-binding domain of the second sequencing probe; and (20) optionally removing the second sequencing probe from the immobilized target nucleic acid.
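The control flow of the probe-by-probe, site-by-site readout described above can be simulated in a few lines of Python. This is an in-silico sketch only: imaging and label decoding are replaced by reading the dinucleotide directly, a probe "hybridizes" wherever its 6-nt identifying core occurs in the target string, and the `Release` options mirror the two alternatives given for the release step. All names are hypothetical.

```python
from enum import Enum


class Release(Enum):
    """The two alternatives for releasing the labeled molecule before the next site."""
    DARK_DISPLACEMENT = "bind unlabeled hybridizing molecule"
    CLEAVE_LINKER = "apply force (e.g., UV light) to cleave the linker"


def read_barcode(dinucleotides: list[str], mode: Release = Release.CLEAVE_LINKER):
    """Visit each attachment site in order: bind a labeled complementary
    molecule, detect it, record its dinucleotide, then release it before
    moving to the next site (no release is needed after the last site)."""
    read, log = [], []
    for site, dinuc in enumerate(dinucleotides):
        log.append(f"site {site}: bind labeled complementary molecule")
        observed = dinuc  # stand-in for imaging the two labels and decoding them
        log.append(f"site {site}: detect {observed}")
        read.append(observed)
        if site < len(dinucleotides) - 1:
            log.append(f"site {site}: release via {mode.value}")
    return "".join(read), log


def run(target: str, probe_cores: list[str], mode: Release = Release.CLEAVE_LINKER) -> list[str]:
    """Apply probes one at a time and collect one 6-nt read per bound probe."""
    reads = []
    for core in probe_cores:
        if core not in target:
            continue  # no matching region, so this probe yields no read
        dinucs = [core[i:i + 2] for i in range(0, len(core), 2)]
        seq, _ = read_barcode(dinucs, mode)
        reads.append(seq)
        # Optionally strip the probe here before the next probe is hybridized.
    return reads
```

Running `run("TTACGTTGCAGG", ["ACGTTG", "GGGGGG"])` returns `["ACGTTG"]`: the first probe is read through all three sites, and the second finds no matching region.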
[0035] This method can further comprise identifying the sequence of the immobilized target nucleic acid by assembling the linear sequences of nucleotides identified for at least the first region and at least the second region of the immobilized target nucleic acid.
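The disclosure does not specify how the per-region reads are assembled. One generic option, shown here only as an illustration and not as the method of the disclosure, is greedy suffix-prefix overlap merging of the short reads. This Python sketch assumes error-free reads and a non-repetitive target; greedy merging can mis-assemble repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len`), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap.
    Returns the remaining contigs (a single string if everything merged)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps; return the separate contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs
```

For instance, the three 6-nt reads `TTACGT`, `CGTTGC`, and `TGCAGG` overlap by three bases each and merge into the single contig `TTACGTTGCAGG`, in whatever order the reads are supplied.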
[0036] Steps (5) and (6) can be performed sequentially or simultaneously. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra. The third detectable label and at least the fourth detectable label may have the same emission spectrum or different emission spectra.
[0037] The first complementary nucleic acid molecule, the second complementary nucleic acid molecule, or both can contain a cleavable linker. Preferably, the cleavable linker is light-cleavable, and light, preferably UV light, provides the force that releases the detectable labels. The light can be provided by a light source selected from the group consisting of arc lamps, lasers, focused UV light sources, and light-emitting diodes.
[0038] The first complementary nucleic acid molecule and the first hybridizing nucleic acid molecule lacking a detectable label may contain the same nucleic acid sequence. The first hybridizing nucleic acid molecule lacking a detectable label may contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the first attachment site in the barcode domain.
[0039] The second complementary nucleic acid molecule and the second hybridizing nucleic acid molecule lacking a detectable label can contain the same nucleic acid sequence. The second hybridizing nucleic acid molecule lacking a detectable label can contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the second attachment site within the barcode domain.
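Paragraphs [0038] and [0039] describe an unlabeled molecule that shares the labeled molecule's site-binding sequence and also pairs with the single-stranded flank next to the attachment site. One way to read this is as a toehold: the extra flank pairing lets the unlabeled molecule outcompete the shorter labeled one. That interpretation, the assumption that the flank lies 3' of the site on the barcode strand, and the helper names below are not from the source. The Python sketch derives the displacer sequence (written 5' to 3') by reverse complementing site plus flank.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_displacer(site: str, flank: str, flank_is_3prime: bool = True) -> str:
    """Hypothetical unlabeled displacer covering the attachment site plus its
    adjacent single-stranded flank. Its 3' part equals revcomp(site), the same
    sequence a labeled molecule would use to bind the site."""
    covered = site + flank if flank_is_3prime else flank + site
    return revcomp(covered)
```

For a site `AAAA` with a 3' flank `CC`, the displacer is `GGTTTT`: the `TTTT` portion matches the labeled molecule's binding sequence, and the extra `GG` pairs with the flank.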
[0040] The present invention also provides a method for sequencing nucleic acids, comprising: (1) hybridizing at least a first population of first sequencing probes, comprising a plurality of the sequencing probes described herein, to a target nucleic acid immobilized at one or more positions on a substrate; (2) binding a first complementary nucleic acid molecule comprising a first detectable label and at least a second detectable label to the first of the at least three attachment sites of the barcode domain; (3) detecting the first detectable label and the at least second detectable label of the bound first complementary nucleic acid molecule; (4) optionally identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid; (5) releasing the first complementary nucleic acid molecule comprising the detectable labels by binding a first hybridizing nucleic acid molecule lacking a detectable label to the first attachment site, or by contacting the first complementary nucleic acid molecule with a force sufficient to release the first detectable label and the at least second detectable label; (6) binding a second complementary nucleic acid molecule comprising a third detectable label and at least a fourth detectable label to the second of the at least three attachment sites of the barcode domain; (7) detecting the third detectable label and the at least fourth detectable label of the bound second complementary nucleic acid molecule; (8) optionally identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid; (9) repeating steps (5) to (8) until a complementary nucleic acid molecule comprising two detectable labels has been bound to each of the at least three attachment sites of the barcode domain and its two detectable labels have been detected; and (10) optionally removing the at least first population of first sequencing probes from the immobilized target nucleic acid.
[0041] This method further involves (12) hybridizing at least one second population of second sequencing probes, comprising a plurality of the sequencing probes described herein, to the target nucleic acid immobilized at one or more positions on the substrate (where the target-binding domains of the first sequencing probes and the second sequencing probes are different); (13) binding a first complementary nucleic acid molecule, comprising a first detectable label and at least a second detectable label, to a first attachment site among the at least three attachment sites of the barcode domain; (14) detecting the first detectable label and the at least second detectable label of the bound first complementary nucleic acid molecule; (15) optionally identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (16) releasing the first complementary nucleic acid molecule or complex containing the detectable labels by binding a first hybridizing nucleic acid molecule lacking a detectable label to the first attachment site, or (17) contacting the first detectable label and the at least second detectable label with a force sufficient to release them; (18) binding a second complementary nucleic acid molecule, comprising a third detectable label and at least a fourth detectable label, to a second attachment site among the at least three attachment sites of the barcode domain; (19) detecting the third detectable label and the at least fourth detectable label of the bound second complementary nucleic acid molecule; (20) optionally identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (21) identifying the linear sequence of at least six nucleotides in at least a second region of the immobilized target nucleic acid hybridized to the target-binding domain of the second sequencing probes, by repeating steps (16) to (19) until a complementary nucleic acid molecule containing two detectable labels has bound to each of the at least three attachment sites of the barcode domain and the two detectable labels of each bound complementary nucleic acid molecule have been detected; and (22) optionally removing the at least one second population of second sequencing probes from the immobilized target nucleic acid.
[0042] This method may further include identifying the sequence of the immobilized target nucleic acid by reconstructing the linear order of the nucleotides identified in each of the at least first region and the at least second region of the immobilized target nucleic acid.
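One way to picture this reconstruction is to place each region's read at its offset on the target and merge the reads, taking a majority vote where they overlap (Figure 26 describes combining several base calls at one position into a consensus). The sketch below assumes each read's offset is already known, for example from alignment to a reference; the function name and the "N" placeholder for uncovered positions are illustrative choices, not part of the disclosure.

```python
from collections import Counter


def reconstruct(reads: list) -> str:
    """Merge (offset, sequence) reads from different target regions into one
    sequence, taking a majority vote wherever reads overlap.

    Offsets are assumed to be known in advance; positions covered by no
    read are reported as 'N'.
    """
    length = max(offset + len(seq) for offset, seq in reads)
    votes = [Counter() for _ in range(length)]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            votes[offset + i][base] += 1
    # An empty Counter is falsy, so uncovered positions fall through to 'N'.
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)


reads = [(0, "ACGTTA"), (4, "TAGGCA"), (3, "TTAGGC")]
print(reconstruct(reads))  # ACGTTAGGCA
```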
[0043] Steps (5) and (6) can be performed sequentially or simultaneously. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra. The third detectable label and at least the fourth detectable label may have the same emission spectrum or different emission spectra.
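Whether the two labels of one complementary molecule may share an emission spectrum affects how many distinct codes a given set of dyes can provide. The sketch below counts them under an assumption that the disclosure does not state: the two labels of a bound molecule are not spatially resolved, so only the unordered pair of dyes is observed. With k dyes this gives k(k-1)/2 codes when the spectra must differ and k(k+1)/2 when they may be the same. The function names are hypothetical.

```python
from math import comb


def two_label_codes(n_dyes: int, allow_same_spectrum: bool) -> int:
    """Count distinguishable label pairs when only the unordered pair of
    dyes can be read (an assumption, see the lead-in)."""
    distinct = comb(n_dyes, 2)  # pairs of two different dyes
    return distinct + (n_dyes if allow_same_spectrum else 0)


def min_dyes_for(n_codes: int, allow_same_spectrum: bool) -> int:
    """Smallest dye count that yields at least `n_codes` label pairs."""
    k = 1
    while two_label_codes(k, allow_same_spectrum) < n_codes:
        k += 1
    return k


# 16 dinucleotides must be told apart at one attachment site in one read.
print(min_dyes_for(16, True), min_dyes_for(16, False))  # 6 7
```

Under this assumption, allowing same-spectrum pairs saves one dye; the label scheme actually used in the disclosure may differ.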
[0044] The first complementary nucleic acid molecule may contain a cleavable linker. The second complementary nucleic acid molecule may also contain a cleavable linker. The first and second complementary nucleic acid molecules may each contain a cleavable linker. Preferably, the cleavable linker is a photocleavable linker. Light can be used as the force that releases the detectable labels; UV light is preferred. The light can be provided by a light source selected from the group consisting of arc lamps, lasers, focused UV light sources, and light-emitting diodes.
[0045] The first complementary nucleic acid molecule and the first hybridizing nucleic acid molecule lacking a detectable label may contain the same nucleic acid sequence. The first hybridizing nucleic acid molecule lacking a detectable label may contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the first attachment site in the barcode domain.
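The relationship between a labeled complementary molecule and its unlabeled counterpart can be expressed with reverse complements. This sketch assumes DNA sequences written 5' to 3' and a flanking single-stranded region on the 3' side of the attachment site. The labeled molecule pairs with the attachment site alone, while the unlabeled molecule pairs with the site plus the flank, so it contains the labeled molecule's sequence and forms more base pairs. Interpreting the flank as a toehold for strand displacement is an assumption, not a statement from the disclosure, and the site and flank sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_site_binders(site: str, flank: str) -> tuple:
    """Return (labeled reporter, unlabeled displacer) sequences for one
    attachment site.

    Assumes the flanking single-stranded region lies 3' of the site on the
    barcode strand. The reporter pairs with the site only; the displacer
    pairs with the site plus the flank.
    """
    reporter = reverse_complement(site)
    displacer = reverse_complement(site + flank)
    return reporter, displacer


site, flank = "ACCTGAGTCA", "GATC"  # hypothetical sequences
reporter, displacer = design_site_binders(site, flank)
print(reporter)   # TGACTCAGGT
print(displacer)  # GATCTGACTCAGGT
```

The displacer contains the reporter's full sequence, consistent with the statement that the two molecules may contain the same nucleic acid sequence.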
[0046] The second complementary nucleic acid molecule and the second hybridizing nucleic acid molecule lacking a detectable label can contain the same nucleic acid sequence. The second hybridizing nucleic acid molecule lacking a detectable label can contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the second attachment site within the barcode domain.
[0047] This disclosure also provides a method for determining the nucleotide sequence of a nucleic acid, the method comprising: (1) hybridizing a first sequencing probe disclosed herein to a target nucleic acid optionally immobilized at one or more positions on a substrate; (2) hybridizing a first complementary nucleic acid molecule containing a first detectable label and a second detectable label to a first attachment site among the at least three attachment sites of the barcode domain; (3) identifying the first and second detectable labels of the first complementary nucleic acid molecule hybridized to the first attachment site; (4) removing the first and second detectable labels hybridized to the first attachment site; (5) hybridizing a second complementary nucleic acid molecule containing a third and a fourth detectable label to a second attachment site among the at least three attachment sites of the barcode domain; (6) identifying the third and fourth detectable labels of the second complementary nucleic acid molecule hybridized to the second attachment site; (7) removing the third and fourth detectable labels hybridized to the second attachment site; (8) hybridizing a third complementary nucleic acid molecule containing a fifth and a sixth detectable label to a third attachment site among the at least three attachment sites of the barcode domain; (9) identifying the fifth and sixth detectable labels of the third complementary nucleic acid molecule hybridized to the third attachment site; and (10) identifying the linear sequence of at least six nucleotides of the optionally immobilized target nucleic acid hybridized to the target-binding domain of the sequencing probe, based on the attributes of the first, second, third, fourth, fifth, and sixth detectable labels.
[0048] This method further includes (11) optionally removing at least the first sequencing probe from a first region of the immobilized target nucleic acid; (12) optionally hybridizing at least a second sequencing probe disclosed herein to a second region of the target nucleic acid immobilized at one or more positions on the substrate (where the target-binding domains of the first sequencing probe and the at least second sequencing probe are different); (13) hybridizing a first complementary nucleic acid molecule containing a first detectable label and a second detectable label to a first attachment site among the at least three attachment sites of the barcode domain; (14) detecting the first and second detectable labels of the first complementary nucleic acid molecule hybridized to the first attachment site; (15) hybridizing a second complementary nucleic acid molecule containing a third detectable label and a fourth detectable label to a second attachment site among the at least three attachment sites of the barcode domain; (16) detecting the third and fourth detectable labels of the second complementary nucleic acid molecule hybridized to the second attachment site; (17) removing the third and fourth detectable labels hybridized to the second attachment site; (18) hybridizing a third complementary nucleic acid molecule containing a fifth and a sixth detectable label to a third attachment site among the at least three attachment sites of the barcode domain; (19) identifying the fifth and sixth detectable labels of the third complementary nucleic acid molecule hybridized to the third attachment site; and (20) identifying the linear sequence of at least six nucleotides in the second region of the optionally immobilized target nucleic acid hybridized to the target-binding domain of the second sequencing probe, based on the attributes of the first, second, third, fourth, fifth, and sixth detectable labels.
[0049] This method may further include identifying the sequence of the immobilized target nucleic acid by reconstructing the linear order of the nucleotides identified in each of the at least first region and the at least second region of the immobilized target nucleic acid.
[0050] Steps (4) and (5) can be performed sequentially or simultaneously. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra. The third detectable label and at least the fourth detectable label may have the same emission spectrum or different emission spectra. The fifth detectable label and at least the sixth detectable label may have the same emission spectrum or different emission spectra.
[0051] The first complementary nucleic acid molecule may contain a cleavable linker. The second complementary nucleic acid molecule may contain a cleavable linker. The third complementary nucleic acid molecule may contain a cleavable linker. The first complementary nucleic acid molecule, the second complementary nucleic acid molecule, and at least the third complementary nucleic acid molecule may each contain a cleavable linker. The cleavable linker is preferably a photocleavable linker. Removing any one of the first, second, and at least third complementary nucleic acid molecules may include exposing it to light; UV light is preferred. The light can be provided by a light source selected from the group consisting of arc lamps, lasers, focused UV light sources, and light-emitting diodes.
[0052] This disclosure provides a method for determining the nucleotide sequence of a nucleic acid, comprising: (1) hybridizing a first sequencing probe disclosed herein to a first region of a target nucleic acid obtained from a predetermined gene and optionally immobilized at one or more positions on a substrate; (2) hybridizing a first complementary nucleic acid molecule containing a first detectable label and a second detectable label to a first attachment site among the at least three attachment sites of the barcode domain; (3) detecting the first and second detectable labels of the first complementary nucleic acid molecule hybridized to the first attachment site; (4) removing the first and second detectable labels hybridized to the first attachment site; (5) hybridizing a second complementary nucleic acid molecule containing a third and a fourth detectable label to a second attachment site among the at least three attachment sites of the barcode domain; (6) detecting the third and fourth detectable labels of the second complementary nucleic acid molecule hybridized to the second attachment site; (7) removing the third and fourth detectable labels hybridized to the second attachment site; (8) hybridizing a third complementary nucleic acid molecule containing a fifth and a sixth detectable label to a third attachment site among the at least three attachment sites of the barcode domain; (9) identifying the fifth and sixth detectable labels of the third complementary nucleic acid molecule hybridized to the third attachment site; and (10) identifying the linear sequence of at least six nucleotides in the first region of the optionally immobilized target nucleic acid hybridized to the target-binding domain of the first sequencing probe, based on the attributes of the first, second, third, fourth, fifth, and sixth detectable labels.
[0053] This method further includes (11) optionally removing at least the first sequencing probe from a first region of the immobilized target nucleic acid; (12) optionally hybridizing at least a second sequencing probe disclosed herein to a second region of the target nucleic acid immobilized at one or more positions on the substrate (where the target-binding domains of the first sequencing probe and the at least second sequencing probe are different); (13) hybridizing a first complementary nucleic acid molecule containing a first detectable label and a second detectable label to a first attachment site among the at least three attachment sites of the barcode domain; (14) detecting the first and second detectable labels of the first complementary nucleic acid molecule hybridized to the first attachment site; (15) hybridizing a second complementary nucleic acid molecule containing a third detectable label and a fourth detectable label to a second attachment site among the at least three attachment sites of the barcode domain; (16) detecting the third and fourth detectable labels of the second complementary nucleic acid molecule hybridized to the second attachment site; (17) removing the third and fourth detectable labels hybridized to the second attachment site; (18) hybridizing a third complementary nucleic acid molecule containing a fifth and a sixth detectable label to a third attachment site among the at least three attachment sites of the barcode domain; (19) identifying the fifth and sixth detectable labels of the third complementary nucleic acid molecule hybridized to the third attachment site; and (20) identifying the linear sequence of at least six nucleotides in the second region of the optionally immobilized target nucleic acid hybridized to the target-binding domain of the second sequencing probe, based on the attributes of the first, second, third, fourth, fifth, and sixth detectable labels.
[0054] This method may further include identifying the sequence of the immobilized target nucleic acid by reconstructing the linear order of the nucleotides identified in each of the at least first region and the at least second region of the immobilized target nucleic acid.
[0055] Steps (4) and (5) can be performed sequentially or simultaneously. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra. The third detectable label and at least the fourth detectable label may have the same emission spectrum or different emission spectra. The fifth detectable label and at least the sixth detectable label may have the same emission spectrum or different emission spectra.
[0056] The first complementary nucleic acid molecule may contain a cleavable linker. The second complementary nucleic acid molecule may contain a cleavable linker. The third complementary nucleic acid molecule may contain a cleavable linker. The first complementary nucleic acid molecule, the second complementary nucleic acid molecule, and at least the third complementary nucleic acid molecule may each contain a cleavable linker. The cleavable linker is preferably a photocleavable linker. Removing any one of the first, second, and at least third complementary nucleic acid molecules may include exposing it to light; UV light is preferred. The light can be provided by a light source selected from the group consisting of arc lamps, lasers, focused UV light sources, and light-emitting diodes.
[0057] This disclosure also provides apparatus for carrying out any of the methods disclosed herein.
[0058] This disclosure also provides one or more kits comprising a substrate, a collection of the sequencing probes disclosed herein, at least three complementary nucleic acid molecules each comprising a first detectable label and at least a second detectable label, and instructions for use. The one or more kits may further include at least one capture probe, or at least two capture probes.
[0059] Any one of the above aspects can be combined with any other aspect.
[0060] Unless otherwise specified, all scientific and technical terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. In the specification, unless the context clearly indicates otherwise, the singular forms include the plural; for example, the terms “a,” “an,” and “the” are understood to be singular or plural, and the term “or” is understood to be inclusive. For example, “an element” means one or more elements. Throughout this specification, the word “comprise” and its variations (“comprises,” “comprising”) are understood to imply the inclusion of a stated element, integer, step, group of elements, group of integers, or group of steps, but not the exclusion of any other element, integer, step, group of elements, group of integers, or group of steps. “Approximately” is understood to mean within 10%, or within 9%, or within 8%, or within 7%, or within 6%, or within 5%, or within 4%, or within 3%, or within 2%, or within 1%, or within 0.5%, or within 0.1%, or within 0.05%, or within 0.01% of the stated value. Unless the context clearly indicates otherwise, all numerical values presented herein are modified by the term “approximately.”
[0061] Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of this disclosure, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated herein by reference in their entirety. The references cited herein are not admitted to be prior art to the claimed invention. In the case of conflict, the present specification, including definitions, will control. Other features and advantages of this disclosure will be apparent from the following detailed description and claims.
[Brief Description of the Drawings]
[0062] This patent or application file includes at least one color drawing. A copy of this patent or patent application publication containing the color drawing will be provided by the Patent and Trademark Office upon request and payment of the required fees.
[0063] The above features, as well as other features, will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.
[0064] [Figure 1] Figure 1 is a diagram of an example of a sequencing probe according to this disclosure.
[0065] [Figure 2] Figure 2 shows the designs of the standard sequencing probe and the three-part sequencing probe according to this disclosure.
[0066] [Figure 3] Figure 3 shows an example of a reporter complex of the present disclosure hybridized to an example of a sequencing probe of the present disclosure.
[0067] [Figure 4] Figure 4 shows a schematic diagram of an example of a reporter probe according to this disclosure.
[0068] [Figure 5] Figure 5 is a schematic diagram of some examples of reporter probes according to this disclosure.
[0069] [Figure 6] Figure 6 is a schematic diagram of an example of a reporter probe of this disclosure, including an "additional handle".
[0070] [Figure 7] Figure 7 is a schematic diagram of some examples of reporter probes according to this disclosure that include tertiary nucleic acids with different arrangements.
[0071] [Figure 8] Figure 8 is a schematic diagram of some examples of reporter probes according to this disclosure that include branched tertiary nucleic acids.
[0072] [Figure 9] Figure 9 shows an example of a reporter probe according to this disclosure that includes a cleavable linker modification.
[0073] [Figure 10] Figure 10 shows positions in an example of a reporter probe according to this disclosure at which a cleavable linker modification can be placed.
[0074] [Figure 11] Figure 11 is a schematic diagram illustrating the capture of a single target nucleic acid using the two-capture-probe system described herein.
[0075] [Figure 12-1] Figure 12 shows the results from an experiment utilizing the method of the present invention to capture and detect a multi-cancer panel consisting of 100 targets using FFPE samples. [Figure 12-2] Same as the explanation above.
[0076] [Figure 13] Figure 13 is a schematic diagram of two captured target DNA molecules hybridized with a capture probe, a blocker oligo, and a sequencing probe for targeted sequencing of large target nucleic acid molecules.
[0077] [Figure 14] Figure 14 is a schematic diagram of one cycle of the sequencing method according to this disclosure.
[0078] [Figure 15-1] Figure 15 shows a schematic diagram of one cycle of the sequencing method according to this disclosure and the corresponding imaging data recovered during this cycle. [Figure 15-2] Same as the explanation above.
[0079] [Figure 16] Figure 16 shows an example of the configuration of a sequencing probe pool according to this disclosure, in which eight different sequencing probe pools are designed using eight different color combinations.
[0080] [Figure 17] Figure 17 compares the barcode domain design disclosed in United States Patent Application Publication No. 2016 / 019470 with the barcode domain design of this disclosure.
[0081] [Figure 18] Figure 18 is a schematic diagram of one or more sequencing probes of this disclosure hybridized to a captured target nucleic acid molecule.
[0082] [Figure 19] Figure 19 shows fluorescence images recorded during the sequencing method according to this disclosure when one or more sequencing probes are hybridized to a target nucleic acid.
[0083] [Figure 20] Figure 20 shows a schematic diagram of several sequencing probes of this disclosure bound along the length of a target nucleic acid, and the corresponding recorded fluorescence image.
[0084] [Figure 21] Figure 21 shows an example of imaging data recorded during a sequencing cycle according to this disclosure, and the fluorescence signal intensity profile of the reporter probe according to this disclosure.
[0085] [Figure 22] Figure 22 is a schematic diagram of the sequencing cycle according to this disclosure, in which one position of the barcode is rendered dark using a cleavable linker modification.
[0086] [Figure 23] Figure 23 illustrates an example of the sequencing cycle according to this disclosure, in which one position within the barcode domain is rendered dark by substitution of the primary nucleic acid.
[0087] [Figure 24] Figure 24 shows an example of capturing RNA and DNA together from an FFPE sample.
[0088] [Figure 25] Figure 25 is a schematic diagram illustrating how the sequencing method of this disclosure enables sequencing of the same base of a target nucleic acid using different sequencing probes.
[0089] [Figure 26] Figure 26 illustrates how multiple base calls at specific nucleotide positions on a target nucleic acid are recorded from one or more sequencing probes, combined to form a consensus sequence, thereby increasing the accuracy of the final base call.
[0090] [Figure 27] Figure 27 shows fluorescence images of the sequencing method of this disclosure recorded after the capture, extension, and detection of a 33 kilobase DNA fragment.
[0091] [Figure 28-1] Figure 28 shows the results from a sequencing experiment, obtained using the sequencing method of this disclosure and analyzed using the ShortStack® algorithm. Regarding the graph on the left, the sequences shown correspond to sequence numbers 3, 4, 6, 8, 7, and 5, starting from the top left and moving clockwise. Regarding the table on the right, the sequences correspond to sequence numbers 3, 4, 7, 8, 6, and 5, from top to bottom. [Figure 28-2] Same as the explanation above.
[0092] [Figure 29] Figure 29 shows a schematic diagram of the experimental design for multiplexed capture and sequencing of oncogene targets from FFPE samples.
[0093] [Figure 30] Figure 30 shows a schematic diagram of direct sequencing and results from experiments investigating the compatibility of the sequencing method of this disclosure with RNA molecules.
[0094] [Figure 31] Figure 31 shows the results of sequencing RNA and DNA molecules having the same nucleotide sequence using the sequencing method of this disclosure.
[0095] [Figure 32] Figure 32 shows the results of multi-target capture using an RNA panel.
[0096] [Figure 33] Figure 33 shows a schematic diagram of the entire ShortStack® software pipeline process.
[0097] [Figure 34] Figure 34 shows the results of mutation analysis performed on the simulated dataset using the ShortStack™ software pipeline.
[0098] [Figure 35] Figure 35 shows the overall variant-calling accuracy of the ShortStack® software pipeline for various types of variants.
[0099] [Figure 36] Figure 36 shows the intensity distribution for reporter complexes labeled with specific color combinations.
[0100] [Figure 37] Figure 37 shows a typical sedimentation gradient of this disclosure.
[0101] [Figure 38] Figure 38 shows the capture efficiency of this disclosure while titrating DNA mass inputs from 25 ng to 500 ng.
[0102] [Figure 39] Figure 39 shows the HPLC purification of an example of the reporter complex of this disclosure.
[0103] [Figure 40-1] Figure 40 shows the efficiency and accuracy of hybridization of the reporter probe of this disclosure in the presence of various buffer additives. [Figure 40-2] Same as the explanation above.
[0104] [Figure 41] Figure 41 shows the loss rate of target nucleic acids when using the sequencing method of this disclosure in the presence of various buffer additives.
[0105] [Figure 42] Figure 42 shows the efficiency and error of the reporter probe of this disclosure, which contains a dodecamer complementary nucleic acid.
[0106] [Figure 43] Figure 43 shows the efficiency and error of a reporter probe using an example of an 8×8×8 14-mer reporter set.
[0107] [Figure 44] Figure 44 shows the efficiency and error of a reporter probe using an example of a 10×10×10 14-mer reporter set.
[0108] [Figure 45] Figure 45 shows a comparison of the performance of the standard sequencing probe and the three-part sequencing probe of this disclosure.
[0109] [Figure 46] Figure 46 shows the effect of LNA substitution in the target-binding domain of a sequencing probe of this disclosure when individual probes are used.
[0110] [Figure 47] Figure 47 shows the effect of LNA substitution in the target binding domain according to this disclosure when using a pool of nine probes.
[0111] [Figure 48] Figure 48 shows the effect of substitutions by modified nucleotides and nucleic acid analogs within the target-binding domain as described in this disclosure.
[0112] [Figure 49] Figure 49 shows the results from an experiment to quantify the raw accuracy of the sequencing method according to this disclosure.
[0113] [Figure 50] Figure 50 shows experimental results determining the accuracy of the sequencing method according to this disclosure when nucleotides in a target nucleic acid are sequenced using two or more sequencing probes.
[Modes for Carrying Out the Invention]
[0114] This disclosure provides sequencing probes, reporter probes, methods, kits, and apparatus that enable rapid nucleic acid sequencing without enzymes, amplification, or libraries, and that offer long readout lengths and low error rates.
[0115] Composition of the present disclosure
[0116] This disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain. The target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid; at least six nucleotides of the target-binding domain are capable of identifying a corresponding (complementary) nucleotide in the target nucleic acid molecule, and at least two nucleotides of the target-binding domain are not capable of identifying a corresponding nucleotide in the target nucleic acid molecule. At least one, or at least two, of the at least six identifying nucleotides are modified nucleotides or nucleotide analogs, and at least one, or at least two, of the nucleotides in the target-binding domain are universal bases or degenerate bases. The barcode domain comprises a synthetic backbone and at least three attachment sites; each attachment site comprises at least one attachment region comprising at least one nucleic acid sequence to which a complementary nucleic acid molecule can bind; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the nucleic acid sequence at each of the at least three attachment sites determines the positions and attributes of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain binds.
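The correspondence between attachment sites and nucleotide pairs can be sketched as follows. The sketch assumes that the identifying nucleotides are assigned to sites in order, two per site, and that non-identifying positions (such as universal bases) are written as "N"; the "siteK:XY" strings are hypothetical stand-ins for site-specific barcode sequences. Because each of three sites can take one of 16 dinucleotide codes, three sites address 16^3 = 4^6 = 4096 distinct six-nucleotide combinations.

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]  # 16 codes


def barcode_for(target_binding: str, non_identifying: str = "N") -> list:
    """Assign the identifying nucleotides of a target-binding domain to
    barcode attachment sites, two nucleotides per site, in order.

    Positions written as `non_identifying` (for example universal bases)
    are skipped. The encoded nucleotides are those of the probe; the target
    nucleotides they report are their complements. Labels such as
    'site1:AC' are hypothetical names for site-specific sequences.
    """
    identifying = [b for b in target_binding if b != non_identifying]
    if len(identifying) % 2:
        raise ValueError("identifying nucleotides must come in pairs")
    return [
        f"site{k // 2 + 1}:{identifying[k]}{identifying[k + 1]}"
        for k in range(0, len(identifying), 2)
    ]


# Eight-nucleotide domain: six identifying positions, two universal bases.
print(barcode_for("ACNNGTTA"))            # ['site1:AC', 'site2:GT', 'site3:TA']
print(len(DINUCLEOTIDES) ** 3 == 4 ** 6)  # True
```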
[0117] This disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain. The target-binding domain comprises at least 10 nucleotides capable of binding to a target nucleic acid; at least 6 nucleotides of the target-binding domain are capable of identifying corresponding (complementary) nucleotides in the target nucleic acid molecule, and at least 4 nucleotides of the target-binding domain are not capable of identifying corresponding nucleotides in the target nucleic acid molecule. The barcode domain comprises a synthetic backbone and at least 3 attachment sites; each attachment site comprises at least 1 attachment region comprising at least 1 nucleic acid sequence to which a complementary nucleic acid molecule can bind; each of the at least 3 attachment sites has a different nucleic acid sequence; and the nucleic acid sequence at each of the at least 3 attachment sites determines the positions and attributes of the corresponding 2 of the at least 6 nucleotides in the target nucleic acid to which the target-binding domain binds.
[0118] This disclosure also provides a group of sequencing probes, including any of the sequencing probes disclosed herein.
[0119] The target-binding domain, barcode domain, and backbone of the disclosed sequencing probe, as well as complementary nucleic acid molecules (e.g., reporter molecules or reporter complexes), are described in more detail below.
[0120] The sequencing probes of this disclosure include a target-binding domain and a barcode domain. Figure 1 is a schematic diagram of an example of a sequencing probe according to this disclosure and shows that the target-binding domain can bind to a target nucleic acid. The target nucleic acid can be any nucleic acid to which the sequencing probe of this disclosure can hybridize. The target nucleic acid can be DNA or RNA. The target nucleic acid can be obtained from a biological sample derived from the subject. The terms “target-binding domain” and “sequencing domain” are used interchangeably herein.
[0121] The target-binding domain can contain a series of nucleotides (e.g., a polynucleotide). The target-binding domain can contain DNA, RNA, or a combination thereof. If the target-binding domain is a polynucleotide, it binds to the target nucleic acid by hybridizing to a portion of the target nucleic acid that is complementary to the target-binding domain of the sequencing probe, as shown in Figure 1.
[0122] The target-binding domain of a sequencing probe can be designed to control the likelihood and rate of hybridization and/or dehybridization of the sequencing probe. Generally, the lower the probe's Tm, the greater the likelihood that the probe dehybridizes from the target nucleic acid. Therefore, using probes with a lower Tm results in fewer probes bound to the target nucleic acid.
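To make the dependence on Tm concrete, the sketch below uses the Wallace rule, a common rule of thumb for short oligonucleotides (roughly 14 nucleotides or fewer): Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule is a swapped-in approximation, not the disclosure's method; it ignores salt and probe concentration and modified nucleotides such as LNA, and treating universal bases as contributing nothing is a further assumption. The function name is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature (°C) of a short DNA oligo by the Wallace
    rule, Tm = 2*(A+T) + 4*(G+C).

    Symbols other than A, C, G, T (e.g. 'N' for a universal base) add
    nothing here, which is a simplifying assumption.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ATATATAT"))  # 16
print(wallace_tm("GCGCGCGC"))  # 32
print(wallace_tm("ACNNGTTA"))  # 16
```

By this estimate, an AT-rich eight-nucleotide domain melts about 16 °C lower than a GC-rich one of the same length and would therefore be expected to dehybridize more readily.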
[0123] The length of the target-binding domain partly determines the likelihood that a probe hybridizes to a target nucleic acid and the likelihood that it remains hybridized. Generally, the longer the target-binding domain (the more nucleotides it contains), the less likely it is that a complementary sequence exists within the target nucleic acid. Conversely, the shorter the target-binding domain, the more likely it is that a complementary sequence is present within the target nucleic acid. For example, assuming the four bases occur with equal frequency, the probability that a given tetramer sequence occurs at a particular position in a target nucleic acid is 1/256 (1/4^4), whereas for a hexamer sequence it is 1/4096 (1/4^6). Consequently, a set of shorter probes is likely to bind to more positions on a nucleic acid of a given length than a set of longer probes.
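The arithmetic behind these figures extends to the expected number of matching positions for one specific k-mer in a target of length L, (L - k + 1)/4^k, again assuming independent, equally frequent bases. A minimal sketch; the function name and the 10,000-nucleotide example target are hypothetical.

```python
def expected_sites(target_length: int, k: int) -> float:
    """Expected number of positions in a random target of the given length
    at which one specific k-mer occurs, assuming independent bases with
    equal frequencies."""
    positions = max(target_length - k + 1, 0)
    return positions / 4 ** k


print(4 ** 4, 4 ** 6)                       # 256 4096
print(round(expected_sites(10_000, 4), 2))  # 39.05
print(round(expected_sites(10_000, 6), 2))  # 2.44
```

A specific tetramer is thus expected about 16 times as often as a specific hexamer in the same target, which is why shorter target-binding domains yield more binding positions.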
[0124] In various cases, to increase the number of readouts in a nucleic acid of a given length, it is preferable to prepare probes with shorter target-binding domains. This increases coverage of a single target nucleic acid or a portion of it, particularly a portion of particular interest, for example when detecting mutations or SNP alleles.
[0125] The target-binding domain can consist of any number of nucleotides. Specifically, it can be any of the following: at least 12 nucleotides, at least 10 nucleotides, at least 8 nucleotides, at least 6 nucleotides, or at least 3 nucleotides.
[0126] Each nucleotide within the target-binding domain can identify (or encode) the complementary nucleotide of the target molecule. Alternatively, some nucleotides within the target-binding domain may identify (or encode) the complementary nucleotide of the target molecule, while others may not.
[0127] The target-binding domain may contain at least one native base. The target-binding domain may not contain a native base. The target-binding domain may contain at least one modified nucleotide or nucleic acid analog. The target-binding domain may not contain a modified nucleotide or nucleic acid analog. The target-binding domain may contain at least one universal base. The target-binding domain may not contain a universal base. The target-binding domain may contain at least one degenerate base. The target-binding domain may not contain a degenerate base.
[0128] The target-binding domain may contain any combination of native bases (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more native bases), modified nucleotides or nucleic acid analogs (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more modified nucleotides or nucleic acid analogs), universal bases (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more universal bases), and degenerate bases (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more degenerate bases). The native bases, modified nucleotides or nucleic acid analogs, universal bases, and degenerate bases of specific target-binding domains can be arranged in any order when present in combination.
[0129] Non-limiting examples of the term “modified nucleotide” or “nucleic acid analog” include locked nucleic acids (LNA), bridged nucleic acids (BNA), propyne-modified nucleic acids, zip nucleic acids (ZNA®), isoguanine, and isocytosine. The target-binding domain may contain 0 to 6 modified nucleotides or nucleic acid analogs (e.g., 0, 1, 2, 3, 4, 5, or 6). The modified nucleotides or nucleic acid analogs are preferably locked nucleic acids (LNA).
[0130] In this specification, a non-limiting example of the term “locked nucleic acid (LNA)” includes modified RNA nucleotides in which the ribose moiety contains a methylene bridge connecting the 2' oxygen and the 4' carbon. This methylene bridge locks the ribose in the 3'-endo conformation (also known as the north conformation) found in A-form RNA duplexes. The term inaccessible RNA can be used interchangeably with LNA. In this specification, a non-limiting example of the term “bridged nucleic acid (BNA)” includes modified RNA molecules containing a five- or six-membered bridged structure with a fixed 3'-endo conformation (also known as the north conformation). The bridged structure connects the 2' oxygen of the ribose to the 4' carbon of the ribose. A variety of different bridged structures are possible, including carbon atoms, nitrogen atoms, and hydrogen atoms. In this specification, a non-limiting example of the term “propyne-modified nucleic acid” includes pyrimidines, i.e., cytosine and thymine / uracil, that contain a propyne modification at the C5 position of the nucleobase. In this specification, non-limiting examples of the term "Zip Nucleic Acid (ZNA®)" include oligonucleotides conjugated to a cationic spermine moiety.
[0131] In this specification, the non-limiting term “universal base” includes nucleotide bases that do not follow the Watson-Crick base-pairing rules but can bind to any of the four canonical bases (A, T / U, C, G) located on the target nucleic acid. In this specification, the non-limiting term “degenerate base” includes nucleotide bases that do not follow the Watson-Crick base-pairing rules but can bind to at least two, but not all, of the four canonical bases (A, T / U, C, G) located on the target nucleic acid. Degenerate bases may also be called wobble bases, and these terms are used interchangeably in this specification.
[0132] The sequencing probe illustrated in Figure 1 has a target-binding domain containing a six-nucleotide sequence (b1-b2-b3-b4-b5-b6) that specifically hybridizes to complementary nucleotides 1-6 of the target nucleic acid to be sequenced. This hexameric portion (b1-b2-b3-b4-b5-b6) of the target-binding domain identifies (or codes for) the complementary nucleotides (1-2-3-4-5-6) in the target sequence. The hexameric sequence is flanked on each side by a base (N). Each base represented by (N) can independently be a universal base or a degenerate base; typically, each is independently one of the canonical bases. The (N) bases do not identify (or code for) the complementary nucleotides to which they bind in the target sequence and are independent of the (hexameric) sequence b1-b2-b3-b4-b5-b6.
[0133] The sequencing probes shown in Figure 1 can be used with the sequencing method of this disclosure to sequence target nucleic acids using only hybridization reactions, with no covalent chemistry, enzymes, or amplification. A total of 4096 sequencing probes (4^6 = 4096) are required to cover all possible hexameric sequences within a target nucleic acid molecule.
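The probe count above is simply the number of distinct hexamers over four bases. A short Python sketch that enumerates them; `all_kmers` is a hypothetical helper, not part of the disclosure:

```python
from itertools import product

BASES = "ACGT"

def all_kmers(k: int) -> list[str]:
    """Every possible k-mer over the four canonical bases."""
    return ["".join(p) for p in product(BASES, repeat=k)]

hexamers = all_kmers(6)
print(len(hexamers))   # 4096
print(hexamers[:3])    # ['AAAAAA', 'AAAAAC', 'AAAAAG']
```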
[0134] Figure 1 shows one example configuration of the target-binding domain of the sequencing probe of this disclosure. Table 1 shows several other configurations of the target-binding domain of this disclosure. One preferred target-binding domain, called the “6 LNA” target-binding domain, contains six LNAs at positions b1–b6, and this six-LNA stretch is flanked on each side by one base (N). As used herein, the base (N) can be a universal or degenerate base independent of the (hexameric) sequence b1-b2-b3-b4-b5-b6, or a canonical base. In other words, while bases b1-b2-b3-b4-b5-b6 are specific to a given target sequence, each (N) base can be a universal or degenerate base, or any of the four canonical bases, and is not specific to the target identified by b1-b2-b3-b4-b5-b6. For example, if the target sequence being investigated is CAGGCATA, bases b1-b2-b3-b4-b5-b6 of the target-binding domain would be TCCGTA, while each (N) base can independently be A, C, T, or G. The resulting target-binding domain can therefore be ATCCGTAG, TTCCGTAC, GTCCGTAG, or any other of the 16 possible sequences (4 × 4 choices for the two (N) positions). Alternatively, both (N) bases can be located before the six LNAs, or both can be located after them.
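The CAGGCATA example can be reproduced with a short Python sketch. It assumes, as the example appears to, that b1–b6 are written as the base-by-base complement of target positions 2–7 in the same written order (the strands pair antiparallel, so this orientation is our reading), and it enumerates the two flanking (N) positions over the four canonical bases. `six_lna_domains` is a hypothetical helper:

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def six_lna_domains(target: str, start: int) -> list[str]:
    """Return every N-b1..b6-N target-binding domain for the hexamer of
    `target` beginning at 0-based index `start`.

    b1..b6 complement the target base by base, written in the same order
    as the target (assumed orientation). Each flanking N is enumerated
    over the four canonical bases.
    """
    hexamer = target[start:start + 6]
    if len(hexamer) != 6:
        raise ValueError("target too short for a hexamer at this start")
    core = hexamer.translate(COMPLEMENT)
    return [left + core + right for left, right in product("ACGT", repeat=2)]

domains = six_lna_domains("CAGGCATA", start=1)
print(len(domains))             # 16
print("ATCCGTAG" in domains)    # True
```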
[0135] [Table 1]
[0136] Table 1 also lists “decamer” target-binding domains containing 10 naturally occurring target-specific bases.
[0137] Table 1 also shows the “Natural I” target-binding domain, which contains six native bases at positions b1–b6, flanked on each side by two (N) bases. Alternatively, all four (N) bases can be located in front of the six native bases, or all four can be located behind them. More generally, any number of the four (N) bases (i.e., 1, 2, 3, or 4) can be located in front of the six native bases, with the remaining (N) bases located behind them.
[0138] Table 1 also shows the “Natural II” target-binding domain, which contains six native bases at positions b1–b6, flanked on each side by one (N) base. Alternatively, both (N) bases can be positioned in front of the six native bases, or both can be positioned behind them.
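The alternative placements of (N) bases described for the “Natural I” and “Natural II” domains can be enumerated by the number of (N) bases placed in front of b1–b6. A minimal Python sketch; the string notation `[b1-b6]` and the helper name are illustrative assumptions:

```python
def n_layouts(core: str, n_total: int) -> dict[int, str]:
    """Map 'number of N bases placed in front of the core' to a layout.

    `core` stands for b1..b6; the remaining N bases go behind it.
    """
    return {front: "N" * front + core + "N" * (n_total - front)
            for front in range(n_total + 1)}

natural_i = n_layouts("[b1-b6]", 4)   # four (N) bases
natural_ii = n_layouts("[b1-b6]", 2)  # two (N) bases
print(natural_i[2])      # NN[b1-b6]NN  (symmetric default)
print(len(natural_ii))   # 3
```

Index 0 corresponds to all (N) bases behind the core and the highest index to all of them in front.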
[0139] Table 1 also describes the "2 LNA" target-binding domain, which contains a combination of two LNAs and four native bases at positions b1–b6. The two LNAs and four native bases can appear in any order. For example, if positions b3 and b4 are LNAs, positions b1, b2, b5, and b6 are native bases. Bases b1–b6 are flanked on each side by one (N) base. Alternatively, both (N) bases can be positioned before bases b1–b6, or both can be positioned after them.
[0140] Table 1 also describes the “4 LNA” target-binding domain, which contains a combination of four LNAs and two native bases at positions b1–b6. The four LNAs and two native bases can appear in any order. For example, if positions b2–b5 are LNAs, positions b1 and b6 are native bases. Bases b1–b6 are flanked on each side by one (N) base. Alternatively, both (N) bases can be placed before bases b1–b6, or both can be placed after them.
[0141] The sequencing probes of this disclosure include a synthetic skeleton. The target-binding domain and the barcode domain are functionally linked. The target-binding domain and the barcode domain can be covalently linked as part of a single synthetic skeleton. The target-binding domain and the barcode domain can be linked via a linker (e.g., a nucleic acid linker, a chemical linker). The synthetic skeleton can include any material (e.g., polysaccharides, polynucleotides, polymers, plastics, fibers, peptides, peptide nucleic acids, polypeptides). The synthetic skeleton is preferably rigid. The synthetic skeleton can include a single-stranded DNA molecule. The skeleton can include a "DNA origami" consisting of six DNA double helices (see, for example, Lin et al., "Submicrometre geometrically encoded fluorescent barcodes self-assembled from DNA," Nature Chemistry; October 2012; Vol. 4(10): pp. 832-839). Barcodes can be created from DNA origami tiles (Jungmann et al., "Multiplexed 3D cellular super-resolution imaging with DNA-PAINT and Exchange-PAINT," Nature Methods, Vol. 11, No. 3, 2014).
[0142] The sequencing probes of this disclosure may include a synthetic skeleton that is partially double-stranded. The sequencing probes may include a single-stranded DNA synthetic skeleton and a double-stranded DNA spacer between the target-binding domain and the barcode domain. The sequencing probes may include a single-stranded DNA synthetic skeleton and a polymer-based spacer between the target-binding domain and the barcode domain that has mechanical properties similar to those of double-stranded DNA. Typical polymer-based spacers include polyethylene glycol (PEG) type polymers.
[0143] A double-stranded DNA spacer can consist of approximately 1 to 100 nucleotides in length, approximately 2 to 50 nucleotides in length, or approximately 20 to 40 nucleotides in length. Preferably, the double-stranded DNA spacer has a length of approximately 36 nucleotides.
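For a rough sense of scale, the physical length of a double-stranded spacer can be estimated from the helical rise of B-form DNA, commonly taken as about 0.34 nm per base pair. That figure is not given in this disclosure and is an assumption here; the Python sketch below (hypothetical helper name) is an estimate only:

```python
# Assumed typical helical rise of B-form DNA; real spacers may deviate.
RISE_NM_PER_BP = 0.34

def duplex_length_nm(bp: int) -> float:
    """Approximate contour length of a B-form duplex of `bp` base pairs."""
    return round(bp * RISE_NM_PER_BP, 2)

print(duplex_length_nm(36))                          # 12.24
print(duplex_length_nm(20), duplex_length_nm(40))    # 6.8 13.6
```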
[0144] One sequencing probe in this disclosure is referred to as the “standard probe” and is shown in the left panel of Figure 2. The standard probe in Figure 2 contains a barcode domain covalently bound to the target-binding domain, so that the target-binding domain and the barcode domain are located on the same oligonucleotide chain. In the left panel of Figure 2, a single-stranded oligonucleotide is bound to a stem oligonucleotide, creating a double-stranded spacer region of 36 nucleotides in length called the stem sequence. Using this structure, each sequencing probe in a pool of multiple probes can hybridize to the same stem sequence.
[0145] Another sequencing probe in this disclosure, referred to as a “three-part probe,” is shown in the right panel of Figure 2. The three-part probe in Figure 2 includes a barcode domain linked to a target-binding domain via a linker. In this example, the linker is a single-stranded stem oligonucleotide that hybridizes to both a single-stranded oligonucleotide containing the target-binding domain and a single-stranded oligonucleotide containing the barcode domain, creating a 36-nucleotide double-stranded spacer region that bridges the barcode domain (18 nucleotides) and the target-binding domain (18 nucleotides). In this probe configuration, each barcode can be designed to hybridize to a single stem sequence so as to prevent exchange of barcode domains. Furthermore, after each barcode domain is hybridized to its corresponding stem oligonucleotide, different sequencing probes can be pooled together.
[0146] The barcode domain contains one or more attachment sites (e.g., one, two, three, four, five, six, seven, eight, nine, ten, or more). The number of attachment sites can be less than, equal to, or greater than the number of nucleotides in the target-binding domain. The target-binding domain can contain more nucleotides than the number of attachment sites in the barcode domain (e.g., one, two, three, four, five, six, seven, eight, nine, ten, or more). The target-binding domain can contain eight nucleotides, and the barcode domain can contain three attachment sites. The target-binding domain can contain ten nucleotides, and the barcode domain can contain three attachment sites.
[0147] The barcode domain has no length limit, as long as it is long enough for at least three attachment sites, as described below. The terms “attachment site,” “location,” and “spot” are used interchangeably herein. The terms “barcode domain” and “reporting domain” are used interchangeably herein.
[0148] Each attachment site within the barcode domain corresponds to two nucleotides (a dinucleotide) within the target-binding domain, and therefore to the complementary dinucleotide within the target nucleic acid that hybridizes with that dinucleotide. As a non-limiting example, the first attachment site in the barcode domain corresponds to the first and second nucleotides in the target-binding domain (in Figure 1, R1 is the first attachment site in the barcode domain and corresponds to nucleotides b1 and b2 in the target-binding domain, which identify nucleotides 1 and 2 of the target nucleic acid); the second attachment site corresponds to the third and fourth nucleotides in the target-binding domain (in Figure 1, R2 corresponds to nucleotides b3 and b4, which identify nucleotides 3 and 4 of the target nucleic acid); and the third attachment site corresponds to the fifth and sixth nucleotides in the target-binding domain (in Figure 1, R3 corresponds to nucleotides b5 and b6, which identify nucleotides 5 and 6 of the target nucleic acid).
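The site-to-dinucleotide correspondence described above amounts to splitting b1–b6 into three consecutive pairs. A minimal Python sketch, with a hypothetical helper name, applied to the TCCGTA hexamer from the 6 LNA example:

```python
def site_dinucleotides(core: str) -> dict[str, str]:
    """Split the six identifying bases b1..b6 into the dinucleotides read by
    attachment sites R1 (b1-b2), R2 (b3-b4) and R3 (b5-b6)."""
    if len(core) != 6:
        raise ValueError("expected six identifying bases")
    return {f"R{i // 2 + 1}": core[i:i + 2] for i in range(0, 6, 2)}

print(site_dinucleotides("TCCGTA"))  # {'R1': 'TC', 'R2': 'CG', 'R3': 'TA'}
```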
[0149] Each attachment location within a barcode domain contains at least one attachment region (e.g., 1 to 50, or more). Some locations within a barcode domain may have more attachment regions than others (e.g., the first attachment location may have 3 attachment regions while the second attachment location has 2). Alternatively, each attachment location within a barcode domain may have the same number of attachment regions. Each attachment location within a barcode domain may contain one attachment region. Each attachment location within a barcode domain may contain two or more attachment regions. At least one of at least three attachment locations within a barcode domain may contain a different number of attachment regions than the other two attachment locations within the barcode domain.
[0150] Each attachment region contains at least one copy (i.e., 1 to 50, e.g., 10 to 30) of the nucleic acid sequence to which a complementary nucleic acid molecule (e.g., DNA or RNA) can reversibly bind. The nucleic acid sequences of the attachment regions at one attachment site can be the same. In this case, the complementary nucleic acid molecules that bind to these attachment regions are the same. Alternatively, the nucleic acid sequences of the attachment regions at one site are not the same. In this case, the complementary nucleic acid molecules that bind to these attachment regions are not the same.
[0151] The nucleic acid sequence of each attachment region within the barcode domain can be approximately 8 to 20 nucleotides in length, or approximately 12 to 14 nucleotides in length. Preferably, it is approximately 14 nucleotides in length.
[0152] Each nucleotide in each attachment region within a barcode domain can independently be a normal base, a modified nucleotide, or a nucleic acid analog. At least one, two, three, four, five, or six nucleotides within the attachment regions of a barcode domain can be modified nucleotides or nucleic acid analogs. A typical ratio of modified nucleotides or nucleic acid analogs to normal bases within a barcode domain is 1:2 to 1:8. Typical modified nucleotides or nucleic acid analogs useful in attachment regions within a barcode domain are isoguanine and isocytosine. Using modified nucleotides or nucleic acid analogs (e.g., isoguanine and isocytosine) improves the binding efficiency and accuracy of the reporter to the appropriate attachment regions within the barcode domain, while minimizing binding to other sites, including the target.
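To see what the 1:2 to 1:8 ratio means for a single attachment region, assume (our reading, not stated explicitly) that the ratio applies within one region of the preferred length of about 14 nucleotides. The Python sketch below lists the modified-nucleotide counts m for which m : (14 - m) falls in that range; the helper name is hypothetical:

```python
from fractions import Fraction

def allowed_modified_counts(region_len: int,
                            lo: Fraction = Fraction(1, 8),
                            hi: Fraction = Fraction(1, 2)) -> list[int]:
    """Numbers of modified nucleotides m for which the modified:normal
    ratio m / (region_len - m) lies between 1:8 and 1:2 inclusive."""
    counts = []
    for m in range(1, region_len):
        ratio = Fraction(m, region_len - m)
        if lo <= ratio <= hi:
            counts.append(m)
    return counts

print(allowed_modified_counts(14))  # [2, 3, 4]
```

Exact fractions avoid floating-point edge effects at the inclusive 1:2 and 1:8 boundaries.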
[0153] One or more attachment regions can be integrated with a polynucleotide backbone. That is, the backbone is a single polynucleotide, and the attachment regions are part of the sequence of the single polynucleotide. One or more attachment regions can be linked to a modified monomer (e.g., a modified nucleotide) in the synthetic backbone, and the attachment regions can be branched from the synthetic backbone. One attachment site can contain two or more attachment regions, some of which branch from the synthetic backbone, while some attachment regions are integrated with the synthetic backbone. At least one attachment region in at least one attachment site can be integrated with the synthetic backbone. Each attachment region in each of the at least one attachment site can be integrated with the synthetic backbone. At least one attachment region in at least one attachment site can be branched from the synthetic backbone. Each attachment region in each of at least three attachment sites can be branched from the synthetic backbone.
[0154] Each attachment site within the barcode domain corresponds to one of 16 dinucleotides, namely adenine-adenine, adenine-thymine / uracil, adenine-cytosine, adenine-guanine, thymine / uracil-adenine, thymine / uracil-thymine / uracil, thymine / uracil-cytosine, thymine / uracil-guanine, cytosine-adenine, cytosine-thymine / uracil, cytosine-cytosine, cytosine-guanine, guanine-adenine, guanine-thymine / uracil, guanine-cytosine, and guanine-guanine. Therefore, the one or more attachment regions located within a single attachment site in the barcode domain correspond to one of the 16 dinucleotides and contain nucleic acid sequences specific to that dinucleotide. Attachment regions located at different attachment sites in the barcode domain contain distinct nucleic acid sequences, even if those sites correspond to the same dinucleotide. For example, given a sequencing probe of this disclosure whose target-binding domain has a hexamer encoding the sequence AGAGAC, the barcode domain of this sequencing probe would contain three positions: the first attachment site corresponding to the adenine-guanine dinucleotide, the second corresponding to the adenine-guanine dinucleotide, and the third corresponding to the adenine-cytosine dinucleotide. The attachment region at position 1 of the probe in this example would contain a unique nucleic acid sequence different from that of the attachment region at position 2, even though attachment sites 1 and 2 both correspond to the adenine-guanine dinucleotide. The sequences of the attachment sites are designed, and the complementary nucleic acid of each attachment site is tested for interaction with the other attachment sites. In addition, there are no restrictions on the nucleotide sequence of the complementary nucleic acid. Preferably, the nucleotide sequence has no substantial homology (e.g., 50% to 99.9%) to known nucleotide sequences. This limits undesired hybridization between the complementary nucleic acid and the target nucleic acid.
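Because each attachment site carries its own sequence for each of the 16 dinucleotides, a three-site barcode design under this reading needs 3 × 16 = 48 distinct attachment-region sequences. The Python sketch below models this as a lookup keyed by (site, dinucleotide); the identifiers such as `R1-AG-seq` are hypothetical placeholders for designed sequences, not sequences from this disclosure:

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]  # 16
SITES = ("R1", "R2", "R3")

# Hypothetical placeholder identifiers; real attachment-region sequences
# would be designed and screened for cross-hybridization.
ATTACHMENT_TABLE = {
    (site, dn): f"{site}-{dn}-seq" for site in SITES for dn in DINUCLEOTIDES
}

def barcode_for(core: str) -> list[str]:
    """Attachment-region identifiers for the identifying hexamer b1..b6."""
    if len(core) != 6:
        raise ValueError("expected six identifying bases")
    pairs = [core[i:i + 2] for i in range(0, 6, 2)]
    return [ATTACHMENT_TABLE[(site, dn)] for site, dn in zip(SITES, pairs)]

print(len(ATTACHMENT_TABLE))   # 48
print(barcode_for("AGAGAC"))   # ['R1-AG-seq', 'R2-AG-seq', 'R3-AC-seq']
```

Keying on the site as well as the dinucleotide is what keeps positions 1 and 2 distinct in the AGAGAC example even though both encode adenine-guanine.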
[0155] Figure 1 shows an example of a sequencing probe of this disclosure, including an example of a barcode domain. The barcode domain illustrated in Figure 1 includes three attachment sites R1, R2, and R3. Each attachment site corresponds to a specific dinucleotide present in the hexameric sequence (b1-b6) of the target-binding domain. In this example, R1 corresponds to positions b1 and b2, R2 corresponds to positions b3 and b4, and R3 corresponds to positions b5 and b6. Thus, each site decodes a specific dinucleotide present in the hexameric sequence of the target-binding domain, thereby enabling the identification of two specific bases (A, C, G, or T) present in each dinucleotide.
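Decoding runs the mapping in reverse: the dinucleotides reported at R1, R2, and R3 are concatenated into b1–b6, and the complementary target bases follow. A minimal Python sketch; how each site's dinucleotide call is obtained from the reporter signal is not modeled, and the helper name and same-order writing of the target bases are assumptions:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def decode_read(calls: dict[str, str]) -> tuple[str, str]:
    """Assemble b1..b6 from per-site dinucleotide calls and return it with
    the target bases it pairs with (written base by base, same order)."""
    core = "".join(calls[site] for site in ("R1", "R2", "R3"))
    if len(core) != 6 or set(core) - set("ACGT"):
        raise ValueError("each site must report one canonical dinucleotide")
    return core, core.translate(COMPLEMENT)

print(decode_read({"R1": "TC", "R2": "CG", "R3": "TA"}))  # ('TCCGTA', 'AGGCAT')
```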
[0156] In the example barcode domain shown in Figure 1, each attachment site contains a single attachment region integrated into the synthetic skeleton. The attachment region of each of the three attachment sites contains a specific nucleotide sequence corresponding to the dinucleotide encoded by that attachment site. For example, attachment site R1 contains an attachment region having a specific sequence corresponding to the identity of dinucleotide b1-b2.
[0157] A barcode domain may further contain one or more binding regions. A barcode domain may contain at least one single-stranded nucleic acid sequence adjacent to at least one attachment site. A barcode domain may contain at least two single-stranded nucleic acid sequences adjacent to at least two attachment sites. A barcode domain may contain at least three single-stranded nucleic acid sequences adjacent to at least three attachment sites. These adjacent regions are known as "toeholds" and can be used to accelerate the exchange rate of oligonucleotides hybridizing adjacent to the toeholds by providing additional binding sites for single-stranded oligonucleotides (see "toehold" probes; e.g., Seelig et al., "Catalyzed Relaxation of a Metastable DNA Fuel"; J. Am. Chem. Soc. 2006, Vol. 128(37), pp. 12211-12220).
[0158] The sequencing probes of this disclosure may have a total length of approximately 20 nanometers to approximately 50 nanometers (including a target-binding domain, a barcode domain, and an optional domain). A polynucleotide molecule containing approximately 120 nucleotides can serve as the backbone of the sequencing probe.
[0159] Sequencing probes may include cleavable linker modifications. Any cleavable linker modification known to those skilled in the art may be used. Non-limiting examples include UV-cleavable linkers, reducing agent-cleavable linkers, and enzymatically cleavable linkers. An example of an enzymatically cleavable linker is an inserted uracil, which is cleaved by the USER® enzyme. Cleavable linker modifications can be located anywhere along the length of the sequencing probe; a non-limiting example location is the region between the target-binding domain and the barcode domain. The right panel of Figure 10 shows a cleavable linker modification that can be incorporated into the probe of this disclosure.
[0160] Reporter probe
[0161] A nucleic acid molecule that binds (e.g., hybridizes) to a complementary nucleic acid sequence in at least one attachment region within at least one attachment site of the barcode domain of a sequencing probe according to this disclosure contains a detectable label (directly or indirectly). This labeled nucleic acid molecule is referred to herein as the “reporter probe” or “reporter probe complex,” and these terms are used interchangeably herein. The reporter probe may be DNA, RNA, or PNA. The reporter probe is preferably DNA.
[0162] The reporter probe may comprise at least two domains, the first of which can bind to at least one complementary nucleic acid molecule, and the second domain can bind to a first detectable label and at least a second detectable label. Figure 3 shows a schematic diagram of an example of a reporter probe according to this disclosure bound to a first attachment site of a barcode domain of an example sequencing probe. In Figure 3, the first domain of the reporter probe (shown as a chestnut checkerboard pattern) is bound to a complementary nucleic acid sequence within attachment site R1 of the barcode domain, and the second domain of the reporter probe (shown in gray) is bound to two detectable labels (one green and one red).
[0163] Alternatively, the reporter probe may contain at least two domains, the first of which can bind to at least one first complementary nucleic acid molecule, and the second domain can bind to at least one second complementary nucleic acid molecule. The at least one first complementary nucleic acid molecule and the at least one second complementary nucleic acid molecule can be different (have different nucleic acid sequences).
[0164] The "primary nucleic acid molecule" is a reporter probe comprising at least two domains, the first of which can bind (hybridize) to a complementary nucleic acid sequence in at least one attachment region within at least one attachment site of the barcode domain of the sequencing probe, and the second domain can bind (hybridize) to at least one additional complementary nucleic acid. The primary nucleic acid molecule can directly bind to the complementary nucleic acid sequence in at least one attachment region within at least one attachment site of the barcode domain of the sequencing probe. The primary nucleic acid molecule can indirectly bind to the complementary nucleic acid sequence in at least one attachment region within at least one attachment site of the barcode domain of the sequencing probe via a nucleic acid linker. The primary nucleic acid molecule may include a cleavable linker. The cleavable linker may be located between the first and second domains. Preferably, the cleavable linker can be cleaved by light.
[0165] Each nucleotide in the first domain of a primary nucleic acid molecule can independently be a normal base, a modified nucleotide, or a nucleic acid analog. At least one, at least two, at least three, at least four, at least five, or at least six nucleotides in the first domain of a primary nucleic acid molecule can be a modified nucleotide or a nucleotide analog. A typical ratio of modified nucleotides or nucleic acid analogs to normal bases in a barcode domain is 1:2 to 1:8. Typical modified nucleotides or nucleic acid analogs useful in the first domain of a primary nucleic acid molecule are isoguanine and isocytosine. Using modified nucleotides or nucleic acid analogs (e.g., isoguanine and isocytosine) can improve the binding efficiency and accuracy of the first domain of a primary nucleic acid molecule to a suitable complementary nucleic acid sequence in at least one attachment region within at least one attachment site of the barcode domain of a sequencing probe, while minimizing binding to other sites, including the target.
[0166] In this specification, at least one additional complementary nucleic acid that binds to a primary nucleic acid molecule is referred to as a “secondary nucleic acid molecule.” A primary nucleic acid molecule can bind (e.g., hybridize) to at least one, at least two, at least three, at least four, at least five, or more secondary nucleic acid molecules. Preferably, a primary nucleic acid molecule binds (e.g., hybridizes) to four secondary nucleic acid molecules.
[0167] The secondary nucleic acid molecule comprises at least two domains, the first of which can bind (e.g., hybridize) to at least one complementary sequence in at least one primary nucleic acid molecule, and the second domain can bind (e.g., hybridize) to (a) a first detectable label and at least a second detectable label, or (b) at least one additional complementary nucleic acid molecule, or (c) a combination thereof. The secondary nucleic acid molecule may include a cleavable linker. The cleavable linker may be located between the first and second domains. Preferably, the cleavable linker can be cleaved by light.
[0168] Each nucleotide in the first domain of a secondary nucleic acid molecule can independently be a normal base, a modified nucleotide, or a nucleic acid analog. At least one, two, three, four, five, or six nucleotides in the first domain of a secondary nucleic acid molecule can be modified nucleotides or nucleotide analogs. A typical ratio of modified nucleotides or nucleic acid analogs to normal bases in a barcode domain is 1:2 to 1:8. Typical modified nucleotides or nucleic acid analogs useful in the first domain of a secondary nucleic acid molecule are isoguanine and isocytosine. Using modified nucleotides or nucleic acid analogs (e.g., isoguanine and isocytosine) can improve the binding efficiency and accuracy of the first domain of the secondary nucleic acid molecule to a suitable complementary nucleic acid sequence in the second domain of the primary nucleic acid molecule, while minimizing binding to other sites.
[0169] In this specification, at least one additional complementary nucleic acid that binds to a secondary nucleic acid molecule is referred to as a "tertiary nucleic acid molecule." A secondary nucleic acid molecule can bind (e.g., hybridize) to at least one, at least two, at least three, at least four, at least five, at least six, at least seven, or more tertiary nucleic acid molecules. Preferably, the at least one secondary nucleic acid molecule binds (e.g., hybridizes) to one tertiary nucleic acid molecule.
[0170] The tertiary nucleic acid molecule comprises at least two domains, the first of which can bind (e.g., hybridize) to at least one complementary sequence in at least one secondary nucleic acid molecule, and the second domain can bind (e.g., hybridize) to a first detectable label and at least a second detectable label. Alternatively, the second domain may contain the first detectable label and at least a second detectable label by directly or indirectly attaching these labels during oligonucleotide synthesis, for example, using phosphoramidites or NHS chemistry. The tertiary nucleic acid molecule may contain a cleavable linker. The cleavable linker may be located between the first and second domains. Preferably, the cleavable linker can be cleaved by light.
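Under the preferred counts given above (four secondary molecules per primary, one tertiary per secondary) and two labels per tertiary molecule, the number of label attachment points carried by one primary molecule can be tallied as below. This Python sketch assumes labels are attached only via tertiary molecules and, for the per-site figure, one primary molecule per attachment-region copy; both are simplifying assumptions, and the helper names are hypothetical:

```python
def labels_per_primary(n_secondary: int = 4, n_tertiary: int = 1,
                       labels_per_tertiary: int = 2) -> int:
    """Label attachment points on one fully assembled primary/secondary/
    tertiary complex (assumes labels are carried only by tertiaries)."""
    return n_secondary * n_tertiary * labels_per_tertiary

def labels_per_site(copies: int, **kwargs: int) -> int:
    """Scale by attachment-region copies, assuming one primary per copy."""
    return copies * labels_per_primary(**kwargs)

print(labels_per_primary())   # 8
print(labels_per_site(10))    # 80
```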
[0171] Each nucleotide in the first domain of a tertiary nucleic acid molecule is independently a normal base, a modified nucleotide, or a nucleic acid analog. At least one, two, three, four, five, or six nucleotides in the first domain of a tertiary nucleic acid molecule can be modified nucleotides or nucleotide analogs. A typical ratio of modified nucleotides or nucleic acid analogs to normal bases in the first domain of a tertiary nucleic acid molecule is 1:2 to 1:8. Typical modified nucleotides or nucleic acid analogs useful in the first domain of a tertiary nucleic acid molecule are isoguanine and isocytosine. Using modified nucleotides or nucleic acid analogs (e.g., isoguanine and isocytosine) improves the binding efficiency and precision of the first domain of the tertiary nucleic acid molecule to appropriate complementary nucleic acid sequences in the second domain of the secondary nucleic acid molecule, while minimizing binding to other sites.
[0172] The reporter probe is coupled to a first detectable label and at least a second detectable label to produce a two-color combination. The two colors in this combination may be the same (e.g., blue-blue). In this specification, the term “label” includes a single moiety capable of generating a detectable signal, or multiple moieties capable of generating the same or substantially the same detectable signal. For example, a label may include a single yellow fluorescent dye (e.g., ALEXA FLUOR® 532) or multiple copies of a yellow fluorescent dye (e.g., ALEXA FLUOR® 532).
[0173] The reporter probe can be bound to a first detectable label and at least a second detectable label, each of which is one of four fluorescent dyes: blue (B), green (G), yellow (Y), and red (R). With these four dyes, there are 10 possible two-color combinations (BB; BG; BR; BY; GG; GR; GY; RR; RY; YY). In some aspects, as shown in Figure 3, the reporter probe of this disclosure is labeled with one of eight of these combinations, namely BB; BG; BR; BY; GG; GR; GY; and YY. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra.
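The count of 10 two-color combinations is the number of unordered pairs drawn from four dyes with repetition allowed, n(n+1)/2 = 4·5/2. A short enumeration with the Python standard library reproduces the list and the eight-combination subset used in Figure 3 (which omits RR and RY):

```python
from itertools import combinations_with_replacement

DYES = "BGRY"  # blue, green, red, yellow, in the order the combinations are listed


def two_color_combinations(dyes: str = DYES) -> list[str]:
    """Every unordered two-dye combination, repeats allowed (e.g. 'BB')."""
    return ["".join(pair) for pair in combinations_with_replacement(dyes, 2)]


all_combos = two_color_combinations()
# The subset listed for Figure 3 leaves out RR and RY.
figure3_subset = [c for c in all_combos if c not in {"RR", "RY"}]

print(len(all_combos), all_combos)
print(len(figure3_subset), figure3_subset)
```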
[0174] In aspects relating to sequencing probes and primary nucleic acid molecules, the present disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain; the target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid, at least six nucleotides within the target-binding domain capable of identifying corresponding (complementary) nucleotides in the target nucleic acid molecule, and at least two nucleotides within the target-binding domain not capable of identifying corresponding nucleotides in the target nucleic acid molecule; at least one or at least two of the six identifying nucleotides within the target-binding domain are modified nucleotides or nucleotide analogs; and the barcode domain comprises a synthetic skeleton and at least three attachment sites. A sequencing probe is also provided, wherein each attachment site includes at least one attachment region containing at least one nucleic acid sequence to which at least one complementary primary nucleic acid molecule is bound, the complementary primary nucleic acid molecule containing a first detectable label and at least a second detectable label; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the first detectable label and at least second detectable label of each complementary primary nucleic acid molecule bound to each of the at least three attachment sites determine the position and identity of the two corresponding nucleotides, among the at least six, in the target nucleic acid to which the target-binding domain is bound. 
The at least two nucleotides in the target-binding domain that do not identify corresponding nucleotides in the target nucleic acid molecule can be, independently of the at least six identifying nucleotides, any of the four normal bases (without specificity for the target), universal bases, or degenerate bases.
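Because each of the three attachment sites reports two of the six identifying nucleotides, the six positions can be reassembled from three per-site readouts. The sketch below assumes, hypothetically, that attachment site k reports identifying positions 2k and 2k+1, and that each site's color code has already been translated into a dinucleotide call; the disclosure does not specify the color-to-dinucleotide mapping, so the function takes dinucleotides directly. The flanking `N` characters stand for the non-identifying positions in a hypothetical 8-nucleotide layout.

```python
# Hypothetical assignment: attachment site k reports identifying positions 2k and 2k+1.
SITE_TO_POSITIONS = {0: (0, 1), 1: (2, 3), 2: (4, 5)}


def assemble_hexamer(site_calls: dict[int, str]) -> str:
    """Combine three dinucleotide calls (one per attachment site) into six bases."""
    if set(site_calls) != set(SITE_TO_POSITIONS):
        raise ValueError("expected exactly one call for each of the three attachment sites")
    bases = [""] * 6
    for site, dinucleotide in site_calls.items():
        if len(dinucleotide) != 2 or any(b not in "ACGT" for b in dinucleotide):
            raise ValueError(f"site {site}: expected two bases from ACGT, got {dinucleotide!r}")
        first, second = SITE_TO_POSITIONS[site]
        # The site index, not the order of readout, fixes where the two bases go.
        bases[first], bases[second] = dinucleotide[0], dinucleotide[1]
    return "".join(bases)


hexamer = assemble_hexamer({0: "AC", 2: "TT", 1: "GA"})
print(hexamer)              # ACGATT
print("N" + hexamer + "N")  # NACGATTN (hypothetical 8-nt layout with N flanks)
```

Keying the calls by site index mirrors the statement that each site's labels determine both the position and the identity of its two nucleotides.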
[0175] In aspects relating to sequencing probes and primary nucleic acid molecules, the present disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain; the target-binding domain comprises at least 10 nucleotides capable of binding to a target nucleic acid, at least 6 nucleotides within the target-binding domain capable of identifying corresponding (complementary) nucleotides in the target nucleic acid molecule, and at least 4 nucleotides within the target-binding domain not capable of identifying corresponding nucleotides in the target nucleic acid molecule; the barcode domain comprises a synthetic skeleton and at least 3 attachment sites, each attachment site capable of binding to at least one complementary primary nucleic acid molecule. A sequencing probe is also provided, comprising at least one attachment region containing a nucleic acid sequence, wherein the complementary primary nucleic acid molecule comprises a first detectable label and at least a second detectable label; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the first detectable label and at least second detectable label of each complementary primary nucleic acid molecule bound to each of the at least three attachment sites determine the position and identity of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain is bound.
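The split between identifying and non-identifying positions determines how large a probe library has to be. The arithmetic below assumes, hypothetically, one probe per possible sequence of the six identifying nucleotides. If the non-identifying positions are degenerate bases (synthesized as a mixture of all four bases), each probe is a pool of 4^k oligo species for k such positions; if they are universal bases, each probe is a single species.

```python
ALPHABET_SIZE = 4  # A, C, G, T


def library_size(identifying: int, non_identifying: int,
                 degenerate: bool = True) -> tuple[int, int, int]:
    """Return (distinct probes, oligo species per probe, total oligo species).

    Assumes one probe per possible identifying sequence. Degenerate
    non-identifying positions multiply the species per probe by 4 each;
    universal bases leave each probe as a single species.
    """
    distinct_probes = ALPHABET_SIZE ** identifying
    species_per_probe = ALPHABET_SIZE ** non_identifying if degenerate else 1
    return distinct_probes, species_per_probe, distinct_probes * species_per_probe


print(library_size(6, 4))                    # (4096, 256, 1048576): 10-nt, degenerate
print(library_size(6, 4, degenerate=False))  # (4096, 1, 4096): 10-nt, universal bases
print(library_size(6, 2))                    # (4096, 16, 65536): 8-nt, degenerate
```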
[0176] In aspects including a sequencing probe, a primary nucleic acid molecule, and a secondary nucleic acid molecule, the present disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain; the target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid, at least six nucleotides within the target-binding domain capable of identifying a corresponding (complementary) nucleotide in the target nucleic acid molecule, and at least two nucleotides within the target-binding domain not capable of identifying a corresponding nucleotide in the target nucleic acid molecule; at least one or at least two of the six identifying nucleotides are modified nucleotides or nucleotide analogs, and at least one or at least two of the non-identifying nucleotides are, independently of the identifying nucleotides, any of the four canonical bases (without specificity for the target), a universal base, or a degenerate base. 
A sequencing probe is also provided, wherein the barcode domain comprises a synthetic skeleton and at least three attachment sites, each attachment site comprising at least one attachment region comprising at least one nucleic acid sequence to which at least one complementary primary nucleic acid molecule is bound, the complementary primary nucleic acid molecule being further bound to at least one complementary secondary nucleic acid molecule comprising a first detectable label and at least a second detectable label; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the first detectable label and at least second detectable label of each complementary secondary nucleic acid molecule bound to each of the at least three attachment sites determine the position and identity of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain is bound.
[0177] In aspects including a sequencing probe, a primary nucleic acid molecule, and a secondary nucleic acid molecule, the present disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain; the target-binding domain comprises at least 10 nucleotides capable of binding to a target nucleic acid, at least 6 nucleotides within the target-binding domain capable of identifying the corresponding (complementary) nucleotide in the target nucleic acid molecule, and at least 4 nucleotides within the target-binding domain not capable of identifying the corresponding nucleotide in the target nucleic acid molecule; the barcode domain comprises a synthetic skeleton and at least 3 attachment sites, each attachment site comprising at least 1 nucleic acid sequence to which at least 1 complementary primary nucleic acid molecule binds. A sequencing probe is also provided, comprising at least one attachment region, wherein the complementary primary nucleic acid molecule is further bound to at least one complementary secondary nucleic acid molecule comprising a first detectable label and at least a second detectable label; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the first detectable label and at least second detectable label of each complementary secondary nucleic acid molecule bound to each of the at least three attachment sites determine the position and identity of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain is bound.
[0178] In aspects including a sequencing probe, a primary nucleic acid molecule, a secondary nucleic acid molecule, and a tertiary nucleic acid molecule, the present disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain; the target-binding domain comprises at least eight nucleotides capable of binding to a target nucleic acid, at least six nucleotides within the target-binding domain capable of identifying a corresponding (complementary) nucleotide in the target nucleic acid molecule, and at least two nucleotides within the target-binding domain not capable of identifying a corresponding nucleotide in the target nucleic acid molecule; at least one or at least two of the six identifying nucleotides are modified nucleotides or nucleotide analogs, and the at least two nucleotides within the target-binding domain that do not identify a corresponding nucleotide in the target nucleic acid molecule may be, independently of the at least six identifying nucleotides, any of the four normal bases (without specificity for the target), universal bases, or degenerate bases. 
A sequencing probe is also provided, wherein the barcode domain comprises a synthetic skeleton and at least three attachment sites, each attachment site comprising at least one attachment region comprising at least one nucleic acid sequence to which at least one complementary primary nucleic acid molecule is bound; the complementary primary nucleic acid molecule is further bound to at least one complementary secondary nucleic acid molecule, which is in turn bound to at least one complementary tertiary nucleic acid molecule comprising a first detectable label and at least a second detectable label; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the first detectable label and at least second detectable label of each complementary tertiary nucleic acid molecule bound to each of the at least three attachment sites determine the position and identity of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain is bound.
[0179] In aspects including a sequencing probe, a primary nucleic acid molecule, a secondary nucleic acid molecule, and a tertiary nucleic acid molecule, the present disclosure provides a sequencing probe comprising a target-binding domain and a barcode domain; the target-binding domain comprises at least 10 nucleotides capable of binding to a target nucleic acid, at least 6 nucleotides within the target-binding domain capable of identifying a corresponding (complementary) nucleotide in the target nucleic acid molecule, and at least 4 nucleotides within the target-binding domain not capable of identifying a corresponding nucleotide in the target nucleic acid molecule; the barcode domain comprises a synthetic skeleton and at least 3 attachment sites, each attachment site comprising at least 1 attachment region comprising at least 1 nucleic acid sequence to which at least 1 complementary primary nucleic acid molecule binds. A sequencing probe is also provided, wherein the complementary primary nucleic acid molecule is further bound to at least one complementary secondary nucleic acid molecule, which is in turn bound to at least one complementary tertiary nucleic acid molecule containing a first detectable label and at least a second detectable label; each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; and the first detectable label and at least second detectable label of each complementary tertiary nucleic acid molecule bound to each of the at least three attachment sites determine the position and identity of the corresponding two of the at least six nucleotides in the target nucleic acid to which the target-binding domain is bound.
[0180] This disclosure also provides a sequencing probe and a reporter probe in which detectable labels are located on both a secondary nucleic acid molecule and a tertiary nucleic acid molecule. For example, a secondary nucleic acid molecule that binds to a primary nucleic acid molecule may itself carry a first detectable label and at least a second detectable label, and may also bind at least one tertiary nucleic acid molecule containing a first detectable label and at least a second detectable label. The first detectable label and at least second detectable label on the secondary nucleic acid molecule may have the same emission spectrum or different emission spectra. Likewise, the first detectable label and at least second detectable label on the tertiary nucleic acid molecule may have the same emission spectrum or different emission spectra. The emission spectra of the labels on the secondary nucleic acid molecule may be the same as those of the labels on the tertiary nucleic acid molecule.
[0181] Figure 4 is a schematic diagram of an example of a reporter probe of this disclosure, including an example of a primary nucleic acid molecule, a secondary nucleic acid molecule, and a tertiary nucleic acid molecule. The primary nucleic acid molecule contains, at its 3' end, a first domain with a 12-nucleotide sequence that hybridizes to a complementary attachment region within one attachment site of the barcode domain of the sequencing probe. Its 5' end contains a second domain that hybridizes to six secondary nucleic acid molecules. The illustrated secondary nucleic acid molecule contains, at its 5' end, a first domain that hybridizes to the primary nucleic acid molecule and, at its 3' end, a second domain that hybridizes to five tertiary nucleic acid molecules.
[0182] A tertiary nucleic acid molecule contains at least two domains. The first domain can bind to a secondary nucleic acid molecule. The second domain of the tertiary nucleic acid can be bound to a first detectable label and at least a second detectable label, for example by directly incorporating one or more fluorescently labeled nucleotide monomers into the sequence of that second domain. The second domain of the tertiary nucleic acid can also be bound to a first detectable label and at least a second detectable label by hybridizing labeled short polynucleotides to it. These short polynucleotides (referred to as "labeled oligos") can be labeled by directly incorporating fluorescently labeled nucleotide monomers or by other nucleic acid labeling methods known to those skilled in the art. The tertiary nucleic acid illustrated in Figure 4 can be considered a "labeled oligonucleotide": it contains a first domain that hybridizes to a secondary nucleic acid molecule and a second domain that is fluorescently labeled, for example, by indirectly attaching the label during oligonucleotide synthesis using NHS chemistry, or by incorporating one or more fluorescently labeled nucleotide monomers during synthesis of the tertiary nucleic acid molecule. The labeled oligonucleotide can be DNA, RNA, or PNA.
[0183] Alternatively, the second domain of a secondary nucleic acid molecule can be bound to a first detectable label and at least a second detectable label. The second domain of a secondary nucleic acid molecule can be bound to a first detectable label and at least a second detectable label by directly incorporating one or more fluorescently labeled nucleotide monomers into the sequence of the second domain of the secondary nucleic acid. The second domain of a secondary nucleic acid molecule can be bound to a first detectable label and at least a second detectable label by hybridizing a labeled short polynucleotide to the second domain of the secondary nucleic acid. These short polynucleotides (referred to as "labeled oligos") can be labeled by directly incorporating fluorescently labeled nucleotide monomers or by other nucleic acid labeling methods known to those skilled in the art.
[0184] A primary nucleic acid molecule can contain approximately 100, 95, 90, 85, 80, or 75 nucleotides. A primary nucleic acid molecule can contain approximately 80 to 100 nucleotides. A primary nucleic acid molecule can contain approximately 90 nucleotides. A secondary nucleic acid molecule can contain approximately 90, 85, 80, 75, or 70 nucleotides. A secondary nucleic acid molecule can contain approximately 80 to 90 nucleotides. A secondary nucleic acid molecule can contain approximately 87 nucleotides. A tertiary nucleic acid molecule can contain approximately 25, 20, 15, or 10 nucleotides. A tertiary nucleic acid molecule can contain approximately 10 to 20 nucleotides. A tertiary nucleic acid molecule can contain approximately 15 nucleotides.
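A strand's length can be treated as a simple budget: the domain that binds upstream, an optional spacer, and one binding site per downstream molecule. The split used below is hypothetical and is not stated in the disclosure: a 12-nucleotide first domain (the length given for the primary molecule's first domain) plus five 15-nucleotide sites, one per tertiary molecule as in Figure 4. It is shown only because it is one decomposition consistent with the stated secondary length of approximately 87 nucleotides.

```python
def strand_length(first_domain: int, n_sites: int, site_length: int, spacer: int = 0) -> int:
    """Total length = upstream-binding domain + optional spacer + downstream binding sites."""
    return first_domain + spacer + n_sites * site_length


# Hypothetical secondary strand: 12-nt first domain + five 15-nt tertiary-binding sites.
print(strand_length(12, 5, 15))  # 87
```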
[0185] The reporter probes of this disclosure can be designed in various ways. For example, a primary nucleic acid molecule can be hybridized to at least one secondary nucleic acid molecule (e.g., one, two, three, four, five, six, seven, eight, nine, ten, or more). Each secondary nucleic acid molecule can be hybridized to at least one tertiary nucleic acid molecule (e.g., one, two, three, four, five, six, seven, eight, nine, ten, or more). To produce a reporter probe labeled with a specific two-color combination, a reporter probe can be designed that includes a secondary nucleic acid molecule, or a tertiary nucleic acid molecule, or a labeled oligo, or any combination of a secondary nucleic acid molecule, a tertiary nucleic acid molecule, and a labeled oligo, labeled with each of the two colors of the specific two-color combination. For example, Figure 4 shows a reporter probe of this disclosure containing a total of 30 dyes (15 dyes for color 1 and 15 dyes for color 2). To avoid color exchange or cross-hybridization between different fluorescent dyes, each tertiary nucleic acid or labeled oligo bound to a specific label or fluorescent dye contains its own unique nucleotide sequence.
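The branching hierarchy (primary, then secondaries, then tertiaries or labeled oligos) is naturally a tree, and the total dye count per color is a recursive sum over it. The sketch below assumes one dye per tertiary molecule and, hypothetically, that three of Figure 4's six secondaries carry color 1 and three carry color 2; the disclosure states only the totals (30 dyes, 15 per color). The class and function names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Oligo:
    """One strand of a reporter probe: its own dyes plus the strands hybridized to it."""
    dyes: list[str] = field(default_factory=list)
    children: list["Oligo"] = field(default_factory=list)

    def dye_counts(self) -> Counter:
        """Recursively count dyes on this strand and everything bound below it."""
        counts = Counter(self.dyes)
        for child in self.children:
            counts += child.dye_counts()
        return counts


def build_reporter(secondary_colors: list[str], tertiaries_per_secondary: int) -> Oligo:
    """Primary -> secondaries -> tertiaries; each tertiary carries one dye of its secondary's color."""
    secondaries = [
        Oligo(children=[Oligo(dyes=[color]) for _ in range(tertiaries_per_secondary)])
        for color in secondary_colors
    ]
    return Oligo(children=secondaries)  # the primary itself is unlabeled here


# Figure 4-like layout: six secondaries x five tertiaries, split 3/3 between two colors.
fig4 = build_reporter(["color1"] * 3 + ["color2"] * 3, 5)
print(fig4.dye_counts())
```

A tree rather than a fixed formula lets mixed designs, such as branched tertiaries carrying their own labeled oligos, be counted with the same function.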
[0186] Figure 5 shows four examples of reporter probes of the present disclosure. The upper left diagram of Figure 5 shows a 5×5 reporter probe. The 5×5 probe contains a primary nucleic acid, which contains a first domain consisting of 12 nucleotides. The primary nucleic acid also contains a second domain, which contains a nucleotide sequence that can hybridize to five secondary nucleic acid molecules. Each secondary nucleic acid molecule contains a nucleotide sequence that allows five tertiary nucleic acids, bound to a detectable label, to hybridize to the respective secondary nucleic acid.
[0187] The upper right diagram in Figure 5 shows a 4×3 reporter probe. The 4×3 reporter probe contains a primary nucleic acid, which has a first domain consisting of 12 nucleotides. The primary nucleic acid also contains a second domain, which has a nucleotide sequence that can hybridize to four secondary nucleic acid molecules. Each secondary nucleic acid molecule has a nucleotide sequence that allows three tertiary nucleic acid molecules, to which a detectable label is bound, to hybridize to the respective secondary nucleic acid molecule.
[0188] The lower left diagram in Figure 5 shows a 3×4 reporter probe. The 3×4 reporter probe contains a primary nucleic acid, which has a first domain consisting of 12 nucleotides. The primary nucleic acid also contains a second domain, which has a nucleotide sequence that can hybridize to three secondary nucleic acid molecules. Each secondary nucleic acid molecule has a nucleotide sequence that allows four tertiary nucleic acid molecules, to which a detectable label is bound, to hybridize to the respective secondary nucleic acid molecule.
[0189] The lower right diagram in Figure 5 shows a spacer 3×4 reporter probe. The spacer 3×4 reporter probe contains a primary nucleic acid, which contains a first domain consisting of 12 nucleotides. Between the first and second domains of the primary nucleic acid lies a spacer region of 20 to 40 nucleotides. Although the spacer is described here as 20 to 40 nucleotides long, its length is not limited; it can consist of fewer than 20 or more than 40 nucleotides. The second domain of the primary nucleic acid contains a nucleotide sequence that can hybridize to three secondary nucleic acid molecules. Each secondary nucleic acid contains a nucleotide sequence that allows four tertiary nucleic acid molecules, to which detectable labels are bound, to hybridize to the respective secondary nucleic acid molecule.
[0190] In Figure 5, each primary nucleic acid contains a first domain with a length of 12 nucleotides. However, there is no limit to the length of the first domain of a primary nucleic acid; it can have fewer than 12 nucleotides or more than 12 nucleotides. Preferably, the first domain of a primary nucleic acid has 14 nucleotides.
[0191] Any feature of one specific reporter probe design shown in the individual figures of Figure 5 can be combined with any feature of a reporter probe design shown in another figure of Figure 5 or described elsewhere in this specification. For example, a 5×5 reporter probe can be modified to include a spacer region consisting of approximately 20 to 40 nucleotides between the complementary nucleic acid and the primary nucleic acid. In another example, a 4×3 reporter probe can be modified to create a 4×5 reporter probe by including nucleotide sequences in the four secondary nucleic acids that allow five tertiary nucleic acids bound to a detectable label to hybridize to each of the secondary nucleic acids.
[0192] Referring to Figure 5, the fluorescence intensity of the 5×5 reporter will be higher than that of the 4×3 reporter because it contains more fluorescent labels (25 versus 12). The fluorescence detected in a given field of view (FOV) is a function of several variables, including the fluorescence intensity of a given reporter probe and, in some cases, the number of bound target molecules within the FOV. In some cases, the number of bound target molecules per FOV can range from 1 million to 2.5 million. Typical numbers of bound target molecules per FOV are 20,000–40,000, 220,000–440,000, and 1 million–2 million. A typical FOV is 0.05 mm² to 1 mm²; another typical FOV is 0.05 mm² to 0.65 mm².
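Two quantities in this paragraph lend themselves to quick arithmetic: the relative brightness of two reporter designs and the surface density of bound targets in an FOV. The sketch assumes every dye contributes equally to the signal (no quenching or energy transfer), and the pairings of molecule counts with FOV areas are hypothetical combinations of the ranges given above, not reported measurements.

```python
def relative_brightness(labels_a: int, labels_b: int) -> float:
    """Per-reporter signal ratio, assuming every dye contributes equally."""
    return labels_a / labels_b


def molecules_per_um2(n_molecules: int, fov_mm2: float) -> float:
    """Surface density of bound targets; 1 mm^2 = 1,000,000 um^2."""
    return n_molecules / (fov_mm2 * 1_000_000)


print(relative_brightness(25, 12))        # 5x5 vs 4x3 reporter, about 2.08
print(molecules_per_um2(1_000_000, 1.0))  # hypothetical pairing: 1.0 per um^2
print(molecules_per_um2(20_000, 0.05))    # hypothetical pairing: about 0.4 per um^2
```

The reciprocal of the density gives the average area available to each molecule, which is one way to judge whether neighboring reporters are likely to be resolved as separate spots.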
[0193] Figure 6 shows a reporter probe design in which the secondary nucleic acid molecule includes an "additional handle." This "additional handle" does not hybridize to a tertiary nucleic acid molecule and is distal to the primary nucleic acid molecule. In Figure 6, each "additional handle" is 12 nucleotides long ("dodecamer"), but its length is not limited; it can be shorter or longer than 12 nucleotides. Each "additional handle" can contain the same nucleotide sequence as the first domain of the primary nucleic acid molecule, i.e., the sequence that hybridizes to an attachment region of the sequencing probe. Therefore, when the reporter probe contains an "additional handle," it can hybridize to the sequencing probe either via the first domain of the primary nucleic acid molecule or via the "additional handle." This increases the likelihood that the reporter probe will bind to the sequencing probe, and the "additional handle" design can also improve the hybridization reaction rate. Without being bound by theory, the "additional handle" can increase the effective concentration of the complementary nucleic acid in the reporter probe. Using a 5×4 "additional handle" reporter probe, approximately 4750 fluorescence counts per standard FOV are expected. Using a 5×3, 4×4, 4×3, or 3×4 "additional handle" reporter probe, approximately 6000 fluorescence counts per standard FOV are expected. The example reporter probe designs shown in Figure 5 can also be modified to include "additional handles."
[0194] Each secondary nucleic acid molecule in a reporter probe can hybridize to multiple tertiary nucleic acid molecules, all labeled with the same detectable label. For example, the left panel of Figure 7 shows a "5×6" reporter probe. The 5×6 reporter probe contains one primary nucleic acid including a second domain, the second domain containing nucleotide sequences that hybridize to six secondary nucleic acid molecules. Each secondary nucleic acid contains a nucleotide sequence that allows five tertiary nucleic acid molecules, each bound with a detectable label, to hybridize to that secondary nucleic acid molecule. Each of the five tertiary nucleic acid molecules that bind to a particular secondary nucleic acid molecule is labeled with the same detectable label. For example, three of the secondary nucleic acid molecules are bound to tertiary nucleic acid molecules labeled with a yellow fluorescent dye, and the remaining three are bound to tertiary nucleic acid molecules labeled with a red fluorescent dye.
[0195] Each secondary nucleic acid molecule in a reporter probe can hybridize to tertiary nucleic acid molecules labeled with different detectable labels. For example, the center diagram in Figure 7 shows a "3×2×6" reporter probe design. The 3×2×6 reporter probe contains one primary nucleic acid including a second domain, the second domain containing nucleotide sequences that hybridize to six secondary nucleic acid molecules. Each secondary nucleic acid contains a nucleotide sequence that allows five tertiary nucleic acid molecules, each bound to a detectable label, to hybridize to it. Each secondary nucleic acid binds both tertiary nucleic acid molecules labeled with a yellow fluorescent dye and tertiary nucleic acid molecules labeled with a red fluorescent dye. In this particular example, three secondary nucleic acid molecules each bind two red and three yellow tertiary nucleic acid molecules, while the remaining three secondary nucleic acid molecules each bind three red and two yellow tertiary nucleic acid molecules. In general, each secondary nucleic acid molecule can bind any number of tertiary nucleic acid molecules labeled with different detectable labels. In the center diagram of Figure 7, the tertiary nucleic acid molecules bound to each secondary nucleic acid molecule are arranged so that the label colors alternate (i.e., red-yellow-red-yellow-red or yellow-red-yellow-red-yellow).
[0196] In any of the reporter probe designs described, tertiary nucleic acids labeled with different detectable labels can be arranged in any order along the secondary nucleic acids. For example, the right-hand panel of Figure 7 shows a "FRET-resistant 3×2×6" reporter probe similar to the 3×2×6 reporter probe design, differing only in the arrangement (e.g., linear order or grouping) of the red and yellow tertiary nucleic acid molecules along each secondary nucleic acid molecule.
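Reordering the tertiary molecules along each secondary changes which dyes sit next to each other without changing the color totals. The sketch encodes each secondary as a string of five labels, using the alternating orders named for the 3×2×6 design (three strands of each). The grouped order is hypothetical, since the exact arrangement in the FRET-resistant design is not spelled out, and counting adjacent red-yellow pairs is only a rough proxy, assumed here, for opportunities for energy transfer between the two dyes.

```python
from collections import Counter

# Per-secondary label order (five tertiaries each); R = red, Y = yellow.
ALTERNATING = ["RYRYR"] * 3 + ["YRYRY"] * 3  # alternating orders, as described for 3x2x6
GROUPED = ["RRRYY"] * 3 + ["YYYRR"] * 3      # hypothetical grouped ("FRET-resistant") order


def color_totals(design: list[str]) -> Counter:
    """Total number of dyes of each color across all secondaries."""
    return Counter("".join(design))


def mixed_neighbor_pairs(design: list[str]) -> int:
    """Adjacent red-yellow pairs along each secondary (rough proxy for FRET opportunities)."""
    return sum(a != b for strand in design for a, b in zip(strand, strand[1:]))


for name, design in [("alternating", ALTERNATING), ("grouped", GROUPED)]:
    print(name, dict(color_totals(design)), mixed_neighbor_pairs(design))
```

Both layouts carry 15 red and 15 yellow dyes; only the number of mixed neighbors differs (24 versus 6 under these assumed orders).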
[0197] Figure 8 shows yet another reporter probe design of this disclosure, in which individual secondary nucleic acid molecules bind different kinds of tertiary nucleic acid molecules. The left figure shows a “6×1×4.5” reporter probe containing one primary nucleic acid molecule whose second domain contains nucleotide sequences that hybridize to six secondary nucleic acid molecules. Each secondary nucleic acid molecule hybridizes to five tertiary nucleic acid molecules. Four of the five tertiary nucleic acid molecules hybridized to each secondary nucleic acid molecule are directly labeled with a detectable label of the same color. The fifth tertiary nucleic acid molecule (referred to as a branched tertiary nucleic acid) is bound to five labeled oligos carrying the other color of the two-color combination. Three of the six secondary nucleic acids are bound to branched tertiary nucleic acids labeled with one color of the two-color combination (red in this example), while the remaining three are bound to branched tertiary nucleic acids labeled with the other color (yellow in this example). In total, the 6×1×4.5 reporter probe is labeled with 54 dyes, 27 of each color. The center figure in Figure 8 shows a "4×1×4.5" reporter probe. This reporter probe shares the overall architecture of the 6×1×4.5 reporter probe but differs in that the primary nucleic acid binds only four secondary nucleic acids, resulting in a total of 36 dyes, 18 of each color.
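The stated totals for the branched designs follow from per-secondary bookkeeping: four directly labeled tertiaries of one color plus a branched tertiary carrying five labeled oligos of the other color gives nine dyes per secondary, with half of the secondaries having a red branch and half a yellow branch. The sketch assumes one dye per directly labeled tertiary and one per labeled oligo; the function and parameter names are illustrative.

```python
from collections import Counter


def branched_reporter(n_secondaries: int, direct_per_secondary: int = 4,
                      oligos_per_branch: int = 5,
                      colors: tuple[str, str] = ("R", "Y")) -> Counter:
    """Dye counts when each secondary carries direct tertiaries of one color plus a
    branched tertiary whose labeled oligos carry the other color; half of the
    secondaries have a branch of the first color, half of the second."""
    if n_secondaries % 2:
        raise ValueError("expected an even number of secondaries")
    first, second = colors
    counts = Counter()
    for i in range(n_secondaries):
        branch_color = first if i < n_secondaries // 2 else second
        direct_color = second if branch_color == first else first
        counts[direct_color] += direct_per_secondary
        counts[branch_color] += oligos_per_branch
    return counts


print(branched_reporter(6))  # 6x1x4.5: 27 red, 27 yellow
print(branched_reporter(4))  # 4x1x4.5: 18 red, 18 yellow
```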
[0198] A reporter probe can contain an equal number of dyes for each color in a two-color combination. Alternatively, a reporter probe can contain different numbers of dyes for each color. The choice of which color to use in greater number within a single reporter probe can be based on the energy of the light absorbed by the two dyes. For example, the right-hand diagram in Figure 8 shows a "5×5 energy optimized" reporter probe design, which contains 15 yellow dyes (higher energy) and 10 red dyes (lower energy). In this example, the 15 yellow dyes can constitute the first detectable label, and the 10 red dyes can constitute the second detectable label.
[0199] A detectable moiety, label, or reporter can be attached to a secondary nucleic acid molecule, a tertiary nucleic acid molecule, or a labeled oligo in various ways, including direct or indirect attachment of the detectable moiety (fluorescent moiety, colorimetric moiety, etc.). Those skilled in the art can refer to references on labeling nucleic acids. Non-limiting examples of fluorescent moieties include yellow fluorescent protein (YFP), green fluorescent protein (GFP), cyan fluorescent protein (CFP), red fluorescent protein (RFP), umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, cyanine, dansyl chloride, phycocyanin, and phycoerythrin.
[0200] Fluorescent labeling and the attachment of such fluorescent labels to nucleotides and/or oligonucleotides are described in numerous publications, including Haugland, *Handbook of Fluorescent Probes and Research Chemicals*, 9th edition (Molecular Probes, Inc., Eugene, 2002); Keller and Manak, *DNA Probes*, 2nd edition (Stockton Press, New York, 1993); Eckstein (ed.), *Oligonucleotides and Analogues: A Practical Approach* (IRL Press, Oxford, 1991); and Wetmur, *Critical Reviews in Biochemistry and Molecular Biology*, Vol. 26: pp. 227-259 (1991). Examples of specific methods applicable to this disclosure are disclosed in U.S. Patent Nos. 4,757,141; 5,151,507; and 5,091,519. One or more fluorescent dyes can be used as labels for labeled target sequences, as disclosed, for example, in U.S. Patent Nos. 5,188,934 (4,7-dichlorofluorescein dyes); 5,366,860 (spectrally resolvable rhodamine dyes); 5,847,162 (4,7-dichlororhodamine dyes); 4,318,846 (ether-substituted fluorescein dyes); 5,800,996 (energy transfer dyes); Lee et al., 5,066,580 (xanthene dyes); and 5,688,648 (energy transfer dyes). Labeling can also be done using quantum dots, as disclosed, for example, in U.S. Patent Nos. 6,322,901; 6,576,291; 6,423,551; 6,251,303; 6,319,426; 6,426,513; 6,444,143; 5,990,479; and 6,207,392, and in U.S. Patent Application Publication Nos. 2002/0045045 and 2003/0017264. In this specification, the term "fluorescent label" includes signaling moieties that convey information through the fluorescence absorption and/or fluorescence emission properties of one or more molecules. Such fluorescence properties include fluorescence intensity, fluorescence lifetime, emission spectral properties, and energy transfer.
[0201] Non-exclusive examples of commercially available fluorescent nucleotide analogs that are readily incorporated into nucleotide and/or oligonucleotide sequences include Cy3-dCTP, Cy3-dUTP, Cy5-dCTP, and Cy5-dUTP (Amersham Biosciences, Piscataway, New Jersey); and fluorescein-12-dUTP, tetramethylrhodamine-6-dUTP, TEXAS RED®-5-dUTP, CASCADE BLUE®-7-dUTP, BODIPY™ FL-14-dUTP, BODIPY™ TMR-14-dUTP, BODIPY™ TR-14-dUTP, RHODAMINE GREEN®-5-dUTP, OREGON GREEN® 488-5-dUTP, TEXAS RED®-12-dUTP, BODIPY™ 630/650-14-dUTP, BODIPY™ 650/665-14-dUTP, ALEXA FLUOR™ 488-5-dUTP, ALEXA FLUOR™ 532-5-dUTP, ALEXA FLUOR™ 568-5-dUTP, ALEXA FLUOR™ 594-5-dUTP, ALEXA FLUOR™ 546-14-dUTP, fluorescein-12-UTP, tetramethylrhodamine-6-UTP, TEXAS RED™-5-UTP, mCherry, CASCADE BLUE™-7-UTP, BODIPY™ FL-14-UTP, BODIPY™ TMR-14-UTP, BODIPY™ TR-14-UTP, RHODAMINE GREEN™-5-UTP, ALEXA FLUOR™ 488-5-UTP, and ALEXA FLUOR™ 546-14-UTP (Molecular Probes, Inc., Eugene, Oregon). Alternatively, the above fluorophores and others mentioned herein can be added during oligonucleotide synthesis, for example, by phosphoramidite or NHS chemistry. Protocols for the custom synthesis of nucleotides bearing other fluorophores are known in the art (see Henegariu et al. (2000) Nature Biotechnol. 18:345). 2-Aminopurine is a fluorescent base that can be incorporated directly into an oligonucleotide sequence during synthesis. Nucleic acids can also be prestained with intercalating dyes such as DAPI, YOYO-1, ethidium bromide, or cyanine dyes (e.g., SYBR Green).
[0202] Other fluorophores available for post-synthetic attachment include, but are not limited to, ALEXA FLUOR® 350, ALEXA FLUOR® 405, ALEXA FLUOR® 430, ALEXA FLUOR® 532, ALEXA FLUOR® 546, ALEXA FLUOR® 568, ALEXA FLUOR® 594, ALEXA FLUOR® 647, BODIPY 493/503, BODIPY FL, BODIPY R6G, BODIPY 530/550, BODIPY TMR, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY TR, BODIPY 630/650, BODIPY 650/665, Cascade Blue, Cascade Yellow, Dansyl, Lissamine Rhodamine B, Marina Blue, Oregon Green 488, Oregon Green 514, Pacific Blue, Pacific Orange, Rhodamine 6G, Rhodamine Green, Rhodamine Red, Tetramethylrhodamine, and Texas Red (available from Molecular Probes, Inc., Eugene, Oregon), and Cy2, Cy3, Cy3.5, Cy5, Cy5.5, and Cy7 (Amersham Biosciences, Piscataway, New Jersey). FRET tandem fluorophores can also be used; non-limiting examples include PerCP-Cy5.5, PE-Cy5, PE-Cy5.5, PE-Cy7, PE-Texas Red, APC-Cy7, PE-Alexa dyes (610, 647, 680), and APC-Alexa dyes.
[0203] Metallic particles such as silver or gold particles can be used to enhance signals from fluorescently labeled nucleotide and/or oligonucleotide sequences (Lakowicz et al. (2003) BioTechniques 34:62).
[0204] Other labels suitable for oligonucleotide sequences include fluorescein (FAM, FITC), digoxigenin, dinitrophenol (DNP), dansyl, biotin, bromodeoxyuridine (BrdU), hexahistidine (6×His), and phosphoamino acids (e.g., P-Tyr, P-Ser, P-Thr). The following hapten/antibody pairs can be used for detection: biotin/α-biotin, digoxigenin/α-digoxigenin, dinitrophenol (DNP)/α-DNP, and 5-carboxyfluorescein (FAM)/α-FAM. In each pair, the antibody is derivatized with a detectable label.
[0205] The detectable labels described herein can be spectrally resolvable. "Spectrally resolvable," with respect to a plurality of fluorescent labels, means that the fluorescence emission bands of the labels are sufficiently distinct, i.e., sufficiently non-overlapping, that the molecular tags to which the respective labels are attached can be distinguished, on the basis of the fluorescence signal generated by each label, by a standard photodetection system, such as one employing band-pass filters and photomultiplier tubes. Examples of such systems are described, for example, in U.S. Patent Nos. 4,230,558 and 4,811,218, or in Wheeless et al., pp. 21-76, in *Flow Cytometry: Instrumentation and Data Analysis* (Academic Press, New York, 1985). For organic dyes such as fluorescein and rhodamine, spectrally resolvable means that their emission maxima are at least 20 nm apart or, in another aspect, at least 40 nm apart. For chelated lanthanide compounds and quantum dots, spectrally resolvable means that their emission maxima are at least 10 nm apart or, in another aspect, at least 15 nm apart.
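The separation thresholds above amount to a pairwise check on emission maxima. The Python sketch below is illustrative only: the helper name and the example wavelengths are hypothetical, and it applies the 20 nm (organic dye) and 10 nm (lanthanide chelate or quantum dot) thresholds by default, with the stricter 40 nm and 15 nm alternatives available through the `min_sep_nm` override.

```python
from itertools import combinations

# Minimum spacing of emission maxima (nm) from the definition above.
# The stricter alternatives (40 nm and 15 nm) can be passed via min_sep_nm.
DEFAULT_MIN_SEP_NM = {
    "organic_dye": 20.0,         # e.g., fluorescein- or rhodamine-type dyes
    "lanthanide_or_qdot": 10.0,  # chelated lanthanides and quantum dots
}


def spectrally_resolvable(emission_maxima_nm, label_class="organic_dye",
                          min_sep_nm=None):
    """Return True if every pair of emission maxima is at least the
    required distance apart (hypothetical helper, for illustration)."""
    threshold = DEFAULT_MIN_SEP_NM[label_class] if min_sep_nm is None else min_sep_nm
    return all(abs(a - b) >= threshold
               for a, b in combinations(emission_maxima_nm, 2))


# Illustrative emission maxima only, not measured values for specific dyes.
print(spectrally_resolvable([520, 570, 670]))                   # True
print(spectrally_resolvable([520, 535]))                        # False
print(spectrally_resolvable([520, 535], "lanthanide_or_qdot"))  # True
```

A set of labels passes only if every pair clears the threshold, which is why the check runs over all combinations rather than adjacent wavelengths alone.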
[0206] The reporter probe can include one or more cleavable linker modifications. The one or more cleavable linker modifications can be located anywhere within the reporter probe. The cleavable linker modification can be located between the first domain and the second domain of the primary nucleic acid molecule of the reporter probe. FIG. 9 shows an example of a reporter probe of the present disclosure that includes a linker modification between the first domain and the second domain of the primary nucleic acid molecule. The cleavable linker modification can be present between the first domain and the second domain of the secondary nucleic acid molecule of the reporter probe. The cleavable linker modification can be present between the first domain and the second domain of the primary nucleic acid molecule and the secondary nucleic acid molecule of the reporter probe. The left diagram of FIG. 10 shows an example of a reporter probe of the present disclosure that includes a cleavable linker modification between the first domain and the second domain of the primary nucleic acid and between the first domain and the second domain of the secondary nucleic acid.
[0207] The cleavable linker modification can be a compound of formula (I):
[Formula (I): chemical structure not reproduced in this text]
[0208] In one aspect, R1 is C1-6 alkyl, preferably C1-3 alkyl (e.g., methyl, ethyl, propyl, or isopropyl); R2 is NH or N; R3 is a 5- or 6-membered cycloalkyl, preferably cyclohexyl; R4 is C1-6 alkylene, preferably C1-3 alkylene (e.g., methylene, ethylene, propylene, or isopropylene); R5 is a 5- or 6-membered heterocycle containing one nitrogen atom and zero or one additional heteroatom selected from N, O, and S, wherein the heterocycle is optionally substituted with one or two R10; R6 is O; R7 is C1-6 alkylene, preferably C1-3 alkylene (e.g., methylene, ethylene, propylene, or isopropylene); R8 is O; R9 is a 5- or 6-membered heterocycle containing one nitrogen atom and zero or one additional heteroatom selected from N, O, and S, wherein the heterocycle is optionally substituted with one or two R10; and each R10 is independently halogen, C1-6 alkyl, halo-C1-6 alkyl, oxo, -SO2H, or -SO3⁻.
[0209] In one aspect, R3 is cyclohexyl, R4 is methylene, R5 is 1H-pyrrole-2,5-dione, and R9 is pyrrolidine-2,5-dione, optionally substituted with -SO3⁻.
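[0210] The linker compound can be:
[Chemical structure not reproduced in this text]
[0211] The linker compound can be:
[Chemical structure not reproduced in this text]
[0212] The linker compound or linker modification can be:
[Chemical structure not reproduced in this text]
[0213] The linker compound or linker modification can be:
[Chemical structure not reproduced in this text]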
[0214] The reporter probe can be assembled by mixing three stock solutions with water: one containing the primary nucleic acid molecule, one containing the secondary nucleic acid molecule, and one containing the tertiary nucleic acid molecule. Table 2 shows example volumes of each stock solution that can be mixed to assemble the reporter probe of each design.
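Table 2's values are not reproduced here, but working out such a mix follows from C1 × V1 = C2 × V2 for each stock, with water making up the remaining volume. The sketch below uses hypothetical stock and final concentrations and a hypothetical helper name; it is not the recipe from Table 2.

```python
def mixing_volumes(stock_uM, final_uM, total_uL):
    """Volume (uL) of each stock plus water for a given final mix.

    Applies C1 * V1 = C2 * V2 to every component; water fills the remainder.
    Component names and concentrations are placeholders, not Table 2 values.
    """
    volumes = {name: final_uM[name] * total_uL / stock_uM[name]
               for name in final_uM}
    water = total_uL - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested mix")
    volumes["water"] = water
    return volumes


# Hypothetical concentrations for the three stock solutions.
stocks = {"primary": 10.0, "secondary": 20.0, "tertiary": 100.0}
finals = {"primary": 1.0, "secondary": 2.0, "tertiary": 10.0}
print(mixing_volumes(stocks, finals, total_uL=50.0))
# {'primary': 5.0, 'secondary': 5.0, 'tertiary': 5.0, 'water': 35.0}
```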
Table 2
[0215] Target nucleic acid
[0216] The present disclosure provides a method for sequencing nucleic acids using the sequencing probes of the present disclosure. A nucleic acid to be sequenced using the method of the present disclosure is referred to herein as a "target nucleic acid." The term "target nucleic acid" means a nucleic acid molecule (DNA, RNA, or PNA) whose sequence is determined by the probes, methods, and devices of the present disclosure. In general, the terms "target nucleic acid," "target nucleic acid molecule," "target nucleic acid sequence," "target nucleic acid fragment," "target oligonucleotide," and "target polynucleotide" are used interchangeably and refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Non-limiting examples of nucleic acids include genes, gene fragments, exons, introns, intergenic DNA (including, without limitation, heterochromatic DNA), messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, small interfering RNA (siRNA), non-coding RNA (ncRNA), cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. The identity and/or sequence of the target nucleic acid can be known before sequencing by the method of the present disclosure, or it can be unknown. Part of the sequence of the target nucleic acid can also be known beforehand; for example, the method can be used to determine point mutations in a known target nucleic acid molecule.
[0217] The methods of this disclosure preferably sequence nucleic acid molecules obtained from a sample (e.g., a sample from a biological organism) directly, without any conversion or amplification step. For example, for direct RNA sequencing, the methods do not require conversion of RNA molecules to DNA molecules (i.e., cDNA synthesis) before a sequence can be obtained. Because no amplification or conversion is required, the sequenced nucleic acids retain any unique bases and/or epigenetic marks present in the nucleic acid as it exists in, or is obtained from, the sample. Such unique bases and/or epigenetic marks are lost in sequencing methods known in the art.
[0218] The method of this disclosure enables sequencing at single-molecule resolution. In other words, the method of this disclosure allows users to generate a final sequence based on data recovered from a single target nucleic acid molecule, without the need to combine data from different target nucleic acid molecules, thus preserving all the unique characteristics of the particular target.
[0219] Target nucleic acids can be obtained from any sample or source of nucleic acids (e.g., any cell, tissue, or organism, an in vitro source, or chemical synthesis equipment) by any method recognized in the art; for example, they can be obtained from blood samples of clinical subjects. Target nucleic acids can be extracted, isolated, or purified from sources or samples using methods and kits well known in the art.
[0220] The target nucleic acid can be fragmented by any means known in the art. Fragmentation is preferably carried out by enzymatic or mechanical means. Mechanical means include sonication or physical shearing. Enzymatic means can be carried out by digestion using a nuclease (e.g., deoxyribonuclease I (DNase I)) or one or more restriction endonucleases.
[0221] When the nucleic acid molecule containing the target nucleic acid is a single complete chromosome, steps should be taken to avoid fragmenting the chromosome.
[0222] As is well known in the art, target nucleic acids can include natural or non-natural nucleotides, including modified nucleotides or nucleic acid analogs.
[0223] Target nucleic acid molecules can be DNA, RNA, or PNA molecules with lengths of up to several hundred kilobases or more (e.g., 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 200, or 500 kilobases, or longer).
[0224] Capture probe
[0225] The target nucleic acid can be immobilized on the substrate (for example, at one, two, three, four, five, six, seven, eight, nine, ten, or more positions).
[0226] Useful substrates include those containing binding moieties selected from the group consisting of ligands, antigens, carbohydrates, nucleic acids, receptors, lectins, and antibodies. A capture probe contains a substrate-binding moiety that can bind to the binding moiety on the substrate. Non-limiting examples of useful substrates containing reactive moieties include surfaces containing epoxy, aldehyde, gold, hydrazide, sulfhydryl, NHS ester, amine, alkyne, azide, thiol, carboxylate, maleimide, hydroxymethylphosphine, imidoester, isocyanate, hydroxyl, pentafluorophenyl ester, psoralen, pyridyl disulfide, vinyl sulfone, polyethylene glycol (PEG), hydrogel, or mixtures thereof. Such surfaces can be obtained from commercial sources or prepared by standard techniques. Non-limiting examples of commercially available substrates containing reactive moieties include OptArray-DNA NHS group (Accler8), Nexterion Slide AL (Schott), and Nexterion Slide E (Schott).
[0227] Any rigid support known in the art that can immobilize target nucleic acids (e.g., coated slides or microfluidic devices) can be used as a substrate. Possible substrates include surfaces, films, beads, porous materials, electrodes, and arrays. Examples of substrates include polymer materials, metals, silicon, glass, and quartz. Target nucleic acids can be immobilized on the surface of any substrate obvious to those skilled in the art.
[0228] When the substrate is an array, it may contain wells whose size and spacing vary depending on the target nucleic acid molecules to be attached. In one example, the substrate is configured to accommodate an ultra-high-density ordered array of target nucleic acids. Example densities of the target nucleic acid array on the substrate are 500,000 to 10,000,000, 1,000,000 to 4,000,000, or 850,000 to 3,500,000 target nucleic acid molecules per mm².
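To put these densities in perspective, 1 mm² is 10⁶ µm², so at a density of ρ molecules per mm² each molecule occupies 10⁶/ρ µm². The sketch below converts that area into a center-to-center spacing under the simplifying assumption of a square lattice; the helper name is hypothetical.

```python
import math


def lattice_pitch_um(molecules_per_mm2):
    """Center-to-center spacing (um) for molecules on a square lattice.

    1 mm^2 = 1e6 um^2, so each molecule occupies 1e6 / density um^2;
    the pitch of a square lattice is the square root of that area.
    """
    area_per_molecule_um2 = 1e6 / molecules_per_mm2
    return math.sqrt(area_per_molecule_um2)


for density in (500_000, 1_000_000, 4_000_000, 10_000_000):
    print(f"{density:>10,} per mm^2 -> {lattice_pitch_um(density):.3f} um")
```

Under this assumption the spacing runs from about 1.41 µm at 500,000 molecules per mm² down to about 0.32 µm at 10,000,000 molecules per mm².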
[0229] The wells in the substrate are the positions at which target nucleic acid molecules are attached. The well surfaces are functionalized with the reactive moieties described above, which attract and bind specific chemical groups present on the target nucleic acid molecule or on a capture probe bound to it, thereby immobilizing the target nucleic acid molecule. Such functional groups are well known to specifically attract and bind biomolecules through various conjugation chemistries.
[0230] To sequence single nucleic acid molecules on a substrate (such as an array), a universal capture probe, or a universal sequence complementary to the substrate-binding portion of the capture probe, is attached to each well. A single target nucleic acid molecule is then bound either to the universal capture probe or, via its bound capture probe, to the universal sequence complementary to the capture probe's substrate-binding portion, after which sequencing can begin.
[0231] One or more capture probes (e.g., two, three, four, five, six, seven, eight, nine, ten, or more) can be attached to a target nucleic acid molecule. Each capture probe contains a domain complementary to a portion of the target nucleic acid and a domain containing a substrate-binding portion. The portion of the target nucleic acid complementary to the capture probe can be a terminal portion or a portion not near a terminus.
[0232] Biotin can be used as the substrate-binding portion of the capture probe, with avidin (e.g., streptavidin) on the substrate. Commercially available avidin-containing substrates include TB0200 (Accler8), SAD6, SAD20, SAD100, SAD500, and SAD2000 (Xantec), SuperAvidin (Array-It), streptavidin slide (catalog number MPC 000, Xenopore), and STREPTAVIDINslide (catalog number 439003, Greiner Bio-One). Conversely, avidin (e.g., streptavidin) can be used as the substrate-binding portion of the capture probe, with biotin on the substrate. Non-exclusive examples of commercially available biotin-containing substrates include Optiarray-Biotin (Accler8), BD6, BD20, BD100, BD500, and BD2000 (Xantec).
[0233] The substrate-binding portion of the capture probe can be a reactive moiety that binds to the substrate upon photoactivation. The substrate may contain the photoreactive moiety, or the first portion of the nanoreporter may contain it. Examples of photoreactive moieties include aryl azides (such as N-((2-pyridyldithio)ethyl)-4-azidosalicylamide), fluorinated aryl azides (such as 4-azido-2,3,5,6-tetrafluorobenzoic acid), benzophenone-based reagents (such as succinimidyl 4-benzoylbenzoate), and 5-bromodeoxyuridine.
[0234] The substrate-binding portion of the capture probe can be a nucleic acid capable of hybridizing to a complementary binding portion on the substrate. Each nucleotide in the substrate-binding portion of the capture probe can independently be a standard nucleotide, a modified nucleotide, or a nucleic acid analog. At least one, at least two, at least three, at least four, at least five, or at least six nucleotides in the substrate-binding portion can be modified nucleotides or nucleic acid analogs. The ratio of modified nucleotides or nucleic acid analogs to standard nucleotides in the substrate-binding portion is typically 1:2 to 1:8. Typical modified nucleotides or nucleic acid analogs useful in the substrate-binding portion are isoguanine and isocytosine.
[0235] The substrate-binding portion of the capture probe can also be immobilized on the substrate via other binding pairs apparent to those skilled in the art. After binding to the substrate, the target nucleic acid can be elongated by applying a sufficient force (e.g., gravity, hydrodynamic force, electromagnetic force ("electrostretching"), flow-stretching, a receding meniscus technique, or a combination thereof). The capture probe may include, or be associated with, a detectable label that serves as a reference spot.
[0236] A second capture probe containing a domain complementary to a second portion of the target nucleic acid can be bound to the target nucleic acid. This second portion differs from the first portion bound by the first capture probe and can be a terminal or non-terminal portion of the target nucleic acid. The second capture probe can bind after or during elongation of the target nucleic acid, or to a target nucleic acid that has not finished elongating. The second capture probe can bind the substrate in any of the ways described above.
[0237] The target nucleic acid can likewise be bound by a third, fourth, fifth, sixth, seventh, eighth, ninth, or tenth capture probe, each including a domain complementary to a correspondingly numbered portion of the target nucleic acid. Each such portion can be at an end of the target nucleic acid or not near an end. Binding of these additional capture probes can occur after or during elongation of the target nucleic acid, or to a target nucleic acid that has not finished elongating, and each can bind the substrate as described above.
[0238] The capture probe can be used to isolate the target nucleic acid from a sample. The capture probe is added to the sample containing the target nucleic acid and binds to it through the capture-probe region complementary to a region of the target nucleic acid. When the resulting complex contacts a substrate to which the substrate-binding portion of the capture probe binds, the target nucleic acid becomes immobilized on the substrate surface.
[0239] Figure 11 illustrates the capture of a single target nucleic acid using a two-capture-probe system according to this disclosure. Genomic DNA is denatured at 95°C and hybridized to a pool of capture reagents containing three oligonucleotides: probe A, probe B, and an antisense blocking probe. Probe A contains a biotin moiety at its 3' end and a sequence complementary to the 5' end of the target nucleic acid. Probe B contains, at its 5' end, a purification binding sequence to which a paramagnetic bead can bind, and a nucleotide sequence complementary to the 3' end of the target nucleic acid. The antisense blocking probe contains a nucleotide sequence complementary to the antisense strand of the target nucleic acid to be sequenced. Hybridization with the capture reagents creates a sequencing window in the target nucleic acid between the hybridized probes A and B. The target nucleic acid is purified with a paramagnetic bead that binds to the 5' purification binding sequence of probe B; excess capture reagent and complementary antisense DNA strands are washed away, leaving the purified target nucleic acid. The purified target nucleic acid is then passed through a flow cell having a surface (such as streptavidin) that binds the biotin moiety of hybridized probe A, which attaches one end of the target nucleic acid to the flow-cell surface. To capture the other end, the target nucleic acid is flow-stretched and a biotinylated probe complementary to the purification binding sequence of probe B is added. Upon hybridizing to that sequence, the biotinylated probe binds the flow-cell surface, yielding a target nucleic acid that is elongated and attached to the flow-cell surface at both ends.
[0240] To reliably capture as many target nucleic acid molecules as possible from highly fragmented samples, it is useful to include multiple capture probes, each complementary to a different region of the target nucleic acid. For example, three pools of capture probes can be used: a first pool complementary to a region near the 5' end of the target nucleic acid, a second pool complementary to the central region, and a third pool complementary to a region near the 3' end. This generalizes to n regions of interest per target nucleic acid. In this example, the fragmented target nucleic acid in each pool binds to a capture probe that contains, or is bound to, a biotin tag. One-nth of the input sample (where n is the number of different regions in the target nucleic acid) is aliquoted into each pool chamber. The capture probes bind to the target nucleic acids of interest, which are then immobilized, via the biotin in the capture probes, on avidin molecules attached to a substrate. In some cases, the target nucleic acids are elongated, for example, by hydrodynamic or electrostatic force. To elongate and bind all n pools, or to maximize the number of fully elongated molecules, pool 1 (which captures the 5'-most region) can be elongated and bound first, followed by pool 2 (which captures the central region), and finally pool 3.
[0241] This disclosure enables users to capture and sequence multiple target nucleic acids simultaneously by hybridizing multiple capture probes to a sample containing a mixture of target nucleic acids. The multiple target nucleic acids can include groups of two or more nucleic acids having the same sequence, or groups of two or more nucleic acids not necessarily having the same sequence. Similarly, the multiple capture probes can include groups of two or more capture probes having the same sequence, or groups of two or more capture probes having different sequences. For example, multiple capture probes having the same sequence allow users to capture multiple target nucleic acids having the same sequence; sequencing these identical target nucleic acids yields high sequencing accuracy through data redundancy. In another example, a set of capture probes, each complementary to a different gene of interest, allows two or more specific genes of interest to be captured and sequenced simultaneously, enabling multiplexed sequencing of specific genes. Figure 12 shows results from an experiment in which a 100-target multi-cancer panel was captured from FFPE samples and detected by the method of this disclosure.
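The accuracy gain from redundancy can be pictured as a per-position vote across reads of the same sequence. The disclosure does not specify how redundant reads are combined; the majority-vote sketch below is an assumption, takes reads that are already aligned and of equal length, and reports N at tied positions. The helper name is hypothetical.

```python
from collections import Counter


def majority_consensus(reads):
    """Per-position majority vote over aligned, equal-length reads.

    Positions where the top two base calls are tied are reported as 'N'.
    """
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("expected one or more reads of equal length")
    consensus = []
    for column in zip(*reads):
        ranked = Counter(column).most_common()
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        consensus.append("N" if tied else ranked[0][0])
    return "".join(consensus)


# Three hypothetical reads of the same target; one has an error at position 3.
print(majority_consensus(["ACGTAC", "ACGTAC", "ACCTAC"]))  # ACGTAC
```

With three or more reads, a single erroneous call at a position is outvoted, which is the intuition behind the redundancy argument above.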
[0242] When sequencing of the entire target is desired, the number of separate capture probes required is inversely related to the size of the target nucleic acid fragments: highly fragmented target nucleic acids require more capture probes. For sample types containing highly fragmented and degraded target nucleic acids (e.g., formalin-fixed paraffin-embedded tissue), a large pool of capture probes may be useful. Conversely, for samples with long target nucleic acid fragments (e.g., isolated nucleic acids obtained in vitro), a single capture probe at the 5' end may suffice.
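The inverse relationship can be made concrete with a deliberately crude model, assumed here rather than taken from the disclosure: if each capture probe retrieves roughly one fragment, covering a region of length L with fragments of mean length f takes about ⌈L/f⌉ probes, and never fewer than one. The helper name is hypothetical.

```python
import math


def capture_probes_needed(region_nt, mean_fragment_nt):
    """Rough probe count under a one-probe-per-fragment tiling model
    (a hypothetical simplification, not a design rule from the disclosure)."""
    if region_nt <= 0 or mean_fragment_nt <= 0:
        raise ValueError("lengths must be positive")
    return max(1, math.ceil(region_nt / mean_fragment_nt))


print(capture_probes_needed(100_000, 200))    # 500: short, degraded fragments
print(capture_probes_needed(10_000, 50_000))  # 1: fragments longer than the region
```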
[0243] The region of a target nucleic acid between two capture probes, or between a capture probe and the end of the target nucleic acid, is called the "sequencing window." Figure 11 shows the sequencing window formed when one target nucleic acid is captured by two capture probes. The sequencing window is the portion of the target nucleic acid available for sequencing-probe binding. The minimum sequencing window is the length of the target-binding domain (e.g., 4 to 10 nucleotides), and the maximum sequencing window is nearly an entire chromosome.
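Given the positions the capture probes occupy on the target, the sequencing windows are the uncovered stretches long enough to host a target-binding domain. In this sketch the coordinate convention (0-based, half-open intervals), the helper name, and the example positions are assumptions for illustration.

```python
def sequencing_windows(target_len, occupied, binding_domain_len):
    """Return uncovered (start, end) intervals, 0-based and half-open.

    occupied: (start, end) footprints of capture probes on the target.
    Gaps shorter than the target-binding domain cannot host a sequencing
    probe and are dropped.
    """
    gaps, cursor = [], 0
    for start, end in sorted(occupied):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < target_len:
        gaps.append((cursor, target_len))
    return [(s, e) for s, e in gaps if e - s >= binding_domain_len]


# Hypothetical 1,000-nt target captured at both ends, as in the two-probe scheme.
print(sequencing_windows(1000, [(0, 30), (970, 1000)], binding_domain_len=8))
# [(30, 970)]
```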
[0244] When sequencing large target nucleic acid molecules by the method of this disclosure, the size of the sequencing window can be controlled by hybridizing one or more "blocker oligos" along the length of the target nucleic acid. The blocker oligos hybridize to the target nucleic acid at specific locations, preventing sequencing probes from binding there and thereby creating a smaller sequencing window of interest. Figure 13 is a schematic showing two captured target DNA molecules hybridized to capture probes, blocker oligos, and a sequencing probe. A smaller sequencing window restricts the sequencing reaction to specific regions of interest in the target DNA molecule, improving the speed and accuracy of sequencing. Blocker oligos are particularly useful for sequencing specific mutations at known locations within the target nucleic acid, because the entire target nucleic acid need not be sequenced. The example in Figure 13 shows sequencing targeted to two heterozygous sites to distinguish between two different haplotypes.
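Choosing where blocker oligos go is roughly the inverse problem: keep a short stretch open around each site of interest and block the rest of the window. The sketch below returns the intervals to block; the flank size, coordinates, and helper name are hypothetical, and tiling each blocked interval with individual oligos of a chosen length is not shown.

```python
def blocker_regions(window, sites, flank):
    """Intervals of a sequencing window to cover with blocker oligos so that
    only `flank` nt on either side of each site of interest stay open.

    window: (start, end), 0-based half-open; sites: positions inside it.
    Returns half-open (start, end) intervals to block.
    """
    w_start, w_end = window
    keep = sorted((max(w_start, p - flank), min(w_end, p + flank + 1))
                  for p in sites)
    blocked, cursor = [], w_start
    for start, end in keep:
        if start > cursor:
            blocked.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < w_end:
        blocked.append((cursor, w_end))
    return blocked


# Two hypothetical heterozygous sites inside a 30-970 window, 10-nt flanks.
print(blocker_regions((30, 970), [200, 600], flank=10))
# [(30, 190), (211, 590), (611, 970)]
```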
[0245] Methods of the disclosure
[0246] The sequencing method of this disclosure includes reversibly hybridizing at least one sequencing probe disclosed herein to a target nucleic acid.
[0247] A nucleic acid sequencing method can include (1) hybridizing a sequencing probe described herein to a target nucleic acid, which can optionally be immobilized on a substrate at one or more positions. An example sequencing probe includes a target-binding domain and a barcode domain. The target-binding domain contains at least eight nucleotides that hybridize to the target nucleic acid; at least six of these nucleotides identify corresponding nucleotides in the target nucleic acid molecule (e.g., where there are exactly six such nucleotides, they identify the six complementary nucleotides of the target molecule to which the target-binding domain hybridizes), and at least two nucleotides in the target-binding domain do not identify corresponding nucleotides in the target nucleic acid molecule. At least two of the at least six identifying nucleotides in the target-binding domain are modified nucleotides or nucleotide analogs. The barcode domain comprises a synthetic backbone and includes at least three attachment sites, each comprising at least one attachment region containing at least one nucleic acid sequence to which a complementary nucleic acid molecule can bind. Each of the at least three attachment sites corresponds to two of the at least six nucleotides in the target-binding domain and has a different nucleic acid sequence; the nucleic acid sequence at each attachment site determines the position and identity of the corresponding two nucleotides, among the at least six, in the target nucleic acid bound by the target-binding domain.
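The correspondence between the barcode domain and the target-binding domain can be sketched as a lookup, assuming that attachment site 1 reports identifying nucleotides 1 and 2, site 2 reports nucleotides 3 and 4, and site 3 reports nucleotides 5 and 6. The disclosure does not fix a color code; the one-color-per-base assignment below, in which the two labels on a reporter are read as an ordered pair (giving 4 × 4 = 16 codes, one per dinucleotide), is a hypothetical scheme, and the helper name is likewise hypothetical.

```python
# Hypothetical one-color-per-base code; the disclosure does not specify one.
BASE_TO_COLOR = {"A": "blue", "C": "green", "G": "yellow", "T": "red"}


def barcode_for_hexamer(hexamer):
    """Return the (first_label, second_label) colors expected at each of the
    three attachment sites for the six target nucleotides a probe identifies,
    assuming site k reports nucleotides 2k-1 and 2k (1-based)."""
    hexamer = hexamer.upper()
    if len(hexamer) != 6 or not set(hexamer) <= set(BASE_TO_COLOR):
        raise ValueError("expected exactly six nucleotides from A, C, G, T")
    return [(BASE_TO_COLOR[hexamer[i]], BASE_TO_COLOR[hexamer[i + 1]])
            for i in range(0, 6, 2)]


print(barcode_for_hexamer("ACGTTA"))
# [('blue', 'green'), ('yellow', 'red'), ('red', 'blue')]
```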
[0248] The method further includes (2) binding a first complementary nucleic acid molecule containing a first detectable label and at least a second detectable label to a first attachment site among the at least three attachment sites of the barcode domain; (3) detecting the first detectable label and at least the second detectable label of the bound first complementary nucleic acid molecule; and (4) identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid. For example, when the first complementary nucleic acid molecule contains two detectable labels, those two labels identify the at least two nucleotides in the immobilized target nucleic acid molecule.
[0249] After the at least two detectable labels are detected, they are removed. Thus, the method further includes (5) removing the labels, either by binding to the first attachment site a first hybridizing nucleic acid molecule that lacks detectable labels, thereby displacing the labeled first complementary nucleic acid molecule, or by contacting the labeled first complementary nucleic acid molecule with a force sufficient to release the first detectable label and at least the second detectable label. After step (5), the first detectable label is therefore no longer bound to the first attachment site. The method further comprises (6) binding a second complementary nucleic acid molecule containing a third detectable label and at least a fourth detectable label to a second attachment site among the at least three attachment sites of the barcode domain; (7) detecting the third detectable label and at least the fourth detectable label of the bound second complementary nucleic acid molecule; (8) optionally identifying the positions and identities of at least two nucleotides in the immobilized target nucleic acid; (9) repeating steps (5) to (8) until a complementary nucleic acid molecule containing two detectable labels has been bound to each of the at least three attachment sites of the barcode domain and its two labels detected, thereby identifying a linear sequence of at least six nucleotides for at least a first region of the immobilized target nucleic acid hybridized to the target-binding domain of the sequencing probe; and (10) optionally removing the sequencing probe from the immobilized target nucleic acid.
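Steps (2) to (9) amount to a loop over the attachment sites: one labeled complementary molecule per site, one two-label readout, then label removal before the next cycle. The sketch below decodes such readouts back into six nucleotides using a hypothetical one-color-per-base code; `read_probe` and `detect_site` are hypothetical names, and `detect_site` is a stand-in for the binding, imaging, and removal chemistry, not a real instrument API.

```python
# Hypothetical color code (inverse of a one-color-per-base assignment).
COLOR_TO_BASE = {"blue": "A", "green": "C", "yellow": "G", "red": "T"}


def read_probe(detect_site, n_sites=3):
    """Decode one sequencing probe by cycling through its attachment sites.

    detect_site(site) stands in for one cycle: bind the labeled complementary
    molecule to `site`, image it, and return the (first, second) label colors;
    removing those labels before the next cycle is assumed to happen inside it.
    """
    bases = []
    for site in range(n_sites):
        first, second = detect_site(site)
        bases.extend((COLOR_TO_BASE[first], COLOR_TO_BASE[second]))
    return "".join(bases)


# Simulated readouts for the three attachment sites of one probe.
observed = [("blue", "green"), ("yellow", "red"), ("red", "blue")]
print(read_probe(lambda site: observed[site]))  # ACGTTA
```

Passing the detection step in as a function keeps the decoding logic separate from whatever chemistry or simulation produces each readout.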
[0250] This method further comprises (11) hybridizing a second sequencing probe to the target nucleic acid immobilized on a substrate at one or more positions (where the target-binding domains of the first sequencing probe and the second sequencing probe are different); (12) binding a first complementary nucleic acid molecule containing a first detectable label and at least a second detectable label to a first attachment site among the at least three attachment sites of the barcode domain; (13) detecting the first detectable label and at least the second detectable label of the bound first complementary nucleic acid molecule; (14) optionally identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (15) releasing the first complementary nucleic acid molecule or complex containing the detectable labels, either by binding a first hybridizing nucleic acid molecule lacking the detectable labels to the first attachment site, or by contacting the first complementary nucleic acid molecule or complex containing the detectable labels with a force sufficient to release the first detectable label and at least the second detectable label; (16) binding a second complementary nucleic acid molecule containing a third detectable label and at least a fourth detectable label to a second attachment site among the at least three attachment sites of the barcode domain; (17) detecting the third detectable label and at least the fourth detectable label of the bound second complementary nucleic acid molecule; (18) identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (19) identifying a linear sequence of at least six nucleotides for at least a second region of the immobilized target nucleic acid hybridized to the target-binding domain of the second sequencing probe by repeating steps (15) to (18) until a complementary nucleic acid molecule containing two detectable labels has been bound to each of the at least three attachment sites in the barcode domain and the two detectable labels of each bound complementary nucleic acid molecule have been detected; and (20) optionally removing the second sequencing probe from the immobilized target nucleic acid.
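The ordering of operations in [0249] and [0250] can be sketched as a simple protocol generator. This is a hypothetical illustration, not an instrument script: the function name probe_protocol and the step strings are assumptions, and darkening of the last attachment site is omitted because, as noted in [0272], a probe can be removed while its final reporter is still bound.

```python
from typing import List


def probe_protocol(probe: str, sites: List[str], remove_probe: bool = True) -> List[str]:
    """Ordered operations for reading one sequencing probe (hypothetical step names)."""
    steps = [f"hybridize {probe}"]
    for i, site in enumerate(sites):
        steps.append(f"bind reporter at {site}")
        steps.append(f"detect labels at {site}")
        # Release the labels before the next site is read; the last site
        # may stay labeled because the whole probe is removed afterwards.
        if i < len(sites) - 1:
            steps.append(f"darken {site}")
    if remove_probe:
        steps.append(f"remove {probe}")
    return steps


sites = ["R1", "R2", "R3"]
run = probe_protocol("first probe", sites) + probe_protocol("second probe", sites)
for step in run:
    print(step)
```

Each probe with three attachment sites needs ten operations in this sketch, so the two-probe run above lists twenty.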
[0251] This method may further include identifying the sequence of the immobilized target nucleic acid by reconstructing the linear order of nucleotides identified in at least a first region and at least a second region of the immobilized target nucleic acid, respectively.
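The text does not specify how the two regions are joined. One common heuristic, swapped in here as an assumption, is to merge reads at their longest suffix-prefix overlap; the function merge_regions and its min_overlap parameter are hypothetical.

```python
def merge_regions(first: str, second: str, min_overlap: int = 3) -> str:
    """Join two region reads at their longest suffix/prefix overlap.

    Hypothetical helper. If no overlap of at least min_overlap (>= 1) bases
    exists, the reads are simply concatenated, which assumes the order is known.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(first), len(second)), min_overlap - 1, -1):
        if first[-k:] == second[:k]:
            return first + second[k:]
    return first + second


print(merge_regions("ACGTAC", "TACGGA"))  # overlap "TAC" -> ACGTACGGA
```

Overlapping reads arise naturally when later probe pools bind near earlier ones, as described for Figure 14; without any overlap, the relative placement of two regions has to come from elsewhere.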
[0252] Steps (5) and (6) can be performed sequentially or simultaneously. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra. The third detectable label and at least the fourth detectable label may have the same emission spectrum or different emission spectra.
[0253] The first complementary nucleic acid molecule may contain a cleavable linker. The second complementary nucleic acid molecule may contain a cleavable linker. The first and second complementary nucleic acid molecules may each contain a cleavable linker. The cleavable linker is preferably photocleavable, in which case light can serve as the release force; UV light is preferred. The light can be provided by a light source selected from the group consisting of arc lamps, lasers, focused UV light sources, and light-emitting diodes.
[0254] The first complementary nucleic acid and the first hybridizing nucleic acid lacking a detectable label can contain the same nucleic acid sequence. For example, the first hybridizing nucleic acid lacking a detectable label can contain the same nucleic acid sequence as the portion of the first complementary nucleic acid molecule that binds to the first attachment site among the at least three attachment sites in the barcode domain. The first hybridizing nucleic acid lacking a detectable label can also contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the first attachment site in the barcode domain.
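As a sketch of the relationship described in [0254], assuming the barcode strand reads 5'-[attachment region][flank]-3', the unlabeled darkening oligo can be designed as the reverse complement of the attachment region plus its flank. It then contains the same sequence as the reporter's binding portion and extends onto the flank. The function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dark_oligo(attachment_region: str, flank: str) -> str:
    """Hypothetical unlabeled oligo that occupies one attachment site.

    Assumes the barcode strand reads 5'-[attachment_region][flank]-3'; the
    returned oligo pairs with both, so it shares the reporter's binding
    sequence and additionally covers the single-stranded flank.
    """
    return reverse_complement(attachment_region + flank)


region, flank = "ACCT", "GA"            # hypothetical example sequences
reporter_binding = reverse_complement(region)
print(reporter_binding, dark_oligo(region, flank))  # AGGT TCAGGT
```

The extra bases paired with the flank give the unlabeled oligo more pairing than the reporter has, which is one plausible reason to include the flank when displacing a labeled reporter.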
[0255] The second complementary nucleic acid and the second hybridizing nucleic acid lacking a detectable label may contain the same nucleic acid sequence. The second hybridizing nucleic acid lacking a detectable label may contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the second attachment site in the barcode domain.
[0256] The present invention further provides methods for sequencing nucleic acids using a plurality of the sequencing probes disclosed herein. For example, when a target nucleic acid is hybridized with two or more sequencing probes, each probe can sequence the portion of the target nucleic acid to which it is hybridized.
[0257] The present invention provides a method for sequencing nucleic acids, comprising: (1) hybridizing at least a first population of first sequencing probes, comprising a plurality of the sequencing probes described herein, to a target nucleic acid immobilized at one or more positions on a substrate; (2) binding a first complementary nucleic acid molecule, comprising a first detectable label and at least a second detectable label, to a first attachment site among at least three attachment sites of a barcode domain; (3) detecting the first detectable label and at least the second detectable label of the bound first complementary nucleic acid molecule; (4) identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (5) releasing the first complementary nucleic acid molecule containing the detectable labels, either by binding a first hybridizing nucleic acid molecule lacking the detectable labels to the first attachment site, or by contacting the first complementary nucleic acid molecule containing the detectable labels with a force sufficient to release the first detectable label and at least the second detectable label; (6) binding a second complementary nucleic acid molecule containing a third detectable label and at least a fourth detectable label to a second attachment site among the at least three attachment sites of the barcode domain; (7) detecting the third detectable label and at least the fourth detectable label on the bound second complementary nucleic acid molecule; (8) optionally identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (9) identifying a linear sequence of at least six nucleotides for at least a first region of the immobilized target nucleic acid hybridized to the target-binding domain of the first sequencing probes by repeating steps (5) to (8) until a complementary nucleic acid molecule containing two detectable labels has been bound to each of the at least three attachment sites in the barcode domain and the two detectable labels on each bound complementary nucleic acid molecule have been detected; and (10) optionally removing at least the first population of first sequencing probes from the immobilized target nucleic acid.
[0258] This method further comprises (11) hybridizing at least a second population of second sequencing probes, comprising a plurality of the sequencing probes described herein, to the target nucleic acid immobilized at one or more positions on a substrate (where the target-binding domains of the first sequencing probes and the second sequencing probes are different); (12) binding a first complementary nucleic acid molecule, comprising a first detectable label and at least a second detectable label, to a first attachment site among the at least three attachment sites of the barcode domain; (13) detecting the first detectable label and at least the second detectable label of the bound first complementary nucleic acid molecule; (14) optionally identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (15) releasing the first complementary nucleic acid molecule or complex containing the detectable labels, either by binding a first hybridizing nucleic acid molecule lacking the detectable labels to the first attachment site, or by contacting the barcode domain with a force sufficient to release the first detectable label and at least the second detectable label; (16) binding a second complementary nucleic acid molecule containing a third detectable label and at least a fourth detectable label to a second attachment site among the at least three attachment sites; (17) detecting the third detectable label and at least the fourth detectable label on the bound second complementary nucleic acid molecule; (18) optionally identifying the positions and attributes of at least two nucleotides in the immobilized target nucleic acid; (19) identifying a linear sequence of at least six nucleotides for at least a second region of the immobilized target nucleic acid hybridized to the target-binding domain of the second sequencing probes by repeating steps (15) to (18) until a complementary nucleic acid molecule containing two detectable labels has been bound to each of the at least three attachment sites in the barcode domain and the two detectable labels on each bound complementary nucleic acid molecule have been detected; and (20) optionally removing at least the second population of second sequencing probes from the immobilized target nucleic acid.
[0259] This method may further include identifying the sequence of the immobilized target nucleic acid by reconstructing the linear order of nucleotides identified in at least a first region and at least a second region of the immobilized target nucleic acid, respectively.
[0260] Steps (5) and (6) can be performed sequentially or simultaneously. The first detectable label and at least the second detectable label may have the same emission spectrum or different emission spectra. The third detectable label and at least the fourth detectable label may have the same emission spectrum or different emission spectra.
[0261] The first complementary nucleic acid molecule may contain a cleavable linker. The second complementary nucleic acid molecule may also contain a cleavable linker. The first and second complementary nucleic acid molecules may each contain a cleavable linker. Preferably, the cleavable linker is photocleavable, in which case light can serve as the release force; UV light is preferred. The light can be provided by a light source selected from the group consisting of arc lamps, lasers, focused UV light sources, and light-emitting diodes.
[0262] The first complementary nucleic acid molecule and the first hybridizing nucleic acid molecule lacking a detectable label may contain the same nucleic acid sequence. The first hybridizing nucleic acid molecule lacking a detectable label may contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the first attachment site in the barcode domain.
[0263] The second complementary nucleic acid molecule and the second hybridizing nucleic acid molecule lacking a detectable label may contain the same nucleic acid sequence. The second hybridizing nucleic acid molecule lacking a detectable label may contain a nucleic acid sequence complementary to the flanking single-stranded polynucleotide adjacent to the second attachment site in the barcode domain.
[0264] This sequencing method will be further described herein.
[0265] Figure 14 shows a schematic diagram of one entire cycle of the sequencing method of the present disclosure. The method of the present disclosure does not require the target nucleic acid to be immobilized before sequencing, but in this example the method begins with the target nucleic acid captured by capture probes and bound to the surface of a flow cell, as shown in the upper-left diagram. Next, a pool of sequencing probes is introduced into the flow cell so that the sequencing probes can hybridize to the target nucleic acid. In this example, the sequencing probes are those shown in Figure 1. These sequencing probes contain a hexameric sequence within the target-binding domain that hybridizes to the target nucleic acid. Each hexameric sequence has an (N) base adjacent to it. The (N) base can be a universal or degenerate base, or any of the four canonical bases; unlike the bases b1-b2-b3-b4-b5-b6, it is not specified by the target. Because a hexamer has 4^6 = 4096 possible sequences, a set of 4096 sequencing probes makes it possible to sequence any target nucleic acid. In this example, the set of 4096 sequencing probes is hybridized to the target nucleic acids as eight pools, each containing 512 sequencing probes. The hexameric sequence in the target-binding domain of a sequencing probe hybridizes wherever, along the length of the target nucleic acid, the hexamer and the target are perfectly complementary, as shown in the upper-center diagram of Figure 14. In this example, a single sequencing probe hybridizes to the target nucleic acid. Any unbound sequencing probes are washed out of the flow cell.
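The probe-set arithmetic above can be checked by enumeration. This sketch lists every hexamer and splits the 4096 sequences into eight pools of 512; the contiguous split is an illustrative assumption and ignores the pool-design considerations described for Figure 16.

```python
from itertools import product
from typing import List


def all_hexamers() -> List[str]:
    """Every possible 6-nt target-binding sequence, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def split_into_pools(seqs: List[str], n_pools: int = 8) -> List[List[str]]:
    """Split sequences into n_pools equal contiguous pools (illustrative only)."""
    if len(seqs) % n_pools:
        raise ValueError("sequences do not divide evenly into pools")
    size = len(seqs) // n_pools
    return [seqs[i * size:(i + 1) * size] for i in range(n_pools)]


hexamers = all_hexamers()
pools = split_into_pools(hexamers)
print(len(hexamers), len(pools), [len(p) for p in pools])
```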
[0266] These sequencing probes also include a barcode domain with three attachment sites R1, R2, and R3, as described above. The attachment region at attachment site R1 contains one or more nucleotide sequences corresponding to the first dinucleotide of the hexamer of the sequencing probe. Therefore, only reporter probes containing complementary nucleic acids corresponding to the attributes of the first dinucleotide present in the target-binding domain of the sequencing probe will hybridize to attachment site R1. Similarly, the attachment region at attachment site R2 of the sequencing probe corresponds to the second dinucleotide present in the target-binding domain, and the attachment region at attachment site R3 of the sequencing probe corresponds to the third dinucleotide present in the target-binding domain.
[0267] The method continues as shown in the upper-right diagram of Figure 14. A pool of reporter probes is introduced into the flow cell. Each reporter probe in the pool contains a detectable label in the form of a two-color combination and a complementary nucleic acid that can hybridize to the corresponding attachment region within attachment site R1 of the sequencing probe. The two-color combination and the complementary nucleic acid of a particular reporter probe correspond to one of the 16 possible dinucleotides, as described above. For each pool of reporter probes, the correspondence between each two-color combination and a specific dinucleotide is established before sequencing. For example, in the sequencing experiment shown in Figure 14, for the first pool of reporter probes, which hybridizes to attachment site R1, the two-color combination yellow-red can be associated with the dinucleotide adenine-thymine. As shown in the upper-right diagram of Figure 14, after the reporter probes hybridize to attachment site R1, any unbound reporter probes are washed out of the flow cell, and the detectable labels of the bound reporter probes are recorded to determine the attribute of the first dinucleotide of the hexamer.
[0268] The detectable label of the reporter probe hybridized to the attachment site is then removed. To remove the detectable label, the reporter probe may contain a cleavable linker, and an appropriate cleavage reagent may be added. Alternatively, the reporter probe carrying the detectable label may be displaced by hybridizing a complementary nucleic acid lacking a detectable label to attachment site R1 of the sequencing probe. Whichever method is used to remove the detectable label, attachment site R1 no longer generates a detectable signal. The process of rendering an attachment site of a barcode domain that previously generated a detectable signal unable to generate a detectable signal is referred to herein as "darkening".
[0269] A second pool of reporter probes is introduced into the flow cell. Each reporter probe in the pool contains a detectable label in the form of a two-color combination and a complementary nucleic acid that can hybridize to the corresponding attachment region in attachment site R2 of the sequencing probe. The two-color combination and the complementary nucleic acid of a particular reporter probe correspond to one of 16 possible dinucleotides. A particular two-color combination may correspond to one dinucleotide in the context of the first pool of reporter probes and to a different dinucleotide in the context of the second pool of reporter probes. After the reporter probes have hybridized to attachment site R2, as shown in the lower-right diagram of Figure 14, any unbound reporter probes are washed out of the flow cell, and the detectable labels of the bound reporter probes are recorded to determine the attribute of the second dinucleotide of the hexamer.
[0270] To remove the detectable label at attachment site R2, the reporter probe may contain a cleavable linker, and a suitable cleavage reagent can be added. Alternatively, the reporter probe carrying the detectable label may be displaced by hybridizing a complementary nucleic acid lacking a detectable label to attachment site R2 of the sequencing probe. Whichever method is used to remove the detectable label, attachment site R2 no longer generates a detectable signal.
[0271] Next, a third pool of reporter probes is introduced into the flow cell. Each reporter probe in the third pool contains a detectable label in the form of a two-color combination and a complementary nucleic acid that can hybridize to the corresponding attachment region within attachment site R3 of the sequencing probe. The two-color combination and the complementary nucleic acid of a particular reporter probe correspond to one of 16 possible dinucleotides. After the reporter probes hybridize to attachment site R3, as shown in the lower-center diagram of Figure 14, any unbound reporter probes are washed out of the flow cell, and the detectable labels of the bound reporter probes are recorded to determine the attribute of the third dinucleotide of the hexamer. In this way, all three dinucleotides of the target-binding domain are identified, and they can be combined to reveal the sequence of the target-binding domain and thus the sequence of the target nucleic acid.
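A minimal sketch of the decoding in [0267] to [0271], assuming a separate color-to-dinucleotide table for each reporter pool. Only the yellow-red to AT entry for site R1 comes from the text; every other table entry is a hypothetical placeholder. The target sequence is taken as the reverse complement of the read hexamer, assuming antiparallel hybridization and ignoring the (N) base.

```python
from typing import Dict, Tuple

Color = Tuple[str, str]

# Per-site reporter pools: the same color pair may mean different
# dinucleotides at different sites. Only ("Y", "R") -> "AT" at R1 is from
# the text; all other entries are hypothetical placeholders.
REPORTER_TABLES: Dict[str, Dict[Color, str]] = {
    "R1": {("Y", "R"): "AT", ("G", "G"): "CA"},
    "R2": {("Y", "R"): "GC", ("B", "B"): "TT"},
    "R3": {("G", "G"): "AG", ("R", "B"): "CC"},
}


def decode_hexamer(calls: Dict[str, Color]) -> str:
    """Concatenate the dinucleotides read at R1, R2, R3 into the probe's hexamer."""
    return "".join(REPORTER_TABLES[site][calls[site]] for site in ("R1", "R2", "R3"))


def target_segment(hexamer: str) -> str:
    """Target sequence (5'->3') bound by the hexamer, assuming antiparallel pairing."""
    return hexamer.translate(str.maketrans("ACGT", "TGCA"))[::-1]


calls = {"R1": ("Y", "R"), "R2": ("Y", "R"), "R3": ("G", "G")}
hexamer = decode_hexamer(calls)
print(hexamer, target_segment(hexamer))  # ATGCAG CTGCAT
```

The yellow-red call decodes to AT at R1 but to GC at R2, which is the site-dependent interpretation described in [0269].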
[0272] To continue sequencing the target nucleic acid, any bound sequencing probes can be removed from the target nucleic acid. The sequencing probe can be removed even if a reporter probe remains hybridized to attachment site R3 of the barcode domain. Alternatively, the reporter probe hybridized to attachment site R3 can be removed from the barcode domain before the sequencing probe is removed from the target nucleic acid, for example by using the darkening procedure described above for the reporters at attachment sites R1 and R2.
[0273] The sequencing cycle shown in Figure 14 can be repeated any number of times, and each cycle can begin by hybridizing either the same pool or a different pool of sequencing probes to the target nucleic acid molecule. A second pool of sequencing probes can bind to the target nucleic acid at positions that overlap with the positions where the first sequencing probe, or the first pool of sequencing probes, bound during the first sequencing cycle. In this way, some nucleotides in the target nucleic acid can be sequenced two or more times, by two or more different sequencing probes.
[0274] Figure 15 shows a schematic diagram of one entire cycle of the sequencing method of this disclosure and the corresponding imaging data recovered during the cycle. In this example, the sequencing probe used is the one shown in Figure 1, and the sequencing process is the same as that shown in Figure 14 and described above. After the target-binding domain of the sequencing probe hybridizes to the target nucleic acid, a reporter probe is hybridized to the first attachment site (R1) of the sequencing probe. An image of the first reporter probe is then acquired and a color dot is recorded. In Figure 15, the color dots are shown as dotted circles; each color dot corresponds to a single sequencing probe tracked through the entire cycle. In this example, seven sequencing probes are recorded (1-7). Next, the first attachment site of the barcode domain is darkened, and a two-color reporter probe is hybridized to the second attachment site (R2) of the sequencing probe, after which an image of the second reporter probe is acquired and a color dot is recorded. The second attachment site of the barcode domain is then darkened, and a two-color reporter probe is hybridized to the third attachment site (R3), after which an image of the third reporter probe is acquired and color dots are recorded. Finally, the three color dots from each of sequencing probes 1 to 7 are arranged in order, and each color dot is mapped to a specific dinucleotide using a decoding matrix to reveal the sequences of the target-binding domains of sequencing probes 1 to 7.
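Arranging the color dots from the three images and applying a decoding matrix, as described for Figure 15, might look like the sketch below. It assumes the dots have already been registered across images so that each probe carries the same spot ID in every image; spots missing from any image are dropped. The decoding-matrix entries are hypothetical.

```python
from typing import Dict, List, Tuple

Color = Tuple[str, str]


def assemble_barcodes(images: List[Dict[int, Color]]) -> Dict[int, Tuple[Color, ...]]:
    """Collect each spot's color calls in site order (R1, R2, R3, ...).

    Assumes spot IDs are already registered across images; spots absent
    from any image are dropped because their barcode is incomplete.
    """
    common = set(images[0]).intersection(*images[1:])
    return {spot: tuple(img[spot] for img in images) for spot in sorted(common)}


def decode(barcodes: Dict[int, Tuple[Color, ...]],
           matrix: Dict[Tuple[int, Color], str]) -> Dict[int, str]:
    """Map each (site index, color pair) to a dinucleotide and join them."""
    return {spot: "".join(matrix[(i, c)] for i, c in enumerate(colors))
            for spot, colors in barcodes.items()}


images = [
    {1: ("B", "B"), 2: ("G", "G")},   # R1 image
    {1: ("R", "R"), 2: ("Y", "Y")},   # R2 image
    {1: ("G", "Y")},                  # R3 image (spot 2 not detected)
]
matrix = {(0, ("B", "B")): "AA", (1, ("R", "R")): "CG", (2, ("G", "Y")): "TT"}  # hypothetical
print(decode(assemble_barcodes(images), matrix))  # {1: 'AACGTT'}
```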
[0275] The number of reporter probes required to sequence the target-binding domain of any sequencing probe bound to a target nucleic acid during a single sequencing cycle equals the number of attachment sites within the barcode domain. Therefore, for a barcode domain with three attachment sites, three reporter probes are cycled through for each sequencing probe.
[0276] A pool of sequencing probes can contain multiple sequencing probes with identical sequences, or multiple sequencing probes with different sequences. When a pool of sequencing probes contains multiple sequencing probes with different sequences, there can be an equal number of each different sequencing probe, or there can be different numbers of each different sequencing probe.
[0277] Figure 16 shows an example configuration of sequencing probe pools according to this disclosure, where each sequencing probe contains (a) a target-binding domain containing six nucleotides (a hexamer) that specifically bind to the target nucleic acid, and (b) three attachment sites (R1, R2, R3) within the barcode domain. Eight different pools of sequencing probes are designed using the eight color combinations shown in the figure. There are 4096 possible hexamer sequences (4×4×4×4×4×4=4096). Since each of the three attachment sites within the barcode domain can hybridize to a complementary nucleic acid bound to one of eight different color combinations, there are 512 (8×8×8=512) distinct sets of three color combinations. For example, in a probe where R1 hybridizes to a complementary nucleic acid bound to color combination GG, R2 hybridizes to a complementary nucleic acid bound to color combination BG, and R3 hybridizes to a complementary nucleic acid bound to color combination YR, the set of three color combinations is GG-BG-YR. Within one pool of sequencing probes, different sets of three color combinations correspond to different hexamers in the target-binding domain. Since each pool can contain 512 different hexamers and there are 4096 possible hexamers in total, eight pools are needed to sequence all possible hexamers (4096 / 512=8). The sequencing probes placed in each of the eight pools are chosen so that each sequencing probe hybridizes optimally to the target nucleic acid. Several considerations help ensure optimal hybridization, including (a) placing each hexamer and its perfect complement in different pools; (b) placing hexamers with high Tm and hexamers with low Tm in different pools; and (c) assigning hexamers to different pools based on empirically learned hybridization patterns.
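The barcode arithmetic of [0277] can be made concrete with an index-based assignment: each of the 4096 hexamers receives a pool number and one of the 512 R1-R2-R3 color sets. The eight combination names follow the list given for Figure 17 (the Figure 16 example also mentions BG and YR), and the lexicographic assignment is a hypothetical illustration that does not apply the hybridization-based pool criteria (a) to (c).

```python
from itertools import product
from typing import Dict, Tuple

# The eight two-color combinations as listed for Figure 17.
COLOR_COMBOS = ["GG", "RR", "GY", "RY", "YY", "RG", "BB", "RB"]
# Every (R1, R2, R3) set of combinations: 8 ** 3 = 512 barcodes.
BARCODES = list(product(COLOR_COMBOS, repeat=3))


def assign_barcodes() -> Dict[str, Tuple[int, str]]:
    """Map each hexamer to (pool number 1-8, 'R1-R2-R3' barcode).

    Hypothetical lexicographic scheme: hexamer i goes to pool i // 512 + 1
    and receives barcode i % 512, so barcodes repeat only across pools.
    """
    hexamers = ["".join(p) for p in product("ACGT", repeat=6)]
    n = len(BARCODES)
    return {h: (i // n + 1, "-".join(BARCODES[i % n])) for i, h in enumerate(hexamers)}


table = assign_barcodes()
print(len(BARCODES), len(table), max(pool for pool, _ in table.values()))  # 512 4096 8
print(table["AAAAAA"])  # (1, 'GG-GG-GG')
```

Within one pool every hexamer has a distinct barcode, which is what lets a single readout identify the hexamer once the pool is known.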
[0278] Figure 17 illustrates the difference between the sequencing probe described in U.S. Patent Application Publication 2016 / 0194701 and the sequencing probe of this disclosure. As shown in the left panel of Figure 17, U.S. Patent Application Publication 2016 / 0194701 describes a sequencing probe having a barcode domain with six attachment sites that hybridize to complementary nucleic acids. Each complementary nucleic acid is bound to one of four different fluorescent dyes. In this configuration, each color (red, blue, green, yellow) corresponds to one nucleotide (A, T, C, G) in the target-binding domain, and this design generates 4096 (4^6) different probes. As shown in the right panel of Figure 17, in one example of this disclosure, the barcode domain of each sequencing probe contains three attachment sites that hybridize to complementary nucleic acids. Unlike U.S. Patent Application Publication 2016 / 0194701, each of these complementary nucleic acids is bound to one of eight color combinations (GG, RR, GY, RY, YY, RG, BB, RB), and each color combination corresponds to a specific dinucleotide in the target-binding domain. This configuration generates 512 (8^3) different probes. To cover all 4096 possible hexamers in the target-binding domain, eight separate pools of 512 different probes are required to sequence an entire target nucleic acid. Because only eight color combinations are used to label the complementary nucleic acids but there are 16 possible dinucleotides, some color combinations correspond to different dinucleotides depending on which pool of sequencing probes is used. For example, in Figure 17, in the first, second, third, and fourth pools of sequencing probes, color combination BB corresponds to dinucleotide AA and color combination GG corresponds to dinucleotide AT. In the fifth, sixth, seventh, and eighth pools of sequencing probes, color combination BB corresponds to dinucleotide CA and color combination GG corresponds to dinucleotide CT.
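Because eight color combinations must cover 16 dinucleotides, a decoder needs to know which probe pool was hybridized. In the sketch below, only BB to AA and GG to AT (pools 1-4) and BB to CA (pools 5-8) are stated in the text; the remaining entries are hypothetical placeholders chosen so that the two tables together cover all 16 dinucleotides exactly once.

```python
from itertools import product

# Only BB->AA, GG->AT (pools 1-4) and BB->CA (pools 5-8) are stated in the
# text; every other entry is a hypothetical placeholder.
POOLS_1_TO_4 = {"BB": "AA", "GG": "AT", "RR": "AC", "GY": "AG",
                "RY": "GA", "YY": "GC", "RG": "GG", "RB": "GT"}
POOLS_5_TO_8 = {"BB": "CA", "GG": "CT", "RR": "CC", "GY": "CG",
                "RY": "TA", "YY": "TC", "RG": "TG", "RB": "TT"}


def decode_color(pool: int, combo: str) -> str:
    """Interpret a color combination in the context of the hybridized probe pool."""
    if not 1 <= pool <= 8:
        raise ValueError("pool must be between 1 and 8")
    table = POOLS_1_TO_4 if pool <= 4 else POOLS_5_TO_8
    return table[combo]


all_dinucleotides = {"".join(p) for p in product("ACGT", repeat=2)}
covered = set(POOLS_1_TO_4.values()) | set(POOLS_5_TO_8.values())
print(covered == all_dinucleotides, decode_color(2, "BB"), decode_color(6, "BB"))  # True AA CA
```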
[0279] Multiple sequencing probes (i.e., two or more sequencing probes) can be hybridized within a sequencing window. During sequencing, the attributes and spatial positions of detectable labels bound to each of the hybridized sequencing probes are recorded. This makes it possible to identify both the positions and attributes of multiple dinucleotides after the fact. In other words, by simultaneously hybridizing multiple sequencing probes to a single target nucleic acid molecule, multiple positions along the target nucleic acid can be sequenced simultaneously, thus improving the sequencing speed.
[0280] Figure 18 shows schematic diagrams of a single sequencing probe and multiple sequencing probes hybridized to a captured target nucleic acid molecule. The sequencing window between the two hybridized 5' and 3' capture probes allows a single sequencing probe (left) or multiple sequencing probes (right) to be hybridized along the length of the target nucleic acid molecule. Hybridizing multiple sequencing probes along the length of the target nucleic acid molecule allows for simultaneous sequencing of two or more locations on the target nucleic acid molecule, thereby improving the sequencing speed. Figure 19 shows fluorescence images recorded during the sequencing method of this disclosure when a single sequencing probe (left) or multiple sequencing probes (right) are hybridized to a captured target nucleic acid. The right image in Figure 19 shows that the fluorescence signals from individual probes of multiple sequencing probes bound along the length of the target nucleic acid can be spatially separated.
[0281] Figure 20 shows a schematic diagram of multiple sequencing probes according to this disclosure bound along the length of a target nucleic acid with a length of 15 kilobases, and the corresponding recorded fluorescence images. As shown in the right panel of Figure 20, the sequencing probes can be bound at equal intervals along the length of the target nucleic acid. As shown in the left panel of Figure 20, the sequencing probes do not need to be bound at equal intervals along the length of the target nucleic acid. The fluorescence images shown in Figure 20 demonstrate that it is possible to spatially separate signals from multiple sequencing probes bound along the length of the target nucleic acid and simultaneously obtain sequencing information at multiple locations on the target nucleic acid.
[0282] The distribution of probes along the length of the target nucleic acid is crucial for resolving the detectable signals. Too many probes in one region can produce overlapping detectable labels, preventing two adjacent probes from being distinguished. The reason is as follows: one base pair spans approximately 0.34 nm, and the lateral (xy) spatial resolution of the sequencing instrument is approximately 200 nm, so the resolution limit of the instrument corresponds to approximately 588 base pairs (200 nm ÷ 0.34 nm per base pair). That is, when two probes hybridized to the target nucleic acid lie within approximately 588 base pairs of each other, the instrument is unlikely to be able to separate their signals. Therefore, for the detectable labels to appear as separate "spots," the two probes should be spaced approximately 600 base pairs apart, depending on the resolution of the sequencing instrument. For this reason, the optimal spacing is one probe every 600 bp of the target nucleic acid, and preferably no two sequencing probes in a probe population bind within 600 nucleotides of each other. Various software approaches (e.g., using ratios dependent on fluorescence intensity values and wavelength) can be used to monitor, limit, and potentially analyze the number of probes hybridizing within a single resolvable region of the target nucleic acid, and the probe population can be designed accordingly. Furthermore, detectable labels (e.g., fluorescent labels) that provide more distinctly different signals can be selected. Additionally, the literature (Small and Parthasarathy, "Superresolution localization methods," Annu. Rev. Phys. Chem., 2014, Vol. 65, pp. 107-125) describes various super-resolution approaches that reduce the resolution limit of sequencing microscopes to tens of nanometers with structured illumination. Using higher-resolution sequencing equipment allows the use of probes with shorter target-binding domains.
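The resolution arithmetic in [0282] and a spacing check on known probe positions can be sketched as follows. The constants are the values quoted in the text; treating the target as a fully stretched, straight molecule is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

BP_RISE_NM = 0.34       # length per base pair used in the text
RESOLUTION_NM = 200.0   # lateral (xy) resolution used in the text


def resolution_limit_bp(resolution_nm: float = RESOLUTION_NM, rise_nm: float = BP_RISE_NM) -> float:
    """Number of base pairs spanned by one resolution element (stretched DNA assumed)."""
    return resolution_nm / rise_nm


def unresolvable_pairs(positions: Iterable[int], min_spacing: int = 600) -> List[Tuple[int, int]]:
    """Neighboring probe positions (nt) closer than min_spacing.

    Checking sorted neighbors suffices: if any two probes are too close,
    some adjacent pair in sorted order is too close as well.
    """
    pos = sorted(positions)
    return [(a, b) for a, b in zip(pos, pos[1:]) if b - a < min_spacing]


print(round(resolution_limit_bp()))               # 588
print(unresolvable_pairs([0, 700, 1000, 2000]))   # [(700, 1000)]
```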
[0283] As mentioned above, the Tm design of the probes can affect the number of probes that hybridize to the target nucleic acid. Alternatively, or in addition, the concentration of sequencing probes in a single population can be increased to increase probe coverage in a specific region of the target nucleic acid. Conversely, the concentration of sequencing probes can be decreased to reduce probe coverage in a specific region of the target nucleic acid, for example so that the probe spacing exceeds the resolution limit of the sequencing instrument.
[0284] Although the resolution limit for two detectable labels is approximately 600 nucleotides, this does not prevent robust sequencing by the method of this disclosure. In some cases, multiple sequencing probes in a given population will bind less than 600 nucleotides apart on the target nucleic acid. However, statistically (following a Poisson distribution), some target nucleic acids will have only one sequencing probe bound, and those sequencing probes are optically separable. For target nucleic acids that have multiple probes within 600 nucleotides (and are therefore not optically separable), the data relating to these inseparable sequencing probes can be discarded. Importantly, the method of this disclosure involves multiple rounds of binding and detection of multiple sequencing probes. Therefore, signals may be detectable from all sequencing probes in some rounds, from only some sequencing probes in other rounds, and from no sequencing probes in still other rounds. In some aspects, the distribution of sequencing probes bound to target nucleic acids can be controlled (for example, by controlling the probe concentration or dilution) so that only one sequencing probe binds to a single target nucleic acid.
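The Poisson argument in [0284] can be quantified. Assuming the number of probes bound to each target (or resolution-limited window) is Poisson-distributed with mean lam > 0, the sketch computes the fraction of targets carrying exactly one probe and, among occupied targets, the fraction that are single-probe; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k probes) for a Poisson mean of lam probes per target."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def single_probe_fraction(lam: float) -> float:
    """Fraction of all targets that carry exactly one probe."""
    return poisson_pmf(1, lam)


def single_among_occupied(lam: float) -> float:
    """Fraction of targets with at least one probe that carry exactly one (lam > 0)."""
    return poisson_pmf(1, lam) / (1.0 - poisson_pmf(0, lam))


for lam in (0.1, 0.5, 1.0, 2.0):
    print(f"lam={lam}: single={single_probe_fraction(lam):.3f}, "
          f"single|occupied={single_among_occupied(lam):.3f}")
```

Under this model the overall single-probe fraction lam·e^(-lam) peaks at lam = 1, while diluting further (smaller lam) makes almost every occupied target single-probe at the cost of leaving more targets empty, which matches the concentration control described above.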
[0285] Two distinctly different sequencing probes within a single population can bind within 600 nucleotides of each other. Whether this happens is random, but it depends in part on the length of the target-binding domain, the Tm of the probes, and the concentration of probe applied.
[0286] Alternatively, or in addition to the foregoing, the concentration of sequencing probes in a single population can be reduced to decrease probe coverage in a specific region of the target nucleic acid, for example so that the probe spacing exceeds the resolution limit of the sequencing instrument, thereby enabling single readouts from the resolution-limited spots.
[0287] If the sequence, or a portion of the sequence, of the target nucleic acid is known before it is sequenced using the method disclosed herein, the sequencing probes can be designed and selected such that no two sequencing probes bind within 600 nucleotides of each other.
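When the target sequence is known, probe selection under the 600-nucleotide rule can be sketched greedily: find 6-mers that occur only once in the target, then scan left to right, keeping a site only if it starts at least min_spacing nucleotides after the last kept site. The greedy scan and the uniqueness filter are assumptions rather than the disclosed design procedure, and the function names are hypothetical; each probe's target-binding domain would be the reverse complement of a selected 6-mer.

```python
from typing import Dict, List, Tuple


def kmer_sites(target: str, k: int = 6) -> Dict[str, List[int]]:
    """0-based start positions of every k-mer in a known target sequence."""
    sites: Dict[str, List[int]] = {}
    for i in range(len(target) - k + 1):
        sites.setdefault(target[i:i + k], []).append(i)
    return sites


def pick_probe_sites(target: str, min_spacing: int = 600, k: int = 6) -> List[Tuple[int, str]]:
    """Greedily keep unique k-mers whose start positions are >= min_spacing apart."""
    unique = sorted((pos[0], kmer) for kmer, pos in kmer_sites(target, k).items() if len(pos) == 1)
    chosen: List[Tuple[int, str]] = []
    for pos, kmer in unique:
        if not chosen or pos - chosen[-1][0] >= min_spacing:
            chosen.append((pos, kmer))
    return chosen


print(pick_probe_sites("ACGTTGCA", min_spacing=3, k=3))  # [(0, 'ACG'), (3, 'TTG')]
```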
[0288] Before a sequencing probe is hybridized to a target nucleic acid, one or more complementary nucleic acid molecules bound to a first detectable label and at least a second detectable label can be hybridized to one or more attachment sites within the barcode domain of the sequencing probe. For example, before hybridization to the target nucleic acid, one or more complementary nucleic acid molecules bound to the first detectable label and at least the second detectable label can be hybridized to the first attachment site of each sequencing probe. Because such a sequencing probe already generates a detectable signal from the first attachment site when it contacts the target nucleic acid, there is no need to prepare a first pool of complementary nucleic acids or reporter probes directed to the first attachment site of the barcode domain. In another example, one or more complementary nucleic acid molecules bound to the first detectable label and at least the second detectable label can be hybridized to all attachment sites within the barcode domain of the sequencing probe. In this example, a sequence of six nucleotides can be read without sequentially exchanging the complementary nucleic acids. Using such pre-hybridized sequencing probe-reporter probe complexes is expected to shorten the time required to acquire sequence information, because many steps of the described method can be omitted. However, such probes are expected to benefit from non-overlapping detectable labels, for example fluorophores that are excited by, or emit, light of non-overlapping wavelengths.
[0289] During sequencing, the signal intensities of the recorded color spots can be used to sequence the target nucleic acid more accurately. Figure 21 shows imaging data recorded during one sequencing cycle of this disclosure. The right panel of Figure 21 shows a fluorescence microscope image recorded after the reporter probe hybridized to the first attachment site of the sequencing probe. Individual color spots are highlighted, and the specific color combinations recorded are indicated. This shows that the dual fluorescence signals are clearly detectable and identifiable. Bright signals from reference markers are indicated by arrows. The left panel of Figure 21 shows that the intensity of each color within a single-color spot can be used to determine the probability that the spot corresponds to a color combination in which both colors are the same (i.e., any of BB, GG, YY, or RR).
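One way to turn single-color spot intensities into such a probability is a two-hypothesis likelihood model. The sketch below assumes, purely for illustration, that a spot's intensity in one channel is Gaussian around one unit intensity when a single label of that color is present, and around twice that value when two labels of the same color are present (BB, GG, YY, or RR). The parameters `unit`, `sd`, and `prior_pair` are hypothetical and are not values from this disclosure.

```python
import math

def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def prob_same_color_pair(intensity: float, unit: float, sd: float, prior_pair: float = 0.5) -> float:
    """Posterior probability that a single-color spot holds two labels of that color.

    Assumed (hypothetical) model: intensity ~ N(unit, sd) for one label and
    N(2 * unit, sd) for two same-color labels; prior_pair is the prior for two labels.
    """
    like_one = gaussian_pdf(intensity, unit, sd)
    like_two = gaussian_pdf(intensity, 2 * unit, sd)
    weighted_two = prior_pair * like_two
    return weighted_two / (weighted_two + (1 - prior_pair) * like_one)
```

For a spot whose intensity lies exactly halfway between the two means, `prob_same_color_pair(1500, 1000, 200)` returns 0.5 with the default prior; brighter spots push the probability toward 1.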
[0290] A single position within a barcode domain can be darkened by cleaving the reporter probe hybridized to that position at the site of a cleavable linker modification. Figure 22 illustrates darkening of a barcode position during a sequencing cycle using a cleavable linker modification. In the first step, shown in the leftmost panel of Figure 22, the primary nucleic acid of the reporter probe hybridizes to the first attachment site of the sequencing probe, binding a specific complementary sequence within the attachment region at the first position of the barcode domain. The first and second domains of the primary nucleic acid are covalently linked by the cleavable linker modification. In the second step, the detectable label is recorded to determine the identity and position of a specific dinucleotide within the target-binding domain of the sequencing probe. In the third step, the first position of the barcode domain is darkened by cleaving the reporter probe at the cleavable linker modification. This releases the second domain of the primary nucleic acid together with the detectable label. The first domain of the primary nucleic acid remains hybridized to the first attachment site of the barcode domain but no longer carries any detectable label. The first position of the barcode domain therefore no longer generates a detectable signal and cannot hybridize to any other reporter probe in subsequent sequencing steps. In the final step, shown in the rightmost panel of Figure 22, a reporter probe hybridizes to the second position of the barcode domain, and sequencing continues.
[0291] An attachment site of the barcode domain can be darkened by replacing any secondary or tertiary nucleic acid of the reporter probe that carries a detectable label, while the primary nucleic acid of the reporter probe remains hybridized to the sequencing probe. This replacement can be achieved by hybridizing to the primary nucleic acid a secondary or tertiary nucleic acid that is not bound to a detectable label. Figure 23 illustrates an example of a sequencing cycle of this disclosure in which one position in the barcode domain is darkened by replacing a labeled secondary nucleic acid. The leftmost panel of Figure 23 shows the start of the sequencing cycle, in which the primary nucleic acid molecule of the reporter probe hybridizes to the first attachment site of the barcode domain of the sequencing probe. Next, a secondary nucleic acid molecule bound to a detectable label hybridizes to the primary nucleic acid molecule, and the detectable label is recorded. To darken the first position of the barcode domain, the labeled secondary nucleic acid molecule is replaced with a secondary nucleic acid molecule that has no detectable label. In the next step of the sequencing cycle, a reporter probe containing a detectable label is hybridized to the second position of the barcode domain. An attachment site of the barcode domain can also be darkened by hybridizing an unlabeled nucleic acid to the sequencing probe at that attachment site, thereby displacing the primary nucleic acid molecule of any reporter probe bound there.
If the barcode domain contains at least one single-stranded nucleic acid sequence adjacent to at least one attachment site, the unlabeled nucleic acid can displace the primary nucleic acid molecule by hybridizing to this flanking sequence and to the portion of the barcode domain occupied by the primary nucleic acid molecule. If necessary, the rate of exchange of the detectable label can be increased by incorporating small single-stranded oligonucleotides that accelerate the exchange (e.g., "toehold" probes; see, e.g., Seelig et al., "Catalyzed Relaxation of a Metastable DNA Fuel," J. Am. Chem. Soc. 2006, Vol. 128(37), pp. 12211-12220).
[0292] Complementary nucleic acids containing detectable labels, i.e., reporter probes, can also be removed from the attachment site without being replaced by hybridizing nucleic acids that lack detectable labels. This can be achieved, for example, by adding chaotropic agents, and / or increasing the temperature, and / or changing the salt concentration, and / or adjusting the pH, and / or applying hydrodynamic forces. In these examples, fewer reagents are required, because no hybridizing nucleic acids lacking detectable labels are needed.
[0293] The method of this disclosure allows for the simultaneous capture and sequencing of RNA and DNA molecules (including mRNA and gDNA) from the same sample. The capture and sequencing of both RNA and DNA molecules from the same sample can be performed within the same flow cell. The left panel of Figure 24 is a schematic diagram illustrating how the method of this disclosure enables the simultaneous capture, detection, and sequencing of both mRNA and gDNA from an FFPE sample.
[0294] The sequencing method of the present disclosure further includes a step of identifying the sequence of an immobilized target nucleic acid by reconstructing the linear order of the nucleotides identified for each region of the immobilized target nucleic acid. The reconstruction step uses a non-transitory computer-readable storage medium storing an executable program, which instructs a microprocessor to arrange the nucleotides identified for each region of the target nucleic acid in linear order and thereby obtain the nucleic acid sequence. The reconstruction can be performed in "real time," that is, while data are still being acquired from the sequencing probes, rather than after data acquisition is complete.
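A minimal sketch of such real-time reconstruction, assuming each readout arrives as a k-mer whose start position on the target is already known (for example, from its physical position along an extended molecule or from a reference alignment). The class `RealTimeReconstructor` and its methods are hypothetical. Each position is called as the most frequent base seen so far, and positions not yet covered are shown as `N`.

```python
from collections import Counter, defaultdict
from typing import Dict

class RealTimeReconstructor:
    """Accumulates positioned base calls as readouts arrive and reports the current sequence."""

    def __init__(self) -> None:
        # Per-position tally of observed bases.
        self.calls: Dict[int, Counter] = defaultdict(Counter)

    def add_readout(self, start: int, bases: str) -> None:
        """Record one readout whose first base sits at `start` on the target."""
        for offset, base in enumerate(bases):
            self.calls[start + offset][base] += 1

    def current_sequence(self, gap: str = "N") -> str:
        """Arrange called positions in linear order; uncovered positions become `gap`."""
        if not self.calls:
            return ""
        lo, hi = min(self.calls), max(self.calls)
        out = []
        for pos in range(lo, hi + 1):
            counts = self.calls.get(pos)  # .get does not create entries in a defaultdict
            out.append(counts.most_common(1)[0][0] if counts else gap)
        return "".join(out)
```

The current sequence can be queried after every cycle, so partial results are available before data acquisition finishes.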
[0295] The raw accuracy of the sequencing method according to this disclosure is approximately 94%. The accuracy can be increased to approximately 99% by sequencing the same base in the target nucleic acid with two or more sequencing probes. Figure 25 illustrates how the sequencing method according to this disclosure enables the same base in the target nucleic acid to be sequenced with different sequencing probes. In this example, the target nucleic acid is a fragment of NRAS exon 2 (SEQ ID NO: 1). The base of interest is a cytosine (C), highlighted in the target nucleic acid. This base of interest is covered by two sequencing probes with different hybridization footprints on the target nucleic acid. In this example, sequencing probes 1-4 (barcodes 1-4) bind three nucleotides to the left of the base of interest, while sequencing probes 5-8 (barcodes 5-8) bind five nucleotides to the left of the base of interest. As a result, the base of interest is sequenced by two different probes, increasing the number of base readouts at that position and thereby improving the overall accuracy at that position. Figure 26 shows how multiple base readouts at a specific nucleotide position of the target nucleic acid are recorded by one or more sequencing probes and combined into a consensus sequence (SEQ ID NO: 2), thereby improving the accuracy of the final base calls.
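The gain from redundant probes can be estimated with a simple binomial model. This is an illustrative assumption, not the error model of this disclosure: readouts of a base are treated as independent, each correct with probability `p`, and the base is counted as correctly called only when more than half of the `n` readouts are correct. Ties count as failures, which is conservative, because wrong readouts need not agree with one another. The function name is hypothetical.

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """P(strictly more than half of n independent readouts are correct), each correct with prob p."""
    if n < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 1 and 0 <= p <= 1")
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))
```

With `p = 0.94`, one readout gives 0.94 and three readouts give about 0.9896 under these assumptions; two readouts give 0.8836 in this strict model, because a 1-1 split is scored as a failure.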
[0296] "Hyb & Seq chemistry," "Hyb & Seq sequencing," and "Hyb & Seq" refer to the methods described above in this disclosure.
[0297] Any aspect described above can be combined with any other aspect disclosed herein.
[0298] Definitions
[0299] In this specification, the terms “annealing” and “hybridization” are used interchangeably and refer to the formation of a stable double helix. In one aspect, a stable double helix means that the double helix structure is not destroyed by stringent washing under conditions where the temperature is about 5°C below or about 5°C above the Tm of one of the strands of the double helix and the monovalent salt concentration is low (e.g., less than 0.2 M or less than 0.1 M, or a salt concentration known to those skilled in the art). The expression “perfectly matched,” when used in reference to a double helix, means that the polynucleotide and / or oligonucleotide strands constituting the double helix form a double helix structure with each other such that every nucleotide in each strand forms a Watson-Crick base pair with a nucleotide in the other strand. The term “double helix” includes, but is not limited to, the pairing of nucleoside analogs that can be used, such as deoxyinosine, nucleosides having a 2-aminopurine base, PNA, and the like. A “mismatch” in a double helix between two oligonucleotides means that a pair of nucleotides in the double helix cannot form a Watson-Crick base pair.
[0300] In this specification, the term “hybridization conditions” typically refers to salt concentrations of less than approximately 1 M, more commonly less than approximately 500 mM, and even more commonly less than approximately 200 mM. Hybridization temperatures can be as low as 5°C but are typically above 22°C, more commonly above approximately 30°C, and often above approximately 37°C. Hybridization is usually performed under stringent conditions, that is, conditions under which the probe specifically hybridizes to its corresponding subsequence of the target. Stringent conditions are sequence-dependent and vary from situation to situation; longer fragments may require higher hybridization temperatures for specific hybridization. Because other factors (including the base composition and length of the complementary strands, the presence of organic solvents, and the degree of base mismatch) can affect the stringency of hybridization, the combination of parameters is more important than the absolute value of any single parameter.
[0301] Generally, stringent conditions are selected to be approximately 5°C below the Tm of the specific sequence at a given ionic strength and pH. Examples of stringent conditions include a Na ion (or other salt) concentration of at least 0.01 M to 1 M at pH 7.0–8.3 and a temperature of at least 25°C. For example, 5×SSPE (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA, pH 7.4) at 25–30°C is suitable for hybridization of allele-specific probes. For more information on stringent conditions, see, for example, Sambrook, Fritsch, and Maniatis, *Molecular Cloning: A Laboratory Manual*, 2nd edition, Cold Spring Harbor Press (1989), and Anderson, *Nucleic Acid Hybridization*, 1st edition, BIOS Scientific Publishers Limited (1999). In this specification, “specifically hybridizes to,” “specifically hybridizing to,” and similar expressions mean that, under stringent conditions, a molecule substantially binds to, forms a duplex with, or hybridizes with one or more specific nucleotide sequences.
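Because stringent conditions are defined relative to Tm, a quick Tm estimate helps when choosing a hybridization temperature. The sketch below uses the Wallace rule, Tm ≈ 2°C × (A + T) + 4°C × (G + C), a rough approximation intended for short DNA oligonucleotides in high-salt buffer. It ignores ionic strength, pH, and mismatches, so it is only a starting point and not the method used in this disclosure; the helper names are hypothetical.

```python
def wallace_tm(seq: str) -> float:
    """Rough DNA melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc

def stringent_temperature(seq: str, offset: float = 5.0) -> float:
    """A hybridization temperature about `offset` deg C below the estimated Tm."""
    return wallace_tm(seq) - offset
```

For the 14-mer `ACGTACGTACGTAC` (7 A/T and 7 G/C), `wallace_tm` gives 42°C and `stringent_temperature` gives 37°C.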
[0302] A detectable label associated with a specific position on a probe can be “read out” (e.g., by detecting its fluorescence) once or multiple times. “Readout” can be used as a synonym for “base readout.” Multiple readouts improve accuracy. A target nucleic acid sequence is “read out” when a continuous stretch of sequence information derived from the original single target molecule is detected; typically, this is generated through a multipass consensus (as defined below). Herein, the terms “coverage” and “depth of coverage” mean the number of times a single region of the target is sequenced (through separate readouts) and aligned to a reference sequence. Readout coverage is the total number of readouts that map to a particular reference target sequence. Base coverage is the total number of base readouts at a particular genomic position.
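Readout coverage and base coverage can be computed from mapped readouts as sketched below. The sketch assumes each mapped readout is recorded as a hypothetical `(target, start, length)` tuple; a real pipeline would take these values from its alignment records.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record for one mapped readout: (target name, start position, length in bases).
Readout = Tuple[str, int, int]

def readout_coverage(readouts: Iterable[Readout]) -> Counter:
    """Readout coverage: number of readouts mapped to each reference target."""
    return Counter(target for target, _, _ in readouts)

def base_coverage(readouts: Iterable[Readout], target: str, target_len: int) -> List[int]:
    """Base coverage: number of base readouts at each position of one target."""
    depth = [0] * target_len
    for name, start, length in readouts:
        if name != target:
            continue
        # Clip readouts that overhang either end of the target.
        for pos in range(max(start, 0), min(start + length, target_len)):
            depth[pos] += 1
    return depth
```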
[0303] A single "readout" is a unit of sequencer output: a continuous stretch of sequence information derived from the original single target molecule. Each readout has a quality index related to the confidence level of the base readouts within it. In Hyb & Seq, all readouts are generated through multipass consensus.
[0304] "Read length" is a metric describing the length of the sequence obtained from each read (unit: bp).
[0305] In this specification, “Hyb & Seq cycle” refers to the entire process required to detect each attachment region of a particular probe or group of probes. For example, for a probe capable of detecting six positions on a target nucleic acid, one “Hyb & Seq cycle” would include, at a minimum, hybridizing the probe to the target nucleic acid, hybridizing a complementary nucleic acid / reporter probe to each of the six attachment regions of the probe’s barcode domain, and detecting the detectable label associated with each of those six positions.
[0306] The term "k-mer probe" is a synonym for sequencing probe as used in this disclosure. A k-mer readout is the basic unit of Hyb & Seq data. One k-mer readout is obtained from one target molecule per Hyb & Seq cycle. By performing a sufficient number of Hyb & Seq cycles to generate a sufficient number of separate k-mer readouts from a single target molecule, it becomes possible to obtain a continuous stretch of the sequence by unambiguously aligning the separate k-mers.
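One way to illustrate unambiguous alignment of separate k-mer readouts without a reference is to chain k-mers that overlap by k-1 bases and stop wherever the next step is ambiguous. This is a hypothetical sketch (the function name is invented), assuming error-free readouts that tile the molecule at one-base steps. It is a simple unitig-style walk, not the reconstruction algorithm of this disclosure, and it skips k-mers that lie on a pure cycle.

```python
from typing import List, Set

def extend_unambiguously(kmers: Set[str]) -> List[str]:
    """Chain k-mers that overlap by k-1 bases, stopping wherever the next step is ambiguous."""

    def successors(x: str) -> List[str]:
        return [y for y in kmers if y != x and y[:-1] == x[1:]]

    def predecessors(x: str) -> List[str]:
        return [y for y in kmers if y != x and y[1:] == x[:-1]]

    contigs: List[str] = []
    used: Set[str] = set()
    for start in sorted(kmers):
        preds = predecessors(start)
        # Skip k-mers already placed, and interior k-mers reached unambiguously from a predecessor.
        if start in used or (len(preds) == 1 and len(successors(preds[0])) == 1):
            continue
        contig, current = start, start
        used.add(start)
        while True:
            nxt = successors(current)
            # Stop at a branch, a dead end, a visited k-mer, or a merge point.
            if len(nxt) != 1 or nxt[0] in used or len(predecessors(nxt[0])) != 1:
                break
            current = nxt[0]
            contig += current[-1]
            used.add(current)
        contigs.append(contig)
    return contigs
```

The pairwise search is quadratic in the number of k-mers, which keeps the example short; an index keyed on (k-1)-mer prefixes would be the usual optimization.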
[0307] When two or more sequences from separate readouts are aligned, overlapping portions can be combined to create a single consensus sequence. Where the overlapping portions share the same base (in one column of the alignment), that base becomes the consensus. Various rules can be used to generate a consensus at positions where overlapping sequences disagree. The simple majority rule uses the most frequently occurring base in a column as the consensus. "Multipass consensus" is the alignment of all separate probe readouts from a single target molecule. Depending on the probe population and the total number of cycles, each base position within the single target molecule can be queried with different levels of redundancy. Generally, redundancy improves the confidence level of the base readouts.
[0308] "Consensus" refers to the alignment of two or more DNA sequences from separate readouts, where overlapping portions can be combined to generate a single consensus sequence. Where overlapping portions share the same base (in one column of the alignment), those bases become the consensus. Various rules can be used to generate consensus for locations where discrepancies exist between overlapping sequences. The simple majority rule uses the most frequently occurring base in a column as the consensus.
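The simple majority rule can be sketched over a gapped alignment in which each row is one readout and `-` marks columns that the readout does not cover. This hypothetical helper returns the most frequent base per column. How to resolve exact ties is one of the "various rules" left open above; here, by assumption, a tie is reported as `N`.

```python
from collections import Counter
from typing import List

def majority_consensus(aligned: List[str], gap: str = "-", tie: str = "N") -> str:
    """Column-wise simple-majority consensus of equal-length aligned readouts."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("aligned readouts must have equal length")
    out = []
    for col in range(width):
        counts = Counter(row[col] for row in aligned if row[col] != gap)
        ranked = counts.most_common(2)
        if not ranked:
            out.append(gap)  # no readout covers this column
        elif len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            out.append(tie)  # discrepancy with no majority
        else:
            out.append(ranked[0][0])
    return "".join(out)
```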
[0309] "Raw accuracy" is a measure of the system's inherent ability to correctly identify bases; it depends on the sequencing technique. "Consensus accuracy" is a measure of the system's ability to correctly identify bases using additional readouts and statistical power. "Specificity" refers to the proportion of readouts that map to the intended target out of all readouts per run. "Uniformity" refers to the variability of sequence coverage between target regions; high uniformity corresponds to low variability. Uniformity is generally reported as the proportion of targeted regions covered at a depth of at least 20% of the average coverage depth. Random errors (i.e., errors inherent in the sequencing chemistry) can be readily corrected by "multipass" sequencing of the same target nucleic acid. With a sufficient number of passes, virtually "perfect consensus" or "error-free" sequencing can be achieved.
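Read as the share of targeted regions whose depth is at least 20% of the mean depth (the usual convention for this metric), uniformity can be computed as follows. This is a minimal sketch that assumes per-region depths are already available; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence

def coverage_uniformity(depths: Sequence[float], fraction: float = 0.2) -> float:
    """Share of targeted regions whose depth is at least `fraction` of the mean depth."""
    if not depths:
        raise ValueError("no targeted regions")
    threshold = fraction * mean(depths)
    return sum(1 for d in depths if d >= threshold) / len(depths)
```

For depths `[10, 10, 10, 1]` the mean is 7.75, the threshold is 1.55, and three of four regions pass, so uniformity is 0.75.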
[0310] The methods described herein can be implemented, and / or their results recorded, using any apparatus capable of doing so. Non-limiting examples of usable apparatus include electronic computing devices, including computers of any type. When the methods described herein are implemented and / or their results are recorded on a computer, the computer program that causes the computer to perform the steps of the methods can be contained in any computer-readable medium capable of containing such a program. Non-limiting examples of usable computer-readable media include diskettes, CD-ROMs, DVDs, ROMs, RAMs, other non-transitory computer-readable media, other memory, and other computer storage devices. The computer program that causes a computer to perform the steps of the methods and / or to reconstruct the sequence information and / or to record the results can also be provided over an electronic network (e.g., the Internet, an intranet, or another network).
[0311] A "disposable sequencing card" can be incorporated into fluorescence imaging devices known in the field. Fluorescence microscopes with a wide range of features can perform this sequencing readout. For example, wide-field lamp, laser, LED, multiphoton, confocal, or total internal reflection illumination can be used for excitation and / or detection. The emission detection channel of the fluorescence microscope can be fitted with one or more cameras and / or photomultiplier tubes capable of filter- or grating-based spectral separation of one or more emission wavelengths. A standard computer can be used to control the disposable sequencing card, the reagents passing through the card, and detection by the fluorescence microscope.
[0312] Sequencing data can be analyzed by any number of standard next-generation sequencing assemblers (see, for example, Wajid and Serpedin, "Review of general algorithmic features for genome assemblers for next generation sequencers," Genomics, Proteomics & Bioinformatics, Vol. 10(2), pp. 58-73, 2012). Sequencing data obtained within a single diffraction-limited region of the microscope are "locally reconstructed" to generate a consensus sequence from multiple readouts within a single diffraction spot. The reconstructed readouts from numerous diffraction spots are then mapped together to generate a continuous sequence representing a de novo assembly of the entire targeted gene set or the entire genome.
[0313] Additional teachings relating to this disclosure are found in one or more of U.S. Patent Nos. 8,148,512, 7,473,767, 7,919,237, 7,941,279, 8,415,102, 8,492,094, and 8,519,115, and U.S. Patent Application Publication Nos. 2009 / 0220978, 2009 / 0299640, 2010 / 0015607, 2010 / 0261026, 2011 / 0086774, 2011 / 0145176, 2011 / 0201515, 2011 / 0229888, 2013 / 0004482, 2013 / 0017971, 2013 / 0178372, 2013 / 0230851, 2013 / 0337444, 2013 / 0345161, 2014 / 0005067, 2014 / 0017688, 2014 / 0037620, 2014 / 0087959, 2014 / 0154681, 2014 / 0162251, and 2016 / 0194701. Each of these documents is incorporated herein by reference in its entirety.
[Examples]
[0314] Example 1: Long single-molecule readouts using Hyb & Seq chemistry
[0315] The sequencing probes disclosed herein, and the sequencing methods that use them, are referred to as Hyb & Seq for convenience. This term is used throughout the specification to describe the disclosed sequencing probes and sequencing methods. Hyb & Seq is a library-free, amplification-free single-molecule sequencing technique that uses cycles of nucleic acid hybridization of fluorescent molecular barcodes to native targets.
[0316] This example demonstrates long readouts using Hyb & Seq for a 33-kilobase (kb) single-molecule DNA target. The key steps are as follows: (1) the long DNA molecule is captured on the surface of a sequencing flow cell and stretched by hydrodynamic forces; and / or (2) multiple perfectly matched sequencing probes hybridize along the long single-molecule target; and / or (3) fluorescent reporters hybridize to the barcode region within the sequencing probes, and all bound sequences are identified; and / or (4) the relative positions of the sequences within the single-molecule target are determined using spatially separated fluorescence data.
[0317] Non-limiting examples of the significant advantages of long readouts using Hyb & Seq include: readout length is determined by molecule length and is not limited by the chemistry; and / or less fragmentation as a result of simpler and more limited sample preparation; and / or positional information associated with the sequencing probes aids reconstruction; and / or the ability to generate long-range haplotypes through variant phasing.
[0318] Hyb & Seq chemistry design: The sequencing probe contains a target-binding domain that base-pairs with a single-molecule target, and a barcode domain with at least three positions (R1, R2, R3) corresponding to the hexameric sequence within the target-binding domain. A set of 4,096 (4^6) different sequencing probes enables sequencing of any target sequence. Reporter probes: three reporter probes bind sequentially to these positions on the barcode domain. Each reporter complex corresponds to a specific dinucleotide. Functionality is achieved through hybridization.
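The arithmetic behind this design can be checked directly: three barcode positions, each reporting one of 16 dinucleotides, give 16^3 = 4,096 = 4^6 distinguishable hexamers. The sketch below assumes a hypothetical labeling in which the 16 dinucleotide-specific reporter complexes are simply named `RP00` to `RP15`; the actual reporter labels and color assignments are not specified here.

```python
from itertools import product
from typing import List

BASES = "ACGT"
# Hypothetical names for the 16 dinucleotide-specific reporter complexes.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
REPORTER_FOR = {d: f"RP{i:02d}" for i, d in enumerate(DINUCLEOTIDES)}
DINUC_FOR = {r: d for d, r in REPORTER_FOR.items()}
ALL_HEXAMERS = ["".join(p) for p in product(BASES, repeat=6)]

def encode(hexamer: str) -> List[str]:
    """Reporters expected at barcode positions R1, R2, R3 for a given hexamer."""
    if len(hexamer) != 6 or set(hexamer) - set(BASES):
        raise ValueError("expected a DNA hexamer over A, C, G, T")
    return [REPORTER_FOR[hexamer[i:i + 2]] for i in (0, 2, 4)]

def decode(reporters: List[str]) -> str:
    """Rebuild the hexamer from the reporters recorded at R1, R2, R3."""
    return "".join(DINUC_FOR[r] for r in reporters)
```

For example, `encode("ACGTAC")` returns `["RP01", "RP11", "RP01"]`, and every one of the 4,096 hexamers round-trips through `encode` and `decode`.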
[0319] The long and short readout sequencing methods of this disclosure utilize the same simple probe hybridization workflow for targeting and capturing nucleic acids, as shown in Figures 18 and 19. As shown in the right-hand panel of Figure 18, multiple sequencing probes can hybridize to the target nucleic acid simultaneously, and as shown in the right-hand panel of Figure 19, optical resolution allows for the individual identification of several spots per long target. By simultaneously hybridizing and recording multiple sequencing probes, the amount of information recorded in a single readout is increased. Long haplotypes are unique to single-molecule analysis and can be reconstructed by their actual physical location rather than by computer reconstruction. The sequencing methods of this disclosure enable long sequencing readouts up to several hundred kilobases.
[0320] As shown in Figure 20, a pool of specific sequencing probes hybridizes to a 15-kilobase target in the expected pattern: the probes hybridize to the extended target (preferably a hydrodynamically extended target) at the expected sequence-specific locations and relative physical distances. Because each cycle captures more information than short-readout techniques, the sequencing method of this disclosure reads more bases per cycle. The long-readout sequencing method of this disclosure also records the relative positions of the sequencing readouts, which aids reconstruction of the long readout. With the long-readout sequencing method of this disclosure, readout length = consensus sequence length = captured target molecule length.
[0321] Figure 27 shows the results of an experiment in which a 33-kilobase DNA fragment was captured, extended, hybridized with sequencing probes and reporter probes, and detected. The sequencing method of this disclosure is compatible with DNA fragments of 33 kilobases and longer. The readout length is limited only by the initial length of the target nucleic acid fragment, not by enzymes or sequencing chemistry.
[0322] Figure 13 illustrates an additional capability of the sequencing method of this disclosure: targeted, phased long readouts. Long-range haplotype phasing is inherent in the data, and variant phasing is readily identified. Sequencing the entire long target molecule is unnecessary, because sequencing cycles can be restricted to the sequencing window of interest using “blocker oligos.”
[0323] The results of Example 1 demonstrate that the sequencing method of this disclosure enables single-molecule sequencing with long readout lengths. In particular, these results show that 15-kilobase and 33-kilobase single-stranded DNA molecules were successfully captured and hydrodynamically extended; that spatially separated fluorescence data accurately correspond to the actual relative positions along long single molecules; and that multiple sequences of more than 10 bases are read out simultaneously per sequencing cycle.
[0324] Example 2: ShortStack™ technology: reference-guided, accurate reconstruction of Hyb & Seq readouts for targeted sequencing to resolve short nucleotide variants and InDels
[0325] ShortStack is an open-source algorithm designed to reconstruct different hexamer readouts (hexamer spectra) from Hyb & Seq. This algorithm is single-molecule based and uses hexamer readouts from each imaged feature to identify targets, and then reconstructs these hexamer readouts into a consensus sequence using a statistical approach with error correction.
[0326] Single-molecule sequencing using Hyb & Seq chemistry and ShortStack was performed as follows: After each hybridization cycle using Hyb & Seq chemistry, a hexamer readout of the single-molecule target was generated; after multiple hybridization cycles were completed, hexamer spectra covering multiple regions of each single-molecule target were generated; the consensus sequences of each single-molecule target were derived using the hexamer spectra along with the reference sequences of each target nucleic acid molecule.
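Target identification from a hexamer spectrum can be illustrated by scoring each reference sequence by how many of the observed hexamers it contains. This is a hypothetical sketch, not ShortStack's implementation: the scoring rule, the function names, and the decision to return `None` when two references tie are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Set

def kmer_set(seq: str, k: int = 6) -> Set[str]:
    """All k-mers present in a reference sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def identify_target(spectrum: Iterable[str], references: Dict[str, str], k: int = 6) -> Optional[str]:
    """Name of the reference that contains the most observed hexamers, or None if unresolved."""
    observed = set(spectrum)
    scores = {name: len(observed & kmer_set(seq, k)) for name, seq in references.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # nothing in the spectrum matches any reference
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # two references explain the spectrum equally well
    return ranked[0][0]
```

Once a molecule is assigned to a reference in this way, its hexamers can be placed on that reference and combined into a per-molecule consensus.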
[0327] The results of targeted sequencing using Hyb & Seq technology with ShortStack show the following: the single-molecule target identification algorithm using hexamer spectra had a 100% success rate; and / or the reference-guided reconstruction algorithm achieved single-molecule consensus accuracy of over 99% (QV approximately 32) at 5× coverage, and somatic variant detection matching known frequencies (R² of approximately 0.90) was confirmed using pre-characterized gDNA samples; and / or in silico experiments using all hexamers and ShortStack confirmed that the mean QV was greater than 90 in larger target panels. The ShortStack algorithm can accurately reconstruct Hyb & Seq data. Figure 28 shows the results of a sequencing experiment; these results were obtained using the sequencing method of this disclosure and analyzed using the ShortStack algorithm. In this experiment, the sequenced target nucleic acids included fragments of the genes BRAF (SEQ ID NO: 3), EGFRex18 (SEQ ID NO: 4), KRAS (SEQ ID NO: 5), PIK3CA (SEQ ID NO: 6), EGFRex20 (SEQ ID NO: 7), and NRAS (SEQ ID NO: 8). Figure 28 shows both base coverage and variant calls. The coverage plot shows base coverage in FFPE (formalin-fixed paraffin-embedded) gDNA; the available sequencing probes cover most bases of the diverse targets. The error plot shows the relationship between error rate and coverage at the examined positions in the FFPE gDNA sample for the diverse targets, with an error rate of less than 1% at 8× coverage. The frequency plot shows the correlation between the observed and known frequencies of variants in the sequenced Horizon gDNA sample. The table lists the sequenced Horizon genomic reference gDNA and shows that the fraction of variant molecules matches the known frequencies in the reference sample.
[0328] The results of Example 2 demonstrate that ShortStack is an accurate algorithm for local reconstruction of hexamer spectra obtained using the sequencing method of this disclosure. In particular, the results show target identification with 100% accuracy and an average quality value of over 30 per base using simulated data; base readouts in Hyb & Seq experimental data with over 99% accuracy at 5× coverage; and variant frequencies detected from genomic DNA that match known values (R² of approximately 0.90). ShortStack also has excellent computational performance: its run time scales linearly with the number of hexamers being reconstructed, and it can reconstruct 69,000 molecules in about 15 minutes on a personal computer.
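Assuming the quality values quoted here follow the usual Phred convention, QV = -10 log10(error rate), QV 20 corresponds to 99% accuracy and QV 30 to 99.9%. The helpers below (names hypothetical) convert between the two scales. By this arithmetic, QV ≈ 32 corresponds to an error rate of roughly 0.06%, which is consistent with consensus accuracy of over 99%.

```python
import math

def qv_from_accuracy(accuracy: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(1 - accuracy)."""
    error = 1.0 - accuracy
    if error <= 0:
        return math.inf  # a perfect call has unbounded QV
    return -10.0 * math.log10(error)

def accuracy_from_qv(qv: float) -> float:
    """Inverse conversion: accuracy = 1 - 10 ** (-QV / 10)."""
    return 1.0 - 10.0 ** (-qv / 10.0)
```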
[0329] Example 3: Library-free targeted sequencing of native gDNA from FFPE samples using Hyb & Seq™ technology, a hybridization-based single-molecule sequencing system
[0330] Cancer panel-targeted sequencing of native gDNA from FFPE samples using the sequencing method of this disclosure (Hyb & Seq) demonstrated the following: accurate base readouts in single-molecule sequencing of oncogene targets; and / or accurate detection of known oncogene single-nucleotide variants (SNVs) and insertions / deletions (InDels); and / or multiplexed capture of oncogene targets from gDNA extracted from FFPE samples (median DNA fragment size of 200 bases); and / or end-to-end automated sequencing on an improved prototype instrument.
[0331] The Hyb & Seq chemistry and workflow were as follows: Capture the genomic target of interest directly onto the surface of the sequencing flow cell; introduce a pool containing hundreds of hexameric sequencing probes into the sequencing chamber; identify the hexameric bases by sequentially hybridizing fluorescent reporter probes to the barcode regions of the sequencing probes in three reporter exchange cycles; wash away the sequencing probes once the bases are identified; repeat the cycle with a fresh pool of sequencing probes until the target region is read to a sufficient depth.
[0332] Key advantages of Hyb & Seq: a simple and rapid FFPE workflow, from clinical sample to start of sequencing in under 60 minutes; and / or an enzyme-free, amplification-free, and library-free configuration; and / or a total work time of 15 minutes; and / or high accuracy, owing to a low chemical error rate plus intrinsic error correction; and / or both long and short readouts, with readout length determined by the input sample rather than limited by the chemistry.
[0333] The design of the Hyb & Seq chemistry is as described in Example 1. Preparation of Hyb & Seq samples from FFPE tissue consists of three simple steps: (1) deparaffinization and lysis in a single test tube; (2) particle removal using a syringe filter; and (3) optional DNA fragmentation and target capture. This process requires one to three 10-µm FFPE sections per sample. The entire process can be completed in under 60 minutes and requires only common laboratory equipment: a heat block, pipettes, filters, and reagents.
[0334] Figure 29 shows a schematic diagram of the experimental design for multiplexed capture and sequencing of oncogene targets derived from FFPE samples. A total of 425 sequencing probes were designed and constructed to sequence portions of 11 oncogene targets (SEQ ID NOs: 3-13). The loci of known variants in each gene target were covered by numerous sequencing probes (perfect matches plus single mismatches). Base coverage and base-level accuracy were measured in these regions. The accuracy of variant detection was determined using pre-characterized reference samples. The top panel of Figure 29 shows the sequencing probes (blue) aligned to the target sequences (gray) surrounding the locations of known variants (red). For each variant location (red), four probe sequences containing the respective (A, G, C, T) base variants were included. Tracking each single target DNA molecule over 800 barcode exchange cycles during sequencing yielded numerous hexamer readouts, which were reconstructed using the ShortStack® algorithm as described in Example 2.
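The four-probe design at each variant location can be sketched as follows: take a hexamer window of the reference that covers the variant position and substitute each of A, C, G, and T at that position. The function name and the window-centering default are hypothetical. The sequences returned are target-sense windows, so the corresponding target-binding domains would be their reverse complements.

```python
from typing import Dict, Optional

BASES = "ACGT"

def variant_windows(reference: str, pos: int, k: int = 6, start: Optional[int] = None) -> Dict[str, str]:
    """k-nt reference window covering `pos`, once with each of A, C, G, T substituted at `pos`."""
    if start is None:
        # Hypothetical default: roughly center the variant, clamped to the reference ends.
        start = max(0, min(pos - k // 2, len(reference) - k))
    if start < 0 or start + k > len(reference) or not (start <= pos < start + k):
        raise ValueError("window must lie inside the reference and cover the variant position")
    window = reference[start:start + k]
    i = pos - start
    return {base: window[:i] + base + window[i + 1:] for base in BASES}
```

One of the four windows always equals the reference (the perfect match) and the other three carry a single mismatch, mirroring the "perfect matches plus single mismatches" coverage described above.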
[0335] Figure 28 shows the sequencing results, including observed and predicted values for the average coverage, single-nucleotide error rate, and variant frequency for each target. The results of Example 3 demonstrate that Hyb & Seq, applied to multiplexed sequencing of 11 target regions in FFPE and reference gDNA samples, detects single-nucleotide variants with a low error rate.
[0336] Example 4: Direct single-molecule RNA sequencing without conversion to cDNA using Hyb & Seq (trademark) chemistry.
[0337] Direct single-molecule RNA sequencing using Hyb & Seq chemistry was performed as follows: native RNA molecules were captured directly, without conversion to cDNA, and immobilized on the surface of a sequencing flow cell; a pool containing several hundred hexamer sequencing probes was introduced into the flow cell; perfectly matched sequencing probes hybridized at random positions along each single RNA target molecule; fluorescent reporter probes were sequentially hybridized to the barcode domains of the sequencing probes to identify the hexamer bases; after base identification, the sequencing probes were washed away; and the cycle was repeated until the target region had been read to sufficient depth.
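As a rough illustration of how repeated cycles accumulate coverage on one immobilized molecule, the toy simulation below assumes that exactly one perfectly matched hexamer probe binds per cycle at a uniformly random position and that every readout is correct. Real binding is not uniform and several probes may bind per cycle; `simulate_hyb_seq` is a hypothetical name.

```python
import random

def simulate_hyb_seq(target, depth, k=6, seed=0, max_cycles=100_000):
    """Toy model of repeated hybridize-read-wash cycles on one molecule.

    Each cycle, one matching k-mer probe binds at a uniformly random position,
    its k bases are recorded, and the probe is washed away. Cycling stops once
    every base of the target has been covered at least `depth` times.
    """
    if len(target) < k:
        raise ValueError("target shorter than probe")
    rng = random.Random(seed)
    coverage = [0] * len(target)
    reads = []  # (start position, hexamer read) per cycle
    while min(coverage) < depth:
        if len(reads) >= max_cycles:
            raise RuntimeError("depth not reached within max_cycles")
        start = rng.randrange(len(target) - k + 1)
        reads.append((start, target[start:start + k]))
        for i in range(start, start + k):
            coverage[i] += 1
    return reads, coverage
```

Because the terminal bases are covered by only one window each, they set the number of cycles needed; this edge effect is one reason a sequencing window is read to a depth well above one.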
[0338] Key results: sequencing of targeted single RNA molecules showed a coverage profile similar to that of DNA; and/or RNA molecules remained stably bound to the flow cell surface over more than 200 Hyb & Seq cycles; and/or mRNA and genomic DNA were simultaneously captured and quantified from a single FFPE section; and/or eight transcripts were captured in multiplex and quantified from as little as 10 ng of total RNA.
[0339] The design of the Hyb & Seq chemistry is as described in Example 1. The upper part of Figure 30 shows a schematic diagram comparing the experimental steps related to direct RNA sequencing with those related to conventional RNA sequencing performed using cDNA conversion. The middle and lower parts of Figure 30 show the results from experiments investigating the compatibility of RNA molecules with the sequencing method of this disclosure. In these experiments, four target RNA molecules were sequenced (SEQ ID NOs: 14-17). The results show that RNA molecules can be captured and detected over at least 200 sequencing cycles.
[0340] Figure 31 shows the results from an experiment performing direct single-molecule RNA sequencing using the sequencing method of this disclosure. A native RNA molecule encoding the NRASex2 (SEQ ID NO: 18) fragment was directly captured without conversion to cDNA, immobilized on the surface of a sequencing flow cell, and sequenced using the method of this disclosure. This experiment was repeated using a captured DNA molecule instead of RNA. Figure 31 shows that the sequencing coverage of DNA and RNA was equivalent. This demonstrates that RNA can be directly sequenced without conversion to cDNA using the sequencing method of this disclosure.
[0341] Figure 24 shows an example of capturing RNA and DNA together from an FFPE sample. The left panel of Figure 24 is a schematic diagram showing that gDNA and mRNA can be simultaneously captured, detected, and sequenced from an FFPE sample using the method disclosed herein. The sample was prepared using the FFPE workflow described in Example 3, and the same capture protocol was used, with both RNA-specific and DNA-specific capture probes. DNA and RNA molecules were sequenced simultaneously in the same flow cell using the same sequencing probes. The right panel of Figure 24 shows the results of an experiment in which NRAS RNA and DNA were simultaneously captured and detected from a tonsil FFPE sample. The fluorescence image shows that both RNA and DNA can be captured and detected, and the bar graph shows that both RNA-specific and DNA-specific capture probes are required to capture RNA and DNA simultaneously.
[0342] Figure 32 shows the results of multiplexed target capture with an RNA panel. Eight transcripts expressed at moderate to high levels were captured in multiplex from human universal reference RNA (HUT) at various input amounts (0 ng, 1 ng, 10 ng, 100 ng, and 1000 ng). The captured RNA molecules were immobilized on the surface of a flow cell, and specific sequencing probes and reporter probes were hybridized to them for quantification. The lower part of Figure 32 shows fluorescence images of RNA molecules captured from a 100 ng input; one example of each RNA is circled and labeled with the transcript name and the color combination of the reporter complex used to identify it. The upper part of Figure 32 shows the counts for each specific RNA target.
[0343] The results of Example 4 demonstrate that single-molecule RNA sequencing can be achieved with Hyb & Seq chemistry. In particular, the results show that (1) direct RNA sequencing is possible without conversion to cDNA; (2) RNA molecules remain stable throughout the Hyb & Seq cycling process; (3) both RNA and DNA molecules can be captured and sequenced in a single Hyb & Seq workflow; and (4) targeted capture of the mRNA panel can be performed with a small total RNA input of 10 ng.
[0344] Example 5: Integrated bioinformatics algorithm for high-throughput, short molecular-level readouts generated from a Hyb & Seq sequencing platform
[0345] We designed the ShortStack® software to perform standard sequencing bioinformatics tasks, including alignment, error correction, mutation calling, and readout reconstruction. Figure 33 shows a schematic diagram of the entire ShortStack® pipeline, which includes hexamer alignment and coverage estimation; and/or mutation identification; and/or construction of a graph data structure; and/or sequence reconstruction and error correction at the molecular level.
[0346] All algorithms operated strictly on the information obtained from individual molecules, so that the final mutation calls were independent of the mutation frequency in the sample. Hexamers were grouped into individual molecules according to the binding sites at which they were detected. To assign each molecule to a target, its hexamers were aligned to every target region in the panel, and the best-matching gene target was selected.
[0347] The quality of molecule assignment was evaluated with a single statistical metric. Aligning a molecule's hexamers to N different target regions yields N scores, each the sum of the coverage values over one target. The target with the highest score was selected as the match, and the z-score of this highest score relative to the distribution of all N scores was calculated. Molecule assignments with low confidence (z-score below 2.5, i.e., less than 2.5σ above the mean) were filtered out.
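The molecule-assignment test can be sketched as follows, assuming (as a simplification of the actual alignment) that a hexamer contributes coverage only where it matches a target exactly. `coverage_score` and `assign_molecule` are hypothetical names; only the 2.5 threshold comes from the text.

```python
from statistics import mean, pstdev

def coverage_score(hexamers, target):
    """Sum of per-base coverage on `target` from exact hexamer matches."""
    coverage = [0] * len(target)
    for h in hexamers:
        start = target.find(h)
        while start != -1:  # count every occurrence of the hexamer
            for i in range(start, start + len(h)):
                coverage[i] += 1
            start = target.find(h, start + 1)
    return sum(coverage)

def assign_molecule(hexamers, targets, z_min=2.5):
    """Return (best target name or None, z-score) for one molecule."""
    names = list(targets)
    scores = [coverage_score(hexamers, targets[n]) for n in names]
    best = max(range(len(names)), key=lambda i: scores[i])
    sd = pstdev(scores)
    if sd == 0:
        return None, 0.0  # no target stands out
    z = (scores[best] - mean(scores)) / sd
    return (names[best] if z >= z_min else None), z
```

Because this sketch uses the population standard deviation of the N scores, the largest attainable z-score is sqrt(N - 1), so the 2.5 threshold can only be met when the panel has at least eight targets.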
[0348] Key advantages of the ShortStack™ algorithm include accurate handling of potential sequence ambiguity through a hierarchical hash index design; and/or an algorithmic design that ensures mapping quality through prioritization and prevents overestimation of mutations.
[0349] In addition, the variant graph data structure makes it possible to build computational models of various types of mutations (substitutions, insertions, and deletions) and to generate outputs for sequence reconstruction and variant calling (substitution variants are represented as additional graph nodes of the same length as the original sequence; insertions are modeled by adding connected nodes of arbitrary length; deletions are modeled by adding artificial nodes that contain an empty base string); and/or, in blind variant searches (i.e., searches for mutated sequences that are not specified in advance), the Hamming distance between each hexamer and every position in the reference sequence is measured, and new nodes representing the detected mutations are added to the graph; and/or coverage of mutated hexamers is estimated using a hierarchical hash table.
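A minimal sketch of the blind substitution search follows: a hexamer absent from the reference index is compared against every reference window, and each window at Hamming distance 1 yields a candidate substitution that could become a new graph node. A flat dictionary stands in for the hierarchical hash index, only single-base substitutions are handled, and the function names are hypothetical.

```python
from collections import defaultdict

def build_index(reference, k=6):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def blind_substitution_search(hexamer, reference, index):
    """Propose single-base substitutions for a hexamer not in the reference.

    Returns (reference_position, ref_base, alt_base) for every reference
    window at Hamming distance exactly 1 from `hexamer`.
    """
    if hexamer in index:
        return []  # exact match: no variant implied
    k = len(hexamer)
    hits = []
    for start in range(len(reference) - k + 1):
        window = reference[start:start + k]
        diffs = [j for j in range(k) if window[j] != hexamer[j]]
        if len(diffs) == 1:
            j = diffs[0]
            hits.append((start + j, window[j], hexamer[j]))
    return hits
```

For the reference `ACGTTGCAAGGCTA`, the observed hexamer `GTTACA` differs from the window at position 2 only at its fourth base, so the search proposes a G-to-A substitution at reference position 5.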
[0350] The constructed graph data structure enables sequence reconstruction at the molecular level and correction of instrument errors. A dynamic programming algorithm is applied to the graph to find the best-scoring path, where the score is defined as normalized base coverage. The best-scoring path represents the reconstructed sequence of the molecule: correct mutant sequences are retained, while instrument errors within hexamers are discarded.
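The dynamic-programming step can be illustrated on a small variant graph. This sketch assumes the graph is a directed acyclic graph whose nodes carry a sequence and a score (standing in for normalized base coverage), and it takes the path score as the sum of node scores; `best_path` is a hypothetical name and this is not the ShortStack implementation. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

def best_path(nodes, edges, source, sink):
    """Highest-scoring source-to-sink path in a DAG variant graph.

    nodes: name -> (sequence, score); edges: name -> list of successors.
    Returns (reconstructed_sequence, path, score).
    """
    preds = {n: [] for n in nodes}
    for u, succs in edges.items():
        for v in succs:
            preds[v].append(u)
    best = {n: float("-inf") for n in nodes}
    back = {}
    best[source] = nodes[source][1]
    # Visit nodes so that every predecessor is processed before its successors.
    for u in TopologicalSorter(preds).static_order():
        if best[u] == float("-inf"):
            continue  # not reachable from source
        for v in edges.get(u, []):
            candidate = best[u] + nodes[v][1]
            if candidate > best[v]:
                best[v] = candidate
                back[v] = u
    if best[sink] == float("-inf"):
        raise ValueError("sink not reachable from source")
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    path.reverse()
    return "".join(nodes[n][0] for n in path), path, best[sink]
```

In a graph where a reference base "G" (low coverage), a substitution "A" (high coverage), and a deletion node (empty string) sit between two shared flanks, the substitution branch wins and the reconstructed sequence carries the A.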
[0351] Using simulated datasets, we confirmed that the software provides highly accurate sequence reconstruction and mutation calling at the molecular level. Figure 34 shows the results of mutation analysis of simulated datasets using the ShortStack® pipeline, expressed as mutation-calling accuracy for 10 random mutations. With moderate instrument error, accuracy averaged 99.39% (targeted search) and 98.02% (blind search). With high instrument error, accuracy averaged 97.19% (targeted search) and 93.53% (blind search). When the threshold for molecular-level base coverage was raised, accuracy improved to 99.5% (2× coverage) and 99.9% (3× coverage).
[0352] The ShortStack™ software can handle a wide range of mutation types. Figure 35 shows the overall mutation-calling accuracy, where each bar represents the average over 10 different mutated targets for a given mutation type. Insertion and deletion lengths ranged from 1 bp to 15 bp. Mutation-calling accuracy was 94.4% (1× coverage), 97.2% (2× coverage), and 98.5% (3× coverage).
[0353] Example 6: Sample preparation aimed at processing FFPE tissue for Hyb & Seq
[0354] Formalin-fixed paraffin-embedded (FFPE) tissue is a challenging sample input for traditional sequencing platforms, but the Hyb & Seq sample preparation method successfully handles FFPE input for downstream sequencing. First, the nucleic acids to be sequenced are extracted from the FFPE tissue in a single-step process: one or more 10 μm thick FFPE sections are heated in a nucleic acid extraction buffer, which simultaneously melts the paraffin wax, digests the tissue, and releases nucleic acids from the cells. Suitable extraction buffers are known in the art and typically contain a proteinase (e.g., proteinase K), a detergent (such as Triton X-100), a chelating agent (such as EDTA), and ammonium ions. The FFPE sections and extraction buffer are incubated at 56°C for 30 minutes to separate the paraffin from the tissue and allow proteinase K to digest the tissue structure, exposing the embedded cells to the detergent and enabling cell lysis. The tube is inverted three times at 8-minute intervals to mix the reagents during deparaffinization and digestion. The solution is then heated to 98°C to reverse formaldehyde crosslinks, further facilitating nucleic acid extraction.
[0355] Once the nucleic acids have been extracted from the FFPE tissue, the solution is filtered through a glass fiber filter with a 2.7 μm pore size (Whatman) to remove tissue debris and solidified paraffin. The result is a homogeneous, translucent solution containing nucleic acids, which are highly fragmented as a result of formalin fixation and storage. If further fragmentation is required, the DNA can be physically sheared using a Covaris focused-ultrasonication system. Because of the buffer conditions, extended sonication is required: sonication is performed for 600 seconds using standard settings of 50 W peak incident power, 20% duty factor, and 200 cycles per burst to maximize target capture. To further shorten the fragment length, the emulsified paraffin can be pelleted out of the filtered solution by centrifugation at 21,000 × g for 15 minutes at 4°C, which allows the DNA to be sheared to approximately 225 bp.
[0356] Next, target capture is performed by hybridizing a pair of capture probes to the target nucleic acid molecule in a rapid hybridization step. The 5' capture probe contains a 3' biotin moiety, which allows the target to bind to the surface of a streptavidin-coated flow cell during target deposition. The 3' capture probe contains a 5' tag sequence (G sequence) that allows it to bind to beads during purification. The reaction rate is governed by the concentration of the capture probes, which are added at low-nanomolar concentrations to maximize the rate. The capture probes hybridize to the target adjacent to the region of interest, defining a sequencing window. For each DNA target, the capture probe set also includes oligos with the same sequencing window and sequence configuration that hybridize to the antisense strand of the target to prevent re-annealing. The solution containing the capture probes is heated to 98°C for 3 minutes to denature the genomic DNA and then incubated at 65°C for 15 minutes. NaCl concentrations from 400 mM to 600 mM are used in this hybridization reaction. Table 3 lists a panel of more than 100 experimentally identified targets and details the genes and exons of the target DNA regions. [Table 3]
[0357] After the capture probes have bound to the target DNA regions, the targets are purified away from the remaining genomic DNA to produce a target-enriched solution. Beads coated with an oligonucleotide complementary to the tag sequence of the 3' capture probe (anti-G sequence) are incubated with the capture reaction mixture at room temperature for 15 minutes. After this binding step, the beads are washed three times with 0.1×SSPE to remove non-target DNA and unbound biotinylated 5' capture probe. The beads are then resuspended in 14 μl of 0.1×SSPE and heated at 45°C for 10 minutes to elute the purified DNA targets. After elution, 1 μl of 5 M NaCl is added to ensure that the capture probes remain bound to the DNA targets.
[0358] The final step of sample preparation is to deposit the DNA targets onto the flow cell surface, where they can be analyzed using the probes disclosed herein. A syringe pump controls the rate at which the targets are loaded into the fluidic channel of the flow cell, so that the targets have time to diffuse across the height of the channel and bind to the streptavidin surface. This loading method creates a density gradient of targets: the number of molecules per unit area is highest at the inlet of the channel and decreases along the channel in the direction of flow toward the outlet. At a flow rate of 0.35 μl/second, the bulk of capture occurs within approximately 10 mm of channel length in a channel 1.6 mm wide and 40 μm high. Once a target is bound to the surface through the biotinylated 5' capture probe, a solution of a biotinylated oligo (G hook) that is the reverse complement of the tag sequence of the 3' capture probe is injected, capturing the free end of the target and forming a bridge structure. The single-stranded region in the middle is the sequencing window of interest. Next, a solution of G sequence oligos is added to hybridize with the excess G hooks on the surface, reducing the amount of single-stranded DNA on the surface. Figure 11 shows the capture of a single target nucleic acid using the two-probe capture system according to this disclosure.
[0359] Example 7: Multicolor Reporter Image Processing for Hyb & Seq
[0360] The image processing pipeline consists of background subtraction, registration, feature detection, and classification. In the background subtraction step, the average background of a given channel depends on shot noise and exposure. In our system, the blue channel has the highest background and the largest fluctuations. A simple top-hat filter with a circular structuring element of radius 7 pixels is applied to subtract the local background. For registration, precise alignment of the features of interest is essential for multicolor, multicycle feature analysis, and the system requires two forms of registration. In the first, a local affine transform is applied to every image channel in a single acquisition stack. This transform depends on the optical system and is therefore specific to a given instrument; it is pre-calculated and applied to all acquired images in every run. In the second, a fixed translational shift is calculated using normalized cross-correlation to correct for drift of the mechanical gantry during the run. The next step is feature detection.
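The drift-correction step (the second form of registration) can be sketched as an exhaustive search over integer shifts that maximizes normalized cross-correlation on the overlapping region. This pure-Python version is for illustration only: it assumes small images stored as nested lists and integer-pixel drift, and `estimate_drift` is a hypothetical name.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # flat regions carry no information

def estimate_drift(ref, img, max_shift=3):
    """Integer (dy, dx) maximizing NCC between ref[y][x] and img[y+dy][x+dx]."""
    h, w = len(ref), len(ref[0])
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a, b = [], []
            # Only compare pixels that fall inside both images after shifting.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    a.append(ref[y][x])
                    b.append(img[y + dy][x + dx])
            score = ncc(a, b)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The brute-force search costs one correlation per candidate shift; for full-size images an FFT-based cross-correlation is the usual choice, and sub-pixel refinement would be added on top.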
[0361] Once all images are registered, features are detected using a matched filter and a Laplacian of Gaussian (LoG) filter. The filter is applied with a fixed kernel size (matched to the diffraction-limited size of the features) and a standard deviation that varies by channel (matched to the wavelength of that channel) to enhance the spot response. Candidate reporter locations are identified as local maxima, and the intensity values associated with each identified feature are collected for classification. The final step is classification. The intensities of multicolor reporters are classified using a Gaussian naive Bayes model, which assumes that the reporter intensities in different channels are independent of each other and normally distributed. Using the maximum a posteriori (MAP) rule, the probability that a feature y, described by its intensities x = (x_1, ..., x_n) in all n channels, belongs to a given class C_k is calculated, and the feature is assigned to the most probable class:

$$\hat{C}(y) = \underset{k}{\arg\max}\; p(C_k)\prod_{i=1}^{n} p(x_i \mid C_k), \qquad p(x_i \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_{k,i}^{2}}}\exp\!\left(-\frac{(x_i-\mu_{k,i})^{2}}{2\sigma_{k,i}^{2}}\right)$$

where μ_{k,i} and σ_{k,i}^2 are the mean and variance of the intensity of class C_k in channel i.
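The classification rule can be sketched as a small Gaussian naive Bayes classifier that fits per-class, per-channel means and variances and applies the MAP rule in log space (the evidence term p(x) is dropped because it is the same for every class). The class labels, the two-channel (blue, red) intensities, and the function names below are hypothetical, chosen only to mirror the two-color example; the variance floor is an added safeguard.

```python
import math

def fit_gaussian_nb(training):
    """training: class label -> list of per-channel intensity vectors."""
    total = sum(len(rows) for rows in training.values())
    model = {}
    for label, rows in training.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[i] for r in rows) / n for i in range(d)]
        # Floor keeps variances positive for tightly clustered classes.
        variances = [max(sum((r[i] - means[i]) ** 2 for r in rows) / n, 1e-9)
                     for i in range(d)]
        model[label] = (n / total, means, variances)  # prior, mu, sigma^2
    return model

def log_normal_pdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def classify_map(model, x):
    """MAP rule: argmax_k log p(C_k) + sum_i log p(x_i | C_k)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_normal_pdf(xi, mu, var) for xi, mu, var in zip(x, means, variances))
    return max(model, key=log_posterior)
```

Working with summed log-probabilities instead of a product of densities avoids numerical underflow when many channels are combined.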
[0362] The intensity distributions of reporter complexes labeled with individual color combinations are shown in Figure 36, for a coding scheme using two dyes, blue and red. In this two-color coding scenario there are six classes (including the background). In the implemented system, 14 classes are possible by selecting four colors. Note that the distributions of the single half-dye classes (e.g., "xG") partially overlap those of the corresponding full-dye classes (e.g., "GG"). As a consequence, classification between these classes has a higher error rate; the maximum misclassification rate, between "xG" and "GG", is 11.8%. The misclassification rate in the 10-class model is less than 0.2%. Because each reporter position requires at most eight classes, the eight with the smallest classification error can readily be selected. The detected color codes are translated into base identities using a lookup table, and features are tracked over multiple cycles using the probes disclosed herein.
[0363] Example 8: Purification and deposition of target nucleic acids using a capture probe
[0364] A two-probe capture system is used to capture target nucleic acid molecules with highly specific enrichment. The capture probes are designed to bind to the target nucleic acid adjacent to the region of interest, creating a "sequencing window." The 5' capture probe (called CapB) contains a 3' biotin moiety. The 3' capture probe (called CapA) contains a 5' affinity tag sequence (called the G sequence). The capture probes are approximately 40 nucleotides long on average and are designed based on Tm and sequence context. The sequencing window is approximately 70 nucleotides long and is easily adjustable. Figure 11 shows schematic diagrams of the two-probe capture system.
[0365] The biotin moiety on CapB links the target nucleic acid to the surface of a streptavidin-coated sequencing flow cell. The affinity tag on CapA allows reversible binding of the target nucleic acid molecule to magnetic beads during purification. Using both CapA and CapB enables highly stringent target enrichment: both probes remain bound to a single target nucleic acid molecule, so only targets carrying both probes survive both the magnetic bead purification and the surface deposition. Multiplexed capture of up to 100 targets simultaneously has been demonstrated. To achieve efficient capture in a short time, the capture probes are added at concentrations from 1 nM to 10 nM.
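Why capture-probe concentration governs capture speed can be illustrated with a standard pseudo-first-order hybridization model, valid when the capture probe is in large excess over the target: the fraction of target hybridized after time t is 1 - exp(-k_on·[probe]·t). The association rate constant used below (1e6 per molar per second) is an assumed order-of-magnitude value, not a measurement for these probes, and `fraction_hybridized` is a hypothetical name.

```python
import math

def fraction_hybridized(probe_conc_nM, minutes, k_on_per_M_per_s=1e6):
    """Pseudo-first-order hybridization with capture probe in excess.

    fraction = 1 - exp(-k_on * [probe] * t). The default k_on is an assumed,
    order-of-magnitude value, not a measured constant for these probes.
    """
    k_obs = k_on_per_M_per_s * probe_conc_nM * 1e-9  # observed rate, 1/s
    return 1.0 - math.exp(-k_obs * minutes * 60.0)
```

Under that assumption, a 15-minute hybridization reaches about 59% completion at 1 nM probe but more than 99.9% at 10 nM, which is the qualitative reason for working at the upper end of the low-nanomolar range.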
[0366] In experimental tests, a panel of approximately 10 target nucleic acid molecules was purified using G beads and the two-probe capture system. The CapA and CapB probes were first hybridized to the target nucleic acids. Next, the G sequence of the bound CapA probe was hybridized to the G hook on the G beads, linking the target nucleic acid molecule to the beads. A series of stringent washes with 0.1×SSPE was then performed to remove non-target DNA and CapB not bound to a target. To release the target nucleic acid molecules from the G beads, the G sequence/G hook duplex was denatured by elution in low-salt buffer at 45°C, while CapA and CapB remained hybridized to the target nucleic acids.
[0367] These tests revealed that nonspecific/background signals increased significantly when a panel of approximately 100 target nucleic acid molecules was purified. This increase in background could be attributed to several factors: (1) increased interaction between CapA and CapB probe species, increasing the amount of free CapB probe carried over through purification; and (2) increased interaction between CapB probes and the G hooks or G beads, leading to purification of unwanted nucleic acids. Furthermore, as the panel size increases, the number of possible interactions among CapA species, CapB species, and sequencing probes increases rapidly (the number of possible pairwise interactions grows roughly with the square of the number of probe species). These interactions can interfere with dense deposition of targets and waste sequencing readouts.
[0368] The purification procedure can be modified to reduce nonspecific and background signals caused by carry-over of free probe species and unwanted nucleic acid molecules. First, including formamide at 30% v/v in the buffer used to bind the target nucleic acid molecules to the G beads halves the background count (measured as the count in a control lacking the target molecule). This is likely because formamide destabilizes the incomplete hybrids that free (released) capture probes form with the G hook, allowing the excess probe to be washed away. Second, including four iso-dG bases in the G hook on the G beads, and complementary iso-dC bases in the G sequence of CapA, reduces the background count by one third (measured in the same way). Iso-dC and iso-dG are isomeric variants of the natural dC and dG bases. Because iso-bases pair with their complementary iso-bases but not with natural bases, a sequence lacking the complementary iso-dC bases can hybridize only incompletely to the iso-G hook, and such incomplete interactions are more easily broken during stringent washing. Finally, purifying the iso-G bead eluate with Ampure® XP beads (Agencourt Biosciences Company) reduces the background count (measured in the same way) by at least 20-fold. During purification with Ampure® XP beads, the DNA sample is mixed with a suspension of carboxylated magnetic beads in a solution of polyethylene glycol (PEG) and NaCl. The PEG and NaCl concentrations can be titrated so that only molecules above a molecular weight threshold precipitate and bind to the beads. The Hyb & Seq target hybridized to its capture probes is on the order of 81 kDa, whereas free probe is on the order of 17 kDa or less. By mixing a suspension of Ampure® XP beads with the isoguanine bead eluate at a volume ratio of 1.8:1, the hybridized targets bind to the beads, and a significant portion of the free probes can be washed away before the final elution.
[0369] The model purification workflow therefore consists of the following steps: (1) hybridize the capture probe-target nucleic acid complexes to isoguanine beads in 5×SSPE/30% formamide; (2) wash the isoguanine beads with 0.1×SSPE; (3) elute the capture probe-target nucleic acid complexes in 0.1×SSPE at 45°C; (4) bind the eluted complexes to Ampure® XP beads by mixing 1.8 volumes of Ampure® XP bead suspension with 1 volume of isoguanine bead eluate; (5) wash the Ampure® XP beads with 75% ethanol; and (6) elute the capture probe-target nucleic acid complexes into 7.5 μl of 0.1×SSPE, then add 0.5 μl of 5 M NaCl.
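The salt adjustments after elution are simple mixing calculations. The sketch below, with the hypothetical helper `final_nacl_mM`, computes the NaCl concentration after adding the 5 M stock; the NaCl already present in 0.1×SSPE is left as a parameter (set to zero by default) rather than assumed.

```python
def final_nacl_mM(sample_uL, stock_uL, stock_M=5.0, sample_nacl_mM=0.0):
    """NaCl concentration (mM) after adding concentrated stock to a sample.

    sample_nacl_mM is the NaCl already in the sample buffer; it defaults to 0,
    which ignores the small contribution of the 0.1x SSPE elution buffer.
    """
    total_uL = sample_uL + stock_uL
    moles_term = sample_uL * sample_nacl_mM + stock_uL * stock_M * 1000.0
    return moles_term / total_uL
```

Adding 0.5 μl of 5 M NaCl to a 7.5 μl eluate gives 312.5 mM from the stock alone; adding 1 μl to 14 μl, as in Example 6, gives about 333 mM.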
[0370] After purification, the capture probe-target nucleic acid complexes are deposited onto the sequencing surface by slowly injecting the purified targets into the flow cell with an infusion syringe pump. To determine the deposition gradient, images of the flow cell are taken at various positions along the length of the channel. A typical deposition gradient is shown in Figure 37. With a channel height of 20 μm, loading the sample at a flow rate of 0.167 μl/min concentrates the targets near the inlet, with 80% of the total target bound within the first 5.1 mm of the channel. For a flow cell channel 1.7 mm wide, this corresponds to approximately 240 FOVs on a Gen2 imager with a FOV of 0.0357 mm². The gradient can be varied by adjusting the flow rate during deposition.
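The FOV estimate in this paragraph follows from dividing the area holding 80% of the targets by the area of one field of view, and the mean linear flow velocity follows from the volumetric flow rate and the channel cross-section (1 μl = 1 mm³). The sketch assumes FOVs tile the area without overlap or edge losses; the helper names are hypothetical.

```python
def fov_count(length_mm, width_mm, fov_area_mm2):
    """Fields of view needed to tile a length x width region (no overlap)."""
    return length_mm * width_mm / fov_area_mm2

def mean_velocity_mm_per_min(flow_uL_per_min, width_mm, height_um):
    """Mean linear velocity = volumetric flow / cross-section (1 uL = 1 mm^3)."""
    cross_section_mm2 = width_mm * height_um / 1000.0
    return flow_uL_per_min / cross_section_mm2
```

For 5.1 mm × 1.7 mm and 0.0357 mm² per FOV this gives about 243 FOVs, consistent with the approximately 240 stated; at 0.167 μl/min in a 1.7 mm × 20 μm channel the mean velocity is about 4.9 mm/min.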
[0371] Using the procedure described above, we investigated the purification and deposition of a 100-target nucleic acid panel from genomic DNA sheared to approximately 300 base pairs. A series of experiments was performed in triplicate with DNA inputs from 25 ng to 500 ng. As shown in Figure 38, the total number of targets on the flow cell was extrapolated by imaging the deposition gradient, and the average count was obtained. The capture efficiency was 6.6% and was consistent across the range of DNA input masses.
[0372] Example 9: Design and Features of a Sequencing Probe
[0373] The sequencing probe hybridizes to the target nucleic acid molecule via its target-binding domain. In this embodiment, the target-binding domain is eight nucleotides long and contains a locked nucleic acid (LNA) hexamer flanked by (N) bases, which can be universal/degenerate or canonical bases (N1-B1-B2-B3-B4-B5-B6-N2, where B1-B6 are LNAs and N1 and N2 are universal/degenerate or canonical bases independent of the hexamer sequence B1-B2-B3-B4-B5-B6). A complete set of 4,096 (4^6) sequencing probes encodes all possible hexamers, enabling sequencing of any target nucleic acid. Each sequencing probe also contains a barcode domain that encodes the hexamer sequence of its target-binding domain. Each barcode domain contains three positions (R1, R2, R3); each position corresponds to a specific dinucleotide within the hexamer and contains a unique sequence capable of binding a specific labeled reporter complex. A schematic diagram of the entire sequencing probe is shown in Figure 1. Each position within the barcode domain codes for one of eight "color combinations" created using four fluorescent dyes: blue (B), green (G), yellow (Y), and red (R). During each sequencing cycle, a reporter complex binds to one of the three positions within the barcode domain, indicating the identity of the corresponding dinucleotide. The three color combinations, one for each barcode position, are recorded over three consecutive sequencing cycles, allowing the entire hexamer of the target-binding domain to be identified. The 4,096 sequencing probes are divided into eight pools, each pool using the 512 (8^3) possible barcodes.
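One way to see how eight colors per barcode position can still address all 4,096 hexamers across eight pools is the following hypothetical encoding: each of the three dinucleotides (16 possibilities) is split into a color index (8 choices) and one pool bit, and the three pool bits select one of eight pools. The specific color assignment and bit layout are assumptions for illustration; the actual mapping used for the probe set is not specified here.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]  # 16 dinucleotides
# Hypothetical choice of 8 of the 10 two-color combinations.
COLOR_CODES = ["BB", "BG", "BR", "BY", "GG", "GR", "GY", "RR"]

def encode_hexamer(hexamer):
    """Return (pool, barcode) for a hexamer under a hypothetical encoding.

    barcode is a tuple of three color codes for positions R1, R2, R3;
    pool is an integer from 0 to 7.
    """
    if len(hexamer) != 6 or set(hexamer) - set(BASES):
        raise ValueError("expected a hexamer over A, C, G, T")
    pool, barcode = 0, []
    for position in range(3):
        d = DINUCLEOTIDES.index(hexamer[2 * position:2 * position + 2])
        barcode.append(COLOR_CODES[d % 8])  # 8 colors per barcode position
        pool |= (d // 8) << position        # leftover bit selects the pool
    return pool, tuple(barcode)
```

Under this scheme every (pool, barcode) pair is unique, so reading three color combinations from a probe whose pool is known recovers its hexamer; for example, `ACGTAC` maps to pool 2 with barcode (BG, BY, BG).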
[0374] Example 10: Conditions for the design, purification, and binding of the reporter complex
[0375] In this embodiment, each reporter complex was a branched structure consisting of 37 DNA oligomers, designed to carry a total of 30 fluorescent dyes (15 dyes for each color in the color combination). The 37 DNA oligomers of the reporter complex can be classified by size. The largest oligomer, called the primary nucleic acid, is 96 nucleotides long and is covalently linked to a complementary nucleic acid 12 or 14 nucleotides long. The complementary nucleic acid binds to one of the positions R1, R2, or R3 of the barcode domain of the sequencing probe. The next largest oligomer, called the secondary nucleic acid, is 89 nucleotides long. There are 6 secondary nucleic acids per reporter complex, 3 for each color in the color combination, and each contains a 14-nucleotide sequence that allows it to hybridize to the primary nucleic acid. The smallest oligomer, called the tertiary nucleic acid, is 15 nucleotides long. Each two-color reporter complex contains 30 tertiary nucleic acids, 15 per color. A schematic diagram of the branched structure of 37 DNA oligomers is shown in Figure 4.
[0376] Each tertiary nucleic acid carries a detectable label in the form of a fluorescent dye. There are four fluorescent dyes: blue (B), green (G), yellow (Y), and red (R). Combining two dyes within a single reporter complex gives 10 possible two-color combinations (BB, BG, BR, BY, GG, GR, GY, RR, YR, YY). To prevent color exchange or cross-hybridization between different fluorescent dyes, the secondary and tertiary nucleic acids corresponding to each dye have their own unique sequences. For example, each tertiary nucleic acid labeled with the Alexa 488 fluorescent dye (blue) contains a sequence complementary only to the blue secondary nucleic acid, and the blue secondary nucleic acid in turn has a distinct sequence complementary only to the primary nucleic acids of color combinations that contain blue.
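The count of 10 two-color combinations is the number of unordered pairs with repetition drawn from four dyes, C(4 + 1, 2) = 10. The short enumeration below lists them with the dyes in alphabetical order, so the combination written "YR" in the text appears as "RY"; `two_color_codes` is a hypothetical name.

```python
from itertools import combinations_with_replacement

DYES = "BGRY"  # blue, green, red, yellow (alphabetical order)

def two_color_codes(dyes=DYES):
    """All unordered two-dye combinations, with a dye allowed twice (e.g. 'BB')."""
    return ["".join(pair) for pair in combinations_with_replacement(dyes, 2)]
```

With only blue and red, the same function yields the three full-dye combinations BB, BR, and RR.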
[0377] The complementary nucleic acids have different sequences for positions R1, R2, and R3 of the barcode domain of the sequencing probe. Therefore, even if positions R1 and R2 of the same barcode domain encode the same dinucleotide, the complementary nucleic acid that identifies the dinucleotide at position R1 will not bind to position R2, and vice versa. The complementary nucleic acids are designed to detach efficiently from the sequencing probe, either by competitive toehold exchange (for 12-nucleotide complementary nucleic acids) or by UV cleavage (for 14-nucleotide complementary nucleic acids).
[0378] Preparation of the reporter complex involves two consecutive hybridization steps: (1) hybridizing the tertiary nucleic acids to the secondary nucleic acids, and (2) hybridizing the tertiary + secondary nucleic acid assemblies to the primary nucleic acid. Four separate tertiary nucleic acid reactions are prepared by combining 100 μM secondary nucleic acid and 600 μM tertiary nucleic acid in 4.2× SSPE buffer at room temperature for 30 minutes. Next, 24 reporter probes are prepared separately using 2 μM primary nucleic acid, 7.2 μM secondary + dye #1 tertiary nucleic acid, and 7.2 μM secondary + dye #2 tertiary nucleic acid in 4.8× SSPE. These reactions are heated at 45°C for 5 minutes and then cooled at room temperature for 30 minutes. The 24 reaction products are then sorted into three pools corresponding to the barcode positions (R1, R2, R3). For example, the eight reporter probes (2 μM each) that bind to position R1 are pooled and diluted 10-fold to a final working concentration of 200 nM for each reporter complex. The reporter complexes can be purified by high-pressure liquid chromatography (HPLC). Figure 39 shows that HPLC purification yields reporter probes free of unincorporated oligomers and improperly formed probes.
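The assembly concentrations can be checked against the branched structure. Assuming each secondary nucleic acid carries 5 tertiary nucleic acids (15 tertiary per color divided by 3 secondary per color, inferred from the counts given for the reporter complex), the sketch computes the molar excess of each added strand over its binding sites and the total number of oligomers per reporter; the helper names are hypothetical.

```python
SECONDARIES_PER_COLOR = 3
TERTIARIES_PER_COLOR = 15
# Inferred: 15 tertiary per color spread over 3 secondary per color.
TERTIARIES_PER_SECONDARY = TERTIARIES_PER_COLOR // SECONDARIES_PER_COLOR  # 5
COLORS_PER_REPORTER = 2

def molar_excess(supplied_uM, partner_uM, sites_per_partner):
    """Supplied strand relative to the amount needed to fill every site."""
    return supplied_uM / (partner_uM * sites_per_partner)

def oligos_per_reporter():
    secondaries = COLORS_PER_REPORTER * SECONDARIES_PER_COLOR
    tertiaries = secondaries * TERTIARIES_PER_SECONDARY
    return 1 + secondaries + tertiaries  # primary + secondaries + tertiaries

def dyes_per_reporter():
    return COLORS_PER_REPORTER * TERTIARIES_PER_COLOR  # one dye per tertiary
```

Under this reading, both hybridization steps supply the partner strand at a 1.2-fold molar excess (600 μM against 100 μM × 5 sites; 7.2 μM against 2 μM × 3 sites), and the structure totals 1 + 6 + 30 = 37 oligomers carrying 30 dyes.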
[0379] After the reporter complexes are prepared, a standard quality assurance test is performed. In three separate flow cells, each of the three reporter probe pools is tested for binding to its corresponding barcode region (R1, R2, or R3). The test uses a modified sequencing probe construct that has only one barcode domain and is immobilized on the surface of the flow cell. When all eight dodecamers representing each color are multiplexed, all eight reporter probes are expected to be identified with high color counts.
[0380] Various buffer additives were tested to improve the efficiency and accuracy of hybridization between the reporter probe and the barcode domains of the sequencing probe. Figure 40 shows experimental results indicating that a buffer containing 5% dextran sulfate (500K) and either 15% formamide or 15% ethylene carbonate achieves hybridization of the reporter probe and the sequencing probe with the highest efficiency and accuracy in a short time. However, Figure 41 shows that ethylene carbonate adversely affects the surface of the sequencing slide, causing significant loss of target nucleic acid over time. Therefore, a buffer containing 5% dextran sulfate (500K) and 15% formamide is preferred for efficient and accurate hybridization of the reporter probe and the sequencing probe.
[0381] Example 11: Design and verification of complementary nucleic acid sequences
[0382] The reporter probe contains a complementary nucleic acid that binds to a specific position (R1, R2, or R3) on the barcode domain of the sequencing probe. Complementary nucleic acids of 12 nucleotides (12-mers) or 14 nucleotides (14-mers) were designed and tested to identify the sequences best suited for hybridization. The following screening criteria were used to select the optimal sequences: high binding efficiency, with the reporter probe binding to the sequencing probe at an efficiency of over 80% across 10 sequencing cycles; fast hybridization, occurring within 15 to 30 seconds; and high specificity, with a cross-hybridization error of less than 5% within the reporter pool.
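The three screening criteria act as a simple pass/fail filter on measured candidate performance. The Python sketch below is illustrative only: the `ScreenResult` record, its field names, and the candidate values are hypothetical, and reading "within 15 to 30 seconds" as an upper bound of 30 s is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    """Measured performance of one candidate complementary sequence (hypothetical fields)."""
    name: str
    binding_efficiency: float    # fraction bound over 10 sequencing cycles
    hybridization_time_s: float  # time to hybridize, in seconds
    cross_hyb_error: float       # cross-hybridization error within the reporter pool


# Thresholds from Example 11; treating "within 15 to 30 seconds" as <= 30 s is an assumption.
MIN_EFFICIENCY = 0.80
MAX_HYB_TIME_S = 30.0
MAX_CROSS_HYB = 0.05


def passes_screen(r: ScreenResult) -> bool:
    """True only if a candidate meets all three screening criteria."""
    return (r.binding_efficiency > MIN_EFFICIENCY
            and r.hybridization_time_s <= MAX_HYB_TIME_S
            and r.cross_hyb_error < MAX_CROSS_HYB)


# Hypothetical measurements, not data from the examples.
candidates = [
    ScreenResult("cand_A", 0.91, 20.0, 0.02),  # passes
    ScreenResult("cand_B", 0.75, 15.0, 0.01),  # efficiency too low
    ScreenResult("cand_C", 0.88, 45.0, 0.03),  # hybridizes too slowly
    ScreenResult("cand_D", 0.93, 25.0, 0.07),  # too much cross-hybridization
]
accepted = [r.name for r in candidates if passes_screen(r)]
print(accepted)  # ['cand_A']
```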
[0383] Table 4 shows the 24 identified dodecamer sequences (SEQ ID NOs: 19-42). Because each barcode domain contains three positions, the 24 dodecamer sequences can be divided into three groups of eight to create an 8×8×8 dodecamer reporter set. [Table 4]
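Dividing a sequence set into one group per barcode position means the number of distinct barcodes is the product of the group sizes. The Python sketch below illustrates this; assigning consecutive SEQ ID NOs to R1, R2, and R3 is a hypothetical choice, since the text does not say which sequences go to which position. The same function applied to the 30 sequences of Table 6 gives the 10×10×10 set.

```python
from itertools import product


def split_into_positions(seq_ids, n_positions=3):
    """Split a list of sequence IDs into equal groups, one per barcode position."""
    size, remainder = divmod(len(seq_ids), n_positions)
    if remainder:
        raise ValueError("sequence count must divide evenly among positions")
    return [seq_ids[i * size:(i + 1) * size] for i in range(n_positions)]


# Hypothetical assignment: consecutive SEQ ID NOs go to R1, R2, R3.
dodecamer_groups = split_into_positions(list(range(19, 43)))   # SEQ ID NOs 19-42
barcodes = list(product(*dodecamer_groups))                    # one ID per position

print([len(g) for g in dodecamer_groups])  # [8, 8, 8]
print(len(barcodes))                        # 512

set_30 = split_into_positions(list(range(67, 97)))             # SEQ ID NOs 67-96
print(len(list(product(*set_30))))          # 1000
```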
[0384] Similarly, 14-mer sequences were designed, but they differed from the 12-mer sequences in three respects. First, the 14-mer sequences contain longer hybridization sequences, because they contain 14 single-stranded nucleotides that bind to specific positions on the barcode domain instead of the 12 single-stranded nucleotides present in the 12-mers. Second, the 14-mer sequences exhibit greater sequence diversity because they were not designed for toehold-mediated removal: 14-mer sequences hybridize more strongly to the sequencing probe, which reduces the efficiency of toehold-mediated removal. Therefore, for the 14-mer sequences, a sequence-independent removal strategy was sought, which relaxed the sequence constraints during screening. Sequences for screening were designed using an algorithm that applied the following rules: a nucleotide composition lacking either "G" or "C" (i.e., low-complexity sequences); a GC content of 40% to 60%; a melting temperature (Tm) of 35°C to 37°C; a hairpin folding energy (dG) greater than 2; and compatibility with the other sequencing probes (a Hamming distance greater than 7). To minimize hybridization of the 14-mer sequences with genomic sequences that may be present in the target nucleic acid, candidate sequences were filtered using External RNA Controls Consortium sequences as a guide. Third, the 14-mer sequences were designed to be removed from the barcode domain of the sequencing probe by strand cleavage, using linker modifications that allow the 14-mer complementary nucleic acid to be cleaved at the site where it is attached to the primary nucleic acid of the reporter complex. Removal of the 14-mer sequence "darkens" the reporter complex signal, enabling the next cycle of reporter hybridization and signal detection. Various cleavable linker modifications were investigated, including linkers cleavable by UV light, linkers cleavable by reducing agents (such as TCEP), and linkers cleavable by enzymes (such as uracil, which is cleaved by the USER® enzyme). All of these cleavable linker modifications promoted efficient darkening of the reporter complex. Darkening was further enhanced by introducing the cleavable linker modifications into the secondary nucleic acids, positioned between the sequence that hybridizes to the primary nucleic acid and the sequence that hybridizes to the tertiary nucleic acid. Figure 10 shows the possible positions of the cleavable linker modifications within the reporter probe.
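The sequence-local design rules (G-or-C-free composition, GC content, and pairwise Hamming distance) can be expressed as simple predicates. The Python sketch below is a hedged illustration, not the algorithm used in the examples: the greedy selection strategy, applying the Hamming rule pairwise within the candidate set, and the example sequences are all assumptions. The Tm and hairpin dG rules require a thermodynamic model and the ERCC filter requires reference data, so neither is implemented here.

```python
# Sequence-local rules from [0384]. The Tm (35-37 C) and hairpin dG rules need a
# thermodynamic model (e.g. nearest-neighbor parameters) and are not implemented here.
GC_MIN, GC_MAX = 0.40, 0.60
MIN_HAMMING = 8  # "Hamming distance greater than 7"


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def lacks_g_or_c(seq: str) -> bool:
    """Low-complexity rule: the sequence uses no G, or no C."""
    return "G" not in seq or "C" not in seq


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def passes_local_rules(seq: str) -> bool:
    return lacks_g_or_c(seq) and GC_MIN <= gc_fraction(seq) <= GC_MAX


def greedy_select(candidates):
    """Keep each candidate that passes the local rules and is >= MIN_HAMMING from all kept ones."""
    selected = []
    for seq in candidates:
        if passes_local_rules(seq) and all(hamming(seq, s) >= MIN_HAMMING for s in selected):
            selected.append(seq)
    return selected


# Hypothetical 14-mer candidates, not sequences from Tables 5 or 6.
pool = [
    "ACCTACCATCACTA",  # no G, 6/14 GC -> kept
    "TGAAGTGTAGTGAG",  # no C, 6/14 GC, differs at all 14 positions -> kept
    "ACGTACGTACGTAC",  # contains both G and C -> rejected
    "ACCTACCATCACTT",  # one mismatch from the first sequence -> rejected
]
chosen = greedy_select(pool)
print(chosen)  # ['ACCTACCATCACTA', 'TGAAGTGTAGTGAG']
```

A greedy pass is order-dependent and will not necessarily find the largest compatible set; it is used here only because it is short and makes each rule visible.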
[0385] Screening of candidate 14-mer sequences identified two groups of acceptable sequences. Table 5 shows the first group, which contains 24 sequences (SEQ ID NOs: 43-66). These 24 sequences can be divided into three groups to create an 8×8×8 14-mer reporter set. [Table 5]
[0386] Table 6 shows the other group, which contains 30 sequences (SEQ ID NOs: 67-96). These 30 sequences can be divided into three groups to create a 10×10×10 14-mer reporter set. [Table 6]
[0387] Following screening, the 8×8×8 dodecamer reporter set, the 8×8×8 14-mer reporter set, and the 10×10×10 14-mer reporter set were experimentally validated. For the 8×8×8 dodecamer binding scheme, validation was performed using the Hyb & Seq prototype, and 10 sequencing cycles were recorded. The three reporter probe pools were used in both the long and short workflows. The barcode domains of all 512 possible sequencing probes were examined. Table 7 shows the experimental steps for the long and short workflows. [Table 7]
[0388] The long-workflow experiments yielded a darkening efficiency of over 97%. Although darkening was estimated to be similarly efficient in the short-workflow experiments, undarkened reporters were expected to persist in each image at a low frequency and to be misidentified as new reporters. Indeed, the barcode with the highest count in the short-workflow experiments was YYYYYY, which most likely reflected undarkened artifacts and background. Figure 42 shows that the 8×8×8 dodecamer reporter set generally performed worse in the short workflow than in the long workflow. Reporter complex 1 (binding to barcode domain position R1) and reporter complex 3 (binding to barcode domain position R3) were less efficient in the short workflow than in the long workflow. This was expected for reporter complex 3, because it contains eight additional toehold oligonucleotides, each at a high concentration of 2.5 μM, which could interfere with reporter hybridization. By contrast, reporter complex 1 would be expected to function similarly in both workflows, because neither workflow used a toehold to remove the first reporter complex. Total errors were also 1.3 to 2 times higher in the short workflow than in the long workflow for all three reporter probes.
[0389] The 8×8×8 14-mer reporter set was validated by examining efficiency, specificity, and hybridization rate for the barcode domains of all 512 possible sequencing probes. The barcode domains of the sequencing probes were immobilized directly onto the glass surface of the Hyb & Seq sequencing cartridge. Figure 43 shows that the 8×8×8 14-mer reporter probes hybridized in just 15 seconds with an average efficiency of 88% and an average error rate of 5.1%. Most of this error was due to misidentification of the reporters rather than inaccurate hybridization; reporter misclassification thus remained the leading source of reporter errors.
[0390] The 10×10×10 14-mer reporter set was validated by examining efficiency, specificity, and hybridization rate for the barcode domains of 30 complementary sequencing probes. Each barcode domain contained only one reporter binding site. These barcode domains were immobilized directly onto the glass surface of a Hyb & Seq sequencing cartridge. Figure 44 shows that the 10×10×10 14-mer reporter set hybridized in just 15 seconds with an average efficiency of 90% and an average error rate of 5.0%. Here again, most errors were due to misidentification of the reporter rather than inaccurate hybridization.
[0391] Example 12: Design and testing of a standard sequencing probe and a three-part sequencing probe
[0392] The target-binding domain and barcode domain of a sequencing probe are separated by a double-stranded "stem." Figure 2 shows the two sequencing probe architectures investigated experimentally. In a standard sequencing probe, the target-binding domain and barcode domain reside on the same oligonucleotide, which binds to a stem oligonucleotide to create a 36-nucleotide double-stranded region. In this architecture, every sequencing probe in the probe pool uses the same stem sequence. In a three-part probe, the target-binding domain and barcode domain are separate DNA oligonucleotides, joined to each other by a 36-nucleotide stem oligonucleotide. To prevent potential exchange of barcode domains, each barcode has its own stem sequence and is hybridized separately before the sequencing probes are pooled.
[0393] Figure 45 shows the results of a series of experiments comparing a three-part sequencing probe with a standard sequencing probe. These experiments confirmed that, in both configurations, the sequencing probes survived an entire sequencing cycle, including detection of the third reporter probe, in approximately 80% of all readouts. The three-part probe yielded approximately 12% fewer counts than the standard sequencing probe. To study the tendency for barcode domain oligo exchange, high concentrations of short competing oligonucleotides containing the same stem sequence were added to the reaction mixture. Approximately 13% of the detected three-part sequencing probes had exchanged their barcode oligo. Oligonucleotide exchange would likely need to be mitigated by incorporating unique stem sequences. Despite its slightly lower performance, the three-part probe offers advantages such as design flexibility, rapid oligo synthesis, and reduced cost.
[0394] Example 13: Effect of locked nucleic acid substitution in the target-binding domain
[0395] The effect of substituting locked nucleic acid (LNA) bases into the target-binding domain of a sequencing probe was investigated as follows. The sequencing probe was hybridized with a reporter probe in solution, and correctly formed sequencing probe-reporter probe complexes were purified. Next, the sequencing probe-reporter probe complexes were hybridized with a synthetic target nucleic acid in solution and loaded onto the surface of a prototype sequencing cartridge. The synthetic target nucleic acid was 50 nucleotides long and biotinylated. The sequencing probes were tested individually or in a pool of nine sequencing probes. For the pool of nine sequencing probes, the probes were designed to bind along the length of the target nucleic acid. For analysis, the entire reaction mixture was deposited onto the surface of a streptavidin-coated coverslide using a circuit board apparatus and then stretched hydrodynamically. Next, images of the reporter probes were acquired and counted using appropriate equipment and software (e.g., NanoString nCounter® equipment and software).
[0396] Each sequencing probe contained a 10-nucleotide target-binding domain (SEQ ID NO: 97). LNA substitutions were made within the target-binding domain so that two, three, or four LNA bases were included at the positions shown in Figure 46. Figure 46 shows that the binding affinity of individual sequencing probes for the target nucleic acid increased with the number of LNA bases. Importantly, Figure 46 also shows that incorporating LNA bases did not decrease the binding specificity of the sequencing probe. A pool of nine sequencing probes was also tested, and the coverage of the bases bound by each probe on the target was determined. Figure 47 shows that introducing a single LNA probe into the pool increased coverage of the affected bases but had little effect on binding of the neighboring probes. These results demonstrated that LNA substitution can improve base sensitivity without decreasing specificity.
[0397] Example 14: Effect of substitution with modified nucleotides and nucleic acid analogs in the target-binding domain
[0398] The effects of substituting various modified nucleotides and nucleic acid analogs (including locked nucleic acids (LNA), bridged nucleic acids (BNA), propyne-modified nucleic acids, zip nucleic acids (ZNA®), isoguanine, and isocytosine) into the target-binding domain of sequencing probes were investigated as follows. A biotinylated, 50-nucleotide target nucleic acid was loaded onto the surface of a streptavidin coverslide in a prototype sequencing cartridge. Next, sequencing probes and reporter probes were introduced sequentially into the sample chamber, and images were acquired using a Hyb & Seq prototype instrument. Images were processed for each sequencing probe, and the counts were compared. Substitutions were made in the 10-nucleotide target-binding domain of the sequencing probe (SEQ ID NO: 99) so that LNA, BNA, propyne, and ZNA® bases were included at the positions shown in Figure 48. Figure 48 shows that probes containing LNA and BNA showed the greatest increase in binding affinity while maintaining specificity, as reflected in the counts detected for matched and mismatched targets. These results indicated that LNA or BNA substitution can improve base sensitivity without decreasing specificity.
[0399] Example 15: Determining the accuracy of the sequencing method according to this disclosure
[0400] Figure 49 shows the results of an experiment to quantify the raw specificity of the sequencing method according to this disclosure. In this experiment, a pool of four different sequencing probes was hybridized to a target nucleic acid containing a fragment of NRAS exon 2 (SEQ ID NO: 1). The sequencing probes (barcodes 1-4) had target-binding domains that were identical except at position b5 of the hexamer, as shown in the upper part of Figure 49. In this example, barcode 4 is the correct sequencing probe. After the sequencing probes were hybridized, reporter probes were sequentially hybridized to each of the three barcode domain positions (R1, R2, R3), and the corresponding fluorescence data were recorded. The middle part of Figure 49 shows how many times each color combination was recorded and the percentage of correct calls at each of the three barcode domain positions. The color combinations at R1, R2, and R3 were correctly identified 96%, 97%, and 94% of the time, respectively. As shown in the lower part of Figure 49, this yields an overall raw specificity of 94%. Possible sources of error that could explain misreading of a barcode domain position include (a) nonspecific binding of the reporter probe to the flow cell surface and (b) inaccurate hybridization of the reporter probe. The estimated reporter hybridization error was approximately 2–4%.
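Each per-position percentage is the count of the expected color combination divided by all calls recorded at that position. The Python sketch below illustrates only that ratio; the color-combination labels and counts are hypothetical and are not the Figure 49 data. How the three positions are combined into the 94% overall raw specificity is not detailed in this passage, so the sketch stops at the per-position value.

```python
def position_accuracy(call_counts: dict, correct_call: str) -> float:
    """Fraction of recorded calls at one barcode position that match the expected color combination."""
    total = sum(call_counts.values())
    if total == 0:
        raise ValueError("no calls recorded")
    return call_counts.get(correct_call, 0) / total


# Hypothetical counts for one position (not the Figure 49 data).
r1_counts = {"combo_expected": 480, "combo_other_1": 12, "combo_other_2": 8}
print(f"{position_accuracy(r1_counts, 'combo_expected'):.0%}")  # 96%
```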
[0401] Figure 50 shows the results of an experiment to determine the accuracy of the sequencing method according to this disclosure when a nucleotide in a target nucleic acid is sequenced using two or more sequencing probes. As shown in the upper panel of Figure 50, the target nucleic acid in this example is a fragment of NRAS exon 2 (SEQ ID NO: 1). The base of interest is the cytosine (C) highlighted in the target nucleic acid. This base of interest is hybridized by two different sets of sequencing probes, each with a different hybridization footprint on the target nucleic acid. In this example, sequencing probes 1-4 (barcodes 1-4) bind three nucleotides to the left of the base of interest, while sequencing probes 5-8 (barcodes 5-8) bind five nucleotides to the left of the base of interest. The middle panel of Figure 50 shows the number of times each color combination was recorded at each position in the barcode domain of the sequencing probes. After the images are quantified, the base-calling technique shown in Figure 26 achieves an average accuracy of approximately 98.98%.
Claims
1. A method for detecting a target nucleic acid in a sample, the method comprising: (1) hybridizing at least one probe to a target nucleic acid in the sample, wherein the at least one probe comprises a target-binding domain and a barcode domain, the target-binding domain comprises at least 12 nucleotides and is capable of binding to the target nucleic acid, and the barcode domain comprises at least two attachment positions, each attachment position comprising at least one attachment region and at least one nucleic acid sequence capable of being bound by a reporter probe; (2) hybridizing a first reporter probe comprising a first detectable label and a second detectable label to a first attachment position of the at least two attachment positions of the barcode domain; (3) identifying the first detectable label and the second detectable label of the first reporter probe hybridized to the first attachment position; (4) removing the portion of the first reporter probe comprising the first detectable label and the second detectable label from the first attachment position; (5) hybridizing a second reporter probe comprising a third detectable label and a fourth detectable label to a second attachment position of the at least two attachment positions of the barcode domain; (6) identifying the third detectable label and the fourth detectable label of the second reporter probe hybridized to the second attachment position; and (7) detecting the target nucleic acid in the sample based on at least the attributes of the first detectable label, the second detectable label, the third detectable label, and the fourth detectable label; wherein each of the reporter probes comprises a primary nucleic acid molecule hybridized to at least six secondary nucleic acid molecules, and each primary nucleic acid molecule of each reporter probe comprises at least one cleavable linker between a first domain and a second domain; each of the at least six secondary nucleic acid molecules comprises a first domain and a second domain, the first domain hybridizes to the second domain of the primary nucleic acid molecule, the second domain hybridizes to at least five tertiary nucleic acid molecules, and each secondary nucleic acid molecule comprises at least one cleavable linker between the first domain and the second domain; and each of the tertiary nucleic acid molecules comprises at least one detectable label.
2. The method according to claim 1, wherein the at least one cleavable linker is a photocleavable linker.
3. The method according to claim 2, wherein the cleavable linker is selected from the following: 【Chemical Formula 1】
4. The method according to any one of claims 1 to 3, wherein the at least one cleavable linker is a photocleavable linker.
5. The method according to claim 4, wherein the cleavable linker is selected from the following: 【Chemical Formula 2】
6. The method according to claim 1, wherein the primary nucleic acid molecule of the reporter probe comprises a first domain and a second domain, the at least six secondary nucleic acid molecules of the reporter probe are hybridized to the second domain, and each reporter probe comprises at least one cleavable linker located between the first domain and the second domain of the primary nucleic acid molecule.
7. The method according to claim 6, wherein each secondary nucleic acid molecule of the reporter probe comprises a first domain and a second domain, the first domain is hybridized to the primary nucleic acid molecule of the reporter probe, the second domain is hybridized to the at least five tertiary nucleic acid molecules of the reporter probe, and each reporter probe comprises at least one cleavable linker located between the first domain and the second domain of the secondary nucleic acid molecule.
8. The method according to claim 6 or 7, wherein the at least one cleavable linker is a photocleavable linker.
9. The method according to claim 8, wherein the cleavable linker is selected from the following: 【Chemical Formula 3】
10. The method according to claim 6 or 7, wherein each reporter probe includes at least 30 detectable labels.
11. The method according to claim 10, wherein all of the at least 30 detectable labels have the same emission spectrum.
12. The method according to claim 10, wherein at least one of the at least 30 detectable labels has a first emission spectrum, at least one of the at least 30 detectable labels has a second emission spectrum, and the first emission spectrum and the second emission spectrum are spectrally resolvable.
13. The method according to claim 1, wherein the removal of the first detectable label and the second detectable label in step (4) includes exposing the first reporter probe to light.
14. The method according to claim 13, wherein the light is provided by a light source selected from the group consisting of an arc lamp, a laser, a focused UV light source, and a light-emitting diode.