Paired index position-encoded sequencing

WO2026107231A3 · PCT designated stage · Publication Date: 2026-06-25 · THE BROAD INST INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
THE BROAD INST INC
Filing Date
2025-11-13
Publication Date
2026-06-25


Abstract

Paired index position-encoded sequencing is described. A plurality of transposons may be generated, each of the plurality of transposons comprising two oligonucleotides coupled together in a 5'-to-5' orientation to encode a unique pair of index sequences that distinguishes one transposon of the plurality of transposons from other transposons of the plurality of transposons. The plurality of transposons may be integrated into genomic DNA during a sequencing workflow. The unique pair of index sequences of the plurality of transposons may be matched between sequencing reads obtained during the sequencing workflow. A genomic sequence of the biological sample may be reconstructed based on the matching.

Description

Paired Index Position-Encoded Sequencing

RELATED APPLICATION

[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 63/720,382, filed November 14, 2024, entitled “Paired Index Position-Encoded Sequencing,” the entire disclosure of which is hereby incorporated by reference herein in its entirety.

BACKGROUND

[0002] Sequencing is a methodological approach that resolves a sequential order, or sequence, of nucleotides in nucleic acid molecules (e.g., DNA or RNA molecules) within a sample. Sequencing may be used to determine genomic composition, genetic variations, gene expression patterns, and the like. Current whole genome sequencing (WGS) methods typically rely on high sequencing depth for genome reconstruction by joining shorter reads that uniquely overlap. Traditional “linked-read” or short-read overlap approaches can cause sporadic and incomplete coverage over genomic fragments.

[0003] One challenge in accurate genome assembly arises from low complexity or repetitive regions of the genome. These low complexity regions represent a significant obstacle in the accurate dissociation, amplification, and assembly of contiguous sequences. The generation of contiguous sequences from overlap assembly is computationally expensive, time-consuming, and often fails in particular genomic contexts. Moreover, existing methods often struggle with de novo assembly of heterogeneous genomes, such as those found in various types of cancer or certain organisms.

SUMMARY

[0004] Paired index position-encoded sequencing is described. A plurality of transposons may be generated, each of the plurality of transposons comprising two oligonucleotides coupled together in a 5'-to-5' orientation to encode a unique pair of index sequences that distinguishes one transposon of the plurality of transposons from other transposons of the plurality of transposons. The plurality of transposons may be integrated into genomic DNA during a sequencing workflow. The unique pair of index sequences of the plurality of transposons may be matched between sequencing reads obtained during the sequencing workflow. A genomic sequence of the biological sample may be reconstructed based on the matching.

FIG. 1 Patents 1 Docket No.: BI-11119-PCT

[0005] This Summary introduces a selection of concepts in a simplified form that are further described below in the Detailed Description. As such, this Summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] The detailed description is described with reference to the accompanying figures.

[0007] FIG. 1 is an illustration of an environment in an example implementation that is operable to employ paired index position-encoded sequencing.

[0008] FIGS. 2A and 2B depict an example workflow in an implementation of paired index position-encoded sequencing for single-cell whole genome reconstruction.

[0009] FIG. 3 depicts a first illustrative example process for generating transposons for paired index position-encoded sequencing.

[0010] FIG. 4 depicts a second illustrative example process for generating transposons for paired index position-encoded sequencing.

[0011] FIG. 5 depicts an illustrative example process for generating transposomes for paired index position-encoded sequencing.

[0012] FIGS. 6A and 6B depict an example implementation of the paired index position-encoded sequencing.

[0013] FIG. 7 depicts an example procedure in which transposomes are generated for paired index position-encoded sequencing.

[0014] FIG. 8 depicts an example procedure in which paired index position-encoded sequencing is performed.

[0015] FIG. 9 illustrates an example system including various components of an example device that can be implemented as any type of computing device as described and/or utilized with reference to FIGS. 1-8 to implement the techniques described herein.

[0016] FIG. 10 depicts an example illustrating paired index position-encoded sequencing results.

[0017] FIG. 11 depicts an example illustrating read distributions obtained using paired index position-encoded sequencing as described herein.

DETAILED DESCRIPTION

Overview

[0018] As mentioned above, current whole genome sequencing (WGS) methods typically include joining together shorter reads that overlap in sequence into longer contiguous sequences known as “contigs.” By way of example, for accurate genome reconstruction using these techniques, many shorter reads (e.g., arising from sequencing fragmented DNA) typically overlap at different sections, allowing for the generation of longer sequence contigs with the goal of accurately assembling the native context of the nucleic acid sequence, often a genome. The process of overlap assembly, where short reads are joined based on overlapping sequences, is computationally expensive because of the high number of short reads (e.g., millions or billions) generated for WGS. For example, to ensure that there are enough overlapping reads to cover the entire genome accurately, high sequencing depth is used, which is resource intensive and time-consuming. Sequencing depth refers to how many times a given region of the genome is read during sequencing, on average.
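By way of a non-limiting illustration, the average sequencing depth defined above may be estimated as the total number of sequenced bases divided by the genome size. The following is a minimal sketch; the function name and the example numbers are illustrative assumptions and are not values taken from this disclosure.

```python
def mean_sequencing_depth(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average number of times each base is read: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


# Hypothetical example: 600 million reads of 150 bases over a 3-gigabase genome.
depth = mean_sequencing_depth(600_000_000, 150, 3_000_000_000)
print(depth)
```

Under these assumed numbers, each base is read 30 times on average, which illustrates why overlap-based assembly of a large genome requires a very large number of short reads.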

[0019] Accordingly, traditional linked-read (e.g., TELL-seq) or short-read overlap approaches may provide sporadic and incomplete coverage over the genome, making it difficult to get a continuous and accurate genome reconstruction. Genome reconstruction using these methods may be particularly challenging for de novo assembly where the genome is constructed without use of a reference genome.

[0020] Moreover, repetitive regions of the genome where the DNA sequence repeats many times provide an additional challenge for genome assembly using the existing techniques. By way of example, short reads may not uniquely align to these regions due to the sequence repetition. As a result, it may be difficult to assign corresponding short reads to particular regions of the genome, leading to errors or incomplete assembly of the genome.

[0021] To overcome these issues, paired index position-encoded sequencing, or PIPE-seq, is disclosed herein. The techniques described herein enable highly efficient reconstruction of single DNA molecules and facilitate de novo genome assembly. The paired index position-encoded sequencing introduces paired indexes encoding relative positional information of genome fragments. These indexes are sequenced along with genomic DNA (gDNA) and informatically link natively adjacent DNA fragments in sequencing data. Accordingly, “paired index position-encoded sequencing” refers to a sequencing workflow in which nucleic acid fragments (e.g., derived from the gDNA) are labeled with paired index sequences that enable computational linking of fragments that were natively adjacent in the genome.

[0022] In one or more implementations, the PIPE-seq technique described herein uses paired indexes that are physically and then informatically linked together so that proximity information between two natively adjacent genomic fragments is retained after their dissociation from each other. By way of example, the PIPE-seq technique uses transposons comprising a single-stranded linker generated from oligonucleotides coupled in a 5'-end to 5'-end orientation. The transposons further comprise polymerase promoter sites, amplification sequences, and paired indexes, wherein a uniquely paired index sequence is included near both 3'-terminal ends of a given transposon. In at least one implementation, each transposon molecule includes a different sequence (or sequences) for the paired indexes such that each transposon molecule enables unique labeling of a genetic location. As such, the transposons described herein may be referred to as paired index position-encoding transposons, or PIPE transposons. The paired index position-encoding transposons further comprise transposon recognition sequences (e.g., mosaic ends) at each terminus of the single-stranded linker, which bind a transposase enzyme to form transposomes (e.g., PIPE transposomes).

[0023] In at least one implementation, the PIPE transposomes are combined with permeabilized nuclei, and the PIPE transposons are integrated into the gDNA within the nuclei at a plurality of different locations. It shall be understood that transposon integration can include any technique of integrating PIPE transposons into gDNA. The nuclei may then be individually sorted so that single-cell sequencing may be performed. By performing the transposon integration in intact nuclei, the gDNA may be labeled with the paired indexes while in its native form. By way of example, techniques that extract bulk DNA from a sample may result in DNA shearing. These techniques also do not enable single-cell sequencing, as they combine DNA from multiple different cells. Moreover, the sorting of the nuclei may enable specific nuclei to be selected for sequencing in a controlled manner.

[0024] In at least one implementation, in vitro transcription (IVT) is used to linearly amplify fragments of the indexed gDNA. By way of example, the IVT may occur in opposite directions from the 5'-to-5' coupling, e.g., from polymerase promoter sites adjacent to the 5'-to-5' coupling. This results in index-labeled RNA fragments having a complement of a given gDNA sequence and an index sequence near at least one terminus. The index sequence informationally links a given RNA fragment to another RNA fragment that has the matching (e.g., uniquely paired) index sequence, and the position of the index sequence (e.g., upstream or downstream with respect to the gDNA sequence complement) indicates their relative arrangement in the genome. As used herein, “informationally links” refers to a non-physical association between two or more nucleic acid fragments, molecules, or sequencing reads by way of identifiers included therein (e.g., paired index sequences, barcodes, and/or other tags) such that a computational procedure can determine a defined relationship between the associated two or more nucleic acid fragments, molecules, or sequencing reads (for example, native adjacency, order, and/or common origin) without the items being physically joined.

[0025] In one or more implementations, cDNA is generated from the RNA fragments via reverse transcription, resulting in index-labeled cDNA. The index-labeled cDNA may be sequenced directly or further amplified prior to sequencing. In at least one implementation, long read sequencing is used in order to generate reads spanning an entirety of a cDNA fragment. Once sequencing data are generated, an analysis algorithm (e.g., an index matching algorithm) may identify reads having matching index sequences in order to determine the relative arrangement of the reads. This may enable a contiguous DNA sequence to be constructed with no reliance or less reliance on read overlaps.
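By way of a non-limiting illustration, the index matching described above may be sketched as grouping reads by the index sequences they carry. The record type `IndexedRead`, the function `group_reads_by_index`, and the example sequences are hypothetical names and values introduced for illustration only. The sketch assumes that the index sequences have already been located in each read and that a given transposon carries the same index sequence on both sides of the 5'-to-5' coupling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IndexedRead:
    """A sequencing read after its index sequences have been located (hypothetical record)."""
    name: str
    upstream_index: Optional[str]    # index found before the genomic portion, if any
    genomic_sequence: str
    downstream_index: Optional[str]  # index found after the genomic portion, if any


def group_reads_by_index(reads: List[IndexedRead]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each index sequence to the reads carrying it and the side it was found on."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for read in reads:
        if read.upstream_index is not None:
            groups[read.upstream_index].append((read.name, "upstream"))
        if read.downstream_index is not None:
            groups[read.downstream_index].append((read.name, "downstream"))
    return dict(groups)


# Hypothetical reads; the index and genomic sequences are placeholders.
reads = [
    IndexedRead("r1", None, "AAAA", "AAAC"),
    IndexedRead("r2", "AAAC", "TTTT", "GGTT"),
    IndexedRead("r3", "GGTT", "CCCC", None),
]
matches = group_reads_by_index(reads)
# Index "AAAC" is carried by r1 on its downstream side and by r2 on its upstream side.
print(matches["AAAC"])
```

In this sketch, a group that contains one "downstream" entry and one "upstream" entry identifies two reads whose genomic portions may be treated as natively adjacent.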

[0026] In this way, the paired index position-encoding transposons encode positional information of natively adjacent DNA fragments of the gDNA, enabling efficient and accurate genome reconstruction with reduced sequencing depth. This reduced sequencing depth may enable cost savings and/or reduce resource consumption, enabling a greater number of samples to be processed for a given amount of sequencing data. As a result of the paired index position-encoding sequencing techniques described herein, whole genome sequencing, de novo genome assembly, single-molecule sequencing, genome reconstruction of cancer, and genome reconstruction of organisms with complex genomes may be efficiently and accurately performed. For instance, a single genome may be reconstructed from a single copy of DNA.

[0027] In some aspects, the techniques described herein relate to a method for paired index position-encoded sequencing, including: generating a plurality of transposons, each of the plurality of transposons including two oligonucleotides coupled together in a 5'-to-5' orientation to encode a unique pair of index sequences that distinguishes one transposon of the plurality of transposons from other transposons of the plurality of transposons; integrating the plurality of transposons into genomic DNA during a sequencing workflow; matching the unique pair of index sequences between sequencing reads obtained during the sequencing workflow; and reconstructing a genomic sequence of the genomic DNA based on the matching.

[0028] In some aspects, the techniques described herein relate to a method, wherein integrating the plurality of transposons into the genomic DNA informationally links natively adjacent fragments of the genomic DNA via the unique pair of index sequences.

[0029] In some aspects, the techniques described herein relate to a method, wherein integrating the plurality of transposons into the genomic DNA during the sequencing workflow includes: permeabilizing a biological sample, wherein the biological sample includes a cell, a nucleus, or a tissue; combining the plurality of transposons with a transposase enzyme to form transposomes; and adding the transposomes to the permeabilized biological sample.

[0030] In some aspects, the techniques described herein relate to a method, wherein the two oligonucleotides of said each of the plurality of transposons each include: an RNA polymerase promoter site at a 5'-end; an amplification sequence adjacent to the RNA polymerase promoter site; an index sequence of the unique pair of index sequences; and a transposon recognition sequence at a 3'-end.
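By way of a non-limiting illustration, the 5'-to-3' element order recited above may be modeled as a simple record, and an index sequence may be recovered from an oligonucleotide sequence by locating its two flanking elements. The names `ArmLayout` and `extract_index` are hypothetical, and every sequence in the example is an arbitrary placeholder rather than a promoter, amplification, or recognition sequence from this disclosure. The sketch assumes exact matching of the flanking elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArmLayout:
    """One oligonucleotide of a transposon, listed from its 5'-end to its 3'-end."""
    promoter: str       # RNA polymerase promoter site (placeholder sequence)
    amplification: str  # amplification sequence (placeholder sequence)
    index: str          # index sequence of the unique pair
    recognition: str    # transposon recognition sequence (placeholder sequence)

    def sequence(self) -> str:
        return self.promoter + self.amplification + self.index + self.recognition


def extract_index(arm_sequence: str, amplification: str, recognition: str) -> Optional[str]:
    """Return the bases between the amplification and recognition sequences, if both are found."""
    start = arm_sequence.find(amplification)
    if start < 0:
        return None
    start += len(amplification)
    end = arm_sequence.find(recognition, start)
    if end < 0:
        return None
    return arm_sequence[start:end]


# Arbitrary placeholder sequences, chosen only to make the example runnable.
arm = ArmLayout(promoter="TTTAAA", amplification="GCGCGC", index="ACGTAC", recognition="CTCTCT")
recovered = extract_index(arm.sequence(), "GCGCGC", "CTCTCT")
print(recovered)
```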

[0031] In some aspects, the techniques described herein relate to a method, wherein integrating the plurality of transposons into the genomic DNA during the sequencing workflow generates paired index position-encoded genomic DNA, and the sequencing workflow further includes: performing DNA extension to generate a double-stranded RNA polymerase promoter site from the RNA polymerase promoter site; performing in vitro transcription on the paired index position-encoded genomic DNA using an RNA polymerase that binds to the double-stranded RNA polymerase promoter site, the in vitro transcription generating paired index position-encoded RNA; performing reverse transcription on the paired index position-encoded RNA using a reverse transcriptase and primers targeting the amplification sequence or the RNA polymerase promoter site, the reverse transcription generating paired index position-encoded complementary DNA (cDNA); and sequencing the paired index position-encoded cDNA to generate the sequencing reads.

[0032] In some aspects, the techniques described herein relate to a method, further including: amplifying the paired index position-encoded cDNA prior to sequencing the paired index position-encoded cDNA.

[0033] In some aspects, the techniques described herein relate to a method, wherein generating the plurality of transposons includes: synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other.

[0034] In some aspects, the techniques described herein relate to a method, wherein the first functional group is a strained alkyne, the second functional group is an azide, and the second oligonucleotide is identical in sequence to the first oligonucleotide.

[0035] In some aspects, the techniques described herein relate to a method, wherein generating the plurality of transposons further includes: distributing aliquots of the 5'-to-5' coupled oligonucleotide to a plurality of individual reaction vessels; adding different primer molecules to respective reaction vessels of the plurality of individual reaction vessels, the different primer molecules configured to anneal to both 3'-ends of the 5'-to-5' coupled oligonucleotide and provide a template for the unique pair of index sequences for one of the plurality of transposons; and extending the 5'-to-5' coupled oligonucleotide in a 3'-direction from an annealed primer molecule.

[0036] In some aspects, the techniques described herein relate to a method, wherein generating the plurality of transposons further includes: distributing aliquots of the 5'-to-5' coupled oligonucleotide to a plurality of individual reaction vessels; adding different index oligonucleotide molecules to respective reaction vessels of the plurality of individual reaction vessels, wherein each of the different index oligonucleotide molecules includes a transposon recognition sequence and an index sequence for the unique pair of index sequences for one of the plurality of transposons; adding bridge oligonucleotide molecules to respective reaction vessels of the plurality of individual reaction vessels, wherein the bridge oligonucleotide molecules are configured to anneal to a 3'-end of the 5'-to-5' coupled oligonucleotide and a 5'-end of the different index oligonucleotide molecules at each 3'-end of the 5'-to-5' coupled oligonucleotide; and ligating the 5'-to-5' coupled oligonucleotide to the different index oligonucleotide molecules at the 3'-ends of the 5'-to-5' coupled oligonucleotide using a ligase enzyme.

[0037] In some aspects, the techniques described herein relate to a method, further including: mixing the plurality of transposons with a transposase enzyme to form transposomes prior to integrating the plurality of transposons into the genomic DNA.

[0038] In some aspects, the techniques described herein relate to a method, wherein reconstructing the genomic sequence of the genomic DNA based on the matching includes: computationally determining that a DNA sequence in a sequencing read is natively immediately upstream of another DNA sequence from a separate sequencing read based on identifying the unique pair of index sequences in a downstream-positioned index sequence of the sequencing read and an upstream-positioned index sequence of the separate sequencing read.
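By way of a non-limiting illustration, the adjacency determination described above may be sketched as chaining fragments whose downstream-positioned index matches the upstream-positioned index of another fragment. The names `Fragment` and `order_fragments` and all sequences are hypothetical. The sketch assumes that each read has been oriented to a common strand, that each index pair uses the same sequence on both sides of the coupling, and that indexes match exactly.

```python
from typing import List, NamedTuple, Optional


class Fragment(NamedTuple):
    upstream_index: Optional[str]    # index preceding the DNA sequence, if observed
    dna: str
    downstream_index: Optional[str]  # index following the DNA sequence, if observed


def order_fragments(fragments: List[Fragment]) -> List[str]:
    """Join fragments into contiguous sequences by following matching indexes."""
    by_upstream = {f.upstream_index: f for f in fragments if f.upstream_index is not None}
    downstream_seen = {f.downstream_index for f in fragments if f.downstream_index is not None}
    # A chain starts at a fragment whose upstream index is absent or unmatched.
    starts = [
        f for f in fragments
        if f.upstream_index is None or f.upstream_index not in downstream_seen
    ]
    contigs = []
    for start in starts:
        chain = [start]
        current = start
        while current.downstream_index in by_upstream:
            following = by_upstream[current.downstream_index]
            if following in chain:  # guard against accidental index reuse forming a loop
                break
            chain.append(following)
            current = following
        contigs.append("".join(f.dna for f in chain))
    return contigs


# Hypothetical fragments supplied out of order; sequences are placeholders.
fragments = [
    Fragment("GGTT", "CCCC", None),
    Fragment(None, "AAAA", "AAAC"),
    Fragment("AAAC", "TTTT", "GGTT"),
]
contigs = order_fragments(fragments)
print(contigs)
```

In this sketch, the ordering is recovered from the index positions alone, so no overlap between the DNA sequences of the fragments is needed.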

[0039] In some aspects, the techniques described herein relate to a method, wherein the genomic sequence is a whole genome of a single cell.

[0040] In some aspects, the techniques described herein relate to a system for paired index position-encoded sequencing, including: a sequencing data processor; and a computer-readable storage medium having instructions stored thereon that, when executed by the sequencing data processor, cause the sequencing data processor to perform operations including: receiving sequencing data including sequencing reads of paired index position-encoded DNA generated from genomic DNA of a biological sample, the paired index position-encoded DNA generated using transposons that informationally link natively adjacent fragments of the genomic DNA using paired indexes; and reconstructing a genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads.

[0041] In some aspects, the techniques described herein relate to a system, wherein, to reconstruct the genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads, the operations further include: matching a first sequencing read of the sequencing reads to a second sequencing read of the sequencing reads by identifying an index sequence of the paired indexes from a particular transposon in both of the first sequencing read and the second sequencing read; and determining a relative arrangement of a first DNA sequence of the first sequencing read and a second DNA sequence of the second sequencing read in the genomic sequence based on a position of the index sequence in each of the first sequencing read and the second sequencing read.

[0042] In some aspects, the techniques described herein relate to a system, wherein the paired index position-encoded DNA includes complementary DNA (cDNA) derived from the genomic DNA after labeling the genomic DNA with the paired indexes using a plurality of transposomes, each of the plurality of transposomes including one of the transposons in complex with a transposase, and wherein each of the transposons includes at least one different index sequence for the paired indexes.

[0043] In some aspects, the techniques described herein relate to a system, wherein the paired indexes include unique index sequence pairs that distinguish one transposon used to generate the paired index position-encoded DNA from other transposons used to generate the paired index position-encoded DNA.

[0044] In some aspects, the techniques described herein relate to a system, wherein the transposons that informationally link the natively adjacent fragments of the genomic DNA using the paired indexes each include: two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between each 5'-end of the two oligonucleotides, each of the two oligonucleotides of a given transposon including, from 5' to 3': an RNA promoter sequence; an amplification sequence; a unique index sequence for the paired indexes; and a transposon recognition sequence; and hybridizing oligos configured to anneal to the transposon recognition sequence at each 3'-end.

[0045] In some aspects, the techniques described herein relate to a method for paired index position-encoded sequencing, including: integrating a plurality of paired index position-encoding transposons into genomic DNA within an intact nucleus isolated from a biological sample to form paired index position-encoding transposon integrated genomic DNA, each of the plurality of paired index position-encoding transposons including two identical oligonucleotides coupled together in a 5'-to-5' orientation, each of the two identical oligonucleotides of a given paired index position-encoding transposon including, from 5' to 3': an RNA promoter sequence; an amplification sequence; an index sequence that is unique to the given paired index position-encoding transposon; and a transposon recognition sequence; generating paired index position-encoded RNA from the paired index position-encoding transposon integrated genomic DNA via in vitro transcription using an RNA polymerase that binds the RNA promoter sequence; generating paired index position-encoded cDNA from the paired index position-encoded RNA via reverse transcription using primers targeting the amplification sequence; sequencing the paired index position-encoded cDNA to generate sequencing reads; and reconstructing a genomic sequence of the biological sample by matching index sequences between the sequencing reads.

[0046] In some aspects, the techniques described herein relate to a method, further including generating the plurality of paired index position-encoding transposons by: synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other; and performing a plurality of separate reactions with aliquots of the 5'-to-5' coupled oligonucleotide, each of the plurality of separate reactions using a molecule encoding a different index sequence, wherein each of the plurality of separate reactions produces one of the plurality of paired index position-encoding transposons.

[0047] In the following discussion, an example environment is first described that may employ the techniques described herein. Example implementation details and procedures are then described which may be performed in the example environment as well as other environments. Consequently, performance of the example procedures is not limited to the example environment, and the example environment is not limited to performance of the example procedures.

Example Environment

[0048] FIG. 1 is an illustration of an environment 100 in an example implementation that is operable to employ paired index position-encoded sequencing as described herein. The illustrated environment 100 includes a service provider system 102, a client device 104, a nucleic acid amplifier 106, a nucleic acid sequencer 108, and a sequencing data processor 110 that are communicatively coupled, one to another, via a network 112. The network 112 may enable wired and/or wireless electronic communication, for example. Although the sequencing data processor 110 is illustrated as separate from the service provider system 102, the client device 104, and the nucleic acid sequencer 108, this functionality may be incorporated as part of the service provider system 102, the client device 104, and/or the nucleic acid sequencer 108, further divided among other entities, and so forth. By way of example, an entirety of or portions of the functionality of the sequencing data processor 110 may be incorporated as part of the nucleic acid sequencer 108 and/or the client device 104. Additionally, or alternatively, an entirety of or portions of the client device 104 may be incorporated as part of the nucleic acid sequencer 108 and/or the sequencing data processor 110. Moreover, in at least one variation, the nucleic acid amplifier 106 and/or the nucleic acid sequencer 108 is not communicatively coupled to the network 112.

[0049] Computing devices that are usable to implement the service provider system 102, the client device 104, and the sequencing data processor 110 may be configured in a variety of ways. A computing device, for instance, may be configured as a desktop computer, a laptop computer, a mobile device (e.g., assuming a handheld configuration such as a tablet or mobile phone), and so forth. Thus, the computing device may range from full resource devices with substantial memory and processor resources (e.g., personal computers, game consoles) to a low-resource device with limited memory and/or processing resources (e.g., mobile devices). Additionally, a computing device may be representative of a plurality of different devices, such as multiple servers utilized to perform operations “over the cloud,” as further described in relation to FIG. 9.

[0050] The service provider system 102 is illustrated as including an application manager module 114 that is representative of functionality to provide access to the sequencing data processor 110 to a user of the client device 104 via the network 112. The application manager module 114, for instance, may expose content or functionality of the sequencing data processor 110 that is accessible via the network 112 by an application 116 of the client device 104. The application 116 may be configured as a network-enabled application, a browser, a native application, and so on, that exchanges data with the service provider system 102 via the network 112. The data can be employed by the application 116 to enable the user of the client device 104 to communicate with the service provider system 102, such as to receive application updates and features when the service provider system 102 provides functionality to manage the application 116.

[0051] In the context of the described techniques, the application 116 includes functionality to view and/or analyze data generated by a sequencing event. In the illustrated example, the application 116 includes an interface 118 that is implemented at least partially in hardware of the client device 104 for facilitating communication between the client device 104 and the sequencing data processor 110. By way of example, the interface 118 includes functionality to receive inputs to the sequencing data processor 110 from the client device 104 (e.g., from a user of the client device 104) and output information, data, and so forth from the sequencing data processor 110 to the client device 104, as will be further elaborated herein.

[0052] The sequencing event includes determining an order of nucleotides (e.g., adenine, thymine or uracil, cytosine, and guanine) in a sample of nucleic acids, such as nucleic acids derived from a biological sample 120. The order of nucleotides is referred to herein as a “sequence.” The nucleotides are also referred to as “bases.” Although the sequencing event will be described herein with respect to deoxyribonucleic acid (DNA) sequencing, and more particularly to single-cell whole genome sequencing, it is to be appreciated that the techniques described herein may be adapted for other types of sequencing.

[0053] The nucleic acid sequencer 108 is configured to produce sequencing data 122 that is analyzed by the sequencing data processor 110 to determine the order of nucleotides in the nucleic acid sample. In at least one implementation, the sequencing data 122 comprise a text-based file format, such as FASTQ files that store both nucleotide sequence information and quality scores for the bases in a sequencing read. In variations, the sequencing data 122 comprise another type of file format. The nucleic acid sequencer 108 may use one of a plurality of sequencing techniques to produce the sequencing data 122, e.g., “sequencing reads” or “reads.” By way of example, the nucleic acid sequencer 108 may use a short read sequencing technique that produces sequence fragments typically ranging from approximately 10 bases to approximately 1000 bases and more typically from approximately 50 bases to approximately 500 bases. Sequence fragments produced via short read sequencing techniques are also referred to as “short reads.” Alternatively, the nucleic acid sequencer 108 may use a long read sequencing technique that produces sequence fragments that typically range from approximately 2000 bases to 1,000,000 bases and more typically from 5000 bases to 500,000 bases in length. Sequence fragments produced via long read sequencing techniques are also referred to as “long reads.”
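By way of a non-limiting illustration, a FASTQ record may be read as four lines (a header, the bases, a separator, and per-base quality characters), and a read may be labeled by length using the approximate ranges given above. The following is a minimal sketch; the function names, the length cutoffs used as defaults, and the example records are illustrative assumptions. The sketch further assumes unwrapped four-line records and the common Phred+33 quality encoding.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from unwrapped four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality: str) -> List[int]:
    """Decode quality characters, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


def classify_read(sequence: str, short_max: int = 1000, long_min: int = 2000) -> str:
    """Label a read by length; the default cutoffs follow the approximate ranges above."""
    length = len(sequence)
    if length <= short_max:
        return "short"
    if length >= long_min:
        return "long"
    return "intermediate"


# Hypothetical records: one 4-base read and one 2500-base read.
text = "@read1\nACGT\n+\nIIII\n" + "@read2\n" + "A" * 2500 + "\n+\n" + "I" * 2500 + "\n"
records = list(parse_fastq(io.StringIO(text)))
labels = [classify_read(sequence) for _, sequence, _ in records]
print(labels)
```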

[0054] In at least one implementation, the nucleic acid sequencer 108 is a DNA sequencer. In at least one variation, the nucleic acid sequencer 108 is an RNA sequencer capable of directly sequencing RNA. The RNA sequencer, for instance, may use single-molecule sequencing techniques (e.g., nanopore sequencing).

[0055] Regardless of the sequencing technique, in the illustrated example environment 100, the nucleic acid sequencer 108 produces the sequencing data 122 for the biological sample 120 through a process referred to herein as paired index position-encoded sequencing, or PIPE-seq, as further explained herein. By way of example, genomic DNA (e.g., gDNA) 124 of the biological sample 120 is labeled using PIPE transposons 126 having paired indexes 128 using transposon integration 130. Briefly, the PIPE transposons 126 each include a single-stranded linker generated from oligonucleotides coupled in a 5'-end to 5'-end orientation (e.g., a 5'-to-5' coupled oligonucleotide). The 5'-to-5' coupled oligonucleotide includes two oligonucleotides covalently linked at their 5' ends through a chemical linkage (e.g., a triazole or other covalent linkage), with each of the two oligonucleotides extending away from the chemical linkage in the 3' direction. The single-stranded linker may further comprise polymerase promoter sites, amplification sequences, and the paired indexes 128. In at least one implementation, each PIPE transposon 126 molecule includes a different sequence or sequences for the paired indexes 128 such that each PIPE transposon 126 enables unique labeling of a genetic location. That is, the paired indexes 128 may comprise one or more index sequences that are uniquely paired for a given PIPE transposon 126. For instance, a single PIPE transposon 126 may include a first index sequence on a first side of the 5'-to-5' coupling and a second, different index sequence on a second side of the 5'-to-5' coupling, where the first index sequence is uniquely paired with the second index sequence. Alternatively, a single PIPE transposon 126 may include the same index sequence on both sides of the 5'-to-5' coupling, where the same index sequence is uniquely paired with itself.
Accordingly, the paired indexes 128 may include a unique pair of index sequences as two index sequence portions (having the same or a different sequence) that are associated together as a pair for a given transposon, with one of the two index sequence portions on each side of the 5'-to-5' coupling, and that are designed to be distinguishable from index pairs of other transposons. Uniqueness may be defined by sequence dissimilarity thresholds suitable for error-tolerant matching. At each terminus (e.g., each 3'-end) of the single-stranded linker, the PIPE transposons 126 further comprise transposon recognition sequences (e.g., mosaic ends), which bind a transposase enzyme to form transposomes. As such, each of the PIPE transposons 126 may comprise two oligonucleotide portions that are coupled together at their 5'-ends. The PIPE transposons 126 and transposomes will be further described herein, e.g., with reference to FIGS. 3-5.

[0056] When the transposomes are added to the gDNA 124 during the transposon integration 130, a given transposome locates and binds to a location in the genome and causes a complexed PIPE transposon 126 to be inserted into the gDNA 124. By way of example, during the transposon integration 130, the transposase enzyme binds to the gDNA 124 and introduces cuts at specific locations of the gDNA 124. The transposase enzyme further makes cuts at or near the transposon recognition sequences within the PIPE transposon 126, and the transposon is integrated into the gDNA 124 at the cut sites. The transposon integration 130, which will be further described below with respect to FIGS. 2A and 6A, results in transposon-integrated gDNA 132.

[0057] After gap extension, the transposon-integrated gDNA 132 includes double-stranded polymerase promoter sites (e.g., sequences) at each location of PIPE transposon 126 integration. In at least one implementation, the polymerase promoter sites comprise a sequence that is recognized by an RNA polymerase (e.g., T7 RNA polymerase), which enables transcription of the transposon-integrated gDNA 132 to corresponding PIPE RNA 134 via an in vitro transcription (IVT) reaction 136. During the IVT reaction 136, for instance, the RNA polymerase transcribes the PIPE RNA 134 using the transposon-integrated gDNA 132 as a template. By way of example, a given molecule of the PIPE RNA 134 may comprise a first index sequence near a first end of an RNA sequence that is complementary to the transposon-integrated gDNA 132 (and the gDNA 124) and a second index sequence near a second end of the RNA sequence. As such, a given indexed RNA fragment may include two different index sequences that encode relative positional information of the RNA fragment (e.g., in relation to adjacent fragments), as further explained herein. It is to be appreciated that the PIPE RNA 134 is complementary to the transposon-integrated gDNA 132. However, unlike the transposon-integrated gDNA 132, where contiguous sequences are linked together via the 5'-to-5' coupled PIPE transposons 126, the PIPE RNA 134 comprises fragments that are not physically linked to one another.

[0058] The IVT reaction 136 linearly amplifies the PIPE RNA 134. That is, the transposon-integrated gDNA 132 does not replicate during the IVT reaction 136, and so each round of transcription produces one RNA molecule from one DNA template. This is in contrast to the exponential amplification that occurs during polymerase chain reaction (PCR), where newly synthesized DNA fragments can serve as templates. The exponential amplification of PCR may introduce length bias toward shorter fragments, as shorter fragments amplify more efficiently than longer fragments. As such, the IVT reaction 136 may advantageously generate the PIPE RNA 134 without the length bias that may occur in PCR.
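
By way of a non-limiting illustration, the difference between linear and exponential amplification may be expressed with an idealized model. In the following minimal sketch, the functions ivt_copies, pcr_copies, and pcr_length_bias and the per-cycle efficiency parameter are hypothetical simplifications introduced for illustration; they are not measured values and do not model reagent depletion or other practical limits.

```python
def ivt_copies(templates: int, rounds: int) -> int:
    """Linear model: each round yields one RNA copy per DNA template."""
    return templates * rounds


def pcr_copies(templates: float, cycles: int, efficiency: float) -> float:
    """Exponential model: products of each cycle are templates in the next.

    efficiency is the assumed fraction of molecules copied per cycle (0.0-1.0).
    """
    return templates * (1.0 + efficiency) ** cycles


def pcr_length_bias(cycles: int, eff_short: float, eff_long: float) -> float:
    """Fold over-representation of a short fragment relative to a long one
    when both start at equal abundance but differ in per-cycle efficiency."""
    return ((1.0 + eff_short) / (1.0 + eff_long)) ** cycles
```

In this model, any difference in per-cycle efficiency between short and long fragments compounds with every PCR cycle, whereas the linear model has no compounding term because transcripts do not serve as templates.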

[0059] In at least one implementation, paired index position-encoded DNA 138 (e.g., PIPE DNA 138) is synthesized from the PIPE RNA 134. In such implementations, the PIPE DNA 138 comprises a plurality of complementary DNA (cDNA) molecules that are complementary to respective RNA molecules of the PIPE RNA 134. By way of example, the PIPE DNA 138 may be synthesized from the PIPE RNA 134 at the nucleic acid amplifier 106. The nucleic acid amplifier 106 may then facilitate cDNA synthesis through a reverse transcription reaction. The nucleic acid amplifier 106 may further facilitate second-strand synthesis and / or amplification of the PIPE DNA 138, if desired. By way of example, the nucleic acid amplifier 106 may be a thermal cycler that is configured to cycle through different temperature stages, which allow for the denaturation of the PIPE RNA 134 (e.g., by disrupting secondary structures of the PIPE RNA 134 as well as any RNA-DNA hybrids), annealing of primers 140 (e.g., short oligonucleotides that target the amplification sequences in the PIPE RNA 134), extension of new, complementary DNA strands from the primers 140 using a reverse transcriptase enzyme, and inactivation of the reverse transcriptase enzyme to terminate the reverse transcription reaction.

[0060] In at least one variation, the PIPE DNA 138 may be generated directly from the transposon-integrated gDNA 132. For example, the process may include performing an amplification reaction (e.g., a polymerase chain reaction) on the transposon-integrated gDNA 132, without first performing the IVT reaction 136 and the reverse transcription reaction. In such an approach, the primers 140 may target the amplification sequences or other suitable regions of the integrated PIPE transposons 126. As such, in at least one variation, the PIPE DNA 138 includes amplified fragments of the transposon-integrated gDNA 132 rather than cDNA generated from the PIPE RNA 134. In at least one implementation where the IVT reaction 136 is not used, the PIPE transposons 126 may not include the promoter sequence.

[0061] The reactions conducted in the nucleic acid amplifier 106, including the reverse transcription reaction and / or amplification of the PIPE DNA 138, may be performed in one or more rounds, also referred to herein as “reaction cycles.” A reaction cycle, for instance, may include a denaturation step followed by a primer annealing step (e.g., where the primers 140 bind through complementary base pairing via hydrogen bonding, where A pairs with T / U and C pairs with G), which is followed by an extension step. To facilitate this, the nucleic acid amplifier 106 may include a thermal block or heating / cooling element to regulate temperature, a programmable interface to set reaction cycle parameters (e.g., temperature and time), and heating / cooling mechanisms to rapidly transition between the different temperature stages. The nucleic acid amplifier 106 may further include a heated lid to prevent condensation of the samples during the reaction.

[0062] In at least one implementation, the nucleic acid amplifier 106 is further used during the transposon integration 130 and / or the IVT reaction 136. By way of example, the nucleic acid amplifier 106 may be held at an incubation temperature rather than transitioning through different temperature stages. In at least one variation, however, another type of temperature regulating component is used for single temperature incubation, such as a water bath or a heating block.

[0063] In one or more implementations, second strand synthesis is performed to generate double-stranded PIPE cDNA. The second strand synthesis may be performed during the reverse transcription reaction by including a DNA polymerase enzyme along with the reverse transcriptase enzyme or subsequent to the reverse transcription reaction, e.g., in a separate reaction performed in the nucleic acid amplifier 106. By way of example, the second strand synthesis may be performed via PCR, although other nucleic acid amplification techniques may be used. As mentioned above, in at least one implementation, the PIPE DNA 138 is amplified subsequent to the reverse transcription reaction.

[0064] The primers 140 may be designed to enable the PIPE DNA 138 to capture entire sequences of the respective molecules of the PIPE RNA 134 and / or to capture entire sequences between two adjacently integrated PIPE transposons 126. Moreover, the primers 140 may include at least one adapter sequence to facilitate sequencing. The at least one adapter sequence may enable the corresponding cDNA molecule of PIPE DNA 138 to bind to a solid support or surface, such as the surface of a sequencing flow cell. Additionally, or alternatively, the at least one adapter sequence may include a unique molecular identifier (UMI) that provides a cell-specific barcode and / or an additional sequencing index used to distinguish individual samples in a multiplexed sequencing reaction, thus enabling the origin of a corresponding sequencing read to be identified during a downstream analysis.

[0065] By way of example, the biological sample 120 may include cells or tissue obtained from a subject (e.g., an individual, such as a patient, or another type of organism, such as bacteria) and / or a culture. In at least one implementation, nuclei from the cells are isolated and permeabilized in bulk, with the transposon integration 130 performed within intact nuclei. The nuclei may then be flow-sorted, such as into individual microtiter plate wells via fluorescence-activated cell sorting (FACS), so that sequencing data 122 corresponding to individual cells (e.g., individual nuclei) may be obtained. However, it may be desirable to sequence DNA from multiple individual nuclei at once while distinguishing sequencing data 122 corresponding to one nucleus from another (e.g., using cell-specific barcodes). In at least one variation, the transposon integration 130 is performed in permeabilized cells, permeabilized tissues, microbes, bacteria, sorted cells that have been lysed, or the like rather than in intact nuclei. As such, the techniques described herein may be applied to a variety of sample and preparation types.

[0066] The sequencing data processor 110 is configured to receive the sequencing data 122 and determine sequencing reads 142 therefrom, e.g., as represented in a binary alignment / map (BAM) file, for instance. In at least one implementation, long read sequencing is advantageously used in order to generate reads spanning an entire length of a given molecule of the PIPE DNA 138. In at least one variation, however, short read sequencing is used. The sequencing data processor 110 illustrated in FIG. 1 is further configured to output a sequence 144 by analyzing the sequencing reads 142 via a genome reconstruction module 146. The sequence 144, for instance, corresponds to a genome of the biological sample 120 and / or a portion of the biological sample 120 (e.g., a single nucleus). Additionally, or alternatively, the sequence 144 corresponds to a portion of the genome rather than a whole genome. In at least one implementation, the genome reconstruction module 146 is representative of the functionality of the sequencing data processor 110 to identify contiguous reads of the sequencing data 122 based on the paired indexes 128. By way of example, the genome reconstruction module 146 may utilize an index matching algorithm 148 that computationally determines which sequencing reads 142 correspond to natively adjacent DNA fragments in the genome (e.g., in the gDNA 124) by matching indexes.

[0067] As will be further illustrated with respect to FIGS. 6A and 6B, integration of the PIPE transposons 126 and subsequent in vitro transcription (e.g., to generate the PIPE RNA 134) and reverse transcription and / or amplification (e.g., to generate the PIPE DNA 138) results in the paired indexes 128 being associated with natively adjacent PIPE DNA 138 fragments. That is, a first PIPE cDNA molecule may include a first index sequence at an upstream end of a first DNA sequence, the first DNA sequence corresponding to a first portion of the gDNA 124, and a second index sequence at the downstream end of the first DNA sequence. A second PIPE cDNA molecule may include a second DNA sequence corresponding to a second portion of the gDNA 124, the second portion of the gDNA 124 adjacent to and immediately upstream of the first portion of the gDNA 124. The second PIPE cDNA molecule may further include the first index sequence at the downstream end of the second DNA sequence. A third PIPE cDNA molecule may include a third DNA sequence corresponding to a third portion of the gDNA 124, the third portion of the gDNA 124 adjacent to and immediately downstream of the first portion of the gDNA 124. The third PIPE cDNA molecule may further include the second index sequence at the upstream end of the third DNA sequence.

[0068] Thus, during genome reconstruction, the index matching algorithm 148 may computationally determine that the second DNA sequence of the second PIPE cDNA molecule is immediately upstream of the first DNA sequence of the first PIPE cDNA molecule in response to the downstream-positioned first index sequence of the second PIPE cDNA molecule matching with the upstream-positioned first index sequence of the first PIPE cDNA molecule. The index matching algorithm 148 may further computationally determine that the third DNA sequence of the third PIPE cDNA molecule is immediately downstream of the first DNA sequence of the first PIPE cDNA molecule in response to the downstream-positioned second index sequence of the first PIPE cDNA molecule matching with the upstream-positioned second index sequence of the third PIPE cDNA molecule.
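
By way of a non-limiting illustration, the ordering logic described above may be sketched as a lookup that follows each fragment's downstream-positioned index to the fragment carrying the same index at its upstream end. The following minimal sketch is not the index matching algorithm 148 itself. It assumes exact index matches, one read per fragment, and a single consistent strand orientation, and the names IndexedRead and order_fragments are hypothetical.

```python
from typing import List, NamedTuple, Optional


class IndexedRead(NamedTuple):
    upstream_index: Optional[str]    # index at the upstream end, if any
    sequence: str                    # genomic portion between the indexes
    downstream_index: Optional[str]  # index at the downstream end, if any


def order_fragments(reads: List[IndexedRead]) -> List[List[IndexedRead]]:
    """Group reads into chains of natively adjacent fragments."""
    by_upstream = {r.upstream_index: r for r in reads
                   if r.upstream_index is not None}
    downstream_seen = {r.downstream_index for r in reads
                       if r.downstream_index is not None}
    # A chain starts at a read whose upstream index matches no other read.
    starts = [r for r in reads if r.upstream_index not in downstream_seen]
    chains = []
    for start in starts:
        chain = [start]
        current = start
        while current.downstream_index in by_upstream:
            nxt = by_upstream[current.downstream_index]
            if nxt in chain:  # guard against circular index paths
                break
            chain.append(nxt)
            current = nxt
        chains.append(chain)
    return chains


if __name__ == "__main__":
    first = IndexedRead("IDX1", "CCCC", "IDX2")
    second = IndexedRead(None, "AAAA", "IDX1")   # immediately upstream of first
    third = IndexedRead("IDX2", "GGGG", None)    # immediately downstream of first
    for chain in order_fragments([first, third, second]):
        print([read.sequence for read in chain])
```

In this sketch, a set of reads whose indexes form a closed loop has no starting read and is therefore not reported; a fuller treatment would need to handle that case along with both strand orientations.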

[0069] As used herein with respect to nucleic acids, the terms “upstream” and “downstream” describe the relative positions of sequences in relation to a particular reference (e.g., a sequence or genomic location). “Upstream” refers to the direction opposite to the direction of transcription (e.g., toward the 5'-end of a DNA strand), whereas “downstream” refers to the direction in which transcription proceeds (e.g., toward the 3'-end of the DNA strand). Moreover, the term “matching” as used herein with respect to the paired indexes 128 may refer to the process of identifying and arranging sequencing reads based on their associated paired index sequences, which may be matched to a particular transposon. By way of example, the PIPE transposons 126 may each include two copies of the same index sequence or two different index sequences that are paired together in a single transposon.

[0070] In at least one implementation, the index matching algorithm 148 may use a fuzzy matching scoring scheme that takes into consideration errors that may be introduced during preparation of PIPE DNA 138 and / or introduced via base read errors in the sequencing data 122. The fuzzy matching scoring scheme may, for example, allow a configurable tolerance or threshold of mismatch (e.g., where a nucleotide position of the sequencing read 142 varies with respect to the paired indexes 128 due to a substitution, a deletion, or an insertion).
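
By way of a non-limiting illustration, a tolerance for substitutions, deletions, and insertions may be expressed as a maximum edit (Levenshtein) distance between an observed index and each known index. The following minimal sketch is one possible scoring scheme, not necessarily the scheme used by the index matching algorithm 148. The names edit_distance and best_index_match, the max_mismatches parameter, and the choice to reject ambiguous ties are assumptions made for illustration.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def best_index_match(observed: str,
                     known_indexes: Iterable[str],
                     max_mismatches: int = 1) -> Optional[str]:
    """Return the closest known index within the tolerance, else None.

    Ties between equally close known indexes are treated as ambiguous.
    """
    scored = sorted((edit_distance(observed, k), k) for k in known_indexes)
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]
```

Rejecting ties trades some sensitivity for specificity: an observed index that is equally close to two known indexes is left unassigned rather than assigned arbitrarily.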

[0071] It is to be appreciated that although one sequence 144 is shown, the genome reconstruction module 146 may generate a plurality of different sequences 144, such as sequences corresponding to different biological samples 120 or different cells within the same biological sample 120.

[0072] The genome reconstruction module 146 generates and outputs the sequence 144, e.g., the reconstructed whole genome sequence. By way of example, reconstructing a sequence refers to determining the sequence 144. The client device 104 is shown displaying, via a display device 150, the sequence 144 or a portion thereof. It is to be appreciated that the sequencing data 122 and / or the sequence 144 may be also stored in a memory of the sequencing data processor 110 and / or the client device 104 for subsequent access. By way of example, the sequencing data 122, the sequencing reads 142, and / or the sequence 144 may be stored in a single data file or in multiple data files. It is to be appreciated that at least a portion of the information in the sequencing data 122, the sequencing reads 142, and / or the sequence 144 may be stored in a standardized file format, e.g., tab-separated values (TSV), to facilitate and automate downstream processing.
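
By way of a non-limiting illustration, index-annotated reads may be written to and read back from a TSV file using a standard delimited-text library. In the following minimal sketch, the column names and the functions write_reads_tsv and read_reads_tsv are hypothetical; no particular column layout is specified herein.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column layout, chosen for illustration only.
COLUMNS = ["read_id", "upstream_index", "downstream_index", "sequence"]


def write_reads_tsv(rows: List[Dict[str, str]], handle: TextIO) -> None:
    """Write a header line followed by one tab-separated line per read."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_reads_tsv(handle: TextIO) -> List[Dict[str, str]]:
    """Read the rows back, keyed by the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_reads_tsv([{"read_id": "r1", "upstream_index": "IDX1",
                      "downstream_index": "IDX2", "sequence": "ACGT"}], buffer)
    print(buffer.getvalue())
```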

[0073] In this way, using the PIPE transposons 126 in a sequencing workflow encodes positional information of natively adjacent DNA fragments of the gDNA 124, enabling efficient and accurate genome reconstruction with reduced sequencing depth. This reduced sequencing depth may enable cost savings and / or reduce resource consumption, enabling a greater number of samples to be processed for a given amount of sequencing data, for example. As a result of the paired index position-encoding sequencing techniques described herein, whole genome sequencing, de novo genome assembly, single-molecule sequencing, genome reconstruction of cancer, and genome reconstruction of organisms with complex genomes may be efficiently and accurately performed.

Paired Index Position-Encoded Sequencing

[0074] FIGS. 2A and 2B depict an example workflow 200 in an implementation of paired index position-encoded sequencing. By way of example, the workflow 200 is a sequencing workflow for performing the paired index position-encoded sequencing. In at least one implementation, the workflow 200 is performed for single-cell whole genome reconstruction. Where appropriate, reference will be made to components previously introduced in FIG. 1.

[0075] Referring first to FIG. 2A, in the example workflow 200, nuclei 202 are extracted from the biological sample 120. By way of example, the nuclei 202 may be prepared from homogenized tissue or another type of cell suspension (e.g., from bodily fluids or cultured cells) under conditions that preserve and permeabilize the nuclear membrane. In at least one implementation, the nuclei 202 are isolated and permeabilized from cells without retaining mitochondrial DNA, and thus, the retained DNA may be the gDNA 124 (e.g., retained within the individual nuclei 202). As non-limiting examples, the nuclei 202 may be isolated and permeabilized using hypotonic lysis, detergents, digitonin, and / or lithium diiodosalicylate for nucleosome disruption. In at least one variation, another type of cell lysis technique is used to isolate the nuclei 202 from the biological sample 120. A hypotonic lysis buffer, for instance, may include a lower solute concentration than inside of the cells of the biological sample 120, causing the cells to swell and burst open. The hypotonic lysis buffer may further include digitonin, one or more other detergents, and / or the lithium diiodosalicylate.

[0076] It is to be appreciated that the nuclei 202 remain intact while permeable. However, when permeabilized in this manner, the nuclear membrane of respective nuclei 202 allows the reagents used in the transposon integration 130 to pass through and enter the nucleus.

[0077] The workflow 200 further includes PIPE transposon preparation 204. The PIPE transposon preparation 204 generates the PIPE transposons 126, each including a different set of paired indexes 128, as well as PIPE transposomes 206 comprising the PIPE transposons 126 in combination with transposase enzyme 208. In at least one implementation, the PIPE transposon preparation 204 includes coupling together (e.g., covalently bonding) two oligonucleotides each comprising a polymerase promoter, an amplification sequence, and a reactive moiety to facilitate the coupling. In accordance with the techniques described herein, the two oligonucleotides are coupled together in a 5'-to-5' orientation using a chemical reaction, such as a click chemistry reaction, in an aqueous solution. In at least one variation, the two identical oligonucleotides are coupled together using a different type of highly selective reaction that will not react with off-target portions of the oligonucleotide molecules, such as another type of bioorthogonal reaction. By way of example, a first oligonucleotide molecule may include a first reactive moiety (e.g., a first functional group) at its 5'-end, and a second oligonucleotide molecule may include a second reactive moiety (e.g., a second functional group) at its 5'-end, where the second reactive moiety is configured to selectively react with the first reactive moiety. As a non-limiting example, the two oligonucleotides are coupled together using a ring strain-promoted azide-alkyne cycloaddition reaction (e.g., the SPAAC reaction), where the first reactive moiety is an azide, and the second reactive moiety is a strained alkyne. Using a strained alkyne, such as cyclooctyne or a derivative thereof (e.g., dibenzocyclooctyne), makes the alkyne highly reactive, enabling efficient and specific conjugation with the azide in mild reaction conditions and without use of a metal catalyst.

[0078] Following the coupling reaction, the 5'-to-5' coupled oligonucleotide may be purified via gel electrophoresis. By way of example, polyacrylamide gel electrophoresis or agarose gel electrophoresis may be used. Alternatively, or in addition, another type of purification technique may be used to isolate the 5'-to-5' coupled oligonucleotide from uncoupled reagents, for example.

[0079] To prepare the PIPE transposons 126, in at least one implementation, the 5'-to-5' coupled oligonucleotide is annealed and extended with oligonucleotides including a specific index sequence and transposon recognition sequences, e.g., the mosaic ends (ME). The annealing and extension may be performed in microtiter plate wells, for example, where each well receives primers having a different index sequence so that the same index sequence is annealed to each end of the 5'-to-5' coupled oligonucleotide, thus resulting in the paired indexes 128 rather than two different index sequences on either terminal end of the 5'-to-5' coupled oligonucleotide. Each index sequence may be a random sequence or selected sequence of nucleotides, e.g., ranging from 5 to 100 base pairs. As a non-limiting example, computational algorithms may be used to design the index sequences, and the primers may be designed using pre-determined (e.g., computationally pre-approved to minimize undesired interactions and / or sequence ambiguity) index sequences. In at least one variation, rather than the annealing and extension, ligation is performed to add the index sequences and the transposon recognition sequences to the 5'-to-5' coupled oligonucleotide. Additional details regarding the PIPE transposon preparation 204 will be described herein, e.g., with reference to FIGS. 3 and 4.
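
By way of a non-limiting illustration, one simple way to computationally pre-approve index sequences is to accept randomly generated candidates only if they differ from every previously accepted index by at least a minimum Hamming distance. The following minimal sketch is an assumption made for illustration rather than the design procedure used herein. The names hamming and design_indexes and the example values (96 indexes of 12 nucleotides, minimum distance 3) are hypothetical, and the sketch does not screen for secondary structure or other undesired interactions.

```python
import random
from itertools import combinations
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def design_indexes(count: int, length: int, min_distance: int,
                   seed: int = 0, max_tries: int = 100_000) -> List[str]:
    """Greedily collect random indexes that are mutually dissimilar."""
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    chosen: List[str] = []
    for _ in range(max_tries):
        if len(chosen) == count:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
    if len(chosen) < count:
        raise RuntimeError("could not find enough indexes; relax constraints")
    return chosen


if __name__ == "__main__":
    indexes = design_indexes(count=96, length=12, min_distance=3)
    worst = min(hamming(a, b) for a, b in combinations(indexes, 2))
    print(len(indexes), "indexes; smallest pairwise distance:", worst)
```

A minimum distance of 3 means that a single substitution error leaves an observed index closer to its true index than to any other index in the set, which is the property that error-tolerant matching relies on.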

[0080] The pool of PIPE transposons 126 (e.g., from every well of the microtiter plate) is mixed with the transposase enzyme 208 to generate the PIPE transposomes 206. In at least one implementation, the transposase enzyme 208 includes hyperactive Tn5 transposase protein monomers, although other transposase enzymes may be used. The PIPE transposomes 206 may be used in the transposon integration 130 upon preparation or stored at an appropriate temperature to prevent degradation (e.g., -20 °C) before use.

[0081] The transposon integration 130 is performed within the nuclei 202, e.g., in an integration reaction mixture 210. The integration reaction mixture 210, for example, includes the nuclei 202 and the PIPE transposomes 206 in an aqueous buffer that includes magnesium ions (e.g., Mg2+), which serve as a cofactor for the catalytic function of the transposase enzyme 208. The PIPE transposomes 206 and the magnesium ions in the integration reaction mixture 210 pass through the permeabilized nuclear membrane of the nuclei 202 to interact with the gDNA 124. The transposase enzyme 208 of the PIPE transposomes 206 binds to the gDNA 124 at a plurality of locations, and, guided by its interaction with the mosaic ends of the PIPE transposons 126, cuts the gDNA 124 and attaches the 5'-to-5' coupled oligonucleotide having the paired indexes 128 at the cut sites, resulting in the PIPE transposons 126 being integrated into the gDNA 124 at the cut sites.

[0082] In at least one implementation, the transposon integration 130 is terminated by adding a chelating agent (e.g., ethylenediaminetetraacetic acid, or EDTA) to the integration reaction mixture 210. The chelating agent binds the magnesium ions, thus stopping the catalytic function of the transposase enzyme 208. The labeled nuclei 212 may undergo flow sorting 214 in order to isolate individual labeled nuclei for single-cell sequencing. By way of example, the flow sorting 214 may include staining the labeled nuclei 212 and performing fluorescence-activated cell sorting (FACS). The flow sorting 214, for instance, may place one labeled nucleus into each individual well of a microtiter plate for further processing, represented in the workflow 200 as sorted nuclei 216. The sorted nuclei 216 may be further processed immediately following termination of the transposon integration 130 or stored at an appropriate temperature to prevent degradation (e.g., -80 °C) before processing.

[0083] It is to be appreciated that performing the transposon integration 130 inside of the nuclei 202 may offer several advantages. For example, bulk DNA extracted from cells is often sheared, which may hinder the positional encoding of the PIPE transposons 126, which are designed to informationally link natively adjacent portions of the gDNA 124. If a DNA fragment is sheared, for instance, it is no longer connected to adjacent portions of the gDNA 124. As another example, nuclei 202 are uniform in size, unlike cells from different tissues, cell types, and so forth, making the nuclei 202 easier to capture and process in a uniform manner from experiment to experiment and between different biological samples 120.

[0084] The sorted nuclei 216 undergo transposome protein removal 218 via treatment with a proteinase 220. By way of example, the proteinase 220 may be thermolabile proteinase K. The proteinase 220 removes the transposase enzyme 208 bound to the transposon-integrated gDNA 132 by digesting the transposase enzyme 208, for instance. When thermolabile, the proteinase 220 may be heat-inactivated following the transposome protein removal 218.

[0085] It is to be appreciated that the gDNA 124 exists in a double-stranded form. However, during the transposon integration 130, a given PIPE transposon 126 may be integrated into a DNA fragment from a first strand of the gDNA 124, and a second strand of the gDNA 124, which is complementary to the first strand, does not include the given PIPE transposon 126. In this scenario, the first strand may be referred to as a transferred strand (e.g., indicating that the PIPE transposon 126 has been transferred to this strand), and the second strand may be referred to as a non-transferred strand. This results in a gap where the transposon-integrated gDNA 132 is not double-stranded. Thus, after the transposon integration 130, gap extension 222 is performed using DNA polymerase 224 in order to extend the 3'-end of the non-transferred DNA strand and generate double-stranded DNA, including a double-stranded polymerase promoter. By way of example, the DNA polymerase 224 may be added to the sorted nuclei 216 following the transposome protein removal 218 (e.g., and inactivation of the proteinase 220) along with nucleotides (e.g., deoxyribonucleotide triphosphates, or dNTPs, for DNA synthesis) to be incorporated into the newly synthesized portions, thus generating double-stranded transposon-integrated gDNA 132.

[0086] In at least one implementation, the IVT reaction 136 is performed on the double-stranded transposon-integrated gDNA 132. By way of example, RNA polymerase 226 is added to the transposon-integrated gDNA 132 in an IVT reaction mixture 228 to generate the PIPE RNA 134. In at least one implementation, the RNA polymerase 226 is a recombinant T7 RNA polymerase, such as when the promoter sequences in the PIPE transposons 126 are T7 promoter sequences. However, other RNA polymerases and promoter sequences may be used. As such, the promoter sequences are matched to the specific RNA polymerase 226 to be used. The RNA polymerase 226 is configured to bind to the promoter sequences of the transposon-integrated gDNA 132 and transcribe it into index-labeled RNA fragments, e.g., the PIPE RNA 134.

[0087] In at least one implementation, the IVT reaction 136 is optimized for the RNA polymerase 226 used, particularly because the amount of transposon-integrated gDNA 132 is low (e.g., arising from a single nucleus). As such, to increase the yield of the PIPE RNA 134, the IVT reaction 136 may be extended over a prolonged period (e.g., 72 hours), and additional RNA polymerase 226 may be added at pre-determined timepoints along with additional nucleotides (e.g., ribonucleotide triphosphates, or NTPs, for RNA synthesis). As a non-limiting example, additional portions of the RNA polymerase 226 and the nucleotides may be added at 24-hour intervals throughout the IVT reaction 136, although other intervals may be used (e.g., 8-hour intervals, 12-hour intervals, etc.).
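
By way of a non-limiting illustration, the pre-determined timepoints for a given reaction duration and replenishment interval may be enumerated as follows. The function name replenishment_timepoints is hypothetical, and the sketch assumes that the initial reagents are added at time zero and that no addition is made at the moment the reaction ends.

```python
from typing import List


def replenishment_timepoints(total_hours: int, interval_hours: int) -> List[int]:
    """Hours after the start at which additional polymerase and NTPs are added.

    Assumes the initial charge is added at hour 0 and nothing is added at
    the final hour, so only timepoints strictly inside the reaction are listed.
    """
    if total_hours <= 0 or interval_hours <= 0:
        raise ValueError("durations must be positive")
    return list(range(interval_hours, total_hours, interval_hours))


if __name__ == "__main__":
    print(replenishment_timepoints(72, 24))
    print(replenishment_timepoints(72, 12))
```

Under these assumptions, a 72-hour reaction with 24-hour intervals has additions at hours 24 and 48.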

[0088] Moreover, in order to reduce losses of the transposon-integrated gDNA 132, in at least one implementation, the transposome protein removal 218, the gap extension 222, and the IVT reaction 136 are performed in the same microtiter plate wells that received the sorted nuclei 216. In some such scenarios, the concentration of dNTPs used in the gap extension 222 may be reduced to minimize the amount of dNTPs present in the IVT reaction mixture 228.

[0089] In at least one variation, rather than performing the IVT reaction 136, the transposon-integrated gDNA 132 is amplified (e.g., using the nucleic acid amplifier 106) to directly generate the PIPE DNA 138 from the transposon-integrated gDNA 132, as further described below.

[0090] In one or more implementations, a first cleanup 230 is performed to isolate the PIPE RNA 134 from the IVT reaction mixture 228 (e.g., mosaic end primers, enzymes, dNTPs and / or NTPs, buffer, etc.). Various reaction cleanup techniques may be used, including techniques that enable the PIPE RNA 134 to be selectively captured over the transposon-integrated gDNA 132 and other reaction components. By way of example, solid phase reversible immobilization (SPRI) may be used in the first cleanup 230, where paramagnetic beads are used to selectively bind RNA fragments of a selected size range while genomic DNA, unused nucleotides, enzymes, salts, etc. are washed away. The PIPE RNA 134 may be eluted from the paramagnetic beads using an appropriate eluent, such as nuclease-free water.

[0091] Continuing to FIG. 2B, in at least one implementation, the PIPE RNA 134 undergoes reverse transcription 232 at the nucleic acid amplifier 106 to generate the PIPE DNA 138. By way of example, the reverse transcription 232 includes combining the PIPE RNA 134 with a reverse transcriptase 234 and reverse transcription (RT) primers 236 to form an RT reaction mixture 238. The RT primers 236 are a subset of the primers 140 introduced with respect to FIG. 1, for example. Other reagents, such as buffer(s) and nucleotides to be incorporated into newly synthesized strands of cDNA (e.g., dNTPs), are also included in the RT reaction mixture 238. The RT primers 236 may be complementary to the promoter sequence at the 3'-end of the PIPE RNA 134 molecules, for instance, and may serve as a starting point for reverse transcription by the reverse transcriptase 234 to catalyze the synthesis of the PIPE DNA 138 using the bound PIPE RNA 134 molecule as the template.

[0092] In one or more implementations, at least a portion of the reagents of the RT reaction mixture 238 are provided in a commercially available kit. The commercially available kit may include a so-called “master mix” of, for example, the reverse transcriptase 234, the buffer, and / or the nucleotides. Alternatively, however, at least a portion of these reagents may be added separately. The RT primers 236 may be designed to target the promoter sequence at the 3'-end of the PIPE RNA 134 molecules, and thus, in at least one implementation, the RT primers 236 may be separately obtained.

[0093] In at least one implementation, the reverse transcriptase 234 is selected from a plurality of reverse transcriptase enzymes that do not possess terminal transferase activity. Terminal transferase activity refers to the ability of some reverse transcriptase enzymes (e.g., Moloney murine leukemia virus, or MMLV, -type reverse transcriptases) to add a non-templated series of nucleotides (e.g., three cytosine nucleotides) at the 3'-end of the PIPE DNA 138 fragment once the reverse transcriptase enzyme reaches the 5'-end of the PIPE RNA 134 fragment. As such, in at least one implementation, the reverse transcriptase 234 is not an MMLV-type reverse transcriptase.

[0094] In one non-limiting example, the reverse transcriptase 234 is a thermostable group II intron-encoded reverse transcriptase that has an active site with multiple additional interactions with the PIPE RNA 134 to increase stability, fidelity, and processivity of the reverse transcription over MMLV-type reverse transcriptases. The thermostable group II intron-encoded reverse transcriptase lacks terminal transferase activity. As such, the PIPE DNA 138 may be synthesized with higher fidelity (e.g., accuracy) and processivity (e.g., ability to remain bound to replicate the entire length of the PIPE RNA 134 molecule without dissociating) than techniques that rely on template switching via the terminal transferase activity of lower fidelity and processivity reverse transcriptases. This supports cDNA synthesis from long RNA transcripts and / or RNA transcripts with strong secondary structures.

[0095] As mentioned above, the reverse transcription 232 may be performed in the nucleic acid amplifier 106. By way of example, the RT reaction mixture 238 is placed in the nucleic acid amplifier 106 in an appropriate volume in an appropriate container (e.g., a sealed plate or tube strip), and the nucleic acid amplifier 106 undergoes a timed series of temperature changes according to a program. In at least one implementation, the nucleic acid amplifier 106 includes a heated lid in order to prevent condensation of the RT reaction mixture 238 at a top of the container.

[0096] In at least one implementation, a first temperature step of the reverse transcription 232 is an annealing step, where the RT primers 236 anneal to the PIPE RNA 134. A second temperature step is a reverse transcription step, where the reverse transcriptase 234 synthesizes the PIPE DNA 138 using the PIPE RNA 134 as the template, and a third temperature step is an enzyme inactivation step, which stops the reverse transcription reaction. In a fourth temperature step, the RT reaction mixture 238 may be held indefinitely at a cold temperature for preservation.

[0097] The reverse transcription 232 results in the PIPE DNA 138, which may be directly sequenced or optionally amplified in an amplification reaction 240. The amplification reaction 240, for instance, may be performed in order to generate additional material for sequencing, which may increase sequencing sensitivity. However, the amplification reaction 240 may also introduce bias. For example, the amplification reaction 240 may introduce distorting effects that bias toward shorter length molecules, causing shorter cDNA molecules to be over-represented (relative to their representation in the original biological sample) and longer cDNA molecules to be under-represented (relative to their representation in the original biological sample) in the final, amplified PIPE DNA 138 mixture that is analyzed via the nucleic acid sequencer 108.

[0098] In the example shown in FIG. 2B, a subset of the primers 140 used in the amplification reaction 240 includes amplification primers 242. The amplification primers 242 may include forward and reverse primers targeting the promoter sequence at respective ends of the PIPE DNA 138 molecules, for instance, and may serve as a starting point for the DNA polymerase 224 to replicate the PIPE DNA 138. By way of example, the amplification primers 242 and the DNA polymerase 224 may be added to the PIPE DNA 138 generated by the reverse transcription 232 in an amplification reaction mixture 244. Additional reagents may be included in the amplification reaction 240, including one or more buffers, nucleotides to be incorporated into newly synthesized strands of DNA (e.g., dNTPs), and water. In one or more implementations, additional additives may be used that help facilitate amplification by modifying the melting (e.g., denaturation) behavior of DNA. In one or more implementations, at least a portion of these reagents are provided in a commercially available kit. The commercially available kit may include a so-called “master mix” of, for example, the polymerase enzyme(s), the buffer, and the nucleotides. Alternatively, however, these reagents may be added separately.

[0099] In at least one implementation, optionally, the amplification primers 242 include sequencing indices to enable sample multiplexing during sequencing and / or adapter sequences that facilitate flow cell binding during sequencing. It is to be appreciated, however, that sequencing indices and / or adapter sequences, when used, may be incorporated via adapter ligation or other techniques.

[0100] The amplification reaction 240, when included, is performed in the nucleic acid amplifier 106. By way of example, the amplification reaction mixture 244 is placed in the nucleic acid amplifier 106 in an appropriate container, and the nucleic acid amplifier 106 undergoes a timed series of temperature changes according to a program. In at least one implementation, a first temperature step may correspond to an initial activation step, where the DNA polymerase 224 is activated, and a second temperature step may correspond to a denaturation step, where secondary structures of the PIPE DNA 138 are disrupted. A third temperature step may correspond to an annealing step where the amplification primers 242 bind to targeted regions of the PIPE DNA 138, with the temperature used adjusted based on an annealing temperature of the amplification primers 242. A fourth temperature step may correspond with an extension step of new strands of cDNA using the DNA polymerase 224. The time used during the fourth temperature step may be adjusted based on a length of a desired amplification product, as longer products may take more synthesis time than shorter products. The second through fourth temperature steps may be repeated, e.g., a number of times adjusted based on conditions optimized for targeted cDNA recovery. A fifth temperature step may correspond to a final extension step, and a sixth temperature step may hold the amplification reaction mixture 244 indefinitely at a cold temperature for preservation following completion of the programmed cycles.
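The six-step program described above can be written out as an ordered list of steps in which the second through fourth steps repeat. The following is a minimal sketch only. The names `Step`, `extension_seconds`, and `build_program` are hypothetical, and every temperature, duration, and cycle count shown is a placeholder chosen for illustration rather than a value taken from this disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: Optional[float]  # None means "hold indefinitely"


def extension_seconds(product_length_bp: int, seconds_per_kb: float) -> float:
    """Scale the extension time with the length of the desired product."""
    return product_length_bp / 1000.0 * seconds_per_kb


def build_program(activation: Step, denaturation: Step, annealing: Step,
                  extension: Step, final_extension: Step, hold: Step,
                  cycles: int) -> List[Step]:
    """Expand the program, repeating steps two through four `cycles` times."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    program = [activation]
    for _ in range(cycles):
        program.extend([denaturation, annealing, extension])
    program.extend([final_extension, hold])
    return program


# Placeholder values for illustration only; not taken from the source.
program = build_program(
    activation=Step("activation", 95.0, 120),
    denaturation=Step("denaturation", 95.0, 20),
    annealing=Step("annealing", 60.0, 30),
    extension=Step("extension", 72.0, extension_seconds(3000, 30.0)),
    final_extension=Step("final extension", 72.0, 300),
    hold=Step("hold", 4.0, None),
    cycles=12,
)
print(len(program))  # 1 + 3 * 12 + 2 = 39
```

Keeping the extension time as a function of product length mirrors the statement that longer products may take more synthesis time than shorter products.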

[0101] In at least one variation, the amplification reaction 240 is performed directly on the transposon-integrated gDNA 132. That is, rather than performing the IVT reaction 136 and the reverse transcription 232, the workflow 200 may include amplifying the PIPE DNA 138 directly from the transposon-integrated gDNA 132. In this approach, the amplification primers 242 may target the amplification sequences or other suitable regions of the integrated PIPE transposons 126 (e.g., the promoter sequences) to generate the PIPE DNA 138.

[0102] It is to be appreciated that although not explicitly shown in the workflow 200, in at least one implementation, one or more additional cleanups are performed in order to isolate the PIPE DNA 138 from the RT reaction mixture 238 and / or the amplification reaction mixture 244. Various reaction clean-up techniques may be used, including techniques that enable the PIPE DNA 138 to be selectively captured. By way of example, the one or more additional cleanups may use SPRI and paramagnetic beads that are configured to selectively bind DNA fragments. Alternatively, another type of cleanup technique may be used to isolate the PIPE DNA 138, such as spin column purification.

[0103] Whether derived from the reverse transcription 232 and / or the amplification reaction 240, in at least one implementation, the PIPE DNA 138 is sequenced in a sequencing reaction at the nucleic acid sequencer 108, which produces the sequencing data 122. In at least one variation, the PIPE RNA 134 is sequenced at the nucleic acid sequencer 108. For instance, rather than performing the reverse transcription 232 and the optional amplification reaction 240 to generate the PIPE DNA 138 from the PIPE RNA 134, the PIPE RNA 134 may be directly sequenced using an RNA sequencing technique. Whether DNA or RNA sequencing is used, as introduced with respect to FIG. 1, due to usage of the PIPE transposons 126, the sequencing data 122 includes sequencing reads 142 having positional information encoded therein, enabling accurate and efficient genome reconstruction by the genome reconstruction module 146.

[0104] FIG. 3 depicts a first illustrative example process 300 for generating transposons for paired index position-encoded sequencing, e.g., as a part of the PIPE transposon preparation 204. The process 300, for instance, highlights one implementation of generating the PIPE transposons 126 introduced with respect to FIG. 1. As such, where appropriate, reference will be made to components previously described with reference to FIGS. 1-2B. It is to be appreciated that the process 300 is a simplified example, and the relative lengths of the various sequence portions are not to scale. Moreover, for illustrative clarity, particular sequence portions are not labeled in every part of the figure. Sequence portions that are shaded the same are meant to denote the same sequence or a complement thereof.

[0105] The process 300 includes an oligonucleotide (e.g., “oligo”) coupling 302. The oligo coupling 302 shows a first functionalized oligo 304 and a second functionalized oligo 306. The first functionalized oligo 304 and the second functionalized oligo 306 both include a polymerase promoter 308 (e.g., depicted with no shading) and an amplification sequence 310 (e.g., depicted with black shading). Specifically, the polymerase promoter 308 is positioned at a 5'-end of the corresponding oligonucleotide, and the amplification sequence 310 is positioned at the 3'-end of the corresponding oligonucleotide. The first functionalized oligo 304 and the second functionalized oligo 306 differ based on a functional group attached to the 5'-end. In the example depicted in FIG. 3, the first functionalized oligo 304 includes a strained alkyne 312 (e.g., cyclooctyne, or a derivative thereof) coupled to the polymerase promoter 308 at the 5'-end, whereas the second functionalized oligo 306 includes an azide 314 coupled to the polymerase promoter 308 at the 5'-end. As a non-limiting example, the strained alkyne 312 is dibenzocyclooctyne, where the alkyne is highly reactive due to its position in an eight-membered ring (e.g., cyclooctyne) fused with two benzene rings. The eight-membered ring of cyclooctyne has inherent ring strain because the carbon atoms in the alkyne are forced into a bent geometry. Moreover, the benzene rings may increase the electrophilicity of the alkyne, which may increase its reactivity with the azide 314 via a cycloaddition reaction (e.g., a strain-promoted azide-alkyne cycloaddition reaction, or SPAAC reaction). It is to be appreciated, however, that other functional groups that enable efficient, selective 5'-to-5' coupling of the first functionalized oligo 304 and the second functionalized oligo 306 may be used.

[0106] The oligo coupling 302 results in a coupled oligo 316 formed from one molecule of the first functionalized oligo 304 and one molecule of the second functionalized oligo 306. By way of example, during the oligo coupling 302, the first functionalized oligo 304 and the second functionalized oligo 306 may be added together in an aqueous solution to generate the coupled oligo 316 in bulk. The coupled oligo 316 includes a covalent linkage 318 where the polymerase promoter 308 from the first functionalized oligo 304 and the polymerase promoter 308 from the second functionalized oligo 306 are coupled together at their 5'-ends. In the present example, the covalent linkage 318 is a triazole linkage. As mentioned with respect to FIG. 1, the coupled oligo 316 may be isolated using a suitable purification method, such as a gel purification-based technique.

[0107] The coupled oligo 316 is subsequently used in PIPE transposon generation 320. The PIPE transposon generation 320, for instance, may use primers to extend the coupled oligo 316 to generate a plurality of different PIPE transposons having different index sequences. By way of example, the PIPE transposon generation 320 includes a plurality of different reactions, where each reaction is performed separately (e.g., in a separate vessel, such as separate reaction tubes or microtiter plate wells) using a different primer molecule. In the example illustrated in FIG. 3, the PIPE transposon generation 320 includes a first reaction 322 and a second reaction 324. Ellipses denote there are additional reactions in the PIPE transposon generation 320. As such, it is to be understood that the PIPE transposon generation 320 is not limited to two different reactions.

[0108] In the first reaction 322, the coupled oligo 316 is extended using a first primer 326, while in the second reaction 324, the coupled oligo 316 is extended using a second primer 328. The first primer 326 and the second primer 328, as well as other primers used in additional reactions, both include an annealing sequence 330 at the 3'-end. The annealing sequence 330 is configured to anneal to the coupled oligo 316. By way of example, the annealing sequence 330 may be a complement of the amplification sequence 310, or a portion thereof. In at least one variation, the annealing sequence 330 further includes at least a portion of the polymerase promoter 308. The annealing sequence 330 allows the respective primer to anneal to the coupled oligo 316 so that the coupled oligo 316 can be extended, e.g., using DNA polymerase (e.g., the DNA polymerase 224) in a polymerase extension reaction. As shown in the process 300, the annealing sequence 330 may enable the corresponding primer to bind to the coupled oligo 316 at both ends in order to extend from the amplification sequence 310 on each side of the covalent linkage 318.

[0109] The first primer 326 and the second primer 328, as well as other primers used in additional reactions, both further include a transposon recognition sequence template 332 at the 5'-end. The transposon recognition sequence template 332 is a complementary sequence to a transposon recognition sequence. As mentioned above, the transposon recognition sequences may be mosaic ends. As such, the transposon recognition sequence template 332 provides a template for extending the coupled oligo 316 to include the transposon recognition sequence at its 3'-ends.

[0110] The first primer 326 further includes a first index template 334 (diamond shading) between the annealing sequence 330 and the transposon recognition sequence template 332, whereas the second primer 328 includes a second index template 336 (horizontal shading) between the annealing sequence 330 and the transposon recognition sequence template 332. As such, the first primer 326 and the second primer 328 may differ with respect to the index sequence included therein. The first index template 334, for instance, provides a template for extending the coupled oligo 316 in the 3'-direction (e.g., toward both ends of the oligonucleotide) to include a first index sequence 338 via the first reaction 322. Similarly, the second index template 336 provides a template for extending the coupled oligo 316 in the 3'-direction to include a second index sequence 340 via the second reaction 324. The first index template 334 is complementary to the first index sequence 338, and the second index template 336 is complementary to the second index sequence 340.
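The relationship between a primer and the extension it templates can be illustrated with a short sketch. It assumes that each template region is the reverse complement of the sequence it encodes and that extension proceeds from the 3'-end of one arm of the coupled oligo 316. The function names and all sequences below are hypothetical placeholders, not sequences from this disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primer(amplification: str, index: str, recognition: str,
                  anneal_len: int) -> str:
    """Build 5'-[recognition template][index template][annealing]-3'."""
    if not 0 < anneal_len <= len(amplification):
        raise ValueError("anneal_len must fit within the amplification sequence")
    return (revcomp(recognition) + revcomp(index)
            + revcomp(amplification[-anneal_len:]))


def extend_arm(arm: str, primer: str, anneal_len: int) -> str:
    """Extend one arm (written 5' to 3') using the primer as the template."""
    annealing = primer[-anneal_len:]
    if not arm.endswith(revcomp(annealing)):
        raise ValueError("primer does not anneal to the 3'-end of the arm")
    # The polymerase copies the un-annealed 5' portion of the primer.
    return arm + revcomp(primer[:-anneal_len])


# Placeholder sequences for illustration only.
PROMOTER, AMPLIFICATION, RECOGNITION = "TTAACC", "GGCATCGA", "CTGTCA"
arm = PROMOTER + AMPLIFICATION

for index in ("AACCGG", "TTGGCC"):  # one separate reaction per index
    primer = design_primer(AMPLIFICATION, index, RECOGNITION, anneal_len=6)
    extended = extend_arm(arm, primer, anneal_len=6)
    assert extended == PROMOTER + AMPLIFICATION + index + RECOGNITION
```

Because the same primer anneals to both 3'-ends of the coupled oligo, applying `extend_arm` to each arm yields the same index on both sides of the linkage.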

[0111] The first index template 334 and the second index template 336 comprise different nucleotide sequences. By way of example, the first index template 334 and the second index template 336 may have more than one nucleotide difference (e.g., two or more differences) with respect to each other and with respect to the index sequences of other primers used in the PIPE transposon generation 320 in order to account for potential errors that may arise during, for example, the workflow 200 of FIGS. 2A and 2B. For instance, as mentioned above with respect to FIG. 1, in at least one implementation, the index matching algorithm 148 uses fuzzy matching to match index sequences with respect to each other, and the fuzzy matching may allow a configurable tolerance or threshold of mismatch between the sequencing reads 142. In at least one variation, however, the second index template 336 and the first index template 334 have at least one difference with respect to each other and with respect to the other primers used in the PIPE transposon generation 320. Moreover, it is to be appreciated that the first index template 334 and the second index template 336 may have the same length or different lengths (e.g., numbers of nucleotides) with respect to each other and with respect to the index sequences of the other primers used in the PIPE transposon generation 320.
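One way to see why multiple nucleotide differences between indexes help is to measure the separation of an index set and then match an observed index with a mismatch tolerance. The sketch below assumes equal-length indexes and uses Hamming distance as the mismatch measure. Both are assumptions made for illustration; the disclosure does not specify the metric used by the index matching algorithm 148, and the function names and sequences are hypothetical.

```python
from itertools import combinations
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes: List[str]) -> int:
    """Smallest Hamming distance between any two indexes in the set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


def fuzzy_match(observed: str, known: List[str],
                max_mismatches: int = 1) -> Optional[str]:
    """Return the single known index within tolerance, else None."""
    hits = [k for k in known if hamming(observed, k) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Placeholder index set for illustration only.
indexes = ["ACGTAC", "TGCATG", "GATCGA"]
print(min_pairwise_distance(indexes))       # 6
print(fuzzy_match("ACGTAA", indexes))       # ACGTAC (one mismatch)
print(fuzzy_match("TTTTTT", indexes))       # None (no index within tolerance)
```

Under these assumptions, a tolerance of `max_mismatches` can only yield an unambiguous match for every read when the minimum pairwise distance of the set exceeds twice that tolerance.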

[0112] Because the first index template 334 and the second index template 336 are different, the first index sequence 338 and the second index sequence 340 are also different with respect to each other, as encoded by the respective index template sequence used. As such, using different primers having different index sequences in the separate reactions of the PIPE transposon generation 320 may create a pool of PIPE transposons 126 having unique paired indexes (e.g., the same index sequence on both sides of the covalent linkage 318 in a given molecule) from a same starting material (e.g., the coupled oligo 316).

[0113] Although not explicitly shown in FIG. 3, it is to be appreciated that the PIPE transposon generation 320 further includes extending the coupled oligo 316 to include the transposon recognition sequence at the 3'-ends of each molecule, as mentioned above and further illustrated with respect to FIG. 4.

[0114] The process 300 may enable the PIPE transposons 126 to be made at-scale in a cost-effective manner. For instance, generating the coupled oligo 316 in bulk via the oligo coupling 302 and then extending the coupled oligo 316 to include a plurality of different index sequences and the transposon recognition sequences in individual reactions using different primers provides flexibility as well as reduces costs compared to individually synthesizing each different PIPE transposon 126. However, variations are also possible.

[0115] FIG. 4 depicts a second illustrative example process 400 for generating transposons for paired index position-encoded sequencing, e.g., as a part of the PIPE transposon preparation 204. The process 400, for instance, highlights another implementation of generating the PIPE transposons 126 introduced with respect to FIG. 1. As such, where appropriate, reference will be made to components previously described with reference to FIGS. 1-3. It is to be appreciated that the process 400 is a simplified example, and the relative lengths of the various sequence portions are not to scale. Moreover, for illustrative clarity, particular sequence portions are not labeled in every part of the figure. Sequence portions that are shaded the same are meant to denote the same sequence or a complement thereof.

[0116] The process 400 includes the coupled oligo 316, which may be generated via the oligo coupling 302 described with respect to FIG. 3, for example. The coupled oligo 316 is used in PIPE transposon generation 402, which is a ligation-based approach in the process 400. That is, rather than using primers to extend the coupled oligo 316 to generate a plurality of different PIPE transposons having different index sequences with respect to each other (e.g., as described for the PIPE transposon generation 320 of FIG. 3), the PIPE transposon generation 402 includes ligating index oligonucleotides to the 3' ends of the coupled oligo 316.

[0117] By way of example, the PIPE transposon generation 402 includes a plurality of different ligation reactions, where each ligation reaction is performed separately (e.g., in a separate vessel, such as separate reaction tubes or microtiter plate wells) using a different index oligonucleotide molecule. In the example illustrated in FIG. 4, the PIPE transposon generation 402 includes a first reaction 404, a second reaction 406, and a third reaction 408. The first reaction 404 produces a first transposon 410 having the first index sequence 338, the second reaction 406 produces a second transposon 412 having the second index sequence 340, and the third reaction 408 produces a third transposon 414 having a third index sequence 416, as will be elaborated below. Ellipses denote there are additional reactions in the PIPE transposon generation 402. As such, it is to be understood that the PIPE transposon generation 402 is not limited to three different reactions.

[0118] In the first reaction 404, the coupled oligo 316 is ligated with a first index oligonucleotide 418 using a bridge oligonucleotide 420 and ligase enzyme 422. The first index oligonucleotide 418 includes, from 5' to 3', a common splint sequence 424 (indicated by vertical shading in FIG. 4), the first index sequence 338, and a transposon recognition sequence 426 (e.g., a mosaic end sequence). The common splint sequence 424, for instance, may be a common 5' sequence portion that is shared by each different index oligonucleotide that is complementary to a portion of the bridge oligonucleotide 420. The bridge oligonucleotide 420 is configured to anneal to both 3'-ends of the coupled oligo 316 as well as the 5'-end of the first index oligonucleotide 418. By way of example, a first end (e.g., the 5'-end) of the bridge oligonucleotide 420 is complementary to the 3'-ends of the coupled oligo 316 (e.g., the 3'-end of the amplification sequence 310), and a second end (e.g., the 3'-end) of the bridge oligonucleotide 420 is complementary to the common splint sequence 424 at the 5'-end of the first index oligonucleotide 418. The bridge oligonucleotide 420 thus brings the 5'-end of the first index oligonucleotide 418 in close proximity with one of the 3'-ends of the coupled oligo 316. This positioning allows the ligase enzyme 422 to create a covalent bond between the coupled oligo 316 and the first index oligonucleotide 418 on each 3'-end of the coupled oligo 316, thus creating the first transposon 410. The ligase enzyme 422 may be a DNA or RNA ligase, for example.

[0119] The second reaction 406 includes ligating the coupled oligo 316 with a second index oligonucleotide 428 using the bridge oligonucleotide 420 and the ligase enzyme 422. The second index oligonucleotide 428 includes, from 5' to 3', the common splint sequence 424, the second index sequence 340, and the transposon recognition sequence 426. As such, the second index oligonucleotide 428 varies from the first index oligonucleotide 418 based on the specific index sequence included. Annealing of the bridge oligonucleotide 420 to both the coupled oligo 316 and the second index oligonucleotide 428 allows the ligase enzyme 422 to create a covalent bond between the coupled oligo 316 and the second index oligonucleotide 428 on each 3'-end of the coupled oligo 316, resulting in the second transposon 412.

[0120] The third reaction 408 includes ligating the coupled oligo 316 with a third index oligonucleotide 430 using the bridge oligonucleotide 420 and the ligase enzyme 422. The third index oligonucleotide 430 includes, from 5' to 3', the common splint sequence 424, the third index sequence 416, which is different from both of the first index sequence 338 and the second index sequence 340, and the transposon recognition sequence 426. As such, the third index oligonucleotide 430 varies from the first index oligonucleotide 418 and the second index oligonucleotide 428 based on the specific index sequence included. Annealing of the bridge oligonucleotide 420 to both the coupled oligo 316 and the third index oligonucleotide 430 allows the ligase enzyme 422 to create a covalent bond between the coupled oligo 316 and the third index oligonucleotide 430 on each 3'-end of the coupled oligo 316, resulting in the third transposon 414.

[0121] The first transposon 410, the second transposon 412, and the third transposon 414, as well as other transposons included in the PIPE transposons 126, further include hybridizing oligos 432. The hybridizing oligos 432 are designed to hybridize with the transposon recognition sequence 426 and may help prime DNA synthesis, for instance, after transposition. It is to be appreciated that the bridge oligonucleotide 420 is not included in the PIPE transposons 126 but facilitates the ligation of the respective reactions by aligning the respective index oligonucleotide with the coupled oligo 316.

[0122] Further variations are also possible. For example, a plurality of different 5'-functionalized oligonucleotides may be commercially synthesized and ordered, where each of the plurality of different 5'-functionalized oligonucleotides includes the polymerase promoter 308, the amplification sequence 310, an index sequence (e.g., a unique index sequence with respect to the other 5'-functionalized oligonucleotides), and the transposon recognition sequence. These pre-made pieces may then undergo the 5'-to-5' linkage process to form the PIPE transposons 126.

[0123] FIG. 5 depicts an illustrative example process 500 for generating transposomes for paired index position-encoded sequencing. The process 500, for instance, highlights one implementation of generating the PIPE transposomes 206 introduced with respect to FIG. 2A. As such, where appropriate, reference will be made to components previously described with reference to FIGS. 1-4. It is to be appreciated that the process 500 is a simplified example, and the relative lengths of the various sequence portions are not to scale. Moreover, for illustrative clarity, particular sequence portions are not labeled in every part of the figure. Similar to FIGS. 3 and 4, sequence portions that are shaded the same are meant to denote the same sequence or a complement thereof.

[0124] The process 500 shows the PIPE transposons 126, e.g., as generated via the process 300 of FIG. 3 and / or the process 400 of FIG. 4. For illustrative clarity, the common splint sequence 424 is not included in the PIPE transposons 126 but may be included when the PIPE transposons 126 are generated via the process 400 of FIG. 4. The PIPE transposons 126 include the first transposon 410, the second transposon 412, and the third transposon 414. Ellipses indicate there are additional transposons included in the PIPE transposons 126. The first transposon 410, for instance, may be generated via the first reaction 322 and / or the first reaction 404, as indicated by the inclusion of the first index sequence 338 between the amplification sequence 310 and the transposon recognition sequence 426 on both sides of the covalent linkage 318. Similarly, the second transposon 412 may be generated via the second reaction 324 and / or the second reaction 406, as indicated by the inclusion of the second index sequence 340 between the amplification sequence 310 and the transposon recognition sequence 426. Although not shown in FIG. 3, it is to be appreciated that the third transposon 414 may also be synthesized via the process 300 of FIG. 3 and / or may be generated via the third reaction 408 of FIG. 4.

[0125] As may be appreciated via the illustration of FIG. 5, each of the PIPE transposons 126 includes two identical oligonucleotides that are coupled together at their 5'-ends. By way of example, the two identical oligonucleotides each comprise, from 5' to 3', the polymerase promoter 308, the amplification sequence 310, the specific index sequence (e.g., the first index sequence 338 for the first transposon 410, the second index sequence 340 for the second transposon 412, or the third index sequence 416 for the third transposon 414), and the transposon recognition sequence 426. In at least one variation, however, each of the PIPE transposons 126 includes non-identical oligonucleotides coupled together at their 5'-ends. By way of example, the non-identical oligonucleotides may include different index sequences with respect to each other and with respect to the other PIPE transposons 126. That is, a given PIPE transposon 126 may include two index sequences that are known to be linked to each other via the covalent linkage 318.
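The arm layout and the notion of a known index pair can be captured in a small data model. The following is a sketch under stated assumptions: the class `PipeTransposon`, the function `partner_table`, and all sequences are hypothetical, and the model covers both the identical-arm case (the two indexes are equal) and the non-identical variation (the two indexes differ but are recorded as linked).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PipeTransposon:
    promoter: str
    amplification: str
    index_a: str      # index on the first arm
    index_b: str      # index on the second arm (equals index_a if identical)
    recognition: str

    def arm(self, which: str) -> str:
        """Return one arm, 5' to 3': promoter, amplification, index, recognition."""
        if which not in ("a", "b"):
            raise ValueError("which must be 'a' or 'b'")
        index = self.index_a if which == "a" else self.index_b
        return self.promoter + self.amplification + index + self.recognition


def partner_table(transposons: List[PipeTransposon]) -> Dict[str, str]:
    """Map each index to the index it is covalently linked to."""
    table: Dict[str, str] = {}
    for t in transposons:
        for idx, partner in ((t.index_a, t.index_b), (t.index_b, t.index_a)):
            if table.get(idx, partner) != partner:
                raise ValueError(f"index {idx} is not unique to one transposon")
            table[idx] = partner
    return table


# Placeholder sequences for illustration only.
pool = [
    PipeTransposon("TTAACC", "GGCATCGA", "AACCGG", "AACCGG", "CTGTCA"),  # identical arms
    PipeTransposon("TTAACC", "GGCATCGA", "TTGGCC", "CAGTCA", "CTGTCA"),  # non-identical arms
]
table = partner_table(pool)
print(table["AACCGG"])  # AACCGG
print(table["TTGGCC"])  # CAGTCA
```

A lookup of this kind is what allows an index observed at the end of one read to be associated with the index expected at the end of the neighboring read.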

[0126] The process 500 includes transposome assembly 502, where the PIPE transposons 126 are assembled into the PIPE transposomes 206 via the addition of the transposase enzyme 208. By way of example, the PIPE transposons 126 are mixed with monomers of the transposase enzyme 208 (e.g., hyperactive Tn5 transposase protein monomers) to generate the PIPE transposomes 206, which comprise functional complexes for integrating the PIPE transposons 126 into a target DNA sequence. As shown in FIG. 5, the transposase enzyme 208 binds to the transposon recognition sequence 426 to form a stable complex with the PIPE transposons 126. The transposase enzyme 208 binds to both ends of the PIPE transposons 126 (e.g., to both transposon recognition sequences 426), and once bound, brings the ends together to form the functional unit capable of catalyzing the transposition process.

[0127] In the illustrated example, the first transposon 410 forms a first transposome complex 504, the second transposon 412 forms a second transposome complex 506, and the third transposon 414 forms a third transposome complex 508. It is to be appreciated that the PIPE transposomes 206 include additional transposome complexes, such as one for each of the different PIPE transposons 126.

[0128] In at least one implementation, the transposome assembly 502 includes pooling the PIPE transposons 126 prior to addition of the transposase enzyme 208. In at least one variation, however, the transposase enzyme 208 is added separately to the PIPE transposons 126, such as to each individual reaction of the PIPE transposon generation 320 of FIG. 3 or the PIPE transposon generation 402 of FIG. 4. As such, the PIPE transposons 126 may be pooled prior to the transposome assembly 502 or after the transposome assembly 502 for use in the transposon integration 130 described herein.

[0129] Having discussed example details of the techniques for paired index position-encoded sequencing, consider now an example to illustrate usage of the techniques.

Example Application

[0130] FIGS. 6A and 6B depict an example implementation 600 of the paired index position-encoded sequencing. As such, where appropriate, reference will be made to components previously described with reference to FIGS. 1-5. It is to be appreciated that the implementation 600 is a simplified example, and the relative lengths of the various sequence portions are not to scale. Moreover, for illustrative clarity, particular sequence portions are not labeled in every part of the figure.

[0131] Referring first to FIG. 6A, the implementation 600 schematically shows the transposon integration 130, where the first transposome complex 504, the second transposome complex 506, and the third transposome complex 508 (as well as other transposome complexes of the PIPE transposomes 206) are integrated into the gDNA 124. Ellipses denote that only a portion of the gDNA 124 is shown. By way of example, the first transposome complex 504 integrates into the gDNA 124 at a first cleavage site 602, the second transposome complex 506 integrates into the gDNA 124 at a second cleavage site 604, and the third transposome complex 508 integrates into the gDNA 124 at a third cleavage site 606. Although the first cleavage site 602, the second cleavage site 604, and the third cleavage site 606 are indicated at the same position in each strand of the gDNA 124, it is to be appreciated that the transposase enzyme 208 may make staggered cuts so that one strand of the gDNA 124 is cut at a slightly offset position from the other strand of the gDNA 124, resulting in two short single-stranded overhangs. The overhangs may help mediate joining of the transposon recognition sequence 426 with the gDNA 124, for instance.

[0132] The transposon integration 130 results in the transposon-integrated gDNA 132. Although not explicitly shown in FIG. 6A, it is to be appreciated that the flow sorting 214, the transposome protein removal 218, and the gap extension 222 may be performed as a part of generating the transposon-integrated gDNA 132, e.g., after the PIPE transposons 126 are integrated, such as described above with respect to FIG. 2A. In the illustrated example, the transposon-integrated gDNA 132 is shown as including the first transposon 410 between a first gDNA fragment 608 (e.g., “fragment 1”) and a second gDNA fragment 610 (e.g., “fragment 2”), the second transposon 412 between the second gDNA fragment 610 and a third gDNA fragment 612 (e.g., “fragment 3”), and the third transposon 414 between the third gDNA fragment 612 and a fourth gDNA fragment 614 (e.g., “fragment 4”). The first gDNA fragment 608, the second gDNA fragment 610, the third gDNA fragment 612, and the fourth gDNA fragment 614 form a contiguous sequence in the gDNA 124, for instance.

[0133] Although not explicitly shown in FIGS. 6A and 6B, the IVT reaction 136 may be performed on the transposon-integrated gDNA 132 to generate the PIPE RNA 134, followed by the reverse transcription 232 of the PIPE RNA 134 to generate the PIPE DNA 138, such as described above with respect to FIGS. 2A and 2B. Alternatively, amplification may be performed on the transposon-integrated gDNA 132 to generate the PIPE DNA 138. The PIPE DNA 138 is shown in FIG. 6B as including a first DNA fragment 616, a second DNA fragment 618, a third DNA fragment 620, and a fourth DNA fragment 622, which are each double-stranded fragments in the present example. In particular, the first DNA fragment 616 includes the first gDNA fragment 608 and the first index sequence 338; the second DNA fragment 618 includes the first index sequence 338, the second gDNA fragment 610, and the second index sequence 340; the third DNA fragment 620 includes the second index sequence 340, the third gDNA fragment 612, and the third index sequence 416; and the fourth DNA fragment 622 includes the third index sequence 416 and the fourth gDNA fragment 614. By way of example, the first index sequence 338 is positioned near a first end of the second DNA fragment 618 while the second index sequence 340 is positioned near a second end of the second DNA fragment 618, with the second gDNA fragment 610 positioned in between the first index sequence 338 and the second index sequence 340. It is to be appreciated that the first DNA fragment 616, the second DNA fragment 618, the third DNA fragment 620, and the fourth DNA fragment 622 also include other sequence portions of the PIPE transposons 126, e.g., the polymerase promoter 308, the amplification sequence 310, and the transposon recognition sequence 426. For instance, using the second DNA fragment 618 as an example, the second DNA fragment 618 may include, from 3' to 5' (or, alternatively, 5' to 3', depending on a direction of the strand), the polymerase promoter 308, the amplification sequence 310, the first index sequence 338, the transposon recognition sequence 426, the second gDNA fragment 610, the transposon recognition sequence 426, the second index sequence 340, the amplification sequence 310, and the polymerase promoter 308.

[0134] The PIPE DNA 138 is sequenced (e.g., via the nucleic acid sequencer 108) to generate the sequencing data 122. The index matching algorithm 148 (e.g., of the genome reconstruction module 146 operating on the sequencing data processor 110) receives the sequencing data 122 and outputs the sequence 144 based thereon. In the implementation 600 shown in FIG. 6B, the sequencing data 122 include a first read 624, a second read 626, a third read 628, and a fourth read 630. It is to be appreciated that the sequencing data 122 comprise a vast number of reads (e.g., the sequencing reads 142 introduced with respect to FIG. 1), including a plurality of reads for each fragment of the PIPE DNA 138. As such, although four reads are shown for illustrative purposes, the implementation 600 is not meant to limit the sequencing data 122 to four reads.

[0135] In the present example, the first read 624 comprises a first sequence including the second gDNA fragment 610 between the first index sequence 338 and the second index sequence 340. Ellipses denote that additional sequence portions (e.g., the polymerase promoter 308, the amplification sequence 310, and / or the transposon recognition sequence 426) may precede or follow the indicated portions. The second read 626 comprises a second sequence including the third index sequence 416 and the fourth gDNA fragment 614. The third read 628 comprises a third sequence including the first gDNA fragment 608 and the first index sequence 338. The fourth read 630 comprises a fourth sequence including the second index sequence 340, the third gDNA fragment 612, and the third index sequence 416. Based on the particular index sequences found in a given read and their order, the index matching algorithm 148 may match the paired indexes 128 to determine the relative position of the reads with respect to each other in the sequence 144. By way of example, because the first read 624 includes the first index sequence 338 at its first end and the third read 628 includes the first index sequence 338 at its second end, the index matching algorithm 148 may determine that the first gDNA fragment 608 (e.g., as sequenced in the third read 628) is natively adjacent to the second gDNA fragment 610 (e.g., as sequenced in the first read 624), with the first gDNA fragment 608 arranged before the second gDNA fragment 610. That is, the first index sequence 338 of the third read 628 and the first index sequence 338 of the first read 624 comprise one of the paired indexes 128. Similarly, because the first read 624 includes the second index sequence 340 at its second end and the fourth read 630 includes the second index sequence 340 at its first end, the index matching algorithm 148 may determine that the third gDNA fragment 612 (e.g., as sequenced in the fourth read 630) follows the second gDNA fragment 610. Because the fourth read 630 further includes the third index sequence 416 at its second end and the second read 626 includes the third index sequence 416 at its first end, the index matching algorithm 148 may further determine that the fourth gDNA fragment 614 (e.g., as sequenced in the second read 626) follows the third gDNA fragment 612.
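By way of a non-limiting illustration, the matching described for the four example reads of FIG. 6B may be sketched in Python as follows. This is a minimal sketch and not the index matching algorithm 148 itself: the `Read` type, the `order_reads` function, and the string labels are hypothetical names introduced here, and the sketch assumes one read per fragment, a shared read orientation, exact index matches, and an identical index sequence on both sides of each transposon.

```python
from typing import List, NamedTuple, Optional


class Read(NamedTuple):
    name: str
    first_end_index: Optional[str]   # index at the read's first end, if any
    fragment: str                    # label for the gDNA fragment in the read
    second_end_index: Optional[str]  # index at the read's second end, if any


def order_reads(reads: List[Read]) -> List[Read]:
    """Chain reads so that each second-end index matches the next first-end index."""
    by_first_end = {r.first_end_index: r for r in reads if r.first_end_index is not None}
    second_end_seen = {r.second_end_index for r in reads if r.second_end_index is not None}
    # The chain starts at the read whose first-end index is not any read's second-end index.
    starts = [r for r in reads if r.first_end_index not in second_end_seen]
    if len(starts) != 1:
        raise ValueError("sketch assumes exactly one chain start")
    ordered = [starts[0]]
    visited = {starts[0].name}
    while ordered[-1].second_end_index in by_first_end:
        nxt = by_first_end[ordered[-1].second_end_index]
        if nxt.name in visited:
            raise ValueError("cycle detected while chaining reads")
        ordered.append(nxt)
        visited.add(nxt.name)
    return ordered


# The four reads of the example, in the order they are introduced in the text.
reads = [
    Read("read 624", "index 338", "fragment 2", "index 340"),
    Read("read 626", "index 416", "fragment 4", None),
    Read("read 628", None, "fragment 1", "index 338"),
    Read("read 630", "index 340", "fragment 3", "index 416"),
]

ordered_fragments = [r.fragment for r in order_reads(reads)]
print(ordered_fragments)
```

With the four example reads, the chain starts at the third read 628 and proceeds through the first read 624, the fourth read 630, and the second read 626. Practical sequencing data would additionally call for handling multiple reads per fragment, sequencing errors within the index sequences, and reads in either orientation, none of which is addressed by this sketch.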

[0136] Accordingly, the sequence 144 shows a reconstructed genomic sequence comprising the first gDNA fragment 608 adjacent to and immediately followed by the second gDNA fragment 610, which is adjacent to and immediately followed by the third gDNA fragment 612, which is further adjacent to and immediately followed by the fourth gDNA fragment 614.

[0137] Because the paired indexes 128 enable the index matching algorithm 148 to determine natively adjacent gDNA fragments, the genome reconstruction module 146 is able to provide the sequence 144 with increased accuracy, particularly with respect to repetitive regions of the genome, and with greater computational efficiency compared to traditional linked-read or short-read overlap approaches. Moreover, the sequencing depth used to generate the sequencing data 122 may be decreased, enabling reduced sequencing resource consumption.

[0138] Having discussed example details of the techniques for paired index position-encoded sequencing, consider now an example procedure to illustrate additional aspects of the techniques.

Example Procedures

[0139] This section describes example procedures for paired index position-encoded sequencing in one or more implementations. Aspects of the procedures may be implemented in hardware, firmware, or software, or a combination thereof. The procedures are shown as a set of blocks that specify operations performed by one or more devices and are not necessarily limited to the orders shown for performing the operations by the respective blocks. In at least some implementations, at least a portion of the procedures are performed by a suitably configured device, such as the sequencing data processor 110 of FIG. 1, by executing instructions stored in a non-transitory computer-readable storage medium.

[0140] FIG. 7 depicts an example procedure 700 in which transposomes are generated for paired index position-encoded sequencing.

[0141] A 5'-to-5' coupled oligonucleotide is synthesized (block 702). By way of example, the 5'-to-5' coupled oligonucleotide (e.g., the coupled oligo 316 of FIG. 3) may be synthesized by coupling a first oligonucleotide having a first functional group attached to its 5'-end (e.g., the first functionalized oligo 304) with a second oligonucleotide having a second functional group attached to its 5'-end (e.g., the second functionalized oligo 306). In at least one implementation, the first oligonucleotide and the second oligonucleotide have a same sequence of nucleotides. By way of example, the first oligonucleotide and the second oligonucleotide may include a polymerase promoter site at the 5'-end (e.g., the polymerase promoter 308) and an amplification site at the 3'-end (e.g., the amplification sequence 310).

[0142] The first functional group and the second functional group are configured to react selectively with each other, e.g., in a click chemistry reaction. In at least one implementation, the first functional group is a strained alkyne (e.g., cyclooctyne, or a derivative thereof), and the second functional group is an azide. In such implementations, the first oligonucleotide and the second oligonucleotide may be coupled via a SPAAC reaction, and the resulting 5'-to-5' coupled oligonucleotide includes a covalent linkage (e.g., a triazole linkage) between the 5'-ends. By way of example, the covalent linkage (e.g., the covalent linkage 318) may couple the polymerase promoter sites of the respective oligonucleotides. The 5'-to-5' coupled oligonucleotide may be isolated using a suitable purification method, such as a gel purification-based technique.

[0143] A plurality of different transposons having uniquely paired index sequences are generated from the 5'-to-5' coupled oligonucleotide (block 704). By way of example, aliquots of the 5'-to-5' coupled oligonucleotide may be distributed to a plurality of individual wells of a multi-well plate (e.g., a microtiter plate), and a different primer molecule (e.g., when a primer-mediated extension, such as the process 300 of FIG. 3, is used) or an index oligonucleotide molecule (e.g., when a ligation-based approach, such as the process 400 of FIG. 4, is used) may be added to respective wells.

[0144] When the primer-mediated extension is used to generate the plurality of different transposons, the different primer molecules may provide a template for extending the 5'-to-5' coupled oligonucleotide to include a unique index sequence, with one unique index sequence per reaction (e.g., per well). In at least one implementation, the primers may share a common annealing sequence (e.g., the annealing sequence 330) at the 3'-end that is designed to anneal to the amplification sites of the 5'-to-5' coupled oligonucleotide and a common transposon recognition sequence template (e.g., the transposon recognition sequence template 332) at the 5'-end. A template for the unique index sequence may be positioned between the common annealing sequence and the common transposon recognition sequence template.

[0145] A DNA polymerase may be used to extend the 5'-to-5' coupled oligonucleotide molecules in the 3'-directions (e.g., away from the covalent linkage) using the corresponding primer molecule as a template. Accordingly, in at least one implementation, the plurality of different transposons (e.g., the different PIPE transposons 126) may include, from 5' to 3' on both sides of the covalent linkage, the polymerase promoter site, the amplification site, the unique index sequence, and the transposon recognition sequence (e.g., the transposon recognition sequence 426). As such, the plurality of different transposons may vary only in the particular sequence of the unique index sequence (or sequences) included between the amplification site and the transposon recognition sequence.
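By way of a non-limiting illustration, the relationship between a primer molecule and the extended arm of the 5'-to-5' coupled oligonucleotide may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names `design_primer` and `extend_arm` are hypothetical, all sequences are short placeholders rather than sequences from this disclosure, the annealing sequence is assumed to be the full reverse complement of the amplification site, and only one arm of the coupled oligonucleotide is modeled (the same extension is assumed to occur on the other arm).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primer(amplification: str, index: str, recognition: str) -> str:
    """Primer 5' to 3': recognition template, index template, annealing sequence."""
    return revcomp(recognition) + revcomp(index) + revcomp(amplification)


def extend_arm(arm: str, primer: str, anneal_len: int) -> str:
    """Extend the arm's 3'-end by copying the non-annealed part of the primer."""
    split = len(primer) - anneal_len
    annealing = primer[split:]
    if not arm.endswith(revcomp(annealing)):
        raise ValueError("primer does not anneal to the 3'-end of the arm")
    # The polymerase synthesizes the complement of the primer's 5' portion.
    return arm + revcomp(primer[:split])


# Placeholder sequences (hypothetical, for illustration only).
PROMOTER = "TTAATT"
AMPLIFICATION = "GGTCTCAA"
RECOGNITION = "CTGTCTCT"
index = "AACCGG"  # one unique index per reaction (e.g., per well)

arm = PROMOTER + AMPLIFICATION
primer = design_primer(AMPLIFICATION, index, RECOGNITION)
extended = extend_arm(arm, primer, len(AMPLIFICATION))
print(extended)
```

In this sketch, the extended arm reads, from 5' to 3', promoter, amplification site, index, and recognition sequence, which mirrors the layout described above. Changing only `index` between reactions yields arms that differ only in the index portion.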

[0146] When the ligation-based approach is used to generate the plurality of different transposons, the different index oligonucleotide molecules may include a unique index sequence near the 5'-end, with one unique index sequence per reaction (e.g., per well). The different index oligonucleotide molecules may further include the same transposon recognition sequence (e.g., the transposon recognition sequence 426) at the 3'-end and a common splint sequence (e.g., the common splint sequence 424) at the 5'-end. A bridge oligonucleotide (e.g., the bridge oligonucleotide 420) may be used to align a given index oligonucleotide molecule with the 5'-to-5' coupled oligonucleotide and enable ligation by a ligase enzyme (e.g., the ligase enzyme 422). The ligase enzyme may ligate the 5'-end of the index oligonucleotide molecule with the 3'-end of the 5'-to-5' coupled oligonucleotide (e.g., one index oligonucleotide on each 3'-end of the 5'-to-5' coupled oligonucleotide).

[0147] In each example, in at least one implementation, the unique index sequence is the same on both sides of the covalent linkage for a given transposon, such as illustrated with respect to FIGS. 3 and 4. In at least one variation, the unique index sequence is different on each side of the covalent linkage for a given transposon. Furthermore, in at least one implementation, hybridizing oligonucleotides (e.g., the hybridizing oligos 432) are added. The hybridizing oligonucleotides may anneal to the transposon recognition sequence, for instance.
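By way of a non-limiting illustration, the pairing of index sequences across the covalent linkage may be represented as a lookup table that covers both the identical-index implementation and the different-index variation. The following Python sketch is offered under stated assumptions: the function name `build_partner_table` and the short index strings are hypothetical, and the sketch assumes that each index sequence is used in only one transposon.

```python
from typing import Dict, Iterable, Tuple


def build_partner_table(transposons: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each index sequence to the index on the other side of the linkage."""
    table: Dict[str, str] = {}
    for side_a, side_b in transposons:
        for index, partner in ((side_a, side_b), (side_b, side_a)):
            if index in table and table[index] != partner:
                raise ValueError(f"index {index!r} appears in more than one transposon")
            table[index] = partner
    return table


# One transposon with the same index on both sides, one with different indexes.
partners = build_partner_table([("AACCGG", "AACCGG"), ("GGTTAC", "TTCAGA")])
print(partners["GGTTAC"])
```

With such a table, an index found at the end of one read may be translated to its partner before searching for the read that carries the natively adjacent fragment. When the index is the same on both sides, the partner of an index is the index itself.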

[0148] The plurality of different transposons are mixed with a transposase enzyme to form transposomes (block 706). By way of example, the transposase enzyme (e.g., the transposase enzyme 208) may be transposase protein monomers (e.g., hyperactive Tn5 transposase protein monomers) that are configured to bind to the transposon recognition sequence. For instance, two transposase protein monomers may bind to each transposon molecule to form a functional complex, e.g., the transposomes.

[0149] In this way, the procedure 700 may enable the plurality of different transposons and the subsequent transposomes to be made at-scale in a cost-effective and flexible manner.

[0150] FIG. 8 depicts an example procedure 800 in which paired index position-encoded sequencing is performed.

[0151] Nuclei are extracted from a biological sample and permeabilized (block 802). By way of example, the nuclei (e.g., the nuclei 202) may be prepared from homogenized tissue or another type of cell suspension (e.g., from bodily fluids or cultured cells) under conditions that preserve and permeabilize the nuclear membrane. In at least one implementation, the nuclei are isolated from cells in a manner that does not retain mitochondrial DNA. In contrast to the mitochondrial DNA, the retained DNA may be genomic DNA (e.g., the gDNA 124), e.g., as retained within the individual nuclei. Isolating the nuclei in this manner may enable single-cell whole genome sequencing with reduced or eliminated contamination from mitochondrial DNA. As non-limiting examples, the nuclei may be isolated and permeabilized using hypotonic lysis, detergents, digitonin, and / or lithium diiodosalicylate for nucleosome disruption.

[0152] Paired index position-encoding transposomes are integrated into genomic DNA within the permeabilized nuclei (block 804). By way of example, transposase enzymes of the paired index position-encoding transposomes bind to the genomic DNA at specific target sequences. In a process referred to herein as transposon integration (e.g., the transposon integration 130), the transposase enzymes introduce cuts in the genomic DNA as well as at or near transposon recognition sequences at the 3'-ends of paired index position-encoding transposons (e.g., the PIPE transposons 126) of the transposomes. As described herein, the paired index position-encoding transposons comprise 5'-end to 5'-end linked oligonucleotides, where the linked oligonucleotides include, from 5' to 3', a promoter sequence, an amplification sequence, an index sequence, and a transposon recognition sequence. The promoter sequence, the amplification sequence, and the transposon recognition sequence may be the same for each paired index position-encoding transposon molecule, while the index sequence varies between molecules.

[0153] The transposase enzyme of a given transposome may catalyze the integration of the bound paired index position-encoding transposon into the genomic DNA at the cut sites in an aqueous buffer that includes magnesium ions (e.g., Mg2+), which serve as a cofactor for the catalytic function of the transposase enzymes. This results in PIPE transposon-integrated nuclei (e.g., the labeled nuclei 212) having the paired index position-encoding transposons integrated into the genomic DNA, e.g., also referred to as transposon-integrated genomic DNA (e.g., the transposon-integrated gDNA 132) or paired indexed genomic DNA. The transposon integration may be terminated by adding a chelating agent (e.g., EDTA), which may bind the magnesium ions and thus stop the catalytic activity of the transposase enzyme.

[0154] The nuclei are individually sorted (block 806). By way of example, following the transposon integration, the nuclei may be flow-sorted in order to isolate individual nuclei for single-cell sequencing. In at least one implementation, the flow-sorting includes staining the transposon-integrated nuclei for flow-activated cell sorting, which may place a single nucleus into a given well for further processing. As such, it is to be appreciated that in at least one implementation, after the nuclei are sorted, at least a portion of the procedure 800 may be performed with the nuclei isolated from each other in order to enable single-cell sequencing.

[0155] Bound transposase enzyme is removed from the integration sites in the nuclei (block 808). By way of example, the bound transposase enzyme may be removed by treating the nuclei with a proteinase (e.g., the proteinase 220). In one or more implementations, the proteinase is thermolabile proteinase K. The proteinase may digest the transposase enzyme, for example. When thermolabile, the proteinase may be heat-inactivated after the bound transposase enzyme is removed from the transposon integration sites in the nuclei. As mentioned above, the proteinase may be added to individual nuclei. By way of example, the proteinase may be added to one or more or each of the flow-sorted nuclei.

[0156] Double-stranded transposon-integrated genomic DNA is generated via gap extension (block 810). By way of example, gap extension (e.g., the gap extension 222) may be performed using a DNA polymerase (e.g., the DNA polymerase 224) in order to extend single-stranded portions where the paired index position-encoding transposons have been integrated into the genomic DNA. For instance, hybridizing oligonucleotides (e.g., the hybridizing oligos 432) included in the transposomes may serve as primers for extending the paired index position-encoding transposon sequence, such as from the transposon recognition sequences (e.g., the mosaic ends) to the promoter sequence, while overhangs of the genomic DNA may provide priming sites for filling gaps in single-stranded portions of the genomic DNA that may be introduced through the transposon insertion process.

[0157] Paired index position-encoded DNA is generated directly or indirectly from the double-stranded transposon-integrated genomic DNA (block 812). As a first example, the paired index position-encoded DNA is generated via in vitro transcription and subsequent reverse transcription (block 814). By way of example, RNA polymerase (e.g., the RNA polymerase 226) may be added to the individual nuclei in order to generate paired index position-encoded RNA (e.g., the PIPE RNA 134). The RNA polymerase may bind the promoter sequences of the paired index position-encoding transposons and generate the paired index position-encoded RNA, which may comprise nucleic acid fragments having a first index sequence (e.g., from a first paired index position-encoding transposon molecule) near a first terminus and a second index sequence (e.g., from a second paired index position-encoding transposon molecule) near a second terminus, with an RNA transcript of the intervening genomic DNA positioned between the first index sequence and the second index sequence. By way of example, transcription may terminate when the RNA polymerase reaches the covalent linkage of an integrated paired index position-encoding transposon.

[0158] As such, a given RNA transcript may include dual index labels based on the particular paired index position-encoding transposon molecules that are integrated on either side of the corresponding genomic DNA. Because a given paired index position-encoding transposon molecule includes the paired indexes, paired index position-encoded RNA molecules corresponding to natively adjacent genomic DNA fragments are labeled with one of the index sequences of the paired indexes.

[0159] The paired index position-encoded DNA is then generated from the paired index position-encoded RNA via reverse transcription. By way of example, the reverse transcription (e.g., the reverse transcription 232) includes combining the paired index position-encoded RNA with a reverse transcriptase (e.g., the reverse transcriptase 234) and reverse transcription primers targeting the promoter sequence at the 3'-end of the paired index position-encoded RNA. The reverse transcription primers may serve as a starting point for the reverse transcriptase catalyzing the synthesis of the paired index position-encoded DNA (e.g., the PIPE DNA 138) using a bound paired index position-encoded RNA molecule as the template, such as further described with respect to FIG. 2A. The paired index position-encoded DNA may thus comprise linear fragments of cDNA having dual index labels that encode positional information regarding the native arrangement of the DNA fragment encoded therein. Generating the paired index position-encoded DNA via the in vitro transcription and the subsequent reverse transcription is an example of indirectly generating the paired index position-encoded DNA from the double-stranded transposon-integrated genomic DNA.

[0160] Optionally, the paired index position-encoded DNA is amplified. By way of example, the paired index position-encoded DNA may be amplified in an amplification reaction in order to increase an amount of the paired index position-encoded DNA for sequencing. However, in at least one variation, the paired index position-encoded DNA generated from the reverse transcription is directly sequenced without further amplification.

[0161] In at least one variation, the paired index position-encoded DNA is generated via an amplification reaction targeting the integrated transposons (block 816). By way of example, amplification primers (e.g., the amplification primers 242) may be used that target the amplification sequence or promoter sequence at either end of the integrated transposons in order to amplify the paired index position-encoded DNA directly from the double-stranded transposon-integrated genomic DNA. The amplification primers may serve as a starting point for the DNA polymerase to replicate the paired index position-encoded DNA, such as further described with respect to FIG. 2B.

[0162] The paired index position-encoded DNA is sequenced to generate sequencing data (block 818). By way of example, the sequencing data (e.g., the sequencing data 122) for one transposon-integrated nucleus may be distinguished from the sequencing data for other transposon-integrated nuclei, such as by labeling the corresponding paired index position-encoded DNA with sequencing indexes (e.g., when multiplexed sequencing is performed) or by sequencing the paired index position-encoded DNA from a single PIPE transposon-labeled nucleus separately. A “sequencing index,” for instance, may refer to an adapter-associated barcode (e.g., a sample barcode, a cell barcode, or a UMI) used for multiplexing or molecule counting, whereas “an index sequence” with respect to the PIPE transposons 126 may refer to a label used to encode positional adjacency of genomic fragments. In at least one implementation, long read sequencing is used in order to generate reads spanning an entire length of a given paired index position-encoded DNA molecule. Accordingly, the sequencing data may have positional information encoded therein due to the dual labeling of the paired index position-encoded DNA.

[0163] Reads of the sequencing data are arranged relative to each other based on matching paired indexes (block 820). By way of example, a sequencing data processor (e.g., the sequencing data processor 110) may receive the sequencing data 122 and use an index matching algorithm (e.g., the index matching algorithm 148) to computationally determine which sequencing reads correspond to natively adjacent genomic DNA fragments, e.g., based on upstream or downstream positions of a particular index sequence of the paired index position-encoding transposon. For instance, the index sequences of the paired index position-encoding transposons are sequenced along with the DNA sequences of the genomic DNA, and the index matching algorithm may be programmed to identify the index sequences. By way of example, the index sequences may be known sequences and / or may be identified based on their arrangement with respect to the common sequence portions of the paired index position-encoding transposons. The index sequences informatically link the natively adjacent genomic DNA fragments. As an illustrative example, the index matching algorithm may match a first read including a first index sequence at an upstream position with a second read including the first index sequence at a downstream position to identify reads containing contiguous fragments of the genomic DNA. This may result in a relative arrangement of the reads with respect to locations of the genome.
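By way of a non-limiting illustration, identifying index sequences based on their arrangement with respect to the common sequence portions may be sketched in Python as follows. This is a minimal sketch and not the index matching algorithm itself: the function name `extract_indexes`, the constant names, the fixed index length, and all sequences are hypothetical placeholders. The sketch further assumes exact matches to the common sequence portions and assumes that, on a single read strand, the downstream transposon portions appear as reverse complements of the upstream ones, which follows from the 5'-to-5' geometry but is an assumption of this sketch.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Placeholder common sequence portions (hypothetical, for illustration only).
AMPLIFICATION = "GGTCTCAA"
RECOGNITION = "CTGTCTCT"
INDEX_LEN = 6


def extract_indexes(read: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Return (upstream index, genomic portion, downstream index) for one read."""
    upstream = downstream = None
    start, end = 0, len(read)

    # Upstream layout: ... amplification | index | recognition | genomic ...
    amp_pos = read.find(AMPLIFICATION)
    if amp_pos != -1:
        idx_start = amp_pos + len(AMPLIFICATION)
        idx_end = idx_start + INDEX_LEN
        if read.startswith(RECOGNITION, idx_end):
            upstream = read[idx_start:idx_end]
            start = idx_end + len(RECOGNITION)

    # Downstream layout: ... genomic | rc(recognition) | rc(index) | rc(amplification) ...
    rc_pos = read.rfind(revcomp(RECOGNITION), start)
    if rc_pos != -1:
        idx_start = rc_pos + len(RECOGNITION)
        idx_end = idx_start + INDEX_LEN
        if read.startswith(revcomp(AMPLIFICATION), idx_end):
            # Report the index in the same orientation as the upstream index.
            downstream = revcomp(read[idx_start:idx_end])
            end = rc_pos

    return upstream, read[start:end], downstream


example_read = (
    "TATA" + AMPLIFICATION + "AACCGG" + RECOGNITION
    + "GATTACAGATTACA"
    + revcomp(RECOGNITION) + revcomp("TTGGCA") + revcomp(AMPLIFICATION) + "CC"
)
result = extract_indexes(example_read)
print(result)
```

The returned genomic portion excludes the transposon-derived portions of the read, and the two returned index sequences may then be matched against the index sequences of other reads. Tolerance for sequencing errors (e.g., approximate matching of the common sequence portions) is not addressed by this sketch.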

[0164] The sequence of the biological sample is reconstructed based on the relative arrangement of the reads (block 822). By way of example, the sequence of the biological sample (e.g., a single cell of the biological sample) may be reconstructed by removing the portions of the reads that correspond to the paired index position-encoding transposons, resulting in sequences of natively adjacent DNA fragments. As a result, an accuracy of the sequencing data and its downstream analyses may be increased.

[0165] Although the procedure 800 is described with respect to performing the transposon integration in intact permeabilized nuclei, it is to be appreciated that the procedure 800 may be adapted to permeabilized cells, permeabilized tissues, bacteria, cell lysate, and the like.

[0166] Having described example procedures in accordance with one or more implementations, consider now an example system and device that can be utilized to implement the various techniques described herein.

Example System and Device

[0167] FIG. 9 illustrates an example system generally at 900 that includes an example computing device 902 that is representative of one or more computing systems and / or devices that may implement the various techniques described herein. This is illustrated through inclusion of the sequencing data processor 110. The computing device 902 may be, for example, a server of a service provider, a device associated with a client (e.g., a client device), an on-chip system, and / or any other suitable computing device or computing system.

[0168] The example computing device 902 as illustrated includes a processing system 904, one or more computer-readable media 906, and one or more I / O interfaces 908 that are communicatively coupled, one to another. Although not shown, the computing device 902 may further include a system bus or other data and command transfer system that couples the various components, one to another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and / or a processor or local bus that utilizes any of a variety of bus architectures. A variety of other examples are also contemplated, such as control and data lines.

[0169] The processing system 904 is representative of functionality to perform one or more operations using hardware. Accordingly, the processing system 904 is illustrated as including hardware elements 910 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. The hardware elements 910 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, processors may be comprised of semiconductor(s) and / or transistors (e.g., electronic integrated circuits (ICs)). In such a context, processor-executable instructions may be electronically executable instructions.

[0170] The computer-readable media 906 is illustrated as including memory / storage 912. The memory / storage 912 represents memory / storage capacity associated with one or more computer-readable media. The memory / storage 912 may include volatile media (such as random-access memory (RAM)) and / or nonvolatile media (such as read only memory (ROM), Flash memory, optical disks, magnetic disks, and so forth). The memory / storage 912 may include fixed media (e.g., RAM, ROM, a fixed hard drive, and so on) as well as removable media (e.g., flash memory, a removable hard drive, an optical disc, and so forth). The computer-readable media 906 may be configured in a variety of other ways as further described below.

[0171] Input / output interface(s) 908 are representative of functionality to allow a user to enter commands and information to computing device 902 and also allow information to be presented to the user and / or other components or devices using various input / output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, touch functionality (e.g., capacitive or other sensors that are configured to detect physical touch), a camera (e.g., which may employ visible or non-visible wavelengths such as infrared frequencies to recognize movement as gestures that do not involve touch), and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, tactile-response device, and so forth. Thus, the computing device 902 may be configured in a variety of ways as further described below to support user interaction.

[0172] Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, such modules include routines, programs, objects, elements, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The terms “module,” “functionality,” and “component” as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of commercial computing platforms having a variety of processors.

[0173] For instance, the terms “module,” “functionality,” and “component” may include a hardware and / or software system that operates to perform one or more functions. For example, a module, functionality, or component may include a computer processor, a controller, or another logic-based device that performs operations based on instructions stored on a tangible and non-transitory computer-readable storage medium, such as a computer memory. Alternatively, a module, functionality, or component may include a hard-wired device that performs operations based on hard-wired logic of the device. Various modules, systems, and components shown in the attached figures may represent the hardware that operates based on software or hardwired instructions, the software that directs hardware to perform the operations, or a combination thereof.

[0174] An implementation of the described modules and techniques may be stored on or transmitted across some form of computer-readable media. The computer-readable media may include a variety of media that may be accessed by the computing device 902. By way of example, and not limitation, computer-readable media may include “computer-readable storage media” and “computer-readable signal media.”

[0175] “Computer-readable storage media” may refer to media and / or devices that enable persistent and / or non-transitory storage of information in contrast to mere signal transmission, carrier waves, or signals per se. Thus, computer-readable storage media refers to non-signal bearing media. The computer-readable storage media includes hardware such as volatile and non-volatile, removable and non-removable media, and / or storage devices implemented in a method or technology suitable for storage of information such as computer readable instructions, data structures, program modules, logic elements / circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage device, tangible media, or article of manufacture suitable to store the desired information and which may be accessed by a computer.

[0176] “Computer-readable signal media” may refer to a signal-bearing medium that is configured to transmit instructions to the hardware of the computing device 902, such as via a network. Signal media typically may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as carrier waves, data signals, or other transport mechanism. Signal media also include any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.

[0177] As previously described, hardware elements 910 and computer-readable media 906 are representative of modules, programmable device logic and / or fixed device logic implemented in a hardware form that may be employed in some examples to implement at least some aspects of the techniques described herein, such as to perform one or more instructions. Hardware may include components of an integrated circuit or on-chip system, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), and other implementations in silicon or other hardware. In this context, hardware may operate as a processing device that performs program tasks defined by instructions and / or logic embodied by the hardware as well as a hardware utilized to store instructions for execution, e.g., the computer-readable storage media described previously.

[0178] Combinations of the foregoing may also be employed to implement various techniques described herein. Accordingly, software, hardware, or executable modules may be implemented as one or more instructions and / or logic embodied on some form of computer-readable storage media and / or by one or more hardware elements 910. The computing device 902 may be configured to implement particular instructions and / or functions corresponding to the software and / or hardware modules. Accordingly, implementation of a module that is executable by the computing device 902 as software may be achieved at least partially in hardware, e.g., through use of computer-readable storage media and / or hardware elements 910 of the processing system 904. The instructions and / or functions may be executable / operable by one or more articles of manufacture (for example, one or more computing devices 902 and / or processing systems 904) to implement techniques, modules, and examples described herein.

[0179] The techniques described herein may be supported by various configurations of the computing device 902 and are not limited to the specific examples of the techniques described herein. This functionality may also be implemented all or in part through use of a distributed system, such as over a “cloud” 914 via a platform 916 as described below.

[0180] The cloud 914 includes and / or is representative of a platform 916 for resources 918, which are depicted including the sequencing data processor 110. The platform 916 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 914. The resources 918 may include applications and / or data that can be utilized while computer processing is executed on servers that are remote from the computing device 902. Resources 918 can also include services provided over the Internet and / or through a subscriber network, such as a cellular or Wi-Fi network.

[0181] The platform 916 may abstract resources and functions to connect the computing device 902 with other computing devices. The platform 916 may also serve to abstract scaling of resources to provide a corresponding level of scale to encountered demand for the resources 918 that are implemented via the platform 916. Accordingly, in an interconnected device example, implementation of functionality described herein may be distributed throughout the system 900. For example, the functionality may be implemented in part on the computing device 902 as well as via the platform 916 that abstracts the functionality of the cloud 914.

Example Application

[0182] FIG. 10 depicts an example 1000 illustrating paired index position-encoded sequencing results. In the example 1000, sequencing data were obtained from transposon-integrated gDNA from a single nucleus (HG002 cell line) using long-read sequencing with a relatively low sequencing depth (e.g., 4 million reads). The example 1000 shows a DNA sequence 1002 spanning genomic coordinates from approximately 103,560 kb to 103,590 kb and includes a plurality of reads each representing a sequenced fragment from the paired index position-encoded DNA. Although the plurality of reads is shown aligned to the DNA sequence 1002, it is to be appreciated that the techniques described herein may be used for genome reconstruction using de novo assembly (e.g., without a reference sequence). In the DNA sequence 1002 and the plurality of reads, areas of variation are indicated by shaded markers, with different shaded markers corresponding to different types of variations, for example. The plurality of reads includes a first read 1004, a second read 1006, a third read 1008, a fourth read 1010, a fifth read 1012, and a sixth read 1014 that are chained together, one to another.

[0183] “Linked reads,” also referred to as “chained reads,” are sequencing reads that have been computationally linked together based on matching paired index sequences to form longer contiguous genomic sequences. As described herein, linked reads may be identified by the index matching algorithm 148 when sequencing reads share common index sequences from the paired indexes 128, indicating that the sequencing reads correspond to natively adjacent fragments in the original genomic DNA. For example, a first sequencing read having a first index sequence at a downstream position may be chained to a second sequencing read having the same first index sequence at an upstream position, thereby indicating that the genomic sequences of the two reads are immediately adjacent in the genome. These two reads are therefore chained (or linked) together, and this two-read chain can be further linked to one or more additional reads. The number of chained reads refers to how many individual sequencing reads have been linked together through this index matching process to reconstruct longer portions of the original genomic sequence. Six linked reads, for instance, refers to six reads that have chained together, one to another, to form a longer contiguous sequence via the paired indexes 128 and the index matching algorithm 148.
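By way of example, and not limitation, the chaining rule described above may be sketched in Python as follows. The `Read` record, its field names, and the `chain_reads` function are hypothetical and are not part of the index matching algorithm 148 as disclosed. The sketch assumes that the index sequences have already been extracted from each read, oriented to a common strand, and that each index sequence occurs at the upstream end of at most one read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Read:
    """Hypothetical, pre-parsed view of one sequencing read."""
    name: str
    upstream_index: Optional[str]    # index found at the upstream end, if any
    downstream_index: Optional[str]  # index found at the downstream end, if any


def chain_reads(reads: List[Read]) -> List[List[str]]:
    """Link reads whose downstream index equals another read's upstream index."""
    # Look up the read that begins with a given index (assumed unique).
    by_upstream: Dict[str, Read] = {
        r.upstream_index: r for r in reads if r.upstream_index is not None
    }
    downstream_indexes = {
        r.downstream_index for r in reads if r.downstream_index is not None
    }
    chains: List[List[str]] = []
    for read in reads:
        # Start only at reads that no other read links into.
        if read.upstream_index in downstream_indexes:
            continue
        chain = [read.name]
        seen = {read.name}
        current = read
        while current.downstream_index in by_upstream:
            nxt = by_upstream[current.downstream_index]
            if nxt.name in seen:  # guard against accidental cycles
                break
            chain.append(nxt.name)
            seen.add(nxt.name)
            current = nxt
        if len(chain) > 1:  # report only reads that were actually linked
            chains.append(chain)
    return chains


example_reads = [
    Read("r1", None, "AAA"),
    Read("r2", "AAA", "CCC"),
    Read("r3", "CCC", None),
    Read("r4", "GGG", "TTT"),  # shares no index with any other read
]
example_chains = chain_reads(example_reads)
```

In this toy input, reads r1, r2, and r3 form one three-read chain, and r4 remains unlinked. Index sequences in real data may contain sequencing errors, so an implementation may need tolerant matching, which this sketch omits.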

[0184] In the example 1000, the plurality of reads is chained together using a combination of end overlap and matching of the paired indexes 128 of adjacent reads. By way of example, the downstream end (e.g., having higher genomic coordinates) of the first read 1004 (SEQ ID NO:1) overlaps with the upstream end (e.g., having lower genomic coordinates) of the second read 1006 (SEQ ID NO:2), as shown in a first inset 1016. Similarly, the downstream end of the second read 1006 (SEQ ID NO:3) overlaps with the upstream end of the third read 1008 (SEQ ID NO:4), as shown in a second inset 1018. The downstream end of the third read 1008 (SEQ ID NO:5) overlaps with the upstream end of the fourth read 1010 (SEQ ID NO:6), as shown in a third inset 1020. The downstream end of the fourth read 1010 (SEQ ID NO:7) overlaps with the upstream end of the fifth read 1012 (SEQ ID NO:8), as shown in a fourth inset 1022. The downstream end of the fifth read 1012 (SEQ ID NO:9) overlaps with the upstream end of the sixth read 1014 (SEQ ID NO:10), as shown in a fifth inset 1024.

[0185] Each inset includes a coverage graph 1026 indicating sequencing depth across the depicted genomic region. The coverage graph 1026 demonstrates increased sequencing depth at a region of overlap (e.g., where the downstream end of one read overlaps with the upstream end of another read). The region of overlap corresponds to a transposon integration region 1028, representing a duplicated sequence (e.g., an approximately nine-base pair sequence) that results from transposon integration. As described with respect to FIG. 6A, for example, during the transposon integration 130, the transposase enzyme 208 makes staggered cuts in the gDNA 124 so that one strand of the gDNA 124 is cut at a slightly offset position from the other strand of the gDNA 124, resulting in two short single-stranded overhangs that help mediate joining of the transposon recognition sequence 426 with the gDNA 124. The overhangs provide a template that is filled via the gap extension 222, thus producing duplicated double stranded DNA at the overhang region. The transposon integration region 1028 thus provides molecular evidence of successful transposon integration using the techniques described herein.
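As a non-limiting illustration, the duplicated sequence at a junction between two chained reads may be located by comparing the downstream end of one read with the upstream end of the next. The Python sketch below uses a hypothetical function `end_overlap` and made-up toy sequences. It assumes that transposon-derived sequence has already been trimmed from both reads, that both reads are oriented to the same strand, and that the reads are free of sequencing errors.

```python
def end_overlap(upstream_read: str, downstream_read: str, max_len: int = 20) -> str:
    """Return the longest suffix of upstream_read that is also a prefix of downstream_read.

    Only overlaps up to max_len bases are considered, since the duplication
    left by transposon integration is short (approximately nine base pairs).
    """
    limit = min(max_len, len(upstream_read), len(downstream_read))
    for k in range(limit, 0, -1):  # try the longest candidate first
        if upstream_read[-k:] == downstream_read[:k]:
            return upstream_read[-k:]
    return ""


# Toy sequences (not from the SEQ ID listing) sharing a nine-base duplication.
duplication = "GATTACAGC"
read_a = "TTCCGGAA" + duplication
read_b = duplication + "CCATGGTT"
shared = end_overlap(read_a, read_b)
```

Here `shared` is the nine-base duplicated sequence. Because a short overlap can also occur by chance, such an overlap may be treated as supporting evidence alongside the matching paired indexes rather than as a standalone criterion.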

[0186] A reconstructed sequence 1030 is shown for a portion of the assembled contiguous genomic sequence derived from the six linked sequencing reads. The higher coverage indicated by the coverage graph 1026 shows the transposon integration region 1028 between separate reads used to generate the reconstructed sequence 1030. The reconstructed sequence 1030 includes a top strand (SEQ ID NO:11) and a bottom strand (SEQ ID NO:12). The top strand includes a first genomic sequence 1032 on a first side of the transposon integration region 1028 (e.g., in the direction of lower genomic coordinates) and a PIPE transposon sequence region 1034 on a second side of the transposon integration region 1028 (e.g., in the direction of increasing genomic coordinates). The PIPE transposon sequence region 1034 includes, among other sequence portions described herein with respect to the PIPE transposons 126, an index sequence 1036. The bottom strand includes a reverse complement PIPE transposon sequence region 1038 on the first side of the transposon integration region 1028, which includes a reverse complement index sequence 1040, and a second genomic sequence 1042 on the second side of the transposon integration region 1028. The reverse complement PIPE transposon sequence region 1038 is the reverse complement of the PIPE transposon sequence region 1034, and the reverse complement index sequence 1040 is the reverse complement of the index sequence 1036. Accordingly, the index sequence 1036 and the reverse complement index sequence 1040 form a pair of matching index sequences that informationally link the first genomic sequence 1032 and the second genomic sequence 1042. The first genomic sequence 1032 and the second genomic sequence 1042, for instance, correspond to natively adjacent DNA fragments represented in separate reads. Accordingly, the example 1000 demonstrates successful transposon integration and sequence reconstruction using the paired index position-encoded sequencing techniques described herein.
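By way of illustration only, the relationship between the index sequence 1036 and the reverse complement index sequence 1040 may be checked computationally as sketched below in Python. The function names `reverse_complement` and `indexes_match` and the toy index sequence are hypothetical. The sketch assumes error-free index sequences composed only of A, C, G, and T.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def indexes_match(top_index: str, bottom_index: str) -> bool:
    """True if bottom_index is the reverse complement of top_index."""
    return top_index.upper() == reverse_complement(bottom_index).upper()


# Toy index sequence; real index sequences are not reproduced here.
top = "AACGT"
bottom = reverse_complement(top)
is_pair = indexes_match(top, bottom)
```

A matching pair found in two separate reads indicates, under these assumptions, that the genomic sequences flanking the indexes are natively adjacent.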

[0187] FIG. 11 depicts an example 1100 illustrating read distributions obtained using paired index position-encoded sequencing as described herein. The example 1100 includes a bar chart 1102 showing the number of chained reads (horizontal axis) plotted against count on a logarithmic scale (vertical axis) for sequencing data obtained from transposon-integrated gDNA from a single nucleus (HG002 cell line). The count value is given above each bar. The bar chart 1102 indicates that the paired index position-encoded sequencing techniques described herein enabled thousands of sequencing reads to be linked together, reaching up to six linked reads in this non-limiting example.

[0188] In the example 1100, the bar chart 1102 shows the highest count at two links (e.g., 3559 two-read chains were formed) and progressively lower counts as the number of reads increases to six (e.g., 431 three-read chains, 61 four-read chains, six five-read chains, and one six-read chain). The bar chart 1102 thus indicates that in the present example, it is more common to link two reads together than to link more than two reads together. However, these sequencing data were obtained with limited read depth, and increasing the read depth may also increase the number of reads chained.
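For illustration, the chain counts given above may be summarized with simple arithmetic, sketched here in Python. The variable names are arbitrary, and the only inputs are the counts stated for the bar chart 1102.

```python
# Number of chains observed for each chain length (reads per chain),
# as stated for the bar chart 1102.
chain_counts = {2: 3559, 3: 431, 4: 61, 5: 6, 6: 1}

# Total number of chains of any length.
total_chains = sum(chain_counts.values())

# Total number of individual reads that participate in a chain.
reads_in_chains = sum(length * count for length, count in chain_counts.items())

# Share of all chains that consist of exactly two reads.
two_read_fraction = chain_counts[2] / total_chains
```

With these counts, 4,058 chains were formed in total, incorporating 8,691 reads, and roughly 88% of the chains consist of two reads.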

[0189] A distribution chart 1104 displays count (vertical axis) versus contiguous sequence length in kilobases (kb), showing the progressive increase in contiguous sequence length achieved through the paired index position-encoded sequencing methodology described herein. A legend 1106 in the distribution chart 1104 indicates the number of reads chained in the contiguous sequence according to shading, with darker shading corresponding to lower numbers of links, which becomes progressively lighter. White shading corresponds to six chained reads, for example. The distribution chart 1104 demonstrates that longer reconstructed genomic sequences are typically associated with higher numbers of links between the sequencing reads 142. That is, the distribution shifts toward higher contiguous sequence length values as the number of reads chained increases. However, there are relatively long (e.g., > 20 kb) contiguous sequences comprised of two chained reads due to some reads being longer than others. By way of example, two relatively long reads chained together may have a longer contiguous sequence than a greater number of linked shorter reads.
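As a hypothetical worked example of the last point, the Python sketch below compares the contiguous sequence length of a two-read chain with that of a four-read chain. The function `contig_length` and all read lengths are made up for illustration and are not taken from FIG. 11. The sketch assumes that each length is the genomic portion of a read after removal of transposon sequence and that each junction shares an approximately nine-base duplicated sequence that is counted once.

```python
from typing import Sequence


def contig_length(genomic_lengths: Sequence[int], duplication: int = 9) -> int:
    """Length of a chain's contiguous sequence, in bases.

    Sums the genomic length of each read and subtracts the duplicated
    bases shared at each of the (n - 1) junctions between adjacent reads.
    """
    if not genomic_lengths:
        return 0
    junctions = len(genomic_lengths) - 1
    return sum(genomic_lengths) - duplication * junctions


# Hypothetical read lengths in bases.
two_long_reads = contig_length([12_000, 11_000])
four_short_reads = contig_length([4_000, 3_500, 5_000, 4_500])
```

Under these assumed lengths, the two-read chain spans 22,991 bases, which exceeds the 16,973 bases spanned by the four-read chain.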

[0190] The bar chart 1102 and the distribution chart 1104 together demonstrate the successful implementation of the PIPE-seq approach, showing how the PIPE transposons 126 enable the linking of natively adjacent genomic fragments through the paired indexes 128 to create longer contiguous genomic sequences without relying solely on sequence overlaps between the sequencing reads 142 for read alignment and sequence reconstruction.

Conclusion

[0191] Although the invention has been described in language specific to structural features and / or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed invention.

[0192] Features described above as well as those claimed below may be combined in various ways without departing from the scope hereof. The following disclosed items illustrate some possible, non-limiting combinations:

[0193] (A1) A method for paired index position-encoded sequencing, comprising: generating a plurality of transposons, each of the plurality of transposons comprising two oligonucleotides coupled together in a 5'-to-5' orientation to encode a unique pair of index sequences that distinguishes one transposon of the plurality of transposons from other transposons of the plurality of transposons; integrating the plurality of transposons into genomic DNA during a sequencing workflow; matching the unique pair of index sequences between sequencing reads obtained during the sequencing workflow; and reconstructing a genomic sequence of the genomic DNA based on the matching.

[0194] (A2) For the method denoted as (A1), integrating the plurality of transposons into the genomic DNA informationally links natively adjacent fragments of the genomic DNA via the unique pair of index sequences.

[0195] (A3) For the method denoted as (A1) or (A2), integrating the plurality of transposons into the genomic DNA during the sequencing workflow comprises: permeabilizing a biological sample, wherein the biological sample comprises a cell, a nucleus, or a tissue; combining the plurality of transposons with a transposase enzyme to form transposomes; and adding the transposomes to the permeabilized biological sample.

[0196] (A4) For the method denoted as any one of (A1) through (A3), the two oligonucleotides of each of the plurality of transposons each comprise: an RNA polymerase promoter site at a 5'-end; an amplification sequence adjacent to the RNA polymerase promoter site; an index sequence of the unique pair of index sequences; and a transposon recognition sequence at a 3'-end.

[0197] (A5) For the method denoted as (A4), integrating the plurality of transposons into the genomic DNA during the sequencing workflow generates paired index position-encoded genomic DNA, and the sequencing workflow further comprises: performing DNA extension to generate a double-stranded RNA polymerase promoter site from the RNA polymerase promoter site; performing in vitro transcription on the paired index position-encoded genomic DNA using an RNA polymerase that binds to the double-stranded RNA polymerase promoter site, the in vitro transcription generating paired index position-encoded RNA; performing reverse transcription on the paired index position-encoded RNA using a reverse transcriptase and primers targeting the amplification sequence or the RNA polymerase promoter site, the reverse transcription generating paired index position-encoded complementary DNA (cDNA); and sequencing the paired index position-encoded cDNA to generate the sequencing reads.

[0198] (A6) For the method denoted as (A5), the method further comprises amplifying the paired index position-encoded cDNA prior to sequencing the paired index position-encoded cDNA.

[0199] (A7) For the method denoted as any one of (A1) through (A6), generating the plurality of transposons comprises: synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other.

[0200] (A8) For the method denoted as (A7), the first functional group is a strained alkyne, the second functional group is an azide, and the second oligonucleotide is identical in sequence to the first oligonucleotide.

[0201] (A9) For the method denoted as (A7) or (A8), generating the plurality of transposons further comprises: distributing aliquots of the 5'-to-5' coupled oligonucleotide to a plurality of individual reaction vessels; adding different primer molecules to respective reaction vessels of the plurality of individual reaction vessels, the different primer molecules configured to anneal to both 3'-ends of the 5'-to-5' coupled oligonucleotide and provide a template for the unique pair of index sequences for one of the plurality of transposons; and extending the 5'-to-5' coupled oligonucleotide in a 3'-direction from an annealed primer molecule.

[0202] (A10) For the method denoted as (A7) or (A8), generating the plurality of transposons further comprises: distributing aliquots of the 5'-to-5' coupled oligonucleotide to a plurality of individual reaction vessels; adding different index oligonucleotide molecules to respective reaction vessels of the plurality of individual reaction vessels, wherein each of the different index oligonucleotide molecules includes a transposon recognition sequence and an index sequence for the unique pair of index sequences for one of the plurality of transposons; adding bridge oligonucleotide molecules to respective reaction vessels of the plurality of individual reaction vessels, wherein the bridge oligonucleotide molecules are configured to anneal to a 3'-end of the 5'-to-5' coupled oligonucleotide and a 5'-end of the different index oligonucleotide molecules at each 3'-end of the 5'-to-5' coupled oligonucleotide; and ligating the 5'-to-5' coupled oligonucleotide to the different index oligonucleotide molecules at the 3'-ends of the 5'-to-5' coupled oligonucleotide using a ligase enzyme.

[0203] (A11) For the method denoted as any one of (A1) through (A10), the method further comprises mixing the plurality of transposons with a transposase enzyme to form transposomes prior to integrating the plurality of transposons into the genomic DNA.

[0204] (A12) For the method denoted as any one of (A1) through (A11), reconstructing the genomic sequence of the genomic DNA based on the matching comprises: computationally determining that a DNA sequence in a sequencing read is natively immediately upstream of another DNA sequence from a separate sequencing read based on identifying the unique pair of index sequences in a downstream-positioned index sequence of the sequencing read and an upstream-positioned index sequence of the separate sequencing read.

[0205] (A13) For the method denoted as any one of (A1) through (A12), the genomic sequence is a whole genome of a single cell.

[0206] (A14) For the method denoted as any one of (A1) through (A12), a first of the two oligonucleotides comprises a first index sequence of the unique pair of index sequences, and a second of the two oligonucleotides comprises a second index sequence of the unique pair of index sequences.

[0207] (A15) For the method denoted in (A14), the first index sequence and the second index sequence are identical sequences.

[0208] (A16) For the method denoted as any one of (A1) through (A15), the sequencing workflow includes long read sequencing.

[0209] (A17) For the method denoted as any one of (A1) through (A15), the sequencing workflow includes short read sequencing.

[0210] (A18) For the method denoted as any one of (A1) through (A17), wherein the plurality of paired index position-encoding transposons are integrated into genomic DNA within an intact nucleus isolated from a biological sample to form paired index position-encoding transposon integrated genomic DNA.

[0211] There are several technical effects of the method for paired index position-encoded sequencing of the present disclosure. First, the integration of the unique pair of index sequences into genomic DNA enables the informational linking of natively adjacent DNA fragments, which may improve the accuracy of genome reconstruction compared to traditional overlap-based assembly methods. The 5'-to-5' orientation of the coupled oligonucleotides provides a molecular structure that facilitates the encoding of positional information during the sequencing workflow, which enhances the ability to reconstruct contiguous genomic sequences without relying solely on sequence overlaps. Additionally, the transposon integration within intact permeabilized nuclei preserves the native genomic context while enabling single-cell sequencing, which improves the fidelity of genome reconstruction while providing cellular resolution. The paired index position-encoded sequencing offers a scalable solution that can enable reduced sequencing depth while maintaining high accuracy, thereby reducing the computational resources and time used for genome assembly. Furthermore, the linear amplification via in vitro transcription avoids the length bias associated with exponential amplification methods, enabling more accurate representation of genomic fragments across different sizes. The automated nature of the index matching reduces dependency on computationally expensive overlap assembly while maintaining high accuracy, which allows for the reconstruction of genomes from complex samples including repetitive regions and heterogeneous genomes. These technical advantages collectively contribute to a comprehensive solution for genome sequencing that is both computationally efficient and biologically accurate, representing a significant advancement in the application of transposon-based methods to genomic analysis.

[0212] (B1) A system for paired index position-encoded sequencing, comprising: a sequencing data processor; and a computer-readable storage medium having instructions stored thereon that, when executed by the sequencing data processor, cause the sequencing data processor to perform operations comprising: receiving sequencing data comprising sequencing reads of paired index position-encoded DNA generated from genomic DNA of a biological sample, the paired index position-encoded DNA generated using transposons that informationally link natively adjacent fragments of the genomic DNA using paired indexes; and reconstructing a genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads.

[0213] (B2) For the system denoted as (B1), to reconstruct the genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads, the operations further comprise: matching a first sequencing read of the sequencing reads to a second sequencing read of the sequencing reads by identifying an index sequence of the paired indexes from a particular transposon in both of the first sequencing read and the second sequencing read; and determining a relative arrangement of a first DNA sequence of the first sequencing read and a second DNA sequence of the second sequencing read in the genomic sequence based on a position of the index sequence in each of the first sequencing read and the second sequencing read.

[0214] (B3) For the system denoted as (B1) or (B2), the paired index position-encoded DNA comprises complementary DNA (cDNA) derived from the genomic DNA after labeling the genomic DNA with the paired indexes using a plurality of transposomes, each of the plurality of transposomes comprising one of the transposons in complex with a transposase, and wherein each of the transposons comprises at least one different index sequence for the paired indexes.

[0215] (B4) For the system denoted as any one of (B1) through (B3), the paired indexes comprise unique index sequence pairs that distinguish one transposon used to generate the paired index position-encoded DNA from other transposons used to generate the paired index position-encoded DNA.

[0216] (B5) For the system denoted as any one of (B1) through (B4), the transposons that informationally link the natively adjacent fragments of the genomic DNA using the paired indexes each comprise: two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between each 5'-end of the two oligonucleotides, each of the two oligonucleotides of a given transposon comprising, from 5' to 3': an RNA promoter sequence; an amplification sequence; a unique index sequence for the paired indexes; and a transposon recognition sequence; and hybridizing oligos configured to anneal to the transposon recognition sequence at each 3'-end.

[0217] (B6) For the system denoted as any one of (B1) through (B5), the sequencing data are obtained from a method as defined in any preceding or following item, or any of the methods or method steps described herein.

[0218] (B7) For the system denoted as any one of (B1) through (B6), the sequencing data are obtained using long read sequencing.

[0219] (B8) For the system denoted as any one of (B1) through (B7), the transposons that informationally link natively adjacent fragments of the genomic DNA using paired indexes are complexed with a transposase enzyme, forming transposomes.

[0220] (B9) For the system denoted as (B8), the transposase enzyme is Tn5 transposase.

[0221] (C1) A method for paired index position-encoded sequencing, comprising: integrating a plurality of paired index position-encoding transposons into genomic DNA within an intact nucleus isolated from a biological sample to form paired index position-encoding transposon integrated genomic DNA, each of the plurality of paired index position-encoding transposons comprising two identical oligonucleotides coupled together in a 5'-to-5' orientation, each of the two identical oligonucleotides of a given paired index position-encoding transposon comprising, from 5' to 3': an RNA promoter sequence; an amplification sequence; an index sequence that is unique to the given paired index position-encoding transposon; and a transposon recognition sequence; generating paired index position-encoded RNA from the paired index position-encoding transposon integrated genomic DNA via in vitro transcription using an RNA polymerase that binds the RNA promoter sequence; generating paired index position-encoded cDNA from the paired index position-encoded RNA via reverse transcription using primers targeting the amplification sequence; sequencing the paired index position-encoded cDNA to generate sequencing reads; and reconstructing a genomic sequence of the biological sample by matching index sequences between the sequencing reads.

[0222] (C2) For the method denoted as (C1), the method further comprises generating the plurality of paired index position-encoding transposons by: synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other; and performing a plurality of separate reactions with aliquots of the 5'-to-5' coupled oligonucleotide, each of the plurality of separate reactions using a molecule encoding a different index sequence, wherein each of the plurality of separate reactions produces one of the plurality of paired index position-encoding transposons.

[0223] (D1) A computer-implemented method for paired index position-encoded sequencing, comprising: receiving sequencing data comprising sequencing reads of paired index position-encoded DNA generated from genomic DNA of a biological sample, the paired index position-encoded DNA generated using transposons that informationally link natively adjacent fragments of the genomic DNA using paired indexes; and reconstructing a genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads.
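By way of non-limiting illustration, the reconstruction recited above may be sketched in Python as a walk along index links between reads. The `ParsedRead` record and the `order_reads` function are hypothetical names. The sketch assumes that each read has already been parsed into an upstream index, a genomic fragment, and a downstream index, that the two index sequences of a given transposon are identical so that matching reduces to string equality, and that the reads form a single unbranched chain.

```python
from typing import List, NamedTuple, Optional


class ParsedRead(NamedTuple):
    upstream_index: Optional[str]    # index preceding the genomic fragment, if any
    fragment: str                    # genomic DNA carried by the read
    downstream_index: Optional[str]  # index following the genomic fragment, if any


def order_reads(reads: List[ParsedRead]) -> List[ParsedRead]:
    """Order reads so each read's downstream index equals the next read's upstream index."""
    by_upstream = {r.upstream_index: r for r in reads if r.upstream_index is not None}
    downstream_seen = {r.downstream_index for r in reads if r.downstream_index is not None}

    # The chain starts at the read whose upstream index matches no downstream index.
    starts = [r for r in reads
              if r.upstream_index is None or r.upstream_index not in downstream_seen]
    if len(starts) != 1:
        raise ValueError("expected exactly one chain start in this simplified sketch")

    ordered = [starts[0]]
    # The length bound guards against cycles in malformed input.
    while len(ordered) < len(reads) and ordered[-1].downstream_index in by_upstream:
        ordered.append(by_upstream[ordered[-1].downstream_index])
    return ordered


reads = [
    ParsedRead("idx2", "GGGG", None),
    ParsedRead(None, "AAAA", "idx1"),
    ParsedRead("idx1", "CCCC", "idx2"),
]
print("".join(r.fragment for r in order_reads(reads)))
```

Where the two index sequences of a transposon differ from each other, the equality test would be replaced by a lookup in a table of known index pairs.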

[0224] (D2) For the computer-implemented method denoted as (D1), reconstructing the genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads further comprises: matching a first sequencing read of the sequencing reads to a second sequencing read of the sequencing reads by identifying an index sequence of the paired indexes from a particular transposon in both of the first sequencing read and the second sequencing read; and determining a relative arrangement of a first DNA sequence of the first sequencing read and a second DNA sequence of the second sequencing read in the genomic sequence based on a position of the index sequence in each of the first sequencing read and the second sequencing read.

[0225] (D3) For the computer-implemented method denoted as (D1) or (D2), the paired index position-encoded DNA comprises complementary DNA (cDNA) derived from the genomic DNA after labeling the genomic DNA with the paired indexes using a plurality of transposomes, each of the plurality of transposomes comprising one of the transposons in complex with a transposase, and wherein each of the transposons comprises at least one different index sequence for the paired indexes.

[0226] (D4) For the computer-implemented method denoted as any one of (D1) through (D3), the paired indexes comprise unique index sequence pairs that distinguish one transposon used to generate the paired index position-encoded DNA from other transposons used to generate the paired index position-encoded DNA.

[0227] (D5) For the computer-implemented method denoted as any one of (D1) through (D4), the transposons that informationally link the natively adjacent fragments of the genomic DNA using the paired indexes each comprise: two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between each 5'-end of the two oligonucleotides, each of the two oligonucleotides of a given transposon comprising, from 5' to 3': an RNA promoter sequence; an amplification sequence; a unique index sequence for the paired indexes; and a transposon recognition sequence; and hybridizing oligos configured to anneal to the transposon recognition sequence at each 3'-end.

[0228] (E1) A computer-readable storage medium comprising instructions which, when executed, cause one or more processors to perform a method as defined in any preceding item, or any of the methods or method steps described herein.

[0229] (F1) A system comprising the computer-readable storage medium denoted as (E1), and the one or more processors.

[0230] (G1) A computer program comprising instructions which, when the program is executed by one or more computing devices, cause the one or more computing devices to perform a method as defined in any preceding item, or any of the methods or method steps described herein.

[0231] (H1) A method for generating a plurality of paired index position-encoding transposons, comprising: synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other; and performing a plurality of separate reactions with aliquots of the 5'-to-5' coupled oligonucleotide, each of the plurality of separate reactions using a molecule encoding a different index sequence such that each of the plurality of separate reactions produces one of the plurality of paired index position-encoding transposons, wherein each of the plurality of paired index position-encoding transposons comprises two oligonucleotides coupled together in a 5'-to-5' orientation and encodes a unique pair of index sequences.

[0232] (H2) For the method denoted as (H1), each of the two oligonucleotides of a given paired index position-encoding transposon comprises, from 5' to 3': an RNA promoter sequence; an amplification sequence; an index sequence of the unique pair of index sequences; and a transposon recognition sequence.

[0233] (H3) For the method denoted as (H1) or (H2), synthesizing the 5'-to-5' coupled oligonucleotide comprises coupling the first oligonucleotide and the second oligonucleotide via a strain-promoted azide-alkyne cycloaddition to form a covalent linkage between the 5'-ends, wherein the first functional group is a strained alkyne, and the second functional group is an azide.

[0234] (H4) For the method denoted as (H1) or (H2), performing the plurality of separate reactions comprises primer-mediated extension, including: adding different primer molecules to respective reactions, the different primer molecules configured to anneal to both 3'-ends of the 5'-to-5' coupled oligonucleotide and provide templates for the different index sequences and for the transposon recognition sequence; and extending the 5'-to-5' coupled oligonucleotide in a 3'-direction from an annealed primer molecule.

[0235] (H5) For the method denoted as (H1) or (H2), performing the plurality of separate reactions comprises ligation-based assembly, including: adding different index oligonucleotide molecules to respective reactions, wherein each of the different index oligonucleotide molecules includes a transposon recognition sequence and one of the different index sequences; adding bridge oligonucleotide molecules to the respective reactions, wherein the bridge oligonucleotide molecules are configured to anneal to a 3'-end of the 5'-to-5' coupled oligonucleotide and a 5'-end of the different index oligonucleotide molecules at each 3'-end of the 5'-to-5' coupled oligonucleotide; and ligating the 5'-to-5' coupled oligonucleotide to the different index oligonucleotide molecules at the 3'-ends of the 5'-to-5' coupled oligonucleotide using a ligase enzyme.

[0236] (H6) For the method denoted as any one of (H1) through (H5), the method further comprises adding hybridizing oligos configured to anneal to the transposon recognition sequence at each 3'-end.

[0237] (H7) For the method denoted as any one of (H1) through (H6), for a given paired index position-encoding transposon, index sequences on opposite sides of the 5'-to-5' coupling are identical sequences that together form the unique pair of index sequences.

[0238] (H8) For the method denoted as any one of (H1) through (H6), for a given paired index position-encoding transposon, index sequences on opposite sides of the 5'-to-5' coupling are different sequences that together form the unique pair of index sequences.

[0239] (H9) For the method denoted as any one of (H1) through (H8), the method further comprises purifying the 5'-to-5' coupled oligonucleotide and/or the plurality of paired index position-encoding transposons by gel electrophoresis.

[0240] (H10) For the method denoted as any one of (H1) through (H9), the method further comprises mixing the plurality of paired index position-encoding transposons with a transposase enzyme to form transposomes.

[0241] (H11) For the method denoted as any one of (H1) through (H10), the unique pair of index sequences for a given paired index position-encoding transposon is distinguishable, as a pair, from index pairs of other paired index position-encoding transposons generated in the plurality.
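By way of non-limiting illustration, whether a set of index pairs is distinguishable as pairs may be checked in Python as follows. The function names and the Hamming-distance threshold `min_distance` are assumptions introduced for this sketch. With the default threshold of 1, the check reduces to requiring that no two pairs are identical.

```python
from itertools import combinations
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pairs_distinguishable(pairs: List[Tuple[str, str]], min_distance: int = 1) -> bool:
    """Return True if every two index pairs differ, as pairs, by at least min_distance."""
    for (a1, a2), (b1, b2) in combinations(pairs, 2):
        # Compare each pair as a whole by concatenating its two index sequences.
        if hamming(a1 + a2, b1 + b2) < min_distance:
            return False
    return True


print(pairs_distinguishable([("AAAA", "CCCC"), ("AAAA", "GGGG")]))
print(pairs_distinguishable([("AAAA", "CCCC"), ("AAAA", "CCCC")]))
```

In the first call the two pairs share a first index but remain distinguishable because their second indexes differ. In the second call the pairs are identical and the check fails.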

[0242] (H12) For the method denoted as any one of (H1) through (H11), the two oligonucleotides coupled together in the 5'-to-5' orientation are identical in sequence.

[0243] (I1) A kit, comprising: a plurality of paired index position-encoding transposons, each paired index position-encoding transposon of the plurality of paired index position-encoding transposons comprising two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between 5'-ends of the two oligonucleotides, wherein each of the two oligonucleotides of a given paired index position-encoding transposon includes a respective index sequence of a pair of index sequences that is unique to the given paired index position-encoding transposon within the plurality.

[0244] (I2) For the kit denoted as (I1), each of the two oligonucleotides of the given paired index position-encoding transposon further comprises, from 5' to 3': an RNA polymerase promoter site; an amplification sequence; the respective index sequence of the pair of index sequences; and a transposon recognition sequence at a 3'-end.

[0245] (I3) For the kit denoted as (I2), the transposon recognition sequence comprises a mosaic end sequence.

[0246] (I4) For the kit denoted as (I2) or (I3), the kit further comprises hybridizing oligos configured to anneal to the transposon recognition sequence at each 3'-end.

[0247] (I5) For the kit denoted as any one of (I2) through (I4), the kit further comprises a transposase enzyme configured to bind the transposon recognition sequence to form transposomes when mixed with the plurality of paired index position-encoding transposons.

[0248] (I6) For the kit denoted as any one of (I2) through (I5), the amplification sequence is configured for primer binding to facilitate synthesis or amplification of fragments adjacent to the transposon recognition sequence.

[0249] (I7) For the kit denoted as any one of (I1) through (I6), for the given paired index position-encoding transposon, respective index sequences on opposite sides of the 5'-to-5' coupling are identical sequences that together form the pair of index sequences.

[0250] (I8) For the kit denoted as any one of (I1) through (I6), for the given paired index position-encoding transposon, respective index sequences on opposite sides of the 5'-to-5' coupling are different sequences that together form the pair of index sequences.

[0251] (I9) For the kit denoted as any one of (I1) through (I8), the covalent linker comprises a triazole linkage.

[0252] (I10) For the kit denoted as any one of (I1) through (I9), the pair of index sequences is configured to distinguish one paired index position-encoding transposon of the plurality from other paired index position-encoding transposons of the plurality.

[0253] (J1) A paired index position-encoding transposon comprising two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between 5'-ends of the two oligonucleotides, wherein each of the two oligonucleotides of a given paired index position-encoding transposon includes a respective index sequence of a pair of index sequences.

[0254] (J2) For the paired index position-encoding transposon denoted as (J1), the covalent linker comprises a triazole linkage.

[0255] (J3) For the paired index position-encoding transposon denoted as (J1) or (J2), the two oligonucleotides are identical in sequence.

[0256] (J4) For the paired index position-encoding transposon denoted as any one of (J1) through (J3), a first oligonucleotide of the two oligonucleotides includes a first index sequence of the pair of index sequences, and a second oligonucleotide of the two oligonucleotides includes a second index sequence of the pair of index sequences.

[0257] (J5) For the paired index position-encoding transposon denoted as any one of (J1) through (J4), the two oligonucleotides each comprise, from 5' to 3': an RNA polymerase promoter site; an amplification sequence configured to bind a primer during amplification; the respective index sequence of the pair of index sequences; and a transposon recognition sequence configured to bind a transposase enzyme.

[0258] (K1) A plurality of paired index position-encoding transposons, each paired index position-encoding transposon of the plurality of paired index position-encoding transposons comprising two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between 5'-ends of the two oligonucleotides, wherein each of the two oligonucleotides of a given paired index position-encoding transposon includes a respective index sequence of a pair of index sequences that is unique to the given paired index position-encoding transposon within the plurality of paired index position-encoding transposons.

[0259] (K2) For the plurality of paired index position-encoding transposons denoted as (K1), the covalent linker comprises a triazole linkage.

[0260] (K3) For the plurality of paired index position-encoding transposons denoted as (K1) or (K2), the two oligonucleotides of the given paired index position-encoding transposon are identical in sequence.

[0261] (K4) For the plurality of paired index position-encoding transposons denoted as any one of (K1) through (K3), a first oligonucleotide of the two oligonucleotides of the given paired index position-encoding transposon includes a first index sequence of the pair of index sequences, and a second oligonucleotide of the two oligonucleotides of the given paired index position-encoding transposon includes a second index sequence of the pair of index sequences.

[0262] (K5) For the plurality of paired index position-encoding transposons denoted as any one of (K1) through (K4), the two oligonucleotides of the given paired index position-encoding transposon each comprise, from 5' to 3': an RNA polymerase promoter site; an amplification sequence configured to bind a primer during amplification; the respective index sequence of the pair of index sequences that is unique to the given paired index position-encoding transposon within the plurality of paired index position-encoding transposons; and a transposon recognition sequence configured to bind a transposase enzyme.

[0263] (L1) A method of generating a paired index position-encoding transposon, comprising: synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other.

[0264] (L2) For the method denoted as (L1), generating the paired index position-encoding transposon further comprises: distributing the 5'-to-5' coupled oligonucleotide to a reaction vessel; adding primer molecules to the reaction vessel that are configured to anneal to both 3'-ends of the 5'-to-5' coupled oligonucleotide and provide a template for adding an index sequence to both 3'-ends of the 5'-to-5' coupled oligonucleotide; and extending the 5'-to-5' coupled oligonucleotide in a 3'-direction from an annealed primer molecule.

[0265] (L3) For the method denoted as (L1), generating the paired index position-encoding transposon further comprises: distributing the 5'-to-5' coupled oligonucleotide to a reaction vessel; adding index oligonucleotide molecules to the reaction vessel that include a transposon recognition sequence and an index sequence; adding bridge oligonucleotide molecules to the reaction vessel, wherein the bridge oligonucleotide molecules are configured to anneal to a 3'-end of the 5'-to-5' coupled oligonucleotide and a 5'-end of the index oligonucleotide molecules at each 3'-end of the 5'-to-5' coupled oligonucleotide; and ligating the 5'-to-5' coupled oligonucleotide to the index oligonucleotide molecules at the 3'-ends of the 5'-to-5' coupled oligonucleotide using a ligase enzyme.

[0266] (L4) For the method denoted as any one of (L1) through (L3), the first functional group is a strained alkyne, and the second functional group is an azide.

[0267] (L5) For the method denoted as any one of (L1) through (L4), the second oligonucleotide is identical in sequence to the first oligonucleotide.

[0268] (L6) For the method denoted as any one of (L1) through (L4), the second oligonucleotide is different in sequence from the first oligonucleotide.

Claims

CLAIMS

What is claimed is:

1. A method for paired index position-encoded sequencing, comprising:
generating a plurality of transposons, each of the plurality of transposons comprising two oligonucleotides coupled together in a 5'-to-5' orientation to encode a unique pair of index sequences that distinguishes one transposon of the plurality of transposons from other transposons of the plurality of transposons;
integrating the plurality of transposons into genomic DNA during a sequencing workflow;
matching the unique pair of index sequences between sequencing reads obtained during the sequencing workflow; and
reconstructing a genomic sequence of the genomic DNA based on the matching.

2. The method of claim 1, wherein integrating the plurality of transposons into the genomic DNA informationally links natively adjacent fragments of the genomic DNA via the unique pair of index sequences.

3. The method of claim 1 or claim 2, wherein integrating the plurality of transposons into the genomic DNA during the sequencing workflow comprises:
permeabilizing a biological sample, wherein the biological sample comprises a cell, a nucleus, or a tissue;
combining the plurality of transposons with a transposase enzyme to form transposomes; and
adding the transposomes to the permeabilized biological sample.

4. The method of any one of claims 1 to 3, wherein the two oligonucleotides of said each of the plurality of transposons each comprise:
an RNA polymerase promoter site at a 5'-end;
an amplification sequence adjacent to the RNA polymerase promoter site;
an index sequence of the unique pair of index sequences; and
a transposon recognition sequence at a 3'-end.

5. The method of claim 4, wherein integrating the plurality of transposons into the genomic DNA during the sequencing workflow generates paired index position-encoded genomic DNA, and the sequencing workflow further comprises:
performing DNA extension to generate a double-stranded RNA polymerase promoter site from the RNA polymerase promoter site;
performing in vitro transcription on the paired index position-encoded genomic DNA using an RNA polymerase that binds to the double-stranded RNA polymerase promoter site, the in vitro transcription generating paired index position-encoded RNA;
performing reverse transcription on the paired index position-encoded RNA using a reverse transcriptase and primers targeting the amplification sequence or the RNA polymerase promoter site, the reverse transcription generating paired index position-encoded complementary DNA (cDNA); and
sequencing the paired index position-encoded cDNA to generate the sequencing reads.

6. The method of claim 5, further comprising:
amplifying the paired index position-encoded cDNA prior to sequencing the paired index position-encoded cDNA.

7. The method of any one of claims 1 to 6, wherein generating the plurality of transposons comprises:
synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other.

8. The method of claim 7, wherein the first functional group is a strained alkyne, the second functional group is an azide, and the second oligonucleotide is identical in sequence to the first oligonucleotide.

9. The method of claim 7 or 8, wherein generating the plurality of transposons further comprises:
distributing aliquots of the 5'-to-5' coupled oligonucleotide to a plurality of individual reaction vessels;
adding different primer molecules to respective reaction vessels of the plurality of individual reaction vessels, the different primer molecules configured to anneal to both 3'-ends of the 5'-to-5' coupled oligonucleotide and provide a template for the unique pair of index sequences for one of the plurality of transposons; and
extending the 5'-to-5' coupled oligonucleotide in a 3'-direction from an annealed primer molecule.

10. The method of claim 7 or 8, wherein generating the plurality of transposons further comprises:
distributing aliquots of the 5'-to-5' coupled oligonucleotide to a plurality of individual reaction vessels;
adding different index oligonucleotide molecules to respective reaction vessels of the plurality of individual reaction vessels, wherein each of the different index oligonucleotide molecules includes a transposon recognition sequence and an index sequence for the unique pair of index sequences for one of the plurality of transposons;
adding bridge oligonucleotide molecules to respective reaction vessels of the plurality of individual reaction vessels, wherein the bridge oligonucleotide molecules are configured to anneal to a 3'-end of the 5'-to-5' coupled oligonucleotide and a 5'-end of the different index oligonucleotide molecules at each 3'-end of the 5'-to-5' coupled oligonucleotide; and
ligating the 5'-to-5' coupled oligonucleotide to the different index oligonucleotide molecules at the 3'-ends of the 5'-to-5' coupled oligonucleotide using a ligase enzyme.

11. The method of any one of claims 1 to 10, further comprising:
mixing the plurality of transposons with a transposase enzyme to form transposomes prior to integrating the plurality of transposons into the genomic DNA.

12. The method of any one of claims 1 to 11, wherein reconstructing the genomic sequence of the genomic DNA based on the matching comprises:
computationally determining that a DNA sequence in a sequencing read is natively immediately upstream of another DNA sequence from a separate sequencing read based on identifying the unique pair of index sequences in a downstream-positioned index sequence of the sequencing read and an upstream-positioned index sequence of the separate sequencing read.

13. The method of any one of claims 1 to 12, wherein the genomic sequence is a whole genome of a single cell.

14. The method of any one of claims 1-13, wherein the plurality of paired index position-encoding transposons are integrated into genomic DNA within an intact nucleus isolated from a biological sample to form paired index position-encoding transposon integrated genomic DNA.

15. A system for paired index position-encoded sequencing, comprising:
a sequencing data processor; and
a computer-readable storage medium having instructions stored thereon that, when executed by the sequencing data processor, cause the sequencing data processor to perform operations comprising:
receiving sequencing data comprising sequencing reads of paired index position-encoded DNA generated from genomic DNA of a biological sample, the paired index position-encoded DNA generated using transposons that informationally link natively adjacent fragments of the genomic DNA using paired indexes; and
reconstructing a genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads.

16. The system of claim 15, wherein, to reconstruct the genomic sequence of the biological sample based on matching the paired indexes between the sequencing reads, the operations further comprise:
matching a first sequencing read of the sequencing reads to a second sequencing read of the sequencing reads by identifying an index sequence of the paired indexes from a particular transposon in both of the first sequencing read and the second sequencing read; and
determining a relative arrangement of a first DNA sequence of the first sequencing read and a second DNA sequence of the second sequencing read in the genomic sequence based on a position of the index sequence in each of the first sequencing read and the second sequencing read.

17. The system of claim 15 or claim 16, wherein the paired index position-encoded DNA comprises complementary DNA (cDNA) derived from the genomic DNA after labeling the genomic DNA with the paired indexes using a plurality of transposomes, each of the plurality of transposomes comprising one of the transposons in complex with a transposase, and wherein each of the transposons comprises at least one different index sequence for the paired indexes.

18. The system of any one of claims 15 to 17, wherein the paired indexes comprise unique index sequence pairs that distinguish one transposon used to generate the paired index position-encoded DNA from other transposons used to generate the paired index position-encoded DNA.

19. The system of any one of claims 15 to 18, wherein the transposons that informationally link the natively adjacent fragments of the genomic DNA using the paired indexes each comprise:
two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between each 5'-end of the two oligonucleotides, each of the two oligonucleotides of a given transposon comprising, from 5' to 3':
an RNA promoter sequence;
an amplification sequence;
a unique index sequence for the paired indexes; and
a transposon recognition sequence; and
hybridizing oligos configured to anneal to the transposon recognition sequence at each 3'-end.

20. The system of any one of claims 15 to 19, wherein the sequencing data are obtained from a method according to any one of claims 1-14.

21. The system of any one of claims 15 to 20, wherein the sequencing data are obtained using long read sequencing.

22. The system of any one of claims 15 to 21, wherein the transposons that informationally link natively adjacent fragments of the genomic DNA using paired indexes are complexed with a transposase enzyme, forming transposomes.

23. The system of claim 22, wherein the transposase enzyme is Tn5 transposase.

24. A method for paired index position-encoded sequencing, comprising:
integrating a plurality of paired index position-encoding transposons into genomic DNA within an intact nucleus isolated from a biological sample to form paired index position-encoding transposon integrated genomic DNA, each of the plurality of paired index position-encoding transposons comprising two identical oligonucleotides coupled together in a 5'-to-5' orientation, each of the two identical oligonucleotides of a given paired index position-encoding transposon comprising, from 5' to 3':
an RNA promoter sequence;
an amplification sequence;
an index sequence that is unique to the given paired index position-encoding transposon; and
a transposon recognition sequence;
generating paired index position-encoded RNA from the paired index position-encoding transposon integrated genomic DNA via in vitro transcription using an RNA polymerase that binds the RNA promoter sequence;
generating paired index position-encoded cDNA from the paired index position-encoded RNA via reverse transcription using primers targeting the amplification sequence;
sequencing the paired index position-encoded cDNA to generate sequencing reads; and
reconstructing a genomic sequence of the biological sample by matching index sequences between the sequencing reads.

25. The method of claim 24, further comprising generating the plurality of paired index position-encoding transposons by:
synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other; and
performing a plurality of separate reactions with aliquots of the 5'-to-5' coupled oligonucleotide, each of the plurality of separate reactions using a molecule encoding a different index sequence, wherein each of the plurality of separate reactions produces one of the plurality of paired index position-encoding transposons.

26. A paired index position-encoding transposon comprising two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between 5'-ends of the two oligonucleotides, wherein each of the two oligonucleotides of a given paired index position-encoding transposon includes a respective index sequence of a pair of index sequences.

27. The paired index position-encoding transposon of claim 26, wherein the covalent linker comprises a triazole linkage.

28. The paired index position-encoding transposon of claim 26 or 27, wherein the two oligonucleotides are identical in sequence.

29. The paired index position-encoding transposon of any one of claims 26 to 28, wherein a first oligonucleotide of the two oligonucleotides includes a first index sequence of the pair of index sequences, and a second oligonucleotide of the two oligonucleotides includes a second index sequence.

30. The paired index position-encoding transposon of any one of claims 26 to 29, wherein the two oligonucleotides each comprise, from 5' to 3': a promoter sequence configured to bind a polymerase; an amplification sequence configured to bind a primer during amplification; the respective index sequence of the pair of index sequences; and a transposon recognition sequence configured to bind a transposase enzyme.

31. A plurality of paired index position-encoding transposons, each paired index position-encoding transposon of the plurality of paired index position-encoding transposons comprising two oligonucleotides coupled together in a 5'-to-5' orientation via a covalent linker between 5'-ends of the two oligonucleotides, wherein each of the two oligonucleotides of a given paired index position-encoding transposon includes a respective index sequence of a pair of index sequences that is unique to the given paired index position-encoding transposon within the plurality.
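
The 5'-to-3' element order recited in claim 30 and the uniqueness condition recited in claim 31 can be sketched as follows. This is an illustration under stated assumptions, not a design from the application: the element sequences are short placeholders rather than real promoter, amplification, or recognition sequences, the two oligonucleotides are modeled as a plain tuple rather than a covalently linked molecule, and only the identical-oligonucleotide variant is shown. The names `build_oligo` and `build_transposons` are hypothetical.

```python
from itertools import product

# Placeholder element sequences (hypothetical, for illustration only).
PROMOTER = "AAAA"
AMPLIFICATION = "CCCC"
RECOGNITION = "GGGG"


def build_oligo(index):
    """Concatenate the four elements in 5'-to-3' order."""
    return PROMOTER + AMPLIFICATION + index + RECOGNITION


def build_transposons(n, index_len=4):
    """Return n (oligo, oligo) pairs, each index unique within the set."""
    # Enumerate candidate indexes in a fixed order: AAAA, AAAC, AAAG, ...
    indexes = ("".join(p) for p in product("ACGT", repeat=index_len))
    transposons = []
    for _, index in zip(range(n), indexes):
        oligo = build_oligo(index)
        # Identical-oligonucleotide variant: both arms carry the same index.
        transposons.append((oligo, oligo))
    if len(transposons) < n:
        raise ValueError("index_len too short for n unique indexes")
    return transposons


pool = build_transposons(3)
```

A variant with two different oligonucleotides per transposon would assign a distinct index to each arm while keeping the pair unique within the pool.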

32. The plurality of paired index position-encoding transposons of claim 31, wherein the covalent linker comprises a triazole linkage.

33. The plurality of paired index position-encoding transposons of claim 31 or 32, wherein the two oligonucleotides of the given paired index position-encoding transposon are identical in sequence.

34. The plurality of paired index position-encoding transposons of any one of claims 31 to 33, wherein a first oligonucleotide of the two oligonucleotides of the given paired index position-encoding transposon includes a first index sequence of the pair of index sequences, and a second oligonucleotide of the two oligonucleotides of the given paired index position-encoding transposon includes a second index sequence.

35. The plurality of paired index position-encoding transposons of any one of claims 31 to 34, wherein the two oligonucleotides of the given paired index position-encoding transposon each comprise, from 5' to 3': a promoter sequence configured to bind a polymerase; an amplification sequence configured to bind a primer during amplification; the respective index sequence of the pair of index sequences that is unique to the given paired index position-encoding transposon within the plurality; and a transposon recognition sequence configured to bind a transposase enzyme.

36. A method of generating a paired index position-encoding transposon, comprising: synthesizing a 5'-to-5' coupled oligonucleotide by coupling a first oligonucleotide having a first functional group attached to its 5'-end with a second oligonucleotide having a second functional group attached to its 5'-end, wherein the first functional group and the second functional group are configured to react selectively with each other.

37. The method of claim 36, wherein generating the paired index position-encoding transposon further comprises: distributing the 5'-to-5' coupled oligonucleotide to a reaction vessel; adding primer molecules to the reaction vessel that are configured to anneal to both 3'-ends of the 5'-to-5' coupled oligonucleotide and provide a template for adding an index sequence to both 3'-ends of the 5'-to-5' coupled oligonucleotide; and extending the 5'-to-5' coupled oligonucleotide in a 3'-direction from an annealed primer molecule.
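
The templated extension recited in claim 37 can be illustrated with a toy string model. The sketch assumes that annealing is an exact match between the 3' end of an arm and the reverse complement of the primer, and it ignores reaction chemistry entirely. All sequences are placeholders, and the names `revcomp` and `extend` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extend(arm, template):
    """Extend `arm` (5'->3') along `template` (5'->3') annealed at its 3' end."""
    # View the template in the same orientation as the arm.
    template_rc = revcomp(template)
    # Use the longest arm suffix that matches a prefix of the template view.
    for k in range(min(len(arm), len(template_rc)), 0, -1):
        if arm.endswith(template_rc[:k]):
            # Copy the unpaired part of the template onto the arm's 3' end.
            return arm + template_rc[k:]
    raise ValueError("template does not anneal to the arm's 3' end")


# Placeholder arm; its last four bases are the annealing site.
arm = "AAAACCTG"
# Template encodes: annealing site + index (ACCA) + recognition site (GGTT).
template = revcomp("CCTG" + "ACCA" + "GGTT")

# Both arms of the 5'-to-5' coupled oligonucleotide are extended the same way.
extended = (extend(arm, template), extend(arm, template))
```

Because the same template anneals to both 3' ends, both arms receive the same index in this model, which corresponds to the identical-oligonucleotide case.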

38. The method of claim 36, wherein generating the paired index position-encoding transposon further comprises: distributing the 5'-to-5' coupled oligonucleotide to a reaction vessel; adding index oligonucleotide molecules to the reaction vessel that include a transposon recognition sequence and an index sequence; adding bridge oligonucleotide molecules to the reaction vessel, wherein the bridge oligonucleotide molecules are configured to anneal to a 3'-end of the 5'-to-5' coupled oligonucleotide and a 5'-end of the index oligonucleotide molecules at each 3'-end of the 5'-to-5' coupled oligonucleotide; and ligating the 5'-to-5' coupled oligonucleotide to the index oligonucleotide molecules at the 3'-ends of the 5'-to-5' coupled oligonucleotide using a ligase enzyme.

39. The method of any one of claims 36 to 38, wherein the first functional group is a strained alkyne, and the second functional group is an azide.

40. The method of any one of claims 36 to 39, wherein the second oligonucleotide is identical in sequence to the first oligonucleotide.

41. The method of any one of claims 36 to 39, wherein the second oligonucleotide is different in sequence from the first oligonucleotide.