Nucleic acid detection methods using oligohybridization and PCR-based amplification
By using hybridization and PCR amplification with single-stranded DNA oligonucleotide probes, the problems of RNA degradation and reverse transcription bias in single-cell RNA detection were solved, achieving high-sensitivity detection and spatial mapping for all RNA species and overcoming the limitations of existing technologies.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- MAX DELBRUECK CENT FUER MOLEKULARE MEDIZIN
- Filing Date
- 2021-10-22
- Publication Date
- 2026-07-01
Abstract
Description
[Technical Field]
[0001] This invention relates to the field of nucleic acid sequencing at the single-cell level, for example, single-cell RNA sequencing (scRNA-Seq). In particular, the invention provides a method for detecting nucleic acids in fixed or unfixed nucleic acid-containing compartments, such as the nucleus of a eukaryotic cell, comprising: hybridizing multiple single-stranded (ss)DNA oligonucleotide probes with complementary nucleic acid molecules in the compartment; removing ssDNA oligonucleotide probes that do not specifically hybridize with nucleic acids from the compartment; and identifying ssDNA oligonucleotide probes that specifically hybridize with nucleic acid molecules in the compartment by sequencing or amplification, thereby determining the corresponding nucleic acids present in the compartment. This method does not require a series of ssDNA probe hybridization steps to the same target nucleic acid or steps of RNA isolation and cDNA production as a means of increasing specificity or sensitivity. The method of the present invention has the potential to detect substantially all known and/or unknown nucleic acid species, particularly RNA, such as protein-coding mRNA and non-coding RNA. The method also has sufficient sensitivity to detect less abundant nucleic acids and their abundance in intracellular compartments. This method further enables spatial mapping of detected nucleic acids, where the compartments are sectioned before probe hybridization to obtain a collection of fragments, and thus nucleic acid molecules are separated from each other depending on their localization. Spatial mapping of detected nucleic acids can be combined with the detection of at least one DNA locus, at least one protein, or analysis of chromatin condensation, chromatin contact, and chromatin radial position in the cell nucleus.
[Background Art]
[0002] Ribonucleic acid (RNA) is a highly versatile molecule with numerous biological functions. While its role in conveying genetic information from DNA to the ribosome for protein biosynthesis is well known, RNA molecules can also exert regulatory, catalytic, or processing activity. Understanding when, where, how, and why RNA molecules are expressed in cells is therefore of great interest to researchers and clinicians alike.
[0003] As a result, numerous methods have been developed and are continuously being optimized for detecting, identifying, and quantifying RNA in individual cells, tissues, or entire organisms.
[0004] Northern blotting is a method in which RNA is first separated by size using electrophoresis, and then transferred to a membrane. RNA detection is then achieved by adding a labeled probe to the membrane that hybridizes with the target RNA (Josefsen et al., 2011, Northern Blotting Analysis. Methods Mol Biol. 703, 87-105).
[0005] Nuclease protection assays utilize radiolabeled or non-isotopic probes that anneal to RNA in the sample. After hybridization, single-stranded, unhybridized probes and RNA are degraded by the nuclease, while the probe-bound RNA fragment is preserved, subsequently separated on an acrylamide gel, and detected by autoradiography (https://www.thermofisher.com/de/de/home/references/ambion-tech-support/ribonuclease-protection-assays/general-articles/the-basics-what-is-a-nuclease-protection-assay.html).
[0006] The discovery of reverse transcriptase, a retroviral enzyme that enables the reverse transcription of RNA molecules into complementary DNA (cDNA), has significantly advanced RNA research. The conversion of RNA to more stable cDNA reduces the rate of sample degradation, facilitating the handling of biological materials and thus enabling the development and application of more sophisticated RNA detection methods. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a PCR-based method in which, for example, RNA is first reverse-transcribed into cDNA and then amplified using a specially designed primer pair that anneals to the target cDNA molecule. Sufficient RNA integrity is required for the cDNA to be polymerized to a length suitable for further amplification (approximately >100 nt, or typically >200 nt). Quantification of cDNA may involve PCR amplification in the presence of a fluorescent dye, such as SYBR Green, which intercalates into the double-stranded DNA and is detected by a sensor in a qPCR thermal cycler. With each PCR cycle, the increase in cDNA copy number increases the fluorescence signal until a certain threshold is reached. After the planned number of PCR cycles is completed, the computer uses this information to calculate the amount of starting RNA for each tested gene in the sample.
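As a rough illustration of the quantification step described in the paragraph above, the following Python sketch finds the cycle at which a fluorescence trace first reaches a threshold and converts a difference in threshold cycles into a relative starting amount. The function names, the example traces, the threshold value, and the assumption of perfect doubling per cycle (amplification factor 2.0) are illustrative assumptions and are not taken from the description.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the 1-based cycle at which the signal first reaches the threshold, or None."""
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            return cycle
    return None


def relative_start_amount(ct_sample, ct_reference, efficiency=2.0):
    """Starting amount of the sample relative to the reference.

    Assumes a constant per-cycle amplification factor (2.0 = perfect doubling).
    """
    return efficiency ** (ct_reference - ct_sample)


# Hypothetical traces: the signal doubles every cycle from different starting levels.
sample = [0.004 * 2 ** n for n in range(1, 31)]
reference = [0.001 * 2 ** n for n in range(1, 31)]

ct_s = threshold_cycle(sample, threshold=1.0)
ct_r = threshold_cycle(reference, threshold=1.0)
ratio = relative_start_amount(ct_s, ct_r)
```

In this toy case the sample starts with four times the signal of the reference and crosses the threshold two cycles earlier, so the estimated ratio is 4. Real instruments fit the exponential phase rather than taking the first crossing, so this is only a simplification.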
[0007] However, none of the above methods enable global transcriptome-wide analysis of cells. High-throughput parallel detection of thousands of RNAs extracted from biological samples has only become possible with the development of hybridization-based DNA microarrays. Simply put, fluorescently labeled cDNA fragments obtained by reverse transcription of isolated RNA form complementary base pairs with single-stranded DNA oligonucleotide probes immobilized on predetermined regions (features) on the microchip surface. Each of these features contains a probe for a specific target gene. The bound cDNA sequence produces a fluorescent signal. The intensity of this signal depends on the amount of target cDNA bound to the probe at a particular feature on the chip. The intensity detected for each feature is compared to the corresponding feature under different conditions, for example, when the biological sample is treated with an environmental stimulus. As a result, changes in the expression of the test gene can be quantified relatively (Miller et al., 2009, Basic Concepts of Microarrays and Potential Applications in Clinical Microbiology., Clin Microbiol Rev. 22(4): 611-633).
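The relative quantification described in the paragraph above can be sketched as a per-feature log2 ratio of signal intensities between two conditions. This is a minimal sketch under stated assumptions: the feature names, intensity values, and the `floor` parameter used to avoid dividing by very small values are hypothetical, and real microarray analysis additionally involves background correction and normalization.

```python
import math


def log2_fold_changes(control, treated, floor=1.0):
    """Per-feature log2(treated / control); intensities below `floor` are raised to `floor`."""
    changes = {}
    for feature, ctrl_intensity in control.items():
        if feature not in treated:
            continue  # only features measured under both conditions are compared
        ratio = max(treated[feature], floor) / max(ctrl_intensity, floor)
        changes[feature] = math.log2(ratio)
    return changes


# Hypothetical feature intensities for an untreated and a stimulated sample.
control = {"geneA": 200.0, "geneB": 800.0, "geneC": 50.0}
treated = {"geneA": 800.0, "geneB": 400.0, "geneC": 50.0}
fold = log2_fold_changes(control, treated)
```

A positive value indicates a stronger signal in the treated sample, a negative value a weaker one, and zero no change.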
[0008] With improvements in and the increasing affordability of next-generation sequencing (NGS) technologies, RNA sequencing (RNA-Seq) is gradually replacing microarrays. Despite its name, RNA-Seq typically does not involve the direct sequencing of RNA molecules. In fact, conventional RNA-Seq methods sequence cDNA fragments of approximately 200 nt derived from RNA isolated from biological material. The cDNA fragments are ligated to specific sequencing adapters to form a sequencing library. This library is then PCR amplified, followed by a fragment size selection step. Some variations of RNA-Seq deplete highly abundant structural RNAs, such as ribosomal RNA (which can account for 90% of total RNA), adding a further costly step to the method. Various platforms are known in the field for the actual sequencing. The most commonly used sequencing platforms include Illumina, Roche 454, Helicos, PacBio, SOLiD, and Complete Genomics. Illumina, Roche 454, and PacBio are examples of sequencing-by-synthesis, utilizing fluorescently labeled deoxynucleoside triphosphates (dNTPs) detectable by the sequencing instrument. In short, one of four differently labeled dNTPs is added to a growing nucleic acid chain complementary to the nucleotide sequence of the cDNA fragment to be sequenced. The fluorescent label of each dNTP terminates further polymerization of the chain until the fluorescent dye has been imaged and removed by an enzyme. Another labeled dNTP can then be added to the chain. Based on the fluorescence signal and its intensity, the sequencing instrument can call the correct base corresponding to the template cDNA (https://www.illumina.com/documents/products/techspotlights/techspotlight_sequencing.pdf). This type of short-read cDNA sequencing approach sequences libraries to an average of 20 to 30 million reads per sample (Stark et al., 2019, RNA sequencing: the teenage years. Nature Reviews Genetics 20, 631-656). Other approaches used on different sequencing platforms differ from classical sequencing-by-synthesis methods, employing, for example, sequencing-by-ligation (e.g., SOLiD) or Nanopore sequencing technology.
[0009] For subsequent bioinformatics analysis of raw sequencing data, sequence reads are first tested for quality, then mapped to a reference genome, and the transcriptome is assembled. When a reference genome is unavailable, de novo assembly of sequence reads can be performed.
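The first step named in the paragraph above, the quality check of sequence reads, can be illustrated with a small Python sketch. It assumes FASTQ-style quality strings in the common Phred+33 encoding and a mean-quality cutoff of 20; the function names, the cutoff, and the example reads are illustrative assumptions. Mapping and transcriptome assembly are not covered here.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, min_mean_quality=20):
    """Keep the sequences whose mean Phred score meets the cutoff.

    `reads` is an iterable of (sequence, quality_string) pairs.
    """
    return [seq for seq, qual in reads if qual and mean_phred(qual) >= min_mean_quality]


# Hypothetical reads: 'I' encodes Phred 40, '5' encodes Phred 20, '!' encodes Phred 0.
reads = [("ACGT", "IIII"), ("TTTT", "!!!!"), ("GGCA", "5555")]
kept = filter_reads(reads)
```

Only the reads that pass this check would be carried forward to mapping against the reference genome.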
[0010] In recent years, rapid improvements have been observed in the field of RNA-Seq. Novel methods have been developed that enable sequencing of long, unfragmented cDNA as well as direct RNA sequencing, which eliminates the intermediate step of reverse transcription (Stark et al., 2019). Transcriptome-wide analysis of individual cells by single-cell RNA (scRNA) sequencing has presented a major challenge in recent years. ScRNA-Seq has enabled researchers to study rare cell types whose unique transcriptional profiles are often obscured in bulk samples by other, more abundant cell types. ScRNA-Seq also facilitates the identification of previously unknown cells based on unique transcriptional signatures. The various techniques and methods suitable for scRNA-Seq are known to differ in sensitivity, accuracy, number of cells analyzed, and monetary cost, and can therefore be used in a variety of applications and experimental setups (Ziegenhain et al., 2017, Comparative Analysis of Single-Cell RNA Sequencing Methods. Mol Cell. 65(4), 631-643). Generally, individual cells are isolated in single wells or microfluidic droplets. Each well or droplet contains all the chemicals necessary for sequencing library preparation, including cell lysis and RNA isolation, reverse transcription, and adapter ligation (Ziegenhain et al., 2017).
[0011] Transcriptome analysis in single cells faces several significant challenges. Firstly, deep sequencing of RNA requires a minimum amount of starting material; however, the amount of RNA that can be isolated from a single cell is limited, and RNA is highly susceptible to degradation during isolation. Consequently, the informational value of conventional scRNA-Seq methods depends on the availability of starting material. Secondly, reverse transcription of isolated scRNA, along with preceding amplification steps such as "linear amplification," converts the transcriptome information into more stable cDNA molecules, but often introduces a further bias that can ultimately mislead the evaluation of sequencing results.
[0012] Therefore, transcriptome analysis with the limited amounts of RNA extracted from individual cells often requires the use of alternative RNA detection methods. A common approach is the use of fluorescently labeled oligonucleotide probes, which can be detected microscopically in situ, i.e., in the natural cell environment, thus bypassing the need for RNA isolation. These fluorescent in-situ hybridization (FISH) probes localize to and hybridize with complementary RNA molecules in fixed tissue sections or cells. The bound probes can then be detected at subcellular resolution using a fluorescence microscope or specialized sensing device. Today, extensive libraries of RNA-FISH probes, such as Oligopaint (Beliveau et al., 2012, Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes. Proc Natl Acad Sci 109(52), 21301-6), can be designed bioinformatically to cover large portions of the genome and produced through large-scale parallel synthesis. However, the number of genes that can be analyzed simultaneously using FISH-based RNA detection methods is traditionally restricted by the small number of fluorescent tags that can be monitored in parallel. This problem is addressed by methods such as RNA seqFISH+, which uses modified probes with complex multicolor barcodes that significantly increase the number of genes that can be detected simultaneously in a single experiment.
[0013] Similarly, the multiplexed nCounter assay provided by NanoString Technologies is based on RNA detection using probe pairs consisting of a target-specific capture probe that anneals to the RNA molecule of interest and a gene-specific color-coded reporter probe that hybridizes with the target. The complex of RNA, capture probe, and reporter probe is immobilized and aligned on the imaging surface of a specific cartridge. The cartridge is then scanned with a specific microscopy instrument capable of automated fluorescence detection that directly counts the labeled probes (Geiss et al., 2008, Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 26(3), 317-325). However, methods such as nCounter require the purchase of specific research instruments capable of detecting the probes. In addition, the required probes only bind to a pre-selected panel of mRNAs and miRNAs and cannot simply be produced in the laboratory.
[0014] Based on the nCounter assay, NanoString Technologies further developed a method known as GeoMx™ Digital Spatial Profiling (DSP). This method involves co-staining tissue or cell sections with a fluorescent marker and an oligonucleotide "profiling" probe for RNA. Each probe consists of a target-complementary sequence conjugated via a photocleavable linker to a DSP oligonucleotide barcode. Individually selected regions of interest in the tissue/cell section are then irradiated with UV light to release the DSP oligonucleotides from the section. The DSP oligonucleotides are then aspirated and transferred to wells in a microtiter plate. Each well thereby indexes one of the pre-selected regions of interest on the tissue. Finally, the DSP oligonucleotides are hybridized to NanoString barcodes and quantified on the nCounter platform (GeoMx™ product catalog available at https://www.nanostring.com/products/geomx-digital-spatial-profiler/geomx-dsp). Therefore, the GeoMx™ technology also requires the purchase of expensive equipment, as well as the use of specially developed software and specialized DSP-containing probes.
[0015] WO2019/157445 and Merritt et al., 2020 (Multiplex digital spatial profiling of proteins and RNA in fixed tissue. Nature Biotechnology 38, 586-599) describe the above DSP method in more detail and, in addition, describe probe identification and quantification via NGS as an alternative to the nCounter platform. For NGS analysis, DSP oligonucleotides (referred to as "identifier nucleotides" in WO2019/157445 or "indexed oligonucleotides" in Merritt et al.) may contain two primer binding sites and a sequencing adapter for amplification, as well as a unique molecular identifier and a specific nucleic acid sequence that identifies the RNA target (i.e., the barcode sequence). After UV-induced release from the target-complementary sequence, the DSP oligonucleotide is PCR-amplified and sequenced. Thus, the method avoids the need for RNA reverse transcription and utilizes NGS for probe identification. However, the method is technically extremely demanding because it requires considerably complex probe design. The unique barcode sequences present in DSP oligonucleotides can further induce off-target annealing events, thus potentially reducing assay specificity. This problem is exacerbated by the fact that the barcode sequences are located adjacent to unique molecular identifiers and primer binding sites. Barcode sequences are also known to increase the risk of PCR bias. Furthermore, to release the identifier nucleotides, samples may need to be exposed to, for example, UV light, which can have harmful consequences for the nucleic acids present in the sample.
[0016] Spatial transcriptomics methods such as SlideSeq and/or Visium Spatial Gene Expression (Rodriques et al., 2019. Science 363(6434):1463-1467) combine the use of short oligonucleotide probes and NGS for spatial capture and sequencing of RNA molecules released from proximal tissue regions. These methods involve the binding of spatially organized and barcoded oligo-d(T) primers or DNA-barcoded microparticles to the surface of a microscope slide in a position-specific, localized, and aligned manner. When cells or tissues are brought into contact with these slides and processed on them, releasing their RNA contents by permeabilization, the primers capture mRNA molecules in their vicinity. The captured mRNA is reverse transcribed into cDNA that incorporates the spatial barcode of the primer, and is then sequenced. During subsequent analysis, the barcode allows the detected RNA to be traced back to the region in which it was originally found (Ståhl et al., 2016, Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353(6294), 78-82).
[0017] A recently published preprint by Marshall et al. describes a method called Hybridization of Probes to RNA for sequencing (HyPR-Seq; Marshall et al., 2020, HyPR-Seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes, BioRxiv preprint doi: https://doi.org/10.1101/2020.06.01.128314, also published as Marshall et al., 2020, HyPR-Seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes. PNAS 117(52), 33404-33413). This method is based on the hybridization chain reaction (HCR) smFISH protocol. In short, HyPR-Seq first requires two ssDNA initiator probes to anneal to the target RNA via complementary base pairing. In the second step, these initiator probes serve as binding sites for a short hairpin oligo probe, which is then hybridized with a special "readout" probe. This "readout" probe is then ligated to the 5' end of the initiator probe, ultimately yielding a single ssDNA fragment, which is first amplified by PCR and then sequenced. Thus, HyPR-Seq is an NGS-based technique that avoids the need for RNA isolation and reverse transcription, but transcript detection relies on a series of hybridizations of three layers of at least four different oligonucleotides, which limits the number of genes that can be analyzed simultaneously in a single experiment to fewer than 100 (Marshall et al., 2020).
To date, there are no RNA detection methods that rely on next-generation sequencing but do not require steps such as RNA isolation, cDNA production, or signal amplification by sequential hybridization of complementary oligonucleotide probes. Conventional RNA-Seq methods inevitably suffer from RNA degradation and reverse transcription bias, which must be considered when evaluating raw sequencing data during bioinformatics assessments. Therefore, there is a risk of introducing bias into the analysis due to the stochastic sampling of RNA contents from each cell, especially when the data originates from low-quality starting materials such as scRNA. In contrast, many FISH-based methods that rely on the use of oligonucleotide probes for RNA transcript detection can detect RNA at subcellular resolution, but are limited by the number of genes or transcripts that can be monitored simultaneously or by the high cost of purchasing specialized equipment and reagents.
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0018] In light of the current state of this field, the inventors have addressed the challenge of providing a novel and highly sensitive method that can detect RNA at single-cell resolution, overcoming many of the limitations of currently used methods, and that has the ability to detect and quantify all RNA species in a sequence-specific manner, in all possible maturation states and their alternative splicing variants, isoforms, fusion products, or single nucleotide polymorphisms.
[0019] The problems are solved by the present invention, particularly by the subject matter of the claims. The method of the present invention is oligo-based mapping by specified sequencing (oligo-seq) or transcript oligo-based mapping by PCR (TOM-PCR). It is suitable for detecting not only RNA but also all types of nucleic acids present in a sample.
[0020] The present invention provides a method for detecting nucleic acids, comprising the steps of:
(a) preparing a compartment containing nucleic acids;
(b) hybridizing at least one single-stranded DNA oligonucleotide probe, preferably multiple single-stranded DNA oligonucleotide probes, with nucleic acid molecules in the compartment;
(c) removing from the compartment any single-stranded DNA oligonucleotide probes that are not specifically hybridized with any nucleic acid in the compartment; and
(d) identifying, by probe sequencing or probe amplification, the single-stranded DNA oligonucleotide probes that specifically hybridize to nucleic acid molecules within the compartment, thereby determining the nucleic acids corresponding to the probes present in the compartment,
wherein the method does not include a series of probe hybridizations as a means of amplifying nucleic acid detection.
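The readout in step (d) can be pictured computationally as a lookup of each recovered probe sequence in the probe library, followed by counting per target. The following Python sketch is only an illustration under stated assumptions: the probe sequences, the target names, the exact-match lookup, and the `min_reads` cutoff are hypothetical and are not specified in the description.

```python
from collections import Counter

# Hypothetical probe library: probe sequence -> nucleic acid it is complementary to.
PROBE_LIBRARY = {
    "ACGTTGCAAGGCTTAC": "target_1",
    "TTGACCGGTACAGGTA": "target_1",
    "GGCATCGATTCAGCAA": "target_2",
}


def detect_nucleic_acids(recovered_probe_reads, probe_library, min_reads=1):
    """Count sequenced probes per target and report targets with enough reads.

    Reads that do not exactly match a library probe are ignored (an assumption;
    a real pipeline would likely tolerate sequencing errors).
    """
    counts = Counter()
    for read in recovered_probe_reads:
        target = probe_library.get(read.upper())
        if target is not None:
            counts[target] += 1
    return {target: n for target, n in counts.items() if n >= min_reads}


# Probes recovered from one compartment after washing away unbound probes.
reads = ["ACGTTGCAAGGCTTAC", "ttgaccggtacaggta", "ACGTTGCAAGGCTTAC", "NNNNNNNNNNNNNNNN"]
detected = detect_nucleic_acids(reads, PROBE_LIBRARY)
```

Because several probes may target the same nucleic acid, counts are aggregated per target rather than per probe; a target with no recovered probes, such as `target_2` here, is simply absent from the result.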
[0021] Nucleic acids are biopolymers formed from monomeric building blocks called nucleotides. Each nucleotide consists of a 5-carbon sugar, a phosphate group, and a nitrogenous base. In the present invention, the term nucleic acid refers to naturally occurring nucleic acids, i.e., deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The method of the present invention can be used particularly advantageously for RNA detection. Thus, preferably, throughout the present invention, the nucleic acid is RNA.
[0022] DNA exists predominantly in a double-stranded state within cells and adopts a characteristic double helix shape. However, rarely, DNA can exist in a single-stranded state, for example, as part of an intermediate structure formed during DNA transcription and DNA replication, called an R-loop. Furthermore, ssDNA viruses encode genetic information in single-stranded circular DNA molecules.
[0023] Ribonucleic acid (RNA) is a polymeric molecule assembled as a nucleotide chain formed by the sugar ribose, a phosphate group, and one of four nucleic acid bases: adenine (A), guanine (G), cytosine (C), and uracil (U). In contrast to most naturally occurring DNA, RNA is a single-stranded molecule that can form complex secondary and tertiary structures by intramolecular base pairing. Depending on its function, RNA can be assigned to various subclasses and species.
[0024] Nucleic acid detection is interpreted here as meaning that a person skilled in the art can determine whether or not a particular nucleic acid is present in a sample by the method of the present invention. Detection also includes the quantification of nucleic acids, i.e., determining how many copies of a particular nucleic acid are present in the sample. Furthermore, it includes comparison of nucleic acid amounts, i.e., testing whether a particular nucleic acid is more abundant in one sample than in another. The method of the present invention is sufficiently sensitive to detect nucleic acids that are rarely present in a sample. The nucleic acid to be detected may be, for example, any DNA or RNA of interest. In a preferred embodiment, the nucleic acid to be detected is endogenously produced within the compartment. However, the nucleic acid may be exogenous, i.e., introduced into the compartment from an external source, for example, by transfection or microinjection. An exogenous nucleic acid may be, for example, an artificial short hairpin RNA (shRNA) designed to knock down a protein-coding gene. The exogenous nucleic acid to be detected may also be naturally introduced into the compartment, for example, during a viral infection. In yet another embodiment, the exogenous nucleic acid may also be a nucleic acid tag conjugated with another chemical compound, for example, a biomolecule such as a protein, carbohydrate, lipid, or metabolite. For example, in a preferred embodiment, the exogenous nucleic acid may be a nucleic acid tag conjugated with an antibody, which will be further detailed below.
[0025] The nucleic acid-containing compartment of the present invention may be a tissue, an organ, a eukaryotic cell or cell cluster, the nucleus of a eukaryotic cell, the nucleolus of a eukaryotic cell, the cytoplasm of a eukaryotic cell, a mitochondrion, a chloroplast, an exosome, a prokaryotic cell, or a virus.
[0026] Eukaryotic cells may be, for example, plant cells, fungal cells, or animal cells. In a preferred embodiment, the eukaryotic cells are mammalian cells. Mammalian cells may preferably be of human origin, from a human patient having or diagnosed with a certain disease or disorder, or from a healthy subject. The cells may be, for example, tumor cells or stem cells. Such cells are particularly suitable for use in the present invention because they exhibit unique and highly characteristic transcriptional signatures that can be readily identified by the methods described herein. However, mammalian cells may also be non-human cells, such as cells from mammalian genetic model organisms, for example, mice, rats, rabbits, guinea pigs, pigs, or non-human primates.
[0027] The cells are preferably mammalian cells such as human cells, but the investigation and, if desired, comparison of RNA expression, etc., in other organisms such as Escherichia coli, yeast, Arabidopsis, Caenorhabditis elegans, African clawed frog, zebrafish, Nothobranchius furzeri, Drosophila melanogaster, or planarians may also be of interest.
[0028] Nucleic acid-containing compartments may also originate from prokaryotic cells, i.e., bacteria or archaea. The present invention can also be applied, for example, to the study of spatial organization of the microbiome in biofilms or intestinal lumen where FISH has been successfully applied (reviewed in Tropini et al., 2017, The Gut Microbiome: Connecting Spatial Organization to Function. Cell Host & Microbe 21(4), 433-442; Liu et al., 2017, Low-abundant species facilitates specific spatial organization that promotes multispecies biofilm formation. Environ Microbiol. 19, 2893-905; Liu et al., 2019, Deciphering links between bacterial interactions and spatial organization in multispecies biofilms. The ISME Journal 13, 3054-3066).
[0029] In another embodiment, the nucleic acid-containing compartment may be, for example, an intracellular structure or organelle. For example, it may be the nucleus of a eukaryotic cell, e.g., of a mammalian, preferably a human, cell. The nucleic acid-containing compartment may also be a substructure of the eukaryotic cell nucleus, e.g., a nucleolus. The compartment may also include the cytoplasm of a mammalian, e.g., a human, cell. The compartment may include only a single organelle of the cell or a combination of organelles; for example, it may include the cytoplasm and mitochondria but not the nucleus, or it may be a complete human cell including the nucleus, cytoplasm, and mitochondria.
[0030] Exosomes are endosome-derived, small membrane-bound extracellular vesicles that are involved in intercellular communication and often contain mRNA and microRNA (miRNA) as cargo.
[0031] Cells may be derived from cell cultures, or may be analyzed ex vivo from living or dead organisms, e.g., from specific tissues after death, or from whole experimental organisms (e.g., whole Drosophila melanogaster embryos or any stage of Caenorhabditis elegans). Cells may be obtained, for example, from sections of brain regions generally associated with disease. Thus, the cells may be part of a complex tissue containing multiple different cell types. Even if the exact identity of the cells is not known at the time of analysis, it may be determined by the method of the present invention. This method may be suitable for identifying and describing previously uncharacterized cell types based on unique transcriptional signatures.
[0032] The cells targeted by the method of the present invention may be at any stage of the cell cycle. Depending on the purpose of the experiment, it may be helpful to isolate cells at a specific stage of the cell cycle, as transcriptional profiles can change considerably throughout the cell cycle. When comparing gene expression in multiple cells, all cells are preferably in a common cell cycle stage, for example, synchronized. The stage may be interphase, for example, G1, S, or G2 phase, mitosis, or cytokinesis. Preferably, the stage is interphase.
[0033] Alternatively, RNA transcription can be compared across cells at various cell cycle stages. Cells can also vary in their differentiation state. For example, cells may be totipotent or pluripotent stem cells with the ability to differentiate into different cell types. Alternatively, cells analyzed by the method of the present invention may be fully differentiated and fulfill specific functions in tissues. Cells may also be any cell in the process of differentiating from stem cells to terminally differentiated cells.
[0034] Nucleic acid-containing compartments can also be tissue sections containing multiple cells or a portion of them.
[0035] In one embodiment, the nucleic acid-containing compartment is sectioned prior to hybridization of the ssDNA oligonucleotide probe in step (b). This provides multiple fragments or sections. Sectioning of the nucleic acid-containing compartment can be achieved by any suitable method known in the art, e.g., ultrathin cryosectioning or cryomilling, preferably ultrathin cryosectioning. Cryosections are preferably produced in the absence of resin embedding, e.g., by the Tokuyasu method (Tokuyasu, KT, 1973, A technique for ultracryotomy of cell suspensions and tissues. J. Cell Biol. 57, 551-65). The method includes cryoprotection of the fixed tissue by embedding in a saturated sucrose solution, with short-term storage at a temperature of 0-25°C, preferably room temperature (20-25°C) or about 4°C, for example, 2 hours at room temperature, or 2 hours at room temperature followed by 1 day to 1 week at about 4°C. After embedding, the sucrose-embedded cell pellet, tissue, or organism is placed on, for example, a metal stub acting as a sample holder, then frozen in liquid nitrogen and sectioned at -80 to -110°C, for example, about -100°C, depending on the cell type or tissue. Slightly modified methods (Guillot PV, Xie SQ, Hollinshead M., Pombo A., 2004 Fixation-induced redistribution of hyperphosphorylated RNA polymerase II in the nucleus of human cells. Exp. Cell Res. 295, 460-468; Pombo A, Hollinshead M, Cook PR, 1999, Bridging the resolution gap: Imaging the same transcription factories in cryosections by light and electron microscopy. J. Histochem. Cytochem. 47, 471-480) have been shown to provide good results. These methods preserve the overall cellular architecture better than is observed in unfixed frozen sections (McDowall et al., 1989, The structure of organelles of the endocytic pathway in hydrated cryosections of cultured cells, Eur. J. Cell Biol. 49, 281-294) and provide optimal preservation of active RNA polymerase and nuclear architecture (Guillot et al., 2004). The method of Chen et al., 2014, Small 10:3267 can be used as an alternative. Unfixed sections subjected to vitrification can be prepared, for example, by the method described in Lučić V., et al., 2013. J. Cell Biol. 202 (3), 407.
[0036] For example, nuclear sections may have a thickness of about 70 nm to about 1000 nm, preferably 150 to 220 nm or 180 to 200 nm, for a nuclear diameter of 5 to 15 micrometers. In this invention, slice thicknesses of less than 300 nm, for example 150 to 220 nm, preferably about 200 nm, are referred to as "ultrathin." Equipment for cryosectioning cells in sucrose medium is commercially available (e.g., Leica UltraCut UCT 52 ultracryomicrotome). The sections may also be 4-10 μm thick cryostat sections (https://www.protocols.io/view/Stellaris-RNA-FISH-Protocol-for-FrozenTissue-iwgs5v), 50-300 micrometer vibratome sections (https://www.protocols.io/view/exfish-tissue-slice-n6adhae), cells in a monolayer, or cells in suspension.
[0037] From cell sections, sections of isolated compartments, such as nuclear profiles, sections of nuclear components (especially for detecting RNA), and optionally sections of cytoplasm or mitochondria in the absence of mitochondrial components, can be prepared.
[0038] Sectioning results in a collection of fragments, i.e., multiple fragments. The optimal section thickness depends on the compartment size. The compartment can be divided into 5-300 fragments, 10-100 fragments, more preferably 40-60 fragments or about 45-50 fragments. In one embodiment, the fragment thickness is uniform throughout the entire analysis. For other applications, for example, for the relative quantification of several RNA species or as a calibration measure, e.g., if a certain RNA (e.g., actin) is measured relative to others, the thickness of the slices may also vary within a single compartment. Preferably, the DNA oligonucleotide library is constructed using ultrathin frozen sections on microscope slides, preferably laser microdissection slides, under conditions previously established for DNA- and RNA-cryoFISH (Branco, MR & Pombo, A, 2006, Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 4, e138; Xie, SQ et al., 2006, Splicing speckles are not reservoirs of RNA polymerase II, but contain an inactive form, phosphorylated on Serine2 residues of the C-terminal domain. Mol. Biol. Cell 17, 1723-1733; Branco, MR, 2006, Correlative microscopy using Tokuyasu cryosections: applications for immunogold labeling and in situ hybridization. In “Cell Imaging (Methods Express Series)”, ed. D. Stephens, Scion Publishing Ltd. (Bloxham, UK), 201-217; Ferrai, C., et al., 2010, Poised transcription factories prime silent uPA genes prior to activation. PLoS Biology 8, e1000270).
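The relationship between compartment size, section thickness, and the number of fragments can be illustrated with a minimal Python sketch. The function name `estimate_section_count` is hypothetical, and the sketch assumes uniform section thickness, a single cutting axis through the full diameter, and no loss of sections; it is an illustration only and not part of the described method.

```python
import math


def estimate_section_count(compartment_diameter_um: float,
                           section_thickness_nm: float) -> int:
    """Estimate how many sections of uniform thickness span a compartment.

    Assumption (illustrative only): the compartment is cut along one axis
    through its full diameter and no sections are lost.
    """
    if compartment_diameter_um <= 0 or section_thickness_nm <= 0:
        raise ValueError("diameter and thickness must be positive")
    diameter_nm = compartment_diameter_um * 1000.0  # micrometers to nanometers
    return math.ceil(diameter_nm / section_thickness_nm)


# Nuclear diameters in the 5 to 15 micrometer range, cut at about 200 nm.
for diameter_um in (5, 10, 15):
    print(diameter_um, "um ->", estimate_section_count(diameter_um, 200), "sections")
```

Under these assumptions, a nucleus of 10 micrometers sectioned at about 200 nm gives about 50 sections, which is consistent with the range of 40-60 fragments mentioned above.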
[0039] If desired, the compartment is not sectioned before probe hybridization. Instead, the ssDNA oligonucleotide probe is brought into contact with the entire compartment, e.g., a group of cells or whole cells, which may or may not contain the entire compartment or a complete nucleus or another organelle. When the compartment of the present invention is, for example, a group of cells in a tissue, the probe of the present invention may be hybridized with the group of cells in suspension. The cells are then washed to remove unbound or weakly bound probes, followed by separation into single-cell fragments by FACS. The hybridized probes may then be extracted from each individual cell for sequencing library preparation. The individual libraries may then be pooled and sequenced.
[0040] Alternatively, cell populations can be compartmentalized into oil droplets, as in Droplet-Seq and HyPR-Seq. Single cells can be isolated with a PCR mix (e.g., 1×EvaGreen Supermix) and barcoded beads, as designed, for example, in Macosko et al. (2015, Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161(5), 1202-1214) and provided by Chemgenes (https://www.fishersci.com/shop/products/macosko-2011-10-b/nc0927472). Placing the prepared cells in the PCR mix at 95°C is sufficient to denature the probes from their targets directly into the PCR mix, where they are then amplified, thus eliminating the need for cell lysis before PCR; PCR amplification is performed within the compartmentalized oil droplets. Barcoded beads individually barcode all probes from the same cell, providing single-cell resolution. The oil droplets are denatured, and the entire PCR product is sequenced. Commonly used droplet machines include the BioRad QX200, the Quanti3D system, and the Nadia system (Dolomite Bio).
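How bead barcodes provide single-cell resolution can be sketched in Python as follows. The read representation (a pair of cell barcode and probe identifier), the example identifiers, and the function name `count_probes_per_cell` are assumptions made for illustration only; real sequencing reads would first need to be parsed and matched against the probe library.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_probes_per_cell(
    reads: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, int]]:
    """Group sequenced probes by the bead barcode of their droplet.

    Each read is a hypothetical (cell_barcode, probe_id) pair. All probes
    amplified in the same droplet share one barcode, so grouping by
    barcode recovers one probe profile per cell.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cell_barcode, probe_id in reads:
        counts[cell_barcode][probe_id] += 1
    return {cell: dict(probe_counts) for cell, probe_counts in counts.items()}


# Hypothetical reads from two droplets.
example_reads = [
    ("AAACCTGA", "Oct4_probe1"),
    ("AAACCTGA", "Sox2_probe1"),
    ("AAACCTGA", "Oct4_probe1"),
    ("TTGGCATC", "Sox2_probe1"),
]
print(count_probes_per_cell(example_reads))
```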
[0041] The compartments may be on a solid support, such as a slide or cell monolayer as in CytoSpin, or the compartments may be in a suspension (Figure 6b). The cells and / or compartments are preferably immobilized to preserve nucleic acids and allow the probe to access the compartment of interest. Various immobilization regimens may cause loss of proteins or nucleic acids from cells or specific compartments and can be used for advantageous detection of RNA in specific compartments, such as the nucleus (Levsky, JM et al., 2002, Single-cell gene expression profiling. Science 297, 836-840; Pombo, A., 2003, Cellular genomics: which genes are transcribed when and where? Trends Biochem. Sci. 28, 6-9). In one embodiment of the present invention, cells and / or compartments may be treated with a mild surfactant, such as Triton X-100, or a saponin or Tween-20, up to 0.05% to 5%, preferably 0.1% to 0.5%, for 1 to 120 minutes, preferably 10 to 30 minutes, or with another agent such as a protease, to enhance primary probe access to nucleic acids within the compartment of interest.
[0042] In step (b), hybridization of the ssDNA oligonucleotide probe to nucleic acids, such as RNA, is preceded by the fixation of the nucleic acid-containing compartment, where, if desired, the nucleic acid-containing compartment is sectioned after fixation.
[0043] Numerous fixation methods are known in this field. Fixation may include, for example, the use of precipitating fixatives such as methanol, ethanol, or acetone. Precipitating fixatives are particularly suitable for fixing frozen sections.
[0044] In preferred embodiments, the immobilization of nucleic acid-containing compartments is achieved via a crosslinking agent such as formaldehyde or glutaraldehyde. Crosslinking can also be achieved via UV or ionizing irradiation. However, crosslinking by irradiation is less desirable because irradiation is a potent mutagen that can destroy the integrity of nucleic acids such as RNA and DNA.
[0045] Formaldehyde is preferably used as a crosslinking agent at a concentration of, for example, 0.5-8%, preferably 1-8%, 2-8%, or most preferably 4-8% (total w/w), in a buffered solution of, for example, 250 mM HEPES-NaOH pH 7.0-8.0 or in PBS or cytoskeleton (CSK) buffer (Tripathi et al., 2015, RNA Fluorescence In Situ Hybridization in Cultured Mammalian Cells. In: Carmichael G. (eds) Regulatory Non-Coding RNAs. Methods in Molecular Biology (Methods and Protocols), vol 1206. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1369-5_11). For mammalian cells, the conditions are preferably pH 7.6-7.8 and 4% formaldehyde for 10 minutes to 24 hours, or, for example, 4% for 10 minutes followed by 8% for 2 hours. In the case of experimental organisms, for example, crosslinking can be achieved by perfusing tissue or the entire organism with HEPES-buffered formaldehyde solution (e.g., 4%) or PBS-buffered formaldehyde solution, preferably for at least 30 minutes, followed by 30 minutes to 1 hour in ice-cold 4% formaldehyde in 250 mM HEPES-NaOH pH 7.6, and then tissue dissection and 1 to 3 hours in ice-cold 8% formaldehyde in 250 mM HEPES-NaOH pH 7.6 (Moeller et al., 2012, Proteomic analysis of mitotic RNA polymerase II complexes reveals novel interactors and association with proteins dysfunctional in disease. Mol. Cell. Proteomics 11(6):M111.011767; Winick-Ng et al., 2020, Cell-type specialization in the brain is encoded by specific long-range chromatin topologies, Biorxiv. https://doi.org/10.1101/2020.04.02.020990; https://www.protocols.io/view/Stellaris-RNA-FISH-Protocol-for-FrozenTissue-iwgs5v; https://www.protocols.io/view/exfish-tissue-slice-n6adhae).
[0046] Tissue or cell fixation can be performed at various intensities. The method of the present invention enables the detection of nucleic acids even under extremely strong crosslinking conditions, such as treatment with 4-8% formaldehyde for several hours. High levels of crosslinking more effectively preserve cells and are advantageous for preserving nucleic acid species during subsequent processes. Therefore, high crosslinking prepares cells for time-consuming applications or subjecting them to high mechanical stress. Surprisingly, the present invention can be applied to highly crosslinked cells or tissues, particularly when combined with sectioning.
[0047] In another embodiment, the nucleic acid-containing compartment is not immobilized before oligonucleotide probe hybridization and is optionally vitrified thereafter. Vitrification refers to the rapid freezing of cells, tissues, or entire organisms to preserve cellular structure and avoid any artifacts that may be introduced by chemical crosslinking agents (e.g., formaldehyde).
[0048] In step (b) of the method of the present invention, a nucleic acid-containing compartment or, preferably, a fragment thereof, such as an ultrathin fragment, is brought into contact with a plurality of single-stranded oligonucleotide probes. Probes in general are stretches of oligonucleotides containing or consisting of DNA, RNA, or chemically modified nucleotides, for example, LNA (Locked Nucleic Acid), which can be used to detect the presence of nucleic acids in a sample. In the present invention, the probes are single-stranded DNA oligonucleotides. DNA oligonucleotides are typically short DNA molecules with a length of 150 nt or less. They can be manufactured as single-stranded molecules having any sequence that can be specified by the user, or the sequence may be random. These oligonucleotides bind to the target nucleic acid via sequence-specific complementary base pairing to form a stable double helix. The greater the number of complementary base pairs between the oligonucleotide probe and its target sequence, the stronger the non-covalent bond between the two strands. Stringent washing removes non-specifically bound probes, leaving only the strongly paired strands bound to each other. The hybridization properties (melting temperature), and thus the specificity, can be increased by using chemically modified RNA molecules, including LNAs, which may be advantageous, for example, for detecting specific single nucleotide polymorphisms or less abundant RNA species in clinical samples (Dominguez and Kolodney, 2005, Wild-type blocking polymerase chain reaction for detection of single nucleotide minority mutations from clinical specimens. Oncogene 24, 6830-6834). Multiple probes refers to any number of DNA oligonucleotides greater than one. The exact number of probes forming the so-called probe library of the present invention may vary and may correspond to the number of genes being tested by the present invention.
[0049] In the present invention, the term “the method herein does not involve a series of probe hybridizations as nucleic acid detection and amplification means” should be interpreted as excluding the use of secondary “helper / further” probes that hybridize with an initial probe bound to a target. The term also means that the method of the present invention does not involve the step of releasing the initial probe from its target, followed by its replacement with a novel probe.
[0050] Accordingly, the method of the present invention preferably does not include a ligation step, i.e., a step in which, for example, two or more probes of the present invention are ligated together, for example, chemically or enzymatically, or one or more probes of the present invention are ligated with another type of probe, in order to facilitate probe amplification and gene detection.
[0051] In a preferred embodiment, the probe of the present invention does not contain any molecular tags, such as DNA, RNA, or biotin tags. The probe preferably does not contain any molecular tags bound to the probe via cleavage motifs, such as restriction enzyme sites or photocleavable linkers. Accordingly, the method of the present invention does not include a step of cleaving molecular tags, such as DNA or RNA tags, from the probe, which preferably obviates the need for, for example, UV treatment or restriction enzyme treatment.
[0052] In a preferred embodiment, the nucleic acid to be detected by the probe used in the method of the present invention, i.e., the target nucleic acid, is RNA.
[0053] The method described herein does not sequence the RNA molecule itself, nor the cDNA produced therefrom, nor any cleavable (e.g., DNA) label or tag, but rather identifies the target RNA by direct sequencing or amplification of a complementarily bound ssDNA oligonucleotide probe. Therefore, the method of the present invention preferably does not include a cDNA production step. The contact in step (b) is carried out under conditions that allow the probe to bind (or hybridize) with the RNA molecule in the compartment. Thus, the probe hybridizes with the RNA molecule in the compartment. The conditions may be selected so as to prevent hybridization of the probe to DNA that may still be present in the compartment, for example, by treating the compartment with DNase before ssDNA probe hybridization to remove dsDNA and ssDNA (Pombo, A., et al., 1994, Adenovirus replication and transcription sites are spatially separated in the nucleus of infected cells. EMBO J. 13(12), 5075-5085). Alternatively, specific dsDNases can be used after probe hybridization to remove dsDNA. Suitable dsDNases can be commercially obtained, for example, from ThermoFisher (Cat. No. EN0771).
[0054] Examples of conditions for hybridizing the probe with RNA include a saline-sodium citrate (SSC) buffer containing, for example, 10-50%, preferably 30%, formamide and 5-20%, preferably 10%, dextran sulfate, at a temperature of 30-65°C, e.g., 37-55°C. Optionally, the buffer may contain ribonucleoside vanadyl complexes (RVCs), or, if the hybridization buffer does not contain formamide, RNaseOUT™. Other appropriate ribonuclease inhibitors may be added. The buffer may further contain tRNA (e.g., from yeast) as a nonspecific blocker at a concentration of, for example, about 1 mg/mL. Hybridization may be carried out for, for example, at least 15 minutes to 1 week, preferably at least 30 minutes, for example, at least 45 minutes, at least 1 hour, at least 2 hours, or at least 5 hours. It may also be continued for longer periods, for example, overnight, two nights, or three nights. Appropriate conditions are also described in the examples below. It should be noted that temperature and formamide concentration not only affect probe hybridization but also regulate the stringency of the stringent washing used to remove unbound or partially hybridized probes. Thus, the conditions selected above greatly contribute to the sensitivity and specificity of the method of the present invention.
[0055] In another embodiment, the nucleic acid to be detected by the method of the present invention may be DNA, preferably single-stranded DNA.
[0056] Therefore, the contact step (b) is carried out under conditions that allow the probe to hybridize with the DNA molecules in the compartment. For example, when the DNA to be detected is dsDNA, it is denatured to obtain ssDNA that is accessible for probe hybridization. Optionally, RNase may be used to pre-treat the compartment before or after probe hybridization to remove RNA from the sample. Probe hybridization may be carried out under the same conditions as described above for RNA detection.
[0057] For accurate DNA detection, the method of the present invention may be further adapted to enable reliable differentiation between double-stranded DNA probe complexes and naturally occurring dsDNA fragments (e.g., genomic or plasmid DNA). For example, oligonucleotide probes may be modified with barcode sequences or unique molecular tags to facilitate identification, as described herein. Alternatively, genomic DNA modifications such as cytosine methylation may be considered to remove genomic DNA from the analysis.
[0058] In yet another embodiment, the method of the present invention enables the simultaneous detection of both RNA and DNA within a compartment. For the detection of both RNA and dsDNA, for example, a probe may first be added to hybridize with RNA without denaturing the dsDNA, and then another set of probes, for example, having various barcode sequences or tags, may be added to hybridize with the DNA after denaturation. RNA and ssDNA can also be detected simultaneously as described above for RNA detection, provided that conditions that disrupt ssDNA are avoided.
[0059] In preferred embodiments, the method of the present invention does not involve an RNA isolation step. Advantageously, nucleic acid isolation is not performed before step b). The method described herein preferably utilizes an ssDNA oligonucleotide probe that hybridizes in situ with the nucleic acid, i.e., within the compartment where the nucleic acid naturally exists. Therefore, the method of the present invention is suitable for localizing the detected nucleic acid within the compartment.
[0060] In another embodiment, the method of the present invention can also be used for the detection of nucleic acids isolated from a compartment. The nucleic acids can be detected, for example, in vitro, in a suitable buffer in a test tube. In such a scenario, the ssDNA oligonucleotide probe can be added directly to the isolated nucleic acid in the test tube for hybridization. Alternatively, probe hybridization can be performed in situ, after which the nucleic acid and the fully bound probe can be isolated from the nucleic acid-containing compartment.
[0061] The ssDNA oligonucleotide probes used in the method of the present invention can be produced by any method known in the art. For example, each probe can be synthesized separately (Femino, AM, et al., 1998, Visualization of Single RNA Transcripts in Situ. Science, 280(5363), 585-590). Recently, a large number of probes have been prepared by large-scale parallel synthesis on solid substrates such as microchips, before amplification and release into solution, if desired (Beliveau et al., 2012; Beliveau et al., 2017, In situ super-resolution imaging of genomic DNA with OligoSTORM and OligoDNA-PAINT. Methods Mol Biol. 1663, 231-252; https: / / oligopaints.hms.harvard.edu / protocols).
[0062] The ssDNA oligonucleotide probe of the present invention may have a length of about 55 to 150 nucleotides (nt), preferably 70 to 120 nt or about 80 to 115 nt. In the following examples and preferred embodiments, the probe may have a length of 75 to 85 nt or 107 to 113 nt. For example, the probe length may therefore be about 75 to 85 nt.
[0063] Each probe typically contains a target region whose nucleotide sequence is complementary to the nucleotide sequence of the target nucleic acid (e.g., target RNA), flanked by a pair of primer regions, such as universal primer regions (Figure 1; probe structure). The nucleotide sequence of this central target region specifically hybridizes with the individual nucleic acid. Hybridization is the process by which two single-stranded nucleic acid molecules anneal to each other via complementary base pairing, i.e., the nitrogenous base adenine present in one nucleic acid strand pairs with thymine (DNA) or uracil (RNA) in the other strand, forming hydrogen bonds, while cytosine pairs with guanine. Importantly, the target region of the probe of the present invention is designed to exhibit high stringency with respect to each of its target nucleic acids to ensure specific binding, i.e., it cannot anneal with different nucleic acids. Preferably, the target region exhibits 100% complementarity with the sequence in the target nucleic acid. Thus, the method described herein has the ability to distinguish nucleic acid molecules that differ by a single nucleotide. Possible methods for designing highly sensitive oligonucleotide probes capable of distinguishing various alleles by SNP detection are detailed in Beliveau et al., 2014, utilizing publicly available SNP collection databases for various species. The target region of the probe is designed to have stability that allows the double strand formed by the probe and its target nucleic acid to withstand stringent washing at a temperature of 37°C to 65°C, for example, 45°C to 50°C or more preferably 47°C, in order to remove partially hybridized probes before the denaturation of specifically hybridized probes, i.e., before the probe is removed from its target nucleic acid. Removal of excess probe by stringent washing at a higher temperature ensures high specificity of the detection method. In the following examples, stringent washing was performed in a buffer containing 40% formamide at 47°C in the presence of dextran. The target region may have a length of, for example, 20 to 50 nt, preferably 30 to 45 nt, such as 30 nt, 31 nt, 32 nt, 33 nt, 34 nt, 35 nt, 36 nt, 37 nt, 38 nt, 39 nt, 40 nt, 41 nt, 42 nt, 43 nt, 44 nt, or 45 nt. In two specific preferred embodiments, the target region may have a length of approximately 35 nt or approximately 39–45 nt. The target region may be longer than 45 nt, for example, 45–50 nt. Increased target region length increases probe specificity. However, the target region may be shorter than 30 nt, for example, to hybridize with small target nucleic acids. For example, mature miRNAs are only 20–25 nt long. In that case, the length of the target region of the probe corresponds to the length of the target miRNA. The target region of the probe may have a minimum length of 20 nt to still ensure specific and selective detection of the target nucleic acid. Besides probe length, hybridization stringency may also depend on the exact nucleotide composition of the probe. High GC (guanine/cytosine) content usually correlates with high stringency.
[0064] In another embodiment, the target region or probe has less than 100% complementarity with the nucleic acid of interest, e.g., 85-99% or 90-95% complementarity. Incomplete complementarity may be desirable for applications where, for example, the exact genomic sequence of the target nucleic acid with 100% reliability is unknown. In particular in humans, SNPs between various alleles may be unknown. A probe that allows for a certain amount of mispairing can therefore be used to evaluate complex gene expression from two different alleles.
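The notion of percent complementarity between a probe target region and its target RNA can be made concrete with a short Python sketch. The function name `percent_complementarity` and the example sequences are hypothetical. The sketch assumes equal-length sequences written 5' to 3', antiparallel pairing, and Watson-Crick pairs only (G-U wobble pairs are counted as mismatches); it does not model hybridization thermodynamics.

```python
# Allowed (probe DNA base, target RNA base) Watson-Crick pairs.
_WATSON_CRICK_PAIRS = {("A", "U"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(probe_target_region: str, target_rna: str) -> float:
    """Percentage of probe positions that base-pair with the target RNA.

    Both sequences are given 5'->3'. Because the strands pair in
    antiparallel orientation, the probe is compared against the
    reversed target sequence.
    """
    probe = probe_target_region.upper()
    rna = target_rna.upper()
    if not probe or len(probe) != len(rna):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        (probe_base, rna_base) in _WATSON_CRICK_PAIRS
        for probe_base, rna_base in zip(probe, reversed(rna))
    )
    return 100.0 * matches / len(probe)


# Hypothetical 20-nt target and two candidate probe target regions.
target = "AUGGCUAGCUAGGCUAACGU"
perfect_probe = "ACGTTAGCCTAGCTAGCCAT"   # full reverse complement
mismatch_probe = "TCGTTAGCCTAGCTAGCCAT"  # first base altered
print(percent_complementarity(perfect_probe, target))
print(percent_complementarity(mismatch_probe, target))
```

In this example, a single mismatch in a 20-nt target region corresponds to 95% complementarity, which lies within the 85-99% range mentioned above.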
[0065] In an optional embodiment, the target region of the probe does not have to be designed to hybridize with a predetermined target nucleic acid. Instead, the target region may consist of a randomly selected stretch of nucleotides. Such a “random” oligonucleotide probe may be added to the compartment with the intention of hybridizing with an unknown RNA or DNA molecule.
[0066] Universal primer regions on each side of the target region allow for probe amplification and for the addition of sequencing adapters (Figure 5a). Optionally, the universal primer regions can be used as target sites for fluorescence in situ hybridization (FISH) probes, for example during optimization, to allow for direct microscopic verification of the success of probe hybridization (Figure 4). Each universal primer region has a length of 15–30 nt, preferably 20–25 nt, e.g., 21–23 nt. Optionally, the universal primer region has a length of 22 nt.
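The probe layout described above, a central target region with a universal primer region on each side, can be sketched in Python as follows. The primer sequences, the example RNA segment, and the function names are hypothetical and chosen only for illustration; they are not sequences from the present disclosure. The sketch checks only the length ranges stated in the text and performs no specificity filtering.

```python
# Hypothetical 22-nt universal primer regions (illustrative placeholders).
FORWARD_PRIMER_REGION = "GACTACGATCGATCGTAGCTAG"
REVERSE_PRIMER_REGION = "CTAGGCATCGATGCATCGGATC"

_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_of_rna(rna_segment: str) -> str:
    """Return the DNA sequence (5'->3') complementary to an RNA segment."""
    dna = rna_segment.upper().replace("U", "T")
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(dna))


def assemble_probe(target_rna_segment: str,
                   forward_primer: str = FORWARD_PRIMER_REGION,
                   reverse_primer: str = REVERSE_PRIMER_REGION) -> str:
    """Build primer region + target region + primer region as one ssDNA probe."""
    if not 20 <= len(target_rna_segment) <= 50:
        raise ValueError("target region should be 20-50 nt")
    for primer in (forward_primer, reverse_primer):
        if not 15 <= len(primer) <= 30:
            raise ValueError("primer regions should be 15-30 nt")
    target_region = reverse_complement_of_rna(target_rna_segment)
    return forward_primer + target_region + reverse_primer


# A hypothetical 35-nt RNA segment flanked by two 22-nt primer regions.
example_segment = ("AUGC" * 9)[:35]
probe = assemble_probe(example_segment)
print(len(probe))  # 22 + 35 + 22 = 79 nt
```

With 22-nt primer regions and a 35-nt target region, the assembled probe is 79 nt long, which falls within the 75 to 85 nt probe length given for the examples.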
[0067] Preferably, the single-stranded DNA oligonucleotide probe further includes a unique molecular identifier (UMI). The UMI is a randomly assembled short nucleotide sequence that serves as a unique molecular tag. Each UMI individually labels a specific probe, thus facilitating reliable probe identification while effectively reducing quantitative bias introduced by errors and amplification. Labeling of the probes of the present invention with UMIs is preferred in applications where nucleic acids, such as RNA, should be quantified within a compartment. The UMI can further facilitate the identification of rare transcript variants, for example, when processing RNA in samples with small overall amounts.
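How UMIs can reduce amplification bias during quantification can be sketched in Python: reads that share both the probe identity and the UMI are treated as copies of one original probe molecule and counted once. The read representation, the example identifiers, and the function name `count_unique_molecules` are assumptions for illustration; the sketch uses exact UMI matching and does not correct for sequencing errors within UMIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_unique_molecules(reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct UMIs per probe.

    Each read is a hypothetical (probe_id, umi) pair. PCR duplicates
    carry the same UMI as their template and therefore collapse into
    a single count.
    """
    umis_per_probe: Dict[str, Set[str]] = defaultdict(set)
    for probe_id, umi in reads:
        umis_per_probe[probe_id].add(umi)
    return {probe_id: len(umis) for probe_id, umis in umis_per_probe.items()}


# Hypothetical reads: the first two are amplification copies of one molecule.
umi_reads = [
    ("Oct4_probe1", "ACGTAC"),
    ("Oct4_probe1", "ACGTAC"),
    ("Oct4_probe1", "TTGACA"),
    ("Sox2_probe1", "GGCATT"),
]
print(count_unique_molecules(umi_reads))
```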
[0068] Optionally, ssDNA oligonucleotide probes may further include at least one (e.g., at least two) identification barcode sequences. Barcodes differ from UMIs in that they label a collection of different probes that can be grouped by common features, rather than uniquely identifying individual probes (Beliveau et al., 2012). For example, a barcode sequence can label all probes targeting the same specific gene for the same purpose. Separately and / or in addition to this, they may be shared by all probes targeting the same gene region of the gene, e.g., exons and introns or exon / intron junctions. Barcodes can also aggregate probes for genes that share a specific function or common pathway. Barcode sequences may further serve as primer binding regions during PCR-based quantification.
[0069] Optionally, single-stranded DNA oligonucleotide probes do not contain UMIs. Optionally, single-stranded DNA oligonucleotide probes do not contain barcode sequences, and in certain embodiments, neither UMIs nor barcode sequences. These components of the probe are unnecessary and, since they also need to be sequenced or amplified in the method of the present invention, may introduce bias that can be advantageously avoided. Barcodes can further introduce off-target annealing events, i.e., annealing with RNA not intended for targeting, which can reduce assay specificity and introduce noise into the data. This can be further problematic when adjacent to UMIs or primer regions or homologous sites, as this further increases the chances of off-target binding or increases the number of DNA barcodes used. Therefore, it is preferable that the probe does not contain barcode sequences.
[0070] While it may be of interest to implement the method of the present invention with a single type of probe, for example, to elucidate whether a specific RNA is expressed in a cell, the multiple probes are typically multiple different probes; that is, the single-stranded DNA oligonucleotide probes used in the method of the present invention specifically hybridize to multiple target nucleic acids present in a compartment, e.g., multiple target RNAs, or to various regions on a single target nucleic acid. For example, probes may be used to target a set of RNAs transcribed from genes that serve as specific cell type and/or differentiation markers, or to detect viral transcripts. Probes may target transcripts of well-known stem cell markers such as Oct4 or Sox2. Probes may also detect transcripts of marker genes that represent and track the differentiation of stem cells into different cell types or cell lineages. Another subset of genes that can be targeted by multiple probes may include, for example, a set of inflammatory cytokines, members of a specific signaling cascade, or specific cancer markers and other disease markers. Probes may also be specifically designed to detect cell cycle marker genes such as cyclin E or B to classify different compartments or fragments based on their stage in the cell cycle. Further probes targeting known housekeeping genes that are common to all or most of the analyzed cell types and unresponsive to environmental changes may be included, for example, as controls to normalize differential gene expression analysis.
[0071] A typical probe library that can be used may target hundreds of nucleic acids, such as mRNA transcribed from genes like marker genes. Such probe libraries may be suitable for multiple applications. For example, a probe library may target at least two nucleic acids, such as 2-100,000, 3-50,000, 4-25,000, 5-10,000, 10-5,000, 15-2,000, 20-1,000, 25-500, 30-250, 40-200, or 50-100 nucleic acids, such as mRNA. It may also target at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, or at least 1,000 nucleic acids, such as mRNA. In one embodiment, the probe library may also target fewer than 50 nucleic acids, such as RNA. For example, when the probes are detected by PCR-based probe amplification rather than large-scale parallel sequencing, probe libraries can be prepared that target up to 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, or 500 nucleic acids. However, in a preferred embodiment, probes from a common probe library or a fully common library may be used regardless of whether the probes are identified by sequencing or amplification.
[0072] To maximize the amount of information obtainable by the method of the present invention, a single nucleic acid, such as an mRNA transcript, can be targeted at various locations using, for example, multiple probes. Multiple probes may cover, for example, several exons, introns, and exon / intron or exon / exon junctions of a single gene to detect different splicing variants. Alternatively, probes may be specifically designed to target or recognize allele variants of transcripts from different genes, for example, due to nucleotide deletions, insertions, inversions, or substitutions. The probes used in this method may exhibit sufficient specificity to accurately distinguish between different transcripts at a single SNP. Gene fusions can be detected by designing probes that target known fusion junctions or by comparing the detection levels of probes that bind to the 3' or 5' ends of transcripts from different genes, respectively.
[0073] In one embodiment, a probe may cover the entire range of nucleic acids, such as transcripts from genes, including all exons and introns. Therefore, the number of probes that can bind to a single transcript may depend on the length of the encoding gene. For example, a 2000nt transcript may be targeted with more probes than 500nt RNAs. Designing probes that cover the entire length of the transcript of a gene of interest provides the maximum amount of information about the presence of the gene or transcript, particularly splice variants, isoforms, or fusion products. For example, probes for all exons and introns of a gene, or, if appropriate, all splice variants, may be used. However, in some experiments, especially when many genes are under consideration, it may be sufficient to determine which genes are actively transcribed or not. Therefore, to reduce cost and workload, RNA molecules may be targeted by only a few probes or, at most, a single specific probe. This is possible because the presence of RNA is determined by the method of the present invention through the sequencing or amplification of probes. In contrast, FISH-based methods for detecting RNA by microscopy typically require hybridization of multiple probes with transcripts to produce a detectable FISH signal. Similarly, FISH-based HyPR-Seq requires hybridization of at least one pair of initiator probes to the target RNA, followed by a series of annealing steps involving metastable hairpin oligos and "readout oligos."
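The dependence of the number of possible probes on transcript length can be illustrated with a short Python sketch that tiles target regions along a transcript. The function name `tile_target_regions`, the default target region length of 35 nt, and the non-overlapping layout are assumptions for illustration; actual probe design would additionally filter candidate regions for specificity and hybridization properties.

```python
from typing import List, Optional, Tuple


def tile_target_regions(transcript_length: int,
                        target_length: int = 35,
                        step: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of candidate target regions.

    Coordinates are 0-based and end-exclusive. By default the regions
    are laid end to end without overlap (step equals target_length).
    """
    if transcript_length < 0 or target_length <= 0:
        raise ValueError("lengths must be positive")
    if step is None:
        step = target_length
    if step <= 0:
        raise ValueError("step must be positive")
    return [
        (start, start + target_length)
        for start in range(0, transcript_length - target_length + 1, step)
    ]


# A longer transcript accommodates more non-overlapping target regions.
for length_nt in (500, 2000):
    print(length_nt, "nt ->", len(tile_target_regions(length_nt)), "regions")
```

Under these assumptions, a 2000 nt transcript accommodates 57 non-overlapping 35-nt target regions, whereas a 500 nt RNA accommodates 14. Any subset of these, down to a single probe, may be selected when only the presence of the transcript is of interest.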
[0074] Therefore, nucleic acids, such as transcripts from genes, can be targeted by as few as 1-100, 1-50, 1-20, 1-10, 1-5, or 1-3 oligonucleotide probes, as well as about 1-100, 1-700, 100-500, 150-400, or 200-300 oligonucleotide probes. The probe concentration sufficient for effective RNA detection according to the present invention may depend not only on the size and complexity of the initial probe library, but also on the section under investigation and the pre-probe hybridization processing method. For effective RNA detection on cell frozen sections about 200 nm thick, each oligonucleotide probe may be used at a concentration of about 0.1-1 μM, preferably 0.2-0.8 μM, 0.3-0.7 μM, or 0.4-0.6 μM, in a suitable buffer solution (e.g., water). In a particularly preferred embodiment, the probe may have a concentration of 0.5 μM. Preferably, about 0.1 nM per unique probe in the library.
[0075] Preferably, the single-stranded DNA oligonucleotide probe specifically hybridizes with a target RNA selected from the group including mRNA, rRNA, snRNA, snoRNA and tRNA (RNAs involved in protein biosynthesis), miRNA, siRNA, shRNA and piRNA (small regulatory RNAs), eRNA and lncRNA (long regulatory RNAs), circRNA, tracrRNA, crRNA, retrotransposons, viral RNA, satellites, TERC, vtRNA, DDRNA, PROMPT, or combinations thereof.
[0076] In principle, the present invention therefore enables the detection and identification of all RNA molecules found in a compartment.
[0077] In a preferred embodiment, the target RNA is mRNA. mRNA transmits genetic information from the DNA blueprint to ribosomes, where the encoded information is translated into polypeptide chains. Therefore, detection of mRNA in a compartment provides valuable information about protein-coding gene expression. In a particular embodiment, the single-stranded DNA oligonucleotide probes of the present invention may form a library that specifically hybridizes with substantially all mRNA present in the compartment. Using such a probe library, the method is not limited to the detection of equivalent small subsets, such as marker mRNAs, but rather enables the detection of the entire protein-coding transcriptome. Thus, the method of the present invention also provides a means for testing differences in comprehensive gene expression in response to, for example, changes in environmental conditions, aging, or altered (i.e., impaired or enhanced) cellular signaling.
[0078] In eukaryotic cells, mRNA is transcribed from DNA in the cell nucleus by RNA polymerase as a long precursor transcript known as precursor mRNA (pre-mRNA). Before being transported to the cytoplasm for translation into protein, pre-mRNA undergoes a series of processing steps in the nucleus. In one of these steps, called splicing, non-coding intragenic regions (introns) are excised from the nascent transcript and the remaining coding regions (exons) are joined together to form mature mRNA. A single pre-mRNA can be spliced in various ways depending on which segments the splicing machinery (the spliceosome) treats as introns or exons. For example, exons may be extended or skipped, and introns may be retained in the final transcript. Alternative splicing results in different mature mRNA variants that can produce a range of unique proteins with various functional properties. The probes used in this invention can detect pre-mRNA and / or such splice variants. The probes can further be designed to detect gene isoforms, i.e., mRNAs that arise from the same gene or locus but differ from each other because, for example, they were transcribed from different transcription start sites (TSSs).
[0079] However, mRNA represents only a small portion of the total RNA present in a cell. It is estimated that only 2% of the entire human genome codes for proteins. The remaining 98% of RNA transcripts found in mammals are considered non-coding (Dhanoa, JK, Sethi, RS, Verma, R., Arora, JS, & Mukhopadhyay, CS, 2018, Long non-coding RNA: its evolutionary relics and biological implications in mammals: a review. Journal of animal science and technology, 60, 25). Therefore, the ssDNA oligonucleotides used in the methods of the present invention can hybridize with and detect non-coding RNA, either instead of or in addition to mRNA. For example, the probe can detect transfer RNA (tRNA). tRNA transports amino acids to ribosomes and plays a central role in the translation of mRNA into polypeptide chains. The probe can also be specifically designed to detect ribosomal RNA (rRNA). rRNA is the most abundant class of RNA in cells and associates with a set of proteins to form ribosomes. rRNA not only performs structural and functional roles but also exhibits enzymatic activity by catalyzing peptide bond formation between the amino acids that make up polypeptide chains. Therefore, rRNA belongs to the class of ribozymes. Due to its abundance, rRNA is usually excluded from classical RNA-Seq experiments before sequencing, either by selective depletion or by exclusive pre-selection of polyadenylated transcripts. However, comparisons of rRNA sequences across species provide useful information about phylogenetic relationships and species diversity.
[0080] Another example of a ribozyme detectable by the probe in the method of the present invention is small nuclear RNA (snRNA), which, along with a group of proteins, forms the spliceosome. snRNA catalyzes the removal of introns from pre-mRNA during the splicing process. The processing of rRNA, snRNA, and tRNA is regulated by the family of small nucleolar RNAs (snoRNAs) within the nucleolus, a small region of the nucleus.
[0081] Regulatory RNAs are non-coding RNAs that regulate various biological processes. Among these, small regulatory RNAs (sRNAs) are particularly well studied and therefore of great interest in transcriptome studies. sRNAs are less than 40 nt in size and typically regulate gene expression post-transcriptionally. Of these sRNAs, microRNAs (miRNAs), small interfering RNAs (siRNAs), and Piwi-interacting RNAs (piRNAs) are the best understood. Hundreds of different and evolutionarily conserved miRNAs are estimated to regulate the expression of over 60% of all human protein-coding genes in virtually all biomolecular pathways (Catalanotto, C., Cogoni, C., & Zardo, G., 2016, MicroRNA in Control of Gene Expression: An Overview of Nuclear Functions. International journal of molecular sciences, 17(10), 1712). Due to their specific roles, particularly in regulating cell death and proliferation, miRNAs can act as both tumor suppressors and oncogenes. Therefore, they are commonly used as tumor markers. Probes designed to specifically hybridize to such cancer-associated miRNAs can thus be used to reliably identify cancer cells in tissues. miRNAs are produced from long precursor transcripts that form hairpin structures and are processed by endonucleolytic cleavage. miRNAs silence gene expression post-transcriptionally via a molecular mechanism known as RNA interference (RNAi), in which they guide protein complexes to target mRNA via complementary base pairing to initiate mRNA degradation or translation inhibition. siRNAs follow a biogenesis pathway similar to that of miRNAs, except that they begin as double-stranded RNA molecules. Like miRNAs, they use RNAi to prevent target mRNA from being translated into protein. The probes of the present invention can specifically detect, for example, the mature versions of these various small RNA species.
In addition to or instead of this, probes may be designed to detect long primary or precursor transcripts. Short hairpin RNAs (shRNAs) are artificially produced molecules that researchers introduce exogenously into cells or organisms to reduce the expression of a target gene via the RNAi mechanism (gene knockdown). The methods of this invention allow scientists to verify the effectiveness of shRNA introduction and gene knockdown, for example, by evaluating the levels of shRNA and target mRNA in model organisms or cells. The slightly longer Piwi-interacting RNAs (piRNAs) are small non-coding RNAs best known for silencing transposable elements.
[0082] In addition to short regulatory RNAs such as miRNAs, siRNAs, or piRNAs, the subset of non-coding regulatory RNAs further includes the class of long non-coding RNAs (lncRNAs) and enhancer RNAs. lncRNAs are highly expressed in humans, are at least 200 nt long, and are often processed in the same manner as mRNA. However, lncRNAs differ from mRNA in that they typically lack an open reading frame (ORF). LncRNAs regulate various cellular processes, including transcription, post-transcriptional regulation, RNAi, splicing, translation, and epigenetic regulation. However, the biological relevance of many lncRNAs is currently unknown (Dhanoa et al., 2018). For example, many lncRNAs encode peptides whose actual function is unknown (van Heesch et al., 2019, The Translational Landscape of the Human Heart. Cell 178(1), 242-260). Analysis of lncRNAs using the methods of the present invention provides novel information about their expression, localization, and ultimately, their function.
[0083] eRNAs are 50–2000 nt long and are transcribed from the DNA sequence of enhancer regions. Their expression correlates with the activity of the corresponding enhancer, and therefore they are suitable markers for distinguishing between active and quiescent enhancers (Arnold et al., 2020, Diversity and Emerging Roles of Enhancer RNA in Regulation of Gene Expression and Cell Fate. Front. Cell Dev. Biol.). eRNAs can regulate mRNA transcription through enhancer-promoter interactions, through chromatin modifications, or by regulating the transcriptional machinery.
[0084] Circular RNAs (circRNAs) are single-stranded, loop-forming RNA molecules with often unknown molecular functions. Some circRNAs are thought to act as miRNA sponges (Kristensen et al., 2019, The biogenesis, biology and characterization of circular RNAs. Nat Rev Genet. 20(11), 675-691), while others are thought to encode proteins (van Heesch et al., 2019). Importantly, the expression of specific circRNAs in the human brain has been associated with the onset and progression of neurodegenerative disorders, particularly Alzheimer's disease (AD). Therefore, they are considered potential biomarkers for the diagnosis of AD (Akhter, 2018, Circular RNA and Alzheimer's Disease. Adv Exp Med Biol. 2018;1087:239-243).
[0085] The present invention may also be suitable for the detection of foreign, pathogenic RNA in cells, such as viral RNA genomes and satellites or parts thereof. It may furthermore be suitable for the detection of bacterial RNA associated with antiviral immune defense, such as trans-activating CRISPR RNA (tracrRNA) or CRISPR RNA (crRNA), and modified versions thereof used in CRISPR / Cas-mediated genetic engineering. TERC (short for telomerase RNA component) is a non-coding RNA found in eukaryotes that serves as a template for telomere elongation by the enzyme telomerase (Feng et al., 1995, The RNA component of human telomerase. Science 269(5228), 1236-1241). DDRNAs are short non-coding RNAs produced at DNA double-strand break sites and are required for the DNA damage response in cells (Michelini, F., Pitchiaya, S., Vitelli, V. et al., 2017, Damage-induced lncRNAs control the DNA damage response through interaction with DDRNAs at individual double-strand breaks. Nat Cell Biol 19, 1400-1411). Retrotransposons are DNA elements within the genome that are transcribed into RNA, then converted back into DNA by reverse transcription, and subsequently inserted at various genomic locations. Vault RNA (vtRNA) is a component of ribonucleoprotein particles known as vaults, which likely function in intracellular processes, particularly in nucleocytoplasmic transport. Promoter upstream transcripts (PROMPTs) are non-coding RNAs transcribed in the reverse orientation from a region approximately 1–1.5 kb upstream of the transcription start site of most active protein-coding genes. PROMPTs are normally rapidly degraded by the RNA exosome (Lloret-Llinares et al., 2016, Relationships between PROMPT and gene expression. RNA Biol. 13(1), 6-14).
In summary, there has been recent evidence for numerous novel RNA species, but their functions are often not fully understood. The method of the present invention is therefore of great value for obtaining novel information on these RNA species through their selective recognition by probes.
[0086] Alternatively, single-stranded DNA oligonucleotide probes can specifically hybridize to target double-stranded DNA selected from the group including chromosomal DNA, mitochondrial DNA, chloroplast DNA, bacterial DNA, plasmid DNA, viral dsDNA, and double-stranded DNA transposons.
[0087] Plasmids are small, circular extrachromosomal DNA molecules that are physically separated from chromosomal DNA. Plasmids are typically found in bacteria and often carry genes that are beneficial to bacterial survival, such as antibiotic resistance. Artificially produced plasmids are often used as vectors for genetic modification of cells or whole animals, or for carrying out molecular cloning.
[0088] Viruses in Group I of the Baltimore classification system possess dsDNA as their genetic material. In some viruses, the viral dsDNA genome is circular (Baculoviridae, Papovaviridae), while in others, the viral DNA is linear (Adenoviridae, Herpesviridae).
[0089] DNA transposons are DNA sequences that can change position within the genome. Using a cut-and-paste-like mechanism, dsDNA is moved to a new gene locus and integrated by enzymes called transposases.
[0090] To enable hybridization of the oligonucleotide probes of the present invention, dsDNA must first be denatured into a single-stranded state. Denaturation of naturally occurring dsDNA is preferably carried out thermally, i.e., by raising the temperature of the DNA-containing sample to the point at which the hydrogen bonds between the nitrogenous bases of the DNA are broken. The exact denaturation temperature depends on the length of the dsDNA molecule and its GC content. Alternatively, the DNA can be denatured chemically, for example, by exposure to NaOH or a high concentration of salt, or the ssDNA can be exposed by treatment with DNase I.
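The dependence of the denaturation temperature on length and GC content can be illustrated with a widely used empirical approximation for long duplexes, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N. Neither this formula nor the default sodium concentration comes from the present description; both are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

def estimate_tm_celsius(sequence, sodium_molar=0.05):
    """Rough melting temperature (deg C) of a long dsDNA duplex.

    Empirical approximation, assumed here for illustration only:
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    # Percentage of G and C bases in the duplex.
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(sodium_molar) + 0.41 * gc_percent - 675.0 / n

# Same length, different GC content: the GC-rich duplex melts at a higher temperature.
at_rich = "AT" * 100
gc_rich = "GC" * 100
print(estimate_tm_celsius(at_rich), estimate_tm_celsius(gc_rich))
```

Under this approximation, raising the GC content or the length of the duplex raises the estimated temperature, which is the qualitative behaviour described above.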
[0091] Before the two ssDNA strands can be restored to their original double-stranded form, ssDNA oligonucleotide probes must be added to the sample to bind to each target single-stranded DNA region.
[0092] In yet another embodiment, a single-stranded DNA oligonucleotide probe may be designed to specifically hybridize to target DNA containing a single-stranded DNA species selected from the group including viral ssDNA genomes, helitrons, and stretches of single-stranded genomic DNA that are transiently exposed, for example, in R-loops or at DNA damage sites. Viruses in Group II of the Baltimore classification system have ssDNA genomes that take a circular form. The probe of the present invention may be designed to specifically detect and identify viral ssDNA genomes in compartments.
[0093] Similarly, the probe can also detect specific DNA transposons known as helitrons, which change genomic location through a rolling-circle mechanism that involves the production of circular ssDNA intermediates.
[0094] The term R-loop refers to a three-stranded nucleic acid structure consisting of a DNA:RNA hybrid and the non-template ssDNA. R-loops typically develop at active transcription sites when newly transcribed RNA hybridizes with the unwound template DNA strand, thereby displacing the non-template DNA strand (Allison and Wang, 2019, R-loops: formation, function, and relevance to cell stress. Cell Stress 3(2), 38-47). Traditionally, R-loops are detected by immunoprecipitation with a sequence-independent, structure-specific antibody followed by DNA sequencing, a method called DRIP-Seq. In the method of the present invention, a specially designed ssDNA oligonucleotide probe can hybridize, for example, with the displaced non-template DNA strand.
[0095] Stretches of ssDNA can also occur transiently in other situations, for example, at replication forks that have stalled due to DNA damage.
[0096] The oligonucleotide probes of the present invention can also detect fragmented or more highly degraded DNA molecules, such as ancient DNA obtained from archaeological sites or prehistoric animals.
[0097] In summary, the present invention therefore enables scientists to detect any target DNA molecule in a DNA-containing compartment, as long as the DNA exists in a single-stranded state before probe hybridization.
[0098] Accordingly, in one embodiment, a library of single-stranded DNA oligonucleotide probes can detect substantially all RNA or all DNA or combinations thereof present in a compartment, as described herein. Previous bioinformatics analyses have shown that over 90% of the entire human genome and 100% of the Caenorhabditis elegans genome are covered at a density of approximately 10 probes / kb by specially designed unique DNA oligonucleotide probes known as oligopaints, similar to the probes used in this invention (Beliveau et al., 2012). Thus, the present invention has the ability to selectively detect RNA covering up to 75%, 80%, 85%, 90%, 95%, or 99% of the genome present in a cell. In one embodiment, the probe library of the method of the present invention can cover, for example, at least 99%, preferably 100%, of the RNA or DNA or combinations thereof present in a compartment of the test genome. Then, substantially all RNA or all DNA or combinations thereof present in the compartment can be detected.
[0099] The method of the present invention can also effectively contribute to the detection of nucleic acid modifications. Probes can be designed, for example, to detect rare edited RNA molecules. RNA editing relates to the process by which the nucleotide sequence of RNA is altered after transcription, for example, through nucleotide deletion, insertion, or substitution. RNA editing can affect the activity, stability, and localization of RNA. Other nucleic acid modifications, such as DNA or RNA methylation or base isomerization, can be detected with specific antibodies. By detecting nucleic acids with the method of the present invention in combination with the identification of such nucleic acid modifications, those skilled in the art can correlate the abundance of a particular nucleic acid with its modification at a certain spatial location.
[0100] In an optional embodiment of the present invention, specific nucleic acids that are not of interest to the experiment can be excluded from analysis by adding a “cold oligonucleotide” to the nucleic acid-containing compartment before or simultaneously with the ssDNA oligonucleotide probe. The term “cold oligonucleotide” refers to an unlabeled oligonucleotide that has the function of competitively “neutralizing” unwanted nucleic acids. For example, the method of the present invention may involve the addition of an oligonucleotide that lacks a flanking primer region and specifically hybridizes with highly abundant nucleic acids (e.g., rRNA or 7SK snRNA), but is not amplified in a post-hybridization step. As a result, the probe can no longer access and bind to the non-target nucleic acid. Because the cold oligonucleotide lacks a primer region, it is not amplified by PCR in a later step and consequently is not identified, for example, during sequencing.
[0101] Probes that do not hybridize with the target nucleic acid molecule or hybridize only with a portion of the homologous sequence, and, optionally, cold oligonucleotides, are removed from the compartment in step (c) of the method of the present invention. Probe removal may include stringent washing with a suitable solvent. Non-hybridizing probes can be washed away from the compartment with a suitable buffer at any temperature from room temperature (20-25°C) to about 75°C. In a preferred embodiment, non-hybridizing probes are washed away from the compartment with about 40% formamide at a temperature of 45-49°C, preferably 47°C. Stringent washing may also be carried out using another buffer commonly used in hybridization experiments, such as saline-sodium citrate (SSC) or phosphate-buffered saline (PBS) buffer. The buffer may also include a surfactant, e.g., 0.1% Tween-20. Removal of unbound probes may include at least one washing step. The process may also include more than one washing step, for example two, three, four, or five, using the same or preferably different concentrations of washing buffer, for example, with increasing stringency. In another embodiment, the non-hybridized probe may be removed from the compartment using an exonuclease that specifically degrades free ssDNA oligonucleotides, for example, exonuclease I.
[0102] If desired, hybridization of the ssDNA oligonucleotide probe to RNA can be microscopically confirmed prior to sequencing-based or PCR-based probe identification by targeting the hybridized probe with a second, fluorescently labeled FISH probe. The FISH probe binds by complementary base pairing to the barcode region of the ssDNA oligonucleotide probe and can be visualized with a fluorescence microscope or an instrument suitable for detection of another fluorescent dye (Figures 4a and 4b). However, advantageously, imaging-based analysis of the probe's presence is not necessary (and preferably not performed) in the method of the present invention.
[0103] After removing non-hybridized ssDNA oligonucleotide probes in step (c) and before sequencing or amplification, oligonucleotide probes that are hybridized with nucleic acid molecules in the compartment are typically extracted from that compartment. Standard methods for DNA isolation may be used.
[0104] Next, amplification of the hybridized single-stranded DNA oligonucleotide probe can be performed. First, the hybridized probe can be released from the nucleic acid of the compartment, e.g., RNA, at a temperature of about 85°C, preferably about 95°C. Alternatively, denaturation of the nucleic acid-oligonucleotide probe complex can be achieved by alkaline hydrolysis, for example, using NaOH. If the probe has hybridized to an RNA molecule, probe release can be induced by the addition of an RNA-degrading enzyme, e.g., RNase H. Next, amplification of the probe can be achieved, typically by polymerase chain reaction (PCR) using primers that specifically anneal to each probe. The primers can be designed, for example, to hybridize to the universal primer region and / or target region of the probe. Preferably, the primers anneal directly to the probe that has hybridized by complementary base pairing to the nucleic acid of interest, and not to any molecular tags that may be bound to the probe via a chemical linker (e.g., a photocleavable linker).
[0105] Amplification primers can be designed to include a sequencing adapter at the 5' end, such as an adapter sequence provided in Illumina-based NGS. Thus, the sequencing adapter can be added to the 5' end of the amplified probe during the PCR reaction. In another embodiment, the probe can first be ligated to the sequencing adapter before amplification. In such cases, the amplification primers can be specifically designed to anneal to these sequencing adapters to enable amplification of a ready-to-use sequencing library. In addition or alternatively, the amplification primers can encode additional information, such as a unique DNA sequence encoding the physical spatial location of a sample within a tissue, as previously performed in SLIDE-Seq and 10X Visium (Figure 5c). Additional sample information (e.g., tissue type or patient identifier) can also be included in the barcode, enabling multiple samples to be sequenced together in the same sequencing run and the sequence reads to be assigned to the original sample after sequencing.
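As an illustration of how reads from a pooled sequencing run might be assigned back to their original samples, the following minimal sketch groups reads by a sample barcode at their 5' end. The barcode sequences, their fixed length of 4 nt, the sample names, and the function name are hypothetical and chosen only for the example; a practical pipeline would also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict

# Hypothetical sample barcodes introduced by the amplification primers.
SAMPLE_BARCODES = {"ACGT": "section_A", "TGCA": "section_B"}
BARCODE_LEN = 4  # assumed fixed barcode length

def demultiplex(reads):
    """Group reads by the sample barcode at their 5' end.

    Reads with a known barcode are stored with the barcode trimmed off;
    all other reads are kept unchanged under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        sample = SAMPLE_BARCODES.get(read[:BARCODE_LEN])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[BARCODE_LEN:])
    return dict(by_sample)

reads = ["ACGTGGATTC", "TGCAGGATTC", "ACGTCCTAGA", "NNNNGGATTC"]
groups = demultiplex(reads)
print(groups)
```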
[0106] For amplification, Taq polymerase can be used or, to minimize errors introduced by the polymerase, a high-fidelity polymerase generally known to those skilled in the art, such as Q5 (registered trademark), Phusion (registered trademark), Platinum (registered trademark), or AccuPrime (registered trademark). In another embodiment, the probe is not amplified prior to identification, in order to prevent subsequent identification problems due to replication bias or inaccurate amplification.
[0107] T7-mediated linear amplification can also be used as a means of probe amplification.
[0108] PCR amplification may be followed, preferably, by removing excess primers that would otherwise interfere with probe sequencing. Primer removal can be achieved, for example, by contacting the sample with a suitable exonuclease (e.g., exonuclease I) that effectively degrades excess single-stranded oligonucleotides while leaving the double-stranded complex intact. Thus, probes that hybridize to the target nucleic acid or its amplification product, or primers that anneal to the probe, remain in the sample and can be identified in subsequent steps.
[0109] Hybridized single-stranded DNA oligonucleotide probes are preferably identified by sequencing. The method disclosed herein utilizes single-stranded DNA oligonucleotide probes that can specifically hybridize to target nucleic acid molecules. In contrast to FISH-based detection methods, the detection and identification of probes by the method of the present invention does not require, and preferably does not involve, imaging-based methods, such as microscopic methods. Rather, probes hybridized to target nucleic acids are identified, for example, by sequencing. Probes can be sequenced by any suitable method known in the art. A small set of selected probes can be sequenced, for example, by sequencing methods that use the incorporation of chain-terminating dideoxynucleotides during in vitro DNA replication (e.g., Sanger sequencing).
[0110] In a preferred embodiment, the probe is identified by next-generation sequencing (NGS). Because the oligonucleotide probes of the present invention are short, sequencing can be performed using commonly known short-read high-throughput NGS technologies, such as those offered by Illumina, Roche 454, Helicos, PacBio, SOLiD, and Complete Genomics. Depending on the specific NGS platform used, it may be necessary to prepare the sample, i.e., the oligonucleotide probes, accordingly by attaching a suitable sequencing adapter and then to load the sample onto a suitable flow cell. The sequencing adapter can be easily ligated to the end of each probe or added during PCR-based probe amplification to produce a sequencing library. Illumina, for example, offers multiple kits for sequencing library preparation (e.g., Nextera Flex Library Prep Kit). A particular advantage of the present invention is that, because the method determines only the sequence of DNA oligonucleotides complementary to RNA, the method avoids the need to isolate RNA from a compartment and reverse transcribe it to cDNA for RNA detection. Both RNA isolation and reverse transcription are known to carry a risk of material loss and can introduce bias into the final analysis.
[0111] Sequencing provides a raw number of probes per unique target location for the target nucleic acid, e.g., RNA species. The resulting sequence reads are then aligned to a probe reference map containing all probe sequence information, including probe-specific barcodes, primers, and / or UMIs. The probes can then be more accurately aligned to their genomic locations in the reference genome. Summary statistics can then be applied to the raw number of probes detected, for example, across the entire transcript or per exon / intron.
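The read-to-probe assignment and per-transcript summary described above can be sketched as follows. This is a minimal illustration that assumes exact matching of each read to a probe sequence; the probe sequences, probe identifiers, gene names, and function names are hypothetical, and a practical pipeline would use an aligner that tolerates mismatches.

```python
from collections import Counter

# Hypothetical probe reference map: probe sequence -> (probe ID, target transcript).
PROBE_REFERENCE = {
    "ACGTACGTAC": ("probe_1", "GeneA"),
    "TTGACCGTAA": ("probe_2", "GeneA"),
    "GGCATTCAGT": ("probe_3", "GeneB"),
}

def count_probes(reads, reference=PROBE_REFERENCE):
    """Count the raw number of reads assigned to each probe (exact matches only)."""
    probe_counts = Counter()
    for read in reads:
        hit = reference.get(read)
        if hit is not None:  # reads without a matching probe are ignored
            probe_counts[hit[0]] += 1
    return probe_counts

def summarize_by_transcript(probe_counts, reference=PROBE_REFERENCE):
    """Sum the raw probe counts over all probes that target the same transcript."""
    target_of = {probe_id: target for probe_id, target in reference.values()}
    totals = Counter()
    for probe_id, n in probe_counts.items():
        totals[target_of[probe_id]] += n
    return totals

reads = ["ACGTACGTAC", "ACGTACGTAC", "TTGACCGTAA", "GGCATTCAGT", "AAAAAAAAAA"]
probe_counts = count_probes(reads)
transcript_counts = summarize_by_transcript(probe_counts)
print(dict(probe_counts), dict(transcript_counts))
```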
[0112] In a preferred embodiment, after step c) and typically after extraction from the compartment of single-stranded DNA oligonucleotides that specifically hybridize with the nucleic acids in the compartment, in step d), single-stranded DNA-oligonucleotide probes that specifically hybridize with the nucleic acid molecules in the compartment can be identified in this order by probe amplification and probe sequencing.
[0113] In one embodiment, the entire length of the probe is sequenced. For example, sequencing may begin at the 3' or 5' end of the probe and continue to the opposite end. To reduce the cost of the method and increase throughput, the probe may be sequenced only partially, i.e., starting from the same starting position at the 3' or 5' end, the first 30 nt, first 35 nt, first 40 nt, first 45 nt, first 50 nt, first 55 nt, first 60 nt, first 65 nt, first 70 nt, first 75 nt, first 80 nt, first 85 nt, first 90 nt, first 95 nt, or first 100 nt of each probe may be sequenced. In the embodiment described in the Examples, sequencing begins at the 5' end, and each sequencing read has a length of 75 nt. Therefore, the sequencing read length of the method of the present invention is considerably shorter than that used in conventional RNA-Seq experiments (about 200 nt), thus saving monetary cost and time. When the probe contains a flanking barcode sequence, the sequencing read length can be further reduced to cover only a sufficient length of the flanking barcode to identify the target RNA. Incorporation of UMI sequences into the probe further facilitates the identification of artifacts such as duplication caused by PCR, and does not hinder the identification of RNA species present in the sample.
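How UMI sequences can help to identify PCR duplicates may be sketched as follows: reads that share both the probe identity and the UMI are assumed to derive from the same original probe molecule and are counted once. The input format of (probe ID, UMI) pairs, the identifiers, and the function name are hypothetical, and the sketch ignores sequencing errors within the UMI.

```python
from collections import defaultdict

def deduplicate_by_umi(reads):
    """Count unique probe molecules per probe.

    reads: iterable of (probe_id, umi) pairs. Reads with the same probe ID
    and the same UMI are treated as PCR duplicates of one molecule.
    """
    umis_per_probe = defaultdict(set)
    for probe_id, umi in reads:
        umis_per_probe[probe_id].add(umi)
    return {probe_id: len(umis) for probe_id, umis in umis_per_probe.items()}

reads = [
    ("probe_1", "AACG"),
    ("probe_1", "AACG"),  # duplicate of the read above
    ("probe_1", "TTGC"),
    ("probe_2", "AACG"),  # same UMI on a different probe is a distinct molecule
]
molecule_counts = deduplicate_by_umi(reads)
print(molecule_counts)
```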
[0114] In another embodiment, paired-end sequencing may be performed, in which each probe is sequenced simultaneously from both its 3' and 5' ends, in order to improve sequencing sensitivity and detect duplication that may result from technical PCR bias.
[0115] Advantageously, in the method of the present invention, probe sequencing sequences at least a portion, and preferably all, of the region of the probe that hybridizes with the target nucleic acid, i.e., has a sequence complementary to the target region. This avoids the need for complex probe design, enables short probes, and allows for the quantification of specific segments of nucleic acids, for example, for the detection of alternative splicing events.
[0116] Despite becoming more affordable, sequencing costs still represent a limiting factor for many laboratories, especially when sample sizes are large or repeated analyses are required. Furthermore, interpreting sequencing results requires skilled labor and high computational power. The detection and quantification of selected and relatively small RNA subsets can be performed by more cost-effective and conventional methods that do not involve sequencing, such as Northern blotting or reverse transcription quantitative PCR (RT-qPCR). However, both techniques require the isolation and further processing of RNA, and therefore there is a risk of losing valuable material due to sample degradation. Northern blotting is also often insufficiently sensitive to detect less abundant transcripts. RT-qPCR, in turn, relies on reverse transcription of the target RNA, which can introduce bias.
[0117] In another embodiment, the method of the present invention may also enable the identification of hybridized single-stranded DNA oligonucleotide probes by amplification, preferably by quantitative PCR, without requiring sequencing technology (Figures 5b, 6a, and 6b). Primers can be designed to specifically target the barcode sequence of the probe complementary to the specific gene of interest, and PCR, e.g., quantitative PCR, results in amplification. Various suitable protocols and reagents for qPCR analysis of nucleic acids are known in the art. For example, they are readily available from companies such as Sigma-Aldrich, Thermo-Fisher, and Promega. Oligonucleotide probe detection by PCR-based amplification therefore enables a rapid method for monitoring the expression of several selected genes in particular. Similarly, advantageously, the region of the probe hybridized with the target nucleic acid is amplified.
[0118] The method of the present invention further enables spatial mapping of the detected nucleic acid, such as RNA, within a compartment, wherein the method further includes the steps of: (i) before step (b), sectioning the compartment, in particular by frozen sectioning or cryopreservation, preferably by frozen sectioning, to obtain a collection of fragments, thus separating nucleic acid molecules from one another depending on their localization; (ii) in step (d), identifying the single-stranded DNA oligonucleotide probes that specifically hybridize with nucleic acid molecules within each fragment, and thus determining the presence or absence of the nucleic acids corresponding to the probes in each fragment; and (iii) mapping the nucleic acids to the sections.
[0119] In this invention, spatial mapping is understood to mean the spatial localization or determination of the location of nucleic acids within a compartment. Existing methods used for spatial mapping of nucleic acids in cells or intracellular regions often employ FISH-based imaging methods that are either low-throughput or require expensive equipment. To date, spatial transcriptomics methods have utilized massively parallel sequencing of RNA-derived cDNA captured on pre-designed probe arrays, but their use is limited to tissue sections.
[0120] In contrast, the method of the present invention combines frozen ultrathin sectioning of individual compartments (e.g., single cells, cytoplasm, or cell nuclei) and direct detection of target nucleic acids by sequencing only ssDNA oligonucleotide probes hybridized with the nucleic acids within the compartments. Other types of sectioning that do not involve freezing of compartments may also be compatible with the present invention. For example, in one embodiment, the compartments may be formalin-fixed paraffin-embedded (FFPE), i.e., first fixed with formaldehyde to preserve the structural integrity of the compartment, then paraffin-embedded in a mass, and then sliced into different sections.
[0121] When thin frozen sections are cut through compartments as described herein, probes for RNA molecules that are, for example, transcribed, translated, or exert regulatory functions at a common location in the compartment are detected more frequently in the same section than probes for RNA molecules that function in different regions. In frozen sectioning of, for example, the cell nucleus, the site of origin of nascent pre-mRNA transcripts can therefore be inferred by scoring the presence or absence of copies of the pre-mRNA (i.e., the abundance of probe oligonucleotides covering intron regions or exon-intron and intron-exon junctions) in a number of sections across individual nuclei. Similarly, when the cytoplasm of a single cell is frozen sectioned, the method can infer, for example, the intracellular location where regulatory non-coding RNAs may predominantly exert their biological functions or where mRNA is translated into protein (e.g., by selecting the apex of fibroblasts or the axons or dendrites of nerve cells). The method of the present invention can also provide information on the relative distances between individual nucleic acids, for example, RNA molecules, within the compartment space. Using the results of this analysis, it is therefore possible, for example, to calculate the co-segregation frequency of each RNA molecule relative to all other RNA molecules detected by the method of the present invention and to create a matrix for estimating the relative distances between RNA transcripts.
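One simple way to build such a matrix is sketched below, under the assumption that the co-segregation frequency of two RNA molecules is defined as the fraction of analyzed fragments in which both are detected. This definition, the fragment contents, the RNA names, and the function name are illustrative assumptions rather than a prescribed implementation.

```python
from itertools import combinations

def cosegregation_matrix(fragments, targets):
    """Pairwise co-segregation frequencies.

    fragments: list of sets, each holding the targets detected in one fragment.
    targets:   list of target names to include in the matrix.
    Returns a nested dict m where m[a][b] is the fraction of fragments that
    contain both a and b; m[a][a] is the detection frequency of a alone.
    """
    n = len(fragments)
    if n == 0:
        raise ValueError("at least one fragment is required")
    matrix = {a: {b: 0.0 for b in targets} for a in targets}
    for a in targets:
        matrix[a][a] = sum(a in f for f in fragments) / n
    for a, b in combinations(targets, 2):
        freq = sum((a in f) and (b in f) for f in fragments) / n
        matrix[a][b] = matrix[b][a] = freq
    return matrix

# Four hypothetical fragments with presence/absence calls for three RNAs.
fragments = [{"RNA1", "RNA2"}, {"RNA1", "RNA2"}, {"RNA3"}, {"RNA1"}]
targets = ["RNA1", "RNA2", "RNA3"]
m = cosegregation_matrix(fragments, targets)
print(m["RNA1"]["RNA2"], m["RNA1"]["RNA3"])
```

In this toy example, RNA1 and RNA2 are found together in half of the fragments, whereas RNA1 and RNA3 never co-occur, which would be consistent with RNA1 and RNA2 lying closer to each other than RNA1 and RNA3.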
[0122] Therefore, in one embodiment of the present invention, it is possible to analyze all sections of a compartment using the method of the present invention and to produce a spatial map of the nucleic acids in a single cell. However, this is not necessary, and the analyzed fragments can be sampled from multiple compartments, e.g., multiple single cells or nuclei, across the cell population of interest. Using the method of the present invention, preferably more than 180 fragments are analyzed for the presence or absence of a nucleic acid-targeting probe, and optionally for probe co-segregation. For example, about 180 to about 10,000 fragments, preferably about 200 to 5,000, about 220 to 4,000, about 230 to 3,500, about 250 to 3,000, 300 to 2,000, or 500 to 1,000 fragments may be analyzed, where these fragments may be obtained from a single nucleic acid-containing compartment or from multiple nucleic acid-containing compartments.
[0123] The spatial distribution of nucleic acids in a compartment can be investigated by statistical analysis to determine spatial proximity, which can be complemented by analysis of nucleic acid co-segregation (e.g., Weibel, ER, 1979, Stereological Methods: Practical Methods for Biological Morphometry. Vol. 1. Academic Press, London, UK; Weibel, ER, 1980, Stereological Methods: Theoretical Foundations. Vol. 2. Academic Press, London, UK). This is of particular interest because, for example, close spatial proximity between two RNA molecules with sequence similarity may indicate a common transcriptional origin. The two RNA molecules could be, for example, unknown splice variants or isoforms derived from a common gene. In contrast, close spatial proximity between two clearly different RNA molecules and / or species in a nuclear fragment indicates that two independent genes were expressed at similar time points close to each other, which in turn suggests that both genes may share common regulatory elements. Depending on the context and exact sequences of the two RNA molecules, close proximity may also suggest regulatory interactions between them. Co-segregation of viral RNA molecules and small RNAs in a common cytoplasmic fragment may further indicate, for example, an active RNAi response. Statistical analysis of RNA co-segregation can therefore contribute not only to the spatial mapping of multiple RNA molecules but also to new insights into the transcriptional and post-transcriptional relationships of RNAs and into antiviral host responses. Furthermore, co-detection of two or more RNA species in a single cell, cell population, or tissue section (e.g., tissue sampled, together with its RNA contents, in squares or other shapes of equal size) can be used to reconstruct the cell type composition and the cell-to-cell spatial relationships of complex tissues.
The statistical methods used in the methods of the present invention may be, for example, inferential statistics.
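One inferential test that fits this setting is a one-sided hypergeometric test, which asks how likely it is that two RNAs share at least the observed number of fragments if they were distributed independently. The sketch below is an illustrative choice and not a test named in the source; the function name and arguments are hypothetical.

```python
from math import comb


def cosegregation_p_value(n_fragments, n_a, n_b, n_both):
    """P(X >= n_both) for X ~ Hypergeometric(n_fragments, n_a, n_b).

    n_fragments: total number of fragments analyzed
    n_a, n_b:    fragments in which RNA A / RNA B was detected
    n_both:      fragments in which both were detected
    A small value suggests co-segregation beyond chance (assuming
    independent, equally likely placement across fragments).
    """
    upper = min(n_a, n_b)
    total = comb(n_fragments, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_fragments - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 200 fragments, A in 20, B in 25, both in 10.
print(cosegregation_p_value(200, 20, 25, 10))
```

When many RNA pairs are tested at once, a multiple-testing correction would normally be applied on top of these per-pair values.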
[0124] In one embodiment, the location of a nucleic acid is determined only when a probe for the nucleic acid is detected more frequently than a preset threshold, for example, to reduce mismapping of nucleic acid molecules to a region due to fixation artifacts or RNA / DNA contamination. The threshold may need to be adjusted according to the expected abundance of the nucleic acid in the compartment. The location of nucleic acids such as RNA, particularly of extremely rare nucleic acid species whose detection frequency is expected to fall below the threshold, can be verified when copies of the nucleic acid are repeatedly detected in the same or similar compartment fragments in different biological replicates. By integrating co-segregation data with sequence complementarity data, the method may also distinguish between two nucleic acids, for example, RNAs, that co-occur by chance and two nucleic acids that may engage in RNA-RNA interactions with physiological function.
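The thresholding and replicate check described above can be sketched as follows. This is a minimal illustration under assumed inputs: the function names, the per-fragment read-count dictionary, and the region labels are hypothetical, and the source does not specify how the threshold is chosen.

```python
from collections import Counter


def call_locations(read_counts, threshold):
    """Return the fragments in which a nucleic acid is considered located.

    read_counts: dict mapping fragment ID to the number of probe reads
    for one nucleic acid (hypothetical input layout).
    threshold: preset minimum; a fragment counts only if it exceeds it.
    """
    return {frag for frag, count in read_counts.items() if count > threshold}


def recurrent_regions(replicate_calls, min_replicates):
    """Return regions called in at least min_replicates replicates.

    replicate_calls: list of sets, one per biological replicate,
    each holding the region labels in which the nucleic acid was seen.
    """
    seen = Counter(region for calls in replicate_calls for region in calls)
    return {region for region, n in seen.items() if n >= min_replicates}


# Abundant RNA: apply the preset threshold within one sample.
print(call_locations({"s1": 12, "s2": 1, "s3": 5}, threshold=4))

# Rare RNA: accept a region only if it recurs across replicates.
replicates = [{"periphery", "center"}, {"periphery"}, {"periphery", "nucleolus"}]
print(recurrent_regions(replicates, min_replicates=2))
```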
[0125] The determination of the presence or absence of RNA may also include the determination of the amount of RNA.
[0126] Information on nucleic acid localization, such as RNA localization, can be tested, for example, under disease conditions or environmental changes. For instance, the spatial distribution and abundance of RNA can be tested by comparing cells from a healthy donor with corresponding cells from a donor with a medical condition, such as a human cancer patient. Comparisons can also be performed with various cell populations within tissues, such as all neurons versus glial cells in brain tissue or tumor cells versus healthy cells in a tissue biopsy sample.
[0127] For the analysis of the spatial mapping results in the present invention, the methods described in WO2016156469 or Beagrie et al., 2017 Nature 543(7646), 519-524 may be used in a corresponding manner, for example, by the statistical methods described herein.
[0128] Spatial mapping of nucleic acid molecules such as RNA or ssDNA, combined with the detection of additional biological molecules within a compartment, can provide novel insights into their interactions with other important processes in cells, such as gene expression.
[0129] Therefore, the method of the present invention further enables the detection and spatial mapping of nucleic acids, preferably RNA, combined with the detection of at least one DNA locus within a compartment. For this purpose, the method further comprises the steps of:
- determining the presence or absence of the at least one DNA locus in each fragment, optionally by sequencing, preferably by next-generation sequencing; and
- determining the co-segregation of the at least one DNA locus with the single-stranded DNA oligonucleotide probes that specifically hybridize with the nucleic acid, preferably RNA.
[0130] A locus is the specific position of a gene or other DNA sequence on a chromosome. A locus is therefore, for example, the position of a protein-coding gene, a gene transcribed into a non-protein-coding RNA molecule, a pseudogene, an enhancer region, a transposable element, a repetitive sequence, or any DNA sequence whose function is unknown or that is non-functional. In a preferred embodiment, the at least one DNA locus is a genomic DNA locus. The genomic DNA of a eukaryotic cell nucleus, for example, is organized as chromatin, a compacted, dense structure consisting of DNA wrapped around protein complexes formed by histones.
[0131] Since RNA molecules originate from transcribed genes, in one embodiment, the detection and spatial mapping of RNA in a compartment can be combined with the detection of the gene of interest or a specific portion thereof. This can be achieved, for example, by isolating and sequencing, in parallel and from the same nuclear fragment, both the genomic DNA and the ssDNA oligonucleotide probes that bind to RNA, and by identifying loci (e.g., genes) that co-segregate with the probes. In another embodiment, the detected probes may be analyzed for co-segregation with regulatory DNA loci, such as enhancer regions, rather than with genes or specific regions thereof. Co-segregation of genomic DNA loci and a DNA of interest (e.g., viral ssDNA or R-loop ssDNA) can be determined in a similar manner.
[0132] Preferably, nucleic acid detection and spatial mapping are combined with the detection of more than one genomic DNA locus, e.g., 2, 5, 10, 50, 100, 500, or 1000 or more genomic DNA loci. In certain embodiments, RNA detection and spatial mapping are combined with analysis of the entire genome.
[0133] One of the main areas of interest explored by the inventors is the interplay between gene expression regulation and genome architecture. Chromatin exists in both interacting and non-interacting states. Examining the structural properties and spatial organization of chromatin is crucial for understanding and evaluating gene expression regulation. A method known as optical reconstruction of chromatin architecture (ORCA), developed by Mateo et al., 2019 (Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature 568, 49-54), uses Oligopaint probes to reconstruct the three-dimensional organization of the genome in regions of approximately 100-700 kb. In that study, ORCA was combined with single-molecule RNA-FISH to examine the expression of 30 RNAs and correlate it with local chromatin organization. Existing methods such as ORCA are therefore limited to capturing information from relatively short genomic regions and to detecting only small subsets of RNA, or they require a vast number of hybridization and imaging steps.
[0134] The inventors have overcome these limitations by combining the high-throughput RNA detection of the present invention with a recently developed sequencing-based method for analyzing the three-dimensional structure of DNA in entire compartments, such as single cells, nuclei, or organelles. The inventors have named this method Genome Architecture Mapping (GAM). A detailed description of the GAM method is provided in Beagrie et al., 2017 Nature 543(7646), 519-524 and WO2016156469. Briefly, GAM calculates the spatial proximity of multiple DNA loci by determining their co-segregation among fragments of compartments, preferably nuclear fragments, using next-generation sequencing of genomic DNA and statistical methods. As a result, GAM enables a thorough analysis of higher-order chromatin interactions, including genome-wide binding site identification in an unbiased manner, and thus provides a detailed map of the genome architecture. GAM also enables complete genome haplotype reconstruction (Markowski et al., 2020, GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data. bioRxiv 2020.01.30.927061, also published as Markowski et al. (2021) GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data. Bioinformatics 19, 3128-3135.), which, combined with the present invention, enables the study of allele-specific gene expression, i.e., of the mechanisms by which gene variants regulate gene expression.
[0135] The combination of RNA detection by the method of the present invention with GAM analysis involves sectioning compartments, such as cell nuclei, into ultrathin fragments, preferably by cryosectioning. Single nuclear profiles (NPs) (or profiles of other nucleic acid-containing compartments, such as mitochondrial profiles) are isolated from these sections, for example, by laser microdissection. From each profile, both the genomic DNA and the RNA-bound DNA oligonucleotide probes that were retained in the compartment after stringent washing are isolated and PCR-amplified, sequentially or, preferably, simultaneously. The probes and genomic DNA are then indexed separately, and two different sequencing libraries may be produced (one containing oligonucleotide probes and the other genomic DNA). The two libraries can then be pooled and sequenced (Figure 3b). Thus, the entire sample to be tested by the method of the present invention can produce two independent sequencing files. The reads recovered from oligonucleotide probes can be quantified as proxies of the relative abundance of RNA species in the original RNA-containing compartment and can be used to cluster individual compartment fragments before deconvolving the cell type / state-specific three-dimensional genome topology from the genomic reads. The co-segregation of detected RNA molecules and genomic loci in each fragment can be determined. Thus, parallel detection of RNA and genomic DNA from individual NPs allows the interplay between chromatin topology and gene expression regulation to be examined. Modifications to the above workflow are also possible. For example, the compartments can be any compartment containing both DNA and RNA, e.g., mitochondria, chloroplasts, or prokaryotic cells. The fragments may also originate from multiple compartments, and the co-segregation of RNA molecules and DNA loci can be compared for each corresponding fragment.
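To illustrate how probe reads might serve as a proxy for relative RNA abundance and be used to group fragments, the following sketch normalizes per-fragment probe counts and assigns each fragment to the label of its most abundant marker RNA. This dominant-marker rule is a deliberately simple stand-in for a real clustering method; all names, markers, and counts are hypothetical.

```python
def relative_abundance(probe_counts):
    """Convert raw probe read counts for one fragment into fractions."""
    total = sum(probe_counts.values())
    if total == 0:
        return {target: 0.0 for target in probe_counts}
    return {target: count / total for target, count in probe_counts.items()}


def group_by_dominant_marker(fragments, markers):
    """Group fragments by the marker RNA with the highest relative abundance.

    fragments: dict fragment ID -> dict RNA target -> probe read count
    markers:   dict RNA target -> cell type / state label (assumed known)
    Fragments without any marker reads are placed under "unassigned".
    """
    groups = {}
    for fragment_id, counts in fragments.items():
        abundance = relative_abundance(counts)
        present = {t: a for t, a in abundance.items() if t in markers and a > 0}
        label = markers[max(present, key=present.get)] if present else "unassigned"
        groups.setdefault(label, []).append(fragment_id)
    return groups


fragments = {
    "np1": {"markerX": 8, "markerY": 2},
    "np2": {"markerX": 1, "markerY": 9},
    "np3": {"otherRNA": 5},
}
markers = {"markerX": "type X", "markerY": "type Y"}
print(group_by_dominant_marker(fragments, markers))
```

The genomic reads of the fragments in each group could then be analyzed together, which is the step the paragraph above refers to as deconvolving cell type- or state-specific genome topology.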
[0136] Preferably, the co-segregation frequencies of multiple genomic DNA loci in different compartment fragments, for example, nuclear fragments, can be analyzed to determine specific chromatin interactions, relative and absolute distances between loci, and the radial position of loci within the nucleus. The resulting information can then be used to infer the chromatin architecture and topology of the compartment, e.g., the nucleus, and to determine, for example, the proximity of gene promoters and distal enhancer regions. Simultaneously, the relative distance between DNA loci and detected RNA molecules can be inferred by scoring their presence or absence in a number of sections across individual nuclei. As a result, the co-segregation frequency of each detected RNA molecule with any DNA locus can be calculated to create a matrix of inferred relative distances between RNAs and genomic loci. The statistical methods used for analyzing the co-segregation of RNA molecules and DNA loci correspond to the methods used for analyzing the co-segregation of DNA loci in the GAM analysis, or of different RNA molecules, as described above.
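A raw co-segregation frequency depends on how often each RNA and each locus is detected at all. One common way to correct for this is the normalized linkage disequilibrium D'. The sketch below applies it to an RNA and a DNA locus; whether the disclosed method uses exactly this normalization is an assumption, and the function name and inputs are hypothetical.

```python
def normalized_linkage(rna_fragments, locus_fragments, n_fragments):
    """Normalized linkage (D') between an RNA and a DNA locus.

    rna_fragments / locus_fragments: sets of fragment IDs in which the
    RNA probe / the DNA locus was detected.
    Returns a value between -1 and 1; 0.0 if it is undefined because
    one feature is present in all or none of the fragments.
    """
    f_a = len(rna_fragments) / n_fragments
    f_b = len(locus_fragments) / n_fragments
    f_ab = len(rna_fragments & locus_fragments) / n_fragments
    d = f_ab - f_a * f_b  # deviation from independent co-occurrence
    if d >= 0:
        d_max = min(f_a * (1 - f_b), f_b * (1 - f_a))
    else:
        d_max = min(f_a * f_b, (1 - f_a) * (1 - f_b))
    return d / d_max if d_max > 0 else 0.0


# Hypothetical data: 10 fragments, one RNA, two loci.
rna = {0, 1, 2, 3}
loci = {"enhancer1": {0, 1, 2, 5}, "gene2": {6, 7, 8, 9}}
matrix_row = {name: normalized_linkage(rna, frags, 10) for name, frags in loci.items()}
print(matrix_row)
```

Repeating this for every detected RNA gives one row per RNA and one column per locus, which corresponds to the RNA-by-locus matrix described above.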
[0137] Information on gene expression related to genome architecture can be explored using several different methods: In one embodiment, the genome architecture can be directly correlated with the spatial mapping of expressed RNA to determine the influence of chromatin topology and architecture on transcription, RNA processing, and mRNA expression. For example, when GAM analysis provides evidence of specific promoter-enhancer associations, the present invention can indicate whether these associations occur concurrently with increased transcriptional output. A previous study using ORCA surprisingly showed that promoter proximity to enhancers was weak but present at active gene locations characterized by high levels of nascent RNA (Mateo et al., 2019). However, this study was limited to relatively short genomic regions and a few genes in the model Drosophila melanogaster embryos. The present invention may extend such studies to entire genomes and mammalian or plant cells.
[0138] In another embodiment, the method may also demonstrate how changes in gene expression may affect chromatin topology. For example, the detection of nascent eRNAs expressed from enhancer regions may be used to test the eRNA's ability to induce alterations in chromatin organization, such as its ability to position its corresponding enhancer near a target promoter.
[0139] In yet another embodiment, gene expression information can be used as a readout for changes in cell state and their impact on genome structure and gene location. A well-studied example of cell state is the cell cycle. Different cell cycle stages are associated with dramatic changes in overall genome structure (Nagano et al. 2017, Cell-cycle Dynamics of Chromosomal Organization at Single-Cell Resolution. Nature, 547(7661):61-67). By targeting the RNA of important cell cycle marker genes, laser-microdissected nuclear sections from intact tissue can be conveniently classified by cell cycle stage, thus improving the identification of cells at specific stages of the cell cycle while maintaining spatial resolution. Similarly, RNA detection in the method of the present invention can facilitate the identification of specific cell types or stem cell differentiation states and enable the associated characteristic chromatin rearrangements to be examined.
[0140] Depending on the degree of DNA compaction around histones, chromatin can be organized into more or less condensed, higher-order structures. In its less condensed form, DNA wraps around histone proteins, forming nucleosomes that are evenly spaced from one another and thus resemble beads on a string. Such loosely packed chromatin is called euchromatin. Euchromatin is associated with more active transcription of genes into RNA because the genes are readily accessible to the transcription machinery and auxiliary transcription factors. In contrast, heterochromatin refers to a highly condensed, tightly packed chromatin structure in which individual nucleosomes are packaged into thick, higher-order chromatin fibers. Due to its high degree of compaction, heterochromatin is not sufficiently accessible to the transcription machinery and therefore remains transcriptionally silent. While centromeres and telomeres are usually in a constitutively heterochromatic state, chromatin condensation in other chromosomal regions is a highly dynamic process regulated by post-translational modifications of histone proteins, which may be epigenetically heritable. GAM can also provide information on chromatin condensation status at the scale of tens of kilobases or megabases and at the whole-chromosome scale (Beagrie et al., 2017). Therefore, by combining the method of the present invention, designated oligo-seq, with GAM, it may be possible to further correlate RNA expression with chromatin condensation and remodeling on a large scale (approximately 300 kb or more).
[0141] Those skilled in the art will recognize numerous additional applications and uses of the combination of GAM and oligo-seq analysis. Any such applications and uses are included herein in the scope of the present invention.
[0142] Table 1 summarizes the unique advantages of combining oligo-seq and GAM analysis.
[0143] [Table 1]
[0144] Most or all possible applications of the combined GAM and oligo-seq analysis can be combined in a single multi-omics experiment to maximize the amount of information obtained through the method of the present invention.
[0145] In the methods for detecting nucleic acids such as RNA disclosed herein, the combination with DNA detection is not limited to genomic DNA loci. In another embodiment, RNA / DNA detection and spatial mapping can also be combined with the detection of DNA from exogenous sources, such as viruses. Viral DNA may be single-stranded or double-stranded and may be present in the cytosol of infected cells or integrated into the genome of host cells. It may be present in some cells of a tissue and absent from others, or present in cells of the same type that differ in cellular physiology (cellular state). Co-segregation of RNA molecules and viral DNA in compartment fragments can provide information on the extent, localization, and consequences of viral replication and transcription at the intracellular, cellular, or tissue level.
[0146] In yet another embodiment, the at least one DNA locus that can be detected in combination with RNA may be a DNA oligonucleotide probe that labels a non-nucleic acid biological compound present in the compartment. For example, the DNA oligonucleotide probe may be conjugated to an antibody that specifically recognizes a protein of interest, or it may be an ssDNA tail on a guide RNA used for Cas9 labeling of one or more loci.
[0147] Therefore, the method of the present invention can also be used to combine RNA detection and spatial mapping with the detection of at least one protein or of post-translational modifications of said protein. For this purpose, the method further comprises the steps of:
- contacting the compartment with a ligand, for example, an antibody, that is labeled with an oligonucleotide probe and can specifically bind to the at least one protein or post-translational modification; and
- determining the co-segregation of the oligonucleotide probe labeling the ligand, for example, the antibody, with the single-stranded DNA oligonucleotide probes that specifically hybridize with the nucleic acid, for example, RNA.
[0148] In a preferred embodiment, at least one protein is detected by the method. The at least one protein detected in combination with a nucleic acid such as RNA may be, for example, a protein directly encoded by the RNA detected by the method of the present invention. It may also be a protein whose biosynthesis is suspected or known to be regulated by the detected RNA. The protein may further be one suspected or known to interact with a nucleic acid detected by the method of the present invention. Thus, it may be a member of a protein-RNA or protein-DNA complex. However, the protein may also be a protein that has not been previously associated with a particular nucleic acid. The protein may be endogenously expressed within the compartment or introduced from an exogenous source, for example, by transfection. It may also be the result of viral gene expression. If desired, more than one protein may be detected in combination with the nucleic acid in the compartment. For example, at least two, at least five, at least ten, at least 25, at least 50, or at least 100 proteins may be detected in combination with the nucleic acid. The detection of multiple proteins can be used to assess cell physiology / state and help associate specific chromatin conformations (enhancer-promoter contacts) with transcription and protein expression or post-translational modifications.
[0149] Ligand-probe conjugates, such as antibody-probe conjugates, can be brought into contact with the compartment after fixation or vitrification. Preferably, contact occurs after sectioning, for example, with a section of the compartment obtained by cryosectioning. However, detection probes, for example, genetically encoded probes such as frankenbodies for detecting nascent or mature proteins, can also be brought into contact with the compartment before sectioning (Zhao, N., et al. 2019, A Genetically Encoded Probe for Imaging Nascent and Mature HA-tagged Proteins in Vivo. Nat. Commun. 10(1): 2947).
[0150] Parallel detection of ssDNA oligonucleotide probes hybridized with complementary nucleic acid molecules and of DNA probe-labeled ligands, such as protein-bound antibodies, can be achieved, for example, in a manner similar to the combination of RNA and DNA locus detection described above. Briefly, after stringent washing, ssDNA oligonucleotide probes hybridized with complementary nucleic acids can be isolated, for example, from whole eukaryotic cells or from fragments of compartments, such as cryosections of the nucleus or cytoplasm. In parallel, DNA oligonucleotide probes conjugated to ligands such as protein-bound antibodies can be released from the ligands using, for example, a suitable chemical reagent (e.g., a salt buffer containing dithiothreitol (DTT)) and likewise isolated from each fragment. The two DNA oligonucleotide probe collections thus obtained can be optionally PCR-amplified and separately indexed to create two different sequencing libraries (one for the nucleic acid to be detected and the other for the protein to be detected). The two libraries can then be pooled and sequenced. Thus, all samples to be tested by the method of the present invention can produce two independent sequencing files. The physical proximity of nucleic acids and proteins can be inferred by evaluating the co-segregation, across fragments, of probes that detect nucleic acid molecules and probes that detect proteins. Therefore, combining RNA and protein detection and determining their overall and relative spatial distribution within a compartment using the method of the present invention allows conclusions to be drawn about, for example, the rate and level of RNA translation into protein. Very close proximity of specific nucleic acids and proteins further indicates possible interaction or the formation of biologically relevant RNA / DNA-protein complexes.
Therefore, simultaneous detection of protein epitopes and / or post-translational modifications (PTMs), such as histone modifications or transcription factor phosphorylation, together with transcript levels and with spatial resolution, expands the readout obtained from biological samples beyond gene expression signatures to include further information related to cellular identity. This method can be applied, for example, to patient-derived samples for early disease diagnosis.
[0151] As an alternative to directly sequencing the oligonucleotide probe conjugated to the ligand as described above, the method of the present invention may also include sequencing a free oligonucleotide probe that can specifically hybridize with a nucleic acid tag conjugated to the ligand, such as an antibody or another affinity protein reagent, preferably an immunoconjugate. In such embodiments, the nucleic acid tag itself is not sequenced directly; rather, it serves as a sequence-specific docking site for a complementary oligonucleotide probe present in the library, thereby converting the level of the target epitope into DNA reads. The hybridization of the oligonucleotide probe thus takes place after antibody binding. After in situ probe hybridization to the nucleic acid-labeled antibody, the readout sequence can be amplified and prepared for NGS using a pipeline similar to that described above.
[0152] This embodiment of the method of the present invention therefore sits at the intersection of existing approaches to multiplexed immunodetection, as it combines the hybridization principle, without the production of cDNA intermediates, with the direct detection of hybridized oligonucleotide probes by sequencing. A key advantage of quantifying antibody binding via sequencing of hybridized oligonucleotide probes, compared with other sequencing-based approaches that directly detect the conjugated nucleic acid tag itself, is flexibility in quantification: the design of the readout probes is open-ended, and the probes can be adapted to the application at hand to define multiple conjugate panels or combinations thereof. This greatly relaxes the technical constraints associated with the repeated need to irreversibly conjugate different reporter nucleic acid tags to antibodies. Furthermore, in contrast to imaging approaches such as CODEX, IBEX, and seqFISH+IF (Takei et al., 2021, Integrated spatial genomics reveals global architecture of single nuclei. Nature, 590, 344-350), which rely on multiple successive rounds of hybridization and imaging of fluorescently labeled readout probes against short antibody-bound nucleic acid tags, a single hybridization step is performed in parallel for multiple antibody conjugates, which reduces time and cost and, importantly, preserves sample integrity and antibody binding.
[0153] Preferably, the detection of proteins and / or their PTMs within the above compartments is carried out in parallel with the detection of another nucleic acid (e.g., RNA) or GAM. However, it is naturally understood that the method of the present invention can detect nucleic acid-tagged compounds in compartments without the need for parallel detection of another nucleic acid.
[0154] In another embodiment, parallel detection and mapping of RNA and at least one protein may be further combined with, for example, detection and mapping of at least one DNA locus via GAM. This approach makes it possible to correlate both transcriptional and translational regulation with chromatin organization. It thus enables the reconstruction of the genome architecture associated with specific cell types or states defined by their relative protein and / or RNA contents. Furthermore, it may prove useful for associating 3D genome features with specific nuclear landmarks (lamina, nuclear structural components, histone modifications) or events (e.g., nascent gene expression, splicing).
[0155] Gene expression levels also depend on the local accessibility of chromatin to the transcriptional machinery. Chromatin accessibility relates, for example, to the extent to which transcription factors can bind to DNA loci to promote or repress gene transcription. While GAM analysis of genome structure allows conclusions to be drawn about large-scale chromatin condensation and structural organization, other methods such as ATAC-Seq are more specialized for determining the accessibility of chromatin to the transcriptional machinery at the single-nucleosome level across the genome (Buenrostro, JD, Wu, B., Chang, HY, & Greenleaf, WJ, 2015, ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Current Protocols in Molecular Biology, 109, 21.29.1-21.29.9). ATAC-Seq is therefore a useful tool for mapping transcription factor binding sites.
[0156] The detection of nucleic acids, preferably RNA, according to the present invention can also be combined with the analysis of local chromatin accessibility within a compartment. For this purpose, the method further comprises the steps of: (i) prior to step (b), sectioning or cryopreserving the compartments, preferably by cryosectioning, to obtain a collection of fragments, so that nucleic acid molecules are separated from each other depending on their localization; (ii) isolating the genomic DNA from each fragment; (iii) simultaneously fragmenting and tagging the genomic DNA isolated from each fragment to produce an ATAC-Seq library; (iv) purifying and optionally amplifying the ATAC-Seq library; (v) determining the local chromatin accessibility status of loci of the genomic DNA; and (vi) analyzing the presence and / or abundance of the mapped nucleic acids, preferably RNA, and correlating them with chromatin accessibility at the loci of the genomic DNA.
[0157] The assay for transposase-accessible chromatin using sequencing (ATAC-Seq) is an NGS-based method suitable for evaluating the regulatory landscape of chromatin in the cell nucleus. ATAC-Seq utilizes the activity of a hyperactive Tn5 transposase. Transposases naturally catalyze the movement of transposable elements from one genomic site to another via a cut-and-paste mechanism. For ATAC-Seq, NGS adapters are loaded onto the Tn5 transposase. The loaded transposase then accesses the DNA in open (euchromatic) chromatin regions, cleaving (fragmenting) the DNA and simultaneously ligating the NGS adapters to the resulting fragments (tagmentation) to produce an ATAC-Seq sequencing library. The library can be purified, amplified by PCR using barcoded primers, and subsequently analyzed by qPCR or NGS. ATAC-Seq is typically performed with approximately 25,000–75,000 cells.
[0158] The detection of local chromatin accessibility by ATAC-Seq can be combined with the detection of nucleic acids, preferably RNA, by the method of the present invention. For example, genomic DNA isolated from cells or cell sections can first be subjected to ATAC-Seq to determine the chromatin accessibility status of various nuclear DNA loci. Changes in chromatin accessibility can then be examined for their effect on local transcriptional activity by evaluating the presence or absence of RNA mapped to a given locus.
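One simple way to relate the two readouts is to compute, across loci, the correlation between an accessibility score and the detection frequency of the RNA mapped to each locus. The sketch below uses a hand-written Pearson correlation; the choice of statistic, the function name, and the example values are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers.

    Returns None if either list has zero variance, in which case the
    correlation is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical per-locus values: ATAC-Seq accessibility score and the
# fraction of fragments in which the RNA mapped to that locus was detected.
accessibility = [0.1, 0.4, 0.5, 0.9]
rna_detection = [0.0, 0.2, 0.3, 0.8]
print(pearson(accessibility, rna_detection))
```

A positive value would be consistent with more accessible loci being more actively transcribed, but a correlation alone does not show which of the two drives the other.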
[0159] In a preferred embodiment, the ATAC-Seq sequencing library and the hybridized ssDNA oligonucleotide probe library are produced from the same compartment and sequenced in a single sequencing run.
[0160] By evaluating treated cells in different states or at different time points, this method also helps determine, for example, whether increased RNA levels at a genomic locus result from increased transcription due to changes in local chromatin accessibility or, conversely, whether expressed RNA molecules affect chromatin accessibility, as previously suggested in Caenorhabditis elegans (Fields et al., 2019, Chromatin Compaction by Small RNAs and the Nuclear RNAi Machinery in C. elegans. Scientific Reports 9). As a result, the method of the present invention enables the dynamics of chromatin accessibility and gene expression to be deciphered.
[0161] The invention also provides the use of the method of the present invention for: (a) determining gene expression in a single cell, cell group, or intracellular compartment; (b) identifying RNA isoforms and allele-specific variants within the compartment; (c) quantifying gene transcription; (d) identifying the cell types and states of complex heterogeneous tissues; (e) identifying endogenous and exogenous dsDNA and ssDNA within the compartment; (f) mapping the positions of nucleic acids, e.g., RNA, within the compartment; (g) mapping the positions of RNA and nucleic acid loci in the compartment; (h) mapping the positions of nucleic acids, preferably RNA, and proteins in the compartment; and / or (i) mapping the positions of nucleic acids, preferably RNA, proteins, and nucleic acid loci in the compartment.
[0162] For example, gene expression can be determined in a single cell or a group of cells in the context of the surrounding tissue, optionally in at least one, e.g., several, cells of a specific cell type within a particular tissue. The analysis may then involve comparing various cells, e.g., different cell types. The present invention therefore enables spatial transcriptomics of cells within the 2D or 3D context of a tissue. Mapping can also be performed at the tissue / organ level and across different species, e.g., in a population of various individuals of various single-cell organisms or various prokaryotes.
[0163] The present invention further provides a method for diagnosing a disease associated with the misexpression of one or more genes and / or a different transcription profile in a patient, comprising: identifying and / or analyzing the presence of nucleic acids, particularly RNA, in a sample taken from the patient to obtain a patient-specific transcription profile; and comparing the patient-specific transcription profile with the transcription profile of a subject already diagnosed with the disease, wherein the transcription profile is preferably also compared with the transcription profile of a healthy subject. Alternatively, the transcription profile may be compared with a specific cell subgroup that may originate from the same patient, e.g., tumor cells and normal tissue, preferably normal tissue of the same cell type as the tumor tissue.
[0164] Since the present invention can be used to investigate gene expression disorders in patients, such as the overexpression of oncogenes associated with tumorigenic cell proliferation, it can also contribute to the treatment of patients with diseases associated with gene expression disorders, such as cancer.
[0165] In summary, the present invention provides a highly sensitive sequencing- or amplification-based method for nucleic acid detection in a sample. In contrast to known RNA sequencing methods, the method of the present invention does not require RNA isolation or cDNA production for RNA detection. It thus overcomes the challenges associated with rapid RNA degradation and with bias introduced during reverse transcription. Instead, the method presented here sequences ssDNA oligonucleotide probes that hybridize to complementary nucleic acids within tissues, cells, or organelles.
[0166] Table 2 summarizes the differences between the method of the present invention, designated oligo-seq, and state-of-the-art methods.
[0167] [Table 2]
[0168] In a recent study, Marshall et al. proposed a similar concept in developing HyPR-Seq. However, the present invention offers a series of decisive advantages over HyPR-Seq. First, the specificity of such a method depends chiefly on the length and number of probes used, as well as on the selected incubation time, temperature, and stringency of washing. HyPR-Seq uses two initial probes that are each 25 nt long. The target recognition sites of the probes used in the present invention are preferably at least about 32 nt long and therefore exhibit high specificity. Second, HyPR-Seq is a much more time-consuming and labor-intensive method because it involves eight different sequential hybridization and washing steps before amplification and sequencing. In comparison, oligo-seq involves only a single hybridization step followed by stringent washing. As the developers of HyPR-Seq acknowledge, multiple rounds of hybridization and washing can lead to considerable cell loss, so HyPR-Seq experiments should be started with at least 1 million cells. In contrast, the method of the present invention can be successfully implemented in a single compartment representing 1 / 20th of a mammalian cell. Furthermore, the numerous washing steps and harsh ethanol permeabilization of the HyPR-Seq protocol can affect the structural integrity and protein content of nucleic acid-containing compartments. HyPR-Seq may therefore not be compatible with other multi-omics techniques. In contrast to HyPR-Seq, the method of the present invention requires only single-layer oligonucleotide hybridization and does not rely on amplification of primary RNA-DNA hybridization events by additional secondary or tertiary oligonucleotide hybridization. Thus, oligo-seq is capable of parallel identification of a considerably larger number of nucleic acids than HyPR-Seq or, for that matter, most other FISH probe-based detection methods.
[0169] Table 3 also summarizes the differences between the present invention's method and HyPR-Seq.
[0170] [Table 3]
[0171] The method of the present invention effectively and accurately provides information on, for example, gene expression, even when handling very small amounts of starting RNA material. It is therefore particularly suitable for testing gene expression at the single-cell or intracellular level. Combined with tissue or cell sectioning, the method further enables monitoring of the spatial distribution of nucleic acids relative to other biomolecules within a compartment, thus providing a highly sensitive technique suitable for conducting spatial transcriptomics. Nucleic acid detection according to the present invention does not require any microscopy or imaging technology; rather, it provides transcriptome data using sequencing. It can therefore be readily combined with other sequencing-based methods known in the art in large-scale multi-omics studies.
[0172] Throughout this invention, the terms “about” and “approximately” shall be interpreted as “±10%.” Where “about” or “approximately” relates to a range, it applies to both the lower and upper limits of the range. Singular forms mean “one or more” unless otherwise specified.
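As a worked illustration of this convention, the following minimal Python sketch computes the interval covered by "about" a value or a range. The function names `about` and `about_range` are hypothetical, and the sketch assumes positive quantities.

```python
def about(value, tolerance=0.10):
    """Interval (low, high) covered by 'about value' under the +/-10% reading.

    Assumes value is positive.
    """
    return (value * (1 - tolerance), value * (1 + tolerance))


def about_range(lower, upper, tolerance=0.10):
    """Apply 'about' to both limits of a range.

    Returns the widest interval covered: from 10% below the lower limit
    to 10% above the upper limit.
    """
    low, _ = about(lower, tolerance)
    _, high = about(upper, tolerance)
    return (low, high)
```

Under this reading, "about 32 nt" covers roughly 28.8 to 35.2 nt.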
[0173] All references cited herein are fully incorporated herein. The present invention will be further illustrated by the following embodiments, but will not be limited thereto.
[Brief explanation of the drawing]
[0174] [Figure 1] Schematic diagrams of the probe designs: oligo designs A and B, used in the experiments below, and oligo design C. Universal primers (UPs) are 20 nucleotides long and are shared by all probes in the library. UP1, UP2, and UP3 represent different universal primer sequences. Unique molecular identifiers (UMIs) are stretches of 6-15 random nucleotides that identify unique molecules in sequencing, so that duplicates arising from PCR amplification are counted only once. Homology regions (HRs) are sequences that target a unique nucleotide sequence of the target molecule of interest. Each probe in the library has a unique homology region. The 5' barcode B1 used in the examples is shared by all probes targeting the same gene, and the 3' barcode B2 is shared by all probes targeting the exons or the introns of the same gene.
[0175] [Figure 2] General characteristics of the probe designs: (a) A list of the 66 genes included in probe design A (OL66) and the number of targeting oligos per gene, in exons or introns. Hotair, Malat1, Xist, Neat1, and Firre encode long non-coding RNAs, while the other genes encode messenger RNAs. (b) Histogram of the number of genes (Y-axis) by the number of targeting oligos per gene (X-axis) for the 1823 genes included in probe design B (OL1823). All genes encode messenger RNAs.
[0176] [Figure 3] Schematic diagram of the method of the present invention. (a) RNA detected in the compartment. Optional verification of specific probe hybridization by fluorescence in situ hybridization (FISH) is shown in the dotted rectangle. (b) Combination of RNA detected in the compartment and parallel DNA detection using the GAM process.
[0177] [Figure 4] Probe library hybridization detected by RNA-FISH and validation of hybridization specificity by RNase treatment before hybridization. (a) The upper panel shows hybridization of ultrathin (200 nm thick) frozen sections of mESC (clone F123) with the designed RNA-GAM probe (concentration 0.5 μM). The lower panel shows hybridization performed in parallel with frozen sections pretreated with 0.25 mg / ml RNase A for 2 hours. The left panel shows DNA staining with DAPI for identification of nuclear slices. The right panel shows the probe signal detected after secondary hybridization of the frozen section with an oligo (bridge) targeting the universal sequence of the ssDNA probe, followed by annealing with an AlexaFluor647-conjugated oligo that hybridizes with the bridge. Scale bar, 10 μm. (b) Schematic diagram of fluorescence-based detection of the hybridized oligo-seq primary probe.
[0178] [Figure 5] (a) A simplified overview of the PCR steps used for sequencing the ssDNA probes used in the examples. The probe design A shown is the same as in Figure 2a. For probe design B, the same design is applied with a P5 adapter that is homologous to UP1. Illumina adapters P5 and P7 are complementary to the universal primer regions (UP1 and UP2, respectively) at the ends of each probe. (b) A schematic overview of the qPCR steps used for amplifying specific subsets of design A ssDNA probes, such as probes that are specific to a single gene or to a single region within a specific gene. The probe designs are the same as in Figure 2a. Probes are extracted from the compartment of interest. The probes are then universally amplified with primers UP1 and UP2, and the various samples are normalized by DNA concentration before qPCR. Relative abundances can then be determined for all probes targeting the same gene by amplifying the region between the 5' B1 barcode and the 3' UP2 with primers B1 and UP2. Alternatively, a gene region can be specifically amplified by amplifying the probe region between B1 and B2 with primers B1 and B2. (c) A simplified overview of the PCR step for sequencing ssDNA probe design A (OL66) used in the example. The probe design is the same as in Figure 2a. For probe design B, the same design is applied with a P5 adapter homologous to UP3 and a P7 adapter for UP2. The primers / secondary oligos include additional information (I1 / I2), which can be used, for example, to encode the sample position in the 2D space of the tissue. Individual samples may also be barcoded with a unique DNA sequence using secondary oligos that contain P5 and P7 sample-indexing barcodes and anneal directly to the universal primer region at each probe end.
[0179] [Figure 6] (a) qPCR results of oligoprobe amplification from 100 NP samples of mESC-F123 (+), XEN cells (●), and RNase-treated mESC-F123 (▲) after hybridization in frozen sections using probe design A (OL66). The Y-axis represents the quantification cycle (Cq) required for SYBR-Green signal detection. Expression of six genes was measured in independent replicate samples (100 NP each). Sox2 and Oct4 are pluripotency genes specifically expressed in mESC-F123; Sox17 and Gata6 are specifically expressed in XEN cells; Malat1 is a long non-coding RNA that is abundantly and ubiquitously expressed in both mESCs and XEN cells; and Bdnf is a neuronal gene that is not expressed in either mESCs or XEN cells. -1 and -2 represent technical replicates 1 and 2. (b) qPCR results of whole-cell oligoprobe hybridization in solution using probe design A (OL66). mESC (F123) cells were trypsin-treated, hybridized with oligoprobe library OL66, and washed. Bound probes were extracted from 1-20 cells and amplified (+). The amplified probes were purified and normalized to 2.5 ng / μl. As a background control, the probe stock was normalized to 2.5 ng / μl (●). The Y-axis represents the quantification cycle (Cq) required for SYBR-Green signal detection. Sox2 and Oct4 are pluripotency genes specifically expressed in mESC-F123; Sox17 and Gata6 are specifically expressed in XEN cells; Malat1 is a long non-coding RNA abundantly and ubiquitously expressed in both mESCs and XEN cells; and Bdnf is a neuronal gene not expressed in either mESCs or XEN cells. Specific detection above the probe background was observed for the mESC-expressed genes Sox2 and Oct4 and for the ubiquitously expressed Malat1. -1 and -2 represent technical replicates 1 and 2, respectively.
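To make the reading of Cq values concrete, the sketch below converts a Cq difference between a sample and the probe-stock background into an approximate fold difference. This is a minimal illustration, not the analysis used for Figure 6: the function name is hypothetical, the Cq values are invented, and the calculation assumes that the template doubles every cycle (an amplification efficiency of 2.0), which the source does not state.

```python
def fold_over_background(cq_sample, cq_background, efficiency=2.0):
    """Estimate how many times more template the sample holds than the background.

    Assumes the template multiplies by `efficiency` each cycle (2.0 = perfect
    doubling), so every cycle of earlier detection is one such multiplication.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (cq_background - cq_sample)


# Hypothetical Cq values, not taken from Figure 6.
example = fold_over_background(cq_sample=20.0, cq_background=25.0)
```

A lower Cq means the signal was detected earlier and therefore that more probe was present; equal Cq values give a fold difference of 1.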
[0180] [Figure 7] The number of hybridized probes recovered at each homologous target site across the Sox2 gene in a 100 NP sample from mESCs or XEN cells. Sox2 is expressed in mESCs but not in XEN cells. The probe library (design A) contains only 17 oligonucleotides that target the Sox2 gene across its coding region, arranged from left to right in genomic order from transcription start site (TSS) to transcription end site (TES) (positions 1-17). The Y-axis represents the number of probes recovered at each unique target site.
[0181] [Figure 8] Parallel detection of RNA and DNA by RNA-GAM. Ultrathin frozen sections from mESCs and XEN cells were hybridized with oligonucleotide probes (design A, OL66) before washing, laser microdissection, and amplification of the oligonucleotide probes and genomic DNA. After separate library preparation from the oligo probes and genomic DNA, the samples were pooled and sequenced.
[0182] (a) Genome browser tracks of raw sequencing data from oligonucleotide probes in 100NP samples. The squares below the sequencing read tracks mark the exon locations, where, as expected, more probes hybridize because introns are rapidly spliced and degraded. The tracks show the number of probes mapped to each probe location on the probe reference map, with all probe references arranged in linear genomic order. All four tracks were autoscaled together. Three genes are highlighted: the housekeeping gene Ywhae; Oct4, specifically expressed in mESCs; and Gata4, specifically expressed in XEN cells.
[0183] (b) Raw sequencing data from parallel extraction and sequencing of cellular DNA. Tracks of genomic DNA from four GAM samples produced from 1 NP each: two mESC samples and two XEN cell samples. The tracks show typical detection of genomic DNA covering short contiguous regions of chromosomes, as expected for DNA extracted from thin nuclear slices.
[0184] [Figure 9] (a) Expression of Sox2 and Oct4 in single ultrathin slices from mESC-F123 and XEN cells. The scatter plot shows the relationship between the number of reads for Sox2 (Y axis) and Oct4 (X axis) sequenced from individual 1NP samples obtained from mESC-F123 (□ - left panel; number of samples = 96) and XEN cells (☆ - right panel; number of samples = 40) after oligo-seq analysis using oligo probe design A (OL66). The top panel represents the count of sequence reads covering all probes across the gene, the middle panel represents sequence reads covering only the first 5 probes (from TSS), and the bottom panel represents sequence reads covering only the first probe of each gene (from TSS). The Sox2 and Oct4 genes are expressed in mESCs but not in XEN cells.
[0185] (b) Expression of Sox2 and Sox17 in single ultrathin slices from mESC-F123 and XEN cells. The scatter plot shows the relationship between the number of reads for Sox2 (Y axis) and Sox17 (X axis) sequenced from individual 1NP samples obtained from mESC-F123 (□ - left panel; number of samples = 96) and XEN cells (☆ - right panel; number of samples = 40) after oligo-seq analysis using oligo probe design A (OL66). The top panel represents the count of sequence reads covering all probes across the gene, the middle panel represents sequence reads covering only the first 5 probes (from TSS), and the bottom panel represents sequence reads covering only the first probe of each gene (from TSS). The Sox2 gene is expressed in mESCs but not in XEN cells, and the Sox17 gene is expressed in XEN cells but not in mESCs.
[0186] [Figure 10] Oligo-seq in ultrathin frozen sections distinguishes between cell types. Clustering of mESC-F123 and XEN cells using expression data from single-slice samples, and expression levels of transcripts encoded by specific genes in each cell slice. UMAP clustering of all mESC-F123 (□) and XEN (☆) 1NP samples based on standardized read counts of probes targeting exons, considering all targeted genes present in probe design A (OL66), including unexpressed control genes. The upper left panel shows the UMAP coordinates (X and Y axes) for all 1NP samples, clearly distinguishing slices from mESCs and XEN cells. Expression of the genes Ywhae (housekeeping), Sox2 and Oct4 (mESC-specific), and Sox17 and Gata6 (XEN cell-specific) is shown as Z-score-normalized expression (scale bar, grayscale Z-score of expression intensity, upper right).
[0187] [Figure 11] Oligo-seq in ultrathin frozen sections distinguishes various cell types. Clustering of mESC-F123, liver, and XEN cells using expression data from single-slice 1NP samples and transcript expression levels (standardized read counts) of specific genes represented in probe design B (OL1823), which includes a minimum of 6 probes per gene, including a non-expressed control gene. UMAP clustering of all mESC-F123 (○), liver (×), and XEN (▽) 1NP samples. The upper left panel (A) shows the UMAP coordinates (X and Y axes) for all 1NP samples, clearly distinguishing single-cell slices from mESCs (B), XEN cells (C), and liver cells (D).
[0188] [Figure 12] (a) Oligo-seq combined with GAM (RNA-GAM) in single ultrathin slices from mESC-F123 can distinguish cell states and expression-specific 3D genome architecture. mESC-F123 1NP samples hybridized with probe design B (OL1823) were grouped by Sox2 probe recovery (lowest Sox2 recovery = 269 1NP samples, highest Sox2 recovery = 157 1NP samples). The NPMI contact map for each group was plotted at 150 kb resolution for the region surrounding the Sox2 gene (mouse genome assembly mm10, chromosome 3: 30-39 Mb). The grayscale in the matrix shows the normalized pointwise mutual information (NPMI) score between two genomic loci. (b) Histogram of Sox2 RNA detected in mESC 1NP samples (total N = 507), expressed as the number of Sox2 UMIs recovered per NP normalized by the total number of Sox2 target sites in probe design B (OL1823).
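For readers unfamiliar with the NPMI score, the sketch below shows one common way to compute it for a pair of genomic loci, taking NPMI in its usual sense of normalized pointwise mutual information. It is an illustration under assumptions: the function name and the input format (one detection flag per nuclear profile for each locus) are hypothetical, and the source does not describe the exact computation behind Figure 12.

```python
import math


def npmi(detected_a, detected_b):
    """NPMI of two loci from per-profile detection flags (one flag per NP).

    Returns a value from -1 (the loci are never detected together) to +1
    (each locus is detected only when the other is); returns NaN when the
    score is undefined.
    """
    if len(detected_a) != len(detected_b) or not detected_a:
        raise ValueError("need two equally long, non-empty detection lists")
    n = len(detected_a)
    p_a = sum(1 for a in detected_a if a) / n
    p_b = sum(1 for b in detected_b if b) / n
    p_ab = sum(1 for a, b in zip(detected_a, detected_b) if a and b) / n
    if p_a == 0 or p_b == 0 or p_ab == 1:
        return math.nan  # a locus never seen, or both seen in every profile
    if p_ab == 0:
        return -1.0  # limiting value when the loci never co-occur
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)
```

A score near 0 indicates that the two loci are detected together about as often as expected by chance.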
[0189] [Figure 13] RNA detection using oligo-seq is highly RNase-sensitive at the transcript level, demonstrating its high specificity and sensitivity. Oligo-seq probe design B (OL1823) was applied to frozen sections from RNase-pretreated (R) and untreated (NR) mESCs. RNA was quantified from 1 NP (top, panel A) and 100 NP (bottom, panel B) samples. Genes were divided into five groups based on their mESC-F123 RNA-Seq expression levels (0-1, 1-10, 10-50, 50-100, >100 TPM). The plots represent the mean probe score for each gene across all collected mESC-F123 samples; untreated (NR) and RNase-treated (R) mESCs are shown as separate box plots. Number of genes (N) per TPM group: 0-1, 455 genes; 1-10, 332 genes; 10-50, 457 genes; 50-100, 217 genes; >100, 285 genes.
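The grouping of genes by expression level can be sketched as follows. This is a minimal illustration: the function names and gene identifiers are hypothetical, and because the source does not say which group a boundary value such as exactly 10 TPM belongs to, the sketch assumes that each upper bound is inclusive.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bounds of the first four groups; anything above the last falls in ">100".
TPM_UPPER_EDGES = [1, 10, 50, 100]
TPM_GROUPS = ["0-1", "1-10", "10-50", "50-100", ">100"]


def tpm_group(tpm):
    """Return the expression group of a TPM value (upper bounds inclusive)."""
    if tpm < 0:
        raise ValueError("TPM cannot be negative")
    return TPM_GROUPS[bisect_left(TPM_UPPER_EDGES, tpm)]


def group_genes(tpm_by_gene):
    """Map each expression group to the list of genes that fall in it."""
    groups = defaultdict(list)
    for gene, tpm in tpm_by_gene.items():
        groups[tpm_group(tpm)].append(gene)
    return dict(groups)
```

Probe scores can then be summarized per group, separately for RNase-treated and untreated samples.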
[0190] [Figure 14] Oligo-seq results from 1NP or 100NP samples using probe design B (OL1823) show high correlation of gene expression with bulk RNA-Seq and between 1NP and 100NP samples. The mean oligo-seq-derived probe score (log10) per gene, taken across all 100NP (A) and 1NP (B) samples, was plotted against gene expression from total mESC RNA-Seq (TPM (log10)) (number of genes = 1823). RNA-Seq was computed from millions of cells, oligo-seq 1NP from 507 samples (biological material equivalent to approximately 20 cells), and oligo-seq 100NP from 15 samples (biological material equivalent to approximately 50 cells). Spearman's rank correlation (R) was calculated for genes expressed at >1 TPM. (C) Oligo-seq expression levels per gene were highly reproducible between the 1NP and 100NP collection strategies, with 1NP on the y-axis and 100NP on the x-axis. Spearman's rank correlation (R) was calculated for all genes included in probe library OL1823.
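The correlation described in this caption can be sketched in plain Python as below. It is an illustration only: the function names and the dictionary inputs (gene name to probe score, gene name to TPM) are hypothetical, and the source does not say which software was used. Spearman's coefficient is computed here as the Pearson correlation of average ranks, restricted to genes above the 1 TPM threshold.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_expressed(probe_score, tpm, min_tpm=1.0):
    """Spearman's R over genes whose TPM exceeds min_tpm."""
    genes = [g for g in probe_score if tpm.get(g, 0.0) > min_tpm]
    return spearman([probe_score[g] for g in genes], [tpm[g] for g in genes])
```

Because ranks are unchanged by a log10 transformation of positive values, the coefficient is the same whether raw or log-transformed scores are supplied.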
[0191] [Figure 15] Oligo-seq detection of RNA abundance for specific genes. Ultrathin frozen sections from mESCs, XEN cells, and liver cells were hybridized with oligonucleotide probes (design B; OL1823) before washing, laser microdissection, and amplification of oligonucleotide probes and genomic DNA. After separate library preparation from the oligo probes, the samples were pooled and sequenced. Genome browser tracks of raw sequencing data from oligonucleotide probes from 100 NP samples. The X-axis represents the genomic coordinates in the centromere-to-telomere direction covering each indicated gene. The squares below the X-axis (genome browser track) represent the locations of target sites overlapping exons. All OL1823 probes were mapped exclusively to exons or exon-intron junctions. The gene tracks show the number of probes mapped to the probe reference map. All probe references are arranged in linear genomic order. Four genes are shown: Ywhae, a housekeeping gene; Oct4, which is specifically expressed in mESCs; Gata4, which is specifically expressed in XEN cells (and in a small number of mESC-F123 cells); and Aldob, which is expressed in liver cells.
[0192] [Figure 16] Oligo-seq captures the specific expression of Oct4 and Ywhae in single ultrathin cell slices from mESC-F123. The scatter plot shows the relationship between UMI numbers (or UMI numbers normalized to the number of target sites per gene) for Aldob, Gata4, or Bdnf (Y-axis) and Oct4, Ywhae, or Aldob (X-axis) sequenced from individual 1NP samples obtained from mESC-F123 after oligo-seq analysis using oligo probe design B (OL1823). Oct4 and Ywhae genes are expressed in mESCs, Aldob is not, and Gata4 is expressed in only a small number of cells in the mESC-F123 population.
[0193] [Figure 17] Oligo-seq captures the specific expression of Gata4 and Ywhae in single ultrathin slices from XEN cells. The scatter plot shows the relationship between UMI numbers (or UMI numbers normalized to the number of target sites per gene) for Aldob, Gata4, or Bdnf (Y-axis) and Oct4, Ywhae, or Aldob (X-axis) sequenced from individual 1NP samples obtained from XEN cells after oligo-seq analysis using oligo probe design B (OL1823). Gata4 and Ywhae genes are expressed in XEN cells, while Aldob, Oct4, and Bdnf are not.
[0194] [Figure 18] Oligo-seq captures the specific expression of Aldob and Ywhae in single ultrathin slices from adult liver cells. The scatter plot shows the relationship between UMI numbers (or UMI numbers normalized to the number of target sites per gene) for Aldob, Gata4, or Bdnf (Y-axis) and Oct4, Ywhae, or Aldob (X-axis) sequenced from individual 1NP samples obtained from liver cells after oligo-seq analysis using oligo-probe design B (OL1823). Aldob and Ywhae genes are expressed in liver cells, while Gata4, Oct4, and Bdnf are not.
[Examples]
[0195] The present invention, designated the oligo-seq method, involves stringent hybridization of short single-stranded oligonucleotides to nucleic acids, followed by isolation of the hybridized oligos and, finally, sequencing or PCR amplification, by which the presence or abundance of nucleic acids, e.g., various RNA species, in a tissue, cell, or compartment is detected. Each oligonucleotide comprises a region homologous to the nucleic acid of interest and, in addition, flanking regions of known sequence used for oligo amplification and detection and for purposes such as assignment to RNA species (Figure 1). A multifunctional implementation of oligo-seq combines RNA detection and genome architecture mapping by extracting and sequencing RNA-hybridized oligo probes in parallel with the recovery and sequencing of the genomic DNA content of compartments (Figure 3b). Another implementation involves tagging different types of probes (e.g., antibodies) or samples with similar oligonucleotide sequences, e.g., to enable massively parallel sequencing of samples from different spatial coordinates of a tissue section or, in a diagnostic setting, from various patients.
[0196] Preferred probe structure
The central target region of each probe is unique to one individual target position within the mouse genome (mm10 assembly) and is approximately 35–50 bp long (homology region, HR; Figure 1). Each target sequence was confirmed to be thermally stable enough to withstand stringent washing at 47–60°C in 40% formamide and to exclude secondary structure formation.
[0197] If desired, the target region is flanked on either side by a sequence barcode, as in the proof-of-principle OL66 library (Figure 1, oligo design A). The 5' barcode (B1) is shared by all probes targeting a particular gene, and the 3' barcode (B2) is shared by all probes targeting the exons or the introns of the same gene. B1 and B2 enable independent targeting of various gene regions via FISH or qPCR. The barcodes were aligned to the entire genome sequence (mm10 assembly) to confirm that they exhibit no complementarity, ensuring that the flanking regions do not bind nonspecifically to particular RNAs.
[0198] Each probe has a universal primer region at each end (UP1 and UP2; Figure 1). First, these allow amplification of the probes from the original stock. Second, they allow easy addition of Illumina-compatible primers to each oligo for sequencing. Furthermore, the universal primer regions can be used as FISH targets for rapid verification of successful probe hybridization in the desired compartment during the optimization stage, which is not a required step of this invention.
[0199] In an alternative design, the probe structure contains only sequences homologous to the target RNA species and the 5' and / or 3' universal primer sequence, without barcodes assigned to exons / introns. The probe structure may also include one or more barcode sequences for certain target site properties, compatibility with detection by FISH (for validation / optimization purposes), and PCR methods (for optimization and as a standalone implementation). The probe structure may also include UMI to reduce PCR bias and increase resolution for probe abundance measurement (Figure 1).
[0200] A unique molecular identifier (UMI), typically a stretch of 4–12 random nucleotides, may be included instead of or in addition to the barcodes (Figure 1, oligo designs B and C). Because its sequence is random, the UMI is, with high probability, unique to each targeting oligo in the hybridized oligo pool; i.e., different oligos targeting the same target region will have different UMI sequences. UMIs are used to improve post-sequencing accuracy and to quantify the original pool of RNA present in the sample.
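The role of the UMI in counting can be illustrated with a short sketch. It is a simplified example under assumptions: reads are assumed to have already been assigned to a gene and target site, UMIs are compared by exact match without any correction of sequencing errors, and the function names, gene labels, and numbers of target sites in the usage are hypothetical.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Count distinct (target site, UMI) pairs per gene.

    `reads` is an iterable of (gene, target_site, umi) tuples. Reads that
    repeat the same site and UMI are treated as PCR duplicates and counted once.
    """
    seen = defaultdict(set)
    for gene, site, umi in reads:
        seen[gene].add((site, umi))
    return {gene: len(pairs) for gene, pairs in seen.items()}


def normalize_by_target_sites(umi_counts, target_sites_per_gene):
    """Divide each gene's UMI count by its number of target sites in the library."""
    return {
        gene: count / target_sites_per_gene[gene]
        for gene, count in umi_counts.items()
    }
```

Normalizing by the number of target sites makes genes covered by many probes comparable with genes covered by few.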
[0201] Two probe libraries were constructed, representing two different probe designs. Probe library oligo-lot 66 (OL66), shown in Figure 1 as oligo design A, has a target region of approximately 35 bp, two flanking barcode regions, and a universal primer region at each end. Probe library oligo-lot 1823 (OL1823) has a target region of approximately 45 bp, two flanking universal primers, and an 8 bp UMI (Figure 1, oligo design B).
[0202] Proof of principle: RNA detection
The proof-of-principle library OL66 focused on a subset of 66 genes (Figure 2a). The subsequent library, OL1823, was extended to cover 1823 genes (Figure 2b).
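The layout of a design A probe (universal primer, barcode B1, homology region, barcode B2, universal primer) can be sketched as a simple record that is assembled into, and recovered from, a linear sequence. This is an illustration only: the class and function names are hypothetical, and the barcode length must be supplied by the caller because the source does not state it.

```python
from typing import NamedTuple


class DesignAProbe(NamedTuple):
    up1: str  # 5' universal primer region
    b1: str   # 5' barcode, shared by all probes for one gene
    hr: str   # homology region, specific to one target site
    b2: str   # 3' barcode, shared by exon or intron probes of one gene
    up2: str  # 3' universal primer region

    def sequence(self):
        """Concatenate the five regions in 5' to 3' order."""
        return "".join(self)


def split_design_a(seq, up_len, barcode_len):
    """Split a probe sequence back into its regions, given the flank lengths."""
    flank = up_len + barcode_len
    if len(seq) <= 2 * flank:
        raise ValueError("sequence too short to contain a homology region")
    return DesignAProbe(
        up1=seq[:up_len],
        b1=seq[up_len:flank],
        hr=seq[flank:len(seq) - flank],
        b2=seq[len(seq) - flank:len(seq) - up_len],
        up2=seq[len(seq) - up_len:],
    )
```

With 20-nucleotide universal primers, as described for Figure 1, a sequenced probe can be split into its regions so that the homology region is mapped to the probe reference and the barcodes are used for gene assignment.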
[0203] For OL66, we selected key cell type markers appropriate to the cell types used in this method. The aim was to test the ability of oligo-seq to distinguish between different cell types. This required testing whether 66 genes were sufficient, as well as determining the number of probes required per gene, the level of nonspecific probe retention in compartments, and the sensitivity to highly and lowly expressed genes. We selected two developmentally related cell types with different gene expression profiles, mouse embryonic stem cells (mESCs; cell line F123) and extraembryonic endoderm cells (XEN; cell line IM8A1), as well as mouse liver cells from adult tissue. We selected probes specific to genes expressed in mESCs or XEN cells. For example, the Oct4 gene is expressed in mESCs but not in XEN cells, while the Gata4 gene is expressed in XEN cells but not in mESCs. To evaluate the background retention of probes against unexpressed transcripts, we included probes specific to genes expressed in neurons, including several specific to dopaminergic neurons (e.g., Th, which encodes tyrosine hydroxylase). mESCs and XEN cells divide actively. We therefore also chose to test cell cycle markers, which are expressed in both cell types but at levels that vary between cells of the dividing population at different stages of the cell cycle. We also included probes against highly expressed and well-studied lncRNAs that preferentially localize to nuclear subcompartments and play a role in 3D genome architecture (Malat1, Firre, Neat1, and Xist). We also included probes against RNAs encoding pluripotency factors. For example, Nanog is a transcription factor expressed at varying levels within the mESC population but not in XEN cells. To explore oligo-seq sensitivity, we included a varying number of probes for each gene depending on its length. For example, 1100 probes (9.8% of all probes) were assigned to Satb2, a long neuronal gene not expressed in mESCs or XEN cells, and 17 probes were assigned to Sox2, a shorter gene expressed in mESCs but not in XEN cells. We also included probes for housekeeping genes expressed at varying levels (Ywhae and Eif3h are highly expressed, while Faim and Alyref2 are expressed at low levels). All tested housekeeping genes showed homogeneous expression across the differentiation timeline from ESCs to dopaminergic neurons (Ferrai et al., 2017, RNA polymerase II primes Polycomb-repressed developmental genes throughout terminal neuronal differentiation. Mol. Syst. Biol. 13, 946). Including genes at various expression levels may help calibrate the expression of the target genes and provide a cell type-independent control for a successful experimental design.
[0204] A second probe library, OL1823 (90,941 individual oligoprobes; from CustomArray, Inc., see Methods), comprised genes featured in multiple panels used for microarray analysis from Nanostring.com. The OL1823 library targeted a much wider range of genes, covering a range of expression levels in the intended target cell types and including unexpressed genes as negative controls for nonspecific background. Genes were selected from the NanoString gene panels for the following: cell cycle, induced pluripotent stem (iPS) cells, stem cells, stem cell signaling, stem cell transcription factors, hematopoiesis, Polycomb and trithorax target genes, neurotransmitter receptors, learning and memory, DNA damage repair, chromatin modifying enzymes, chromatin remodeling complexes, Notch pathway, WNT pathway, cytokines, Hippo pathway, mesenchymal stem cells, Notch target genes, and metabolic pathways. Of these, 1310 / 1823 genes were expressed at more than 1 transcript per million (TPM) in mESCs, and 513 / 1823 were unexpressed.
[0205] For OL66, all genes were targeted across their entire length, including exons and introns, with the exception of extremely long genes, for which the number of probes targeting introns was reduced (Figure 2a). The number of probes therefore differed from gene to gene, which was intentional in this proof of principle as a means of evaluating the relationship between the number of targeting probes per gene, the ability to detect expression, and the maximum amount of information that can be recovered from each gene. For example, the aim was to understand whether different regions of the transcript, or the sequence composition of the target RNA region, were associated with better specificity and sensitivity.
[0206] For OL1823, the target sites were limited to exons and exon-intron junctions. Target sites of 39-45 nucleotides were determined using the publicly available Oligopaints database and the mm10 "stringent" genomic coordinates overlapping the target genes, and the nucleic acid sequences were corrected for strandedness. The OL1823 target sites were designed with a hybridization temperature of 47°C, a Tm of 47-52°C, and an 18-mer k-mer length for the mm10 genome assembly (https://www.pnas.org/content/115/10/E2183; https://oligopaints.hms.harvard.edu/genome-files).
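The length and Tm window described for the OL1823 target sites can be expressed as a simple filter over candidate sites. This is a sketch under assumptions: the record type and function names are hypothetical, the melting temperature is taken as a value already supplied by the design tool rather than computed here, and the gene labels in any usage are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSite:
    gene: str
    sequence: str
    tm_celsius: float  # melting temperature supplied by the design tool


def passes_ol1823_window(site, min_len=39, max_len=45, min_tm=47.0, max_tm=52.0):
    """True if the site length and Tm both fall inside the stated windows."""
    return (
        min_len <= len(site.sequence) <= max_len
        and min_tm <= site.tm_celsius <= max_tm
    )


def select_sites(sites, **window):
    """Keep only the candidate sites that pass the length and Tm window."""
    return [site for site in sites if passes_ol1823_window(site, **window)]
```

The defaults mirror the 39-45 nucleotide and 47-52°C ranges given above; other libraries could pass different limits as keyword arguments.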
[0207] In the future, we hope to implement a robust detection method using a minimum number of probes that may vary depending on the application. For example, the number of probes or genes required to define cell type composition in composite tissues may differ from the number required to define the effects of treatment or disease on cell type expression profiles.
[0208] A library of oligonucleotides, OL66, covering 66 genes of varying lengths, expression patterns (e.g., only in mESCs or XEN cells, or both, or neither, e.g., neuronal genes) and expression levels, was selected based on publicly available RNA-Seq data from mESCs differentiated into neurons (Ferrai et al. 2017) (Figure 2). For each gene, probes included oligonucleotides covering exons or introns, except for genes without introns. In some cases, the exon of a gene (e.g., H3f3a) was covered by only a single oligonucleotide, while the exon of another gene, such as Lhx1, was covered by 99 oligonucleotide probes. Similarly, the introns of genes were covered by varying amounts of oligonucleotide probes. For example, the intron of Hspa8 was covered by 7 oligonucleotides, while the intron of Satb2 was covered by 1031 oligonucleotides. The neuronal gene Bdnf was covered by a total of 531 probes to investigate the background and specificity of RNA detection in ESCs.
[0209] The oligonucleotide library OL1823, which covers 1823 genes, included all genes in the NanoString panels described above. For example, the gene Hmcn1 was covered by the maximum of 313 probes, while the genes Ubb, Ndufb4, Pcna, Ifna1, Cbx3, and Bcl2a1a were each covered by the minimum of one probe.
[0210] Biological materials and ultrathin frozen sectioning As proof of principle, oligo-seq was implemented using ultrathin frozen sections from mESCs, XEN cells, and liver tissue. Thin frozen sections through cell nuclei were produced without resin embedding by the modified Tokuyasu method (Tokuyasu, KT, 1973; Guillot 2004; Pombo et al., 1999). After fixation with HEPES-buffered, electron microscopy-grade formaldehyde, cells were cryoprotected by embedding in saturated sucrose solution and then frozen in liquid nitrogen. Ultrathin frozen sections approximately 220 nm thick were cut at -100°C using a Leica Ultracut cryomicrotome and transferred to coverslips for hybridization verification or to laser microdissection PEN slides for oligo-seq.
[0211] Verification of oligo library hybridization by RNA-FISH of ultrathin frozen sections To test whether the oligo-seq probe OL66 efficiently hybridizes to mouse ESCs, we used fluorescence-based in-situ hybridization (RNA-FISH) on ultrathin frozen sections for the first time. Ultrathin frozen sections are known to preserve cellular RNA contents in both the nucleus and cytoplasm while enabling efficient hybridization of FITC-labeled oligo-dT probes in HeLa cells (Xie SQ and Pombo A., 2006, Distribution of different phosphorylated forms of RNA polymerase II in relation to Cajal and PML bodies in human cells: an ultrastructural study. Histochem. Cell Biol. 125, 21-31; Branco et al., 2006), and, after activation in HepG2 cells, provide efficient detection of single short RNAs as uPA / PLAU gene expression (Ferrai et al., 2010).
[0212] After hybridization of the oligo-seq library (OL66) to frozen sections from fixed and sucrose-embedded mESCs, unbound or partially hybridized probes were removed by stringent washing, and hybridized probes retained in the sections were detected by secondary hybridization with fluorescently labeled short ssDNA oligonucleotides homologous to the universal sequence in the flanking region of the oligonucleotide probes (Beliveau et al., 2012). As expected, fluorescence signal was detected in the cytoplasm and nucleus, including nuclear regions of low chromatin density called interchromatin domains or splicing speckles (arrows; Figure 4). The specificity of oligo detection for hybridized RNA was confirmed by pre-treating sections with RNase (Figure 4, +RNase) and by the absence of probe signal in the nucleolus (asterisk; Figure 4), which is rich in nascent and mature ribosomal RNA but lacks the protein-coding transcripts targeted by this probe library.
[0213] Oligo-seq hybridization of frozen sections As proof of principle, oligo-seq and GAM were combined using ultrathin frozen sections from mESCs, XEN cells, and liver tissue. Oligonucleotide probes and cellular genomic DNA were extracted simultaneously, and the oligo sequences were amplified in parallel with the genomic DNA by the GAM process (as shown in Figure 3) (Beagrie et al., 2017; Beagrie et al., 2020, Multiplex-GAM: genome-wide identification of chromatin contacts yields insights not captured by Hi-C. bioRxiv 2020.07.31.230284; doi: https://doi.org/10.1101/2020.07.31.230284). As an initial test for the presence or absence of cell type-specific transcripts in mESCs or XEN cells, we used qPCR with primers homologous to a universal probe sequence, as outlined in Figure 5b. Using laser microdissection (LMD) of cresyl violet-stained frozen sections on plastic-coated slides, several batches of approximately 100 single-nucleus slices were each collected in a single PCR tube from (a) mESC-F123, (b) XEN cells, and (c) mESC-F123 cells pre-treated with RNase as described in the Methods. mESC-specific genes (Sox2, Oct4) and XEN cell-specific genes (Sox17, Gata6) were more enriched in their respective cell type (i.e., detected after fewer PCR cycles) (Figure 6a) and showed enrichment equal to that of the highly expressed lncRNA Malat1, which is expressed in both cell types. Bdnf, a neuronal marker not expressed in either cell type, required a high number of PCR cycles in all analyzed samples, including RNase-treated mESCs. RNase-treated mESC samples did not show enrichment of cell type-specific gene signatures.
[0214] Next, we developed an oligo-seq method using NGS by producing libraries with the pipeline shown in Figure 3b. We adapted an approach developed in-house for the extraction and amplification of genomic DNA within the GAM pipeline. This approach is based on the MALBAC whole-genome amplification method (Zong C et al. 2012 Genome-Wide Detection of Single Nucleotide and Copy Number Variations of a Single Human Cell, Science, 338(6114): 1622-1626) and was adapted for parallel detection of oligo probes, as outlined in detail below.
[0215] In short, primers carrying sequences compatible with the Illumina Nextera kit were designed to add these sequences to the ends of each oligo, so that a sequencing library could be produced from the probes in a sample. These primers were incorporated directly into the in-house whole-genome amplification so that the probes and genomic DNA were amplified simultaneously in the same reaction. Excess primers were digested with exonuclease I. Next, the PCR product containing both the amplified oligo probes and genomic DNA (from a sample containing 1 NP or 100 NP) was divided into two so that the probes and genomic DNA from the same sample were amplified independently (Figure 3b). The probe and genomic DNA sequencing libraries were indexed separately. Optionally, peptide nucleic acids (PNAs) may be included in the indexing step of the protocol to reduce unwanted sequencing products arising from primer and oligo mismatching. PNAs complementary to the GAT-COM sequence used in the GAM method were included to prevent reads containing the GAT-COM sequence in the final sequencing library. Next, the DNA libraries were pooled for sequencing in the same run on an Illumina NextSeq sequencer (75 bp read length, single-end sequencing). Using OL66, a total of 198 DNA libraries were produced from mESCs, each containing 96 genomes and 96 probe libraries from 96 samples, each containing 1 NP; and 6 DNA libraries, each containing 6 genomes and 6 probe libraries from 6 samples, each containing 100 NP (Table 4). Similar libraries were also produced from mESCs and XEN cells treated with RNase before oligo hybridization, as shown in Table 4. After sequencing (2-4 million reads per genome library, approximately 500,000 reads per probe library), the reads were demultiplexed to obtain two separate FASTQ files per sample, one for the RNA-GAM oligoprobes and the other for cellular genomic DNA. A detailed description is given in the Methods below.
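Because the probe and genomic DNA libraries of a sample are indexed separately, demultiplexing amounts to a lookup from index to sample and library type. The following Python sketch illustrates this idea only; the index sequences, sample names, and the `demultiplex` helper are hypothetical and are not taken from the protocol.

```python
from collections import defaultdict

# Hypothetical sample sheet: each sample has one index for its probe library
# and a different index for its genome library (index sequences are made up).
SAMPLE_SHEET = {
    "ACGTACGT": ("mESC_1NP_01", "probe"),
    "TGCATGCA": ("mESC_1NP_01", "genome"),
    "GGAACCTT": ("mESC_100NP_01", "probe"),
    "CCTTGGAA": ("mESC_100NP_01", "genome"),
}


def demultiplex(reads, sample_sheet=SAMPLE_SHEET):
    """Group (index, sequence) pairs by (sample, library type).

    Reads whose index is not in the sample sheet are collected under
    the key ("undetermined", "undetermined").
    """
    bins = defaultdict(list)
    for index, sequence in reads:
        key = sample_sheet.get(index, ("undetermined", "undetermined"))
        bins[key].append(sequence)
    return dict(bins)


# Toy reads: two probe reads and one genome read from the same sample,
# plus one read with an unknown index.
reads = [
    ("ACGTACGT", "GGCACAACGTTGCAGCACAGTTTT"),
    ("TGCATGCA", "GTGAGTGATGGTTGAGGTAGTGTGGAGACGT"),
    ("ACGTACGT", "GGCACAACGTTGCAGCACAGAAAA"),
    ("NNNNNNNN", "ACGT"),
]
bins = demultiplex(reads)
```

Each key of `bins` would correspond to one of the two FASTQ files written per sample.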
[0216] Genome sequencing files were mapped to the mouse reference genome (assembly mm10), and each sample was subjected to the quality control analyses typical of GAM data analysis, which consider several parameters, in particular the percentage of orphan windows (<60%) and the number of uniquely mapped reads (>25,000) (Beagrie et al. 2017, Winick-Ng et al. 2020). These parameters flag rare events such as incomplete extraction, laser microdissection failure, or contamination (129 out of 154 1NP biological samples collected in proof-of-principle experiments passed QC). Quality control can be performed as described, for example, in Winick-Ng et al., 2020. Table 4 lists all collected samples. To map sequence reads from the probe set, a probe reference sequence map was constructed by arranging the entire probe sequences, including probe-specific barcodes and primers, in the order of their genomic positions in mm10. Bowtie2 was used to identify and remove reads with poor mapping quality, i.e., those with a low probability of mapping to a single unique location. Because each probe was sequenced from the same starting position (5' end) into the RNA target homology sequence with 75 bp read lengths (Figures 1 and 7), the raw data obtained from oligo-seq showed inconsistent counts of sequence reads homologous to the probe per unique target location of the target RNA species. Simple summary statistics for RNA were then applied based on the read counts aligned directly to the probe reference. For the probe library OL1823, UMIs were extracted computationally from sequence reads using UMI-tools (https://umi-tools.readthedocs.io/en/latest/index.html), deduplication was performed, and quantitative counts of individual probes were obtained.
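The principle of UMI-based counting can be sketched in a few lines of Python. This is not the UMI-tools implementation, which can also account for sequencing errors in UMIs; it is a minimal exact-match illustration. The read layout (OL1823 forward primer, 8-nt UMI, OL66 forward primer sequence, then the probe-specific sequence) is an assumption derived from SEQ ID NO: 10, and `probe_lookup` and the toy probe sequences are hypothetical.

```python
import re
from collections import defaultdict

OL1823_FORWARD = "GACCAGCCCACATCGCACTG"  # SEQ ID NO: 7
OL66_FORWARD = "GGCACAACGTTGCAGCACAG"    # SEQ ID NO: 3

# Assumed read layout (following SEQ ID NO: 10):
# OL1823 forward primer, 8-nt UMI, OL66 forward primer, probe-specific insert.
READ_PATTERN = re.compile(
    "^" + OL1823_FORWARD + "([ACGT]{8})" + OL66_FORWARD + "([ACGT]+)$"
)


def count_unique_umis(reads, probe_lookup):
    """Return {probe_id: number of distinct UMIs} using exact-match deduplication."""
    umis_per_probe = defaultdict(set)
    for read in reads:
        match = READ_PATTERN.match(read)
        if match is None:
            continue  # read does not follow the assumed layout
        umi, insert = match.groups()
        probe_id = probe_lookup.get(insert)
        if probe_id is None:
            continue  # insert is not a known probe sequence
        umis_per_probe[probe_id].add(umi)
    return {probe: len(umis) for probe, umis in umis_per_probe.items()}


# Hypothetical probe inserts (real target sites are 39-45 nt long).
probe_lookup = {"ACGTACGTAC": "Oct4_p1", "TTGGCCAATT": "Sox2_p1"}


def make_read(umi, insert):
    return OL1823_FORWARD + umi + OL66_FORWARD + insert


reads = [
    make_read("AAAAAAAA", "ACGTACGTAC"),
    make_read("AAAAAAAA", "ACGTACGTAC"),  # PCR duplicate: same UMI, same probe
    make_read("CCCCCCCC", "ACGTACGTAC"),
    make_read("AAAAAAAA", "TTGGCCAATT"),
    "NOTAREAD",
]
counts = count_unique_umis(reads, probe_lookup)
```

Reads sharing both UMI and probe are counted once, so the result approximates the number of distinct probe molecules captured rather than the number of amplified copies.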
[0217] [Table 4]
[0218] [Table 5-1] [Table 5-2]
[0219] [Table 6]
[0220] After laser microdissection of single nuclear profiles (1NP) or 100 nuclear profiles (100NP) into single PCR tubes, three OL66 collections consisting of mESC-F123, RNase-treated mESC-F123, and XEN cells were prepared (Table 4). The 100NP samples were collected as “bulk” samples, each corresponding to the total cellular material of approximately 3–5 cells. After mapping of the various sets of samples, including 100NP mESC or XEN cells, to the probe reference map, there was clear evidence of detection of oligo probes mapped to housekeeping genes (e.g., Ywhae), particularly in exon regions. Preferential oligo detection in exon regions is expected because the exons of expressed genes are present not only at the transcription site (nascent RNA) but also in the mature mRNA molecules that move from the nucleus to the cytoplasm for translation into proteins (Figure 8a). Intron sequences of expressed genes were also detected, but at low abundance, because introns are typically short-lived and quickly degraded after splicing of the nascent RNA. Cell type-specific gene enrichment was also observed with OL1823, with Oct4 highly enriched in mESCs, Gata4 in XEN cells, Aldob in the liver, and Ywhae in all cell types (Figure 15).
[0221] Next, we tested whether oligo-seq mapping of RNA in frozen sections interfered with the detection of genomic DNA by GAM. The read distribution obtained from sequencing of 1NP samples exhibited the expected characteristics, with sparse but contiguous patches of genomic DNA detection that differed between samples because nuclei are sectioned in random orientations (Figure 8b). This observation was important because the extraction and detection of genomic DNA from subcellular compartments, such as thin (200 nm thick) frozen sections, can be susceptible to contamination, especially when biological samples undergo additional processing (e.g., oligo-seq) before DNA extraction. Nevertheless, the combination of GAM and oligo-seq was successful: the majority of the collected samples (approximately 85% of OL66 and approximately 62% of OL1823 samples) passed the current QC filtering for GAM data, i.e., had >25,000 uniquely mapped reads and <60% orphan windows.
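The two GAM QC thresholds quoted above (>25,000 uniquely mapped reads and <60% orphan windows) can be applied as a simple per-sample filter. The Python sketch below is illustrative only: the `GamSample` record, the helper names, and the example values are hypothetical, and the complete QC procedure is the one described in the cited references.

```python
from dataclasses import dataclass

MIN_UNIQUE_READS = 25_000    # a sample must have more uniquely mapped reads than this
MAX_ORPHAN_PERCENT = 60.0    # a sample must have a lower orphan-window percentage than this


@dataclass
class GamSample:
    name: str
    unique_reads: int
    orphan_window_percent: float


def passes_qc(sample):
    """Apply both thresholds; a sample must satisfy both to pass."""
    return (
        sample.unique_reads > MIN_UNIQUE_READS
        and sample.orphan_window_percent < MAX_ORPHAN_PERCENT
    )


def pass_rate(samples):
    """Fraction of samples passing QC (0.0 for an empty collection)."""
    if not samples:
        return 0.0
    return sum(passes_qc(s) for s in samples) / len(samples)


# Made-up example values for three samples.
samples = [
    GamSample("NP_01", unique_reads=120_000, orphan_window_percent=35.0),
    GamSample("NP_02", unique_reads=18_000, orphan_window_percent=40.0),   # too few reads
    GamSample("NP_03", unique_reads=90_000, orphan_window_percent=72.5),   # too many orphan windows
]
kept = [s.name for s in samples if passes_qc(s)]
```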
[0222] Enrichment of the cell type-specific genes Aldob (liver), Oct4 (mESC), and Gata4 (XEN cells) was also evident when the probe counts across the entire gene (Figure 8a) were summed for OL1823 data derived from individual 1NPs. As expected, in individual 1NPs the neuronal marker Bdnf was detected only at low levels across XEN cells, mESCs, and liver cells.
[0223] For OL66, the first five probes from the transcription start site were sufficient to identify cell type-specific information for some genes (Figure 9a). For example, frozen sections from mESCs showed a higher abundance of oligonucleotides complementary to Sox2 (mESC-specific) than of oligonucleotides complementary to Sox17 (XEN-specific), and the reverse was observed for XEN cells (Figure 9b).
[0224] The sum of probe counts for each gene per sample was successfully used with UMAP (https://umap-learn.readthedocs.io/en/latest/basic_usage.html) to obtain cell type-specific clusters of samples from mESC, liver, and XEN cell types, demonstrating that each sample is enriched with transcripts expressed by its known cell type (Figures 10 and 11). Clustering may be further improved by normalization based on oligo-seq sequences, by exploring RNA abundance metrics other than the one described above, and by filtering out low-quality probes and samples.
[0225] For OL1823, we further confirmed the specificity of cell type-specific transcript detection among mESCs, XEN cells, and hepatocytes (Figures 16-18). As expected, Aldob (liver marker gene), Gata4 (XEN cell marker), and Bdnf (neuronal marker) were not detected in oligo-seq data collected from single (1NP) mESC frozen sections, with rare exceptions in a small number of datasets that are attributable to the 3-5% of Gata4-positive cells present in mESC cultures in LIF-supplemented serum. In contrast, Oct4 (mESC marker) and Ywhae (cell cycle marker) were detected (Figure 16). Aldob, Bdnf, and Oct4 were not detected in single (1NP) XEN cell frozen sections, while Gata4 and Ywhae were (Figure 17). Finally, in single (1NP) hepatocyte frozen sections, Aldob and Ywhae were detected, but Oct4 and Gata4 were not (Figure 18).
[0226] The sum of probe counts for each gene per sample was successfully used with UMAP (https://umap-learn.readthedocs.io/en/latest/basic_usage.html) to obtain cell type-specific clusters of samples from mESC, liver, and XEN cell types, showing that each sample was enriched with transcripts expressed by its known cell type (Figures 10 and 11). Importantly, oligo-seq was able to capture the 3–5% of cells in mESC cultures that were distinguishable as a XEN-like cell lineage (Figure 11). Clustering may be further improved by normalization based on oligo-seq sequences, by exploring RNA abundance metrics other than the raw probe count per gene, and by filtering out low-quality probes and samples.
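The RNA abundance metric used here, the sum of probe counts per gene per sample, can be computed as in the following Python sketch. The probe identifiers, sample names, and the `probe_to_gene` mapping are hypothetical. The resulting sample-by-gene matrix is the kind of input that could then be passed to a dimensionality-reduction tool such as UMAP; that step is omitted here.

```python
from collections import defaultdict


def gene_counts(probe_counts, probe_to_gene):
    """Sum probe-level counts into gene-level counts for one sample."""
    totals = defaultdict(int)
    for probe_id, count in probe_counts.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None:  # ignore probes without a gene annotation
            totals[gene] += count
    return dict(totals)


def count_matrix(samples, probe_to_gene):
    """Build a sample-by-gene matrix (rows and columns sorted by name)."""
    per_sample = {
        name: gene_counts(counts, probe_to_gene) for name, counts in samples.items()
    }
    genes = sorted({g for totals in per_sample.values() for g in totals})
    names = sorted(per_sample)
    matrix = [[per_sample[n].get(g, 0) for g in genes] for n in names]
    return names, genes, matrix


# Hypothetical probe annotation and per-sample probe counts.
probe_to_gene = {"Oct4_p1": "Oct4", "Oct4_p2": "Oct4", "Gata4_p1": "Gata4"}
samples = {
    "mESC_1": {"Oct4_p1": 3, "Oct4_p2": 2, "Gata4_p1": 0},
    "XEN_1": {"Gata4_p1": 4, "unannotated_probe": 9},
}
names, genes, matrix = count_matrix(samples, probe_to_gene)
```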
[0227] To investigate the linearity of the transcript levels detected by oligo-seq, the expression level detected for each gene, using 100 NP or 1 NP samples of OL1823 in mESCs, was correlated with bulk total RNA-Seq data from the same cell type. As shown in Figure 14, both the 1 NP and 100 NP data correlated highly with gene expression quantified from millions of cells by bulk RNA-Seq for genes above 1 TPM (Spearman's rank correlation, R=0.85). Noise typically corresponded to low and consistent levels of gene expression in bulk RNA-Seq below 1 TPM, which is considered the minimum threshold for gene expression. The specificity of oligo-seq was tested by quantifying expression from samples pretreated with RNase A before oligo-seq hybridization and further processing (Figure 13). In both 100NP and 1NP samples, expressed genes, including those in the lowest expression range (1–10 TPM), showed higher detected expression than in RNase-treated samples, regardless of expression level (one-sided t-test, P<0.001). The negligible background levels detected by oligo-seq in RNase-treated samples were unrelated to gene expression, demonstrating the specificity and sensitivity of the assay.
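For reference, a Spearman rank correlation restricted to genes above 1 TPM can be computed as in the minimal pure-Python sketch below. The gene values are made up and the function names are hypothetical; this illustrates the statistic only and does not reproduce the analysis behind Figure 14.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def expressed_pairs(bulk_tpm, oligo_counts, min_tpm=1.0):
    """Keep genes whose bulk RNA-Seq expression exceeds min_tpm."""
    genes = [g for g in bulk_tpm if bulk_tpm[g] > min_tpm and g in oligo_counts]
    return [bulk_tpm[g] for g in genes], [oligo_counts[g] for g in genes]


# Made-up values for illustration only.
bulk_tpm = {"geneA": 0.2, "geneB": 5.0, "geneC": 40.0, "geneD": 300.0}
oligo_counts = {"geneA": 7, "geneB": 3, "geneC": 20, "geneD": 150}
tpm, counts = expressed_pairs(bulk_tpm, oligo_counts)
rho = spearman(tpm, counts)
```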
[0228] To illustrate how the combination of oligo-seq and GAM (RNA-GAM) can provide cell-state-specific 3D genomic information, we investigated the 3D conformation of a genomic region containing the Sox2 locus on chromosome 3 as a function of Sox2 expression (Figure 12). By separating RNA-GAM samples with the highest Sox2 transcript detection (157 / 507) from RNA-GAM samples with low or undetectable Sox2 (269 / 507) (Figure 12b), we found that the Sox2 locus was highly compacted (with strong contacts) in cells with the lowest Sox2 probe counts and highly decondensed in cells with the highest Sox2 transcript expression (Figure 12a).
[0229] From these data, we conclude that RNA abundance and gene expression information can be recovered using oligo-seq from each sample, whether it is a single thin slice from a single cell or hundreds of such slices. Importantly, based solely on oligo-seq sequence reads, samples can be clustered by cell type of origin in an unbiased manner, as shown here for two very similar cell types from cell lines representing the first cell lineage commitment of the embryo; this may enable cell type clustering from complex heterogeneous tissues. One surprising outcome was the small number of probes required for expression detection, approximately 5 (Figure 9a, bottom panel; Figure 9b, bottom panel).
[0230] As an independent application of oligo-seq with potential for extension to droplet-based single-cell sequencing technology, we investigated oligo-seq using cells in suspension. Whole cells (mESCs, clone 46C) were collected in solution by the method described below, and oligo-seq hybridization was performed with OL66 in solution. After hybridization and stringent washing, the cells were resuspended in PBS, and approximately 1–20 mESCs were dispensed into separate PCR wells. The presence of oligo probes against cell type-specific genes was analyzed by qPCR as detailed in the Methods. As expected, mESCs showed enrichment of probes for mESC-expressed genes (Sox2 and Oct4) and the ubiquitously expressed lncRNA Malat1 compared to the initial probe library composition (probes directly from stock). Also as expected, mESCs showed only slight detectable enrichment of oligoprobes specific to genes expressed in XEN cells (Sox17 and Gata6) and no enrichment of the neuronal marker Bdnf (Figure 6b).
[0231] Flow cytometry techniques were incorporated for single-cell isolation. Oligo-seq samples produced after whole-cell hybridization were equivalent to those produced from thin frozen sections: they were hybridized, extracted, and amplified for next-generation sequencing with the same oligo library and probe structure (Figure 1), as shown by the results above. The biological results of sequencing also remained consistent.
[0232] Materials and methods Target site identification Sequences of the gene bodies and exons to be targeted by probes were compiled as genomic sequences in FASTA format. Genomic sequences can first be screened to exclude repetitive sequences (http://repeatmasker.org/). Potential target sites can be filtered for homopolymer runs, "N" bases, target site length, GC content, and favorable and consistent melting temperatures predicted by nearest-neighbor thermodynamics (SantaLucia J, Jr (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA 95:1460-1465). All candidate sites that pass these filters can be compiled into FASTQ format and then aligned to the entire genome of the target species to confirm that each candidate site is unique and does not align to multiple locations in the genome; publicly available and commonly used aligners for NGS data are Bowtie and BWA (Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Bowtie: An ultrafast memory efficient short read aligner. Genome Biol 10:R25; Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357-359; Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26:589-595). NGS aligners also provide information about the mappability of the designed probes to their target locations. The filtered and unique set of target sites can be filtered for secondary structure (Dirks RM, Pierce NA (2003) A partition function algorithm for nucleic acid secondary structure including pseudoknots. J Comput Chem 24:1664-1677), and checked with publicly available software to remove or merge duplicate target sites and to remove excess k-mers (Marcais G, Kingsford C (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-770). The target sites can be converted to their reverse complements to target either DNA strand, as required for RNA targeting. Target site sequences and genomic coordinates can be formatted into the standard BED format, facilitating display and data handling in genome browsers (Quinlan AR, Hall IM (2010) BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26:841-842). The specificity of each oligo-seq probe is ultimately validated experimentally using appropriate controls (e.g., RNase-treated samples or biologically relevant tissues), and probes found to be partially or completely nonspecific are computationally removed from data analysis. The ability to remove nonspecific binding events identified at the level of individual oligos is a clear advantage over other ISH techniques, such as FISH, and over microarray-based technologies exemplified by NanoString, where the detection method cannot distinguish between individual oligos.
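Some of the sequence-level filters listed above (target site length, "N" bases, homopolymer runs, GC content) and the reverse-complement conversion can be sketched in Python as follows. The 39-45 nt length window is the one stated for OL1823; the GC bounds and the maximum homopolymer run are hypothetical placeholder values, and the melting temperature, uniqueness, and secondary structure checks are deliberately omitted.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_homopolymer(seq, max_run=4):
    """True if any base is repeated more than max_run times in a row."""
    return re.search(r"(.)\1{%d,}" % max_run, seq.upper()) is not None


def passes_filters(seq, min_len=39, max_len=45, min_gc=0.2, max_gc=0.8, max_run=4):
    """Apply length, base composition, homopolymer, and GC filters.

    min_gc, max_gc, and max_run are illustrative placeholders, not values
    taken from the probe design described in the text.
    """
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    if not set(seq) <= set("ACGT"):  # rejects "N" and any other ambiguous base
        return False
    if has_long_homopolymer(seq, max_run):
        return False
    return min_gc <= gc_fraction(seq) <= max_gc


candidates = ["ACGT" * 10, "ACGT" * 5, "A" * 40, "ACGT" * 9 + "ACGN"]
accepted = [c for c in candidates if passes_filters(c)]
probe_sequences = [reverse_complement(c) for c in accepted]
```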
[0233] Probe synthesis ssDNA oligonucleotides were ordered from CustomArray, Inc. (Bothell, WA 98011, US). CustomArray provides libraries containing thousands of oligonucleotides that are synthesized simultaneously using CMOS semiconductor technology.
[0234] The original stock of the oligo library was PCR-amplified using primers complementary to the universal primer ends, UP1 and UP2, present in all probes within the library, and stored stably as a dsDNA library (Figure 1). The dsDNA library was converted to the ssDNA probe stock for hybridization by T7 amplification and reverse transcription, as outlined in Beliveau et al. 2017. Alternative amplification methods can be found, for example, at https://oligopaints.hms.harvard.edu/protocols.
[0235] UMIs were added to the 5' end of each oligo by PCR with the following reaction: (20 μl KAPA GC buffer (KB2501), 0.5 μl SEQ ID NO: 10 primer 100 μM stock, 0.5 μl SEQ ID NO: 4 100 μM stock, 4 μl dNTPs (KN1009) mix 10 mM, 5 μl KAPA HiFi polymerase (KE2004) 1 U / μl stock, 50 ng oligo library stock, water up to 100 μl). PCR was performed as follows: 5 minutes at 95°C; 12 cycles of (30 seconds at 95°C, 30 seconds at 58°C, 45 seconds at 72°C); 5 minutes at 72°C. Subsequently, the UMI-tagged oligo library was processed as described above to create an ssDNA probe stock.
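As a worked check of this recipe, the final concentrations in the 100 μl reaction follow from simple dilution (stock concentration × volume added / total volume), and the programmed hold time follows from the cycling steps. The Python sketch below performs this arithmetic; the helper names are hypothetical, and the hold time excludes temperature ramping.

```python
def final_concentration(stock_conc, volume_added_ul, total_volume_ul):
    """Dilution: final = stock * (volume added / total volume), in the stock's units."""
    return stock_conc * volume_added_ul / total_volume_ul


def cycling_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Total programmed hold time of a PCR program, excluding ramping."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


TOTAL_UL = 100.0

primer_um = final_concentration(100.0, 0.5, TOTAL_UL)          # each primer, in μM
dntp_mm = final_concentration(10.0, 4.0, TOTAL_UL)             # dNTP mix, in mM
polymerase_u_per_ul = final_concentration(1.0, 5.0, TOTAL_UL)  # polymerase, in U / μl
template_ng_per_ul = 50.0 / TOTAL_UL                           # oligo library, in ng / μl

# 5 min at 95°C, 12 cycles of (30 s, 30 s, 45 s), 5 min at 72°C
hold_time_s = cycling_seconds(5 * 60, [30, 30, 45], 12, 5 * 60)
```

Under these inputs each primer is at 0.5 μM, the dNTP mix at 0.4 mM, the polymerase at 0.05 U / μl (5 U per reaction), and the program holds for 1860 seconds (31 minutes).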
[0236] Cell culture and fixation The mouse ES cells (mESCs) used for oligo-seq of thin frozen sections were of the F123 line. The mESCs used for oligo-seq of whole cells in suspension were of the 46C line, a Sox1-GFP derivative of E14tg2a.
[0237] mESCs (clone F123) were cultured on a layer of mitotically inactivated mouse embryonic fibroblast (MEF) feeder cells (GSC-6201G, Global Stem). Feeder cells were grown at 37°C in feeder medium (90% DMEM (11995-065, Gibco), 10% FBS) and used up to 10 days after seeding. One day before mESC culture, the dish was coated with 0.1% gelatin, and inactivated feeder cells were seeded at a density of approximately 1500 cells / mm². After the feeders had settled (approximately 4-12 hours after seeding), mESCs were seeded onto the feeder layer and grown at 37°C in mESC-F123 medium (DMEM (11995-065, Gibco) supplemented with 15% knockout serum replacement (KSR, 10828028, Invitrogen), 1×Glutamax (35050, Gibco), 10 mM non-essential amino acids (11140-050, Gibco), 50 μM beta-mercaptoethanol (31350010, Gibco), and 1000 U / ml LIF (GFM200, Cell Guidance Systems)). Cells were passaged onto fresh feeder-coated dishes every 48 hours, and the culture medium was changed every 24 hours. Typically, after two passages, feeder cells were removed from the mESC culture by plating the cells on an uncoated dish for 30 minutes: the MEFs settled rapidly, while the mESCs remained in suspension. The cell suspension was transferred to a fresh uncoated plate for another 30 minutes to increase feeder removal efficiency, after which the cells were seeded onto a gelatin-coated dish (ESGRO Complete Gelatin; SF008, Merck). Feeder removal was repeated after 48 hours. Then, mESCs were seeded for collection. Because feeder removal reduces LIF levels during culture, the LIF concentration in the culture medium was doubled when cells were in feeder-free culture conditions. Cells were collected at 70-80% confluence after approximately 48 hours.
[0238] mESCs, clone 46C, were grown at 37°C in a 5% (v / v) CO2 incubator on gelatin-coated (0.1% (v / v)) Nunc T25 flasks in GMEM medium (Invitrogen, # 21710025) supplemented with 10% (v / v) fetal calf serum (FCS; PAA, # A15-151), 2 U / mL LIF (Millipore, # ESG1107), 0.1 mM β-mercaptoethanol (Invitrogen, # 31350-010), 2 mM L-glutamine (Invitrogen, # 25030-024), 1 mM sodium pyruvate (Invitrogen, # 11360039), 1% penicillin-streptomycin (Invitrogen # 15140122), and 1% MEM non-essential amino acids (Invitrogen, # 11140035). The medium was changed daily, and cells were passaged every other day. Before sample acquisition, mESCs were seeded in serum-free ESGRO Complete Clonal Grade medium (Millipore, # SF001-500) containing 1 U / mL LIF, on gelatin-coated (0.1% (v / v)) Nunc 10 cm dishes. Cells were grown for 48 hours, and the medium was changed every 24 hours.
[0239] Mouse extra-embryonic endoderm (XEN) cells are derived from primitive endoderm (Kunath et al., 2005, Imprinted X-inactivation in extra-embryonic endoderm cell lines from mouse blastocysts. Development 132, 1649-1661). XEN cells (clone IM8A1, Kunath et al., 2005) were cultured on a 0.1% gelatin-coated surface in RPMI supplemented with 20% FCS, 2 mM L-glutamine, 1 mM sodium pyruvate, and 0.1 mM β-mercaptoethanol in a 5% CO2 incubator at 37°C.
[0240] Liver tissue was processed as described in Moeller A et al (2012 Mol. Cell. Proteomics, 10.1074 / mcp.M111.011767-2). Briefly, liver tissue was collected after intracardiac perfusion with PBS followed by 4% freshly depolymerized paraformaldehyde (PFA) in 0.25 M HEPES-NaOH solution. The tissue was dissected into 1.5 mm sections in cold 4% PFA / HEPES solution and fixed further with 8% PFA / HEPES (2 hours). The fixed tissue was embedded in 2.1 M sucrose / PBS solution, frozen in liquid nitrogen, and stored until frozen sections were prepared for oligo-seq.
[0241] Frozen section preparation The cells were prepared for freezing and sectioning as previously described (Beagrie et al. 2017). Briefly, the cells were fixed in 4% and 8% paraformaldehyde in 250 mM HEPES-NaOH (pH 7.6; 10 minutes and 2 hours, respectively), pelleted, embedded in 2.1 M sucrose in PBS (2 hours), and frozen in liquid nitrogen on copper stubs. Frozen cells can be stored indefinitely in liquid nitrogen. LMD 4 μm PEN membranes on metal-framed slides (Leica) were prepared by drawing small rectangles of approximately 1 × 2 cm with a hydrophobic pen. Before cryosectioning, the glass knife and the LMD PEN membrane to be used were UV treated (45 minutes, λ=385 nm). Ultrathin cryosections approximately 220 nm thick were cut using a glass knife on an UltraCutUCT 52 ultracryomicrotome (Leica, Milton Keynes, UK). Sections were captured in a droplet of PBS-free 2.1 M sucrose held in a copper loop and transferred to LMD 4 μm PEN coated glass slides for laser microdissection (Leica, Milton Keynes, UK) or to coverslips for RNA-FISH.
[0242] Oligoprobe hybridization of cell / tissue frozen sections on LMD PEN membrane The hybridization oven used for incubation was thoroughly cleaned, blotting paper moistened with sterile water was placed inside, and the oven was heated to 37°C (temperatures of 30°C to 60°C can be used, ideally 37°C; higher temperatures increase specificity). The entire process was carried out under RNase-free conditions, and all reagents were molecular biology grade.
[0243] Frozen sections on LMD slides (for oligo-seq) or coverslips (for RNA-FISH) were washed with 0.2 μm filtered molecular biology grade PBS (three times, 5 minutes each), and sucrose solution was rinsed off the frozen sections. The frozen sections were permeabilized with 0.5% (V / V) Triton X-100 in PBS (10 minutes), followed by washing with PBS (three times, 5 minutes each).
[0244] For negative control samples, frozen sections were treated with RNase after Triton X-100 permeabilization. After the second PBS wash from the previous step, the frozen sections were rinsed with 2×SSC and incubated in a humidified chamber with 250 μg / mL RNase A in 2×SSC (2 hours, 37°C). Untreated samples were kept in 2×SSC at room temperature (approximately 20°C) or 4°C.
[0245] The primary probe hybridization mixture (2×SSC, 30% formamide, 2 mM vanadyl ribonucleoside complex, 1 mg / mL yeast tRNA, 10% dextran sulfate, 0.1 μM primary probe) was denatured at 78°C (3 minutes), cooled on ice, and kept at 4°C for up to 4 hours until permeabilization and RNase treatment were complete.
[0246] Prior to hybridization with the probes, the frozen sections were rinsed twice with wash buffer (30% formamide, 2×SSC, 2 mM vanadyl ribonucleoside complex), applied only to the side of the PEN membrane carrying the frozen sections, and incubated with wash buffer for 5 minutes at room temperature. The wash buffer was carefully removed, and any excess buffer was removed from the edges of the LMD slide using UV-treated filter paper. The primary probe hybridization mixture (40 μL) was placed on the LMD slide at the position of the frozen sections. An RNase-free Hybrislip (Invitrogen) was carefully overlaid on the primary probe hybridization mixture and then sealed with rubber cement. To aid removal of the Hybrislip (Molecular Probes) after hybridization, a generous amount of rubber cement was applied and extended to the metal edge of the slide. The samples were incubated in a hybridization oven in a humidified chamber at 37°C for OL66 or 47°C for OL1823 (temperatures of 30°C–60°C can be used, ideally 37°C) for 36 hours (or 4–48 hours).
[0247] After hybridization, the rubber cement layer on the LMD slide was loosened by lightly covering it with wash buffer and carefully removed along with the Hybrislip. Frozen sections on the PEN membrane were immediately rinsed with wash buffer to avoid drying. Stringent washes to remove excess primary probe were performed for OL66 at 47°C with wash buffer (3 times, 20 minutes each) and for OL1823 at room temperature with 2×SSC (4 times, 5 minutes each). Residual wash buffer or 2×SSC was removed by washing with PBS (3 times, 5 minutes each). The LMD slide was rinsed once with water and incubated in a 1% aqueous cresyl violet solution (20 minutes). Excess cresyl violet was removed by washing the front and back of the LMD slide with water (3 times each); the slide was then dried and taken immediately to LMD (storage of hybridized samples before laser microdissection may be possible, for example, in PBS at 4°C, but storage has not been tested).
[0248] Isolation of nuclear profiles Individual NPs were isolated from frozen sections by laser microdissection using a Leica laser microdissection microscope (Leica Microsystems, LMD7000) with a 63× dry objective. Slices from individual cells were identified under bright-field imaging, and the slide membrane around each cell was cut with the laser. The cut cell slices were collected in PCR adhesive caps (AdhesiveStrip 8C opaque; Carl Zeiss Microscopy #415190-9161-000). In each collection plate, two empty membrane regions equal in area to 100 NPs or 1 NP were collected as negative controls. These negative controls were also used for the production of sequencing libraries for quality control purposes. Laser microdissection samples on the AdhesiveStrip caps were stored at -20 °C until further use.
[0249] WGA
GAT-7N: GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNNN (SEQ ID NO: 1)
GAT-COM: GTGAGTGATGGTTGAGGTAGTGTGGAG (SEQ ID NO: 2)
OL66 probe primer forward: GGCACAACGTTGCAGCACAG (SEQ ID NO: 3)
OL66/OL1823 probe primer reverse: CACCAACGCTACCAGCTCCG (SEQ ID NO: 4)
OL66 probe primer A MeRev forward: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGCACAACGTTGCAGCACAG (SEQ ID NO: 5)
OL66/OL1823 probe primer B MeRev reverse: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACCAACGCTACCAGCTCCG (SEQ ID NO: 6)
OL1823 probe primer forward: GACCAGCCCACATCGCACTG (SEQ ID NO: 7)
OL1823 probe primer A MeRev forward: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACCAGCCCACATCGCACTG (SEQ ID NO: 8)
PNA sequence: GTGAGTGATGGTTGAGGTAGTGTGGAG (SEQ ID NO: 9)
OL1823 with UMI attached: GACCAGCCCACATCGCACTGNNNNNNNNGGCACAACGTTGCAGCACAG (SEQ ID NO: 10)
Lysis mix: 21 mM Tris-HCl pH 8.0, 1.4 nM EDTA pH 8, guanidinium-HCl pH 8.5, 3.5% Tween 20, 0.35% Triton X-100, 123 μg of Qiagen protease (19157); total volume 10 μL
2× DeepVent buffer: 2× ThermoPol Reaction Buffer (B9004s), 4 mM MgSO4 (B1003s), 400 μM dNTPs (N0447L)
PCR mix 1 (OL66): 1.333× DeepVent buffer, 0.666 μM GAT-7N, 0.0066 μM OL66 probe primer forward (SEQ ID NO: 3), 0.0066 μM OL66/OL1823 probe primer reverse (SEQ ID NO: 4), 80 U/mL DeepVent polymerase
PCR mix 1 (OL1823): 1.333× DeepVent buffer, 0.666 μM GAT-7N, 0.0066 μM OL1823 probe primer forward (SEQ ID NO: 7), 0.0066 μM OL66/OL1823 probe primer reverse (SEQ ID NO: 4), 80 U/mL DeepVent polymerase
PCR mix 2 (OL66): 1× DeepVent buffer, 1.2 μM GAT-COM, 0.12 μM OL66 probe primer forward (SEQ ID NO: 3), 0.12 μM OL66/OL1823 probe primer reverse (SEQ ID NO: 4), 60 U/mL DeepVent polymerase, 0.4 mM dNTPs
PCR mix 2 (OL1823): 1× DeepVent buffer, 1.2 μM GAT-COM, 0.12 μM OL1823 probe primer forward (SEQ ID NO: 7), 0.12 μM OL66/OL1823 probe primer reverse (SEQ ID NO: 4), 60 U/mL DeepVent polymerase, 0.4 mM dNTPs
Probe PCR mix (OL66): 1× DeepVent buffer, 1 μM OL66 probe primer A MeRev forward (SEQ ID NO: 5), 1 μM OL66/OL1823 probe primer B MeRev reverse (SEQ ID NO: 6), 30 U/mL DeepVent polymerase
Probe PCR mix (OL1823): 1× DeepVent buffer, 1 μM OL1823 probe primer A MeRev forward (SEQ ID NO: 8), 1 μM OL66/OL1823 probe primer B MeRev reverse (SEQ ID NO: 6), 30 U/mL DeepVent polymerase
Genome PCR mix: 1× DeepVent buffer, 1 μM GAT-COM, 30 U/mL DeepVent polymerase
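The relationship between the primers listed in [0249] can be checked mechanically. The following is a minimal sketch, not part of the disclosed method: it copies SEQ ID NOs: 3 to 8 and 10 from the listing and tests whether each "MeRev" primer is the corresponding short probe primer with an extra 5′ tail, and whether SEQ ID NO: 10 has the layout forward primer, eight N positions, second primer site. The helper name `tail_of` is hypothetical.

```python
# Sequences copied from the listing in [0249] (SEQ ID NOs: 3-8 and 10).
SEQ = {
    3: "GGCACAACGTTGCAGCACAG",
    4: "CACCAACGCTACCAGCTCCG",
    5: "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGCACAACGTTGCAGCACAG",
    6: "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCACCAACGCTACCAGCTCCG",
    7: "GACCAGCCCACATCGCACTG",
    8: "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGACCAGCCCACATCGCACTG",
    10: "GACCAGCCCACATCGCACTGNNNNNNNNGGCACAACGTTGCAGCACAG",
}


def tail_of(tailed_id, core_id):
    """Return the 5' tail of a tailed primer, or None if it does not end with the core primer."""
    tailed, core = SEQ[tailed_id], SEQ[core_id]
    return tailed[: -len(core)] if tailed.endswith(core) else None


forward_tail_ol66 = tail_of(5, 3)    # OL66 MeRev forward vs. OL66 forward
forward_tail_ol1823 = tail_of(8, 7)  # OL1823 MeRev forward vs. OL1823 forward
reverse_tail = tail_of(6, 4)         # shared MeRev reverse vs. shared reverse

# SEQ ID NO: 10 should read: SEQ ID NO: 7, then 8 N positions, then SEQ ID NO: 3.
umi_layout_ok = SEQ[10] == SEQ[7] + "N" * 8 + SEQ[3]

print("forward tails identical:", forward_tail_ol66 == forward_tail_ol1823)
print("forward tail:", forward_tail_ol66)
print("reverse tail:", reverse_tail)
print("UMI layout as expected:", umi_layout_ok)
```

Under these assumptions both forward MeRev primers carry the same 33-nt tail, and the 20 + 8 + 20 layout of SEQ ID NO: 10 is the one used when the UMI is extracted from the OL1823 reads.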
[0250] Lysis mix (10 μL) was added to each well of a 96-well plate. The 96-well plate was closed with the LMD caps carrying the laser-microdissected samples and tightly sealed. The 96-well plate was inverted, and the lysis mix was collected in the caps by gently centrifuging the inverted plate at 200 × g for 2 minutes. The samples were digested by overnight incubation at 60°C.
[0251] The following day, the buffer solution in the 96-well plate was collected at the bottom of the wells by centrifugation at 200 x g for 2 minutes. The protease was heat-inactivated using a PCR machine at 75°C for 30 minutes.
[0252] For linear amplification of genomic DNA and simultaneous probe amplification, 30 μL of PCR mix 1 was added to each well and incubated as follows: 95°C for 5 minutes, 11× (60°C for 50 seconds, 20°C for 50 seconds, 30°C for 50 seconds, 40°C for 45 seconds, 50°C for 45 seconds, 65°C for 7 minutes, 95°C for 20 seconds), 72°C for 5 minutes. Generally, in the method of the present invention, the primer concentrations in this step may be very low, for example, less than 0.001 μM each for the forward and reverse primers.
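The working concentrations that result from adding 30 μL of PCR mix 1 to the 10 μL of lysate can be worked out with a simple dilution calculation. This is an illustrative sketch only; it assumes that the well contains exactly 10 μL before the addition and that the lysate contributes none of the listed components. The function name is hypothetical.

```python
def final_concentration(stock_concentration, added_volume_ul, existing_volume_ul):
    """Concentration after adding a stock to an existing volume (C1*V1 = C2*V2)."""
    total_volume_ul = added_volume_ul + existing_volume_ul
    return stock_concentration * added_volume_ul / total_volume_ul


# Component concentrations of PCR mix 1 as listed in [0249].
pcr_mix_1 = {
    "DeepVent buffer (x)": 1.333,
    "GAT-7N (uM)": 0.666,
    "probe primer, each (uM)": 0.0066,
}

# Assumption: 30 uL of mix is added to 10 uL of heat-inactivated lysate.
final = {
    name: final_concentration(conc, added_volume_ul=30, existing_volume_ul=10)
    for name, conc in pcr_mix_1.items()
}

for name, conc in final.items():
    print(f"{name}: {conc:.5g}")
```

With these inputs the buffer comes out at approximately 1×, which is presumably why the mix is formulated at 1.333×.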
[0253] For simultaneous amplification of genomic DNA and probes, 20 μL of PCR mix 2 was added to each well. This was incubated as follows: [95°C for 3 minutes, 3x (95°C for 20 seconds, 58°C for 30 seconds, 72°C for 3 minutes), 72°C for 3 minutes].
[0254] Excess primers were removed by adding 5 μL of exonuclease solution [1× exonuclease buffer (M0293L), 10U exonuclease 1 (M0293L)] to each well, incubating at 37°C for 40 minutes, and then thermally inactivating the exonuclease at 72°C for 20 minutes.
[0255] For PCR amplification of the probe, 30 μL of the reaction mix from each well was transferred to a new 96-well plate, 20 μL of the probe PCR mix was added to each well of the new 96-well plate, and the plate was incubated as follows: 95°C for 3 minutes, 18× (95°C for 20 seconds, 60°C for 30 seconds, 72°C for 1 minute), 72°C for 3 minutes. In parallel, for PCR amplification of genomic DNA, 20 μL of the genome PCR mix was added to each well of the original plate containing approximately 30 μL of the original reaction mix, and the plate was incubated as follows: 95°C for 3 minutes, 24× (95°C for 20 seconds, 58°C for 30 seconds, 72°C for 3 minutes), 72°C for 3 minutes. The plate was kept at -20°C until further use.
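For planning purposes, the thermocycler programs of [0252], [0253] and [0255] can be written down as data and their minimum run times estimated. The sketch below is an illustration under the assumption that only the programmed hold times count; ramping between temperatures is ignored, so real run times will be longer. The program labels and the helper name are hypothetical.

```python
def hold_time_seconds(initial_s, cycles, cycle_steps_s, final_s):
    """Sum of programmed hold times: initial step + cycles * (steps per cycle) + final step."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


# Hold times transcribed from the cycling conditions in the text (seconds).
programs = {
    "PCR mix 1": hold_time_seconds(5 * 60, 11, [50, 50, 50, 45, 45, 7 * 60, 20], 5 * 60),
    "PCR mix 2": hold_time_seconds(3 * 60, 3, [20, 30, 3 * 60], 3 * 60),
    "probe PCR": hold_time_seconds(3 * 60, 18, [20, 30, 60], 3 * 60),
    "genome PCR": hold_time_seconds(3 * 60, 24, [20, 30, 3 * 60], 3 * 60),
}

for name, seconds in programs.items():
    print(f"{name}: {seconds} s (about {seconds / 60:.0f} min, excluding ramping)")
```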
[0256] Sample preparation up to sequencing: Genomic DNA plates and probe DNA plates were purified with 1.7× and 1.8× SPRI beads and eluted with 20 μL of ultrapure water (Sigma). DNA concentration was quantified using the Quant-iT Picogreen dsDNA quantitative assay.
[0257] The genomic DNA concentration of each sample was normalized to 1 ng/μL. Genomic DNA was indexed and converted into sequencing libraries using an in-house Tn5 library preparation protocol directly compatible with the Illumina system (Winick-Ng, W., et al., 2020, Cell-type specialization in the brain is encoded by specific long-range chromatin topologies, bioRxiv. https://doi.org/10.1101/2020.04.02.02099). Briefly, the transposition (tagmentation) reaction was carried out at 55°C for 7 minutes in a total reaction volume of 5 μL (250 mM TAPS-HCl pH 8.5, 10% PEG, 250 μM MgCl2, 1 ng genomic DNA). Tn5 was then heat-inactivated by further incubation at 70°C for 5 minutes. Genomic DNA in 96-well plates was indexed with primers corresponding to the Set A i5 and i7 Illumina barcodes from the Nextera XT system. PCR mix (5 μL) was added directly to the transposed DNA; the PCR mix consisted of 2.5× KAPA GC buffer (KB2501), 0.9 mM dNTPs (KN1009), 0.06 U/μL KAPA HiFi polymerase (KE2004), 2.5 μM i5 primer, and 2.5 μM i7 primer. The PCR was run as follows: 95°C for 30 seconds, 12× (95°C for 10 seconds, 55°C for 30 seconds, 72°C for 30 seconds), 72°C for 5 minutes.
[0258] DNA amplified from the oligo probes in the mirror 96-well PCR plate was indexed directly by PCR with Illumina Set B i5 and i7 indices, using the KAPA Biosystems HiFi kit. The PCR mix (10 μL) consisted of 1× KAPA GC buffer (KB2501), 0.45 mM dNTPs (KN1009), 0.03 U/μL KAPA HiFi polymerase (KE2004), 1.25 μM i5 primer, 1.25 μM i7 primer, and 1 μL of probe DNA sample. If desired, PNA complementary to the GAT-COM sequence (SEQ ID NO: 9) was added to the final PCR solution at a concentration of 20 μM to reduce GAT-COM-containing DNA fragments in the final library; this was done for the OL1823 library. The PCR was performed as follows: 95°C for 30 seconds, 9× (95°C for 20 seconds, 60°C for 30 seconds, 72°C for 2 minutes), 72°C for 5 minutes.
[0259] Indexed genomic DNA plates and probe DNA plates were purified with 1.7×SPRI beads and eluted with 20 μL of ultrapure water (Sigma). DNA concentration was quantified using the Quant-iT Picogreen dsDNA quantitative assay.
[0260] Genomic DNA library samples were combined at equimolar concentrations (96 libraries in total). Separately, probe samples were also combined at equimolar concentrations (96 libraries in total). Each pooled sequencing library was purified twice using 1.6×SPRI beads and eluted in 50 μL of ultrapure water. The concentration of the combined pools was determined using the Qubit high-sensitivity dsDNA assay. The average fragment size was determined using a Bioanalyzer. Next, the combined genomic and probe libraries were pooled together at equimolar concentrations before sequencing. The libraries were sequenced on an Illumina NextSeq500 sequencer in a single high-throughput sequencing run (75 bp single-end).
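Equimolar pooling requires converting each measured mass concentration and average fragment size into a molar concentration. The sketch below illustrates one common way of doing this, assuming an average mass of 660 g/mol per base pair of double-stranded DNA. The library names, concentrations, fragment sizes and the amount pooled per library are hypothetical and do not come from the disclosure.

```python
def molarity_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration (ng/uL) and mean fragment length (bp) to nM."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contains `fmol_each` femtomoles.

    `libraries` maps a name to (concentration in ng/uL, mean fragment size in bp).
    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {
        name: fmol_each / molarity_nM(conc, size_bp)
        for name, (conc, size_bp) in libraries.items()
    }


# Hypothetical example values, for illustration only.
example_libraries = {
    "genomic_pool": (6.6, 500),
    "probe_pool": (3.3, 250),
}

volumes = equimolar_volumes(example_libraries, fmol_each=10.0)
for name, volume in volumes.items():
    conc, size_bp = example_libraries[name]
    print(f"{name}: {molarity_nM(conc, size_bp):.1f} nM, take {volume:.2f} uL")
```

Both example libraries happen to come out at the same molarity, which shows why a mass concentration alone is not enough to pool libraries with different fragment sizes.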
[0261] Probe and genomic DNA mapping: Sequencing reads from each genomic DNA library were mapped to the mouse genome assembly GRCm38 (Dec. 2011, mm10) using Bowtie2 with default settings. Reads that were not uniquely mapped, reads with mapping quality <20, and PCR duplicates were excluded from further analysis.
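The read filtering described in [0261] can be sketched on plain SAM text. This is a minimal illustration, not the pipeline that was actually used: it assumes that PCR duplicates have already been flagged by a separate duplicate-marking tool (SAM flag 0x400), and it treats unmapped and secondary alignments plus a mapping quality below 20 as the exclusion criteria. The function names and the example records are hypothetical.

```python
UNMAPPED = 0x4      # SAM flag: read is unmapped
SECONDARY = 0x100   # SAM flag: secondary alignment
DUPLICATE = 0x400   # SAM flag: PCR or optical duplicate (set by a duplicate-marking tool)


def keep_alignment(sam_line, min_mapq=20):
    """Return True if a SAM alignment line passes the filters."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & (UNMAPPED | SECONDARY | DUPLICATE):
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=20):
    """Yield header lines unchanged and alignment lines that pass the filters."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq):
            yield line


# Hypothetical SAM records: name, flag, reference, position, MAPQ, CIGAR, ...
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t1024\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII",
    "r5\t16\tchr2\t300\t20\t4M\t*\t0\t0\tACGT\tIIII",
]

kept_names = [l.split("\t")[0] for l in filter_sam(example) if not l.startswith("@")]
print(kept_names)
```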
[0262] Sequencing reads from each probe DNA library were mapped to the known probe sequences using Bowtie2 with default settings. Reads that were not uniquely mapped and reads with mapping quality <20 were excluded from further analysis. The count for each probe was defined for each gene as the number of reads within the first 75 bp of each read. The Bedtools ‘map’ function was used for the count of each probe at each position.
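Counting reads per probe and summing them per gene can be illustrated in a few lines. The sketch below is a simplified stand-in for the Bedtools-based counting mentioned in [0262], not a reproduction of it; the probe identifiers, the probe-to-gene table and the input format (one probe identifier and mapping quality per read) are assumptions made for the example.

```python
from collections import Counter


def count_probes(alignments, probe_to_gene, min_mapq=20):
    """Count reads per probe and per gene.

    `alignments` is an iterable of (probe_id, mapping_quality) pairs, one per read.
    Reads below `min_mapq` are ignored.
    """
    probe_counts = Counter(
        probe for probe, mapq in alignments if mapq >= min_mapq
    )
    gene_counts = Counter()
    for probe, n_reads in probe_counts.items():
        gene_counts[probe_to_gene[probe]] += n_reads
    return probe_counts, gene_counts


# Hypothetical probe-to-gene table and reads.
probe_to_gene = {"probe_001": "GeneA", "probe_002": "GeneA", "probe_003": "GeneB"}
reads = [
    ("probe_001", 42), ("probe_001", 42), ("probe_002", 30),
    ("probe_003", 10),  # below the mapping-quality threshold, not counted
    ("probe_003", 25),
]

probe_counts, gene_counts = count_probes(reads, probe_to_gene)
print(dict(probe_counts))
print(dict(gene_counts))
```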
[0263] For UMI deduplication of the OL1823 library, UMI-tools was used with default settings. The UMI was first extracted from each sequencing read (umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCCCCCNNNNNNNNCCCCCCCCCCCCCCCCCCCC; SEQ ID NO: 11), and the reads in each BAM file were then deduplicated by UMI with the UMI-tools dedup function.
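The effect of the barcode pattern can be illustrated with a short sketch. It assumes the layout of SEQ ID NO: 10, that is, 20 fixed positions, 8 UMI positions (N) and 20 further fixed positions, and it collapses reads by exact UMI match per probe. This is a deliberate simplification: UMI-tools itself also accounts for mapping position and, by default, for sequencing errors in the UMI. The function names and the example reads are hypothetical.

```python
from collections import defaultdict

# Assumed layout, matching SEQ ID NO: 10: 20 fixed bases, 8 UMI bases, 20 fixed bases.
PATTERN = "C" * 20 + "N" * 8 + "C" * 20


def extract_umi(read, pattern=PATTERN):
    """Split a read into (UMI, remaining bases) according to the pattern.

    Bases at 'N' positions form the UMI; all other bases are kept in order.
    Returns None if the read is shorter than the pattern.
    """
    if len(read) < len(pattern):
        return None
    umi = "".join(base for base, p in zip(read, pattern) if p == "N")
    rest = "".join(base for base, p in zip(read, pattern) if p != "N")
    return umi, rest + read[len(pattern):]


def count_unique_umis(records):
    """Number of distinct UMIs per probe from (probe_id, umi) pairs (exact matching only)."""
    seen = defaultdict(set)
    for probe, umi in records:
        seen[probe].add(umi)
    return {probe: len(umis) for probe, umis in seen.items()}


# Hypothetical read built from SEQ ID NO: 7, a UMI, SEQ ID NO: 3 and four extra bases.
example_read = "GACCAGCCCACATCGCACTG" + "ACGTACGT" + "GGCACAACGTTGCAGCACAG" + "TTTT"
umi, remainder = extract_umi(example_read)

unique = count_unique_umis([("p1", "ACGTACGT"), ("p1", "ACGTACGT"), ("p1", "TTTTAAAA"), ("p2", "ACGTACGT")])
print(umi, unique)
```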
[0264] PCR-based detection of ssDNA: Frozen sections of nuclei were stored at -20 °C on 4 mm AdhesiveStrip 8C opaque ZEISS LMD caps until further use.
[0265] 10 μL of lysis mix was added to each sample well of a 96-well plate. The plate was closed with the LMD caps carrying the frozen sections and sealed tightly. The 96-well plate was inverted, and the lysis mix was collected in the caps by gently centrifuging the inverted plate at 200×g for 2 minutes. The samples were incubated overnight at 60 °C.
[0266] The following day, the buffer solution in the 96-well plate was collected at the bottom of the wells by centrifugation at 200 x g for 2 minutes. The protease was heat-inactivated using a PCR machine at 75°C for 30 minutes.
[0267] 30 μL of PCR mix (20 μL 2× DeepVent buffer, 0.16 μL forward primer, 0.16 μL reverse primer, 1.8 μL 100 U/mL DeepVent polymerase) was added to each well. This was amplified as follows: 95°C for 3 minutes, 30× (95°C for 20 seconds, 58°C for 30 seconds, 72°C for 1 minute), 72°C for 3 minutes. The PCR product was purified with 1.8× SPRI beads and the concentration was determined by Qubit hsDNA assay. All samples were normalized to 0.2 ng/μL. The qPCR reaction was mixed as follows: 2.5 μL normalized sample, 12.5 μL 2× Sybr-Green PCR master mix, 0.75 μL 10 μM primer stock (including both forward and reverse primers), with the reaction volume adjusted to 25 μL with molecular biology grade water. qPCR was performed as follows: 95°C for 5 minutes, 40× (95°C for 30 seconds, 60°C for 15 seconds, 72°C for 30 seconds).
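The normalization to 0.2 ng/μL and the composition of the 25 μL qPCR reaction involve only simple dilution arithmetic, sketched below. The final volume of the normalized sample (20 μL) and the example measured concentration are assumptions made for illustration; the reaction volumes are those given in the text. The function names are hypothetical.

```python
def normalization_volumes(measured_ng_per_ul, target_ng_per_ul=0.2, final_volume_ul=20.0):
    """Return (sample volume, water volume) in uL to reach the target concentration.

    final_volume_ul is an assumed working volume, not a value from the protocol.
    """
    if measured_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is already more dilute than the target concentration")
    sample_ul = target_ng_per_ul * final_volume_ul / measured_ng_per_ul
    return sample_ul, final_volume_ul - sample_ul


def qpcr_water_volume(total_ul=25.0, sample_ul=2.5, master_mix_ul=12.5, primer_ul=0.75):
    """Water needed to bring the qPCR reaction to its total volume."""
    return total_ul - sample_ul - master_mix_ul - primer_ul


sample_ul, water_ul = normalization_volumes(measured_ng_per_ul=2.0)
qpcr_water_ul = qpcr_water_volume()
template_ng = 2.5 * 0.2                   # mass of template per reaction
primer_final_uM = 0.75 * 10.0 / 25.0      # final concentration of the 10 uM primer stock

print(sample_ul, water_ul, qpcr_water_ul, template_ng, primer_final_uM)
```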
[0268] Fluorescence-based detection of ssDNA as a validation and optimization method: For RNA-FISH using oligo-seq probes, the same stock and hybridization conditions for the primary probe were used. Frozen sections were hybridized with bridges and secondary probes to visualize the hybridized oligo-seq probes by fluorescence microscopy (Figure 4b).
Primary: [Universal Primer 1] [Barcode 1] [Target Homology] [Barcode 2] [Universal Primer 2]
Bridge: [Barcode homology] [TT hinge] [Secondary homology]
Secondary: [AlexaFluor647] [AA] [Bridge homology] [AA] [AlexaFluor647]
[0269] Frozen sections (220 nm thick) from mESC-F123 frozen samples were transferred to coverslips (1 cm diameter), washed with PBS (3 times, 5 minutes each), permeabilized with 0.1% (v/v) Triton X-100 in PBS (10 minutes), and washed with PBS (3 times, 5 minutes each). The samples were rinsed with 2×SSC and stored in 2×SSC at 4°C for 2 hours or incubated with 250 μg/mL RNase A in 2×SSC in a humidified chamber (2 hours, 37°C).
[0270] The primary probe hybridization mixture (2×SSC, 30% formamide, 2 mM vanadyl ribonucleoside complex, 1 mg/mL yeast tRNA, 10% dextran sulfate, 0.1 μM primary probe) was denatured at 78°C (3 minutes), cooled on ice, and maintained at 4°C for up to 4 hours until the frozen sections were ready.
[0271] Prior to hybridization with the probe, frozen sections were rinsed twice with wash buffer (30% formamide, 2×SSC, 2 mM vanadyl ribonucleoside complex) and incubated in wash buffer for 5 minutes at room temperature. The wash buffer was carefully removed, and any excess buffer was removed from the edges of the coverslip using UV-treated filter paper. The primary probe hybridization mixture (8 μL) was placed on RNAse-Free Hybrislip (Invitrogen), the coverslip was carefully overlaid on the primary probe hybridization mixture, and then sealed with rubber cement. The samples were incubated in a humidified chamber in a hybridization oven at 37°C (or 30°C–60°C, ideally 37°C) for 36 hours (overnight) or 4–48 hours.
[0272] The rubber cement was loosened by lightly covering it with wash buffer. The rubber cement was then carefully removed. The frozen sections were immediately rinsed with wash buffer to prevent drying. A stringent wash to remove excess primary probe was performed with wash buffer at 47°C (or 37°C–65°C, ideally 47°C; higher temperatures increase specificity) (three times, 20 minutes each).
[0273] The wash buffer was removed by washing with 2×SSC (3 times). Excess buffer was carefully removed with UV-treated filter paper. 8 μL of secondary hybridization buffer (30% formamide, 2×SSC, 1.6 μM secondary oligonucleotide, 1.4 μM bridge oligonucleotide) was placed on Parafilm protected from UV light, and the side of the coverslip carrying the frozen section was incubated on it. A stringent wash to remove excess secondary oligonucleotide was performed at room temperature with 2×SSC and 40% formamide (3 times, 5 minutes each) with gentle shaking. The coverslip was then washed with 2×SSC for 5 minutes. Finally, the coverslip was washed with 1×PBS (3 times, 5 minutes each) and either stored at 4°C or mounted in DAPI-Vectashield for counterstaining and imaged on a Leica confocal microscope. Images from frozen sections were acquired using a confocal laser scanning microscope (Leica TCS SP8; 63× objective, NA 1.4) equipped with a 405 nm diode and a white light laser, with the pinhole set to 1 Airy unit. Images from the different channels were collected sequentially to avoid fluorescence bleed-through.
[0274] Detailed information on oligo-seq in solution: mESCs (clone 46C) were seeded in Nunc T75 flasks (4.5 × 10⁶ cells) and grown in mESC-46C medium + LIF for 48 hours. The cells were trypsinized with 0.05% trypsin in PBS (2.5 mL in each of two Nunc T75 flasks) for 2 minutes at 37°C. Trypsinization was stopped with 10 mL of LIF-free mESC-46C culture medium. The contents of the two flasks were then pooled into a 50 mL Erlenmeyer flask (25 mL single-cell suspension) and centrifuged at 280 × g for 3 minutes. The cells were counted using a Scepter™ 2.0 handheld automated cell counter (1:10 dilution in PBS). The supernatant was removed, and the cells were resuspended in mESC-46C culture medium + LIF at a concentration of 5 × 10⁶ cells/mL. Aliquots of 5 × 10⁶ cells were distributed into individual 1.5 mL tubes.
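The resuspension volume needed to reach 5 × 10⁶ cells/mL follows directly from the cell count. The sketch below assumes that the counter reports cells/mL for the 1:10 diluted sample and that the count refers to the 25 mL pooled suspension; the example reading is hypothetical, as is the function name.

```python
def resuspension_volume_ml(counter_reading_cells_per_ml, dilution_factor,
                           suspension_volume_ml, target_cells_per_ml=5e6):
    """Volume (mL) in which to resuspend the pellet to reach the target density.

    counter_reading_cells_per_ml is the reading for the diluted sample;
    multiplying by dilution_factor gives the density of the original suspension.
    """
    density = counter_reading_cells_per_ml * dilution_factor
    total_cells = density * suspension_volume_ml
    return total_cells / target_cells_per_ml


# Hypothetical reading of 1.0e5 cells/mL for the 1:10 dilution of a 25 mL suspension.
volume_ml = resuspension_volume_ml(1.0e5, dilution_factor=10, suspension_volume_ml=25.0)
n_aliquots = int(volume_ml)  # each 1 mL then holds 5e6 cells, i.e. one tube per mL

print(f"resuspend in {volume_ml:.1f} mL, giving {n_aliquots} aliquots of 5e6 cells")
```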
[0275] Trypsin-dissociated mESCs were pelleted at 900×g for 10 minutes at 4°C. The cells were then washed with 1×PBS and resuspended in 500 μL of 1×PBS. The cells were fixed in 1 mL of 4% electron microscopy-grade formaldehyde in PBS, obtained by adding 500 μL of 8% formaldehyde in 1×PBS, for 20 minutes at room temperature on a rotating wheel. The cells were pelleted at 300×g for 6 minutes at 4°C, rinsed, and washed with 1×PBS for 5 minutes. The cells were then pelleted again at 300×g for 6 minutes at 4°C. The cells were resuspended and permeabilized with 200 μL of 0.5% Triton X-100 in PBS for 10 minutes on ice with gentle shaking. 700 μL of PBS was added, and the cells were pelleted at 300×g for 6 minutes at 4°C. During this process, the primary probe was denatured at 78°C for 3 minutes and then transferred to an ice block for rapid cooling to prevent renaturation.
[0276] The cells were resuspended and incubated in 1× Wash Buffer (30% formamide, 2× SSC, 2 mM vanadyl ribonucleoside complex) with gentle shaking for 5 minutes, then centrifuged at 300×g for 6 minutes at 4°C. The total buffer was carefully removed, and the cells were gently resuspended in 100 μL of primary probe hybridization mixture (30% formamide, 2× SSC, 2 mM vanadyl ribonucleoside complex, 0.1 μM primary probe, 1 mg / ml yeast tRNA, 10% dextran sulfate) and incubated overnight in a humidified chamber with gentle rotation at 37°C.
[0277] 900 μL of 1×EWB was added directly to the 1.5 mL tube, and the cells were pelleted at 500 × g for 6 minutes. The supernatant was removed, and the cells were resuspended in 500 μL of 1×EWB and washed at 47°C for 20 minutes. The cells were pelleted at 2000 × g for 3 minutes, the supernatant was removed, and the cells were again resuspended in 500 μL of 1×EWB and washed at 47°C for 20 minutes. The cells were pelleted at 2000 × g for 3 minutes, rinsed, and washed with 1×PBS (two washes, 5 minutes each). The cells were resuspended in 50 μL of 1×PBS. Aliquots of approximately 1-20 cells in 5 μL of PBS were distributed into separate wells of a 96-well plate and stored at -20°C until further use.
[0278] 17 μL of lysis buffer (30 mM Tris-HCl pH 8.0, 2 mM EDTA pH 8.0, 0.8 M guanidinium-HCl pH 8.5, 5% Tween-20, 0.5% Triton X-100) and 3 μL of Qiagen protease (Cat 19157) were added to each sample well. The samples were incubated overnight at 60°C with orbital shaking. The protease was heat-inactivated at 75°C for 30 minutes the following day. 40 μL of PCR mix (2× DeepVent solution, 0.24 μL of 100 μM forward primer, 0.24 μL of 100 μM reverse primer, 1.8 μL of DeepVent polymerase, and 8 μL of ultrapure water) was added to each well. PCR was performed as follows: 95°C for 3 minutes, 30× (95°C for 20 seconds, 58°C for 30 seconds, 72°C for 1 minute), 72°C for 3 minutes.
[0279] The PCR product was purified with 1.8× SPRI beads, and the concentration was determined with a Qubit hsDNA assay. All samples were normalized to 0.2 ng/μL. The qPCR mix was prepared as follows: 2.5 μL normalized sample (0.5 ng total), 12.5 μL Sybr-Green, 0.75 μL 10 μM primer stock (including both forward and reverse primers for the gene of interest), and PCR-clean water to 25 μL. qPCR was performed as follows: 95°C for 5 minutes, 40× (95°C for 30 seconds, 60°C for 15 seconds, 72°C for 30 seconds, followed by a plate read for Sybr-Green).
[0280] RNA-Seq data: Total RNA-Seq data from mESC-F123 were produced as described in Kempfer (2020, Chromatin folding in health and disease: exploring allele-specific topologies and the reorganization due to the 16p11.2 deletion in autism-spectrum disorder. PhD thesis, Humboldt University of Berlin. doi:10.18452/22071). RNA was extracted with TRIzol Reagent and treated with DNase. RNA-Seq libraries were prepared using the Illumina TruSeq Stranded Total RNA Library Preparation Kit according to the manufacturer's instructions. The libraries were sequenced on a NextSeq500 sequencer (paired-end, 75 bp).
[0281] References
1. Josefsen, K. and Nielsen, H. (2011) Northern Blotting Analysis. Methods Mol Biol. 703, 87-105.
2. https://www.thermofisher.com/de/de/home/references/ambion-tech-support/ribonuclease-protection-assays/general-articles/the-basics-what-is-a-nuclease-protection-assay.html
3. Miller, MB and Tang, Y.-W. (2009) Basic Concepts of Microarrays and Potential Applications in Clinical Microbiology. Clin Microbiol Rev. 22(4), 611-633.
4. https://www.illumina.com/documents/products/techspotlights/techspotlight_sequencing.pdf
5. Stark R., Grzelak, M. and Hadfield, J. (2019) RNA sequencing: the teenage years. Nature Reviews Genetics 20, 631-656.
6. Ziegenhain C., Vieth, B., Parekh, S., Reinius, B., Guillaumet-Adkins, A., Smets, M., Leonhardt, H., Heyn, H., Hellmann, I. and Enard, W. (2017) Comparative Analysis of Single-Cell RNA Sequencing Methods. Mol Cell. 85(4), 631-643.
7. Beliveau, B.J., Joyce, E.F., Apostolopolous, N., Yilmaz, F., Fonseka, C.Y., McCole, R.B., Chang, Y., Li, J.B., Senaratne, T.N., Williams, B.R., Rouillard, J.M. and Wu, C.T. (2012) Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes. Proc Natl Acad Sci 109(52), 21301-6.
8. Geiss GK, Bumgarner RE, Birditt B, et al. (2008) Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 26(3), 317-325.
9. Stahl PL, Salmen F, Vickovic S et al. (2016) Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353(6294), 78-82.
10. WO2019/157445
11. Merritt et al. (2020) Multiplex digital spatial profiling of proteins and RNA in fixed tissue. Nature Biotechnology 38, 586-599.
12. Marshal et al. (2020) HyPR-Seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes. BioRxiv preprint doi: https://doi.org/10.1101/2020.06.01.128314 (also published as Marshal et al., 2020, HyPR-Seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes. PNAS 117(52), 33404-33413).
13. Tropini et al. (2017) The Gut Microbiome: Connecting Spatial Organization to Function. Cell Host & Microbe 21(4), 433-442.
14. Liu et al. (2017) Low-abundant species facilitates specific spatial organization that promotes multispecies biofilm formation. Environ Microbiol. 19, 2893-905.
15. Liu et al. (2019) Deciphering links between bacterial interactions and spatial organization in multispecies biofilms. The ISME Journal 13, 3054-3066.
16. Tokuyasu, K. T. (1973) A technique for ultracryotomy of cell suspensions and tissues. J. Cell Biol. 57, 551-65.
17. Guillot P.V., Xie S.Q., Hollinshead M., Pombo A. (2004) Fixation-induced redistribution of hyperphosphorylated RNA polymerase II in the nucleus of human cells. Exp. Cell Res. 295, 460-468.
18. Pombo A, Hollinshead M, Cook PR (1999) Bridging the resolution gap: Imaging the same transcription factories in cryosections by light and electron microscopy. J. Histochem. Cytochem. 47, 471-480.
19. McDowall et al. (1989) The structure of organelles of the endocytic pathway in hydrated cryosections of cultured cells. Eur. J. Cell Biol. 49, 281-294.
20. Chen et al. (2014) Nano-Dissection and Sequencing of DNA at Single Sub-Nuclear Structures. Small 10:3267.
21. Lucic V., et al. (2013) Cryo-electron tomography: The challenge of doing structural biology in situ. J Cell Biol 202(3), 407.
22. https://www.protocols.io/view/Stellaris-RNA-FISH-Protocol-for-FrozenTissue-iwgs5v
23. https://www.protocols.io/view/exfish-tissue-slice-n6adhae
24. Branco, M. R. & Pombo, A. (2006) Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 4, e138.
25. Xie, S.Q. et al. (2006) Splicing speckles are not reservoirs of RNA polymerase II, but contain an inactive form, phosphorylated on Serine2 residues of the C-terminal domain. Mol. Biol. Cell 17, 1723-1733.
26. Branco, M. R. (2006) Correlative microscopy using Tokuyasu cryosections: applications for immunogold labelling and in situ hybridisation. Cell Imaging (Methods Express Series), ed. D. Stephens, Scion Publishing Ltd. (Bloxham, UK), 201-217.
27. Ferrai, C., et al. (2010) Poised transcription factories prime silent uPA genes prior to activation. PLoS Biology 8, e1000270.
28. Macosko et al. (2015) Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161(5), 1202-1214.
29. Levsky, J.M. et al. (2002) Single-cell gene expression profiling. Science 297, 836-840.
30. Tripathi et al., 2015, RNA Fluorescence In Situ Hybridization in Cultured Mammalian Cells. In: Carmichael G. (eds) Regulatory Non-Coding RNAs. Methods in Molecular Biology (Methods and Protocols), vol 1206. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1369-5_11.
31. Pombo, A. (2003) Cellular genomics: which genes are transcribed when and where? Trends Biochem. Sci. 28, 6-9.
32. Winick-Ng, W., et al., 2020, Cell-type specialization in the brain is encoded by specific long-range chromatin topologies, bioRxiv. https://doi.org/10.1101/2020.04.02.020990.
33. Moeller et al. (2012) Proteomic analysis of mitotic RNA polymerase II complexes reveals novel interactors and association with proteins dysfunctional in disease. Mol. Cell. Proteomics 11(6):M111.011767.
34. https://www.protocols.io/view/Stellaris-RNA-FISH-Protocol-for-FrozenTissue-iwgs5v
35. https://www.protocols.io/view/exfish-tissue-slice-n6adhae
36. Domiguez and Kolodney (2005) Wild-type blocking polymerase chain reaction for detection of single nucleotide minority mutations from clinical specimens. Oncogene 24, 6830-6834.
37. Pombo, A., et al. (1994) Adenovirus replication and transcription sites are spatially separated in the nucleus of infected cells. EMBO J. 13(12), 5075-5085.
38. Femino, A.M., et al. (1998) Visualization of Single RNA Transcripts in Situ. Science, 280(5363).
39. Beliveau et al. (2017) In situ super-resolution imaging of genomic DNA with OligoSTORM and OligoDNA-PAINT. Methods Mol Biol. 1663, 231-252.
40. https://oligopaints.hms.harvard.edu/protocols
41. Dhanoa, J. K., Sethi, R. S., Verma, R., Arora, J. S., & Mukhopadhyay, C. S. (2018) Long non-coding RNA: its evolutionary relics and biological implications in mammals: a review. Journal of animal science and technology, 60, 25.
42. Catalanotto, C., Cogoni, C., & Zardo, G. (2016) MicroRNA in Control of Gene Expression: An Overview of Nuclear Functions. International journal of molecular sciences, 17(10), 1712.
43. van Heesch et al. (2019) The Translational Landscape of the Human Heart. Cell 178(1), 242-260.
44. Arnold et al. (2020) Diversity and Emerging Roles of Enhancer RNA in Regulation of Gene Expression and Cell Fate. Front. Cell Dev. Biol. 7.
45. Kristensen et al. (2019) The biogenesis, biology and characterization of circular RNAs. Nat Rev Genet. 20(11), 675-691.
46. Akhter et al. (2018) Circular RNA and Alzheimer's Disease. Adv Exp Med Biol. 2018;1087:239-243.
47. Michelini, F., Pitchiaya, S., Vitelli, V. et al. (2017) Damage-induced lncRNAs control the DNA damage response through interaction with DDRNAs at individual double-strand breaks. Nat Cell Biol 19, 1400-1411.
48. Lloret-Llinares et al. (2016) Relationships between PROMPT and gene expression. RNA Biol. 13(1), 6-14.
49. Feng et al., 1995, The RNA component of human telomerase. Science 269(5228), 1236-1241.
50. Weibel, E.R. (1979) Stereological Methods: Practical Methods for Biological Morphometry. Vol. 1. Academic Press, London, UK.
51. Weibel, E.R. (1980) Stereological Methods: Theoretical Foundations. Vol. 2. Academic Press, London, UK.
52. Mateo et al. (2019) Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature 568, 49-54.
53. Zhao, N., et al. (2019) A Genetically Encoded Probe for Imaging Nascent and Mature HA-tagged Proteins in Vivo. Nat. Commun. 10(10) 2947.
54. Beagrie et al. (2017) Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519-524.
55. Markowski et al. (2020) GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data. bioRxiv 2020.01.30.927061, also published as Markowski et al. (2021) GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data. Bioinformatics 19, 3128-3135.
56. Nagano et al. (2017) Cell-cycle Dynamics of Chromosomal Organization at Single-Cell Resolution. Nature 547(7661), 61-67.
57. van Buggenum et al. (2018) Immuno-detection by sequencing enables large-scale high-dimensional phenotyping in cells. Nature Communications.
58. Buenrostro, J. D., et al. (2015) ATAC-Seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Current protocols in molecular biology, 109, 21.29.1-21.29.9.
59. WO 2016156469
60. Beliveau et al. (2018) OligoMiner provides a rapid, flexible environment for the design of genome-scale oligonucleotide in situ hybridization probes. PNAS 115(10), 183-192.
61. GeoMx™ product brochure, available on https://www.nanostring.com/products/geomx-digital-spatial-profiler/geomx-dsp
62. Ferrai et al. (2017) RNA polymerase II primes Polycomb-repressed developmental genes throughout terminal neuronal differentiation. Mol. Syst. Biol. 13, 946.
63. Xie SQ and Pombo A. (2006) Distribution of different phosphorylated forms of RNA polymerase II in relation to Cajal and PML bodies in human cells: an ultrastructural study. Histochem. Cell Biol. 125, 21-31.
64. Beagrie et al. (2020) Multiplex-GAM: genome-wide identification of chromatin contacts yields insights not captured by Hi-C. bioRxiv 2020.07.31.230284; doi: https://doi.org/10.1101/2020.07.31.230284.
65. Kunath et al. (2005) Imprinted X-inactivation in extra-embryonic endoderm cell lines from mouse blastocysts. Development 132, 1649-1661.
66. http://repeatmasker.org/
67. SantaLucia J, Jr (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA 95:1460-1465.
68. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Bowtie: An ultrafast memory-efficient short read aligner. Genome Biol 10:R25.
69. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357-359.
70. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26:589-595.
71. Dirks RM, Pierce NA (2003) A partition function algorithm for nucleic acid secondary structure including pseudoknots. J Comput Chem 24:1664-1677.
72. Marcais G, Kingsford C (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-770.
73. Quinlan AR, Hall IM (2010) BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26:841-842.
74. Kempfer, Rieke. (2020) Chromatin folding in health and disease: exploring allele-specific topologies and the reorganization due to the 16p11.2 deletion in autism-spectrum disorder. PhD thesis, Humboldt University of Berlin. doi:10.18452/22071. (See https://edoc.hu-berlin.de/handle/18452/22777).
75. Takei et al. (2021) Integrated spatial genomics reveals global architecture of single nuclei. Nature 590, 344-350.
This disclosure provides, for example, the following:
[Section 1] A method for detecting nucleic acids, comprising the steps of: (a) preparing a compartment containing nucleic acids; (b) hybridizing multiple single-stranded DNA oligonucleotide probes with nucleic acid molecules within the compartment; (c) removing from the compartment any single-stranded DNA oligonucleotide probes that are not specifically hybridized with a nucleic acid in the compartment; and (d) identifying the single-stranded DNA oligonucleotide probes that specifically hybridize with nucleic acid molecules in the compartment by probe sequencing or probe amplification, thereby determining the nucleic acids corresponding to the probes present in the compartment; wherein the method does not include a series of probe hybridizations as a means of facilitating nucleic acid detection and amplification.
[Section 2] The method of item 1, wherein the method does not include an RNA isolation step.
[Section 3] The method of item 1 or 2, wherein the method does not include a cDNA production step.
[Section 4] The method according to any of items 1 to 3, wherein the nucleic acid is RNA, preferably mRNA.
[Section 5] The method according to any of items 1 to 3, wherein the nucleic acid is ssDNA or dsDNA, preferably ssDNA.
[Section 6] The method according to any of items 1 to 5, wherein the nucleic acid-containing compartment is a eukaryotic cell, the nucleus of a eukaryotic cell, the cytoplasm of a eukaryotic cell, a mitochondrion, a chloroplast, an exosome, a prokaryotic cell, a group of cells in a tissue, or a virus.
[Section 7] The method according to any of items 1 to 6, wherein the nucleic acid-containing compartment is sectioned before step (b).
[Section 8] The method according to any of items 1 to 7, wherein the nucleic acid-containing compartment or cells are biochemically separated or dissociated prior to step (b).
[Section 9] A method wherein, prior to step (b), the nucleic acid-containing compartment is fixed, and wherein, optionally, the nucleic acid-containing compartment is sectioned after fixation, using any of the methods described in item 1 to .
[Section 10] The method according to any of items 1 to 7, wherein the nucleic acid-containing compartment is not fixed and is optionally vitrified.
[Section 11] The method according to any of items 1 to 10, wherein the single-stranded DNA oligonucleotide probe contains a target region complementary to the nucleotide sequence of the target nucleic acid, flanked by a pair of universal primer regions, and wherein, optionally, the single-stranded DNA oligonucleotide probe further contains a unique molecular identifier.
[Section 12] The method according to any of items 1 to 11, wherein the single-stranded DNA oligonucleotide probes specifically hybridize with multiple target nucleic acids present in the compartment.
[Section 13] The method according to any of items 1 to 4 and 6 to 11, wherein the single-stranded DNA oligonucleotide probes form a library that specifically hybridizes with substantially all mRNA, or optionally substantially all RNA, present in the compartment.
[Section 14] The method according to any of items 1 to 13, wherein amplification of the bound single-stranded DNA oligonucleotide probes is performed between steps (c) and (d).
[Section 15] The method according to any of items 1 to 14, wherein the bound single-stranded DNA oligonucleotide probes are identified by sequencing, preferably by next-generation sequencing.
[Section 16] The method according to any of items 1 to 13, wherein the bound single-stranded DNA oligonucleotide probes are amplified, preferably by quantitative PCR.
[Section 17] The method according to any one of items 1 to 16, further comprising spatial mapping of the nucleic acids detected in the compartment, comprising the steps of: (i) before step (b), sectioning, freeze-sectioning, or freeze-grinding the compartment, preferably freeze-sectioning, to obtain a collection of fragments, thereby separating nucleic acid molecules from each other according to their localization; (ii) in step (d), identifying the single-stranded DNA oligonucleotide probes that specifically hybridize with nucleic acid molecules within each fragment, thereby determining the presence or absence of the RNA corresponding to each probe in each fragment; and (iii) mapping the nucleic acids within the compartment.
[Section 18] The method of item 17, wherein the detection of nucleic acids, preferably RNA, is combined with the detection of at least one DNA locus within the compartment, comprising the additional steps of: determining the presence or absence of the at least one DNA locus in each fragment, optionally by sequencing, preferably by next-generation sequencing; and determining the co-segregation of single-stranded DNA oligonucleotide probes that specifically hybridize to at least one DNA locus and RNA.
[Section 19] Use of the method according to any one of items 1 to 18 for: (a) determination of gene expression in a single cell, cell group, or intracellular compartment; (b) identification of RNA isoforms and allele-specific variants within the compartment; (c) quantification of gene transcription; (d) identification of cell types in complex heterogeneous tissues; (e) identification of endogenous and exogenous dsDNA and ssDNA within the compartment; (f) mapping of RNA positions in the compartment; (g) mapping of RNA and nucleic acid locus locations in the compartment; (h) mapping of RNA and protein positions in the compartment; or (i) mapping of RNA, protein, and nucleic acid locus locations in the compartment.
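The per-fragment bookkeeping described in items 17 and 18 can be illustrated with a minimal Python sketch. It is not part of the disclosed method. It assumes that step (d) has already produced, for each fragment, the set of targets whose probes (or locus reads) were detected. The function names, the fragment and target labels, and the use of a simple joint detection frequency as a co-segregation measure are all hypothetical choices made for illustration; the disclosure does not prescribe a particular statistic.

```python
from typing import Dict, Iterable, Set


def presence_table(fragment_hits: Dict[str, Set[str]],
                   targets: Iterable[str]) -> Dict[str, Dict[str, bool]]:
    """Return presence (True) or absence (False) of each target in each fragment."""
    targets = list(targets)
    return {
        fragment: {target: target in hits for target in targets}
        for fragment, hits in fragment_hits.items()
    }


def co_segregation(fragment_hits: Dict[str, Set[str]],
                   rna_target: str,
                   dna_locus: str) -> dict:
    """Count fragments containing the RNA signal, the DNA locus, and both.

    The joint frequency (fragments with both / all fragments) is an assumed,
    illustrative co-segregation measure, not one taken from the disclosure.
    """
    n = len(fragment_hits)
    if n == 0:
        raise ValueError("at least one fragment is required")
    with_rna = sum(rna_target in hits for hits in fragment_hits.values())
    with_dna = sum(dna_locus in hits for hits in fragment_hits.values())
    with_both = sum(
        rna_target in hits and dna_locus in hits
        for hits in fragment_hits.values()
    )
    return {
        "fragments": n,
        "with_rna": with_rna,
        "with_dna": with_dna,
        "with_both": with_both,
        "joint_frequency": with_both / n,
    }


if __name__ == "__main__":
    # Hypothetical detections: fragment label -> targets detected in it.
    example_hits = {
        "F1": {"RNA_A", "LOCUS_1"},
        "F2": {"RNA_A"},
        "F3": {"LOCUS_1", "RNA_B"},
        "F4": {"RNA_A", "LOCUS_1"},
    }
    print(presence_table(example_hits, ["RNA_A", "RNA_B", "LOCUS_1"]))
    print(co_segregation(example_hits, "RNA_A", "LOCUS_1"))
```

In practice the fragment-level detections would come from the sequencing or amplification readout of step (d), and a real analysis would likely normalize the joint frequency against the individual detection frequencies.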
Claims
1. A method for detecting nucleic acids, comprising the steps of: (a) preparing a nucleic acid-containing compartment; (b) hybridizing multiple single-stranded DNA oligonucleotide probes with nucleic acid molecules within the compartment; (c) removing from the compartment any single-stranded DNA oligonucleotide probes that are not specifically hybridized with a nucleic acid in the compartment; and (d) identifying, by probe sequencing or probe amplification, the single-stranded DNA oligonucleotide probes that specifically hybridize with nucleic acid molecules in the compartment, thereby determining the nucleic acids present in the compartment that correspond to the probes; wherein the method does not include a series of probe hybridizations as a means of nucleic acid detection and amplification, and does not include a step of ligating two or more probes to each other, or one or more probes to another type of probe, in order to facilitate probe amplification and gene detection.
2. The method according to claim 1, wherein the method does not include an RNA isolation step.
3. The method according to claim 1 or 2, wherein the method does not include a cDNA production step.
4. The method according to any one of claims 1 to 3, wherein the nucleic acid is RNA.
5. The method of claim 4, wherein the nucleic acid is mRNA.
6. The method according to any one of claims 1 to 3, wherein the nucleic acid is ssDNA or dsDNA.
7. The method according to any one of claims 1 to 6, wherein the nucleic acid-containing compartment is a eukaryotic cell, the nucleus of a eukaryotic cell, the cytoplasm of a eukaryotic cell, mitochondria, chloroplasts, exosomes, prokaryotic cells, a group of cells in a tissue, or a virus.
8. The method according to any one of claims 1 to 7, wherein the nucleic acid-containing compartment is sectioned before step (b).
9. The method according to any one of claims 1 to 8, wherein the nucleic acid-containing compartment or cells are biochemically separated or dissociated before step (b).
10. The method according to any one of claims 1 to 9, wherein the nucleic acid-containing compartment is fixed before step (b).
11. The method of claim 10, wherein the nucleic acid-containing compartment is sectioned after fixation.
12. The method according to any one of claims 1 to 8, wherein the nucleic acid-containing compartment is not fixed.
13. The method of claim 12, wherein the nucleic acid-containing compartment is vitrified.
14. The method according to any one of claims 1 to 13, wherein the single-stranded DNA oligonucleotide probe comprises a target region, complementary to a nucleotide sequence of the target nucleic acid, flanked by a pair of universal primer regions.
15. The method of claim 14, wherein the single-stranded DNA oligonucleotide probe further comprises a unique molecular identifier.
16. The method according to any one of claims 1 to 15, wherein the single-stranded DNA oligonucleotide probes specifically hybridize with multiple target nucleic acids present in the compartment.
17. The method according to any one of claims 1 to 5 and 7 to 15, wherein single-stranded DNA oligonucleotide probes form a library that covers at least 99% of the RNA present in a compartment.
18. The method according to any one of claims 1 to 17, wherein between steps (c) and (d), amplification of the bound single-stranded DNA oligonucleotide probe is performed.
19. The method according to any one of claims 1 to 18, wherein the bound single-stranded DNA oligonucleotide probe is identified by sequencing.
20. The method according to any one of claims 1 to 17, wherein the bound single-stranded DNA oligonucleotide probe is identified by amplification.
21. The method according to any one of claims 1 to 20, further comprising spatial mapping of the nucleic acids detected in the compartment, comprising the steps of: (i) before step (b), sectioning, freeze-sectioning, or freeze-grinding the compartment to obtain a collection of fragments, thereby separating nucleic acid molecules from each other according to their localization; (ii) in step (d), identifying the single-stranded DNA oligonucleotide probes that specifically hybridize with nucleic acid molecules within each fragment, thereby determining the presence or absence of the RNA corresponding to each probe in each fragment; and (iii) mapping the nucleic acids within the compartment.
22. The method of claim 21, wherein the detection of nucleic acids is combined with the detection of at least one DNA locus within the compartment, comprising the additional steps of: determining the presence or absence of the at least one DNA locus in each fragment; and determining the co-segregation of single-stranded DNA oligonucleotide probes that specifically hybridize to at least one DNA locus and RNA.
23. Use of the method according to any one of claims 1 to 22 for: (a) determination of gene expression in a single cell, cell group, or intracellular compartment; (b) identification of RNA isoforms and allele-specific variants within the compartment; (c) quantification of gene transcription; (d) identification of cell types in complex heterogeneous tissues; (e) identification of endogenous and exogenous dsDNA and ssDNA within the compartment; (f) mapping of RNA positions in the compartment; (g) mapping of RNA and nucleic acid locus locations in the compartment; (h) mapping of RNA and protein positions in the compartment; or (i) mapping of RNA, protein, and nucleic acid locus locations in the compartment.